Back on September 15, 2010, it was «R U OK?» Day, a national day of action designed to raise awareness of suicide by encouraging people to reach out to others and ask, «Are you OK?»
We were commissioned by the R U OK? organisation to create a visualisation that highlighted the connections people were making throughout the day, and after a number of rounds of brainstorming (and budget cuts) we chose to highlight the connections that were being made over Twitter.
Because of concerns about the subject matter, we weren’t allowed to highlight the content of any tweets. Instead, to gather our data we search for all tweets in Australia that are directed at someone (@replies), geocode both ends of the conversation, record the timestamp of the message, and map each connection in time and space. For each location we also run frequency analysis on the tweets and provide a tag cloud of the most-used words (hover over a city to see it).
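The two text-processing steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the visualisation: it assumes an @reply is a tweet that begins with a mention, and that a small hand-written stopword list is good enough for a tag cloud. The names `reply_target`, `top_words` and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the real project's list is not known.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "i", "you", "is", "it", "in", "for", "on", "rt"}

# Assumption: an @reply starts with "@screen_name" (max 15 word characters).
MENTION_AT_START = re.compile(r"^@(\w{1,15})")
# Words are runs of letters, optionally with internal apostrophes ("don't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def reply_target(text):
    """Return the screen name a tweet is addressed to, or None if it isn't an @reply."""
    match = MENTION_AT_START.match(text)
    return match.group(1) if match else None


def top_words(tweets, n=20):
    """Count word frequencies across one location's tweets, skipping mentions, links and stopwords."""
    counts = Counter()
    for text in tweets:
        for token in text.lower().split():
            if token.startswith("@") or token.startswith("http"):
                continue  # mentions and links make poor tag-cloud words
            for word in WORD.findall(token):
                if word not in STOPWORDS:
                    counts[word] += 1
    return counts.most_common(n)
```

Running `top_words` separately over each city's tweets gives the per-location word counts that a tag cloud would be drawn from.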
All of this is done in realtime, so you can watch actual conversations as they happen. Beyond the realtime view, we also give you two weeks’ worth of historical data, so you can see the rise and fall of activity throughout the day and across weeks. Each hourly period also has its own pattern of replies crisscrossing the country, which appear when people in different cities strike up conversations with one another.
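The post doesn't say how the historical data is stored. One simple approach, sketched here under the assumption that each connection is kept as a `(timestamp, from_city, to_city)` tuple, is to truncate timestamps to the hour and keep only the last 14 days. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def bucket_connections(connections, now, days=14):
    """Group (timestamp, from_city, to_city) records into hourly buckets over the last `days` days."""
    cutoff = now - timedelta(days=days)
    buckets = defaultdict(list)
    for ts, src, dst in connections:
        if ts >= cutoff:  # anything older than the window is dropped
            buckets[hour_bucket(ts)].append((src, dst))
    return dict(buckets)
```

Each key then corresponds to one hourly period, and its list of city pairs is the set of arcs to draw across the map for that hour.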
It’s interesting to see the times of day when people are most active on Twitter, and the days of the week when activity is high. On R U OK? Day itself we saw a dramatic (~2x) spike in @reply activity, which we hope can be attributed to the spread of R U OK? Day.
The Achilles heel of this visualisation, however, is the amount of processing it has to do. First, it runs a search against the Twitter API several times a minute (you have to be careful to stay within the rate limits). Then, for each tweet, it checks whether the sender and receiver have location data and, if they do, geocodes both locations.
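Staying within an API's rate limit while polling can be handled with a small sliding-window limiter. This is a generic sketch rather than the project's code: the limit and window values are placeholders, not Twitter's actual limits, and the injectable `clock` argument exists only to make the class easy to test.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls in any sliding window of `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A polling loop would call `try_acquire()` before each search request and simply skip that poll when it returns `False`, rather than risk being cut off by the API.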
Geocoding is an expensive operation, and because I’m geocoding roughly 200 points a minute (about 12,000 an hour), we quickly fall afoul of hourly geocoding rate limits. To counteract this, I have it set up to try geocoding via Google Maps first, then fail over to Yahoo! once we hit the limit. Somewhere along the way I also implemented caching of geocoding results, so we now have a pretty handy database of geocoded strings.
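The failover-plus-cache idea can be sketched independently of any particular geocoding service. In this sketch each provider is a hypothetical callable that takes a location string and returns `(lat, lng)` or `None`, raising a hypothetical `QuotaExceeded` exception when its limit is used up. It does not reproduce the Google Maps or Yahoo! APIs. `geocode_pair` mirrors the per-tweet check described earlier: skip the tweet unless both ends have location data.

```python
class QuotaExceeded(Exception):
    """Hypothetical error a provider raises when its geocoding quota is exhausted."""


class FailoverGeocoder:
    def __init__(self, providers, cache=None):
        # Ordered list of callables: location string -> (lat, lng) or None.
        self.providers = providers
        # Any dict-like store works; a persistent one keeps results across restarts.
        self.cache = {} if cache is None else cache

    def geocode(self, location):
        key = location.strip().lower()  # normalise so " Sydney" and "sydney" share an entry
        if key in self.cache:
            return self.cache[key]
        for provider in self.providers:
            try:
                result = provider(location)
            except QuotaExceeded:
                continue  # this provider is over its limit; fall over to the next one
            self.cache[key] = result  # cache misses (None) too, so they aren't retried
            return result
        return None  # every provider is exhausted; don't cache, try again later


def geocode_pair(geocoder, sender_location, receiver_location):
    """Geocode both ends of a reply, or return None if either end can't be placed."""
    if not sender_location or not receiver_location:
        return None
    a = geocoder.geocode(sender_location)
    b = geocoder.geocode(receiver_location)
    if a is None or b is None:
        return None
    return a, b
```

Because people's profile locations repeat heavily ("Sydney", "Melbourne, Australia"), the cache absorbs most lookups after a while, which is what makes the provider quotas survivable at all.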
The price of daisy-chaining all these APIs is fragility. It’s a lot of work to keep this system up, and as a result I’ll be taking the whole thing offline in about a week. (It’s also costing me a fair bit in App Engine hosting charges.)
Still, if you want to check it out, you can for now. Thereafter, I’ll replace it with a video of what it once was.
Update 2011-04-19: I’ve replaced it with a video of what it once was.