Wednesday, October 31, 2012

data shadows of a hurricane

My colleagues Adham Tamer, Ning Wang, Scott Hale and I have been collecting tweets containing the terms "flood" and "flooding" in order to examine how Twitter usage in the context of Hurricane Sandy might reflect lived experiences. In other words, we are examining the human and social data shadows of an innately physical/material event to see what they tell us.
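The keyword selection described above can be sketched roughly as follows. This is only an illustration, not our actual collection pipeline: the function name `mentions_flooding`, the whole-word matching rule, and the sample tweets are all assumptions made up for the example.

```python
import re

# Assumption: case-insensitive, whole-word matching on the two terms.
# Under this rule "floods" or "flooded" would NOT match.
FLOOD_PATTERN = re.compile(r"\bflood(?:ing)?\b", re.IGNORECASE)


def mentions_flooding(text):
    """Return True if the text contains "flood" or "flooding" as a whole word."""
    return FLOOD_PATTERN.search(text) is not None


# Invented sample tweets, purely for illustration.
sample_tweets = [
    "Flooding on our street tonight",
    "The floodlights are on at the stadium",
    "Stay safe everyone #flood",
]

matching = [t for t in sample_tweets if mentions_flooding(t)]
```

A looser rule (for instance, a plain substring match) would also pick up words like "floodlights", so the choice of matching rule affects what ends up on the map.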

Our initial intent was also to map references to flooding in both English and Spanish in order to explore whether we see significant geographic and linguistic differences in social media reactions to the hurricane. With the rise of crisis mapping and Twitter analysis, we reasoned that it would be important to note any potential differences between English and Spanish speakers (Spanish being a native language to millions of people on the US East Coast).

The maps reveal a few important findings. First, tweets referencing flooding are almost exactly where you would expect them to be, i.e. in the path of the hurricane. But it is interesting that so few people elsewhere in the US are tweeting about some of the unprecedented flooding on the East Coast. In this sense, the geography of data shadows drawn from Twitter appears to be quite effective at reflecting experiences of the storm. The hurricane, in essence, leaves a digital trail.

Second, we see that these data become less useful if we want to draw insights at a finer scale than the county. The data are good at reflecting the broad trajectory of the hurricane, but perhaps less useful for more detailed insights. For instance, it is unclear whether the large number of tweets that we pick up in New York City, as compared to other places, reflects the scale of devastation to the city or just means that New Yorkers are more apt to tweet about such an event.
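One way to start separating those two explanations would be to compare flood tweets against each area's overall tweet volume instead of looking at raw counts. The sketch below is hypothetical: it assumes a baseline count of all tweets per area is available, which is not something the data described in this post includes, and the area names and numbers are invented placeholders.

```python
def flood_share(flood_counts, baseline_counts):
    """Return flood tweets as a fraction of all tweets, per area.

    Areas with no baseline tweets are skipped to avoid dividing by zero.
    """
    shares = {}
    for area, flood in flood_counts.items():
        total = baseline_counts.get(area, 0)
        if total > 0:
            shares[area] = flood / total
    return shares


# Placeholder numbers, invented purely for illustration (not real data).
flood_counts = {"area_a": 500, "area_b": 50}
baseline_counts = {"area_a": 100_000, "area_b": 5_000}

shares = flood_share(flood_counts, baseline_counts)
```

In this made-up example, area_a has ten times as many flood tweets, yet flooding accounts for a larger share of everything tweeted from area_b. A share like this would still not tell us how much devastation each place experienced, only how prominent the topic was relative to everyday tweeting.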

In other words, it is the absences on this map that are almost more interesting than the mapped results. The lack of published content in Spanish means that we are necessarily only including published content from English speakers in these representations. The absences in the rest of the country are also revealing. Why exactly are so few people in Kentucky, Missouri, Wisconsin etc. tweeting about East Coast flooding? Is it because the act of tweeting about such an event is only really likely to be performed by people in situ, experiencing the storm? Are people outside of the hurricane path simply not that interested in the event? Or should we simply avoid trying to make inferences from Twitter data other than recognising the broad patterns that large events leave on the digital landscape?

1 comment:

Winston Smith said...

Thanks for the interesting contribution. Now I just wonder, how did you get that high-resolution location information from Twitter? You aggregated the location data to the county level, but obviously, as your map shows, you collected locational information at a much finer granularity than the county level. Are that many people in the US geo-enabled, and if yes, what method did you use to georeference them? I am just wondering because some studies show that actually only a very small percentage of Twitter users are broadcasting their current location. Hope that wasn't too many questions.. ;-) Thanks!