At the recent #AAG2014 alt.conference on the geoweb and ‘big data’, I was asked to serve as a panellist at the end of the day: summarising some of the day’s themes, and reflecting on how they speak to future directions in the discipline. The responses that I prepared are below. Forgive the scattered nature of the notes, as they were hastily put together.
We were asked to engage with the lightning talks and the ways that they factor into the potential future directions of research. Let me go through a few themes that emerged.
First, I’m not sure we’re all talking about the same thing when we speak about ‘big data’ and the geoweb. This isn’t necessarily a problem, but I’d hope that future conversations could focus more on what exactly the ‘geoweb’ is. What exactly do we mean when we speak about it? Where are the boundaries between the web and the geoweb? (I’m not sure I clearly see them.) Where are the boundaries between the geoweb and what we might think of as the underlying/offline/material geo that seems to underpin, augment, or inform it? I’m also not sure I clearly see those boundaries, in part because of the ways that place is always transduced: constantly remade and reenacted. So, whilst I don’t think we have to agree on any definitions, I do think that we should avoid taking for granted some of the assumptions wrapped into these very powerful terms.
Second, we hear a lot about the need for more mixed methods research. Yes. Absolutely. But I also think that we need to avoid creating caricatures to argue against. Is there anyone out there who is actually saying that big data can answer all facets of all societal questions? How, then, should we best channel our energies into creating, carrying out, and enacting those hybrid approaches? Jin-Kyu and others offered us some helpful beginnings here.
Third, it’s nice to see the beginnings of some more cross-pollination between geography, computer science, information studies, internet studies, and other social sciences. There is definitely a lot that we can contribute as geographers, but we also need to make sure that we aren’t reinventing the wheel. So, for instance, we often talk about crowdsourcing or VGI, but there’s a lot of work being done in information studies, psychology, and internet studies trying to understand motivations for crowdsourcing. We could do more to allow that work to cross over into geography and geoweb research, and then hopefully feed back into it.
Fourth, a lot of our conversations about big data seem to forget the truly massive amount of paid human labour that goes into the filtering, sorting, cleaning, manipulating, and managing of it. We seem to talk about big data as something that pings around between sensors, datasets, machines, and algorithms. But one of the things that I’m working on is looking at those digital sweatshops, the micro workers, the click workers, the gold farmers - those labourers in the background that are keeping our networks chugging along. And I hope we’ll start to see more of this work - remembering that automation is often an illusion. What should we be asking about those millions of workers in the shadows, doing unorganised, low-paid, alienated work - and making many of our ‘big data’ ecosystems function?
Fifth, building on Jeremy’s comments this morning, I wonder if we should be leading a charge to address what I think is one of the most pressing issues of our time: concerns about privacy. I think that - as geographers - we’re maybe somewhat unwisely ceding this space to computer scientists - who do tend to be very informed on the topic - and politicians - who, well, don’t tend to be informed on the topic. What should we be doing and saying and researching as geographers, to draw on our expertise and the strengths of our discipline to make a difference - and I want to emphasise - make a difference - in this new world of always-on tracking and monitoring and the datafication of everything?
But how do we also make sure that privacy isn’t used as an excuse for the wholesale locking away of social data by large companies - meaning that we can’t use those data to address the social and human questions that really matter? So, where do we stand on the transparency/privacy spectrum? And, again, what should we be doing about it?
Sixth, a lot of people today spoke about focusing on what, who, and where is left out. I very much agree that this is a crucial first step. Castells puts it well when he says that “the costs of exclusion from networks increases faster than the benefits of inclusion in the network.” And this is an area of work that we tend to do very well as geographers (this is a question that people in other disciplines often seem to miss), but it is precisely that - a first step. How can we move beyond it? What can or should we do about it? If we establish that the digital layers that augment place are inherently uneven, unrepresentative, and imbalanced, what can we do with that knowledge? What should we do with that knowledge?
We should also think about the flip side of this issue. Whilst there’s been a lot of focus on where there isn’t enough data, or where data might not be able to capture the complexities of any given situation, what about contexts where we have too much data? Some of the talks guided us through methods for dealing with ‘big data’, but we probably need more of this. Should we be having more conversations about what to actually do with it? It would be nice to have conversations about cluster computing, graph databases, agent-based models, and other methods for grappling with unmanageable volumes of data. Yes, we always need to remember what those data leave out; but unless we want to abandon the whole big data project, we should also be - critically - trying to figure out what those datasets do tell us about society - and how they help us to answer the big questions that we need to ask.
Finally, let’s keep our eyes on the prize. Let’s make sure that we’re asking the questions that matter, and not being too driven by just what data are available. Let’s make sure our research continues to focus on questions about things like inequality, power, voice, control, and human welfare. And I say continue because I was very impressed by the topics that the presentations today were tackling.
We can make sure that we’re shaping not just the questions being asked, but also the data being collected. Some of this means doing things like always being explicit that there is never any such thing as ‘raw data’: data are always socially and humanly constructed. It also means recognising that, in many ways, we’re the privileged ones in this room. We have the knowledge, the skills, and the desire to be the ones doing the constructing and the shaping of data.
A few weeks ago, Tony Benn - who was a British Labour party politician - passed away. He famously had a set of five questions that he said we should always ask any powerful person: “What power have you got? Where did you get it from? In whose interests do you exercise it? To whom are you accountable? And how can we get rid of you?” Well, I wonder if we shouldn’t adapt those questions to the data intermediaries, systems, platforms, and algorithms that we’re dealing with. “What power have you got? Where did you get it from? In whose interests do you exercise it? To whom are you accountable? And how can we get rid of you?” It’s been nice to see a lot of the work on big data and the geoweb tackling these questions, and I hope we see more of it in years to come.