Monday, May 27, 2013
Mapping Controversy in Wikipedia
Wikipedia, the collection of 37 million articles that anyone can edit, is defined by conflict. The ability of anyone to shape this global repository of knowledge inevitably means that we are presented with fascinating, shocking, and often hilarious discussions on the talk pages of articles. Just check out the talk pages of articles about Barack Obama, the Persian Gulf, and Freddie Mercury (or, if you really want to waste an afternoon, dive into Wikipedia's collection of 'lamest edit wars').
So, a natural question for my colleagues (Taha Yasseri, Anselm Spoerri, and János Kertész) and me was whether we can model and map the controversiality of Wikipedia articles. Does controversy have distinct geographies? It turns out that it does.
To quantify the controversiality of an article based on its editorial history, we focused on “reverts”, i.e. when an editor undoes another editor’s edit completely. We counted all of the reverts in the history of every article and gave a higher weight to editors that revert each other repeatedly. To validate the resulting measure, we compared it against human judgement. If you want to read more about the method, check our articles here or here.
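As a rough illustration of the revert-counting idea (not the published measure itself), here is a minimal Python sketch. It assumes that a revert can be detected as a revision whose text exactly matches an earlier version, and that the editor being reverted is the author of the immediately preceding revision. The function names, the input format, and the `mutual_weight` parameter are all hypothetical and chosen only for illustration; the actual weighting scheme is described in the papers.

```python
from collections import Counter
from hashlib import sha1


def find_reverts(revisions):
    """Return (reverter, reverted) pairs from a chronological list of
    (editor, text) revisions. Assumption: a revert is a revision that
    restores an earlier version exactly."""
    seen = {}  # text hash -> index of the latest revision with that text
    reverts = []
    for i, (editor, text) in enumerate(revisions):
        digest = sha1(text.encode("utf-8")).hexdigest()
        j = seen.get(digest)
        # The restored version must be at least two revisions back,
        # so that at least one edit in between was undone.
        if j is not None and j < i - 1:
            undone_editor = revisions[i - 1][0]
            if undone_editor != editor:  # ignore self-reverts
                reverts.append((editor, undone_editor))
        seen[digest] = i
    return reverts


def controversy_score(revisions, mutual_weight=2):
    """Hypothetical score: every revert counts once, and pairs of editors
    who revert each other get an extra bonus per mutual revert."""
    counts = Counter(find_reverts(revisions))
    pairs = {frozenset(pair) for pair in counts}
    score = 0
    for pair in pairs:
        a, b = tuple(pair)
        n_ab, n_ba = counts[(a, b)], counts[(b, a)]
        score += n_ab + n_ba                     # plain revert count
        score += mutual_weight * min(n_ab, n_ba)  # bonus for mutual reverts
    return score


# Toy edit history: alice and bob keep undoing each other.
history = [
    ("alice", "v1"), ("bob", "v2"), ("alice", "v1"),
    ("bob", "v2"), ("alice", "v1"), ("carol", "v3"), ("dave", "v1"),
]
print(find_reverts(history))
print(controversy_score(history))
```

In this toy scheme, an article where two editors keep undoing each other scores higher than one with the same number of unrelated one-off reverts, which is the intuition behind weighting repeated mutual reverts more heavily.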
This all allowed us to get a sense of what the most controversial articles in each Wikipedia language edition are. In English, the most controversial article is George W. Bush, followed by Anarchism and then Muhammad. In French, by contrast, the top three most controversial articles are Ségolène Royal, UFOs, and Jehovah's Witnesses (we're certain there are some good jokes hiding in the orders of these lists). For the full list of top-10 controversial articles in ten languages, check out our in-press chapter on the topic (or look at the complete lists here and an interactive visualisation of Wikipedia conflicts at this link). But the short version is that at the top of the lists in multiple languages we see articles related to religion, politics, and football; i.e. pretty much exactly what you would expect people to be arguing about.
But what about the geography of these controversial articles in different languages? Where do we see the most controversial articles in different languages? Below is the full list of maps that we created:
What do these maps tell us? First, we see an interesting amount of difference between the various language editions of Wikipedia. Some of the smaller Wikipedias have a high degree of self-focus in the articles that are characterized by the greatest degree of conflict (check out some of Brent Hecht's work for more on this). For instance, the articles with the highest amount of conflict in the Czech and Hebrew Wikipedias are about the Czech Republic and Israel respectively.
Even when looking at large languages that are primarily spoken in more than one country, we are able to see that a significant amount of self-focus occurs (look at the Arabic and Spanish maps of conflict for examples of this).
The interesting exception to this rule is the Middle East. All languages in our sample apart from Hungarian, Romanian, Japanese, and Chinese actually include articles located in Israel among those characterised by a large amount of conflict.
Also worth pointing out is the fact that we see significant differences in the geographic topics that generate the most conflict. The articles in Japanese that generate the most conflict are not only all located in Japan, but are also all educational institutions. The Portuguese articles that generate the most conflict are similarly all located in Brazil (the world’s largest Portuguese-speaking nation), with four of the top five being about football teams.
Within our sample, we actually only see the English, German, and French Wikipedias with a significant amount of diversity in the topics and patterns of conflict in geographic articles. This probably indicates the less significant role that specific editors and arguments play in these larger encyclopaedias.
Ultimately, by visualizing the geography of conflict in Wikipedia, we're able to see both topics that appear to have cross-linguistic resonance (e.g. the Arab-Israeli conflict) and those of narrower interest, such as the Islas Malvinas/Falkland Islands article in the Spanish Wikipedia.
These maps therefore offer a window into not just the topics that different language communities are interested in, but also the topics that seem worth fighting about.
To read more about conflict and Wikipedia:
Yasseri, Taha, Spoerri, Anselm, Graham, Mark and Kertész, János (2014) The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press. Available at SSRN.
Graham, M., M. Zook, and A. Boulton. 2012. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x
Graham, Mark, The Virtual Dimension (2013). Global City Challenges: Debating a Concept, Improving the Practice, M. Acuto and W. Steele. Available at SSRN: http://ssrn.com/abstract=2212824
Yasseri, T., Sumi, R., Rung, A., Kornai, A., and Kertész, J. (2012) Dynamics of conflicts in Wikipedia. PLoS ONE 7(6): e38869.
Török, J., Iñiguez, G., Yasseri, T., San Miguel, M., Kaski, K., and Kertész, J. (2013) Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment. Physical Review Letters 110 (8).