Monday, May 27, 2013

Mapping Controversy in Wikipedia


Wikipedia, the collection of 37 million articles that anyone can edit, is defined by conflict. The ability of anyone to shape this global repository of knowledge inevitably means that we are presented with fascinating, shocking, and often hilarious discussions on the talk pages of articles. Just check out the talk pages of articles about Barack Obama, the Persian Gulf, and Freddie Mercury (or, if you really want to waste an afternoon, dive into Wikipedia's collection of 'lamest edit wars').

So, a natural question for my colleagues (Taha Yasseri, Anselm Spoerri, and János Kertész) and me was whether we can model and map the controversiality of Wikipedia articles. Does controversy have distinct geographies? It turns out that it does.


To quantify the controversiality of an article based on its editorial history, we focused on “reverts”, i.e. when an editor completely undoes another editor’s edit. We counted all of the reverts in the history of every article and gave a higher weight to editors who revert each other repeatedly. To validate the measure, we compared the classifier's output against human judgement. If you want to read more about the method, check our articles here or here.
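To make the idea concrete, here is a rough sketch of a revert-based score. It is not the exact measure from our papers: the function name `controversy_score`, the `mutual_bonus` parameter, and the way mutual reverts are weighted are all assumptions made purely for illustration. The input is assumed to be a list of (reverter, reverted) editor pairs taken from a single article's history.

```python
from collections import Counter

def controversy_score(reverts, mutual_bonus=1):
    """Toy controversy score for one article (illustrative assumption only).

    reverts: iterable of (reverter, reverted) editor-name pairs.
    Every revert counts once; pairs of editors who revert each other
    get an extra weight proportional to the smaller of the two directions.
    """
    directed = Counter(reverts)  # how often A reverted B
    seen = set()
    score = 0
    for (reverter, reverted), n in directed.items():
        if reverter == reverted:
            continue  # ignore self-reverts in this sketch
        pair = frozenset((reverter, reverted))
        if pair in seen:
            continue  # each unordered pair is scored once
        seen.add(pair)
        back = directed.get((reverted, reverter), 0)
        # plain revert count plus a bonus for mutual (back-and-forth) reverts
        score += n + back + mutual_bonus * min(n, back)
    return score

history = [("a", "b"), ("b", "a"), ("a", "b"), ("c", "d")]
print(controversy_score(history))
```

In this toy version a one-off revert adds 1 to the score, while editors who keep reverting each other push the score up faster, which is the intuition behind weighting repeated mutual reverts more heavily.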

This all allowed us to get a sense of what the most controversial articles in each Wikipedia language edition are. In English, the most controversial article is George W. Bush, followed by Anarchism and then Muhammad. In French, by contrast, the top three most controversial articles are Ségolène Royal, UFOs, and Jehovah's Witnesses (we're certain there are some good jokes hiding in the orders of these lists). For the full list of top-10 controversial articles in ten languages, check out our in-press chapter on the topic (or look at the complete lists here and an interactive visualisation of Wikipedia conflicts at this link). But the short version is that at the top of the lists in multiple languages we see articles related to religion, politics, and football; i.e. pretty much exactly what you would expect people to be arguing about.

But what about the geography of these controversial articles in different languages? Where do we see the most controversial articles in different languages? Below is the full list of maps that we created:















What do these maps tell us? First, we see an interesting amount of difference between the various language editions of Wikipedia. Some of the smaller Wikipedias have a high degree of self-focus in the articles that are characterized by the greatest degree of conflict (check out some of Brent Hecht's work for more on this). For instance, the articles with the highest amount of conflict in the Czech and Hebrew Wikipedias are about the Czech Republic and Israel respectively.

Even when looking at large languages that are primarily spoken in more than one country, we are able to see that a significant amount of self-focus occurs (look at the Arabic and Spanish maps of conflict for examples of this). 

The interesting exception to this rule is the Middle East. All languages in our sample apart from Hungarian, Romanian, Japanese, and Chinese include articles located in Israel among those characterised by a large amount of conflict.

Also worth pointing out is the fact that we see significant differences in the geographic topics that generate the most conflict. The articles in Japanese that generate the most conflict are not only all located in Japan, but are also all about educational institutions. The Portuguese articles that generate the most conflict are similarly all located in Brazil (the world’s largest Portuguese-speaking nation), with four of the top five conflict scores belonging to articles about football teams.

Within our sample, only the English, German, and French Wikipedias show a significant amount of diversity in the topics and patterns of conflict in geographic articles. This probably indicates the less significant role that specific editors and arguments play in these larger encyclopaedias.

Ultimately, by visualizing the geography of conflict in Wikipedia, we're able to see both topics that appear to have cross-linguistic resonance (e.g. the Arab-Israeli conflict) and those of more narrow interest, such as the Islas Malvinas/Falkland Islands article in the Spanish Wikipedia.

These maps therefore offer a window into not just the topics that different language communities are interested in, but also the topics that seem worth fighting about.



To read more about conflict and Wikipedia:


Yasseri, Taha, Spoerri, Anselm, Graham, Mark and Kertész, János (2014) The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press. Available at SSRN.

Graham, M., M. Zook, and A. Boulton. 2012. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x

Graham, Mark, The Virtual Dimension (2013). Global City Challenges: Debating a Concept, Improving the Practice, M. Acuto and W. Steele. Available at SSRN: http://ssrn.com/abstract=2212824

Yasseri, T., Sumi, R., Rung, A., Kornai, A., and Kertész, J. (2012) Dynamics of conflicts in Wikipedia. PLoS ONE 7(6): e38869.

Török, J., Iñiguez, G., Yasseri, T., San Miguel, M., Kaski, K., and Kertész, J. (2013) Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment. Physical Review Letters 110 (8).

7 comments:

Jeremy said...

I wonder how the page on the Peters projection would rate under your scheme? Or, more generally, what are the most controversial topics in geography, philosophy, etc.?

Mark Graham said...

You can find all the scores here (http://wwm.phy.bme.hu/) (Peters projection score is zero). We didn't use geography or philosophy as categories (we probably should have), but the data are open in case you want to dive in?

metasonix said...

Editwarring by pro-Israel extremists is one of Wikipedia's worst problems. The scope of it is staggering, and has been little studied to date.

How do I know this? I’m co-writing a book about the history and development of Wikipedia. And the material we’ve found doesn’t bode well for its users, nor for its future. We keep our notes for the book on a private wiki (yes, MediaWiki does have good, legitimate applications, I simply don’t think Wikipedia is one of them):
http://www.logicmuseum.com/x/index.php?title=Main_Page

If you'd like to know more, come to the Wikipediocracy forum. The regulars will cheerfully tell you horrifying things about Wikipedia.
http://wikipediocracy.com/forum/

SJ said...

A lovely concept, and mapping tool. Thank you for sharing this. Speaking of projections: how did you choose which map of the world to use?

Mark Graham said...

Thanks for the comment. It uses the Robinson projection (which I think works well for choropleth maps of the world).

LorenAmelang said...

What does it mean that there are conflict dots, even a big red 5MM one, in Antarctica? And little white ones in random spots in the oceans? And seemingly along the equator and prime meridian? And in some maps entire countries are white... Is this explained somewhere I'm not looking?

Elitre said...

The grand total of the articles is not right. Please see https://bugzilla.wikimedia.org/show_bug.cgi?id=50556 .