Wednesday, May 23, 2012

Adieu French: comparing English and French Wikipedias

The English and French Wikipedias are the world's first and third largest versions of the encyclopedia (containing 3.9 and 1.3 million articles respectively).   

I thought that it might be instructive to compare the geographic coverage of the two. Even though there is three times as much content in English as in French, one might assume that there are plenty of parts of the world in which people are more likely to annotate or augment space with French content.

The results are contained in the map below:

We ultimately see only a few countries in which there is more French content: France (of course), Belgium, Luxembourg, the Francophone parts of the Maghreb (Algeria, Morocco, and Tunisia), the DRC, Senegal, and surprisingly Bosnia, Montenegro, and Kosovo.

You would expect the first eight countries on the list to have more French content than English, but there seems to be no obvious reason why Bosnia, Montenegro, and Kosovo have more French-language information about them. Then again, there is not necessarily a reason why there should be more English-language content in every other country in which neither French nor English is the primary language spoken.

Also interesting is that much of the rest of the Francophone world has more English-language content than French. Madagascar, Haiti, Cameroon, Mali, etc. all have more written about them in English than in French.

What does this map tell us? We know that the number of Wikipedia articles about a place isn't necessarily a great proxy for broader social or cultural relationships and patterns (e.g. the heavy focus on Turkey in the Swahili Wikipedia). But perhaps these patterns of attention do still tell us something about the importance of English vs. French in some of these places. Rwanda, for instance, has more English-language content: a fact that reflects the country's shift into the Anglophone sphere.

Perhaps in much of the rest of the Francophone world we are also seeing a similar (although likely less-pronounced) shift towards use of English as a means of non-local communication and local representation to a broader audience.

I'd welcome any further thoughts or questions....

(for more information about this work, have a look at the other blog posts I've written about the geographies of Wikipedia)


Julien said...

On the French Wikipedia there is a very active editor who specializes in the Balkans. He created, iirc, 14k articles (cities, rivers, people…) on this topic, so that explains the focus. He also created articles about Serbia, but I guess the English wp is also quite well populated on this topic, so it is not visible on the map.

Did you guys weight the article counts, or did you compare absolute values? The naive weight, if one looked at the total number of articles in both Wikipedias, would be three. It could show other trends. But that weight wouldn't tell the whole story: once you've covered a topic pretty extensively, you don't grow as fast as when the topic is not covered at all (I see this intuitively as a log curve).
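To make the weighting idea concrete, here is a minimal Python sketch. The totals (3.9 and 1.3 million articles) come from the post; the per-country counts and the function name `leading_language` are invented purely for illustration and are not real data. It only shows how a naive weight of three could flip which language "wins" in a country; it does not model the log-curve effect mentioned above.

```python
# Totals quoted in the post.
EN_TOTAL = 3_900_000
FR_TOTAL = 1_300_000

# Naive weight: how many times larger the English Wikipedia is overall.
NAIVE_WEIGHT = EN_TOTAL / FR_TOTAL  # 3.0

# Hypothetical per-country article counts (made up, not from the map).
counts = {
    "CountryA": {"en": 900, "fr": 400},
    "CountryB": {"en": 3000, "fr": 500},
}


def leading_language(en, fr, weight=1.0):
    """Return 'en', 'fr', or 'tie' after scaling the French count by `weight`."""
    weighted_fr = fr * weight
    if en > weighted_fr:
        return "en"
    if weighted_fr > en:
        return "fr"
    return "tie"


for country, c in counts.items():
    absolute = leading_language(c["en"], c["fr"])
    weighted = leading_language(c["en"], c["fr"], NAIVE_WEIGHT)
    print(country, "absolute:", absolute, "weighted:", weighted)
```

With these made-up numbers, CountryA leans English in absolute terms but French once weighted, while CountryB leans English either way.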

Jacques said...

There may also be historical reasons for the French Wikipedia being more developed than the English one on the Balkans. I think it has to do with the role of France in the Yugoslav Wars.

1. France, with its then-President Chirac, is considered to have been at the forefront of the creation of the Rapid Reaction Force, one of the main instruments of the NATO military response. It led to the liberation of Sarajevo from the forces of Karadzic and allowed humanitarian aid to reach the city. Chirac was recently made an honorary citizen of Sarajevo for those reasons.

2. On the other hand, the French laissez-faire (and NATO's, but particularly from Europeans) regarding the Srebrenica massacre has been a very hot topic during the last decade. France has been at the centre since the commanding officer was a French general (Janvier) and France never conducted rigorous inquiries into the matter, unlike the Dutch.

3. The bombing of Sarajevo in 1992 is a very symbolic event for Europeans, since it was the first time since WWII that Europeans were bombing other Europeans. This is particularly true for French people, as France was at the centre of WWII and also a "friend" of Serbia since WWI (celebrated in Serbia, and notably Belgrade, by many memorial monuments, which were then trashed during the Yugoslav Wars).

bouchecl said...

It would have been more interesting to compare Wikipedias with a similar size (German vs French) rather than compare the huge English wiki with the (much smaller) French one.

I mainly contribute to the French WP (33k+ edits) but I will translate some of my work for an English speaking audience (I have ~4K edits in English).

The larger edition will benefit from the spillover effect of people like me who can write in English as a second language.

As a topic for further research, I would be interested to read a comparison of a multilingual country's coverage by region (province, canton, state) and language. Switzerland or Canada would be obvious targets for such research.

SammyDay said...

Maybe this map tells us... nothing. Why not?

Scott said...

It's also interesting to note that beyond the raw numbers there are often differences in what topics are written about in each language. The French edition still has many articles that have no English equivalent (I would guess even in some areas where English has a larger number of raw articles.)

Here's a map with German, Spanish, and Portuguese that excludes English. I didn't include French, but it wouldn't be so hard to add it.

For both of these maps, I think it would be useful in the future to use shades of the colors to indicate how great the difference is between the article counts (1 more English than French article vs. 1,000 more English than French articles about a country is a large difference not shown in any of the maps I've seen or made).
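As a rough illustration of the shading idea, here is a small Python sketch that turns the gap between two article counts into a language plus a shade level. The bin edges in `THRESHOLDS`, the function name `shade_level`, and the sample counts are all assumptions made up for this example; a real map would pick its bins from the actual data.

```python
import bisect

# Hypothetical bin edges for the absolute difference in article counts.
# Level 0: gap under 10, level 1: 10-99, level 2: 100-999, level 3: 1000+.
THRESHOLDS = [10, 100, 1000]


def shade_level(en, fr):
    """Return (language, level): which language leads and how dark to shade it."""
    diff = en - fr
    if diff == 0:
        return ("tie", 0)
    # bisect_right counts how many thresholds the gap has reached.
    level = bisect.bisect_right(THRESHOLDS, abs(diff))
    return ("en" if diff > 0 else "fr", level)


# Made-up counts: a gap of 1 and a gap of 1000 now get different shades.
print(shade_level(501, 500))   # English leads, lightest shade
print(shade_level(1500, 500))  # English leads, darkest shade
print(shade_level(100, 350))   # French leads, middle shade
```

Binning on the absolute gap keeps the legend easy to read; a ratio or log scale would be another reasonable choice when countries differ a lot in total coverage.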