Analyze Twitter across languages and countries

Telecommunications technology has made our world small. Not so long ago a long-distance phone call with the US made you rather special, but today you could not even impress your grandma with one.

Thanks to the internet and smartphones our world has become even smaller. You can easily have a Skype chat with someone in Japan, or send messages back and forth via Twitter with South Africa, without any additional cost.

The Web is a multilingual place and the percentage of English is declining

All this chatter holds information that – when publicly published – can be analyzed. However, one issue remains that technology hasn’t been able to solve. We all speak different languages. We can only chat, tweet and talk with each other if we speak and understand the same language. And the Web is multilingual.

About half of the messages on Twitter are not in English, and if we ignore them we ignore half the Twitter world. According to Semiocast, the top 5 languages on Twitter are English, Japanese, Portuguese, Malay and Spanish. These account for about 83% of total tweets. All other languages make up the remaining 17% (see screenshot below from Semiocast [1]).

English is by far the most used language on Twitter. At the beginning of 2009 about two thirds of tweets were in English, so this share has declined. As Twitter expands to other countries, the percentage of English is likely to drop even further.

Another study, published in 2011 by Hong et al. [2], was based on far more tweets than the Semiocast study (62 million versus 2.8 million) and identified 104 different languages, of which English accounted for 51% of the tweets. The top 10 languages accounted for 95.6% of all tweets.
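To make the arithmetic behind figures like "the top 10 languages account for 95.6%" concrete, here is a minimal sketch of how such shares could be computed. It assumes every tweet has already been labeled with a language code by some language-identification step; the function names and the sample list are made up for illustration and are not data from either study.

```python
from collections import Counter


def language_shares(langs):
    """Return (language, share) pairs, largest share first."""
    counts = Counter(langs)
    total = sum(counts.values())
    return [(lang, n / total) for lang, n in counts.most_common()]


def top_n_coverage(shares, n):
    """Sum the shares of the n most used languages."""
    return sum(share for _, share in shares[:n])


# Illustrative sample only: ten tweets with assumed language labels.
sample = ["en"] * 5 + ["ja"] * 2 + ["pt"] * 2 + ["nl"]
shares = language_shares(sample)
coverage = top_n_coverage(shares, 3)
```

With real data you would feed in the language label of every collected tweet and read off how many languages you need to support to cover, say, 95% of the conversation.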

Understand how consumers in multiple countries feel about your company and your competitors across multiple languages

If you want to extract sentiment or buzz from Twitter, you’ll have to use a tool that can handle a multitude of languages, not just English. However, you don’t need to be able to translate every exceptionally rare language on Twitter.

Now suppose you’re a multinational, or a small company with ambitions to seek new ventures abroad. Your market research will then have to be multilingual. To be able to compare the analysis results across countries you’ll have to translate each language into one standard language, such as English. What you don’t want is a separate analysis tool for each country, each with its own login, since you’ll never be able to compare results.

With BuzzTalk, 33 languages from 100 different countries are translated into English using Systran translation technology. Each blog post or news article is stored in both its original language and the English translation. This way you can analyze and compare across countries and across languages.
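The idea of keeping both the original text and its English translation can be sketched as a small data structure. This is only an illustration under assumptions: the `Post` class, its field names and the `mentions_by_country` helper are hypothetical and do not describe BuzzTalk's actual storage or the Systran API, and the translations are filled in by hand rather than produced by a translation service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record: one post with its original and English text."""
    country: str
    language: str
    original_text: str
    english_text: str


def mentions_by_country(posts, keyword):
    """Count posts per country whose English text mentions the keyword."""
    counts = defaultdict(int)
    needle = keyword.lower()
    for post in posts:
        if needle in post.english_text.lower():
            counts[post.country] += 1
    return dict(counts)


# Hand-written examples; a real pipeline would fill english_text automatically.
posts = [
    Post("NL", "nl", "De nieuwe telefoon is geweldig", "The new phone is great"),
    Post("DE", "de", "Das neue Telefon ist zu teuer", "The new phone is too expensive"),
    Post("DE", "de", "Schönes Wetter heute", "Nice weather today"),
    Post("US", "en", "Loving the new phone", "Loving the new phone"),
]
result = mentions_by_country(posts, "phone")
```

Because every post carries the same English field, one query works for all countries, while the original text stays available for anyone who wants to check the translation.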

Don’t limit yourself to one country or one language when your company is ambitious. Make sure you choose a tool that is able to analyze Twitter across languages.


  1. Half of messages on Twitter are not in English, Japanese is the second most used language. Semiocast press release 2010 [pdf].
  2. Language matters in Twitter: A large scale study. Lichan Hong et al. Association for the Advancement of Artificial Intelligence, 2011 [pdf].



