Wiki learns a lesson in Bengali

By OUR BUREAU With inputs from New York Times News Service, Sudeshna Banerjee and Chandrima S. Bhattacharya
  • Published 14.07.10

July 13: Wikipedia, the user-edited online encyclopaedia, has realised that its future lies in languages such as Bengali but the symbol of the triumph of amateurism is also finding out that Bengalis cannot be treated as amateurs.

After years of spectacular growth, frequent derision and bitter controversy, Wikipedia is still growing, but less quickly. The Wikimedia Foundation, the non-profit organisation which operates Wikipedias in more than 250 languages, has come up with a solution: target the under-served populations of the globe.

The focus will be on the area that Sue Gardner, the foundation’s executive director, calls the “Global South” — India, South America, West Asia. The foundation is hoping to raise more than $20 million in donations for the mission.

A new board member from India, Bishakha Datta, a filmmaker and advocate for women’s issues, has been appointed. She has little familiarity with Wikipedia but was chosen for her experience in running a non-profit organisation in India.

Her inclusion also signals the foundation’s vision of an encyclopaedia that is truly comprehensive because its contributors are much more diverse in sex, age and region — as opposed to the heavily male, young and western group that edits it now.

The key reason the languages of the “Global South” are under-served is a lack of contributors. Anyone can create an account and contribute, but the community monitors a contributor’s work and, if the person proves trustworthy, he gradually gets administrator rights. Because search engines are useful only when there is an abundance of researched and reliable material, Google has paid translators and offered its translation toolkit to foster content in many languages that are under-represented on the Internet, including many in South Asia.

Enter the Bengali: the language and the stickler.

The Bengali Wikipedia took great umbrage and deleted Google-generated content because the translated material simply did not meet its standards.

Ragib Hasan, one of the six administrators of the Bengali Wikipedia, cited a sentence in an article called Polashir Judhho (Battle of Plassey). Siraj-ud-Daulah is spoken of thus: Akoshmik utsahe dhabomanotar karone oti sohojei shotru utpadon probon chhilen, probably a machine-kit translation intended to convey that the nawab had a tendency to make enemies because of his overzealous nature.

“Is this Bengali?” wonders Hasan, who is a research fellow at the department of computer science, Johns Hopkins University.

The Bengali encyclopaedia is a small project with 21,479 articles, ranked 72nd among the Wikipedias, and a core of 15 to 20 contributors. Compare that with the number of English articles: 33.48 lakh.

One of the site’s administrators, Belayet Hossain, said that did not mean the site was desperate for content. “They are doing an experiment on Wikipedia, and it is not a place for experimenting,” Hossain said of Google.

Such sentiments found an outlet last week when hundreds of contributors and supporters gathered in the Polish port city of Gdansk for a Wikimania conference to meet and greet, talk and listen and think about the future of Wikipedia. The utopianism at the core of the encyclopaedia was still present: the setting was a philharmonic hall on an island in the Motlawa River with the bent, largely idle cranes of the Gdansk shipyards, the birthplace of the anti-Soviet Solidarity movement, close on the horizon. The T-shirts said: “Free Knowledge in the City of Freedom.”

On the Bengali Wikipedia, Hasan told The Telegraph: “We started work in an organised manner from 2006. We are spread across Bangladesh, West Bengal and abroad. We generated 10,000 articles in the first six months but then we started concentrating on quality.”

Then Hasan and the others came across the problem with the translated articles.

Hasan said: “We encourage new articles and translations but the Google translator toolkits used by their translators resulted in hundreds of spelling mistakes in their contributions. We don’t expect a user’s first attempt in writing an article to be perfect. However, we do expect that when someone introduces a lot of spelling errors, they will also follow up and fix the problems.

“We requested them to fix these but our requests were not heeded. So, after some time we had to remove the articles. We have a very small number of volunteers and we are very careful not to introduce mistakes.”

But since many Bengalis take pride in their bilingual prowess, does Wiki need a Bengali version? Jayanta Nath, a civil engineer and another administrator based in Calcutta, said the English Wikipedia was nearing saturation point. “So regional languages are the way forward for growth.”

A. Ravishankar of the Tamil Wikipedia, who could not go to Poland but whose paper raised the translation question, said: “Think of the rural segment. They will need Net content in their mother tongue.”

In understated phrasing, Ravishankar’s paper explained the pitfalls of indiscriminate translation. For example, the Tamil entries covered “too many American pop stars and Hindi movies, which Tamils may not need as a priority.”

There was sloppiness in language and coding. The content was mostly not original, having been translated from English Wikipedia entries. Despite these concerns, Tamil Wikipedia plans on working with Google to continue the additions.

Datta, the new board member, said every language would face such problems. “If Bengali needs to improve, the community and its editors have to act. Wikipedia can help technically. But the community has to act,” she added.