How a Malagasy Teenager Created the World’s Second-Largest Dictionary

by Cyrus Multhauf

Visit the home page of Wiktionary, a sister project of Wikipedia that hosts user-generated dictionaries in over six thousand languages, and you will find something strange. In a chart organizing the various dictionaries by size, two have the distinction of containing over five million entries. The first, at 6.1 million, is the English dictionary. Coming second, at 5.9 million, is the dictionary for the Malagasy language.

To anyone familiar with Malagasy, this should come as a surprise. Malagasy is Madagascar’s official language, and its speakers number around 25 million. About a quarter of the population of Madagascar, an island country off the southeast coast of Africa, is illiterate. Moreover, only 10% of the population has the internet access needed to edit an online dictionary. English, on the other hand, has nearly one billion speakers, a large share of whom live in developed countries with generous computer access.

Given this, a serious question looms: how did the Malagasy Wiktionary come to be so large? Glancing at the Wiktionary’s edit history, one can see that virtually all changes and additions are made by bots. And almost all of these bots were created by a single user.

Across the Malagasy Wiktionary – and some related blogs – many users mention one individual, a serial editor by the username of Jagwar. On Jagwar’s own talk page, and on threads related to many of his edits, angry moderators can be seen questioning the volume, provenance, and reliability of his additions to Wiktionary.

If these scattered sources are to be believed, Jagwar is responsible for the vast majority of edits and additions to the Malagasy Wiktionary. Curious, NEEMblog reached out to Jagwar, who was happy to provide a full explanation of how he built a dictionary that is, by some measures, the second-largest ever compiled.

Malagasy, like many indigenous languages in the Global South, has few formal language resources. When Jagwar, who was born in Madagascar, discovered Wiktionary as a teenager, he saw an opportunity to correct this.

“What motivated me to work on the Malagasy Wiktionary was that in October 2009 — when back from vacation in Madagascar — I found a largely neglected website containing 110 words here and there. Navigation through content was difficult,” Jagwar told NEEMblog. “On the other hand, I had the French and English Wiktionaries which each had more Malagasy words than the entire Malagasy Wiktionary. [That was] the trigger to start working on the Malagasy Wiktionary. In 2009, [I] didn’t know much programming at that time, so I did everything by hand, at least in the beginning.”

Over time, Jagwar’s coding skills grew sharper. With maniacal perseverance, he programmed a cadre of bots to automatically import entries from other online Malagasy dictionaries. This created the largest-ever Malagasy dictionary. And that was only the beginning.

Each of Wiktionary’s constituent dictionaries contains words not in that dictionary’s language. The English dictionary will have Spanish words defined in English, for example. By creating bots to locate Malagasy words in other Wiktionaries and automatically translate them back into Malagasy’s share of the website, Jagwar expanded the number of entries into the millions.

It was not easy. Like many digital-era disruptors, he was often on the receiving end of copyright notices. Angry emails berated him for ignoring the website’s standards of quality. But after much persistence and a few improvements to his bots, most of Jagwar’s work was accepted by Wiktionary’s community of editors, and his contributions stood.

Dictionaries are often subject to fierce debate. The academic authorities that compile traditional English-language lexicons like the Oxford English Dictionary are routinely described as snobby prescriptivists, a label for those who maintain strict boundaries between correct and incorrect usage in language. But in recent years, mainstream dictionaries have taken a more descriptivist stance. Last year, the OED welcomed the term “whatevs” into its pantheon of words.

Accompanying, and perhaps encouraging, these recent changes has been a deluge of online dictionaries. Websites like Urban Dictionary and Wiktionary put an authority once reserved for the educated elite into the hands of ordinary computer users. As a result, online dictionaries tend to have broad mandates. The English Wiktionary hosts slang, acronyms, technical terms, and words from other languages. If a word fits under the website’s generous “common use” policy, it is included.

Linguistic purists will be skeptical of this laissez-faire attitude. Just as fake news is accused of warping the judgement of voters across the globe, so could lazily constructed dictionaries poison the quality of writing and literature. But for people like Jagwar, new rules present new opportunities. Loose guidelines enabled him to create a valuable tool, one that will assist countless writers, teachers, and language-learners now and into the future.

As Madagascar’s population becomes more educated, more people will enter jobs requiring precise speaking and writing, such as lawyering, consultancy, and journalism. Logging on to Wiktionary for a quick reference, they may offer its obscure creator a word or two of thanks.

Cyrus Multhauf is an itinerant wannabe journalist currently investigating political selfies, the 1975 Australian constitutional crisis (google it), and the best way to eat a mango. If you have a job, please give it to him, as he is likely to fall into a life of petty crime without gainful employment.

You can find him on Twitter: @cwmulth, Instagram: cy_clops_, and on his website: verbatimjournal.com

If you would like to submit your own article, email a final draft to neememes [at] gmail [dot] com
