Meet the 'bots' that edit Wikipedia

2012-07-25 08:05:29

By Daniel Nasaw BBC News Magazine, Washington

Wikipedia is written and maintained by tens of thousands of volunteers across the world. Those, in turn, are assisted by hundreds of "bots" - autonomous computer programmes that keep the encyclopaedia running.

"Penis is the male sex organ," the Wikipedia page in question read.

While that statement is undeniably true and thus may merit inclusion in Wikipedia, it belongs nowhere in the site's article on national supreme courts and their legal roles.

When an anonymous Wikipedia reader in South Carolina offered that contribution to the globally popular online encyclopaedia last week, it took just seconds for the blemish to be discovered and deleted.

The vandalism was caught not by a reader, but by a simple artificial intelligence programme called a bot - short for robot.

Virtually invisible

ClueBot NG, as the bot is known, resides on a computer from which it sallies forth into the vast encyclopaedia to detect and clean up vandalism almost as soon as it occurs.

It is one of several hundred bots patrolling Wikipedia at any given time. Its role in repairing the Supreme Court article illustrates how bots have quietly become an indispensable - if virtually invisible - part of the Wikipedia project.

Wikipedia by the numbers

English Wikipedia:

4,005,000 articles

If printed and bound, it would contain more than 1,700 volumes

32,760 contributors making more than five edits per month, about 10-15% of whom are women

More than 700 active bots

All Wikipedias:

22.8m articles in all languages

285 language editions

Smallest edition is Kashmiri, with 131 articles

Source: Wikipedia

"Wikipedia would be a shambles without bots," a Wikipedia administrator known on the site as Hersfold writes in an email.

English Wikipedia alone surpassed four million articles this month. It contains an estimated 2.5 billion words, equivalent to millions of pages, and it is 50 times larger than the Encyclopaedia Britannica.

Wikipedia is maintained across all languages by tens of thousands of editors - about 77,000 of whom make more than five edits a month.

But the project is so vast, and its maintenance so labour-intensive, that it defies the capability of its human administrators and editors to keep it in order.

Zapping wiki-vandals

That is where the bots come in.

"We had a joke that one day all the bots should go on strike just to make everyone appreciate how much work they do," says Chris Grant, a 19-year-old student in Perth, Australia, who is on the Wikipedia committee that supervises the bots.

"The site would demand much more work from all of us and the editor burnout rate would be much higher."

One small step for a bot...

One of the first lines of text written by a bot in Wikipedia was composed on 9 December 2002 by rambot. The masterpiece:

"Autaugaville is a town located in Autauga County, Alabama. As of the 2000 census, the population of the town is 820."

The bots perform a wide range of editorial and administrative tasks that are tedious, repetitive and time-consuming but vital.

They delete vandalism and foul language, organise and catalogue entries, and handle the reams of behind-the-scenes work that keep the encyclopaedia running smoothly and efficiently and keep its appearance neat and uniform in style.

In brick-and-mortar library terms, bots are akin to the students who shelve books, move stacks from one range to another, affix bar codes to book spines and perform other grunt tasks that allow the trained librarians to concentrate on acquisitions and policy.

Can bots write?

"Wikipedia has just grown so much that I don't know how well people would handle it if all the bots went away," says Brad Jorsch, a computer programmer in North Carolina who runs a bot that tracks the tags reminding editors to add citations to articles.

What do they do?

"Interwiki" bots link articles on the same subject in different languages

Flag potential copyright violations and other irregularities for human review

Add dates to "cleanup" tags so human editors know what needs attention

Add articles to category lists, and lists of categories to articles

Format and repair citations and references

Compare ISBN numbers

Flag images that need more licensing details

Behind the scenes:

Maintain Wikipedia archives

Handle evidence in arbitration and administrative matters
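To give a flavour of how mechanical these chores are, here is a minimal Python sketch of one task from the list above: adding a date to an undated "cleanup" tag. It is an illustration only, not the code any real Wikipedia bot runs. The tag names, the `|date=` parameter format and the function name `date_cleanup_tags` are assumptions made for the example.

```python
import re

# Assumed, illustrative set of cleanup tag names; a real bot would
# work from a much longer, community-maintained list.
CLEANUP_TAGS = ("citation needed", "cleanup")

# Matches a tag that carries no parameters at all, e.g. {{citation needed}}.
# Tags that already have a parameter such as |date=... do not match.
_UNDATED = re.compile(
    r"\{\{\s*(" + "|".join(re.escape(t) for t in CLEANUP_TAGS) + r")\s*\}\}",
    re.IGNORECASE,
)


def date_cleanup_tags(wikitext: str, month_year: str) -> str:
    """Return wikitext with a date parameter added to each undated tag."""
    return _UNDATED.sub(
        lambda m: "{{" + m.group(1) + "|date=" + month_year + "}}",
        wikitext,
    )


print(date_cleanup_tags("The moon is made of cheese.{{citation needed}}", "July 2012"))
```

A tag that already carries a date is left untouched, so the function can be run over the same page repeatedly without piling up parameters.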

Bots have been around almost as long as Wikipedia itself.

The site was founded in 2001, and the next year, one called rambot created about 30,000 articles - at a rate of thousands per day - on individual towns in the US.

The bot pulled data directly out of US Census tables. The articles read as if they had been written by a robot. They were short and formulaic and contained little more than strings of demographic statistics.

But once they had been created, human editors took over and filled out the entries with historical details, local governance information, and tourist attractions.
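The formulaic style of those early entries is easy to picture as a fill-in-the-blanks template. The Python sketch below reproduces the Autaugaville sentence from a single record. It is not rambot's actual code: the template wording is modelled on that one sentence, and the field names and the `render_stub` function are hypothetical.

```python
from string import Template

# Hypothetical template modelled on the rambot sentence quoted in this article.
STUB = Template(
    "$name is a $kind located in $county, $state. "
    "As of the $year census, the population of the $kind is $population."
)


def render_stub(record: dict) -> str:
    """Fill the template from one row of (assumed) census data."""
    return STUB.substitute(record)


# Field names are illustrative; the figures are those given in the article.
autaugaville = {
    "name": "Autaugaville",
    "kind": "town",
    "county": "Autauga County",
    "state": "Alabama",
    "year": 2000,
    "population": 820,
}

print(render_stub(autaugaville))
```

Feeding a whole table of such records through the same function is what makes it possible to produce thousands of near-identical entries in a day, and also why they read as if written by a robot.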

In 2008, another bot created thousands of tiny articles about asteroids, pulling a few items of data for each one from an online Nasa database.

Today, the Wikipedia community remains divided on the value of bot-written entries. Some administrators say a stub of an article listing only a few points of data is of little value; others say any new content is good.

Rogue bot fears

The upshot of the disagreement is that bots are no longer permitted to write whole articles. But their ability to perform rote maintenance frees up human editors to do research, write entries and check one another's work to ensure accuracy.

Can bots replace human writers?


These days bots are typically forbidden from writing their own articles and from other writerly tasks like sub-editing. Here's why:

Regional differences confound them: The BBC writes "flavour", for example, while the Associated Press uses the American "flavor"

English grammar is too nuanced to be automated

Bots cannot do research, in the sense of seeking and synthesising information to support a thesis

"I don't think people realise how much maintenance and meta work goes on in Wikipedia," says Grant.

Some administrators fear a renegade bot will one day inflict catastrophic damage on the encyclopaedia. Think Skynet in the Terminator films.

Those fears are unfounded, says Grant.

For one, a bot is not like an automobile: If a part fails while in operation it will shut down rather than careen into something.

"You'd have to have someone actually programme the bot to go crazy and delete everything," Grant says.

Bots with the rights to delete pages, block editors and take other drastic actions could only be run by editors already entrusted with administrative privileges, Grant says.

The bots do make mistakes, however, if they encounter a new circumstance their programming cannot account for. ClueBot NG, the anti-vandalism bot, has a small rate of false positives - edits it mistakes for vandalism, but which are in fact legitimate.

Since Wikipedia closely tracks edits, however, mistakes can be repaired almost as quickly as they happened, administrators say.

Human writers need not fear they will one day be replaced by bots, bot masters say.

"It takes human judgement to write an article or proof an article or even clean up grammar and spelling," says Jorsch.