Meet the 'bots' that edit Wikipedia

2012-07-25 08:05:29

By Daniel Nasaw, BBC News Magazine, Washington

Wikipedia is written and maintained by tens of thousands of volunteers across

the world. Those volunteers, in turn, are assisted by hundreds of "bots" - autonomous

computer programmes that keep the encyclopaedia running.

"Penis is the male sex organ," the Wikipedia page in question read.

While that statement is undeniably true and thus may merit inclusion in

Wikipedia, it belongs nowhere in the site's article on national supreme courts

and their legal roles.

When an anonymous Wikipedia reader in South Carolina offered that contribution

to the globally popular online encyclopaedia last week, it took just seconds

for the blemish to be discovered and deleted.

The vandalism was caught not by a reader, but by a simple artificial

intelligence programme called a bot - short for robot.

Virtually invisible

ClueBot NG, as the bot is known, resides on a computer from which it sallies

forth into the vast encyclopaedia to detect and clean up vandalism almost as

soon as it occurs.

Wikipedia by the numbers

English Wikipedia:

4,005,000 articles

If printed and bound, it would contain more than 1,700 volumes

32,760 contributors making more than five edits per month, about 10-15% of whom

are women

More than 700 active bots

All Wikipedias:

22.8m articles in all languages

285 language editions

Smallest edition is Kashmiri, with 131 articles

Source: Wikipedia

It is one of several hundred bots patrolling Wikipedia at any given time. Its

role in repairing the Supreme Court article illustrates how bots have quietly

become an indispensable - if virtually invisible - part of the Wikipedia

project.

"Wikipedia would be a shambles without bots," a Wikipedia administrator known

on the site as Hersfold writes in an email.

English Wikipedia alone surpassed four million articles this month. It contains

an estimated 2.5 billion words, equivalent to millions of pages, and it is 50

times larger than the Encyclopaedia Britannica.

Wikipedia is maintained across all languages by tens of thousands of editors -

about 77,000 of whom make more than five edits a month.

But the project is so vast, and its maintenance so labour-intensive, that it

defies the capability of its human administrators and editors to keep it in

order.

Zapping wiki-vandals

That is where the bots come in.

"We had a joke that one day all the bots should go on strike just to make

everyone appreciate how much work they do," says Chris Grant, a 19-year-old

student in Perth, Australia who is on the Wikipedia committee that supervises

the bots.

"The site would demand much more work from all of us and the editor burnout

rate would be much higher."

One small step for a bot...

One of the first lines of text written by a bot in Wikipedia was composed on 9

December 2002 by rambot. The masterpiece:

"Autaugaville is a town located in Autauga County, Alabama. As of the 2000

census, the population of the town is 820."

The bots perform a wide range of editorial and administrative tasks that are

tedious, repetitive and time-consuming but vital.

They delete vandalism and foul language, organise and catalogue entries, and

handle the reams of behind-the-scenes work that keep the encyclopaedia running

smoothly and efficiently and keep its appearance neat and uniform in style.

In brick-and-mortar library terms, bots are akin to the students who shelve

books, move stacks from one range to another, affix bar codes to book spines

and perform other grunt tasks that allow the trained librarians to concentrate

on acquisitions and policy.

Can bots write?

"Wikipedia has just grown so much that I don't know how well people would

handle it if all the bots went away," says Brad Jorsch, a computer programmer

in North Carolina who runs a bot that tracks the tags reminding editors to add

citations to articles.

What do they do?

"Interwiki" bots link articles on the same subject in different languages

Flag potential copyright violations and other irregularities for human review

Add dates to "cleanup" tags so human editors know what needs attention

Add articles to category lists, and lists of categories to articles

Format and repair citations and references

Compare ISBN numbers

Flag images that need more licensing details

Behind the scenes:

Maintain Wikipedia archives

Handle evidence in arbitration and administrative matters

Bots have been around almost as long as Wikipedia itself.

The site was founded in 2001, and the next year, one called rambot created

about 30,000 articles - at a rate of thousands per day - on individual towns in

the US.

The bot pulled data directly out of US Census tables. The articles read as if

they had been written by a robot. They were short and formulaic and contained

little more than strings of demographic statistics.
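
The formulaic style is easy to picture as a fill-in-the-blanks template. The sketch below is only an illustration of that idea, not rambot's actual code: the CensusRecord type, its field names and the render_stub function are hypothetical, and the one sample record is taken from the Autaugaville sentence quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class CensusRecord:
    """Hypothetical row of census data; field names are assumptions."""
    name: str
    place_type: str
    county: str
    state: str
    population: int


def render_stub(record: CensusRecord, census_year: int = 2000) -> str:
    """Fill a fixed sentence template with one record's values."""
    return (
        f"{record.name} is a {record.place_type} located in "
        f"{record.county}, {record.state}. "
        f"As of the {census_year} census, the population of the "
        f"{record.place_type} is {record.population}."
    )


if __name__ == "__main__":
    # Sample values come from the rambot sentence quoted in the article.
    autaugaville = CensusRecord(
        name="Autaugaville",
        place_type="town",
        county="Autauga County",
        state="Alabama",
        population=820,
    )
    print(render_stub(autaugaville))
```

Every article produced this way shares the same sentence skeleton, which is why the results read as if a robot had written them.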

But once they had been created, human editors took over and filled out the

entries with historical details, local governance information, and tourist

attractions.

In 2008, another bot created thousands of tiny articles about asteroids,

pulling a few items of data for each one from an online Nasa database.

Today, the Wikipedia community remains divided on the value of bot-written

entries. Some administrators say a stub of an article listing only a few points

of data is of little value; others say any new content is good.

Rogue bot fears

The upshot of the disagreement is that bots are no longer permitted to write whole

articles. But their ability to perform rote maintenance frees up human editors

to do research and write entries and check one another's work to ensure

accuracy.

Can bots replace human writers?

These days bots are typically forbidden from writing their own articles and

from other writerly tasks like sub-editing. Here's why:

Regional differences confound them: The BBC writes "flavour", for example,

while the Associated Press uses the American "flavor"

English grammar is too nuanced to be automated

Bots cannot do research, in the sense of seeking and synthesising information

to support a thesis

"I don't think people realise how much maintenance and meta work goes on in

Wikipedia," says Grant.

Some administrators fear a renegade bot will one day inflict catastrophic

damage on the encyclopaedia. Think Skynet in the Terminator films.

Those fears are unfounded, says Grant.

For one thing, a bot is not like an automobile: if a part fails while in operation, it

will shut down rather than careen into something.

"You'd have to have someone actually programme the bot to go crazy

and delete everything," Grant says.

Bots with the rights to delete pages, block editors and take other drastic

actions can only be run by editors already entrusted with administrative

privileges, Grant says.

The bots do make mistakes, however, if they encounter a new circumstance their

programming cannot account for. ClueBot NG, the anti-vandalism bot, has a small

rate of false positives - edits it mistakes for vandalism, but which are in

fact legitimate.

Since Wikipedia closely tracks edits, however, mistakes can be repaired almost

as quickly as they happen, administrators say.

Human writers need not fear they will one day be replaced by bots, bot masters

say.

"It takes human judgement to write an article or proof an article or even clean

up grammar and spelling," says Jorsch.