2012-07-25 08:05:29
By Daniel Nasaw BBC News Magazine, Washington
Wikipedia is written and maintained by tens of thousands of volunteers across
the world. They, in turn, are assisted by hundreds of "bots" - autonomous
computer programmes that keep the encyclopaedia running.
"Penis is the male sex organ," the Wikipedia page in question read.
While that statement is undeniably true and thus may merit inclusion in
Wikipedia, it belongs nowhere in the site's article on national supreme courts
and their legal roles.
When an anonymous Wikipedia reader in South Carolina offered that contribution
to the globally popular online encyclopaedia last week, it took just seconds
for the blemish to be discovered and deleted.
The vandalism was caught not by a reader, but by a simple artificial
intelligence programme called a bot - short for robot.
Virtually invisible
ClueBot NG, as the bot is known, resides on a computer from which it sallies
forth into the vast encyclopaedia to detect and clean up vandalism almost as
soon as it occurs.
Wikipedia by the numbers
English Wikipedia:
4,005,000 articles
If printed and bound, it would contain more than 1,700 volumes
32,760 contributors making more than five edits per month, about 10-15% of whom
are women
More than 700 active bots
All Wikipedias:
22.8m articles in all languages
285 language editions
Smallest edition is Kashmiri, with 131 articles
Source: Wikipedia
It is one of several hundred bots patrolling Wikipedia at any given time. Its
role in repairing the Supreme Court article illustrates how bots have quietly
become an indispensable - if virtually invisible - part of the Wikipedia
project.
"Wikipedia would be a shambles without bots," a Wikipedia administrator known
on the site as Hersfold writes in an email.
English Wikipedia alone surpassed four million articles this month. It contains
an estimated 2.5 billion words, equivalent to millions of pages, and it is 50
times larger than the Encyclopaedia Britannica.
Wikipedia is maintained across all languages by tens of thousands of editors -
about 77,000 of whom make more than five edits a month.
But the project is so vast, and its maintenance so labour-intensive, that it
defies the capability of its human administrators and editors to keep it in
order.
Zapping wiki-vandals
That is where the bots come in.
"We had a joke that one day all the bots should go on strike just to make
everyone appreciate how much work they do," says Chris Grant, a 19-year-old
student in Perth, Australia who is on the Wikipedia committee that supervises
the bots.
One small step for a bot...
One of the first lines of text written by a bot in Wikipedia was composed on 9
December 2002 by rambot. The masterpiece:
"Autaugaville is a town located in Autauga County, Alabama. As of the 2000
census, the population of the town is 820."
"The site would demand much more work from all of us and the editor burnout
rate would be much higher."
The bots perform a wide range of editorial and administrative tasks that are
tedious, repetitive and time-consuming but vital.
They delete vandalism and foul language, organise and catalogue entries, and
handle the reams of behind-the-scenes work that keep the encyclopaedia running
smoothly and efficiently and keep its appearance neat and uniform in style.
In brick-and-mortar library terms, bots are akin to the students who shelve
books, move stacks from one range to another, affix bar codes to book spines
and perform other grunt tasks that allow the trained librarians to concentrate
on acquisitions and policy.
Can bots write?
"Wikipedia has just grown so much that I don't know how well people would
handle it if all the bots went away," says Brad Jorsch, a computer programmer
in North Carolina who runs a bot that tracks the tags reminding editors to add
citations to articles.
What do they do?
"Interwiki" bots link articles on the same subject in different languages
Flag potential copyright violations and other irregularities for human review
Add dates to "cleanup" tags so human editors know what needs attention
Add articles to category lists, and lists of categories to articles
Format and repair citations and references
Compare ISBN numbers
Flag images that need more licensing details
Behind the scenes:
Maintain Wikipedia archives
Handle evidence in arbitration and administrative matters
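As a rough illustration of the tag-dating chore listed above, here is a minimal Python sketch. It assumes the common wiki convention of writing a maintenance tag as {{citation needed}} and dating it with a |date=Month Year parameter. The function name date_tags, the short list of tag names and the MONTHS table are hypothetical, chosen for this example only; this is not the code of any bot named in the article.

```python
import re
from datetime import date

# Month names spelled out so the output does not depend on the system locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical, deliberately short list of tags; a real bot would track many more.
# The pattern only matches tags with no parameters, so dated tags are left alone.
UNDATED_TAG = re.compile(r"\{\{\s*(citation needed|cleanup)\s*\}\}", re.IGNORECASE)


def date_tags(wikitext: str, today: date) -> str:
    """Add a |date=Month Year parameter to every undated maintenance tag."""
    stamp = f"{MONTHS[today.month - 1]} {today.year}"
    return UNDATED_TAG.sub(
        lambda m: "{{" + m.group(1) + "|date=" + stamp + "}}", wikitext
    )


if __name__ == "__main__":
    text = "Bots are useful.{{citation needed}}"
    print(date_tags(text, date(2012, 7, 25)))
    # prints: Bots are useful.{{citation needed|date=July 2012}}
```

Once a tag carries a date, human editors can sort the backlog by age, which is the point of the chore.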
Bots have been around almost as long as Wikipedia itself.
The site was founded in 2001, and the next year a bot called rambot created
about 30,000 articles - at a rate of thousands per day - on individual towns in
the US.
The bot pulled data directly out of US Census tables. The articles read as if
they had been written by a robot. They were short and formulaic and contained
little more than strings of demographic statistics.
But once they had been created, human editors took over and filled out the
entries with historical details, local governance information, and tourist
attractions.
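To see why such entries read as formulaic, consider a minimal Python sketch of template-driven stub writing. It is an illustration only, not rambot's actual code: the CensusRow fields, the TEMPLATE string and the render_stub function are assumptions made for this example, with the wording modelled on the Autaugaville sentence quoted in this article.

```python
from dataclasses import dataclass


@dataclass
class CensusRow:
    """One hypothetical row of census data for a populated place."""
    name: str
    kind: str        # e.g. "town" or "city"
    county: str
    state: str
    population: int


# Fixed sentence pattern; only the bracketed fields change from place to place.
TEMPLATE = (
    "{name} is a {kind} located in {county}, {state}. "
    "As of the 2000 census, the population of the {kind} is {population}."
)


def render_stub(row: CensusRow) -> str:
    """Fill the template with one row of data to produce a stub article."""
    return TEMPLATE.format(
        name=row.name,
        kind=row.kind,
        county=row.county,
        state=row.state,
        population=row.population,
    )


if __name__ == "__main__":
    row = CensusRow("Autaugaville", "town", "Autauga County", "Alabama", 820)
    print(render_stub(row))
    # prints: Autaugaville is a town located in Autauga County, Alabama.
    #         As of the 2000 census, the population of the town is 820.
    #         (one line in the actual output; wrapped here for readability)
```

Run over a table with tens of thousands of rows, the same few lines of code yield tens of thousands of near-identical stubs, which is why the approach is fast and also why the results contain little beyond statistics.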
In 2008, another bot created thousands of tiny articles about asteroids,
pulling a few items of data for each one from an online Nasa database.
Today, the Wikipedia community remains divided on the value of bot-written
entries. Some administrators say a stub of an article listing only a few points
of data is of little value; others say any new content is good.
Rogue bot fears
The upshot of the disagreement is that bots are no longer permitted to write
whole articles. But their ability to perform rote maintenance frees up human
editors to do research, write entries and check one another's work to ensure
accuracy.
Can bots replace human writers?
These days bots are typically forbidden from writing their own articles and
from other writerly tasks like sub-editing. Here's why:
Regional differences confound them: The BBC writes "flavour", for example,
while the Associated Press uses the American "flavor"
English grammar is too nuanced to be automated
Bots cannot do research, in the sense of seeking and synthesising information
to support a thesis
"I don't think people realise how much maintenance and meta work goes on in
Wikipedia," says Grant.
Some administrators fear a renegade bot will one day inflict catastrophic
damage on the encyclopaedia. Think Skynet in the Terminator films.
Those fears are unfounded, says Grant.
For one, a bot is not like an automobile: if part of it fails while in
operation, the bot will shut down rather than careen into something.
"You'd have to have someone actually programme the bot to go crazy and delete
everything," Grant says.
Bots with the rights to delete pages, block editors and take other drastic
actions could only be run by editors already entrusted with administrative
privileges, Grant says.
The bots do make mistakes, however, when they encounter a circumstance their
programming cannot account for. ClueBot NG, the anti-vandalism bot, has a small
rate of false positives - edits it mistakes for vandalism, but which are in
fact legitimate.
Since Wikipedia closely tracks edits, however, mistakes can be repaired almost
as quickly as they happen, administrators say.
Human writers need not fear they will one day be replaced by bots, bot masters
say.
"It takes human judgement to write an article or proof an article or even clean
up grammar and spelling," says Jorsch.