

Half of the top 50 Wikipedia contributors are bots - lsb
http://slightlynew.blogspot.com/2011/05/vandalism-second-chances-and-bots.html

======
tptacek
Before everyone freaks out about this, the bots that we're talking about are
janitorial, doing things like adding dates to wikitags added by human editors.

You can see this right now; go find a page to add a (legit, please)
{{clarify}} or {{who}} or {{where}} tag to. Add it without a date, and wait 5
minutes; Smackbot will date the tag for you.

This is a good thing for _obvious reasons_.

Similarly, I'm guessing the "reversion" stat here is misleading. Reversion
isn't deletion; anything that moves content around the page is likely counted
as well. Write a graf or two on any well-trafficked WP page and you'll
probably see it moved around pretty quickly.

More importantly, reversion isn't permanent. On any popular article, someone
is bound to disagree with the merits of your edit. The normal way of things
when I was wasting time at WP was: edit, revert, talk page, re-post copy.

~~~
lsb
And that's genuinely valuable; it's helpful to know that someone's thought
that this has been unclear for 2 days versus 2 years. The compressed diff will
be quite small, but Wikipedia incrementally gets better.

Have you ever taken too much penicillin? You'll lose some of your good
bacteria and you'll get a fungus from opportunistic infections. (See
<http://en.wikipedia.org/wiki/Antibiotic_candidiasis> and
<http://en.wikipedia.org/wiki/Candidiasis> for examples.)

Same thing with Wikipedia. Bots are really helpful for regular maintenance.

------
tokenadult
I'd like to know how many are plagiarists

<http://en.wikipedia.org/wiki/Wikipedia:Plagiarism>

or persons who misrepresent their identities,

<http://en.wikipedia.org/wiki/Wikipedia:Sock_puppetry>

especially after some infamous incidents

[http://www.roughtype.com/archives/2007/03/head_wikipedian.ph...](http://www.roughtype.com/archives/2007/03/head_wikipedian.php)

[http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/20...](http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-11-01/Arbitration_report)

of editors with a lot of trust in the Wikipedia editing community who were
abusing that trust. I don't find it easy in the culture of Wikipedia to gain
trust by referring to reliable sources and checking what those sources say.

After edit: The phenomenon of edit wars

<http://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars>

goes a long way toward explaining why most Wikipedians have had contributions
reverted, as the submitted blog post reports.

~~~
lsb
As I was manually going through the top 50 list, "Darius Dhlomo, 920829" seems
like someone who was banned, based on significant copyright issues.

Also, plagiarism is difficult, because there are a lot of articles based on
the kernel of public-domain texts, for which citation is not required. And sock
puppetry is difficult, because some people keep bots that do productive work,
but if human beings have trouble sensing that, I imagine it'd be much more
code than the hundred lines of code that it took to process this the first
time:
[https://github.com/lsb/ugc-contributors/blob/master/mh-diffs...](https://github.com/lsb/ugc-contributors/blob/master/mh-diffs.rb)

------
crikli
<http://en.wikipedia.org/wiki/Wikipedia:Bots>

Learned something new today. I didn't have any idea Wikipedia was partially
controlled by bots. I for one welcome our new NLP powered overlords.

Based on that page, though, how are bots _contributing_? Editing, yes.
Creating, how?

~~~
ErrantX
The biggest editing bots tend to do template maintenance (so correcting
formatting issues and adding dates to templates). Bots simply do tedious work
(that would have to be done anyway) faster, hence a big edit count.

My bot (which notifies English Wikipedia of image deletions on Wikimedia
Commons) edits about three times more than I do each month (and I am somewhere
around #500 activity-wise).

The very top bot, ClueBot NG, does more than double the edits of anyone else per
month; it is an anti-vandalism bot that catches about 50% of vandalism on
Wikipedia each month. So pretty crucial :) Cydebot (#2) does some sort of
category maintenance (not sure what, never seen it before). SineBot (#5) goes
around and signs people's comments when they forget (handy beast).

Basically it is just infrastructure; but because of the nature of the Wiki it
is more visible than other forms of infrastructure :)

------
philthy
"Over three quarters of contributions from registered users are from someone
who's had a contribution reverted."

Well, this makes complete sense, since it is peer reviewed and the articles are
a living ecosystem built on top of prior submissions and versions.

~~~
lsb
I approached that question from a purity standpoint: if you only limited
Wikipedia to people who did things "right", where would you be? And the answer
turned out to be that you wouldn't be very far.

I think urban planners have found that neighborhood cohesion and community
watches are far better than gated communities and lawless knifefights outside,
and the online textual equivalent of community dynamics agrees with that.

~~~
tokenadult
_if you only limited Wikipedia to people who did things "right", where would
you be?_

Although it's notionally possible that a first editor who got something
"right" (factually correct based on reliable sources) was later reverted by
someone edit-warring to score a point. Sometimes a particular edit makes
Wikipedia worse rather than better. If NONE of my edits had ever been reverted
(several have been), I would think that I wasn't trying hard enough to check
sources and correct commonplace errors. However, it's not clear at the moment
whether the editors with the lowest percentages of reverted edits are those
who do the best edits, or just those whose prejudices (and topic choices)
avoid scrutiny by the mob of other Wikipedia amateur editors. I am conscious
of this issue because I was a professional editor and researcher to make my
living long before most members of the general public had ever heard of the
Internet, and because most wikipedians are so young and so devoid of editing
experience that they readily mistake good edits for bad.

 _I think urban planners have found that neighborhood cohesion and community
watches are far better than gated communities_

It would be interesting to see citations to research on this issue, as well as
the latest citations to research on forming online communities in which fact-
checking and encouragement of careful scholarship become the group ethos. I
live in a crime-free neighborhood with an annual block party and NO crime in a
typical year, not gated, with good community cohesion, but also isolated from
a lot of passers-by simply because it isn't a shortcut to anywhere else.
Wikipedia is so exposed to the outside world, including spamming advertisers
and propaganda agents of warring governments, that it may take more vigilance
to protect than just an informal consensus among volunteer amateur editors.

~~~
lsb
A guy I knew at the Media Lab said that Marvin Minsky tried to correct his
Wikipedia article, but someone reverted it. The IP address came from inside
the Media Lab, so it might very well have been him.

The small-community-far-from-everything is quite similar to Wikipedia in its
early years. Look at 2002-2005 and you'll see lots of anonymous Good
Samaritans contributing helpful content. (Look for R6 and R7 on
[http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia...](http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia-information.html)
to see a time series of anonymous contributions from 2003 to 2010.)

------
alok-g
While, strictly speaking, this is unrelated to this topic, something has been
bugging me that I would like to bring to the attention of HN readers:

BOUML is a free UML2 tool for C++, Java, Python and others. The developer,
Bruno Pages, has stopped its development due to some issue with Wikipedia
administrators.

I attempted to get more information from Bruno but he is upset enough to not
talk about it. The following is quoted verbatim from <http://bouml.free.fr/>
and is all I know:

"Due to the continuous license violations, attacks and insults from people of
wikipedia ( the worst of them were the administrators Bapti (commons)/ Bapti
(fr.wikipedia)/ Bapti (de.wikipedia) , Dereckson (commons)/ Dereckson
(fr.wikipedia)/ Dereckson (de.wikipedia) and Esby (commons)/ Esby
(fr.wikipedia) ), I have decided to stop work on Bouml except to fix bugs.
Bruno Pagès."

------
wbhart
Oh! The number of articles I have read lately where I have gone and looked up
actual books and discovered that the Wikipedia content is copied verbatim from
the books, yet with sentences that do not correlate strung together in a way
that results in totally incorrect information.

And then there is the fact that for ages the Jean-Claude Van Damme article
said he had killed 100 men with his bare hands {expletives deleted}.

Many of the mathematics articles contain appalling errors. You post about the
errors in the discussion page and the issue is totally ignored. There seems to
be no way to work with the community process to get things fixed.

I use Wikipedia a lot for things that don't matter, e.g. as a structured
skeleton for notes on a particular topic that I am writing for my own benefit
and understanding. But as a project, I think it has basically reached its
limits.

There's no way to move it significantly past what it is now.

~~~
xxpor
If there are significant errors, fix them yourself. There's no need to go to
the discussion page.

<http://en.wikipedia.org/wiki/Wikipedia:BOLD>

~~~
CJefferson
Except I find most of my changes get reverted, even if referenced or simply
deleting incorrect and unreferenced information. I don't have time to enter a
life-long battle keeping an article correct.

~~~
TikiTDO
Something does not compute there. If your changes are getting reverted,
obviously someone cares enough to notice, so it follows that they should care
enough to check the discussion. Are you sure you're not just getting hit by a
bot? Maybe you did not tag your changes properly? If anything, there might be
a problem that the admins should look at. I would like to see some of these
changes you are talking about.

~~~
woodall
My experience with Wikipedia has been hit or miss. Sometimes I'll find an
older article and fix it up a bit and it will stay. Other times I get my edit
reverted by an overzealous neckbeard who can't handle being wrong on the
internet. I don't have the time to battle these types of people. This is what
sank "Wikipedia is not a credible source" into my brain.

------
scythe
It'd be cool to see how this was calculated. Though considering the amount of
auto-spellcheck and similar that goes on, it's kind of surprising to see that
half of the top contributors are _human_.

~~~
davnola
The source is linked from the OP. Here it is:
[http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia...](http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia-information.html)

The author draws a wonderful parallel between bots in the Wikipedia ecosystem
and bacteria in the human body.

~~~
lsb
Thanks for your kind words! I was fascinated by that talk.

And if you want the code + data, you can just xargs
[https://github.com/lsb/ugc-contributors/blob/master/mh-diffs...](https://github.com/lsb/ugc-contributors/blob/master/mh-diffs.rb)
over all of your pages-meta-history.7z files from
<http://dumps.wikimedia.org/enwiki/20110317/> or get the sqlite database from
Infochimps from
[http://www.infochimps.com/datasets/entropy-per-revision-of-w...](http://www.infochimps.com/datasets/entropy-per-revision-of-wikipedia-pages-beginning-with-m)
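
For anyone who only wants the headline figure, here is a minimal Python sketch (not the mh-diffs.rb pipeline linked above) of how a bot share among the top contributors could be tallied. The function names, the toy edit list, and the name-based bot check are all assumptions for illustration; a real analysis would more likely rely on each account's bot flag than on its username.

```python
from collections import Counter


def top_contributors(edits, n=50):
    """Return the n usernames with the most edits, highest count first."""
    return [user for user, _ in Counter(edits).most_common(n)]


def looks_like_bot(username):
    # Hypothetical heuristic: treat any account with "bot" in its name as a bot.
    # This is crude (a human named "Abbott" would match), so it is only a stand-in.
    return "bot" in username.lower()


def bot_share(edits, n=50):
    """Fraction of the top-n contributors that look like bots."""
    top = top_contributors(edits, n)
    if not top:
        return 0.0
    return sum(looks_like_bot(user) for user in top) / len(top)


# Toy data: one username per edit (made up for this example).
edits = ["SmackBot"] * 5 + ["Cydebot"] * 4 + ["alice"] * 3 + ["bob"] * 2 + ["carol"]

print(bot_share(edits, n=4))
```

With the toy list, the top four accounts are two bots and two humans, so the printed share is one half.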

------
T_S_
The idea that a human and a program would use the same interface is a good one
if your goal is to allow the humans to focus on adding domain knowledge and
the machines to focus on the boring repetitive tasks. Consider it a design
pattern, not an evil plot.

------
basseq
The stat I found interesting is this one [citation needed]: while a majority
of edits are done by few editors, most of the content is generated by either
a) anonymous accounts or b) people who have contributed 2-3 times and haven't
returned since. The editors, like the bots, represent maintenance edits.

Keep that in mind the next time Jimmy Wales gives beaucoup credit to editors,
or some super-user gets power-happy.

------
retube
What is the motivation for this? Why would someone write a bot to add content
to wikipedia?

~~~
T-hawk
Also gray-to-black hat spam with links. Just like the content farms that
plague Google's results, they'll push in any low-quality content in hopes of
trolling some clicks to their ad- or malware-laden sites.

~~~
redthrowaway
These are pretty easy to spot, and they get banned very quickly.

------
csomar
I think these are bots that spammers use to add external links to Wikipedia.
I have seen some people doing it.

~~~
brazzy
I think most of those are probably officially endorsed bots that do things
like correct common misspellings, crosslink between different language
Wikipedias and do huge one-off tasks like category reorganizations.

