
The value of Tor and anonymous contributions to Wikipedia - blendergeek
https://blog.torproject.org/the-value-of-anonymous-contributions-wikipedia
======
commoner
If you are using Tor for security or privacy reasons, and you would like to
edit Wikipedia while using Tor, you can request permission to do so:

[https://en.wikipedia.org/wiki/Wikipedia:IP_block_exemption](https://en.wikipedia.org/wiki/Wikipedia:IP_block_exemption)

This requires a Wikipedia account and an email address (which can be used
exclusively for Wikipedia). Signing up for a Wikipedia account involves
providing a username and a password, but no personal information is needed.

The "Advice to users using Tor" page has more information:

[https://en.wikipedia.org/wiki/Wikipedia:Advice_to_users_usin...](https://en.wikipedia.org/wiki/Wikipedia:Advice_to_users_using_Tor)

~~~
77pt77
> This requires a Wikipedia account and an email address

It's borderline impossible nowadays to create an email address without a phone
number or a pre-existing email address.

Also almost impossible to do using Tor.

~~~
mkup
Last time I tried to create e-mail address via Tor, ProtonMail worked
flawlessly. No phone number required. They even have .onion domain available
exclusively via Tor.

But other e-mail providers like Gmail and Hotmail don't like Tor and ask the
user to provide a phone number, so you are partially right. The solution is
simply not to use these e-mail providers.

~~~
Funes-
>Last time I tried to create e-mail address via Tor, ProtonMail worked
flawlessly. No phone number required.

It must've been some time ago, then, because they now require an SMS (your
phone number), a donation, or another e-mail account.

~~~
TechBro8615
I don’t think that’s true for every exit node. Keep cycling your IP and you
may find one where it’s not required.

~~~
jandrese
Even if you get the account created it will tend to self-lock after a short
time and ask for SMS verification.

Email services that don't SMS verify on VPN or TOR endpoints usually find
themselves on anti-spam blacklists sooner rather than later. Spammers are
constantly on the lookout for email services that don't have trashed
reputations to get around Gmail and other providers' filters.

------
nullc
I burned myself out advocating for this some seven years ago:

[https://lists.wikimedia.org/pipermail/wikitech-l/2013-Decemb...](https://lists.wikimedia.org/pipermail/wikitech-l/2013-December/073785.html)

Unfortunately, the incentive structure at Wikipedia has challenges. Editors
suffer under every ounce of abuse but the cost of excluded contributions is
nearly invisible and not felt personally by anyone.

The result is that convincing people to take even small risks (of abuse) or
costs (of tech measures to mitigate abuse without compromising user privacy)
is extremely hard.

Hopefully this research will help shift the balance.

~~~
FabHK
Hope so. I nearly always use a VPN, and can't contribute to Wikipedia (even if
logged in) due to that. I've once requested an exemption, but it was not
granted.

So, they basically exclude anyone that's somewhat privacy conscious, or
frequently travels to weird jurisdictions that make use of a VPN advisable, or
lives in such jurisdictions. As you say, an invisible loss.

------
bawolff
I think this somewhat misses the point though - I don't think people are
worried about the average TOR user. The average tor user is probably just
fine. People are worried about the one person with an axe to grind making
everyone's life miserable who turns to tor after being blocked through normal
means.

Should people be worried about that? Idk, i think there are probably ways to
mitigate the risk of that at least somewhat. However the article didn't
address that concern, and you can't change hearts and minds if you don't talk
about what people are actually worried about.

~~~
Jap2-0
Exactly. With an IP, it can be pretty easy to stop someone from doing anything
- block one IP (maybe a small range for IPv6), and maybe another for their
phone, and then block account creation from those IPs (I'm a bit rusty on the
technical details of how that works - if it's something that happens
automatically when blocking an IP, or is separate, or I think I heard
something about doing it based on a cookie at one point). No such luck with
Tor.

~~~
nullc
The kind of dedicated habitual abuser bawolff is talking about will also use
an endless series of free/cheap VPSes and VPN services, at least the subset of
ones that might also use tor.

So yes, letting them get away with tor _some_ will let them in a little more,
but not a huge qualitative change to not allowing them to do so.

Not allowing the access, however, blocks a significant amount of legitimate
contributions too.

------
Hitton
Wikipedia has rather heavy-handed admins. I remember being caught in a /17 IP
range ban. It was apparently because of a single vandal whose ISP granted
dynamic IPs and used carrier-grade NAT. It caught tens of thousands of
households. Luckily the damage was not that great, because the ban was only on
English Wikipedia, but imho it was still overkill.
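
To put a number on how wide a /17 block is, here is a minimal sketch using Python's standard `ipaddress` module. The `10.0.0.0/17` range is a made-up example, not the range that was actually banned.

```python
import ipaddress

# Hypothetical /17 range, chosen only for illustration.
block = ipaddress.ip_network("10.0.0.0/17")

# A /17 leaves 32 - 17 = 15 host bits, so 2**15 addresses.
print(block.num_addresses)

# Membership test: is a given address caught by the range block?
print(ipaddress.ip_address("10.0.100.25") in block)
print(ipaddress.ip_address("10.0.200.25") in block)
```

With carrier-grade NAT, each of those addresses can sit in front of many households, which is how one range block can reach far more people than the address count alone suggests.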

------
jancsika
I'd be interested to know more about the nature of the actual vandalism and
abuse that came from Tor exit nodes prior to Wikipedia implementing the ban.
How long did it stay up before being reverted? How difficult was it to track,
revert, etc. compared to non-Tor-based vandalism? And how effective a tool was
IP-banning for non-Tor-based vandalism at the time?

Plus anything else I haven't thought about regarding the severity of the
vandalism during that time.

Any Wiki admins have first-hand experience?

Edit: clarifications

~~~
duskwuff
Not an administrator, but an onlooker during that period.

Before Wikipedia had any explicit policy re. Tor exit nodes, their IPs would
typically be blocked anyway -- either under longstanding policy regarding open
proxies, or as a result of spam/vandalism edits originating from the IP.
Automatically blocking all Tor exit nodes wasn't a huge change in practice; it
just meant that the process was automatic (so new exit nodes would be blocked
more quickly, and old exit node IPs would be unblocked automatically), and
that the block messages for users on those IPs became more informative.

------
jfengel
Given Wikipedia's demand for citations, is there really any advantage to
allowing anonymous edits? If somebody has privileged information, that may be
valuable knowledge, but it's not what Wikipedia was intended for. I'd expect
anything like that to be deleted as Original Research, or marked as Citation
Needed.

If the citation is public, a non-anonymous person could make the edit as
easily as an anonymous one. That's not to say it will necessarily be made, but
there doesn't seem to be a case that only one person could make the edit.
Allowing anonymous contributions does increase the work force to include
people who feel the need to be more secure, but they don't have access to
special information that makes them uniquely qualified.

~~~
nullc
Under your argument, why should Wikipedia even exist at all? If it's sufficient
that the information is "out there" -- well then it's all already out there.

The relevant expertise-- which sometimes includes privileged information-- is
part of what lets you know which _public_ information is valuable, relevant,
and worth the effort to bother including.

A user's reason for protecting their privacy may also have absolutely nothing
to do with their possession of any privileged knowledge. It's really
impossible to predict what the long term consequences of compromised privacy
are, and we know that any interaction online can be an invitation to abuse by
crazy people.

So, for example, as part of some discussion online I might find myself reading
an article on some venereal disease. While reading the article I might notice
some omissions or errors and decide to fix them. Later, my edits could end up
as part of a debate about the content of the article (even if my edits were
utterly unobjectionable) ... with an end result of this potentially
embarrassing subject turning up in search results about me, or being
discovered by a political opponent in an entirely unrelated debate a decade
later, and being pulled out of context just to smear me.

This isn't conjectural. I can speak to it personally: For example, over 13
years ago I got in an edit war on Wikipedia over some site policy thing about
users including copyright law violating images on their user pages. I was a
bit of a hothead about it and got myself blocked from editing for 24 hours. I
was appropriately chastised for being an idiot about how it was handled. All
the edits all ultimately went through. But I get regularly slandered by
abusive anonymous accounts about it that are mad at me about unrelated Bitcoin
debates (and can find literally nothing else negative to say about me). They
love to characterize it in various ways ("fired from wikipedia!"), divorce
it from the context, yank out completely inaccurate off the cuff comments from
other wikipedians made during the event (apparently some random troll was
mistaken for me for a little bit during discussions about the incident).

I would have been much better off contributing anonymously as a result. And
for the little personal benefit I got out of contributing, I would have been
better off not contributing at all rather than end up with this nonsense.

Is Wikipedia or the world really a better place where only people who either
fail to make the above calculation correctly or who expect some big personal
pay-off by contributing are left editing the site? I don't think so.

------
readhn
Totally makes sense. Fun fact: In the past I was involved in a company where
they implemented an anonymous feedback system. The feedback system did so well
.... that they shut it down in a month and never discussed the results. It
uncovered too many problems that nobody in the management wanted to address.
This company is on the verge of bankruptcy now, 5 years after those surveys..

When people are allowed to speak up freely - truth will come out quickly.
(Yes, you will get some noise too, but I'd rather adjust my signal to noise
ratio than just deal with meaningless noise all the time).

------
FirstLvR
I've said this everywhere... we all need to invest in Wikipedia, to make it
powerful, free and useful

it may not work as intended, if you are doing university research but ... for
general purpose we must have a general encyclopedia that actually works

~~~
kevin_thibedeau
Not going to happen so long as self-appointed kings decide what gets to stay.
Exhaustive list of Pokemon? Sure thing. "Non-notable" programming language?
Not worthy.

~~~
amatecha
Yep, I see so many pages get deleted due to "non-notability". Meanwhile you
see exhaustive content about temporarily-popular subjects like a complete plot
synopsis for an entire TV series, plus detailed information about every
location ever described in that series. Like this comprehensive list of all
the characters in the ReBoot series[0]. Hey, I liked the series at the time,
but come on. I can't help but feel a bit frustrated when valuable information
is deleted permanently, but content hundreds of times the size persists for
super-obscure topics that a couple people feel passionate about enough to
fight for and rally their fellow account-holders to vote for.

[0]
[https://en.wikipedia.org/wiki/List_of_ReBoot_characters](https://en.wikipedia.org/wiki/List_of_ReBoot_characters)

~~~
opo
There should be a flag indicating that content is considered 'Not Notable' by
the admins so it is easy to ignore in searches if you so desire.

The idea that content is actually being deleted because it isn't considered
'notable' by some person in 2020 is sad. Who knows what will be considered
'notable' in 2050?

------
peter_d_sherman
>"Wikipedia has tried to block users coming from the Tor network since 2007,
alleging vandalism, spam, and abuse. This research tells a different story:
that people use Tor to make meaningful contributions to Wikipedia, and Tor may
allow some users to add their voice to conversations in which they may not
otherwise be safely able to participate."

A few philosophical observations:

>"Wikipedia has tried to block users coming from the Tor network since 2007,
alleging vandalism, spam, and abuse."

Question #1. If let's say a country like China blocks users and/or content,
citing "vandalism, spam, and abuse" as their reasons for doing so, then is
this Censorship, or is this blocking content/users for "vandalism, spam, and
abuse"?

?

That is, how _exactly_ does blocking content/users for "vandalism, spam, and
abuse" differ from Censorship -- if the net effect is the same in both cases?

See, whatever reasons we give as criteria for this distinguishing process --
we must then be able to apply them equally to the other party.

If we say that it's OK for a company to block users based on "vandalism, spam,
and abuse" -- then how do we know that the Chinese government (or any other
government that engages in censorship) does not apply the exact same criteria
when it apparently censors, and if so, is it justified in those content
takedowns?

In other words, if we're saying that it's OK for Wikipedia to engage in its
actions, then why is China not justified in doing the same thing?

And if it's not OK for China to do, then why is it justified for Wikipedia to
do?

Disclaimer: I am neither for nor against China, and I am neither for nor
against Wikipedia.

I merely think that it would make for an interesting philosophical debate as
to why one's actions are justified, and the other's actions are not, IF BOTH
SETS OF ACTIONS RESULT IN THE SAME EFFECT.

Help me to understand. Pretend I am Socrates, that is, I claim to know
nothing, and it's your goal to educate me...

------
hatmatrix
It doesn't seem to discuss the justification for considering the ban in the
first place. I can imagine, as an example, ExxonMobil trying to dominate the
climate change discussion through anonymous edits (though they didn't, at
least partially; they were retroactively caught trying to modify relevant
pages from IPs associated with their business).

------
ryanisnan
Technically speaking, does Wikipedia just keep IPs of all exit nodes? Other
than that, I'm curious what attributes designate traffic as "TOR" traffic.

~~~
bawolff
Yes, it downloads a list of all exit nodes.

Code is open source and viewable at
[https://github.com/wikimedia/mediawiki-extensions-TorBlock](https://github.com/wikimedia/mediawiki-extensions-TorBlock)
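
The general idea of "download a list of exit nodes and check the editor's IP against it" can be sketched roughly like this. This is not the TorBlock code, just a minimal illustration: the one-address-per-line list format, the function names, and the sample addresses are all assumptions.

```python
import ipaddress

def parse_exit_list(text):
    """Parse a plain-text list with one IP per line; skip blanks and '#' comments."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        exits.add(ipaddress.ip_address(line))
    return exits

def is_tor_exit(ip, exits):
    """Return True if the given address string is in the parsed exit set."""
    return ipaddress.ip_address(ip) in exits

# Hypothetical sample data (documentation-range addresses, not real exits).
SAMPLE_LIST = """
# exit addresses
192.0.2.10
198.51.100.7
2001:db8::1
"""

exits = parse_exit_list(SAMPLE_LIST)
print(is_tor_exit("192.0.2.10", exits))
print(is_tor_exit("203.0.113.5", exits))
```

In a real deployment the list would be refreshed periodically, so new exits get picked up and retired ones drop off.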

~~~
ryanisnan
I'm curious why TOR would maintain a list of exit nodes so that this is
possible?

~~~
bawolff
Tor is a pretty centralized architecture: the client using Tor gets to choose
its path through the Tor network. It needs to know all the nodes in the system
in order to construct a valid path.

------
surround
Is there any reason why Wikipedia doesn’t assign non-logged-in users a unique
user ID instead of exposing their IP address?

~~~
bawolff
There are some proposals in this direction (serious proposals that might
actually happen. Not just wishful thinking proposals)

[https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancem...](https://meta.wikimedia.org/wiki/IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation)

~~~
surround
This proposal echoes my concerns and ideas very closely, thank you.
Unfortunately, the project is “currently in very early phases,” there’s no
“particular deadline,” and there’s a lot of opposition to the proposal [0], so
perhaps it’s wishful thinking after all.

[0][https://meta.m.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_E...](https://meta.m.wikimedia.org/wiki/Talk:IP_Editing:_Privacy_Enhancement_and_Abuse_Mitigation#Support_and_Oppose_List)

------
lucb1e
Wait, am I just completely misreading this or is this contradicting itself?

> the research team found that Tor users made similar quality edits to those
> of IP editors [...] and first-time editors. The paper notes that Tor users,
> on average, contributed higher-quality changes to articles than non-logged-
> in IP editors.

Is it similar quality or higher-quality now? The text also appears (word for
word) on the linked website at nyu.edu. Reading the original paper, guess what
I found?

> Using hand-coded data and a machine-learning classifier, we estimated that
> edits from Tor users are of similar quality to those by IP editors and
> First-time editors. We estimated that Tor users make more higher quality
> contributions than other IP editors, on average, as measured by PTRs.

Almost the same wording and contradiction again. There is a subtle change,
namely that the "of similar quality" judgement is a result of hand- and
machine-classifying, and "more higher quality contributions" is the judgement
of a metric called PTR* . There might also be a difference between "similar
quality edits" and "more high quality edits" (e.g. Tor users might do more
crap edits and more great edits by one metric but simply be about average by
another metric), but I'm not sure if that's just random variation in phrasing
or intentional.

* PTRs are "persistent token revisions". I don't find it very succinctly/adequately explained at first use, but probably if you read the whole paper it makes more sense. To my understanding, it's basically just how much of the contribution was later changed (within a fixed number of subsequent edits), presuming that if it was largely left unchanged, it was probably a welcome and correct edit.
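
To make that reading of PTRs concrete, here is a rough sketch of how I understand it. It is a simplification and not the paper's actual method: whitespace tokenization, the `window` default, and the function names are my own assumptions.

```python
from collections import Counter

def added_tokens(before, after):
    """Tokens (with counts) that appear in `after` but not in `before`."""
    return Counter(after.split()) - Counter(before.split())

def persistent_token_revisions(before, after, later_revisions, window=7):
    """Sum, over the next `window` revisions, of how many added tokens survive."""
    added = added_tokens(before, after)
    total = 0
    for revision in later_revisions[:window]:
        present = Counter(revision.split())
        total += sum(min(count, present[token]) for token, count in added.items())
    return total

# Made-up example: the edit adds "black" and "quietly".
before = "the cat sat"
after = "the black cat sat quietly"
later = [
    "the black cat sat quietly",  # both added tokens survive
    "the black cat sat",          # only "black" survives
    "a dog ran",                  # neither survives
]
print(persistent_token_revisions(before, after, later))
```

An edit that gets reverted immediately scores near zero, while one that is left alone accumulates a score across every revision in the window.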

While the article and paper are all positive, I'm not sure whether this might
just be because _of course_ we'd all love to hear how great Tor users are
(many of us are also Tor users: we like to think of ourselves as freedom
fighters, privacy advocates, etc.), but I'm not sure that's what this
unambiguously shows. Perhaps it's worth the moderation effort to unban them,
perhaps not. The paper does acknowledge this to an extent: "We simply cannot
know if our sample of Tor edits is representative of the edits that would
occur if Wikipedia did not block anonymity-seeking users."

Perhaps, instead of ban vs unban, we just need another system to anonymously
contribute changes, like a moderation queue, which would make it less
attractive for vandalism.

(On StackOverflow/StackExchange, anyone can edit without logging in and not
even your IP address is shown. While moderating it, I very very rarely see
trolls or spambots there. I'm not sure if that's because of some magic system
I don't know about or if it's simply because a manual review filters all the
garbage and there is no point in trying.)

~~~
MauranKilom
Incidentally, SO just released a blog post about their spam prevention
measures.

[https://stackoverflow.blog/2020/06/25/how-does-spam-protecti...](https://stackoverflow.blog/2020/06/25/how-does-spam-protection-work-on-stack-exchange/)

~~~
lucb1e
Oh cool, that explains why we see so little spam. Still though, Wikipedia is
also a big site with a huge audience, they must have similar issues and
perhaps protections. Assuming there's more to it than just IP bans, the same
could be applied to Tor _plus_ a review queue to make non-spam vandalism also
unattractive.

------
vmception
tl;dr preventing spam by blocking all TOR users is lazy.

~~~
hombre_fatal
Just depends on the service.

You're not lazy just because you decide something isn't worth it.

I've worked on some services where I'm not exaggerating to say that almost all
Tor traffic was abuse.

~~~
vmception
But it's more likely that one person was doing one thing taking up the majority
of the bandwidth originating from Tor

And the distinct users would be trying to do another thing

------
MintelIE
I use Tor for most of my browsing these days. While I'm under no illusion that
it protects me from the US government (it IS NSA software after all), I'm
fairly confident that it does protect me against non-15-eyes and corporate
spying and data collection.

~~~
slim
15 eyes is a good one (supposed to be 5 eyes)

~~~
MintelIE
It’s 15 eyes now and has been for years.

