
Wikipedia’s Switch to HTTPS Has Successfully Fought Government Censorship - rbanffy
https://motherboard.vice.com/en_us/article/wikipedias-switch-to-https-has-successfully-fought-government-censorship
======
shpx
It won't last, at least for China. Their government is working on a clone of
Wikipedia, scheduled for 2018[0]. Once that's done they'll likely ban the
original completely.

Wikipedia publishes database dumps every couple of days[1], so it shouldn't be
that expensive for smaller governments to create and host their own censored
mirror. You'd maintain a list of banned articles, then pull from Wikipedia
once a month. You'd have to check new articles by hand (maybe even all edits),
but a lot of that could be automated, and if you only care about Wikipedia in
your native tongue (and it's not English) that's much less work.
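
Roughly, the filtering half is only a few lines. A sketch (the dump file name
and ban list here are made up):

    # Stream a MediaWiki XML dump and drop every page whose title is on
    # a ban list. File names and the ban list are placeholders.
    import bz2
    import xml.etree.ElementTree as ET

    banned = {line.strip()
              for line in open("banned_titles.txt", encoding="utf-8")}

    def local(tag):
        # Dump elements carry an XML namespace; compare local names only.
        return tag.rsplit("}", 1)[-1]

    with bz2.open("wiki-latest-pages-articles.xml.bz2", "rb") as dump:
        for _, page in ET.iterparse(dump):
            if local(page.tag) != "page":
                continue
            title = next(c.text for c in page if local(c.tag) == "title")
            if title not in banned:
                pass  # hand the allowed page to the mirror's importer here
            page.clear()  # keep memory bounded while streaming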

Academics will bypass the censorship anyway, since it's so easy[2], so an
autocrat won't have to worry about intellectually crippling their country by
banning Wikipedia. Maybe they don't do this because the list of banned
articles would be trivial to extract.

Better machine translation might solve this by helping information flow
freely[3]. We have until 2018, I guess.

[0] https://news.vice.com/story/china-is-recruiting-20000-people-to-write-its-own-wikipedia

[1] https://dumps.wikimedia.org/backup-index.html

[2] https://www.wired.co.uk/article/china-great-firewall-censorship-fang-binxing

[3] https://blogs.wsj.com/chinarealtime/2015/12/17/anti-wikipedian-translation-at-chinas-internet-conference/

~~~
Markoff
wut? Everyone in China already uses Baike instead of Wikipedia; nobody really
understands why they're making another website.

~~~
runn1ng
To compare: Chinese Wikipedia has 940,000 articles; Baike has 6 million.

~~~
dmix
Wikipedia editors are pretty strict about what gets to remain a page. Everyone
knows they delete articles unless they have lots of sources and public
interest.

With 6x the articles on Baike, I can't imagine there is that level of quality
control. Unless there are 6x as many things worth documenting in China as in
the rest of the world.

An interesting statistic nonetheless.

~~~
chucksmash
English Wikipedia has 5.4 million articles. I imagine many of them would be
notable in Chinese too.

------
awinter-py
Can an expert comment on side-channel attacks on HTTPS and whether they're
less viable on HTTP/2?

My assumption is that because Wikipedia has a known plaintext and a known link
graph, it's plausible to identify pages with some accuracy and either block
them or monitor who's reading what.

I also assume that the traffic profile of editing looks different from
viewing.

~~~
chimeracoder
> My assumption is that because Wikipedia has a known plaintext and a known
> link graph, it's plausible to identify pages with some accuracy

At least in theory, the latest versions of TLS should not be vulnerable to a
known-plaintext attack. TLS is also capable of length-padding, which would
further reduce the attack surface for an eavesdropper.

My understanding is that HTTP/2 makes it even more difficult to construct an
attack on this basis, because HTTP/2 can roll multiple requests into a single
connection.

Of course, all this assumes an eavesdropper without the ability to intercept
and modify traffic. In practice, governments will probably just MITM the
connection - we have precedent for governments abusing CAs like this in the
past - and unless Wikipedia uses HPKP _and_ we trust the initial connection
_and_ we trust that the HPKP reporting endpoint isn't blocked, it's still
possible to censor pages without anybody else knowing[0].

[0] i.e., the government censors will know, and the person who attempted to
access the page will know, but neither Wikipedia nor the browser vendor would
be able to detect the censorship automatically.
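
For intuition, pinning just means comparing what the server presents against a
hash obtained some other way. A simplified sketch (this pins the whole
certificate for brevity; HPKP proper pins a hash of the SubjectPublicKeyInfo,
and the pin value below is invented):

    # Simplified certificate pinning: hash the presented certificate and
    # compare it against a known-good value shipped out-of-band. The pin
    # here is a made-up placeholder.
    import hashlib
    import socket
    import ssl

    EXPECTED_PIN = "placeholder-sha256-hex"  # hypothetical known-good hash

    ctx = ssl.create_default_context()
    with socket.create_connection(("en.wikipedia.org", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="en.wikipedia.org") as tls:
            der = tls.getpeercert(binary_form=True)
            pin = hashlib.sha256(der).hexdigest()
            # A mismatch here is how a pinned client notices a MITM cert.
            print("pin matches:", pin == EXPECTED_PIN)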

~~~
colmmacc
TLS 1.2 doesn't have an effective padding scheme, and with most sites
(including Wikipedia) moving to AES-GCM and ChaCha20, the situation is
actually worse than with the primitive CBC padding, which provided some
protection.

TLS 1.3, which is still a draft, does have support for record-level padding,
but I haven't seen any of the experimental deployments using it.

HTTP/2 also has support for padding, but again, it's not common to see it
used, at least not in the kind of sizes it would take to obscure content
fingerprints.

Wikipedia is a particularly hard case to protect against traffic-analysis
fingerprinting. First, the combination of page size and image sizes is just
highly distinctive, even modulo large block/padding sizes. But more
importantly, anyone can edit a Wikipedia page, so if the size of a target page
isn't unique, it's very easy to edit it and make it so. It would take very
large amounts of padding to defeat this.
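
To make the size matching concrete, here's a toy version (all sizes invented):

    # Toy size-based fingerprinting: match the byte count of an observed
    # encrypted response against a precomputed table of page-plus-assets
    # sizes, rounded up to the padding granularity. All sizes invented.
    KNOWN_SIZES = {"Page_A": 412_337, "Page_B": 198_004, "Page_C": 201_550}

    def padded_blocks(n, block):
        return -(-n // block)  # ceiling division: blocks after padding

    def guess_page(observed, block=256):
        return [title for title, size in KNOWN_SIZES.items()
                if padded_blocks(size, block) == padded_blocks(observed, block)]

    print(guess_page(412_416))  # ['Page_A']: a unique fingerprint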

So it's definitely possible to fingerprint which Wikipedia page someone is
browsing. But it's probably not easy to block on that basis; the fingerprint
is only detectable after the page has been downloaded, so it's not very useful
for censorship.

~~~
chimeracoder
> But it's probably not easy to block on that basis; the fingerprint is only
> detectable after the page has been downloaded, so it's not very useful for
> censorship

Well, it's detectable after the request has been made and Wikipedia sends the
response. Assuming a government has the capability to block delivery of that
response (which they do), they can still implement censorship at this level,
before the page reaches the end user.

------
petre
There was an IPFS clone of Wikipedia after Turkey blocked it.

http://observer.com/2017/05/turkey-wikipedia-ipfs/

------
darkhorn
There were a few censored pages on the Turkish Wikipedia when it was on HTTP:
the "vagina" article and an election-prediction article. Only those pages were
censored.

Last month there were some articles on the English Wikipedia about ISIS-
Erdoğan ties (I don't care whether they're true or not). Then they blocked all
of Wikipedia (all languages), because they were unable to block those
individual pages.

~~~
thr0w__4w4y
Yup. Was there 2 weeks ago working with a group of Turkish engineers. I went
online to get some technical information about a particular stream cipher, and
WHOOPS! Wikipedia is blocked, completely.

Fired up my VPN, accessed the page, thank you very much.

"The Net interprets censorship as damage and routes around it." - John
Gilmore

------
rocky1138
How do governments censor only parts of Wikipedia when the site is encrypted?
How do they know which pages you are browsing if they can't see the URL?

~~~
zeta0134
That's just it; they can't! When you visit Wikipedia over HTTPS, the only
thing actually visible in plain text is wikipedia.org, and even that is only
because your browser uses Server Name Indication (SNI).

Since the rest of the request, including the URL path, is hidden, governments
and other malicious agents between you and the server cannot directly see what
pages you're requesting. They can only see that you are accessing
wikipedia.org and transmitting some data. You may still be somewhat vulnerable
to timing attacks that try to identify what pages you're viewing, but
censorship can't happen at the page level over HTTPS; you have to block the
whole thing in one go.
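
You can watch this split from Python's ssl module, for instance (a minimal
sketch): the hostname handed to wrap_socket() is what leaks; the request
itself only goes out after the handshake.

    # The hostname passed as server_hostname is sent unencrypted in the
    # TLS ClientHello's SNI extension; the path and headers below only
    # cross the wire after encryption is established.
    import socket
    import ssl

    ctx = ssl.create_default_context()
    with socket.create_connection(("en.wikipedia.org", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="en.wikipedia.org") as tls:
            tls.sendall(b"GET /wiki/Main_Page HTTP/1.1\r\n"
                        b"Host: en.wikipedia.org\r\n"
                        b"Connection: close\r\n\r\n")
            print(tls.recv(200))  # encrypted on the wire, decrypted here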

~~~
Amulet-
The article says:

> Although countries like China, Thailand and Uzbekistan were still censoring
> part or all of Wikipedia by the time the researchers wrapped up their study

The top comment might be asking about the "still censoring part" claim in the
article.

~~~
zeta0134
Oh, huh! I missed that entirely; now I'm curious too. HTTPS should make that
difficult, but China has been known to employ all sorts of weird shenanigans.
Perhaps they're running a "trusted" MITM as part of the Great Firewall?

I know that certain companies (like Google and Microsoft) will actively censor
themselves to continue to operate within China, but I figured Wikipedia would
be against that practice on principle. Now I'm curious as to how it's done.

~~~
matt4711
I think China blocks zh.wikipedia.org, but the other languages are not
blocked.

~~~
kalleboo
When I visited China a bunch of years ago, zh.wikipedia was completely
blocked, and on English Wikipedia only certain articles were blackholed
(Tiananmen Square...).

------
gwern
After reading through the whole paper, I would have to say that there is far
less censorship of WP, whether over HTTPS or HTTP, than I guessed.

------
enzolovesbacon

> Critics of this plan argued that this move would just result in more total
> censorship of Wikipedia and that access to some information was better than
> no information at all

I'm no critic of this plan, but I still don't understand why this wouldn't
result in more total censorship. Can someone explain, please?

~~~
dTal
Because Wikipedia is too useful. Note that it took a certain self-confidence
in that fact for Wikipedia to implement this strategy. And it's
self-fulfilling: if Wikipedia allowed itself to be censored, it would have
fewer contributors and its usefulness would suffer.

There's a rather interesting analogy to be made with the GPL here. Critics
argue that companies shy away from it because they cannot control it. Yet its
entire goal is to not be controlled, and it draws its strength from the
conviction that the body of GPL software is too useful to ignore. And again,
that's self-fulfilling.

It takes courage, but it's important to know when you have the power to say
"all of me, or none of me".

~~~
dragonwriter
> Critics argue that companies shy away from it because they cannot control
> it.

No, they don't. Critics point out that companies avoid it, and non-critics
ascribe this avoidance to "can't control it". That explanation is false:
nothing under a third-party copyright, under any non-exclusive license, can be
controlled by the licensee, yet businesses avoiding the GPL don't generally
avoid all non-exclusive licenses.

~~~
cyphar
I think "can't control" refers to sublicensing in this context. People's
dislike over copyleft stems from wanting to make software proprietary (or
proprietary-friendly through lax licensing). Copyleft removes that control,
and the GPL's main strength is that it is so ubiquitous that you cannot
practically avoid it (in most cases).

------
shusson
TIL: HTTPS encrypts the URL.

~~~
blhack
I think it's a fun/educational exercise to interact with some daemons over
telnet. You can telnet into port 80 and craft an HTTP request by hand, for
instance.

Certificate negotiation happens _before_ the GET request is sent, which means
that the "URL" (or, rather, everything after the domain) is encrypted.

You can also see some of this process with curl:

    curl -vvv https://www.google.com/
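
For contrast, the equivalent exchange over plain HTTP puts everything, path
included, on the wire in cleartext. A sketch in Python (the hostname and path
are just examples):

    # Plain HTTP on port 80: the full request line, path included, is
    # visible to anyone on the wire - the same bytes you would type by
    # hand after "telnet example.com 80".
    import socket

    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(b"GET /some/page HTTP/1.1\r\n"
                     b"Host: example.com\r\n"
                     b"Connection: close\r\n\r\n")
        print(sock.recv(200))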

~~~
gol706
Telnet is a great way to realize that HTTP is just some simple text commands,
not some mysterious binary protocol.

Wireshark also provides a good visualization of the HTTPS negotiation process
and the various layers of HTTPS requests and responses. It does take a lot
more effort to figure out than telnet, though.

------
SpacePotatoe
I just wonder what the UK government has against German metal bands

~~~
SXX
It's against the album cover:

https://en.wikipedia.org/wiki/Virgin_Killer

~~~
Amulet-
That seems a bit random

~~~
bowersbros
Because the artwork is of a nude 10-year-old.

~~~
Amulet-
Oh, that makes more sense to me.

~~~
olivermarks
To be fair to the Scorpions, a quote from Wikipedia on the original concept
for the song:

'...Time is the virgin killer. A kid comes into the world very naive, they
lose that naiveness and then go into this life losing all of this getting into
trouble. That was the basic idea about all of it' Different times...

https://en.wikipedia.org/wiki/Virgin_Killer

------
vbezhenar
Currently HTTPS sends domain in clear-text before establishing a connection.
It allows to host (and block) website by domain, not by IP. May be HTTPS
should have optional extension to send URI in clear-text before establishing a
connection. This way, if censors decide to block Wikipedia, users can opt-in
into this behaviour and have unblocked Wikipedia except few selected articles.

~~~
knome
Absolutely not. The response to censorship should not be to make things easier
for the censor.

Anyway, the idea is unworkable, as the user's client could simply lie about
what URI it's going to request after the encrypted connection is set up.

~~~
vbezhenar
> Absolutely not. The response to censorship should not be to make things
> easier for the censor.

It's not about making things easier for the censor. It's already easy. It's
about making life easier for people who have to live with censorship (pretty
much the entire world, I guess?).

> Anyway, the idea is unworkable, as the user's client could simply lie about
> what URI it's going to request after the encrypted connection is set up.

A good server should respond with an error, I guess.

~~~
thriftwy
People should be fighting for their rights and freedoms, not making their life
as slaves easier.

------
libeclipse
> a positive effect

Any numbers/figures?

~~~
shpx
https://dash.harvard.edu/bitstream/handle/1/32741922/Wikipedia_Censorship_final.pdf?sequence=1

