
Show HN: We built an open source tool for journalists to inspect Wikipedia edits - psobot
http://blog.twg.ca/2014/11/building-wikiwash?
======
crazy2be
This is awesome! I've wanted to build something like this for a while, but
never gotten the chance :). Better tools like this are essential for helping
users accept and reject the correct edits, and making it easier to investigate
when reversions might be unjustified.

The only problem I noticed is that clicking the footnotes does not work, which
makes it harder to evaluate the legitimacy of an edit (well, it just means I
have to have the current article open as well).

~~~
psobot
Thanks!

Hadn't noticed that footnotes issue, but I'll add an issue to the GitHub repo:
[https://github.com/twg/wikiwash](https://github.com/twg/wikiwash).

------
maaaats
A tip on the diffing:

We have done something similar when diffing HTML, i.e. replacing each HTML tag
with a single Unicode character. We then run the diff and get several diff
segments (EQUALS, INS, DEL). After that, we scan those segments for tags and
split them out into a new segment type.

So an insert like INS(something \uE000 else) would become three _changes_:
INS(something ) INS_TAG(\uE000) INS( else). The INS_TAG should then not be
wrapped in <ins> when converting this back to HTML.
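
A minimal sketch of that approach in Python, not the code we actually run. It
assumes tags are mapped to private-use characters starting at U+E000, uses the
standard library's difflib as a stand-in for whatever diff library you have,
and the names encode, diff_segments, split_tags and render are made up for
illustration. Dropping deleted tags in render is also just one possible choice.

```python
import re
from difflib import SequenceMatcher

PUA_START = 0xE000  # first code point of the Unicode Private Use Area
TAG_RE = re.compile(r"<[^>]+>")
PLACEHOLDER_RE = re.compile("([\uE000-\uF8FF])")


def encode(html, table):
    """Replace each distinct tag with a single private-use character.

    `table` maps tag -> placeholder and is filled in as new tags appear.
    """
    def repl(match):
        tag = match.group(0)
        if tag not in table:
            table[tag] = chr(PUA_START + len(table))
        return table[tag]
    return TAG_RE.sub(repl, html)


def diff_segments(old, new):
    """Character-level diff as (op, text) pairs, op in EQUALS/INS/DEL."""
    segments = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            segments.append(("EQUALS", old[i1:i2]))
        if op in ("delete", "replace"):
            segments.append(("DEL", old[i1:i2]))
        if op in ("insert", "replace"):
            segments.append(("INS", new[j1:j2]))
    return segments


def split_tags(segments):
    """Split each segment at placeholders; tags get their own *_TAG type."""
    out = []
    for op, text in segments:
        for piece in PLACEHOLDER_RE.split(text):
            if not piece:
                continue
            is_tag = PLACEHOLDER_RE.fullmatch(piece) is not None
            out.append((op + "_TAG" if is_tag else op, piece))
    return out


def render(segments, table):
    """Convert back to HTML, wrapping only text changes in <ins>/<del>."""
    tags = {char: tag for tag, char in table.items()}
    html = []
    for op, text in segments:
        if op.endswith("_TAG"):
            if op != "DEL_TAG":  # assumption: deleted tags are simply dropped
                html.append(tags[text])
        elif op == "INS":
            html.append("<ins>" + text + "</ins>")
        elif op == "DEL":
            html.append("<del>" + text + "</del>")
        else:
            html.append(text)
    return "".join(html)
```

Usage would be roughly: encode both revisions with a shared table, then call
render(split_tags(diff_segments(old, new)), table). Dropping deleted tags can
leave unbalanced markup, so a real implementation needs to handle that case.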

------
xxxyy
I'm not sure if Wikipedia is OK with that, considering the load it generates
[1]. I am currently doing some research on Wikipedia, and for my purposes I
use the official dumps site at
[https://dumps.wikimedia.org/](https://dumps.wikimedia.org/)

[1] [http://en.wikipedia.org/robots.txt](http://en.wikipedia.org/robots.txt)

~~~
psobot
Load management is a big problem that we're looking to fix in the future.
Wikipedia's API definitely struggles when rendering old revisions of
medium-to-large pages, but we try our best to respect their own API etiquette
guidelines:
[http://www.mediawiki.org/wiki/API:Etiquette](http://www.mediawiki.org/wiki/API:Etiquette).
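
As a rough illustration of what following those guidelines can look like (not
WikiWash's actual code), here is a sketch in Python that builds a revision
history request with a descriptive User-Agent and the maxlag parameter. The
function name, the tool name and the contact address are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_request(title, limit=50, maxlag=5):
    """Build (but do not send) a request for a page's recent revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        # maxlag asks the servers to reject the request when replication
        # lag is high, so the client can back off and retry later.
        "maxlag": maxlag,
        "format": "json",
    }
    # Placeholder identity: a real client should name itself and give a
    # way to contact its operator.
    headers = {"User-Agent": "ExampleWikiTool/0.1 (contact@example.org)"}
    return Request(API_URL + "?" + urlencode(params), headers=headers)
```

Sending such requests one at a time, rather than in parallel, is the other
half of the etiquette.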

In the future, we'd love to combine a mix of the Wikipedia API for real-time
edit updates with a copy of the Wikipedia history database stored locally (all
2+ TB of it) to improve performance and decrease load on their servers.

------
JonLim
Oh boy, TWG on HN! ;) Great job with this, Peter. Still playing around with it
a bit more, but I'm intrigued.

------
aw3c2
_You tried to access the address
[http://wikiwash.metronews.ca/](http://wikiwash.metronews.ca/), which is
currently unavailable._

:(

~~~
psobot
Frantically trying to spin up more capacity for HN right now. Sorry about
that!

EDIT: Should be coping with the load nicely now.

------
tzm
Thanks for releasing this. Any plans to support other sites? Perhaps an open
source diffbot server?

------
allenguo
Great write-up!

------
imaginenore
wikiwash.metronews.ca doesn't load

------
Edmontonian
Not a very useful project. The problem for journalists isn't "how to inspect
edits" but rather "which edits on which entries should I inspect".

~~~
davidgerard
In my experience (press volunteer for Wikimedia), just making journalists
aware of the "history" tab is revelatory. It's like they don't see it on the
page. So this may well be more effective than you think.

------
ExpiredLink
This is a surveillance tool so that 'we' can monitor 'their' activity and take
appropriate measures. Wikipedia's "openly editable model" in 2014.

~~~
icebraining
The monitoring was already there from the start (it's just being displayed
differently), and this application is not part of Wikipedia's model, it's a
third-party tool.

If you're complaining that edits are recorded and displayed publicly, you're
thirteen years late. That was always the model of any Wiki.

~~~
ExpiredLink
Of course, you could always click on the 'View history' tab. That's not the
point.

~~~
osivertsson
Then what exactly was your point?

