Show HN: We built an open source tool for journalists to inspect Wikipedia edits (twg.ca)
131 points by psobot on Nov 18, 2014 | 18 comments



This is awesome! I've wanted to build something like this for a while, but never gotten the chance :). Better tools like this are essential for helping users accept and reject the correct edits, and making it easier to investigate when reversions might be unjustified.

The only problem I noticed is that clicking the footnotes does not work, which makes it harder to evaluate the legitimacy of an edit (well, it just means I have to have the current article open as well).


Thanks!

Hadn't noticed that footnotes issue, but I'll add an issue to the GitHub repo: https://github.com/twg/wikiwash.


A tip on the diffing:

We have done something similar when diffing HTML, e.g. replacing each HTML tag with a single Unicode character. We then run the diff and get several diff segments (EQUALS, INS, DEL). What we do next is scan those segments for tag characters and split them out into a new segment type.

So an insert like INS(something \xE000 else) would become three changes: INS(something ) INS_TAG(\xE000) INS( else). The INS_TAG is then not wrapped in <ins> when converting this back to HTML.
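
The splitting step described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the commenter's or WikiWash's actual code: the helper names (`encode_tags`, `split_tags`, `diff_segments`, `render`) are hypothetical, placeholders are assumed to come from the Unicode Private Use Area starting at U+E000, the standard library's `difflib` stands in for whatever diff engine is really used, and dropping deleted tags on output is one arbitrary choice among several.

```python
import re
from difflib import SequenceMatcher

TAG_RE = re.compile(r"<[^>]+>")
# Assumption: placeholders live in the Private Use Area (U+E000..U+F8FF).
PLACEHOLDER_RE = re.compile("([\uE000-\uF8FF])")
PUA_START = 0xE000


def encode_tags(html, table):
    """Replace each distinct tag with one private-use character."""
    def repl(match):
        tag = match.group(0)
        if tag not in table:
            table[tag] = chr(PUA_START + len(table))
        return table[tag]
    return TAG_RE.sub(repl, html)


def split_tags(kind, text):
    """Split one segment so each placeholder becomes its own *_TAG segment."""
    out = []
    for piece in PLACEHOLDER_RE.split(text):
        if not piece:
            continue
        is_tag = PLACEHOLDER_RE.fullmatch(piece) is not None
        out.append((kind + "_TAG" if is_tag else kind, piece))
    return out


def diff_segments(old, new):
    """Character-level diff, with tag placeholders split into their own segments."""
    segments = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            segments += split_tags("EQUALS", old[i1:i2])
        if op in ("delete", "replace"):
            segments += split_tags("DEL", old[i1:i2])
        if op in ("insert", "replace"):
            segments += split_tags("INS", new[j1:j2])
    return segments


def render(segments, table):
    """Convert segments back to HTML; tags are never wrapped in <ins>/<del>."""
    decode = {ch: tag for tag, ch in table.items()}
    html = []
    for kind, text in segments:
        if kind == "DEL_TAG":
            continue  # assumption: deleted tags are simply dropped
        if kind.endswith("_TAG"):
            html.append(decode[text])
        elif kind == "INS":
            html.append("<ins>" + text + "</ins>")
        elif kind == "DEL":
            html.append("<del>" + text + "</del>")
        else:
            html.append(text)
    return "".join(html)
```

To use it, encode both revisions with a shared table (`table = {}`, then `encode_tags(old_html, table)` and `encode_tags(new_html, table)`), pass the two encoded strings to `diff_segments`, and hand the result to `render` together with the same table. Keeping the tags out of the `<ins>` and `<del>` wrappers is what lets the inserted markup still nest properly.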


I'm not sure if Wikipedia is OK with that, considering the load it generates [1]. I am currently doing some research on Wikipedia, and for my purposes I use the official dumps site at https://dumps.wikimedia.org/

[1] http://en.wikipedia.org/robots.txt


Load management is a big problem that we're looking to fix in the future. Wikipedia's API definitely struggles when rendering old revisions of medium-to-large pages, but we try our best to respect their own API etiquette guidelines: http://www.mediawiki.org/wiki/API:Etiquette.

In the future, we'd love to combine a mix of the Wikipedia API for real-time edit updates with a copy of the Wikipedia history database stored locally (all 2+ TB of it) to improve performance and decrease load on their servers.
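
For readers wondering what respecting the etiquette guidelines might look like in practice, here is a minimal Python sketch that only builds a revision-history request and does not send it. It assumes the English Wikipedia endpoint; the function name `revisions_url` and the User-Agent string are hypothetical, and none of this is taken from WikiWash itself. The `maxlag` parameter and a descriptive User-Agent header are both recommended on the etiquette page linked above.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia; other wikis expose the same api.php path.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

# Hypothetical client identification; the etiquette page asks for a
# descriptive User-Agent with contact details.
HEADERS = {"User-Agent": "ExampleEditViewer/0.1 (contact@example.org)"}


def revisions_url(title, limit=10, maxlag=5):
    """Build a URL that lists recent revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        # Ask the servers to reject the request when replication lag is high.
        "maxlag": maxlag,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)
```

A client would send `HEADERS` with each request, issue requests one at a time rather than in parallel, and wait before retrying whenever the API reports a `maxlag` error.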


FWIW, the wikisphere thinks this is pretty cool :-) New ways of reusing our stuff are always of interest.


Oh boy, TWG on HN! ;) Great job with this Peter, playing around with it a bit more, but I'm intrigued.


You tried to access the address http://wikiwash.metronews.ca/, which is currently unavailable.

:(


Frantically trying to spin up more capacity for HN right now. Sorry about that!

EDIT: Should be coping with the load nicely now.


Thanks for releasing this. Any plans to support other sites? Perhaps an open source diffbot server?


Great write-up!


wikiwash.metronews.ca doesn't load


Not a very useful project. The problem for journalists isn't "how to inspect edits" but rather "which edits on which entries should I inspect".


In my experience (press volunteer for Wikimedia), just making journalists aware of the "history" tab is revelatory. It's like they don't see it on the page. So this may well be more effective than you think.


This is a surveillance tool so that 'we' can monitor 'their' activity and take appropriate measures. Wikipedia's "openly editable model" in 2014.


The monitoring was already there from the start (it's just being displayed differently), and this application is not part of Wikipedia's model, it's a third-party tool.

If you're complaining that edits are recorded and displayed publicly, you're thirteen years late. That was always the model of any Wiki.


Of course, you could always click on the 'View history' tab. That's not the point.


Then what exactly was your point?



