

Ask YC: Anyone built/seen any cool Wikipedia apps? (e.g. WikiScanner) - colortone

I'm interested in building some applications that try to make sense of a given Wikipedia entry's edit history. There could be some cool integrations with Mechanical Turk or natural-language processing eventually, but the basic idea is very simple for now.

- Has anyone seen an app that does something along these lines (aside from WikiScanner, and more interesting than FoxyTunes Planet)?
- I could have sworn I heard some noise about a Wikipedia API...was I imagining that?
- Does anyone here have a desire to work on a project like this, or a recommendation of someone who might?
- What potential problems do you see in building something that crawls Wikipedia?

This ain't Powerset ;-) ...but it could be cool.

Thanks! Hacker News is the shit...
======
colortone
I just found a newsgroup post that pointed me to Wikipedia's policy on
crawlers vs. using a database download:

<http://en.wikipedia.org/wiki/Wikipedia:Database_download>

Can anyone translate this? I understand the part about not letting crawlers
hit Wikipedia more than once per second, but what does this DB download
provide, and how often is it updated?
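For anyone who does end up crawling live pages, the one-request-per-second rule is easy to enforce with a small throttling wrapper. This is a minimal sketch, not anything Wikipedia ships; the wrapped fetch function and the interval are up to you:

```python
import time

def rate_limited(fn, min_interval=1.0):
    """Wrap fn so that successive calls are spaced at least
    min_interval seconds apart (sleeping as needed)."""
    last = [0.0]  # time of the previous call; list so the closure can mutate it

    def wrapper(*args, **kwargs):
        wait = min_interval - (time.monotonic() - last[0])
        if wait > 0:
            time.sleep(wait)  # stay under the crawl-rate limit
        last[0] = time.monotonic()
        return fn(*args, **kwargs)

    return wrapper
```

You'd then wrap whatever does the actual HTTP GET, e.g. `fetch = rate_limited(my_fetch_page)`, and call it in a loop without worrying about hammering the servers.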

~~~
xirium
The full dataset is available as HTML, XML and SQL from
<http://download.wikipedia.org/> . Some versions are updated weekly; others
less frequently. The dumps don't run on a strict schedule, because the volume
of data keeps growing and the snapshot process sometimes fails.

For general use, the HTML version would be easiest to process. The HTML is
compressed at a 14:1 ratio using 7zip, so it vastly reduces Wikipedia's server
load and bandwidth utilisation. For diffing edit history, the SQL version
would be most appropriate because it mirrors the Wikipedia schema, including
previous edits.

We were using the HTML dataset for testing our own search, but we've outgrown
it.

~~~
colortone
Thanks so much!

Seen any cool Wikipedia bots or apps in your travels?

ANYthing is interesting to me...

