Here's what I want: a way to query and download Wikipedia metadata easily.
For example, I recently saw the Burke Museum released a Wildflower App, but none for Windows Phone. I was thinking I could use Wikipedia to build a very similar app, but I'd need access to hierarchical metadata (flower type, color, region of the world, environment, etc...). I suspect it's largely there in Wikipedia, but it's not structured (or at least I don't know how to query it).
I can imagine this being of use for a lot of things: plants, birds, fish, basketball, football, countries, presidents, trees, foods, etc...
Has this been done already? Is it a solved problem with Wikipedia already?
In the meantime, some other options that might be useful:
There's another Wikimedia Foundation project specifically for species taxonomy: http://species.wikimedia.org
As 'riffraff mentioned, there's a project, DBpedia, which is based on extracting structured data by parsing Wikipedia infoboxes: http://dbpedia.org/About
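To make that concrete, here's a rough Python sketch (standard library only) of asking DBpedia's public SPARQL endpoint for plants. The endpoint URL is DBpedia's documented one, but the specific query and the dbo:Plant class are just illustrative picks for the wildflower use case, so treat this as a starting point rather than a recipe:

    import json
    import urllib.parse
    import urllib.request

    # An illustrative SPARQL query: resources typed as dbo:Plant,
    # with their English labels. dbo:Plant is one class in the
    # DBpedia ontology; swap in whatever class fits your domain.
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?plant ?label WHERE {
      ?plant a dbo:Plant ;
             rdfs:label ?label .
      FILTER (lang(?label) = "en")
    } LIMIT 10
    """

    url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)

    for row in results["results"]["bindings"]:
        print(row["label"]["value"])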
For certain kinds of queries the Wikipedia category system already provides some semi-structured data, e.g. you can consult https://en.wikipedia.org/wiki/Category:Fictional_parrots to get a list of some famous fictional parrots. You can query by category programmatically using the MediaWiki API http://www.mediawiki.org/wiki/API:Main_page
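For instance, a minimal Python sketch (standard library only) that lists members of that parrot category via the API. The action/list parameters come straight from the API docs; I've skipped continuation handling for brevity:

    import json
    import urllib.parse
    import urllib.request

    # Query the MediaWiki API for members of a category.
    # Responses are capped per request; follow the returned
    # "continue"/"cmcontinue" token to page through larger categories.
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:Fictional_parrots",
        "cmlimit": "50",
        "format": "json",
    })
    url = "https://en.wikipedia.org/w/api.php?" + params

    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)

    for member in data["query"]["categorymembers"]:
        print(member["title"])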
Completely OT, but is this an established HN convention for marking usernames? I've noticed it here and there, and it seems like a goodish idea.
I'm not sure if he originated it, but it seemed useful to me so I borrowed it.
'riffraff is eight letters long
'riffraff mentioned the project
I just wanted to be able to refer to users with names like "the" without putting scare quotes around the names. What I wanted, in effect, was to prevent evaluation of the names in the sentence.
Not quoting HN names at all feels even weirder; writing sentences like "anonymous says we should use the smoked paprika" or "nooonespecial stole our bucket of mashed potatoes" just feels wrong.
I'm happy for it to be an idiosyncrasy of mine, though.
(Not being pedantic, just curious.)
My son used to play an FPS that used scrolling text to report on what actions were happening. If you shot a player named, say, "gruseom", the scrolling text would read:
You killed gruseom.
Edit: You're basically using ' the way that @ migrated into general usage from Twitter. But the Lisp ' is so burned into my memory that it was surprisingly hard to see that. I do prefer your ' to @; it's not as heavy.
It doesn't help that people sometimes capitalize usernames when they put them at the beginning of a sentence. /shrug
'riffraff is a symbol.
ScraperWiki is a web-based platform for collaboratively building programs to extract and analyze public (online) data, in a wiki-like fashion. "Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets.
Wikidata, just like Freebase, is trying to collect structured or semi-structured data instead of unstructured data such as that found in Wikipedia. I am happy about the aim (completely unstructured data is basically useless for any serious data reuse and extraction), but my fear is that it will not succeed as well as Wikipedia did. Wikipedia founded its success on the fact that anybody could edit it: to edit a Wikipedia page you need only very modest technical skills and basic writing skills (plus knowledge of the topic, obviously).

Adding and manipulating structured data, by contrast, requires people to conform to a certain mental grid, to a formalized model, to a schema developed by someone else and meant to be followed strictly. The vast majority of people are easily demotivated when required to learn something substantial beforehand, and most edits by unskilled users end up removed by watchdogs (something often seen in high-quality Wikipedia articles, where edits by new users are quickly reverted on the grounds that they did not follow some of the many applicable guidelines).
My idea is that many of the problems found in structured-data projects (Freebase, MusicBrainz...) could be alleviated by better interfaces and wider use of automation, both things that Wikipedia projects do not seem to excel at.
In the future, bots using some kind of NLP could scour the changes and look for new facts to extract and incorporate, either suggesting them to editors or just making the changes themselves.
For example, representing a term as: Earth "is a" planet, "belongs to" the Solar System, "has" [properties X, Y, Z].
This has applications in natural language processing and other types of artificial intelligence (for example, within a particular niche, it could be useful in an expert system). Think about this type of thing as a complex graph representing "knowledge" in some abstract but still useful sense.
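A toy Python sketch of that triple idea, with facts as (subject, predicate, object) tuples forming a simple graph. Everything here is made up for illustration:

    # Facts as (subject, predicate, object) triples -- the same shape
    # RDF stores like DBpedia use, just without URIs or a schema.
    facts = {
        ("Earth", "is a", "planet"),
        ("Earth", "belongs to", "Solar System"),
        ("Earth", "has", "magnetic field"),
        ("Mars", "is a", "planet"),
    }

    def objects_of(subject, predicate, graph):
        """Every object linked to `subject` by `predicate`."""
        return {o for (s, p, o) in graph if s == subject and p == predicate}

    print(objects_of("Earth", "is a", facts))   # {'planet'}
    print(objects_of("Earth", "has", facts))    # {'magnetic field'}

Chaining lookups like this over a big enough graph is what gives you the "expert system within a niche" behavior: each answer can feed the next query.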
We are working on three major things:
1. Centralizing language links
2. Providing a central place for infobox data for all Wikipedias (see the sketch after this list)
3. Creating and updating list articles based on data in Wikidata
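As a hedged sketch of what point 2 looks like from a consumer's side once the data is live: Wikidata items are fetchable as structured JSON through the standard MediaWiki API. Q2 is the item for Earth; this is just an illustration, not part of the announcement above:

    import json
    import urllib.parse
    import urllib.request

    # Fetch a Wikidata item as structured JSON. Q2 is the item for
    # Earth; wbgetentities is the API action for reading entities.
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "Q2",
        "format": "json",
    })
    url = "https://www.wikidata.org/w/api.php?" + params

    with urllib.request.urlopen(url) as resp:
        entity = json.load(resp)["entities"]["Q2"]

    # Labels are keyed by language code -- this is the piece that
    # makes sharing data across all the language Wikipedias work.
    print(entity["labels"]["en"]["value"])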
In particular, I imagine the initial impact will be a massive increase in the quality of the lesser-used language Wikipedias, by leveraging the work done on the busier sites.
Commentary by The Atlantic: http://www.theatlantic.com/technology/archive/2012/04/the-pr...