Wikidata (wikidata.org)
77 points by kjhughes 1424 days ago | 27 comments

I was just thinking about something like this today (or at least what I think this might be, since I really can't tell from the page).

Here is what I want... I want a way to query and download Wikipedia metadata easily.

For example, I recently saw that the Burke Museum released a Wildflower App, but none for Windows Phone. I was thinking I could use Wikipedia to build a very similar app, but I needed to get access to hierarchical metadata (flower type, color, region of the world, environment, etc...). I suspect it's largely there in Wikipedia, but it's not structured (or at least I don't know how to query it).

I can imagine this being of use for a lot of things: plants, birds, fish, basketball, football, countries, presidents, trees, foods, etc...

Has this been done already? Is it a solved problem with Wikipedia already?

That is the direction Wikidata is intending to go, yes. But it doesn't have a lot of semantic data in it yet.

In the meantime, some other options that might be useful:

There's another Wikimedia Foundation project specifically for species taxonomy: http://species.wikimedia.org

As 'riffraff mentioned there's a project, DBpedia, which is based on extracting structured data by parsing Wikipedia infoboxes: http://dbpedia.org/About

For certain kinds of queries the Wikipedia category system already provides some semi-structured data, e.g. you can consult https://en.wikipedia.org/wiki/Category:Fictional_parrots to get a list of some famous fictional parrots. You can query by category programmatically using the MediaWiki API http://www.mediawiki.org/wiki/API:Main_page
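A minimal sketch of the category query mentioned above, assuming the English Wikipedia endpoint at `/w/api.php` and the API's `list=categorymembers` module. The helper names and the User-Agent string are made up for illustration, and paging through long categories is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the English Wikipedia API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=50):
    """Build a list=categorymembers query URL for a category title."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(data):
    """Pull page titles out of a decoded categorymembers response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


def fetch_category_titles(category, limit=50):
    """Do the HTTP request. The User-Agent value is a placeholder."""
    request = Request(
        category_members_url(category, limit),
        headers={"User-Agent": "category-sketch/0.1 (example)"},
    )
    with urlopen(request) as response:
        return titles_from_response(json.load(response))
```

Calling `fetch_category_titles("Fictional_parrots")` should return the page titles in that category, but it needs network access and only fetches the first batch of results.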

> 'riffraff

Completely OT, but is this an established HN convention for marking usernames? I've noticed it here and there, and it seems like a goodish idea.

I got it from 'tptacek. Some examples of him using it here: https://news.ycombinator.com/user?id=tptacek

I'm not sure if he originated it, but it seemed useful to me so I borrowed it.

Well, not really. When you quote a username like this, it means you're talking about the name—the symbol—rather than about the user. So this makes sense:

  'riffraff is eight letters long

But this does not make sense:

  'riffraff mentioned the project

... since names don't talk.

You're being more philosophical about it than I am.

I just wanted to be able to refer to users with names like "the" without putting scare quotes around the names. What I wanted, in effect, was to prevent evaluation of the names in the sentence.

Not somehow quoting HN names feels even weirder; writing sentences like "anonymous says we should use the smoked paprika" or "noonespecial stole our bucket of mashed potatoes" just feels wrong.

I'm happy for it to be an idiosyncrasy of mine, though.

But why quote the name? "gruseom wrote a comment to which tptacek replied" is no different from "Daniel did this and Thomas did that". Is it? If you don't evaluate the names, you end up talking about the names themselves.

(Not being pedantic, just curious.)

Because some people on HN have names that are ambiguous in sentence context. "gruseom" and "tptacek" aren't ambiguous; "the" and "noonespecial" are.

Ahhhhh, I get it now. Yes, ambiguity is an issue.

My son used to play an FPS that used scrolling text to report on what actions were happening. If you shot a player named, say, "gruseom", the scrolling text would read:

  You killed gruseom.

One day I noticed that some wag had picked the username "yourownmotherhowcouldyou".

Edit: You're basically using ' the way that @ migrated into general usage from Twitter. But the Lisp ' is so burned into my memory that it was surprisingly hard to see that. I do prefer your ' to @; it's not as heavy.

Glancing at submissions on the front page at the moment:



It doesn't help that people sometimes capitalize usernames when they put them at the beginning of a sentence. /shrug

Your explanation is the opposite of the classic Lisp usage of symbols. In Lisp, 'foo is not supposed to represent the 3-character string "foo". It can be coerced to that string, but that's not the default or expected representation. Instead, it's supposed to represent an opaque entity, referred to with the arbitrary symbol 'foo. Usernames are much like that.

Let's change the first example to:

  'riffraff is a symbol.

Does that remove the objection?

Wikidata is a collaboratively edited knowledge base operated by the Wikimedia Foundation. It is intended to provide a common source of certain data types (for example, birth dates) which can be used by Wikimedia projects such as Wikipedia. This is similar to the way Wikimedia Commons provides storage for media files and access to those files for all Wikimedia projects. - http://en.wikipedia.org/wiki/Wikidata

ScraperWiki is a web-based platform for collaboratively building programs to extract and analyze public (online) data, in a wiki-like fashion. "Scraper" refers to screen scrapers, programs that extract data from websites. "Wiki" means that any user with programming experience can create or edit such programs for extracting new data, or for analyzing existing datasets. - http://en.wikipedia.org/wiki/ScraperWiki

Check out DBpedia's SPARQL endpoint, or Freebase's APIs.
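A rough sketch of what hitting the DBpedia SPARQL endpoint could look like, assuming the public endpoint at `https://dbpedia.org/sparql` accepts `query` and `format` URL parameters. The query itself (resources typed `dbo:Bird` with English labels) is only an illustration, and the helper names are made up.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: ten resources typed as birds, with English labels.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?bird ?label WHERE {
  ?bird a dbo:Bird ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def sparql_url(query, endpoint=SPARQL_ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    params = {
        "query": query.strip(),
        "format": "application/sparql-results+json",
    }
    return f"{endpoint}?{urlencode(params)}"


def bindings_to_rows(result):
    """Flatten SPARQL JSON results into dicts of variable name -> value."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

Fetching `sparql_url(QUERY)` with any HTTP client and passing the decoded JSON to `bindings_to_rows` gives one plain dict per result row.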

(Repost of part of my previous comment, still valid, I think)

Wikidata, just like Freebase, is trying to collect structured or semi-structured data instead of unstructured data such as that present in Wikipedia. I am happy about the aim (completely unstructured data is basically useless for any serious data reuse and data extraction) but my fear is that they will not succeed as well as they did with Wikipedia. Wikipedia founded its success on the fact that anybody could edit it. In order to edit a Wikipedia page you only need very low technical skills and basic writing skills (plus knowledge of the topic, obviously). Adding and manipulating structured data requires people to conform to a certain mental grid, to a formalized model, to a schema developed by someone and put in place to be respected strictly. The vast majority of people are easily demotivated when they are required to learn something substantial beforehand, and most of the edits by unskilled users end up removed by watchdogs (something seen often in high quality Wikipedia articles: edits made by new users are quickly reverted on the grounds that they did not follow some of the many guidelines that must be followed).

My idea is that many problems found in structured-data projects (Freebase, MusicBrainz...) could be alleviated by better interfaces and a wide use of automation, both things that Wikipedia projects do not seem to excel in.

Users and computers can help here. Let any unskilled user make a change. There are plenty of editors who just watch the changelog and fix things up for grammar or for NPOV violations, and now that can include incorporating it into the structured data.

In the future, bots using some kind of NLP can scour over the changes and look for new facts to extract and incorporate, either by suggesting features to editors or to just make the change themselves.

For those who are confused (many of the posts currently appear to be), this seems like it is an attempt to semantic-ize Wikipedia-style definitions.

For example, representing a term as: Earth "is a" planet "belongs to" the solar system, "has" [properties X, Y, Z].
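To make that concrete, here is a toy sketch of the subject / predicate / object idea in Python. The `TripleStore` class is made up for illustration and is not how Wikidata stores anything; the Mars entry is added only so the reverse lookup has two hits.

```python
from collections import defaultdict


class TripleStore:
    """Toy store of (subject, predicate, object) statements."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs, in insertion order
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Everything the subject is linked to via this predicate."""
        pairs = self._by_subject.get(subject, [])
        return [o for p, o in pairs if p == predicate]

    def subjects(self, predicate, obj):
        """Every subject that has this (predicate, object) statement."""
        return [
            s for s, pairs in self._by_subject.items()
            if (predicate, obj) in pairs
        ]


store = TripleStore()
store.add("Earth", "is a", "planet")
store.add("Earth", "belongs to", "solar system")
store.add("Mars", "is a", "planet")

print(store.objects("Earth", "is a"))    # ['planet']
print(store.subjects("is a", "planet"))  # ['Earth', 'Mars']
```

The graph view comes from treating each subject and object as a node and each predicate as a labeled edge.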

This has applications in natural language processing and other types of artificial intelligence (for example, within a particular niche, it could be useful in an expert system). Think about this type of thing as a complex graph representing "knowledge" in some abstract but still useful sense.

Also known as faceted classification.[1]

[1] http://en.wikipedia.org/wiki/Faceted_classification

Am I missing something? Wikidata talks big, but apparently it only contains lists of different language versions of Wikipedia articles. (https://www.wikidata.org/wiki/Wikidata:Introduction#What_do_...) There doesn't seem to be any sort of codified plan forward, so I'm not sure what to think of it.

The initial goals don't seem world changing, but maybe at Wikipedia scale they could be:

We are working on three major things:

1. Centralizing language links

2. Providing a central place for infobox data for all Wikipedias

3. Creating and updating list articles based on data in Wikidata
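As a rough illustration of goal 1, here is a sketch of one central item holding the article title for each language, so that every language edition can derive its interlanguage links from the same record. The `Item` class and the `Q-example` id are hypothetical, not Wikidata's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One central record per concept (hypothetical shape)."""

    item_id: str
    # language code -> article title in that language's Wikipedia
    sitelinks: dict = field(default_factory=dict)

    def link(self, lang, title):
        self.sitelinks[lang] = title

    def language_links(self, from_lang):
        """Links to show on the article in from_lang: all other languages."""
        return {
            lang: title
            for lang, title in sorted(self.sitelinks.items())
            if lang != from_lang
        }


earth = Item("Q-example")  # placeholder id, not a real Wikidata id
earth.link("en", "Earth")
earth.link("de", "Erde")
earth.link("fr", "Terre")

print(earth.language_links("en"))  # {'de': 'Erde', 'fr': 'Terre'}
```

Adding a new language means one edit to the central item, instead of one edit on every existing language's article.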

In particular I imagine this will have an initial impact of massively increasing the quality of the lesser used Wiki languages, by leveraging the work on the busier sites.

You might want to dig into the documentation on the Wikimedia Meta site: https://meta.wikimedia.org/wiki/Wikidata

Entry Q13: triskaidekaphobia (the fear of the number 13)


I'm thinking this is a one-stop-shop for public datasets? Maybe for when you're reading an article on something like 'gun violence in urban areas', but want to see stats from multiple studies on the topic. That's my best guess and I can see that solving the problem of trying to reference data from studies in a unified way.

NoScript users, or anyone else seeing lots of numbers like "Q1969448" instead of data values in the tables (articles?): Wikidata apparently requires JavaScript but doesn't make it clear. Allow scripts and refresh.
