The first baby step is interwiki links. Previously, every article in every version of Wikipedia had a long list of links to every other version of that article, maintained by a small army of flaky bots that clogged up edit histories and often stepped on each other's toes. Now, there's a single Wikidata node that holds the single mapping of which article corresponds to which, reflected across all Wikipedia versions. Here's "Tokyo" now:
And here's what it replaces, repeated dozens of times in every language:
The next step will be infoboxes. Instead of every Wikipedia having a separate copy of the population of Tokyo or the GDP of Canada, updated ad hoc by different people whenever they get around to it, there will be a single place storing that data automatically reflected into all Wikipedias.
And it keeps going. Taxonomies of plants and animals change all the time; Wikidata can become their single repository. Wikivoyage currently has to store the phone numbers of each hotel separately in each language version; Wikidata will allow centralizing them. Here's the master plan:
I've spent some time lately building election results maps for Google, and while Wikipedia has been a useful resource, it's also been very frustrating.
There is a wealth of geographic data on Wikipedia, but hardly any of it is in a usable form. Look at all the interesting maps you see there. Somebody generated each of those maps from actual usable data such as shapefiles or KML files. The generated maps are nice to look at, but I can't do anything else with them. I need the shapefiles or equivalent to build other kinds of maps with the data.
For the Brazil election, we had shapefiles for the municipalities which had been mangled by a process that converted all the municipality names to uppercase with no accent marks. I wanted to display the correctly capitalized and accented names and found this list:
This was great! It had all the data I needed. But it's in a jumbled format that looks nice in the wiki but doesn't lend itself to machine use. It took me a few hours to write some Python code to parse the page and get it into a CSV format that I could import into my database. This is a frustrating kind of work, because I was pretty sure that somebody had started with nice tabular data and generated the initial version of this Wikipedia page from that.
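For what it's worth, once the names are out of the page, the matching step is the easy part. Here's a rough sketch (not the code I actually wrote), assuming the shapefile mangling amounted to "strip accent marks, then uppercase". The function names and the sample list are made up for illustration:

```python
import unicodedata


def match_key(name: str) -> str:
    """Strip accent marks and uppercase, mimicking the mangled shapefile names."""
    # NFD splits "ã" into "a" plus a combining tilde, which we then drop.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


def build_lookup(proper_names):
    """Map each mangled key back to the correctly accented name."""
    return {match_key(name): name for name in proper_names}


# Illustrative sample; the real list would come from the parsed wiki page.
proper = ["São Paulo", "Belém", "Goiânia"]
lookup = build_lookup(proper)
print(lookup["SAO PAULO"])
```

Two municipalities whose names differ only by accents would collide on the same key, so a real version would also need the state as part of the key.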
If we're eventually able to get this kind of data as real usable data, that will save a lot of people a lot of work.
1) what need wikidata will fill
2) what they want me to do
3) how I would go about doing (2) (other than "starting the wikidata community", mentioned at the bottom of the page, which sounds like a lot of work).
Maybe I'm not their target audience, but this sure as hell isn't a good elevator pitch.
Which corresponds to "victory", defined as "term that applies to success", and also known as "win" and "success".
Sorry, but what's the need here? This just seems to dilute whatever's going on at Wikipedia.
Where hovering over "San Francisco's population" would give you the data-point from wikidata?
Is this not the level of parsing that Wolfram Alpha is trying to do?
Also, it would be really interesting to see the following:
Assume you create a new Google Doc, and as you begin typing, a small window/pane/pop-up displays relevant information based on the context of what you are typing, with subtle highlighting. As you typed out "San Francisco's population", that phrase would highlight and the context indicator would display that number.
What would be interesting about this is whether children who used a system like this from their early school days would passively absorb such information. Would it be annoying or useful?
I've been thinking along the same lines, but for a study prosthetic. Imagine a head-mounted camera that OCRs as you read. So, when you read "Navier-Stokes", there's a sidebar with everything that you know about Navier-Stokes: equations, code samples, etc.
There's a fair amount of debate over that property. Are those current high level types (person, place, work, event, organization, term) a good fit for a knowledgebase that aims to structure all knowledge and not just library holdings? Does classifying subjects like inertia, DNA, Alzheimer's disease, dog, etc. as simply "terms" make sense?
More reading related to Wikidata, ontology and types: https://blog.wikimedia.de/2013/02/22/restricting-the-world/.
This is also just one aspect of Wikidata. The centrality of shared table content is important, too. Why keep data in one specific language version of Wikipedia and point to it from other versions when you can have a central repository that each language version points to using templates?
There's a strong correspondence between "tabular data" (you probably mean relational) and triples (<predicate,X,Y>). Both are based on first-order predicate logic, so there's actually a natural mapping.
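To make the mapping concrete, here's a minimal sketch, assuming each row has one key column that serves as X and every other column name acts as a predicate. The function names are mine, not from any library:

```python
def rows_to_triples(rows, key):
    """Turn each non-key cell into a (predicate, X, Y) triple."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.append((column, subject, value))
    return triples


def triples_to_rows(triples, key):
    """Group triples by subject to rebuild one row per subject."""
    by_subject = {}
    for predicate, subject, value in triples:
        by_subject.setdefault(subject, {key: subject})[predicate] = value
    return list(by_subject.values())


rows = [
    {"city": "Tokyo", "country": "Japan", "continent": "Asia"},
    {"city": "Ottawa", "country": "Canada", "continent": "North America"},
]
triples = rows_to_triples(rows, key="city")
for triple in triples:
    print(triple)
```

Going back the other way loses nothing here, though a row with only a key column would produce no triples and so would vanish in the round trip.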
This looks like a great response to that. I just hope they've made it easy to interface with.
For instance, imagine a table of all the populations of the countries of the world. Today, someone might make a really good one for the French Wikipedia. But then someone has to make it from scratch, all over again, for the Greek Wikipedia. And when someone updates the French one, the Greek one doesn't update, and vice versa.
With Wikidata you can define the data once, and then transclude it to different pages, with translated labels if necessary.
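A toy sketch of the idea, assuming a central store keyed by item with per-language labels. The store layout, the item key, and the population value are placeholders for illustration, not Wikidata's actual data model or real figures:

```python
# Hypothetical central store: one record per item, shared by every language.
STORE = {
    "item-japan": {
        "labels": {"en": "Japan", "fr": "Japon", "el": "Ιαπωνία"},
        "population": 100,  # placeholder value, not a real figure
    }
}

# Hypothetical per-language labels for the property itself.
PROPERTY_LABELS = {
    "population": {"en": "Population", "fr": "Population", "el": "Πληθυσμός"}
}


def render(item_id, prop, lang):
    """Render one table cell for a given language from the shared store."""
    item = STORE[item_id]
    return f"{item['labels'][lang]} | {PROPERTY_LABELS[prop][lang]}: {item[prop]}"


before = [render("item-japan", "population", lang) for lang in ("fr", "el")]
STORE["item-japan"]["population"] = 200  # a single central update
after = [render("item-japan", "population", lang) for lang in ("fr", "el")]
print(before)
print(after)
```

The French and Greek renderings both pick up the change because neither holds its own copy of the number.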
The first release attacks the problem of "inter-wiki links". On the left-hand side of some Wikipedia pages, there are links to the equivalent pages in different languages. Check out the one for http://en.wikipedia.org/wiki/Jimmy_Wales, for instance. Right now these are updated with a system that looks at every possible connection (scaling at O(n^2)), and with Wikidata it will be more manageable.
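The arithmetic behind that, as a back-of-the-envelope sketch. It assumes that, for one topic, each of n language editions stores a link to every other edition, versus each edition pointing once at a central item. The edition counts in the loop are arbitrary:

```python
def pairwise_links(n: int) -> int:
    """Links stored when every edition links to every other edition."""
    return n * (n - 1)


def hub_links(n: int) -> int:
    """Links stored when every edition points at one central item."""
    return n


for editions in (10, 100, 250):
    print(editions, pairwise_links(editions), hub_links(editions))
```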
from Omnipedia http://brenthecht.com/papers/bhecht_CHI2012_omnipedia.pdf [pdf]:
One major source of ambiguities in the ILL graph is conceptual drift
across language editions. Conceptual drift stems from the well-known
finding in cognitive science that the boundaries of concepts vary
across language-defined communities. For instance, the English
articles “High school” and “Secondary school” are grouped into a
single connected concept. While placing these two articles in the
same multilingual article may be reasonable given their overlapping
definitions around the world, excessive conceptual drift can result
in a semantic equivalent of what happens in the children’s game
known as “telephone”. For instance, chains of conceptual drift
expand the aforementioned connected concept to include the English
articles “Primary school”, “Etiquette”, “Manners”, and even
“Protocol (diplomacy)”. Omnipedia users would be confused to see
“Kyoto Protocol” as a linked topic when they looked up “High
school”. A similar situation occurs in the large connected concept
that spans the semantic range from “River” to “Canal” to “Trench
warfare”, and in another which contains “Woman” and “Marriage”
(although, interestingly, not “Man”).
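The "telephone" effect in that passage is what you get when you take connected components of the link graph. Here's a small sketch using union-find. The English titles come from the quote, but the "xx:" and "yy:" nodes and all the edges are invented for illustration and are not taken from the paper's data:

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical interlanguage links; each one looks reasonable on its own.
edges = [
    ("en:High school", "xx:School A"),
    ("xx:School A", "en:Secondary school"),
    ("en:Secondary school", "yy:School B"),
    ("yy:School B", "en:Primary school"),
    ("en:River", "xx:Waterway"),
]
components = connected_components(edges)
school = next(c for c in components if "en:High school" in c)
print(sorted(school))
```

No single link connects "High school" to "Primary school", yet they end up in the same component, which is the drift the authors describe.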
It's also worth noting that Freebase itself heavily relied on parsing Wikipedia database dumps to build its ontology -- to a large extent Wikidata is giving structure to data that's been in Wikipedia all along.
Wikidata data is dedicated to the public domain, using http://creativecommons.org/publicdomain/zero/1.0/
Most Freebase data is licensed under a CC-BY license. Details are here: http://www.freebase.com/policies/attribution
A CC-BY license can be a burden, if you really want to fulfill all the terms of the license, namely: "You must attribute the work in the manner specified by the author or licensor." If there are 20,00 authors, are you really going to find out how each one wants you to give them attribution? It's impractical, so what you end up doing is giving what you think is reasonable attribution. But you never really know for sure.
Even worse, some of the material in Freebase is under other licenses, such as CC-BY-SA or GFDL.
Q1 - universe
Q2 - Earth
Q3 - life
Q24 - Jack Bauer
Q76 - Barack Obama