
 	Wikidata, the free knowledge base that anyone can edit - a3_nm
https://www.wikidata.org/wiki/Wikidata:Introduction
======
jpatokal
First of all, having worked with wikis for about ten years, finally seeing
this live in Wikipedia is _huge_ -- but it'll take a long time for it to have
full impact.

The first baby step is interwiki links. Previously, every article in every
version of Wikipedia had a long list of links to every other version of that
article, maintained by a small army of flaky bots that clogged up edit
histories and often stepped on each other's toes. Now, there's a single
Wikidata node that has the single mapping of what article corresponds to what,
reflected across all Wikipedia versions. Here's "Tokyo" now:

<http://www.wikidata.org/wiki/Q1490>

And here's what it replaces, repeated dozens of times in every language:

[http://en.wikipedia.org/w/index.php?title=Tokyo&diff=539...](http://en.wikipedia.org/w/index.php?title=Tokyo&diff=539264613&oldid=539032776)

The next step will be infoboxes. Instead of every Wikipedia having a separate
copy of the population of Tokyo or the GDP of Canada, updated ad hoc by
different people whenever they get around to it, there will be a single place
storing that data automatically reflected into all Wikipedias.

And it keeps going. Taxonomies of plants and animals change all the time;
Wikidata can become their single repository. Wikivoyage currently has to store
the phone numbers of each hotel separately in each language version; Wikidata
will allow centralizing them. Here's the master plan:

<http://en.wikipedia.org/wiki/Wikipedia:Wikidata>
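For the curious, each Wikidata item already exposes its sitelinks as JSON (via Special:EntityData, e.g. Q1490.json). Here's a minimal sketch of what the central mapping buys you, using a hand-trimmed stand-in for the real payload rather than a live fetch:

```python
import json

# Hand-trimmed stand-in for the sitelinks section of a Wikidata entity
# (the real thing comes from e.g. Special:EntityData/Q1490.json).
sample_entity = json.loads("""
{
  "id": "Q1490",
  "sitelinks": {
    "enwiki": {"site": "enwiki", "title": "Tokyo"},
    "frwiki": {"site": "frwiki", "title": "Tokyo"},
    "jawiki": {"site": "jawiki", "title": "東京都"}
  }
}
""")

def interwiki_map(entity):
    """One central mapping: wiki edition -> article title."""
    return {link["site"]: link["title"]
            for link in entity["sitelinks"].values()}

print(interwiki_map(sample_entity))
```

Every language edition reads from that one node instead of maintaining its own copy of the list.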

~~~
noahl
That page for Tokyo looks suspiciously like the internals of a natural
language processing system. I can't wait until someone hooks up a
probabilistic speech parser to Wikidata. A talking computer with the knowledge
of Wikipedia could be incredible.

------
Stratoscope
I'm pretty excited about this, if part of the plan is to encourage Wikipedia
contributors to add their data _as data_ instead of as finished output.

I've spent some time lately building election results maps for Google, and
while Wikipedia has been a useful resource, it's also been very frustrating.

There is a wealth of geographic data on Wikipedia, but hardly any of it is in
a usable form. Look at all the interesting maps you see there. Somebody
generated each of those maps from actual usable data such as shapefiles [1] or
KML files [2]. The generated maps are nice to look at, but I can't do anything
else with them. I need the shapefiles or equivalent to build other kinds of
maps with the data.

For the Brazil election, we had shapefiles for the municipalities which had
been mangled by a process that converted all the municipality names to
uppercase with no accent marks. I wanted to display the correctly capitalized
and accented names and found this list:

[http://en.wikipedia.org/wiki/List_of_municipalities_of_Brazi...](http://en.wikipedia.org/wiki/List_of_municipalities_of_Brazil)

This was great! It had all the data I needed. But it's in a jumbled format
that looks nice in the wiki but doesn't lend itself to machine use. It took me
a few hours to write some Python code to parse the page and get it into a CSV
format that I could import into my database. This is a frustrating kind of
work, because I was pretty sure that somebody had _started_ with nice tabular
data and generated the initial version of this Wikipedia page from that.

If we're eventually able to get this kind of data as real usable data, that
will save a lot of people a lot of work.

[1] <http://en.wikipedia.org/wiki/Shapefile>

[2] <http://en.wikipedia.org/wiki/Keyhole_Markup_Language>
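In case it saves anyone else a few hours, here's a rough sketch of that kind of scraping using only the standard library. The table markup below is invented for illustration; the real page is messier:

```python
import csv
import io
from html.parser import HTMLParser

class TableToRows(HTMLParser):
    """Collect <td>/<th> text from each <tr> into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], [], []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
            self._row.append("".join(self._cell).strip())
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

# Invented sample standing in for the wiki-generated table
html = """
<table>
  <tr><th>Municipality</th><th>State</th></tr>
  <tr><td>São Paulo</td><td>SP</td></tr>
  <tr><td>Brasília</td><td>DF</td></tr>
</table>
"""

parser = TableToRows()
parser.feed(html)

out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```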

------
eykanal
I just read through the entire front page and I have no idea:

1) what need wikidata will fill

2) what they want me to do

3) how I would go about doing (2) (other than "starting the wikidata
community", mentioned at the bottom of the page, which sounds like a lot of
work).

Maybe I'm not their target audience, but this sure as hell isn't a good
elevator pitch.

------
danso
What...is... _this_? Is there any organization of these datapoints? All I see
on the front page is links to single datapoints, such as this:

<http://www.wikidata.org/wiki/Q50000>

Which corresponds to "victory", defined as "term that applies to success", and
also known as "win" and "success".

Sorry, but what's the need here? This just seems to dilute whatever's going on
at Wikipedia.

~~~
zeckalpha
Languages don't have bijective mappings of concepts, so this is a hard
problem. Do you have an ontology you'd like to propose?

~~~
danso
No, I mean I was really confused. The term "WikiData" seems to connote data of
the tabular type, like a central repository for public data. Though I'm also
confused at how the mapping for this particular term (for "victory") can't be
done in Wikipedia or Wiktionary.

~~~
zeckalpha
It is, but the problem is whether you are talking about an individual language
version of Wikipedia or the Wikipedia project as a whole. In the article they
talk about the problem of maintaining Interwiki links on each individual
language version, rather than centrally.

This is also just one aspect of Wikidata. The centrality of shared table
content is important, too. Why have data in a specific language version
Wikipedia and point to it from other versions when you can have a central
repository that is pointed to using templates from each language version?

------
wahnfrieden
Cool to see my jQuery tag widget[1] in use here :)

[1] <http://aehlke.github.com/tag-it/>

------
SquareWheel
I'm not big into the Wiki world, but it's always struck me as odd how different
pages refer to the same facts and yet are totally disparate. If one page
updates, does somebody manually have to go update the other page? Does a bot
do it?

This looks like a great response to that. I just hope they've made it easy to
interface with.

~~~
jpatokal
Yup, manual updates and the occasional bot is how it works (or, more often,
doesn't) in the pre-Wikidata world.

------
roryokane
Note that the four-part barcode logo is “WIKI” in Morse code.

------
fasouto
So... it's like a Freebase clone?

~~~
neilk
If I understand it correctly, it has more modest goals. Freebase was trying to
make the uber-map of all data entities. Wikidata is just trying to make data
reuse easier on Wikipedia.

For instance, imagine a table of all the populations of the countries of the
world. Today, someone might make a really good one for the French Wikipedia.
But then someone has to make it from scratch, all over again, for the Greek
Wikipedia. And when someone updates the French one, the Greek one doesn't
update, and vice versa.

With Wikidata you can define the data once, and then transclude it to
different pages, with translated labels if necessary.

The first release attacks the problem of "inter-wiki links". On the left hand
side of some Wikipedia pages, there are the links to equivalent pages in
different languages. Check out the one for
<http://en.wikipedia.org/wiki/Jimmy_Wales>, for instance. Right now these are
updated with a system that looks at every possible connection (scaling at
O(n^2)), and with Wikidata it will be more manageable.
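Back-of-the-envelope: with every edition linking to every other, you maintain n*(n-1) directed link entries, versus one pointer per edition into the central item:

```python
def pairwise_links(n):
    # every language edition links to every other: n*(n-1) directed entries
    return n * (n - 1)

def hub_links(n):
    # each edition stores a single pointer to the central Wikidata item
    return n

# Wikipedia had roughly 280+ language editions at the time
for n in (10, 100, 280):
    print(n, pairwise_links(n), hub_links(n))
```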

~~~
zeckalpha
Interestingly, interwiki links sometimes exhibit a semantic drift effect.

from Omnipedia <http://brenthecht.com/papers/bhecht_CHI2012_omnipedia.pdf>
[pdf]:

""" One major source of ambiguities in the ILL graph is conceptual drift
across language editions. Conceptual drift stems from the well-known finding
in cognitive science that the boundaries of concepts vary across language-
defined communities [13]. For instance, the English articles “High school” and
“Secondary school” are grouped into a single connected concept. While placing
these two articles in the same multilingual article may be reasonable given
their overlapping definitions around the world, excessive conceptual drift can
result in a semantic equivalent of what happens in the children’s game known
as “telephone”. For instance, chains of conceptual drift expand the
aforementioned connected concept to include the English articles “Primary
school”, “Etiquette”, “Manners”, and even “Protocol (diplomacy)”. Omnipedia
users would be confused to see “Kyoto Protocol” as a linked topic when they
looked up “High school”. A similar situation occurs in the large connected
concept that spans the semantic range from “River” to “Canal” to “Trench
warfare”, and in another which contains “Woman” and “Marriage” (although,
interestingly, not “Man”). """

------
pella
Atlantic.com (Apr2012) _"The Problem With Wikidata"_

[http://www.theatlantic.com/technology/archive/2012/04/the-pr...](http://www.theatlantic.com/technology/archive/2012/04/the-problem-with-wikidata/255564/)

------
mkx
It will be interesting to see whether this opens up the web to more linked-data
trends.

------
whoisstan
This could be a godsend for data exchange and knowledge structures. If data
subjects are labeled with Wikidata nouns, they could be much more
interchangeable. RDF should have gone that route a long time ago.

------
andreyf
Populating this by hand seems like an enormous task that will never end. How
hard could it be to automatically populate the data from publicly available
data sources (e.g. SEC filings)?

~~~
nightrose
That's actually already happening and will happen more in the future.

------
Aissen
There's already <http://dbpedia.org>, but it's nice to see the Wikimedia
foundation finally take matters into their own hands.

------
MetalMASK
Let's see, is it similar to WordNet but with open access (i.e., anyone can
edit)?

~~~
lignuist
Wordnet is more a linguistical ressource, that focusses on word senses. You
would not find things like POPULATION OF CITY-X there.

------
ctdonath
Assurances of accuracy?

~~~
tommorris
Each statement will have the ability to have sources. This is not currently
supported by the UI (hence everyone is being a bit tentative and only putting
in really obvious, uncontentious things), but when it is, it'll basically
contain expressions of the form "X has property Y with a value of Z (type T),
according to sources A, B and C".
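A rough way to picture that data model (the names and figure below are illustrative, not Wikidata's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """'X has property Y with a value of Z (type T), per sources A, B, C.'"""
    item: str                # X, e.g. "Q1490" (Tokyo)
    prop: str                # Y, e.g. "population"
    value: object            # Z
    value_type: str          # T
    sources: list = field(default_factory=list)  # A, B and C

s = Statement("Q1490", "population", 13000000, "quantity",
              sources=["Tokyo Metropolitan Government estimate"])
print(s.prop, s.value, s.sources)
```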

------
stck
Where does the fact numbering come from?

Q1 - universe

Q2 - Earth

Q3 - life

...

Q24 - Jack Bauer

...

Q76 - Barack Obama

~~~
tommorris
It's sequential.

~~~
nightrose
Right. With possibly some easter eggs ;-)

