An Ambitious Wikidata Tutorial [pdf] (wikimedia.org)
29 points by amirouche on Oct 18, 2015 | hide | past | favorite | 9 comments


Author here, ask me anything! Slides are also available at http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutoria....


Are there any open experiments to integrate Wikidata with AI systems of any kind?


Yes. Kian and WikiBrain are two such projects. Kian is an artificial neural network designed to serve Wikidata, e.g. for classifying humans based on content in Wikipedia [1, 2]. WikiBrain uses Wikidata to recognize the type of relationships or connections between Wikipedia concepts [3, 4].

I suspect larger applications of Wikidata in AI will follow. For example, as of 2010, IBM Watson acquired at least some of its content from DBpedia and YAGO [5], which ultimately derive much of their content from scraping Wikipedia's infoboxes and category system. As of 2015, Wikidata is supplying data for some Wikipedia infoboxes, and the proportion of infoboxes that pull from structured data in Wikidata will increase over time. I also expect Wikipedia's category system will gradually be supplanted by Wikidata's more expressive property system.

Thus, I imagine Wikidata will form a semantic backbone for Q&A systems like Watson in the future.

The Wikidata development team's work is funded through donations from the Allen Institute for Artificial Intelligence, Google, the Gordon and Betty Moore Foundation, and Yandex [6]. So organizations with an interest in AI see potential in Wikidata.

1. https://github.com/Ladsgroup/Kian

2. http://ultimategerardm.blogspot.com/2015/09/wikidata-ten-que...

3. https://github.com/shilad/wikibrain

4. http://conservancy.umn.edu/bitstream/handle/11299/163269/Und...

5. http://www.aaai.org/Magazine/Watson/watson.php

6. http://cacm.acm.org/magazines/2014/10/178785-wikidata/fullte...


There is also Platypus, a small question answering engine based on Wikidata: http://askplatyp.us


Is Wikidata only for notable data or any data? If only notable data is allowed, then how is that enforced? Who decides if some piece of data is notable?

Can data be permanently, unrecoverably deleted, or is it more like Wikipedia where you can usually go back to see text that was deleted (in older versions of an article)?

The way that units are being handled is troubling. Is the plan to assign a unique integer to every unit that's ever been used? That's a long list of integers.


> Is Wikidata only for notable data or any data?

Wikidata is only for notable data, but the notability threshold is much lower than that for Wikipedia. The criteria for notability are described at [1]. For example, we might add items for all known pathogenic genetic variants, but likely would not have an item for the fire hydrant on your street.

For things not notable enough for Wikidata, interested users could install a local instance of Wikibase [2], the software that runs Wikidata. Wikidata editors and administrators determine what is notable, and have places like [3] to discuss questionable cases.

> Can data be permanently, unrecoverably deleted, or is it more like Wikipedia

Wikidata works like Wikipedia in that regard. Previous versions of a given item or property are almost always viewable (and recoverable) through the History tab [e.g. 4]. In extraordinary cases, like a vandal posting sensitive information about a person, data can be hidden from normal view and/or actually deleted.

> Is the plan to assign a unique integer to every unit that's ever been used?

Basically yes, to my understanding. How many units do you think exist? We already have items for many units, e.g. meter, micrometer, nanometer, foot, yard, bit, byte, gigabyte, etc. I can see how this implementation might seem naive; perhaps we could instead represent one standard base unit (like meter or byte) and handle conversions of scale (kilometer, millimeter, etc.) through some mechanism we don't have in place right now. Consider also asking about this on the "Contact the Development Team" page [5].
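To illustrate the scale-factor alternative mentioned above, here is a minimal Python sketch. The QIDs used (Q11573 = metre, Q828224 = kilometre, Q174789 = millimetre) are, to my knowledge, the real Wikidata items for those units, but the conversion table itself is purely hypothetical and is not how Wikidata actually models units:

```python
# Hypothetical sketch: map unit items to a factor relative to a base
# unit (here the metre), rather than treating every scaled unit as an
# unrelated identifier. This is an illustration, not Wikidata's data model.
TO_METRES = {
    "Q11573": 1.0,       # metre
    "Q828224": 1000.0,   # kilometre
    "Q174789": 0.001,    # millimetre
}

def convert(value, from_qid, to_qid):
    """Convert a length between units via the metre as base unit."""
    return value * TO_METRES[from_qid] / TO_METRES[to_qid]

# convert(2.5, "Q828224", "Q174789") -> roughly 2.5 million millimetres
```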

1. https://www.wikidata.org/wiki/Wikidata:Notability

2. http://wikiba.se/

3. https://www.wikidata.org/wiki/Wikidata:Requests_for_deletion...

4. https://www.wikidata.org/w/index.php?title=Q1&action=history

5. https://www.wikidata.org/wiki/Wikidata:Contact_the_developme...


Good tutorial! Wikidata is an awesome initiative for Wikimedia and I would love to see all of Wikipedia's structured data powered by Wikidata. Wikidata is much easier to interface with than Wikipedia infoboxes!


IIRC infoboxes should be powered by wikidata at some point.


Wikidata's new SPARQL service is probably the most useful topic in this tutorial for software developers and anyone interested in the Semantic Web. It allows one to query the vast, free knowledge base that backs Wikipedia -- almost 15 million entities and over 70 million statements.

Example queries:

* Politicians who died of cancer (of any type): https://query.wikidata.org/#PREFIX%20wikibase%3A%20%3Chttp%3...

* Who discovered the most planets? https://query.wikidata.org/#PREFIX%20wikibase%3A%20%3Chttp%3...

* Largest cities with a female mayor: http://query.wikidata.org/#PREFIX%20wikibase%3A%20%3Chttp%3A...

More Wikidata SPARQL query examples: https://www.mediawiki.org/wiki/Wikibase/Indexing/SPARQL_Quer....
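Queries like the first example above can also be run programmatically against the endpoint. A minimal Python sketch using only the standard library (the identifiers P106 = occupation, Q82955 = politician, P509 = cause of death, P279 = subclass of, Q12078 = cancer are, to my knowledge, the real Wikidata IDs, and the query service predefines the wd:/wdt: prefixes; verify both before relying on them):

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Politicians (occupation P106 = politician Q82955) whose cause of
# death (P509) is cancer (Q12078) or any of its subclasses (P279*).
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q82955 ;
          wdt:P509 ?cod .
  ?cod wdt:P279* wd:Q12078 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

def run_query(query):
    """POST a SPARQL query to the endpoint and return the parsed JSON."""
    data = urllib.parse.urlencode({"query": query, "format": "json"}).encode()
    req = urllib.request.Request(
        ENDPOINT, data=data,
        headers={"User-Agent": "wikidata-sparql-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires network access):
# for row in run_query(QUERY)["results"]["bindings"]:
#     print(row["personLabel"]["value"])
```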



