
Wikidata - kjhughes
https://www.wikidata.org/wiki/Wikidata:Main_Page
======
kenjackson
I was just thinking about something like this today (or at least what I think
this might be, since I really can't tell from the page).

Here is what I want... I want a way to query and download Wikipedia metadata
easily.

For example, I recently saw the Burke Museum released a Wildflower App, but
none for Windows Phone. I was thinking I could use Wikipedia to build a very
similar app, but I needed to get access to hierarchical metadata (flower type,
color, region of the world, environment, etc...). I suspect it's largely there
in Wikipedia, but it's not structured (or at least I don't know how to query
it).

I can imagine this being of use for a lot of things: plants, birds, fish,
basketball, football, countries, presidents, trees, foods, etc...

Has this been done already? Is it a solved problem with Wikipedia already?

~~~
mjn
That is the direction Wikidata is intending to go, yes. But it doesn't have a
lot of semantic data in it yet.

In the meantime, some other options that might be useful:

There's another Wikimedia Foundation project specifically for species
taxonomy: <http://species.wikimedia.org>

As 'riffraff mentioned, there's a project, DBpedia, which is based on
extracting structured data by parsing Wikipedia infoboxes:
<http://dbpedia.org/About>

For certain kinds of queries the Wikipedia category system already provides
some semi-structured data, e.g. you can consult
<https://en.wikipedia.org/wiki/Category:Fictional_parrots> to get a list of
some famous fictional parrots. You can query by category programmatically
using the MediaWiki API <http://www.mediawiki.org/wiki/API:Main_page>
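
To make the category suggestion concrete, here is a minimal sketch in Python. It assumes the standard `action=query` / `list=categorymembers` module of the MediaWiki API; the helper names (`category_members_url`, `member_titles`) and the sample response are made up for illustration, and the sample is hand-written in the shape the API returns, not real output.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint; other wikis expose the same path.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=50):
    """Build a MediaWiki API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def member_titles(response):
    """Pull the page titles out of a decoded categorymembers response."""
    return [page["title"] for page in response["query"]["categorymembers"]]


# Hand-written sample in the shape the API returns (hypothetical data).
sample_response = {
    "query": {
        "categorymembers": [
            {"pageid": 1, "ns": 0, "title": "Example parrot A"},
            {"pageid": 2, "ns": 0, "title": "Example parrot B"},
        ]
    }
}

url = category_members_url("Fictional_parrots")
titles = member_titles(sample_response)
```

To run it against the live site you would fetch `url` (for instance with `urllib.request.urlopen`) and pass the decoded JSON to `member_titles`. Large categories are paged, so a real client would also need to follow the continuation token the API hands back.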

~~~
saraid216
> 'riffraff

Completely OT, but is this an established HN convention for marking usernames?
I've noticed it here and there, and it seems like a goodish idea.

~~~
gruseom
Well, not really. When you quote a username like this, it means you're talking
about the name—the symbol—rather than about the user. So this makes sense:

    
    
      'riffraff is eight letters long
    

But this does not make sense:

    
    
      'riffraff mentioned the project
    

... since names don't talk.

~~~
tptacek
You're being more philosophical about it than I am.

I just wanted to be able to refer to users with names like "the" without
putting scare quotes around the names. What I wanted, in effect, was to
prevent evaluation of the names in the sentence.

_Not_ somehow quoting HN names feels even weirder; writing sentences like
"anonymous says we should use the smoked paprika" or "noonespecial stole our
bucket of mashed potatoes" just feels wrong.

I'm happy for it to be an idiosyncrasy of mine, though.

~~~
gruseom
But why quote the name? "gruseom wrote a comment to which tptacek replied" is
no different from "Daniel did this and Thomas did that". Is it? If you don't
evaluate the names, you end up talking about the names themselves.

(Not being pedantic, just curious.)

~~~
tptacek
Because some people on HN have names that are ambiguous in sentence context.
"gruseom" and "tptacek" aren't ambiguous; "the" and "noonespecial" are.

~~~
gruseom
Ahhhhh, I get it now. Yes, ambiguity is an issue.

My son used to play an FPS that used scrolling text to report on what actions
were happening. If you shot a player named, say, "gruseom", the scrolling text
would read:

    
    
      You killed gruseom.
    

One day I noticed that some wag had picked the username
"yourownmotherhowcouldyou".

Edit: You're basically using ' the way that @ migrated into general usage from
Twitter. But the Lisp ' is so burned into my memory that it was surprisingly
hard to see that. I do prefer your ' to @; it's not as heavy.

------
tzury

        Universe
    

<https://www.wikidata.org/w/index.php?title=Q1>

    
    
        Earth

<https://www.wikidata.org/w/index.php?title=Q2>

    
    
        Life

<https://www.wikidata.org/w/index.php?title=Q3>

~~~
josso

      Douglas Adams
    

<https://www.wikidata.org/w/index.php?title=Q42>

------
gioele
(Repost of part of my previous comment, still valid, I think)

Wikidata, just like Freebase, is trying to collect structured or semi-
structured data instead of unstructured data such as that present in
Wikipedia. I am happy about the aim (completely unstructured data is basically
useless for any serious data reuse and data extraction) but my fear is that
they will not succeed as well as they did with Wikipedia. Wikipedia built its
success on the fact that anybody could edit it. In order to edit a Wikipedia
page you only need very basic technical and writing skills (plus
knowledge of the topic, obviously). Adding and manipulating structured data
requires people to conform to a certain mental grid, to a formalized model, to
a schema developed by someone and put in place to be respected strictly. The
vast majority of people are easily demotivated when they are required to learn
something substantial beforehand, and most of the edits by unskilled users end
up removed by watchdogs (something seen often in high-quality Wikipedia
articles: edits made by new users are quickly reverted on the grounds that
they did not follow some of the many guidelines that must be followed).

My idea is that many problems found in structured-data projects (Freebase,
MusicBrainz...) could be alleviated by better interfaces and a wide use of
automation, both things that Wikipedia projects do not seem to excel in.

~~~
epaulson
Users and computers can help here. Let any unskilled user make a change. There
are plenty of editors who just watch the changelog and fix things up for
grammar or for NPOV violations, and now that can include incorporating it into
the structured data.

In the future, bots using some kind of NLP can scour the changes and look
for new facts to extract and incorporate, either by suggesting features to
editors or by just making the changes themselves.

------
gburt
For those who are confused (many of the posts currently appear to be), this
seems like it is an attempt to semantic-ize Wikipedia-style definitions.

For example, representing a term as: Earth "is a" planet "belongs to" the
solar system, "has" [properties X, Y, Z].

This has applications in natural language processing and other types of
artificial intelligence (for example, within a particular niche, it could be
useful in an expert system). Think about this type of thing as a complex graph
representing "knowledge" in some abstract but still useful sense.
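
A rough sketch of that "complex graph" idea, in Python: each statement is a (subject, predicate, object) triple, and queries are just filters over the list. The predicate strings come from the example above; the Mars rows, the function names, and the in-memory layout are my own assumptions for illustration, not how Wikidata actually stores anything.

```python
from collections import defaultdict

# (subject, predicate, object) statements. The Earth rows mirror the example
# above; the Mars rows are added only so the query has two matches.
triples = [
    ("Earth", "is a", "planet"),
    ("Earth", "belongs to", "Solar System"),
    ("Mars", "is a", "planet"),
    ("Mars", "belongs to", "Solar System"),
]


def build_index(statements):
    """Group statements as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in statements:
        index[subject][predicate].add(obj)
    return index


def subjects_with(statements, predicate, obj):
    """Return, sorted, every subject that has the given predicate and object."""
    return sorted(s for s, p, o in statements if p == predicate and o == obj)


index = build_index(triples)
planets = subjects_with(triples, "is a", "planet")
```

Here `planets` holds every subject stated to be a planet, and `index["Earth"]` gives everything asserted about Earth. A real knowledge base would use stable identifiers for items and properties instead of bare strings, which is what makes it usable across languages.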

~~~
anvandare
Also known as faceted classification.[1]

[1] <http://en.wikipedia.org/wiki/Faceted_classification>

------
bendmorris
Am I missing something? Wikidata talks big, but apparently it only contains
lists of different language versions of Wikipedia articles.
(<https://www.wikidata.org/wiki/Wikidata:Introduction#What_do_we_have_so_far.3F>)
There doesn't seem to be any sort of codified plan forward, so I'm not sure
what to think of it.

~~~
ZeroGravitas
The initial goals don't seem world changing, but maybe at Wikipedia scale they
could be:

We are working on three major things:

1\. Centralizing language links

2\. Providing a central place for infobox data for all Wikipedias

3\. Creating and updating list articles based on data in Wikidata

In particular I imagine this will have an initial impact of massively
increasing the quality of the lesser used Wiki languages, by leveraging the
work on the busier sites.

------
gioele
Previous HN discussion (50 comments):
<https://news.ycombinator.com/item?id=3775212>

Commentary by The Atlantic:
<http://www.theatlantic.com/technology/archive/2012/04/the-problem-with-wikidata/255564/>

------
kristopher
Entry Q13: triskaidekaphobia (the fear of the number 13)

<https://www.wikidata.org/w/index.php?title=Q13>

------
csmatt
I'm thinking this is a one-stop-shop for public datasets? Maybe for when
you're reading an article on something like 'gun violence in urban areas', but
want to see stats from multiple studies on the topic. That's my best guess and
I can see that solving the problem of trying to reference data from studies in
a unified way.

------
kissickas
NoScript users, or anyone else who sees lots of numbers like "Q1969448"
instead of data values in the tables (articles?): Wikidata apparently requires
JavaScript but doesn't make it clear. Allow scripts and refresh.
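
If you would rather resolve those Q-numbers without a browser, something like the sketch below might work. It assumes the `wbgetentities` module of the Wikidata API with `props=labels`; the helper names and the sample response are hypothetical, and the sample is hand-written in the shape I understand the API to return, with the second entity deliberately left without a label.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def labels_url(ids, language="en"):
    """Build a wbgetentities URL asking only for labels in one language."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # multiple ids are pipe-separated
        "props": "labels",
        "languages": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def extract_labels(response, language="en"):
    """Map each entity id to its label, or None when no label is present."""
    labels = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(language)
        labels[entity_id] = label["value"] if label else None
    return labels


# Hand-written sample (hypothetical data, not a recorded API response).
sample_response = {
    "entities": {
        "Q42": {"labels": {"en": {"language": "en", "value": "Douglas Adams"}}},
        "Q1969448": {"labels": {}},
    }
}

url = labels_url(["Q42", "Q1969448"])
labels = extract_labels(sample_response)
```

Fetching `url` and decoding the JSON would give you the real input for `extract_labels`; items with no label in the requested language come back as `None` here, so the caller can fall back to showing the raw id.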

