
Show HN: A database of everything (over 55M keys) - outpan
https://www.outpan.com/
======
syats
There are some more complex and curated projects out there, among them:

[https://www.wikidata.org/wiki/Wikidata:Main_Page](https://www.wikidata.org/wiki/Wikidata:Main_Page)

~~~
singularity2001
+1001 for Wikidata

compare
[https://www.outpan.com/data?k=germany](https://www.outpan.com/data?k=germany)

to
[http://netbase.pannous.com/html/verbose/germany](http://netbase.pannous.com/html/verbose/germany)
(wikidata mirror)

or
[https://www.wikidata.org/w/index.php?search=Germany](https://www.wikidata.org/w/index.php?search=Germany)

~~~
hjalle
Netbase/pannous seems cool, but I guess it still has some work to do, as
it's suggesting Bonn as the capital of Germany, unless I'm misinterpreting the
statements. Kind of a weird result from Google when you search for "bonn
capital germany": for me, it displays the wiki summary of Berlin but the link
goes to Bonn. Does it usually work like that?

Edit: It was the Bonn summary from Wikipedia

~~~
singularity2001
Bonn was the capital of Germany for many years.

------
GuiA
Signup page is blanking out for me.

Really curious to try it. It's a really neat idea. Has this been done before?
I've never seen anything like it.

The full value of this would likely come from interesting, productive,
insightful visualizations of the underlying graph that is being built.

Questions that come to my mind:

\- What if you write bots that scrape Wikipedia, Twitter, etc. and output
entries from semantic analysis performed on these sources?

\- If many people write such bots, how similar would the graphs be? What are
the parameters that determine graph overlap?

\- Can you use this to tie in to the real world?

 _" A key can be any unique string such as product barcodes, book ISBNs, email
addresses, URLs, domain names, names, phone numbers ..."_

Interesting stuff there... A way to make this into an interesting social
network is if people curated their own graphs, e.g. of books and webpages and
favorite restaurants, and other people could browse these graphs in a read
only mode. (perhaps they can clone it to their graphs or so and then start to
edit them)

The temporal aspect might be interesting too. Would there be value in seeing
how my graph has changed over time? When I was in academia, none of the tools
to keep track of the papers I read suited me. I could see this working well
for this use case scenario.

(if you could write a Prolog against this, would it have interesting
properties?)

I see that a submitter just posted 7630028603780 -> volume -> 75ml (the first
number being probably the bar code of some beverage), that's neat.

Someone else just posted the following entry:

 _5010438013621 -> ingredients -> Water, sugar, mixed fruit juices from
concentrate 10% (Grape, blackcurrant, raspberry), acid (Citric acid), Vimto
flavouring (Includes natural extracts of fruits, herbs, barley malt and
spices), colouring food (Concentrate of carrot, hibiscus), acidity regulator
(Sodium citrate), preservatives (Sodium benzoate, Potassium sorbate), Vitamin
C, Sweeteners (Sucralose, Acesulfame K)._

Now what the site should be doing is converting this to something like:

 _5010438013621 -> ingredients -> water_

 _5010438013621 -> ingredients -> sugar_

 _5010438013621 -> ingredients -> vitamin C_

...
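A minimal sketch of that conversion, assuming the value is a comma-separated list in which commas inside parentheses belong to a single item. The helper names `split_top_level` and `to_triples` are made up for illustration and are not part of Outpan:

```python
def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def to_triples(key: str, attribute: str, raw_value: str) -> list[tuple[str, str, str]]:
    """Emit one (key, attribute, value) triple per top-level list item."""
    return [(key, attribute, item) for item in split_top_level(raw_value.rstrip("."))]


# Shortened version of the ingredients entry quoted above.
raw = (
    "Water, sugar, mixed fruit juices from concentrate 10% "
    "(Grape, blackcurrant, raspberry), acid (Citric acid), Vitamin C"
)
triples = to_triples("5010438013621", "ingredients", raw)
for triple in triples:
    print(" -> ".join(triple))
```

This keeps the original casing and leaves parenthesised sub-lists attached to their parent item; whether to lowercase values or split the sub-lists further is a separate design decision.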

(Outpan person/people, I'm in SF, if you want to chat more around coffee,
there's an email in my profile)

~~~
exogen
> Has this been done before?

Depends what you mean by "this"! RDF [1] and most of the technology
surrounding it and the "Semantic Web" are based on (subject, predicate,
object) triples almost exactly like this, where each element is often a URI,
and objects are often strings just like they are here.

It even has taken this idea to the next level where the statements expressed
by such a triple can themselves be given an "anonymous" ID, which can then be
used as a subject or object – meaning you can make meta statements about the
statement itself, all while still using this simple system of triples.
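As a rough illustration of statements about statements, here is a toy in-memory sketch loosely modelled on RDF reification. The `_:stmtN` ID scheme, the `add_statement` helper and the `added_by` attribute are assumptions made up for this example; only the `rdf:subject` / `rdf:predicate` / `rdf:object` names come from the RDF vocabulary:

```python
import itertools

# Toy in-memory store; each entry is a (subject, predicate, object) tuple.
triples: list[tuple[str, str, str]] = []
_counter = itertools.count(1)


def add_statement(subject: str, predicate: str, obj: str) -> str:
    """Store a triple plus triples describing it; return the statement's ID."""
    statement_id = f"_:stmt{next(_counter)}"  # hypothetical anonymous-ID scheme
    triples.append((subject, predicate, obj))
    # Describe the statement itself using ordinary triples.
    triples.append((statement_id, "rdf:subject", subject))
    triples.append((statement_id, "rdf:predicate", predicate))
    triples.append((statement_id, "rdf:object", obj))
    return statement_id


sid = add_statement("7630028603780", "volume", "75ml")
# A meta statement: the statement ID is now the subject of another triple.
triples.append((sid, "added_by", "some_user"))  # hypothetical attribute and value

for triple in triples:
    print(triple)
```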

There are even entire languages built around querying graphs of such triples:
[https://www.w3.org/TR/sparql11-query/](https://www.w3.org/TR/sparql11-query/)

DBpedia [2] is one such project that attempts to encode data from Wikipedia in
triples like this; their About page says that the 2014 version of the database
had 3 billion triples, so that number is probably much higher now. Here's a
preview if you want to see what these triples look like:

• Homepages of things:
[http://downloads.dbpedia.org/preview.php?file=2015-10_sl_cor...](http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_homepages_en.ttl.bz2)

• Genders of things:
[http://downloads.dbpedia.org/preview.php?file=2015-10_sl_cor...](http://downloads.dbpedia.org/preview.php?file=2015-10_sl_core-i18n_sl_en_sl_genders_en.ttl.bz2)

etc. You'll notice that RDF predicates are all namespaced by URIs; that way
you can unambiguously know in what sense "homepage" and "gender" are used
(consider more ambiguous properties like "length"). That means there can be
other uses of "homepage", "gender", "length" etc. that mean different things,
and those will be namespaced by a different URI.
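A small sketch of why the namespacing matters, using two made-up `example.org` namespaces (these are not real DBpedia vocabularies) that both define a "length" property:

```python
# Hypothetical namespaces, used only for illustration.
FILM = "http://example.org/film#"
ROAD = "http://example.org/road#"

triples = [
    ("http://example.org/thing/Some_Film", FILM + "length", "120 minutes"),
    ("http://example.org/thing/Some_Road", ROAD + "length", "42 km"),
]


def values_for(predicate: str) -> list[str]:
    """Return the objects of all triples that use exactly this predicate URI."""
    return [obj for _, pred, obj in triples if pred == predicate]


film_lengths = values_for(FILM + "length")
road_lengths = values_for(ROAD + "length")
print(film_lengths)
print(road_lengths)
```

Because the full URI is the predicate, a query for one sense of "length" never picks up values recorded under the other.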

Anyway, this Outpan project is obviously a more loose and freeform version of
that – but only slightly; RDF is not very strict at all, it's just that people
have thought a lot about how to successfully model the entire world's
information, and so real-world RDF ontologies end up looking somewhat
complicated. I'm not sure if a freeform version like this has been widely
attempted before.

[1]
[https://en.wikipedia.org/wiki/Resource_Description_Framework](https://en.wikipedia.org/wiki/Resource_Description_Framework)
[2] [http://wiki.dbpedia.org/](http://wiki.dbpedia.org/)

~~~
exogen
To borrow a subject matter that's currently popular on the Outpan homepage,
here are the first 500 facts DBpedia knows about Donald Trump:

    
    
        SELECT DISTINCT ?property ?value WHERE {
            <http://dbpedia.org/resource/Donald_Trump> ?property ?value
        } LIMIT 500
    

Results: [http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbp...](http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+DISTINCT+%3Fproperty+%3Fvalue+WHERE+%7B%0D%0A++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FDonald_Trump%3E+%3Fproperty+%3Fvalue%0D%0A%7D+LIMIT+500&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on)
(although, note, not every dataset they have is loaded into their SPARQL
endpoint)
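For anyone who wants to build such a results URL programmatically, here is a minimal sketch using only the Python standard library. It reuses the `default-graph-uri`, `query` and `format` parameters visible in the link above and drops the others; `build_query_url` is a name made up for this example, and the code only constructs the URL without contacting the endpoint:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """SELECT DISTINCT ?property ?value WHERE {
    <http://dbpedia.org/resource/Donald_Trump> ?property ?value
} LIMIT 500"""


def build_query_url(query: str, fmt: str = "text/html") -> str:
    """Percent-encode the query and attach it to the endpoint URL."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": fmt,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_query_url(QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == QUERY)
```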

As you can see, there are a lot of metadata-type properties, but scroll down and
you can see his birthdate, children, alma mater, etc.

This page is just a prettified version of that data:
[http://dbpedia.org/page/Donald_Trump](http://dbpedia.org/page/Donald_Trump)

~~~
outpan
This is great! thanks. I will look into adding the dbpedia.org data.

~~~
exogen
Check out [https://www.wikidata.org/](https://www.wikidata.org/) for another
similar project with additional data! Their keys tend to be more opaque [1],
but otherwise it's a very similar approach.

[1] e.g. the key for "Earth" is Q2:
[https://www.wikidata.org/wiki/Q2](https://www.wikidata.org/wiki/Q2)

------
yawniek
does silicon valley now suddenly discover the semantic web and triplestores :O

if you want really all the data: [http://lod-cloud.net/](http://lod-cloud.net/)

but still, neat project.

~~~
outpan
Semantic web is too good of an idea not to iterate on constantly :)

------
ethernetsalad
As much as I like the idea, I can already see someone called "Drumpf" on the
front page linking attributes about Hillary Clinton and Donald Trump to their
Twitter accounts and not their names. I guess if you're after "everything"
then curation makes no sense but you'll end up with a bunch of nonsense.

~~~
outpan
Twitter url seems like a strong key since it is referenced in many other
contexts.

As for curation, it is intended to provide examples of what key, attr and
values are regardless of their content. This is an experimental feature and
might be removed/tweaked...

------
tisryno
Great concept but I can see it falling out of line very quickly, on the
homepage I spotted "berlin -> country -> germany" Followed by "england ->
capital -> London"

If you search for the key "germany" it has no results, if you search "london"
it finds no results.

The fluidity of the data is definitely a hindrance, if you wanted to use the
dataset you'd have to already know what you are looking for to find the value.
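One way to soften this, sketched under the assumption that the data is a flat list of (key, attribute, value) entries: index every entry under both its key and its value, and compare case-insensitively. This is a guess at a possible approach, not a description of how Outpan's search works:

```python
from collections import defaultdict

Triple = tuple[str, str, str]

# The two homepage entries mentioned above.
triples: list[Triple] = [
    ("berlin", "country", "germany"),
    ("england", "capital", "London"),
]


def normalize(text: str) -> str:
    """Case-insensitive, whitespace-trimmed form used for matching."""
    return text.strip().casefold()


by_key: dict[str, list[Triple]] = defaultdict(list)
by_value: dict[str, list[Triple]] = defaultdict(list)
for triple in triples:
    key, _, value = triple
    by_key[normalize(key)].append(triple)
    by_value[normalize(value)].append(triple)


def search(term: str) -> list[Triple]:
    """Return entries where the term matches either the key or the value."""
    needle = normalize(term)
    results = list(by_key.get(needle, []))
    results += [t for t in by_value.get(needle, []) if t not in results]
    return results


print(search("germany"))
print(search("london"))
```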

------
erikb
Is 55 million keys a lot for tracking "everything"? My expectation was that
"everything" would need more than 55 trillion keys.

~~~
outpan
"a journey of a thousand miles begins with a single step"

~~~
nenadg
"a journey of a everything begins with a 55M keys"

~~~
outpan
haha. We should use this as the project mantra.

~~~
nenadg
I agree :-) you are welcome

------
jrochkind1
So it's RDF without the globally unique identifiers?

------
kristopolous
So searches like "Linux" "California" "lincoln" "Hitler" "Disney" and "red"
return 0 results

~~~
outpan
I was driving so couldn't stay on the phone. The number of keys for different
concepts is just very large (if not infinite for the sake of avoiding
philosophical debates). It will take some time to have enough data to cover
even popular keys. Fortunately I have a lot of time :D

------
jitl
What sort of backend storage does this use?

~~~
Arcsech
Given that it's effectively key->key->value, I'm guessing Cassandra for the
main backend. The data model fits very well, and it would give you the kind of
scalability you would need for this sort of thing.
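To make the key->key->value shape concrete, here is a toy in-memory sketch with a nested mapping: the outer key plays roughly the role a partition key would in a wide-column store, the attribute the role of a clustering column. The `put` and `get` names are made up, and nothing here reflects Outpan's actual backend:

```python
from collections import defaultdict

# key -> attribute -> list of values
store: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))


def put(key: str, attribute: str, value: str) -> None:
    """Record one value under (key, attribute)."""
    store[key][attribute].append(value)


def get(key: str) -> dict[str, list[str]]:
    """Return all attributes for a key without creating empty entries on read."""
    return {attr: list(values) for attr, values in store.get(key, {}).items()}


put("7630028603780", "volume", "75ml")
put("berlin", "country", "germany")

print(get("7630028603780"))
print(get("no-such-key"))
```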

~~~
smarx007
Given how ridiculously basic the website is, I think you're right. It should
have been a real triplestore with a SPARQL endpoint though.

~~~
bpicolo
There are whole databases built around the concept:
[http://www.datomic.com/](http://www.datomic.com/)

~~~
smarx007
I browsed the website, but couldn't grasp what the main value of that
product is or how it could be useful for the showcased website. Could you please
educate me a bit?

------
0xmohit
Is it possible to look up all the attributes for a given value, say Trump?

------
ivoras
Shame it falls for the trap of natural language ambiguity. So on the front
page I see "England -> capital -> London" and at a glance I thought the
capital (as in money) flows from England and is accumulated in London.

~~~
OJFord
I agree in general (and it's the fault of freely user-definable
keys/attributes) - but that's not a great example, since the arrows aren't read
as a direction of flow.

Perhaps that reveals that a forward slash might be a better separator though -
like a URI.

------
firewalkwithme
The header looks like fastmail :)

