Ask HN: What happened to the semantic web? - drdrey
======
throwaway2016a
The Semantic Web Meetup happens at MIT in Cambridge, MA occasionally:
[https://www.meetup.com/The-Cambridge-Semantic-Web-Meetup-
Gro...](https://www.meetup.com/The-Cambridge-Semantic-Web-Meetup-Group/)

Though it seems mostly cancelled / sporadic now, it had a lot of interesting
people presenting on interesting academic uses of the semantic web / RDF /
etc.

A couple of the times I went, Tim Berners-Lee himself was there too. He's an
interesting guy to meet.

Overall though, I think that for business reasons (companies really are not
incentivized to share) it has mostly caught on only in academia. One shining
example of wider adoption is "microformats", which caught on because companies
like Google adopted them as a way to make gathering (as opposed to sharing)
data easier.

Edit:

Personally I found a lot of aspects useful, but others were not all that well
thought out when it comes to practical specs. The community has a tendency to
try to build complete taxonomies rather than taxonomies that have long-term
usability. As a result they become stale. Friend of a Friend (FOAF) [1], for
example, is nice, but it is very narrowly specced in some areas and not in
others: there is a tag for AOL Instant Messenger ID but none for Facebook.

Microformats have some similar issues in a way, though not as bad.

[1] [http://xmlns.com/foaf/spec/](http://xmlns.com/foaf/spec/)

------
rspeer
Most technologies that were specific to the "Semantic Web", such as OWL and
SPARQL, failed to scale and failed to solve realistic problems, and therefore
died. (I always maintained that running a SPARQL endpoint amounted to running
a DDoS on yourself.)

However, we got something kind of cool out of the RDF model that underlies it,
especially when some sufficiently opinionated developers identified the bad
parts of RDF and dumped them. We got JSON-LD [1], a way of making APIs
describe themselves in a way that's compatible with RDF's data model. For what
I mean by sufficiently opinionated developers, I recommend reading Manu
Sporny's "JSON-LD and Why I Hate the Semantic Web" [2], a wonderful title for
the article behind the main reason the Semantic Web is still relevant.

Google makes use of JSON-LD in real situations: for example, an airline that
uses JSON-LD can send you an e-mail that Google Assistant can use to update
you on the status of your flight, and Gmail can use to give you a simple
button for checking in.

[1] [https://json-ld.org/](https://json-ld.org/)

[2] [http://manu.sporny.org/2014/json-ld-
origins-2/](http://manu.sporny.org/2014/json-ld-origins-2/)

------
drcode
I think the main issue is that even though "knowledge representation" with
ontologies is an enticing goal, it's simply a fact that real entities, as used
by humans at a practical level, don't map neatly onto mathematically-sound
hierarchies. To see this, just look at the arguments the ancient Greeks
already had as to whether a human is a "two-legged featherless animal" or the
endless online arguments as to whether a "circle is an ellipse" or vice versa.

Because of this, there's just not much utility in taking the time to generate
semantic markup: it'll be sloppy and incomplete even when done by a PhD
student specializing in this subject.

~~~
naasking
> don't map neatly onto mathematically-sound hierarchies.

I think they do, but finding the right mathematical model is very difficult.
If it were easy, everyone would be a mathematics PhD.

Learning to program is becoming efficient at recognizing the right spherical
cow in any given situation, because such shortcuts are essential to getting
shit done.

~~~
drcode
I think you'd change your tune (like I did) once you dig into the ugly details
of such mathematical modeling. A book like this will show you just how hard
this is [https://www.amazon.com/Knowledge-Representation-Reasoning-
Ar...](https://www.amazon.com/Knowledge-Representation-Reasoning-Artificial-
Intelligence/dp/1558609326)

~~~
naasking
I recognize it's difficult since I literally said that. But programming is
simplified mathematical modelling of a sort (for instance, via Curry-Howard),
hence my "spherical cow".

------
smadge
The Semantic Web is incompatible with the commercial incentives of most
technology companies. For instance, it would currently be irrational for
Facebook to voluntarily publish their social network using the friend of a
friend schema. Their profit is derived from their centralized, private
ownership of this data. Hopefully we can move towards a decentralized or
federated, public web.

~~~
acutesoftware
Yes, once everyone thinks that 'Data is the new oil', things like public
shared standards fly out the window.

There are many open upper level ontologies available (I counted 16 when I did
a review a few years ago -
[http://www.acutesoftware.com.au/aikif/ontology.html](http://www.acutesoftware.com.au/aikif/ontology.html)),
but the really complete ones are not publicly available (Cyc full version,
Google's internal ontology and the countless others held on corporate servers).

~~~
bogomipz
Where is Google's internal ontology used exactly?

~~~
acutesoftware
I don't know, but I just assumed they use it everywhere. They have a very good
mapping of related terms and a fairly consistent mapping.

A visible example is when you look for organisations and they have a
classification against it.

e.g. Google IBM and they call it "Computer manufacturing company" - these
classifications are different to many of the standards for specific sets of
data

~~~
executesorder66
Where do you see google calling IBM a "Computer manufacturing company" if you
search for IBM? I'm not saying they don't. I just want to see examples of what
you are talking about.

I googled IBM, and did not see it classified as such.

~~~
acutesoftware
When I am logged on to Google I see company details in a right-hand side
pane (when the search term is unambiguous)

For IBM it says

    
    
      IBM
      Computer manufacturing company
      Image result for ibm
      IBM is an American multinational technology company headquartered in Armonk, New York, 
      United States, with operations in over 170 countries. Wikipedia
      Stock price: IBM (NYSE) USD155.39 +2.71 (+1.77%)
      10 Apr., 4:00 pm GMT-4 - Disclaimer
      Founder: Charles Ranlett Flint
      Founded: 16 June 1911, New York City, New York, United States
      Headquarters: Armonk, North Castle, New York, United States
      Subsidiaries: Trusteer, FileNet, IBM Global Services, Ustream, MORE
      Executives: Ginni Rometty (CEO, President, Chairperson), MORE
      Did you know: IBM is the world's eighth-largest information technology company by revenue. 
      wikipedia.org

------
niftich
Ontologies are hard. Curation is harder. People are lazy.

The ideas are still around; some [1] were lifted by Facebook [2], for example.
There's also continuation work that's related, like web annotations [3], but
generally the commercial web is moving even more away from neatly-organized
resources [4] and towards Javascript state machines [5].

[1] [https://web.archive.org/web/20160713021037/http://dig.csail....](https://web.archive.org/web/20160713021037/http://dig.csail.mit.edu/breadcrumbs/node/215)

[2] [https://developers.facebook.com/docs/graph-api/overview/](https://developers.facebook.com/docs/graph-api/overview/)

[3] [https://news.ycombinator.com/item?id=13729525#13740110](https://news.ycombinator.com/item?id=13729525#13740110)

[4] [https://news.ycombinator.com/item?id=12206846#12207459](https://news.ycombinator.com/item?id=12206846#12207459)

[5] [https://news.ycombinator.com/item?id=12345693#12346371](https://news.ycombinator.com/item?id=12345693#12346371)

~~~
JPLeRouzic
I think you hint at great points, especially with "People are lazy".

The semantic web was a great idea, but in the period from 2000 to 2010, people
advertised it as a kind of AGI that would solve all hard problems with junk
data.

It is still used in biology, for example in Gene Ontology [0] but the main use
case ( _People are lazy_ ) is "If your research cannot find interesting stuff,
just query Gene Ontology".

[0]
[https://en.wikipedia.org/wiki/Gene_ontology](https://en.wikipedia.org/wiki/Gene_ontology)

~~~
JPLeRouzic
While re-reading what I wrote, a part of a sentence makes me wonder " _a kind
of AGI that would solve all hard problems with junk data_ "

~~~
mateo411
What does AGI mean in this context?

------
crazysmoove
One of the big things to come out of the semantic web was RFD-A (embedding
semantics in unstructured web pages) and similar technologies (microformats,
JSON-LD, schema.org). It's what lets Google show product reviews and rankings
in search results, and lets shopping aggregator sites show things like price
comparisons from other websites. While it's probably not as widespread as its
boosters from a decade ago hoped it would be, it did lead to some helpful
technologies that are in widespread use now.

I wonder if Facebook won't someday be forced to publish its social graph data
in FOAF format the same way Microsoft was forced to publish its Office
document specs as part of an anti-trust decision.

Speaking of Facebook, the OpenGraph tags are another example of widely-used
semantic data on the web, maybe the most widely-used, since all kinds of sites
pull in page summaries, images, and other data from those tags. So while
Facebook doesn't make social network data available, it did popularize a
format for sharing other types of data (about companies, articles, websites,
etc.).
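
As a rough illustration of how a site might pull a page summary out of OpenGraph tags, here is a minimal sketch using only Python's standard library. The sample HTML, the helper name `extract_open_graph`, and the example.com URL are made up for this example; real consumers handle many more cases (repeated properties, malformed markup, fallbacks to other tags).

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep only the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Hypothetical helper: return a dict of og:* properties found in html."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Made-up page head, used only to exercise the parser.
SAMPLE_HTML = """
<html><head>
<title>Example article</title>
<meta property="og:title" content="What happened to the semantic web?" />
<meta property="og:type" content="article" />
<meta property="og:image" content="https://example.com/preview.png" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for key, value in extract_open_graph(SAMPLE_HTML).items():
        print(key, "=", value)
```

The plain `description` meta tag is skipped because it uses `name` rather than an `og:`-prefixed `property`, which is the distinction consumers of these tags rely on.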

~~~
crazysmoove
Sorry, that should be "RDF-A," not "RFD-A."

------
anon1253
Long ago I wrote a blog post as an introduction with an identical title:
[https://joelkuiper.eu/semantic-web](https://joelkuiper.eu/semantic-web)

At our company we still use Semantic Web (or rather, RDF) for inference and
annotation with medical ontologies (UMLS, Gene Ontology, Human Phenotype
Ontology, etc). The ease of use of triples + SPARQL (basically a PROLOG-ish
unification scheme) is really powerful (and quite performant when using
Jena/Fuseki with Lucene as a text index). But it's a far cry from the "dream"
of semantic web like federated queries and OpenAnnotations (now just W3C
Annotations). Still, every time someone implements an EAV scheme without even
considering an RDF triple store I cringe a bit.
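
To make the "triples + Prolog-ish unification" idea concrete, here is a toy sketch in plain Python. It is not how Jena/Fuseki or any real SPARQL engine works internally; the function names, the `ex:` identifiers, and the sample data are all invented for illustration (only `foaf:knows` and `foaf:name` are real FOAF property names).

```python
def is_variable(term):
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one (subject, predicate, object) pattern with one triple.

    Returns the extended bindings, or None if they cannot be unified.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            # A variable must keep the value it was first bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(triples, patterns):
    """Return every set of bindings that satisfies all patterns (a naive join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Made-up data: a tiny social graph expressed as triples.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]

# "Names of friends of Alice's friends", written as three triple patterns.
FRIENDS_OF_FRIENDS = [
    ("ex:alice", "foaf:knows", "?friend"),
    ("?friend", "foaf:knows", "?fof"),
    ("?fof", "foaf:name", "?name"),
]

if __name__ == "__main__":
    for solution in query(TRIPLES, FRIENDS_OF_FRIENDS):
        print(solution["?name"])
```

Shared variables such as `?friend` do the joining, which is the part that resembles unification; an EAV table can store the same rows, but the join logic usually has to be written by hand for each query.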

------
ghaff
It was the sort of largely academic top-down exercise in organizing
information that has mostly lost out time and time again to more organic
bottom-up/self-organizing approaches. Think Yahoo vs. Google. [ADDED: i.e.
manually populating hierarchies vs. search, in case that wasn't clear] I
remember when it was going to be Web 3.0. Tim Berners-Lee gave a talk about it
when he won the Draper prize.

As others have said, classification is difficult under the best of
circumstances. And it just doesn't fit with the way the Internet has evolved.
We have Wikipedia, not the Encyclopedia Galactica.

~~~
kristianc
I think there's a lot in this. The first users of the Internet saw themselves
as librarians and curators, and sought to impose that vision of the world on
everyone else. For a long time, people had trouble with the idea that
not everything needed to link to everything else.

~~~
ghaff
Hierarchical structures are how we organized things historically. So I think
it's pretty natural. I know that for a long time I was relatively careful
about filing email, files, etc. into a folder hierarchy and categorizing my
music collection. I won't say folders (and tags/labels) don't still have their
uses. But I've definitely moved away from spending so much time up front
carefully organizing stuff when I may only ever want to find some tiny
percentage of it. Instead I mostly figure I can search for it if I need to.

------
AznHisoka
It happened.

We got meta tags that tell us the published date, author and type of web page.

We got schema for job ads.

We got schema for recipes.

We got schema for thumbnails and images associated with a webpage.

We got schema for ecommerce products.

~~~
jefflombardjr
This is all speculation and I have no idea of the actual roadmap for the
specs. As I was reading this comment it gave me another reason to love
component based architecture... I would think it would make sense just to
allow users to self define stuff like that rather than try to do everything
top down.

[https://www.w3.org/standards/techs/components#w3c_all](https://www.w3.org/standards/techs/components#w3c_all)

~~~
jmalicki
Google guides for SEO show you how the semantic web happened.

[https://developers.google.com/search/docs/data-
types/product](https://developers.google.com/search/docs/data-types/product)

No one knows it's called the semantic web these days. It's just what you have
to do to have your page get picked up and highly ranked by Google, and to get
more links from direct product traffic.

------
contingencies
Turns out it's not profitable to encourage understanding, rather it's better
to be a hosted service provider, keep knowledge in a walled garden and charge
for it.

------
pornel
1. We've realized that people in general can't reliably and consistently mark
data up. That's a problem of incentives, technical difficulties, UI, bitrot of
invisible metadata, etc.

2. We've settled on extracting information from "raw" text (with everything
from regexes to recognize flight info in e-mails to getting word statistics
from terabytes of garbage) and duct-taping that with special-purpose APIs.

~~~
ssfrr
The flight info example is one of the places where semantic web tech went
mainstream. Those flight emails have embedded metadata in JSON-LD (linked
data) format and Gmail uses it for more specialized display[1].

[1]:
[https://developers.google.com/gmail/markup/reference/flight-...](https://developers.google.com/gmail/markup/reference/flight-reservation)
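
For a sense of what that embedded metadata can look like, here is a hedged sketch that builds a schema.org `FlightReservation` as JSON-LD and wraps it in the `<script type="application/ld+json">` tag used to embed it in HTML. The helper names, the airline "Example Air", and all reservation values are invented for this example, and the field set is a partial subset; the reference above is the place to check which properties Gmail actually requires.

```python
import json


def flight_reservation_jsonld(reservation_number, passenger_name, flight):
    """Hypothetical helper: build a partial schema.org FlightReservation."""
    return {
        "@context": "http://schema.org",
        "@type": "FlightReservation",
        "reservationNumber": reservation_number,
        "underName": {"@type": "Person", "name": passenger_name},
        "reservationFor": {
            "@type": "Flight",
            "flightNumber": flight["number"],
            "airline": {
                "@type": "Airline",
                "name": flight["airline_name"],
                "iataCode": flight["airline_code"],
            },
            "departureAirport": {"@type": "Airport", "iataCode": flight["from"]},
            "departureTime": flight["departs"],
            "arrivalAirport": {"@type": "Airport", "iataCode": flight["to"]},
            "arrivalTime": flight["arrives"],
        },
    }


def to_script_tag(data):
    """Wrap JSON-LD in the script tag used to embed it in an HTML body."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


# Made-up flight details, for illustration only.
SAMPLE_FLIGHT = {
    "number": "110",
    "airline_name": "Example Air",
    "airline_code": "XX",
    "from": "SFO",
    "to": "JFK",
    "departs": "2027-03-04T20:15:00-08:00",
    "arrives": "2027-03-05T06:30:00-05:00",
}

if __name__ == "__main__":
    reservation = flight_reservation_jsonld("ABC123", "Jane Doe", SAMPLE_FLIGHT)
    print(to_script_tag(reservation))
```

The point of the format is that the same message still reads as a normal e-mail for humans, while a client that understands the vocabulary can pick out the flight number and times without resorting to regexes.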

------
tuukkah
It pivoted to Linked Data [1] with less focus on ontologies and AI and more
focus on linking, open data and a Web of Data [2].

One nice demo of the latest advances is how you can query Wikidata client-side
without downloading the whole database for queries like "Directors of movies
starring Brad Pitt":
[http://ldfclient.wmflabs.org](http://ldfclient.wmflabs.org)

[1]
[https://en.m.wikipedia.org/wiki/Linked_data](https://en.m.wikipedia.org/wiki/Linked_data)

[2] [https://www.w3.org/2013/data/](https://www.w3.org/2013/data/)

------
domain_ly
It was always a cruel joke, never to be taken seriously.

At its core were SEO hucksters trying to pass off page rank hacks as a business
model for consulting work, during the post-dot-com bust period, when money was
scarce and web design couldn't pay the bills anymore.

Many ascended to the priesthood of RESTful web microservice development, where
they pooh-pooh and tsk-tsk improper path grammar and noodle with JSON objects,
in between periods of intense navel gazing.

------
grymoire1
I spent 2 years using a semantic reasoner to develop an ontology for reasoning
about smartgrid vulnerabilities. Ignoring the web aspect, ontologies are very
hard. In addition, one needs to use multiple languages: one to express the
ontology, and another to express a query. Change the ontology a little bit and
the query will break when you run it. There was no integrated IDE that was
complete.

------
sgt101
I had a project where we wanted a ticketing / events ontology and budgeted 6
weeks for three people to build it. In the end we spent probably 3 person
years on it, which was dim... but we got suckered in by the idea that the
ontologies themselves would be valuable (spoiler - nope).

So, knowledge engineering scales badly, but there were other problems. There
was a big debate in the EU community about what kind of reasoner to use, and
for some god-awful reason F-logic was chosen; at the time we thought that
reasoners like Otter wouldn't be able to scale and do FOL tractably. It's a
shame that answer sets and MCMC probabilistic reasoners arrived 10 years later - I
think that the weak reasoning and poor representation systems were big gaps.

The other problem was institutional: the way that EU semantic web funding
worked, and the way that the projects developed. A lot of money was spent, and
then there was no money - there was no self sustaining legacy.

------
cimmanom
It didn't add any value for commercial entities (and minimal immediate value
beyond self-satisfaction for non-commercial entities) so they didn't devote
any resources to implementing it.

------
Eli_P
What about NIEM (National Information Exchange Model) [1][2]?

As I see it, this tech is supposed to be a replacement for paper documents and
the medium for government information arbitrage. The only obstacle to using it
everywhere is the structural complexity of NIEM and the lack of tools. I've
spent a bit of time hacking on it with XML queries and my mind is blown [3].

You can interpret NIEM as a type system similar to types in programming
languages, but for composing electronic documents; it could be integrated with
payment systems. I think progress will go two ways: composing new documents
will be happening with NIEM, older docs could be converted with natural
language processing.

The latest version, 4.0, is dated 2017, and the US has spent lots of money to build
an XML representation of real-life objects.

[1] [https://www.niem.gov/](https://www.niem.gov/)

[2] [https://en.wikipedia.org/wiki/National_Information_Exchange_...](https://en.wikipedia.org/wiki/National_Information_Exchange_Model)

[3] [https://github.com/NIEM](https://github.com/NIEM)

------
wslh
In a way Freebase and DBpedia were/are practical applications of the concept.
Now when you search on Google they try to understand a simple query and send
you the answer. In Freebase you could write queries about facts retrieved from
many sources.

The utopia is more than this, but I assume that few people will use these
tools directly.

------
flukus
It was a solution looking for a problem.

~~~
ZenoArrow
Hardly. The promise is still there, but there are barriers in place to get
there.

One of the most useful aspects of the semantic web is how it enhances the
search for information. Some web citizens have become conditioned to see
Google as the pinnacle of what we can achieve through search, but we can do a
lot better. Let's use an example to illustrate this. Imagine a presidential
election was taking place and you want to understand the positions of the
candidates on topics that matter to you. Let's say foreign policy was
something you were interested in, including their proclivity for war. By
allowing for searching on a richer set of metadata you can more easily access
the information about the positions of these candidates, without the
distortions of Google's page rank algorithms. Think of it like treating the
information of the web as a database you can query more directly. That's the
main promise of the semantic web.

------
hacknat
The simple reason is that people are lazy. Someone isn’t going to put in the
extra work of marking up their text with the correct semantic structure if
they don’t get much out of it, and the ROI for an individual site owner was
dubious at best. People keep talking about RDF, but that barely qualifies IMO;
it was more of an agreed-upon RFC so that search engines didn’t force site
owners to all adopt differing indexable formats (events, addresses, etc). Even
with the backing of Google, RDF is not something that most sites are doing
until they start tackling some enterprise-grade SEO optimizations.

------
TorKlingberg
We got separate JSON APIs instead of machine-readable web pages.

------
jerven
It became useful for people whose job is using or providing public data, in
fields such as the life sciences, government, archaeology/history and more.
It allows for nice bottom-up standards and user interfaces such as flight data
in your e-mail.

If you're not consuming a lot of public data or providing data to the public it
is not very useful other than having a bunch of better graph databases
associated with it.

------
mindB
The semantic web is alive and well. It's just not in the places you're
looking. I recommend checking out indieweb.org for a community devoted to
building on the semantic web. Just because the big websites aren't using it
doesn't mean the technology is dying.

------
matell
the (semi)automatic annotation never really happened. there are ontologies,
there are amounts of raw data everywhere on the web, but we haven't discovered
a way to reliably turn that data into rdf triples matching the ontology
without doing it manually by hand.

------
NelsonMinar
Google got good enough at divining the content and meaning of a page without
needing magical XML pixie dust to annotate the facts on it.

------
alexchamberlain
I think the triple model is awesome, but we haven’t been able to develop
decent triple stores to back it up.

------
return1
Machines should learn to understand human language, not the other
way around.

------
tastyham
It was a stupid idea, although remnants can be seen in html5 with elements
like address, nav, and section.

Turns out that keeping presentation and data separate is much, much easier.
Hopefully HTML 6 will get rid of everything except for div, span, and form
elements.

------
hguhghuff
I wasted time, money and effort on it.

What happened is that it was pointless.

Build something people want, not the semantic web.

------
amelius
Answer: it became obsolete by the use of Machine Learning.

------
busterarm
uhh...haven't you heard of JSON-LD?

------
adbachman
We call it "blockchain" now.

A lot of the solution-in-search-of-a-problem work that went into the semantic
web just shifted to the cryptocurrency space.

