
Three reasons why the Semantic Web has failed - rhythmvs
http://gigaom.com/2013/11/03/three-reasons-why-the-semantic-web-has-failed/
======
MrZongle2
Warning: snark ahead.

 _" The result is an inherently boring web of data. Google’s Knowledge Graph
promotional video is a great example of how boring this web can be. “Let’s say
you’re searching for Renaissance Painters”…. Really? Who searches for that?"_

Somebody with a brain that is used for more than gross muscle control over a
TV remote?

 _" I really don’t care what Leonardo DaVinci’s height was or which Nobel
prize winners were born before 1945. I care about how other people feel about
last night’s Breaking Bad series finale. How did they find the ending? What
other series or movies might I enjoy based on those experiences?"_

This made me throw up in my mouth a little.

I can understand that the author wants results to be more applicable to the
everyday life of the searcher, but I don't see his vision of a Semantic Web
as any more useful than the current one.

Finding out what other like-minded people think about my favorite TV show
improves the Web, and thus my life? _Really?_

I'm all for more accurate classification of online data. What I _don't_ want
is "boring" (in the author's words) data being pushed aside in favor of what
_somebody_ thinks is "interesting".

We've already got companies doing that, in the form of "targeted ads".

~~~
rhythmvs
Enchanting way to put it! I’m also horrified by the mercantilist eulogy of
blatant ignorance, these days. But the author of that gruesome piece has a few
points: RDF, OWL, XML specs are tl;dr, too cumbersome to be of practical use,
developers don’t bother, and founders don’t care. Meanwhile Snapchat turned
down a $3-4 billion offer today. Advances in AI are made because of consumer
driven services. The days of CERN’s Internet are long past…

~~~
greenyoda
_" Advances in AI are made because of consumer driven services."_

Not necessarily. IBM's Watson doesn't seem to have been aimed at consumer
driven services. IBM's market is primarily large companies.

And while we may not have seen much RDF or OWL based stuff on the public web,
that doesn't mean that it isn't being used in proprietary services, such as
searching legal documents.

~~~
rhythmvs
That begs the question: searching those documents is only possible, because
someone did all of the tedious OWL/RDF/etc. markup beforehand. Basically,
that’s the problem with how the semantic web is perceived generally:
sprinkling hints for machines into ambiguous, man-written documents (e.g.
Google’s schema.org).

The Watson computer is good at playing Jeopardy, but it’s just fact
collecting. Inferring a rule set from a corpus of documents, reasoning, and
drawing solid conclusions, is a different game. We would be better off with a
technology that would go over any given corpus of unstructured plain text,
parse, tokenize, normalize, structurize, iterate, and dump the graph in a
queryable data store. Natural language processing with a semantic reasoner.
Legal documents are indeed a great use case.
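
As a rough illustration of that pipeline (tokenize, normalize, structurize, dump into something queryable), here is a minimal Python sketch. It is a toy under loud assumptions: the only "grammar" it understands is the hypothetical pattern "X is a/an Y", the function names are made up for this example, and the "data store" is just a dictionary. Real legal text would need actual NLP and a semantic reasoner.

```python
import re
from collections import defaultdict

# Hypothetical toy pattern: only "<subject> is a/an <object>" sentences match.
PATTERN = re.compile(r"^(?P<subj>.+?) is an? (?P<obj>.+)$", re.IGNORECASE)

def tokenize(text):
    """Split plain text into sentences on '.', '!' or '?' (a crude assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def normalize(term):
    """Lower-case, collapse whitespace, and drop a leading article."""
    term = " ".join(term.lower().split())
    return re.sub(r"^(?:a|an|the) ", "", term)

def extract_triples(text):
    """Turn every matching sentence into a (subject, 'is_a', object) triple."""
    triples = []
    for sentence in tokenize(text):
        match = PATTERN.match(sentence)
        if match:
            triples.append((normalize(match["subj"]), "is_a", normalize(match["obj"])))
    return triples

def build_graph(triples):
    """Index subjects by (predicate, object) so the graph can be queried."""
    graph = defaultdict(set)
    for subj, pred, obj in triples:
        graph[(pred, obj)].add(subj)
    return graph

corpus = "A lease is a contract. A Contract is an agreement. Nobody reads the fine print."
graph = build_graph(extract_triples(corpus))
print(sorted(graph[("is_a", "contract")]))  # prints ['lease']
```

The third sentence is silently dropped because it does not fit the pattern, which is exactly where the hard part of the problem begins.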

I went to see my lawyer yesterday. Over the last three years I paid (i.e. was
extorted) about $30k in legal fees. That’s a lot of money for thin air. Withal,
most of the research and paperwork I did myself: my lawyer copy/pasting and
putting his court-accredited sign under it. To the legal scribes I’m but a
layman, and may not speak on my own behalf in court (unless I want to
jeopardize my cause). Despite the fact that I am particularly good at parsing
texts, studied linguistics, did a post-doc in natural language processing, and
know how to read laws (that are assumed to be known to and understood by all
citizens, in the first place, are they not?). Fact is: lawyers, judges and
clerks form a self-sustaining caste that benefits from the de facto (and de
jure) monopoly of interpreting the law. They have a lucrative interest in laws
and bills that are poorly written, are contradictory, full of ambiguity and
logical flaws. They share that interest with retarded lawmakers who produce
all that ill-conceived cruft.

Suppose we had our laws written in a formal language, with a well defined
regular grammar; suppose we had unit testing for new bills — machines could
administer justice, and they would be much better at it than the dunces who
went to law school.

There’s simply too small a stake in developing semantic technologies that
would do away with human corruption, underdeveloped intelligence, sentiment,
and subjective interpretations. Or rather: some industries (especially those
which are controlled by the powers that be) have too big a stake in preserving
backward human knowledge parsing. If big law firms have an interest in using
such technologies, they have it only as long as they gain a competitive
advantage from it (vis-à-vis those which don’t have/use such technologies).
Legal corpora, neatly marked-up by hand (or semi-automated, no difference),
are certainly a valuable asset, that you wouldn’t want everyone to have cheap
access to, not your competitors, and especially not your litigant clients.

If we had instead semantic technologies that would cheaply produce queryable
knowledge systems from large, impervious document collections (like our codes
and jurisprudence), then, very likely, lots of sectors in our so called
“knowledge society” would be disrupted, leaving lots of overpaid “knowledge
workers” unemployed, overnight. That’s a threat to the very industries which
are supposed to support research and development of such technologies, as
potential customers.

That’s different with IBM’s customers, I guess: they do have an interest in
disrupting their industries (or rather the industries in which they are
newcomers), using tech. And they may do so, only because there’s no monopoly
guaranteed by a selfish legal system. Maybe also because present day’s state-
of-the-art in semantic technologies and AI offers good enough technologies for
these use cases, which are less complicated applications than those that would
be needed for use cases wherein more difficult knowledge parsing is required,
_and_ are bound by a legal/economic anathema?

Anyhow, I will support any startup that would create such tech with the
intention to run the human legalese interpreters out of business. And that’s
an exhilarating thought, because if such technology would be produced, it will
be equally good at solving problems in all branches of science.

~~~
mindcrime
_That begs the question: searching those documents is only possible, because
someone did all of the tedious OWL/RDF/etc. markup beforehand. Basically,
that’s the problem with how the semantic web is perceived generally:
sprinkling hints for machines into ambiguous, man-written documents (e.g.
Google’s schema.org)._

Well, some work is underway to automate a certain level of semantic extraction
from works that were not explicitly marked up as such by a human.

That said, I get what you're saying about the law thing, and I think we're
still a decent ways off from a computer that can truly _understand_ the legal
code. :-(

------
001sky
_We need a web in which information (both questions and answers) finds you
based on how your attention, emotions and thinking interconnects with the rest
of the world._

This article sounds like it was written by an ad.

~~~
ultimatedelman
it was, essentially. look at the author's "credentials" at the bottom of the
article.

------
bct
This is one of the worst articles I've read on the subject, and that's a
pretty strong statement.

> “Let’s say you’re searching for Renaissance Painters”…. Really? Who searches
> for that?"

It's an example that happens to be easy to demonstrate. It's pretty easy to
think of other cross-dataset queries that you might be more interested in.

~~~
fidotron
It's a horrific example, but the point being made is true, in that in areas
such as music enough information is generated regularly in a non-structured
way with no tie in to the semantic web that, the Facebook silo aside,
practically no one has up to date information about bands etc., certainly not
Freebase or similar. It's also a lost cause hoping for such things to be
comprehensive.

Facebook have also become a serious problem, in that a lot of places publish
there without realising or caring that their information essentially is locked
in. This gives their graph search a simply enormous advantage over anyone
else.

~~~
mindcrime
_practically no one has up to date information about bands etc., certainly not
Freebase or similar._

Hmm.. that leads to a couple of random thoughts:

1\. How up-to-date and comprehensive does it need to be? What kinds of queries
will people need to access (either directly or indirectly) about music, to
serve their purposes?

2\. DBPedia, through Wikipedia, actually does have a lot of information about
bands and musicians and music. For example, see:

[http://dbpedia.org/snorql/?query=PREFIX+dbo%3A+%3Chttp%3A%2F...](http://dbpedia.org/snorql/?query=PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0A%0D%0ASELECT+%3Fname+%3Fdescription+%3Fperson+WHERE+%7B%0D%0A+++++%3Fperson+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3AAmerican_heavy_metal_singers%3E+.%0D%0A+++++%3Fperson+foaf%3Aname+%3Fname+.%0D%0A+++++%3Fperson+rdfs%3Acomment+%3Fdescription+.%0D%0A+++++FILTER+%28LANG%28%3Fdescription%29+%3D+%27en%27%29+.%0D%0A%7D%0D%0AORDER+BY+%3Fname)

3\. But Wikipedia will never be comprehensive exactly because of their
notability guidelines. All the latest new super-underground Norwegian Black
Metal bands that are recording in a wood shack in somebody's backyard, are not
going to have Wikipedia entries.

4\. On the other hand, musicbrainz.com seems to have an awfully comprehensive
set of listings. And their data is part of the semantic web / linked data
cloud, as well.

By way of example, Verminous are something of an underground act, and are not
on Wikipedia, but their info is in Musicbrainz.

Anyway... just thinking out loud here.
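
For anyone who doesn't want to squint at the percent-encoding, the DBPedia link in point 2 is just a SPARQL query packed into a URL. A minimal Python sketch, using only the standard library, that unpacks it (the URL is copied from the link above; the variable names are made up for this example):

```python
from urllib.parse import urlsplit, parse_qs

# The SNORQL link from point 2, split across lines only for readability.
url = (
    "http://dbpedia.org/snorql/?query=PREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org"
    "%2Fontology%2F%3E%0D%0A%0D%0ASELECT+%3Fname+%3Fdescription+%3Fperson+WHERE+%7B"
    "%0D%0A+++++%3Fperson+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fsubject%3E+"
    "%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3AAmerican_heavy_metal_singers"
    "%3E+.%0D%0A+++++%3Fperson+foaf%3Aname+%3Fname+.%0D%0A+++++%3Fperson+rdfs"
    "%3Acomment+%3Fdescription+.%0D%0A+++++FILTER+%28LANG%28%3Fdescription%29+%3D+"
    "%27en%27%29+.%0D%0A%7D%0D%0AORDER+BY+%3Fname"
)

# parse_qs decodes '+' to spaces and %XX escapes to characters.
query = parse_qs(urlsplit(url).query)["query"][0]

# Normalize the Windows-style line endings embedded in the link.
query = query.replace("\r\n", "\n")

print(query)
```

The decoded query selects the name, English description, and resource of everything filed under Category:American_heavy_metal_singers, ordered by name. The `foaf:` and `rdfs:` prefixes are not declared in the query itself; presumably the SNORQL front end supplies them by default.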

------
toggle
The author is from a company called Bottlenose, which (Wikipedia says) is a
"company that analyzes social media and business data to detect trends for
brands."

So it's pretty clear that he has an angle here.

------
smackay
Instead of the semantic web trying to create knowledge from data the author
wants to take all that data and create more chatter and noise, oops, I mean
buzz.

Don't waste your time reading this nonsense.

------
pencilcheck
This author has no real understanding of current AI research or of what the
semantic web actually means. Throughout the article, the author not only fails
to reference any authoritative source defining the semantic web, but also
tries to make fun of it with unrelated examples such as Google's Knowledge
Graph.... Comparing the semantic web with the Knowledge Graph is like
comparing apples with oranges. They are completely different beasts with
different purposes. This goes to show the author's level of understanding and
his/her intent. And ironically, everything the author tries to sell us on is
squarely part of the semantic web. For example, the Stream data he/she
describes is precisely what the semantic web is trying to do: label and give
meaning to specific data instead of leaving it as a big chunk of text/binary.
The author also mentions pushing and pulling; I'd argue that the way
information/data is disseminated has nothing to do with the ontology or the
semantics of the data.

And there you go: such a shallow article with a very aggressive link-bait
title, just pathetic.

------
quizotic
Yeah, the article is crap. But did you take a look at the site:
bottlenose.com? It's kinda gorgeous.

Having been in the biz (NLP search with sentiment extraction) once, it still
comes down to precision and recall, with poor precision being the fastest way
to lose sales. I didn't see any indication of bottlenose accuracy. But maybe I
was too blinded by the beautiful d3.js.

Speed is also an issue. Machine Learning takes setup time and good corpus
sets, after which it's pretty fast. Traditional NLP is faster to start but
slower and less accurate after. Neither is remotely close to real-time, which
makes me wonder what they're really delivering.

Also, at least in my experience, I couldn't get product managers/marketers to
give a hoot. But ad-agencies ate it up. And boutique survey shops. And
sometimes CEOs. And sometimes customer-service organizations that had too much
inbound hate mail and had to triage.

I think they may be smoking it to think they're going to get inbound traffic.
I had to pound doors.

------
taeric
So, I actually somewhat agree with the idea that many of the search examples
that ads and such use are ridiculously misguided.

Consider, when I'm looking at a blog or an article, it is easy to see other
people's reactions if there is a comment box. What is sometimes harder to find
is the general context of why a blog/article exists. Did its existence prompt
the creation of other blogs and articles? More, is the article still relevant?
I think some pushed for this concept with "trackback" and such. But I don't
think that really took off. (Maybe I just need to learn to use some tools
better.)

However, I think I get lost around the notion that things should be pushed to
you. I mean, unless you are referring to twitter style "you probably ignore
90% of what is pushed at you."

------
VladRussian2
Sort of a tragedy of the commons: the expenses are borne individually, while
the rewards are reaped by the society/community. The only incarnation of the
semantic web discovered so far that is profitable individually is the
blossoming SEO business.

------
mindcrime
I have to admit, I haven't read TFA, and I'm not sure I want to. The Semantic
Web has hardly failed - tons of people _use_ the Semantic Web every day and
just don't know it. The thing is, the SW isn't necessarily _meant_ to be
something that the average end user knows about and uses explicitly. It's just
about making it easier for machines to understand semantics around data on the
web, so those machines can do a better job of helping the humans do whatever
it is they are trying to do. So Google could be using the Semantic Web behind
the scenes all day long, and the end user would never know it.

And yes, Google do use the Semantic Web.[1][3] So does Yahoo.[2][3] Etc.[3]

It doesn't matter that some people use RDFa, others use microdata, others use
microformats, others use RDF/XML, others use JSON-LD or whatever. Those are
irrelevant syntactic details. The point is having explicitly defined
semantics associated with things.

Anyway, the Semantic Web is becoming more and more important with every
passing day. As tools[4] for automating the process of extracting rich
semantics from unstructured data mature and become better and more widely
available, the number of applications for explicit semantics is just going to
mushroom.

Just to illustrate (and forgive me a bit of what might be seen as self-
promotion here) - our Enterprise Social Network product, Quoddy, has Stanbol
integration such that we can process all the various bits of "stuff" that flow
through the system, do semantic concept extraction, and store those entities
and relationships in a triplestore. Our Information Discovery Platform,
Neddick, does the same thing as we consume RSS feed data, Tweets, Emails, etc.
Now we can do things like show you, for, say, a given status update, the blog
posts, emails, tweets, people, documents, etc, that are conceptually related.
And while end-user use of "semantic queries" might not seem useful to some
people, the bottom line is that this enables searches that you just can't do
with "regular" (that is, non-semantic) tech.

An example... let's say you do something with musicians. Your ESN status
update messages occasionally mention, say, Jon Bon Jovi, Bob Marley, Richard
Marx, and Madonna. How would you do a search without SW tech that says "show
me all posts that mention musicians"? Not gonna happen. But with the semantic
extraction + triplestore, we can make that kind of query trivial.
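
To make the idea concrete, here is a minimal sketch in plain Python. It is emphatically not Stanbol's API or a real triplestore: the triples are hand-written stand-ins for what an extraction step might produce, and the predicate names (`type`, `mentions`) and post IDs are hypothetical.

```python
# Toy in-memory "triplestore": a set of (subject, predicate, object) tuples.
# Entity, class, and post names are illustrative assumptions only.
triples = {
    ("Jon Bon Jovi", "type", "Musician"),
    ("Bob Marley", "type", "Musician"),
    ("Richard Marx", "type", "Musician"),
    ("Madonna", "type", "Musician"),
    ("post:1", "mentions", "Bob Marley"),
    ("post:2", "mentions", "Quarterly budget"),
    ("post:3", "mentions", "Madonna"),
}

def posts_mentioning(entity_type):
    """Return posts that mention any entity of the given type."""
    # Step 1: find every entity known to be of the requested type.
    entities = {s for s, p, o in triples if p == "type" and o == entity_type}
    # Step 2: find every post that mentions one of those entities.
    return sorted(s for s, p, o in triples if p == "mentions" and o in entities)

print(posts_mentioning("Musician"))  # prints ['post:1', 'post:3']
```

The word "musician" never appears in the posts. The join between "mentions X" and "X is a Musician" is what a keyword search can't do.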

It gets better though... Stanbol comes "out of the box" with the ability to
dereference entities that are in DBPedia and other knowledge bases, which is
cool enough in its own right... but you can also easily add _local_ knowledge
and your own custom enhancement engines. So now entities that are meaningful
only in your local domain (part numbers, SKUs, customer numbers, employee ID
numbers, whatever) can be semantically interlinked and queried as part of the
overall knowledge graph.

Hell, I'd go so far as to say that Apache Stanbol (along with OpenNLP and a
few related projects... UIMA, Clerezza, etc.) may just be the most important
open source project around right now. And nobody has heard of it. Again, the
Semantic Web is largely not something that the average end user needs to know
or think about. But they'll benefit from the capabilities that semantic tech
brings to the table.

<rant-over />

[1]:
[https://support.google.com/webmasters/answer/99170?hl=en&ref...](https://support.google.com/webmasters/answer/99170?hl=en&ref_topic=1088472)

[2]:
[http://developer.yahoo.com/blogs/ydn/searchmonkey-support-rd...](http://developer.yahoo.com/blogs/ydn/searchmonkey-support-rdfa-enabled-7458.html)

[3]:
[http://ebiquity.umbc.edu/blogger/2011/06/02/microdata-rdfa-g...](http://ebiquity.umbc.edu/blogger/2011/06/02/microdata-rdfa-google-bing-yahoo-semantic-web/)

[4]: [http://stanbol.apache.org/](http://stanbol.apache.org/)

~~~
rhythmvs
\- [https://github.com/fogbeam/Quoddy](https://github.com/fogbeam/Quoddy)

\- [http://www.fogbeam.com/](http://www.fogbeam.com/)

\- [http://fogbeam.org/](http://fogbeam.org/)

Looks great!

Any chance any of these could be applied to legal corpora? (Cf. above:
[https://news.ycombinator.com/item?id=6731714](https://news.ycombinator.com/item?id=6731714))

~~~
mindcrime
_Any chance any of these could be applied to legal corpora?_

That's a pretty broad question, but generally speaking, I'd say the answer is
"yes". It depends on exactly what you want to do.

Feel free to email me if you'd like to talk about that in more detail. I will
issue this caveat though: We haven't - to date - focused on the legal world,
and it's not something I have a lot of specific knowledge of, vis-a-vis the
domain specific parts.

------
malandrew
It failed because we chose a 1 to 1 relationship between the window object and
the document object. There should instead be a 1 to many relationship.

