
Ask HN: Is the semantic web still a thing? - sysk
A few years ago, it seemed as if everyone was talking about the semantic web as the next big thing. What happened? Are there still startups working in that space? Are people still interested?
======
irickt
The semantic web is now integrated into the web and for the most part it's
invisible. Take a look at the timeline given in this post:
[https://news.ycombinator.com/item?id=3983179](https://news.ycombinator.com/item?id=3983179)

Some of those startups exited for hundreds of millions, providing, for
example, the metadata in the right hand pane of Google search.

The new action buttons in Gmail, adopted by Github, are based on JSON-LD:
[https://github.com/blog/1891-view-issue-pull-request-buttons...](https://github.com/blog/1891-view-issue-pull-request-buttons-for-gmail)

JSON-LD, which is a profound improvement on and compatible with the original
RDF, is the only web metadata standard with a viable future. Read the
reflections of Manu Sporny, who overwhelmed competing proposals and bad
standards with sheer technical power: [http://manu.sporny.org/2014/json-ld-origins-2/](http://manu.sporny.org/2014/json-ld-origins-2/)

There's really no debate any more. We use the technology borne by the
"Semantic Web" every day.

~~~
mindcrime
I agree with most of what you just said, but one nitpick:

 _JSON-LD, which is a profound improvement on and compatible with the original
RDF,_

I don't think this is right. JSON-LD _is_ RDF. What it isn't is RDF/XML. But
it's important to realize that RDF != RDF/XML. JSON-LD is just an encoding of
the abstract RDF triple model in JSON, just like RDF/XML is an encoding of RDF
into XML.
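To make the point concrete, here's a minimal sketch (in Python, with invented URIs) of one and the same RDF triple written as JSON-LD: the JSON is just a surface syntax for the abstract subject-predicate-object statement.

```python
import json

# One RDF triple -- subject, predicate, object -- can be written in several
# concrete syntaxes. In N-Triples the statement below would read:
#   <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
# The same statement as a JSON-LD document:
doc = {
    "@context": {"name": "http://xmlns.com/foaf/0.1/name"},
    "@id": "http://example.org/alice",
    "name": "Alice",
}

# Expanding the context by hand recovers the abstract triple:
subject = doc["@id"]
predicate = doc["@context"]["name"]
obj = doc["name"]
triple = (subject, predicate, obj)
print(triple)
```

Real JSON-LD processors do this expansion (and much more) for you; the point is only that the triple model, not the XML syntax, is what "RDF" names.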

~~~
rmc
There are several other encodings of RDF: N-Triples, Turtle...

~~~
mindcrime
Exactly. There's even another JSON encoding, RDF/JSON[1], but I don't think
anybody uses it.

[1]: [https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html](https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-json/index.html)

------
andybak
I'm still waiting for a comprehensive rebuttal to Cory Doctorow:
[http://www.well.com/~doctorow/metacrap.htm](http://www.well.com/~doctorow/metacrap.htm)

Just to clarify - I don't think his arguments demolish all aspects of the case
for 'the semantic web' (however ill-defined that term is) but if he's right
then it severely circumscribes the kind of content that will ever have useful
metadata.

At the same time, we are getting better at inferring context without needing
metadata. There is so much more data coming from this source (i.e. the "sod
metadata, let's guess" methodology) than from 'intentional' semantic sources.

So - semantic markup will never be of use unless the content is coming from a
source where the metadata already exists. It will largely be useful to
'database' style sites rather than 'content' style sites. Think directories
and lists rather than blogs and articles.

(Question to front-end types. Are people still agonising over section vs
aside, dl/dt vs ul/li under the impression that it makes any damn difference?
Angels dancing on the head of a pin...)

~~~
IanOzsvald
Consider Wikipedia->DBpedia->Wikidata->Wikipedia. First we had semi-structured
human-readable information expressed using mediawiki's markup which was hard
to parse automatically. Next the DBpedia project (and YAGO) created tools to
parse mediawiki and extract facts as triples (e.g. "this thing" "is-a"
"company"); in doing so they encountered many alternative ways of expressing
the same information (e.g. date variants, weights and measures).

Now the Wikidata project (2012) is normalising the data in Wikipedia so that
projects like DBpedia have an easier time with the raw information (no need to
write alternative parsers for dates, weights, measures and simple facts!). As
a result we've gone from human-readable information to machine-readable
semantic-web-like information which is accessible via Linked Open Data.
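The normalisation problem described above can be sketched in a few lines of Python. This is a hypothetical illustration (the function, the variants and the entity are invented), showing the kind of work a DBpedia-style extractor must do before it can emit one clean triple:

```python
from datetime import datetime, date

def parse_infobox_date(raw: str) -> date:
    """Collapse a few common infobox date variants to one canonical value
    (illustrative only -- real extractors handle far more formats)."""
    for fmt in ("%Y-%m-%d", "%B %d, %Y", "%d %B %Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")

# Three ways the same founding date might appear in different articles:
variants = ["1998-09-04", "September 4, 1998", "4 September 1998"]
canonical = {parse_infobox_date(v) for v in variants}
# All three collapse to a single value, ready to be emitted as a triple
# like ("dbr:SomeCompany", "dbo:foundingDate", date(1998, 9, 4)).
print(canonical)
```

Wikidata's contribution is precisely to store the canonical value once, so downstream projects no longer need a zoo of parsers like this.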

Maybe the driver for semantic web data is humans trying to programmatically
consume human-readable information, rather than the other way around?

~~~
jerven
Working on the UniProt RDF, I find that every time we make it easier for
semantic tools to work with our data, it actually improves the usability for
human users as well.

The RDF part of the semweb idea encourages us to be extremely explicit about
what we mean by our data. This helps our end users because it removes a lot
of guesswork. What was obvious to us as maintainers is not obvious at all
to the biologists who need to do stuff with our data. E.g.
[http://www.uniprot.org/changes/cofactor](http://www.uniprot.org/changes/cofactor)
(going to be live soon) is a small change from textual descriptions of which
chemicals are cofactors for enzymatic activity to using the ChEBI ontology.
This allows better rendering (UI) and better searching. It also makes clear
the difference between the cofactor being any magnesium and the cofactor
being only Magnesium(2+).

In the life sciences and pharma, semweb has a decent amount of uptake, for
the very simple reason that this branch deals with a lot of varied information
and often mixes private and public data. RDF makes it cheaper for
organisations to deal with this.

SPARQL, the query language, has a key feature that no other technology has in
the same way: federated queries. If I am in a small lab I can't afford to
have a data warehouse of UniProt; it would cost me 20,000-30,000 euro
just to run the hardware and maintain it. As a small lab I can use
beta.sparql.uniprot.org for free and still combine it with my own and other
public data for advanced queries. Sure, UniProt has a good REST interface,
but it is limited in what you can do with it in ways that SPARQL never will be.
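The federated-query idea can be sketched as a join between a small local dataset and a large remote one. This toy Python model (prefixes, predicates and data all invented for illustration) mimics what a SPARQL `SERVICE` clause does against a public endpoint:

```python
# A small lab's local triples:
local = {
    ("lab:sample42", "lab:containsProtein", "uniprot:P12345"),
}

# Stand-in for a large public endpoint (e.g. UniProt's):
remote = {
    ("uniprot:P12345", "up:organism", "taxon:9606"),
    ("uniprot:P99999", "up:organism", "taxon:10090"),
}

# Roughly: SELECT ?sample ?taxon WHERE {
#   ?sample lab:containsProtein ?p .
#   SERVICE <remote endpoint> { ?p up:organism ?taxon } }
results = [
    (s, t)
    for (s, p, prot) in local if p == "lab:containsProtein"
    for (s2, p2, t) in remote if s2 == prot and p2 == "up:organism"
]
print(results)
```

The point of federation is that the "remote" side stays on someone else's hardware: the lab never has to host a copy of the big dataset to query across both.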

SPARQL has only been interesting as a query language since last year.
Schema.org has only been interesting since last year. JSON-LD has only been
interesting since last year. Semweb is finally growing into its possibilities
and making real what was promised 17 years ago now.

Of course, even in the life science domain many developers don't know what one
can do with semweb tech, and semweb marketing is nowhere near as effective as
e.g. MongoDB's or even Neo4j's. So uptake is still slow, but it is accelerating!

------
mqsiuser
The company of my professor went bankrupt in 2012 (Ontoprise). Not sure if
they dissolved totally by now.

AI failed (again). I never understood where the "intelligence" lies if the only
thing you can do is infer: if A --> B and B --> C, then also A --> C ("we don't
do anything else, since that wouldn't be _logic_", and then bloating it and
calling it a "Reasoner").

If you can't spin off quickly from academic ideas (like Google search did), it
will just be ongoing research binding masses of people to the wrong things to
pursue. Don't tell me they chose to... still being influenced, and finding out
later that it wasn't worthwhile.

Academia thought it was the next web, but it wasn't. Web 2.0 was the next
web then, leaving the semantic web in the dust.

"When I see the semantic web (of trust) be done (properly), this is basically
when I can retire" (Tim Berners-Lee ~2004).

Just my (honest) thoughts (as someone who spent a significant amount of time
on RDF/OWL et al. at university).

~~~
bemused
No idea why this comment gets downvoted -> I came to the same conclusion,
having done quite some research at university on this topic as well. The
community does a great job of getting huge funding from governments/the EU,
but the results are mostly pathetic from a CS point of view. E.g. I came
across papers/PhD theses where people were fantasizing about all the great
things you could do if automatically merging ontologies were feasible, without
the slightest understanding of computational complexity or the semantics of
natural language in general.

That said, I still see the advantage of semantic-web-style technology in
cathedral-style environments, e.g. corporate knowledge DBs or Wikidata, IF you
can afford the bloat. Most of the time it's much more straightforward to just
use your own schemas and call it a day (like the KDE folks did this year,
finally giving up on getting their RDF database to perform reasonably well and
going back to a relational model for desktop search).

~~~
jerven
Not sure KDE went back to a relational model. The main change I feel was going
from a single central database (Virtuoso) which had a copy of everything to
keeping the data where it is as much as possible and only store copies where
needed for performance.

Virtuoso, while an impressive DB, is not the most stable or resource-friendly
datastore for desktop use.

This decentralised data storage actually makes a lot of sense. And I hope to
work on something similar for life science data, except that the unifying API
will be SPARQL instead of a C++ API. (A single C++ API does make a lot of
sense for the KDE project, where it does not for life science data.)

------
pudo
For what it's worth, I spent last month trying to use RDF tooling (Python
bindings, triple stores) for a project, and the experience has left me
convinced that none of it is workable for an average-size, client-server
web application.

There may well be a number of good points to the graph data model, but in
practice, 16 years of development have not led to production-ready tools; so
my guess is that another year will not fix it.

Here's a write-up: [http://pudo.org/blog/2014/09/01/grano-linked-data.html](http://pudo.org/blog/2014/09/01/grano-linked-data.html)

~~~
parasubvert
I managed to build a reasonably scalable product with the Java-based tooling
from the Jena project, along with the Python bindings - the Clark & Parsia
tools like Stardog are also quite workable.

My experience is that much of it (SPARQL, basic reasoning) is production-ready
and has been for a long time; the problem is that it is hard to constrain
yourself to the subset of features that don't lead to exponential computations.

------
brandonb
Peter Norvig put it best: "The semantic web is the future of the web, and
always will be."

(For what it's worth, the startup school video that quote comes from is worth
watching:
[http://youtu.be/LNjJTgXujno?t=20m57s](http://youtu.be/LNjJTgXujno?t=20m57s))

------
tckr
Absolutely. Given the constant progress in extending schema.org and new work
like JSON-LD and Hydra ([http://www.markus-lanthaler.com/hydra/](http://www.markus-lanthaler.com/hydra/)),
I think we are (slowly) approaching a state where we will see adoption of
semantic schemas in APIs and websites on a wider scale.

~~~
coldtea
If nothing else, I admire your incredible optimism.

------
joostdevries
The way I see it, that technology has been on the cusp of being successful for
a long time.

What the reasons are is largely a matter of opinion. In my opinion there are
several possible reasons:

- The 'semantic' idea of 'strong' modeling of the world lost out to a
competing approach that uses probabilistic models. The latter don't require
coordinated effort by humans and thus scale better.

- The semantic approach also suffered from academics myopically going over the
same millimeters of theory for decades, and losing sight of matters of
practicality.

- The fundamentals of the semantic technology seem rather brittle to me. By
that I mean that a tiny difference in the reasoning axioms can make the whole
reasoning intractable. That might be antithetical to the 'tinkering until it
works' approach that software engineers often use.

- Somehow a broadly applicable killer application didn't turn up. But that's a
result as much as it might be a reason.

There's still use for the technology though. If you need to unify data that
follow subtly different data models, I'd be hard pressed to think of an
alternative. Which makes me wonder whether intelligence agencies use the
technology. For instance, I remember a reference given by Oracle to the US
Geospatial Intelligence Agency using their quad store. The new moniker 'linked
data' emphasises this aspect. Government agencies do struggle with having to
relate data that are obviously related but conceptually and legally subtly
different. And they do spend quite some attention on linked data. They seem to
stay mostly within the RDFS realm and do not stray into more interesting OWL
applications. But even in government circles I get a whiff of the solution-
looking-for-a-problem vibe that has hounded the semantic web for so long.

~~~
irickt
It's important not to dwell on the past. In the present, here are two strong
developments issuing from the original Semantic Web:

Especially in Europe, research is very active and these technologies are core
to many big science projects. (For example in this thread
[https://news.ycombinator.com/item?id=8510885](https://news.ycombinator.com/item?id=8510885)).
These projects are exploring the higher aims of TBL's proposal - with good
cause.

On a more down-to-earth level, there is now a solid web metadata standard in
place in JSON-LD. The big search engines index it and presumably use it to
give better results. Any startup can add value to published data by adding
links - in a significant extension to the "API economy".

Think about it. The base concept of the semantic web is simply a data exchange
format that can be used to implement a distributed relational database - a
pretty practical idea. By way of the false starts common to any broad
initiative (e.g. XML), and withstanding a lot of political spin that I've
never understood, we now have that standard.

Web developers should look at this opportunity with new eyes.

~~~
porker
> On a more down-to-earth level, there is now a solid web metadata standard in
> place in JSON-LD. The big search engines index it and presumably use it to
> give better results. Any startup can add value to published data by adding
> links - in a significant extension to the "API economy".

One pragmatic question is whether schema.org/JSON-LD is of benefit to anyone
other than Google at the moment. I like the idea, but with their dominance it
feels like I am doing work to add value to their business, not mine.

------
bane
A bit of background: I've been working in environments next to, and sometimes
with, large-scale Semantic Graph projects for much of my career -- I usually
try to avoid working near a semantic graph program due to my long history of
poor outcomes with them.

I've seen uncountably large chunks of money put into KM projects that go
absolutely nowhere, and I've come to understand and appreciate many of the
foundational problems the field continues to suffer from. Despite a long
period of time, progress in solving these fundamental problems seems
hopelessly delayed.

The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as
dead as last year's roadkill, though there are plenty out there that pretend
that's not the case. There's still plenty of groups trying to revive the
original idea, or like most things in the KM field, they've simply changed the
definition to encompass something else that looks like it might work instead.

The reasons are complex, but it basically boils down to: going through all the
effort of adding semantic markup with no guarantee of a payoff for yourself
was a stupid idea.

You can find all kinds of sub-reasons why this was stupid: monetization, most
people are poor semantic modelers, technologies built for semantic systems
generally suck and are horrible (there's pitifully few reasoners built on any
kind of semantic data; turns out that's hard), etc.

For years the Semantic Web was like nuclear fusion: always just a few years
away. The promise was always "it will change _everything_", yet no concrete
progress was being made, and the vagueness of "everything" turned out not to
be a real compelling motivator for people to start adding semantic information
to their web projects.

What's actually ended up happening instead has been the rebirth of AI. It's
being called different things these days: machine learning, heuristic
algorithms, whatever. But the point is, there's lots of amazing work going
into things like image recognition, context sensitive tagging, text parsing,
etc. that's finding the semantic content within the human readable parts of
the web instead. It's why you can go to google images and look for "cats" and
get pictures of cats.

Wikipedia and other sources have also started to look more structured than
they previously were, with nice tables full of data. These tables have the
side benefit of being both machine and human readable, so when you look for
"cats" in Google's search you get a sidebar full of semantic information on
the entity "cats": scientific name, gestation period, daily sleep, lifespan,
etc.

Like most things in the fad driven KM world, Semantic Web advocates are now
simply calling this new stuff "The Semantic Web" because it's the only area
that kind of smells like what they want and is showing progress, but it really
has nothing to do with the original proposal and is simply a side-benefit of
work done in completely different areas.

You might notice this died about the same time "Mashups" died. Mashups were
kind of an outgrowth of the Semantic Web as well. One of the reasons that
whole thing died was that existing business models simply couldn't be reworked
to make it make sense. If I'm running an ad driven site about Cat Breeds,
simply giving you all my information in an easy to parse machine readable form
so _your_ site on General Pet Breeds can exist and make money is not something
I'm particularly inclined to do. You'll notice now that even some of the most
permissive sites are rate limited through their API and almost all require
some kind of API key authentication scheme to even get access to the data.

Building a semantic web where huge chunks require payment and dealing with
rate limits (which appear like faults in large Semantic Networks) is a plan
that will go nowhere. It's like having pieces of your memory sectioned off
behind tolls.

Here's TBL on this in 2006 -
[http://eprints.soton.ac.uk/262614/1/Semantic_Web_Revisted.pd...](http://eprints.soton.ac.uk/262614/1/Semantic_Web_Revisted.pdf)

"This simple idea, however, remains largely unrealized."

There's a group of people I like to call "Semanticists" who've come to latch
onto Semantic graph projects, not as a technology, but as a religion. They're
kind of like the "6 minute abs" guy in "There's Something About Mary". They
don't have much in the way of technical ideas, but they understand the
intuitive value of semantic modeling, have probably latched onto a
specification of some sort, and then belief carries them the rest of the way:
"it'll change everything".

But they usually have little experience taking semantic technologies to
successful projects (success being defined as not booting up the machine and
loading the graph into memory, but actually producing something more useful
than some other approach).

Then there's another group of Semanticists: they recognize that the approaches
proposed so far have kind of dead-ended, but they won't publicly announce
that. Then, when some other approach not affiliated with the SW makes progress
(language-understanding AI, for example), they will simply declare this new
approach part of the SW and then claim the SW is making progress.

The truth is that Doctorow absolutely _nails_ the problems in his essay
"Metacrap"
[http://www.well.com/~doctorow/metacrap.htm](http://www.well.com/~doctorow/metacrap.htm)

He wrote this in 2001, and the issues he talks about _still_ haven't been
addressed in any meaningful way by professionals working in the field, even
new projects routinely fall for most or all of these problems. I've seen
dozens of entire companies get formed, funded and die without addressing even
a single one of these issues. This essay is a sobering measuring stick you can
use to gauge progress in the field, and I've seen very few projects measure
well against any of these issues.

Semanticists, of both types, are holding the entire field back. If you are
working on a semantic graph project of _any_ kind and your project doesn't
even attempt to address any of these things through the design of the program
(and not through some policy directive or modeling process) you've failed.
It's really hard for me to believe that we're decades into Semantic Graph
technologies and nobody's bothered to even understand 2.5 and 2.7.

If your plan to fix the problems you're experiencing with your project - the
reason it isn't producing useful results - is to "continue adding data to it"
or "keep tweaking the semantic models", you've failed.

[http://semanticweb.com/keep-on-keepin-on_b41339](http://semanticweb.com/keep-on-keepin-on_b41339)

"The Semantic Web is not here yet."

No, I've rethought this, the SW is not like Fusion, it's more like Communism.

~~~
a3n
> The truth is that Doctorow absolutely nails the problems in his essay
> "Metacrap"
> [http://www.well.com/~doctorow/metacrap.htm](http://www.well.com/~doctorow/metacrap.htm)

"Take eBay: every seller there has a damned good reason for double-checking
their listings for typos and misspellings. Try searching for "plam" on eBay.
Right now, that turns up nine typoed listings for "Plam Pilots."

I wonder, are there search tools, anywhere from functions to libraries to
engines, that will search for mis-spellings? Google, DDG and probably everyone
else will _correct_ your mis-spelled query, but will anything large or small
go the extra _miel_ and search for mis-spelled hits?

~~~
bane
There are some query expanders that do this. Some are pretty sophisticated,
calculating the n most likely misspellings based on a statistical model
generated from some large corpus, or based on models of human typing behavior
on QWERTY keyboards.

For example, if you search google right now for "plam pilot" you'll get
results for "palm pilot".

~~~
andrewflnr
GP is asking about searching for "palm pilot" and getting results for "plam
pilot".

~~~
a3n
Yes, that's exactly what I'm asking.

------
mindcrime
_What happened?_

The Semantic Web happened, and is still happening. But most people don't
notice, because the Semantic Web isn't, for the most part, about being visible
to end users. But every site using microdata, microformats, RDFa, etc. _IS_
part of the Semantic Web.

Google, Yahoo, Bing, etc., are all using elements of the Semantic Web.

Just because the average end-user isn't writing SPARQL queries doesn't mean
the Semantic Web isn't around.

 _Are there still startups working in that space?_

We are. I just gave a presentation on using Semantic Web tech in the
enterprise at All Things Open last week, and a related talk at the Triangle
Java User's Group earlier in the week, where we showed off a lot of the ways
we are using the SemWeb stack.

 _Are people still interested?_

Judging from the response to the two talks I just gave, I'd say yeah.

For more on my take on this topic, see:

[http://fogbeam.blogspot.com/2013/11/dominiek-ter-heide-is-de...](http://fogbeam.blogspot.com/2013/11/dominiek-ter-heide-is-dead-wrong.html)

------
georgespencer
Tantek is still alive, if that's what you mean.

Jokes aside: microformats started to get pretty good traction, but the biggest
challenge (as I see it) has always been adoption in software, rather than
encoding the data itself.

Flock was the great white hope in this space for a browser which (somewhat
sensibly) used the semantic web to enrich people's lives, but there wasn't
really a killer app for it. It did a lot of interesting stuff, but none of it
was omfg can't live without youuuuu.

If browser vendors start building great features to take advantage of the
semantic web, then developers will start adopting and consumers will start
[tacitly] demanding it.

Interesting point: if you go back to the SciAm article which kickstarted a lot
of interest in the semantic web amongst relative laypeople, then you'll find
that actually it's not dissimilar to where we are today, but we are getting
there without the semantic web.

~~~
mindcrime
_If browser vendors start building great features to take advantage of the
semantic web, then developers will start adopting and consumers will start
[tacitly] demanding it._

I think browsers have little to do with the Semantic Web. The semantic web is
more M2M.

~~~
georgespencer
I'm trying to think of real world applications for the semantic web, in 2014,
which do not require a browser. Struggling.

~~~
mindcrime
I just mean that there's not necessarily a need for any Semantic Web tech _in_
the browser specifically. An application may certainly have a browser
interface, but, to my mind, most of the "stuff" that involves working with
RDF, OWL, SPARQL, etc. is server-side / behind-the-scenes stuff that the user
wouldn't touch directly. The various browser plugins that let you look at
embedded RDFa, microformats, etc. are nifty, but I don't personally consider
that kind of stuff a primary use-case for semweb tech.

~~~
Fannon
My problem with that is: most modern web apps run largely (sometimes
entirely) in the browser. And the Semantic Web stack is very heavy if you
want to use it client-side; here every kilobyte counts.

Well, you could directly query a SPARQL database, but that's the one scenario
where a server still makes sense: to provide some decoupling between the
database model and the client.

~~~
mindcrime
 _My problem with that is: most modern web apps run largely (sometimes
entirely) in the browser._

"Use the right tool for the job". :-) There's no particular reason a "modern
web-app" has to run completely in the browser... it still makes perfect sense
to put heavyweight process intensive stuff, persistence, business logic, etc.
on the server-side. If I'm building an app that uses a lot of "semantic
stuff", most of that stuff is, indeed, happening on the server side.

~~~
Fannon
Well, there are reasons, and it's getting more common. In fact, a modern
browser, even on a smartphone, may have more computing power available than a
server, if you consider that the client only has to serve itself, and modern
JavaScript engines perform well. Having to do a lot of networking with a
server also slows things down, since the network is one of the slowest and
least reliable parts of an app.

So your answer sounds more like an excuse than a solution to me, sorry.

------
charlysisto
Good ideas don't always follow a linear path. The buzz around the semantic
web probably didn't take the measure, at the time, of all the obstacles it
would find in its way. Agreeing on categories is one of them, but IMHO the
biggest one is legacy content and the inertia it imposes on change.

I use DBpedia on a toy project and really appreciate it, although I only use a
very shallow portion of its possibilities. And it's still very brittle around
the edges. Also, I don't see it gaining any momentum if it's not embraced by
more of the big content players.

An interesting question would be: would the semantic web favor Google? It
would certainly help it index content, but wouldn't it also deprive it of its
search monopoly?

~~~
robryan
Google is probably one of the biggest drivers of structured data on websites.
They pull a lot of the additional search features, such as rating stars, from
structured data. [1]

As websites will do anything to stand out in search results, there is large
uptake in adding structured data.

[1]
[https://support.google.com/webmasters/answer/99170?hl=en](https://support.google.com/webmasters/answer/99170?hl=en)
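As a sketch of what that structured data looks like, here is a schema.org snippet rendered as JSON-LD from Python. The property names (`Product`, `AggregateRating`, `ratingValue`, `reviewCount`) are real schema.org vocabulary; the product and the numbers are invented for illustration:

```python
import json

snippet = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.4",
        "reviewCount": "89",
    },
}

# Embedded in a page like this, it's what a crawler reads to decide
# whether to show rating stars next to the result:
markup = '<script type="application/ld+json">%s</script>' % json.dumps(snippet)
print(markup)
```

The same vocabulary can also be expressed as microdata or RDFa attributes in the HTML itself; JSON-LD just keeps it in one block.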

~~~
charlysisto
I was aware of it, but I find it's the 'minimum syndical', as we say in French
(the bare minimum). A real way of embracing the semantic web would be, for
example, to expose a Linked Data compatible subset of their indexed content
(a la DBpedia).

The way they expose the web today is unidimensional: keywords => related
website list.

It's great for humans to parse, but it's extremely limited for machines. Why
isn't there yet an API or a UI to ask for all books by Japanese novelists of
the last 2 centuries (website links included per book)? I am sure this is
within Google's reach and of great interest to everybody (replace books with
manga if you wish).

And that I would call embracing semantic web.

------
sgt101
In 2003 I was in an EU project (Agentcities) that did interoperability
demonstrations for distributed knowledge applications. We had to develop and
use ontologies for ticketing, transport - and a few other similar things.

My honest expectation was that building these ontologies would take about a
week of group effort. As I remember it we were still at it months later.

This convinced me that the SW was a bust. Moreover, one of the roadblocks we
hit was very illuminating to me.

Creating the ontologies and their onward development/extension was hard
because the tools were so poor (also other things - like, it is just... hard),
but the lack of tools was widely noted as a clear issue.

No one did anything about it. We had protege then, and lo it is so now.

------
Thiz
Semantic web was replaced by an API.

See, people were trying to ram information down the throat of the presentation
layer - and what for, if our eyes were never to see it?

If you need meaning, ask for it gently; an API will provide it. And leave HTML
alone.

~~~
icebraining
You're confusing the Semantic Web with one of its particular technologies
(RDFa). It's absolutely not required to stuff the semantic attributes in the
HTML, you can use the Accept header (or even different URLs) to differentiate
between HTML and RDF documents.

If you use JSON-LD, it's not unlike implementing a REST/JSON API, except that
it allows you to reuse standard formats and integrate with other APIs in an
easier way.
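The Accept-header approach can be sketched with a few lines of Python. This is a deliberately simplified content-negotiation function (a real server would also honor q-values and wildcards properly; the media types are real, the handler is invented):

```python
def negotiate(accept_header: str) -> str:
    """Pick a representation for one URL from the Accept header:
    machines asking for RDF get it, browsers get plain HTML."""
    for media_type in ("application/ld+json", "text/turtle", "text/html"):
        if media_type in accept_header:
            return media_type
    return "text/html"  # sensible default for anything else

print(negotiate("application/ld+json, */*;q=0.1"))   # a machine client
print(negotiate("text/html,application/xhtml+xml"))  # a browser
```

The point icebraining makes holds here: the same resource serves both audiences without stuffing RDFa into the HTML.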

------
coldtea
It was never a thing outside of some fringe companies (academic spin-offs and
the like) and some academic research inspired by Tim Berners-Lee's ideas.

Oh, that, plus the misappropriation of the term "semantic" by the designer
community for BS like working with DIVs instead of TDs and having hierarchical
document sections in HTML documents (ad hoc per website), something that never
gave any particular advantage, not even for screen readers (which were from
the start designed to cope with the mess that document structure in the real
world is).

~~~
PavlovsCat
It's easy to downvote this, but what _are_ the advantages of using the "new
semantic HTML5 tags" like section and aside? I've read plenty of articles
about them, and in all of them the advantages were purely theoretical; sure,
search engines or other user agents _could_ do interesting things with it.
But do they?

~~~
eponeponepon
It's a bit chicken-and-egg, really. If nobody uses them, nobody else will ever
bother looking for them - but conversely, if nobody's looking for them,
nobody'll use them. Someone has to make the first move.

There's some traction in academic publishing - aside in particular has some
utility for popup/inline footnoting on some EPUB platforms, and some of the
other new HTML5 elements are finding a place in online journal presentations,
marking up formal abstracts and so on. There's nothing terribly clever going
on yet, though, as far as I'm aware, and anything that is will be very much
platform- or publisher-specific.

~~~
coldtea
> _It 's a bit chicken-and-egg, really. If nobody uses them, nobody else will
> ever bother looking for them - but conversely, if nobody's looking for them,
> nobody'll use them. Someone has to make the first move._

Except that, you know, the universe doesn't work this way, and people and
entropy will abuse and misuse any such system at web scale, just as Cory
Doctorow suggests.

------
bastawhiz
One of the biggest barriers to the semantic web is the barrier to entry.
Scraping web pages is hard. Parsing HTML (which probably doesn't validate) is
hard. Extracting semantic meaning from parsed HTML is hard.

Even once you've piled on the libraries and extracted the bit of information
that you need, what do you do with that data? You process it a bit and store
it in some kind of data structure. But at this point, you could have just
pinged the website's API and gotten the same data (and more) in a data
structure.

It turns out it's a heck of a lot easier to return a blob of JSON than it is
to process text in markup on a page. And smaller, as well: JSON often takes up
far less space than the corresponding markup for the same content. That's a
big deal when you're processing a very large amount of information.
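The size argument above is easy to see with a toy comparison: the same record
(invented for illustration) expressed as a plain JSON blob versus as HTML
annotated with schema.org-style microdata.

```python
import json

# One made-up record expressed two ways.
record = {"name": "Ask HN", "author": "sysk", "points": 120}
as_json = json.dumps(record)

# The equivalent content wrapped in microdata-annotated markup.
as_html = (
    '<div itemscope itemtype="http://schema.org/Article">'
    '<span itemprop="name">Ask HN</span>'
    '<span itemprop="author">sysk</span>'
    '<span itemprop="points">120</span>'
    '</div>'
)

# The JSON payload is markedly smaller than the annotated markup.
print(len(as_json), len(as_html))
```

The gap only widens once real pages add layout, styling hooks, and the rest of
the non-semantic markup that a scraper has to wade through.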

There's the promise that AI will someday make this easier: if you eliminate
the parse-and-process-into-a-data-structure step and just assume that an AI
will do that part for you, you're in good shape. But that's nowhere near being
a practical reality for virtually all developers, and APIs eliminate this step
for you.

Even if you use something like HTML microdata, there's very few consumers of
the data. Some browsers give you access to it, but that doesn't make it
extremely useful: if you generated the data on the server side, why not just
make it into a useful interface? Or expose the data as raw data to begin with?
Going through the extra effort to use these APIs is a redundant step for most
use cases.
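To make the "why bother consuming microdata" point concrete, here is a minimal
sketch of what a microdata consumer has to do even in the trivial case, using
only Python's standard-library HTML parser (the sample markup and property
names are invented):

```python
from html.parser import HTMLParser


class MicrodataScraper(HTMLParser):
    """Collects itemprop name -> text pairs from microdata-annotated HTML.

    Deliberately naive: real microdata (nested itemscopes, itemref, etc.)
    needs far more machinery than this.
    """

    def __init__(self):
        super().__init__()
        self._prop = None
        self.items = {}

    def handle_starttag(self, tag, attrs):
        # Remember the itemprop (if any) so the next text node is captured.
        self._prop = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self._prop:
            self.items[self._prop] = data.strip()
            self._prop = None


scraper = MicrodataScraper()
scraper.feed(
    '<div itemscope><span itemprop="name">Ask HN</span>'
    '<span itemprop="author">sysk</span></div>'
)
print(scraper.items)  # {'name': 'Ask HN', 'author': 'sysk'}
```

All of that effort recovers a dict the server could have sent as JSON in the
first place, which is the parent comment's point.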

------
sktrdie
As a beginner semantic web researcher I believe it is very successful. Just
came back from an international semantic web conference [1] where a variety of
useful, interesting and important things have been presented.

I want to stress that the semantic web will probably always be a more academic
field rather than "the next big thing". Similar to the artificial intelligence
field or other academic fields that don't always have to lend themselves to
"help industry make more money".

Nonetheless lots of our technologies are being used by industry (startups and
enterprises). But again, our success doesn't depend on industry adoption. It
depends on how useful the things we research are from a research point of
view - which is sometimes more theoretical.

As an example, most semantic web people think RDF is a more useful model than
ad hoc data models because we care about generality and serendipitous reuse of
data. Industry cares more about tools that are simple, efficient, and fast.
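The "serendipitous reuse" property of the triple model can be sketched in a
few lines: two independently produced datasets merge by plain set union, with
no schema migration, and a join query works across both. All URIs and names
here are invented for illustration.

```python
# Triples as (subject, predicate, object) tuples; two independent sources.
dataset_a = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
}
dataset_b = {
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "ex:worksAt", "ex:acme"),
}

# Integration is just set union - no migration, no schema negotiation.
merged = dataset_a | dataset_b

# Query spanning both sources: names of everyone Alice knows.
known = {o for s, p, o in merged if s == "ex:alice" and p == "foaf:knows"}
names = [o for s, p, o in merged if p == "foaf:name" and s in known]
print(names)  # ['Bob']
```

An ad hoc relational schema would have needed the two parties to agree on
tables and columns up front; the triple model defers that agreement to query
time, which is exactly the generality the comment is describing.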

All in all I hope some of our findings can be useful to industry and adopted
by many startups but I hope we'll always find a niche where we can do things
that industry can't afford to do. In a way, semantic web exists as an academic
field precisely because it's unexplored by industry. If it were widely used,
perhaps we'd never need academic fields as industry would take over. I hope
that never happens for the semantic web.

1\. [http://iswc2014.semanticweb.org/](http://iswc2014.semanticweb.org/)

------
jacquesm
Nobody will ever agree with anybody else on what the right hierarchy is for
any given set of metadata so I think that hierarchies will eventually die off.
But tags are metadata too and they are getting more and more common. They're
unstructured in many ways (and therefore messy) but they do a pretty good job
where hierarchies failed to gain good traction.

~~~
sesuncedu
Yes, and no, Minister.

Hierarchy will not die off because it is a core part of the way that the mind
organizes concepts - for example, prototype effects imply several different
levels.

Folk Taxonomies are hierarchical, though the depth is usually smaller than
that of scientific taxonomies and the principles of organization are usually
different.

There can be many ways of arranging the same concepts, and although it _is_
possible to show that some are incorrect, it is in general impossible to show
that one and only one Ontology is correct; this follows from the indeterminacy
of translation (Google Gavagai!).

Attempts to force an alien Ontology onto subject matter experts breaks them.
They just stop performing at an expert level.

It is usually possible to develop a suite of ontologies that are logically
interoperable, but this requires experts who have skills from a variety of
disciplines AND who are capable of deferring to the SMEs on how they see the
world. It may be necessary to have intermediate mapping ontologies, but if the
ontologists working with the different communities of interest are careful,
these mappings can avoid losing meaning.

Tagging as ad-hoc keywords does not work for data interoperability; it also
usually fails to achieve good recall. Flat lists of controlled terms are
usually difficult to apply unless they are very small. When Thomas Vander Wal
coined the term Folksonomy, it was intended to cover the same kinds of
structures as folk taxonomies. Its subsequent application to describe
unstructured lists of tags was a misappropriation.

RDF and OWL added some extra problems. They were designed without much input
from people with actual use cases, and optimised for the wrong things. Some
things were dumbed down because some people didn't understand what had gone
before, and could not understand why the old folk were kicking up such a fuss.
Other things were constrained in order to make OWL DL decidable, even though
the resulting worst-case complexity of 2-NEXPTIME means that implementations
have to optimize for special cases, time out on queries that run too long,
and/or limit expressivity to sub-profiles just in order to work.

Other design decisions did not consider the human factors of using OWL. It is
very difficult to explain to people why restricting the range of a property
for a specific class is handled by adding an unnamed superclass. It is also
difficult to explain the Open World Assumption and the non-unique-name
assumption, or why, in a web environment with no standard world-closing
mechanism, making everything monotonic is necessary.

It is especially hard to justify the restriction of RDF to binary predicates:
some predicates are intrinsically of higher arity. Just because higher-arity
predicates can be misused, and everything can be reduced to binary, does not
make that reduction desirable.
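The reduction being criticized is the standard n-ary-relations workaround: an
intrinsically ternary-or-more fact is expressed by minting an intermediate
node and hanging binary triples off it. A toy sketch, with all identifiers
invented, of "alice was employed at acme since 2010":

```python
# The n-ary fact cannot be a single (s, p, o) triple, so it is reified
# via an intermediate "statement" node (a blank-node stand-in here).
stmt = "_:employment1"
triples = [
    (stmt, "rdf:type", "ex:Employment"),
    (stmt, "ex:employee", "ex:alice"),
    (stmt, "ex:employer", "ex:acme"),
    (stmt, "ex:since", "2010"),
]

# Recovering any participant now requires walking through the
# intermediate node instead of reading one triple directly.
employee = next(o for s, p, o in triples if s == stmt and p == "ex:employee")
print(employee)  # ex:alice
```

One four-place fact becomes four triples plus an extra node, which is the
human-factors cost the comment is pointing at: the encoding works, but it no
longer looks like the fact the modeler had in mind.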

Having a model that does not match the existing experience of UML modelers,
database designers, or users of other KR systems causes real problems for real
users.

Nevertheless there is baby in the bathwater, and it can become soup. It just
might not look the same.

The schema.org effort is limited to the extent that it is barely semantic (a
conscious decision by danbri and guha); unfortunately, some of the discussions
by others show a visceral disdain for any questions as to what vocabulary
choices would actually mean - an attitude that is almost anti-semantism.

------
hocuspocus
As someone who had to implement PoCs using semantic web tech at a big
company, I'd say it's still limited to academia and very specific fields in
the industry (like bio-medical research).

On an anecdotal note, no recruiter has ever contacted me because of these
particular keywords on my LinkedIn profile.

------
kriro
Not sure about the semantic web as defined, but there's quite a bit of
metadata in some places. Last time I checked, a full tweet was about 4 KB,
which is quite a lot for 140 characters of text :)

If metadata is useful it will be used but pre-emptively adding semantic
information to everything seems pretty unlikely.

------
bemused
[http://www.cse.buffalo.edu/~rapaport/676/F01/icecreamontolog...](http://www.cse.buffalo.edu/~rapaport/676/F01/icecreamontology.jpg)

------
padde
Wikidata is a good example where "semantic web technology" is really useful,
imo.

[http://www.wikidata.org](http://www.wikidata.org)

------
blablabla123
Like 10 years ago some people said Web 3.0 will be either the Internet of
Things or the Semantic Web.

IoT is becoming a reality, both technology wise but also financially. But
Semantic Web? Some ideas of it are there but I think we are not there yet.

FWIW the W3C has several standards for semantic information and there are even
more in progress. I'm having the impression though that the field is still
heavily academia focused.

~~~
rjsw
I work on an ISO CAD standard [1]. There has been some work done on
converting it to a mixture of RDF and OWL, but it isn't ready for serious use
yet; file sizes increase about 10x. If it were being used for real, though,
some data could be linked to rather than copied around.

[1]
[https://en.wikipedia.org/wiki/ISO_10303](https://en.wikipedia.org/wiki/ISO_10303)

~~~
blablabla123
Ok crazy... Well I believe the proprietary CAD standards are all just some
binary mess, maybe the 10x is worth it?!

~~~
rjsw
The exchange format for ISO 10303 is ASCII files, not binary ones; the 10x
expansion is just from wrapping XML tags around the data. I guess we should
look at JSON-LD though.

------
math0ne
Google has recently started to embed some semantic elements into search
results: [http://googleresearch.blogspot.ca/2014/09/introducing-
struct...](http://googleresearch.blogspot.ca/2014/09/introducing-structured-
snippets-now.html)

------
kirkyz
The semantic thing - modelling language nodally - is inevitable. If you get
it, it is simple - the question is when. Tools will enable nodal linking as
standard when the need to communicate becomes blinding. Today Google has its
internal nodes, as we all see - and its intelligence grows more powerful by
the day as a result of that nodal step change. But it will not be the only
creator of coherence - it will not want to be, as that would be stupefying.
We all need external predication (not by unstructured text alone - although I
note that text is our evolved method of creating a node, just a little
rougher than a GUID). Tools to link will be created so information points can
fluidly relate to each other. If we talk to computers today, why will they
not talk to each other tomorrow? Is that far away? So then we must just ask
whether the SW framework is well conceived. Personally I like the simplicity
and power of attribute-value, and an ID that I can relate.

------
rainhacker
Digital Enterprise Research Institute (DERI) in Ireland specializes in
semantic research: [http://www.deri.ie/](http://www.deri.ie/)

~~~
djulius
Specialized, in fact: they "pivoted" some years ago and laid off much of the
staff.

------
CmonDev
People suddenly decided that languages designed for no more than interactive
documents are good enough for apps. We need to wait until the js.MVC craze
fades as it should.

------
oneloop
Big data kinda reduced the need for it. People figured that it's easier to
bring the masses to the computers than the computers to the masses.

------
roboben
Was it ever a thing outside of a university?

~~~
csirac2
I'm not sure that lack of adoption of sem-web tech (even the loosely defined
kind) in sites serving cat pictures is a big deal.

For some, semantic web means RDF and linked data. Of course, the "full
promise" is queries that have the ability to draw inferences from indirect
relationships in the data - admittedly the few examples I've seen, whilst
remarkable, perform best when there's only one homogeneous underlying
dataset. Where you have disparate datasets from many
organizations/institutions (and the data spans decades), these things struggle
outside of demos due to the huge work required in normalizing/mapping onto
something common that can be sensibly queried against: and sometimes that's
even when the same ontologies are in use! The underlying data just doesn't
necessarily map very well into the sem-web representations, so duplicates
occur and possible values explode in their number of valid permutations even
though they all mean the same handful of things. And it's the read-only
semantic web, so you can't just clean it; you have to map it...

Which is why I'm always amazed that
[http://www.wolframalpha.com/](http://www.wolframalpha.com/) works at all. And
hopefully one day [https://www.freebase.com/](https://www.freebase.com/) will
be a thing. I remember being excited about
[http://openrefine.org/](http://openrefine.org/) for "liberating" messy data
into clean linked data... but it turns out that you really don't want to
curate your information "in the graph"; it seems obvious, but traditional
relational datasets are infinitely more manageable than arbitrarily connected
nodes in a graph.

So, most CMS platforms are doing somewhat useful things in marking up their
content in machine-readable ways (RDFa, schema.org [as evil as that debacle
was], HTTP content-type negotiation and so on) either out-of-the-box or with
trivially installed plugins.

If you look around, most publishing flows are extremely rich in metadata. For
all sorts of things, like describing news articles [1], journal articles [2]
(DOIs weren't built to serve the semantic web, but certainly are rich in
metadata), movie/book/audio titles and their content...

Beyond that, we just had GovHack [3] here in Australia a few months ago where
groups were encouraged to do what they could with public government datasets
(which themselves, again, aren't necessarily "semantic web" but are
increasingly using "linked-data" formats/standards for, if not
interoperability, then at least dataset discovery). There are RDF
representations of everything from parliamentary archives [4] to land use [5].

I've personally seen some great applications of inter-organizational data
mashing/sharing/discovery in materials science and a few years ago I really
enjoyed working with bioinformatics services such as [6] which allows some fun
SPARQL queries to answer interesting questions.

[1] [https://developer.ap.org/ap-metadata-
services](https://developer.ap.org/ap-metadata-services) [2]
[http://www.opendoar.org/](http://www.opendoar.org/) [3]
[http://www.govhack.org/](http://www.govhack.org/) [4]
[http://www.parliament.vic.gov.au/vufind/Record/46420/Export?...](http://www.parliament.vic.gov.au/vufind/Record/46420/Export?style=RDF)
[5]
[http://www.data.gov.au/dataset/ca83e9bf-1220-43b6-b332-206bf...](http://www.data.gov.au/dataset/ca83e9bf-1220-43b6-b332-206bf0daa258.rdf)
[6]
[http://biodiversity.org.au/confluence/display/nsl/services](http://biodiversity.org.au/confluence/display/nsl/services)

~~~
mark_l_watson
freebase.com is a real thing: along with DBPedia data, freebase.com is a major
input for Google's Knowledge Graph.

I agree that wolfram alpha rocks - I wish it were less expensive to use
though.

------
auggierose
It's funny, I asked exactly the same question in a seminar about 2 weeks ago.
The answer I got was, there are about 500 researchers assembling to talk about
it (somewhere in Italy, I think), why don't you ask them?

~~~
scast
He's talking about ISWC, which was last week.

------
eskimobloood
What about Facebook's Open Graph?

------
Fannon
I've looked into Semantic Web technologies for a year now and am trying to
come to a personal conclusion at the moment. This is my current state, though
some of it may be premature:

PRO:

* I can see that the semantic annotation part of it is spreading. Schema.org / JSON-LD might be the first pragmatic solution that I can imagine actually getting widespread acceptance, especially if existing frameworks / CMSes add support by default.

* Semantic Annotations are helping big companies like Google to make their products smarter and this is happening right now.

* Semantic Web tries to solve some real problems, not just "academic" problems. Information and knowledge is indeed rather unconnected, which reduces its value tremendously. Right now APIs are growing to make this more accessible, but many problems remain unsolved.

* The Semantic Web has some truly interesting ideas and concepts that I've grown to like. Of course nearly every one of them could work without buying into the whole Semantic Web. But still, I think some very interesting ideas come out of that community.

CON:

* It takes a lot of time to understand the Semantic Web correctly, and learning about the technologies behind it soon gets very mixed up with a lot of complicated and rather uncommon concepts, like ontologies.

* The tools (even triplestores) feel awkward and years behind what I'm used to as a web developer. There are a LOT of tools, but most seem to be abandoned research projects which I wouldn't dare to use in production.

* It gets especially complicated when entering the territory of the Open World Assumption (OWA) and the implications it has on reasoning and logic. Say you want hard (real-time) validation because data is entered through a form on a website. Asking some people from the Semantic Web domain, the answers varied from "I don't know how to do this" up to "It's complicated, but there is some research..., an additional ontology for that...". I'm kind of shocked, since this is one of the most trivial and common things to do on the web. And I really don't want to add another complex layer onto an already complex system just to solve simple problems. Something's wrong with the architecture here.

* OWA might be interesting, but most applications / databases are closed world, and trying to fit them into open-world logic would make many things very complicated. OWA is an interesting concept and makes sense if you are building a distributed knowledge graph (which is a huge task, and only a few have the resources to do it), but most people will want to stay closed world just because it's much easier to handle. The Semantic Web seems to ignore reality here and prefers to be idealistic, imho.

* This sums up, for me, to one big problem: Semantic Web technologies provide solutions to some complex problems, but also make some very easy things hard to do. As long as they don't provide smart solutions (with a reasonable amount of learning / implementation time) to existing problems, I don't see them being adopted by the typical web developer.

* There are not enough pragmatic people around in the community who actually get nice things done and produce that "I want that, too!" effect.
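The OWA complaint in the list above is easy to demonstrate with a toy query
function (data and predicates invented): under the Closed World Assumption a
missing fact is simply false, while under the Open World Assumption it is
merely unknown - which is exactly why hard form validation is awkward to
express in OWA terms.

```python
# One known fact; everything else is absent from the dataset.
facts = {("ex:alice", "ex:hasEmail", "alice@example.org")}


def holds(triple, open_world=True):
    """Evaluate a triple under open- or closed-world semantics."""
    if triple in facts:
        return True
    # CWA: absence means false. OWA: absence means we just don't know.
    return None if open_world else False


query = ("ex:bob", "ex:hasEmail", "bob@example.org")
print(holds(query, open_world=False))  # False - closed world says "no"
print(holds(query, open_world=True))   # None - open world can't say "no"
```

A web form validator needs that definite `False` ("this field is missing"),
which is why bolting validation onto an open-world stack requires an extra
closing mechanism rather than falling out of the model for free.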

