Some of those startups exited for hundreds of millions, providing, for example, the metadata in the right hand pane of Google search.
The new action buttons in Gmail, adopted by Github, are based on JSON-LD: https://github.com/blog/1891-view-issue-pull-request-buttons...
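For the curious, the markup behind those buttons is roughly of this shape — a JSON-LD island embedded in the email body. This is a sketch, not the exact vocabulary Gmail requires (the issue URL and action name here are made up):

```python
import json

# Sketch of the JSON-LD markup an email can embed so Gmail renders an
# action button. Property names follow schema.org's ViewAction; treat the
# exact keys and the URL as illustrative placeholders.
markup = {
    "@context": "http://schema.org",
    "@type": "EmailMessage",
    "potentialAction": {
        "@type": "ViewAction",
        "url": "https://github.com/example/repo/issues/42",  # hypothetical issue URL
        "name": "View Issue",
    },
}

snippet = '<script type="application/ld+json">\n%s\n</script>' % json.dumps(markup, indent=2)
print(snippet)
```

The nice part is that the payload is plain JSON: any consumer that doesn't care about the linked-data semantics can still parse it with a stock JSON parser.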
JSON-LD, which is a profound improvement on and compatible with the original RDF, is the only web metadata standard with a viable future. Read the reflections of Manu Sporny, who overwhelmed competing proposals and bad standards with sheer technical power: http://manu.sporny.org/2014/json-ld-origins-2/
There's really no debate any more. We use the technology born of the "Semantic Web" every day.
JSON-LD, which is a profound improvement on and compatible with the original RDF,
I don't think this is right. JSON-LD is RDF. What it isn't is RDF/XML. But it's important to realize that RDF != RDF/XML. JSON-LD is just an encoding of the abstract RDF triple model in JSON, just like RDF/XML is an encoding of RDF into XML.
Also, especially important is that it supports sets and lists, which RDF does not.
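To make the "JSON-LD is just an encoding of the triple model" point concrete, here's a deliberately minimal illustration (not a conforming JSON-LD processor) of how a flat node object maps onto triples: @context maps short keys to IRIs, @id names the subject, and every other key/value pair becomes one triple:

```python
import json

# Toy mapping from a simple JSON-LD node object to RDF-style triples.
# Real JSON-LD expansion handles far more (nested nodes, @type coercion,
# lists, graphs); this only covers the flat, single-node case.
doc = json.loads("""
{
  "@context": {"name": "http://schema.org/name",
               "knows": "http://schema.org/knows"},
  "@id": "http://example.org/alice",
  "name": "Alice",
  "knows": {"@id": "http://example.org/bob"}
}
""")

def to_triples(node, ctx):
    subject = node["@id"]
    for key, value in node.items():
        if key.startswith("@"):          # @context, @id are not predicates
            continue
        predicate = ctx[key]             # expand the short key to an IRI
        obj = value["@id"] if isinstance(value, dict) else value
        yield (subject, predicate, obj)

triples = sorted(to_triples(doc, doc["@context"]))
for t in triples:
    print(t)
```

Swap the @context for a different vocabulary and the same JSON document yields different triples — which is exactly the sense in which JSON-LD is an encoding of the abstract model rather than a new model.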
>I’ve heard many people say that JSON-LD is primarily about the Semantic Web, but I disagree, it’s not about that at all. JSON-LD was created for Web Developers that are working with data that is important to other people and must interoperate across the Web. The Semantic Web was near the bottom of my list of “things to care about” when working on JSON-LD, and anyone that tells you otherwise is wrong. :P
>TL;DR: The desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web. If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.
I offered that link because it gives the background that led to a real, practical metadata standard. The "semantic web" has evolved into linked data, tied together on the public side with JSON-LD and on the private side by whatever implementation technologies are at hand.
Manu beat incredible political odds to bring the basic intent of the semantic web forward. One strategy of the linked data community has been to distance itself from "semantic" because the word triggers such rage, as evidenced by the tone of argument in this topic.
It is a mystery to me why these developers waste their time on straw-man hyperbole, offended by a decade-old proposal. Those contributing to the "noise machine" are doing a disservice by casting doubt in the name of a goal that is no longer operational.
With JSON-LD the "semantic web" won. JSON-LD is basic - the basis for whatever one wants to create with linked data. It is not "partial". It is comprehensive and superior to other metadata documents. It's the only solution worth adopting, and it's a standard.
The opportunity, today, is linked metadata for day-to-day applications, which solves a real problem and is getting significant adoption.
Happy to learn the issue seems to be addressed.
Just to clarify - I don't think his arguments demolish all aspects of the case for 'the semantic web' (however ill-defined that term is) but if he's right then it severely circumscribes the kind of content that will ever have useful metadata.
At the same time, we are getting better at inferring context without needing metadata. There is so much more data coming from this source (i.e. the "sod metadata, let's guess" methodology) than from 'intentional' semantic sources.
So - semantic markup will never be of use unless the content is coming from a source where the metadata already exists. It will largely be useful to 'database'-style sites rather than 'content'-style sites. Think directories and lists rather than blogs and articles.
(Question to front-end types. Are people still agonising over section vs aside, dl/dt vs ul/li under the impression that it makes any damn difference? Angels dancing on the head of a pin...)
Now the Wikidata project (2012) is normalising the data in Wikipedia so that projects like DBpedia have an easier time with the raw information (no need to write alternative parsers for dates, weights, measures and simple facts!). As a result we've gone from human-readable information to machine-readable semantic-web-like information which is accessible via Linked Open Data.
Maybe the driver for semantic web data is humans trying to programmatically consume human-readable information, rather than the other way around?
The RDF part of the semweb idea encourages us to be extremely explicit with what we mean with our data. This helps our end users because it removes a lot of guess work. What was obvious for us as maintainers is not obvious at all for the biologists who need to do stuff with our data. e.g. http://www.uniprot.org/changes/cofactor (going to be live soon) it's a small change from textual descriptions of which chemicals are cofactors for enzymatic activity to using the ChEBI ontology. This allows us to better rendering (UI) and better searching. It also makes the difference clear between cofactor is any Magnesium or cofactor is only Magnesium(2+).
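Here is a sketch of what the ontology-based annotation buys over free text. The identifiers below are placeholders, NOT real UniProt accessions or ChEBI IDs — they just show the shape of the change:

```python
# Before: a textual cofactor note that a program cannot interpret reliably.
before = {"cofactor": "Magnesium"}  # any magnesium? only Mg(2+)? ambiguous.

# After: triples pointing at distinct ontology terms (placeholder IDs).
after = [
    # (enzyme, predicate, ontology term) -- explicit and machine-checkable
    ("uniprot:EXAMPLE1", "up:cofactor", "chebi:MAGNESIUM_ATOM"),    # any magnesium
    ("uniprot:EXAMPLE2", "up:cofactor", "chebi:MAGNESIUM_2_PLUS"),  # Mg(2+) only
]

# A program can now filter on the exact term instead of parsing prose:
mg2_only = [s for s, p, o in after if o == "chebi:MAGNESIUM_2_PLUS"]
print(before, mg2_only)
```

That filter is the whole point: the distinction between "any magnesium" and "Magnesium(2+) only" becomes a query, not a guess.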
In the life sciences and pharma semweb has a decent amount of uptake. For the very simple reason that this branch deals with a lot of varied information and often mixes private and public data.
RDF makes it cheaper for organisation to deal with this.
SPARQL the query language has a key feature that no other technology has in the same way.
Federated queries: if I am in a small lab I can't afford to host a data warehouse of UniProt; it would cost me 20,000 - 30,000 euro just to run the hardware and maintain it. As a small lab I can use beta.sparql.uniprot.org for free and still combine it with my own and other public data for advanced queries. Sure, UniProt has a good REST interface, but it is limited in what you can do with it in ways that SPARQL never will be.
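The feature doing the work there is SPARQL's SERVICE clause, which joins remote data into a local query in a single step. A sketch of such a federated query is below; the endpoint URL matches the one above, but the predicate IRIs and the local "lab note" property are illustrative, not exact UniProt vocabulary:

```python
# Sketch of a federated SPARQL query: everything outside SERVICE is matched
# against my local store; the SERVICE block is evaluated remotely at the
# UniProt endpoint and joined into the same result set.
query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?localNote
WHERE {
  ?protein <http://example.org/labNote> ?localNote .   # local lab data
  SERVICE <http://beta.sparql.uniprot.org/sparql> {
    ?protein a up:Protein .                            # matched inside UniProt
  }
}
"""
print(query)
```

No REST API offers this join-across-organisations behaviour out of the box; with REST you end up re-implementing the join, pagination and all, in client code.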
SPARQL is only interesting as a query language since last year. Schema.org is only interesting since last year. JSON-LD is only interesting since last year. Semweb is finally growing into its possibilities and making real what was promised 17 years ago now.
Of course, even in the life science domain many developers don't know what one can do with semweb tech, and semweb marketing is nowhere near as effective as e.g. MongoDB's or even Neo4j's. So uptake is still slow, but it is accelerating!
First problem: multiple levels of indent, so there needs to be logic for that. Second problem: some pages have weird stuff in the events section, like "world population". Third problem: some pages have "date unknown" events. Fourth problem: all the other problems I'd encounter if I looked at more than two of these pages.
So it's a day's worth of work to get something rough and 2-3 days to get something solid. To answer one question of this kind.
Does DBpedia help me?
 e.g. https://en.wikipedia.org/wiki/1999
I am saying this without a value judgment either way, but this is still done today under the banner of accessibility, not SW in the sense being discussed in most comments here. I understand that accessibility requires semantics in a similar way, but the use case is specific: allow screen readers to read your pages.
AI failed (again). I never understood where the "Intelligence" lies if the only thing you can do is infer: if A --> B and B --> C, then also A --> C ("we don't do anything else, since that wouldn't be logic", then bloating it and naming it a "Reasoner").
If you can't spin off quickly from academic ideas (like Google search did), it just becomes ongoing research binding masses of people to the wrong things to pursue. Don't tell me they chose to... they were influenced, and found out later that it wasn't worthwhile.
Academia thought it was the next web, but it wasn't. Web 2.0 was the next web, leaving the semantic web in the dust.
"When I see the semantic web (of trust) be done (properly), this is basically when I can retire" (Tim Berners-Lee ~2004).
Just my (honest) thoughts (as someone who spent a significant amount of time on RDF/OWL et al. at university).
That said, I still see the advantage of semantic-web-style technology in cathedral-style environments, e.g. corporate knowledge DBs or Wikidata, IF you can afford the bloat. Most of the time it's much more straightforward to just use your own schemas and call it a day (like the KDE folks did this year, finally giving up on getting their RDF database to perform reasonably well and going back to a relational model for desktop search).
Virtuoso being an impressive DB is not the most stable or resource use friendly datastore for desktop use.
This decentralised data storage actually makes a lot of sense. And I hope to work on something similar for life science data, except that the unifying API will be SPARQL instead of a C++ API. (A single C++ API makes a lot of sense for the KDE project, but it does not for life science data.)
There may well be a number of good points to the graph data model, but in practice, 16 years of development have not led to production-ready tools; so my guess is that another year will not fix it.
Here's a write-up: http://pudo.org/blog/2014/09/01/grano-linked-data.html
My experience is that much (SPARQL, basic reasoning) is production ready and has been for a long time, the problem is that it is hard to constrain yourself to the subset of features that don't lead to exponential computations.
(For what it's worth, the startup school video that quote comes from is worth watching: http://youtu.be/LNjJTgXujno?t=20m57s)
What the reasons are is largely a matter of opinion. In my opinion there are several possible reasons:
- the 'semantic' idea of 'strong' modeling of the world lost out to a competing approach that uses probabilistic models. The latter models don't require coordinated effort by humans and thus scale better.
- the semantic approach also suffered from academicians myopically going over the same millimeters of theory for decades. And losing sight of matters of practicality.
- The fundamentals of the semantic technology seem rather brittle to me. By that I mean that a tiny difference in the reasoning axioms can make the whole reasoning intractable. That might be antithetical to the 'tinker until it works' approach that software engineers often use.
- Somehow a broadly applicable killer application didn't turn up. But that's as much a result as it might be a reason.
There's still use for the technology though.
If you need to unify data that follows subtly different data models, I'd be hard pressed to think of an alternative. Which makes me wonder whether intelligence agencies use the technology. For instance, I remember a reference by Oracle to the US Geospatial Intelligence Agency using their quad store.
The new moniker 'linked data' emphasises this aspect.
Government agencies do struggle with having to relate data that are obviously related but conceptually, legally subtly different. And they do spend quite some attention on linked data. They seem to stay mostly within the RDFS realm and do not stray into more interesting OWL applications.
But even in government circles I get a whiff of the solution-looking-for-a-problem vibe that hounded the semantic web for so long.
Especially in Europe, research is very active and these technologies are core to many big science projects. (For example in this thread https://news.ycombinator.com/item?id=8510885). These projects are exploring the higher aims of TBL's proposal - with good cause.
On a more down-to-earth level, there is now a solid web metadata standard in place in JSON-LD. The big search engines index it and presumably use it to give better results. Any startup can add value to published data by adding links - in a significant extension to the "API economy".
Think about it. The base concept of the semantic web is simply a data exchange format that can be used to implement a distributed relational database - a pretty practical idea. By way of the false starts of any broad initiative (e.g. XML), and withstanding a lot of political spin that I've never understood, we now have that standard.
Web developers should look at this opportunity with new eyes.
One pragmatic question is whether schema.org/JSON-LD is of benefit to anyone other than Google at the moment? I like the idea but with their dominance it feels like I am doing work to add value to their business, not mine.
I've seen uncountably large chunks of money put into KM projects that go absolutely nowhere, and I've come to understand and appreciate many of the foundational problems the field continues to suffer from. Despite a long period of time, progress in solving these fundamental problems seems hopelessly delayed.
The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as dead as last year's roadkill, though there are plenty out there that pretend that's not the case. There's still plenty of groups trying to revive the original idea, or like most things in the KM field, they've simply changed the definition to encompass something else that looks like it might work instead.
The reasons are complex but it basically boils down to: going through all the effort of adding semantic markup with no guarantee of a payoff for yourself was a stupid idea.
You can find all kinds of sub-reasons why this was stupid: monetization, most people are poor semantic modelers, technologies built for semantic systems generally suck and are horrible (there are pitifully few reasoners built on any kind of semantic data; turns out that's hard), etc.
For years the Semantic Web was like nuclear fusion, always just a few years away. The promise was always "it will change everything", yet no concrete progress was being made, and the vagueness of "everything" turned out not to be a really compelling motivator for people to start adding semantic information to their web projects.
What's actually ended up happening instead has been the rebirth of AI. It's being called different things these days: machine learning, heuristic algorithms, whatever. But the point is, there's lots of amazing work going into things like image recognition, context sensitive tagging, text parsing, etc. that's finding the semantic content within the human readable parts of the web instead. It's why you can go to google images and look for "cats" and get pictures of cats.
Wikipedia and other sources have also started to look more structured than they previously were, with nice tables full of data. These tables have the side benefit of being machine- AND human-readable, so when you look for "cats" in Google's search you get a sidebar full of semantic information on the entity "cats": scientific name, gestation period, daily sleep, lifespan, etc.
Like most things in the fad driven KM world, Semantic Web advocates are now simply calling this new stuff "The Semantic Web" because it's the only area that kind of smells like what they want and is showing progress, but it really has nothing to do with the original proposal and is simply a side-benefit of work done in completely different areas.
You might notice this died about the same time "Mashups" died. Mashups were kind of an outgrowth of the Semantic Web as well. One of the reasons that whole thing died was that existing business models simply couldn't be reworked to make it make sense. If I'm running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I'm particularly inclined to do. You'll notice now that even some of the most permissive sites are rate limited through their API and almost all require some kind of API key authentication scheme to even get access to the data.
Building a semantic web where huge chunks require payment and dealing with rate limits (which appear like faults in large Semantic Networks) is a plan that will go nowhere. It's like having pieces of your memory sectioned off behind tolls.
Here's TBL on this in 2006 - http://eprints.soton.ac.uk/262614/1/Semantic_Web_Revisted.pd...
"This simple idea, however, remains largely unrealized."
There's a group of people I like to call "Semanticists" who've come to latch onto semantic graph projects not as a technology, but as a religion. They're kind of like the "6 minute abs" guy in "There's Something About Mary". They don't have much in the way of technical ideas, but understand the intuitive value of semantic modeling, have probably latched onto a specification of some sort, and then belief carries them the rest of the way: "it'll change everything".
But they usually have little experience taking semantic technologies to successful projects (success being defined as not booting up the machine and loading the graph into memory, but actually producing something more useful than some other approach).
There's then another group of Semanticists. They recognize that the approaches that have been proposed have kind of dead-ended, but they won't publicly announce that. Then, when some other approach not affiliated with the SW makes progress (language-understanding AI, for example), they simply declare this new approach part of the SW and claim the SW is making progress.
The truth is that Doctorow absolutely nails the problems in his essay "Metacrap" http://www.well.com/~doctorow/metacrap.htm
He wrote this in 2001, and the issues he talks about still haven't been addressed in any meaningful way by professionals working in the field, even new projects routinely fall for most or all of these problems. I've seen dozens of entire companies get formed, funded and die without addressing even a single one of these issues. This essay is a sobering measuring stick you can use to gauge progress in the field, and I've seen very few projects measure well against any of these issues.
Semanticists, of both types, are holding the entire field back. If you are working on a semantic graph project of any kind and your project doesn't even attempt to address any of these things through the design of the program (and not through some policy directive or modeling process) you've failed. It's really hard for me to believe that we're decades into Semantic Graph technologies and nobody's bothered to even understand 2.5 and 2.7.
If your plan to fix the problems you're experiencing with your project (the reason it isn't producing useful results) is to "continue adding data to it" or "keep tweaking the semantic models", you've failed.
"The Semantic Web is not here yet."
No, I've rethought this, the SW is not like Fusion, it's more like Communism.
The key idea is to make it easy for another party to add the semantics on top of your data. This solves some fundamental issues that you and Cory Doctorow mentioned:
1) The economics equation for tagging now works out. The user that's doing the tagging has an immediate need (and payoff) for doing that tagging.
A corollary of this is that the parts of the web that are most valuable (in the sense that users need them the most) tend to get tagged first.
The following are responses to Cory's essay:
2.1) The person that's doing the tagging is also an end user, so there's an incentive to do the tagging honestly. That doesn't stop the underlying website from lying. But that's an issue with the web in general, and is mitigated by things like SEO penalties, reviews, etc.
2.2) Again, the tagger is the person who benefits from the tagging, so as long as the data is valuable enough, it will be tagged despite laziness.
2.3) We haven't overcome human stupidity. Presumably since the person tagging the data needs it, it will be at a "good enough" level to be usable.
2.4) This one doesn't apply; the tagger is a different person.
2.5), 2.6) and 2.7) These are tougher, and we haven't started working on them yet. You have the same problems when trying to consolidate data from multiple sources. One possibility is to offer several alternatives and allow searching to choose between them. That's how Bloomberg solves some of these problems, though it does result in fragmentation.
I'd love to talk to you about this some more. You can email me at firstname.lastname@example.org
Full disclosure: I'm one of the founders of http://www.parsehub.com
2.5-2.7 are really hard problems. I think that lots of people working in the field get lost on these by trying to achieve some sort of perfect model, or by trying to aggregate every possible option into their model, but neither approach has really been terribly satisfactory or provides the kind of subtle decision framework that humans feel comfortable with.
Watching the explanation of the differential gear (https://news.ycombinator.com/item?id=8513209), I thought: why not make Wikipedia the central axis around which you let the diversity of the semantic web spin at its own pace? If most people agree on this authority (or, if you wish, convention over the configuration mess), things become easily connectable.
In other words, instead of relying on sloppy ontologies, rely on the wikidata_id as a sort of referential association table.
"Take eBay: every seller there has a damned good reason for double-checking their listings for typos and misspellings. Try searching for "plam" on eBay. Right now, that turns up nine typoed listings for "Plam Pilots."
I wonder, are there search tools, anywhere from functions to libraries to engines, that will search for mis-spellings? Google, DDG and probably everyone else will correct your mis-spelled query, but will anything large or small go the extra miel and search for mis-spelled hits?
It does actually work for popular things: http://i.imgur.com/emjLPad.png
For example, if you search google right now for "plam pilot" you'll get results for "palm pilot".
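For a small-scale version of "search for the misspellings too", you don't need a search engine: the Python standard library's difflib can fuzzy-match a query against listing words. A minimal sketch (the listings and the 0.7 cutoff are made up for illustration):

```python
import difflib

# Hypothetical marketplace listings, typos included.
listings = ["Plam Pilot m100", "Palm Pilot V", "Plam Pilots lot of 9", "Pilot G2 pen"]

def fuzzy_search(query, items, cutoff=0.7):
    """Return items containing a word that fuzzily matches the query.

    difflib's ratio for 'palm' vs 'plam' is 0.75, so a cutoff of 0.7
    catches simple transposition typos while rejecting unrelated words.
    """
    hits = []
    for item in items:
        words = item.lower().split()
        if difflib.get_close_matches(query.lower(), words, n=1, cutoff=cutoff):
            hits.append(item)
    return hits

print(fuzzy_search("palm", listings))
```

Real engines do this at scale with n-gram indexes and edit-distance automata rather than pairwise ratios, but the principle of matching misspelled hits instead of (or in addition to) correcting the query is the same.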
In context: "... the knowledge management community ..."
(I guess mainly academic)
I think it's equally disingenuous to suggest that a vision, and associated definitions, aren't allowed to evolve - and to suggest that "X failed" because "X" isn't exactly the same today as it was in 1999.
One of the reasons that whole thing died was that existing business models simply couldn't be reworked to make it make sense. If I'm running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I'm particularly inclined to do
It may not make sense for every use-case, but plenty of companies have found value in using SemWeb technologies.
I think that's absolutely a fair criticism of my critique. However, I stand by my critique. Nobody is still calling cars "horseless carriages". At some point, things stop evolving and become something else. The SW has had 13 years to demonstrate value and really hasn't been able to do it in any broad sense while everything else the SW was promising has been met and exceeded by non-SW approaches (which are now being co-opted and called the "Semantic Web" simply because they work).
I think my main thrust is that the Semantic Web failed. That's okay. We learned a lot. Now it's time for something else to take over, call it "Cross Domain Reasoning Systems" or "Global Presence" or some other Gartner Conference ready term.
The Dinosaurs died, and that's too bad, but we learned a lot, and out of that we ended up with birds, of which Chickens and Turkeys are delicious.
But it's time to put it to bed and move on. I've always found it curious that people who work with semantics have such a bad understanding of what things actually mean.
Programmers do have to agree on ways to represent information when interfacing different systems. Right now for the web, that is HTTP APIs.
There is a pretty powerful religion pushing "true REST", i.e. emphasis on using the correct HTTP verb in the proper way in relation to the type of entity being updated.
To me, that is enforcing a very basic general (CRUD) type of knowledge representation for the programmed web. I think it demonstrates that the instinct for a common language is there.
You can also see serious KR use in certain domains like biomedicine.
One area I have been thinking about applying KR is in defining information systems and programming languages. The reason I want to do that is because there seems to be quite a lot of overlap between a very diverse set of programming languages, and also because I want a format to be able to represent algorithms that can serve as a basis for a type of open source operating system. This operating system either needs to force everyone to produce code for a certain programming language or common lower-level virtual machine. OR: it can use a higher-level semantic metalanguage (maybe based on description logics) and then people can program that in the language they choose (perhaps something most suited to a particular system or domain).
All of these are programmer use cases. I think more and more of this general purpose type of knowledge representation will inevitably start being applied by programmers.
One key reason we don't see it applied more often I believe is convenience. You need a convenient representation and convenient tooling. Most of the semantic web or more generally KR tools haven't focused on that enough, which is understandable because the core requirement is machine-processable exchange. I think the trick might be coming up with a way to embed a compact KR more directly in general purpose programming languages or data formats, or translate automatically to and from domain representations to a KR format.
Another idea I had besides building an operating system on top of a DL (translated to and from different programming languages or representations, serving as a common metalanguage) was to try to popularize semantic computing at the same time as you popularize a standard for the metaverse. https://github.com/runvnc/vr
I recently wrote an article touching on this subject and how standardized APIs will be a boon to integrating web content in non-standard interfaces. But truth be told, I'm secretly a 'semanticist' and have always been excited about the possibilities the SW can produce: https://medium.com/@mcriddy/semantic-web-design-92ef35f66c9f
I think this is a very small use case and there's not enough critical mass behind it. The main reason being that even if you were able to aggregate all the different APIs together to build something new and bold, it's such a one-off idea, specific to a single developer, that it really wouldn't be of much use to invest huge amounts of time and money in building a consolidated API that would serve multiple masters.
The rise of AI and machine learning really puts into doubt the economic feasibility of spending money and time on a service that focuses purely on aggregating disparate islands of data unless the payoff existed in that particular market.
You don't see a major API provider that provides data to everything, instead you see small enclaves of niche businesses that focus and specialize in their specific consolidation efforts and get paid for it.
All in all, I think the semantic web will become more of a private endeavor, at least for proprietary commercial goals. As AI gains more ground, the need for such a central API or the need to unify all the different data together will slowly die off.
- I'm not sure the lack of motivation for implementing the SW is economic, but rather: what's the point if no one is using it?
- I don't think the shortcomings described by Metacrap are foundational: it's like saying a word doesn't describe the reality of the thing it 'signifies', and therefore language is useless and will not be adopted. There might be competing standards on ontology, but I'm sure they'll emerge.
- AI might be (one of) the tool that make semantic web actually relevant. The reason why I believe google has an important part to play in it.
- and finally: developers, developers, developers!!! Bring an elegant tool into the ecosystem (JSON-LD maybe?): devs won't shy away as they do in front of horrendous XML, and all of a sudden the SW might get caught in a virtuous circle and become the next big thing.
You might be right, I actually would love to be wrong. But even as a thought experiment (let's suppose the SW was up and working and fully realized today) what are the use cases other than "changing everything"?
Are these use cases the same as or similar to just doing it through API access to some set of underlying relational databases? Would the API access give us better performance, even under a federated scenario? If not, what's the advantage?
> Bring an elegant tool into the ecosystem (JSON-LD maybe?): devs won't shy away as they do in front of horrendous XML, and all of a sudden the SW might get caught in a virtuous circle and become the next big thing.
I'm conflicted on this. A big part of me says that if something ends up taking off, it won't be the ideas posited by the W3C; it'll just be something else that happened to work, with Semanticists co-opting it and re-labeling it as "Semantic Web".
I'm deciding not to fall for it. The SW failed. If something else takes off that achieves similar goals but through entirely different means, it's not the SW, it's...whatever it is.
I think it's okay the SW failed. If I sound critical it's not because it was a failure, it's because of the people that keep insisting it wasn't in the face of overwhelming evidence.
The failure of the SW has brought us lots of important information about large scale distributed information systems from a technical and sociological standpoint. It's time to study those issues and outcomes and try something else.
Instead, the field is like the search for the Higgs boson, except that whenever the last particle accelerator the community tried failed to produce the Higgs boson and instead found some other particle, the community simply decided to call the other particle the Higgs boson instead. It's kind of mind-bending to work anywhere near the field.
A trivial example I gave in another comment:
> The way Google exposes the web today is unidimensional: keywords => related website list. That's great for humans to parse, but it's extremely limited for machines. Why isn't there yet an API or a UI to ask for "all books by Japanese novelists of the last 2 centuries" (website links included per book)?
Another example that comes to mind: you query a person, and instead of a list of related websites you get a set of tabs with bio from different sources, work, news, images & videos... Bonus: you don't need to be Google anymore to do that! OK, that's far-fetched, but imagine the potential for webapps.
So that's kind of one of the canonical examples for the SW that's always given. Query for an object and get an entire dossier back on that object, assembled, federation fashion, from tons of disparate sources all over the web.
It turns out that:
a) it doesn't refute API access to a bunch of relational stores all over the place, and the better performance you're likely to get from those stores;
b) nor the notion that just compiling all that stuff in one place works better;
c) nor, really, any of the issues presented by Doctorow. Read his essay again with a critical eye and think about how each of his criticisms would apply to something like this.
Modern search engines have largely figured out how to provide a fairly high-level ontology equivalent to this use-case by simply parsing out the content on the pages and centrally storing it. This is not the Semantic Web.
If Google opened up its index to the world so you could use it as an API to query the web with the power of a relational DB, the case for the semantic web would probably be pointless. But it's locked up and only serves their ad system.
I'm not defending the Semantic Web per se, but the lack of structure in the web makes it 'parsable' only by gigantic entities like Google. Their work is remarkable and they offer a nice service I can consume. But I can't really build on it.
I think the debate should shift from the means to the goals... Semantic web might not be the right way to do it, but I strongly believe in the necessity of a way to (openly) connect the dots of all that data or it's a giant waste.
What are the relational stores you mention in a)?
Yes yes yes. The conversation needs to change from "if we build the Semantic Web it will change everything" (where "everything" is unspecified and not discussed) to "we need to do this; what do we need in order to do it?"
> What are the relational stores you mention in a)?
Behind most modern online connected websites there's a big Postgres/Oracle/SQL Server database somewhere (and, increasingly, non-relational NoSQL stores). The SW basically chose to turn the web into a giant federated RDF triple store, which, if you think about it, is kind of ridiculous from any sensible performance POV. Just accessing the information at the source, i.e. the databases used to generate the pages, seems to make more sense.
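To make the contrast concrete, here is a small sketch (the table, predicate names, and data are all invented for illustration) of the same fact held as a relational row, the way a site's backing store keeps it, versus decomposed one-triple-per-attribute, the way a triple store would hold it:

```python
import sqlite3

# The "source" side: a normal relational row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
db.execute("INSERT INTO books VALUES (1, 'Kokoro', 1914)")

# One indexed lookup fetches the whole record directly from the source.
row = db.execute("SELECT title, year FROM books WHERE id = 1").fetchone()

# The same fact re-expressed as triples, one per attribute; a triple
# store has to reassemble the "row" at query time via joins on book:1.
triples = [
    ("book:1", "dc:title", "Kokoro"),
    ("book:1", "dc:date", 1914),
]
```

The performance point above falls out of this shape: what is one row fetch relationally becomes a self-join per attribute in a triple store.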
That question suggests that you see "The Semantic Web" as being exactly equivalent to "HTML with semantic metadata added". But that most definitely is not the case, and "API access" is part of the Semantic Web. An HTTP based protocol for SPARQL has been around for forever, and continues to evolve, as do the standards around semantic discovery of API services.
The whole RDFa, microformats, microdata, GRDDL thing is just a small part of the overall picture.
Some of these things in your list were proposed specifically because the Semantic Web technology inventory had failed to produce anything useful.
Well, that's one possible narrative one could believe in. Another would be that a lot of people engaging their NIH tendencies invented a parallel technology track, covering a lot of the same ground, and offering no real advantages, just because they could. shrug
I know we disagree on this topic, but thanks for having a sensible debate on it!
So you're arguing that JSON-LD isn't a Semantic Web technology?
It doesn't solve any of the fundamental problems that the Semantic Web has. Just taking JSON (not a SW tech) and specifying a serialization method on it doesn't suddenly make it SW.
In fact, because JSON is easier to handle than XML, it should make the failure of the idea even more apparent.
Instead it's acted as a distraction, simulating "progress": everybody starts moving their semantic graph engines to support it while not actually kicking the ball forward on any specific front. There's really nothing new that JSON-LD introduces other than being easier to parse. Except that it somehow specifies a syntax that turns JSON into something about as ugly and verbose as XML.
Honestly, your argument, to me, sounds like saying "cars have failed since Henry Ford didn't mention fuel injection, overhead cams, or turbochargers".
The short thought experiment that would reveal that the cars won't work simply hasn't been done as a field with the SW, and it's usually not done in any kind of semantic-technology circles. You end up with "just tweak the model!" and "it'll start to work when there's sufficient data" and "we just need to build the reasoning engines", which is what the field has been spinning on for more than a decade.
Even in SW circles, there's a general consensus that the SW has not arrived. In some more honest pieces it's recognized that it was a failure. But there's tremendous momentum behind the idea because of TBL and people aren't willing to give it up and jump ship onto what's actually working until it gets a big name and W3C (or some other notable committee) to back it.
AI is definitely big, on multiple levels. Even without AGI, the ability of AI/ML techniques to help serve as a bridge to the SemWeb world by, for example, extracting semantic data from unstructured text, is huge. This is why Apache Stanbol excites me so much. I foresee the addition of progressively better and better enhancement engines for Stanbol, constantly improving the ability to do that structured extraction. This will make the overall Semantic Web vision that much more practical.
"what's the point if no one is using it" IS an economic problem.
It's still very much the reason people are more inclined to scrape a certain site: they want to simply piggyback off an existing source and charge people money for it. These types also have a tendency to "pay as little as possible" when it comes to acquiring the data.
IMHO, Semantic Technologies have continuously hit the same hard edge cases over and over again, and most of those are human factors and social issues. Moreover, there are a number of technologies that provide near-enough functionality without the huge, painful social overhead associated with STs; basically, worse is better.
What really needs to happen is for somebody to take a few of the huge publicly available triplestores, write some really compelling reasoning systems on them, and demonstrate almost effortless merging across the datasets. Outside of fairly trivial examples, almost all of which are matched or outperformed by near-enough technologies, really complex reasoners haven't happened.
I think the other problem is that most web pages these days are generated from some datastore somewhere. It's a fool's errand to go from this nicely related data to generating human-readable webpages with embedded semantic markup, and then try to SPARQL your way across a hugely federated search domain, when you could just get API access to the underlying data in the first place. Semanticists will probably call this "the Semantic Web", but querying several databases and merging results programmatically is something that came along decades before the Semantic Web.
Also, DB people are used to having some lower-level things in their toolbox, while with these tools you really have no idea what's happening with the data structures.
A fair portion of these databases aren't even maintained today.
Just to conclude, it's very hard to find an alternative in the triplestore world to Neo4j, Cassandra, MongoDB, Couch, <put_your_db_here> (especially free or cheap) that developers can just get up and running easily to experiment, learn, and scale later.
Documentation and community support is another topic altogether, don't make me start on that one...
Right, you mostly wouldn't do that. If you want access to data in a relational store, from a semweb perspective, you use R2RML with something like D2RQ.
There's also a lot more to the Semantic Web than just extracting structured data from otherwise unstructured content like HTML. If you have semantic data you want to expose, depending on the use-case, you just expose it using the remote SPARQL protocol, for direct M2M use.
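As a sketch of what that looks like on the wire: under the SPARQL 1.1 Protocol, a client can simply GET the endpoint with the query URL-encoded into the `query` parameter. The endpoint address below is a placeholder, and nothing is actually sent:

```python
from urllib.parse import urlencode

endpoint = "https://example.org/sparql"  # hypothetical endpoint
query = "SELECT ?p ?o WHERE { <http://example.org/alice> ?p ?o } LIMIT 10"

# SPARQL 1.1 Protocol "query via GET": the query string goes in the
# `query` parameter, percent-encoded.
request_url = endpoint + "?" + urlencode({"query": query})

# An HTTP GET on request_url (with Accept: application/sparql-results+json)
# would return the variable bindings as JSON, ready for M2M consumption.
```

That's the whole M2M story for read access: a URL, a query, and a JSON results document, no bespoke API surface per site.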
The Semantic Web happened, and is still happening. But most people don't notice, because the Semantic Web isn't, for the most part, about being visible to end users. But every site using microdata, microformats, RDFa, etc. IS part of the Semantic Web.
Google, Yahoo, Bing, etc., are all using elements of the Semantic Web.
Just because the average end-user isn't writing SPARQL queries doesn't mean the Semantic Web isn't around.
Are there still startups working in that space?
We are. I just gave a presentation on using Semantic Web tech in the enterprise at All Things Open last week, and a related talk at the Triangle Java User's Group earlier in the week, where we showed off a lot of the ways we are using the SemWeb stack.
Are people still interested?
Judging from the response to the two talks I just gave, I'd say yeah.
For more on my take on this topic, see:
Jokes aside: microformats started to get pretty good traction, but the biggest challenge (as I see it) has always been adoption in software, rather than encoding the data itself.
Flock was the great white hope in this space for a browser which (somewhat sensibly) used the semantic web to enrich people's lives, but there wasn't really a killer app for it. It did a lot of interesting stuff, but none of it was omfg can't live without youuuuu.
If browser vendors start building great features to take advantage of the semantic web, then developers will start adopting and consumers will start [tacitly] demanding it.
Interesting point: if you go back to the SciAm article which kickstarted a lot of interest in the semantic web amongst relative laypeople, then you'll find that actually it's not dissimilar to where we are today, but we are getting there without the semantic web.
I think browsers have little to do with the Semantic Web. The semantic web is more M2M.
Well, you could directly query a SPARQL database, but that's the one scenario where a server still makes sense: to provide some decoupling between the database model and the client.
"Use the right tool for the job". :-) There's no particular reason a "modern web-app" has to run completely in the browser... it still makes perfect sense to put heavyweight process intensive stuff, persistence, business logic, etc. on the server-side. If I'm building an app that uses a lot of "semantic stuff", most of that stuff is, indeed, happening on the server side.
So your answer sounds more like an excuse than a solution to me, sorry.
I use DBpedia in a toy project and really appreciate it, although I only use a very shallow portion of its possibilities. And it's still very brittle at the edges. Also, I don't see it gaining any momentum if it's not embraced by more of the big content players.
An interesting question would be: would the semantic web favor Google? It would certainly help it index content, but wouldn't it also deprive it of its search monopoly?
As websites will do anything to stand out in search results, there has been a large uptake in adding structured data.
The way they expose the web today is one-dimensional:
keywords => related website list.
It's great for humans to parse, but it's extremely limited for machines. Why isn't there yet an API or a UI to ask for all books by Japanese novelists of the last two centuries (website links included per book)? I am sure this is within Google's reach and of great interest to everybody (replace books with manga if you wish).
And that I would call embracing the semantic web.
My honest expectation was that building these ontologies would take about a week of group effort. As I remember it we were still at it months later.
This convinced me that the SW was a bust. Moreover one of the roadblocks we hit was very illuminating to me.
Creating the ontologies, and their onward development/extension, was hard because the tools were so poor (among other reasons; it is also just... hard), and the lack of tools was widely noted as a clear issue.
No one did anything about it. We had Protégé then, and lo, it is so now.
See, people were trying to ram information down the throat of the presentation layer, and what for, if our eyes were never going to see it?
If you need meaning, ask for it gently, an API will provide it. And leave html alone.
If you use JSON-LD, it's not unlike implementing a REST/JSON API, except that it allows you to reuse standard formats and integrate with other APIs in an easier way.
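A toy illustration of what the JSON-LD `@context` buys you: short JSON keys map to full IRIs, so the same payload doubles as linked data. The hand-rolled expansion below covers only plain term mapping as an assumption-laden sketch; a real JSON-LD processor's expansion algorithm handles far more (typed values, nesting, `@id`, lists, and so on):

```python
import json

# A minimal JSON-LD-style document: an ordinary JSON API response
# plus a @context mapping its keys to schema.org IRIs.
doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "author": "http://schema.org/author"
  },
  "name": "Some Article",
  "author": "Alice Example"
}
""")

# Naive "expansion": replace each short key with its mapped IRI.
context = doc.pop("@context")
expanded = {context.get(key, key): value for key, value in doc.items()}
```

To a plain JSON consumer it's just an API response; a linked-data consumer can expand it and merge it with any other data using the same IRIs.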
APIs are part of the Semantic Web as well.
See, for example,
Oh, that, plus the misappropriation of the term "semantic" by the designer community for BS like working with DIVs instead of TDs and having hierarchical document sections in HTML documents (ad hoc per website), something that never gave any particular advantage, not even for screen readers (which were from the start designed to cope with the mess that real-world document structure is).
There's some traction in academic publishing: the <aside> element in particular has some utility for popup/inline footnoting in some EPUB platforms, and some of the other new HTML5 elements are finding a place in online journal presentations, marking up formal abstracts and so on. There's nothing terribly clever going on yet, though, as far as I'm aware, and anything that is will be very much platform- or publisher-specific.
Except if you know, the universe doesn't work this way, and people and entropy will abuse and misuse any such system on a web scale, just like Cory Doctorow suggests.
Even once you've piled on the libraries and extracted the bit of information that you need, what do you do with that data? You process it a bit and store it in some kind of data structure. But at that point, you could have just hit the website's API and gotten the same data (and more) in a data structure already.
It turns out it's a heck of a lot easier to return a blob of JSON than it is to process text in markup on a page. And smaller, as well: JSON often takes up far less space than the corresponding markup for the same content. That's a big deal when you're processing a very large amount of information.
There's the promise that AI will someday make this easier: if you eliminate the parse-and-process-into-a-data-structure step and just assume that an AI will do that part for you, you're in good shape. But that's nowhere near being a practical reality for virtually all developers, and APIs eliminate this step for you.
Even if you use something like HTML microdata, there's very few consumers of the data. Some browsers give you access to it, but that doesn't make it extremely useful: if you generated the data on the server side, why not just make it into a useful interface? Or expose the data as raw data to begin with? Going through the extra effort to use these APIs is a redundant step for most use cases.
I want to stress that the semantic web will probably always be a more academic field rather than "the next big thing". Similar to the artificial intelligence field or other academic fields that don't always have to lend themselves to "help industry make more money".
Nonetheless lots of our technologies are being used by industry (startups and enterprises). But again, our success doesn't depend on industry adoption. It depends on how useful the things we research about are from a research point of view - which is sometimes more theoretical.
As an example, most semantic web people think RDF is a more useful model than ad hoc data models because we care about generality and serendipitous reuse of data. Industry cares more about simple, efficient, and fast tools.
All in all I hope some of our findings can be useful to industry and adopted by many startups but I hope we'll always find a niche where we can do things that industry can't afford to do. In a way, semantic web exists as an academic field precisely because it's unexplored by industry. If it were widely used, perhaps we'd never need academic fields as industry would take over. I hope that never happens for the semantic web.
Hierarchy will not die off, because it is a core part of the way the mind organizes concepts; for example, prototype effects imply several different levels.
Folk taxonomies are hierarchical, though the depth is usually smaller than that of scientific taxonomies and the principles of organization are usually different.
There can be many ways of arranging the same concepts, and although it is possible to show that some are incorrect, it is in general impossible to show that one and only one ontology is correct; this follows from the indeterminacy of translation (Google "gavagai"!).
Attempts to force an alien Ontology onto subject matter experts breaks them. They just stop performing at an expert level.
It is usually possible to develop a suite of ontologies that are logically interoperable, but this requires experts who have skills from a variety of disciplines AND who are capable of deferring to the SMEs on how they see the world. It may be necessary to have intermediate mapping ontologies, but if the ontologists working with the different communities of interest are careful, these mappings can avoid losing meaning.
Tagging as ad-hoc keywords does not work for data interoperability; it also usually fails to achieve good recall. Flat lists of controlled terms are usually difficult to apply unless they are very small. When Thomas Vander Wal coined the term Folksonomy, it was intended to cover the same kinds of structures as folk taxonomies. Its subsequent application to describe unstructured lists of tags was a misappropriation.
RDF and OWL added some extra problems. They were designed without much input from people with actual use cases, and optimised for the wrong things.
Some things were dumbed down because some people didn't understand what had gone before, and could not understand why the old folk were kicking up such a fuss.
Other things were constrained in order to make OWL DL decidable, even though the resulting worst-case complexity of 2-NEXPTIME means that implementations have to work on special cases, check for too-slow results, and/or limit expressivity to sub-profiles just in order to work.
Other design decisions did not consider the human factors of using OWL.
It is very difficult to explain to people why restricting the range of a property for a specific class is handled by adding an unnamed superclass. It is also difficult to explain the Open World Assumption, and the non unique name assumption.
It is also hard to explain why, in a web environment with no standard world closing mechanism, making everything monotonic is necessary.
It is especially hard to justify the restriction of RDF to binary predicates: some predicates are intrinsically of higher arity, and just because higher-arity predicates can be misused, and it is possible to reduce everything to binary, does not make it desirable.
Having a model that does not match the existing experience of UML modelers, database designers, or users of other KR systems causes real problems for real users.
Nevertheless there is baby in the bathwater, and it can become soup. It just might not look the same.
The schema.org efforts are limited to the extent that they are barely semantic (a conscious decision by danbri and guha); unfortunately, some of the discussions by others show a visceral disdain for any questions as to what vocabulary choices would actually mean, which is almost anti-semantism.
On an anecdotal note, no recruiter has ever contacted me because of these particular keywords on my LinkedIn profile.
If metadata is useful it will be used but pre-emptively adding semantic information to everything seems pretty unlikely.
IoT is becoming a reality, both technologically and financially. But the Semantic Web? Some of its ideas are there, but I think we are not there yet.
FWIW, the W3C has several standards for semantic information, and there are even more in progress. I have the impression, though, that the field is still heavily academia-focused.
Every time you do a Google/Bing search you can see metadata returned. Projects like DBpedia and OpenCyc are huge and will become more important as more devices are networked.
It is unlikely that AI will be able to disambiguate concepts from a raw text corpus like a book (indexing) at the quality of a human specialist any time soon. Perhaps with some recent advances in quantum computing the raw horsepower will arrive.
For some, semantic web means RDF and linked data. Of course, the "full promise" is queries that can draw inferences from indirect relationships in the data. Admittedly, the few examples I've seen, whilst remarkable, perform best when there's only one homogeneous underlying dataset. Where you have disparate datasets from many organizations/institutions (and the data spans decades), these things struggle outside of demos due to the huge work required in normalizing/mapping onto something common that can sensibly be queried against: and sometimes that's even when the same ontologies are in use! The underlying data just doesn't necessarily map very well into the sem-web representations, so duplicates occur and possible values explode in their number of valid permutations, even though they all mean the same handful of things. And it's the read-only semantic web, so you can't just clean it; you have to map it.
Which is why I'm always amazed that http://www.wolframalpha.com/ works at all. And hopefully one day https://www.freebase.com/ will be a thing. I remember being excited about http://openrefine.org/ for "liberating" messy data into clean linked data... but it turns out that you really don't want to curate your information "in the graph"; it seems obvious, but traditional relational datasets are infinitely more manageable than arbitrarily connected nodes in a graph.
So, most CMS platforms are doing somewhat useful things in marking up their content in machine-readable ways (RDFa, schema.org [as evil as that debacle was], HTTP content-type negotiation and so on) either out-of-the-box or with trivially installed plugins.
If you look around, most publishing flows are extremely rich in metadata, for all sorts of things: describing news articles, journal articles (DOIs weren't built to serve the semantic web, but certainly are rich in metadata), movie/book/audio titles and their content...
Beyond that, we just had GovHack here in Australia a few months ago, where groups were encouraged to do what they could with public government datasets (which themselves, again, aren't necessarily "semantic web" but are increasingly using linked-data formats/standards for, if not interoperability, then at least dataset discovery). There are RDF representations of everything from parliamentary archives to land use.
I've personally seen some great applications of inter-organizational data mashing/sharing/discovery in materials science and a few years ago I really enjoyed working with bioinformatics services such as  which allows some fun SPARQL queries to answer interesting questions.
I agree that wolfram alpha rocks - I wish it were less expensive to use though.
* I can see that the semantic annotation part of it is spreading. Schema.org / JSON-LD might be the first pragmatic solution that I can imagine actually getting more widespread acceptance. Especially if currently existing Frameworks / CMS add support by default.
* Semantic Annotations are helping big companies like Google to make their products smarter and this is happening right now.
* Semantic Web tries to solve some real problems, not just "academic" problems. Information and knowledge is indeed rather unconnected, which reduces its value tremendously. Right now APIs are growing to make this more accessible, but many problems remain unsolved.
* SemanticWeb has some truly interesting ideas and concepts, that I've grown to like. Of course nearly every one of them could work without buying the whole Semantic Web. But still, I think some very interesting ideas come out of that community.
* It takes a lot of time to understand the Semantic Web correctly, and learning the technologies behind it soon gets very mixed up with a lot of complicated and rather uncommon concepts, like ontologies.
* The tools (even triplestores) feel awkward and years behind what I'm used to as a web developer. There are a LOT of tools, but most seem to be abandoned research projects which I wouldn't dare to use in production.
* It gets especially complicated when entering the territory of the Open World Assumption (OWA) and the implications it has on reasoning and logic. Say you want hard (real-time) validation because data is entered through a form on a website. Asking some people from the Semantic Web domain, the answers varied from "I don't know how to do this" to "It's complicated, but there is some research..., an additional ontology for that...". I'm kind of shocked, since this is one of the most trivial and common things to do on the web. And I really don't want to add another complex layer onto an already complex system just to solve simple problems. Something's wrong with the architecture here.
* OWA is an interesting concept and makes sense if you are building a distributed knowledge graph (which is a huge task, and only a few have the resources to do it), but most applications/databases are closed-world, and most people will want to stay closed-world simply because it's much easier to handle. The Semantic Web seems to ignore reality here and prefers to be idealistic, imho.
* This sums up, for me, to one big problem: Semantic Web technologies provide solutions to some complex problems, but also make some very easy things hard to do. As long as it doesn't provide smart solutions (with a reasonable amount of learning/implementation time) to existing problems, I don't see it being adopted by the typical web developer.
* There are not enough pragmatic people around in the community who actually get nice things done and produce that "I want that, too!" effect.