As someone who was once a huge semantic web advocate, I can see in retrospect that there are so many issues with semweb that it was obviously never going to succeed.
For starters, just like XML is a poor, verbose re-implementation of s-expressions (I still love Erik Naggum's rant on this topic [0]), RDF is an equally verbose reimplementation of Prolog clauses. During the early 2000s many technologists, myself included, were infected with some sort of weird mind virus that made us think reimplementing everything in human-readable markup derived from SGML was a great thing. Prolog already had problems succeeding; there is no reason to believe a more difficult-to-implement version would succeed.
Then there's the issue that for a true semantic web you need armies of people creating trustworthy metadata. Even if you have tools to autogenerate RDF (like we do for semantic metatags today), you're not going to have a really interesting "web" of data without massive amounts of selfless labor. That selfless part is important because there's no monetary advantage to producing tons of well-curated RDF... look at how much effort websites put into resisting screen scraping.
Additionally, you have Prolog clauses without Prolog's reasoning power! SPARQL is just a query language, not a tool for automated reasoning. The OWL spec started with a definition of a reasoning engine that was computationally intractable. The idea of running Prolog on a web of clauses is compelling, but to this day we don't have a reasoning engine. The closest thing would be to just load Wikidata into SWI-Prolog.
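To make the gap concrete, here's a minimal sketch in Python, assuming the rdflib and owlrl packages and made-up ex: data: plain SPARQL derives nothing new, while a forward-chained RDFS closure materializes triples that SPARQL can then see.

```python
# Sketch: SPARQL alone does no reasoning; a materialized RDFS closure does.
# Assumes the rdflib and owlrl packages; the ex: data is made up.
from rdflib import Graph
import owlrl

g = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    ex:Dog rdfs:subClassOf ex:Animal .
    ex:rex a ex:Dog .
""", format="turtle")

q = "ASK { <http://example.org/rex> a <http://example.org/Animal> }"
print(g.query(q).askAnswer)   # False: SPARQL only matches stored triples

# Forward-chain the RDFS entailments into the graph, then re-run the query.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print(g.query(q).askAnswer)   # True: the inferred triple is now queryable
```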
In a surprising number of ways Semweb shares the same problem that crypto does: a distributed, public, trusted, cryptographically ensured immutable ledger is a very interesting technical solution. However, in practice there are almost no problems for which it is the correct solution. The vast majority of Semweb "solutions" can also be solved with relational databases and a public-facing API.

0. https://www.schnada.de/grapt/eriknaggum-xmlrant.html
RDF is that public-facing API. You can use whatever technology underneath, and relational databases are not a bad choice. You can even do "inference" in a relational database via Turing-complete recursive CTE queries and make the result queryable as a "view" over the underlying data.
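A minimal sketch of that pattern, using SQLite from Python (the triples table layout and the rdfs:subClassOf sample data are illustrative assumptions):

```python
# Sketch: RDFS-style subclass "inference" as a recursive CTE behind a view.
# SQLite from Python for convenience; the schema and data are made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:Dog",    "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:rex",    "rdf:type",        "ex:Dog"),
])

# The transitive closure of rdfs:subClassOf, exposed as an ordinary view.
con.execute("""
    CREATE VIEW subclass_closure AS
    WITH RECURSIVE closure(sub, sup) AS (
        SELECT s, o FROM triples WHERE p = 'rdfs:subClassOf'
        UNION
        SELECT c.sub, t.o
        FROM closure c JOIN triples t
          ON t.s = c.sup AND t.p = 'rdfs:subClassOf'
    )
    SELECT sub, sup FROM closure
""")

# "Inferred" types: rex is a Dog, hence also a Mammal and an Animal.
rows = con.execute("""
    SELECT t.s, c.sup
    FROM triples t JOIN subclass_closure c ON t.o = c.sub
    WHERE t.p = 'rdf:type'
""").fetchall()
print(rows)  # [('ex:rex', 'ex:Mammal'), ('ex:rex', 'ex:Animal')]
```

The view hides the recursion, so downstream queries read as if the inferred triples were materialized.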
> Additionally, you have Prolog clauses without Prolog's reasoning power! SPARQL is just a query language, not a tool for automated reasoning.
That's not necessarily bad, however. A direct consequence of this is that the answers to a fixed SPARQL query can always be computed in logarithmic space with respect to the size of the RDF graph, whereas computing the answers to a Prolog query need not terminate at all.
I just came here to make the banal remark that RDF/OWL/SPARQL etc. are great to work with in certain problem spaces and not great in others.
My experience working in the Semantic Web was much the same as my experience with Lisp/Scheme: great fun, hugely educational, glad I know about it, and it's great to spot the related concepts/borrowings when they turn up elsewhere.
I spent 6 years working with RDF Quad Stores on a user-facing piece of software for the public and academic library market. RDF was an excellent fit, data-model-wise, for bibliographic data, and SPARQL was nice to work with.
I wish it had taken off more, everything being identified by a URI was a great concept for opening up data lakes, but alas, it wasn't to be.
RDF and SPARQL are very much alive; for instance, see Wikidata and Wikibase (with the public cloud service in closed beta atm): https://wikidata.org and https://wikiba.se/
Wikidata is very much about the "web of data" or Linked Data, as for each real-world entity it collects many of the identifiers used to refer to it on the web and elsewhere. This way, in addition to being a central island, it also provides bridges to the other islands, forming the web.
Further, Wikibase and Wikibase Cloud will make it easy for people to create their own web databases ("islands") that are published as Linked Data (with RDF and SPARQL) in the same way as Wikidata.
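For anyone who hasn't tried it, querying the live Wikidata endpoint takes only a few lines; a small sketch using Python's requests package (the query, listing a few house cats, is just a stock example):

```python
# Sketch: querying the public Wikidata SPARQL endpoint.
# Assumes the requests package; wd:Q146 is the Wikidata item for "house cat".
import requests

query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "sparql-example/0.1"},  # WDQS asks for a UA string
)
for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])
```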
It's not so much the data model, as the interlinkedness (for lack of a better term) that was very much posited as the way forward.
In the Bib world, things like the Virtual International Authority File (VIAF) meant that every system talking about the author Charles Dickens used the exact same identifier for him, allowing different data stores to be easily traversed and linked together, by design.
I'm in a similar space (bibliographic metadata, DOIs etc) but not too familiar with this area. Did the desire for interlinkedness go away? What in particular changed? Was an authority file seen as a replacement for a more organic RDF model?
I've not been in the space for 6 years at this point, but a lot of the "cool stuff" being done seems to have died down; partly this was due to some bad decisions (IMHO) taken by the committee looking to replace Marc21 (Bibframe?), and partly because running Quad Stores at scale is still pretty niche, etc.
Neo4j is, mostly, a very different kind of graph store: a Labeled Property Graph, not a Semantic Knowledge Graph, which is the kind of graph commonly associated with RDF.
I write “mostly” because neo4j recently added a “neosemantics” project which handles RDF. It is all pretty standards-compliant RDF though, nothing custom at all.
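A hedged sketch of what that looks like from Python, using the official neo4j driver to call the neosemantics (n10s) procedures; the connection URI, credentials, and RDF URL are placeholders, and the plugin must be installed server-side:

```python
# Sketch: importing RDF into Neo4j via the neosemantics (n10s) plugin.
# Assumes the neo4j Python driver; URI/credentials/URL are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    # One-time setup: n10s needs a uniqueness constraint and a graph config.
    session.run("CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
                "FOR (r:Resource) REQUIRE r.uri IS UNIQUE")
    session.run("CALL n10s.graphconfig.init()")
    # Fetch a Turtle file; triples become labeled-property-graph structures.
    session.run("CALL n10s.rdf.import.fetch($url, 'Turtle')",
                url="https://example.org/data.ttl")
driver.close()
```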
RDF comes with this whole set of W3C standards, and standardization seems to be a big point in favor if you need to care about interoperability.
In terms of data model, RDF may be a bit too web centric to be approachable. Linked Data does not seem much better.
This may not be a fully informed view, but so far I think: while the whole area of (broadly) logic-based representations (including Datalog, e.g. Souffle, Datomic, ...) seems to be having a bit of a renaissance, all the semantic web stuff seems a bit more like baggage.
In other words: if I wanted to build a private knowledge base, I don't want to pick or publish URIs and the fact that all the documentation pushes me towards open/sharing is just getting in the way.
In contrast to that, relational database tech has affected programming languages and frameworks over the decades and goes way beyond triples/quads (e.g. LINQ, or relational mappings). This seems like a big missed chance in a way, since some of the "typing" would in principle be a lot closer to class-based OOP than all the relational stuff.
> In other words: if I wanted to build a private knowledge base, I don't want to pick or publish URIs and the fact that all the documentation pushes me towards open/sharing is just getting in the way.
RDF supports "blank nodes" for this use case. These are URI-less nodes that can only be referenced within the graph where they're defined. You'd use URIs for references to outside entities, which can be very relevant even in a private knowledge store.
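A tiny sketch of that split in Python, assuming the rdflib package (the ex: names are invented):

```python
# Sketch: a blank node for graph-local structure, URIs for shared entities.
# Assumes the rdflib package; the ex: names are made up.
from rdflib import BNode, Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

address = BNode()  # no URI: only referencable from within this graph
g.add((EX.alice, EX.homeAddress, address))
g.add((address, EX.city, Literal("Springfield")))

# An outside, URI-identified entity can still be referenced from private data.
g.add((EX.alice, EX.worksFor, EX.AcmeCorp))

print(g.serialize(format="turtle"))  # the address comes out as an anonymous node
```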
Blank nodes are one of the contested flame warzones of RDF.
The fact that they don't have a reified identity means that they are liable to collapse with other blank nodes in certain inference scenarios, and it makes a lot of problems (like deciding whether two graphs are equivalent) as hard as graph isomorphism.
What you mean is the unique name assumption (UNA), which is distinct from the open world assumption (OWA), and may be viewed orthogonally and applied or not applied in different RDF contexts.
The problem with blank nodes is that they circumvent the UNA in every scenario, and you need something like "reified blank nodes" to resolve this, e.g. by using UUIDs treated as blank nodes.
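A hedged sketch of that reification in Python with rdflib, minting urn:uuid IRIs for blank nodes (rdflib also ships a built-in Graph.skolemize; this just spells out the idea):

```python
# Sketch: give blank nodes stable identities by minting urn:uuid IRIs.
# Assumes rdflib; rdflib's own Graph.skolemize() does something similar.
import uuid
from rdflib import BNode, Graph, URIRef

def reify_bnodes(g: Graph) -> Graph:
    mapping: dict[BNode, URIRef] = {}
    out = Graph()

    def fix(node):
        # Replace each distinct blank node with one fresh, stable IRI.
        if isinstance(node, BNode):
            if node not in mapping:
                mapping[node] = URIRef(f"urn:uuid:{uuid.uuid4()}")
            return mapping[node]
        return node

    for s, p, o in g:
        out.add((fix(s), p, fix(o)))  # predicates are never blank in RDF
    return out
```

After this, merging two graphs can no longer conflate structurally similar anonymous nodes, at the cost of losing the "there exists" reading of blank nodes.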
URIs are mostly namespaces; they have no bearing on data being private. If you know what you're doing you'll be using ontologies made by others, and just like having a namespace for a library, you'll want the same for the terms/classes you use from them.
>In other words: if I wanted to build a private knowledge base, I don't want to pick or publish URIs and the fact that all the documentation pushes me towards open/sharing is just getting in the way
To be fair, I think that is kind of like saying: if I don't care about RDF's primary intended use case, then RDF is a bad choice.
I heard SHACL helps with specifying structure (shape), and more so than OWL, which is more about meaning (and where it may be acceptable to have an incomplete but "open" specification).
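As a concrete (hedged) sketch of data-level shape-checking, assuming the rdflib and pyshacl packages and an invented ex: vocabulary:

```python
# Sketch: validating a node's "shape" with SHACL via pyshacl.
# Assumes the rdflib and pyshacl packages; the ex: terms are made up.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:rex a ex:Dog ; ex:name "Rex" .
ex:fido a ex:Dog .                 # missing ex:name, should fail
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .
ex:DogShape a sh:NodeShape ;
    sh:targetClass ex:Dog ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: ex:fido has no ex:name
print(report)     # human-readable validation report
```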
Programming with these structures (triple/quad with shape constraints) would be interesting, but general-purpose programs could not easily take advantage of them in the form of static type-checking. This is roughly the same type of problem as the "object-relational impedance mismatch." I believe some kind of type-checking (shape-checking?) for SPARQL queries might be possible, but I don't know if any such thing is implemented.
Not qualified to give a definitive answer, but looking from the outside:
- clunky verbose syntax
- at odds with the economic incentives of the web: everyone wants to build their own walled garden and make access to data harder (e.g. anti-scraping measures).
- reasoning under the open world assumption (OWA) is harder than the other way around.
- tried to do many things at once.
- the multitude of ways to reason about and represent non-trivial things: how do you handle disagreements, especially when taking the OWA into account? It's possible but hard. A lot of people would just rather have a closed world of reasoning where they dictate things instead of tackling it "the proper way".
And faulty assumptions that people would add semantics to data because of ... and because some people also write HTML. And the semantic web doesn't really solve any problems that anyone has. And performance scaling for semantic reasoning is fundamentally worse than proof-of-work schemes in blockchain. And the semantic languages are typically not that fun, easy, or productive to work with.