Instead of saying, "Give us money to build a webpage", they say, "Give us money to expose metadata annotations using a RESTful API on the semantic web."
I would prepare conference presentations where I was just filling slides up with BS to fill time.
Devs from other universities (gotta check that international research box!) understood the technology even less than our team did. We provided them a tool for storing RDF triples for their webpage so they could store triples about anatomical relationships. They wanted to use said RDF store as their backend database for storing things like usernames and passwords. facepalm
So you have all these academics publishing all this extremely important sounding literature about the semantic web, but as soon as you pry one nanometer deep, it's nothing but a giant ball of crap.
By the end, I figured out the professor was full of crap. When AJAX became a big thing, I remember the professor asking, "Why don't you add AJAX to this project?" What does that have to do with the Semantic Web? In the end, I got a paper published in a fairly prestigious journal by just combining some flashy visuals with the Semantic Web and having the professor be a co-author.
That was one of the big experiences that helped me figure out that academia wasn't for me. And I would be perfectly fine never hearing about triples, RDF, or that other nonsense again!
Sounds like something my manager at my previous job would say. He was a Pointy Haired Boss.
There's no such thing as a "right" way to represent any given data stream, just ways that are more or less suitable to specific tasks and interests. That's why HTML failed as a descriptive language (and has become a fine-grained formatting language), and it's why the semantic web was DOA.
I think HTML and the web failed in general. Modern HTML is really nothing more than div tags everywhere, with a handful of span tags. We went from abusing tables to abusing the entire document. We, in effect, eliminated all semantic meaning from a document by making everything generic tag soup.
The DOM + JS has largely supplanted HTML as the source of a web page. Especially when using tools such as React or Angular.
In terms of vision, the rise of native phone apps and the fact that every major site has a mobile version and a separate desktop version really highlights how the web failed.
I do node/React dev for a living. I'll be the first to admit this pile of hacks is total garbage. Mobile web is almost unusable. I hate it. I hate the sites I work on. Their UX is horrid. Native apps are so far superior that they make the web look like an embarrassing relic. But web development pays the bills and keeps the lights on.
When I watch conference presentations, it seems to me that there are two groups of people: those to whom "ever more complicated" appeals and those to whom "make it simple" appeals.
The 'make it simple' approach currently does not fit with existing workflows; maybe for startups, but not for most web agencies, where a good decade has now been spent nesting divs in more divs and making a big mess of un-maintainable 'write only' code, with 97%-unused stylesheets that have got to the stage where nobody knows what anything does, so they just add new hacks to it.
With where we are with HTML5, it should be easy to mark up your document semantically. However, Google has kind of given up on that: if you want to put some semantic goodness in there, you add some JSON-LD on the end rather than put property tags throughout the document. It is as if Google would prefer the document to be doubled up, once with trillions of divs for some bad CSS, and then again to be machine readable.
Regarding mobile, 'progressive web apps' are widely supported and have removed the need for custom mobile applications. This is progress.
However... I'm writing this on my phone, so HN is part of the mobile web and it works well. I read the post on twobithistory.org on my phone and that also works well. I doubt that they have an app, and even if they did, why should I install it, and what would happen when I follow a link from HN to them? Would I get the mobile site, or would the app catch the link and open itself?
I don't even have the apps of the news sites I read most. Reading them on the phone with Firefox and uBlock is good enough. Their apps probably contain more spyware than the adblocked sites.
So the mobile web did not completely fail. It's still what's on the screen of my phone about 50% of the time: since the last charge, 3h 55m of screen time, 57m used by phone calls, 1h 54m Firefox, 15m WhatsApp, etc.
You are an unusual smartphone user. For most people, it is 0m.
I'm imagining starting with webassembly for sandboxing. We can then expose through to webassembly a useful set of API primitives from the underlying OS for text boxes, widgets and stuff.
Apps would live in a heavily sandboxed container and because of that they could be launched by going to the right URL in a special browser. We could use the same security model as phone apps - apps have a special place to store their own files. They have some network access, and can access the user's data through explicit per-capability security requests.
That would allow a good, secure webapp style experience for users. But the apps themselves would feel native (since they could use native UX primitives, and they would have native performance).
Developers could write code in any language that can compile to webassembly. We could make a bundler that produced normal applications (by compiling the app out of the sandbox). Or we could run the application in a normal web browser if we wanted, by backing the UX primitives with DOM calls, passed through to WASM.
Hopefully that explains why "API primitives exposed to webassembly" feels to me like thinking about the web from the wrong end. The social end is what makes the web tick. It could be built with tinkertoys for all I care.
For my money, it's that Java applets and X-Windows didn't have a distribution mechanism and security model. They simply didn't do anything I couldn't already do with desktop apps and HTML.
Also, frankly, they were kind of slow and not very good. I think that's the biggest problem with this sort of idea: the breadth of surface area for GUI toolkits is crazy huge. Building something that works well, and works cross-platform, is a seriously huge amount of work.
www has 3 main ways of discovery that alternative technologies didn't offer: 1) search (leading you to the correct info in the site, instead of just to a landing page), 2) overview pages, short summaries with links to the actual info (google news, etc.), 3) deep hyperlinks that everyone can easily discover (browser URL) and provide elsewhere (email, Facebook posts, twitter posts, etc.).
The first one is very much based on the semantic qualities of html, where google can crawl a page and make some educated guess about what the page is about.
Biggest problem with mobile apps is that discovery is completely channeled through commercial app stores.
I would like to see an alternative web tech stack that doesn’t skip the discovery part. Web assembly with canvas for example is completely useless for a search crawler.
Timeline is one of the key ones. According to Chrome's task manager (because browsers need task managers now), the page I'm typing this reply on contains a text area, and your comment is consuming 30MB of RAM. Back when Java applets were getting their reputation for being slow, I would have been lucky to have a computer with 32MB of RAM; 8 and 16MB were still common at that time. Now, there were some other things that made applets awful, but if they were introduced today they wouldn't seem nearly as bad as we remember; on the same computer, this page would be clunky.
For x-windows, it was never really a contender because there was no MS compatibility, but the potential was there.
So in that case, you can check if UWP APIs are available and use all of them, depending on UWP permissions for the app.
Chrome is following a similar route with ChromeOS and Chrome Android.
As a native/Web developer I tend to have a native bias, but PWAs look like the way the Web might win back native. It isn't fully there yet, though.
At a minimum we'd need a separate API to enforce a web / mobile style security boundary.
I main Linux on all of my machines. Most of the native apps on it have terrible UX. Even big apps: on touch screens, Firefox doesn't support two-fingered scroll, Chrome won't snap to the side of a desktop, and neither will bring up a virtual keyboard if I click on the URL bar.
The majority of my native apps that I use don't support fractional scaling - apps like Unity3d are unusable on 13 inch screens and there's no way for me to zoom in or out on them. Even system dialogs suffer from this problem sometimes, it's like nobody on native ever learned what an 'em' unit is and they're still stuck in 1990 calculating pixel positions.
To contrast, most of the websites that I'm using, even when they're badly designed will work on smallish screens or can be individually zoomed in and out. My keyboard shortcuts work pretty much the same across every site (aside from the rare exception that tries to be all fancy and implement its own). If they break, it's not rare for me to be able to open up an inspector and add one or two CSS rules that fix the problem.
Reading Hacker News, I sometimes wonder if I'm just browsing/using entirely different sites/apps than everybody else is. I don't understand how my experience is so different.
Regarding semantic HTML, I generally don't have too much of a problem there either. I don't think semantic HTML is hard to write - I use it on every single one of my sites. If you're using React and it can't be used without spitting divs all over the place, maybe the solution is just to stop using React? Modern HTML is only going to look like div soup if you fill it with divs.
I mean, I can build you a horrible SQL database that requires 30 joins on every data call, but that doesn't necessarily mean that SQL is bad. It means that auto-generating SQL tables based on a bunch of cobbled-together frameworks and user-scripts is bad. Treat your application's DOM like you would a schema, and put some thought into it. That will also solve a great deal of the responsive design problems on mobile that people are talking about, because light DOM structures are more flexible than heavy ones.
Buzzword-compliant semantic HTML or semantically useful HTML? Is there any user-agent out there that benefits from the extra work you're putting in?
The criticisms this article levels at the semantic web are pretty much spot on, as far as I can tell.
Semantic HTML is pretty straightforward though - it's using HTML to describe content, rather than purely for layout. Some sites do it better than others, but it's certainly not dead or abnormal -- and many static HTML generators are... decent.. ish. Semantic HTML is using stuff like article tags and sections, using actual links instead of just putting a click handler on a div, stuff like that. The stuff that makes it easy to parse and understand a web page.
It's very useful - semantic HTML is the reason that sites like Pocket work, and it's the reason why reader mode in Firefox/Safari works. It's the reason why screenreaders on the web work so much better than in native apps (at least as far as my experience on Linux has gone; maybe other people have better apps than me :)). It also (in my opinion) makes styling easier, because light, descriptive DOM structures tend to be easier to manipulate in CSS than large ones.
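A rough illustration of why those tools can work: with semantic markup, the interesting content is one tag lookup away. A minimal sketch, assuming the beautifulsoup4 package; the HTML snippet is invented:

```python
# Minimal sketch: why reader-mode-style tools can work on semantic HTML.
# Assumes the `beautifulsoup4` package; the HTML snippet is illustrative.
from bs4 import BeautifulSoup

html = """
<body>
  <nav><a href="/">Home</a></nav>
  <article>
    <h1>Whatever Happened to the Semantic Web?</h1>
    <p>In 2001, Tim Berners-Lee published an article...</p>
  </article>
  <footer>(c) Example</footer>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")          # one tag lookup, no heuristics needed
print(article.h1.get_text(strip=True))  # the title
print(article.p.get_text(strip=True))   # the body text

# With div soup you would instead be guessing from class names,
# text density, or DOM depth to find the same content.
```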
The semantic web, to the extent that it's well-defined at all, is more about the metadata associated with a webpage. Very different concepts.
On the other hand, the last time I launched Braid on Linux, I had to manually change my resolution back afterwards and it removed my desktop background.
And I just felt like, "I'm sure there was a really good, sensible reason for whatever hack this game relied on when it originally launched on Linux. But... come on, if you had used some common framework, for all of the terrible problems that might have brought, when I launched it years later it would have at least full-screened properly."
So I dunno. The number of really big Linux apps that end up using their own custom display code is surprising to me. Even Emacs isn't fully using GTK. I assume developers of those apps are smart, so I assume there must be a good reason for it.
People forget that all these fancy frameworks produce actual HTML5 DOMs; who cares if those are static or dynamic? If someone wants to write a semantic web parser/crawler then it's a great idea, but it probably shouldn't be done using wget. :-)
So just add a layout-language to divide it from the content, as already done with style?
My core objection to "the Semantic Web" is the non-existence of "the Semantic". There is no way you can get everyone to agree upon a universal "semantic", and if you can, which you can't, you can't get people to accurately use it, and if you can, which you can't, you can't prevent people from gaming it into uselessness. But it all starts from the fact that the universal semantic doesn't even exist.
Somewhere there's a great in-depth series of blog posts from someone describing just trying to get libraries to agree upon, and correctly use, an author field or set of author fields. This is a near-ideal use case, because you have trained individuals with no motivation to insert bad data into the system for aggrandizement or ad revenue. And even just for that one field, it's a staggeringly hard problem. Expecting "the web" to do any better was always a pipe dream. Can't dredge it up, though.
(To the couple of other replies about how "it's happening, because it's happening in [medical] or [science]", well, no, that's not it. That's a smaller problem. The Semantic Web (TM) would at most use those as components, but nobody would consider that The Semantic Web (TM), at least in its original incarnation. Yes, smaller problems are easier to solve; that does not make the largest version of the problem feasible.)
And I'm pretty sure it was the whole point. Nobody would ever have written as many reams of marketing material if the pitch was "Hey, someday, you'll be able to reach out to the web, and with specialized software for your particular domain you can access specialized web sites with specialized tags that give you access to specialized data sets that can be fed to your specialized artificial intelligence engines!"
Because that pitch is basically a "yawn, yeah, duuuuuh", and dozens of examples could have been produced even ten years ago. The whole point was to have this interconnected web of everything linking to everything, and that's what's not possible.
Consider these passages, from "The Semantic Web," Tim Berners-Lee et al, Scientific American, May 2001:
"Like the Internet, the Semantic Web will be as decentralized as possible. Such Web-like systems generate a lot of excitement at every level, from major corporation to individual user, and provide benefits that are hard or impossible to predict in advance. Decentralization requires compromises: the Web had to throw away the ideal of total consistency of all of its interconnections, ushering in the infamous message 'Error 404: Not Found' but allowing unchecked exponential growth."
"Semantic Web researchers... accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines... now produce remarkably complete indices of a lot of the material out there. The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web."
I think one of the big problems with the semantic web was that it turned "varying degrees of partial progress" into "multiple competing approaches", each of which wanted to detract from the others.
That sounds interesting! Don't suppose you remember any more terms that might enable a Google search to find it?
I have a foggy recollection that Jerry Fodor (Holism) or maybe Eco had thoughts on the matter.
In all fairness, it was also promoted by legitimate academics, including Tim Berners-Lee. I actually saw him give a talk about it a number of years back, for his Draper Prize speech.
Btw, I don't think this or the failure of the semantic web reflects badly on TBL. He is in a rare class of folks, along with Stallman, who can leave a bigger dent in the world while missing their ideals by a mile than most of us could if we got everything we wanted.
My main point was that this wasn't just some fantasy of out-to-lunch bureaucrats.
"Semantic web" was like a keyword that popped up in the abstract of virtually every paper published on the topic since the mid 90s or so. It seemed to me one of those phrases that virtually guaranteed grants of funding if you could work it into your paper.
I think you're correct that people working in the field of the semantic web oftentimes didn't understand (or lost sight of) the intended nature and utility of the concept. But I also think that its buzzword status led to the label being applied to many things that were only loosely related.
There was (is?) a lot of good work happening under the heading of "semantic web", but—partly because the term was not an apt one for the work, and partly because the dream of the semantic web never really actualized—that work has remained relatively obscure. Obversely, there was much work of questionable worth happening under the same banner, likely because it was a reliable way to get funding, at which point we return to that extremely important-sounding literature that's nothing but a giant ball of crap...
In theory the idea is brilliant, but the execution is poor. In many cases the world of academia is way too detached from software development to be able to produce any real-world useful tooling beyond toy projects and proofs of concept.
It's not just academia. I made an observation about how most BigCo and BigGovts use a lot of Semantic Web: https://news.ycombinator.com/item?id=18036041
On the breadth end there may be a need for a sort of "contract of depth commitment" to provide insurance against the flighty hijinks of the ever-expanding mind, and at the real-world level you need a sort of "grant of exploration" which allows academia to bring its own type of surface-level one-night-stand concern or even ADHD to the game. Without the latter there's probably too much risk of repeating history while someone else is blazing the trails you won't try.
Anyway thanks for that comment, it's really interesting.
TL;DR: What semweb / linked data lacks the most is a practical logic / reasoning layer, so that facts can more easily be inferred from existing plain data. I further suggest this is already available in a rock-solid, proven technology: Prolog.
Slides and video:
I tried to steer away from typical databases, but in the end I was just getting in the way.
>More than 2,000 European AI experts have come together to collaborate on research and call for large-scale funding from the European Union to counter China and America’s rapid progress in the field.
You wouldn't get them worrying about the US's or China's rapid progress in the Semantic Web.
What would you say to them?
Unfortunately, some developers think that once you've chosen one particular storage technology (RDF, relational DB, document storage), that's the bucket you have to put everything into in order to do anything in your application. I suspect that was the mental model of the other devs in the story above.
Aside from that, I guess there's theoretically nothing stopping you from using the triplestore for usernames/passwords (I hope you mean salted passwords) but sheesh, talk about killing a fly with a bazooka.
To be fair to those colleagues, it might have been less about them being clueless, and more about them wanting to offload work to my team, lol.
2) if you ask such a stupid question you probably won't be able to properly secure the machine hosting this
3) your users will be totally screwed when you get hacked
And this is why you should never store passwords but only a salted hash, and never trust any service that can email you your password when you click "I forgot my password".
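For reference, a minimal sketch of storing only a salted hash, using just the Python standard library. The parameters are illustrative, not a security recommendation; real systems should use a vetted library like bcrypt or argon2:

```python
# Minimal sketch of salted password hashing with the standard library.
# The iteration count and scheme are illustrative only.
import hashlib
import hmac
import os

def hash_password(password: str):
    salt = os.urandom(16)  # per-user random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest    # store both; never store the password itself

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, digest = hash_password("hunter2")
assert verify_password("hunter2", salt, digest)
assert not verify_password("wrong", salt, digest)
```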
Well... it happened.
1) We got schema data for Job Postings that companies like Google read to build a job search engine.
2) We got schema for recipes. https://schema.org/Recipe
3) We got the Open Graph schema for showing headlines/preview images in social networks. http://ogp.me/
4) We got schema for reviews: https://developers.google.com/search/docs/data-types/review
5) We got schema for videos: https://developers.google.com/search/docs/data-types/video
6) We got schema for product listings.
All of the crazy bikeshedding about labyrinthine XML standards, triples, etc. or debating what a URL truly means has very little to show for the immense time investment.
The main lesson I take away is that you absolutely need to start with real consumers and producers, and never get in the state where a long period of time goes by where a spec is unused. Most of the semweb specs spent ages with conflicting examples, no working tooling, etc. which was especially hazardous given the massive complexity and nuance being built up in theory before anyone actively used it.
They are organized around use cases, not data origins. They are opt-in, rather than relying on distribution via some discovery standard. They are only used by subsets of the industry they serve, and that's OK. They aren't integrated with regulation in ways that force their use. (list continues)
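As a concrete illustration of what these use-case schemas look like in practice, here is a sketch of emitting schema.org/Recipe markup as JSON-LD; the recipe values are invented, only the vocabulary is schema.org's:

```python
# Sketch: emitting schema.org/Recipe markup as a JSON-LD script tag.
# The recipe content is made up; the property names come from schema.org.
import json

recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Plain Tomato Soup",
    "author": {"@type": "Person", "name": "A. Cook"},
    "prepTime": "PT10M",
    "cookTime": "PT25M",
    "recipeIngredient": ["6 tomatoes", "1 onion", "salt"],
    "recipeInstructions": "Chop, simmer, blend.",
}

# Dropped into the page as a single script tag, separate from the visible HTML:
snippet = '<script type="application/ld+json">%s</script>' % json.dumps(recipe, indent=2)
print(snippet)
```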
I always felt that one problem with more widespread adoption by users was missing software. There were, and still are, all these great microdata pieces out there, but can my browser save an address, add an event to my calendar, etc.? If these things were easily possible even for an inexperienced user, I think we would see much more semantic markup. But in the meantime, search engines and social networks sure have an easier time scraping information than before we talked about the semantic web.
First, there is no universal "semantic". The meaning of things is ambiguous, and given a broad enough pool of people and cultural contexts, it becomes nigh impossible to converge on a single consistent model for individual terms and concepts in practice. A weak form of this is very evident in global data interchange systems and implementations, and the Semantic Web took this bug factory and dialed it up to eleven. In short, the idea of the Semantic Web requires semantics to be axiomatic and in the real world they are inductive and deeply contextual. (Hence the data model rule of "store the physics, not the interpretation of the physics.") That said, it is often possible to build an adequate semantic model in sufficiently narrow domains -- you just can't generalize it to everything and everybody.
Second, implementation at scale requires an extremely large graph database, and graph database architectures tend to be extremely slow and non-scalable. They were back then and still are today. This is actually what killed the Semantic Web companies -- their systems became unusable at 10-100B edges but it was clear that you needed to have semantic graphs in the many trillions of edges before the idea even started to become interesting. Without an appropriate data infrastructure technology, the Semantic Web was just a nice idea. Organizations using semantic models today carefully restrict the models to keep the number of edges small enough that performance will be reasonable on the platforms available.
The Semantic Web disappeared because it is an AI-Complete problem in the abstract. This was not well understood by its proponents and the systems they designed to implement it were very, very far from AI-Complete.
Also, rights. People "own" things like sports fixture lists... they don't want you extracting that data without paying to use it.
The semantic web could be perfect technically, but it was never going to apply to content that people were attempting to "monetise"... which seems to be most of the web's content.
A machine-targeted web structure that contains some lies is not useful for machines because they can't filter out those lies yet. It might become useful when they can (but that might be a hard-AI problem), but it's simply not usable until that point.
I made a few observations in my own comment.
One being that there is no usable graph store you and I can use as of 2018.
Another being about monetizing the Semantic Web when playing the role of the data/ontology provider. You provide all the data while the consumers (Siri, Alexa and Google Home) get the glory: https://news.ycombinator.com/item?id=18036041
It sounds like the Semantic Web failed because we tried treating a longstanding (and possibly unresolvable) ontological problem as a straightforward and technical one.
I don’t really follow academic philosophy, but is it known these days if such categorisation problems are even “solvable” in the general case?
I believe that we actually can generalize modelling. We have dictionaries filled with definitions, given enough time and discipline, I don't see why we couldn't make them formal. It's not an engineering problem though.
Freebase, after being bought by Google, became the foundation of the Google Knowledge Graph (aka "things, not strings"). This kicked off an arms race among all the major search providers to build the largest and most complete knowledge graphs (or at least keep pace with Google). Instead of waiting for folks to tag every single page, it turned out that simple patterns cross-referenced across billions of pages were good enough to extract useful knowledge from unstructured text.
Some companies that had easier access to structured but dirty data (like LinkedIn and Facebook) were also able to utilize (and contribute to) all of that research by building their own knowledge graphs, with names like the Social Graph and the Economic Graph. Those in turn are helping to power a decent amount of their search and ad targeting capabilities, as well as spawning some interesting work.
All those knowledge graphs became a major part of Siri, Alexa and Google Home's ability to answer a wide range of natural language queries. As well as being pretty fundamental to a lot of tech like semantic search, improved ecommerce search and a bunch of intent detection approaches for chatbots.
So yeah while the technology and associated research did turn out to be incredibly useful, adding fancier meta-tags to pages was not the direction that proved the most useful.
The idea with the semantic web was that it would be open and it would belong to its users, not to some cabal of giant corporations that would use it to control the internets.
That notion of openness and co-authorship of the knowledge on the web is now as dead as the parrot in the Monty Python sketch. And we're all much the worse for it; see all the debates about privacy and ownership of personal information and, indeed, metadata.
I made an observation about monetizing the Semantic Web when playing the role of the data/ontology provider. You provide all the data while Siri, Alexa and Google Home get the glory: https://news.ycombinator.com/item?id=18036041
Unfortunately, they got acquired by Google, and Freebase eventually shut down. Thinking back now, I wonder if there would have been a business model in hosting private data graphs to subsidize the open source data.
 - https://en.wikipedia.org/wiki/Freebase
 - https://opensource.googleblog.com/2010/08/acre-open-source-p...
But the acquisition prevents there from existing a service that can inspire a wider ecosystem on something like a federated platform.
Blood monopolies, man
You might be interested in the Underlay Project that Danny is starting: https://underlay.mit.edu/
Makes sense for them to buy it and get rid of it that way.
Except they didn't : The Freebase team moved to Google and carried on the work there, as "Google Search".
Serialization is not the hard part.
The semweb community was obsessed with ontologies and OWL and schemas and taxonomies. If we can just break the problem down enough, the logic went, then systems will be able to infer new data about the world. But it never worked out that way.
Eventually you just have to write some code. If you have to write code anyway, all the taxonomies and RDF in the world aren't helpful (indeed, they're almost certainly the least efficient way to model the problem). You just scrape the pieces of knowledge out of whatever JSON, HTML, or whatever else and glue them together with the code. You don't need the all-knowing semantic web, you just need a .csv of whatever tiny piece of it you care about.
I have a distinct memory of trying to sell someone on the startup I was working on, a SPARQL database. I was pitching RDF as a way to model the problem, but eventually the person I was pitching just said "well, we can just outsource the scraping to our eastern european devs and put it all in one big table." I had a kind of "oh my" moment where I realized that the startup was never going to work: in the real world, you just write code and move on. Taking part in the great semantic knowledge base of the world doesn't matter and isn't needed.
The other end of semweb, the "machine-readable web", more or less came to pass. schema.org, opengraph, and that sort of thing did 99% of what the semweb community wanted at 5% of the effort. The fact that all of that data is not in one giant database doesn't really matter to anyone; you rarely care about more than 2 or 3 web pages at once.
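In practice, "the tiny piece you care about" often amounts to pulling a few Open Graph tags and writing them to a file. A rough sketch, assuming the requests and beautifulsoup4 packages; the URL list is just an example:

```python
# Sketch: scraping the Open Graph tags of a handful of pages into a CSV.
# Assumes the `requests` and `beautifulsoup4` packages; URL list is illustrative.
import csv
import requests
from bs4 import BeautifulSoup

def og(soup, prop):
    tag = soup.find("meta", property="og:" + prop)
    return tag["content"] if tag else ""

urls = ["https://twobithistory.org/2018/05/27/semantic-web.html"]

with open("pages.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "og:title", "og:description"])
    for url in urls:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        writer.writerow([url, og(soup, "title"), og(soup, "description")])
```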
Web pages are a giant mess of content and presentation, and CSS doesn't really help much. XML is at least a way of describing data in a meaningful way. <book>, <author>, <chapter>, etc. XSLT provided a way of formatting XML in the browser. Sure the internet would still be full of inconsistent content structures, but it would still be way easier to machine read than the big mess of arbitrary <div>s and <p>s (most of which just display something blinky) that we have today.
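For what it's worth, that transformation pipeline is still easy to reproduce today. A minimal sketch using lxml; the document and the stylesheet are toy examples, not anyone's production setup:

```python
# Sketch: rendering a semantic <book> document to HTML with XSLT via lxml.
# Assumes the `lxml` package; both the XML and the stylesheet are toy examples.
from lxml import etree

book_xml = etree.XML(
    "<book>"
    "<author>Ursula K. Le Guin</author>"
    "<chapter><title>Chapter One</title><p>It was raining.</p></chapter>"
    "</book>"
)

stylesheet = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/book">
    <html><body>
      <p class="author"><xsl:value-of select="author"/></p>
      <xsl:for-each select="chapter">
        <h2><xsl:value-of select="title"/></h2>
        <p><xsl:value-of select="p"/></p>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(str(transform(book_xml)))  # plain HTML; content and presentation stay separate
```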
Except of course the developers who are using react & friends because it allows them to separate content(data) from presentation(view).
The idea behind the whole thing was that the "web devs" could be hired for their HTML/CSS skills and never have to touch code. The reality was, they had to become experts in a peculiar, clunky, XSLT-based programming language of their own.
Did you "push" or did you "pull"?
I think you're proving my point. :-)
More specifically, we defined parameterized components that could be composed to form the output. The definition of those components was incredibly verbose, as was the instantiation, and it was tightly coupled to parts of the domain model in surprising ways. IIRC WebDevs ended up having to work a step removed from the final XSLT, using some DSL for components that transformed down to XSLT.
Further, deponent sayeth not.
And how does my question prove your point?
I am asking, because I really would like to understand, what happened there.
I never had much luck using XSLT for anything non-trivial, and I imagine that experience isn't uncommon.
Actually, that was XSL-FO.
XSLT is transformation, not formatting; of course you could translate to (X)HTML with embedded styles, or XSL-FO, or something else that includes formatting, but XSLT doesn't do formatting itself.
Typical workflow is: XML -> XSL-T -> XSL-FO
The writer says that a business owner must add their office info to Google or Yelp and suggests there are no alternatives to such centralized repositories of information. However, OpenStreetMap also has opening hours for businesses and medical practitioners, and that data is yours to process and play around with as you like.
In fact, there is just so much data present now in OSM, we simply lack convenient end-user tools to extract and process it automatically.
I mean the person looking for the hours would probably go to one of those sites, but the hours wouldn’t be stored there.
It's facepalms all the way down.
As the title of the class indicates, the idea was to encourage the creation of real-world applications, and to that end the class groups were encouraged to have a mix of Course 6 and business school team members. At the time, it seemed that the Semantic Web was more of an academic/open source project rather than something that was widely embraced by developers, although some guest speakers did have working applications at their places of business. I think the hope was to seed the Cambridge startup ecosystem with SW/Linked Data examples that could encourage its spread into the real world.
One of the teams in our class actually turned their project into a startup that was later acquired. I ran into one of the co-founders a few years later and asked if they continued to use the Semantic Web/Linked Data model that they had demoed in class. The answer: No, because it couldn't scale. That was an issue that was anticipated and discussed during the class, but there was hopeful talk that scaling issues would be resolved in the near future through various initiatives.
Berners-Lee was successful with the Web because it was not an academic idea like Nelson's and Engelbart's hypertext, but it was a pragmatic technology (HTTP, HTML and a browser) that solved a very practical problem. The semantic web was a vague vision that started with a simplistic graph language specification (RDF) that didn't solve anything. All the tools for processing RDF were horrendous in complexity and performance and everything you could do with it could typically be solved easier with other means.
Then the AI people of old came on board and introduced OWL, a turn for the worse. All the automatic inference and deduction stuff was totally non-scalable on even toy examples, let alone at web scale. Humans in general are terrible at making formal ontologies; even many computer science students typically didn't really understand the cardinality stuff. And how would it bring us closer to Berners-Lee's vision? No idea.
Of course, its basic assumptions about the openness, distributedness, and democratic qualities of the Web also didn't hold up. It didn't help that the community is extremely stubborn and overconfident. Still, they keep on convincing themselves it is all a big success and will point at vaguely similar but successful stories built on completely different technology as proof that they were right. I think this attitude and this type of people in the W3C has also led to the downfall of the W3C as the Web authority.
The practice in the community is to choose a fragment of OWL/description logic that fits your needs. Different tools for different uses. In practice I'm especially fond of the simplest languages, just a little more expressive than a database schema or a UML class diagram, as they are easy to describe things with and yet very useful, with lots of efficient algorithms to infer new things.
I could never really understand what it was going to do in specific terms, going from "Programs could exchange data across the Semantic Web without having to be explicitly engineered to talk to each other" to some specific cases that seemed useful.
Spolsky had a great blog post about this kind of thing: CS people looking at Napster, overemphasizing the peer-to-peer aspects, and endeavouring to generalize it. Generalising is what science does, so the drive was there.
Generalising a solution can lead you down the path to a solution seeking a problem. The web is also hard. Lots of chicken-and-egg problems to solve.
When TBL released the WWW he had a browser, a server and web pages that you could use right now. The "standards" existed for a non-abstract reason.
On criticisms of the W3C... idk. They have an almost impossible job. The world's biggest companies control browsers. Standards are very hard to change. Very hard network-effect problems, people problems. Enormous economic, political & ideological interests are impacted by their decisions.
You could say that they should not have been involved with the project until it was much more mature and they could decide whether or not to include it. That said, if they were those sorts of people instead of academics... I'm not sure if that's better.
Some things just don't work out.
I'll accept some responsibility for that last bit, as somebody who has been active in promoting, and advocating for the adoption of, SemWeb tech. I could do more / do a better job in that regard.
There's also a lot of good information at
although I fear that site doesn't get as much love / attention as it should, and some of the links might be stale.
We think that the main way to achieve a practical semantic web is to have AI synthesize a Knowledge Graph from applying CV/NLP techniques to understanding all webpages. More about our project here:
Other fields are moving towards the semantic approach also.
In the end, semantic web uptake was on the data, not the metadata.
As regards the academic semweb grant story: these same idiots are now chasing the cloud without a clue. And before that, it was grid computing.
For some fields there is uptake because it solves problems. But they hardly market themselves as semweb; it's more profitable to market that they provide solutions.
People (and programmers) are lazy, and ignorant. If it's not broken in an in-your-face way, it frequently won't get fixed. I used to have an HTML validator enabled by default in Firefox, which would point out HTML errors for every page I landed on. A large percentage of web pages had in-your-face HTML errors; despite all the tools to check for broken HTML, people still didn't put in the effort to assure their pages were error free.
Basically, if the page rendered "correctly" in the developers browser and maybe another test browser or two, then it was job done.
<!-- Twitter -->
<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@TwoBitHistory" />
<!-- OpenGraph -->
<meta property="og:image" content="https://twobithistory.org/images/logo.png" />
<meta property="og:url" content="https://twobithistory.org/2018/05/27/semantic-web.html" />
<meta property="og:title" content="Whatever Happened to the Semantic Web?" />
<meta property="og:description" content="In 2001, Tim Berners-Lee, inventor of the World Wide Web, published an article in Scientific American.
Not to mention, the end result is a hairball alright: a big pile of tangled-up, hacky, ad-hoc APIs, bashed together as fast as possible, "to get things done"
... and everyone is still using XML anyway.
There was a lot of clunkiness there, in the W3C standard, but, W3C standards are made to be openly debated and revised. Facebook APIs, on the other hand - not so much.
Not only did these outbuzzword the Semantic Web, but as it turns out it's much easier to have a bunch of GPUs running CNNs to extract semantic info from the dirty data you have rather than attempting to cram that data into a well-specified ontology and enforcing that ontology on new incoming data.
For AI/ML to provide that insight requires the ML to have access to a good ontology.
The reason is more nuanced. The main reason being money: https://news.ycombinator.com/item?id=18036041
One of the best examples of the semantic web was Daylife, and they wound up being "acquired" by two bit players that figured out how to monetize things better.. :-/
The barrier to entry is thinking in "graph" instead of relational DB, which is a big cultural change, and then shifting focus and attention to the information science of building valuable ontologies. Once you make the leap, it's hard to go back: it's an order-of-magnitude productivity gain.
The software stack is getting better and more robust: you can do things quickly with billions of triples that would take you weeks of development to program in a non-trivial relational database environment. The Semantics 2018 conference just took place in Vienna. It was heavy in industry presence and there was a lot of money going around. These guys don't give money outside the company unless they're going to get value for it.
So yes, reports of the imminent arrival of the Semantic Web ten years ago were greatly exaggerated. But if you're looking for a topic with an amusingly clueless commentariat, you'll do better to google "PHP object-oriented programming" (or just "hacker", for that matter).
As soon as you start actually consuming semantic data it becomes a protocol that begs to be "hacked".
Because the web is never going to change to adopt a semantic web standard. What we have now are facsimiles of the semantic web, things like Open Graph (which only provides the gist of page media, if that), proprietary search engine results, and proprietary APIs for walled gardens like Facebook.
It's looking like machine learning is going to provide richer gists and then manually-coded directories will provide user interface controllers for those gists in Alexa and other agents. It's a far cry from a truly semantic web but most people won't know the difference.
This is actually a pretty easy problem to solve, but to do it, we'd be running against the wind of capitalism. The semantic web is running behind the scenes at Google, ad agencies, even the NSA. Except they've built it around people's private data instead of publicly accessible documents.
Just to throw some ideas out there, I would start with the low-hanging fruit: we need a fully indexed document store that doesn't barf on mangled data. We need a compelling reason for people to have public profiles again (or an open and secure web of trust for remote API access). We need annotated public relationship graphs akin to ImageNet or NIST for deriving the most commonly used semantics (edit: DBpedia is a start). Totally doable, but developers gotta pay rent.
Sounds more or less like what the Urbit project (https://urbit.org/) is trying to accomplish. Not an endorsement; it has serious flaws just like everything else. This is a very hard problem to solve. But I sure do hope someone manages to figure it out.
Basically any form of structured data, be it in XML or JSON, served through some channel of data, is everything people need. There is no benefit in further standardization. Simple, informal standards work better than monstrous specifications that nobody ever bothers to deal with properly. The most important part is reducing friction, that's why JSON is the most successful format despite its shortcomings.
One being that while a set of SPARQL Federated Queries would elegantly replace my assorted, custom collection of python scripts, scrapy and PhantomJS (slowly porting over to puppeteer) programs talking to Postgres, there is no usable graph store you and I can use as of 2018.
Another being about monetizing the Semantic Web when playing the role of the data/ontology provider.
The majority of your clients will want your data in relational formats rather than Turtle/RDF files anyway.
.. and if you do provide all the data, the consumers (Siri, Alexa and Google Home) get the glory: https://news.ycombinator.com/item?id=18036041
Perhaps determining "meaning" on the web is similar, where the synthetic "order" approach is semantic markup, but the analytical "chaos" approach is NLP, image object recognition, etc.
I think you need both, since human produced content doesn't always follow discrete predefined categories, but also has patterns that can be pre-classified to solve real problems more easily.
Although HTTP has proven resilient, HTML/XML has not. XML's verbosity that enables semantic meaning is exactly its undoing compared to JSON. When meaning is built into both client and server, communication needs to be skinny not rich.
Exactly. That was the missing brick of a truly distributed, linked (semantic) web, which now has a chance to be fulfilled by IPFS/IPNS/IPLD or some upcoming standardized equivalent.
I think it won't work if the underlying transport/presentation is "the web" (i.e. as in Web 2.0).
Instead of decorating semantics/hints around the actual information mostly for SEO reasons it should work the opposite: using all available semantic hints and information bits there already are to create new information by aggregating and putting things in a new context.
It adds value while building upon previous knowledge and allows information and context to be relevant indefinitely.
You will find that:
A) It's probably not that valuable
B) None of the hard problems are technological
It's also hard to describe, but the best analogy I can come up with: picture a CMS that is actually about _content_ instead of being tied to presentation. So, e.g., writing an article about a certain historical event at a certain place consists of stringing all the information and relationships together.
Bringing the correct pieces together eliminates errors and gives a piece of information more meaning when used in different contexts.
Being able to correctly reference e.g. Venice, Italy instead of Venice, LA, CA makes a huge difference when looking up time schedules, weather forecasts, flight connections, etc. Sure, there are IATA codes for airports. Wouldn't it be great to mention Springfield in an article and have all information about that place (as well as all "backlinks")?
I also don't think it is a technology problem.
However, I'd like to think about this more in terms of a DRY principle for information. There are publications on the web that solely exist to duplicate short-lived, relatively low-quality information and put ads on it. This may be acceptable from some consumers' point of view, but it fails to create a long-lasting contribution to mankind.
Just dumping all the bits we currently store into massive archives is possible, but taking measures to keep down the amount of "information archeology" needed to understand this data feels like the right thing to do.
I'll iterate on that.
Sorry if this reads as even more confusing and esoteric; I need some sleep now.
You're giving an example about an improved CMS. If I imagine myself in the shoes of any actual stakeholder who's got a bunch of employees using (or is paying for the development of) a nontrivial CMS system, I don't really see why they would consider your proposed features as needed and valuable. They don't have a problem with referencing the correct Venice, they can say what they want to say as accurately they want with the current CMS systems. If they're writing an article, then either the weather forecast and flight connections would be relevant to the intended message and included by the writer/editor, or otherwise they should be avoided in order not to distract readers from what the publisher wants. Similarly, having 'backlinks' may be considered harmful if the publisher doesn't want the reader to easily go to another resource.
That is the point of looking at the benefit to stakeholders. It doesn't matter if some approach will or will not "create long-lasting contribution to mankind", that's not why technologies get chosen - if the stakeholders who are making the decision on whether to use this technology have an incentive to do so, it will get used, and if they don't have such an incentive, then the technology will die.
And that's the prime weakness of semantic web - its usefulness requires content creators to adopt the technology, but it doesn't provide any strong incentives for these content creators to do so; the main potential benefits accrue to someone else e.g. the general public, not to those who would need to bear the costs of adapting the content. I don't see how it can be successful without addressing this important misalignment of incentives, since incentives matter far more than technology.
Because ML solves problems symbolic approaches cannot solve (dealing with huge amounts of raw, poorly structured data) and symbolic approaches solve problems ML cannot (dealing with logical reasoning and inferences, like in the query "give me all cities of more than 1 million inhabitants that are less than 300km away from Paris, sorted from southernmost to northernmost").
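That example query is exactly the sort of thing the symbolic stack handles well. A self-contained sketch with rdflib over invented data; the vocabulary, cities, and numbers are made up for illustration, and a real query would go against Wikidata or DBpedia instead:

```python
# Sketch: the "cities near Paris" style of query over a toy RDF graph.
# Assumes the `rdflib` package; the ex: vocabulary and the numbers are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()

cities = [  # (name, population, distance from Paris in km, latitude)
    ("Brussels", 1_200_000, 265, 50.85),
    ("Lyon",     1_700_000, 390, 45.76),
    ("Lille",    1_050_000, 204, 50.63),
]
for name, pop, dist, lat in cities:
    g.add((EX[name], RDF.type, EX.City))
    g.add((EX[name], EX.population, Literal(pop)))
    g.add((EX[name], EX.distanceFromParisKm, Literal(dist)))
    g.add((EX[name], EX.latitude, Literal(lat)))

query = """
PREFIX ex: <http://example.org/>
SELECT ?city ?pop WHERE {
  ?city a ex:City ;
        ex:population ?pop ;
        ex:distanceFromParisKm ?dist ;
        ex:latitude ?lat .
  FILTER(?pop > 1000000 && ?dist < 300)
}
ORDER BY ?lat   # southernmost first
"""
for row in g.query(query):
    print(row.city, row.pop)   # only Lille and Brussels qualify
```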
> We begin with a small seed set of (author, title) pairs [...]. Then we find all occurrences of those books on the Web [...]. From these occurrences we recognize patterns for the citations of books. Then we search the Web for these patterns and find new books. --- http://dis.unal.edu.co/~gjhernandezp/psc/lectures/MatchingMa...
Starting with a seed of just five author-title pairs, their formula found thousands more.
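A toy version of that bootstrapping loop is only a few lines. The corpus and the seed pair below are invented, and the real system induced patterns automatically over web-scale crawls rather than hard-coding them:

```python
# Toy sketch of the seed-and-pattern bootstrapping described above.
# The corpus and the seed pair are made up for illustration.
import re

corpus = [
    "My favourite book is The Hobbit, by J. R. R. Tolkien, which I reread yearly.",
    "She recommended Dune, by Frank Herbert, to everyone in the office.",
    "We discussed Neuromancer, by William Gibson, at the last meetup.",
]
seed = ("The Hobbit", "J. R. R. Tolkien")

# 1. Observe how the seed pair occurs: "<Title>, by <Author>," in running text.
title, author = seed
assert any(f"{title}, by {author}," in line for line in corpus)

# 2. Generalise that occurrence into a pattern and apply it to find new pairs.
pattern = re.compile(r"((?:[A-Z]\w+ )*[A-Z]\w+), by ((?:[A-Z][\w.]+ )*[A-Z][\w.]+),")
found = {(t, a) for line in corpus for t, a in pattern.findall(line)}

print(sorted(found))
# [('Dune', 'Frank Herbert'), ('Neuromancer', 'William Gibson'),
#  ('The Hobbit', 'J. R. R. Tolkien')]
```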
For ML to provide that insight requires the ML to have access to a good ontology.
The reality for the rest of us: It does not make financial sense to build the Semantic Web.
It's a chicken and egg problem.
No one has been able to find a way to monetize the Semantic Web when playing the role of the data/ontology provider.
You can't slap an ad on it. You hand off the data, and someone else renders it, slaps on an ad, and rakes in all the money.
If you are a data provider, it's much more practical to go the traditional way: using relational databases and importing/exporting/feeding data using relational formats rather than Turtle/RDF files. The majority of your clients will want your data in those formats anyway.
Designing, Building, Maintaining, Querying an Ontology takes a huge amount of expertise/resources.
Even if you had all the money in the world to obtain the data: there currently exist only a handful of capable/scalable triple stores (none of them open source/free) that can store an ontology/graph dense enough to be meaningful while providing any practical level of query turnaround time.
Individuals like you and I or small businesses just don't have this expertise/resources.
We would be spending too much time writing our own graph database, carrying out alignments between entities from various datasets, looking at and correcting bad data, etc., before we even got to what we originally set out to do.
Instead, most of us scrape the data from HTML/REST+JSON, use taxonomies at best and custom code to do what we need to get done, and call it a day.
12 years ago, when I started learning about the Semantic Web, I envisioned that by 2018 we would be using software agents to make our lives simpler:
1. My software bot looks at my calendar to figure out my day's trip and queries the traffic data from the endpoints relevant to my route
2. It also tries to estimate when I will have time to eat and generate a list of nearby restaurants or fastfood locations depending on my available time
3. It would be able to query endpoints from gas stations relevant to my route to figure out whether and if I should fill up gas
4. If portions of my route has toll roads on it, to find out if I already have a pass and remind me to put it in my car
A critical component of this happening would be support for federation, à la SPARQL Federated Query.
While SPARQL does support Federated Queries, no one has an incentive to support the feature because of the above mentioned monetization challenge.
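For illustration, step 1 above would have looked roughly like this as a federated query. The endpoints and the cal:/traffic: vocabulary are hypothetical; only the SERVICE keyword (federation) is standard SPARQL 1.1, and the SPARQLWrapper package is assumed:

```python
# Sketch of step 1 as a SPARQL 1.1 federated query.
# Endpoint URLs and the cal:/traffic: vocabulary are hypothetical.
from SPARQLWrapper import JSON, SPARQLWrapper

query = """
PREFIX cal:     <http://example.org/calendar#>
PREFIX traffic: <http://example.org/traffic#>

SELECT ?appointment ?road ?delayMinutes WHERE {
  # My own calendar endpoint: where do I need to be today?
  SERVICE <https://calendar.example.org/sparql> {
    ?appointment cal:date "2018-09-23" ;
                 cal:location ?place .
  }
  # A municipal traffic endpoint: what is the delay on roads to that place?
  SERVICE <https://traffic.example.org/sparql> {
    ?road traffic:leadsTo ?place ;
          traffic:currentDelayMinutes ?delayMinutes .
  }
}
"""

endpoint = SPARQLWrapper("https://my-agent.example.org/sparql")  # hypothetical
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()  # would fail today: these endpoints don't exist
```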
So is my vision in shambles?
No. I still get things done, but now through an assorted, custom collection of Python scripts and scrapy and PhantomJS (slowly porting over to Puppeteer) programs talking to Postgres.
There is not a single line of SPARQL involved in the whole pipeline and it does what I want it to do.
... just like everybody else, we are getting along just fine with our hacky solutions.
1. There are few Internet historians. It is difficult and thankless work that doesn't pay well. And too much of the information is lost to history, or can be protested by gainsaying, even the true bits.
2. The first browser, Silversmith, was released in 1987. [http://www.linfo.org/browser.html] (thanks BELUG). It worked in English and grew out of my work on the Association of American Publisher's Electronic Manuscript project, the first U.S. electronic publishing project using tags outside of IBM's product offerings. At the time I had been on the Internet since 1972 and I was tired of typing 128.24.67.xxx. (There was a phone book of IP addresses at the time and I was listed in it for work I was doing on satellite image processing on Illiac.)
3. The second browser, a version of Silversmith, was designed for Old Norse for a researcher and it used Norse runes for the display; the controls were in Roman characters.
4. The third browser, a version of Silversmith, was a semantic browser for a U.S. military application. It was successful as far as I know.
5. The fourth browser, Erwise, [http://www.osnews.com/story/21076/The_World_s_First_Graphica...] came about after I gave a paper on Silversmith in Gmunden, Austria in 1988. Erwise worked in the Finnish language. I understand that TimBL looked at it before developing the W3c browser but decided against using it because the comments were in Finnish.
6. I have seen various dates for the browsers from TimBL and MarcA, but they were at least a few years after Silversmith. We can call them the 5th and 6th, but I'm not sure of the ordering. Both of these browsers were based on the earlier AAP Book tag set.
7. Some of my work on Silversmith grew out of Ted Nelson's work on the Alexandria (Xanadu) project. Much of his work has still not been implemented, but that may soon change.
8. Ted developed hypertext controls for printed documents. In that approach when you finished reading a child section a return page number was there to show you where you left off.
9. I developed the first eHypertext system for networks that would link you back to your source document that you came from by pressing the ESC key. In Silversmith you could link between text, images, sound and semantic information.
10. Silversmith is a scalable system. Please observe that just because you don't know how that is done, does not mean it cannot be done. That was what was told me about browsing and searching earlier also.
11. At the time Silversmith was developed, it was understood that VC's would not talk to you without you having a working product. Once I had it working, I found that VC's would still not talk to you. I talked with about a dozen Boston VC's. They would not even sit for a demonstration. I did a demonstration for the ACM in 2007 (thanks PeterG). That is the nature of tools and the bane of toolsmiths, no one wants to pay for tools. I have a recurring nightmare of the yokel who returned his anvil to the smithy saying, "It doesn't work. I can't use it to make beautiful horseshoes like Kevin does, and he has the same anvil. There's something wrong with this one." With Silversmith I lost a competition among 80 vendors for a search application, when none of the others even had an application. One competitor even called me and demanded all my specs and internal design documents. That is largely why you will not find any published information on Silversmith.
12. I can't tell you how many times I have been schooled on programming languages. "You should program that using ThinkC/ObjectiveC/SmallTalk/the X-System." "You need to switch to Ruby-On-Rails/Perl/Python/Awk, that's the way to do it." People, it's not the language, it's the data structures and the code that is important. And, enough with "speed is important." We are all using supercomputers and they will never be fast enough.
13. Silversmith predated the W3c work by several years, that is why I prefer to use the term "semantic web" (lower case) to distinguish it from the W3c term. I discussed the term "web" with Ted and he agreed that was in use earlier before the WWW.
14. Monetizing a tool is an interesting discussion. No one wants to pay $1 every time they pick up a hammer. But for a cabinetmaker, his/her primary tool is the table saw. This means that they are more than willing to pay on a regular basis for maintenance. They must; it is their livelihood, and the manufacturer is going to make money on that maintenance. He does not expect to be able to charge the cabinetmaker a portion of his sales. That is not how that market works. To me, even razors and blades are not fully monetized if you sharpen your own blades.
15. Semantic work is "path dependent" work. Once you start down a certain path it becomes very difficult to retrace your steps. I used to be critical of academics who "sold out" to the W3c vision, but now I realize that for the most part they are trying to provide what the industry wants and uses.
16. Work on Silversmith continues and I'm pleased to say that it is progressing well. The next version will assist in finding and using knowledge in a more conceptual way.
History's written by the victors, so this is a little SGML oriented -- I still recall my own transition from developing Gopher sites to WWW sites for Lynx, and how at the time, these things felt the same -- though it was clear what would win.
Wikipedia's discussion of browser history also omits Gopher + clients, which is too bad, it was kind of a big deal at the time.
To put it on the timeline, your Silversmith was 1987, Berners Lee's WWW browser 1990, McCahill's Gopher 1991, Lynx 1992, and Andreessen's Mosaic 1993.
Still blown away by the force of an idea trying to happen.