I do have one minor technical criticism though. It is so common for people to conflate "parameters" with the components of a query string that we don't give it a second thought. The specification, though, does delineate these terms. See https://tools.ietf.org/html/rfc3986#section-3.4 and the preceding paragraph.
Specifically, parameters are trailing data components of the path section of the URI (URL), while the query string is separated from the path section by the question mark. URI parameters are rarely used these days, so conflating the two is a common mistake.
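The distinction even survives in tooling: Python's standard-library urlparse, for instance, still splits the rarely-seen path parameters (introduced by ";" in the final path segment) out from the query string. A small sketch, using a made-up URL:

```python
from urllib.parse import urlparse

# A made-up URL carrying both a path parameter (";type=a") and a query string.
parts = urlparse("http://example.com/files/report;type=a?page=2")

print(parts.path)    # "/files/report" -- the path proper
print(parts.params)  # "type=a"        -- the path parameter, not part of the query
print(parts.query)   # "page=2"        -- the query string, after the "?"
```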
Encoding ampersands in a URI (URL) using HTML encoding schemes is also common, but incorrect. URI encoding uses percent encoding as its only escaping scheme, such as %20 for a space. Using something like &amp; will literally put five characters into the address unencoded, or may produce something like %26amp; in software that auto-converts characters into the presumed encoding.
To embed any string (for example a URL) containing & in HTML, you must HTML-encode that & as &amp;. Using &amp; in the value of the href attribute of an a tag results in a URL containing just & in place of the entire entity. This is a property of HTML that has nothing to do with URLs or URL encodings.
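Kept apart, the two layers are mechanical: percent-encode when building the URL, HTML-escape when embedding the finished URL in markup. A quick sketch with Python's standard library:

```python
import html
from urllib.parse import quote

# Layer 1: URL encoding knows only percent escapes.
print(quote(" "))  # "%20" -- a space, percent-encoded
print(quote("&"))  # "%26" -- an ampersand that is data, not a separator

# Layer 2: HTML escaping, applied to the whole URL before it goes into an href.
url = "http://example.com/?a=1&b=2"
print(html.escape(url))  # "http://example.com/?a=1&amp;b=2"
```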
I remember dismissing the "World Wide Web" because the 80286-based IBM PC I used at college in 1993 couldn't run a graphical web browser (that I knew of), so I compared terminal-based web browsers to Gopher and determined the latter was far superior: it had more content and was much cleaner to use in a terminal.
The history of the Internet and the Web would definitely have been so different had URLs been formatted like "http:com/example/foo/bar/baz". It's so much cleaner and more sensible. Part of the mystique of "foo.com" is that it somehow seems completely different from "bar.org". Not sure why, but it just is.
Just a side note: DOS and Windows using \ instead of / has been annoying for nearly 40 years, and I don't think I'll ever find it not annoying. You'd think four decades would be enough time, but it still bugs me.
What we call the web, surely? I appreciate we conflate the two, but in this context I think that's what was meant.
And it's really the URL (URI, IRI, URN...) that makes the web. An amazing thing.
I get incredibly frustrated with websites (e.g. many SPAs or frontend-happy sites) that don't provide a URL that I can bookmark and share where it obviously makes sense to do so.
Well, Chrome devs seem to disagree, with the recent push to hide URLs, and so do SPA devs. Is the article meant to save the URL, or is it rather a farewell? Anyway, URLs aren't particularly well designed IMO, if only for their use of the ampersand as a separator in the query part, which conflicts with the ampersand that starts entity references in HTML and other SGML docs.
??? - those usually use the URL bar in my experience.
They aren't conventionally "shareable", first off, which more or less violates the spirit of what a URL is supposed to represent on the Web.
I've written plenty of SPAs myself that use the URL and you couldn't actually tell it's an SPA unless you have the technical experience to know what to look for.
If what you're saying is "this infinite-scrolling page sucks at respecting URLs", that's something else, and it's not specific to SPAs. Infinite scrolling predates SPAs (Here is one for jQuery! https://infinite-scroll.com/#initialize-with-jquery), and doesn't have to break URLs if you don't write your code like an uncivilized brute.
"You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension."
I am constantly surprised by how difficult many static site generators make it to create URLs of this format (notice no trailing slash, before you tell me about the "Pretty URLs" options in most software).
That is cool. Does anyone have any more info? Perhaps a picture?
> I was able to find these pages through Google, which has functionally made page titles the URN of today.
> Given the power of search engines, it’s possible the best URN format today would be a simple way for files to point to their former URLs.
Daniel Bernstein proposes a document ID that can be found in search engines: https://cr.yp.to/bib/documentid.html
I actually started using this before, but found it to be clumsy and stopped using it.
Someone else has suggested a UUID instead: https://lobste.rs/s/xltmol/this_page_is_designed_last#c_nis6...
But that's still clumsy. I'd prefer something shorter.
Perhaps the title of the page is the best option, as people are more likely to have that saved than the UUID: https://lobste.rs/s/xltmol/this_page_is_designed_last#c_0snr...
> I imagine it'd be uncommon for someone to have the UUID but not the website saved.
To make a URL for content that can change, you would need to use a system that lets you create a mutable URL, and then you can make that point to an IPFS link. DNS works well for that. You can add a "_dnslink" TXT DNS record to a domain, and then when anyone with an IPFS-supporting browser (or browser add-on) accesses the domain, their browser can fetch the content over IPFS from anyone seeding it, and help seed it if the user wants. (Yes, this wouldn't work well at all for domains with dynamic content. Works great for domains that have static content, including sites made by static site generators, etc.)
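For reference, the record itself is a one-liner. A sketch of the zone entry, assuming a domain example.com and using the well-known example CID from the IPFS docs in place of a real site's hash:

```
_dnslink.example.com.  300  IN  TXT  "dnslink=/ipfs/QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"
```

Updating the site then just means re-adding the directory to IPFS and pointing this one TXT record at the new hash.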
I personally serve my blog with IPFS by making its files accessible over IPFS, putting a _dnslink TXT record on my domain pointing to the directory's current IPFS link, and then pointing my domain's A record at a service (cloudflare-ipfs.com) that responds to HTTP requests by serving contents from the IPFS link that my _dnslink record points to. I'm using multiple free IPFS pinning services to keep my blog's files seeded on IPFS. I like that I'm not tied down to any of them, and I could easily replace them with other services or my own server without changing any of the rest of the setup. Additionally, anyone who liked my blog could help seed it themselves, so it could outlast the pinning services and me.
Assuming a world where IPFS was commonly supported and my content was well liked enough to get seeded by others, the only points of failure for keeping my site up are the domain name staying renewed and the DNS service I'm using staying up. Even if those went down, as long as someone still had the last IPFS link to my site, my site would remain accessible through that link for as long as people seeded it.
I believe the Ethereum Name Service would also be a good decentralized alternative to using DNS for keeping update-able URLs pointing to IPFS content, but I haven't personally used it and I don't know if there are good integrations that make it usable for that today. IPFS also has a feature called IPNS for creating mutable links to content that can be updated by whoever owns the private key, which sounds perfect on paper, but it doesn't work well in my experience for a few reasons (latency, timeouts, etc), so I wouldn't recommend it.
> One such effort was the Semantic Web. The dream was to create a Resource Description Framework (editorial note: run away from any team which seeks to create a framework), which would allow metadata about content to be universally expressed. For example, rather than creating a nice web page about my Corvette Stingray, I could make an RDF document describing its size, color, and the number of speeding tickets I had gotten while driving it.
> This is, of course, in no way a bad idea. But the format was XML based, and there was a big chicken-and-egg problem between having the entire world documented, and having the browsers do anything useful with that documentation.
The author completely fails to describe the evolution of the SemWeb over the past 10 years. Tons of specs, declarative languages, and technologies have emerged, not just to get beyond the verbosity of a serialization format such as XML, but also to move away from the classic relational data model.
Turtle, JSON-LD, SPARQL, Neo4J, Linked Data Fragments,... come to mind. And then there are the emerging applications of linked data. If anything, the Federated Web is exactly about URLs and semantic web technologies based on linking and contextualizing data.
The entire premise of Tim Berners-Lee's Solid/Inrupt is based on these standards, including URIs.
Linked data and federation aren't just about challenging social media; they're also about creating knowledge graphs - such as wikidata.org - and creating opportunities for things such as open access and open science.
Then there's this:
> httpRange-14 sought to answer the fundamental question of what a URL is. Does a URL always refer to a document, or can it refer to anything? Can I have a URL which points to my car?
> They didn’t attempt to answer that question in any satisfying manner. Instead they focused on how and when we can use 303 redirects to point users from links which aren’t documents to ones which are, and when we can use URL fragments (the bit after the ‘#’) to point users to linked data.
Err. They did.
That's what the Resource Description Framework is all about. It gives you a few foundational building blocks for describing the world. Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset, HTTP URLs, that allows the identification and dereferencing of web-based resources.
You can use URIs as globally unique identifiers in a database. You could use URNs to identify books. For instance, urn:isbn:0451450523 is an identifier for the 1968 novel The Last Unicorn.
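And because URNs share the generic URI syntax, ordinary URI parsers handle them; a sketch with Python's standard library:

```python
from urllib.parse import urlparse

# A URN is just a URI with the "urn" scheme; generic parsers split off the
# scheme and leave the namespace-specific part opaque.
uri = urlparse("urn:isbn:0451450523")

print(uri.scheme)  # "urn"
print(uri.path)    # "isbn:0451450523" -- namespace ("isbn") plus the ISBN itself
```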
So, this is a false claim. I could forgive the author for inadvertently not looking beyond URLs as a mechanism used within the context of HTTP communication.
> In the world of web applications, it can be a little odd to think of the basis for the web being the hyperlink. It is a method of linking one document to another, which was gradually augmented with styling, code execution, sessions, authentication, and ultimately became the social shared computing experience so many 70s researchers were trying (and failing) to create. Ultimately, the conclusion is just as true for any project or startup today as it was then: all that matters is adoption. If you can get people to use it, however slipshod it might be, they will help you craft it into what they need. The corollary is, of course, if no one is using it, it doesn’t matter how technically sound it might be. There are countless tools which millions of hours of work went into which precisely no one uses today.
I'm not even sure what the conclusion is here. Did the 'hyperlink' fail? Did the concept of a 'URI' fail? (The two are different things!) Neither failed; on the contrary!
Then there's this wonky comparison of the origin of the Web with a single project or a startup. The author did all this research on the history of the URI, but they still failed to see that the Internet and the Web were invented by committee and by coincidence. Pioneers all over the place had good ideas; some coalesced and succeeded, others didn't. Some were adapted to work together piecemeal, such as Basic Auth.
And that's totally normal. Organic growth and distributed development are the baseline. Yes, the Web as we know it today is the result of many competing voices, but at the same time it could only work because everyone ended up agreeing on the basics.
The fact of the matter is that some companies - looking at you, FAANG - would rather have us all locked into closed, black-box ecosystems than have open standards around that allow for interoperability, and thus create opportunities for new entrants to challenge their business interests.
I understand that the article is written by CloudFlare, a CDN company with its own interests. But I'm trying to wrap my head around how the author failed to address exactly these future opportunities and threats after this entire exposé.
Not sure what you mean by that, but Zack wasn't writing something to further some secret interest Cloudflare has in the structure of URLs.
URLs are names for things (companies, mailboxes, pictures of cats), but they're also (encoded) directions to get representations of those named things.
CloudFlare is concerned with the mechanics mostly, the latter. Things like Wikidata, knowledge bases, schema.org are interested in the former perspective.
Anything that is URL addressable is great for Cloudflare.
I think that the author addressed this very well:
> There is a popular perception that the internet standards bodies didn't do much from the finalization of HTTP 1.1 and HTML 4.01 in 2002 to when HTML 5 really got on track. This period is also known (only by me) as the Dark Age of XHTML. The truth is though, the standardization folks were fantastically busy. They were just doing things which ultimately didn't prove all that valuable.

> One such effort was the Semantic Web.
Most of the things you listed were developed in that period. I'll make a partial exception for JSON-LD because, as the author of that standard himself says:
> So screw it, we thought, let's create a graph data model that looks and feels like JSON, RDF and the Semantic Web be damned.
I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time.
> There is a popular perception that the internet standards bodies didn’t do much from the finalization of HTTP 1.1 and HTML 4.01 in 2002 to when HTML 5 really got on track. This period is also known (only by me) as the Dark Age of XHTML.
I think that's hindsight bias talking.
Who knew at the time how the next 20 years would play out? Google was just in its infancy, Internet Explorer dominated the browser market, and the same concerns - vendor lock-in and proprietary protocols - were just as much a thing back then as they are today.
HTML5 could emerge because of the wide adoption of XHTML and web standards by developers and designers, not despite the existence of XHTML. The latter reading is just heavily colored value attribution on the part of the author.
> The truth is though, the standardization folks were fantastically busy. They were just doing things which ultimately didn’t prove all that valuable.
I think this applies to literally any sizable enterprise, as rising complexity diminishes predictability. The only way to find out whether or not a complex enterprise is valuable is... by going down that road and testing your ideas.
The implication here is a take against standards bodies that don't follow market dynamics - doing market research, following dominant technologies - but instead impose their own principled vision on a market.
But that's a false dichotomy. If anything, standards bodies are committees made up in part of people who are affiliated with or represent incumbents in the marketplace, and in part of people who represent interest groups outside of commercial ventures, such as academia, research, public governance, and so on.
The output of a standards body is by definition a compromise that doesn't cater to the specific needs and wants of a single actor. That's actually a good thing.
> I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time.
The author is correct. The RDF spec has a lot of shortcomings. And the Semantic Web discussion was a difficult debate for a long time because things hadn't coalesced into a clear vision. And that's not a bad thing.
Context matters. At that time, nobody knew what the SemWeb was supposed to become or what it would evolve into. It was simply an idea, and there were a few tentative attempts to work in a problem space that wasn't fully charted yet. It's hard to navigate if you don't know the lay of the land, right?
This blog post was written when the final recommendation of JSON-LD was published. And that specification could only emerge after it was clear that the direction of the debate wasn't leading anywhere.
All I see is the normal evolution of things in an R&D context. By the same token, you could argue that the telegraph was a useless device because usage declined and nobody uses the technology anymore. But then you'd disregard the fact that the existence and use of telegraphs inspired others to create improvements such as the telephone and the radio.
Which is fine, except the premature standardization approach used by Semantic Web technologies destroyed any chance they had of working.
Actually, no. HTML5 forked from HTML 4, not XHTML, because the W3C had a different vision.
This isn't just the author's view: I followed the mailing list, and it's pretty well understood.
See for example the "A Competing Vision" section in https://diveinto.html5doctor.com/past.html
> Err. They did.
> That's what the Resource Description Framework is all about. It gives you a few foundational building blocks for describing the world. Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset, HTTP URLs, that allows the identification and dereferencing of web-based resources.
> You can use URIs as globally unique identifiers in a database. You could use URNs to identify books. For instance, urn:isbn:0451450523 is an identifier for the 1968 novel The Last Unicorn.
> So, this is a false claim. I could forgive the author for inadvertently not looking beyond URLs as a mechanism used within the context of HTTP communication.
See this is almost the canonical example of why the semantic web remains the once and always future of the web.
Take this: "Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset, HTTP URLs, that allows the identification and dereferencing of web-based resources."
Sure, URIs are just the addressing scheme. I think we all get that. But the practicalities of building systems mean that applications have to understand both the addressing scheme and some way of handling errors, which status codes supply. Notably, all implementations that dereference URIs (HTTP, files, IPFS) have to implement error handling themselves.
The holistic approach that the (non-semantic) web took in evolving the browser, HTML, and HTTP together meant that practical applications could be built on it.
Contrast that with the ideological approach of the semantic web, where, yes, the Resource Description Framework (RDF) gives you addresses, but it's a weak data modelling approach that would be rejected if it were proposed in a programming language (e.g., the lack of list support!).
Anyway, to go to your original point: the original httpRange-14 was in the context of HTTP URIs, but the issue equally applies to non-HTTP URIs. At least for HTTP we can discuss it sensibly because status codes are part of the spec. For URIs in a general sense it seems impossible to resolve this (no pun intended).
See Decision 3 in http://manu.sporny.org/2014/json-ld-origins-2/ (or read the whole article. It's good).