With 2+ decades to look back on since this was written, it turns out the author was wrong about this. Web surfers often hit 404 errors, or invalid deep links get automatically redirected to the main landing page, and people do not lose confidence in the site owner. Broken links to old NYTimes articles or Microsoft blog posts don't shake the public's confidence in those well-known companies. Link rot caused by companies reorganizing their web content is just accepted as a fact of life.
This was the 2013 deep link I had to USA Treasury rates: http://www.treasurydirect.gov/RI/OFNtebnd
... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government. Instead, people just use Google/Bing to find it.
I don't agree. I know Microsoft will move or even outright delete important content because chasing shiny new ideas will get somebody promoted. And so I direct people away from Microsoft solutions because that's going to make my life easier.
Any link to Microsoft documentation that doesn't both go via aka.ms and have a significant user community to ensure the aka.ms link stays correct will decay until it's useless.
So by the time you read this https://aka.ms/trustcertpartners may point at a page which is now styled as a wiki or a blog post or a video, but it will get you the list of CAs trusted by Microsoft's products.
However links from that page (e.g. to previous lists) are internal Microsoft links and so, unsurprisingly, they've completely decayed and are already worthless.
For the US Treasury, just like Congress or the Air Force, I don't have some alternative choice, so it doesn't really matter whether I think they're good or bad at this. But Microsoft is, to some extent at least, actually in a competitive business and I can just direct people elsewhere.
I don't doubt there is a narrower tech audience (e.g. HN) that would base tech-stack strategy on which company has the most 404 errors, but my comment was specifically responding to the author's use of "generally".
I interpreted "they generally lose confidence in the owner of the server" as making a claim about the psychology of general-public, non-techie websurfers. TBL had an intuition about that but it didn't happen. Yes, people are extremely annoyed by 404 errors, but history has shown that websurfers generally accept them as a fact of life.
In any case, to follow up on your perspective, I guess it's possible that somebody avoided C# because the Microsoft website has too many 404 errors and chose Sun Java instead. But there were lots of broken links on java.sun.com as well (before and after the Oracle acquisition), so I'm not sure it's a useful metric.
Ah, so that’s why Microsoft’s financials have been tanking.
Do the other providers that you recommend in place of Microsoft have a better record of not changing/removing URIs on their website?
I was originally going to say that although the links on the Microsoft page currently work, they can't be expected to stay working. They had worked when I last needed them, after all. But that was at least a year ago, so I followed one, and sure enough Microsoft had meanwhile once again re-organised everything and broken them all ...
Even the link to their current list as XML (on Microsoft's site) from this page is broken: it gives you some XML, as promised, but I happen to know the correct answer and it's not in that XML. Fortunately the page has off-site links to the CCADB, and the CCADB can't be torched by somebody in a completely different department who wants to show their boss that they're an innovator, so the data in there is correct.
Microsoft provides customer-facing updates for this work, by the way. The current page about that assures you that updates happen once per month except December, and tells you about January and February's changes. Were there changes in March, April, or May? Actually I know there were, because Rob Stradling (not a Microsoft employee) mirrors crucial changes to a GitHub repository. Are they documented somewhere new that I couldn't find? Maybe.
No, but some of us do get incredibly pissed off with them.
It's unendingly tiresome to find that some piece of content has disappeared from the web, or been moved without a redirect. Often archive.org can help but there's plenty of stuff that's just ... gone.
I don't necessarily run into this problem every day, but anything up to two or three times a week depending on what I'm looking for.
Is <10% worth caring about? It's certainly not the difference between life and death for almost everyone; no successful business or website is going to shut down because they had ads or because they broke old URLs. On the other hand, <10% is hardly trivial; people work hard for things that gain a lot less than that. And really, is defining redirects that hard?
You can get away with it when you are "the-government.gov", but if you are "small-startup.com" and I want to invest in your risky stock and all I get is an "Oops" page when I want to know a little more about your dividend policy, I'm gone.
I do, to a degree. When I follow some old link to a site and realize they've changed their structure and provided no capability to find the original article if it's still available but at a new URI, I lose confidence.
Not in them overall, but in their ability to run a site where it's easy to find what you want and whose usefulness over time isn't constantly undermined. I lose confidence in their ability to do that, which affects how I view their site.
> ... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government.
Maybe not the Govt itself, but in their ability to provide information through a website usefully and in a way that's easy to understand and retains its value? I'm not sure most people had the confidence to begin with required for them to lose it.
I'm sure that wouldn't have been an issue if it was super popular and respected, but it was already fading away at that point.
However, it is extremely annoying, and it happened to me just recently with Microsoft when I was researching PWAs... I had bookmarked a link to their description of how PWAs are integrated into the Microsoft Store. I could only find it on archive.org...
URLs change because systems change.
Someone follows a link, gets a 404 or main page and they're gone.
Cool URIs Don't Change (1998) - https://news.ycombinator.com/item?id=23865484 - July 2020 (154 comments)
Cool URIs Don't Change - https://news.ycombinator.com/item?id=21720496 - Dec 2019 (2 comments)
Cool URIs don't change. (1998) - https://news.ycombinator.com/item?id=21151174 - Oct 2019 (1 comment)
Cool URIs don't change (1998) - https://news.ycombinator.com/item?id=11712449 - May 2016 (122 comments)
Cool URIs don't change. - https://news.ycombinator.com/item?id=4154927 - June 2012 (84 comments)
Tim Berners-Lee: Cool URIs don't change (1998) - https://news.ycombinator.com/item?id=2492566 - April 2011 (25 comments)
Cool URIs Don't Change - https://news.ycombinator.com/item?id=1472611 - June 2010 (1 comment)
Cool URIs Don't change - https://news.ycombinator.com/item?id=175199 - April 2008 (9 comments)
> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.(how?)
And the page URL is [...]/URI.html
When I switched from PHP to static generators (2017), most URIs continued working without redirects.
1. https://www.w3.org/Provider/Style/URI works
2. If https://www.w3.org/Provider/Style/URI.html is ever 404 then you get a very useful 300 page (e.g., try https://www.w3.org/Provider/Style/URI.foobar)
3. However, from (2) you can see the Spanish page is encoded as `.../URI.html.es`, which is bad because `.../URI.es` does not exist
Yes it might be all Flash.
> The tag algorithm lets people mint — create — identifiers that no one else using the same algorithm could ever mint. It is simple enough to do in your head, and the resulting identifiers can be easy to read, write, and remember.
They would seem to solve at least part of the problem Berners-Lee opined about:
> Now here is one I can sympathize with. I agree entirely. What you need to do is to have the web server look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment. You would like to be able to store the URI in the file as a check…
> You need to be able to change things like ownership, access, archive level security level, and so on, of a document in the URI space without changing the URI.
> Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.
Not a bad idea, if you have a good URI scheme to back it up.
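For reference, minting an RFC 4151 tag URI is as simple as the earlier quote suggests. A minimal sketch in Python, with example.com standing in for a domain you actually own:

    from datetime import date

    def mint_tag_uri(authority: str, on: date, specific: str) -> str:
        # RFC 4151 format: tag:<authority>,<date>:<specific>
        # Unique as long as you controlled `authority` (a domain or
        # email address) on the given date.
        return f"tag:{authority},{on.isoformat()}:{specific}"

    # Hypothetical values; example.com stands in for your own domain.
    print(mint_tag_uri("example.com", date(2021, 6, 17), "posts/cool-uris"))
    # -> tag:example.com,2021-06-17:posts/cool-uris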
I completely believe in the spirit of TBL's statement, and wish the internet had evolved to follow some of the ideals implicit therein. I recall that people took some pride in the formatting of their HTML, so that "view source" showed a definite structure, as if the creator was crafting something.
Now, for the most part, URLs are opaque and "view source" shows nothing but a bunch of links to impenetrable JavaScript. I actually wonder when "view source" will be removed from the standard right-click menu, as it barely has meaning to the average user any more.
Actually I am quite surprised that it hasn't happened yet.
The end goal is to no longer have any thick device. Engineers will store and manage all of their code in the cloud, then pay a subscription fee to access it.
Rent everything, own nothing.
I bet TBL hates this just as much as the thing the web of documents and open sharing mutated into.
But lots of users still use a desktop browser for work, especially nowadays.
You always want to test on the browser users will ultimately use anyway, even if you have guarantees the code works exactly the same in dev and user editions.
And you can remove things like popups instead of agreeing to them.
If the repo can't reproduce the contents of the page, simply treat that as an error and don't render the page.
I'd even say barely any sites in general outside of dev blogs + projects.
And even then they always link to the repo.
(Of course most pages are not willing to make their raw source available, so you would see a lot of errors.)
I think a source map is a better approach than a GitHub link, that's all I was saying!
It also probably wouldn't be too hard to make some heuristics to figure out whether scripts are unminified and not require a source map. It wouldn't be 100% accurate, and it wouldn't avoid some form of intentional obfuscation that still uses long names for things, but it would probably work pretty well.
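A rough sketch of what such a heuristic could look like; the thresholds below are invented, not tuned:

    def looks_minified(js_source: str) -> bool:
        # Crude guess: minified JS tends to have very long lines and
        # almost no indentation. Thresholds are invented, not tuned.
        lines = [l for l in js_source.splitlines() if l.strip()]
        if not lines:
            return False
        avg_len = sum(len(l) for l in lines) / len(lines)
        indented = sum(1 for l in lines if l[0] in " \t")
        return avg_len > 250 and indented / len(lines) < 0.1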
> a source map is a better approach than a GitHub link
Most HTML pages are generated from a corresponding Markdown file, in which case a "View Source" link is placed in the footer which goes to that Markdown file in git.
The web is dynamic, not static. Sites come and go, pages come and go, content gets moved and updated and reorganized and recombined. And that's a good thing.
If content has moved, you can often use Google to figure out the new location. And if you want history, that's what we fortunately have archive.org for.
For any website owner, it's ultimately just a cost-benefit calculation. Maintaining old URIs introduces a major cost at some point (usually as part of a tech stack migration), and eventually the benefit from that traffic isn't worth it anymore.
There's no purist, moral, or ethical necessity to maintain URIs in perpetuity. But in reality, because website owners want the traffic and don't want to disrupt search results, they maintain them most of the time. That's good enough for me.
I still hope for a new web standards revolution. Zeldmans of the 21st century, where are you?
I disagree. If I generate URLs systematically, then when I change the scheme I can easily send a 302 with the new URL.
It’s also not hard to just keep a mapping table of old and new URLs.
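A sketch of both approaches in Python (the paths and patterns here are invented): a pattern rule handles the systematic cases, and a small table catches the one-offs.

    import re

    # Invented example: old scheme /posts/<id>.php, new scheme /blog/<id>/.
    PATTERN_RULES = [
        (re.compile(r"^/posts/(\d+)\.php$"), r"/blog/\1/"),
    ]

    # One-off moves that no pattern covers.
    REDIRECT_TABLE = {
        "/about.html": "/company/about/",
    }

    def new_location(old_path):
        # Return the new URL for an old path, or None if nothing matches.
        for pattern, replacement in PATTERN_RULES:
            if pattern.match(old_path):
                return pattern.sub(replacement, old_path)
        return REDIRECT_TABLE.get(old_path)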
I think that it’s lazy programmers who can’t handle it. Or chaotic content creators.
I’ve had many conversations with SharePoint people who frequently change URLs by renaming files and then just expect everyone linking to them to change their links. They seem to design content without ever linking, because links break so much.
It’s the damnedest thing, as it makes content hard to link to and reuse. People are too young to even care about links and stuff.
Of course if SharePoint search didn’t suck it wouldn’t be as much of an issue.
Honestly, that's just not true. If you upgrade to a new CMS/system, the way it routes URLs can be completely incompatible with the old format, and it just can't accommodate it.
And if you're dealing with tens of millions of URLs and billions of pageviews per month, you're talking about setting up entire servers and software in front of your CMS dedicated just to handling all the redirect lookups.
And you can easily run into conflicts where the old naming convention and the new one overlap, so a URL still exists but points to something new, while the old content is elsewhere.
Yes it's possible. But the idea that it's "not hard" is also often very false.
> I think that it’s lazy programmers
No, it's the managers who decided it wasn't worth paying a programmer 2 or 4 weeks to implement, because other programming tasks were more important and they employ a finite number of programmers.
For commercial websites, it's not laziness or "chaos". It's just simple cost-benefit.
Clean, user-friendly URLs are often at odds with persistent URLs because of the information you can't keep in a URL for it to be truly persistent.
Interestingly though, the original URL was: http://www.longbets.org/601
And these are the URL rules I use. Whenever I make any compromise on them or change their priority, I regret it down the road. Rules 1 and 2 are taken directly from the OP article.
- URL-rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)
- URL-rule 2: permanent (they do not change, no dependencies to anything)
- URL-rule 3: manageable (equals measurable, 1 logic per site section, no complicated exceptions, no exceptions)
- URL-rule 4: easily scalable logic
- URL-rule 5: short
- URL-rule 6: with a variation (partial) of the targeted phrase (optional)
Access logs are underrated. They're not some legacy concept; they're supported by all new platforms as well, including various Kubernetes setups.
It's a lot of work because I need to customize the URI structure in blog systems and write complex rewrite rules.
To ensure the setup is more or less correct, I have a bash script that tests the redirects via curl.
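Roughly the equivalent in Python, with placeholder URLs standing in for the real mapping:

    import requests

    # Placeholder pairs: old URL -> expected redirect target.
    EXPECTED = {
        "https://example.com/posts/42.php": "https://example.com/blog/42/",
    }

    for old, target in EXPECTED.items():
        resp = requests.head(old, allow_redirects=False, timeout=10)
        location = resp.headers.get("Location")
        ok = resp.status_code in (301, 308) and location == target
        print("OK " if ok else "FAIL", old, "->", resp.status_code, location)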
Kudos for maintaining that compatibility, but it seems that this kind of thing is addressed in the linked document:
> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.
The original site ran on IIS and was old enough that MS software at the time still used three letter extensions for backwards compatibility, around 1997-1998 IIRC.
I am not convinced that some of the reasons given aren't theoretical as well. The article in general seems very dismissive of the practical reasons to change URIs, listing only the easily dismissible ones to begin with.
> The solution is forethought - make sure you capture with every document its acceptable distribution, its creation date and ideally its expiry date. Keep this metadata.
What to do with an expired document?
Don't get me wrong, it's good to strive for long-lasting URIs, and good URI design is important, but overall I believe a good archiving solution is a better approach. It keeps the current namespaces cleaner and easier to maintain. (Maintaining changes in URI philosophy and organization over tens to hundreds of years sounds like a maintenance nightmare.)
The archiving solution should have browser support: if a URI cannot be resolved as desired, the browser could notify the user of existing historic versions.
If you use this add-on: https://addons.mozilla.org/nl/firefox/addon/wayback-machine_... it will pop up a box when it sees a 404, offering to search the Wayback Machine for older versions.
This might be an argument in favor of preemptively adding another level of hierarchy to your URLs, so that when the time comes to move the entire site to a new backend, you can just proxy the entire /2019/ hierarchy to a compatibility layer.
But who are we kidding, we live in a world where 12 months of support for a widely used framework is called "LTS". Nobody seems to care whether their domains will even resolve in 10 years.
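That compatibility layer can be tiny; a minimal sketch assuming Flask and the requests library, with an invented backend hostname:

    import requests
    from flask import Flask, Response, request

    app = Flask(__name__)
    LEGACY_BACKEND = "http://legacy.internal:8080"  # invented hostname

    # Everything under the frozen /2019/ hierarchy goes to the old stack,
    # so those URLs keep working after the rest of the site migrates.
    @app.route("/2019/<path:subpath>")
    def legacy_proxy(subpath):
        upstream = requests.get(f"{LEGACY_BACKEND}/2019/{subpath}",
                                params=request.args, timeout=10)
        return Response(upstream.content, status=upstream.status_code,
                        content_type=upstream.headers.get("Content-Type"))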
I think that is an important thing to do (as I agree designing URIs is important!). I am just not sure it is reasonable, or even desirable, for them to be maintained indefinitely. I am not sure what the right timeframe for deprecation would be either.
301/308 redirects, on the other hand, can be maintained more or less indefinitely as long as you keep around a mapping of old URLs to new URLs. If you need to change your URLs again, you just add more entries to the mapping database. Make your 404 handler look up this database first.
One thing you can't do, though, is repurposing an existing URL. Hence the versioning. :)
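A minimal sketch of such a 404 handler, assuming Flask, with an in-memory dict standing in for the mapping database:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    # Stand-in for the real database of old URL -> new URL.
    MOVED = {"/old/page": "/new/page"}

    @app.errorhandler(404)
    def maybe_redirect(error):
        target = MOVED.get(request.path)
        if target:
            # 308 preserves the request method; 301 is fine for GET-only content.
            return redirect(target, code=308)
        return "Not found", 404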
I can't see why. You maintain a table of redirects. When you change the URI organization, which would break the URIs, you add the appropriate redirects to prevent that. Then, when you change it again, you just append the new redirects without removing the old ones. If necessary, resolve redirect chains server-side. The table may grow large, but it doesn't seem much more complicated than maintaining redirects across just two generations. Am I missing something?
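Resolving the chains is only a few lines; a sketch:

    def flatten(redirects):
        # Point every old URL at its final destination so no request
        # ever bounces through more than one redirect.
        flat = {}
        for old in redirects:
            target, seen = old, set()
            while target in redirects and target not in seen:
                seen.add(target)  # guards against redirect loops
                target = redirects[target]
            flat[old] = target
        return flat

    print(flatten({"/a": "/b", "/b": "/c"}))  # {'/a': '/c', '/b': '/c'}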
To mitigate link rot I share URLs along with the title and a snippet from the page. A good title can act as an effective search key to find the page at its new URL even if the domain has changed.
Unfortunately, a lot of web pages have useless titles. Often sites use the same title on every page even when pages have subjects that are obvious choices for inclusion in the title (make, model & part # or article title that appears in the body element but not in the title). With the rise of social media, lots of pages now have OpenGraph metadata (e.g. og:title) set but don’t propagate that to the regular page title.
Page titles are used by search engines and by browsers to set window, tab, and bookmark titles.
Again, why do webmasters and web developers neglect the title element?
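For what it's worth, picking a usable title with an og:title fallback is only a few lines; a sketch assuming the requests and BeautifulSoup libraries:

    import requests
    from bs4 import BeautifulSoup

    def page_title(url):
        # Prefer og:title when set (often more specific than a generic
        # site-wide <title>), else fall back to the <title> element.
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        og = soup.find("meta", property="og:title")
        if og and og.get("content"):
            return og["content"].strip()
        if soup.title and soup.title.string:
            return soup.title.string.strip()
        return None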
Recent discussion on this one's last submission was not even a year ago
$ curl -I -H 'Upgrade-Insecure-Requests: 1' 'http://www.w3.org/Provider/Style/URI.html'
HTTP/1.1 307 Temporary Redirect
HTTP/1.1 200 OK
date: Thu, 17 Jun 2021 10:18:14 GMT
last-modified: Mon, 24 Feb 2014 23:09:53 GMT
$ curl -I 'https://www.w3.org/'
strict-transport-security: max-age=15552000; includeSubdomains; preload
Edit - I've just had a look and URI covers the lot so my bad