>When you change a URI on your server, you can never completely tell who will have links to the old URI. [...] When someone follows a link and it breaks, they generally lose confidence in the owner of the server.
With 2+ decades to look back on after this was written, it turns out the author was wrong about this. Web surfers often get 404 errors or invalid deep links get automatically redirected to the main landing page -- and people do not lose confidence in the site owner. Broken links to old NYTimes articles or Microsoft blog posts don't shake the public's confidence in those well-known companies. Link rot caused by companies reorganizing their web content is just accepted as a fact of life.
This was the 2013 deep link I had to US Treasury rates: http://www.treasurydirect.gov/RI/OFNtebnd
... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government. Instead, people just use Google/Bing to find it.
I don't agree. I know Microsoft will move or even outright delete important content because chasing shiny new ideas will get somebody promoted. And so I direct people away from Microsoft solutions because that's going to make my life easier.
Any link to Microsoft documentation that doesn't both go via aka.ms and have a significant user community to ensure the aka.ms link stays correct will decay until it's useless.
So by the time you read this https://aka.ms/trustcertpartners may point at a page which is now styled as a wiki or a blog post or a video, but it will get you the list of CAs trusted by Microsoft's products.
However links from that page (e.g. to previous lists) are internal Microsoft links and so, unsurprisingly, they've completely decayed and are already worthless.
For the US Treasury, just like Congress or the Air Force, I don't have some alternative choice, so it doesn't really matter whether I think they're good or bad at this. But Microsoft is, to some extent at least, actually in a competitive business and I can just direct people elsewhere.
>I don't agree. [...] And so I direct people away from Microsoft solutions
I don't doubt there is a narrower tech audience (e.g. HN) that would base tech stack strategy on which company has the most 404 errors but my comment was specifically responding to the author's use of "generally".
I interpreted "they generally lose confidence in the owner of the server" -- as making a claim about the psychology of the general public non-techie websurfers. TBL had an intuition about that but it didn't happen. Yes, people are extremely annoyed by 404 errors but history has shown that websurfers generally accept it as a fact of life.
In any case, to follow up on your perspective, I guess it's possible that somebody avoided C# because the Microsoft website has too many 404 errors and chose Sun Java instead. But there were lots of broken links on java.sun.com as well (before and after the Oracle acquisition), so I'm not sure it's a useful metric.
Why can't it be both? They accept it as a fact of life but also lose confidence. I understand why links get broken and accept that it can happen, but when it does, I try to stay away from that website. Non-techie websurfers probably don't even understand why the link is broken, so they get even more confused. On one hand they might think it's their fault, but on the other hand, they might assume the entire website stopped working.
>I know Microsoft will move or even outright delete important content because chasing shiny new ideas will get somebody promoted. And so I direct people away from Microsoft
Do the other providers that you recommend in place of Microsoft have a better record of not changing/removing URIs on their website?
Yes. You might be underestimating just how bad Microsoft are at this.
I was originally going to say that although the links on the Microsoft page currently work, they can't be expected to stay working. They had worked when I last needed them, after all. But that was at least a year ago, so I followed one, and sure enough, in the meantime Microsoft had once again re-organised everything and broken them all ...
Even the link from this page to their current list as XML (on Microsoft's site) is broken: it gives you some XML, as promised, but I happen to know the correct answer and it's not in that XML. Fortunately the page has off-site links to the CCADB, and the CCADB can't be torched by somebody in a completely different department who wants to show their boss that they are an innovator, so the data in there is correct.
Microsoft provides customer-facing updates for this work, by the way. The current page about that assures you that updates happen once per month except December, and tells you about January and February's changes. Were there changes in March, April, or May? Actually I know there were, because Rob Stradling (not a Microsoft employee) mirrors crucial changes to a GitHub repository. Are they documented somewhere new that I couldn't find? Maybe.
> and people do not lose confidence in the site owner.
No, but some of us do get incredibly pissed off with them.
It's unendingly tiresome to find that some piece of content has disappeared from the web, or been moved without a redirect. Often archive.org can help but there's plenty of stuff that's just ... gone.
I don't necessarily run into this problem every day, but anything up to two or three times a week depending on what I'm looking for.
I see this much the same way ads are perceived to be annoying. People can complain all the time, but this annoyance, or in this case distrust, just doesn't seem to affect anything.
It's a good comparison. An effect can be real, and people just not notice it, ever. Let's round off the ad effect to 10% loss of users. How do you notice that? You can simulate out traffic, and decrease the mean 10% at some random point and draw time-series: it's actually quite hard to see, particularly with any kind of momentum or autocorrelation trends. And that's with website traffic where you can quantify it. How do you measure the impact on something more intangible? If people can not notice that effect of ads, they can certainly not notice subtler effects like the long-term harm of casually breaking links...
Is <10% worth caring about? It's certainly not the difference between life and death for almost everyone; no successful business or website is going to shut down because they had ads, or because they broke old URLs. On the other hand, <10% is hardly trivial, people work hard for things that gain a lot less than that, and really, is defining redirects that hard?
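To make the "hard to see" point concrete, here is a rough sketch of that simulation: a year of synthetic daily traffic with a 10% drop at a random point. All the numbers here are invented.

```python
# Simulate daily traffic with weekly seasonality and noise, cut the mean by 10%
# at a random day, and see how hard the change is to spot by eye or naive means.
import numpy as np

rng = np.random.default_rng(0)
days = 365
t = np.arange(days)

baseline = 10_000 + 1_500 * np.sin(2 * np.pi * t / 7)   # weekly cycle
noise = rng.normal(0, 800, days)                         # day-to-day variation
traffic = baseline + noise

change_point = rng.integers(90, 270)                     # random day the effect starts
traffic[change_point:] *= 0.90                           # 10% loss of users

before, after = traffic[:change_point], traffic[change_point:]
print(f"change at day {change_point}")
print(f"mean before: {before.mean():.0f}, mean after: {after.mean():.0f}")
```

Plot a few of these and the drop mostly disappears into the seasonality and noise, which is the point: with momentum or autocorrelation on top, it gets even worse.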
Speaking of noticing small effects, Mechwarrior Online was four months into its open beta before anyone noticed that an expensive mech upgrade that was supposed to make your weapons fire 5% faster actually made them fire 5% slower. http://www.howtospotapsychopath.com/2013/02/20/competitively...
Yes, it's a kind of slippery slope: as typically set up, changes are biased towards degrading quality. If you run a normal significance test and your null is leaving it unchanged, then you will only ever ratchet downwards in quality: you either leave it unchanged, or you have a possibly erroneous choice which trades off quality for degradation, and your errors will accumulate over many tests in a sorites. I discuss this in the footnote about Schlitz - to stop this, you need a built-in bias towards quality, to neutralize the bias of your procedures, or to explicitly consider the long-term costs of mistakes and to also test quality improvements as well. (Then you will be the same on average, only taking tradeoffs as they genuinely pay for themselves, and potentially increasing quality rather than degrading it.)
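One toy way to read the ratchet argument (my numbers and threshold are arbitrary, and this flattens the statistics considerably): every proposed change quietly degrades quality, the noisy test only sometimes flags the harm, and nothing ever pushes in the other direction, so the accepted errors accumulate downward.

```python
# Toy ratchet: 200 proposed changes, each a small quality degradation, gated by
# a noisy measurement. Degradations that the test fails to flag get shipped.
import random

random.seed(0)
quality = 100.0
shipped = 0
for _ in range(200):
    true_effect = -0.5                             # each proposal quietly hurts quality
    measured = true_effect + random.gauss(0, 2.0)  # what the noisy test actually sees
    if measured > -1.0:                            # harm not "significant", so it ships
        quality += true_effect
        shipped += 1
print(f"shipped {shipped} of 200 changes, quality drifted from 100.0 to {quality:.1f}")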
I swear this happens every time I visit a forum. Either the one image I need has been purged or every post links to some other dead forum. I've had pretty bad luck with archive.org for these things.
Yes, because the examples you give already have a solid reputation.
You can get away with it when you are "the government .gov", but if you are "small start-up.com" and I want to invest in your risky stocks and all I get is an "Oops page" when I want to know a little more about your dividend policy, I'm gone.
For me it depends on where the link originated. If it's a 404 within your own site, that shows sloppiness that will make me question your ability to run a business. However, if I get a 404 on a link from an external site, I'll assume things have been reorganized and try to find the new location myself (while wondering why it was moved). Changing URIs without setting up forwarding is so common I wouldn't give it much thought.
> Web surfers often get 404 errors or invalid deep links get automatically redirected to the main landing page -- and people do not lose confidence in the site owner.
I do, to a degree. When I follow some old link to a site and realize they've changed their structure and provided no capability to find the original article if it's still available but at a new URI, I lose confidence.
Not in them overall, but in their ability to run a site where it's easy to find what you want and whose usefulness over time isn't constantly undermined. I lose confidence in their ability to do that, which affects how I view their site.
> ... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government.
Maybe not the Govt itself, but in their ability to provide information through a website usefully, in a way that's easy to understand and retains its value? I'm not sure most people had that confidence to begin with, which they'd need in order to lose it.
I disagree. In your example, it doesn't cause me to lose faith in the US Treasury itself, but I am less confident that they can reliably and correctly run a website.
It is more like, I lose confidence in whatever claims (if any) the links were meant to support and confidence that the linked-to site can be linked to as a source.
I suspect it depends on what the existing reputation is. I recall a discussion here on Hacker News where someone asked why people weren't using Google+ for software blog posts anymore, and the response was that they'd broken all the links.
I'm sure that wouldn't have been an issue if it was super popular and respected, but it was already fading away at that point.
The people who have to navigate the MS documentation are not generally the same people who choose to use MS as a supplier. If you're that big and rich you will have totally separate engagements with different layers of a business. For most suppliers that's not the case, and documentation is a competitive advantage.
I don't know. If I consistently see 404s, I eventually associate that domain with "broken" and start avoiding it. Especially in search scenarios where there's lots of links to choose from.
I've seen some crazy stuff driven by the fear of broken links. One place I worked had all the traffic for their main landing page hitting a redirect on a subsystem, because the subsystem had a URL that ranked higher in Google etc. I worked on the subsystem, and rather than fix things in DNS and the search engines, they preferred to expect us to keep our webservers up at all times to serve the redirect. We were on physical hardware in those days, and while we had redundancy it was all in one location; made for some fun times.
Well, the claim is that people "generally lose confidence," which I'd interpret as a decrease in confidence, not a total destruction of confidence. Microsoft and the US Treasury have some dead links, sure, but they run big sites. The vast majority of links that you'd encounter through normal browsing lead to reasonable places.
Yes, I agree to the point that I wouldn't lose confidence in the company.
However, it is extremely annoying, and it happened to me just recently with Microsoft when I researched PWAs... I had bookmarked a link to their description of how PWAs are integrated into the Microsoft Store. I could only find it on archive.org...
Ah, the digital naiveté of the nineties. Nowadays, cool URLs get changed on purpose so you have to enter the platform through the front instead of bouncing off your destination.
Good point. If you’re the only user with utm_origin=boogers&utm_medium=poop, it’ll be trivial to connect you between websites. It’s all automated, so there’s slim to no chance of making a server admin chuckle while checking the logs, unfortunately.
If you are using Firefox Nightly, you can install arbitrary add-ons (though quite a number of them don't work due to UI differences, and containers don't work either) now after going through some steps: https://blog.mozilla.org/addons/2020/09/29/expanded-extensio...
I'm fairly sure the parent was joking and in full agreement with you. Just so this isn't purely a point-out-the-joke post: a few years ago I saw an HN post from a bunch of people who used print-to-PDF and pdfgrep as a bookmarking solution. It doesn't solve the original problem, but it does act as a coping strategy for when content goes missing. I've been using it for a good while now and it's been real useful already.
This is just a trick to hide that the documentation is terrible and useless. If we get a “this content doesn’t exist” page, we think that there is an answer to our question and we just can’t find it. We eventually give up, blaming ourselves for not finding it, instead of finding it and getting something stupid and not useful.
Old software? It's difficult enough to find documentation on the current Oracle database. You only get random versions through a search engine, and can neither navigate to a different one, nor replace the version in the URL and get to a functioning page. Also, you can't start from the top and follow the table of contents, because it's completely different.
> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension. (how?)
Since I switched from ASP to PHP (2008), I have avoided file extensions in page URIs in most cases, and instead placed every page into its own folder.
This is compatible with every web server without using rewrite rules.
When I switched from PHP to static generators (2017), most URIs continued working without redirects.
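For anyone curious, the trick is just emitting each page as its own folder with an index file, so the public URL never carries an extension; a minimal sketch (the paths and page content are made up):

```python
# Write each page as <slug>/index.html so the public URL is extension-free
# (/about/) and keeps working whether the backend is ASP, PHP, or a static
# generator. Every common web server maps /about/ to about/index.html with no
# rewrite rules.
from pathlib import Path

pages = {
    "about": "<h1>About</h1>",
    "contact": "<h1>Contact</h1>",
}

out = Path("public")
for slug, html in pages.items():
    page_dir = out / slug
    page_dir.mkdir(parents=True, exist_ok=True)
    (page_dir / "index.html").write_text(f"<!doctype html>{html}", encoding="utf-8")
```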
To be fair, it's the OP that chose to use the URL with the extension. However, you could say W3C could've stopped their servers from automatically serving URLs with file extensions if they wanted to follow this.
I'm still not sure why Tag-URI[1] hasn't gained more support:
> The tag algorithm lets people mint — create — identifiers that no one else using the same algorithm could ever mint. It is simple enough to do in your head, and the resulting identifiers can be easy to read, write, and remember.
They would seem to solve at least part of the problem Berners-Lee opined about:
> Now here is one I can sympathize with. I agree entirely. What you need to do is to have the web server look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment. You would like to be able to store the URI in the file as a check…
> You need to be able to change things like ownership, access, archive level security level, and so on, of a document in the URI space without changing the URI.
> Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.
Not a bad idea, if you have a good URI scheme to back it up.
I even wrote a Ruby library for it[2] (there are others[3][4], but no JavaScript one that I can find; given that it's the language that produces the worst URLs and doesn't seem to have a community that cares about anything that happened last week, let alone 20 years ago, that's not a surprise).
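For a sense of how simple the minting really is (this follows the RFC 4151 tag syntax), here's a rough sketch; the authority and slug below are made up:

```python
# Mint a tag: URI: an authority you controlled on a given date plus any
# specific string gives an identifier nobody else can mint.
from datetime import date

def mint_tag_uri(authority: str, minted_on: date, specific: str, fragment: str = "") -> str:
    """Build a tag URI: tag:<authority>,<date>:<specific>[#fragment]."""
    uri = f"tag:{authority},{minted_on.isoformat()}:{specific}"
    return f"{uri}#{fragment}" if fragment else uri

print(mint_tag_uri("example.com", date(2021, 6, 1), "blog/cool-uris-dont-change"))
# tag:example.com,2021-06-01:blog/cool-uris-dont-change
```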
I recall it was almost axiomatic that the more expensive the content management system, the worse the URLs it produced.
I completely believe in the spirit of TBL's statement, and wish the internet had evolved to follow some of the ideals implicit therein. I recall that people took some pride in the formatting of their HTML, so that "view source" showed a definite structure, as if the creator was crafting something.
Now, for the most part, URLs are often opaque and "view source" shows nothing but a bunch of links to impenetrable JavaScript. I actually wonder when "view source" is going to be removed from the standard right-click menu, as it barely has meaning to the average user any more.
Which is why you can still install software you want from any source on laptops (for the time being), but have to go through Apple to get software on your phone.
The end goal is to no longer have any thick device. Engineers will store and manage all of their code in the cloud, then pay a subscription fee to access it.
Rent everything, own nothing.
I bet TBL hates this just as much as the thing the web of documents and open sharing mutated into.
Compared to the regular Firefox, there aren't any extra features (other than features that will reach the regular version in a few weeks, since Dev Edition is an alpha/beta of the next release). It's just a branding thing, a few extras by default, and a separate browser profile (might come in handy too).
Yeah, seems a bit pointless to split builds for no reason.
You always want to test on the browser users will ultimately use anyways, even if you have guarantees the code works exactly the same for dev and user editions.
Firefox's current devtools shipped beginning with Firefox 4. It was only Firefox 3.x that shipped without any kind of tooling for web developers. Prior to Firefox 3, the original DOM Inspector was built-in by default (just like the case with Mozilla Suite and SeaMonkey).
... and a "go fuck yourself" to anybody who publishes pages that simply don't require source maps to begin with? I admit that would at least be in line with the current trend of prioritizing/encouraging the kinds of authors and publishers who engage in (the mess of) what is considered "good" modern Web development.
I mean, I think the whole idea of only showing websites with unminified source available is silly, but I'm willing to think about the idea ;)
I think a source map is a better approach than a GitHub link, that was all I was saying!
It also probably wouldn't be too hard to make some heuristics to figure out whether scripts are unminified and not require a source map. It wouldn't be 100% accurate, and it wouldn't avoid some form of intentional obfuscation that still uses long names for things, but it would probably work pretty well.
Most HTML pages are generated from a corresponding Markdown file, in which case a "View Source" link is placed in the footer which goes to that Markdown file in git.
Tell it to Tor. They are destroying every URI created, indexed, known, and used over the last 15 years on Oct 15th, https://blog.torproject.org/v2-deprecation-timeline . Because of this indieweb has refused to add tor onion service domain support for identity.
Onionland people had plenty of warning though. My old v2 Onion bookmarks are all discarded. The new V3 addresses are a good indicator of which .onion operators are serious and want to stay online no matter what.
I consider myself pretty active on tor. I've hosted a number of services for a decade. While I heard about the new Tor v3 services a long time ago I didn't hear about Tor v2 support being completely removed until April 2021. It was quite a shock.
At the end of the day, it's simply an unrealistic expectation that URIs don't change. And declaring for 20+ years that this isn't "cool" isn't going to change a thing about it...
The web is dynamic, not static. Sites come and go, pages come and go, content gets moved and updated and reorganized and recombined. And that's a good thing.
If content has moved, you can often use Google to figure out the new location. And if you want history, that's what we fortunately have archive.org for.
For any website owner, ultimately it's just a cost-benefit calculation. Maintaining old URIs winds up introducing a major cost at some point (usually as part of a tech stack migration), and eventually the benefit from that traffic isn't worth it anymore.
There's no purist, moral, or ethical necessity to maintain URIs in perpetuity. But in reality, because website owners want the traffic and don't want to disrupt search results, they maintain them most of the time. That's good enough for me.
It is very sad that this "cost-benefit" calculation ends up ruining the web. Devs too lazy to learn the basics and doing everything with React and a gazillion libs, nobody caring about accessibility, nobody caring about the URL structure. It is very possible to go through 50 CMSes and still keep the URLs intact. It's just that we forgot the users, and UX gave way to DX.
I still hope for the new web standards revolution. Zeldmans of the 21st century, where are you?
> it's simply an unrealistic expectation that URIs don't change.
I disagree. If I generate URLs systematically, then when I change the scheme I can easily send a 302 with the new URL.
It’s also not hard to just keep a mapping table with old and new.
I think that it’s lazy programmers who can’t handle it. Or chaotic content creators.
I’ve had many conversations with SharePoint people who frequently change URLs by renaming files and then just expect everyone linking to them to change their links. They seem to design content without ever linking, because links break so much.
It’s the damnedest thing, as it makes content hard to link to and reuse. People are too young to even care about links and stuff.
Of course if SharePoint search didn’t suck it wouldn’t be as much of an issue.
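The mapping table really can be that small; here's a minimal WSGI-style sketch (the paths are illustrative, and in practice the table would live in a database or the front proxy; 301 or 308 is the usual choice once the move is permanent):

```python
# Old paths -> new paths. A dict is enough to show the idea; the entries here
# are made up.
OLD_TO_NEW = {
    "/blog/2013/05/cool-uris.php": "/posts/cool-uris",
    "/products/widget.cgi": "/catalog/widget",
}

def app(environ, start_response):
    path = environ["PATH_INFO"]
    target = OLD_TO_NEW.get(path)
    if target:
        # 301 (or 308) tells clients and crawlers the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```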
Honestly, that's just not true. If you upgrade to a new CMS/system, the way it routes URLs can be completely incompatible with the old format, and it just can't accommodate it.
And if you're dealing with tens of millions of URLs and billions of pageviews per month, you're talking about setting up entire servers and software in front of your CMS dedicated just to handling all the redirect lookups.
And you also easily run into conflicts where there is overlap between the old naming convention and the new one, so a URL still exists but it points to something new, while the old content is elsewhere.
Yes it's possible. But the idea that it's "not hard" is also often very false.
> I think that it’s lazy programmers
No, it's the managers who decided it wasn't worth paying a programmer 2 or 4 weeks to implement, because other programming tasks were more important and they employ a finite number of programmers.
For commercial websites, it's not laziness or "chaos". It's just simple cost-benefit.
It's easy to put a reverse proxy in front of your service to translate old permanent URLs into whatever you're running currently. It's a table of redirects, and perhaps a large one, but it doesn't require a lot of logic to handle.
Jeremy Keith's 11-year-long bet against cool URLs (specifically on longbets.org) comes up next year -- and it looks like he might lose it
https://longbets.org/601/
We've been thinking about link rot since then, and yet we're still letting it happen without much care. Kinda sad, but at least we're resting on the shoulders of archive.org.
And these are the URL rules I use. Whenever I make any compromise on them or change the priority, I regret it down the road. Nr 1 and nr 2 are directly taken from the OP article.
URL-rules
- URL-rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)
- URL-rule 2: permanent (they do not change, no dependencies to anything)
- URL-rule 3: manageable (equals measurable, 1 logic per site section, no complicated exceptions, no exceptions)
- URL-rule 4: easily scalable logic
- URL-rule 5: short
- URL-rule 6: with a variation (partial) of the targeted phrase (optional)
URL-rule 1 is more important than rules 2 to 6 combined, URL-rule 2 is more important than rules 3 to 6 combined, and so on. URL-rules 5 and 6 are always a trade-off. Nr 6 is optional. A truly search-optimized URL must fulfill all URL-rules.
> I've a bunch of small sites whose links end in ".php" despite they no longer run on PHP.
Kudos for maintaining that compatibility, but it seems that this kind of thing is addressed in the linked document:
> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.
I'm not sure if they still use it, but a site I redid for a company many moons ago in Perl had Apache set to treat .htm files as CGI scripts, so we didn't have to change the previous, static site's URLs.
The original site ran on IIS and was old enough that MS software at the time still used three letter extensions for backwards compatibility, around 1997-1998 IIRC.
Ironically, this was linked to with the .htm extension, but it was nice to discover that the non-page-specific URI also worked on this site: https://www.w3.org/Provider/Style/URI
> There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice.
I am not convinced that some of the reasons are not theoretical as well. The article in general seems very dismissive of the practical reasons to change URIs, like listing only the 'easily dismissable' ones to begin with.
> The solution is forethought - make sure you capture with every document its acceptable distribution, its creation date and ideally its expiry date. Keep this metadata.
What to do with an expired document?
Don't get me wrong, it's good to strive for long lasting URIs and good design of URIs is important, but overall I believe a good archiving solution is a better approach. It keeps the current namespaces cleaner and easier to maintain. (Maintaining changes in URI philosophy and organization over tens to hundreds of years sounds like a maintenance nightmare.)
The archiving solution should have browser support: If a URI can not be resolved as desired, it could notify the user of existing historic versions.
An expired document can be removed, and a "410 Gone" response returned. This is clearer for the user, who can expect not to find the document elsewhere on the website.
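A rough sketch of how the expiry metadata mentioned upthread could drive that decision (the dates and paths are invented):

```python
# Keep creation/expiry dates with each document and answer 410 Gone once it has
# expired, rather than a generic 404.
from datetime import date

DOCS = {
    "/notices/office-closure": {"created": date(2020, 3, 1), "expires": date(2021, 1, 1)},
    "/about": {"created": date(2019, 6, 1), "expires": None},
}

def status_for(path: str, today: date) -> int:
    meta = DOCS.get(path)
    if meta is None:
        return 404                      # never existed, as far as we know
    if meta["expires"] and today >= meta["expires"]:
        return 410                      # existed, deliberately removed
    return 200

print(status_for("/notices/office-closure", date(2021, 6, 1)))  # 410
```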
Many APIs use versioning to keep the current namespace clean while still supporting the old version. It goes like /v1/getPosts, /v2/post/get, etc.
This might be an argument in favor of preemptively adding another level of hierarchy to your URLs, so that when the time comes to move the entire site to a new backend, you can just proxy the entire /2019/ hierarchy to a compatibility layer.
But who are we kidding, we live in a world where 12 months of support for a widely used framework is called "LTS". Nobody seems to care whether their domains will even resolve in 10 years.
> Many APIs use versioning to keep the current namespace clean while still supporting the old version. It goes like /v1/getPosts, /v2/post/get, etc.
I think that is an important thing to do (as I agree designing URIs is important!). I am just not sure it is reasonable, or even desirable, for them to be indefinitely maintained. I am not sure what the right timeframe for deprecation would be either.
URLs should be maintained even if the content is gone. At the very least you can give a useful HTTP return code, a permanent redirect or gone is more useful than a catch-all 404. You're either bridging old links to current links or telling visitors the old content has been removed.
I don't think it would be feasible to maintain a fully functional copy of a dynamically generated page at the old URL for any length of time. That's just a recipe for security nightmare, not to mention the SEO penalty for duplicate content.
301/308 redirects, on the other hand, can be maintained more or less indefinitely as long as you keep around a mapping of old URLs to new URLs. If you need to change your URLs again, you just add more entries to the mapping database. Make your 404 handler look up this database first.
One thing you can't do, though, is repurposing an existing URL. Hence the versioning. :)
> (Maintaining changes in URI philosophy and organization over tens to hundreds of years sounds like a maintenance nightmare.)
I can't see why. You maintain a table of redirects. When you change the URI organization, which would break the URIs, you add the appropriate redirects to prevent that. Then, when you change it again, you just append the new redirects without removing the old ones. If necessary, resolve redirect chains server-side. The table may grow large, but it doesn't seem much more complicated than maintaining redirects across just two generations. Am I missing something?
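A minimal sketch of that append-only table, with chains collapsed server-side so clients still get a single hop (the paths are invented):

```python
# Each reorganization appends entries; nothing is ever removed. Chains like
# /old/page.cgi -> /2015/page.html -> /articles/page resolve to the final URL.
REDIRECTS = {
    "/old/page.cgi": "/2015/page.html",   # first reorganization
    "/2015/page.html": "/articles/page",  # second reorganization
}

def resolve(path: str, table: dict) -> str:
    seen = {path}
    while path in table:
        path = table[path]
        if path in seen:                  # guard against accidental cycles
            break
        seen.add(path)
    return path

print(resolve("/old/page.cgi", REDIRECTS))  # /articles/page
```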
Why do webmasters and web developers neglect the title element?
To mitigate link rot I share URLs along with the title and a snippet from the page. A good title can act as an effective search key to find the page at its new URL even if the domain has changed.
Unfortunately, a lot of web pages have useless titles. Often sites use the same title on every page even when pages have subjects that are obvious choices for inclusion in the title (make, model & part # or article title that appears in the body element but not in the title). With the rise of social media, lots of pages now have OpenGraph metadata (e.g. og:title) set but don’t propagate that to the regular page title.
Page titles are used by search engines and by browsers to set window, tab, and bookmark titles.
Again, why do webmasters and web developers neglect the title element?
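For what it's worth, grabbing a durable search key is only a few lines with the standard library; a rough sketch (the saved file name is a placeholder):

```python
# Pull the <title> and og:title out of a saved HTML page so the link can be
# re-found later even if the URL rots.
from html.parser import HTMLParser
from pathlib import Path

class TitleGrabber(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.og_title = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("property") == "og:title":
            self.og_title = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleGrabber()
parser.feed(Path("saved-page.html").read_text(encoding="utf-8"))
print(parser.title.strip() or parser.og_title or "(no usable title)")
```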
URLs are the foundation of the internet, and it always makes me sad when navigating to a page results in a 404 instead.
On a somewhat related sidenote, this is why I created a note-taking tool that publishes *permanent URLs* - because your notes can change but your links shouldn't :)
Sometimes I think that immutability should have been a built in part of the web that way we wouldn't be so reliant on archivers like the Internet Archive. Someone still needs to store it, but it really shouldn't have been foisted onto one organization.