Cool URIs don't change (1998) (w3.org)
309 points by Tomte 4 months ago | 140 comments

>When you change a URI on your server, you can never completely tell who will have links to the old URI. [...] When someone follows a link and it breaks, they generally lose confidence in the owner of the server.

With 2+ decades to look back on since this was written, it turns out the author was wrong about this. Web surfers often hit 404 errors, or invalid deep links that silently redirect to the main landing page -- and people do not lose confidence in the site owner. Broken links to old NYTimes articles or Microsoft blog posts don't shake the public's confidence in those well-known companies. Link rot caused by companies reorganizing their web content is just accepted as a fact of life.

This was my 2013 deep link to US Treasury rates: http://www.treasurydirect.gov/RI/OFNtebnd

... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government. Instead, people just use Google/Bing to find it.

> people do not lose confidence in the site owner

I don't agree. I know Microsoft will move or even outright delete important content because chasing shiny new ideas will get somebody promoted. And so I direct people away from Microsoft solutions because that's going to make my life easier.

Any link to Microsoft documentation that doesn't both go via aka.ms and have a significant user community to ensure the aka.ms link stays correct will decay until it's useless.

So by the time you read this https://aka.ms/trustcertpartners may point at a page which is now styled as a wiki or a blog post or a video, but it will get you the list of CAs trusted by Microsoft's products.

However links from that page (e.g. to previous lists) are internal Microsoft links and so, unsurprisingly, they've completely decayed and are already worthless.

For the US Treasury, just like Congress or the Air Force, I don't have some alternative choice, so it doesn't really matter whether I think they're good or bad at this. But Microsoft is, to some extent at least, actually in a competitive business and I can just direct people elsewhere.

>I don't agree. [...] And so I direct people away from Microsoft solutions

I don't doubt there is a narrower tech audience (e.g. HN) that would base tech stack strategy on which company has the most 404 errors but my comment was specifically responding to the author's use of "generally".

I interpreted "they generally lose confidence in the owner of the server" as making a claim about the psychology of the general public of non-techie websurfers. TBL had an intuition about that, but it didn't pan out. Yes, people are extremely annoyed by 404 errors, but history has shown that websurfers generally accept them as a fact of life.

In any case, to follow up on your perspective, I guess it's possible that somebody avoided C# because the Microsoft website has too many 404 errors and chose Sun Java instead. But there were lots of broken links on java.sun.com as well (before and after the Oracle acquisition), so I'm not sure it's a useful metric.

Why can't it be both? They accept it as a fact of life but also lose confidence. I understand why links get broken and accept that it can happen, but when it does, I try to stay away from that website. Non-techie websurfers probably don't even understand why the link is broken, so they get even more confused. On one hand they might think it's their fault, but on the other hand, they might assume the entire website stopped working.

> And so I direct people away from Microsoft solutions because that's going to make my life easier.

Ah, so that’s why Microsoft’s financials have been tanking.

>I know Microsoft will move or even outright delete important content because chasing shiny new ideas will get somebody promoted. And so I direct people away from Microsoft

Do the other providers that you recommend in place of Microsoft have a better record of not changing/removing URIs on their website?

Yes. You might be underestimating just how bad Microsoft are at this.

I was originally going to say that although the links on the Microsoft page currently work, they can't be expected to stay working. They had worked when I last needed them, after all. But that was at least a year ago, so I followed one, and sure enough Microsoft had in the meantime once again re-organised everything and broken them all ...

Even the link to their current list as XML (on Microsoft's site) from this page is broken: it gives you some XML, as promised, but I happen to know the correct answer and it's not in that XML. Fortunately the page has off-site links to the CCADB, and the CCADB can't be torched by somebody in a completely different department who wants to show their boss that they are an innovator, so the data in there is correct.

Microsoft provides customer-facing updates for this work, by the way. The current page about that assures you that updates happen once per month except December, and tells you about January and February's changes. Were there changes in March, April, or May? Actually I know there were, because Rob Stradling (not a Microsoft employee) mirrors crucial changes to a GitHub repository. Are they documented somewhere new that I couldn't find? Maybe.

thank god for the wayback machine ;)

> and people do not lose confidence in the site owner.

No, but some of us do get incredibly pissed off with them.

It's unendingly tiresome to find that some piece of content has disappeared from the web, or been moved without a redirect. Often archive.org can help but there's plenty of stuff that's just ... gone.

I don't necessarily run into this problem every day, but anything up to two or three times a week depending on what I'm looking for.

I see this just as how ads are perceived to be annoying. People can complain all the time but this annoyance, or in this case distrust, just doesn't seem to affect anything.

Gwern found that placing banner ads on his site significantly decreased traffic: https://www.gwern.net/Ads

It's a good comparison. An effect can be real, and people just not notice it, ever. Let's round off the ad effect to 10% loss of users. How do you notice that? You can simulate out traffic, and decrease the mean 10% at some random point and draw time-series: it's actually quite hard to see, particularly with any kind of momentum or autocorrelation trends. And that's with website traffic where you can quantify it. How do you measure the impact on something more intangible? If people can not notice that effect of ads, they can certainly not notice subtler effects like the long-term harm of casually breaking links...
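The difficulty of eyeballing a 10% drop is easy to check with a quick simulation along the lines Gwern describes. A rough sketch, with all parameters (traffic level, noise scale, autocorrelation) invented for illustration:

```python
import random

random.seed(0)  # deterministic for reproducibility

def simulate_traffic(days=730, base=1000.0, rho=0.8, sigma=60.0,
                     drop_day=365, drop=0.10):
    """Daily visits: a base level with AR(1) (autocorrelated) noise,
    permanently reduced by `drop` fraction from `drop_day` onward."""
    series, noise = [], 0.0
    for day in range(days):
        noise = rho * noise + random.gauss(0, sigma)
        level = base * (1 - drop) if day >= drop_day else base
        series.append(level + noise)
    return series

traffic = simulate_traffic()
pre = sum(traffic[:365]) / 365
post = sum(traffic[365:]) / 365
# The averages differ by roughly the injected 10%, but on a plot the
# step is largely buried in the autocorrelated noise.
```

Plotting a few of these runs next to no-drop runs makes the point vividly: without knowing where the change happened, picking it out by eye is surprisingly hard.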

Is <10% worth caring about? It's certainly not the difference between life and death for almost everyone; no successful business or website is going to shut down because they had ads, or because they broke old URLs. On the other hand, <10% is hardly trivial, people work hard for things that gain a lot less than that, and really, is defining redirects that hard?

Speaking of noticing small effects, Mechwarrior Online was four months into its open beta before anyone noticed that an expensive mech upgrade that was supposed to make your weapons fire 5% faster actually made them fire 5% slower. http://www.howtospotapsychopath.com/2013/02/20/competitively...

This principle seems like it extends far beyond site traffic, especially since something like life satisfaction is much harder to measure.

Yes, it's a kind of slippery slope: as typically set up, changes are biased towards degrading quality. If you run a normal significance test and your null is leaving it unchanged, then you will only ever ratchet downwards in quality: you either leave it unchanged, or you have a possibly erroneous choice which trades off quality for degradation, and your errors will accumulate over many tests in a sorites. I discuss this in the footnote about Schlitz - to stop this, you need a built-in bias towards quality, to neutralize the bias of your procedures, or to explicitly consider the long-term costs of mistakes and to also test quality improvements as well. (Then you will be the same on average, only taking tradeoffs as they genuinely pay for themselves, and potentially increasing quality rather than degrading it.)

I swear this happens every time I visit a forum. Either the one image I need has been purged or every post links to some other dead forum. I've had pretty bad luck with archive.org for these things.

Yes, because the examples you give already have a solid reputation.

You can get away with it when you are "the government .gov", but if you are "small start-up.com" and I want to invest in your risky stocks and all I get is an "Oops page" when I want to know a little more about your dividend policy, I'm gone.

For me it depends on where the link originated. If it's a 404 within your own site, that shows sloppiness that will make me question your ability to run a business. However, if I get a 404 on a link from an external site, I'll assume things have been reorganized and try to find the new location myself (while wondering why it was moved). Changing URIs without setting up forwarding is so common I wouldn't give it much thought.

> Web surfers often get 404 errors or invalid deep links get automatically redirected to the main landing page -- and people do not lose confidence in the site owner.

I do, to a degree. When I follow some old link to a site and realize they've changed their structure and provided no capability to find the original article if it's still available but at a new URI, I lose confidence.

Not in them overall, but in their ability to run a site where it's easy to find what you want and whose usefulness over time isn't constantly undermined. I lose confidence in their ability to do that, which affects how I view their site.

> ... and the link is broken now in 2021. I don't think that broken link shakes the public's confidence in the US government.

Maybe not the Govt itself, but in their ability to provide information through a website usefully and in a way that's easy to understand and retains its value? I'm not sure most people had the confidence to begin with required for them to lose it.

Consumers will accept a lot of abuse before their faith is completely shaken, especially with large companies. There's still nothing cool about it.

I disagree. In your example, it doesn't cause me to lose faith in the US Treasury itself, but I am less confident that they can reliably and correctly run a website.

It is more like, I lose confidence in whatever claims (if any) the links were meant to support and confidence that the linked-to site can be linked to as a source.

I suspect it depends on what the existing reputation is. I recall a discussion here on Hacker News where someone asked why people weren't using Google+ for software blog posts anymore, and the response was that they'd broken all the links.

I'm sure that wouldn't have been an issue if it was super popular and respected, but it was already fading away at that point.

The people who have to navigate the MS documentation are not generally the same people who choose to use MS as a supplier. If you're that big and rich you will have totally separate engagements with different layers of a business. For most suppliers that's not the case, and documentation is a competitive advantage.

I don't know. If I consistently see 404s, I eventually associate that domain with "broken" and start avoiding it. Especially in search scenarios where there's lots of links to choose from.

I've seen some crazy stuff driven by the fear of broken links. One place I worked had all the traffic for their main landing page hitting a redirect on a subsystem, because the subsystem had a URL that ranked higher in Google etc. I worked on the subsystem, and rather than fix things in DNS and the search engines, they preferred to expect us to keep our webservers up at all times to serve the redirect. We were on physical hardware in those days, and while we had redundancy it was all in one location. Made for some fun times.

Well, the claim is that people "generally lose confidence," which I'd interpret as a decrease in confidence, not a total destruction of confidence. Microsoft and the US Treasury have some dead links, sure, but they run big sites. The vast majority of links that you'd encounter through normal browsing lead to reasonable places.

Yes, I agree with the point that I wouldn't lose confidence in the company.

However, it is extremely annoying, and it happened to me just recently with Microsoft when I researched PWAs... I had bookmarked a link to their description of how PWAs are integrated into the Microsoft Store. I could only find it on archive.org...

I agree. It matters a lot more if the content is (re)discoverable.

Ah, the digital naiveté of the nineties. Nowadays, cool URLs get changed on purpose so you have to enter the platform through the front instead of bouncing off your destination.

And they get loaded down with query string fields for analytics. I kill those with https://github.com/ClearURLs/Addon
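The core of such a cleaner is tiny; a hedged sketch (the parameter list is a small sample, nothing like ClearURLs' full rule set):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# A few well-known analytics parameters; real blocklists are far longer.
TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")

def strip_tracking(url):
    """Return `url` with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES)]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_tracking("https://ex.com/a?utm_source=hn&id=5")` keeps only the `id` parameter.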

I wonder if there is a similar extension that leaves in the UTM parameters but fills them with curse words and other sophomoric junk data.

Good idea, but risky if not done right or not adopted by many people. You'd be increasing your specificity and therefore the ability to track you.

Good point. If you’re the only user with utm_origin=boogers&utm_medium=poop, it’ll be trivial to connect you between websites. It’s all automated, so there’s slim to no chance of making a server admin chuckle while checking the logs, unfortunately.

I keep checking to see when this add-on will get approved to Firefox Android's anointed list

If you are using Firefox Nightly, you can now install arbitrary add-ons (though quite a number of them don't work due to UI differences; containers don't work either) after going through some steps: https://blog.mozilla.org/addons/2020/09/29/expanded-extensio...

Good thing I still keep the version pinned at 68!

What's happening with this project? The rules repo hasn't been updated for months even with several PRs

I'm not aware of anyone doing this on purpose. It's bad for bookmarks, bad for SEO, bad for internal linking, and bad for external linking.

URLs change because systems change.

A proposed solution to this: https://jeffhuang.com/designed_to_last/

Characterizing TBL as naive is absurd. He was right. And the URLs you describe are anything but cool.

I'm fairly sure the parent was joking and in full agreement with you. Just so this isn't a point-out-the-joke post: a few years ago I saw an HN post from a bunch of people who used print-to-PDF and pdfgrep as a bookmarking solution. It doesn't solve the original problem, but it does act as a coping strategy for when content goes missing. I've been using it for a good while now and it's been real useful already.

Fantastic way to lose your traffic though.

Someone follows a link, gets a 404 or main page and they're gone.

It's not traffic they are after, but monetization. The traffic that does not monetize only costs money.

Sigh. I remember when people used to put up websites for fun.

Also when example.com/2010/10/10/article-title redirects to the homepage m.example.com

Up there with the Netiquette. I'd love a timeline where this matters, but that ship sailed long, long ago.

Past related threads:

Cool URIs Don't Change (1998) - https://news.ycombinator.com/item?id=23865484 - July 2020 (154 comments)

Cool URIs Don't Change - https://news.ycombinator.com/item?id=21720496 - Dec 2019 (2 comments)

Cool URIs don't change. (1998) - https://news.ycombinator.com/item?id=21151174 - Oct 2019 (1 comment)

Cool URIs don't change (1998) - https://news.ycombinator.com/item?id=11712449 - May 2016 (122 comments)

Cool URIs don't change. - https://news.ycombinator.com/item?id=4154927 - June 2012 (84 comments)

Tim Berners-Lee: Cool URIs don't change (1998) - https://news.ycombinator.com/item?id=2492566 - April 2011 (25 comments)

Cool URIs Don't Change - https://news.ycombinator.com/item?id=1472611 - June 2010 (1 comment)

Cool URIs Don't change - https://news.ycombinator.com/item?id=175199 - April 2008 (9 comments)

Ha, tell that to anyone at *.microsoft.com. I swear every link to an MS doc / MSDN site I have clicked either 404s or goes to a home page

Yesterday I found a link on a MS document that went via a protection.outlook.com redirect, and another link on the same page linked to itself

This is just a trick to hide that the documentation is terrible and useless. If we get a “this content doesn’t exist” page, we think there is an answer to our question, we just can’t find it. We eventually give up, blaming ourselves for not finding it, instead of finding the content and discovering it’s stupid and not useful.

They can't conceive of anyone coming to a page from anywhere other than an internal recently generated page :(

And Oracle as well. Good luck trying to find old software and/or documentation links that don't 404.

Old software? It's difficult enough to find documentation on the current Oracle database. You only get random versions through a search engine, and can neither navigate to a different version nor replace the version in the URL and get a functioning page. You also can't start from the top and try to follow the table of contents, because it's completely different.

> What to leave out

> [..]

> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.(how?)

And the page URL is [...]/URI.html

It works fine without the suffix as well: https://www.w3.org/Provider/Style/URI

Since I switched from ASP to PHP (2008), I have avoided file extensions in page URIs in most cases, instead placing every page in its own folder. This is compatible with every web server without using rewrite rules.

When I switched from PHP to static generators (2017), most URIs continued working without redirects.
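The folder-per-page trick works because essentially every web server resolves /about/ to /about/index.html. A sketch of how a generator might emit pages that way (the function and names here are mine, not from any particular tool):

```python
from pathlib import Path

def emit_page(out_dir, slug, html):
    """Write the page as <slug>/index.html so its public URL is
    the extension-free /<slug>/, served by any web server."""
    page_dir = Path(out_dir) / slug
    page_dir.mkdir(parents=True, exist_ok=True)
    (page_dir / "index.html").write_text(html)
    return f"/{slug}/"  # the URL this page is reachable under
```

Because the URL never encodes the implementation (.asp, .php, .html), switching generators later leaves every link intact.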

Although there is no redirect, the proposed best practice URL also works: https://www.w3.org/Provider/Style/URI

To be fair, it's the OP that chose to use the URL with the extension. However, you could say W3C could have stopped their servers from serving URLs with the file extension if they wanted to follow this.

1. https://www.w3.org/Provider/Style/URI works

2. If https://www.w3.org/Provider/Style/URI.html is ever 404 then you get a very useful 300 page (e.g., try https://www.w3.org/Provider/Style/URI.foobar)

3. However, from (2) you can see the Spanish page is encoded as `.../URI.html.es`, which is bad because `.../URI.es` does not exist

A page URL. I guess this could be considered the canonical example of HN not always linking to the coolest source?

> You may not be using HTML for that page in 20 years time

Yes it might be all Flash.

I'm still not sure why Tag-URI[1] hasn't gained more support:

> The tag algorithm lets people mint — create — identifiers that no one else using the same algorithm could ever mint. It is simple enough to do in your head, and the resulting identifiers can be easy to read, write, and remember.

They would seem to solve at least part of the problem Berners-Lee opined about:

> Now here is one I can sympathize with. I agree entirely. What you need to do is to have the web server look up a persistent URI in an instant and return the file, wherever your current crazy file system has it stored away at the moment. You would like to be able to store the URI in the file as a check…

> You need to be able to change things like ownership, access, archive level security level, and so on, of a document in the URI space without changing the URI.

> Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.

Not a bad idea, if you have a good URI scheme to back it up.

I even wrote a Ruby library for it[2]. (There are others[3][4], but no JavaScript one that I can find. Given that JavaScript is the language that produces the worst URLs and doesn't seem to have a community that cares about anything that happened last week, let alone 20 years ago, that's not a surprise.)

[1] http://taguri.org/

[2] https://github.com/yb66/tag-uri/

[3] https://gitlab.com/KitaitiMakoto/uri-tag/

[4] https://metacpan.org/pod/URI::tag
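Minting a tag URI really is simple enough to do in your head; a minimal sketch following the tag: scheme's shape (no validation of the authority or date is attempted here):

```python
def mint_tag_uri(authority, date, specific):
    """Build a tag: URI of the form tag:<authority>,<date>:<specific>.
    `authority` is a domain or email you controlled on `date`;
    global uniqueness comes from never reusing a specific part."""
    return f"tag:{authority},{date}:{specific}"

# e.g. mint_tag_uri("example.com", "2021", "posts/cool-uris")
```

The identifier never resolves on its own, which is exactly why it can serve as the stable key in the URN-to-filename database TBL describes.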

I recall it was almost axiomatic that the more expensive the content management system, the worse the URLs it produced.

I completely believe in the spirit of TBL's statement, and wish the internet had evolved to follow some of the ideals implicit therein. I recall that people took some pride in the formatting of their HTML, so that "view source" showed a definite structure, as if the creator was crafting something.

Now, for the most part, URLs are opaque and "view source" shows nothing but a bunch of links to impenetrable JavaScript. I actually wonder when "view source" will be removed from the standard right-click menu, as it barely has meaning to the average user any more.

I expect browsers to eventually be split into user and developer editions, with the tooling only available in the developer edition.

Actually I am quite surprised that it hasn't happened yet.

Hasn't that already happened? The user browsers are on phones and tablets; the developer browsers are on laptops and desktops.

Which is why you can still install software you want from any source on laptops (for the time being), but have to go through Apple to get software on your phone.

The end goal is to no longer have any thick device. Engineers will store and manage all of their code in the cloud, then pay a subscription fee to access it.

Rent everything, own nothing.

I bet TBL hates this just as much as the thing the web of documents and open sharing mutated into.

Not wrong, the mobile market is massive.

But lots of users still use desktop browsers for work, especially nowadays.

Mozilla making your point for you: Firefox Developer Edition (https://www.mozilla.org/en-US/firefox/developer/)

Compared to the regular Firefox, there aren't any extra features (other than features that will reach the regular version in a few weeks, since Dev Edition is an alpha/beta of the next release). It's just a branding thing, a few extras by default, and a separate browser profile (might come in handy too).


Yeah, seems a bit pointless to split builds for no reason.

You always want to test on the browser users will ultimately use anyways, even if you have guarantees the code works exactly the same for dev and user editions.

Because it has been a common trend in computing to split consumer and development devices.

For browsers, the opposite has been true. Developer tools that you once installed as extensions (hello Firebug) are now shipped in the browser.

Firefox's current devtools shipped beginning with Firefox 4. It was only Firefox 3.x that shipped without any kind of tooling for web developers. Prior to Firefox 3, the original DOM Inspector was built-in by default (just like the case with Mozilla Suite and SeaMonkey).

I know, just expect it to turn full circle.

It kind of did with mobile / desktop.

View source doesn't show anything that cURL doesn't, and browser inspectors are far more useful. Still, I would hate to see view source disappear.

Inspector has more features, but for me view source with CTRL-F is more performant

Inspector shows what the browser is currently rendering. View source might match that, or it might be something subtly (or not so subtly) different.

Except in Microsoft Edge, where it hangs.

The DOM view, on the other hand, is super useful, as it is effectively a de-obfuscated version of what that JavaScript soup creates.

And you can remove things like popups instead of agreeing to them.

Perhaps 'view source' should link to the authors github, highlighting the repo that can be built to produce the page.

If the repo can't reproduce the contents of the page, simply treat that as an error and don't render the page.

Yes because all code is available via GitHub.

(or other code repo)

Barely any commercial sites have public repos though.

I'd even say barely any sites in general outside of dev blogs + projects.

And even then they always link to the repo.

If you actually want to do this, you could have the browser check for source maps, and refuse to render any pages that did not provide them: https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_...

(Of course most pages are not willing to make their raw source available, so you would see a lot of errors.)

... and a "go fuck yourself" to anybody who publishes pages that simply don't require source maps to begin with? I admit that would at least be in line with the current trend of prioritizing/encouraging the kinds of authors and publishers who engage in (the mess of) what is considered "good" modern Web development.

I mean, I think the whole idea of only showing websites with unminified source available is silly, but I'm willing to think about the idea ;)

I think a source map is a better approach than a GitHub link, that was all I was saying!

It also probably wouldn't be too hard to make some heuristics to figure out whether scripts are unminified and not require a source map. It wouldn't be 100% accurate, and it wouldn't avoid some form of intentional obfuscation that still uses long names for things, but it would probably work pretty well.
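Such a heuristic can be crude and still catch most cases, since minifiers strip newlines; a sketch with guessed thresholds, not tuned values:

```python
def looks_minified(js_source, max_avg_line_len=200, min_lines=3):
    """Guess whether JS source is minified: minifiers collapse code
    onto very few, very long lines, so a high average line length
    (or a near-one-line file) is a strong signal."""
    lines = [l for l in js_source.splitlines() if l.strip()]
    if not lines:
        return False
    avg = sum(len(l) for l in lines) / len(lines)
    return avg > max_avg_line_len or (len(lines) < min_lines and avg > 80)
```

As noted above, deliberate obfuscation that keeps long names and line breaks would slip through, but everyday minifier output would not.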

Refer to <https://www.gnu.org/software/librejs/> (caveat: very broken and currently under-resourced; see <https://lists.gnu.org/mailman/listinfo/bug-librejs>).

> a source map is a better approach than a GitHub link


I do this with http://chriswarbo.net

Most HTML pages are generated from a corresponding Markdown file, in which case a "View Source" link is placed in the footer which goes to that Markdown file in git.

Tell it to Tor. They are destroying every URI created, indexed, known, and used over the last 15 years on Oct 15th: https://blog.torproject.org/v2-deprecation-timeline . Because of this, the indieweb has refused to add Tor onion service domain support for identity.

Onionland people had plenty of warning though. My old v2 Onion bookmarks are all discarded. The new V3 addresses are a good indicator of which .onion operators are serious and want to stay online no matter what.

I consider myself pretty active on tor. I've hosted a number of services for a decade. While I heard about the new Tor v3 services a long time ago I didn't hear about Tor v2 support being completely removed until April 2021. It was quite a shock.
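The two generations are at least easy to tell apart at a glance: a v2 address is 16 base32 characters before .onion, a v3 address is 56. A small classifier sketch:

```python
def onion_version(host):
    """Classify a .onion hostname by address length:
    v2 = 16 base32 chars, v3 = 56; None for anything else."""
    if not host.endswith(".onion"):
        return None
    label = host[: -len(".onion")]
    return {16: "v2", 56: "v3"}.get(len(label))
```

Running old bookmarks through something like this is a quick way to find which ones died with the v2 deprecation.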

At the end of the day, it's simply an unrealistic expectation that URIs don't change. And declaring for 20+ years that this isn't "cool" isn't going to change a thing about it...

The web is dynamic, not static. Sites come and go, pages come and go, content gets moved and updated and reorganized and recombined. And that's a good thing.

If content has moved, you can often use Google to figure out the new location. And if you want history, that's what we fortunately have archive.org for.

For any website owner, it's ultimately just a cost-benefit calculation. Maintaining old URIs winds up introducing a major cost at some point (usually as part of a tech stack migration), and eventually the benefit from that traffic isn't worth it anymore.

There's no purist, moral, or ethical necessity to maintain URIs in perpetuity. But in reality, because website owners want the traffic and don't want to disrupt search results, they maintain them most of the time. That's good enough for me.

It is very sad that this "cost-benefit" calculation ends up ruining the web. Devs too lazy to learn the basics and doing everything with React and a gazillion libs, nobody caring about accessibility, nobody caring about the URL structure. It is very possible to go through 50 CMSes and still keep the URLs intact. It's just that we forgot the users, and UX has given up its place to DX.

I still hope for a new web standards revolution. Zeldmans of the 21st century, where are you?

> it's simply an unrealistic expectation that URIs don't change.

I disagree. If I generate URLs systematically, then when I change the scheme I can easily send a 302 with the new URL.

It’s also not hard to just keep a mapping table with old and new.
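The mapping-table approach really is a few lines; a sketch (the table contents are invented, and I've used 301 rather than 302 since moved content is usually a permanent move):

```python
# Old path -> new path; in a big deployment this would live in a
# database or a generated rewrite map rather than in code.
REDIRECTS = {
    "/2013/rates.asp": "/markets/treasury-rates",
    "/blog.php?id=7": "/posts/7",
}

def resolve(path):
    """Return (status, location): a 301 to the new home if the
    path has moved, otherwise serve the path as-is."""
    new_path = REDIRECTS.get(path)
    return (301, new_path) if new_path else (200, path)
```

Each URL-scheme migration just appends its old-to-new mapping to the table, so links from every era keep resolving.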

I think that it’s lazy programmers who can’t handle it. Or chaotic content creators.

I’ve had many conversations with SharePoint people who frequently change URLs by renaming files and then just expect everyone linking to them to change their links. They seem to design content without ever linking, because links break so much.

It’s the damndest thing as it makes content hard to link to and reuse. People are too young to even care about links and stuff.

Of course if SharePoint search didn’t suck it wouldn’t be as much of an issue.

> I can easily send

Honestly, that's just not true. If you upgrade to a new CMS/system, the way it routes URLs can be completely incompatible with the old format, and it just can't accommodate it.

And if you're dealing with tens of millions of URLs and billions of pageviews per month, you're talking about setting up entire servers and software in front of your CMS dedicated just to handling all the redirect lookups.

And you also easily run into conflicts where there is overlap between the old naming convention and the new one, so a URL still exists but it's to something new, while the old content is elsewhere.

Yes it's possible. But the idea that it's "not hard" is also often very false.

> I think that it’s lazy programmers

No, it's the managers who decided it wasn't worth paying a programmer 2 or 4 weeks to implement, because other programming tasks were more important and they employ a finite number of programmers.

For commercial websites, it's not laziness or "chaos". It's just simple cost-benefit.

This is a recommended read on the subject "Designing a URL structure for BBC programmes" https://smethur.st/posts/176135860.

Clean user friendly urls are often at odds with persistent urls because of the information you can't keep in a url for it to be truly persistent.

It's easy to put a reverse proxy in front of your service to translate old permanent URLs into whatever you're running currently. It's a table of redirects, and perhaps a large one, but it doesn't require a lot of logic to handle.

Jeremy Keith's 11-year bet against cool URLs (made, fittingly, on longbets.org itself) comes due next year -- and it looks like he might lose it: https://longbets.org/601/

Looks like he will lose the bet.

Interestingly though, the original URL was: http://www.longbets.org/601

> A 301 redirect from www.longbets.org/601 to a different URL containing that text would also fulfill those conditions.

Yes. Good thing it's still an "HTML document". One could argue that a single page app loading content via json would not qualify.

We've been thinking about link rot since then, and yet we're still letting it happen without much care. Kinda' sad, but at least we're resting on the shoulders of archive.org.

So much of it is unnecessary. Any stats on link rot in 2021? Last I remember it's something like 10% of all links every year.

This is my battle-proven (tried and true) checklist for what I call targeted pages (pages that are entry pages for users, mostly via organic Google traffic): https://www.dropbox.com/s/zfwd331ehrucgw4/targeted-page-chec...

And these are the URL rules I use. Whenever I make any compromise on them or change the priority, I regret it down the road. Nr 1 and nr 2 are directly taken from the OP article.

    - URL-rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)
    - URL-rule 2: permanent (they do not change, no dependencies to anything)
    - URL-rule 3: manageable (equals measurable, 1 logic per site section, no complicated exceptions, no exceptions)
    - URL-rule 4: easily scalable logic
    - URL-rule 5: short
    - URL-rule 6: with a variation (partial) of the targeted phrase (optional)
URL-rule 1 is more important than rules 2 to 6 combined, URL-rule 2 is more important than rules 3 to 6 combined, and so on. URL-rules 5 and 6 are always a trade-off, and nr 6 is optional. A truly search-optimized URL must fulfill all URL-rules.

Interestingly the National Science Foundation URLs they used as examples of URLs that probably won't work in a few years all still work.

My experience is that most developers don't even check the access logs to see what impact removing/changing an endpoint will have.

Access logs are underrated. They are not some legacy concept; they are supported by all new platforms as well, including various Kubernetes setups.

I opened my website in 2006. Since 2009, I've kept all the links unchanged or redirected through two rebuilds.

It's a lot of work, because I need to customize the URI structure in blog systems and maintain complex rewrite rules.

To ensure the setup is more or less correct, I have a bash script that tests the redirects via curl. https://bitbucket.org/yoursunny/yoursunny-website/src/912f25...
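The same idea can be sketched offline in Python. Here `fetch` stands in for the real curl/HTTP call and is injected so the checking logic can run without a network; all URLs and targets below are hypothetical:

```python
def broken_redirects(expected, fetch):
    """Return the old URLs whose redirect no longer matches expectations.

    `expected` maps old URL -> the target it should permanently redirect to.
    `fetch` is any callable returning (status, location) for a URL; in real
    use it would wrap urllib or shell out to curl.
    """
    broken = []
    for old, target in expected.items():
        status, location = fetch(old)
        if status not in (301, 308) or location != target:
            broken.append(old)
    return broken

# Simulated server responses standing in for real HTTP requests.
responses = {
    "/2009/notes/hello": (301, "/blog/hello"),
    "/2009/notes/bye": (404, None),
}
print(broken_redirects(
    {"/2009/notes/hello": "/blog/hello", "/2009/notes/bye": "/blog/bye"},
    responses.__getitem__,
))  # -> ['/2009/notes/bye']
```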

I've a bunch of small sites whose links end in ".php" even though they no longer run on PHP.

> I've a bunch of small sites whose links end in ".php" even though they no longer run on PHP.

Kudos for maintaining that compatibility, but it seems that this kind of thing is addressed in the linked document:

> File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.

Back then we didn’t have routers

I'm not sure if they still use it, but a site I redid for a company many moons ago in Perl had Apache configured to treat .htm files as CGI scripts, so we didn't have to change the previous, static site's URLs.

The original site ran on IIS and was old enough that MS software at the time still used three letter extensions for backwards compatibility, around 1997-1998 IIRC.

Ironically, this was linked with the .htm extension, but it was nice to discover that the extensionless URI also works on this site: https://www.w3.org/Provider/Style/URI

> There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice.

I am not convinced that some of those reasons aren't theoretical as well. The article in general seems very dismissive of the practical reasons to change URIs, listing only the easily dismissible ones to begin with.

> The solution is forethought - make sure you capture with every document its acceptable distribution, its creation date and ideally its expiry date. Keep this metadata.

What to do with an expired document?

Don't get me wrong, it's good to strive for long lasting URIs and good design of URIs is important, but overall I believe a good archiving solution is a better approach. It keeps the current namespaces cleaner and easier to maintain. (Maintaining changes in URI philosophy and organization over tens to hundreds of years sounds like a maintenance nightmare.)

The archiving solution should have browser support: If a URI can not be resolved as desired, it could notify the user of existing historic versions.

An expired document can be removed, and a "410 Gone" response returned. This is clearer for the user, who can expect not to find the document elsewhere on the website.

> The archiving solution should have browser support: If a URI can not be resolved as desired, it could notify the user of existing historic versions.

If you use this add-on: https://addons.mozilla.org/nl/firefox/addon/wayback-machine_... it will pop up a box when it sees a 404 offering to search the wayback machine for older versions.

Brave does it natively. It's surprising to see how many pages return 404 before redirecting to a live version.

Many APIs use versioning to keep the current namespace clean while still supporting the old version. It goes like /v1/getPosts, /v2/post/get, etc.

This might be an argument in favor of preemptively adding another level of hierarchy to your URLs, so that when the time comes to move the entire site to a new backend, you can just proxy the entire /2019/ hierarchy to a compatibility layer.

But who are we kidding, we live in a world where 12 months of support for a widely used framework is called "LTS". Nobody seems to care whether their domains will even resolve in 10 years.

> Many APIs use versioning to keep the current namespace clean while still supporting the old version. It goes like /v1/getPosts, /v2/post/get, etc.

I think that is an important thing to do (as I agree designing URIs is important!), but I am just not sure it is reasonable, or even desirable, for them to be maintained indefinitely. I am not sure what the right timeframe for deprecation would be either.

URLs should be maintained even if the content is gone. At the very least you can give a useful HTTP return code, a permanent redirect or gone is more useful than a catch-all 404. You're either bridging old links to current links or telling visitors the old content has been removed.

I don't think it would be feasible to maintain a fully functional copy of a dynamically generated page at the old URL for any length of time. That's just a recipe for security nightmare, not to mention the SEO penalty for duplicate content.

301/308 redirects, on the other hand, can be maintained more or less indefinitely as long as you keep around a mapping of old URLs to new URLs. If you need to change your URLs again, you just add more entries to the mapping database. Make your 404 handler look up this database first.

One thing you can't do, though, is repurposing an existing URL. Hence the versioning. :)

> (Maintaining changes in URI philosophy and organization over tens to hundreds of years sounds like a maintenance nightmare.)

I can't see why. You maintain a table of redirects. When you change the URI organization, which would break the URIs, you add the appropriate redirects to prevent that. Then, when you change it again, you just append the new redirects without removing the old ones. If necessary, resolve redirect chains server-side. The table may grow large, but it doesn't seem much more complicated than maintaining redirects across just two generations. Am I missing something?
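A sketch of that append-only table, with chains resolved server-side (the paths and the two "rebuilds" are illustrative):

```python
# Append-only redirect table spanning several URL reorganizations.
# New generations only add entries; old ones are never removed.
REDIRECTS = {
    "/cgi-bin/post.cgi?id=7": "/2010/posts/7",   # first rebuild
    "/2010/posts/7": "/blog/7-cool-uris",        # second rebuild
}

def resolve(path, limit=10):
    """Follow the redirect map until the path stops changing.

    The limit guards against accidental redirect loops in the table.
    """
    for _ in range(limit):
        if path not in REDIRECTS:
            return path
        path = REDIRECTS[path]
    raise RuntimeError("redirect loop for %s" % path)

print(resolve("/cgi-bin/post.cgi?id=7"))  # -> /blog/7-cool-uris
print(resolve("/blog/7-cool-uris"))       # -> /blog/7-cool-uris
```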

Why do webmasters and web developers neglect the title element?

To mitigate link rot I share URLs along with the title and a snippet from the page. A good title can act as an effective search key to find the page at its new URL even if the domain has changed.

Unfortunately, a lot of web pages have useless titles. Often sites use the same title on every page even when pages have subjects that are obvious choices for inclusion in the title (make, model & part # or article title that appears in the body element but not in the title). With the rise of social media, lots of pages now have OpenGraph metadata (e.g. og:title) set but don’t propagate that to the regular page title.
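On the consumer side, og:title can at least be recovered with nothing but the standard library. This sketch prefers og:title over a generic site-wide <title> (the sample page and title strings are made up):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect <title> text and og:title metadata from a page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.og_title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:title":
            self.og_title = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def best_title(html):
    """Prefer og:title when the <title> is missing or generic."""
    p = TitleExtractor()
    p.feed(html)
    return p.og_title or p.title.strip()

page = ('<html><head><title>Acme Corp</title>'
        '<meta property="og:title" content="Widget X200 manual">'
        '</head></html>')
print(best_title(page))  # -> Widget X200 manual
```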

Page titles are used by search engines and by browsers to set window, tab, and bookmark titles.

Again, why do webmasters and web developers neglect the title element?

URLs are the foundation of the internet, and it always makes me sad when navigating to a page results in a 404 instead. On a somewhat related sidenote, this is why I created a note-taking tool that publishes *permanent urls* - because your notes can change but your links shouldn't :)


Cool URIs now return a 200, redirect you to index.html, and then show a 404 page.

Sometimes I think that immutability should have been a built-in part of the web; that way we wouldn't be so reliant on archivers like the Internet Archive. Someone still needs to store it, but it really shouldn't have been foisted onto one organization.

One of the things that excites me about IPFS is that URIs can't change. Resources are immutable, and the URI is just the hash of the content.
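The core idea can be sketched in a few lines (simplified: a real IPFS CID wraps the digest in multihash/multibase framing rather than a bare sha256 hex string):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an identifier from the content alone: same bytes, same
    address, forever; different bytes, different address."""
    return "sha256-" + hashlib.sha256(data).hexdigest()

doc_v1 = b"Cool URIs don't change."
doc_v2 = b"Cool URIs don't change!"

# Any edit, however small, yields a different address, so the old URI
# keeps pointing at the old bytes rather than silently changing.
print(content_address(doc_v1) == content_address(doc_v2))  # -> False
```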


I wrote a small script to go through my past tweets. An awful lot of the links I mentioned in them DO NOT WORK anymore. Such a pity.

I bet you not even 50% of your bookmarks are still working. It is a shame.

tomte, why do you keep resubmitting these old things over and over?

Recent discussion on this one's last submission was not even a year ago

The original 1998 URL to this article was http, not https. It changed.

It did not change. HTTP is still served; if you are getting HTTPS, it is due to HSTS. Regardless, a redirect != a change.

Note that HTTP is also upgraded if the UA requests it with the Upgrade-Insecure-Requests header:

  $ curl -I -H 'Upgrade-Insecure-Requests: 1' 'http://www.w3.org/Provider/Style/URI.html'
  HTTP/1.1 307 Temporary Redirect
  location: https://www.w3.org/Provider/Style/URI.html
  vary: Upgrade-Insecure-Requests

  $ curl -i 'http://www.w3.org/Provider/Style/URI.html'
  HTTP/1.1 200 OK
  date: Thu, 17 Jun 2021 10:18:14 GMT
  last-modified: Mon, 24 Feb 2014 23:09:53 GMT ...

  $ curl -I 'https://www.w3.org/'
  HTTP/2 200 
  strict-transport-security: max-age=15552000; includeSubdomains; preload
  content-security-policy: upgrade-insecure-requests

The original URL still works, that's the key.

Fair point, but I believe the article refers to URIs rather than URLs, so they're still technically correct.

Edit: I've just had a look and URI covers the lot, so my bad.

In the next 5-10 years the https:// will be amp:// when Google completes its process of consuming the web like noface in Spirited Away.
