Cool URIs Don't Change (1998) (w3.org)
182 points by bpierre on Dec 4, 2021 | 72 comments



Microsoft is probably one of the worst offenders, especially in the past few years. It seems like they're actively destroying documentation and making it hard to find important information, so much that I often use archive.org instead.


Apple may be worse. They move documentation to the Documentation Archive and don't replace it. The Archive is a giant mass of "outdated", no longer updated documents, each assigned to 1 category. The Archive only has a title search now; full text search broke years ago.

All documentation on Help Books was archived, for instance. It's been 7 years since they've seen an update and they now contain inaccuracies – but there are no other official guides. Check out that UI: https://developer.apple.com/library/archive/documentation/Ca...

This is a technology that is still used. Nearly all of Apple's own apps have Help Books, including new ones like Shortcuts. Yet they have absolutely no official documentation on using that technology.


> Check out that UI

Off-topic, but damn - tone down the candy a bit and it looks much better and cleaner than what's there today.


That page looks great; I don't know what you're talking about. Are you bothered by two very plain gradients? I don't see any candy at all.


I understand it as “a side note: the UI in these screenshots looks very high-contrast and saturated, but still better than any modern [flat blob] UI”.


I suspect the commenter is referring to the same thing I was — the Aqua screenshot on the guide.


I don't know what it is that makes Microsoft so inelegant. It's not only what you've said; their APIs and programming environment in general are ugly and (I presume) unwieldy. Their apps (I just switched to Excel / OneNote / etc. from Google) have bugs that don't exist in competitors' products. The other day I couldn't use OneNote because my Internet went down (?!). Same with Excel: it doesn't reload immediately upon reconnect the way Google Sheets did.

I don't get Microsoft. They're huge. They hire a lot of people. Their products are kludges.


Backwards compatibility is a major reason why their APIs are so ugly. I always assumed that was a core company value. Ironic that they turn around and break links to their own docs.



There are over a thousand redirects in an Apache config file for a company I contracted with. The website was 20 years old when I worked there; it is now 26, and AFAIK they still stick to this principle. And it's still a creaky old LAMP stack. It can be done, but only if this equation holds:

  URL indexing discipline > number of site URLs
(There was no CMS; every page was hand-written PHP. And to be frank, maintenance was FAR simpler than with the SPA frameworks I work with today.)
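
For anyone curious about the mechanics: such a list is usually just a pile of `Redirect permanent` lines in the Apache config or an .htaccess file. A minimal sketch of one way to keep it manageable, assuming a hypothetical hand-maintained CSV of old-to-new paths (the file name and columns are made up) and a small Python script that generates the directives:

    import csv

    # redirects.csv: hypothetical hand-maintained mapping, one "old,new" pair per line,
    # e.g.  /products/widget.php,/catalog/widget
    with open("redirects.csv", newline="") as f:
        for old_path, new_path in csv.reader(f):
            # Emit one Apache mod_alias directive per legacy URL; paste the output
            # into the vhost config (or .htaccess) and reload Apache.
            print(f"Redirect permanent {old_path} {new_path}")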


So what happened to that URN discussion? It has been 20 years. Have there been any results I can actually use on the Web today? I am aware that BitTorrent, Freenet and IPFS use hash-based URIs, though none of them are really part of the actual Web. There is also RFC 6920, but I don't think I have ever seen that one in the wild.

Hashes aside, linking to a book by its ISBN doesn't seem to exist either as far as I am aware, at least not without using Wikipedia's or books.google.com's services.


Twenty years on, and I can still link to any item at Amazon as long as I have its ASIN, using the template:

    https://www.amazon.com/exec/obidos/ASIN/<asin id>
Say what you will about Amazon (and Jeff Bezos), but I don't think they've broken a URL to any product of theirs ever.


Not broken perhaps, but I regularly click a link to a product and get a page about a totally different product.


My understanding is that product pages aren’t immutable, so sellers sometimes change a page’s content instead of making a new page, in order to keep the SEO the original page has accumulated.

Having a public edit history plus permalinks to specific immutable revisions of product pages would be nice, but I can see how they’re not incentivized to add it: they don’t want people demanding an older price, or a feature that got edited out later on.


IEEE Xplore at least uses DOIs for research papers. Don't know if anyone else does, though.


Everyone uses DOIs for research papers, and https://doi.org/<DOI> will take you there. In fact, I think the URI form is now the preferred way of printing DOIs.


Cool rules of thumb don't run contrary to human behaviour and/or rules of nature.

If what you want is a library and a persistent namespace, you'll need to create institutions which enforce those. Collective behaviour on its own won't deliver, and chastisement won't help.

(I fought this fight for a few decades. I was wrong. I admit it.)


People can know what good behaviour is and still not do good; that doesn't mean it isn't helpful to disseminate (widely agreed upon!) ideas about what is good. The point is to give the people who want to do good the information they need in order to do it.

It's all just the Golden Rule in the end; but the Golden Rule needs an accompaniment of knowledge about what struggles people tend to encounter in the world—what invisible problems you might be introducing for others, that you won't notice because they haven't happened to you yet.

"Clicking on links to stuff you needed only to find them broken" is one such struggle; and so "not breaking your own URLs, such that, under the veil of ignorance, you might encounter fewer broken links in the world" is one such corollary to the Golden Rule.


In this case ... it's all but certainly a losing battle.

Keep in mind that when this was written, the Web had been in general release for about 7 years. The rant itself was a response to the emergent phenomenon that URIs were not static and unchanging. The Web as a whole was a small fraction of its present size --- the online population was (roughly) 100x smaller, and it looks as if the number of Internet domains has grown by about the same (1.3 million ~1997 vs. > 140 million in 2019Q3, growing by about 1.5 million per year). The total number of websites in 2021 depends on what and how you count, but is around 200 million active and 1.7 billion total.

https://www.nic.funet.fi/index/FUNET/history/internet/en/kas...

https://makeawebsitehub.com/how-many-domains-are-there/

https://websitesetup.org/news/how-many-websites-are-there/

And we've got thirty years of experience telling us that the mean life of a URL is on the order of months, not decades.

If your goal is stable and preserved URLs and references, you're gonna need another plan, 'coz this one? It ain't workin', sunshine.

What's good, in this case, is to provide a mechanism for archival, preferably multiple, and a means of searching that archive to find specific content of interest.


Are losing battles still worth fighting?

Personally I believe yes, because there are still those that benefit in the interim. Compare that to not bothering to fight at all in the first place.


There's a good recent School of Life video on rearranging deck chairs on the Titanic (Alain defends the practice).

There's also performing maintenance to defer the inevitable. I'm generally in favour of that.

But if you're getting exercised and emotional over something that simply and repeatedly proves not to work ... and there's some specific goal for that behaviour ... which can be more readily achieved by other means ... then I'd strongly suggest bailing on that fight.

TBL got some things right about the Web. He had a specific purpose in mind. The Web has (IMO very much for the worse) moved beyond that.

As I see it:

- The goal behind the advice is to create a long-term useful addressable archive of online content.

- The method advised ... has failed in this. Spectacularly.

- Other methods exist. They are being used, and work reasonably well.

- At some point you cut your losses.

That said, I'm cutting mine in this thread. Cheers.


Even though you cut your losses, I wanted to thank you for this comment. I sincerely appreciate the reading/watching material.


For the vast majority of cases this is quite a simple problem. Most don't care, though.

A rewrite of a site does not rewrite all the content. The database might have been migrated, but all the information is still there, and the conversion process has everything it needs to do the final step: preserving one ID, or one string, that can trivially map an old URL to a new one containing the same content.

Sure, some intermediate pages might get lost, but those aren't particularly valuable anyway, and not something one usually links to directly. Don't let perfect get in the way of good enough.


Sites aren't merely rewritten; they're abandoned, their hosts/maintainers die, they're hijacked, bought and sold, or they vanish in datacentre fires and other disasters. Entire TLDs may disappear or be transferred.

The foundations, in a word, are profoundly unstable. The entire edifice has little robustness.

If you want preservation, find a better mechanism. URIs ain't it.


Sure, but that is not really what the article is about.

And rewrites are the most common of those, as well as the easiest to actually have control over, yet most can't even be bothered with that.


Collective behavior can work if it’s incentivized.


Not where alternative incentives are stronger.

Preservation for infinity is competing with current imperatives. The future virtually always loses that fight.


June 17, 2021, 309 points, 140 comments https://news.ycombinator.com/item?id=27537840

July 17, 2020, 387 points, 156 comments https://news.ycombinator.com/item?id=23865484

May 17, 2016, 297 points, 122 comments https://news.ycombinator.com/item?id=11712449

June 25, 2012, 187 points, 84 comments https://news.ycombinator.com/item?id=4154927

April 28, 2011, 115 points, 26 comments https://news.ycombinator.com/item?id=2492566

April 28, 2008, 33 points, 9 comments https://news.ycombinator.com/item?id=175199

(and a few more that didn't take off)


I know it seems to be part of HN culture to make these lists, but I'm not sure why. There's a "past" link with every story that provides a comprehensive search for anyone interested in whatever past discussions there were :-/


Immediacy and curation have value.

Note that dang will post these as well. He's got an automated tool to generate the lists, which ... would be nice to share if it's shareable.

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


I love that people think that @dangbot is a person. (Yes, of course there's a person behind the bot...)


If he's a bot, he's a good email conversationalist.

(I email HN fairly regularly, mostly brief suggestions/fixes/issues on posts or threads, occasionally longer.)

A bit on dang's link tool here: https://news.ycombinator.com/item?id=28436784 (that links to further discussion). This describes the tool: https://news.ycombinator.com/item?id=26158300

And on the why:

https://news.ycombinator.com/item?id=22638065

https://news.ycombinator.com/item?id=28250683


    After the creation date, putting any information in the name is asking for trouble one way or another.
Clearly these suggestions predate SEO.


and postdate :-)


That URL changed; it used to start with `http:`, now it starts with `https:` -- not cool!


The HTTP URL still works fine; it sends you to the right place.


Not exactly, though: it only redirects you to the HTTPS version if the server was set up that way. Otherwise it will show a broken page.


But the entire point of the rule is that you should set up your sites so that old URLs continue to work.


I can't follow – does it or doesn't it?


If you can't follow, try running `curl` with the `-L` option.
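
The same check in Python, for anyone who prefers it to `curl` (a sketch using the `requests` library, assuming the article's w3.org address; `response.history` holds the redirect hops):

    import requests

    # requests follows redirects by default; the intermediate hops end up in response.history.
    response = requests.get("http://www.w3.org/Provider/Style/URI")

    for hop in response.history:
        print(hop.status_code, hop.url)        # e.g. a 301 from the http:// address
    print(response.status_code, response.url)  # final 200 at the https:// URL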


Thanks. A chain of proper redirects ending in a 200 constitutes a cool URI for me. So what's the commenter's problem? Can anybody elaborate? @tingletech? @laristine?


This is a big problem for me. I cite sources on my website and people frequently use them, but the German government seems hell-bent on rotating their URL scheme at least once a year for no reason. URLs to pages that still exist keep changing. I struggle to refer to anything on their websites.


Any favorite strategies for achieving this in practice, e.g. across site infrastructure migrations? (changing CMS, static site generators, etc)

Personally about the only thing that has worked for me has been UUID/SHA/random ID links (awful for humans, but it's relatively easy to migrate a database) or hand-maintaining a list of all pages hosted, and hand-checking them on changes. Neither of which is a Good Solution™ imo: one's human-unfriendly, and one's impossible to scale, has a high failure rate, and rarely survives migrating between humans.


On the personal level: For every markdown article/README I write, I pipe it through ArchiveBox / Archive.org and put (mirror) links next to the originals in case they go down.

At the organizational level: my company has a dedicated “old urls” section in our urls.py routes files where we redirect old URLs to their new locations. Then on critical projects we also have a unit test that appends all URLs ever used to a tracked file in CI and checks that they still resolve or redirect on new deployments. Any 404 for a legacy URL is considered a release-blocking bug.
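
A minimal sketch of what such an "old urls" section might look like in Django (the specific paths here are hypothetical):

    # urls.py -- dedicated "old urls" section, sketched
    from django.urls import path
    from django.views.generic.base import RedirectView

    legacy_urlpatterns = [
        # Permanent (301) redirects from retired URLs to their new homes.
        path("docs/getting-started.html",
             RedirectView.as_view(url="/docs/quickstart/", permanent=True)),
        path("blog/2015/03/old-post/",
             RedirectView.as_view(url="/posts/old-post/", permanent=True)),
    ]

    urlpatterns = [
        # ... current routes ...
    ] + legacy_urlpatterns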


Yer darn right they don't change and this is one cool URL because I remember reading this years ago in exactly the same place.

Here is a relevant Long Bet that I think about often (only has one year left to go!) https://longbets.org/601/ "The original URL for this prediction (www.longbets.org/601) will no longer be available in eleven years."


The reason this advice hasn't been taken in the last 23 years is that, among all the questions and answers on this page, one question is missing: "Doing the things you advised us to do in the other questions costs $X. (Or costs time, and our time is worth $X.) Will you be paying us $X?"


Them: There seems to have been a misunderstanding. We thought it was clear that we'll be equal partners in this startup. You get 3% equity which is MORE than generous given that this is my life's dream I'm sharing with you (here's a 5yr NDA btw restricting you from working anywhere else in the same industry for that duration of time). So go ahead and start asap. We have all the ideas and you do all the coding, for free, this is called skin in the game. Then after the product works we'll all make lots of money! That's why none of us take any salary. We all work. Us by providing ideas and you by coding hard and giving 110%. Also please sign here stating that you are not an employee but a contractor, even though we'll call you an employee and treat you as such.


Kudos to WIRED, which keeps redirects around for almost all of its 10-15-year-old articles.

IME, in general, URIs for well-known sites (likely to survive) are more likely to keep working. Lately I've noticed many problems with sites that changed all spaces to underscores.


URIs are hierarchical, and hierarchies are notoriously difficult to get right. Even if you get them right, the definition of "right" often changes over time. In my experience, things have to be flat to be static. IDs seem to work pretty well.


Nowadays it feels like many websites simply don’t have URI schemes.

Relatedly, why does Firefox reload the page when duplicating a tab? If the page is a mediocre single-page app, losing state essentially means being sent to the front page.


Accompanied very well by "Building for Users: Designing URLs"[1] in Aaron Swartz's unfinished book.

[1] https://www.morganclaypool.com/doi/pdfplus/10.2200/S00481ED1...


And then DBPedia switched to https://


Except they changed the name from URL to URI!


> Do you really feel that the old URIs cannot be kept running? If so, you chose them very badly. Think of your new ones so that you will be able to keep them running after the next redesign.

"Never do anything until you're willing to commit to it forever" is not a philosophy I'm willing to embrace for my own stuff, thanks. Bizarre how blithely people toss this out there. Follow the logic further: don't rent a domain name until you have enough money in a trust to pay for its renewals in perpetuity!

> Think of the URI space as an abstract space, perfectly organized. Then, make a mapping onto whatever reality you actually use to implement it. Then, tell your server. You can even write bits of your server to make it just right.

Oh, well if it's capable of implementing something abstract, I'm sure that means there will never be any problems. (See: the history of taxonomy and library science)


Going a little in the opposite direction is the best compromise: keep the information that was available via the old url accessible via the old url in perpetuity (or for the duration of the website). Even a redirect is better than destroying URLs that have been linked to elsewhere for years.


Right -- so now I can never use that URL for something else!


That was my thought too. It's nice to try making an MVP without any nasty technical debt, but only if it's a hobby and you can pay your bills without ever needing to launch.


You exaggerate by far: "after the next redesign" becomes "forever" in your telling. And if it comes to that, why not just risk appearing uncool?


IPFS gives you immutable links


Until the IPFS devs decide that IPFS protocol v1 might become insecure in the medium term and create a new, secure, better, incompatible IPFS protocol v2, and every link, every search index, every community ever suddenly disappears. Don't laugh: it happened to Tor two months ago. Onion service bandwidth dropped by about 2/3 over November as distros and people upgraded to software supporting only the incompatible Tor v3 addresses.

https://metrics.torproject.org/hidserv-rend-relayed-cells.pn...

The weakness is always the people.


> The onion service bandwidth has been reduced by about 2/3 over November

Back to 2019 levels of bandwidth? I feel like I may be misreading that graph, but I'm more curious about what bandwidth suddenly spiked the last two years more so than the drop back down.

At any rate, I always understood the main point of Tor was providing an overlay network for accessing the internet and maintaining secure anonymity as much as possible, with its own internal network being more of a happy side effect.

I don't think IPFS would be as quick to kill compatibility with a vulnerable hashing algo compared to Tor since they're not aiming for security and anonymity as primary goals.


cool!


Except now you can't update the contents of a page even one tiny bit without also changing its URL, which is equally useless.

While I concede that the ability to retrieve the previous version of a page by visiting the old URL (provided anybody actually still has that old content) might come in handy sometimes, I posit that in the majority of cases people will want to visit the current version of a page by default. Even more so, I as the author of my homepage will want the internal page navigation to always point to the latest version of each page, too.

So then you need an additional translation layer for transforming an "always give me the latest version of this resource"-style link into a "this particular version of this resource" IPFS link (I gather IPNS is supposed to fill that role?), which will then suffer from the same problem as URLs do today.
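
To make the revision-specificity concrete: a content address is derived from a hash of the bytes, so any edit yields a new address. A toy illustration using a plain SHA-256 rather than IPFS's actual CID format:

    import hashlib

    def toy_address(content: str) -> str:
        # Not a real IPFS CID -- just a hash, to show the address is a pure
        # function of the content.
        return hashlib.sha256(content.encode()).hexdigest()[:16]

    print(toy_address("Hello, world!"))  # one address
    print(toy_address("Hello, world?"))  # one character changed: a different address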


A content-derived URI is a necessity for some things (indisputable facts, scientific papers, opinions at a moment in time, etc.) but foolish for others. Think of a website displaying the current time, or anything inherently mutable.

But having unchanged documents move to new locations on the same domain, with a 404 instead of a redirect, is just utter, unforgivable failure. Silently deleted documents are also an uncool nuisance.

Both happen a lot. That's what comes to my mind when I read the initial quote.


> the internal page navigation to always point to the latest version of each page,

Let the webserver redirect the version-less URI to the most recent one. Problem solved. Remember, redirects are valid, logical URIs.
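
A minimal sketch of that idea, assuming a small Flask app and a hand-maintained table from stable paths to the latest immutable version (all names here are hypothetical):

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    # Stable, human-friendly path -> hash of the current version of that page.
    LATEST = {
        "about": "QmExampleHashOfCurrentAboutPage",
    }

    @app.route("/latest/<slug>")
    def latest(slug):
        cid = LATEST.get(slug)
        if cid is None:
            abort(404)
        # The stable URL never changes; only this lookup table does.
        return redirect(f"/ipfs/{cid}", code=302)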


Which still requires some additional configuration effort compared to today, though.


lol, you mean everything is more work than just letting things crumble. Fine then, but not cool.


Me updating page contents happens (relatively) often. Me re-working my whole page structure which would break existing conventional URLs happens very rarely.

Because conventional URLs don't care about updates to a page's contents, this means the common case of only updating pages requires no additional configuration. I only need to invest additional time setting up redirects on the rare occasion that I actually do re-arrange file names etc.

IPFS URLs, on the other hand, are revision-specific, so they break (or rather, go out of date and eventually break, once nobody has that version cached or pinned anywhere any more) as soon as I fix even one tiny little typo, so I need to set up some sort of URL mapping service right from the start.


Get with it, Webmasters!


I feel that, at this point, reminiscing about a time when the web was actually designed to be usable isn’t really productive.

Some of the largest companies on the planet are actively opposed to this concept. If you care about this kind of thing champion it from within your own organization.


A lot of things can be said about the impact that the largest companies have on the web, but on the specific discussion about not breaking URIs, I think they are generally good at keeping, or redirecting, links to content for as long as the content is up.


They just create walled gardens instead.

Pinterest is a great example of how organizations put business interests ahead of building an accessible web.



