
Cool URIs don't change (1998) - benjaminjosephw
https://www.w3.org/Provider/Style/URI.html
======
Communitivity
It is confusing to a lot of people, but they aren't functionally
interchangeable.

Basically you have Uniform Resource Locators (URLs), Uniform Resource Names
(URNs), and Uniform Resource Identifiers (URIs). You also have International
Resource Identifiers (IRIs), which are URIs with rules allowing for
international character sets in things like host names.

Every URN and URL is a URI. However, not every URI is a URN, or a URL.

A URN has a specific scheme (the front part of a URI before the :), but it
does not contain instructions on how to access the identified resource. We
humans might automatically map that to an access method in our head (e.g.,
digital object identifier URNs like doi:10.1000/182, which we who have used
DOIs know maps to
[http://dx.doi.org/10.1000/182](http://dx.doi.org/10.1000/182)), but the
instruction isn't in the URN.

A URL is not just an identifier but also an instruction for how to find and
access the identified resource.

For example [http://example.org/foo.html](http://example.org/foo.html) says to
access the web resource /foo.html by using the HTTP protocol over TCP to
connect to the IP address that example.org resolves to, on port 80.

An example of URIs that are not URLs is the MIME content IDs used to identify
body parts within an email (cid scheme), e.g., cid:foo4%25foo1@bar.net.

You can get more information at:
[https://tools.ietf.org/html/rfc2392](https://tools.ietf.org/html/rfc2392)
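The split between "identifier" and "access instructions" can be seen mechanically with Python's standard-library urlparse, which handles any URI scheme whether or not it carries access instructions (the ISBN URN below is just an illustrative example, not from the comment above):

```python
from urllib.parse import urlparse

# Every URI has a scheme; only some schemes (like http) also encode
# an access method, which is what makes those URIs URLs as well.
examples = [
    "http://example.org/foo.html",   # URL: scheme + host + path say how to fetch it
    "urn:isbn:0451450523",           # URN: names a resource, no access instructions
    "cid:foo4%25foo1@bar.net",       # cid URI: identifies a MIME body part
]

for uri in examples:
    parts = urlparse(uri)
    print(f"{parts.scheme:>4} -> netloc={parts.netloc!r} path={parts.path!r}")
```

Note that the URN and cid examples have an empty netloc: there is no host to connect to, only a name.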

~~~
andreasvc
IMHO the distinction between URL and URI is similar to the debate on SI
prefixes for bytes, or whether we should insist on calling Linux GNU/Linux;
i.e., most people just don't care enough so these things will never gain
currency.

~~~
jephir
Yep. The WHATWG recommends just using the term URL instead of URI.

> Standardize on the term URL. URI and IRI are just confusing. In practice a
> single algorithm is used for both so keeping them distinct is not helping
> anyone. URL also easily wins the search result popularity contest.

[https://url.spec.whatwg.org/#goals](https://url.spec.whatwg.org/#goals)

~~~
dingo_bat
Thank you so much! I needed an authoritative source to throw at people when
they try to belittle me for using URL instead of URI. To me, URL just
_sounds_ better.

~~~
tremon
And it is more descriptive as well, in the context of web browser addresses:
why call something an animal, when you can call it a dog?

------
makecheck
A lot of missteps in the early days of web technologies have made stable URLs
impractical, unfortunately.

One problem is that someone decided to include file name extensions. Maybe
this happened naturally because web servers made it so easy to expose entire
directory structures to the web. And yet, this continues to be used for lots
of other things. It is so ridiculous that a ".asp" or ".php" or ".cgi" causes
_every link, everywhere_ to depend on your arbitrary implementation details!

Another problem is that many software stacks just don't put any thought into
what would make a useful URL. Years ago I was very frustrated working
with an enterprise software company that wanted to sell us a bug-tracking
system and they didn’t have simple things like "server.net/123456" to access
bug #123456; instead, the URL was something absolutely heinous that wouldn’t
even fit on a single line (causing wrapping in E-mails and such).

Speaking of E-mail, I have received many E-mails over time that consisted of
like TWELVE steps to instruct people on how to _reach_ a file _on the web_.
The entire _concept_ of having a simple, descriptive and stable URL was
completely lost on these people. It was always: 1. go to home page, 2. click
here, ..., 11. click on annoying “content management system” with non-standard
UI that generates unbookmarkable link, 12. access document. These utterly
broken systems began to proliferate and it rapidly reached the point where
most of the content that mattered (at least inside companies) was not
_available_ in any sane way so deep-linking to URLs became pointless.

~~~
SomeCallMeTim
> A lot of missteps in the early days of web technologies have made stable
> URLs impractical, unfortunately.

Preserving URLs across technology migrations is _always_ possible (assuming
the old documents, or at least a rough equivalent, still exist in the new
system), and usually not even hard to achieve. Rewrites are not rocket
science.

Assuming short-sighted developers put .php on the end of all the links,
migrating to a better system can use rewrite rules that trivially remove any
.php ending from a URL, or that remap old folder layouts to new folder
layouts.

Even if you can't use mod_rewrite because of complexity that it can't handle,
all you need is a server with a hash map (or equivalent) that responds with
301 redirects [1] from the old poorly designed link to the newer, cleaner
link.
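A minimal sketch of that "server with a hash map responding with 301s", using only Python's standard library (the old paths in the mapping are made up for illustration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from old, implementation-leaking paths
# to their newer, cleaner equivalents.
REDIRECTS = {
    "/bugs/view.php": "/bugs",
    "/users/profile.php": "/users",
}

def redirect_for(path):
    """Return the new location for an old path; as a fallback,
    strip a trailing .php, mimicking a simple rewrite rule."""
    if path in REDIRECTS:
        return REDIRECTS[path]
    if path.endswith(".php"):
        return path[: -len(".php")]
    return None

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_for(self.path)
        if target:
            self.send_response(301)  # permanent redirect: tell clients to update
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_error(404)

# To run it: HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

In practice you would generate the mapping from the old system's database during migration, but the server itself really is this small.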

The other problem you mention (no deep linking at all) is its own form of
stupidity in design, but since the deep links don't even exist, that's a
separate issue.

[1]
[https://en.wikipedia.org/wiki/HTTP_301](https://en.wikipedia.org/wiki/HTTP_301)

------
zymhan
" There is nothing about HTTP which makes your URIs unstable. It is your
organization. "

I think this could be applied to more than just how companies manage URLs.

Also, I'm trying to find a post I recently read that talked about how calling
URLs "URI"s is just confusing nowadays since almost everyone still only knows
the term URL, and they're functionally interchangeable.

~~~
kfullert
Was it
[https://news.ycombinator.com/item?id=11673058](https://news.ycombinator.com/item?id=11673058)
(My URL isn't your URL)?

~~~
zymhan
It is indeed!

------
pidg
Another addition to their 'Hall of Flame' might be the British Monarchy. A
couple of weeks ago, they broke _every_ existing URI when they moved from
www.royal.gov.uk to www.royal.uk. Every URL from the old domain gets
redirected to the root of the new site.

[https://www.google.co.uk/search?q=site%3Aroyal.gov.uk](https://www.google.co.uk/search?q=site%3Aroyal.gov.uk)

~~~
_puk
Never fear, the Daily Mail will surely make a call to oust the Monarchy on
this basis..

You'd think ICANN would have a .monarchy TLD by now

mailto:liz@british.monarchy

~~~
coldtea
> _Never fear, the Daily Mail will surely make a call to oust the Monarchy on
> this basis.._

On this basis? The year being 2016, and those being "blue blooded" privileged
rulers is not enough?

~~~
_puk
Apologies, I wasn't trying to imply an opinion either way.

More an opinion that the decision to provide the worst possible UX has a
likelihood of being highlighted by the populist media.

------
chias
Interesting, the link as posted here violates the guidelines. Perhaps you
meant to link to

[https://www.w3.org/Provider/Style/URI](https://www.w3.org/Provider/Style/URI)

;)

~~~
oneeyedpigeon
What's weird is that the URL gives a 'mixed-content' warning in Chrome,
supposedly for the logo. But in the markup, that image is referenced by a
relative URL; I can't figure out why Chrome is trying to load that image via
HTTP...

~~~
iancarroll
Somewhat oddly, the relative path 301s to an HTTP address but is then changed
to HTTPS via HSTS.

------
seagreen
This is a make-work trap for conscientious people.

If it's more efficient for your business/project to change your URIs when
going through a website redesign, go ahead (with the knowledge that you'll lose
some traffic, etc.)

Seriously, there's no reason to feel guilty over this. It's not your fault,
it's the fault of a system that built two UIs into every website (the
website's HTML and the URL bar -- the second of which is supposed to be useful
for browsing and navigation just like the first).

If W3C actually cared about links continuing to work, they would fix it at the
technical level by promoting content-addressable links instead of trying to
fix it at the social level (which will never work anyway, the diligent people
that care about these things will always be just a drop in the bucket).

~~~
oneeyedpigeon
I work for a publisher that produces several articles a day that include
external links for reference. More and more I seem to be coming across cases
where those links are now broken. In the near future, I will start up an
automated script that checks for broken links (and I'm guessing I may have to
get it warning on redirects, since certain bad actors use redirects when they
should be using 410).

When I have some decent results, I'll be ensuring the editorial team is aware
of which sites in particular are prone to breaking links, and which they can
trust. The net effect will be that we will be less likely to drive traffic to
certain domains. Whether enough other people will do this to make any kind of
meaningful difference is unknown, but it's certainly better to be a
trustworthy site that it can't hurt to link to.
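The classification step of such a checker can be sketched as a pure function over HTTP status codes (a sketch only; a real checker also needs timeouts, retries, and politeness delays when fetching):

```python
def classify_status(code):
    """Map an HTTP status code to a link-report category. Redirects get
    their own bucket so the report can warn about them separately, since
    some sites use redirects where a 410 would be more honest."""
    if code == 200:
        return "ok"
    if code in (301, 302, 307, 308):
        return "redirect"  # worth a warning: the target has moved
    if code == 410:
        return "gone"      # the polite way to retire a URL
    return "broken"        # 404 and everything else
```

The fetching side would disable automatic redirect-following so that 3xx responses actually reach this function instead of being silently resolved.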

On a related note, I learnt this week that taken-down YouTube videos are a
PITA. Not only do they give a 200 when requested, they also give zero results
when looked up via the YouTube API. Sure, they can still be treated as a
'broken link' from our end, but it would be nice to be able to differentiate
between a video that was taken down and one that may never have existed in the
first place.

~~~
greglindahl
There's always the Internet Archive Wayback Machine "save page now" feature.
Wikipedia's been pretty enthusiastic about using it to make sure that external
links remain available -- and that the cited version is available.

~~~
colejohnson66
What I want to do is make an extension that sends a request to
web.archive.org/save/{URL} whenever you visit a webpage. To prevent flooding,
I would make it only send a request for a specific URL once a month.

------
alistairjcbrown
I remember Jeremy Keith talking about this at dConstruct conference; he put a
bet on Long Bets that the URI of the bet wouldn't change [0]

[0] - [http://longbets.org/601/](http://longbets.org/601/)

~~~
corford
A bit meta, but can someone tell me if Warren Buffett is on course to win his
bet[0]? It's set to expire next year.

[0] [http://longbets.org/362/](http://longbets.org/362/)

~~~
SyneRyder
There was an episode of Planet Money just a couple of months ago that says
Buffett is indeed on track to win the bet:

[http://www.npr.org/2016/03/10/469897691/armed-with-an-index-fund-warren-buffett-is-on-track-to-win-hedge-fund-bet](http://www.npr.org/2016/03/10/469897691/armed-with-an-index-fund-warren-buffett-is-on-track-to-win-hedge-fund-bet)

------
wtbob
On a related note, URIs shouldn't end in extensions (use content
negotiation!), content should be readable without executing code (no
JavaScript necessary), content should be available in multiple languages (use
content negotiation!), and RESTful interfaces should offer a simple forms-based
interface for testing, &c.
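Server-driven content negotiation on the Accept header can be sketched in a few lines (a deliberately simplified sketch: it ignores q-values and wildcard subtypes, and the filenames are made up):

```python
def negotiate(accept_header, available):
    """Pick the first media type in the client's Accept list that the
    server can produce. Real negotiation also weighs q-values."""
    for wanted in [t.split(";")[0].strip() for t in accept_header.split(",")]:
        if wanted in available:
            return available[wanted]
        if wanted == "*/*":
            # Client accepts anything: serve the server's preferred form.
            return next(iter(available.values()))
    return None

# One extension-free URI, several representations behind it.
representations = {
    "text/html": "article.html",
    "application/json": "article.json",
}
```

With this in place, /article can serve HTML to browsers and JSON to API clients, and no file extension ever leaks into the URI.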

~~~
Tomte
I've settled for "every article has a URL that looks like a folder"
(including trailing slash – that particular debate is pointless), only
resources like pictures look like a file with extensions.

That's easy to achieve with all CMSen, but also trivially done with a static
website (the web server is configured to use index.html or whatever you like).

------
zeveb
I was recently digging through my old blog's archives, and it was appalling
how many URLs from the early 2000s have completely disappeared, despite the
fact that the sites which served them remain — and gratifying when I was able
to reload some fringe resource from 1998 or 2003.

The Web is about webs of durable readable content, not about ephemeral walled-
garden apps.

~~~
_puk
Unfortunately GitHub seems to be pretty guilty of this too.

Lost count of the number of times I've clicked a link in a blog post only to
end up with a GitHub 404; I'm sure it is only going to get worse.

The most annoying thing is that usually, simply going to GitHub search and
putting in the original 404'd repo name turns up the one I want.

I'm sure having the 404 serve up a "did you mean" search result would solve
the issue for the most part.

------
hyperpape
One downside of this: I now feel like I can't create a proper place to keep my
writing or other ideas until I carefully think of a URL scheme that I can
maintain for eternity.

~~~
hwh
Just use random numbers.

This is meant more seriously than it might sound. But in the field of
"persistent identifiers", there's a notion that language changes over time (a
common example being the word "gay"), so introducing meaning into
identification schemes might not be a good idea.

~~~
consto
Alternatively, why not have both? Use the meaningful URL for most links, but
have a randomly generated permalink that will always redirect to the same
article.

~~~
PavlovsCat
Or sequential numbers, if you don't need to hide anything from being
discoverable that way. Or is there another downside to that?

I use "slugs" for category type nodes like "photos" or "blog", but I don't
give individual "things" a slug (never liked the idea of putting the title
into the URL just for SEO). And of course, everything that has a slug always
also has the numerical ID and can be reached both ways.

Now that I think of it, I might add a table keeping track of which nodes each
slug was used for, and in case of a 404 either redirect when there is only one
option, or display all the nodes that once used that slug if there is more
than one. Seems like that would be the next best thing to an _actually_ never
changing URL, right?

~~~
chriswarbo
> Or sequential numbers, if you don't need to hide anything from being
> discoverable that way. Or is there another downside to that?

Sequential numbers either require state, which must be kept in sync (e.g. the
current-highest index, or the set of all URLs used so far which you can take
the "max" of); or else their discovery process is slow (i.e. keep looking up
URLs until you find the highest unavailable one).

Random numbers don't need state; you can either make the space really huge,
and ignore the potential for collisions; or else you can look up the generated
URL to see if it's in use, and only have to perform another check in the
unlikely case that it is.
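The random-number approach sketched above, with the "check and retry" step, fits in a few lines (`used` here is a stand-in for whatever lookup the real system would do against existing URLs):

```python
import secrets

used = set()  # stand-in for "look up the generated URL to see if it's in use"

def random_id(nbytes=8):
    """Draw 64-bit random IDs. Collisions are so unlikely that the retry
    loop almost never runs twice, which is the failure-mode argument:
    even if a check is missed, the odds still protect you."""
    while True:
        candidate = secrets.token_hex(nbytes)  # 16 hex characters
        if candidate not in used:
            used.add(candidate)
            return candidate

ids = [random_id() for _ in range(1000)]
```

Contrast with sequential numbering, where every new ID contends for the same "next" value and a single missed state update guarantees a collision.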

Whilst it may be easy to maintain the state for sequential numbers (if, say,
they already exist as database IDs), the failure mode of random numbers is
better. If something goes wrong with the lookup process for random numbers,
e.g. it might miss a collision, we're still protected by the incredibly low
chance of collisions in the first place. For sequential numbers, we're
_guaranteed_ to hit a collision in this case, i.e. if a glitch prevents the
state from getting updated, the next URL will _definitely_ collide with the
previous URL.

~~~
MichaelBurge
In Postgres, sequences are guaranteed to be monotonically increasing even in
the presence of failed transactions. So retrieving an index using nextval()[2]
will never collide[1].

[1] A malicious pedant could create situations where it wraps around or
collides.

[2] [http://www.postgresql.org/docs/current/static/functions-sequence.html](http://www.postgresql.org/docs/current/static/functions-sequence.html)

~~~
chriswarbo
> In Postgres, sequences are guaranteed to be monotonically increasing even in
> the presence of failed transactions.

Well, I was talking about what happens in case of failure; not the chances of
a failure to begin with. If, hypothetically, Postgres forgot to update a
sequence number, then a collision would be guaranteed, since that's a property
of sequential numbering: all updates are contending for the same "next"
number, and must be managed in some way. Random numbers spread across a large
range, so work even without management.

Even if we ignore the possibility of bugs in Postgres, or cosmic rays flipping
bits in RAM (which may have a similar probability to a random ID scheme
colliding), etc. that's still only true inside the confines of Postgres.

Once data starts working its way through layer after layer of shell scripts,
caches, proxies, etc. there's a lot of scope for things to go awry. Plus,
since we're talking about longevity, there's no guarantee that we'll still be
using Postgres in 20 years' time, or maybe we need to migrate or integrate
with some other system, etc.

I'm reminded of the "one in a billion" chances that used to be claimed for DNA
evidence in court. It is certainly true that the probability of two non-twins
having identical DNA is that low, but that doesn't really mean much; even if
the probability of identical DNA were zero, that doesn't tell us anything
about the collision rate of the sub-set of bases which were used, or the error
rate of the sampling process, or the mislabelling rate of the lab, or the
corruption rate of the evidence handlers, etc. :)

------
brightshiny
There's RFC 4151 to keep in mind: the tag URI scheme, a scheme for URIs that
is independent of URLs (but can incorporate them too). One reason to use it
would be to mark a page as being the same resource even though the domain or
URL had changed.

[http://www.taguri.org/](http://www.taguri.org/)

[https://en.wikipedia.org/wiki/Tag_URI_scheme](https://en.wikipedia.org/wiki/Tag_URI_scheme)

I wrote a library for it in Ruby which is how I know about it.
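The scheme itself is simple enough to construct by hand; a sketch following the RFC 4151 shape `tag:<authority>,<date>:<specific>` (the example values are made up):

```python
def tag_uri(authority, date, specific, fragment=None):
    """Build an RFC 4151 tag URI. The date records who held the
    authority (a domain or email address) at that time, so the
    identifier survives later changes of domain ownership."""
    uri = f"tag:{authority},{date}:{specific}"
    if fragment:
        uri += f"#{fragment}"
    return uri

print(tag_uri("example.org", "2016-05-14", "/blog/cool-uris"))
# tag:example.org,2016-05-14:/blog/cool-uris
```

Since a tag URI names rather than locates, something still has to map it to a current URL, but that mapping can change without invalidating the identifier.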

~~~
masonlee
This is a cool scheme to address the problem of URL impermanence due to the
fact that DNS name ownership/control can change over time. It allows you to
specify a URL plus a date.

I wrote a short blog post about the problem a few years back, and the Tag URI
scheme ended up being one of the best solutions I came across, which is how I
know about it. Some links in that post and comments may be of interest to
people: [https://masonlee.org/2009/08/21/is-the-web-sticky-enough/](https://masonlee.org/2009/08/21/is-the-web-sticky-enough/)

------
tremon
I wonder if the footnote was also written in 1998:

 _Historical note: At the end of the 20th century when this was written,
"cool" was an epithet of approval particularly among young, indicating
trendiness, quality, or appropriateness._

~~~
lstamour
Nope.
[https://web.archive.org/web/19990508205057/http://www.w3.org...](https://web.archive.org/web/19990508205057/http://www.w3.org/Provider/Style/URI.html)

It was added sometime between Nov. 27, 2001 and Dec. 14, 2001:
[https://web.archive.org/web/20011214140114/http://www.w3.org...](https://web.archive.org/web/20011214140114/http://www.w3.org/Provider/Style/URI.html)

For those wondering like I was (sigh): the typos that appeared in the Hall of
Flame footnotes when they were added in early 2000 were mostly fixed sometime
in 2001, with the last typo ("uit") fixed in 2004.

------
thudson
Cool URLs still work in OmniWeb on a NeXT with a decades old bookmarks file:
[https://www.flickr.com/photos/osr/17082625625/lightbox](https://www.flickr.com/photos/osr/17082625625/lightbox)

The Mondo 2000 interview bookmarks, not so much.

------
alpb
The example URL they list from w3.org website
([http://www.w3.org/1998/12/01/chairs](http://www.w3.org/1998/12/01/chairs))
is now broken. A great deal of irony.

~~~
gerry_shaw
It requires authentication, so it's not broken, but it's still a poor example.

------
kerrsclyde
What about case sensitivity?
[https://www.w3.org/provider/style/uri](https://www.w3.org/provider/style/uri)
doesn't work.

~~~
lewiseason
What about it? That's a different URI.

------
prsutherland
Pre-DMCA. If this gets an update, TBL should add legal reasons and law
enforcement as a reason for URI change.

~~~
frenchy
I've always figured an HTTP 410 response would be sufficient for that.

~~~
prsutherland
I'd still consider a 410 as a broken link. The article states

> Pretty much the only good reason for a document to disappear from the Web is
> that the company which owned the domain name went out of business or can no
> longer afford to keep the server running.

------
nommm-nommm
Shouldn't there be some tradeoffs?

Say I sign up with service x with the name y. My URL is www.x.com/users/y.
Years later I delete my account. Someone else signs up with the name y. Now
www.x.com/users/y goes to someone else's resources. The old URL is broken.

The only way to prevent this is either to give the user a URL (that they will
want to share) that is not meaningful, or to disallow anyone from signing up
with a name that was ever in use, and names are a very limited resource.

Neither seems ideal. I do agree in principle that URLs shouldn't change,
though.

Hotmail actually has this problem, or at least they used to. They delete
accounts that are inactive for a long time and someone else can sign up with
that name. The new person can get email addressed to the previous owner.

~~~
krapp
> Now www.x.com/users/y goes to someone else's resources. The old URL is
> broken.

The URL isn't broken if it just points to a different resource.

~~~
oneeyedpigeon
I'd rather it broke. If someone linked to the old URL, describing it as "the
profile page of an international child murderer", and I later inherited that
URL by signing up for the service, I wouldn't be too pleased.

The "y" example is confusingly simplistic; the site already has a "can't use
duplicate usernames" rule, so 404'ing the original URL isn't 'worse' for
anyone than if the original user didn't delete their account.

------
kazinator
_" Except insolvency, nothing prevents the domain name owner from keeping the
name."_

Ah, the halcyon days of the Internet's adolescence.

This was before fiascos like Mike Rowe, of Canada, having his mikerowsoft.com
taken away.

~~~
andreasvc
I just learned of this case. I would say he didn't have his domain name taken
away, but voluntarily gave it up when he chose to settle (for an Xbox...).
I've always found it a disappointing aspect of lawsuits that they end up in
settlements so often, this would have formed an interesting precedent since he
might have had good chances of winning the case (real name, no intention of
deception). Shouldn't there be some kind of rule that prevents lawsuits from
being settled when it's in the public interest or in the interest of justice?

~~~
kazinator
So he just wasn't "cool", and therefore, so weren't his URI's. :)

Working definition of "cool URI":

* All URI's currently resolving are "tentatively cool".

* Any URI that disappears or changes at any time isn't "cool" --- and never was: the "tentatively cool" designation was mistaken all along.

------
nathancahill
At least one of their example URLs on the page still points (eventually) to
the same content: [http://www.nsf.gov/cgi-bin/getpub?nsf9814](http://www.nsf.gov/cgi-bin/getpub?nsf9814)

Although they seem to have not learned anything, and are now using .jsp
instead of .pl.

------
miseg
It's tough!

I have a custom PHP app that includes marketing pages.

I'd like to crowbar Wordpress into the server to serve the marketing pages
instead, to make it easier to change text over time.

A .htaccess set of redirect rules may indeed work, but it's hard work to keep
all URLs working.

~~~
nkuttler
> It's tough!

It really depends on which tools you use. Yes, wp has a horrific URL routing
mechanism, and on top of that can randomly redirect 404 pages by guessing the
target.

------
avar
I wonder if TBL would have written the same article 10 years later when search
engines had gotten much better.

It's still an inconvenience when a URL moves, but before the likes of Google
that used to be a huge inconvenience and it would often take tens of minutes
to track down the new location, now it's on the order of tens of seconds at
most.

~~~
ygra
That depends very much on the content. If you only have a URL and no hint of
the content or title of that page, then finding it may prove difficult.

~~~
TeMPOraL
See for example any kind of documentation for any piece of hardware that's
older than 5 years. The companies change their support pages structure so
often that it's incredibly hard to track anything down.

The cynic in me thinks they're doing it on purpose.

~~~
vidarh
Those are particularly nasty because model numbers are often so arbitrary that
a few years down the line searching for an old model number often becomes a
near impossible quest of wading through stuff related to newer models with
nearly the same model number.

------
prashnts
Oh well, Hall of fame #1 leads to a 404 for both links. So, I propose that it
should be fair to call it (Hall of fame #1)².

~~~
metafunctor
It's the Hall of _flame_. They are examples of URIs that were changed for no
good reason.

------
nxzero
The idea that something cool would stay cool forever sounds like an oxymoron.

For example, Yahoo.com has remained the URI for Yahoo's homepage and will
continue to be until it's a 404 page. Yahoo.com is not cool.

~~~
rimantas
No, the idea is that URLs that don't change are cool.

~~~
nxzero
>> "No, [repeat idea]."

Is not a meaningful response to the claims I presented.

At the point where I'm able to swap one resource with another, and the only
reason to believe that a given resource is in fact the resource I referenced
is that the unique identifiers are the same, the identifier is in fact useless
as an identifier.

In fact, I would claim that the whole idea of URIs as currently used is
malformed, since it fails to let a user verify that the resource referenced is
in fact the resource received.

