
Cool URIs don't change. - diwank
http://www.w3.org/Provider/Style/URI.html
======
makecheck
One thing that's always bugged me is file extensions...I hate having to type
".php" or ".cgi" or ".asp" or whatever else just happens to reflect the
server's implementation at the time. If they switch from Microsoft to LAMP
then their URLs will probably all break needlessly.

It can be much worse though...exposing machine names, unnecessary complexity
and parameters, all changing from year to year.

A server doesn't _have_ to puke its implementation details everywhere. I can't
even count all the "enterprise" apps that make that mistake (e.g. a helpdesk
system that gives me
"serverNameThatWillChangeNextYear.domain.net/some/unnecessarily/convoluted/path.unnecessaryExtension?whatthehellisallthis&garbage1=a&garbage2=b&finallyRelevantBugNumber"
instead of a stable URL like "company.net/bugs/bug123456"). And just try
E-mailing a complex URL to somebody (it wraps, and the recipient wastes time
awkwardly piecing it back together).

Honestly, it's as if most web developers don't understand how powerful URLs
can be. If you make URLs short and stable and use them to help look up stuff,
they can be very nice. Instead in the past I've seen people E-mailing 9-step
instructions on _how to find something_ because the damn URL is unreliable.

~~~
porges
On a related tangent, can we please have an operating system that eschews
extensions and instead stores the MIME type for each file?
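
As a toy model of that idea (all names here are hypothetical), a filesystem could carry the declared type as metadata, so the file name never matters:

```python
# Toy model of a filesystem that stores a MIME type per file
# instead of inferring it from the file name.

class MimeAwareFS:
    def __init__(self):
        self._files = {}  # name -> (content_bytes, mime_type)

    def write(self, name, content, mime_type):
        """Store content together with its declared MIME type."""
        self._files[name] = (content, mime_type)

    def mime_type(self, name):
        """Return the stored type; the file name plays no part."""
        return self._files[name][1]

fs = MimeAwareFS()
# An EPUB is physically a zip, but we record what it *means*:
fs.write("somebook.epub", b"PK\x03\x04...", "application/epub+zip")
print(fs.mime_type("somebook.epub"))  # -> application/epub+zip
```

With the type stored alongside the data, renaming the file to .zip would no longer change which application handles it.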

~~~
jamesbritt
Ugh. I have that in the Konqueror file browser. It believes that my .epub
files are zip archives, so the default handler is not an epub reader.

~~~
dredmorbius
What's file(1) say about those files?

If it's bad magic, update your distro. If it's a Konqueror error, file a bug.
I'd be surprised if upstream hasn't addressed this (quick DDG/Google doesn't
turn up any similar complaints).

~~~
jamesbritt
But they _are_ zip files. (Though `file somebook.epub` tells me "data". Sigh.
Desktop and OS seem to be on different pages.)

Just like *.css and *.log are text files, but I may not want the same default
file handler.

What I like is having *.epub open in, say, calibre, but if I append a .zip
extension then it will open in xarchive.

Anyways, I open most things from the command line and wrote my own version of
'open' so I get what I expect 99% of the time. :)
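
A minimal sketch of such an extension-dispatching 'open' wrapper (handler names are illustrative, not anyone's actual configuration):

```python
# Minimal per-extension "open" dispatcher; the handler commands
# are illustrative, not anyone's actual setup.
import os
import subprocess

HANDLERS = {
    ".epub": "calibre",
    ".zip": "xarchive",
    ".log": "less",
}
DEFAULT_HANDLER = "xdg-open"

def choose_handler(path):
    """Pick a handler by extension, falling back to the desktop default."""
    ext = os.path.splitext(path)[1].lower()
    return HANDLERS.get(ext, DEFAULT_HANDLER)

def my_open(path):
    """Launch the chosen handler on the file."""
    subprocess.Popen([choose_handler(path), path])

print(choose_handler("somebook.epub"))      # -> calibre
print(choose_handler("somebook.epub.zip"))  # -> xarchive
```

Note the appended-.zip trick from the comment falls out for free: `splitext` only looks at the last extension.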

I'm using KDE3 so I'm not expecting any upstream fixes for this in my
lifetime. I can live with crafting my own solutions.

What might work best is if mime types were used by default but forcing
behavior for specific extensions was much easier. Get the best of both worlds
(which I can mostly do in KDE3 Konqueror but it's tedious.)

~~~
dredmorbius
File(1) on my Debian wheezy says they're epubs. File has dealt with nested
formats for decades. It examines up to the first 1024 bytes of the file IIRC.

Other examples: tar.gz, WAR files, most ODF formats.
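
The nesting is visible in EPUB itself: the OCF spec requires the first zip entry to be an uncompressed file named "mimetype" containing "application/epub+zip", which is what magic-based sniffers can key on. A minimal sniffer sketch:

```python
# Sketch of the kind of container sniffing file(1) does: an EPUB is
# a zip whose first entry is an uncompressed file named "mimetype"
# holding "application/epub+zip" (per the EPUB OCF spec).
import io
import zipfile

def sniff(data):
    """Classify a byte string as epub, plain zip, or unknown."""
    if not data.startswith(b"PK\x03\x04"):  # zip local-file magic
        return "unknown"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        if names and names[0] == "mimetype":
            return zf.read("mimetype").decode("ascii", "replace").strip()
    return "application/zip"

# Build a minimal epub-shaped zip in memory to try it out:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # writestr stores uncompressed by default
    zf.writestr("mimetype", "application/epub+zip")
print(sniff(buf.getvalue()))  # -> application/epub+zip
```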

Edit: smartphone typos fixed.

------
mef
Not only has the URL to that page not changed in 13 years, its content hasn't
either.
[http://web.archive.org/web/19990508205057/http://www.w3.org/...](http://web.archive.org/web/19990508205057/http://www.w3.org/Provider/Style/URI.html)

~~~
technel
I had a number of revelations through rereading this classic article:

First, holy crap, I hadn't been to <http://www.w3.org/> in a long time, and it
looks like they've actually made it to the 21st century!

Second, perhaps cool URIs don't change, but it seems like
<http://www.w3.org/Provider/Style/URI.html> is kind of an unfortunate URI.
What's "Provider"? Why are "Provider" and "Style" uppercase? And what's wrong
with 301 redirecting (don't break old URIs, but still restructure them as your
website matures and you realize a better organizational hierarchy)?

Third and perhaps most importantly, this all seems like a pretty awesome
problem to have! How many websites survive more than a few years? (Geocities
doesn't count.)

~~~
charliesome
> _Second, perhaps cool URIs don't change, but it seems like
> <http://www.w3.org/Provider/Style/URI.html> is kind of an unfortunate
> URI. What's "Provider"? Why are "Provider" and "Style" uppercase?_

They're uppercase because they're uppercase. Asking why is like asking why
Rubyists like_using_method_names_like_this and .NET devs
LikeUsingMethodNamesLikeThis.

~~~
technel
Convention is for URLs to be all lowercase, which IMO aids in readability and
certainly makes them easier to type (imagine giving a URL to a friend over the
phone -- remember this is 1999).

~~~
porges
Well luckily for you, the URI is case-insensitive and redirects (301) to the
canonical version.

------
AntiRush
I'm not sure if you did it on purpose, but

<http://www.w3.org/Provider/Style/URI>

works and is more in the spirit of the article than

<http://www.w3.org/Provider/Style/URI.html>

~~~
mkopinsky
However, <http://www.w3.org/Provider/Style/URI/> does not work. I guess this
makes some sense, but certainly violates the "typical" functionality on the
web where trailing slashes are ignored.

~~~
alanh
Agreed. That’s a rule I mention in my piece, “URL as UI”, which I really want
everyone to read. (If it feels like self-promotion, then please kindly ignore
my username / blog masthead. I only care about these ideas.)
<http://alanhogan.com/url-as-ui>

~~~
adavies42
> Gracefully handle 404s.

> [...]

> Consider linking to your home page and/or site map [...]

For a second I thought you were advocating _redirecting_ 404s to the homepage,
and I was about to flame you. I _hate_ that one, and I think it's harmful
enough that the article probably ought to include a point about not doing
it.

------
willsulzer
I agree with this article to the extent that we should choose our URIs
carefully and semantically. However, it is very likely that your web facing
application, and the semantics of your resources will change over time. What
if I wanted to deprecate and eventually remove my 'Providor' or 'Style'
resource? Maybe I decided to create a 'blogs' and an 'articles' resource. That
would change the URI from <http://www.w3.org/Provider/Style/URI.html> to
<http://www.w3.org/blogs/articles/URI.html> (I know that my choice of new
resources aren't the most interesting but you get the point). Of course, we
would still support the old URI via a 301 redirect to the new one, or continue
to serve up the page from the old URI and use a rel canonical meta tag. I'd
advocate for 301ing to the new URI and letting the past be the past. We
shouldn't be bound to these decisions for the rest of our lives. Unless you're
sure that the focus of your web app will never change, or you plan to build
your new resource URIs around old resource URIs that were chosen at a time
when your new resources weren't being considered (which would lead to far
worse URIs), then I'd plan for your URIs to change.
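
The 301-table approach described above can be sketched minimally (paths illustrative, not w3.org's actual setup):

```python
# Sketch of "301 and let the past be the past": a small table of
# retired URIs answered with permanent redirects. Paths illustrative.
LEGACY = {
    "/Provider/Style/URI.html": "/blogs/articles/URI.html",
}

def respond(path):
    """Return (status, location) for a request path."""
    if path in LEGACY:
        return 301, LEGACY[path]  # old link keeps working
    return 200, path              # serve the current resource

print(respond("/Provider/Style/URI.html"))  # -> (301, '/blogs/articles/URI.html')
```

The table only ever grows by one entry per restructuring, which is why supporting old URIs is cheap relative to breaking them.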

------
yaix
Just saw another classic in the comments of a different story:

Top 10 mistakes of Web design <http://www.useit.com/alertbox/9605.html>

More classics anybody?

~~~
TeMPOraL
Good that they mention "banner blindness" and related phenomena. I sometimes
see webpages that make these mistakes, and once or twice I even got burned by
it -- like when I missed a very important "Point 0" in some IRC channel's FAQ.
That Point 0 was so important to the channel mods that they put it directly
above the page headline, and I honestly _didn't perceive it was there_ until I
reread the page for the third time.

------
Sambdala
It's somewhat ironic that this was submitted as
<http://www.w3.org/Provider/Style/URI.html> rather than
<http://www.w3.org/Provider/Style/URI>.

------
crazygringo
HN, what is going on here? Why are the four comments which disagree with the
article (including my comment) all being _downvoted_?

Downvotes are for comments that don't contribute, not comments you disagree
with.

------
comex
The problem is that given enough time, the URL _will_ break -- you'll redo the
entire site (a shiny new CMS/cloud provider/hosted solution/server!), or
accidentally delete everything, or get hacked, or get acquired, or go out of
business. Or even if you keep your important links up forever, various obscure
bits and pieces will come and go, and years later, somebody will find an old
link to one of them. Bitrot is a fundamental limitation of the web, and it
sucks.

The workaround is the Wayback Machine, which is amazing, but could be more
comprehensive and frequently updated. I wish someone like Google would throw
more servers at it.

~~~
seldo
The point of the article is that as long as you control the domain, you have
no excuse for your links breaking. Going out of business, and therefore being
unable to pay for your domain, is specifically called out as a valid reason,
but there's no reason you'd lose control of your domain after being acquired,
even if you decided to redirect your old links to newer information. Shiny new
software is also specifically called out as a bad reason to break links, since
backwards-compatible redirects are trivial. And if you're capable of
permanently losing all your data through accidental deletion or a server being
compromised, you have much bigger problems.

All that said, you're fundamentally right -- sometimes information stops being
available because it's out of date, and keeping it available would be
confusing (if a product is no longer available, it would be strange to
maintain a page describing it for years afterwards). Archiving through the
Wayback machine is a very helpful stopgap, but expecting them to continuously
archive every version of the entire Internet for all time won't scale.

What's needed is a distributed, decentralized system, ideally at the protocol
level. Imagine if a GET request by default gave you the "current" version of a
page, but you could send an extra header that said "give me this page, as it
appeared at date-time X". This would remove the confusion caused by the
existence of a page being conflated with that page being current[1], and allow
sites to maintain clean navigational and data structures by flagging outdated
pages as "expired" instead of completely deleting them. When a server got a
request for a page that used to exist but no longer does, it could respond
with a new 4xx-series status code, "No longer current", indicating the
document is not available for the given date-time, but is available for an
earlier date.
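
This imagined "as it appeared at date-time X" request already exists in rough form: the Memento protocol (RFC 7089) defines an Accept-Datetime request header for time-based access to resources. A toy sketch of the server-side lookup (not a Memento implementation; dates and bodies illustrative):

```python
# Toy version store: current version by default, historical version
# when a datetime is supplied. Not a Memento (RFC 7089) implementation.
from bisect import bisect_right

class VersionedPage:
    def __init__(self):
        self._versions = []  # sorted list of (timestamp, body) pairs

    def publish(self, ts, body):
        self._versions.append((ts, body))
        self._versions.sort()

    def get(self, accept_datetime=None):
        """Latest version by default; the one live at a given time otherwise."""
        if not self._versions:
            return 404, None
        if accept_datetime is None:
            return 200, self._versions[-1][1]
        times = [t for t, _ in self._versions]
        i = bisect_right(times, accept_datetime)
        if i == 0:
            return 404, None  # the page did not exist yet at that time
        return 200, self._versions[i - 1][1]

page = VersionedPage()
page.publish(2002, "flippant, immature post")
page.publish(2012, "expired: superseded by newer thoughts")
print(page.get())      # -> (200, 'expired: superseded by newer thoughts')
print(page.get(2005))  # -> (200, 'flippant, immature post')
```

Serving the "expired" marker by default while keeping the 2002 text reachable by date is exactly the separation of existence from currency argued for above.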

[1] I frequently get people sending me ANGRY emails about flippant, immature
blog posts I wrote 10+ years ago[2]. They assume that because it's still on my
website, I still stand by those statements, when in fact I'm just reluctant to
delete information.

[2] The posts still get traffic, because links to them made 10+ years ago
still work, despite rewriting my CMS 3 times.

~~~
adavies42
> What's needed is a distributed, decentralized system, ideally at the
> protocol level. Imagine if a GET request by default gave you the "current"
> version of a page, but you could send an extra header that said "give me
> this page, as it appeared at date-time X".

Sounds like Freenet USKs (see <https://freenetproject.org/understand.html>,
search for USK (and boo on them for not having any anchors on that page)).

------
einhverfr
Some of it depends on the nature of the URI though. The article is about
linked resources, but a lot of URI usage isn't really about linked resources.
Many of these can change reasonably.

For example, is there any harm in changing where that contact us form points
to? Do you have to maintain the contact us forms API to be backwards
compatible forever? This really depends on what you are doing.

I agree with the article as it relates to documents, for the most part. There
are cases where maintaining URL backwards compatibility is a problem due to
unforeseen dead ends, but mostly it shouldn't be. Still, URIs' needs for
persistence vary so widely that I don't know that one can generalize much
beyond that.

~~~
alexmuller
> For example, is there any harm in changing where that contact us form points
> to?

The point could better be made as: "Why not keep that URL the same?"

It's trivial to do technically, and there's no advantage to be had from moving
back and forth between foo.com/contact and foo.com/contact-us. If the URL of
your company's contact form gets published in a book, would that change your
mind?

~~~
einhverfr
Sorry, I was misunderstood. I was talking about the URL where the form is
submitted, not the URL of the form. For the URL of the form itself, all things
being equal it's generally desirable to keep it the same, because this doesn't
break other people's links.

But for a form submission target, the only reason to care is if you are
accepting third-party form submissions. At that point we are no longer talking
about hypertext resources, which is my main point.

------
bradwestness
Organizations with large websites use content management systems. Vendors of
content management systems come and go as technology changes. Changing the
CMS a website uses usually necessitates changes to the URI structure, since
each system has its own conventions.

Maintaining in perpetuity a complete set of redirects (or stack of rewrite
rules) for every page for a website consisting of millions of pages for
decades is not feasible in most cases. Things change, departments are created
or disbanded or renamed. Management decides certain things should not be
accessible to the public or stored in a particular location.

It's incredibly unrealistic to expect every URI to be permanent.
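
For scale, though, redirect sets are usually expressed as a few prefix rules per migrated section rather than one entry per page; a minimal sketch with illustrative paths:

```python
# Sketch of why "millions of pages" need not mean millions of rules:
# CMS migrations usually move whole sections, so a handful of prefix
# rewrites (paths illustrative) covers them.
PREFIX_RULES = [
    ("/dept/helpdesk/", "/support/"),
    ("/cms2004/articles/", "/articles/"),
]

def rewrite(path):
    """Map a legacy path to its new home, or leave it unchanged."""
    for old, new in PREFIX_RULES:
        if path.startswith(old):
            return new + path[len(old):]
    return path

print(rewrite("/cms2004/articles/cool-uris"))  # -> /articles/cool-uris
```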

~~~
alanh
You are describing the status quo as you see it, as if it were a rebuttal to
the ideal towards which we should be striving.

~~~
bradwestness
No, I'm saying that technology changes, the web changes, your organization
changes and the content on the website changes. To act like it's even
desirable to have all of the same documents at the same locations on your
website in 20 years is misguided.

------
zmoazeni
Jeremy Keith's talk and resulting long bet on the topic are interesting. He
contends that it's not proven that data put on the internet is in fact
immortalized.

\- <http://vimeo.com/34269615>

\- <http://longbets.org/601/>

------
crazygringo
> _Pretty much the only good reason for a document to disappear from the Web
> is that the company which owned the domain name went out of business or can
> no longer afford to keep the server running. Then why are there so many
> dangling links in the world? Part of it is just lack of forethought._

Nope, it's the lack of a crystal ball. The technology world moves fast. And
breaks things, as Zuck says. I have no idea how a site of mine might be
structured even a year from now, whether the current URL scheme will conflict
with its future needs, or whether an old-to-new translation will be trivial or
overly resource-intensive. By now, the world has mostly realized that
over-planning is bad, and agile is important.

So who cares if a 5-year-old URL to a page virtually nobody ever visits
anymore doesn't work? It's easy to Google the keywords in the link, and you'll
probably be able to find the content if it's still around.

(Obviously big sites have an incentive to keep their links working, but they
don't need an article from the W3C to tell them that.)

~~~
yaix
"technology world moves fast [...] currently URL scheme will conflict directly
with its needs or not"

That is your mistake. The URI should be decoupled from the technology you are
using. The URI is part of the content and not of the software behind the site.

~~~
alanh
Amen. A thousand times, amen.

If your stack requires tight coupling with URL structure, you have doomed
yourself already.

------
robbiemitchell
> If so, you chose them very badly.

I love how this starts out by (wrongly) assuming that "you" is a single person
or even stable group of people, and that the answer to a changing URI is to
say "You did it wrong."

------
kevinsd
It is like asking a developer not to change an API, or to keep the old version
available.

~~~
noamsml
And companies that have millions of developers making money off their APIs
(and, in one way or another, paying some dividend back to the API creators)
indeed do tend to keep their APIs fairly backwards-compatible, with some
notable exceptions.

------
mindslight
And the _coolest_ URIs _can't_ change.

