Basically you have Uniform Resource Locators (URLs), Uniform Resource Names (URNs), and Uniform Resource Identifiers (URIs). You also have Internationalized Resource Identifiers (IRIs), which are URIs with rules allowing international character sets in things like host names.
Every URN and URL is a URI.
However, not every URI is a URN or a URL.
A URN has a specific scheme (the front part of a URI, before the colon), but it does not contain instructions on how to access the identified resource. We humans might automatically map it to an access method in our heads (e.g., digital object identifier URNs like doi:10.1000/182, which those of us who have used DOIs know maps to http://dx.doi.org/10.1000/182), but the instruction isn't in the URN.
A URL is not just an identifier but also an instruction for how to find and access the identified resource.
For example, http://example.org/foo.html says to access the web resource /foo.html by using the HTTP protocol over TCP, connecting to the IP address that example.org resolves to, on port 80.
An example of URIs that are not URLs is the MIME content IDs used to mark the boundaries within an email (the cid scheme), e.g., cid:firstname.lastname@example.org.
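To make the anatomy above concrete, here's a small sketch using Python's standard urllib.parse. It splits the example URL into the parts discussed (scheme, host, port, path), and shows that a URN-style URI parses too but carries no access instructions:

```python
from urllib.parse import urlparse

# A URL: the scheme says *how* to access, the rest says *where*.
url = urlparse("http://example.org/foo.html")

print(url.scheme)       # "http" -> protocol to use
print(url.hostname)     # "example.org" -> resolved via DNS to an IP
print(url.port or 80)   # no explicit port in the URL, so HTTP's default 80
print(url.path)         # "/foo.html" -> resource on that server

# A cid-style URI parses fine, but the scheme implies no access method:
urn = urlparse("cid:firstname.lastname@example.org")
print(urn.scheme)       # "cid"
print(urn.path)         # "firstname.lastname@example.org"
```

Nothing here is specific to these examples; any URI will split the same way.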
You can get more information at:
> Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.
One problem is that someone decided to include file name extensions. Maybe this happened naturally because web servers made it so easy to expose entire directory structures to the web. And yet, this continues to be used for lots of other things. It is so ridiculous that a ".asp" or ".php" or ".cgi" causes every link, everywhere to depend on your arbitrary implementation details!
Another problem is that many software stacks are just not using brains when it comes to what would make a useful URL. Years ago I was very frustrated working with an enterprise software company that wanted to sell us a bug-tracking system and they didn’t have simple things like "server.net/123456" to access bug #123456; instead, the URL was something absolutely heinous that wouldn’t even fit on a single line (causing wrapping in E-mails and such).
Speaking of E-mail, I have received many E-mails over time that consisted of like TWELVE steps to instruct people on how to reach a file on the web. The entire concept of having a simple, descriptive and stable URL was completely lost on these people. It was always: 1. go to home page, 2. click here, ..., 11. click on annoying “content management system” with non-standard UI that generates unbookmarkable link, 12. access document. These utterly broken systems began to proliferate and it rapidly reached the point where most of the content that mattered (at least inside companies) was not available in any sane way so deep-linking to URLs became pointless.
Preserving URLs across technology migrations is always possible (assuming the old documents, or at least rough equivalents, still exist in the new system), and usually not even hard. Rewrites are not rocket science.
If short-sighted developers put .php on the end of all the links, a migration to a better system can use rewrite rules that trivially strip the .php ending from a URL, or that remap old folder layouts onto new ones.
Even if you can't use mod_rewrite because of complexity it can't handle, all you need is a server with a hash map (or equivalent) that responds with 301 redirects from the old poorly designed links to the newer, cleaner ones.
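As a sketch of that hash-map idea in Python (all the paths below are made-up examples, not anyone's real URLs): an explicit old-to-new map handles the heinous cases, and a fallback rule strips legacy ".php" endings.

```python
# Hypothetical legacy paths mapped to their clean replacements.
LEGACY_MAP = {
    "/bugs/view.php?id=123456": "/123456",
    "/old/folder/report.php": "/reports/annual",
}

def redirect_target(path):
    """Return the 301 Location for an old path, or None if no rule applies."""
    if path in LEGACY_MAP:
        return LEGACY_MAP[path]      # explicit remapping wins
    if path.endswith(".php"):
        return path[:-len(".php")]   # generic rule: drop the extension
    return None                      # already clean: serve normally

print(redirect_target("/bugs/view.php?id=123456"))  # -> "/123456"
print(redirect_target("/about.php"))                # -> "/about"
print(redirect_target("/already/clean"))            # -> None
```

Wire that function up to any web server's 404/redirect hook and the old links keep working forever, at the cost of one dictionary.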
The other problem you mention (no deep linking at all) is its own form of stupidity in design, but since the deep links don't even exist, that's a separate issue.
> The more expensive the CMS, the crappier the URLs.
I think this could be applied to more than just how companies manage URLs.
Also, I'm trying to find a post I recently read that talked about how calling URLs "URIs" is just confusing nowadays, since almost everyone still only knows the term URL, and they're functionally interchangeable.
You'd think ICANN would have a .monarchy TLD by now
On this basis? The year being 2016, and those being "blue blooded" privileged rulers is not enough?
More an opinion that the decision to provide the worst possible UX has a likelihood of being highlighted by the populist media.
If it's more efficient for your business/project to change your URIs when going through a website redesign, go ahead (with the knowledge that you'll lose some traffic, etc.)
Seriously, there's no reason to feel guilty over this. It's not your fault, it's the fault of a system that built two UIs into every website (the website's HTML and the URL bar -- the second of which is supposed to be useful for browsing and navigation just like the first).
If the W3C actually cared about links continuing to work, they would fix it at the technical level by promoting content-addressable links, instead of trying to fix it at the social level (which will never work anyway; the diligent people who care about these things will always be just a drop in the bucket).
When I have some decent results, I'll be ensuring the editorial team is aware of which sites in particular are prone to breaking links, and which they can trust. The net effect will be that we will be less likely to drive traffic to certain domains. Whether enough other people will do this to make any kind of meaningful difference is unknown, but it's certainly better to be a trustworthy site that it can't hurt to link to.
On a related note, I learnt this week that taken-down YouTube videos are a PITA. Not only do they give a 200 when requested, they also give zero results when looked up via the YouTube API. Sure, they can still be treated as a 'broken link' from our end, but it would be nice to be able to differentiate between a video that was taken down and one that may never have existed in the first place.
Now, while I very much like the idea of content-addressable systems, that is not the solution to this problem. Addresses/names often are used to identify more abstract things than "this sequence of bytes". For example, company A's current list of prices is not a fixed sequence of bytes, but rather an abstract concept that refers to information that varies over time. The purpose of a name in this case is that it allows you to obtain an up-to-date version of some information.
A name that is derived from the content that you want to obtain cannot possibly do that job. Only names maintained by people who understand the continuity of those varying byte sequences can do that.
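The distinction is easy to see in a couple of lines of Python (the price-list bytes are invented purely for illustration): a content-derived name changes whenever the content does, so it can never name the abstract, time-varying thing.

```python
import hashlib

def content_address(data: bytes) -> str:
    """A name derived purely from the bytes: a content address."""
    return hashlib.sha256(data).hexdigest()

prices_v1 = b"widget: $10\ngadget: $25\n"
prices_v2 = b"widget: $11\ngadget: $25\n"   # one price changed

# The same bytes always get the same name...
assert content_address(prices_v1) == content_address(prices_v1)

# ...but any change produces a completely different name, so a content
# address cannot mean "company A's *current* price list": that abstract
# concept needs a name that a person (or system) actively maintains.
assert content_address(prices_v1) != content_address(prices_v2)
```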
My main response to this is: just because we can't always use content-addressable links doesn't mean we shouldn't use them where we can. I'm sure you see where I'm going with this. When pirate movie websites have a 1000x more rock-solid way of identifying content than the NY Times we have a problem.
But even in the most inhospitable situations like the one you describe we can still do better. Content-addressability won't help, agreed. How about promoting UUID style non-human readable paths instead? One of the reasons links change is that the content changes slightly and the name goes out of date. Perhaps in your case legal says "these prices may be up to ten minutes out of date, we can no longer call that path /prices/current/feed". If the path was /3824-3822-4864 this wouldn't be a problem. Information about what that endpoint returns can be placed in a description field or in the docs where it can be kept up-to-date without breaking links.
Also, in your specific example, it would still be possible to replace the old content with a document explaining the change and linking to the new URI, or possibly even an HTTP redirect. Plus, a semantically meaningless name might still not prevent the problem, as there just as well could be legal consequences if you have communicated before that this URI provides "current prices" via some other means than the URI itself.
Now the situation is different, and a lot of these messages have finally sunk into the mainstream. I think you're right; there's no need for an army of purists to keep driving these points home. We get it, we know where the trade-offs are. This needs to be interpreted in a larger context of what was going on at the time.
Any strong language I used was directed at me as of a year ago who was wasting time on redirects no one would ever use, not the original authors.
 - http://longbets.org/601/
2. The winnings (plus growth) are awarded with fanfare to the winner's preferred charity. The winner, if still alive, can change charities if desired (but not to multiple recipients). If the winner is no longer alive and the originally designated charity is gone or drastically changed, Long Bets may award the winnings to a charity deemed closest to the winner's original intentions.
> A 301 redirect from www.longbets.org/601 to a different URL containing that text would also fulfill those conditions.
> …entering the characters http://www.longbets.org/601 into the address bar of a web browser or command line tool (like curl) OR using a web browser to follow a hyperlink that points to http://www.longbets.org/601 MUST …
The bet (and the referenced article) is about the availability of the resources at the end of those URLs - the URL itself is just a reference, or a pointer if you will to a resource.
That's easy to achieve with all CMSen, but also trivially done with a static website (the web server is configured to serve index.html or whatever you like).
There are plenty of tools which can serve this need. Why complicate every implementation with duplicative functionality?
But it makes no sense to have a link to a document and get completely different content back based on language. And file extensions are needed since browsers don't expose a way to ask for, say, a PDF or an image in preference to HTML.
I'm sure it's behind the proliferation of the .format extension to determine response type.
The Web is about webs of durable readable content, not about ephemeral walled-garden apps.
Lost count of the number of times I've clicked a link in a blog post only to end up with a GitHub 404; I'm sure it is only going to get worse.
Most annoying thing with that one: simply going to GitHub search and putting in the 404'd repo name usually turns up the one I want.
I'm sure having the 404 serve up a "did you mean" search result would solve the issue for the most part.
Facebook is the worst at this. They even require you to have an account and log in to read a public-facing community/restaurant fb page
This is meant more seriously than it might sound. But in the field of "persistent identifiers", there's a notion that language changes over time (a common example being the word "gay"), so introducing meaning into identification schemes might not be a good idea.
I use "slugs" for category type nodes like "photos" or "blog", but I don't give individual "things" a slug (never liked the idea of putting the title into the URL just for SEO). And of course, everything that has a slug always also has the numerical ID and can be reached both ways.
Now that I think of it, I might add a table keeping track of which nodes each slug was used for, and in case of a 404 either redirect when there is only one option, or display all the nodes that once used that slug if there is more than one. Seems like that would be the next best thing to an actually never-changing URL, right?
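That slug-history table could be sketched like this (the slugs and node IDs are invented; in practice this would live in a database, not a dict):

```python
slug_history = {}  # slug -> list of node ids that ever used it

def record_slug(slug, node_id):
    """Call whenever a slug is assigned (or reassigned) to a node."""
    slug_history.setdefault(slug, [])
    if node_id not in slug_history[slug]:
        slug_history[slug].append(node_id)

def handle_404(slug):
    """Return ('redirect', id), ('disambiguate', ids), or ('404', None)."""
    nodes = slug_history.get(slug, [])
    if len(nodes) == 1:
        return ("redirect", nodes[0])    # only one candidate: 301 there
    if nodes:
        return ("disambiguate", nodes)   # several: show a choice page
    return ("404", None)                 # slug never existed

record_slug("summer-photos", 42)
record_slug("blog-relaunch", 7)
record_slug("blog-relaunch", 19)   # slug later reused by another node

print(handle_404("summer-photos"))  # ('redirect', 42)
print(handle_404("blog-relaunch"))  # ('disambiguate', [7, 19])
print(handle_404("never-used"))     # ('404', None)
```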
Sequential numbers either require state, which must be kept in sync (e.g. the current-highest index, or the set of all URLs used so far which you can take the "max" of); or else their discovery process is slow (i.e. keep looking up URLs until you find the highest unavailable one).
Random numbers don't need state; you can either make the space really huge, and ignore the potential for collisions; or else you can look up the generated URL to see if it's in use, and only have to perform another check in the unlikely case that it is.
Whilst it may be easy to maintain the state for sequential numbers (if, say, they already exist as database IDs), the failure mode of random numbers is better. If something goes wrong with the lookup process for random numbers, e.g. it might miss a collision, we're still protected by the incredibly low chance of collisions in the first place. For sequential numbers, we're guaranteed to hit a collision in this case, i.e. if a glitch prevents the state from getting updated, the next URL will definitely collide with the previous URL.
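The contrast can be sketched in a few lines (the 64-bit space and in-memory set are illustrative; a real system would use a database sequence and a persistent index):

```python
import random

# Sequential IDs: require shared state (the counter) that must stay in sync.
counter = 0
def next_sequential():
    global counter
    counter += 1
    return counter

# Random IDs: no shared state needed; a huge space makes collisions
# unlikely, and an optional lookup catches the rare ones.
used = set()
def next_random(bits=64):
    while True:
        candidate = random.getrandbits(bits)
        if candidate not in used:    # the "check and retry" step
            used.add(candidate)
            return candidate

seq = [next_sequential() for _ in range(3)]   # [1, 2, 3]
rnd = [next_random() for _ in range(3)]       # three distinct 64-bit numbers

# The failure mode described above: if a glitch loses the last counter
# update, the "next" sequential ID repeats one already issued.
glitched = counter - 1          # state as if the last update was lost
assert glitched + 1 in seq      # guaranteed collision with an existing ID
```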
 A malicious pedant could create situations where it wraps around or collides.
Well, I was talking about what happens in case of failure; not the chances of a failure to begin with. If, hypothetically, Postgres forgot to update a sequence number, then a collision would be guaranteed, since that's a property of sequential numbering: all updates are contending for the same "next" number, and must be managed in some way. Random numbers spread across a large range, so work even without management.
Even if we ignore the possibility of bugs in Postgres, or cosmic rays flipping bits in RAM (which may have a similar probability to a random ID scheme colliding), etc. that's still only true inside the confines of Postgres.
Once data starts working its way through layer after layer of shell scripts, caches, proxies, etc. there's a lot of scope for things to go awry. Plus, since we're talking about longevity, there's no guarantee that we'll still be using Postgres in 20 years' time, or maybe we need to migrate or integrate with some other system, etc.
I'm reminded of the "one in a billion" chances that used to be claimed for DNA evidence in court. It is certainly true that the probability of two non-twins having identical DNA is that low, but that doesn't really mean much; even if the probability of identical DNA were zero, that doesn't tell us anything about the collision rate of the subset of bases which were used, or the error rate of the sampling process, or the mislabelling rate of the lab, or the corruption rate of the evidence handlers, etc. :)
What about for indicating the likely content of a URL, when all someone has is that URL?
Please, not just a UUID. I'd like to know what to expect before clicking a link.
I think nu.nl has found an interesting solution to this problem: every article has a canonical URL of the form $section/$articlenumber/$title.html -- However, the "title" part (everything between the last slash and .html) is ignored by the server, and is pretty much free to alter as you like. All these three links resolve to the same article:
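A route like that might be matched as follows (a sketch; the paths below are made-up illustrations of the pattern, not real nu.nl links). The server keys only on section and article number, so the title segment can say anything:

```python
import re

# $section/$articlenumber/$title.html -- title is matched but ignored.
ROUTE = re.compile(r"^/(?P<section>[^/]+)/(?P<number>\d+)/[^/]*$")

def article_key(path):
    """Return the (section, number) lookup key, or None if no match."""
    m = ROUTE.match(path)
    return (m.group("section"), m.group("number")) if m else None

# Any title text (or none at all) resolves to the same article:
print(article_key("/economie/4012345/original-title.html"))
print(article_key("/economie/4012345/totally-different.html"))
print(article_key("/economie/4012345/"))
```

The nice property: the human-readable part can be corrected or translated later without breaking a single existing link.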
I wrote a library for it in Ruby which is how I know about it.
I wrote a short blog post about the problem a few years back, and the Tag URI scheme ended up being one of the best solutions I came across, which is how I know about it. Some links in that post and comments may be of interest to people: https://masonlee.org/2009/08/21/is-the-web-sticky-enough/
Historical note: At the end of the 20th century, when this was written, "cool" was an epithet of approval, particularly among the young, indicating trendiness, quality, or appropriateness.
It was added sometime between Nov. 27, 2001 and Dec. 14, 2001: https://web.archive.org/web/20011214140114/http://www.w3.org...
For those wondering like I was (sigh): the typos that appeared in the Hall of Flame footnotes when they were added in early 2000 were mostly fixed sometime in 2001, with the last typo ("uit") fixed in 2004.
The Mondo 2000 interview bookmarks, not so much.
> Pretty much the only good reason for a document to disappear from the Web is that the company which owned the domain name went out of business or can no longer afford to keep the server running.
Say I sign up with service x with the name y. My URL is www.x.com/users/y. Years later I delete my account. Someone else signs up with the name y. Now www.x.com/users/y goes to someone else's resources. The old URL is broken.
The only way to prevent this is either to give the user a URL (that they will want to share) that is not meaningful, or to disallow anybody from signing up with a name that was ever in use, and names are a very limited resource.
Neither seems ideal. I do agree in principle that URLs shouldn't change, though.
Hotmail actually has this problem, or at least they used to. They delete accounts that are inactive for a long time and someone else can sign up with that name. The new person can get email addressed to the previous owner.
What is really bad is when there is some document/resource/content that is linked to by others (be it in other web pages or in bookmarks), and then that document's address changes, as all those references no longer point to the document, even though it actually is still available somewhere. An address being reused for something different isn't really all that much worse than the existing references leading to a 404. But either is bad if the original content is still available somewhere.
The URL isn't broken if it just points to a different resource.
The "y" example is confusingly simplistic; the site already has a "can't use duplicate usernames" rule, so 404'ing the original URL isn't 'worse' for anyone than if the original user didn't delete their account.
Ah, the halcyon days of the Internet's adolescence.
This was before fiascos like Mike Rowe, of Canada, having his mikerowesoft.com taken away.
Working definition of "cool URI":
* All URIs currently resolving are "tentatively cool".
* Any URI that disappears or changes at any time isn't "cool" --- and never was: the "tentatively cool" designation was mistaken all along.
Although they seem to have not learned anything, and are now using .jsp instead of .pl.
I have a custom PHP app that includes marketing pages.
I'd like to crowbar Wordpress into the server to serve the marketing pages instead, to make it easier to change text over time.
A .htaccess set of redirect rules may indeed work, but it's hard work to keep all URLs working.
It really depends on which tools you use. Yes, wp has a horrific URL routing mechanism, and on top of that can randomly redirect 404 pages by guessing the target.
It's still an inconvenience when a URL moves, but before the likes of Google it used to be a huge inconvenience: it would often take tens of minutes to track down the new location. Now it's on the order of tens of seconds at most.
Cynic in me thinks they're doing it on purpose.
Sci-Hub works largely through an applied identifier, DOI (though URLs also work), which is a unique-per-article identifier present since the 1990s. Earlier gets a bit difficult.
Title, author, date, and publisher (you know, what your uni essay instructor always insisted on for footnotes) are pretty good identifiers, and will likely be meaningfully unique. Adding a publication location (as in a newspaper byline) also helps.
Ironically, many news organisations fail to include such information not only on individual articles but anywhere apparent on their website. You'll get city, and possibly county, but not a state or province or country. This in an age of, literally, worldwide access.
Otherwise, the Internet Archive does yeoman's work in creating a permanent record, where they're allowed to.
For example, Yahoo.com has remained the URI for Yahoo's homepage and will continue to be until it's a 404 page. Yahoo.com is not cool.
Is not a meaningful response to the claims I presented.
At the point that I'm able to swap one resource for another, and the only reason to believe that a given resource is in fact the resource I referenced is that the unique identifiers are the same, it's in fact useless as an identifier.
In fact, I would claim that the whole idea of URIs as currently used is malformed, since it gives a user no way to verify that the resource referenced is in fact the resource received.
In 1998, "yahoo.com" referred to the homepage of Yahoo!. In 2016, "yahoo.com" still refers to the homepage of Yahoo!. The content of the page isn't the same, but it's still the same 'entity' (the homepage of Yahoo!). Hence, despite Yahoo! being uncool, and the contents of the Yahoo! homepage being uncool, the URL of the Yahoo! homepage is cool.
It would be nice if URLs had a built-in capability to reference immutable versions of things, similar to git references. This is done on niche networks like IPFS (I think) and Freenet. It's easy to imagine a DNS-like layer which sends "pet names" like "yahoo.com" to the latest version, whilst leaving "raw" names alone (like DNS does for IP addresses).
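A rough sketch of that two-layer idea (the store, the refs table, and the yahoo.com payloads are all invented for illustration; real systems like git or IPFS obviously differ in the details): an immutable content-addressed store underneath, with mutable "pet names" pointing at the latest version.

```python
import hashlib

store = {}   # content hash -> bytes (immutable once written, like git objects)
refs = {}    # pet name -> content hash (mutable pointer, like git refs / DNS)

def put(data: bytes) -> str:
    """Store bytes under their content address and return that address."""
    h = hashlib.sha256(data).hexdigest()
    store[h] = data
    return h

def publish(name: str, data: bytes):
    """Point a pet name at a new immutable version."""
    refs[name] = put(data)

def resolve(name_or_hash: str) -> bytes:
    # A "raw" name (a hash) is left alone; a pet name follows the pointer.
    return store[refs.get(name_or_hash, name_or_hash)]

publish("yahoo.com", b"homepage v1")
v1 = refs["yahoo.com"]             # immutable reference to this version
publish("yahoo.com", b"homepage v2")

assert resolve("yahoo.com") == b"homepage v2"   # pet name tracks the latest
assert resolve(v1) == b"homepage v1"            # old version still addressable
```

The point being: links to `v1` can never rot, while "yahoo.com" stays useful as a name for the living homepage.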
With that said, it would also be nice if DOM elements got their own URLs which user agents could load instead of the whole page (imagine anchors but using something like XPath). That would make hyperlinks more useful and bring us a step closer to Xanadu-like behaviour, e.g. rather than copying text from a page and wrapping it in quotation marks, you could embed that element inside (a quotation element inside) your page.
An oxymoron is an idea whose definition competes with itself.
Nothing about coolness is against "staying forever" as such.
An actual oxymoron would be that something can be a "permanent fad" (as the nature of fads is to be transient).
Or that "it's cool to be uncool".