Cool URIs don't change (1998) (w3.org)
297 points by benjaminjosephw on May 17, 2016 | 122 comments



It is confusing to a lot of people, but URLs and URIs aren't functionally interchangeable.

Basically you have Uniform Resource Locators (URLs), Uniform Resource Names (URNs), and Uniform Resource Identifiers (URIs). You also have Internationalized Resource Identifiers (IRIs), which are URIs with rules allowing for international character sets in things like host names.

Every URN and URL is a URI. However, not every URI is a URN, or a URL.

A URN has a specific scheme (the front part of a URI before the :), but it does not contain instructions on how to access the identified resource. We humans might automatically map that to an access method in our head (e.g., digital object identifier URNs like doi:10.1000/182, which we who have used DOIs know maps to http://dx.doi.org/10.1000/182), but the instruction isn't in the URN.

A URL is not just an identifier but also an instruction for how to find and access the identified resource.

For example, http://example.org/foo.html says to access the web resource /foo.html by using the HTTP protocol over TCP, connecting to the IP address which example.org resolves to, on port 80.
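You can see that decomposition with, for example, Python's urllib.parse (the comments just spell out what this example URL implies):

    from urllib.parse import urlparse

    # Split the example URL into the parts that tell a client how to fetch it.
    parts = urlparse("http://example.org/foo.html")

    print(parts.scheme)    # 'http'        -> protocol to use
    print(parts.hostname)  # 'example.org' -> name to resolve to an IP address
    print(parts.port)      # None          -> no explicit port, so HTTP's default 80 applies
    print(parts.path)      # '/foo.html'   -> resource to request from that server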

An example of URIs which are not URLs are the MIME Content-IDs used to reference individual body parts within an email (the cid scheme), e.g., cid:foo4%25foo1@bar.net.

You can get more information at: https://tools.ietf.org/html/rfc2392


IMHO the distinction between URL and URI is similar to the debate on SI prefixes for bytes, or whether we should insist on calling Linux GNU/Linux; i.e., most people just don't care enough so these things will never gain currency.


Yep. The WHATWG recommends just using the term URL instead of URI.

> Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.

https://url.spec.whatwg.org/#goals


Thank you so much! I needed an authoritative source to throw at people when they try to belittle me for using URL, instead of URI. To me, URL just sounds better.


And it is more descriptive as well, in the context of web browser addresses: why call something an animal, when you can call it a dog?


A lot of missteps in the early days of web technologies have made stable URLs impractical, unfortunately.

One problem is that someone decided to include file name extensions. Maybe this happened naturally because web servers made it so easy to expose entire directory structures to the web. And yet, extensions in URLs continue to be used for lots of other things. It is so ridiculous that a ".asp" or ".php" or ".cgi" causes every link, everywhere, to depend on your arbitrary implementation details!

Another problem is that many software stacks are just not using brains when it comes to what would make a useful URL. Years ago I was very frustrated working with an enterprise software company that wanted to sell us a bug-tracking system and they didn’t have simple things like "server.net/123456" to access bug #123456; instead, the URL was something absolutely heinous that wouldn’t even fit on a single line (causing wrapping in E-mails and such).

Speaking of E-mail, I have received many E-mails over time that consisted of like TWELVE steps to instruct people on how to reach a file on the web. The entire concept of having a simple, descriptive and stable URL was completely lost on these people. It was always: 1. go to home page, 2. click here, ..., 11. click on annoying “content management system” with non-standard UI that generates unbookmarkable link, 12. access document. These utterly broken systems began to proliferate and it rapidly reached the point where most of the content that mattered (at least inside companies) was not available in any sane way so deep-linking to URLs became pointless.


>A lot of missteps in the early days of web technologies have made stable URLs impractical, unfortunately.

Preserving URLs across technology migrations is always possible (assuming the old documents or at least a rough equivalent still exist in the new system), and usually not even hard to achieve. Rewrites are not rocket science.

Assuming short-sighted developers put .php on the end of all the links, a migration to a better system can use rewrite rules that trivially remove any .php ending from a URL, or that remap old folder layouts to new folder layouts.

Even if you can't use mod_rewrite because of complexity that it can't handle, all you need is a server with a hash map (or equivalent) that responds with 301 redirects [1] from the old, poorly designed links to the newer, cleaner ones.
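A minimal sketch of that hash-map-plus-301 server, using only Python's standard library (the example mappings are made up):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping from old, poorly designed paths to their new homes.
    REDIRECTS = {
        "/old/page.php": "/articles/page",
        "/cgi-bin/show.cgi?id=42": "/articles/42",
    }

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = REDIRECTS.get(self.path)
            if target:
                self.send_response(301)               # permanent redirect
                self.send_header("Location", target)
                self.end_headers()
            else:
                self.send_error(404)

    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectHandler).serve_forever()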

The other problem you mention (no deep linking at all) is its own form of stupidity in design, but since the deep links don't even exist, that's a separate issue.

[1] https://en.wikipedia.org/wiki/HTTP_301


Can't remember where I first read it, but back when "canonical URLs" were finally starting to get traction, the saying went:

> The more expensive the CMS, the crappier the URLs.


" There is nothing about HTTP which makes your URIs unstable. It is your organization. "

I think this could be applied to more than just how companies manage URLs.

Also, I'm trying to find a post I recently read that talked about how calling URLs "URI"s is just confusing nowadays since almost everyone still only knows the term URL, and they're functionally interchangeable.


Was it https://news.ycombinator.com/item?id=11673058 (My url isn't your url) ?


It is indeed!


Another addition to their 'Hall of Flame' might be the British Monarchy. A couple of weeks ago, they broke every existing URI when they moved from www.royal.gov.uk to www.royal.uk. Every URL from the old domain gets redirected to the root of the new site.

https://www.google.co.uk/search?q=site%3Aroyal.gov.uk


You'll be pleased to learn that the National Archives keeps an archive of all UK government sites, e.g.

http://webarchive.nationalarchives.gov.uk/20130403203037/htt...

See http://nationalarchives.gov.uk/webarchive/


Never fear, the Daily Mail will surely make a call to oust the Monarchy on this basis..

You'd think ICANN would have a .monarchy TLD by now

mailto:liz@british.monarchy


>Never fear, the Daily Mail will surely make a call to oust the Monarchy on this basis..

On this basis? The year being 2016, and those being "blue blooded" privileged rulers is not enough?


Apologies, I wasn't trying to imply an opinion either way.

More an opinion that the decision to provide the worst possible UX is likely to be highlighted by the populist media.


You're thinking of the Guardian ... the DM love the royals.


hah, I regret the comment either way now!


Interesting, the link as posted here violates the guidelines. Perhaps you meant to link to

https://www.w3.org/Provider/Style/URI

;)


A subtle way of airing my own views on the matter... nothing to do with blindly copy-pasting. ;)


What's weird is that the URL gives a 'mixed-content' warning in Chrome, supposedly for the logo. But in the markup, that image is referenced by a relative URL; I can't figure out why Chrome is trying to load that image via HTTP...


Somewhat oddly, the relative path 301s to an HTTP address but is then changed to HTTPS via HSTS.


This is a make-work trap for conscientious people.

If it's more efficient for your business/project to change your URIs when going through a website redesign, go ahead (with the knowledge that you'll lose some traffic, etc.)

Seriously, there's no reason to feel guilty over this. It's not your fault, it's the fault of a system that built two UIs into every website (the website's HTML and the URL bar -- the second of which is supposed to be useful for browsing and navigation just like the first).

If W3C actually cared about links continuing to work, they would fix it at the technical level by promoting content-addressable links instead of trying to fix it at the social level (which will never work anyway, the diligent people that care about these things will always be just a drop in the bucket).


I work for a publisher that produces several articles a day that include external links for reference. More and more I seem to be coming across cases where those links are now broken. In the near future, I will start up an automated script that checks for broken links (and I'm guessing I may have to have it warn on redirects, since certain bad actors use redirects when they should be using 410).

When I have some decent results, I'll be ensuring the editorial team is aware of which sites in particular are prone to breaking links, and which they can trust. The net effect will be that we will be less likely to drive traffic to certain domains. Whether enough other people will do this to make any kind of meaningful difference is unknown, but it's certainly better to be a trustworthy site that it can't hurt to link to.
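A minimal sketch of such a checker, assuming the requests library (the status-code policy here is just one reasonable choice):

    import requests

    def check_link(url):
        """Classify a URL as ok, redirected, gone, or broken."""
        try:
            # HEAD keeps it cheap; don't follow redirects so we can report them.
            r = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException:
            return "broken"
        if r.status_code in (301, 302, 307, 308):
            return "redirected"   # worth a warning: the target may have moved or died
        if r.status_code in (404, 410):
            return "gone"
        return "ok" if r.status_code < 400 else "broken"

    for url in ["https://www.w3.org/Provider/Style/URI"]:
        print(url, check_link(url))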

On a related note, I learnt this week that taken-down YouTube videos are a PITA. Not only do they give a 200 when requested, they also give zero results when looked up via the YouTube API. Sure, they can still be treated as a 'broken link' from our end, but it would be nice to be able to differentiate between a video that was taken down and one that may never have existed in the first place.


There's always the Internet Archive Wayback Machine "save page now" feature. Wikipedia's been pretty enthusiastic about using it to make sure that external links remain available -- and that the cited version is available.


What I want to do is make an extension that sends a request to web.archive.org/save/{URL} whenever you visit a webpage. To prevent flooding, I would make it only send a request for a specific URL once a month.
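Outside of an extension, the same idea is just an HTTP request to the save endpoint; a sketch with a crude once-a-month cache (file name and interval are arbitrary):

    import json, time
    import requests

    CACHE_FILE = "saved_urls.json"   # maps URL -> timestamp of last save request
    MONTH = 30 * 24 * 3600

    def save_to_wayback(url):
        try:
            with open(CACHE_FILE) as f:
                cache = json.load(f)
        except FileNotFoundError:
            cache = {}
        if time.time() - cache.get(url, 0) < MONTH:
            return  # already asked the archive to save this URL recently
        requests.get("https://web.archive.org/save/" + url, timeout=30)
        cache[url] = time.time()
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)

    save_to_wayback("https://www.w3.org/Provider/Style/URI")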


> If W3C actually cared about links continuing to work, they would fix it at the technical level by promoting content-addressable links [...]

Now, while I very much like the idea of content-addressable systems, that is not the solution to this problem. Addresses/names often are used to identify more abstract things than "this sequence of bytes". For example, company A's current list of prices is not a fixed sequence of bytes, but rather an abstract concept that refers to information that varies over time. The purpose of a name in this case is that it allows you to obtain an up-to-date version of some information.

A name that is derived from the content that you want to obtain cannot possibly do that job. Only names maintained by people who understand the continuity of those varying byte sequences can do that.
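A tiny illustration of why: with a content-derived (hash-based) address, changing a single byte changes the name, so the name cannot keep pointing at "the current prices":

    import hashlib

    def content_address(data: bytes) -> str:
        # The "name" is derived entirely from the bytes themselves.
        return hashlib.sha256(data).hexdigest()

    print(content_address(b"price list v1"))  # one address
    print(content_address(b"price list v2"))  # a completely different address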


Heh, you got me:) I agree the situation is more complicated than I described.

My main response to this is: just because we can't always use content-addressable links doesn't mean we shouldn't use them where we can. I'm sure you see where I'm going with this. When pirate movie websites have a 1000x more rock-solid way of identifying content than the NY Times, we have a problem.

But even in the most inhospitable situations like the one you describe we can still do better. Content-addressability won't help, agreed. How about promoting UUID style non-human readable paths instead? One of the reasons links change is that the content changes slightly and the name goes out of date. Perhaps in your case legal says "these prices may be up to ten minutes out of date, we can no longer call that path /prices/current/feed". If the path was /3824-3822-4864 this wouldn't be a problem. Information about what that endpoint returns can be placed in a description field or in the docs where it can be kept up-to-date without breaking links.


I don't think that would make any real difference. Links break because people switch to new software that has different URIs for the content and they don't see the problem, or because the software has stupid mechanisms for generating URIs; that's it. URIs changing due to a conscious decision to change the URI are the rare exception; it's mostly just a side effect that no one cares about.

Also, in your specific example, it would still be possible to replace the old content with a document explaining the change and linking to the new URI, or possibly even an HTTP redirect. Plus, a semantically meaningless name might still not prevent the problem, as there just as well could be legal consequences if you have communicated before that this URI provides "current prices" via some other means than the URI itself.


Well exactly. You can change your URLs if you want, but it's not cool. You can choose not to be cool.


Working on IPFS/Tahoe-LAFS/Camlistore/BitTorrent is cool. Doing make-work isn't.


Like a lot of things a decade and a half ago, the W3C was pushing a good idea to the point of it being an unrealistic ideal, in an environment where most people were doing the exact wrong/opposite thing.

Now the situation is different, and a lot of these messages have finally sunk into the mainstream. I think you're right; there's no need for an army of purists to keep driving these points home. We get it, we know where the trade-offs are. This needs to be interpreted in a larger context of what was going on at the time.


Oh, well said!

Any strong language I used was directed at the me of a year ago, who was wasting time on redirects no one would ever use, not at the original authors.


I remember Jeremy Keith talking about this at dConstruct conference; he put a bet on Long Bets that the URI of the bet wouldn't change [0]

[0] - http://longbets.org/601/


A bit meta, but can someone tell me if Warren Buffett is on course to win his bet[0]? It's set to expire next year.

[0] http://longbets.org/362/


There was an episode of Planet Money just a couple of months ago that says Buffett is indeed on track to win the bet:

http://www.npr.org/2016/03/10/469897691/armed-with-an-index-...


CAGR of the S&P 500 from 1/1/2008 until the end of 2015 was 6.5%. No idea what fund of funds they used, but it's probably not going to be trivial to beat that (plus whatever 2016 holds) over the full time span, after fees.


Thanks guys! :)


I wonder what happens to long bets when the charity of the winner no longer exists by the time a bet ends.


This is defined in the rules:

2. The winnings (plus growth) are awarded with fanfare to the winner's preferred charity. The winner, if still alive, can change charities if desired (but not to multiple recipients). If the winner is no longer alive and the originally designated charity is gone or drastically changed, Long Bets may award the winnings to a charity deemed closest to the winner's original intentions.

http://longbets.org/procedure/


Thanks!


The most likely change would be for the protocol to move from http to https (and for non-secure URIs to be 301 redirected/forced by HSTS), though I don't think that qualifies as a change under their rules.


Under "Detailed Terms":

> A 301 redirect from www.longbets.org/601 to a different URL containing that text would also fulfill those conditions.


Interestingly it only specifies one level of redirection. So if the redirect were from http to https, and then to some other URL, it seems like that would fail to satisfy the requirements as written.
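With the requests library, for instance, the history attribute lists each redirect hop that was followed, so the depth is easy to check (URL as in the bet):

    import requests

    r = requests.get("http://www.longbets.org/601", allow_redirects=True, timeout=10)
    for hop in r.history:                 # one entry per redirect that was followed
        print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
    print("final:", r.status_code, r.url)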


I think that's being pedantic; multiple redirects really don't break the spirit of the clause.


The bet is for www.longbets.org/601, and does not include a http or https protocol. Amusingly enough www.longbets.org/601 currently redirects to longbets.org/601, though the detailed terms cover this.


The Detailed Terms section makes clear that it's for HTTP:

> …entering the characters http://www.longbets.org/601 into the address bar of a web browser or command line tool (like curl) OR using a web browser to follow a hyperlink that points to http://www.longbets.org/601 MUST …


He's actually already won. The original URL included www.; the current one no longer does.


> A 301 redirect from www.longbets.org/601 to a different URL containing that text would also fulfill those [winning] conditions.

The bet (and the referenced article) is about the availability of the resources at the end of those URLs - the URL itself is just a reference, or a pointer if you will, to a resource.


> A 301 redirect from www.longbets.org/601 to a different URL containing that text would also fulfill those conditions.


On a related note, URIs shouldn't end in extensions (use content negotiation!), content should be readable without executing code (no JavaScript necessary), content should be available in multiple languages (use content negotiation!), and RESTful interfaces should offer a simple forms-based interface for testing, &c.
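A rough sketch of format negotiation on the Accept header, assuming Flask (the route and payload are made up):

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    ARTICLE = {"title": "Cool URIs don't change", "body": "..."}

    @app.route("/articles/cool-uris")
    def article():
        # One URI, several representations: pick whichever the client asked for.
        best = request.accept_mimetypes.best_match(["text/html", "application/json"])
        if best == "application/json":
            return jsonify(ARTICLE)
        return "<h1>%s</h1><p>%s</p>" % (ARTICLE["title"], ARTICLE["body"])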


I've settled for "every article has a URL that looks like a folder" (including the trailing slash – that particular debate is pointless); only resources like pictures look like a file with an extension.

That's easy to achieve with all CMSen, but also trivially done with a static website (the web server is configured to use index.html or whatever you like).


> RESTful interface should offer a simple forms-based interface for testing

There are plenty of tools which can serve this need. Why complicate every implementation with duplicative functionality?


Content negotiation for language and format? Not in any human-usable system, please. Sharing a URL and then getting content in a different language is incredibly annoying. There may be no alternative for home pages, and using the Accept-Language header is nicer than using the broken geo system, for sure.

But it makes no sense to have a link to a document and get completely different content back based on language. And file extensions are needed since browsers don't expose a way to ask for, say, PDFs or images over HTML.


On the other hand, "extension" is just an interpretation.


Unfortunately not all CDNs respect the Vary: Accept header.

I'm sure it's behind the proliferation of the .format extension to determine response type.


So, a spec that doesn't describe the current state of the web at all and which hosts have found unworkable and over-constraining.


I was recently digging through my old blog's archives, and it was appalling how many URLs from the early 2000s have completely disappeared, despite the fact that the sites which served them remain — and gratifying when I was able to reload some fringe resource from 1998 or 2003.

The Web is about webs of durable readable content, not about ephemeral walled-garden apps.


Unfortunately GitHub seems to be pretty guilty of this too.

Lost count of the number of times I've clicked a link in a blog post only to end up with a GitHub 404; I'm sure it is only going to get worse.

The most annoying thing is that simply going to GitHub search and putting in the original 404'd repo name usually turns up the one I want.

I'm sure having the 404 serve up a "did you mean" search result would solve the issue for the most part.


> The Web is about webs of durable readable content, not about ephemeral walled-garden apps.

Facebook is the worst at this. They even require you to have an account and log in to read a public-facing community/restaurant Facebook page.


I would say "was supposed to be", not "is", but would otherwise agree.


One downside of this: I now feel like I can't create a proper place to keep my writing or other ideas until I carefully think of a URL scheme that I can maintain for eternity.


Just use random numbers.

This is meant more seriously than it might sound. But in the field of "persistent identifiers", there's a notion that language changes over time (a common example being the word "gay"), so introducing meaning into identification schemes might not be a good idea.


Alternatively, why not have both? Use the meaningful URL for most links, but have a randomly generated permalink that will always redirect to the same article.


Because other people will also link to it using the "meaningful" URL, and when that URL changes the links will break, and now you've defeated the whole purpose of using URLs that don't change.


Let your sitemap page provide a URL history database (directly accessible from any 404 page too!) for every non-permanent URL. That way you can find old permalink mappings for your links.


Or sequential numbers, if you don't need to hide anything from being discoverable that way. Or is there another downside to that?

I use "slugs" for category type nodes like "photos" or "blog", but I don't give individual "things" a slug (never liked the idea of putting the title into the URL just for SEO). And of course, everything that has a slug always also has the numerical ID and can be reached both ways.

Now that I think of it, I might add a table keeping track of which nodes each slug was used for, and in case of a 404 either redirect when there is only one option, or display all the nodes that once used that slug if there is more than one. Seems like that would be the next best thing to an actually never-changing URL, right?
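A sketch of that 404 fallback, with the slug history kept as a plain dict (all names made up):

    # Maps a retired slug to every node ID that ever used it.
    SLUG_HISTORY = {
        "photos": [17],          # unique: safe to redirect
        "blog": [3, 42],         # ambiguous: show the candidates instead
    }

    def handle_404(slug):
        nodes = SLUG_HISTORY.get(slug, [])
        if len(nodes) == 1:
            return ("redirect", "/node/%d" % nodes[0])
        if nodes:
            return ("disambiguate", ["/node/%d" % n for n in nodes])
        return ("not found", None)

    print(handle_404("photos"))
    print(handle_404("blog"))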


> Or sequential numbers, if you don't need to hide anything from being discoverable that way. Or is there another downside to that?

Sequential numbers either require state, which must be kept in sync (e.g. the current-highest index, or the set of all URLs used so far which you can take the "max" of); or else their discovery process is slow (i.e. keep looking up URLs until you find the highest unavailable one).

Random numbers don't need state; you can either make the space really huge, and ignore the potential for collisions; or else you can look up the generated URL to see if it's in use, and only have to perform another check in the unlikely case that it is.

Whilst it may be easy to maintain the state for sequential numbers (if, say, they already exist as database IDs), the failure mode of random numbers is better. If something goes wrong with the lookup process for random numbers, e.g. it might miss a collision, we're still protected by the incredibly low chance of collisions in the first place. For sequential numbers, we're guaranteed to hit a collision in this case, i.e. if a glitch prevents the state from getting updated, the next URL will definitely collide with the previous URL.
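A sketch of the contrast (random IDs drawn from a large space versus a sequential counter that must be kept in sync):

    import secrets

    def random_id(existing):
        # 16 hex chars = 64 bits: collisions are negligible, but check anyway.
        while True:
            candidate = secrets.token_hex(8)
            if candidate not in existing:
                return candidate

    def sequential_id(state):
        # Requires shared, correctly updated state; a missed update guarantees a collision.
        state["next"] += 1
        return str(state["next"])

    used = set()
    print(random_id(used))
    print(sequential_id({"next": 123455}))  # -> "123456"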


In Postgres, sequences are guaranteed to be monotonically increasing even in the presence of failed transactions. So retrieving an index using nextval()[2] will never collide[1].

[1] A malicious pedant could create situations where it wraps around or collides.

[2] http://www.postgresql.org/docs/current/static/functions-sequ...
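As an illustration, a minimal sketch of pulling IDs from such a sequence, assuming psycopg2 and a sequence named post_id_seq (both are assumptions):

    import psycopg2

    # Connection details and sequence name are placeholders.
    conn = psycopg2.connect("dbname=blog user=blog")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT nextval('post_id_seq')")
        new_id = cur.fetchone()[0]   # monotonically increasing, even across failed transactions
    print(new_id)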


> In Postgres, sequences are guaranteed to be monotonically increasing even in the presence of failed transactions.

Well, I was talking about what happens in case of failure; not the chances of a failure to begin with. If, hypothetically, Postgres forgot to update a sequence number, then a collision would be guaranteed, since that's a property of sequential numbering: all updates are contending for the same "next" number, and must be managed in some way. Random numbers spread across a large range, so work even without management.

Even if we ignore the possibility of bugs in Postgres, or cosmic rays flipping bits in RAM (which may have a similar probability to a random ID scheme colliding), etc. that's still only true inside the confines of Postgres.

Once data starts working its way through layer after layer of shell scripts, caches, proxies, etc. there's a lot of scope for things to go awry. Plus, since we're talking about longevity, there's no guarantee that we'll still be using Postgres in 20 years' time, or maybe we need to migrate or integrate with some other system, etc.

I'm reminded of the "one in a billion" chances that used to be claimed for DNA evidence in court. It is certainly true that the probability of two non-twins having identical DNA is that low, but that doesn't really mean much; even if the probability of identical DNA were zero, that doesn't tell us anything about the collision rate of the sub-set of bases which were used, or the error rate of the sampling process, or the mislabelling rate of the lab, or the corruption rate of the evidence handlers, etc. :)


Yeah, I was thinking of auto-increment database IDs actually :)


> never liked the idea of putting the title into the URL just for SEO

What about for indicating the likely content of a URL, when all someone has is that URL?


Oh, that's a good point. There's really something to be said for hovering a link and getting more than just random numbers and letters. And I know this as a user, but I never thought about the stuff I make in this light.. thanks!


Another reason why meaningful names are a terrible idea is because they're often valuable real estate which people will want to reclaim for something else later, or part of valuable real estate like some institutional website's directory hierarchy which the admins feel a compulsion to keep clean and logical and free of crufty old side-passages from years or decades ago.


I feel like the canonical URL for every page should be a UUID, with a database of which human-readable document names map to which UUIDs (and a history of changes should be provided via some tool, in case somebody links using the human-readable name and it gets replaced, perhaps via the sitemap page).


> I feel like the canonical URL for every page should be a UUID

Please, not just a UUID. I'd like to know what to expect before clicking a link.

I think nu.nl has found an interesting solution to this problem: every article has a canonical URL of the form $section/$articlenumber/$title.html -- however, the "title" part (everything between the last slash and .html) is ignored by the server, and you are pretty much free to alter it as you like. All three of these links resolve to the same article:

http://www.nu.nl/buitenland/4263268/

http://www.nu.nl/buitenland/4263268/amerikaanse-stad-moet-st...

http://www.nu.nl/buitenland/4263268/judge-orders-cleveland-m...
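A sketch of that routing pattern, assuming Flask (the article lookup is a stand-in):

    from flask import Flask

    app = Flask(__name__)
    ARTICLES = {4263268: "Judge orders Cleveland ..."}   # stand-in content

    @app.route("/<section>/<int:article_id>/")
    @app.route("/<section>/<int:article_id>/<slug>")
    def article(section, article_id, slug=None):
        # The trailing slug is decorative: only the article number decides what is served.
        return ARTICLES.get(article_id, "not found")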


I almost felt like adding that. I'm guessing appending a title after a # character would be the most technically simple solution?


Technically simple, yes, but semantically different. A link /path/to/page#title would imply (at least to me) that the page holds more information than only the #title article.


Well, if the language has changed so much that the URL is meaningless what are the odds the content will be comprehensible?


Why not store things by date, at least?


The creation date isn't always that relevant, and may in fact be confusing if the page is later updated.


You can always post it again under a new URI and redirect the original to the new more appropriate spot.


I, on the other hand, am looking forward to carrying this responsibility! I'm about to move my blog from Wordpress to a custom solution, and besides importing all the content, I plan to set up a URL router that matches all old posts' URLs and redirects them to the same posts in the new scheme. :)


There's RFC 4151 to keep in mind: the tag URI scheme, a scheme for URIs that is independent of URLs (but can use them too). One reason to use it would be to mark a page as being the same resource even though the domain or URL has changed.

http://www.taguri.org/

https://en.wikipedia.org/wiki/Tag_URI_scheme

I wrote a library for it in Ruby which is how I know about it.
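The format itself is simple enough to build by hand; a sketch following RFC 4151's authority,date:specific layout (the values here are made up):

    from datetime import date

    def tag_uri(authority, minted, specific):
        # e.g. tag:example.org,2016-05-17:/blog/cool-uris
        return "tag:%s,%s:%s" % (authority, minted.isoformat(), specific)

    print(tag_uri("example.org", date(2016, 5, 17), "/blog/cool-uris"))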


This is a cool scheme to address the problem of URL impermanence due to the fact that DNS name ownership/control can change over time. It allows you to specify a domain name plus a date.

I wrote a short blog post about the problem a few years back, and the Tag URI scheme ended up being one of the best solutions I came across, which is how I know about it. Some links in that post and comments may be of interest to people: https://masonlee.org/2009/08/21/is-the-web-sticky-enough/


I wonder if the footnote was also written in 1998:

Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness.


Nope. https://web.archive.org/web/19990508205057/http://www.w3.org...

It was added sometime between Nov. 27, 2001 and Dec. 14, 2001: https://web.archive.org/web/20011214140114/http://www.w3.org...

For those wondering like I was (sigh), the typos that appeared in the Hall of Flame footnotes when they were added in early 2000 were mostly fixed sometime in 2001, with the last typo ("uit") fixed in 2004.


That's somewhat ironic given how durable the word "cool" has been, despite the general trend of slang to go out of style fairly quickly (cf. def, bully, radical, bodacious, groovy, boss, all that, etc...).


Cool URLs still work in OmniWeb on a NeXT with a decades old bookmarks file: https://www.flickr.com/photos/osr/17082625625/lightbox

The Mondo 2000 interview bookmarks, not so much.


The example URL they list from the w3.org website (http://www.w3.org/1998/12/01/chairs) is now broken. A great deal of irony.


It requires authentication, so it's not broken, but it's still a poor example.


What about case sensitivity? https://www.w3.org/provider/style/uri doesn't work.


What about it? That's a different URI.


Pre-DMCA. If this gets an update, TBL should add legal reasons and law enforcement as reasons for URI change.


I've always figured an HTTP 410 response would be sufficient for that.


I'd still consider a 410 as a broken link. The article states

> Pretty much the only good reason for a document to disappear from the Web is that the company which owned the domain name went out of business or can no longer afford to keep the server running.


Not 451?


Shouldn't there be some tradeoffs?

Say I sign up with service x with the name y. My URL is www.x.com/users/y. Years later I delete my account. Someone else signs up with the name y. Now www.x.com/users/y goes to someone else's resources. The old URL is broken.

The only way to prevent this is to either give the user a URL (that they will want to share) that is not meaningful, or disallow anybody from signing up with a name that was ever in use, and names are a very limited resource.

Neither seems ideal. I do agree in principle that URLs shouldn't change, though.

Hotmail actually has this problem, or at least they used to. They delete accounts that are inactive for a long time and someone else can sign up with that name. The new person can get email addressed to the previous owner.


What this is about is content that gets moved to different addresses. Recycling addresses isn't that good either, but also not that bad.

What is really bad is when there is some document/resource/content that is linked to by others (be it in other web pages or in bookmarks), and then that document changes its address: all those references no longer point to the document, even though it is actually still available somewhere. An address being reused for something different isn't really all that much worse than the existing references leading to a 404. But either of those is bad if the original content is still available somewhere.


The service should not reuse your username for someone else. Or, perhaps, it should not reuse your user ID, and offer a username resolution service?


>Now www.x.com/users/y goes to someone else's resources. The old URL is broken.

The URL isn't broken if it just points to a different resource.


I'd rather it broke. If someone linked to the old URL, describing it as "the profile page of an international child murderer", and I later inherited that URL by signing up for the service, I wouldn't be too pleased.

The "y" example is confusingly simplistic; the site already has a "can't use duplicate usernames" rule, so 404'ing the original URL isn't 'worse' for anyone than if the original user didn't delete their account.


Sure, it's not technically broken, but it is semantically broken, as the meaning has changed.


"Except insolvency, nothing prevents the domain name owner from keeping the name."

Ah, the halcyon days of the Internet's adolescence.

This was before fiascos like Mike Rowe, of Canada, having his mikerowesoft.com taken away.


I just learned of this case. I would say he didn't have his domain name taken away, but voluntarily gave it up when he chose to settle (for an Xbox...). I've always found it a disappointing aspect of lawsuits that they end up in settlements so often; this would have formed an interesting precedent, since he might have had good chances of winning the case (real name, no intention of deception). Shouldn't there be some kind of rule that prevents lawsuits from being settled when continuing them is in the public interest or in the interest of justice?


So he just wasn't "cool", and therefore, neither were his URIs. :)

Working definition of "cool URI":

* All URIs currently resolving are "tentatively cool".

* Any URI that disappears or changes at any time isn't "cool" --- and never was: the "tentatively cool" designation was mistaken all along.


At least one of their example URLs on the page still points (eventually) to the same content: http://www.nsf.gov/cgi-bin/getpub?nsf9814

Although they seem to have not learned anything, and are now using .jsp instead of .pl.


It's tough!

I have a custom PHP app that includes marketing pages.

I'd like to crowbar Wordpress into the server to serve the marketing pages instead, to make it easier to change text over time.

A .htaccess set of redirect rules may indeed work, but it's hard work to keep all URLs working.


> It's tough!

It really depends on which tools you use. Yes, WordPress has a horrific URL routing mechanism, and on top of that it can randomly redirect 404 pages by guessing the target.


I wonder if TBL would have written the same article 10 years later when search engines had gotten much better.

It's still an inconvenience when a URL moves, but before the likes of Google it used to be a huge inconvenience, and it would often take tens of minutes to track down the new location; now it's on the order of tens of seconds at most.


That depends very much on the content. If you only have a URL and no hint of the content or title of that page, then finding it may prove difficult.


See for example any kind of documentation for any piece of hardware that's older than 5 years. The companies change their support page structure so often that it's incredibly hard to track anything down.

Cynic in me thinks they're doing it on purpose.


Those are particularly nasty because model numbers are often so arbitrary that a few years down the line searching for an old model number often becomes a near impossible quest of wading through stuff related to newer models with nearly the same model number.


It's a place where metadata can prove useful.

Sci-Hub works largely through an applied identifier, DOI (though URLs also work), which is a unique-per-article identifier present since the 1990s. Earlier gets a bit difficult.

Title, author, date, and publisher (you know, what your uni essay instructor always insisted on for footnotes) are pretty good identifiers, and will likely be meaningfully unique. Adding a publication location (as in a newspaper byline) also helps.

Ironically, many news organisations fail to include such information not only on individual articles but anywhere apparent on their website. You'll get city, and possibly county, but not a state or province or country. This in an age of, literally, worldwide access.

Otherwise, The Internet Archive does yeoman's work in creating a permanent record, where they're allowed to.


Oh well, Hall of fame #1 leads to a 404 for both links. So, I propose that it should be fair to call it (Hall of fame #1)².


It's the Hall of flame. They are examples of URIs that were changed for no good reason.


The idea that something cool would stay cool forever sounds like an oxymoron.

For example, Yahoo.com has remained the URI for Yahoo's homepage and will continue to be until it's the 404 page. Yahoo.com is not cool.


No, the idea is that URLs that don't change are cool.


>> "No, [repeat idea]."

Is not a meaningful response to the claims I presented.

At the point where I'm able to swap one resource for another, and the only reason to believe that a given resource is in fact the resource I referenced is that the unique identifiers are the same, the identifier is in fact useless as an identifier.

In fact, I would claim that the whole idea of URIs as currently used is malformed, since it fails to allow a user to know that the referenced resource is in fact the resource received.


The "cool URLs" idea doesn't (AFAIK) cover mutability and temporal issues.

In 1998, "yahoo.com" referred to the homepage of Yahoo!. In 2016, "yahoo.com" still refers to the homepage of Yahoo!. The content of the page isn't the same, but it's still the same 'entity' (the homepage of Yahoo!). Hence, despite Yahoo! being uncool, and the contents of the Yahoo! homepage being uncool, the URL of the Yahoo! homepage is cool.

It would be nice if URLs had a built-in capability to reference immutable versions of things, similar to git references. This is done on niche networks like IPFS (I think) and Freenet. It's easy to imagine a DNS-like layer which sends "pet names" like "yahoo.com" to the latest version, whilst leaving "raw" names alone (like DNS does for IP addresses).

With that said, it would also be nice if DOM elements got their own URLs which user agents could load instead of the whole page (imagine anchors but using something like XPath). That would make hyperlinks more useful and bring us a step closer to Xanadu-like behaviour, e.g. rather than copying text from a page and wrapping it in quotation marks, you could embed that element inside (a quotation element inside) your page.


Yahoo simply changing their name might make them more cool and reflect that they've changed. Having a page forever saying Yahoo is now _______ would be lame and defeat the point of changing their name.


>Idea that something cool would stay cool forever sounds like oxymoron.

An oxymoron is an idea whose definition competes with itself.

Nothing about coolness is against "staying forever" as such.

An actual oxymoron would be that something can be a "permanent fad" (as the nature of fads is to be transient).

Or that "it's cool to be uncool".



