While the main concept (Don't change your URIs!) is good, I can't agree at all with their advice on picking names, in particular the 'what to leave out' section. No subject or topic? The justification for this is flimsy at best: 'the meaning of the words might change.' So what? People cope with this all the time in other media, e.g. old books. It's not too confusing. What's more confusing is a URI that has all the meaning removed; after all, this whole URI discussion is about the human appearance of URIs. Take out the topics and you are just left with dates, numbers and unspecific cruft. If I were designing a company's website, I'm sure as hell going to put the product pages under '/products'.
FWIW, the document's own URI is terrible: 'https://www.w3.org/Provider/Style/URI' - who could have any idea what the page is about from that? And what if the meaning of the word 'Provider' or 'Style' changes in x years from now? :) You could argue that the meaning/usage of 'URI' has already changed, because practically no-one uses that term any more. Everyone knows about URLs, not URIs. Not many people could tell you what the difference was. So the article's URI has already failed by its own rules.
IMO that's a pretty good URL. For example, if you drop it often in conversations, you can remember it, since it's short enough and has no numbers or awkward characters. I would have preferred lower case, and if you try it with lower-cased letters it doesn't work, but other than that ...
No, a URL doesn't necessarily have to give you the title of the article, even if having some related words in it might be good for SEO value. If you paste it in plain text or similar, just add a description alongside it, e.g. 'Cool URIs don't change: https://www.w3.org/Provider/Style/URI'.
I think the short gist of it is: naming things is a hard problem.
I think you both stumbled upon a fundamental part of the discussion: the tension between finding a way to identify resources (or concepts, or physical things) in a unique and unambiguous fashion, and affordances provided by natural language that allow human minds to easily associate concepts and labels with the things they refer to.
The merit of UUID's, hashes or any other random string of symbols which falls outside of the domain of existing natural languages is that it doesn't carry any prior meaning until an authority within a bounded context associates that string with a resource by way of accepted convention. In a way, you're constructing a new conceptual reference framework of (a part of) the world.
The downside is that random strings of symbols don't map to widely understood concepts in natural language, making URL's that rely on them utterly incomprehensible unless you dereference them and match your observation of the dereferenced resource with what you know about the world (e.g. "Oh! http://0x5235.org/5aH55d actually points to a review of "Citizen Kane")
By using natural language when you construct a URL, you're inevitably incorporating prior meaning and significance into the URI. The problem is that you then end up with the murkiness of linguistics and semantics, and with all kinds of weird word plays if you let your mind roam entirely free about the labels in the URI proper.
For instance, there's the famous painting by René Magritte, "The Treachery of Images", which correctly points out that the image is, in fact, not a pipe: it's a representation of a pipe. [1] By the same token, an alternate URI to this one [2] might read http://collections.lacma.org/ceci-nest-pas-une-pipe, which is incidentally correct as well: it's not a pipe, it's a URI pointing to a painting that represents a physical object - a pipe - with the phrase "this is not a pipe."
Another example would be that a generic machine doesn't know if http://www.imdb.com/titanic references the movie Titanic or the actual cruise ship, unless it dereferences the URI, whereas we humans understand that it's the movie because we have a shared understanding that IMDB is a database about movies, not historic cruise ships. Of course, when you build a client that dereferences URI's from IMDB, you basically base your implementation on that assumption: that you're working with information about movies.
Incidentally, if you work with hashes and random strings, such as http://0x5235.org/5aH55d, your client still has to be founded on the fundamental assumption that you're dereferencing URI's minted by a movie review database. Without context, a generic machine would perceive it as a random string of characters which happens to be formatted as a URI, and dereferencing it just gives a random stream of characters that can't possibly be understood.
It's an interesting topic. I agree with you that identifiers can be intended for humans or machines, and there are often different features to optimize for depending on which. URIs are the strange middle ground where they include the pitfalls of having to account for both humans and machines.
In an interesting way, each individual website has to come up with its own system for communication. It may be a simple slug (/my-new-blog/), or it may be an ID system (?post=3). It could be something else completely.
There is some value in offering that creativity, but a system where URIs are derived from content also makes a lot of sense to me. You mentioned a hash which I think is the right idea.
It seems reasonable enough that URIs could take inspiration from other technologies like git, or even (dare I say) blockchain. This leads naturally to built-in support for archiving of older versions, as content is diffed between versions.
There are some fun problems to think about, like how to optimize the payload for faster connections, then generate reverse diffs for visiting previous versions. Or whether browsers should assume you always want the newest version of the page, and automatically fetch that instead.
This solves some problems, and creates many others. Interesting thought experiment anyway.
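A minimal sketch of that content-derived idea (hypothetical base URL, not any existing scheme): hash the document body to get the identifier, and keep a separate, mutable name-to-hash index for "latest", much like a git branch.

    import hashlib

    # Content-addressed URI sketch: the path is the SHA-256 of the document body,
    # so an identifier can never drift away from the content it names.
    def content_uri(body: bytes, base="https://example.org/c/"):
        return base + hashlib.sha256(body).hexdigest()

    v1 = content_uri(b"Hello, web.")
    v2 = content_uri(b"Hello, web, revised.")
    # v1 keeps resolving to the old text even after an edit; a separate mutable
    # index (name -> newest hash) is what points readers at the latest version.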
This is one of those classic, foundational documents about the Web. But it's rarely followed. Tool use has come to dominate the form that URIs take; tools are used both for delegation and to absolve humans from crafting URIs by hand. Switching tools frequently ruins past URIs.
Additionally, widespread use of web search engines has made URI stability less relevant for humans. Bookmarks are not the only way to find a leaf page by topic again. A dedicated person might find that archival websites have preserved content at its old URIs.
Some of this is allowed to happen because the content is ultimately disposable, expires, or is only relevant to a limited audience. Some company websites are little more than brochures. Documents and applications that are relevant within organizations can be communicated out of band. Ordinary people and ordinary companies don't want to be consciously running identifier authorities forever.
The web has evolved well beyond what it was envisioned to be at the time this was written - a collection of hyperlinked documents.
The reason for the eventual demise of the URL will simply be the fact that the concept of "resource" will just not be sufficient to describe every future class of application or abstract behavior that the web will enable.
It depends on how you define a "resource" and what value you attribute to that resource. And this is exactly the crux: this is out of the scope of the specification. It's entirely left to those who implement URI's within a specific knowledge domain or problem domain to define what a resource is.
Far more important than "resource" is the "identifier" part. URI's are above all a convention which allows for minting globally unique identifiers that can be used to reference and dereference "resources", whatever those might be.
It's perfectly valid to use URI's that reference perishable resources that only have a limited use. The big difficulty is in appreciating resources and deriving how much need there is to focus on persistence and longevity. Cool URI's are excellent for referencing research (papers, articles,...), or identifying core concepts in domain specific taxonomies, or natural/cultural objects, or endorsement of information as an authority,...
The fallacy, then, is reducing URI's to the general understanding of how the Web works: the simple URL you type in the address bar which allows you to retrieve and display that particular page. If Google et al. end up stripping URL's from user interfaces, and making people believe that you don't need URI's, inevitably a different identifier scheme and a new conceptual framework will need to be developed just to be able to do what the Web is all about today: naming and referencing discrete pieces of information.
Ironically, you will find that such a framework and naming scheme will bear a big resemblance to the Web, and solve the same basic problems the Web has been solving for the past 30 years. And down the line, you will discover the same basic problem Cool URI's are solving today: that names and identifiers can change or become deprecated as our understanding and appreciation of information changes.
I don't think it has evolved. I feel that it became more like a hack, on top of a hack, on top of another hack, and so on.
In the late 90's - early 2000's, HTML started being pushed into fields that, in my opinion, were unrelated (remember Active Desktop?). Before you had time to react, HTTP and XML were being used to pass data between applications. At the time I was already doing embedded stuff and I remember being astonished to learn that I had to code an XML parser and HTTP server/stack on my small 16-bit micro because some jerk thought it was a good idea to pass an integer using SOAP, for example.
In the meantime, HTML was being dynamically generated, and then dynamically modified in the browser, and then modified back in the server using the same thing you use to modify it in the browser. It's a snowball that will implode, sooner or later.
"a hack, on top of a hack, on top of another hack, and so on" is evolution.
My HN username may be a case in point, drawing from a selection of twice five[0] digits due to legacy code of Hox genes: https://pubmed.ncbi.nlm.nih.gov/1363084/
[0]
"This new art is called the algorismus, in which /
out of these twice five figures /
0 9 8 7 6 5 4 3 2 1, /
of the Indians we derive such benefit"
You might see the Homer Simpson Car[0] and call it evolution too. But what I see is a mess, described as a sequence of hacks and bad decisions, just like HTML (and web stuff) today.
(1) some operators only care about a handful of the URLs under their domain;
(2) hardly anyone uses link relations, so most links are devoid of semantic metadata and are essentially context-free, requiring a human to read the page and try to guess the purpose of the link;
(3) so many 'resources' are now entire applications, and the operators of these applications sometimes find it undesirable to encode application state into the URI, so for these you can only get to the entry point -- everything else is ephemeral state inside the browser's script context.
But I disagree with the statement that "the reason for the eventual demise of the URL will simply be the fact that the concept of 'resource' will just not be sufficient enough to describe every future class of application or abstract behavior that the web will enable."
URIs are a sufficient abstraction to accommodate any future use-case. It's a string where the part before the first colon tells you how to interpret the rest of it. It'd be hard to get more generic, yet more expressive.
The demise of URLs, if it ever comes to pass, will be due to politics or fashion: e.g. browser vendors not implementing support for certain schemes, lack of interoperability around length limits, concerns about readability and gleanability, and vertical integration around content discovery.
The web has evolved well below what it was envisioned to be 20 years ago. I can't think of a single Web-based activity I do that is not a significantly worse experience now than it was in the past.
Rhetorical question: Why must we charge annually to control domains? Should we stop doing this in the name of greater URL stability?
The article states early on, “Except insolvency, nothing prevents the domain name owner from keeping the name.” As it turns out, insolvency is a pretty significant source of URL rot, but so is non-renewal of domains by choice or by apathy, whether for financial or mere personal-energy reasons (“who is my registrar again? Where do I go to renew?”), especially by individuals. You start a project and ten years later your interest has waned.
Domains are an increasingly abundant resource as TLDs proliferate. Why not default to a model where you pay once up front for the domain, and thereafter continued control is contingent on maintaining a certain percentage of previously published resources, and if you fail at that some revocable mechanism kicks in that serves mirrored versions of your old urls. Funding of these mirrors comes from the up front domain fees. Design of the mechanism is left as an exercise for the reader :-)
It's true that domains shouldn't be free, but it's a pity the money ends up piling up at ICANN. If I understand correctly, they have hundreds of millions of dollars just sitting around, on account of their monopoly.
Blogger has been serving urls for something like 17 years. I’d wager its sites have something like 2x or more average url lifespan at this point than the typical site. What we want right now is more url stability not perfect assurance of 100 year url lifespan. Don’t let the perfect be the enemy of the good.
Charging a small annual fee to me seems to be a much more elegant solution than any sort of domain monitoring system. It is a very simple way to make defunct domains available again and provide some resistance against one person registering massive amounts of domains.
It works OK to recover unused domains (but definitely not perfectly), but it does mean nearly all URLs get broken eventually, even if the content is still archived somewhere.
Yes. Have you tried to do that even for moderately complex sites?
I have tried to do it a few times, and eventually just gave up. Carrying forward bad naming decisions from the past is a tremendous effort. When cleaning up the house, I also don't leave around sticky notes at the places where I removed documents from.
On top of this:
- When using static site generators, it's not even possible to do 301 redirects (you would have to use an ugly, slow JS version).
- It does not help if you don't own the old DNS name anymore.
> When using static site generators, it's not even possible to do 301 redirects (you would have to use an ugly, slow JS version).
That isn't always true, depending on your choice of web server. You can use mod_rewrite rules in Apache's .htaccess files, so if your generator is aware of previous URLs for given content it could generate these to 30x redirect visitors and search bots to the new location.
Off the top of my head I'm not aware of a tool that does this, but it is certainly possible. It would need to track the old content/layout so you'd need the content in a DB with history tracking (or a source control system) or the tool could leave information about previous generations on each run for later reading. Or it could simply rely on the user defining redirect links for it to include in the generated site.
Of course, if you are using a static site generator for maximum efficiency you probably aren't using Apache with .htaccess processing enabled! I suppose a generator could also generate a config fragment for nginx or similar, though that would not be useful if you are publishing via a web server where you don't have sufficiently privileged access to change the configuration.
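To make that concrete, here's a sketch of the kind of fragment a generator could emit, assuming Apache with .htaccess enabled and using hypothetical old/new paths:

    # Hypothetical .htaccess fragment emitted alongside the generated site.
    # A 301 tells browsers and crawlers the move is permanent.
    Redirect 301 /2019/my-old-post.html /posts/my-old-post/
    Redirect 301 /about.php /about/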
I have done this for a moderately complex site, it was a bit of work, but not the end of the world. I'm sure some things went missing, but we got 99% of it, which I consider successful enough.
You can do 301s statically, by generating whatever your particular server's version of an .htaccess file is. Or, you can generate the HTML files with a meta-redirect in place.
The DNS is obviously an issue, but that's not really relevant. The article is advocating for URLs not changing. It's not saying that they mustn't change, just that it's really cool for everyone if they don't.
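For the fully static case with no access to server config at all, the meta-redirect stub mentioned above might look something like this (hypothetical paths); it isn't a true 301, but it gets humans and most crawlers to the new location:

    <!DOCTYPE html>
    <!-- Stub left behind at the old URL, e.g. /2019/my-old-post.html (hypothetical) -->
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url=/posts/my-old-post/">
    <link rel="canonical" href="/posts/my-old-post/">
    <p>This page has moved to <a href="/posts/my-old-post/">/posts/my-old-post/</a>.</p>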
A classic way to do these redirects is on the front web server itself: .htaccess, nginx config, etc.
When you change the structure of your urls, you can generally generate redirect rules to translate old urls to the new structure. Or run a script to individually map each old url to its new one.
Note: I've never done the latter for more than a few hundred URLs, so I don't know if it scales well for a very large site.
> When cleaning up the house, I also don't leave around sticky notes at the places where I removed documents from.
This is a poor analogy. Perhaps “I’m a librarian for a library with thousands or millions of users, and when I rearrange the books, I don’t leave sticky notes pointing to the new locations”
I don't know about this specific website (or if it even exists), but the 981212 part of the "link" looks like the identifier to me. The way many sites are set up, most of the link is "locating", but it also contains a unique "identifying" component (page/post/item id). You can remove almost all of the locating parts and the identifier still works so the link can be resistant to everything from just a title change to a complete restructuring (as long as the IDs are kept).
The text below that example says that the .html ought not to be there. That's clearly not intended to be part of what that example is demonstrating, but I guess it's just there because they were going for real world examples.
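To picture that robustness (an illustrative sketch, not any particular site's routing): resolve requests by the trailing numeric ID and treat everything before it as disposable 'locating' decoration.

    import re

    # The trailing number is the identifier; the rest of the path (and an
    # optional .html suffix) is just locating decoration that may change.
    ID_PATTERN = re.compile(r"/(\d+)(?:\.html)?/?$")

    def resolve(path):
        """Return the embedded item ID, or None if the path has none."""
        m = ID_PATTERN.search(path)
        return int(m.group(1)) if m else None

    # A hypothetical pre- and post-reorganisation URL both land on the same item:
    assert resolve("/money/moneydaily/1998/981212.html") == 981212
    assert resolve("/archive/daily/981212") == 981212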
The arbitrary path hierarchy is not so bad. Better than every URI just being https://domainname.com/meaninglesshash. You can also stick a short prefix in front, like https://domainname.com/v1/money/1998/etc, so that all documents created after a reorg can use a different prefix. If your reorg is so severe that there's no way to keep access to old documents under their old URI, even if it has its own prefix, it seems unlikely they'll be made available in any other location. In that context you can imagine the article is imploring you "please don't delete access to old documents".
Your remaining objections, for host name and access, boil down to "don't use URIs at all, and don't bother to avoid changing them". As I type this comment I'm starting to realise that was your whole point, but it was a bit buried alongside minor objections to this particular example. It's also perhaps a bit of an extreme point of view. Referencing a git hash alongside a URI is sensible, but on its own it's pretty useless, and many web pages won't have anything analogous.
I'd say the most excusable part to change is the protocol, but of course that generally ends up being a 301, even though the URI has indeed changed.
Hostname, well perhaps if a company has been merged/sold.
Path/query is really down to information architecture and planning that early on can go a long way, e.g. contact, faq belonging in a /site subdirectory.
File extension doesn't really matter nowadays.
The main thing is that there's no technical reason for the change. I recently saw someone wanting to change the URLs of their entire site because they now use PHP instead of ASP. They could configure their web server to handle those pages with PHP and save the outside world a redirect and twice as many URLs to think about.
I really wish HTTPS hadn't changed the URL scheme so you could host both HTTPS and fallback HTTP under the same URL. However most HTTPS sites will redirect http://domain/(.*) to https://domain/$1 (or at least they should) so this doesn't need to break URLs.
> This is really a description where to find a document ("locator" not "identifier").
This is excellent. I wish more people would make your distinction between URL and URI. URIs really are supposed to be IDs. When put in that parlance, it's hard to say that IDs should change willy-nilly on the web. That said, I think that does deprioritize a global hierarchy / taxonomy for a fundamentally graph-like data structure.
> If you want something that does not change, don't link to a location but link to content directly
I see motivation for this, but I've personally found it to be just as problematic as blurring the distinction between URIs and URLs. Most "depth" and hierarchy that's in URLs is stuff that ideally would be in the domain part of the URL. For instance, a blog would live on a "blog" subdomain rather than under a /blog path, and the "blog" subdomain would be owned by a team. You could imagine "payments", "orders", or whatever combo of relevant subdomains (or sub-subdomains). In my experience this hierarchical federation within an organization is not only natural, it's inevitable: Conway's Law.
So I do very much believe that the hierarchy of content and data is possible without needing a flat keyspace of ids. Just off the top of my head, issues with the flat keyspace are things like ownership of namespaces, authorization, resource assignment, different types of formats/content for the same underlying resources etc. Hierarchies really do scale and there's reason for them.
That said, most sites (the effective 'www' part of the domain) are really materialized _views_ of the underlying structure of the site/org. The web is fundamentally built to do this mashup of different views. Having your "location" be considered a reference "view" to the underlying "identity" "data" would go a long way to fixing stuff like this.
> Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness. In the rush to stake out DNS territory, the choice of domain name and URI path were sometimes directed more toward apparent "coolness" than toward usefulness or longevity. This note is an attempt to redirect the energy behind the quest for coolness.
It's 2020 and "cool" still has that same meaning, as an informal positive epithet. I believe "cool" is the longest surviving informal positive epithet in the English language.
"Cool" has been cool since the 1920s, and it's still cool today. "Cool" has outlived "hip," "happening," "groovy," "fresh," "dope," "swell," "funky," "bad," "clutch," "epic," "fat," "primo," "radical," "bodacious," "sweet," "ace," "bitchin'," "smooth," and "fly."
My daughter says things are "cool." I predict that her children will say "cool," too.
Sophisticated used to mean false, as in sophistry: with intent to deceive. So a sophisticated wine was an adulterated wine that had something other than fermented grape juice in it.
"Silly" is the standard example of semantic shift over what people generally perceive to be a pretty extreme distance: https://www.etymonline.com/word/silly
I second this. Linguistics and "slang" are pretty interesting in their own right. I believe that "cool" is more universally used. Usually "smooth" is used to describe an action that someone did; I haven't really heard it being used in place of "cool".
I tend to agree, smooth is IME also usually (but definitely not entirely) used sarcastically, when someone does something accidentally silly, like bumping into a glass door or similar.
It is also very international: it can be used in Europe and East Asia with the same meaning, probably globally in fact. Heck, the Japanese government started a "Cool Japan" program a few years ago, and it has been borrowed as 酷 (kù) in Chinese. That’s cool.
Schroeder's Fractals, Chaos, Power Laws points out that there are probability distributions which follow the Lindy effect, and suggests they be used for project planning ("the longer an engineering task remains unfinished, the longer it will probably take to finish").
The expression "fetch", popularized by the 2004 American cultural meta-documentary and seminal work "Mean Girls", is in fact an abbreviation of the term "fetching", i.e. attractive in the British vernacular.
"Sweet" is probably more common than "cool" in New Zealand, although it's usually tied into "sweet as", as in "that's a sweet as car" or "the weather is sweet as today".
"fresh" in the vein of "cool" was definitely already in the list of "dad phrases" when I was in school. I think the usage as shorthand for "breath of fresh air" such as used in product reviews is distinct. "Sweet" is getting there faster than "cool" is. "Dope" as cool never took off here due to a local meaning of "dope" as "idiot".
One thing I have been wondering about - speaking of changing URIs, did they (W3C) change/merge the domain name from w3c.org to w3.org at some point? Some old documents seem to point to w3c.org instead of w3.org. (e.g. http://www.w3c.org/2001/XMLSchema) Not that it hugely matters, the old (?) w3c.org links still work, since they are redirected anyway.
According to WHOIS, w3c.org is from 1997 while w3.org is from 1994.
A message from a W3C staff member on a W3C mailing list on 1999-06-21 mentions [1] that w3c.org should redirect to the corresponding page at w3.org, and the latter is considered the 'correct' domain.
This is a great link and I think I’ll share it with people. I find that I struggle to explain why URIs shouldn’t change because it’s so ingrained in me.
One of my pet peeves with OneDrive is that if I move a file it changes the URI. So any time someone moves a file, it breaks all the links that point to it. Or if they change the name from foo-v1 to foo-v2. I wish they’d adopt Google Docs.
I wish operating systems managed files in a similar way. Ideally filesystems would be tag-based [1] rather than hierarchy-based. This would make hyperlinks between my own personal documents much easier and time-resistant as my preferences for file organization change.
macOS does this. Native Mac apps somehow can preserve file references even after the source file has been moved or renamed. The unfortunate part, however, is that many cross-platform apps aren't written using the Mac APIs, which leaves an inconsistent experience.
I think it's for reasons like this that many mac users strongly prefer native apps over Electron or web apps.
Could have fooled me with regard to Windows. I'm unfortunately not sure what a "native" Windows app is at this point. They've gone through so many frameworks over the years, everything is a mish-mosh.
And this isn't just a result of legacy compatibility. If you are a developer today, and you want to make a really good Windows app, what approach do you take? Is it obvious?
On Windows it's just a resource hog. On Linux and Mac they stick out like a pimple on a pumpkin. The number one annoyance for me is that because they are based on Chromium, which doesn't have Wayland support, all Electron apps do not DPI-scale properly with multiple monitors.
I've encountered this disconnect between web documents and file systems a few times - in Windows at least, moving or renaming a file also changes the URI (unless you check the journal, how can you know that C:\test.txt that was just moved to D:\test.txt is the same document?), so it's hard to argue why doing it over HTTP should be any different...
Content-addressing helps a ton. I wish links on the web, especially, had been automatically content-addressed from the start. Would have helped a bunch with caching infrastructure and fighting link-rot. Oh well.
Does make updating more awkward, and you still need some system of mapping the addresses to friendly names.
The problem boils down to this question: What is an URI actually referencing? Does it identify a discrete piece of information (e.g. a text) in the abstract sense? Or does it identify a specific physical and/or digital representation of that information?
Within the context of digital preservation and online archives, where longevity and the ephemeral nature of digital resources are at odds, this problem is addressed through the OAI-ORE standard [1]. This standard models resources as "web aggregations" which are represented as "resource maps", which are in turn identified through Cool URI's.
It doesn't solve the issue entirely if you're not the publisher of the URI's you're trying to curate. That's where PURL's (Persistent URL's) [2] come into play. The idea is that an intermediate 'resolver' proxies requests for Cool URI's to destinations around the Web. The 'resolver' stores a key-value map which requires continual maintenance (yes, at its core, it's not solving the problem, it's moving the problem into a place where it becomes manageable). An example of a resolver system is the Handle System [3].
Finally, when it comes to caching and adding a 'time dimension' to documents identified through Cool URI's, the Memento protocol [4] reuses existing HTTP headers and defines one extra one.
Finding what you need via a Cool URI then becomes a matter of content negotiation. Of course, that doesn't solve everything. For one, context matters and it's not possible to figure out a priori the intentions of a user when they dereference a discrete URI. It's up to specific implementations to provide mechanisms that capture that context in order to return a relevant result.
If you have sequential pages, I don't like dates in the URIs. For example if you have something spread over 5-pages (e.g. a 5-part blog post), I should be able to guess the URIs for all 5 parts just given one. Dates mean that I cannot do that.
In the early days, before the spam, a post would create pingbacks at some well-known-url, so post #2 would create a pingback link at post #1 if you referenced it.
> I didn't think URLs have to be persistent - that was URNs.
This is probably one of the worst side-effects of the URN discussions. Some seem to think that because there is research about namespaces which will be more persistent, that they can be as lax about dangling links as they like as "URNs will fix all that". If you are one of these folks, then allow me to disillusion you.
Most URN schemes I have seen look something like an authority ID followed by either a date and a string you choose, or just a string you choose. This looks very like an HTTP URI. In other words, if you think your organization will be capable of creating URNs which will last, then prove it by doing it now and using them for your HTTP URIs. There is nothing about HTTP which makes your URIs unstable. It is your organization. Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.
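A minimal sketch of that suggestion, with made-up identifiers and filenames, using Python's standard http.server: the stable path is looked up in a table and whatever file currently holds the content is served, so files can be renamed or reorganised without the URI ever changing.

    import http.server
    import urllib.parse

    # Hypothetical map: stable document identifier (the URI path) -> current filename.
    # Reorganise the files all you like; only this table has to be updated.
    ID_TO_FILE = {
        "/1998/12/01/chairs": "content/chairs-19981201.html",
        "/2020/06/cool-uris": "content/cool-uris-v3.html",
    }

    class StableURIHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            path = urllib.parse.urlparse(self.path).path
            filename = ID_TO_FILE.get(path)
            if filename is None:
                self.send_error(404)
                return
            with open(filename, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    # http.server.HTTPServer(("", 8000), StableURIHandler).serve_forever()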
Did this fail as a concept? Are there any active live examples of URNs?
URN namespace registrations are maintained by IANA [1].
One well-known example is the ISBN namespace [2], where the namespace-specific string is an ISBN [3].
The term 'URI' emerged as somewhat of an abstraction over URLs and URNs [4]. People were also catching on to the fact that URNs are conceptually useful, but you can't click on them in a mainstream browser, making their out-of-the-box usability poor.
DOI is an example of a newer scheme that considered these factors extensively [5] and ultimately chose locatable URIs (=URLs) as their identifiers.
The most common place I have seen them is in XML namespaces, eg in Jabber like xmlns='urn:ietf:params:xml:ns:xmpp-streams'
When a protocol ID is a URI it is common to use a URL rather than a URN so that the ID can serve as a link to its own documentation.
There is a bonkers DNS record called NAPTR https://en.wikipedia.org/wiki/NAPTR_record which was designed to be used to make the URN mapping database mentioned towards the end of your quote, using a combination of regex rewriting and chasing around the DNS. I get the impression NAPTR was never really used for resolving URNs but it has a second life for mapping phone numbers to network services.
GS1 (the supermarket item barcode people) integrated their global-trade system that’s used by literally everyone with URN URIs - I worked with them when doing an RFID project a few years ago. In practice it meant just prefixing “urn:” to everything - it felt silly.
That is nice in theory, but in practice stuff like archive.org is vital. If you see a document you want to refer to later, you need to archive it, either in a personal archive or via archive.org.
There are too many moving parts to trust that even domain names will stay the same. See GeoCities and Tumblr for recent examples. If you want a document, you should have archived it.
The article isn't arguing that URIs don't change; it's arguing that they shouldn't. (The part involving judgement is elsewhere in the title—the word 'Cool'—so it can certainly seem like an assertion of fact rather than of value at a glance.) It thus seems to me that the response "in practice, URIs do change" doesn't undermine that point; your discussion of the need for some solution to the problem rather supports their point—if URIs didn't change, then there wouldn't be a problem to be solved.
(Or maybe your point was deeper, that one not only can't trust that the resource location won't change but even that the resource itself will still be available somewhere? That is true, too! But saying that archive.org is the solution is just making one massively centralised point of failure. That doesn't mean that we shouldn't have or use archive.org, but that we should regard it as just the best solution we have now rather than the best solution, full stop.)
The problem with URIs is that they weren't foreseen as the gateway to a whole slew of web applications, whose URIs may live no longer than the single request they serve. There is a continuum here from long-lived useful URIs all the way to ephemeral ones.
And then there are the URIs that aren't even made for human consumption, ridiculously long, impossible to parse or pass around. Another class is those that get destroyed on purpose. Your favorite search engine should just link to the content. Instead they link to a script that then forwards you to the content. This has all kinds of privacy implications as well as making it impossible to pass on for instance the link to a pdf document that you have found to a colleague because the link is unusable before you click it and after you click it you end up in a viewer.
> Your favorite search engine should just link to the content. Instead they link to a script that then forwards you to the content. This has all kinds of privacy implications as well as making it impossible to pass on for instance the link to a pdf document that you have found to a colleague because the link is unusable before you click it and after you click it you end up in a viewer.
Good for you. Now try it a number of times instead of just once and you'll see they insert their 'click count' script in there a very large fraction of the times.
Whenever I see a person or API use URI instead of URL I feel like I'm in an alternate universe. Turns out the distinction is that URIs can include things like ISBN numbers, but everything with a protocol string is a URL so really URL is probably the right term for most modern uses.
To be clear, the difference is that an URI generally only allows you to refer to a resource ("Identifier"), whereas an URL also tells you where to find and access it ("Locator").
For instance, `https://example.com/foo` tells you that the resource can be accessed via the HTTPS protocol, at the server with the hostname example.com (on port 443), by asking it for the path `/foo`. It is hence an URL. On the other hand, `isbn:123456789012` precisely identifies a specific book, but gives you no information about how to locate it. Thus, it is just an URI, not an URL. (Every URL is also an URI, though.)
I agree, but the confusion will continue. I read maybe a decade ago that URLs are just for network content and came away with the understanding that while URIs could include any "protocol" (like file: or smb: or about:) URLs were more specific. And thus if you wanted to talk about protocol agnostic locations, you should use URI. But that was totally wrong!!
At the end of the day, there is no clarity, so just use the term that will be best understood by the person you are talking to. URL is a good default, probably even for "about:".
Regarding unchanging URLs, there is also an argument for your original version with just the ID.
The Amazon URL that includes the title should be fairly stable, but if you look at e.g. a Discourse forum URL you see it contains the topic title, which can change at any time and then the URL changes with it. The old URL still works, because Discourse redirects, but this can't be taken for granted.
So Discourse then has these URL's referring to the same topic:
But... this link works. Everything after /B0849MPK73/ is because you reached that product page through search, and it stores the search term in the URL. You can remove it and the site works just fine.
If you’re interested in taking this to a new level, you should check out initiatives like handle.net (technically it’s like a URL shortener, but there’s an escrow agreement you need to sign first to make sure that the URLs stay available), PURL and w3id.org (which allow for easy moving of whole sites to a new domain name), and of course https://robustlinks.mementoweb.org/spec/
* Simplicity: Short, mnemonic URIs will not break as easily when sent in emails and are in general easier to remember.
* Stability: Once you set up a URI to identify a certain resource, it should remain this way as long as possible ("the next 10/20 years"). Keep implementation-specific bits and pieces such as .php out, you may want to change technologies later.
* Manageability: Issue your URIs in a way that you can manage. One good practice is to include the current year in the URI path, so that you can change the URI-schema each year without breaking older URIs.
I'm in the midst of moving a website from mediawiki to a bespoke solution for hosting the data which will enforce structure on what's being presented. In the process, URLs will change, but, part of the migration is setting things up so that, for example, if someone goes to http://www.rejectionwiki.com/index.php?title=Acumen they will be redirected automatically to http://www.rejectionwiki.com/j/acumen so old links will always work. This seems a minimal level of backwards compatibility (although I wonder if there is any specific protocol for how to implement this that will keep search engine mojo—but not a lot because the site gets most of its traffic from word of mouth between users).
The point of the article is that someone visiting the old URL should see the old resource, as opposed to a 404, an error, or some different content. If you can't keep the old URL, the second best thing to do is a redirect. (EDIT: I guess, being pedantic, the point is to design the URLs so you don't need to change them later, but "get it perfect the first time" is kinda useless advice :-)
This is what 301 HTTP status (permanent redirect) should be for... [1] So it seems to me if you use 301 you should be good to go.
Also from a quick search it seems the recommended thing to do is remove the old URLs from your sitemap.
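For the MediaWiki-style migration described above, a hedged sketch of what that could look like with Apache mod_rewrite in the vhost config (the int:tolower RewriteMap does the lower-casing; URL-encoded titles with spaces would need extra care):

    # Hypothetical vhost fragment: 301 /index.php?title=Acumen -> /j/acumen
    RewriteEngine On
    RewriteMap lc int:tolower
    RewriteCond %{QUERY_STRING} ^title=([^&]+)$
    RewriteRule ^/index\.php$ /j/${lc:%1} [R=301,L,QSD]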
True. Yet this submission will have dramatically greater visibility than it otherwise would have because the HN facebook bot linked it 5 minutes ago[1]. As a web archivist, I've dealt a lot with the erosion of URI stability at the hands of platform-centric traffic behavior and I don't see it letting up any time soon.
Sidenote: The fb botpage with a far larger audience, @hnbot[2], stopped posting some months ago.
6.2.1 "(...) The definition of resource in REST is based on a simple premise: identifiers should change as infrequently as possible. Because the Web uses embedded identifiers rather than link servers, authors need an identifier that closely matches the semantics they intend by a hypermedia reference, allowing the reference to remain static even though the result of accessing that reference may change over time. REST accomplishes this by defining a resource to be the semantics of what the author intends to identify, rather than the value corresponding to those semantics at the time the reference is created. It is then left to the author to ensure that the identifier chosen for a reference does indeed identify the intended semantics."
6.2.2 "Defining resource such that a URI identifies a concept rather than a document leaves us with another question: how does a user access, manipulate, or transfer a concept such that they can get something useful when a hypertext link is selected? REST answers that question by defining the things that are manipulated to be representations of the identified resource, rather than the resource itself. An origin server maintains a mapping from resource identifiers to the set of representations corresponding to each resource. A resource is therefore manipulated by transferring representations through the generic interface defined by the resource identifier."
No. /posts/18 will always refer to the same post. Post 18 will never show up at another URL, and no other post will show up at 18. You may delete it, but intentionally deleting something because it needs to be gone is not what this post is talking about.
SEO has caused many companies to adopt unsustainable naming schemes. A URL that references an ID is not going to have to change if a word in the title of an article is changed.
The number one worst offender of this is Microsoft OneDrive. Document name or location changed? Well, you'll need to re-share the file/folder with everyone.
> When someone follows a link and it breaks, they generally lose confidence in the owner of the server.
Is it a bias I've developed, or has anyone else realized just how many dangling links there are on microsoft.com? Redistributables, small tools, patches, support pages, documentation pages. I've recently found that when a link's domain is microsoft.com I subconsciously expect it to 404 with about 50% probability.
I've noticed that the fashion industry is just rife with linkrot, and they spoil very quickly. If you're looking at a forum post from longer than 3 months ago chances are links to specific products will instead redirect to the store's front page or a 404.
Is there a benefit to this? I am mostly just frustrated.
It's really interesting to see old findings become relevant once the problem becomes an actual pain to practitioners. The recent hype around functional programming languages and immutable data was already out there among academics in the 90s, but wasn't really used in practice until now.
There is a new reason that probably didn't exist back then: the application/CMS powering the old pages has been replaced, and it would be a massive effort to get the old pages working at the same URLs they did before.
I think archive.org is the better long term plan. Not only does it preserve urls forever, it also preserves the content on them.
Side topic, sorry in advance but, am I the only one frustrated by how this page is rendered in a mobile browser?
I know, probably this wasn't an issue back in 1998, but I would have expected something more resilient to devices from the W3C. Of course, I might be overlooking issues.
The site is perfectly responsive (even if the margins are a bit large). The problem is that makers of mobile phone browsers decided to assume pages are not responsive and need a large width unless you include a specific meta tag - which is an absolutely stupid assumption and not something anyone could have foreseen in 1998.
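For reference, the tag in question is the viewport meta tag; one line in the head is enough to opt out of that desktop-width assumption:

    <!-- Tells mobile browsers to lay the page out at the device's width -->
    <meta name="viewport" content="width=device-width, initial-scale=1">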