We take the typical blog URL design (/2024/08/14/slug) for granted, but back in the very early 2000s pretty much every blog tool had its own URL design. Matthew Thomas back then took an inventory:
I could have sworn there was a changeset in which Matt Mullenweg was implementing those cruft-free URLs in his new fork called Wordpress, but trying to google for something with "Wordpress" from the early 2000s is basically impossible in 2024.
He calls file extensions cruft, but I've come to value them. They are a simple way to indicate file type - desired or offered - which is easily understood by machines and people.
I currently work with an API which does a bit of content negotiation using the Accept header, so clients can request data in various formats - application/json for a snapshot, text/event-stream for an updating feed, or text/html for an interactive dashboard. I wish it didn't. I wish we'd just used file extensions. Trivial to use in a browser or via curl, trivial to implement on either side.
That's fine (and already common) for images, JSON, etc.
But nobody wants webpage URLs that randomly end in .php, .htm, .html, .aspx, and so forth. That's just noise that is both gibberish and entirely irrelevant to the user.
For _APIs_ I prefer to use both - the only downside is that resource names need to be restricted to _not_ include trailing `.{EXT}`s (either at all or limiting EXT to things that aren't valid content types).
E.g. `/books` - looks at the `Accept` header. `/books.json` - sets the `Accept` header to `application/json`. `/books.xml` - `application/xml`, and so on.
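A minimal sketch of that mapping; the media-type table and function shape are illustrative assumptions, not tied to any particular framework:

```python
# Map a trailing .ext on the resource path to a media type, falling
# back to the client's Accept header when there is no extension.
MEDIA_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "html": "text/html",
}

def negotiate(path, accept_header="*/*"):
    """Split a trailing .ext off the path and let it override Accept,
    e.g. /books.json -> ('/books', 'application/json')."""
    resource, dot, ext = path.rpartition(".")
    if dot and ext in MEDIA_TYPES:
        return resource, MEDIA_TYPES[ext]
    return path, accept_header

print(negotiate("/books.json"))
print(negotiate("/books"))
```

Note that paths with dots elsewhere (e.g. `/v1.2/books`) pass through untouched, since `2/books` is not a known extension.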
I guess this reflects a view of blogging that is maybe more what people today would use Twitter or Mastodon for, with lots of blog posts with the same title, like "open thread" or "links for Sunday". Today people mostly use blogs to publish essays, and then a slug based on the title should be sufficient, since you're not going to publish two essays with the same title. That's what Substack uses.
I think the date is still extremely valuable. Knowing whether something is from last month or a decade ago makes a huge difference. It's also useful so that URLs can be sorted by date.
Also, "you're not going to publish two essays with the same title" feels false. If you write 1,000 pieces and use short titles and tend to write about the same subjects, it feels extremely likely that you'll wind up repeating titles.
And it's sad how often one needs to use the URL to find the date, since many authors just don't put it on the page (corporate sites are particularly scared of dating their stuff)
Others seem to think just day and month is fine, as if the year isn't the most significant part. And if both numbers are <=12 then you have to go and find out what locale the author formats their dates in...
I agree, but I think it's important to note that the date in the URL can also be misleading. For example, it's often assigned at time of creation. If that page or post gets updated years later, even if almost entirely rewritten, it still has the original date in the URL
> even if almost entirely rewritten, it still has the original date in the URL
If we're talking about blogs/news, they don't ever get almost entirely rewritten. The original publication is the only date that matters, and it matters a lot.
If we're talking about evergreen content like documentation, then of course you don't put dates in the URL. A small "last updated" on the page itself is appropriate there.
> If we're talking about blogs/news, they don't ever get almost entirely rewritten.
Unfortunately, this isn't the case. It should be the case IMHO, but it (currently) isn't. The SEO/marketing people nowadays (ab)use popular pages for the search rankings and update them regularly to keep the content fresh and highly ranked (since search engines give much preference to new content).
Also, even for strict blogs/news, it's not unusual for a particular post to be a draft for many months before publishing. Most serious blogs will fix the date to match the publish date, but that isn't what happens by default, especially in Wordpress (which is the most important platform for blogs).
You're onto something: Very early blogs around the millennium were often built around very short paragraphs, not big articles. Take a look at these; if one squints, they look far more like later Twitter streams:
Then in the early 2000s blogging settled into a style with longer articles instead of paragraphs. In the mid-2000s a "retro" style began with far shorter and differentiated entries, the so-called tumblelog:
How do search engines figure out the date of webpages that don't contain it in the metadata?
Poorly.
I have a blog so old I titled it an "online diary." It pre-dates search engines, so they tend to date the diary entries (blog posts) based on first crawl. Which means a lot of the dates presented by the search engines are off by several years.
Well, arguably both Movable Type and Radio Userland's URLs were already pretty cruft-free. The success of Wordpress was mostly due to other factors (free, php, great feeds, great markup in default templates, great support for plugins).
I found Notion's URL schema interesting as well. They have to contend with renames of pages, reorganisation of the hierarchy and all that. So they have something like:
notion.so/:account/Current-Name-of-Page-:pageid
where the name changes if the page is renamed, but the redirect works, as the page ID is unchanged. In fact, one can just use
notion.so/:account/:pageid
and gets redirected to the right page, or even
notion.so/:account/Anything-else-:pageid
works too...
This is very handy in my use cases, when various Notion data is extracted into another tool, reassembled, and then needed to have a link to the original page. I don't need to worry about the page's name, or how that name gets converted into the URL, or any race conditions....
The page hierarchy is then just within the navigation, not in the URL, so moved pages continue to work too (even if this looks like a flatter hierarchy than it really is).
I'm sure there are plenty of drawbacks, but I've found it an interesting, pragmatic solution.
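A rough sketch of how such a scheme might be resolved; the 32-hex-character ID format and the page table here are illustrative assumptions, not Notion's actual implementation:

```python
import re

# Only the trailing page ID matters; the slug is decorative and may be
# stale, missing, or arbitrary. (ID format and PAGES are assumptions.)
TRAILING_ID = re.compile(r"([0-9a-f]{32})$")

PAGES = {  # page id -> current canonical slug
    "0a1b2c3d4e5f60718293a4b5c6d7e8f9": "Current-Name-of-Page",
}

def resolve(path):
    """Map any '/Anything-else-:pageid' path to (page_id, canonical_path);
    the caller would 301-redirect when path != canonical_path."""
    m = TRAILING_ID.search(path)
    if not m or m.group(1) not in PAGES:
        return None
    page_id = m.group(1)
    return page_id, f"/{PAGES[page_id]}-{page_id}"

# Stale or missing slugs still resolve to the same page:
print(resolve("/Old-Name-0a1b2c3d4e5f60718293a4b5c6d7e8f9"))
```

Since the slug never participates in the lookup, renames and reorganisations can't break existing links.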
I've noticed that Confluence, Reddit, and a good number of news sites do the same thing. Usually the title segment is entirely ignored, meaning you can prank someone by changing the title part to something shocking, and it just redirects to the usual page, because the server only cared about the ID bit
The fact that so many sites do this (including "normie" news sites) shows that site designers clearly believe users want and expect "informational"/"denormalized" URLs, rather than /?id=123
The better way to implement this is to serve a 301 redirect if the words in the URL don't match the expected ones, that avoids trickery and also removes the risk of the same page being accidentally indexed as duplicate content.
There is (was?) also a terrible flaw with this, in that any site hosted on Notion (even using its own domain) would show you public pages with a valid page id.
So you can tell what the URL might point to by looking at it. That’s one of the important things mentioned in the article linked on this HN post: URLs are used by both computers and people.
It's also for crawlers. When doing technical SEO, having a human readable slug in the URL is low hanging fruit that is often overlooked. This, as well as having a <title> structure of `CURRENT_CONTENT_TITLE | WEBSITE_NAME` are things that are quite trivial to implement and provide a significant uplift in both SEO and UX.
Back when I was working on GOV.UK Verify we had URLs that looked something like /verify-passport for English and /cy/verify-passport for Welsh. I made the decision that if readable URLs was a design goal they should be readable in both languages, and ended up localising them all to (for example) /verify-passport and /gwirio-pasbort. No idea if anyone ever noticed, but sometimes it’s nice to sweat the small stuff.
I had the same thought for a restaurant website that served multiple languages. I figured customers might glance at the URL when it's being shared, and appreciate it being in their language.
What do you do when the translated slug happens to be the same in multiple languages? I ended up still having the country code in the slug.
This is important to do, in my opinion. You should not assume that non-English-speaking users all understand a little bit of English. It also confers some SEO benefits.
That being said, if the URL has a language code (/en/), it would be good to change the language code manually, and still end up on the right page. Sometimes the language switcher is really well hidden. However a visible language switcher is even better.
Best time of my professional career, broadly (although there were obvious frustrations). I’d 100% recommend at least a couple-year stint at GDS if it’s an option, as it completely changes the way you think about a huge number of things.
There was another classic from that era, powergenitalia.com, which purported to be the Italian division of Powergen, has now disappeared and apparently was a prank. Interestingly, some of the early captures of it seem to be another company, so maybe it was just an early attempt at SEO and they might have removed it due to the risk of being sued for trademark infringement: https://web.archive.org/web/20040830080331/http://powergenit...
"Q: Can I provide my own wood?
A: In most cases we can handle your wood. We do require all shipments to be clean, free of parasites and pass all standard customs inspections."
I’m glad this article mentions GitHub, who have had some of the best URL design I’ve ever seen, and have done since they first launched.
I use that ALL the time. I can navigate straight to any issue by typing a URL. I can switch to the “actions” view for a repo by adding /actions. I can see the file I’m looking at in a branch by editing the URL and swapping “main” for the branch name.
All available via the UI as well, but I interact with GitHub so often that the tiny efficiency boost I get from navigating by URLs really starts to add up.
I also trust them not to break links, based on their track record. My notes and blog posts and even my source code are full of links to issues or code snippets on GitHub.
In Rails, if you don't spend effort to undo it, you get URLs that match your database setup in a RESTful way. For me, that's far too tight a coupling, and problematic in other ways.
But the result is that without thinking about it, without a second of time spent on designing a URL structure, you get a very nice, consistent and clean one for free.
I agree that's a cool feature. I would say it's from its Rails background, where such a thing is encouraged: .json or .html (or no extension) on a resource gives you two different outputs.
An example of a not-so-great URL design: Amazon product links have an optional slug before everything else, like `{slug}/dp/{id}`. So you end up copying a gigantic URL every time you wish to share a product, unless you use the share product button to get the shortened link.
I think this is actually a really great design in that the name of the product can always be in the URL. The "slug" is completely ignored and just there for SEO/humans. If you send a link I instantly know what it's for -- that's pretty useful!
I also like that the "id" is an ASIN (Amazon Standard Identification Number), which is a superset of all ISBNs. This means you can just enter any book's ISBN directly into the browser and end up on the right page (at least historically) instead of having to search for it.
This got a major retailer (Walmart maybe?) into trouble a while back, as they were pulling the product title (only) from this param, so people were having a field day "renaming" official products on the official site.
>>Amazon product links have an optional slug BEFORE everything else
>I think this is actually a really great design in that the name of the product can always be in the URL
He said "before". You could accomplish your goal by putting the slug "after". He's making the point that having a place after which you can harmlessly delete the rest of the URL is better than having embedded NOPs surrounded by identifying information. (Not that the average user will ever edit any URL, but there is still merit in what he said that you missed, and he's not disagreeing with you.)
While the slug helps someone know what they're opening before reading it, most apps have link previews which give you just enough information you need.
I don't know if I'm in the minority, but I really dislike link previews the vast majority of the time. They take up too much space and offer too little value. In Discord I often x out of the preview, and I know lots of others who do too.
Maybe on some sites it's fine, but I feel like link previews need more modularity on size or something. Perhaps even configurable by both host and client.
The information density of the preview has to match, if not exceed, that of the context in which it is provided. Most of the times it's a placeholder image with less text than the URL itself.
At least that stuff is actually informative to a human. A lot of Amazon's URL bloat is the analytics crap they shove into query string. I started using ClearURLs specifically because of Amazon.
If you're referring to the StackOverflow example from the article, it's different because they follow `/questions/:id/:slug`. Keeping slug at the end makes it a lot easier to delete while keeping it readable.
Why they chose to put the ID before the slug is a curious decision, though.
Stack Overflow, in contrast, puts the ID first, followed by an often very long slug. Which seems to be the more common pattern generally, as far as I can tell.
> So you end up copying a gigantic URL everytime you wish to share a product
Yeah, when I used Amazon I found this incredibly annoying. When I wanted to share a link, I'd have to spend a few minutes figuring out how much of that stuff I could remove and testing the resulting URL before sharing it. A relatively minor irritation, but an irritation nonetheless.
Having, like you, studied Amazon URLs, I just automatically delete everything after (and including) the "?", then I edit the slug to make it contain a message personalized for my recipient.
> website under a .is domain (which is for Iceland, apparently).
I don’t super like repurposing country names like that. Including .io.
It feels like this disregards the actual meaning of the extension while ignoring some very real legal consequences of being under the Indian legal system instead of the EU or US.
To be fair most countries rolling out ccTLDs these days aren't doing it so patriotic citizens can represent their home country - it's an international cash grab. They're fine with randoms from all over the world giving them money, so randoms from all over the world are welcome to use their ccTLDs.
Note: if you are using optional slugs or similar, you should have a canonical URL in the header so that search results will be collated to a single canonical URL.
Post author here. There are so many great additional examples of intriguing URL patterns in the comments here. TY everyone for sharing ones you remember!
The Reuters links are an example of good links IMHO. They're not earth-shattering, and follow some fairly generic guidelines, but work quite well.
Format is
reuters.com/:category/:headline:date
which is all you need to know what you're clicking on. For example, I don't need to describe this link in order for its contents - and its time-relevance - to be understood:
%short-namespace% – one or two letters that identify the page type, with no dependency on any site hierarchy
%unique-slug% – only use a-z, 0-9, and "-" in the slug; no doubled "-" and no "-" or "_" at the end. Only use "speaking slugs" if you have them under your total editorial control.
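The slug rules above can be approximated with a minimal slugify; this sketch ignores Unicode transliteration, which a real implementation would need a library for:

```python
import re

def slugify(title):
    """Lowercase, keep only a-z and 0-9, collapse everything else into
    single hyphens, and strip hyphens from both ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

print(slugify("Great URL design -- a field guide!"))
```

Runs of punctuation and whitespace collapse into one hyphen, so doubled or trailing hyphens can't occur by construction.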
I might be an outlier, but I don't like slugs in URLs.
They make URLs unnecessarily long, often forcing people to use URL shorteners -- completely defeating the purpose.
They get awkward when the author changes the title. Other commenters mentioned some tricks to get around this issue, but all involve redirects. Cool URLs shouldn't change in the first place.
They don't copy cleanly if you use nonalphanumeric characters, as in nearly every language other than English.
Virtually nobody just looks at a URL these days anyway, with all the search engines, cute thumbnails, and OpenGraph metadata that provide a glimpse of the actual content for you before you even click on it. This is doubly true in the non-English-speaking parts of the world where a slug in a shared URL is often just a jumble of %HEX.
Hand-picked words in URLs are fine, e.g. /about/me. I'm only talking about autogenerated slugs for user-submitted content above.
Of course they work properly once you paste them into a browser. But then you've visited the URL.
One alleged justification for slugs is that they allow people to guess what the URL is about without actually visiting it. This usually happens in forums, comments, text messages, and other places where people can only use plain text. But non-ASCII slugs look undecipherable in exactly such places.
Similar to the Slack example given in the post with /is/ URLs, KDE has /for/ URLs with pages which present the KDE project and software for various user profiles: developers, kids, scientists, students, creators, gamers, activists, etc.
Love this. A big miss teams have is with affiliate/refer-a-friend URLs. Like, think of your Uber referral code being 5 characters instead of a generic 16-character hash. Shorter makes it easier for people to remember and share their code.
Using a numeric ID + ignored path part is easy to implement, but actually using the textual part without an exposed ID seems more elegant to me. Tip for implementing that:
* have a separate table that maps slugs to IDs, allowing many-to-one relationship, because content's title will be updated, and you don't want to break old links.
* long slugs will get truncated by users. A zero-cost way to recover from that is `select id where slug >= ? order by slug limit 1`
* in either case don't forget to redirect to the canonical URL, so that people can't create duplicate or misleading URLs on your site.
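The three points above can be sketched with an in-memory SQLite table; the schema and data are illustrative:

```python
import sqlite3

# Many-to-one slug -> id table: old slugs stay so links survive renames.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE slugs (slug TEXT PRIMARY KEY, post_id INTEGER)")
db.executemany("INSERT INTO slugs VALUES (?, ?)", [
    ("how-to-design-urls", 1),  # current title
    ("designing-urls", 1),      # older title, kept alive
    ("zebra-taxonomy", 2),
])

def lookup(slug):
    # Exact match first.
    row = db.execute("SELECT post_id FROM slugs WHERE slug = ?",
                     (slug,)).fetchone()
    if row:
        return row[0]
    # Recover truncated slugs: smallest stored slug >= the prefix.
    row = db.execute(
        "SELECT post_id, slug FROM slugs WHERE slug >= ? "
        "ORDER BY slug LIMIT 1", (slug,)).fetchone()
    if row and row[1].startswith(slug):
        return row[0]  # caller should 301 to the canonical slug
    return None

print(lookup("how-to-desi"))  # truncated link still resolves
```

The prefix query rides the primary-key index, so recovering a truncated URL costs about the same as an exact lookup.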
I don't like slug-only URLs because they are hard to make concise. If the textual part can be compressed into one or two words, like qntm does on his website, then it is okay, but most schemes do not really care much about slugs...
The problem with that, as mentioned in the article, is that it breaks if the content of the page changes. So on Stack Overflow, for example, what happens if someone changes the title of their question? You either break the URL or use an old version of the title, thus being misleading.
From a URL=Resource perspective I don't love unstructured strings in the url because they make it trickier (never impossible) to extend with sub-resources. For example, if I have a url for a blog post with a slug, it's more difficult to represent a comment as a sub-record of that post:
What’s the best way to handle url slugs that change? For example, if I have www.example.com/page/foo, and the user changes that page’s title to bar, the slug updates to www.example.com/page/bar and anyone visiting the old url gets automatically redirected to the new one. But now the old slug of foo can’t be used again (without appending some unique identifier to it, like foo-th683gh9i).
It’s not great but it matters less when the content you get going to that page is so unremarkable. Don’t forget you can do that to any url, even of sites that don’t use optional slugs, if your goal is just vague, evil-by-link-appearance.
I like the slack.com/is/ scheme that makes the resource into a simple, legible phrase. I do something similar on my site: everything in the "projects" category has a URL that starts with "what-about", e.g.: https://joeldueck.com/what-about/splitflap/
I've come to regret that most of my projects have this URL design, mainly because it gets harder to track via third-party analytics platforms like Sentry and Clarity.
E.g. series/9876545678 and series/098767890 get treated differently and the analytics get difficult to merge. But really they're the same page, just hydrated with different data.
Should've used query params, e.g. series?id=9876545678
Another interesting related area is designing URLs for third-party components.
Third-party components have to coexist with existing site navigation logic, so generally you can't safely add URL-based configuration to such a component.
Fortunately, configuration can now be stored in fragment directives in order to hide this from normal site routing. e.g.
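A plain-fragment approximation of the idea, sketched here as string parsing; the `widget:` prefix is an invented convention (real fragment directives use the `:~:` delimiter), and the URL is hypothetical:

```python
from urllib.parse import parse_qs, urlsplit

def widget_config(url):
    """Extract widget state from the URL fragment. The fragment is never
    sent to the server and most client-side routers ignore it too, so
    the host site's navigation logic is unaffected."""
    fragment = urlsplit(url).fragment
    prefix = "widget:"
    if not fragment.startswith(prefix):
        return {}
    return {k: v[0] for k, v in parse_qs(fragment[len(prefix):]).items()}

url = "https://example.com/page#widget:tab=map&zoom=12"
print(widget_config(url))
```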
I gotta say, Datadog does this pretty well. They manage all their state (just information state, not like user sessions lol) in the URL, which makes it easy to integrate with and dynamically generate links and share information, and manages to stay human readable.
The Slack URL scheme, and a few others mentioned in other comments take me right back to hp.com/go/<Insert Product>, so hp.com/go/proliant would take you to Proliant servers, maybe.
The idea was really cool, but from talking to people at HP at the time, the implementation was apparently a complete nightmare done with an insane number of rewrites. It was sort of a hit and miss if the thing you typed in after /go/ would actually take you to the correct location, if any.
It was the sheer number of them that was the problem. Plus, this was in the early 2000s and they were, if I recall the story correctly, manually managed.
I don't think this will be relevant going forward; Safari already hides the URL beyond the domain name by default, and I presume other browsers do/will too.
As the article mentions, URLs are used in more contexts than being displayed in the address bar, so the content remains relevant regardless of Safari's poor aesthetic decisions.
I forget where I ran across it, but one interesting adoption of URL design is to make the root of the directory part of the site's domain name. I.e. there was someone's website that was shared on HN, where their name was assembled with the domain name, TLD, and some characters after the first slash:
This can be a complex topic if you don’t set clear constraints on what constitutes a valid character in your URL or domain.
For instance, in query parameters, spaces are encoded as '+'. But what if '+' is also a valid character in your domain? You then need to disambiguate, i.e. between "name?foo+bar" meaning "foo bar" or "foo+bar". Which one is the user actually referring to?
In our case, we ended up needing users to send the name in the body, and now we have to manage multiple encoding protocols (url, queryparam, the body...).
A user might have entities named "foo bar", "foo%20bar", and "foo%2520bar". Sometimes mistakes happened because users forgot to double-encode or used the wrong protocol, since these names were used in URLs, query parameters, and the body, and each has its own encoding rules.
As I mentioned, with clear constraints and rules we can accomplish anything we need, but it can get complex. My takeaway from this project is to limit the valid characters and keep it simple for everyone.
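The ambiguity described above is easy to demonstrate with the standard library's two encoding rules:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Path-style vs query-string-style encoding of a space:
print(quote("foo bar"))        # spaces become %20
print(quote_plus("foo bar"))   # spaces become +
print(quote_plus("foo+bar"))   # a literal + must be escaped to %2B

# Decoding "foo+bar" with the wrong rule conflates two names:
print(unquote_plus("foo+bar"))  # reads it as "foo bar"
print(unquote("foo+bar"))       # reads it as "foo+bar"
```

The raw bytes `foo+bar` mean different things depending on which decoder runs, which is exactly why every component must agree on one protocol.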
Actually, unencoded slashes in paths are not as unheard of as one might think. As an example, there is an actual Wikipedia article about... the meaning of "two slashes": https://en.wikipedia.org/wiki/// . Also, encountering buggy concatenations like example.com//some-path or example.com/base//some-path is quite common.
I think the URL design of Stack Overflow leaves room for improvement. The ID should not be necessary. Stack Overflow demands unique questions; if a question doesn't have a unique slug, is the question unique? Great URL design, to me, is when the slug is sufficient for uniqueness, without an ID.
I can't check this because I'm on mobile, but I presume Stack Overflow uses a canonical tag in the HTML to state their preference that the longer version with the slug should be the default, because that's the one search engines use.
How about great email address design? Of course firstname@lastname.com is top tier. But there are some interesting hacks you can do, such as firstn@melastname.com if your last name domain isn't available.
I never thought of this! I have an 'a' in my first name but just checked, 'rest of my first name + last name'.com is already taken. Oh well, I already have 'my initials'.dev, it'll have to do.
> Granted, it can also be used deceptively. For example, this is the same URL as above but it portends completely different contents (without breaking the link):
Fortunately for SO, the fake slug is not preserved and redirects to the real one (so e.g. stackoverflow.com/questions/16245767/motheficker is not served from their site), much to the chagrin of those of us with a childish sense of humor who some 25 years ago enjoyed dynamically generated nonsense like:
Everything should be accessible via the identity of its composition (a hash or equivalent). Then all the data needed to render it be computed or downloaded from some peered cache (DHT).
So when you bookmark the Hacker News frontpage, that would be a hash of its current content, and then you would visit that stale version forever and never see any new stories?
It was cool to see Jessica Hische called out. We own a couple of her children's books. Always fun when my parenting and tech worlds collide in surprising ways.
IDs that skip the textual part in their lookup/URL validation and also don't redirect are not ideal, probably as bad as soft 404s. Maybe not as bad for bots if the canonical tag shows the intended URL.
Personally I'd avoid using IDs and use a 32-bit hash of the URL instead, which is more or less as performant as a straight ID lookup. I usually went with murmurhash.
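A sketch of that idea; the parent mentions murmurhash, but `zlib.crc32` stands in here only because it's in the standard library, and the page table is invented for illustration:

```python
import zlib

def url_key(path):
    """Derive a 32-bit unsigned key from the canonical URL path,
    so no sequential ID ever has to appear in the URL."""
    return zlib.crc32(path.encode("utf-8"))

# Hypothetical index keyed by hash instead of by exposed numeric ID.
pages = {url_key("/what-about/splitflap/"): "Split-flap display project"}

def lookup(path):
    return pages.get(url_key(path))

print(lookup("/what-about/splitflap/"))
```

The trade-off versus a plain ID is that 32 bits can collide at scale, so a real deployment would want a collision check at write time.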
You can 301 redirect some locale to your "base" URL if you want.
mysite.com/en-us/some-page > mysite.com/some-page
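That rule is simple to sketch; the locale pattern, default locale, and status-code tuples here are illustrative assumptions:

```python
import re

# Matches a leading /xx-yy/ locale segment, e.g. /en-us/some-page.
LOCALE_PREFIX = re.compile(r"^/([a-z]{2}-[a-z]{2})(/.*)?$")
DEFAULT_LOCALE = "en-us"

def route(path):
    """301-redirect the default locale to the bare URL; serve every
    other locale in place."""
    m = LOCALE_PREFIX.match(path)
    if m and m.group(1) == DEFAULT_LOCALE:
        return 301, m.group(2) or "/"
    return 200, path

print(route("/en-us/some-page"))  # redirected to the base URL
print(route("/fr-ca/some-page"))  # served as-is
```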
But don't stress too much. Google doesn't really care about URL content any more. People on phones don't care what your URL says. It's at most desktop users, and devs.
Don't stress localizing your URLs...
mysite.com/fr-ca/some-page is just as good as mysite.com/fr-ca/une-page... and the former is a lot easier to tie into email marketing variables.
Just keep your sitemaps in the localized folder.
mysite.com/sitemap.xml... just a link to the various localized sitemaps.
mysite.com/en-us/sitemap.xml etc.
By keeping sitemaps in a localized folder, it'll make it a lot easier for yourself as you go to register your site with each market's locale.
If you just have to localize URLs... consider doing what Amazon does and just tie the URL to an ID.
“We use the words in a URL as a very very lightweight factor. And from what I recall this is primarily something that we would take into account when we haven’t had access to the content yet… [but] as soon as we’ve crawled and indexed the content there then we have a lot more information. And then that’s something where essentially if the URL is in German or in Japanese or in English it’s pretty much the same thing.”
https://web.archive.org/web/20030810201315/http://mpt.phrase...
He was on the search for his ultimate blogging system, where this "cruft-free" URL structure should be used:
https://web.archive.org/web/20051107103030/http://mpt.phrase...
Update: I found this: https://ma.tt/2004/08/mike-on-uris/