
Database IDs Have No Place in URIs (2008) - tosh
https://johntopley.com/2008/08/19/database-ids-have-no-place-in-uris/
======
benkuhn
This argument seems backwards.

As far as I know, no database has ever had any of the problems he mentions
with IDs not round-tripping properly during backup/restore. (Please correct me
if I'm wrong! But why do I think this is true? Because any database that
doesn't roundtrip primary keys would break all foreign keys on backup/restore.
Databases with broken backup/restore tend to either fix it real fast, or go
away.)

On the contrary, it's much _easier_ to keep IDs stable over time, than any
other field. The problem with using a different field as an identifier (e.g. a
URL component) is that almost any property with an objective meaning, might
change over time or become non-unique. This is why it's a best practice not to
use, e.g., user email as a primary key.

For instance, Stack Overflow allows users to edit the title of a post. Right
now when that happens, the canonical URL changes (to include the new slug),
but the old URL is easily 301'd to the new one because it includes the ID. If
the numeric ID was not in the URL, you'd be stuck with either a misleading
slug, or having to maintain a list of former slugs/redirects for each post.
Plus, you'd have to hack your slug-generation algorithm to ensure the
generated slugs to be unique, and the easiest way to do that is... by
appending an incrementing ID to the end of the slug.

The point about avoiding cruft like .aspx in URLs is well taken, but
unfortunately it's in direct tension with the point about keeping them stable
given that "meaningful" non-cruft tends to change over time!

(And if you don't want someone to guess your metrics from IDs, you can use a
random non-numeric ID instead!)

~~~
cryptonector
Pick a primary key (i.e., set of columns) for a table. If you expose this
outside the database in stable ways (e.g., in URIs), then do not permit
changes to that table's rows' primary keys.

There's NO NEED for a primary key to be an INTEGER or any sort of ROWID.

I agree with TFA -if this is what it says- that one should not put INTEGER
PRIMARY KEY values in URIs -- it's useless to users. Just make the "slug" a
column, make a NOT NULL and UNIQUE, disallow changes, and use that. You don't
have to make the slug a PRIMARY KEY -- NOT NULL and UNIQUE has the same
semantics as PRIMARY KEY.

Surely TFA wouldn't argue to have _no_ DB key at all in URIs, so I won't
bother with that possibility. And yes, I should read TFA...

~~~
derefr
What do you do when literally every "natural" column of your table is mutable,
but you need the row's identity to be stable?

Well, you need what is called a
[https://en.wikipedia.org/wiki/Surrogate_key](https://en.wikipedia.org/wiki/Surrogate_key).

Surrogate keys need to be made up from whole cloth, apart from the natural
data (because if they are _derived_ from any of the natural data, they'll
become incorrect when the data changes.) And what's easy to make up from whole
cloth? Sequential integers. (Or random numbers, which—if you want to guarantee
no collisions across a distributed DB cluster—gives you ≥128-bit random
numbers, i.e. UUIDv4s.)

~~~
cryptonector
The slug has to be computed at INSERT time (e.g., by a trigger) and not
allowed to change thereafter.

~~~
eridius
But that's no good because it means the slug may end up having no relation at
all to the actual content of the page (after the page has been edited). The
only thing worse than no slug is a bad slug. Don't bake bad slugs into your
database design.

~~~
cryptonector
An integer has even less relation to the actual content.

~~~
eridius
That's why you pair the immutable integer with a mutable slug. Nobody
memorizes slugs so you don't have to worry about supporting the case of a user
manually typing in the URL from memory (which is about the only time when
having a short integer in the URL would matter). With this scheme, you can
change the slug at will, and in fact you can support loading the URL without
having a slug at all, or with a made-up slug. This makes the URL resilient to
minor transcription errors (e.g. accidentally deleting the last character of
the URL, or screwing up when retyping it), and it also means anyone who's
aware of this behavior can skip the slug entirely when retyping the URL.

------
jolmg
> If you have to move the site to a different box, can you guarantee that
> those database IDs will remain the same?

If you don't, then your foreign keys will be broken.

> If you have to restore the site’s data from a backup, can you guarantee that
> those database IDs will remain the same?

Ditto. If they're not, then that backup seems broken.

> If you have to switch to a different database server (say from Microsoft SQL
> Server to Oracle), can you guarantee that those database IDs will remain the
> same?

Ditto.

I've only had one situation where I had to change the database ids, including
making sure to change all foreign keys accordingly in all tables, and that was
when I was migrating multiple company-hosted instances of a webapp they had in
their intranets to a unified internet-hosted server. I was merging multiple
databases into one so modifying ids was necessary so they wouldn't clash.

At least in my case, I don't think any of those companies' employees would
have had the expectation for their intranet URLs to work. I mean, even the
host/domain part of the URI changed. Why would they have the expectation for
the path part to still work?

------
ehutch79
I don't understand why people are saying this is a security issue.

If user a has private document 1234 at /doc/1234 and user b has private
document 1235 at /doc/1235

when user a goes to /dec/1235 IT SHOULD NOT RETURN THAT DOCUMENT.

Who in their right minds considers obscurity to be a replacement for
authorization?

If a random anonymous individual can access private data by hitting a url, the
problem is not the id used in the url.

~~~
airstrike
> when user a goes to /dec/1235 IT SHOULD NOT RETURN THAT DOCUMENT.

As a matter of fact, it should return 404 NOT FOUND for both users A and B

~~~
newscracker
There’s some time for the OP to edit and fix that typo. But yeah, I noticed
that at first glance too.

~~~
ehutch79
Yes, it was a typo

------
Jwarder
One idea I like is to have a normal internal database ID, but also have a
public UUID column to use for URLs, APIs and that sort of thing.

This way you can change the record's UUID at any time to display different
data without having to worry about updating a bunch of internal foreign keys.
You don't have to worry about a user making easy edits to the UUID in the URL
to find nearby records (although you still need real protections if you need
to restrict data by user).

~~~
perlgeek
> This way you can change the record's UUID at any time to display different
> data without having to worry about updating a bunch of internal foreign
> keys.

That is true, but changing the UUID is also likely to break a lot of
assumptions that your users make. If your user indexes your forum posts, for
instance, and you change the UUIDs, they'll re-index posts with new UUIDs as
new posts, and send end users to now invalid UUIDs until the old index entries
are evicted.

~~~
aaronharnly
Yeah, I thought that line was weird. It's more like – you can change the
record's internal foreign key without breaking your users! Once users have
bookmarks, IDs more or less permanent.

~~~
Jwarder
Naturally it depends on the data. I've used it when we want to show the most
recent version of a record on the public website but we want to keep older
copies of of the data to reference with past events. Eg customer's current
address can always be identified with id x but the address records for
specific past orders reference other records.

------
einrealist
"URIs should be meaningful, not cluttered with meaningless information."

Actually, URIs are opaque. The parts of an URI are implementation details -
even the protocol, domain and port. To the user, changing database IDs, that
happen to be part of the URI, is no different to changing the domain, port
number or the protocol.

Whenever I get into a discussion about "URI design", I state two rules:

1) Care about the URI design as much as you want. But never expect from
someone else - especially the client - to make sense of that design.

2) When necessary, handle URI changes in a way that does not break the client
(redirects are your friend).

Btw. For this reason, browsers tend to remove the address bar from the UI (as
much as possible).

------
slig
For a content website such as SO, the biggest advantage of using /<id>/<slug>/
is being able to update the `slug` whenever the `title` of your content
changes, and then it's very easy do a 301 redirect and you're good to go.

~~~
dqybh
If you're afraid of Google bombing, the redirect is not even necessary, just
make sure the <link rel=canonical> tag contains the correct URL.

~~~
eli
I prefer redirects because if someone edits a URL to look like
/1234/offensive-phrase-here/ and shares the link I don't want it to appear
that we choose that phrase as a slug.

------
fabian2k
That article seems to immediately contradict itself by saying that the most
important aspect of URIs is that they never change, and then suggests using
the slug derived from the title as unique identifier. Titles change all the
time on Stack Overflow, this would immediately break URLs.

~~~
bazzargh
The title and slug are related but not tied together. It's quite common to
visit slug-based sites and find the slug is a relic of the original title. In
addition, slugs have to be unique when used this way, but titles don't. eg a
title of _Database IDs Have No Place in URIs (2008)_ may appear again next
year, but the url would need to point to a different conversation; your CMS
would suggest or generate the slug based on the original title.

~~~
eli
The title of a news article sometimes changes because of new reporting that
makes the original title inaccurate. Leaving the old title as the slug could
cause confusion. Think about a breaking news event where details are very
sketchy at first but the article is updated as more becomes known.

------
GauntletWizard
This guy sounds like he has bad backups and bad database management. Nothing
more. That "database ID" is probably an auto-incrementing primary key, and
yes, moving it to another box, restoring from backups, and even migration to
another server type should keep the same ID. If not, you've done it wrong.

In none of those situations will new entries necessarily follow the same
pattern, i.e. there may be huge gaps or it might start filling in deleted
questions so entries are "out of order", but those are separate problems and
also remedied by using your schema correctly.

In short: this post is very misinformed and I hope that it's presented for
that purpose and not taken as serious advice.

~~~
aiCeivi9
> this post is very misinformed.

This. The answer to 3 question in article is always "YES".

> The slug itself can easily be automatically generated when a new question is
> saved. Then you can simply retrieve a question by its slug.

And that was just pure ignorance. What happens to slug when question(or
product) is edited?

------
madrox
I get the vibe a lot of these critiques are rooted in academic understanding
of the issues rather than real world practice. In practice, these are all
problems that can be addressed...some rather trivially.

The biggest real world concern I've ever had with embedding incremental IDs
into URLs is that incremental IDs can leak information, like how many records
you have and how quickly that's growing. Depending on your business, you may
want to obfuscate that.

------
adrianmsmith
Also, using a numerical auto-incrementing ID in the URL will expose how many
users you have so far. Not a problem for Stack Overflow, but I’ve had clients
(startups) who wanted to put IDs in the URL, when you sign up and your URL is
/profile/432 it’s quite clear the product doesn’t have many users.

~~~
petepete

        ALTER SEQUENCE users_id_seq RESTART WITH 54321;
    

"Wow, this service is probably mature, well-tested and reliable!"

~~~
koolba
And then come back a week later to see that 500001 => 500004...

~~~
TeMPOraL
The number of users who notice the user ID is a small fraction of your total
users. The number of users who would notice the ID didn't grow much over the
week is a small fraction of that fraction.

~~~
zAy0LfpBZLC8mAC
Namely, cour competition that tries to figure out what to do about you or
whether to even care, yes.

~~~
dexterdog
Exactly. I have done this in research on competition many times.

------
jayrwren
Pretty sure the last 10yrs of stackoverflow proves that this is not a concern.

I'd file this under: things to not worry about.

~~~
geezerjay
Sequential numeric ids leak information regarding your operations.

[https://en.m.wikipedia.org/wiki/German_tank_problem](https://en.m.wikipedia.org/wiki/German_tank_problem)

~~~
wutman
This is exactly right, but in my opinion usually not important. If Joe Public,
or a competitor, or whoever knows how many posts you have, users you have,
administrators you have, etc how much does that actually affect your
operation? I guess they can better compete against you in some small way with
that information, but in my opinion it's almost never worth the effort to
obfuscate. It's like optimizing for a problem you don't have.

~~~
geezerjay
> how much does that actually affect your operation?

If it's a business service then the operation is not the end goal but the
means to the end.

If it's a business service then the question you should be asking is "should I
care that my competitors are able to observe and monitor my service metrics?".

Quite often, businesses do care a lot.

------
_fzslm
Looks like Hacker News does this too:

[https://news.ycombinator.com/item?id=1](https://news.ycombinator.com/item?id=1)

------
kozak
That can be either good or bad depending on situation. There are plenty of
cases where concise sequential identifiers are an advantage.

~~~
kevsim
Fully agree. For example, it's very handy that GitHub issues are auto
incrementing integers starting at 1, so that I can type "Fixes #123" in a
commit message instead of "Fixes #<some-uuid-or-something>".

Of course, that short numeric ID doesn't necessarily need to be part of the
DB's primary key.

~~~
slig
Is there any best practice on implementing an ID field that starts at 1 for
each repository (in GH's example)?

~~~
kevsim
I guess it would be pretty easy in postgres to create a sequence per
repository and have the field use the next value in the sequence (i.e. like it
normally does for auto incrementing IDs but where you'd specify the sequence
to use at insert time).

~~~
slig
Thanks, I'll have a look!

------
localhostdotdev
pretty cool how the urls stayed the same:

[http://stackoverflow.com/questions/13204/why-doesnt-my-
cron-...](http://stackoverflow.com/questions/13204/why-doesnt-my-cron-job-
work-properly) (still works)

[http://stackoverflow.com/questions/21064/massive-
reputation-...](http://stackoverflow.com/questions/21064/massive-reputation-
jump) ("this question was removed from stack overflow for reasons of
moderation", still available:
[https://i.imgur.com/Csy94IL.png](https://i.imgur.com/Csy94IL.png))

------
LoSboccacc
this seems backward, as far as the pk remain meaningless they're easier to
port around, not harder. it's once you give them a meaning then trouble starts
(for example you can't change the slug down the road if the title changes)

and with integer keys the worst that can happen is that the sequence is not in
the backup so after a restore you need to have the p sequence restart from the
previous max id + 1; meanwhile with slug built off user entered data a change
in collation can really screw up your day, unless you want to mangle all texts
into basic ascii, and SO doesn't
[https://stackoverflow.com/questions/41102371/sql-doesnt-
diff...](https://stackoverflow.com/questions/41102371/sql-doesnt-
differentiate-u-and-ü-although-collation-is-utf8mb4-unicode-cia) (see the ü in
the link more than the question itself)

------
3xblah
"URIs in Stack Overflow look like this:

[http://stackoverflow.com/questions/13204/why-doesnt-my-
cron-...](http://stackoverflow.com/questions/13204/why-doesnt-my-cron-job-
work-properly")

Article gives only cons. What are the pros?

[https://stackoverflow.com/q/13204](https://stackoverflow.com/q/13204)

Shorter, easier for users to work with. Could be even shorter with a domain
hack. Does Stack Overflow have one?

[https://en.wikipedia.org/wiki/Domain_hack](https://en.wikipedia.org/wiki/Domain_hack)

[http://xona.com/domainhacks/s.cgi?q=stackoverflow.com](http://xona.com/domainhacks/s.cgi?q=stackoverflow.com)

~~~
JeremyBanks
They have s.tk but it's mostly used for internal/intranet links:
[https://meta.stackexchange.com/q/109795/134300](https://meta.stackexchange.com/q/109795/134300)

------
geophile
I don't get this argument at all.

If the ID comes from something like a SERIAL column, then yes, you can port
those easily. I think every database system has something like SERIAL, as well
as the ability to control an initial value, taking care of portability
concerns.

SERIAL has other problems, namely predictability, so another possibility is
UUIDs. And those should be pretty portable too. I know DBMS UUID
implementations are different from OS implementations, but the risk of
collision should still be vanishingly small.

And if you don't use IDs, then what's the choice? If the URI contains an
article name then what about article names being changed? What about duplicate
article names?

~~~
steego
I can't believe people have been giving this post this much credit in the
first place. All of his contrived examples are solved with trivial methods and
it just sounds like he doesn't know how to migrate data from one database
system to another.

------
thecatspaw
The issue I see with searching by slugs is thats its slower than searching by
integer, and its more likely to be entered incorrectly by users when typing a
url.

You also can not change the slugging algorithm if you want to keep bookmarks
valid

~~~
adrianmsmith
I don’t agree it’s slower in a practical sense. If your service is backed by a
relational database like PostgreSQL then you’d have an index on the column
you’re looking up. So a request involves network round-trip latency, parsing
the request and preparing the response, and searching for the object in the
database using the index, most likely a tree with a depth of 3 or 4 levels. I
don’t think the number of characters in the key, or the data type, will play a
big role here. Happy to see evidence to the contrary though if I’m wrong!

The algorithm flexibility might not be a big problem either. The old slugs are
stored in the database. Update the algorithm, new slugs adhere to the new
algorithm, old ones unchanged, everything still works.

~~~
lxn
1) Indexing text vs indexing integer it's very different when it comes to
memory/space required by the database.

2) There is already an index on ids assuming they are used as primary keys,
why adding another index (on text)?

3) To support changing slugs (titles) you will have to keep all the old slugs
as well (to redirect them to their new versions). With ids you don't have to
do this - ids have no reason to change.

In the end you might not feel the difference in execution time, but hardware
requirements for servers...

I agree that they are annoying, but when the only reason against them is to
improve URLs aesthetics (assuming they don't pose a security/information
disclosure risk) - the trade of performance, hardware requirements and extra
code needed is not worth it IMHO.

~~~
thecatspaw
> 3) To support changing slugs (titles) you will have to keep all the old
> slugs as well (to redirect them to their new versions). With ids you don't
> have to do this - ids have no reason to change.

Which basically means a join in the database. While not terribly expensive,
its a lot more involved than a simple column value check.

------
dusted
When talking about public pieces of information, being able to refer to them
by their identity is useful and any hashing would be counterproductive (just
wasting power), also, I'm sure any database can operate on unique,
incrementing numbers more quickly than uuids. If we're talking about private
information, it should be transferred in another way than the url in the first
place, yes, the chance of "guessing" a hash is lower than an incrementing
number, but there are other ways it could be intercepted, a screenshot,
accidential copy/paste for example.

------
michrassena
Say I'm convinced that IDs could possibly leak information, but UUIDs are
problematic (e.g. cost of enforcing uniqueness grows with each added row).
Would encrypting the ID with some symmetric key plus a salt be sufficient? The
client sees a number where the ID can be recovered on the server side, but
which the benefits of having an easily generatable unique identifier while
making it challenging to recover the actual ID).

------
jasonhansel
If you want to estimate the total # of items in a DB based on auto-increment
ID numbers, you can use:
[https://en.m.wikipedia.org/wiki/German_tank_problem](https://en.m.wikipedia.org/wiki/German_tank_problem)

(Apparently this was used by militaries in WW2, to estimate the total number
of tanks from the serial numbers printed on them.)

------
xg15
What does he propose I do if two questions map to the same slug or if - beware
- someone wants to edit their question?

------
cmiles74
I worked with a customer that was concerned about the "security" of database
IDs in the URL. I though their solution was novel: they chose a random long
integer whenever a new row was created. It's fast like an integer and random
like a UUID, the best of both worlds!

------
fjni
Genuine question: What's the proposal for items in an API which have no
concept of a "slug"? We could generate a separate mapping of "public ids" to
the database internal ones, but I don't even see how that would solve any of
the issues here.

~~~
theoh
I'm not a database expert, but concepts like surrogate and natural keys come
to mind. A unique slug is a natural key, but for most, if not all, items, you
can come up with a natural key by combining a few bits of data (I've seen name
* date of birth used by doctors in general practice, for example). Often
there's a pre-existing unique code external to the app which can be used, like
a driver's licence number or something like that. That removes the need for a
surrogate key in the database.

The tricky case is something like email drafts: it's possible for a user to
create and then save a draft with no addressee, subject or content. In that
case there is little alternative but to make the system use a surrogate key (a
database internal key) because the other possible unique identifying values
(e.g. time of creation of the draft) are sort of unsatisfactory, because they
don't really have a strong, meaningful connection with the content of the
item.

------
ricardobeat
And then the question gets updated and you either have a broken URI, or a slug
that doesn’t match the content.

I’ve never heard of IDs being something mutable. When would that ever happen?

------
Geee
This is bullshit. It's absolutely necessary for the user to be able to refer
to the resource by an ID of some sort. A slug is not an ID.

------
btbuildem
One reason would be that the slugs are not guaranteed to be unique - they're
based on user-entered post titles.

------
cryptonector
Better blog title: "Do not use INTEGER PRIMARY KEYs in URIs" or "Do not use
ROWIDs in URIs".

------
spartas
It's a good thing™ Stack Overflow doesn't have those IDs in the URI anymore.

------
coldtea
> _of Atwood 's calibre_

What was Atwood's calibre at that point? (pre StackOverflow)

~~~
boobsbr
Coding Horror began in 2004, that's where I knew him from.

------
pointe
You should not expose any internal identifier (numeric or otherwise) because
it then becomes public & part of your API (even if undocumented).

~~~
ilikehurdles
In one case the api I worked on used encrypted IDs on the way out to the
client, and decrypted them on the way back to the db. Even when multiple
clients referred to the same resource, they would see different public IDs,
but any instance of the API could reach the underlying value from those public
values.

------
dr01d
Uuids

------
trpc
I like the way medium.com deals with IDs, 10 random hex chars (i.e. > 10^12)
put at the end of the slug are more than enough for most use cases that can be
addressed via a url and still clean for search engines and humans.

But also the way letterboxd and also quora deal with URLs is admirable but it
needs special care (at least in the case of quora)

