
Auto increment is a terrible idea - wheresvic1
https://www.clever-cloud.com/blog/engineering/2015/05/20/why-auto-increment-is-a-terrible-idea/
======
lucideer
> _imagine you make a tab-completed typo when deleting a row. Yeah, you 've
> just deleted an arbitrary row somewhere in your DB_

This line seems to be the sole justification for the hyperbolic title. You
might do tab-completed typo...

There's three justifications listed - the above (fairly minor, imo) one,
information disclosure, and entity enumeration (which in this context is just
another term for information disclosure).

The part on information disclosure has this line:

> _all it takes to know your user count is to create an account. A number of
> social networks boasting impressive user bases were pwn3d this way_

What? I would love a reference or explanation here. Who has been "pwn3d" by
exposing their user count?

The "security through obscurity is bad" rhetoric has been rightly debunked
plenty at this stage - obscurity is a valid component of layered security, but
obscuring your table row count seems... odd. Why would that be a secret?

If you're worried about people being able to enter primary keys into a URL to
access/scrape arbitrary records, you need better auth.

~~~
davweb
If you're doing manual updates to a production DB, using START TRANSACTION and
then checking the outcome of your changes before you use COMMIT stops typos
becoming catastrophes.

~~~
j_s
In case it's not obvious: new auto increment IDs are skipped forever whenever
a transaction is rolled back.

This isn't really an issue at all but can upset those with OCD-ish tendencies.

~~~
corobo
those with OCD-ish tendencies best hope they never lose any customers then.

~~~
btschaegg
This. There's so much in software development that relies on guarantees that
can only be upheld by behavior like this that I _sincerely_ doubt that anyone
having issues with skipped rows is in the right place.

There are arguments for and against any technique, but this one seems really
ridiculous to me, especially if you consider what any "countermeasure" would
do to the code (which is usually much more important than the data in the long
run!).

------
daigoba66
I really like UUIDs for all of the reasons described in the article. But can
there can be significant performance related drawbacks:

UUIDs are double the size of 64 bit integers. Which could mean fewer rows on
data and index pages, leading to memory pressure, leading to I/O pressure.

UUIDs are also random. I'm not an expert when it comes to PostgreSQL
internals. But for MSSQL it's common and recommended to use your PK as a
clustered index. The randomness can wreck havoc on the clustered index,
causing severe fragmentation due to page splitting. There are some techniques
to mitigate the effects of this (decreasing page fill factor, using
incrementing UUIDs, using a surrogate IDENTITY clustered key), but each has
significant tradeoffs

~~~
cm2187
The article doesn't mention them but there are incremental only UUIDs
(NEWSEQUENTIALID in TSQL) that solve that problem. Or you can prefix it with a
timestamp.

~~~
dhd415
The sequencing of those UUIDs resets every time Windows restarts. Given the
frequency of OS and SQL Server updates that require reboots, that wasn't good
enough for any of my workloads.

~~~
cm2187
Agree. And if you have two clusters, they will start at different IDs. Which
is why I would probably prefix them with a timestamp (if it is not a sensitive
information). Say the number of seconds since 1-1-2000. Fits on 32 bits, that
still leaves you with 96 bit of entropy for a UUID, which is plenty to
guarantee uniqueness unless you are working at massive scale.

------
prepend
I think you’re better off keeping the auto increment PK then expose a uuid
token anywhere it can be viewed.

You probably don’t want to be literally passing around the literal primary key
if you are concerned about leaking security information.

But I think it’s probably better so the user id being known is not a problem.
For example, not caring if people know how users/widgets/whatever.

~~~
brianwawok
This. Use the autoinc for FK. Use GUID for exposing.

------
dhd415
Making blanket generalizations in the title of an article is a terrible idea.
Auto-increment integer IDs work for different use cases than UUIDs. This
article does a decent job of describing the scenarios where UUIDs work well,
but its section on when UUIDs are less optimal is a bit lacking. In
particular, UUIDs are generally a bad choice of key for clustered or index-
organized tables because their random insertion order results in fragmentation
in the primary index. PostgreSQL doesn't have clustered indexes and perhaps
that's why that drawback wasn't mentioned.

------
Cieplak
Article doesn't mention this but if you're exposing UUIDs to users, it's nice
to base62 or base58 encode them.

E.g.

    
    
        uuid v4: de7013fb-658e-40a0-9c02-d23080b57fdb
        base58 : tt7Udwi9Tx9VE1ze7HDBxi
    

Here's a lib I wrote in c++ for doing this, with python and erlang bindings,
[https://github.com/cieplak/uuid](https://github.com/cieplak/uuid)

~~~
viiralvx
What's the benefit of doing this?

~~~
pluto9
It makes your URLs shorter.

You could also use base64, but it has some non-URL-friendly characters so it
would have to be escaped.

------
nathan_f77
I like using both UUIDs and integers. I use auto-incrementing integers
internally for all the model relationships, but the user only ever sees random
UIDs. I also like having different UID prefixes for each model, so you can
tell what a given UID is referring to.

Here's the code I'm using for Rails:
[https://gist.github.com/ndbroadbent/628ef1d0c044e402d7501fb7...](https://gist.github.com/ndbroadbent/628ef1d0c044e402d7501fb7373cbaff)

------
crazygringo
128 bits can be a waste of space, but I appreciate not wanting to publicly
reveal the table size.

One of my biggest wishes from a database would be a function similar to auto
increment, except that would pick randomly from the column's full integer
range, guaranteed not to collide. I almost never need 128 bits -- 32 bits will
let me handle 4 billion users.

It seems to me that a column index could be used to do that quickly --
essentially, if it's a binary tree, pick branches randomly but never pick a
branch known to be full.

------
elsurudo
Ehh, if your main concern if obfuscation of the primary ID, you can...
obfuscate your primary IDs to be used in your URLs.

------
dragonwriter
> A relational database is a graph where nodes are called entities and edges
> relations.

No, it's not. That's a common rationalization of the term by people who have
zero knowledge of relational DB theory, but a relation is answer of entities,
not a relationship between them; do, in this graph interpretation, “relation”
isn't the right name for an edge, as it names a set of structurally similar
nodes. An edge is a “(foreign key) relationship”.

> The role of a primary key is to provide a stable, indexable reference to an
> entity. You have two ways to construct your primary key: either as a
> semantic key, or as a technical key.

Why is this piece inventing new language when there are existing, widely
recognized terms? This is the classic ”natural key” vs. “surrogate key”
distinction.

> Also, serial is a 32 bit integer.

There are good reasons to avoid autoincrement, but that's not one, it’s just a
reason to use bigserial instead of serial.

> If you don't have mutable state, then you don't need a technical primary
> key, and you can use a semantic key.

Even if you have mutable state, you can use a natural (“semantic”) key instead
of surrogate (“technical”) key so long as the _identifying_ attributes are
immutable (which, if they are genuinely identifying, they logically must be.)

------
gjem97
Does anyone use ULIDs [1] instead of UUIDs for stuff like this? I like that
they're sortable.

[1] [https://github.com/alizain/ulid](https://github.com/alizain/ulid)

~~~
joekrill
What use is sorting a randomly generated value? I guess you're technically
sorting on creation time?

~~~
dozzie
> What use is sorting a randomly generated value?

In general, order determinism. The elements in the table or list don't shuffle
on every refresh.

------
johnlbevan2
A UUID is not always non-predictable / entirely random. For example:
`NEWSEQUENTIALID()` in MS SQL Server uses a randomly generated UUID, then
auto-increments from there. More info on this function's usage here:
[https://www.codeproject.com/Tips/125526/Using-ROWGUIDCOL-
in-...](https://www.codeproject.com/Tips/125526/Using-ROWGUIDCOL-in-SQL-
Server)

~~~
pwinnski
This is why the writer specifically called out UUIDv4, which is random by
definition. That's what type 4 means.

------
ealexhudson
There are definitely better reasons that those given in the article. My two
favourite are:

* it prevents tables having any obvious "natural" ordering. I still see developers writing queries, relying on the order of the data but not specifying ORDER BY. The flip-side to this is that sometimes they discover they need an ordering that they don't have - e.g. date record created - which is also great practice. * it removes pre-conceptions that developers might have about auto increments. I've seen auto-increment used for invoice numbers more times than I care to remember: a task it's entirely unsuited for. If a developer needs a monotonic ID, but don't have an AI field, they have to think longer and harder about the type of number they need, and what the rules about transactions rolling back need to be. Again, better correctness is the result (at a cost of more time / thought).

Auto-increment isn't a terrible idea at all, and there are good UI reasons to
not like it, but there are also decent reasons to use something else.

~~~
wolfgang42
What makes autoincrement unsuited for invoice numbers, and what would you
recommend instead? (I'll admit to being one of those developers who hasn't
thought long and hard about this, but I can't off the top of my head come up
with any reasons why they would be a problem.)

------
EGreg
There is a different reason that auto incrementing ids are bad: they don't
scale.

When you need to shard your table, the autoincrementing is is a bottleneck
across all shards as you need to hold a lock on it when incrementing. In fact,
that's exactly how it's implemented explicitly in POSTGRES.

And if you want something that's way MORE distributed such as CockroachDB or
(RIP) Riak then forget about it.

But you don't necessarily need UUIDs. You can use for example
[https://www.npmjs.com/package/shortid](https://www.npmjs.com/package/shortid)

What we do is generate a random 7 character string (alphanumeric lowercase)
and then check if it exists in the db and - if not - insert it, using a
transaction. Otherwise repeat until one works.

------
pmontra
I'm using PostgreSQL's UUIDs as primary keys (and foreign keys) in a Phoenix
application (a JSON backend to a React frontend). The only painful moments
are:

* Typing SQL queries in the development database.

* Trying to remember which record is which by looking at the UUIDs. Often the first two characters are enough but it's a lot of random noise.

Inside the program it's all the same. No apparent performance penalties so
far, but we don't have much data.

I used slugs in Rails apps not to disclose numerical ids from public URLs and
data. There is no need to do that with UUIDs. However, if this were a server
rendered web app I'll probably still generate slugs and use them in the URLs.
UUIDs in permalinks are just too weird to look at.

------
tatersolid
ULIDs are a much better answer than type-4 GUIDs. You get insertion-order
sorting with a large random component to prevent collisions/guessing.

A similar solution called COMB identifiers dates back at least 15 years in the
MSSQL community.

Watch out for sort order in MSSQL with binary ULIDs cast to UNIQUEIDENTIFIERs;
they sort with the last 4 bytes as most significant, so some byte swapping is
necessary to prevent the performance degradation caused by index page
splitting (I wrote a ULID implementation for MSSQL that handles this).

[https://github.com/alizain/ulid](https://github.com/alizain/ulid)

------
Cieplak
Would it be useful to compress 128-bit UUIDs to fit them into narrower columns
and reduce IO? I don't fully grasp the issues this would entail.

Looks like there's an old patent related to this:

[https://www.google.com/patents/US5913217](https://www.google.com/patents/US5913217)

------
NedIsakoff
Who the heck uses a editor with tab completion to make changes to production
database? As a DBA, I'm horrified...

------
danielovichdk
"Use UUIDs as primary keys. They can be freely exposed without disclosing
sensitive information, they are not predictable and they are performant."

When have a primary key ever been sensitive infomation ? And why would they be
predictable ?

~~~
krapp
If you use auto-incrementing IDs as both primary keys and account identifiers,
which a lot of web applications do, it might be possible to identify admin and
employee accounts because they're more likely to have the lower IDs. A recent
Steam hack relied on this to allow bypassing the approval process for
Greenlight[0].

[0][https://medium.com/swlh/watch-paint-dry-how-i-got-a-game-
on-...](https://medium.com/swlh/watch-paint-dry-how-i-got-a-game-on-the-steam-
store-without-anyone-from-valve-ever-looking-at-it-2e476858c753)

------
tinus_hn
UUIDs save you from accidentally relying on incrementing identifiers and they
make it easier to keep them unique.

The downside is that they are really long and can’t be typed or remembered.

------
snarfy
Use both. PK = int, APIID = UUID

Using integers, auto incrementing or not, reduces the resource pressure on
indexing when doing complex joins.

When you need to expose a key, expose the APIID.

~~~
lmm
That's the worst of both worlds. Writes are slower than ever since you have to
update both indices on every insert. If you normalize your database you have
to do a bunch of extra joins for every API response. If you denormalize it you
have twice as many columns and either complex foreign key constraints (slowing
down writes even further) or the possibility of getting your data into an
inconsistent state.

------
zspitzer
this is all great, but what happens when you need to reference several uuids
in an url?

~~~
wheresvic1
Check out twitter snowflake for shorter distributed uuids

~~~
piquadrat
Snowflake has been retired in 2014 [1]. Are there any similar services around
that are still maintained?

[1]:
[https://github.com/twitter/snowflake](https://github.com/twitter/snowflake)

~~~
wheresvic1
Aha yes the server has been retired - but the underlying principles were
documented in a paper and there are quite a few implementations out there.

------
zerostar07
i expected a blockchain solution. disappointed

------
rkagerer
Your problem isn't auto-increment, it's what you're deciding to expose outside
your database.

Many of your critiques stem from the simple assumption that primary keys will
be shared externally (see final link below on why that's not necessarily a
good idea). There's nothing wrong with using auto-increment primary keys, then
creating a different, dedicated field to store external identifiers. Before
UUID's were invented we used simple random numbers for the latter (converting
to hex or base-64 in URL's if more compactness was desired) and they worked
just fine.

I agree 100% with frik's comments. Keep in mind your team will be dealing with
primary keys a LOT. In our own projects I've seen developers become a little
less efficient when working with databases that overuse bulky UUID's. I don't
feel they're particularly "human friendly". They take up more pixel width in
SQL query results, leading to more back and forth scrolling. JOIN results
starting with a whole screenfull of ID garble weren't uncommon. Unless your
developers have eidetic memory they wind up doing more copy and pasting (with
the associated cursor movements, clicks and ergonomic odometer ticks) to jot
them down or share them. Ever tried to name a UUID over the phone? In the
early days of your project, when you're doing a whole heck of a lot of raw
interaction with your database, auto-increments can offer a slight advantage
of being so much more compact, easier to remember, and simple to communicate.
It may not sound like much of an edge, but it's one less friction point
slowing down collaboration.

I think you might also be overlooking the fact that some UUID implementations
are designed to be _unique_ , not _random_ , and depending on which you use
might leak information or be predictable
([https://blogs.msdn.microsoft.com/oldnewthing/20120523-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20120523-00/?p=7553)).
I'd be careful about sharing those primary keys unless you're sure they're
generated from a decent [maybe cryptographically secure pseudo-] random number
generator and aren't derived from the current time / MAC address. If you want
to dive deeper there's a subtle tradeoff here between privacy and ease of
sorting (e.g. in a clustered index).

I empathize with your argument that sequential keys can lend themselves to
typos deleting the wrong record (which can be a pain even in non-production
environments) and have seen arguments to the benefit of knowing UUID's in
advance before a record is inserted (less constraints writing your SQL,
reduced roundtrips) and ease of merging data between databases. It can also be
deadly if your auto-increment generator reuses deleted keys and your system
doesn't account for that. In the past I've created my own ID dispensing
mechanisms optimized according to the features I wanted (e.g. sequential or
random, reservable, globally or locally unique, human-friendly with number of
digits growing in chunks, etc) but admittedly that could be overkill for your
project.

I don't have anything personal against UUID's and don't mean to come off
condescending, I just think you missed some aspects in the message you
broadcast.

Here's a great, easy-to-read analysis of the pros/cons and when to use each:
[https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-
ca...](https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-
careful-7b2aa3dcb439)

------
frik
UUID are ugly, hard to remember in your head, take a lot of space, etc.

I favor auto increment IDs any day, except in specific situations. It's worth
to consider to learn how other sites dealt with IDs in public, while
internally using auto increment IDs. Short: there are several strategies.

But to each their own, if you love UUID, just use it, but don't write lame
blog posts how terrible things are.

~~~
protonfish
I always try to use a semantic key first, if it makes sense. For example, I
see this in my current database:

id utility

1 electric

2 gas

3 water

What is the point of the numeric ids versus simply making the name the primary
key? I'd argue there is none, plus there significant drawbacks.

Every time we start with the utility name and need to get the row id, the
database needs to search on a non-primary key. This is going to be commonly
used in JOINs (and tables designed with superfluous ids lend themselves to
many gratuitous joins.) Optimizing the primary keys based on the typical data
used to search (actual key data from the user's standpoint) will make your DB
blazing fast.

Also, using integers everywhere (or even worse, UUIDs) destroys the
readability of queries. How many integer ids could be used in a query and
still be understandable? The id `3` could mean almost anything, but the id
`water` brings meaning to queries. The benefits to maintainability are
significant.

Lazy DB designers seem to have a bad habit of adding auto-ids to everything
regardless of value. I even see joiner tables (that should be using the
composite primary key of multiple foreign keys) using unnecessary auto ids!
This is sloppy work.

In short, I have found significant benefits to using semantic keys first (if
appropriate - there are definitely reasons to use incremental integer or UUID
keys in many tables) for these reasons:

1\. Fewer columns without loss of meaning 2\. Fewer joins needed 3\. Faster
searches 4\. More human-readable

------
_ao789
I agree

------
tkyjonathan
So, you would rather have your primary keys be a 36 character UUID string
instead of a 4-8 byte integer?

This is for every single row in your database and will increase the size of
your database substantially.

~~~
daigoba66
Don’t store UUIDs as strings.

Use the native UUID data type, which only uses 16 bytes (UUIDs are basically
just 128 bit integers). That is still double the size of a 64 bit integer, but
the trade off is that it can be random and universally unique.

~~~
33degrees
... for databases that have a native UUID type. MySQL doesn't, and the choices
are either store them as strings or store them as non readable bytes. Not the
greatest options.

~~~
dragonwriter
You can store them as bytes and use a function to convert them to/from a
display format (which is what DBs with a built in UUID type do under the
covers.)

~~~
33degrees
Yes, and I've done that in the past; it's still somewhat laborious though.

