As the article points out, this is only an issue with UUIDv1. They claim "However, it is what most applications use." but I have no idea how true this is. I was under the impression that the vast majority of UUID generators were v4 by default. For example:
The `uuidgen` CLI tool, at least for modern versions (I have not checked historically), says (from https://man7.org/linux/man-pages/man1/uuidgen.1.html): "By default uuidgen will generate a random-based UUID if a high-quality random number generator is present." (later it lists /dev/random as such a generator, present on almost all systems)
What's an example of a system that generates v1 uuids by default?
Although plenty of UUIDs are passed as strings in e.g. JSON, I was under the impression that where performance really matters (like db indexes) they were stored and compared as 128-bit fields. To be fair, the points about word sizes and ordering make sense.
Didn’t know this - true as of MySQL 8.X. I would say “shocking” if it wasn’t in keeping with a number of dubious decisions made over the years. We are using MySQL, but thankfully are generating UUIDs in our app layer.
1. Nobody uses UUIDv1. Why use UUIDv1 as a straw man argument?
2. UUID strings are awful for storage -- don't use them. Yes, there are databases that support UUIDs natively, so why is it relevant whether or not a UUID fits into a machine word? You use UUIDs for their other properties that 64-bit integers cannot offer. KSUIDs are touted as fixing all the aforementioned issues but they're even bigger than UUIDs.
3. Both KSUIDs and UUIDs are hard for humans to read compared to 64-bit integers.
4. You don't have to encode UUIDs as hexadecimal numbers plus dashes. You can choose any binary-to-text encoding you want; I am partial to Crockford Base32 because of how general-purpose it is (no vulgarities, case insensitive so it works on Windows filesystems).
5. I still consider time-sortable UUID alternatives (like ULID) to be UUIDs. This article should have explicitly mentioned UUIDv1 and UUIDv4 in the title and it wouldn't have been so flamebait.
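To make point 4 concrete, here is a minimal Python sketch of encoding a UUID's 128 bits as 26 Crockford Base32 characters. The helper names `encode_uuid` and `decode_uuid` are made up for illustration, and the decoder skips Crockford's optional leniency (mapping I/L to 1 and O to 0), so treat it as a sketch rather than a full implementation of the encoding.

```python
import uuid

# Crockford's Base32 alphabet: digits plus letters, omitting I, L, O and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_uuid(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 Base32 characters (26 * 5 = 130 bits)."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 0x1F])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))       # most significant character first


def decode_uuid(s: str) -> uuid.UUID:
    """Inverse of encode_uuid; accepts upper or lower case input."""
    n = 0
    for ch in s.upper():
        n = (n << 5) | ALPHABET.index(ch)
    return uuid.UUID(int=n)
```

The result is 26 characters instead of 36, and because the most significant bits come first, sorting the encoded strings gives the same order as sorting the underlying 128-bit integers.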
> Is my knee-jerk judgement that this advice borders on nonsense, unwarranted?
No, the advice is nonsense. URIs in what scheme?
I mean, since URN is a URI scheme and UUID is a URN namespace, urn:uuid:<uuid-value> is a URI, so “use a URI” is not really a mutually exclusive alternative to using UUID, it's just much less specific.
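As a small illustration of the urn:uuid: form, Python's standard `uuid` module exposes it directly. The UUID value below is just an arbitrary example.

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

# The same identifier written as a URI in the urn:uuid: namespace.
as_uri = u.urn

# The constructor also accepts the URN form and yields the same value.
parsed = uuid.UUID("urn:uuid:123e4567-e89b-12d3-a456-426614174000")
```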
Yeah, everything I’ve seen (and written) which uses UUIDs uses UUIDv4, and the main alternative for similar use seems to be ULID, not any of the other UUID versions.
Similar to other comments, I've only encountered v4 in my career. Is there a large domain where v1 is the norm that dominates the statistic, and most people happen to not work in that domain? If the author knows, I wish they'd say.
> They are awful as keys – being strings, comparisons are dramatically slower than with integers. And even if your database has a UUID type, it’s still worse because the identifier doesn’t fit into a machine word.
I’m just a bit confused, a UUID is made up of hexadecimal digits, so why would it be stored as a string? It’s also 128 bits long, so it should fit into two words, excluding whatever overhead the DBMS puts on the data type, which is really their problem to worry about.
This is not an uncommon mistake. People, such as the author, seem to think that a UUID is a hex string containing dashes, since that is the format in which they most frequently see them represented.
You are correct that a UUID is a 128 bit identifier, and so, fits in 128 bits.
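A quick sketch of the difference in Python, using only the standard `uuid` module (the UUID value is an arbitrary example): the dashed text form is 36 characters, while the value itself is 16 bytes, i.e. one 128-bit integer.

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

text = str(u)    # canonical text form: 32 hex digits plus 4 dashes
raw = u.bytes    # the actual value: 16 bytes, big-endian
as_int = u.int   # the same value as a single integer

# Round-trip from the compact binary form back to the same UUID.
restored = uuid.UUID(bytes=raw)
```

Storing and comparing `raw` (or a native UUID column) avoids the string overhead the article complains about.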
A company which handles money as its primary business is using MongoDB? Hopefully the actual dollars are always handled in a “real SQL” DB with proper transaction isolation.
I recalled that specifically because I remember having the same reaction when I found it out (I think? I have a distinct memory of learning this) back in 2012 or something.
I've had a similar issue with MongoDB's ObjectIDs. They are generated using a combination of process id, UNIX timestamp and a counter that is randomly initialized during process creation. The issue when Docker comes into the mix is that the root process id of every container is 1, so a decent chunk of entropy is removed from the ObjectID. Add to that the fact that the timestamp only has one-second resolution, and the only thing saving you is praying that the counters of your processes never overlap during the same second.
It's unlikely to happen but still possible and it has brought down some of our parallel worker pool because once you have a collision, you are bound to keep generating the same id sequence until you restart your whole process to randomize the counter again.
MongoDB eventually switched away from doing that and now just generates a random value for what used to be the pid and machineid fields of ObjectIDs. The timestamp is still there because people rely on being able to sort on that (which is a bad idea for various reasons), but it's at least 5 bytes (40 bits) of randomness now.
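For illustration only, here is a rough Python sketch of that layout as I understand it from the ObjectId documentation: a 4-byte seconds timestamp, a 5-byte random value chosen once per process, and a 3-byte counter starting from a random value. This is not the driver's code, and `new_object_id` is a hypothetical name.

```python
import itertools
import os
import struct
import time

# Assumed layout: 4-byte timestamp | 5-byte per-process random | 3-byte counter.
_PROCESS_RANDOM = os.urandom(5)
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id(now=None) -> bytes:
    """Return 12 bytes in the ObjectId-style layout described above."""
    seconds = int(time.time() if now is None else now)
    count = next(_counter) & 0xFFFFFF  # wrap the counter at 24 bits
    return struct.pack(">I", seconds) + _PROCESS_RANDOM + count.to_bytes(3, "big")
```

With the pid replaced by random bytes, two containers that both run as pid 1 no longer share the middle field, so a collision needs the random value, the counter and the second to all line up.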
I've never thought UUIDv1 was useful in any virtualized context. I hope that is obvious, but maybe it's worth stating in the UUID generation docs. The Python docs already explain the versions reasonably well.
However, with all the things already supporting UUID, I also don't see any reason to switch from UUIDv4 to anything else. I don't see how UUID in general is obsolete, given the support it has from different libraries and databases.
The ULID spec encourages people to assume they can be sorted in create time order, but it does not handle clock skew. I wouldn’t use it except in a system that can rely on having a single monotonic clock, because I worry about things that are almost true.
I was confused by this title because I only use uuid v4... the author covers that in the article, but I'm surprised that so many people use uuid v1. I thought v4 was the most popular, but that's probably just because I mostly work with my own code.
Is there any reason to use anything except completely random UUIDs? I vaguely remember reading about problems with MAC-based UUIDs decades ago, my impression was that they have been discouraged for a long time already.
> Note: this is only correct about UUID version 1. However, it is what most applications use.
Okay, so, not all UUIDs, just v1. And, for some anecdata, I've actually only interacted with UUID v4 in my entire career; I don't know what the actual norm is, but I'm surprised to hear that it might still be v1.
> The only other practical option is version 4 – the random UUID – but random is intuitively worse, right? Read on to find out.
Oh… how is it worse?
> * They are awful as keys – being strings, comparisons are dramatically slower than with integers. And even if your database has a UUID type, it’s still worse because the identifier doesn’t fit into a machine word.
> * They are excessively long – each character of a UUID only encodes 3.5 bits of information if you count the dashes. That’s twice as less compared to 6 bits of Base64.
Sorry, UUIDs are not strings, they're 128-bit integers. They have a standardized string representation, but if you're storing a UUID as a string, you're either being required to because your language/db/tools/etc. don't support UUIDs correctly, or you're doing it wrong.
> * They are not time-ordered – despite containing a timestamp, its bits are mixed up within the UUID: the top bytes of the UUID contain the bottom bytes of the timestamp. Databases do not like an unordered primary key – it means that freshly inserted rows can go anywhere in the index. And you can’t use UUIDs for ad-hoc time sorting by time, either.
This is definitely a drawback when using a UUID as a primary key, and there are alternatives for this specific use-case. However, I think the best solution I've seen to this is to use a typical 64-bit integer for the primary key, but a UUID for a user-visible ID (so that you don't leak information about the primary keys to users); this makes joins and indexes fast, but avoids the leak to the end-user.
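To see the ordering problem from the quoted bullet in practice, here is a small Python sketch that builds two version 1 UUIDs by hand from 60-bit timestamps one tick apart. `v1_from_timestamp`, the node value and the timestamps are arbitrary choices for the demo; the field layout follows what the standard `uuid` module uses for `uuid1()`.

```python
import uuid


def v1_from_timestamp(ts: int, clock_seq: int = 0, node: int = 0x0242AC110002) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (100 ns ticks)."""
    time_low = ts & 0xFFFFFFFF              # LOW 32 bits go first in the UUID
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = (ts >> 48) & 0x0FFF   # HIGH 12 bits come last
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3F
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi_version,
                clock_seq_hi_variant, clock_seq_low, node),
        version=1,  # lets the constructor set the version and variant bits
    )


# Two timestamps one tick apart, chosen so the low 32 bits wrap around.
EARLIER = v1_from_timestamp(0x1ED0000FFFFFFFF)
LATER = v1_from_timestamp(0x1ED0000FFFFFFFF + 1)

# Sorting by raw bytes (what an index on the binary value would do).
by_bytes = sorted([EARLIER, LATER], key=lambda u: u.bytes)
```

Here `LATER` sorts before `EARLIER` by bytes even though its embedded timestamp is larger, which is exactly why v1 values land all over a clustered index.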
> * They are bad for human comprehension – UUIDs tend to look alike, and it’s hard to visually seek and compare them. This comes from experience.
This is exactly why they shouldn't be used as an Id anywhere that a human needs to interact with one. In the above solution I mentioned, the most common ID for which you'd want to use a UUID is the user's id—the user specifically has no reason to ever refer to their or anyone else's id; they'll use the human-readable username/handle equivalent instead. And developers don't need to care about UUIDs ever because inside the db, you'd have the integer primary key that you use for joins. This seems to solve all the problems?
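A minimal sketch of that two-ID layout, using SQLite from Python just because it is self-contained. The table, column and function names are invented for the example; a real schema would use the database's native UUID type where one exists.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,   -- internal key, used for joins
        public_id BLOB NOT NULL UNIQUE,  -- 16-byte UUID shown to clients
        handle    TEXT NOT NULL UNIQUE   -- what humans actually type
    )
    """
)


def create_user(handle: str) -> uuid.UUID:
    """Insert a user and return the externally visible UUID."""
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO users (public_id, handle) VALUES (?, ?)",
        (public_id.bytes, handle),
    )
    return public_id


def internal_id(public_id: uuid.UUID):
    """Translate the public UUID to the integer key at the API boundary."""
    row = conn.execute(
        "SELECT id FROM users WHERE public_id = ?", (public_id.bytes,)
    ).fetchone()
    return row[0] if row else None
```

The lookup by UUID happens once per request; everything after that joins on the small sequential integer, and the integer never leaves the database layer.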
> I kindly suggest that UUIDs are never the right answer.
Honestly, I think you've only convinced me that UUID v1 is never the right answer… and I think that's mostly been true since v4 came about.
The database argument is also specific to certain databases and table types. It should only matter for clustered tables. So if e.g. I use Postgres, this doesn't matter as it doesn't have clustered tables.
Sometimes, I am amazed about what gets on the front page of ycombinator.
TL;DR: Don't use UUID v1 if your cloud provider generates the same MAC address for all your containers, since its uniqueness depends on the MAC address.
In practice, I generate UUIDs entirely using entropy from /dev/random. The probability of a collision is really low for most use cases (although not if you are Google and need something unique across all database rows in your company or something similar).
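For a rough feel of "really low", here is a back-of-the-envelope Python sketch using the standard birthday approximation. The function name is made up, and the default of 122 random bits assumes a proper UUIDv4; pass 128 for raw random data.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    space = 2 ** random_bits
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


p_billion = collision_probability(10**9)    # a billion IDs
p_huge = collision_probability(2**61)       # roughly 2.3e18 IDs
```

A billion IDs gives a vanishingly small probability; you need on the order of 2^61 of them before a collision becomes roughly as likely as not.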
Hopefully you're setting the appropriate bits. A UUID is a bit-packed struct/union at heart, really; if you're just reading 128 bits of random data from /dev/random, that's not a UUID; passing it off as such would be needlessly confusing.
(It's fine to make a new format / it's not a terrible approach for making a random ident, though you might want to peek into, e.g., ksuid from the OP for some interesting points about why you might not want to do that, plus some advice about getrandom() over /dev/random.)
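For reference, "setting the appropriate bits" is a one-liner in Python: passing `version=4` to the `uuid.UUID` constructor overwrites the version and variant fields of whatever 16 bytes you hand it. A minimal sketch, with `os.urandom` standing in for your entropy source:

```python
import os
import uuid

raw = os.urandom(16)  # 128 random bits from the OS CSPRNG

# version=4 forces the 4-bit version field and the 2-bit variant field,
# leaving 122 bits of the original randomness intact.
u = uuid.UUID(bytes=raw, version=4)

text = str(u)
```

In the text form the version shows up as the first digit of the third group, and the variant as the first digit of the fourth group (one of 8, 9, a or b).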
This is a bold claim and doesn't match my experience at all. UUIDv4 is all I see, everywhere, everyday.
That's also a big enough caveat to put in the title: if you have a beef with UUIDv1, say UUIDv1 is obsolete.