Moving Away from UUIDs (2018) (neilmadden.blog)
281 points by Alupis on Nov 21, 2022 | hide | past | favorite | 230 comments



If you're using them for unguessable random strings then yeah, they're not ideal.

If you're using them for providing a unique id in a distributed system, with very little chance of collision & fitting them in a db column, then they are great.


Pretty much. My first reaction was "people use UUIDs for session tokens? Why?"

Seems like the author made some bad choices in previous systems and only now figured out why, tbh.


I’m not sure it’s bad to use a random UUID (v4) generated with a random number generator designed for cryptography for a validated session key.

A guess means making a request to your server, so you won't be facing anything like ~2^64 guesses per second.

I’m not suggesting anyone do it, if you have a choice. (Especially consider you’ll probably have to go through the trouble to justify it to people who read articles like this but don’t understand the math.) But if you have an existing system, consider whether you can let it stand.


Well, an existing system (that for whatever reason can't do CSPRNG -> base64) could always concatenate two UUIDs.
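As a sketch of what that looks like in Python (assuming the platform's `uuid4()` is CSPRNG-backed, as CPython's is):

```python
import uuid

def double_uuid_token() -> str:
    # Two independent v4 UUIDs: each carries 122 random bits
    # (128 minus the fixed version/variant bits), so the pair
    # gives ~244 bits of unguessability.
    return uuid.uuid4().hex + uuid.uuid4().hex
```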


Depending on the UUID algorithm, some are generated from a cryptographically strong random source; in that case it would make sense.


what if you sha it?


Adding a crypto hash lets you check that the hashed value was not changed, because finding another value with the same hash is hard, by definition of a cryptographic hash.

But here the problem is not forging an ID, it's guessing an ID, and hashing does not widen the search space, does not increase randomness.


> Adding a crypto hash

I think the poster you replied to was meaning using the hash output as the token, not that you would maintain the original token and a salted hash for verification.

If they are thinking SHA(GenerateUUID()) would have better entropy then they are incorrect even though all SHA variants output more than the 128-bits in the source UUID. I assume such misunderstanding comes from the fact that some PRNGs are based upon repeated application of cryptographically assured hash functions against the seed data.

Using some irreversible transform would solve the issue of potentially leaking information in the UUIDs, but if that is a concern then instead use a UUID variant based on purely random data (v4), as that would be more efficient and would not produce a value that is longer but contains no extra entropy.


That actually reduces the usefulness as you're hashing the data into a smaller length.


It seems UUIDs are 128-bit, while SHA-1 is 160-bit. There are also SHA-256 and SHA-512 for longer hashes. So there shouldn't be any worries about the hash being shorter.


Rereading, I'm guessing you're merely pointing out that the claim about shortening the length is untrue. If you already understand the entropy issue here, please treat my "you"s as royal you's.

You have a 128 bit value. That's 128 binary digits. Each digit can be zero or one. That means you have 2^128 possible distinct values. (Ignoring the fixed bits in UUIDs since it's not important for sake of this argument.)

Now you apply a one-way cryptographic hash on top, like SHA-256. It returns the same hash every time for a given input, and outputs for different inputs are nearly always distinct. The hash output may have more bits, but the number of distinct values can't increase; it can only ever decrease, because you can only ever feed it 2^128 different values. How could it return more distinct outputs if each input maps to exactly one output?

To make it more clear, let's say you have a database where you want to store a customer's zip code so you can use it as some kind of validation later on to ensure it matches, but you don't want to store it in plaintext, so you hash it. The hash is 160 bits. Secure, right? Wrong. There are fewer than 50,000 zip codes. It would be trivial to compute the hash of every single one and build a simple hashmap from hashed value to plaintext.

You may be thinking this is impractical for an input domain as large as 2^128, but realistically it only adds a slight roadblock. Knowing the only valid values will be hashed UUIDs, instead of picking 160 random bits, you'd be much better off picking a random UUID, hashing it, and trying that for each attempt.
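Putting the zip-code argument above into code (a small illustrative sketch, using SHA-256 rather than a 160-bit hash):

```python
import hashlib

# Precompute the hash of every possible 5-digit zip code once;
# inverting any "hashed" zip is then a single dictionary lookup.
# Hashing changed the representation, not the search space.
rainbow = {
    hashlib.sha256(f"{z:05d}".encode()).hexdigest(): f"{z:05d}"
    for z in range(100_000)
}

stored = hashlib.sha256(b"90210").hexdigest()  # the "secure" column value
recovered = rainbow[stored]
```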


Yes, some hashes might not meaningfully hurt it, but they won’t add any entropy, which is the real problem.


Not being snarky: what's the risk of using UUIDs for session tokens if they are created by the server/db and are always verified by server (db) (for authorisation etc)?


Well, v4 UUIDs per the wiki are pretty random, but your generated UUID could actually use your MAC address and current time to be globally unique, meaning less entropy. Just use them as a (globally) unique thing, not as a secret.


Basically, know your UUID generator type. V1, V2, V6 and V7 are MAC/time dependent and more useful for e.g. DB keys, whilst V4 is more useful for things that should actually be secret.


So there's nothing actually wrong with UUIDs as secrets, if you know what you're doing and how to mitigate the risks?

So pretty much the same as every other damn thing in software that gets an "X Considered Harmful" article? :-D


I would trust a reputable cryptographic random number generator library to really care about generating truly unguessable, high entropy cryptography-grade random numbers. I would trust a reputable UUID library to generate a UUIDv4 which is random enough to not produce a collision. I would not trust a reputable UUID library to generate truly unguessable, high entropy cryptography-grade UUIDv4s.


Not really. The article's point is that even a v4 UUID (the random one) doesn't have as much randomness as other options, and it has a much less compact representation.

UUIDs are not designed to be secrets, so they are a poor choice. They'll probably work, but there are better options.
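For instance, Python's `secrets` module (one of the "better options" alluded to here) produces a shorter, higher-entropy token:

```python
import secrets

# 160 random bits, URL-safe base64 encoded: 27 characters versus
# a UUID's 36-character hex-and-dashes form, with more entropy.
token = secrets.token_urlsafe(20)
```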


If you know what you're doing and mitigating the risks, you don't waste your time trying to use UUIDs for secrets. Therefore people using UUIDs for secrets, by definition, don't know what they're doing and certainly aren't mitigating the risks.


UUID is fundamentally just a binary --> text encoding for 128-bit numbers.

There's nothing whatsoever wrong with using a cryptographically secure mechanism to generate a random 128-bit number and then representing that as a UUID in plaintext.
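In Python, that pattern is a one-liner (and CPython's own `uuid.uuid4()` is implemented essentially this way, reading 16 bytes from the OS CSPRNG):

```python
import secrets
import uuid

# All 128 bits come from the OS CSPRNG; the constructor then
# stamps in the version (4) and variant bits so the result is a
# well-formed random UUID.
u = uuid.UUID(bytes=secrets.token_bytes(16), version=4)
```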

The issue would be using a UUID generator (there are many versions, and several of those use MAC addresses and time for a bunch of the "entropy" - so they are not cryptographically secure / random).

Your comment is overly reductive.


You’re splitting hairs and missing the point of the article.

Nobody is referring to “UUID” and just meaning the representation. I would think it’s obvious people are referring to using a UUID generator e.g. `uuid.uuid4()` so no, I’m not being overly reductive. I’m just following the common understanding that everyone has when we say “UUID.”


In my experience it is sometimes common practice, even at bigger companies.


Yeah, I don't really get the point of this article. If you need random values of a specific size, don't use a UUID; it's literally specified to be one exact length and format.


>>Yeah I don't really get the point of this article,

To get clicks?


You're not wrong lol


The number of comments saying "using UUIDs for secrets isn't that bad" suggests this article needs to be written...


one exact length and five "versions" of the format (so far)

https://en.wikipedia.org/wiki/Universally_unique_identifier#...


I made a comparison list of the best-known UUIDs out there a couple of days ago; it was quite fun discovering all the different kinds of IDs and their pros/cons.

https://adileo.github.io/awesome-identifiers/



KSUIDs are fairly popular and missing from your list:

https://github.com/segmentio/ksuid


What's the resolution on those? 32 bits over ~100 years... that's seconds, right? Doesn't sound excellent for time ordering. 100 years also seems a little short, but at least I'll be dead.


Don't look at it as being your problem in 100 years, but as helping employment in 100 years and helping the economy ;)


ULID example should be in uppercase.

Love this chart tho.


Also most well-designed systems only use the UUID as the representation format and use raw bits in performance-critical parts.


The raw bits are the UUID, the hex string is just a human-readable representation that also plays nicely with JSON.


Tell that to Django (well, five years ago anyway, IIRC; I don't know what it does now). Pretty sure it used to store UUIDs as string columns in your SQL.


I suppose Django wouldn't consider the speed gains of using raw integers in the database worth the hassle of dealing with binary data when you have to manually deal with the database somehow. I usually use string columns for UUIDs myself for the same reason.

It's also not given that it'll be a performance benefit, you probably receive UUIDs as strings from some client and probably want to return UUIDs as strings to the client, and that conversion isn't free.


Yep, looks like it does the right thing in PostgreSQL but not anywhere else [0].

https://docs.djangoproject.com/en/4.1/ref/models/fields/#uui...


I feel like it did strings in Postgres too, not too long ago, and I had a <brain explode> moment when I worked on a codebase and had to figure out why queries were terrible.


Supposedly the behavior hasn't changed since at least version 1.8:

https://docs.djangoproject.com/en/1.8/ref/models/fields/#uui...

It may not have worked correctly on your project for some reason?


Actually, now that I remember, we were using MySQL and I was pushing for migrating to Postgres partially for this reason. Sorry, it was a long time ago in startup time.


Or to PowerBI, which will cast any UUID to a string, even in joins. That cast + string comparisons + killing of indexes is not conducive to performant queries...


It’s a 128 bit integer - the serialization format does not change the fact.


Use uint128_t instead.


It is also highly recommended that you include a check digit in it, to minimize the chance of a collision. I've used https://arthurdejong.org/python-stdnum for that purpose.


I don't see how a check digit minimizes the chance of collision. (Here, I'm assuming that a check digit is calculated from the other digits. What am I thinking about incorrectly?)


Looking at the docs for the library linked, it appears to be a Verhoeff algorithm check digit... so yeah, you're correct.

This is effectively a simplistic stand-in for a CRC type system -- useful to detect if the data has been corrupted, but not useful to avoid collisions.


And if someone is worried about UUID collisions, they need to rethink their priorities in life.


You are correct, this should teach me not to write comments when I'm too tired. :/

The check digit wouldn't really help with collisions, since if the strings are the same the digit will be too. They are primarily useful when we need to ensure correctness on human input.


There's probably a non-trivial number of folks who equate a UUID with "unguessable" given their appearance. They are, after all, not sequential, and using them to obscure things like number of users (a UUID in place of an incrementing number) seems like a natural fit.

Given how easy it is to generate a UUID in most languages, and given the low likelihood of a collision within a system, it wouldn't be a huge leap to think UUIDs could replace homebrewed random string generators for things like password reset tokens, etc.


> There's probably a non-trivial amount of folks that equate a UUID with "unguessable" given their appearance.

That's near enough to true for anyone not operating at "web scale".

FAANG/BAT engineers need to care. My systems with 10s or 100s of thousands of users (or, you know, a few thousand users tops) are without doubt going to be re-written (probably several times) well before I have to worry about having so many UUIDs in the wild that this becomes a reasonable thing to worry about.

For me, at the scale of systems I run (or will conceivably run in the medium term future), I think the simplicity/understandability of code that uses native language UUID functions is "the right thing". Whoever does the next big rewrite to support a few million MAU will be thankful they don't have to work out WTF I was thinking when I decided to roll my own random access tokens.


I doubt FAANG engineers need to care either. Ignoring that the author imagines 8k IoT devices per living human for one service, 2^64 requests per second is an absurd number to use. Assuming one server can do 10M RPS, you'd need 1.8 trillion servers to handle that load. You'd also need over 2 billion Tb/s of bandwidth to receive just the UUIDs with no overhead.

It doesn't matter what computing resources your attacker has; the limit is how much your infrastructure can handle, and the author casually overestimates that by about 10 orders of magnitude. So replace 35 minutes with 350 billion minutes, or about 660,000 years.


Thanks for this. I thought I must be missing something because this seems like such an obvious point.

I find it hard to believe that there is a problem with a (cryptographically random) 122 bit session key considering that a brute force attack on it will result in a DDoS, which is obviously self limiting.

Lots of people here are saying “never use a uuid for a session key”, but I don’t understand this. What’s the accepted entropy for a session key?


I think the even more absurd recommendation is to use 160 bits as a "sweet spot". Why? Who said that? Which real-world scenarios? Why not 159 or 161?

Then you realize the author is just talking out their rear end with no thought...

"Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.


> "Yes I often find my cracking buddies with their super computers just give up hacking my online user service when I bumped my user token length from 159 to 160 length", said nobody, ever.

Reminds me of this sketch: https://youtu.be/IHfiMoJUDVQ


Even they shouldn't need to be concerned much with collisions. Wikipedia suggests[0] "generating 1 billion UUIDs per second for about 85 years". Is it possible? Sure. Is it likely? Not really.

[0]: https://en.wikipedia.org/wiki/Universally_unique_identifier#...


I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

But yeah, even that is a very, very low risk. The article had to make some outrageously pessimistic assumptions to get its "38 minutes!" number: issuing a million tokens a second with two-year validity, getting attacked with the entire hash rate of the Bitcoin mining community, and having both enough backend capacity to handle all those requests and no observability or rate limiting to mitigate a brute-force attack.


> I guess from the article it's not just collisions, but the (significantly more likely) problem of guessing a UUID that's valid (out of all the issued tokens).

Assuming random UUIDs:

If you're counting all the UUIDs anyone makes, then valid<->attacker matches are a subset of all possible collisions and therefore less likely.

If your baseline is only the collisions between valid UUIDs, then whether an attacker is more or less likely to collide depends on whether they're generating UUIDs at least half as fast as the system they're attacking.


> That's near enough to true for anyone not operating at "web scale". FAANG/BAT engineers need to care.

I’d argue even then it’s really not much a concern. You’d need to generate 1 billion UUID v4’s per second for over 75 years to have a 50% chance of there being a single collision.


You can generate sequential UUIDs, IIRC; that's the best way to store them in a DB and still have good partitioning/indexing. I don't use UUIDs often, but I vaguely remember researching this problem space at some point.


I think most languages let you choose which version of UUID you want, with most defaulting to the random version (v4, I think).

There are other versions that are sequential/time-based though, but using these could open the door to de-obfuscating whatever data you wanted to protect via UUID's in the first place (like how many sales orders you receive per hour, etc).


I don’t think uuids are designed for obfuscation, though they certainly help with that as a side effect. I could be wrong though, I’ve never looked into it.


They (randomized type 4 UUID's) obfuscate as a side effect because they are much more difficult to guess due to their randomness. As the article points out though, they are not impossible to guess... but it will come down to your risk tolerance and what the UUID's are "protecting".

People like to reach for UUID's when obfuscation is needed because inventing your own duplicate-aware random string algorithm isn't what most folks want to spend their time thinking about. Plus, these days, many databases come with UUID-aware data types that make using UUID's fairly straight forward.


UUIDs are a vast improvement over integers for preventing simple attacks like +/-ing the id and seeing what happens.


But then you're back to collisions, and you may as well be using longs.


I think v7 uses milliseconds since epoch + random data. The odds of a collision should be practically 0, or more likely to find a sha256 collision.


> more likely to find a sha256 collision.

This is obviously, and egregiously, false.


I don't know. You'd need quite a number of threads + machines generating UUIDs in the exact same millisecond to get an opportunity for a collision. It doesn't seem obviously false.


I didn't say a collision is easy, I said it's obviously false it's harder than colliding a sha256, a space roughly 95780971304118053647396689196894323976171195136475136 times larger.


It /can/ be smaller though, as the probability is linked to the number of global threads generating UUIDs in any given millisecond. Thus if you have enough machines generating UUIDs, you'll have more chances for a collision. Given that it is only possible to generate a collision in a given millisecond, and not globally across ALL TIME, over ∞ time the probability of a collision(UUID) == 0, while a collision(sha256) == 100%.


Your counter rolls eventually. You're underappreciating how much larger 2^256 is than 2^80 or even 2^128.


“Moving Away From Misusing UUIDs”


My only wish is that UUIDs were sortable and still contained their timestamp. When bug hunting, sometimes things become a little more obvious when there is an exact start and end to ids with issues.


KSUIDs aim to satisfy this.

A Go reference implementation: https://github.com/segmentio/ksuid



Depends on the version used. Some of them do encode time. But since people don’t like to leak information they use the random version (4).


They're little endian so not sortable


What does that have to do with anything?


>>>> My only wish is that UUIDs were sortable and still contained their timestamp. When bug hunting, sometimes things become a little more obvious when there is an exact start and end to ids with issues.

>>> Depends on the version used. Some of them do encode time.

Encoding time isn't enough; it has to be big endian (unless you write a special sorting function for UUIDs). Timestamped UUIDs store the timestamp as [timestamp_low, timestamp_mid, version(!), timestamp_high][1], which doesn't sort correctly.

[1] https://en.m.wikipedia.org/wiki/Universally_unique_identifie...
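A small Python illustration of the workaround: sort v1 UUIDs by their decoded timestamp rather than by their string form (CPython's `uuid1()` guarantees strictly increasing timestamps within a process):

```python
import uuid

# A v1 UUID's string form orders its timestamp low-mid-high, so
# lexicographic sorting does not give temporal order. The decoded
# 60-bit timestamp (`.time`, in 100 ns ticks since 1582-10-15)
# does sort correctly.
ids = [uuid.uuid1() for _ in range(5)]
by_time = sorted(ids, key=lambda u: u.time)
```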


According to that Wikipedia page the binary representation of UUID 1 is big endian. It's the date-time and MAC address version.


The whole 128-bit value is encoded in big endian. But the fields decompose into something that's not. If you search the page for "UUID record layout" you'll find the timestamp as I've described it.


Last week I replaced an old broken window. I took a fragment of glass to the local glass company, and asked whether they cut custom glass. The receptionist said, "No, we do not." After some back-and-forth, I realized she thought I wanted to provide my own glass. I clarified, "I am just trying to replace a broken window." She handed me a form, and the next day I had a correctly shaped glass pane.

If UUIDs contain time information, then they can be sorted by time. The details of the encoding, while important for actually implementing the sorting algorithm correctly, don't really seem relevant when reasoning at a high level?


You can use ULID and store it as UUID since they are the same size. You can check this article for the details:

https://blog.daveallie.com/ulid-primary-keys


UUIDv7 is sortable by time, but I'm not sure if it's possible to derive the timestamp from the UUID somehow.


The first 48 bits of a UUIDv7 are the number of milliseconds since the Unix epoch.
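As an illustrative sketch (many stdlib `uuid` modules don't ship a `uuid7()`, so this builds one by hand per the v7 layout and reads the timestamp back out; not a production implementation):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    # 48-bit millisecond Unix timestamp in the top bits, random
    # bits below, with the version nibble (bits 76-79) and the
    # RFC variant (bits 62-63) stamped in.
    ms = time.time_ns() // 1_000_000
    value = (ms << 80) | int.from_bytes(os.urandom(10), "big")
    value = (value & ~(0xF << 76)) | (0x7 << 76)   # version 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # RFC variant
    return uuid.UUID(int=value)

def uuid7_timestamp_ms(u: uuid.UUID) -> int:
    # The timestamp is simply the top 48 of the 128 bits.
    return u.int >> 80

u = uuid7()
```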


I’ve always liked the pattern of putting timestamps on any objects in my DBs.


I implemented it myself. Was a little bit tricky, but not rocket science.


MongoDB's ObjectId has this property.


Something I don't understand: how are UUIDs not safe given that they are probably better than 99.9999% of passwords generated by users?


Does your UUID library use a cryptographic safe RNG?


Java's does, and that's the implementation the article discusses.


But this is the point though, UUID is the wrong tool for the job. You want a cryptographically random blob of entropy and you reach for a UUID because it happens to contain some of that in a specific implementation.

UUIDs are for uniqueness and involve implicit trust. Cryptographic libraries are what you need to generate entropy blobs without weakening security/confusing the next developer etc.


UUIDs are nearly half the mac address of the server + a timestamp. They are in no way random.


That's UUID v1. The random one that everyone uses is v4.


I have seen some common libraries that default to v1, so I can see why there's some confusion here.


> Something I don't understand: how are UUIDs not safe given that they are probably better than 99.9999% of passwords generated by users?

UUIDs are 128 bits. Which is beat by a 5 character a-z random string.

It's certainly possible that they're better than the median password - especially if there isn't a check against a common password list. But it's pretty easy for user chosen passwords to be much, much better.

I strongly doubt that your 6 9s estimate is accurate.


> UUIDs are 128 bits. Which is beat by a 5 character a-z random string.

A sibling gives the actual math that shows how wrong this is, but this doesn't even pass the most rudimentary sniff test. The most common encoding for a lowercase string would be in 8 bits per character, so a 5 character string can get you at most to 40 bits.

And that's assuming you allowed every one of the 256 possible characters. You're restricting it down to 26 characters.

EDIT: I was curious, so I checked. Even if you allowed every current Unicode character, 5 characters only gets you to ~86 bits of entropy:

log2(149186^5) ~= 85.9

As for the original 6 nines claim, I also calculated the entropy for a 14 character random password that allows all 62 letters+numbers plus 8 special characters:

log2(70^14) ~= 85.8

It's not until 20 characters that it matches a UUID v4. So, yeah, I'm okay with OP's 6 nines.
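These figures are easy to reproduce (the numbers below just restate the ones above):

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    # Bits of entropy in a uniformly random fixed-length string.
    return length * math.log2(alphabet_size)

entropy_bits(26, 5)        # ~23.5: five lowercase letters
entropy_bits(149_186, 5)   # ~85.9: five of any Unicode character
entropy_bits(70, 14)       # ~85.8: 14 chars from a 70-symbol set
entropy_bits(70, 20)       # ~122.6: finally past a v4 UUID's 122
```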


128 bits are 16 bytes, which is at best a binary string of 16 characters. Remove some bits for the not random parts of the UUID and still you don't get down to 5 characters. Furthermore "a 5 character a-z random string" is less than 5 bits per character. Make them less than 6 by adding A-Z and the ten digits.

About storage, at least PostgreSQL has been using 16 bytes of storage since at least version 8, many years ago.

https://www.postgresql.org/docs/current/datatype-uuid.html

https://www.jacoelho.com/blog/2021/06/postgresql-uuid-vs-tex...


A 5 character a-z random string has log2(26^5) =~ 23.5 bits of entropy, way less than 128.


The best case for a 5 ascii character password is 7 * 5 = 35 bits.


Also UUID v3 and v5 produce IDs from identifiers such as URLs which can be quite useful if you want two different systems to generate the same exact UUID given knowledge of the same URL.

For example, in a REST system that needs UUIDs I'd use the REST URL of the object as the UUID.
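A quick sketch with Python's stdlib (the URL is a made-up example):

```python
import uuid

# v5 is SHA-1 over a namespace UUID plus the name, so two systems
# that agree on the URL derive the same UUID with no coordination.
a = uuid.uuid5(uuid.NAMESPACE_URL, "https://api.example.com/orders/42")
b = uuid.uuid5(uuid.NAMESPACE_URL, "https://api.example.com/orders/42")
```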


The best format:

{opaqueTokenTypePrefix}_{crockfordEncodedEntropy}

Also: pass the token through a bad-words and "credit-card lookalike" filter.

Optionally encode author cluster/region details in the low order bytes to resolve before eventual consistency in active-active systems.
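A sketch of that format (the `sess` prefix is hypothetical, and the bad-words / lookalike filtering and region bytes are omitted):

```python
import secrets

# Crockford's base32 alphabet drops the ambiguous I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def crockford_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    chars = []
    for _ in range((len(data) * 8 + 4) // 5):   # ceil(bits / 5)
        chars.append(CROCKFORD[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))

def make_token(prefix: str, entropy_bytes: int = 20) -> str:
    # The prefix names the token type, so leaked tokens can be
    # identified by secret scanners.
    return f"{prefix}_{crockford_encode(secrets.token_bytes(entropy_bytes))}"

token = make_token("sess")
```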


> If you're using them for unguessable random strings then yeah, they're not ideal.

Why? I like to use them for private/secret URLs ...


> If you think you are likely to attract this kind of attention then you might want to carefully consider which side of the Mossad/not-Mossad threat divide you live on and maybe check your phone isn’t a piece of Uranium.

To be honest, we get something like this kind of attention (Tbps of forged requests / brute-force registration attacks per day), and all we do is provide a free API that's rate-limited per user account for multitenant-QoS reasons. People do all sorts of crazy stuff to try to sneakily drip-register a thousand accounts over several weeks so they can then launch some big job that uses all the keys in tandem to evade the rate limits.

Little do they know, they don't get any benefit from that, even while they're nominally "getting away with it"; doing that just makes our servers fall over! :P

---

Separately, you should really consider a level between "Mossad" and "not-Mossad": the sorts of people who hack crypto exchanges. They tend to use exactly the kind of "saw it in a movie"-level techniques that you'd think wouldn't happen because "you can just use rubber-hose cryptanalysis." Except if you're a socially-anxious math-genius fifteen-year-old living in Belarus, and there's a cryptosystem that you have unlimited access to a local copy of, maybe the rubber-hose cryptanalysis is actually harder!


> you should really consider a level between "Mossad" and "not-Mossad": the sorts of people who hack crypto exchanges.

Bad example, since the biggest crypto hacker is North Korea.

> North Korean government-backed hackers have stolen the equivalent of billions of dollars in recent years by raiding cryptocurrency exchanges, according to the United Nations. In some cases, they’ve been able to nab hundreds of millions of dollars in a single heist, the FBI and private investigators say.

https://edition.cnn.com/2022/07/10/politics/north-korean-hac...


I always wonder about these "North Korean" hackers given that I've heard the average North Korean citizen's access to computers (and the outside internet) is extremely limited. Where are they getting these world class hackers from if this is the case?

It must be a group of outside parties who use North Korea as a mask and/or employer, right?


> Where are they getting these world class hackers from if this is the case?

The New Yorker ran a piece on this recently-ish[1]:

  The most promising students are encouraged to use computers at schools. Those who excel at mathematics are placed at specialized high schools. The best students can travel abroad, to compete in such events as the International Mathematical Olympiad. Many winners of the Fields Medal, the celebrated prize in mathematics, placed highly in the contest when they were teen-agers.
They cultivate their math talent, and after graduation they offer them jobs in cyber espionage that beat their alternatives by a wide margin.

[1] https://www.newyorker.com/magazine/2021/04/26/the-incredible...


"offer"


Or, when you're a quasi-religious dictatorship with strict information control, you just recognize math talent in 1st grade and funnel students into patriotic computer security-specific training.

North Korea has been ruled by the Kims for 74 years at this point.

A few computers aren't expensive for even a low-wealth country, and math is cheap to teach.


A few billion dollars hacked will pay for a lot of nice computers, even after the Kim skim.


There is a truly excellent podcast series from the BBC called "The Lazarus Heist", explaining in detail how the cyberwarfare capacity of North Korea is constituted and what kinds of attacks they have carried out; I cannot recommend it enough.

The short version: they take their best math-inclined students, and send groups of them in China, to learn about computer hacking (and the world at large, to know what or who to hack). The group aspect serves as a self-surveillance check, to minimize risks of defections.


Do they need to be world class hackers?


DPRK 1%ers get a western education and have normal Internet access.

It’s not ideological, it’s practical; get a cybersecurity degree in the states/Europe and live “lavishly” with your family or flee and never see anyone you love again (they don’t kill them they’re just poor and in NK).


Uh, I doubt many/any of them are “crack a crypto algorithm” sophisticated.

These groups know how to buy exploits, attach payloads, and be persistent. Not trivial per se but hardly PhD math stuff…


They may very well be able to find their own 0-day exploits, which are far more likely to even exist than a "crack a crypto algorithm" method (which often just doesn't exist).


Not the ransomware crews, and def not when you’re pulling in 100mm+, it would lose you money to try…


IMO, this is bad math and why you probably need to be more cautious. I actually went through similar math with my team last year.

* It's long been known that UUIDs should never be used as a security mechanism. While the math is interesting, the fact they're using it as justification for moving away from UUIDs is concerning. It'd be like publishing a post titled "we're moving away from MD5".

* If you're using these tokens for human-entered purposes, you should implement account-based rate limiting. A brute-force attack is nearly impossible if a single account only gets, say, 100 attempts per day before having to contact support. There are very, very few use cases where a human-entered token will ever need more than 20 attempts per day.

* Use long, high-character count tokens if they're intended to be machine/copy-n-paste only. Storage is cheap. Use something big and long.

Seriously, rate limit your shit. The second that rate limits are introduced you control all of the major variables in your security posture.
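The account-based limit described above can be sketched as a simple in-memory counter (illustrative only; a real deployment would back this with a shared store):

```python
import time
from collections import defaultdict

class DailyAttemptLimiter:
    # In-memory sketch of the per-account "100 attempts per day"
    # rule from the comment above.
    def __init__(self, max_attempts: int = 100):
        self.max_attempts = max_attempts
        self.attempts = defaultdict(list)

    def allow(self, account_id: str) -> bool:
        now = time.time()
        # Keep only attempts from the last 24 hours.
        recent = [t for t in self.attempts[account_id] if t > now - 86_400]
        if len(recent) >= self.max_attempts:
            self.attempts[account_id] = recent
            return False  # locked out until attempts age past 24h
        recent.append(now)
        self.attempts[account_id] = recent
        return True
```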


Exactly. The article glossing over rate limiting and proceeding to do the math without it kind of ruined it for me.

The minute the endpoint is rate limited, the attack crumbles.


If the attacker doesn't care which account they break into, they could try a different account each time and then account-based rate limiting doesn't help as much.

(Depending on how many accounts there are that they can try.)


Hopefully they aren't able to get a list of all your user accounts. The IDs should not be iterable (like a UUID, for example!), and the friendly name (like an email address) is private information that shouldn't be getting out.


Generally, the subject tokens (short-ish, human enterable tokens) should only be active on a fraction of your accounts at any one time.


I've got a UUID in the URL of a new system (not a real UUID, but 128 random bits encoded in the UUID format), because that kept the existing links active. If someone were to guess one (which isn't that interesting, BTW, it's not account info or anything like it), he'd have a hard time checking a million per second, because the response is never going to be faster than 10ms or so.


I think the author forgets that while you may (potentially) be able to generate quintillions of UUID’s per second, there is no way you’ll be able to validate that they’re correct at the same speed.

At least, I don’t feel bad saying my server would melt if it had to serve that many requests.


Yeah, since they're trying to guess a random UUID from all issued UUIDs, they would have to make a request for each generated UUID. Even if you assume that ISPs would allow that kind of bandwidth, the entire internet would grind to a halt before the attack even began. Also, the 2^46 figure used to represent the number of UUIDs is way too large. With proper access control of resources, we only really need to worry about active session IDs, and in a world with 8 billion people, there's no way that there would be 2^46 active session IDs.

If you assume that every person on earth was hooked to your service 24/7 and ignore the significant bandwidth limitation, it would still take more than 6 months for the entire Bitcoin network to hijack 1 random user's session. But it doesn't make sense to ignore bandwidth limitation anyway since it's the bottleneck. The Bitcoin network computes all these hashes in parallel and there is no way that anything close to this degree of parallelization can be achieved at the network layer.


+1. Most server deployments will break at less than 10k RPS. If we talk about the largest-scale public cloud deployments, we might end up in the millions to billions of RPS due to lots and lots of instances. But the number will be very far off from the 24293141000000000000 rps rate that the author used for estimating the impact.

Plus any deployment that is so large that a user could actually generate a reasonable amount of traffic will likely also have some variant of a DDOS/fraud/rate-limiting protection or at least alarms and manual interventions that will kick in once such a traffic flood is observed.


While you're not wrong, this is the type of thinking that can introduce security vulnerabilities.

Your current systems might not be able to validate quintillions of UUID's per second, but your future systems might be able to (or at least get a lot closer).


This is also the kind of thinking that people use for public-key cryptography: it's perfectly possible to guess the private key of any TLS cert, for example, it just takes a very long time with today's computers. Future computers may well guess it very quickly, so should we stop using it?

Cryptography is a moving a target. Any cryptographic method except an OTP used with perfect opsec is brute forceable, and will need to be changed when your threat model changes (for example, when computing speed advances significantly).


Yes, but public-key cryptography includes a massive buffer against the unknown.


This article is totally wrong about everything.

An id is not a password.

Hash rate is totally not comparable with just trying every combination.

You can generate combinations instantly; you don't need a GPU for that.

The delay is how long the server takes to respond and how many connections it can have at the same time.

You're also blocked by the server after trying a couple of thousand.


The title is misleading. This article argues that you should not use a UUID for a _session cookie or access token_, which was never the intended purpose of a UUID.


I don't think intended purpose cashes out into anything here. Either UUID has enough random bits for your case as a session token or it doesn't. UUID isn't special.

I don't find any variable of TFA's hypothetical UUID-breaker scenario convincing either. Not the number of tokens issued, nor the adversary having Bitcoin network levels of compute, nor the ability to verify tokens at anything close to that speed.


yes exactly, who in their right mind would assign a UUID as a session token?!?! i mean, good point, wow, this article proves exactly why UUID shouldn't be used for such... then proceeds to show basically a method that is currently used by many... sigh


I had a database table where I used UUID as primary key. Big mistake. Haunts us to this day.

Not sortable. Takes a lot of space. Table relationships are annoying. Etc.

What we do instead is have a secondary UUID key and keep Bigint as primary keys. Then use the UUID column in the external context instead.

UUIDs are fine for 99.99999% of the time in your own domain.

Don’t expect universal uniqueness across all domains.


The reason to use UUIDs as primary keys is to allow creating records including primary keys outside of the database before posting them, especially in distributed systems.

UUIDs are sortable, but don’t give you creation-order sorting (of course, it’s abusing bigint PKs to rely on them for that, too.) If you want creation-order sorting, storing a creation timestamp and sorting on that works, and I’ve never had a db that had a business requirement for creation-order sorting and didn’t also have one for actual creation time.


I've become of the mindset that most records should have creation and update timestamps - even if you do not currently plan to use them. They might come in handy later, and the code to implement them in most languages/frameworks is trivial anyway.


These days ULID (https://github.com/ulid/spec, or any of the variants), or most recently the new UUID versions (https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...) give you creation-order sorting.


What's the benefit of having the primary key before the row is stored in the database?

Maybe concurrent inserts to multiple data stores so you don't need to wait for an initial ID from the database. You'd have to trust the client to give you a “good” UUID as well as the normal distributed problems of one of the RPCs failing. I know both Twitter (called Snowflake iirc) and Google have unique ID services for this use case.


> What’s the benefit of having the primary key before the row is stored in the database?

If I have a set of linked records in a relational schema representing a complex object, I can create them all with one round trip rather than multiple.

Also, large numbers of clients can insert in that way without contention, whereas if you use a sequence generator for PKs, it becomes a resource around which there is contention when creating rows.

> You’d have to trust the client to give you a “good” UUID

Sure, this lets you scale, e.g., backend service instances (which are db clients) without (as much as otherwise) contention in the db layer; it's not usually something you would do with external, untrusted clients.

> I know both Twitter (called Snowflake iirc) and Google have unique ID services for this use case.

Snowflake was one of the inspirations for the newer (draft) UUID versions [0] (though, unlike them, it had a design constraint of fitting into 64 instead of 128 bits.)

[0] https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...


> I can create them all with one round trip rather than multiple.

I see. I'd lean towards using common table expressions. The first insert statement returns the primary key and other inserts can depend on the key. I do understand it's not a panacea and composing queries can be problematic.

> a sequence generator for PKs, it becomes a resource around which there is contention

As I understand, Postgres sequences don't block which leads to a different problem [1].

[1]: https://news.ycombinator.com/item?id=27843084


> Not sortable

That's not correct; it's trivial to come up with an ordering, and I don't know of a database in practice that doesn't permit sorting on a UUID.

> Table relationships are annoying

… in SQL,

  user_id REFERENCES users
It's exactly the same, regardless of the type of the column…?

> Takes a lot of space

Yes … but also no. It's 16 B vs. a serial's 4 B, I grant, but compared to a varchar, it's immaterial. (And particular in comparison to the number of times I see people use a varchar for an enum…) Certainly there could be a case where a row is wide b/c of UUIDs, but in practice, rows are wide either because of the data, or because of poor design.


AFAIK UUIDs as PKs in databases are extremely standard. I would suggest the biggest downside is debugging, where writing id = 5 is much easier than id = 'xxxx-xxxx-xxxx-xxxx'.


On the other hand, you won’t get a wrong result if you write the wrong table name. Was hit by this once: deleted a row from the wrong table.


yeah, I've seen bugs in multi-db systems where a dataset gets accidentally composed from data from each due to aligning ids. It was a very strange afternoon.


Just curious to learn: when do you need sortable primary keys?


It makes pagination much more efficient and possible to completely use an index (re: data is pre-sorted) if you have some sort of increasing id, vs sorting in-memory on request. Even having some sort of indexed "createdAt" field, and sorting on that, isn't super reliable for pagination when the dataset is changing. You usually want a stable sort to make pagination nice, so you add an ID to it, but then this gets very inefficient w.r.t index use if the PK is a UUID since it is not monotonic/sortable.


UUIDs are sortable, and you can do efficient pagination on them. (The results wouldn't be in any meaningful order, but that's orthogonal.)


Yeah sorry I had a brain fart and was thinking of stable pagination w/ compound sort+filter. It doesn't matter if id is UUID, sigh.

  WHERE date > y OR (date = y AND id > x) ORDER BY date, id

This can't be completely fulfilled by an index. It'll have to sort/merge some results in memory, which is very CPU expensive.

More examples here https://www.mixmax.com/engineering/api-paging-built-the-righ...


In sqlserver (and others?) you can have one clustered index. Typically this is the primary key (doesn’t have to be though).

The clustered index is actually the physical order of the data stored on disk, so any new guid causes tremendous amounts of churn.

Not sure how modern this concern is, but I would guess a LOT of SaaS companies have to consider this.

Quick edit: sequential GUIDs are obviously a thing and I believe alleviate a lot of these concerns. I do not believe that was an option in sql server 2008 r2 (a version that lived a LONG time).


A clustered index is certainly not actually the physical order of the data stored on the disk, at best it tends to approximate it by placing data associated with similar keys in similar blocks. You would have to actually physically reorganize your entire clustered table or index to get it in physical order, and often you lose performance by doing so.

The performance impact of randomly distributed keys on any ordered index on that key is certainly real though, so I agree with you there.


Almost every database algorithm is based on ordering and binary search.


… and for which UUIDs are sortable.

OP called them "not sortable", which is a poor phrasing, as they're sortable. What he's likely getting at is that they're not ordered by generation time, which matters to some folks.


When you need to get a sorted list or perhaps a range?

When do you need primary keys?


that doesn't really make sense though. how would you know the range to sort by? presumably you'd use the criterion directly to sort.

a better reason and justification to not use UUIDs is that they take up needless space if you're space constrained. they also have worse performance


Maybe you are doing some sort of cursor for pagination or whatever. You can use one ID and fetch the next 50 records. Then use the last ID in those 50 to fetch the next 50, etc.


but you can still do pagination with uuids, your rows can be sorted on any arbitrary criteria like created date or updated date.


But that would be slow, even with an index, as compared to using a clustered primary key. Especially when your data doesn't fit in ram.


Sure, already mentioned UUID’s being slow. There are trade offs


Dates might not be unique but I can see other possibilities that would be fine.


When is an instance you'd need to sort by ID? How does it make table relationships annoying?


I’m not the parent but for me:

1. Sort by ID if you want to see things by insert order and are using serial/autoincrement for the PK.

2. Writing (or reading, or talking about) SQL queries in which the IDs are present.

I solved the first problem by using ULID at one point but in future I’d probably not use a UUID until I had a specific need for it. And as others have mentioned, if that need is about the outside world I’d probably go with an extra column.


> 1. Sort by ID if you want to see things by insert order and are using serial/autoincrement for the PK.

Most RDBMSes (possibly all of the big ones?) do this by default if you opt for no ORDER BY clause, however that IS then depending on undefined behavior which is bad.


> Takes a lot of space

Wait, did you store them as strings instead of as a native UUID type (or any sort of 128 bit integer type)?


Storing UUID's in their binary format quickly becomes a PITA for debugging or doing manual queries. Some DB's have functions that can convert to human-readable strings which make this less painful, but it's never going to be as simple as using integer id's.

As with all things, there are tradeoffs that must be considered before building the system.


Which DB doesn't support UUID as a type so I can make sure to never use it?


More than you'd think... including the venerable MySQL.


This is a good approach. I wonder if using a UUID-aware data type (like PostgreSQL's UUID type) would improve performance without making the second column necessary?


But to use both, you have to store both, to use it as a lookup, you have to [unique] index both; you can no longer easily partition (and guarantee uniqueness)

And an auto incrementing bigint doesn't guarantee order.


And if you need universal uniqueness across all domains, just pick a 256-bit random number using a CSPRNG like /dev/urandom.


I'm surprised to hear it's not sortable. Why can't they just memcmp it?


You can and databases do.

What people want is for it to be sortable by generation time, i.e., they want it to share the "larger values were generated later" property of timestamps (but without collisions) or SERIALs (but without the locking on the serial / independent generation).


Got it. Thanks for correction.


Every time I've used UUIDs I've ended up with terrible index performance.


Sorry, but the main point of this article goes under YAGNI or "no, you aren't Google" for me.

If you aren't generating a thousand IDs per second for every person on the planet, you're fine.

Even from a guess-ability standpoint, it's more important to put reasonable rate limits on your endpoints than worrying about someone putting bitcoin-network-level of resources against your endpoints.


> I find that 160 bits is a sweet spot of excellent security

Please, for the love of God, leave cryptography to the cryptographers.


> The dash-separated hexadecimal format takes 36 characters to represent 16 bytes of data.

You can use a different formatting. I would suggest looking at https://github.com/oculus42/short-uuid Of course if you just want a random ID, then you might not need a UUID. But UUIDs have the advantage that there are different versions and you can distinguish them; e.g. you might want a unique ID that gives you some debugging information (where/when was it created), so you use v1 and later you can decide to switch to v4 if you decide you want the IDs to carry no information.

Independent of how you generate the ID, I think the base-57 encoding that shortUUID uses is quite good when the IDs are user-facing. Not using O, 0, l, 1, I in the alphabet makes IDs more readable.
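A minimal sketch of such a base-57 encoding in Python (the alphabet ordering here is illustrative, not necessarily shortUUID's exact one):

```python
import uuid

# 57-character alphabet without the ambiguous O, 0, l, 1, I.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def encode_base57(u: uuid.UUID) -> str:
    # Repeated divmod turns the 128-bit integer into base-57 digits.
    n = u.int
    out = []
    while n:
        n, r = divmod(n, 57)
        out.append(ALPHABET[r])
    return "".join(reversed(out)) or ALPHABET[0]
```

Since 57^22 > 2^128, any UUID fits in at most 22 characters.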



Yeah, I mean, in general you should always think about how much entropy you need and know how much you're getting. I think it should be fairly standard knowledge that a UUID may only provide 122 bits of entropy, but then again what should be standard is not what is standard.

It should also be standard to understand the birthday paradox and when it's relevant.

In a ton of cases 122 bits is totally acceptable and it's really up to you to understand when it isn't. In fact, in lots of cases you can get away with less, like 96bits, etc.

It should be pretty easy to answer "how much do you need?" by asking what your tolerance for collisions is.


Assuming a basic 4 bits of entropy per character, then you'd need a 30 character password to have as much entropy as is encoded in your GUID. If they're worried about 122 bits of GUID entropy, but their passwords are shorter than 30 characters, it feels like they misplaced their worry.


The issue isn't really passwords where a UUID would be totally overkill, it's for things like capabilities. For example, I could put a file at `example.com/<some uuid>` and no one could find it unless I told them about it. As the author points out, my chance of an attacker guessing that url isn't 2^128 but instead 2^122. For many cases that's actually fine.


It’s 2^122. Or 2^128 if you don’t care about stupid standards (why would you). Birthday paradox is about getting collisions, not guessing.


Right, yes, guessing one specific UUID wouldn't be subject to the birthday paradox, thanks. Edited my post to no longer reflect that.


Who the heck stores UUIDs in their string form? It's only useful for transport. For storage you use bytes.


Any time I get what looks like a UUID from a service not under my control and I can't guarantee will always be a UUID.


Sure but at that point you're talking about external systems anyways, and the same would be true for any identifier they give you. One day that int may change into a raw byte array, string, braille, emoji-encoded[1], you name it.

This article is about scenarios where it IS under your control.

[1]: https://pypi.org/project/base-emoji/


You can still store and process it as a large integer, rather than a less efficient string, unless something in your stack is enforcing validation of the version/variant bits.

Unless your concern is more that they may not be unique enough due to bad choices in their derivation function, but this will not change no matter what representation you store them in.


Am i mis-reading this article?

BESIDES not having any particular way to validate a token without asking the service, making the rate a hell of a lot slower than 2^64 tokens per second (lol wut) doesn’t it also assume that you have 2^46 valid tokens in existence? Isn’t that 70 TRILLION valid tokens, or nearly 9000 tokens per human on earth?


Having spent this past week scrubbing a codebase looking for hard-coded values... I will say, the prominence of the UUID format was at least very beneficial when searching beyond the configuration files. [\w\/_+-]{20,} also worked for finding longer matches, but with more noise.

I'm not sure it's worth it to use more than a UUID for some use cases, but for a lot, it's fine. Maybe CUID if there's a decent library for your language/platform.

Aside... Whoever makes such a system that is generating/receiving OAuth tokens at that rate, and won't see/detect/feel a brute force attack of that scale probably didn't do anything to protect their SMS verification codes (only 6 digits), you'll definitely brute force that against a known password breach far more quickly, but okay, in either case.


I prefer encoded GUID generators like: pid, memory location, machine hardware ID, UTC epoch date-time, back-patched hash( serialized object, and salt)

Guaranteed globally unique in concurrency, built in data integrity check, and non-blocking read-modify-Bork resistant error detection

You are welcome, =)


It is great that people are concerned about UUID entropy, because some implementations actually got much less than ~120 bits.

However, I think article missed the point that you shouldn’t use UUIDs as a security measure anyway.


Just to be clear, 122 random bits are just fine as a security measure depending on what you want to do. For example, if I have 1 URL and I'm trying to prevent someone from guessing it.

How you choose to encode the bits doesn't affect the security; if you use UUID form it's just as secure as if you use Base64. Regardless, if I had my 122 random bits, you would require 2^121 guesses to have a 50% chance of guessing it.

Edit: this is in the non-quantum case cf. https://en.m.wikipedia.org/wiki/Grover%27s_algorithm

Edit 2: (3:32 ET) done editing


What about his solution (160-bit (20 byte) random value that is then URL-safe base64-encoded)? Is that a good and properly used security measure for a URL?
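That scheme is straightforward to sketch in Python; 20 random bytes come out as 27 URL-safe characters:

```python
import base64
import secrets

# 160 random bits, URL-safe base64, padding stripped -> 27 chars.
def make_token():
    raw = secrets.token_bytes(20)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
```

`secrets.token_urlsafe(20)` does the same thing in one call.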


Depending on how many characters of representation you are willing to stomach, I prefer to use a more limited encoding than base64. Something that avoids IL1o0 so that a human could type it without guess work. As your human is not likely to know that you are including "0" but not "o", you have to nix both of them. This only matters if the url has any chance of being read by a human (ie ids). If it is some monstrous 2000 character thing that already contains gobs of metadata, go nuts.


I've never understood why UUIDs are... a thing. I understand how it can useful to have a name for different kinds of identifiers, but what is the purpose of adding variant/version bits and unifying different kinds of IDs into a single thing called a UUID? The article for instance never did anything with the "UUID-ness" of the IDs so they might as well have started with "random 128-bit integers".

Can anyone explain this?


They're for distributed systems.

The entire point is that different computers can generate ID's that can be merged in a database later on and not collide.

And the way they're designed is less likely to collide than random numbers are. (But you need to pick the appropriate UUID version to ensure that, depending on your system's architecture.)

If you're generating ID's that originate in a single centralized table, you don't need UUID's -- autoincrement or random integer is the simpler choice.


Okay, thanks. Thinking about it a little more I think I get it.

1. We want a single truly universal namespace.

2. We want to have different methods of generating IDs, since there are a lot of different people in the whole universe and they may have different needs.

3. Each single method is designed to not generate collisions.

4. But there could be collisions between different methods. So some bits are added to show what method was used to generate the ID, to guarantee IDs generated by different methods don't collide. This is also why we need a spec that defines what each method is; the versions are like a scarce global namespace that needs to be allocated carefully.

I think the reason I found this difficult is most of the times I see UUIDs, (1) and (2) aren't actually needed (this article being representative).


You don't need a UUID for that. A 128 bit random integer will work just fine. Better even, as no bits are wasted on meta data.


I'm getting downvotes, presumably because people think what I said is incorrect. Would anybody care to explain how UUIDs can have a lower probability of collision than just random bits? (Apart from using a central coordinator for handing them out, of course, which would defeat the whole point.)


You can easily look it up on Wikipedia. [1]

But version 1 of UUID's, for example, concatenates a MAC address, a timestamp, and a "uniquifying" clock sequence.

As long as you're ensuring your MAC addresses are all unique (which they should be, except for manufacturing error) and only one UUID library per device, collisions are impossible.

Whereas with random numbers, there's always a chance of collisions. (There are also other versions of UUID's that do include randomness, with collision potential, in exchange for not revealing information such as timestamps and MAC addresses.)

[1] https://en.wikipedia.org/wiki/Universally_unique_identifier


Ah, got it. Reading the spec, that seems like an awful lot of trouble to prevent something that will never happen anyway.

Just wondering, can v1 guarantee uniqueness with multiple processes using the same network adapter or within containers?


It's because this statement:

    You don't need a UUID for that. A 128 bit random integer will work just fine.
Basically reads:

    You don't need a UUID for that. A UUID will work just fine.


The rfc is rather long for a UUID being just a random integer...

But if we're talking about UUID v4, it seems that 122 out of 128 are indeed random. [1] So why not just go all the way? Who cares about these 6 bits of meta data?

[1] https://www.rfc-editor.org/rfc/rfc412


A UUID is random bits. The statement was an oxymoron.


But it isn't. A UUID is a disjoint union of several different types of IDs, only one of which is just random bits, and if random bits are all you care about, then "random 128-bit number" is not only clearer but gives more entropy than "version 4 UUID".

That's what I'm trying to ask: what is the purpose of this disjoint union? When would you ever use the "UUID-ness" of UUIDs (which is not the same as asking about the virtues of UUIDs of a particular version)?


The virtue is, software has some idea what a UUID is. There's a mostly-random version which is also understood.

If you want to start doing your own thing, a random number is good. It's hard to get a good random number. I suggest starting with, hey you guessed it, the UUID library.


One reason is web scraping. Twitter used to have sequential IDs that made it easy to tell how many users they’d added over a period of time, even after they were public.


Google+ UIDs were UUIDs (or something looking very much like them).

They were nonsequential and sparse.

For those of us looking to 1) quantify activity and 2) migrate users and data off the system, they were pretty confounding.

Fortunately, Google also provided lists of those UUIDs by way of robots.txt sitemap files.

I used one sample of ~50k of those to estimate total G+ active users as of ~2014 (another group polled a random sampling of 500k for a more precise measurement). And when G+ folded in 2019, I'd provided that information plus some additional bits gleaned over the years and from some additional sources to estimate just how large the archive dataset might be, for ArchiveTeam.

One place where the sparse population might prove really useful is in telephony. Phone numbers as we know them today are densely populated, and in fact, frequently re-used (which is why your new phone is receiving debt-collection calls for its previous holder). It also makes war-dialing or random-dialing viable for robocallers.

If only 1 in 10 billion numbers was valid (about the saturation rate of G+ UUIDs), war-dialing / random dialing would be all but ineffective. If you could dial one number per second, you'd have a 50% chance of hitting a live number ... in 158 years.

(Of course, if you had a listing of valid numbers, again, see G+'s robots.txt files, your search space would be far smaller.)
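The 158-year figure checks out (a quick sanity check, not from the original comment):

```python
# 1 valid number in 10 billion, one dial per second: expected
# dials for a 50% chance of a hit is half the search space.
seconds_to_half_chance = 10**10 / 2
years = seconds_to_half_chance / (365.25 * 86400)
# years ≈ 158
```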


How do random 128 bit IDs not fix this problem?


And what are the hyphens good for? And what is the reason for their irregular placement?


My guess would be readability. Easier to "parse" by humans than just a long string of numbers and letters without punctuation.


They evolved that way, and the hyphens denote the various parts of UUIDv1, the original version of UUIDs; roughly:

  123e4567-e89b-12d3-a456-426614174000
  ^^^^^^^^ ^^^^ ^^^^ ^^^^ ^^^^^^^^^^^^
  time lo  time time  seq   node id
            mid  hi
+/- a few bits reserved for version & variant. The random ones are version 4, but they still share the same string representation as v1. (But outside of the bits mentioned in the article, the other bits are just random; the diagram above doesn't apply to v4.) The only thing the hyphens really do there is make it easier to see where the version bits are, if you want to visually verify that it's a v4 UUID.

See: https://en.wikipedia.org/wiki/Universally_unique_identifier


There are different types of UUID that serve different purposes https://web.archive.org/web/20220623031329/https://www.ietf....


Does anyone have experience with KSUID (K-Sortable Unique IDentifier) ? Interested in cons.



Hello. I tried my hand at writing this in JavaScript. See: https://codepen.io/javajosh/pen/BaVrBMb?editors=0010


I really like https://github.com/ai/nanoid which is available for a wide range of programming languages. It creates short, random, and URL-safe IDs.


I think we don't need to reinvent the wheel for handling user sessions, especially in web-based applications. There are common best practices that everybody already knows, such as using JWTs over HTTPS and storing them in an HTTP-only cookie.


Unlike the UUID scheme, strings of completely random characters don't sort/tree-index very well. Apart from that they are, in my opinion, a much better alternative.


On the flip side UUIDs are great as non-cryptographic identifiers...

It's easy to generate one, and for any non-security application, nearly impossible to replicate.


I rather like how stripe uses prefixes for all of their object IDs returned from the API.

A charge looks like: ch_3M0k8bFSpHML0ApB0Pd8Zxmq


UUID, as the name implies, is used for identifiers - You know this.

These use-cases you're implying seem to be the real problem.


Should rename the article to be more accurate: why you shouldn't use UUIDs as a token or password.


I figured this would be an article giving reasons not to use UUIDs as database identifiers...


Few things say amateur more than calling random values "pretty unique".


The title of this article is as misleading as it could possibly get.

The author's recommendation is to make UUID's longer and encode them with base 64 instead of hexadecimal. That's it. Somehow that means "moving away from UUID's."


UUIDs are 128 bits by definition. So there is no such thing as a “longer UUID”, but if you use that concept then you are indeed “moving away from UUIDs.”


I prefer z-base-32 rather than url-safe base64.


TL;DR: if you want the chance of a collision to be the same as hard drive corruption, you're "safe" with 128 bit random ids until nearly a trillion objects. That doesn't mean it's a particularly great idea though because you have to not just look at the probability of collision but what net impact that will have to the system/business as a result.

If you want future-proof random/unsorted ids just use 512 bits. That's large enough to maintain a 128-bit search space with existing hash algorithms even in a world with practical quantum computers that run Shor's algorithm et al. This is approximate reasoning, not a formal proof, but it is reasonably grounded. In the big picture, 64-byte vs. 16-byte ids are unlikely to be the thing that kills your company. A security breach or corruption of a key database record very well could.
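The "nearly a trillion objects" figure can be checked with the standard birthday-bound approximation (Python sketch):

```python
import math

# Probability of at least one collision among n random k-bit ids;
# for small probabilities this is ~ n^2 / 2^(k+1).
def collision_probability(n, bits):
    return -math.expm1(-n * (n - 1) / 2 / 2 ** bits)

# At a trillion 128-bit ids the chance is still on the order of
# 1e-15, comparable to hardware corruption rates.
p = collision_probability(10**12, 128)
```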


I need 10^157 bit of entropy.


Another advantage of UUIDs is that they're more or less monotonically increasing over medium-to-long time periods. This is very helpful for database insert performance.


Not all UUID versions are monotonically increasing. This article is specifically talking about V4 UUIDs which contain random values only.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: