
UUIDs generally do not meet security requirements - xnyhps
https://littlemaninmyhead.wordpress.com/2015/11/22/cautionary-note-uuids-should-generally-not-be-used-for-authentication-tokens/
======
elithrar
The problem highlighted here is not so much UUIDs, but instead the use of poor
PRNGs (i.e. things other than /dev/urandom or getrandom(2) on Linux). All of
the issues with the example libraries stem from the use of PRNGs from math
libs, not crypto libs.

A "secure token generation" library might have the same flaws (and many do!).

A _correctly generated_ v4 UUID (S4.4 of the RFC) should be acceptable for use
as a secure token, given it has 122 random bits.

The trouble with UUIDs, if any, is that only v4 UUIDs are really suitable for
use as secure tokens, with v1 through v3 being entirely unsuitable.

Personally, outside of DB PKs in Postgres (or similar), I prefer to base64
encode 256 bits from /dev/urandom for tokens, as they're a little shorter (and
Go makes it easy enough to do that).
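
A minimal Python sketch of that approach, for illustration only (the comment
above mentions Go; the helper name new_token is hypothetical and not from any
library):

```python
import base64
import os


def new_token(num_bytes: int = 32) -> str:
    """Return a URL-safe base64 token built from OS-provided random bytes."""
    raw = os.urandom(num_bytes)  # 32 bytes = 256 bits from the kernel CSPRNG
    # Strip the '=' padding so the token can be embedded in URLs and headers.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


token = new_token()
print(token)
```

On Python 3.6 and later, secrets.token_urlsafe(32) should give an equivalent
result without the manual encoding step.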

~~~
xnyhps
> A _correctly generated_ v4 UUID (S4.4 of the RFC) should be acceptable for
> use as a secure token, given it has 122 random bits.

No, it's not. From §4.4:

> Set all the other bits to randomly (or pseudo-randomly) chosen values.

I can use a PRNG with a cycle length of 8 and it would be fully correct
according to the RFC, but it would be trivial to brute-force all the values.
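
To make that concrete, here is a hedged sketch of a structurally valid v4
UUID generator backed by a toy generator with only 8 states. TinyPRNG and
weak_uuid4 are made-up names for the demonstration, not real library code:

```python
import uuid


class TinyPRNG:
    """Hypothetical generator that only ever cycles through 8 states."""

    def __init__(self, state: int = 0) -> None:
        self.state = state % 8

    def next_bytes(self, n: int) -> bytes:
        self.state = (self.state + 1) % 8
        # Deterministically stretch the 3-bit state into n bytes.
        return bytes((self.state * 37 + i * 11) % 256 for i in range(n))


def weak_uuid4(prng: TinyPRNG) -> uuid.UUID:
    # version=4 makes the uuid module set the version and variant bits,
    # so every result has a well-formed v4 layout.
    return uuid.UUID(bytes=prng.next_bytes(16), version=4)


prng = TinyPRNG()
seen = {weak_uuid4(prng) for _ in range(1000)}
print(len(seen))  # 8 distinct values, all with version 4
```

Every value has a valid v4 layout, yet an attacker only has to try eight
candidates.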

~~~
Natanael_L
Correctly generated implies not having such a computationally weak correlation.

~~~
xnyhps
Obviously I'm interpreting "correctly generated" to mean "generated in
accordance with the RFC", so no, it does not.

~~~
derefr
"Correctly generated" also means, at the very least, "useful to the people
implementing it." If you can't use a given UUID implementation without
generating collisions in the first 100 items, you won't use that UUID
implementation. There's an "implicit spec" behind each spec of what people
will or won't actually bother to do.

Generating correlated v4 UUIDs isn't _wrong_, but it is _stupid_ in the sense
of being _self-defeating_: nobody who _wants to generate v4 UUIDs_ wants them
with insufficient randomness, and will have no reason to _not_ consider a bad
random-number source to be a bug, because it's interfering with getting them
the thing they want to get by generating v4 UUIDs in the first place.

(The real point of the explicit UUID spec, meanwhile, is in saying what
constitutes a _valid_ UUID—and it's certainly _valid_ to generate a UUIDv4
using insufficient entropy. There's no way for a peer _receiving_ such UUIDs
to guess that they're maybe-predictable-with-enough-effort and reject them on
that basis, which is all "validity" can ever mean: what can or cannot be
technically enforced by protocol peers.)

------
viraptor
Fortunately Python is safe in this case.
[https://hg.python.org/cpython/file/2.7/Lib/uuid.py#l582](https://hg.python.org/cpython/file/2.7/Lib/uuid.py#l582)
- urandom is the source for the UUID4. (same for 3.5:
[https://hg.python.org/cpython/file/3.5/Lib/uuid.py#l600](https://hg.python.org/cpython/file/3.5/Lib/uuid.py#l600))
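
As a rough sketch of what "urandom is the source" looks like, a v4 UUID can
be built by hand from os.urandom. The function name uuid4_from_urandom is
hypothetical; in practice uuid.uuid4() is the call to use:

```python
import os
import uuid


def uuid4_from_urandom() -> uuid.UUID:
    # 16 bytes from the OS CSPRNG; version=4 overwrites the 4 version bits
    # and 2 variant bits, leaving 128 - 6 = 122 random bits.
    return uuid.UUID(bytes=os.urandom(16), version=4)


u = uuid4_from_urandom()
print(u, u.version)
```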

~~~
methyl
Ruby is also safe: it uses OpenSSL/urandom for UUID:
[https://github.com/sj26/ruby-1.9.3-p0/blob/master/lib/secure...](https://github.com/sj26/ruby-1.9.3-p0/blob/master/lib/securerandom.rb#L59)
[https://github.com/sj26/ruby-1.9.3-p0/blob/master/lib/secure...](https://github.com/sj26/ruby-1.9.3-p0/blob/master/lib/securerandom.rb#L246)

------
SamReidHughes
Even if you're only concerned about making identifiers that merely need to be
unique, not secret, hiding the UUID generation state is a very good idea if
users might have the ability to cause new UUID generators to be created.
Otherwise, instead of a 2^(128/2) = 2^64 birthday
attack, they can make 2^N generators, find the one that overflows into the
next the quickest (i.e. a birthday attack on the 2N msb's), and have
2^(128-2N) work to do. This means you could do a 2^36-sized birthday attack
and then have 2^56 work left to do, and that part's not memory-bound. For
example, this works if you have generators that seed a 128-bit counter with
/dev/urandom, and then increment from there. You can avoid this if you "just"
use some CSPRNG.
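
A small sketch of the counter design and the arithmetic described above. The
class and function names are hypothetical, and the formula is simply the
2^(128-2N) figure from this comment rather than an independent analysis:

```python
import os


class CounterIdGenerator:
    """Seeds a 128-bit counter from os.urandom, then increments from there."""

    def __init__(self) -> None:
        self.counter = int.from_bytes(os.urandom(16), "big")

    def next_id(self) -> int:
        self.counter = (self.counter + 1) % (1 << 128)
        return self.counter


def remaining_work_log2(n: int, id_bits: int = 128) -> int:
    """log2 of the work left after creating 2^n generators, per the comment."""
    return id_bits - 2 * n


gen = CounterIdGenerator()
first, second = gen.next_id(), gen.next_id()  # consecutive, hence predictable
print(remaining_work_log2(36))  # 56, versus 64 for a plain birthday attack
```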

------
mAritz
For anyone else wondering about their node.js code: node-uuid uses Node's
crypto.randomBytes:
[https://github.com/broofa/node-uuid/blob/master/uuid.js#L59](https://github.com/broofa/node-uuid/blob/master/uuid.js#L59)

------
StavrosK
I wrote a fairly popular Python library to easily generate short UUIDs
(encoded with base57):

[https://github.com/stochastic-technologies/shortuuid](https://github.com/stochastic-technologies/shortuuid)

Since it turned out that most uses didn't _actually_ need a UUID and just
needed a random string, I figured I'd maximize the randomness per bit and
added a .random() call that returns a short string that is taken from
os.urandom().

It would probably be good if you asked yourself whether you actually need a
UUID or a random string, and used the right tool for the job. Using a UUID
when you want a random string and vice versa leads to these kinds of problems.
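
For the "just a random string" case, a minimal standard-library sketch might
look like the following. This is not shortuuid's API; the alphabet and the
name random_string are assumptions made for the example:

```python
import secrets
import string

# Hypothetical 62-character alphabet; each character carries about 5.95 bits.
ALPHABET = string.ascii_letters + string.digits


def random_string(length: int = 22) -> str:
    """Return a random string drawn from the OS CSPRNG via the secrets module."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


s = random_string()
print(s)
```

At 22 characters this carries roughly 131 bits, a little more than the 122
random bits of a v4 UUID.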

------
RyanZAG
Shouldn't this be 'Javascript UUIDs generally do not meet security
requirements' ?

As he mentions in his story, the server was actually using Java UUIDs which
are cryptographically secure. So 'UUIDs generally do not meet security
requirements' is false, as generally here only applies to Javascript. Python,
Ruby, Java all have cryptographically safe UUID.

~~~
Lazare
> Shouldn't this be 'Javascript UUIDs generally do not meet security
> requirements' ?

Nothing wrong with v4 UUIDs generated in Javascript, so long as you don't use
Math.random(). Every current browser supports, eg, crypto.getRandomValues(),
which is cryptographically secure. If you use a broken PRNG in any other
language, you'll get broken v4 UUIDs too. :)

(Also, don't make the common mistake of conflating "UUID" with "v4 UUID". A v4
UUID contains 122 bits of (hopefully) random data. A v1 UUID contains 0 bits
of randomness.)
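
A short Python illustration of the difference, assuming a fixed node and
clock sequence (both hypothetical values) so the v1 fields are easy to read
back:

```python
import time
import uuid

# Hypothetical node ID and clock sequence, fixed for the demonstration.
NODE = 0x123456789ABC
u1 = uuid.uuid1(node=NODE, clock_seq=0)
u4 = uuid.uuid4()

# v1 timestamps count 100 ns ticks since 1582-10-15; this offset converts
# them to seconds since the Unix epoch.
GREGORIAN_TO_UNIX_TICKS = 0x01B21DD213814000
created_at = (u1.time - GREGORIAN_TO_UNIX_TICKS) / 1e7

print(u1.version, hex(u1.node), u1.clock_seq)  # node and clock_seq read back
print(time.ctime(created_at))                  # creation time is recoverable
print(u4.version)
```

Everything in the v1 value here is either supplied by the caller or derived
from the clock, which is why it makes a poor secret.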

------
netheril96
This kind of problem is so prevalent that I think directly creating a 128-bit
random ID from a CSPRNG is much safer than betting on the saneness of UUID
implementers.

~~~
lmm
There's nothing insane about using a faster nonsecure PRNG for a library
function that you expect to possibly be used on a hot path, as long as the
security properties are clearly documented.

~~~
marcosdumay
CSPRNGs are not much slower than the alternatives, and really, how often does
the hot path on data creation fall on the processor, instead of memory access
or IO?

Nope, looks like a really bad choice for generic libraries.

------
castell
When do you use UUIDs and why?

I use traditional auto increment integer IDs as primary keys in SQL databases.
I use those SQL query results in application code. Debugging seems a lot
easier with smaller integer values rather than UUIDs that I saw in several
enterprise software and SQL-DBs.

~~~
lmm
I use UUIDs for pretty much any "primary key"-like scenario:

* They're natively supported in most databases and languages

* They're strongly typed, meaning there's no risk of accidentally doing things that make no semantic sense with them (e.g. adding or multiplying)

* They ensure that any API user will use the correct datatype, rather than some clients breaking when your ids go above 2^31

* They avoid exposing information about how many entries there are, e.g. you can't tell how many users I have by signing up and checking your user ID

* The client can generate them without roundtripping to the database; this can save on roundtrips when you're saving several related pieces of data, and makes it easier to have circular datastructures if you need them

* As others have said, they're usable in an AP datastore

None of this is impossible to do with integers, but UUIDs make it very easy.

~~~
sinatra
Doesn't the random nature of uuids make them extremely inefficient candidates
for primary key as they can't be indexed well?

~~~
lmm
It's not like an auto increment key is meaningful, so the read performance
will be the same either way. As for writes, a new key could go anywhere in the
index, but OTOH you can insert multiple values concurrently (unlike with auto
increment keys with MVCC where you'll have collisions and one insert will have
to be rolled back and redone). As always, benchmark your use case and see what
gives acceptable performance.

~~~
bza
> As for writes, a new key could go anywhere in the index

This forces index tree rebalancing to occur on many (even most) writes, which
is hugely detrimental to performance.

~~~
lmm
Which tree structure is this for? Many tree structures (e.g. the classic red-
black tree) perform much better (doing less rebalancing) for randomized
inserts than for ordered ones.

------
miohtama
UUID4 has 122 bits of randomness:

[https://gist.github.com/miohtama/6e72c2458a7138599dc1](https://gist.github.com/miohtama/6e72c2458a7138599dc1)

~~~
masklinn
The point here is that UUID4 is only as random as the underlying source of
randomness. If the source of randomness is garbage your UUID4 is garbage as
well. v8's Math.random() does not even remotely come close to a CSPRNG, and as
a post a few days ago noted[0] it has a very, very small cycle length, so you
can easily (via brute-force) find where in the cycle it was when it generated
a given value, and enumerate all the UUIDs it _could_ generate.
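
A hedged sketch of that kind of attack. It uses Python's random module with
an artificially tiny seed space as a stand-in, not V8's actual generator;
SEED_BITS, weak_uuid4 and recover_seed are names invented for the demo:

```python
import random
import uuid
from typing import Optional

SEED_BITS = 16  # hypothetical: deliberately small so brute force is instant


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    # Non-cryptographic source; the result still has a valid v4 layout.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def recover_seed(observed: uuid.UUID) -> Optional[int]:
    """Try every seed until one reproduces the observed UUID."""
    for seed in range(1 << SEED_BITS):
        if weak_uuid4(random.Random(seed)) == observed:
            return seed
    return None


victim = random.Random(12345)
first = weak_uuid4(victim)   # the attacker sees this one
second = weak_uuid4(victim)  # the attacker wants to predict this one

attacker = random.Random(recover_seed(first))
weak_uuid4(attacker)         # replay the value that was already observed
predicted = weak_uuid4(attacker)
print(predicted == second)
```

One observed value is enough to recover the generator state and predict
everything it produces afterwards.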

[0] [https://medium.com/@betable/tifu-by-using-math-random-f1c308...](https://medium.com/@betable/tifu-by-using-math-random-f1c308c4fd9d#.7p2jjzxba)

~~~
im2w1l
C's rand() is typically poor as well. Thought it was par for the course for
builtin rng to be somewhat shitty, but maybe other languages stack up better?

~~~
revelation
C's rand() (rather, your libc's rand()) is perfectly fine for what it's
supposed to be: fast random data, with no particular guarantees to
cryptographic security, for things like randomized algorithms or the 99% of
cases where you're picking what powerup to spawn instead of generating super
important key material.

To make it into a CSPRNG would be a detriment. The only thing I would change
is to have a minimum cycle length.

~~~
adrianN
No, C's rand is unsuited for most randomized algorithms. MCMC algorithms won't
work with rand unless you're lucky, it's even unsuited for choosing pivots for
quicksort. The only application where it might be ok is games.

~~~
masklinn
> The only application where it might be ok is games.

Not even that. Different types of games require different types of RNGs (e.g.
in an RPG or strategy game you probably want a stable seeded RNG to preclude
RNG save-scumming), and the C standard guarantees almost nothing about rand().

------
sneak
[https://twitter.com/mjmalone/status/667429857165488130](https://twitter.com/mjmalone/status/667429857165488130)

The PRNG behind Math.random() has been fixed in Chrome very recently.

~~~
Filligree
Fixed?

As far as I know, there are no CSPRNGs that are as fast as, say, Mersenne
twister. So if 'fixed' means 'made it a CSPRNG', then I'd have to say they
broke it. crypto.getRandomValues already exists.

~~~
TillE
How fast do most users of a PRNG really need it to be? I think it's not
unreasonable to pick something like ChaCha20 as a default PRNG
(cryptographically secure, huge seed space, much smaller state than MT19937),
and let people who need millions of random numbers for simulations use
something else.

~~~
mmalone
The algorithm that V8 chose to replace the current generator, xorshift128+,
passes BigCrush and can produce a random uint64 in just over 1 nanosecond on a
modern processor (that's ~7GB/s). Seems like a good choice. Not sure how it
compares to something like ChaCha20 on performance, but I'm guessing it
compares favorably (though it's unlikely to be your bottleneck either way).
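
For readers who have not seen it, here is a rough Python sketch of
xorshift128+. The shift constants 23, 17 and 26 are the commonly published
parameters and are an assumption here, and this is not V8's code (V8 also
has to turn the 64-bit output into a double):

```python
MASK64 = (1 << 64) - 1


class XorShift128Plus:
    """Sketch of xorshift128+; fast, but not cryptographically secure."""

    def __init__(self, s0: int, s1: int) -> None:
        if ((s0 | s1) & MASK64) == 0:
            raise ValueError("state must not be all zero")
        self.s0 = s0 & MASK64
        self.s1 = s1 & MASK64

    def next_u64(self) -> int:
        s1, s0 = self.s0, self.s1
        self.s0 = s0
        s1 ^= (s1 << 23) & MASK64  # keep the left shift within 64 bits
        s1 ^= s1 >> 17
        s1 ^= s0
        s1 ^= s0 >> 26
        self.s1 = s1
        return (self.s1 + s0) & MASK64


rng = XorShift128Plus(1, 2)  # arbitrary nonzero seed for the demo
values = [rng.next_u64() for _ in range(3)]
print(values)
```

The whole state is 128 bits and each output is a simple function of it, so
the speed comes with predictability. As a sanity check on the figures above,
8 bytes per value at about 7 GB/s works out to roughly 1.14 ns per value.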

Since I wrote the above linked article there's been a bunch of discussion,
some of which has been about using crypto to back Math.random(). I'm sort of
torn on that front - I feel like a good PRNG is useful for some stuff (like
array shuffles), but maybe not? Maybe there are undiscovered (or even
discovered, but not well known) vulnerabilities that justify CSPRNG even
there, as there are with hash collision vulnerabilities.

Anyways, what I learned is that the benchmarks (particularly SunSpider, it
seems) are putting pressure on implementors to (over)emphasize Math.random()
performance, but nothing is really pressuring them to produce good quality.
Sounds like the best thing to do might be to put some quality requirements in
the ECMA spec to balance the performance pressures. Check out my recent
Twitter likes for more details. I'm @mjmalone.

------
rb12345
Actually, this may be even worse than described. The UUID digits all amount to
the hex digit of hi in the 16^3 place (i.e. the 4th least-significant
nybble). However, it also turns out that hi[init]*0xFFFF will give the same
digit sequence but with a leading 0. This probably can be used to determine
the RNG state from a single UUID, but the calculations for that are beyond me.

