A "secure token generation" library might have the same flaws (and many do!).
A correctly generated v4 UUID (§4.4 of the RFC) should be acceptable for use as a secure token, given it has 122 random bits.
The trouble with UUIDs, if any, is that only v4 UUIDs are really suitable for use as secure tokens, with v1 through v3 being entirely unsuitable.
Personally, outside of DB PKs in Postgres (or similar), I prefer to base64 encode 256-bits from /dev/urandom for tokens, as they're a little shorter (and Go makes it easy enough to do that).
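For reference, the whole thing is only a few lines of Go; a rough sketch (the function name is just mine):

```go
package main

import (
	"crypto/rand" // reads from the OS CSPRNG (/dev/urandom or equivalent)
	"encoding/base64"
	"fmt"
)

// newToken returns a URL-safe string backed by 256 bits of OS randomness.
func newToken() (string, error) {
	buf := make([]byte, 32) // 32 bytes = 256 bits
	if _, err := rand.Read(buf); err != nil {
		return "", err // never fall back to a weaker source
	}
	// RawURLEncoding drops padding and is safe in URLs and cookies: 43 characters.
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	token, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(token)
}
```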
No, you're missing the point. Security requires more than randomness; it also requires unpredictability. Section 6 of the RFC is very clear about this: "Do not assume that UUIDs are hard to guess; they should not be used as security capabilities". I don't know how many UUID generators have cryptographic properties, but it is wrong to assume that a UUID is suitable for security just because the randomness appears good. You can speak to any cryptographer about that.
No it's not, from §4.4:
> Set all the other bits to randomly (or pseudo-randomly) chosen values.
I can use a PRNG with a cycle length of 8 and it would be fully correct according to the RFC, but it would be trivial to brute-force all the values.
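To make that concrete, here's a sketch in Go with a deliberately terrible toy PRNG (cycle length 8, made up purely for illustration) that still emits byte-for-byte valid v4 UUIDs, version and variant bits included:

```go
package main

import "fmt"

// badPRNG repeats exactly after eight outputs: a cycle length of 8.
type badPRNG struct{ state uint8 }

func (p *badPRNG) next() byte {
	p.state = (p.state + 1) % 8
	return p.state * 0x1f // spread the 8 possible values across a byte
}

// badUUIDv4 is layout-correct per §4.4: version and variant bits are set
// properly, and everything else is "randomly (or pseudo-randomly) chosen".
func badUUIDv4(p *badPRNG) string {
	var b [16]byte
	for i := range b {
		b[i] = p.next()
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // variant 10xx
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

func main() {
	p := &badPRNG{}
	// At most 8 distinct UUIDs can ever come out of this generator (in this
	// run they're all identical, since 16 bytes is exactly two full cycles),
	// yet every one of them parses as a perfectly valid v4 UUID.
	for i := 0; i < 4; i++ {
		fmt.Println(badUUIDv4(p))
	}
}
```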
In the same section of the UUID RFC: "See Section 4.5 for a discussion on random numbers.", and then: "Advice on generating cryptographic-quality random numbers can be found in RFC 1750.", so it's not like the authors were unaware of the implications of a trivial PRNG. It's just up to the implementation whether a CSPRNG is required (user-visible security tokens) or a plain PRNG is enough (internal identifiers).
Generating correlated v4 UUIDs isn't wrong, but it is stupid in the sense of being self-defeating: nobody who wants to generate v4 UUIDs wants them with insufficient randomness, so nobody has a reason not to treat a bad random-number source as a bug; it interferes with getting the very thing they wanted from generating v4 UUIDs in the first place.
(The real point of the explicit UUID spec, meanwhile, is in saying what constitutes a valid UUID—and it's certainly valid to generate a UUIDv4 using insufficient entropy. There's no way for a peer receiving such UUIDs to guess that they're maybe-predictable-with-enough-effort and reject them on that basis, which is all "validity" can ever mean: what can or cannot be technically enforced by protocol peers.)
Fortunately the birthday attack (assuming SHA512 is secure) is at 256 bits. So I don't believe you've lost security in this case... but it would seem that you've almost certainly lost performance.
Since it turned out that most uses didn't actually need a UUID and just needed a random string, I figured I'd maximize the randomness per bit and added a .random() call that returns a short string that is taken from os.urandom().
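The library itself isn't shown here, but the idea is only a few lines in any language; a rough Go sketch of the same thing (the name is mine):

```go
package main

import (
	"crypto/rand"
	"encoding/base32"
	"fmt"
)

// shortID packs 128 bits of OS randomness into 26 base32 characters,
// versus the 36 characters a UUID string spends on only 122 random bits.
func shortID() string {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // no sane fallback if the OS RNG is unavailable
	}
	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(buf)
}

func main() {
	fmt.Println(shortID())
}
```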
It would probably be good if you asked yourself whether you actually need a UUID or a random string, and used the right tool for the job. Using a UUID when you want a random string and vice versa leads to these kinds of problems.
(Also, don't make the common mistake of conflating "UUID" with "v4 UUID". A v4 UUID contains 122 bits of (hopefully) random data (128 bits minus the 4 version bits and 2 variant bits). A v1 UUID contains 0 bits of randomness.)
Nope, looks like a really bad choice for generic libraries.
I use traditional auto-increment integer IDs as primary keys in SQL databases, and I use those SQL query results in application code. Debugging seems a lot easier with small integer values than with the UUIDs I've seen in several enterprise applications and SQL databases.
* They're natively supported in most databases and languages
* They're strongly typed, meaning there's no risk of accidentally doing things that make no semantic sense with them (e.g. adding or multiplying)
* They ensure that any API user will use the correct datatype, rather than some clients breaking when your ids go above 2^31
* They avoid exposing information about how many entries there are, e.g. you can't tell how many users I have by signing up and checking your user ID
* The client can generate them without roundtripping to the database; this can save on roundtrips when you're saving several related pieces of data, and makes it easier to have circular data structures if you need them (see the sketch below)
* As others have said, they're usable in an AP datastore
None of this is impossible to do with integers, but UUIDs make it very easy.
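For example, here's a minimal sketch (using the github.com/google/uuid package; the types are made up) of a client minting keys for a whole record graph before anything touches the database:

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

type Order struct {
	ID    uuid.UUID
	Items []OrderItem
}

type OrderItem struct {
	ID      uuid.UUID
	OrderID uuid.UUID // foreign key known before any round trip
}

func main() {
	order := Order{ID: uuid.New()} // v4 UUID, generated locally
	for i := 0; i < 3; i++ {
		order.Items = append(order.Items, OrderItem{ID: uuid.New(), OrderID: order.ID})
	}
	// All primary and foreign keys are already consistent, so the whole
	// graph can be sent to the database (or a queue) in a single batch.
	fmt.Printf("%+v\n", order)
}
```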
Because random keys insert at arbitrary positions in the index rather than appending at the end, this forces page splits and index tree rebalancing on many (even most) writes, which is hugely detrimental to performance.
For the database itself, the benefits are simply those that come from the fact that every ID, regardless of where it was generated, should be unique. This means you can, per the_mitsuhiko, generate keys in a distributed fashion, but it means more than that: even rows from disparate databases have unique identifiers, which has benefits all the way down the line, because everyone has a way to refer to a particular entry that uniquely identifies it, regardless of where it came from. It lets you separate and recombine data without worrying (depending on the type of UUID, at least) about collisions. Separating might be something like a NoSQL database, where the hashing key determines which nodeset a row is written to, letting you scale out; recombining might be that same setup, or multiple regions/databases in a SQL solution where you need to run queries against all of them and merge the results into a single database for various metrics or similar.
Whether you need any of that is entirely dependent on your use cases; an incrementing ID in the database is perfectly valid for many needs.
When you need the client to determine the PK before an operation happens. It's particularly necessary when working in distributed environments. There are many different forms of UUID, and when you know what you are doing you can build very powerful systems with them that you cannot build with auto-incrementing integer keys (even with holes).
I would also debate your use of the word 'easily'; what's easier, setting up zookeeper/etcd, or just generating a UUID?
But yes, if your use case is "I want to make sure I have non-conflicting IDs across my cluster", synchronizing them is a possible solution. And the right one, if your requirements stipulate absolute ordering based on the generation of each ID.
After all, UUIDs only make conflicting IDs unlikely, not impossible (with 122 random bits, you'd need on the order of 2^61 IDs before a collision becomes likely). Especially if your random source is not that random.
Plus a UUID can be generated by a source outside your network -- eg, client side, with the ability to treat them as quasi-nonces for replay detection.
Or inside your network, during long-running partitions.
The bigger question is: why would I install, manage, update, monitor zookeeper/etcd and write my own custom ID generator and all the storage mechanisms around it, when I can just use UUIDs?
On the other hand, a well chosen UUID, can be generated anywhere and be relied on to be globally unique. So maybe your client wants to create some records on the client, sync to a local db, and then do a batched update the next time the laptop it's running on has a wifi connection. Or maybe you want to scale your databases horizontally, or distribute your app across multiple data centers, or basically make any sort of AP (ie, Available and Partition Tolerant) system. In which case, integer IDs are asking for a world of pain, but a well chosen token is great, because you can keep working efficiently when talking to your Single Source Of Integer Truth is slow or impossible.
(And yes, there's other solutions to the issue. But UUIDs can solve it with a minimal amount of design and coding.)
> Debugging seems a lot easier with smaller integer values rather than UUIDs
My experience has been that both are just as easy, and leaking integer IDs can be a (mild) security vulnerability.
That seems to be pretty common, although V8's seems especially terrible. In the thread linked by PhantomGremlin, pcwalton also points out that a stronger RNG can be "detrimental" to benchmarks if it's slower, since many benchmarks ultimately bench the speed of the basic RNG (or, for another issue in the same category, the speed of the basic hash function), so the incentive for built-in RNGs is to be fast and cheap, not to be good.
Also worth checking out: https://channel9.msdn.com/Events/GoingNative/2013/rand-Consi...
To make it into a CSPRNG would be a detriment. The only thing I would change is to have a minimum cycle length.
Not even that. Different types of games require different types of RNGs (e.g. in an RPG or strategy game you probably want a stable seeded RNG to preclude RNG save-scumming), and the C standard makes almost no guarantees about rand() anyway.
This is silly; there are plenty of cases where randomness is used to communicate randomness to the user (think games, song shuffling, visualizations, art, etc.). Algorithms using randomness ≠ randomized algorithms.
The PRNG behind Math.random() has been fixed in Chrome very recently.
As far as I know, there are no CSPRNGs that are as fast as, say, mersenne twister. So if 'fixed' means 'made it a CSPRNG', then I'd have to say they broke it. crypto.getRandomValues already exists.
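The gap is easy to measure for yourself; a rough Go sketch (Go's math/rand isn't a Mersenne Twister, but the comparison against the OS-backed CSPRNG has the same shape; exact numbers vary by machine):

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	mrand "math/rand"
	"time"
)

// timeIt fills 64 MiB through the given reader and reports how long it took.
func timeIt(name string, read func([]byte) (int, error)) {
	buf := make([]byte, 64*1024)
	start := time.Now()
	for i := 0; i < 1024; i++ {
		if _, err := read(buf); err != nil {
			panic(err)
		}
	}
	fmt.Printf("%-12s %v for 64 MiB\n", name, time.Since(start))
}

func main() {
	timeIt("math/rand", mrand.New(mrand.NewSource(1)).Read) // fast, not crypto-safe
	timeIt("crypto/rand", crand.Read)                       // CSPRNG, typically much slower
}
```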
Since I wrote the above linked article there's been a bunch of discussion, some of which has been about using crypto to back Math.random(). I'm sort of torn on that front - I feel like a good PRNG is useful for some stuff (like array shuffles), but maybe not? Maybe there are undiscovered (or even discovered, but not well known) vulnerabilities that justify CSPRNG even there, as there are with hash collision vulnerabilities.
Anyways, what I learned is that the benchmarks (particularly SunSpider, it seems) are putting pressure on implementors to (over)emphasize Math.random() performance, but nothing is really pressuring them to produce good quality. Sounds like the best thing to do might be to put some quality requirements in the ECMA spec to balance the performance pressures. Check out my recent Twitter likes for more details. I'm @mjmalone.