"A UUID is 128 bit long, and consists of a 60-bit time
value, a 16-bit sequence number and a 48-bit node
identifier.
The time value is taken from the system clock, and is
monotonically incrementing. However, since it is possible
to set the system clock backward, a sequence number is
added. The sequence number is incremented each time the
UUID generator is started. The combination guarantees that
identifiers created on the same machine are unique with a
high degree of probability."
Looking at the code, the node id is derived from the MAC, and the sequence number state is stored in /tmp or your home dir.
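For comparison, the fields of a standard time-based UUID can be inspected with Python's `uuid` module. This is only a sketch of the RFC 4122 version 1 layout, not the gem's code. Note that the RFC reserves 14 bits for the clock sequence, whereas the gem documentation quoted above describes a 16-bit sequence number.

```python
import uuid

# Generate a time-based (version 1) UUID with the standard library.
u = uuid.uuid1()

print(u.version)    # 1
print(u.time)       # 60-bit timestamp (100 ns ticks since 1582-10-15)
print(u.clock_seq)  # 14-bit clock sequence
print(u.node)       # 48-bit node id (MAC address, or random if none is found)
```

If two machines share the node id, the clock sequence, and the clock reading, they produce the same value, which is the failure mode described here.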
In an era of VMs that are frozen and copied around between hosts, this seems like a spectacularly bad implementation. It's way too easy for this code to end up running from the same start state in such a situation, and as soon as that happens you'll have tons of exactly duplicated UUID sequences. I would hate to have to debug the mess that would produce. And that's aside from possible security issues.
Please do not use this gem. SecureRandom.uuid has a sane (and standard) implementation.
Not that people necessarily do this, but if you duplicate a VM without changing the copy's MAC address then you're setting yourself up for all sorts of problems. It's not all that unreasonable to assume that MAC addresses are unique. After all, that's their whole point.
I looked further into the code and it's actually uglier than that. Since Ruby can't get at the MAC directly, they instead call Ruby's global rand function (a fast but weak pseudorandom generator on many platforms) 256 times, concatenate the results, and SHA-1 the lot to create a fake MAC, which is then stuffed into the state file. If a state file exists, it is preferred over creating a new one. So really the entire initial sequence state is in the state file and system clock. That's way too likely to overlap IMHO, particularly on VM images that may not update the system clock immediately on spawning, before the app comes up.
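To illustrate why that matters, here is a rough Python paraphrase of the scheme as described in this comment. It is not the gem's actual code: the function name `fake_node_id` and the exact way the 256 values are concatenated are assumptions made for the sketch. It shows that two processes starting from the same generator state end up with the same "MAC", while an OS-backed source does not depend on process state.

```python
import hashlib
import random
import secrets

def fake_node_id(rng: random.Random) -> str:
    """Hypothetical paraphrase: hash 256 weak PRNG outputs into a 48-bit id."""
    material = "".join(str(rng.random()) for _ in range(256))
    digest = hashlib.sha1(material.encode("ascii")).hexdigest()
    return digest[:12]  # keep 12 hex digits = 48 bits

# Two "cloned" processes that start from identical PRNG state.
clone_a = fake_node_id(random.Random(42))
clone_b = fake_node_id(random.Random(42))
print(clone_a == clone_b)  # True: hashing adds no entropy

# Drawing the 48 bits from the operating system's CSPRNG instead.
strong = secrets.token_hex(6)
print(len(strong))  # 12
```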
Your point about always using a new MAC when you thaw or duplicate a VM image is well taken.
I'm convinced that when code has cryptographic implications, people should not roll their own ad hoc weird solutions. Nearly every platform has a way of accessing high quality entropy. Version 4 random UUIDs have been standardized for a while now. I don't understand the logic of using some "I invented this" implementation.
Good lord. Even if they used true randomness to generate their fake MAC, you'd still expect a collision after doing this only about 16 million times. What were they thinking?
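The 16 million figure is the birthday bound for a 48-bit space, that is, roughly the square root of the number of possible values. A quick check in Python, assuming the fake MACs are uniform and independent:

```python
import math

NODE_BITS = 48
space = 2 ** NODE_BITS

# Birthday bound: collisions become likely around sqrt(space) draws.
birthday_bound = math.isqrt(space)

# Expected number of uniform draws until the first repeat, about sqrt(pi/2 * space).
expected_draws = math.sqrt(math.pi / 2 * space)

print(birthday_bound)  # 16777216
print(round(expected_draws))
```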
The RFC outlines a single variant of UUID, and describes five different versions of this variant. One of these versions is random. The author, however, is not using this version. I may be mistaken, but the uuid Ruby gem he used appears to implement a different variant altogether. From Wikipedia:
In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the
most significant bits of N indicates the variant (depending on the variant;
one, two or three bits are used). The variant covered by the UUID
specification is indicated by the two most significant bits of N being 1 0
(i.e. the hexadecimal N will always be 8, 9, A, or B).
In the variant covered by the UUID specification, there are five versions.
For this variant, the four bits of M indicates the UUID version (i.e. the
hexadecimal M will either be 1, 2, 3, 4, or 5).
The examples from the blog post don't appear to fit the RFC variant:
Version 4 UUIDs use a scheme relying only on random numbers. This algorithm
sets the version number as well as two reserved bits. All other bits are set
using a random or pseudorandom data source. Version 4 UUIDs have the form
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y
is one of 8, 9, a, or b. e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.
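The variant and version rules quoted above can be checked mechanically. A minimal sketch using Python's standard `uuid` module; the helper name `is_rfc4122_v4` is made up for this example:

```python
import uuid

def is_rfc4122_v4(text: str) -> bool:
    """Return True if text parses as an RFC 4122 variant, version 4 UUID."""
    try:
        u = uuid.UUID(text)
    except ValueError:
        return False
    return u.variant == uuid.RFC_4122 and u.version == 4

print(is_rfc4122_v4("f47ac10b-58cc-4372-a567-0e02b2c3d479"))  # True
print(is_rfc4122_v4("f47ac10b-58cc-1372-a567-0e02b2c3d479"))  # False (version 1)
```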
Ruby 1.9 includes version 4 UUIDs as a built-in: SecureRandom.uuid. The author should consider using that instead.
We ended up using Ruby's SecureRandom.hex, which is 128 bits of the form "e7e12f9b82f71d90b678373c360e497c" - roughly equivalent to the 122 bits of randomness you get from a version 4 UUID (six of the 128 bits are fixed by the version and variant fields).
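A version 4 UUID fixes six bits (four for the version, two for the variant), which leaves 122 random bits. The sketch below counts them using Python's `uuid` module and, as an assumed Python analog of Ruby's SecureRandom.hex, generates a plain 128-bit hex token with `secrets.token_hex`.

```python
import secrets
import uuid

# Force version 4 onto all-zero and all-one inputs; bits that differ between
# the two results are the ones a generator is free to randomize.
low = uuid.UUID(int=0, version=4)
high = uuid.UUID(int=2**128 - 1, version=4)
free_bits = bin(low.int ^ high.int).count("1")

print(low)        # 00000000-0000-4000-8000-000000000000
print(free_bits)  # 122

token = secrets.token_hex(16)  # 16 random bytes, i.e. 128 random bits
print(len(token))  # 32
```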
The same problem applies in PHP with the function `uniqid`. The function does not generate cryptographically secure tokens. Here are 10 generated in a row:
Yeah, uniqid is just an altered timestamp as I understand it. But it's not meant to be in any way cryptographically secure or random, though I'm sure someone somewhere uses it for tokens, passwords, whatnot.
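To see why a timestamp-derived identifier is guessable, here is a small Python imitation. It assumes the identifier is simply the seconds and microseconds of the current time rendered in hex; the function `timestamp_id` is hypothetical and is not PHP's implementation.

```python
def timestamp_id(micros_since_epoch: int) -> str:
    """Hypothetical timestamp-based id: hex seconds followed by hex microseconds."""
    seconds, micros = divmod(micros_since_epoch, 1_000_000)
    return f"{seconds:08x}{micros:05x}"

# Two ids generated one microsecond apart.
first = timestamp_id(1_700_000_000_000_000)
second = timestamp_id(1_700_000_000_000_001)

print(first)
print(second)
# The ids differ only in the trailing digits, so anyone who knows roughly
# when an id was created can enumerate the candidates.
print(first[:12] == second[:12])  # True
```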
This advice doesn't appear to be universally true. There are different versions of UUID, including version 4 for truly "random" (or pseudo-random) UUIDs.
def uuid4():
    """Generate a random UUID."""
    # When the system provides a version-4 UUID generator, use it.
    if _uuid_generate_random:
        _buffer = ctypes.create_string_buffer(16)
        _uuid_generate_random(_buffer)
        return UUID(bytes=_buffer.raw)
    # Otherwise, get randomness from urandom or the 'random' module.
    try:
        import os
        return UUID(bytes=os.urandom(16), version=4)
    except:
        import random
        bytes = [chr(random.randrange(256)) for i in range(16)]
        return UUID(bytes=bytes, version=4)
You're absolutely right. It was 100% our fault that Meldium was broken - we didn't pay attention to the implementation, and we assumed that job IDs were random when they really weren't. I don't mean to cast any blame on UUIDs or resque-status in this post.
The easiest way is to just read 128 bits of randomness. There's no good use case, outside of wanting sorted identifiers, that warrants using non-random UUIDs.
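In Python, for instance, reading 128 bits of randomness is a one-liner. A minimal sketch, assuming the operating system's random source is what you want:

```python
import os

raw = os.urandom(16)                  # 16 bytes = 128 bits from the OS CSPRNG
identifier = raw.hex()                # 32 lowercase hex digits
as_int = int.from_bytes(raw, "big")   # the same value as a 128-bit integer

print(identifier)
```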
Base64 is really annoying to deal with if you're using it for identifiers; you'll inevitably want to drop something's ID in the middle of, say, URLs, or usernames, or email addresses, and then the case-sensitivity and those three extra characters (+ / =) will bite you.
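A short Python sketch of the three troublesome characters, using the standard `base64` module. The input bytes are chosen purely so that `+` and `/` show up; the URL-safe alphabet swaps those two characters but still pads with `=` and is still case-sensitive.

```python
import base64

raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # +//+
print(urlsafe)   # -__-

# 16 bytes do not divide evenly into 3-byte groups, so padding appears.
padded = base64.b64encode(bytes(16)).decode("ascii")
print(len(padded), padded[-2:])  # 24 ==
```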
z-base-32[1] is a much human-friendlier alternative (while still being edible in nice discrete machine-word-chunks), although sadly, it's not supported by nearly as many stdlibs.
Because URLs (especially the abbreviated kind that link-shorteners, pastebins, and image-hosts use) sometimes have to pass through bridges such as "reading them off a screen, writing them down on paper, and then typing them in somewhere else", or the slightly more direct "reading them off a billboard."
It's reliably double the length of the input byte array, it isn't subject to transcription errors or special character conflicts, and hex conversions exist in every stdlib ever.
If you want random but don't care about globally unique, a function that reads from `/dev/urandom` is your best choice for cryptographically secure tokens.
Cryptographically random bit strings are more likely to be collision-free than whatever UUID scheme you can cook up. The reason is that the probability of finding a collision among (uniform, independent) random n-bit strings is extremely low as long as you produce fewer than roughly 2^(n/2) of them, whereas generating the same UUID according to a deterministic process is relatively plausible if you consider non-synchronized clocks and virtualization.
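The 2^(n/2) rule of thumb comes from the birthday approximation: for k uniform draws from N = 2^n values, the collision probability is about 1 - exp(-k(k-1)/(2N)). A small sketch in Python; the function name `collision_probability` is made up for this example.

```python
import math

def collision_probability(n_bits: int, count: int) -> float:
    """Approximate chance that `count` uniform n-bit strings contain a repeat."""
    space = 2 ** n_bits
    # Birthday approximation; expm1 keeps very small probabilities accurate.
    return -math.expm1(-count * (count - 1) / (2 * space))

print(collision_probability(128, 2**32))  # on the order of 1e-20
print(collision_probability(128, 2**64))  # about 0.39
```

Four billion 128-bit identifiers are still nowhere near a collision; only around 2^64 of them does the probability become substantial.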
The potential for duplicates is at odds with the statement "best choice for cryptographically secure". As with anything security related, never roll your own implementation if you can at all help it.
That aside, urandom won't actually have duplicates. Just don't truncate it to a tiny size. And as far as I'm concerned grabbing a chunk of urandom is using someone else's implementation.
That's actually a good point I hadn't considered. Presumably /dev/urandom uses a CSPRNG implementation and 128 bits from that is just as good as 128 bits from any other CSPRNG source.