Be careful with your random tokens

jasonwatkinspdx · on Feb 19, 2013

  "A UUID is 128 bit long, and consists of a 60-bit time
  value, a 16-bit sequence number and a 48-bit node
  identifier.
  
  The time value is taken from the system clock, and is 
  monotonically incrementing. However, since it is possible 
  to set the system clock backward, a sequence number is 
  added. The sequence number is incremented each time the 
  UUID generator is started. The combination guarantees that 
  identifiers created on the same machine are unique with a 
  high degree of probability."

Looking at the code, the node id is derived from the MAC, and the sequence number state is stored in /tmp or your home dir.

In an era of VMs that are frozen and copied around between hosts, this seems like a spectacularly bad implementation. It's way too easy for this code to end up running from the same start state in such a situation, and as soon as that happens you'll have tons of exactly duplicated UUID sequences. I would hate to have to debug the mess that would produce. And that's aside from possible security issues.

Please do not use this gem. SecureRandom.uuid has a sane (and standard) implementation.

mikeash · on Feb 20, 2013

Not that people necessarily do this, but if you duplicate a VM without changing the copy's MAC address then you're setting yourself up for all sorts of problems. It's not all that unreasonable to assume that MAC addresses are unique. After all, that's their whole point.

jasonwatkinspdx · on Feb 20, 2013

I looked further into the code and it's actually uglier than that. Since Ruby can't get at the MAC directly, they instead call the Ruby global rand function (which is a fast and weak pseudorandom generator on many platforms) 256 times, concatenate the results and SHA-1 them to create a fake MAC, which is then stuffed into the state file. If a state file exists, it is preferred to creating a new one. So really the entire initial sequence state is in the state file and system clock. That's way too likely to overlap IMHO, particularly on VM images that may not update the system clock immediately on spawning an image, before the app comes up.

Your point about always using a new MAC when you thaw or duplicate a VM image is well taken.

I'm convinced that when code has cryptographic implications, people should not roll their own ad hoc weird solutions. Nearly every platform has a way of accessing high quality entropy. Version 4 random UUIDs have been standardized for a while now. I don't understand the logic of using some "I invented this" implementation.

mikeash · on Feb 20, 2013

Good lord. Even if they used true randomness to generate their fake MAC, you'd still expect a collision after doing this only about 16 million times. What were they thinking?
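
For anyone who wants to check the "16 million" figure, here is a rough Python sketch. It assumes the fake MAC is a uniformly random 48-bit value (the best case described above) and uses the standard birthday approximation. `collision_probability` is a name made up for this illustration.

```python
import math

def collision_probability(count: int, bits: int) -> float:
    """Approximate chance that `count` uniform random `bits`-bit values contain a duplicate."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-k(k-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-count * (count - 1) / (2 * space))

NODE_BITS = 48  # size of the node (MAC) field in a version 1 UUID
rule_of_thumb = math.isqrt(2 ** NODE_BITS)  # sqrt(2^48) = 2^24

print(rule_of_thumb)
print(round(collision_probability(rule_of_thumb, NODE_BITS), 3))
```

2^24 is about 16.8 million, and under this approximation the chance of at least one duplicate node ID among that many generated values is already somewhere around 39%.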

mikebabineau · on Feb 19, 2013

The RFC outlines a single variant of UUID, and describes five different versions of this variant. One of these versions is random. The author, however, is not using this version. I may be mistaken, but the uuid Ruby gem he used appears to implement a different variant altogether. From Wikipedia:

    In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the 
    most significant bits of N indicates the variant (depending on the variant; 
    one, two or three bits are used). The variant covered by the UUID 
    specification is indicated by the two most significant bits of N being 1 0 
    (i.e. the hexadecimal N will always be 8, 9, A, or B).
    In the variant covered by the UUID specification, there are five versions. 
    For this variant, the four bits of M indicates the UUID version (i.e. the 
    hexadecimal M will either be 1, 2, 3, 4, or 5).

The examples from the blog post don't appear to fit the RFC variant:

    855ff330-5ce6-0130-d84d-12313d05011b
    33aa4b00-5ce7-0130-d84d-12313d05011b
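
A quick way to check this, sketched in Python with the standard `uuid` module (the `describe` helper is hypothetical, written only for this illustration). It pulls out the M and N digits from the canonical form and asks the module whether the value falls in the RFC variant:

```python
import uuid

EXAMPLES = [
    "855ff330-5ce6-0130-d84d-12313d05011b",
    "33aa4b00-5ce7-0130-d84d-12313d05011b",
]

def describe(text: str) -> dict:
    """Report the M and N digits plus what Python's uuid module makes of them."""
    parsed = uuid.UUID(text)
    groups = text.split("-")
    return {
        "M": groups[2][0],  # version digit in the canonical form
        "N": groups[3][0],  # variant digit in the canonical form
        "is_rfc_variant": parsed.variant == uuid.RFC_4122,
        "version": parsed.version,  # None unless the variant is RFC 4122
    }

for example in EXAMPLES:
    print(example, describe(example))
```

For both examples N is `d`, which is outside 8, 9, a, b, so the module does not report an RFC version for them at all.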

If the gem used [variant 1] version 4 UUIDs instead, this wouldn't be a problem (http://en.wikipedia.org/wiki/Universally_unique_identifier#V...):

    Version 4 UUIDs use a scheme relying only on random numbers. This algorithm 
    sets the version number as well as two reserved bits. All other bits are set 
    using a random or pseudorandom data source. Version 4 UUIDs have the form 
    xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y 
    is one of 8, 9, a, or b. e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.

Ruby 1.9 includes version 4 UUIDs as a built-in: SecureRandom.uuid. The author should consider using that instead.

bradleybuda · on Feb 19, 2013

We ended up using Ruby's SecureRandom.hex, which is 128 bits of the form "e7e12f9b82f71d90b678373c360e497c" - roughly equivalent to the 122 bits of randomness you get from a version 4 UUID (the other 6 bits are fixed by the version and variant fields).
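
For comparison, a rough Python analogue of the two options. This is only a sketch: treating `secrets.token_hex` as the counterpart of `SecureRandom.hex` is my assumption, not something from the post. Both values come out as 32 hex digits, but the UUID has its version and variant bits pinned.

```python
import secrets
import uuid

token = secrets.token_hex(16)  # 16 bytes from the OS CSPRNG, as 32 hex digits
random_uuid = uuid.uuid4()     # random UUID with version and variant bits fixed

print(token)
print(random_uuid.hex)
print(random_uuid.version, random_uuid.variant == uuid.RFC_4122)
```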

nodesocket · on Feb 19, 2013

The same problem applies in PHP with the function `uniqid`. The function does not generate cryptographically secure tokens. Here are 10 generated in a row:

    5123ed3e43a9c
    5123ed3e43c3c
    5123ed3e43c44
    5123ed3e43c4b
    5123ed3e43c51
    5123ed3e43c56
    5123ed3e43c5c
    5123ed3e43c62
    5123ed3e43c6b
    5123ed3e43c74

krapp · on Feb 19, 2013

Yeah, uniqid is just an altered timestamp as I understand it. But it's not meant to be in any way cryptographically secure or random, though I'm sure someone somewhere uses it for tokens, passwords, whatnot.
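
To illustrate the "altered timestamp" point, here is a Python sketch that decodes a few of the values above. It assumes the default `uniqid()` layout of 8 hex digits of Unix seconds followed by 5 hex digits of microseconds. `decode_uniqid` is a made-up helper, not a PHP or Python API.

```python
from datetime import datetime, timezone

def decode_uniqid(value: str) -> tuple[int, int]:
    """Split a 13-character uniqid into (unix_seconds, microseconds)."""
    if len(value) != 13:
        raise ValueError("expected a 13-character uniqid without prefix or extra entropy")
    return int(value[:8], 16), int(value[8:], 16)

samples = ["5123ed3e43a9c", "5123ed3e43c3c", "5123ed3e43c44"]
for sample in samples:
    seconds, micros = decode_uniqid(sample)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    print(sample, stamp.isoformat(), micros)
```

Under that assumption all of the samples decode to the same second on Feb 19, 2013 and differ only in the microsecond field, so anyone who knows roughly when a token was made has very little left to guess.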

beambot · on Feb 19, 2013

This advice doesn't appear to be universally true. There are different versions of UUID, including version 4 for truly "random" (or pseudo-random) UUIDs.

Looking at the python implementation:

http://docs.python.org/2/library/uuid.html

    def uuid4():
        """Generate a random UUID."""

        # When the system provides a version-4 UUID generator, use it.
        if _uuid_generate_random:
            _buffer = ctypes.create_string_buffer(16)
            _uuid_generate_random(_buffer)
            return UUID(bytes=_buffer.raw)

        # Otherwise, get randomness from urandom or the 'random' module.
        try:
            import os
            return UUID(bytes=os.urandom(16), version=4)
        except:
            import random
            bytes = [chr(random.randrange(256)) for i in range(16)]
            return UUID(bytes=bytes, version=4)

icoder · on Feb 19, 2013

IIRC the PHP docs explicitly state that these are not truly random

ars · on Feb 20, 2013

You can set the more_entropy argument of uniqid to true to get better randomness.

http://www.php.net/uniqid

twistedpair · on Feb 19, 2013

Who ever said UUIDs were random? They are sufficiently structured to be globally UNIQUE. That's the contract.

bradleybuda · on Feb 19, 2013

You're absolutely right. It was 100% our fault that Meldium was broken - we didn't pay attention to the implementation, and we assumed that job IDs were random when they really weren't. I don't mean to cast any blame on UUIDs or resque-status in this post.

icoder · on Feb 19, 2013

And that becomes clear from the article but leaves the title a bit off

vilda · on Feb 19, 2013

Depends on implementation - always read docs. UUID version 4 is random and does not follow the time- and node-based generation rules of the other versions.

MichaelGG · on Feb 20, 2013

The easiest way is to just read 128 bits of randomness. There's no good use case, outside of wanting sorted identifiers, that warrants using non-random UUIDs.

jazzychad · on Feb 19, 2013

Plleeeaaaaase change your logo (and "Home") to link to your main homepage instead of your blog's root... more: http://blog.jazzychad.net/2012/05/28/startups-fix-your-blog-...

potshot · on Feb 19, 2013

Great point. Our blog is on Squarespace and we didn't see a way to make it happen - we're checking w/ them to see if we're doing something wrong.

dwb · on Feb 19, 2013

Why not use Base64-encoded random bytes? You can fit the same amount of random bits into a smaller ASCII representation. Not a big issue, I know, but hey. Here's Ruby's: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/securerandom/rdo...

derefr · on Feb 20, 2013

Base64 is really annoying to deal with if you're using it for identifiers; you'll inevitably want to drop something's ID in the middle of, say, URLs, or usernames, or email addresses, and then the case-sensitivity and those three extra characters (+ / =) will bite you.

z-base-32[1] is a much human-friendlier alternative (while still being edible in nice discrete machine-word-chunks), although sadly, it's not supported by nearly as many stdlibs.

[1] http://philzimmermann.com/docs/human-oriented-base-32-encodi...

moe · on Feb 20, 2013

I don't see why you'd drop to such an inefficient encoding (especially when dealing with URLs) instead of just using url-safe base64; http://en.wikipedia.org/wiki/Base64#URL_applications

derefr · on Feb 20, 2013

Because URLs (especially the abbreviated kind that link-shorteners, pastebins, and image-hosts use) sometimes have to pass through bridges such as "reading them off a screen, writing them down on paper, and then typing them in somewhere else", or the slightly more direct "reading them off a billboard."

moe · on Feb 20, 2013

How barbaric!

cheald · on Feb 20, 2013

Or just use hex.

> `head -c 20 /dev/random`.unpack("H*").first => "0cc57e2664051b3fd803754fd538b83317c9057f"

It's reliably double the length of the input byte array, it isn't subject to transcription errors or special character conflicts, and hex conversions exist in every stdlib ever.
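
To put numbers on the encoding trade-offs discussed in this subthread, a small Python sketch that encodes the same 128 random bits three ways. Note the base32 here is the RFC 4648 alphabet from the standard library, NOT z-base-32, which uses a different alphabet and is not in the stdlib. Stripping the `=` padding is my choice for the comparison.

```python
import base64
import secrets

raw = secrets.token_bytes(16)  # 128 random bits from the OS CSPRNG

encodings = {
    "hex": raw.hex(),
    "url-safe base64": base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii"),
    # RFC 4648 base32, lowercased; stands in for z-base-32 only as a length comparison.
    "base32": base64.b32encode(raw).rstrip(b"=").decode("ascii").lower(),
}

for name, text in encodings.items():
    print(f"{name:16} {len(text):2} chars  {text}")
```

For 16 bytes that works out to 32 characters of hex, 26 of base32 and 22 of url-safe base64, so the price of a case-insensitive alphabet is a handful of extra characters.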

vilda · on Feb 19, 2013

Java UUID.randomUUID() is OK: "The UUID is generated using a cryptographically strong pseudo random number generator."

nodesocket · on Feb 19, 2013

If you want random but don't care about globally unique, a function that reads from `/dev/urandom` is your best choice for cryptographically secure tokens.

nhaehnle · on Feb 20, 2013

Cryptographically random bit strings are more likely to be collision-free than whatever UUID scheme you can cook up. The reason is that the probability of finding a collision among (uniform, independent) random n-bit strings is extremely low as long as you produce fewer than roughly 2^(n/2) of them, whereas generating the same UUID according to a deterministic process is relatively plausible if you consider non-synchronized clocks and virtualization.
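
A sketch of what "fewer than roughly 2^(n/2)" means in practice, in Python. It inverts the usual birthday approximation to estimate how many uniform, independent n-bit strings you can draw before the collision probability reaches a chosen threshold. `max_draws` and the one-in-a-billion threshold are made up for this illustration.

```python
import math

def max_draws(bits: int, max_probability: float) -> int:
    """Roughly how many uniform `bits`-bit strings keep collision odds below `max_probability`."""
    # Inverting p ~= 1 - exp(-k^2 / (2N)) gives k ~= sqrt(2N * ln(1 / (1 - p))).
    space = 2 ** bits
    return math.isqrt(int(2 * space * -math.log1p(-max_probability)))

for bits in (48, 122, 128):
    print(bits, max_draws(bits, 1e-9))
```

The count grows with the square root of the space, so every two extra random bits double the number of strings you can hand out at the same risk. None of this applies to a deterministic generator that restarts from the same state, which is the failure mode described above.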

biot · on Feb 19, 2013

The potential for duplicates is at odds with the statement "best choice for cryptographically secure". As with anything security related, never roll your own implementation if you can at all help it.

Dylan16807 · on Feb 20, 2013

That aside, urandom won't actually have duplicates. Just don't truncate it to a tiny size. And as far as I'm concerned grabbing a chunk of urandom is using someone else's implementation.

biot · on Feb 20, 2013

That's actually a good point I hadn't considered. Presumably /dev/urandom uses a CSPRNG implementation and 128 bits from that is just as good as 128 bits from any other CSPRNG source.