
Be careful with your random tokens - bradleybuda
http://blog.meldium.com/home/2013/2/19/be-careful-with-your-random-tokens
======
jasonwatkinspdx

      "A UUID is 128 bit long, and consists of a 60-bit time
      value, a 16-bit sequence number and a 48-bit node
      identifier.
      
      The time value is taken from the system clock, and is 
      monotonically incrementing. However, since it is possible 
      to set the system clock backward, a sequence number is 
      added. The sequence number is incremented each time the 
      UUID generator is started. The combination guarantees that 
      identifiers created on the same machine are unique with a 
      high degree of probability."
    

Looking at the code, the node id is derived from the MAC, and the sequence
number state is stored in /tmp or your home dir.

In an era of VMs that are frozen and copied around between hosts, this seems
like a spectacularly bad implementation. It's way too easy for this code to end
up running from the same start state in such a situation, and as soon as that
happens you'll have tons of exactly duplicated UUID sequences. I would hate to
have to debug the mess that would produce. And that's aside from possible
security issues.

Please do not use this gem. SecureRandom.uuid has a sane (and standard)
implementation.

~~~
mikeash
Not that people necessarily do this, but if you duplicate a VM without
changing the copy's MAC address then you're setting yourself up for all sorts
of problems. It's not all that unreasonable to assume that MAC addresses are
unique. After all, that's their whole point.

~~~
jasonwatkinspdx
I looked further into the code and it's actually uglier than that. Since Ruby
can't get at the MAC directly, they instead call the global Ruby rand function
(which is a fast and weak pseudorandom generator on many platforms) 256 times,
concatenate the results and SHA-1 them to create a fake MAC, which is then
stuffed into the state file. If a state file exists, it is preferred to
creating a new one. So really the entire initial sequence state is in the
state file and system clock. That's way too likely to overlap IMHO,
particularly on VM images that may not update the system clock immediately on
spawning an image, before the app comes up.

Your point about always using a new MAC when you thaw or duplicate a vm image
is well taken.

I'm convinced that when code has cryptographic implications, people should not
roll their own ad hoc weird solutions. Nearly every platform has a way of
accessing high quality entropy. Version 4 random UUIDs have been standardized
for a while now. I don't understand the logic of using some "I invented this"
implementation.

~~~
mikeash
Good lord. Even if they used true randomness to generate their fake MAC, you'd
still expect a collision after doing this only about 16 million times. What
were they thinking?
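
For reference, the 16 million figure is the birthday bound for a 48-bit value, roughly 2^24 draws. A rough sketch of that arithmetic, using the usual approximation p ≈ 1 - exp(-k²/2N); the function names are made up for illustration:

```python
import math


def birthday_bound(bits):
    """Rule-of-thumb number of uniform random draws before a collision
    becomes likely: 2 ** (bits / 2)."""
    return 2 ** (bits / 2)


def collision_probability(draws, bits):
    """Approximate chance of at least one duplicate among `draws`
    uniformly random values that are each `bits` bits wide."""
    space = 2 ** bits
    return 1.0 - math.exp(-(draws * draws) / (2.0 * space))


# A 48-bit fake MAC: collisions become likely around 2**24 generated values.
print(int(birthday_bound(48)))
print(round(collision_probability(2 ** 24, 48), 3))
```

At exactly 2^24 draws the approximation gives a collision chance of a bit under 40%, so "expect a collision" is the right order of magnitude rather than an exact threshold.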

------
mikebabineau
The RFC outlines a single variant of UUID, and describes five different
versions of this variant. One of these versions _is_ random. The author,
however, is not using this version. I may be mistaken, but the uuid Ruby
gem he used appears to implement a different variant altogether. From
wikipedia:

    
    
        In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, the 
        most significant bits of N indicates the variant (depending on the variant; 
        one, two or three bits are used). The variant covered by the UUID 
        specification is indicated by the two most significant bits of N being 1 0 
        (i.e. the hexadecimal N will always be 8, 9, A, or B).
        In the variant covered by the UUID specification, there are five versions. 
        For this variant, the four bits of M indicates the UUID version (i.e. the 
        hexadecimal M will either be 1, 2, 3, 4, or 5).
    

The examples from the blog post don't appear to fit the RFC variant:

    
    
        855ff330-5ce6-0130-d84d-12313d05011b
        33aa4b00-5ce7-0130-d84d-12313d05011b
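
One way to check this is to parse the two examples with Python's standard `uuid` module. This is only a sketch: the `time` and `clock_seq` values assume the fields are laid out as in a version 1 UUID, which may not match what the gem actually does.

```python
import uuid

# The two job IDs quoted from the blog post.
EXAMPLES = [
    "855ff330-5ce6-0130-d84d-12313d05011b",
    "33aa4b00-5ce7-0130-d84d-12313d05011b",
]


def describe(text):
    """Split a UUID string into its fields, assuming a version 1 layout."""
    u = uuid.UUID(text)
    return {
        "variant": u.variant,      # which family of UUIDs this claims to be
        "version": u.version,      # None unless the variant is RFC 4122
        "time": u.time,            # 60-bit timestamp in 100 ns ticks
        "clock_seq": u.clock_seq,  # sequence number
        "node": u.node,            # 48-bit node identifier
    }


first, second = (describe(t) for t in EXAMPLES)

print(first["variant"], first["version"])
print(first["node"] == second["node"], first["clock_seq"] == second["clock_seq"])
# Gap between the two IDs in seconds, if the time field really is 100 ns ticks.
print((second["time"] - first["time"]) / 1e7)
```

Neither ID reports the RFC 4122 variant, and the two differ only in the time field, so anyone holding one ID can guess others generated around the same time.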
    

If the gem used [variant 1] version 4 UUIDs instead, this wouldn't be a
problem
([http://en.wikipedia.org/wiki/Universally_unique_identifier#V...](http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28random.29)):

    
    
        Version 4 UUIDs use a scheme relying only on random numbers. This algorithm 
        sets the version number as well as two reserved bits. All other bits are set 
        using a random or pseudorandom data source. Version 4 UUIDs have the form 
        xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y 
        is one of 8, 9, a, or b. e.g. f47ac10b-58cc-4372-a567-0e02b2c3d479.
    

Ruby 1.9 includes version 4 UUIDs as a built-in: SecureRandom.uuid. The author
should consider using that instead.

~~~
bradleybuda
We ended up using Ruby's SecureRandom.hex, which is 128 bits of the form
"e7e12f9b82f71d90b678373c360e497c" - roughly equivalent to the 122 bits of
randomness you get from a version 4 UUID.
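
For comparison, a small Python sketch of the two options. `secrets.token_hex` is used here as a stand-in for Ruby's SecureRandom.hex (an analogy, not the same API). A version 4 UUID fixes 6 of its 128 bits (4 version bits and 2 variant bits), which leaves 122 random ones.

```python
import secrets
import uuid

# 16 random bytes rendered as 32 hex characters: 128 random bits.
token = secrets.token_hex(16)

# A version 4 UUID: random apart from the version and variant bits.
random_uuid = uuid.uuid4()

FIXED_BITS_V4 = 4 + 2                 # version nibble + two variant bits
RANDOM_BITS_V4 = 128 - FIXED_BITS_V4  # bits actually left to chance

print(token, len(token) * 4)
print(random_uuid, RANDOM_BITS_V4)
```

Either is far beyond what can be guessed by brute force; the practical difference is only the format.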

------
nodesocket
The same problem applies in PHP when using the function `uniqid`. The function
does not generate cryptographically secure tokens. Here are 10 generated in a
row:

    
    
        5123ed3e43a9c
        5123ed3e43c3c
        5123ed3e43c44
        5123ed3e43c4b
        5123ed3e43c51
        5123ed3e43c56
        5123ed3e43c5c
        5123ed3e43c62
        5123ed3e43c6b
        5123ed3e43c74

~~~
krapp
Yeah, uniqid is just an altered timestamp as I understand it. But it's not
meant to be in any way cryptographically secure or random, though I'm sure
someone somewhere uses it for tokens, passwords, whatnot.
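
To make that concrete, here is a sketch that decodes one of the values above. It assumes the default uniqid output (no prefix, no extra entropy) is 8 hex digits of Unix seconds followed by 5 hex digits of microseconds; `decode_uniqid` is a made-up helper name.

```python
from datetime import datetime, timezone


def decode_uniqid(value):
    """Split a 13-character uniqid string into (seconds, microseconds),
    assuming the layout is 8 hex digits of seconds + 5 of microseconds."""
    seconds = int(value[:8], 16)
    microseconds = int(value[8:13], 16)
    return seconds, microseconds


seconds, microseconds = decode_uniqid("5123ed3e43a9c")
when = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds, microseconds)
print(when.isoformat())
```

Under that assumption the ten values listed differ only in the microsecond part, so an attacker who knows roughly when a token was issued has very few candidates to try.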

~~~
beambot
This advice doesn't appear to be universally true. There are different
versions of UUID, including version 4 for truly "random" (or pseudo-random)
UUIDs.

Looking at the python implementation:

<http://docs.python.org/2/library/uuid.html>

    
    
      def uuid4():
        """Generate a random UUID."""
    
        # When the system provides a version-4 UUID generator, use it.
        if _uuid_generate_random:
            _buffer = ctypes.create_string_buffer(16)
            _uuid_generate_random(_buffer)
            return UUID(bytes=_buffer.raw)
    
        # Otherwise, get randomness from urandom or the 'random' module.
        try:
            import os
            return UUID(bytes=os.urandom(16), version=4)
        except:
            import random
            bytes = [chr(random.randrange(256)) for i in range(16)]
            return UUID(bytes=bytes, version=4)

------
twistedpair
Who ever said UUIDs were random? They are sufficiently structured to be
globally UNIQUE. That's the contract.

~~~
bradleybuda
You're absolutely right. It was 100% our fault that Meldium was broken - we
didn't pay attention to the implementation, and we assumed that job IDs were
random when they really weren't. I don't mean to cast any blame on UUIDs or
resque-status in this post.

~~~
icoder
And that becomes clear from the article, but it leaves the title a bit off.

------
jazzychad
Plleeeaaaaase change your logo (and "Home") to link to your main homepage
instead of your blog's root... more:
[http://blog.jazzychad.net/2012/05/28/startups-fix-your-blog-...](http://blog.jazzychad.net/2012/05/28/startups-fix-your-blog-links.html)

~~~
potshot
Great point. Our blog is on Squarespace and we didn't see a way to make it
happen - we're checking w/ them to see if we're doing something wrong.

------
dwb
Why not use Base64-encoded random bytes? You can fit the same amount of random
bits into a smaller ASCII representation. Not a big issue, I know, but hey.
Here's Ruby's:
[http://www.ruby-doc.org/stdlib-1.9.3/libdoc/securerandom/rdo...](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/securerandom/rdoc/SecureRandom.html#method-c-urlsafe_base64)
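
As a rough illustration of the size difference, here is the same idea in Python (standing in for the Ruby method linked above): the same 16 random bytes encoded as hex and as unpadded URL-safe Base64.

```python
import base64
import secrets

# 16 bytes (128 bits) from the operating system's CSPRNG.
raw = secrets.token_bytes(16)

# Hex: 4 bits per character.
hex_token = raw.hex()

# URL-safe Base64: 6 bits per character, using '-' and '_' instead of
# '+' and '/', with the trailing '=' padding stripped.
b64_token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

print(len(hex_token), hex_token)
print(len(b64_token), b64_token)
```

The Base64 form carries the same 128 bits in 22 characters instead of 32.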

~~~
derefr
Base64 is really annoying to deal with if you're using it for identifiers;
you'll inevitably want to drop something's ID in the middle of, say, URLs, or
usernames, or email addresses, and then the case-sensitivity and those three
extra characters (+ / =) will bite you.

z-base-32[1] is a much human-friendlier alternative (while still being edible
in nice discrete machine-word-chunks), although sadly, it's not supported by
nearly as many stdlibs.

[1] [http://philzimmermann.com/docs/human-oriented-base-32-encodi...](http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt)
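
Since most standard libraries lack it, here is a minimal sketch of a z-base-32 encoder for whole byte strings. It leans on the assumption that, for whole bytes, z-base-32 groups bits the same way as standard Base32 and differs only in alphabet and in dropping the padding. The alphabet below is the one commonly cited for z-base-32, and `zbase32_encode` is a made-up name.

```python
import base64
import secrets

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"  # assumed z-base-32 alphabet

_TO_ZBASE32 = str.maketrans(RFC4648_ALPHABET, ZBASE32_ALPHABET)


def zbase32_encode(data):
    """Encode bytes as lowercase, unpadded z-base-32 by re-lettering the
    output of the standard library's Base32 encoder."""
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    return standard.translate(_TO_ZBASE32)


# 128 random bits become 26 case-insensitive characters.
identifier = zbase32_encode(secrets.token_bytes(16))
print(len(identifier), identifier)
```

That is 26 characters for 128 bits, against 22 for URL-safe Base64 and 32 for hex, which is the trade-off being argued about here.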

~~~
moe
I don't see why you'd drop to such an inefficient encoding (especially when
dealing with URLs) instead of just using url-safe base64;
<http://en.wikipedia.org/wiki/Base64#URL_applications>

~~~
derefr
Because URLs (especially the abbreviated kind that link-shorteners, pastebins,
and image-hosts use) sometimes have to pass through bridges such as "reading
them off a screen, writing them down on paper, and then typing them in
somewhere else", or the slightly more direct "reading them off a billboard."

~~~
moe
How barbaric!

------
vilda
Java UUID.randomUUID() is OK: "The UUID is generated using a cryptographically
strong pseudo random number generator."

------
nodesocket
If you want random but don't care about globally unique, a function that
reads from `/dev/urandom` is your best choice for cryptographically secure tokens.

~~~
biot
The potential for duplicates is at odds with the statement "best choice for
cryptographically secure". As with anything security related, never roll your
own implementation if you can at all help it.

~~~
Dylan16807
That aside, urandom won't actually have duplicates. Just don't truncate it to
a tiny size. And as far as I'm concerned grabbing a chunk of urandom is using
someone else's implementation.

~~~
biot
That's actually a good point I hadn't considered. Presumably /dev/urandom uses
a CSPRNG implementation and 128 bits from that is just as good as 128 bits
from any other CSPRNG source.

