

Fast ID Generation - jondot
http://blog.paracode.com/2012/04/16/fast-id-generation-part-1/

======
asdffdsa33
Your "tl;dr" isn't a "tl;dr". You simply said "I'm going to talk about..."
instead of condensing your article and saying something like "I use a
high-resolution clock and a core ID (for multi-core) to generate 1000 IDs/sec
on multiple systems."
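Here is a minimal Python sketch of the scheme that one-line summary describes. It is not the post's actual code. The 6-bit `CORE_BITS` width and the `make_id` name are assumptions for illustration, and IDs generated on several hosts would also need host bits, which this sketch leaves out.

```python
import time
from typing import Optional

CORE_BITS = 6  # assumption: at most 64 cores (or worker threads) per host

def make_id(core_id: int, now_ns: Optional[int] = None) -> int:
    """Pack a nanosecond timestamp (high bits) and a core ID (low bits).

    IDs sort by time first. Two cores can never produce the same ID; a single
    core must not issue two IDs within one clock tick, or they will collide.
    """
    if not 0 <= core_id < (1 << CORE_BITS):
        raise ValueError("core_id out of range")
    if now_ns is None:
        now_ns = time.time_ns()
    return (now_ns << CORE_BITS) | core_id
```

Because the timestamp sits in the high bits, a plain integer sort orders IDs by time. The core ID only breaks ties within the same tick.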

~~~
jondot
Thanks, point taken. I've rephrased it; press CTRL-F5 / CTRL-R to force-expire
the browser cache.

------
nodata
Didn't all of the GUID libraries solve this already?

~~~
jondot
No, because GUIDs/UUIDs don't necessarily sort by time. You might argue for a
Version 1 UUID, which may fit, but it was designed for a now-outdated purpose.
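One concrete reason "sorts by time" is subtle for Version 1 UUIDs: in the RFC 4122 layout the low 32 bits of the timestamp (`time_low`) come first, so comparing the string or 128-bit integer form does not follow creation order. The Python sketch below uses the standard `uuid` module. The two hand-made UUIDs and the `v1_sort_key` helper are illustrative assumptions.

```python
import uuid

def v1_sort_key(u: uuid.UUID) -> tuple:
    """Order version 1 UUIDs by embedded timestamp, then clock_seq and node.

    u.time reassembles the full 60-bit timestamp from time_low, time_mid
    and time_hi, which are stored out of order in the 128-bit value.
    """
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    return (u.time, u.clock_seq, u.node)

# Two hand-made v1 UUIDs: `earlier` has timestamp 0xFFFFFFFF, `later` 0x100000000.
earlier = uuid.UUID("ffffffff-0000-1000-8000-000000000000")
later = uuid.UUID("00000000-0001-1000-8000-000000000000")

print(sorted([earlier, later]) == [later, earlier])                   # True: naive order is wrong
print(sorted([later, earlier], key=v1_sort_key) == [earlier, later])  # True
```

So a v1 UUID carries a time-ordered timestamp, but a store has to know to reorder the fields (as Cassandra's TimeUUID comparator does) before sorting on it.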

You have more problems if you cannot allow the number of bits a UUID spans
(128), which is one of the things Twitter's Snowflake solved (its IDs fit in
64 bits).
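For comparison, here is a Python sketch of a Snowflake-style 64-bit generator. It follows the layout Twitter published for Snowflake: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence. The 2012-01-01 epoch, the class name and the "refuse if the clock goes backwards" policy are assumptions, and this is not Twitter's implementation.

```python
import threading
import time
from typing import Callable, Optional

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_325_376_000_000  # assumption: custom epoch of 2012-01-01T00:00:00Z

class SnowflakeLikeGenerator:
    """64-bit IDs laid out as 41-bit timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id: int, clock: Optional[Callable[[], int]] = None):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # Millisecond clock; injectable so the behaviour can be tested.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock moved backwards (e.g. an NTP step): refuse rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self._sequence)
```

Each worker needs a unique `worker_id` assigned up front. That one-time coordination is the price of fitting into 64 bits.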

~~~
NyxWulf
Can you elaborate on a few things? You say that UUID version 1 was designed
for an outdated purpose. What do you mean by that? UUID v1 does sort by time,
so I'm not sure how the outdated purpose is relevant to that consideration.

You say you cannot allow the number of bits a UUID spans. Is that a hard
constraint, or is that an intuitive desire? In my experience, limiting the
number of bits is usually about fitting on a single machine or in limited
memory. However, from my point of view, designing a distributed ID system is
all about scale and minimizing complexity.

When you start to scale out to dozens of machines, the weight of these complex
methods really starts to become evident.

I resisted UUIDs for a long time, but I finally gave in, and you know what?
It's one of the best decisions I've ever made. The design simplicity of
v1 UUIDs is amazing. With the simple_uuid Ruby gem I can generate 40k-ish
UUIDs per machine on a modest EC2 instance. Using NTP, our clock drift is in
the tens of microseconds, IIRC. Using Cassandra and Hadoop, we don't have
to worry as much about fitting into a single machine. The overall improvements
have been staggering, though. The simplest and most important benefit is that
none of the services have to read or request information before generating an
ID. Now it's: generate an ID, send the event on its way, next. I'll leave it as
an exercise for the interested reader to map out the synchronization process
for generating unique IDs without locking across multiple threads and multiple
machines.

tl;dr Just use version 1 UUIDs and NTP.

~~~
jondot
You present very valid arguments.

Regarding UUID v1, refer to section 4.5 of the spec (RFC 4122), "Node IDs that
Do Not Identify the Host". When generating IDs, I don't care about the
device's MAC or its assigned block of addresses, so I can make better use
of those bits. Some implementations of UUID v1 use random jitter, which I
believe is the up-to-date approach and is useful for this case.
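Section 4.5 of RFC 4122 describes replacing the MAC address with 47 random bits and setting the multicast bit. Here is a minimal Python sketch of that, using the standard `uuid` module as a stand-in for simple_uuid. Choosing the node once per process, and the helper names, are assumptions.

```python
import secrets
import uuid

# RFC 4122 section 4.5: use 47 random bits for the node ID and set the
# multicast bit, so it cannot conflict with an address from a network card.
MULTICAST_BIT = 1 << 40  # least significant bit of the node's first octet

def random_node_id() -> int:
    """48-bit node ID: 47 random bits plus the forced multicast bit."""
    return secrets.randbits(48) | MULTICAST_BIT

# Chosen once per process (an assumption; it could also be rotated).
PROCESS_NODE = random_node_id()

def uuid1_without_mac() -> uuid.UUID:
    """Version 1 UUID whose node field does not identify the host."""
    return uuid.uuid1(node=PROCESS_NODE)
```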

In some cases, not allowing the number of bits is a hard constraint.

In other cases, you'd want to generate a TimeUUID (Cassandra) and kill two
birds with one stone.

Eventually, in the second part, I opt for generating a TimeUUID with
simple_uuid (which uses jitter,
[https://github.com/ryanking/simple_uuid/blob/master/lib/simp...](https://github.com/ryanking/simple_uuid/blob/master/lib/simple_uuid.rb#L56)),
exactly per your conclusion.
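The following is a hypothetical Python sketch of what "jitter" can mean for a time-based UUID. It does not reproduce simple_uuid's code (the linked source is the authority on that). It assumes jitter means filling the 100 ns ticks below a microsecond clock reading with a random value, and it uses Python's `uuid.UUID(fields=..., version=1)` to assemble the result. The function name is an assumption.

```python
import secrets
import time
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def jittered_time_uuid(node: int, clock_seq: int = 0) -> uuid.UUID:
    """Version 1 UUID whose timestamp gets random sub-microsecond jitter.

    The clock is truncated to microseconds, so the last 100 ns tick would
    otherwise always be 0; a random 0..9 there makes same-microsecond
    collisions between processes less likely.
    """
    usecs = time.time_ns() // 1_000
    ts = usecs * 10 + secrets.randbelow(10) + UUID_EPOCH_OFFSET
    return uuid.UUID(
        fields=(
            ts & 0xFFFFFFFF,          # time_low
            (ts >> 32) & 0xFFFF,      # time_mid
            (ts >> 48) & 0x0FFF,      # time_hi (version bits set by version=1)
            (clock_seq >> 8) & 0x3F,  # clock_seq_hi (variant bits set by version=1)
            clock_seq & 0xFF,         # clock_seq_low
            node,
        ),
        version=1,
    )
```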

The goal of this part of the post was to lay out the problems this kind of
thing presents :)

------
DanielStraight
If you need the time, couldn't you just add a timestamp?

Sometimes ordering by ID makes sense, but I don't understand why it would be
seen as necessary.

~~~
jondot
Sure, you could. The post covers the case of plain time+randomness, which
unfortunately isn't sufficient once you apply the additional constraints
mentioned (such as the number of bits).
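A quick way to see why time+randomness breaks down when bits are scarce is the birthday bound. Here is a Python sketch. It assumes the random suffix is uniform and that collisions only matter among IDs sharing the same timestamp tick. The function name is illustrative.

```python
import math

def collision_probability(ids_per_tick: int, random_bits: int) -> float:
    """Approximate chance that two IDs issued in the same clock tick share
    the same random suffix, via the birthday bound
    p ~= 1 - exp(-n(n - 1) / 2^(r + 1)) for n IDs and r random bits."""
    n = ids_per_tick
    return -math.expm1(-n * (n - 1) / 2 ** (random_bits + 1))

# 1,000 IDs landing in the same millisecond:
print(collision_probability(1000, 16))  # about 0.9995: a collision is near-certain
print(collision_probability(1000, 64))  # about 2.7e-14
```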

It's not necessary, but if you store data in something like Cassandra (or
even Redis), it becomes crucial to your data modeling and querying strategies.

~~~
Retric
48 bits is plenty: use a timestamp truncated to whatever accuracy you need,
plus either random assignment or giving each machine its own range of numbers
to assign. That holds until you need more than 100,000 IDs per second, at
which point you just need more bits.
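Here is a Python sketch of that 48-bit layout, with the arithmetic for how the split trades lifetime against throughput. The particular splits, the helper names and the use of a Julian year are assumptions.

```python
SECONDS_PER_YEAR = 31_557_600  # Julian year (365.25 days)

def pack48(timestamp_ticks: int, low: int, time_bits: int) -> int:
    """Pack a truncated timestamp (high bits) and a low field into 48 bits.

    `low` is either a random number or a counter drawn from the machine's
    own pre-assigned range. The timestamp is truncated by masking, so IDs
    stop sorting by time once it wraps around.
    """
    low_bits = 48 - time_bits
    if not 0 <= low < (1 << low_bits):
        raise ValueError("low field does not fit")
    return ((timestamp_ticks & ((1 << time_bits) - 1)) << low_bits) | low

def capacity(time_bits: int, ticks_per_second: int) -> tuple:
    """(years until the timestamp wraps, IDs per second shared by all machines)."""
    years = (1 << time_bits) / ticks_per_second / SECONDS_PER_YEAR
    ids_per_second = (1 << (48 - time_bits)) * ticks_per_second
    return years, ids_per_second
```

For example, `capacity(32, 1)` (a 32-bit seconds field) gives about 136 years and 65,536 IDs per second. `capacity(40, 1000)` (a 40-bit millisecond field) gives about 35 years and 256,000 IDs per second. Past that budget you need more bits.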

