
2^120 Ways to Ensure Unique Identifiers - tlarkworthy
https://www.firebase.com/blog/2015-02-11-firebase-unique-identifiers.html
======
CRidge
Perhaps I'm missing something here, but where's the guarantee that these ID's
are unique? As far as I can tell, an ID is made up of a timestamp and a random
number, generated by the JavaScript enginge - and typically seeded with the
current time.

Wouldn't that give relatively good chances that for a system with many users,
two users would actually generate the same ID?

~~~
bmm6o
Best-case scenario (for them) is that the random data is "truly" random. 72
bits of the ID are sourced from Math.random(), which means you need to start
worrying about collisions when clients attempt to generate 2^36 events with
the same timestamp. That's about 70 billion, so it's not something that would
keep me up at night.

What's harder to quantify is the role that sub-optimal random number
generators would play in this. Implementation of Math.random is left to the
browser, but you could imagine a simple LCG with just a 64-bit long as its
state, initialized with the current time. Having only 64 bits of state is the
limiting factor here, both for properly filling out the 72 bits of entropy
(you can't) and for 2 users to have the same internal state in their LCG.

------
antics
Only-kind-of-related: Instagram has a cool post about using ordered IDs to
shard photos. It's here: [http://instagram-
engineering.tumblr.com/post/10853187575/sha...](http://instagram-
engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram)

Pretty cool stuff.

------
xwintermutex
I don't understand this sentence: "One caveat to the randomness is that in
order to preserve chronological ordering if a client creates multiple push IDs
in the same millisecond, we just ‘increment’ the random bits by one."

~~~
DanielStraight
The ID consists of a timestamp to millisecond precision and a random value.
Since the timestamp only offers millisecond precision, two IDs generated in
the same millisecond could be out of order depending on the random value
because the timestamps will be the same. In order to prevent that, if two IDs
are generated in the same millisecond, instead of generating a new random
value, they take the random value from last time and add 1. This ensures that
even though the timestamp of the second ID is the same as the first one, it
will sort after the first one.

~~~
xwintermutex
I see, thanks!

------
emmab
The article would have benefited from explaining in what situations you might
prefer this to a UUID

~~~
pstrateman
The only reason to use this over a UUID is if you want the primary key to be
implicitly chronologically ordered.

The ordering guarantee is weak though, it's provided by clients all
potentially with their own concept of time.

~~~
rosser
Yeah, you absolutely can't trust client-generated timestamps. We have a table
tracking metadata on customer-uploaded photos (dimensions, file size, color
profile, _& c_), including the camera's timestamp. According to their cameras,
our customers have uploaded pics taken before the Buddha sat beneath the Bodhi
tree, and after the Khitomer Accord was signed.

Fortunately, we also store an upload timestamp, and our servers all sync to
local NTP.

------
zaroth
> "This guarantees chronological ordering, since the ServerValue.TIMESTAMP
> will be resolved on the Firebase server to be an accurate timestamp."

To be sure, this can't possibly guarantee chronological ordering. But it can
get you fairly close and perhaps almost predictable chronological ordering...

If at any level there is a clock which can skew and is being adjusted with a
stochastic process, there is jitter, and you can get out of order. The
solution is simply to apply corrective factors in a continuous manor -- slow
down or speed up time. That's what the client adjustment should have been
doing all along, and then you would never need the ServerValue.TIMESTAMP hack.

If Firebase is relying on a ntp utility, I'm not sure but I would assume that
they are smart enough to slow or speedup time and not jerk to a new setting.
Or at least they would have a flag to enable that functionality.

Edit: Distributed auto-incrementing counter is not that hard. A field for
time, a field for ID. I don't see a reason to conflate the two. I guess it
makes object instantiating easier, since you don't have to fetch an ID in any
sense. 64 vs 128 bits is not a big difference. So in the end it's a good
design decision and is a net positive for them.

~~~
x3c
Could you please elaborate on time speed up or slow down? Or if you could
refer me to some reading material? It's an interesting problem and I'd love to
deep dive into it.

~~~
zaroth
[http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ALGO-CLK-
UPD](http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ALGO-CLK-UPD)

------
ademarre
This looks similar to the method employed here:

[https://eager.io/blog/how-long-does-an-id-need-to-
be/](https://eager.io/blog/how-long-does-an-id-need-to-be/)

(HN discussion:
[https://news.ycombinator.com/item?id=8539866](https://news.ycombinator.com/item?id=8539866))

------
obstinate
How do you ensure that all your server times are perfectly synched? Or is
there just one server? This seems somewhat doubtful.

What happens when I push after a negative leap second?

What happens if the system clock is randomly skewy (see the relevant Rachel by
the Bay article).

~~~
dkubb
You could side-step leap second issues by using a TAI based timestamp.

------
SeoxyS
This has always been my favorite way to generate IDs, though I typically do it
with a 64bit integer and a custom epoch.

Being able to store keys as 64bit integers is much easier to work with, and
also much more efficient than b64 encoding. In the odd case you're working
with a language that can't handle numbers (ahem… javascript), you can always
encode it as a string.

Here's a library that implements this method:
[https://github.com/Fubelly/rubyflake](https://github.com/Fubelly/rubyflake)

~~~
jimktrains2
I love snowflakes:)

I wrote a little rust implementation[1] also

[1]
[https://github.com/jimktrains/rustflake](https://github.com/jimktrains/rustflake)

------
nodesocket
We generate ids using the following:

    
    
       $id = Functions::generate_random(5) . uniqid() . Functions::generate_random(3);
    

`uniqid()` ensures each id is unique (somewhat), and the random paddings of 5
and 3 characters make the ids harder to guess and unpredictable.

~~~
stormbrew
This uniqid() [1]? That appears to just be a microsecond timestamp. Definitely
not globally unique by any stretch.

[1]
[http://php.net/manual/en/function.uniqid.php](http://php.net/manual/en/function.uniqid.php)

~~~
ars
Yes, if you need unique, and generated on the server (not remotely) then also
include getmypid() and the IP address (or name) of the webserver (if you have
more than one).

Then use the option for more entropy in case the function is called more than
one in a milliseconds.

So in total:

    
    
        uniqid(gethostname().getmypid(), true);
    

Will be unique if you give sane names to your machines.

------
intortus
This looks like a longer, simpler form of twitter's 64-bit snowflake.

