
A Brief History of the UUID - mrbbk
https://segment.com/blog/a-brief-history-of-the-uuid/
======
tytso
I can actually fill in some of the details about the history of UUID's. Paul
Leach was an architect who worked at Apollo, OSF, and later Microsoft. I met
Paul in the mid-90's when he was an architect at OSF, and I was the tech lead
for Kerberos v5 development at MIT. OSF DCE was going to use Kerberos for
authentication, and was going to use Apollo RPC as its RPC layer.

It was from talking to Paul that I learned about UUID's, and I added libuuid
into e2fsprogs 1.05, released September 7, 1996. UUID's were used in Linux in
the ext2 superblock, and later on, GNOME picked it up and used it extensively,
which meant among other things that if you wanted to run GNOME on FreeBSD or
NetBSD or Solaris, you had to compile e2fsprogs to get libuuid. :-)

Later on Paul went on to Microsoft, and I'm fairly certain that it was due to
Paul that Microsoft adopted the OSF DCE RPC layer for its internal use, and
UUID's started being used extensively inside Microsoft. UUID's also got used
in Intel's EFI specification for the GPT partition table, although somewhere
along the way they got renamed "Globally Unique ID's" --- it's the same spec,
though.

While Paul was at Microsoft, the specs for UUID's finally got standardized by
the IETF as RFC 4122, so you no longer needed to find dated copies of the
OSF DCE specification (or download e2fsprogs since I had an early version of
the UUID Internet Draft in the sources long before it finally squirted out the
other end of the RFC publication pipeline).

As far as uuidd is concerned, the reason why it exists is because a certain
very large Enterprise Resource Planning system was using libuuid to generate
uuid's for its objects, and it needed to create them very, very quickly so
they could initialize the customer's ERP database in finite time. They were also
using the time-based UUID's, with the UUID stored in the database with the
bytes cleverly rearranged so the time bits would be stored in the LSB, and the
Ethernet MAC address would be in the MSB, so that a database using a B-tree
(plus prefix key compression) for its indexing would be able to very
efficiently index the UUID's. This is similar to the k-ordering trick that Flake
was using, but this very large enterprise planning company was doing it in 2007,
five years before the team at Boundary came up with Flake, and they were doing it
using standard UUID's, but simply storing the Time-based UUID bytes in a
different order. (I believe they were also simply storing the ID in binary
form, instead of base-62 encoding, since if you're going to have jillions of
objects in your ERP database, you want them to be as efficient as possible.)
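
To make the byte-rearranging idea concrete, here is a minimal Python sketch. The exact layout the ERP vendor used isn't given above, so the field order here (MAC first, then clock sequence, then the timestamp with its most significant bits first) is an assumption, and `btree_key` is a made-up name:

```python
import uuid


def btree_key(u: uuid.UUID) -> bytes:
    """Reorder a time-based UUID so the MAC leads and the timestamp trails.

    Assumed layout (16 bytes): node (6) | clock_seq (2) | timestamp (8).
    The fixed version/variant bits are dropped; they can be re-added on read.
    """
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    node = u.node.to_bytes(6, "big")            # 48-bit Ethernet MAC address
    clock_seq = u.clock_seq.to_bytes(2, "big")  # 14-bit clock sequence
    timestamp = u.time.to_bytes(8, "big")       # 60-bit count of 100 ns ticks
    return node + clock_seq + timestamp


first = uuid.uuid1(node=0x001122334455, clock_seq=1)
second = uuid.uuid1(node=0x001122334455, clock_seq=1)
print(btree_key(first).hex())
print(btree_key(second).hex())
```

Keys generated on one host then share a long common prefix (good for prefix compression) and sort in generation order, so new rows land at the right-hand edge of that host's range in the index.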

Anyway, a certain Linux distribution company contacted me on behalf of this
very large Enterprise Resource Planning company, and we came up with a scheme
where the uuidd daemon could issue blocks of time-based UUID's to clients, so
we could amortize the UUID generation over blocks of 50 or 100 UUID's at a
time. (This ERP was generating a _huge_ number of UUID's.) I did it as a
freebie, because I was tickled pink that libuuid was such a critical part of a
large ERP system, and it wasn't that hard to implement the uuidd extension to
libuuid.
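
The block-issuing scheme can be sketched roughly as follows. This is not uuidd's actual code or protocol; `BlockAllocator` and its methods are hypothetical names, and the sketch only illustrates the idea of reserving a run of consecutive 100 ns ticks under one lock acquisition and handing the whole run to a client:

```python
import threading
import time
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


class BlockAllocator:
    """Hypothetical allocator that hands out blocks of time-based UUIDs."""

    def __init__(self, node: int, clock_seq: int) -> None:
        self._node = node & 0xFFFFFFFFFFFF
        self._clock_seq = clock_seq & 0x3FFF
        self._next_tick = 0
        self._lock = threading.Lock()

    def _reserve(self, count: int) -> int:
        """Reserve `count` consecutive ticks and return the first one."""
        now = time.time_ns() // 100 + UUID_EPOCH_OFFSET
        with self._lock:
            first = max(now, self._next_tick)  # never reissue an old tick
            self._next_tick = first + count
        return first

    def _build(self, tick: int) -> uuid.UUID:
        """Pack one tick into the version 1 field layout."""
        fields = (
            tick & 0xFFFFFFFF,                 # time_low
            (tick >> 32) & 0xFFFF,             # time_mid
            ((tick >> 48) & 0x0FFF) | 0x1000,  # time_hi plus version 1
            (self._clock_seq >> 8) | 0x80,     # clock_seq_hi plus variant
            self._clock_seq & 0xFF,            # clock_seq_low
            self._node,
        )
        return uuid.UUID(fields=fields)

    def block(self, count: int) -> list[uuid.UUID]:
        first = self._reserve(count)
        return [self._build(first + i) for i in range(count)]


allocator = BlockAllocator(node=uuid.getnode(), clock_seq=0x1234)
ids = allocator.block(100)
print(ids[0], ids[-1])
```

The cost of the lock (or, for a daemon, the round trip over a socket) is paid once per block instead of once per UUID. A real implementation would also have to persist the clock sequence and cope with the clock stepping backwards, which this sketch ignores.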

~~~
cek
I can confirm Paul helped drive MS's adoption of OSF DCE RPC.

Once he convinced me of the uniqueness of correctly generated UUIDs I coined
the phrase "the likelihood of a UUID collision is the same as an avocado
spontaneously turning into a grapefruit."

A fun tidbit: At one point I was the maintainer of the list of static UUIDs
with the Microsoft bit set. It was a flat text file checked into the Windows
source. I reserved a chunk of them for my own projects because having all
those zeros was useful in debugging. E.g.
"00000000-0000-0000-c000-000000000046" (the c000 indicates MS reserved).

In '97 I wrote the internet-draft [1] that Paul & Rich Salz finally turned
into RFC 4122 in 2005 [2].

[1] [https://www.ietf.org/archive/id/draft-kindel-uuid-uri-00.txt](https://www.ietf.org/archive/id/draft-kindel-uuid-uri-00.txt)
[2] [https://datatracker.ietf.org/doc/rfc4122/](https://datatracker.ietf.org/doc/rfc4122/)

~~~
chx
I am not sure how other people feel but despite me using the Internet for a
good 24 years now I am still astonished at the connection possibilities it
gives to us little people. Here I am, a man of little acclaim, communicating
with people who wrote e2fsprogs and the uuid RFC and whatnot. This is still a
wonder and I doubt it'll ever become an everyday feeling.

------
Animats
In the early days of network interfaces, unique IDs were a problem. It was once
suggested that each network interface have a $1 bill attached, with the serial
number of the bill being the adapter ID.

This was a real problem in the early days of low-cost Ethernet controllers.
Some manufacturers didn't buy their own address space [1], but reused that of
some major vendor (usually 3Com). This resulted in occasional real-world
duplicates.

[1] [https://regauth.standards.ieee.org/standards-ra-web/pub/view...](https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries)

~~~
X-Istence
Somewhere I have two network cards with the same MAC address burned into them.

That was a lot of fun to figure out when I was younger and things started
going haywire!

~~~
tonyarkles
Just like the sibling post, I had the same thing happen on cheap NE2K nics.
Imagine my surprise when pinging one machine and getting two replies back!

Edit: it was on a hub too, so nothing kept the two machines from seeing the
packet.

------
skrebbel
I really like the ideas behind ksuid (near the end of the article). However,
two quotes:

> _Those concerned with UUID collision in a properly-configured system would
> find their time better spent pondering far more probable events like solar
> flares, thermonuclear war, and alien invasion on their systems._

And then further down:

> _A “custom” epoch is used that ensures >100 years of useful life._

Wait, so the last 128 bits of a KSUID won't get me in trouble before the sun
explodes, but the first 32 bits (the timestamp) will cause trouble well before
my grandkids die?

I really wonder why they didn't reserve some more bits for the timestamp, if
necessary at the cost of some less randomness. Could've made this stuff last
for millennia at no extra collision risk, in practice.

~~~
stock_toaster
Yeah. Not sure why they didn't just go straight for 64bit timestamps (maybe
10^-8 second granularity ~= 5.8k years) and 64bits of random data.
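
The lifetimes being compared are easy to check. A quick Python sketch, assuming an unsigned counter that starts at zero and an average Gregorian year (`lifetime_years` is just an illustrative helper):

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # average Gregorian year


def lifetime_years(bits: int, tick_seconds: float) -> float:
    """Years until an unsigned `bits`-bit counter of fixed-size ticks wraps."""
    return (2 ** bits) * tick_seconds / SECONDS_PER_YEAR


# 32-bit timestamp with 1-second resolution, as in KSUID
print(f"32 bits @ 1 s:   {lifetime_years(32, 1.0):,.0f} years")
# 64-bit timestamp with 10^-8 second resolution, as suggested here
print(f"64 bits @ 10 ns: {lifetime_years(64, 1e-8):,.0f} years")
```

The 32-bit case comes out at a little over 136 years, and the 64-bit case lands in the 5.8k-year range quoted above.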

I also wonder if base58 would have been a bit nicer. base62 is of course
slightly more compact, but base58 is nice that it reduces visual character
ambiguity.

~~~
tytso
The original UUID's (from Apollo, as used in OSF DCE, e2fsprogs, Microsoft,
etc.) used a 64-bit timestamp, with 100ns granularity. It uses as the start of
its Epoch the beginning of the Gregorian Calendar (00:00:00.00, 15 October
1582), so it's good up to 3400 A.D. before it hits the 2038 problem. :-)
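
For anyone who wants to poke at that timestamp, here is a small Python sketch that decodes it. One caveat: RFC 4122 defines the timestamp field as 60 bits (the remaining bits of the 64 hold the version), so the sketch computes the wrap year for both 60 and 59 bits. `uuid1_datetime` and `wrap_year` are illustrative names, not part of any library:

```python
import datetime
import uuid

# Start of the UUID epoch: adoption of the Gregorian calendar.
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def uuid1_datetime(u: uuid.UUID) -> datetime.datetime:
    """Convert the 100 ns tick count of a version 1 UUID to a UTC datetime."""
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    # datetime only resolves microseconds, so drop the last decimal digit.
    return GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)


def wrap_year(bits: int) -> float:
    """Approximate year in which a `bits`-bit count of 100 ns ticks wraps."""
    return 1582 + (2 ** bits) * 100e-9 / SECONDS_PER_YEAR


print(uuid1_datetime(uuid.uuid1()))
print(f"60 bits wrap around the year {wrap_year(60):.0f}")
print(f"59 bits wrap around the year {wrap_year(59):.0f}")
```

Read as an unsigned 60-bit count the field lasts past the year 5200; the "3400 A.D." figure corresponds to 59 usable bits, i.e. treating the top bit as a sign bit.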

Note that UUID's, like IPv6 addresses, are sufficiently long that if users are
needing to interact with them directly, You're Doing Something Wrong. So the
whole base-62 versus using hyphens versus base58 discussion misses the point,
in my view. Computers will generally be exchanging them in 128-bit binary
format, and they should only really be dumped out for debugging reasons.

~~~
stock_toaster

      > Note that UUID's, like IPv6 addresses, are sufficiently 
      > long that if users are needing to interact with them 
      > directly, You're Doing Something Wrong
    

One would think so, but you would be surprised how many times someone has
typed out to me (or copy/pasted) a UUID for debugging purposes, or how often I
have had to eyeball UUIDs in production logs.

~~~
kijin
We eyeball, compare and copy/paste git commit IDs all the time, and they're
longer than UUIDs. I don't see any problem with it as long as there's good
tooling to help us interact with these IDs (e.g. git allows us to use the
first few characters of a commit ID if there's no collision).

------
mappu
There's also "ULID" in this neo-UUID space:
[https://github.com/alizain/ulid](https://github.com/alizain/ulid)

48-bit timestamp plus 80 bits randomness, base32 encoding (no hyphens), and
lexicographic sort order.
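
A rough Python sketch of that layout, for illustration only. It is not the linked library's code: `ulid_like` is a made-up name, and the Crockford base32 alphabet is my assumption about which base32 variant is meant.

```python
import os
import time
from typing import Optional

# Crockford base32: no I, L, O or U, and the characters are in ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_like(ms: Optional[int] = None, randomness: Optional[bytes] = None) -> str:
    """Encode a 48-bit millisecond timestamp plus 80 random bits as 26 chars."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)
    if not 0 <= ms < 2 ** 48:
        raise ValueError("timestamp does not fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("expected exactly 80 bits (10 bytes) of randomness")
    value = (ms << 80) | int.from_bytes(randomness, "big")
    # 26 groups of 5 bits cover 130 bits; the top group holds only 3 real bits.
    return "".join(CROCKFORD[(value >> shift) & 0x1F] for shift in range(125, -1, -5))


print(ulid_like())
```

Because the output is fixed-width and the alphabet is in ASCII order, plain string comparison sorts these by timestamp first, which is where the lexicographic sort order comes from.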

~~~
stock_toaster
That does seem pretty nice. Thanks for the link.

------
MichaelGG
Why would you ever want to use UUID format, which only has 122 bits, versus
just making a random 128 bit number? In which realistic scenario would simply
reading 16 bytes from urandom not be fine and actually cause issues that
removing 6 of those bits to identify the UUID type helps?

Also, 32 bit timestamp + 128 random? I guess, but that sounds sort of
overkill-ish - if you're going to go to 20 bytes (and thus not fit in a DB's
UUID type, require more than 2 registers, etc.), why not make it 24 or 32
bytes and have a proper timestamp? Or if 32-bit timestamp is really
acceptable, are you sure that 96-bits of randomness are not?
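
For reference, the 6 bits in question are the 4-bit version field and the 2-bit RFC 4122 variant field. A short Python sketch using the standard `uuid` module shows exactly which bits get overwritten when 16 random bytes are turned into a version 4 UUID (`FIXED_MASK` and `to_uuid4` are names made up for this example):

```python
import os
import uuid

# Bit positions within the 128-bit integer: version nibble and variant bits.
FIXED_MASK = (0xF << 76) | (0x3 << 62)


def to_uuid4(raw: bytes) -> uuid.UUID:
    """Stamp the version 4 and RFC 4122 variant bits onto 16 random bytes."""
    return uuid.UUID(bytes=raw, version=4)


raw = os.urandom(16)
u = to_uuid4(raw)
print(raw.hex())
print(u.hex)
print("fixed bits:", bin(FIXED_MASK).count("1"))
```

Everything outside the mask is carried over unchanged, so the two hex strings differ at most in the 13th digit (version) and the 17th digit (variant).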

~~~
013a
One of the things I've found strange about UUIDs is their serialization to hex
when displayed as a string, yet I've seen real life projects with little
technical debt store them as a string in a database. This is obviously the
fault of the programmer, but you have to look at that and think if you're
serializing to a string (whether that be in a database or over the network),
there are so many better options.

~~~
Ciantic
There will be a lot of these. Microsoft's ASP.NET Core Identity by default
stores user id as a GUID string to the database:

[https://github.com/aspnet/Identity/blob/dev/src/Microsoft.Ex...](https://github.com/aspnet/Identity/blob/dev/src/Microsoft.Extensions.Identity.Stores/IdentityUser.cs#L12)

------
wolfgang42
Vaguely related story: I inherited a system which generates SSCCs to identify
each box being shipped from a warehouse. These are supposed to be globally
unique, and are generated from a company's GS1 number (the same used for UPCs)
plus another number which the company is supposed to make sure is unique. This
particular system generated them on the client-side, based on the current
timestamp in microseconds, with a random pause to prevent two computers from
generating identical runs if they both printed labels at the same time. In a
fairly small warehouse, this generated collisions about every six months or
so. I instead changed the program to use an (already existing!) auto-increment
column on the shared database, thus precluding any possibility of collision
and making the program a lot faster since it was no longer delaying for a
quarter-second per label (on shipments of 200+ boxes).

------
mnarayan01
Having 32 bits of 1-second resolution time and 128 bits of random payload
makes the idea that these are "semi-sortable" a bit odd. Consider:

1. Let's be super-lenient and say that we'll consider an _average_ size
bucket of up to 64k (2^16) equivalent entries to be "semi-sortable".

2. If you generate any more than 2^48 (2^32 * 2^16) IDs over the full 100ish
year lifetime of the ID, then you're giving up on even that super-lenient
definition of "semi-sortable".

3. If you're only ever going to generate 2^48 IDs, then 128 bits of random
payload (in addition to the 32 bits of timestamp!) seems like absurd overkill.
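
To put numbers on the "overkill" claim, here is a rough Python sketch using the standard birthday approximation. It assumes uniformly random payloads and that only IDs sharing the same one-second timestamp can collide; `collision_probability` is an illustrative helper:

```python
import math

TIMESTAMP_BITS = 32  # one-second buckets
BUCKET_BITS = 16     # the "super-lenient" 2^16 IDs per bucket

max_ids = 2 ** (TIMESTAMP_BITS + BUCKET_BITS)


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that any two of n random values match."""
    return -math.expm1(-(n * (n - 1) / 2) / 2 ** bits)


per_bucket = 2 ** BUCKET_BITS
print(f"total IDs over the lifetime: 2^{max_ids.bit_length() - 1}")
print(f"128-bit payload: {collision_probability(per_bucket, 128):.2e} per bucket")
print(f" 64-bit payload: {collision_probability(per_bucket, 64):.2e} per bucket")
```

With a full bucket of 2^16 IDs, a 128-bit payload gives a per-second collision chance of roughly 2^-97, and even a 64-bit payload stays around 2^-33.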

Given the amount of thought that obviously went into this, I'm guessing that
there's probably a good reason that they decided to go with 32 bit timestamps
(I can certainly think of many, SHA1 length assumptions being a likely
component), but if it's in the article, I missed it.

------
smaili
It's a little buried, but here's the primary tldr of their KSUID library:

> Thus KSUID was born. KSUID is an abbreviation for K-Sortable Unique
> IDentifier. It combines the simplicity and security of UUID Version 4 with
> the lexicographic k-ordering properties of Flake. KSUID makes some trade-
> offs to achieve these goals, but we believe these to be reasonable for both
> our use cases and many others out there.
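
As a rough illustration of the layout discussed in this thread (32-bit timestamp, 128-bit random payload, base62 text), here is a Python sketch. It is not Segment's implementation: `ksuid_like` is a made-up name, and both the epoch offset of 1400000000 and the `0-9A-Za-z` alphabet order are my assumptions about the library's choices.

```python
import os
import time
from typing import Optional

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
EPOCH_OFFSET = 1_400_000_000  # assumed "custom" epoch, in Unix seconds


def ksuid_like(unix_seconds: Optional[int] = None, payload: Optional[bytes] = None) -> str:
    """Encode a 32-bit timestamp plus a 128-bit payload as 27 base62 chars."""
    if unix_seconds is None:
        unix_seconds = int(time.time())
    if payload is None:
        payload = os.urandom(16)
    timestamp = unix_seconds - EPOCH_OFFSET
    if not 0 <= timestamp < 2 ** 32:
        raise ValueError("timestamp does not fit in 32 bits")
    if len(payload) != 16:
        raise ValueError("expected exactly 128 bits (16 bytes) of payload")
    value = int.from_bytes(timestamp.to_bytes(4, "big") + payload, "big")
    digits = []
    for _ in range(27):  # 62^27 > 2^160, so 27 digits always suffice
        value, digit = divmod(value, 62)
        digits.append(BASE62[digit])
    return "".join(reversed(digits))


print(ksuid_like())
```

Putting the timestamp in the most significant bytes and zero-padding to a fixed width is what makes plain string comparison sort these roughly by creation time.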

------
cat199
Related tangent:

Anyone have any Domain/OS stories or resources they want to share?

This system always seemed like an interesting one, but details are fairly
scarce..

------
Valodim
> However, on a mobile device, almost anything goes: mobile devices cannot be
> trusted. While most of these are just as good as what’s available in the
> scenario above, it’s routine that the PRNG source on these devices isn’t
> very random at all. Given that there’s no way to certify the quality of
> these, it’s a big gamble to bet on mobile PRNGs. ID generation on low-trust
> mobile devices is an interesting and active area of academic research[1].

This is true for highly specialized systems like sensor nodes maybe. For what
is generally understood as a "mobile device", i.e. mobile phones or tablets,
it is bollocks.

------
odbol_
> It borrows core ideas from the ubiquitous UUID standard, adding time-based
> ordering.

Isn't time-based ordering bad, since it might allow hackers to predict UUID
generation and use it to compromise security systems based on UUID?

------
gumby
The phone number was hardly the "first unique identifier in a network" and
switchboards worked just fine before phone numbers; phone numbers were first
added in the 1890s because of the Strowger switch.

The first UUIDs in networks were probably titles (nobility or job titles in a
byzantine empire like China, Russia or, less, the Ottoman Empire). "Chief
Assistant to the Assistant Chief of Shipbuilding" is a unique node identifier
(doesn't identify a person, but then again phone numbers are reused too).

------
rphlx
/dev/urandom is a major liability on any machine with low uptime booted for
the first time from a widely-used image (i.e. a VPS), and on embedded systems
which have few sources of entropy & do not (or cannot) save/restore it across
boots. As a result there can be a much higher than pure-random chance of a
collision in the "random" portion of a UUID.

------
danielbankhead
Created something similar called "bronze". It tackles the problem of creating
unique identifiers at a slightly different angle, while allowing high
collision resistance:

[https://github.com/AltusAero/bronze](https://github.com/AltusAero/bronze)

------
tuupola
Since I am a big fan of everything Base62 I started to work on a PHP
implementation of KSUID.

[https://github.com/tuupola/ksuid](https://github.com/tuupola/ksuid)

------
gnu8
Is 996238e1-28d1-4b53-b81b-beae25f8edde working for anyone?

~~~
cpeterso
works for me.

[https://google.com/search?q=996238e1-28d1-4b53-b81b-beae25f8...](https://google.com/search?q=996238e1-28d1-4b53-b81b-beae25f8edde)

------
foreigner
At first I read this as "A Brief History of the IUD" - that's not the same
thing at all :-)

