Hacker News new | past | comments | ask | show | jobs | submit login

I can actually fill in some of the details about the history of UUID's. Paul Leach was an architect who worked at Apollo, OSF, and later Microsoft. I met Paul in the mid-90's when he was an architect at OSF, and I was the tech load for Kerberos v5 development at MIT. OSF DCE was going to use Kerberos for authentication, and was going to use Apollo RFC as its RPC layer.

It was from talking to Paul that I learned about UUID's, and I added libuuid into e2fsprogs 1.05, released September 7, 1996. UUID's were used in Linux in the ext2 superblock, and later on, GNOME picked it up and used it extensively, which meant among other things that if you wanted to run GNOME on FreeBSD or NetBSD or Solaris, you had to compile e2fsprogs to get libuuid. :-)

Later on Paul went on to Microsoft, and I'm fairly certain that it was due to Paul that Microsoft adopted the OSF DCE RPC layer for its internal use, and UUID's started being used extensively inside Microsoft. UUID's also got used in Intel's EFI specification for the GPT partition table, although somewhere along the way they got renamed "Globally Unique ID's" --- it's the same spec, though.

While Paul was at Microsoft, the specs for UUID's finally got standardized by the IETF as RFC 4122, so you no longer needed to get find dated copies of the OSF DCE specification (or download e2fsprogs since I had an early version of the UUID Internet Draft in the sources long before it finally squirted out the other end of the RFC publication pipeline).

As far as uuidd is concerned, the reason why it exists is because a certain very large Enterprise Resource Planning system was using libuuid to generate uuid's for its objects, and it needed to create them very, very quickly so they can initalize the customer's ERP database in finite time. They were also using the time-based UUID's, with the UUID stored in the database with the bytes cleverly rearranged so the time bits would be stored in the LSB, and the Ethernet MAC address would be in the MSB, so that a database using a B-tree (plus prefix key compression) for its indexing would be able to very efficiently index the UUID's. This is similar to k-ordering trick that Flake was using, but this very large enterprise planning company was doing in 2007, five years before team at Boundary came up with Flake, and they were doing it using standard UUID's, but simply storing the Time-based UUID bytes in a different order. (I believe they were also simply storing the ID in binary form, instead of base-62 encoding, since if you're going to have jillions of objects in your ERP database, you want them to be as efficient as possible.)

Anyway, a certain Linux distribution company contacted me on behalf of this very large Enterprise Resource Planning company, and we came up with a scheme where the uuidd daemon could issue blocks of time-based UUID's to clients, so we could amortize the UUID generation over blocks of 50 or 100 UUID's at a time. (This ERP was generating a huge number of UUID's.) I did it as a freebie, because I was tickled pick that libuuid was such a critical part of a large ERP system, and it wasn't that hard to implement the uuidd extension to libuuid.




I can confirm Paul helped drive MS's adoption of OSF DCE RPC.

Once he convinced me of the uniqueness of correctly generated UUIDs I coined the phrase "the likelihood of a UUID collision is the same as an avocado spontaneously turning into a grapefruit."

A fun tid-bit: At one point I was the maintainer of the list of static UUIDs with the Microsoft bit set. It was a flat text file checked into the windows source. I reserved a chunk of them for my own projects because having all those zeros was useful in debugging. E.g. "00000000-0000-0000-c000-000000000046" (the c000 indicates MS reserved).

In '97 I wrote the internet-draft [1] that Paul & Rich Salz finally turned into RFC 4122 in 2005 [2].

[1] https://www.ietf.org/archive/id/draft-kindel-uuid-uri-00.txt [2] https://datatracker.ietf.org/doc/rfc4122/


I am not sure how other people feel but despite me using the Internet for a good 24 years now I am still astonished at the connection possibilities it gives to us little people. Here I am, a man of little acclaim and communicate with people who wrote e2fsprogs and the uuid RFC and whatnot. This is still a wonder and I doubt it'll ever become an everyday feeling.


If a UUIDv4 is used, then yes, I'm convinced it will be random to within the amount of entropy the system provides.

If another UUID version is used, I'm extremely wary. MAC addresses are explicitly not unique. Worse, there are plenty of bargain mfgs that make cards all with the same MAC. There's a story of a local college that ordered several hundred cards for workstations and discovered every single one had the exact same hardware MAC. Large orgs will spot this easily, but there could be hundreds or thousands of individuals that never tried connecting the same bad cards on the same network segment.

In practice, you can usually depend on it to be truly unique. But there are several failure modes that you absolutely need to be aware of if you plan on using this in production.


Paul is definitely a very smart guy and something of a programming hero of mine. I did have the good fortune to have dinner with him once, although I think I was too shy to really ask him anything.

The DCE library “surprisingly distributed by Apple” is used by their CIFS server and NetLogon daemon. From memory, it originally came from the OSF DCE RPC source code drop, which I then slightly modernised for use in XAD (the first Active Directory-compatible domain controller replacement, which became Domain Services for Windows at Novell), and then found its way to Apple via Likewise.


Even in 2007 the system csrng should be able to generate many millions of uuids/sec on one core. (Say, 10M/sec easily?) At that point isn't it just a matter of keeping a timestamp updated and swizzling it in?

How many millions/sec did they need?


The issue is that they had multiple threads generating objects and they wanted to use unique time-based UUID's because they wanted to efficiently fill a btree-indexed database. So if the Most significant bytes are constant, then all of the new objects will land in the same leaf node, and you can use certain hueristics such as header compression and assuming that you should fill a leaf node before adding a new node (instead of splitting the node equally) because you know that you are inserting keys into the b-tree in monotonically increasing sort order.

If you use random number generator to generate random UUID's, then you will be inserting the objects all over the b-tree and once the tree is too big to fit into memory, performance goes down the toilet.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: