
Ask HN: Some questions on transitioning to UUID in a large ecosystem - rottyguy
We are a large company with many small systems that use disparate key sets. We would like to standardize them and use UUIDs for the new global key set. I have several questions regarding UUIDs, if anyone can help with the answers.

1) Are there any issues co-mingling v4 & v5 UUIDs in a single system? We would use v5 to transition the legacy systems to UUID and v4 for generating anything outside the legacy systems (new systems).

2) Per https://tools.ietf.org/html/rfc4122, "A UUID is 128 bits long, and requires no central registration process." This implies anyone in our ecosystem can generate a UUID and probably never collide. However, are there minimum restrictions on the machine that would generate these UUIDs? For instance, I would imagine a dependable clock (e.g. one that doesn't reset to 1/1/1970 every time it restarts) is necessary, but do they all have to be in sync or is some measure of skew acceptable? Anything else?

3) Is there a list of reliable (future-proof) UUID implementations we can use to cover all the major languages, or are the standard libraries sufficient for v4/v5 UUID generation? We have a mix of various flavors of *nix and Windows in our ecosystem.

Thank you!
======
gravypod
Would it be possible to avoid standardizing, and coupling, to a single ID
type? It should be possible to locally store IDs as strings so the consumer of
the IDs doesn't need to know how to parse it and the provider of the ID (the
service) can choose whatever is natural.

For a content addressable file system it would be a hash, for another thing it
might be an int, for another thing it might be a UUID, etc.

If you need to "validate" the ID is correct then the only way to do that is to
contact the Source of Truth. Checking the syntax of the ID doesn't tell you if
it is valid. That will introduce use-after-free like conditions in your
system.

------
moksly
In Denmark we have a “newish” national standard for public service
architecture called rammearkitekturen. It’s an attempt to make IT easier and
cheaper in a country with the most digitised public sector in the world, where
98 municipalities each have 300 different IT systems on average.

Part of it includes a transition to UUIDs, and the way we deal with them is
slowly and in stages. Some systems, especially new ones, are built to use them
as the standard identification, but some of our systems are 50 years old, run
on mainframes, tandem computers and what not, often with a range of APIs on
top of them. Others were designed with local non-standard UUIDs that would
work if the systems hadn’t been sold multiple times. And so on.

But the most basic way to get into them is by adding UUIDs for external use,
while the systems continue using their own ID system internally. Then
eventually replace internal IDs with the UUIDs when it becomes possible both
technically and financially.
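
A minimal sketch of that first stage, assuming a simple lookup table that sits beside the legacy system. The class name `ExternalIdMap` and the example key `"CUST-0001"` are hypothetical, and a real deployment would persist the mapping in a database rather than in memory:

```python
import uuid


class ExternalIdMap:
    """Hypothetical in-memory stand-in for a legacy-ID <-> UUID mapping table."""

    def __init__(self) -> None:
        self._by_internal: dict[str, uuid.UUID] = {}
        self._by_external: dict[uuid.UUID, str] = {}

    def external_for(self, internal_id: str) -> uuid.UUID:
        """Return the UUID published for internal_id, minting one on first use."""
        ext = self._by_internal.get(internal_id)
        if ext is None:
            ext = uuid.uuid4()
            self._by_internal[internal_id] = ext
            self._by_external[ext] = internal_id
        return ext

    def internal_for(self, external_id: uuid.UUID) -> str:
        """Translate a published UUID back to the legacy system's own key."""
        return self._by_external[external_id]


id_map = ExternalIdMap()
published = id_map.external_for("CUST-0001")  # what outside consumers see
original = id_map.internal_for(published)     # what the legacy system still uses
```

The legacy system keeps its own keys untouched; only the boundary layer knows about the UUIDs.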

This isn’t the cleanest approach and it’ll likely take a decade or two to
complete, but doing a Big Bang transition on an enterprise scale, well I
wouldn’t recommend it.

As for standards, go with the newest international standard on UUIDs for your
part of the world. We follow the EU.

~~~
rottyguy
Are they stored as strings or 16-byte blobs?

------
zzo38computer
I think there is not a problem using multiple kinds of UUIDs in the same
system; they will not interfere with each other, because the UUIDs are
necessarily different.

Another possibility is to use URIs; UUIDs are URIs too! (Put "urn:uuid:" at
front to make a UUID into a URI.)
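
As a small illustration, assuming Python's standard `uuid` module is available, the conversion in both directions looks roughly like this:

```python
import uuid

u = uuid.uuid4()

# The .urn attribute is "urn:uuid:" followed by the canonical 36-character form.
as_urn = u.urn

# The UUID constructor also accepts the URN form and strips the prefix.
parsed = uuid.UUID(as_urn)
```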

~~~
rottyguy
This is an interesting idea. If I want to personalize the URN to my company's
UUID, would it be urn:uuid:mycoid? Can it be shortened to urn:mycoid?

~~~
zzo38computer
I do not recommend doing that. Instead, you may wish to either use the plain
"urn:uuid:" format, or use a different URI scheme (such as "http") with your
company's domain name. A company does not normally have only a single UUID;
you can make up additional UUIDs as needed. (If you want to avoid collisions,
you can use version 1 UUIDs.)

------
dingosity
If you're following RFC 4122, v4 and v5 UUIDs can be differentiated by looking
at the version number field. (See section 4.1.3 of RFC4122.) So if you're
worried about collisions between the two versions, as long as you're marking
your UUIDs with the appropriate version, this shouldn't be an issue.
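
A quick way to see the version field, sketched with Python's standard `uuid` module. The URL used as the v5 name is a made-up example:

```python
import uuid

# Hypothetical legacy record name; any stable string would do.
legacy = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/legacy/orders/12345")
fresh = uuid.uuid4()

# The version is exposed as an attribute...
legacy_version = legacy.version
fresh_version = fresh.version

# ...and is also visible in the text form: it is the first hex digit of the
# third group (index 14 of the canonical 36-character string).
legacy_digit = str(legacy)[14]
fresh_digit = str(fresh)[14]
```

Because that digit always differs between the two versions, a v4 value can never be bit-for-bit equal to a v5 value.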

For name-based or "truly random" UUIDs, you don't actually use a clock. That's
only for Version 1 UUIDs. (more info in section 4.3 of RFC4122.) They
unfortunately kept the names "timestamp" and "clock sequence" for both v4 & v5
UUIDs, even though there is no time-based information in them. Section 4.3 of
RFC4122 describes how bytes (octets) from the name-based hash function are
placed in the timestamp and clock sequence fields (for v3 & v5 UUIDs) and
Section 4.4 describes how random bits are placed in the various UUID fields.

In short, you don't need to know the time to generate a v4 or v5 UUID. Having
your servers synchronize their clocks is a good idea generally though; it
helps make sense of log files and some protocols freak out if there's too much
clock skew.
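
To make the "no clock involved" point concrete, here is a sketch of the v5 construction from section 4.3 done by hand in Python and compared against the standard library. The namespace name `orders.legacy.example.com` and the key `"12345"` are hypothetical:

```python
import hashlib
import uuid

# Hypothetical namespace for one legacy system. Any fixed UUID works, as long
# as every machine that derives IDs uses the same one.
LEGACY_ORDERS_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.legacy.example.com")


def v5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """SHA-1 over namespace bytes + name, truncated to 16 bytes, then stamped."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # version=5 makes the constructor overwrite the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


by_hand = v5_by_hand(LEGACY_ORDERS_NS, "12345")
by_library = uuid.uuid5(LEGACY_ORDERS_NS, "12345")
other_key = uuid.uuid5(LEGACY_ORDERS_NS, "12346")
```

The only inputs are the namespace and the name, so two machines with wildly different clocks derive the same UUID for the same legacy key.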

Using Version 4 UUIDs _will_ require you to have a "truly random" number
generator (or something close to it.) I wrote a node.js package for generating
UUIDs and made sure it gave the user the option of using /dev/random or
/dev/urandom or some other option (pretty sure I defaulted to /dev/random.) At
the very least you should know the difference between /dev/random,
/dev/urandom and /dev/arandom. I have used /dev/urandom as a random number
source, but they were only consumed by local processes (i.e. - i didn't give
them out to external clients.) So if we learned later there was a flaw in
/dev/urandom, the effects would not be exploitable by external actors.
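
For what it's worth, a selectable random source is easy to sketch in Python. This is not the node-mug interface, just an illustration; the function name `uuid4_from` is made up, and the default of `os.urandom` is my assumption about a sensible choice:

```python
import os
import secrets
import uuid
from typing import Callable


def uuid4_from(source: Callable[[int], bytes] = os.urandom) -> uuid.UUID:
    """Build a v4 UUID from 16 bytes supplied by the given random source."""
    raw = source(16)
    if len(raw) != 16:
        raise ValueError("random source must return exactly 16 bytes")
    # version=4 makes the constructor set the version and variant bits;
    # the remaining 122 bits come straight from the source.
    return uuid.UUID(bytes=raw, version=4)


default_id = uuid4_from()                     # kernel CSPRNG via os.urandom
secrets_id = uuid4_from(secrets.token_bytes)  # stdlib secrets module instead
```

Passing a callable keeps the choice of generator in one place, so it can be swapped if a policy or audit requires a particular source.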

If you're dealing with financial or PII data, there _may_ be regulatory
requirements on random number generation. Heaven help you if you're recording
credit card numbers in there somewhere.

It's probably hard to get detailed advice without knowing the content of the
data being stored and the context of its use. But you can plan for the worst
by:

a. using UUID generation software that allows you to securely specify a
specific source for your (pseudo) random numbers.

b. understand that a UUID generator might block for an indeterminate amount of
time if you're forced to use a PRNG that waits to collect sufficient entropy.

c. here's my old UUID generating code. i don't recommend using it; it's just
too old. but it does give an example of using an interface that lets you
select the source of random numbers. it also probably doesn't work on windows:
[https://github.com/OhMeadhbh/node-mug](https://github.com/OhMeadhbh/node-mug)

~~~
Tomte
> So if we learned later there was a flaw in /dev/urandom,

Hard to imagine. On FreeBSD /dev/urandom and /dev/random are identical, on
Linux they used to be identical and are now so closely related that again it's
improbable that a flaw in one might not affect the other.

