The UX of UUIDs (unkey.dev)
216 points by jlaneve 33 days ago | 206 comments



> One way to enhance the usability of unique identifiers is by making them easily copyable.

No matter what your identifiers look like, if you want them to be easily copyable you should add `user-select: all` to the element containing them.

If you do this, all of the text will be selected automatically when you click on the element.

https://developer.mozilla.org/en-US/docs/Web/CSS/user-select


That's true, but there are a lot of places where ids live, but where I can't add `user-select: all`. For example, in terminal (logs), Studio3t (db client) etc.


I find double click to select word, triple to select line or buffer, in surprisingly many contexts. Does this work for you?


The whole point of this article is that double-click to select a word doesn't work with a UUID... though, I think this should just be fixed: I have XTerm set up where double clicking selects a word, triple clicking selects a filename (which can include a hyphen, but not a slash), quadruple clicking selects a URL or path, and quintuple clicking selects the rest of the line.


The workaround I've landed on is double-click and hold, then drag in the correct direction. Precise enough to just grab the UUID, imprecise enough to be quick and not annoying


A lot of terminals let you adjust the characters that break words. I’ve always configured my terminals and editors to treat `-` as a word character.


This is what I use, because it works almost everywhere almost exactly the same way, even when quite a lot else doesn’t have that quality. I similarly use long-press+slide for very similar behavior on iOS (although that has much more variance in behavior across apps, and sometimes even within the same app).


Adding clicks is not ideal, many people already have some troubles with double clicks (because they can't be fast enough, or it hurts), triple click is already harder.

Also more than 3 clicks starts taking a considerable amount of time.


OK, right, yeah, they might have some word-break config that would need touching up. Thanks.


Now replace the double click, desktop, interaction, with press and hold, touch screen. I don't know of any triple click interaction that maps to a simplistic touch screen interaction.


The example used is a URL. I don't believe there is an equivalent for the address bar. Plenty of other examples exist, but that one is pretty easily reachable by users.


But you shouldn't be asking users to copy UUIDs out of your URLs... or really in general at all.


Why not?

> Customer: "I'm having issues with an order"

> CS: "Can you give me the order number?"

> Customer: "Sure, it's zero two a as in apple five..."

This seems entirely reasonable considering the whole point of TFA is to move away from using UUIDs as unique identifiers for resources in a service.


I’m inferring the comment you were responding to meant

“Asking the customer to copy/paste the UUID from a URL to send to Support via an email or chat.”

Rather than asking a customer to read it out to you.


API keys are basically just UUIDs, they need copying quite often.


I like that some tools also provide a copy button next to fields, it looks like two overlapping boxes (kind of like: ⎘). If I were building a tool that exposed user copyable keys, I'd be sure to add these to my forms and fields. A brief contextual "copied!" modal that quickly fades away is a nice touch too.


Triple clicking has been around for a long-long time, no?


This is a neat trick but it will select the whole 'paragraph', which in most logs means the entire row. Triple click the following:

b214a2bb-c3c1-48fb-8272-10a2808337f3, something, ...

b214a2bb-c3c1-48fb-8272-10a2828337f3, something else...


Yep, double-click+drag is how I do it. On Linux double-click selects a chunk, and then dragging selects more chunks (fully). I double-click on the first or last chunk and go towards the other end.

Not as good as a double-click that selects the whole id, but at least it doesn't take too much time and I don't have to precisely go to one of the ends.


I did not know about the double-click+drag feature. Thanks for sharing!


I've had computers since the mouse was commercialised and I didn't know this was a thing. Amazing, thank you.


I support all extension makers who strip user-select: none from all stylesheets.


How can I do that in my comment here? Say I want the word "double-click" to be simply selectable, right in this sentence.

I'm grateful to be informed about select-all, though!


Wow. TIL. I’ve been using JavaScript for that.


There is danger in that as well, you could be copying transparent/hidden text as part of the copy, and if you paste and <enter> without reading what you've entered, it can be dangerous.

That being said, most people copy and run bash script straight off the internet, so clearly not worried about copying stuff they haven't read!

e.g. https://bun.sh

   curl -fsSL https://bun.sh/install | bash


> That being said, most people copy and run bash script straight off the internet, so clearly not worried about copying stuff they haven't read!

The most common complaint about "pipe to bash" I've seen is the possibility for the server's response to detect it's being piped to bash, and then execute malicious code. The suggested remedy is to first download the install script (and check it) then run it. -- This seems overblown to me, since if you think the server may be malicious, then downloading programs from that server also seems risky.

Criticising people for not reading bash scripts from install pages is weirder to me. -- It's possible that some software author would hide malware in the install script; but, then why wouldn't they just hide malware in the installed program itself.


> This seems overblown to me, since if you think the server may be malicious, then downloading programs from that server also seems risky.

I heed the risk with the reasoning that even a benevolent server may be compromised, and that detecting pipe to bash is a potential way for that to go unnoticed.


Thanks, I was looking this up yesterday and only found a bunch of JS that didn't work.


In the browser, fine, but what about in the terminal?


Additionally, copy buttons are very nice.


> One way to enhance the usability of unique identifiers is by making them easily copyable. This can be achieved by removing the hyphens from the UUIDs,

No! That's throwing the baby out with the bathwater! Removing all separators means rare-but-important manual tasks of transcription or comparison become terrible, since there are no clear chunks.

Instead use a different character which doesn't have the same problem, one that most software considers part of the same "word"... such as the classic underscore.

For most people, double-clicking on this 123_456_789 will select all 9 important numbers. (And maybe a trailing space, but that's a separate problem.)
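
A rough way to see the difference, as a sketch only: Python's `\w` regex class counts the underscore as a word character and the hyphen as a break, which is similar to (but not the same as) the word-boundary rules behind most double-click selection. Real behaviour varies by platform and app, and the helper name here is made up.

```python
import re

def word_chunks(text: str) -> list[str]:
    """Split text into runs of "word" characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)

hyphenated = "b214a2bb-c3c1-48fb-8272-10a2808337f3"
underscored = hyphenated.replace("-", "_")

print(word_chunks(hyphenated))   # five separate chunks
print(word_chunks(underscored))  # one chunk containing the whole id
```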


Speaking of which, why are we stuck with this terrible "trailing space selected" behaviour? It's not the case on all platforms: macOS and iOS perform fine and select only the actual word, but Windows still includes the trailing space. There are posts online complaining about this going back 15 years at this point; it's super low-hanging UX fruit.


When you hit the delete key once, the word and extra space are gone, nothing more to do.


Probably they tried to fix it in Windows, but it caused all your files to be deleted or something.

Like the coconut in TF2.


Is there an unambiguous, accepted, monosyllabic way to verbally speak the _ character?


Alas, no... however you might not need a sound if you can use tonal inflections and pauses to express the boundary instead. Particularly when chunks are short and when the receiver (or the software they're typing into) knows the format already... Although with a tech-illiterate relative you'll have bigger problems, like explaining what an underscore even looks like and where it is on their keyboard.

Obviously I can't fully express it in text here, but try to imagine this as a coworker speaking to you: "Hey, write down this IP address. It's ten, seventy, one twentyyyyyTWO, five."

They didn't actually say "period" or even "dot", but I bet you'd type 10.70.122.5 .


yes, but to confirm I'd repeat it back to them as "was that ten dot seventy dot one-twenty-two dot five?"

having a clear separator helps me say the numbers faster


I've worked in IT (support, network mgmt and development roles) for 20 years, with colleagues, customers and clients from dozens of countries.

I've never once heard anyone drop out the dots in an IP. Non technical users aren't confident enough to do anything but read it exactly as it appears (one zero dot seven zero dot...) and technical users who are generally experienced enough to know what an IP address is, know that the dots are meaningful.


Generally, yeah.

If it's something like 56.7.23.231, I'm definitely going to disambiguate it by deliberately saying each one of those three dots.

But if it's more like 192.168.0.1, I'm probably not going to bother with speaking any delimiters in conversation with another person who has at least reasonable familiarity with common IP networking layouts.

Bringing it back to the topic: UUIDs should not ever follow familiar content patterns (if they do, then that's an issue in and of itself), so I'm always going to speak the delimiters of a UUID -- whatever they consist of.

(If nothing else, doing so breaks up the pattern into human-digestible chunks -- which is probably the sole reason we have those delimiters in UUIDs to begin with.)


or you could say 172390917


No.

There also isn't one for "w", yet we get by with that as a letter.


> There also isn't one for "w", yet we get by with that as a letter.

Warning: Tangential rant ahead.

I'm teaching my toddler to read (Distar alphabet).

Even with the modified alphabet, it's a chore to "know" how to pronounce a letter.

'a' has at least 4 different pronunciations in words used by toddlers: apple, came, eat, bread.

All the vowels are like that, and even some consonants ('y' has at the very least: baby, yesterday, cycle, buy)

The only well-behaved letter in English is 'x': pronounced the same wherever you see it, as 'cks'[1].

[1] For toddlers, anyway. I doubt a 4-year old would be interested in LaTeX :-)


Unless it's at the beginning of a word, like xylophone?


The only rule in English is that there is an exception to every rule (including this one).


> Unless it's at the beginning of a word, like xylophone?

Well, we didn't cover sounding out of `ph` yet, and it isn't a toy he has, so thankfully it is not a word he uses.


> There also isn't one for "w"

There's dub.

Dub-dub-dub is pretty widely understood to mean www.


Unicode calls it a low line. PostScript calls it underscore and HTML says UnderBar.


If "underscore" gets tedious I just say "tac"

But I get that it's confusing with dashes.


It's a mess. In Georgian we say "lower dash". No short name in any language I know.


Screenreaders pronounce it "line".


I feel like "slab" would work.


nono, was it slab or slash [over a 8k bandwidth phone call]?


I would back "blank" as the most likely to be understood by the other person.


I'd understand that as space, but then I'm not a native speaker


> double-clicking on this 123_456_789 will select all 9 important numbers

True, but alas, on iOS I found that a double-tap selected only one of the digit groups.

In my view the touch interface UX is just as significant - perhaps even more so in recent years - given that the backdrop to many of these identifier format decisions is ensuring nontechnical end-user support, under time pressure, over possibly quite unreliable channels, goes as well as it can.

But look on the bright side, at least it didn't try to call the number


Text selection on iPhones has been unusable since iPhone OS 1. It's the one thing that's actually easier on Android.


Get a better baby for the rare cases, no need to always suffer


Or at LEAST make the dashes evenly spaced.

0000-0000-0000-0000-0000-0000-0000-0000


BitLocker does this and it's nice UX for walking someone through a recovery key over the phone.

Another VERY nice feature is it hashes each set of 6 digits as you type, so if you transpose one, you immediately get feedback instead of "invalid key!" after typing the whole thing out.


> hashes each set of 6 digits

nice, now i can dictionary attack any key, 6 digits at a time


I don't think that's how it works... It's a checksum, not letting you check if each section is part of the key.

At worst, the key would have some portion less entropy since there's a lot of bits used for checksums.
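
For illustration, a per-group check can be as simple as requiring each group to be a multiple of some number. As far as I know BitLocker requires each 6-digit group to be a multiple of 11 and below 720896 (so each group carries 16 bits), but treat those constants as my assumption rather than a spec quote. The function name is made up.

```python
def group_looks_valid(group: str) -> bool:
    """Check one 6-digit group in isolation (assumed rule: multiple of 11, below 2**16 * 11)."""
    if len(group) != 6 or not (group.isascii() and group.isdigit()):
        return False
    value = int(group)
    return value % 11 == 0 and value < 2**16 * 11

print(group_looks_valid("123453"))  # a multiple of 11
print(group_looks_valid("123435"))  # same digits with two transposed
```

Passing the check only says the group is well-formed, not that it belongs to any particular key. Under this rule a well-formed group still has 65536 possible values.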


I don't think we're talking about the same problem here.

Regardless of how many dashes you have or how (ir)regularly they are spaced, to select the whole ID you must carefully click-drag-release around its boundaries, you can't just double-click anywhere in it to select.


use the character 'v' to separate sections? That would solve that problem, and it isn't a hex character

0000v0000v0000v0000v0000v0000v0000v0000


A couple other potentially desirable properties you could incorporate:

- K-sortable: ensures good locality when used as an id in a database (e.g. https://github.com/jetify-com/typeid https://github.com/segmentio/ksuid )

- checksum: primarily useful when an id might be conveyed verbally (e.g. customer support) or transcribed (e.g. Bitcoin wallet backup, BIP-39)
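
To make "K-sortable" concrete, here is a minimal sketch of a UUIDv7-shaped id: a 48-bit millisecond timestamp in the top bits, then version and variant bits, then randomness. The function name is hypothetical, and this is not the code of any of the libraries linked above.

```python
import secrets
import time
import uuid
from typing import Optional

def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a time-prefixed UUID: timestamp first, random bits last."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)  # 48-bit millisecond timestamp
        | (0x7 << 76)                        # version nibble
        | (secrets.randbits(12) << 64)       # 12 random bits
        | (0b10 << 62)                       # variant bits
        | secrets.randbits(62)               # 62 random bits
    )
    return uuid.UUID(int=value)

older = uuid7_sketch(1_000)
newer = uuid7_sketch(2_000)
print(older, newer, str(older) < str(newer))
```

Because the timestamp leads, ids created in different milliseconds sort by creation time both as text and as bytes. Ordering within the same millisecond is down to the random bits.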


The bech32 format is a favorite of mine because it uses an alphabet that's designed to be unambiguous, and its checksum is designed specifically to guarantee catching mistakes of a few characters and to make it possible to suggest where the mistake likely is. It also has a builtin human-readable purpose prefix at the front. Since it's never mixed case, it can be uppercased to fit the QR alphanumeric mode, which has no lowercase letters, so QR codes of bech32 IDs are more efficient.


An error-correction code inside the ID?! I love it.


UUIDs should not be used as database primary keys unless the DBMS recommends it or you have a well-studied special reason for it. Postgres and MySQL are meant to use bigserial by default, even Citus. Some special sharded DBMSes like Spanner need non-sequential pkeys, but even Spanner explicitly tells you to use uuid4 because k-sortable keys cause hotspotting: https://cloud.google.com/spanner/docs/schema-design#uuid_pri...


I understand the performance implications of using a UUID for a primary key. And if performance is your primary concern, then this is good advice for large tables.

But if I could go back 25 years and only give myself one bit of advice, it would be to use UUIDs as the primary key. Because in a different context to raw performance, it offers a lot of advantages.

While there are advantages in numerous areas, I'll focus on one for this post. The area of distributed data.

We started by running a database on prem. Each branch or store got their own db. 15 years later always-on networking happened. 15 years after that, all businesses have fibre.

So now all the branches use a giant shared online database. With merged data. Uuid based this task would be trivial. Bigint based, yeah, it's not.

Along the same timeline data started escaping from our database. It would go to a phone, travel around a bit, change, get new records, then come home. Think lots of sales folk, in places without reception, doing stuff.

So you're right in the context of a single database (cluster) which encompasses all the data all the time.

But in the context where data lives beyond the database, using uuids solves a lot of problems.

There are other places as well where uuids shine.

So as with most advice when it comes to SQL, I'd add "context matters".


When data lives beyond the database, you need a uuid, but it doesn't need to be your pkey. Even your typical backend-frontend app with a single DB will often send uuids over the API.

If you're copying a DB, mutating, then merging back in, you just have to reset the bigint pkeys. I can see how in some contexts that might be less convenient (or if merges are very frequent and reads are not, less performant), but that's a special case and not something to assume from the start. For example I've done merges like this before pretty easily with bigints, and I've also been in places where they start out with uuid pkeys then never benefit.


Bearing in mind that primary key, and clustered key are not necessarily the same thing, your point stands that the uuid does not need to be the clustered key.

Renumbering bigint primary keys to effect a one-time merge becomes substantially less trivial when the desire for minimal downtime, hundreds of related tables, and tens of sites are in play.


Yeah, I can see that


How do you know it would have worked out better with UUIDs? Did you load test it? What's the size of your dataset?


With bigint primary keys the process starts with taking the old site offline, and ends with bringing the new site online.

In between is a non-trivial renumbering step, which takes measurable time and invalidates all existing backups.

By contrast, uuid-based databases do not need this step, and all existing data (some already distributed, some in backups, etc.) remains valid.


UUID primary keys remove hotspots; sequence primary keys increase locality.

Depending on your access pattern, you may prefer one or the other, even on the same DBMS.


Yes, but that decision has to be well-researched.


I can't speak for PG but MySQL at least has a built in function to resolve the time ordering issue when storing v1 UUIDs (and a corresponding function to restore them to a valid UUID).


The CUID readme [1] explains that there's no real point to K-sortable on modern hardware:

[1] https://github.com/paralleldrive/cuid2?tab=readme-ov-file#no...


The CUID readme is wrong. You can safely ignore anyone who says "cloud-native" while discussing performance unless they're explaining why "cloud-native" architectures are often the worst of all possible designs for performance.

In postgres for example, full_page_writes (default on, generally not safe to turn off unless you can be sure your filesystem can guarantee it) means the first change to a page after a checkpoint writes the entire page to WAL, even if you only write one record. This will make your WAL grow way faster if you're doing random IOs, because random keys touch many more distinct pages. So right off the bat that's going to be a huge write impact.


What are the tradeoffs between typeid and ksuid?


The author seems to be unaware of TypeID. You can use TypeID and ignore this article.

https://github.com/jetify-com/typeid


Is this particularly widely used? I don't think I'm aware of TypeID either. I don't see why the author's pretty light solution is inferior to this library.


Because TypeIDs are compatible with UUIDv7 and are supported by libraries in many languages.


I'm a big fan of the similarly obscure TagURI for unique identifiers https://taguri.org/


Or rather, the article is the explanation of typeid?


Underscore has usability drawbacks.


Like what?


Depending on the font, it can collide with underlining (as in hyperlinks). I also once had a case where the dashed line a document viewer displayed for a page break hid underscores that happened to be on the last line of the page, causing the recipient to misinterpret the documentation.

In proportional fonts, underscores are generally wider than spaces, creating larger gaps between the underscore-separated parts than between the surrounding space-separated words. E.g. in "AAA BBB_CCC DDD", "AAA"/"BBB" and "CCC"/"DDD" are closer together than "BBB"/"CCC". In some fonts the difference is quite substantial. This makes for incorrect/unintuitive visual grouping.

You have to press Shift to type them. On mobile keyboards, underscore is usually one extra layer removed. For voice dictation, it's also longer than "dash" or "minus".


Regarding the last paragraph, people are reading these ids not (very rarely) writing them.


It's 'oldskool' _ although, frankly, if someone can't find it on a keyboard, they should be condemned to a life full of auto-correct errors.


So your IDs are now tightly bound to whatever "types" you've currently decided you have, forcing a narrow view of what an entity is and making the entire system extremely brittle to change? What is a "type" even supposed to be? This is forcing a doubling-down on an already problematic design principle: that every entity is exactly one type of thing and these types of things are completely different than those types of things and obviously you can just make the perfect set of types that will never change if you think really hard and everyone will agree on what each type means and they'll never change and that will never be a problem.

Jesus, what a nightmare.


A quick glance of the repo shows that the "type" is just a prefix. You can do whatever you want with it. Basically the same thing as what the article suggests, no?


You face the same issues with table names though. So do you not name your tables?

The solution to your entity problem should be the same. You do the reasonable, practical thing, and rename/refactor if they drift away from the original mental concept.


> You face the same issues with table names though.

Exactly right. You've succinctly stated the biggest problem with almost all modern database design.

> So do you not name your tables?

I do, but that name does not represent a Type of Entity, where all entities therein are Exactly Thus, and all entities everywhere else are Absolutely Not At All Thus. Instead, it represents a statement I want to make about entities. Any "natural" meaning you put into your identifiers about what they are is defeating the point of the identifier.

> You do the reasonable, practical thing, and rename/refactor if they drift away from the original mental concept.

And now all of your IDs that are "in the wild" have expired. Can I still submit a request using the old ID, before you renamed Employee to WorkPerson and then to MobileLivingBeing and then to PossiblyMobilePossiblyLivingBeing? And it's not "if" they drift away, it's "when". And it's not just that they change over time, it's that they change from one perspective to the next. You can never have two distinct disciplines of the business ever referring to the same entity, because they don't agree on what the types mean. That bears repeating: you can never have two different disciplines both referring to the same entity unless they agree on what the types are, and they don't, because their terms have different meanings. Do your accountants and your maintenance people and your capital planning people and your corporate leadership all agree on exactly what a "facility" is? Because if they don't, they literally cannot even refer to the same entity. Good luck with your microservices.


All I'm getting here is you're against names and labeling things. But you don't provide any solutions, and since a heap of untagged, unlabeled, or otherwise unannotated data seems so obviously strictly worse, I doubt that's what you're actually suggesting?


As I said: the name represents a statement I want to make about entities. Naming things is fine. I'm against categorizing things.


Related note: Amazon IAM credentials often look something like this (not real):

    aws_access_key_id = AKIA367COJQOEU3UOE
    aws_secret_access_key = a7Ed0F80a0AF6606/MQG3+4o/o

It's frustrating that you can select the access key by double clicking it but not the secret access key because of those / characters.


I agree, that is annoying. It doesn't work for me either. Nice that they specify "secret" in key name though to help prevent pasting the wrong one.


i keep a list of UUID reading and desirable properties here! https://github.com/swyxio/brain/blob/master/R%20-%20Dev%20No...


Nice list, found a couple projects I hadn't seen before.

My addition for your consideration: https://github.com/mik3y/django-spicy-id



It would be possible to bookmark and refer to individual articles if you'd used gist instead of github.


This is amazing thank you for sharing.


As someone who maintains a UUID library, this is definitely something that has been thought about, especially in the UUIDv6-v8 updates. But it was moved to be considered later as an extension after v6-v8 get approved fully.

But all these were talked about and considered before it was punted to a later time. https://github.com/uuid6/uuid6-ietf-draft/issues/27 https://github.com/uuid6/new-uuid-encoding-techniques-ietf-d... https://github.com/uuid6/new-uuid-encoding-techniques-ietf-d... https://github.com/uuid6/new-uuid-encoding-techniques-ietf-d...

But there is always TypeID in the meantime which uses UUIDv7 under the hood: https://github.com/jetify-com/typeid

Either way, I am in favor of prefixing and using alternative encodings, but it will need some time to figure out the best route. In the meantime, there are so many alternatives. TypeID, NanoID, ULID, etc. I even made my own quick one just for giggles: https://github.com/daegalus/snowflakes


> In our MySQL database we use IDs mostly as primary key

Clustered index with random data stored as chars as the PK, what a great time! You will surely not regret this decision later.


This is why UUID v7 is better to start with.

The author also compares to Stripe tokens, which is a strange comparison as you can see they also have a time component towards the beginning.


What do you recommend?


If you can’t model the table with a natural key (or it would be so large as to inhibit performance), then a simple, normal monotonic integer is best. MySQL even lets you use unsigned ints, so if you use a bigint, you can go all the way up to 2^64-1.

For those who think this doesn’t work in distributed systems, it absolutely does – PlanetScale uses them internally [0]. If what is likely the largest MySQL (under Vitess) cluster in the world can manage, yours can too.

If this is still untenable, then anything k-sortable (like UUIDv7, as the sibling comment mentioned) is a vast improvement over randomness. Don’t cause B+tree page splits, especially in an RDBMS with a clustering index like MySQL.

[0]: https://github.com/planetscale/discussion/discussions/366


Even if there's a natural key, serial bigint is usually the best. You can use unique indexes beside that without being stuck with them, and joins are faster on integer pkeys.


For performance, generally yes. Designing a purely relational model is satisfying, but theory often falls apart in prod.

It’s still a good idea, IMO, to think about table design starting as though you have a natural key (composite or singular). It helps develop the schema; you can then drop in a serial/identity/autoincrement column, and use the other relationships as FKs.


I agree. It also makes queries simpler. It's easier to handle

    WHERE id = 123

compared to

    WHERE site = 'HN' and username = 'hot_grill'

The last query is easier to write when you are querying the database manually, but I find the first easier to handle programmatically. It's easier to pass an argument from a URL or a message queue in this case.

As systems evolve, you may find that you need a third component to the natural key. If you don't use a simple id, you need to update every query that references the natural key
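
A small sketch of that setup using Python's built-in sqlite3, with a made-up `account` table: the integer id is the primary key and the natural key stays as a unique constraint, so both lookups keep working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,   -- surrogate key, easy to pass around
        site TEXT NOT NULL,
        username TEXT NOT NULL,
        UNIQUE (site, username)   -- natural key kept as a constraint
    )
    """
)
conn.execute("INSERT INTO account (id, site, username) VALUES (123, 'HN', 'hot_grill')")

# Programmatic lookup: a single parameter.
by_id = conn.execute("SELECT username FROM account WHERE id = ?", (123,)).fetchone()

# Manual-style lookup by the natural key: one parameter per component.
by_natural = conn.execute(
    "SELECT id FROM account WHERE site = ? AND username = ?", ("HN", "hot_grill")
).fetchone()

print(by_id, by_natural)
```

If the natural key later grows a third component, only the constraint and the second query change; everything that references `id` stays as it is.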


If you're querying manually, you can still do it the second way even if your pkey is the bigserial.


Problem with using base58 is that it keeps nearly all letters in both cases (only 'I', 'O' and 'l' are dropped, along with '0'), so you can end up with 4-letter words that the marketing/PR departments do not like.

Hexadecimal is safe.


I like Crockford 32. It has more letters than hex, but is resistant to swear words.

https://www.crockford.com/base32.html


Resistant might be a strong word. I can see it only has E, A and Y as vowels which maybe helps a little for English as long as you're not the SATAN himself.


They did naz1 that coming.


It's also trivially easy to reroll ids until they don't contain a swear.
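
Something like this sketch, where the blocklist is a deliberately tiny, hypothetical stand-in (building a real one is the actual work):

```python
import secrets

# Hypothetical, deliberately tiny blocklist of lowercase substrings.
BLOCKLIST = ("b00b", "babe", "dead")

def clean_hex_id(nbytes: int = 8) -> str:
    """Draw random hex ids until one contains no blocklisted substring."""
    while True:
        candidate = secrets.token_hex(nbytes)  # lowercase hex, 2 chars per byte
        if not any(word in candidate for word in BLOCKLIST):
            return candidate

print(clean_hex_id())
```

Rejections are rare with a short list, so the loop almost always finishes on the first draw.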


Collecting the dictionary of all swear words for all languages and their dialects might be less trivial. Keeping it up to date would probably take an institute worth of researchers.


And there's no "forward secrecy" - if a normal word becomes a swear, then it's often hard to go through and change all uuids.

Which is why the standard has been base16 (or base10) for so long.


But what if deadbeef becomes a swear?

/s… but only halfway


Modi is out of control.


What's more, unlike b58, it's case insensitive.

I usually favour b32 for IDs. There's also word encodings, but frankly those have more lewd combinations than they don't to a mind such as mine.


> resistant to swear words

No it's not.

Edit: downvoters, a tame example is 0x72b5473d5a567200.


I have no idea what that is supposed to say.


You take that hex and turn it into Crockford32 and it says eatmefatass00.
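
You can reproduce it with a few lines, assuming the 8 bytes are encoded left to right (byte-stream style, like RFC 4648 base32) with Crockford's symbols swapped in. Crockford's page describes encoding numbers, so that bit alignment is my assumption about how the example was built. The function name is made up.

```python
import base64

RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
TO_CROCKFORD = str.maketrans(RFC4648_ALPHABET, CROCKFORD_ALPHABET)

def crockford32(data: bytes) -> str:
    """Base32-encode bytes left to right, using Crockford's symbols and no padding."""
    standard = base64.b32encode(data).decode("ascii").rstrip("=")
    return standard.translate(TO_CROCKFORD)

print(crockford32(bytes.fromhex("72b5473d5a567200")).lower())  # eatmefatass00
```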


More importantly, Base58 is orders of magnitude slower than Base64/Base32/Base16 due to O(N^2) algorithms required to encode/decode it. Blockchain software is already trying to get rid of it. Adopting Base58 now would be shortsighted.


Well, mostly. Still have to be careful of l33tsp34k, where you can end up with things like b00b.


> where you can end up with things like b00b

Honestly, if seeing the character string "b00b" is a problem, you have bigger problems.


Maybe you have users which will spam one of your services until they get something that has b00b in it.


> Hexadecimal is safe.

Oh yeah?

ABADBABE B16B00B5 0B00B135 BEEFBABE CAFEBABE DEADBEEF

And a few others that I'm probably forgetting...


> Try copying this UUID by double-clicking on it

Nobody does this. Normal users don't even know this is a thing. I worked on an app that did something like this for placeholders in generated text, and in all our extensive testing and high-touch rollouts, we never saw anyone use it.

It's nice that you took the time to think about it, but it's not that important.


Literally everybody does this. It is actually a critical property of usable IDs!


How about when you long-tap on a touchscreen and it automatically selects the whole word?

Double-clicking is just one aspect of the wider principle, that is you make life easier for heuristics of all kinds.


I use it, but I’m just me.

What was the thing your product did?


Without getting too specific, it helped automate report writing for caseworkers by generating blocks of boilerplate text appropriate for each case, with placeholders for the particulars of the case.

Sort of like:

  The applicant is fully employed as a _PROFESSION_ at _NAME_OF_COMPANY_

Our designer excitedly told us how he specifically used underscores so you could double-click the placeholder and just type over it.


Exactly. It seems the way they use "users" there is much more accurately said as "Junior Devs", which takes your applicable pool from ~7 billion max to "almost no one".

Quite literally not worth the trouble.


Can use ULID to "fix" some issues

https://github.com/ulid/spec


Ulid should wholly be deprecated now that uuidv7 is available.


ULIDs have a short representation that UUIDv7 could use but doesn't define. Also, has UUIDv7 been standardized already? I thought it was still in the pipeline.


As a Microsoft SQL user (not everyone can choose their DB..), UUIDv7 has the issue that people will (understandably, but ignorantly) store it in "uniqueidentifier", which shuffles bytes around, so the values are no longer sorted on time...

There is even a specific Microsoft SQL time-ordered UUID format which is still sorted after the byte shuffling..

We store ULID in binary(16). Works nicely. Only difference from UUIDv7 is the version bits..
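
A small sketch of the byte-shuffling effect, using Python's `uuid` module: `bytes` is the big-endian layout and `bytes_le` is the mixed-endian layout with the first three fields byte-swapped. This only illustrates the swapped layout; SQL Server's actual comparison order for uniqueidentifier has its own rules and is not modelled here. The helper name is hypothetical.

```python
import uuid

def v7_like(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value with all random bits zeroed, for a repeatable demo."""
    return uuid.UUID(int=(unix_ms << 80) | (0x7 << 76) | (0b10 << 62))

earlier, later = v7_like(0xFF), v7_like(0x100)

print(earlier.bytes < later.bytes)        # True: big-endian layout keeps time order
print(earlier.bytes_le < later.bytes_le)  # False: the swapped layout loses it for this pair
```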


Couldn't you put UUIDv7 in binary(16) and get the same feature?

I found today a PHP UUID library that does one they call UUIDv8 which has more time (like ULID)


Yes, one could. The problem is that most programmers are probably not aware of this, so they will see the UUID and assume that the uniqueidentifier type is the best suited one.

They are less likely to take a ULID and store it in binary(16). (Might store it in char(26), which is less storage efficient but is still sorted.)

Apart from that I happen to LIKE the fact that I can very quickly see the difference between a random identifier and a sorted identifier, they have quite different characteristics after all.. Although I guess people might eventually get used to the initial bytes being 0 on UUIDv7, or just learn to recognize the version bytes..

On that topic, why are the 8 fixed bits of a UUID not concentrated in the same byte? Perhaps the first one. Huge mistake..


ULID has many advantages over UUIDv7, including (but not limited to) higher timestamp granularity, and more usable canonical string representation.


Ok wow this blog post was really illuminating. Thanks for pushing back on my preconceived idea that UUIDv7 is the 'official' version of ULID

https://blog.daveallie.com/ulid-primary-keys


Why? It does not suffer from some of the UX issues this article discusses.


I built cybertoken [1] for API keys and passwords, not (only) IDs. It is basically the format that GitHub uses for their api keys. Underscores, a prefix, so we can get a better debugging experience and automated secret scanning. It also has a CRC32, so you can check offline if the token candidate is a cybertoken while doing secret scanning.

[1]: https://github.com/nikeee/cybertoken
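
For anyone wondering what the prefix-plus-checksum idea looks like in practice, here is a rough Python sketch. It is not cybertoken's or GitHub's actual format; the prefix `acme`, the hex body and the function names are all assumptions made up for illustration:

```python
import secrets
import zlib

PREFIX = "acme"  # hypothetical product prefix, not a real scheme


def make_token(prefix: str = PREFIX, n_bytes: int = 20) -> str:
    """Return '<prefix>_<random hex><CRC32 of the random part as 8 hex chars>'."""
    body = secrets.token_hex(n_bytes)
    checksum = zlib.crc32(body.encode("ascii"))
    return f"{prefix}_{body}{checksum:08x}"


def looks_like_token(candidate: str, prefix: str = PREFIX) -> bool:
    """Offline check: expected prefix and a trailing CRC32 that matches the body."""
    head, sep, rest = candidate.partition("_")
    if head != prefix or not sep or len(rest) <= 8:
        return False
    body, checksum = rest[:-8], rest[-8:]
    try:
        expected = int(checksum, 16)
    except ValueError:
        return False
    return zlib.crc32(body.encode("ascii", "ignore")) == expected


token = make_token()
print(token, looks_like_token(token))
```

The checksum is not a security feature. It only lets a scanner discard random strings that happen to start with the prefix without calling any service.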


I'd rather focus on optimizing the size of UUIDs than their text representation. Shipping UUIDs around as UTF-8 text is silly. They're at most 128 bits, so we shouldn't use more than that many.

In the case where it's necessary to have a text representation (e.g. in some user interface) I guess it's fine to choose whatever (stable) transformation you like, but the standard ways specified in RFC-4122 (hexadecimal, with or without hyphens) seem like the most foolproof. Regarding the logs search use case, the first "chunk" of a uuid is usually well more than enough for a unique match, IME.

Also, I've been burned before in cases where some clever transformation was used to make a uuid look different in text form, because in order to synthesize the actual binary uuid I first have to reverse engineer the transformation--e.g. to find the database record corresponding to some http request log message. That's just annoying, and the polar opposite of "user friendly" for the user story of an engineer trying to figure out what's wrong with the system.

So.. I guess my vote is to stick to the standard.


that's where I arrive too


"Let’s not pretend like we are Google or AWS who have special needs around this. Any securely generated UUID with 128 bits is more than enough for us."

Thank you. This is overthought so much, including with partially-random things like uuid3, 5, 7.


You need them when you don't have a quick, secure random source.

Like virtual machines or embedded systems.


If you don't have RNG, it's natural to think about a non-random UUID.


An even better UUID UX would cycle through colors when you clicked one and then would overlay the assigned color when you see that same UUID elsewhere. Better to find a needle in a haystack if it's the only one with a pink background.


With any of the entropies mentioned, it's not like you'll find 2 same ids in any pagefull of ids for this to ever matter...

It's more like, you used id X in your db and 8 months later another X lands, after billions and billions of rows have been inserted...


We're not looking for duplicates because we're afraid that probability has failed us. We're looking for duplicates because we have a thing of interest, with a corresponding UUID, and we want to notice where else that thing is involved.


It's more for tx ids or user ids, which are likely to be used multiple times in logging.


base58 is case sensitive which hinders readability. When devs work with uuids they typically remember the first few letters ("this is guid abc", "that one is guid 1ac"). Hard to do that in base58.

The coarsest encoding to have this property is Base32 where it remains easy to memorize first few letters without needing to memorize case.


> TLDR: Please don't do this: https://company.com/resource/c6b10dd3-1dcf-416c-8ed8-ae56180...

But like... why? This article literally does not explain the benefits beyond copying, they are just assumed. I'm not immediately sold on shorter === better, especially when the updated UUIDs are only marginally shorter and you have now introduced the overhead of a translation layer for one of the most basic building blocks in your application.


I thought it was fairly well reasoned in the article.

Let's say you have a customer with that UUID as their ID. Do you expect them to recite their UUID perfectly to you every time? What if they made 5 transactions, each with their own UUIDs and you need to look them up, do you now expect them to read out 6 fairly unwieldy IDs?

The article is about the UX of UUIDs. Yes, there's a translation layer and more dev work to implement, but the shorter size and use of non-ambiguous characters is a massive improvement in the usability for the end users.


You've added overhead where the cost is trivial and only professionals have to deal with it, in order to reduce overhead where it annoys many more people, including poor customers.

Also, you don't explain why copying isn't good enough, you just reject the article's reasons


Why would readability of a UUID matter? At most, users should be copy-pasting them, not reading them or trying to memorize them, so why should I and l and 1 looking similar matter?


Well, point one of the article is that they're unnecessarily hard to copy-paste.


And the rest of the article talks about what GP is asking about


The author mentions UUIDs are hard to copy/paste because of the hyphens.


Right, and it is a fine idea to get rid of the hyphens (personally, I just triple click) - I'm talking about the next section.


The easy fix is underscores.


It could help when you're manually inspecting a list of UUIDs?


Here is a thing I wish would Just Work, everywhere.

Given:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

I'd like to, say, double click on consectetur to select it (which works), but then, while holding Shift, I would like to double click on elit so that the selection of consectetur is preserved, and extended over elit, including adipiscing.

The behavior I typically see is that, while holding Shift, the first click will extend the selection to the exact character of elit that I'm pointing at. The second click will then cancel the selection and select all of elit.

Like it doesn't mean a damn thing that I'm holding down Shift!

Ironically, I can make multiple selections that way using Ctrl in Firefox; Ctrl does modify the semantics of the second click while there is a selection.


We need to separate the storage format (Postgres and MySQL both have a bigint serial for primary key and it should stay that way)

for display, you have various ways to encode that number into something easier for humans, what I prefer:

- short word

- no offensive words

- with a checksum (so we easily spot any copy paste mistake)

- not sequential

- can be put in a url without extra encoding

this is our implementation of that (base32 and luhn code)

https://github.com/tttp/dxid


> This can be achieved by removing the hyphens from the UUIDs, allowing users to simply double-click on the identifier to copy it.

Triple click can be used.


Q: What's special about the format of UUIDs compared to, say, an equivalent entropy 128-bit number? For many use cases, the hyphens appear to be utterly irrelevant.

https://softwareengineering.stackexchange.com/questions/3855...


This is a great point. It's unfortunate that few engines and standard libraries implement base58 encoding/decoding. This is probably because it is a more complex format, since it leaves out certain ambiguous letters like l and O. Base64 is relatively straightforward to implement as it maps easily from a number to a character.


For user-facing IDs I made base24, which limits the alphabet further to make it case insensitive.

https://www.kuon.ch/post/2020-02-27-base24/


TypeScript template literal types can be used to type-check IDs with prefixes: https://hw.leftium.com/#/item/39174998


If you expose UUIDs to the user, even if just as part of the URL, encode them as shortUUID: https://github.com/skorokithakis/shortuuid


I’ve always felt that Crockford Base 32 was very ergonomic. http://www.crockford.com/base32.html
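
A small Python sketch of what that buys you for a UUID, assuming Crockford's alphabet (no I, L, O or U) and a fixed 26-character width; `encode` and `decode` are illustrative names, and the check-symbol part of the scheme is left out:

```python
import uuid

# Crockford's Base32 alphabet: digits and letters, skipping I, L, O and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# When decoding, look-alike letters are folded onto digits: O -> 0, I and L -> 1.
DECODE = {c: i for i, c in enumerate(ALPHABET)}
DECODE.update({"O": 0, "I": 1, "L": 1})


def encode(u: uuid.UUID) -> str:
    """Encode the 128-bit value as 26 characters of 5 bits each."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 31])
        n >>= 5
    return "".join(reversed(chars))


def decode(text: str) -> uuid.UUID:
    """Decode case-insensitively, ignoring hyphens and tolerating O/I/L typos."""
    n = 0
    for c in text.upper().replace("-", ""):
        n = (n << 5) | DECODE[c]
    return uuid.UUID(int=n)


u = uuid.uuid4()
print(encode(u), decode(encode(u).lower()) == u)
```

The result has no hyphens, so it selects as one word, and it survives being read aloud or retyped in the wrong case.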


Do they really have to be that long? And why can't they just be 4-4-4-4 (I'm sure someone knows). In hobby apps I often .split('-')[0] lol the first 8 is fine (so far)


Good points, wish they were specced into the original UUIDs so you don't need to postprocess (except for prefixes, as those are more user-specific)


You also can't easily copy the word "double-click" by just double-clicking on it.

Maybe the browser (and other) UI is wrong about word boundaries.


Isn’t there some encoding that encodes bytes as words, or pronounceable letter groupings?


The UX of UUIDs... should not exist. There are so many great ways to improve UX that don't involve overloading or mangling an internal primary identifier that otherwise follows a standard structure.

Use immutable human-readable identifiers like "slugs" and/or "natural keys" in addition to robust primary keys.

> TLDR Please don't do this: https://company.com/resource/c6b10dd3-1dcf-416c-8ed8-ae56180...

That URL is fine, it should just be a 30X to https://company.com/stuff/cute-slug and you should use the user-friendly URL where possible.


User-friendly URLs with slugs are useful for blogs and other things commonly linked to. But for many other entities, links will be rare, and the central coordination required would just be a waste of time.


Great, then this isn't a problem.

There is no "central coordination" required to set up URL redirects, by the way.


UUIDv7 is great.

The PGP Word List for translating hex into words: exists.


Why is Base58 used instead of Base64?


It removes the ambiguous characters (I, l)
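
To make that concrete, a minimal Python sketch of Base58-encoding a UUID, assuming the Bitcoin-style alphabet (which drops 0, O, I and l, and has no + or /). The function names and the fixed 22-character padding are assumptions for illustration, not what the article's implementation necessarily does:

```python
import uuid

# Bitcoin-style Base58 alphabet: no 0, O, I or l, and no punctuation.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_uuid(u: uuid.UUID) -> str:
    """Encode the 128-bit integer by repeated division by 58."""
    n = u.int
    out = []
    while n:
        n, rem = divmod(n, 58)
        out.append(ALPHABET[rem])
    # 22 symbols are enough for any 128-bit value; left-pad with the zero symbol.
    return "".join(reversed(out)).rjust(22, ALPHABET[0])


def b58decode_uuid(text: str) -> uuid.UUID:
    """Reverse the encoding; raises ValueError on characters outside the alphabet."""
    n = 0
    for c in text:
        n = n * 58 + ALPHABET.index(c)
    return uuid.UUID(int=n)


u = uuid.uuid4()
encoded = b58encode_uuid(u)
print(encoded, b58decode_uuid(encoded) == u)
```

Unlike Base64, 58 is not a power of two, so this needs big-integer division rather than simple bit slicing, which is likely part of why fewer standard libraries ship it.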


I at first misread "TLDR; Please don't do this:" as applying to the entire article. I read it expecting a kind of McSweeney's parody where it gives you terrible advice. Really confused because the advice was good and not funny.


Haha, same here. The UX of TLDRs…


Except that UUIDs by themselves don't do this at all:

> They provide a reliable way to ensure that each item, user, or piece of data has a unique identity.

It is the registration of a UUID in a database which prohibits reuse that does that. If you aren't doing that, ensuring that each use of a UUID is not a reuse of an already assigned UUID, they are not UNIQUE.


In the real world, the likelihood of a bug in your database engine or ID-hander-outer (a race condition, storage edge case or the like) is a lot higher than the risk of collision on a sufficiently large and random key.

The whole point of using UUIDs is that you can generate them locally without central coordination--if you want to coordinate your identifiers, you can use a much friendlier ID length (which is explained in the article).


But you still need to be wary of malicious collisions. I have seen security vulnerabilities where the client generates a UUID ID and that was inserted into the database. However by picking an ID that corresponded to objects from other users it was possible to gain some access to those objects.

So any UUID coming from an untrusted source (like a client application) should be checked for uniqueness. However your client apps can be written assuming that their randomly generated UUIDs never have collisions.


Why would you let the client generate the IDs? If you have to check them anyways, just generate them on the server and only give the ID to the client.


It can be very useful for offline work and latency hiding. For example the client can generate data structures with the final IDs then sync them to the server. This can also be useful for implementing idempotent updates.

The alternative of having some sort of "placeholder ID" until the server gets back to you (or you get back online) adds a lot of client complexity.


Can't you just give the client a list of generated IDs on the first request and check if the IDs are from the pregenerated IDs afterwards? That should be a lot cheaper than checking all IDs you ever generated.


You can do that I suppose. But I have found that in most cases I have a unique index on the ID anyways, as they are often table primary keys. So really I just have to ensure that it is either an INSERT or an UPDATE to a record that is owned by the user.


Then you don't really have the problem this is about though, and which UUIDs are supposed to solve. The real pro of UUIDs is that you can issue them in distributed situations where you can't look up the already used IDs easily.


If you ensure each UUID is generated to spec, using a good source for the random bits, they are astronomically unlikely to collide without any coordination. Choosing the same 122 bits at random just doesn't happen by chance.

Of course you have to deal with the implications of people intentionally colliding UUIDs, so maybe don't generate them client-side.


One of the major motivations for UUIDs is that you can generate them in a decentralized fashion, without central registration, with a very high degree of confidence that they will not be repeated. That is a very important feature in distributed systems, where you don’t want to rely on either a central point of failure or the need for distributed consensus just to generate an ID for data elements.


You'll see your first duplicate already-assigned v4 UUID (anywhere on earth) in a few thousand years.


that is not correct and borders on wrong.

a reason to use uuid (eg v4) is that you can generate id's distributed without fearing collisions. it can happen, but is not likely.

so the uniqueness comes from a property of the ID and not the database.


Did you read the full article?

"Reducing the length of your IDs can be nice, but you need to be careful and ensure your system is protected against ID collisions. Fortunately, this is pretty easy to do in your database layer. In our MySQL database we use IDs mostly as primary key and the database protects us from collisions. In case an ID exists already, we just generate a new one and try again. If our collision rate would go up significantly, we could simply increase the length of all future IDs and we’d be fine."
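
The retry loop described in that quote could look roughly like this in Python. This is a sketch under assumptions: `sqlite3` stands in for the MySQL database the article mentions, and the `items` table, the alphabet and the function names are hypothetical:

```python
import secrets
import sqlite3

# Hypothetical short-ID alphabet without look-alike characters.
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"


def new_id(length: int) -> str:
    """Generate a random short ID of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_retry(conn, name, length=8, max_attempts=5, gen=new_id) -> str:
    """Insert a row, letting the primary key reject collisions and retrying."""
    for _ in range(max_attempts):
        candidate = gen(length)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, name) VALUES (?, ?)",
                    (candidate, name),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # ID already taken: generate a new one and try again
    raise RuntimeError("too many ID collisions; consider longer IDs")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
print(insert_with_retry(conn, "example"))
```

The uniqueness guarantee here comes from the primary key constraint, not from the generator, which is the point the quoted passage makes.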


Strictly speaking you can't be sure that your UUIDv4 isn't (by pure luck) also in someone else's database, so it's not guaranteed to be universally unique. It's just very, very likely to be so.


For some value of "strictly speaking", this is true; but it's not a very relevant value. You can't be "sure" your counter never produces a duplicate, either--strictly speaking--in reality. The bug-free program or the computer that isn't affected by external factors is like the friction-free surface: sometimes useful to think about but not something that exists in reality. And the likelihood of a cosmic ray causing a bit flip or a race condition in the way your counter updates is a lot higher (as in, we see it happen all the time) than the theoretical likelihood of a collision on a sufficiently large and random key.


> by pure luck

No, even that is not true. If all digital (and non-digital) storage media ever manufactured by humans - meaning all hard drives, tape drives, CDs, DVDs, BluRays etc. ever manufactured to date, and every book and word ever printed or written down. If those ALL were only filled in with UUIDv4s generated from a good random source .. you would still not see even one single collision!

UUID collisions are only possible with currently known human technology if your randomness source is not good enough. And it will remain so unless there are some astronomical leaps in digital information storage technology - at least 10 orders of magnitude more storage than currently exists.

EDIT: I thought of a way for programmers to mentally visualize how unlikely UUID collisions really are. Let's imagine that in some not-too-distant future, there are 10 billion people on Earth. Each of them is given one thousand CPUs. These CPUs have 1024 cores each, and they run at 10 GHz (clock cycles per second). The CPUs implement a hypothetical instruction that can generate a totally random UUID in one clock cycle.

As an experiment, all people on Earth one day decide to program all their thousand CPUs each to run a tight loop that will indefinitely generate UUIDs on all 1024 cores and then immediately discard them.

After continuing to run this experiment (whose electricity bill will make Bitcoin look like Earth Hour) all day, 24/7, for about 800 years, the likelihood of one UUID ever having been generated twice will have exceeded 50%.


I'm sorry to say that your analysis is wildly incorrect.

- 10 billion people =~ 2^33

- 1000 CPUs =~ 2^10

- 1024 cores =~ 2^10

- 10 GHz =~ 2^33

So: one second's computation by all of these people is 2^86 UUIDs generated. UUIDs are 128 bits. With probability essentially 1, there will be a collision within one second.

The reason is known as the birthday paradox. If you sample random values from a set of size k, after you've chosen about sqrt(k) values you will have chosen the same value twice with probability very close to 1/2. By 10*sqrt(k) samples you'll have found a collision with probability well over 90%.

In this case, after sampling 2^64 values you'll have a collision with probability 1/2. That happens in roughly 250 nanoseconds (2^-22 seconds) in your thought experiment.

2^64 sounds like a lot, but in many contexts it's not all that much. Every bitcoin block mined takes well in excess of 2^70 SHA evaluations. Obviously the miners are not dedicated to generating UUID collisions, but if they were they'd easily find thousands of them in the time it takes to mine one block (this neglects the fact that it is much easier to sample a UUID than to evaluate double-SHA256).
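
If you want to play with the numbers, here is a short Python sketch using the standard birthday approximation p ≈ 1 - exp(-n²/2k) for n samples from a space of k values. The function names are made up for illustration, and the formula is an approximation rather than the exact probability:

```python
import math


def collision_probability(samples: float, bits: int) -> float:
    """Approximate P(at least one collision) for `samples` uniform draws
    from a space of 2**bits values, using 1 - exp(-n^2 / (2k))."""
    k = 2.0 ** bits
    return -math.expm1(-(samples * samples) / (2.0 * k))


def samples_for_probability(p: float, bits: int) -> float:
    """Invert the approximation: n ~ sqrt(2k * ln(1 / (1 - p)))."""
    return math.sqrt(2.0 * (2.0 ** bits) * -math.log1p(-p))


print(collision_probability(2.0 ** 64, 128))       # at sqrt(k) samples
print(collision_probability(10 * 2.0 ** 64, 128))  # at 10 * sqrt(k) samples
print(samples_for_probability(0.5, 128) / 2.0 ** 64)
print(samples_for_probability(0.5, 122))           # v4 has 122 random bits
```

Under this approximation the probability at exactly sqrt(k) samples comes out a bit under one half, and the 50% point sits slightly above sqrt(k), so the order of magnitude in the argument above is unaffected.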


> The reason is known as the birthday paradox.

Right. Forgot about that little thing. You're absolutely correct.

With this taken into account a _single_ person with 1000 pieces of 1000-core, 10 GHz CPUs generating UUIDs will generate a match in a few minutes.


What part of "universally unique" in Universally Unique Identifier (UUID) do you not understand? They are specifically designed so that you can generate them locally without fearing collisions. That's like.. their entire point.


That's an aspiration, not a guarantee.



