
Show HN: UUIDs that are Shakespearean, grammatically correct sentences - debdut
https://github.com/Debdut/uuid-readable
======
orf
I built something like this for Python[1]. Getting UUID-like uniqueness often
requires way too many words, even if the Shakespearean style might make
them easier to remember (but also incredibly difficult for non-native
speakers to understand).

For human-id I just went with groups of the 100 most common adjectives, nouns
and verbs combined together in roughly the order they would appear in a
sentence.

The output is nonsense, of course, but I hope the combination of words
understandable to anyone with a basic knowledge of English might help.

1\. [https://github.com/orf/human_id](https://github.com/orf/human_id)
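As a rough illustration of the word-group idea (not the actual human_id code), here is a minimal sketch. The tiny word lists and the names `make_word_id` and `words_needed` are made up for this example; a real list would hold around 100 words per part of speech.

```python
import math
import secrets

# Hypothetical, deliberately tiny word lists (assumption for illustration).
ADJECTIVES = ["happy", "quiet", "brave", "green"]
NOUNS = ["dogs", "rivers", "doctors", "windows"]
VERBS = ["jump", "sing", "build", "read"]


def make_word_id(groups=1, sep="-"):
    """Join `groups` random adjective-noun-verb triples into one ID."""
    parts = []
    for _ in range(groups):
        for wordlist in (ADJECTIVES, NOUNS, VERBS):
            parts.append(secrets.choice(wordlist))
    return sep.join(parts)


def words_needed(bits, list_size):
    """Number of words needed to carry `bits` of entropy when each
    word is drawn uniformly from a list of `list_size` entries."""
    return math.ceil(bits / math.log2(list_size))


print(make_word_id(2))         # six words joined by "-"
print(words_needed(122, 100))  # words needed for a v4 UUID's 122 random bits
```

With 100-word lists each word carries about 6.6 bits, so matching the 122 random bits of a version 4 UUID takes 19 words, which is the "way too many words" problem in numbers.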

~~~
technofiend
I did something like this to autogenerate host names. Then a colleague pointed
out he'd done the same using dict words and got in some hot water due to
politically incorrect combinations. I punted and used the output of ls /etc,
/bin and /usr/bin, so hostnames end up like ifconfig-du-passwd instead. This
is just temporary to make them slightly easier to read and parse as a human
until we map serial number to asset name and then use that.

~~~
wutbrodo
> Then a colleague pointed out he'd done the same using dict words and got in
> some hot water due to politically incorrect combinations.

That's pretty hilarious

------
cube00
Interesting project; however, clearly I am an uncultured swine if these are
considered "easy to remember". I wonder if it would work with something more
lowbrow. Simpsons, anyone?

~~~
debdut
Easy if someone was reading and quoting Shakespeare, also easier than
remembering shortids or uuids

~~~
mercer
supercalifragilisticexpialidocius is easier to remember than shortids or
uuids, but that distinction is meaningless when 1) everyone spells it slightly
differently, and 2) who is Mary Poppins?

I love projects like these but it's a bit of a stretch to argue that this one
is practically useful.

------
reallydontask
What is the use case for having to remember UUIDs?

I can see how it might speed up writing say, SQL queries if you don't have to
look up the UUID but not sure that this warrants the effort of committing this
to memory.

~~~
treis
Human readable IDs make sense in a lot of places. Like if a customer wants
information about their order or looking up a user in a system. It's easier to
say and understand words than a long string of letters and numbers.

~~~
formerly_proven
I think in many instances it would make more sense to still use nominal
identifiers, but enrich them with semantic data.

For example, instead of invoice #306889086579, use an ID like R2020-06-14951.

~~~
chris_wot
In other words, a synthetic key. Bad move.

~~~
formerly_proven
You mean a partially natural key, because UUIDs are as synthetic as it gets,
and aren't exactly unproblematic for databases.

------
geostyx
Added this to my UUID as a Service API! [0] Just add ?readable to any
endpoint! Eg.
[https://uuid.rocks/json?readable](https://uuid.rocks/json?readable)

[0] [https://uuid.rocks/](https://uuid.rocks/)

~~~
crispyporkbites
What is this for?

~~~
geostyx
Fun, primarily. I also use it in some of my other projects for UUID
generation.

It's built on Cloudflare Workers so it scales fine and doesn't cost me
anything since I use Workers for other projects anyway.

~~~
crispyporkbites
I mean why would you want to generate a uuid on a remote service?

~~~
geostyx
Maybe you want a UUID within a shell script, CI/CD pipeline or orchestration
framework which has native support for HTTP, but not UUID :)

~~~
isp
Little known feature on Linux:

    cat /proc/sys/kernel/random/uuid

~~~
moviuro
Also:

    uuid -r

See [https://linux.die.net/man/1/uuid](https://linux.die.net/man/1/uuid)

~~~
war1025
What's the -r do? Output as raw bytes? I always just do `uuid -v4`

~~~
moviuro
-r is for "random". I've needed it for one of my scripts: if you generate many UUIDs at once on the same machine, there _will_ be duplicates.

~~~
war1025
I believe v4 is what you actually want [1]. By default it looks like `uuid`
generates a v1 id, which as you said will produce duplicates if you call it
too quickly.

Maybe your tool is different from the one installed on my machine because when
I use the `-r` option I just get garbage back. This leads me to believe it's
returning the uuid as raw bytes when I pass that option.

    
    
       $ uuid
       97af2898-cb54-11ea-8d75-176c10241ffd
       $ uuid -v4
       0b8d75f5-a148-466c-b1b5-9d9d1022c327
       $ uuid -r
       ����T꧚�����.
    

[1]
[https://en.wikipedia.org/wiki/Universally_unique_identifier#...](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_\(random\))

~~~
moviuro
I checked in my code, I actually used `uuidgen`, not `uuid`. Thus the mixup!
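For comparison, the same v1 versus v4 distinction can be checked from Python's standard `uuid` module. This is only a sketch; the two literal values are the ones pasted in the shell session above, and the variable names are made up.

```python
import uuid

# The two values from the shell session above.
v1_example = uuid.UUID("97af2898-cb54-11ea-8d75-176c10241ffd")
v4_example = uuid.UUID("0b8d75f5-a148-466c-b1b5-9d9d1022c327")

print(v1_example.version)  # time-based (version 1)
print(v4_example.version)  # random (version 4)

# Generating fresh ones: uuid1() uses timestamp + node, uuid4() is random.
time_based = uuid.uuid1()
random_based = uuid.uuid4()
print(time_based, random_based)

# The raw form is 16 bytes, which prints as garbage on a terminal.
print(len(random_based.bytes))
```

The version is the first hex digit of the third group (`11ea` is v1, `466c` is v4), so it can also be read off by eye.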

------
throwanem
You can get away with calling it "Shakespearean" when UUIDs map to rhyming
couplets in iambic pentameter.

~~~
chongli
And with mononymous characters. Shakespeare didn’t use any names as clunky as
these!

------
bawolff
>"Drucill Hubert Lewse the Comer of Avera rejoices Fiann Craggy Florie and 5
hard trouts"

This is considered shakespearean? Cool idea, but i think the implementation
needs some work.

------
ponker
You're going to have to copy and paste these anyway (no chance in hell of
remembering them) so I don't see what they get you over a UUID.

~~~
innocenat
It probably helps with recognition. You probably won't remember if you have
seen 'b7b05951-c3d3-4a7f-b65b-122e7d2543d4' before, but you are likely to
remember that you have seen 'Cathleen d Dieball the Monolith of Alderson
reflects Arly Arnie Keenan and 18 large ants' before (assuming that
there are no similar UUIDs).

It also gives you something to read in your head when you see it.

"Oh, you mean the 18 ants object?"

~~~
sneak
Unless the collision is adversarial. There is a tool that brute forces the
generation of SSH server key fingerprints to make the beginning and end look
like the one you’re trying to spoof, so someone who knows the first or last
few bytes would likely be fooled.

The point of the UUID is that it’s got enough entropy to be unique. If you
start reducing it to smaller slices of its entropy, it isn’t a UUID anymore.

I think projects like this that try to make high entropy things appear more
human-manageable do more harm than good.

~~~
dogma1138
UUIDs should not be used for “security”: they aren’t hashes, nor are they
suited to be used as encryption keys or anything else of that kind. They don’t
need to be kept secret as part of their trust model.

There should never be a situation in which the disclosure of a UUID breaks the
trust or security model of your application; if it does, your design is wrong.

------
swiftcoder
> it's impossible to remember 32 random characters in UUID

Is it, though?

Plenty of folks memorise 16-digit credit card numbers - I've known retail
employees who can recite those back after reading them just once.

Back when I was a sysadmin, I taught myself to type 25-digit Windows product
keys from memory.

32 digits doesn't seem an unreasonable stretch, given time and practice.

~~~
hedora
Studies show the difficulty increases rapidly after 7-10 digits (which is why
phone numbers are 7 digits, not counting area code).

Someone memorized 70,000 digits of pi:
[https://www.guinnessworldrecords.com/world-records/most-pi-p...](https://www.guinnessworldrecords.com/world-records/most-pi-places-memorised)

~~~
Veen
It depends on the person too. I struggle to remember six digits and forget
them in hours, but my partner somehow manages to instantly memorize credit
card numbers at a glance and retain them for months. I thought she was
tricking me at first, but I tested her and it's genuine. I guess some people
are just gifted with very good memories.

------
gertrunde
There's also RFC1751
[[https://tools.ietf.org/html/rfc1751](https://tools.ietf.org/html/rfc1751)].

A slightly different take on the same sort of problem.

------
dutchmartin
Interesting project. Getting a reading in radio spelling is easy to do. But
getting correct sentences out of a random predefined length string is way
harder.

------
SwiftyBug
What does it mean for a sentence to be "Shakespearean correct"? Do the
generated strings necessarily contain 10 syllables?

------
aitchnyu
I've been ideating about a base-48 (KJNTPBMYRVSH vs AIOU) version of decimal
ids. Ids like "sunihuvi" or "panimaso" which are (hopefully) memorable and
easy to pronounce with any accent. 5 syllables (10 letters) can encode
48^5 = 254803968 values.

Algorithm wizards, can I get error correction if I dedicate a character for
that?
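A minimal sketch of how such a base-48 scheme could work, assuming each "digit" is one consonant+vowel syllable (12 consonants x 4 vowels = 48). The function names and the fixed-length padding are assumptions for illustration, not anything specified above.

```python
CONSONANTS = "kjntpbmyrvsh"
VOWELS = "aiou"
# 48 syllables; the position in this list is the base-48 digit value.
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def encode(n, length):
    """Encode integer n as exactly `length` syllables (2 * length letters)."""
    if not 0 <= n < 48 ** length:
        raise ValueError("value does not fit in the requested length")
    out = []
    for _ in range(length):
        n, digit = divmod(n, 48)
        out.append(SYLLABLES[digit])
    return "".join(reversed(out))


def decode(text):
    """Inverse of encode; raises ValueError on an unknown syllable."""
    if len(text) % 2:
        raise ValueError("expected whole syllables")
    n = 0
    for i in range(0, len(text), 2):
        n = n * 48 + SYLLABLES.index(text[i:i + 2])
    return n


word = encode(123456, 4)
print(word, decode(word))
```

On the error-correction question: one extra check syllable can detect a single wrong syllable, but by the Singleton bound a code with one check symbol has minimum distance at most 2, so actually correcting an error needs at least two check syllables.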

------
kej
This is a fun project, but for practical purposes I think the PGP Word List
[1] would be more useful. It encodes any bytes (UUID or otherwise) as common
words from a list chosen for distinct sounds.

[1]
[https://en.wikipedia.org/wiki/PGP_word_list](https://en.wikipedia.org/wiki/PGP_word_list)

------
PaulRobinson
Why would you want this?

The entire point of UUIDs is I can quickly generate them knowing that they
will be universally unique, I don’t need to check for their existence
anywhere.

This dramatically increases the likelihood of collision to the point I can
almost certainly guarantee that they won’t be unique in any non-trivial
context.

~~~
flingo
Where is the entropy going?

This still has all 16 bytes from the generated UUID, and looks like it can
be converted back into a UUID.

~~~
debdut
yes, I will add an inversion, it's a bijection

~~~
jackhalford
The space of all uuids is larger than the space of all shakespearean uuids. So
by definition you can't biject them. You may be able to for your
subset of uuids in your project, but there is less entropy when using the
latter.

edit: It seems the shakespearean uuids are longer than uuids so I may be
completely wrong

~~~
function_seven
I skimmed the code and it looks like there’s a hole for every pigeon. Space
size appears to be equal.

------
kieckerjan
If I had the stamina to learn Shakespeare by heart I would start with the Bard
himself, thank you. :-)

------
Kevin605
Interesting project!

However, an online demo will be nice to have.

~~~
dogdot
[https://npm.runkit.com/uuid-readable](https://npm.runkit.com/uuid-readable)

------
traceroute66
Yawn, why re-invent the wheel ?

UUID = --> universally unique <-- identifier

Why reduce the entropy just to make it look pretty ?

As for the people who say oh, but I can't remember/recognise
"e0e93156-c68b-493d-bf31-19048db7dd9e"...

Well sure, but git invented that wheel before you. Just use the last 11
characters for local discussions/notes/whatever.

Finally, what about people who are non-native/fluent in English ?

"e0e93156-c68b-493d-bf31-19048db7dd9e" flows across borders and languages

"Romeo Romeo Where For Art Thou Romeo" could easily be meaningless gibberish
for a non-native/fluent English speaker, and also opens up totally
unnecessary issues of pronunciation and spelling.

There is also the possibility certain words might mean something different in
another language. Such as the famous Colgate in Spanish which means "go hang
yourself" !

~~~
debdut
Sorry, but information or entropy is not lost; it's a bijection. I will soon
add the inversion.

~~~
traceroute66
I'm not claiming to be the world's leading expert on UUIDs but you must be
losing something somewhere ? If it's not entropy, you must be opening yourself
up to collisions or something ?

~~~
antoinealb
Why ? It looks like those ShakespeareIDs (SIDs) are much longer than UUIDs. So
I am not sure how you reach the conclusion that the count of possible SIDs is
smaller than UUIDs. The author mentions it's a bijection, meaning for every
UUID you can generate its corresponding SID and vice versa. This, in turn,
means there is no higher probability of collision in the SID space than in the
UUID one.
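To make the bijection argument concrete, here is a small sketch. It is not the uuid-readable implementation: the slot widths in `SLOT_BITS` are made up, and each slot index stands in for "pick word number i from a list of 2**bits words". Because the widths sum to exactly 128, every UUID maps to one tuple of indices and back.

```python
import uuid

# Hypothetical bit widths, one per word slot in a sentence template.
SLOT_BITS = [12, 11, 14, 13, 5, 12, 11, 11, 14, 4, 6, 15]
assert sum(SLOT_BITS) == 128  # same number of holes as pigeons


def uuid_to_slots(u):
    """Split the 128-bit UUID integer into one index per slot."""
    n = u.int
    slots = []
    for bits in reversed(SLOT_BITS):
        slots.append(n & ((1 << bits) - 1))
        n >>= bits
    return list(reversed(slots))


def slots_to_uuid(slots):
    """Inverse of uuid_to_slots: pack the indices back into a UUID."""
    n = 0
    for bits, index in zip(SLOT_BITS, slots):
        n = (n << bits) | index
    return uuid.UUID(int=n)


u = uuid.UUID("e0e93156-c68b-493d-bf31-19048db7dd9e")
slots = uuid_to_slots(u)
print(slots)
print(slots_to_uuid(slots) == u)
```

The sentence is longer than the hex string, but it carries exactly the same 128 bits, so the collision probability is unchanged.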

~~~
debdut
True

~~~
chris_wot
Forgive me if I’m misunderstanding something, but these aren’t UUIDs, but a
SID that maps to a UUID?

