
Proquints: Identifiers That Are Readable, Spellable, and Pronounceable (2009) - ingve
https://arxiv.org/html/0901.4016
======
aasasd
English pronunciation mentioned together with the words “unique” and
“unambiguous”―is that a joke? Oh yeah, no way that people pronounce ‘u’ and
‘a’ similarly.

 _“There are 6 unambiguous 1-letter vowels”_ ―since when are vowels in English
unambiguous? English writing should properly be classified as logographic,
since you can't know how a word is pronounced from its spelling, or vice-
versa.

Has the author ever communicated his own middle and last names in a phone
call? A big benefit of numeric identifiers in the first place is that it's
easier to dictate them without error. Even in my native phonemic language,
with words or letters it's too often back to the Folks' Phonetic Alphabet:
_“No, D as in Dmitry.”_

~~~
yodon
The strings only have to be unique and pronounceable to be valuable, they
don't have to be uniquely pronounceable.

When scanning a list of records by eye looking for an entry or entries, it's
helpful for the identifiers to be pronounceable because that allows normal
people to hold more in their short term memory than they could if the strings
were all gibberish and each entry ended up consuming multiple short term
memory slots in the brain. Sure, there is a tiny chance of collision but that
chance is still there when scanning by eye and trying to find a GUID that
starts with the first few letters of the GUID you're looking for. At that
point you verify you actually have the one you want. And yes, there are lots
of occasions where people look at identifiers; that's why pronounceable
identifiers have value.

~~~
aasasd
I'm baffled as to what you imagine the use-case to be. Is that how e.g. IP
addresses are used, in your opinion―by finding them in an onscreen list? Or SS
numbers? Or credit card numbers?

~~~
zingermc
Replacing hexadecimal stack traces?

~~~
aasasd
As others mentioned, this is the least feasible use-case exactly because the
time window is short and the return for the introduction of the new scheme is
dubious.

------
lifthrasiir
See also the Bubble Babble encoding (2000) [1] that is essentially the same
idea. Note that they are only useful as _private_ identifiers because as
public identifiers you can't avoid them spelling profanity; many geocoding
systems entirely forwent vowels for that reason [2].

[1] [http://wiki.yak.net/589](http://wiki.yak.net/589)

[2] [https://github.com/google/open-location-code/wiki/Evaluation...](https://github.com/google/open-location-code/wiki/Evaluation-of-Location-Encoding-Systems)

~~~
femto113
I (re)discovered a similar scheme about 7 years ago for encoding numbers based
on the observation that 20 consonants + 5 vowels offers essentially the same
information density as base 10 numbers. Excluded 'c' because it seemed the
most redundant (with 'k' and 's'). I've found it particularly useful for
generating random temporary passwords that can be communicated verbally rather
than needing to be emailed.

[https://github.com/femto113/node-pronounceable](https://github.com/femto113/node-pronounceable)
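The arithmetic behind this is that 20 consonants × 5 vowels = 100 consonant-vowel syllables, exactly one per pair of decimal digits. A minimal Python sketch of that idea follows. It is not the code in node-pronounceable: the consonant list (every English consonant except 'c') and the mapping of a digit pair to `CONSONANTS[pair // 5] + VOWELS[pair % 5]` are assumptions made for illustration, and `encode_digits`, `decode_digits` and `random_password` are hypothetical names.

```python
import secrets

# Assumed alphabet: the 21 English consonants minus 'c' (20 letters) and the 5 vowels.
CONSONANTS = "bdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"


def encode_digits(number: int) -> str:
    """Spell a non-negative integer as consonant-vowel syllables, two decimal digits each."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    syllables = []
    for i in range(0, len(digits), 2):
        pair = int(digits[i:i + 2])  # 0..99
        syllables.append(CONSONANTS[pair // 5] + VOWELS[pair % 5])
    return "".join(syllables)


def decode_digits(word: str) -> int:
    """Invert encode_digits: each consonant-vowel pair is one base-100 digit."""
    if len(word) % 2:
        raise ValueError("expected whole consonant-vowel syllables")
    value = 0
    for i in range(0, len(word), 2):
        value = value * 100 + CONSONANTS.index(word[i]) * 5 + VOWELS.index(word[i + 1])
    return value


def random_password(syllables: int = 4) -> str:
    """Random pronounceable password; each syllable carries log2(100), about 6.6 bits."""
    return "".join(secrets.choice(CONSONANTS) + secrets.choice(VOWELS) for _ in range(syllables))


print(encode_digits(42), decode_digits(encode_digits(42)))
```

`encode_digits(42)` gives `mi`. Because each syllable is drawn from 100 equally likely options, a four-syllable `random_password()` carries about 26.6 bits.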

------
vsviridov
Interesting how the example IP address conversions end up sounding vaguely
Turkish or Arabic...
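The conversions follow a fixed bit layout: each 16-bit chunk becomes consonant-vowel-consonant-vowel-consonant, with 16 consonants (4 bits each) and 4 vowels (2 bits each), so an IPv4 address becomes two five-letter words. Below is a minimal Python sketch. It assumes the paper's alphabet `bdfghjklmnprstvz` / `aiou` and most-significant-word-first order, and the function names are hypothetical.

```python
import ipaddress

# Proquint alphabet: 16 consonants carry 4 bits each, 4 vowels carry 2 bits each,
# so consonant-vowel-consonant-vowel-consonant spells exactly 16 bits.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def encode_word(n: int) -> str:
    """Encode one 16-bit integer as a five-letter proquint word."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("a proquint word holds exactly 16 bits")
    return (
        CONSONANTS[(n >> 12) & 0xF]   # bits 15..12
        + VOWELS[(n >> 10) & 0x3]     # bits 11..10
        + CONSONANTS[(n >> 6) & 0xF]  # bits 9..6
        + VOWELS[(n >> 4) & 0x3]      # bits 5..4
        + CONSONANTS[n & 0xF]         # bits 3..0
    )


def encode(value: int, bits: int) -> str:
    """Encode an unsigned integer of `bits` bits (a multiple of 16), most significant word first."""
    if bits <= 0 or bits % 16 or not 0 <= value < (1 << bits):
        raise ValueError("value must fit in `bits`, a positive multiple of 16")
    shifts = range(bits - 16, -1, -16)
    return "-".join(encode_word((value >> s) & 0xFFFF) for s in shifts)


def encode_ipv4(dotted_quad: str) -> str:
    """Convert a dotted-quad IPv4 address to two proquint words."""
    return encode(int(ipaddress.IPv4Address(dotted_quad)), 32)


for addr in ["127.0.0.1", "140.98.193.141", "64.255.6.200", "12.110.110.204"]:
    print(addr, encode_ipv4(addr))
```

Run as-is, it prints `lusab-babad`, `mudof-sakat`, `haguz-biram` and `budov-kuras`, the same names that turn up in the reply below. `encode(value, 64)` would spell a 64-bit address as four words.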

~~~
twic
To me, they all sound like the names of lost cities from H. P. Lovecraft.

 _Not since the wretched dark sun last rose over Lusab-Babad, or the accursed
sorcerer-priests of Haguz-Biram last spilled human blood on their altars, had
such indescribable evil lurked in the shadows as now did amongst the filth-
ridden canals and dank buttresses of Budov-Kuras, Budov-Kuras which squatted
upon the forbidden ruins of Mudof-Sakat, even its name blotted from memory by
the horror of its crimes._

------
knubie
I think urbit uses something like this for its address space.

~~~
gwillen
I find Urbit's significantly more pronounceable. I wonder if it was inspired by
proquint.

~~~
lifthrasiir
Urbit's "phonetic base" is much more pronounceable because it always allocates
three carefully chosen letters per octet [1].

[1]
[https://stackoverflow.com/a/38175707/225272](https://stackoverflow.com/a/38175707/225272)

------
_nalply
Another idea: find 256 different syllables and use them to encode 8-bit
numbers. This way we can also avoid profanity (except for the rare case where
a string of multiple harmless syllables expresses a profanity).
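Spelling one byte per syllable is easy to sketch. It has the same three-letters-per-octet shape described for Urbit's phonetic base above. The table below is hypothetical: it is generated mechanically as 16 onsets × 4 vowels × 4 codas = 256 CVC syllables, so it does nothing about profanity. A real scheme would replace it with a hand-curated list of 256 entries.

```python
# Hypothetical syllable table: 16 onsets x 4 vowels x 4 codas = 256 CVC syllables,
# one per byte value. A real scheme would hand-curate this list (e.g. to avoid profanity).
ONSETS = "bdfghjklmnprstvz"
VOWELS = "aiou"
CODAS = "lnrs"
SYLLABLES = [o + v + c for o in ONSETS for v in VOWELS for c in CODAS]
INDEX = {s: i for i, s in enumerate(SYLLABLES)}
assert len(SYLLABLES) == len(INDEX) == 256  # every byte gets a distinct syllable


def encode_bytes(data: bytes) -> str:
    """Spell each byte as one three-letter syllable, joined with hyphens."""
    return "-".join(SYLLABLES[b] for b in data)


def decode_bytes(text: str) -> bytes:
    """Look each syllable up in the table to recover the original bytes."""
    if not text:
        return b""
    return bytes(INDEX[s] for s in text.split("-"))


print(encode_bytes(bytes([127, 0, 0, 1])))
```

`encode_bytes(bytes([127, 0, 0, 1]))` gives `lus-bal-bal-ban`. Every syllable has the same length, so the hyphens are only for readability. A decoder could split every three letters instead.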

~~~
gfody
Or find 65,536 words and use them to encode 16-bit numbers; then you could
leverage your existing word associations to memorize them.

------
renholder
NOTE: 2009 should be in the title.

> _Debuggers could input and output memory addresses as proquints as an
> alternative to hex._

...but why? In debuggers, you're not concerned with memorising the address of
an object as much as you are concerned with what the hell is mangling said
object. There's no "value-add" for proquints here. If you were live-debugging
in a session with someone else, sure, but how often are you generally doing
that?

> _Network tools such as browsers, ping, netstat, traceroute, etc. could input
> and output proquints as an alternative to dotted quads._

Again, what is the value-add? Since I've known 4-octet IP addresses and
numbers my whole life, it would take more effort to translate from proquints
to understand the output on the screen. Adding a translation layer in a
cut-over fashion instead of a phased-in fashion just seems like wholly
unnecessary overhead, only intended to further justify the proquint, yeah?

~~~
userbinator
Yes, that debugger example really confused me and made me wonder why he added
it. Anyone who has actually done live-debugging sessions will know that hex is
perfectly pronounceable, and furthermore is a direct mapping to binary.

~~~
renholder
True, even in post-mortem debugging (assuming I have the objects from the
heap), it's not like I'm going to exclude the addresses of objects in my
write-ups (or as they're exposed via commands, such as !sosex.mdso[0]).

In native code, if you're on the stack (i.e., not using pointers), then
addresses don't mean much of anything, and if you're using pointers, memory
optimisation means the address could mean feck-all after a cycle or two
(assuming you're looking at an iDNA/TTT[1]).

So, to agree with you, who are agreeing with me: It is _quite_ odd that this
would be a principal argument for proquints.

[0] - [https://github.com/lowleveldesign/debug-recipes/blob/master/...](https://github.com/lowleveldesign/debug-recipes/blob/master/debugging-using-windbg/sosex.help.txt)

[1] - [https://docs.microsoft.com/en-us/windows-hardware/drivers/de...](https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/time-travel-debugging-record)

------
rsync
"The suggested optional magic number prefix to a sequence of proquints is
"0q-"."

That is interesting - when I designed "Oh By Codes"[1] I chose "0x" as the
prefix to identify that sequence.

My entire life, every time I have seen a hex number I have pronounced the
prefix in my head as "Oh By ...", hence the name.

[1] [https://0x.co](https://0x.co)
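A magic prefix like the paper's "0q-" is what lets a tool accept proquints alongside other notations without guessing. Below is a minimal Python sketch that assumes the paper's proquint alphabet. `parse_identifier` is a hypothetical helper, and its "0x" branch means ordinary hexadecimal, not Oh By Codes.

```python
# Proquint alphabet: 4-bit consonants, 2-bit vowels.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def decode_word(word: str) -> int:
    """Decode one five-letter proquint word back into a 16-bit integer."""
    if len(word) != 5:
        raise ValueError(f"not a five-letter proquint word: {word!r}")
    value = 0
    for position, ch in enumerate(word):
        # Even positions are consonants (4 bits), odd positions are vowels (2 bits).
        alphabet, width = (CONSONANTS, 4) if position % 2 == 0 else (VOWELS, 2)
        index = alphabet.find(ch)
        if index < 0:
            raise ValueError(f"unexpected {ch!r} at position {position} of {word!r}")
        value = (value << width) | index
    return value


def parse_identifier(text: str) -> int:
    """Dispatch on the magic prefix: '0q-' for proquints, '0x' for ordinary hex."""
    text = text.strip().lower()
    if text.startswith("0q-"):
        value = 0
        for word in text[3:].split("-"):
            value = (value << 16) | decode_word(word)  # most significant word first
        return value
    if text.startswith("0x"):
        return int(text, 16)  # int() accepts the 0x prefix when base is 16
    raise ValueError(f"no recognised prefix in {text!r}")


print(parse_identifier("0q-lusab-babad") == parse_identifier("0x7f000001"))
```

Both `parse_identifier("0q-lusab-babad")` and `parse_identifier("0x7f000001")` return 2130706433, the integer form of 127.0.0.1.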

------
bitwize
What could possibly go wrong?
[https://news.ycombinator.com/item?id=10409837](https://news.ycombinator.com/item?id=10409837)

