
Ask HN: Why not use a higher base than Hex for UUIDs if they're just strings? - ralusek
If we're storing and sending them as strings, why not take up less space by bumping up the radix to harness more available characters? Going from hex to Base64 alone would be a seemingly obvious place to start, since Base64 is already a well established radix specifically for leveraging the fact that strings have quite a few characters available to them.

With Hexadecimal, we get 4 bits per character. With Base64, we get 6 bits per character. To keep UUIDs at 128 bits, we'd go from 32 to 22 characters in length.

So why do we use hexadecimal representation when it is literally just a string, and could be anything? Even keeping the radix in powers of 2 is kind of unnecessary if these are being stored and transmitted as strings, the radix at that point just becomes a mechanism for allowing additional combinations. We could use any arbitrary number of characters per digit and it would serve that purpose. There are 91 ASCII characters available, so what gives?

Additionally, why the dashes? If we reserve digits for meaning per standard, such as UUIDv1 MAC address + time, why not just go off of digit position? In a string, that dash takes up just as much wasted space as the additional characters which are providing data utility.
======
ProblemFactory
> With Hexadecimal, we get 4 bits per character. With Base64, we get 6 bits
> per character. To keep UUIDs at 128 bits, we'd go from 32 to 22 characters
> in length. ... Additionally, why the dashes? In a string, that dash takes up
> just as much wasted space as the additional characters which are providing
> data utility.

Instead of base64, sensible databases and libraries store UUIDs as "base256".
Actual binary strings of 128 bits = 16 bytes, not printable ASCII. That is the
shortest possible representation, with zero wasted space.

The hexadecimal representation is only used for displaying UUIDs to humans. It
doesn't have to be efficient, it has to be helpful.
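
To put numbers on the binary-versus-display distinction, here is a minimal Python sketch using only the standard `uuid` and `base64` modules. The UUID value is an arbitrary example picked for reproducibility, not something from this thread.

```python
import base64
import uuid

# Arbitrary example value, fixed only so the sizes are reproducible.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

raw = u.bytes            # 16 raw bytes: what a binary column would hold
hex_plain = u.hex        # 32 hex characters, no dashes
canonical = str(u)       # 36 characters: 32 hex digits plus 4 dashes
# Standard Base64 of 16 bytes is 24 characters, 2 of which are "=" padding.
b64 = base64.b64encode(raw).decode("ascii").rstrip("=")

sizes = {
    "binary": len(raw),
    "hex": len(hex_plain),
    "canonical": len(canonical),
    "base64": len(b64),
}
print(sizes)
```

The Base64 form matches the 22 characters mentioned in the question, but the raw 16 bytes are still shorter than any printable encoding.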

For UUID version 1, the dashes separate clock, counter and MAC address parts
in the data. With hex encoding it is easy to convert to/from binary and for
"popular numbers" even to decimal in your head without running it through a
baseXX converter.
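
As a rough illustration of how the dash-separated groups line up with the version 1 fields, the sketch below splits a known v1 UUID (the DNS namespace UUID from RFC 4122) and compares each group with the attributes exposed by Python's `uuid.UUID`. The variable names are just for this example.

```python
import uuid

# The RFC 4122 DNS namespace UUID, which happens to be a version-1 value.
u = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

# The five dash-separated groups of the canonical form.
time_low, time_mid, time_hi_version, clock_seq, node = str(u).split("-")

# Each hex group converts directly to the corresponding integer field.
fields_match = (
    int(time_low, 16) == u.time_low
    and int(time_mid, 16) == u.time_mid
    and int(time_hi_version, 16) == u.time_hi_version
    and int(node, 16) == u.node
)

# The 4th group also carries the variant bits in its top two bits,
# so mask them off to get the 14-bit clock sequence.
clock_seq_value = int(clock_seq, 16) & 0x3FFF

# The last group reads directly as a MAC-style address.
mac = ":".join(node[i:i + 2] for i in range(0, 12, 2))

print(u.version, fields_match, clock_seq_value == u.clock_seq, mac)
```

Because every hex digit is exactly 4 bits, each group can be read off by eye without any base conversion, which is the readability argument made above.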

For UUID version 4, where the data is all completely random, it's less useful.
All you can do with it is compare two UUIDs. So there perhaps the human-
friendly representation could do without dashes and use a larger base.

------
candeira
Bitcoin, Ripple, Flickr UIDs already use something called base58, which is
like Base64 but with ambiguous characters removed.

IPFS is one of several projects adopting the scheme.

[https://en.wikipedia.org/wiki/Base58](https://en.wikipedia.org/wiki/Base58)
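
For a sense of what that looks like applied to a UUID, here is a minimal sketch that encodes the 128-bit integer with the Bitcoin-style base58 alphabet. It is a plain integer-to-base58 conversion, not Bitcoin's Base58Check (no checksum, no leading-zero-byte handling), and the helper names `b58encode_int` and `b58decode_int` are made up for this example.

```python
import uuid

# Bitcoin-style base58 alphabet: digits and letters without 0, O, I and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_int(n: int) -> str:
    """Encode a non-negative integer as a base58 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def b58decode_int(s: str) -> int:
    """Decode a base58 string back to an integer."""
    n = 0
    for ch in s:
        n = n * 58 + ALPHABET.index(ch)
    return n


# Arbitrary example value.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
short = b58encode_int(u.int)
print(short, len(short))
```

Any 128-bit value fits in at most 22 base58 characters, the same length as unpadded Base64, while avoiding punctuation and look-alike characters.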

------
viraptor
That's the common representation for easy readability and interop. In other
words - that's how the standard defines it. You can store the uuids in
whatever form you prefer. Even directly as binary chunks in your database.

In practice, saving those few bytes will usually be just noise compared to the
rest of the data (unless you're storing primarily uuids, but then you
should've chosen a better solution)

------
mattbillenstein
I think the point is it's a standard documented format -- you can probably
come up with something much shorter if you don't care about this particular
format.

We've been using a custom 12-character format where the first character is a
type and the last 11 characters are a random int encoded in a base-62 alphabet
(url safe, no punctuation, etc) -- this gives us ~17 billion billion
combinations -- almost 64 bits.

~~~
mattbillenstein
Correction, our alphabet is base56 -- basically base58 minus 1 and o...
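
A sketch of what such a scheme might look like, under stated assumptions: "base58 minus 1 and o" is taken to mean the Bitcoin-style base58 alphabet with exactly those two characters dropped, and the function name `new_id` and the type character are hypothetical. Drawing 11 characters uniformly is equivalent to encoding a uniformly random integer below 56^11.

```python
import math
import secrets

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Assumption: the base56 alphabet is base58 without "1" and "o".
ALPHABET = "".join(c for c in BASE58 if c not in "1o")


def new_id(type_char: str) -> str:
    """Return a 12-character ID: one type character plus 11 random characters."""
    if len(type_char) != 1:
        raise ValueError("type_char must be a single character")
    body = "".join(secrets.choice(ALPHABET) for _ in range(11))
    return type_char + body


combinations = len(ALPHABET) ** 11   # size of the random part's space
bits = math.log2(combinations)       # equivalent entropy in bits

print(new_id("u"), combinations, round(bits, 2))
```

With 56 symbols the random part comes to roughly 1.7e19 combinations, a little under 64 bits, which is consistent with the figures quoted above.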

------
tyingq
Base64 includes slashes, which is what bit Let's Encrypt.
[https://community.letsencrypt.org/t/may-19-2017-ocsp-and-iss...](https://community.letsencrypt.org/t/may-19-2017-ocsp-and-issuance-outage-postmortem/34922)
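
To illustrate the slash issue in general terms (this is not a reconstruction of the incident above), the sketch below encodes a deliberately chosen 16-byte value with Python's standard and URL-safe Base64 variants. The all-0xFF input is an assumption picked because it maximizes the number of `/` characters.

```python
import base64

# 16 bytes of 0xFF: every full 6-bit group is 63, which standard Base64 maps to "/".
raw = bytes([0xFF]) * 16

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)
print(url_safe)
```

The URL-safe variant substitutes `-` and `_` for `+` and `/`, so the encoded value can sit in a path or filename without being mistaken for a separator.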

------
kevinherron
UUIDs (v4) are _displayed_ as hex strings with dashes.

Each one is really just a 128-bit number.
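
A quick way to see this in Python, as a minimal sketch with the standard `uuid` module: the integer and the hex display are two views of the same value.

```python
import uuid

u = uuid.uuid4()          # random version-4 UUID
n = u.int                 # the underlying integer

same = uuid.UUID(int=n)   # rebuild the UUID from the bare number
shown = f"{n:032x}"       # format the number as 32 zero-padded hex digits

print(n.bit_length() <= 128, same == u, shown == u.hex)
```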

