
Base58 - mrzool
https://en.wikipedia.org/wiki/Base58
======
decentralised
Why base-58 instead of standard base-64 encoding?

\- Don't want 0OIl characters that look the same in some fonts and could be
used to create visually identical looking data.

\- A string with non-alphanumeric characters is not as easily accepted as
input.

\- E-mail usually won't line-break if there's no punctuation to break at.

\- Double-clicking selects the whole string as one word if it's all
alphanumeric.

[https://github.com/bitcoin/bitcoin/blob/master/src/base58.h](https://github.com/bitcoin/bitcoin/blob/master/src/base58.h)
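
The points above are easier to see with a minimal sketch of the scheme (plain Python for illustration; the alphabet matches Bitcoin's, but this is not the base58.h implementation):

```python
# Bitcoin's base58 alphabet: 0, O, I and l are deliberately absent.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    # Leading zero bytes are preserved as leading '1' digits.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    return "1" * zeros + out

def b58decode(text: str) -> bytes:
    zeros = len(text) - len(text.lstrip("1"))
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    return b"\x00" * zeros + body
```

Note that the output is purely alphanumeric, so it double-click-selects and survives e-mail line wrapping, exactly as the comment block says.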

~~~
athenot
This is a similar encoding to the one used for airline reservation numbers,
except this one also has lowercase characters. But the requirements are very
similar. It's amazing how much confusion is removed by suppressing ambiguities
between the number 0 vs. the letter O and the number 1 vs. the letter I.

For reservation numbers, doing away with case makes it easier to speak the
number over the phone.

~~~
decentralised
It's amazing how user experience can be affected from this deep in the stack.

------
diggan
If you can't get enough of Base-encodings, there is a project called
"multibase"
([https://github.com/multiformats/multibase](https://github.com/multiformats/multibase))
that lets you encode which Base-encoding you are using together with the data
itself, so it's easier to know what data you are receiving/sending.

The basic format is `<varint-base-encoding-code><base-encoded-data>` and the
"varint-base-encoding-code" comes from
[https://github.com/multiformats/multibase/blob/master/multib...](https://github.com/multiformats/multibase/blob/master/multibase.csv)
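
A toy sketch of the idea (Python for illustration; the 'f' = base16 and 'm' = base64-without-padding codes come from the multibase table, and the varint generalization is ignored here since the common codes are a single printable byte):

```python
import base64
import binascii

# Single-character multibase prefixes (see multibase.csv):
# 'f' is lowercase base16, 'm' is base64 without padding.
def multibase_encode(code: str, data: bytes) -> str:
    if code == "f":
        return "f" + binascii.hexlify(data).decode()
    if code == "m":
        return "m" + base64.b64encode(data).decode().rstrip("=")
    raise ValueError("unsupported base code: " + code)

def multibase_decode(text: str) -> tuple[str, bytes]:
    code, payload = text[0], text[1:]
    if code == "f":
        return code, binascii.unhexlify(payload)
    if code == "m":
        # re-add the stripped base64 padding before decoding
        return code, base64.b64decode(payload + "=" * (-len(payload) % 4))
    raise ValueError("unsupported base code: " + code)
```

The receiver only has to look at the first byte to know how to decode the rest.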

------
dtf
Not to be confused with Base85.

[https://en.wikipedia.org/wiki/Ascii85](https://en.wikipedia.org/wiki/Ascii85)

(used with various alphabets for denser binary encoding in PDF, git binary
patches, and the scene files of the Arnold renderer).

~~~
stcredzero
In my personal project, I'm using my own Base92 encoding. I wonder why Base85
stops at 85? In particular, you can implement Base92 with one table lookup and
one conditional exception. (Which could also be implemented as a table lookup
to save branch prediction misses.)

------
TazeTSchnitzel
Reminds me of Crockford's
[https://www.crockford.com/wrmg/base32.html](https://www.crockford.com/wrmg/base32.html)

~~~
nayuki
When the new SegWit feature was introduced to Bitcoin, it came with a new
address format called Bech32.
[https://github.com/bitcoin/bips/blob/master/bip-0173.mediawi...](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki)

------
eknkc
Take a look at the Node.js library ‘base-x’

[https://github.com/cryptocoinjs/base-x](https://github.com/cryptocoinjs/base-x)

Which can do base58 and any base you throw at it using the same algorithm.
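
The algorithm is the same repeated divmod no matter which alphabet you supply; a rough sketch of the idea (in Python rather than the library's JavaScript):

```python
def encode_any_base(data: bytes, alphabet: str) -> str:
    """Encode bytes in base len(alphabet); works for any alphabet."""
    base = len(alphabet)
    # base-x convention: leading zero bytes become the zeroth digit
    zeros = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, base)
        out = alphabet[rem] + out
    return alphabet[0] * zeros + out
```

With a 16-character alphabet this degenerates to hex; with Bitcoin's 58-character alphabet it is base58.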

------
manigandham
Neat. Side note: there's another project called HashIds, with implementations
in many languages, for encoding integers (or arrays of integers) into simple
alphanumeric strings and decoding them back again.

[https://hashids.org/](https://hashids.org/)

------
adambrenecki
Another neat encoding is the base-20 character set that Open Location Codes
("plus codes") use:

> The characters that are used in Open Location Codes were chosen by computing
> all possible 20 character combinations from 0-9A-Z and scoring them on how
> well they spell 10,000 words from over 30 languages. This was to avoid, as
> far as possible, Open Location Codes being generated that included
> recognisable words. The selected 20 character set is made up of
> "23456789CFGHJMPQRVWX".

[https://github.com/google/open-location-code/blob/master/doc...](https://github.com/google/open-location-code/blob/master/docs/olc_definition.adoc#open-location-code)

------
wizawu
I've replaced UUID v4 with nanoid + base58 in the last several projects.

[https://www.npmjs.com/package/nanoid-base58](https://www.npmjs.com/package/nanoid-base58)

~~~
GordonS
What's the advantage here?

------
ur-whale
Base58 isn't a great choice from an implementation point of view.

It basically requires a bignum library (or a specialized version that can do
division of arbitrarily long numbers by 58), and conversion is slow as hell.

Either you end up pulling in a full bignum library, or you have to use
error-prone, clunky, specialized code you don't have the time to understand
and verify.

Of all the design choices Satoshi made, this is far from the best one.

~~~
eesmith
The implementations at
[https://github.com/search?q=Base58](https://github.com/search?q=Base58) don't
look like they use a bignum package, nor do they look particularly slow or
complicated. The main encoding loop for
[https://github.com/luke-jr/libbase58/blob/master/base58.c](https://github.com/luke-jr/libbase58/blob/master/base58.c)
is:

    for (i = zcount, high = size - 1; i < binsz; ++i, high = j)
    {
        for (carry = bin[i], j = size - 1; (j > high) || carry; --j)
        {
            carry += 256 * buf[j];
            buf[j] = carry % 58;
            carry /= 58;
        }
    }
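
That loop is just long division of the byte string by 58, one input byte at a time; a rough Python transliteration (mine, not from the library) shows that no bignum type is involved:

```python
def base58_digits(data: bytes) -> list[int]:
    """Repeatedly divide the byte string by 58, one input byte at a
    time, as the C loop above does; no big-integer type is needed."""
    digits = [0]  # little-endian base-58 digits
    for byte in data:
        carry = byte
        for i in range(len(digits)):
            carry += digits[i] << 8    # digits[i] * 256
            digits[i] = carry % 58
            carry //= 58
        while carry:                   # grow the digit buffer as needed
            digits.append(carry % 58)
            carry //= 58
    return digits[::-1]  # most significant digit first
```

(Leading zero bytes are handled separately via zcount in the C version, just as they are here by the caller.)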

~~~
ur-whale
There's a divmod by 58 in there. That's dog slow compared to the shifts that a
power-of-two base would require - even when the compiler gets smart about it
because it's a constant.

~~~
ChrisLomont
That divmod can be replaced with table walking like CRC code does for division
and remainder of fixed values, which is certainly fast, and does not require
bignums. Unless you now want to claim all CRC code is also slow as hell or
requires bignums. These are solved problems.

Many compilers even expand the code inline, turning fixed-size divides into a
mult and shift.
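
For the constant 58 the trick is easy to demonstrate (Python for illustration; the magic constant below is worked out by hand, not lifted from any compiler's output):

```python
# Replace x // 58 with a multiply and a shift.  M = floor(2**37/58) + 1,
# and 58*M - 2**37 = 10, so the rounding error 10*x/(58 * 2**37) stays
# below 1/58 for every x < 2**32: the result is exact for 32-bit inputs.
SHIFT = 37
MAGIC = (1 << SHIFT) // 58 + 1  # 2369637129

def divmod58(x: int) -> tuple[int, int]:
    q = (x * MAGIC) >> SHIFT    # one multiply and one shift, no divide
    return q, x - 58 * q        # remainder via multiply-subtract
```

This is essentially what the compiler emits for a division by the constant 58.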

Here's a four-year-old Hacker News thread on compiler tricks from back then
[1]. They're even better now.

[1]
[https://news.ycombinator.com/item?id=8368639](https://news.ycombinator.com/item?id=8368639)

~~~
ur-whale
>Many compilers even expand the code inline, turning fixed-size divides into a
mult and shift.

This is what I meant when I said:

>even when the compiler gets smart about it because it's a constant.

~~~
ChrisLomont
You also claimed it's dog slow, and earlier you claimed it requires a bignum
lib. All of these claims are wrong. Many compiler/architecture combinations can
reduce the divmod to one asm instruction, which on some architectures is very
quick. If that is slower than the shift/mult trick, the compiler can choose to
do that instead.

Go look at some decent implementations and the assembly they generate. Don't
spread unfounded or unchecked claims.

~~~
nkurz
It's OK for different people to have different impressions about the speed of
dogs. On x64 for 64-bit registers, built-in assembly DIV is about 10 times
slower than MUL, and 10 times faster than a lookup from RAM. Whether this
makes it fast or slow depends on what you are trying to do.

I'm interested in your comment that "some architectures" have a fast divmod.
Do you actually know of one where it's faster than a shift and a
multiplication? Or the same? We're finishing up a paper where this would be
very relevant information.

~~~
ChrisLomont
Intel flavors often do both with a single idiv instruction. Agner Fog has
performance tables for many variants [1]. I’d guess a few of them pipeline down
to a per-loop cost similar to shift and add.

I suppose if you’re writing a paper you’re aware of quite a bit of literature
on exactly this problem. Recent papers have quite fast methods to do this.
I’ve not looked at recent state of the art to see if 58 has a near zero cost
divmod, but numbers of many forms do. I’d not be surprised if state of the art
has it reduced to a few non-mem access branchless no division instructions on
most architectures.

Maybe later I’ll poke at 58 and see what I can design. I’ve made quite a few
such algorithms over the years.

[1]
[https://www.agner.org/optimize/instruction_tables.pdf](https://www.agner.org/optimize/instruction_tables.pdf)

~~~
nkurz
I feel comfortable with the x64 approaches, but am much less familiar with the
efficiency of other architectures. The paper also benchmarks ARM (which is
relatively faster) and Power 8 (which I don't understand well). I'd be
particularly interested in knowing if any other architectures are much faster
for division/modulus (which would weaken the paper) or much slower (which
would strengthen it).

I might be exposing my ignorance, but what's 58 in this context? Is this an
ARM Cortex series, or something else?

~~~
BeeOnRope
FWIW I don't think other architectures are significantly faster than x86 for
the div or divmod instruction. The GP said that "it can be reduced to a single
assembly instruction" which is certainly true, but this one instruction is
_dog slow_ and produces 30+ uops[1]. Which is exactly why compilers choose to
flip fixed divides to a multiplication and a series of corrections that often
add up to ~10 instructions but even then it's still worth it.

Divide is a hard problem and I'm not aware of any breakthroughs. It's been
getting faster, but it's still 10 times slower (at least) than multiplication as you
point out. Intel has thrown hardware at it a few generations back, speeding it
up (it used to be over 100 cycles), and AMD has a fairly fast divider in Ryzen
as well (for many inputs faster than Intel).

I expect POWER to be in the same range. ARM architectures are all over the
power/performance map, obviously, but for the biggest/fastest, I would guess
they are within a factor of 2, either way, of contemporary x86
implementations.

\---

[1] This point is important: it means you can't get as much work done while
the division is happening. There are other slow instructions, such as floating
point div, sqrt, transcendental and trig functions. However, these functions
are only a single uop and then occupy only their respective execution unit for
the duration, so you can use the rest of the EUs at full speed. So if you only
need to do one every 50 cycles or so, and have work to do in the meantime,
they can also be free. idiv is not like that: it spews out many uops which
compete for the same EUs as the rest of your code.

~~~
ChrisLomont
IDIV on Skylake-X is 10 uops [1,p246], not 30+. Some other Intel chips are
even lower (Goldmont is 3 uops, for example, but slower overall, Piledriver is
2 uops for r32, etc.). IMUL is often 2-3 uops on the same chips.

However, uops are not the bottleneck, pipelined throughput is what matters for
performance. Multiple IDIV can be in flight at once on different register sets
(real or virtual). Register renaming gives even more room.

A compiler can unroll the loop, use different registers (or the CPU can use
renaming), and pipeline them, getting better throughput. r16 IMUL has a
relative throughput of 2, r16 IDIV then has a relative throughput of 6. This
is 3 times slower, not 10. Pretty much any other 10 instructions are going to
be slower, not faster as you claim.

Compared to the overhead of looping and memory access, these are not the
bottleneck. These represent a small amount of the time to do the computation.

Do some timing and check. IDIV is not what it once was.

[1]
[https://www.agner.org/optimize/instruction_tables.pdf](https://www.agner.org/optimize/instruction_tables.pdf)

~~~
BeeOnRope
To be clear, I'm generally talking about machine-width operations, in this
case 64-bit output, not smaller operations like 8 or 16 bits.

For Skylake, we have 36 uops and 35-88 cycles of latency for a 64-bit div, and
for idiv it is even worse: 57 uops and 42 to 95 cycles of latency. These can
execute one every 20 to 80 cycles, so the throughput is fully _twenty to
eighty_ times worse than 64-bit multiplication.

> IMUL is often 2-3 uops on the same chips.

On Intel chips multiplication is almost always 1 uop if you need a 64-bit
result (this is what a multiply in C will compile to), and 2 uops if you need
the full 128-bit result. In either case it can sustain one every cycle.

> However, uops are not the bottleneck, pipelined throughput is what matters
> for performance. Multiple IDIV can be in flight at once on different
> register sets (real or virtual). Register renaming gives even more room.

How do you know uops are not the bottleneck? They often are: it depends on the
surrounding code.

More importantly, div is slow in every way that could matter: if latency is
the bottleneck, div sucks (20 to 80 cycles, vs 3 for multiply). If uop
throughput is the bottleneck, div sucks. If pure div throughput is the
bottleneck (as you are suggesting), div _still_ sucks: the inverse throughput
for 64-bit div is 21 to 83 cycles: almost the same as the latency, so it is
barely pipelined at all, and fully 21 to 83 times slower than multiplication.

Now one might say that you are mostly interested in 32-bit values, as these
are common even in 64-bit code, and Intel has good support for them.

In this case div is quite a bit faster, but still dramatically slower than
multiplication. In latency terms it is 26 cycles on Skylake, versus 3 for
multiplication, so about 9 times slower. In "pipelined throughput" terms, it
is 6 cycles per division, versus 1 per multiplication, so 6 times slower.

As a summary, we can say that on the most recent Intel, which probably have
the fastest dividers around (and which have seen recent improvements), 64-bit
division is anywhere from 11 to 90 times slower than multiplication, depending
on how you measure it (and the input values), and 32-bit division is between 6
and 9 times slower, depending on how you measure it. I think my "10 times
slower", which falls towards the faster end of that range, is actually quite
conservative!

On chips older than Skylake (which is the vast majority of chips you'll still
find in datacenters and in the cloud, since SKX is quite recent), the
situation is worse for div, since mul had the same 1/3 latency/tput
performance, but div was slower.

> Do some timing and check. IDIV is not what it once was.

Here, I agree.

It _has_ gotten much faster! In other words, it has gone from shockingly slow
to merely very slow. It is still way slower than multiplication in every
respect (which itself has gotten faster to the point where it has 1-cycle
throughput now), and almost all of the tricks to avoid actual div instructions
still apply.

------
classichasclass
I'm actually using base 58 to encode numbers for the gopher URL shortener with
the Ripple alphabet. It works as advertised, and is great for an old protocol
like this.

gopher://fld.gp/1/gopher/shorten (shortened URL example: gopher://fld.gp/1j/r
or [http://fld.gp/1j/r](http://fld.gp/1j/r) )

------
adzm
I love interesting representations of binary data, base 58 included, but this
article is so scant on details. I'm sure someone could add a brief overview of
the algorithm, samples, and its limitations and idiosyncrasies, such as how it
handles things like padding.

~~~
lifthrasiir
I think the article is good as is, because base58 is actually pretty
dissimilar to most binary-to-text encodings. Surprisingly many encodings are
just base 2^k (16, 32, 64 and 128 [1]), some have converged to base 85
(because 85^5 / 2^32 ~= 1.03, i.e. just enough to represent 4 octets in 5
units). This is because binary-to-text encodings are not expected to use
costly bigint operations for an arbitrary-length payload. Base36 and base58
are the only common exceptions, and for this reason I actually think that they
should be termed "bigint-to-text" encodings.
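
The 4-octets-in-5-units arithmetic is easy to check (plain Python, digit values only, independent of any particular base-85 alphabet):

```python
# Five base-85 digits cover 32 bits: 85**5 = 4437053125 >= 2**32,
# overshooting by only ~3%, so each 4-byte group maps to 5 symbols
# using fixed-size integer arithmetic -- no bigint needed.
def group_to_base85(group: bytes) -> list[int]:
    """Encode exactly four octets as five base-85 digit values."""
    assert len(group) == 4
    n = int.from_bytes(group, "big")  # fits in 32 bits
    digits = []
    for _ in range(5):
        n, rem = divmod(n, 85)
        digits.append(rem)
    return digits[::-1]  # most significant digit first
```

Because the group size is fixed, the cost stays linear in the payload length, unlike the quadratic bigint-style encodings.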

Given that base58 is actually just a big number with translated digits, the
list of alphabets (actually, digits) is almost enough to decode base58
encodings. There is always an exception, of course: Bitcoin base58 calculates
a minimal length from the original payload length so that the output can be
"zero"-padded with the character 1 (which is the first alphabet entry and thus
acts as a digit zero) [2]. This preserves the original payload length, but the
encoding is still quadratically expensive and should only be used for
reasonably short, known-length data like hashes.

[1] This is actually not normally viable as a binary-to-text encoding, but
base122
([http://blog.kevinalbs.com/base122](http://blog.kevinalbs.com/base122)) does
use this to exploit UTF-8 encoding.

[2]
[https://en.bitcoin.it/wiki/Base58Check_encoding](https://en.bitcoin.it/wiki/Base58Check_encoding)
(note that I've used a significantly simpler explanation here; the gist is the
same, however)

~~~
eesmith
"Base36 and base58 are the only common exceptions"

Isn't Ascii85/Base85 at least as common as those? Quoting the Wikipedia entry
for it, "Its main modern uses are in Adobe's PostScript and Portable Document
Format file formats, as well as in the patch encoding for binary files used by
Git"

Python includes it as part of the standard library. Using the example from the
WP entry:

    
    
      >>> s = r"""<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
      ... O<DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
      ... i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
      ... l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
      ... >uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>"""
      >>> import base64
      >>> base64.a85decode(s, adobe=True)
      b'Man is distinguished, not only by his reason, but by this singular
      passion from other animals, which is a lust of the mind, that by a
      perseverance of delight in the continued and indefatigable
      generation of knowledge, exceeds the short vehemence of any carnal
      pleasure.'

~~~
lifthrasiir
I have explicitly mentioned this:

> [...] some have converged to base 85 (because 85^5 / 2^32 ~= 1.03, i.e. just
> enough to represent 4 octets in 5 units).

Base 85 does _not_ use bigint, it is just a clever approximation of optimal
encoding with 85 symbols.

~~~
eesmith
I think I was confused between "baseX" as the specific program name and 'base
X' as the base of the specific numbering system.

------
_eht
I'm more of a fan of BasE91 and I believe more people should be using it.

~~~
gpvos
Why? It's only a minuscule improvement over ASCII85, which is a
well-established encoding. I couldn't find a rationale for it anywhere.

------
gallerdude
There are many dimensions to utility.

------
carapace

        oO0 I1l B8 S5 b6G

