Hacker News new | comments | show | ask | jobs | submit login
Base58 (wikipedia.org)
95 points by mrzool 9 days ago | hide | past | web | favorite | 43 comments

Why base-58 instead of standard base-64 encoding?

- Don't want 0OIl characters that look the same in some fonts and could be used to create visually identical looking data.

- A string with non-alphanumeric characters is not as easily accepted as input.

- E-mail usually won't line-break if there's no punctuation to break at.

- Double-clicking selects the whole string as one word if it's all alphanumeric.


This is a similar encoding as airline reservation numbers, except it also has lowercase characters. But the requirements are very similar. It's amazing how much confusion is removed by suppressing ambiguities between number 0 vs. letter O and number 1 vs. letter I.

For reservation numbers, doing away with case makes it easier to speak the number over the phone.

It's amazing how user experience can be affected from this deep in the stack.

It might be interesting to see a form of encoding that removes similar-sounding letters/numbers as well. No more "A" or "J", "B" or "D", "3" or "T" confusion. Just stuff with a distinct sound!

Crockford base32?

Knocking out similar characters is a good idea but could be done to base32 to make it a little handier. I don't feel adding case sensitivity is worth a 15% reduction in string length.

If you can't get enough of Base-encodings, there is a project called "multibase" (https://github.com/multiformats/multibase) that lets you encode which Base-encoding you are using, together with the data itself, so it's easier to know what data you are received/sending.

The basic format is `<varint-base-encoding-code><base-encoded-data>` and the "varint-base-encoding-code" comes from https://github.com/multiformats/multibase/blob/master/multib...

Not to be confused with Base85.


(used with various alphabets for denser binary encoding in PDF, git binary patches, and the scene files of the Arnold renderer).

In my personal project, I'm using my own Base92 encoding. I wonder why Base85 stops at 85? In particular, you can implement Base92 with one table lookup and one conditional exception. (Which could also be implemented as a table lookup to save branch prediction misses.)

When the new SegWit feature was introduced to Bitcoin, it came with a new address format called Bech32. https://github.com/bitcoin/bips/blob/master/bip-0173.mediawi...

Take a look at the node js library ‘base-x’


Which can do base58 and any base you throw at it using the same algorithm.

Neat. Side note, there's another project called HashIds with many language implementations for encoding integers (or array of integers) into simple alphanumeric strings, and decoding back again.


Another neat encoding is the base-20 character set that Open Location Codes ("plus codes") use:

> The characters that are used in Open Location Codes were chosen by computing all possible 20 character combinations from 0-9A-Z and scoring them on how well they spell 10,000 words from over 30 languages. This was to avoid, as far as possible, Open Location Codes being generated that included recognisable words. The selected 20 character set is made up of "23456789CFGHJMPQRVWX".


I've replaced UUID v4 with nanoid + base58 in the last several projects.


What's the advantage here?

Base58 isn't a great choice from an implementation point of view.

It basically requires a bignum library (or a specialized version that can do divisions of arbitrary long numbers by 58), and conversion is slow as hell.

Either you end up pulling in a full bignum library, or you have to use error-prone, clunky specialized code you don't have time to try and understand and verify.

Of all the design choices Satoshi made, this is far from the best one.

The implementations at https://github.com/search?q=Base58 don't look like they use a bignum package, nor do that look particularly slow or complicated. The main encoding loop for https://github.com/luke-jr/libbase58/blob/master/base58.c is:

    for (i = zcount, high = size - 1; i < binsz; ++i, high = j)
		for (carry = bin[i], j = size - 1; (j > high) || carry; --j)
			carry += 256 * buf[j];
			buf[j] = carry % 58;
			carry /= 58;

There's a divmod by 58 in there. That's dog slow compared to the shifts a power of two basis would require - even when the compiler gets smart about it because it's a constant.

That divmod can be replaced with table walking like CRC code does for division and remainder of fixed values, which is certainly fast, and does not require bignums. Unless you now want to claim all CRC code is also slow as hell or requires bignums. These are solved problems.

Many compilers even expand the code inline to generate fix size divides into mult and shift.

Here's a 4 year old Hacker news thread on compiler tricks then [1]. They're even better now.

[1] https://news.ycombinator.com/item?id=8368639

> Many compilers even expand the code inline to generate fix size divides into mult and shift.

Specificly, u8 divmod 58 can be reduced to a u8->u16 multiply, a right shift, and three conditional subtractions; that's not great, but on a modern CPU it's a afterthought compared to the quadratic loop over the input size.

>Many compilers even expand the code inline to generate fix size divides into mult and shift.

This is what I meant when I said:

>even when the compiler gets smart about it because it's a constant.

You also claimed it's dog slow, and earlier you claimed it requires a big num lib. All of these are wrong. Many compiler/architecture combinations can reduce the divmod to one asm instruction, that on some architectures is very quick. If it is slower than shift/mult trick, that compiler can choose to do that.

Go look at some decent implementations and the assembly they generate. Don't spread unfounded or unchecked claims.

It's OK for different people have different impressions about the speed of dogs. On x64 for 64-bit registers, built-in assembly DIV is about 10 times slower than MUL, and 10 times faster than a lookup from RAM. Whether this makes it fast or slow depends on what you are trying to do.

I'm interested in your comment that "some architectures" have a fast divmod. Do you actually know of one where it's faster than a shift and a multiplication? Or the same? We're finishing up a paper where this would be very relevant information.

Intel flavors often do both with a single idiv instruction. Agner Fog has performance tables for many variants [1]. I’d guess a few pipeline to similar per loop cost of shift and add.

I suppose if you’re writing a paper you’re aware of quite a bit of literature on exactly this problem. Recent papers have quite fast methods to do this. I’ve not looked at recent state of the art to see if 58 has a near zero cost divmod, but numbers of many forms do. I’d not be surprised if state of the art has it reduced to a few non-mem access branchless no division instructions on most architectures.

Maybe later I’ll poke at 58 and see what I can design. I’ve made quite a few such algorithms over the years.

[1] https://www.agner.org/optimize/instruction_tables.pdf

I feel comfortable with the x64 approaches, but am much less familiar with the efficiency of other architectures. The paper also benchmarks ARM (which is relatively faster) and Power 8 (which I don't understand well). I'd be particularly interested in knowing if any other architectures are much faster for division/modulus (which would weaken the paper) or much slower (which would strengthen it).

I'm might be exposing my ignorance, but what's 58 in this context? Is this an ARM Cortex series, or something else?

FWIW I don't think other architectures are significantly faster than x86 for the div or divmod instruction. The GP said that "it can be reduced to a single assembly instruction" which is certainly true, but this one instruction is dog slow and produces 30+ uops[1]. Which is exactly why compilers choose to flip fixed divides to a multiplication and a series of corrections that often add up to ~10 instructions but even then it's still worth it.

Divide is a hard problem and I'm not aware of any breakthroughs. It's been getting faster, but it's still 10 slower (at least) than multiplication as you point out. Intel has thrown hardware at it a few generations back, speeding it up (it used to be over 100 cycles), and AMD has a fairly fast divider in Ryzen as well (for many inputs faster than Intel).

I expect POWER to be in the same range. ARM architectures are all over the power/performance map, obviously, but for the biggest/fastest, I would guess they are within a factor of 2, either way, of contemporary x86 implementations.


[1] This point is important: it means you can't get as much work done while the division is happening. There are other slow instructions, such as floating point div, sqrt, transcendental and trig functions. However, these functions are only a single uop and then occupy only their respective execution unit for the duration, so you can use the rest of the EUs at full speed. So if you only need to do one every 50 cycles or so, and have work to do in the meantime, they can be also free. idiv is not like that: it spews out many uops which compete for the same EUs as the rest of your code.

IDIV on Skylake-X is 10 uops [1,p246], not 30+. Some other Intel chips are even lower (Goldmont is 3 uops, for example, but slower overall, Piledriver is 2 uops for r32, etc.). IMUL is often 2-3 uops on the same chips.

However, uops are not the bottleneck, pipelined throughput is what matters for performance. Multiple IDIV can be in flight at once on different register sets (real or virtual). Register renaming gives even more room.

A compiler can unroll the loop, use different registers (or the CPU can use renaming), and pipeline them, getting better throughput. r16 IMUL has a relative throughput of 2, r16 IDIV then has a relative throughput of 6. This is 3 times slower, not 10. Pretty much any other 10 instructions are going to be slower, not faster as you claim.

Compared to the overhead of looping and memory access, these are not the bottleneck. These represent a small amount of the time to do the computation.

Do some timing and check. IDIV is not what it once was.

[1] https://www.agner.org/optimize/instruction_tables.pdf

To be clear, I'm generally talking about machine-width operations, in this case 64-bit output, not smaller operations like 8 or 16 bits.

For Skylake, we have 36 uops and 35-88 cycles of latency for a 64-bit div, and for idiv it is even worse: 57 uops and 42 to 95 cycles of latency. These can execute one every 20 to 80 cycles, so the throughput is fully twenty to eighty times worse than 64-bit multiplication.

> IMUL is often 2-3 uops on the same chips.

On Intel chips multiplication is almost always 1 uop if you need a 64-bit result (this is what a multiply in C will compile to), and 2 uops if you need the full 128 result. In in either case it can sustain one every cycle.

> However, uops are not the bottleneck, pipelined throughput is what matters for performance. Multiple IDIV can be in flight at once on different register sets (real or virtual). Register renaming gives even more room.

How do you know uops are not the bottleneck? They often are: it depends on the surrounding code.

More importantly, div is slow in every way that could matter: if latency is the bottleneck, div sucks (20 to 80 cycles, vs 3 for multiply). If uop throughput is the bottleneck, div sucks. If pure div throughput is the bottleneck (as you are suggesting), div still sucks: the inverse throughput for 64-bit div is 21 to 83 cycles: almost the same as the latency, so it is barely pipelined at all, and fully 21 to 83 times slower than multiplication.

Now one might say that you are mostly interested in 32-bit values, as these are common even in 64-bit code, and Intel has good support for them.

In this case div is quite a bit faster, but still dramatically slower than multiplication. In latency terms it is 26 cycles on Skylake, versus 3 for multiplication, so about 9 times slower. In "pipelined throughput" terms, it is 6 cycles per division, versus 1 per multiplication, so 6 times slower.

As a summary, we can say that on the most recent Intel, which probably have the fastest dividers around (and which have seen recent improvements), 64-bit division is anywhere from 11 to 90 times slower than multiplication, depending on how you measure it (and the input values), and 32-bit division is between 6 and 9 times slower, depending on how you measure it. I think my "10 times slower" which actually falls towards the faster end of that range is actually quite conservative!

On chips older than Skylake (which is the vast majority of chips you'll still find in datacenters and in the cloud, since SKX is quite recent), the situation is worse for div, since mul had the same 1/3 latency/tput performance, but div was slower.

> Do some timing and check. IDIV is not what it once was.

Here, I agree.

It has gotten much faster! In other words, it has gone from shockingly slow to merely very slow. It is still way slower than multiplication in every respect (which itself has gotten faster to the point where it has 1-cycle throughput now), and almost all of the tricks to avoid actual div instructions still apply.

58 is the number we're trying to divide by. (It's a constant, so there's a bunch of clever optimizations we can do; I don't think anyone is claiming fully-general division is fast by any standards beyond "round trip to memory".)

I feel silly, but thank you for the polite explanation of what is now obvious.

Thank you for correcting my misunderstanding.

The Ascii85 implementation in Python is 1/70th the performance of the base64 version, and the command-line driver to the base58 implementation I linked to is incredibly slower than the system base64. Profiling shows that the time is indeed spent on the two div/mod lines.

I assume that these are not the fastest possible implementations, but it does show that high performance is not trivial.

I'm actually using base 58 to encode numbers for the gopher URL shortner with the Ripple alphabet. It works as advertised, and is great for an old protocol like this.

gopher://fld.gp/1/gopher/shorten (shortened URL example: gopher://fld.gp/1j/r or http://fld.gp/1j/r )

I love interesting representations of binary data, base 58 included, but this article is so scant on details. I'm sure someone could even add a brief overview of the algorithm and samples and it's limitations and idiosyncrasies, such as how it handles things like padding etc.

The best-kept secret Wikipedia is the references section. First reference probably answers all your questions.

For those of you still in school, this is also how you "get around" the "Don't use Wikipedia" restriction on your school reports. You go to the Wikipedia article for whatever it is you're writing about, ignore it entirely and scroll down to the References, and go to town on those. 10 seconds of "research" and you've almost certainly got enough primary sources lined up to finish your report unless this is a thesis-level job. I scare-quote "get around" because this isn't even really "cheating", unless you're doing the assignment where you're explicitly instructed to use the library lookup system, in which case of course it is.

I think the article is good as is, because base58 is actually pretty dissimilar to most binary-to-text encodings. Surprisingly many encodings are just base 2^k (16, 32, 64 and 128 [1]), some have converged to base 85 (because 85^5 / 2^32 ~= 1.03, i.e. just enough to represent 4 octets in 5 units). This is because binary-to-text encodings are not expected to use costly bigint operations for an arbitrary-length payload. Base36 and base58 are the only common exceptions, and for this reason I actually think that they should be termed "bigint-to-text" encodings.

Given that base58 is actually just a big number with translated digits, the list of alphabets (actually, digits) is almost enough to decode base58 encodings. There is always an exception, of course: Bitcoin base58 calculates a minimal length from the original payload length so that it has to be "zero"-padded with one (which is the first alphabet, thus acts as a digit zero) [2]. This preserves the original payload length, but is still quadratically expensive and should be used for a reasonably short, known-length data like hashes.

[1] This is actually not normally viable as a binary-to-text encoding, but base122 (http://blog.kevinalbs.com/base122) does use this to exploit UTF-8 encoding.

[2] https://en.bitcoin.it/wiki/Base58Check_encoding (note that I've used a significantly simpler explanation here, the gist is same however)

"Base36 and base58 are the only common exceptions"

Isn't Ascii85/Base85 at least as common as those? Quoting the Wikipedia entry for it, "Its main modern uses are in Adobe's PostScript and Portable Document Format file formats, as well as in the patch encoding for binary files used by Git"

Python includes it as part of the standard library. Using the example from the WP entry:

  >>> s = r"""<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
  ... O<DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
  ... i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
  ... l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
  ... >uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>"""
  >>> import base64
  >>> base64.a85decode(s, adobe=True)
  b'Man is distinguished, not only by his reason, but by this singular
  passion from other animals, which is a lust of the mind, that by a
  perseverance of delight in the continued and indefatigable
  generation of knowledge, exceeds the short vehemence of any carnal

I have explicitly mentioned this:

> [...] some have converged to base 85 (because 85^5 / 2^32 ~= 1.03, i.e. just enough to represent 4 octets in 5 units).

Base 85 does not use bigint, it is just a clever approximation of optimal encoding with 85 symbols.

I think I was confused between "baseX" as the specific program name and 'base X' as the base of the specific numbering system.

I'm more of a fan of BasE91 and I believe more people should be using it.

Why? It's only a minuscule improvement over ASCII85, which is a well-established encoding. I couldn't find a rationale for it anywhere.

There are many dimensions to utility.

    oO0 I1l B8 S5 b6G

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact