- Don't want 0OIl characters that look the same in some fonts and could be used to create visually identical-looking data.
- A string with non-alphanumeric characters is not as easily accepted as input.
- E-mail usually won't line-break if there's no punctuation to break at.
- Double-clicking selects the whole string as one word if it's all alphanumeric.
For reservation numbers, doing away with case makes it easier to speak the number over the phone.
The basic format is `<varint-base-encoding-code><base-encoded-data>` and the "varint-base-encoding-code" comes from https://github.com/multiformats/multibase/blob/master/multib...
(used with various alphabets for denser binary encoding in PDF, git binary patches, and the scene files of the Arnold renderer).
Which can do base58 and any base you throw at it using the same algorithm.
> The characters that are used in Open Location Codes were chosen by computing all possible 20 character combinations from 0-9A-Z and scoring them on how well they spell 10,000 words from over 30 languages. This was to avoid, as far as possible, Open Location Codes being generated that included recognisable words. The selected 20 character set is made up of "23456789CFGHJMPQRVWX".
It basically requires a bignum library (or specialized code that can divide arbitrarily long numbers by 58), and conversion is slow as hell.
Either you end up pulling in a full bignum library, or you have to use error-prone, clunky specialized code you don't have time to try and understand and verify.
Of all the design choices Satoshi made, this is far from the best one.
    for (i = zcount, high = size - 1; i < binsz; ++i, high = j) {
        for (carry = bin[i], j = size - 1; (j > high) || carry; --j) {
            carry += 256 * buf[j];
            buf[j] = carry % 58;
            carry /= 58;
        }
    }
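For comparison, here's a minimal Python sketch of the same conversion. It leans on Python's arbitrary-precision ints rather than the C byte-buffer loop, so it illustrates the math, not the C algorithm verbatim; the alphabet is Bitcoin's standard one:

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    # Leading zero bytes are carried over as leading '1' digits.
    zcount = len(data) - len(data.lstrip(b"\x00"))
    n = int.from_bytes(data, "big")   # Python ints are already bignums
    digits = ""
    while n:
        n, rem = divmod(n, 58)        # the repeated divide the thread is about
        digits = BASE58_ALPHABET[rem] + digits
    return "1" * zcount + digits
```

Every byte of input costs roughly one more divmod over an ever-growing number, which is where the quadratic behavior comes from.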
Many compilers will even expand fixed-size divides by a known constant inline into a multiply and a shift.
Here's a four-year-old Hacker News thread on compiler tricks from back then. They're even better now.
Specifically, u8 divmod 58 can be reduced to a u8->u16 multiply, a right shift, and three conditional subtractions; that's not great, but on a modern CPU it's an afterthought compared to the quadratic loop over the input size.
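As a sketch of that kind of strength reduction (here in the classic multiply-by-reciprocal form rather than the exact subtraction sequence described above), divmod by 58 for 8-bit inputs needs only one widening multiply and a shift; the magic constant below is ceil(2**16 / 58):

```python
M, S = 1130, 16   # M = ceil(2**16 / 58); exact for all x < 2**8

def divmod58_u8(x: int) -> tuple[int, int]:
    q = (x * M) >> S       # widening multiply, keep the high bits
    return q, x - 58 * q   # remainder via one multiply and a subtract

# Exhaustive check over the whole u8 range:
assert all(divmod58_u8(x) == divmod(x, 58) for x in range(256))
```

The rounding error of M (1130 * 58 - 65536 = 4) is small enough that the quotient is exact for every 8-bit input, which the exhaustive loop confirms.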
This is what I meant when I said:
>even when the compiler gets smart about it because it's a constant.
Go look at some decent implementations and the assembly they generate. Don't spread unfounded or unchecked claims.
I'm interested in your comment that "some architectures" have a fast divmod. Do you actually know of one where it's faster than a shift and a multiplication? Or the same? We're finishing up a paper where this would be very relevant information.
I suppose if you’re writing a paper you’re aware of quite a bit of literature on exactly this problem. Recent papers have quite fast methods to do this. I’ve not looked at recent state of the art to see if 58 has a near zero cost divmod, but numbers of many forms do. I’d not be surprised if state of the art has it reduced to a few non-mem access branchless no division instructions on most architectures.
Maybe later I’ll poke at 58 and see what I can design. I’ve made quite a few such algorithms over the years.
I might be exposing my ignorance, but what's 58 in this context? Is this an ARM Cortex series, or something else?
Divide is a hard problem and I'm not aware of any breakthroughs. It's been getting faster, but it's still 10x slower (at least) than multiplication, as you point out. Intel threw hardware at it a few generations back, speeding it up (it used to be over 100 cycles), and AMD has a fairly fast divider in Ryzen as well (for many inputs faster than Intel's).
I expect POWER to be in the same range. ARM architectures are all over the power/performance map, obviously, but for the biggest/fastest, I would guess they are within a factor of 2, either way, of contemporary x86 implementations.
This point is important: it means you can't get as much work done while the division is happening. There are other slow instructions, such as floating-point div, sqrt, and the transcendental and trig functions. However, those are only a single uop and then occupy only their respective execution unit for the duration, so you can use the rest of the EUs at full speed. So if you only need to do one every 50 cycles or so, and have work to do in the meantime, they can also be essentially free. idiv is not like that: it spews out many uops which compete for the same EUs as the rest of your code.
However, uops are not the bottleneck, pipelined throughput is what matters for performance. Multiple IDIV can be in flight at once on different register sets (real or virtual). Register renaming gives even more room.
A compiler can unroll the loop, use different registers (or the CPU can use renaming), and pipeline them, getting better throughput. r16 IMUL has a relative throughput of 2, r16 IDIV then has a relative throughput of 6. This is 3 times slower, not 10. Pretty much any other 10 instructions are going to be slower, not faster as you claim.
Compared to the overhead of looping and memory access, these are not the bottleneck. These represent a small amount of the time to do the computation.
Do some timing and check. IDIV is not what it once was.
For Skylake, we have 36 uops and 35-88 cycles of latency for a 64-bit div, and for idiv it is even worse: 57 uops and 42 to 95 cycles of latency. These can execute one every 20 to 80 cycles, so the throughput is fully twenty to eighty times worse than 64-bit multiplication.
> IMUL is often 2-3 uops on the same chips.
On Intel chips multiplication is almost always 1 uop if you need a 64-bit result (which is what a multiply in C will compile to), and 2 uops if you need the full 128-bit result. In either case it can sustain one every cycle.
> However, uops are not the bottleneck, pipelined throughput is what matters for performance. Multiple IDIV can be in flight at once on different register sets (real or virtual). Register renaming gives even more room.
How do you know uops are not the bottleneck? They often are: it depends on the surrounding code.
More importantly, div is slow in every way that could matter: if latency is the bottleneck, div sucks (20 to 80 cycles, vs 3 for multiply). If uop throughput is the bottleneck, div sucks. If pure div throughput is the bottleneck (as you are suggesting), div still sucks: the inverse throughput for 64-bit div is 21 to 83 cycles: almost the same as the latency, so it is barely pipelined at all, and fully 21 to 83 times slower than multiplication.
Now one might say that you are mostly interested in 32-bit values, as these are common even in 64-bit code, and Intel has good support for them.
In this case div is quite a bit faster, but still dramatically slower than multiplication. In latency terms it is 26 cycles on Skylake, versus 3 for multiplication, so about 9 times slower. In "pipelined throughput" terms, it is 6 cycles per division, versus 1 per multiplication, so 6 times slower.
As a summary, we can say that on the most recent Intel chips, which probably have the fastest dividers around (and which have seen recent improvements), 64-bit division is anywhere from 11 to 90 times slower than multiplication, depending on how you measure it (and on the input values), and 32-bit division is between 6 and 9 times slower, depending on how you measure it. My "10 times slower", which falls towards the faster end of that range, is actually quite conservative!
On chips older than Skylake (which is the vast majority of chips you'll still find in datacenters and in the cloud, since SKX is quite recent), the situation is worse for div, since mul had the same 1/3 latency/tput performance, but div was slower.
> Do some timing and check. IDIV is not what it once was.
Here, I agree.
It has gotten much faster! In other words, it has gone from shockingly slow to merely very slow. It is still way slower than multiplication in every respect (which itself has gotten faster to the point where it has 1-cycle throughput now), and almost all of the tricks to avoid actual div instructions still apply.
The Ascii85 implementation in Python is 1/70th the performance of the base64 version, and the command-line driver to the base58 implementation I linked to is dramatically slower than the system base64. Profiling shows that the time is indeed spent on the two div/mod lines.
I assume that these are not the fastest possible implementations, but it does show that high performance is not trivial.
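That gap is easy to reproduce. The toy benchmark below uses my own naive Python base58 (not the linked implementation) against the C-accelerated base64 in the standard library; exact ratios will vary by machine:

```python
import base64
import timeit

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58(data: bytes) -> str:
    # Naive bignum conversion; the divmod line dominates the profile.
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = ALPHABET[r] + out
    return out

payload = bytes(range(256)) * 4   # 1 KiB of test input
t58 = timeit.timeit(lambda: b58(payload), number=50)
t64 = timeit.timeit(lambda: base64.b64encode(payload), number=50)
print(f"naive base58 is ~{t58 / t64:.0f}x slower than base64 here")
```

Even this bignum-based version, which avoids the explicit byte-buffer loop, loses badly; a byte-at-a-time C port fares no better asymptotically.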
(shortened URL example: gopher://fld.gp/1j/r or http://fld.gp/1j/r )
For those of you still in school, this is also how you "get around" the "Don't use Wikipedia" restriction on your school reports. You go to the Wikipedia article for whatever it is you're writing about, ignore it entirely and scroll down to the References, and go to town on those. 10 seconds of "research" and you've almost certainly got enough primary sources lined up to finish your report unless this is a thesis-level job. I scare-quote "get around" because this isn't even really "cheating", unless you're doing the assignment where you're explicitly instructed to use the library lookup system, in which case of course it is.
Given that base58 is really just a big number with translated digits, the list of alphabets (actually, digit sets) is almost enough to decode base58 encodings. There is always an exception, of course: Bitcoin's base58 calculates a minimal length from the original payload length, so the output has to be zero-padded with "1" (the first alphabet character, which thus acts as a digit zero). This preserves the original payload length, but it is still quadratically expensive and should only be used for reasonably short, known-length data like hashes.
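A decoder sketch showing that leading-"1" handling, assuming the usual Bitcoin alphabet (the function name is mine, not any particular library's API):

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_decode(s: str) -> bytes:
    # Each leading '1' (the zero digit) restores one leading zero byte,
    # which is how the original payload length is preserved.
    zcount = len(s) - len(s.lstrip("1"))
    n = 0
    for ch in s[zcount:]:
        n = n * 58 + BASE58_ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    return b"\x00" * zcount + body
```

Without the zero-digit convention, `b"\x00\x00abc"` and `b"abc"` would decode from the same big number and the leading bytes would be lost.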
 This is actually not normally viable as a binary-to-text encoding, but base122 (http://blog.kevinalbs.com/base122) does use this to exploit UTF-8 encoding.
 https://en.bitcoin.it/wiki/Base58Check_encoding (note that I've used a significantly simpler explanation here, the gist is same however)
Isn't Ascii85/Base85 at least as common as those? Quoting the Wikipedia entry for it, "Its main modern uses are in Adobe's PostScript and Portable Document Format file formats, as well as in the patch encoding for binary files used by Git"
Python includes it as part of the standard library. Using the example from the WP entry:
>>> s = r"""<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
>>> import base64
>>> base64.a85decode(s, adobe=True)
b'Man is distinguished, not only by his reason, but by this singular
passion from other animals, which is a lust of the mind, that by a
perseverance of delight in the continued and indefatigable
generation of knowledge, exceeds the short vehemence of any carnal
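Since the transcript above is truncated, here's a self-contained round trip with the same stdlib functions, on a shortened sample of that text:

```python
import base64

msg = b"Man is distinguished, not only by his reason"  # shortened sample
enc = base64.a85encode(msg, adobe=True)   # adobe=True adds the <~ ... ~> frame
dec = base64.a85decode(enc, adobe=True)
assert dec == msg
```

The `adobe=True` flag matches the `<~ ... ~>`-framed variant quoted from Wikipedia; plain `a85encode`/`a85decode` handle the unframed form.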
> [...] some have converged to base 85 (because 85^5 / 2^32 ~= 1.03, i.e. just enough to represent 4 octets in 5 units).
Base 85 does not need a bignum; it is just a clever approximation of optimal encoding with 85 symbols.
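Concretely: each 4-octet group fits in a 32-bit integer, and five base-85 digits can hold any such value because 85**5 = 4,437,053,125 > 2**32. A sketch of one chunk (a hypothetical helper of mine, ignoring Ascii85's "z" shortcut and final-group padding):

```python
def b85_chunk(four: bytes) -> list[int]:
    """Encode exactly 4 octets as 5 base-85 digit values (85**5 > 2**32)."""
    assert len(four) == 4
    n = int.from_bytes(four, "big")   # fits in a u32, so no bignum needed
    digits = []
    for _ in range(5):
        n, r = divmod(n, 85)          # five fixed-size divides per chunk
        digits.append(r)
    return digits[::-1]               # most significant digit first
```

Because each chunk is independent, the cost is linear in the input size, unlike base58's whole-payload big number.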
oO0 I1l B8 S5 b6G