Hacker News new | past | comments | ask | show | jobs | submit login

I think its widely understood that a little-endian prefix VarInt decodes much faster as the leading alternative to LEB128. You can implement the whole thing without a loop, thus you don't have to exercise branch and loop prediction resources at all.

- count leading zeros (or ones, depending on the first byte's tagging preference)

- Unaligned load expressed as memcpy. Let the compiler's instruction scheduler peephole it out into a single unaligned load instruction.

- shift the loaded value by an amount indicated by the leading zero count.

- Zig-zag decode for signed integers




What do you mean by "little-endian prefix varint"? Like this?

``` enum VarInt { Byte(u8), Short(u16), Long(u32), LongLong(u64), } ```

I.e. one byte tag followed by the value? It's definitely an option for me but the main issue is that values under 128 now take 2 bytes instead of 1. I guess I could do something like what CBOR does and use 0-251 = literal value, 252 = u8 follows, 253 = u16 follows, 254 = u32 follows, 255 = u64 follow.


It can be considered a variation on LEB128 that packs all of the continuation bits into the first byte.

Ie, two leading ones set in the first byte mean that there are two trailing bytes.

https://news.ycombinator.com/item?id=11263378 is one description.

https://github.com/stoklund/varint has some benchmarks.


Ah ok I understand. Might be a little awkward to implement in Javascript given its lack of 64-bit ints, etc, although it does at least have CLZ32!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: