
Stream VByte: breaking new speed records for integer compression - ingve
https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/
======
kbenson
Just to give you an idea of how effective this strategy can be: it's very
similar to what the _COST in the land of databases_ submission[1] the other
day was talking about, where choosing interesting algorithms that fit the
problem, instead of large but general distributed systems, often yields very
good results (enough that a single thread can often beat 100+ core
distributed systems).

In one of his prior blog posts[2], linked at the beginning of that article,
he said this:

 _Using the Hilbert curve layout leads to a very simple compressed
representation: having sorted the u64 values it uses to represent edges, one
can simply delta-encode them (writing the differences between adjacent
values). Because the Hilbert curve teases out locality, the gaps are often
quite small, and therefore compress nicely. When I combined a variable-length
integer encoding with gzip, the resulting file was only 45GB; about 2.8 bits
per edge. Decompressed (with just the variable-length delta encoding), it is
154GB, which is the representation I used to get the measurements above._
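The delta + varint combination quoted above is easy to sketch. A minimal
Python version (using a standard LEB128-style varint with a continuation
bit, not necessarily the exact format he used) looks like this:

```python
def varint(n):
    # LEB128-style: 7 data bits per byte, high bit flags a continuation
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def delta_varint_encode(sorted_vals):
    # store gaps between adjacent values instead of the values themselves
    out, prev = bytearray(), 0
    for v in sorted_vals:
        out += varint(v - prev)
        prev = v
    return bytes(out)

def delta_varint_decode(buf):
    vals, prev, i = [], 0, 0
    while i < len(buf):
        gap, shift = 0, 0
        while True:
            b = buf[i]; i += 1
            gap |= (b & 0x7F) << shift
            if b < 0x80:
                break
            shift += 7
        prev += gap
        vals.append(prev)
    return vals
```

The payoff is exactly what the quote describes: sorted values produce small
gaps, and small gaps take one byte each instead of eight.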

He then goes on in his posts to comically trounce some large distributed
systems.

1:
[https://news.ycombinator.com/item?id=15332051](https://news.ycombinator.com/item?id=15332051)

2:
[https://github.com/frankmcsherry/blog/blob/master/posts/2015...](https://github.com/frankmcsherry/blog/blob/master/posts/2015-02-04.md)

~~~
frankmcsherry
!#$!@# I was going to go try that out. ;) When doing the Chaos/Order stuff,
the decompression was about half the CPU cycles, so thinning that could help.

Fwiw, the encoding I used wasn't VByte; because I was doing strict increments,
I knew `0` wasn't a valid value to see, and used it to encode variable byte
lengths (number of zeros: number of valid data bytes). Not smart, but easy to
hack up, and gave me back 128-255 in one byte. I'm curious to check this out,
now!
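For the curious, here's one plausible reading of that zero-run trick in
Python. This is my interpretation, not necessarily Frank's exact layout: z
leading zero bytes announce z+1 big-endian data bytes, which is unambiguous
because the leading data byte of a value >= 1 is never zero, and it gives
128-255 in a single byte as he describes:

```python
def zencode(v):
    # v >= 1 always holds for gaps under strict increments
    data = v.to_bytes((v.bit_length() + 7) // 8, "big")  # leading byte nonzero
    return b"\x00" * (len(data) - 1) + data

def zdecode(buf):
    vals, i = [], 0
    while i < len(buf):
        z = 0
        while buf[i] == 0:  # a run of zeros announces the value's width
            z += 1
            i += 1
        vals.append(int.from_bytes(buf[i:i + z + 1], "big"))
        i += z + 1
    return vals
```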

------
mrhill
This method of compressing integer arrays as differences and then reducing
the number of bits per difference is very efficient for many applications.
We used it many years ago at a startup to compress and realtime render vector
maps for mobile devices.

Not only did it make the data really small (we could fit all the roads of a
country like Germany in 5 MB), it was also blazing fast to decode. Our vector
map renderer did the decompression on the fly, while doing coordinate
transformation and rendering to the screen. That was really the trick to
getting realtime zooming and panning vector maps on the ~100 MHz ARM CPUs of
the time, with their tiny data caches.

We even took that compression scheme a step further, from byte to bit
granularity. The above article made me wonder how our old scheme would
perform on today's CPUs...
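As a rough illustration of what bit-granular packing looks like, here is a
generic fixed-width bit packer in Python (a sketch assuming each block of
deltas shares one bit width, not mrhill's actual scheme):

```python
def pack_bits(vals, width):
    # append `width` bits per value, little-endian within the byte stream
    acc = bits = 0
    out = bytearray()
    for v in vals:
        acc |= v << bits
        bits += width
        while bits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            bits -= 8
    if bits:
        out.append(acc & 0xFF)  # flush the final partial byte
    return bytes(out)

def unpack_bits(buf, width, count):
    acc = bits = i = 0
    vals = []
    for _ in range(count):
        while bits < width:
            acc |= buf[i] << bits
            i += 1
            bits += 8
        vals.append(acc & ((1 << width) - 1))
        acc >>= width
        bits -= width
    return vals
```

With 3-bit deltas this stores ~2.7 values per byte, where a byte-bound
scheme would spend a full byte on each.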

~~~
kwillets
One of the problems with SIMD is that we can't do this kind of shuffle at bit
granularity, at least not in one instruction.

~~~
jeffffff
if you want bit level granularity you can use either SIMD-BP128 or SIMD-
FastPFOR from
[https://arxiv.org/pdf/1209.2137.pdf](https://arxiv.org/pdf/1209.2137.pdf)

------
kwillets
This is indeed fast and simple, since the pshufb instruction unpacks bytes
into multiple zero-padded words in one step. The technique uses a precomputed
table of shuffle patterns, so decoding is a short sequence of instructions:
fetch the control information for each 4-word group and apply the given
shuffle.
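In scalar Python, the control-byte-plus-table idea can be sketched like
this. It mirrors the two-stream Stream VByte layout (control bytes in one
stream, data bytes in another); the real decoder replaces the inner loop
with a single pshufb driven by a 256-entry table of shuffle masks:

```python
# 2 bits per value in a control byte give the byte length (1..4) of each of
# four little-endian values; a 256-entry table maps control bytes to lengths.
LENGTHS = [[(c >> (2 * j)) & 3 for j in range(4)] for c in range(256)]

def svb_encode(vals):
    # assumes 32-bit values, as in the article
    control, data = bytearray(), bytearray()
    for i in range(0, len(vals), 4):
        c = 0
        for j, v in enumerate(vals[i:i + 4]):
            n = max(1, (v.bit_length() + 7) // 8)  # 1..4 bytes
            c |= (n - 1) << (2 * j)
            data += v.to_bytes(n, "little")
        control.append(c)
    return bytes(control), bytes(data)

def svb_decode(control, data, count):
    vals, pos = [], 0
    for c in control:
        for ln in LENGTHS[c]:        # the SIMD version handles all four at
            if len(vals) == count:   # once: one table load, one pshufb
                break
            vals.append(int.from_bytes(data[pos:pos + ln + 1], "little"))
            pos += ln + 1
    return vals
```

Separating control from data is what makes the SIMD version branch-free:
the shuffle mask and the amount to advance each pointer both come straight
from table lookups on the control byte.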

------
jorangreef
"I should mention that unlike Amazon, we did not patent our approach. We want
you to use this in your commercial and open-source projects!"

Thanks Dan!

------
kuwze
Could there be a similar scheme for floats?

Also, is there a penalty for using float16s in CPU-centric code?

~~~
akrasuski1
Using fixed-point representation would allow using this compression method.
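For instance, quantize to a fixed-point grid first and the integer machinery
applies directly. A Python sketch (the scale factor of 1000 is an arbitrary
choice here; zig-zag coding handles the now-signed gaps so a byte or varint
coder can follow):

```python
SCALE = 1000  # hypothetical precision: three decimal places

def encode_floats(xs):
    ints = [round(x * SCALE) for x in xs]            # quantize to fixed point
    gaps = [b - a for a, b in zip([0] + ints, ints)]  # delta-encode
    # zig-zag: map signed gaps to small unsigned codes
    return [(g << 1) if g >= 0 else ((-g) << 1) - 1 for g in gaps]

def decode_floats(codes):
    xs, cur = [], 0
    for c in codes:
        cur += (c >> 1) if c % 2 == 0 else -((c + 1) >> 1)
        xs.append(cur / SCALE)
    return xs
```

The trade-off is lossy quantization: anything finer than 1/SCALE is rounded
away, which is fine for data like map coordinates but not for arbitrary
floats.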

------
Scaevolus
The example code should replace "datasource" with "databytes".

~~~
rurban
Na

