

Blosc, an extremely fast, multi-threaded, meta-compressor library - 0x1997
http://blosc.org/

======
jasode
>faster than a memcpy() OS call

I usually don't nitpick terminology, but memcpy() is a C runtime library
function, not a Linux/Win32 OS call.

------
oofabz
> Blosc comes with a pre-filter (also called pre-conditioner) called shuffle
> which rearranges bytes in a clever way for the compression stage.

This sounds like the Burrows-Wheeler transform, which bzip2 uses:

[https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform](https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform)
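
For contrast with the shuffle filter, here is a naive sketch of the BWT. It sorts every rotation of the input and keeps the last column, so its output depends on the content itself; Blosc's shuffle, by comparison, is a fixed byte permutation determined only by the element size and buffer length. This O(n² log n) version is for illustration only (bzip2 uses an efficient block-sorting implementation, not this), and the `$` end marker is an assumption that requires `$` not to appear in the input.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel  # the unique end marker makes the transform invertible
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the known last column to every row, then re-sort.
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


# Usage: repeated characters end up grouped together in the output.
print(bwt("banana"))  # annb$aa
```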

~~~
pkhuong
Fixed-width binary data (e.g., a sequence of double floats) often benefit from
a simpler transform: just transpose the bits/bytes so that, e.g., the least
significant bytes form a contiguous region, followed by all the second least
significant bytes, etc.

> Meant for binary data: can take advantage of the type size meta-information
> for improved compression ratio (using the integrated shuffle pre-
> conditioner).

makes it sound like that's what Blosc is doing.
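
The byte transpose described above can be sketched in a few lines of Python. The functions `shuffle` and `unshuffle` are hypothetical names chosen for illustration, not Blosc's API. The sketch assumes the buffer length is an exact multiple of the element size (8 for doubles) and makes no claim about how Blosc's optimized C implementation handles other cases.

```python
import struct


def shuffle(data: bytes, typesize: int) -> bytes:
    """Byte-transpose fixed-width elements: all byte 0s, then all byte 1s, ..."""
    if len(data) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    # data[k::typesize] picks byte k of every element (one "byte plane").
    return b"".join(data[k::typesize] for k in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle: interleave the byte planes back into elements."""
    if len(data) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    n = len(data) // typesize  # number of elements
    out = bytearray(len(data))
    for k in range(typesize):
        # Plane k occupies data[k*n:(k+1)*n]; scatter it back to byte k of each element.
        out[k::typesize] = data[k * n:(k + 1) * n]
    return bytes(out)


# Usage: four little-endian doubles, typesize 8.
raw = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
shuffled = shuffle(raw, 8)
assert unshuffle(shuffled, 8) == raw  # the transform is lossless
```

For these four small doubles the six low-order mantissa bytes of every value are zero, so the first 24 bytes of the shuffled buffer form one run of zeros, which a downstream LZ-style codec can encode very cheaply.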

~~~
rdc12
Sounds interesting. Is there a name for that technique? Or, more to the point,
something that can be searched for?

~~~
shiningmuppet
We call it the 'shuffle filter', but that's all.

------
DanBC
I wish they'd done some benchmarking to demonstrate how quick it is across
different data.

~~~
faltet
Here is a benchmark based on the MovieLens database:

[http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb](http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb)

The results are explained here:

[http://www.blosc.org/docs/bcolz-EuroPython-2014.pdf](http://www.blosc.org/docs/bcolz-EuroPython-2014.pdf)

and, in more depth, here:

[https://python.g-node.org/wiki/starving_cpu](https://python.g-node.org/wiki/starving_cpu)

------
bkeroack
Pretty cool. Faster and better compression ratios, according to the tutorial.

~~~
shiningmuppet
Depends on the data.
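
A rough way to see why the ratio depends on the data is to compress a buffer with and without a shuffle step. This sketch uses Python's standard `zlib` module as a stand-in codec rather than Blosc itself, redefines a hypothetical `shuffle` helper so the block is self-contained, and measures only compression ratio, not speed.

```python
import os
import struct
import zlib


def shuffle(data: bytes, typesize: int) -> bytes:
    """Hypothetical helper: byte-transpose fixed-width elements into byte planes."""
    return b"".join(data[k::typesize] for k in range(typesize))


def ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 6))


# Case 1: a slowly increasing sequence of 32-bit integers (typesize 4).
ints = struct.pack("<10000i", *range(10000))
print("ints, plain:    ", ratio(ints))
print("ints, shuffled: ", ratio(shuffle(ints, 4)))

# Case 2: random bytes, which have no structure for the filter to expose.
noise = os.urandom(40000)
print("noise, plain:   ", ratio(noise))
print("noise, shuffled:", ratio(shuffle(noise, 4)))
```

On the integer ramp the shuffled buffer should compress far better, because the high-order byte planes collapse into long runs; the random bytes stay near a ratio of 1.0 either way.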

------
kolev
No GitHub? No Bitbucket? Just a source code dump? Weird!

~~~
shiningmuppet
[https://github.com/Blosc/](https://github.com/Blosc/)

~~~
kolev
Thanks!

------
fenollp
Yes, but does it achieve optimal tip-to-tip efficiency?

------
adwilk
Yeah, it's fast, but what's its Weissman score?

~~~
shiningmuppet
It's off the charts...

