Real Time Data compression using LZ4 (fastcompression.blogspot.fr)
61 points by suprgeek on June 6, 2012 | 9 comments



With such a compression ratio for text, and such compression speed, this algorithm fits well into the Google Snappy / LZO / FastLZ group. Every database engine should use one of these; they operate at disk I/O speed.


Only for certain kinds of data, like a document index, but not for numerical and many other types. RLE, delta encoding, and other simpler algorithms are a better match in many cases.
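
The numeric case is simple enough to sketch. A minimal delta-encoding example in C (hypothetical, not taken from any particular engine); after this transform, slowly-varying columns become runs of small deltas that RLE or any general compressor handles far better:

    #include <stdint.h>
    #include <stddef.h>

    /* Delta-encode a numeric column in place: keep the first value,
       then store each value as the difference from its predecessor. */
    void delta_encode(int64_t *vals, size_t n)
    {
        for (size_t i = n; i-- > 1; )
            vals[i] -= vals[i - 1];
    }

    /* Inverse transform: prefix-sum the deltas back into values. */
    void delta_decode(int64_t *vals, size_t n)
    {
        for (size_t i = 1; i < n; i++)
            vals[i] += vals[i - 1];
    }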


I just integrated it into the programming language I am developing. Thanks for such a small and handy library!
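
For anyone curious what the integration amounts to, here is a minimal round-trip sketch against the LZ4 C API as it exists in current lz4.h (LZ4_compress_default / LZ4_decompress_safe; the 2012-era header spelled these LZ4_compress / LZ4_uncompress):

    #include <stdio.h>
    #include <string.h>
    #include <lz4.h>

    int main(void)
    {
        const char src[] = "Real time data compression with LZ4, LZ4, LZ4...";
        const int src_size = (int)sizeof(src);

        /* LZ4_COMPRESSBOUND gives the worst-case output size,
           even for incompressible input. */
        char compressed[LZ4_COMPRESSBOUND(sizeof(src))];
        const int c_size = LZ4_compress_default(src, compressed, src_size,
                                                (int)sizeof(compressed));
        if (c_size <= 0) return 1;

        char restored[sizeof(src)];
        const int d_size = LZ4_decompress_safe(compressed, restored, c_size,
                                               (int)sizeof(restored));
        if (d_size != src_size || memcmp(src, restored, src_size) != 0)
            return 1;

        printf("%d -> %d bytes\n", src_size, c_size);
        return 0;
    }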


You should consider zlib: it's compatible with everything and has very nice options. It's also very light on memory (around 500 KB, versus multiple MB for alternatives). Check out how pigz implements it with pthreads (one thread per block).
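
For comparison, a one-shot zlib call is about this small (a sketch using the stock compress2/compressBound API; pigz's trick is to split the input into blocks and run one deflate stream like this per thread):

    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        const Bytef src[] = "some text to deflate, deflate, deflate...";
        const uLong src_len = (uLong)sizeof(src);

        Bytef dst[4096];                         /* ample for this tiny input */
        uLongf dst_len = compressBound(src_len); /* worst-case output size */

        /* Levels 1..9 trade speed for ratio; 6 is zlib's default. */
        if (compress2(dst, &dst_len, src, src_len, 6) != Z_OK)
            return 1;

        printf("%lu -> %lu bytes\n", src_len, dst_len);
        return 0;
    }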


Speed is good, but how does it perform in terms of compression ratio against other algorithms?


From the Compression Ratings website I grabbed this:

    program    comp-ratio  comp-time  decomp-time
    bzip2      34.1%       468.09s    167.03s
    gzip -5    37.6%       141.30s    34.76s
    lz4 -c2t4  43.9%       52.72s     5.84s
http://compressionratings.com/sort.cgi?rating_sum.full+p3


  program       comp-ratio  comp-time  decomp-time
  pigz -1       40.3%       26.92s     20.31s
  lz4 -c2t4     43.9%       52.72s     5.84s
  info-zip -1   41.3%       122.31s    119.81s
More relevant would be a comparison with gzip -1, lzop, snappy and rolz.

Also, the test data is mixed, so it doesn't really show where this algorithm shines.

Also note that memory usage shoots up from 5m/2m for info-zip -1 to 46m/42m for the specific lz4 case you picked.

EDIT: also, bzip2 seems to be particularly bad on this specific dataset; other algorithms in that category get better compression ratios. Added pigz to the comparison (info-zip with pthreads).


This level of compression is only good for very low-entropy data, like HTML found in the wild, generated from template code. But you are probably better off running whitespace cleanup, CSS removal, and plain minification (sketched below). All of that is just as fast and needs no decompression.

These compressors are only useful for a handful of cases.
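
The whitespace-cleanup pass mentioned above really is trivial. A deliberately naive sketch that collapses every whitespace run to a single space (a real minifier must spare <pre> blocks, inline scripts, and so on; this is illustration only):

    #include <ctype.h>
    #include <stddef.h>

    /* Collapse runs of whitespace to one space, in place.
       Returns the new string length. */
    size_t collapse_whitespace(char *s)
    {
        char *out = s;
        int in_ws = 0;
        for (const char *p = s; *p; p++) {
            if (isspace((unsigned char)*p)) {
                if (!in_ws) *out++ = ' ';
                in_ws = 1;
            } else {
                *out++ = *p;
                in_ws = 0;
            }
        }
        *out = '\0';
        return (size_t)(out - s);
    }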


There's a table here: http://code.google.com/p/lz4/



