
LZHAM – Lossless compression with faster decompression than LZMA - Arkanosis
https://code.google.com/p/lzham/
======
joosters
It's strange that they intend for the decompressor to be fast on embedded
devices and handhelds, yet give no benchmarks for ARM processors. Instead,
they only talk about i7 performance, a chip that you won't find in the target
environments.

Does the algorithm run well on ARM?

~~~
nkurz
For that matter, does it run well on a modern x64 processor? I'm confused by
people who do optimization at this level, but don't distinguish between the
different generations of processors. Core i7 is a marketing term corresponding
to Intel's positioning of the CPU within the market, but says little about the
performance characteristics of the processor.

He mentions later that he's using a "Core i7 970", which is from the Westmere
generation that came out in 2010. The decompression speed on the more modern
Sandy Bridge, Ivy Bridge, Haswell, or Broadwell chips could be anywhere from
the same to 8x better depending on how amenable the algorithm is to SIMD
improvements.

Without testing it's hard to know, but my guess would be that since it was not
consciously designed to take advantage of the newer processor features, it
probably will be at the low end of this, and could be beaten handily by an
approach that targets only new instruction sets.

------
raphaelj
If you are looking for a fast/realtime lossless compression algorithm, you
should prefer lz4 [1] or lzo [2].

Nevertheless, an article comparing the speed and compression efficiency of
these algorithms could be interesting.

[1] [https://code.google.com/p/lz4/](https://code.google.com/p/lz4/)

[2] [http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Ober...](http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Oberhumer)

~~~
uxcn
There's a link to a table of benchmarks in the description
([http://mattmahoney.net/dc/text.html](http://mattmahoney.net/dc/text.html)).
LZ4 still looks a little more compelling than LZHAM in terms of
compress/decompress time. It doesn't mention the specific application though
(CPU/Disk/etc...).

~~~
AaronFriel
Are you sure it doesn't compare favorably to LZ4? For a modest (roughly 1.5x)
penalty in decompression speed, the compression ratio appears to be substantially
better. LZHAM managed to compress the enwik8 set to 23.80 MiB, and the
enwik9 set to 196.83 MiB.

LZ4, by comparison, had compressed sizes of 40.88 MiB and 362.40 MiB. A
database or a Linux distribution using zswap/zram/etc. would be able to store
84% more data in the same amount of memory. For caches, that's huge.

Here are the table rows:

    
    
        lzham alpha 3 x64 -m4 -d29                    24,954,329  206,393,809    155,282 x  206,549,091    595     9 4800 LZ77 45 
        lz4 v1.2          -c2                         42,870,164  379,999,522     49,128 x  380,048,650     91     6   20 LZ77 26
    

One trade-off appears to be that lzham often requires a larger dictionary in
memory, but it appears that even with smaller dictionary sizes it is
appreciably more compact than lz4.
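
A quick sanity check on those numbers, using the byte counts from the table
above (plain arithmetic, nothing specific to either codec):

    
    
        lzham_bytes = 206_393_809  # compressed enwik9 size from the table
        lz4_bytes = 379_999_522
        MIB = 1024 ** 2
        print(f"lzham: {lzham_bytes / MIB:.2f} MiB")  # ~196.83 MiB
        print(f"lz4:   {lz4_bytes / MIB:.2f} MiB")    # ~362.40 MiB
        # extra data that fits in the same space if it compresses like enwik9
        print(f"{(lz4_bytes / lzham_bytes - 1) * 100:.0f}% more")  # ~84%
    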

~~~
uxcn
That was also one of the things I noticed at a quick glance. It would probably
be a fairly big win for anything I/O constrained. I didn't take a look at the
data being compressed or the algorithm in detail though.

------
bhouston
One of the LZMA variants should be added to Chrome/Firefox and others as an
optional compression format. Gzip's compression ratio is brutally bad compared
to the LZMA variants for both text and binary data. It would make a big
difference to the web, especially for static but commonly downloaded data.
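
A minimal sketch of the kind of comparison this is based on (the filename is
hypothetical, and the stdlib lzma module just stands in for whichever LZMA
variant a browser might adopt):

    
    
        import gzip, lzma
        raw = open("scene.json", "rb").read()  # hypothetical static asset
        gz = gzip.compress(raw, compresslevel=9)
        xz = lzma.compress(raw, preset=9)  # xz container, LZMA2 inside
        print(f"raw {len(raw)}  gzip {len(gz)}  lzma/xz {len(xz)}")
    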

~~~
faragon
"It would make a big difference to the web, especially for static but commonly
downloaded data."

Static but commonly downloaded data is already cached. For downloading
software packages, or differential software updates (e.g. initializing the
dictionary with previously downloaded elements), it would make a lot of sense
(basically, the things that are already being done with LZMA getting extra
speed).
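
The dictionary-seeding idea, as a minimal sketch using zlib's preset-dictionary
support (file names are hypothetical; note that deflate only uses the last
32 KiB of the dictionary, whereas LZMA-family codecs can use far larger ones,
which is the whole point):

    
    
        import zlib
        previous = open("pkg-v1.bin", "rb").read()  # data the client already has
        new = open("pkg-v2.bin", "rb").read()       # the update to ship
        # seed the codec with the old version so shared byte runs become
        # back-references instead of retransmitted data
        comp = zlib.compressobj(level=9, zdict=previous)
        patch = comp.compress(new) + comp.flush()
        decomp = zlib.decompressobj(zdict=previous)
        assert decomp.decompress(patch) + decomp.flush() == new
        print(f"{len(new)} bytes -> {len(patch)} bytes on the wire")
    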

It will take time to get LZHAM as stable as LZMA. It looks promising, anyway.

~~~
bhouston
> Static but commonly downloaded data is already cached.

I was trying to refer to things that are downloaded by a lot of people (such
as our 3D scenes here: [https://clara.io/library](https://clara.io/library)),
rather than something that is repeatedly downloaded by a single person.

Downloading the data for these 3D scenes is by far the slowest step, and we
know that LZMA variants lead to nearly a 2x reduction in data transfer.

------
jleahy
Given that LZMA is the algorithm with the highest compression ratio and lowest
speed of all commonly used algorithms, this isn't particularly impressive.

Why is it better than bzip2, or gzip? Both of these are 2-3x faster than LZMA
(much more so for gzip --fast) but have lower compression ratios.

LZ4 would be the extreme example of an algorithm even faster than gzip (but
with still lower compression ratio).

~~~
AceJohnny2
FWIW, from the "Large Text Compression Benchmark" linked from that page:

    
    
                                      Compressed size     Decompresser Total size  Time (ns/byte)
      Program           Options       enwik8       enwik9  size (zip)  enwik9+prog  Comp  Decomp   Mem Alg  Note
    
      lzham alpha 3 x64 -m4 -d29  24,954,329  206,393,809   155,282 x  206,549,091   595       9  4800 LZ77 45
      gzip 1.3.5        -9        36,445,248  322,591,995    38,801 x  322,630,796   101      17   1.6 LZ77
      bzip2 1.0.2       -9        29,008,736  253,977,839    30,036 x  254,007,875   379     129     8 BWT
    

The salient point is the decompression time (ns/byte), which is 129 for bzip2, 17
for gzip, and... 9 (!) for lzham. So on that metric it blows them out of the
water, while still achieving a higher compression ratio on this type of input. As
the original webpage implies, this is perfect for stuff like video games.
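
Converting those per-byte times into throughput makes the gap more concrete (a
straight unit conversion from the table, nothing more):

    
    
        # decompression times from the table, in ns per byte
        for name, ns_per_byte in {"lzham": 9, "gzip": 17, "bzip2": 129}.items():
            mb_per_s = 1e9 / ns_per_byte / 1e6  # bytes per second -> MB/s
            print(f"{name}: ~{mb_per_s:.0f} MB/s")
        # lzham ~111 MB/s, gzip ~59 MB/s, bzip2 ~8 MB/s
    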

~~~
corysama
Is the "Mem" column saying that lzham used 4800 megabytes vs gzip using 1.6
megabytes? If that's so, it makes lzham much less attractive for console and
mobile games. If that could be reduced to 8 megabytes or so while still being
faster than gzip, then it would be very interesting.

For anyone publishing compression benchmarks: Please, please include memory
usage in the report! Very few reports bother to do so at the moment.
Meanwhile, many compressors are written on the assumption that since 16 gigs of
RAM is cheap these days, using a few extra gigs to get a speedup is totally
reasonable. It may be reasonable for you, but it makes the algo useless for me.

~~~
usefulcat
The mem column is memory used for compression.

------
giovannibajo1
Would be a good fit for UPX.

~~~
nekitamo
Isn't UPX pretty fast the way it is right now?

~~~
faragon
Not if your executable has embedded elements and is e.g. 500MB.

~~~
nekitamo
Out of curiosity, what kind of executable is this? Something generated by
HPHPc?

------
DiabloD3
How does it compare to LZ4?

~~~
lzhamfan1
[http://richg42.blogspot.ca/](http://richg42.blogspot.ca/)

Disclosure: I'm probably the only person to release a game that uses LZHAM
other than Rich… I chose it because I needed an LZMA-like compression
ratio with a faster decompressor. It performed up to expectations.

~~~
matt4711
What is your opinion on this LZ scheme:
[http://zola.di.unipi.it/rossano/wp-content/papercite-data/pd...](http://zola.di.unipi.it/rossano/wp-content/papercite-data/pdf/soda14.pdf)
which seems to have very good compression effectiveness and decompression time.

~~~
muizelaar
There's a review of that work here:
[http://encode.ru/threads/1903-quot-bi-criteria-quot-data-com...](http://encode.ru/threads/1903-quot-bi-criteria-quot-data-compression?p=41614&viewfull=1#post41614)

------
loisaidasam
Let's cut to the chase here... what's the Weissman score?

~~~
floatrock
And how does it perform on the 3D video benchmarks?

------
rasz_pl
compression 1-3MB/sec

decompression ~100MB/s

color me unimpressed :(

~~~
MacsHeadroom
How is a 3x improvement over LZMA not impressive?

------
jcburns
Glad to see the Pied Piper team's work come to fruition.

