

Finite State Entropy - A new breed of entropy coder - ch
http://fastcompression.blogspot.fr/2013/12/finite-state-entropy-new-breed-of.html

======
the_french
This was a nice article.

In the comments Jarek Duda discusses some of the more technical details of the
implementation and of his ANS paper that enabled this writeup. One cool thing
I noticed is this:

    Another advantage of ANS is that we can slightly perturb
    the initialization procedure using a pseudo-random number
    generator initialized with cryptographic key (e.g. and
    also the number of block): for example choosing between
    the lowest weight symbol and the second best one.
    This way we get simultaneously a decent encryption for free.
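
A rough C sketch of how I read that idea (this is not Duda's actual procedure or the FSE code; keyed_prng_next stands in for any keyed PRF seeded with the key and block number, and all symbol probabilities are assumed nonzero): during the table-spread step of tANS initialization, one keystream bit occasionally swaps the best candidate symbol for the runner-up, so the coding tables, and hence the bitstream, depend on the key.

    #include <stdint.h>

    /* stand-in for any keyed PRF, seeded with the key and block number */
    extern uint32_t keyed_prng_next(void *ks);

    void spread_perturbed(uint8_t *table, int table_size,
                          const double *prob, int nsym, void *ks)
    {
        double next[256];                      /* next expected slot per symbol */
        for (int s = 0; s < nsym; s++)
            next[s] = 0.5 / prob[s];           /* "precise initialization" start */

        for (int pos = 0; pos < table_size; pos++) {
            /* find the two symbols whose next expected slot is earliest */
            int best = 0, runner = -1;
            for (int s = 1; s < nsym; s++) {
                if (next[s] < next[best]) { runner = best; best = s; }
                else if (runner < 0 || next[s] < next[runner]) runner = s;
            }
            /* one keystream bit chooses between them */
            int pick = (runner >= 0 && (keyed_prng_next(ks) & 1)) ? runner : best;
            table[pos] = (uint8_t)pick;
            next[pick] += 1.0 / prob[pick];
        }
    }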

Does anyone know what modern compression tools use for entropy coding? Do they
all use Huffman, and would this help them increase speed? What about gzip for
HTTP transactions?

------
ch
There is a nice treatment of entropy coders in Managing Gigabytes
([http://ww2.cs.mu.oz.au/mg/](http://ww2.cs.mu.oz.au/mg/)). I highly recommend
the text, as it gives good background on entropy encodings and also takes a
nice engineering approach to implementation.

------
nova
Arithmetic coding, one of the early victims of software patents.

------
cromwellian
"In a nutshell, this coder provides the same level of performance as
Arithmetic coder, but only requires additions, masks, and shifts."

Didn't IBM's Q-Coder also make the same claims?

Here's another one:
[http://www.gvu.gatech.edu/people/official/jarek/courses/7491...](http://www.gvu.gatech.edu/people/official/jarek/courses/7491/Arithmetic2.pdf)
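
For what it's worth, the claim is easy to see from the shape of the decoder: a
table-driven ANS (tANS/FSE) decode step is one table lookup plus a mask, a
shift, and an add, with no multiply or divide. A minimal sketch, with
illustrative struct and field names rather than the real FSE API, and the
bit-buffer refill omitted:

    #include <stdint.h>

    typedef struct {
        uint8_t  symbol;     /* symbol emitted from this state */
        uint8_t  nb_bits;    /* bits to pull from the stream */
        uint16_t new_state;  /* base of the next state */
    } decode_entry;

    typedef struct {
        uint64_t bits;       /* bit buffer (refill from input omitted) */
        int      count;      /* valid bits remaining in the buffer */
    } bit_reader;

    static uint8_t decode_one(const decode_entry *t, uint32_t *state,
                              bit_reader *br)
    {
        decode_entry e = t[*state];
        uint32_t rest = (uint32_t)(br->bits & ((1u << e.nb_bits) - 1)); /* mask */
        br->bits  >>= e.nb_bits;                                        /* shift */
        br->count  -= e.nb_bits;
        *state = e.new_state + rest;                                    /* add */
        return e.symbol;
    }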

~~~
Tractor4973
The Howard-Vitter coder only works for a binary alphabet (0/1).

------
thristian
Well, this is interesting.

I toyed around with entropy encoding at one point, based on a republished copy
of an old Dr. Dobb's Journal article I found, and I managed to cleanly
reimplement the described algorithm in Python and reproduce the example
mentioned in the article... but eventually I hit a problem I wasn't smart
enough to debug. I'm eager to see whether this new algorithm is more
approachable.

------
beagle3
Does anyone know the real history of arithmetic encoders?

A lot of places seem to credit Rissanen, but I remember, from when this was
actually my field, that Elias did most of the work in the early '60s
(following up on a tangent Shannon had considered in the '50s), and that
Rissanen's contribution was getting the finite-word-length arithmetic
considerations properly done.

In fact, when I studied this at university (early '90s), it was introduced
as "Elias (aka Arithmetic) coding".

~~~
jmspring
Glen Langdon, who authored early papers on arithmetic coding with Jorma
Rissanen, was my advisor. I'm not sure if I still have notes on what Glen
taught in his data compression classes, but I will see what I have.

I've been meaning to put notes from his classes online, as well as some
transform mathematics notes from one of David Huffman's classes.

------
eliasmacpherson
How does it compare to the algorithm used in FLAC or MPEG-4 lossless? Is there
a reasonable way to benchmark the differences?
[https://en.wikipedia.org/wiki/Golomb_coding#Simple_algorithm](https://en.wikipedia.org/wiki/Golomb_coding#Simple_algorithm)

~~~
Tractor4973
Golomb is way too static; it doesn't compress as well as Huffman, let alone
FSE.
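
To make "static" concrete: Rice coding (Golomb with a power-of-two divisor) is
the variant FLAC uses, and its entire model is a single parameter k. A minimal
sketch, where write_bit and write_bits stand in for any bit writer:

    #include <stdint.h>

    extern void write_bit(int b);               /* stand-in bit writer */
    extern void write_bits(uint32_t v, int n);  /* writes the n low bits of v */

    /* Rice code: quotient in unary, remainder in k raw bits */
    void rice_encode(uint32_t x, int k)
    {
        uint32_t q = x >> k;                    /* quotient */
        while (q--)
            write_bit(1);
        write_bit(0);                           /* unary terminator */
        write_bits(x & ((1u << k) - 1), k);     /* remainder */
    }

Since k is fixed per block, the code is only efficient when the residuals
actually follow the geometric distribution it assumes; Huffman fits a whole
code table to the data, and FSE adds fractional-bit precision on top of that.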

------
vanderZwan
So if I understand this correctly, how much it outperforms Huffman greatly
depends on the data being compressed. Which makes me wonder: what kind of
real-life datasets would benefit the most from this new approach?

~~~
jmpe
Check out the FiniteStateEntropy/test/probaGenerator.c code.

It builds a table of character sequences based on a probability. If you feed
it p=0.5, you get a sequence that halves each time:

0000000011111223 (shortened example; the real total length is 4096, so you get
2048x0, 1024x1 ...)

This is then used as an alphabet probability to build random sequences in the
test file.
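
A rough sketch of that generator as I read it (my reconstruction, not the
actual probaGenerator.c): each successive symbol takes a p-fraction of the
slots that remain, so p=0.5 over 4096 slots yields 2048 '0's, 1024 '1's,
512 '2's, and so on.

    #include <stdio.h>

    #define TABLE_SIZE 4096

    int main(void)
    {
        static char table[TABLE_SIZE];
        double p = 0.5;
        int pos = 0;
        for (char sym = '0'; pos < TABLE_SIZE; sym++) {
            int remaining = TABLE_SIZE - pos;
            int n = (int)(remaining * p);   /* p-fraction of what's left */
            if (n < 1)
                n = 1;                      /* always make progress */
            for (int i = 0; i < n && pos < TABLE_SIZE; i++)
                table[pos++] = sym;
        }
        fwrite(table, 1, TABLE_SIZE, stdout);
        return 0;
    }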

The algorithm scores better than Huffman for higher p, meaning the alphabet
shrinks to a few dominant symbols. This means that (statistically) you get a
lot of words in your dictionary that resemble each other, and that's where the
fractional probability plays its role: Huffman has to spend at least one whole
bit per symbol, while FSE can spend a fraction (a symbol with p=0.95 carries
only about 0.07 bits of information).

To try to answer your question, I'd guess that this technique shows its muscle
on data sets in which the dictionary is large and consists of similar
sequences. Examples: a CSV file with a lot of numbers, GIS or KML datasets,
Gerber files, ...

