
Cmix is a lossless data compression program aimed at optimizing compression ratio - luu
http://www.byronknoll.com/cmix.html
======
cba9
> cmix has surpassed the winning entry of the Hutter Prize (but exceeds the
> memory limits of the contest)...cmix uses a similar neural network
> architecture to paq8l. cmix uses a four layer feedforward network with
> 414,273 neurons (paq8l uses a three layer network with 3,633 neurons).
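The mixing networks the quote describes are built from logistic mixing units: each model emits a bit probability, the mixer combines them in the logit domain, and the weights are trained online. A minimal single-neuron sketch of that idea (the standard PAQ-style formulation; class and variable names here are mine, not cmix's, and the real networks stack many of these units into layers):

```python
import math

def squash(x):
    """Logistic function: map a stretched value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def stretch(p):
    """Inverse logistic (logit)."""
    return math.log(p / (1.0 - p))

class LogisticMixer:
    """One mixing neuron: combines model predictions in the logit
    domain, with weights adapted online by gradient descent on
    coding cost. PAQ/cmix stack many of these into a network."""

    def __init__(self, n_models, lr=0.02):
        self.w = [0.0] * n_models  # one weight per input model
        self.lr = lr
        self.inputs = [0.0] * n_models
        self.p = 0.5

    def predict(self, probs):
        """Mix the models' bit probabilities into one probability."""
        self.inputs = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.w, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit):
        """After seeing the actual bit (0 or 1), nudge the weights
        toward the models that predicted it well."""
        err = bit - self.p
        for i, x in enumerate(self.inputs):
            self.w[i] += self.lr * err * x
```

After training on a stream, the mixer learns larger weights for whichever models were predicting accurately, which is what lets hundreds of specialized context models coexist without drowning each other out.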

This is something I've suspected for a while: the Hutter Prize has become
obsolete. I agree with the 'prediction is intelligence' paradigm, but the
Hutter Prize cannot show any meaningful progress in AI for the simple reason
that the corpus is too small and the resource limits are too stringent. 4
layers with 414k neurons is nowhere near what deep learning researchers use to
create intelligent systems. You couldn't fit char-rnn into the Hutter Prize
constraints! At this point, all the prize incentivizes is finding an algorithm
which fits into the allotted resources, no matter how dumb and inferior that
algorithm is to the state of the art which requires more resources to run.
Bit-twiddling to optimize the current approaches may be fun for a certain kind
of person, but it's hard to see how the contest has contributed at all to
progress in AI in the past decade, which was its original purpose.

Perhaps what is necessary is a Hutter Prize 2: expand the corpus to the modern
English Wikipedia (not one from a decade ago), perhaps add in other language
corpora to push it to multiple gigabytes, and then increase the resource
limits drastically (32GB RAM for starters, so cmix can get in), and shut down
the old one, which has reached diminishing returns. A desktop machine these
days can certainly support a decent amount of RAM and computation (on Newegg,
you can get a refurbished desktop tower with 32GB RAM for <$1000), so there's
no cost excuse if one can afford to offer thousands of euros as a prize...

------
edent
> At least 32GB of RAM is recommended to run cmix.

Youch! I _totally_ get what they're trying to do here and that one day even
the cheapest toy will have 10x that RAM. But still...

~~~
DannyBee
Considering it is 30-300x slower than its nearest competitors, RAM is your
smallest problem ;-)

(see
[http://mattmahoney.net/dc/text.html](http://mattmahoney.net/dc/text.html))

Note that sadly, decompression is equally slow, so it doesn't even make a lot
of sense for something that is "compress once, distribute billions of times".

~~~
dllu
The aim of cmix is to use as many context-mixing models as necessary to get
the best possible compression ratio, as an academic exercise. Little attempt
was made to make it faster or use less memory. That said, the author recently
communicated to me an idea to make it use orders of magnitude less memory.
Instead of running all the context models in one pass while storing all their
states, one could run each model in a separate pass, so that only the memory
for one context model needs to be stored at any point in time. Since the
original program was not multithreaded, this does not affect the running time.
Unfortunately that requires rewriting the whole thing and he is preoccupied
with other projects at the moment.
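The memory-saving idea above, running one model per pass and keeping only its prediction stream, can be sketched in a few lines. This is purely illustrative of the compression-side scheme dllu describes (the toy order-0 model, the averaging mixer, and all names are my own stand-ins; cmix's real models and neural mixer are far more elaborate):

```python
class Order0Model:
    """Toy adaptive bit model standing in for one cmix context model."""

    def __init__(self):
        self.ones = 1   # Laplace-smoothed counts
        self.total = 2

    def predict(self):
        return self.ones / self.total

    def update(self, bit):
        self.ones += bit
        self.total += 1

def run_model_pass(model, bits):
    """Run one context model over the whole input, recording its
    per-bit predictions. Only this model's state is live in memory;
    the output stream could be spooled to disk between passes."""
    preds = []
    for bit in bits:
        preds.append(model.predict())
        model.update(bit)
    return preds

def compress_multipass(models, bits):
    """Passes 1..N: one model at a time, so peak memory is the
    largest single model rather than the sum of all models."""
    streams = [run_model_pass(m, bits) for m in models]
    # Final pass: mix the stored prediction streams per bit; the
    # mixed probability would feed an arithmetic coder (omitted).
    mixed = []
    for i, bit in enumerate(bits):
        p = sum(s[i] for s in streams) / len(streams)  # toy averaging mixer
        mixed.append(p)
    return mixed
```

Since the passes are sequential anyway in a single-threaded program, the total work is roughly the same as the one-pass version; only the peak memory changes.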

------
Cognitron
What's its Weissman score?

