
NNCP: Lossless Data Compression with Neural Networks - MrXOR
https://bellard.org/nncp/
======
cs702
A natural question I've pondered from time to time is whether Fabrice Bellard
is really a time traveler from a more advanced future civilization, sent back
to show us, mere mortals, what humankind will one day be capable of.

If this sounds far-fetched, consider that he has _created_ FFmpeg, QEMU,
LibBF, SoftFP, BPG, TinyEMU, a software implementation of 4G/LTE, a PC
emulator in JavaScript, the TCC compiler, TinyGL, LZEXE, and a tiny program
for computing the biggest known prime number.

And that's just a partial list of his successful projects, which now of course
also include software for more efficient lossless compression with deep LSTM
and Transformer neural networks.

Any of these projects, on its own, would be considered a notable achievement
for an ordinary human being.

Fabrice Bellard deserves some kind of superhuman lifetime achievement award.

Source: [https://bellard.org](https://bellard.org)

~~~
eismcc
Agree. It’s also a message to the rest of us that we are likely capable of a
lot more than we think.

~~~
makapuf
Agree. It's also an - involuntary - message to the rest of us that we are
likely capable of a lot less than he is.

------
raphlinus
And this is why Fabrice Bellard is so often cited as an example of a 10x
programmer: his relatively simple LSTM-based neural net compressor is 10x
faster than one of the more complex state-of-the-art algorithms while
delivering comparable compression ratios.

~~~
javierluraschi
Where is the reference to the 10X improvement? As in, which benchmark are you
comparing against? Thanks!

~~~
beagle3
As in “one Fabrice Bellard is worth 10 or more average programmers”.

~~~
javierluraschi
I mean, regarding algorithm speed.

From the conclusion:

“We presented a practical implementation of an LSTM based lossless compressor.
Although it is slow, its description is simple and the memory consumption is
reasonable compared to compressors giving a similar compression ratio. It
would greatly benefit from a dedicated hardware implementation.”

I only skimmed the paper, but it looks to me like this might be a 10X
improvement over other LSTM compression methods, not over state-of-the-art
compression.

That said, I'm far from being an expert in compression and was hoping someone
on HN could explain how relevant this is to their discipline.

~~~
stan_rogers
The comment you are replying to was trying to explain the "10X programmer"
idea. Nowhere was it claimed that the algorithm would have a 10X performance
improvement.

~~~
javierluraschi
No, I meant this comment: "His relatively simple LSTM-based neural net
compressor is 10x faster than one of the more complex state of the art
algorithms"

------
userbinator
Interesting to see another Fabrice project, and great to see compression
performance on par with the best available, but unfortunately it still falls
short of what's necessary to win this prize:

[http://prize.hutter1.net/index.htm](http://prize.hutter1.net/index.htm)

(I've computed that 14,826,395 bytes is required to become the next winner.
CMIX comes close, but its huge memory consumption disqualifies it.)
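
Presumably that figure follows from the prize's minimum-improvement rule. The
quick sketch below reproduces it under the assumption that the rule is a 3%
improvement over a standing record of 15,284,944 bytes; neither number appears
in this thread, so treat both as my assumption.

```python
# Hedged reconstruction of the 14,826,395-byte figure, assuming the Hutter
# Prize requires at least a 3% improvement over the standing enwik8 record
# (assumed here to be 15,284,944 bytes). Neither value comes from this thread.
record = 15_284_944
threshold = int(record * 0.97)
print(threshold)  # 14826395
```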

------
londons_explore
Note that this uses only the CPU to run the neural network.

Accelerators are now the norm for this sort of work, which is why this is so
slow.

Also note that this is closed source. Unusual for Bellard, and also unusual
for this sort of research.

~~~
loeg
> Also note this is closed source. Unusual for bellard

Not wholly unusual for Bellard. E.g.,
[https://bellard.org/lte/](https://bellard.org/lte/)

------
rosstex
Could someone explain how these are guaranteed to be lossless?

~~~
gwern
Arithmetic encoding
[https://en.wikipedia.org/wiki/Arithmetic_encoding](https://en.wikipedia.org/wiki/Arithmetic_encoding)
. The way I would put it is: the statistical model does its best to predict
the next bits, assigning short bit strings to what it thinks is likely to come
next and long bit strings to less likely next bits. If it is indeed right,
only the short bit strings need to be emitted; if it's wrong, more long bit
strings must be spent to correct it and emit exactly the right bit string.
(Like using shorthand with lots of convenient shortcuts, and then falling back
to painfully writing everything out explicitly.) So you can use imperfect,
lossy probabilistic models while still getting lossless compression.
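
Here is a minimal sketch of that idea in Python: a toy floating-point
arithmetic coder (only reliable for short inputs, and not NNCP's actual
implementation) in which a simple adaptive order-0 counter stands in for the
LSTM/Transformer predictor. The coder narrows an interval by the probability
the model assigned to the symbol that actually occurred, so good predictions
cost few bits, and the decoder, replaying the same model, recovers the input
exactly.

```python
# Toy arithmetic coder (floating point, only safe for short inputs).
# Illustrative only: NNCP uses an integer-precision coder and a neural
# predictor; here adaptive symbol counts play the role of the model.

class AdaptiveModel:
    """Stand-in predictor: adaptive counts instead of an LSTM/Transformer."""
    def __init__(self, alphabet):
        self.counts = {s: 1 for s in sorted(alphabet)}

    def predict(self):
        # (symbol, probability) pairs in a fixed order shared by both sides.
        total = sum(self.counts.values())
        return [(s, c / total) for s, c in sorted(self.counts.items())]

    def update(self, sym):
        # Adapt only after the true symbol is known, so the decoder can
        # reproduce the exact same sequence of predictions.
        self.counts[sym] += 1


def encode(symbols, model):
    low, high = 0.0, 1.0
    for sym in symbols:
        span, cum = high - low, 0.0
        for s, p in model.predict():
            if s == sym:
                # Narrow the interval to the sub-range the model assigned
                # to the symbol that actually occurred.
                low, high = low + span * cum, low + span * (cum + p)
                break
            cum += p
        model.update(sym)
    return (low + high) / 2  # any number inside the final interval


def decode(code, length, model):
    out = []
    low, high = 0.0, 1.0
    for _ in range(length):
        span, cum = high - low, 0.0
        for s, p in model.predict():
            if low + span * cum <= code < low + span * (cum + p):
                out.append(s)
                low, high = low + span * cum, low + span * (cum + p)
                model.update(s)
                break
            cum += p
    return out


msg = list("abracadabra")
code = encode(msg, AdaptiveModel(msg))
assert decode(code, len(msg), AdaptiveModel(msg)) == msg
```

The important property is that the model never has to be right: a poor
prediction only means the matching sub-interval is small, which costs more
output bits, but it never corrupts the data.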

------
jason_slack
I wish I could ask Fabrice how he structures his day/time and balances life
and family.

Edit: typo

------
thesz
Why did he not use Temporal Convolutional Networks?

[1]
[https://openreview.net/pdf?id=rk8wKk-R-](https://openreview.net/pdf?id=rk8wKk-R-)

[1] shows that TCNs may outperform LSTMs on tasks that really need memory.

~~~
317070
Because LSTMs are proven over time to be industry grade, and are not latest-
greatest-but-no-one-has-really-reproduced-the-results-yet.

A paper with inductive results is just a first step in a really long process
before its ideas become commonly accepted.

~~~
thesz
Every time you say “LSTM is industry grade,” I can point you to large
companies (Google-level) that train their acoustic models with TCN
architectures.

He used a Transformer, so why not a TCN?

