
DeepZip: Lossless Compression Using Recurrent Networks [pdf] - stelonix
https://web.stanford.edu/class/cs224n/reports/2761006.pdf
======
danielbigham
I have a suspicion that in the future computers will use a vaguely DNN-like
approach for compression, but I haven't seen any super compelling examples of
this yet. I can say "imagine a photo taken at 2000 feet and 2 miles away of
the Eiffel Tower with a beautiful orange sunset in the background", and your
mind can use its internal representation to render something along those
lines. I feel like computers some day should be able to do the same thing,
except that with another kilobyte of constraints I should be able to make the
final result really close to a particular photo that matches that description.

~~~
anonfunction
Though it has little bearing on your point, many people including myself do
not possess the ability to "imagine a photo". It is known as aphantasia and
Blake Ross has a great article[1] on the subject.

1\. [https://www.facebook.com/notes/blake-ross/aphantasia-how-
it-...](https://www.facebook.com/notes/blake-ross/aphantasia-how-it-feels-to-
be-blind-in-your-mind/10156834777480504/)

~~~
exikyut
Well, in an even _more_ distant future, a more developed version of the kind
of neural networks that the GP describes could be hooked up to your brain to
provide this functionality.

------
visarga
On the one hand, DeepZip does neural net based compression.

On the other hand, in "The Case for Learned Index Structures" [1] paper, they
replace B-Trees with neural nets:

> by using neural nets we are able to outperform cache optimized B-Trees by up
> to 70% in speed while saving an order of magnitude in memory over several
> real world data sets

Basically, ML/DL is coming for the classic data structures, replacing some of
them with learning-based versions.

[1] [https://arxiv.org/abs/1712.01208](https://arxiv.org/abs/1712.01208)
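
For anyone unfamiliar with the idea, here is a toy sketch of a learned-index
lookup in Python. A single least-squares line stands in for the paper's staged
neural nets, so this is only an illustration of the mechanism, not their
implementation: a model predicts a key's position, and a bounded local search
within the model's worst-case error window corrects the prediction.

    import bisect

    class LearnedIndex:
        # Toy learned index over a sorted array of distinct keys: a model
        # predicts a key's position, then a bounded local search corrects it.
        def __init__(self, sorted_keys):
            self.keys = sorted_keys
            n = len(sorted_keys)
            xs, ys = sorted_keys, range(n)
            # Fit position ~ a*key + b by least squares.
            mean_x, mean_y = sum(xs) / n, sum(ys) / n
            var = sum((x - mean_x) ** 2 for x in xs) or 1.0
            self.a = sum((x - mean_x) * (y - mean_y)
                         for x, y in zip(xs, ys)) / var
            self.b = mean_y - self.a * mean_x
            # Worst-case prediction error bounds the search window.
            self.err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(xs))

        def _predict(self, key):
            return min(max(int(self.a * key + self.b), 0), len(self.keys) - 1)

        def lookup(self, key):
            p = self._predict(key)
            lo = max(0, p - self.err)
            hi = min(len(self.keys), p + self.err + 1)
            i = bisect.bisect_left(self.keys, key, lo, hi)
            return i if i < len(self.keys) and self.keys[i] == key else None

    idx = LearnedIndex(list(range(0, 10000, 7)))
    assert idx.lookup(7007) == 1001 and idx.lookup(5) is None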

------
brian_cloutier
Can someone who knows more about neural networks tell me how this works?
Apparently the model's weights don't need to be sent over the network. The
only thing which is shared is some seed used to initialize the model with
random weights. While the network encodes data, it simultaneously updates its
weights. To decode, somehow you can run the same process and magically get
back the original data?

The paper doesn't explain how this is possible, how would I go about
implementing something like this?

~~~
john_moscow
The naive solution would be to split the input data into batches. Compress
the first batch using a random model, then train a new model on it. Compress
the second batch using the model trained on the first batch, and so on.
Depending on the structure of your data, you can use a model trained on all
the previous batches (or just the N previous batches) when compressing the
i-th batch.

This can likely be optimized (e.g. 7-Zip's arithmetic coder doesn't split
data into batches, but rather uses exponentially filtered conditional
probabilities that are updated instantly each time a new sample is
processed). I'm no expert in RNNs, but my hunch is that you can adjust the
model incrementally using backpropagation right after it processes each
sample, instead of completely retraining it.
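
For what it's worth, here is a runnable toy in Python that isolates the
synchronization trick. An adaptive byte-count model stands in for the RNN and
symbol ranks stand in for real arithmetic coding, so none of this is the
paper's actual implementation; the point is only that encoder and decoder
start from the same state (in DeepZip, weights derived from the shared random
seed) and apply identical updates after every symbol, so the decoder can
reproduce the encoder's predictions without ever receiving the weights.

    from collections import Counter

    def ranked_symbols(counts):
        # Symbols ordered from most to least probable under the current model.
        return sorted(range(256), key=lambda s: (-counts[s], s))

    def encode(data):
        counts = Counter({s: 1 for s in range(256)})  # shared initial state
        ranks = []
        for b in data:
            order = ranked_symbols(counts)
            ranks.append(order.index(b))  # emit the rank, not the byte itself
            counts[b] += 1                # update the model after each symbol
        return ranks                      # a real scheme would entropy-code these

    def decode(ranks):
        counts = Counter({s: 1 for s in range(256)})  # identical initial state
        data = bytearray()
        for r in ranks:
            order = ranked_symbols(counts)
            b = order[r]                  # same state -> same prediction
            data.append(b)
            counts[b] += 1                # identical update keeps models in sync
        return bytes(data)

    msg = b"abracadabra abracadabra"
    assert decode(encode(msg)) == msg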

~~~
brian_cloutier
Interesting.

So, to decode, you would start with the unoptimized model and decode the
first batch. You then train on that first batch to get a better model and use
the improved model to decode the next batch.

Okay, that makes sense; thanks for the response!

------
NegatioN
"We know that Recurrent Neural Networks (LSTM/GRU) based models are good at
capturing long term dependencies"

This seems like a bold statement. We know they are good at capturing some
structure from sequential data at least, but they seem lacking in some regards
[0].

However, it seems to yield some interesting results. The architecture is like
an autoencoder, but with RNNs connected at each layer? Have you tried using
NTMs?

[0]:
[https://rylanschaeffer.github.io/content/research/neural_tur...](https://rylanschaeffer.github.io/content/research/neural_turing_machine/main.html)
(section on LSTM Copy Performance)

------
rytill
Does anyone have a link to a good replacement resource for the 2nd source? The
link is broken.

[2] EE376C Lecture Notes: Universal Schemes in Information Theory:
[http://web.stanford.edu/class/ee376c/lecturenotes/intro_lect...](http://web.stanford.edu/class/ee376c/lecturenotes/intro_lecture_slides_ee376c_fall_2016_2017.pdf)

------
zitterbewegung
Why aren’t they comparing it to parq or even bzip?

~~~
lern_too_spel
Because the author compares to the best domain-specific compressors (by
compression ratio) for each dataset instead. For text, he used CMIX[1]. For
DNA sequences, he used MFCompress[2].

[1] [http://mattmahoney.net/dc/text.html](http://mattmahoney.net/dc/text.html)

[2]
[http://bioinformatics.ua.pt/software/mfcompress/](http://bioinformatics.ua.pt/software/mfcompress/)

------
Torkel
nit: an abstract should describe the obtained result

------
YetAnotherNick
Correct me if I am wrong, but it seems they haven't counted the network
storage.

~~~
batmansmk
Quoting: " If the model is trained for more than an epoch, the weights of the
RNN Estimator Block also need to be stored, and would count in the compression
size (MDL Principle [14])"

~~~
YetAnotherNick
After another read: they only store the seed, not the weights, so decoding
has to rebuild the model by replaying the entire stream, and the compressed
format cannot be transferred or decoded piecemeal without all the data.

> Only random seed is stored: The model is initialized using a stored random
> seed, which is also communicated to the decoder. Thus, effectively, the
> model weights do not contribute to the total compressed data size

> Single Pass: As we do not store the weights of the model, we perform a
> single pass (1 epoch) through the data

------
maliker
Could the Weissman score be higher than middle-out methods?

