
Rate-distortion optimization - atesti
https://fgiesen.wordpress.com/2018/12/10/rate-distortion-optimization/
======
tnecniv
If you're interested in this topic, the "Information Bottleneck Method" [1] is
a seminal paper in the area. In rate-distortion theory, you need to pick a
distortion function that identifies what information you would like to
maintain in your lossy compression. This paper justifies the choice of a kind
of KL-divergence as the distortion function, and gives a simple and efficient
algorithm for solving the optimization problem with this distortion function.
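
For readers who want the shape of the problem, here is a sketch of the objective and the self-consistent equations that the iterative algorithm alternates between. The notation is an assumption for illustration (T for the compressed variable, Y for the variable whose information should be kept, beta for the trade-off parameter); the paper's own symbols differ.

```latex
\begin{aligned}
&\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y) \\[4pt]
&p(t \mid x) = \frac{p(t)}{Z(x,\beta)}
   \exp\!\Big(-\beta\, D_{\mathrm{KL}}\big[\,p(y \mid x) \,\big\|\, p(y \mid t)\,\big]\Big) \\
&p(t) = \sum_x p(x)\, p(t \mid x) \\
&p(y \mid t) = \frac{1}{p(t)} \sum_x p(y \mid x)\, p(t \mid x)\, p(x)
\end{aligned}
```

The KL term in the exponent plays the role of the distortion between x and its compressed representation t, and Z(x, beta) is just the normalizer.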

[1]
[https://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf](https://www.cs.huji.ac.il/labs/learning/Papers/allerton.pdf)

~~~
ssivark
I really like Kenneth Rose's review [1] of deterministic annealing, which is
the simplest and most concrete example of the Information Bottleneck method
that I know of. I wanted to work with a concrete example and get some intuition, so
I wrote some Julia code [2] a couple of years ago. Should be fairly easy to
tweak it to run with Julia v1.0, or translate to Python, in case anyone's
interested.
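
To give a flavour of the idea without opening the notebook, here is a minimal sketch in Python. It is not the linked Julia code: the function name, the 1-D data, the squared-error distortion and the cooling schedule are all assumptions chosen for illustration. Each point is softly assigned to every centroid with weight proportional to exp(-distortion / T), the centroids are re-estimated, and T is lowered gradually.

```python
import math
import random


def deterministic_annealing(points, k, t_start=100.0, t_end=0.01,
                            cooling=0.9, inner_iters=20, seed=0):
    """Soft-clustering sketch for 1-D points with squared-error distortion.

    Hypothetical helper for illustration; parameters are not tuned.
    """
    rng = random.Random(seed)
    mean = sum(points) / len(points)
    centroids = [mean] * k  # at high temperature everything sits at the mean
    t = t_start
    while t > t_end:
        # Tiny re-perturbation so coincident centroids can split
        # once the temperature is low enough.
        centroids = [c + 1e-6 * rng.uniform(-1.0, 1.0) for c in centroids]
        for _ in range(inner_iters):
            weighted = [0.0] * k
            totals = [0.0] * k
            for x in points:
                d = [(x - c) ** 2 for c in centroids]
                d_min = min(d)  # subtract the minimum for numerical stability
                w = [math.exp(-(di - d_min) / t) for di in d]
                z = sum(w)
                for j in range(k):
                    p = w[j] / z  # soft assignment p(j | x)
                    weighted[j] += p * x
                    totals[j] += p
            centroids = [weighted[j] / totals[j] if totals[j] > 0 else centroids[j]
                         for j in range(k)]
        t *= cooling
    return sorted(centroids)


centers = deterministic_annealing([-0.5, 0.0, 0.5, 9.5, 10.0, 10.5], k=2)
```

At high T the assignments are nearly uniform and the centroids stay together; as T drops they separate and the assignments harden, so the end result resembles k-means without depending on a lucky initialization.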

[1]:
[https://scl.ece.ucsb.edu/sites/scl.ece.ucsb.edu/files/public...](https://scl.ece.ucsb.edu/sites/scl.ece.ucsb.edu/files/publications/b98_2_0.pdf)

[2]:
[https://github.com/sivark/DeterministicAnnealing/blob/master...](https://github.com/sivark/DeterministicAnnealing/blob/master/DeterministicAnnealing.ipynb)

------
srean
I think the post missed a great opportunity to point out that rate-distortion
theory is the fundamental problem that we are stuck at -- be it viewed as an
information theory problem, machine learning problem, statistics problem,
signal processing problem ...

This. is. the. holy. grail.

Note that for the lossless case with discrete alphabets we have polynomial-time
universal algorithms for source coding. No matter what the unknown
distribution is, these algorithms (eventually) guarantee the best performance
we could have achieved had we known the distribution ahead of time. No need to
assume a parametric family of distributions, etc. We have not been able to
solve this for the continuous-alphabet / lossy / rate-distortion case. We
don't know of any universal algorithm that's efficient. If we knew how to
build one, we would have solved ML/signal processing/statistics etc., and gone
home (to build other things). Even an impossibility result would help -- one
that says you cannot have an efficient algorithm unless, say, P=NP.
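
The comment above does not name a specific algorithm. The Lempel-Ziv family is one well-known example of universal lossless coding, so here is a minimal LZ78-style sketch in Python (function names are hypothetical, and a real coder would also pack the pairs into bits). Note that nothing about the source distribution is passed in; the dictionary is learned from the data as it goes.

```python
def lz78_encode(text):
    """Parse text into (dictionary_index, next_char) pairs, LZ78 style."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrase = ""
    pairs = []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known phrase
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as (prefix, last char).
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


sample = "ab" * 50
encoded = lz78_encode(sample)
assert lz78_decode(encoded) == sample
```

On repetitive input the phrases grow longer as the dictionary fills, so the number of pairs grows much more slowly than the input length.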

------
ahartmetz
This guy regularly explains difficult and interesting topics very well. The
"trip through the graphics pipeline" is probably the best explanation of
modern GPUs on the internet.

------
ttoinou
Interesting. The quest to choose the right lambda seems obvious to me: the
two variables R and D don't have the same unit/dimension, so lambda basically
means "how much bitrate are we ready to give up to compensate for this kind of
distortion". From a physical point of view, lambda is a conversion factor from
one unit to the other.
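
A small sketch of that reading in Python, assuming the usual Lagrangian cost D + lambda * R. The candidate list and the numbers are made up for illustration; lambda is then expressed in distortion units per bit.

```python
def pick_option(options, lam):
    """Return the (rate_bits, distortion) pair minimizing distortion + lam * rate.

    `options` is a hypothetical list of candidate encodings for one block.
    """
    return min(options, key=lambda o: o[1] + lam * o[0])


# Hypothetical candidates: (rate in bits, distortion in squared-error units).
candidates = [(100, 50.0), (200, 20.0), (400, 5.0)]

cheap_bits = pick_option(candidates, lam=0.01)   # bits are cheap: spend them
balanced = pick_option(candidates, lam=0.2)      # middle ground
costly_bits = pick_option(candidates, lam=1.0)   # bits are expensive: save them
```

Raising lambda makes each bit "cost" more distortion units, so the choice moves toward the low-rate, high-distortion end of the list.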

~~~
srean
Too bad that you are not into Economics. Your comment would have earned you a
Nobel, maybe more than one [dead straight face serious]

~~~
ttoinou
^^ which paper? I usually get sick of economics grounded on / modeled by
maths

~~~
srean
A few economics Nobel prizes were essentially rediscoveries of the Lagrange
multiplier method, which is bread and butter stuff in optimization.

