
Discrete Cosine Transform in Video Compression – Explain Like I'm Five - ponderingfish
https://ottverse.com/discrete-cosine-transform-dct-video-compression/
======
JackFr
Maybe I'm dense, but I didn't find this useful at all. The author spends the
bulk of the article explaining what transform means (not typically the
sticking point) and then uses terms like "pixel-domain", "frequency-domain",
"decorrelating", "energy compacting" without explanation or definition. The
author could spend a little less time on the former and more on the latter
IMHO.

~~~
OneGuy123
This, it's funny how the author talks about views/transforms and all other
stuff EXCEPT the actual intuition behind the DCT.

~~~
ponker
That’s because it’s the hardest thing to grok. I have two electrical
engineering degrees from a highly regarded university and still don’t.

------
cdavid
I am not convinced it is eli5. I am too lazy to write a blog post w/
illustration, but for audio signals, which I am more familiar with, the
intuition behind DCT (and MDCT, used e.g. for mp3) is straightforward.

Assuming you understand that a Fourier transform is an operation to go from
the time domain to the frequency domain, the problem solved by DCT, DST, etc.
is related to the fact that _digital_ signals are finite, and
without any care, you introduce 'irregularities' if you use a 'normal' Fourier
transform.

So the main idea of DCT/DST/etc. is to implicitly 'copy' and/or mirror the
signal, to reduce the artefacts/irregularities introduced by Fourier
Transform. Reducing irregularities intuitively leads to more regular signals,
and the more regular your signal, the quicker the high frequencies decrease,
which is the compression effect of DCT.

More mathematically, but still very informally: DCT/DST is about boundary
conditions. Using DFT (the 'normal' Fourier transform for digital signals)
will imply discontinuities at the boundaries. For continuous time signals, an
intuitive way to define regularity is to measure the decay of successive
derivatives of a function f(t), by looking at the convergence of t^n
f(t) as t -> inf for each n. That implies that regular functions have bounded
Fourier transforms, and the more regular, the faster the Fourier transform
decays.

The DST/DCT, by mirroring/copying the signal, reduce irregularities, and hence
their coefficients decrease faster.
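
A minimal sketch of that claim in plain Python. The naive DFT, the 32-sample
ramp, and the helper names are all assumptions made for illustration; none of
them come from the article. A ramp is a convenient test because its periodic
copy has a big jump at the boundary while its mirrored copy has none.

```python
import cmath
import math


def dft(x):
    """Naive unitary DFT; implies a periodic copy of the signal."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def dct(x):
    """Orthonormal DCT-II; implies a mirrored copy of the signal."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def energy_share(coeffs, m):
    """Fraction of total energy held by the m largest coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[:m]) / sum(energies)


# Assumed test signal: a ramp. Its periodic copy jumps from 31 back to 0,
# while its mirrored copy has no jump at the boundary.
ramp = [float(i) for i in range(32)]
dft_share = energy_share(dft(ramp), 4)
dct_share = energy_share(dct(ramp), 4)
print(f"energy in 4 largest coefficients: DFT {dft_share:.4f}, DCT {dct_share:.4f}")
```

On this ramp the four largest DCT coefficients should hold more than 99% of
the energy, while the four largest DFT coefficients hold clearly less, which
is the "coefficients decrease faster" effect in numbers.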

~~~
jerf
ELI5 was a mistake. ELI12 would have made much more sense. ELI5 is so
obviously absurd that people seem to subconsciously just discard the criterion
and put all sorts of things in that no conceivable 5-year-olds would
understand, not even the genius ones we occasionally read about. ELI5 is
extremely restrictive. ELI12 would be much more sensible. An ELI12 explanation
would be possible, but it wouldn't be able to casually assume "frequency
domain" or related concepts; it would first
have to show how you can break images down into frequency components.

~~~
cdavid
Yes. I think you can bypass the idea of "frequency domain", and focus on
regular / fast changing. I think a smart 12 year old would understand the
intuition that fast changing and high frequencies are somehow related, I mean
physics gives plenty of concrete examples.

And then you explain that for images, contours are fast changing, and you have
the justification for Fourier coefficients truncation ~ compression.

Explaining how DCT helps instead of DFT is the harder part.

------
Diti
This is DEFINITELY NOT explained like I’m five. More like late high school
level (mathematical planes, frequencies) and above (matrixes). I appreciate
the author’s attempt at vulgarization but the title is wrong.

~~~
olx_designer
Can this be used as feature extraction for an image recognition task? I
read some time ago that someone used JPEG compression as a feature extraction
step before a DNN.

~~~
qayxc
This depends on the kind of features you're interested in.

If the features lead to clustering in the frequency domain, then yes.
Otherwise it'd be detrimental.

Examples for features that'd work well are edges and lines. Features based on
gradients wouldn't work well.

------
walagran
Pretty cool! I would add Computerphile's Youtube Video as a watch after this:
[https://www.youtube.com/watch?v=Q2aEzeMDHMA](https://www.youtube.com/watch?v=Q2aEzeMDHMA).
This is a part of their 4-video series on
JPEG: [https://www.youtube.com/playlist?list=PLQfOC23r609kmgOr_V8sf...](https://www.youtube.com/playlist?list=PLQfOC23r609kmgOr_V8sfUnr0r3pOI9gT)

~~~
djmips
Those are great! For a more in depth series of videos on the transforms
themselves I haven't found anything better than videos from Steve Brunton.
Here's a link jumping in at Fourier Series. Best set of videos I've seen on
the subject.

[https://www.youtube.com/watch?v=MB6XGQWLV04&list=PLMrJAkhIeN...](https://www.youtube.com/watch?v=MB6XGQWLV04&list=PLMrJAkhIeNNRjxJ_sMtJ02geqw_-vuB7O&index=47)

------
EForEndeavour
The main reason why this article wasn't intuitive to me is that the toy
example is trivial, and the next example (a real image) is very nontrivial.

The first example shows through three lines of runnable Matlab code (sadly, I
don't have Matlab and it's not free, but whatever) that if you start with an
8-by-8 matrix of 255 and run the discrete cosine transform, you end up with a
sparse matrix with just one nonzero value in the first entry, because "the DCT
has compacted the energy of the matrix into the first element referred to as
the DC coefficient. The rest of the coefficients are called the AC
coefficients."

Cool? This doesn't help any more than just telling me that sentence directly.
Also, the article doesn't even expand on this fingerhold of familiarity: DC
and AC? Like direct current and alternating current, so that first element is
in some sense the zero-frequency ("direct current") term, and if the starting
example had any variation at all beyond a perfectly uniform field of 255's,
you'd start to see "energy" showing up in the "alternating-current" entries?
_That_ would start to give some intuition.

I guess what I'm saying is this article works, but it works by frustrating the
reader just enough that they modify and play with the code (porting it from
Matlab if they have to), thereby gaining the intuitive understanding that the
article promises.
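
For anyone without Matlab, here is a rough pure-Python stand-in for that
experiment. It assumes the orthonormal DCT-II scaling, and the "vertical
edge" block is my own hypothetical variation, not something from the article.
It shows the uniform block collapsing to a single DC term, and AC terms
appearing as soon as the block is no longer uniform.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def dct_2d(block):
    """2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def nonzero_positions(coeffs, eps=1e-6):
    """(row, column) indices of coefficients that are not numerically zero."""
    return [(r, c) for r, row in enumerate(coeffs)
            for c, v in enumerate(row) if abs(v) > eps]


flat = [[255.0] * 8 for _ in range(8)]              # the uniform 8-by-8 block
edge = [[0.0] * 4 + [255.0] * 4 for _ in range(8)]  # assumed variation: a vertical edge

flat_dct = dct_2d(flat)
edge_dct = dct_2d(edge)
print(nonzero_positions(flat_dct))  # only the DC term at (0, 0)
print(nonzero_positions(edge_dct))  # DC plus AC terms along the first row
```

With this scaling the DC term of the flat block is 255 * 8 = 2040. The edge
block only varies horizontally, so all of its AC energy lands in the first
row of coefficients.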

~~~
mindcrime
_(sadly, I don't have Matlab and it's not free, but whatever)_

What about Octave[1]?

[1]:[https://www.gnu.org/software/octave/](https://www.gnu.org/software/octave/)

------
ssawyer06
Being familiar with the DCT but having no idea how to explain it to a
five-year-old, I clicked. This does not ELI5 the DCT.

~~~
Aardwolf
How about this analogy:

You have a tap with a hot handle and a cold handle. You can make medium
lukewarm water by making hot and cold equally strong. To maintain temperature
but increase flow, you must open both handles further.

After DCT, the tap instead has a handle controlling the temperature, and
another handle controlling the flow. You can make all the same combinations of
flow and temperature, but it's controlled in a different way. Medium lukewarm
water is now made by having the temperature handle halfway, and the flow then
increased by increasing only the other handle.

Not sure if this analogy works for America since American showers have, afaik,
a single rotational dial that somehow controls both temperature and flow
(???), but in Europe the distinction between the sinks with a red and a blue
handle, vs the thermostatic tap with temperature handle and flow handle, is
very common :)
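
The analogy maps fairly directly onto a 2-point DCT, sketched below in
Python. The function names and the numbers are made up for illustration, and
the match is loose: the second output tracks the hot/cold balance rather than
temperature as such.

```python
import math

SQRT2 = math.sqrt(2.0)


def taps_to_dct(hot, cold):
    """2-point orthonormal DCT-II: (sum term, difference term)."""
    return (hot + cold) / SQRT2, (hot - cold) / SQRT2


def dct_to_taps(total, balance):
    """Inverse of taps_to_dct."""
    return (total + balance) / SQRT2, (total - balance) / SQRT2


lukewarm = taps_to_dct(3.0, 3.0)   # equal hot and cold
more_flow = taps_to_dct(6.0, 6.0)  # both handles opened further
hotter = taps_to_dct(4.0, 2.0)     # same total flow as lukewarm, more hot
print(lukewarm, more_flow, hotter)
```

Going from lukewarm to more flow only changes the first number, and going
from lukewarm to hotter only changes the second, which is the "one handle per
property" idea.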

~~~
jerf
" a single rotational dial somehow controls both temperature and flow (???)"

Two styles: a rotational dial that controls temperature, with flow controlled
by pulling it in and out, and a rotational dial that simply controls
temperature, where flow is always maximum unless it's off.

------
djmips
If this stuff intrigues you please try out Steve Brunton's extensive set of
videos on Data Science that include superb lectures on Fourier Series, the
Fourier Transform and the Fast Fourier Transform with examples in Matlab and
Python. Can't recommend this guy enough.
[https://www.youtube.com/channel/UCm5mt-A4w61lknZ9lCsZtBw](https://www.youtube.com/channel/UCm5mt-A4w61lknZ9lCsZtBw)

------
justjonathan
This is an explanation (from the amazing 3Blue1Brown) of the DFT, not
the DCT, but it was the first thing that ever really made sense to me. It is
explained by someone who really knows math incredibly well and is an amazing
teacher, with amazing visuals:
[https://www.youtube.com/watch?v=spUNpyF58BY](https://www.youtube.com/watch?v=spUNpyF58BY)

------
cevans01
For those that have already worked with the FFT a bit, there are ways to use
the FFT to calculate a DCT:

[https://dsp.stackexchange.com/questions/2807/fast-cosine-tra...](https://dsp.stackexchange.com/questions/2807/fast-cosine-transform-via-fft)
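
One textbook route, sketched in plain Python: append the mirror image of the
signal, take a single FFT of the doubled signal, and undo a half-sample phase
shift. This is not necessarily the method in the linked answers (more
efficient reorderings exist). The small recursive FFT and the sample values
are my own assumptions, and the result is the unnormalized DCT-II.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dct_direct(x):
    """Unnormalized DCT-II straight from the definition, O(N^2)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def dct_via_fft(x):
    """Unnormalized DCT-II from one FFT of the signal followed by its mirror image."""
    n = len(x)
    spectrum = fft(x + x[::-1])  # length 2N, no jump at the seam
    return [0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]


samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # arbitrary test values
max_diff = max(abs(a - b) for a, b in zip(dct_direct(samples), dct_via_fft(samples)))
print(f"largest difference between the two methods: {max_diff:.2e}")
```

The two functions should agree to within floating-point rounding.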

------
miccah
Not DCT, but this website [1] has been on my TODO list for a while. It explains
DFT using interactive visualizations.

[1] [https://jackschaedler.github.io/circles-sines-signals/index....](https://jackschaedler.github.io/circles-sines-signals/index.html)

~~~
ponderingfish
Nice .. this is definitely something to bookmark.

------
golergka
> In simple terms, the Discrete Cosine Transform takes a set of N correlated
> (similar) data-points and returns N de-correlated (dis-similar) data-points
> (coefficients) in such a way that the energy is compacted in only a few of
> the coefficients M where M << N.

That's not simple terms.
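
In code it may be easier to see. A hedged Python sketch, with a made-up row
of eight similar pixel values and hypothetical helper names: transform the
row, keep only the first M coefficients, transform back, and see how little
is lost.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct (the transpose of the orthonormal DCT-II matrix)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def keep_first(c, m):
    """Zero out everything except the first m coefficients."""
    return c[:m] + [0.0] * (len(c) - m)


def rms_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


pixels = [100.0, 102.0, 104.0, 107.0, 110.0, 112.0, 115.0, 117.0]  # assumed smooth row
coeffs = dct(pixels)
energy = [v * v for v in coeffs]
share_first_two = sum(energy[:2]) / sum(energy)
errors = [rms_error(pixels, idct(keep_first(coeffs, m))) for m in range(1, 9)]
print(f"energy in first 2 of 8 coefficients: {share_first_two:.5f}")
print("RMS error keeping M = 1..8:", [round(e, 3) for e in errors])
```

Here N = 8 similar values go in, and nearly all of the energy ends up in the
first couple of coefficients, so dropping the rest changes the reconstructed
pixels very little. Keeping all eight gives the original row back.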

------
hinkley
3Blue1Brown ELI15 the Fast Fourier Transform, which sets up the problem
domain:

[https://www.youtube.com/watch?v=spUNpyF58BY](https://www.youtube.com/watch?v=spUNpyF58BY)

------
anovikov
Good description of how you use it for image compression, but nothing about
video compression - that is, how similarity between consecutive frames is
used.

------
rrao84
Fantastic explanation of transforms and the example of 20 questions and DCT
totally did it for me. The rest of the explanation is hard to grok if you
aren't a programmer or aren't interested in image & video compression. YMMV

------
master_yoda_1
Looks like everyone at Hacker News has the intelligence of a five-year-old :)
Sad to see that for so many human beings, their brains never evolve.

------
andi999
Actually even if you are not 5 this totally misses the point about the DCT.
Especially the AC components are not explained.

------
zackmorris
Here's the page that got me interested in the DCT 15 years ago:

[https://www.mathworks.com/help/images/discrete-cosine-transf...](https://www.mathworks.com/help/images/discrete-cosine-transform.html)

Although I must admit that I haven't fully internalized it (it's easy to
forget how it works). It might help to learn some of the more mainstream ones
like the Fourier transform first:

[https://www.mathworks.com/help/images/fourier-transform.html](https://www.mathworks.com/help/images/fourier-transform.html)

[https://www.mathworks.com/help/images/image-transforms.html](https://www.mathworks.com/help/images/image-transforms.html)

MATLAB (or GNU Octave which is free) is the only language I've used that maps
abstractions to code in close to a 1:1 fashion. I think of code bloat as
roughly these orders of magnitude:

Math/Matrix languages (MATLAB) 1:1

Scripting languages (JavaScript, PHP, Python, etc) 10:1

Bare metal languages (Rust, C++, Assembly) 100:1

Unfortunately I have yet to find a functional programming language that is 1:1
in the same way that MATLAB is. One would think that
Lisp/Haskell/Scala/Julia/Clojure/Erlang would be concise. But in practice,
I've found them to generally be write-only languages that are quite difficult
to grok. The only thing that comes close is spreadsheets, but due to their 2D
nature, they can't really scale beyond a certain level of complexity.
Honestly, I think that the lack of adoption of functional programming
languages (due to their own obstinance) is one of the great problems of our
time.

Anyway, for the past 5-10 years or so, I solve most problems in my head in a
data-driven way that looks like a hybrid between MATLAB and spreadsheets,
piping data between classes written in an Actor model style (I learned this
from a very good teacher during a contract at HP). I use rules of thumb like
no mutable data (stored state), mostly higher-order functions, and framing
concepts in terms of transformations rather than class inheritance. Then I
translate that to whatever language I have to use for the client, which is
probably PHP, JavaScript or C# (Unity).

I highly recommend this sort of approach for keeping abstractions separate
from implementations. Unfortunately since nobody else seems to follow this
method, it makes for some friction in the workplace. People talk past me all
of the time because they don't realize that I'm coming from this place of
experience. They don't see the need for this clean separation when they are
"trying to get things done". They may get things done faster, but if you want
bug-free implementations, this is the way to go IMHO.

Edit: I'm on the fence about TensorFlow. It's arguably more advanced than
MATLAB, but vastly more opaque. So I'm not sold on any advantage it provides,
other than perhaps better performance or usefulness in certain domains like
machine learning. Oh and any perceived lack of performance with MATLAB is an
implementation detail. I can't understand why it's not fully parallelized and
running on the GPU by now.

------
rsync
"Explain Like I'm Five"

Oh my god - are you all right ? Where are your parents ?

