
H.264 is Magic - LASR
https://sidbala.com/h-264-is-magic/
======
lostgame
Absolutely love this:

'Suppose you have some strange coin - you've tossed it 10 times, and every
time it lands on heads. How would you describe this information to someone?
You wouldn't say HHHHHHHHHH. You would just say "10 tosses, all heads" - bam!
You've just compressed some data! Easy. I saved you hours of mindfuck
lectures.'

This is a really great, simple way to explain what is otherwise a fairly
complex concept to the average bear. Great work.
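The coin-toss trick is run-length encoding in a nutshell. A minimal sketch in Python (function names are my own, purely for illustration):

```python
def rle_encode(s):
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(ch * n for ch, n in runs)

# "10 tosses, all heads":
print(rle_encode("HHHHHHHHHH"))  # [('H', 10)]
```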

~~~
agumonkey
RLE is a given. It's true that the average person rarely understands that this
is what computers call compression, but everything after that involves a bit
of thinking. Optimal Huffman, for instance.

~~~
nightcracker
IMO Huffman is conceptually more complicated (not the implementation, but the
logic) than arithmetic coding.

And Huffman isn't optimal unless you are lucky, unlike arithmetic coding.

~~~
agumonkey
I never learned AC. It's on my overflowing stack of things to read about.

~~~
nightcracker
AC is conceptually stupidly simple. All you do is encode a string of symbols
into a range of real numbers.

To start your range is [0, 1). For each symbol you want to encode you take
your range and split it up according to your probabilities. E.g. if your
symbols are 25% A, 50% B and 25% C, then you split up that range in [0, 0.25)
for A, [0.25, 0.75) for B and [0.75, 1) for C.

Encoding multiple symbols is just applying this recursively. So to encode the
two symbols Bx we split up [0.25, 0.75) proportionally just like we did [0, 1)
before to encode x (where x is A, B or C).

As an example, A is the range [0, 0.25), and AC is the range [0.1875, 0.25).

Now to actually turn these ranges into a string of bits we choose the shortest
binary representation that fits within the range. If we look at a decimal
number:

    
    
        0.1875
    

We know that this means 1/10 + 8/100 + 7/1000 + 5/10000. A binary
representation:

    
    
        0.0011
    

This means 0/2 + 0/4 + 1/8 + 1/16 = 0.1875. So we encode AC as 0011.
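A sketch of that procedure in Python (helper names are mine; the probabilities are the 25/50/25 example above):

```python
import math

PROBS = [("A", 0.25), ("B", 0.50), ("C", 0.25)]

def ac_range(symbols, probs=PROBS):
    """Narrow [0, 1) once per symbol, splitting the current range
    proportionally to the symbol probabilities."""
    lo, hi = 0.0, 1.0
    for s in symbols:
        width = hi - lo
        cum = 0.0
        for sym, p in probs:
            if sym == s:
                lo, hi = lo + cum * width, lo + (cum + p) * width
                break
            cum += p
    return lo, hi

def shortest_bits(lo, hi):
    """Shortest bit string b such that the binary fraction 0.b lies in [lo, hi)."""
    k = 1
    while True:
        m = math.ceil(lo * 2 ** k)
        if m / 2 ** k < hi:
            return format(m, "0%db" % k)
        k += 1

print(ac_range("AC"))                  # (0.1875, 0.25)
print(shortest_bits(*ac_range("AC")))  # 0011
```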

---

The beauty of arithmetic coding is that after encoding/decoding any symbol we
can arbitrarily change how we split up the range, giving rise to adaptive
coding. Arithmetic coding can perfectly represent any data that forms a
discrete string of symbols, including changes to our knowledge of data as we
decode.

~~~
titanomachy
Nice explanation. Can you explain how to remove ambiguity relating to string
length?

"0" = 0.0b = 0 falls in the range [0,0.25) so it's a valid encoding for "A";
but isn't it also a valid encoding for "AA", "AAA", etc.?

AA = [0,0.25) * [0, 0.25) = [0, 0.125), and so on.

It seems that adding "A"s to a string in general doesn't change its encoding.

~~~
Dylan16807
You either reserve a symbol for "end of stream" or externally store the
length.

It's equivalent to pretending a Huffman stream never ends and is padded
with infinite 0s.

------
userbinator
The lossy transform is important, but I think what's actually most important
in video compression is getting rid of redundancy --- H.264 actually has a
lossless mode in which that transform is not used, and it still compresses
rather well (especially for noiseless scenes like a screencast.) You can see
the difference if you compare with something like MJPEG which is essentially
every frame independently encoded as a JPEG.

The key idea is to encode differences; even in an I-frame, macroblocks can be
encoded as differences from previous macroblocks, and with various filterings
applied: [https://www.vcodex.com/h264avc-intra-
precition/](https://www.vcodex.com/h264avc-intra-precition/) This reduces the
spatial redundancies within a frame, and motion compensation reduces the
temporal redundancies between frames.

You can sometimes see this when seeking through video that doesn't contain
many I-frames, as all the decoder can do is try to decode and apply
differences to the last full frame; if that isn't the actual preceding frame,
you will see the blocks move around and change in odd ways to create sometimes
rather amusing effects, until it reaches the next I-frame. The first example I
found on the Internet shows this clearly, likely resulting from jumping
immediately into the middle of a file:
[http://i.imgur.com/G4tbmTo.png](http://i.imgur.com/G4tbmTo.png) That frame
contains only the differences from the previous one.

As someone who has written a JPEG decoder just for fun and learning purposes,
I'm probably going to try a video decoder next; although I think starting from
something simpler like H.261 and working upwards from there would be much
easier than starting immediately with H.264. The principles are not all that
different, but the number of modes/configurations the newer standards have ---
essentially for the purpose of eliminating more redundancies from the output
--- can be overwhelming. H.261 only supports two frame sizes, no B-frames, and
no intra-prediction. It's certainly a fascinating area to explore if you're
interested in video and compression in general.

~~~
logicallee
This is really interesting and the imgur picture you linked (with your
explanation) explains it really clearly!

But when seeking, why wouldn't any local media playback seek backwards and
reconstruct the full frame? It's not like the partial frame after seeking is
useful - I'd rather wait 2 seconds while it scrambles (i mean "hurries up") to
show me a proper seek, wouldn't everyone?

What was your Internet search for finding that imgur frame? What is this
effect called?

~~~
noisem4ker
>why wouldn't any local media playback seek backwards and reconstruct the full
frame?

Most codecs/players do. VLC used to be criticized for being different in that
regard. One possible advantage is instantaneous seeking, as there's no need to
decode all the needed frames (which could amount to several seconds of video)
between the nearest I-frames[1] (the complete reference pictures) and the
desired one.

[1]: plural, because prediction can also be bidirectional in time

The use of incomplete video frame data for artistic purposes is called
"datamoshing".

~~~
kakarot
I try to use VLC when I can because it offers intuitive playlist support, but
for high-resolution H.264 and friends I usually have to switch to Media Player
Classic.

VLC is willing to let my entire screen look like a blob of grey alien shit for
10 seconds instead of just taking a moment to reconstruct frames.

And its hardware acceleration for newer codecs is balls. Sucks because
otherwise, it's right up there with f2k for me.

~~~
TheAceOfHearts
I stopped using VLC when I found mpv [0]. I really like it because it exposes
everything from the CLI, so once you're familiarized with the flags you're
interested in using, it's easy to play anything. For everyday usage it "just
works" too, as expected of any video player.

[0] [https://mpv.io/](https://mpv.io/)

~~~
nitrogen
How does it compare to mplayer? My biggest complaint about mplayer is it still
doesn't play VFR videos well.

~~~
77pt77
I've tried it.

* Sane defaults (encodings and fonts, scaletempo for audio)

* instantaneous play of next and previous videos

* navigation in random playlist actually works

* Easy always on top key binding

* Most mplayer key bindings work

I'll definitely keep on trying it for a while.

~~~
Drdrdrq
Does it include all the codecs by default? I think this was a major reason VLC
succeeded the way it did. With all other players (BPlayer anyone?) you needed
to find and install tons of codecs while in VLC it just worked.

~~~
77pt77
It has played everything I've thrown at it so far...

------
szemet
I thought I'd learn something special about H.264, but all the information
here is high-level and generic.

For example if you replace H.264 with a much older technology like mpeg-1
(from 1993) every sentence stays correct, except this:

_"It is the result of 30+ years of work"_ :)

~~~
phire
Also the fleeting mention of b-frames, which mpeg-1 doesn't have. And I
believe mpeg-1 doesn't use 16×16 macroblocks.

Still, it's a good overview of generic video compression.

~~~
szemet
[https://en.wikipedia.org/wiki/MPEG-1#B-frames](https://en.wikipedia.org/wiki/MPEG-1#B-frames)

[https://en.wikipedia.org/wiki/MPEG-1#Macroblocks](https://en.wikipedia.org/wiki/MPEG-1#Macroblocks)

------
amluto
Nice article! The motion compensation bit could be improved, though:

> The only thing moving really is the ball. What if you could just have one
> static image of everything on the background, and then one moving image of
> just the ball. Wouldn't that save a lot of space? You see where I am going
> with this? Get it? See where I am going? Motion estimation?

Reusing the background isn't motion compensation -- you get that by encoding
the differences between frames so unchanging parts are encoded very
efficiently.

Motion compensation is when you have the camera follow the ball and the
background moves. Rather than encoding the difference between frames itself,
you figure out that most of the frame moved and you encode the difference from
one frame to a shifted version of the blocks from a previous frame.

Motion compensation won't work particularly well for a tennis ball because
it's spinning rapidly (so the ball looks distinctly different in consecutive
frames) but more importantly because the ball occupies a tiny fraction of the
total space so it doesn't help that much.

Motion compensation should work much better for things like moving cars and
moving people.
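A toy exhaustive block-matching search makes the idea concrete (sum-of-absolute-differences is the usual criterion; the names, block size, and search radius here are my own choices, not anything from the standard):

```python
import numpy as np

def best_motion_vector(prev, cur, by, bx, bsize=8, radius=4):
    """For the block at (by, bx) in the current frame, search a small window
    in the previous frame for the shift (dy, dx) minimizing the sum of
    absolute differences (SAD). The encoder then stores the vector plus a
    (hopefully tiny) residual instead of the raw block."""
    block = cur[by:by + bsize, bx:bx + bsize].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                continue
            cand = prev[y:y + bsize, x:x + bsize].astype(int)
            sad = int(np.abs(cand - block).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best
```

A bright square that slides two pixels to the right between frames yields the vector (0, -2) for the block at its new position: the encoder points back at where the content came from.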

~~~
erydo
Your example seems to assume translation only. I wonder how difficult/useful
it would be to identify other kinds of time-varying characteristics
(translation, rotation, scale, hue, saturation, brightness, etc) of partial
scene elements in an automated way.

Along the same lines, it would be interesting to figure out an automated time-
varying-feature detection algorithm to determine which kinds of transforms are
the right ones to encode.

Do video encoders already do something like this? It seems like a pretty
difficult problem since there are so many permutations of applicable
transformations.

~~~
Animats
_I wonder how difficult/useful it would be to identify other kinds of
time-varying characteristics (translation, rotation, scale, hue, saturation,
brightness, etc) of partial scene elements in an automated way._

That's how Framefree worked. It segments the image into layers, computes a
full morph, including movement of the boundary, between successive frames for
each layer, and transmits the before and after for each morph. Any number of
frames can be interpolated between keyframes, which allows for infinite slow
motion without jerk.[1] You can also upgrade existing content to higher frame
rates.

This was developed back in 2006 by the Kerner Optical spinoff of Lucasfilm.[2]
It didn't catch on, partly because decompression and playback requires a
reasonably good GPU, and partly because Kerner Optical went bust. The segment-
into-layers technology was repurposed for making 3D movies out of 2D movies,
and the compression product was dropped. There was a Windows application and a
browser plug-in. The marketing was misdirected - somehow, it was targeted to
digital signs with limited memory, a tiny niche.

It's an idea worth revisiting. Segmentation algorithms have improved since
2006. Everything down to midrange phones now has a GPU capable of warping a
texture. And it provides a way to drive a 120FPS display from 24/30 FPS
content.

[1] [http://creativepro.com/framefree-technologies-launches-
world...](http://creativepro.com/framefree-technologies-launches-world-s-most-
advanced-digital-imaging-software/) [2]
[https://web.archive.org/web/20081216024454/http://www.framef...](https://web.archive.org/web/20081216024454/http://www.framefree.com/)

~~~
danieltillett
John do you know where all the patents on Framefree ended up?

~~~
Animats
Ask Tom Randoph, who was CEO of FrameFree. He's now at Quicksilver Scientific
in Denver.

~~~
Animats
Some venture IP company in Tokyo called "Monolith Co." also had rights in the
technology.[1] "As of today (Sept. 5, 2007), the company has achieved a
compression rate equivalent to that of H.264 and intends to further improve
the compression rate and technology, Monolith said."[2] (This is not Monolith
Studios, a game development company in Osaka.) Monolith appears to be defunct.

The parties involved with Framefree were involved in fraud litigation around
2010.[3] The case record shows various business units in the Cayman Islands
and the Isle of Jersey, along with Monolith in Japan and Framefree in
Delaware. No idea what the issues were. It looks like the aftermath of failed
business deals.

The inventors listed on the patents are Nobuo Akiyoshi and Kozo Akiyoshi.[4]

[1]
[https://www.youtube.com/watch?v=VBfss0AaNaU](https://www.youtube.com/watch?v=VBfss0AaNaU)
[2]
[http://techon.nikkeibp.co.jp/english/NEWS_EN/20070907/138905...](http://techon.nikkeibp.co.jp/english/NEWS_EN/20070907/138905/)
[3] [http://www.plainsite.org/dockets/x8gi572m/superior-court-
of-...](http://www.plainsite.org/dockets/x8gi572m/superior-court-of-
california-county-of-san-francisco/canyon-capital-a-cayman-island-company-v-
kozo-akiyoshi-et-al/) [4] [http://patents.justia.com/inventor/nobuo-
akiyoshi](http://patents.justia.com/inventor/nobuo-akiyoshi)

~~~
danieltillett
Great detective work. I suspect the IP is now a total mess - with luck nobody
has been paying the patent renewal fees and everything is now free.

------
adilparvez
Related, how h265 works:
[http://forum.doom9.org/showthread.php?t=167081](http://forum.doom9.org/showthread.php?t=167081)

This is a great overview and the techniques are similar to those of h264.

I found it invaluable to get up to speed when I had to do some work on the
screen content coding extensions of hevc in Argon Streams. They are a set of
bit streams to verify hevc and vp9, take a look, it is a very innovative
technique:

[http://www.argondesign.com/products/argon-streams-
hevc/](http://www.argondesign.com/products/argon-streams-hevc/)
[http://www.argondesign.com/products/argon-streams-
vp9/](http://www.argondesign.com/products/argon-streams-vp9/)

~~~
agumonkey
Heh, happy to see doom9 still alive and kicking. They were the n°1 resource in
the early days of mainstream video compression.

~~~
IshKebab
It's not really alive and kicking. The forum is still active but the rest of
the site hasn't been touched since 2008.

------
woliveirajr
I love how you can edit photos from people to correct some skin imperfections
without losing the touch that the image is real (and not that blurred,
plastic look) when you decompose it in wavelets and just edit some
frequencies.

Don't know in photoshop, but in Gimp there's a plugin called "wavelet
decomposer" that does that.

~~~
avian
I guess this is the plugin you are talking about? Interesting.

[http://registry.gimp.org/node/11742](http://registry.gimp.org/node/11742)

~~~
woliveirajr
Exactly that.

There was a question about retouching photos some while ago
([http://photo.stackexchange.com/questions/48999/how-do-i-
take...](http://photo.stackexchange.com/questions/48999/how-do-i-take-
flattering-photos-of-people-with-acne-scarring)) that using wavelets was a
good use of it.

------
mherrmann
I recently experienced this as follows:
[https://www.sublimetext.com](https://www.sublimetext.com) has an animation
which is drawn via JavaScript. In essence, it loads a huge .png [1] that
contains all the image parts that change during the animation, then uses
<canvas> to draw them.

I wanted to recreate this for the home page of my file manager [2]. The best I
could come up with was [3]. This PNG is 900KB in size. The H.264 .mp4 I now
have on the home page is only 200 KB in size (though admittedly in worse
quality).

It's tough to beat a technology that has seen so much optimization!

1:
[http://www.sublimetext.com/anim/rename2_packed.png](http://www.sublimetext.com/anim/rename2_packed.png)

2: [https://fman.io](https://fman.io)

3:
[https://www.dropbox.com/s/89inzvt161uo1m8/out.png?dl=0](https://www.dropbox.com/s/89inzvt161uo1m8/out.png?dl=0)

~~~
hrjet
You could give FLIF [1] a try. With the help of Poly-FLIF [2] you can render
it in the browser. Don't forget to try the lossy mode, it gives better
compression with negligible loss in quality.

1: [http://flif.info](http://flif.info)

2: [https://github.com/UprootLabs/poly-
flif/](https://github.com/UprootLabs/poly-flif/)

------
the8472
> Chroma Subsampling.

Sadly, this is what makes video encoders designed for photographic content
unsuitable for transferring text or computer graphics. Fine edges, especially
red-black contrasts start to color-bleed due to subsampling.

While a 4:4:4 profile exists a lot of codecs either don't implement it or the
software using them does not expose that option. This is especially bad when
used for screencasting.

Another issue is banding, since h.264's main and high profiles only use 8bit
precision, including for internal processing, and the rounding errors
accumulate, resulting in banding artifacts in shallow gradients. High10
profile solves this, but again, support is lacking.
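The bleed is easy to reproduce. A sketch of 4:2:0-style subsampling on a single chroma plane, reconstructed with nearest-neighbor (the cheap path; function names are mine):

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of the chroma plane into one sample,
    halving its resolution both ways (the '4:2:0' part)."""
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(float)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0

def upsample_nearest(sub):
    """Nearest-neighbor upscaling back to full resolution."""
    return np.repeat(np.repeat(sub, 2, axis=0), 2, axis=1)

# A one-pixel-wide chroma edge, like red text on black:
edge = np.array([[0.0, 255.0],
                 [0.0, 255.0]])
print(upsample_nearest(subsample_420(edge)))  # all 127.5: the sharp edge is gone
```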

~~~
astrange
It's easy to make a 4:2:0 upscaler that doesn't color bleed. Everyone just
uses nearest-neighbor, which sucks, and then blames the other guy.

~~~
shabbyrobe
How would you make a 4:2:0 upscaler that doesn't color bleed?

~~~
astrange
50% solution: bicubic or bilinear. 90% solution: EEDI3. (kinda slow) 99%
solution: use the full resolution Y plane for edge-direction.

~~~
the8472
I don't think that can accurately restore the details that have been created
by subpixel-AA font rendering.

But if you have source/subsampled/interpolated comparisons that show 99%
identical results i would be interested to see them.

Of course all that is useless if you don't have control over the output
device. Just having the ability to record 4:4:4 makes the issue go away as
long as the target can display it, no matter what interpolation they use.

------
algesten
"See how the compressed one does not show the holes in the speaker grills in
the MacBook Pro? If you don't zoom in, you wouldn't even notice the difference."

Ehm, what?! The image on the right looks really bad and the missing holes was
the first thing I noticed. No zooming needed.

And that's exactly my problem with the majority of online video (iTunes store,
Netflix, HBO etc). Even when it's called "HD", there are compression artefacts
and gradient banding everywhere.

I understand there must be compromises due to bandwidth, but I don't agree on
how much that compromise currently is.

~~~
dosshell
>No zooming needed

Aren't the images above the text a zoomed version?

>Here is a close-up of the original...

~~~
algesten
I took it to mean that we had to zoom to see that the holes were gone in the
compressed version.

------
dluan
By the way, this is an incredible example of scientific writing done well.
There's a very tangible, jelly-like feeling that the author clearly has for the
topic, conveyed well to the readers. This whole thread is people excited about
a video codec!

~~~
LASR
Thank you! It means a lot to me. Yes, I try to convey my sense of excitement
about technology to other people.

------
eutectic
Anyone who likes this would probably also enjoy the Daala technology demos at
[https://xiph.org/daala/](https://xiph.org/daala/) for a little taste of some
newer, and more experimental, techniques in video compression.

~~~
dao-
Note that Daala has been discontinued in favor of AV1:
[https://en.wikipedia.org/wiki/AOMedia_Video_1](https://en.wikipedia.org/wiki/AOMedia_Video_1)

Previously Daala was presented as a candidate for NETVC but apparently this
didn't go anywhere?
[https://en.wikipedia.org/wiki/NETVC](https://en.wikipedia.org/wiki/NETVC)

~~~
TD-Linux
A lot of Daala tools are now being copied into AV1, the largest being PVQ:
[https://aomedia-review.googlesource.com/#/c/3220/](https://aomedia-
review.googlesource.com/#/c/3220/)

------
hal9000xp
Just yesterday, I read this one:

[http://web.cs.ucla.edu/classes/fall03/cs218/paper/H.264_MPEG...](http://web.cs.ucla.edu/classes/fall03/cs218/paper/H.264_MPEG4_Tutorial.pdf)

How does DCT work:

[https://www.youtube.com/watch?v=Q2aEzeMDHMA&](https://www.youtube.com/watch?v=Q2aEzeMDHMA&)

------
alexandrerond
Very well explained. But I could have understood it all without the bro-
approach to the reader. You see where I am going with this? Get it? See where
I am going? Ok!

~~~
alexk307
Maybe I'm in the minority here but I think it adds a bit of color to an
otherwise dry topic to write about.

~~~
nothrabannosir
I remember loving this style when I was a novice, e.g. Beej's networking
tutorial. Not a big fan anymore, either, but certainly valuable for (part of)
the target audience, I think.

------
spacehacker
The part about entropy encoding only seems to explain run-length encoding
(RLE). Isn't the interesting aspect of making use of entropy in compression
rather to represent rarer events with longer code strings?

The fair coin flip is also an example of a process that cannot be compressed
well at all because (1) the probability of the same event happening in a row is
not as high as for unfair coins (RLE is minimally effective) and (2) the
uniform distribution has maximal entropy, so there is no advantage in using
different code lengths to represent the events. (Since the process has a
binary outcome, there is also nothing to gain in terms of code lengths for
unfair coins.)
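Point (2) is easy to check numerically. Shannon entropy in bits per symbol (the standard formula, sketched in Python):

```python
import math

def entropy(probs):
    """H = -sum p*log2(p): the best achievable average bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: a fair coin needs a full bit per toss
print(entropy([0.9, 0.1]))  # ~0.47: a biased coin is very compressible
```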

------
john111
Can someone explain how the frequency domain stuff works? I've never really
understood that, and the article just waves it away with saying it's like
converting from binary to hex.

~~~
rayiner
It's a bad analogy. Binary and hex are just different _formats_ for
representing the same number. Spatial domain and frequency domain are
different views of a complex data set. In the spatial domain, you are looking
at the intensity of different points of the image. In the frequency domain,
you are looking at the frequencies of intensity changes in patterns in the
image.

A good way to develop an intuition for the fourier space is to look at simple
images and their DFT transforms:
[http://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lec...](http://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lecture10.pdf)
(3/4 of the way through the slide deck).

This analysis of a "bell pepper" image and its transform is also helpful:
[https://books.google.com/books?id=6TOUgytafmQC&pg=PA116&lpg=...](https://books.google.com/books?id=6TOUgytafmQC&pg=PA116&lpg=PA116&dq=dft+image+intensity+changes&source=bl&ots=m8lUh6N0ms&sig=dXNH4GYn39FVen9nZ7pv30zss5k&hl=en&sa=X&ved=0ahUKEwicnYfFpI_QAhWs6YMKHcoqBOIQ6AEIRjAG#v=onepage&q=dft%20image%20intensity%20changes&f=false).

As for why you want to do this: throwing away bits in the spatial domain
eliminates distinctions between similar intensities, making things look
blocky. In the frequency domain, however, you can throw away high-frequency
information, which tends to soften patterns like the speaker grills in the MBP
image that the human eye isn't that sensitive to, to begin with.
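A minimal 1-D illustration of that last step, using the textbook DCT-II/DCT-III pair written out by hand rather than a library: transform, discard the high-frequency half, transform back.

```python
import math

def dct(x):
    """Type-II DCT: project the signal onto cosines of increasing frequency."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

def idct(X):
    """Type-III DCT scaled by 2/N, which inverts dct() above."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

x = [8.0, 9.0, 10.0, 30.0, 31.0, 30.0, 9.0, 8.0]
X = dct(x)
lossy = X[:4] + [0.0] * 4  # throw away the high-frequency half
approx = idct(lossy)       # similar overall shape, fine detail softened
```

The round trip `idct(dct(x))` recovers `x` exactly (up to float error); zeroing the top coefficients keeps the broad shape while smearing the sharp transitions.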

~~~
astrange
> Spatial domain and frequency domain are different views of a complex data
> set.

Or in this case, a real data set.

------
amelius
> discard information which will contain the information with high frequency
> components. Now if you convert back to your regular x-y coordinates, you'll
> find that the resulting image looks similar to the original but has lost
> some of the fine details.

I would expect also the edges in the image to become more blurred, as edges
correspond to high-frequency content. However, this only seems to be slightly
the case in the example images.

~~~
LordDragonfang
You can see exactly that with the speaker grill and the text (This type of
transformation is notoriously bad at compressing images of text, and is why
you shouldn't use jpg for pictures of text)

In this context, the edges of, say, the macbook are not "high frequency"
content, since they only feature one change (low to high luminosity) in a
given block rather than several (high-low-high-low-high) like for the grill.

~~~
amelius
You should have a look at the Fourier transform of a step-function. It has
high frequency components.

------
amelius
What are directions for the future? Could neural networks become practically
useful for video compression? [1]

[1]
[http://cs.stanford.edu/people/eroberts/courses/soco/projects...](http://cs.stanford.edu/people/eroberts/courses/soco/projects/2000-01/neural-
networks/Applications/imagecompression.html)

~~~
jerf
Suppose I have a table of 8-digit numbers that I need to add and subtract for
various reasons. Do I A: have a child, train them how to read numbers, add,
and subtract, and then have the child do it or B: use a calculator purpose
built to add and subtract numbers?

Neural nets are always expensive to train. You'd better be getting something
from them that you can't get some other way.

~~~
gugagore
Yes, you don't need the machinery of learning when you already have an
algorithm you're happy with. Adding a table of numbers, I don't think anyone
hopes to do much better than we already do with our circuits and computer
architectures.

With video compression, I think most would agree that there might be better
architectures/algorithms that we haven't stumbled upon yet. Whether
specifically "neural networks" will be the shape of a better architecture, I
don't know. But almost surely some meta-algorithm that can try out tons of
different parameters/data-pipeline-topologies for something that vaguely
resembles h.264 might find something better than h.264.

Neural nets are expensive to train. But so is designing h.264.

------
iplaw
H.265 gets you twice the resolution for the same bandwidth, or the same
resolution for half the bandwidth.

~~~
ksec
H.265 gets you half the file size for ten times more in royalty fees, or
saving 50% of bandwidth for 1000% more in royalty.

~~~
_puk
Do you have a reference for that?

I was under the impression that the first 100,000 units are free, and then 20c
per unit afterwards to a max of $25m.

H264 drops to 10c per unit after 5m units, to a max of $6.5m.

You need to be shipping 125 million units annually to hit the full $25m.

Yes it's more, but it's not quite ten times. And notably if the chip maker
pays the royalties, then the content creators don't need to (though that was
exempted indefinitely with H264).

Parts regurgitated from a quick google for reference [1]

[1]
[http://www.theregister.co.uk/2014/10/03/hevc_patent_terms_th...](http://www.theregister.co.uk/2014/10/03/hevc_patent_terms_thrashed_out_but_look_whos_not_at_the_codec_party_microsoft_and_google/)

~~~
brigade
HEVC got an additional licensing pool in HEVC Advance that demanded
significantly greater license fees on top of MPEG LA's.

Said group's demands are basically _the_ reason Netflix started considering
VP9.

~~~
TD-Linux
Two additional, as Technicolor later dropped out of HEVC Advance and is now
licensing theirs individually:
[http://www.streamingmedia.com/Articles/Editorial/Featured-
Ar...](http://www.streamingmedia.com/Articles/Editorial/Featured-
Articles/Technicolor-Withdraws-from-the-HEVC-Advance-Patent-Pool-108941.aspx)

------
kakarot
Ya'll wanna get the most out of your H.264 animu rips? Check out Kawaii Codec
Pack, it's based on MPC and completely changed my mind about frame
interpolation. [http://haruhichan.com/forum/showthread.php?7545-KCP-
Kawaii-C...](http://haruhichan.com/forum/showthread.php?7545-KCP-Kawaii-Codec-
Pack)

~~~
voltagex_
a) offtopic

b) Leave codec packs in 2000 where they belong. They are a great malware
vector and also good at messing with settings they shouldn't.

>KCP utilizes the following components: MPC-HC - A robust DirectShow media
player. madVR - High quality gpu assisted video renderer. Included as an
alternative to EVR-CP. xy-vsfilter / XySubFilter(future) - Superior subtitle
renderer. LAV-Filters - A package with the fastest and most actively developed
DirectShow Media Splitter and Decoders. (Optional) ReClock - Addresses the
problem of audio judder by adapting media for smooth playback OR utilized for
bit perfect audio.

I'm actually using MPC-HC and AC3Filter to deal with some files where I
couldn't hear the centre channel on VLC (on stereo speakers). Everything else
isn't really needed.

~~~
kakarot
oh crap it's the topic police. I use it specifically for madVR and
interpolating frames for high-quality low FPS anime. It looks really great.
The best I've found for this particular purpose. Be nice.

------
Savageman
I wonder if across a lot of videos, the frequency domain representations look
similar and if instead of masking in a circle we could mask with other (pre-
determined) shapes to keep more information (this would require decoders to
know them, of course). Or maybe this article is too high-level and it's not
possible to "shape" the frequencies.

~~~
LASR
It's certainly possible to use any arbitrary shape. The way it really works is
that there is a quantization matrix - which essentially is a configurable mask
for your frequency domain signal.

Yes, I've dumbed it down in the article to a simple circle to illustrate the
point.
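A toy version of that mask (illustrative numbers, not an actual H.264 quantization matrix): divide each frequency coefficient by its step size and round, so the coarse steps at high frequencies zero out fine detail first.

```python
def quantize(coeffs, qmat):
    """Round each coefficient to a multiple of its quantizer step."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmat)]

def dequantize(qcoeffs, qmat):
    """Reconstruct by multiplying back; whatever rounding discarded stays lost."""
    return [[c * q for c, q in zip(crow, qrow)]
            for crow, qrow in zip(qcoeffs, qmat)]

coeffs = [[100, 31], [22, 5]]   # low frequencies top-left
qmat   = [[10, 20], [20, 40]]   # coarser steps toward high frequencies
q = quantize(coeffs, qmat)      # [[10, 2], [1, 0]] -- small ints, cheap to code
print(dequantize(q, qmat))      # [[100, 40], [20, 0]] -- high-freq detail gone
```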

------
nojvek
This is a really well written article. Exactly why I love HN. Sometimes you
get this nice technical intros into fields you thought were black magic.

------
rimbombante
Articles like this are what makes HN great, and not all those repeated links
to the visual studio 1.7.1.1.0.1.pre02-12323-beta3 changelog.

------
mtw
Even better: H.265, with 40-50% bit rate reduction compared with H.264 at the
same visual quality!

~~~
olegkikin
But much higher hardware requirements for both encoding and decoding. Encoding
is like 8x slower too.

------
el0j
The PNG size seems to be misrepresented. The actual PNG is 637273 bytes when I
download it, and 597850 if I recompress it to make sure we're not getting
fooled by a bad PNG writer.

So instead of the reported 916KiB we're looking at 584KiB.

This doesn't change the overall point, but details matter.

    
    
      $ wget https://sidbala.com/content/images/2016/11/FramePNG.png
      --2016-11-04 22:08:08--  https://sidbala.com/content/images/2016/11/FramePNG.png
      Resolving sidbala.com (sidbala.com)... 104.25.17.18, 104.25.16.18, 2400:cb00:2048:1::6819:1112, ...
      Connecting to sidbala.com (sidbala.com)|104.25.17.18|:443... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: unspecified [image/png]
      Saving to: ‘FramePNG.png’
    
      FramePNG.png                      [ <=>                                             ] 622.34K  --.-KB/s   in 0.05s
    
      2016-11-04 22:08:08 (12.1 MB/s) - ‘FramePNG.png’ saved [637273]
    
      $ pngout FramePNG.png
       In:  637273 bytes               FramePNG.png /c2 /f5
      Out:  597850 bytes               FramePNG.png /c2 /f5
      Chg:  -39423 bytes ( 93% of original)

~~~
nartsbtaa
Why even compare PNG and H.264 to begin with? PNG is a lossless compression
format. A better comparison would be something lossy like JPG, which could
easily shrink the size to ~100 kB. The point still stands, but at least it's a
more relevant comparison.

------
notlisted
Well done. The only thing that could make this better is an interactive
model/app for me to play around with. The frequency spectrum can probably be
used while retouching images as well.

A video on youtube led me to Joofa Mac Photoshop FFT/Inverse FFT plugins [1]
which was worth a try. I was unable to register it, as have others. Then I
came across ImageJ [2], which is a really great tool (with FFT/IFFT).

Edit: if anyone checks out ImageJ, there's a bundled app called Fiji [3] that
makes installation easier and has all the plugins.

If anyone has other apps/plugins to consider, please comment.

[1] [http://www.djjoofa.com/download](http://www.djjoofa.com/download)

[2]
[https://imagej.nih.gov/ij/download.html](https://imagej.nih.gov/ij/download.html)

[3] [http://fiji.sc/](http://fiji.sc/)

~~~
0x09
I published a set of utilities that I developed for playing with frequency
analysis and to help myself learn about it; you might find them interesting:

[https://github.com/0x09/dspfun](https://github.com/0x09/dspfun)

------
i336_
I found this explanation of Xiph.org's Daala (2013) very interesting and
enlightening in terms of understanding video encoding:
[https://xiph.org/daala/](https://xiph.org/daala/)

Related:

BPG is an open source image format that uses HEVC under the hood (it supports
both lossy and lossless modes), and is generally better than PNG across the
board: [http://bellard.org/bpg/](http://bellard.org/bpg/)

For a runner-up lossless image format unencumbered by H265 patents (completely
libre), try [http://flif.info/](http://flif.info/).

------
afghanPower
A real fun read. Had an assignment a couple of weeks ago where we used the
k most significant singular values of matrices (from a picture of Marilyn
Monroe) to compress the image. H.264 is on a whole other level, though ;)
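For the curious, the low-rank trick described above can be sketched in a few
lines of numpy. A random matrix stands in for the actual photo, and
`svd_compress` is just an illustrative name:

```python
import numpy as np

def svd_compress(img, k):
    """Keep only the k largest singular values of a 2-D image matrix."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Stand-in for a grayscale photo: any 2-D array works.
rng = np.random.default_rng(0)
img = rng.random((64, 64))

approx = svd_compress(img, 8)
# The rank-8 approximation stores 8*(64+64+1) numbers instead of 64*64,
# and the reconstruction error shrinks as k grows.
err8 = np.linalg.norm(img - approx)
err32 = np.linalg.norm(img - svd_compress(img, 32))
```

By the Eckart-Young theorem this truncation is the best rank-k approximation
in the Frobenius norm, which is why the assignment used it.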

~~~
bananicorn
Now the question is - Manson or Monroe - and which one would be easier to
compress? ;)

------
optimuspaul
I enjoyed this for the most part and even learned a little. But it started out
in very simple terms, really appealing to the common folk, and then about
halfway through the tone changed completely, which was a real turn-off for me.
It's silly, but the line "If you paid attention in your information theory
class" was the spark for me. I didn't take any information theory classes, so
why would I have paid attention? I don't necessarily think it was
condescending; maybe it's just that the consistency of the writing changed
dramatically.

Anyway super interesting subject.

------
problems
Really cool stuff, one thing though seems a little odd:

> Even at 2%, you don't notice the difference at this zoom level. 2%!

I'm not supposed to see that major streakiness? The 2% difference is extremely
visible, and even 11% leaves a noticeably bad pattern on the keys (though I'd
probably be okay with it in a moving video); only the 30% version looks
acceptable in a still image.
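For reference, those percentages refer to the fraction of frequency
coefficients kept after the transform step. A rough numpy sketch of the idea
(the article describes DCT-based transforms; a plain FFT is used here just to
keep the example self-contained):

```python
import numpy as np

def keep_fraction(img, frac):
    """Zero out all but the largest-magnitude `frac` of frequency coefficients."""
    F = np.fft.fft2(img)
    # Threshold at the (1 - frac) quantile of coefficient magnitudes.
    thresh = np.quantile(np.abs(F).ravel(), 1 - frac)
    F[np.abs(F) < thresh] = 0
    return np.real(np.fft.ifft2(F))

rng = np.random.default_rng(1)
img = rng.random((32, 32))   # stand-in for a grayscale frame

at_2pct = keep_fraction(img, 0.02)
at_30pct = keep_fraction(img, 0.30)
# Keeping more coefficients gives a closer reconstruction,
# which is why 30% looks fine and 2% shows streaks.
```

Real encoders don't hard-zero coefficients like this; they quantize them more
or less coarsely, but the visual effect of keeping "2%" vs "30%" is the same
in spirit.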

------
dirtbox
I like this video explaining the difference between H.264 and H.265
[https://www.youtube.com/watch?v=hRIesyNuxkg](https://www.youtube.com/watch?v=hRIesyNuxkg)

Simplistic as it is, it touches on all the main differences. The only problem
with H.265 is the higher requirements and time needed for encoding and
decoding.

------
markatkinson
Damn, lost me during the frequency part.

~~~
Koshkin
Sometimes it's just easier to learn the math. (I am not kidding.)

------
ludwigvan
What is the latest in video compression technology after H264 and H265?

The article discusses lossy compression in broad terms, but have we reaped all
the low hanging fruit? Can we expect some sort of saturation just like we have
with Moore's law where it gets harder and harder to optimize videos?

------
el0j
If the author truly wants 'magic', how about we take a 64KiB demo that runs
for 4 minutes. That's 64KiB containing 240 seconds of video, while the H.264
clip needed 175KiB for only five seconds of video.

We can conclude that 64KiB demos are at least 48 times as magical as H.264.

------
vcool07
This was a good and interesting read. Is H.264 an open standard?

~~~
shocks
Doesn't look like it:
[https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC](https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC)

> H.264 is protected by patents owned by various parties. A license covering
> most (but not all) patents essential to H.264 is administered by patent pool
> MPEG LA.[2] Commercial use of patented H.264 technologies requires the
> payment of royalties to MPEG LA and other patent owners. MPEG LA has allowed
> the free use of H.264 technologies for streaming internet video that is free
> to end users, and Cisco Systems pays royalties to MPEG LA on behalf of the
> users of binaries for its open source H.264 encoder.

~~~
0x09
It is an open standard. Anyone can purchase and implement it, and it was
developed by ISO. The technologies are not royalty-free in the US. Don't
conflate the two.

Edit: I emphasize this mainly because the terms have a specific meaning in
standards jargon but also because it places the blame for software patent
abuses on the wrong parties (the standards developers rather than the lawyers
and legislators).

~~~
vcool07
OK, so it is open, but not free? Is it available free of cost for academic
purposes?

~~~
Khoth
You have to pay royalties to actually use it, but if you just want to read the
thing, you can get it for free from the ITU.
[https://www.itu.int/rec/T-REC-H.264-201602-S/en](https://www.itu.int/rec/T-REC-H.264-201602-S/en)

------
neo2006
The comparison does not make any sense, and no, H.264 is not magic!

- He is comparing a lossless format (PNG) to H.264, which is a lossy video
format; that is not fair.

- He is generating a 5-frame video and comparing it to a 1-frame image; only
the I-frame at the beginning of the video matters in that case, and all the
other P-frames are derived from it.

- What is the point of that comparison when we already have image formats
comparable in size to an H.264 I-frame, built on the same science (entropy
coding, frequency domain, intra-frame MB derivation...)?

~~~
mcherm
Did you read the article?

The point you are making here is PRECISELY the point that the author was
making in the article: that a lossy format can be far, far smaller. He then
goes into the details (from a high-level point of view) of what kinds of
losses H264 incurs.

------
syastrov
An enjoyable, short and to the point article with many examples and analogies.
But my favorite part was this:

"Okay, but what the freq are freqX and freqY?"

------
umbs
"1080p @ 60 Hz = 1920x1080x60x3 => ~370 MB/sec of raw data."

I apologize if this is trivial. What does 1920 in above equation represent?

~~~
boundlessdreamz
1080p is 1920x1080 px

Btw question is trivial but don't feel apologetic about asking questions. None
of us know everything and in a field we don't know, our questions will be
trivial.
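Spelled out, the arithmetic is just pixels per frame, times frames per second,
times bytes per pixel:

```python
width, height = 1920, 1080   # 1080p resolution
fps = 60                     # frames per second
bytes_per_pixel = 3          # 8 bits each for R, G, B

raw = width * height * fps * bytes_per_pixel
print(raw)   # 373248000 bytes/s, i.e. ~370 MB/sec of raw data
```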

------
some1else
Try scrubbing backwards. H.264 seeking only works nicely if you're fast-
forwarding the video. Actually, that is kind of magical.

------
11thEarlOfMar
Do H.264 and WebRTC have different use cases? Or do they compete directly?

~~~
eddieh
Let's say you want to video chat with someone using only web browsers, you
would establish a direct peer-to-peer connection with WebRTC and then you
could stream H.264 video to each other. I'd say WebRTC and H.264 complement
each other. However, the shared stream or data need not be H.264.

------
imperialdrive
Great Write-up, thank you for your time and effort!

------
molind
Wow, now tell me how H.265 works!

------
xyproto
Copyrighted and patented magic.

------
bjn
Well written article.

------
andrey_utkin
Too trivial, too general, too pompous. I'd downvote.

------
wizkkidd
time to move on: h.265

------
mentioned_edu
Nice

------
aaron695
H.265/HEVC vs H.264/AVC: 50% bit rate savings verified

[http://www.bbc.co.uk/rd/blog/2016/01/h-dot-265-slash-hevc-vs-h-dot-264-slash-avc-50-percent-bit-rate-savings-verified](http://www.bbc.co.uk/rd/blog/2016/01/h-dot-265-slash-hevc-vs-h-dot-264-slash-avc-50-percent-bit-rate-savings-verified)

------
mozumder
So what's the final car weight? It looks like you stopped at the Chroma
subsampling section.

~~~
LASR
6.5 ounces, or 0.4 lbs! Thanks for the feedback! I added the final weight to
the conclusion.

------
monochromatic
This is great as a high-level overview... except that it's _way too_ high-
level. These are all extremely well-known techniques. Is there any modern
video compression scheme that doesn't employ them?

In other words, why is H.264 _in particular_ magical?

------
imaginenore
> _" If you don't zoom in, you would even notice the difference."_

First of all, I think he meant "you would NOT even notice".

Second of all, that's the first thing I noticed. That PNG looks crystal clear.
The video looks like overcompressed garbage.

------
andrewvijay
Well explained. I was thinking of reading about h264 and this is an amazing
starter. Thanks Sid!

------
necessity
s/magic/lossy/

------
kutkloon7
"This concept of throwing away bits you don't need to save space is called
lossy compression."

What a terrible introduction to lossy compression. This would mean that if I
empty the trash bin on my desktop, it's lossy compression.

The concept of going through all compression ideas that are used is pretty
neat though.

~~~
Jweb_Guru
> This would mean that if I empty the trash bin on my desktop, it's lossy
> compression.

It is.

------
cogwheel
MB is 1024 * 1024 bytes, not 1000 * 1000 bytes. Unless you're a HDD/SSD
manufacturer.

~~~
garaetjjte
MiB (mebibyte) is 1024 * 1024 bytes.
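In plain numbers, using the SI and IEC definitions:

```python
MB = 1000 ** 2    # megabyte (SI): 1,000,000 bytes
MiB = 1024 ** 2   # mebibyte (IEC): 1,048,576 bytes

# The article's ~370 MB/sec raw-video figure differs by about 5%
# depending on which unit you read "MB" as.
raw = 373_248_000           # bytes/sec of raw 1080p60 RGB
print(raw / MB)             # 373.248
print(raw / MiB)            # ~355.96
```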

------
VikingCoder
Ugh. Comparing the file size difference between a lossless PNG and a LOSSY
H.264 video of a STATIC PAGE is absurd. Calling it "300 times the amount of
data," when it's a STATIC IMAGE is insulting in the extreme. It really doesn't
matter if the rest of the article has insights, because you lost me already.

~~~
1maginary
He clarifies right after that he got to those numbers because he used a
lossless vs lossy encoder. Really should've kept reading

~~~
VikingCoder
"right after that".

No he didn't explain "right after that." He rambled on and on, and even after
all of that, he STILL doesn't bring up JPG.

It's an inherently stupid comparison to make. You can't polish a turd.

~~~
LASR
Thanks for the feedback! Sorry if I was unclear. The comparison with PNG is
very intentional to illustrate the vast difference in the compression
efficiencies involved. I do state the difference clearly here though:

> This concept of throwing away bits you don't need to save space is called
> lossy compression. H.264 is a lossy codec - it throws away less important
> bits and only keeps the important bits.

> PNG is a lossless codec. It means that nothing is thrown away. Bit for bit,
> the original source image can be recovered from a PNG encoded image.

