A hands-on introduction to video technology: image, video, codec and more (github.com/leandromoreira)
694 points by manorwar8 on June 3, 2017 | 74 comments

Video compression is not understood well enough throughout the whole stack yet.

I recently got a 1080p projector for home use, so now movies / TV series in my home are viewed on a 100" screen. Content is mostly from Netflix and Amazon Prime Video.

Netflix does a really good job with encoding. I cannot say the same for Amazon Prime Video; even with their exclusive (in UK) offerings, like American Gods or Mr Robot, the quality of the encode is quite poor when viewed on a big screen. Banding, shimmering blocky artifacts on subtle gradients, insufficient bit budget for dimly lit scenes - once you become aware of the issues, it becomes really distracting.

OTOH a really big screen is a fantastic ad for high quality high bitrate content. Anything less than 2GB/hour is noticeably poor.

A nice comparison I often do: I take a plain black image full #000, and put it side by side with a video.

Keep in mind that for historical reasons related to the way analog TVs used to work, black in video is almost never "#000". We call that setup, or superblack. In the analog days NTSC black was 7.5 IRE, not 0, which is why you could always tell when a television was turned on even with nothing on the screen. More than one digital standard kept this custom, setting black at (roughly, in your terms) "#111". So if you're seeing encoding artifacts or grayness on black video, that's more a legacy of video itself and not the encoding work done by whoever made the media.

Say I sit down at a switcher and push the Black button. You'd think nothing would come out, right? Not quite: that switcher will almost always emit broadcast black, not "null," whether it's analog or digital. Sure, you can probably get Photoshop to spit out 0% then encode in some codec that lets you keep 0% and successfully display it on a modern digital television, but the more equipment and encodes you add to a video production the more likely the signal will be pulled up to broadcast black, so it's better to just think of that as black.

There's a similar exercise with white on the upper end. Remember when analog TVs would buzz when you wired up your computer and put RGB white on the screen? That's superwhite.

Well you should still be outputting full contrast. If it's not displaying black as fully black on your screen, there's a problem somewhere, probably in your playback software.

In MPC there's an option Render Settings > Output Range > 0-255 or 16-235. When set to 0-255 (default) my videos play with full black.

That setting sounds like it's for clamping your output to drive analog televisions in case your media has supercolors in it, given its default. I mean, putting a lens cap on a camera will record 16, and a white title on black in most editors will use 235 white on 16 black. I'm not sure what you think that setting is doing in that circumstance, when playing "typical" media. It definitely shouldn't be scaling, because that would distort intended color.

A unique pleasure of HN is watching programmers try to argue with domain experts on the topic of their domain expertise.

Domain experts sometimes don't realize when their knowledge is out of date.

To wit: HDMI does allow for 0-255 (or 1023, etc) "full range" quantization instead of the default "limited range" of 16-235 for most CEA-861 formats, if the sink's EDID allows a selectable YCC quantization range and the source sends AVI YQ=1.

Or even getting to the "plain black image full #000" remark that prompted this thread, images are basically always full range where black is 0 and there is no "superblack" concept. If it's being transmitted over HDMI, it's up to the GPU to do range conversion and RGB/YCbCr conversion as necessary, and up to the OS to do ICC colorspace conversion as necessary.

> Domain experts sometimes don't realize when their knowledge is out of date.

Sigh. Another unique pleasure of HN.

Sometimes there's a difference between practical and possible. I did not say it was not possible. While you're technically correct, the worst kind of correct, very little content and few delivery systems actually utilize that capability -- look how much you had to type just to get there. Then how are you going to deliver that alternate colorspace content? To wit: I note you've overlooked ATSC and other delivery mechanisms, as well as HDMI being a specification designed to handle all cases including non-NTSC-legacy regions which had a different colorspace all along (notably PAL, which defined "blanking" and "black" as equivalent).

Practically speaking, pretty much no content you'll ever play makes use of wide gamut in North American contexts. If you're cooking a file for yourself, that's one thing, but once you're producing something that's mass consumed there's a lot of compatibility issues to worry about. We still have fractional frame rates and picture safe zones, for crying out loud, despite drop frames being pointless for decades and overscan being a word nobody under 30 has heard. What's the point of leveraging wide gamut in HDMI if all your capture equipment, editing tech, broadcast delivery, playback equipment, and displays cannot handle it? For the general case you're counting on Vizio adding that ability into a $200 TV and Comcast handling it in their CPE. Good luck.

Why does that matter? When producing video content handling a wide array of distribution media matters. Except for circumstances where you control the entire system end to end, such as a planetarium or some other kind of venue, you always have to plan for your content potentially being sent over lame, legacy-style HDTV broadcasting. That's the floor. Sometimes you even have to think about SDTV transmission; God help you, and have fun with chyrons...

Not even Rec. 2020, which is designed for UHD and is way better, changed this[0]. It's not a concept that's in dispute, despite whatever HDMI -- one standard of several hundred that are involved in delivering you a single TV broadcast -- can do.

[0]: https://en.wikipedia.org/wiki/Rec._2020#Digital_representati...


1. You asserted that 0/0% in image codecs was superblack. The most common codec for which that is actually true is WebP, which I'll bet no one in this thread actually cares about.

2. In digital colorimetry, gamut and range are orthogonal. Gamut is the space of real colors a given colorspace encompasses, range is whether the digital signal is quantized to 85% of the theoretical bitdepth and is what we've been talking about. Rec 2020 very much does increase gamut, as mentioned in your link.

3. Studios are actually starting to master in full-range 48-bit RGB (or XYZ technically, I think...) But yeah, that's irrelevant to end-users and this discussion.

4. Yes, everyone has been talking about PCs and how they render / output to TVs. That's why images and MPC were mentioned; broadcast doesn't deal with images. Computers deal with full-range RGB; that's literally the framebuffer format everyone renders everything to. As such, they convert video's YCbCr to full-range RGB before HDMI or TVs come into the picture; MPC's setting mentioned earlier is to ignore what the file says about its YCbCr range when it performs that conversion. Because while there are video files on the internet that set the wrong range, there are also internet video files that are legitimately full-range. And I'm not even talking about ancient codecs using palettes or VQ.

And since computers have traditionally outputted full-range digital RGB signals (over DVI, DisplayPort, LVDS, etc), they do indeed negotiate full-range RGB over HDMI when possible. And yes, even $200 TVs generally support that nowadays.

Heck, if you created a full-range H.264 file, tossed it on a thumb drive, and stuck it in your TV, I'd be willing to bet most if not all modern TVs play it correctly.
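For anyone following along, the range conversion being argued about can be sketched in a few lines. This is only an illustration for 8-bit luma, assuming the usual 16-235 video range; the function names are made up for this example, and a real player also has to handle chroma (16-240), the YCbCr matrix, and dithering.

```python
def limited_to_full(y: int) -> int:
    """Expand an 8-bit video-range luma value (16-235) to full range (0-255)."""
    scaled = (y - 16) * 255 / 219
    # Codes below 16 or above 235 ("superblack" / "superwhite") are clipped.
    return max(0, min(255, round(scaled)))


def full_to_limited(v: int) -> int:
    """Compress an 8-bit full-range value (0-255) into video range (16-235)."""
    return round(16 + v * 219 / 255)


print(limited_to_full(16), limited_to_full(235))  # video black and video white
print(full_to_limited(0), full_to_limited(255))   # full-range black and white
```

If a player skips the first conversion for video-range content, black is shown as code 16 on a full-range display, which is the gray "black" described upthread.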

And absolutely, positively none of that matters to what the last episode of Scandal looks like as delivered over Netflix alongside an image of RGB black, which was the entire point of the thread. My perspective was the production of the average content that a user would be doing that comparison against, not your damned thumb drive that absolutely works and therefore, obviously, invalidates everything I'm saying. I even said, repeatedly, that the content just doesn't exist. That's all I said. Let me repeat that, just so we're abundantly clear: yes, video engineer, it is technically possible, but it is pretty much never used.

I did not assert that 0% in image codecs was superblack. I specifically mentioned Photoshop to avoid pedantry with someone coming along and saying "but nonlinear editors won't emit superblack unless you force them to!", not to talk about image codecs. You did that, and I ended up with more pedantry than I had bargained for in this entire thread, so that was a pointless exercise. As was sharing my perspective at all, honestly; I'm a professional video editor with credits (where "gamut" is a common term to discuss this phenomenon, despite your doctorate in colorimetry), not a video engineer. I apologize for speaking out of turn, and will defer to your expertise in the future.

The irony is that you responded to someone making a joke about the exact thing that you're doing. Please, just stop.

Sorry for thinking you were talking about image codecs and thinking you were confusing RGB black with superblack.

And other people who do different things have different perspectives and discussion to offer! Amazing! Are you satisfied with your correctness?

You are the reason I hate this community. "Oh, neat, a thread I know something about. Let me try contributing. Oh, look, now I'm in a slapfight with a guy who develops codecs for a living who completely missed the point of what I was saying and the context thereof, and wants to correct my usage of an incorrect term because it's his area of expertise. Let me rush to contribute more."

HN: The Game of Being More Right Than Everyone Else. Congratulations on winning. I'll go play something productive.

Edit: This was left when the parent comment said "Sorry, I only develop video codecs for a living," and it has now been rewritten. I'm leaving mine and I'd rather my whole subthread just be detached and deleted at this point

I am always reminded of this xkcd https://www.xkcd.com/1831/

You're right I don't really know exactly what the setting does.

But if a video cannot play with the black in the video being the darkest black my monitor is capable of, there is a problem there. Maybe it's some inherent problem with standards or formats or something. But if a video wants to show black, it should display on my monitor as dark as the monitor supports.

A camera module's optical black level isn't the same as Rec.709 color spec using a range of [16..235] for RGB color. https://en.m.wikipedia.org/wiki/Rec._709

I'm sorry, I'm not sure what you're explaining to me. I'm just saying recording "black" will yield broadcast black in general, without switching colorspaces and producing an entire content pipeline designed to handle an alternate colorspace. Nothing to do with the camera.

Depends on the content - HDR 4K content on both Netflix and Amazon is encoded in horribly low bitrate - about 12-15Mbit/s which makes it barely look any better than 1080p content (for comparison - 4K BluRay discs are about 60-100Mbit/s and look infinitely better).
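To put these bitrates on the same scale as the "2GB/hour" figure upthread, here is a rough conversion. It assumes decimal units (1 GB = 1000 MB) and ignores audio and container overhead; the helper names are just for this example.

```python
def mbps_to_gb_per_hour(mbps: float) -> float:
    """Convert a bitrate in Mbit/s to gigabytes per hour (decimal units)."""
    return mbps * 3600 / 8 / 1000


def gb_per_hour_to_mbps(gb_per_hour: float) -> float:
    """Convert gigabytes per hour back to Mbit/s (decimal units)."""
    return gb_per_hour * 1000 * 8 / 3600


for rate in (12, 15, 60, 100):
    print(f"{rate} Mbit/s is about {mbps_to_gb_per_hour(rate):.2f} GB/hour")
print(f"2 GB/hour is about {gb_per_hour_to_mbps(2):.2f} Mbit/s")
```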

This is probably not an accident.

A good broadband connection is limited to a sustained rate around 13-18 Mbps.

In the US maybe. I'm getting my advertised Mbps 24/7 as long as the source is able to keep up.

Really? I find that their 4K content looks a lot better than their 1080p content, not necessarily because of resolution but because of H.265.

It looks better, but it's not even remotely close to how BR looks (and how 4K actually should look when bringing full resolution to the table) - even with H.265/VP9 it's quite bit starved at that bitrate.

Could you share a link to the projector you bought? And what do you project it onto?

So maybe a more useful comment than video-range/full-range pedantry over historical brokenness:

Banding (which I think is the root cause of all your complaints) is effectively due to overly coarse quantization. This is amplified in H.264 (which Netflix and Amazon generally use) since the signal has multiple 8bit -> 10+bit -> 8bit trips, with each 8bit reduction contributing additional errors that can effectively result in small changes between blocks in the input signal amplified into noticeable differences in the output. And it's more noticeable in low-light scenes since current standards' gamma transfer functions compress a bit too much on the lower end (maybe my personal opinion there...)

The easiest solution is to have more discrete steps at each intermediate point. But for that to really help you'd need to change codecs from 8-bit H.264, which means not being playable on many devices. I mean, you could get 15% finer quantization with full-range 0-255 over video-range 16-235, but that doesn't help that much; you really need the 4x that 10bit gives you to be noticeable.
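The "15%" and "4x" figures can be checked by counting the code values available between black and white. A small sketch, assuming the standard video-range limits (16-235 at 8 bits, 64-940 at 10 bits); `code_values` is a name invented for this example.

```python
def code_values(bits: int, full_range: bool) -> int:
    """Number of distinct luma codes from black to white, inclusive."""
    if full_range:
        return 2 ** bits
    scale = 2 ** (bits - 8)          # 1 at 8 bits, 4 at 10 bits
    return (235 - 16) * scale + 1    # 16-235 at 8 bits, 64-940 at 10 bits


base = code_values(8, full_range=False)
print("8-bit video range:", base)
print("8-bit full range gain:", code_values(8, full_range=True) / base)
print("10-bit video range gain:", code_values(10, full_range=False) / base)
```

Going full range at 8 bits buys a gain of roughly one sixth, while 10-bit video range gives close to four times as many steps, which is why only the latter makes a visible dent in banding.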

The second easiest solution is to have a debanding postfilter. But between the wide variety of platforms Netflix/Amazon have to run on, and modern DRM requirements, that's also not really an option.

So ultimately, you're left with developing prefilters to try to shape where bits are needed for given scenes, and careful tuning of your encoder. Which Netflix (presumably) has a several year head start over Amazon on.

Remember when we did this ugly interlacing thing, so that we could get a higher (50/60fps) framerate?

When did we decide that 24/25/30fps was good enough? Now we have a Blu-Ray standard that cannot handle greater than 30fps, and media corporations that are unwilling to release content via any other medium.

Put that together with ever-increasing resolutions, and the amount of pixels something moves across from one frame to the next becomes greater, and video looks more and more choppy.
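A back-of-the-envelope version of that claim, assuming an object that crosses the full width of the frame in a fixed time. The 2-second pan is an arbitrary number chosen for illustration.

```python
def pixels_per_frame(width_px: int, crossing_seconds: float, fps: float) -> float:
    """Horizontal jump between consecutive frames for an object crossing the screen."""
    return width_px / (crossing_seconds * fps)


for width in (1920, 3840):
    for fps in (24, 60):
        jump = pixels_per_frame(width, crossing_seconds=2.0, fps=fps)
        print(f"{width}px wide at {fps} fps: {jump:.0f} px per frame")
```

Doubling the horizontal resolution doubles the per-frame jump at the same frame rate, so the same pan looks choppier on a UHD panel unless the frame rate goes up too.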

Frankly, this is a much bigger problem than NTSC ever was. Even with content (The Hobbit, Billy Lynn's Long Halftime Walk) being created at higher framerates, users have no way to get the content outside of a specialized theater because the Blu-Ray standard cannot handle it, and because people seem to honestly believe that higher framerates look bad.

I suppose we can only hope that creators take better advantage of digital mediums that do not have such moronic, and frankly harmful, arbitrary limitations.

High frame rates are beautiful. Gamers have known that for a long time and are obsessive about fps. The guy that did that amazing full motion video on a 1981 PC talks about doing experiments. And found that high frame rates were much more important than the image resolution. And had some nice examples of high res but choppy video next to high frame rate video with crap resolution. He somehow got 60 fps video on that ancient hardware and it looked acceptable.

> High frame rates are beautiful

Yes, but it all depends on what you are seeing. Games look fantastic, VR probably looks great.

For movies, I'd stick with 24 fps, as it keeps the "style" of film. Subtle things like the motion blur produced at that framerate and how we see/expect the movement in the screen are altered when the framerate changes.

As with a lot of things, it depends on what you want to achieve!

> 24 fps ... keeps the "style" of film

If what you want to achieve involves a lower framerate, go for it. The problem is that 24fps is considered the "standard", and film producers receive a ridiculous amount of backlash for creating any HFR content. There are 2 major films I know about (4 if you count the other two Hobbit films) that use higher framerates. Since I didn't go watch them in theaters, I have no way to see them in their original framerates.

> Subtle things like the motion blur produced at that framerate...

We expect 24fps quirks like motion blur because we have spent the entire history of film conditioning ourselves to prefer it. There are even expensive ways to render what appears to be motion blur so that we can simulate 24fps film in CG.

To contrast with the entire history of film (until recently), many people, like me, have UHD panels to watch film. At such a high resolution, the amount of pixels that motion blurs over has increased significantly. To make a clearer picture, most major studios record at a higher framerate to minimise motion blur, even if they intend to show every other frame. That means that most motion looks very choppy, as it jumps hundreds of pixels each frame. These films would be beautiful at higher framerates, but practically no one is willing to fight the "industry standard".

How do these problems even compare? NTSC is incapable of faithfully reproducing any movie's native frame rate, resolution, and color space. It's just that our expectations of consumer media have changed.

Broadcasters seem far less inflexible than they were back then, with television sets no longer costing multiple months' salary in most regions, whole standards like analog transmission and first-generation DVB-T already phased out, and current replacement cycles running well under 10 years. Breaking legacy standards is no longer a no-go.

Blu-ray is not the problem; the display devices that support only older HDMI standards are. Then again, their hardware capabilities rarely exceed those standards either.

I doubt we'll see another consumer standard gain widespread adoption for physical distribution when YouTube already does things like 8K@60fps and beyond. The expensive part now is the 4K+ display, not the media boxes (Fire sticks, Chromecast, set-top boxes, gaming consoles), which are cheap and can handle almost anything they have compatible hardware decoders for.

There are UHD BluRay players for less than USD 200, it's up to the studios to release content https://de.wikipedia.org/wiki/Ultra_HD_Blu-ray

> There are UHD BluRay players for less than USD 200, it's up to the studios to release content.

I wasn't aware until now that UHD Blu-Ray supports up to 60fps. At least they have that going for them.

Man, that was horrible; I hated the deinterlacing artifacts. At the codec level it was nasty if you ever wanted to debug issues with MBAFF. Good thing HEVC did away with that crap.

>> When did we decide that 24/25/30fps was good enough?

Everything looks weird at 48/60fps

Remember LoTR?

I think you mean the Hobbit, which was mentioned in parent comment. People didn't like it because they are so used to blurry low frame rate movies. If all movies were filmed like that, people would quickly get used to it. Soon they wouldn't be able to stand 24 fps. It's objectively worse.

Similarly footage shot with digital cameras is often modified to look more like film. There's no objective reason for it most of the time, people are just used to the artifacts of film. Anything else "looks weird". Now stuff like lens flare, shaky cameras, etc, are added to shots made entirely in CG!

I can get past that stuff because it doesn't make the quality that much worse, but low fps definitely does. What's the point of having super high resolution 4k display, if the scenes displayed on it are incredibly blurry from a low frame rate?

I never got to see The Hobbit in 48fps. That was only available in theaters. Blu-Ray standard constrained releases to 24fps, and the studios don't care enough to release a better version.

> Everything looks weird at 48/60fps

Everything looks jarringly different in 48/60fps. That does not make it bad. For several technical reasons, higher framerates look better. If you really want 24/30fps, you can always get it. Just like you can have grayscale on a color display.

I use Smooth Video Project to interpolate frames when I can (anything without DRM), but that still leaves me with a blurry picture, and artifacts. Even so, it's a more comfortable experience, especially with camera panning.

Looks like this contains a bunch of Creative Commons (CC-BY-SA) content ripped from Wikipedia without proper attribution. Please add the missing attribution.




I'm so sorry, I didn't mean to make it wrong, I tried really hard to put all the references in a list https://github.com/leandromoreira/digital_video_introduction... !

I'll fix these attributions, but also feel free to point out more or even send a PR.

It's interesting to note that the architecture of the first ISO codec, MPEG-1, is almost identical to the one we have today in H.265. That codec was standardised in the late 90s, so this design has carried through for about 20 years. Most of the changes relate to the targeted parameters such as frame size, frame rate and bitrate. Only the last step, 264 --> 265, seems to have added new features.

This is a very well written introduction

What? They’ve changed the transforms, the entropy coding, the way motion compensation works, changed the blocking strategy....what aspect is unchanged, actually?

These are all refinements. The broad strokes of the algo haven't changed

What could have changed?

That's a good research topic. Considering the time frame involved (20 years), you could infer that attempts at finding alternative strategies have been half-hearted. The way ISO and standardisation work is the reason for this: 99 out of 100 researchers work on the mainstream.

Just curious as I never bothered to think about this before, in all H. Codecs/standards... what does the "H" stand for?

It's an ITU spec naming convention - think things like X.25. They're all <letter>.<digit>+. The letters aren't often very mnemonic. The letter is a large bucket classification, audiovisual and multimedia systems for H.



Don't know, but it was associated with the ITU as opposed to the ISO before they decided to merge their efforts

It's not like it hasn't been tried. Wavelet compression never took off. I don't know if that is because the existing format is just better, or because there was never enough investment in those formats.

The problem with wavelet-based compression was that the wavelet transforms were applied globally to the whole frame at a time. While that made them suitable for still-image compression, they couldn't really take advantage of motion compensation, so their applicability to video was low. Same with fractal-based techniques. Besides, as resolutions got higher and higher, the blockiness of the 8x8 DCT became less and less of a factor.

Digital cinema uses wavelet compression - intra-frame only JPEG2000 at hundreds of Mbps. It seems that at high resolution and bitrate it actually performs similarly to or better than h264, e.g. this paper and its references: http://alumni.media.mit.edu/~shiboxin/files/Shi_ICME08.pdf

Digital cinema uses a resolution that is much higher than H.265's targeted sweet spot. Their quality needs are also a lot higher. Motion compensated video cannot give them the desired quality. Hence intra frame only wavelet based compression. Also, note that JPEG2000 which uses wavelets implies that for still images, wavelets can be made to work better. JPEG which preceded JPEG2000 was 8x8 DCT based.

You can still use wavelet compression to encode the residual from MC, but I think the biggest problem is performance: DCTs have been optimised far more than wavelet transforms.

Even in still-image compression, the difference is noticeable --- I have some high-resolution PDFs containing JPEG2000 scanned images, and they take significantly longer to render than the equivalent containing JPEG images.

Well, you could. But the way current schemes are structured, the motion compensation is done on a 16x16 macroblock basis. Using an 8x8 DCT to clean up the 4 quadrants within a macroblock makes sense. But performing a global wavelet transform on a motion-compensated difference image would mess you up at the boundaries of the macroblocks, since you would potentially have discontinuities there. Of course, it would be possible to devise a scheme with a different approach to motion compensation, say a per-pixel one that used optical flow. Also, not all macroblocks are encoded using motion compensation. Even in P and B frames, some are encoded intra. You would lose that small optimisation in a wavelet-based scheme.
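A toy version of the "8x8 DCT on the 4 quadrants of a macroblock" structure, for readers who want to poke at it. It uses a textbook floating-point orthonormal DCT-II, not the integer transform any particular standard specifies, and the gradient macroblock is made-up test data.

```python
import math


def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_ii(row) for row in block]
    cols = [dct_ii(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]   # back to [vertical][horizontal]


def quadrants(macroblock):
    """Split a 16x16 macroblock into its four 8x8 blocks."""
    return [[row[x:x + 8] for row in macroblock[y:y + 8]]
            for y in (0, 8) for x in (0, 8)]


# Made-up residual: a smooth horizontal gradient, identical on every line.
macroblock = [[100 + x for x in range(16)] for _ in range(16)]
blocks = quadrants(macroblock)
coeffs = dct_2d(blocks[0])
print([round(c, 2) for c in coeffs[0]])   # horizontal frequencies
print([round(c, 2) for c in coeffs[1]])   # first vertical frequency row
```

Because the block has no vertical detail, everything outside the first coefficient row comes out as (numerically) zero, which is the energy compaction the quantizer and entropy coder then exploit.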

Techniques similar to the ones used to optimise DCTs could be used for wavelets. There has just not been a demand. The standardisation effort tends to swamp out all alternatives once the decision to choose the algorithm has been made. There's a huge amount of momentum behind the standard, which makes alternatives very hard to pitch. This is part of the reason why the same basic technique has been in prevalent use for 20-odd years and nothing else has come into play.

Patent issues also didn't help for jpeg2000.

Where can one read about all the also-ran or proprietary codecs used in the 90's?

It truly was the dark ages of digital video with low frame rates and postage stamp sized windows.

Try the FAQ for comp.compression

Xiph.org wrote fascinating stuff about video compression when working on their next-generation codec, Daala https://xiph.org/daala/

This was also food for whiteboards in the show Silicon Valley. Compare: https://github.com/leandromoreira/digital_video_introduction...

with: http://imgur.com/a/Sne89

Those don't really seem similar. Diagrams with boxes and arrows aren't uncommon.

"Q" = Quantizer, "EC" = Entropy Coding, etc. It's definitely similar if you look closer.

"LZW" -> Lempel–Ziv–Welch on the entropy codes doesn't really make sense though (maybe that's why the word "stupid" is next to it).

I liked that the green channel in Mario's picture was titled "Luigi". Nice touch :)

The frequency of 60/1.001 Hz (that is, 60000/1001, about 59.94 Hz) and the situation where we are stuck with it basically forever is a shame upon the entire profession of video engineers.
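For a sense of why the fractional rate is such a nuisance, here is the arithmetic with exact fractions. The rates are the NTSC color values (60000/1001 fields and 30000/1001 frames per second); the helper name is invented for this sketch.

```python
from fractions import Fraction

NTSC_FIELD_RATE = Fraction(60000, 1001)   # about 59.94 Hz
NTSC_FRAME_RATE = Fraction(30000, 1001)   # about 29.97 fps


def counter_drift_per_hour(actual_fps: Fraction, nominal_fps: int) -> Fraction:
    """Frames by which a nominal-rate frame counter runs ahead in one real hour."""
    return nominal_fps * 3600 - actual_fps * 3600


drift = counter_drift_per_hour(NTSC_FRAME_RATE, 30)
print(float(NTSC_FIELD_RATE))
print(float(drift))
```

A timecode that counts 30 frames per second therefore runs ahead of the clock by a bit under 108 frames every hour, which is the gap drop-frame timecode papers over by skipping 108 frame numbers per hour.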

Any more background on the 60/1001Hz thing?

Got really excited for a second thinking this was discussion on transport technology as opposed to encoding.

What aspects of the transport technology are you specifically interested in?

Uh...assume I know absolutely nothing; can you start with a simple ELI5, and then elaborate/point me off in the right direction with a seed foundation of knowledge?

LOL, I'll try. Two major things to grasp about transport technology are the a) "entry point" and b) the notion of time.

a) Multiple types of media data are encoded independently and then bundled together in what essentially looks like an endless file (called a stream file). So when given a chunk of such a file, the decoder needs to quickly identify the nearest offset in it where it can begin decoding all the individual media it needs simultaneously. This is called an "access point". Decoding cannot be started at any random place in the stream, as it generally requires context (so an access point allows decoding to start with the context being empty for all required media: audio, video, graphics, subtitles etc). Stream file formats (called containers) are designed to solve this by providing access points to the decoder as easily and frequently as possible.

b) A decoder, when driven by a running presentation device (video screen, audio amplifier etc), is essentially a pump. The encoder can be looked at as a pump too, when the two are separated by a network. If the decoder runs faster than the source feeds it, it will drain the pipe and make the presentation device run idle (which will be noticeable to the consumer). If it runs slower, at some point it will be drowned in data from the source. So the pumping rhythm needs to be kept identical at both ends. The most practical way to synchronize the "piping clocksource" is via the stream file itself (which has to carry time sample data for that). Again, different containers solve this differently (some not at all).

EDIT: I didn't mention (it should go into b)) the effort to keep the throughput of the pumping constant, i.e. "constant bit rate". I believe its importance is declining with the advent of transport schemes that require point-to-point connections (as opposed to multicast streaming).
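To make a) concrete, here is a minimal sketch of the lookup a decoder does when it is handed a chunk from the middle of a stream. The list of access-point offsets is hypothetical; in a real container it would come from an index or from scanning for sync markers.

```python
import bisect

# Hypothetical byte offsets at which decoding can start with empty context.
ACCESS_POINTS = [0, 48_000, 131_072, 250_500, 400_000]


def next_access_point(offsets, position):
    """Return the first access point at or after `position`, or None if none is left."""
    i = bisect.bisect_left(offsets, position)
    return offsets[i] if i < len(offsets) else None


chunk_start = 200_000
start = next_access_point(ACCESS_POINTS, chunk_start)
print(f"chunk begins at {chunk_start}, decoding can begin at {start}")
```

Everything between the chunk start and the returned offset has to be skipped, which is why containers that place access points frequently let a player join or seek faster.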

Re: b) For video, the encoder contains a model of the decoder, including the amount of buffering available to the decoder. The bit-rate controller at the encoder uses this model to ensure that the decoder always has the right amount of data in its input buffers. It also ensures that the information rate of the channel is matched with that of the compressed stream in a live transmission setting. The transport scheme which operates at a layer below the codec, therefore only needs to take care of delay and packet delivery/loss related issues over the channel. Media is typically transmitted over UDP.
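A much-simplified sketch of that decoder model: a buffer filled at the channel rate and emptied by one frame per frame interval. Real encoders use a standard-defined buffer model with more detail; the function name, the numbers, and the choice to reset the buffer to empty on underflow are all simplifications made for this example.

```python
def simulate_decoder_buffer(frame_bits, channel_bps, fps, buffer_bits, initial_bits):
    """Return one status per frame: "ok", "overflow" or "underflow"."""
    fullness = initial_bits
    bits_in_per_frame = channel_bps / fps
    statuses = []
    for bits in frame_bits:
        fullness += bits_in_per_frame          # channel delivers data
        status = "ok"
        if fullness > buffer_bits:             # more arrived than fits
            status = "overflow"
            fullness = buffer_bits
        if bits > fullness:                    # frame not fully received yet
            status = "underflow"
            fullness = 0                       # simplification: decoder stalls
        else:
            fullness -= bits                   # decoder removes the frame
        statuses.append(status)
    return statuses


frames = [120_000, 20_000, 20_000, 20_000, 200_000]   # one oversized last frame
print(simulate_decoder_buffer(frames, channel_bps=1_000_000, fps=25,
                              buffer_bits=200_000, initial_bits=100_000))
```

An encoder's rate controller runs this kind of bookkeeping ahead of time and shrinks a frame (coarser quantization) whenever the model predicts the decoder would run dry.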

This is really great. We seriously lacked a good introduction to video technology.

This is awesome. I work in the VOD space, specifically in content protection, and this is a great reference guide. I've been meaning to write a similar guide for DRM.

The first example interlacing image is wrong: it shows a running dog with a simulated division into scan lines, but does not take into account the timing difference, which was one of the major sources of deinterlacing artifacts. Alternating fields are 1/60 second apart in time.
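A tiny text-mode illustration of that timing difference: the two fields sample a moving bar at different instants, so weaving them back into one frame produces the familiar comb. The bar positions and frame size are arbitrary values picked for the sketch.

```python
def render_frame(x_pos, width=12, height=4):
    """Draw a 2-pixel-wide vertical bar starting at column x_pos."""
    row = "".join("#" if x_pos <= x < x_pos + 2 else "." for x in range(width))
    return [row for _ in range(height)]


def weave(first, second):
    """Take even lines from `first` and odd lines from `second`."""
    return [first[y] if y % 2 == 0 else second[y] for y in range(len(first))]


scene_at_t = render_frame(x_pos=2)        # sampled for the first field
scene_later = render_frame(x_pos=5)       # 1/60 s later: the bar has moved
woven = weave(scene_at_t, scene_later)
for line in woven:
    print(line)
```

Alternate lines show the bar in two different places. If both fields came from the same instant, as in the criticized image, the woven frame would be identical to the source and show no combing at all.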


I believe the next step in video compression will be more on smarts like object tracking and object recognition.

Machine learning becoming more and more popular will probably help :)

They started along that direction with MPEG-4. But it didn't go very far then.

How much of that repo is protected by patents and cannot be reused?

Amazing work.
