
H.266/Versatile Video Coding (VVC) - caution
https://newsletter.fraunhofer.de/-viewonline2/17386/465/11/14SHcBTt/V44RELLZBp/1
======
Unklejoe
It's interesting that they are able to continue improving video compression.
You'd think that it would have all been figured out by now.

Is this continued improvement related to the improvement of technology? Or
just coincidental?

Like, why couldn't H.266 have been invented 30 years ago? Is it because the
computers back in the day wouldn't have been fast enough to realistically use
it?

Do we have algorithms today that can compress way better but would be too slow
to encode/decode?

~~~
giantrobot
Video compression is a calculus of IO capacity, memory, and algorithmic
complexity. Take the MPEG-1 codec, for instance; it was new about 30 years ago.
While today most people think of MPEG-1 videos as low quality, the spec
provides the ability to handle bit rates up to 100Mb/s and resolutions up to
4095x4095. That was _way_ higher than the hardware of the time supported.

One of MPEG-1's design goals was to get VHS-quality video at a bitrate that
could stream over T1/E1 lines or 1x CD-ROMs. The limit on bitrate led to
increased algorithmic complexity. It wasn't until well into the Pentium/PowerPC
era that desktop systems could play back VCD-quality MPEG-1 video in software.

Later MPEG codecs increased their algorithmic complexity to squeeze better
quality video into low bit rates. A lot of those features existed on paper
20-30 years ago but weren't practical on hardware of the time, even custom
ASICs. Even within a spec features are bound to profiles so a file/stream can
be handled by less capable decoders/hardware.

There's plenty of video codecs or settings for them that can choke modern
hardware. It also depends on what you mean by "modern hardware". There's
codecs/configurations a Threadripper with 64GB of RAM in a mains powered jet
engine sounding desktop could handle in software that would kill a Snapdragon
with 6GB of RAM in a phone. There's also codecs/configurations the Snapdragon
in the phone could play using hardware acceleration that would choke a low
powered Celeron or Atom decoding in software.

~~~
AareyBaba
Are there codecs that require high compute (Threadripper) for encode but can
be easily decoded on a Snapdragon ?

~~~
pcl
Yes — many codecs can be optimized for decoding at the expense of encoding.
This is appropriate for any sort of broadcast (YouTube, television, etc).

Also, in many applications, it’s suitable to exchange time for memory /
compute. You can spend an hour of compute time optimally encoding a 20-minute
YouTube video, with no real downside.

Neither of these approaches is suitable for things like video conferencing,
where there is a small number of receivers for each encoded stream and latency
is critical. At 60fps, you have less than 17ms to encode each frame.

Interestingly, for a while, real-time encoders were going in a massively
parallel direction, in which an ASIC chopped up a frame and encoded different
regions in parallel. This was a useful optimization for a while, but now,
common GPUs can handle encoding an entire 1080p frame (and sometimes even 4K)
within that 17ms budget. Encoding the whole frame at once is way simpler from
an engineering standpoint, and you can get better compression and / or fewer
artifacts since the algorithm can take into account all the frame data rather
than just chopped up bits.
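
To make the asymmetry concrete, here is a minimal sketch using ffmpeg's x264
presets (this assumes an ffmpeg build with libx264; "input.mp4" is just a
placeholder file). A veryslow encode can take an order of magnitude longer than
ultrafast at the same quality target, yet both produce streams that are equally
cheap to decode:

```python
# Sketch: trading encode time for compression efficiency with ffmpeg/x264.
# Assumes ffmpeg with libx264 is installed; "input.mp4" is a placeholder file.
import subprocess
import time

def encode(preset: str, out: str) -> float:
    """Encode input.mp4 with the given x264 preset, return wall-clock seconds."""
    start = time.time()
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.mp4",
         "-c:v", "libx264",
         "-preset", preset,   # ultrafast ... veryslow: slower = smaller file
         "-crf", "23",        # same quality target for both runs
         out],
        check=True,
    )
    return time.time() - start

fast = encode("ultrafast", "fast.mp4")  # roughly what a realtime encoder must manage
slow = encode("veryslow", "slow.mp4")   # what an offline/VOD encoder can afford once
print(f"ultrafast: {fast:.0f}s, veryslow: {slow:.0f}s")
```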

~~~
beervirus
Surely videoconferencing doesn’t actually use 60 FPS...

~~~
giantrobot
Some web conferencing would want to do 60fps. There's also realtime streaming
like Twitch, PS Now, and Google's Stadia.

~~~
imtringued
Twitch isn't real time.

~~~
mathw
Yes it is. The delay on a Twitch stream doesn't mean they don't have to deal
with encoding and transmitting frames at full speed. If Twitch wasn't real
time, you'd only be able to watch live streams slowed down!

~~~
Kihashi
That's a different definition than most people mean when they say "real-time".

~~~
pavon
Not me. All realtime systems have some latency, but what makes them realtime
is that they must maintain throughput, processing data as quickly as it comes
in. You can subdivide to hard-realtime and soft-realtime depending on how
strict your latency requirements are, but it is still realtime.
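
A toy illustration of that distinction (numbers made up): a pipeline with
seconds of end-to-end latency is still realtime as long as every stage keeps up
with one frame per frame period.

```python
# Toy illustration: latency vs. throughput in a "realtime" pipeline.
FRAME_PERIOD = 1 / 60            # a new frame arrives every ~16.7 ms
STAGE_DELAYS = [0.5, 2.0, 1.5]   # made-up capture/encode, network, buffer delays (s)

latency = sum(STAGE_DELAYS)      # a Twitch-style delay of a few seconds
throughput = 1 / FRAME_PERIOD    # frames per second each stage must sustain

print(f"end-to-end latency: {latency:.1f} s, required throughput: {throughput:.0f} fps")
# Still realtime: viewers get 60 fps continuously, just a few seconds behind live.
```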

------
KitDuncan
Can we all just agree on using AV1 instead of another patent encumbered
format?

~~~
otterley
No, because the market is more than happy to pay a few cents or dollars per
device to get better compression and lower transmission bandwidth. This
observation has held true consistently in the 3 decades since compressed
digital media was invented.

~~~
corty
The market is rather unhappy. E.g. Win10 doesn't ship an H265 codec because
it's too expensive.

~~~
otterley
At 99 cents for the add-on, the decision to charge users smells more like a
political decision than an economic one. The cost to Microsoft is undoubtedly
far less than that. 99 cents is basically the bare minimum you can charge when
you accept credit cards as a method of payment.

~~~
eggsnbacon1
That's 1% of the cost of Windows, for a video codec, when there are already
dozens of free alternatives.

~~~
otterley
There’s a fixed cap on the royalty rate, so even if we assume Microsoft paid
for a license for each copy of Windows sold to customers, on a per-copy basis
it would be much less than 1%.

------
DrBazza
Naively hoped I'd read 'this will be released to the community under a GPL
license' or similar. Instead found the words 'patent' and 'transparent
licensing model'.

I appreciate that it costs money and time to develop these algorithms, but
when you're backed by multi-billion dollar "partners from industry including
Apple, Ericsson, Intel, Huawei, Microsoft, Qualcomm, and Sony" perhaps they
could swallow the costs? It is 2020 after all.

~~~
p_l
The dirty secret of video codecs is that you can't make a modern video codec
that isn't patent encumbered, which in turn makes it so that even if they
wanted to be open, they go for defensive patents, which in turn perpetuate the
situation.

At least the patent licenses usually used with MPEG mean that private use of
open source implementations is free.

~~~
speedgoose
Software patents aren't a thing in ~~Europe~~ a few European countries. Sure
it's difficult to ignore the American market for a company, but an independent
developer could specify a state of the art video codec without thinking about
patents.

Edited because I didn't know that some European countries accept software
patents.

~~~
p_l
Software patents are a very complex thing.

For example, many countries in the EU do not allow patents on software, but
that's not something you can claim is true for all of them - at least before
Brexit, since iirc the UK was pretty happy to grant software patents.

Then there's the case where, if you're really determined, you can, as far as I
understand, force a patent dispute through the WTO, with the possibility of a
patent valid in the USA being enforced, for example, in Poland, despite the
fact that the patent is invalid in Poland (it doesn't matter whether your
software is part of a physical product there; algorithms of any kind are not
patentable).

~~~
speedgoose
One more benefit of Brexit for mainland Europe! I didn't know about using the
WTO to dispute invalid patents. I guess a WTO dispute can make sense for
a Boeing software patent used by Airbus. I wonder whether the WTO would care
about an independent developer. It's not very good press, but perhaps it's
fine for them.

~~~
p_l
WTO doesn't care, WTO serves as a forum to get it through.

The question is, does a "practicing entity" holding the patent care enough to
go down the hardest route and get the patent enforced using the WTO as a forum?
One needs to compare costs and benefits. It's why patent trolling involved
pretty much a few counties in Texas, because that's where the costs were lowest
compared to the benefits.

------
eigenvalue
Can anyone verify if this is a real number? It’s possible sometimes to make
surprising claims (such as 50% lower size) by relying on unusual or
unrealistic situations. I would rather they used a standard set of test
videos with different content and resolutions, and some objective measure of
fidelity to the original, when quoting these percentages. But if the 50%
number is real, then that is truly remarkable. I wonder how many more CPU
instructions are required per second of decoded video compared to HEVC.
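
For reference, published comparisons usually quote averages over standard test
sequences using objective metrics like PSNR, SSIM or VMAF (often summarized as
a BD-rate). A minimal PSNR sketch, just to show the kind of measure involved
(the frames here are random placeholders):

```python
# Sketch: PSNR, one simple objective fidelity measure used in codec comparisons.
import numpy as np

def psnr(original, decoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized 8-bit frames."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * np.log10(max_value ** 2 / mse)

# Placeholder frames standing in for the reference and the decoded output.
ref = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
dec = np.clip(ref + np.random.randint(-3, 4, ref.shape), 0, 255).astype(np.uint8)
print(f"{psnr(ref, dec):.1f} dB")
```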

~~~
ajross
If one trusted numbers like this, and followed a chain of en vogue codecs back
through history, you'd expect that a modern codec would produce file sizes
like 3% of MPEG-2 on the same input data. It's all spin.
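
A back-of-the-envelope version of that compounding, assuming roughly five codec
generations since MPEG-2 each claiming ~50% savings at launch:

```python
# If every one of ~5 codec generations since MPEG-2 really delivered the ~50%
# bitrate savings claimed at launch, the compounded file size would be:
print(0.5 ** 5)  # 0.03125 -> about 3% of the MPEG-2 size for the same content
```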

I'm sure it does better, but I'm equally sure it'll turn out to be an
incremental benefit in practice.

> I wonder how many more CPU instructions are required per second of decoded
> video compared to HEVC.

CPU cycles are cheap. The real cost is the addition of _yet another ?!@$!!#@
video codec block on every consumer SoC shipped over the coming decade_.

Opinionated bile: video encoding is a Solved Problem in the modern world, no
matter how much the experts want it to be exciting. The low hanging fruit has
been picked, and we should just pick something and move on. JPEG-2000 and WebP
failed too, but at least there it was only some extra forgotten software.
Continuing to bang on the video problem is wasting an absolutely obscene
amount of silicon.

~~~
sp332
MPEG-2 doesn't support video larger than 1920x1152, so it's hard to compare on
4k video, let alone 8k. But according to
[https://www.researchgate.net/publication/321412719_Subjectiv...](https://www.researchgate.net/publication/321412719_Subjective_and_objective_quality_assessment_of_MPEG-2_H264_and_H265_videos)
H.265 can achieve similar visual quality with 10% of the bit rate of MPEG-2
even on SD video (832*480). [Edit: not 720p]

~~~
entropicdrifter
832x480 is ~480p video (720x480 is standard widescreen DVD); 720p is 1280x720.

------
anordal
Nobody mentioning EVC? Worth a read for anyone concerned about patent
licensing:

[https://en.wikipedia.org/wiki/Essential_Video_Coding](https://en.wikipedia.org/wiki/Essential_Video_Coding)

There are 3 video coding formats expected out of (former) MPEG this year:

[https://www.streamingmedia.com/Articles/Editorial/Featured-A...](https://www.streamingmedia.com/Articles/Editorial/Featured-
Articles/Inside-MPEGs-Ambitious-Plan-to-Launch-3-Video-Codecs-
in-2020-134694.aspx?utm_source=related_articles&utm_medium=gutenberg&utm_campaign=editors_selection)

So this isn't necessarily _the_ successor to HEVC (except that it is, in terms
of development and licensing methods).

------
clouddrover
> _A uniform and transparent licensing model based on the FRAND principle
> (i.e., fair, reasonable, and non-discriminatory) is planned to be
> established for the use of standard essential patents related to H.266
> /VVC._

Maybe. On the other hand, maybe not. Leonardo Chiariglione, founder and
chairman of MPEG, thinks MPEG has for all practical purposes ceased to be:

[https://blog.chiariglione.org/a-future-without-
mpeg/](https://blog.chiariglione.org/a-future-without-mpeg/)

The disorganised and fractured licensing around HEVC contributed to that. And,
so far, VVC's licensing looks like it's headed down the same path as HEVC.

Maybe AV1's simple, royalty-free licensing will motivate them to get their act
together with VVC licensing.

------
xiphias2
Shouldn't deep learning based video codecs take over dedicated hardware video
decoders as more tensor cores become available in all new hardware?

NVIDIA's DLSS 2.0 supersampling is already moving in that direction.

~~~
sp332
Instead of a video file or stream, that would be more like shipping a program
that recreates the video. It might be cool, but it's not really feasible to
play back that kind of thing on normal TV hardware.

~~~
xiphias2
I'm not sure what you mean. There are already multiple research articles that
show that deep neural network based video compression can be competitive,
here's an example:

[https://papers.nips.cc/paper/9127-deep-generative-video-
comp...](https://papers.nips.cc/paper/9127-deep-generative-video-
compression.pdf)

------
Havoc
The fact that the underground scene is still pumping 264 instead of 265 (I'd
estimate 90/10 split optimistically) tells me the real world is not quite
ready for 266.

So I guess it comes down to 266 hw support. Or powerful CPUs that can push sw
decoding?

------
blacklion
What I don't understand is why international standardization organizations
allow patent-encumbered technologies to become de jure standards.

MPEG, WiFi, GSM…

IMHO, international standards must be implementable without any patent fees, or
they are very bad standards.

~~~
freeone3000
There's no law requiring wifi - "de facto". And they're standards because
they're quite good! They have hardware support and parallelization and account
for all use cases, even the marginal ones, and have reference implementations
and support. Standards orgs don't care about patents because they're not
relevant. This isn't a case of trolling - this is literally a software patent
being used for its intended purpose by its developer, to extract profit by
coming up with a new idea, and letting others use it.

------
nickysielicki
I'm no expert when it comes to video codecs but I'm surprised that we're still
able to see such strong claims of algorithmic improvement over h264, and now
over h265. I'm also aware of how patent-encumbered this whole field is and I'm
skeptical that this is just a money grab.

This is really just a press release, what's actually new? Can it be
implemented efficiently in hardware?

~~~
bob1029
Your skepticism is very healthy, especially in this arena. With video codecs,
information theory is ultimately the devil you must answer to at the end of
the day. No amount of patents, specifications or algorithmic fantasy can get
you away from fundamental constraints.

It seems like the major trade-off being made right now is along the lines of
using more memory to buffer additional frames. This can help you in certain
scenarios, but in the general case you can never guarantee that a prior frame
of video has any bearing on future frames. It is just exceedingly likely that
most frames of video look much like prior frames. So, you can certainly play
this game to a point, but you will quickly find yourself on the other end of
the bell curve.

You can also play games with ML, but I argue that you are going even further
from the fundamental "truth" of your source data with this kind of technique,
even if it appears to be a better aesthetic result in isolation of any other
concern.

There are also lots of one-off edge cases that have always been impossible to
address with any interframe video compression scheme. Just look at the slow-mo
guys on YouTube dumping confetti on a 4K camera. Nothing except the dumbest
intraframe techniques (e.g. JPEG) can faithfully reproduce scenes with
information this dense, and even then only at the expense of dramatic bandwidth
increases.
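
A rough numpy sketch of why interframe prediction pays off on typical content
and collapses on confetti-like scenes (synthetic frames, just to show the
effect). The encoder only has to code the residual against its prediction; that
residual is tiny when consecutive frames are similar and as large as a raw
frame when they are not:

```python
# Sketch: interframe prediction wins on similar frames, loses on noise-like ones.
import numpy as np

h, w = 480, 640
prev = np.random.randint(0, 256, (h, w)).astype(np.float64)

# "Typical" next frame: almost identical to the previous one.
typical = prev + np.random.normal(0, 2, (h, w))
# "Confetti" next frame: essentially uncorrelated with the previous one.
confetti = np.random.randint(0, 256, (h, w)).astype(np.float64)

for name, frame in [("typical", typical), ("confetti", confetti)]:
    residual = frame - prev
    # Residual spread is a crude stand-in for how many bits the residual needs.
    print(name, "residual std:", round(float(residual.std()), 1))
# typical  -> tiny residual, cheap to code as a difference
# confetti -> residual as big as a raw frame, prediction buys nothing
```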

Bandwidth is cheap and ubiquitous. I say we just use the algorithms that are
the fastest and most efficient for our devices. We aren't in 2010 sucking 3G
or EDGE through a straw anymore. Most people can get 20+ Mbps on their
smartphones in decently populated areas.

~~~
freeone3000
The advantage of the H-series of codecs is strong support of hardware
implementation. This has been a selling point since H.262. You can get an H.265
IP core from Xilinx, Intel, and other major vendors -- so the actual runtime
cost of H.266 (once a core is available) will be very low and constant
(and comparable to current codecs). Bandwidth and storage space are real
costs, despite the handwaving around it, and reducing these requirements while
not reducing visual quality is an important step.

As for "information-dense scenes": Pathological cases such as the HBO intro
screen are encoded by modern codecs as noise and regenerated client-side,
because there's no actual information there. These scenes are either
engineered or pure noise.

------
cagenut
that sounds great, but this is a press release with no real technical details.
can anyone in the know add some context? for instance, whats the tradeoff? I
assume more CPU?

webrtc based video chats are all still using h264, did they not adopt 265 yet
for technical or licensing reasons? what is the likelihood of broad browser
support for h266 anytime soon?

~~~
TheRealSteel
H.265/HEVC takes about ten times as much computation to encode as H.264 [1],
so H.264 still has legitimate technical use cases, even with licensing/patents
aside.

This makes it great for a company like Netflix or YouTube, but less good for
one-to-one and/or battery sensitive use cases like video calls. However,
specialized chips help, and some mobile devices can record in HEVC in real
time (mine from 2019 can). I believe current smartphones have HEVC encoding
hardware, but I'm struggling to find a source for that right now.

I haven't seen the details of this new codec yet, but it's quite possible it
also has a large encoding cost which will make it better suited to particular
use cases, as opposed to a blanket upgrade.

[1] [http://www.praim.com/en/news/advanced-
protocols-h264-vs-h265...](http://www.praim.com/en/news/advanced-
protocols-h264-vs-h265/)

~~~
becauseiam
iPhone 7 onwards[1], Qualcomm Snapdragon 610 onwards[2], and Intel Skylake and
later CPUs[3] can all encode and decode H.265 in hardware to varying profile
levels.

1: [https://support.apple.com/en-gb/HT207022](https://support.apple.com/en-
gb/HT207022)

2:
[https://www.qualcomm.com/snapdragon/processors/comparison](https://www.qualcomm.com/snapdragon/processors/comparison)

3:
[https://trac.ffmpeg.org/wiki/Hardware/QuickSync](https://trac.ffmpeg.org/wiki/Hardware/QuickSync)

~~~
jeffbee
QuickSync is actually a feature of Intel's integrated GPU. Parts without GPUs,
no matter how recent, don't have QuickSync.

------
donatj
Huh. I wonder how encoding speeds compare. I rarely chose h265 over h264
because similar levels of visual quality took massively more time.

~~~
ctdeneen
But for most situations you encode once and play multiple times. Wouldn't it be
better to reduce storage and bandwidth costs with a smaller file (assuming the
same quality)?

~~~
donatj
It's a trade-off. When I have a batch of 40 videos, and encoding h264 takes 20
minutes per video while h265 takes 4 hours, that means the difference between
13 days and 160 days.

The latter isn't practical, I'll eat the couple hundred MB in order to save a
lot of time.

~~~
waterhouse
I think by "13 days and 160 days" you mean "13 hours and 160 hours".
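
With the corrected units, the arithmetic works out to:

```python
videos = 40
h264_hours = videos * 20 / 60  # 20 minutes per video -> ~13.3 hours total
h265_hours = videos * 4        # 4 hours per video    -> 160 hours total
print(round(h264_hours, 1), h265_hours)
```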

------
TekMol
At some point the compressed version of "Joker" will be 45 chars:

"Sequel to Dark Night starring Joaquin Phoenix"

Of course we will not have to film movies in the first place then. We will
just put a description into a compressor and start watching.

~~~
prvc
8 MiB Shrek is kind of an AV1 meme at this point.

------
baybal2
H.265 is still not mainstream, and not used to the full extent of its
performance.

I'm not sure if 265 is worth spending effort on now when 266 is about to crash
the party and will, at best, be adopted "equally poorly".

~~~
crazygringo
H.265 seems pretty mainstream by now. Older devices obviously don't support it
in hardware, but pretty much all newer ones seem to, no?

It's just a slow percolation throughout the ecosystem as people buy new
hardware and video servers selectively send the next-generation streams to
those users.

The effort on h.265 has already been spent. Now it looks like h.266 is the
next generation. It's going to be years before chips for it will be in
devices. That's just how each new generation works.

------
0-_-0
Question is, how does it compare to AV1?

~~~
miclill
I guess only time will tell.

AV1 is supposed to be 30% better than HEVC and they claim H.266 is 50% better
than HEVC. This would mean that H.266 is roughly 30% better than AV1. By
better I'm always referring to the bandwidth/space needed.

But take this with more than a grain of salt, since bandwidth/space is only one
of many things that matter, and these comparisons also depend on many factors
like resolution, material (animation vs. real footage), etc.

~~~
mda
I don't think your math adds up. Is 150 30% better than 130? It is only 14%
better.

Regardless, these early performance claims are most likely complete bullshit.

~~~
occamrazor
It’s about size: 50 is about 30% better than 70, which is 30% better than 100.

------
prvc
It is mildly amusing that the very simple vector art "VVC" logo on their
webpage is displayed by sending the viewer a 711 KB .jpg file.

------
pxf
Will it be used? It's probably the last one that does not use some sort of AI
compression. See this for image compression:
[https://hific.github.io/](https://hific.github.io/). In the next 10 years AI
compression will be everywhere. The problem will be standardisation. Classic
compression algorithms can't beat AI ones.

~~~
crazygringo
AI compression is super, super cool... but while standardization is certainly
a major issue, isn't the model size a much larger one?

Given that model sizes for decoding seem like they'll be on the order of many
gigabytes, it will be impossible to run AI decompression in software, but will
need chips, and chips that are a lot more complex (expensive?) than today's.

I think AI compression has a good chance of coming eventually, but in 10 years
it will still be in research labs. There is absolutely no way it will have
made it into consumer chips by then.

~~~
pxf
"Isn't the model size a much larger one?" Yep. It will probably be different,
and systems will have to download the weights and network model as new models
come in. I don't think we will have a fixed model with fixed weights; the
evolution is too fast. Decoding will take place using the AI chip on the
device, aka the "AI accelerator".

------
znpy
I wonder how small one of those 700MB DivX/XviD movies would be if compressed
with this new encoding method.

------
crazygringo
They talk about saving 50% of bits over h.265, but also talk about it being
designed especially for 4K/8K video.

Are normal 1080p videos going to see this fabled 50% savings over h.265? Or is
the 50% _only_ for 4K/8K, while 1080p gets maybe only 10-20% savings?

The press release unfortunately seems rather ambiguous about this.

------
robomartin
Among other things, I have worked with and developed technology in the
uncompressed professional imaging domain for decades. One of the things I
always watch out for is precisely the terminology and language used in this
release:

"for equal perceptual quality"

Put a different way: We can fool your eyes/brain into thinking you are looking
at the same images.

For most consumer use cases where the objective is to view images --rather
than process them-- this is fine. The human vision system (HVS, eyes + brain
processing) is tolerant of and can handle lots of missing or distorted data.
However, the minute you get into having to process the images in hardware or
software things can change radically.

Take, as an example, color sub-sampling. You start with a camera with three
distinct sensors. Each sensor has a full frame color filter. They are
optically coupled to see the same image through a prism. This means you sample
the red, green and blue portions of the visible spectrum at full spatial
resolution. If we are talking about a 1K x 1K image, you are capturing one
million pixels of each, red, green and blue.

BTW, I am using "1K" to mean one thousand, not 1024.

Such a camera is very expensive and impractical for consumer applications.
Enter the Bayer filter [0].

You can now use a single sensor to capture all three color components.
However, instead of having one million samples for each component you have
250K red, 500K green and 250K blue. Still a million samples total (that's the
resolution of the sensor) yet you've sliced it up into three components.
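
A small numpy sketch of those sample counts for a hypothetical 1000x1000 sensor
behind an RGGB Bayer mosaic:

```python
# Sketch: per-color sample counts on a 1000x1000 sensor with an RGGB Bayer mosaic.
import numpy as np

h, w = 1000, 1000
tile = np.array([["R", "G"],
                 ["G", "B"]])               # the repeating 2x2 RGGB pattern
mosaic = np.tile(tile, (h // 2, w // 2))    # which color each photosite samples

for color in "RGB":
    print(color, np.count_nonzero(mosaic == color))
# R 250000, G 500000, B 250000: one million photosites in total, but each color
# is sampled at only a quarter (R, B) or half (G) of the spatial positions.
```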

This can be reconstructed into the full one million samples per color component
through various techniques, one of them being the use of polyphase FIR (Finite
Impulse Response) filters looking across a range of samples. Generally
speaking, the wider the filter the better the results, however, you'll always
have issues around the edges of the image. There are also more sophisticated
solutions that apply FIR filters diagonally as well as temporally (use
multiple frames).

You are essentially trying to reconstruct the original image by guessing or
calculating the missing samples. By doing so you introduce spatial (and even
temporal) frequency domain issues that would not have been present in the case
of a fully sampled (3 sensor) capture system.

In a typical transmission chain the reconstructed RGB data is eventually
encoded into the YCbCr color space [1]. I think of this as the first step in
the perceptual "let's see what we can get away with" encoding process. YCbCr
is about what the HVS sees. "Y" is the "luma", or intensity component. "Cb"
and "Cr" are color difference samples for blue and red.

However, it doesn't stop there. The next step is to, again, subsample some of
it in order to reduce data for encoding, compression, storage and
transmission. This is where you get into the concept of chroma subsampling [2]
and terminology such as 4:4:4, 4:2:2, etc.

Here, again, we reduce data by throwing away (not quite) color information. It
turns out your brain can deal with irregularities in color far more so than in
the luma, or intensity, portion of an image. And so, "4:4:4" means we take
every sample of the YCbCr encoded image, while "4:2:2" means we cut down Cb
and Cr in half.
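
A minimal numpy sketch of those two steps, using the BT.601/JPEG-style
full-range matrix as an example (real pipelines vary in the matrix, range and
chroma siting used):

```python
# Sketch: RGB -> YCbCr, then 4:2:2-style chroma subsampling.
import numpy as np

def rgb_to_ycbcr(rgb):
    """BT.601/JPEG-style full-range conversion (one of several variants in use)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

rgb = np.random.randint(0, 256, (1000, 1000, 3)).astype(np.float64)  # placeholder image
y, cb, cr = rgb_to_ycbcr(rgb)

# 4:4:4 keeps every chroma sample; 4:2:2 keeps every other one horizontally.
cb_422, cr_422 = cb[:, ::2], cr[:, ::2]
print(y.size, cb_422.size + cr_422.size)  # luma kept in full, chroma cut in half
```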

There's an additional step which encodes the image in a nonlinear fashion,
which, again, is a perceptual trick. This is where Y' (Y prime), the
gamma-encoded "luma", is distinguished from linear-light "luminance". It turns
out that your HVS is far more
sensitive to minute detail in the low-lights (the darker portions of the
image, say, from 50% down to black) than in the highlights. You can have
massive errors in the highlights and your HVS just won't see them,
particularly if things are blended through wide FIR filters during display.
[3]

Throughout this chain of optical and mathematical wrangling you are highly
dependent on the accuracy of each step in the process. How much distortion is
introduced depends on a range of factors, not the least of which is the way
math is done in software or chips that touch every single sample's data. With
so much math in the processing chain you have to be extremely careful about
not introducing errors by truncation or rounding.

We then introduce compression algorithms. In the case of motion video they
will typically compress a reference frame as a still and then encode the
difference with respect to that frame for subsequent frames. They divide an
image into blocks of pixels and then spatially process these blocks to develop
a dictionary of blocks to store, transmit, etc.

The key technology in compression is the Discrete Cosine Transform (DCT) [4].
This bit of math transforms the image from the spatial domain to the frequency
domain. Once again, we are trying to trick the eye. Reduce information the HVS
might not perceive. We are not as sensitive to detail, which means it's safe
to remove some detail. That's what DCT is about.
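
A minimal numpy sketch of that idea on a single 8x8 block (real codecs use
integer transform approximations and carefully tuned quantization matrices
rather than a hard threshold):

```python
# Sketch: 8x8 DCT of a smooth image block, drop small high-frequency
# coefficients, then invert and see how little the block changes.
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II matrix: C[f, n] = scale * cos(pi * (2n + 1) * f / (2N)).
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

x = np.arange(N, dtype=np.float64)
block = 50.0 + 10.0 * x[None, :] + 5.0 * x[:, None]  # a smooth ramp, like most natural blocks

coeffs = C @ block @ C.T            # forward 2-D DCT
coeffs[np.abs(coeffs) < 5.0] = 0.0  # crude "quantization": discard small coefficients
restored = C.T @ coeffs @ C         # inverse 2-D DCT

print("kept", np.count_nonzero(coeffs), "of", N * N, "coefficients,",
      "max error", round(float(np.max(np.abs(block - restored))), 2))
```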

So, we started with a 3 sensor full-sampling camera, reduced it to a single
sensor and threw away 75% of red samples, 50% of green samples and 75% of blue
samples. We then reconstruct the full RGB data mathematically, perceptually
encode it to YCbCr, apply gamma encoding if necessary, apply DCT to reduce
high frequency information based on agreed-upon perceptual thresholds and then
store and transmit the final result. For display on an RGB display we reverse
the process. Errors are introduced every step of the way, the hope and
objective being to trick the HVS into seeing an acceptable image.

All of this is great for watching a movie or a TikTok video. However, when you
work in machine vision or any domain that requires high quality image data,
the issues with the processing chain presented above can introduce problems
with consequences ranging from the introduction of errors (Was that a truck in
front of our self driving car or something else?) to making it impossible to
make valid use of the images (Is that a tumor or healthy tissue?).

While H.266 sounds fantastic for TikTok or Netflix, I fear that the constant
effort to find creative ways to trick the HVS might introduce issues in
machine vision, machine learning and AI that most in the field will not
realize. Unless someone has a reasonable depth of expertise in imaging they
might very well assume the technology they are using is perfectly adequate for
the task. Imagine developing a training data set consisting of millions of
images without understanding the images have "processing damage" because of
the way they were acquired and processed before they even saw their first
learning algorithm.

Having worked in this field for quite some time --not many people take a 20x
magnifying lens to pixels on a display to see what the processing is doing to
the image-- I am concerned about the divergence between HVS trickery, which,
again, is fine for TikTok and Netflix, and MV/ML/AI. A while ago there was a
discussion on HN about ML misclassification of people of color. While I
haven't looked into this in detail, I am convinced, based on experience, that
the numerical HVS trickery I describe above has something to do with this
problem. If you train models with distorted data you have to expect errors in
classification. As they say, garbage-in, garbage-out.

Nothing wrong with H.266, it sounds fantastic. However, I think MV/ML/AI
practitioners need to be deeply aware of what data they are working with and
how it got to their neural network. It is for this reason that we've avoided
using off-the-shelf image processing chips to the extent possible. When you
use an FPGA to process images with your own processing chain you are in
control of what happens to every single pixel's data and, more importantly,
you can qualify and quantify any errors that might be introduced in the chain.

[0]
[https://en.wikipedia.org/wiki/Bayer_filter](https://en.wikipedia.org/wiki/Bayer_filter)

[1] [https://en.wikipedia.org/wiki/YCbCr](https://en.wikipedia.org/wiki/YCbCr)

[2]
[https://en.wikipedia.org/wiki/Chroma_subsampling](https://en.wikipedia.org/wiki/Chroma_subsampling)

[3]
[https://en.wikipedia.org/wiki/Gamma_correction](https://en.wikipedia.org/wiki/Gamma_correction)

[4]
[https://www.youtube.com/watch?v=P7abyWT4dss](https://www.youtube.com/watch?v=P7abyWT4dss)

~~~
somethingsome
Don't you think that the iterated convolution process in neural networks is, to
some degree, able to look past this kind of 'visual trickery'? I can imagine
that the network is not able to perform well if you change the color profile of
the input after training on another one, but small texture attenuations,
diminished chroma components, etc. may not be as important when the image is
downsampled and split a huge number of times (wondering)

~~~
robomartin
Here's one way to look at it: Contrast and well defined edges can be important
in feature extraction. Our vision system, on the other hand, can do just fine
with less information in the high frequencies (where edges live).

~~~
somethingsome
I see your point, and I'm not particularly defending neural networks, but IMO
nothing prevents a network from generating a kernel able to detect 'fuzzy
edges' and refining it into an edge after some convolutions. So, if the input
images are always consistent with each other and with the images used at
inference time, I think the problem may be diminished (?), even if, as you say,
we introduce some misclassification error. Obviously, guaranteeing that all the
input images are generated in the same way is a very strong condition that is
difficult to achieve.

~~~
robomartin
From my perspective, the only way to get there is if AI practitioners make a
paradigm shift towards encoding understanding rather than making classifier
systems trained with massive data sets. The classification approach has a very
real asymptotic limit on what can be achieved. You can train NN's using large
data sets on some domains but not all domains. Just think about what a dog can
do, even just a puppy. We are nowhere near to that. Not even close. This is
because our AI classifies without understanding.

I have books on AI that are thirty years old. I think I can say they cover
somewhere between 80% and 90% (if not more) of what AI is today. The
difference is computing that is thousands, millions, of times faster, massive
amounts of storage, etc. In other words, one could very well argue we haven't
done much in 30 years other than build faster computers.

~~~
somethingsome
I don't think that neural networks are the right framework to achieve general
purpose artificial intelligence (AGI). And indeed, the AI field may need a
paradigm shift to achieve higher classification goals. I believe that
probabilistic neural networks may be an interesting extension toward general
purpose AI, even though this kind of network needs even more data.

If we take the example of a puppy, it seems to generalize pretty well using
something like one-shot learning, but is it? I cannot confirm for sure how
much data a puppy has already digested before being able to do what we could
call "one shot learning". So maybe, the exposure to data is already there,
waiting for a specialization toward a particular task.

Giving a network the ability to be probabilistic enables it to do inference
under uncertainty, which is clearly a neat feature when you are gravitating
toward AGI for scene understanding.

In the case of video compression, scene understanding may introduce more
artifacts IMO: Even if the scene is captured with high end cameras, on a pixel
level basis, the edges will never be perfectly neat. I think this will
decrease the ability of any network to "understand" which object is at the
edges, this results in low classification rates on them, resulting in bad
compression/decompression quality (?) for features that are important to the
human eye.

All in all, I'm not sure that NNs are the right tool for this kind of problem.
But we are diverging from the main subject, VVC. Thanks for the very
interesting comments :)

------
im3w1l
50% is very impressive. It's not just a gold rush of low hanging fruit
anymore; they did real work and created real benefits. I'm willing to pay a
little tax on my devices or software for this.

------
jonpurdy
My 2012 Mac Mini has quickly become much less useful since YouTube switched
from H.264 (AVC) to VP9 for videos larger than 1080p a couple of years ago
(Apple devices have hardware decoders). I've tested 4K h.264 videos and they
play wonderfully thanks to the hardware.

My internet connection speeds and hard drive space have increased much faster
than my CPU speeds (internet being basically a free upgrade).

So I don't appreciate new codecs coming out and obsoleting my hardware to save
companies a few cents on bandwidth. H.264 got a good run in, but there isn't a
"universal" replacement for it where I can buy hardware with decoding support
that will work for at least 5-10 years.

~~~
crazygringo
Honestly, the expectation that a 2012 computer will play 4K video seems a
little unreasonable, no? 4K video virtually didn't even exist back then. I'm
actually amazed it even handles it in h.264.

This isn't about saving companies a few cents on bandwidth. It's about
_halving_ internet traffic, about doubling the number of videos you can store
on your phone. That's pretty huge. You can still get h.264 video in 1080p on
YouTube so your computer is still meeting the expectations it was manufactured
for.

~~~
jonpurdy
It's not so much about it being able to handle 4K, but Youtube already has too
low of a bitrate for 1080p (resulting in MPEG artifacts, color banding, etc).
So I like to watch in 2.7K or 4K downsampled, since at least I get a higher
bitrate.

The bigger problem that I didn't mention is with videoconferencing: FaceTime
is hardware accelerated and has no issues with 720p, but anything WebRTC seems
to prefer VP8 or VP9 codecs, which fails on my Mini and strains my 2015 MBP.
Feels like a waste of perfectly good hardware to me.

------
superkuh
I'd rather have slightly larger files that don't require hardware acceleration
(only available on modern CPUs) to decode without dying (i.e., h264). Streaming is
creating incentives for bad video codecs that only do one thing well: stream.
Other aspects are neglected.

And it's not like any actual 4K content (besides porn, real, nature, or
otherwise) exists. Broadcast and movie media is done in 2K then upscaled to
"4K" for streaming services.

~~~
crazygringo
Huh? TV and movies are widely shot with 4K cameras these days.

What is 2K? I've never even heard of a "2K" camera. Where did you get the idea
things are being filmed in "2K" and being scaled to 4K?

Genuinely curious where you're getting this information from. Or are you
confused because 1080p refers to the vertical resolution while 4K refers to
the horizontal resolution?

~~~
superkuh
[https://www.engadget.com/2019-06-19-upscaled-
uhd-4k-digital-...](https://www.engadget.com/2019-06-19-upscaled-
uhd-4k-digital-intermediate-explainer.html) is one easily found example but it
wasn't where I had read it. I'm pretty sure I've seen it on HN itself.

 _edit_ : here's another
[https://old.reddit.com/r/cordcutters/comments/9x3v4e/just_le...](https://old.reddit.com/r/cordcutters/comments/9x3v4e/just_learned_most_4k_content_is_actually_2k/)

~~~
crazygringo
OK, so by 2K you mean 1080p. That's a very unusual nomenclature but I see what
you mean, thanks.

The top link in the reddit thread disproves what you're saying though:

[https://4kmedia.org/real-or-fake-4k/](https://4kmedia.org/real-or-fake-4k/)

Somewhere between a third and a half of films are listed as "real 4K".

So there is actually _tons_ of real 4K content. (And the list is just films --
there are plenty of streaming TV shows in real 4K too, like Mrs Maisel.)

There might be another reason for the misperception -- it's true that film
editing is generally done in something lower-quality like compressed 1080p,
but that's just for speed/space while you work. All the clips "point" to the
4K originals, so when the final master is produced, it's still produced out of
that "real" 4K.

~~~
fomine3
BTW: I really dislike calling horizontal 2660px "2K" resolution. It's closer to
3K than to 2K; it should be called 2.5K.

------
liquid153
Will devices need new hardware? Also, I thought companies were all on board
with royalty-free VP9.

------
mrfusion
So how does it achieve this compression from a laypersons perspective?

------
qwerty456127
How many weeks does it take to encode a 1-minute video on an average (non-
gaming, I mean without a huge fancy GPU card or an i9/Threadripper CPU) PC?

------
irrational
Anyone know how this compares to AV1?

------
m3kw9
It will be first adopted by pirates for sure

~~~
syshum
H264 seems to still be the preferred codec in this space, even though H265
produces smaller files.

That's largely due to the CPU overhead of H265, though I am not sure why more
people do not use GPU encoding over CPU encoding; I have never been able to
notice the difference visually.

~~~
ctdeneen
Nvidia's NVENC doesn't support CRF, one of the more popular methods of rate
control during encoding.

~~~
gsich
NVENC is too low quality for the scene.

------
xvilka
Due to its patenting, Fraunhofer has probably done more harm to humanity than
good. At least its software division has.

------
shmerl
Another patent encumbered monstrosity? No, thanks. Enough of this junk. Some
just never learn.

