
Falsehoods programmers believe about video - pomfpomfpomf3
https://haasn.xyz/posts/2016-12-25-falsehoods-programmers-believe-about-%5Bvideo-stuff%5D.html
======
derefr
> rendering subtitles at the output resolution is better than rendering them
> at the video resolution

I would like to know what's wrong with this approach. I watch a lot of
commentated speed-run videos: that's often something like ~244p video, plus
soft subtitles. The subtitles get rendered at the source resolution
(presumably, into the video framebuffer) and then upscaled along with the
image, forcing them to be a tiny blurry mess instead of the crisp, readable
text they could be.

~~~
CoolGuySteve
It's also missing the most common error I see: conflating subtitles with
closed captions.

Closed captions are positioned on the screen to indicate who's talking, have
descriptive audio for sound effects, and should be in a high-contrast,
easy-to-read font (most people with hearing deficiencies also have problems
seeing, i.e. out-of-date prescriptions for both hearing aids and eyeglasses).

As far as I know, QuickTime does it right but the Apple TV, Netflix, and
YouTube fuck it up; then again, I helped write the QuickTime one way back.

~~~
deadmutex
AFAIK, the YouTube implementation does all of those.

Here is a demo: [https://www.youtube.com/watch?v=BbqPe-IceP4](https://www.youtube.com/watch?v=BbqPe-IceP4)

Please do not spread falsehoods.

Disclaimer: I work at YouTube.

~~~
deoxxa
[http://take.ms/fwwhx](http://take.ms/fwwhx)

That is frustratingly poor contrast.

~~~
deadmutex
This might be because you changed your settings for CC in the past.

Here is what mine look like
[http://imgur.com/HLIVXQ6](http://imgur.com/HLIVXQ6)

You can click settings again to change font sizes, font family, colors, etc.

~~~
smnscu
I have to change my CC settings every damn couple of days because they revert
to the same white-on-white 400% bullshit. So thanks for that.

~~~
deadmutex
I am not sure, but the screenshot I submitted seems to be the default
settings.

However, software that is 100% perfect is pretty much impossible to write, and
if you think there's a systematic issue, please file a bug so it can help
others in the same situation.

~~~
DonHopkins
I am absolutely sure I never changed my settings, and that they have changed
over time without me doing anything. Why would anyone want 50% font size at
25% opacity on a yellow then green background? I didn't ask for that. Yes,
this is most definitely a systematic issue.

The bug reports I've submitted to google have been ignored, and that's a
frustrating distraction from what I'm paid to do. Maybe if you submit one
yourself, somebody will pay attention, because google is paying you to work on
youtube, and hopefully they will take you more seriously than their users.

~~~
saurik
> Maybe if you submit one yourself, somebody will pay attention, because
> google is paying you to work on youtube, and hopefully they will take you
> more seriously than their users.

FWIW, when "file a bug report" is not used to mean "I need more detail" but to
mean "talk to the hand", and is spoken by someone working close to a project, I
always read it as "fuck off", particularly if the person never even bothered to
determine whether or not you've used their bug tracker in the past (or even
filed a bug already for this specific issue).

When I find someone on a forum with a bug that I haven't heard about, I sit
around and talk to them until they either get tired of wanting to talk to me
or I get the information I need to fix the problem. The alternative would
essentially translate to "I don't actually care about this bug", as that's the
only way you are going to get certain classes of bug report. I have shown
people at Apple bugs that they were absolutely fascinated by momentarily and
then told "File a Radar", which I clearly wasn't in a position to do at that
moment and which, of course, I forgot to do when I got home... they should know this
happens, because this assuredly happens to almost every single person they
tell that to (and no, "well, we do see a large number of bugs filed" is not
evidence against "people you tell to file a bug using your arcane system,
particularly if they have to do it days later, probably won't"), and yet even
when a potentially rare and real and critical bug is shown to them in person
(this was even at an event where the whole point was to work with customers on
their issues), their response is essentially "engh, I don't care if this
doesn't work unless it affects a ton of people". As someone who works in
security, I'm going to assert "do you want vulnerabilities? because this
attitude is how you get vulnerabilities": every bug is precious as it is a
mistake in your mental model of the software, and who knows how far down the
rabbit hole that mistake will take you.

Sure: I realize that the engineer isn't always the best person to do this, and
even in my tiny company I had to solve that, but the solution isn't to tell
people to "go use the bug tracker", a comment which shunts onto them the
annoying work of learning a new system, one which is all too likely to
demoralize them (Apple's
Radar is a great example of this), but instead to have someone whose job is to
talk to people to follow up with credible bugs: I'd go "hey Xyz, there's a guy
on this forum who's complaining about something I hadn't heard of before; can
you try to get more details from them?" (where Xyz has changed over the years,
but has always been one of the few key positions). I couldn't begin to count
the number of times I have debugged an issue with someone on reddit.

~~~
DonHopkins
The rule of thumb I used was: if Google is paying someone enough to go online
and post defenses of youtube like "Please do not spread falsehoods", and tell
me things I already know like "software that is 100% perfect is pretty much
impossible to write," then part of his paid job should also be filing bug
reports using the bug tracking system he uses every day, and probably already
has an account logged in and a tab opened on, when the falsehoods turn out to
be true.

If YouTube were open source, and I could look at the source code of the
keyboard handler to find the cause of the problem myself, prove my bad
experience was not just a falsehood to be brushed off, and possibly even
suggest a fix, then maybe I would have been more motivated to put my own time
into filing a bug report.

But Google is a huge, well-funded advertising company that paid billions of
dollars for YouTube and makes billions of dollars off of it, has a huge
complex system set up for digital rights management, promoting and paying for
advertisements, enabling copyright holders to report violations, paying many
employees for actively pursuing and resolving those copyright violations,
removing inappropriate content, hiring conservative lobbyists and sending
executives to kiss Donald Trump's ring [1], etc.

So I would expect YouTube employees to put at least as much time and effort
into reporting bugs about their own product to their employer, as they put
into monetizing YouTube while defending its reputation from people they
perceive as spreading falsehoods about it.

[1] [http://www.reuters.com/article/us-usa-trump-google-idUSKBN1430G6](http://www.reuters.com/article/us-usa-trump-google-idUSKBN1430G6)

~~~
deadmutex
I am not getting paid to do this, and actually am taking some time to be with
the family for holidays. I just saw the original comment saying that YT
"fuck(s) it up" when it comes to CC, and it looked OK to me. So, I just wanted
to share my results so that people do not assume "it's screwed up everywhere".
I do know that the engineers I have met at work really want to try to do the
right thing for users.

I don't have access to file a bug through a work account for the next few
days, and if you come across any issues (like CC broken by default), please
file a bug with a lot of details. People do look at that stuff. I am glad that
you found the source of the issue, and I hope you can agree that it would've
been impossible to find it if I had just filed a bug. I do not work anywhere
close to the team that implemented the CC, and when people have said "file a
bug" to me in a work context, they have meant it as a way to "let's keep track
of this so it's not forgotten". Luckily, the people I have met at work have
been good about this. I do not speak for Google or anyone else there, just
sharing my own personal experience.

~~~
DonHopkins
Thanks for responding. I'll submit a bug if I can, now that I know the cause
of the problem. But I need to know where best to submit it.

It's a design and documentation bug, that needs to be addressed at a higher
level by re-evaluating the decisions and justifications behind all the
keyboard accelerators, removing the ones that nobody actually uses and that
cause more problems than they solve (like making closed captioned text
transparent and changing its colors), implementing full and immediate "?"
keyboard help, and writing some online documentation.

So should I simply click "send feedback" on any random youtube video and write
up my suggestions, as this page tells me to? [1] I've done that now, so let's
see what happens.

Do you really sincerely think my suggestion will actually make it back to the
designers through that channel and that changes will happen as a result? Is
there a way for me to track it?

Or is there a better accountable bug tracking system that I can actually
submit a real trackable bug into and watch the progress and see if it gets
marked "will not fix", like
[https://bugs.chromium.org](https://bugs.chromium.org) but for youtube? Do you
have access to a better bug tracking system for youtube that's not public?

[1]
[https://support.google.com/youtube/answer/4347644?hl=en](https://support.google.com/youtube/answer/4347644?hl=en)

------
franciscop
The original one ( [http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) )
left me baffled. Then I realized you have to strike a balance; otherwise you
cannot deal with names at all. Where you draw the line depends on your
industry/customers, but I'd safely say it's too restrictive nowadays, so these
lists are somewhat useful and, of course, interesting.

~~~
kibwen
It's true that you have to draw a line somewhere based on technical and
business constraints, but an important takeaway of the names article is that
you almost certainly don't need to do anything with a name other than treat it
as an opaque string that can be displayed back to the user. For example, I'm
struggling to think of a good reason why user registration would require
separate first name and last name fields, and yet this practice is
overwhelmingly common. For that matter, why do you want my real name at all,
considering that it can't be used as a unique ID anyway?

~~~
smallnamespace
But sometimes your government expects names to be broken into given and family
names (the US, Japan, and China all seem to make this assumption -- every
government form I've seen from those countries wants your name fully broken
out).

I've no direct experience with, say, Russian or Latin American governments,
but cultures that use explicit patronymic or matronymic names might expect
that broken out as well.

If you ever need to submit user data to the government (e.g. for tax reasons),
and you don't ask your user to break the name apart, then you will necessarily
be guessing, which seems strictly worse than just asking them how their name
might split.

At the end of the day, if you operate in a given culture, then you need to
address those cultural norms. Bending over backwards to support every possible
edge case seems unwise if they also happen to disagree with those norms.

~~~
kibwen
I'm explicitly disregarding government websites here, since the government has
a legitimate reason to care about my full name (and gets to define what a
"legal" name means), and also 99% of the websites that I sign up for are
unrelated to the government. There's no reason for a social network to report
the names of all its users to a government agency automatically, and I'm
skeptical that even my bank would have such a requirement.

~~~
smallnamespace
I don't mean reporting all the names, but for example if you ever transmit
payments there is a KYC process, and if you are a bank you must report any
suspicious money laundering activity.

At the end of the day there are cultural conventions around names, and various
agencies use them. I don't see why software should be explicitly culturally
neutral, unless your audience is explicitly a global one (and even then, I
think localization is preferable to just sticking names into a single field).

~~~
Thrillington
There is significant engineering cost to implement localization correctly. If
you're a bank or in another market where you interact with a government, then
you must bear those costs. If not, why not just use an opaque string and spend
your engineering dollars on your actual product?

------
nhaehnle
This is a good list, but it would be so much better with some (brief) pointers
to counter-examples to the beliefs.

~~~
unscaled
This unfortunately follows the conventions of the genre called "Falsehoods
programmers believe about X": [http://spaceninja.com/2015/12/08/falsehoods-programmers-believe/](http://spaceninja.com/2015/12/08/falsehoods-programmers-believe/)

I honestly think this genre is horrible and counterproductive, even though the
writer's intentions are good. It gives no examples, no explanations, no
guidelines for proper implementations - just a list of condescending gotchas,
showing off the superior intellect and perception of the author.

~~~
inopinatus
Perhaps there is scope for a list of Falsehoods Programmers Believe About
Falsehoods Programmers Believe.

~~~
mojuba
Let's start then:

1. Everything said in every "Falsehoods Programmers Believe..." list is true.

The Falsehoods sound like ultimate truths only because of the literary genre.
They sound like they were written by an expert who not only knows what's true,
but also knows what we think we know, which kind of automatically takes
him/her to the next level of expertise.

~~~
unscaled
3. Every falsehood that is true should be accounted for.

4. Every falsehood that is true CAN be accounted for.

5. Making your code compatible with a falsehood doesn't come with a price.

6. There are no falsehoods which are mutually exclusive.

~~~
pwdisswordfish
> Every falsehood that is true

Hmm.

------
donatj
- "all subtitle files are UTF-8 encoded"

Hah, this strikes really close to home. I've had to work with so, so many
subtitle files in Eastern European and Turkish Windows codepages, mostly but
not entirely compatible with Win-1252. There's no way to tell them apart
programmatically, so you check that the extended characters make sense. It's a
bit of a nightmare.
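
A toy sketch of that check in Python (the candidate list and scoring heuristic are illustrative; real-world detection, e.g. with a library like chardet, uses language statistics rather than a simple letter count):

```python
# Toy heuristic: try each candidate codepage and keep the one whose
# extended (non-ASCII) bytes decode into plausible letters.
CANDIDATES = ["cp1252", "cp1250", "cp1254"]  # Western, Central European, Turkish

def guess_codepage(data: bytes) -> str:
    best, best_score = CANDIDATES[0], -1.0
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # hit a byte that is undefined in this codepage
        extended = [c for c in text if ord(c) > 127]
        if not extended:
            return enc  # pure ASCII decodes the same everywhere
        score = sum(c.isalpha() for c in extended) / len(extended)
        if score > best_score:
            best, best_score = enc, score
    return best

print(guess_codepage("Ťažký deň".encode("cp1250")))  # cp1250 (0x8D is undefined in cp1252/cp1254)
```

As the comment says, this can't be fully reliable: many byte values decode to *some* letter in every candidate codepage, so "makes sense" ultimately has to mean "makes sense in a plausible language".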

------
justinlaster
> a H.264 hardware decoder can decode all H.264 files

and

> video decoding is easily parallelizable

At a previous job, I don't know if it was just the field I was in or just bad
luck, but having to explain this over and over again was kind of a personal
nightmare.

That being said, this is an excellent list!

~~~
wstrange
Curious - Why is this? Does this assume streaming video, and you can't look
ahead in the stream?

If you can jump ahead, it would seem to be easy to have multiple threads,
starting at key frames to decode the content. You'd have to splice them
together, but this seems possible.
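
As a toy model of that scheme (frame numbers stand in for real decoded frames, and the keyframe positions are made up), the split-and-splice looks like:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of keyframe-parallel decoding: each segment starts at a
# keyframe, so in theory it can be decoded independently.
keyframes = [0, 250, 500, 750]   # hypothetical keyframe positions
total_frames = 1000

def segments(starts, total):
    bounds = starts + [total]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def decode_segment(bounds):
    start, end = bounds
    # Stand-in for real decoding; a real decoder returns pixel data here.
    return list(range(start, end))

with ThreadPoolExecutor() as pool:
    parts = pool.map(decode_segment, segments(keyframes, total_frames))
    decoded = [frame for part in parts for frame in part]

assert decoded == list(range(total_frames))  # spliced back in order
```

The replies below explain why this breaks down in practice: seeking to a keyframe isn't guaranteed to be bit-exact, and each worker has to buffer its uncompressed output somewhere.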

~~~
saurik
1) You are now assuming that "seeking to a position will produce the same
output as decoding to a position"; even if the video is well-formed (and you
don't end up with massive issues where the key frames just don't work
correctly) you are likely going to end up with subtle discontinuities between
every segment. 2) You are now going to have to be buffering a couple seconds
worth of uncompressed video somewhere, probably not on the GPU, leading to a
much higher I/O bandwidth requirement somewhere that isn't good at that, so
this is only probably going to be sort of parallel (FWIW, I believe most
people who try to do parallel video decoding are assuming that they can have
different parts of the encoder concentrate on different sections of the
screen, which sounds good until you see how non-local video decoding can be).

~~~
the8472
> 1) You are now assuming that "seeking to a position will produce the same
> output as decoding to a position"; even if the video is well-formed (and you
> don't end up with massive issues where the key frames just don't work
> correctly) you are likely going to end up with subtle discontinuities
> between every segment.

Wouldn't "the keyframes just don't work correctly" result in corrupted output
anyway?

If we're worrying about already-broken situations then it is quite obvious
that additional breakage may occur in related features.

~~~
msandford
I think the point is that video definitely is that broken and the only reason
video does work is because everyone has work-arounds for everyone else's bugs.
At least that's my experience with video. It's all a disaster.

~~~
pdkl95
I believe[1] this isn't necessarily about broken files. There is a _lot_ of
variation allowed by the spec. One example that I've seen in the wild is
extra-long (> 60 seconds) periods between I-frames. Seeking to an arbitrary
point requires either searching backwards from the seek-point for an I-frame
and decoding _hundreds_ of frames forward, or storing a _massive_ amount of
RAM. As this usually isn't possible, decoders may cheat and make do with as
many P and B frames as they can handle.

[1] I haven't actually read most of the h.265 spec. It's possible these are
technically invalid files.

~~~
the8472
A 1-minute span between I-frames would not be prohibitive for the parallel
processing that the quoted part was referring to; with a 60-minute video it
would still give you 60 segments to process in parallel.

~~~
pedrocr
A single uncompressed frame of 1080p video occupies 28MB in RAM, so 1 minute
of 24fps video will take up 40GB. If you want to be able to run 4 cores at
once it's 3 times that. You won't be doing that any time soon on your laptop
or smartphone.

~~~
gkop
Curious as to your math? My naive thinking is 1920 * 1080 * 8 (generous) bytes
is around 16MB.

~~~
pedrocr
I forgot where I got 28 from but it's indeed a mistake. For normal display you
could get away with 1920 * 1080 * 3 channels * 8 bits ≈ 6MB. For a 10bit
display it would be around 8MB. You do indeed often use 32bit float for
high-quality processing, but since what we're storing here is the output frame
you would finish all that processing and then go down to 8 or 10 bits per
channel. So recalculating the math, that's 8GB for 1 minute of video, still way
too impractical.
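
Checking that arithmetic with a quick back-of-the-envelope script (assuming 3 bytes per pixel, i.e. 8 bits per channel):

```python
# Back-of-the-envelope: memory for uncompressed 1080p video.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3            # 3 channels at 8 bits each
FPS, SECONDS = 24, 60

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
minute_bytes = frame_bytes * FPS * SECONDS

print(f"{frame_bytes / 2**20:.1f} MiB per frame")    # 5.9 MiB per frame
print(f"{minute_bytes / 2**30:.1f} GiB per minute")  # 8.3 GiB per minute
```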

------
smallnamespace
This article would be infinitely better if it provided any counterexamples.

------
iopq
> my hardware contexts will survive the user’s coffee break

hell, they don't survive alt-tabbing into a game that has a different
resolution than the monitor

~~~
pvdebbe
Heh... for some reason youtube can't survive when I start a video on my
monitor and then I switch outputs to TV using an xrandr script by closing one
output and opening the other. I thought it was possible to continue the video
that way but once I noticed it doesn't work, it made sense immediately.

Mplayer and co, on the other hand, can cope with it, but my window manager can
mess it up, so I don't bother.

------
tuxidomasx
This list makes me not want to program any [video stuff]

------
scottlamb
From the article:

> I can exclusively use the video clock for timing

Heh. I just finished writing up a design doc to address problems I had with
this, and I referenced "Falsehoods programmers believe about time". Then I
opened Hacker News and saw this article. So this is very timely for me.

(My doc: [https://github.com/scottlamb/moonfire-nvr/blob/new-schema/design/time.md](https://github.com/scottlamb/moonfire-nvr/blob/new-schema/design/time.md))

------
microcolonel
I don't think programmers believe any of the video decoding falsehoods; not
because they know any better, but because they know they don't know.

Also, none of these unfounded preconceptions make intuitive sense, so I don't
see why people would believe them.

------
imaginenore
> _interlaced video files no longer exist_

Interlaced video files should no longer exist.

Seriously, f**k interlaced video.

> _upscaling algorithms can invent information that doesn’t exist in the
> image_

That's not a falsehood. Upscaling _does_ invent information that doesn't exist
in the image.

~~~
emcq
Perhaps the author was being pedantic, but from an information-theoretic
perspective it is correct that you cannot invent information with upscaling.

The upscaled image does not have more information than what was in the
original image; you can reconstruct the upscaled image given only the
information available in the original image, the output resolution dimensions,
and upscaling algorithm.

~~~
imaginenore
That's like saying fractal images are not information. Just because something
is generated by a formula, doesn't mean it's not new information.

------
jheriko
it is true, video is a nightmare mess littered with weird functionality nobody
needs. (limited range only just disappeared in rec 2100, optionally???
really??? i'm not worried about my electron gun in my CRT from 1975 these
days...nor do i want to know what a Y or a Cb or a Cr means because everything
is RGB and B&W TV is long dead... and 4:2:2 is not exactly compression so much
as computational overhead etc.. etc.)

it's a nightmare, but the reason for these observations is precisely that it
shouldn't be a nightmare. this area of programming is a wasteland ... nobody
that good wants to solve these trivial problems :/

~~~
mrob
Chroma subsampling isn't going anywhere. You'll usually get subjectively
better quality with 4:2:0 chroma compared to 4:4:4 at the same bitrate. And
this means you can't have everything in RGB, so all the colorspace conversion
complexity can't be ignored.

Try experimenting with chroma subsampling in JPGs, but note that not all image
viewers have good chroma upscaling. MPV can display still images as well as
video and you can choose the chroma scaling algorithm.

~~~
haasn
> Chroma subsampling isn't going anywhere. You'll usually get subjectively
> better quality with 4:2:0 chroma compared to 4:4:4 at the same bitrate. And
> this means you can't have everything in RGB, so all the colorspace
> conversion complexity can't be ignored.

What's more, YCbCr is more efficiently compressed than RGB even if you don't
subsample, for the same reason that a DCT saves bits even if you don't
quantize: Linearly dependent or redundant information is moved into fewer
components, in this case most of the information moves into the Y channel with
the Cb and Cr both being very flat in comparison. (Just look at a typical
YCbCr image reinterpreted as grayscale to see what I mean)
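
A quick way to see that decorrelation numerically (full-range BT.601 constants; the synthetic "pixels" just model the fact that R, G, and B are strongly correlated in natural images):

```python
import random
import statistics

def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (one common variant; others differ)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr

random.seed(0)
# Synthetic "natural" pixels: shared brightness plus small per-channel noise.
pixels = [(base + random.gauss(0, 8),
           base + random.gauss(0, 8),
           base + random.gauss(0, 8))
          for base in (random.uniform(16, 240) for _ in range(10_000))]

ys, cbs, crs = zip(*(rgb_to_ycbcr(*p) for p in pixels))
reds = [p[0] for p in pixels]

# Nearly all of the variation lands in Y; Cb and Cr are comparatively flat.
assert statistics.pvariance(cbs) < statistics.pvariance(reds) / 10
assert statistics.pvariance(crs) < statistics.pvariance(reds) / 10
```

The shared-brightness term cancels out of both chroma channels entirely, which is exactly the "information moves into Y" effect described above.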

~~~
jheriko
isn't it the case that the amount of data required to store the result of a
lossless DCT is bounded below by the size of the data, and this is why
lossless JPG compression does not use such a scheme?

~~~
haasn
I'm not actually sure. In retrospect, I'm not sure what ‘DCT without
quantizing’ really means, since the outputs of the cosines are presumably real
numbers? I guess the interpretation would be: quantized to however many steps
are needed to reproduce the original result when inverted (and rounded).

In lossless JPEG it seems they omitted the DCT primarily for this reason: it
is not a lossless operation to begin with if you actually want to store the
result. What other lossless codecs often do is store a lossy version such
as that produced by a DCT, alongside a compressed residual stream coding the
difference (error).

In either case, it's important to note the distinction between reordering and
compressing; reordering tricks like DCT can reorder entropy without affecting
the number of bits required to store them, but the simple fact of having
reordered data can make the resultant stream much easier to predict.

For example, compare an input signal like this one:

FF 00 FF 01 FF 02 FF 03 FF 04 ...

By applying a reordering transformation to move all of the low and high bytes
together, you turn it into

FF FF FF FF FF .. 00 01 02 03 04 ..

which is much more easily compressed. As for whether that's the case for (some
suitable definition of) lossless DCT, I'm not sure.
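
A rough illustration of the reordering effect, using the signal above (zlib as a stand-in entropy coder; the exact sizes depend on the compressor, but the reordered stream should come out smaller):

```python
import zlib

# Interleaved signal: FF 00 FF 01 FF 02 ... (every 3-byte window is unique,
# so the LZ matching stage finds nothing to reuse).
interleaved = bytes(b for i in range(256) for b in (0xFF, i))

# Reordered: all the FF bytes together, then all the counter bytes.
reordered = interleaved[0::2] + interleaved[1::2]

print(len(zlib.compress(interleaved)), len(zlib.compress(reordered)))
```

The FF run in the reordered stream collapses to almost nothing, while the interleaved version forces the coder to spend bits on every pair.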

------
lolc
And this is why I don't do video. (And have lots of respect for the people who
write the libraries I use.)

------
FranOntanaya
Could write an entire page just on subtitles.

------
antirez
There is a lot of potential information in such a list. But in this form it's
quite a "trust me" thing that does not really add to the reader's knowledge.

------
milansuk
Nice one! Now I would like to see an article like this, but about ciphers,
hashes, digital signatures, etc.

------
the_duke
An explanation for each 'falsehood' would have been nice

------
ryanmarsh
Well video programming just sounds delightful.

/sarcasm

------
AznHisoka
can we have falsehoods programmers believe besides video that are more common?
this list is probably relevant to 1% of programmers here.

~~~
greenyoda
Just type "falsehoods programmers believe" into the search box at the bottom
of the page and you'll get a ton of previous articles on falsehoods in various
domains that have been posted here over the years:

[https://hn.algolia.com/?query=falsehoods%20programmers%20bel...](https://hn.algolia.com/?query=falsehoods%20programmers%20believe&sort=byPopularity&prefix&page=0&dateRange=all&type=story)

And while this topic is not personally relevant to me since I don't work with
video decoding, I do find learning about different technologies interesting.
Reading this gives me an appreciation for how much effort goes into making
video, something we all take for granted, work.

If people only posted articles that were relevant to a majority of readers, HN
would be a much less interesting place.

