
FFmpeg and a thousand fixes - abraham
http://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html
======
samworm
It is interesting that YouTube isn't mentioned in this blogpost, despite there
being good evidence that ffmpeg has been used there[1].

The fuzz testing they mention is based around constructing malformed (or at
least "exotic") input files and then monitoring for failures... i.e. simulating
exactly the kind of attack someone might use against YouTube's transcoding
infrastructure.

[1] [http://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/](http://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/)
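
To make the "malformed input, then watch for failures" loop concrete, here is a minimal mutation-fuzzing sketch. It is not Google's setup: `parse_chunks` is a made-up toy parser standing in for a real demuxer, and the `ValueError` it raises stands in for a crash. The names `mutate` and `fuzz` are likewise hypothetical.

```python
import random


def parse_chunks(data: bytes) -> list:
    """Toy parser for a made-up format: repeated [1-byte length][payload]."""
    chunks, pos = [], 0
    while pos < len(data):
        length = data[pos]
        payload = data[pos + 1 : pos + 1 + length]
        if len(payload) != length:
            raise ValueError("truncated chunk")  # stand-in for a crash
        chunks.append(payload)
        pos += 1 + length
    return chunks


def mutate(seed: bytes, rng: random.Random, flips: int = 3) -> bytes:
    """Return a copy of seed with a few randomly chosen bytes overwritten."""
    buf = bytearray(seed)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, rounds: int = 200, rng_seed: int = 0) -> list:
    """Feed mutated inputs to the parser and keep the ones that make it fail."""
    rng = random.Random(rng_seed)
    failures = []
    for _ in range(rounds):
        sample = mutate(seed, rng)
        try:
            parse_chunks(sample)
        except Exception:
            failures.append(sample)  # save the input for later triage
    return failures
```

Calling `fuzz(b"\x03abc\x02de")` starts from one well-formed file and returns the mutated inputs the parser choked on. A real campaign would run the actual binary under a memory checker and bucket the crashes, which this sketch does not attempt.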

~~~
chadillac
I'm not proud of this legacy code, but... it exists because it was a real
world issue that would cripple a conversion server in production envs when fed
certain files with timing/syncing errors as part of an automated upload and
conversion process. When it would crash it would consume 100% of the cores and
eat up enough RAM to force swap.

This cron has been keeping ffmpeg in check for over 4 years (not 6, whoopsie)
in a production environment at this point... it processes thousands of videos
a day using a custom queuing and reviewing system.

    
    
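    <?php
    /*
    **  Cron responsible for detecting failed/hung ffmpeg
    **  instances and killing them.
    */
    require_once '/lib/class/dbmysqli.php';

    // Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds, so that
    // processes older than a day are not misread as young ones.
    function etime_to_seconds($etime) {
        $days = 0;
        if (strpos($etime, '-') !== false) {
            list($days, $etime) = explode('-', $etime, 2);
        }
        $parts = array_map('intval', explode(':', $etime));
        while (count($parts) < 3) {
            array_unshift($parts, 0); // pad the missing hours field
        }
        list($hours, $minutes, $seconds) = $parts;
        return ((intval($days) * 24 + $hours) * 60 + $minutes) * 60 + $seconds;
    }

    $output = shell_exec('ps -aeo pid,etime,args | grep ffmpeg | grep -v grep');

    // Capture pid, elapsed time, and the queue id from a file name
    // ending in <queue_id>-<digits>.flv
    preg_match_all('/^ *(\d+) +(\S+) .*?(\d+)-\d+\.flv /m', (string) $output, $preg_out, PREG_SET_ORDER);

    if (!empty($preg_out)) {

        $db = new DBmysqli('dbmaster');

        foreach ($preg_out as $process) {
            $pid = intval($process[1]);
            $etime = etime_to_seconds($process[2]);
            $queue_id = intval($process[3]);

            // elapsed time >= 1 hour
            if ($etime >= 3600) {
                $fail_sql = "DELETE FROM media_upload_queue WHERE queue_id = $queue_id";
                shell_exec("kill -9 ".$pid); // kill hung ffmpeg process
                $db->query($fail_sql); // remove file from DB
                // debate if life is worth living...
            }
        }
    }
    ?>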
    

Yes I know there are better ways to do this now at the OS level, but it was a
quick hack over a half decade ago and continues to work... ain't broke, don't
fix it kinda deal.
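
For comparison, one alternative to polling `ps` is to put the deadline on the process when it is launched. This is only a sketch of that idea, not necessarily the OS-level approach chadillac has in mind. The helper name `run_with_deadline` is hypothetical, and the command passed in would be whatever ffmpeg invocation the queue already uses.

```python
import subprocess
import sys


def run_with_deadline(cmd: list, timeout_s: float) -> bool:
    """Run cmd; return True only if it exits with status 0 before the deadline.

    subprocess.run kills the child and raises TimeoutExpired once the
    timeout elapses, so a hung process cannot outlive its deadline.
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False  # hung: the child was killed, caller can fail the job
    except subprocess.CalledProcessError:
        return False  # exited on its own, but with a non-zero status


if __name__ == "__main__":
    # Stand-in for a hung conversion: a child that sleeps past its deadline.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(hung, timeout_s=0.5))
```

The trade-off is that the launcher has to stay alive for the whole conversion, whereas the cron approach works no matter how the ffmpeg processes were started.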

~~~
rip747
"I'm not proud of this legacy code,"

"This cron has been keeping ffmpeg in check for over 6+ years in a production
environment at this point... it processes thousands of videos a day using a
custom queuing and reviewing system."

why do we programmers always feel we need to apologize for something that we
did quickly, but has been running without incident for a number of years.

take a bow my friend. that was an awesome patch you came up with all those
years ago.

~~~
MBCook
> why do we programmers always feel we need to apologize for something that we
> did quickly, but has been running without incident for a number of years.

The script is only treating the symptom, not the problem.

It would be better to detect the files before they waste an hour of time, so
you could tell the user instead of having them silently disappear. Maybe there
is something you could do to fix the files. The programmer part of me says
there is a real fix that needs writing.

It's obviously a great script if it's worked that long. Who knows how long it
would take to track down and fix the bug(s) causing the issue. If they haven't
_needed_ to fix the bug in all these years just writing that script was
obviously a good decision.

~~~
zaroth
> If they haven't needed to fix the bug in all these years just writing that
> script was obviously a good decision.

Exactly this. There are an unbounded number of bugs which will cause this
single symptom. I think detecting the symptom is exactly the right solution.

~~~
kbenson
In theory. In reality, after treating one or two causes, the occurrence rate
may fall to once a month or year.

I agree that mitigating the problem is a good _first_ step though.

------
pjmlp
The wonders of C:

\- NULL pointer dereferences,

\- Invalid pointer arithmetic leading to SIGSEGV due to unmapped memory
access,

\- Out-of-bounds reads and writes to stack, heap and static-based arrays,

\- Invalid free() calls,

\- Double free() calls over the same pointer,

\- Division errors,

\- Assertion failures,

\- Use of uninitialized memory.

But hey, any good programmer always writes perfect C code.

~~~
vinkelhake
Really breaking new ground here. What should FFmpeg have been written in? I
can think of a few candidates, but I think they share a number of those
"wonders".

~~~
Perseids
At 30C3 there was a talk about a C compiler (actually an LLVM optimization
plugin) that can eliminate nearly every memory management related
vulnerability by adding memory checks. The penalty is only a 100% increase in
runtime.

> If you're a C programmer and somebody says like: I've that optimization that
> makes your program 3% faster you go: wow! Then you come and say: well, now I
> make it half as fast you go: w00t?! But then you remember that people
> actually use ruby to serve web pages.

I'd say this would actually be worth it for a far more secure FFmpeg.

[https://www.youtube.com/watch?v=2ybcByjNlq8](https://www.youtube.com/watch?v=2ybcByjNlq8)

~~~
StavrosK
The difference is that FFmpeg is very, very CPU-bound. Taking four hours
instead of two to encode a movie is a big hassle. On the other hand, not many
people use a world-facing FFmpeg instance to submit jobs to. I don't see why
security should trump speed here.

~~~
Perseids
You mean, not many people use VLC?

[https://www.videolan.org/developers/vlc.html#thirdparty](https://www.videolan.org/developers/vlc.html#thirdparty)

~~~
Crito
For local realtime playback, sure. I don't think there are many/any
public-facing systems backed by VLC. Presumably VLC prioritizes speed over security.

~~~
Perseids
Oh, _that_ was the point being made. My view is that VLC essentially _is_ a
public facing system, in the same sense that my browser is a public facing
system. Sure, it is not as easy to do a targeted attack against my VLC as
against my mail client (or against the nginx on my server). But it is fed a
lot of untrusted data. The same goes for Chrome, which also uses FFmpeg
according to the article.

------
chubot
This is cool. I wonder how many of the bugs led to code execution?

If you managed to execute code, what privileges do you have in Chrome? I hope
that Chrome was using OS sandboxing for video playing. After all, if they
found 1000 bugs, there are probably a few more zero-days available.

Does playing Flash video in Chrome/Firefox end up using ffmpeg? I'm not that
knowledgeable about how video works, but there are probably at least two
execution contexts: playing .mp4 natively in Chrome, or playing it via a Flash
container.

~~~
jbk
> I wonder how many of the bugs led to code execution?

Quite a few. We're often affected with VLC, and code execution is easy to get
to. But with VLC, you're "only" in userland.

~~~
fulafel
Chrome is also in userland - and it has a sandboxing system. Assuming they're
sandboxing ffmpeg, these bugs are more risky for VLC users than Chrome users.
Plus, Chrome is more diligent with security updates and the auto-update
mechanism is fully automatic.

A sandboxing system provided by either ffmpeg or VLC would be a very good
idea, though it would be some work... encoded data in, decoded frames out via
shared memory. Negligible performance impact.
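
A rough sketch of the "encoded data in, decoded frames out" split, under loud assumptions: it uses pipes instead of shared memory to stay short, applies no actual OS sandbox to the worker, and the "decoder" is a toy run-length decoder rather than anything from ffmpeg. `DECODER_SRC` and `decode_in_worker` are hypothetical names. What it does show is that a crash on corrupt input stays inside the worker process.

```python
import subprocess
import sys
from typing import Optional

# Source of the hypothetical worker. Input is pairs of [count][byte];
# odd-length input is "corrupt" and makes the worker die with an IndexError.
DECODER_SRC = r"""
import sys
data = sys.stdin.buffer.read()
out = bytearray()
for i in range(0, len(data), 2):
    out += bytes([data[i + 1]]) * data[i]
sys.stdout.buffer.write(out)
"""


def decode_in_worker(encoded: bytes, timeout_s: float = 5.0) -> Optional[bytes]:
    """Decode in a separate process; return None if the worker crashes or hangs."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", DECODER_SRC],
            input=encoded,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None
```

The caller only ever sees bytes or `None`, so the player itself never runs the parsing code. Swapping the pipe for shared memory would change how frames are transported but not this structure.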

~~~
chubot
Yeah definitely. Really there is a gross violation of the principle of least
privilege here. A video player is a great thing to sandbox because all it
needs is a video output, very limited file access, and very limited gui input.
It doesn't need to read all your files, open network connections, or start
processes.

Given that there are apparently thousands of bugs in the video parsing code,
it seems like a no-brainer.

Section 5.2 in this DJB paper talks about (portable) isolation of plain
transformations. Video playing is already close to "pure" or could be made
pure pretty easily.

[http://cr.yp.to/qmail/qmailsec-20071101.pdf](http://cr.yp.to/qmail/qmailsec-20071101.pdf)
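
As an illustration of isolating a "pure" transformation on Unix, the sketch below has a worker give up the ability to open any new file descriptor before it touches its input. Lowering `RLIMIT_NOFILE` to zero is my assumption of one applicable technique, not a summary of the paper. `PURE_WORKER_SRC` and `run_pure_transform` are hypothetical names, and the uppercase step is a stand-in for real decoding.

```python
import subprocess
import sys

# Hypothetical worker: finish imports, forbid new file descriptors, probe
# that the restriction holds, then transform stdin to stdout.
PURE_WORKER_SRC = r"""
import resource, sys
resource.setrlimit(resource.RLIMIT_NOFILE, (0, 0))  # no new files or sockets
try:
    open(sys.executable, "rb")
    status = b"open-allowed"
except OSError:
    status = b"open-blocked"
payload = sys.stdin.buffer.read().upper()  # the "pure" transformation
sys.stdout.buffer.write(status + b":" + payload)
"""


def run_pure_transform(data: bytes) -> bytes:
    """Run the restricted worker on data and return what it wrote to stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", PURE_WORKER_SRC],
        input=data,
        stdout=subprocess.PIPE,
        check=True,
        timeout=10,
    )
    return proc.stdout
```

The already-open stdin and stdout keep working after the limit is lowered, which is what lets the parent act as the worker's only source of data. A real deployment would layer more on top (separate uid, chroot or a seccomp-style filter), none of which is shown here.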

~~~
jbk
> A video player is a great thing to sandbox because all it needs is a video
> output, very limited file access, and very limited gui input. It doesn't
> need to read all your files, open network connections, or start processes.

Not sure if serious or sarcasm, to be honest, since this seems very far from
what we see.

A media player is not simple to sandbox (as the Mac OS X sandbox showed us, for
example), because:

\- you need to open files by yourself, without user interaction, to support
playlists,

\- you need to open connections by yourself to support video protocols like
RTSP, RTMP, RTP,

\- you need raw device access to support Webcams, Capture devices, DVDs, DVB
tuners,

\- you need to access GPU buffers for direct rendering, and/or shaders to do
fast filtering or just plain chroma-conversions,

\- you need to be able to access the audio output at a low level for lipsync,
which is not always doable with the simplified APIs,

\- and I don't understand what you mean by "very limited gui input"; how is
that less than other programs?

Sure, it can be done, with performance costs, but it's clearly not a
"no-brainer".

~~~
chubot
I'm talking about a multiprocess architecture like Chrome.

[http://www.chromium.org/developers/design-documents/multi-pr...](http://www.chromium.org/developers/design-documents/multi-process-architecture)

The entire app isn't sandboxed -- just the code that does video parsing, i.e.
with the thousands of bugs and hundreds of remote code execution exploits (!).

See my other comment on this topic. ffmpeg is already very modular, and used
in many video players (user interfaces), so this separation is more than
natural -- it _already_ exists in the codebase.

BTW, some people seem to be unfamiliar with the multiprocess/Unix design
approach (usually people with a Windows background, which I came from as
well). I recommend
[http://www.catb.org/esr/writings/taoup/](http://www.catb.org/esr/writings/taoup/)
for a great intro to this design philosophy.

~~~
jbk
> The entire app isn't sandboxed -- just the code that does video parsing,
> i.e. with the thousands of bugs and hundreds of remote code execution
> exploits (!).

The exploits are usually on the protocol (access) level, the demuxer (format)
level but also the decoder level.

While in theory the first 2 are what you call parsing, many security issues
appear also at the decoder level.

And if you want to split the video decoder from the rendering, you need to
introduce an additional memcpy (or two) of full decoded frames, which has a
significant performance impact.

I agree it would be nice to try, with a proper separate-process
architecture, but it also means changing the usual architecture a bit, where
there is direct rendering between the decoder and the output.

~~~
ori_b
What you do is split the file I/O from the rendering. Drop privs on the
renderer, and feed it data from another process that handles the file access,
setup, and other privileged operations.

You can open your X11 socket (or whatever your GUI uses) in the I/O process
and hand it off to the renderer before dropping privileges, without any loss
of performance. Another thought -- although I don't generally
touch 3d rendering, or know enough to know if it's possible off the top of my
head -- is something like cross-process pbuffers: you render into one, and
composite the result to the screen from another process. Since your OS already
does this sort of thing (X11 compositors, for example, do this sort of cross-
process compositing), this can't be too expensive.

You might need to use the XACE extension to restrict the renderer to just the
rendering output window in case it gets exploited, so that you don't have
access to the rest of the UI.

The places where vulnerabilities can do damage are when communicating with the
rest of the OS: File I/O allows you to get user data, permanently install
malware, and delete things. The window system allows you to snoop
keystroke/mouse data, synthesize input, and watch what the user does. Drop
privs on those, and your app is fairly effectively sandboxed.

------
scrabble
_Until we can declare both projects "fuzz clean" we recommend that people
refrain from using either of the two projects to process untrusted media
files. You can also use privilege separation on your PC or production
environment when absolutely required._

Well, it's not like I've got a whole lot of options for video processing. Note
that they didn't recommend alternatives that they say _are_ fuzz clean.

I use FFmpeg quite extensively and will continue to do so.

------
nathancahill
Interesting that they are pushing fixes for both FFmpeg and the libav fork. I
guess they decided not to pick sides in that war.

~~~
craigyk
That is good. But this reminds me of the first time I installed 'ffmpeg' on
Ubuntu only to have it not work with my well-tested parameters and also
declare ffmpeg 'deprecated'. Only after some head scratching did I figure out
it had installed libav and an ffmpeg wrapper that tried but failed to be
compatible with the real ffmpeg interface.

I wasn't even aware of the split at the time, but this shenanigan definitely
gave me a strong negative initial impression of libav.

~~~
nathancahill
For sure. The story of FFmpeg and libav is full of sneaky shenanigans like
that.

~~~
vezzy-fnord
A good entry on this, for those interested:
[http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html](http://blog.pkh.me/p/13-the-ffmpeg-libav-situation.html)

~~~
teddyh
That is written by the FFmpeg side. Do you have a complementary view?

~~~
0x09
The mpv developers have published an overview of the situation from their
independent perspective which (FWIW) is very accurate from my vantage point as
well: [https://github.com/mpv-player/mpv/wiki/FFmpeg-versus-Libav](https://github.com/mpv-player/mpv/wiki/FFmpeg-versus-Libav)

------
claudius
Once again I am happy that ‘Segmentation fault. Core dumped.’ is a perfectly
reasonable reaction to malformed user input in my field of work, which only
increases my respect for those who write real-world software. Thanks! :)

------
CraigJPerry
So my paranoia has finally taken over. My first read through and I only take
in these features:

    
    
        * I presume all this infra is used for more than just ffmpeg, so why this software specifically? I'm conscious it's included in everything from web browsers to games. If this was an attack vector, it's a BIG vector.
        * 1121 (or whatever it was) is a curious number to make a post about. Why not at the 1k mark or 1250?
        * Explicitly calling out use of an unpriv user to a readership likely already well versed in basic security practices but notorious for casually ignoring them on personal machines
    

Google, blink twice if they've got ya tied up in some legal requirement that
you can't disclose.

------
lmm
A _thousand_ , and all the categories they list are things the computer should
catch for you. How many more will it take before we switch to better
languages?

~~~
lucb1e
Be aware that these are open-source projects. I don't see _you_ starting a
fork in a better language.

~~~
Goopplesoft
I find this 'don't complain, fork/fix' mentality ridiculous. Why should he?
If his goal is to get people off C, communicating/persuading in a developer
community's comments is probably more effective than him creating a project in
some other language that no one would likely ever use.

~~~
pkroll
In what way is asking someone, somewhere, to move a project to another,
unnamed language, going to be more effective than actually starting a project
and putting code into it? Sure, "go do it yourself" is not a pleasant
response, but "go take the tens of thousands of lines of code that mostly
works, and spend person-years rewriting it in another language so that a
certain set of bugs aren't an issue anymore, instead of fixing the mostly
working code" is in no way a reasonable request. "What can be done to help
reduce the bug count, by a casual user?" might be. Contacting Coverity or
another static analysis company that occasionally runs their tools on open
source programs to help the world (and get the free press out of it...), might
result in a huge list of subtle (and hideously obvious) bugs getting squashed.

------
IceyEC
I get a kick out of the fact that they recommend not using FFMPEG but they say
that Chrome uses it.

~~~
abraham
> I get a kick out of the fact that they recommend not using FFMPEG

More specifically Google is recommending you don't use it to "process
untrusted media files"

~~~
ID_HOME
I don't trust any media files... Especially ones from Google who are doing
most of the research... seriously though, media zero-days are unheard of,
probably because they would be extremely valuable.

------
tlrobinson
This must be one of the reasons it seems like there's a new update to VLC
every time I launch it...

~~~
jbk
Yes. That's sad, but we need to update often, because of that.

And our update mechanism sucks. We're working on improving the process because
of that...

------
atmosx
> At Google, security is a top priority - not only for our own products, but
> across the entire Internet.

Hm, privacy was not on top, I guess - or we're toasted.

~~~
jlogsdon
Can't deliver privacy guarantees unless you have a secure system in the first
place. That's not to say Google does care about privacy, but security being
top makes sense.

------
emmelaich
Bit off topic but I may as well bring this up here.

    
    
      1. flvrunner uses ffmpeg
      2. biggest result for flvrunner searches is how to remove flvrunner (including browser toolbars)
      3. doubleclick runs web ads for flvrunner.com (Google owns doubleclick)
      4. such ads even run on onlinebehavior.com, which is owned by Google's analytics guy.
    

There's a lot of malware on the net that google could do more to reduce.

Someone more conspiratorially minded than me might make other deductions.

------
neves
The software testing course in Udacity is an excellent introduction to the
fuzz testing techniques that the article talks about. Here is the link:
[https://www.udacity.com/course/cs258](https://www.udacity.com/course/cs258)

------
justincormack
Years ago (I think it was ffmpeg) it would crash constantly on tv input, but
it ran in a different process and restarted automatically.

------
kokey
It sounds like about the right level of effort required to maintain something
developed by Fabrice Bellard.

------
lhgaghl
"Until we can declare both projects "fuzz clean" we recommend that people
refrain from using either of the two projects to process untrusted media
files."

BAHAHA What a joke.

