
Fuzzing FFmpeg for fun and profit - kierank
http://obe.tv/about-us/obe-blog/item/26-fuzzing-ffmpeg-for-fun-and-profit
======
viraptor
I can't wait for the time when, if your project reads files, fuzzing it is a
well-known, recommended practice (unless you're writing fully managed code).

There were some fuzzing projects before afl, but even with afl's recent
popularity we're still in the situation where you just have to grab a random
application, point afl at it and get some basic crashes in a few minutes. With
clang-analyzer, coverity, afl, and many other projects available for free,
there's no reason this should be possible.

Then again, I'm still waiting for the time when people don't code with sql
injection issues...

~~~
cornstalks
Even if you're writing fully managed code you should still fuzz. It can help
find logic errors in the code which can cause erroneous program states or
crashes.

The fact that the code is managed helps prevent some nasty exploits,
but it doesn't necessarily mean the code always does what the programmer
intended.

~~~
wyldfire
Agreed! I like to think of fuzzers as merely an automated tool to drive
test cases, maximizing coverage.

So when asked, "Why fuzz test?" I can answer, "remember when you wrote fifteen
test cases and wondered to yourself whether you needed a sixteenth? But then
you decided that it would take more work than it's worth to do that. -- Well,
now we can tell the computer to just keep trying stuff until it finds unique
cases."
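A rough sketch of that "keep trying stuff" loop, in C++. This is only a toy mutation fuzzer, not how afl works internally (afl also uses coverage feedback to decide which inputs to keep). `parse_record`, its planted bug and `fuzz_until_crash` are names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical target with a planted bug: the length byte in[1] is trusted
// without being checked against the real size of the input.
int parse_record(const Bytes& in) {
    if (in.size() < 2 || in[0] != 'R') return -1;  // not a record
    const std::size_t len = in[1];
    int sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += in.at(2 + i);  // throws std::out_of_range when len is too large
    }
    return sum;
}

// Overwrite one random byte of the seed per iteration and return the first
// mutated input that makes the target throw. Returns an empty vector if no
// failing input was found within the iteration budget.
Bytes fuzz_until_crash(const Bytes& seed, int iterations, unsigned rng_seed) {
    assert(!seed.empty());
    std::mt19937 rng(rng_seed);
    std::uniform_int_distribution<std::size_t> pick_pos(0, seed.size() - 1);
    std::uniform_int_distribution<int> pick_byte(0, 255);
    for (int i = 0; i < iterations; ++i) {
        Bytes candidate = seed;
        const std::size_t pos = pick_pos(rng);
        const int value = pick_byte(rng);
        candidate[pos] = static_cast<std::uint8_t>(value);
        try {
            parse_record(candidate);
        } catch (const std::out_of_range&) {
            return candidate;  // found an input the parser cannot handle
        }
    }
    return {};
}
```

Calling `fuzz_until_crash({'R', 2, 10, 20}, 1000, 1)` will almost certainly come back with an input whose length byte is larger than the payload, which is exactly the sixteenth test case nobody wrote by hand.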

~~~
mraison
There is also generative testing, which can be a good middle ground between
hand-crafted tests and fuzzing. You can think of it as a fuzzer with more
guidance regarding input generation, and it can uncover edge cases that a fuzzer
would take forever to discover. I tend to think of those 3 approaches as
complementary.

Example in Clojure:
[https://github.com/clojure/test.check](https://github.com/clojure/test.check)
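For readers who don't use Clojure, here is a rough sketch of the same idea in C++. It does not use test.check or any real property-testing library; the run-length codec, the generator and all names are invented for illustration. The point is that the generator is written by hand to produce inputs that are interesting for this code (long runs), and the test checks a property rather than a fixed expected value.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Runs = std::vector<std::pair<char, std::size_t>>;

// Code under test: a tiny run-length encoder and decoder.
Runs rle_encode(const std::string& s) {
    Runs out;
    for (char c : s) {
        if (!out.empty() && out.back().first == c) {
            ++out.back().second;
        } else {
            out.emplace_back(c, 1);
        }
    }
    return out;
}

std::string rle_decode(const Runs& runs) {
    std::string s;
    for (const auto& r : runs) s.append(r.second, r.first);
    return s;
}

// Generator: builds strings from a three-letter alphabet in chunks, so long
// runs and adjacent equal runs show up far more often than in random bytes.
std::string gen_runny_string(std::mt19937& rng) {
    std::uniform_int_distribution<int> run_count(0, 8);
    std::uniform_int_distribution<int> run_len(1, 20);
    std::uniform_int_distribution<int> letter(0, 2);
    std::string s;
    const int n = run_count(rng);
    for (int i = 0; i < n; ++i) {
        const std::size_t len = static_cast<std::size_t>(run_len(rng));
        const char c = static_cast<char>('a' + letter(rng));
        s.append(len, c);
    }
    return s;
}

// Property: decoding an encoding gives back the original string, for every
// generated input. Returns false as soon as one counterexample is found.
bool roundtrip_holds(int cases, unsigned seed) {
    assert(cases > 0);
    std::mt19937 rng(seed);
    for (int i = 0; i < cases; ++i) {
        const std::string s = gen_runny_string(rng);
        if (rle_decode(rle_encode(s)) != s) return false;
    }
    return true;
}
```

Real libraries in this family typically also shrink a failing input to a minimal counterexample, which this sketch leaves out.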

------
jamesrom
Can someone explain why you would want to fuzz ffmpeg?

What's wrong with ffmpeg crashing when you feed it invalid input? What's the
alternative to it not crashing? Should it continue transcoding or should it
exit quietly?

What's the problem being solved here?

~~~
001spartan
Fuzzing is useful for finding bugs that may or may not be serious security
issues. If a program crashes when you feed it invalid input, it could
potentially crash in a way that allows code execution.

After finding a crash, a researcher will generally explore this to see if it
allows for arbitrary code execution. If it does, it is possible for someone to
create a weaponized exploit.

For a utility as widely-used as ffmpeg, any bugs pose a threat to many
systems. Fuzzing it makes the Linux ecosystem safer for all of us (hopefully).

------
cottonseed
afl [0] is awesome. I have it running right now. It found some bugs in
arachne-pnr [1].

I wish someone would do a JVM version (combined with something along the lines
of QuickCheck and typed generators).

[0] [http://lcamtuf.coredump.cx/afl/](http://lcamtuf.coredump.cx/afl/)

[1] [https://github.com/cseed/arachne-pnr](https://github.com/cseed/arachne-pnr)

~~~
MBCook
As a Java programmer I would love a version of afl. The JVM certainly exposes
enough information for other profilers/tracers and things dynamically adding
code (spring/hibernate) that it seems like it should be possible.

~~~
rainforest
There's a tool that does something similar to AFL called EvoSuite
([http://www.evosuite.org](http://www.evosuite.org)). It uses a genetic
algorithm like AFL and generates JUnit suites for the target classes. It
doesn't look for crashes like AFL - it's closer to MS's IntelliTest tool but
uses metaheuristics instead of SMT to break conditions.

------
sunnyps
Google has done a lot of work [1] with respect to fuzzing ffmpeg because it's
included in Chrome.

[1] [https://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and...](https://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html)

~~~
viraptor
It probably helps that AFL's author works for Google.

------
72deluxe
Very interesting and thought provoking. The article links to Google's page on
finding 1000 bugs in FFmpeg
([https://googleonlinesecurity.blogspot.co.uk/2014/01/ffmpeg-a...](https://googleonlinesecurity.blogspot.co.uk/2014/01/ffmpeg-and-thousand-fixes.html)).

This mentions the kinds of bugs they fixed (NULL pointer dereferences, Invalid pointer
arithmetic leading to SIGSEGV due to unmapped memory access, Out-of-bounds
reads and writes to stack, heap and static-based arrays, Invalid free() calls,
Double free() calls over the same pointer, Division errors, Assertion
failures, Use of uninitialized memory.)

Some of those things could have been caught by static code analysis or
stricter coding standards. Or, if it had been a C++ project you could avoid
some of those problems (NULL pointer dereferences) by using references in
the first place instead of passing pointers around. That might be a bit of an
oversimplification as it is a very complex project but for me it was a
reminder to change my coding style in C++ instead of sticking with the C-style
way of doing it.

~~~
J_Darnley
C++ would make it a horrible project to work on. No doubt you would advocate
for it to use the slow features of c++. Templates or something, right? Then
you would probably say that it should use boost too because that seems to be
the standard library of c++.

~~~
72deluxe
Wrong. I WOULD NOT advocate the use of templates just because I use C++. But I
WOULD advocate using safe language features and compile-time type checking.
Throwing pointers around is bad practice, hence C++'s encouragement to use
references, move semantics and emphasis on use of containers, clear tidy-up
practices, obvious demarcation of what owns an object, RAII etc. This moves
the burden onto the compiler, not the developer or team of developers to
understand where data is owned, when it will be cleaned up etc. etc.

Further, templates do not make C++ slow; they are simply compile-time
polymorphism.

Does runtime polymorphism make software slow? Does casting to or from a void*
in C suddenly make it slow? The entire emphasis in C++ for templates is that
you pass the error checking to the compiler, not at runtime to guess that this
incoming void* is the type that you want. reinterpret_cast is only used where
absolutely necessary!! It is far safer to NOT cast. Avoid casting. Templates
(and the STL) help out in this regard.
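A minimal sketch of the difference, with made-up function names. Both versions add up a buffer of numbers; the first erases the element type behind a void*, the second keeps it in the signature so the compiler can check it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C style: the element type is erased. The cast compiles for any pointer,
// so nothing verifies that `data` really points at `count` ints.
long sum_ints_erased(const void* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    const int* p = static_cast<const int*>(data);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += p[i];
    return total;
}

// Template: the element type is part of the signature. Passing a container
// of the wrong kind is a compile error instead of a guess at runtime, and
// the size travels with the data.
template <typename T>
T sum(const std::vector<T>& values) {
    T total{};
    for (const T& v : values) total += v;
    return total;
}
```

Handing a `std::vector<double>`'s buffer to `sum_ints_erased` compiles without complaint and reads garbage, whereas `sum` just gets instantiated for `double` and does the right thing.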

I would use the STL for projects, and encourage everyone else in C++ land to do
the same. Containers make sense, as do the algorithms associated with them
(and they save you reinventing the wheel).

But I would NOT advocate use of Boost just because it is popular or written in
C++. That would be a stupid reason to encourage its use; I personally do not
use Boost. Boost != STL. You appear to be confusing Boost with the STL ??

No doubt your unbridled hatred of C++ and libraries written in C++ (eg. Boost)
has stopped you reading reference works on the STL. You can find a
comprehensive reference here:
[http://en.cppreference.com/w/](http://en.cppreference.com/w/)

There's no Boost on there, btw. That's at boost.org.

Personally, I was only making the comment regarding a reminder of my own
coding style in light of the problems they found in their C codebase. I do not
write C, but I can see that the pitfalls they fell into could easily be
carried out by me in C++ if I stuck to unsafe coding styles.

Furthermore, many of their problems could be solved in C++ land very easily:
NULL pointer dereferences (don't use pointers, use references - it'll never be
NULL), Invalid pointer arithmetic leading to SIGSEGV due to unmapped memory
access (use STL containers), Out-of-bounds reads and writes to stack, heap and
static-based arrays (use STL containers with at(), you may still get out-of-bounds
problems but at least an exception is thrown), Invalid free() calls, Double
free() calls over the same pointer (use move semantics, never see a new or
delete ever again), Division errors (use a static_assert to check that a value
is in range, if possible. Or use an assert in a defensive coding style),
Assertion failures (use static_assert to catch at compile time if possible),
Use of uninitialized memory (RAII - initialise before use).
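A small sketch of two of those points, assuming C++14 or later for std::make_unique. `Frame`, `read_is_rejected` and `consume` are hypothetical names, not anything from FFmpeg. Note that only the checked accessor at() throws; operator[] on a vector does no bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Out-of-bounds read: at() turns it into an exception that can be handled.
// Returns true if reading index i was rejected by the bounds check.
bool read_is_rejected(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Hypothetical stand-in for a decoded frame. The vector member frees its own
// storage in its destructor (RAII), so there is no manual free() to get wrong.
struct Frame {
    std::vector<unsigned char> pixels;
};

// Takes ownership of the frame. It is destroyed exactly once, when `frame`
// goes out of scope at the end of this function.
std::size_t consume(std::unique_ptr<Frame> frame) {
    assert(frame != nullptr);
    return frame->pixels.size();
}
```

Usage looks like `auto f = std::make_unique<Frame>(); consume(std::move(f));`. After the move, `f` is null, so a second "free" of the same frame cannot happen by accident, though dereferencing `f` afterwards would still be a bug.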

Your (or anyone's) invalid use of C++ does not make the language invalid, in
the same way that stabbing someone with a knife does not suddenly mean all
knives must be banned.

~~~
lgieron
Don't STL containers make your program slower though (e.g. game devs
generally manage memory manually, for better performance)? For something like
FFmpeg, that's a valid concern.

~~~
72deluxe
I thought that was typically done for efficient pipelines on x86 hardware, or to
avoid cache misses where the data being shoved in does not fit the hardware perfectly.

For FFmpeg that is a concern, but I believe the replier to my original comment
was disparaging C++ for no good reason, other than it wasn't C...

------
zurn
No mention of security impact of these bugs in the tracker. Anyone know if
FFmpeg has some vulnerability process, or do they just commit fixes and let
other people worry about exploitability, security hotfixes and CVEs, like
Linux?

------
slederer
Nice work! Can you share some details on the time/resource requirements for
these tests and this setup?

------
aorth
I didn't see any note about what version these bugs were found in, or will be
fixed in. For reference, at the time of this writing, my GNU/Linux box has
ffmpeg 2.8.1.

~~~
kierank
Fixed in whatever git master was at the time and I believe they were
backported to the latest maintenance version at the time.

------
hueving
OT: I can't describe how much I loathe "for fun and for profit" in titles -
especially since the majority aren't linked to anything describing a monetary
or resource gain. If you want to say something is useful, find a different way
to say it!

I wish those titles had a weight to drag them off the front page much quicker.

~~~
libber
Maybe you know this, maybe you do not, but these are all a nod to
[http://insecure.org/stf/smashstack.html](http://insecure.org/stf/smashstack.html)
which itself did not directly involve profit. It's a security thing.

~~~
hueving
I know, but it's the equivalent of "x considered harmful". It just shows a
lack of originality.

