
Pulling JPEGs out of thin air - atulagarwal
http://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html
======
tux3
This is INSANELY COOL.

If it's smart enough to learn how to build a JPEG in a day, use it with netcat
and it could probably send quite a lot of things down in flames.

Who needs static analysis :) ?

~~~
gear54rus
Yeah, netcat would be fun. Although, it seems that it also straces (or
similar?) the app it tests.

> _it triggers a slightly different internal code path in the tested app_

This would be impossible on the network.

~~~
jacquesm
Mock it locally, exploit it globally. One more reminder why it's useful to
turn off your server signatures, especially if they spew out version
information.

~~~
jMyles
Oh ho ho no you don't. That's security through obscurity, and that's never
ever OK for anybody.

I never understood this attitude. It has always been my experience that
obscurity is in fact an important part of security. It's a weakness when
_mistaken_ for security, not when understood as part of it.

Sadly, I do actually have signatures (with version information) to mute.

~~~
icelancer
People knee-jerk say that because they assume it's the only thing being done
to secure an asset, when obviously it's a valid defense in depth measure, one
with very low marginal cost (setting a few variables in conf files).

~~~
XorNot
This really depends. The marginal cost of "what version is this box running,
why doesn't this work, oh we don't have that tool?" could be very high on
something like that.

Remember, too, that the net is full of slow brute-forcers and the like.
Just because it takes a few days to run through all the exploits doesn't mean
that someone won't do it - that's thinking of security in human, individual
terms, as though the threat is targeted rather than general.

~~~
jacquesm
But here is where it comes in handy: If you have your version numbers out
there in some database and a 0 day for that particular version hits then
you're hacked. If not then you might be able to patch your system before a
breach happens. It's no guarantee but since it costs very little _and_ gives
you possibly a bit more time when you need it badly it does not hurt.
Obviously you need to cross all your other t's and dot the i's too.

And if you need your headers to tell _you_ what versions are running and what
tools are installed you are doing something else very wrong.

------
vinhboy
At the risk of sounding really stupid: can someone ELI5 what's going on here
and why everyone thinks it's so amazing?

~~~
0x0
It starts with an invalid .jpg (literally a text file containing "hello"), and
by trying over and over, changing random bytes and tracing the execution of
the decoder program as it is fed the corrupted input, it will drill deeper and
deeper into the program until it has gotten far enough that the input is
actually a valid .jpg, without any human input.

Fuzzing like this is a very effective technique for finding (security) bugs in
programs that parse input, because you will quickly end up with "impossible"
input nobody thought to check for (but is close enough that it won't be
rejected outright), and whoops there's your buffer overflow.

In this particular case, the fuzzer is going beyond just throwing random
input, as it considers which changes to the input trigger new code paths in
the target binary, and therefore should have a higher success rate in
triggering bugs compared to just trying random stuff. And don't forget, this
will work with any type of program and file type, not just .jpgs and the djpeg
binary.
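
To make the feedback loop above concrete, here is a minimal Python sketch. It
is not afl's code: `toy_parser`, its four-byte magic check, and the one-byte
`mutate` are hypothetical stand-ins for an instrumented decoder and for afl's
real mutation strategies. The point is only that keeping every input that
reaches a new branch lets the fuzzer discover the header one byte at a time,
instead of needing a 1-in-256^4 lucky guess.

```python
import random

MAGIC = b"\xff\xd8\xff\xe0"  # assumed header the toy parser insists on


def toy_parser(data: bytes) -> set:
    """Hypothetical instrumented target: returns the IDs of branches reached."""
    reached = set()
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            break
        reached.add(i)  # one new branch per correctly matched header byte
    return reached


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, rounds: int, rng: random.Random):
    queue = [seed]                # inputs worth mutating further
    seen = set(toy_parser(seed))  # every branch observed so far
    for _ in range(rounds):
        candidate = mutate(rng.choice(queue), rng)
        coverage = toy_parser(candidate)
        if coverage - seen:       # reached a new branch: keep it as a seed
            seen |= coverage
            queue.append(candidate)
    return queue, seen


queue, seen = fuzz(b"hello", rounds=200_000, rng=random.Random(0))
best = max(queue, key=lambda d: len(toy_parser(d)))
print(len(queue), best[:4].hex())
```

Without the `coverage - seen` check, every mutation would start again from
"hello" and the partial progress would be thrown away.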

~~~
3rd3
Why does it use fuzzed input in the first place? Couldn’t one just use random
input from the beginning instead? It would be effectively equivalent but
fuzzing of a "hello" string seems to be roundabout.

~~~
Zikes
In this case "hello" was just a pseudorandom starter to seed the fuzzer.

------
userbinator
I remember a very similar technique being used successfully for automatically
cracking software (registration keys/keyfiles, serial numbers) before
Internet-based validation and stronger crypto became common; the difference is
that that method didn't require having access to any source code or recompiling the
target, as it just traced execution and "evolved" itself toward inputs
producing longer and wider (i.e. more locations in the binary) traces.

~~~
joshschreuder
That sounds interesting, do you have any more information on this topic?

------
bonzoq
The author of this article is a hacker from the time when the word hacker
meant something different than it does today. I remember his website from my
early teens when I started using the internet via a dial-up connection back in
1998. Lcamtuf, glad to see you're still around. Your fellow countryman.

------
zackmorris
Potential instructions for trying this on Mac (I was unable to make it work,
perhaps we can build upon this):

curl -LO http://lcamtuf.coredump.cx/afl.tgz

tar zxvf afl.tgz

rm afl.tgz

cd afl*

make afl-gcc

make afl-fuzz

mkdir in_dir

echo 'hello' >in_dir/hello

# there is a glitch with the libjpeg-turbo-1.3.1 configure file that makes it
# difficult to compile on Mac, so I tried regular libjpeg:

curl -LO http://www.ijg.org/files/jpegsrc.v8c.tar.gz

tar zxvf jpegsrc.v8c.tar.gz

cd jpeg-8c/

CC=../afl-gcc ./configure

make

# error: C compiler cannot create executables

# if the above command worked to build an instrumented djpeg, then this should
# work

cd ..

./afl-fuzz -i in_dir -o out_dir ./jpeg-8c/djpeg

~~~
cr3ative
Hello,

Install homebrew if you don't have it already, then

    
    
       brew install gcc
    

Then in the afl* folder:

    
    
       CC=gcc-4.9 make clean all
    

Fixes this so that jpeg-8c will compile.

However, we then get stuck, as djpeg is a shell file (and .libs/djpeg exits
with error 5), and I've got a bit too distracted to continue. Good luck!

~~~
birkbork
--disable-shared

------
im2w1l
Regarding

>if (strcmp(header.magic_password, "h4ck3d by p1gZ")) goto terminate_now;

How impossible would it be to look at the branching instruction, perform a
taint analysis on its input, and see if there is any part of the input we can
tweak to make it branch/not branch? Like, we jumped because the zero flag was
set. And the zero flag was set because these two bytes were equal. Hmm, that
byte is hardcoded. This other byte was mov'd here from that memory address.
That memory address was set by this call to fread... hey, it comes from this
byte in the input file.
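
A toy Python model of that chain of reasoning, as a sketch only: every value
carries the set of input offsets it came from, so a failing comparison can be
traced back to the byte to patch. `Tainted`, `read_input`, `check_magic`, and
`solve` are hypothetical names, and the check is assumed to sit at a fixed
offset; a real tool would have to track taint through machine instructions
rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    value: int
    origins: frozenset  # input-file offsets this value was derived from


def read_input(data: bytes):
    """The 'fread' step: label every byte with the offset it came from."""
    return [Tainted(b, frozenset({i})) for i, b in enumerate(data)]


def check_magic(cells, offset: int, magic: bytes):
    """Hypothetical parser check. On the first failing comparison, report
    (input offsets involved, byte value that would make it pass)."""
    for i, want in enumerate(magic):
        cell = cells[offset + i]
        if cell.value != want:
            return cell.origins, want
    return None  # every comparison passed


def solve(data: bytes, offset: int, magic: bytes) -> bytes:
    """Patch the input byte behind each failing branch until the check passes."""
    buf = bytearray(data)
    while (failure := check_magic(read_input(bytes(buf)), offset, magic)):
        origins, want = failure
        (pos,) = origins  # in this toy, exactly one input byte feeds the compare
        buf[pos] = want
    return bytes(buf)


fixed = solve(b"....not the password", 4, b"h4ck3d by p1gZ")
print(fixed)
```

This only works because each compared byte is a plain copy of one input byte;
once the value is a function of many input bytes, "which byte do I tweak" stops
having a simple answer.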

~~~
JoshTriplett
[https://en.wikipedia.org/wiki/Symbolic_execution](https://en.wikipedia.org/wiki/Symbolic_execution)

Quite possible. More commonly done with higher-level languages rather than
machine code, but certainly possible with machine code. A good fuzzer could do
this too.

The fuzzer from the article, american-fuzzy-lop
([https://code.google.com/p/american-fuzzy-lop/](https://code.google.com/p/american-fuzzy-lop/)), does something similar
to this as it moves forward in execution, trying to find interesting inputs
that cause the program to take a different code path. Symbolic execution could
accelerate that process, allowing afl to immediately identify the relevant
things to fuzz, rather than randomly mutating and looking for interestingness.
On the other hand, unless the program in question runs very slowly, or uses
many complex compound instructions before a single conditional branch, random
mutation seems likely to produce results rapidly from sheer speed.

Symbolic execution does seem like it would work well if you want to reach a
specific point in the program, and you have rather complex conditionals
required to get there. But it would still have trouble with complex cases.
Consider a file format with a SHA256 hash in it, where the header must have a
valid hash to parse. Symbolic execution would have a very hard time figuring
out the input relationship required to get past that hash check.

~~~
darkmighty
Yea I thought of hashes too. Because there are hashes proven (?) to be secure,
it follows that it's impossible to make a universally efficient fuzzer (i.e.
one that necessarily spends much less than ~exp(parser size) time).

~~~
ynik
There are no hashes that are proven to be secure. And we aren't likely to get
such a proof any time soon: secure hashes can only exist if P != NP.

------
rainforest
See also: Microsoft Code Digger [1], which generates inputs using symbolic
execution for .NET code, and EvoSuite, which uses a genetic algorithm to do
the same for Java [2].

[1] : [http://blogs.msdn.com/b/nikolait/archive/2013/04/23/introducing-code-digger-an-extension-for-vs2012.aspx](http://blogs.msdn.com/b/nikolait/archive/2013/04/23/introducing-code-digger-an-extension-for-vs2012.aspx)

[2] : [http://www.evosuite.org](http://www.evosuite.org)

------
Mchl
I like to imagine that given enough time it eventually generates the Lenna [1]
jpeg and exits

[1] :
[https://en.wikipedia.org/wiki/Lenna](https://en.wikipedia.org/wiki/Lenna)

------
gear54rus
I had a brief 'It's alive :O' moment when reading this, imagine seeing a face
looking at you in one of those pics :)

Nice article, the concept of fuzzers was new to me.

------
raisedbyninjas
I'm not familiar with how the fuzzer was monitoring the executed code path.
Would this be thwarted by address space layout randomization?

~~~
jeffmcjunkin
No. afl requires an instrumented (compiled with extra information) executable,
and watches the code paths. When fuzzing a seed finds a new code path, it will
recycle that fuzzed version as a new seed.

ASLR can help prevent successful exploitation of bugs that afl might find, but
it won't prevent the program from crashing in the first place.

(Plus, since afl requires compiling the binary, I doubt it bothers to enable
ASLR. There's no benefit for fuzzing purposes.)

~~~
williamsharkey
Can you fuzz an uninstrumented executable to add instrumentation?

~~~
f-
You can add instrumentation to binaries using DynamoRIO or pin. This isn't
currently supported by afl-fuzz out of the box, although there's nothing that
makes it fundamentally difficult.

------
ionforce
Sounds very much like a genetic algorithm/evolutionary computation.

~~~
f-
Sure, it's even called that on the project page :-) It just uses an
interesting fitness function that knows nothing about the underlying data
format - essentially, "improve the edge coverage in this black-box binary".
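
A rough Python sketch of what an edge-coverage signal like that can look like,
loosely following the scheme described in afl's technical notes (XOR the
current block ID with a shifted copy of the previous one). The block IDs,
`edge_map`, and `is_interesting` are made up for illustration, and the map
size is an assumption.

```python
MAP_SIZE = 1 << 16  # assumed size of the coverage map


def edge_map(block_ids):
    """Fold a trace of basic-block IDs into hit counts per (prev -> cur) edge."""
    hits = {}
    prev = 0
    for cur in block_ids:
        slot = (cur ^ prev) % MAP_SIZE
        hits[slot] = hits.get(slot, 0) + 1
        prev = cur >> 1  # the shift keeps A->B and B->A in different slots
    return hits


def is_interesting(hits, global_seen):
    """True if this run touched an edge no earlier run has touched."""
    new = set(hits) - global_seen
    global_seen |= new
    return bool(new)


# Hypothetical block IDs; a real build would assign them at compile time.
A, B, C = 0x1234, 0x5678, 0x9ABC
seen_edges = set()
first = is_interesting(edge_map([A, B, C]), seen_edges)   # new edges
repeat = is_interesting(edge_map([A, B, C]), seen_edges)  # nothing new
swapped = is_interesting(edge_map([A, C, B]), seen_edges) # same blocks, new edges
print(first, repeat, swapped)
```

Note that the last trace visits exactly the same blocks as the first one; it
only counts as new because the transitions between them differ, which is the
difference between edge coverage and plain block coverage.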

------
bane
Wow, two awesome ideas in a week. Reminds me of this posted just a couple days
ago [http://reverseocr.tumblr.com/](http://reverseocr.tumblr.com/)

------
JonnieCache
Now to try this with midi...

But what to feed it into? I could make some musical analysis stuff, but do I
need to write it in C to avoid accidentally fuzzing my interpreter?

~~~
gregwtmtno
I'm running it now on an mp3 encoder. So far, no results, but I'll update if I
get anything out of it.

~~~
f-
Encoder? You'd probably want to try a decoder as the target binary if you want
to make MP3s.

------
1ris
OT: Is there a simple, little fuzzer that just uses grammars as templates for
its output?

~~~
tptacek
Yes, lots and lots and lots of them. Peach is the most popular example.

~~~
101914
"a simple, little fuzzer"

The download for Peach is 190MB??
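
For scale, the core of a grammar-template generator fits in a few lines. A
minimal Python sketch, assuming a made-up arithmetic grammar and a hypothetical
`generate` helper; it is not how Peach or any other real tool works, and real
tools layer much more on top of this idea.

```python
import random

# Hypothetical toy grammar. By convention here, the first alternative of every
# rule is the one that leads to a terminal soonest, so it is used to stop
# recursion once the depth limit is hit.
GRAMMAR = {
    "<expr>": [["<term>"], ["<term>", "+", "<expr>"], ["<term>", "-", "<expr>"]],
    "<term>": [["<num>"], ["(", "<expr>", ")"]],
    "<num>": [["0"], ["1"], ["42"]],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand a grammar symbol into a random string it can derive."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    options = GRAMMAR[symbol]
    chosen = options[0] if depth >= max_depth else rng.choice(options)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in chosen)


rng = random.Random(1)
samples = [generate("<expr>", rng) for _ in range(5)]
for s in samples:
    print(s)
```

Every sample is syntactically valid by construction, which is the appeal of
the approach and also its limit: it only explores what the grammar describes.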

------
stevebot
> _You can throw afl-fuzz at many other types of parsers with similar results:
> with bash, it will write valid scripts;_

^ that seems fun, I just don't think I would run it on my machine for fear of
what it might create (oh.. rm -rf * ok!)

------
vitamen
The beginning of this article reads eerily similar to the beginning of Greg
Egan's Diaspora, though in a much more limited context.

------
fenollp
What if we feed afl a program that checks whether a number is prime? Will it
slowly discover a way to make primes?

------
fit2rule
This is awesome .. "Go Away or I will Replace You With a Fuzz" seems like my
next t-shirt order ..

------
slvn
This what a hacker be.

------
byEngineer
This is totally amazing! Wondering if it would be possible to go the other way
around: from generated JPG to a string. If yes, what a cool way to send your
password as a... JPG over email.

~~~
smkdtr
Along similar lines, I wonder if this fuzzer can be used to bruteforce
passwords for applications. Would it do any better than the standard "try all
the combinations" method?

~~~
tptacek
Not really, because it depends on collecting traces from the target, and if
you can do that you can usually just read the password out of memory.

~~~
bri3d
On the flip side, it could probably be used as a really slow universal keygen
for naive license-key implementations :)

