If it's smart enough to learn how to build a JPEG in a day, use it with netcat and it could probably send quite a lot of things down in flames.
Who needs static analysis :) ?
> it triggers a slightly different internal code path in the tested app
This would be impossible on the network.
I never understood this attitude. It has always been my experience that obscurity is in fact an important part of security. It's a weakness when mistaken for security, not when understood as part of it.
Sadly, I do actually have signatures (with version information) to mute.
I mean, remember that the net is full of slow brute-forcers and the like. Just because it takes a few days to run through all the exploits doesn't mean that someone won't do it - that's thinking of security in human, individual terms, as though the threat is targeted rather than general.
And if you need your headers to tell you what versions are running and what tools are installed you are doing something else very wrong.
Ah yes, I forgot this little detail. I wonder if you can get it to work on the local machine first, but talking through a socket instead of stdin.
Then, pipe the result through netcat!
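A rough sketch of the "talk through a socket instead of stdin" idea, in Python rather than netcat. Everything here is an assumption made for illustration: the toy server stands in for the real target, and `start_toy_server` and `send_test_case` are hypothetical helper names, not part of afl.

```python
import socket
import threading

def start_toy_server(host="127.0.0.1"):
    """Hypothetical stand-in for the target: accept one connection, record what arrives."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    received = []

    def serve():
        conn, _ = srv.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
            received.append(b"".join(chunks))
        srv.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return srv.getsockname()[1], thread, received

def send_test_case(payload, port, host="127.0.0.1"):
    """Deliver one generated input over TCP, roughly what `nc host port < file` does."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(payload)

port, thread, received = start_toy_server()
payload = b"\xff\xd8\xff\xe0 fuzzed bytes"
send_test_case(payload, port)
thread.join(timeout=5)
print(received[0] == payload)
```

Note that this only covers delivery. The code-path feedback the fuzzer relies on does not travel back over the socket, which is the objection raised above.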
(This isn't a new concept, although afl is a particularly tight implementation of it; you can look up the paper for "autodafe" for a (much) earlier version).
That's got to be the funniest and most appropriate name for a piece of software ever.
Fuzzing like this is a very effective technique for finding (security) bugs in programs that parse input, because you will quickly end up with "impossible" input nobody thought to check for (but is close enough that it won't be rejected outright), and whoops there's your buffer overflow.
In this particular case, the fuzzer is going beyond just throwing random input, as it considers which changes to the input trigger new code paths in the target binary, and therefore should have a higher success rate in triggering bugs compared to just trying random stuff. And don't forget, this will work with any type of program and file type, not just .jpgs and the djpeg binary.
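To make the "keep changes that trigger new code paths" idea concrete, here is a minimal sketch of a coverage-guided loop. It is not afl's implementation: `toy_parser` is a hypothetical target that reports which checks it passed, standing in for the instrumentation afl compiles into a real binary, and the mutation strategy is deliberately crude.

```python
import random

def toy_parser(data, coverage):
    """Hypothetical target: each satisfied check records a branch id in `coverage`."""
    magic = b"JPG!"
    for i, expected in enumerate(magic):
        if len(data) <= i or data[i] != expected:
            return False          # rejected at the first failed check
        coverage.add(i)           # this branch was reached
    return True

def mutate(data, rng):
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed_input, max_iterations=500_000, rng_seed=0):
    rng = random.Random(rng_seed)
    queue = [seed_input]          # inputs worth mutating further
    seen = set()                  # branch ids observed so far
    for _ in range(max_iterations):
        candidate = mutate(rng.choice(queue), rng)
        coverage = set()
        accepted = toy_parser(candidate, coverage)
        if not coverage <= seen:  # candidate reached a branch not seen before
            seen |= coverage
            queue.append(candidate)
        if accepted:
            return candidate
    return None                   # budget exhausted without an accepted input

result = fuzz(b"\x00\x00\x00\x00")
print(result)
```

Without the feedback, guessing four specific bytes blindly takes on the order of 2^32 tries. With it, each byte is found separately, so the same search finishes in thousands of iterations.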
To expand on this, techniques like this are called whitebox fuzzing (or maybe graybox in afl's case). At the extreme, whitebox fuzzers even incorporate constraint solvers to directly solve for inputs that take the program to previously unexplored paths. One very impressive project is the SAGE whitebox fuzzer [1,2,3] that's in production use at Microsoft (an internal project sadly). I work in the related field of automated test generation, but all my tools are very much research-grade. However, in SAGE they've done all the work of figuring out how 24/7 whitebox fuzzing can be integrated into the development process. I am somewhat envious of the researchers getting to work in an environment where that is possible. If you're interested I very much recommend reading the papers on SAGE.
[1] Poster about SAGE: http://research.microsoft.com/en-us/um/people/pg/public_psfi...
[2] An approachable article on SAGE: http://research.microsoft.com/en-us/um/people/pg/public_psfi...
[3] The paper with all the details: http://research.microsoft.com/en-us/projects/atg/ndss2008.pd...
So, while I suspect it's very cool, it's also a bit of a no-op for everybody else. It's also impossible to independently evaluate the benefits: for example, its performance cost, the amount of fine-tuning and configuration required for each target, the relative gains compared to less sophisticated instrumented fuzzing strategies, etc.
You could also have started with a valid .jpg with lots of complicated embedded exif metadata sections etc, and have a good chance of triggering bugs in those code paths without having to "discover exif" first.
1) JPEG file structure is complex (much more so than a BMP file for example).
2) Imagine you didn't know what it looked like, but had a tool that could "read" them: djpeg in this case.
3) What the fuzzer does is "peek" at how the djpeg tool reacts to random "fuzzed" data. It's smart about it, understanding when a bit of data makes the tool do something new (code path)
4) After millions of iterations it "learns" how JPEG file structure works and can generate valid JPEGs.
This might not be very relevant for JPEGs, given that you can just Google their structure. This is cool because it shows how the tool could figure out (or find a weakness in) something where you don't know how it works.
It is better to look at it as a maze solver that tries to generate instructions for a robot that will lead that robot through the maze, while that robot has its own weird way of interpreting the instructions.
afl starts feeding it random inputs, gradually gaining a limited understanding of what it accepts and what it does not.
Eventually, it gathers enough info (from the instrumentation compiled into the target, which reports the code paths each input triggers) to start pulling JPEG images out of thin air like it's nothing.
The catch is that afl learns all this by itself, and it can do so with any other parser program (like ELF parsers, GIF parsers, whatever). It is a general purpose tool.
Also, a similar concept can be applied to things like web servers, or any other servers (since they parse what clients send them).
curl -LO http://lcamtuf.coredump.cx/afl.tgz
tar zxvf afl.tgz
echo 'hello' >in_dir/hello
# there is a glitch with the libjpeg-turbo-1.3.1 configure file that makes it difficult to compile on Mac, so I tried regular libjpeg:
curl -LO http://www.ijg.org/files/jpegsrc.v8c.tar.gz
tar zxvf jpegsrc.v8c.tar.gz
# error: C compiler cannot create executables
# if the above command worked to build an instrumented djpeg, then this should work
./afl-fuzz -i in_dir -o out_dir ./jpeg-8c/djpeg
Install homebrew if you don't have it already, then
brew install gcc
CC=gcc-4.9 make clean all
However, we then get stuck, as djpeg is a shell script rather than a binary (and .libs/djpeg exits with error 5), and I've gotten too distracted to continue. Good luck!
>if (strcmp(header.magic_password, "h4ck3d by p1gZ")) goto terminate_now;
How impossible would it be to look at the branching instruction, perform a taint analysis on its input, and see if there is any part of the input we can tweak to make it branch/not branch?
Like, we jumped because the zero flag was set. And the zero flag was set because these two bytes were equal. Hmm, that byte is hardcoded. This other byte was mov'd here from that memory address. That memory address was set by this call to fread... hey, it comes from this byte in the input file.
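A toy sketch of that chain of reasoning, done on Python values instead of machine code. All names here (`Tainted`, `read_input`, `compare`) are hypothetical and invented for illustration; a real tool would track this per instruction and per register.

```python
class Tainted:
    """A byte value plus the set of input-file offsets it was derived from."""
    def __init__(self, value, sources=frozenset()):
        self.value = value & 0xFF
        self.sources = frozenset(sources)

    def __xor__(self, other):
        other = as_tainted(other)
        return Tainted(self.value ^ other.value, self.sources | other.sources)

    def __add__(self, other):
        other = as_tainted(other)
        return Tainted(self.value + other.value, self.sources | other.sources)

def as_tainted(x):
    """Wrap plain constants; hardcoded values carry no taint."""
    return x if isinstance(x, Tainted) else Tainted(x)

def read_input(data):
    """Model of fread: each byte is tainted with its own file offset."""
    return [Tainted(b, {i}) for i, b in enumerate(data)]

def compare(lhs, rhs, log):
    """Model of cmp + conditional jump: log which input offsets feed the branch."""
    lhs, rhs = as_tainted(lhs), as_tainted(rhs)
    log.append(sorted(lhs.sources | rhs.sources))
    return lhs.value == rhs.value

log = []
buf = read_input(b"\x10\x20\x30\x40")
compare(buf[2], 0x7F, log)             # input byte 2 against a hardcoded constant
compare(buf[0] ^ buf[1], buf[3], log)  # derived value against input byte 3
print(log)  # [[2], [0, 1, 3]]
```

Each log entry lists the file offsets worth tweaking to flip that branch; an empty list would mean the branch does not depend on the input at all.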
Quite possible. More commonly done with higher-level languages rather than machine code, but certainly possible with machine code. A good fuzzer could do this too.
The fuzzer from the article, american-fuzzy-lop (https://code.google.com/p/american-fuzzy-lop/), does something similar to this as it moves forward in execution, trying to find interesting inputs that cause the program to take a different code path. Symbolic execution could accelerate that process, allowing afl to immediately identify the relevant things to fuzz, rather than randomly mutating and looking for interestingness. On the other hand, unless the program in question runs very slowly, or uses many complex compound instructions before a single conditional branch, random mutation seems likely to produce results rapidly from sheer speed.
Symbolic execution does seem like it would work well if you want to reach a specific point in the program, and you have rather complex conditionals required to get there. But it would still have trouble with complex cases. Consider a file format with a SHA256 hash in it, where the header must have a valid hash to parse. Symbolic execution would have a very hard time figuring out the input relationship required to get past that hash check.
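A small sketch of the hash problem, assuming a made-up file format (a 32-byte SHA-256 digest followed by the body). It shows that blind bit flips on a valid file never get past the check, and one common workaround: mutate only the body and recompute the digest in the harness. `build_file` and `parse` are hypothetical names.

```python
import hashlib
import random

def build_file(body):
    """Hypothetical format: SHA-256 digest of the body, then the body itself."""
    return hashlib.sha256(body).digest() + body

def parse(data):
    """Accept only if the stored digest matches the body."""
    if len(data) < 32:
        return False
    digest, body = data[:32], data[32:]
    return hashlib.sha256(body).digest() == digest

def flip_random_bit(data, rng):
    """Flip one randomly chosen bit somewhere in `data`."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf))
    buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)

rng = random.Random(0)
valid = build_file(b"hypothetical payload")

# Blind mutation of the whole file: digest and body no longer agree.
mutants = [flip_random_bit(valid, rng) for _ in range(1000)]
accepted_blind = sum(parse(m) for m in mutants)

# Harness-side fix-up: mutate the body only, then recompute the digest.
body_mutants = [flip_random_bit(valid[32:], rng) for _ in range(1000)]
accepted_fixed = sum(parse(build_file(b)) for b in body_mutants)

print(accepted_blind, accepted_fixed)  # 0 1000
```

The fix-up only works because the harness knows the format. Neither random mutation nor a constraint solver is expected to find that relationship by itself.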
It seems to me that all these methods would eventually run into the Halting Problem. Trying to fuzz through a hash (or other crypto) this way would essentially involve having to break it by a slightly more "intelligent" version of bruteforce.
>The left side was computed by summing this and this. That was in turn computed by xoring that and... Screw it. The left side can not be controlled. Now, the right side was loaded from this part of the file. Aha! Let's just change that part instead.
 : http://blogs.msdn.com/b/nikolait/archive/2013/04/23/introduc...
 : http://www.evosuite.org
 : https://en.wikipedia.org/wiki/Lenna
Nice article, concept of fuzzers was new to me.
3) Instrumenting programs for use with AFL
Instrumentation is injected by a companion tool called afl-gcc. It is meant to
be used as a drop-in replacement for GCC, directly pluggable into the standard
build process for any third-party code.
The correct way to recompile the target program will vary depending on the
specifics of the build process, but a common approach may be:
$ CC=/path/to/afl/afl-gcc ./configure
$ make clean all
ASLR can help prevent successful exploitation of bugs that afl might find, but it won't prevent the program from crashing in the first place.
(Plus, since afl requires compiling the binary, I doubt it bothers to enable ASLR. There's no benefit for fuzzing purposes.)
But what to feed it into? I could make some musical analysis stuff, but do I need to write it in C to avoid accidentally fuzzing my interpreter?
The download for Peach is 190MB??
^ that seems fun, I just don't think I would run it on my machine for fear of what it might create (oh.. rm -rf * ok!)
All the fuzzer is doing is exploring the possible codepaths through the application trying to exercise all the code; many of the codepaths end up with the executable outputting an error message and terminating. Some maybe put it into an infinite loop. Some end up with it completing a JPEG data parse and terminating - so in amongst all the possible paths it explores, of course it will eventually seek out input sequences which bring that about.
If it's not, better attacks exist than trying every single password.
If you're trying to crack the application - not the password - maybe, but I kinda doubt it.