
Ask HN: What is the emerging state of the art in fuzzing techniques? - hwhatwhatwhat
I'm fairly familiar with the popular tools such as afl and Codenomicon Defensics, but I find the academic literature very opaque and don't really know where to start.

If I want to understand the cutting edge of fuzzing techniques, and what will be the emerging state of the art over the next few years - where should I look? Any good papers or books (at least some of them approachable for novices), or research projects that are laying the groundwork for a new state of the art?
======
wepple
In general, have a poke around
[https://fuzzing.info/papers/](https://fuzzing.info/papers/)

First, I think the next big step in fuzzing will actually be a _complement_ to
fuzzing - solving.

AFL and friends can bitbang their way to massive code coverage, but can still
fail on fairly simple test cases. Recent research[1] by the authors of
angr[2] shows that pairing the brute-force coverage and exception discovery
of a tool like AFL with constraint-solving tools can really dig deep into a
program, by actually solving the path to a given block of code. Microsoft's
famous SAGE fuzzer does this too, IIRC.
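
A minimal sketch of that pairing, with toy stand-ins throughout (the target, mutator, and solver below are all made up; a real setup would drive AFL and a symbolic executor like angr):

```python
import random

def target(data: bytes) -> set:
    """Toy target: the magic-byte check is exactly where dumb mutation stalls."""
    edges = {"entry"}
    if data[:4] == b"\x7fELF":
        edges.add("magic_ok")
        if len(data) > 4 and data[4] == 0x41:
            edges.add("deep")
    return edges

def mutate(seed: bytes) -> bytes:
    """AFL-style dumb mutation: flip one random bit."""
    buf = bytearray(seed or b"\x00")
    buf[random.randrange(len(buf))] ^= 1 << random.randrange(8)
    return bytes(buf)

def solve(edge: str):
    """Stand-in for the solver step: Driller would ask angr for an input
    satisfying the uncovered branch's path constraints; we hardcode answers."""
    return {"magic_ok": b"\x7fELF", "deep": b"\x7fELF\x41"}.get(edge)

seeds, coverage = [b"seed"], set()
for _ in range(50):
    stalled = True
    for _ in range(200):                        # cheap brute-force phase
        inp = mutate(random.choice(seeds))
        new_edges = target(inp) - coverage
        if new_edges:
            coverage |= new_edges
            seeds.append(inp)
            stalled = False
    if stalled:                                 # plateau: "drill" one branch
        for edge in ("magic_ok", "deep"):
            if edge not in coverage:
                solved = solve(edge)
                if solved:
                    coverage |= target(solved)  # execute the solved input
                    seeds.append(solved)
                break

print(sorted(coverage))  # all three edges; mutation alone rarely leaves "entry"
```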

Second, I think there are still massive opportunities for fuzzing closed-source
programs, as well as programs with tricky state, such as browsers or network
daemons.

[1] [https://www.internetsociety.org/sites/default/files/blogs-
me...](https://www.internetsociety.org/sites/default/files/blogs-
media/driller-augmenting-fuzzing-through-selective-symbolic-execution.pdf)

[2] [http://angr.io](http://angr.io)

~~~
joosters
Has anyone tried combining fuzzing with 'all-pairs'/'pairwise' test case
generation?

The idea of pairwise testing is that individual features in a program are
commonly tested, but _combinations_ of them are often poorly tested. However,
trying to test all features with each other soon becomes a combinatorial
nightmare. To deal with this, you use an algorithm (e.g. see the code for
'allpairs' at
[http://www.satisfice.com/tools.shtml](http://www.satisfice.com/tools.shtml) )
to pick a minimal set of test cases that cover all possible pairs of
configuration settings.

These test cases could then be used as starting points for fuzzing, to provide
greater code coverage faster.
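
To make the idea concrete, here is a minimal greedy all-pairs sketch in Python. It is naive (it enumerates the full cartesian product as candidates), but fine for small configuration spaces; the allpairs tool linked above and NIST's ACTS in the reply below are the real thing:

```python
from itertools import combinations, product

def allpairs(params):
    """Greedy pairwise reduction: repeatedly pick the candidate test that
    covers the most still-uncovered value pairs."""
    names = list(params)

    def pairs_of(test):
        # All (column, value, column, value) pairs one test covers.
        return {(i, test[i], j, test[j])
                for i, j in combinations(range(len(names)), 2)}

    candidates = list(product(*params.values()))
    uncovered = set().union(*(pairs_of(t) for t in candidates))
    tests = []
    while uncovered:
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        gained = pairs_of(best) & uncovered
        if not gained:
            break
        uncovered -= gained
        tests.append(dict(zip(names, best)))
    return tests

# 2x2x2 grid: exhaustive testing needs 8 cases; all pairs fit in far fewer.
for test in allpairs({"os": ["linux", "win"],
                      "tls": ["1.2", "1.3"],
                      "ipv": [4, 6]}):
    print(test)
```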

~~~
wyldfire
NIST has done some research on combinatorial software testing and has a pretty
good package [1] I've used before for generating test cases given N
dimensions.

The idea behind their software is to maximize the effectiveness of test time
because testing those N dimensions exhaustively is infeasible.

[1]
[http://csrc.nist.gov/groups/SNS/acts/index.html](http://csrc.nist.gov/groups/SNS/acts/index.html)

------
scriptdevil
I am very interested in learning about this too! I started out with AFL for
fuzzing but soon had to move to LLVM's libFuzzer, because I didn't want non-
ASCII inputs (by design, we know we won't get those) and because
SanitizerCoverage seemed to be more robust than the 64 kB shared-memory array
that AFL uses, at least for large programs.

However, libFuzzer being an in-process fuzzer has created a lot of headaches
of its own - especially in places where we malloc things and expect the free
to happen implicitly at exit. In libFuzzer's case, the exit is caught and the
entry-point function is restarted, causing memory leaks and OOM crashes. This
forced me to add #ifdef FUZZ ... #endif lines to the codebase - different
behavior in the fuzzed and unfuzzed cases, which felt wrong.
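
The failure mode, sketched generically in Python (nothing here is libFuzzer's actual API; it just models an in-process driver loop): anything the harness allocates and parks globally, expecting process exit to clean it up, accumulates across iterations until OOM.

```python
LEAKED = []   # stand-in for "malloc it and let process exit free it" state

def test_one_input(data: bytes) -> None:
    """Hypothetical harness body: allocates per call, never frees."""
    LEAKED.append(bytearray(len(data)))   # pinned by the global forever

# An in-process fuzzer's driver: one process, the harness called in a loop,
# so the "cleanup at exit" never happens between iterations.
for _ in range(100_000):
    test_one_input(b"x" * 1024)

print(f"{sum(len(b) for b in LEAKED) // 2**20} MiB still pinned")
```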

I have considered implementing an out-of-process fuzzer from scratch (or
basing it on AFL), but have been holding off until I get time to read up on
more prior work, given that this is not the highest priority at work.

That said, SAGE from Microsoft seems really interesting[1]. It generates
inputs intelligently by constraint-solving on inputs to conditional
statements. It isn't exactly new though.

[1] [http://research.microsoft.com/en-
us/um/people/pg/public_psfi...](http://research.microsoft.com/en-
us/um/people/pg/public_psfiles/sage-in-one-slide.pdf)

~~~
jerf
Bear in mind that while AFL may feed you a binary string, you can "decode"
that into whatever you want. It may take a bit of creativity or bit-shifting,
but it can be done. You're free to grab 8 bits off the front and start setting
global run-time flags or something, for instance. It's just data.
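
A minimal sketch of that decoding trick (the flag names and the parse function are invented for illustration): peel one byte off the front for run-time flags, then hand the rest to the code under test.

```python
import sys

FLAG_STRICT, FLAG_UNICODE = 0x01, 0x02   # hypothetical run-time flags

def parse(payload: bytes, strict: bool, unicode_ok: bool) -> None:
    """Stand-in for the real function under test."""
    if unicode_ok:
        payload.decode("utf-8", "replace")
    if strict and not payload:
        raise ValueError("empty input in strict mode")

def harness(data: bytes) -> None:
    if not data:
        return
    flags, payload = data[0], data[1:]    # first byte drives global behavior
    parse(payload,
          strict=bool(flags & FLAG_STRICT),
          unicode_ok=bool(flags & FLAG_UNICODE))

if __name__ == "__main__":
    harness(sys.stdin.buffer.read())      # AFL feeds the test case on stdin
```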

Also, due to the way AFL works, if you have a branch at the beginning of your
program that immediately exits when it sees non-ASCII input, you don't lose
all that much time, because AFL sees that as the same branch being exercised
over and over again and hammers only on the inputs that progressed past it.
In fact, I think I almost _always_ use AFL on text-based protocols, and it
works fine. It's a common use case for AFL.

~~~
scriptdevil
While I didn't internally decode the bitstring, you are right. AFL did
generate tonnes of useful tests and I did uncover a couple of bugs using it.

That said, given that I didn't spend too much time actually trying to
understand AFL, could you clarify whether I was right in my understanding
that AFL doesn't track true coverage, but rather a heuristic using a table,
as documented in
[http://lcamtuf.coredump.cx/afl/technical_details.txt](http://lcamtuf.coredump.cx/afl/technical_details.txt)?
Given that the program I was fuzzing was HUGE, wouldn't it falsely alias
branches?

Thank you for your input!

~~~
jerf
It could probably go either way, depending on the nature of your program. It
depends on the correlation between "separate paths of execution" and "separate
test cases". Certainly that's strong for a lot of programs, but if I sat down
to construct pathological cases I probably could, and in the world of Turing
chaos that programs inhabit, if pathological cases _can_ exist, you can count
on hitting them sometimes, no matter what low probability of that outcome you
think you can justify.

AFL is definitely heuristic, and thus can conceivably be fooled in places
where a true symbolic execution wouldn't be. On the other hand, it's very fast
and easy to set up and use. Can't ever have it all. :)
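
For intuition on the aliasing question above, here is a toy simulation of the edge-hashing scheme from the technical_details doc (each basic block gets a random compile-time ID; edge A->B is recorded at map index (id_A >> 1) ^ id_B). Everything except that hashing scheme is made up:

```python
import random

MAP_SIZE = 1 << 16            # AFL's 64 kB shared-memory coverage map
random.seed(0)

n_blocks, n_edges = 20_000, 100_000    # a "HUGE" program, roughly
block_id = [random.randrange(MAP_SIZE) for _ in range(n_blocks)]

owner, collisions = {}, 0
for _ in range(n_edges):               # sample random edges between blocks
    a, b = random.randrange(n_blocks), random.randrange(n_blocks)
    idx = (block_id[a] >> 1) ^ block_id[b]     # AFL's edge hash
    if owner.setdefault(idx, (a, b)) != (a, b):
        collisions += 1                # two distinct edges share one byte

print(f"{collisions} of {n_edges} sampled edges aliased an occupied slot")
```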

------
robto
I've started using clojure.spec[0] in my day-to-day programming with an eye
toward generative testing, which I understand is a form of fuzzing. I'm very
new to this, but it feels incredibly practical in terms of bang for buck -
like the 'cutting edge' of practical use. I'm not sure what its academic
background is, but I'd highly recommend reading and listening to what Rich
Hickey has to say about it[1]. He's a smart guy.

[0][http://clojure.org/about/spec](http://clojure.org/about/spec)
[1][http://blog.cognitect.com/cognicast/103](http://blog.cognitect.com/cognicast/103)

~~~
freshhawk
I'll definitely second this one. I'm still getting comfortable with Clojure
Spec and the hooks into test.check (it's brand new, and I'm unfortunately no
longer doing Clojure for my day job), but it's already great, and I'm only at
the beginner/intermediate stage.

------
qntty
For people as confused about this post as I was:

[https://en.wikipedia.org/wiki/Fuzz_testing](https://en.wikipedia.org/wiki/Fuzz_testing)

------
ikisusi
There is a 30-second video based on 90 minutes of a collaborative experiment
by 22 fuzzing practitioners. The aim was to write down a couple of points
about the state of the art in fuzzing.

Video:
[https://www.youtube.com/watch?v=UrhRUKgeDQI](https://www.youtube.com/watch?v=UrhRUKgeDQI)

Text: [https://github.com/ouspg/ouspg-
open/blob/master/presentation...](https://github.com/ouspg/ouspg-
open/blob/master/presentations/fuzzing-sota.md)

Updates e.g. as pull requests most welcome. :)

------
hguant
DARPA's Cyber Grand Challenge is pushing the state of the art when it comes to
RE, exploitation, and the like. There's a very presentable talk by Mike
Walker, who's heading up the project with a bunch of ex-Raytheon guys.

There are links to some repos in the talk; it's not exactly what you're
looking for, but if you're interested, it's a good resource.

[https://www.youtube.com/watch?v=ejPghbtAG58](https://www.youtube.com/watch?v=ejPghbtAG58)

------
Harvester57
Here you have some interesting work from Fabien Duchene, an ENSIMAG/CEA
researcher, on black-box genetic fuzzing (I know it sounds like a lot of
buzzwords, and in fact it was mocked a little during SSTIC 2016, but it's
really good stuff!)

[http://hal.univ-grenoble-alpes.fr/hal-00978844/](http://hal.univ-grenoble-
alpes.fr/hal-00978844/) [https://dl.acm.org.sci-
hub.cc/citation.cfm?id=2557550&dl=ACM...](https://dl.acm.org.sci-
hub.cc/citation.cfm?id=2557550&dl=ACM&coll=DL&CFID=812652285&CFTOKEN=99735597)

~~~
scriptdevil
I don't know if adding a sci-hub link is a good idea.

~~~
endergen
Why not?

~~~
zokier
Would you accompany a movie recommendation with a piratebay link on HN?

~~~
linschn
Except this research was (most probably - I did not check) funded with French
or European taxpayers' money, and therefore people should not have to pay for
it twice.

The moral standpoint is very different from that of movie or music piracy.

~~~
nickpsecurity
We should try to extend the First Sale Doctrine to digital publications, if it
hasn't been tried already.

------
jorangreef
This is not exactly what you asked for, but if the code you want to fuzz is
written by yourself, you could learn fuzzing by doing the fuzzing yourself. It
might also be easier to do this at first, since you're closer to your code
and would need fewer adapters for existing fuzzing solutions. (The three steps
below are pulled together in a sketch after step 3.)

1\. Write a simple test function which will generate a very wide range of
allowed inputs to the function you want to test. Try to generate average
inputs most of the time, and outliers some of the time. Use a seeded Mersenne
Twister as your random number generator. For example, if the function you are
testing accepts an array of buffers, then for a single test of the function,
you could choose at random how many buffers to generate, and then at random
the length of each buffer, and then at random the contents of each buffer. You
could then call the function many times, each time with a different array of
inputs. Or if you were testing a document editor or CRDT, you might want to
randomly generate different combinations of user edits, e.g. a delete 10% of
the time, an insert 50% of the time, etc.

2\. Write the simplest possible independent implementation of the function you
want to test. For example, if you are testing a custom hash map, you could use
the hash map from your standard library as the basis for the independent
implementation. Or if you were testing a key/value storage engine, you could
consider using an in-memory hash as the basis for the independent
implementation.

3\. Run your random fuzz inputs from step 1 through both your implementations
and assert that the outputs of both are always the same at each step. Both
implementations could be called a few thousand times depending on the run
time.
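
A minimal sketch of the three steps in Python, whose built-in random module happens to be a seedable Mersenne Twister. MyHashMap is a hypothetical implementation under test; the built-in dict plays the simple independent reference:

```python
import random

class MyHashMap:
    """Hypothetical implementation under test; imagine a custom hash map."""
    def __init__(self):
        self._slots = {}
    def put(self, key, value):
        self._slots[key] = value
    def get(self, key):
        return self._slots.get(key)
    def delete(self, key):
        self._slots.pop(key, None)

rng = random.Random(42)          # seeded, so any failure reproduces exactly
subject, reference = MyHashMap(), {}

for step in range(10_000):
    # Step 1: weighted random operations; a small key space forces overwrites.
    op = rng.choices(["put", "get", "delete"], weights=[50, 40, 10])[0]
    key = rng.randrange(100)
    if op == "put":
        value = rng.getrandbits(32)
        subject.put(key, value)
        reference[key] = value   # Step 2: dict is the independent reference
    elif op == "delete":
        subject.delete(key)
        reference.pop(key, None)
    # Step 3: assert both implementations agree after every operation.
    assert subject.get(key) == reference.get(key), f"diverged at step {step}"

print("10,000 random operations, no divergence")
```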

------
SEJeff
Check out TriforceAFL as well, which allows using AFL (American Fuzzy Lop)
inside a QEMU virtual machine.

This is a pretty interesting writeup on it:
[https://www.nccgroup.trust/us/about-us/newsroom-and-
events/b...](https://www.nccgroup.trust/us/about-us/newsroom-and-
events/blog/2016/june/project-triforce-run-afl-on-everything/)

------
mytummyhertz
shameless self plug: [https://www.nccgroup.trust/us/about-us/newsroom-and-
events/b...](https://www.nccgroup.trust/us/about-us/newsroom-and-
events/blog/2016/june/project-triforce-run-afl-on-everything/)

It allows you to run AFL on arbitrary VMs. So far we've used it to find some
Linux vulnerabilities, and we're starting to find stuff in other operating
systems too. And we're just getting started :)

~~~
wyldfire
I read this recently via HN. Congrats, this is impressive.

------
anonymousDan
You could take a look at the proceedings of conferences like ICSE
(International Conference on Software Engineering). For example there's been a
lot of work over the last decade or so on making dynamic symbolic execution
techniques more practical (as exemplified by e.g. KLEE or SAGE).

------
humbleMouse
My friend is working on some next gen fuzzing stuff here:

[https://github.com/2trill2spill/nextgen](https://github.com/2trill2spill/nextgen)

~~~
2trill2spill
Thanks for the shout-out! I'm the nextgen author - if anyone has any
questions, comments, bugs, whatever, let me know. The project was started
because of frustrations with AFL and trinity. The "novel" feature at this
point is using DTrace for instrumentation instead of a compiler plugin like
AFL, which allows for instrumenting closed-source applications without using
qemu, as well as non-C/C++ applications. I have more planned for the future,
but I'm trying to finish the core of nextgen first - mainly the logging,
interprocess communication, and concurrency model.

------
pag
I worked on the cyber reasoning system (CRS) at Trail of Bits for our entry
into the Cyber Grand Challenge [1]. Some slides describing the system are here
[2].

Specifically, I implemented our fuzzer. I created a dynamic binary translator
[3] that emulated the DECREE [4] operating system and x86 architecture. It had
the Radamsa [5] mutator built in, along with a number of other simpler
mutators.

I think our fuzzer outperformed our competitors', though I am biased ;-) The
fuzzer was single-threaded, but could perform more than a million fuzz/mutate-
execute (with coverage) iterations every two hours. Even before I optimized
it, it beat the pants off PIN [6]. We ran many such fuzzer processes
concurrently. They would saturate the CPUs, and actually performed no I/O,
because I emulated all I/O in memory ;-) This was key to us achieving such
high throughput.

Our fuzzer wasn't super smart (though Radamsa is), but it benefited a lot from
a feedback loop with our symbolic executors [7]. The symbolic executors would
produce inputs that would then get fuzzed. These inputs could feed back into
the symbolic executors, etc. That added more brains to our system.

All in all, we ran the CRS across something like 300 large EC2 nodes (across
three availability zones). Per node, 8 or so fuzzer processes were running
constantly for 24 hours. I'd ballpark that as 28.8 billion mutate+execute
cycles.

In conclusion, the key for us was to make a production-quality, high-
throughput fuzzer that did only one thing really well and really fast, then
complement it with other more sophisticated tools like symbolic executors.

[1] [https://blog.trailofbits.com/2015/07/15/how-we-fared-in-
the-...](https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-cyber-
grand-challenge/) [2]
[http://infiltratecon.com/archives/Slides_Artem_Dinaburg.pdf](http://infiltratecon.com/archives/Slides_Artem_Dinaburg.pdf)
[3]
[https://en.wikipedia.org/wiki/Binary_translation](https://en.wikipedia.org/wiki/Binary_translation)
[4]
[https://github.com/CyberGrandChallenge/libcgc](https://github.com/CyberGrandChallenge/libcgc)
[5] [https://github.com/aoh/radamsa](https://github.com/aoh/radamsa) [6]
[https://software.intel.com/en-us/articles/pin-a-dynamic-
bina...](https://software.intel.com/en-us/articles/pin-a-dynamic-binary-
instrumentation-tool) [7]
[https://en.wikipedia.org/wiki/Symbolic_execution](https://en.wikipedia.org/wiki/Symbolic_execution)

~~~
Zardus
It's impressive how many resources you threw at the problem! We (Shellphish)
had very similar results using AFL [4] for fuzzing and angr [5] for symbolic
execution (we published our approach at NDSS in February [1]) on around 300
cores. Of course, we procrastinated pretty heavily, ended up hacking our CRS
together in three weeks, and it was absurdly rough around the edges and didn't
get anywhere near your crash numbers during the qualifying event itself. As we
discuss in the paper, our experiments with the impressive numbers were carried
out afterwards, in less chaotic conditions.

It's interesting that the approaches taken by us [1], you [2], and
ForAllSecure [3] for the CQE (at least on the exploitation side) were so
similar. I've talked with two other teams that had an analogous setup (as well
as two other teams, who did quite well, that took a very different route). I
guess some great minds think alike!

As a side note, in the ToB blog post, you talk about wanting to join up with
another team to be able to play in the final event. Did you guys end up
finding a partner? It'd be interesting to face your CRS again next month :-)

[1] [https://www.internetsociety.org/sites/default/files/blogs-
me...](https://www.internetsociety.org/sites/default/files/blogs-
media/driller-augmenting-fuzzing-through-selective-symbolic-execution.pdf) [2]
[https://blog.trailofbits.com/2015/07/15/how-we-fared-in-
the-...](https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-cyber-
grand-challenge/) [3] [https://blog.forallsecure.com/2016/02/09/unleashing-
mayhem/](https://blog.forallsecure.com/2016/02/09/unleashing-mayhem/) [4]
[http://lcamtuf.coredump.cx/afl/](http://lcamtuf.coredump.cx/afl/) [5]
[http://angr.io](http://angr.io)

~~~
pag
We ran three versions of our system, because in the last few weeks/days before
the quals we had a bunch of regressions. One of those regressions was caused
by a single-line change in my fuzzer :-p On the Twitter account we used to
track our progress, we used rapper names for each version. One version was two
weeks old, one was up to date with the fixed fuzzer, and one had an
experimental version of one of our symbolic executors that produced <var>
tags. That didn't work.

We also under-utilized those nodes :-( Each node had at least 4 idle cores
wasting our money. Our resource-allocation mechanism was naive.

I looked through some of the stuff released by DARPA after the event, and they
released some of our PoVs as official PoVs. If you hex-decode them, you'll see
something like "bad seed to Radamsa"!! That was a bug in how I would invoke
Radamsa - sometimes I'd pass it a seed that was way too big.

We tried to team up with every team but ForAllSecure. No one wanted to have
our name on their ticket, or they were just fishing for details :-/ We've done
a bit of work on the system since, getting it to work on Linux programs via a
"port" of parts of libc to DECREE.

~~~
Zardus
You didn't approach us, either; bummer! I doubt we'd have been down to team up
outright, but maybe some of you guys could have done an internship in our lab
if you really wanted to work on the CGC? I guess it's ancient history now,
though.

Resource utilization is definitely tricky. But man, the amount of resources
you had is just mind-boggling! I just realized it's even more cores than DARPA
gave us for the final event! I'm impressed you guys managed to keep it all
running smoothly (at least, it seemed that way from our lab, where there was
complete chaos as our system fell over and crapped itself repeatedly for the
first few hours).

~~~
dguido
Hey Zardus, I'm sure the info reached you. What team are you? We approached
everyone either indirectly through broadcast emails or directly through me
calling/emailing.

I'm not sure an internship would have been what we were looking for. If I had
to guess, that's probably why we didn't pursue it further.

------
vulnan
You might not be as far from the cutting edge as you'd expect.

From what I've seen, fuzzing is divided into two major camps (I'm generalizing
to the extreme here):

1\. Mutational - These include tools like AFL; they are gaining traction in
the open source community and have a lot of applications, perhaps most notably
in library and application fuzzing.

2\. Generational - These include commercial tools like Defensics and
PeachFuzzer, and open source tools like Peach, Spike, and Sulley. The state of
the art in this camp is held by the commercial offerings, and it's what
businesses are more likely to be interested in. (A toy contrast of the two
camps follows below.)
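
A toy sketch of the distinction, with a made-up one-byte length-prefix format standing in for a real protocol:

```python
import random

rng = random.Random(1)

def mutational(sample: bytes) -> bytes:
    """Mutational camp: start from a valid sample and corrupt it blindly."""
    buf = bytearray(sample)
    buf[rng.randrange(len(buf))] ^= 0xFF
    return bytes(buf)

def generational() -> bytes:
    """Generational camp: build inputs from a model of the protocol
    (here, a one-byte length prefix), deliberately lying sometimes."""
    payload = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 32)))
    length = rng.choice([len(payload), 0, 255])   # honest, or a lie
    return bytes([length]) + payload

print(mutational(b"\x05hello"))   # one flipped byte in an otherwise valid input
print(generational())             # structurally plausible, semantically hostile
```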

My hypothesis as to the reason for this split: Open source hackers are
interested in finding bugs. Businesses are interested in assurance that their
software is safe ("safe"). Protocol-specific tools give the impression that
we've done the best we can at securing IP/TCP/TLS/HTTP/etc. Defensics is by
far the dominant offering (in terms of apparent popularity), and Peach is the
only active competitor I've ever found.

The open source generational branch is moving very slowly. The prime-time
candidate was once Peach, now called Peach Community [1]. Unfortunately the
corporate backer switched to a closed solution and hung the open source tool
out to dry. The latest tool of note besides Peach is Sulley [2] [3].

Books: I haven't found any books that go below the surface. "Fuzzing: Brute
Force Vulnerability Discovery" has decent reviews on Amazon, but I found it
more breadth than depth.

Papers:

1\. IMO the seminal paper on fuzzing is Rauli Kaksonen's thesis, "A Functional
Method for Assessing Protocol Implementation Security." [6] This will take you
almost to the state of the practice. Kaksonen was a co-founder of Codenomicon.
Very interesting read.

Talks: If you want cutting edge research, conference talks and blog posts may
be as good as papers.

1\. 2007 Blackhat conference Sulley talk "Fuzzing Sucks! - Introducing Sulley
Fuzzing Framework" [2]

2\. Google "Charlie Miller fuzzing". My favorite slide decks are [7] and [8].
High fives (and a beverage on me, should time and space ever permit) to anyone
who can find audio or video from the actual talks.

Shameless plug(s):

1\. Due to the lack of response on Sulley pull requests, I forked it into a
new project called boofuzz [4], and I commit to at least addressing pull
requests more quickly.

2\. I'll be giving a fuzzing talk at Defcon 24's Packet Hacking Village, which
will address, among other things, the state of open source fuzzing [5].

[1]:
[http://www.peachfuzzer.com/resources/peachcommunity/](http://www.peachfuzzer.com/resources/peachcommunity/)

[2]: [http://www.podcast.tv/video-episodes/pedram-amini-aaron-
port...](http://www.podcast.tv/video-episodes/pedram-amini-aaron-portnoy-
fuzzing-sucks-or-fuzz-it-like-you-mean-it-7581709.html)

[3]: [https://github.com/OpenRCE/sulley](https://github.com/OpenRCE/sulley)

[4]:
[https://github.com/jtpereyda/boofuzz](https://github.com/jtpereyda/boofuzz)

[5]:
[https://www.wallofsheep.com/pages/dc24](https://www.wallofsheep.com/pages/dc24)

[6]:
[http://www.vtt.fi/inf/pdf/publications/2001/P448.pdf](http://www.vtt.fi/inf/pdf/publications/2001/P448.pdf)

[7]:
[https://cansecwest.com/csw08/csw08-miller.pdf](https://cansecwest.com/csw08/csw08-miller.pdf)

[8]:
[http://pages.cs.wisc.edu/~rist/642-fall-2012/toorcon.pdf](http://pages.cs.wisc.edu/~rist/642-fall-2012/toorcon.pdf)

------
thornjm
I would have a look at Project Zero.

P.S. I think many governments and corporations would keep their fuzzing
techniques quite secret. You don't want to do the same fuzzing as anyone else.

~~~
k__
Really?

I thought everyone dropped "security by obscurity" a long time ago.

~~~
thornjm
I was actually thinking along the lines of fuzzing for exploit development.

You want a unique bug that will last a long time. In which case, your fuzzing
techniques are a trade secret. A lot of fuzzing advances take place behind
closed doors.

Project Zero has some people with interesting backgrounds doing bug hunting
for good.

~~~
roddux
I hold the belief that Google _must_ be getting something else out of Project
Zero other than just "we hire the best hackers" bragging rights.

I figure they're selling exploits (the ones they don't publicise) to
governments.

------
realkitkat
AFL for Windows:
[https://github.com/ivanfratric/winafl](https://github.com/ivanfratric/winafl)

------
platz
Curious how people who have been doing fuzz testing feel about property-based
testing (at least the input-generation part, not the asserting of results).
------
majke
You should watch for upcoming presentations by Mateusz Jurczyk (@j00ru). I'm
not entirely sure where he's going to present next, though.

------
jdimov10
Try Google:
[https://scholar.google.co.uk/scholar?as_ylo=2015&q=fuzzing&h...](https://scholar.google.co.uk/scholar?as_ylo=2015&q=fuzzing&hl=en&as_sdt=0,5)

