
Generating Software Tests: Breaking Software for Fun and Profit - ingve
https://www.fuzzingbook.org/
======
dbcurtis
Disclaimer: I only took a quick glance at the material, not a deep dive. So
feel free to discount my criticism as severely as you like. Or whack me with a
clue-bat, if needed.

1. In the fuzzing chapter, the author seems to be showing how to create a fuzzer
in Python. OK, that is great if you want to learn the basics of building a
fuzzer. HOWEVER... there is a great Python module called "hypothesis" that is
well developed, well maintained, and 100% production ready. If your goal is to
fuzz-test Python code, I urge you to check out hypothesis. It integrates
nicely with drivers like unittest, and it does automatic test-case reduction. In
other words, if it discovers a failing case, it does a pretty darn decent job
of trimming the failing case down to the minimal input needed to provoke the
failure. In fact, I urge you to check out hypothesis, period. You might be
surprised at how much your current testing leaves out.
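
To make that concrete, here is a minimal sketch of hypothesis in action; the
buggy `mean` function is a made-up example of mine, not something from the book:

```python
from hypothesis import given
from hypothesis import strategies as st

def mean(xs):
    return sum(xs) / len(xs)  # deliberately buggy: blows up on an empty list

# hypothesis generates hundreds of random lists and checks the property.
@given(st.lists(st.integers(min_value=-1000, max_value=1000)))
def test_mean_is_bounded(xs):
    assert min(xs) <= mean(xs) <= max(xs)
```

Run under pytest, hypothesis quickly hits the empty-list failure and, thanks
to automatic test-case reduction, reports the shrunk counterexample `xs=[]`.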

2. The first chapter on testing covers the basics of writing a unit test
and a test driver. Again, test drivers that are well maintained and
have a lot of mileage on them already exist for Python. But moreover, the
author seems to focus on tactical testing. Unit testing is good, yes, but
there is a difference between "testing" and "validation". Testing is tactical;
validation is strategic. I didn't see anything there about analyzing the
product's test space, or about how to make judgment calls about what to
test and how to measure coverage in a way that ensures your customers
will be well served. For most real products, the entirety of the test space is
intractably large, so it is very important to know how to target tests that
tickle edge and corner cases; just throwing random crap at the SUT,
while necessary, is unlikely to sensitize the _interesting_ test points.
(Imagine each test case as a dart thrown at a dart board: you want as many
darts to land on the lines between the rings as land cleanly within the rings.)
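
To put the dart-board metaphor in code: the `classify_age` function below is
hypothetical, invented purely for illustration, but it shows the difference
between sampling randomly and aiming tests at the boundaries between the rings:

```python
import pytest

def classify_age(age):
    # Hypothetical SUT with edges at 0, 13, 18, and 120.
    if age < 0 or age > 120:
        raise ValueError("age out of range")
    if age < 13:
        return "child"
    if age < 18:
        return "teen"
    return "adult"

# Random inputs rarely land exactly on 12/13 or 17/18; pinning the
# boundaries directly is how you throw darts at the lines.
@pytest.mark.parametrize("age,expected", [
    (0, "child"), (12, "child"),    # edges of the "child" ring
    (13, "teen"), (17, "teen"),     # edges of the "teen" ring
    (18, "adult"), (120, "adult"),  # edges of the "adult" ring
])
def test_boundaries(age, expected):
    assert classify_age(age) == expected
```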

------
k4st
I work on a Google Test-like unit testing framework called DeepState [1] that
gives you access to fuzzing and symbolic execution from your C or C++ unit
tests. We have a tutorial [2] describing how to use it. It's new and still
under development, but shows massive potential, and really brings these
software security tools into a more developer-centric workflow.

[1] [https://github.com/trailofbits/deepstate](https://github.com/trailofbits/deepstate)
[2] [http://www.petergoodman.me/docs/secdev-2018-slides.pdf](http://www.petergoodman.me/docs/secdev-2018-slides.pdf)

------
lootsauce
Another approach to automated generation of test cases is turning traces into
tests. Valid uses of the code already exist: in the application itself, and,
for a library, in published code that makes use of it.

A recent paper [1] covers this subject and demonstrates, for R, better code
coverage than hand-written tests, and does so for a large portion of published
libraries.

I think this is a very interesting direction for automated test-case
generation. Snapshot testing [2] reminds me very much of the same idea.
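
A rough sketch of the record-then-verify idea in Python; the
`record_or_verify` helper is hypothetical, not the paper's tooling or any
snapshot library's real API:

```python
import json
import os

def record_or_verify(name, value, snapshot_dir="snapshots"):
    # First run records the observed value (the trace becomes the test);
    # later runs fail if the output drifts from the recorded snapshot.
    os.makedirs(snapshot_dir, exist_ok=True)
    path = os.path.join(snapshot_dir, name + ".json")
    if not os.path.exists(path):
        with open(path, "w") as f:
            json.dump(value, f)
    else:
        with open(path) as f:
            assert json.load(f) == value

def test_sorting_snapshot():
    record_or_verify("sorting_demo", sorted([3, 1, 2]))
```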

[1] [http://janvitek.org/pubs/issta18.pdf](http://janvitek.org/pubs/issta18.pdf)
[2] [https://blog.kentcdodds.com/effective-snapshot-testing-e0d1a2c28eca](https://blog.kentcdodds.com/effective-snapshot-testing-e0d1a2c28eca)

------
nickpsecurity
If you're interested in this topic, here is a survey of the various ways to do
automatic test-case generation:

[https://cs.stanford.edu/people/saswat/research/ASTJSS.pdf](https://cs.stanford.edu/people/saswat/research/ASTJSS.pdf)

My baseline recommendation is to use one of every type if you can, just running
overnight on a server dedicated to verification or testing. I also think
hybrid methods that combine static analysis, dynamic analysis, and the
test-generation methods above have a ton of promise. Here's an example:

[https://homes.cs.washington.edu/~mernst/pubs/palus-testgen-issta2011.pdf](https://homes.cs.washington.edu/~mernst/pubs/palus-testgen-issta2011.pdf)

------
fouronnes2
Slightly off topic: I remember reading on HN about a tool that automatically
explored all code paths in a binary and used that to find bugs and even
generate valid data. For example, it could generate valid JPEG images just by
exploring the valid code paths of a JPEG decoder. Does anyone remember that
tool's name?

~~~
wnoise
"afl" or "american fuzzy lop".

[https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html](https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html)

~~~
Joky
See also, for in-process fuzzing:
[https://llvm.org/docs/LibFuzzer.html](https://llvm.org/docs/LibFuzzer.html)

------
muro
One thing that is rarely mentioned is that fuzzing finds crashes, but not
wrong output: you still need an oracle (something that tells you whether the
output is correct), and those can be very hard to write.

A useful tool, sure, but not a silver bullet.
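
One common workaround is a partial oracle such as a round-trip property. A
minimal sketch with hypothesis and the standard-library `json` module: it
can't confirm the output is "correct", but `loads(dumps(x)) == x` catches a
whole class of wrong-output bugs, not just crashes:

```python
import json
from hypothesis import given
from hypothesis import strategies as st

# Arbitrary JSON-representable values (floats omitted to dodge NaN
# and rounding noise in the comparison).
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda inner: st.lists(inner) | st.dictionaries(st.text(), inner),
    max_leaves=20,
)

@given(json_values)
def test_json_round_trip(value):
    # The round-trip property acts as a partial oracle.
    assert json.loads(json.dumps(value)) == value
```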

~~~
lmkg
That's not entirely true. If you have multiple implementations, you can
compare their fuzz results against each other rather than against ground
truth. There's a project that does this with several popular C compilers and
has found some good bugs, but I can't remember the name right now.

This approach does still have limitations, e.g. it assumes errors are
independent. And, of course, it requires independent implementations. But it
is sometimes more accessible than an oracle.
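
A toy sketch of the idea in Python, cross-checking two independent sort
implementations instead of compilers:

```python
import heapq
import random

def heap_sort(xs):
    # An independent implementation to cross-check against sorted().
    h = list(xs)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

# Differential fuzzing: no ground-truth oracle needed. Any disagreement
# flags a bug in at least one of the two implementations.
for _ in range(10_000):
    xs = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
    assert sorted(xs) == heap_sort(xs), f"implementations disagree on {xs}"
```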

~~~
pfdietz
You're probably referring to Csmith, the random C generator from John Regehr's
group at U. of Utah.

[https://embed.cs.utah.edu/csmith/](https://embed.cs.utah.edu/csmith/)

Fuzzing C compilers this way is difficult, because you can't generate programs
that have undefined behaviors, and C has so many of those. The complexity and
challenge of their project was to avoid generating bogus programs without
throwing away too much diversity.

The general technique is called differential testing. For compilers, it was
pioneered by William McKeeman at DEC in the 1990s. He also addressed C
compilers, with a more limited set of inputs than Csmith produces.

[https://www.cs.dartmouth.edu/~mckeeman/references/DifferentialTestingForSoftware.pdf](https://www.cs.dartmouth.edu/~mckeeman/references/DifferentialTestingForSoftware.pdf)

Personally, I've applied this kind of structured random testing to Common Lisp
compilers for the last 15 years. It's my contribution to the informal
development process for SBCL. In this case, the comparison is between code
compiled at different optimization settings, with functions declared
NOTINLINE, and with or without type declarations. Any source of diversity that
preserves program meaning can be used to expose bugs. Many of the bug reports
I've submitted to SBCL come from the random tester (and some other randomized
testing techniques); you can get an idea of the kind of thing the tester finds
by looking at them:

[https://bit.ly/2PLZDuK](https://bit.ly/2PLZDuK)
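
The same shape of test works anywhere one program can be compiled two ways. A
toy Python analog (CPython's `optimize` levels rarely change semantics, so
take this purely as an illustration of the technique, not of the SBCL tester):

```python
import random

def run(src, optimize):
    # Compile the same source at a given optimization level and run it.
    env = {}
    exec(compile(src, "<fuzz>", "exec", optimize=optimize), env)
    return env["result"]

ops = ["+", "-", "*"]
for _ in range(1_000):
    # Generate a random arithmetic expression...
    expr = str(random.randint(-9, 9))
    for _ in range(random.randint(1, 5)):
        expr = f"({expr} {random.choice(ops)} {random.randint(-9, 9)})"
    src = f"result = {expr}"
    # ...and demand the two optimization settings agree on its value.
    assert run(src, 0) == run(src, 2), f"optimize levels disagree on {src!r}"
```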

------
slow_donkey
At the risk of sounding ignorant: has anyone fuzzed something like APIs
before? I would love to hear some anecdotes from fuzzing more abstracted systems.

~~~
paulgdp
The Rust ecosystem has a team that tries to fuzz as many crates (rust
packages) as possible [1].

Unlike C/C++, and like Python, fuzzing Rust code is not really about finding
memory bugs but more about finding logic errors [2].

To do this, a project has been set up with 83 (so far) targets fuzzing the
public API of 48 (so far) important crates [3].

All those targets can be fuzzed using any of the three major native code
feedback-based fuzzers (AFL, LibFuzzer, and Honggfuzz).

[1] [https://github.com/rust-fuzz/targets](https://github.com/rust-fuzz/targets)

[2] see the trophy case: [https://github.com/rust-fuzz/trophy-case](https://github.com/rust-fuzz/trophy-case)

[3] [https://github.com/rust-fuzz/targets/blob/master/common/src/lib.rs](https://github.com/rust-fuzz/targets/blob/master/common/src/lib.rs)

Disclaimer: I'm a member of this team and the author of the honggfuzz crate
that makes honggfuzz work with Rust code.

~~~
cpeterso
And Cargo has good support for integrating Rust fuzzers into one's own
projects:

[https://medium.com/@seasoned_sw/fuzz-testing-in-rust-with-cargo-fuzz-13b89feecc30](https://medium.com/@seasoned_sw/fuzz-testing-in-rust-with-cargo-fuzz-13b89feecc30)

btw, I'm impressed that the rust-fuzz trophy list includes only one
use-after-free, one uninitialized memory read, and no segfaults. :)

[https://github.com/rust-fuzz/trophy-case](https://github.com/rust-fuzz/trophy-case)

