
The Saturation Effect in Fuzzing - weinzierl
https://blog.regehr.org/archives/1796
======
Taniwha
First fuzzer I wrote (we didn't call them that, we were just trying to find
bugs triggered by bad packets in a cable TV protocol suite) made random bad
packets by generating good ones and then mutating them past some random point.
It found 3 bugs (which was amazing at the time), but the real problem with all
these systems, at least for my problem, is:

- the number of possible bad packets is literally billions of times larger
than good ones

- pretty soon the percentage of those that trigger bad behaviour gets close
to 0

~~~
jsnell
Modern fuzzing tools are far more effective at probing the state space than
just random mutation. It's hard to appreciate just how effective they are
until you see it in practice.

E.g. the last time I fuzzed a network element with AFL, it took seconds for it
to go from a starting corpus of a single ethernet IPv4 SYN packet to some
double-encapsulated IP-in-NSH-IP-in-NSH-ethernet monstrosity that triggered a
misparse. And seconds more for it to generate an IPv6 packet with a fragment
extension header that triggered some other problem. A random walk would have
no chance of finding that.

~~~
mentat
Do you know of any good writeups for this particular kind of process? I too
did fuzzing of network devices before it was called fuzzing and am interested
in trying it again with modern tooling.

~~~
woodruffw
The AFL technical details document[1] is a decent reference for one particular
subset of fuzzer feedback: code coverage metrics.

[1]:
[https://lcamtuf.coredump.cx/afl//technical_details.txt](https://lcamtuf.coredump.cx/afl//technical_details.txt)
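
If you want to try it hands-on, the usual entry point is a tiny harness that
feeds fuzzer-generated bytes into the parser under test. Here's a minimal
libFuzzer-style sketch (also usable with AFL++'s libFuzzer compatibility
mode); parse_packet is a toy stand-in for your own code:

    #include <stddef.h>
    #include <stdint.h>

    /* Toy stand-in for the parser under test -- replace with your own. */
    static int parse_packet(const uint8_t *data, size_t len) {
        /* e.g. require at least an ethernet header with an IPv4 ethertype */
        return len >= 14 && data[12] == 0x08 && data[13] == 0x00;
    }

    /* libFuzzer calls this once per generated input.
       Build: clang -g -fsanitize=fuzzer,address harness.c */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse_packet(data, size);  /* crashes/sanitizer reports are the signal */
        return 0;
    }

The coverage instrumentation added by -fsanitize=fuzzer is what lets the
fuzzer notice when a mutated input reaches new code and keep it in the corpus.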

------
not2b
Coverage-based fuzzing won't find cases where an extreme value of some
variable causes a malfunction but no code treats that value specially, because
the coder missed it. The most common case of this is an integer variable with
the smallest possible value: 0x80000000 for 32 bits. The problem with this
value is that if you negate it, it is still negative. That might cause a
computation to go badly wrong, but since the bad value takes no new branch, a
coverage-based fuzzer might "think" that it has covered every state your code
can reach without finding bugs.
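
For a concrete illustration in C (where negating INT_MIN is technically
undefined behaviour; on typical two's-complement machines it wraps back to
INT_MIN):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MIN;          /* 0x80000000 as a 32-bit int */
        int y = -x;               /* UB in C; wraps to INT_MIN in practice */
        printf("%d %d\n", x, y);  /* typically prints -2147483648 twice */
        return 0;
    }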

Is anyone aware of fuzzers that take issues like this into account by
explicitly trying problematic values, like INT_MIN for an integer variable?

~~~
landr0id
I think most fuzzers have a dictionary of special values they'll occasionally
use. I wrote a structure-aware fuzzer framework which uses random values for
the initial generation, then on subsequent mutations performs arithmetic
tweaks/bit flips with a small chance to grab a special value from the
dictionary (see the sketch below).

Even without a dictionary, if your input is reasonably small, it should
discover these special values given enough iterations.
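
A minimal sketch of that kind of mutation step in C; the dictionary contents
and the probabilities are illustrative, not from any particular fuzzer:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative "interesting values" dictionary, similar in spirit to
       the hardcoded lists AFL and libFuzzer ship with. */
    static const uint32_t interesting[] = {
        0x00000000, 0x00000001, 0x0000007f, 0x00000080,
        0x7fffffff, 0x80000000, 0xffffffff
    };

    /* One mutation step: usually a bit flip or small arithmetic tweak,
       occasionally overwrite 4 bytes with a dictionary value. */
    static void mutate(uint8_t *buf, size_t len) {
        if (len == 0) return;
        if (len >= 4 && rand() % 16 == 0) {   /* small chance: dictionary */
            uint32_t v = interesting[rand() %
                (sizeof interesting / sizeof interesting[0])];
            memcpy(buf + (rand() % (len - 3)), &v, sizeof v);
        } else if (rand() % 2) {              /* single bit flip */
            buf[rand() % len] ^= (uint8_t)(1u << (rand() % 8));
        } else {                              /* small arithmetic tweak */
            buf[rand() % len] += (uint8_t)((rand() % 35) - 17);
        }
    }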

~~~
not2b
The dictionary of special values is a good approach to deal with this. Without
it, you'd need billions of iterations before randomly trying 0x80000000 for a
32-bit integer: a uniformly random 32-bit value hits any particular constant
with probability 2^-32, so about 4.3 billion tries on average.

------
arthurjj
I find fuzzing fascinating, but it seems to only be used on things like
compilers and networking protocol implementations. Does anyone have experience
using it for business logic?

~~~
aaron695
Not sure of your exact definition of fuzzing, but our staging environment had
real data in it.

I'd always run end-to-end tests on random real data records.

I don't see many people doing this, not sure why.

There seems to be pushback because tests might fail one run and pass another.
People don't seem to like this.

~~~
zxcmx
I think as long as it's reproducible it's Ok... but should be a team decision.
The data has to be non-sensitive of course, very often it's not ok to pull
data out of prod. Fuzzers in general tend to work hard to make sure crashes
are reproducible (usually with a seed at the start of the run, and by saving
crashing inputs).

If you can't repro a crash a month or a week later, it's not worth it IMHO.

Usually you can get most of the benefits with data generation, based on a
deterministic seed.
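
A minimal sketch of that idea in C (the structure is illustrative): log the
seed up front so any failing run can be replayed exactly.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv) {
        /* Replay a failing run by passing its seed; otherwise pick one. */
        unsigned seed = (argc > 1) ? (unsigned)strtoul(argv[1], NULL, 10)
                                   : (unsigned)time(NULL);
        fprintf(stderr, "test seed: %u\n", seed);  /* record for replay */
        srand(seed);
        /* ... generate test records from rand() and run the checks ... */
        return 0;
    }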

Mainly though, you have to ask about the purpose of the tests. Most developers
want a test suite that tells them with high confidence that what they just
worked on didn't create a new bug, or a regression. That lets them stay
focused on their work rather than chasing through the codebase for an
unrelated latent issue which might have been introduced by someone else, years
ago.

There is often value in separating "exploratory" or "stochastic" tests, which
might uncover _new_ (previously unknown) bugs, from regular tests. To make it
really work as part of the default workflow, you need a culture which
understands that a feature might take additional time because the team stopped
to fix a latent issue to get back to green.

To put it bluntly, letting random-ish testing break your pipeline is making a
statement of business priority (we care so much about random bugs that we will
stop all other work until they are fixed) which might not align with reality.

"Is this the most important thing for me to be working on for the success of
the business?"

I think the "right" way to add fuzzing / random testing to a pipeline is sell
the value to the business and have an initiative to rigorously fuzz the snot
out of the software. Crucially, have resources dedicated to triage and fix the
identified issues. It should be a non-breaking pipeline stage right up to the
point where everyone feels confident that any test failures are the result of
new code.

The worst case is an opinionated developer adding stochastic tests, with bad
reproducibility, without consulting the team, and randomly breaking builds in
an organisation that only rewards or understands feature delivery and ticket
punching. That is basically going to make their co-workers' lives hell and is
IMHO not the right way to go about it.

------
carlmr
Are there any good resources on using fuzzers for the first time?

------
j-kent
I thought this was going to be about the guitar.

