
How to break everything by fuzz testing - MD87
https://chameth.com/break-everything-fuzz-testing/
======
TwoBit
My favorite personal fuzzing story is from 1987 when a friend said his x86
graphics drawing program was solid, and I said OK and smashed both hands on
the keyboard and it insta-crashed.

~~~
MaxBarraclough
Mark Twain put it best: _The weakest of all weak things is a virtue which has
not been tested in the fire._

------
thechao
> The fix for this was fairly straightforward - I just made the library keep a
> record of the previously visited IFDs and bail out if it found a loop.

If you just want to detect loops, keep a “+1” pointer that you use to
increment through the data; also, keep a “+2” pointer that is advanced _twice_
each time your “+1” pointer advances: either your “+2” pointer hits the end,
or it becomes equal to your “+1” pointer — in which case you have a loop.
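
A sketch of that two-pointer loop (Floyd's tortoise and hare) in Python, modeling the IFD chain as a map from node index to "next" index, with None marking the end (the names here are illustrative, not from the article's library):

```python
def has_cycle(next_offsets, start=0):
    """Detect a loop in a chain of IFD-style 'next' links.

    next_offsets maps each node index to the next index, or None at the
    end. The slow pointer advances one step per iteration, the fast
    pointer two; if fast ever catches slow, the chain loops.
    """
    slow = fast = start
    while fast is not None:
        slow = next_offsets[slow]      # "+1" pointer
        fast = next_offsets[fast]      # "+2" pointer, first step
        if fast is None:
            return False
        fast = next_offsets[fast]      # "+2" pointer, second step
        if slow == fast:
            return True
    return False

print(has_cycle({0: 1, 1: 2, 2: None}))   # False: chain terminates
print(has_cycle({0: 1, 1: 2, 2: 0}))      # True: last node points back
```

The nice property versus the visited-set fix is O(1) memory, at the cost of walking the chain roughly twice.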

~~~
colatkinson
Also known as the "Tortoise and Hare Algorithm!"

[https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_Tortoi...](https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_Tortoise_and_Hare)

------
bhaak
If you haven't read the whole article, you should do that.

There's a funny plot twist at the end.

3 bug reports by discovering 1 bug. What a bargain! :-D

------
oweiler
In university we had to write a simple fuzzer which extracted options from man
pages and ran the corresponding command with randomized but valid options.
Didn't take long until we found the first bug in one of the tested commands.
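
A minimal sketch of that kind of harness (hypothetical, not the actual assignment): scrape option names out of help/man text with a regex, then run the command with a random subset of them:

```python
import random
import re
import subprocess

# Matches -x and --long-style option tokens at a word boundary.
OPTION_RE = re.compile(r'(?<!\S)(--?[A-Za-z][\w-]*)')

def extract_options(help_text):
    """Pull option names out of help/man-page text."""
    return sorted(set(OPTION_RE.findall(help_text)))

def fuzz_once(command, options, rng=random):
    """Run `command` with a random subset of its documented options."""
    picked = rng.sample(options, k=rng.randint(0, min(3, len(options))))
    return subprocess.run([command, *picked], capture_output=True)

help_text = """
Usage: frob [OPTIONS] FILE
  -v, --verbose   explain what is being done
  -n, --dry-run   do not actually frob
"""
print(extract_options(help_text))
# ['--dry-run', '--verbose', '-n', '-v']
```

A real harness would also generate the positional arguments and check the exit status for signals (crashes) rather than ordinary failures.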

~~~
elktea
that's a good assignment - extra marks for reporting the bugs?

------
pfdietz
A very funny thing about fuzzing is how random input testing used to be so
looked down upon by the software testing community. Read old (1970s) testing
books and you'll see comments like "random testing is the worst kind of
testing". I still saw this even as recently as a decade ago.

~~~
praptak
Fuzzing is not random testing though. It's _directed_ random testing.

~~~
pfdietz
The original fuzzing was as random + black box as it gets.

~~~
UncleMeat
Yes, and things like coverage guided fuzzing have completely revolutionized
things. Prior to directed fuzzing, it was okay but largely unimpressive. Now
it blazes through code structures that were previously used as motivating
examples for symbolic execution. It is a meaningfully different technique
today.

~~~
pfdietz
Well, that depends on what you mean by "impressive", and in what domain. Black
box compiler fuzzing has been very effective.

~~~
UncleMeat
That's actually one of the few fields where I feel like fuzzing has
underperformed. There was an interesting paper at OOPSLA this year which found
that while the fuzzing community has indeed found a lot of bugs, these bugs
are triggered by real code approximately never. It was a really interesting
result coming from within a community that ordinarily biases towards
overinflating the value of PL techniques.

~~~
pfdietz
That paper, if it's the one I'm thinking of, found that a compiler bug found
by fuzzing was more likely to be hit by a user than a user-reported bug was
to be hit by a second user. So if fuzzing-found bug reports are bad,
user-found bug reports are even less useful.

Another thing to remember is that as blackbox fuzzing became state of the
practice, its benefit declined, as the bugs it would find would be found
early, by the developers themselves. All testing techniques are self-limiting
this way.

I want you to look at the results of jsfunfuzz and tell me the impact of that
wasn't profound.

------
Psyladine
"A QA engineer walks into a bar. Orders a beer. Orders 0 beers. Orders
99999999999 beers. Orders a lizard. Orders -1 beers. Orders a ueicbksjdhd.

First real customer walks in and asks where the bathroom is. The bar bursts
into flames, killing everyone."

------
snazz
Fuzzing is fun! If you're doing it on your personal computer (as opposed to a
cloud VM somewhere), I'd suggest putting the testcase output directory on a
spinning-rust hard drive that you don't care about instead of your (presumably
much more expensive) internal SSD. It creates an impressive number of disk
writes.

I've been thinking about fuzzing JavaScript code (not attacking V8 or
SpiderMonkey, but the JS code itself). While JavaScript might not be
vulnerable to buffer overflows and format string vulnerabilities, it certainly
can have logic issues, unhandled exceptions, and DoS vulnerabilities that are
exposed by fuzzing.

I took a look at the most-depended-on NPM packages. I'll try writing test
harnesses on functions that take user input. Does anyone have any ideas for
packages that could use some fuzz testing?
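
The harness shape is the same in any memory-safe language; here's a minimal random-input loop sketched in Python for illustration (a JS version would be structurally identical): any exception outside the target's documented failure mode counts as a finding. The target below is a toy with a planted bug, not a real package:

```python
import random
import string

def fuzz(target, runs=10_000, allowed=(ValueError,), seed=0):
    """Throw random strings at `target`. Exceptions in `allowed` are
    documented failure modes; anything else is a finding (the
    memory-safe equivalent of a crash)."""
    rng = random.Random(seed)
    findings = []
    for _ in range(runs):
        data = ''.join(rng.choices(string.printable, k=rng.randint(0, 40)))
        try:
            target(data)
        except allowed:
            pass                          # expected rejection, fine
        except Exception as exc:
            findings.append((data, exc))
    return findings

def parse_num_list(data):
    """Toy target: parses '1,-2,3'. It documents ValueError on garbage,
    but peeking at field[0] raises IndexError on an empty field."""
    out = []
    for field in data.split(','):
        if field[0] == '-':               # IndexError when field == ''
            out.append(-int(field[1:]))
        else:
            out.append(int(field))
    return out

findings = fuzz(parse_num_list)
print(f'{len(findings)} unexpected exceptions in 10,000 runs')
```

Real tooling would swap the random generator for a coverage-guided mutator, but the loop and the "unexpected exception = bug" oracle carry over directly.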

~~~
segfaultbuserr
> _I'd suggest putting the testcase output directory on a spinning-rust hard
> drive that you don't care about instead of your (presumably much more
> expensive) internal SSD._

Even better, use the /dev/shm RAM disk if you have memory to spare (although
you should probably create an additional RAM disk with a size limit if you
don't want a runaway program to accidentally drain your RAM). On a modern
development machine, setting aside 2 GiB for testcase storage is usually not
a problem, and it often brings a significant speedup.
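
Setting up a dedicated, size-capped tmpfs on Linux might look like this (the mount point and the 2G cap are just examples):

```shell
# Create a 2 GiB tmpfs for fuzzer output (requires root).
mkdir -p /mnt/fuzz-out
mount -t tmpfs -o size=2G,mode=1777 tmpfs /mnt/fuzz-out

# Point the fuzzer's output directory there, e.g. for AFL:
#   afl-fuzz -i seeds -o /mnt/fuzz-out/findings -- ./target @@

# Tear it down when done (copy out anything you want to keep first).
umount /mnt/fuzz-out
```

The size cap means a runaway corpus fills the mount and errors out instead of eating all your RAM.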

------
jansan
It can be difficult to evaluate the result of a test. We solved this by using
an existing (of course inferior) library that uses a different algorithm for
the same task (different algorithm so it fails at different tests). We would
run the same test with both libraries and compare the results. If they were
different, we had to find a way to decide which library failed or maybe
evaluate those failed cases manually.

------
jansan
I have used fuzz testing to make my Bezier intersection library more robust
against edge cases. The test would try to find all intersections between a
random pair of curves that lie within certain bounds (you probably know that
there can be up to nine intersections between two cubic Bezier curves). At the
beginning it failed at approx. 1 in 100,000 randomly generated curve pairs,
now I am at a point where there is not a single failure in a billion tests.

My problem was how to decide whether a test had failed, because a failure
would not be a crash, but a missed intersection between the curves. So I
compared against an existing library that uses a completely different
algorithm, which means it fails on different test cases than my own library.
If the results for a test case differed, one of the two must have failed, and
by checking the reported intersections against the curves I could easily
decide which one.
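
That comparison loop is classic differential testing and is easy to sketch; here both "libraries" are toy stand-ins, with a deliberate bug planted in the "optimized" one:

```python
import random

def count_bits_reference(n):
    """Slow but obviously correct implementation."""
    return bin(n).count('1')

def count_bits_fast(n):
    """'Optimized' implementation with a planted bug above 16 bits."""
    n &= 0xFFFF                # BUG: silently drops the high bits
    count = 0
    while n:
        n &= n - 1             # clear the lowest set bit
        count += 1
    return count

def differential_test(runs=1000, seed=42):
    """Run both implementations on random inputs. A disagreement means at
    least one is wrong; an oracle (or a human) then decides which."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        n = rng.randrange(1 << 20)
        a, b = count_bits_reference(n), count_bits_fast(n)
        if a != b:
            failures.append((n, a, b))
    return failures

failures = differential_test()
print(f'{len(failures)} disagreements in 1000 runs')
```

For intersections, the oracle step is cheap: plug each reported intersection point back into both curve equations and see whose answer actually lies on the curves.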

------
luord
I love case studies like this. They are the best way to show why the subject
matter at hand is important and worth investing time into.

------
ToFab123
Interesting. Are there any fuzzing libraries for C#?

~~~
thewebcount
Can you call C# from C? If so, then you can just use any C fuzzing library
and have it call your C# code. I do this with C++ and Objective-C using
clang's libFuzzer. You write a single C function that takes a pointer to a
buffer and a length, and from there you can pass the data wherever you want.
I just write a C wrapper that calls my Objective-C or C++ functions with the
data.

~~~
snazz
Doesn't libFuzzer only require `extern "C" int LLVMFuzzerTestOneInput(...` to
fuzz C++ code? What else does your C wrapper do beyond that? Google puts their
fuzz tests right alongside the rest of the Chromium source code, which is C++.

------
mebr
My summary of this blog post: plenty of random input data can reveal bugs in
code. The kind of bugs that would probably take a lot of time to think of and
write unit tests for in advance.

~~~
userbinator
_The kind of bugs that would probably take a lot of time to think of and
write unit tests for in advance._

Would it? Maybe it's because I've had a "low-level upbringing", but whenever
I'm writing parsing code for a file format, "assume any byte of data you read
can have any value" is the norm. The rest of it follows from there.

~~~
jra_samba_org
Yeah, I've gotten to the point where I can't do any arithmetic on any values
without immediately adding integer wrap tests afterwards.
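
The classic unsigned-wrap guard in C is `if (a + b < a)`; the same check, modeled on 32-bit unsigned values in Python purely for illustration:

```python
UINT32_MAX = 0xFFFFFFFF

def u32_add(a, b):
    """Add two uint32 values, refusing to wrap. This is the guard C code
    needs after every size calculation: if b > UINT32_MAX - a, then
    a + b would wrap past 2**32 - 1."""
    if b > UINT32_MAX - a:
        raise OverflowError(f'{a} + {b} wraps past 2**32 - 1')
    return a + b

print(u32_add(40, 2))          # 42
try:
    u32_add(UINT32_MAX, 1)
except OverflowError as e:
    print('caught wrap:', e)
```

Doing the comparison as `b > UINT32_MAX - a` matters: it never overflows itself, whereas checking `a + b > UINT32_MAX` in actual 32-bit arithmetic would wrap before the comparison runs.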

~~~
mcswell
Reminds me of using a slide rule. You normally push the inner part (the C
scale) to the right, line up the 1 on the C scale with the first number you're
multiplying on the D scale, then look on the C scale for the second number
you're multiplying, and read the result off the D scale immediately below
that.

But when the result is more than 10, you've wrapped: your answer is off the D
scale. So now you have to push the inner part back to the left, and line up
the 10 (usually marked as 1, at the right-hand end) on the C scale with the
first number on the D scale. And remember to add 1 to the exponent.

I've seen slide rules where the D scale goes slightly beyond 10 (like 10.1),
so if the result was just a tiny bit over 10, you wouldn't need to wrap.

------
WrtCdEvrydy
It just gets worse and worse....

