
The Day I Fell in Love with Fuzzing - eaguyhn
https://nullprogram.com/blog/2019/01/25/
======
kragen
I haven't tried afl-fuzz myself, although it sounds like world-class awesome
software, but I'm a real believer in testing things with David MacIver's
Hypothesis, which invokes your functions with random inputs, and then does
similar canonicalization and minimization kinds of things. I like Hypothesis
so much that when I wrote Dumpulse
[http://github.com/kragen/dumpulse](http://github.com/kragen/dumpulse) I added
a Python interface to it _purely so I could test it with Hypothesis_. Which
found bugs, of course, even though Dumpulse is around 100 instructions when
compiled.

~~~
sametmax
I love the concept of Hypothesis, but I struggle to find a real-life use
case. I just can't formulate my assertions; I don't know what to put in them.

Unit tests are easy, since I know what the code does and I can just tell it
what to do and what I expect.

But with Hypothesis, I have to find some kind of general property, which I
have a hard time doing.

Any tips, or materials I can use?

P.S.: I read your test.py and see you use a rule-based state machine. I've
never seen that in any tutorial on Hypothesis I've read. What does it do? How
do you use it?

~~~
felixyz
You might find these useful:

[https://hypothesis.works/articles/rule-based-stateful-testing/](https://hypothesis.works/articles/rule-based-stateful-testing/)

[http://propertesting.com/book_stateful_properties.html](http://propertesting.com/book_stateful_properties.html)

For a long time, I felt the same way you describe: all introductions to
property-based testing show you things like testing a function that reverses a
list and yes it all looks very useful and nifty, but it's hard to imagine how
to formulate interesting variants for more complex code. These two posts were
the first to really give me a fuller picture and lots of practical strategies:

[https://fsharpforfunandprofit.com/posts/property-based-testing/](https://fsharpforfunandprofit.com/posts/property-based-testing/)

[https://fsharpforfunandprofit.com/posts/property-based-testing-2/](https://fsharpforfunandprofit.com/posts/property-based-testing-2/)

(In general, everything I've read/watched by Scott Wlaschin has been super
helpful, and I've never written a line of F#.)
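
To make two of the property shapes from those posts concrete, here is a minimal sketch of a round-trip property and a model-based (stateful) check. It is hand-rolled with Python's `random` module rather than Hypothesis, so there is no shrinking or example database. `encode`, `decode`, and `BoundedQueue` are hypothetical stand-ins invented for this example, not code from Dumpulse or any library.

```python
import random
from collections import deque


def encode(data: bytes) -> str:
    """Hypothetical function under test: bytes to a hex string."""
    return data.hex()


def decode(text: str) -> bytes:
    """Hypothetical inverse of encode."""
    return bytes.fromhex(text)


class BoundedQueue:
    """Hypothetical stateful code under test: a fixed-capacity FIFO ring buffer."""

    def __init__(self, capacity: int) -> None:
        self.items = [None] * capacity
        self.head = 0
        self.size = 0

    def push(self, value: int) -> bool:
        if self.size == len(self.items):
            return False  # full: reject the value
        self.items[(self.head + self.size) % len(self.items)] = value
        self.size += 1
        return True

    def pop(self):
        if self.size == 0:
            return None  # empty
        value = self.items[self.head]
        self.head = (self.head + 1) % len(self.items)
        self.size -= 1
        return value


def check_round_trip(rng: random.Random, runs: int = 200) -> int:
    """Property: decode(encode(x)) == x for any bytes x."""
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(32)))
        assert decode(encode(data)) == data, data
    return runs


def check_against_model(rng: random.Random, runs: int = 200) -> int:
    """Property: random push/pop sequences agree with a trivially correct model."""
    for _ in range(runs):
        capacity = rng.randrange(1, 5)
        real, model = BoundedQueue(capacity), deque()
        for _ in range(rng.randrange(30)):
            if rng.random() < 0.6:
                value = rng.randrange(100)
                accepted = real.push(value)
                assert accepted == (len(model) < capacity)
                if accepted:
                    model.append(value)
            else:
                expected = model.popleft() if model else None
                assert real.pop() == expected
    return runs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a failure can be reproduced
    check_round_trip(rng)
    check_against_model(rng)
    print("all properties held")
```

The second check is the hand-written cousin of a rule-based state machine: each operation plays the role of a rule, and the property is agreement with a simple model. With Hypothesis, the rules would be methods on a state machine class, and the library would pick and minimize the operation sequence for you.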

~~~
sametmax
Thanks a lot. This is why I still use HN :)

~~~
felixyz
This was my very first comment on HN. You made my day :)

~~~
sametmax
Just read it. Never heard of fsharpforfunandprofit before. Never touched F#
either. But damn, that's good.

------
ergothus
> When I got started, I had just learned how to use yacc (really Bison) and
> lex (really flex)

I've dug into parsers a few times, but everything I encountered seemed to
think that once I had a parsed tree of commands it was obvious how to consume
it...and it wasn't (for me). I've never had the free time to dedicate to
experimenting that abstractly, so anytime I'm tempted to write a DSL or
similar for a current problem I punt and do something else.

Does anyone have advice on how to find nice ways to use parsers/lexers in a
practical way that doesn't involve a massive investment of time or assume I
have a lot of abstract comp sci background? Everything I've seen has been
more of a compilers course, which seems overkill.

~~~
adrianN
IMHO you're often best off writing a simple recursive descent parser by
hand. For lexing you can often get away with splitting the input string at
word boundaries.

For slightly more advanced needs, parser combinator libraries make parsing and
lexing quite straightforward. I honestly wouldn't use a parser generator.
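
As a rough illustration of the hand-written approach, here is a minimal recursive descent parser for arithmetic expressions, plus a separate `evaluate` function that consumes the resulting tree. The grammar, the tuple-based tree shape, and every name here are assumptions made up for this sketch. The lexer just splits on whitespace, so tokens must be space-separated.

```python
import operator

# Grammar (one method per rule):
#   expr   := term (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "(" expr ")"
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = text.split()  # crude lexer: whitespace-separated tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        try:
            return float(token)
        except ValueError:
            raise SyntaxError(f"expected a number, got {token!r}") from None


def evaluate(node):
    """Consume the tree: leaves are floats, inner nodes are (op, left, right)."""
    if isinstance(node, float):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


if __name__ == "__main__":
    print(evaluate(Parser("1 + 2 * 3").parse()))  # prints 7.0
```

Consuming the tree is just another recursive function over the same shape the parser built. A pretty-printer or a compiler pass would follow the same pattern as `evaluate`.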

~~~
angara
Regular expressions are also useful for lexing. I like to work from this
example code from the Python standard library docs:
[https://docs.python.org/3.6/library/re.html#writing-a-tokenizer](https://docs.python.org/3.6/library/re.html#writing-a-tokenizer)
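
In the spirit of that docs example, here is a minimal sketch of a regex-driven lexer built from named groups and `re.finditer`. The token names and the tiny token set are assumptions chosen for illustration, not copied from the linked page.

```python
import re

# Order matters: earlier alternatives win, and MISMATCH catches anything left over.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Yield (kind, value) pairs; raise SyntaxError on an unknown character."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        value = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {value!r} at offset {match.start()}"
            )
        yield kind, value


if __name__ == "__main__":
    for token in tokenize("x = 3.5 * (y + 2)"):
        print(token)
```

The catch-all `MISMATCH` group is what keeps `finditer` from silently skipping characters that no other pattern matches.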

~~~
fanf2
Remember, `lex` is just a pile of regular expressions, so it is great for
simple parsing in C. There are good reasons it was known as the Swiss Army
knife of Unix before scripting took over :-)

------
kodablah
For anyone interested in the JVM side, I wrote a fuzzer last year using
afl-like logic: [https://github.com/cretz/javan-warty-pig](https://github.com/cretz/javan-warty-pig)

------
panic
_> I also combed through the outputs to see what sorts of inputs were
succeeding, what was failing, and observe how my program handled various edge
cases. It was rejecting some inputs I thought should be valid, accepting some
I thought should be invalid, and interpreting some in ways I hadn’t intended.
So even after I fixed the crashing inputs, I still made tweaks to the parser
to fix each of these troublesome inputs._

I love just looking through the absurd stuff AFL comes up with, even if it's
not causing a crash or incorrect behavior. Like this bit of art it caused my
parser generator to produce:
[https://i.imgur.com/VoV7cU9.png](https://i.imgur.com/VoV7cU9.png)

------
hyper_reality
This is a great introduction to fuzzing. I love the section about minimising
the corpus of fuzzed inputs and turning that into a test suite. That's
something that should be done with all parsing software!
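
One way that idea could look in Python: a sketch that replays every file in a minimized corpus directory through a parser and records whether each input was accepted or cleanly rejected, letting any other exception fail the run. `parse_config` and the toy `key=value` format are hypothetical stand-ins for whatever parser you fuzzed; the article's own tools are not reproduced here.

```python
import tempfile
from pathlib import Path


def parse_config(data: bytes) -> dict:
    """Hypothetical parser under test: UTF-8 'key=value' lines."""
    result = {}
    for line in data.decode("utf-8").splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed line: {line!r}")
        result[key.strip()] = value.strip()
    return result


def replay_corpus(corpus_dir: Path) -> dict:
    """Run every corpus file through the parser; return the outcome per file name."""
    outcomes = {}
    for path in sorted(corpus_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            parse_config(path.read_bytes())
            outcomes[path.name] = "accepted"
        except ValueError:
            # A clean rejection is fine (UnicodeDecodeError is a ValueError too).
            # Any other exception propagates and fails the run.
            outcomes[path.name] = "rejected"
    return outcomes


if __name__ == "__main__":
    # Stand-in corpus; in practice this would be the fuzzer's minimized output.
    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp)
        (corpus / "ok").write_bytes(b"a=1\nb = 2\n")
        (corpus / "bad").write_bytes(b"novalue\n")
        for name, outcome in replay_corpus(corpus).items():
            print(name, outcome)
```

Checking the recorded outcomes into version control alongside the corpus would also flag inputs that flip between accepted and rejected after a parser change.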

------
evmunro
Fuzzing is super powerful, but can be a bit complicated to set up - that's why
I'm working on a fuzzing-as-a-service platform[1] that automates a bunch of
the steps described here.

If you're interested in trying out fuzzing without having to learn the
intricacies of AFL or set things up manually, let me know[2] and I can get you
set up with an account to play around with.

Happy to answer any questions about AFL/Fuzzbuzz!

[1] [https://fuzzbuzz.io](https://fuzzbuzz.io)

[2] everest [at] fuzzbuzz [dot] io

~~~
rixrax
What kind of closed source / commercial software can be fuzzed on your
platform? E.g. how would you fuzz something like Adobe Lightroom or Autodesk
AutoCAD there? What kind of reports would you provide?

~~~
andrei
The nature of fuzzers like AFL is that you get better results by instrumenting
your code and writing your own harness, but AFL has a "qemu mode" that runs
precompiled binaries in an instrumented VM instead. We'll be adding this to
the platform in the near future.

You won't get the same kind of results that you could by writing your own
harness, but it would still be possible to find crashes, extreme memory usage
or timeout bugs. Using something like libdislocator [1] would allow you to
expose certain memory bugs as well.

[1] [https://github.com/mirrorer/afl/tree/master/libdislocator](https://github.com/mirrorer/afl/tree/master/libdislocator)

------
heyjudy
Someone made afl work with Rust here: [https://github.com/rust-fuzz/afl.rs](https://github.com/rust-fuzz/afl.rs)

~~~
steveklabnik
There’s also [https://github.com/rust-fuzz/cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz)

------
afraca
The test files are tightly coupled to the implementation. He says himself
that when he wants to restructure things, new tests have to be generated. It
seems clumsy, but I don't have strong negative feelings about it.

~~~
caf
That just means that it's a form of white-box testing.

------
crb002
Or why you should write a formal parser instead of writing a YOLO shotgun
parser.

------
Quarrelsome
The fuzzing is interesting, but what's the deal with storing config files as
binaries in 2019? Paradox has tons of text configs and doesn't even bother
minifying them.

Am I underestimating the quantity or something?

~~~
ulysses
The config files are for a game that was announced in 1999 and released in
2003.

~~~
Quarrelsome
He's talking about reworking it in 2018.

> That’s the way things were until mid-2018 when I revisited the project.

~~~
c256
Yes, because retro-gaming is a thing.

~~~
Quarrelsome
Right. I'm just saying, why not rip it out now? Why bother fuzzing parsers if
the parsers themselves aren't really adding value anymore? The ability to edit
your configs in a text editor is pretty valuable isn't it?

Maybe he's just got used to it.

~~~
detaro
He isn't rewriting the game (that was made by someone else), he is rewriting
his tools for modifying the game files.

