The Day I Fell in Love with Fuzzing (nullprogram.com)
350 points by eaguyhn 24 days ago | 47 comments



I haven't tried afl-fuzz myself, although it sounds like world-class awesome software, but I'm a real believer in testing things with David MacIver's Hypothesis, which invokes your functions with random inputs, and then does similar canonicalization and minimization kinds of things. I like Hypothesis so much that when I wrote Dumpulse http://github.com/kragen/dumpulse I added a Python interface to it purely so I could test it with Hypothesis. Which found bugs, of course, even though Dumpulse is around 100 instructions when compiled.


I love the concept of Hypothesis, but I struggle to find a real-life use case. I just can't formulate my assertions; I don't know what to put in them.

Unit tests are easy since I know what the code does, and I can just tell it what to do and what I expect.

But with Hypothesis, I have to find some kind of general property, which I have a hard time doing.

Any tips, or materials I can use?

P.S.: I read your test.py and see you use a rule-based state machine. I've never seen that in any tutorial on Hypothesis I've read. What does it do? How do you use it?


You might find these useful:

https://hypothesis.works/articles/rule-based-stateful-testin...

http://propertesting.com/book_stateful_properties.html

For a long time, I felt the same way you describe: all introductions to property-based testing show you things like testing a function that reverses a list and yes it all looks very useful and nifty, but it's hard to imagine how to formulate interesting variants for more complex code. These two posts were the first to really give me a fuller picture and lots of practical strategies:

https://fsharpforfunandprofit.com/posts/property-based-testi...

https://fsharpforfunandprofit.com/posts/property-based-testi...

(In general, everything I've read/watched by Scott Wlaschin has been super helpful, and I've never written a line of F#.)


Thanks a lot. This is why I still use HN :)


This was my very first comment on HN. You made my day :)


Just read it. Never heard of fsharpforfunandprofit before. Never touched f# either. But damn, that's good.


Welcome!


Welcome!


Fred Hebert (mononcqc) has his actual book out just now on Amazon, IIRC:

https://www.amazon.com/gp/product/1680506218


How I often approach this is to 1) write a couple of unit tests 2) refactor unit tests into data-driven tests, using pytest.mark.parametrize 3) use hypothesis to generate the data. I very often get to 2, but not always to 3.
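The three-step progression above might look like this. (A sketch only; `slugify` is a made-up toy function, not something from the thread.)

```python
import re
import pytest
from hypothesis import given, strategies as st

def slugify(s):
    """Toy function under test: lowercase, replace runs of other chars with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")

# Step 2: unit tests refactored into a data-driven test
@pytest.mark.parametrize("raw,expected", [
    ("Hello World", "hello-world"),
    ("  spaces  ", "spaces"),
])
def test_slugify_examples(raw, expected):
    assert slugify(raw) == expected

# Step 3: Hypothesis generates the data; now we assert general properties
@given(st.text())
def test_slugify_properties(s):
    slug = slugify(s)
    assert re.fullmatch(r"[a-z0-9-]*", slug)  # only safe characters survive
    assert not slug.startswith("-") and not slug.endswith("-")
```

Note how step 3 forces you to state *properties* ("output only contains safe characters") instead of input/output pairs, which is exactly the mental shift the parent comment describes.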

The most basic things to check:

- No valid inputs give an error

- All invalid inputs give a error

These are enhanced if you sprinkle your code with assertions. Writing assertions inside the code is good training for identifying invariants. Something supposed to always be non-negative, check it. Something always supposed to have more than 0 entries, check it. Something supposed to be sorted, check a[0] <= a[1] and a[-2] <= a[-1].
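As a sketch of the idea, here's a hypothetical merge function with its invariants written as plain asserts, so any random-input test that exercises it checks them for free:

```python
def merge_sorted(a, b):
    """Merge two sorted lists into one sorted list."""
    # Input invariants: both lists must already be sorted
    assert all(a[i] <= a[i + 1] for i in range(len(a) - 1)), "input a not sorted"
    assert all(b[i] <= b[i + 1] for i in range(len(b) - 1)), "input b not sorted"
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out += a[i:] + b[j:]
    # Output invariants: nothing lost or duplicated, and result is sorted
    assert len(out) == len(a) + len(b)
    assert all(out[k] <= out[k + 1] for k in range(len(out) - 1))
    return out
```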

Often you can test symmetry-type properties:

- deserialize(serialize(in)) == in.

- init(); connect(); disconnect(); -> state identical as just init();
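A round-trip property like deserialize(serialize(in)) == in translates almost directly into Hypothesis. A sketch, using the stdlib json module as a stand-in for your own serializer (floats are left out of the strategy to sidestep NaN != NaN):

```python
import json
from hypothesis import given, strategies as st

# Recursive strategy for JSON-representable values
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)

@given(json_values)
def test_round_trip(value):
    # The symmetry property: serialize then deserialize is the identity
    assert json.loads(json.dumps(value)) == value

test_round_trip()  # @given-wrapped tests run many random cases when called
```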

Or you can use an 'oracle', something external that tells you whether the output is correct or not.

- allTricksOptimized(in) == simpleReferenceImplementation(in)

- myImplementation(in) == thirdparty.implementation(in)


Thanks. felixyz's links did contain all those, but I'll remember the trick about pytest.mark.parametrize as a step toward Hypothesis.


Well, normally Hypothesis just calls a function with some random arguments and checks that it "works", operationalized as "doesn't crash". But that's inadequate for testing stateful APIs, where a single call in isolation isn't meaningful; you need to exercise a whole series of calls. The rule-based state machine gives you a way to describe the valid series of calls, and Hypothesis then generates 100 random sequences according to your spec and verifies that they all work. David wrote the article about it in 2017 that felixyz already linked: https://hypothesis.works/articles/rule-based-stateful-testin...
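For a concrete sketch: here's a minimal rule-based state machine, where the KVStore class is a hypothetical stand-in for whatever stateful API you're testing, checked against a plain dict as the model:

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant

class KVStore:
    """Hypothetical stateful API under test."""
    def __init__(self):
        self._d = {}
    def put(self, k, v):
        self._d[k] = v
    def delete(self, k):
        self._d.pop(k, None)
    def size(self):
        return len(self._d)

class KVStoreMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.store = KVStore()  # system under test
        self.model = {}         # trusted model of intended semantics

    @rule(k=st.text(), v=st.integers())
    def put(self, k, v):
        self.store.put(k, v)
        self.model[k] = v

    @rule(k=st.text())
    def delete(self, k):
        self.store.delete(k)
        self.model.pop(k, None)

    @invariant()
    def sizes_agree(self):
        # Checked after every step of every generated call sequence
        assert self.store.size() == len(self.model)

# Exposing .TestCase lets pytest/unittest run the random sequences
TestKVStore = KVStoreMachine.TestCase
```

Hypothesis drives random sequences of put/delete calls and, on failure, shrinks the sequence to a minimal reproducer, much like afl-tmin does for fuzzer inputs.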

In many cases, though, it's better if you can refactor your code to support a stateless interface, and then you can write simple single-step function tests.

How do you know what to test? There's a whole spectrum. Just testing that valid inputs don't crash is often pretty useful. (If you're not sure what valid inputs would be, maybe try writing the README before the tests.) In the case of Dumpulse, I went to the other extreme; its actual semantics are extremely dumb (thus the name), so the test suite includes a reimplementation in Python of the intended semantics, and the test verifies that the implementation in C has the same behavior. (But in constant space and strictly limited runtime, which the Hypothesis test doesn't check.)

I picked a few functions and classes at random to see what kind of testable properties I could find:

- sympy.polys.rationaltools.together: this function gives a testable assertion in its docstring: "``apart(together(expr))`` should return expr unchanged." That's an easy property to test. Also, though, its output expression is intended to be equivalent to its input expression, so if you put together some kind of generator of expressions (which is actually the hard part in this context), you could verify that together(expr) always evaluates to the numerically same value as expr.

- horizons.world.buildability.settlementcache.SettlementBuildabilityCache: this seems to be a storage system that caches some data and then incrementally updates the cache as the data is modified. The desired property seems to be that adding some data to the cache, then modifying it, gives you the same results as you would have gotten just by adding the modified data to an empty cache. Also I think there's a serialization aspect, so you could check to see if loading the cache from a binary blob gives you the same results as just creating it in memory.

- statsmodels.datasets.macrodata.data.load: this function loads a CSV file from disk. Really all you can test about it, without interposition at the filesystem layer, is that it returns some data. It doesn't really take any inputs. A lot of installation problems could cause it to throw an error, though, so that would be a useful test in some circumstances.

- sre_constants: if executed as a script, this program should generate a syntactically valid .h file in sre_constants.h. You should be able to feed that to a C compiler to verify that, but if you care about that, your build system is probably already doing it!

Do you want to point me at some code of yours so I can see what kind of useful property-based tests could be written for it?


Thank you. Yes, please give me an example on a few functions from: https://github.com/Tygs/ayo/blob/master/ayo/scope.py


Damn it, I really wanted that example :(


I've started using Hypothesis to test my code when I'm working on Kattis problems, if I get a runtime exception of some form. It's especially valuable considering that the official execution environment is essentially a black box: I don't know the actual inputs that fail, and I can't see the exception raised to figure out what went wrong. In almost every case, I just have Hypothesis generate the test inputs and soon discover my bugs.


> When I got started, I had just learned how to use yacc (really Bison) and lex (really flex)

I've dug into parsers a few times, but everything I encountered seemed to think that once I had a parsed tree of commands it was obvious how to consume it...and it wasn't (for me). I've never had the free time to dedicate to experimenting that abstractly, so anytime I'm tempted to write a DSL or similar for a current problem I punt and do something else.

Does anyone have advice on how to find nice ways to use parsers/lexers in a practical way that doesn't involve a massive investment of time or assume I have a lot of abstract comp-sci background? Everything I've seen has been more of a compilers course, which seems like overkill.


IMHO you're often best off writing a simple recursive descent parser by hand. For lexing you can often get away with splitting the input string at word boundaries.

For slightly more advanced needs, parser combinator libraries make parsing and lexing quite straightforward. I honestly wouldn't use a parser generator.
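For what it's worth, a hand-written recursive descent parser can be quite small. A sketch for arithmetic expressions, with one method per grammar rule:

```python
import re

def tokenize(src):
    # Crude lexer: numbers and the four punctuation tokens we care about
    # (anything else is silently skipped; fine for a sketch)
    return re.findall(r"\d+|[()+*]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok=None):
        t = self.peek()
        if t is None or (tok is not None and t != tok):
            raise SyntaxError(f"expected {tok!r}, got {t!r}")
        self.pos += 1
        return t

    # expr -> term ('+' term)*
    def expr(self):
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value += self.term()
        return value

    # term -> atom ('*' atom)*
    def term(self):
        value = self.atom()
        while self.peek() == "*":
            self.eat("*")
            value *= self.atom()
        return value

    # atom -> NUMBER | '(' expr ')'
    def atom(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())
```

Parser(tokenize("2+3*(4+1)")).expr() evaluates to 17. Each grammar rule is one method, so extending it (subtraction, unary minus, variables) is just adding methods, with no generator tooling to learn.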


Regular expressions are also useful for lexing. I like to work from this example code from the Python standard library docs: https://docs.python.org/3.6/library/re.html#writing-a-tokeni...
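A condensed version of that stdlib tokenizer pattern: one alternation of named groups, scanned with finditer (the token names here are arbitrary):

```python
import re

TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("IDENT",    r"[A-Za-z_]\w*"),
    ("OP",       r"[+\-*/=]"),
    ("SKIP",     r"\s+"),       # whitespace: matched but discarded
    ("MISMATCH", r"."),         # any other character is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(code):
    for m in TOKEN_RE.finditer(code):
        kind = m.lastgroup      # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        yield (kind, m.group())
```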


Remember, `lex` is just a pile of regular expressions, so it is great for simple parsing in C. There are good reasons it was known as the Swiss Army knife of Unix before scripting took over :-)


IMHO recursive ascent is both easier to write and faster.


I could never understand why you have a parser as a separate thing when this was presented as an up-front decision. Reading this series of posts, where something like a traditional compiler architecture emerges in the same way that normal program architecture emerges via gradual refactoring, made it make a lot more sense to me: https://hokstad.com/compiler .


What skills/languages do you have? It can help tune the suggestions.

Barring that, you can google "compiler tutorial" and just start looking for one you like. One nice thing about compilers is they consist of a lot of stages and while the whole is arguably greater than the sum of its parts, the parts are all pretty darned useful on their own, too.


Javascript is my strongest, and I have enough Python to be comfortable - I had a lot of Perl back in the day but that knowledge is fairly rusty (though Parse::RecDescent was my first exposure to parsers). I did Java for a few years, but I refer to those as my Dark Times and wouldn't want to venture back there. (Experienced Perl dev doing Java is NOT a good time for either the dev or the code)

I can't argue about the broad applicability of compilers, but the main reason I've avoided such tutorials is that I'm looking for something I can get the basics of in hours/days rather than weeks/months.


For the record, neither is an experienced Java dev doing Perl.


If you got to the point of having a tree, but found them clumsy to work with, the first thing I'd suggest is to play with an AST explorer. https://astexplorer.net/ is quite good.

Picture the task of what you want to do in your head. Maybe you're writing a linter, and you want to enforce a rule like "never use the 'var' keyword". Write some example code and poke around in the explorer.

Another thing to keep in mind is that once you have a tree, usually you also want to have some tree traversal utilities, so that you can walk over the tree and optionally transform it.
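As a sketch of that idea in Python, using the stdlib ast module, with a toy "never call eval" rule in place of the "never use var" example:

```python
import ast

class NoEvalRule(ast.NodeVisitor):
    """Toy lint rule: record the line of every call to eval()."""
    def __init__(self):
        self.violations = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            self.violations.append(node.lineno)
        self.generic_visit(node)  # keep walking into child nodes

def lint(source):
    rule = NoEvalRule()
    rule.visit(ast.parse(source))
    return rule.violations
```

The NodeVisitor base class is exactly the kind of traversal utility the parent comment means: you only write the per-node-type hooks you care about, and the walking logic is handled for you.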


Actually, a "compilers course" is exactly what you should be looking for, IMHO. Compilers are basically apps that parse input and output something in response to it, which is what this is all about. Lex and yacc are very simple to use once you understand the core principles.


Tbh parser combinators and PEG parsers are the modern way of doing these sorts of things.


PEGs (parsing expression grammars) are probably by far the easiest way currently.


For anyone interested on the JVM side, I wrote a fuzzer last year using afl-like logic: https://github.com/cretz/javan-warty-pig


> I also combed through the outputs to see what sorts of inputs were succeeding, what was failing, and observe how my program handled various edge cases. It was rejecting some inputs I thought should be valid, accepting some I thought should be invalid, and interpreting some in ways I hadn’t intended. So even after I fixed the crashing inputs, I still made tweaks to the parser to fix each of these troublesome inputs.

I love just looking through the absurd stuff AFL comes up with, even if it's not causing a crash or incorrect behavior. Like this bit of art it caused my parser generator to produce: https://i.imgur.com/VoV7cU9.png


This is a great introduction to fuzzing, I love the section about minimising the corpus of fuzzed inputs and turning that into a test suite. That's something that should be done with all parsing software!


Fuzzing is super powerful, but can be a bit complicated to set up - that's why I'm working on a fuzzing-as-a-service platform[1] that automates a bunch of the steps described here.

If you're interested in trying out fuzzing without having to learn the intricacies of AFL or set things up manually, let me know[2] and I can get you set up with an account to play around with.

Happy to answer any questions about AFL/Fuzzbuzz!

[1] https://fuzzbuzz.io

[2] everest [at] fuzzbuzz [dot] io


What kind of closed source / commercial software can be fuzzed on your platform? E.g. how would you fuzz something like Adobe Lightroom or Autodesk AutoCAD there? What kind of reports would you provide?


The nature of fuzzers like AFL is that you get better results by instrumenting your code and writing your own harness, but AFL has a "qemu mode" that runs precompiled binaries in an instrumented VM instead. We'll be adding this to the platform in the near future.

You won't get the same kind of results that you could by writing your own harness, but it would still be possible to find crashes, extreme memory usage or timeout bugs. Using something like libdislocator [1] would allow you to expose certain memory bugs as well.

[1] https://github.com/mirrorer/afl/tree/master/libdislocator


How does this compare to oss-fuzz [0]? Is the main value proposition that it's easier to set up?

[0]: https://github.com/google/oss-fuzz/


Hey - I'm the other guy working on Fuzzbuzz

It's similar to oss-fuzz in terms of functionality, in that it lets you integrate fuzzing into your dev workflow by automatically pulling your latest code, fuzzing in the background, alerting you on bugs, running regression testing, etc.

It differs in that while oss-fuzz is only for select large open-source projects, Fuzzbuzz lets anyone sign up and begin fuzzing their code. We also support more languages - the usual C/C++ as well as Golang, Python and Ruby, with more in the pipeline.


Someone made AFL work with Rust here: https://github.com/rust-fuzz/afl.rs



The test files are coupled tightly to the implementation. He says himself that when he wants to restructure things, new tests have to be generated. It seems clumsy, but I don't have strong negative feelings about it.


That just means that it's a form of white-box testing.


Or why you should write a formal parser instead of writing a YOLO shotgun parser.


The fuzzing is interesting, but what's the deal with storing config files as binaries in 2019? Paradox has tons of text configs and doesn't even bother minifying them.

Am I underestimating the quantity or something?


The config files are for a game that was announced in 1999 and released in 2003.


He's talking about reworking it in 2018.

> That’s the way things were until mid-2018 when I revisited the project.


Yes, because retro-gaming is a thing.


Right. I'm just saying why not rip it out now? Why bother fuzzing parsers if the parsers themselves aren't really adding value anymore? The ability to edit your configs in a text editor is pretty valuable isn't it?

Maybe he's just got used to it.


He isn't rewriting the game (that was made by someone else), he is rewriting his tools for modifying the game files.



