Go: Fuzzing Is Beta Ready (golang.org)
306 points by ingve on June 4, 2021 | 53 comments



Someone (Hillel Wayne?) observed that fuzz testing and property testing are basically the same thing, but the communities are almost entirely disjoint so the tools are completely separate.


I think property testing is great, but stuff like AFL (which uses instrumentation to basically find your conditionals for you, and works backwards to modify the data to reach them) is very different from property testing.

My experience (real world, with actually existing code) is that property tests often require a lot of fiddling with the data generation in order to actually stress your system in interesting ways. If you just throw totally random data at your system, you won't be testing very interesting properties. Amazing assurances and payoff from doing it, of course! But "just generate random arbitraries" works a lot less well when you're working with 100-field structs and only 10 or so of the fields matter in the senses you care about.

To my knowledge I have not seen a property testing toolkit that leverages code coverage in the way fuzzing does.


Python's Hypothesis did coverage-guided property-based testing for a while.

They turned it off (not sure if they turned it on again): the problem is that property-based tests of the kind implemented with Hypothesis are mainly supposed to be run as part of your CI/CD test suite. So they should be fast.

Coverage-guided fuzzing of the kind AFL does usually takes longer than people are prepared to wait on their build/test server.


I've recently launched HypoFuzz - https://hypofuzz.com/ - which solves this by running it on a different server.

You run the tests briefly in CI to check for regressions, and then leave them running permanently on a fuzz server to search for new bugs. Nelson Elhage has a good writeup of this approach at https://blog.nelhage.com/post/two-kinds-of-testing/


Awesome!


There are some examples; Crowbar, for instance, is an OCaml tool that uses AFL to drive property-based tests.


People can have different definitions and still communicate usefully, and I think there is not 100% agreement on the exact boundaries between the two.

That said, for me: they are distinct but related, and that distinction is useful.

For example, Hypothesis[1] is a popular property testing framework. The authors have more recently created HypoFuzz[2], which includes this sentence in the introduction:

“HypoFuzz runs your property-based test suite, using cutting-edge fuzzing techniques and coverage instrumentation to find even the rarest inputs which trigger an error.”

Being able to talk about fuzzing and property testing as distinct things seems useful — saying something like “We added fuzzing techniques to our property testing framework” is more meaningful than “We added property testing techniques to our property testing framework” ;-)

My personal hope is that there will be more convergence, and that work to add convenient first-class fuzzing support to a popular language like Go will help shift the primary use case for fuzzing toward correctness, with security becoming an important but secondary use case.

[1] https://hypothesis.works

[2] https://hypofuzz.com


They are fundamentally the same in a lot of important ways. Any fuzz test is almost certainly (if not entirely certainly) a property test.

In Rust the driver for property and fuzz testing can be shared, which is nice[0].

[0] https://docs.rs/arbitrary/1.0.1/arbitrary/trait.Arbitrary.ht...

By describing my data with Arbitrary, I've written programs that had both traditional property testing and fuzz testing, without any additional effort.

It's only really post-AFL that fuzzing has become synonymous with instrumentation-guided data generation. That's a totally fine distinction to make, but they're still fundamentally equivalent: the first fuzzers were virtually identical to the property testing frameworks of today.

I'd be fine with dropping the "fuzzing" name entirely and just using the term property testing, with "data generation" being the thing we differentiate on, e.g. "random prop testing", "instrumentation-guided prop testing", or "type-based prop testing".


My understanding is that the difference between fuzz testing and property testing is how the input is crafted. Both can be viewed as a pair of things: a function that generates a series of bits as input, and a way to turn those bits into the appropriate data under test.

Property testing generates these bits using a specified distribution, and that's about it. Fuzz testing generates these bits by observing how the program executes, and uses that feedback to try to explore all paths in the program.

Most libraries for property testing come with very convenient ways to craft the "input to data" part. Fuzz tools come with an almost magically effective way to craft interesting inputs. The two combine very well (and have been combined in several libraries).


Yep, but viewed from the other side they have the same goal: produce interesting inputs on which the program might exhibit trouble.

This is why you can use either approach to help the other.

The third approach is concolic testing: use an SMT/SAT solver to flip branches. The path down to a branch imposes a formula; by negating the last condition in it, you describe the other side of the branch. You then ask the SMT solver whether the negated formula is satisfiable. If it is, the satisfying assignment is a new input that drives execution down that unexplored path.
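
To make that concrete, here is a toy sketch (hypothetical function; the path conditions a concolic tester would collect are written as comments):

    package concolic

    // classify is a toy function under test. A concolic tester runs it on a
    // concrete input, records the condition of each branch taken, negates the
    // last condition, and asks an SMT solver for a satisfying assignment,
    // which becomes the next input to try.
    func classify(x, y int) string {
        if x > 10 { // path condition so far: x > 10
            if y == x*2 { // path condition: (x > 10) && (y == 2*x)
                return "bug" // a solver model such as x=11, y=22 reaches here
            }
        }
        return "ok"
    }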


Being thematically similar is not that interesting, if one works better than the others.

Coverage guided fuzzing is eating symbolic execution and concolic testing for breakfast. It isn't even close. As much as I love these more principled approaches, the strategy of throwing bits at programs and applying careful coverage metrics is just way more effective, even for cases that seem to be hand picked for SMT-based approaches to win.


> Property testing generates these bits using a specified distribution, and that's about it

I think most property testing frameworks also come with the concept of "shrinking", which walks a failing input back to find a minimal input that still triggers the failure. Though I am sure there are PT frameworks that haven't implemented this.
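
For illustration, here is a minimal greedy sketch of the idea (not how any particular framework implements it): delete progressively smaller chunks of the failing input, keeping every candidate that still fails.

    package shrink

    // shrink deletes chunks of a failing input while the failure persists,
    // returning a locally minimal input. fails reports whether the system
    // under test still misbehaves on the candidate input.
    func shrink(input []byte, fails func([]byte) bool) []byte {
        for chunk := len(input) / 2; chunk >= 1; chunk /= 2 {
            for i := 0; i+chunk <= len(input); {
                candidate := append(append([]byte{}, input[:i]...), input[i+chunk:]...)
                if fails(candidate) {
                    input = candidate // smaller input still fails; keep it
                } else {
                    i += chunk // this chunk is needed to reproduce; skip it
                }
            }
        }
        return input
    }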


If memory serves right, AFL has some support for something like shrinking, via afl-tmin (its test case minimizer).

Of all the property based testing libraries, hypothesis has some great ideas on shrinking. (By comparison, Haskell's QuickCheck is actually pretty bad at shrinking.)


Fuzz testing doesn't mean that the program execution is monitored in any more advanced way than detecting crashes. The fancy execution path feedback stuff came about relatively recently in the history of fuzzing.


Reading about fuzz testing, all I could think of was "is this not property testing?". That is strange. It sounds like both communities could learn a lot from each other, unless there is something I am missing (there probably is...).


In my mind fuzz testing is external, black-boxed, and often coverage-guided, while prop testing is more structured and internal (language-aware), but generally less guided or entirely unguided.

This is closer to what I see as property testing than fuzzing, although it looks like they plan on coverage feedback (i.e., guided generation).


Property-based testing requires you to define the condition of success or failure (the "property" you're testing for) too, right? Whereas fuzzing just looks for crashes?


> Property-based testing requires you to define the condition of success or failure (the "property" you're testing for) too, right?

That property can be "does not crash".

And I'd say this is the structured / language-awareness part: with fuzzing you can't generally build an oracle.

And if you can, it's of course trivial: just have a wrapper script check the result against the oracle, and trigger whatever the fuzzer looks for to indicate "failure", whether that's a return code or a segfault or…
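
In the Go design the "property" is just whatever assertions the fuzz function makes, so building in an oracle is easy. A minimal sketch (Reverse is a hypothetical function under test; note that a coverage-guided fuzzer will quickly discover that this property fails on invalid UTF-8, which is exactly the kind of bug fuzzing surfaces):

    package reverse_test

    import "testing"

    // Reverse is a hypothetical function under test.
    func Reverse(s string) string {
        r := []rune(s)
        for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
            r[i], r[j] = r[j], r[i]
        }
        return string(r)
    }

    // FuzzReverse checks a property rather than just "does not crash":
    // reversing twice should give back the original string.
    func FuzzReverse(f *testing.F) {
        f.Add("hello") // seed corpus entry
        f.Fuzz(func(t *testing.T, s string) {
            if got := Reverse(Reverse(s)); got != s {
                t.Errorf("Reverse(Reverse(%q)) = %q", s, got)
            }
        })
    }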


Fuzzing is more about ensuring safety, and property checking about ensuring correctness, IMHO. The two are related but not the same.


Surely the coverage of fuzz testing is a superset of property testing?


Right. That’s the point - they are both conceptually the same thing, testing via automatically generated inputs.


DeepState [1] is a tool that lets you write Google Test-style unit tests, as well as property tests, in either C or C++, and plug in fuzzers and symbolic executors. That is, DeepState bridges this gap between fuzz testing and property testing.

[1] https://github.com/trailofbits/deepstate


I expect the observation has been made many times, but one particular example of note is https://danluu.com/testing/


I've heard that a bit recently too. I think it's more that people selling property testing tools are trying to sell them as fuzzing tools to unsuspecting suckers.


The common point being: covering the input space, right?


IIRC there was this [1] issue that some people pushed for a couple of years. Then at some point, this other one [2] became the new one for it (with Katie Hockman as the issue creator).

It's been a multi-year effort, so congrats to those who've made it happen.

[1] https://github.com/golang/go/issues/19109

[2] https://github.com/golang/go/issues/44551


There is a good LWN article that gives a useful overview of the current proposal and also briefly touches on some of the history:

https://lwn.net/Articles/829242/


That is a good description. Rather than building a corpus genetically (mutating a random input and adding it to the corpus if it gives new coverage), I wonder if we could use static analysis to generate a corpus in a single shot? I.e., statically identify the branches and pick inputs that cover each branch?


Randomizing tests in any way is good and useful in theory, but am I the only one finding that it ends up being counterproductive in practice?

What I found over time is that it erodes the team's trust in the test suite. When you work on feature A, the last thing you want is for the tests of unrelated features B and C to randomly fail and prevent merging your work.

So what becomes the norm is re-running the tests until they pass, rather than understanding and fixing the failure.

Over time, such issues accumulate and it becomes harder and harder to get lucky enough for everything to pass.

While the right thing to do in theory would be to always fix it, that's not always possible depending on how the team and tasks are organized, and being bothered by unrelated stuff during a specific task is just annoying.

That said, this package could be useful if it's pseudo-random (maybe it is, I didn't look into the details).


When a bad input is found, it gets saved to your corpus and gets checked on all subsequent runs. So re-running the test won't make it go away.
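
For reference, in the Go design each discovered crasher is written as a plain file under testdata/fuzz/<FuzzTestName>/ and replayed by ordinary go test runs. The file encodes each typed argument, roughly like:

    go test fuzz v1
    string("\xbd")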


Which contravenes a fundamental good practice of testing: Each test should run independently and be stateless.

Also, this would not work with most (all?) CI runs (which are inherently stateless), and working around this would be complex and introduce a lot of negative side effects.


Well, the alternative point of view is to see that when you have an infinite number of tests, you can't run them all, and need to sample them.

(Saving the seeds to the corpus still leaves your tests independent and stateless. You are just sampling from your space of all possible tests a bit more intelligently in future.)


What are the benefits of building out a whole separate part of the test framework that handles this? Is there a way to fuzz non-string inputs?


> What are the benefits of building out a whole separate part of the test framework that handles this?

Instead of making it part of the base runner? Maybe so the beta bits can work and it’ll be folded in more directly afterwards?

Also, if it includes coverage guiding, being able to know whether a run is fuzzing or non-fuzzing would avoid having to include the fuzzing instrumentation for non-fuzzing runs, mayhaps?

> Is there a way to fuzz non-string inputs?

From the design doc it looks like you can have any number of parameters, and:

> Fuzzing of built-in types (e.g. simple types, maps, arrays) and types which implement the BinaryMarshaler and TextMarshaler interfaces are supported.
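
For example, here is a sketch against that API, using a standard-library round-trip property over several typed parameters:

    package split_test

    import (
        "strings"
        "testing"
    )

    // FuzzSplitJoin takes three typed parameters; the engine mutates all of
    // them. Property: for n != 0, joining the pieces of SplitN restores s.
    func FuzzSplitJoin(f *testing.F) {
        f.Add("a,b,c", ",", -1) // seed corpus entry: two strings and an int
        f.Fuzz(func(t *testing.T, s, sep string, n int) {
            if n == 0 {
                return // SplitN returns nil for n == 0; no round trip expected
            }
            if got := strings.Join(strings.SplitN(s, sep, n), sep); got != s {
                t.Errorf("round trip: %q != %q", got, s)
            }
        })
    }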


I was thinking a library to generate fuzzed inputs that you could use in normal tests.

It's good to hear that it's not limited to strings.


> I was thinking a library to generate fuzzed inputs that you could use in normal tests.

The fuzzer input is random data. You don't need a special library to generate random data. In fact that's what the post tells you with respect to type compatibility:

> types which implement the BinaryMarshaler and TextMarshaler interfaces are supported

these are just tools to convert from/to binary (unstructured or UTF-8).


Are you sure it's random bytes? Some fuzzers start with a given input and then mutate it to increase coverage of the code under test.


> Some fuzzers start with a given input and then mutate it to increase coverage of the code under test.

Yes, and this one does, but you requested a library to generate inputs, didn't you? A standalone library can't get coverage feedback from the SUT, which, as I wrote in my previous comment, is why you'd build a dedicated test framework.


> Is there a way to fuzz non-string inputs?

Fuzzing usually revolves around strings because of escape characters and escape sequences. There is a much larger set of string characters than the 10 or so numeric digits. Numbers don't have the same problems that strings do, because numbers are usually interpreted only as data, whereas strings can be interpreted as data or as computation.


> Fuzzing usually revolves around strings because of escape characters, escape sequences.

Not always. AFL has been used to detect issues around processing plain old binary data (e.g. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-8637).

I would argue that anything involving a parser of some description (either binary or text-based) is a good candidate for fuzzing.


>What are the benefits of building out a whole separate part of the test framework that handles this?

I'm guessing that you won't always want to execute these alongside other tests. Go has also taken the same approach with benchmarks.


Fuzz testing is expensive.

And you gotta start somewhere. String inputs are a good start, and you can use those to test other inputs by factoring through conversion functions.
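
A sketch of that factoring (point is a hypothetical structured input; the fuzzer only ever sees flat bytes, and a conversion function builds the structured value):

    package conv_test

    import (
        "encoding/binary"
        "testing"
    )

    // point is a hypothetical structured input for the code under test.
    type point struct{ x, y int32 }

    // FuzzPoint factors fuzzing through a conversion function: raw bytes in,
    // structured value out, so the engine never needs to know about point.
    func FuzzPoint(f *testing.F) {
        f.Fuzz(func(t *testing.T, data []byte) {
            if len(data) < 8 {
                return // need 8 bytes to fill two int32 fields
            }
            p := point{
                x: int32(binary.LittleEndian.Uint32(data[0:4])),
                y: int32(binary.LittleEndian.Uint32(data[4:8])),
            }
            _ = p // exercise the real code under test with p here
        })
    }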


Any details about how the mutator works? The design doc hints at a “coverage-based” mutator, but I can’t see anything specific about how it works or even if that was implemented.



And the code: https://github.com/golang/go/blob/dev.fuzz/src/internal/fuzz...

Edit: and here's the mechanism that guides mutations towards increased coverage: https://github.com/golang/go/blob/5542c10fbf19cb199d1659c189...


Interesting. I've used gopter a lot for property-based testing, though it's very complex (and impressive), and can get slow or require hacks for complex types.

I'm glad this is being made, but, like many other things that have been added to Go, it shows the limitations of the language: you can't just build this inside the language. (I might be wrong, as I haven't had a chance to look at the design docs/implementation yet, but the installation instructions imply that's the case.)


> it shows the limitations of the language that you can't just build this inside the language.

Not sure why you'd make that assumption. https://github.com/dvyukov/go-fuzz


Yeah go-fuzz is an awesome tool, which I've used extensively on some of my own projects.

When writing parsers and compilers it has proven eerily good at identifying corner cases (panics and infinite loops).

I'm looking forward to trying the new approach out. Anything that makes fuzz-testing easier to configure/maintain and spreads awareness is a good thing in my book.


go-fuzz requires an instrumented binary. It's essentially a forked Go compiler. So I think it's fair to call this a limitation of the language. But I don't see that as a negative. :)


How does fuzzing compare to something like Quickcheck? Are they basically equivalent?


QuickCheck gives you a mechanism for a form of unit testing where you build valid values as test cases and test that your code maintains specific expected properties.

Fuzzing is similar, but typically involves starting from a known-good input and then randomising it at the byte level (irrespective of validity). This project allows for property-testing-like unit tests, but tools like american fuzzy lop focus on detecting whole-application crashes.
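
For comparison, Go's standard library has long shipped a small QuickCheck-style checker, testing/quick, which generates typed values and checks a boolean property:

    package sortprop_test

    import (
        "sort"
        "testing"
        "testing/quick"
    )

    // TestSortProperty checks, over randomly generated []int values,
    // that sorting always yields a sorted slice.
    func TestSortProperty(t *testing.T) {
        prop := func(xs []int) bool {
            sort.Ints(xs)
            return sort.IntsAreSorted(xs)
        }
        if err := quick.Check(prop, nil); err != nil {
            t.Error(err)
        }
    }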


Is there a law of some kind that says the sum of the time taken to compile and test code is a language-independent constant? Either our testing tools take time to run, or our compiler does, or a mix of both... if we want robust software, that is.


I don’t think this is true. Python tests take a loooong time to run, while (at the other extreme) Go tests can compile and run nearly instantly, at least based on my 10-15 years of experience with both languages.




