Squashed Bugs, with Failure Rate Heatmaps (pingcap.com)
115 points by roecuco on Dec 21, 2019 | 18 comments



I love all these developments. The hard part is defining what failure looks like; that can sometimes be as hard as the rest of the work of partitioning the code until the bug has too little room to hide.

The best protection against bugs is still not found in debugging fast or automatically but in reducing scope right at the design stage: the smaller the scope, the easier it is to find bugs. This is one of the reasons why I think functional languages will win out in the very long term; they reduce scope in a very natural way because they tend to be almost side-effect free unless you explicitly code that in.
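To make the scope argument concrete, here is a minimal Rust sketch (everything in it is invented for illustration): the pure function's result depends only on its arguments, so a bug can only hide in its own few lines, while the stateful version drags every earlier caller into scope.

    // A pure function: the result depends only on the arguments, so a
    // bug can only hide in these few lines.
    fn pure_total(prices_cents: &[u64], tax_pct: u64) -> u64 {
        let subtotal: u64 = prices_cents.iter().sum();
        subtotal * (100 + tax_pct) / 100
    }

    // The same computation threaded through shared mutable state: the
    // result now also depends on everything that touched `running_total`
    // earlier, so the whole program is in scope when this breaks.
    fn stateful_total(running_total: &mut u64, prices_cents: &[u64], tax_pct: u64) -> u64 {
        for &p in prices_cents {
            *running_total += p;
        }
        *running_total * (100 + tax_pct) / 100
    }

    fn main() {
        let prices = [1000, 2000];
        assert_eq!(pure_total(&prices, 10), 3300); // always 3300

        let mut running_total = 0;
        assert_eq!(stateful_total(&mut running_total, &prices, 10), 3300);
        // Same inputs, different answer on the second call.
        assert_eq!(stateful_total(&mut running_total, &prices, 10), 6600);
    }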

Languages that make it easy to access stuff that is out of scope tend to feel very productive at the beginning of a project, but you get bogged down quickly because everything starts to interact with everything else, and soon it is too much to keep in your head at once.


It may not be viable for "real" programming, but what I like these days is to just write some code without even trying to think too clearly about what I am doing or the details, and sprinkle in asserts that encode how things should be if I were actually following my plan.

Then I run it interactively with source debugging, and when anything goes wrong I do some editing; more than half the time I don't even have to restart the program from the beginning.

And the more subtle the things I "assert", the more direction I get when they are violated. It eliminates a lot of effort in sorting out incorrect ideas: I can just encode them and let the computer find the problem.
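A minimal Rust sketch of that style, with a hypothetical merge function: the asserts encode the working assumptions, so when one fires it points at the wrong idea rather than at a mysterious downstream symptom.

    // Hypothetical sketch of assert-driven exploration: merge two sorted
    // lists while encoding assumptions as asserts instead of thinking
    // everything through up front.
    fn merge(a: &[i32], b: &[i32]) -> Vec<i32> {
        // Assumption: both inputs arrive sorted. If this fires, the bug
        // is in whoever produced the input, not in this function.
        assert!(a.windows(2).all(|w| w[0] <= w[1]), "input `a` not sorted");
        assert!(b.windows(2).all(|w| w[0] <= w[1]), "input `b` not sorted");

        let (mut i, mut j, mut out) = (0, 0, Vec::new());
        while i < a.len() || j < b.len() {
            let take_a = j == b.len() || (i < a.len() && a[i] <= b[j]);
            out.push(if take_a { a[i] } else { b[j] });
            if take_a { i += 1 } else { j += 1 }
        }

        // Subtler assumptions: nothing lost, ordering preserved. The more
        // specific the assert, the more direction a failure gives.
        assert_eq!(out.len(), a.len() + b.len());
        assert!(out.windows(2).all(|w| w[0] <= w[1]), "merge broke ordering");
        out
    }

    fn main() {
        assert_eq!(merge(&[1, 3, 5], &[2, 4]), vec![1, 2, 3, 4, 5]);
    }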

I've always shied away from Lisp and similar, but I'm definitely reminded of how advocates describe the benefits of developing in an interpreted environment. When I was a kid, I always had the sense that only compiled languages were prestigious, industrial strength, professional, whatever.


Lisps are compiled, generally. It's just that you can add new compiled functions to a running image, and replace existing compiled functions. The running time of compiled Common Lisp can approach that of compiled C code, and sometimes be better.

The dividing line between compiled and interpreted languages has become permeable these days anyway. Perhaps the better dividing line is between languages that are designed to compile to efficient code, vs. languages whose design precludes that. In some cases this may just mean we need smarter compilers. And this is where I really like this kind of language fuzzing: it's a great way to test compilers, and can be extremely helpful at firming up implementations of complicated optimizations.


"Lisps are compiled, generally"

The environment I was working in is compiled too, but like you say, you can modify it while running.


That's roughly how I write Erlang programs, which is probably not how it should be done, but it is surprisingly effective: first write your base cases, then add exceptions until it works.


Author here. Functional languages indeed avoid many code defects by design, and they also make it easier to locate problematic code. That's why we originally chose Rust rather than C++ to develop the TiKV project at PingCAP.

It's hard to implement a bot that debugs automatically; the purpose of this Hackathon project was to develop a tool that improves debugging efficiency. Check out our progress at https://github.com/fuzzdebugplatform/fuzz_debug_platform.


I totally agree with that, mate. I wonder how efficient we developers would be if we could mix good design techniques with good toolchains like these.


One thing I've wondered about with this sort of language fuzzing is how well the fuzz inputs cover the program under test. They had all the tools in place to measure that, but I didn't see them report how well they did.

Ideally, one would like to adjust the fuzzer based on that information to steer executions toward less-covered parts of the code (or, possibly, toward parts you want to focus your testing on).


Author here. Thanks for the great advice. We will apply this experimental project to testing TiDB in the future and post the report to https://github.com/fuzzdebugplatform/fuzz_debug_platform/iss.... Through the statistics of code block coverage, we can not only identify suspicious code blocks but also measure code coverage.

The fuzzer we implemented is driven by BNF grammars, so we can adjust its inputs based on those statistics.
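This is not the project's actual code, but as a rough illustration of what grammar-driven generation looks like, here is a minimal Rust sketch with an invented grammar: each production alternative carries a weight, which is the natural knob to turn when coverage statistics show a construct is under-exercised.

    // Tiny deterministic PRNG so the sketch has no dependencies.
    struct Lcg(u64);
    impl Lcg {
        fn next(&mut self, bound: u32) -> u32 {
            self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            ((self.0 >> 33) as u32) % bound
        }
    }

    // Pick one alternative of a production, biased by weight.
    fn pick<'a>(rng: &mut Lcg, alternatives: &[(u32, &'a str)]) -> &'a str {
        let total: u32 = alternatives.iter().map(|(w, _)| w).sum();
        let mut roll = rng.next(total);
        for &(w, s) in alternatives {
            if roll < w { return s; }
            roll -= w;
        }
        unreachable!()
    }

    // <query> ::= "SELECT" <expr> "FROM" <table> ("WHERE" <pred>)?
    fn gen_query(rng: &mut Lcg) -> String {
        let expr = pick(rng, &[(3, "a"), (3, "b"), (1, "SUM(a)"), (1, "COUNT(*)")]);
        let table = pick(rng, &[(2, "t1"), (2, "t2"), (1, "t1 JOIN t2 ON t1.id = t2.id")]);
        match pick(rng, &[(2, ""), (1, "a > 0"), (1, "b IS NULL")]) {
            "" => format!("SELECT {expr} FROM {table}"),
            pred => format!("SELECT {expr} FROM {table} WHERE {pred}"),
        }
    }

    fn main() {
        let mut rng = Lcg(42);
        for _ in 0..5 {
            // Each query goes to the database under test; results are
            // compared against an oracle or a reference implementation.
            println!("{}", gen_query(&mut rng));
        }
    }

Steering toward less-covered code, as suggested upthread, then amounts to re-weighting the alternatives whose code paths the heatmap shows as cold.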


One of the tricks you should try is adding Swarm Testing (a sketch follows the links below). It tends to make this sort of fuzzing more effective at finding bugs, and I would not be surprised if it improves your coverage as well.

https://dl.acm.org/citation.cfm?id=2336763

https://blog.regehr.org/archives/591
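For readers unfamiliar with the idea, a hedged sketch in Rust (the feature switches are invented): instead of letting every test case draw from all SQL features, each batch flips a coin per feature and generates only from the enabled subset, so features get exercised both together and apart.

    // Hypothetical feature switches for a SQL query generator.
    #[derive(Debug)]
    struct SwarmConfig {
        joins: bool,
        subqueries: bool,
        aggregates: bool,
        window_functions: bool,
    }

    fn new_swarm(seed: &mut u64) -> SwarmConfig {
        let mut coin = || {
            *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
            (*seed >> 63) == 1
        };
        SwarmConfig {
            joins: coin(),
            subqueries: coin(),
            aggregates: coin(),
            window_functions: coin(),
        }
    }

    fn main() {
        let mut seed = 7;
        for _ in 0..4 {
            let cfg = new_swarm(&mut seed);
            // Generate a batch of queries restricted to this feature
            // subset, then draw a fresh subset for the next batch.
            println!("{cfg:?}");
        }
    }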


Thank you for the information; very helpful. I will study it and try it in our tests.


I wonder why they used a SQL fuzzer as opposed to the NIST compatibility guidelines. I've had the honor (torture) of implementing SQL four times, and it was the only way I could effectively test compliance.


Because of interaction effects. You can have something that passes all the compatibility examples yet fails in weird ways on long, strange queries that combine features in ways the examples never do, simply because a finite set of examples cannot span the full, infinite input space.

As a simple example, maybe your system handles select queries fine, and it handles all legal field names, but some bug causes it to do the wrong thing when the third field in your select list is called "ape" because that matches the name of some internal bookkeeping structure you have. Does the code have to be badly designed for that to even be possible? Yes, but that is also the type of code we tend to find bugs in.
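A contrived Rust sketch of that failure mode (entirely hypothetical, just to make it concrete): the engine keeps its bookkeeping in the same namespace as user columns, so one unlucky column name gets clobbered.

    use std::collections::HashMap;

    // Hypothetical version of the parent's "ape" bug: the engine stores
    // internal bookkeeping in the same map as user column values, so a
    // user column that shares a bookkeeping key is silently overwritten.
    const INTERNAL_ROW_MARKER: &str = "ape";

    fn evaluate_row(columns: &[(&str, i64)]) -> HashMap<String, i64> {
        let mut row = HashMap::new();
        for &(name, value) in columns {
            row.insert(name.to_string(), value);
        }
        // Bookkeeping write: clobbers any user column named "ape".
        row.insert(INTERNAL_ROW_MARKER.to_string(), 1);
        row
    }

    fn main() {
        // Passes every "reasonable" compatibility example...
        let ok = evaluate_row(&[("id", 7), ("price", 100)]);
        assert_eq!(ok["price"], 100);

        // ...but a fuzzer combining legal names freely finds the collision.
        let bad = evaluate_row(&[("id", 7), ("price", 100), ("ape", 42)]);
        assert_eq!(bad["ape"], 1); // should be 42: the bookkeeping won
    }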


So, I don't want to sound snarky, because what you are saying is not false; it just misses something. The NIST compatibility guidelines aren't just syntactic: they also check that the isolation levels work correctly, and the tests are meant to be extensive and reproducible. I'm not saying they are perfect, but I would suggest databases have been testing for these particular issues for a long time. Another good tool to check out is Jepsen. Anyway, the NIST compatibility guidelines are underused, and as a result open-source databases miss out.


I agree both would be a good idea; I was just suggesting they might not be sufficient alone. I like that you pointed them out! Sorry for coming across as a bit contrarian.


Ah, now I feel I was wrong to make you feel contrarian. It was the right balance, and it brought knowledge to the post. Thanks for your input.


Why the Yoda title?


It's called fronting, and it's perfectly fine English.



