
Finding bugs in SQLite, the easy way - robin_reala
http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html?m=1
======
CJefferson
I actually find this (as a big C and C++ fan) quite a damning result for C.

It's easy to say "just program well" when bugs occur in badly written
projects, but SQLite is very well tested and generally considered well written.

~~~
f-
As others have said, parsers for complex formats (be it text-based or binary)
are exceptionally hard. There are some classes of C/C++ software where the
choice of a language doesn't have such a striking effect. For parsers, the
effect is hard to ignore.

But parsers are also the kind of stuff you almost always end up writing in
C/C++, and there are semi-compelling reasons for doing so - chiefly,
performance and flexibility. You can disagree and make your pitch for OCaml or
JavaScript or whatever, but really, if we had clearly superior choices, we
wouldn't be dealing with this problem today (or it would be a much more
limited phenomenon). There are some interesting contenders, but the revolution
won't happen tomorrow, no matter how much we talk about it on HN.

Perhaps a more fitting conclusion is that if you are parsing untrusted
documents, our brains are too puny to get it right, and the parser really
needs to live in a low-overhead sandbox. Mechanisms such as seccomp-bpf offer
a really convenient and high-performance way to pull it off.
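Python has no standard-library binding for seccomp-bpf, so this sketch only shows the surrounding pattern, not the syscall filter itself: untrusted input is parsed in a separate, short-lived process, and the parent treats any crash, non-zero exit, or timeout as a rejected document. In a real deployment the child would be a native parser that installs a seccomp-bpf filter before touching the input. The toy length-prefixed format and the names `CHILD_PARSER` and `parse_isolated` are hypothetical.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical toy parser executed in the child process. Input format:
# one length byte followed by that many payload bytes.
CHILD_PARSER = r"""
import os, sys
data = sys.stdin.buffer.read()
if not data or len(data) - 1 < data[0]:
    os.abort()  # stand-in for a memory-safety crash on a truncated record
sys.stdout.buffer.write(data[1:1 + data[0]])
"""


def parse_isolated(data: bytes, timeout: float = 5.0) -> Optional[bytes]:
    """Parse untrusted bytes in a throwaway process; None means 'rejected'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", CHILD_PARSER],
            input=data,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a hang is treated the same way as a crash
    if proc.returncode != 0:
        return None  # crashed (e.g. killed by SIGABRT) or exited with an error
    return proc.stdout
```

The parent never shares memory with the child and only looks at its exit status and output, so a bug in the parser costs one process rather than the whole service.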

~~~
stingraycharles
In C++, we have Boost.Spirit, and for C there is Bison/Flex. Surely those are
better and safer alternatives than hand-rolling your own parser?

~~~
f-
Many of the horribly vulnerable parsers are generated with Bison / Flex, so
it's not exactly a robust solution. Plus, especially for binary formats
(images, videos, etc), it's hand-written or bust.
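For a sense of what "hand-written" means for a binary format, here is a minimal sketch of a PNG chunk walker. The chunk layout (4-byte big-endian length, 4-byte type, data, then a CRC-32 over type and data) follows the PNG specification; the function name `iter_png_chunks` is hypothetical. Python's slicing cannot overrun a buffer, but the explicit check on the attacker-controlled length field is exactly the line a C version must not forget.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(blob: bytes):
    """Yield (chunk_type, data) pairs, validating every length field and CRC."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(blob):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        if len(blob) - pos < 12:
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack_from(">I", blob, pos)
        ctype = blob[pos + 4:pos + 8]
        data_start = pos + 8
        data_end = data_start + length
        # The length comes from the file, so check it against the bytes
        # actually present before using it to locate anything.
        if data_end + 4 > len(blob):
            raise ValueError("chunk length exceeds file size")
        data = blob[data_start:data_end]
        (crc,) = struct.unpack_from(">I", blob, data_end)
        if zlib.crc32(ctype + data) != crc:
            raise ValueError("CRC mismatch")
        yield ctype, data
        pos = data_end + 4
```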

------
contingencies
Who has the best automated fuzzing team, and how fast can they grab this stuff
(seconds from a commit to a major open source repository to automatic
generation of a working exploit)? There are sure to be teams out there already
in the seconds/minutes range for a broad range of cases.

------
nowarninglabel
The link to afl-fuzz in the post is broken; it should be
[http://lcamtuf.coredump.cx/afl/](http://lcamtuf.coredump.cx/afl/)

------
bch
> I realized that the developers of SQLite maintained a remarkably well-
> structured and comprehensive suite of hand-written test cases in their
> repository

Tcl for the win[0].

[0] [https://www.sqlite.org/testing.html](https://www.sqlite.org/testing.html)

------
jevinskie
I guess you can run afl-fuzz alongside LLVM's LibFuzzer for even better
results. Fuzzing is getting very interesting. =)

[http://llvm.org/docs/LibFuzzer.html](http://llvm.org/docs/LibFuzzer.html)

------
StavrosK
Does anyone have a good writeup on getting started with afl-fuzz? I'm a Python
developer and looked into it the other day for fuzzing Python programs, but
gave up after a bit because of the documentation (it seems geared towards
lower-level languages).

~~~
kkl
I'm sure you have checked out the official documentation
([http://lcamtuf.coredump.cx/afl/README.txt](http://lcamtuf.coredump.cx/afl/README.txt))?

Fuzzing Python programs with AFL will be hard. AFL leverages a version of the
GCC compiler to instrument (add additional code to) the resultant binaries.
Because Python is not a compiled language this will be difficult. I'm sure
there is something you could do to make it work.
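To make the instrumentation idea concrete: AFL's compiler wrapper inserts code that records which branches (edges) each input exercised, and the fuzzer keeps any mutated input that reaches new code. The toy sketch below imitates that loop in pure Python, using `sys.settrace` line coverage in place of compiled-in instrumentation. It illustrates coverage-guided fuzzing in general, not how AFL or any Python wrapper for it is implemented, and `target`, `run_traced`, and `fuzz` are hypothetical names.

```python
import random
import sys


def target(data: bytes) -> None:
    """Hypothetical buggy parser: the bug hides behind a 4-byte magic value."""
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                if len(data) > 3 and data[3] == ord("!"):
                    raise ValueError("simulated crash")


def run_traced(data: bytes):
    """Run target(data); return (lines of target executed, whether it crashed)."""
    covered = set()

    def tracer(frame, event, arg):
        # Only trace frames of target(); record each line it executes.
        if frame.f_code is not target.__code__:
            return None
        if event == "line":
            covered.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        target(data)
        crashed = False
    except Exception:
        crashed = True
    finally:
        sys.settrace(None)
    return frozenset(covered), crashed


def fuzz(seed: bytes, max_iters: int = 200_000, rng_seed: int = 0):
    """Mutate corpus entries; keep those reaching new lines; stop on a crash."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(run_traced(seed)[0])
    for i in range(max_iters):
        child = bytearray(rng.choice(corpus))
        # Single mutation strategy: overwrite one byte with a random value.
        child[rng.randrange(len(child))] = rng.randrange(256)
        covered, crashed = run_traced(bytes(child))
        if crashed:
            return bytes(child), i + 1
        if covered - seen:  # new coverage: this input is worth mutating further
            seen |= covered
            corpus.append(bytes(child))
    return None, max_iters
```

Blind random mutation would need on the order of 256^4 tries to hit the magic value; keeping each input that gets one byte further turns that into four separate 1-in-256 searches, which is the whole point of the instrumentation.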

~~~
gsnedders
There's a wrapper for CPython that gets it to implement the tracing code,
python-afl[1]. Alex Gaynor did a basic intro to it a few days ago[2], which is
a better write-up than the project itself has! Of course, this doesn't work so
well with mixed C/Python code.

[1]: [https://bitbucket.org/jwilk/python-afl](https://bitbucket.org/jwilk/python-afl)

[2]: [https://alexgaynor.net/2015/apr/13/introduction-to-fuzzing-i...](https://alexgaynor.net/2015/apr/13/introduction-to-fuzzing-in-python-with-afl/)
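A minimal harness sketch for python-afl, assuming a release that exposes `afl.init()` and ships the `py-afl-fuzz` launcher (check the project's README for your version). The parser `parse_record`, the `--fuzz` flag, and the file name are hypothetical; `afl` is imported only inside the fuzzing entry point so the parser can be unit-tested without python-afl installed.

```python
import os
import sys


def parse_record(data: bytes) -> bytes:
    """Hypothetical parser under test: 1 length byte, then that many payload bytes."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated record")
    return payload


def fuzz_main() -> None:
    """Entry point when launched under the fuzzer; reads one test case from stdin."""
    import afl  # python-afl, assumed installed

    afl.init()  # start instrumentation here so the imports above are not repeated
    data = sys.stdin.buffer.read()
    try:
        parse_record(data)
    except ValueError:
        pass  # documented rejection of malformed input, not a bug
    # Any other exception propagates; that is what the fuzzer should flag.
    os._exit(0)  # skip interpreter teardown to keep each execution short


if __name__ == "__main__" and sys.argv[1:] == ["--fuzz"]:
    fuzz_main()
```

Assuming the script is saved as `harness.py`, a run would look something like `py-afl-fuzz -i in/ -o out/ -- python3 harness.py --fuzz`, with a few sample records in `in/`.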

