Regehr is right, though. I'm working on porting SQLite to iOS (and building a CMake build system along the way) as part of testing a custom compiler for correctness. I already have the SQLite TCL test suite running on a jailbroken device, and the number of failures was under 20 out of many tens of thousands of tests. That is definitely tractable!
Mr. Hipp is a doctor, not a professor. Thank you, FrankBooth.
Some of the design choices in SQLite go back to the early 2000s, and they looked like good ideas at the time. Some of the problems detected by tis-interpreter could not be caught by any sanitizer until tis-interpreter itself appeared. That's a long time to rely on seemingly okay, in-practice-harmless idioms such as 1-indexed arrays throughout a large software project. And there is the question of how many real bugs would be introduced by trying too hard to fix these (though hopefully the excellent test suite would prevent that).
Prof. Regehr did not find problems with SQLite. He found
constructs in the SQLite source code which under a strict
reading of the C standards have “undefined behaviour”,
which means that the compiler can generate whatever machine
code it wants without it being called a compiler bug.
That’s an important finding. But as it happens, no modern
compilers that we know of actually interpret any of the
SQLite source code in an unexpected or harmful way. We know
this, because we have tested the SQLite machine code –
every single instruction – using many different compilers,
on many different CPU architectures and operating systems
and with many different compile-time options. So there is
nothing wrong with the sqlite3.so or sqlite3.dylib or
winsqlite3.dll library that is happily running on your
computer. Those files contain no source code, and hence no
undefined behaviour.
This test suite is how John found so much stuff, and it is also why one can trust that the binaries produced by the compilers Richard Hipp tests against work correctly, even if there is some undefined behavior in the source code.
Looking at your second link, it ends up being the less desirable second option, but since they are using -fprofile-arcs, it may be that the added profiling code defeats the unwanted optimizations. Still, a potential blind spot remains when both the production and the instrumented builds omit the same branches.
So while their level of testing is truly heroic, one effect of UB-based optimization can be to omit branches that the compiler reasons can never be taken, so there may still be cases where source-level coverage does not match up with binary coverage. Maybe they have some external way of noticing when this happens?
Edit: Just saw that there are some new comments from both John and Richard at the bottom of the blog post.
Anyway, it sounds like the testing is better than what is applied to the compilers they are using.
Here I try to depict the various categories of undefined behavior (UB).
I wish authors would qualify more of their exotic initialisms in general.