I'm somewhat surprised by Dr. Hipp's response. I understand that the UB in SQLite is "OK" since it is very well tested, but I figured he would still have a strong opinion about avoiding almost all UB.
Having said that, Dr. Hipp has been very eager to merge other bug fixes revealed by ASan and perhaps MSan/TSan. I found a bug with ASan in our local SQLite version that had already been found and fixed with ASan upstream.
Regehr is right though: I'm working on porting SQLite to iOS (and to a CMake build system along the way) as part of testing a custom compiler for correctness. I already have the SQLite TCL test suite running on a jailbroken device, and the number of failures was under 20 out of many tens of thousands of tests. That is definitely tractable!
Edit: Mr. Hipp is a doctor, not a professor. Thank you, FrankBooth.
I'm not sure what in the article gave you the impression that Richard Hipp doesn't have a strong, positive opinion about fixing undefined behavior in SQLite. He has patiently fixed all the issues that were straightforward to fix. That's commendable.
Some of the design choices in SQLite go back to the early 2000s, and looked like good ideas at the time. Some of the problems detected by tis-interpreter had no sanitizer capable of detecting them until quite recently. That's a long time to use seemingly okay, in-practice-harmless idioms such as 1-indexed arrays everywhere in a large software project. And there is the question of how many real bugs would be introduced by trying too hard to fix these (though hopefully the excellent test suite would prevent that).
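To make the idiom concrete, here is a minimal sketch of my own (not actual SQLite code) of how a 1-indexed array is often simulated in C. The undefined behavior lies in merely computing a pointer one element before the array, even though every dereference stays inside the real storage:

    #include <stdio.h>

    int main(void){
      int data[3] = {10, 20, 30};
      int *a = data - 1;  /* UB: computes a pointer before data[0] */
      int i;
      for(i = 1; i <= 3; i++){
        printf("a[%d] = %d\n", i, a[i]);  /* only a[1]..a[3] are touched */
      }
      return 0;
    }

Mainstream compilers translate this the "obvious" way today, which is exactly why the idiom looks harmless in practice.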
You may not have seen Richard's comment that was added at the bottom of the blog post just before 'jevinskie' commented. It includes this paragraph:
    Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way. We know this, because we have tested the SQLite machine code – every single instruction – using many different compilers, on many different CPU architectures and operating systems and with many different compile-time options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or winsqlite3.dll library that is happily running on your computer. Those files contain no source code, and hence no UB.
I'm not sure yet if this is an appropriately healthy or an inappropriately fatalistic attitude. I like that it puts the emphasis on testing the binary rather than focusing on how any particular compiler version interprets the source, but I don't know if even SQLite has dense enough tests to justify confidence in this approach. It also makes me wonder if there should be greater emphasis on binary distribution.
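To illustrate the latitude the quoted paragraph is talking about, here is a toy example of my own (not from SQLite): because signed overflow is UB, an optimizer is allowed to assume it never happens and delete a check the author intended as protection:

    #include <limits.h>
    #include <stdio.h>

    /* Intended as an overflow check, but x + 1 overflowing is UB for
       signed int, so the compiler may assume the condition is always
       false and fold this function to "return 0". */
    int will_overflow(int x){
      return x + 1 < x;
    }

    int main(void){
      /* gcc and clang at -O2 typically print 0 here, even though the
         addition wraps on the underlying two's-complement hardware. */
      printf("%d\n", will_overflow(INT_MAX));
      return 0;
    }

Testing the generated binary, as Richard describes, is one way to find out which choice a given compiler actually made.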
On the density of SQLite's tests: Hwaci owns a test suite with MC/DC coverage[1][2], which Richard kindly loaned to John for this experiment. MC/DC coverage is used as a criterion for safety-critical software testing. Before hearing about SQLite, I had never seen anyone commit to MC/DC coverage without being forced to by a safety standard.
This test suite is how John found so much stuff, and that's also how one can trust that the binaries produced by compilers tested by Richard Hipp work correctly even if there is some undefined behavior in the source code.
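For readers who haven't run into the term: MC/DC (modified condition/decision coverage) requires every condition in a compound decision to be shown to independently flip the outcome, which is strictly stronger than ordinary branch coverage. A toy illustration of my own (not from the SQLite suite):

    #include <assert.h>

    /* A decision with two conditions. */
    static int allowed(int a, int b){
      return a && b;
    }

    int main(void){
      /* Branch coverage needs only one true and one false outcome of
         the whole decision, e.g. (1,1) and (0,0). MC/DC additionally
         requires each condition to independently flip the result:
           (1,1) vs (0,1) shows that a matters,
           (1,1) vs (1,0) shows that b matters,
         so {(1,1), (0,1), (1,0)} is a minimal MC/DC set here. */
      assert(allowed(1, 1) == 1);
      assert(allowed(0, 1) == 0);
      assert(allowed(1, 0) == 0);
      return 0;
    }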
I'm amazed by the quality of their tests, but still wonder if it's enough. The question I had going in was whether their test coverage looks at all the possible branches in the source (i.e., what the programmer intended) or all the branches in the binary (i.e., post compiler "optimizations").
Looking at your second link, it ends up being the less desirable second option, but since they are using -fprofile-arcs, it may be that the added instrumentation defeats the unwanted optimizations. Still, a potential blind spot remains when both the production and the instrumented code have the same omissions.
So while their level of testing is truly heroic, the effect of UB-based optimization can be to omit branches that the compiler reasons can never happen, so there may still be cases where source-level coverage does not match up with binary-level coverage. Maybe they have some external way of noticing when this happens?
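Here is the shape of the problem I mean, as a hypothetical example (not something found in SQLite): after a dereference, the compiler may conclude a pointer cannot be null and delete a later check, so the error branch never exists in the binary for any coverage tool to count:

    #include <stdio.h>

    /* Because p is dereferenced on the first line, the optimizer may
       assume p is non-null and remove the null check below as dead
       code. The source has two branches; the optimized binary may
       have only one, so source-level and binary-level branch
       coverage can disagree. */
    int first_byte(const char *p){
      int c = p[0];
      if (p == NULL) {  /* may be deleted entirely at -O2 */
        fprintf(stderr, "null pointer\n");
        return -1;
      }
      return c;
    }

    int main(void){
      printf("%d\n", first_byte("x"));
      return 0;
    }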
Edit: Just saw that there are some new comments from both John and Richard at the bottom of the blog post.
If some necessary code is optimized away, that should surface as the program failing to meet some requirement, which a regression test case can then catch.
Anyway, it sounds like the testing is better than what is applied to the compilers they are using.