
SQLite with a Fine-Toothed Comb - jsnell
http://blog.regehr.org/archives/1292
======
jevinskie
I'm somewhat surprised by D. Hipp's response. I understand that the UB in
SQLite is "OK" since it is _very_ well tested, but I figured he would still
have a strong opinion about avoiding almost all UB.

Regehr is right, though. I'm working on porting SQLite to iOS (and a CMake
build system along the way) as part of testing a custom compiler for
correctness. I already have the SQLite TCL test suite running on a jailbroken
device, with fewer than 20 failures out of many tens of thousands of tests.
That is definitely tractable!!

Edit: Mr. Hipp is a doctor, not a professor. Thank you, FrankBooth.

~~~
pascal_cuoq
I'm not sure what in the article gave you the impression that Richard Hipp
doesn't have a strong, positive opinion about fixing undefined behavior in
SQLite. He has patiently fixed all the issues that were straightforward to
fix. That's commendable.

Some of the design choices in SQLite go back to the early 2000s and looked
like good ideas at the time. Some of the problems tis-interpreter detected
had no sanitizer capable of catching them until tis-interpreter itself
appeared. That's a long time to use seemingly okay, in-practice-harmless
idioms such as 1-indexed arrays throughout a large software project. And
there is the question of how many real bugs would be introduced by trying too
hard to fix these (though hopefully the excellent test suite would prevent
that).

~~~
nkurz
You may not have seen Richard's comment that was added at the bottom of the
blog post just before 'jevinskie' commented. It includes this paragraph:

    
    
      Prof. Regehr did not find problems with SQLite. He found   
      constructs in the SQLite source code which under a strict 
      reading of the C standards have “undefined behaviour”, 
      which means that the compiler can generate whatever machine 
      code it wants without it being called a compiler bug. 
      That’s an important finding. But as it happens, no modern 
      compilers that we know of actually interpret any of the 
      SQLite source code in an unexpected or harmful way. We know 
      this, because we have tested the SQLite machine code – 
      every single instruction – using many different compilers, 
      on many different CPU architectures and operating systems 
      and with many different compile-time options. So there is 
      nothing wrong with the sqlite3.so or sqlite3.dylib or 
      winsqlite3.dll library that is happily running on your 
      computer. Those files contain no source code, and hence no 
      UB.
    

I'm not sure yet if this is an appropriately healthy or inappropriately
fatalistic attitude. I like that it puts the emphasis on testing the binary
rather than on how any particular compiler version interprets the source, but
I don't know if even SQLite has dense enough tests to justify confidence in
this approach. It also makes me wonder if there should be greater emphasis on
binary distribution.

~~~
pascal_cuoq
On the density of SQLite's tests: Hwaci owns a test suite with MC/DC
coverage[1][2], which Richard kindly loaned to John for this experiment. MC/DC
coverage is used as a criterion for safety-critical software testing. Before
hearing about SQLite, I had never seen anyone commit to MC/DC coverage without
being forced to by a safety standard.

This test suite is how John found so much stuff, and that's also how one can
trust that the binaries produced by compilers tested by Richard Hipp work
correctly even if there is some undefined behavior in the source code.

[1]
[https://en.wikipedia.org/wiki/Modified_condition/decision_co...](https://en.wikipedia.org/wiki/Modified_condition/decision_coverage)

[2]
[https://www.sqlite.org/testing.html#mcdc](https://www.sqlite.org/testing.html#mcdc)

~~~
nkurz
I'm amazed by the quality of their tests, but still wonder if it's enough. The
question I had going in was whether their test coverage looked at all the
possible branches in the source (i.e., what the programmer intended) or all
the branches in the binary (i.e., post-compiler "optimizations").

Looking at your second link, it ends up being the less desired second option,
but since they are using -fprofile-arcs, it may be that the added profiling
code defeats the unwanted optimizations. Still, a potential blind spot remains
when both the production and the instrumented code have the same omissions.

So while their level of testing is truly heroic, the effect of UB-based
optimization can be to omit branches that the compiler reasons can never
happen, so it seems there may still be cases where source-level coverage does
not match up with binary coverage. Maybe they have some external way of
noticing when this happens?

Edit: Just saw that there are some new comments from both John and Richard at
the bottom of the blog post.

~~~
kazinator
If some necessary code is optimized away, that should surely show up as the
program missing some requirement or failing outright, which a regression test
case can catch.

Anyway, it sounds like the testing is better than what is applied to the
compilers they are using.

------
advisedwang
UB = undefined behaviour, for anyone else who didn't get the unspecified
acronym at first glance.

~~~
SixSigma
It is defined at first use:

"Here I try to depict the various categories of undefined behaviors (UBs)"

~~~
advisedwang
Huh, I didn't see that when I first read the article. Maybe it was added or
maybe I just missed it.

~~~
thefifthsetpin
He amended the article after getting comments like yours.

------
TorKlingberg
We need to get John Regehr on the C standard committee!

