
SQLite with a Fine-Toothed Comb (2016) - jxub
https://blog.regehr.org/archives/1292
======
collinf
Richard Hipp (creator of SQLite) had this to say about Rust and SQLite in the
comments:

> Rewriting SQLite in Rust, or some other trendy “safe” language, would not
> help. In fact it might hurt.

Prof. Regehr did not find problems with SQLite. He found constructs in the
SQLite source code which under a strict reading of the C standards have
“undefined behaviour”, which means that the compiler can generate whatever
machine code it wants without it being called a compiler bug. That’s an
important finding. But as it happens, no modern compilers that we know of
actually interpret any of the SQLite source code in an unexpected or harmful
way. We know this, because we have tested the SQLite machine code – every
single instruction – using many different compilers, on many different CPU
architectures and operating systems and with many different compile-time
options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or
winsqlite3.dll library that is happily running on your computer. Those files
contain no source code, and hence no UB.

The point of Prof. Regehr’s post (as I understand it) is that the C programming
language has evolved to contain such byzantine rules that even experts find it
difficult to write complex programs that do not contain UB.

The rules of rust are less byzantine (so far – give it time :-)) and so in
theory it should be easier to write programs in rust that do not contain UB.
That’s all well and good. But it does not relieve the programmer of the
responsibility of testing the machine code to make sure it really does work as
intended. The rust compiler contains bugs. (I don’t know what they are but I
feel sure there must be some.) Some well-formed rust programs will generate
machine code that behaves differently from what the programmer expected. In
the case of rust we get to call these “compiler bugs” whereas in the
C-language world such occurrences are more often labeled “undefined behavior”.
But whatever you call it, the outcome is the same: the program does not work.
And the only way to find these problems is to thoroughly test the actual
machine code.

And that is where rust falls down. Because it is a newer language, it does not
have (afaik) tools like gcov that are so helpful for doing machine-code
testing. Nor are there multiple independently-developed rust compilers for
diversity testing. Perhaps that situation will change as rust becomes more
popular, but that is the situation for now.

~~~
IshKebab
Undefined behaviour is not a compiler bug - it is _deliberate_.

And having undefined behaviour in your C code is definitely not a good thing,
even if it is basically unavoidable.

The real problem is that the C and C++ standards cop out to UB in too many
places, e.g. type aliasing. People reasonably think "weeeell, it may be UB but
it works now and I need it, so screw it", and then you have a mess of programs
relying on de facto non-standard behaviour, which is shit.

The C people just need to officially define some of the de facto behaviours.

Rust doesn't have this problem because it doesn't leave so many basic things
undefined.
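
To make the type aliasing case concrete, here is a minimal sketch of reading the bits of a float. The function name `float_bits_memcpy` is made up for illustration, and the sketch assumes `float` is a 32-bit IEEE 754 value, which holds on mainstream targets but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* The "it works now" version people reach for is
 *
 *     return *(uint32_t *)&f;
 *
 * which reads a float object through a uint32_t lvalue and is undefined
 * under the effective-type (strict aliasing) rule. */

/* Defined alternative: memcpy copies the object representation.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Optimizing compilers usually turn the `memcpy` into a plain register move, so the defined version normally costs nothing extra.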

~~~
akira2501
> The C people just need to officially define some of the de facto behaviours

Sure, as soon as all the different ISA people officially define some of
the de facto behaviors. UB isn't in the standard "just because"; it's in the
standard because there is no apparent underlying standard.

~~~
cryptonector
Aliasing rules, for example, have nothing to do with ISAs. Neither do pointer
comparison rules, and many others besides.

The rule about memcmp() with invalid pointers and a length of zero does have to
do with actual systems, but it can still be standardized and the vendors with
now-non-compliant memcmp() implementations just have to fix it. This has
happened before (e.g., snprintf()), so the ISA thing is a total cop-out.
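
For illustration, a minimal sketch of the workaround code needs today. As far as I can tell from C11 7.24.1, the pointer arguments to the `<string.h>` functions must be valid even when the length is zero, so `memcmp(NULL, NULL, 0)` is formally undefined. The wrapper name `memcmp_checked` is hypothetical, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: never hands memcmp() a pointer when there is
 * nothing to compare, so a null or dangling pointer paired with n == 0
 * is harmless. */
int memcmp_checked(const void *a, const void *b, size_t n) {
    if (n == 0)
        return 0;              /* empty ranges compare equal */
    return memcmp(a, b, n);
}
```

If the standard defined the zero-length case, this wrapper would be unnecessary.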

~~~
zlynx
Weird ISAs are exactly why you cannot compare pointers. Segmented memory for
one. Or imagine an OS and compiler that implemented automatic overlay
switching. With that and PAE on 32-bit x86 systems you could have special "far
overlay" pointers returned from malloc calls which would map in different 1 GB
overlay sections when accessed.

Aliasing rules are important in some ISAs too. Like weird DSPs. Imagine a
system where 32-bit objects can't even share the same memory space as 8-bit
objects. Casting a pointer to a different sized type is completely meaningless
there. Of course programming such a weird thing is usually done in assembly,
but there are C compilers.
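
A rough sketch of what the pointer comparison restriction means in practice: `p >= buf && p < buf + len` is undefined when `p` points into a different object, so portable-ish code tends to compare integer values instead. The helper name `points_into` is made up, and the sketch assumes a flat address space where the conversion to `uintptr_t` preserves ordering, which is exactly what a segmented target would not give you.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does p point into buf[0..len)?
 * The pointer-to-integer mapping is implementation-defined; this only
 * makes sense on flat-memory targets. */
bool points_into(const void *p, const char *buf, size_t len) {
    uintptr_t addr  = (uintptr_t)p;
    uintptr_t start = (uintptr_t)buf;
    /* If addr < start the unsigned subtraction wraps to a huge value,
     * so the second test fails on its own; the first is for clarity. */
    return addr >= start && addr - start < len;
}
```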

~~~
cryptonector
I'm not familiar enough with C on segmented architectures, so I can't quite
speak to that, but I was referring to [0], which clearly has nothing to do
with segmented architectures.

As to aliasing, ISAs had nothing to do with the reason for the aliasing rules
either; the reason was rather optimizations for functions like memcpy() (as
opposed to memmove()).

[0]
[https://news.ycombinator.com/item?id=17439467](https://news.ycombinator.com/item?id=17439467)

~~~
mattnewport
There are other aliasing rules that have big performance impacts on certain
architectures. The Xbox 360 and PS3 Power cores for example had a severe
load-hit-store performance penalty that tended to be triggered by code that
moved data between floating point and integer registers via memory. Strict aliasing
rules that allow the compiler to assume float and int pointers don't alias
could make a huge performance difference but those rules are also the source
of much troublesome undefined behavior for code that does intentional type
punning.

The ISA in this case requires going via memory to move data between fp and
integer registers and certain implementations of that ISA had major
performance impacts associated with that. In this case UB rules really did
allow for valuable optimizations but really did cause trouble elsewhere.
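
As a sketch of the kind of code where that assumption matters (the function name `scale_all` and its parameters are invented for illustration): because an `int` object cannot legally be modified through a `float` lvalue, the compiler may assume the stores to `out[i]` never change `*scale`, load and convert it once before the loop, and keep it in a register.

```c
/* Under strict aliasing the compiler may hoist the load of *scale and
 * its int-to-float conversion out of the loop. With aliasing assumptions
 * disabled (e.g. GCC's -fno-strict-aliasing) it may have to reload and
 * reconvert on every iteration. */
void scale_all(float *out, const int *scale, int n) {
    for (int i = 0; i < n; i++)
        out[i] *= (float)*scale;
}
```

The observable result is the same either way for well-defined callers; only the generated code differs.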

~~~
cryptonector
The ISA you describe doesn't require aliasing rules. It merely gives you an
incentive to have them.

C and other languages need much better control over aliasing than the
'restrict' keyword and compiler command-line switches.
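
For reference, a minimal sketch of what 'restrict' gives you today (the function name `add_into` is made up). It is a per-pointer promise from the caller that the pointed-to ranges do not overlap, and it is all or nothing: there is no way to say "these two may alias, but not that third one".

```c
#include <stddef.h>

/* restrict promises that dst and src do not overlap, so the compiler is
 * free to vectorize or reorder the loads and stores. Calling this with
 * overlapping ranges is undefined behavior. */
void add_into(float *restrict dst, const float *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```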

------
siscia
Completely unrelated, but maybe I can use the deep knowledge of HN.

Browsing from the article, I ended up on the tis-interpreter page, which says
"You can also use TrustInSoft to maintain compliance or reach certification
according to the norms EN-50128/IEEE 1558." [1]

How does this compliance process work? Do any of you have experience with it?

[1]: [https://trust-in-soft.com/industries/rail/](https://trust-in-soft.com/industries/rail/)

~~~
pascal_cuoq
You should contact them[1]. TrustInSoft has a document that matches the
classes of defects that its analyzer can guarantee the absence of to the
vocabulary and recommendations of EN 50128. If you are concerned with this
standard in particular, I'm sure that this document would make things clearer.

N.B.: You had drifted away from the description of tis-interpreter at some
point before you arrived at
[https://trust-in-soft.com/industries/rail/](https://trust-in-soft.com/industries/rail/). While
nothing prevents you from finding tis-interpreter useful in application of EN
50128, the page was written with TrustInSoft Analyzer in mind. TrustInSoft
Analyzer is a static analyzer that propagates through the program sets of
values (“abstract values” is the technical term[2]) instead of the concrete
values corresponding to specific inputs that tis-interpreter propagates. As a
result, TIS Analyzer can guarantee the absence of unwanted behaviors for all
possible inputs. This is what makes it valuable from the point of view of
software safety.

[1] disclosure: by “them”, I mean “us”, since I'm a co-founder and work there

[2] see
[https://en.wikipedia.org/wiki/Abstract_interpretation](https://en.wikipedia.org/wiki/Abstract_interpretation)
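
To illustrate the idea of propagating sets of values, here is a toy sketch using intervals as the abstract values. It is not how TrustInSoft Analyzer is implemented; the type `interval` and the functions `interval_sub` and `may_be_zero` are invented for this example, and the arithmetic assumes bounds small enough not to overflow.

```c
#include <stdbool.h>

/* A toy abstract value: every int between lo and hi inclusive. */
typedef struct { int lo, hi; } interval;

/* Abstract subtraction: the range of a - b for any a in x and b in y.
 * Assumes the bounds are small enough that nothing overflows. */
interval interval_sub(interval x, interval y) {
    interval r = { x.lo - y.hi, x.hi - y.lo };
    return r;
}

/* True if some value in the interval is zero, i.e. a division by a
 * variable with this abstract value cannot be proven safe. */
bool may_be_zero(interval x) {
    return x.lo <= 0 && 0 <= x.hi;
}
```

With an input `n` known only to lie in [1, 10], `n - 5` is abstracted to [-4, 5], so a division by it gets flagged for every possible input at once, whereas an interpreter running with n = 3 would see only the concrete value -2 and report nothing.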

------
jokoon
I wonder if there are enough statically compiled languages out there.

Not to be arrogant, but it doesn't seem new and/or recent languages are
picking up fast enough, because their syntax is just not simple enough. I see
many languages, and I never really find one that is interesting to me.

I recently found volt, and I thought this language was really awesome. Then I
realized that it is garbage collected.

D is fine, but it has too many high-level constructs (OOP et al.) which I
don't find useful. Rust is okay, but to me syntax matters most, and to me Rust
is too far from C.

All of this shows that you cannot beat C.

I just want a language that picks up from C, has the ease of use and
readability of Python, doesn't include high-level constructs as first-class
citizens, and compiles to LLVM or machine code. I wish I had the skills to
build such a language. C++ is close, but its slow compilation and its backward
compatibility with C are not good.

~~~
adwn
> _All of this shows that you cannot beat C._

No, all of this shows that you have very peculiar preferences and demands,
some of which are more emotional than technical.

> _I just want a language that picks up from C, has the ease of use and
> readability of python_

These two requirements are mutually exclusive.

~~~
cryptonector
Nim does that. But I myself don't want Python-style syntax. I like braces and
dislike semantic indentation -- part of it is that I like showmatch, and part
of it is my aesthetic sense.

------
lixtra
Another call for boringcc[0].

[0]
[https://news.ycombinator.com/item?id=10772841](https://news.ycombinator.com/item?id=10772841)

