> Rewriting SQLite in Rust, or some other trendy “safe” language, would not help. In fact it might hurt.
Prof. Regehr did not find problems with SQLite. He found constructs in the SQLite source code which under a strict reading of the C standards have “undefined behaviour”, which means that the compiler can generate whatever machine code it wants without it being called a compiler bug. That’s an important finding. But as it happens, no modern compilers that we know of actually interpret any of the SQLite source code in an unexpected or harmful way. We know this, because we have tested the SQLite machine code – every single instruction – using many different compilers, on many different CPU architectures and operating systems and with many different compile-time options. So there is nothing wrong with the sqlite3.so or sqlite3.dylib or winsqlite3.dll library that is happily running on your computer. Those files contain no source code, and hence no UB.
The point of Prof. Regehr’s post (as I understand it) is that the C programming language has evolved to contain such byzantine rules that even experts find it difficult to write complex programs that do not contain UB.
The rules of rust are less byzantine (so far – give it time :-)) and so in theory it should be easier to write programs in rust that do not contain UB. That’s all well and good. But it does not relieve the programmer of the responsibility of testing the machine code to make sure it really does work as intended. The rust compiler contains bugs. (I don’t know what they are but I feel sure there must be some.) Some well-formed rust programs will generate machine code that behaves differently from what the programmer expected. In the case of rust we get to call these “compiler bugs” whereas in the C-language world such occurrences are more often labeled “undefined behavior”. But whatever you call it, the outcome is the same: the program does not work. And the only way to find these problems is to thoroughly test the actual machine code.
And that is where rust falls down. Because it is a newer language, it does not have (afaik) tools like gcov that are so helpful for doing machine-code testing. Nor are there multiple independently-developed rust compilers for diversity testing. Perhaps that situation will change as rust becomes more popular, but that is the situation for now.
Not that I'm taking sides here. I'm really interested in both the extensive testing that the SQLite team does, and the analysis that John blogs about.
For one, it's quite likely that an embedded platform's toolchain will not be part of the SQLite test configurations. Secondly, SQLite can be, and is, compiled directly into application binaries, which means that all bets are off, especially if LTO is enabled.
Thirdly there are products that build on SQLite, such as its own commercial encryption extension and other extensions from third parties. The former probably enjoy the same level of testing, but it's not clear how the latter are tested.
The conclusion is that it's humanly impossible to write memory-safe C, even with 100% test coverage and static and dynamic analysis. Something like Frama-C is required, which is virtually unheard of for the majority of open source and commercial software.
C has both compiler bugs and undefined behaviour. Undefined behaviour is an inherent property of the C standard, while a compiler bug is a property of the implementation (a place where it doesn't match the standard).
A valid argument along the same lines might be that the Rust compiler has existed for less time and is used less than C compilers, and therefore is more likely to contain bugs.
> Because it is a newer language, it does not have (afaik) tools like gcov that are so helpful for doing machine-code testing.
Coverage tools such as kcov work on Rust. I'm not sure of the state of gcov itself though.
> Nor are there multiple independently-developed rust compilers for diversity testing.
Isn't diversity testing only necessary/good because there are many C compilers? Using your phrasing, if the code compiles and runs correctly (i.e. every single machine instruction is checked) with the one Rust compiler that exists, then it works.
There's definitely many reasons why a language having multiple compilers is good, but I think "diversity testing" is circular logic.
And having undefined behaviour in your C code is definitely not a good thing, even if it is basically unavoidable.
The real problem is that the C and C++ standards cop out to UB in too many places, e.g. with things like type aliasing and people reasonably think "weeeell, it may be UB but it works now and I need it so screw it" and then you have a mess of programs relying on de facto non-standard behaviour which is shit.
The C people just need to officially define some of the de facto behaviours.
Rust doesn't have this problem because it doesn't leave so many basic things undefined.
If it's literally unavoidable, then the language specification is BROKEN.
Now, most C UB is avoidable, but it's very difficult to notice some UB, and most compilers aren't that good at telling you about the UB they exploit. In this sense UB is unavoidable in that human programmers may often write code with UB without noticing.
If it's only "practically unavoidable", not literally, then the language specification and/or the compilers (by failing to warn about it) are BROKEN.
You cannot blame C programmers, not anymore. The committee has been much too aggressive in its zeal to speed up C by adding more UB cases. We've reached the point where compiler output runs very fast because all the important bits have been elided by the optimizer, breaking the program in the process. We, the users of the language, have been pushed to the breaking point by the committee and the compiler groups. Please stop. And don't just stop: revisit some of the worst UB decisions.
Yes, even C89 had lots of footguns, but UB was much more manageable.
The only reasons I myself have not yet abandoned C are: a) I haven't learned Rust yet, b) many codebases I work with are C codebases and won't get rewritten in Rust anytime soon, c) it takes time to get enough critical mass. (c) is happening though, and (a) is, for me, just a matter of time; (b) I can solve by moving on to new things, but the world is full of legacy code that we can't just abandon/rewrite, so moving on isn't exactly likely.
Sure, as soon as all the different ISA people officially define some of the de facto behaviors. UB isn't in the standard "just because"; it's in the standard because there is no apparent underlying behavior to standardize.
The rule about memcmp() with invalid pointers and length zero does have to do with actual systems, but it can still be standardized, and the vendors with now-non-compliant memcmp() implementations just have to fix it. This has happened before (e.g., snprintf()), so the ISA thing is a total cop-out.
Aliasing rules are important in some ISAs too. Like weird DSPs. Imagine a system where 32-bit objects can't even share the same memory space as 8-bit objects. Casting a pointer to a different sized type is completely meaningless there. Of course programming such a weird thing is usually done in assembly, but there are C compilers.
As to aliasing, ISAs likewise had nothing to do with the reason for the aliasing rules; those exist to enable optimizations for functions like memcpy() (as opposed to memmove()).
The ISA in this case requires going via memory to move data between fp and integer registers and certain implementations of that ISA had major performance impacts associated with that. In this case UB rules really did allow for valuable optimizations but really did cause trouble elsewhere.
C and other languages need much better control over aliasing than the 'restrict' keyword and compiler command-line switches.
"The disagreement is not over whether or not UB is a problem, but rather how serious of a problem. Is it like “Emergency patch – update immediately!” or more like “We fixed a compiler warning” or is it something in between."
In C, where UB is concerned, all bets are off every single time the compiler gets upgraded.
Browsing from the article I ended up on the page of tis-interpreter, which says "You can also use TrustInSoft to maintain compliance or reach certification according to the norms EN-50128/IEEE 1558."
How does this compliance process work? Does anyone here have experience with it?
N.B.: You had drifted away from the description of tis-interpreter at some point before you arrived at https://trust-in-soft.com/industries/rail/ . While nothing prevents you from finding tis-interpreter useful in application of EN 50128, the page was written with TrustInSoft Analyzer in mind. TrustInSoft Analyzer is a static analyzer that propagates through the program sets of values (“abstract values” is the technical term) instead of the concrete values corresponding to specific inputs that tis-interpreter propagates. As a result, TIS Analyzer can guarantee the absence of unwanted behaviors for all possible inputs. This is what makes it valuable from the point of view of software safety.
 disclosure: by “them”, I mean “us”, since I'm a co-founder and work there
 see https://en.wikipedia.org/wiki/Abstract_interpretation
Not to be arrogant, but it doesn't seem that new languages are catching on fast enough, because their syntax is just not simple enough. I see many languages, and I never really find one that is interesting to me.
I recently found volt, and I thought this language was really awesome. Then I realized that it is garbage collected.
D is fine, but it has too many high-level constructs (OOP et al.) which I don't find useful. Rust is okay, but to me syntax matters most, and to me Rust is too far from C.
All of this shows that you cannot beat C.
I just want a language that picks up from C, has the ease of use and readability of Python, doesn't include high-level constructs as first-class citizens, and compiles via LLVM or to machine code. I wish I had the skills to build such a language. C++ is close, but its slow compilation and its backward compatibility with C are not good.
It's GC, but that can be turned off and you can use manual memory management or you can simply use one of a handful of GC methods that work for you (in my experience, the GC is very performant). It has easy syntax like Python. It can be compiled to C, C++, ObjC, or JS (currently). It has tons of meta-programming features like templates, macros, etc. It can be cross-compiled fairly easily to almost any platform. It has whatever high level stuff you want to use, or just don't use them. It's not 1.0 yet, but it's pretty stable overall and the current release is feature-full enough that you could do just about anything with it that you want, right now. It's also got a really easy to use FFI. It's also about as fast as C, in practice.
No, all of this shows that you have very peculiar preferences and demands, some of which are more emotional than technical.
> I just want a language that picks up from C, has the ease of use and readability of python
These two requirements are mutually exclusive.
We had quite a few alternatives with saner defaults, but some did not come with an OS and others died alongside their OS.
And getting rid of UNIX and POSIX will be a very hard thing to ever achieve, short of being a Google, Microsoft, or Apple and pushing something else forward no matter whom it hurts and how much it costs.
D has betterC mode. It's hard to get closer to C than that.
I can understand if you make the argument that D isn't as portable because it doesn't support as many platforms, but you certainly don't need to use classes, which are not even supported in betterC.
No other language is going to be more backwards compatible with C, because no other language has that goal. If slow compilation is what is bothering you, you might want to look at your compiler options for your target architecture and, of course, update your hardware. What type of software are you writing? What is your development platform, and what are you targeting? Are you using a good IDE, like CLion?
I want to write games, and usually the libraries involved can be a little large, like bullet or Ogre3D. I remember MSVC 2012 stopping compilation because of a lack of memory.
Accelerating the rebuild of a single file requires precompiled headers, which are atrocious to manage, or careful decomposition of the code into several files, which is quite demanding.
I'm aware that modules might speed up compilation (I hope I'm right), but I'm not even sure they will be part of C++20.
I like C++ too, but the slow compilation times are really getting on my nerves, and I wish there was a good solution for that, even if it breaks backward compatibility.