But looking at https://www.sqlite.org/cgi/src/rptview?rn=7 I see a few Core Crash Bugs every month (April has 7). To be clear, most of them are not 'critical', but there are plenty of 'severe' and 'important' entries in the list.
I don't want to disparage SQLite. They have a free, fantastic product, go to great lengths to ensure stability no matter what, and document their processes. I learned a lot from them.
But clearly, our gold standard is not perfect. So now what?
* We might turn to theorem proving. But is it possible at SQLite's scale? Are the proofs themselves enough? I remember a critical piece of software with a proof that no incorrect answers would come out. It turned out they had forgotten to prove the program would terminate, so they got no answer at all instead of an incorrect one. How do you prove your proofs are complete enough?
* We might turn to fuzzing, code analysis, linting, ... SQLite does all of them.
* Or simply admit we are human, fallible, and try to do the best we can while knowing our limits? Sounds defeatist.
> Or simply admit we are human, fallible, and try to do the best we can while knowing our limits? Sounds defeatist.
Not all bugs are manmade (though obviously the vast majority are). But to give examples of other types of bugs: there could be bugs created by compiler optimisations, or bugs due to esoteric hardware. Or even features that were intended behaviour, but the landscape has since shifted in ways the developers were not aware of (I guess you could argue that last point is manmade, but it's arguably not the failure of the developer).
For example, take a function that takes a byte as input and produces a byte as output, with no external or internal state. You can run through all 256 inputs and check the outputs.
However, if the function has 128 bits of input, you could not possibly enumerate all possible inputs and check the outputs. It takes other methods to prove it bug free. Such bugs can be hard to find, and there is no way to automate finding them in all cases (halting problem).
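To make the small case concrete, here's a sketch in Python. The function itself (`swap_nibbles`) is a made-up example, not anything from SQLite; the point is that a stateless byte-to-byte function can be verified exhaustively against its spec, which is exactly what stops working once the input space grows to 128 bits.

```python
# Exhaustive verification of a stateless byte -> byte function.
# swap_nibbles is a hypothetical example, chosen because its spec is
# easy to state: applying it twice must give back the original input.
def swap_nibbles(b: int) -> int:
    return ((b << 4) | (b >> 4)) & 0xFF

# Only 256 possible inputs, so we can simply check every one of them.
for b in range(256):
    out = swap_nibbles(b)
    assert 0 <= out <= 255          # output is a valid byte
    assert swap_nibbles(out) == b   # spec: the function is an involution

# With 128 bits of input there are 2**128 cases -- the same loop would
# never finish, so exhaustive testing is off the table.
```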
However, bug-free code is not impossible to write. I would say a bug-free system is more likely to be impossible, since it would involve perfect hardware that is also immune to interference, and the physical world can cause a bit to flip while performing a computation.
If computing was still just flipping switches and waiting for LEDs to flash then you might have a point. But in real terms, it's just not possible to write 100% bug free code anymore. The real trick is eliminating all the bugs that people are likely to experience in normal operation and all the bugs that attackers might exploit. Even that is no small feat.
It is like arguing that because a ship's navigation system was faulty the directions given for a destination were wrong.
To use his words: we supply computers to programs, we don't supply programs to computers. Or something to that effect.
https://www.cs.utexas.edu/~EWD/transcriptions/EWD10xx/EWD103... Paragraph beginning "But before a computer is ready to perform a class of meaningful manipulations ..."
To use the ship's navigation system example: if you have some kind of "auto-pilot" system that would steer a ship into central London when a bad destination was set, then you'd quickly demand that the authors of the auto-pilot write exceptions for bad destinations.
That's the ugly side of development; all too often we developers have to write code to handle ugly systems. It's not fun, but it's usually necessary.
Luckily, perfection and correctness are not that important. While the desire to mathematically prove things may cloud judgement and waste time chasing a mathematically perfect world that doesn't exist, we can still make almost anything work reliably on top of unreliable components, with guaranteed probabilities. That's what resilience and fault tolerance are all about, and how most of the world works. It's too bad that some of it is touched by this "let's do at least something for security" attitude and a perfect-correctness ideology that of course never guarantees anything with any certainty, which is why we have planet-scale security vulnerabilities that impact everyone.
I think @laumars meant impossible in practice, not impossible in the strict mathematical sense. We can speculate that in theory it’s possible to write bug free programs that are large and complex, but nobody has ever done it. And not for flaky hardware or compiler reasons, all large software has many software defects, bugs in the design. It is not currently possible in practice to write a program like SQLite that is bug free, and no analysis of halting problem or number of input bits will tell us whether it could actually happen. Nobody even uses metrics like that for software stability because they don’t tell us anything practical.
I've heard it quipped that every program has at least one bug, and every program can be optimized to be smaller, so by induction every program can be reduced to a single, wrong instruction.
You can substitute "manmade" with a better adjective if you wish.
Do we have systems to write programs that are free of dangling pointers? Yes. Double frees? Yes. Buffer overflows? Yes. Seg faults? Yes.
With regard to infinite loops, we cannot prove programs in general terminate or not, but we CAN prove that some programs terminate. If our program cannot be proven, we can adjust it until it can.
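A common way to make a specific loop provable is to exhibit a variant: a non-negative integer measure that strictly decreases on every iteration. As an illustration (Euclid's algorithm, chosen for familiarity, not anything from SQLite), the runtime assertion below checks the measure that a formal proof would discharge statically:

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm with its termination argument made explicit."""
    assert a >= 0 and b >= 0
    while b:
        prev = b
        a, b = b, a % b
        # Variant: b stays non-negative and is strictly smaller each time
        # round, so the loop can only run finitely many iterations.
        assert 0 <= b < prev
    return a
```

If a loop admits no such measure as written, that's the signal to adjust the program until it does.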
I guess this is similar to theorem proving. We have to be able to define what kind of bug we're talking about, and develop in a system that allows us to prove to ourselves that we no longer have that class of bug.
When I talk about a "system", I don't care what it is. Maybe it's a library or set of functions or API that doesn't allow for certain types of errors. Maybe it's a static analysis tool. Maybe it's a language. It doesn't matter, as long as you can have high assurances later that your large code base no longer contains a particular class of errors.
Does this get rid of all bugs? No, but it greatly reduces the possible ways that bugs can exist, and that's huge progress.
Errors due to hardware failing because of physical properties are expected and unavoidable, but a logical statement, no matter how big, is supposed to be provably correct (except maybe weird self-referencing statements, but that's another topic). I think that's what makes our expectations higher in that field than in any other.
Algorithms can be proved mathematically, but any real implementation has to involve physical hardware at some level; these implementation details cannot (reasonably) be proven mathematically and therefore need sufficient tests.
In the absence of a proof, tests can cover both algorithms & implementations.
Question: what typically happens when you attempt to prove a program that is not correct, especially a large program?
1. You come to a point where to proceed with the proof you have to show some particular thing, and it is apparent that you cannot show that. The nature of this snag does tell you what to change in the program in order to clear this block. I.e., the proof attempt finds actionable bugs in the program.
2. You come to such a point, but the nature of the snag doesn't tell you anything about why it happened or how to fix it. All you learn is that your techniques are insufficient to resolve the status of this program.
3. You reach a point where you have deduced all that you can deduce and made all the inferences that you can make, but that was not enough to prove the program correct. You haven't hit any particular concrete blocker though. You have simply run out of ideas. Maybe someone else can do better.
Question 2: when you have proven some program correct, how hard is it to adapt or update the proof when new features are added to the program?
If it takes a lot of work to update proofs, they may only be feasible for programs where the proved version can stay in service for a long time.
For programs where a given version can stay in service for a long time, I think we probably should be making more effort to prove them if they are used in areas of importance.
SQLite probably does fall into that category for most of its users and use cases.
So the bugs get reported and fixed, the test set grows, and the software grows more stable?
If the API surface area doesn't change too much, then given the right practices the software should only grow more stable, right? If bugs like these become rarer and rarer over time, then it's fine, no?
Code that doesn't have to evolve at all for years is very very rare, especially in a product that's used on evolving hardware and software architecture.
Responding to the parent post: I don't see another solution than theorem provers if we want to pretend that we're doing anything reliable. But it's going to be a long, long way...
This particular kind of problem is fortunately pretty well-known, though I won't pretend that it's not still a possibility with a lot of proof systems. In the case of SQL databases, we are fortunate in that the base language doesn't provide any mechanism for nontermination (besides WITH RECURSIVE, which most queries will not have). So we don't actually need to prove termination for most queries, but instead provide a proof of "termination-preserving refinement": that given a query that was already going to terminate (or not), our optimized query will have the same termination behavior (as well as, if it terminates, providing "the same" results, up to the usual SQL caveats about row ordering). Now we'll still have nontermination, but only for queries that are inherently nonterminating (i.e. incorrectly written WITH RECURSIVE queries).
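As an illustration using Python's bundled sqlite3 module (nothing here is SQLite's own test code), WITH RECURSIVE is the one place where termination rests on the query author: the bound in the recursive step below is what makes the query finite, and deleting it would yield a query that is inherently nonterminating (absent a LIMIT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter
        WHERE n < 5   -- this bound is the termination argument;
                      -- remove it and the query runs forever
    )
    SELECT n FROM counter
    """
).fetchall()
print(rows)  # [(1,), (2,), (3,), (4,), (5,)]
```

A termination-preserving optimizer may rewrite this query however it likes, as long as the bounded version still finishes and the unbounded version still doesn't.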
As far as "is this kind of proof possible at the SQLite scale": I think it is. The thing it's solving is the sort of thing that is at least reasonably specifiable ("given this SQL query and this data set, these are legal results" and "given these transactions, this is the effect on the permanent storage system"). It would be a huge effort (probably on par with CompCert at a minimum, but I'd bet on significantly more work), but I think it would be worth it: embedded databases like SQLite are very easy to drop into other programs and are used extensively, and one that was formally verified would probably be able to have immediate impact (by comparison, seL4 or CompCert often aren't in a position to replace existing parts of consumer toolchains). I also think the database is in a position where many people would probably be prepared to sacrifice some of the hardest-to-prove optimizations in exchange for correctness guarantees.
That doesn't sound defeatist to me at all, just realistic, as long as it isn't used as an excuse. After all, SQLite is one of the best tested libraries available.
One reason I really like databases that are basically thin layers on top of SQLite, like Bedrock and some others I can't remember off the top of my head. Maybe one day we'll see an expanded role for SQLite.
As long as we need expressive languages, it will be possible to write bugs (be that in low-level languages or even "scripting" languages). And as long as we have a diverse ecosystem of hardware and software stacks (and I hope we continue to), there will be bugs.
Bugs are unavoidable. The point of testing frameworks / fuzzing / etc is to reduce the number and severity of bugs.
However, if you do have a method of writing 100% bug-free code then I'd be interested in you sharing it ;)
Looks like SQLite used the sanitizers, and a new tool still found plenty of UB.
I believe the SQLite author verified the compiler output for the UB that was found, and saw it had no impact in practice, so there was no critical bug yet. Then he fixed the UB anyhow.
Last paragraph was from vague memories, not from the article, BTW. I would welcome better links.
The resulting script for reproducing the bug is indeed concise, but I almost want to read about the starting point, and how much of a struggle it was to investigate and deduce the factors behind this bug.
"You may enter a privacy password below. This provides only mild security, but should prevent others from messing with your subscription. Do not use a valuable password as it will occasionally be emailed back to you in cleartext."
This is arguably better security because you can leave the password field blank, and a random one will be generated for you. That protects non-tech people from accidentally using a valuable password for something that is not actually security-sensitive.
Hashing passwords is not hard.
You can hash the password and also email it in cleartext at sign-up, when you still have access to it.
And it doesn't even make sense to send a hash of the password back in an email.
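For reference, a minimal salted-hash sketch using only Python's standard library (the iteration count and salt size here are illustrative choices, not a recommendation for any particular deployment):

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the cleartext password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    # Constant-time comparison, so timing doesn't leak how many bytes matched.
    return hmac.compare_digest(candidate, digest)
```

At sign-up you still hold the cleartext, so you could email it once and then store only the salt and digest; what you cannot do after that is mail the password back later, which is why periodic cleartext mailings imply no hashing.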
I was talking about the practice quoted in that comment of not hashing passwords.
Edit: To be clear, it doesn't say so explicitly, but it says that the cleartext password is sent periodically, and the only way to do that is to not hash it.
If you're not using a default-generated password in your mailman subscription you're doing it wrong.
(mailman should really just get rid of passwords altogether for non-mod/admin users.)
(but seriously, it's great when a project is so stable that having a bug is newsworthy).
Just want to know if we can help you in any way. I'm in awe of moderation efficiency on this site.
I once thought it would be fun to have a control DSL for HN, such that people could plant phrases in comments to signal things to either moderators or the software. Like most ideas of that sort, it never made it over the 'too complicated' speed bump.
That said, it's a little surprising to see a bug of this type in SQLite which generally has such solid test coverage... Oh well, nothing is bug free after all.
I was just thinking that's the sign of an extremely mature system. No system will ever be free of problems, but you will have fewer of them, and paradoxically, the problems you are left with have far more complicated and usually emergent paths to activation.
What do you mean by this?
"SQLite (Severity: Critical): Infinite loop due to ORDER BY LIMIT optimization"
It's very arguable that the bit in the brackets is editorialization. On the one hand I added it so nontechnical manager types would go "wait, what" and forward the issue internally so techs learn sooner. On the other hand, the subject itself is bad enough that _most_ will figure it out anyway, and pragmatically speaking this probably isn't going to kill anything by not being fixed within the next 30.8 seconds, so...
Although this could be classed as a security vulnerability because now SQL injection can get you DoS. But the likelihood of the DoS being on a server is arguably low. Definitely nonzero but arguably low.
How are you suggesting a team on a very large project find this elsewise?
Anyway I'm not suggesting they find it elsewise. In fact as you rightfully pointed out finding bugs like this is incredibly hard.
However, surely those code-generator bugs could have been spotted earlier? The reason they weren't was that they didn't cause any obviously wrong behaviour, which made it hard to find them using tests, but as long as you pay attention you can still spot them (as evidenced by the fact that they were eventually fixed).
So if you're wondering how a bug like this might have slipped past their 'extensive test coverage', I felt it was important to point out that putting too much trust in those tests is also a viable explanation. Not necessarily the right explanation, as I have no way of knowing the situation at SQLite, but a possible explanation.
Presumably they are to find it using something besides tests? Intuition? Casting lots? Code review?
EDIT: Not meant to say that the original comment was reasonable.
For example, in their draft goals, under S30200, they outline a memory allocator goal by referring to a proof.
Though, if the parent's goal was to imply that the whole of SQLite should be verified... I'm not convinced it's entirely possible. Quite a lot of SQLite revolves around areas where the halting problem may come into play.
By that argument every implementation of quicksort (say) would be using formal methods. But they aren't. There are some definitions of formal methods allowing this kind of "weaker" form, but for the sake of this discussion such methods do not prevent a particular class of bugs. Explicit goal statements are, while being a very good direction, informal.
> Though, if the parent's goal was to imply that the whole of SQLite should be verified... I'm not convinced it's entirely possible. Quite a lot of SQLite revolves around areas where the halting problem may come into play.
This is probably correct, and I don't recommend doing so. Even CompCert, probably the single most verified C compiler, is not absolutely fully verified, and of course there have been bugs in the unverified modules (codegen, AFAIK). But formal methods can be partially applied to produce provably more correct code (by some metrics).
SQLite involves extensive amounts of IO; it's a flat-file database most of the time, after all. That is inherently unsafe and can't be proven to terminate.
SQLite also has an internal VM, for which it generates code (the source of the bug in this article). The VM is Turing-Complete, hence you can't always prove that it will terminate.
Under reasonable assumptions about I/O (POSIX APIs and stuff), you can certainly prove termination of such operations.
> SQLite also has an internal VM, for which it generates code (the source of the bug in this article). The VM is Turing-Complete, hence you can't always prove that it will terminate.
You don't have to always prove any VM program will terminate. You just have to show termination-preserving refinement of the SQLite code generator (as I alluded to in my other post). You don't even need types for this.
*I'm sure there is some exception out there but it doesn't matter for this argument.