Hacker News new | comments | show | ask | jobs | submit login
SQLite: Infinite loop due to the ‘order by limit’ optimization (sqlite.org)
144 points by ScottWRobinson 9 days ago | hide | past | web | favorite | 74 comments

I think most of us agree that SQLite is close to the gold standard for stability and testing. This post reminds us that even SQLite has critical bugs.

But looking at https://www.sqlite.org/cgi/src/rptview?rn=7 I see a few Core Crash Bugs every month (April has 7). Most of them are not 'critical', to be clear, plenty of 'severe' and 'important' in the list.

I don't want to disparage SQLite. They have a free, fantastic product, go to great lengths to have stability no matter what, and document their processes. I learned a lot of them.

But clearly, our gold standard is not perfect. So now what?

* We might turn to theorem proving. But is it possible at SQLite scale? Are the proofs themselves enough? I remember about a critical piece of software with a proof no incorrect answers would come out. Turns out they forgot to prove the program would end, so they got no answer ever instead of an incorrect one. How do you prove your proofs are complete enough?

* We might turn to fuzzing, code analyses, linting,... SQLite does all of them.

* Or simply admit we are human, fallible, and try to do the best we can while knowing our limits? Sounds defeatist.

It's impossible to write bug free code. All these tools we have do is mitigate the number and severity of bugs around. If the remaining ones are weird edge cases that don't affect many people then that's a net win for everyone else. But we shouldn't assume that even "gold standards" are bug free

> Or simply admit we are human, fallible, and try to do the best we can while knowing our limits? Sounds defeatist.

Not all bugs are manmade (though obviously the vast majority are). But to give examples of other types of bugs: there could be bugs created by compiler optimisations or bugs due to esoteric hardware. Or even features that was intended behaviour but the landscape has since shifted in ways the developers were not aware (I guess you could argue that last point is manmade bit it's arguably not the failure of the developer).

It's not impossible to write, but just extremely hard for anything that is not trivial.

For example take function that takes a byte as an input and has byte as an output. Moreover has no external or internal state. You can run through all 256 inputs and check the outputs.

However, if the function has 128 bits of input. You could not possibly enumerate all possible inputs and check the output. So it takes other methods to prove it's bug free. This can be hard to find, and there is no way to automate this in all cases. (Halting problem)

However, bug free code is not impossible to write. I might say a bug free system is more likely to be impossible since would involve perfect hardware that is also immune to interference, and the physical world can cause a bit to flip while performing a computation.

That code doesn't run independently. You need a compiler, compilers aren't bug free. You need other functions to gain it's input and functions to do something with the output - those might all contain bugs. Even if you're not targeting different OS's or even "POSIX's", and even if you're not targeting different CPU architectures, you still get subtle variations in AMD64 chips that can - if you're working low level enough - throw up some bugs. And if you're not working that low level then you have the entire runtime stack and their bugs to contend with.

If computing was still just flipping switches and waiting for LEDs to flash then you might have a point. But in real terms, it's just not possible to write 100% bug free code anymore. The real trick is eliminating all the bugs that people are likely to experience in normal operation and all the bugs that attackers might exploit. Even that is no small feat.

I think Dijkstra put paid to that argument ages ago. The program is an abstract that the compiler implements. If the compiler is faulty that doesn't make the program faulty.

It is like arguing that because a ship's navigation system was faulty the directions given for a destination were wrong.

To use his words we supply computers to programs, we don't supply programs to computers. or something to that effect.

https://www.cs.utexas.edu/~EWD/transcriptions/EWD10xx/EWD103... Paragraph beginning "But before a computer is ready to perform a class of meaningful manipulations ..."

I agree with you to an extent but the issue isn't so much who's to blame for unexpected behaviour but rather who's responsible for fixing it. You could make all kinds of reasoned arguments about who's to blame if a compiler optimisation or even outright compiler bug caused good code to behave badly under certain conditions but ultimately the end users don't care as they would still expect the application developers to fix the build scripts and/or project code regardless.

To use the ships navigation system example, if you have some kind of "auto-pilot" system that would steer a ship into central London when a bad destination was set, then you'd quickly demand the authors of the auto-pilot to write exceptions for bad destinations.

That's the ugly side of development; all too often us developers are having to write code to handle ugly systems. It's not fun but it's usually necessary.

But you can't make a program without interpreting a spec and making lots of assumptions about the world including the ones about the compiler, that may or may not hold. So you can't make a program guaranteed to be bug free for the world, only for your interpretation of the spec.

Bug-free code implies perfect understanding of a perfect spec, which is something that cannot exist in the real world. We don't have the capacity to interpret everything perfectly correctly or perfect understanding of the universe. We barely have enough understanding to even make a spec for something.

Luckily perfection and correctness are not important. While the desire to mathematically prove things may cloud judgement and waste time seeking mathematically perfect world that doesn't exist, we still can make almost anything work reliably on top of unreliable components with guaranteed probabilities. That's what resilience and fault tolerance are all about and how most of the world works. It's too bad that some of it is touched by this "let's do at least something for security" attitude and stupid perfect correctness ideology that of course never guarantees anything with any certainty, so we have planet scale security vulnerabilities that impact everyone to deal with.

> It’s not impossible to write, but just extremely hard for anything that is not trivial.

I think @laumars meant impossible in practice, not impossible in the strict mathematical sense. We can speculate that in theory it’s possible to write bug free programs that are large and complex, but nobody has ever done it. And not for flaky hardware or compiler reasons, all large software has many software defects, bugs in the design. It is not currently possible in practice to write a program like SQLite that is bug free, and no analysis of halting problem or number of input bits will tell us whether it could actually happen. Nobody even uses metrics like that for software stability because they don’t tell us anything practical.

> It's impossible to write bug free code

I've heard it quipped as every program has at least one bug, and every program can be optimized to be smaller, and so by induction every program can be reduced to a single, wrong, instruction.

Compiler optimizations are man-made. Esoteric hardware? Also manmade.

The point I was making is a separation between bugs due to human error and bugs due to unanticipated environments outside of a particular projects codebase. I don't think it's fair to class those as human error because the actual error doesn't necessarily lie with the developer.

You can substitute "manmade" with a better adjective if you wish.

I think you have to give up on the idea that you can fix ALL bugs all the time. But I don't think we need to be defeatist. Each layer of tests is an improvement. The benefits of unit testing, integration testing, comparison testing, and fuzz testing are pretty unambiguous. We can always add more layers that give us high assurances against certain classes of bugs.

Do we have systems to write programs that are free of dangling pointers? Yes. Double frees? Yes. Buffer overflows? Yes. Seg faults? Yes.

With regard to infinite loops, we cannot prove programs in general terminate or not, but we CAN prove that some programs terminate. If our program cannot be proven, we can adjust it until it can.

I guess this is similar to theorem proving. We have to be able to define what kind of bug we're talking about, and develop in a system that allows us to prove to ourselves that we no longer have that class of bug.

When I talk about a "system", I don't care what it is. Maybe it's a library or set of functions or API that doesn't allow for certain types of errors. Maybe it's a static analysis tool. Maybe it's a language. It doesn't matter, as long as you can have high assurances later that your large code base no longer contains a particular class of errors.

Does this get rid of all bugs? No, but it greatly reduces the possible ways that bugs can exist, and that's huge progress.

In what other field of substantial complexity is there any expectation that errors never occur? I don't think it's defeatist to accept that there will always be problems. It seems to me in any complex field (medicine, aviation, finance, etc.) the thing that maximizes human success is to design processes so that problems are exposed, treated, contained, and prevented from re-occurring as easily as possible. Looking for something shaped like "problems will cease to exist" seems like a distraction to me.

Software is different than any other field because it's supposed to stand on pure logic.

Errors due to hardware failing because of physical properties are expected and unavoidable, but a logical statement no matter how big is supposed to be provable to be correct (except maybe weird self referencing statements, but that's another topic). I think that's what makes our expectations higher in that field as opposed to any other ones.

'Algorithms' can be proved, 'implementations' necessarily need tests.

Algorithms, are mathematically able to be proved, but any real implementation has to involve physical hardware at some level, these implementation details cannot (reasonably) be proven mathematically therefore need sufficient tests.

In the absence of a proof, tests can cover both algorithms & implementations.


If that's true to some extent, is it because Mathematics is built much more slowly, with higher standards of "inclusion" of new elements? Not the same pressure to deliver.

> We might turn to theorem proving

Question: what typically happens when you attempt to prove a program that is not correct, especially a large program?

1. You come to a point where to proceed with the proof you have to show some particular thing, and it is apparent that you cannot show that. The nature of this snag does tell you what to change in the program in order to clear this block. I.e., the proof attempt finds actionable bugs in the program.

2. You come to such a point, but the nature of the snag doesn't tell you anything about why it happened or how to fix it. All you learn is that your techniques are insufficient to resolve the status of this program.

3. You reach a point where you have deduced all that you can deduce and made all the inferences that you can make, but that was not enough to prove the program correct. You haven't hit any particular concrete blocker though. You have simply run out of ideas. Maybe someone else can do better.

Question 2: when you have proven some program correct, how hard is it to adapt or update the proof when new features are added to the program?

If it takes a lot of work up update proofs, they may only be feasible for programs where the proved version can stay in service for a long time.

For programs where a given version can stay in service for a long time, I think we probably should be making more effort to prove them if they are used in areas of importance.

SQLite probably does fall into that category for most of its users and use cases.

I don't know.. but isn't time just the thing we need? The more a library gets used, the more bugs that get discovered.

So the bugs get reported, fixed and the test-set grows, and the software grows more stable?

If the api-surface area doesn't change too much, given the right practices, the software should only grow more stable, right? If bugs like these become rarer and rarer over time, then it's fine, no?

Seems that the bug was introduced in 2016 due to an optimization in ORDERY BY clause.

Code that doesn't have to evolve at all for years is very very rare, especially in a product that's used on evolving hardware and software architecture.

Responding to parent post, i don't see another solution than theorem provers if we want to pretend that we're doing anything reliable. But it's going to be a long long way...

> I remember about a critical piece of software with a proof no incorrect answers would come out. Turns out they forgot to prove the program would end, so they got no answer ever instead of an incorrect one.

This particular kind of problem is fortunately pretty well-known, though I won't pretend that it's not still a possibility with a lot of proof systems. In the case of SQL databases, we are fortunate in that the base language doesn't provide any mechanism for nontermination (besides WITH RECURSIVE, which most queries will not have). So we don't actually need to prove termination for most queries, but instead provide a proof of "termination-preserving refinment"--that given a query that was already going to terminate (or not), our optimized query will have the same termination behavior (as well as, if it terminates, providing "the same" results, up to the usual SQL caveats about row ordering). Now we'll still have nontermination, but only for queries that are inherently nonterminating (i.e. incorrectly written WITH RECURSIVE queries).

As far as "is this kind of proof possible at the SQLite scale": I think it is. The thing it's solving is the sort of thing that is at least reasonably specifiable ("given this SQL query and this data set, these are legal results" and "given these transactions, this is the effect on the permanent storage system"). It would be a huge effort (probably on par with CompCert at a minimum, but I'd bet on significantly more work), but I think it would be worth it: embedded databases like SQLite are very easy to drop into other programs and are used extensively, and one that was formally verified would probably be able to have immediate impact (by comparison, Sel4 or CompCert often aren't in a position to replace existing parts of consumer toolchains). I also think the database is in a position where many people would probably be prepared to sacrifice some of the hardest-to-prove optimizations in exchange for correctness guarantees.

> * Or simply admit we are human, fallible, and try to do the best we can while knowing our limits? Sounds defeatist. *

That doesn't sound defeatist to me at all, just realistic, as long as it isn't used as an excuse. After all, Sqlite is one of the best tested libraries available.

>I think most of us agree that SQLite is close to the gold standard for stability and testing.

One reason I really like DB that are basically thin layers on top of SQLite, like bedrock [1] and some other I cant remember on top of my head. May be one day we see the expanded role of SQLite.

[1] http://bedrockdb.com

Or that the basic building blocks/foundations are just terrible

I think you're referencing very specific type of bugs while overlooking a whole plethora of other ways a program might fail to follow expectations.

As long as we need expressively languages, it will be possible to write bugs (be that low level languages or even "scripting" languages). And as long as we have a diverse ecosystem of hardware and software stacks (and I hope we do continue to) then there will be bugs.

Bugs are unavoidable. The point of testing frameworks / fuzzing / etc is to reduce the number and severity of bugs.

However if you do have a method of writing 100% bug free code then I'd be interesting in you sharing it ;)

I recall reading somewhere that the code of SQLite also has some UB. Which makes me wonder, if they use runtime sanitizers, like ASan, UBSan, etc.

Might be this:


Looks like SQLite used the sanitizers, and a new tool still found plenty UB.

I believe the SQLite auther verified the compiler output for the UB found, and saw it had no impact in practice, so there was no critical bug yet. Then he fixed the UB anyhow.

Last paragraph was from vague memories, not from the article, BTW. I would welcome better links.

For the curious, this was the resolution: http://www.sqlite.org/cgi/src/info/206720129ed2fa88

> This problem was originally reported on the SQLite users mailing list. It took considerable effort to reproduce the problem and then boil it down to the repro script shown above.

The resulting script for reproducing the bug is indeed concise, but I almost want to read about the starting point, and how much of a struggle it was to investigate and deduce the factors behind this bug.

I find the infinite loop much less troublesome than the wrong data that apparently this bug may cause as well.

ARG. The sqlite mailing list manager sent me my plaintext password back in its "welcome" email.


Mailing lists traditionally rely minimally on passwords. The "password" field at http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqli... has this copy:

"You may enter a privacy password below. This provides only mild security, but should prevent others from messing with your subscription. Do not use a valuable password as it will occasionally be emailed back to you in cleartext."

This is arguably better security because you can leave the password field blank, and a random one will be generated for you. That protects non-tech people from accidentally using a valuable password for something that is not actually security-sensitive.

That is just terrible policy. Password reuse is probably the biggest security problem there is, and until we can convince every human on the planet to use a password manager, every person that has to store a password needs to do so in a secure fashion. Saying "don't use a high-value password" is terrible practice.

Clearly not everyone reads the small print and even if they did that doesn't really excuse it in today's internet.

Hashing passwords is not hard.

Hashing password has nothing to do with sending email in plaintext.

You can hash password and also email password in clear text at sign up, when you still have access to the password.

And it doesn't even make sense to send hash of the password back in email.

I wasn't talking about sending emails.

But... the conversation was about sending the password in emails.

Read the comment I replied to. It explains that the passwords are not hashed at all. They are not sent in an email and then hashed.

I was talking about the practice quoted in that comment of not hashing passwords.

Edit: To be clear it doesn't say it explicitly, but it says that the cleartext password is sent periodically and the only way to do that is to not hash it.

If it makes you feel any better, the Fossil project (which is by the same developers, and is used for SQLite version control) recently moved off their mailing list and onto a new web forum/email notifications system built into Fossil, which has better security (better than this, certainly). SQLite will eventually move to the same system.

That doesn't necessarily mean that it stores the password in plaintext. It can mail you the plaintext password based on your own input upon sign-up before storing a salted hash in the database.

You should consider submitting this to http://plaintextoffenders.com/

Along with every mailman setup ever.

If you're not using a default-generated password in your mailman subscription you're doing it wrong.

(mailman should really just get rid of passwords altogether for non-mod/admin users.)

I guess the test suite doesn't check for the halting problem yet.

(but seriously, it's great when a project is so stable that having a bug is newsworthy).

I wonder if a a usable subset of SQLite/sql92 would be amiable to to formal methods, or if that is still to big?

I think you meant "amenable" rather than "amiable" here.

Is it going to end up in 3.25?

Better title: "SQLite: Infinite loop due to the ORDER BY LIMIT optimization"

Changed from "SQLite: View Ticket". Thanks!

To start a tangent @dang: Do you really read every individual comment, or is there some pattern matching on specific strings like 'Better title'? If so, what are the patterns?

Just want to know if we can help you in any way. I'm in awe of moderation efficiency on this site.

There are too many comments to read them all. We rely on flags, emails, browsing the threads, and a few tricks. But not really pattern matching on strings.

I once thought it would be fun to have a control DSL for HN, such that people could plant phrases in comments to signal things to either moderators or the software. Like most ideas of that sort, it never made it over the 'too complicated' speed bump.

I contacted @dang asking to change the title since the edit period had passed. To his credit, he responded in less than 20 minutes, after hours.

My guess is that they can check who flagged a post, then just view those users' recent comments.

Thanks for fixing my mistake on the title :)

This probably needs a rename, "SQLite: View Ticket" is not a terribly descriptive title.

That said, it's a little surprising to see a bug of this type in SQLite which generally has such solid test coverage... Oh well, nothing is bug free after all.

> That said, it's a little surprising to see a bug of this type in SQLite which generally has such solid test coverage

I was just thinking that's the sign of an extremely mature system. No system will never be free of problems, but you will have less of them, and paradoxically, the problems you are left with have far more complicated and usually emergent paths to activation.

> usually emergent paths to activation

What do you mean by this?

In this particular case, the fault wasn't in any one component and was subsequently hidden and then revealed due to the complicated interplay between multiple modules. This is usually where case-by-case testing breaks down and you have to turn to other methods like fuzzing to elicit these behaviors.

I agree with the rename. I'd suggest

"SQLite (Severity: Critical): Infinite loop due to ORDER BY LIMIT optimization"

It's very arguable that the bit in the brackets is editorialization. On the one hand I added it so nontechnical manager types would go "wait, what" and forward the issue internally so techs learn sooner. On the other hand, the subject itself is bad enough that _most_ will figure it out anyway, and pragmatically speaking this probably isn't going to kill anything by not being fixed within the next 30.8 seconds, so...

Although this could be classed as a security vulnerability because now SQL injection can get you DoS. But the likelihood of the DoS being on a server is arguably low. Definitely nonzero but arguably low.

I think most readers of Hacker News realise that an infinite loop bug is pretty severe.

On the other hand this is the kind of bugs you'd expect to see when programmers rely too heavily on the tests to catch bugs.

It takes 20 lines to reproduce the bug minimally, and relies on a confluence of code-generator bugs. I believe most fuzzers would take a considerable amount of time to find a bug like this...

How are you suggesting a team on a very large project find this elsewise?

Looks like somewhere along the lines my comment got interpreted as criticism rather than merely an observation, that wasn't my intent.

Anyway I'm not suggesting they find it elsewise. In fact as you rightfully pointed out finding bugs like this is incredibly hard.

However surely those code-generator bugs could have been spotted earlier? The reason they weren't was because they didn't cause any obviously wrong behaviour, which made it hard to find them using test, but as long as you pay attention you can still spot them (as evidenced by the fact that they were eventually fixed).

So if you're wondering how a bug like this might have slipped past their 'extensive test coverage', I felt it was important to point out that putting too much trust in those test is also a viable explanation. Not necessarily the right explanation, as I have no way of knowing the situation at SQLite, but a possible explanation.

rely too heavily on the tests

Presumably they are to find it using something besides tests? Intuition? Casting lots? Code review?

I think the intention was to embrace formal methods, which "proves" the absence of bugs (to the limited extents though). For a heavily-used critical piece of code like SQLite the lack or weak use of formal methods might actually be a reasonable criticism.

EDIT: Not meant to say that the original comment was reasonable.

Surprisingly SQLite does actually embrace formal methods in various places.

Such as, in their draft goals [0], under S30200, they outline a memory allocator goal by referring to a proof.

Though, if the parents goal was to imply that the whole of SQLite should be verified... I'm not convinced its entirely possible. Quite a lot of SQLite revolves around things around where the halting problem may come into play.

[0] http://www.sqlite.org/draft/sysreq.html

> Such as, in their draft goals [0], under S30200, they outline a memory allocator goal by referring to a proof.

By that argument every implementation of quicksort (say) would be using formal methods. But they aren't. There are some definitions of formal methods allowing this kind of "weaker" form, but for the sake of this discussion such methods do not prevent a particular class of bugs. Explicit goal statements are, while being a very good direction, informal.

> Though, if the parents goal was to imply that the whole of SQLite should be verified... I'm not convinced its entirely possible. Quite a lot of SQLite revolves around things around where the halting problem may come into play.

This is probably correct and I don't recommend to do so. Even CompCert [1], probably the single most verified C compiler, is not absolutely fully verified and of course there had been bugs in the unverified modules (codegen AFAIK). But formal methods can be partially applied to produce provably more correct code (in some metrics).

[1] https://github.com/AbsInt/CompCert

What things that SQLite does revolves around anything halting-problem related? Almost all SQLite queries are over a finite amount of data and have a well-specified legal solution set that barely requires recursion; WITH RECURSIVE doesn't have to halt but still has a well-defined semantics where it does. I think even most optimization problems on SQL queries are decidable; you have to want to do quite interesting things (like prove query subsumption) before you get into undecidability territory.

I see two main areas where the halting problem comes into play:

SQLite involves extensive amounts of IO, it's a flat-file database most of the time, afterall. That is inherently unsafe, and can't be proven to terminate.

SQLite also has an internal VM, for which it generates code (the source of the bug in this article). The VM is Turing-Complete, hence you can't always prove that it will terminate.

> SQLite involves extensive amounts of IO, it's a flat-file database most of the time, afterall. That is inherently unsafe, and can't be proven to terminate.

Under reasonable assumptions about I/O (POSIX APIs and stuff), you can certainly prove termination of such operations.

> SQLite also has an internal VM, for which it generates code (the source of the bug in this article). The VM is Turing-Complete, hence you can't always prove that it will terminate.

You don't have to always prove any VM program will terminate. You just have to show termination-preserving refinement of the SQLite code generator (as I alluded to in my other post). You don't even need types for this.

Almost all things you'd want to prove about a program are undecidable in the general case. That doesn't mean much about the particular case of your program.

A computer program should be written to never* deadlock, and that implies that the programmers have a vague proof sketch in their head of why it doesn't deadlock that could if needed be formalized as a proof.

*I'm sure there is some exception out there but it doesn't matter for this argument.


It is about as cool and thorough as the test suite that is used for the software in the airplanes you use.

Just to be clear, are you arguing for less tests?

Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact