Hacker News new | past | comments | ask | show | jobs | submit login
How SQLite is tested (sqlite.org)
163 points by antonios on Nov 28, 2013 | hide | past | web | favorite | 37 comments

While that undoubtedly sounds very impressive, IMO tests should not be what you use to ensure that your software is correct — in my opinion, they can be actively a bit harmful by giving you a wrong impression of correctness.

Just take a look at https://www.sqlite.org/changes.html — after the recent 3.8.0, there were 2 bug fix releases separated by 3-4 days (!), with bug descriptions sounding relatively, hmm, trivial, should I say?

I use SQLite, and consider it a great software; just want to make a point about my own dislike of over-reliance testing. Hight-level languages and strong (automatically) provable invariants should be our basis for writing reliable software, not C with a ton of tests.

1. fixed a logical error in the code. High-level language won't help you if you invert a logical condition or commit some more subtle logical mistake.

2. Our software also needs to be fast. No one needs a slow database engine. There is no high-level language that beats C on tasks like writing sqlite.

3. Finally, show me that unicorn high-level language with strong, automatically provable invariants and one piece of widely used, reliable software that was written in it.

Sqlite works. Testing works.

If there actually was a credible alternative to C for writing software that is efficient and reliable with requirements that sqlite has, I can assure you that people would use it. The problem is: it doesn't exist.

> 3. Finally, show me that unicorn high-level language with strong, automatically provable invariants and one piece of widely used, reliable software that was written in it.

Not implying that this is true now, but this is the specific goal of Rust, with Servo.

With its current design trajectory, Rust won't be able to prove interesting high-level invariants. It's going to give you the kinds of things you can prove with traditional ML-style type systems and affine types. The combination of the two lets you encode some interesting things, e.g. typestates, but they won't let you express something like "this browser engine correctly implements CSS".

Contenders would be ATS and Nimrod. Neither fulfills the "one piece of widely used, reliable software that was written in it" requirement, though.

Airbus planes run software written in Ada. Ada does fairly well in the programming language shootout and is designed for writing reliable software.

Most high-level languages have the drawback that they usually cannot inter-operate with anything else, or if they can, it gets very ugly. This begins with C++ being impossible to consume from anything else than C++ and certainly doesn't stop with Java or similar languages. For a portable, low-level library that aims to exist nearly everywhere I doubt there is anything better than C.

And you'd need tests either way. With certain high-level languages you could reduce the need for valgrind and signed integer overflow tests perhaps, but nearly all other test (and thus error) classes remain. You will want to be safe from regressions, you will want to make sure that you can recover gracefully upon process termination, OOM or power failures. You also want to make sure that your SELECT doesn't yield different result sets than other database engines for the same data set.

All those cases are not going away, just because you use a different language. And proving things correct necessitates that your formal model accurately describes what you want to create. Bugs in said model will still be bugs in the application, even if you proved that the application conforms exactly (which still isn't easy).

Typical ivory tower hn blather honestly. I would love to hear a real solution instead of self aggrandizement and vague ideas.

So, how would you test database software?

This reminds me of an old Google(?) interview question.

"How do you test a Calculator after dropping it?" http://www.youtube.com/watch?v=ILkT_HV9DVU&t=46m50s

For all the possible things that could go wrong in a fancy graphing calculator, what should you do to ensure it works properly?

For interview purposes, I build a LEGO Mindstorms framework to push the buttons and check that the correct LCD segments light up. And wait several weeks while it runs through a sufficient number of iterations to give some assurance of correctness.

For real-world purposes, I see if it turns on and gives me 4 when I do 2+2. If it fails, I buy a new one off Amazon.

But is possible to create very serious bugs even with high level languages, and proving software in a way to provide good coverage is extremely hard.

Yeah - it's ludicrous to suggest provable constraints will ever really achieve anything close to the coverage these tests have.

Having said that, if you have the choice, it's still unwise to spend huge amounts of time writing+maintaining tests but then balk at the relatively short amount needed to use a strong type system. It won't catch lots of things, but it will catch quite a few. For the limited concepts it can express, it can ensure correctness more rigorously, easily and quickly than any testing can. Might as well use the easy method where you can, right?

But sqlite is old and extremely portable - the only language that springs to mind would be C++. And while that does provide advantages, it's a bit of a stretch to claim those would merit a rewrite.

If you don't actively maintain the tests, then yes you can get a wrong impression of correctness. But when done properly testing helps ensure a certain level of correctness and old bugs don't reappear. A good rule to follow is to create a test(s) for each bug and run the tests on every check-in which is pretty easy with continuous integration. This will mitigate most bugs before they reach production. And the ones that do reach production get fixed...forever.

This "old bugs reappearing" how does this happen? People make the same logical mistakes in new code? Old buggy code is literally resurrected somehow?

Making changes again at the same place where a bug was fixed previously is a common offender. At least at my workplace we have regressions every now and then (and thanks to no test suite those sometimes make their way to the customer, yay). Basically every time you touch code you have the potential to break something. Even more so when a bugfix was done by adding code, not by changing it. People might accidentally rewrite the fix. If you have no tests that ensure that functionality stays that way it can be easy to break things in the process.

From my days as a release/build manager, the most common cause of this was people merging code from branches where fixes had been overlooked and similar horrors. The second highest would be people fixing something that "looked wrong" but was actually there to solve a problem.

Yeah I should have probably used the technical term: regression. It is very easy to do. Sometimes it happens when code is rewritten/refactored. Sometimes a new feature utilizes a buggy code path that was not previously used/tested. Etc.

It's pretty easy to make the same mistake that originally caused a bug when rewriting code.

There are no automatically provable invariants that would cover SQLite's functionality. How does a tool automatically prove that a query compiler is correct, for example? I don't see an easy way, barring writing SQLite in Coq.

Your choice of language can help you avoid, or even completely eliminate, certain types of defect.

But no language is ever going to prevent you from coding the wrong intent into your application - your programming language has no idea whether (for example) you are intending to keep or omit a left join in your optimisation routine.

Wow. Impressive!

But what bugs me about sqlite is

a) suppressing errors giving surprising results (hint: try opening a misspelt database file)

b) "syntax error near 'as'": almost impossible to find errors in longer sql statements (are other engines better, giving line numbers?)

c) no full outer join https://www.sqlite.org/omitted.html which can be worked around by writing about the double code, thus giving chance for twice as much errors

I really love the simplicity of using it, though!

a - This way you'll still have valid sqlite context (object), which would contain the error message (and the memory allocated to it), but it would be in erroneous state. That could embed much more information rather than just returning an error code. c - My guess is that would put more memory pressure in terms of rewriting the SQL statement, which could explode more than what a typical sqlite user would expect (there is much tighter memory usage limit, and it could be even more limited).

e.g. SQLite is not trying to do everything, and does not encourage you to. I guess you basically wrap your style around it, rather than other way around.

My takeaway: if you want (meaningful) 100% code coverage, you might have to write 1,000 times more test code than actual production code.

That's been my experience in practice. For a critical piece of integration code I had about 100 times more code in the tests then in the function I was testing.

I love the simulated random failures in malloc and I/O, it's something I haven't seen at all in other software. Maybe some people do it, but I think it should be more common.

Simulating malloc failures is also very easy to do in Windows with the MS Application Verifier.

Anybody know how the defect rates in SQLite code compares to that of the NASA Shuttle dev team?

Shuttle software: "the last three versions of the program -- each 420,000 lines long-had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors."


This seems very suspicious! How can you know how many bugs a program has?

By how many people die due to software defect.

SQLite can get so many tests partly due to the relatively stable requirements. In an environment with fast-changing requirements, it is not likely to have so many test cases for regression tests. In this type of environment, what I observed is either 1) the team has no time to update the initial test cases and causing a large batch of tests not being run 2) team spends significant time to fix and debug tests, slowing down the progress in adding new cases

"More bugs have been introduced into SQLite while trying to get it to compile without warnings than have been found by static analysis."

I've seen (and done it) myself in other code - for example trying to shutdown integer-conversion code early on, I would (rather prematurely) decide what the type would be, and later not having the warning (as guiding light) I have to rediscover what I did and fix it back. Hit me couple of time when converting 32-bit -> 64-bit c/c++ code, while trying to shutdown the warnings (part of the project requirments).


This was on front-page before.

yep, and it should be required reading for anyone that hasn't seen it yet.

true, I have not realized that

Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact