SQLite is the most thoroughly tested codebase I'm aware of [1]. It has seven tim...

segmondy · on Dec 14, 2018

100% branch, line coverage means nothing. It's about logical coverage. What are you testing for? You are not testing lines of code, but logic.

SQLite · on Dec 15, 2018

Right. The actual standard is called "modified condition/decison coverage" or MC/DC. In languages like C, MC/DC and branch coverage, though not exactly the same, are very close.

Achieving 100% MC/DC does not prove that you always get the right answer. All it means is that your tests are so extensive that you managed to get every machine-code branch to go in both directions at least once. It is a high standard and is difficult to achieve. It does not mean that the software is perfect.

But it does help. A lot. When I was young, I used to think I could right flawless code. Then I wrote SQLite, and it got picked up and used by lots of applications. It will amaze you how many problems will crop up when your code runs on in millions of application on billions of devices.

I was getting a steady stream of bug reports against SQLite. Then I took 10 months (2008-09-25 through 2009-07-25) to write the 100% MC/DC tests for SQLite. And after that, the number of bug reports slowed to a trickle. There still are bugs. But the number of bugs is greatly reduced. (Note that 100% MC/DC was first obtained on 2009-07-25, but the work did not end there. I spend most of my development time adding and enhancing test cases to keep up with changes in the deliverable SQLite code.)

100% MC/DC is just an arbitrary threshold - a high threshold and one that is easy to measure and difficult to cheat - but it is just a threshold at which we say "enough". You could just as easily choose a different threshold, such as 100% line coverage. The higher the threshold, the fewer bugs will slip through. But there will always be bugs.

My experience is that the weird tests you end up having to write just to cause some obscure branch to go one way or another end up finding problems in totally unrelated parts of the system. One of the chief benefits of 100% MC/DC is not so much that every branch is tested, but rather that you have to write so many tests, and such strange, weird, convoluted, and stressful tests, that you randomly stumble across (and fix) lots of problems you would have never thought about otherwise.

Another big advantage of 100% MC/DC is that once they are in place, you can change anything, anywhere in the code, and if the tests all still pass, you have high confidence that you didn't break anything. This enables us to evolve the SQLite code much faster than we could otherwise, using relatively few eyeballs.

Yet another advantage of 100% MC/DC is that you are really testing compiled machine code, not source code. So you worry less about compiler bugs. "Undefined behavior" is a big bugbear with C. We worry less than others about UB because we have tested the output of the compiler and we know that the compiler did what we wanted, even if the official C-language spec didn't require it to. We still avoid UB, and SQLite does not currently contain any UB as far as we know. But is is nice to know that even if we missed some UB in the code someplace, it probably doesn't matter.

Drdrdrq · on Dec 15, 2018

Nicely written, and thank you for providing such a great peice of engineering!

A thought: would it help to have a modified C compiler that would crash the app whenever UB was encountered? It might help find some bugs where non-default C compiler was used (which I assume happens, given the large amount of platforms sqlite supports). Or am I missing something?

cyan_pencil · on Dec 15, 2018

There is ASAN, the address sanitizer. You can enable it by passing some flags to gcc. It will make your program crash as soon as there is an out of bounds read / undefined behaviour. If debug symbols are enabled, it will also tell you which line of code was responsible. It can save you countless hours of debugging

Leszek · on Dec 15, 2018

I believe there's some things your have to do such aren't C compatible, e.g. store fat pointers of base+length+offset instead of raw pointers, to catch OOB accesses.

drorh · on Dec 20, 2018

I would not help because modern compiler treat UB as an optimization opportunity, including a license to do whatever they want (even elimination of code).

_0w8t · on Dec 15, 2018

That 100% branch coverage does not include indirect calls via functional pointers or jumps to signal handlers caused by devision by zero or invalid memory access, right?

krick · on Dec 15, 2018

True, but SQLite still is one damn well-crafted codebase, that has been explored by thousands of pairs of eyes over time. And that's the point.

yladiz · on Dec 14, 2018

Isn't that the definition of branch testing, to test all possible branches within code and also testing the logic in all of those branches?

gregmac · on Dec 14, 2018

Consider a line like:

    value = 1 / input;

You can get "100% coverage" if you test that with `input = 1`, but unless you check with `input = 0` you're missing a quite important logical check.

breckuh · on Dec 15, 2018

Can't you just have a NonZeroInt type?

whyonearth · on Dec 15, 2018

Only if your language supports dependent types.

tonyg · on Dec 15, 2018

It's perfectly possible in languages with ordinary ADTs.

  data Nat = Z | S Nat
  data NonZeroNat = OnePlus Nat
  data NonZeroInt = Negative NonZeroNat | Positive NonZeroNat

jwatte · on Dec 16, 2018

C isn't one of those languages, though.

tonyg · on Dec 17, 2018

The analogous would be the Go-style represent-a-sum-badly-as-a-product,

  struct nonzero_t {
    int is_negative;
    unsigned int one_less_than_the_absolute_value;
  };

which, under interpretation, ranges from -(2^32) to -1 and +1 to +(2^32).

krylon · on Dec 15, 2018

In Ada, you can define integer types that only accept a given range of values.

akvadrako · on Dec 14, 2018

Not really - internally inputs 0 and 1 use different branches.

RussianCow · on Dec 14, 2018

That's not a branch; otherwise you would have an infinite (or impossibly large) number of branches for just that one line of code. A branch is when you execute one set of code upon a given condition, and another if that condition is not met.

akvadrako · on Dec 15, 2018

I didn't say every number is a different branch. But on many processors, divide by zero triggers an interrupt. That's semantically the same as a branch.

qznc · on Dec 15, 2018

It depends on the language. In C it is not a branch because division by zero is undefined and not a path you consider. In Java you can argue that there are two branches. One branch that throws an exception and one that does not.

zaphar · on Dec 15, 2018

No coverage reporting library will attempt to tell you that kind of coverage. You are essentially in violent agreement with the op but turning it into an argument by using different words for the same concept.

thaumasiotes · on Dec 15, 2018

Testing all possible branches (each considered individually) won't get you very far in terms of testing all possible logic flows.

Consider a function which checks 5 simple if-statements in a row, always in the same order. Getting each branch means you tested 10 things.

But there are 32 ways for 5 if-statements to jointly evaluate. If there is a logical dependency between the state checked by one if-statement and the state checked by another one, your perfect coverage may not pick up on that.

If the if-statements might be checked in an arbitrary order... there are 120 ways to order 5 things. But you'll still get perfect branch coverage by checking 10 of them.

archgoon · on Dec 15, 2018

  int returns_less_than_twelve(int input1, int input2) {
    int sum = 0;
    if (input >= 5) {
       sum += 5;
    } 
    if (input2 >= 5) {
      sum += 8;
    }
    return sum;
  }

The following test cases will pass and achieve 100% branch coverage.

  int x = returns_less_than_twelve(5,0);
  test_assert(x < 12);
  int y = returns_less_than_twelve(0,5);
  test_assert(x < 12);

However, this does not cover the entire input space so

  returns_less_than_twelve(5,5);
  test_assert(x < 12);

Will fail. However, branch coverage won't tell you that there's a hole in your test coverage.

Generally however, writing full branch coverage will find a lot of issues, and also cause you to really think through how your code works; but still, it doesn't guarantee correctness. If you want that you need to start bringing tools that either exhaust your input space (a function which takes 5 booleans can be exhaustively tested for correctness in trivial amounts of time), or you start modeling your chosen language well enough that you can use a mathematical prover to demonstrate that your program or function is safe on all inputs.

This of course requires you to come up with a definition of 'correct' or 'safe'. For the above program, it's clear how to define correctness. For things like "Don't let an unauthorized individual access this data or data that is derived from it in a way contrary to the desires of the owner of said data" it gets 'tricky' ;).

gcc_programmer · on Dec 14, 2018

You test all possible executions of your program when you test with all possible data inputs, which is infinite.

seanwilson · on Dec 15, 2018

You can do this with machine verified programs. It's like proving a maths theorem; you don't check it for all values but you create a robust argument that it must be true for all values.

paulie_a · on Dec 14, 2018

That is why I like using random data generators for tests. You can input some static data and then the rest is random. Every once in a while a bug pops out when you see a test fail that was previously passing.

regecks · on Dec 14, 2018

SQLite is heavily subjected to fuzzing already ... maybe this vulnerability was discovered that way.

snaky · on Dec 15, 2018

And the probability to find the corner case ('input = 0') for the simplest expression ('value = 1 / input') this way is not quite astounding.

jcden · on Dec 15, 2018

None of the above methods are used for testing. You use boundary testing, branch testing, equivalence partitioning etc. Random data is not a good method of testing.

jakevn · on Dec 15, 2018

Except for the fact that it is exactly the method that has been used to discover a large number of critical bugs in the most popular OSS projects (including SQLite): http://lcamtuf.coredump.cx/afl/

Frizi · on Dec 15, 2018

Fuzzing isn't really practical if all you do is just generate a totally random bit stream for input. There are many much more clever and robust strategies to hit as many edge cases as possible. Check AFL[1] for some details on generating smart random input files. You can also combine that with pretty advanced dynamic execution analysis to fuzz against unknown processor instruction sets, like in sandsifter[2].

[1]: http://lcamtuf.coredump.cx/afl/

[2]: https://github.com/xoreaxeaxeax/sandsifter

tptacek · on Dec 15, 2018

On the contrary, for years, the most prolific fuzzers basically did just generate random bitstreams, and that technique will still find vulnerabilities in all the memory-unsafe software that hasn't been fished out by those same dumb fuzzers.

paulie_a · on Dec 17, 2018

Sorry, I strongly disagree. Random data with a few static arguments is an incredibly great way to test. Adding in some chaos finds bugs. "why did that test fail after 100 times...ohhhh"

I try to only use random data when possible, less and smaller tests to write with a proper setup. End result: more bugs found.

Random data is a great method of testing.

mholt · on Dec 14, 2018

This is exactly why I don't find code coverage tools useful. Now, a tool that can show what the tests assert against? THAT would be useful.

haney · on Dec 14, 2018

What you’re describing sounds very similar to “mutation” testing. https://en.m.wikipedia.org/wiki/Mutation_testing

nedbat · on Dec 15, 2018

That would be interesting. In what form would you express the result?

blattimwind · on Dec 14, 2018

SQLite can by principle not suffer from a RCE.

_pgmf · on Dec 14, 2018

Not sure why this is being downvoted, but you're correct. Now, a networked application that exposes some level of access to sqlite? That's another story. The question I think we all are asking is just how much "leg" does sqlite have to show to be vulnerable?

tptacek · on Dec 14, 2018

It's a pretty silly definition; it's like saying PDF or JPEG parsers can't be vulnerable to RCE, when they are in fact major vectors for RCE attacks.

ATsch · on Dec 15, 2018

I think the reverse definition is just as silly... Calling a JPEG parser vulnerability an RCE just because some online service is using it in a way that can be exploited remotely. By that definition, any bug is an RCE, since I can just set up a web server to run that program.

I think a better way of looking at it is that it's an ACE Vulnerability in the e.g. JPEG parser that causes an RCE in the Online Service.

Or, in this case, an ACE vulnerability in SQLite that causes an RCE in Chromium.

tptacek · on Dec 15, 2018

Sure, though what I'd say is silly is epistemological conceit of trying to pin down vulnerabilities as "remote" or "local". A lot of vulnerability research terms are silly (sillier than RCE). Either way: it's a "term of art", and it means what it means, and this is a clear and obvious instance of an RCE.

schoen · on Dec 14, 2018

I assume people making this distinction are thinking about "network services that the public can compromise by interacting with them over the Internet" vs. "software that someone can compromise by getting it to accept a malicious input". But I agree that "RCE" is commonly used for both; otherwise we would have to maintain that browsers don't suffer RCE vulnerabilities because a malicious document is no longer "remote" once the browser has downloaded it.

eli · on Dec 15, 2018

Sure, but I very frequently parse PDFs and JPEGs from untrusted sources, but almost never open untrusted .sqlite files.

(This is still a serious security vulnerability)

tptacek · on Dec 15, 2018

It's an RCE in Chrome.

eli · on Dec 15, 2018

Ok I see your point

armitron · on Dec 15, 2018

Risk is transitive.

pedrocr · on Dec 15, 2018

> but almost never open untrusted .sqlite files

You may not notice that you do when apps use sqlite as their file format:

https://www.sqlite.org/appfileformat.html

tlrobinson · on Dec 15, 2018

I don't know. I'd say PDF or JPEG parsers (and SQLite) can have arbitrary code execution vulnerabilities, which can in turn be responsible for remote code execution vulnerabilities when used in network-connected software.

e.g. SQLite has an ACE. Chrome has a RCE (which is SQLite's fault).

tptacek · on Dec 15, 2018

If what you're observing is that industry lingo is suboptimal, you'll get no argument from me. Consider for instance "XSS" and "CSRF", which are just manifestly silly names. But the names mean what they mean; try as I might, I can't get people to accept "Javascript injection".

burfog · on Dec 15, 2018

The actual industry term is just "code execution", or maybe "arbitrary code execution" if you want to get more specific than is typically worthwhile, not "RCE".

Usage example: "I got code execution!"

tlrobinson · on Dec 15, 2018

I’m observing there are reasonable terms for both the vulnerability in SQLite (https://en.m.wikipedia.org/wiki/Arbitrary_code_execution) and the vulnerability in Chrome due to the vulnerability in SQLite (https://en.m.wikipedia.org/wiki/Remote_code_execution) and wondering why we can’t just use those?

tptacek · on Dec 15, 2018

I don't know what to tell you. Try this: Google [browser rce], and then [browser ace] (or [browser ace vulnerability] or whatever). It'll be immediately apparent what the term of art for drive-by code execution vulnerabilities in browsers is.

I sort of intellectually in the back of my head know that "arbitrary code execution" is a term that has been coined and used in the past, but I don't offhand know of anyone that uses it (among other things, it's kind of redundant). "Local only" code execution vulnerabilities aren't "LCE", but rather (usually) "privilege escalation".

tlrobinson · on Dec 17, 2018

In both my comments I explicitly said that vulnerabilities in browsers can and should be called RCEs. I was only arguing about what to call vulnerabilities in the underlying libraries (like SQLite) which aren't inherently exposed to "remote" data/manipulation.

Say for some reason someone used an exploitable version of SQLite in a program that had the setuid bit set. You wouldn't say SQLite had a privilege escalation vulnerability, would you?

ynniv · on Dec 15, 2018

They're only vulnerable to RCE if image data can be supplied remotely. What's the analog here? Accessing the JavaScript API? Specifying a query string? Maliciously encoded data? Some of these are scarier than others.

tptacek · on Dec 15, 2018

This isn't, like, a real debate. Go here:

https://pwnies.com/

Start with the 2018 nominations but feel free to check the archives. Drive-by browser vulnerabilities are RCEs.

blattimwind · on Dec 15, 2018

> Drive-by browser vulnerabilities are RCEs.

I would never argue they aren't, but by this logic ("it's like saying PDF or JPEG parsers can't be vulnerable to RCE") virtually every code execution vuln in a library can be called RCE. I haven't noticed this to be the case with e.g. libtiff vulnerabilities (of which many make it into my inbox regularly), although image libraries are one of the cases were CE = RCE is still fairly reasonable.

Let's assume this SQLite bug is only exploitable if you can input arbitrary SQL. Almost no applications use it this way (except Chrome). I think it's clearly unreasonable to call it a RCE in SQLite then.

tptacek · on Dec 15, 2018

Uh, no, for example, it could also potentially impact any application that uses sqlite as (part of) a file format.

stefan_ · on Dec 15, 2018

In that sense, soon USB kernel stacks have remote code execution vulnerabilities because browsers added dumb APIs.

Should we fix bugs? Yes. Should we scream at people that expose raw APIs they don't understand far beyond their design constraints? Yes yes yes.

ynniv · on Dec 15, 2018

I think you're assuming the target is a browser, but my question was how this might affects servers. Does the attack use malicious SQL statements, API calls, or encoded data?

_lflx · on Dec 15, 2018

Technically, this attack is actually is two separate attacks in a chain. The first node in the chain is delivering malicious SQL. The second node is executing code remotely via SQLite. The proof is that SQLite or the application linking it could have mitigated this attack independently by either filtering the query string or better protecting the memory which is being written to.

In practice, however, the community gets more bang for their buck if they label the SQLite code execution vulnerability as an RCE since the vast majority of use is in a networked setting. You have to remember the audience used for these terms. They aren’t scientists in the traditional sense where taxonomy is highly aligned the ontology — instead, the labeling serves the operators with metaphors that depart from reality insofaras they increase security engineers ability to do their job effectively.

LolNoGenerics · on Dec 15, 2018

Sql databases are usually behind an application layer. Still they can suffer from RCE. Sqlites model is no protection.

yzmtf2008 · on Dec 15, 2018

It's no more "by principle" secure than any other SQL server bind to localhost only, so I'm not sure what you meant by it does not suffer from a RCE.

int_19h · on Dec 15, 2018

It is, actually. When something binds to localhost, there's still potential for privilege escalation vulnerability, since any process can connect to the port - so if there's an exploit, a low-privileged process could hijack a higher-privileged one. Localhost sockets are still a security boundary.

Since SQLite in and of itself is just a library, it doesn't have that problem. You have to expose it to untrusted inputs manually somehow (e.g. by setting up a socket).

snaky · on Dec 14, 2018

Especially considering you could generate faster C code than written by hand. With Proofs, Without Compromises http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP1...

nichochar · on Dec 14, 2018

Have you seen the language Esterel?

snaky · on Dec 15, 2018

My impression is it's rather low-level verification and modelling language comparing to the platforms like Kami (https://deepspec.org/entry/Project/Kami)

nyc_pizzadev · on Dec 14, 2018

Or lets wait and see what the bug is and why it wasn't detected in the testing coverage.

xtrapolate · on Dec 15, 2018

> "If even SQLite can have a RCE vulnerability, I'm convinced that it is not feasible for anybody to write safe C code."

This has much less to do with C, than it has to do with the fact that sqlite is a huge codebase.

Software can be vulnerable regardless of the programming languages used.

nobody271 · on Dec 15, 2018

You had me up until that last little bit at the end. No where did you make any reference to C until the end. You're kind of putting the cart before the horse aren't you?

jeklj · on Dec 15, 2018

To be a bit glib, unit tests don’t catch security vulnerabilities. Maybe I’d agree this can happen to any project, but my example might be something more like OpenBSD

hyperman1 · on Dec 15, 2018

Why not?

In this specific case, a unit test that checked this integer overflow seeems to prevent the vulnerability.

To be clear: This is not to admonish sqlite. They have taken testing further than any other project i've heard of, except maybe the NASA software that might cost lives if it fails.

hashhar · on Dec 15, 2018

Incidentally a lot of NASA tools use SQLite as well from what I have heard.