
SQLancer, a tool to automatically test DBMSs to find logic bugs - jinqueeny
https://github.com/sqlancer/sqlancer
======
stelf
Incredible, commendable work ! Should be funded properly and praised by all DB
communities that benefit from it.

Indeed is curious as to how it runs against Oracle or MSSQL, but maybe
commercial guys should fund the work first otherwise they’ll just thanku-for-
the-job which actually would potentially save them millions...

~~~
mrigger
Thanks a lot! So far, we have considered only systems that are (partially)
available in the open. Two of these are mainly developed commercially, namely
CockroachDB and TiDB. We are glad that Cockroach Labs and PingCAP were indeed
quite thankful for our testing efforts (e.g., see
[https://twitter.com/JordanALewis/status/1249689838686818304](https://twitter.com/JordanALewis/status/1249689838686818304)
for CockroachDB and
[https://twitter.com/PingCAP/status/1260678075714600968](https://twitter.com/PingCAP/status/1260678075714600968)
for TiDB).

We will also consider testing DBMS with a DeWitt Clause in the future. For
now, we have indirect evidence that our approaches could find bugs in Oracle.
Specifically, we found a case in MySQL, which we consider a bug, but which was
closed by the bug verifiers, partly because Oracle computes the same - we
believe incorrect - result. Our bug report can be found here:
[https://bugs.mysql.com/bug.php?id=99182](https://bugs.mysql.com/bug.php?id=99182).

We believe that it is a bug, because:

* the invariants that we are testing using TLP, which is the approach that we used to find the bug, should hold for all DBMS;

* dropping the UNIQUE constraint would yield the result we would expect, and we believe that an index should never affect a query's result;

* an earlier version of MySQL computed the result we would expect;

* and DBMS that strive to support the MySQL SQL dialect, computed the result we would expect (e.g., TiDB).

------
j4mie
A list of bugs found using this approach is here:
[https://www.manuelrigger.at/dbms-bugs/](https://www.manuelrigger.at/dbms-
bugs/)

~~~
amadvance
Finding 179 bugs in SQLite in less than a year is astounding!

~~~
petters
It really is astounding. Given the amount of testing sqlite receives, one
cannot help but wondering whether writing a database is beyond the reach of
our current development tools.

~~~
WJW
Clearly writing databases is not beyond the reach of of current development,
since we have a bunch of them and they have been doing pretty well at serving
production workloads for literally decades.

Writing 100% _bug-free_ databases is probably not possible with current
development tools, but it might also not be required. Every extra "nine" of
reliability costs exponentially more, whether you're looking at uptime,
durability or any other metric. At some point the extra costs just don't
measure up to the extra correctness gained.

~~~
ableal
This work seems to have made much cheaper a couple of those "nines" for the
logic correctness metric.

------
muldvarp
This is incredibly cool, finding almost 200 bugs in SQLite is a really great
achievement. I'd certainly be interested in how often these bugs come up in
real applications.

~~~
mrigger
Thanks! Here is one interesting example of a bug that was reported on the
SQLite mailing list: [https://www.mail-archive.com/sqlite-
users@mailinglists.sqlit...](https://www.mail-archive.com/sqlite-
users@mailinglists.sqlite.org/msg117434.html). The SQL query that triggers the
bug is rather complex. Interestingly, it was generated automatically: "I might
not spell it like that myself, but a code generator would do it (and much
worse!). This example was simplified from a query generated by a Django ORM
queryset using `.exclude(nullable_joined_table__column=1)`, for instance."

When the user reported the bug, it had actually already been fixed on the
latest SQLite3 trunk based on one of our bug reports. The query in our test
case used a “NOT NULL NOT NULL” predicate, which is unnecessary complex as
well:
[https://www.sqlite.org/src/tktview?name=c31034044b](https://www.sqlite.org/src/tktview?name=c31034044b).
We believe that this is interesting evidence that even “obscure” bugs matter
in practice, for example, when the queries are automatically generated, as
demonstrated above.

------
schedutron
Amazing work; I was wondering whether SQLancer can be integrated with systems
like Cosette[1] and XData[2] in a useful way? These systems are different as
in they test the "correct intention" of SQL queries and are mostly engine-
agnostic.

Manual correctness checking of SQL is cumbersome and error-prone since
developers may write queries in several different ways, while using fixed
datasets (independent of the query) may fail to catch errors in them. XData
automatically generates small datasets designed to catch errors in the queries
(an example error is using a left-join instead of a full outer join).

Any ideas about how we can combine the utilities of SQLancer and XData /
Cosette? I was a research intern at IITB Database Group, who developed XData
and I still contribute to the project.

[1] [http://cosette.cs.washington.edu/](http://cosette.cs.washington.edu/)

[2]
[http://www.cse.iitb.ac.in/infolab/xdata/](http://www.cse.iitb.ac.in/infolab/xdata/)

------
grandinj
Neat! Perhaps you could run this against H2 (I’m a maintainer). Although best
to run against our current master, release is a little old.

~~~
mrigger
Thanks for your interest! I'd love to test H2, but we are currently lacking
the resources to do so. We've received a number of similar requests in the
last couple of days and weeks. For each DBMS that we test, we need to add some
logic to randomly generate databases and queries, since the SQL dialects
typically differ. I've opened an issue in SQLancer's issue tracker for now:
[https://github.com/sqlancer/sqlancer/issues/23](https://github.com/sqlancer/sqlancer/issues/23)

~~~
bmichel
The name SQLancer suggests that it is made for SQL databases. Do you think it
would be easy to adapt it for documents databases like CouchDB or MongoDB?

~~~
mrigger
I believe that the approaches that we designed could be implemented/extended
to document-oriented DBMS. Since I don't have any experience with those DBMS,
I have no intuition on whether the approaches would uncover any bugs.

Implementation-wise, supporting CouchDB and MongoDB in SQLancer would be
feasible. SQLancer shares little infrastructure between the testing
implementations for the individual DBMS, and thus imposes only few
restrictions on them. We would definitely welcome implementations for any
NoSQL DBMS in SQLancer!

------
castorp
The Postgres team has been using SQLSmith for that purpose for a while if I'm
not mistaken.

~~~
mrigger
Many DBMS development teams use SQLsmith and other random query generators,
which can effectively find crash bugs. In contrast, we designed and developed
SQLancer to automatically find logic bugs, which are bugs that cause the DBMS
to compute an incorrect result set (e.g., by omitting a record from the result
set).

~~~
tpetry
How do you design the tests? Do you think of complex cases and hope any dbms
may fail the test? Because this is would result in many tests having no
problem and sometimes finding a bug, which cant really be an effective way of
doing it.

~~~
mrigger
We automatically generate databases and queries. The main challenge is to
verify that the query yields the expected result. To this end, we have
designed and implemented three different approaches:

* Pivoted Query Synthesis (PQS): The main idea is to generate a query that is guaranteed to fetch a randomly-selected row, to which we refer as the pivot row. If the pivot row is not contained in the result set, we have detected a bug.

* Non-Optimizing Reference Engine Construction (NoREC): The idea is to transform a query so that it cannot be optimized effectively by the DBMS. By comparing the result set of the "optimized" query with the result set of the "unoptimized" query, we can detect bugs that affect the DBMS' query optimizer.

* Ternary Logic Partitioning (TLP): The main idea of this approach is that we can partition a given query into multiple "partitioning" queries, each of which computes a partition of the result, and to then combine these partitions to derive the same result as the original query. If the two results differ, we have detected a bug in the DBMS.

------
noja
Has anyone run this against Oracle?

~~~
purerandomness
Not possible as their licence explicitly forbids to "disclose results of any
program benchmark tests without Oracle’s prior written consent." [1] and a
listing of bugs found in different RDBMS by an automated approach would likely
be considered a benchmark.

Same applies for Microsoft's SQL Server.

[1] [http://www.oracle.com/us/corporate/pricing/olsa-
ire-v122304-...](http://www.oracle.com/us/corporate/pricing/olsa-
ire-v122304-070683.pdf)

~~~
satya71
I doubt those terms have any legal validity. Still, pointless. Any one paying
for Oracle probably doesn't care about bugs since they have support contract
/s.

