Hacker News new | past | comments | ask | show | jobs | submit login
Cosette: An Automated SQL Solver (washington.edu)
109 points by mpweiher on July 20, 2017 | hide | past | favorite | 21 comments

So can you brute-force queries until you find something faster, then verify it?

Would much rather a computer did that than me

Does this really use advanced features of Homotopic Type Theory? Does it really require the Univalence Axiom? (Caveat: I don't yet properly understand either of those two things.)

It seems to me that the representation of cardinalities as "Univalent Types" could just as easily be done using Decidable Setoids (eg the number 3 is represented by a Setoid with a decidable equivalence relationship that divides its members into 3 equivalence classes).

There are only four things in this system that require a notion of equality and/or equivalence: tuples, cardinalities, relations (as functions from tuples to cardinalities) and queries (as functions from relations to relations).

I think this could all be done with in ordinary plain dependent type theory, without any HOTT or Univalence Axiom (for example, in Idris, which is apparently incompatible with the Univalence Axiom). Each of the four notions of equivalence would be represented by a corresponding type function.

Have I missed something?

> Please register (if you are first time user) or login (if you have registered)

No, thank you.



Because it's awesome! You've never had to rewrite a query for some sort of reason other than the results returned (e.g. performance) and really wondered if the original and rewritten query were exactly the same?

I ended up building some tooling scripts to do this for stored procedures. It compares the output of the current and proposed stored procedures for all possible parameters and displays the differences and the set of parameters that it used.

It's not formal verification by any means, but it seemed to work pretty well in practice. It would take forever though. I'd start it and walk away for a couple hours when I wanted a thorough test. But my confidence in correctness and performance gain (it measured perf too) was well worth the effort.

Because it's a lot of work tuning complex SQL queries, and you can easily make a coding mistake that runs faster - but produces different results.

I'm assuming this solver only works on a particular subset of SQL like ANSI SQL. Applications are often coupled to one database by using exotic features and non-portable SQL. In fact, the more you need performance, the more likely you are to leverage these aspects of your database and prevent yourself from using a tool like this.

Thanks for the explanation guys. I totally forgot that there is something like SQL tuning and optimising.

also, no need to write test cases again if they can prove or disprove two queries.

You still need test cases to make sure the requirements have been satisfied. Even in the case where you know there is a problem with a query and show there is a difference with Cosette there is no guarantee that it will show the specific difference you are looking for. This is a fantastic tool but it doesn't replace testing.

fair enough. I was thinking more on the side of "this rewrite should be correct and indeed it is" by asking the tool to come up with a proof rather than doing extensive testing.

Cosette can prove equivalency but not correctness. Unit testing can't prove correctness either but it can demonstrate that specific conditions are satisfied.

If you have a query 1 that satisfies case a and b but then identify a bug and add case c you can write a query 2 which satisfies a, b and c and indeed Cosette may identify that case as the difference (or not!) but you still have no guarantee that you will not then discover case d which is not satisfied by 1 or 2 or a future 3.

Even if the two queries are provably equivalent, that does not mean that the database engine will produce the same results for the two. There could be bugs in the implementation.

Most of all, because if you can prove that two queries are equivalent in behavior, you are close to proving that one DBMS has the same behavior as another DBMS for all possible SQL queries.

And then you ask: Why?

Well, if you can prove two DBMSs equal for all SQL queries, you can prove that an optimization to the DBMS (such as an improvement to the query optimizer or new indexing systems) will not affect the behavior of any possible SQL you provide.

That's not at all what this does and it's not possible to do either.

That's what you can do with Coq. I can implement a reference interpreter and an optimized compiler and prove that the behavior of all possible programs are equivalent in both environments.

This is, what I assume, the long-term target of the article.

You'd need to prove that a DBMS executes a query always correctly to begin with. That in itself would be quite the task.

Additionally SQL as supported by some DBMSes (e.g. Postgres due to recursive CTE and window functions) is turing complete. That makes proving equivalency in general a bit challenging.

It'll be possible if both databases are written using proof assistants like Coq. And I think this work is a step towards that direction.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact