CreuSAT: Formally verified SAT solver written in Rust and verified with Creusot (github.com/sarsko)
315 points by ingve on June 17, 2022 | 72 comments


In addition to the project itself being very cool and useful (you can count the number of verified SAT solvers that can solve most problems in modern SAT competitions on two fingers!), it's also an impressive demonstration of its toolchain, which essentially exploits the fact that the purely safe fragment of Rust (augmented with a handful of types that use unsafe code like Vec, which I believe have been proven compatible with the rest of Creusot in another recent paper) behaves like a functional language. This allows borrow-checked Rust code to be treated purely functionally in safe contexts, which allows exploiting the large amount of existing software tailored to proving purely functional specifications, without having to worry about memory-related side conditions, separation logic, etc.

This has always been the hope for Rust verification tools (that Rust would scale to these kinds of large proofs better than other industrial imperative languages due to the restrictions of the borrow checker), and I think this is the first real practical indication that that is indeed the case. Both in terms of verification time and additional verification code, it weighs in at a small fraction of that of the only other verified solver with comparable performance on SAT benchmarks (IsaSAT), which is extracted via code generation from Isabelle (a full proof assistant). Other attempts to produce verified SAT solvers directly from imperative code fall far short of its performance because they do not implement any of the difficult optimizations (presumably due to the amount of extra proof effort required).

Incidentally, another paper has come out at basically the same time making a similar observation, Aeneas: https://arxiv.org/abs/2206.07185, which uses a resizable hash map as its case study. Since it just came out, it does not have any larger proof developments yet that could be used as comparison, and it has somewhat different goals, but overall I am also excited about it and am very hopeful for the future of verification tools that integrate directly with Rust code like these two, as I think they are one of the more promising avenues towards the regular integration of verified code with more "ordinary" projects.


Does the proof encompass some sort of proof about the amount of RAM consumed, presumably bounded by some characteristic of the problem presented?

Regardless, this is very impressive, and one of the first things to raise my eyebrows w.r.t. formal proofs being possibly useful in the real world in a long time. If safe Rust can function as the bridge between the "practical" world and the world of proofs, that alone would be enough to justify Rust's existence.


It does assume enough RAM, as it has no way of knowing how much RAM the target machine will have. It is thus correct with regards to the model specified by Rust, but a program which is proven correct in Creusot may get an OOM-error. A note here is that it will not provide the wrong answer due to insufficient memory, but it will crash.

Proving some upper bound on the amount of memory used (and potentially gracefully exiting) should be possible, but I have not prioritised it, as OOM has not been an issue.


"has no way of knowing how much RAM the target machine will have"

That's why I phrased it in terms of a bound based on some characteristic of the problem.

But I was just asking for information. I know it's a well-known problem. To be honest the fact that we're even seriously discussing this as something that may be doable is a huge step forward. AIUI, previous state of the art on real languages would have this as hopelessly pie-in-the-sky.

I think in many cases a memory bound is uninteresting; one could examine a vector implementation and be satisfied that it won't explode unnecessarily. After all, we do exactly that on non-proved implementations without too much fear all the time. SAT just came to mind as a problem where that could be an interesting thing to be able to prove.


No, it does not. This is mentioned as a limitation in the paper (it does, however, handle things like integer overflows correctly). I believe with some suitable extensions, it would be possible to prove invariants like this as well, but IMO functional correctness is a pretty good start!


That's analogous to the halting problem.


It would be cool if the unsafe aspects of Rust that are necessary (like Vec) could themselves be formally verified, like when each edition of the language gets compiled or something. I know some languages include proof tools but they're not super popular, and Rust's compiler helps a lot. Still, it would be really cool to know, or at least know that you can't know. LinkedList is another example. There's no way to really write a linked list in the Rust-friendly way, so you have to use some boxing. All of that is fine, it probably works, but what if I forgot later, and changed something about the implementation, introducing bugs around edge cases or whatnot.


This process is actually already underway, notably with the RustBelt project: https://plv.mpi-sws.org/rustbelt/. It has not yet achieved the kind of tight integration with the source language that stuff like Creusot has, but everyone would like it to move in that direction :)

For the specific case of Vec (and a few other types--e.g. Cell, Mutex, etc.), as I mentioned, a recent paper by xavxav actually proves that the important parts of the unsafe implementation of Vec do actually fulfill the signatures that are used by the CreuSAT proof. This is part of https://people.mpi-sws.org/~dreyer/papers/rusthornbelt/paper..., which I believe won the distinguished paper award at this year's PLDI!


This discussion is also interesting with regard to Rust verification: https://news.ycombinator.com/item?id=30174827 . The Creusot author commented on it.

Seems like there finally will be alternatives to Ada SPARK.


Oh wow. So efforts like this can actually eventually trickle down to more regular programming?


Well, that's the research question! So far signs are promising, IMO. I think this is at least the best chance formal methods have ever had at being relatively easy to integrate into larger codebases--Rust's type system is strong enough to support nontrivial verification conditions, and effectively encapsulate them, without needing to directly deal with "exotic" logic, heavyweight encodings, pointer obligations that need to be manually discharged, or linking to code generated by some external language (a feature it shares with functional programming languages, as well as "hybrid" linear/dependently typed languages, like Dafny or ATS). Unlike functional or dependently typed programming languages, however, Rust has a lot of adoption independently of its suitability for formal methods, meaning that ease of integration has a good chance of benefiting projects that don't have correctness as a principal requirement. For example, assuming that a useful crate can be extracted from this project, it would be quite easy to integrate into a video game like Veloren (e.g. to help improve some of its AI heuristics), which is about as far from traditional high-assurance software as you can get!

If we can make using high-assurance software as easy (or almost as easy) as non-high-assurance software, without sacrificing huge amounts of performance to do so, it will be a major step in making software more reliable for everyday programmers, rather than restricting these kinds of techniques to critical libraries like cryptographic primitives where correctness and performance justify pretty much any number of proof engineering hours. This doesn't mean everything would be verified, but with more and more verified "building blocks," less and less engineering time would be spent debugging third party code (which at least IME, is usually much more painful than verifying code at the top level). We have a long way to go to get there, but projects like this are an excellent start.


Why are verified SAT solvers interesting?


SAT solvers often form (part of) the backbone of other proof assistants and verified software and hardware, which makes it quite critical that they produce correct output. At the same time, production SAT solvers (the ones people actually use on the critical path of these proof assistants) are very performance-sensitive, which means tons of optimizations that are nonobvious, subtle, and in some cases not obviously memory safe. This makes them ideal candidates for formal verification.

Unfortunately, in the past, the proof effort required to produce a competitive SAT solver was so high that this was considered infeasible, so instead the industry has been focused on non-verified SAT solvers that produce efficiently verifiable certificates for their solution that can be checked by an independent verifier. This was an important improvement that has allowed SAT solvers to continue to evolve, but (as mentioned downthread) they have disadvantages compared to a solver that can be fully trusted without producing a certificate, including in "forcing" people to use techniques that produce proofs which we know how to efficiently verify.

What this project promises is an order of magnitude reduction in proof effort to achieve and maintain an extensible, competitive, verified solver, which may help change the above calculus to favor verified solvers. IMO, it is quite an exciting development, even if it does not pan out!


Maybe I'm missing something blindingly obvious, but why is "trusting" a SAT solver necessary in the first place? The solution itself either satisfies the constraints or it does not. Are there situations where we can't even tell whether the solution is correct by looking at the problem instance?


For positive solutions, you are correct--verifying the solution is incredibly easy. For negative solutions (UNSAT), however, this is not the case--UNSAT is in Co-NP, not NP. Finding an efficiently verifiable representation of the UNSAT proof can be very challenging, which is part of why solvers are given much more time to produce and verify certificates than they are to solve the problem in the first place.


Right, unsat is the interesting case where you especially want a proof of the SAT solver's correctness.

Unsat can be used, in turn, to prove other things. If you want to prove that some condition C is always true in a set of expressions, just ask the SAT solver to solve for "not C". If the answer is UNSAT, then C is always true... if the answer UNSAT is actually correct.
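
To make that concrete, here's a minimal Rust sketch (my own toy illustration, brute-forcing a two-variable formula instead of calling a real solver) of checking a condition C by asking whether "not C" is satisfiable:

    // Toy example: prove that C(a, b) = (a & b) -> a holds for all inputs by
    // checking that its negation has no satisfying assignment. A real
    // application would hand "not C" to a SAT solver instead of enumerating.
    fn c(a: bool, b: bool) -> bool {
        !(a && b) || a // (a & b) -> a
    }

    fn main() {
        let mut not_c_is_sat = false;
        for a in [false, true] {
            for b in [false, true] {
                if !c(a, b) {
                    not_c_is_sat = true;
                    println!("counterexample: a={a}, b={b}");
                }
            }
        }
        // UNSAT for "not C" means C is always true.
        if !not_c_is_sat {
            println!("\"not C\" is UNSAT, so C holds for every assignment");
        }
    }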


Ahh, I see now what you mean. Thanks, that makes much more sense to me now.


Thanks. Do you have a link or reference to more material on the certificates? I don't understand a bit of it, but I want to learn more.


I will just steal from the paper's author (Sarek) and point you to the first two talks and lecture notes by Randy Bryant at this link: https://fm.csl.sri.com/SSFT22/#splogic.


> on two fingers

In which base? Can I use my knuckles?


Base 1, no knuckles required :)


This project uses a one-way translation of Rust code into WhyML through Creusot. This translation is not formally guaranteed to be correct or semantics-preserving.

Ideally, the code produced by Creusot would be translated back into Rust by a tool from Creusot or someone else. That would make it possible to use fixed-point iteration, much like round-tripping a programming language's syntax through its AST pretty-printer and parser.

This is more or less the approach taken by the verification team for the software of Buran (the USSR's space shuttle). They took the Fortran code and compiled it into executable code. The executable code was then manually translated back into Fortran and the discrepancies analyzed. This way they found several errors in the Fortran compiler. Buran made a fully automatic landing in the late '80s.

Contemporary verifiers should do the same automatically.


Indeed, Creusot itself is not formally verified, but we have verified the metatheory in our paper RustHornBelt: https://people.mpi-sws.org/~dreyer/papers/rusthornbelt/paper.... Actually verifying the translation that Creusot produces would definitely be interesting future work, but I'm not sure it's where the best 'bang for the buck' is found right now, though I try to ensure the underlying translation would remain simple to formalize.


Fascinating anecdote. Where can I read more about this?


I don't know where you can read more. There are resources in Russian, but buran.ru is not quite functioning right now.

I remember that story from the time when I worked with guys from the Russian CQG team, who, in turn, worked in CQG with a member of Buran's software engineering team. It was 15 years ago.


Is it guaranteed that Rust -> WhyML -> Rust would ever yield a fixed point?


It is possible to produce a dump of a Rust program's AST into WhyML datatypes and/or functions, project it, and prove what can be proven.

The structure of WhyML datatypes and functions can be parsed and transpiled back into Rust.


In theory no. In practice probably (Rust compiles itself to a fixed point from older versions in around 3 iterations, IIRC).


Hi! I’m the author of Creusot, Sarek did some amazing work on CreuSAT, he also had to teach himself formal verification and SAT solving along the way.

If you have any questions about CreuSAT or Creusot, I’d be happy to answer.


At first glance I didn't see a license in the repo. Am I missing it?


Ah, I knew I forgot something!

Thanks, I licensed it under the MIT license now :)


Thanks!


Some usage examples and performance comparisons against other solvers (e.g. minisat, varisat) would be nice.


Check out Sarek’s thesis in the evaluation section: https://sarsko.github.io/_pages/SarekSkotåm_thesis.pdf.

In general it’s roughly at the level of microsat, so worse than minisat and kissat, but not garbage either.


How does it compare to kissat in terms of speed?


There's a whole undiscovered country of writing software that is both mathematically sound and intuitively sound. I had a long list of hopes and dreams in the late 90's of what I'd hoped software would be by now. I got about half of my wishes, and some of them were weird for me because I went from bitter struggles to get even lip service to some of those things, to two jobs later having them taken for granted. It's weird to brace for push back and then have people nod and change the subject.

This is one of the wishes I have for 2040, that we continue to prove the ergonomics of type theory and escape analysis, and that having succeeded there, we start normalizing other domains of provably correct code. If you can do that for SAT solvers, maybe we'll start writing other kinds of AI in such a manner.


I have wondered at times if the reason why we don't have those things is because, in such an environment, 20th century management might not work. Basically, "the curse of lisp", although I don't mean lisp specifically in this case, but rather languages and tooling that unleashes the power of formal verification and, in a larger way, the power of math.

It's conceivable that in an alternate universe an individual engineer's productivity could be so high, because of math, that one developer would replace a whole team from our universe, and bugs would be extremely rare and mostly restricted to UI. From that point of view, the theory says, it's harder to build teams and these supercharged developers are natural information silos. Or, in other words, it feels like the tech choices made have demonstrated a slight preference for less efficient developers who all understand a shared problem (e.g. this team is responsible for testing, or this team builds integrations with APIs, or another team parses unstructured data into structured data), over technology that would aid or formalize many aspects of those kinds of problems and others.

Also, in such a world, I feel that software engineering would be much more abstract, and it probably wouldn't be approachable to those who aren't legitimately fascinated and interested in this stuff. Sort of like how I've never met anyone in my life who studied Math just to get rich, even though you can become phenomenally wealthy if you do.

More likely though the answer is complicated and it's a mix of historical accident, what I mentioned, and a slew of other factors. Nonetheless, I too think it would be so cool if I could spend my time writing programs that write programs that write programs, all guaranteed to match how I stated the problem's invariants. I could spend the majority of my time really understanding the requirements and assumptions, as opposed to actually implementing the features.


>It's conceivable that in an alternate universe an individual engineer's productivity could be so high, because of math, that one developer would replace a whole team from our universe, and bugs would be extremely rare and mostly restricted to UI.

I think this only applies for strictly theoretical problems and code golf, but not in real world programming. It's likely useful for low-level software like compilers and parsers, but most of the world isn't working on those problems.

For any engineering task (not just software engineering) half of the problem is modeling the problem. Most bugs I see in systems by non-junior engineers are in this camp; the engineer thought one thing but the world was actually different (and this abstracts up to, where management thought one thing, but the world was different).

For a parser, that problem is easily defined. For other problems, the difficulty of programming just moves into the verifier. Even if your code was bug-free, it isn't the case that your verifier is. For example, let's say you are building a search engine; how do you embed the fact that queries for "Justin Bieber" should rank higher than "Justin Timberlake" except for when the term "super bowl" is included? Or how does the verifier embed the fact that local government regulation says that queries for "Winnie the pooh" should only return results for the kids' cartoon character? Finally, Gödel shows us every system will have useful results that cannot be formally verified.

On a more human level, I think verifiers have great use in building more stable systems but I don't think we will see a day where they make us more efficient. It's like believing cars will lead to people staying home all day just because it will only take 5 minutes to get to the grocery store. No, the human mind will invent evermore complex systems on top of the tools we have requiring even more engineers to manage them.


> For any engineering task (not just software engineering) half of the problem is modeling the problem. Most bugs I see in systems by non-junior engineers are in this camp; the engineer thought one thing but the world was actually different (and this abstracts up to, where management thought one thing, but the world was different).

What burns me just as often is when "the problem is modeling the problem" garners a response of refusing to model anything, lest ye be wrong. This is the realm of architectural astronauts that have not yet accepted that title. I'll just put some mechanism in here where the user can make this code do whatever they want and all will be good.

This is sabotage masquerading as reason. As a somewhat senior person, refusing to decide is leaving that decision to your 'inferiors', who then bear the shame of having guessed wrong. Their track record looks worse, and yours looks better. See, you can't trust the new guys with anything. They always fuck it up. That's why you should listen to me instead.

I'm getting real tired of working with Grima Wormtongue.

What I expect from senior devs is that they have a plan, and one or more backup plans. When Plan A doesn't work, I expect them to acknowledge it and flip to Plan B. I get disappointed a lot, and I can see it in others as well. I can't tell you how many times I've surprised someone into stunned silence by practicing what I preach. Wait you're not going to fight me about how your plan didn't work? What is happening?


Absolutely, thanks for the incredible response. It is a bit pie in the sky thinking isn’t it.


There is TLA+ for verifying specifications. Here is a talk by its creator:

https://www.youtube.com/watch?v=-4Yp3j_jk8Q


Unsurprisingly, we can see a growing interest in the Rust ecosystem regarding formal verification. I try to keep https://github.com/newca12/awesome-rust-formalized-reasoning up to date. I will add CreuSAT shortly. The goal is to have an overview of what is actually Rust-powered. The long-term goal is to identify best practices and provide useful common libraries.


Excellent work. Kudos to the author(s). I have been looking for a way to write verified performant practical software for a long time, and CreuSAT is an example of how it can be done using Creusot. Thanks!


What are some practical applications for a SAT solver?


Verifying programs, for one thing--CreuSAT actually calls out to no less than three different SAT solvers in the course of automatically discharging its proof obligations :)

More generally, SAT solving is useful whenever you have something that can be represented as a circuit, which is an extremely large class of problems. Indeed, SAT can be proven to be polynomial-time equivalent to the hardest problems in NP (which you can read as "any decision problem whose yes answer is efficiently verifiable"), which includes many useful problems: common examples include the traveling salesman problem, detecting Hamiltonian cycles, subgraph isomorphism (interestingly, graph isomorphism is no longer believed to be NP-complete), graph coloring, and many others (which in turn often have easy reductions from other extremely useful problems). Since SAT solvers have received so much optimization work, in practical cases these problems are very often solved by reducing them to some variant of a satisfiability problem, rather than trying to solve them directly.
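
As a toy illustration of such a reduction (a sketch of my own, not tied to CreuSAT or any particular solver's API), here is graph k-coloring encoded as CNF clauses and then checked by brute force; a real workflow would hand the same clauses to a SAT solver instead:

    // A literal is (variable index, true = positive literal).
    type Lit = (usize, bool);

    // Variable meaning "vertex v gets color c".
    fn var(v: usize, c: usize, k: usize) -> usize {
        v * k + c
    }

    fn coloring_cnf(n: usize, k: usize, edges: &[(usize, usize)]) -> Vec<Vec<Lit>> {
        let mut cnf: Vec<Vec<Lit>> = Vec::new();
        for v in 0..n {
            // each vertex gets at least one color...
            cnf.push((0..k).map(|c| (var(v, c, k), true)).collect());
            // ...and at most one color
            for c1 in 0..k {
                for c2 in c1 + 1..k {
                    cnf.push(vec![(var(v, c1, k), false), (var(v, c2, k), false)]);
                }
            }
        }
        // adjacent vertices never share a color
        for &(u, v) in edges {
            for c in 0..k {
                cnf.push(vec![(var(u, c, k), false), (var(v, c, k), false)]);
            }
        }
        cnf
    }

    // Try every assignment, encoded as the bits of m (fine for tiny instances).
    fn brute_force_sat(num_vars: usize, cnf: &[Vec<Lit>]) -> bool {
        (0u32..1 << num_vars).any(|m| {
            cnf.iter().all(|clause| {
                clause.iter().any(|&(v, pos)| ((m >> v) & 1 == 1) == pos)
            })
        })
    }

    fn main() {
        // A triangle: 3-colorable, but not 2-colorable.
        let edges = [(0, 1), (1, 2), (0, 2)];
        for k in [2, 3] {
            let cnf = coloring_cnf(3, k, &edges);
            let answer = if brute_force_sat(3 * k, &cnf) { "SAT" } else { "UNSAT" };
            println!("{k} colors: {answer}");
        }
    }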

It is even more useful when combined with theories that can compute special cases (like arithmetic on natural numbers) much more efficiently, which is known as an SMT solver (satisfiability modulo theories). Sometimes, the theories are undecidable, which means the SMT solver is allowed to report "don't know" or run forever in some cases (in practice, people then massage the inputs to try to get them within the solvable cases). This is the form in which SAT solvers are used most often. There are a ton of applications, but an immediate one is determining whether version constraints in a package manager (which may include equalities, inequalities, etc.) can all be simultaneously satisfied (and producing a satisfying assignment). There are a lot of good examples, though, which is why SMT solvers are so ubiquitous in the literature when it comes to "how do we do this really hard thing?" (e.g. they are often critical components of program synthesis).

Sorry if this wasn't that helpful--it's a very broad topic and I'm sure many people here can give you a much better answer!


Another SAT developer here. Most prominently, SAT Solvers are used in the design and construction of CPUs. See the EDA Challenge of the SAT Competition 2021.


Isn't the traveling salesman optimization problem non-polynomially-verifiable? https://www.nomachetejuggling.com/2012/09/14/traveling-sales...


The optimization problem isn't, but neither are the optimization problems for a bunch of other NP-complete problems. The decision problem (where you specify an upper bound cost) is indeed NP-complete.


Is there a name for the class of problems SAT solvers are good at solving?

"If I can frame the problem as X, then I can get a SAT solver to solve it." What would you call X ?


First the intuition: if you can frame your problem as a search through a large search space to check whether your question is true of any of those points, such that for any particular point figuring out the answer is "easy" but the search space seems insanely large, your problem is probably "equivalent" (in some sense) to SAT. If you have an easier problem than that, you can also represent it as SAT, but you probably shouldn't, because there's probably a faster way to solve it.

---

Now the long version...

There's a very precise answer here, actually, since SAT is what's called an NP-complete problem (a very important and large class of problems). The easiest way to explain what you can use with SAT is to first define NP, then define NP-completeness.

Let X be some finite "alphabet" of symbols in your system, e.g. 0 and 1 for binary inputs (but as it turns out, for our purposes it could be any finite alphabet and it doesn't really change the definition).

We define a decision problem on finite strings of symbols in X as a predicate on strings of symbols in X, P : X* -> Prop (strings over X are just a finite number of concatenated symbols of X, e.g. "", "0", "00", "10", etc.); another way of thinking about this is just saying that P is a subset of the set of all strings of symbols in X. It can be pretty complicated to check whether a string is actually in P (you can define P as "the set of all programs that don't halt"!), so that is not that useful computationally.

Intuitively, since we're doing computer science, we want to characterize the properties we can actually check. Let's narrow down decision problems just a bit. First, we'll define "computable" functions f : A -> B for any types A and B, as functions / algorithms that take inputs of type A, return outputs of type B, and halt on all inputs (you can define halting formally but it's a pain so we'll skip that). These are a much better fit for computers to run than arbitrary predicates, because they have an answer and a defined algorithm.

Now, we'll narrow down the set of decision problems to the class of what are called "semidecidable problems," which is basically the largest class of problems we can say something useful about computationally. For a decision problem P : X* -> Prop, we say P has a semidecision procedure if there exists an alphabet Y and a computable function f : (X*, Y*) -> bool, such that for any string x : X*, P x holds if and only if there exists some string y : Y* such that f(x, y) returns true.

Clearly, not every predicate has a semidecision procedure: the stupid counterexample we talked about before (the set of all programs that don't halt) doesn't have one. But, to show how broad this class is, the halting problem actually is semidecidable! That's because if a program halts, by definition, it must halt in a finite number of steps; so you can make the certificate y just be a natural number represented in binary (or whatever your favorite base is), pass that maximum number of steps to f (a function that simulates a computer for y steps), and then the function returns true if the program x halts within y steps and false otherwise. Pretty incredible, right? This is one of many reasons why people are usually mistaken when they say something is impossible because of the halting problem; in practical cases, you can almost always work around it with a semidecision procedure and an appropriate certificate.

One really neat aspect of semidecision procedures is that any semidecidable problem P has an algorithm that, given x : X*, will terminate and return true if P x holds (but may run forever if it doesn't). How? Because Y is finite, every string in Y* can be represented as a natural number in base |Y|. So we can just run through every natural number, in order, convert each one into its equivalent string in Y*, and then run f (the semidecision procedure for P) on (x, y). If it returns true, we're done, and we know x is in the set (and if x is in the set, we know f will eventually return true for some y, so the search always terminates in that case). If it doesn't return true, we're not done, but that's okay, because we only promised a semidecision procedure :)
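
Here's a tiny Rust sketch of that pattern (my own illustration, with the Collatz iteration standing in for an arbitrary computation): f(x, y) checks whether the iteration starting from x reaches 1 within y steps, and the outer loop enumerates candidate certificates y = 0, 1, 2, ..., terminating exactly when some certificate exists:

    // f(x, y): does the Collatz iteration starting at x reach 1 within y steps?
    // This is computable and always halts, which is all the checker of a
    // semidecision procedure needs to guarantee.
    fn reaches_one_within(x: u64, y: u64) -> bool {
        let mut n = x;
        for _ in 0..y {
            if n == 1 {
                return true;
            }
            n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        }
        n == 1
    }

    // Enumerate candidate certificates y = 0, 1, 2, ... and return the first
    // one that works. This terminates exactly when some certificate exists
    // (i.e. when x is "in the set"); otherwise it runs forever.
    fn semi_decide(x: u64) -> u64 {
        (0..).find(|&y| reaches_one_within(x, y)).unwrap()
    }

    fn main() {
        let x = 27;
        println!("certificate for x = {x}: reaches 1 within {} steps", semi_decide(x));
    }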

There's a bunch of other interesting stuff about semidecision procedures, but we're now going to narrow it down even further... because the problem with the definition I just gave is that there's no restriction on how big y can be. It could be very large, arbitrarily larger than x. So I could just give you a ridiculous number as the upper bound on how long it takes to halt, for instance, and tell you to keep trying for that number of steps, and you couldn't realistically prove me wrong, but also obviously couldn't run that many steps. Additionally, the semidecision procedure f just needs to be computable, but nobody said it needs to be fast... as long as you can prove that it halts, it can be basically arbitrarily slow! And of course, the algorithm I mentioned for finding the certificate is incredibly impractical even for relatively small certificates.

So a useful class of problems are those for which there exists an "efficient" semidecision procedure, where the total time required to run f on x and y is bounded in some way by the length of the original input, x. "Efficient" can mean a lot of things, but most people agree that stuff that's exponential in the length of x is probably not efficient, so a popular definition is that a decision procedure is "efficient" if it can run in time polynomial in the length of the input (note that this is the length of the input string, so for example if x is a number represented in binary, it needs to be polynomial in the logarithm of the number, not the number itself). Note that this obviously also places a bound on how much bigger the useful part of y can be than x: it has to be at most polynomially bigger, otherwise the algorithm definitely can't run in time polynomial in the length of x since it has to spend longer than that just reading the parts of the certificate it needs!

Using polynomial time (also called P) is nice because it erases away a bunch of stuff, like log factors (i.e. the form in which the input is represented), conversion procedures, etc. Polynomial time is also closed under most operations: if two procedures can run in polynomial time, then basically any composition you can think of that uses them also runs in polynomial time. It can also hide huge hidden terms though, so "polynomial time reducible" or "polynomial time" doesn't necessarily mean "efficient"; it's just a convenient category that restricts things so they don't blow up in insane ways. In practice, most stuff people actually care about has a pretty small polynomial.

The class of problems I just defined (problems with semidecision procedures that run in time polynomial in the input)? That's NP! And that's the set of problems you can encode as SAT instances :) Why? The answer is pretty interesting... first, remember how we defined an algorithm that always terminates with "yes" if x is in the set, for any semidecision procedure at all? Well, with the restriction that our semidecision procedure runs in polynomial time and that the certificate is at most polynomially larger than x, we can define a new procedure, called a decision procedure, for these problems, as follows:

Given x, run through every y of length up to the polynomial bound on the maximum certificate blowup (we know this polynomial exists, and what it is, by the definition of problems in NP, as I mentioned earlier, so we know that if a certificate exists, we'll find it). For each y, run our semidecision procedure f(x, y) on it; this runs in polynomial time, also by definition. If f(x, y) returns true, then we return true. If it didn't return true for any y up to our upper bound, we return false. This new algorithm is said to decide P: it halts on all inputs, not just the ones that return yes, and returns true if x is in P and false otherwise. Problems with this property are called decidable, so we can see that all problems in NP are decidable.

How does this relate to SAT? Well, first, let's observe we can convert the problem from base |X| and |Y| to base 2 (and back) in polynomial time, with at most polynomial input blowup (proof omitted, it's not that interesting). Once we've done that, the algorithm we gave above looks a lot like a naive algorithm for solving SAT: if we think of every bit in the certificate as a boolean variable, then we're just running through all possible assignments to the boolean variables, and then checking f(x,y) on each assignment to see if any are true. So if we can somehow abstract f(x, _) into a boolean circuit (that uses at most polynomially more space than the original input), we're all set!

As it turns out, we can do that! The actual way we do it is kind of subtle, especially if we want to be efficient, and depends heavily on the program representation you've chosen, but the Wikipedia article on the Cook Levin theorem (https://en.wikipedia.org/wiki/Cook%E2%80%93Levin_theorem) gives one approach. So, any problem in NP can be converted to SAT in polynomial time :). Since SAT is also in NP (the certificate is the string of satisfying assignments, and checking it just requires running the boolean expression on the certificate), we call SAT (and other problems like it) "NP-complete." Any problem in NP that you can convert SAT into in polynomial time is also NP-complete, so this is a very natural class of problems, and it contains a whole bunch of important stuff (basically, anything where you need to search the input space for an answer, but can verify it easily).


Thanks for this great write-up! I am slowly making my way through it :)


You're welcome! Happy to help if you have any questions, as you can probably tell I used to study this stuff in some depth.


SAT solvers are used in package managers to resolve dependencies.

A SAT solver is a tool that determines whether a given Boolean formula can be True (such a formula is called SAT, i.e. satisfiable) or not. If it can be True, the SAT solver provides values for the variables so that you can check that they are correct.

For example, a Boolean formula

   a & (b | c)
Can be True if a = True, b = True and c = False.

A SAT solver can be useful for resolving dependencies in a package manager. Imagine that a user wants to install package A with version >= 1.0, but package A depends on other packages and might conflict with some packages as well. The package manager converts these requirements into a Boolean formula like this: let's say that if variable A₁₀ is True, then package A version 1.0 is installed, if A₁₁ is True, then package A version 1.1 is installed and so on.

First, let's express the user's desire to have any version of A installed. Here is the rule:

    A₁₀ | A₁₁ must be True
Let's say A version 1.0 requires B version 1.0 or 1.1. We can then write a rule that if A 1.0 is installed, then B 1.0 or 1.1 must be installed as well:

    !A₁₀ | (B₁₀ | B₁₁) must be True
Now, let's say that B 1.0 conflicts with C 1.0 and both cannot be installed at the same time. We can express that as a formula as well:

    !(B₁₀ & C₁₀) must be True
Finally, we cannot install two versions of the same package, so let's write that too:

    !(A₁₀ & A₁₁) & !(B₁₀ & B₁₁) must be True
Now we can combine all the rules into a single formula. All the rules must be true, so we join them using the AND operator:

    (A₁₀ | A₁₁) & (!A₁₀ | (B₁₀ | B₁₁)) & !(B₁₀ & C₁₀) & !(A₁₀ & A₁₁) & !(B₁₀ & B₁₁)
Now we can give the formula to the SAT solver. It will solve it and give a set of versions that need to be installed in order to satisfy all requirements. Or it will prove that this is impossible.
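
To make that concrete, here is a minimal Rust sketch that checks the combined formula above by brute force over the five variables and prints every model it finds (a real package manager would hand these clauses to a SAT solver, which scales far beyond five variables):

    fn main() {
        let mut found = false;
        for bits in 0u8..32 {
            let a10 = bits & 1 != 0;  // A 1.0 installed
            let a11 = bits & 2 != 0;  // A 1.1 installed
            let b10 = bits & 4 != 0;  // B 1.0 installed
            let b11 = bits & 8 != 0;  // B 1.1 installed
            let c10 = bits & 16 != 0; // C 1.0 installed

            let formula = (a10 || a11)             // some version of A is installed
                && (!a10 || b10 || b11)            // A 1.0 needs B 1.0 or B 1.1
                && !(b10 && c10)                   // B 1.0 conflicts with C 1.0
                && !(a10 && a11) && !(b10 && b11); // at most one version of A and of B

            if formula {
                found = true;
                println!("model: A1.0={a10} A1.1={a11} B1.0={b10} B1.1={b11} C1.0={c10}");
            }
        }
        if !found {
            println!("UNSAT: the requirements cannot all be satisfied");
        }
    }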

Another use of a SAT solver is verification of programs: if you have two pieces of code without loops and you want to make sure that they are equivalent, you can convert them into Boolean formulas and use SAT to prove that for any given input they produce the same output.
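
A toy version of that idea (again my own sketch): two loop-free snippets that should compute the same function, checked by enumerating the tiny input space; a SAT-based flow would instead assert f(x) != g(x) and expect UNSAT:

    // Two loop-free snippets that should compute the same function.
    fn f(a: bool, b: bool) -> bool {
        a ^ b // XOR written directly
    }

    fn g(a: bool, b: bool) -> bool {
        (a && !b) || (!a && b) // XOR written as a sum of products
    }

    fn main() {
        let mut equivalent = true;
        for a in [false, true] {
            for b in [false, true] {
                if f(a, b) != g(a, b) {
                    equivalent = false;
                    println!("disagree on a={a}, b={b}");
                }
            }
        }
        println!("equivalent: {equivalent}");
    }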


SMT solvers, which are built on top of them, can do useful things like cracking license keys or doing cryptanalysis.

https://zeta-two.com/assets/other/nixucon18-slides.pdf https://github.com/TechSecCTF/z3_splash_class


Another use case: verifying natural language requirements. As an example: http://slra1.baselines.cloud


So can CreuSAT be used as a backend for Creusot?

A “self-hosted” SAT-solver, in the sense that it can prove its own correctness, that would certainly give you a speaking slot at some conference. But also a philosophical dilemma I guess. :)


This was asked in the Rust Zulip as well, and is actually the original idea which this thesis evolved from (Xavier said something along the lines of "My supervisor and I have talked about how cool it would be to have a solver which verifies itself").

The short answer is that it should be possible to write a Why3 driver for CreuSAT which allows it to handle basic propositional reasoning, and that one would probably need to add some theories.

The slightly longer answer:

Getting it to work as a backend should probably not be all that hard, and it should probably be possible to make it able to prove some things as well. Getting it to verify itself would probably be very hard. As it stands, the proof needs Alt-Ergo, CVC4 and Z3 (all of which are really good) to pass, and some of the obligations take a long time. A solver capable of verifying itself would (I imagine) have to put much more emphasis on having only simple proofs which were solvable using a minimal amount of theories, and would probably have to run for a very long time. I dunno, would be a fun project, and I guess one could decide after having tried to prove for instance Friday (the super naive solver) if it is at all worth pursuing.


I thought proof checking tools mostly negated the need for verified SAT solvers.


A former coworker of mine was in the process of publishing a really nice paper about a framework that allows proofs generated by most state-of-the-art SAT solvers to be verified concurrently, i.e. simultaneously with their construction, which drastically reduced the time needed to construct and then verify UNSAT proofs. Unfortunately, he left before publishing the paper.

(Among other reasons, he was unhappy with the academic world, which I do understand perfectly.) Unfortunately, I have trouble reaching him. I believe his basically finished paper would be of so much use to some people. It's really sad. I believe his paper really should be published in a journal or at least put online somewhere. It's not totally groundbreaking or anything, but it's the kind of hard work nobody else had had the determination to do yet. At least not publicly. I feel so bad that I cannot share his work without his permission.


Producing the proofs (especially for UNSAT) is often very time consuming, and verifying them can be subtle (and if an incorrect proof is produced, you still have to deal with that case somehow...). It's not so much that they negate the need for verified SAT solvers as that creating and maintaining a verified SAT solver was considered much too difficult to justify the proof effort required. If the goal can be accomplished with an order of magnitude less effort, that calculus changes somewhat.


SAT solving is the subtle and time consuming part. A valid SAT proof certificate can be verified efficiently.


I'm referring to producing the certificate as the time-consuming part, not the verification time (I disagree that verification is not subtle, but I suppose that doesn't matter as there are verified verifiers). For example, many SAT solvers no longer attempt to provide "pure" resolution proofs because they are too expensive to produce, which restricts the kinds of techniques you see used in competition. In fact, producing certificates efficiently is an active research area, which wouldn't be the case if certificate production had negligible cost, e.g. see https://lmcs.episciences.org/9357/pdf.


The SAT competition gives you 5x the solving time to check your proof, which indicates that checking isn’t so simple (though it is on paper).


Generating a proof and checking it is definitely a step up, but it does have a few issues.

- SAT is in NP, and UNSAT is in CoNP. This means that proofs take a lot of space to store and a lot of time to check. Proofs are usually run with 5x the time limit of the initial solve, and still end up timing out.

- Proofs are (currently) based on resolution, which means that research on efficient solving techniques which cannot be efficiently modelled as resolution is not being prioritised. This is touched on briefly in the first 2 talks of Randy Bryant at: https://fm.csl.sri.com/SSFT22/#splogic. I think the talks in general are among the better resources for understanding a SAT solver like CreuSAT.


>Proofs are (currently) based on resolution

Proofs are actually based on a practical version of extended resolution [1] called DRAT. Extended resolution is an extremely strong proof system, and research on techniques that are not efficiently modelled by resolution is in fact being prioritised (see for instance [2]). This is not to say that CreuSAT is not very impressive, but just a correction on the current state of things.

[1]: https://en.wikipedia.org/wiki/Frege_system#Extended_Frege_sy...

[2]: https://link.springer.com/chapter/10.1007/978-3-642-31365-3_...


I am actually having to learn about SAT solvers. Is there any newbie-friendly resource you think might help?



Thank you kind sir/madam.


Nice.


Will there be a version of this for the College Board type of SAT?


That's not possible; it's well known that the College Board SAT is inconsistent:

https://www.youtube.com/watch?v=POMX80Q11O0

https://www.youtube.com/watch?v=Xh5coE5wJ2I


Those examples are mistakes the test makers made. It doesn’t make the test “inconsistent”.



