Hacker News new | past | comments | ask | show | jobs | submit login
Modern SAT solvers: fast, neat and underused (codingnest.com)
465 points by kachnuv_ocasek 36 days ago | hide | past | web | favorite | 118 comments



What would be a concrete problem you could solve with SAT solvers, besides the obvious "satisfy this boolean expression" (which doesn't exactly come up every day)? Is it a useful tool for other NP-complete problems?

I mean, certainly you COULD reduce the Traveling Salesman Problem to SAT (it's NP-complete, after all), but I would assume that it would be a totally impractical way of doing it compared to more specialized algorithms.

I'm asking since the title contains the word "underused", but doesn't elaborate on practical use-cases. What would you use it for?


Surprisingly, for many problems (including travelling salesman), Sat solvers are very competitive. I research in this area and it is a constant source of annoyance when you come up with a clever special algorithm which then gets beaten by translation to SAT.

SAT particularly exceeds when you have side constraints. Imagine travelling salesman where you have to top up petrol only in cities, or where you need to sleep at night, or some roads are slower at rush hour, or you have a particular place you want to be at a particular time/date.

A specialised travelling salesman solver would be useless for any of these. Adding side constraints to SAT is simple.

In general, I wouldn't try to write sat models directly unless you are writing a logic problem. Consider looking at conjure ( https://github.com/conjure-cp/conjure ) which adds a nice language on top, and get target various NP complete solvers, including Sat (which often is the best). (I'm one of the minor conjure developers)


That's really interesting. But how does it actually work, in practice, to find an optimal TSP path given a SAT solver? I know there's a reduction proof, but if I remember correctly, the proof I learned involved transforming TSP from an optimization problem into a series of multiple decision problems of the form "Is there a path with length less than k?". That would mean running the SAT solver multiple times to get the right path length. Is that efficient enough, or is there a better way to do this?


It's more than just a SAT solver, but Microsoft's Z3 SMT solver has an option to minimize a quantity of interest. I haven't worked with it much, but it seemed faster than a binary search on 'k' when I was playing around with a non-TSP problem.

https://rise4fun.com/Z3/tutorialcontent/optimization


It usually is efficient enough, mainly because in practice it is very easy to find slightly too long paths, and fairly easy to prove smaller lengths unsatisfiable, so any solver spends most of its time proving, after finding the optimal value n, that n-1 isn't possible.


Some SAT solvers allow you to add additional constraints while they are running. So you can find a solution and then add additional constraints to try to get it to find a better one.


Something I found out, after participating in HashCode was that Google have a pretty comprehensive optimisation library called OR Tools. There is a module specifically for routing. I don't know what Google uses in production, but presumably this was developed for things like navigation:

https://developers.google.com/optimization/


Not sure how concrete/practical you'd consider this, but here's one:

In last year's Advent of Code programming contest, one of the trickier problems involved taking a collection of points in 3D space, each of which has a "range" defined by Manhattan distance, and finding the location that is "in range" of the maximum number of them.

Now, this isn't a theoretically intractable problem. Each point defines the center of an octahedral volume of space, and by repeatedly slicing each volume using the others' boundaries, you can partition the entire space into something like O(N^3) disjoint pieces. You can then enumerate these pieces to determine which one is contained in the largest number of octahedra. But actually implementing this is tricky; each piece ends up having an irregular polyhedral shape, and calculating the geometry of their boundaries would be non-trivial.

Instead, some of the participants noticed out that you could represent the constraints as a formula:

    find (x,y,z) that maximize sum((|x-xi| + |y-yi| + |z-zi|) < ri)
and then just feed it to a SAT/SMT solver such as Z3. This is probably less computationally efficient, but in practice it runs reasonably quickly, and it's vastly easier to code. (The Advent of Code scoring system is based almost entirely on the time taken to come up with a correct solution, so it strongly favors solutions that are quick to implement.)

Here's one such solution: https://www.reddit.com/r/adventofcode/comments/a8s17l/2018_d...


Day 23, that was a fun one!

It’s pretty easy to solve with an octree. My goal was run-time speed rather than quickest solution.

My octree solution runs in ~3 milliseconds. If memory serves, the Z3 solver took around 17 seconds on the same machine.

I made a nifty (imho) video that shows the octree traversal order. There’s no need to create irregular shapes.

https://www.forrestthewoods.com/blog/solving-advent-of-code-...


I'm working in first-order logic theorem proving (which obviously has far greater applications than propositional SAT: from mathematical proof checking, to software and hardware verification, e.g. compiler optimisation checking, to integration in higher-order solvers, etc.). Our algorithm is based on the idea of abstracting the first-order problem to a propositional one, which is then passed to a SAT solver! If it is inconsistent, so is the first order problem and we're done. If not, the SAT problem is satisfiable and we can use the solution to guide a "refinement" of the propositional abstraction, which is again passed to the SAT solver, and so on. If no "refinement" can be made, the first-order problem is satisfiable.

We do reasonably well on the annual CASC competition of theorem provers (we're iprover), even winning a division where this approach is particularly strong!

[1]: http://www.tptp.org/CASC/27/


   Our algorithm ...
Unless I completely misunderstand what you write, the approach you describe is the standard way of setting up SMT-solvers by interaction between a SAT solver with a theory solver. I think it is called lazy SMT or DPLL(T) in [1] where this approach seems to have been described first. [2] is also an interesting description of the phenomenon.

[1] H. Ganzinger, G. Hagen, R. Nieuwenhuis, A. Oliveras, C. Tinelli, DPLL(T): Fast Decision Procedures. https://www.cs.upc.edu/~oliveras/espai/papers/dpllt.pdf

[1] R. Nieuwenhuis, A. Oliveras, C. Tinelli, Abstract DPLL and Abstract DPLL Modulo Theories. http://www.cs.upc.edu/~roberto/papers/lpar04.pdf


I read both those papers ;)

I guess that you can think of our algorithm as lazy SAT modulo quantified first order logic, with the crucial difference that first order logic is undecidable. But anyway in this case the "decision procedure for T" would just be checking whether any two complementary literals which are true in the model are unifiable, which is decidable.

Paper: http://www.cs.man.ac.uk/~korovink/my_pub/inst_gen_modular.pd...


It's not the same approach as CDCL(T) (the stuff in z3, cvc4, and other classic SMT solvers).

Basically CDCL(T) uses the theory part (the "T") as a callback in the inner loop of the SAT solver (to validate partial models). iProver and similar refinement provers tend to have their own outer loops ("refinement loops" in other communities) that makes a succession of calls to the SAT solver, to completion, and use the model or unsat core to refine their own abstraction for the next iteration of the outer loop.


Great talk by Kathleen Fisher on this topic. Also mentions SAT solvers. https://youtu.be/TH0tDGk19_c


It's not difficult to reduce scheduling problems that are usually done with dynamic programming etc. to SAT.

Then there's SATPLAN. If I had VC money I would be seeking young'uns trying to change the world with SATPLAN, of course in a way that's married to machine learning. (Machine learning doesn't "think", it "learns"; logic-based AI "thinks" but can't "learn").

E: part 1 says at the outset:

> As an example, the often-talked-about dependency management problem, is also NP-Complete and thus translates into SAT[2][3], and SAT could be translated into dependency manager. The problem our group worked on, generating key and lock cuttings based on user-provided lock-chart and manufacturer-specified geometry, is also NP-complete.<

Part 3 is already deep in the rabbit hole of comparing heuristics. People like me just write something that "compiles" to DIMACS and fires Minisat.


A primary problem in marrying logic to ML is doing so in a differentiable way. i.e. when your SAT part of the system is suboptimal, how do you know which parameters of the learned part to change, and in which direction?


We propose Differentiable Satisfiability and Differentiable Answer Set Programming (Differentiable SAT/ASP) for multi-model optimization. Models (answer sets or satisfying truth assignments) are sampled using a novel SAT/ASP solving approach which uses a gradient descent-based branching mechanism.

https://arxiv.org/abs/1812.11948


From the article, a lot of the heuristics seems to revolve around and 1) which variables to try first 2) which learned clauses to keep in your set.

Assign scores and differentiate on those scores. This is (very loosely) how AlphaGo also works - you take a tree-based algorithm (MCTS) and then use neural nets to score the next node to expand.


you don't need differentiability you just need a learning rule. there are learning algorithms that don't use gradient ascent.


I found this blog post quite interesting[1]. The author is optimising rack positionning to minimise cabling.

However, trying to use his method to optimise a linear typewriter[2] proved unsuccessful, as projected solving time was around 13 million years.

[1]https://yurichev.com/blog/cabling_Z3/ [2]https://www.benjaminpoilve.com/projects/typewriter-optimisat...


Yeah, I tried Z3 (and other things) for the typewriter (my write up is linked at the bottom of the original linear typewriter post) and got nothing useful out of it - even with extra constraints like "make sure E and T are close together".

(But your 13M years is a lot quicker than my projected seven hundred ninety-eight trillion, seven hundred twenty billion, nine hundred fifty-two million, one hundred seventy-six thousand, seven hundred and forty-five years you'd need for an exhaustive search of the 26! possibilities.)


Concolic testing

https://en.wikipedia.org/wiki/Concolic_testing

Basically, the problem "find an input that causes the program to execute this sequence of statements".

---

Identification of missing optimizations in the LLVM backend.

https://github.com/google/souper


If you use a programming language with a package manager like npm, PIP, CPAN, Pub, Cargo, Bundler, etc. then every time you ask it to figure out what versions of your dependencies to use, you are effectively running a SAT solver.

In particular, pub — the package manager for Dart — explicitly uses the CDCL process described in the article:

https://medium.com/@nex3/pubgrub-2fb6470504f


A notable exception being Go modules, which do Minimal Version Selection [1], a much simpler approach.

[1] https://research.swtch.com/vgo-mvs


I'm a co-author of a paper where we use program synthesis to optimize big-data (i.e., slow-running) SQL queries. The synthesis engine we use encodes its search problems into SAT.

https://www.microsoft.com/en-us/research/wp-content/uploads/...


Hello, i am CS undergrad.i wish to learn about program synthesis. while researching online for courses available. i came across the one offered at univ of washington and a few more, they do not have video lectures.if possible, please provide pointers on the right place to start.


Hi,this blog post is a good starting point: https://alexpolozov.com/blog/program-synthesis-2018/

Among other documents, it links to this extensive survey: https://www.microsoft.com/en-us/research/wp-content/uploads/...

All the best!


I’ve had a supply demand problem that I couldn’t figure out how to use a SAT solver for - thousands of demands ( movies have to reach thousands of theatres) with tens of possible paths (fedex, satellite, download) - each at different costs, some impossible and costs will change with volume and consideration of other similar deliveries. Have a solution that models each choice of supply demand tuples as nodes in a tree and doing depth first search on it.

Any ideas on how to switch to an SAT solver? How would I even write such a large SAT Boolean expression, and how would I tell the SAT solver that the costs will change for each choice depending on business rules and other choices made? It seemed impossible with a SAT system but I assume I’m not familiar enough with the concepts.


You probably want to use SMT (which uses SAT underneath), not SAT directly. SMT solvers usually let you write constraints over numbers which is a more natural fit for your problem.

Also if you have thousands of variables, you probably shouldn't be writing SMT statements directly. Usually you have some sort of script for your problem domain that emits SMT then invokes the solver. E.g. the popular SMT solver Z3 has a very simple lisp-like syntax that's easy to emit from a script.

As for changing costs, you might be able to use simple expressions like: costOfShippingXusingZ = 100 - numOfXusingZ * 2

Or you might use "soft constraints" that let you add weights to optional constraints.

It's important to note that with SMT/SAT solvers you generally want to keep your formulas linear or "first-order" which means don't multiply variables together if you can help it.

This might be useful: https://rise4fun.com/Z3/tutorial/optimization


> Also if you have thousands of variables, you probably shouldn't be writing SMT statements directly. Usually you have some sort of script for your problem domain that emits SMT then invokes the solver. E.g. the popular SMT solver Z3 has a very simple lisp-like syntax that's easy to emit from a script.

the follow on is that the same program should also process the model (answer) that Z3 produces and render it in a domain-appropriate fashion.

this is easiest if you use something like the python interface to Z3, instead of actually generating SMT-LIB sexps and feeding them to Z3.


At one point I wrote a Common Lisp interface to Z3. The syntax was nice, except that empty lists have to be printed as () rather than NIL. So close to working with no tweaks.

The hardest part is getting the model back at the end (well, and making problems that don't scale too badly with N).


Thanks a lot, will check this out.


It sounds like you should be able to transform the tree into a graph that encodes every possible intermediate state and use a path finding algorithm to find the shortest path from "nothing delivered" to "everything delivered". A* might work well since it sounds like you can fairly easily calculate a minimum bound for the cost to get from anywhere to the "fully delivered" state.


How would A* perform on a graph with 2^n nodes?


It can perform very well if you happen to have a good heuristic function for that problem and that data.

However large graphs might not fit in memory.


Yeah, the highlight of my current solution is that the graph is computed lazily / on demand (supplies are passed in the current path and provide estimates) - very CPU heavy but light on memory. There no way I can precompute every possibility, but is there anything specific to A* that rules out lazy node creation? In this problem the nature of every node is a function of the path you took to get there.


No, you can definitely use A* with lazy node creation. It's not clear to me that you'll get a reasonable runtime though with an NP-complete problem. Seems like you might have a pretty high branching factor there, but if you're already using depth-first search, A* should be a major improvement over that with a good heuristic.


You need memory proportional to the number of nodes you visit (score of the node, previous node on the shortes path to this node, list of next candidates). There is nothing preventing you from creating everything else lazily, and in practise most nodes will never get visited.

A* also allows you to find a solution faster by overestimating your heuristic: For example if you multiply your heurisitic with 1.1 you are guaranteed to find a path whose length is within 10% of the shortest path, requiring less CPU time and memory for the search.


What are you trying to do? Show that deliveries can be made or minimize the cost?

If you’re minimizing the cost, shouldn’t this be formulated as an optimization problem?


Yes. That's a MILP. And if we think SAT solvers are useful and underused, wait till you start working with MILP solvers.


Yes, yes, yes. I've made this comment a few times on here but tools like MIP solvers are vastly unknown since they didn't take on a sexy name when they started out. However, they can outperform most heuristic based algorithms, and even solve some of the largest problems. It's how the NFL makes their schedules oh, and you can imagine how complicated that is.


Fun fact. Most power grids in the developed world are run off of some of the largest MILP and LP problems. We're talking about millions of constraints to commit units and hundreds of thousands to dispatch them. These run around the clock 24/7.

I've only recently learned about SAT type solvers as MILP/LP problems are very fast and globally optimal, unlike things like A* or Genetic Algorithms. I think GA's are cool because you can write one in less than 1/2 page of Python, where MIP solvers are usually large black-box C codesets (Ex: GLPK, CPLEX, GUROBI, XPRESS, Open-Solver).


GA does not belong anywhere near real problems. They are toy solutions with zero guarantees.

Fun to code though!


Sadly, I've mostly come to the same conclusion in that I at least need a MIP Gap to tell me how far I am from the optimal solution.

My only issue with LP and MILP solvers is scalibility. You have none! Multicore doesn't really seem to help any.


Ssshhh ... Stop telling everyone. I have rewritten various problems as MILP it finds a solution and people think I’m smart!

My favorite MILP is commercial and perhaps somewhat obscure for general population - Mosek. Any recommendations in regards to other good options? perhaps open source that might be in same or better league.


Gurobi, CPLEX and XPRESS are the obvious commercial ones. GLPK and Cbc will also do MILP.

For more information, see this page: http://plato.asu.edu/guide.html


Thank you so much. Very nice collection of information.


I think it was this year I learned they exist. They told me a few to check out. I still don't have a list of applications. What areas were you thinking about when you said underused?


Cost minimisation, so yes, it is an optimisation problem, and I haven’t heard of MILPs, will check them out. Is that a Prolog style constraint solver? And would it support dynamic constraints, where some choices will render others impossible (inventory / bandwidth exhaustion)?


A practical example is the Alive theorem prover, which found many bugs in LLVM's instcombine pass (i.e. peephole optimizations) and which we now use to prove the correctness of new transformations.

https://rise4fun.com/Alive https://blog.regehr.org/archives/1170



Seeing someone cite a little project I did for fun many years ago is a very nice surprise. :)


Those are games, not practical problems. I find SAT solvers exciting, but at the same time I’m so disappointed by how few problems they solve. So far it seems to be dependency management and key cutting.


Generally speaking, if you ever find yourself coding a brute-force backtracking search for some combinatorical problem, it might be a good idea to consider using a SAT solver instead. This way you get to take advantage of the non-chronological backtracking, and all the other tricks the SAT solvers implement.

I wrote the hexiom solver that got linked above and the motivation in that case was that I saw that someone tried to solve the puzzle with a hand-written backtracking search, but that their algorithm wasn't able to find a solution to one of the levels.

My version of the SAT solver managed to quickly find a solution for every case, including the highly symmetric one where the usual search algorithm got stuck. The nicest thing is that the implementation was very declarative, in that my job was to produce a set of constraints for the SAT solver, instead of to produce an imperative algorithm. The symmetry breaking was one example of something hat was very easy to do in a SAT-solver setting but which would be trickier to do by hand.


I found two remotely exploitable 0days in the Erlang VM with the use of a SAT solver.

They can trivially break lots of hash algorithms. You can also find bitwise equivalents to functions which you wouldn't naturally expect to be easily definable in terms of bitwise operations. You can extend existing concepts with a lot of extraordinary new functionality.


DO you have a quick explanation or any reference on using SAT solver to find equivalent bit-wise functions? I can see how you could easily verify equivalence, but I am struggling to see how you could use SAT solver to generate alternate functions using bit-wise operators.


When considering a small number of possible operations, I've found the optimum way to work this is to simply enumerate all binary expression trees of increasing length and determine if some valid assignment exists for any of them.


That's really neat, do you have a writeup?


This is a description of what lead to it: https://www.reddit.com/r/ReverseEngineering/comments/5h23u3/...

If you want more information about the vulnerability and mechanisms of exploiting it, project members assigned it CVE-2016-10253 .


What I really want is scanning my own codebase for problems.


The beautiful thing about this line of research is that if you have a problem that can be reduced to key cutting, you're set. Particularly if you can learn the lock-charts (or whatever the data looks like in your higher-level problem domain) from data.

The magic sauce is in being able to combine these things. Chuck a bag full of keys, sans known tumblers (those are embedded in doors, you don't steal doors) into a machine and learn with eg. stochastic gradient descent what are tumblerset-candidates that satisfy the entire system. Sensor arrays + AI that thinks + machine learning = wow.


Why fund DeepMind? Blink stalker micro & beating some old board game can't possibly be useful. Yes, we should also ignore Game Theory because it's just games, nothing practical

The complaint was having to compile problems into boolean satisifiability problems. Here we see one need only describe their problems as puzzles. These examples are nice in that they include code that shows how straight forward description of these problems can be


I called those out as not practical problems because the commenter specifically asked for that. I’m all for research, but there’s this bait and switch that goes on with SMT solvers. They’re presented as useful (TFA says “underused”) and when someone asks for real life use cases, people show minesweeper.


IIRC the dpkg package manager in Debian uses a SAT solver for dependency resolution.


Debian does not have a SAT solver powered package manager.

That said, DNF package manager in Fedora, Mageia, OpenMandriva, and Yocto uses a SAT solver. As does the Zypper package manager in (open)SUSE.


As do many language package managers, e.g. Dart's pub.


I used a SAT solver awhile ago to build a program that scheduled retail employees based on preferences set by both the employer and the employees.


In my current project, I started using a SMT solver to type a language I'm compiling (find out the "best" types for every expression). Then I also used it for storage ressource allocations (finding out which programs should store their output so that later retrieval is the cheapest), and then again to layout some graph on the GUI; and I have another similar use case already in mind.

Of the many ways to solve a complex problems that we have on our tool box -- algorithms, neural networks, constraint solvers -- I would totally agree to say that solvers are underused. But when you get used to them it's not hard to find use cases.


If you're willing to overlook that Z3 is a more general SMT solver, it's a useful tool for one-off problems where raw brute-force is inefficient.

I used this to proof-of-concept an attack on Seahash[1]; this was the first time I'd ever used an SMT solver, but it was really very straightforward.

[1] https://gitlab.redox-os.org/redox-os/tfs/issues/5#note_9768


As its applicable to all discrete Problems (think of Sudoku), its a really neat skill to have.

Reductions are pretty easy to write and modern SAT solvers are crazy fast.


SAT, its extensions and SMT solvers are heavily used in bioinformatics. For example in genetics and for protein folding problems.


Air Traffic Reschedualing [1]. The algorithm is using hard constraints rather than soley relying on machine learning. I believe that the paper is essentially describing a SAT solver strategy that is used by a machine learning algorithm. The agorithm should also great for employee reschedualing, or any kind of reschedualing where you want to find the nearest solution anytime the last known solution is invalidated by a state change.

[1] https://www.researchgate.net/publication/259823636_Addressin...


I have used them to prove equivalence between pseudo code and optimized production code. Several rounding-related tricks and a too large search space made it very difficult to prove with traditional testing.


An easy case that I've used multiple times is equivalence checking. In other words: phrase a problem like "are these two programs equal" as "Is there any input X that can satisfy the following: F(X) != G(X)". If you can compute F and G as boolean functions, a SAT solver can often quickly and easily give you an answer. If the solver says this equation is unsatisfiable (UNSAT), then they're equal. Otherwise, it will give you a counterexample: an input X where F(X) isn't the same as G(X).

Three examples I've done this for, myself.

- Hardware circuits: is a hand-optimized combinational function (optimized with hand-written karnaugh maps, or whatever) equivalent to a high level specification? You could for example use this to write N versions of an instruction decoder, a reference spec and versions specialized to different hardware devices, and check them all together.

- Packet processing: are two different firewall/packet filters equivalent? In other words, is there any packet P for which two filters F(P) and G(P) have different results? This might be useful for example as a cloud user to ensure certain assumptions can't be violated (locking out necessary ports, etc)

- Cryptographic code: is this piece of C code equivalent to a high level specification? Is it still the same if I optimize it? This is possible by bounding the input of the code and then extracting models from them. You can do this in practice for pretty interesting functions. For example, I showed a reference spec of ChaCha20's RFC was equivalent to some C code I wrote, up-to some bounded stream length. (Unfortunately when SIMD is involved it's trickier to pull this off but that's mostly a technical limitation more than a theoretical one.)

There is a variant of the third approach I've also seen used: an easy way of attacking weak RNGs! For example, the ancient JellyBean version of Android had a weak RNG at one point that allowed Bitcoin wallets to be stolen by cracking the keys. Effectively, this PRNG was seeded with a 64-bit state. But 64-bits isn't that much: two different seeds may possibly collide and give the same resulting random value. So here's how you phrase the problem: instead of equality, you're basically trying to prove the PRNG is non-injective. If R(n) is a function that takes a seed state 'n' and produces a pseudo-random number, then translate R to boolean expression, and ask this question: forall x y, iff x != y, R(x) == R(y). Are there any two values `x` and `y` that are not equal, but do result in the same resulting RNG values? If this equation is satisfiable, then your RNG is weak. In practice, a SAT solver can prove this is the case for the JellyBean RNG in less than 0.1 seconds. And you can also do it the other way, prove injectivity: forall x y, x == y || R(x) != R(y). You prove that for all x,y either x and y are equal, and if they're not, they will never result in the same output. This can also be proved in less than a fraction of a second. [1]

SAT solvers and SMT solvers are useful for a wide variety of practical applications, but they are not magic. You just need a bit of familiarity with them, and you'll probably find a lot of problems can be phrased that way! The secret, of course, is actually learning how to phrase problems in a manner that a SAT solver can actually work with. That's an entirely different problem.

[1] You actually want to prove something slightly different: forall x y, not(x == y || R(x) != R(y)). You want to find two inputs that result in "unsatisfiable" in this case. Negating the result gets you that: if any two x/y are not the same, but R(x) == R(y), then the overall result is 'true', which is a satisfiable case with a counter example.


I think reducing the TSP to a SAT problem is not a very efficient approach. MIP solvers have become extremely sofisticated (and implement many of the techniques of SAT and much more) and offer a much more powerful modeling paradigm. it's hard to imagine a SAT solver outperforming them on such problems.


Heavily used in hardware design and verification.

Equivalence checking, logic synthesis and optimisation, formal verification, etc.


> What would be a concrete problem you could solve with SAT solvers, besides the obvious "satisfy this boolean expression" (which doesn't exactly come up every day)?

Say, you want to implement a package manager, with dependencies.


Pedantry: you can reduce TSP to SAT just because it's in NP. It's the opposite reduction (from SAT to TSP) that you can do because TSP is NP-complete.


Via routing, pin placement, optimizing logic, and probably a couple other hardware problems that are also nondeterministic and not coming to mind right now.


Program verification. E.g. when you want to prove that certain condition/path (e.g. asserts) is unreachable.


The article does not mention it but "symmetry breaking", a technique of exploiting symmetry to prune the search tree, can also be an important component of modern SAT solvers. In some sense one cannot do better than avoiding all symmetries in the given problem however that might come at significant computational price in practice.

Static symmetry breaking is performed as a kind of preprocessing, while dynamic symmetry breaking is interleaved with the search process. The static method has been used more but there are promising attempts to use dynamic symmetry breaking with CDCL solvers. See, for example, cosy (https://github.com/lip6/cosy) a symmetry breaking library which can be used with CDCL solvers.


I’ve researched many moon ago, breaking symmetries dynamically by adding clause conflicts. You take the search space build a graph with it and pass it to a Graph homomorphism detector. Then you create adicional clauses that break the symmetry by stopping possible assignments that limit the search space. The overhead of searching for symmetries and the exposition of dynamic clauses, make it a difficult trade off against the ruthless efficiency of traditional CDCL. Also many competitions used to occur every year to find out the fastest solver, these things have been optimized far beyond you would do for “normal” software.


> The overhead of searching for symmetries and the exposition of dynamic clauses, make it a difficult trade off against the ruthless efficiency of traditional CDCL

That is certainly true. However, once all the other components have been optimized the hell out of them (of course that is a never ending story itself but probably with diminishing returns) the optimization of the symmetry breaking parts will follow. As I mentioned before, efficiently dealing with isomorphism in the search space is the ultimate (theoretical) optimization.

This development can already been observed looking at the history of graph automorphism packages. For a very long time "naughty" was the only significant player than suddenly came "saucy" and "bliss". As far as I know, one of the motivations for the newcomers was to use them for symmetry breaking.


Graph symmetries seem like they should cancel out some how and be detectable by some linear algebra..


Not sure what exactly you mean by that but finding symmetries in a graph that is determining its automorphism group is not a simple problem. It is closely related to the graph isomorphism problem, about which we don't currently know whether it is solvable in polynomial time or it is NP-complete. On the other hand, we know that the Integer Linear Programing problem is NP-complete.


At this year's PyCon, Raymond Hettinger gave a talk about the very subject: https://www.youtube.com/watch?v=_GP9OpZPUYc. I haven't watched it yet, but knowing him, it will definitely be worth doing so.


It is! He ran out of time a bit I think, at least it felt quite rushed at times, but I was left very curious and motivated to try some SAT.


wow, thanks for this, Raymond is a serious python programmer, and old hand, and he gives excellent talks. Had no idea he gave a talk on SAT


I think the number one reason SAT Solvers are underutilized is because even relatively advanced programmers like me don't have a particular clue how to frame their problems for a SAT solver.


Although modern SAT solvers has a ring to it, I wish we would go back to calling them high schoolers.


Yes. Isn't this what has Felicity Huffman is facing a prison sentence?


Can anyone help with what looks like SAT scalability problem? Is there a modern SAT solver that can do it? I was trying out Google SAT and it seems not to be able to scale to a simple school scheduling problem. Here’s the post with the code and test data https://groups.google.com/forum/m/#!topic/or-tools-discuss/O...


In the worst case we can't get around exponential runtime in the problem size. This is problematic since encoding often is not even linear but quadratic or worse. But there are often tricks to get to encoding to nlogn size.

Problems that are "hard" are usually specially crafted (SAT competition).

Getting back to your question: In practice most problems do not show exponential runtime behaviour. Using a good SAT solver is key here since the more naive an implementation is, the more probable it is to run into exponential runtime with practical problems.


Thanks. Is there such a good SAT solver you would recommend to try out?


Is there a standard book or lecture that explains the stuff like conflict clauses and literal watch? I see informal tutorials but something more fully explained, but not overly technical would be useful for people interested in applications. Just something in between blog posts and having to read the original technical papers.


The barrier to entry seems extremely high. Most "example" problems already require arcane domain knowledge.


okay, I'll bite.. what's a SAT, and why does it need to be solved? Am I a buffoon for thinking it's the standardized test?


It means satisfiability problem. You can think of it as one "assembly language" of optimization, in the sense that many higher-order problem formulations can be "compiled" to SAT.



hmm I wonder if the article about SAT solvers defines them? https://codingnest.com/modern-sat-solvers-fast-neat-underuse...


To be fair it took me a bit realize this was part 3 of a series. It would have made more sense to post part 1 to HN.


thanks


Why are people ITT talking about reducing problems to SAT? As I understand it, the point of reducing a problem to SAT is because there's a proof that SAT is NP-Complete which in turn would prove that your problem is NP-Complete. But if you have some sort of industrial constraint satisfaction/ASP/scheduling/search/etc problem, you can solve it with whatever methods are available. You don't have to reduce a problem in such and such domain down to a big Boolean formula and solve it via SAT techniques. Am I wrong?


SAT (and SMT) solvers are extremely optimized and very expressive.

Writing an encoding from problem P to SAT is usually much simpler and easier to do correctly than writing a dedicated solver for P. SAT solvers are so efficient than very often they will be competitive with dedicated ones.

At the end of the day, you want to solve your high-level problem, not writing a solver, especially when such good solvers are already available.


> As I understand it, the point of reducing a problem to SAT is because there's a proof that SAT is NP-Complete which in turn would prove that your problem is NP-Complete.

That's kind of backwards: reducing any problem in NP (not just NP-complete ones) to SAT is possible and mathematically straightforward because SAT is NP-complete. You do reductions from SAT to other problems in order to show that they are NP-complete (which involves cleverness in working out how to represent propositional logic in e.g. Sudoku).


What happened was that SAT solvers (and the related SMT solvers) got very fast, so fast that often the fastest way to solve a constraint satisfaction problem is to express it in a form one of these general solvers takes as input and then just using the general tool.

Of course the problems being solved are NP hard in general, but in practice, on the kind of problems they're actually given, the solvers can be sufficiently fast.


To prove that a problem is NP complete you have to reduce SAT to that problem, not that problem to SAT.

You can solve combinatorial search problems with whatever method you want, but it can be very hard to beat a SAT solver, because you may have to duplicate all the strategies, engineering, and heuristics that SAT solvers have. If your problem has special structure then it may be worth writing your own solver. Constraint satisfaction problems can often be written with high level constraints that have good constraint propagators (e.g. the alldifferent constraint).


Reading through a couple of other responses in this thread, it seems that converting a problem to be compatible with a SAT solver can sometimes result in unexpected gains in speed? That SAT solvers can sometimes be a better solution to a problem than the general "go to" solutions for a domain, despite it seeming to be inapplicable to such a domain? Maybe?


It's not that they seem inapplicable to some domain: they're applicable to a huge number of domains (basically any kind of constraint satisfaction problem). A lot of people thinks that this generality means that solving SAT problems is difficult, but in reality it means that a huge amount of effort has been invested into SAT solvers, which may outweigh the benefits you can get from a specialized solver.


What is the benefit of using a SAT solver directly instead of an SMT solver, other than tweaking for a bit better performance in the SAT solver?


Since I cannot find it mentioned in the OP nor in the comments: if you want to experiment with SAT, the pysat package might be a pretty good start.

https://github.com/pysathq/pysat

The API is not exactly Pythonic (numpy/scipy/opencv style) but it's loaded with features.


"and underused"

Where does the article talk about this?


Stupid question... Are there equivalent (in the sense of being fast and having nice interfaces) solvers for #SAT problems? A whole pile of inference in machine learning can be framed in this way which makes me curious.


In case others aren't familiar with the termiology: #SAT is the counting version of the SAT problem, which tries to find how many different solutions exist to a set of constraints)


How similar is that to answer set programming?



Is meta-learning still useful for SAT solving (e.g SATzilla)?


I thought it was an article on satellite signal for tv




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: