Love this article and the push to build awareness of what modern SAT solvers can do.
It's worth mentioning that there are higher-level abstractions that are far more accessible than SAT. If I were teaching a course on this, I would start with either Answer Set Programming (ASP) or Satisfiability Modulo Theories (SMT). The most widely used solvers for those are clingo [0] and Z3 [1]:
With ASP, you write in a much clearer Prolog-like syntax that does not require nearly as much encoding effort as your typical SAT problem. Z3 is similar -- you can code up problems in a simple Python API, or write them in the SMT-LIB language.
Both of these make it easy to add various types of optimization, constraints, etc. to your problem, and they're much better as modeling languages than straight SAT. Underneath, they have solvers that leverage all the modern CDCL tricks.
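For a taste, here is a minimal sketch using the Python API just mentioned (pip install z3-solver); the constraints themselves are invented for illustration:

    # toy optimization problem via z3's Python API; constraints invented
    from z3 import Int, Optimize, sat

    x, y = Int("x"), Int("y")
    opt = Optimize()
    opt.add(x > 0, y > 0, x + y <= 10)   # plain constraints
    opt.maximize(x + 2 * y)              # an optimization objective
    if opt.check() == sat:
        print(opt.model())               # e.g. x = 1, y = 9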
We wrote up a paper [2] on how to formulate a modern dependency solver in ASP; it's helped tremendously for adding new types of features like options, variants, and complex compiler/arch dependencies to Spack [3]. You could not get good solutions to some of these problems without a capable and expressive solver.
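To give a heavily simplified, invented flavor of such a formulation -- nothing like the real Spack encoding in [2] -- here is a toy dependency model run through clingo's Python API (pip install clingo):

    # toy ASP dependency model; packages and versions are made up
    import clingo

    prog = """
    version(a, 1). version(a, 2). version(b, 1). version(b, 2).
    depends(a, 2, b).   % a@2 needs some version of b
    needed(a).          % the user asked for a
    % install exactly one version of every needed package
    1 { installed(P, V) : version(P, V) } 1 :- needed(P).
    % dependencies of installed packages become needed too
    needed(D) :- installed(P, V), depends(P, V, D).
    #maximize { V@1,P : installed(P, V) }.  % prefer newer versions
    """
    ctl = clingo.Control()
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=lambda m: print(m))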
There are also Constraint Programming solvers (some SAT based, some not) and (Mixed) Integer Programming solvers (not SAT based).
Each "school" excels at different types of problems. ASP for modelling a knowledge-base and running queries on it, CP for discrete optimization problems or for all-solution search, SMT for formal verification and proofs, MIP for optimization of (mostly) continuous variables.
Modern solvers in these "schools" can do things traditionally meant for other "schools". Z3 can do optimization, clingo can include CP-style constraints with clingcon, some MIP solvers can find all solutions, etc.
All in all, this type of "classical" AI is super interesting and I hope the hype on LLMs doesn't suck up all the funding that would go to research in this area.
The work on [2] is fascinating to me, both because of the problem domain and as a case study on the effective application of ASP. I will be reading this paper carefully to pore over the details.
I read Potassco's Answer Set Solving in Practice book [0] but it's pretty dense. I suspect it would be easier to digest if you read it while also following their course materials, which are all online [1].
These days I recommend people start with the Lifschitz book [2] and read through the Potassco book [0]. Lifschitz's book is a much gentler introduction to ASP and logic programming in general and its examples are in ASP code (not math). It's also more geared towards the programming side than the solving side, which is probably better for most people until they really want to understand what clingo/gringo/clasp are doing and what their limitations are.
There are other more applied courses, like Adam Smith's Applied ASP course at UCSC [3]. The problems in that course look like a lot of fun.
I second the recommendation to start with Lifschitz and move on to the Potassco book from there. To add: one does not need to know Prolog to get into ASP; the semantics are distinct and more minimal. That said, I personally struggled with ASP before it clicked: it takes time to grasp the lingo and grok the semantics if you have never worked with anything similar. Best to have a guide that introduces the concepts one at a time ("What do you mean, there's more than one type of negation?!")
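If you want a concrete taste of the two negations before opening either book, here is the classic train-crossing illustration (my paraphrase, runnable via clingo's Python API):

    # default negation ("not" = no evidence) vs. classical negation
    # ("-" = known to be false); toy example, my own wording
    import clingo

    prog = """
    % cross if there is NO EVIDENCE of a train (default negation)
    cross_by_default :- not train.
    % cross only if the train is KNOWN to be absent (classical negation)
    cross_safely :- -train.
    % remove this fact: cross_by_default survives, cross_safely doesn't
    -train.
    """
    ctl = clingo.Control()
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=lambda m: print(m))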
The "Easy ASP" [0] tutorial from Potassco can walk you through some examples, if you'd like.
The playlist is aimed at a general scientific/business audience; the presenter suggests that a lot of natural and business systems can be described in this manner. The presenter also mentions how a Clingo program was used, without modification, to optimize radio frequency band allocation.
Here's a repository [1] of ASP programs in clingo. Under problem classes, I see mostly: game AIs, graph problems, various puzzles, and so on.
> ... modern SAT solvers are fast, neat and criminally underused by the industry.
Modern SAT solvers are fast, neat and criminally undertaught by the universities. Seriously, why isn't this taught at the undergraduate level in CS70 [1] discrete math type courses or CS170 [2] analysis of algorithms type courses?
This is… not true. CS170 specifically teaches about reducing NP problems to SAT (you can find this in the Algorithms textbook linked in the class syllabus). I recall solving one of the projects by using MiniSat after converting a problem to 3-SAT. FWIW, the textbook is excellent and the course was very useful.
I definitely recall doing reductions from SAT in Algorithms courses. I think that is a common part of most curricula.
I don't recall being taught any practical uses of SAT. It was introduced only in the context of Cook's theorem, as the problem you needed to reduce to other problems in order to show NP-completeness.
I think most people now learn SAT in that theoretical context, not as a tool to solve problems.
To clarify, you're specifically talking about reductions to SAT, not from SAT, right?
Note that the former is used as a solution technique for feeding problems into SAT solvers, whereas the latter's goal is basically the exact opposite (to show NP-hardness and hence algorithmic intractability). Formal methods courses do the former, but algorithms courses usually use SAT for the latter.
No. Algorithms courses focus on computability and complexity (including NP-completeness); they don't generally focus on SAT solving. Formal methods are the ones that use SAT solving, SMT solving, etc. to formally prove correctness.
Ah, thank you. We had theory classes with automata and reductions and complexity proofs, and then algorithms classes that covered some solving techniques. I think I mixed up Formal Methods with Theory.
I used to teach formal methods at university, including a course with a lot of SAT examples. We tried to make it as practical as possible, with many examples and exercises in Java (where you just generate your formulas and call a solve method). Thing is, most students seemed to hate it passionately, just like with reductions to SAT in complexity theory.
I wrote a bespoke backtracking solver for a specific class of problems. Would love to use Z3 or something, but frankly, I wouldn't know how to systematically translate problem instances to solver constraints. It's essentially a kind of very complex job-shop scheduling problem with lots of dynamic inter-dependencies. Many of the problems are hard to even solve without deadlocking, while we also naturally strive to minimize overall processing time. Where would I find resources to help me get a grip on my specific problem [1]? Could I reasonably hope that Z3 or another general solver might be faster than a moderately well designed bespoke solver in a compiled language? (My solver combines a bunch of greedy heuristics with a backtracker in case it runs into a deadlock, which is often.)
[1] Rough problem outline: Input goods must be transformed through a complex series of assembly/machining/processing lines into output goods; each line transforms one or two inputs into an (intermediary or end) product; an assembly line produces its product in a fixed number of time units and cannot be stopped or delayed; finished intermediary products arrive at the end of an assembly line, which must be cleared by then; there are a limited number of temporary storage spaces that intermediary products can be moved to/from in a fixed number of time units; some assembly lines must wait for two intermediary products to be completed before starting a job combining them into another intermediary or end product; end products must then be moved to their destinations.
The pdf linked on this site is the biggest collection of SMT examples I know of: https://sat-smt.codes/
I'm no SMT expert, but the way I've done it is to build some representation with Z3's Python API, and then write some function or class that generates those constraints. I was solving MLB eliminations (more complex than it sounds) and I think I used arrays of ints for numbers of wins. So I'd pull MLB data and turn that into schedule objects, which turned themselves into Z3 constraints.
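As a toy version of that pattern (invented standings, nothing like the full elimination model):

    # can team A still catch team B? z3 sketch with made-up numbers
    from z3 import Ints, Solver, sat

    a_extra, b_extra = Ints("a_extra b_extra")
    s = Solver()
    s.add(a_extra >= 0, b_extra >= 0,
          a_extra + b_extra == 10)        # 10 head-to-head games left
    s.add(80 + a_extra >= 85 + b_extra)   # A (80 wins) ends >= B (85)
    print(s.check())                      # sat: A can still catch B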
This type of problem is more the domain of Constraint Programming (a related field). Job-scheduling problems are pretty much the main focus. I would look up MiniZinc (it even comes with a nice IDE) and see if you can grok it.
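MiniZinc is its own modeling language, but if you want a feel for the interval/no-overlap style of constraints these solvers offer without leaving Python, Google's OR-Tools CP-SAT solver has a similar flavor. A toy sketch with invented task data (just the shape of your problem, not the real thing):

    # two tasks on one machine: precedence + no-overlap, minimize makespan
    from ortools.sat.python import cp_model

    model = cp_model.CpModel()
    horizon = 20
    start_a = model.NewIntVar(0, horizon, "start_a")
    end_a = model.NewIntVar(0, horizon, "end_a")
    iv_a = model.NewIntervalVar(start_a, 3, end_a, "a")  # duration 3
    start_b = model.NewIntVar(0, horizon, "start_b")
    end_b = model.NewIntVar(0, horizon, "end_b")
    iv_b = model.NewIntervalVar(start_b, 2, end_b, "b")  # duration 2
    model.Add(start_b >= end_a)        # b consumes a's output
    model.AddNoOverlap([iv_a, iv_b])   # one machine
    model.Minimize(end_b)              # minimize completion time

    solver = cp_model.CpSolver()
    if solver.Solve(model) == cp_model.OPTIMAL:
        print(solver.Value(start_a), solver.Value(start_b))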
- Bridge crossing: Missionaries and cannibals, the 17-minute bridge crossing (I need a computer to solve this anyway)
- Concurrency: finding a bug in a mutual exclusion algorithm (Peterson's algorithm with a bug)
- Graphs: coloring a map, finding a Hamiltonian path
- Sorting: Finding bugs in a sorting network
For many students it was difficult to encode problems in SAT. They seemed to understand the given example encodings, but then found it difficult to vary them. There is a lot of freedom in how one may encode things, and it's hard at first to debug when things don't work the way you're expecting. If there is no solution, one needs to investigate where there are unwanted constraints or errors in the encoding. If there are unwanted solutions, one needs to identify the missing constraints. It was hard to get across how to do this, and it's probably frustrating for beginners.
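To make the encode-and-debug loop concrete, here is about the smallest exercise of the kind listed above -- 2-coloring a triangle, which is unsatisfiable -- via the python-sat package (the variable-numbering scheme is my own):

    # var(v, c) = "vertex v has color c", mapped to a positive integer
    from pysat.solvers import Glucose3

    def var(v, c, n_colors=2):
        return v * n_colors + c + 1

    edges = [(0, 1), (1, 2), (0, 2)]  # a triangle
    s = Glucose3()
    for v in range(3):
        s.add_clause([var(v, 0), var(v, 1)])    # at least one color
        s.add_clause([-var(v, 0), -var(v, 1)])  # at most one color
    for u, w in edges:
        for c in range(2):
            s.add_clause([-var(u, c), -var(w, c)])  # endpoints differ
    print(s.solve())  # False: a triangle is not 2-colorable

Forgetting the "endpoints differ" clauses is exactly the unwanted-solutions failure mode described above.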
Both my alma maters had courses that used these extensively, along with solvers like (Pl|Tr|L)ingeling. We also covered reducibility and SAT in multiple courses at both.
These should also be taught in an advanced PL course, e.g. for liquid types, dependent types, etc.
I seem to recall that a poorly scaling SAT solver in conda-forge broke so badly in 2020 that it shifted the tectonic plates underneath the entire academic Python ecosystem away from conda and towards pip.
That was always a bit of a red herring, from my understanding. Yes, if you model something poorly for an ad hoc SAT solver, expect slowness.
Which gets at the general idea of why these are underused. If you can get your problem into a SAT form or three, then feed it to a state-of-the-art solver, it can work amazingly well. But you will be spending a lot of time moving to and from the SAT formulation.
I don't. That said, I think the problems are typically small enough that you don't gain much by hunting for a good SAT formulation? Python's resolver doesn't do anything that other dependency managers don't. (Does it?)
Fun, I'll have to look at that. The major implication, though, is that yum does the same thing without an explicit SAT formulation. Right?
Edit: I will note that the linked doc is silly old. And it seems that the original proposed replacement for YUM was Hawkeye, and is deprecated? But DNF is still steaming ahead? I didn't find any obvious links talking about performance. Did the SAT idea pan out? I'd almost think it was a bust, with some of these wikis. :(
I'm not sure what kind of evidence that would be. Version selection is NP-complete, so there is no known algorithm that efficiently solves all problem instances.
You can spend time looking really hard at the problem instances you have and identifying common patterns, and write a complex algorithm that works well as long as the dependencies you are trying to solve at least sort-of follow these patterns. This usually works well until it fails completely, at which point you can look really hard for new patterns in new use cases, and make your complex algorithm handle those as well.
But there's also the option of turning your problem into a SAT (or answer set programming, or constraint programming, or integer programming, etc.) formulation, using an existing SAT solver, and not writing any complex algorithm at all.
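A toy of what that buys you, using the python-sat package (packages and version numbers invented):

    # variables: 1 = A@1, 2 = A@2, 3 = B@1, 4 = B@2
    from pysat.solvers import Glucose3

    s = Glucose3()
    s.add_clause([1, 2])        # install some version of A
    s.add_clause([-1, -2])      # but at most one
    s.add_clause([-3, -4])      # at most one version of B
    s.add_clause([-2, 4])       # A@2 requires B@2
    s.add_clause([-1, 3, 4])    # A@1 requires any B
    print(s.solve(), s.get_model())  # e.g. True [-1, 2, -3, 4]

The clauses are the whole "algorithm"; the solver does the clever part.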
From this and Dart, I think one of the lessons here is that SAT solvers are the wrong technique for solving dependencies. SAT solvers find “better” solutions by some metric, but humans using package managers want something which is both simpler and faster.
Not sure if it's related to Schaefer's theorem, but I dove into Answer Set Programming [1] recently, which follows this approach, building on fast conflict-driven solvers of the kind modern SAT uses. Boolean/propositional logic is to predicate logic as SAT is to SMT. There's a very nice course about it from the developers of Potassco, one of the best open-source Answer Set Programming frameworks [2].
The syntax looks like Prolog, but negation is a first-class citizen and the semantics avoid Prolog's infinite loops.
Prolog's approach is like a depth-first search through a search tree -- ASP is like a nondeterministic Turing machine, exploring all branches simultaneously from the bottom up.
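A tiny runnable illustration of that all-branches behavior, enumerating every answer set of a toy 2-coloring program through clingo's Python API (my example, not from the course):

    import clingo

    prog = """
    node(1..3). edge(1,2). edge(2,3).
    1 { color(N, r); color(N, g) } 1 :- node(N).
    :- edge(A, B), color(A, C), color(B, C).
    """
    ctl = clingo.Control(["0"])  # "0" = enumerate ALL answer sets
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=lambda m: print(m))  # prints both 2-colorings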
My gut is that was the point? That Dart uses a SAT solver to no discernible advantage.
I will also note that this amuses me to no end. If you have enough dependencies that you need the speed of a SAT solver.... how many dependencies do you have? And why are they changing so dang much?
Poetry, pip, nix or sending out butterflies to bend cosmic rays to write bits into memory. It really doesn't matter as long as it's not the hellscape that is conda.
Also there had been a growing trend for most popular packages to offer precompiled wheels on PyPI instead of just sdist releases.
This meant that people who had moved to Conda because they couldn't get Pip to install important packages into their environment took another look and found that actually they could get things installed using Pip now.
At the same time, Pip also got a resolver, allowing you to have install-time confidence that you're not installing conflicting packages, and recently (Pip 23.1+) the resolver's backtracking got pretty good.
That said, Conda mostly solved this (and once mamba is the default resolver engine, things will be really fast), Pip is not ever going to be a package manager, Poetry still isn't an environment manager, and most other Python package/installer alternatives to Conda won't do things like install your JupyterLab's Node.js dependency.
After many years I now almost exclusively use Pip to install into an environment, but still nothing beats Conda for bootstrapping the non-Python-package requirements (such as Python itself), nor for getting things working when you are in a weird environment into which you can't install OS dev libraries.
Is Conda actually moving towards making mamba the default? Last I heard, they were distinctly uninterested in that, since mamba is implemented in C++, and they would rather rely on their own slow Python code, which they can more easily modify.
Nice idea! The pattern matching compiler/optimizer in OCaml doesn't do this. It's implemented using this algorithm which I've attempted to understand a few times but is a bit beyond me:
Has there been any effort to formalise the subset of NP that lends itself to SAT resolution (is there something between x^n and n^x)?
For example, what are the defining characteristics of graphs for which the travelling salesman problem is solvable without resorting to brute force?
"In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE."
> As an example, the often-talked-about dependency management problem, is also NP-Complete and thus translates into SAT[2][3], and SAT could be translated into dependency manager.
This reminds me of make[0] and of being made aware that make[0] is a SAT solver. I think it was when I attended a conference. Unfortunately I cannot find an authoritative source to quote, so will rely on, and be grateful to, the HN community to correct me should this be wrong.
FWIW, here's a little console-mode puzzle game of SAT problems, if you want to solve some manually. The "board" is not exactly like the example table in the post, since that one was for Sudoku in particular. This grid represents variables as rows and clauses as columns.
This brings back memories. In the early 2000s, I wrote my undergrad thesis on a survey of SAT solving techniques. I believe the most capable general approach at the time was DPLL (Davis-Putnam-Logemann-Loveland), a backtracking algorithm. My key insight at the time was that if you knew the "shape" of the SAT problem (you had domain-specific insight), you could take some shortcuts with a custom algorithm -- e.g., this clause is always false, so prune it and reduce the search space.
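For readers who never met it, here is a bare-bones sketch of the DPLL idea -- unit propagation plus branching with backtracking. Toy code of mine, not any historical implementation, and missing refinements like pure-literal elimination:

    # clauses are lists of nonzero ints; -n means NOT n
    def simplify(clauses, lit):
        # drop clauses satisfied by lit; strip the now-false -lit
        return [[l for l in c if l != -lit] for c in clauses if lit not in c]

    def dpll(clauses, assignment=()):
        # unit propagation: repeatedly force single-literal clauses
        while True:
            if any(len(c) == 0 for c in clauses):
                return None  # empty clause = conflict
            unit = next((c[0] for c in clauses if len(c) == 1), None)
            if unit is None:
                break
            assignment += (unit,)
            clauses = simplify(clauses, unit)
        if not clauses:
            return assignment  # every clause satisfied
        lit = clauses[0][0]  # naive branching choice
        for choice in (lit, -lit):  # try lit, backtrack to -lit
            result = dpll(simplify(clauses, choice), assignment + (choice,))
            if result is not None:
                return result
        return None

    print(dpll([[1, 2], [-1]]))  # (-1, 2)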
But what do you mean “fast”? If your problem ends up on the steep side of the exponential curve, it’s going to take a while to solve.
I had a lot of fun making my own CDCL solver in Rust, and I've really enjoyed messing with Z3 for some theoretical computer science. In all of my explorations, there was a very tangible problem size beyond which the solve time was unusable.
In the case of Z3, with most real-world problems the typical size is beyond this limit.
Z3 is actually not a particularly good SAT solver; you really want to use a dedicated tool for pure SAT problems. On the other hand, if your problem is in a richer class like QBF or SMT, then Z3 shines, and often you can use encoding tricks to scale problems significantly.
The "AI Winter" was largely caused by people realizing building better logic, chess, or similar analytical engines proves to be a poor model for human like intelligence. The current renaissance is due to Machine Learning / Deep Learning based essentially on statistical models.
In the specific context of language there was a famous debate between Chomsky and Norvig that touches on these themes: http://norvig.com/chomsky.html
I believe events of recent years have not been kind to Chomsky's side of this debate. I'm less bullish than many people here on large language models turning into AGI, but I think if we do develop AGI, it's a certainty it will be based on probabilistic models, not logically consistent formalisms alone.