Send+more=money and how to use forward-checking in search (kats.coffee)
24 points by SoylentOrange on July 21, 2021 | 31 comments



The general method presented here is Constraint Programming (CP), a sibling of SAT and ILP. In CP, you formulate constraints over variables. Search is then performed branch-and-bound style: a variable is assigned, then constraints check whether the assignment is legal and remove non-solutions from the other variables' domains.

If anyone is interested: an accessible Python module for this is python-constraint. Gecode is a powerful, modern C++ CP framework.
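
For a taste, here is a rough sketch of SEND+MORE=MONEY in python-constraint (my own code, not from the article; assumes pip install python-constraint):

    from constraint import Problem, AllDifferentConstraint

    problem = Problem()
    problem.addVariables("sendmory", range(10))                   # one variable per distinct letter
    problem.addConstraint(AllDifferentConstraint())               # every letter gets a different digit
    problem.addConstraint(lambda s, m: s != 0 and m != 0, "sm")   # no leading zeroes

    def equation(s, e, n, d, m, o, r, y):
        send = 1000*s + 100*e + 10*n + d
        more = 1000*m + 100*o + 10*r + e
        money = 10000*m + 1000*o + 100*n + 10*e + y
        return send + more == money

    problem.addConstraint(equation, "sendmory")

    for solution in problem.getSolutions():
        print(solution)                                           # the classic 9567 + 1085 = 10652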


Here is the send more money puzzle in Gecode: https://github.com/Gecode/gecode/blob/develop/examples/money...


Here in a 30-year-old Common Lisp library: https://unwindprotect.com/constraint-programming


And this exists because all of these problems are NP-complete, so "guess-and-check" is basically the state of the art right now. CP, ILP, and 3SAT can all be converted into each other, so if you can find a faster algorithm than guess-and-check, you'll probably improve the state of the art.


All of CP, SAT, and ILP are much more advanced than guess and check, although there are never any guarantees.


IMO, "guess and check" describes WalkSAT and Backtracking (including DPLL) pretty effectively.

There's a lot of nuance in exactly how you're guessing-and-checking. But... it's a pretty good idea. In the case of backtracking search with lookahead, the "guess" is your initial label / variable assignment. When it doesn't work out, maybe you perform some kind of "learning" over the guess (why didn't it work? Can it be written into a new constraint?), and then you guess again (aka: backtracking search).

When you're out of heuristics (variable-ordering heuristics, domain-ordering heuristics, etc.), your best improvement is to arbitrarily choose another variable + assignment and see if it works out. When it doesn't work, you backtrack a bit and guess again with the new information you gathered.
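
To make that concrete, here's a toy sketch of plain chronological backtracking (my own code: no learning, no lookahead, just "guess and check"):

    def backtrack(variables, domains, constraints, assignment=None):
        """constraints: list of (scope, predicate); predicate takes the partial assignment dict."""
        assignment = {} if assignment is None else assignment
        if len(assignment) == len(variables):
            return assignment                            # every variable assigned: solution found
        var = next(v for v in variables if v not in assignment)
        for value in domains[var]:
            assignment[var] = value                      # the "guess"
            # the "check": only constraints whose scope is fully assigned can be tested
            if all(pred(assignment) for scope, pred in constraints
                   if scope <= assignment.keys()):
                result = backtrack(variables, domains, constraints, assignment)
                if result is not None:
                    return result
            del assignment[var]                          # dead end: backtrack and guess again
        return None

    # e.g. two variables that must differ:
    # backtrack(["x", "y"], {"x": [0, 1], "y": [0, 1]},
    #           [({"x", "y"}, lambda a: a["x"] != a["y"])])   # -> {"x": 0, "y": 1}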


You are missing the crucial role of propagation - looking at the current state and making inferences based on that. Admittedly, DPLL is pretty weak on that with “just” unit propagation, but it is still _very_ important.


If you told someone to "guess and check" a Sudoku puzzle, they would innately "learn" about the puzzle from their guesses.

The question with "guess-and-check" is: how much time should I spend making a good guess? If you spend a lot of time making inferences, you're doing "strong guess and check".

If you spend almost no time making inferences, you're doing "weak guess and check".

--------

People who play Sudoku puzzles know that "weak guess and check" solves easier puzzles faster, but "strong guess and check" solves harder puzzles faster. And worst of all, there's no way to really know if you have a hard or easy puzzle ahead of time.

So I argue that "guess and check" is a good layman's term for this whole affair. What you're calling "inference" and "unit propagation" are just the "strength of the guess"... the amount of time you spend thinking about your guess before dedicating yourself to it.


Guess-and-check is usually another term for generate-and-test, which explicitly does not do propagation.
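
To illustrate: for SEND+MORE=MONEY, pure generate-and-test is literally this (my own sketch), enumerating complete assignments first and checking the single equation afterwards, with no pruning anywhere:

    from itertools import permutations

    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:
            continue                                    # no leading zeroes
        send = 1000*s + 100*e + 10*n + d
        more = 1000*m + 100*o + 10*r + e
        money = 10000*m + 1000*o + 100*n + 10*e + y
        if send + more == money:
            print(send, "+", more, "=", money)          # 9567 + 1085 = 10652

    # dropping the leading-zero check yields the 25 solutions mentioned elsewhere in the thread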


Not all ILP instances can be converted to SAT: unbounded integers cannot be encoded in a single SAT problem without infinite formulas.


You missed another sibling: Pseudo-Boolean programming.


Fun! It's beside the point of the article, but am I misinterpreting "There are 2 valid solutions" or is it at least 2 valid solutions? The first two solutions I came up with weren't the same as the author's but passed the checker (e.g. 7534+0825=08359)


Normally, solutions that start with zeroes are prohibited in alphametic puzzles.


Yes, but one of his two valid suggestions also starts with zeroes (specifically, 2817+0368=03185).



I am OP and there are actually 25 solutions, with only one of them having all leading digits as non-zero.

I made a mistake in the article, where I meant to write "there are at least 2 solutions and here is one example". I will fix this. Nice catch.


Aha! I only found a few with my naive approach, but always good to check that I haven't lost all my marbles yet :-) Good post, look forward to more.


I didn’t check his algorithm but if it only found one solution with a leading zero it is incorrect.

3821+0468=04289 as well.


How hard would it be to write a program that can create such puzzles, given a set of few thousand valid English (or some other language) words?


It's difficult to understand programs that do what you want, but relatively easy to write them (!!!!).

I recommend reading Knuth's The Art of Computer Programming: Volume 4 Fascicle 5. Knuth covers the creation of word puzzles such as crossword puzzles (embedding words into a 2-dimensional grid so that they fit).

Knuth also covers the "Create a comma-free code" problem to a high amount of detail, which probably is most related to your question of "creating puzzles of a certain type".

TAOCP Fascicle 5 is pretty short but a very difficult read because of the material itself. Still, it really does cover what you're asking for.


It’s not terribly hard given a decent enough constraint programming environment, I think, although I haven’t tried implementing the construction problem yet.

Given that one is typically limited to at most 10 different letters (since each letter must have a unique corresponding digit), the number of possible word combinations is reasonably limited as well.


Are you only thinking of the equation a + b = c? Maybe in this case it is limited, but if we try anything like a + b + c = d, then it might be quite a large list to check, no?

According to the website below, there are 5,454 four letter words, 12,478 five letter words and 22,157 six letter words (assuming one stops at six letter words).

https://www.bestwordlist.com/6letterwords.htm

If we allow well known names (sports stars, movie stars, presidents etc) then this list might become very large.


Sure, there are lots but I think it should be doable. Will probably try sometime. Easiest is to make a model that fixes the lengths of the words, and just run it for different word lengths.

As a comparison, see this model solving crosswords using constraint programming. https://github.com/Gecode/gecode/blob/develop/examples/cross...
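
If you just want the shape of a brute-force generator (my own sketch, not a CP model; for a real word list of thousands of words you would fix the word lengths and prune much more aggressively, as described above):

    from itertools import permutations

    def count_solutions(a, b, c):
        """Number of digit assignments (no leading zeroes) satisfying a + b = c."""
        letters = sorted(set(a + b + c))
        if len(letters) > 10:
            return 0
        leading = {a[0], b[0], c[0]}                      # letters that may not map to zero
        count = 0
        for digits in permutations(range(10), len(letters)):
            env = dict(zip(letters, digits))
            if any(env[ch] == 0 for ch in leading):
                continue
            def value(word):
                return int("".join(str(env[ch]) for ch in word))
            if value(a) + value(b) == value(c):
                count += 1
        return count

    def generate(words):
        """Yield (a, b, c) word triples whose alphametic a + b = c has a unique solution."""
        for a in words:
            for b in words:
                for c in words:
                    target_len = max(len(a), len(b))
                    if len(c) in (target_len, target_len + 1) and count_solutions(a, b, c) == 1:
                        yield a, b, c

    # usage: for puzzle in generate(word_list): print(puzzle)   (slow, but shows the idea)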


FC is one of the "cheapest" checks.

"Maintaining Arc Consistency" is actually asymptotically equivalent (in the same O(n^2) complexity class as FC), but textbooks only list AC3 typically (which is suboptimal but pretty easy to understand. I'd compare it to "insertion sort"). AC4 is difficult to program and understand (especially difficult to extend beyond binary constraints), but worst-case asymptotically optimal.

Today there's AC2001, a simple extension to AC3, which has best-case behavior similar to AC3 but is worst-case asymptotically optimal. It's also "obvious" how to extend it into GAC (aka: the GAC2001 algorithm) to handle non-binary constraints.

This is an NP-complete problem, so there's really no way to know whether FC, MAC (with AC2001), or even path consistency (a consistency level above MAC) is optimal. I guess the rule of thumb is that if there are "lots of solutions" out there, then you want to spend less time checking, and maybe FC is better.

If there are "very very few solutions", then MAC (and even Path-consistency) may be superior. A stronger check leads to cutting more of the search tree away, but requires a bigger-and-bigger preprocessor at every step.

The textbooks I've read seem to default to FC. But given the discovery of AC2001, I think the modern default probably should be MAC using AC2001.
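
For reference, AC3 itself is tiny. A rough sketch (my own code, with binary constraints given as predicates; AC2001 differs mainly by remembering the last support found for each value):

    from collections import deque

    def revise(domains, constraints, x, y):
        """Drop values of x that have no support in y; return True if anything was dropped."""
        removed = False
        for vx in list(domains[x]):
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].remove(vx)
                removed = True
        return removed

    def ac3(domains, constraints):
        """domains: var -> set of values; constraints: (x, y) -> predicate(vx, vy), one entry per directed arc."""
        queue = deque(constraints)                       # start with every arc
        while queue:
            x, y = queue.popleft()
            if revise(domains, constraints, x, y):
                if not domains[x]:
                    return False                         # domain wipe-out: inconsistent
                # x's domain shrank, so arcs pointing at x must be rechecked
                queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
        return True

    # e.g. ac3({"a": {1, 2, 3}, "b": {3}},
    #          {("a", "b"): lambda va, vb: va < vb,
    #           ("b", "a"): lambda vb, va: va < vb})     # prunes a's domain to {1, 2}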

----------------

Or, in Sudoku terms: how long do you want to spend searching the tree, vs. how long do you want to spend proving that the current search isn't "hopeless"?

The longer you ensure that your "current choice" is "not hopeless", the longer your best-case search takes. The less time you spend on checking for "hopelessness", the longer the worst-case search takes (because you'll run into more hopeless situations but still spend tons of time searching through them).

FC, MAC (aka 2-consistency), path consistency (aka 3-consistency), and 4-consistency, 5-consistency, 6-consistency... you can spend arbitrarily much time on checking for hopelessness. If you have 81 variables (like the 81 cells of a Sudoku grid), checking for 81-consistency is equivalent to solving the puzzle in every possible way (if multiple solutions are possible, 81-consistency would find them all!!!). And 82-consistency cannot exist if you only have 81 variables, so 81-consistency is the limit on Sudoku.

Your "Big O" is something like (O(d^k)), where d is the size of domains (9 in Sudoku), and k is the consistency level you're aiming for. So O(9^2-ish) is what you're looking at for FC and MAC. O(9^99) is what you're looking at for 99-consistency. So aiming for higher levels of consistency scales poorly to say the least... and textbooks only briefly mention Path (3-consistency) and k-consistency in passing.

In practice, if you're in fact trying to "find all solutions", the brute-force 81-variable == 81-consistency approach is suboptimal. You might instead find, say, a 55-consistency relationship that's backtrack-free (exponentially faster than 81-consistency), but this involves solving some rather nasty graph problems. It's absolutely worth the tradeoff, but there be dragons here. (Finding the optimal ordering of variables means solving for the minimum induced width of an induced graph of the variables, which is an NP-complete problem.)

--------

There's also DAC (Directional Arc Consistency), which is "more consistent" than FC but "less consistent" than full arc consistency (aka MAC / Maintaining Arc Consistency). A naive implementation of DAC will be asymptotically optimal.

Needless to say: researchers have spent a lot of time looking at various "consistency" benchmarks for these backtracking problems.


In general, modern constraint programming solvers run the propagators for the constraints until fixpoint. The level of propagation for a constraint is typically chosen by the modeler, and is typically one of domain consistency, bounds consistency, or simple value propagation. No competitive real-world CP solver uses the types of algorithms you mention for the whole problem, since constraints are given in intension (a+b<=c, all_different(x1, …, xn), …), not in extension (a list of valid tuples).

With that said, sometimes shaving (also called singleton arc consistency) is used, by testing all literal assignments and pruning those that directly give failure. See https://www.inf.tu-dresden.de/content/institutes/ki/cl/study... for an application of this to Sudoku
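
To make "run the propagators until fixpoint" concrete, a toy sketch (my own naming, not any real solver's API):

    def propagate_to_fixpoint(domains, propagators):
        """propagators: list of (watched_vars, prune_fn); prune_fn(domains) returns the set of pruned vars."""
        queue = list(range(len(propagators)))            # schedule every propagator once
        while queue:
            i = queue.pop()
            changed = propagators[i][1](domains)         # run the propagator, see what it pruned
            if any(not domains[v] for v in changed):
                return False                             # a domain emptied: this node fails
            # reschedule every propagator that watches a variable whose domain just changed
            for j, (watched, _) in enumerate(propagators):
                if j != i and watched & changed and j not in queue:
                    queue.append(j)
        return True                                      # fixpoint: no propagator has anything left to prune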


Isn't the "benchmark" for typical propagators arc consistency?

I'm not too familiar with many kinds of constraints. But the constraints I am familiar with, such as the "Global Cardinality Constraint" (which is a generalization of all_different), reach arc-consistency after a basic maximum-flow algorithm is applied to them.

Indeed: proving that the maximum-flow algorithm over the global cardinality constraint (and hence all_different) reaches arc-consistency proves that you can reach arc-consistency over the whole problem by iterating the propagators to a fixed point.

-------

In effect, we quickly learn to reach arc-consistency using a different algorithm (ex: maximum flow) over the "all_different(x1, …, xn)".

You're right in that "arc-consistency classic" is not being done here. But the effect is the same. Maximum-flow(all_different(x1, x2... xn)) will lead to arc-consistency with the all_different constraint.

Now that you're "locally arc-consistent", you iterate over "a+b<=c" and make that constraint arc-consistent. If the domains are restricted again, you run the maximum-flow algorithm over all_different(x1, x2... xn) and achieve arc-consistency again. (Bonus points: the "last time" you ran maximum_flow is still relevant and serves to accelerate the 2nd run of the maximum_flow problem. That residual graph is still correct!! You just need to slightly update it given the new restrictions on the domains)

-------

In case you're curious: the fact that the all_different constraint becomes arc-consistent via maximum flow is pretty fun to think about. All variables are on one side of a bipartite graph, while all candidate values are on the other side. Connect each variable to the values in its domain.

Connect a "super-source" to all variables, and "super-sink" to all values. Super-sink has a value of "1" for each connection to each variable (aka: each variable can only be assigned once).

Solve for maximum flow from the super-source to the super-sink. Once you have one particular maximum flow (a possible assignment), you can determine all other consistent assignments by exploring how the flow can be rerouted while remaining maximum.
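
To make the construction concrete, a quick sketch using networkx (my choice of library, not anything from the thread):

    import networkx as nx

    domains = {"x1": {1, 2}, "x2": {2, 3}, "x3": {1, 3}}        # toy all_different instance

    G = nx.DiGraph()
    for var, dom in domains.items():
        G.add_edge("source", var, capacity=1)                    # each variable assigned once
        for val in dom:
            G.add_edge(var, ("val", val), capacity=1)            # variable -> candidate value
            G.add_edge(("val", val), "sink", capacity=1)         # each value used at most once

    flow_value, flow = nx.maximum_flow(G, "source", "sink")
    print(flow_value == len(domains))                            # True iff all_different is satisfiable
    print({var: val for var in domains
           for (_, val), f in flow[var].items() if f == 1})      # one concrete assignment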


For many constraints there are propagators that can reach domain consistency (all remaining values for all variables are part of some solution to the constraint, sometimes called generalized arc consistency). For some constraints the best propagator in practice is a much weaker one, as propagating to domain consistency is in itself NP-hard. Many times there is no actual formal consistency level definable. A great example of this is the circuit constraint, which holds if a set of variables forms a Hamiltonian path in a graph. Finding a Hamiltonian path is a classic NP-complete problem. The propagators for it make smart inferences, but not at a formal consistency level.

Crucially, propagators are for a single constraint only. For a constraint problem, you run all the propagators to a fix point. There is no single global “algorithm” here, and there are no higher level consistencies.


But "Arc Consistency" doesn't solve a problem. It only removes "obviously wrong" assigments from a variable's domain.

Applying arc-consistency to the circuit constraint is NOT about solving Hamiltonian paths. It's about finding all "obviously wrong" assignments that can be discovered by looking at 2 variables at a time.

Because all arc-consistency algorithms can be brute-forced by looking at all combinations of pairs of variables, we have at worst O(n^2) pairs of variables to consider to achieve arc-consistency.

Each variable only has a finite number of possible values (ignoring ILP with unbounded integers: assuming classical constraint programming), which means you can brute-force all possibilities in O(n^2 * d^2), where d is the cardinality of the domains and n is the number of variables.

This means that reaching arc-consistency over the circuit / Hamiltonian path constraint is O(n^2 * d^2) worst case per propagation step (another O(n) multiplier is added for reaching arc-consistency over all variables), possibly with a specialized algorithm that goes faster than that brute-force approach.


Arc consistency is not really a good term to use, since it was originally defined for 2-ary constraints only, but is sometimes used for n-ary constraints. That is why I use the term domain consistency consistently :-)

Given a propagator for circuit that reaches domain consistency (note, for the whole n-ary constraint), that is equivalent to a checker for whether there is a Hamiltonian path, and that will most likely never be a polynomial algorithm (unless P=NP). I can’t say that I’m interested in any particular property of the decomposition of the constraint into 2-ary parts; it is not that interesting a structure. The cool filterings that circuit propagators do are based on global analysis of the whole graph, not on any local property, for example by finding an ordering of the strongly connected components in the current graph.

Almost all propagators need to be reasonably fast to be usable; that is sort of a given in this area. An exponential propagator is almost always a bad idea. For all_different there is an O(n^2.5) algorithm for domain consistency, which is really cool and important. In practice, the simple value propagation (remove assigned values from the other variables) is often better for finding solutions quickly.
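
As an aside, that simple value propagation for all_different is only a few lines; a toy sketch (my own code, not any solver's implementation):

    def all_different_value_propagation(domains):
        """Remove every assigned value (singleton domain) from all other domains, to a fixpoint."""
        changed = True
        while changed:
            changed = False
            for var, dom in domains.items():
                if len(dom) == 1:
                    (val,) = dom
                    for other, other_dom in domains.items():
                        if other != var and val in other_dom:
                            other_dom.discard(val)
                            changed = True
        return all(domains.values())                 # False if some domain was wiped out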


I see your point, and my mistake in using "Arc Consistency" when that's mostly a theoretical (i.e. binary-constraints-only) term.

Extending the concept of arc-consistency to general n-ary constraints leads to two different generalizations. I took one generalization, you seem to have taken the other (because the generalization I took was uninteresting for the problems you are talking about).


Yeah, it’s a suboptimal terminology, at least in my opinion.

I have a strong bias for the practical side of constraint programming systems, as may have come through.



