If anyone is interested: an accessible Python module for this is python-constraint. Gecode is a powerful, modern C++ CP framework.
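For flavor, here is the kind of model these libraries automate, reduced to a zero-dependency sketch: brute-forcing the classic SEND + MORE = MONEY cryptarithm (my example, not from the article) with nothing but itertools. A CP library would express the same model declaratively with an all_different constraint plus one arithmetic constraint, and prune far more aggressively than this exhaustive loop.

```python
from itertools import permutations

def solve_send_more_money():
    # assign 8 distinct digits to S, E, N, D, M, O, R, Y by brute force
    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:          # no leading zeros
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            return send, more, money
    return None
```

This puzzle famously has a unique solution (9567 + 1085 = 10652), so the brute force finds exactly one answer despite checking ~1.8 million permutations.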
There's a lot of nuance in exactly how you're guessing-and-checking. But... it's a pretty good idea. In the case of backtracking search with lookahead, the "guess" is your initial label / variable assignment. When it doesn't work out, maybe you perform some kind of "learning" over the guess (why didn't it work? Can it be written into a new constraint?), and then you guess again (aka: backtracking search).
When you're out of heuristics (variable-ordering heuristics, domain-ordering heuristics, etc. etc.), your best move is to arbitrarily choose another variable + assignment and see if it works out. When it doesn't work, you backtrack a bit and guess again with the new information you gathered.
The question with "guess-and-check" is: how much time should I spend making a good guess? If you spend a lot of time making inferences, you're doing "strong guess and check".
If you spend almost no time making inferences, you're doing "weak guess and check".
People who play Sudoku puzzles know that "weak guess and check" solves easier puzzles faster, but "strong guess and check" solves harder puzzles faster. And worst of all, there's no way to really know if you have a hard or easy puzzle ahead of time.
So I argue that "guess and check" is a good layman's term for this whole affair. What you're calling "inference" and "unit propagation" are just the "strength of the guess"... the amount of time you spend thinking about your guess before dedicating yourself to it.
I made a mistake in the article, where I meant to write "there are at least 2 solutions and here is one example". I will fix this. Nice catch.
3821+0468=04289 as well.
I recommend reading Knuth's The Art of Computer Programming: Volume 4 Fascicle 5. Knuth covers the creation of word puzzles such as crossword puzzles (embedding words into a two-dimensional grid so that they all fit).
Knuth also covers the "create a comma-free code" problem in great detail, which is probably most related to your question of "creating puzzles of a certain type".
TAOCP Fascicle 5 is pretty short but a very difficult read, owing to the density of the material. But it really does cover what you're asking for.
Given that one is typically limited to at most 10 different letters (since each letter must have a unique corresponding digit), the number of possible word combinations is reasonably limited as well.
According to the website below, there are 5,454 four-letter words, 12,478 five-letter words and 22,157 six-letter words (assuming one stops at six-letter words).
If we allow well known names (sports stars, movie stars, presidents etc) then this list might become very large.
As a comparison, see this model solving crosswords using constraint programming. https://github.com/Gecode/gecode/blob/develop/examples/cross...
"Maintaining Arc Consistency" is actually asymptotically equivalent (in the same O(n^2) complexity class as FC), but textbooks only list AC3 typically (which is suboptimal but pretty easy to understand. I'd compare it to "insertion sort"). AC4 is difficult to program and understand (especially difficult to extend beyond binary constraints), but worst-case asymptotically optimal.
Today: there's AC2001, a simple extension to AC3, which has best-case behavior similar to AC3 but is worst-case asymptotically optimal. It's also "obvious" how to extend it into GAC (aka: the GAC2001 algorithm) to handle non-binary constraints.
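For reference, here's a minimal sketch of plain AC3 (the "insertion sort" of consistency algorithms mentioned above). The representation is my own choice: domains as sets, and each directed arc mapped to a predicate over (x-value, y-value).

```python
from collections import deque

def ac3(domains, constraints):
    # domains: dict var -> set of values (mutated in place)
    # constraints: dict arc (x, y) -> predicate(x_value, y_value);
    # both directions of each binary constraint must be present
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # "revise": drop x-values with no supporting y-value
        removed = {vx for vx in domains[x]
                   if not any(pred(vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False  # domain wipe-out: inconsistent
            # re-examine every arc pointing at x
            queue.extend(arc for arc in constraints
                         if arc[1] == x and arc[0] != y)
    return True
```

On the constraint x < y with both domains {1, 2, 3}, this prunes x down to {1, 2} and y down to {2, 3}. AC2001's improvement (not shown) is remembering, per (variable, value, arc), the last support found, so revisions don't rescan from scratch.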
This is an NP-complete problem, so there's really no way to know if FC, MAC (with AC2001), or even Path-consistency (a consistency level above MAC) is optimal. I guess the rule of thumb is that if there's "lots of solutions" out there, then you want to spend less time checking: and maybe FC is better.
If there are "very very few solutions", then MAC (and even Path-consistency) may be superior. A stronger check leads to cutting more of the search tree away, but requires a bigger-and-bigger preprocessor at every step.
The textbooks I've read seem to default to FC. But given the discovery of AC2001, I think the modern default probably should be MAC using AC2001.
Or in the "Sudoku terms", how long do you want to spend searching the tree, vs how long do you want to spend proving that the current search isn't "hopeless" ??
The longer you spend ensuring that your "current choice" is "not hopeless", the longer your best-case search takes. The less time you spend checking for "hopelessness", the longer the worst-case search takes (because you'll run into more hopeless situations but still spend tons of time searching through them).
FC, MAC (aka 2-consistency), Path Consistency (aka 3-consistency), and 4-consistency, 5-consistency, 6-consistency... you can arbitrarily spend as much time as you like on checking for hopelessness. If you have 81 variables (like the 81 cells in Sudoku), checking for 81-consistency is equivalent to solving the puzzle in every possible way (if multiple solutions are possible, 81-consistency would find them all!!!). As such, 82-consistency cannot exist if you only have 81 variables, so 81-consistency is the limit on Sudoku.
Your "Big O" is something like O(d^k), where d is the size of the domains (9 in Sudoku) and k is the consistency level you're aiming for. So O(9^2)-ish is what you're looking at for FC and MAC, and O(9^81) for 81-consistency. Aiming for higher levels of consistency scales poorly, to say the least... and textbooks only briefly mention Path (3-consistency) and k-consistency in passing.
In practice, if you're in fact trying to "find all solutions", the brute-force 81-variable == 81-consistency approach is suboptimal. You might instead find, say, a 55-consistency relationship that's backtrack-free (exponentially faster than 81-consistency), but this involves solving some rather nasty graph problems. It's absolutely worth the tradeoff, but there be dragons here. (Finding the optimal ordering of variables means solving for the minimum induced width of the variables' induced graph, which is an NP-complete problem.)
There's also DAC: Directional Arc Consistency. Which is "more consistent" than FC but "less consistent" than full Arc Consistency (aka: MAC/Maintaining Arc Consistency). A naive implementation of DAC will be asymptotically optimal.
Needless to say: researchers have spent a lot of time looking at various "consistency" benchmarks for these backtracking problems.
With that said, sometimes shaving (also called singleton arc consistency) is used, by testing all literal assignments and pruning those that directly give failure. See https://www.inf.tu-dresden.de/content/institutes/ki/cl/study... for an application of this to Sudoku
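Here's a rough sketch of shaving (singleton arc consistency), assuming some inner propagation routine as the consistency check. The inner routine used here is a naive all_different value propagation; both function names and the domains-as-dict-of-sets representation are my own choices.

```python
import copy

def value_propagate(domains):
    # naive all_different propagation: remove each assigned value
    # from every other variable's domain, until nothing changes
    changed = True
    while changed:
        changed = False
        for var, dom in domains.items():
            if len(dom) == 1:
                val = next(iter(dom))
                for other, odom in domains.items():
                    if other != var and val in odom:
                        odom.discard(val)
                        if not odom:
                            return False  # wipe-out
                        changed = True
    return True

def shave(domains, propagate=value_propagate):
    # singleton test: tentatively assign each value; if the inner
    # propagation detects failure, prune that value for real
    changed = True
    while changed:
        changed = False
        for var in domains:
            for val in list(domains[var]):
                trial = copy.deepcopy(domains)
                trial[var] = {val}
                if not propagate(trial):
                    domains[var].discard(val)
                    changed = True
                    if not domains[var]:
                        return False
    return True
```

A small illustration of why shaving is strictly stronger than its inner check: with all_different over a = {1, 2}, b = {1, 2}, c = {1, 2, 3}, plain value propagation does nothing (no variable is assigned), but shaving discovers that c = 1 and c = 2 each lead to a wipe-out, leaving c = {3}.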
I'm not too familiar with many kinds of constraints. But the constraints I am familiar with, such as the "Global Cardinality Constraint" (a generalization of all_different), reach arc-consistency after a basic maximum-flow algorithm is applied to them.
Indeed: proving that the maximum-flow algorithm over the global-cardinality constraint (and thus all_different) reaches arc-consistency proves that you can reach arc-consistency over the whole problem set by iterating to the fixed point.
In effect, we quickly learn to reach arc-consistency using a different algorithm (ex: maximum flow) over the "all_different(x1, …, xn)".
You're right in that "arc-consistency classic" is not being done here. But the effect is the same. Maximum-flow(all_different(x1, x2... xn)) will lead to arc-consistency with the all_different constraint.
Now that you're "locally arc-consistent", you iterate over "a+b<=c" and make that constraint arc-consistent. If the domains are restricted again, you run the maximum-flow algorithm over all_different(x1, x2... xn) and achieve arc-consistency again. (Bonus points: the "last time" you ran maximum_flow is still relevant and serves to accelerate the 2nd run of the maximum_flow problem. That residual graph is still correct!! You just need to slightly update it given the new restrictions on the domains)
In case you're curious: the all_different constraint reaching arc-consistency via maximum flow is pretty fun to think about. All variables are on one side of a bipartite graph, and all values are on the other side. Connect each variable to the values in its domain.
Connect a "super-source" to all variables, and all values to a "super-sink". Give every edge a capacity of 1: each variable can be assigned only once, and (for all_different) each value can be used at most once.
Solve for maximum flow from the super-source to the super-sink. Now that you have one particular maximum flow (a possible assignment), you can determine every other supported assignment by exploring the alternate maximum flows (alternating paths and cycles in the residual graph).
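A toy version of that bipartite view, as a sketch: since all capacities are 1, maximum flow here is just bipartite matching, computed below with Kuhn's augmenting-path algorithm. Instead of reusing the residual graph as described above, this naive version re-matches once per (variable, value) edge to test support, which is much simpler but slower than the real filtering algorithm. All names are mine.

```python
def max_matching(domains, forced=None):
    # Kuhn's augmenting-path algorithm on the variable/value
    # bipartite graph; `forced` optionally pins one (var, val) edge
    match = {}  # value -> variable currently using it

    def dom(var):
        if forced and forced[0] == var:
            return {forced[1]}
        return domains[var]

    def augment(var, seen):
        for val in dom(var):
            if val in seen:
                continue
            seen.add(val)
            if val not in match or augment(match[val], seen):
                match[val] = var
                return True
        return False

    return sum(augment(v, set()) for v in domains)

def prune_all_different(domains):
    # keep only (var, val) pairs that appear in SOME maximum matching
    n = len(domains)
    if max_matching(domains) < n:
        return False  # all_different is unsatisfiable
    for var in domains:
        domains[var] = {val for val in domains[var]
                        if max_matching(domains, forced=(var, val)) == n}
    return True
```

With a = {1, 2}, b = {1, 2}, c = {1, 2, 3}, no matching of size 3 can give c the value 1 or 2 (a and b exhaust them), so pruning leaves c = {3}: exactly the arc-consistency that plain pairwise != propagation misses.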
Crucially, propagators are for a single constraint only. For a constraint problem, you run all the propagators to a fixed point. There is no single global "algorithm" here, and there are no higher-level consistencies.
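That fixed-point loop can be sketched in a few lines. The propagator interface here (mutate the domains, return False on wipe-out) is just one possible convention, and `less_than` is a toy bounds propagator I made up for the demo:

```python
def propagate_to_fixpoint(domains, propagators):
    # run every propagator until a full pass changes no domain;
    # each propagator mutates `domains` and returns False on wipe-out
    while True:
        sizes = {v: len(d) for v, d in domains.items()}
        for prop in propagators:
            if not prop(domains):
                return False
        if {v: len(d) for v, d in domains.items()} == sizes:
            return True  # fixed point reached

def less_than(x, y):
    # toy bounds propagator for the single constraint x < y
    def prop(domains):
        if not domains[x] or not domains[y]:
            return False
        domains[x] = {v for v in domains[x] if v < max(domains[y])}
        if not domains[x]:
            return False
        domains[y] = {v for v in domains[y] if v > min(domains[x])}
        return bool(domains[y])
    return prop
```

With x < y and y < z over three {1, 2, 3} domains, neither propagator alone finishes the job, but iterating them to the fixed point pins x = 1, y = 2, z = 3.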
The arc-consistency of the circuit constraint is NOT about solving Hamiltonian paths. It's about finding all "obviously wrong" assignments that can be discovered by looking at 2 variables at a time.
Because all arc-consistency algorithms can be brute-forced by looking at all combinations of pairs of variables, there are at worst O(n^2) pairs of variables to consider to achieve arc-consistency.
Each variable has only a finite number of possible assignments (ignoring ILP with unbounded integers: assuming classical finite-domain constraint programming). Which means that you can brute-force all possibilities in O(n^2 * d^2), where d is the cardinality of the domains and n is the number of variables.
This means that reaching arc-consistency over the circuit / Hamiltonian-path constraint is O(n^2 * d^2) worst case per propagation step (another O(n) multiplier is added for re-running until arc consistency holds over all variables), with possibly a specialized algorithm that goes faster than that brute-force style.
Given a propagator for circuit that reaches domain consistency (note: for the whole n-ary constraint), that is equivalent to a checker for whether there is a Hamiltonian path, and that will most likely never be a polynomial algorithm (unless P=NP). I can't say that I'm interested in any particular property of the decomposition of the constraint into 2-ary parts; it is not that interesting a structure. The cool filterings that circuit propagators do are based on global analysis of the whole graph, not on any local property. For example, by finding an ordering of the strongly connected components in the current graph.
Almost all propagators need to be reasonably fast to be usable; that is sort of a given in this area. An exponential propagator is almost always a bad idea. For all_different there is an O(n^2.5) algorithm for domain consistency, which is really cool and important. In practice, the simple value propagation (remove assigned values from the other variables) is often better for finding solutions quickly.
Extending the concept of arc-consistency to general n-ary constraints leads to two different generalizations. I took one; you seem to have taken the other (because the generalization I took was uninteresting for the problems you are talking about).
I have a strong bias for the practical side of constraint programming systems, as may have come through.