
Can Neural Networks Crack Sudoku? - kyubyong
https://github.com/Kyubyong/sudoku
======
timdellinger
To me, this answers the question "How well can neural networks crack sudoku if
we forget all of our domain knowledge, and just try to brute force our way
through with insufficient training data and a neural network that's poorly
sized?".

I suggest an alternate approach that would use a training set consisting of
real-world data of sudoku puzzles that are in the process of being solved:
before a box is filled in, and then after it's filled in. This would teach the
network to solve the puzzle one step at a time, instead of all at once.

If you insist on only using the as-received puzzle and the solution for
training, my intuition is that you'll need more than 10 layers in the NN, and
much, much more data.

A taxonomy of what complex solving strategies need to be learned already
exists: see e.g.
[http://www.sudokuwiki.org/y_wing_strategy](http://www.sudokuwiki.org/y_wing_strategy)
and the categories "basic", "tough", "diabolical", and "extreme". My guess is
that the NN (using the original author's strategy, but with more layers and
lots more data) will eventually do well on "basic" and "tough", but will start
to miss on some of the latter conditions depending on the training set and how
the network is set up.

~~~
theptip
To me, it's an interesting question to ask how well NNs can perform without
hand-holding, i.e. can they figure out all of the complex strategies that
humans have figured out, starting from a blank slate?

Cf. AlphaGo, which invented new strategies that proved interesting to human
players.

Of course, if you're training your NN to solve a real-world problem, you
wouldn't choose to train it in this way, you would do as you suggested and
feed it as much relevant training data as possible, to give it a head-start.

(Your points RE: sizing and quantity of training data sound reasonable and are
outside my expertise, so nothing to add there).

~~~
PaulHoule
There is a very simple strategy that solves all Sudoku puzzles.

Pick an open square, try one of the numbers still possible for it, and
backtrack if you get stuck.

For better results, pick the open square with the fewest possible choices.

Human solvers might try something more efficient, but the above strategy is
not at all bad for computer implementation.
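
For illustration, a minimal sketch of that strategy in Python (not the OP's
code; it assumes a 9x9 grid of ints with 0 marking empty cells):

    def candidates(grid, r, c):
        # Digits not yet used in the row, column, or 3x3 box of (r, c).
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    def solve(grid):
        # Fill grid in place; return True if a solution was found.
        empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not empties:
            return True
        # The refinement above: pick the open square with the fewest choices.
        r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
        for d in candidates(grid, r, c):
            grid[r][c] = d
            if solve(grid):
                return True
            grid[r][c] = 0  # dead end: undo and backtrack
        return False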

It's very much a solved problem with existing technology, see

[https://en.wikipedia.org/wiki/Satisfiability_modulo_theories](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories)

Adding a neural net to it doesn't help in any way.

~~~
andrewflnr
Yeah, yeah, sudoku is easy for computers. That's not the point. The point is
learning about neural networks. So yeah, "adding a neural net" does help with
that.

~~~
PaulHoule
In which case the hard part is finding a suitable problem; you might be better
off learning about SMT solvers because you don't have Mark Cuban telling
everybody to do it.

~~~
andrewflnr
We don't really know if it's suitable until we try it.

------
kortex
So much missing-the-point in this thread. The point isn't "can we solve sudoku
with a completely overwrought solution", it's "can NN be applied to X class
of problems with no human-added domain specific knowledge". And that I think
is quite cool.

~~~
ska

       "can NN be applied to X class of problems with no human-added domain specific knowledge"
    

The universal approximation theorem suggests that for a significant class of
problems, the answer to this is an obvious "yes" [1]. The problem that remains
is how effective the learning is.

I'm not convinced that applying them to areas like this one, where there are
clearly much better approaches, teaches us anything useful for more
interesting cases, but maybe it does.

[1] NB: it's not immediately clear that the OP's problem is in that class.

~~~
jmull
> teaches us anything useful...

Maybe the useful thing here is that Sudoku is a problem space a lot of people
understand pretty well even though it's not trivial. That makes it a nice
domain for thinking about and understanding NNs. I'd guess that people who are
familiar with the problem space of solving Sudoku and similar puzzles, but who
don't know much about NNs, will find this pretty interesting, while people who
already understand NNs pretty well (or aren't familiar with solving puzzles
like Sudoku) won't.

~~~
ska
I think your guess is reasonable. It's the reason I wasn't very prescriptive:
i.e. NN is not a very sensible approach if the goal is a great Sudoku solver,
but the exercise might have other value I just can't immediately see.

~~~
bmh100
Implications around strategy learning, reinforcement, and learning logical
systems.

~~~
ska
The point is: would you be better off on these with a more applicable problem
domain? Do the things you learn transfer well there?

It's not obvious.

------
majewsky
Evidence that DNNs are somewhere between phase 2 and 3 of
[https://en.wikipedia.org/wiki/Hype_cycle](https://en.wikipedia.org/wiki/Hype_cycle)

~~~
dahart
I'm sometimes cynical about NNs too, but even if I were to take the hype cycle
theory at face value, I have to admit it's possible that NNs are reaching
maturity ("plateau of productivity") now, given that there were large hype
cycles over them in the 1960s and 1980s, and given that they are starting to
really work and are being deployed in large scale applications. But, this hype
cycle theory isn't scientific, so all supporting evidence is anecdotal and
subjective, right?

------
wiredfool
Sudoku is trivially solvable by propagating constraints for puzzles that are
not at the hard/ultra-hard level.

For cases where that's not possible, iterative approaches can be very
effective -- with not particularly optimized Python, the worst solution time I
ever saw was 1 second on a first-gen Eee PC netbook. That one needed about 6
levels of iterative backtracking before it came up with the solution.
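
For reference, a minimal sketch of such a constraint-propagation pass in
Python (not wiredfool's code; it only handles the simplest rule, filling cells
that have exactly one legal candidate, and assumes a 9x9 grid with 0 for
empty cells):

    def propagate(grid):
        # Repeatedly fill any empty cell that has exactly one legal candidate.
        # Returns True if the grid ends up completely filled.
        progress = True
        while progress:
            progress = False
            for r in range(9):
                for c in range(9):
                    if grid[r][c]:
                        continue
                    used = set(grid[r]) | {grid[i][c] for i in range(9)}
                    br, bc = 3 * (r // 3), 3 * (c // 3)
                    used |= {grid[i][j] for i in range(br, br + 3)
                                        for j in range(bc, bc + 3)}
                    options = [d for d in range(1, 10) if d not in used]
                    if len(options) == 1:
                        grid[r][c] = options[0]
                        progress = True
        return all(all(row) for row in grid)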

~~~
jcoffland
I wrote a Sudoku solver in Python while waiting in the airport on a long
layover that solves even the most difficult puzzles in a few seconds on a
now-old laptop. It requires some backtracking, but it's really not that hard.

 _Edit_ : I decided to publish my implementation on GitHub.
[https://github.com/jcoffland/fsudoku](https://github.com/jcoffland/fsudoku)

~~~
wiredfool
Yeah, I was seeing if I could do the complete solution faster than a single
solution. But I was at the in-laws' dinner table, not the airport.

This is it, uploaded a few years later in a repo for Sudoku over SMS:

[https://github.com/wiredfool/sms-doku/blob/master/so.py](https://github.com/wiredfool/sms-doku/blob/master/so.py)

------
brian_herman
You can solve Sudoku with a SAT solver. You don't need neural networks; this
was an assignment in CS 251 at UIC.
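
For the curious, a minimal sketch of the usual CNF encoding -- one boolean
variable per (row, column, digit) triple, 729 in total -- here written against
the python-sat (pysat) package rather than whatever the course assignment
actually used:

    from itertools import combinations
    from pysat.solvers import Glucose3

    def var(r, c, d):
        # Variable ids 1..729: true iff cell (r, c) holds digit d.
        return 81 * r + 9 * c + d

    def solve_sat(grid):
        # grid: 9x9 list of ints, 0 for empty. Returns a solved grid or None.
        cnf = []
        for r in range(9):
            for c in range(9):
                cnf.append([var(r, c, d) for d in range(1, 10)])   # >= 1 digit per cell
                cnf += [[-var(r, c, a), -var(r, c, b)]             # <= 1 digit per cell
                        for a, b in combinations(range(1, 10), 2)]
                if grid[r][c]:
                    cnf.append([var(r, c, grid[r][c])])            # given hints
        units = [[(r, c) for c in range(9)] for r in range(9)]     # rows
        units += [[(r, c) for r in range(9)] for c in range(9)]    # columns
        units += [[(br + i, bc + j) for i in range(3) for j in range(3)]
                  for br in (0, 3, 6) for bc in (0, 3, 6)]          # 3x3 boxes
        for cells in units:
            for d in range(1, 10):
                cnf += [[-var(r1, c1, d), -var(r2, c2, d)]          # d at most once per unit
                        for (r1, c1), (r2, c2) in combinations(cells, 2)]
        with Glucose3(bootstrap_with=cnf) as solver:
            if not solver.solve():
                return None
            model = set(solver.get_model())
            return [[next(d for d in range(1, 10) if var(r, c, d) in model)
                     for c in range(9)] for r in range(9)]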

~~~
likelynew
There are no easy way to solve SAT as the size of sudoku. Modern day SAT
solvers are very large and contains years of experience, containing 10s of
heuristics and complex structures. If we can simplify using neural network, I
think it's a great step.

~~~
contravariant
Well, it is really more of an exact cover problem, which can be solved quite
simply and elegantly with Knuth's Algorithm X.

A neural network isn't simpler in any sense of the word. You might as well
throw a simulated annealer at the problem.

~~~
likelynew
I am not saying it is not. My condition is in what if neural network can solve
a problem without any domain knowledge. I think it's a huge win even if we
don't understand anything about the solution. Be it well solved problems like
approximate minimum distance in a graph to practically unsolvable like
automatically proving unproved theorems in mathematics.

~~~
likelynew
I take it this way. How many problems can you solve using knowing mostly exact
cover. Not very many. But, if we can build ML systems that can solve mostly
any problem(we can't today) even without giving any insight, I would say it
will be one of the things whose impact will surpass anything. Note, I am not
getting into AGI, just saying a system that can solve objective problems. And
while sudoku is not a very good example, but I think it shows we can do many
cool things from neural networks. Neural networks are the most sure bet for
such a system.

~~~
taeric
You'd be surprised how far exact cover can get you. The trick isn't in knowing
exact cover, per se, but in seeing how to map one problem onto another.
That is, the "exact cover" nature of Sudoku is not immediately obvious to
everyone. At least, it wasn't obvious to me. Seeing how quickly you can map it
to that and then get a solution was a lot of fun and ridiculously educational.
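
For anyone curious about that mapping, a minimal sketch: each candidate
placement (row, column, digit) becomes one exact-cover row that covers exactly
four of the 324 constraint columns, and any Algorithm X / dancing-links
implementation can then run on the result (the given hints are handled by
pre-selecting their rows):

    def sudoku_exact_cover_rows():
        # Yield (choice, columns) pairs: 729 choices, each covering 4 of 324 constraints.
        for r in range(9):
            for c in range(9):
                for d in range(1, 10):
                    box = 3 * (r // 3) + c // 3
                    yield (r, c, d), [
                        ("cell", r, c),   # cell (r, c) gets exactly one digit
                        ("row", r, d),    # row r contains digit d exactly once
                        ("col", c, d),    # column c contains digit d exactly once
                        ("box", box, d),  # 3x3 box contains digit d exactly once
                    ]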

------
the-dude
Obligatory reference to the (former?) Prime Minister of Singapore:
[https://arstechnica.com/information-technology/2015/05/prime...](https://arstechnica.com/information-technology/2015/05/prime-minister-of-singapore-shares-his-c-code-for-sudoku-solver/)

------
willvarfar
Every time sudoku comes up on HN I think back to when Ron Jeffries tried to
solve it using Test Driven Development ;)

[http://ravimohan.blogspot.se/2007/04/learning-from-sudoku-so...](http://ravimohan.blogspot.se/2007/04/learning-from-sudoku-solvers.html)
- enjoy :D

~~~
c3534l
I was going to say those really don't seem comparable, but after reading
Jeffries', that really was awful.

------
sarabande
I find it interesting that when the NN fails to complete the whole puzzle, it
seems to fail spectacularly (github.com/Kyubyong/sudoku#results) -- that is,
there aren't a lot of good-partial attempts (90% or above). Does anyone know
why this might be the case? Does the first wrong placement of a number in a
gap essentially ruin the rest of the guesses?

~~~
suddensleep
My intuition says yes; if a wrong number is placed in a square and then taken
as ground truth for solving the rest of the puzzle, it certainly seems like
the error would propagate.

As a concrete example, say you misplace a '2' somewhere within a given puzzle.
Obviously, this cell is incorrect. But depending on the nature of what the NN
has learned, it may believe the row (resp. column, 3x3 box) already has the
'2' in it, and so try to fill its correct spot with another number, which of
course then leads the column and/or 3x3 box of that cell to take on an
incorrect value, starting the process over again.

This same phenomenon can be seen in the game KenKen; depending on the
strategies you use at any given point in the game, one mistake can propagate
outward pretty quickly and spoil large sections of the puzzle.

------
benp84
As a combinatorial optimization problem, integer programming is the best tool
for solving Sudoku. Fast, accurate, and guarantees the optimality of the
solution. I'm impressed at how well this neural network did anyway though!
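
For concreteness, a minimal sketch of such a formulation, here using the PuLP
library with binary variables x[r][c][d] (one standard way to write it, not
necessarily the commenter's):

    from pulp import LpProblem, LpVariable, LpBinary, lpSum

    def solve_ip(grid):
        # grid: 9x9 list of ints, 0 for empty. Pure feasibility problem, no objective.
        prob = LpProblem("sudoku")
        x = LpVariable.dicts("x", (range(9), range(9), range(1, 10)), cat=LpBinary)
        for r in range(9):
            for c in range(9):
                prob += lpSum(x[r][c][d] for d in range(1, 10)) == 1  # one digit per cell
                if grid[r][c]:
                    prob += x[r][c][grid[r][c]] == 1                  # respect the hints
        for d in range(1, 10):
            for i in range(9):
                prob += lpSum(x[i][c][d] for c in range(9)) == 1      # once per row
                prob += lpSum(x[r][i][d] for r in range(9)) == 1      # once per column
            for br in (0, 3, 6):
                for bc in (0, 3, 6):
                    prob += lpSum(x[br + i][bc + j][d]
                                  for i in range(3) for j in range(3)) == 1  # once per box
        prob.solve()
        return [[next(d for d in range(1, 10) if x[r][c][d].value() > 0.5)
                 for c in range(9)] for r in range(9)]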

------
nathell
> 1M games were generated using generate_sudoku.py for training. I've uploaded
> them on the Kaggle dataset storage.

No need to do that. It would suffice to just include the random seed in the
script so that its results are reproducible.
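
Roughly, something like this at the top of the script (illustrative only --
the actual generate_sudoku.py may draw randomness from other sources too):

    import random
    import numpy as np

    SEED = 0  # hypothetical value; publish whatever seed was actually used
    random.seed(SEED)
    np.random.seed(SEED)
    # ... generate the 1M games as before; the same seed now reproduces them.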

~~~
jononor
One should do both. Scripts and their dependencies have bugs, potentially
leading to a different sequence of games. At the very least one should specify
a couple of the games, so one can check against them.

------
trextrex
That looks interesting. Sudoku is a fairly useful constraint satisfaction
problem/testbed for neural network architectures, since the most general
version of Sudoku is NP-complete.

There was some interesting work a while ago that solved sudoku using spiking
neural networks too [1] (Fig. 5) with some nice associated theory.

[1]
[http://journals.plos.org/ploscompbiol/article?id=10.1371/jou...](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003311)

~~~
bhaavan
I thought Sudoku could be solved in constant time. It would be O(9! * 9!) or
something, which is just O(1), isn't it?

~~~
dahart
A constant time solution in years or eons isn't very helpful. There are around
6.7e21 complete sudoku boards if you want to brute force enumerate them.

[https://en.m.wikipedia.org/wiki/Mathematics_of_Sudoku#Enumer...](https://en.m.wikipedia.org/wiki/Mathematics_of_Sudoku#Enumerating_all_possible_Sudoku_solutions)

~~~
tom_mellior
> if you want to brute force enumerate them

But you don't. Even what you might call "brute force" Sudoku solvers start
from the given hints in the puzzle, which reduce the search space
considerably. Given a set of Sudoku hints with a unique solution, you won't
need "years or eons" with even a very simple solver.

~~~
dahart
Yes, of course you don't, I think you misunderstood my comment. The comment I
replied to was asking about true brute force enumeration of all boards -- not
a solver that uses inferences based on the hints -- he guessed that you only
need 9! * 9! total attempts which implies enumerating all boards and picking
the one that matches the initial hints. This is in fact a constant time
solution like he guessed, but it is also intractable.

A normal "very simple" solver of the kind you're talking about will solve the
solution very quickly, in seconds or less, and it will also not be a constant
time algorithm.

~~~
mamon
9!*9! is just a little over 131 billion boards. Checking correctness (no
number is repeated in any row or column) is as simple as counting a sum in all
boxes, rows, and columns.

Generating all possible boards is a little more complex, but I can hardly see
how it could take "eons" - with a proper implementation (Apache Spark :P) and
reasonably powerful hardware (1000 CPU core cluster, ha! ha! ha!) it should
run in under 1 day and take just a little over 13 TB of disk/RAM space :).

~~~
dahart
9! * 9! is not the number of boards, that is a bad estimate. If you don't
account for symmetries, rotations, re-numberings and other things, the number
of boards is 6.7e21. Even if you could check a full board in a nanosecond
(which you can't) enumerating that number would take more than 200,000 cpu-
years.
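
A quick back-of-the-envelope check of that figure:

    boards = 6.7e21                 # distinct completed Sudoku grids
    checks_per_second = 1e9         # the (generous) one-board-per-nanosecond assumption
    seconds_per_year = 365.25 * 24 * 3600
    print(boards / checks_per_second / seconds_per_year)   # ~212,000 cpu-years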

The paper linked to in the OP's article says: "Due to the sheer number of
sudoku solution grids a brute force search would have been infeasible, but we
found a better approach to make this project possible. Our software for
exhaustively searching through a completed sudoku grid, named checker, was
originally released in 2006. However, this first version was rather slow.
Indeed, the paper [1] estimates that our original checker of late 2006 would
take over 300,000 processor-years in order to search every sudoku grid."

[https://arxiv.org/pdf/1201.0749.pdf](https://arxiv.org/pdf/1201.0749.pdf)

The estimate in the comment (9!*9!) seems to be implying a simple enumeration,
not a complex strategy of symmetry-folding. But even if you do reduce the
enumeration, the authors of that paper say their software requires 800 CPU
years. I'm not making any claims about whether getting that down to a day
might be possible, but I wish you good luck. By all means, show everyone how
to do it with a proper implementation and a large cluster! ;)

------
zeckalpha
Sure, but so can decision trees.

------
sr2
It's a fascinating subject. Worth reading this entry for more details:

[https://en.wikipedia.org/wiki/Sudoku_solving_algorithms](https://en.wikipedia.org/wiki/Sudoku_solving_algorithms)

And apparently quantum computers can solve them using sheer brute force alone,
but these articles don't go into enough detail (i.e., can they solve the
'evil' puzzles?):

[https://www.engadget.com/2007/02/14/worlds-first-commercial-...](https://www.engadget.com/2007/02/14/worlds-first-commercial-quantum-computer-solves-sudoku/)

[https://www.scientificamerican.com/article/first-commercial-...](https://www.scientificamerican.com/article/first-commercial-quantum-computer/)

------
xor0110
Coming up next: CNNs applied to training CNNs

~~~
dimatura
That actually happened a long time ago.

------
tbonza
Peter Norvig cracked Sudoku using a clever algorithm:
[http://norvig.com/sudoku.html](http://norvig.com/sudoku.html)

------
andreyk
Using 10 convolutional layers seems questionable - the precise value and
location of each number are essential for getting the right answer, so it
seems like the model design is poorly suited to the task. The obvious approach
to me would be to train a neural net as a heuristic to order exploration in a
constraint satisfaction solver, mirroring how AlphaGo combines Monte Carlo
tree search with neural nets. That might net a nice speed-up.
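
A minimal sketch of that idea -- an ordinary backtracking solver where a
trained model decides which digit to try first. `score_digits` here is a
hypothetical placeholder for the learned heuristic, not a real API from the
repo:

    def legal_digits(grid, r, c):
        # Digits not yet used in the row, column, or 3x3 box of (r, c).
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    def solve_guided(grid, score_digits):
        # score_digits(grid, r, c) -> {digit: score} from the trained net.
        # The search stays complete; the net only orders the exploration.
        empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not empties:
            return True
        r, c = empties[0]
        scores = score_digits(grid, r, c)
        for d in sorted(legal_digits(grid, r, c), key=lambda d: -scores.get(d, 0.0)):
            grid[r][c] = d
            if solve_guided(grid, score_digits):
                return True
            grid[r][c] = 0
        return False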

------
tinyrick2
A more interesting problem would be to train on n x n Sudoku and test the
model on m x m Sudoku where m < n, or even m > n. Does that make sense?

edit: formatting

------
jcoffland
Here is my collection of standard sized Sudoku test puzzles:
[https://github.com/jcoffland/fsudoku/tree/master/puzzles](https://github.com/jcoffland/fsudoku/tree/master/puzzles)

Some of these are the hardest possible puzzles, with a solution, for the size.

------
bluetwo
Funny, I taught my AI engine to solve easy, medium, and hard NYT sudoku last
week.

It needed only 150 tries to solve the hard puzzle.

------
faragon
I would like to know whether it can be proven that an NN for problems
involving permutations can be as efficient as a SAT solver, less efficient, or
more efficient (e.g., whether using training just as an acceleration, to avoid
learning futile permutations, saves training time, or whether the effect is
not noticeable).

------
dkarapetyan
This is interesting. My intuition says NNs will be terrible at combinatorial
optimization types of problems. Then again Go is a combinatorial type of game
and AlphaGo beat everyone.

Maybe an NN for Sudoku should use an AlphaGo-type architecture.

------
shriphani
Indeed. A simple solution is to add a differentiable optimization layer (see
Amos et al.):
[https://arxiv.org/abs/1703.00443](https://arxiv.org/abs/1703.00443)

------
DrNuke
One underlying question is: who pays for diffuse crowdsourcing in 2017? I have
indirect knowledge of at least a dozen small enterprises suspected of raiding
GitHub repos daily and refactoring them for commercial purposes.

------
picrin
I'd say that using neural networks to solve Sudoku-like puzzles is a bad idea
-- a SAT solver, constraint program, or integer program would all be quicker
to run and quicker to implement.

------
cabalamat
I can't help thinking that a GOFAI approach would be a better way to solve
Sudoku than an NN.

------
santaclaus
Interesting -- there must be a large literature on neural nets and NP-
completeness, no?

------
1001101
If the wet ones can, I don't see why the ones of an arbitrary size made out of
bits can't. Not a very good argument, but there's this: recurrent neural
networks are Turing complete.

~~~
sp332
Humans don't learn how to play Sudoku by staring at a bunch of solved puzzles.
We are given the rules up front.

------
gm-conspiracy
Nail, meet hammer.

~~~
jcoffland
Or rather, vaguely nail shaped object, meet hammer.

~~~
webninja
When you have a hammer, everything starts looking like a nail.

