
How We Built the ARKit Sudoku Solver - anielsen
https://blog.prototypr.io/behind-the-magic-how-we-built-the-arkit-sudoku-solver-e586e5b685b0
======
bnjmn
Since the author spent so much time on the optical character recognition step,
it's worth mentioning that you don't even really need OCR for this task.

You can just find the squares, group the characters in the squares into visual
equivalence classes, assign each class an arbitrary number, solve the puzzle
in terms of those numbers, then fill in each empty square with the (average?)
image of the equivalence class it matches.

This would allow you to solve a Sudoku puzzle with letters or WingDings
instead of numbers, and the output font would naturally match that of the
original puzzle.
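
A minimal sketch of the grouping step described above, assuming each cell has already been cropped and flattened into a list of pixel intensities (`None` for an empty cell). The `Glyph` representation, the `distance` metric, and the `threshold` value are illustrative assumptions, not anything from the article:

```python
from typing import List, Optional, Sequence

# Hypothetical representation: one cell image flattened to pixel intensities in [0, 1].
Glyph = Sequence[float]


def distance(a: Glyph, b: Glyph) -> float:
    """Mean absolute pixel difference between two equally sized glyph images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def label_cells(cells: List[Optional[Glyph]], threshold: float = 0.2) -> List[int]:
    """Map each cell to 0 (empty) or a class id 1..k by greedily grouping similar glyphs."""
    representatives: List[Glyph] = []  # first glyph seen for each class
    labels: List[int] = []
    for cell in cells:
        if cell is None:
            labels.append(0)
            continue
        for class_id, rep in enumerate(representatives, start=1):
            if distance(cell, rep) <= threshold:
                labels.append(class_id)
                break
        else:
            # No existing class is close enough, so this glyph starts a new one.
            representatives.append(cell)
            labels.append(len(representatives))
    return labels
```

The class ids are arbitrary, so the resulting grid of integers can be handed to any ordinary solver, and each solved cell can then be drawn with the representative image of its class.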

~~~
yeldarb
That's an interesting idea! What happens if the source puzzle doesn't have all
of the digits present though?

~~~
ndh2
Take random characters from WingDings.

~~~
bnjmn
This! Still, admittedly, this is a drawback of my strategy.

------
amrrs
An earlier thread about the same app where the dev promised to write this
Medium post:
[https://news.ycombinator.com/item?id=15299822](https://news.ycombinator.com/item?id=15299822)

------
Simon_says
> By the time we launched the app it was trained on over a million images of
> Sudoku squares.

This is super cool, but I can't help but think that something is missing if it
takes hundreds of thousands of examples of digits for a machine learning
algorithm to be able to differentiate them. It wouldn't take a human child
this many. The available machine learning algos are not using nearly the amount
of information available.

~~~
Cthulhu_
You'd think recognising a square (containing squares and more squares if need
be) would be relatively simple and not require advanced machine learning /
training. Or that recognising a square doesn't take as much. The demo also
indicates the sudoku needs to be fairly accurately scanned, similar to a QR
code.

~~~
zimpenfish
I don't think the square recognition used any machine learning.

> We use iOS11’s Vision Library to detect rectangles in the image.

Looking at
[https://github.com/gunapandianraj/iOS11-VisionFrameWork](https://github.com/gunapandianraj/iOS11-VisionFrameWork),
this definitely doesn't touch CoreML.

~~~
yeldarb
It's unclear whether Vision uses machine learning behind the scenes though.
It's kind of implied in their docs that it uses CoreML behind the scenes
(which makes sense with the other things it does like Face recognition and
object tracking).

The nice thing is it detects "projected rectangular regions" so even if the
puzzle isn't aligned with the camera it still works.

I do wish I had more control though; it runs into trouble sometimes and
there's not much I can do other than apply heuristics afterwards to determine
whether I should throw out the sample or continue.

Example of a bad read from Vision Rectangle Detection:
[https://imgur.com/a/RSpTG](https://imgur.com/a/RSpTG)

~~~
zimpenfish
> Example of a bad read from Vision Rectangle Detection:
> [https://imgur.com/a/RSpTG](https://imgur.com/a/RSpTG)

Well, it's technically correct - it did find a rectangle :)

------
zaroth
Really enjoyed reading this. The process is explained really well, including
all the fun rabbit holes and unexpected pitfalls on launch day, and the
technical steps taken to overcome them.

Interesting limitations to work around such as vertical planes vs horizontal,
and focal length.

Not _at all_ surprised they saw better performance with almost immediate
payback by training models on their own $1,200 hardware than running in the
cloud.

Very interesting they trained their own character recognition model and not
only that but built their _own_ custom crowd-sourced image labeling system
complete with accuracy checks and review screens.

Overall, fantastic write-up!

~~~
yeldarb
"They" haha :P

I was surprised to read that IKEA had 70 employees working on their ARKit app!
([https://twitter.com/DanielZarick/status/917472837295837185](https://twitter.com/DanielZarick/status/917472837295837185))

This whole thing (including the backend tools) took me about the equivalent of
1 month of full-time work (I was doing it mostly nights & weekends though
since our games are what pay the bills).

I brought in one of my (excellent) designers from Hatchlings a couple days
before launch to make the cool grid "scanning" animation and to do our
branding and logo.

~~~
Someone
Only 70? I can see lots of work creating accurate (in size and in colors) 3D
models of every item in their catalog that look good, and maybe even more
discussing with management whether the current model accurately portrays the
product.

I don’t know whether the functionality is present (last time I checked, the
app wasn’t available in ‘my’ App Store), but integrating the app with their
inventory system(s) and translating it also can’t be free.

~~~
kemayo
They have some in-store kiosks for building mocked up rooms already, which
would plausibly have some usable assets there.

------
sgt101
The machine vision part of the ARKit project is definitely "wow"!

I had fun writing a Sudoku solver in Julia.

[https://github.com/sgt101/simons-silly-sudoko-in-julia](https://github.com/sgt101/simons-silly-sudoko-in-julia)

------
DannyBee
So, 6 years ago, Google Goggles could solve sudoku puzzles for you from
pictures.

It's interesting to compare how far the interface/speed has come in 6 years if
you watch the videos of how it was done then:
[http://googlemobile.blogspot.com/2011/01/google-goggles-gets...](http://googlemobile.blogspot.com/2011/01/google-goggles-gets-faster-smarter-and.html)

(Goggles did not do the image handling on device)

------
langitbiru
I think, in a year or two, someone will build a crossword puzzle solver using
AR, ML, and computer vision. Granted, it is more difficult because we need to
recognize letters, and solving crossword puzzles is much harder than
solving sudoku. At least the crossword puzzle solver could give word
recommendations if it cannot solve the puzzle completely.

~~~
dfan
Crosswords are pretty tough, partially because many puzzles (such as the
Tuesday, Thursday, and Sunday puzzles in the New York Times) have enough
wordplay in their answers (not just the clues) that they break normal
crossword rules, in a specific way that the solver has to determine. I think
Dr. Fill ([https://arxiv.org/abs/1401.4597](https://arxiv.org/abs/1401.4597))
is still the state of the art.

------
rmorey
I found this very interesting, regarding crowdsourcing the training data:

"After the first pass I had enough verified data that I was able to add an
automatic accuracy checker into both tools for future data runs (it would
periodically show the user known images and check their work to determine how
much to trust their answers going forward)."
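
A rough sketch of how such a checker might keep score, assuming labels are plain strings and that trust is a smoothed accuracy on the known images. The `LabelerStats` class and its prior are hypothetical; the article does not say how trust was actually computed:

```python
from dataclasses import dataclass


@dataclass
class LabelerStats:
    """Running tally of one labeler's answers on images whose label is already verified."""

    checks: int = 0
    correct: int = 0

    def record(self, given: str, verified: str) -> None:
        """Record the labeler's answer for one known image."""
        self.checks += 1
        if given == verified:
            self.correct += 1

    def trust(self, prior_checks: int = 2, prior_correct: int = 1) -> float:
        """Smoothed accuracy; the prior keeps a brand-new labeler at 0.5 until checks accumulate."""
        return (self.correct + prior_correct) / (self.checks + prior_checks)
```

A score like this could be used to weight a labeler's votes or to decide how often to show them another known image.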

------
lingz
Seems like the challenge of applying AR is more in smart design than advanced
ML.

~~~
yeldarb
Good observation! Be sure to check out part 1 where I talked about the design
decisions behind the app:
[https://blog.prototypr.io/why-we-built-magic-sudoku-the-arki...](https://blog.prototypr.io/why-we-built-magic-sudoku-the-arkit-sudoku-solver-306dde6c0a77)

------
gtm1260
Does anyone know how the detection of sudokus on vertical planes can be
achieved? Great article, especially on the crowdsourcing and machine vision
fronts, but the author's explanation of this aspect left a lot to be desired.

~~~
yeldarb
Sorry, I was pretty hand-wavy with that because it was basically just trial
and error until it worked sufficiently well.

The data I had available to mess with was the difference in width of the top
of the puzzle and the bottom (with some trig you can determine its angle
relative to the camera) and the projection matrix of the camera relative to
the scene origin.

It's not perfect but it works better than having nothing at all.
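
One possible reading of "some trig", under a simple pinhole-camera model: an edge's apparent width is inversely proportional to its depth, so the ratio of top to bottom width constrains the tilt. This is an illustrative assumption, not the app's actual code; it further assumes a square puzzle centred on the optical axis and a known ratio of camera distance to puzzle side (`distance_over_side`, a hypothetical parameter):

```python
import math


def tilt_from_edge_widths(top_width_px: float, bottom_width_px: float,
                          distance_over_side: float) -> float:
    """Estimate the tilt (radians) of a square puzzle about its horizontal mid-line.

    Pinhole model: the top edge sits at depth d + (S/2)*sin(theta) and the bottom
    edge at d - (S/2)*sin(theta), so with r = top_width / bottom_width:
        sin(theta) = 2 * (d / S) * (1 - r) / (1 + r)
    A positive result means the top edge leans away from the camera.
    """
    r = top_width_px / bottom_width_px
    s = 2.0 * distance_over_side * (1.0 - r) / (1.0 + r)
    # Clamp against measurement noise pushing the value outside asin's domain.
    return math.asin(max(-1.0, min(1.0, s)))
```

Equal widths give a tilt of zero (the puzzle faces the camera), and the estimate becomes more sensitive to pixel noise as the puzzle gets farther away relative to its size.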

------
elsurudo
Definitely cool, and I like the application of ARKit. But using ML to solve a
sudoku seems like overkill. I remember writing a constraint-based solver as
the first assignment for an undergrad-level AI course back in uni. Surely this
implementation is less efficient? Someone let me know if I am wrong.

If the aim was to simply learn new tech, though, then I get it. I am just wary
of ML being a hammer used on anything even remotely resembling a nail.

~~~
deafcalculus
The author says ML wasn't used for solving Sudoku. Vision was only used for
transducing the image of a sudoku puzzle to a puzzle structure in memory.

~~~
elsurudo
Oops, must have missed it. Thanks for the clarification.

~~~
dingo_bat
It says they used a "traditional recursive algorithm", probably referring to
the backtracking solution. In my experience it's fast enough to not matter for
this sort of application (the other things that are going on are 1000x more
complex).
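
For reference, a plain recursive backtracking solver looks roughly like the sketch below. This is the textbook version, written on the assumption that "traditional recursive algorithm" means something similar; it is not the app's code:

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell


def find_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in row-major order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def is_allowed(grid: Grid, r: int, c: int, digit: int) -> bool:
    """Check that digit does not already appear in the cell's row, column, or 3x3 box."""
    if any(grid[r][j] == digit for j in range(9)):
        return False
    if any(grid[i][c] == digit for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != digit
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return True if a solution was found."""
    cell = find_empty(grid)
    if cell is None:
        return True  # no empty cells left, so the grid is solved
    r, c = cell
    for digit in range(1, 10):
        if is_allowed(grid, r, c, digit):
            grid[r][c] = digit
            if solve(grid):
                return True
            grid[r][c] = 0  # undo the guess and try the next digit
    return False
```

Picking the next empty cell in fixed row-major order keeps the code short; choosing the cell with the fewest legal digits first is a common refinement that prunes the search much harder.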

~~~
eutectic
There is a small fraction of puzzles for which simple backtracking gets stuck on
unproductive branches and essentially never finishes.

~~~
yeldarb
Do you have any examples? I'd love to improve the algorithm.

Someone on /r/programming pointed me here:
[http://apollon.issp.u-tokyo.ac.jp/~watanabe/sample/sudoku/in...](http://apollon.issp.u-tokyo.ac.jp/~watanabe/sample/sudoku/index.html)

But the app seems to already handle those OK without doing anything special:
[https://www.dropbox.com/s/arfd03kr8ieczk5/recursive-solver-k...](https://www.dropbox.com/s/arfd03kr8ieczk5/recursive-solver-killer.mov?dl=0)

------
gitgud
Did they really acquire 600,000 images for training data by scanning books by
hand? How long did that take?

~~~
yeldarb
A couple of hours. I made a tool to do it automatically. Flip page, hold up
phone, move it around to a few different angles, flip page, repeat.

I should note that’s 600k small squares so each full puzzle scan yields 81
small images.
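
As a quick back-of-the-envelope check of what that implies, using only the two figures quoted in this exchange (600k squares, 81 squares per puzzle):

```python
SQUARES_PER_PUZZLE = 9 * 9   # 81 cells in a standard grid
total_squares = 600_000      # figure quoted in the question above

# Number of full-puzzle scans needed to produce that many small squares.
puzzle_scans = total_squares / SQUARES_PER_PUZZLE
print(f"about {puzzle_scans:,.0f} full-puzzle scans")
```

So the 600k figure corresponds to roughly 7,400 full-puzzle captures rather than 600k separate photos.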

