
Real time numbers recognition (MNIST) on an iPhone with CoreML - uberneo
https://www.liip.ch/en/blog/numbers-recognition-mnist-on-an-iphone-with-coreml-from-a-to-z
======
yeldarb
Neat walkthrough!

Last year I actually made an applied-CoreML app to solve sudoku puzzles where
MNIST came in very handy.

I wrote about it here: [https://blog.prototypr.io/behind-the-magic-how-we-built-the-...](https://blog.prototypr.io/behind-the-magic-how-we-built-the-arkit-sudoku-solver-e586e5b685b0)

~~~
nothis
>After I scanned a wide variety of puzzles from each book, my server had
stored about 600,000 images

600,000?!? Even divided by 81 that's over 7000! How long did this take?

~~~
yeldarb
A couple of afternoons.

I just hacked into my app's flow to upload a "scan" of the isolated puzzle to
my server instead of slicing it and sending the component images to CoreML.

Then I sat there and flipped through page after page of Sudoku puzzles and
scanned them from a few different angles each, sliced them in bulk on the
server, and voila: data!

~~~
dangero
Sorry, I’m still confused. You took roughly 7000 pictures in two afternoons?
What do you mean by sliced them in bulk? If you took them from different
angles how do you slice them in bulk?

~~~
yeldarb
Correct.

The app already had the code for "isolate the puzzle and do perspective
correction" so the uploaded images all looked something like this:
[https://magicsudoku.com/example-uploaded-image.png](https://magicsudoku.com/example-uploaded-image.png)

By "slicing in bulk" I mean the server was the one that split that out into 81
smaller images rather than the app doing the slicing and uploading 81 small
images.
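
A minimal sketch of that server-side slicing step, assuming the uploaded scan
is already a perspective-corrected square whose side is a multiple of 9. The
`slice_grid` helper and the list-of-rows image representation are made up for
illustration; the actual server code is not shown in this thread.

```python
def slice_grid(image, cells_per_side=9):
    """Split a square image (a list of pixel rows) into tiles, row-major.

    Hypothetical helper: returns cells_per_side ** 2 tiles, 81 for Sudoku.
    """
    size = len(image)
    if size == 0 or any(len(row) != size for row in image):
        raise ValueError("expected a non-empty square image")
    if size % cells_per_side != 0:
        raise ValueError("image side must be a multiple of cells_per_side")

    cell = size // cells_per_side  # side length of one tile in pixels
    tiles = []
    for gy in range(cells_per_side):        # grid row
        for gx in range(cells_per_side):    # grid column
            rows = image[gy * cell:(gy + 1) * cell]
            tiles.append([row[gx * cell:(gx + 1) * cell] for row in rows])
    return tiles


# Synthetic 18x18 "scan" where every pixel stores the index of its cell.
scan = [[(y // 2) * 9 + (x // 2) for x in range(18)] for y in range(18)]
tiles = slice_grid(scan)
```

Each tile could then be resized to the classifier's input size (28x28 for
MNIST-style models) and stored as one training image.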

Taking them from different angles was done because the perspective correction
adds distortions that I didn't want my model to be sensitive to.

~~~
bigmit37
Interesting stuff! I’m also a little confused as to how you took so many
pictures in only a couple of afternoons.

~~~
jononor
7000 pictures at 5 seconds per picture is "only" 10 hours of work. Possibly
per-picture time can be lower than that too. Seems quite doable over 2-4
afternoons.
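
The arithmetic behind those figures, as a quick sketch. The 5 seconds per
picture is the assumed pace from this comment, not a measured value, and the
function names are only illustrative.

```python
CELLS_PER_PUZZLE = 9 * 9  # a Sudoku grid has 81 cells


def scans_needed(cell_images):
    """Number of full-puzzle scans that yield the given count of cell images."""
    return cell_images / CELLS_PER_PUZZLE


def hours_of_work(scans, seconds_per_scan=5):
    """Total scanning time in hours at a fixed pace per scan."""
    return scans * seconds_per_scan / 3600


scans = scans_needed(600_000)
print(f"{scans:.0f} scans")                # 7407 scans
print(f"{hours_of_work(7000):.1f} hours")  # 9.7 hours
```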

Props for doing the project end2end, including the non-trivial (and typically
skipped) part of collecting training data.

------
rahimnathwani
"Apple ... provides a ... helper library called coremltools that we can use to
... convert scikit-learn models, Keras and XGBoost models to CoreML"

Awesome.

------
a_c
As someone without much ML experience: how do you tell whether a number is
present at all, and handle the case where there is none?

~~~
lozenge
The predictions variable has a confidence value for each digit. You can put a
cutoff and say if none is above a certain confidence, assume there's no number
at all.
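
A rough sketch of that cutoff idea. It assumes the predictions are available
as a mapping from digit label to confidence; that shape, the `pick_digit`
name, and the 0.8 cutoff are assumptions for illustration and would need
adapting to whatever the app's prediction object actually looks like.

```python
def pick_digit(predictions, cutoff=0.8):
    """Return the most confident label, or None if nothing clears the cutoff.

    `predictions` is assumed to map digit labels to confidences in [0, 1].
    """
    if not predictions:
        return None
    label, confidence = max(predictions.items(), key=lambda kv: kv[1])
    return label if confidence >= cutoff else None


confident = pick_digit({"3": 0.95, "8": 0.03, "5": 0.02})  # clear winner
uncertain = pick_digit({"1": 0.40, "7": 0.35, "4": 0.25})  # treated as "no number"
```

The cutoff itself is something to tune on real images of empty and filled
cells rather than a value to take from this sketch.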

~~~
jefft255
This could work, but it is important to note that a lot of ML algorithms
trained in a closed domain (no "other" class) will be pretty bad at knowing
what they don't know. This is an open problem in ML.
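
One way to see why, sketched under the assumption that the classifier ends in
a softmax layer over the ten digits (common for MNIST models, but not
confirmed for the one in the article). Softmax outputs always sum to 1, so
with no "other" class all of the confidence has to land on some digit. The
logits below are invented for illustration and stand in for the network's raw
output on an image that is not a digit at all.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for a non-digit input; index = digit class.
logits = [0.2, -1.0, 4.0, 0.5, -0.3, 0.1, 0.0, -2.0, 0.7, -0.5]
probs = softmax(logits)
top = max(range(len(probs)), key=lambda i: probs[i])

# The probabilities still sum to 1, and class 2 gets most of the mass,
# so a simple confidence cutoff would accept this input as a "2".
```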

------
zackmorris
The scrollbar distance confirms a suspicion that I've held for some time: that
writing a machine learning algorithm is of similar complexity to developing an
iOS app in Xcode!

~~~
saagarjha
What scrollbar distance are you talking about?

~~~
zackmorris
It was a joke - the Xcode section starts about halfway down the page. I was
just illustrating that the friction we deal with today is of comparable
complexity to what might be thought of as advanced programming (AI, VR, AR,
physics, etc etc).

