
MNIST Handwritten digits classification using Keras, part 1 - Aryal007
http://www.python36.com/mnist-handwritten-digits-classification-using-keras/
======
appleflaxen
With all the progress in machine learning it seems like there should be an
amazing OCR tool that works out of the box for structured documents.

Does it exist?

I've used Tesseract and its relatives, but they seem to have a hard time with
any document that isn't a single column. There's a big gap between what they
can achieve (which, to be fair, is amazing) and what I _expected_ based on all
the ML demos, which only do the first 10% (numbers only, no structure) but do
it in a 30-minute demo. Things like affine transforms (scaling, rotation) and
decorations like bold, underline, and unusual fonts create even more problems.

Why isn't there a Docker container with an AWS Lambda function in it that
takes any image format I upload (pdf, png, jpg as the most critical) and
returns a UTF-8 string of its content?

My god, I'm spoiled by technology.

~~~
zzleeper
Agreed.

What I do for tables is use OCR as step 1, where I just extract the
coordinates of each text box. Then I use one of several methods to recognize
columns (a Hough transform, or searching for empty areas). Then I do some
clustering to recognize lines.

The problem is that there are a million edge cases you need to handle (boxes
split across multiple columns and, in the case of tables, cells that span
multiple columns or rows, etc.). There _should_ be a solution for all these
edge cases, but it gets tricky quickly.
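The coordinate-based steps above (boxes in, columns and lines out) can be sketched roughly like this. This is a minimal illustration, not zzleeper's actual code: it uses the empty-area search rather than a Hough transform, assumes a prior OCR pass has produced boxes as `(x0, y0, x1, y1)` tuples, and the `GAP` and `LINE_TOL` thresholds are made-up parameters.

```python
# Sketch: group OCR text boxes into columns and lines.
# Boxes are (x0, y0, x1, y1) tuples from a prior OCR pass.

GAP = 20       # minimum horizontal whitespace to call a column break
LINE_TOL = 5   # boxes whose vertical centers are this close share a line

def find_columns(boxes, gap=GAP):
    """Find column x-ranges by looking for empty vertical strips."""
    spans = sorted((x0, x1) for x0, _, x1, _ in boxes)
    columns = []
    cur_start, cur_end = spans[0]
    for x0, x1 in spans[1:]:
        if x0 - cur_end >= gap:      # empty area wide enough: new column
            columns.append((cur_start, cur_end))
            cur_start, cur_end = x0, x1
        else:                        # overlapping/near span: extend column
            cur_end = max(cur_end, x1)
    columns.append((cur_start, cur_end))
    return columns

def cluster_lines(boxes, tol=LINE_TOL):
    """Cluster boxes into lines by each box's vertical center."""
    lines = []  # each entry: [anchor_center, [boxes]]
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (box[1] + box[3]) / 2
        if lines and abs(center - lines[-1][0]) <= tol:
            lines[-1][1].append(box)
        else:
            lines.append([center, [box]])
    return [members for _, members in lines]

# Toy layout: two columns, two lines each.
boxes = [(0, 0, 50, 10), (100, 0, 150, 10),
         (0, 20, 50, 30), (100, 20, 150, 30)]
print(find_columns(boxes))        # [(0, 50), (100, 150)]
print(len(cluster_lines(boxes)))  # 2
```

Real scans need the edge-case handling described above (boxes straddling the gap, skewed pages shrinking the whitespace strip), which is exactly where this simple version breaks down.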

~~~
GCU-Empiricist
Having worked on some of these questions: beyond that, there's the question of
logical flow. How do you order that inset square of text in the bigger font?
Is it a picture caption or a header?

------
ssivark
(Apologies in advance if this sounds snarky, don't mean to be)

There are hundreds of posts on using Keras/TensorFlow/PyTorch/etc to do MNIST
classification, and many examples on GitHub. All these resources are very easy
to find by Googling. This post doesn't seem to have anything different to add
to the conversation. So, why do such articles continue to be written, and why
do they still get upvoted on HN? Is it that a lot of people want to learn
these things, but haven't gotten a chance to, so they upvote with the hope of
staying in touch with the topic? Is it FOMO? One might be forgiven for
considering this spam.

~~~
dang
Moderators routinely downweight tutorial articles for that reason. It took a
surprisingly long time to realize it, but tutorials aren't a good fit for
Hacker News.

HN is for articles that gratify intellectual curiosity. Most tutorials don't
do that, since it doesn't fit with being a recipe for a task. If a recipe were
to dive deeply into how and why its steps work, how they got that way, and how
it compares to other recipes that have (or don't have) similar steps, it would
gratify intellectual curiosity much better. But it would be a worse recipe. It
would also be a lot more work to write.

The worst kind of tutorial from an HN point of view is the kind filled with
arbitrary details that apply only to a particular program or product, like
"Now put a file named foo in a directory called baz". But that is almost all
of them.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
stared
“Many good ideas will not work well on MNIST (e.g. batch norm). Inversely[,]
many bad ideas may work on MNIST and no[t] transfer to real [computer vision]”
– a tweet by François Chollet (creator of Keras)

So - please, anything harder (or rather: more relevant to deep learning). At
least images in CIFAR-10: [https://blog.deepsense.ai/deep-learning-hands-on-image-classification/](https://blog.deepsense.ai/deep-learning-hands-on-image-classification/)

~~~
julsimon
Or Fashion-MNIST [https://github.com/zalandoresearch/fashion-mnist](https://github.com/zalandoresearch/fashion-mnist) :)

