
MNIST Handwritten Digit Classifier – beginner neural network project - ironislands
https://github.com/karandesai-96/digit-classifier
======
imurray
Nice. Recreating these methods in simple code for yourself is definitely the
way to check you understand it. This demo looks nice, clean, and
straightforward. (Although I'd rename or comment the variables x and y, or
give some guidance within the code itself on which way around the weight
matrices are oriented.)

It's also worth checking out existing neural net code-bases to see what tricks
they have. The fine details usually aren't in papers, and they're not all in
the text-books either.

The first potential problem that jumped out at me in this code was the
initialization:

    
    
        self.weights = [np.array([0])] + [np.random.randn(y, x)
                for y, x in zip(sizes[1:], sizes[:-1])]

If the number of units in a layer is H, the typical size of the input into the
layer above will be about √H: with unit-variance weights, the pre-activations
have standard deviation on the order of √H. For large H, the sigmoid will
usually saturate, and the gradients will underflow to zero, making it
impossible to learn anything. There are some tricks to avoid the numerical
problems, but even if you avoid numerical underflow, things probably aren't
going to work well.
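
To see it concretely, here's a quick numpy sketch (illustrative, not code
from the repo):

    import numpy as np
    
    def sigmoid(z):
        # numerically stable form of 1 / (1 + exp(-z))
        return 0.5 * (1.0 + np.tanh(0.5 * z))
    
    H = 784                      # e.g. an MNIST-sized input layer
    x = np.random.randn(H)       # hypothetical unit-variance activations
    W = np.random.randn(30, H)   # unscaled N(0, 1) init, as in the snippet above
    z = W @ x
    print(z.std())               # roughly sqrt(784) = 28
    print(np.median(sigmoid(z) * (1 - sigmoid(z))))
    # the sigmoid's gradient factor a*(1-a) typically comes out around 1e-8
    # here, against its maximum of 0.25 at z = 0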

I'd multiply those initial weights by a small constant divided by the square
root of the number of weights going into the same neuron. For multiple layers
you might consider layer-by-layer pre-training. For other architectures, like
recurrent nets, definitely find a reference on how to do the initialization.
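
For the simple feed-forward case here, that's a one-line change to the
snippet quoted above (a sketch, keeping its convention that x is the fan-in
of each layer):

    self.weights = [np.array([0])] + [np.random.randn(y, x) / np.sqrt(x)
            for y, x in zip(sizes[1:], sizes[:-1])]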

PS I would definitely add a test routine to check that the gradients from
back-propagation agree with a finite difference approximation. It's so easy to
get gradient code wrong, and it's so easy to test.
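
Something like this generic checker is usually enough (an illustrative
sketch, not an API from the repo):

    import numpy as np
    
    def check_grad(f, grad, theta, eps=1e-5):
        """Max abs difference between grad(theta) and central differences of f."""
        num = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            num[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
        return np.max(np.abs(num - grad(theta)))
    
    # sanity check on a function with a known gradient:
    f = lambda t: np.sum(t ** 2)
    g = lambda t: 2 * t
    print(check_grad(f, g, np.random.randn(5)))   # should be ~1e-10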

~~~
tyrael71
'It's also worth checking out existing neural net code-bases to see what
tricks they have. The fine details usually aren't in papers, and they're not
all in the text-books either.'

Given that you are a person who is highly qualified to answer, I am genuinely
curious why you think that is. Reimplementing algorithms from scratch is an
efficient way to learn and understand the underlying concepts, and to attempt
improvements in a research context.

~~~
imurray
A lot of machine-learning papers are eight pages. Speech conference papers
(heavy users of neural nets) are often only four. Some details aren't part of
the main message, so they don't make it in. Often code is available, and
initialization and other tweaks can be found in there (even if you aren't
going to use their code).

That said, there are also whole papers, even collected volumes, on
initialization and other practical details.

Textbooks aren't always up-to-date with the latest practical knowledge, as
deep-learning practice is moving quickly. Or they simply don't want to clutter
their high-level maths descriptions with code-level implementation details.
Teaching stuff is all about tradeoffs. I'm sure several books _do_ mention the
scale of weights for simple feed-forward networks though, as it's not an
implementation-level detail, and it's probably been well known since the
1980s.

------
tyrael71
Can someone explain why this repo is so popular, and why it's so popular here?
This is a basic implementation of a relatively simple algorithm that you learn
when first starting with ML/DL.

I don't want to sound critical in any way; I'm genuinely curious about the
dynamics of why people find this interesting given its reduced complexity.

~~~
rch
[https://xkcd.com/1053/](https://xkcd.com/1053/)

~~~
d4rth_s1d10us
Haha, nice comic! :-D

------
sapphireblue
I have an old one I wrote myself some time ago, in Node.js/JavaScript:
[https://github.com/crystalline/dnnjs](https://github.com/crystalline/dnnjs)
It's a simple multilayer perceptron with a ReLU nonlinearity; it achieves 1.7%
error on MNIST, which is bad compared to convnets but good enough for a
multilayer perceptron. Training a model is as simple as running "node
node-mnist.js" in a terminal.

------
liamconnell
I'm seeing a lot of comments about the lack of non-MNIST neural network
tutorials. Well, here's mine. It uses financial data, and the goal is to build
a trading strategy. It's a work in progress, and comments/criticism are
welcome.

[https://github.com/LiamConnell/deep-algotrading](https://github.com/LiamConnell/deep-algotrading)

~~~
vamega
Thanks. This looks really great!

------
partycoder
If you want to jump into this sort of thing, I highly recommend Google's
Udacity course for deep learning.
[https://classroom.udacity.com/courses/ud730](https://classroom.udacity.com/courses/ud730)

~~~
RockyMcNuts
It's very good, but personally I'd start with the Andrew Ng and Hinton
courses:

[https://www.coursera.org/learn/machine-learning](https://www.coursera.org/learn/machine-learning)
[https://www.coursera.org/learn/neural-networks](https://www.coursera.org/learn/neural-networks)

I think the Udacity course is best if you already know the principles of
machine learning and want to apply them in a more professional toolchain and
learn TensorFlow.
------
iopq
I thought neural networks nowadays use ReLU instead of sigmoid, especially in
the context of deep learning?

~~~
asib
Looks like this implementation is based (in part) on the Stanford ML course,
which teaches neural nets using the sigmoid activation.

Given that it's intended to introduce beginners to how neural nets work, the
choice of activation is an aside anyway - the real meat is forward/backprop.

~~~
ironislands
Absolutely, though I'll add softmax, tanh, and ReLU activation functions in
the near future. It isn't that difficult. Adding more docstrings is also on my
todo list.
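
For the curious, minimal numpy sketches of those activations (illustrative,
not yet what's in the repo):

    import numpy as np
    
    def tanh(z):
        return np.tanh(z)            # derivative: 1 - tanh(z)**2
    
    def relu(z):
        return np.maximum(0.0, z)    # derivative: 1 where z > 0, else 0
    
    def softmax(z):
        e = np.exp(z - z.max())      # subtract max for numerical stability
        return e / e.sum()           # outputs sum to 1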

------
naveen99
I found the Arrayfire examples in c++11 very approachable:
[https://github.com/arrayfire/arrayfire/blob/devel/examples/m...](https://github.com/arrayfire/arrayfire/blob/devel/examples/machine_learning/neural_network.cpp)

The softmax_regression and logistic regression examples are even easier.

There are bindings for Node.js, Python, and other languages.

But it's so nice to be able to follow the definitions of each symbol and
function in Visual Studio, not to mention being able to step through the
imperative code.

And it's fast.

~~~
melonakos
Thank you for the kind words. If there is anything we can do to help, let us
know.

*I work at ArrayFire.

~~~
naveen99
Will do. Thank you for Arrayfire.

------
IshKebab
I kind of wish people would stop using MNIST for NN tutorials. There are a
billion of them already. Do something different.

~~~
detaro
Suggestions for other interesting starter projects? Doing something with NNs
is on the learning list...

~~~
IshKebab
The difficulty is always getting a dataset.

I've started trying to get a network to recognise different vowels
("aaahhhhhh", "eeeeee", "ooooooo", etc.). It's relatively easy to generate
data - you just need your voice and a microphone. The downside is that all the
NN systems are much more set up for images than for sound.

Or what about neural net fingerprint recognition? There must be databases of
fingerprints somewhere. Or irises.

Recognise a type of wood from images of its grain?

Or activity recognition from accelerometer data. I think Pebble recently
open-sourced their recogniser, and it was surprisingly not a neural network.
I'm sure a neural network could do better. It might be hard to get a decent
amount of data here, but that could be a good incentive to exercise!

