

Beginning deep learning with 500 lines of Julia - milvakili
http://www.denizyuret.com/2015/02/beginning-deep-learning-with-500-lines.html

======
kiyoto
"I wanted to write something that is concise, easy to understand, easy to
extend, and reasonably efficient. There is a subtle trade-off between
conciseness and extensibility: If we use a very high level language that
already has a "neural_network_train" function, we can write very concise code
but we lose the ability to change the training algorithm. If we use a very low
level language that only provides primitive arithmetic operations, all the
algorithm details are exposed and modifiable but the code is bulky and
difficult to understand. For a happy medium, the code should reflect the level
at which I think of the problem"

The trade-off is not between concision and extensibility, but between high- and
low-level computation.

Even if the language natively implements a "neural_network_train" function, as
long as the language also offers low-level primitives to implement all the
necessary parts of that function, the language is no less extensible than the
OP's suggested alternative. For example, almost 100% of R users use "lm" to run
linear regressions, but R has all the necessary pieces to implement the linear
regression calculation itself (either by inverting matrices or by running an
iterative gradient descent algorithm).
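
To make that concrete in Julia (the article's language), here is a minimal
sketch of both levels, assuming nothing beyond base Julia: the high-level
one-liner hides the algorithm, while the low-level loop exposes every detail.

    # High level: ordinary least squares in one line via the backslash solver.
    X = randn(100, 3)
    y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(100)
    beta_hat = X \ y                      # concise, but the algorithm is hidden

    # Low level: the same fit from arithmetic primitives, via gradient descent.
    b = zeros(3)
    for _ in 1:10_000
        grad = X' * (X * b .- y) ./ 100   # gradient of the mean squared error
        b .-= 0.1 .* grad                 # every detail exposed and modifiable
    end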

The OP conflates library-level abstraction with language-level abstraction.
I am with him in that there is a trade-off between concision and extensibility
w/r/t language-level abstraction. Library-level abstraction is pragmatically
important (e.g., you would not use OCaml to run websites) but theoretically
uninteresting (OCaml can certainly express all the computation a web server
needs).

~~~
Houshalter
As a practical example, brain.js is a very limited neural network library: you
can only provide a dataset and train on it. You can't implement variations
like dropout, stochastic gradient descent, or momentum.

A good neural network library like Torch, by contrast, lets you work at a much
lower level of abstraction: you can put together individual layers, and it
gives you the internal machinery for running forward and backward passes and
chaining them together.
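
Here is a hedged Julia sketch of that layer-level idea (illustrative names,
not Torch's or any library's API): each layer exposes a forward and a backward
pass, and a network is just a chain of layers.

    # Minimal composable-layer sketch; activation layers omitted for brevity.
    struct Dense
        W::Matrix{Float64}
        x::Vector{Float64}           # input cached for the backward pass
    end
    Dense(nin::Int, nout::Int) = Dense(0.01 .* randn(nout, nin), zeros(nin))

    function forward!(l::Dense, x)
        l.x .= x
        return l.W * x
    end

    function backward!(l::Dense, g; lr = 0.1)
        gin = l.W' * g               # gradient w.r.t. this layer's input
        l.W .-= lr .* (g * l.x')     # gradient descent on the weights
        return gin
    end

    # Chaining: forward through the list, backward in reverse.
    function train_step!(layers, x, grad_of_loss)
        for l in layers
            x = forward!(l, x)
        end
        g = grad_of_loss(x)
        for l in reverse(layers)
            g = backward!(l, g)
        end
    end

With this structure, dropout or momentum become a new layer type or a changed
update rule, which is exactly the extensibility a train-only API gives up.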

~~~
kiyoto
That's exactly what I am talking about. Your anti-example is a case of
library-level abstraction. Now, if you are saying that JavaScript doesn't lend
itself to writing extensible machine learning routines whereas Lua does, it's
a different story.

------
film42
The gist of this is really the backpropagation algorithm [1]. Everything else
appears to be a means of deciding when to stop. One of the clearest
walkthroughs I've seen online is a series of very short videos by Stephen C.
Welch [2] [3]. If you're interested in this space, start with him.

[1]:
[http://en.wikipedia.org/wiki/Backpropagation](http://en.wikipedia.org/wiki/Backpropagation)

[2]:
[https://www.youtube.com/watch?v=bxe2T-V8XRs](https://www.youtube.com/watch?v=bxe2T-V8XRs)

[3]:
[http://nbviewer.ipython.org/github/stephencwelch/Neural-Networks-Demysitifed/blob/master/Part%201%20Data%20and%20Architechture.ipynb](http://nbviewer.ipython.org/github/stephencwelch/Neural-Networks-Demysitifed/blob/master/Part%201%20Data%20and%20Architechture.ipynb)
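
For a sense of how short the core really is, here is a hedged Julia sketch of
one backpropagation step for a one-hidden-layer sigmoid network with
squared-error loss (variable names are mine, not the article's):

    sigmoid(z) = 1 ./ (1 .+ exp.(-z))

    # One gradient step of backpropagation; W1, W2 are the weight matrices.
    function backprop_step!(W1, W2, x, y; lr = 0.1)
        h = sigmoid(W1 * x)                       # forward: hidden layer
        o = sigmoid(W2 * h)                       # forward: output layer
        d2 = (o .- y) .* o .* (1 .- o)            # chain rule at the output
        d1 = (W2' * d2) .* h .* (1 .- h)          # ...propagated to the hidden layer
        W2 .-= lr .* (d2 * h')                    # gradient descent updates
        W1 .-= lr .* (d1 * x')
    end

Everything else (epochs, held-out error) really is just deciding when to stop
calling this.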

------
idunning
The author mentions that one of his goals was to focus on a smaller set of
functionality and make it simple and high-performance, but I've got to give a
shout-out to Mocha.jl here [1]. It is essentially Julia's answer to the Caffe
deep learning framework (which is linked in the article), and has pure Julia,
C++, and CUDA GPU backends. It's under active development but is already pretty
amazing. Bonus: it has documentation!

On the contents of this blog post: I really like how the Julia type system is
used here. Not only do the types help structure the code and signal intent to
the user, but of course there is also type-checking to catch errors.

[1]:
[https://github.com/pluskid/Mocha.jl](https://github.com/pluskid/Mocha.jl)
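
A hedged illustration of that type-checking point (toy names of my own, not
Mocha.jl's or the article's API): multiple dispatch rejects a call whose
argument types don't match any method.

    abstract type Layer end

    struct Sigmoid <: Layer
        W::Matrix{Float64}
    end

    activate(l::Sigmoid, x::Vector{Float64}) = 1 ./ (1 .+ exp.(-(l.W * x)))

    l = Sigmoid(randn(2, 3))
    activate(l, randn(3))       # ok: 3-vector in, 2-vector out
    # activate(l, randn(3, 3))  # MethodError: the type system catches the mistake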

------
dicroce
Write your neural network code today, then every 5 years or so dust it off,
add a few layers, run it on your current computer, and watch its performance
improve!

------
lohengramm
What does the "deep" in deep learning mean? Is backpropagation deep learning?
Why not just machine learning? I've been lost since this term started getting
used a lot recently. I once implemented and used a feed-forward neural net
trained with the BP algorithm, and I learned it just as a "machine learning"
technique, nothing "deep" about it.

I haven't read this yet, though; maybe it explains.

~~~
noelwelsh
"Deep" usually means more than one hidden layer. The basic algorithms are
similar to backpropagation, being based on gradient descent, but there are a
lot of tricks (or refinements, depending on your point of view) to make the
learning more robust and efficient.

Deep learning is a specific area within machine learning.

~~~
sgt101
As a young man I tried training a 4-layer network (because, why not). A day
later: no convergence. Some calculations revealed an expected wait of many
thousands of days (I have a memory of 1 million, but that could be bs) before
the network would converge.

So "deep" networks have been around for many decades, and also they haven't,
because you couldn't train them. Now that we have computers that are at least
10,000x faster, and training algorithms that are much faster too, these
architectures are interesting.

~~~
noelwelsh
This comment nails it. I think it is under-appreciated how much machine
learning progress is enabled by increasing computing power. This is not to
deny algorithmic improvements, but it's hard to refine algorithms if you can't
actually run them.

~~~
lohengramm
Thank you all above for the explanations.

------
jostmey
I love Julia. I used it to write a restricted Boltzmann machine in less than a
page of code ... and my code has lots of whitespace ... and without much
effort. It helped that I knew Matlab.

~~~
memming
I'm curious about your implementation. Can you show us the code?

~~~
idunning
I don't think it's the same, but there is a high-quality, registered
implementation in Julia:
[https://github.com/dfdx/Boltzmann.jl](https://github.com/dfdx/Boltzmann.jl)
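
For readers curious what fits in "less than a page": a hedged sketch of the
CD-1 (contrastive divergence) update at the heart of a binary RBM, roughly
what a library like Boltzmann.jl iterates; names and details are illustrative.

    sigmoid(z) = 1 ./ (1 .+ exp.(-z))

    # One CD-1 update: weights W, visible bias b, hidden bias c, data vector v0.
    function cd1!(W, b, c, v0; lr = 0.1)
        h0 = sigmoid(W * v0 .+ c)              # hidden probabilities given the data
        hs = Float64.(rand(length(h0)) .< h0)  # sample binary hidden states
        v1 = sigmoid(W' * hs .+ b)             # reconstruct the visible layer
        h1 = sigmoid(W * v1 .+ c)              # hidden probs given the reconstruction
        W .+= lr .* (h0 * v0' .- h1 * v1')     # positive phase minus negative phase
        b .+= lr .* (v0 .- v1)
        c .+= lr .* (h0 .- h1)
    end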

------
dschiptsov
As far as I remember, in Andrew Ng's course it was about a few lines of
Octave. Has the world changed dramatically since then?

------
vonnik
We did our best to build something both readable and extensible, borrowing the
syntax of NumPy, Matlab, and scikit-learn:

[http://deeplearning4j.org](http://deeplearning4j.org)

