Beginning deep learning with 500 lines of Julia (denizyuret.com)
142 points by milvakili on Feb 28, 2015 | 17 comments

"I wanted to write something that is concise, easy to understand, easy to extend, and reasonably efficient. There is a subtle trade-off between conciseness and extensibility: If we use a very high level language that already has a "neural_network_train" function, we can write very concise code but we lose the ability to change the training algorithm. If we use a very low level language that only provides primitive arithmetic operations, all the algorithm details are exposed and modifiable but the code is bulky and difficult to understand. For a happy medium, the code should reflect the level at which I think of the problem"

The trade-off is not between concision and extensibility, but between high- and low-level computations.

Even if the language natively implements a "neural_network_train" function, as long as the language also offers the low-level primitives needed to implement every part of that function, the language is no less extensible than the OP's suggested alternative. For example, almost 100% of R users use "lm" to run linear regressions, but R has all the necessary pieces to implement the linear regression calculation yourself (either by inverting matrices or by running an iterative gradient descent algorithm).
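
To make the linear regression point concrete, here is a minimal sketch in plain Python (not R, and not what "lm" does internally) that fits y ≈ a + b*x both ways using only basic arithmetic. The function names, the toy data, the learning rate and the step count are all assumptions chosen for illustration.

```python
# Simple linear regression y ≈ a + b*x, fitted two ways from basic arithmetic.
# All names and the toy data below are illustrative assumptions.

def fit_closed_form(xs, ys):
    """Solve the 2x2 normal equations (X'X) beta = X'y by explicit inversion."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X'X
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


def fit_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Minimise the mean squared error by plain batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [a + b * x - y for x, y in zip(xs, ys)]
        grad_a = 2.0 * sum(errs) / n
        grad_b = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x, no noise

closed = fit_closed_form(xs, ys)
iterative = fit_gradient_descent(xs, ys)
```

Both routes recover the same intercept and slope on this data; the high-level "lm"-style call and the low-level pieces coexist without either limiting the other.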

The OP conflates library-level abstraction with language-level abstraction. I am with him in that there is a trade-off between concision and extensibility w/r/t language-level abstraction. Library-level abstraction is pragmatically important (i.e., you would not use OCaml to run websites) but theoretically uninteresting (OCaml can certainly express all the computation needed for a web server).

I don't think there has to be. Many libraries, for example, overdo it and make everything a separate layer when it should be a layer with a hyperparameter. A good example of this is dropout. Why not make it a knob rather than a layer?

I think people should be given high-level primitives like "layers" and be allowed to make their own where necessary, but the defaults should be: a layer with x (dropout, momentum, ...) trained by an optimization algorithm (L-BFGS, Hessian-free, ...). This allows people to experiment with different configurations without having to dive deep to solve some basic problems.
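
A rough sketch of the "knob rather than a layer" idea, in plain Python. The DenseLayer class, its fields and the choice to apply inverted dropout to the layer's inputs are hypothetical, made up for illustration; this is not the API of any library mentioned in this thread.

```python
import random
from dataclasses import dataclass


@dataclass
class DenseLayer:
    """Hypothetical layer: dropout is a hyperparameter, not a separate layer."""
    weights: list          # one row of input weights per output unit
    dropout: float = 0.0   # probability of zeroing each input during training

    def __post_init__(self):
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")

    def forward(self, inputs, training=False, rng=random):
        if training and self.dropout > 0.0:
            keep = 1.0 - self.dropout
            # Inverted dropout: zero inputs at random, rescale survivors by 1/keep.
            inputs = [x / keep if rng.random() < keep else 0.0 for x in inputs]
        return [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]


layer = DenseLayer(weights=[[1.0, 1.0, 1.0, 1.0]], dropout=0.5)
data = [1.0, 2.0, 3.0, 4.0]

eval_out = layer.forward(data)  # dropout is inactive outside training
train_out = layer.forward(data, training=True, rng=random.Random(0))
```

Turning dropout off is then a matter of setting `dropout=0.0`, with no change to the network's structure.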

Relevant to Julia: it's a great language and what I wish production code could look like (while being fast!)

Like Rust, it's in a pretty alpha state right now. I'm watching the language closely though.

As a practical example, brain.js is a very limited neural network library. You can only provide a dataset and train on that dataset. You can't implement any variations like dropout, stochastic gradient descent, momentum, etc.

Whereas a good neural network library like Torch lets you work at a much lower level of abstraction. You can put together individual layers, and it gives you the internal code for doing forward and backward passes, and chaining them together.
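
The layer-level style described here can be sketched in a few lines of plain Python. This is a toy imitation of the container-of-modules idea working on scalars, not Torch's actual code; the class names Scale, ReLU and Sequential are chosen for illustration only.

```python
class Scale:
    """y = w * x with one trainable weight (toy stand-in for a linear layer)."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x

    def backward(self, grad_out):
        self.grad_w = grad_out * self.x  # gradient w.r.t. the weight
        return grad_out * self.w         # gradient w.r.t. the input


class ReLU:
    def forward(self, x):
        self.x = x
        return max(0.0, x)

    def backward(self, grad_out):
        return grad_out if self.x > 0 else 0.0


class Sequential:
    """Chains layers: forward in order, backward in reverse order."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad_out):
        for layer in reversed(self.layers):
            grad_out = layer.backward(grad_out)
        return grad_out


first, last = Scale(2.0), Scale(3.0)
net = Sequential(first, ReLU(), last)

y = net.forward(1.5)        # 3 * relu(2 * 1.5)
grad_x = net.backward(1.0)  # gradient of y w.r.t. the input
```

Because each layer only knows its own forward and backward rule, variations such as a new activation or a different update rule can be added without touching the rest of the chain.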

That's exactly what I am talking about. Your anti-example is a case of library-level abstraction. Now, if you are saying that JavaScript doesn't lend itself to writing extensible machine learning routines whereas Lua does, it's a different story.

The gist of this is really the Backpropagation algorithm [1]. It appears as though everything else is a means to detect when to stop. One of the clearest walkthroughs I've seen online is a short series of very short videos by Stephen C Welch [2] [3]. If you're interested in this space, start with him.

[1]: http://en.wikipedia.org/wiki/Backpropagation

[2]: https://www.youtube.com/watch?v=bxe2T-V8XRs

[3]: http://nbviewer.ipython.org/github/stephencwelch/Neural-Netw...

The author mentions that one of his goals was to focus on a smaller set of functionality and make it simple and high-performance, but I've got to give a shoutout to Mocha.jl here [1]. It is essentially Julia's answer to the Caffe deep learning framework (which is linked in the article), and has pure Julia, C++, and CUDA GPU backends. It's under active development but is already pretty amazing. Bonus: it has documentation!

On the contents of this blog post: I really like how the Julia type system is used here. Not only do the types help structure the code and send a signal to the user, but of course there is type-checking to catch errors.

[1]: https://github.com/pluskid/Mocha.jl

Write your neural network code today, and every 5 years or so dust it off, add a few layers, run it on your current computer, and watch its performance improve!

What does the "deep" mean in deep learning? Is backpropagation deep learning? Why not just machine learning? I got lost when this term started being used a lot recently. I once implemented and used a feed-forward neural net trained with the BP algorithm, and I learned it simply as a "machine learning" technique, no "deep".

I haven't read this yet though, maybe it explains.

Deep usually means more than one hidden layer. The basic algorithms are similar to backpropagation, being based on gradient descent, but there are a lot of tricks (or refinements, depending on your point of view) to make the learning more robust and efficient.

Deep learning is a specific area within machine learning.

As a young man I tried training a 4-layer network (because, why not). A day later there was no convergence, and some calculations revealed an expectation of many thousands of days (I have a memory of 1 million, but that could be bs) before the network would converge.

So "deep" networks have been around for many decades, and yet they haven't, because you couldn't train them. Now that we have computers that are 10,000x faster (at least), and training algorithms that are much faster too, these architectures are interesting.

This comment nails it. I think it is under-appreciated how much machine learning progress is enabled by increasing computing power. This is not to deny algorithmic improvements, but it's hard to refine algorithms if you can't actually run them.

Thank you all above for the explanations.

I love Julia. I used it to write a restricted Boltzmann machine in less than a page of code ... and my code has lots of whitespace ... and without much effort. It helped that I knew MATLAB.

I'm curious about your implementation. Can you show us the code?

I don't think it's the same, but there is a high-quality implementation in Julia that is registered: https://github.com/dfdx/Boltzmann.jl

As far as I remember, in Andrew Ng's course it took only a few lines of Octave. Has the world changed dramatically since then?

We did our best to build something both readable and extensible, borrowing the syntax of numpy, matlab and scikit-learn:

