So I plead with anyone starting to learn ML: please do not shy away from new languages. It's really a much smaller effort than you imagine, and the rewards are much bigger than you expect them to be.
It's interesting. The fast.ai MOOC takes the complete opposite approach:
We have spent as much time studying the research into effective education techniques as we have studying the research into deep learning—one of the biggest differences that you'll see as a result is that we teach "top down" rather than "bottom up". For instance, you'll learn how to use deep learning to solve your problems in week 1, but will only start to learn why it works in week 2! And you'll spend a lot more time learning how to write effective code and use effective processes than you will on learning mathematical formalisms.
Personally, I much prefer the fast.ai approach, which I'd characterize as "develop a shallow understanding, and then go deep". Either way, one thing is for sure: the claim that lessons learned from building things from scratch are real game-changers when it comes to the messiness of tackling real-world problems with these tools doesn't bear out in reality. The fast.ai course is much, much more focused on performance in real-world situations than any other course I've looked at.
In the past, I have advocated learning Deep Learning using only a matrix library. For the purposes of actually knowing what goes on under the hood, I think that this is essential, and the lessons learned from building things from scratch are real gamechangers when it comes to the messiness of tackling real world problems with these tools. However, when building neural networks in the wild (Kaggle Competitions, Production Systems, and Research Experiments), it's best to use a framework.
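To make the "only a matrix library" idea concrete, here is a minimal sketch of what that kind of from-scratch exercise looks like: a two-layer network with hand-written backprop in plain NumPy. The data, shapes, and hyperparameters are illustrative inventions of mine, not anything from the quoted article.

```python
import numpy as np

# Toy separable problem: label is whether the features sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                    # 64 samples, 10 features
y = (X.sum(axis=1) > 0).astype(float)[:, None]   # targets in {0, 1}

W1 = rng.normal(scale=0.1, size=(10, 16))        # hidden layer weights
W2 = rng.normal(scale=0.1, size=(16, 1))         # output layer weights
lr = 0.5
losses = []

for step in range(500):
    h = np.maximum(0, X @ W1)                    # ReLU hidden layer
    p = 1 / (1 + np.exp(-(h @ W2)))              # sigmoid output
    p = np.clip(p, 1e-7, 1 - 1e-7)               # avoid log(0)
    losses.append(float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()))
    grad_out = (p - y) / len(X)                  # dLoss/dLogit for BCE + sigmoid
    grad_h = (grad_out @ W2.T) * (h > 0)         # chain rule through the ReLU
    W2 -= lr * h.T @ grad_out                    # plain gradient-descent updates
    W1 -= lr * X.T @ grad_h
```

The payoff the author describes comes from steps like the `grad_h` line: you cannot write it without actually understanding the chain rule, which is exactly the knowledge a framework hides.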
Which makes a bit more sense in context. Certainly I think taking CS231N has put me at an advantage going through fast.ai because the quality of the notes (and discussion on backprop, how neural nets work, etc.) is significantly better. Andrej Karpathy and Justin Johnson have done an amazing job with the whole website.
The downside of CS231N is the lack of real-world practice if you want to work with image data: the implementation examples only focus on CIFAR-10. I think the two courses are definitely complementary, and if I'd only taken fast.ai I would be left a bit unsatisfied about knowing what I'd been doing when twiddling parameters.
However, my comment was about how much the theory helps in the real world (and I don't think my quote was misleading. I think the author still advocates learning theory before using the frameworks)
I think that the fast.ai approach of showing solutions, then explaining the theory is really good.
For example, in lesson 2 fast.ai shows how to fine-tune a VGG model to get (real!) state-of-the-art performance on a real-world, two-class image recognition problem by adding an additional dense layer.
In CS231N (which I haven't done, although I have glanced through the notes) this isn't really said anywhere. It's kind of implied in a lot of places, and if you study it you'll be able to work it out for yourself.
That isn't bad, but I'm unconvinced it gives the huge "real world" advantage, given the amount of time it takes to get to that point.
I strongly agree they are complementary though.
That said, I'd be interested to see some actual results from people taking what they learned in fast.ai (with no prior knowledge) and applying it to a new test set. It's easy to be impressed with the examples where you're walked through it, but figuring out what works on your data is a whole new kettle of fish!
I liked the parallel constructions of the neural network and the transition from linear algebra to framework. I really appreciate the ease of use of PyTorch, which pushed me over the edge into actually doing something useful with deep learning.
I managed to get through this tutorial and make a submission to the Kaggle digit recognizer competition in the span of a few hours. I'm excited to figure out how to train a model more efficiently, which seems to be the difficult problem of choosing network hyperparameters.
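For what it's worth, the "knobs" in question are easy to see in a minimal PyTorch training loop. This is a hypothetical stand-in for the digit-recognizer setup, using random tensors in place of the real Kaggle data, with the hyperparameters pulled out as variables:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 784)        # stand-in for flattened 28x28 digit images
y = torch.randint(0, 10, (256,)) # stand-in for digit labels 0-9

hidden = 128                     # hyperparameter: hidden-layer width
lr = 1e-2                        # hyperparameter: learning rate
epochs = 20                      # hyperparameter: number of training passes

model = nn.Sequential(nn.Linear(784, hidden), nn.ReLU(), nn.Linear(hidden, 10))
opt = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(epochs):
    opt.zero_grad()              # clear accumulated gradients
    loss = loss_fn(model(X), y)
    loss.backward()              # autograd computes all gradients
    opt.step()                   # Adam update
    losses.append(loss.item())
```

Every variable in the middle block (`hidden`, `lr`, `epochs`, the optimizer choice) is a hyperparameter in the sense above: changing any of them changes how efficiently the model trains, and there's no formula for the right values.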
Some background in this article currently featured in r/MachineLearning: https://www.reddit.com/r/MachineLearning/comments/5o9vfx/d_k...