Karpathy is one of my favourite authors - not only is he deeply involved in technical work (audit the CS231n course for more[1]!), but he also spends much of his time demystifying the field itself, which is a brilliant way to encourage others to explore it :)
If you enjoyed his blog posts, I highly recommend watching his talk on "Automated Image Captioning with ConvNets and Recurrent Nets"[2]. In it he raises many interesting points that he hasn't had the chance to cover fully in his articles.
He humbly says that his captioning work is just stacking image recognition (CNN) onto sentence generation (RNN), with the gradients effectively teaching the two to work together. Given that we now have powerful enough machines, I think we'll be seeing a lot of stacking of previously separate models, either to improve performance or to perform multi-task learning[3]. It's a very simple concept, but one that can be applied to many other fields of interest.
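To make that concrete, here's a minimal sketch in PyTorch (my own toy layer sizes and names, not Karpathy's actual code) of the stacking idea: a small CNN compresses the image into a feature vector, an LSTM consumes that vector as its first input and predicts caption words, and a single backward pass pushes gradients through both models at once, which is the sense in which they learn to work together.

    import torch
    import torch.nn as nn

    class CaptionNet(nn.Module):
        def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=256):
            super().__init__()
            # Tiny CNN encoder (a real system would use a big pretrained net).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, embed_dim),
            )
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, images, captions):
            # The image feature acts as the first "word" fed to the RNN;
            # the remaining inputs are the ground-truth caption words
            # shifted by one (teacher forcing).
            feat = self.cnn(images).unsqueeze(1)   # (B, 1, embed_dim)
            words = self.embed(captions[:, :-1])   # (B, T-1, embed_dim)
            seq = torch.cat([feat, words], dim=1)  # (B, T, embed_dim)
            hidden, _ = self.rnn(seq)
            return self.out(hidden)                # (B, T, vocab_size)

    model = CaptionNet()
    images = torch.randn(4, 3, 64, 64)             # dummy batch
    captions = torch.randint(0, 1000, (4, 12))     # dummy word ids
    logits = model(images, captions)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), captions.reshape(-1))
    # One loss, one backward pass: gradients flow through the RNN
    # *and* back into the CNN's weights.
    loss.backward()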
Andrej is also a great lecturer; his CS231n class this winter was the most enjoyable and most educational class I've taken all year. All of the materials are available at cs231n.stanford.edu, although I can't seem to find the lecture videos online - the lectures may not have been recorded.
As a bonus, there's an ongoing class on deep learning architectures for NLP which covers recurrent (and recursive) neural nets in depth, as well as LSTMs and GRUs. Check out cs224d.stanford.edu for lecture notes and materials. The lectures are definitely being recorded, but I don't think they're publicly available yet.
He's good at demystifying a lot of things. Through his YouTube channel he's taught thousands of people (at least) how to get started with solving the Rubik's cube competitively (i.e. for the shortest time).
I've read over [1] and am currently watching [2], and I really can't get over a not insignificant bit of dissonance:
(a) He seems to be very intelligent. Kudos. But…
(b) How good an idea is it, really, to create software with these abilities? We're already making machines that can do most things that were once exclusive to humans. Pretty soon we'll be completely obsolete. Is that REALLY a good idea? To create "face detectors" (his words!)?
Our generation is going to get old and feeble and eventually die. If we have children, they'll completely supplant us.
Our relevance is ephemeral, but our influence will be lasting. Do we want to have a legacy of clinging to our personal feelings of importance, or of embracing the transience of our existence and nurturing our (intellectual) progeny?
[1]: http://cs231n.stanford.edu/
[2]: https://www.youtube.com/watch?v=xKt21ucdBY0
[3]: One of the earliest examples - Socher et al., "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (ICML 2011): http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf