[dupe] Hacker's guide to Neural Networks (karpathy.github.io)
236 points by eaxitect on March 23, 2015 | 61 comments



Some more good posts when it was submitted last year: https://news.ycombinator.com/item?id=8553307


Someone linked to a Coursera course on machine learning in that thread, but the URL seems to have changed. It's now found at https://www.coursera.org/course/ml

"Andrew Ng's Stanford course has been a godsend in laying out the mathematics of Machine Learning. Would be a good next step for anybody who was intrigued by this article."


I don't remember Andrew Ng's coursera class giving a satisfying introductory mathematical treatment. I remember frequent handwaving away of the calculus intuition in favor of just dropping the "shovel-ready" equations into our laps so that we could do the homeworks. If you wanted a better treatment you had to dig it up for yourself (which wasn't too hard if you visited the forums but still).

Has it been supplemented since then?


Visualizing the training of a CNN in the browser lets anybody quickly grasp the ideas behind it. A while ago, I was looking for things like "xyz applet" to get intuition about xyz. This is exactly what is going on here. Great respect to the author!


Nice introduction to backpropagation. The author understandably handles only a special case and avoids the case where an input reaches the output along many completely different paths. If this article is extended, it would be interesting to read about the case where that restriction doesn't hold (e.g. convnets with parameter sharing).
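
To make the multi-path case concrete, here is a minimal sketch (not from the article; `multiplyGate` and `squareWithGradient` are names made up for illustration). It assumes the simplest possible example, f(x) = x * x, where one variable feeds both inputs of a multiply gate. The multivariable chain rule says the contributions from each path are added:

```javascript
// Multiply gate: forward value plus local gradients w.r.t. each input.
function multiplyGate(a, b) {
  return { out: a * b, da: b, db: a };
}

// f(x) = x * x: the same variable feeds both inputs of the gate,
// so it reaches the output along two paths.
function squareWithGradient(x) {
  var g = multiplyGate(x, x);
  // Multivariable chain rule: add the contribution from each path.
  var dx = g.da + g.db;
  return { out: g.out, dx: dx };
}

var r = squareWithGradient(3);
console.log(r.out, r.dx); // 9 6
```

A shared weight in a convnet should work the same way in principle: each place the weight is used contributes a term, and the terms are summed.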


I also like how the backpropagation section starts out by immediately talking about how it is really just chain rule application.

The backwards-moving pattern of "backpropagation" is really just a side-effect of the derivative chain rule application order, but a lot of intro materials treat backprop as if it is some fancy thing specially-designed for neural nets. I suppose "compute the gradient of this function using basic vector calculus" just isn't sexy enough. I complain mostly because it took me a while to figure out whether backprop was exactly the same as gradient descent, or if there were subtle differences.
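
For anyone stuck on the same question, a rough sketch of the distinction, assuming a made-up one-weight model with squared error (`lossAndGradient` and `train` are hypothetical names, not from the article). Backprop is the part that computes the gradient via the chain rule; gradient descent is the separate loop that uses it:

```javascript
// One-weight model: loss(w) = (w * x - target)^2.
function lossAndGradient(w, x, target) {
  // Forward pass.
  var pred = w * x;
  var err = pred - target;
  var loss = err * err;
  // Backward pass (backprop): chain rule from the loss back to w.
  var dloss_derr = 2 * err;
  var dloss_dpred = dloss_derr * 1; // d(err)/d(pred) = 1
  var dloss_dw = dloss_dpred * x;   // d(pred)/dw = x
  return { loss: loss, dw: dloss_dw };
}

// Gradient descent: a separate loop that consumes the gradient.
function train(w, x, target, stepSize, steps) {
  for (var i = 0; i < steps; i++) {
    w -= stepSize * lossAndGradient(w, x, target).dw;
  }
  return w;
}

console.log(train(0, 2, 6, 0.1, 50)); // approaches 3, since 3 * 2 = 6
```

Swapping `train` for another optimizer would leave `lossAndGradient` untouched, which is one way to see that the two are different things.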


Another interesting point--that chain-rule gradient evaluation is essentially something called automatic differentiation:

https://en.wikipedia.org/wiki/Automatic_differentiation

which is really cool stuff and should be included more often when talking about backprop.
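
A toy illustration of the reverse-mode flavour, as a sketch only: `node`, `add`, `mul` and `backprop` are names invented here, and real autodiff libraries are far more careful. Each operation records how to pass its gradient back to its inputs, and one backward sweep applies the chain rule mechanically:

```javascript
// Each node stores a value, an accumulated gradient, and how to push
// its gradient back to the nodes it was computed from.
function node(value, parents, backward) {
  return {
    value: value,
    grad: 0,
    parents: parents || [],
    backward: backward || function () {}
  };
}

function add(a, b) {
  var out = node(a.value + b.value, [a, b], function () {
    a.grad += out.grad;
    b.grad += out.grad;
  });
  return out;
}

function mul(a, b) {
  var out = node(a.value * b.value, [a, b], function () {
    a.grad += b.value * out.grad;
    b.grad += a.value * out.grad;
  });
  return out;
}

// Visit nodes in reverse topological order so every node's gradient is
// complete before it is passed on to its parents.
function backprop(output) {
  var order = [];
  var seen = [];
  (function visit(n) {
    if (seen.indexOf(n) !== -1) return;
    seen.push(n);
    n.parents.forEach(visit);
    order.push(n);
  })(output);
  output.grad = 1;
  for (var i = order.length - 1; i >= 0; i--) order[i].backward();
}

var x = node(-2), y = node(5), z = node(-4);
var f = mul(add(x, y), z); // f = (x + y) * z
backprop(f);
console.log(f.value, x.grad, y.grad, z.grad); // -12 -4 -4 3
```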


For those playing along at home (like me): even though it's only to highlight the idea, there's a bug in the "Random Local Search" loop. It should be `x_try = best_x + tweak_amount...` best_x, not x. Same for y. I was wondering why it wasn't improving even with 1e10 iterations ;)
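
A sketch of the loop with that change applied, wrapped in a function so it can be rerun with different iteration counts. The wrapper name `randomLocalSearch` and starting `best_out` at the initial output are choices made here, not necessarily how the article sets it up:

```javascript
function forwardMultiplyGate(x, y) {
  return x * y;
}

function randomLocalSearch(x, y, tweak_amount, iterations) {
  var best_x = x, best_y = y;
  var best_out = forwardMultiplyGate(x, y);
  for (var k = 0; k < iterations; k++) {
    // Perturb the best point found so far, not the starting point.
    var x_try = best_x + tweak_amount * (Math.random() * 2 - 1);
    var y_try = best_y + tweak_amount * (Math.random() * 2 - 1);
    var out = forwardMultiplyGate(x_try, y_try);
    if (out > best_out) {
      // Keep the improvement and continue searching from here.
      best_out = out;
      best_x = x_try;
      best_y = y_try;
    }
  }
  return { x: best_x, y: best_y, out: best_out };
}

console.log(randomLocalSearch(-2, 3, 0.01, 100));
```

Because the search now walks away from the starting point, more iterations keep improving the result instead of plateauing near the initial value.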


After correction:

100 iterations give best_x = -1.76, best_y = 2.91, best_out = -5.15

10,000,000 iterations (less than 1 sec CPU) give best_x = 16,657, best_y = 16,662, best_out = 277,541,583

https://jsfiddle.net/fe068f0L/


Finally, there is a neural-networked way of learning neural networks! Thanks.


Also see non-NN topics in the field of AGI. NN is just the most popular mainstream approach lineage.


With all respect to this PhD candidate, if you're serious about this, you're not doing it in Javascript.

And if you're serious about learning it, you'd plop down the $149 for the Home edition of Matlab plus the $45 for the neural net toolbox. Or take a statistics class and grab the $99 student edition, which comes with the machine learning toolbox.

If you're gonna learn some shit, learn some shit. Calling Math.* to calculate rise over run in Javascript isn't where it's at.

Or maybe I'm just grumpy because I interviewed a Caltech CS grad last week who flat out did not know C. Never touched it in his studies. World's gone nuts.


With all due respect, Karpathy is serious -- and funnily enough he implemented deep learning (convolutional neural nets) in Javascript[1]. Admittedly he did it mainly for fun + browser demos (it's on the front page of Stanford's Convolutional Neural Networks for Visual Recognition course[2], which he is an instructor for), but for learning about the concepts, it's a good enough language.

When the students understand the concepts and what they're interested in, then we can nudge them to focus on choice of languages. Even then it's a dangerous domain as there's so many options.

Deep learning and want to focus on algorithms or only previously had high level experience? Python with Theano is a good bet and can take advantage of the CPU or GPU. Even Python + numpy.

Replicating existing work in the literature and want to take advantage of some of the existing libraries? Much of it is in Matlab.

Doing something crazy on the GPU? C for OpenCL ...

The list keeps going, but before getting to any or all of those details, the first step is understanding the concepts.

[1]: https://github.com/karpathy/convnetjs

[2]: http://cs231n.stanford.edu/


I'm going to echo Dijkstra here: It is practically impossible to teach good programming to students that have had a prior exposure to MATLAB: as potential programmers they are mentally mutilated beyond hope of regeneration.

Matlab is a very poor language in and of itself, but it doesn't help that the culture in Matlab (at least when I was last exposed to it) doesn't use revision control or write tests. And if you want to productionize a service you have to screw around with licenses on deployment machines, use the painful Matlab compiler, or just reimplement the relevant work. It's just a dead-end technical credit card, which may be good for some! Don't get me wrong. But for many many people it just makes a mess.


Just wondering, why Matlab?

Why not Python? Why not C? What about Lisp? And Haskell? Hey, did I mention D?

I can agree that JS is not the best-suited language for this task, but JS is extremely well known, so it is perfect for explaining something.

Matlab... Hmm, it's not so widely known...


Agreed. You use Matlab if you want to do work that will not be seen outside of academia. You use R if you have some hope of getting a real job after your postgraduate years, and you use C/C++ code with Python (or Lua if you like the Torch work) as your glue if you plan on being one of the rock-stars that people actually compete over.


If you've ever done anything with matrices (and are lazy) you should have encountered Matlab. And if you haven't, it's worth checking out. Matlab made my life so much easier!


My order of preference would be

Python, because I know it already, and it's my day-to-day language. As far as I know it has decent libraries.

GNU Octave, as it was recommended on the Coursera course.

R, as it is an open source alternative with no expensive licensing cost, so I can try it out at no risk.

Would I gain anything going to Matlab, as I can see advantages with the others being open source?


Octave is basically a poor man's Matlab. If you think Octave might be a good answer then Matlab is definitely a better answer if you can afford it. Matlab has better performance, a better IDE and a much bigger collection of libraries.

That being said, Matlab is a pretty poor language for anything beyond prototyping and testing proofs of concept. And when I'm doing anything that might turn out to become a longer-lived project I always choose Python and numpy.


Are you planning on getting work done? Or are you looking forward to digging into the source of yet another GNU tool that is stuck 20 years in the past?

"One of the biggest new features for the Octave 3.8.x release series is a graphical user interface. It is the one thing that users have requested most often over the last few years and now it is almost ready."

If you're planning on spending more than a few hours in the tool, consider spending the $200 on the one that's from this century.

I picked up Matlab late and wish I'd done so sooner. It's a REPL for higher-level thinking and quick experimentation. It makes some hard things very very simple and some simple things very hard. They've dropped the price and fixed a lot of shit in the last 5 years. It's replaced a lot of goofing around in Python and Excel for me.

Edit: I have a love/hate relationship with R and more of a love relationship with Python but they don't do ANYTHING like what Matlab does.


Isn't Octave built with Matlab compatibility in mind? And Octave 3.8.0 was released back in December 2013 - so the GUI is already present, and from this decade. It's a great open source alternative to Matlab in my eyes for anyone that does not want to stump up $200.

http://en.wikibooks.org/wiki/MATLAB_Programming/Differences_...


Since you linked to several thousand words explaining the differences between the languages, as well as the following quote: "The plan is for the GUI to be considered stable with version 4.0" then I'll assume you haven't used either product. I assure you that I did not spend $4k just to insult the open source community.

Love open source, glad they're doing it, much love to the devs -- and life is just too fucking short. If you're broke or screwing around--ok, maybe. If you're sitting in it more than an hour a week, maybe Photoshop is worth the $10/month vs GIMP, dig?

And if you're in school and get it for free or are interested in getting up to speed with the state of the art in machine learning and qualify for the personal license at under 200 bucks? I can't think of a better tool.


For the record, Photoshop > (GIMP)x10000 IMHO. (Just finally glad I found someone that may agree)

I've not used Matlab but I've used Octave (during the ml class from Andrew Ng) and it wasn't too awful. Occasionally very frustrating though so if Matlab solves those frustrations I could see it being worth it if you're going to spend time in it!


Several thousand? Evidently you like to exaggerate. The content is circa 1000 words - but the point it makes in relation to this context is delivered in the first sentence.

I've used both before. Matlab in my CompSci BS and MS and Octave in my professional career.

I'm just making the point that your critique of Octave seems a little harsh, and to the layman skews the reality - that Octave is an open source alternative. Yes, a little rough around the edges if you intend to use the visual aspects of the tool, but for matrix multiplication etc. it has its use cases.


The page you link is actually a shortened form and links to the real list: http://wiki.octave.org/FAQ#How_is_Octave_different_from_Matl... which comes in at 2,662 words.

The parts you say are good about Octave are EXACTLY the stuff I use Python and R for. If you want to use a Matlab library or toolkit, maybe cut and paste some code from a paper, do a quick 'what if' thing -- Octave just doesn't cut it.

(My point in speaking up here isn't to diss Octave it's to re-center the reality around Matlab.)


I've used Scilab/Xcos a bit for continuous simulation. I know there is the ANN toolbox but I haven't used it. It's a pretty usable Free "Matlab clone" but I usually only see GNU Octave mentioned. Just tossing it out there, as I said I mostly use it to draw pretty Lorenz attractors ;)


What about Mathematica?


Agreed, I'm surprised people aren't talking more about Mathematica. I feel like it is a way nicer experience, the user community is great and the built-in computable data is awesome. It's expensive, so it may not make sense for messing around, but for real prototyping and deployable APIs, it's worth it.


Matlab is popular mostly because there's a lot of code in place to do optimization and to compare to past work. I personally stay away from it these days because it's pretty horrible for anything but that, e.g. GUIs.


You can't have taken a college-level course in applied mathematics without touching Matlab. Maybe not all statistics courses use it, but we definitely used it when we were learning about newton elimination, linear/poly regression and other things where you solve mathematical equations that aren't analytically solvable.

And yes, that was a mandatory course for CompSci. You mean people get through CS without touching Matlab and such? That's bonkers.


You definitely can. I only ever touched Matlab in a course on Signal Theory (wavelets, Radon transform, Fourier...). All other courses that involved solving something were programmed from the ground up in C (off the top of my head this included Numerical Methods, Numerical Analysis, Dynamical Systems). In more algebra-focused classes we used gp/pari or Mathematica.


Call me silly, but implementing Numerical Methods/Analysis from the ground up in C sounds like something that would put me in a fetal position in a dark corner somewhere. Why would anyone do that to their students?

Did they also make you implement operating systems in assembler?


Well at Carnegie Mellon they did. That's why I wanna know how some punk got his Caltech paper stamped without even learning C.

He did seem sad that SpaceX wouldn't touch him. Apparently you need more than friends at the Jet Propulsion Laboratory and an ability to talk endlessly about Haskell to send shit to Mars. So that's good news at least.


I'm a mathematician, not CompSci (but I taught CS students and they had to also implement some things in C on their second semester.) So, no operating systems. Tough stuff was taken for granted when I was in the University.


There seem to be two schools of thought when it comes to teaching numerics. You either teach how to effectively implement the algorithms or you teach how to effectively select and apply the algorithms. It seems to be tied to whether the computational science department comes out of the Computer or Math department.

I'm personally of the opinion that knowing how to select and apply the right algorithm is far more important to learn for undergraduates. Implementing numerical algorithms correctly and efficiently is serious high level PhD territory, and if you're not going to learn how to do it right you're much better off not doing it at all and leaving it to the experts.


If you've never implemented a bootloader and some trampolines or interrupt handlers in assembler, you didn't really learn about operating systems.


Only after we built the CPU pipeline using nand gates.


Applied math grad student here. C/C++ or Fortran was required for my computational/numerical courses. Part of this was because of the emphasis on numerical stability and complexity.

I was, however, surprised starting out to find out how many of my fellow students had never programmed in a compiled language before - different focuses during undergrad, I suppose.


Another applied math grad here. Our intro numerical course was taught in Matlab since it made it a lot easier to focus on stuff like numerical stability and complexity without getting stuck in the details of teaching people C. Everybody who ever uses a computer to do any sort of math needs to know about concepts like stability, truncation, accuracy and precision; not everybody needs to know C or Fortran. It seems unnecessary to me to tie the learning of the one to the learning of the other.

It wasn't until the intermediate courses that Fortran (77 and 95), C and assembly were taught. And even there Matlab was used to illustrate most higher level concepts.


Well, I had some classes you mentioned (statistics, numerical methods, linear regressions) and never touched matlab.


The best reasons for doing stuff with neural networks in JavaScript are that:

a) Lots of people are being introduced to programming via something related to the web; if neural networks are 'hidden' in a technology that they won't be using for a long time, a whole lot of people won't get into using neural networks.

b) JavaScript's everywhere - so if something can be shown in JavaScript, a lot more people can see it. And they can try it out themselves, tinker with it, all within software that their computer came with. No $149 software package required.

That said, Andrej Karpathy is a pretty smart guy, and so even if I had serious reservations against using JavaScript for this kind of thing, I'd probably be taking whatever he's doing seriously :)


1) JavaScript is obviously fine for an introduction and pedagogical benefits trump all other concerns so the author of the post picked a very good tool for the job at hand.

2) If you want to be grumpy and complain about "what has the world come to": I'd rather use Fortran than C for this specific domain. I'd also rather use Fortran than Matlab.

If it's a bigger project and not the typical "academic toy example" I'd very much prefer Python+NumPy over Matlab (if I was a better Fortran programmer I'd use that but I lack software engineering expertise with Fortran). I can very much live with the performance loss (which isn't even major) if I gain engineering benefits. You can always measure + optimize later if need be.

+I'd rather not waste time thinking about licensing


Is your complaint specifically that he is writing in JavaScript, or that he is starting from scratch and not relying on libraries to use more sophisticated algorithms?

Because both seem like odd complaints for a tutorial like this. No, you're not going to do real world work this way, but you're not going to write a game in shadertoy either.

The latter shouldn't be replaced with an apology note for wasting everyone's time and link to UE4.


Ya, you sound a bit grumpy. I think Karpathy wrote this blog post to a) play with JavaScript and b) reach a new audience.

If you want more rigor, check out the course he co-created: http://vision.stanford.edu/teaching/cs231n/


No, here's where he plays with JavaScript: http://cs.stanford.edu/people/karpathy/convnetjs/

This is "on web where all books should be" and claims to "contain very little math" despite being nothing but an endless list of formulae and absurd javascript math that looks like it was ripped out of a BASIC language manual for "how to draw circles" in text mode circa 1978.

Nothing wrong with JavaScript. And maybe this prof is working out some issues his students are struggling with.

But this ain't exactly the Feynman lectures:

    var x_try = x + tweak_amount * (Math.random() * 2 - 1); // tweak x a bit
    var y_try = y + tweak_amount * (Math.random() * 2 - 1); // tweak y a bit


Being proprietary closed source software, Matlab is unsuitable for most practical uses; but if it's good enough for your "serious" tasks you are welcome to use it. On the other hand, recommending Matlab for interactive web demos, which require using Javascript and redistributing your machine learning "toolbox", is absurd, not merely "grumpy".


Maybe he wanted to find common ground as an introduction to something that is quite intimidating to a lot of people? I see the whole point of 'introductions to' like this is to let people decide whether they are interested or not and then it's up to them to take it further, whether that be matlab or C or whatever.


Exactly.


> Or maybe I'm just grumpy because I interviewed a Caltech CS grad last week who flat out did not know C. Never touched it in his studies. World's gone nuts.

I've heard stories about some students in France who do a whole year and a half of C projects before anything else. Maybe these are just far tales of unicorns.


Former (French) engineering student here: indeed we (at least in my school) started learning programming with C ! Matlab was also used, but in more specific courses (signal processing, control theory with Simulink...).


Not fairy tales. There are at least two major schools in Paris, each with a throughput of 1000 students / year, that do it this way: Epitech.net (7000$/year) and 42.fr (no fees).


Strange. The operating systems class when I attended (late 2000s) required C. I wonder what they replaced it with?


More C => more Good. Balance has been restored.


You cannot be serious about all possible research trends. CNNs are one of the possibilities that you can spend time on. The faster you get the key ideas, the faster you can move on to something else, or dig deeper as you propose.


Sounds like tacos shouldn't be interviewing anyone as it's clear the lack of C knowledge surely makes any candidate an incapable one.


Job was for C++ implementing signal processing code with SIMD optimizations. He couldn't unpack RGBA from a long.
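
For readers wondering what that looks like, a minimal sketch in JavaScript (the language used elsewhere in this thread) rather than the C++ the job called for. The byte order is an assumption made for illustration (0xRRGGBBAA, red in the most significant byte), and `unpackRGBA` / `packRGBA` are made-up names:

```javascript
// Assumed layout: 0xRRGGBBAA, red in the most significant byte.
function unpackRGBA(pixel) {
  return {
    r: (pixel >>> 24) & 0xff, // shift the wanted byte down, then mask it
    g: (pixel >>> 16) & 0xff,
    b: (pixel >>> 8) & 0xff,
    a: pixel & 0xff
  };
}

// Inverse operation; ">>> 0" turns the signed 32-bit result of the
// bitwise ORs back into an unsigned value.
function packRGBA(r, g, b, a) {
  return ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

var p = unpackRGBA(0x11223344);
console.log(p.r, p.g, p.b, p.a); // 17 34 51 68
```

The same shift-and-mask idea carries over to C or C++ with an unsigned 32-bit integer type.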

I've also seen Matlab shitheads who couldn't do it in C either, so it goes both ways.

The point of mentioning that wasn't to "neener neener" a recent grad it was to open the greater discussion you see here regarding CS trends.

Caltech is brutal and spending five years there without touching C seems... odd. How can you spend half a decade driving past the Jet Propulsion Laboratory on the way to Trader Joe's and not pick up the skills to get an interview with SpaceX?


You do sound grumpy. But I'd be grumpy too if I was faced with a CS grad who didn't know at least how to read C and write basic functions.

Actually, I think a CS grad should be able to read any language as long as it's not completely bonkers like J or brainfuck or malbolge or something. But maybe I'm weird.


As much as I want it to be true, among Lisp, Erlang, Prolog, OCaml/Haskell and Smalltalk (those are non-mainstream and definitely not bonkers languages) it's almost impossible for an average CS grad to be able to read them all.


Is it?

Sure, you might not understand all the intricacies of each, but I think the basic idea should be understandable when expressed in any of those languages. But I'm biased, I was exposed to Prolog in college and I learned Lisp and Haskell on my own (to an extent) so it's really difficult for me to judge what is and isn't readable.

But if you can read: foo.map(function (d) { return d+1; }), you should be able to read (map foo (+1)) imho.


Among all of those, I've never used Prolog, and only had one summer project's experience in Erlang, and a tiny bit of coursework in Smalltalk. But I'm pretty sure that other than Prolog, I can figure out what's going on in a program in any of those languages.

Mind, at the time I graduated undergrad, I had no real idea about Smalltalk. But that was 4 years ago.


12 years' experience here. I think I could have a guess at what Lisp code is doing, likewise with OCaml / Haskell. I have never written anything in any of them, but have read a couple of tutorials.

Never seen Erlang, Prolog or Smalltalk code.

I am capable of deciphering Perl code as long as it's not too obfuscated.



