Hacker News
Andrew Ng and the Quest for the New AI (wired.com)
147 points by ivoflipse on May 7, 2013 | 70 comments

I've said this before, but deep learning is terribly powerful precisely because you don't have to spend lots of time doing feature engineering. Multi-layer networks trained in semi-supervised, unsupervised and supervised fashions can now produce models that meet or beat the state-of-the-art hand-crafted models for speech, handwriting recognition, OCR, and object recognition. We are only just beginning to see what is possible with these sorts of techniques. I predict that within a few years' time we will see a huge renaissance in AI research, and in neural network research specifically, as these techniques are applied more broadly in industry. My startup is building some cool stuff around this technology, and I know there are hundreds like me out there. This is going to be a fun ride.

This has been said about neural nets twice already. Sadly, they never delivered.

There are still applications where e.g. random forests beat the crap out of all kinds of deep learning algorithms in (a) training time (b) predictive quality (c) prediction time.
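To make the comparison concrete: part of the appeal of random forests is how little setup and tuning they need on tabular data. A minimal sketch using scikit-learn (the dataset here is synthetic and purely illustrative, not a benchmark):

```python
# Random forest on synthetic tabular data: near-zero feature engineering,
# fast training, decent accuracy out of the box.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A made-up 20-feature binary classification problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

Training this takes a fraction of a second on a laptop, which is the point being made about training and prediction time.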

We should stop hyping this. I am a researcher working in deep learning myself, but the current deep learning hype is actually what makes me worry that I will have trouble getting a job because industry will be disappointed a third time.

Well, I think a big part of this is that we have finally gotten to where the algorithms plus the required computing power are starting to become more widely available. Cheap graphics cards or things like Intel's Phi, plus techniques like dropout to prevent overfitting, are enabling much more sophisticated things to be done in a reasonable wall time. Granted, multilayer neural networks aren't a free lunch that will solve everything, but large classes of problems are falling to these techniques all the time. We are also finding that neural networks scale very well to very large architectures, better than some of the other techniques. I understand we should be careful not to overhype, since previous excitement caused a mass exodus from this research before. I think, however, that people like Hinton were always right and that this was awesome stuff. We just couldn't take advantage of it, because we could never train for long enough and we hadn't yet learned how to do things efficiently.
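The dropout trick mentioned above is simple enough to sketch in a few lines. This is a minimal "inverted dropout" variant in NumPy (the function name and rescaling convention are my own, not from any particular paper):

```python
import numpy as np

def dropout_forward(activations, drop_prob=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero units during training, then rescale
    so the expected activation matches what the network sees at test time."""
    if not training or drop_prob == 0.0:
        return activations
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= drop_prob  # keep ~ (1 - p) units
    return activations * mask / (1.0 - drop_prob)

h = np.ones((4, 8))                                   # toy hidden activations
h_train = dropout_forward(h, drop_prob=0.5)           # some units zeroed, rest scaled to 2.0
h_eval = dropout_forward(h, drop_prob=0.5, training=False)  # identity at eval time
```

Randomly knocking out units forces the network not to rely on any single co-adapted feature, which is why it acts as a regularizer.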

Yes you are right.

Still, deep learning has done nothing more than classification right now.

What about predictive distributions, regression of complicated outputs (e.g. periodic data) and, most of all, heterogeneous inputs? Right: nothing impressive has been done in those areas, despite the huge number of practical problems.

Let's see if deep learning generalizes to those things. If it does (and I personally believe so), let's be happy. Until then, we still have to envy what Gaussian processes, gradient boosting machines and random forests can do that DL so far cannot.

Still, deep learning has done nothing more than classification right now. What about predictive distributions, regression of complicated outputs...

http://homepages.inf.ed.ac.uk/imurray2/pub/12deepai/ has predictive distributions from deep learning, passed on to time-series smoothing for articulatory inversion. It's a previous neural net approach made deep, and working better as a result.

(I agree that like any machine learning framework, neural networks have their strengths and weaknesses, and open challenges.)

Okay, I should have worded that differently. There is also a paper of Salakhutdinov learning a kernel for Gaussian processes. That'd account for that as well.

My point is (I did not really write that above) that deep learning does not stand unchallenged in this domain. Its dominance is so far "only" apparent in vision and audio classification tasks.

It's a little offtopic, but can you recommend a random forest implementation to be used from the command line, like Vowpal Wabbit does for linear learning? I like VW for its speed and ease of use.

Can you recommend any introductory papers in the area, or keywords to look for aside from the obvious?

If an article says that Andrew Ng is "the man at the center of deep learning", it's just not right. Geoffrey Hinton's and Yoshua Bengio's impact was at least as great as his, if not much greater.

There is a very quick reference to the person who inspired him, Jeff Hawkins, whose book is worth a read:


Edit: update link

Grok (formerly Numenta) has a slightly more technical white paper that goes into more detail on the actual algorithms from On Intelligence:


There are some subtle differences between HTMs and straight up deep learning, mainly the requirement for HTM data to be temporal and spatial.

I know Andrew used to sit on an advisory committee at Numenta, I don't know if he still does.

Link does not load CSS for me, but this does:


Thanks, had stripped the tracking codes off the mobile link, but it doesn't load right on desktop.

I just saw Jeff Hawkins give a talk and it was quite interesting. I was a bit worried, however, that he is basing his theory of intelligence on the human neocortex, while claiming to go after general principles.

This is guaranteed not to be terribly general, considering the many bits of matter on this planet that exhibit intelligence without a neocortex. By many, I mean ones that hugely outnumber humans.

So very interesting stuff, but not the answer that I think he wants it to be.

In "On Intelligence" he postulates that he's not necessarily going for a "human-like" intelligence or even a "life-like" one.

Basically he just wants something that's very good at recognizing patterns over time, which I can imagine the neocortex would be great at.

Though, he also references the thalamus and hippocampus in the books a lot, as very important parts of the brain to his framework. [http://en.wikipedia.org/wiki/Memory-prediction_framework#Neu...]

That book will change the way that you look at yourself.

We should stop trying to claim every new method is "like the brain". We don't have any clear understanding of how the brain works. One can be inspired by a particular and likely wrong cognitive theory, but one cannot truthfully say one is building "machines that can process data in much the same way the brain does" without a deeper, and currently unavailable, understanding of the functioning of the human brain.

I feel the same way, that researchers should stop trying to mimic the brain, but not because we don't understand the brain. While I think there are still several decades before we'll be able to have mind uploads, I also think a lot of people underestimate the quality of modern brain science. In any case, I have the same reason as Dijkstra for why I think mimicking the brain isn't that great an idea. In http://www.cs.utexas.edu/~EWD/transcriptions/EWD10xx/EWD1036... (really a great read to branch all sorts of thoughts off of) Dijkstra said, "The effort of using machines to mimic the human mind has always struck me as rather silly: I'd rather use them to mimic something better."

It's probably a harder problem, creating smarter-than-human intelligence on a machine, but research isn't as constrained by laws and ethics (they don't have to bemoan not being able to experiment with living human brains). I wish more people were active in the area.

They are trying to find THE ONE ALGORITHM that solves all A.I. problems. The brain has an implementation of it, but it is in wetware, hard to extract. Deep learning makes some pretty good approximations of the visual areas, though.

You assume that human cognition has an algorithmic component. Maybe you are right, but we still have a pretty shaky understanding of how neurons work, let alone how the brain works on a large scale. Who knows? Let's investigate by trying possibilities, but let's understand that we are still in a position of ignorance.

We have some probabilistic models that successfully predict various future states of the brain from past states or stimuli. This is not the same as understanding it, or even approximating it.

OT: His online Machine Learning class last year was great. He is the best professor I've ever had, and explains things so clearly that you understand them the first time. You are lucky if you ever get to work or study under him.

I second this! I have taken 7 Coursera classes, and most of them "lightly": just doing as much work as I needed to for passing the class, with just a few classes that I did put a lot of energy into. Andrew's class was in this second category: I kept taking the tests and tweaking the homework assignments over and over again until I got a 99.5% score in the class. His class was lots of fun and also very useful material. Recommended!!

Yes, unfortunately his partner, Daphne Koller, though very esteemed academically, made me feel very frustrated in the PGM class. Not the same great feeling of clarity.

In case this comment dissuades others from checking out Dr. Koller's class, I will add that I found her class challenging and well-constructed. I recommend it.

That's awesome to hear! I'm taking the Coursera class now, and it's been great so far. It just started a couple weeks ago, so it definitely isn't too late to join! https://www.coursera.org/course/ml

I wish Coursera followed the Udacity model. I always find out about these classes after they're already weeks in progress or over.

You can star any Coursera class to receive notifications whenever new sessions are announced.

Also, I believe it's still possible to join the current session (first assignment was due this weekend, but you can turn it in late with just a 20% penalty.)

I take new courses at any time, even if they have ended. Later on, when they recycle, I can do them all over again with ease.

Unless you're attached to getting a certificate of completion, you can pretty much follow the Udacity model. As long as the course hasn't finished, sign up and get around to the videos and assignments when you get to them. There isn't the same discussion forum interchange, and your homework isn't graded, but they don't drop the class from your list even if you do nothing during the run.

Note that you'll want to be careful to cache the materials offline if you do this, especially if you plan on "catching up" after the formal end date for the class. Some of the courses (notably the Princeton algorithms ones) disable access to the materials once the official course ends.

Check out class-central.com for a list of all current and upcoming classes. Coursera also lets you star a specific class and get notified if they are repeated in a new cycle.

I'm also taking the class. Just finished the logistic regression programming assignment. Great stuff. You can take those algorithms and kind of add a bit of magic to your software.
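For anyone curious what that assignment boils down to, batch gradient descent for logistic regression fits in a few lines of NumPy. This is a generic sketch of the technique, not the course's actual starter code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, lr=0.1, iters=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

# Toy separable data: label is 1 exactly when the single feature is positive.
X = np.linspace(-1, 1, 100).reshape(-1, 1)
y = (X.ravel() > 0).astype(float)
theta = logistic_gradient_descent(X, y)
preds = sigmoid(np.hstack([np.ones((100, 1)), X]) @ theta) >= 0.5
```

The "bit of magic" is that the same few lines generalize to any number of features: only the shape of `X` changes.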

Thanks very much for pointing this out. Joined the class now; hope I can catch up!

Keep it going! I attended the very first one (~75%), and then did it a second time last year. It was then easy to get a perfect 100% score, having done most of the exercises the first time around.

It is one of the best Coursera classes. I had a blast, and strongly recommend it. I decided to continue learning ML, mostly because of Prof. Ng.

Haha, me too, late to the party. I just joined. Frantically watching the video lectures, since the assignments are due today (hard deadline)!

Okay, now you got me scared. The hard deadlines for all my assignments are on July 8th 8:59AM (that's CEST, so it's probably July 7th in PST.)

I took this one https://www.coursera.org/course/ml and it started on Apr 22 and ends around July 1st I think. So not sure what course you are talking about.

Well, this is weird. I'm taking the exact same course and here's what I see for the first programming assignment: http://i.imgur.com/GyEwcUG.png (same with review questions)

Sorry, my bad. I was under that assumption since I saw that the hard deadline for the review questions was today, so I thought the same must hold for the programming assignment.

It’s great this year as well. Very enjoyable class.

I have to blame you, because now I had to take the course on Coursera! I always wanted to learn machine learning. Better late than never!

Hmm...I don't mean to be a skeptic, but I do not see any new theories here. Neural networking has been around for a long time, as have an abundance of theories and implementations around it...some people have gone so far as to build an actual brain replica (a digital version of the bio/analog thing). Neural networking is extremely powerful, but to be of any use, you need a lot of computing power. As it turns out, our brains are really good at massively parallel tasks like image and motion processing; these things can be done explicitly on a computer with some ease, but having a computer learn on its own from scratch how to do them is not easy.

You're correct in that neural networks as a model have been around for a long time. However, those networks were restricted to be shallow because backpropagation didn't work well on networks with many hidden layers. Only recently have researchers developed learning procedures that can learn these deep architectures efficiently, using some clever unsupervised learning techniques. And surprisingly, they are finding that these deep networks perform remarkably well, beating the state of the art in a number of benchmarks.

You are also right that you do need a lot of processing power to get neural networks to work well. But that is changing rapidly. Hinton's convolutional neural network holds the state of the art on the ImageNet benchmark, yet was trained using significantly less computing power than Google Brain. Regardless, you don't need Google-scale computation to get deep networks to work well. The point of Google Brain is to see how far one can push neural networks.

> Only recently have researchers developed learning procedures that can learn these deep architectures efficiently, using some clever unsupervised learning techniques.

Would you mind naming some of these techniques, if you're familiar with them? I'd like to take a deeper look.

Restricted Boltzmann Machines. https://www.youtube.com/watch?v=AyzOUbkUf3M

This video drives the point home, and is made by the author of this technique.
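As a rough illustration of what an RBM's learning rule looks like, here is a minimal CD-1 (contrastive divergence) weight update in NumPy. This is a toy sketch with made-up layer sizes and random "data", not Hinton's reference code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_h, b_v, lr=0.05):
    """One CD-1 step for a binary restricted Boltzmann machine."""
    # Positive phase: hidden probabilities/samples given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back down to a "reconstruction".
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Approximate gradient: data correlations minus reconstruction correlations.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    b_v += lr * (v0 - v1).mean(axis=0)
    return W, b_h, b_v

# Toy run: 6 visible units, 3 hidden units, 20 random binary vectors.
W = 0.01 * rng.standard_normal((6, 3))
b_h, b_v = np.zeros(3), np.zeros(6)
data = (rng.random((20, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b_h, b_v = cd1_update(data, W, b_h, b_v)
```

Stacking RBMs trained this way, layer by layer, is the greedy unsupervised pretraining that made deep networks trainable in the mid-2000s.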

I think the requirement for enormous computing power is exactly the point. Since not many have access to such computing power, Andrew Ng and folks at Google (incl. Jeff Dean!) built a large-scale neural network system called DistBelief (nice name!). This system allows programmers to build really large-scale neural networks without worrying about how to scale them, handle fault tolerance, etc. You can think of it as the MapReduce for neural networks. They demonstrated how a large-scale neural network can do interesting stuff on its own (e.g. recognising cats and human faces from unlabeled YouTube videos).


Neural networks of course "mimic" the way the brain works, but it stops there.

Not to mention that several modern AI techniques (like SVMs) have nothing to do with mimicking biology.

This looks like a reporter pushing his own agenda to make for a colorful story.

So I am a little confused. Where are we on the learning part of AI? As I understand it, the current consensus is to throw as much data as you can at your model (millions of cat pictures in this article's example) to make it pick up patterns, and yet we still claim that we are closing in on how the brain works? As far as I can tell, no human brain would need that many pictures to see a pattern. In fact, and this is probably more apparent in language, we humans tend to work with degenerate data and still end up with perfect models.

You may not have seen millions of cats, but how many images has your brain processed since you could first see? A five-year-old's brain has been trained on billions of images (along with many other simultaneous inputs).

Bear in mind that the neural network is starting at the same point as a newborn baby.

It seems the relationship between the brain and deep learning has evolved in such a way that the latter can help with insights into how the former works.

In this regard, I thought I would mention the extraordinarily simple and elegant talk by G. Hinton last summer: http://www.youtube.com/watch?v=DleXA5ADG78

It starts from a simple and clever improvement to an existing deep learning method and ends up with beautiful (and simple!) insights on why neurons are using simple spikes to communicate.

I thought the man behind the Google Brain was Ray Kurzweil (http://www.wired.com/business/2013/04/kurzweil-google-ai/).

Kurzweil is a hack. (Source: I have an MSc in Robotics and have read all his books)

Please think about what a source is. This isn't reddit.

Bullshit. There is nothing wrong with using personal experience for reference so long as it is properly cited.

It beats "This guy is wrong (source: random blog post that I didn't really read)", anyway.

Edit: To be clear, I don't love the Reddit snowclones, but there's nothing wrong with the sentiment behind "I'm a scholar in this field, and I think this guy is a hack."

>Bullshit. There is nothing wrong with using personal experience for reference so long as it is properly cited.

I agree with the sentiment, but "properly cited" suggests a bit more than a one line comment from a username with barely a handful of posts (generally pertaining to bitcoin and intermediate level networking certifications) on an anonymous website.

>It beats "This guy is wrong (source: random blog post that I didn't really read)", anyway.

I have to disagree. At least a random blog post presents the potential for useful information or a fully articulated opinion. What we have here is 4 words and an unsubstantiated appeal to authority.

> I agree with the sentiment, but "properly cited" suggests a bit more than a one line comment from a username with barely a handful of posts (generally pertaining to bitcoin and intermediate level networking certifications) on an anonymous website.

Isn't the purpose of citing claims precisely so others can effectively verify or discount their validity?

Citation: a post from an anonymous Internet user who claims to have a graduate degree. Take it for what it is. What's wrong with that?

> I have to disagree. At least a random blog post presents the potential for useful information or a fully articulated opinion. What we have here is 4 words and an unsubstantiated appeal to authority.

Conversely, it is far easier to engage in vigorous debate on HN than a random blog. I call it a wash.

No, I think the purpose of citing claims is to demonstrate that they have validity. People can disagree about what constitutes "validity", and a given citation is often inappropriate. But you shouldn't intentionally cite a source that shows your claim has no support; you just shouldn't make the claim. (At least, that's my impression of HN etiquette.)

Academic citation isn't an honor system. They're there so you can look them up. That said, your point is effectively what I was saying in that the statement "I have personal experience with this" taken in good faith is much more supportive than a link which does not actually support my point, yet the latter frequently passes without comment.

They're there so that you can look them up, but they're expected to justify your claim. It's not a scavenger hunt either.

So misapplying the term "source" and saying that one has a degree gives one enough credibility as a scholar to turn a dismissive and colloquial statement into a worthwhile contribution to a discussion about AI?

The point of sourcing your statements is to give the listener enough context to judge for themselves what your credibility is. It's not a layperson's responsibility to shut up, only to avoid misrepresenting themselves as an expert. It's the listener's responsibility to judge the credibility of those they listen to.

(Besides, my read of the comment that started this was that it's quite tongue-in-cheek. He was basically saying "Don't trust this any more than any other comment you read on the Internet.")

In this case the term "source" is obviously a rhetorical device, and we can avoid a lot of pedantry by noting that the poster was totally transparent about his reasoning, thus achieving the ends we want via means that aren't awful. P.S. Having a degree by definition implies some credibility as a scholar.

Seconded. I'm curious to hear your reasoning, and your actual arguments will sway me much more than your credentials anyway.

I thought the statement was obviously a playful joking way to point out this was his personal opinion.

Dude, I am sure you know more than me in the field of robotics and AI, but I don't think Google would spend so much money on a hack.

drcross is a hack. (Source: I'm a space fairy)

If you're going to throw broad statements around, some examples would really help lend you credibility.

Ah. The good old AI cycle.

Scientist: X can help us get full AI!

You: Why?

Scientist: Because of reason R.

You: But, reason R is a non sequitur...

More seriously, reasons similar to that for deep learning have been repeated multiple times in AI with failure (e.g. Thinking Machines).

I would suggest that these folks remain calm and build something on the scale of IBM's Watson using just deep learning.

Very interesting article; it makes me hopeful.

This might be slightly off-topic, but I'll try it here anyway: can anyone recommend any books/other learning resources for someone who wants to grasp neural networks?

I'm a CS student who finds the idea behind them really exciting, but I'm not sure where to get started.

Great and inspiring professor. Taking his ML course on Coursera and trying to follow his talks.
