Greedy, Brittle, Opaque, and Shallow: The Downsides to Deep Learning (wired.com)
312 points by monsieurpng 40 days ago | 149 comments

Brittle and opaque are real problems. The brittleness seems to be associated with systems which put decision surfaces too close to points in some dimension. That's what makes strange classifier errors possible.[1] (This is also why using raw machine learning for automatic driving is a terrible idea. It will do well, until it does something totally wrong for no clear reason.)

Opacity comes from what you get after training - a big matrix of weights. Now what? "Deep Dream" was an attempt to visualize what a neural net used for image classification was doing, by generating high-scoring images from the net. That helped some. Not enough.

The ceiling for machine learning may be in sight, though. Each generation of AI goes through this. There's a new big idea, it works on some problems, enthusiasts are saying "strong AI real soon now", and then it hits a ceiling. We've been through that with search, the General Problem Solver, perceptrons, hill-climbing, and expert systems. Each hit a ceiling after a few years. (I went through Stanford just as the expert system boom hit its rather low ceiling. Not a happy time there.)

The difference this time is that machine learning works well enough to power major industries. So AI now has products, people, and money. The previous generations of AI never got beyond a few small research groups in and around major universities. With more people working on it, the time to the next big idea should be shorter.

[1] https://blog.openai.com/adversarial-example-research/

then it hits a ceiling

Or apparent ceiling. Sometimes ideas haven't really "ceilinged out" so to speak, but they've run into an artificial barrier due to available compute resources at a specific point in time, etc.

Somebody (Geoffrey Hinton, Andrew Ng, I don't remember exactly) made the point that neural networks were in a pretty moribund place, until the confluence of three things came along: more compute power, more data, and algorithmic improvements. Given that, one wonders what other older ideas are waiting to be resurrected by a similar confluence.

> artificial barrier due to compute resources

That's not really an artificial barrier. The program which encodes General AI can already be discovered by brute force, given sufficient compute resources.

Let's assume that evaluating a program takes one microsecond and one atom, and that we can parallelize the search across every atom in the observable universe. If the AGI program is 500 bits long, it will take about 10^57 years to find by brute-force search.
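
For anyone who wants to check the arithmetic, a quick back-of-the-envelope in Python, using the same rough inputs as above (500-bit program, one evaluation per microsecond per atom, ~10^80 atoms):

    programs   = 2 ** 500          # candidate 500-bit programs
    atoms      = 10 ** 80          # rough count of atoms in the observable universe
    per_second = 10 ** 6           # one evaluation per microsecond, per atom
    seconds    = programs / (atoms * per_second)
    years      = seconds / (3600 * 24 * 365)
    print(f"{years:.1e}")          # ~1e57 years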

There's a difference between ideas for which there is not enough compute right now and ideas that are computationally intractable according to our knowledge of the physical universe.

Even more fun, I learned from Bruce Schneier's blog post [1] that it takes a minimum of 4.4E-16 ergs of energy for an ideal computer running at 3.2K to "flip a bit" in an ideal setting.

Wolfram says there are roughly 2E76 ergs of mass-energy in the universe. [2]

So even if all matter and energy in the universe were put together to power a computer that did nothing but count upwards from zero, there's only enough matter and energy in our universe for the computer to count to about 10^91, or roughly 2^304 [3]. That's not even counting trying to use each resulting set of bits for anything.
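
A quick sanity check of that count, using just the two figures above:

    import math

    erg_per_flip = 4.4e-16   # ideal bit flip near the cosmic background temperature (Schneier's figure)
    ergs_total   = 2e76      # Wolfram's rough mass-energy of the universe, in ergs
    flips = ergs_total / erg_per_flip
    print(f"{flips:.1e}")        # ~4.5e91 total bit flips
    print(math.log2(flips))      # ~304, i.e. roughly 2^304 as above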

So I don't think we'd be able to design too many useful computers by flipping bits until a useful algorithm comes out (at least not in this universe.)

[1] https://www.schneier.com/blog/archives/2009/09/the_doghouse_...

[2] http://www.wolframalpha.com/input/?i=mass-energy+in+the+know...

[3] http://www.wolframalpha.com/input/?i=log2(mass-energy+of+the...)

So, given the Shannon number, even the universe itself can't be used to brute force a game of chess.

Simulating evolution could also be called brute force. Doing that at a scale comparable to our own evolutionary history may be way out of range for present computers, but not for foreseeable ones. (http://e-drexler.com/d/06/00/Nanosystems/ch1/chapter1_1.html) Likely we'll get there sooner in some not-quite-so-brutish way -- I'd hope so for ethical reasons -- but that way is available.

There is no reason to assume that evolution always brings about intelligent life. Anything "higher" on the ladder than bacteria could very well be an aberration that only arose due to very specific circumstances on prehistoric Earth.

I think we have more than a sample of one instance to look at, though. The distinctive thing about humans is more that we operate collectively according to social rules than pure intelligence.*

Due to evolution, the coordination of individual entities has happened repeatedly. Individual molecules became living cells, some suggest living cells merged into cells with mitochondria, viruses and cells merged, single cells merged into multicellular organisms, insects organized into social colonies, and now primates, i.e. humans have also organized into social colonies.

This doesn't prove evolution always creates social multicellular life, but it means (IMO) there is a clear pattern that provides a reason to assume it does, short of specific reasoning or evidence about a barrier: why should the bacterial level be unique, and which bacteria would be the highest level?

*I don't think you can separate intelligence from a social context. What we think of as intelligence is closely related to things like the willingness to give up current rewards for long-term ones. But it's not "smart" to do that unless you have justified faith that you live in a stable society where people are trusted to follow social rules.

There are a couple of steps that are most plausibly hard: the origin of life up to just before bacteria split from archaea, and the origin of eukaryotes, which happened just once after something like 2 billion years. But I had in mind something cheaper, more like scaling up the idea of http://www.karlsims.com/evolved-virtual-creatures.html. Intelligence has evolved in animals as unrelated as octopi and us.

In an artificial setting you can manipulate the conditions to favour whatever you want.

FYI, we know what the theoretical physical bounds are:


The program which encodes General AI can already be discovered by brute force, given sufficient compute resources.

Even if that's true (and I don't think anybody can prove that point one way or the other just yet), who's to say that we have "sufficient compute resources" right now? That's what I'm getting at... an idea might be useful / valid in principle, but just not completely usable today because we don't have enough compute cycles available. The analogy is to ANN's which were limited by the computers available in the 80's and 90's, but became markedly more useful with modern CPU's and - even more so - with the advent of modern GPU's.

The point is that the barrier is REAL. It isn't "apparent" or "artificial". It is a fact about current computing resources.

the barrier is REAL. It isn't "apparent" or "artificial". It is a fact about current computing resources.

Right. And let me add this: what I was getting at is that the ceiling is not an innate part of the approach itself. Perhaps I should have said "external" instead of "artificial". Anyway, yes, the point is about current computing resources vs computing resources (and other factors) at some indeterminate point in the future.

How exactly? Do you have a codeable definition of General AI?

It's a truism and as such it doesn't really say anything meaningful. The limiting factor is "sufficient compute resources". Assuming that a general artificial intelligence is possible and can be coded as a program, then whatever that program is could be created by something as dumb as a random number generator.

Of course, given that it seems obvious that any program which encodes a general artificial intelligence will be complex, I expect it would take longer than the expected life of the universe to generate it with a random number generator using all of the compute resources available on the planet. So it's theoretically possible but practically impossible, which makes the OP's statement essentially pedantic.

  while (passes_turing_test(general_ai) == false) {
      general_ai = generate_random_code();
  }

If you can implement passes_turing_test() in a way that is testable without human intervention... that would be a pretty great advance. I'm pretty sure we're not anywhere near there.

The fundamental problem, as I understand it, is that none of us really understand what consciousness is - we all "know it when we see it" - which is a test that is not so amenable to automation. Perhaps if we had enough examples of different kinds of consciousness, we could set up a ML process :)

With regards to what consciousness is, Peter Watts opened an interesting rabbit hole that I have yet to dive into. His discussion about his own real-world inquiries into consciousness is appended at the end of his fiction work "Blindsight" [1]. Search for the heading "Sentience/Intelligence", and read the first couple paragraphs there.

Watts suspects from his readings that sentience makes up the majority of our day-to-day physical and sensorium activities and sapience (consciousness) is a very thin self-aware layer on top of that. If his sources that he's reading are correct, then we can go a long way implementing practical and profitable applications with "just" ML, but with less focus on trying to reach AGI. There are more radical suggestions within the decidedly philosophical side of the cognitive science community that seem to wonder if consciousness isn't simply a very sophisticated, highly complex ML'ish pattern-matching illusion, and we're all reductively representable as deterministic entities somehow.

All this postulating is too far away from the testable engineering/tinkering/science'ing I prefer to inform myself about, so I'm not quite sure what to make of it, other than to observe that ML hasn't made much of a dent in the sentience side, as we're nowhere near a general-purpose robot that cleans floors, does the laundry, and simultaneously avoids squishing the house cat, none of which requires sapience.

[1] http://www.rifters.com/real/Blindsight.htm

"none of us really understand what consciousness is - we all "know it when we see it""

I don't think that's as solid an assumption as your comment makes it sound. We all know magic when we see it, yet it doesn't exist. Many of us recognise when something "feels pain" even if it's a cuddly toy, or recognise "evil presence in the room" when it's gas poisoning or sleep paralysis or biped-shaped-shadows and low frequency sound.

Many of us think we see a level of intelligence where there is none, in perception of God or Evolution "designing" creatures or "coming up with clever solutions". Many refuse to recognise some level of intelligence in humans in cases where they use good reasoning but get the wrong answer, or do something dumb from using the wrong reasoning. And we have ethical/medical arguments about whether people in comas and similar locked-in syndromes are conscious and aware or not, and whether any plants are.

We have continual newspaper headlines along the lines of "scientists recognise {Dolphins, Elephants, $animal} are more intelligent than previously thought" - if we simply "know it when we see it", it should be pretty clear which creatures are conscious and which aren't.

Certainly we can recognise people-like-us as conscious, but to get "enough examples of different kinds of consciousness" we'd have to be able to know it when we see it, and I'm not sure we do.

Haha. Your position, then, is that recognising consciousness is so hard that we don't even know it when we see it?

That's fair, and in some ways I agree. But I think a certain amount of that is implicit in "I know it when I see it." The phrase implies that there may be significant divergence of opinion.

Some relevant papers:

An Approximation of the Universal Intelligence Measure


Measuring universal intelligence: Towards an anytime intelligence test


I'd probably do it with a writing sample. Say, Macbeth. Just execute randomly generated bytes until they write Macbeth on their own. Of course it could pass on a hundred-monkeys-type fluke, so after it passes, you submit it to a traditional Turing test.

Given sufficient computing resources, this should yield strong AI in just a few minutes.

You could, of course, do machine checks of some subset of "can generate syntactically correct English" - and that would cut out a whole lot more bad programs than 'can you write Macbeth.'

But even when you pass that test, you will have a nearly infinite number of Markov-chain generators and similar programs to sort through.

My point is that if you want to solve a problem by generating random code, you need a machine-executable fitness test.
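
A toy sketch of what I mean: the "program" here is just a tiny random arithmetic expression, and the fitness test checks a few input/output pairs. Nothing about this is a serious search; the point is only that the test runs with no human in the loop.

    import random

    def fitness(candidate):
        # Machine-executable test: does the expression compute x*x for a few inputs?
        try:
            return all(eval(candidate, {"x": x}) == x * x for x in (2, 3, 4))
        except Exception:
            return False

    alphabet = "x*+-0123456789"
    attempts = 0
    while True:
        attempts += 1
        candidate = "".join(random.choice(alphabet) for _ in range(3))
        if fitness(candidate):
            break
    print(attempts, repr(candidate))   # usually finds "x*x" after a few thousand tries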

We don't have a machine-executable test for consciousness. The Turing test relies on the intuition of humans, and we have a very finite supply of that.

> even when you pass that test, you will have a nearly infinite number of Markov-chain generators and similar programs to sort through

Not really relevant to your point, but Markov chain generators aren't able to generate syntactically correct English.

(Or, they are able to generate it, but they aren't able to stop themselves from generating syntactically invalid English, and almost all of their output will in fact be invalid.)
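
For anyone who hasn't played with one, a word-level Markov chain generator is only a handful of lines. It strings together locally plausible word pairs with nothing enforcing global syntax, which is the point above (toy corpus, illustration only):

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    table = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        table[a].append(b)          # successors observed for each word

    word, out = "the", ["the"]
    for _ in range(8):
        nxt = table[word]
        word = random.choice(nxt) if nxt else random.choice(corpus)
        out.append(word)
    print(" ".join(out))            # e.g. "the dog sat on the mat and the cat"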

I think writing Macbeth isn't a good indicator of anything except being Shakespeare. Asking it a few questions about some general concepts like logic, math, and patterns (relationships in Macbeth, maybe?) would probably yield results that are much more likely to pass the test.

I disagree on all counts. The first program that would be generated by such a procedure would very likely be the constant program that happens to always respond with the correct answers to your test. It depends how you randomly generate code, of course, but I doubt the complexity of a "general AI program" would ever be smaller than the complexity of the constant program which by luck returns the correct answers to your test.

Generate the math tests randomly, then? That prevents constants.

Generating Macbeth is a test most people here couldn’t pass, myself included...

you can still use generating Macbeth as a good indicator, just by providing some inputs and constraints.

The input is a Macbeth outline/play summary (e.g. https://www.cliffsnotes.com/literature/m/macbeth/play-summar... ); the constraints are "in the style of King Lear and Othello"

Then compare the produced text to Macbeth.

> Somebody (Geoffrey Hinton, Andrew Ng, I don't remember exactly) made the point that neural networks were in a pretty moribund place, until the confluence of three things came along: more compute power, more data, and algorithmic improvements. Given that, one wonders what other older ideas are waiting to be resurrected by a similar confluence.

Andrew Ng says something like that on his deep learning Coursera course.

Also, Animats mentions perceptrons. A neural network is a generalized (combination of) perceptron(s) on steroids. That ceiling was basically removed again with the advent of the above-mentioned improvements. Similarly, the gradient descent used to optimize a neural network resembles hill climbing.
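
To make the resemblance concrete, here's a toy 1-D sketch (not a neural net, just a quadratic bowl): hill climbing keeps random perturbations that improve the objective, while gradient descent uses the gradient to pick the step directly.

    import random

    def f(x):                 # objective to minimize
        return (x - 3.0) ** 2

    def grad(x):              # its derivative
        return 2.0 * (x - 3.0)

    x = 0.0                   # hill climbing (well, descent): random nudge, keep it if better
    for _ in range(1000):
        cand = x + random.uniform(-0.1, 0.1)
        if f(cand) < f(x):
            x = cand

    y = 0.0                   # gradient descent: step against the gradient
    for _ in range(1000):
        y -= 0.1 * grad(y)

    print(round(x, 3), round(y, 3))   # both end up near 3.0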

> This is also why using raw machine learning for automatic driving is a terrible idea.

Agreed. Train a billion-parameter model on million-feature sensor inputs from "normal" conditions, and it will drive really well until its massive over-fitting runs into slightly unusual circumstances, like a tumbleweed rolling across the road. Then it will do something completely unpredictable, and people will probably die. ML plus pervasive surveillance can automate a lot of routine work, but it has a serious problem with outliers.

"like a tumbleweed rolling across the road. Then it will do something completely unpredictable, and people will probably die."*

<sarcasm> Seems a small price to pay for progress. We lost 37,000 people to auto accidents in 2016. I wouldn't be surprised if one or more of those deaths involved a mis-identified tumbleweed.</sarcasm>

I find your comment a little strange. First you say something interesting about Deep Dream being an early attempt at better understanding of vision networks but that it's not enough. Then you say the end is in sight?

If you just take the visualization part, we have a massive amount of work to do. Understanding how these networks solve problems in sophisticated ways will motivate new advances to mitigate shortcomings. It is a software problem that can have huge benefits. See the paper Visualizing Loss Functions For Neural Networks as an example of the incredible insight to be gained with better visualization.

The intuition of loss functions hasn't changed much in the last several decades. If you have a local optimizer it needs to be able to find its way down further than it already is. Anything you can do to remove junk configurations from even being considered during optimization will help. But knowing when that is happening has been really hard because it's just so computationally intensive to generate plots like in that paper.
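
A much cheaper cousin of those plots is the 1-D slice: evaluate the loss along a straight line between two weight vectors (say, initial and trained). The expensive part is doing this over real networks and in 2-D, as in the papers; here is only the basic idea, on a toy linear model where the slice is boringly convex:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

    def loss(w):
        return np.mean((X @ w - y) ** 2)

    w_init  = rng.normal(size=10)
    w_final = np.linalg.lstsq(X, y, rcond=None)[0]   # stand-in for "trained" weights

    for alpha in np.linspace(0.0, 1.0, 11):
        w = (1 - alpha) * w_init + alpha * w_final
        print(f"alpha={alpha:.1f}  loss={loss(w):.3f}")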


When they say the end is in sight, they aren't referring to the problems with neural networks. In fact, they mean the limits of what neural networks can do are in sight.

I haven't seen anyone saying that the current trend of deep learning using neural networks will result in strong AI soon.

Except for Elon Musk, who, with all due respect, I believe does not count as an informed person in DL.

It really bothers me that he (who is clearly an excellent entrepreneur and human being in general) makes such strong statements on a topic he is clearly not an expert in, and is so vocal about it too. Given the amount of influence and number of online followers he has, I find this irresponsible.

Elon's training is in physics and economics. He understands linear algebra and programming perfectly well. He also understands that policy (economic or otherwise) won't be developed unless people put it on the government's plate. And he understands that government tends to be reactive - they are invariably way behind the curve in governing tech. So he's using his bully pulpit to get the troglodytes (a concept that maps nicely to many Republicans) and luddites (a concept that maps nicely to a lot of Democrats) in Congress to start working on the problem (for context, Ted Cruz of all people runs the Senate science and technology committee).

I'm worried about machine learning taking over corporate management. Optimizing for shareholder value is a goal a machine learning system can work on.

Yup. Machine learning driving the optimization loop, with humans working as smart sensors and actuators, responding to high-level commands. That's a pretty powerful AI right there.

Did he really say that specifically about the "current trend of deep learning using neural networks"? Surely he would have been speaking more generally about AI?

See the references I give below; if he is talking about general trends in AI and is giving an estimate of 7 or 8 years for the emergence of AGI, what is he basing it on? Is he basing the estimate on unseen breakthroughs (i.e. unknown unknowns) to occur within that 7-or-8-year frame, or is he talking about current DL techniques pushed to their limits? For the former, nobody can make a reasonable estimate (the nature of unknown unknowns), and for the latter, most experts in the field seem to agree that current techniques do not lead to AGI. (Have you seen how hard it is to do things like visual question answering (VQA) or text summarization tasks? These are much, much simpler than AGI but, like AGI, do not lend themselves easily to supervised learning.)

He talks about AI in those links, and does not mention specific approaches like DL etc. He's been around long enough to see the different approaches to AI, so as not to confuse it with a specific technology. I doubt it would even occur to him to think AI=DL.

Many people have made estimates for human level AI, without relying on specific technologies. e.g. Vernor Vinge, Ray Kurzweil, Bill Joy. For example, the lower part of Vinge's range included the 2020's - for his prediction made a few decades ago.

Is Musk accurate? How can any of these be very accurate, with so much unknown, as you note? Is he optimistic (er, predicting too early)? I guess so.

I'll note that even for Musk's own companies, upon which he is the expert, and upon which he wields more influence than anyone, he is notably optimistic in his estimates...

Anyway, the answer to my question is: no, he didn't speak of those specific technologies, just AI.

BTW some more in my other comment in this thread: https://news.ycombinator.com/item?id=16350798

He doesn’t say soon.

See e.g. Smerity reporting on Elon's talk in NIPS 2017: https://twitter.com/Smerity/status/938994837323259904 as well as https://goo.gl/trX8SJ & https://goo.gl/aTMqv6

What's your evidence for saying he does not say so? A bit mean of you to downvote without asking for evidence.

You don't know who downvoted you and who did not.

I assumed akvadrako downvoted that since the vote and the comment appeared very close to each other and both right after my comment. But of course, I surely could be mistaken.

And anyways, downvoting is not a big deal and they have the absolute right to do so. It's just that stating why you disagreed or downvoted would be helpful.

I don't know if it's still true, but it used to be the case that if you reply to a comment, your downvote to that comment does not count. It was a pretty clever mechanism, in that you could be certain any downvotes didn't come from people replying to you.

I don’t see why HN doesn’t require every downvote to have a comment. Judging from the sheets of gray on most threads, most downvoted comments aren’t actually in violation of site policy, off topic, trolling, etc. It’s just that a few little people disagreed. Which is fine! But it’s not constructive or interesting to just slap the commenter with your dick.

Because then every other comment would have a stream of replies justifying the downvotes - replies which, presumably, would be regular comments, and thus susceptible to further downvotes (with comments)!

In other words, this would be a huge noise generator.

With the exception that it could not have been the person you replied to. (Or yourself, for completeness)

I don't see any evidence in your links; why don't you provide a quote?

Asking me for evidence someone didn't say something is a bit absurd.

I think a lot of people say this. Very few informed people say this. But for a layperson, it's difficult to tell who is informed and who isn't (and in fact laypeople haven't heard of the real experts: Hinton who?)

Well, I get the feeling a lot of AI experts see some real potential for AGI to arrive sooner than expected, but they will be the first to admit that current techniques are insufficient and it would require some smart team to significantly augment current techniques with major new ideas.

It's not what the people who are experts believe, it's what everyone else thinks. The fact is that "deep learning" is now just routinely and casually called "AI" across the industry. And you can bet that people who make decisions who are not sufficiently savvy to know the limits will be duped into a false sense of confidence in this technology.

As a person outside the field, my subjective experience is that every day comes headlines with new superlatives regarding "The Coming AI Revolution." My expectations are being set for radical transformation across most areas of technology in the next decade or two on the basis of current developments in this area.

>My expectations are being set for radical transformation across most areas of technology in the next decade or two

As someone who has lived through most of the last four decades, this seems like a very reasonable expectation. I mean, this maybe wasn't obvious to people who weren't born into the IT industry, but to me? The '90s seemed to have a 'holy shit, this changes everything!' moment every few years. (Of course, some things stay the same. UNIX is older than I am, as is C, and both are recognizable and ubiquitous today.) - and I think as the aughts rolled around, those 'holy shit, this changes everything!' moments became, ah, more evenly distributed. More of this new tech touched the lives of ordinary people in obvious ways.

> on the basis of current developments in this area.

This part... I think that you have to understand that the way we talk about technological change is functionally science fiction. Yes, yes, it's very likely that the high rate of change maintained over the last few decades will continue and even continue to accelerate.[1] but I personally treat claims of where tech will be in 10 years as functional science fiction. A lot like how I treat predictions of what sectors of the stock market will be where in ten years. Yes, the market as a whole will probably go up... but to say you know the winners and losers ten years out? that's... a strong claim.

So yeah, saying "there will be revolutionary changes in the next decade!" is fine, and probably right. there were revolutionary changes last decade. And, hell, you could say that ML and similar have already given us revolutionary changes this decade. Arguing that ML will give us revolutionary changes next decade? well, maybe... but it looks more like Delphi than C to me, if you know what I mean. (Of course, I'm IT; I operate, manage and maintain infrastructure; I'm no programming expert. so take that how you will.)

[1] The counterpoint here is Moore's law. Your compute per dollar is not getting massively better over time the way it was in the 80s through the 00s. I mean, it's still trending towards better, but not nearly as fast as it was. The difference between the compute power of available compute resources in 1980 and 2000 is going to be dramatically greater than the difference between the compute power of available compute resources from 2000 to 2020. This means that if you want to make predictions, you can't rely as much on computers being more powerful in the future.

Re: Moore's Law having slowed a lot. I still like Kurzweil's chart, showing that historically we can back-extrapolate the exponential improvement to previous information technologies. So this might extend to some future information technology, other than silicon... but that form of reasoning really is science fiction.

Most of these imminent-AI predictions are founded on computer complexity soon approaching that of the human brain. Even some fundamental discovery, like RNA playing an information-processing role within/between neurons, that pushes brain complexity up by many orders of magnitude, would not push back exponential growth reaching it by all that many years.

The whole thing hinges on Moore's Law, which is (currently) looking rickety.

this mainly has to do with the incentives of journalists in an attention-based industry, I think.

As well as the incentives of researchers, developers, and executives in the NN industry.

Yeah, what we have is a nice improvement for things, but at the end of the day I think everyone is well aware of the limits of statistical methods. What we have is the machine acting kind of like a parrot: it can get good at certain tasks, like imitating an artistic style. What we don't have is independent thought or reasoning or the ability to consume a narrative. A machine can make a strange mash-up of styles in writing, for example, but it can't generate a narrative simply by machine-learning a series of novels. The best it can do is copy the writing style. However, it's still an amazing advance; for creative fields you see AI enhancing the work of the creator rather than replacing it.

1) This is Jürgen Schmidhuber claiming that ANN-based AI will soon become as intelligent as small animals, then within a decade of that, as intelligent as humans:

But I think that within not so many years we'll be able to build an NN-based AI (an NNAI) that incrementally learns to become as smart as a little animal (...) once we have animal-level AI, a few years or decades later we may have human-level AI, with truly limitless applications (...)

Will AI Surpass Human Intelligence? Interview with Prof. Jürgen Schmidhuber on Deep Learning


2) Here's a quote from DeepMind claiming their DQN system is a stepping stone towards Artificial General Intelligence:

This work represents the first demonstration of a general-purpose agent that is able to continually adapt its behavior without any human intervention, a major technical step forward in the quest for general AI.

Human-level control through Deep Reinforcement Learning


3) Here's another example from MIT Press:

“With appropriate uses of the deep learning technologies, we could be a further step closer to the true human intelligence.”

Deep Learning Machine Beats Humans in IQ Test


4) Here's Elon Musk telling NIPS that auto-cars will be significantly better than humans in 2020:

Now at the conference on Neural Information Processing Systems (NIPS) yesterday, Musk said that they could achieve some level of full self-driving within two years, but that the more important timeline would be 3 years, at which point self-driving capabilities would be significantly better than human drivers.

The quote doesn't mention deep learning, but he was at NIPS talking about deep learning.

Elon Musk updates timeline for a self-driving car, but how does Tesla play into it?


I believe Eliezer Yudkowsky et al. Singularitarians have redoubled their efforts to convince everyone that the Singularity is Coming, bolstered by the recent successes of Deep Learning - although you won't find Yudkowsky, at least, trying to make any specific predictions about AGI, or about anything.

Those are all real problems, and they are also real problems that everyone who works with AI knows about. This type of article is a nice corrective to the hype, but it faces a similar risk of being too one-sided: in this case too critical. In a way it's a shallow criticism of shallow hype, its necessary straw man and prelude.

> But almost all the interesting problems in cognition aren’t classification problems at all.

That's BS. Knowing how to name or interpret an event, to apply a symbol to the noise of sensory data, is a crucial part of cognition, and it happens to be the foundation of the symbolic reasoning Gary Marcus and company tout so often.

The whole "emperor has no clothes" tone of these critiques is in itself false, because the people at the center of deep learning are well aware of its limits and trying hard to work around them.

I'll try to address the epithets one by one:

* Greedy - Yes, deep learning is data hungry. No one denies it. That is a great boon for storage companies and GPU makers, so we may briefly ponder the incentives of neighboring industries to push DL. However, there's a lot of work being done to diminish the number of examples an algorithm must train on before it can make an accurate prediction. You can Google "one-shot learning" and "zero-shot learning" for more info.

* Brittle - On the one hand, deep neural networks learn what they train on; on the other, they are pretty strictly judged by their ability to generalize to data they haven't seen before. In fact, it is this ability that has won them notoriety. For work on making them less brittle, see Hinton's recent work on "capsule networks" among other research. (Capsule networks are also deep learning.)

* Opaque - interpretability is a problem, but it's not as simple as calling DNNs a black box. (Yes, the billions of parameters are not really human readable, but they are machine-readable and we can apply functions to them to obtain insight into the model.) In fact, there are many approaches to interpretability. It's some of the most exciting research in AI and it's coming out of really strong labs.

Here's something recent from DeepMind: Learning explanatory rules from noisy data https://deepmind.com/blog/learning-explanatory-rules-noisy-d...

* Shallow - This is the hard one. Gary Marcus and others suggest that we can augment learning algorithms by pre-programming them with rules or knowledge of the world. Fine. But it would seem to me much more impressive if these algorithms arrived at knowledge without human intervention of the rules-based kind, which is precisely what Hinton and LeCun and Ng have advocated doing with unsupervised learning, which is, cough, also deep.

One of the weird things about the role that Gary has cast for himself as the gadfly of deep learning is that he's very critical of this one narrow branch of AI. In fact, the top labs using deep learning are already combining it with other types of machine learning algorithms. DeepMind's AlphaGo is DL + reinforcement learning + Markov decision process.[0] Pedro Domingos is very clear that combining algorithms across machine-learning and AI sub-disciplines is the most promising path.

I'm not sure I know of any "deep neural networkists" who espouse a dogma that would actually justify Marcus's strange obsession. Most people, including the godfathers of deep learning, are agnostic. They use algorithms that work and will combine them with other algorithms that work when those come along. It seems like a semantic quibble. The real work continues.

[0] https://deeplearning4j.org/deepreinforcementlearning

This article is a little too glib in my opinion, preferring citations and statements to substance and explanations.

For a more cutting and insightful critique, watch Ali Rahimi's short talk at NIPS 2017 (where he was presenting a paper that won the "Test of Time" award for standing out in value a decade after publication). The standing ovation he received at the end indicates that his comments resonated with a significant fraction of the attendees.


Here's a teaser from the talk:

"How many of you have devised a deep neural net from scratch, architecture and all, and trained it from the ground up, and when it didn't work, felt bad about yourself, like you did something wrong? This happens to me about every three months, and let me tell you, I don't think it's you [...] I think it's gradient descent's fault. I'll illustrate..."


Ben Recht and Ali Rahimi published an addendum to the talk, elaborating on the direction they envision -- http://www.argmin.net/2017/12/11/alchemy-addendum/

Ali also has a post taking a stab at organizing some puzzling basic observations about deep learning, and motivating that with analogous historical progress in optics -- http://www.argmin.net/2018/01/25/optics/


PS: The first 11 minutes, on the idea of using random features (the main idea in the research he presented) are also interesting.

In case you'd rather read text than watch a video, I believe this is the same?


I think deep nets solve isolated problems really well; classification based on a given training set works very well. However, we sometimes compare them to humans, which I think is an error on our part: humans process a lot more training data than the deep nets, and the vast amount of data we process over the years is incomparable. Don't get me wrong, the architecture of deep nets is fine; they can learn anything given the data and the loss function -- but it's difficult to formulate every human decision as some kind of loss function. It's really hard.

I remember NIPS being less terrible back in the day: the Google robots read their corporate slide decks, but we mostly ignored them, and they didn't have walls of sponsors with fine gradations... https://imgur.com/a/Hn3Aa

>> Google Translate is often almost as accurate as a human translator.

This is the kind of overhyped reporting of results highlighted by Douglas Hofstadter in his recent article about Google Translate:

I’ve recently seen bar graphs made by technophiles that claim to represent the “quality” of translations done by humans and by computers, and these graphs depict the latest translation engines as being within striking distance of human-level translation. To me, however, such quantification of the unquantifiable reeks of pseudoscience, or, if you prefer, of nerds trying to mathematize things whose intangible, subtle, artistic nature eludes them. To my mind, Google Translate’s output today ranges all the way from excellent to grotesque, but I can’t quantify my feelings about it.


It's funny how the article above is claiming to speak of "the downsides" to deep learning, yet it spends a few paragraphs repeating the marketing pitch of Google, Amazon and Facebook, that their AI is now as good as humans in some tasks (limited as they may be) and all thanks to deep learning. To me that goes exactly counter to the article's main point and makes me wonder, what the hell is the author trying to say, and do they even know what they're talking about?

There's only a contradiction if you insist on simplistic binary thinking. (Either translation works or it doesn't. Machine learning will either solve everything or is useless garbage.)

Instead, we can acknowledge that Google Translate has reached a useful level of skill (enough to usually get the gist of an article), but not enough to be fully reliable, accurate translation. The Hofstadter article usefully demonstrates how far we have to go.

>> There's only a contradiction if you insist on simplistic binary thinking. (Either translation works or it doesn't. Machine learning will either solve everything or is useless garbage.)

I'm sorry, but I don't see where in my comment it looks like I'm saying anything like that.

>> Instead, we can acknowledge that Google Translate has reached a useful level of skill (enough to usually get the gist of an article), but not enough to be fully reliable, accurate translation.

Hofstadter's article actually argues that Google Translate is not enough to get the gist of many passages, probably all non-trivial ones.

Okay, in that case I'm not sure what you're trying to say. Maybe something got lost in translation :-)

You can ask me what you don't understand.

That article pretty much sums up what my sister in law (runs a professional translation business) says.

As someone who doesn't have a second language strong enough to verify the accuracy of a translation, I simply run things through Google Translate to the target language and then run the output back to English. It's the translation equivalent of the game of telephone.

Take "the fat cat sat on the mat"):

"the big cat was sitting on the carpet" or "The cats of oil sat on the bed"

and lots of other things which are just odd..
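
If anyone wants to reproduce the telephone test, it's only a few lines; `translate(text, src, dst)` below is a placeholder for whichever translation API you happen to use, not a real function:

    def round_trip(text, translate, via="es"):
        # Translate English -> via -> English and see what survives.
        there = translate(text, "en", via)
        return translate(there, via, "en")

    # round_trip("the fat cat sat on the mat", translate)
    # -> e.g. "the big cat was sitting on the carpet"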

What does this prove?

Languages are different, you should expect any automatic translation program to do something like this, as it is trying to translate without prior knowledge of the 'conversation' you are having.

Many languages put a lot of meaning into a single word. Look at English. "Tank" is good example. It's either a thing that holds a lot of fluid, a thing that blows up buildings on treads, or a verb that means you are taking a lot of damage in lieu of others taking damage. One word has a lot of meaning. Then you can get into conjugations and tenses, yeesh.

Google translate is not meant to be a natural language processor, it's just a dumb translator. It can't figure out context as it is just a simple text box and doesn't look at the million and one things natural language processing would use.

Trying to play telephone with it just proves that a dumb text box is dumb.

It proves that _really_ simple concepts aren't being conveyed, despite the claims that it is more than simple word substitution. The linked article talks about translation of things which require some cultural knowledge. The idea that an overweight house cat is sitting on a protective piece of material is simply lost, replaced with a "big cat" (which could just as well be a lion) or a "bed", which is misleading at best and just plain wrong at worst.

I might understand if the target languages didn't have a concept because it's cultural. That sentence failing to be translated implies that the target language doesn't have the concept of people/animals being overweight, or of what a mat is. I might be more accepting if it came back with "throw rug" or something instead of mat, but it never does that; it's like it has a list of rough synonyms and picks one at random. Hence the "fat cat" bit becoming what likely was "oily cat". The more subtle things (the cat in this case being a house cat/pet) might be the bit of cultural information that lends understanding to the whole sentence, but in most cases that isn't really what the translation is getting wrong.

Claiming any of this is even near human translation levels is misleading at best, considering it falls down worse than most poorly translated computer manuals I've read. Despite all the claims of the wonders of DNNs, the translations look little more advanced than direct word translations (fat=oil or mat=something you sleep on) with a bit of fuzz that fails in strange ways.

Rather than translating the description into an abstract idea, and then translating that idea into a different language; it seems like it's trying to go straight from English to Chinese.

To be fair, translating to 'idea-space' is really hard. The medium is the message. We have ideas in English that other languages do not have, and vice versa. Chinese is famous for not having a past tense. Spanish has a subjunctive tense that is difficult for fluent speakers to translate into English. Some languages have cardinal direction based gender tenses. The word 'che' in Italian is a head-tweaker for English speakers.

The problem expands as a binomial (handshaking problem). Google Translate has 104 languages, which means 5356 cases to deal with (n(n-1)/2) for each 'idea' present. As such, the 'idea' could be translated, but it would result in a mess of a response that no native speaker would translate the 'idea' into. You'd have to teach the basics of the language to the person before real translation could occur. Languages are meant to communicate between two people (writing is subtly, but importantly, different) and context, body language, and recent history play large roles in that communication.

> Claiming any of this is even near human translation levels is misleading at best

Oh, got it. The issue is that Google is misrepresenting its Translate service as an 'end-all-be-all'. Yes, I would agree with that.

But as Hackers, we should know better than to think that it could ever be such, given that it is just a text-entry box. NLP needs a lot of context that a text box can never have.

This is actually a very useful measure of the amount of error in Google Translate translations, or any similar process really. I see it as sending a noisy signal down some medium. You send the signal back and forth over the medium and observe the amount of information lost in the process. The speed by which the signal is corrupted beyond the ability to extract useful information from it is indicative of the amount of noise in it (but also tells us something about the medium itself).

Hofstadter makes it obvious that 'Google Translator' is a misnomer (and not even a naive one). Even "dumb translator" seems overblown. 'Magic decoder ring' might be closer.

What's troubling is that there are, undoubtedly, people out there who actually believe that the 'translation' they're seeing is a halfway accurate representation of the original. If that accidentally happens, it's sleight-of-hand. If not, it could be dangerously misleading.

It's kinda like everything Google does these days. They wow'd you 5-10 years ago, and now they've regressed. Look at Maps, it was better in 2010. Now? I have to second guess it at just about every freeway on-ramp.

This business of 'constant improvement' very often leads to an inferior, harder-to-use product. When the person who 'got it' and 'really cared' moves on, you can see it in the result.

Maps was NOT better in 2010. That's an absurd claim to make.

Then I'm absurd. To me, the interface was better then. It really started dropping off in ~2015.

This is a nice example of adversarial input, but I think it misses the point a bit. The point being, if I take a couple paragraphs of a tech text, an online article, or the like, I will get a correct translation most of the time.

It won't do poetry, but GT did remove the need for a human translator in many everyday tasks.

Just last week a friend asked me to translate a preamble to a master thesis (political economy). I fed it into GT and it just worked.

Where is the contradiction?

I've said it before and I'll say it again. Machine learning is specifically not magic. It only works to the extent that we can build our own priors into the model.

A typical media story... deep learning really is great. It represents the first time we've really figured out how to do large-scale nonlinear regression. But it is certainly not a magic bullet. However moderate headlines don't get as many hits as overhyped ones so every day we get another ridiculous article spouting nonsense...

Very tiresome. The truth is pretty interesting, can we talk about that instead?

For instance, how and why deep learning works at ALL is very much an open question. Consider - we're taking an incredibly nonlinear, nonconvex optimization problem and optimizing in just about the dumbest way imaginable, first-order gradient descent. It is really amazing that this works as well as it does.

... Why does deeper work better than wider? It has been known for many years that a shallow net has equivalent expressivity to a deep one. So what gives? (Actually, there has been some interesting work towards answering this question in recent years by Sohl-Dickstein et al.)

Hear, hear. Training an algorithm to mimic prior knowledge. The prior knowledge is inherently there. It's like the inflationary claims for QC which depend on demonstrated factorisation of a known product of primes: it's not magic when you seek to minimise error to a known goal, but you have to know the goal.

> Machine learning [..] only works to the extent that we can build our own priors into the model.

AlphaZero is a good counterexample. In contrast to AlphaGo, it has no priors.

Well it does have some priors baked into it by the (convolutional) architecture of the network. See the “deep image prior” project for a feel for just how strong this convolutional prior is.

Humans, as opposed to deep learning, have embodiment. We can move about, push and prod, formulate ideas and test them in the world. A deep net can't do any of that in the supervised learning setting. The only way to do that is inside an RL agent. The problem is that any of our RL agents so far need to run inside a simulated environment, which is orders of magnitude less complex than reality. So they can't learn because they can't explore like us.

The solution would be to improve embodiment for neural nets and to equip RL agents with internal world simulators (a world model) they could use to plan ahead. So we need simulation both outside and inside agents. Neural nets by themselves are not even the complete answer. But what is missing is not necessarily a new algorithm or data representation, it's the whole world-agent complex.

Not to mention that a human alone is not much use - we need society and culture to unlock our potential. Before we knew the cause, we believed disease was caused by gods, and it took many deaths to unlock the mystery. We're not perfect either, we just sit on top of the previous generations. Another advantage we have - we have a builtin reward system that guides learning, which was created by evolution. We have to create this reward system for RL agents from scratch.

In some special cases like board games, the board is a perfect simulation in itself (happens to be trivial, just observe the rules, play against a replica of yourself). In that case RL agents can reach superhuman intelligence, but that is mostly on account of having a perfect playground to test ideas in.

In the future, simulation and RL will form the next step in AI. The current limiting factor is not the net, but the simulator. I think everyone here has noticed the blooming of many game environments used for training RL agents from DeepMind, OpenAI, Atari, StarCraft, Dota2, GTA, MuJoCo and others. It's a race to build the playground for the future intelligences.
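
The agent-environment loop those platforms expose is tiny; roughly this, using the classic OpenAI Gym interface (pre-0.26 API, which returns 4-tuples from step; a random policy stands in for the agent):

    import gym

    env = gym.make("CartPole-v1")
    obs = env.reset()
    total = 0.0
    for _ in range(200):
        action = env.action_space.sample()           # a real agent would choose this
        obs, reward, done, info = env.step(action)   # the simulated "world" answers back
        total += reward
        if done:
            obs = env.reset()
    print(total)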

Latest paper from DeepMind?

> IMPALA: Scalable Distributed DeepRL in DMLab-30. DMLab-30 is a collection of new levels designed using our open source RL environment DeepMind Lab. These environments enable any DeepRL researcher to test systems on a large spectrum of interesting tasks either individually or in a multi-task setting.

Before we build an AI, we need to build a world for that AI to be in.

Embodiment is part of the picture, but I also am not sure that we have yet developed structures that are capable of learning in the way that humans do, no matter how rich the environment.

Rocks, flatworms, and the Hoover Dam all have embodiment in the same complex world that we do, but none of them will ever coherently debate philosophy, because they have no structures capable of learning to do so. I’m not convinced that our RL agents do either.

That's easy. Flatworms can't because we took the top spot and hogged resources. Give them a world where they are the most advanced species, and a few billion years, and they can debate philosophy too. What would happen if a flatworm came out of the sea and tried to take up land from us?

Hoover Dam is not embodied because it is not an agent. An agent has good and bad, life and death. The Dam doesn't give a damn about any of it.

> run inside a simulated environment

It's pretty common to use DRL in real-world robotics, e.g. in self-driving cars or robotic motion planning, so you don't need to simulate the real thing. The obvious issue with RL is that it needs even more iterations/episodes/epochs than typical DL, making real-world development either impractical (time) or too costly.

To my untrained mind it is unexpected: on the one hand, complete sensory deprivation for extended periods is considered highly cruel to sapients; yet here we are, on the other hand, building what amounts to sensory deprivation tanks with tightly-controlled inputs, and expecting to somehow build sapient minds out of those modest beginnings. Even what we consider as "only" sentient-but-not-sapient phyla within the Animalia kingdom appear to wither under increasing degrees of sensory deprivation, perhaps down to flatworm-scale sentience [1].

[1] https://www.sciencedirect.com/science/article/pii/0003347265...

> The solution would be to improve embodiment for neural nets and to equip RL agents with internal world simulators (a world model) they could use to plan ahead.

That’s one possible solution, but it only works to the extent that your simulator accurately reflects reality. Any hard coded simulation is therefore doomed to fail badly, and any adaptable “simulation” isn’t really a simulation at all — just more neural architecture.

> So we need simulation both outside and inside agents.

We don’t even need a simulation outside of agents, if you just use physical robot bodies :) Humans do just fine learning without extra simulation environments — the real world is enough.

Of course, simulated environments are incredibly useful, but not in the ways you’re claiming. Mostly it’s because reinforcement learning is still so slow to learn that without a simulated environment it would take forever to test anything. Also, they allow repeatable, testable results that allow learning effectiveness to be measured scientifically, etc.

Also, I know you can formulate RL such that you use a simulator as part of the learning algorithm, but IMO that is cheating at worst, and a misguided research path at best.

> Neural nets by themselves are not even the complete answer.

Only to the extent that human brains are also not the complete answer to intelligence (i.e. maybe there's a possible brain architecture much better than ours), but I don't think that's what you're saying.

If you meant the current wave of artificial neurons are not the complete answer, then of course I agree. But I don’t think that’s what you’re saying either.

Show me the part of our brain that you think contains a perfect simulation of reality and I'll concede. Now certainly, our brain can do a kind of "neural simulation" to understand and reason about consequences of actions, but that "simulation" (if we can even call it such) is learned.

RL agents (and humans) don't actually need a perfect internal simulation of the environment. Instead we need the ability to quickly recover from deviations when the model diverges from reality, like self-driving cars do. We are using the world itself as a "simulator".

The problem with AI agents doing the same is that it's expensive. We'd need cheap, fast, indestructible robots and environments to do that. Humans benefited from millions of years of evolution, and previous species took billions of years to evolve up to our species. That's a lot of "high resolution environment time" which would be impossible to recreate in silico as of now.

Then there's the problem of sample complexity - humans need fewer examples than AI agents to learn, and that's because we have prior knowledge about the world encoded in our genetic code by evolution, while RL agents don't. We have to bake some pretty clever prior knowledge into RL agents if we want them to be efficient.

So AIs can make do with an imperfect internal simulation, but they need a perfect external simulation of the same complexity as the real world in order to become smart like us.

agreed, this project is trying to do just that: https://github.com/jtoy/sensenet

There are groups and companies exploring probabilistic programming as an alternative to CNNs and other deep learning techniques. Gamalon (www.gamalon.com) combines human and machine learning to provide more accurate results while requiring much, much less training data. The models it generates are also auditable and human readable/editable - solving the "opaque" issue with deep learning techniques. Uber is exploring some of the same techniques with their Pyro framework.
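
For anyone unfamiliar with the term: a probabilistic program is ordinary code with random choices plus conditioning on observed data. A hand-rolled coin-bias example using rejection sampling (just the idea, not Gamalon's or Pyro's API):

    import random

    # Infer a coin's bias from 7 heads in 10 flips.
    observed_heads, flips = 7, 10
    accepted = []
    while len(accepted) < 2000:
        bias = random.random()                                     # prior: uniform on [0, 1]
        heads = sum(random.random() < bias for _ in range(flips))  # simulate the data
        if heads == observed_heads:                                # keep only runs matching the observation
            accepted.append(bias)
    print(sum(accepted) / len(accepted))   # posterior mean, around 0.67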

Having said all of this, we're not arguing that CNNs have no place - in fact you can view CNNs as just a different type of program as part of an overall probabilistic programming framework.

What we're seeing is that the requirement of large labeled training sets becomes a huge barrier as complexity scales - making understanding complex, multi-intent language challenging.

Disclosure: I work for Gamalon

Are there scholarly papers describing the Gamalon Idea Learning system? My cursory search shows only tech press, TEDx talks and pdfs of slides.

Edit: and patents, lots of patents.

Edit2: maybe this: https://github.com/gamalon/chimple2

My intuition is that the approach Gamalon is using has more potential than deep learning.

I've been playing with the concept for a while, but failing to get any good results. Debugging probabilistic programs is so damn slow and difficult, since bugs can show up as subtle biases in output instead of clear-cut deviations. (I described my approach here: https://www.quora.com/What-deep-learning-ideas-have-you-trie... ) For me, this is just a hobby.

Joshua B. Tenenbaum et al.'s group seems to call their approach program induction; I had called mine Bayesian Auto Programming. I see you are calling yours Bayesian Program Synthesis. Clearly we have similar intuitions about the essence of the solution to AI.

I wish you better luck than me.

Very interesting report on your discoveries. It reminds me of Genetic Programming. I was thinking about using the same principles to generate a more declarative Bayesian program (like a subset of SVG).

Do you have some source available around your experiments?

By "human learning" do you mean there is a human in the loop, or do you mean the system mimics how we think humans learn? Looking at their website the latter appears to be the case.

A human can provide guidance to the system and edit models. As to whether it “thinks like a human” I think that’s always a dangerous and loaded statement - but what we believe is that a Bayesian probabilistic approach is closer to the way humans learn and make decisions.

agree with the other posters, could you guys please publish something on what you're doing?

Another good article in a similar vein: "Asking the Right Questions About AI"


HN discussion: https://news.ycombinator.com/item?id=16286676

I love the way the article trots out the line that Google Translate is almost as good as a human translator, especially in view of Hofstadter's recent article.

If good is defined by more factors than quality of translation, maybe. But translations between the two languages I speak, English and Spanish, are never good enough to actually use without a lot of follow up work by a human.

Have you tried the translator by DeepL? It often gets incredibly close, in my experience.

No, I haven't. I have only used Google Translate and the one Microsoft has. Interestingly, I have the impression (without any actual measurement) that Google Translate fails on simple translations more often than it used to, but I also use it far less frequently now than in the past.

Do you have a link for the one by DeepL?

Edit: https://www.deepl.com/translator

> If good is defined by more factors than quality of translation, maybe.

That translates to 'is not good' in my language. Your low-key sarcasm is much appreciated.

> That translates to 'is not good' in my language

The original line is more precise. In most cases, I need the gist of a block of text in an unfamiliar language. Google Translate is faster and cheaper than a human translator. That makes it better, for this purpose, than a human, despite its lower accuracy.

When I need legal documents translated, most human translators are not good enough. A specialist, and only a specialist, will do.

Haha, it was only kind of sarcastic. But thank you. I meant that if you evaluate based on things like speed, volume, and other measures, and the quality of translation only needs to be "good enough", it may be better than humans. However, if you judge purely on accuracy, no way. :-)

Calling it "deep learning" was the first mistake. It makes it sound a lot more profound than it really is. "Machine intuition" is the term I prefer.

I like "machine perception". We're not much past the point of sensory-motor development.

I have heard it called computer vision.

That's different. Some deep learning models are used for computer vision, and some computer vision involves deep learning. Neither is an essential part of the other.

"linear transformations with soft thresholds sandwiched in between"

LT-STIB = Linear Transformations w/ Soft Thresholds In Between

How about "long skinny networks" over "short and fat networks". Deep maps to profound in my mind rather than just talking about lots of layers.

I prefer "just throwing cycles at the problem".

You can throw cycles in various ways (like with regular algorithmic code that implements various heuristics or something). So, "just throwing cycles at statistics"

See also “p-value fishing”

We had a presentation at work about AI and Deep Learning a while ago, and I asked what the test approach or test plan for deep learning would be... the answer I got was a strange look.

If you have a self driving car crash and the cause is "the algorithm", that's not going to be satisfying to customers, insurance agencies, or regulators [I should be clear, the team giving the presentation does not work on SDCs].

This article presents zero evidence for its claim. One argument is persistence. Quoting: "We are born knowing there are causal relationships in the world, that wholes can be made of parts, and that the world consists of places and objects that persist in space and time." That is unsubstantiated and irrelevant, because AI that understands 3D is only now being developed. Also, children up to about 3 years of age cannot understand the perspective of third parties. There might be some hard-wired rules in our brains, but that's not intelligence anyway.

The article has a point about something: Conventional feed-forward, convolutional neural networks can only model a very limited space in the grand scheme of things. Backpropagation is not perfect. There are other learning methods. Hell, sometimes there isn't a global minimum anyway. But saying that Deep Learning will stall in the near future is just wrong, and in my opinion the reasons why are evident to all those who follow the latest developments.

The combined enthusiasm of academics and entrepreneurs is what's driving the current revolution in artificial intelligence. That enthusiasm is punctuated by high-profile CEOs and investors making fiery remarks from time to time. But alas, people don't stay enthusiastic about anything forever, and sooner or later our collective psyche will move on.

That doesn't mean that the technology revolution, and its AI component, will stop. We have had machine learning for a long time, doing its thing as best we managed to make it work. Its impact on our collective conversation was more subdued, that's for sure, but it was there. Research and development never stopped. Even when some big-name universities funded it less and went quiet about building GAI, there were still thousands of less shiny institutions and companies working on more tractable problems and building a foundation.

And, to be clear, I hope the current collective enthusiasm lasts a bit longer, because we still have a long way to go, and research grants and investment money flow better when the media is abuzz with the subject. In particular, we need to either move beyond or build upon matrix crunchers like deep learning. Better forms of AI, and eventually GAI, will need some innovation in chip-making and in computing architectures in general.

> Marcus believes that deep learning is not “a universal solvent, but one tool among many.” And without new approaches, Marcus worries that AI is rushing toward a wall, beyond which lie all the problems that pattern recognition cannot solve.

The thing that fascinates me is that we've only just scratched the surface of what today's ML tech can solve. Let's worry about that wall when we get there...

In the meantime let's not lose sight of today's potential in some misguided idealistic pursuit of perfectionism or "general artificial intelligence".

There are countless problems which current deep learning research combined with some well-thought out UI/UX could solve today in a myriad of industries.

The 1990s software 'revolution' in industry/business was largely just the formalization/automation of paper-based processes into spreadsheets and simple databases, which then evolved into glorified CRUD/CMS software interfaces on desktops, then web/SaaS, and then got another massive boost with smartphones.

If such a simple translation of human processes into machines can achieve trillions of dollars in value then there is no doubt machine learning can do the same for hundreds of thousands of other simple problem-sets which we haven't even considered. Plus the desktop/smartphone/internet/etc infrastructure is already in place for it to be plugged into.

This can only be judged negatively in the sense that every significant step forward in technology gets oversold and misunderstood. But in terms of practical real-world utility, we're very far from fully using what has already been researched and accomplished in a small set of markets. The proliferation of this tech should be encouraged, promoted, and accurately communicated to the tech/business talent who can potentially use it, rather than downplayed because it fails to live up to media hyperbole or sci-fi fantasies of where we should be in 2018.

The article mentions that AI/ML/data science has just become the "hottest new field" for smart young people to go into. Well, that means we're just on the cusp of taking advantage of that technological evolution, and it's FAR too early to look at what's been accomplished today and be pessimistic about deep learning's potential.

I would argue that 'greedy' applies to neural networks in a different sense: they seize on their first solution; they have no mechanism to re-evaluate a decision that is proving false.

Sure they do, it happens at training time. Each time you update parameters you look at all your predictions and figure out how well you did. That informs your next tweak to the parameters. Run online training and you are constantly doing this.
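Concretely, a bog-standard online SGD loop is exactly that re-evaluation, over and over (a toy sketch; the model, loss, and data stream here are made up):

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    def stream_of_examples():
        # stand-in for an online data stream
        while True:
            x = torch.randn(1, 10)
            yield x, x.sum(dim=1, keepdim=True)

    for step, (x, y) in zip(range(1000), stream_of_examples()):
        pred = model(x)              # current belief
        loss = loss_fn(pred, y)      # how wrong it is right now
        opt.zero_grad()
        loss.backward()
        opt.step()                   # parameters revised in light of the error

Every single step is a re-evaluation of the current parameters against fresh evidence.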

Meta Thought: Isn't this basically describing the difference between human children and human adults? How effectively they can bridge known and unknown context?

The article seems a bit outdated in that Hinton himself has argued for the need to go beyond backpropagation.

Marcus' article mentions Hinton's recent misgivings about backprop:

"Perhaps most notably of all, Geoff Hinton has been courageous enough to reconsider has own beliefs, revealing in an August interview with the news site Axios 16 that he is “deeply suspicious” of back-propagation, a key enabler of deep learning that he helped pioneer, because of his concern about its dependence on labeled data sets. Instead, he suggested (in Axios’ paraphrase) that “entirely new methods will probably have to be invented.” I share Hinton’s excitement in seeing what comes next.

> “We are born knowing there are causal relationships in the world, that wholes can be made of parts, and that the world consists of places and objects that persist in space and time,” he says.

He's a little too overconfident in claiming that these are innate ideas or a priori knowledge.

I am still wondering why it is not possible, or maybe just too hard, to take a trained neural network and reduce its number of neurons to get a simplified solution to the problem.

Is there some theoretical or mathematical analysis of neural networks?
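To make the first question concrete, the kind of reduction I have in mind is something like this crude magnitude-pruning sketch (PyTorch; the threshold is arbitrary and purely illustrative):

    import torch

    def prune_small_weights(model, threshold=1e-2):
        # zero out connections whose learned weight is tiny; the surviving
        # sparse network could then be fine-tuned on the original task
        with torch.no_grad():
            for p in model.parameters():
                p.mul_((p.abs() >= threshold).float())
        return model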

If the hypothesis this article is aiming to strike down is that there is currently a clear path to solving AGI, then it's simply fighting a potential misconception of the masses. The reality is that recent advances in AI, while not providing the full picture, are quite strong. Object recognition ten years ago, even at the level of 2011's ImageNet results, seemed impossible. The downsides to deep learning mentioned in the article are both clear side effects of the formulation and simultaneously being addressed by the community in various ways. For what it's worth, I'm with LeCun and Hinton on this one. The abstract human reasoning required for even high level perceptual tasks is difficult for humans to consciously dissect, but it could easily be that it's actually relatively mundane when subjected to appropriate representations.

> The abstract human reasoning required for even high level perceptual tasks is difficult for humans to consciously dissect, but it could easily be that it's actually relatively mundane

I don't think so. Philosophers have theorized about abstract thought for thousands of years. The materialist metaphysics that is uncritically and unconsciously held by the triumphalist philistines isn't even capable IN PRINCIPLE of accounting for it.

In any case, AI is useful, and will continue to be, but the fanfare is just noise distracting from the truth of what it is.

relatively mundane when subjected to appropriate representations.

You omitted the important part of my statement. And the difficulty humans encounter in reasoning about the nature of abstract thought itself is something I also pointed out. My point is that this is not necessarily an indication of inherent difficulty. The ability to model and explain the nature of a self-reflecting mechanism is not a guaranteed natural ability of that mechanism itself.

Keep in mind, folks -- the following is also true:

> Greedy, Brittle, Opaque, and Shallow: The Downsides to Evolved Human Intelligence

I wonder if we can have our cake and eat it too, in the realm of machine learning and artificial intelligence.

The problem I see with this article is that it starts with problems with deep learning today and extrapolates to say that researchers won't get past them:

"None of these shortcomings is likely to be solved soon."

There's no evidence for this, and I think it underestimates the large and well-funded machine learning community. Maybe they'll run into a brick wall, but who's to say a bunch of smart researchers won't figure out ways around today's limitations?

Without physical constraints or impossibility results, I think the only rational outside view is that this is unpredictable. For any of today's pressing problems with machine learning, maybe someone will post a new research paper tomorrow with a different approach that solves it. Or maybe not?

The 50-year history of AI research portends the difficulty of surmounting the secondary obstacles unearthed by each new AI technique. Deep learning will almost certainly reach limits comparable to those of past statistically driven methods, especially on obstacles that show no sign of yielding to anything short of rich semantics. The recent article by Doug Hofstadter in The Atlantic on using DNNs for machine translation highlights this nicely.

To date, no work on DNNs has shown any potential for addressing the common-sense knowledge needed to know when your translated sentence is semantic nonsense. Likewise, no statistics-based learning method has shown itself capable of tapping into, much less creating, the deep knowledge bases and wealth of relations that give bare facts the semantic meaning needed to convey or understand subtleties in messages that are more than trivial.

Deft mimicry may win a Turing Test, but no form of AI has yet shown any potential for complex syntheses of thought, like writing a fictional story in which characters have internal thoughts and hidden motives that depend on relationships, subtexts, and dependencies. Too much internal state and too many complex relations must be modeled, to say nothing of the ability to extend and translate these to many possible worlds, as humans do every day.

No, statistics-based AI techniques (like deep nets) show little promise to grow indefinitely unto full (wo)manhood.

I'm just an outside observer, but I wouldn't say "no potential". For example, I'm reminded of a paper [1] about a technique that allows translation between language pairs that weren't in the training data.

This doesn't rise to the level of common sense yet, but it seems impressive, and there is a language-independent portion of the neural net that seems to be encoding something.

[1] https://research.googleblog.com/2016/11/zero-shot-translatio...

>>> "None of these shortcomings is likely to be solved soon."

>> There's no evidence for this, and I think it underestimates the large and well-funded machine learning community.

Well, deep learning has been around for a few decades now. LeCun's LeNet-5 is already 20 years old. LSTM is 21 (from the Hochreiter and Schmidhuber paper). Those are the two best-known architectures today, but work on deep nets goes back to the '80s.

So those are old algorithms and techniques and they've had the same limitations for 20+ years. What makes them "likely to be solved soon" now?

A lot more attention. The field seems to be moving pretty fast and doesn't seem to be stuck.

But I wouldn't say "likely" or "unlikely" but rather that it's uncertain.

Two problems with that.

First, what determines whether a problem is close to being solved is usually not the amount of interest in it but, rather, its hardness. See for instance P = NP, or the drive to find a cure for "cancer". Neither has been solved yet despite considerable effort and investment (particularly in cancer cures). The limitations of deep learning are a similarly hard problem: they're "baked into" its DNA, so overcoming them is itself a very hard problem [1].

Second, yep, the field is stuck, well and truly stuck. The fact that LSTM and CNN are 20+ years old is telling. The vast majority of ANN research today is basically trying small tweaks to old ideas to see if something will come out of them. There is very little incentive to research something radically different, because people can already make a very lucrative career out of timid architectural tweaks, and researching something new and untried is risky.

So even if throwing a lot of money and minds at a hard problem could make it go away, nobody is really trying. It will be a long time before a majority of researchers recognise the limitations of deep learning, and of statistical machine learning in general, and do something about them.


[1] The article links to an article by François Chollet of Keras. That's the second part of two. The first one is orders of magnitude better informed than the article above and recommended background reading in any discussion about the limitations of deep learning:


If it is shallow then maybe it should not be called deep? :)

In any case Hinton's use of the word "deep" was an excellent one from a marketing perspective:

" In 2006, a publication by Geoff Hinton, Osindero and Teh[26][27] showed how a many-layered feedforward neural network could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine, then fine-tuning it using supervised backpropagation.[28] The paper referred to learning for deep belief nets."

Perhaps this usage ("deep belief networks") was too effective, as so many marketing terms prove to be. Hinton is likely well-aware of the importance of terminology that captures the imagination of the public (and the marketplace for research).

Still, this observation might not be deep enough for the HN folks ;p

Those four words convey potential problems in any model or method, which an engineer would want to address.

Nothing new to see here. Just a recycling of Gary Marcus's recent criticism of deep learning.
