The Limitations of Deep Learning

therajiv · on July 17, 2017

As someone primarily interested in interpretation of deep models, I strongly resonate with this warning against anthropomorphization of neural networks. Deep learning isn't special; deep models tend to be more accurate than other methods, but fundamentally they aren't much closer to working like the human brain than e.g. gradient boosting models.

I think a lot of the issue stems from layman explanations of neural networks. Pretty much every time DL is covered by media, there has to be some contrived comparison to human brains; these descriptions frequently extend to DL tutorials as well. It's important for that idea to be dispelled when people actually start applying deep models. The model's intuition doesn't work like a human's, and that can often lead to unsatisfying conclusions (e.g. the panda --> gibbon example that Francois presents).

Unrelatedly, if people were more cautious about anthropomorphization, we'd probably have to deal a lot less with the irresponsible AI fearmongering that seems to dominate public opinion of the field. (I'm not trying to undermine the danger of AI models here, I just take issue with how most of the populace views the field.)

kensoh · on July 17, 2017

I don't have ML or deep learning background (no Masters or PhD), adding comment from experience with backtesting trading systems. We will collect market data and design algorithms that seem to produce the kind of outcomes we want. Then test on some other data sets which the algorithms have never been applied on. Many iterations later, you can get a decent profitable algorithm. And if the 'holy grail' algo is run in market long enough, eventually there will be severe drawdown and going bust. The quality of the algo and I assume the deep learning model lies in the quality (breadth and depth) of the data, and how honest with himself the person choose to model it. There will be time and again new 'black swan' or edge events happening (remember LTCM), because using machine learning is like using the past to predict the future.

I guess as long as the users' expectations are correct it can be useful in some very specific areas. Referencing the AlphaGo game last year, I was a Go player for more than a decade. But yet AlphaGo's weird move inspires new insights that break the conventional structure / thinking-framework of a Go player. From that angle, I do think that even though DL is somewhat a blackbox, humans can pick up new insights because it explores areas which are normally ridiculous to a human with 'common sense' to explore.

paganel · on July 17, 2017

> The quality of the algo and I assume the deep learning model lies in the quality (breadth and depth) of the data, and how honest with himself the person choose to model it.

I've only dabbled with machine-learning here and there for the past 10 years or so, but if there's one thing I've learned so far is that the data behind your ML code (and the way it is structured) is responsible for almost all the success or failure of any given ML algorithm. I have an younger colleague at work who I've started tutoring, and he seems really interested in doing ML work (maybe because of all of the recent hype).

I've tried to emphasize to him several times that ML algorithms come and go and that he should focus a lot of his time on the data itself (from where he intends to collect it? how is it structured? is it reliable? is it "enough"? etc), but it looks that my data-related advice falls on deaf ears every time, he's only interested in me pointing to him the latest cool ML algorithm. I guess he'll live and learn, so to speak.

tchalla · on July 17, 2017

> I've learned so far is that the data behind your ML code (and the way it is structured) is responsible for almost all the success or failure of any given ML algorithm

Data is indeed a necessary condition but certainly not sufficient. You require a good marriage between engineering features and data to have a good success rate. Learning curves [0] are a good way to understand if your ML algorithm requires more data or better feature engineering.

[0] http://mlwiki.org/index.php/Learning_Curves

honestoHeminway · on July 18, 2017

Much of the programming with ML has moved towards cleaning, extrapolating and generating the data.

But this type of programing is - miracles- bugfree. We never hear of data-conversion gone wrong, data corrupted or data-mining withou conclusive results here. Obviously such bugs lack the glamour of security bugs.

skgoa · on July 18, 2017

It's also very difficult to catch these errors. Your trained model just doesn't work as well as it could, but how would you be able to tell?

Buttons840 · on July 17, 2017

> focus a lot of his time on the data itself... from where he intends to collect it? how is it structured? is it reliable? is it "enough"?

What's the best books on this subject? I suppose it's a very broad topic and thus more difficult to talk about than a single "neural network" algorithm.

bllguo · on July 17, 2017

Interested in what part of that you feel needs to be explained in more depth? Not sure reading several books is necessary for explaining data collection and data munging...to me it's definitely something best learned by doing.

work in data analysis/stats

Buttons840 · on July 18, 2017

Lots of things are best learned by doing. I just noticed there are dozens of books about machine learning algorithms but none on how to gather data. Of course, both those things can be learned independently, but I think there's room for at least a few books about data gathering considering it's so important for good machine learning results.

_oya8 · on July 19, 2017

Here at Manning (we're publishing Francois Book) have something in our early access program on this now - https://www.manning.com/books/the-art-of-data-usability

emodendroket · on July 18, 2017

This is the domain of statistics, isn't it?

randcraw · on July 18, 2017

Agreed. AFAIK, only statistics has addressed the question of info sufficiency in data and discriminative power of method. Personally, I think the former is an enormously important subject that isn't addressed well in most ML texts. How much data is necessary to answer a given question in practice? How do you know if your data or method are "good enough"?

From what I've seen, statistics addresses these questions better than CS-taught ML does. CS-based ML is no different from algorithm analysis; it suffers from sensitivity to limits inherent in the data. But ML courses often don't address these limits very rigorously. Yet knowing those limits is all important when effectively mining information at a professional level.

If you can't tell the decision maker what you know and what you don't, your inference/prediction really isn't useful. From what I've seen, statistics addresses this best.

kensoh · on July 17, 2017

Thanks for sharing your experience. I'm happy that my previous exposure to trading algorithms at least helped me understand more what the experts here are talking about. I believe the output model is only as good as the data (at least for the deep learning branch of ML). If the dataset does not cover data-points which exist in a wider space but in the same domain of the problem, or which haven't yet have a precedent, then we really can't simply assume that it is the algo/model that needs tweaking when shit hits the fan.

tlear · on July 18, 2017

This is incredibly true, even with crappy old algorithms you can do A LOT if you have great data.

Recent experience with a company that is building some models based on.. few guys recording few hours of audio and annotating it. I still can't get over the fact that otherwise smart people think this is going to work at all.

red75prime · on July 18, 2017

> but it looks that my data-related advice falls on deaf ears every time, he's only interested in me pointing to him the latest cool ML algorithm.

So, it seems their learning/planning algorithm fails, even when it is given the right data. That's unfortunate.

Sorry, I can't help but notice that you aren't happy with their brain's algorithm, while talking about importance of data. I don't say that data doesn't matter or anything. Just random observation.

landon32 · on July 18, 2017

Could actually be their data, right? Imagine if you had only had experience with software engineering. The only data you use when engineering software are the data you learn when using the product or writing tests, it's all the algorithms behind it that's important. So to them, they just don't have data on situations where the data are important.

Wow that's confusing wording. I hope it makes sense.

red75prime · on July 18, 2017

It does, but the algorithm doesn't seems to be state-of-the-art, it's more like current ML algorithms, which need lots of data to work successfully in each new domain. Well, there's a lot of improvement possibilities, at least.

yters · on July 17, 2017

The data processing inequality says processing data does not increase its information content.

yorwba · on July 18, 2017

But processing does increase the "obviousness" of the information content.

E.g. projecting the data onto independent dimensions doesn't change the information it contains, but it highlights that those dimensions are indeed independent. Decomposing a multimodal distribution into a mixture of unimodal distribution gives more insight than just viewing it as a bunch of data mushed together. And so on.

I think there should be a branch of information theory that quantifies the obviousness of information and how it is changed by various data processing methods.

fspeech · on July 17, 2017

The "creative" moves may very well come from the search part of the AlphaGo algorithm, though of course the networks have done their jobs of pruning the search space.

kensoh · on July 17, 2017

I see.. That's true. Though credit still goes to the algo for choosing that particular weird move out of the entire search space (it's just 'weird' and something you will think is a move made by a total newbie to the game). I remembered for that whole week during lunchtime I would watch the broadcast live on YouTube. How devastated I was to see Lee Sedol losing match after match. It was a moment I would never forget, in my mind the computer had crossed an imaginary threshold and it won. I know ML/DL experts will say it is only for a very specific area. But what's stopping more mastery of enough 'specific' areas that the mastery will be broad enough to pass Turing tests?

webmaven · on July 17, 2017

Careful, that's the sort of thinking that led to the last 'AI Winter': assuming that if enough rule-based expert systems were built, general-purpose systems could be assembled from them and/or enough could be learned to build general-purpose systems.

Now, it is worth noting that DL models are already being assembled together (often with a coordinating DL model to switch between them). This can have the advantage of the smaller models being reusable to some extent (certainly more than expert systems ever were) but is not a panacea. The results are still essentially bespoke models rather than general purpose ones.

Deep Learning obviously has a lot more mileage left in it, given that much human mental labor is 'just' training and using our general-purpose intellects for what amount to a series of rather narrowly defined tasks, but it won't surprise me if there is a wall of some sort lurking just over the horizon that will require a different approach (albeit one that may still be called 'deep learning') to cross.

OTOH, it does seem as though the folks at DeepMind are fairly aggressively pursuing whatever is on the other side of that particular horizon:

https://deepmind.com/blog/neural-approach-relational-reasoni...

https://deepmind.com/blog/cognitive-psychology/

https://deepmind.com/blog/imagine-creating-new-visual-concep...

eanzenberg · on July 17, 2017

We can debate, but I don't think another AI winter will happen again in my lifetime. AI work is just earning way too much money for its funding to get cut, and a lot of funding is currently private too.

webmaven · on July 18, 2017

I wasn't arguing for another AI Winter per-se. My warning was more along the lines of pointing out a potential personal "career winter".

sjg007 · on July 18, 2017

I'd be surprised to see inductive learning anytime soon. But I definitely see the next generation of AI systems, robots and their implementation across industry. But that will rapidly fill out and then we will still be left with self determination.

mtremsal · on July 17, 2017

My understanding is that innovation comes from reinforcement learning during self-play (rather than supervised learning of pro games), and thus goes against the best moves suggested by AlphaGo's policy network, in turn pushing it towards new options.

In a sense, it seems innovation arises when the value network forces the policy network to expand the search space because an apparently unlikely move leads to downstream positions deemed favorable.

Cybiote · on July 17, 2017

It's not that simple. The creativity is that the combination of rollouts, policy and value networks allow for more efficient traversal of the search space. Which gets you better exploration of possible paths, meaning more options than a human considered and therefore more creativity.

Florin_Andrei · on July 17, 2017

> Pretty much every time DL is covered by media, there has to be some contrived comparison to human brains

Well, what we've done so far is emulate maybe 1 mm^3 of brain matter - some isolated, very specialized functional blocks in the greater architecture of the brain. They behave as expected - are experts on very narrow topics, but of course fail to integrate their functioning with a larger body of knowledge, because that body just isn't there (yet).

The strength of the human mind is that is has this profusion of little subject matter experts all over the place, covering an enormous array of topics - and then it has an intricate superstructure that integrates the outputs of these narrow expert machines, tweaks their functioning, even subtly alters their inputs, providing coherence to the global output according to the capabilities of the whole system.

We're still far from that complex high level architecture.

civilitty · on July 18, 2017

> Well, what we've done so far is emulate maybe 1 mm^3 of brain matter - some isolated, very specialized functional blocks in the greater architecture of the brain. They behave as expected - are experts on very narrow topics, but of course fail to integrate their functioning with a larger body of knowledge, because that body just isn't there (yet).

I think you're falling into the same anthropomorphism trap that the GP is talking about. We haven't even breached the most important topic: neural plasticity - a brain's ability to rewire itself based on a complex feedback loop driven by environmental inputs (which are, at this point in human development, an almost infinitely more complex system of culture built up over tens of thousands of years). From my work in neuroscience, it seems that the computational complexity of the state of the art DL algorithms barely register when compared to a network of a few hundred biological neurons like the nervous system of Caenorhabditis elegans, which is itself far less capable of self reorganization than even the simplest mammalian brain. Hell, even the most basic potentiation that you'd find in decades old research on addiction is far outside the scope of modern machine learning research and we don't yet have any clean mathematical theories that can emulate plasticity like back propagation or gradient descent can with simple learning.

The current hype around neural networks is the equivalent of saying that we've analytically solved the n-body problem when all we've done is solve a system of equations with two linear variables. The domains are connected but only in the trivial sense that both have variables named "x" and "y."

Florin_Andrei · on July 18, 2017

I think you're far too eager to look for and criticize anthropomorphism - hence you see it where it's not.

civilitty · on July 18, 2017

You said "what we've done so far is emulate maybe 1 mm^3 of brain matter," comparing computational neural networks to us, a biological system - that's literally anthropomorphising.

skgoa · on July 18, 2017

You seem to be under the assumption that a typical feedforward DNN is anywhere close to operating like the brain, just on a smaller scale. But that assumption is not correct.

Both the brain and artificial neural networks are connectivist, but that's about where the similarities end. The brain uses completely unknown algorithms and mechanisms that are almost certainly very different from our (current) ANNs. So it's not just a matter of increasing the scale.

Florin_Andrei · on July 18, 2017

That is nowhere near what I am saying.

romaniv · on July 17, 2017

I think it would help a lot if we brought random forests and SVMs to the same level of performance as DNNs. Demonstrating that more "mechanical" algorithms can be as efficient would dispel some of the anthropomorphism and allow for better analysis of why certain things work.

I also believe that researches have responsibility to outline the limits of their own algorithms in research papers. (For example, presenting examples that aren't recognized or data sets on which the approach doesn't work at all.) That is valuable information and they almost certainly have it at the time of publication.

raverbashing · on July 17, 2017

Not possible, unfortunately

nojvek · on July 17, 2017

I've occasionally found that SVM's work great for one shot learning if you have good features and nicely labelled dataset. CNN's are really good at extracting features. Once you've extracted features that are generic, using an SVM as the last layer to train while keeping the CNN parameters intact yields great accuracy.

I think that's where we are really headed. A combination of deep learning, boosted trees, svm, evolutionary algos, knowledge graphs e.t.c all stitched together to build stronger AI systems.

Remember our aeroplanes don't flap wings but still carry tonnes of weight and fly half way around the world. Once we discovered fundamentals of aerodynamics a lot of supernatural things were possible.

Same with intelligence, once we discover the essentials of intelligence and mathematically formulate it, supernatural intelligence is very possible. This is the thing that really scares people. I have no idea how close we are to it, but I'm sure it will change society the way internet and mobile phones changed the world.

landon32 · on July 18, 2017

Wow, I had never considered superintelligence that wasn't at least at some level modeled after the human brain. That is crazy to think about. We could be at the very low end of the spectrum of intelligence I guess.

Iv · on July 18, 2017

Homo sapiens is the dumbest creature able to spawn a civilization that evolution could produce.

--Accelerando

raverbashing · on July 18, 2017

That's a good comment, and yes, SVM are very powerful in itself, they might not be "deep learning" but they're more powerful than linear learning and good for a lot of cases (as a last layer, as you mentioned, it's a good use case)

Yes, we'll have GAs building CNN architectures, or a mix of several techniques, I'm enthusiastic for what the future holds

taneq · on July 18, 2017

> I'm sure it will change society the way internet and mobile phones changed the world.

It will change the entire world the way humans changed the world. And that's scary.

darksaints · on July 18, 2017

Kaggle has already proven hundreds of times over that deep learning is not a silver bullet.

raverbashing · on July 18, 2017

Thanks, I'm familiar with Kaggle and how most of the time a Random Forest (or XGBoost, or something like Vowpal Wabbit) will solve your problem

fnl · on July 18, 2017

True - until some clever guy proves us all wrong and finds ways to train some multidimensional/complex/deep/... kernel/forest/swarm/... that can learn those nonlinearities that currently only deep nets can be trained to detect (essentially, due to their relative simplicity, I'd say) :-)

raverbashing · on July 18, 2017

I don't think we'll see a deep svm, but if we see one I think we'll have something very powerful

Same for a deep decision tree (forest?). Or maybe a combination of several techniques, etc

psandersen · on July 18, 2017

Probably comes down to whether the model can be trained with gradient descent (at least in the short term).

A general pre-trained RL guided architecture search (#1) together with more choices of nonlinearity (#2), feature extraction (#3), pooling and memory argumentation (#4) and other tricks (#5) could be very powerful amongst many domains. Make it be able to accept multiple pre-trained models as priors and we're well on our way to general AI or at least a place where most data-scientists could be automated away.

(#1 deepmind had a demo a year back or so that was quite novel) (#2 vaguely remember someone training decision trees with gradient descent; could definitely see a 'randomforest' layer appearing in the middle of deep nets) (#3 just convolutions + tricks really). (#4 neural turing machine etc) (#5 any attention mechanism/any sequence mechanism (rnn/lstm etc)/ any graph relational understanding like the recent deepmind paper).

pron · on July 17, 2017

One of the greatest clear and present dangers of AI is that various existing algorithms are called just that, rather than what they are: statistical analysis algorithms, or, in short, statistics. Statistics used to be what we called the worst kind of lie; now it's becoming associated with intelligence, hinting at the ability to expose some great hidden truth. The problem lies not only with the algorithms, but with the models they learn (which are indirectly shaped by the algorithms' limitations) that are simplistic to begin with. E.g., they are trained to predict behavior based on a snapshot of statistical data, using either a constant model (which assumes behavior doesn't change over time) or some simplistic first-order model of change. They certainly aren't usually trained to take into account long-term changes or how their own recommendations impact behavior. The result is a powerful yet completely unjustified boost to the public image of statistical data with simplistic change models.

fnl · on July 18, 2017

This. I still cannot forget the disappointment of my parents and some family friends, all retired scientist or MDs, when I explained them how deep learning and natural language processing works a few years ago. They were truly upset that all this was "nothing more than clever accounting and statistics" at the end of the day, and no trace of the "advertised intelligence" - with Hinton's RBMs maybe coming closest, but by the time I was explaining how you use MCMC to train a Boltzmann machine, they again were complaining that even this is just modeling "statistical likelihoods, not true intelligence"...

In essence, we are only modeling patterns and their transformations, even if rather complex ones. But even the most basic prokaryote can model patterns, that has nothing to do with intelligence or consciousness per se. (And please don't get me started on swarm intelligence now... :-))

ouid · on July 17, 2017

Perhaps the problem simply lies in calling them neural networks.

schoen · on July 17, 2017

This terminology goes back to McCulloch and Pitts in 1943, who said they were making an analogy or model based on the behavior of biological neurons.

https://en.wikipedia.org/wiki/Artificial_neuron#History

There are many things that are inexact about this analogy or model, and many of them were known to be inexact in 1943, but that was the direct inspiration.

Apparently there are lots of different mathematical models available about biological neuron behavior:

https://en.wikipedia.org/wiki/Biological_neuron_model

TremendousJudge · on July 17, 2017

turns out it's very hard to model a thing that we don't know how it actually works

curiousgal · on July 17, 2017

To be fair, we do understand how neurons work, at least on a singular level. Perceptrons model that quite well.

philipkglass · on July 17, 2017

Implementing a basic perceptron classifier is an undergrad homework assignment. Biological modeling of neurons is a work of decades:

http://www.genesis-sim.org/

https://www.neuron.yale.edu/neuron/what_is_neuron

robotresearcher · on July 17, 2017

McCulloch's argument was that perhaps the gross behaviour of a NN as layers of simple transfer functions is where the real action is, and the rest of the details are just gravy.

The fact we now give this to undergrads as homework suggests that there was some value to this idea.

philipkglass · on July 17, 2017

Students in computer science may implement a perceptron as a homework problem. Students in biology don't do that, nor do they use perceptrons to learn about brains, because perceptrons bear only faint resemblance to biological neurons. Reproducing important biological features of real neurons requires much more complicated software.

I'm not denigrating perceptrons or other neuro-inspired approaches to classification. I'm just pointing out that perceptrons are not a faithful model of neurons.

webmaven · on July 17, 2017

But it turns out that they don't have to be. We know that radically different low-level implementations can approximate the same higher-level functions given a large enough network and enough training (eg. half-precision floating point, integer, or even binary ANNs, not to mention the wide variety of activation functions such as relu, sigmoid, tanh, maxout, softmax, etc.), and we've seen increasingly varied ANN architectures applied to the same tasks with good results, so I would expect this to continue to hold true for ever more sophisticated tasks.

I am certain, BTW, that further study of biological neurons will continue to yield insights for the design of ANNs, but it does not at all follow that ANN design will become more similar to biological NNs as a result. Given the completely different substrates, simulating a biologically plausible NN in order to perform a task (for purposes other than gaining further understanding of biological NNs, that is) would be incredibly wasteful and unnecessary, even if your goal is to create an AGI of some sort.

philipkglass · on July 18, 2017

I was disagreeing with someone who wrote that we understand how neurons work and that perceptrons model them "quite well." They do not model biological neurons well at all. I agree that biological fidelity is not important for building useful ANNs.

robotresearcher · on July 18, 2017

I presented (a vulgar summary of) McCulloch's hypothesis, not my own. And since I didn't use the words "quite well", you are not entitled to put them in quotes.

yorwba · on July 18, 2017

philipkglass was referring to curiousgal's comment upthread: https://news.ycombinator.com/item?id=14790965

robotresearcher · on July 18, 2017

OK, thanks. Too late to edit. Adjust flames accordingly. Of course a perceptron is not an accurate model of a biological neuron. But as a reduction to a minimal model it's still pretty darn interesting.

Klockan · on July 17, 2017

But how does a neuron decide to grow new axons or how to change input weights? Biological neurons does this when solving tasks and not just during training. Isn't it possible that human-like intelligence depends on the network being dynamic? For example, when you play a game for the first time a lot of things suddenly starts to click, couldn't that be the result of new connections forming or at least some weights being changed? If this is true then it would be impossible to create a general game playing AI with human-like performance using our current model.

bitL · on July 17, 2017

Biological neurons are fundamentally different to models used in deep learning. They can have multiple outputs, can span over whole brain and do local protein-based computations we don't really understand yet. What we have in perceptron is just a very simple model based on what we observed using rudimentary electricity detectors.

SomeStupidPoint · on July 17, 2017

Don't fully connected layers do exactly what you describe?

0xdeadbeefbabe · on July 17, 2017

As well as the title "artificial intelligence".

ThomPete · on July 17, 2017

One can say that the human mind consist of millions of not very special parts. It's the aggregate, the complexity of which they interact that makes it special.

Once you start to connect all these seemingly non-special abilities in deep learning the "magic" starts to happen. You get something that is more than the sum of it's parts. Of course it's not DL in itself thats interesting but the potential emergent complex relationships.

fnl · on July 18, 2017

That's just another version of the trap GP spoke about. About a decade ago everybody was expecting emergent complex behavior from all kinds of evolutionary, intelligent ("swarm") systems. Didn't happen, seen that.

https://en.m.wikipedia.org/wiki/Swarm_intelligence

ThomPete · on July 18, 2017

About a decade ago winning GO or self-driving cars were seen as pipedreams many decades away. Yet here we are.

The author is making the mistake of thinking that just because he can show some areas were we aren't as far as we thought he has made an argument against AI.

Thats not how it works. We don't get to decide what is the right metrics. All we can see is that we keep making progress sometimes large leaps sometimes slow.

I always find it fascinating that we have no problem accepting the idea that human consciousness evolved from basically nothing but the most elementary building blocks of the universe and once we became complex enough we ended up being conscious yet somehow the idea of technology going through the same just in a different media seems to many impossible.

I know where my bet is at least and I haven't seen anything to counter that neither the OP's essay.

posterboy · on July 18, 2017

The fallacy there is glorifying consciousness. Full consciousness as in omniscence is an unachievable ideal. If we prescribe consciousness to ourselves, depending on the individual theory of conscious thought, that's likely faulty in some respect already.

ThomPete · on July 18, 2017

I don't see anyone glorifying consciousness especially not as some omniscient ideal. In fact I only see people arguing that consciousness isn't really the goal or the focus here but rather that you can't talk with any certainty about whether or not it's possible. You can however point to the fact that we are making progress towards more and more complex relationships and that this looks very much like how we became conscious. Thats all really.

logicallee · on July 18, 2017

>Didn't happen,

...yet. See my comment here: https://news.ycombinator.com/item?id=14770230

fnl · on July 18, 2017

"Never" is a strong prediction. But yes, ANNs have nothing in common with BNNs (biological ... :-)) at all, other than taking them as a very rough abstraction for teaching the basic intuition of the chained up tensor transformations.

The hard thing is to predict the when, or even if, of AI. If it will happen, it will be a sudden, light-switch like moment. I don't think AI can happen gradually. At least the first artificially scentient entity will be a moment much like a singularity some love to predict in the near future...

But as to when that moment will occur, or even if, I think we have no real data that shows we are any closer today than say 10 or 30 years ago. Pattern matching, no matter how complex, isn't "all there is" to intelligence and conciseness.

EDIT: OP changed his reply from "will never happen" to "hasn't happened yet" while I was replying, explaining why mine might read a bit strange now... :-)

somenewacc · on July 18, 2017

> If it will happen, it will be a sudden, light-switch like moment. I don't think AI can happen gradually.

But our own intelligence happened gradually.

mcbits · on July 18, 2017

Human intelligence at the individual level evolved pretty gradually, but there hasn't been enough time for biology to explain our advancement in the last 10,000 years or 500 years. Culture and social organization are the essential nurturing factors there.

Every human genius would be out foraging for roots, perhaps reinventing the wheel or the lever, if it grew up without the benefit and influence of a society that makes greater achievement possible. Modern science and high technology that we attribute to human intelligence are really the products of a superintelligence (not to be conflated with consciousness) acting through us as appendages.

I think it's entirely possible (even likely) that all of the components of a new computational superintelligence already exist, but they are still "hunting and gathering" in the halls of academia or the stock market or biotech or defense...

reckoner2 · on July 17, 2017

Has anyone been able to do this? Is anyone working on it?

I only follow the field as a hobby, but as far as I can tell we are nowhere near getting to this point. I think the ability to combine all these parts in a way that the sum is greater than it's parts is going to require many many breakthroughs still.

ThomPete · on July 17, 2017

The thing is that it's most likely not something anyone does per se but something that happens with enough complexity.

If you happen to believe evolutionary theory is the most convincing then we weren't built either but a byproduct of emergent complexity.

It is my belief that humans are pattern recognizing feedback loops and carriers of information. We externalized some of that into books and built libraries to be able to keep even more than humans can remember as individuals and now have technology to save even more information and even manipulate it in ways impossible up until 80 years ago or so.

I am fairly certain that a technology is part of nature and that technology based conscience is nothing like our limited conscience but something rather different. The end result will not be like humans just better but nothing like humans at all but much better at the carrying of information part.

And so with that (my personal belief) perspective in mind no one is going to be able to do it it will happen as a by-product.

Please keep in mind that I saying "we exeternalized" in the same way we say "selfish genes" it's not a conscious effort as such but rather something which happen to be favorized in the game of life.

Why that is I have no idea but I am fairly certain humans aren't the last species. But yes it's all very speculative I just haven't been able to find better explanations for now.

DiThi · on July 17, 2017

The problem is that we don't really know for sure. We kind of predict things by extrapolating what we know and what we have, but we can never be sure there won't be any sudden breakthroughs.

L_Rahman · on July 17, 2017

I agree with your position. But I want to add a warning against the humanization of the brain. Many parts of it are complex in unknown ways, but some parts are truly mechanical.

The parts of your central nervous system that respond to reflexes, that locate the source of sound or parse the color of retinal input are far more similar to deep learning algorithms than they are to what we think of as human consciousness.

fnl · on July 18, 2017

Because that has nothing to do with consciousness... Every living cell can perceive such inputs, even the simplest of prokaryotes can "sniff" out their food sources.

taneq · on July 18, 2017

> Many parts of it are complex in unknown ways, but some parts are truly mechanical.

I feel like this is a bit of a false dichotomy. We've never encountered any spooky non-mechanical non-physical part of the brain, and we've been looking since Cartesian dualism was in vogue.

What we think of as human consciousness is likely just a bunch of feedback loops allowing the brain to analyze some of its own state as if it were an external entity.

randcraw · on July 18, 2017

The same oversimplification could have been made of the visual system before we became aware of specialized cortical units and their federated/hierarchical arrangement.

In time I suspect we'll yet discover that much of the brain is inhomogeneous in unexpected ways and peculiarly interconnected. If it were not, we'd understand more about how it works by now.

pishpash · on July 17, 2017

It isn't anthropomorphizing. There are undeniable architectural similarities between ANN's and biological neural networks. We don't understand either very well yet but the parts we do understand have led to a lot of cross pollination. I don't think computational intelligence will ever match biological networks detail by detail due to the different substrates and resource usage tradeoffs, and they don't need to match. Intelligence can develop in different ways and we are learning about the universal aspects of it.

therajiv · on July 17, 2017

This is exactly my point - the danger of "anthropomorphization" lies in taking the brain analogy too far. That is, there shouldn't necessarily be a link between research in neuroscience and advances that make deep learning models more accurate. The tasks are completely different (human learning vs. minimizing a loss function), and it's important for researchers in both fields - neuroscience and AI - to keep that in mind.

currymj · on July 17, 2017

However, there definitely are analogies! E.g. early work in convnets was inspired by the architecture of cat brains.

I think the fields have useful things to say to each other, but we're getting over a (maybe justified) taboo in talking about machine learning methods being biologically inspired.

felippee · on July 17, 2017

The origins of that analogy are very flimsy:

1) Hubel and Wiesel discover simple and complex cells in cat's V1 in the 60's. They came up with an ad hoc explanation that somehow the complex cells "pool" among many simple cells of the same orientation. No one to date knows how such pooling would be accomplished (that selects exactly simple cells of similar orientation and different phase, not vice versa), or whether that pooling is only on V1 or elsewhere in the cortex.

2) Fukushima expanded that ad hoc model into neocognitron in 80's, though there is exactly zero evidence for similar "pooling" in higher cortical areas. In fact, higher cortical areas are essentially impossible to disentangle and characterize even today.

3) Yann Lecun took neocognitron and made a convnet which worked OK for MNIST in the late 80's. Afterward the thing was forgotten for many years.

4) Some few years ago Hinton and some dude who could write good GPU code (Alex Krizhevsky), took the convent and won ImageNet. That is when the current wave of "AI" started.

In summary, covnets and very loosely based on an ad hoc explanation to Hubel and Wiesel findings in primary visual cortex, which today in neuroscience are regarded as "incomplete" to say the least (more likely completely wrong). Now this stuff works to a degree, but really all these biological inspirations are very minimal.

beachbum8029 · on July 17, 2017

How do you know your brain's not minimizing a loss function?

kxyvr · on July 17, 2017

For the analogy to hold, it's more of a question of whether or not ML algorithms operate in the same way as the brain. Right now, ML models use algorithms from continuous optimization that require certain structure. Namely, we require a Hilbert space, so that we can define things like derivatives and gradients. This puts certain requirements on the kinds of functions that we can minimize and the kinds of spaces that we can work with. These are requirements that are difficult to have precise analogies in biology. What does it mean to have an inner product in the brain? We does twice continuously differentiable mean in the context of a neuron? Even if there is a minimization principle, which I am not sure there is or is not, if ML uses algorithms, which are fundamentally not realizable in biology, how can we say it replicates the brain?

beachbum8029 · on July 18, 2017

Based on what goes on in every cell in our bodies when it comes to the information processing involved with DNA, I don't think there is any such algorithm which is fundamentally not realizable in biology. I'll grant you, I don't think biological neurons are calculating derivatives across connection strengths, but there must be some analogous process to control neural connection strengths.

kxyvr · on July 18, 2017

That may very well be and I think it's a fantastic area to do research on. Namely, can we accurately model the body with an algorithmic process and what does this process look like? However, unless ML directly mirrors: the algorithms involved in the body, the models used by the body, and the the misfit function used by the body, which together already assumes that the body really does operate on a strict minimization principle, then I contend it's improper to anthropomorphize the algorithms. They're good algorithms, but a better name would be empirical modeling since we're creating models from empirical data.

s-macke · on July 17, 2017

You might find a slide of my talk interesting:

https://ibb.co/fXAn4a

You have to read it from left to right with an twinking eye of course ;)

mojomark · on July 17, 2017

In your slide - why is back propogation a further stretch from a true bio-NN than an ANN without back propogation?

s-macke · on July 17, 2017

An ANN still resembles major features of an bio-NN.

1. A network

2. Flow of information is mainly unidirectional through a node

3. Multiple inputs, but one output, which is connected to the inputs of other neurons.

4. The connection strength between 2 neurons can be changed.

5. Non-linear behavior.

After all, I think, this is not such a bad first approximation. Hence the picture in the middle.

But I cannot believe that we learn by comparing thousands or millions of input and output patterns and back propagate the error through the network to perform a gradient descent at the neurons. That is simply not, what our brain does.

sqeaky · on July 17, 2017

When there is feedback in neurons, what do you think that conveys?

I agree it is not some simple error correction like what is propagated backwards, but it happens often and I presume its something useful or it wouldn't be there.

Cybiote · on July 17, 2017

Top down predictions are likely mediated by feedback connections from higher to lower areas. Functions include possibly encoding a generative prior for prediction, speeding up inference. They also play an important role in coding more informative error signals than simple derivatives and are part of how the brain learns even as it predicts.

sjg007 · on July 18, 2017

This is only true because we don't know how the brain actually works. But the NN architecture is not unreasonable, it maps structures seen in the brain. Backpropagation is also reasonable to abstract the changes in gene and protein regulation (e.g. how learning could be encoded).

nerdponx · on July 17, 2017

Well said. It's just curve fitting.

pishpash · on July 17, 2017

Maybe everything is "curve fitting." -- Note: I think it's more hierarchical than that but curve fitting is certainly one of the important capabilities of biological systems.

kxyvr · on July 17, 2017

I don't think so. There's an incredibly important art and science to model selection that is not encapsulated in curve fitting. For example, say we observe a boy throwing a ball and we want to predict where the ball will land. From basic physics, we know the model is `y = 0.5 a t^2 + v0 t + y0` where `a` is the acceleration due to gravity, `v0` is the initial velocity, and `y0` is the initial height. After observing one or two thrown balls, even with error, we can estimate the parameters `a`, `v0`, and `y0` relatively well. Alternatively, we could apply a generic machine learning model to this problem. Eventually, it will work, but how much more data do we need? How many additional parameters do we need? Do the machine learning parameters have physical meaning like those in the original model? In this case, I contend the original model is superior.

Now, certainly, there are cases where we don't have a good or known model and machine learning is an extremely important tool for analyzing these cases. However, the process of making this determination and choosing what model to use is not solved by curve fitting or machine learning. This is a decision made by a person. Perhaps some day that will change, and that will be a major advance in intelligent systems, but we don't have that now and it's not clear to me how extending existing methods will lead us there.

Basically, I agree with the sentiment of the grandparent post. Machine learning is largely just curve fitting. How and when to apply a machine learning model vs another model is currently a decision left up to the user.

pishpash · on July 17, 2017

You're talking about the complexity of the model. If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space. There is no _conceptual_ problem here. There is perhaps a complexity problem, but that's why I wrote that "I think it's more hierarchical than that."

randcraw · on July 18, 2017

I disagree. Many problem spaces are not continuous and can involve incomplete information that make a continuous model like a curve useless.

For instance, a linguistic model that lacks definitions for some words, or which allows too much ambiguity can leave sentences unparsable or uninterpretable. Disruptions to word order in sentences can lose sufficient information that no curve or fitment can recover it. A curve has to capture sufficient information for fitting it to be useful. I think not all concepts or relations are amenable to N-dimensional cartesian representation. (Though I'd like to see a reference confirming this.)

And hidden interdependence between dimensions can make any curve drawn in that coordinate space a misrepresentation of the actual info space, and any curve fit in it, dysfunctional.

Any mapping of info onto a cartesian coordinate space presumes constraints that limit the utility of any function that across that space. So no curve is guaranteed to be meaningful in "the real world" unless those assumptions are conserved upon reentry from the abstract world.

George Box's "All models are wrong, but some are useful" suggests that while fitting curves in wrong models may be possible, it well may be form without function.

eli_gottlieb · on July 17, 2017

>If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space.

Not all spaces are Euclidean, and "purely input-output" still contains a lot of room for counterfactuals that ML models fail to capture.

nerdponx · on July 18, 2017

What do you mean by counterfactuals? NNs are function approximation algorithms, in any geometry. No ifs ands or buts about it.

eli_gottlieb · on July 18, 2017

Oh, I agree that neural networks are function approximators with respect to some geometry. When I say "counterfactuals", I'm talking about typical Bayes-net style counterfactuals, but as also used in cognitive psychology. We know that human minds evaluate counterfactual statements in order to test and infer causal structure. We thus know that neural networks are insufficient for "real" cognition.

SomeStupidPoint · on July 17, 2017

You seem to have replied on a tangent: how is what you describe not just "curve fitting"?

Humans didn't magic that model up: you're ignoring the huge amount of human effort over thousands of years that it took to arrive at that model. If we gave a ML algorithm a similar amount of time and asked it to construct a simple model of the situation, it might very well hand back the formula you presented.

Your entire post basically begs the question: it supposes that humans are doing something that isn't "curve fitting", and then uses that to argue that they do more.

What, specifically, are you supposing can't be done by "curve fitting"?

kxyvr · on July 17, 2017

I believe the process for deriving fundamental physical models differs from the techniques used in ML. For example, say we want to use the principle of least action to derive an expression for energy similar to what Landau and Lifshitz derive in their book Mechanics. Here, we assume that the motion of a particle is defined by its position and velocity. We assume that the motion of the particle is defined by an optimization principle. We assume Galilean invariance. We assume that space and time are homogeneous is isotropic. Then, putting this all together we can derive an expression for energy that `E=0.5 m v^2`. At this point, we can validate our model with a series of experiments that curve fits this expression to the results.

Alternatively, we could just run a bunch of experiments on data using ML models. Eventually, someone may have a wonderful idea and realize that we can just reduce the ML model into a parabola. Of course, this is due to intuition and not the ML model. Nevertheless, even though we end up at the same result, I contend the first result is different. It has a huge amount of information embedded into it about the assumptions we made into how the world works. When those assumptions are no longer satisfied, we have a rubric for constructing a fix. For example, if Galilean invariance no longer holds, we can fix the above model using the same sort of derivations to obtain relativistic expressions. Again, we could just throw more data at this new problem and fit an ML model to and perhaps someone would stare at this new model and realize that `E = m c^2`. However, I think that's discounting the embedded information in deriving these models and I don't think this information is present in ML models. ML models are generic. Our most powerful physical models are not.

Now, sure, once we have the models, we're just going to fit them to the data and it's all just curve fitting. Other fields call this parameter estimation, parameter identification, or a variety of other names. At that point it's all curve fitting. However, again, I contend the process for determining a new model is not.

nerdponx · on July 18, 2017

Of course. "What do I fit this curve to" is a prerequisite to "what is the shape of this curve?"

You shouldn't feel the need to defend theory-based modeling against some imagined incursion from arrogant deep learning researchers. NNs work tremendously well in a few specific problem domains that we had no way to approach otherwise. Elsewhere, they're not much better than any other prediction algorithm. By the way XGBoost is curve-fitting, too.

kxyvr · on July 19, 2017

I very much agree! Barring some kind of special intuition to the problem, I think ML are a fantastic tool for building models from empirical data. Even with intuition, sometimes they work as well. My core argument is that anthropomorphizing the algorithms has led to a great deal of confusion as to when we should or should not use these models. I often do computational modeling work with engineers and many of them are starting to eschew good, foundationaly sound models for ML not because they work better, in fact, on many of these problems they work far, far worse, but because good computational modeling is hard and it sounds like all they have to do with ML is teach the algorithms how physics works and how to be an engineer. Since they're good teachers, they should be able to teach the algorithm, right? In reality, it's still dirty, grinding computational modeling work. If we just called these models what they really are, empirical models, I think there'd be far less confusion as to when they should be used.

SomeStupidPoint · on July 17, 2017

You haven't explained how the first case isn't "curve fitting": the agents performing the compilation of those facts into the new fact are just spitting out the "best" fit string of symbols based on learned rules, etc etc. Somethings computers can (theoretically) do, and which fits the description "curve fitting" just fine. School (and other education) is training the model they're using to do that compilation, but it's still just "curve fitting" based on reward/punishment signals.

What part of that can't an ML agent learn to do?

From my perspective, you're just describing the "higher order" layers of the network and pretending that humans aren't actually running those functions embedded on deep networks, then proclaiming that deep networks can't do it.

kxyvr · on July 17, 2017

Alright, so from my perspective, curve fitting consists of three things

1. Definition of a model. ML models like multilayer perceptrons used a superposition of sigmoids, but newer models have superpositions of other functions and more nested hierarchies.

2. A metric to define misfit. Most of the time we use least squares because it's differentiable, but other metrics are possible.

3. An optimization algorithm to minimize misfit. Backpropogation is a combination of an unglobalized steepest descent combined with automatic differentiation like algorithm to obtain the derivatives. However, there is a small crowd that uses Newton methods.

Literally, this means curve fitting is something like the problem

min_{params) 0.5 sum_i || model(params,input_i) - output_i ||^2

Of course, there's also a huge number of assumptions in this. First, optimization requires a metric space since we typically want to make sure we're lower than all the points surrounding it. Though, this isn't all that helpful from an algorithmic point of view, so we really need an complete inner product space in order to derive out optimality conditions like the gradient of the objective being zero. Alright, fine, that means if we want to do what you say then we need to figure out how to compile these facts into a Hilbert space. Maybe that's possible and it raises some interesting questions. For example, Hilbert spaces have the property that `alpha x + y` also lie in the vector space. If `x` is an assumption like Galilean invariance and `y` is an assumption that time and space are isotropic, I'm not sure what the linear combination would be, but perhaps it's interesting. Hilbert spaces also require inner products to be well defined and I'm not sure what the inner product between these two assumptions are either. Of course, we don't technically need a Hilbert or Banach space to optimize. Certainly, we lose gradients and derivatives, but there may be something else we can do. Of course, that would involve creating an entire new field of computational optimization theory that's not dependent on derivatives and calculus, which would be amazing, but we don't currently have one.

From a philosophical point of view, there may be a reasonable argument that everything in life is mapping inputs to outputs. From a practical point of view, this is hard and the foundation upon which ML is cast is based on certain assumptions like the three components above, which have assumptions on the structures we can deal with. Until that changes, I continue to contend that, no, ML does not provide a mechanism for deriving new fundamental physical models.

Gtifn · on July 17, 2017

What do you think about a bayesian interpretation of the above as MAP/MLE?

https://arxiv.org/abs/1706.00473

kxyvr · on July 17, 2017

Unless I'm missing something, and I likely am, the linked paper is still based on the the fundamental assumptions behind curve fitting that I listed above. Namely, their optimization algorithms, metrics, and models are still based on Hilbert spaces even though they've added stochastic elements and more sophisticated models.

nerdponx · on July 19, 2017

Interesting abstract. I love Bayesian stats so hopefully this will be a fun commute read. Thanks!

nerdponx · on July 18, 2017

I think you're reading way too far into my post. I was just pointing out that our amazing AI revolution is really just a new type of function approximation being that has magical-seeming results.

SomeStupidPoint · on July 18, 2017

I can't think of a succinct way to describe my response, but I'm not sure we disagree, so much as we're talking about slightly different things.

Regardless, I wanted to thank you for the detailed replies -- having a back and forth helped me ponder my thoughts on the matter.

Have a good one. (:

kxyvr · on July 18, 2017

Thanks for chatting!

visarga · on July 17, 2017

> Eventually, it will work, but how much more data do we need?

For a model that small, with so little variance (assume you measure correctly where the ball lands) it would be enough to do just a few throws to fit the parameters.

curiousgal · on July 17, 2017

I hope Elon Musk understands that.

kinkrtyavimoodh · on July 17, 2017

I am sure he does.

urethrafranklin · on July 18, 2017

His public statements would indicate otherwise.

nerdponx · on July 19, 2017

Consider that his public statements are made on the advice of his publicist, and that encouraging the AI hype is self-serving.

dboreham · on July 17, 2017

Or finding eigenvalues.

SomeStupidPoint · on July 17, 2017

Is what you do not "just curve fitting"?

eanzenberg · on July 17, 2017

The anthropomorphization was done by academic researchers to gain/increase funding for themselves and the field. You can read the papers and see. This is commonly done for marketing purposes and is important since the pool of research money can be limited.

posterboy · on July 18, 2017

I always counter, the intelligence is not in the machine, but the builder. Antropomorphism is in line with that, because it projects the human qualities onto the machine, because, in a broad sense, they are modelled after those. Egoistic as we are, that's the only way to understand anything, to remove the shizm between animate and inanimate objects. Just like a fishing rod is just the extension of an arm.

ankurdhama · on July 18, 2017

> The model's intuition doesn't work like a human's

The model doesn't have intuition, it is just a series of computations.

toisanji · on July 17, 2017

There is some good information in there and I agree with the limitations he states, but his conclusion is completely made up.

"To lift some of these limitations and start competing with human brains, we need to move away from straightforward input-to-output mappings, and on to reasoning and abstraction."

There are tens of thousands of scientists and researchers who are studying the brain from every level and we are making tiny dents into understanding it. We have no idea what the key ingredient is , nor if it is 1 or many ingredients that will take us to the next level. Look at deep learning, we had the techniques for it since the 70's, yet it is only now that we can start to exploit it. Some people think the next thing is the connectome, time, forgetting neurons, oscillations, number counting, embodied cognition,emotions,etc. No one really knows and it is very hard to test, the only "smart beings" we know of are ourselves and we can't really do experiments on humans because of laws and ethical reasons. Computer Scientists like many of us here like to theorize on how AI could work, but very little of it is tested out. I wish we had a faster way to test out more competing theories and models.

eli_gottlieb · on July 17, 2017

>I wish we had a faster way to test out more competing theories and models.

Luckily, the state of actual cognitive science and neuroscience is fairly far ahead of, "Gosh there's all these things and we just don't know." Unfortunately, MIT-style cogsci hasn't generated New Jerseyan fast-though-wrong algorithms for Silicon Valley to hype up, so the popular press keeps spreading the myth of our total ignorance.

Besides which, we do know what's missing from deep learning: the ability to express anything other than a trivial Euclidean-space topological structure. We know that real data is sampled from a world subject to cause-and-effect, and that any manifold describing the data should carry the causal structure in its own topology.

_ntka · on July 17, 2017

Hardly a "made-up" conclusion -- just a teaser for the next post, which deals with how we can achieve "extreme generalization" via abstraction and reasoning, and how we can concretely implement those in machine learning models.

bjt · on July 17, 2017

> we can't really do experiments on humans because of laws and ethical reasons.

Ethics and laws constrain but do not forbid experimenting on humans. We do experiments on humans all the time, including experiments on how people learn and reason. There are numerous academic journals devoted to these topics.

landon32 · on July 18, 2017

I don't think he's talking about literally mimicking how the human brain works. It seems like he's just talking about making neural nets more effective in certain tasks by allowing for more types of abstraction, just like a human brain has more types of abstraction than Artificial Neural Networks do.

Houshalter · on July 18, 2017

This article is a bit misleading. I believe NNs are a lot like the human brain. But just the lowest level of our brain. What psychologists might call "procedural knowledge".

Example: learning to ride a bike. You have no idea how you do it. You can't explain it in words. It requires tons of trial and error. You can give a bike to a physicist that has a perfect deep understanding of the laws of physics. And they won't be any better at riding than a kid.

And after you learn to ride, change the bike. Take one where the handle is inversed. And turning it right turns the wheel left. No matter how good you are at riding a normal bike, no matter how easy it seems it should be, it's very hard. Requires relearning how to ride basically from scratch. And when you are done, you will even have trouble going back to a normal bike. This sounds familiar to the problems of deep reinforcement learning, right?

If you use only the parts of the brain you use to ride a bike, would you be able to do any of the tasks described in the article? E.g. learn to guide spacecraft trajectories with little training, through purely analog controls and muscle memory? Can you even sort a list in your head without the use of pencil and paper?

Similarly recognizing a toothbrush as a baseball bat isn't as bizarre as you think. Most NNs get one pass over an image. Imagine you were flashed that image for just a millisecond. And given no time to process it. No time to even scan it with your eyes! You certain you wouldn't make any mistakes?

But we can augment NNs with attention, with feedback to lower layers from higher layers, and other tricks that might make them more like human vision. It's just very expensive.

And that's another limitation. Our largest networks are incredibly tiny compared to the human brain. It's amazing they can do anything at all. It's unrealistic to expect them to be flawless.

BooglyWoo · on July 18, 2017

It's a good article in a lot of ways, and provides some warnings that many neural net evangelists should take to heart, but I agree it has some problems.

It's a bit unclear whether Fchollet is asserting that (A) Deep Learning has fundamental theoretical limitations on what it can achieve, or rather (B) that we have yet to discover ways of extracting human-like performance from it.

Certainly I agree with (B) that the current generation of models are little more than 'pattern matching', and the SOTA CNNs are, at best, something like small pieces of visual cortex or insect brains. But rather than deriding this limitation I'm more impressed at the range of tasks "mere" pattern matching is able to do so well - that's my takeaway.

But I also disagree with the distinction he makes between "local" and "extreme" generalization, or at least would contend that it's not a hard, or particularly meaningful, epistemic distinction. It is totally unsurprising that high-level planning and abstract reasoning capabilities are lacking in neural nets because the tasks we set them are so narrowly focused in scope. A neural net doesn't have a childhood, a desire/need to sustain itself, it doesn't grapple with its identity and mortality, set life goals for itself, forge relationships with others, or ponder the cosmos. And these types of quintessentially human activities are what I believe our capacities for high-level planning, reasoning with formal logic etc. arose to service. For this reason it's not obvious to me that a deep-learning-like system (with sufficient conception of causality, scarcity of resources, sanctity of life and so forth) would ALWAYS have to expend 1000s of fruitless trials crashing the rocket into the moon. It's conceivable that a system could know to develop an internal model of celestial mechanics and use it as a kind of staging area to plan trajectories.

I think there's a danger of questionable philosophy of mind assertions creeping into the discussion here (I've already read several poor or irrelevant expositions of Searle's Chinese Room in the comments). The high-level planning, and "true understanding" stuff sounds very much like what was debated for the last 25 years in philosophy of mind circles, under the rubric of "systematicity" in connectionist computational theories of mind. While I don't want to attempt a single-sentence exposition of this complicated debate, I will say that the requirement for "real understanding" (read systematicity) in AI systems, beyond mechanistic manipulation of tokens, is one that has been often criticised as ill-posed and potentially lacking even in human thought; leading to many movements of the goalposts vis-à-vis what "real understanding" actually is.

It's not clear to me that "real understanding" is not, or at least cannot be legitimately conceptualized as, some kind of geometric transformation from inputs to outputs - not least because vector spaces and their morphisms are pretty general mathematical objects.

EDIT: a word

glenstein · on July 18, 2017

I similarly find myself frustrated with philosophy of mind "contributions" to conversations on deep learning/consciousness/AI. There seems to be a lot of equivocation between the things you label as (a) and (b) above, and a lot of apathy toward distinguishing between them. But (a) and (b) are completely different things, and too often it seems like critics of computers doing smart things treat arguments for one like they are arguments for the other.

Probably the most famous AI critic, Hubert Dreyfus, said "current claims and hopes for progress in models for making computers intelligent are like the belief that someone climbing a tree is making progress toward reaching the moon." But it is progress. Because by climbing a tree I've gained much more than height. I actually did move toward the moon. I've gained the insight that I'm using the right principle.

CountSessine · on July 17, 2017

Surely we shouldn't rush to anthropomorphize neural networks, but we'd ignoring the obvious if we didn't at least note that neural networks do seem to share some structural similarities with our own brains, at least at a very low level, and that they seem to do well with a lot of pattern-recognition problems that we've traditionally considered to be co-incident with brains rather than logical systems.

The article notes, "Machine learning models have no access to such experiences and thus cannot "understand" their inputs in any human-relatable way". But this ignores a lot of the subtlety in psychological models of human consciousness. In particular, I'm thinking of Dual Process Theory as typified by Kahneman's "System 1" and "System 2". System 1 is described as a tireless but largely unconscious and heavily biased pattern recognizer - subject to strange fallacies and working on heuristics and cribs, it reacts to it's environment when it believes that it recognizes stimuli, and notifies the more conscious "System 2" when it doesn't.

At the very least it seems like neural networks have a lot in common with Kahneman's "System 1".

eli_gottlieb · on July 17, 2017

>In particular, I'm thinking of Dual Process Theory

Which has been at least partly debunked as psychology's replication crisis went on, and has been called into question on the neuroscientific angle as well.

CountSessine · on July 17, 2017

Partly, yes - especially with ego depletion on the ropes. I'm not sure that dual process theory needs to be thrown out along with ego depletion, though.

eli_gottlieb · on July 17, 2017

I can see three reasons to "throw it out":

1) Replication failure, plain and simple.

2) Overfitting. There are dozens to hundreds of "cognitive biases" on lists: https://en.wikipedia.org/wiki/List_of_cognitive_biases. When you have hundreds of individual points, you really ought to draw some principles, and the principle should not be, "The system generating all this is rigid and inflexible."

3) Imprecision! Again, dozens to hundreds of cognitive biases. What possible behavior or cognitive performance can't be assimilated into the heuristics and biases theory? What can falsify it overall, even after so many of its individual supporting experiments and predictions have fallen down?

It looks like a mere taxonomy of observations, not a substantive theory.

CountSessine · on July 17, 2017

1) Replication failure, plain and simple.

How many meta-analyses have been conducted as of 2017 showing one result or the other? I don't think ego depletion itself has been thoroughly "debunked" yet. If it is a real effect, it's probably quite small - but I don't think that ego depletion has been thrown in the bin just yet.

2) Overfitting. There are dozens to hundreds of "cognitive biases" on lists: https://en.wikipedia.org/wiki/List_of_cognitive_biases. When you have hundreds of individual points, you really ought to draw some principles, and the principle should not be, "The system generating all this is rigid and inflexible."

3) Imprecision! Again, dozens to hundreds of cognitive biases. What possible behavior or cognitive performance can't be assimilated into the heuristics and biases theory? What can falsify it overall, even after so many of its individual supporting experiments and predictions have fallen down?

Wait a second - has anyone ever tried to explain the "IKEA Effect" using Dual Process Theory? What does a laundry-list of supposed cognitive biases have to do with the theory? Is anyone really trying to explain/predict all this almanac-of-cognitive-failings with Dual Process?

eli_gottlieb · on July 17, 2017

>Is anyone really trying to explain/predict all this almanac-of-cognitive-failings with Dual Process?

To my understanding, yes. That's basically what Dual Process theories exist for: to separate the brain into heuristic/bias processing as one process, and computationally expensive model-based cause-and-effect reasoning as another process. Various known cognitive processes or results are then sort of classified on one side of the line or another.

When you apply Dual Process paradigms to specific corners of cognition, they can be useful. For example, I've seen papers purporting to show that measured uncertainty allows model-free and model-based reinforcement learning algorithms to trade off decision-making "authority". This is less elegant than an explicitly precision-measuring free-energy counterpart, but it's still a viable hypothesis about how the brain can implement a form of bounded rationality when bounded in both sample data and compute power.

But when you scale Dual Processes up to a whole-brain theory, it's just too good at describing anything that involves dichotomizing into a "fast-and-frugal" form of processing and another expensive, reconstructive form of processing. One of the big issues here is that besides the potentially false original evidence for Dual Processes, we don't necessarily have reason to believe there exists any dichotomy, rather than a more continuous tradeoff between frugal heuristic processing and difficult reconstructive processing. The precision-weighting model-selection theory actually makes much more sense here.

CountSessine · on July 17, 2017

This is a fantastic answer - thank you, Eli. So what do you think of the original article?

eli_gottlieb · on July 17, 2017

>This is a fantastic answer - thank you, Eli.

Thanks! I've been doing a lot of amateur reading in cog-sci and theoretical neurosci. The subject enthuses me enough that I'm applying to PhD programs in it this upcoming season.

>So what do you think of the original article?

Thorough and accurate. I'll give a little expansion via my own thought. One thing taught in every theoretically-focused ML class is the No Free Lunch Theorem. In colloquial terms it says, "If you don't make some simplifying assumptions about the function you're trying to learn (and the distribution noising your data), you can't reliably learn."

I think experts learn this, appreciate it as a point of theory, and then often forget to really bring it back up and rethink it where it's applicable. All statistical learning takes place subject to assumptions of "niceness". Which assumptions, though?

Seems to me like:

* If you make certain "niceness" assumptions about the functions in your hypothesis space, but few to none about the distribution, you're a Machine Learner.

* If you make niceness assumptions about your distribution, but don't quite care about the generating function itself, you're an Applied Statistician.

* If you make niceness assumptions about your data, that it was generated from some family of distributions on which you can make inferences, you're a fully frequentist or Bayesian statistician.

* If you want to make almost no assumptions about the generating process yielding the data, but still want just enough assumptions to make reasoning possible, you may be working in the vicinity of any of cognitive science, neuroscience, or artificial intelligence.

The key thing you always have to remind yourself is: you are making assumptions. The question is: which ones? The original article reminds us of a whole lot of the assumptions behind current deep learning:

* The "layers" we care about are compositions of a continuous nonlinear function with a linear transform.

* The functions we care about are compositions of "layers".

* The transforms we care about are probably convolutions or just linear-and-rectified, or just linear-and-sigmoid.

* Composing layers enables gradient information to "fan out" from the loss function to wider and wider places in the early layers.

* The data spaces we care about are usually Euclidean.

These are things every expert knows, but which most people only question when it's time to look at the limitations of current methods. The author of the original article appears well-versed in everything, and I'm really excited to see what they've got for the next part.

language · on July 17, 2017

"Machine learning models have no access to such experiences and thus cannot "understand" their inputs in any human-relatable way"

It may be that distinctions like the one you're describing here are useful to make, but I don't think this claim refutes the possibility of ML "fitting a particular piece within a larger, yet unarticulated model."

I think the assertion is more that our current ways of representing elements of human experience are necessarily very lossy - or that there's some aspect of the situation that you can't describe/implement in terms of models of neural nets.

visarga · on July 17, 2017

The problem with neural nets is that they have a fixed input type - tensors or sequences. For example, imagine the task is to count objects in an image and say if the number of red objects is equal to the number of green objects. You make a net that solves this situation. Then you want to change the colors, or add an extra color, and it will fail. Why - because it learns a fixed input representation.

What neural nets need is to change their data format from plain tensors to object-relation graphs. The input of the network is represented as a set of objects that have relations among them, and the network has to be permute invariant to the order of presentation. An implementation is Graph Convolutional Nets. They learn to compose concepts in new ways and once they learned to count, compare, select by color, they can solve any combination of those concepts as well. That way the nets generalize better and transfer knowledge from a problem to the next.

Graphs are able to reduce the complexity of learning a neural net that can perform flexible tasks. But in order to get to even better results, it is necessary to add simulation to the mix. By equipping neural nets with simulators, we can simplify the learning problem (because the net doesn't have to learn the dynamics of the environment as well, just the task at hand). Examples of simulators used in DL are AlphaGo, the Reinforcement Learning applications on Atari Games, protein/drug property prediction, generative adversarial networks (in a way).

The interesting thing is that graphs are natural for simulation. They can represent objects as vertices and relations as edges, and by signal propagation the graph works like a circuit, a simulator, producing the answer. My bet is on graphs + simulators. That's how we get to the next level (abstraction and reasoning). DeepMind seems to be particularly focused on RL, games and recently, relation networks. There is also work on making dynamic routing in neural nets, in fact applying graphs implicitly inside the net, by multiple attention heads.

nilkn · on July 17, 2017

There is actually work being done on this problem, at least to some extent. A DNC, for instance, can accept variable-structure inputs by storing each piece in its external memory bank. This is illustrated in the original Nature paper by feeding in an arbitrary graph definition piece by piece, then feeding in a query about the graph.

This doesn't necessarily address all the nuances of your post, but I do believe it's a step in the right direction. It pushes networks from

"learn how to solve this completely statically defined problem via sophisticated pattern matching"

to

"learn how to interpret a query, drawn from some restricted class of possible queries; accept variable-structure input to the query; strategize about techniques for answering the query; and finally compute the answer, possibly over multiple time-steps"

siliconc0w · on July 17, 2017

A neat technique to help 'explain' models is LIME: https://www.oreilly.com/learning/introduction-to-local-inter...

There is a video here https://www.youtube.com/watch?v=hUnRCxnydCc

I think this has some better examples than the Panda vs Gibbon example in the OP if you want to 'see' why a model may classify a tree-frog as a tree-frog vs a billiard (for example). IMO this suggests some level of anthropomorphizing is useful for understanding and building models as the pixels the model picks up aren't really too dissimilar to what I imagine a naive, simple, mind might use. (i.e the tree-frog's goofy face) We like to look at faces for lots of reasons but one of them probably is because they're usually more distinct which is the same, rough, reason why the model likes the face. This is interesting (to me at least) even if it's just matrix multiplication (or uncrumpling high dimensional manifolds) underneath the hood,

cm2187 · on July 17, 2017

I think the requirement for a large amount of data is the biggest objection to the reflex "AI will replace [insert your profession here] soon" that many techies, in particular on HN, have.

There are many professions where there is very little data available to learn from. In some case (self-driving), companies will invest large amount of money to build this data, by running lots of test self-driving cars, or paying people to create the data, and it is viable given the size of the market behind. But the typical high-value intellectual profession is often a niche market with a handful of specialists in the world. Think of a trader of financial institutions bonds, or a lawyer specialized in cross-border mining acquisitions, a physician specialist of a rare disease or a salesperson for aviation parts. What data are you going to train your algorithm with?

The second objection, probably equally important, also applies to "software will replace [insert your boring repetitive mindless profession here]", even after 30 years of broad adoption of computers. If you decide to automate some repetitive mundane tasks, you can spare the salary of the guys who did these tasks, but now you need to pay the salary of a full team of AI specialists / software developers. Now for many tasks (CAD, accounting, mailings, etc), the market is big enough to justify a software company making this investment. But there is a huge number of professions where you are never going to break even, and where humans are still paid to do stupid tasks that a software could easily do today (even in VBA), and will keep doing so until the cost of developing and maintaining software or AI has dropped to zero.

I don't see that happening in my life. In fact I am not even sure we are training that many more computer science specialists than 10 years ago. Again, didn't happen with software for very basic things, why would it happen with AI for more complicated things.

visarga · on July 17, 2017

I'd say on the contrary, the problem with experts is that they are so expensive to train and so rare. It is easier to collect data, train the AI and then equip doctors all over the world with it than to have thousands of experts in that particular field.

A doctor that treats patients all day long doesn't have time to keep up with the research and state of the art. A researcher that is on the cutting edge of medicine doesn't have time to treat the patients. We need to equip doctors with AIs to keep them up to date with the best practices.

cm2187 · on July 17, 2017

well, you are mentioning an example where:

- there is data

- there is a wide market that could justify large investments in AI

With this combination, yeah I can see AI being used. In fact medecine is one of the few professions that never industrialised. But there are loads of other professions where either or none of the conditions above are met.

If you are talking about a doctor specialised in a rare disease, where there is very little data, and very few patients to cure, how do you think AI will replace that?

webmaven · on July 18, 2017

> If you are talking about a doctor specialised in a rare disease, where there is very little data, and very few patients to cure, how do you think AI will replace that?

Well, since transfer learning is a thing, you would start with a general purpose medical system and then train it on what little data you do have on the rare disease to produce an appropriate model (which isn't too different from the way a human expert is produced). In fact, I would assume that the first such systems will be created and used by the researchers focusing on rare diseases.

randcraw · on July 18, 2017

Yes, experts will use AI to build better tools that will be accessible via computer and by less-expert physicians to augment their skill base in diagnosis. Trying to fully automate the physician promises to be so difficult and expensive (and legally fraught) that the value it adds will be nowhere near worth the investment. No doubt a few direct-to-patient AI-based apps and services will arise in this space, and maybe AI will allow generalists will extend their reach further into the space of specialists. But robot doctors will remain the stuff of fantasy for many decades yet, I suspect.

_ibu9 · on July 17, 2017

> and will keep doing so until the cost of developing and maintaining software or AI has dropped to zero.

I have no idea about the progress of AI, but normal software will get an order of magnitude cheaper to develop as we slowly wake up from the Unix/worse-is-better/everything-is-text mindset and abandon the dynamically typed and imperative languages, broken systems abstractions, etc. that hold us back.

cm2187 · on July 17, 2017

I think it's a more fundamental problem than the choice of languages (though I share your griefs!).

To the vast majority of the educated population, software is very much a black art and people would have no idea of how to do even the most basic things. That's of course true for more senior people, but I find that it is as true for the generation who graduates today. They can do incredible things with their smartphone that I didn't suspect was possible, but wouldn't know where to start to code something.

Until this skill gap changes dramatically, and that everyone gets out of high school with basic knowledge in programming, like they have basic knowledge of maths, biology, physics or history, this gap will never close.

eli_gottlieb · on July 17, 2017

May God will it that the heathens should see the light and come to Haskell ;-).

sadlion · on July 17, 2017

I sincerely would like to know what you think the alternatives are?

goatlover · on July 17, 2017

Sounds something like Haskell with a Smalltalk environment. Functional, statically typed with powerful type extensions, but with an image instead of text files that you modify.

From just using Jupyter Notebooks, I can see the appeal of working with a live environment, and it's just a fancy REPL, not a full Lisp or Smalltalk environment.

cm2187 · on July 17, 2017

If it has to be a general public language, I'm afraid it will have to be light on special characters and abbreviations or acronyms that made sense 30 years ago. I'd say a Basic or Python-like language, but modernised, and with strong typing to enable the IDE to help a lot the users with auto-completion and error checking.

But if you think about it, most business users are even intimidated by VBA. So it will have to be very fluffy, and I don't think you can spare the mandatory coding 101 teaching at school.

bpicolo · on July 17, 2017

Programming is describing a solution space, and we describe things with words. I don't see how anything but text/speech would map to that aspect of programming.

emodendroket · on July 17, 2017

I'm not holding my breath for that development

meh2frdf · on July 17, 2017

Correct me if I'm wrong but I don't see that with 'deep learning' we have answered/solved any of the philosophical problems of AI that existed 25 years ago (stopped paying attention about then).

Yes we have engineered better NN implementations and have more compute power, and thus can solve a broader set of engineering problems with this tool, but is that it?

nv-vn · on July 18, 2017

Yep. The whole machine learning craze is just fueled by the fact that it's now feasible to create models for handwriting/voice/image recognition that actually work reliably. But in terms of the underlying technology, we haven't had some "breakthrough" that explained how the brain works or anything even close to that.

randcraw · on July 18, 2017

I 90% agree, but if that were all there was to it, we should have been able to achieve the same level of success decades earlier, just by running our ML code longer. Given enough time, old slow AI software should have produced the same analytical results as deep learning does today.

But that's not the case. Deep nets can model vastly more information / state than any other AI/ML method. Once Hinton (and others) showed how to train NNs with more than three layers (ca. 2006) it was finally possible to learn and store all that state. Then with the rise of GPGPUs soon after, deep nets became efficient as well. Thereafter several tasks that had been infeasible even using curated information became amenable to mostly brute force learning strategies driven only by labeled examples -- just lots of 'em.

The question now is how far can we extend DL's tools and examples. Are they sufficient to build higher level cognitive AI agents. Must AGI employ many thousands of deep nets? Or can all those specific-skill nets be folded together somehow into one unified "deep mind"?

Like you, I'm doubtful that today's very specific successes in DL will lead to higher level cognition in the foreseeable future. That path isn't at all clear to me.

landon32 · on July 18, 2017

This is totally true, but I think it's still important to note that while something like Artificial General Intelligence is still way beyond the state of the art, the state of the art still has a huge impact on the world. A tiny slice of that can be seen in autonomous vehicles and the impact that they seem poised to have.

brittohalloran · on July 18, 2017

Don't underestimate the self fulfilling prophecy effect. Quite possible that the massive influx into the field right now will move the needle.

landon32 · on July 18, 2017

Hmm, sometimes I think that we won't get super close to AGI until we can actually model something the size of a Human Brain (in terms of neurons or Synapses). Human Brain has 1B+ Neurons, or 4Qu+ synapses. So that's 12.5 GB all at once to deal with, if you're representing neurons as either 0 or 1. However, in reality Neurons are much more complicated, and could only treat them as binary if you have a neuromorphic computer. So we would need to deal with many many times that many GB at once, even if we had really efficient ways of storing the data.

That's a lot of data to deal with, especially since you need to train it, running huge computations using each neuron.

I know nothing about hardware, and this is a very crude prediction/estimation of how AGI would happen, but my point is that we might be limited by Hardware for a few more years.

mehh · on July 18, 2017

I fully agree with the above responses, and I am optimistic about major break throughs, however like many of you guys, I don't think we should just assume a bit more horsepower and things will magically work.

landon32 · on July 18, 2017

Yeah, I think the author's just priming the pump for a few more posts in this series that show a new way of abstraction/new NN architecture that solves some new problems.

Doesn't seem like he's trying to claim anything philosophical.

tiborsaas · on July 17, 2017

> In short, deep learning models do not have any understanding of their input, at least not in any human sense. Our own understanding of images, sounds, and language, is grounded in our sensorimotor experience as humans—as embodied earthly creatures.

Well maybe we should train systems with all our sensory inputs first, like newborns leans about the world. Then make these models available open source like we release operating systems so others can build on top of that.

For example we have ImageNet, but we don't have WalkNet, TasteNet, TouchNet, SmellNet, HearNet... or other extremely detailed sensory data recorded for an extended time. And these should be connected to match the experiences. At least I have no idea they are out there :)

jonobelotti · on July 17, 2017

Brooks' 'Intelligence Without Representation' (http://people.csail.mit.edu/brooks/papers/representation.pdf) starts with a pretty strong argument imo against the story of 'stick-together' AGI you're describing.

randcraw · on July 18, 2017

I think Brooks' Cog initiative was an attempt to 'ground' the robot's perceptions of the physical world into forming a rich scalable representation model. But it looks like that line of investigation ended ~2003 with Brooks' retirement. Too bad, given the seeming suitability of using deep nets to implement it.

http://www.ai.mit.edu/projects/humanoid-robotics-group/cog/o...

webmaven · on July 18, 2017

Thanks for the link to this interesting paper.

I think we're seeing some recapitulation of those arguments WRT 'ensembles of DL models' approaches.

thundergolfer · on July 18, 2017

I agree. Google has come out with some papers that are, to put it harshly, basic gluing together of DL models followed by loads of training on their compute resources.

webmaven · on July 18, 2017

Not just Google. The FractalNet paper comes to mind.

reckoner2 · on July 17, 2017

This approach has always interested me. I can train an decent Cats Vs. Dogs classifier in a few minutes. But real human intelligence takes many years of continuous and varied input to develop.

Are there systems out there that are taking influence from newborns being exposed to the world? An unsupervised learning system with a huge array of inputs running for years?

tiborsaas · on July 17, 2017

All these current examples of AI and ML are just a very small fraction of what we mean by intelligence so I'm not surprised by the pessimistic posts that hit HN from time to time.

Training systems with rich real world experiences sounds something that Open AI should be developing. It's probably not something that you can do over a weekend plus it takes serious of funding and wetware so it's probably the reason it's not there yet.

folksinger · on July 17, 2017

See: Kant on a priori notions of space-time.

debbiedowner · on July 17, 2017

People doing empirical experiments cannot claim to know the limits of their experimental apparatus.

While the design process of deep networks remains founded in trial and error, and there are no convergence theorems and approximation guarantees, no one can be sure what deep learning can do, and what it could never do.

andreyk · on July 17, 2017

"Here's what you should remember: the only real success of deep learning so far has been the ability to map space X to space Y using a continuous geometric transform, given large amounts of human-annotated data."

This statement has a few problems - there is no real reason to interpret the transforms as geometric (they are fundamentally just processing a bunch of numbers into other numbers, in what sense is this geometric), and the focus on human-annotated data is not quite right (Deep RL and other things such as representation learning have also achieved impressive results in Deep Learning). More importantly, saying " a deep learning model is "just" a chain of simple, continuous geometric transformations " is pretty misleading; things like the Neural Turing Machine have shown that enough composed simple functions can do pretty surprisingly complex stuff. It's good to point out that most of deep learning is just fancy input->output mappings, but I feel like this post somewhat overstates the limitations.

sherjilozair · on July 17, 2017

Just because there's a paper on it, and the model has a name, doesn't mean it works. NTM and deep RL don't work for real problems.

divideby0829 · on July 18, 2017

yeah this was my main problem, I guess he is technically right because they are geometric but many of his analogies like the paper crumpling were deeply misleading as they would imply that the transformations are linear. The fact that they are not is fundamental to neural networks working.

tsbertalan · on July 18, 2017

Paper-crumpling is nonlinear. Maybe your complaint is rather that paper-crumpling is "only" a topology-preserving diffeomorphism?

emodendroket · on July 17, 2017

Presumably that's why the word "just" is in scare quotes.

eanzenberg · on July 17, 2017

This point is very well made: 'local generalization vs. extreme generalization.' Advanced NN's today can locally generalize quite well and there's a lot of research spent to inch their generalization further out. This will probably be done by increasing NN size or increasing the NN building-blocks complexity.

xamuel · on July 17, 2017

Or maybe increasing NN size/complexity is the 21st century version of adding epicycles to make geocentrism work.

http://wiki.c2.com/?AddingEpicycles

rhaps0dy · on July 17, 2017

Heh, but it makes geocentrism works better! And we don't yet know how 21st century heliocentrism will look like, while adding epicycles is less daunting.

mojomark · on July 17, 2017

Yay, I found the rabbit hole - technically, no cellestial body rotates purely around the other (thanks mass!). So, perhaps adding epicycles wasn't erronious after all - just a measurement from a different reference point.

mehh · on July 18, 2017

Or "adding a few more rules" to a symbolic system ...

Cacti · on July 17, 2017

I would really like to hear a definition of what generalization means, because I don't think we have one.

Unless we're talking generalization to arbitrary distributions, which is of course unsolvable.

jd007 · on July 17, 2017

I tend to think of "generalization", in the general sense (pun not intended), to be information compression.

We are simply trying to answer the question: what is the shortest description (i.e. most informationally compressed/dense version) that fits what we see in this infinite (at least to us mortals) universe of ours? Mathematically the length of such a description can be thought of as the Kolmogorov complexity.

Edit: I should add, the information compression performed when generalizing can (and often is) lossy.

eanzenberg · on July 17, 2017

Usually in ML context generalization means anytime a model makes a prediction on unseen (not in the training set) inputs. Usually you do CV to see how well your model generalized because you limit the training data seen, predict unseen inputs that you know the answer to, and see how far off the prediction is.

pc2g4d · on July 17, 2017

Programmers contemplating the automation of programming:

"To lift some of these limitations and start competing with human brains, we need to move away from straightforward input-to-output mappings, and on to reasoning and abstraction. A likely appropriate substrate for abstract modeling of various situations and concepts is that of computer programs. We have said before (Note: in Deep Learning with Python) that machine learning models could be defined as "learnable programs"; currently we can only learn programs that belong to a very narrow and specific subset of all possible programs. But what if we could learn any program, in a modular and reusable way? Let's see in the next post what the road ahead may look like."

visarga · on July 17, 2017

The author said in a Twitter conversation today that he is aware that this phrase is ignoring something essential - namely, that we have systems with memory and attention. That is something different than simple X to y mappings. With memory you can do general computation, recursivity, graphs, anything. They work well on some problems such as translation, but still need to become much better in order to match general purpose programming. But at least we're past the X->y phase.

divideby0829 · on July 18, 2017

considering they're the author of a python based machine learning library I would sure hope so. Still it seems like a pretty grievous oversight in writing the dang thing at all considering how at least in my fields of research memory-ful networks are increasingly popular.

visarga · on July 18, 2017

It was reserved for part 2.

gallerdude · on July 17, 2017

I'm sorry, but I don't understand why wider & deeper networks won't do the job. If it took "sufficiently large" networks and "sufficiently many" examples, I don't understand why it wouldn't just take another order of magnitude of "sufficiency."

If you look at the example with the blue dots on the bottom, would it not just take many more blue dots to fill in what the neural network doesn't know? I understand that adding more blue dots isn't easy - we'll need a huge amount of training data, and huge amounts of compute to follow; but if increasing the scale is what got these to work in the first place, I don't see we shouldn't try to scale it up even more.

nshm · on July 17, 2017

"sufficiently large" could be much more than number of atoms in the universe. You just do not have resources to run computation at such scale.

randcraw · on July 18, 2017

This is my problem with the thesis that simply scaling deep nets to new heights will ultimately subsume all brain function. If it takes weeks to train a simple object recognizer deep net, how long would it take a grand unified deep net to learn to tie its shoelaces? Puberty?

eli_gottlieb · on July 17, 2017

>But what if we could learn any program, in a modular and reusable way? Let's see in the next post what the road ahead may look like.

I'm really looking forward to this. If it comes out looking like something faster and more usable than Bayesian program induction, RNNs, neural Turing Machines, or Solomonoff Induction, we'll have something really revolutionary on our hands!

fnl · on July 18, 2017

Put a lot simpler: Even DL is still only very complex, statistical pattern matching.

While pattern matching can be applied to model the process of cognition, DL cannot really model abstractive intelligence on its own (unless we phrase it as a pattern learning problem, viz. transfer learning, on a very specific abstraction task), and much less can it model consciousness.

cs702 · on July 17, 2017

Yes.

Here's how I've been explaining this to non-technical people lately:

"We do not have intelligent machines that can reason. They don't exist yet. What we have today is machines that can learn to recognize patterns at higher levels of abstraction. For example, for imagine recognition, we have machines that can learn to recognize patterns at the level of pixels as well as at the level of textures, shapes, and objects."

If anyone has a better way of explaining deep learning to non-technical people in a few short sentences, I'd love to see it. Post it here!

randcraw · on July 18, 2017

I really enjoyed this article. It's the first attempt I've seen to assess deep learning toward the integrated end of human level cognition or AGI.

I found one point especially noteworthy: " So even though a deep learning model can be interpreted as a kind of program, inversely most programs cannot be expressed as deep learning models—for most tasks, either there exists no corresponding practically-sized deep neural network that solves the task, or even if there exists one, it may not be learnable, i.e. the corresponding geometric transform may be far too complex, or there may not be appropriate data available to learn it.

Scaling up current deep learning techniques by stacking more layers and using more training data can only superficially palliate some of these issues. It will not solve the more fundamental problem that deep learning models are very limited in what they can represent, and that most of the programs that one may wish to learn cannot be expressed as a continuous geometric morphing of a data manifold. "

What he seems to be suggesting is that a human level cognition built from deep nets will not be a single unified end-to-end "mind" but a conglomeration of many nets, each with different roles, i.e., a confederation or "society" of deep nets.

I suspect Minsky would have agreed, and then suggested that the interesting part is how one defines, instantiates, and then interconnects the components of this society.

lordnacho · on July 17, 2017

I'm excited to hear about how we bring about abstraction.

I was wondering how a NN would go about discovering F = ma and the laws of motion. As far as I can tell, it has a lot of similarities to how humans would do it. You'd roll balls down slopes like in high school and get a lot of data. And from that you'd find there's a straight line model in there if you do some simple transformations.

But how would you come to hypothesise about what factors matter, and what factors don't? And what about new models of behaviour that weren't in your original set? How would the experimental setup come about in the first place? It doesn't seem likely that people reason simply by jumbling up some models (it's a line / it's inverse distance squared / only mass matters / it matters what color it is / etc), but that may just be education getting in my way.

A machine could of course test these hypotheses, but they'd have to be generated from somewhere, and I suspect there's at least a hint of something aesthetic about it. For instance you have some friction in your ball/slope experiment. The machine finds the model that contains the friction, so it's right in some sense. But the lesson we were trying to learn was a much simpler behaviour, where deviation was something that could be ignored until further study focussed on it.

pishpash · on July 17, 2017

You can self train, like AlphaGo or that pong playing thing. As for the aesthetics part, there are theories on what aesthetics is, and part of it has to do with parsimony. Machines can certain constrain themselves on that.