
I think the current critique of Deep Learning as a technique is a dead end. Now I'm only a semi-professional in a sea of recent development and research, but if you ask me, the technique itself can take us all the way to General Intelligence and more.

What's missing from this discussion is how we humans don't just use a single, vision-based "deep net" to identify the banana; we use lots of very different deep nets that run in parallel and cross-validate each other.

We have a vision net that, like a computer's, can be fooled with the same kinds of tricks (just google "optical illusion").

But a second net does three-dimensional inference of the banana's shape from stereo vision and cross-validates that against the first net's classification (a banana shape would never pass for a toaster). If in doubt, an unnoticed parallax movement is commanded to our body in order to get a better shape estimate.

Another net parses the content. A banana in a jungle or on a table would be highly plausible. A banana in an operating room or on a roller coaster would be implausible.

Another net parses context and timeline: if no banana was there before, and no banana has entered our continuously estimated 3D space since, there is probably no banana there now.

There's smell, taste, all kinds of things running in parallel that continuously cross-validate our perception of this being a banana, probably a lot of different models just from vision alone that feed back into the main classification thread.

We'll even happily do looped cross-validation if our eyes rest on a scene. Our memory will fetch reference data for all first-order predictions and start subconsciously processing this into the main prediction. It will predict how a banana feels, how soft it is, how warm. If we see a banana on lava that is not burning, our logic and reasoning will challenge the prediction. All of this happens subconsciously, and there's no magic behind it.
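
To make that concrete, here is a crude sketch of what I mean by cross-validation between parallel nets. Everything in it is a made-up placeholder - the individual "nets" are just callables handed in - so treat it as an illustration, not an API:

    from collections import Counter

    def cross_validated_label(observation, models, agreement_threshold=0.75):
        """Ask several independent 'nets' (2D vision, 3D shape, context,
        smell, ...) for a label and check how well they agree."""
        votes = [model(observation) for model in models]
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        # Low agreement is the cue to gather more evidence: move the head
        # for parallax, fetch memories of how a banana feels, and so on.
        needs_more_evidence = agreement < agreement_threshold
        return label, agreement, needs_more_evidence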

If you ask me, there are no limits to deep learning - there's just still a limit of imagination in feature extraction and processing.




The "Beyond Deep Learning" section of the article talks about this approach:

> A more radical possibility is to give up trying to tackle the problem at hand by training just one big network and instead have multiple networks work in tandem. In June 2018, the DeepMind team published an example they call the Generative Query Network architecture (7, http://science.sciencemag.org/content/360/6394/1204?ijkey=01...), which harnesses two different networks to learn its way around complex virtual environments with no human input. One, dubbed the representation network, essentially uses standard image-recognition learning to identify what’s visible to the AI at any given instant. The generation network, meanwhile, learns to take the first network’s output and produce a kind of 3D model of the entire environment—in effect, making predictions about the objects and features the AI doesn’t see. For example, if a table only has three legs visible, the model will include a fourth leg with the same size, shape, and color.


If we want "dissectable" AI, we will almost certainly have to break them up into modules or semi-independent sub-systems. While one-big-network may technically be able to do the job, it's harder for most humans to study, tune, and tame such a system. We like to know who to sue if bots go kaflooey :-)


I'm really excited by the concept of modular neural networks


It doesn't have to be just NN's. For example, stereoscopic vision (triangulation) or LIDAR can create a 3D depth map model (perhaps with certainty weights). NN's can also create one based on mass 3D training on flat images (non-stereo). The two models can be compared, and alternative pattern-match candidates can be explored for the spots that don't match well.

I suppose the stereoscopic vision system could use NN's, but the implementation can be swapped for something else as long as the result (interface) is a depth map. You could switch it from LIDAR to multi-eye stereoscopic triangulation and vice versa. The interface between the parts is something human analysts can inspect and relate to.
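
As a rough sketch of that interface (the threshold and array conventions are arbitrary assumptions on my part), the comparison can be as dumb as two depth arrays and a disagreement mask, which is exactly why the producer of each map is swappable:

    import numpy as np

    def depth_disagreement(sensor_depth, nn_depth, threshold=0.2):
        """Mask of pixels where two depth estimates (in metres) disagree by
        more than `threshold` relative error. `sensor_depth` could come from
        stereo triangulation or LIDAR, `nn_depth` from a monocular
        depth-estimation net; the comparison doesn't care which."""
        rel_error = np.abs(sensor_depth - nn_depth) / np.maximum(sensor_depth, 1e-6)
        return rel_error > threshold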

You may also have a 3rd module that attempts to model the scene using 3D rendering of the "guessed" objects (using a database of known objects) from the flat-image NN and depth-map first pass, using the match/non-match feedback to test yet more guesses. (Certain objects with semi-random shapes like a pile of cloth or mountains may be excluded from known-object-matching, at least as a test candidate.)

There might even be a genetic algorithm to "breed" the best 3D-rendered model that fits the depth map, the flat image, and the top flat-NN candidates.
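
Something along these lines, where the rendering and scoring of a candidate scene are hypothetical placeholders passed in as functions:

    import random

    def breed_scene_model(fitness, random_scene, mutate,
                          population_size=50, generations=200):
        """Toy genetic loop: keep the scene hypotheses whose rendered depth
        map and image best match the observations, then mutate them.
        `fitness(scene)` is assumed to render the candidate scene and score
        it against the real depth map, the flat image, and the top flat-NN
        candidates (all placeholders here)."""
        population = [random_scene() for _ in range(population_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            survivors = population[: population_size // 4]
            population = survivors + [
                mutate(random.choice(survivors))
                for _ in range(population_size - len(survivors))
            ]
        return max(population, key=fitness)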


>> We have a vision net that, like a computer's, can be fooled with the same kinds of tricks (just google "optical illusion").

I don't know of any optical illusions where a bus turned upside down is perceived as a plowshare, or an image of an elephant superimposed on the image of a room causes the other objects in the room to be misidentified.

Human optical illusions reveal the structural bias in the way we see, but that has nothing to do with adversarial examples and overfitting to a few pixels.


> I don't know of any optical illusions where a bus turned upside down is perceived as a plowshare, or an image of an elephant superimposed on the image of a room causes the other objects in the room to be misidentified.

I wouldn't be so sure about that actually. Observing my daughters, I'm sometimes extremely surprised about how they identify objects. As you grow, your conscious mind probably simply does not notice the continuous cross-validation happening.

Also, given the massive augmentation that years of ~50 fps "live video" training data provide, I can imagine there's another compensation algorithm at work in humans that tries to reorient any picture into its most common, "upright" position before even trying to identify anything.


I don't understand what you mean by cross-validation. I'm guessing you don't mean actual cross-validation, where you test a model learned on a training partition against a testing partition, etc?
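
For reference, actual cross-validation looks something like this minimal scikit-learn example (toy dataset and toy model, just to show the held-out-partition scoring):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    # Train on 4 of 5 partitions, test on the held-out one, five times over.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores.mean())  # average accuracy over the 5 folds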


Maybe it's about context. One wouldn't expect to see a (realistic) elephant in a living room. For humans, if something seems "out of place", we tend to focus on it more to make sure we are interpreting it correctly. NN's have no sense of "out of place" other than "knowing" X and Y are not normally seen together in the training set.

Humans give oddities "more CPU" (or run like heck to avoid being trampled and worry later about being right). We apply life experience and logic to explore it deeper. In the elephant case, the shadows look wrong also, which should trigger the "Maybe it's a cheap Photoshop job" module in our brain. But NN's don't understand the concept of cheap Photoshop jobs because they don't browse the web for cute kitten videos and fun but fake news like humans do. Maybe NN's can be trained for fake photo detection, but solving every identification hurdle of the human world by adding yet another dimension to the training step probably won't scale.
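
A crude sketch of what giving an oddity "more CPU" could look like - the co-occurrence counts and the two models here are hypothetical placeholders, not anything that exists:

    def out_of_place_score(label, scene_labels, cooccurrence_counts):
        """How rarely has `label` been seen together with the rest of the
        scene in the training set? Higher means more 'out of place'."""
        seen_with = sum(cooccurrence_counts.get(frozenset({label, other}), 0)
                        for other in scene_labels)
        return 1.0 / (1.0 + seen_with)

    def classify_with_escalation(crop, scene_labels, fast_model, slow_model,
                                 cooccurrence_counts, oddity_threshold=0.5):
        label = fast_model(crop)
        if out_of_place_score(label, scene_labels, cooccurrence_counts) > oddity_threshold:
            # The oddity gets "more CPU": a bigger model, more views, etc.
            label = slow_model(crop)
        return label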


> a bus turned upside down is perceived as a plowshare

If I'm thinking of the same picture you are, it's a schoolbus on its side on a snowy road that is misidentified as a snowplow.

FWIW, that picture does look a lot like a snowplow to my eye. Big yellow rectangle against road cleared of snow? If I glanced at it, I might have said it's a snowplow.



Stay awake long enough though and human image processing gets really messy. I personally mis-identify things as people or spiders when scanning a scene. My brain jumps to a conclusion before I have time to double check and refine what I see.

I wonder what the mental model for that is - I've always assumed the brain iterates and refines but short-circuits when we're tired since processing is slower/limited.


I'm not so sure. We have the entire industry of magicians devoted to making some pretty ridiculous illusions. Everyone knows it's fake on a conscious level. But the fact that everyone applauds this thing they know is fake is a pretty good proof that they are very impressed on a subconscious level.

The human mind has a lot more potential inputs AND it also tends to have years or decades of experience to draw on as well. Give the AI a quadcopter to fly around in and a huge database and it's going to go investigate the mislabeled data and figure out that it was misled. Similarly people walk towards mirages looking for an oasis.


>> Give the AI a quadcopter to fly around in and a huge database and it's going to go investigate the mislabeled data and figure out that it was misled.

What kind of AI is that? It has the ability to choose its own goals, to figure out when it's made a mistake and explore the world around it to correct its model of it? That's a pretty damn advanced AI.

Edit: Btw, if you think that sort of thing can work and that all you need is an AI, a quadcopter and a huge database- why don't you try it? The rewards would be astronomical, well worth any investment.


Apparently, I was too vague. I'm not convinced that what I have in mind is this ambitious.

1) Database with objects likely to be found together and statistical weights next to them.

2) Image recognition that can link to the database from 1.

3) Lidar that will allow the quadcopter to map environment in 3D regardless of how it labeled objects.

4) Any object that shows up in a statistically unlikely placement triggers the quadcopter to fly around it in a circle, in an attempt to get a different picture so that a relabel attempt can be made (see the sketch below).
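
Roughly like this, where the detector, the unlikelihood test (backed by the database from 1), and the lidar-guided orbit are all hypothetical helpers:

    def inspect_oddities(frame, detector, is_unlikely_here, orbit_and_capture):
        """For each detected object whose placement looks statistically
        unlikely, fly around it, re-photograph it from new angles, and try
        to relabel it."""
        detections = detector(frame)
        relabels = {}
        for obj in detections:
            context = [other.label for other in detections if other is not obj]
            if is_unlikely_here(obj.label, context):          # e.g. elephant in a living room
                new_views = orbit_and_capture(obj.position)   # step 4: circle it
                relabels[obj.label] = [detector(view) for view in new_views]
        return relabels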

I do admit that my understanding of how good image recognition has become isn't state of the art, but I thought there were some success stories.

> Btw, if you think that sort of thing can work and that all you need is an AI, a quadcopter and a huge database- why don't you try it? The rewards would be astronomical, well worth any investment.

This does sound a bit like, "If you're so smart why aren't you rich." I suppose I won't be able to implement the plan because A) I don't have access to massive databases that contain objects and the other objects they are likely to be found next to and B) I don't think my "fly around weird stuff and try to relabel" algorithm is likely to have a very good business model.


>> 2) Image recognition that can link to the database from 1.

What do you mean "link"? Link - how? Why does the image recognition model need a separate database? From your description, that sounds like a database with proximity relations between different objects. "Statistical weights" sounds vague - what statistics? Weights applying to what? In any case, what objects would go into the database? Where would they come from? Are you imagining a database of pairs of images of all possible objects the AI might encounter, in every possible combination? That is unrealistic.

You are describing a setup that is not at all like the image recognition systems in existence today, that take in a large number of labelled examples of images of various types of objects (the labels determine the type of object) and output the parameters of a classification function.

>> 4) Any object that shows up in a statistically unlikely placement triggers the quadcopter to fly around it in a circle, in an attempt to get a different picture so that a relabel attempt can be made.

Yes, this is too ambitious. You seem to base this ability on the proximity database you describe earlier, but like I say, such a database is unrealistic. Even given such a database, your proposed system would not automatically have the ability to make correct decisions based on it - you'd have to develop the ability to "fly around it in a circle" etc. separately, and that is a very hard problem. It is a hard problem to develop intelligent agents with autonomous behaviour, particularly with intrinsic motivations, like a motivation to refine their own model of the world. At the moment, this kind of thing is science fiction.

I didn't mean to offend you. The creation of the database you describe would have a very high monetary cost, but if what you propose could work you'd get your money back in spades, and undying fame through the ages, to boot, because you would have solved strong AI. So, if you really think it can work and you don't see why not- try it; and see why not.

But if you won't try it, at least try to learn a few things about AI and figure out why what you propose makes no sense given current technological capabilities.


> From your description, that sounds like a database with proximity relations

No that's not what I had in mind.

> Are you imagining a database of pairs of images of all possibles objects the AI might encounter and in every possible combination?

No, that's not what I had in mind.

> You seem to base this ability on the proximity database you describe earlier,

If you say so, but that wasn't my intent.

> "fly around it in a circle" etc separately, and that is a very hard problem

The one thing I actually did suggest is possible: using lidar to make sure you don't bump into anything, fly left of the object, and once lidar indicates you've cleared it on the other axis, fly towards it. Continue until you've circled it (although in this scenario the path is a square). It's made much easier because quadcopters fly, so they're much less likely to bump into something.
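
A toy version of that maneuver, assuming a drone object with a couple of hypothetical lidar/motion methods (real obstacle avoidance is of course much messier):

    def fly_square_around(drone, cleared_distance=5.0, step=0.5):
        """Fly a rough square around an object: strafe along one side until
        the lidar pointed at the object says we've cleared its edge, then
        turn 90 degrees toward it and repeat for the next side."""
        for _ in range(4):                                      # four sides
            while drone.lidar_distance_to_target() < cleared_distance:
                drone.strafe_sideways(step)                     # "fly left of the object"
            drone.turn_toward_target(90)                        # round the corner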

Here's a video where someone claims to have an autonomous plane that avoids obstacles: https://www.youtube.com/watch?v=_qah8oIzCwk

That's the only thing my claim needs.


Obstacle avoidance is nothing like what you describe in both your previous posts. You're talking about an AI that can autonomously decide when it needs to refine its model of the world and then take appropriate action to do so. You are describing a system that can make autonomous decisions about where to go and what to do. The "only" thing that needs is full autonomy and general intelligence.

I'm sorry if I don't understand what kind of database you mean, still, but that's because you keep describing it in very vague terms. First it was a "huge database" (of what?), then it was a "Database with objects likely to be found together and statistical weights next to them". What kind of objects, what kind of statistics and what kind of weights?

I mean, if you yourself don't know what you're trying to say, don't blame me for not getting it.


Re: fly around...choose its own goals...explore the world around it to correct its model...

"This is LearnBot. Human, I need to extract your spleen to learn how they differ from bladders. Please hold still..."


To add to this, I think the critique in this article greatly underestimates how much data a child is exposed to by the point they can identify a cow. Yes, you can teach a two year old what a cow is without training them on 10k cow images — but they have developed deep neural connections by this point in their lives, generalizable on all sorts of problems.


If there's one lesson to take away from our history of studying the brain, the phenomenon of human intelligence, and the halting 'progress' of AI, it's that it always seems as easy as you propose. At first.


One great problem - and I am but a hobbyist at best - is that of back-propagation, which is the foundation of deep learning neural networks.

As far as I know (and I could be wrong) - there has yet to be any proof that the human brain uses any form of back-propagation for learning. From what I understand, there isn't a biological equivalent.

Which may be ok, in the long run - after all, we fly using fixed wings and propellers, and not flapping wings (as was tried over and over in the history of flight, with little to no progress and numerous failures). That is, success can be had by not replicating nature exactly.

ANNs do this at one level - neurons in the network are simplified mathematical representations. But the learning portion seems to add more complexity; the backpropagation algorithm is simple in concept, but seemingly more complex than what the human brain and natural neurons do for learning.
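
For what it's worth, here is roughly what the algorithm amounts to in its simplest form - a two-layer network learning XOR with hand-written gradients, NumPy only, nothing hidden in a framework:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # 2 inputs -> 8 hidden sigmoid units -> 1 sigmoid output
    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(5000):
        h = sigmoid(X @ W1 + b1)             # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = out - y                      # error term (sigmoid + cross-entropy)
        d_h = (d_out @ W2.T) * h * (1 - h)   # backpropagate through the hidden layer
        W2 -= 0.5 * h.T @ d_out              # gradient-descent updates
        b2 -= 0.5 * d_out.sum(axis=0)
        W1 -= 0.5 * X.T @ d_h
        b1 -= 0.5 * d_h.sum(axis=0)

    print(out.round(2))  # should end up close to [[0], [1], [1], [0]]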

We also seem to learn things faster; natural neural networks - brains - regardless of species - only require relatively few examples to learn and generalize from, while deep learning artificial neural networks require thousands and more, of labeled examples.

These two issues - the need for backpropagation for learning, and the need for thousands and thousands of labeled examples exhaustively run through - both have resulted in machines and systems of hardware and software of immense complexity and power usage. We haven't even created the equivalent of a mouse brain, and if we did, it would take the energy output of Niagara Falls to run it.

From what I understand, this is also (in broad terms) Geoffrey Hinton's position: our current systems, as well as they perform (and they do perform - I'm still amazed at some of the things I saw my simplified implementation of NVidia's End-to-End model do to drive a simulated car around a track), are probably not the right implementation going forward, and we should step back and find another way. Hinton's current thing is something called "Capsule Networks", which I haven't looked into too deeply; from what I have seen of them, they still appear overly complex, but again, that is just my hobbyist opinion and certainly carries no weight here.

As a hobbyist in this field, I don't have any solutions to these issues, but they are something that keeps me thinking: How has nature actually done it? How is it done with so little energy input (beyond scale differences)? How does it work with so few training examples?


> neurons in the network are simplified mathematical representations.

That is the important difference. A biological neuron is more like a cell in a cellular automaton (think Conway's Game of Life). It is a dynamic process; even if the dynamics change by a small factor, it can have very different behaviour. Artificial neurons, on the other hand, are much more uniform and lack the time dimension. If you account for dynamics, then learning in the brain is a completely different thing from learning in neural nets. It has been shown that Hebbian learning (local plasticity of synapses) approximates gradient-based learning, though in simulation it attains lower accuracies.

Another observation was that biological neural nets don't have a mechanism to communicate gradients back. But a deep learning experiment proved you can have fixed random back connections and the network will still learn. They don't actually need to be symmetrical connections.
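
That is the "feedback alignment" style of result. A toy illustration of the idea, reusing the XOR setup from the backprop sketch upthread: the error is routed backwards through a fixed random matrix B instead of the transposed forward weights, and the claim is that the network still learns:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
    B = rng.normal(size=(1, 8))         # fixed random feedback weights, never trained
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(5000):
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        d_out = out - y
        d_h = (d_out @ B) * h * (1 - h)  # B replaces W2.T in the backward pass
        W2 -= 0.5 * h.T @ d_out
        b2 -= 0.5 * d_out.sum(axis=0)
        W1 -= 0.5 * X.T @ d_h
        b1 -= 0.5 * d_h.sum(axis=0)

    print(out.round(2))  # with luck, still close to the XOR targets without true gradients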

The real mystery is about loss functions. Such a loss function would have to be computed in a distributed fashion in the brain. They are the origin of the learning signal. The brain has many such loss functions that have been discovered by evolution. We have not understood loss functions well enough, and understanding them would be a huge leap forward.

Another important observation is that the brain learns from the environment, not a fixed training set. Environments are richer sources, and our current crop of AI agents are playing all sorts of games, but those never have the same complexity as the real world. Btw, tomorrow DeepMind will show a StarCraft demo.


I keep saying this to neural net enthusiasts: brains and neural nets don't have anything in common other than some nouns. Not network topology, not even level encoding (biological neural nets are pulse-rate encoded; the closest "neural net" to this is something like a liquid state machine, which is not easily simulated on a von Neumann computer). We actually know an awful lot about how things in the visual cortex work. Nothing like a "deep network."


Which makes the compelling results of deep networks even more mysterious. (https://gandissect.csail.mit.edu/) How could we have captured the seemingly enormously complex act of visual object recognition, with all the neurons and synapses and chemicals, in something so crude and simple as backpropagation and stochastic gradient descent with a bunch of ones and zeros? Or perhaps both processes, biological and artificial, point to a more fundamental emergent phenomenon of the computational universe we inhabit?


I'm pretty sure that if you threw the same number of grad students and GPUs at PGMs or various ensemble techniques, the results wouldn't be very different. Fundamentally, prediction as we do it is just a form of data compression.

I don't think brains work in a way we understand at all (though pieces like visual cortex are well understood).


Hobbyist here as well.

> As far as I know (and I could be wrong) - there has yet to be any proof that the human brain uses any form of back-propagation for learning. From what I understand, there isn't a biological equivalent.

Is there research on how an individual neuron, or a small group of neurons, learns?

> We also seem to learn things faster; natural neural networks - brains - regardless of species - only require relatively few examples to learn and generalize from, while deep learning artificial neural networks require thousands and more, of labeled examples.

If you take the point of your parent's post, then I'd assume that with multiple neural nets learning at the same time and seeing things from different perspectives, we might get there as well.

> We haven't even created the equivalent of a mouse brain, and if we did, it would take the energy output of Niagara Falls to run it.

Our brain is specialized architecture. Emulating a brain on a von Neumann computer is like a brain trying to emulate a computer. To me it makes sense that you'd need specialized hardware for this.

I wonder if we could simulate a neural net of a simple creature, like a worm.


It's far too early to render a judgment on the long term potential of DL toward building Commander Data, both pro and con. But after some 60 years of AI research we understand the problem space pretty well, and we can see also that DL has shown little or no promise toward serving many of the component processes necessary to deliver General AI/AGI.

(And just because you can federate and combine DL networks like you can NAND gates doesn't mean they are Turing equivalent to the human brain. The fundamental problem with claims of such Turing-AI-equivalency is that we have absolutely no idea yet how to do this -- making yet another AI technique likely to suffer the same fate as all preceding AI methods: failing to solve general AI problems, that is, failing to generalize and scale up beyond a specific narrow task, AKA brittleness.)

For instance, Gary Marcus at NYU has addressed the present limits of DL in terms of cognition in several papers and talks:

https://arxiv.org/abs/1801.00631

https://medium.com/@GaryMarcus/the-deepest-problem-with-deep...

Likewise, Doug Hofstadter has been critical of using DL to automate machine translation's semantic nuances:

https://qz.com/1088714/qa-douglas-hofstadter-on-why-ai-is-fa...

At present, it's still unclear how DL could surmount single-task brittleness to address these shortcomings, much less diversify into the use of logic, the modeling of concepts, their dependencies and causality, the ability to explain itself, and much more. Unbridled faith in DL's ability to jump these sharks should be tempered by the AI community's past mismeasures of earlier AI tech that we were sure, over the years, would conquer Everest, but that in fact fell far short (e.g. perceptrons, expert systems, backprop-driven shallow nets, and oh so much more).


It could be like Turing Equivalency in that deep nets of deep nets may be able to do the job, but are not the most efficient and/or easiest AI for humans to tune and tame.

I'm not saying gluing bunches of deep nets together is bad R&D, only that it may not be the only or best AI game in town. I suppose if we got a wad of deep nets to give General Intelligence (including "common sense"), but it took a big room of servers, that may be a good enough springboard, but I'm not sure putting all our eggs in the deep net basket is the best route to fuller AI just because it's been the most promising up to this point.

It's roughly comparable to spending all R&D perfecting vacuum tubes in the 1940's because they were the best known route back then to general digital electronics. The best starters are not always the best finishers.


> It's roughly comparable to spending all R&D perfecting vacuum tubes in the 1940's

Vacuum tubes continued to be improved all the way up into the early 1960s, long after transistors had taken hold:

https://en.wikipedia.org/wiki/Compactron

One in particular was used mainly in television sets, all the way up into the early 1970s:

https://en.wikipedia.org/wiki/Nuvistor

...it competed with transistors, and was almost as small as some of its contemporary transistor cousins.

Even today, you can't rule out the vacuum tube - for some things, there isn't any substitute:

http://www.nutsvolts.com/magazine/article/vacuum-tubes-for-t...

...and new ideas for the old technology are being investigated:

http://www.sciencemag.org/news/2012/05/return-vacuum-tube


I didn't mean to imply that vacuum tubes stopped having uses or stopped improving. My main point is that they stopped being the primary source of electronics and digital processing innovation and progress.

Even if by chance neural/deep-nets stop being the primary source of AI innovation, I'm sure people will find and make improvements in them regardless.


Agreed. Are you working on anything related to this?


I wish more so-called experts and professionals would put serious effort into meditation and/or quantum physics.

There's plenty more to consciousness than statistics and pattern matching (self-awareness is a big one), and adding more layers of the same and/or curating better data sets isn't going to solve the problem.

Of course there are limits: it's statistics and pattern matching, and has very little to do with intelligence. Not to speak of the risks, since it will keep failing in surprising ways and never be able to explain why.



