End to End Learning for Self-Driving Cars [pdf] (nvidia.com)
157 points by rdabane on May 9, 2016 | 101 comments

One problem with training a neural network end-to-end this way is that the system is susceptible to unpredictable glitches: The same principle that lets people trick a NN into [thinking a panda is a vulture](https://codewords.recurse.com/issues/five/why-do-neural-netw...) can happen randomly just by differing lighting/shadow conditions, sun glare, or who knows.

One can always train the network with more and more scenarios, but how do you know when to stop? How good is good enough in this regard?

For the "thinking a panda is a vulture" problem, don't humans fail in similar ways? The analogous examples for us are camouflage, optical illusions, logical fallacies, etc.

It doesn't really have to be perfect as long as it doesn't fail in common scenarios.

Humans don't appear to fail in the same ways - camouflage and optical illusions are very different to the specific imperceptible-to-humans changes that trick neural networks. Then again, there's no way to test the method on humans because you need to know the neural network weights and that is tricky for people!

In practice it probably doesn't matter anyway - the chance of the exact required perturbation of the input happening at random is infinitesimal, due to the high dimensionality of the input. And even if it were a problem there are ways around it.
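To make the dimensionality argument concrete, here is a minimal sketch on a toy linear classifier (not the network from the paper). The weights, input, `DIM`, and `EPS` are all made-up assumptions. It compares a perturbation crafted from the weights against random noise of the same per-pixel size:

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

random.seed(0)
DIM = 10_000   # stand-in for the number of pixels (assumed)
EPS = 0.05     # per-pixel perturbation budget (assumed)

w = [random.gauss(0, 1) for _ in range(DIM)]  # hypothetical classifier weights
x = [random.gauss(0, 1) for _ in range(DIM)]  # hypothetical input image
score = dot(w, x)

# Crafted perturbation: every pixel moves by EPS against the current decision.
s = sign(score)
crafted = [xi - EPS * s * sign(wi) for xi, wi in zip(x, w)]

# Random perturbation: same per-pixel size, but coin-flip directions.
noisy = [xi + EPS * random.choice((-1, 1)) for xi in x]

crafted_shift = abs(dot(w, crafted) - score)  # equals EPS * sum(|w_i|)
noisy_shift = abs(dot(w, noisy) - score)      # contributions mostly cancel

print(f"crafted shift: {crafted_shift:.1f}, random shift: {noisy_shift:.1f}")
```

In this toy setting the crafted shift grows roughly with the number of pixels, while the random one grows roughly with its square root. That is one way to read the claim that hitting the exact perturbation by accident is very unlikely.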

For the "thinking a panda is a vulture" problem, don't humans fail in similar ways?

This is a good question. My impression is that humans fail and artificial neural networks fail but we don't know enough about the brain to say artificial neural networks fail in the same way as humans.

As another poster notes, humans accept human error more than computer error and I think that's because humans have an internal model of what other humans will do. If I see a car waving in a lane and going slowly, I have some ideas what's happening. I don't think that model would extend to a situation where neural network-driven car was acting "wonky".

Is this a good time to ask whether the dress is blue/black or white/gold? ;)

It should never fail since any failure could potentially create a fatal scenario. People usually accept fatalities because of human error but they won't accept death because of algorithmic failure.

I suspect that it won't take long for people to come to terms with it in the same way we now "accept" industrial accidents. "Accept" in this case simply means that the industry in question is allowed to continue doing business.

That's an unattainably high acceptance bar. A more reasonable one would be to have mass adoption of self driving cars as soon as self driving cars cause fewer accidents than human drivers.

Not every car crash ends in death. But the AI will learn a lot from each crash. I think mistakes and 'bugs' in the system will get ironed out at low speed crashes and in high speed crashes on test circuits...

Have you seen the AI Formula 1 called roborace? Once those cars get good enough to beat Lewis Hamilton or Seb Vettel I'll trust it with me and my family.

Do people accept death due to autopilot error in aeroplanes? It's the same thing. There have been no demands for autopilot to be removed from planes or mass refusal to fly. The reason is that most people can see that autopilot is an overall safety gain compared with getting a human to concentrate on the same thing for long periods of time.

> It doesn't really have to be perfect as long as it doesn't fail in common scenarios.

i agree that it doesn't have to be perfect, but the standard should be higher than "doesn't fail in common scenarios." we should also expect graceful handling of many uncommon but plausible scenarios. we expect human drivers to handle more than just common scenarios, and human drivers are pretty bad.

The adversarial examples are so weak that they disappear if you give the CNN even some attention or foveation mechanisms (that is, they work only on a single pass). How much effect are they going to have on a CNN being used at 30FPS+ to do lane following under constantly varying lighting and appearances and position? None.

Are you referring to this foveation paper (http://arxiv.org/abs/1511.06292)? I'm quite skeptical of the claims in that paper; upon closer reading their experiments are problematic. Also, it appears the paper was rejected. I can elaborate if that is indeed the case.

> I can elaborate if that is indeed the case.

Please do anyway.

I'm wondering whether adversarial examples can also be found for autoencoders to the same extent. It seems very intuitive that you can overstep the decision boundary that a discriminatory network learns by slightly shifting the input into the direction of a different, nearby label.

Yes. And rejection means little. The point is that adversarial examples have to be fragilely constructed to fool on one single example for one forward-pass. There is no evidence that any adversarial examples exist which can fool an even slightly more sophisticated CNN, fool a simple CNN over many time-steps, fool a simple CNN for enough time-steps to lead to any noticeable differences in action, fool a simple CNN for enough time-steps to lead to a noticeable difference in action which could lead to an accident, or fool a simple CNN for enough time-steps to lead to a noticeable difference in action which leads to an accident frequently enough to noticeably reduce the safety advantages.

The paper was rejected (you can read the ICLR comments) because the experiments did not really support their point. And I agree. The gist of the experiments they ran to support their thesis was to take a CNN and construct adversarial examples that successfully fooled it. They then applied foveation, and showed that the CNN was no longer fooled. Which is obvious! It's kind of obvious to me that adding preprocessing that the attacker is unaware of would be able to beat the attacker. What they didn't do is regenerate the adversarial examples assuming the attacker has knowledge that the target was using foveation.

There are no experiments that support your statements, unfortunately.

The examples where this happens have always seemed fairly weak to me. How many of the grave errors, not just where it's the wrong type of animal or container but actually thinking it's radically different, survive an application of Gaussian blur? Furthermore self-driving cars are a combination of signals; you are going to need to simultaneously fool both LIDAR and cameras.

On top of that you are going to need to fool them over multiple frames, while the sensors get a different angle on the subject as the car moves. For example in the first Deep Q-learning paper, "Playing Atari with Deep Reinforcement Learning"[0], they use four frames in sequence. That was at the end of 2013.

I don't think anyone will be able to come up with a serious example that fools multiple sensors over multiple frames as the sensors are moving. Even if they do then inducing an unnecessary emergency stopping situation is still not the same as getting the car to drive into a group of people. Even if fooled in some circumstances the cars will still be safer than most human drivers and still have a massive utilitarian moral case in relation to human deaths, on top of the economic case, to be used.

The fooling of networks is still an interesting thing, but it's been overplayed to my mind and is not particularly more interesting than someone being fooled for a split second into thinking a hat stand with a coat and hat on it is a person when they first see it out of the corner of their eye.

[0]http://arxiv.org/pdf/1312.5602.pdf page 5

1. Gaussian blur is just a spatial convolution (recall from signal processing). If a network is susceptible to adversarial examples, it will still be susceptible after a Gaussian blur (assuming the adversary knows you're applying a Gaussian blur. If the adversary doesn't, that's just security by obscurity, and they'll find out eventually).

2. A sequence of frames does not solve the issue because you can have a sequence of adversarial examples (although it would certainly make the actual physical process of projecting onto the camera more difficult, but not really any more difficult than the original problem of projecting an image onto a camera).

3. Using something conventional like LIDAR as a backup is the right approach IMO, and I totally agree with you there. But Tesla and lots of other companies aren't doing that because it's too expensive.
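A small sketch of point 1, under heavy simplification: a 1-D "image", a three-tap kernel standing in for the Gaussian blur, and a linear classifier applied after the blur. All names and numbers here are hypothetical. Because the blur is linear, an attacker who knows about it can take the gradient straight through it:

```python
import random

KERNEL = (0.25, 0.5, 0.25)  # small symmetric blur kernel (assumed)

def blur(v):
    """1-D blur with zero padding. The output is linear in v."""
    n = len(v)
    return [
        sum(k * v[i + j - 1] for j, k in enumerate(KERNEL) if 0 <= i + j - 1 < n)
        for i in range(n)
    ]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def score(w, x):
    """Hypothetical classifier: blur the input, then apply linear weights."""
    return dot(w, blur(x))

random.seed(1)
N, EPS = 1000, 0.05
w = [random.gauss(0, 1) for _ in range(N)]
x = [random.gauss(0, 1) for _ in range(N)]
base = score(w, x)
s = sign(base)

# The kernel is symmetric, so the gradient of the score w.r.t. x is blur(w).
grad = blur(w)

aware = [xi - EPS * s * sign(g) for xi, g in zip(x, grad)]  # knows about the blur
naive = [xi - EPS * s * sign(wi) for xi, wi in zip(x, w)]   # ignores the blur

# How far each attack pushes the score toward the decision boundary.
aware_shift = s * (base - score(w, aware))
naive_shift = s * (base - score(w, naive))
print(f"blur-aware: {aware_shift:.2f}, blur-unaware: {naive_shift:.2f}")
```

In this toy model the blur-aware attack always moves the score at least as far as the unaware one. Whether the same holds for a deep network behind a real blur is exactly the kind of thing that would need an experiment.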

1. If that's the case perhaps another kind of blurring? "Intriguing properties of neural networks" (https://arxiv.org/pdf/1312.6199.pdf page 6) has examples where you get radically different classifications that I don't think would occur naturally or survive a blur with some random element, let alone two moving cameras and a sequence of images. As the title says it's an intriguing property, not necessarily a huge problem.

2. I honestly can't think of a situation where this could occur. It's the equivalent of kids shining lasers into the eyes of airline pilots, but the kids need a PhD in deep learning and specialised equipment to be able to do it. A hacker doing some update to the software via a network sounds much more plausible than attacking the system through its vision while it's traveling.

3. This is the real point in the end I guess, this Google presentation (https://www.youtube.com/watch?v=tiwVMrTLUWg) shows that the first autonomous cars to be sold will be very sophisticated with multiple systems and a lot of traditional software engineering. Hopefully LIDAR costs will come down.

1. Those are examples for a network that does not use blurring. You have to be careful because, remember, the adversary can tailor their examples to whatever preprocessing you use. So the adversarial examples for a network with blurring would look completely different, but they would still exist. Randomness could just force the adversary to use a distribution over examples, and it could mean they are still able to fool you half the time instead of all the time. However, I wouldn't trust my intuition here: that is really a question for the machine learning theory researchers (whether there is some random scheme that is provably resilient or if they're all provably vulnerable, or proving some error bounds on resilience, etc.).

2. The problem of projecting an image onto a car's camera already implies you'd be able to do it for a few seconds.

"It's unlikely to happen" is not a good strategy to rely on with systems operating at scale. There are about a billion cars on earth traveling trillions of miles every year, many of which will eventually be self-driving. At that scale, you don't need a malicious actor working to fool these systems, you just need to encounter the wrong environment. And even if the system is perfect on the day it's released, that doesn't mean that it will remain so indefinitely (even with proper maintenance).

Studying induced failure in neural networks may help us understand the failure modes and mechanisms of these systems.

I haven't seen a paper that shows this "tricking" can be used as a real world attack or happen randomly. Just because you can compute an input that has this unusual behavior doesn't mean there is a demonstrably nonzero probability of it happening.

Wait? That's exactly what it means. Since the networks are not "continuous" you can't reason about how the system will behave in actual real world conditions because any random fluctuations can cause the whole thing to malfunction. I put continuous in quotes because it's not the real definition of continuous like in real analysis but a good enough analogy as in small variations in input should not lead to wildly different outputs.

This is why any model that lacks explanatory power can't be used in mission and safety critical systems. If it can't reason about things the same way people can reason about things then the system overall can't really be trusted. It's one thing when a translation from English to Spanish is wrong, it's completely another thing when the control software of a self-driving car decides to accelerate instead of brake and the root cause analysis is people throwing their hands up and saying neural networks are inherently susceptible to these kinds of problems.

To be fair, you should be more precise. The attacks are specifically calculated. The combinatorial space of possible inputs is so massive that I'm sure it is extremely unlikely for a malicious input to occur randomly.

I don't think it has to do with the combinatorics of the input space. Adversarial inputs are hard to generate until someone figures how to point a set of laser pointers at exactly the right spots on a truck on a highway to get it to swerve out of control.

How is that different than today with human drivers? A laser to the eye will cause lots of swerving.

> because any random fluctuations can cause the whole thing to malfunction

You made a very specific claim that random fluctuations could have the same effect as adversarial examples. I was addressing that.

Yes, that makes sense and you're right. I don't have a proper definition of randomness and I wouldn't expect generic noise to cause issues.

These changes are not random. The whole reason neural networks work at all is because probable differences do not mislead the network.

This is not true at all. Adversarial input can indeed be probable input depending on your definitions and I haven't seen anything yet that describes the probability distributions of inputs. Everyone takes a bunch of training examples and extrapolates from there.

Isn't this the problem of induction in philosophy?

Aren't all minds subject to the same limitation?


In theory, yes. In practice, we've built our roads, signals, and car interiors, tailored to our specific minds and concepts.

So when we see someone that just started driving perform well under some circumstances, we can expect good performance under circumstances that are similar to the human mind. The problem that the "fooling neural networks" experiments show is that two things that are similar for humans can be wildly different for a NN that's been trained to recognize them.

You use a test set of scenarios on which you don't train but only measure effectiveness. When accuracy on the test set exceeds your chosen threshold, that is good enough.
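As a rough illustration of that procedure, here is a sketch with toy data and a stand-in model. The function names, the 80/20 split, and the 0.99 threshold are all assumptions, not anything from the paper:

```python
import random

def split(samples, test_fraction, seed=0):
    """Shuffle once, then hold out a fraction that is never trained on."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(model, test_set):
    """Fraction of held-out scenarios the model labels correctly."""
    correct = sum(1 for scenario, label in test_set if model(scenario) == label)
    return correct / len(test_set)

def good_enough(model, test_set, threshold):
    return accuracy(model, test_set) >= threshold

# Toy scenarios: the "label" is just the parity of the scenario id.
samples = [(i, i % 2) for i in range(100)]
train_set, test_set = split(samples, test_fraction=0.2)

perfect_model = lambda scenario: scenario % 2      # stand-in for a trained model
broken_model = lambda scenario: 1 - scenario % 2   # always wrong on this data

print(good_enough(perfect_model, test_set, threshold=0.99))
print(good_enough(broken_model, test_set, threshold=0.99))
```

The hard part the thread is arguing about is not this loop. It is whether the held-out scenarios cover what the car will actually meet, and how the threshold gets chosen.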

What is the accuracy of the human brain in recognizing traffic situations? It is probably not that hard to get a NN to do better, even if periodically it still causes an accident. This is the uncanny valley effect for self-driving cars. It's not enough to be better than average humans at driving, which i think they already are, they have to be perfect at driving for people to trust them.

The constant question for self driving cars is "How will we know when they are good enough?"

Is there any reason they couldn't just put a driving test examiner in the car and test it like you would a human? Just ask the thing to drive around town, emergency stop, park, navigate a roundabout etc.

Yeah, the driving test assumes you have human level cognitive function and can apply the demonstrated skills in a much much wider variety of situations than those that occur during the test.

With just 72 hours of training data, this is an extremely impressive result, scientifically speaking. However, in terms of quality control/best practices in automotive, it's simply unacceptable to trust human lives with a monolithic black-box system. The reason to have modular building blocks in a production system is not necessarily better performance, but the must-have ability to test, troubleshoot, debug, fix and replace things by reducing the degrees of freedom - it's the ABC of engineering even in much less sensitive industries. So while N2N is going to be very useful for rapid prototyping, and perhaps setting the performance bar for other methods, I doubt it will ever be used in production.

It can be used as an additional system for making decisions.

This is a great point. Reminds me of how airplanes have redundant flight computers and compare the outputs of the computers to determine if one might be faulty. https://en.wikipedia.org/wiki/Fly-by-wire#Redundancy The output of different self driving models could be compared to handle more difficult driving situations -- I never thought of that.

This reminds me of ensembling. Most machine learning techniques benefit in accuracy and precision from ensembling different methods. The more different the methods are (variance), the better the improvement you get.
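A minimal sketch of the compare-the-outputs idea, assuming each model emits a steering angle in degrees. The `vote` helper and the `max_spread` tolerance are hypothetical, not how any real flight computer or car does it:

```python
from statistics import median

def vote(steering_angles, max_spread):
    """Take the median command and report which outputs disagree with it.

    steering_angles: one proposed angle per independent model (degrees).
    max_spread: how far an output may sit from the median before it is
                flagged as suspect (assumed tolerance).
    """
    command = median(steering_angles)
    suspects = [a for a in steering_angles if abs(a - command) > max_spread]
    return command, suspects

# Three hypothetical models; the third one has gone off the rails.
command, suspects = vote([2.0, 2.1, 15.0], max_spread=1.0)
print(command, suspects)
```

With three outputs the median ignores a single wild value, which is the same reason redundant systems tend to come in odd numbers.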

Another obvious problem with having the neural network map from vision to steering is that you can't make it take decisions based on what it hasn't seen yet, but will see. E.g. changing lanes because to reach your destination you'll have to make a right turn. The authors note that E2E learning makes for a better smaller system, on account of being free from human-imposed concepts. That's fine if that's your only goal. But I think it's essential for autonomous cars to be able to reason in terms of those human concepts.

This doesn't describe the (very basic prototype) system in the paper, but there is no theoretical reason there cannot be a recurrent net that plans ahead (obviously developing such a system is extremely difficult)

What would you give it as an input, though? Real-time screen captures of Google Maps in navigation mode?

This is really interesting, but I'd want autonomous cars to be better than humans at driving, not to emulate them.

I've read that one of the problems with Google's self-driving cars has been that other cars tend to run into them because the self-driving cars drive extremely conservatively and violate other drivers' expectations of how a typical California driver is expected to behave.

I think this sort of thing is something developers are going to have to find ways of dealing with; a car can be technically driving in a safe, legal way but if it's too different from how a human would drive, they are going to be a safety hazard.

Of course, standard driving behavior varies dramatically from place to place. For instance, in the United States, everyone is expected to get out of the way of whichever car has the right-of-way in that situation. In Indonesia, the car that has the right-of-way is expected to slow down, stop, or move over to accommodate other cars that do things like pull out in front of them in an intersection or pass on a two-lane road with oncoming traffic. A self-driving car in Jakarta would need to be trained very differently than a self-driving car in Seattle or Paris. Not just because the traffic laws are different, but because drivers have very different expectations about what is normal behavior.

>problems with Google's self-driving cars has been that other cars tend to run into them

I feel like this is already an urban myth, given the small number of people who have actually been driving around the cars. And won't some of the people at fault for hitting them try to put the blame on the robot anyways?

Is there even a credible source for what you read?

Maybe you wouldn't consider this "credible", but Google publishes a monthly report listing every collision their autonomous cars have been involved in. [1]

I was curious, so I went through the whole list. By my count, in the history of the program they've been involved in 19 accidents during autonomous operation, and the car was only at fault in one of those. [2] The majority of the other crashes were caused by other drivers rear-ending the car while it was stopped.

[1]: https://www.google.com/selfdrivingcar/reports/

[2]: http://www.theverge.com/2016/3/9/11186072/google-self-drivin...

"While it was stopped" is the key.

It's hard to argue that a car stopped at red light violates anyone's expectation of how a human would drive.

That's a misleading phrase. A number of the accidents have occurred because the Google car abruptly came to a stop in a situation where a human driver would not have stopped. Slamming on the brakes is dangerous.

Here is one example. It is hard to be sure exactly what happened, because Google obviously phrases its accident reports to put its cars in as favorable light as possible.

"April 28, 2016: A Google self-driving prototype vehicle travelling westbound in autonomous mode on Nita Avenue in Palo Alto was involved in an accident. The prototype vehicle came to a stop at the intersection of San Antonio Road, then, prior to making a right turn on San Antonio Road, began to gradually advance forward in order to get a better view of traffic approaching from the left on San Antonio Road. When the prototype vehicle stopped in order to yield to traffic approaching from the left on San Antonio Road, a vehicle approaching at approximately 9 mph from behind the prototype collided with the rear bumper of the prototype vehicle."


Obviously the textual description is limited (e.g. a video would settle this question), but that description alone is hardly unnatural. The behavior of the Google car is behavior I make all the time, and one I see drivers making all the time: inching forwards in the right-hand lane to look at the traffic on the left, then stopping because you've decided not to go for it.

The fact is: slow speed rear ends are really common. I've had them happen to me several times during one year where I commuted every day. I've done it myself on another car.

I would not be surprised if over the course of the next few years Google cars get rear-ended at light to moderate speeds hundreds of times.

That says nothing about driver expectations, and is a poor interpretation of statistics.

About 23-30% of human accidents are rear end collisions. It is entirely possible that the car drives so well that other types of collisions are minimized.

That leaves rear-end collisions - the type the car can't control - misleadingly seeming to be abnormally high.
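The arithmetic behind that argument, as a sketch. The 25% baseline sits inside the 23-30% range quoted above; the 90% reduction in other collision types is a made-up number purely for illustration, and the sketch assumes the rate of being rear-ended is unchanged:

```python
def rear_end_share(human_share, other_reduction):
    """Share of a car's collisions that are rear-end collisions.

    human_share: rear-end share among human drivers (e.g. 0.25).
    other_reduction: fraction of all other collision types the car avoids
                     (hypothetical); rear-end collisions are left as they are.
    """
    rear = human_share
    other = (1 - human_share) * (1 - other_reduction)
    return rear / (rear + other)

print(f"{rear_end_share(0.25, 0.0):.0%}")  # no improvement: same share as humans
print(f"{rear_end_share(0.25, 0.9):.0%}")  # other collisions cut by 90%
```

So a car that is rear-ended exactly as often as a human could still show rear-end collisions as roughly three quarters of its record, just because everything else shrank.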

So if a conservative human driver from a peaceful low-traffic part of the country goes to California and drives there, and an aggressive Californian causes the accident between the two cars, the conservative driver is to be blamed?

Sounds like a textbook example of blaming the victim.

Self driving cars must do significantly better safety wise otherwise the adoption will be hampered.

Plus, the whole point here I think is to save lives. Google self driving cars aren't really a hazard; they're more just very annoying because they are overly cautious.

I feel like people just assume that they are annoying to drive around, but very few people actually have experience driving around Google's cars. The times that I'm around them (a few mornings a week), they are never in any way weird or annoying. In fact, they are extremely predictable, and therefore, if anything, less annoying to drive around.

I agree -- they never encroach in your lane, they signal with plenty of room to spare, they don't threaten to pull out in front of you, in many different ways, they're preferable to human (Californian) drivers.

Right. I see them a couple times a week, and once in a while, they pass me while I am on my bicycle.

I will admit that once I became aware of a Google car coming up to pass me on my left, and I did a little jink toward it on my bicycle. It reacted conservatively but decisively. It didn't jump into another lane or slam on its brakes. It just quickly gave me some more room and gently passed.

Kind of creepy, but very cool.

I think the fear is that they'll drive the speed limit on the freeway, even when no one else is.

Actually what would speed up adoption would be if they were able to drive over the speed limit legally. If they are significantly safer then this should be a win-win for everyone other than those issuing speeding tickets.

People are afraid the cars will follow traffic laws, and drive legally, even when no one else is? And, I find it hard to believe that everyone disobeys the posted speed limits on freeways. For example, trucks with speed-limiters, people who don't want to break the law, buses, etc.

Certainly not the whole point. An autonomous vehicle that can drive me around exactly as safely as I can drive myself is a vast improvement over the status quo, in which commuting is a significant waste of my time.

Even if it was trained to drive like a human, it'd be better in many ways: It never gets drunk, doesn't get tired, doesn't get distracted, can drive old people (or anyone) around who shouldn't be driving anymore, etc.

I couldn't find the article, but i remember a project learning from remote control plane pilots. People mess up all the time. It's not even drunk or tired, there's just errors every so often, it's almost statistical. The little errors people make wash out as noise.

This almost sounds like the line from Terminator 2.

"It can't be bargained with. It can't be reasoned with. It doesn't feel pity, or remorse, or fear. And it absolutely will not stop, ..."

That's the first Terminator. Terminator 2 would be:

It would never leave him, and it would never hurt him, never shout at him, or get drunk and hit him, or say it was too busy to spend time with him. It would always be there. And it would die, to protect him.

Yeah, but if a computer screwed up like a human we'd never hear the end of it.

I think the trick is not to mix autonomous and non-autonomous. Cities delegate autonomous only zones where no non-autonomous cars can go; this creates a transportation circulatory system and safe experimental zone which can expand - which can include just making the area of city only for autonomous cars larger or gradual commingling with non-autonomous cars. Or again partial autonomy of cars on highways / more predictable driving scenarios, like we are already seeing.

That seems like the trick to making sure they are never adopted in a big way.

I mean sure, that's an easy problem to solve but in that case why use cars at all and not people movers or the like?

But this is just wrong. Virtually the only accidents were people rear ending a stopped car at a red light. The current google cars are probably improving the general driving safety.

Yeah, makes sense. But this is the first example I've seen where the network is trained only with driver's input. Has anyone seen this kind of approach before?

Thanks for the reference.

The neural net that accurately predicts human control inputs also extracts the relevant features you would want to build a more principled autopilot. For example you could take one of these nets and then build a speed limit sign locator on top of it without training a whole net for that from scratch. Using that you could then hard code the rules for obeying the speed limit into a more traditional planner.

There are real potential advantages to training networks using human behaviors. For example, the person-to-person communication that takes place while driving might be better understood by machines this way.

- Was that an obscene gesture or a thank you wave?

- Did the other driver suggest I go forward or tell me to stop?

- etc

Wondering if they have to create near collision scenarios in order to train the network :-)

They need some sort of prefilter to remove any bad driving habits before training begins I suppose.

I imagine with enough training data from enough drivers that bad habits will disappear as noise.

Depends on the "bad habit". A huge number of people speed. How's that going to disappear as noise?
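A quick simulation of the distinction being drawn here, with invented numbers: every simulated driver has individual zero-mean noise, plus a speeding bias that all of them share. Averaging over many drivers removes the first but not the second:

```python
import random
from statistics import mean

rng = random.Random(0)
LIMIT = 60.0     # posted speed limit (hypothetical)
BIAS = 7.0       # systematic speeding shared by the drivers (hypothetical)
NOISE_SD = 5.0   # individual, zero-mean variation (hypothetical)

# One observed speed per driver: limit + shared bias + personal noise.
speeds = [LIMIT + BIAS + rng.gauss(0, NOISE_SD) for _ in range(10_000)]

# A model fit to imitate the average driver lands near LIMIT + BIAS, not LIMIT.
learned = mean(speeds)
print(f"learned speed: {learned:.1f} (limit is {LIMIT:.0f})")
```

More data shrinks the spread around the average, but the average itself stays above the limit, which is the objection in the comment above.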

wouldn't it just converge on an average driver? not even a good one.

I'd like to believe that would be the case, but I live in Seattle.

From the paper:

To train a CNN to do lane following we only select data where the driver was staying in a lane and discard the rest.
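That selection step might look something like the sketch below. The `Frame` record and its `in_lane` flag are hypothetical; the quote does not say how the frames were actually labeled or stored:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    image_id: str          # hypothetical identifier for a camera frame
    steering_angle: float  # recorded human steering command
    in_lane: bool          # hypothetical label: driver was staying in a lane

def select_lane_following(frames):
    """Keep only frames recorded while the driver was staying in a lane."""
    return [f for f in frames if f.in_lane]

recorded = [
    Frame("f001", 0.5, True),
    Frame("f002", -12.0, False),  # e.g. a lane change or a turn
    Frame("f003", 0.1, True),
]
training_set = select_lane_following(recorded)
print([f.image_id for f in training_set])
```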

This technology is much more universal than just autonomous cars.

Good point. But there are a lot of things like self driving cars where emulating would be a quantum leap forward...

No problem in it driving like the best human drivers around.

Here is a great, very accessible video where the CTO of MobileEye (which provides some of the components for Tesla's "autopilot") explains his views on the challenges of end-to-end learning for autonomous vehicles, and why it's preferable to decompose the problem instead (still using deep learning for the decomposed modules). https://www.youtube.com/watch?v=GCMXXXmxG-I

I'm inclined to agree, especially because it helps in 1) providing diagnostic information (such as the great driving visualizations shown in the video), and 2) makes it easier to incorporate algorithms and sensors (like with Google's cars) as a redundancy in case the neural network hits a crazy edge case.

Engineers are going to break down the problem into many subsystems and test the heck out of them.

Maybe the system can still be globally optimized, though, as long as individual subsystems are still verifiably correctly trained. i.e. lane detection, pedestrian detection could share some of the same convolutional layers and still be tested separately.

My personal prediction is that all of this 2D convolutional network stuff will be extended to 3D within a few years. The front-end will do a full 3D scene reconstruction from first principles, and then some sort of 3D features will be learned on ~that data.

Url changed from https://blogs.nvidia.com/blog/2016/05/06/self-driving-cars-3... to the paper it points to.


Cool trivia: the building is in Holmdel, NJ and is really nice but sat abandoned for years. Companies can rent coworking space now, they're rebranding as "Bell Works" (http://bell.works)


An Nvidia self-driving car. Impressive...but can it run Crysis?

In all seriousness though I'm really enjoying the number of corps exploring the space of self-driving cars. It can only make the reality of a road full of autonomous cars come all the sooner. (Though with the rate at which I see Google's self driving cars around Austin you'd think they were already out for public consumption.)

The machine learning, no model approach is kind of scary. It's likely to do the right thing most of the time, and something really bogus on rare occasions. There needs to be more than just a model trained from successful driving. Some kind of recognition of "this is bad" is needed.

Yup, like a near collision scenario to teach it to drive off the road instead of head on collision ...

In aeronautics, we were able to increase safety to an insane level by understanding the physics of the environment, formally proving and certifying algorithms, and using robust control theories that make it possible to formally deal with uncertainty. The power of mathematical modeling together with robust software testing and a limited/controlled use of learning algorithms is - in my opinion - likelier to bring safety to such systems than such an opaque use of CNNs. And I'm not even talking about human machine interaction or liability issues that would emerge from such extreme approaches (which node of the net or which training sample will be blamed?). I guess CNNs allow a lot of wannabe engineers to play with real world problems and dream that their 10 lines of Python code would match semantic models if fed with more training data, but I'm pretty sure aircraft/car manufacturers will/should not replace formally certified control algorithms and redundant architectures built on top of hundreds of years of analytical results with a rack of NVIDIA GPUs.

It's amazing how the characteristics defined in (https://en.wikipedia.org/wiki/Contextual_learning) apply directly to this scenario. (Learning from a master )

It's mind-blowing to me that they did raw image mapping to steering angle, without any manual feature extraction whatsoever. This is just revolutionary.

Wow I guess having a self driving car is the new thing next we'll have self driving sli, can drive two cars if they are joined together with a proprietary cable :-)

More seriously, this is amazing work, I am really impressed, I just wonder if we can't get something more topical like training lasers on mosquitoes or something. I feel like I did when 3D graphics was the new thing, every day it felt like there was a new advance, now the same tech is doing the same thing to machine learning.

In case you missed it, the ninth reference is a good video of the system in action:


Seems very similar to the GeoHot self driving car experiment.

Does anyone know which camera they were using?

Nvidia is one of the main reasons my computers, I do not want this true of my cars as well.

Edit - Is a joke really worth this many downvotes? I mean who hasn't had occasional trouble with Nvidia drivers and games?

"Nvidia is one of the main reasons my computers, I do not want this true of my cars as well." If this was a joke I think the grammatical error led to confusion?

I think they accidentally a word. (My guess would be 'main reasons my computers crash' :P )

Dammit, totally. How did I not see that!?
