Hacker News
Lane Following Autopilot with Keras and Tensorflow (wroscoe.github.io)
305 points by yconst on Jan 23, 2017 | 72 comments

This is nice work, but anyone wanting to try it for themselves should be warned that you shouldn't unpickle data received from an untrusted source.


I copied this method of loading datasets from Keras. https://github.com/fchollet/keras/blob/master/keras/datasets.... What's a better alternative?

Another serialization format which doesn't create objects, like JSON, XML, CSV,...
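For plain array data like images and steering angles, one option along these lines is NumPy's .npz container, loaded with allow_pickle=False so no Python objects can be instantiated. A minimal sketch (array shapes and names are illustrative, not the project's actual format):

```python
import os
import tempfile
import numpy as np

X = np.zeros((10, 120, 160, 3), dtype=np.uint8)   # stand-in images
y = np.linspace(-90, 90, 10)                      # stand-in steering angles

# .npz stores raw arrays; allow_pickle=False refuses embedded Python
# objects, so a tampered file can't execute code the way a malicious
# pickle can.
path = os.path.join(tempfile.mkdtemp(), "dataset.npz")
np.savez(path, X=X, y=y)

with np.load(path, allow_pickle=False) as data:
    X_loaded, y_loaded = data["X"], data["y"]

assert X_loaded.shape == (10, 120, 160, 3)
assert y_loaded[-1] == 90.0
```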

glad to see this as the first comment


Was the track changed at all during the training? I'm wondering if there's some subtle overfitting here where the car learned to drive along only this specific track. It mentions this but I'm not sure what concrete actions were taken to avoid overfitting:

> The biggest problem I ran into was overfitting the model so that it would not work in even slightly different scenarios.

Regardless, a very cool project.

The method to avoid overfitting was to use the model with the lowest validation loss, not training loss. I was able to change the track around my house with reasonable success. I think it would need many more example turns in the training data to become robust.
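The selection logic here (keep the epoch with the lowest validation loss, not the lowest training loss) can be sketched framework-agnostically; in Keras this is typically done with a ModelCheckpoint callback monitoring val_loss, but the core idea is just:

```python
# Per-epoch (train_loss, val_loss) history; the numbers are illustrative.
history = [
    (0.90, 0.85),  # epoch 0
    (0.50, 0.60),  # epoch 1
    (0.30, 0.55),  # epoch 2: lowest validation loss
    (0.20, 0.70),  # epoch 3: training loss keeps falling, val loss rises
]

# Pick the epoch whose *validation* loss is lowest, not the last epoch.
best_epoch = min(range(len(history)), key=lambda i: history[i][1])
assert best_epoch == 2
```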

Great summary, I always think it's best when machine learning projects have visuals and videos to showcase what is actually being learned.

This simple project is a good example of supervised learning from what I can tell - the network will learn to steer "as good as" the human that provides the training data. For a different (and more complex) flavor of algorithm, check out reinforcement learning, where the "agent" (computer system) can actually learn to outperform humans. Stanford's autonomous helicopters always come to mind - http://heli.stanford.edu/

Consider the fairly massive changes to the competitive landscape ushered in by the combined factors of self-driving and electric vehicles:

- For liability reasons, most of the algorithmic IP will likely be open sourced. Either because it's required by regulators or because it's the most efficient way for car makers to socialize risk of an algorithmic failure.

- Electric vehicles have many fewer moving parts, which means that the remaining parts are likely to be converged upon by the industry and used widely. This breaks a lot of platform-dependency issues and allows for the commoditization of parts like motors. As these become standardized and commoditized, and easily comparable on the basis of size, torque, and efficiency, there will be virtually no benefit to carmakers to manufacture their own. The same applies to aluminum monocoque frames, charging circuitry, etc.

Tesla currently differentiates its models based on how many motors and what size batteries they have, but beyond that it's mostly just cabin shape, along with new innovations like the HEPA-filter cabin air cleaning, which will likely be a standard part of all future models.

- Battery tech works the same way as motors, with little competitive advantage to be gained by automakers, especially since most of the IP in this area is already spoken for.

Compare the number of patentable parts in a Model T vs. a 1998 Taurus vs. a 2017 internal combustion vehicle vs. a Tesla. Tesla is one innovator, and GM likely already patented many inventions relating to EV technology back in the original Chevy Volt era.

All this is why Tesla acquired SolarCity and is attempting to make an infrastructure play rather than a technology play. Only due to Musk's rare ability to self-finance big risks is this even possible, since infrastructure moonshots featuring $30K+ hardware units are hard to fund.

How do you see car makers differentiating their products in a world where all the parts including the frame are commoditized and the software is open source?

Also, GM built an electric car back in the 90s called the EV-1. I wonder how much innovation was in that car vs the Volt.

Interior design. Cars will evolve into mobile living spaces, so the quality of the interior becomes more significant than the exterior.

Quality, maintenance, cost of ownership, style, comfort, physical construction, actual innovation, accessories, existing brand preference; all the things they differentiate themselves with today.

I like your analysis here but how is this related to the article?

This DIY lane following algorithm made me think about it, and after reading Elon Musk's tweet about retrofitting I'd been thinking about the market and figured this thread was as good a place as any to deposit my thoughts.

Not to put down the OP's work (I think it's a great project), but I'm just wondering what advantages might an ML approach have over "traditional" CV algorithms. In a really well controlled environment lanes will be easy to detect, and computing the difference between the current heading and lane direction should be doable; maybe if we're talking about complex outdoor environments and poor sensors then ML would have an advantage? Or if we're teaching the robot what the concept of a lane is?

I think back to the days when I basically implemented lane following with an array of photoresistors, an Arduino, a shitty robot made from Vex parts, and some C code. The problem is much simpler than the one presented in this article, but then the computational resources used were orders of magnitude less. At what point, then, do you decide "OK, I think the complexity and nature of the problem warrants the use of ML" or "Hmmm, I think a neural network is overkill here"?

Traditional CV approaches are much easier to debug as well. I chose the ML approach with the assumption that it would be easier to build a robust autopilot that would work in many lighting conditions. Actually my short term goal is to get the car to drive around my block on the sidewalk (no lines). From my experience CV approaches have many parameters that need to be tuned specifically for each environment. While ML approaches also have parameters that need tuning they stay constant between environments.

I see, that makes sense. It'd be indeed worth it if we can apply a model trained on controlled environment to a more challenging one with little to no modification. Good luck with the project and keep us updated!

Because ML approaches can adapt to different environments like a forest trail. While this can probably be achieved with OpenCV, this just feels natural: https://www.youtube.com/watch?v=umRdt3zGgpU

My first thought was something that used several PID mechanisms.

I updated this post with some of the great feedback from the comments. Also I just ported the algo used by the last DIYRobocar race winner, CompoundEye. Here's that post: https://wroscoe.github.io/compound-eye-autopilot.html#compou...


Nicely done! But I'm assuming that this is more of an exercise rather than a real-world application of ML? I say this because the task of keeping a car between two lines is trivially done using control algorithms. Of course, the CV part -- "seeing" the lines -- requires some form of ML to work in the real world.

Obviously, "Lane Following Autopilot using my brain and control theory" would not make it to the top of HN. Welcome to the new era where Tensorflow replaces Lyapunov and ML spares you the need of understanding hard problems... until you need guarantees and safety... but it's ok, let's add more data.

I agree with you. If you can leverage control theory from the 1950s to solve your problem, what's the point?

However, I will state that using e.g. Lyapunov functions to prove the stability of the system requires a model of the system. And even if you need a guarantee for your system, that guarantee is only as good as the fidelity of your model. For an inexpensive RC car, with slippage and saturation, without torque control or inertial sensing, you're going to have a hard time doing something that sounds as principled as what you suggest.

You seem to be forgetting the entire vision pipeline that automatically extracts "lanes"; that information gets incorporated in an end-to-end manner requiring only true steering angles and nothing else. It's easy to comment, but it's not as straightforward or trivial as one might assume.

Indeed, I was not really talking about the vision pipeline. But once you decouple the problem (use ML for vision, planning for the trajectory, controls for the rest), you'll get much more stability, guarantees and insight into how to improve your problem. These kinds of end-to-end approaches are very hard to evaluate, they have zero educational value, are not parsimonious and tend to reduce people's analytical skills.

But to be able to decouple the vision pipeline you need a lot of manual annotation work which is tedious.

Tedious, and also solved for a decade already. Also, it's much easier to just find lanes using traditional CV and simply use annotators to verify the lane labels.

You can never use 1950s control theory to solve your problem? I think you misunderstood my comment, so please let me clarify: I was claiming that even the control problems in this RC-car lane-keeping domain can benefit from learning approaches.

Are you disagreeing with my comment? Or stating that I should have included additional points in my comment?

In any case I think I understand your comment, that in addition to the control problem, there's a perception problem.

Rather than demeaning someone's effort, learn why specific methods are considered appropriate/state-of-the-art when solving certain problems.

This is an unbelievably wrong comment. All the Lyapunov and traditional process control theory in the world won't help you solve autonomous driving. Also, regarding "guarantees and safety": they don't magically appear out of thin air when you use traditional process control, especially in noisy domains like autonomous driving. This comment is the equivalent of "I can write code to solve Atari Pong in any programming language deterministically, so any post showing Deep Reinforcement Learning is stupid"...

So control theory is ok for unmanned aerial drones but autonomous driving is just too far? Control theory can't handle noisy domains?

Guarantees of safety (more accurately, stability) are the entire point of Lyapunov analysis, and it's used on noisy systems all the time (https://www.mathematik.hu-berlin.de/~imkeller/research/paper...). Can you point to a specific noisy system that control theory is ill suited for?

Once you have a path to follow, classical control theory can be used to control the steering angle to follow it.

But classical control theory hasn't been able to extract, from camera pixels, the open path in a road with cars, bicycles, and pedestrians. Camera inputs are million-dimensional, and there aren't accurate theoretical models for them.

Unmanned drones are orders of magnitude easier since you don't have anything that you can just fly into once you are above a few hundred feet. They also don't have to rely on any vision-based sensing. E.g., a drone has altitude, current speed, and heading, all of which, while noisy, can be represented easily as a small set of values.

Lyapunov and control theory as a whole assume perfect knowledge of sensors. Even though the signal itself might be error prone, you have a signal. In the case of autonomous driving, even in simple cases like those described in the blog posts, knowing the exact position of the markers and then using them to tune the controller is not as easy as you might think.

The end-to-end system shown here solves three problems: it processes the images to derive the signal, represents it optimally to the controller, and then tunes the controller using the provided training labels.

I cited Lyapunov, more as the ABC of nonlinear controls. Much more can be done in an analytical fashion, the "end-to-end" system here does not "solve" anything. It is a trained steering command regressor, nothing fancy, it's likely to work in this guy's living room, under certain lighting conditions, there is no way of predicting its accuracy, sensibility or anything else. Engineers have been breaking down systems into sub systems for a reason -> tractability of testing and improvement. End-to-end systems like that have close to zero value if you need something reliable.

Obviously, my message was slightly provocative, deep learning methods and classical controls (which by the way are able to quantify robustness to plant uncertainties and noisy signals) are all very useful but shall be used in combination. End-to-end techniques that bundle perception, planning and control in an opaque net are fun to play with (like in this article), it just very sad to see people believing this produces robust and safety-critical systems and we see too much of such articles on HN.

I agree with you in the sense that if a known and reliable way to map knowledge and information from one domain to another (e.g. from desired trajectory + perceived current position to steering inputs), I'd much prefer that than black box ish neural nets. Neural nets aren't meant to be the silver bullet.

But in this case though, any kind of state space control also requires rather precise knowledge of the physical laws that govern the dynamics of the vehicles. When such information is not available, can neural nets do a decent job at mimicking an analytical control algorithm? I think that's an interesting problem worth exploring.

Why learn to walk when crawling is effective? When crawling you have hard guarantees that you won't fall down.

When falling down actually corresponds to killing a pedestrian, then I'd rather try to understand the complexity of robust walking rather than observing a bunch of humans, mimic their behavior and hope for the best.

Also, general autonomous driving isn't as hard as it sounds. Given a list of time-dependent coordinates of obstacles, it should be pretty easy to navigate around such that no collision occurs. The hardest part is testing, but this is just a matter of tedious work and doesn't require great intellectual effort.

> Given a list of time-dependent coordinates of obstacles

A tree falls in an intersection because some carpenter ants chewed through the trunk. Cars swerve to miss the tree and collide in an inelastic ball of nonlinearity, showering debris everywhere. You approach this at 65 mph and have 23 ft to decide what to do. Fear not, you have a list, a perfect list with coordinates, velocities, and material properties of every solid body in the area. Furthermore, without great intellectual effort, you can solve the millions of coupled differential equations that govern the dynamics of the entire system in near real time. Oh, and your list also has a measure of importance of each bit of mass, whether it is human, animal, or inert. And your list also accounts for the degrees of freedom introduced by every other car approaching the intersection, each using its own respective list and perfect knowledge of the world to miss the others?

Do you happen to have a PhD in Robotics?

I think rockets are straightforward too: just a bottle with expanding gas pushed through a series of nozzles, pointed at different angles at the correct time. But since I know I am not a rocket scientist, I don't go around claiming the moon landing was not a "great intellectual" effort.

Yep, this was an exercise to compete in the DIYRobocars race in West Oakland last weekend. There were 9ish cars, with 7 running end-to-end Tensorflow autopilots and the others using OpenCV/line detection. OpenCV won the race.

This is primarily a fun toy problem.

It uses a Raspberry Pi and ~50 lines of code. So I don't think anyone should expect it to do something that's impossible with other approaches.

> trivially done using control algorithms

Is it really trivial? Honest question... Which control algorithms are you speaking of?

As u/gumby said, I was thinking of a PID controller. Basically, the car would continuously measure how far away it is from the line and compare that with the "expected" (computed) value. Based on the error between the two values, the controller would adjust some variable (e.g., wheel angle).

Computing how much of an adjustment is required is where the PID part comes in. The controller uses the Derivative (rate of change) of the error as well as the Integral of the error to improve its estimate. These two values can intuitively be thought of as the predicted error and history of the error, respectively.

[1]: https://www.wikiwand.com/en/PID_controller
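A minimal sketch of the PID update described above, with purely illustrative gains that would need tuning against a real plant:

```python
class PID:
    """Minimal PID controller: u = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0       # accumulated (history of) error
        self.prev_error = None    # for the rate-of-change term

    def update(self, error, dt):
        self.integral += error * dt
        # No derivative on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.0, ki=0.1, kd=0.5)
u = pid.update(error=2.0, dt=0.1)   # first step: P and I terms only
assert abs(u - 2.02) < 1e-9
```

For the lane-following case, `error` would be the measured lateral offset from the line, and `u` the wheel-angle adjustment.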

I think it's important to note that a PID controller is, in almost every case it is used, a heuristic controller which achieves pretty mediocre performance. Unless the plant is a second-order linear system with constant gains, and maybe some non-constant biasing (what the I term is supposed to handle), a PID controller is theoretically inappropriate and requires tuning.

The handful of parameters you need to adjust in a PID controller parameterizes a very small class of controllers. For a given control problem, the controller you want might fall outside that class. People try to expand the class of "PID" controllers in all sorts of ways (e.g. anti-windup), but from where I stand it's just hacks on top of hacks.

It makes sense to consider a much wider class of controllers, with many more parameters, to possibly achieve better performance, or at least to avoid having an expert tune some gains in place of collecting bucket-loads of data.

As someone who's tried to control a car with PID, there is more to it than that. The angle of the car relative to the lines and the distance from the line need to be taken into account separately, since they each independently contribute to the distance from the lines as you move forward, and the PID controller can't separate them by itself. Think of how hard it is to keep straight in heavy fog when you can only see a few feet in front of the car, and you'll get the idea.

The delay in your steering response (how fast you can measure the error and turn the wheel) is large enough here that if you aren't actively taking the non-linearities of the problem into account you will oscillate off the road. The other way to fix this is to aim for a point far ahead of you such that your steering response time is significantly less than your "following distance", but that results in cutting corners.
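One common way to combine the two errors explicitly is a Stanley-style steering law, where the cross-track correction shrinks with speed (which also lengthens the effective look-ahead). A sketch with an illustrative gain:

```python
import math

def steering(heading_error, cross_track_error, speed, k=1.0):
    """Combine heading error (rad) and lateral offset (m) into a
    steering angle (rad). The atan2 term scales the cross-track
    correction down as speed rises; the gain k is illustrative."""
    return heading_error + math.atan2(k * cross_track_error, speed)

# Car parallel to the lane but offset 0.5 m to the left: steer right.
a = steering(heading_error=0.0, cross_track_error=-0.5, speed=2.0)
# Same offset at higher speed: a gentler correction.
b = steering(heading_error=0.0, cross_track_error=-0.5, speed=8.0)
assert a < b < 0
```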

Cool. I'd never heard of a PID controller... although it sounds vaguely related to a Kalman filter. Indeed: https://www.quora.com/Is-there-any-intrinsic-connection-betw...

Thanks also for the link to wikiwand!

Huh, I'll need to read up on Kalman filters then!

Yeah, Wikiwand is amazing. Be sure to grab the browser plugin: it automatically redirects any Wikipedia links to Wikiwand.

This is part of the reason I love HN.. I just spent my lunch learning about PID controllers! Thank you!

For context, a hardware PID controller is a commodity part you can buy for a couple of dollars.

umm...say, a PID loop?

> But I'm assuming that this is more of an exercise rather than a real-world application of ML?

While this example is simplified - and I wouldn't recommend it for a real-world full-size vehicle trial - it does implement everything (scaled down) described in NVidia's paper:


In short, the project uses OpenCV for the vision aspect, a small CNN for the model, "behavioral cloning" (where the driver drives the vehicle while images of the "road" and other sensor data like steering are captured as features and labels, respectively, and then trained on), augmentation of the data to add more training examples, plus training data for "off course" correction examples...

If you read the NVidia paper, you'll find that's virtually all the same things they did, too! Now - they gathered a butt-ton (that's a technical measurement) more data, and their CNN was bigger and more complex (and probably couldn't be trained in reasonable time without a GPU), plus they used multiple cameras (to simulate the "off-lane" modes), and they gathered other label data (not just steering, but throttle, braking, and other bits)...but ultimately, the author of the smaller system captured everything.
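The flip-style augmentation both projects use can be sketched in NumPy: mirror each frame left-right and negate its steering label. The shapes here are illustrative, matching the 120x160 RGB frames mentioned elsewhere in the thread:

```python
import numpy as np

# Toy batch: 4 frames of 120x160 RGB, with steering angles in degrees.
X = np.random.randint(0, 256, size=(4, 120, 160, 3), dtype=np.uint8)
y = np.array([-30.0, -5.0, 0.0, 20.0])

# Mirror each image along its width axis; a left turn becomes a right turn.
X_flipped = X[:, :, ::-1, :]
y_flipped = -y

X_aug = np.concatenate([X, X_flipped])
y_aug = np.concatenate([y, y_flipped])

assert X_aug.shape == (8, 120, 160, 3)
assert y_aug[4] == 30.0  # flipped copy of the -30 degree example
```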

Furthermore, NVidia's system was used on a real-world car, and performed quite well; there are videos out there of it in action.

This is virtually the same kind of example system that the "behavioral cloning" lab of Udacity's Self-Driving Car Engineer Nanodegree is using. We're free to select what and how to implement things, of course, but I am pretty certain we all understand that this form of system works fairly well in a real-world situation, and so most of us are going down the same route (ie, behavioral cloning, cnn, opencv, etc). Our "car" though is a simulation vehicle on a track, built using Unity3D.

> Of course, the CV part -- "seeing" the lines -- requires some form of ML to work in the real world.

Actually, it doesn't. The first lab we did in the Udacity course used OpenCV and Numpy exclusively to "find and highlight lane-lines" (key part was to convert the image from BGR to HSV, and mask using the hue). No ML was required.
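The thresholding idea can be sketched in NumPy on an already-converted HSV image; a real pipeline would use cv2.cvtColor(img, cv2.COLOR_BGR2HSV) and cv2.inRange. The hue bounds below are illustrative values for yellow lane paint, not the course's actual values:

```python
import numpy as np

def lane_mask(hsv, h_lo=15, h_hi=35, s_min=80, v_min=80):
    """Boolean mask of pixels whose hue falls in an illustrative
    yellow band and which are saturated/bright enough. `hsv` follows
    OpenCV's 8-bit convention: H in 0..179, S and V in 0..255."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= h_lo) & (h <= h_hi) & (s >= s_min) & (v >= v_min)

# 2x2 toy HSV image: yellow, too dark, too desaturated, and blue pixels.
hsv = np.array([[[25, 200, 200], [25, 200, 10]],
                [[25, 10, 200], [120, 200, 200]]], dtype=np.uint8)
mask = lane_mask(hsv)
assert mask.tolist() == [[True, False], [False, False]]
```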

That said - I wouldn't trust it for real-world vehicle driving use - but it possibly could be used as part of a system; however, as NVidia has shown, a CNN works much better, without needing to do any pre-processing with OpenCV to extract features of the image - the CNN learns to do this on its own.

I might be missing it, but I don't see instructions for installing TensorFlow/Keras on the Raspberry Pi in the Donkey repo or in this blog post (needed to actually run the trained model, it looks like). For TensorFlow, there are pre-built binaries and instructions to build from source here:


Note: I am the owner of this repo

Donkey runs a client on the Pi and a remote server that runs Keras/Tensorflow.

A ha! Very cool- apologies for not seeing how it worked at first; I assumed you used the server to control/collect data manually, and then loaded the model onto the device. Thanks for the demo!

Two major errors: 1) This doesn't seem to be controlling overfitting on the right validation set. 2) There isn't a test set at all (separate from validation).

Using Keras' "validation_split" parameter will just randomly select a validation set. This is not the right thing to do when your data is image sequences, because you will get essentially identical data in training and validation.

Because of this, the numbers/plot here might as well be training accuracy numbers.
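A split that respects the time ordering avoids this: hold out the last contiguous block of frames instead of sampling randomly. A sketch with stand-in data:

```python
def sequential_split(X, y, val_fraction=0.2):
    """Hold out the final contiguous block for validation, so
    near-identical neighboring frames can't straddle the split."""
    n_val = int(len(X) * val_fraction)
    return (X[:-n_val], y[:-n_val]), (X[-n_val:], y[-n_val:])

frames = list(range(100))           # stand-ins for time-ordered images
angles = [f * 0.1 for f in frames]  # stand-in steering labels

(train_X, train_y), (val_X, val_y) = sequential_split(frames, angles)
assert len(val_X) == 20
assert val_X[0] == 80               # validation is the final block
assert set(train_X).isdisjoint(val_X)
```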

Keras uses the end of the data set as validation, and only randomizes it if the "shuffle" argument is set to True [1].

[1]: https://keras.io/getting-started/faq/#how-is-the-validation-...

Except the second half of the data is the flipped of the first half (X = np.concatenate([X, X_flipped]))
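One way to avoid that leak is to split on the raw sequence first and augment only the training half, so no flipped copy of a validation frame ever reaches training. A sketch with stand-in data (the "flip" here just negates the label):

```python
import numpy as np

X = np.arange(10, dtype=float)   # stand-ins for 10 time-ordered frames
y = np.linspace(-1.0, 1.0, 10)   # stand-in steering labels

# 1) Split the *raw* sequence first.
X_train, X_val = X[:8], X[8:]
y_train, y_val = y[:8], y[8:]

# 2) Augment only the training half.
X_train_aug = np.concatenate([X_train, X_train])
y_train_aug = np.concatenate([y_train, -y_train])

# Validation frames never appear (flipped or not) in training data.
assert X_val.tolist() == [8.0, 9.0]
assert not set(X_val.tolist()) & set(X_train_aug.tolist())
```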

Well shit. Thanks for pointing that out. I'll revise.

Apologies if I'm being stupid, but I can't find the details on how to physically connect the hardware together anywhere. Is this still on the todo list? I'm interested in applying this tutorial and making an autonomous RC car.

What I would love to see is an end-to-end neural network solution. On one end, camera input comes through; on the other, outputs for speed and steering angle.

But rather than a black box, it would be explainable what the different layers are doing. If neural nets are Turing machines, then we should be able to compile some parts of the net from code.

Then the net is a library of layers: some layers trained with backprop, some compiled from code.

Almost all neural nets are not Turing complete. Only very specific RNNs are; most RNNs aren't, including pretty much any RNN model used in the real world right now (https://uclmr.github.io/nampi/talk_slides/grefenstette-nampi...).

Also, this is a useless fact, because so many other random things are Turing complete.

I'm working on adding the throttle. This is difficult because you need to drive the correct speed and stopping or running off course can mess up the training data. This project was inspired by Otavio's carputer which does predict throttle, steering angle, and odometer.

The end to end approach of regressing steering wheel angle already exists, check nvidia's paper.

X is in the range 0, 255. They don't show code converting it to a much saner range for the network they've chosen. Is the full source somewhere?

Have you already looked at the pickled data? Because it looks like the model is outputting a single label value out of 256 labels; depending on the training data (steering angle) and how it is represented in the data (signed float or integer?), each one of those 256 learned labels should (?) be similar - I think.

Again, I'm not an expert. Or - maybe it is outputting a number 0-255, and then taking that number and converting it (and maybe other operations) into values suitable for the servo on the car (perhaps centered around 0 - so -128 to 127 or something like that - then scaled for servo PPM width or whatever values needed)...

All guesses, of course.

The input values are image arrays of 120x160 pixels with 3 channels for red, green, and blue. The values range from 0-255 and are not normalized before being fed into the convolution layer. I found this did not make a difference.

The output of the model is a single real number between -90 (left) and 90 (right). I believe a better approach would be to bin the outputs and use a classifier. This way you'd know when the model was getting confused (i.e., approaching a perpendicular line).
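The binning idea can be sketched directly; the bin count is illustrative. With a classifier, a flat softmax across the bins would then flag the confused cases:

```python
N_BINS = 15  # illustrative; any odd count keeps a "straight ahead" bin

def angle_to_bin(angle, n=N_BINS):
    """Map a steering angle in [-90, 90] to a class index in [0, n-1]."""
    idx = int((angle + 90) / 180 * n)
    return min(idx, n - 1)          # clamp angle == 90 into the last bin

def bin_to_angle(idx, n=N_BINS):
    """Map a class index back to its bin-center angle."""
    return (idx + 0.5) * 180 / n - 90

assert angle_to_bin(-90) == 0
assert angle_to_bin(90) == N_BINS - 1
assert bin_to_angle(7) == 0.0       # center bin means straight ahead
```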

Also the full repo is mentioned (I think the article is just a general highlighting of the full repo):


Before even opening the link, I was wondering in which jurisdiction it would be legal to run a "personal" autopilot.

Awesome tutorial.

I remember seeing keras in commaai source. Who else uses it ?

Tensorflow announced last week that it will incorporate Keras as its default higher-level abstraction. It's a pleasure to use.
