I helped assemble a Donkey car at the ARM event in Seattle last week. It’s really quite elegant in the efficiency of its approach: a couple of hours from unboxing to self-driving if the top parts are already printed.
I feel like a whole new world has just opened up before me. I have a Raspberry Pi. What is the route with the fewest technical barriers for me to get into this and start tinkering?
For DonkeyCar: follow the guide, join the Slack group, ask questions!
You'll need to purchase a number of things beyond the Raspberry Pi itself (camera, battery, car, servo controller board, wires), and 3D print, purchase, or build a mount for it all.
I had to look up what 'end-to-end' means in regard to neural nets and machine learning. And the answer seems to differ slightly or very much with every webpage I hit.
Stack Exchange says it means 'all classifiers are trained jointly'; NVIDIA and Quora seem more in line with what I assume it means in this project. Is 'no human input to training sets or results as part of learning' what this is talking about?
In the context of automated driving, 'end-to-end' tends to mean that the net learns a transformation function that maps input images to steering wheel and pedal positions. In that sense, there is 'no human input' at any stage.
In contrast, every significant AD effort uses hand-crafted parts for most of its pipeline. Machine learning is mostly used in the initial object detection and tracking steps.
In this case, the training is supervised. It’s a simple convolutional neural network that predicts the turning angle based on the image of the line on the “road”.
Wow, they seem to be using Hierarchical Temporal Memory rather than more traditional deep learning approaches. We've seen some rather grandiose claims about HTMs' capabilities from Numenta (who I think invented the approach?) but no really convincing applications yet, so it's really interesting to see a concrete demonstration like this.
NVIDIA published a very interesting paper on this technique called "End-To-End Learning For Self-Driving Cars": https://arxiv.org/abs/1604.07316 . It's an enjoyable read.
The car learns to steer itself on an empty road. It's a good experiment to witness the power of deep learning and neural nets. For autonomous vehicles though, you need much more than that (e.g. sensor fusion, obstacle detection, localization, behaviour prediction, trajectory prediction, path planning, motion control, etc.).
> Compared to explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously. […] Better performance will result because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e. g., lane detection.
One can imagine that it might be more difficult to get a network to solve this large problem all at once, and that it might be easier to decompose the problem and solve each part. Would it be a good idea to guide the end-to-end system by first decomposing the problem and solving each part, then using that solution as a starting guess for the whole problem? I mean, the decomposition might perhaps be a reasonable approximation of how the whole problem should be solved. (Then again, it might not.)
Perhaps I'm misunderstanding this. But why is a CNN necessary for predicting the turning angle? Seems like OpenCV would be sufficient (assuming sufficient contrast between the line and the floor). Grab image, threshold, find the largest contour around the line, grab its minAreaRect() and from that calculate the angle.
Editing to add -- this is still a cool project. I don't mean to detract from it by pointing out that I think it could be done without the AI piece.
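For illustration, the threshold-and-measure pipeline described above can be sketched without any dependencies. This is only a sketch under stated assumptions: the frame is a small grayscale grid with a bright line on a dark floor, and second-order image moments are swapped in for OpenCV's contour and minAreaRect() steps so that it runs on the standard library alone. The function name line_angle_degrees and the cutoff of 128 are made up for the example.

```python
import math

def line_angle_degrees(image, threshold=128):
    """Orientation of the bright line, in degrees from the image x-axis.

    `image` is a list of rows of grayscale values (0-255). Image
    coordinates are used, so y grows downward. Returns None when no
    pixel passes the threshold.
    """
    # Threshold: keep the coordinates of pixels at or above the cutoff.
    points = [(x, y)
              for y, row in enumerate(image)
              for x, value in enumerate(row)
              if value >= threshold]
    if not points:
        return None

    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Second-order central moments of the thresholded pixels.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n

    # Principal-axis orientation (stand-in for the minAreaRect angle).
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return math.degrees(theta)

# Synthetic 10x10 frame with a diagonal line running down and to the right.
frame = [[255 if x == y else 0 for x in range(10)] for y in range(10)]
print(line_angle_degrees(frame))
```

A steering command could then be derived from how far this angle is from vertical. The sketch shares the weakness noted above: it only works when the line contrasts clearly with the floor.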
You need to be more specific about what you want to build.
A true LIDAR system infers the round-trip time of photons, either directly (time-of-flight) or based on some proxy like phase or frequency shift (AMCW/FMCW LIDAR).
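To make the three measurement principles concrete, here is a rough sketch of the textbook range equations for each. The function names are invented for the example, the FMCW case assumes a linear chirp and a stationary target, and the 10 MHz modulation frequency is just an illustrative value, not a figure from any particular sensor.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_distance(round_trip_s):
    """Direct time-of-flight: the pulse covers the distance twice."""
    return C * round_trip_s / 2.0

def amcw_distance(phase_shift_rad, modulation_hz):
    """Phase-shift (AMCW) ranging: d = c * phi / (4 * pi * f)."""
    return C * phase_shift_rad / (4.0 * math.pi * modulation_hz)

def amcw_max_unambiguous_range(modulation_hz):
    """Beyond this distance the measured phase wraps past 2*pi."""
    return C / (2.0 * modulation_hz)

def fmcw_distance(beat_hz, chirp_bandwidth_hz, chirp_duration_s):
    """Linear-chirp FMCW, stationary target: d = c * f_beat * T / (2 * B)."""
    return C * beat_hz * chirp_duration_s / (2.0 * chirp_bandwidth_hz)

# A target 40 m away returns the pulse after roughly 267 nanoseconds.
print(tof_distance(2 * 40.0 / C))
# With 10 MHz amplitude modulation the phase wraps at about 15 m.
print(amcw_max_unambiguous_range(10e6))
```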
Lots of people include triangulation systems, which I disagree with. For example the RPLidar A2 is just a nicely packaged version of Kurt Konolige (et al)'s RevoLDS [1]. It projects a laser spot and takes a picture of it. Add in some known geometry, and you can use triangulation to measure distances. It's easy to build; you can do it with any old camera and a laser pointer. It's incredible how overpriced the A2 is, given that the Revo was published as a $30 system.
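As a sketch of the "known geometry" part, assume the simplest possible layout: the laser beam runs parallel to the camera's optical axis at a fixed baseline, so the spot's pixel offset from the image centre shrinks as the target moves away. The function names and the numbers (700 px focal length, 5 cm baseline) are hypothetical, and real units such as the Revo use a somewhat different arrangement.

```python
def triangulation_distance(focal_px, baseline_m, offset_px):
    """Similar triangles: distance = focal length * baseline / pixel offset."""
    if offset_px <= 0:
        raise ValueError("spot offset must be positive (target at infinity otherwise)")
    return focal_px * baseline_m / offset_px

def range_error_per_pixel(distance_m, focal_px, baseline_m):
    """Approximate range change caused by a one-pixel error in the spot position.

    Differentiating d = f * b / x gives |dd/dx| = d**2 / (f * b),
    so the error grows with the square of the distance.
    """
    return distance_m ** 2 / (focal_px * baseline_m)

# Hypothetical camera: 700 px focal length, 5 cm baseline.
print(triangulation_distance(700, 0.05, 35))   # spot 35 px off-centre
print(range_error_per_pixel(1.0, 700, 0.05))   # sensitivity at 1 m
print(range_error_per_pixel(5.0, 700, 0.05))   # sensitivity at 5 m
```

The quadratic growth in error is the main reason triangulation scanners are short-range devices, whatever you decide to call them.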
A true LIDAR is much more difficult to build. If you go time-of-flight then you need a good laser and excellent timing electronics. If you go phase-shift then you need beam mixing/separation optics, beam modulation and a phase detector. Neither of these is easy for the average hobbyist. If you're interested, buy a laser tape measure from Leica and have a look inside to see how it works.
I believe the LIDAR-Lite is a proper LIDAR system - it ranges up to 40m, which is a good hint that it's probably a true time-of-flight system. The Scanse Sweep is a LIDAR-Lite turned into a 2D scanner, as in the sibling post. A good indicator is price. Real LIDAR systems are expensive - starting at £1k typically for something like a Hokuyo. They also tend to be much better built than the hobbyist stuff.
In principle all you need is a pulsed laser, a big collecting lens, a fast rise-time photodiode and a good (picosecond-accurate) timing circuit.
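A quick back-of-the-envelope sketch shows why the timing circuit is the hard part: light covers about 30 cm per nanosecond, and the pulse travels out and back, so every unit of timing error becomes half that much range error. The function names here are made up for the example.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def range_resolution(timing_resolution_s):
    """Range uncertainty caused by a given timing uncertainty (round trip halves it)."""
    return C * timing_resolution_s / 2.0

def timing_needed(range_resolution_m):
    """Timing resolution required to resolve a given range step."""
    return 2.0 * range_resolution_m / C

for label, dt in [("1 ns", 1e-9), ("100 ps", 1e-10), ("1 ps", 1e-12)]:
    print(f"{label}: {range_resolution(dt) * 1000:.3f} mm")
```

One nanosecond of timing error is about 15 cm of range error, and one picosecond is about 0.15 mm, so centimetre-level ranging already needs timing good to tens of picoseconds.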
UPenn had an F1 autonomous racing competition where teams design, build and race a 1/10th-scale version of an F1 car. They also have a tutorial on how to build one.
Here's the link:
http://f1tenth.org/
Does anybody understand where the supervision (target steering angles) comes from? I checked Learn to Drive.ipynb but that seems to just read steering outputs from a file. Shouldn't there be manual "labeling" involved?
The steering angles are probably recorded automatically by the script you run while manually driving the toy vehicle to collect training data. I presume that at every frame an image is captured along with the steering angle for that image.
You then create a neural net architecture that is fed images and steering angles and outputs steering angle predictions. This is a regression problem that can be expressed in plain English as:
"First train the neural network with a collection of images and associated steering angles. After training, if I were to give the neural network a new image it has never seen before, what steering angle would the network predict?"
NVIDIA has a paper on it[1], and I blogged about similar coursework I completed as part of Udacity's self-driving car nanodegree[2].
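A toy sketch of that regression setup, to show where the labels come from. It is deliberately not a CNN: a one-feature linear model stands in for the network, and the "drive log" is synthetic data invented for the example, with each record pairing a frame feature (the line's horizontal offset from the image centre, in pixels) with the steering value the human driver applied at that moment.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Synthetic drive log: (line offset in pixels, steering applied by the driver).
# No manual labelling step: the label is whatever the driver did at that frame.
drive_log = [(-40, -0.8), (-20, -0.4), (0, 0.0), (20, 0.4), (40, 0.8)]

xs = [offset for offset, _ in drive_log]
ys = [steering for _, steering in drive_log]
w, b = fit_linear(xs, ys)

def predict_steering(offset_px):
    """Predict steering for a frame the model has not seen before."""
    return w * offset_px + b

print(predict_steering(10))
```

In the real project the input is the raw image and the model is a convolutional network, but the supervision works the same way: the recorded driving supplies both the inputs and the targets.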
This is what I don't understand. Self driving cars could be doing lots more toy simulations in toy worlds before "having to" collect data on public roads.
Self-driving cars are already reasonably good most of the time. They need to get better at specific cases, which would be a lot harder to model in a toy. Stopping distance is going to be a lot different, just for one example.