1. Imagine the problem, add a camera or two (or more).
2. Build/use a pre-trained ImageNet model as a starting point (probably using TensorFlow/Keras).
3. Build a dataset and split it into train, validation, and test sets.
4. Train the model further.
5. Test and validate the model. Lower the error rate (don't overfit though!).
As far as what language to use: depending on the speed requirements of whatever you're trying to do, Python would likely work fine in the majority of cases. If you need more than that, C/C++ is around the corner.
Oh - and OpenCV or some other vision library will probably be used (but just to grab the images, maybe a little pre-processing).
You wouldn't have to use this exact pipeline (you could substitute other deep learning libraries, other vision libraries, other languages, etc.) - but the basics are to start with a well-known CNN model, preferably pre-trained, then apply your own dataset(s) to the task to get it to work better. Not much more tweaking is needed; the biggest thing is to get (or be able to synthesize from what you do have) enough data to throw at it, and to have a fast enough system to train it in reasonable time.
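For concreteness, here's roughly what steps 2-4 look like in Keras - a sketch under assumed details (model choice, directory layout, two-class task are all my placeholders, not anything from the thread):

```python
# A minimal sketch of steps 2-4, assuming TensorFlow/Keras. The model choice,
# directory layout, and two-class task are placeholders.
import tensorflow as tf
from tensorflow import keras

# Step 2: a well-known CNN pre-trained on ImageNet, minus its classifier head.
base = keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained features to start with

model = keras.Sequential([
    keras.layers.Rescaling(1. / 127.5, offset=-1, input_shape=(224, 224, 3)),
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation="softmax"),  # e.g. obstacle / clear
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 3: your own images, split into train/validation (test set held out).
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224))
val_ds = keras.utils.image_dataset_from_directory(
    "data/val", image_size=(224, 224))

# Step 4: train the model further on your data.
model.fit(train_ds, validation_data=val_ds, epochs=5)
```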
We've seen this approach many, many times; it seems to work well for a ton of domains and problems. Again - very "Lego-like"...
I can't comment specifically on airborne drones, but at my lab we've demonstrated that a robot with an extremely low resolution camera and a very simple model is capable of learning to avoid running into walls with about a minute or so of training data.
We use reinforcement learning with linear function approximation, and even though the robot sees the world through a couple hundred pixels, it's sufficient to discover that walls have a certain color and that if too much of your vision is "wall" you should probably move in a different direction.
The advantage of deep learning is that your agent hopefully learns to generalize, and so isn't fooled by changes in brightness/color or room layout.
If your task is simpler than that, you could just use some OpenCV filters to extract colors and textures and let a simple linear model figure out which ones correspond to obstacles.
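Something in this spirit, say (the feature choices, labels, and stand-in data below are all illustrative assumptions):

```python
# Sketch: cheap OpenCV color/texture features feeding a linear classifier
# that predicts "obstacle ahead". The data below is a random stand-in;
# real labels could come from, e.g., bump-sensor events.
import cv2
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(frame):
    """Downsample hard, then use coarse color and edge statistics."""
    small = cv2.resize(frame, (16, 16))  # a couple hundred pixels is plenty
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    return np.concatenate([hsv.flatten() / 255.0, edges.flatten() / 255.0])

# Stand-in "camera frames" and labels (1 = about to hit something).
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(200, 120, 160, 3), dtype=np.uint8)
y = rng.integers(0, 2, size=200)

X = np.stack([features(f) for f in frames])
clf = LogisticRegression(max_iter=1000).fit(X, y)
obstacle = clf.predict(features(frames[0]).reshape(1, -1))[0]
```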
0. An iRobot Create, basically a Roomba without the vacuum.
1. Incidentally, that means that if you don't require deep reinforcement learning, you can get simple obstacle avoidance up and running with like 5000 data points.
RL (Q-learning?) with a linear approximation wouldn't work if there are subtle patterns in the image (poor contrast between walls and floors, a gradient, a border, etc.), and that's exactly the issue with robots: not detecting things that seem obvious to us.
The output of most machine learning algorithms is just a belief function. Deep learning is nice, because it basically removes the need to manually choose features, which can be the hardest part of applying machine learning to solve a classification task. But the output is still a belief function.
Machine learning as we generally know it today isn't about making a computer understand "concepts" or anything higher order like that. I think it is easy to compare learning a color histogram to learning classifications (e.g. couch, floor) because the two algorithms do exactly the same task in different ways.
The parent is saying that the function the robot needs to learn is linear. It doesn't matter that it's a drone, and the deep learning apparatus in the middle is overkill, because learning linear functions is easy (you don't need much data to figure out which way is up on a line).
It's not even a belief function, in the sense of a normalized probability distribution that respects conditionalization properly. It's basically just a one-hot vector for classification.
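To put made-up numbers on that: the "belief" is just a normalized score vector, trained to match a one-hot target.

```python
# Illustration with made-up numbers: a classifier's output "belief" vs. the
# one-hot target it was trained against.
import numpy as np

logits = np.array([2.0, 0.5, -1.0])              # raw network outputs
belief = np.exp(logits) / np.exp(logits).sum()   # softmax: ~[0.79, 0.18, 0.04]
target = np.array([1.0, 0.0, 0.0])               # one-hot training label
prediction = belief.argmax()                     # collapses to a single class
```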
Sure, it's different from human-like perception, if that's really what the deep net is learning to do.
But the burden of confirming that it's learning "concepts" instead of, say, dedicating a million parameters to implementing Sobel filters or wavelet transformations, or something even more trivial like "if all my pixels are one color, I am probably near an obstacle" is not on me.
When I approach a deep learning problem, my default assumption is that the model is out to humiliate me by learning something entirely trivial, and so I go to great lengths to augment my dataset and validate the fact that I got some extra mileage out of spinning up the ol' GPU that wouldn't have been possible (or at least, not as easy) with simpler methods.
Because if you can use something simple, why not do it?
For robots, it's maybe a full page of code to try some quick image filters, flatten them, and implement SARSA or Q(λ).
Our demo used Pavlovian control (basically, TD methods to estimate the likelihood of running into a wall, and turning if a collision seems too probable).
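A rough sketch of the flavor of that loop - TD(0) with linear features and a turn-away reflex; the details here are illustrative, not the lab's actual code:

```python
# Sketch: learn a linear prediction of "how likely am I to bump soon" with
# TD(0), then turn whenever the prediction crosses a threshold. Feature
# construction, step sizes, and the threshold are illustrative choices.
import numpy as np

n_features = 200          # e.g. a flattened, thresholded low-res image
w = np.zeros(n_features)  # weights of the linear value function
alpha, gamma = 0.1, 0.9

def td_update(x, x_next, bumped):
    """One TD(0) step toward the discounted probability of a bump."""
    global w
    target = 1.0 if bumped else gamma * (w @ x_next)  # bump ends the episode
    w += alpha * (target - w @ x) * x

def act(x, threshold=0.5):
    """Pavlovian control: turn away whenever predicted risk is too high."""
    return "turn" if w @ x > threshold else "forward"
```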
You can run it on a Raspberry Pi in real time, no GPU required; including the robot, it costs less than $300, and it doubles on sax.
When I'm done with my current project I'd like to try it with a drone, because aerial demolition derbies sound like the next great spectator sport.
0. There are techniques you can employ, for example: examining individual units or clusters of them for their response to different frames, deconvolution, and of course, messing with the inputs.
But this is rarely done, because it takes time away from using the magic hammer of DNNs to nail yet another previously difficult problem.
This is understandable, but it makes me wish I had the time to develop some tools for performing quick and easy trepanation on deep models so that examining the representation becomes as easy as the training part.
1. My colleague Marlos Machado has written a paper that seems relevant to this sorta thing: https://arxiv.org/abs/1512.01563 .
By looking at what Deep Mind's Atari DQN was doing (or what it seems to be doing) and developing the analogous linear features, you can get performance that is almost as good with a model that ingests data many times faster.
That is, their median score was better than or equivalent to Deep Mind's.
When it comes to RL, if you can use linear methods it's a huge help-- you know you're going to converge, probably quite quickly.
2. Subject to technical conditions, e.g. you're dealing with a stationary ergodic MDP, Robbins-Monro stepsizes, the environment's oblivious, the algorithm's on-policy, no purchase necessary, see in store for details etc.
3. It might not be as good a representation, but maybe a faster reaction time is more important?
And also because it could be that quadcopters are an entirely different kind of beast and really do need some of that old time deep learning religion, and I don't want to fall into the trap of thinking that the problem is simple when it's not.
With these sorts of questions you either have to do the experiment yourself or pay a guy from Google to tell you what they've done (and then be prepared to litigate).
Institutional memory is something of a weak point in the software industry compared with other fields.
But a reliable way to improve your performance (or at least write a publishable paper) is to examine why a specific technique worked, rewrite it as a differentiable function or finite automaton, and then implement it as a component in a deep net.
So it's a cause for concern, but also an opportunity for academic arbitrage.
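A toy instance of that recipe (mine, not from the thread): a hand-written Sobel edge filter rewritten as a convolution layer, so it can sit inside a deep net as a differentiable, even fine-tunable, component.

```python
# Sketch: a classic hand-coded technique (Sobel edge detection) expressed as
# a fixed convolution, ready to be dropped into a deep net and fine-tuned.
import numpy as np
import tensorflow as tf

sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype=np.float32)
kernel = sobel_x.reshape(3, 3, 1, 1)  # height, width, in-channels, out-channels

layer = tf.keras.layers.Conv2D(1, 3, padding="same", use_bias=False)
layer.build((None, None, None, 1))
layer.set_weights([kernel])  # initialize with the known-good algorithm

image = tf.random.uniform((1, 32, 32, 1))  # stand-in grayscale image
edges = layer(image)                       # differentiable Sobel response
```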
I think it will work out eventually, but I become uncertain when I meet machine learning "experts" who are only familiar with deep learning and unwilling to consider anything else.
My programming methodology has been validated at last!
Jumping off a rooftop or a mountain wearing a wingsuit is practically suicidal. If something goes wrong, you die. The margin of error is simply too small.
The latter form of wingsuit flying is relatively new and highly controversial, even within the wingsuit and BASE jumping communities.
I don't know that I'd describe it as 'controversial'; I'd rather describe it as 'that thing a bunch of people with nowhere near enough experience or currency to be doing it keep doing and fucking killing themselves.'
We do often (as instructors) talk about how nervous we get when we're with a student who we're pretty sure is just going to flatspin uncontrollably for like 8k ft, and it's just like 'Okay, here's everything you need to really have a bad afternoon. Don't? Please?'
Here's that story: https://vimeo.com/167054481
(Not the definition of 'Expert System' I was thinking of, however)
Seems like that's an essential to claiming you have expertise.
Edit: looking at the paper, they apparently use many physics-based models of the car as a basis, but then use ML to mix the models together.
The car, on the other hand, uses hand-written algorithms to forward simulate various controls. Based on the forward simulations, it can pick controls which are predicted to give good results. Forward simulation relies on a model of how the car reacts to any possible control. However, this model is complicated because of the nonlinear dynamics going on (inertia, wheel slip, etc.). Therefore, they use ML techniques to identify the model.
We write programs that predict how the car will drive given steering inputs. Because we're not sure, we write several programs that give slightly different answers.
Given a driving input, all the programs predict the future: the parent comment called this "forward simulation."
We pick the program that has worked well in the past and do what it says to do - that program drives the wheel of the car.
We measure what actually happens to the car. We then remember which algorithm actually gave us the right answers (might be different from the one we picked to steer) - next time, we'll trust that one more.
Because it's annoying to keep writing more programs, we figure out what we can tune - like a left / right balance knob on the stereo or a bass / treble knob. In this case, it might be a "ground slipperiness" or friction knob.
So as well as picking the programs, we ask the algorithm to tweak the "friction" knob and try to pick a setting that seems to match reality.
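In toy code, the whole story might look like this (the "programs" and the friction knob here are stand-ins of my own):

```python
# Toy version of the ELI5 above: several candidate forward simulators, each a
# "program" with a friction knob baked in; trust whichever predicted best.
import numpy as np

def make_model(friction):
    """A stand-in forward simulator: next state from current state + control."""
    return lambda state, control: state + control * (1.0 - friction)

knobs = (0.1, 0.3, 0.5)                      # friction settings to try
candidates = [make_model(f) for f in knobs]  # the "programs"
errors = np.zeros(len(candidates))           # each program's track record

def choose_and_learn(state, control, observed_next):
    """Score every program against reality, then trust the best one."""
    global errors
    preds = np.array([m(state, control) for m in candidates])
    errors = 0.9 * errors + np.abs(preds - observed_next)  # decaying memory
    return candidates[int(np.argmin(errors))]  # drive with this one next
```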
In the flying case:
We make a "black box" full of sheets of numbers and put a picture into one side. Each dot in the picture does some maths with the first sheet of numbers which makes a new "picture" for the next sheet.
We run maths based on remembered numbers and the answer (say 0.0 - 1.0) tells us "safe to fly" or not. Lets say 1.0 is safe (0.0 unsafe, in-between unsure).
Once we figure out that a given picture was safe we go backwards through the sheets of numbers to apply "back propagation" and change them - we make the "safe" picture output something closer to 1. Perhaps it output 0.50 before, now that same picture outputs 0.51. If the picture was unsafe, we adjust the other way.
We do that LOTS of times. Eventually safe pictures output 0.91 and unsafe ones 0.12 or something. We show the computer a new picture, and we call the answer "Safe" (say 0.8-1.0) "unsafe" (0.0-0.2) and unsure (0.2-0.8). We fly only towards pictures which are "safe".
Everyone pops champagne. We didn't learn much - only that lots of numbers can solve more wacky problems than before. It's hard to generalise what the computer "learnt" or really understand it.
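The story above, compressed into a toy two-"sheet" network (random stand-in "pictures", made-up sizes and step size):

```python
# Toy version of the "sheets of numbers" story: two sheets, one safe/unsafe
# output, nudged by back propagation. Pictures here are random stand-ins
# where "bright" is arbitrarily declared safe.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 64))                   # 100 tiny 8x8 pictures, flattened
y = (X.mean(axis=1) > 0.5).astype(float)    # pretend bright pictures are safe

W1 = rng.normal(0, 0.1, (64, 16))           # first sheet of numbers
W2 = rng.normal(0, 0.1, (16, 1))            # second sheet of numbers
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):                       # "we do that LOTS of times"
    h = sigmoid(X @ W1)                     # picture meets the first sheet
    out = sigmoid(h @ W2).ravel()           # 0.0-1.0 safe/unsafe guess
    d_out = (out - y) * out * (1 - out)     # how wrong, and in which direction
    d_h = d_out[:, None] * W2.T * h * (1 - h)     # pass the blame backwards
    W2 -= 0.5 * (h.T @ d_out[:, None]) / len(X)   # nudge the second sheet
    W1 -= 0.5 * (X.T @ d_h) / len(X)              # nudge the first sheet
```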
Now, in the future, maybe they'll want to add support for AI drivers? It would sure make things interesting! Or boring, depending on how well the AI does.
Or would the physics be too complex to model well for simulation?
The elegant thing about using machine learning is that you don't need to build any models at all. And you can develop the ML technique once and then reuse it to train different hardware configurations, instead of incurring the cost of modeling every one.
Also, note that the physics of the simulation doesn't even need to be realistic. Unless you are doing high-speed control or aggressive maneuvers, the challenging part is the perception and not the control. In the paper from OP, the controls are even high-level discrete actions: left, forward, right.
It would also be interesting to see if the learned policy corrects for perturbations. If we tilt the drone by hitting it, will the policy stabilize it again?
While this is a really cool result, I suspect that this approach might not be the best way to control UAVs. Dragonflies are ready to fly, avoid obstacles, perch on things, and hunt down prey right after warming up their wings for the first time. This implies that a good amount of the flight behavior is 'hard-coded.'
That said, I really can't wait until someone expands upon this approach. So instead of outputting left or right, the network could output 'stick vectors,' which translate to control stick commands. Maybe even have the network take in some sensor data and a 'move in this direction' vector. Add in a pinch of sufficiently fast video processing and we could probably learn how to fly through an FPV course or do aggressive maneuvers to fly through someone's windows.
My understanding of the way this is being done is that the output from the machine learning model is already a simple "left", "right", or "straight on", so it's not really responsible for stabilization anyway.
That side of things is likely being handled by the drone's control software which takes those inputs, translates those into what angle the propellers need to be at to achieve it, and then translates that into the correct rotor speeds. If you hit the drone the gyroscope will pick up that it's at the wrong inclination, feed that information into the control software, and the control software will adjust rotor speeds to correct.
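That inner loop is typically a PID controller per axis; here's a one-axis sketch with made-up gains and readings:

```python
# Sketch: the drone's inner stabilization loop, separate from the model's
# left/right/straight decisions. One axis only; gains and readings made up.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        """setpoint = desired angle, measured = gyro/IMU estimate."""
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

roll = PID(kp=4.0, ki=0.5, kd=1.2)
# Pretend a knock tilted the drone to 0.1 rad; the correction gets mixed into
# the rotor speeds, cancelling the disturbance over the next few ticks.
correction = roll.update(setpoint=0.0, measured=0.1, dt=0.01)
```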
Also, the flight had an almost organic quality to it somehow. Spooky, but cool.
Funnily enough, if you've ever implemented a Pacman clone, this is also how the ghost AI works.
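For reference, the classic arcade ghost logic, from memory (a toy sketch): at each junction, take the legal, non-reversing move that minimizes straight-line distance to a target tile.

```python
# Toy sketch of classic Pac-Man ghost steering: greedy, local, no planning.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

def ghost_move(pos, target, walls, last_move):
    """Pick the legal, non-reversing move closest to the target tile."""
    best, best_d = None, float("inf")
    for name, (dx, dy) in MOVES.items():
        nxt = (pos[0] + dx, pos[1] + dy)
        if nxt in walls or name == OPPOSITE.get(last_move):
            continue
        d = (nxt[0] - target[0]) ** 2 + (nxt[1] - target[1]) ** 2
        if d < best_d:
            best, best_d = name, d
    return best

# e.g. ghost at (1, 1) chasing Pac-Man at (5, 1), with a wall to the north:
step = ghost_move((1, 1), (5, 1), walls={(1, 0)}, last_move="up")  # "right"
```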
So the question is, what does autonomous mean and what is it adding?
Sure, the tagging of objects in the field of view in this model may be unnecessary, but you leverage an existing model that should allow the drone to 'think' beyond the current limited "obstruction here". It could at least have been used as a base model to build upon.
Personally, I'm also looking forward to neural networks modeled after real brains... but the tech to accurately scan the complex interconnections in larger brains seems far away.
If my plane is about to crash, do I really care whether it's a mountain or a building?
e.g.: If it's a human ("Bob"): listen for a command. If there's no relevant command, detour & continue to the goal.
If it's a vehicle in the middle of the road, wait for it to move and then go on with your path if there are no other moving vehicles on a collision trajectory with you.
It's definitely a step ahead of what the OP is doing... but isn't it a more practical approach?
The point I'm trying to make is that training an NN to efficiently detect objects has been a solved problem for quite some time now. We should give more attention to experiments that take it a step ahead.
+ Not going through glass (although humans do not always excel at that themselves)
+ Going through a fly door curtain
+ Going through smoke
+ Not going through a mosquito net.
+ Not going through a fountain.
+ Pushing through soft objects.
+ Pushing a half open door.
I think you mean 'whether'.
I think you mean 'move'.
A similar approach using unsupervised learning would be even cooler...
Why did they use an input that does NOT provide any information about depth/distance from objects?
Also, off-the-shelf depth sensors can add a lot of weight to the drone. It might still be possible to fly with the extra weight, but now the drone will be more sluggish and fragile. It would be great if commercial drones had a built-in depth sensor.
Distance sensors such as sonar and proximity sensors are usually very noisy and they are susceptible to interference (if you use more than one).
They clearly didn't need it. Human pilots clearly don't need it.
Extra custom sensors might produce more noise than they are worth.
Estimating depth from images is a hot topic of research in CNNs. The equivalent in this setting would be adding a second camera.
I'm not criticizing, their experiment is pretty cool, I was just wondering why they chose to use only the camera on board.
This is really overstated. It really only matters for about 3 or 4 meters of distance. We do depth perception well enough at distances to drive cars.
We also do just fine at perceiving depth in video games and through a single-lens camera.
How often do you have trouble determining the depth of something in a movie? Only about as often as the filmmakers want you to.
As for radar, most planes don't have their own radar. About all you're gonna get for anything close to depth is an altimeter.
Edit: I guess there are some differences -- I think this is what I was remembering:
...but maybe there was further work that is more closely related.
Luckily, that possibility is too remote even for Hollywood.
I was curious: the article mentions difficulties navigating glass environments. Could they combine visual information with sonar to avoid crashing into glass and other transparent barriers?
> The gap between simulation and real world remains large especially for perception problems.
Also, the cost of a human crashing a plane is a bit more than the cost of a drone crashing itself, so it's probably better to save on planes and invest in simulators - whereas developing an accurate physics simulator for the purpose of training a drone might take more time/money than just letting it crash and figure things out itself.
The amount of time needed to master something is more a combination of deep practice, motivation to master it, and the individual's learning rate.
'The Talent Code' talks about deep practice. Deep practice is the 80 in the 80/20 principle.