
An Unintuitive Take on Data Augmentation for Self-Driving Cars - matthew_cooper
https://towardsdatascience.com/when-conventional-wisdom-fails-revisiting-data-augmentation-for-self-driving-cars-4831998c5509
======
Animats
That's scary. Using deep learning to recognize "obstruction" from camera input.

Discriminate "car", "bike", "pedestrian", and "bus", sure. Recognize signs and
traffic lights, fine. But "obstruction"? No. That's a geometry problem. If
it's not flat road, it's an obstacle. Deep learning is for deciding what kind
of obstacle. Because deep learning just isn't that good. It's going to be
badly wrong a few percent of the time.
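
To make the geometry point concrete, here's a rough sketch of the kind of check
I mean, assuming you already have a point cloud in the vehicle frame (the
function names and thresholds are made up for illustration, not from the
article):

    import numpy as np
    
    def fit_ground_plane(points, iters=200, tol=0.05):
        # RANSAC fit of z = a*x + b*y + c to the road surface
        best_inliers, best_coef = 0, None
        rng = np.random.default_rng(0)
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            A = np.c_[sample[:, :2], np.ones(3)]
            try:
                coef = np.linalg.solve(A, sample[:, 2])
            except np.linalg.LinAlgError:
                continue  # degenerate (collinear) sample, try again
            resid = np.abs(points[:, :2] @ coef[:2] + coef[2] - points[:, 2])
            inliers = (resid < tol).sum()
            if inliers > best_inliers:
                best_inliers, best_coef = inliers, coef
        return best_coef
    
    def obstacle_mask(points, coef, clearance=0.20):
        # Anything sticking more than `clearance` metres above the fitted
        # plane is an obstacle, no matter what a classifier thinks it is.
        height = points[:, 2] - (points[:, :2] @ coef[:2] + coef[2])
        return height > clearance

Then the network only has to answer "what kind of obstacle", which is the part
it's actually good at.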

 _" The car will almost never be on the left side of the road, and the cameras
will never flip angles, so training on flipped data forces the network to
overgeneralize to situations it will never see."_ What could possibly go
wrong?
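
For reference, the augmentation being dropped is roughly this one-liner; a
sketch assuming a steering-angle label for simplicity (segmentation masks or
boxes would be mirrored the same way):

    import numpy as np
    
    def flip_augment(image, steering_angle, flip_prob=0.5):
        # Mirror the scene left/right and negate the label. This is the
        # standard trick the quoted passage says to drop for a car that
        # only ever drives on the right-hand side of the road.
        if np.random.rand() < flip_prob:
            image = np.fliplr(image)
            steering_angle = -steering_angle
        return image, steering_angle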

~~~
function_seven
> _" The car will almost never be on the left side of the road, and the
> cameras will never flip angles, so training on flipped data forces the
> network to overgeneralize to situations it will never see." What could
> possibly go wrong?_

That quote jumped off the page for me as well. This sounds like the start of a
new blog post: _Falsehoods Programmers Believe About Driving_

You can throw a few more in there.

    
    
        - "Pedestrians will use crosswalks"
        - "Pedestrians will be walking"
        - "Traffic signals will always illuminate one of the three lamps"
        - "Traffic signals have three lamps"
        - "Lane markings are either white or yellow"
        - "Lane markings exist"
        - "There are lanes"
    

Your other comment is right on I think. Relying on DNNs to get near-100%
coverage on possible scenarios is a fool's errand. There's a very long tail of
possible circumstances on the road that no amount of training data is going to
cover. When you're trying to classify your photo library, this is okay. When
making real-time decisions on live input, it's not going to be okay. A more
structured and _scrutable_ system is needed.

Until there's a breakthrough in our ability to understand what a NN "thinks",
I don't think we should place too much trust in them.

Or, I guess, we can wait and see if the imperfect NNs are doing better than
humans in the long run. From what I read, that may be true today only in ideal
circumstances (sunny, dry, maintained roads, etc.).

~~~
mikeash
Structured, scrutable systems perform really badly on this task, and I don’t
think anyone knows how to do much better. The current state of the art is an
extremely unstructured and inscrutable system with no design behind it
whatsoever, and that system gets a _lot_ of people killed. The moment anything
can do better than that, we should push it hard even if it has stupid
failures.

~~~
Piskvorrr
With the obvious assumption that we _do_ have something better. So far...nope.

~~~
mikeash
With the assumption that we will, not that we currently do. That’s not a
given, but it seems likely. The bar to clear is not high.

~~~
Piskvorrr
And what was an assumption becomes a circular axiom by the end of the line:
"[it will be easy to do, because] the bar to clear is not high, [because it's
easy]." Do you have any data to support that? (Or, more precisely, are we
talking about the 80/20 Pareto bar: "it is okay, as long as it kills fewer
people per million miles, _on average_"?)

~~~
mikeash
It’s not circular. I’m referring to specific objections people raise, like
misclassifying objects or failing to react or whatever. The current state of
the art often spends several seconds at a time with its cameras pointed at a
screen instead of at the road.

~~~
function_seven
I appreciate you continuing to refer to sacks of meat as "the current state of
the art". It does drive the point home that we don't need to outrun the bear,
just the other camper :)

~~~
Piskvorrr
I suppose that's part of the issue: the SDV camp has been overenthusiastic
about flying their "Mission Accomplished" banners, only to hit (pun not
intended) yet another unexpected problem right afterwards. In other words,
we're not at that point _yet_ - in fact, we might not even have the complete
toolset to _measure_ this.

The field is in flux - alchemical approaches ("what if we try something
unrelated?") might work, for unrelated reasons, etc.; practical applications
that don't spontaneously combust are still some way off.

------
zaroth
Forgive me for hijacking a self-driving ML discussion to ask a novice question on
self-driving — “stateful” vs “stateless”, and how can state be used safely?

Specifically I mean pre-determined saved/downloaded knowledge about the
current route being driven, or your specific instantaneous location. Anything
from annotated maps, to LIDAR scans.

It seems like the paradox is that stateful algorithms can dramatically improve
performance by factoring in data that real drivers benefit from tremendously,
namely, familiarity with the road.

However detecting when known state is violated due to some temporal change
(road work, accident, natural disaster), and being able to shift back into
“general flight” rules, seems in some ways to be even trickier than not using
historical state in the first place.

So you need an algorithm that can learn to drive better than a human, without
having basically any contextual knowledge of the road it’s driving on. In
other words the algorithm needs to be a better driver the first time it ever
goes down a road than a human who has driven the road for years.

As a human, familiarity with the road makes a tremendous difference for how I
drive it. I generally drive the same roads 80% of the time (i.e. commuting)
and knowing what to expect around each bend absolutely changes how I drive the
road, even down to where the speed traps will be.

Any ML driving solution that depends on super-fancy stateful pre-scans of the
environment seems fundamentally flawed. If you can’t safely drive a road that
hasn’t been pre-scanned in hi-def LIDAR, for instance, I don’t suppose you can
safely drive that road on an arbitrary Monday, once reality has drifted from
the scan. Maybe solutions like this were never
even attempted, but certainly some amount of statefulness is inherent in some
of the commercial solutions out there (Supercruise?)

So what kind of state can you use safely? First thought was basics like speed
limit, number of lanes, type of road surface, and your algorithm may have
predictions of upcoming changes being constantly weighted against the current
assessment of the present state.

So maybe the fundamental rule is: no hard-coded state unless it can also be
reliably detected in real time, so that a real-time signal is always able to
override the programmed state?
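
Something like this, purely as a sketch of the rule I'm imagining (none of
this is from a real system):

    def effective_speed_limit(map_limit, observed_limit, observation_conf,
                              override_threshold=0.9):
        # The stored map value is only a prior/default; a live detection
        # (sign, lane count, whatever) that clears a confidence threshold
        # always wins over the programmed state.
        if observed_limit is not None and observation_conf >= override_threshold:
            return observed_limit
        return map_limit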

But then it seems that inevitably you get to the point where your real-time
classifiers are basically running the show anyway, so you’re back to: what
good is a map anyway?

Would love interesting but accessible readings on the subject, or maybe it’s so
off base that it’s not really part of the discussion?

~~~
CamperBob2
The question of statefulness reminds me of some not-immediately-intuitive
observations from game development. Say you're working on potentially-visible-
set computation for a first-person shooter. Where do you spend your
optimization efforts? Do you put a lot of work into taking advantage of object
coherence between frames, for instance? Chances are good that something that's
visible in one frame will remain visible in the next, right? So if you start
with that assumption you might think you could save some time by not bothering
to test any surfaces on that object for visibility until some nominally-
unrelated condition is met, such as the player turning rapidly in place.

You can waste a lot of time thinking about optimizations like that, but at
some point it'll occur to you that it's a giant waste of time to optimize for
anything but the worst case, where visibility information computed during one
frame is completely unusable in the next frame for whatever reason. Otherwise
all that your clever frame-coherence hacks can ever do is speed up the
rendering of the sorts of frames that weren't going to dominate the player's
perception of the game's performance anyway. An engine that renders 95% of
frames at 60 FPS and 5% at 43 FPS is going to look pretty terrible, so you're
usually better off putting work into the slowest frames rather than wasting
time looking for hacks and shortcuts that make the fast frames even faster.

Likewise, yes, you can assume that the car will almost never be traveling
backwards on the left side of the road or whatever, so the temptation to take
advantage of that is going to be high. But the cases where that assumption
breaks down will hurt the user's experience badly, possibly disastrously. So
you're better off not relying too much on assumptions that contain phrases
like "hardly ever" or "most of the time" or "typically."

~~~
Piskvorrr
The problem is the nigh-unbounded problem space: so many variables that there
are myriad corner cases.

~~~
CamperBob2
The AI people would argue that simply following the prime directive ("Don't
hit anything") covers a multitude of driving sins, and they're not wrong in a
technical sense. But the prime directive doesn't cover all of them. Driving
defensively involves much more than just not hitting stuff.

My favorite example is a humorous image that went around several years ago, a
photo that depicted someone driving in a Miata or similar convertible with the
top down. The convertible was driving behind a sewage truck, the kind that has
a large tank with a hose attachment to clean out septic tanks.

The sewage truck, in turn, was heading straight for an overpass that obviously
had nowhere near enough clearance. An alert human driver would have no problem
anticipating what was about to happen, but the oblivious one in the Miata was
clearly about to find out the hard way.

Every time I find myself idly wondering if it would be fun to work on self-
driving cars, I flash back to that image. Then I get back to work on whatever
I'm actually supposed to be doing.

------
Piskvorrr
"and the car will always be on the right side of the road (assuming US driving
laws)." In other words, overfitting to USA, questionably usable anywhere else,
definitely not in UK? Okay, this quote is worth preserving.

------
wcoenen
I find it interesting that the project team had been implementing several
"improvements" that actually worsened performance, and that it took an intern
to figure this out.

Or are these augmentations default-on options in deep learning frameworks?
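
By that I mean the sort of one-liners the common libraries expose, e.g.
(standard API in both, shown just to illustrate how low the barrier is):

    # Keras: one keyword argument enables random mirroring
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    datagen = ImageDataGenerator(horizontal_flip=True)
    
    # torchvision: one added transform, flips half the images by default
    from torchvision import transforms
    aug = transforms.Compose([transforms.RandomHorizontalFlip()])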

~~~
oldgradstudent
You're right to put the "improvements" in scare quotes.

He improved performance in normal conditions, but he might have made
performance outside normal conditions catastrophic.

When you can't afford to kill someone more than once per 100 million miles,
that might not be the right tradeoff.

------
YeGoblynQueenne
>> It’s worth noting that these augmentation tricks won’t work on datasets
that include images from different camera types, at different angles and
scales.

No comment on why this is the case?

------
w_t_payne
It is interesting to compare this result with the results from domain
randomization on synthetic data...

