
A popular self-driving car dataset is missing labels for hundreds of pedestrians - yeldarb
https://blog.roboflow.ai/self-driving-car-dataset-missing-pedestrians/
======
yeldarb
This is really scary. I discovered this because we're working on converting
and re-hosting popular datasets in many common formats for easy use across
models... I first noticed that there were a bunch of completely unlabeled
images.

Upon digging in, I was appalled that fully 1/3 of the images contained errors
or omissions! Some are small (eg a part of a car on the edge of the frame or a
ways in the distance not being labeled) but some are egregious (like the woman
in the crosswalk with a baby stroller).

I think this really calls out the importance of rigorously inspecting any data
you plan to use with your models. Garbage in, garbage out... and self-driving
cars should be treated seriously.

I went ahead and corrected by hand the missing bounding boxes and fixed a
bunch of other errors like phantom annotations and duplicated boxes. There are
still quite a few duplicate boxes (especially around traffic lights) that
would have been tedious to fix manually, but if there's enough demand I'll go
back and clean those as well.
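For reference, those near-duplicate boxes could mostly be caught mechanically. A rough sketch (the 0.9 IoU threshold is an arbitrary choice, and boxes are assumed to be (xmin, ymin, xmax, ymax) tuples):

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def dedupe(boxes, thresh=0.9):
    """Keep a box only if it doesn't almost exactly overlap an already-kept box."""
    kept = []
    for box in boxes:
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept
```

Flagging instead of auto-deleting would be safer in practice, since two genuinely overlapping objects (a crowd, stacked traffic lights) can also exceed the threshold.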

~~~
tomnipotent
> This is really scary.

No, it's not even remotely "really scary". No one is putting an actual self-
driving car on the market using this specific data set. It's disingenuous to
pretend this is any indication of the data used by serious companies in the
space, or representative of the impact a few mislabelled samples have on the
ability of these systems & algorithms to generalize.

~~~
mumblemumble
It's scary when you consider two other factors:

First, the AI hype train. People think that calling something "Artificial
Intelligence" implies that it is artificial, yes, but also, critically, that
it is intelligent. Many enthusiastic people, and also many policymakers, don't
fully realize the extent to which machine learning is constrained by both the
quality and nature of its training data, and the capabilities of the larger,
non-intelligent, software framework that it's being plugged into.

Second, that we have one case study - the post-mortem analysis of the fatal
pedestrian collision in Arizona - that strongly indicates that commercial
products are not free of the sorts of problems being highlighted here, and that,
unlike what others have suggested, misclassification problems aren't
necessarily an issue that's isolated to individual frames and that will come
out in the wash when the software is dealing with a stream of frames.

Me, I think that self-driving cars are probably a lot like nuclear power. In
theory, yes, it is a great idea. In practice, there are a lot of little
details that one _must_ get right, and there seem to be a whole lot of
opportunities for flaky engineering decisions and incompetent public policy,
both enabled by insufficiently-tempered optimism, to scuttle the whole thing.

~~~
ummonk
Given that nuclear power is significantly safer than other forms of power, are
you asserting that the risks of self-driving cars are more about PR and
perception than actual risk?

~~~
mumblemumble
I'm saying that nuclear power _could_ have been a safer energy option, but, in
practice, the whole enterprise has been scuttled by a bunch of regrettably bad
decisions that have pretty much destroyed everyone's trust. So now it doesn't
really matter if it's safer, because it can no longer realistically be
considered an option.

~~~
keenmaster
If everyone believes that everyone else will be irrational, then they
themselves will not throw their support behind nuclear power, rendering their
stance on nuclear de facto irrational. As Baby Boomers age, and
Millennials/GenZ form a greater percentage of the voting population, we have
an opportunity to press the reset button on nuclear. The younger generations
don’t really have a solid opinion on the matter, and probably don’t think
about it much.

Scientists are in the best position to influence the public and push for
change. It is astonishing that scientific organizations have been timid about
nuclear energy, or outright against it, because of a few people with the most
extreme, empirically unjustified stance.

~~~
kozd
Is it a tautology or just an unfortunate Nash equilibrium?

~~~
keenmaster
As you know, a Nash equilibrium is the point at which no player can benefit
from a move away from the equilibrium. However, that's a concept from game
theory. What is the game here? Who are the players?

Part of my point is that you might think that the game is "scientists vs.
politicians, ignoramuses, and fear-mongers," but that's not the case.
Sometimes they're all on the same side. Imagine a 4 player split-screen Star
Fox, where the scientist leaves their controller and helps the others shoot
down their plane because they don't think they can win. If that sounds
baffling, it is.

------
stared
It is a self-correcting problem: these pedestrians won’t be present in the
next dataset.

~~~
sgustard
People should learn not to go outside if they're not labelled.

~~~
netsharc
What a funny future it'd be, if we have to wear something distinctive (giant
QR codes?) so we don't get killed outside. As a side effect it would make
tracking us much easier...

~~~
mumblemumble
It would be less inconvenient than what I already have to do to avoid being
killed by human drivers. Getting to the other side of a street according to
regulation procedure can easily require taking a 10 minute detour. It can be
worse than that in the suburbs.

~~~
mtrower
I suppose this differs depending on location and what your perspective is.
Here, it can be quite a chore for human drivers to avoid killing pedestrians.
They'll just wander out across the crosswalk, nevermind that their walk-light
is red, and proceed leisurely across without a care for the speeding traffic
(that has the right-of-way to begin with).

To say nothing of drunks.

Anecdotally, I do also remember, as a pedestrian, waiting for 10 minutes
or more in subzero temperatures at an un-managed T crossing (crosswalk but no
lights), waiting for an opportunity to cross. Traffic just kept coming and
coming from all directions; this was a frequent problem at the time of day I
needed to cross. I suppose there's opportunity for improvement on both sides.

------
haditab
I acknowledge the issues in the dataset and that it has a lot of stars on
github because it's from Udacity; but calling it 'a popular self-driving car
dataset' is misleading as it implies this dataset is popularly used for self-
driving cars when it is in fact only a small dataset Udacity uses to teach the
basics of training neural networks for self-driving cars.

I've been involved in the autonomous vehicle industry for a while and have
been focused on perception for most of it. Most research papers will test
their models on popular datasets for self-driving cars and show the results as
a sort of benchmark. I've _never_ seen this dataset mentioned anywhere. Heck
the size of the dataset is an order of magnitude smaller than most of the
popular ones as well.

This is just a github repo. That's it.

~~~
thanatropism
Are these larger datasets routinely subject to the same kind of inspection as
this titanic.csv of self-driving car datasets?

~~~
yeldarb
I _hope_ so. I've personally tested Scale's labeling service and it was much
higher quality than this dataset. But it's a pretty secretive industry so I'd
bet some companies' data is better than others.

It'd be interesting if the NHTSA had a held-back "test set" they used to
evaluate self driving cars before letting them on the road.

------
Animats
These are manually tagged stills, right? Not video? That's a data set for
training CAPTCHA breakers, not self-driving. You need to use video, where you
get to see the same objects at different ranges. Recognition gets better as
you get closer. Then track the recognized objects backwards to when they first
appear, and try to recognize them at smaller sizes.

~~~
myself248
I'm tangential to the field, but not directly in it myself, and I tend to
agree. The goal of these systems should be to "perceive" their environment,
not just to "recognize" it.

Perception means understanding that just because a truck (that we recognized 3
frames ago) went behind a tree, doesn't mean it ceased being a truck.
Furthermore, this knowledge should be used to refine the model, to say "hey I
can still see the wheels, I know those wheels were attached to that truck a
moment ago, therefore I still know where the truck is, even if I can't
recognize it plainly as such right now".

Furthermore, even if it's completely out of view, it's still there, probably
moving close to the same speed and track it was. And if its path intersects
ours, we need to assume that it'll reappear at some point. And the longer it's
out of view, the bigger are the errors on its estimated position, according to
our knowledge of the acceleration and braking limits of trucks of that type.
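A crude sketch of that growing-error idea: dead-reckon from the last observed position and velocity, and let the uncertainty grow with a worst-case acceleration bound (all the numbers and names here are made up for illustration; a real tracker would be a proper Kalman-style filter):

```python
def predict_occluded(pos, vel, frames_hidden, base_sigma=0.5,
                     accel_limit=3.0, dt=0.1):
    """Dead-reckon an occluded object's position.

    pos, vel: (x, y) in meters and meters/second at last sighting.
    Uncertainty grows with time hidden, bounded by how hard the
    object could plausibly have accelerated or braked (0.5 * a * t^2).
    Returns (estimated_position, position_sigma_in_meters).
    """
    t = frames_hidden * dt
    est = (pos[0] + vel[0] * t, pos[1] + vel[1] * t)
    sigma = base_sigma + 0.5 * accel_limit * t * t
    return est, sigma
```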

I've never heard of anyone even working towards this sort of perception, much
less having achieved it. And until we get there, these things are all toys.
Dangerous, legally nebulous toys.

~~~
lahwran
you're correct that this sort of persistent world modeling is needed for self
driving cars, but from what I've heard from friends who work in the industry,
both cruise and waymo have it. they're very far from using a plain CNN on
their video cameras, they've got depth mapping and such and carefully
constructed software making use of the perception data to model how the world
will change and react to that. idk if it works well, but they definitely know
they need it and are trying.

that said, I've driven a tesla on autopilot, and holy crap, it was so
incredibly bad. I'm optimistic about self driving cars in general, but not
about tesla's. it will frequently lose track of the road lines at night and
fail to make turns, suddenly beeping at you that you're in control now, with
no warning! I only ever used it like cruise control, but I can't understand
how anyone driving a tesla would dare use the tricks that allow bypassing the
restrictions that prevent taking your hands off the wheel.

------
Fricken
Well, if an autonomous vehicle outfit were running unmonitored Level 4
vehicles on public roads using only an open source data set I'd be worried.
Even if it was labelled thoroughly and correctly, there isn't nearly enough
data in any open source dataset to train an autonomous vehicle perception
system that can operate safely without human supervision. This is not a safety
critical issue.

~~~
cmiles74
The Uber vehicle that ended up killing the pedestrian in Arizona was running
under conditions similar to the ones that you outline here. The only notable
difference was that a person was monitoring the vehicle.

~~~
dylan604
The only notable difference was that a person was _present and supposed to be_
monitoring the vehicle.

Fixed that for you

~~~
ClumsyPilot
And it will happen again. There have been cases of people forgetting nuclear
weapons on a runway for a whole day by accident. People, eventually, will fuck
up. Mislabelled datasets will be used.

------
danbrooks
I work in the AV space. There's a lot of ambiguity in labeling.

How should a crowd of people be annotated? A line of parked cars? A photograph
of a car?

Examine the training set!

~~~
PeterisP
A big thing is not that the example is missing, but that it counts as a
negative example.

I.e. if during training an ML system notices the ambiguous combination (i.e.
a woman pushing a baby stroller, or a crowd) and marks it as a pedestrian,
then it gets penalized in a manner that teaches it to ignore these ambiguous
combinations and treat them as nothing; while in practice it should probably
treat such ambiguous combinations as _even more_ "avoid-worthy" than an
ordinary pedestrian.

The problem is that the default assumption is "clear road, you can drive
there" - so what we need isn't "pedestrian detection" that finds pedestrians
and only pedestrians, we need detection of random stuff that you shouldn't
drive over. If a kid is wearing a weird Halloween costume, that doesn't look
like a pedestrian, but it is one; If somebody has set up a tent in the middle
of a supermarket parking lot, that's not a pedestrian but it should be avoided
just like one.

~~~
myself248
Likewise, I have yet to see one of these things that can recognize potholes
and swerve to avoid them lest a wheel be ripped off.

Sooner or later it'd be nice to be able to drive one of these things in a
place that isn't southern California.

~~~
jacquesm
That's a feature, not a bug. Swerving for potholes can be very dangerous, more
dangerous than having undercarriage damage. If a pothole surprises you enough
that you have to swerve you were either not paying attention to the road or
you are following too close.

~~~
PeterisP
Regardless of what's the appropriate action to take given the context
(ignoring, swerving, slowing down, a timely proper change of lane) it's
probably not controversial that potholes should be identified by a car vision
system and taken into account. And from a computer vision perspective there's
no qualitative difference between "just" a deep pothole and a lane-wide ten
foot deep sinkhole or a construction pit that's unmarked for some reason, it's
just a matter of size.

------
sheepstrat
My knee-jerk is to be upset because the risk factor of a poorly functioning
pedestrian classifier is obviously higher than something like a sentiment
analyzer, but at the same time, this dataset is just for educational purposes,
right? Is Udacity actively recommending people use this in production
settings?

~~~
yeldarb
They claim[1] they're working on building an "open source self driving car"
but it looks like the project hasn't had much activity recently.

[1] [https://github.com/udacity/self-driving-car](https://github.com/udacity/self-driving-car)

------
a_t48
Hold on a second here - are unlabeled pixels used in training a NN to do
detection? Will a typical NN get trained to label those pixels as “not a
human”? I agree that they should be labeled, but it’s the difference between
needing to throw more data at the problem (because you aren’t getting as much
learning per image as you could) and actively training the car to do something
bad.

~~~
yeldarb
Yes, if you don’t “punish” incorrect predictions in your loss function your
neural net could just get “perfect” accuracy by putting a giant bounding box
around the entire image.

Technically that’s “right”; it did put a box around all the obstacles just
like you asked it to. But that “solution” is not useful. You want it to find
what it’s looking for and _only_ what it’s looking for.

In this case, if it detects an unlabeled pedestrian the loss function will
penalize it a bit for that “wrong” answer and it will slightly deviate to try
to not find _that_ pedestrian but still find the correctly labeled examples.
It’s trying to fit the examples you give it as well as possible.
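A toy illustration of that effect (everything here is made up for the sketch; real detection losses match predictions to ground truth by IoU, not exact equality, and weight localization and confidence separately):

```python
def toy_detection_loss(predictions, ground_truth):
    """Count misses (unmatched ground truth) plus false positives
    (unmatched predictions). Exact-equality matching keeps the sketch
    short; real losses match boxes by IoU."""
    misses = sum(1 for gt in ground_truth if gt not in predictions)
    false_positives = sum(1 for p in predictions if p not in ground_truth)
    return misses + false_positives

# A pedestrian missing from the labels: the model correctly finds her,
# but the loss treats that detection as one false positive.
labels = [(10, 10, 20, 40)]                    # the stroller is absent from labels
preds = [(10, 10, 20, 40), (50, 15, 58, 42)]   # model detects both people
```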

------
Der_Einzige
Until self driving is possible with purely unsupervised data, self driving
cars and other algorithms relying on massive datasets are unlikely to get
really robust, because label inaccuracies like this are inevitable.

Also, far worse has been found in large scale datasets. Pretty sure there was
CP found somewhere in ImageNet.

------
Karliss
Isn't training against a subset of the data and then validating against the
rest a common practice? It wouldn't detect all the mislabeling, but should
detect some, indicating that manual inspection is required, assuming the
error isn't very systematic.
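e.g. something as simple as this (a sketch using only the standard library; real pipelines typically use a helper like scikit-learn's train_test_split):

```python
import random

def split(samples, val_fraction=0.2, seed=42):
    """Shuffle and split samples into train/validation subsets.

    Fixing the seed keeps the split reproducible so validation
    metrics are comparable across runs.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]
```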

~~~
yeldarb
It is, and there are some interesting techniques published recently to help
mitigate things like this. But if you don't have a good ground truth you're at
the very least flying blind and at worst feeding garbage in and getting
garbage out; your models will learn what you tell them to learn.

------
shoeffner
While I am happy about their efforts, it's interesting that in the bottom left
image of their example I can clearly see another unlabeled car in the lower
left half (standing at the sidewalk of the street). Also, I am not sure, but
it seems like there's a cyclist on the sidewalk, visible between the stroller
and the car (the wheel and hands are more clearly visible). The google image
marks a car and a few pedestrians, but completely misses the traffic lights at
the junction.

So I guess even their fixed dataset still misses many labels, if already their
showcases miss some.

~~~
yeldarb
Hey, OP here, yeah you’re correct. The dataset doesn’t label any obstacles
that small/far in the distance. I zoomed in on the region with errors for the
sake of the screenshot.

Here’s the original run through Google Vision AI. They actually don’t get the
pedestrian either: [https://imgur.com/a/84IVTV6](https://imgur.com/a/84IVTV6)

(I fired up the labeling tool I use and grabbed a recording of the few seconds
of video around that frame to give an idea of what’s labeled in the dataset
and what’s not at that imgur link as well)

~~~
shoeffner
Nice, thank you for the upload and thank you for clarifying that small objects
are not labeled, that explains it. I was surprised because other images do
contain rather small labels for traffic lights or even cars, but I guess it's
always in the eyes of the person who labels the data.

I think you did an amazing amount of work and made huge improvements over the
original. Have you considered contributing the changes back upstream?

~~~
yeldarb
I plan to. They use a custom CSV format that my labeling tool can't work with
so I converted everything to VOC XML. I need to write a script to convert back
to their format to submit a PR.

Not sure if they'll accept the PR though; the original data had a
"visualization link" back to the labeling company on each line which I can't
reproduce.
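The conversion script itself is straightforward; something like this sketch (the output column order here is a guess — I'd match it to their actual CSV header before submitting):

```python
import csv
import xml.etree.ElementTree as ET

def voc_to_rows(xml_path):
    """Extract (filename, xmin, ymin, xmax, ymax, label) rows from one
    PASCAL VOC XML annotation file. Column order is a placeholder --
    match it to the target CSV's real layout."""
    root = ET.parse(xml_path).getroot()
    fname = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((
            fname,
            int(float(box.findtext("xmin"))),
            int(float(box.findtext("ymin"))),
            int(float(box.findtext("xmax"))),
            int(float(box.findtext("ymax"))),
            obj.findtext("name"),
        ))
    return rows

def write_csv(rows, out_path):
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```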

------
djsumdog
So, do they claim to be human managed datasets where people drew all the
object bounds?

I wonder if they used their own classification AI on sample sets and just
called it a day. AI blind leading the AI blind?

~~~
yeldarb
They claim it was ("The dataset was annotated entirely by humans using Autti"
via [https://github.com/udacity/self-driving-car/tree/master/anno...](https://github.com/udacity/self-driving-car/tree/master/annotations)).

But after looking at the data I'm almost certain there was some "tool assist"
going on. There were dozens of frames in a row with phantom bounding boxes in
the exact same location which made it look in some way automated (maybe user
error combined with a "copy bounding boxes from the previous frame" feature?)
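Those runs of identical boxes could be flagged automatically; a sketch (the min_run cutoff is an arbitrary choice, and frames are assumed to be consecutive lists of box tuples):

```python
def flag_phantom_boxes(frames, min_run=10):
    """Flag boxes that sit at exactly the same coordinates for at least
    min_run consecutive frames -- a likely sign of tool-assisted
    copy-paste rather than a genuinely stationary, pixel-identical label.

    frames: list of box-lists, one per consecutive video frame.
    """
    runs = {}          # box -> current consecutive-frame count
    suspects = set()
    for boxes in frames:
        present = set(boxes)
        # Boxes absent from this frame drop out, breaking their run.
        runs = {b: runs.get(b, 0) + 1 for b in present}
        suspects.update(b for b, n in runs.items() if n >= min_run)
    return suspects
```

Stationary objects like traffic lights would also trip this, so it's a triage filter for manual review, not an auto-delete.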

------
marta_morena
It's a bit like saying "We found this store and it doesn't label kitchen
knives as potentially dangerous"... Nobody in their right mind would use this
data set for developing a self-driving car, and if one does, then they will
not get a license to operate... And if by any chance they do, it's still like
stabbing other people with a knife snatched from the shelf of a supermarket.
You just don't do it.

------
joshvm
This is the same for MS COCO:
[https://github.com/AlexeyAB/darknet/issues/4085](https://github.com/AlexeyAB/darknet/issues/4085)

Lots of datasets are semi-automatically generated, with human oversight.
Sometimes annotators miss things or are just plain lazy.

------
FanaHOVA
What's the dataset size? If there's billions of pedestrians tagged and a
couple hundreds are missing, would it actually have a big impact on the
training? Also looks like a lot of the people in those images are off the
road. What's the current standard in AV? Is everything tagged on the sidewalk?
(Genuine questions)

~~~
zentiggr
If they are reporting a 33% image error rate, I would expect a large effect on
accuracy no matter what the individual annotation error rates were.

I am unfamiliar with the detailed tagging standards but that also seems
irrelevant until the more egregious problems are resolved. Get everything
tagged first, then look for smaller scoped issues. And thanks Roboflow for
doing whatever amount of this.

------
sneak
> _REDACTED accelerates your computer vision workflow through automated
> annotation quality assurance, universal annotation format conversion (like
> PASCAL VOC XML to COCO JSON), team sharing and versioning, and exports
> directly to file format, like TFRecords._

This is an ad, posing as a sensationalistic blog post.

------
lolc
This just shows how clueless AI is still. Imagine your driving instructor
insisted there was no person with a stroller on that corner. You'd dump them
on the spot!

Only when the machines start to complain about being fed shitty data can we
talk about them being fit to drive.

------
gdsdfe
Do they really think people are using these datasets for commercial
applications?

~~~
ClumsyPilot
Do you really think there is any possible fuckup that won't happen sooner or
later?

Will the staff of a nuclear silo forget to lock the door, and then fall
asleep? Because that happened in the US.

Do you really think someone will put a plane in production with a single
safety critical sensor with no backup or fall back?

Do you really think someone will pour dissolved uranium down the drain,
starting a nuclear reaction and dying horribly?

Do you really think someone will crash a spacecraft into Mars by mixing up
imperial and metric units?

~~~
gdsdfe
None of these is similar! An off-the-shelf dataset won't help you achieve
anything commercially; it's for learning or tinkering at best.

------
trhway
amplification of the old complex/neurosis - not only do people not notice me,
now robots too!

    
    
      Cellophane
      Mister Cellophane
      Shoulda Been My Name
      Mister Cellophane
      'Cause You Can Look Right Through Me
      Walk Right By Me
      And Never Know I'm There...

[https://www.youtube.com/watch?v=WKHzTtr_lNk](https://www.youtube.com/watch?v=WKHzTtr_lNk)

------
allovernow
This seems like the kind of problem that is suitable for and important enough
to throw hundreds or thousands of (volunteer?) people at. Imagine assembling a
massive public corpus of human verified training data - we'd collectively be
that much closer to a technology which will change society! This problem is
screaming for a consortium effort on behalf of major corporate entities in the
space, who could each benefit without revealing internal secrets.

~~~
kevingadd
If you've ever had recaptcha ask you to identify stop signs, traffic lights,
buses etc you're helping improve Waymo's data set for free :) Maybe it'll be
shared with the public instead of used for profit in a decade or two.

