A popular self-driving car dataset is missing labels for hundreds of pedestrians (roboflow.ai)
411 points by yeldarb 8 days ago | 191 comments

This is really scary. I discovered this because we're working on converting popular datasets into many common formats and re-hosting them for easy use across models... I first noticed that there were a bunch of completely unlabeled images.
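The post doesn't say which formats are involved. As an illustration only, here is a minimal sketch assuming Pascal VOC-style pixel corners `(x_min, y_min, x_max, y_max)` and YOLO-style normalized `(x_center, y_center, width, height)`. The function names are hypothetical. A sanity check during conversion is one cheap place to catch degenerate or out-of-frame boxes:

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Pixel corners (x_min, y_min, x_max, y_max) -> normalized
    (x_center, y_center, width, height), YOLO-style."""
    x_min, y_min, x_max, y_max = box
    # Reject degenerate or out-of-frame boxes instead of silently converting them.
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError(f"box {box} is empty or outside the {img_w}x{img_h} image")
    w, h = x_max - x_min, y_max - y_min
    return ((x_min + w / 2) / img_w, (y_min + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_xyxy(box, img_w, img_h):
    """Inverse of xyxy_to_yolo: normalized center format -> pixel corners."""
    xc, yc, w, h = box
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)
```

Round-tripping every box through both functions and comparing the result is a quick way to confirm a converter didn't shift or flip anything.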

Upon digging in, I was appalled that fully 1/3 of the images contained errors or omissions! Some are small (eg a part of a car on the edge of the frame or a ways in the distance not being labeled) but some are egregious (like the woman in the crosswalk with a baby stroller).

I think this really calls out the importance of rigorously inspecting any data you plan to use with your models. Garbage in, garbage out... and self-driving cars should be treated seriously.

I went ahead and corrected by hand the missing bounding boxes and fixed a bunch of other errors like phantom annotations and duplicated boxes. There are still quite a few duplicate boxes (especially around traffic lights) that would have been tedious to fix manually, but if there's enough demand I'll go back and clean those as well.
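Duplicate boxes like the ones around traffic lights can usually be flagged automatically before manual review. A minimal sketch, assuming boxes are pixel corners `(x_min, y_min, x_max, y_max)` and that two same-label boxes with intersection-over-union (IoU) at or above a threshold are duplicates. The 0.9 threshold and the label strings are hypothetical choices, not something from the post:

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_duplicates(annotations: List[Tuple[str, Box]], threshold: float = 0.9):
    """Return index pairs of same-label boxes that overlap almost completely."""
    pairs = []
    for (i, (la, ba)), (j, (lb, bb)) in combinations(enumerate(annotations), 2):
        if la == lb and iou(ba, bb) >= threshold:
            pairs.append((i, j))
    return pairs
```

The pairwise loop is quadratic in the number of boxes per image, which is fine for typical driving frames; flagged pairs still deserve a human look, since two overlapping traffic lights can be legitimate.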

> This is really scary.

No, it's not even remotely "really scary". No one is putting an actual self-driving car on the market using this specific data set. Disingenuous to pretend this is any indication of the data used by serious companies in the space or is representative of the impact a few mislabelled samples have on the ability of these systems & algorithms to generalize.

> No one is putting an actual self-driving car on the market using this specific data set. Disingenuous to pretend this is any indication of the data used by serious companies in the space

Are you sure about that? What about "non-serious" companies? Various fly-by-night self-driving startups?

I mean, this sounds like the machine learning equivalent of "no serious business is pulling random bits of code from StackOverflow or whatever tutorial popped up first in the search results", and yet I think everyone who has worked with software for more than a few years can confirm that companies absolutely do it, and a lot.

(Tangentially, PHP got a bad rap in large part because of this - people not knowing how to do things correctly copying code from tutorials written by people who didn't know it either; these days, JS ecosystem is getting a bad rap for having such bad code wrapped in neat and easy-to-install NPM packages.)

I've mentioned to a few people now that the first generation of self-driving cars doesn't scare me that much. It will be weird, but I assume that special care will be taken and the cars themselves will be quite risk-averse. I'm scared about the next generation of self-driving cars when we have "solved it" and the race is on to get cars out quickly and at low cost (and things like in-depth testing are out of budget). Hopefully regulations and testing frameworks catch up by then, but who knows.

> I assume that special care will be taken and the cars themselves will be quite risk-averse.

Bold assumption that's already got a counterpoint: https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg

Wikipedia already has a list: https://en.wikipedia.org/wiki/List_of_self-driving_car_fatal...

Along with your pedestrian death there were five (!!!) driver deaths.

There's a reason why places like Germany don't want Tesla to use the term 'autopilot' because it's reckless endangerment.

>There's a reason why places like Germany don't want Tesla to use the term 'autopilot' because it's reckless endangerment.

If Tesla weren't a US company I'm sure they'd have already been sued for billions for labeling their assisted driving system autopilot.

The US is the country where you can pay millions for not having a sticker on the microwave showing that you shouldn't dry your cat in there.

IMO to be "scared" of self driving cars they need to be more dangerous than any other random car on the street today, and the bar for that is pretty low.

Nope, with other cars you know the factors that increase your risks (fog, drunk driving, distractions). You can make decisions, like not getting into a car with a drunk friend.

With machine learning you never know when it might mistake the back of a semi for an overpass or something.

Also, not to fear monger too much, but if this ends up happening there will inevitably be a period where people still assume that self driving cars are foolproof, and as a result, a lot of pedestrians/other drivers will get blamed for incidents that weren't in fact their fault.

Thank you for pointing this out. Every time road accident statistics are shown as an argument for the safety of self-driving cars or airlines, I wonder what the risk is conditional on me being in the vehicle. In the case of road accidents, my risk is surely substantially lower than the population statistics suggest, since I am much more prudent than average. However, the risk remains the same in the case of airlines or self-driving cars, since I am not in control.

No pedestrian makes the decision to get hit by a drunk driver, which is the more relevant situation in the context of this post, I think.

> Are you sure about that? What about "non-serious" companies? Various fly-by-night self-driving startups?

Fly-by-night self-driving startups will at some point run into issues with the NHTSA. George Hotz's self-driving car project was shut down as soon as he announced that he'd start selling some prototype.

States require permits just to be allowed to test self-driving prototypes on the open road. I don't know what the requirements are, but they're probably subject to some kind of review that's more than just filling in a form.

Given HN's general suspicion of government regulation, especially its quality and accuracy, I'm surprised to see people argue here that it's sufficient to counter the "move fast and break things" mentality. Especially given that regulatory arbitrage (moving from CA to AZ) was involved in Uber's vehicular manslaughter.

They haven't shut down, they are selling their prototype and I think they have recently put out a new model.

It doesn't look like it was shut down? http://comma.ai

> In the days that followed, Hotz abandoned his plans to sell Comma One directly. Instead, he embarked on a regulatory workaround: this current path of selling just the hardware and open-sourcing the code behind it, meaning it is available for free. He says often, “We are not selling a consumer product.”

No one stopped Uber, no one has stopped Tesla (Tesla hasn't been able to deal with regression testing on a lane finding system...)

> Various fly-by-night self-driving startups?

What is the alternative other than holding these companies responsible for the deaths they might cause? The only solution I see is mandating much higher insurance requirements than for a human driver, for example 25M per death they might cause.

Perhaps reasonable regulations involving extensive pre-street testing and solid public documentation of ML design decisions and training data sources before allowing use in uncontrolled environments? Also, criminal liability during initial deployment?

Of course, real responsibility is probably unfashionable...

(edit: I'm personally in high-stakes time-sensitive diagnostic ML development, so...)

> these days, JS ecosystem is getting a bad rap for having such bad code wrapped in neat and easy-to-install NPM packages.)

Is it? IME nobody seems to give a flying fuck outside of HN. Much like the PHP days. Or maybe I've never been in the right workplaces to notice.

I work in Ruby on Rails (currently with a React frontend, but previous jobs used Angular) in the Midwest. IME almost everything around the "recent" JS ecosystem (npm, npx, node, yarn, grunt, react, jsx, etc) is used (begrudgingly) when necessary, but is the butt of jokes 99% of the time.

It's scary when you consider two other factors:

First, the AI hype train. People think that calling something "Artificial Intelligence" implies that it is artificial, yes, but also, critically, that it is intelligent. Many enthusiastic people, and also many policymakers, don't fully realize the extent to which machine learning is constrained by both the quality and nature of its training data, and the capabilities of the larger, non-intelligent, software framework that it's being plugged into.

Second, that we have one case study - the post-mortem analysis of the fatal pedestrian collision in Arizona - that strongly indicates that commercial products are not free of the sorts of problems being highlighted here, and that, contrary to what others have suggested, misclassification problems aren't necessarily isolated to individual frames and won't necessarily come out in the wash when the software is dealing with a stream of frames.

Me, I think that self-driving cars are probably a lot like nuclear power. In theory, yes, it is a great idea. In practice, there are a lot of little details that one must get right, and there seem to be a whole lot of opportunities for flaky engineering decisions and incompetent public policy, both enabled by insufficiently-tempered optimism, to scuttle the whole thing.

> First, the AI hype train.

Yes! I really wish we would call this something more descriptive of the boring statistics involved. Artificial Intelligence is just too cool-sounding a name not to get excited about. And what people don’t realize is that this is really just a statistical inference model, nothing intelligent. Machine Learning is better, but it is still not descriptive enough.

I think if we called it something like Computational Reinforcement Modelling or Iterative Weight Inference, people might stop imagining that machines are making “smart” choices and finally see that we are really just inferring based on some computations over provided data.

Really, “Machine Learning” deserves no more hype than the boring sounding Kalman Filter.

...I like the Kalman Filter even if I might not be quite up to programming one. Every time I drive a modern car with a shitty automatic transmission that glitches and stutters and fails to shift right, I think "Is this thing using a Kalman filter, or what fundamentally wrong thing was done with the programming here?" I wish someone would reveal to me whether the algorithm is not the panacea I wish it were, or else what a lousy programmer does when they don't know about it.
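For anyone in the same boat, the core of a Kalman filter is shorter than its reputation suggests. A minimal scalar sketch, assuming the simplest possible model (a constant value observed through noisy measurements, with a small random-walk process noise); the parameter names are illustrative, and this says nothing about how any real transmission controller works:

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state (random walk) model.

    x is the state estimate, p its variance; returns the estimate after
    each measurement."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty grows by process noise.
        p = p + process_var
        # Update: the Kalman gain k weighs the measurement against the prediction.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```

With a small `process_var` the filter trusts its running estimate more as evidence accumulates; with a large one it tracks each new measurement closely. Most "fundamentally wrong" behavior in practice comes from a bad model or badly tuned noise terms rather than from the update equations themselves.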

Given that nuclear power is significantly safer than other forms of power, are you asserting that the risks of self-driving cars are more about PR and perception than actual risk?

I'm saying that nuclear power could have been a safer energy option, but, in practice, the whole enterprise has been scuttled by a bunch of regrettably bad decisions that have pretty much destroyed everyone's trust. So now it doesn't really matter if it's safer, because it can no longer realistically be considered an option.

If everyone believes that everyone else will be irrational, then they themselves will not throw their support behind nuclear power, rendering their stance on nuclear de facto irrational. As Baby Boomers age, and Millennials/GenZ form a greater percentage of the voting population, we have an opportunity to press the reset button on nuclear. The younger generations don’t really have a solid opinion on the matter, and probably don’t think about it much.

Scientists are in the best position to influence the public and push for change. It is astonishing that scientific organizations have been timid about nuclear energy, or outright against it, because of a few people with the most extreme, empirically unjustified stance.

Is it a tautology or just an unfortunate Nash equilibrium?

As you know, a Nash equilibrium is the point at which no player can benefit from a move away from the equilibrium. However, that's a concept from game theory. What is the game here? Who are the players?

Part of my point is that you might think that the game is "scientists vs. politicians, ignoramuses, and fear-mongers," but that's not the case. Sometimes they're all on the same side. Imagine a 4 player split-screen Star Fox, where the scientist leaves their controller and helps the others shoot down their plane because they don't think they can win. If that sounds baffling, it is.
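The definition above translates directly into a check: a strategy pair is a pure-strategy Nash equilibrium if neither player gains by deviating alone. A minimal sketch for two-player games; the stag hunt payoffs used below are the usual textbook illustration of an "unfortunate" equilibrium, not a model of the nuclear debate:

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """payoff_a[i][j], payoff_b[i][j]: payoffs to row player A and column
    player B when A plays i and B plays j. Returns (i, j) pairs where
    neither player can gain by a unilateral deviation."""
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        a_cannot_improve = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
        b_cannot_improve = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(cols))
        if a_cannot_improve and b_cannot_improve:
            equilibria.append((i, j))
    return equilibria


# Stag hunt: strategy 0 = cooperate (hunt stag), 1 = play it safe (hunt hare).
stag_a = [[4, 0], [3, 3]]
stag_b = [[4, 3], [0, 3]]
```

In the stag hunt both `(0, 0)` and `(1, 1)` are equilibria: everyone cooperating is better for all, yet if each player expects the others to play it safe, playing it safe is also stable. That is one formal version of "if everyone believes everyone else will be irrational".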

Check this concept car from 1957: https://en.m.wikipedia.org/wiki/Ford_Nucleon

We could have had your safe nuclear power involved in every road accident. Yeah!

I think the point is rather that it's an unpredictable-stakes game. That is, if you make few enough mistakes, you'll be fine for a long time until you make the wrong mistake at the wrong time and it kills somebody.

Maybe a better analogy is driving a car, that is, nuclear power and self-driving cars are both vaguely like driving a car.

However, my intuition is that nuclear power has fewer variables than driving a car. I say intuition because I don't know much about either.

I think the point is that self-driving cars trained on poor data are no better than poorly trained superhumans at driving.

"However, my intuition is that nuclear power has fewer variables than driving a car."

During the cold war they launched fission reactors into orbit. That probably had more variables. Then again, since the nuclear airplane project was unsuccessful, maybe that had the most variables of all.

Well, nuclear power isn't harder because you put it in a car, but it's a great way to add to the complexity of the system as a whole.

Putting a nuclear reactor in space (as we still do, I think the most recent Mars rover had one) greatly reduces the risk of it directly causing deaths, since it's really far away from humans for most of its service life.

As long as self-driving vehicles are small passenger vehicles, they are not able to kill more than a few people.

If you look at the Uber accident, killing 1 person severely harmed the company.

The only extremely harmful scenario that I can imagine in self driving is a bad remote mass software update (which is sadly possible even in an otherwise great company) causing lots of accidents at the same time in different vehicles.

I'm not sure what point you're trying to make, but it has nothing to do with some mislabeled examples. Every system using supervised training assets has labeling issues, and the world hasn't ended yet. Take a look at Google Translate. This is only "scary" if you're ignorant of the problem.

The OP is selling something, and it's in their best interest to spread FUD to sell it.

That's probably the least charitable interpretation possible. No idea who the OP is, in any case.

It's scary that the only way we know how to build something that detects pedestrians in an image with any kind of reliability is to use training data.

You: All training data has labeling issues. Me: Training data is the only way we know how to build some aspects of systems. Other people here: Some of these systems are safety-critical.

It doesn't feel like FUD to be concerned that we need strategies for mitigating quality issues in training data. If you don't have anything like that, then you cannot be in the business of building a safety-critical system. As you have identified, labeling issues are always there, kind of like how human error is always there. Okay, so now that you believe that, you have to find ways to mitigate it.

But when I hang around people who do self-driving cars, they don't have a good answer to how to mitigate this risk. Some of them don't even really believe in it (which is different than being ignorant of the problem), because they argue with enough data things will get good enough. It's all really sloppy.

Instead of just saying "This is FUD", why not tell us why you feel so reassured that this is not a problem? So far, you said "this always has been a problem and always will be a problem", "the world hasn't ended yet" and "Google translate". What?

re: The world hasn't ended yet: okay, but we also haven't been building self-driving cars

> No idea who the OP is, in any case.

The person that originally posted this thread that works for the company that's selling this FUD. At the bottom of the blog post:

"Roboflow accelerates your computer vision workflow through automated annotation quality assurance"

This is a non-issue exacerbated by a for-profit corporation that's creating an "issue" they conveniently have a service to help you fix.

I think you missed the key part of the post you're replying to:

> tell us why you feel so reassured that this is not a problem

Because to me you're coming across as the old-school "you don't need QA if your code is perfect" type.

Nobody gets killed by Google Translate etc. messing up, so no sweat, throw together something that works 99% of the time.

But for safety critical systems it's six nines reliability or GTFO.
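The gap between "99% of the time" and "six nines" is easy to underestimate, so here is the arithmetic as a tiny sketch. It assumes independent trials; what counts as one "trial" (a frame, a decision, a mile) is left open and is itself a big assumption:

```python
def expected_failures(nines: int, trials: int) -> float:
    """Expected failures over `trials` independent attempts at a reliability
    of `nines` nines (2 -> 99%, 6 -> 99.9999%)."""
    failure_rate = 10.0 ** -nines
    return failure_rate * trials
```

Over a million trials, two nines means about 10,000 failures; six nines means about one.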

> Nobody gets killed by Google Translate etc. messing up, so no sweat

It's not so much that "nobody gets killed by Google translate messing up" as "when people get killed by Google Translate messing up, I can't tell".

I think Google Maps is more relevant to self driving. Google Maps has become, in my opinion, the best way to navigate, far superior to any other GPS-enabled option I know of. But the better it gets, the more obvious it is that you can't rely on it totally. If self-driving was feasible tomorrow or in the next decade, navigation would be much better. And if anyone could do it, Google would.

Think of the trope about idiots driving into lakes or off the road or whatever because they were following navigation. And now think about unconditionally trusting Google Maps or anything else, for say, a year. You know it's not up to that standard.

I think you can debate whether it needs to be an order of magnitude better, or multiple orders of magnitude better.

Again, how does that have anything to do with a public data set? The amount of pure BS FUD in this topic is astounding.

> People think that calling something "Artificial Intelligence" implies that it is artificial, yes, but also, critically that it is intelligent.

I mean, the exact same argument could be made about people. Just because you belong to Homo Sapiens doesn't imply you're an intelligent being at all times - e.g. drink a bottle of vodka and the sapiens part is gone.

The thing with people is that we all share the same brain architecture - we mostly think alike. As a species, we have a hundred thousand plus years of experience dealing with each other; as a civilization, a couple thousand. We've explored most corner cases, designed our infrastructure around those, and built systems protecting us from outliers. For instance, people deemed too unpredictable will not be given driving licenses (and extremely unpredictable people tend to get locked up). And DUI is severely punished; while the execution isn't perfect, the threat of consequences goes a long way towards reducing it.

That's the bar an AI system has to meet today. And one big hurdle is that, viewed as minds, AI systems are completely unlike our own. They're extremely unpredictable. On top of that, they lack self-preservation instinct (this may be indirectly a side effect of how we structure companies - the level of effort put into an AI system is a reflection of how much people working on it care about getting it right vs. getting it to market).

I agree entirely with your point, and wanted to follow a small tangent:

> they lack self-preservation instinct

I suspect that a self-preservation instinct, even in relatively dumb animals, is enormously complex. It might seem simple to us because it's so evolved, so hardwired. If I had to bet, I'd bet that a mouse's self-preservation instinct is more complex than the first-to-market Level 5 self-driving car will be.

And that's good, because I definitely don't want automated vehicles to have more than a hint of self-preservation wiring.

That's a good point, and yes, I definitely don't want an automated vehicle to have self-preservation instinct. What we're all after is people-preservation instinct, which is not entirely unrelated to the former.

But the broader reason I brought up self-preservation is that in humans, it's not just about not getting killed in a crash. It's also about not having your life ruined by causing it, even if you walk away physically unscathed. A person has personal, deeply visceral stake in driving safely. A self-driving car AI currently doesn't, because it's not a mind; what it has is the reflection of priorities in the company that made it. And as it turns out, companies can kill people and get away with it, so I don't feel confident about how safety is prioritized (see also: this article).

> what it has is the reflection of priorities in the company that made it

I think it might even be more like Platonic shadows on a firelit cave wall of the priorities of the company that made it.

At the current level that self-driving is at, the cars' control software is only able to perceive and respond to situations that its programmers anticipated and specifically coded for. That could be heuristics and rules in the hard-coded bits, but, also, even the most sophisticated deep neural net out there isn't even particularly capable of perceiving, let alone understanding, things that weren't coded into its training set. Meaning that, even if the company's intention was that the car should not plow into a pedestrian holding an enormous cellophane-wrapped Edible Arrangements™ fruit basket, that intention doesn't have any bearing on the vehicle's behavior if the software instead classified that object as a plastic bag blowing in the wind.

Great points. And I'd add that the other implementation issue here is the human system any company's intentions get filtered through. So the interests modeled are not the entire corporation, but C-suite intentions passed through executive intentions, manager intentions, and then worker intentions.

E.g., it might be in the long-term interest of the company to not run over pedestrians. If nothing else, the PR cost is very high. But when an exec wants to be first in the field to demonstrate personal success, then hitting a made-up date becomes the priority. Which implicitly puts "not kill people" lower. Middle managers don't want to get blamed, so they'll favor a more complex, muddled organization structure. Per Conway's law, that means muddled code. So instead of the car's software reflecting "don't run over pedestrians" as a key goal, its true priorities are things like "gives a good demo", "was delivered 'on time'", and "reflects an architecture clever enough to get that architect promoted that quarter".

It's not impossible that good software comes out of this system, but the deck is certainly stacked against it.

It’s because we are intelligent that we want the bottle of vodka in the first place.

It's scary not because this specific dataset was used to train Teslas that are on the road today. Rather, because it makes us aware of an entire class of errors that most of us probably hadn't thought about before. I guess you are absolutely certain that training data used in production cars will be free of these issues, but it's not clear why.

It does not make us aware of a new class of errors. Labeling issues are nothing new, but plenty of systems trained on noisy labels continue to work just fine. This is FUD.

Is there any statistical/mathematical tool to completely eradicate or greatly diminish the effects of bad labeling? Is there any reason - other than the combination of pure circumstance and gut feeling of the Data Scientist in charge of saying that it's good enough to deploy - that ~33% insanity in training doesn't become ~33% insanity in the system?

Not sure about bad labels, but semi-supervised learning is the term for training on data with a lot of missing labels. Essentially the algorithm makes predictions on the unlabeled data and uses its highest confidence predictions as additional training data. Generative models can also "dream up" entirely new training examples. There is a risk of amplifying the confidence in bad predictions, but it works well overall (better than using only the labeled portion of the data).
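The pseudo-labeling loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a nearest-centroid classifier with a softmax over negative squared distances stands in for a real detector, and the 0.95 confidence threshold and 3 rounds are arbitrary hypothetical choices:

```python
import numpy as np


def fit_centroids(X, y, n_classes):
    """One centroid per class; assumes every class has at least one example."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, X, temperature=1.0):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.95, rounds=3):
    """Repeatedly add high-confidence predictions on unlabeled data as labels."""
    X, y = X_lab.copy(), y_lab.copy()
    remaining = X_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        probs = predict_proba(fit_centroids(X, y, n_classes), remaining)
        keep = probs.max(axis=1) >= threshold
        if not keep.any():
            break  # nothing confident enough; stop rather than guess
        X = np.vstack([X, remaining[keep]])
        y = np.concatenate([y, probs[keep].argmax(axis=1)])
        remaining = remaining[~keep]
    return fit_centroids(X, y, n_classes), X, y
```

Ambiguous points (for example one exactly between two clusters) never cross the threshold and stay unlabeled. The risk the comment mentions is visible in the structure too: a confidently wrong prediction gets added as a label and then influences every later round.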

> Is there any statistical/mathematical tool to completely eradicate or greatly diminish the effects of bad labeling

Yes, it's called statistics and probability theory.

> Yes it's called statistics and probability theory.

My understanding of statistics is:

- I can halve the % insanity by adding another 100% of good labels.

- If I want to reduce the insanity of labels to 1/33rd of ~33%, I need to add another 3200% of good labels.

- If I want to reduce the insanity to 0% I need to balance the bad labels with an infinite amount of good labels.
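The dilution arithmetic in that list checks out under its own assumptions: all added labels are correct, and "insanity" means the fraction of bad labels in the dataset (not the model's error rate, which, as a later reply points out, need not scale linearly). A small sketch with hypothetical function names:

```python
import math


def bad_label_fraction(bad_fraction: float, added_good_ratio: float) -> float:
    """Bad-label fraction after adding added_good_ratio * N clean labels
    to a dataset of N labels whose bad fraction is bad_fraction."""
    return bad_fraction / (1.0 + added_good_ratio)


def good_labels_needed(bad_fraction: float, target_fraction: float) -> float:
    """Clean labels to add, as a multiple of the original dataset size."""
    if target_fraction <= 0:
        return math.inf  # dilution alone never reaches zero
    return bad_fraction / target_fraction - 1.0
```

Starting from 33% bad labels, adding 100% more clean labels gives 16.5%, and getting down to 1% requires adding 32 times the original dataset size, i.e. the 3200% above.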

Is there anything I'm missing entirely except probability theory? Is probability theory the answer or is there something else?

You don't reach 0%, that's a straw man. The goal is better than human, and the 35,000+ vehicle-related fatalities that happen in the U.S. each year.

There's a disconnect here.

People who talk about the danger of humans driving cars always seem to talk about the raw numbers, because humans drive cars a lot and the raw numbers are rather large.

But when we talk about automated driving, it's in percentages, because it's not being done on the same scale.

So to compare apples to apples, you'd have to convert the number of fatalities to an accuracy percentage. Have you considered trying? There is certainly more than one way to do it, but it would greatly contribute to the discussion if you made some attempt.
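One way to do the conversion asked for here is to normalize by exposure, e.g. fatalities per 100 million miles, or the fraction of miles driven without a fatality. A minimal sketch; the 35,000 figure appears earlier in the thread, but the mileage denominator is not given anywhere in it and must come from the reader's own source, so the function names and any numbers plugged in are illustrative:

```python
def per_100m_miles(events: float, miles: float) -> float:
    """Events per 100 million miles driven."""
    return events / miles * 1e8


def accuracy_per_mile(events: float, miles: float) -> float:
    """Fraction of miles driven without the event: an 'accuracy percentage'."""
    return 1.0 - events / miles
```

For instance, 35,000 fatalities over a hypothetical 3.5 trillion miles would be 1 per 100 million miles, or a per-mile "accuracy" of 99.999999%. The same two numbers computed for an automated fleet are only comparable if the miles were driven under comparable conditions.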

> you'd have to convert the number of fatalities to an accuracy percentage

Tesla's early results for their very limited "self-driving" technology have shown a huge reduction in accidents for any given period of time the vehicles are on the road.

That seems like it incorporates a lot of assumptions. I think it's best to slow down and realize that comparisons don't mean much if you're comparing the wrong things. The first step is to determine the first thing that you are comparing and exactly what it is. Then you can move on to the other half and determine whether it is appropriate.

Humans are much safer than people on average, when driving in conditions suitable for Autopilot.

> Humans are much safer than people on average

This makes zero sense and isn't how "average" works. For the same 1000 hours on the road, a Tesla car with Autopilot will have fewer accidents than a car driven for 1000 hours by humans. This changes as driving conditions get worse, and humans outperform Autopilot.

Deleting half a sentence and saying it doesn't make sense?

The way "average" works is that you average over something - a population or set. It is very important to be clear about what that something is and whether it's appropriate.

Why do you believe that Autopilot outperforms humans in comparable conditions? If this is based on Tesla marketing, I'm extremely prejudiced against them, and assume out of hand that they simply aren't making the right comparison and don't care. However, if you think that is incorrect, you could elaborate on why you have the opinion you do.

The average of something cannot be more than the average of... itself. Thus, "Humans are much safer than people on average" is nonsensical.

> Why do you believe that Autopilot outperforms humans in comparable conditions?

Because they have the data that proves it?

> I'm extremely prejudiced against them

And I've chosen to take them at face value with a grain of salt, and to believe that for the data they've collected from the hundreds of thousands of Teslas with millions of hours of data using Autopilot, it's fair to say they have a large enough sample to draw conclusions about the safety of their cars vs. any incident rates from pretty much any other distribution.

"Thus, "Humans are much safer than people on average" is nonsensical."

Does it make more sense as "Humans, when driving in conditions suitable for Autopilot, are much safer than people on average"?

"I've chosen to take them at face value with a grain of salt, and to believe that for the data they've collected from the hundreds of thousands of Teslas with millions of hours of data using Autopilot, it's fair to say they have a large enough sample to draw conclusions about the safety of their cars vs. any incident rates from pretty much any other distribution"

You seem to be saying that if you have a lot of data it doesn't matter what you compare it to. That seems wrong to me. Also, I don't have this data, and you are not bothering to help me find it.

> Humans, when driving in conditions suitable for Autopilot, are much safer than people on average

The immaterial distinction between "humans" and "people" still makes that sentence confusing. I take it that you mean "driving a mile (either as a human or on autopilot) is safer in conditions that are good for autopilot than driving a mile in average conditions"? Or more directly, isn't your question really "are the conditions the same for the averaged human drivers and the averaged autopilots"?

"isn't your question really "are the conditions the same for the averaged human drivers and the averaged autopilots"?"

My implication is that I severely doubt the conditions are the same, when somebody touts a comparison, and I would need clear and convincing evidence otherwise to change my mind. As well as strong evidence of good intent and trustworthiness by the source of the information.

It's not just about being intentionally deceitful, but about the fact that it's hard to do the right comparison, so people feel justified in giving up on it.
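The "are the conditions the same?" question can be made concrete by comparing rates within each driving condition instead of pooling everything. The numbers below are invented purely for illustration and say nothing about Tesla or any real fleet; they show how a fleet can look safer in aggregate while being worse in every condition (Simpson's paradox), simply because it drives mostly in easy conditions:

```python
# Invented numbers for illustration only: (miles driven, crashes) per condition.
human = {"clear_highway": (1_000_000, 1), "city_or_bad_weather": (1_000_000, 9)}
automated = {"clear_highway": (1_800_000, 2), "city_or_bad_weather": (200_000, 2)}


def crude_rate(fleet):
    """Crashes per million miles, pooled over all conditions."""
    miles = sum(m for m, _ in fleet.values())
    crashes = sum(c for _, c in fleet.values())
    return crashes / miles * 1e6


def rate_in(fleet, condition):
    """Crashes per million miles within a single condition."""
    miles, crashes = fleet[condition]
    return crashes / miles * 1e6


def standardized_rate(fleet, reference):
    """Crashes per million miles if `fleet` drove `reference`'s mix of conditions."""
    total = sum(m for m, _ in reference.values())
    return sum(reference[cond][0] / total * rate_in(fleet, cond) for cond in reference)
```

Pooled, the invented automated fleet has 2 crashes per million miles against 5 for humans. Within each condition it is worse (about 1.11 vs 1, and 10 vs 9), and re-weighted to the human mix of conditions it comes out at about 5.56. Standardizing on a common mix of conditions is one standard way to make the comparison honest, though it needs per-condition exposure data that published summaries often lack.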

"Our secret telemetry dataset gives us reason to believe that the cars produced by us are not safe."

Would a statement like this be a surprise for you?

Only because you made it up to spread FUD. Good job.

It's hard to reach 0% bad labels because:

1. You can't have an infinite amount of good labels.
2. Humans are in charge of labeling too.

The question is if you can reliably overcome the number of bad labels in your training set, so that 33% of bad labels equates to <33% "insanity" in the system.

Your understanding is wrong for anything nonlinear. The whole reason machine learning is useful is because it is nonlinear.

How nonlinear are we talking? My understanding is probably closer to the truth than to the opposite of the truth. I'm looking for an estimate of how far from the truth I am.

How would a system reliably discount missing labels while still learning from good labels? The simplest solution would be for the system to spot the bad or missing labels itself with some certainty, but that seems like a catch-22.
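A common way around that catch-22 is to judge each example only with a model that never trained on it: split the data into folds, predict each fold with a model fit on the others, and flag examples where the out-of-fold prediction confidently disagrees with the given label. This is roughly the idea behind confident-learning style label-error tools. A minimal sketch, assuming a classification setting (not box detection), a nearest-centroid model as a stand-in, and hypothetical defaults of 5 folds and a 0.9 confidence threshold:

```python
import numpy as np


def out_of_fold_probs(X, y, n_classes, n_folds=5, seed=0):
    """Class probabilities for each example from a model that never saw it.
    Model: nearest-centroid with a softmax over negative squared distances."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(X)) % n_folds  # random, balanced fold ids
    probs = np.zeros((len(X), n_classes))
    for f in range(n_folds):
        train, test = folds != f, folds == f
        # Assumes every class is present in each training split.
        centroids = np.stack(
            [X[train & (y == c)].mean(axis=0) for c in range(n_classes)]
        )
        d2 = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 - (-d2).max(axis=1, keepdims=True)  # stabilized softmax
        e = np.exp(logits)
        probs[test] = e / e.sum(axis=1, keepdims=True)
    return probs


def flag_suspect_labels(X, y, n_classes, threshold=0.9, **kwargs):
    """Indices whose out-of-fold prediction confidently disagrees with the label."""
    probs = out_of_fold_probs(X, y, n_classes, **kwargs)
    predicted = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.flatnonzero((predicted != y) & confident)
```

Flagged examples are candidates for human review, not automatic corrections; the approach only works while bad labels are a minority the model can see past, and missing boxes in a detection dataset need a detection-specific variant of the same idea.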

That's correct. I know what goes in and what comes out, not what happens in the middle. How does ~33% insanity in become < ~33% insanity out?

Edit: Parent was edited, was previously (paraphrased)

> I'm guessing you have no technical understanding of how this works

How does making up something ridiculous like "33% insanity" give you anything that resembles a subject that we can discuss? Hyperbole in, hyperbole out.

I'm 33% insane myself. I believe that's part of what makes me human.

We have a lot more than the gut feeling you're assuming: https://arxiv.org/abs/1611.03530

Maybe we could use deep learning? Oh, wait...

The scary part isn't necessarily this dataset, but that unlabeled data causes a silent decrease in model performance -- which can be especially important for underrepresented classes.

Thank you. I can't believe the grandstanding here.

I understand your concern and share it myself. This is an important time and we should be really careful training these things.

However, training as used in the real world isn't done on a still-frame-only basis; frames are used in sequence. And while a single frame might be missing a label, I bet that at-speed most everything important gets labeled correctly enough to be better than a distracted human driver, or the average human driver for that matter.

Then, of course, my intuition might be incorrect.

>And while a single frame might be missing a label, I bet that at-speed most everything important gets labeled correctly enough to be better than a distracted human driver, or the average human driver for that matter.

Would you bet a family member? That a distracted driver is a hazard does not mean other drivers are safe, or even safer.

>Would you bet a family member?

"think of the children!"

Lives at stake don't change anything here. The question is whether self-driving cars, even with the errors, are safer for people than regular drivers on average. If so, then absolutely yes everyone should bet their lives and their families'.

Thousands of people are dying every day in cars. We don't need to wait for this to be perfect. It only needs to be better.

I think people intuitively ascribe a moral dimension to whether people are involved in accidents, hence why they are more worried about dying on an airplane than in a car, even though the former is much less likely than the latter.

If I die in a plane, I have no control over the matter. If I die in a car, then at least I may have had control, and it could be chalked up to my inattentiveness, bad driving behavior, etc. The latter then implies the individual has direct control over whether they live or die, and can thus ensure life by being a responsible driver. I bet the same reasoning will be applied to self-driving cars, even if they are orders of magnitude safer, just as is the case with airplanes.

The great part of self-driving cars is that it can work like aviation: after a major airplane accident there is a thorough learning process, and all flights regularly become safer as a result.

This is the same reality for self-driving cars. The edge cases will happen but they won't keep happening with rigorous improvements in the models and trained behaviour.

Flight safety has improved dramatically over the past century and the ability for self driving cars to adapt will be even faster, as it's mostly just software.

Of course, this implies that an accident has to happen for such an improvement to take place. But simulation is improving drastically to help alleviate that, and the alternative is the current situation, where the same types of accidents keep happening again and again with only the occasional improvement to car technology and safety features.

> Lives at stake don't change anything here. The question is whether self-driving cars, even with the errors, are safer for people than regular drivers on average. If so, then absolutely yes everyone should bet their lives and their families'.

Maybe logically that makes sense but from an ethical perspective I argue it's much more complicated than that (e.g. the trolley problem)

In the current system if a human is at fault, they take the blame for the accident. If we decide to move to self driving cars that we know are far from perfect but statistically better than humans, who do we blame when an accident inevitably happens? Do we blame the manufacturer even though their system is operating within the limits they've advertised?

Or do we just say well, it's better than it used to be and it's no one's fault? When the systems become significantly better than humans, I can see this perhaps being a reasonable argument, but if it's just slightly better, I'm not sure people will be convinced.

I'm voting for the "less dead people" option. Mostly because I'm a selfish person, been in automobile accidents caused by lapsing human attention, and I want it to be less likely that I'll die in a car crash.

But it's not just about quantity. It's also _different_ people who will die. That radically alters things from an ethical perspective.

Yep. Medical professionals have been aware of this dilemma for millennia: many people die from an ailment if no treatment is attempted, but bad approaches to treatment can kill people that would have survived otherwise. And setting 'better average accident rates' as the threshold for self driving vehicle software developers to be immune from the consequence of their errors is like setting 'better than witch doctors' as the threshold for making doctors immune from claims of malpractice.

Move fast, break different things, is not the answer.

What if it's a very much better average accident rate? This isn't black-and-white.

No, it certainly isn't black and white. Indeed 'much better' is hard to even define when human drivers cover an enormous number of miles per accident, miles driven are heterogeneous in terms of risk, and there isn't even necessarily a universally accepted classification of accident severity, or of whether drivers should be excluded from the sample as being 'at fault' to an unacceptable degree. Plus the AV software isn't staying the same forever: every release introduces new potential edge case bugs, and any new edge case bug which produces a fatality every hundred million miles makes that software release more lethal than human drivers, even if it's better at not denting cars whilst parking and always observes speed limits in between. I don't think every new release is getting enough billions of miles of driving with safety drivers to reassure us there's no statistically significant risk of new edge case bugs, though.

And in context, we still punish surgeons for causing fatalities through gross negligence even though overall they are many orders of magnitude better at performing surgery than the average human.

Sophistry. 'Much better' can be very clear, in terms of death or injury, or property damage, or insurance claims, or half a dozen reasonable measures.

Sure it takes miles to determine what's better. Once automated driving is happening in millions (instead of hundreds) of cars on the road, it will take only days to measure.

I mean, the 'half a dozen reasonable measures' is a problem, not a solution, when they're not all saying the same thing. And sure, it only takes days before we know the latest version of the software actually isn't safer than the average human. And a lot of unnecessary deaths, and the likelihood the fix will cause other unnecessary deaths instead [maybe more, maybe less]. It's frankly sociopathic to dismiss the possibility this might be a problem as sophistry.

Straw man? There are many phases to testing a new piece of software, short of deploying everything to the field indiscriminately.

Some of us believe (perhaps wrong but there it is) that the human error rate will be trivially easy to improve upon. That's not sociopathic. It would be unhelpful to dismiss this innovation (self-driving cars) because of FUD.

Some of us look at the evidence: the human fatal error rate is as low as 3 per billion miles driven in many countries, and some people actually are better than average drivers. It might be trivially easy to improve upon the human ability not to dent cars whilst parking or to observe speed limits, but you're going to struggle to argue that improving on the fatal error rate is trivially easy for AI, or that the insurance cost of the dents matters more than the lives anyway.

People who actually want initiatives to succeed are going to have to do better than sneering dismissal in response to anybody pointing out the obvious facts that complex software seldom runs for a billion hours without bugs, and that successfully overfitting to simulation data in a testing process doesn't mean a new iteration of the software will handle novelty it hasn't been designed to solve less fatally than humans over the billions of real-world miles we'd need to be sure.

People CAN drive well. But understand in my rural state the highway department has signs over the road, showing fatalities for the year. It averages one a day. I don't think the cancer patients in the hospital die that frequently.

So you can name-call all you like and disparage dialog because you disagree or whatever. But I don't think a billion miles between accidents is anywhere close to what I see every day.

FUD isn't a position; it's got no place in this public-safety discussion.

I vote for that option as well.

So far, I have been killed exactly zero times in car crashes. All the empirical evidence tells me that there's no need to surrender control to a computer.

If I die in a crash, perhaps I'll change my mind...

Do we gain something from placing blame? Who do we blame for people who die from natural disasters? Freak occurrences?

Are deaths where blame can be placed preferable to deaths where it cannot? By what factor? Should we try to exchange one of the latter for two of the former?

"The question is whether self-driving cars, even with the errors, are safer for people than regular drivers on average. If so..."

You could say "the question is whether aircraft, even with the errors, are safer for people than regular drivers on average. If so..." - then how should we change policy towards Boeing in light of the 737 MAX fiasco? Should we then avoid any adverse action towards them and focus on encouraging more people to fly?

If we declare that something is safer, particularly before it even exists, isn't there a danger of a feedback loop that prevents it from being safer?

Regular drivers kill people, but they are also generally vulnerable to the crashes that they cause. Boeing engineers, or the programmers of self-driving AIs, don't have their interests aligned with you, the occupant of the vehicle, nearly as much.

There's still some nuance that's important. If self-driving cars always sacrifice other road users to protect the driver, self-driving cars could reduce death/injury overall, but there's a question about whether or not this behavior is ethical. And if the training datasets consistently label cars but not other road users, then this bias could be baked in completely by accident.

It only needs to be better.

A pedestrian likely has a different definition of "better" than the car driver.

I would never buy a car that's just better than an average driver: many crashes come from driving while drunk, in a fog, on ice, etc.

I would only buy a car that's as good as a fully attentive, sober, skilled and well rested driver.

Besides, there will be unforeseen issues that increase your likelihood of death: hacking, sensor failures, etc.

> Would you bet a family member?

I bet myself and my passengers and everyone else on the road every time I take the wheel. That's what seat belts and other safety measures are for. I'm also willing to take those risks of other drivers I can't control. Why wouldn't I extend those risks one more step to a proven system? (The onus is on proof of that risk)

You currently do every day when you let your family members drive with senior citizens.

Your intuition should be that almost everything is incorrectly labelled. And even more so on a large scale sequential basis.

If you look at public datasets, it's more often that things are incorrectly tagged/labeled rather than correctly.

Entropy is a real thing.

At what framerate? With 30 fps, a 60 mph vehicle (88 ft/s) advances almost 3 feet per frame. Every incorrectly labeled frame delays the reaction by a nontrivial amount.

My intuition is that your trained accuracy will not exceed the accuracy of the training set. This is literally a matter of life and death; every frame matters.
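The per-frame distance is simple arithmetic: 60 mph is 88 ft/s, and dividing by the frame rate gives feet per frame. A quick check, using 30 fps and the 110 fps Tesla figure mentioned in the thread as two data points (the "10 missed frames" case is just an illustration):

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def feet_per_frame(speed_mph, frames_per_second):
    """Distance a vehicle covers between consecutive camera frames."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second / frames_per_second

def feet_during_missed_frames(speed_mph, frames_per_second, missed):
    """Distance covered while `missed` consecutive frames fail to register an object."""
    return missed * feet_per_frame(speed_mph, frames_per_second)

for fps in (30, 110):
    print(f"{fps:>3} fps: {feet_per_frame(60, fps):.2f} ft/frame, "
          f"{feet_during_missed_frames(60, fps, 10):.1f} ft over 10 missed frames")
```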

Tesla claims 110 fps in their older cars and 2,300 fps in their newer ones. (https://en.wikipedia.org/wiki/Tesla_Autopilot#Hardware_3)

Following the wikipedia source:

> CEO Elon Musk says that it is capable of processing 200 frames per second and Tesla’s hardware 3 computer, which is optimized to run a neural net, will be able to handle 2,000 frames per second with redundancy.

No way they're shooting 2,000fps, let alone making that many adjustments per second. Maybe that's just a radar ping/signal frequency?

I like Elon, but why does he keep saying things like this?

> With 30 fps

What is the framerate actually used by real-world systems, such as Tesla cars?

You would hope that the software would be resilient enough that missing or even faulty labels would not be the cause of problems.

It is, by the way, relatively easy, once you reach some arbitrary threshold of precision, to detect such missing or wrong labels automatically. A simple suggestion UI with an accept/reject button could take care of supplying the bulk of the missing labels and correcting the bulk of the errors.
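One way such a suggestion queue could be built, assuming the detector outputs (x1, y1, x2, y2) boxes with a confidence score: match predictions to existing annotations by intersection-over-union (IoU) and send only the mismatches to a human. The 0.8 score and 0.5 IoU thresholds are arbitrary placeholders, not tuned values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), pixels

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def review_queue(predictions: List[Tuple[Box, float]], annotations: List[Box],
                 score_min: float = 0.8, iou_min: float = 0.5):
    """Split one image's disagreements into candidates for accept/reject review."""
    # Confident detections that overlap no annotation: possibly missing labels.
    possibly_missing = [box for box, score in predictions
                        if score >= score_min
                        and all(iou(box, a) < iou_min for a in annotations)]
    # Annotations that no detection (of any score) overlaps: possibly phantom labels.
    possibly_phantom = [a for a in annotations
                        if all(iou(a, box) < iou_min for box, _ in predictions)]
    return possibly_missing, possibly_phantom

annotations = [(0, 0, 10, 10), (200, 200, 210, 210)]
predictions = [((1, 1, 10, 10), 0.95),        # matches the first annotation
               ((50, 50, 60, 80), 0.90),      # confident, but nothing labelled there
               ((100, 100, 110, 110), 0.30)]  # too unsure to bother a reviewer
missing, phantom = review_queue(predictions, annotations)
print(missing, phantom)
```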

How would one reach a threshold of precision if the data on which precision is determined is missing or faulty?

The simple idea is to do stuff like train your model on randomized subsets of your data and then compare its performance to using all the data you have.

This doesn't overcome a true "black swan", but that's not what NN are meant to be doing anyhow.

"The simple idea is to do stuff like train your model on randomized subsets of your data and then compare its performance to using all the data you have."

How do you do this when you cannot verify that your data, in subset or in whole, is accurate? And furthermore you don't know how inaccurate it is?

You need to have a known-good test dataset that is as representative as possible. Something that you are absolutely sure is golden - i.e. someone has gone through it manually and verified all the labels are correct and complete. Then it doesn't matter quite so much if your input labels are noisy, because if you perform well at test time, your model is working.

If you have no idea of the accuracy of any of your data then you're probably asking the wrong question of it. You can do things like test for consistency using cross-validation, e.g. does half of your dataset predict the other half with the same kind of performance? But that can't detect the same errors repeated throughout your data.
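The "does half predict the other half" check can be turned into a list of suspicious labels: train on one fold, predict the held-out fold, and flag every item whose given label disagrees with the out-of-fold prediction. This toy uses a 1-D nearest-centroid classifier purely for illustration; as noted above, it only catches errors that are not repeated throughout the data.

```python
def fit_centroids(points):
    """Mean feature value per label; points is a list of (x, label)."""
    sums, counts = {}, {}
    for x, y in points:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Label whose centroid is nearest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def out_of_fold_disagreements(points, k=2):
    """Indices whose label disagrees with a model trained without them."""
    # Interleaved folds keep the example reproducible; shuffle first in practice.
    folds = [list(range(i, len(points), k)) for i in range(k)]
    flagged = []
    for held_out in folds:
        held = set(held_out)
        train = [p for i, p in enumerate(points) if i not in held]
        centroids = fit_centroids(train)
        flagged += [i for i in held_out if predict(centroids, points[i][0]) != points[i][1]]
    return sorted(flagged)

# Two clusters: label 0 around 0.0-0.9, label 1 around 10.0-10.9.
points = [(i / 10, 0) for i in range(10)] + [(10 + i / 10, 1) for i in range(10)]
points[3] = (0.3, 1)    # deliberately mislabelled
points[15] = (10.5, 0)  # deliberately mislabelled
print(out_of_fold_disagreements(points))
```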

My intuitive take on it:

You train on a subset of the initial data. Even if the data has a certain number of incorrect frames, it should still do a decent job getting a lot of things right.

Then you manually loop through all the images of the data set for which the network has detected something that isn't present in the annotations (and vice versa). If the network correctly identified a missing item that wasn't in the original set, all you need to do is press "correct" (and, again, vice versa). You now have an improved data set.

Retrain, rinse, repeat.

Eventually, you'll converge to a case where you have consistency between training and annotations. And then, you manually go through all images again to weed out the final mistakes.

The benefit of this method is that it's much faster to click "correct" than it is to draw rectangles on the screen to label something.

The drawback of this method is that it's easy to miss a pedestrian amidst the sea of rectangles already drawn.
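The retrain/review/repeat loop described here, sketched with the model and the human reviewer passed in as plain functions. `train`, `predict` and `reviewer` are hypothetical stand-ins, not any particular tool's API, and the usage below plugs in a perfect toy predictor only to show the control flow.

```python
from typing import Callable, Dict, Set, Tuple

Labels = Dict[str, Set[str]]  # image id -> ids of objects labelled in that image

def relabel_loop(labels: Labels,
                 train: Callable[[Labels], object],
                 predict: Callable[[object, str], Set[str]],
                 reviewer: Callable[[str, str], bool],
                 max_rounds: int = 10) -> Tuple[Labels, int]:
    """Retrain, send model/label disagreements to a reviewer, repeat until stable."""
    labels = {img: set(objs) for img, objs in labels.items()}  # don't mutate the input
    for round_no in range(1, max_rounds + 1):
        model = train(labels)
        changes = 0
        for img, current in labels.items():
            # Symmetric difference: predicted-but-unlabelled plus labelled-but-unpredicted.
            for obj in predict(model, img) ^ current:
                keep = reviewer(img, obj)  # one accept/reject click, no box drawing
                if keep and obj not in current:
                    current.add(obj)
                    changes += 1
                elif not keep and obj in current:
                    current.discard(obj)
                    changes += 1
        if changes == 0:
            return labels, round_no  # model and annotations are consistent
    return labels, max_rounds

# Toy usage: an oracle "model" and reviewer that both know the truth.
truth = {"img1": {"car1", "ped1"}, "img2": {"car2"}}
start = {"img1": {"car1"}, "img2": {"car2", "ghost"}}
fixed, rounds = relabel_loop(start,
                             train=lambda labels: None,
                             predict=lambda model, img: truth[img],
                             reviewer=lambda img, obj: obj in truth[img])
print(fixed, rounds)
```

A reviewer who only sees disagreements never sees objects that both the model and the annotators missed, which is why the final manual pass still matters.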

You would be far better off paying people $x/hour to look at an image for 30 seconds and answer the question "Does this image have any people in it?" Y/N and watching for the human who says "Yes" when the AI says "No".

Their accuracy rating will help distinguish who is best able to detect pedestrians that AI and other people missed (and who is just random-clicking Y/N for pay), and their group effort will ensure that someone eventually sees the pedestrian, even if no one else has.

Asking them to draw boxes distracts them from their job, which is "verify that we are able to detect human beings with perfect accuracy vs. a hundred people trying to detect human beings".

(At worst, ask them to click on the person. No need for a box. Either it's a person or it isn't. If it is, and your AI missed it, then what they think is the right kind of box to draw is the least of your concerns.)
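A sketch of that scheme, assuming worker accuracy is measured on a handful of hand-verified "gold" images mixed into the queue (one common way to spot random clickers; the comment doesn't say how the rating would be computed). Any trusted worker answering "yes" where the AI says "no" escalates the image. Worker and image names are made up.

```python
from typing import Dict, List

Answers = Dict[str, Dict[str, bool]]  # worker -> {image id: "is there a person?"}

def worker_accuracy(answers: Answers, gold: Dict[str, bool]) -> Dict[str, float]:
    """Share of hand-verified gold images each worker answered correctly."""
    acc = {}
    for worker, ans in answers.items():
        scored = [ans[img] == truth for img, truth in gold.items() if img in ans]
        acc[worker] = sum(scored) / len(scored) if scored else 0.0
    return acc

def images_to_escalate(answers: Answers, ai_says_person: Dict[str, bool],
                       acc: Dict[str, float], min_acc: float = 0.8) -> List[str]:
    """Images where at least one trusted worker says 'yes' but the AI says 'no'."""
    flagged = set()
    for worker, ans in answers.items():
        if acc.get(worker, 0.0) < min_acc:
            continue  # likely random-clicking; ignore their answers
        for img, says_yes in ans.items():
            if says_yes and not ai_says_person.get(img, False):
                flagged.add(img)
    return sorted(flagged)

gold = {"gold_1": True, "gold_2": False}
answers = {
    "alice": {"gold_1": True, "gold_2": False, "img_7": True, "img_8": False},
    "bob":   {"gold_1": False, "gold_2": True, "img_7": False, "img_8": True},
}
ai_says_person = {"gold_1": True, "gold_2": False, "img_7": False, "img_8": False}
acc = worker_accuracy(answers, gold)
print(acc, images_to_escalate(answers, ai_says_person, acc))
```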

Why pay people to do this when you can make a CAPTCHA that requires them to do it for free?

Google must have a really good data set from all those "click the boxes containing X" tests they make people do.

Most people don't have the funds on hand to use predatory pricing^ tactics to train machine learning algorithms.

^ "the pricing of goods or services at such a low level that other suppliers cannot compete and are forced to leave the market", to quote Google's definition (that was itself taken from some other company's dataset).

The two methods aren't mutually exclusive.

I've been thinking a lot about this sort of thing lately, and isn't it the case that ideally you shouldn't need to manually confirm or reject mismatches? If the learning program maintains a probability density for "training labels are wrong", definitive ground truth should be unnecessary - eventually it will figure out the mismatches by itself.

As I understand it, this is the core of recursive Bayesian estimation. At the end of the day we don't really have ground truth for anything - it's all filtered through senses with error bars. So any learning process needs to be robust to that.
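A minimal binary Bayes filter shows the recursive part: treat "there really is an object here" as the hidden state and each frame's detector output as a noisy observation. The hit rate and false-alarm rate (0.9 and 0.1) are assumed numbers, and consecutive video frames are treated as independent, which overstates the evidence in practice. Where the annotation has no box, the posterior doubles as the probability that the label is wrong.

```python
def bayes_update(prior: float, detected: bool,
                 p_hit: float = 0.9, p_false_alarm: float = 0.1) -> float:
    """One step of a binary Bayes filter for the hidden state 'an object is really here'."""
    if detected:
        like_present, like_absent = p_hit, p_false_alarm
    else:
        like_present, like_absent = 1 - p_hit, 1 - p_false_alarm
    numerator = like_present * prior
    return numerator / (numerator + like_absent * (1 - prior))

def posterior_over_frames(detections, prior: float = 0.5, **rates) -> float:
    """Fold per-frame detector outputs into P(object present), one frame at a time."""
    p = prior
    for detected in detections:
        # The previous frame's posterior becomes this frame's prior.
        p = bayes_update(p, detected, **rates)
    return p

print(posterior_over_frames([True]))              # one detection
print(posterior_over_frames([True, True, True]))  # three in a row
print(posterior_over_frames([True, False]))       # a detection, then a miss
```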

Sure, you can modify your model to better account for bad training data. But you could also fix the training data. The previous comment pointed out that fixing the training data (to a high degree at least, using the predictions of an intermediate version) is significantly faster than the initial labeling.

Does this dataset have an auxiliary field for each data point to note if it's been human reviewed? We used to train models with only those datapoints that had n=3 concordance in manual reviewers.
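Filtering to concordant items might look like this, reading "n=3 concordance" as at least three reviewers who all gave the same label (an interpretation; the comment doesn't spell it out). The frame ids and label strings are made up.

```python
from typing import Dict, Hashable, List

def concordant(reviews: Dict[str, List[Hashable]], n: int = 3) -> Dict[str, Hashable]:
    """Keep only items with at least n reviews that all agree; return item -> label."""
    kept = {}
    for item, labels in reviews.items():
        if len(labels) >= n and len(set(labels)) == 1:
            kept[item] = labels[0]
    return kept

reviews = {
    "frame_001": ["pedestrian", "pedestrian", "pedestrian"],  # kept
    "frame_002": ["pedestrian", "pedestrian", "none"],        # reviewers disagree
    "frame_003": ["car", "car"],                              # too few reviews
}
print(concordant(reviews))
```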

It doesn't, but since the original was supposedly human generated and I've now checked them all (twice) it should be n=2. Would love for someone to triple check! If you find errors let me know and I'll be happy to correct them.

I would think you want some images that cut off vehicles since this is how cameras will work at times, no?

I think the problem is not that the car is cut off, it's that the cut-off portion of the car isn't labeled. If that's your training data, then you're training the system to interpret cut-off bits of cars as not cars.

No it's not scary in the slightest. Missing or scrambled labels are routine in ML and the algorithms are able to handle it.

This was a huge paper a couple years ago that demonstrates deep nets will still find the structure in the data even with totally randomized labels: https://arxiv.org/abs/1611.03530

There are also various well-known techniques to verify whether you've over- or underfit.

> deep nets will still find the structure in the data

Yes, but in the training data, not in the non-training data! We want the cars to avoid real pedestrians, not only the ones labelled in the training set!

From the paper:

> When trained on a completely random labeling of the true data, neural networks achieve 0 training error. The test error, of course, is no better than random chance as there is no correlation between the training labels and the test labels.

Strongly disagree. ML is traditionally deployed in low-risk scenarios like image labeling or recommendations where 1% or 5% error rate is totally "routine" and acceptable. Doesn't mean this kind of ML pipeline can translate to cases where errors are literally fatal.

It is a self-correcting problem: these pedestrians won’t be present in the next dataset.

People should learn not to go outside if they're not labelled.

What a funny future it'd be, if we have to wear something distinctive (giant QR codes?) so we don't get killed outside. As a side effect it would make tracking us much easier...

It would be less inconvenient than what I already have to do to avoid being killed by human drivers. Getting to the other side of a street according to regulation procedure can easily require taking a 10 minute detour. It can be worse than that in the suburbs.

I suppose this differs depending on location and what your perspective is. Here, it can be quite a chore for human drivers to avoid killing pedestrians. They'll just wander out across the crosswalk, never mind that their walk-light is red, and proceed leisurely across without a care for the speeding traffic (which has the right-of-way to begin with).

To say nothing of drunks.

--- Anecdotally, I do also remember, as a pedestrian, waiting for 10 minutes or more in subzero temperatures at an un-managed T crossing (crosswalk but no lights), waiting for an opportunity to cross. Traffic just kept coming and coming from all directions; this was a frequent problem at the time of day I needed to cross. I suppose there's opportunity for improvement on both sides.

You seem to be joking, but isn't that pretty much the status quo? Not universally, but there is a lot of clothing incorporating high-viz features. It's most prominent in children's clothes and work clothes, but I've seen reflective material that is almost invisible during daytime being used in business coats, apparently with an eye towards bankers on bikes and the like.

That is why you don't wear purely black clothing in winter and why high-visibility vests are a thing (think French yellow vests). Or why, as a cyclist, you want reflective stripes. Funny present.

IR reflective tape on outdoor clothing. Not the most outlandish idea. I hate it but I can see a future of it.

I really hope that gets verified before people start doing this (and I'm sure some will). Being classified as a lens flare / visual artefact is not a great way to die either.

Sounds like something out of the book Snow Crash or the comic book Transmetropolitan

That's not a reply to the GP and actually a very good idea.

> People should learn not to go outside if they're not labelled.

I worry what will happen when this idea breeds with the "why worry about privacy if you've got nothing to hide?" fallacy.

Wouldn't it be nice if preserving your privacy simply meant not wearing your QR code?

It's just 'license plates for people'. And driving without a license plate is already illegal in many places. It would be a small step to require citizens to wear their nameber clearly visible at all times when outside. The penalty for failure would be a bit harsh. But on the plus side, all the corpses would be unidentified, so technically no-bodies...

Suddenly, being labelled doesn't sound so bad.

I acknowledge the issues in the dataset and that it has a lot of stars on github because it's from Udacity; but calling it 'a popular self-driving car dataset' is misleading as it implies this dataset is popularly used for self-driving cars when it is in fact only a small dataset Udacity uses to teach the basics of training neural networks for self-driving cars.

I've been involved in the autonomous vehicle industry for a while and have been focused on perception for most of it. Most research papers will test their models on popular datasets for self-driving cars and show the results as a sort of benchmark. I've never seen this dataset mentioned anywhere. Heck the size of the dataset is an order of magnitude smaller than most of the popular ones as well.

This is just a github repo. That's it.

Are these larger datasets routinely subject to the same kind of inspection as this titanic.csv of self-driving car datasets?

I hope so. I've personally tested Scale's labeling service and it was much higher quality than this dataset. But it's a pretty secretive industry so I'd bet some companies' data is better than others.

It'd be interesting if the NHTSA had a held-back "test set" they used to evaluate self-driving cars before letting them on the road.

These are manually tagged stills, right? Not video? That's a data set for training CAPTCHA breakers, not self-driving. You need to use video, where you get to see the same objects at different ranges. Recognition gets better as you get closer. Then track the recognized objects backwards to when they first appear, and try to recognize them at smaller sizes.

I'm tangential to the field, but not directly in it myself, and I tend to agree. The goal of these systems should be to "perceive" their environment, not just to "recognize" it.

Perception means understanding that just because a truck (that we recognized 3 frames ago) went behind a tree, doesn't mean it ceased being a truck. Furthermore, this knowledge should be used to refine the model, to say "hey I can still see the wheels, I know those wheels were attached to that truck a moment ago, therefore I still know where the truck is, even if I can't recognize it plainly as such right now".

Furthermore, even if it's completely out of view, it's still there, probably moving close to the same speed and track it was. And if its path intersects ours, we need to assume that it'll reappear at some point. And the longer it's out of view, the bigger are the errors on its estimated position, according to our knowledge of the acceleration and braking limits of trucks of that type.
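The dead-reckoning idea can be sketched in one dimension: keep the last seen position and speed of an occluded object, predict forward at constant velocity, and widen the uncertainty by the extra displacement that accelerating or braking at up to a_max could add, 0.5·a_max·t². The 3 m/s² limit is an assumed placeholder, not a real figure for trucks, and a production tracker would more likely use something like a Kalman filter.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Track:
    x: float      # last observed position along the lane, metres
    v: float      # last estimated speed, m/s
    sigma: float  # position uncertainty at the last observation, metres

def predict_while_occluded(track: Track, t: float, a_max: float = 3.0) -> Tuple[float, float]:
    """Return (expected position, uncertainty radius) t seconds after losing sight of it."""
    centre = track.x + track.v * t
    # Accelerating or braking at up to a_max shifts the position by at most 0.5*a_max*t^2.
    radius = track.sigma + 0.5 * a_max * t * t
    return centre, radius

truck = Track(x=0.0, v=10.0, sigma=0.5)
for t in (0.5, 1.0, 2.0):
    centre, radius = predict_while_occluded(truck, t)
    print(f"t={t:.1f}s: somewhere in {centre - radius:.1f}..{centre + radius:.1f} m")
```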

I've never heard of anyone even working towards this sort of perception, much less having achieved it. And until we get there, these things are all toys. Dangerous, legally nebulous toys.

you're correct that this sort of persistent world modeling is needed for self driving cars, but from what I've heard from friends who work in the industry, both cruise and waymo have it. they're very far from using a plain CNN on their video cameras, they've got depth mapping and such and carefully constructed software making use of the perception data to model how the world will change and react to that. idk if it works well, but they definitely know they need it and are trying.

that said, I've driven a tesla on autopilot, and holy crap it was so incredibly bad. I'm optimistic about self driving cars in general, but not about tesla's. it will frequently lose track of the road lines at night and fail to make turns, suddenly beeping at you that you're in control now, with no warning! I only ever used it like cruise control, but I can't understand how anyone driving a tesla would dare use the tricks that allow bypassing the restrictions that prevent taking your hands off the wheel.

I've never built a self driving car but if I were to give it a go, identifying what's in a frame would be the first step I'd tackle. From there you'd add more layers to the stack to get a full understanding of the world, predict what will happen next temporally, and then choose which actions to take.

Well, if an autonomous vehicle outfit were running unmonitored Level 4 vehicles on public roads using only an open source data set I'd be worried. Even if it was labelled thoroughly and correctly, there isn't nearly enough data in any open source dataset to train an autonomous vehicle perception system that can operate safely without human supervision. This is not a safety critical issue.

The Uber vehicle that ended up killing the pedestrian in Arizona was running under conditions similar to the ones that you outline here. The only notable difference was that a person was monitoring the vehicle.

The only notable difference was that a person was present and supposed to be monitoring the vehicle.

Fixed that for you

And it will happen again. There were cases of people forgetting nuclear weapons on a runway for a whole day by accident. People, eventually, will fuck up. Mislabelled datasets will be used.

I work in the AV space. There's a lot of ambiguity in labeling.

How should a crowd of people be annotated? A line of parked cars? A photograph of a car?

Examine the training set!

A big thing is not that the example is missing, but that it counts as a negative example.

I.e. if during training an ML system notices the ambiguous combination (e.g. a woman pushing a baby stroller, or a crowd) and marks it as a pedestrian, then it gets penalized in a manner that teaches it to ignore these ambiguous combinations and treat them as nothing; while in practice it should probably treat such ambiguous combinations as even more "avoid-worthy" than an ordinary pedestrian.

The problem is that the default assumption is "clear road, you can drive there" - so what we need isn't "pedestrian detection" that finds pedestrians and only pedestrians, we need detection of random stuff that you shouldn't drive over. If a kid is wearing a weird Halloween costume, that doesn't look like a pedestrian, but it is one; If somebody has set up a tent in the middle of a supermarket parking lot, that's not a pedestrian but it should be avoided just like one.

Likewise, I have yet to see one of these things that can recognize potholes and swerve to avoid them lest a wheel be ripped off.

Sooner or later it'd be nice to be able to drive one of these things in a place that isn't southern California.

That's a feature, not a bug. Swerving for potholes can be very dangerous, more dangerous than having undercarriage damage. If a pothole surprises you enough that you have to swerve you were either not paying attention to the road or you are following too close.

Regardless of what's the appropriate action to take given the context (ignoring, swerving, slowing down, a timely proper change of lane) it's probably not controversial that potholes should be identified by a car vision system and taken into account. And from a computer vision perspective there's no qualitative difference between "just" a deep pothole and a lane-wide ten foot deep sinkhole or a construction pit that's unmarked for some reason, it's just a matter of size.

tbf there's still the option of evaluating whether slowly changing your lane position to avoid the pothole will impede or confuse other road users, and doing so only if it won't. I don't imagine [semi]autonomous driving systems do that either?

Right, what's really needed is a "clear road detector".
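The difference between the two defaults fits in a few lines. Assume two hypothetical per-cell scores on a tiny grid: a pedestrian detector's score and a "clear road" model's score. Treating the road as clear unless a detector fires lets an unrecognised object (the tent, the Halloween costume) through; requiring positive evidence of clear road blocks it. The grid values and thresholds are placeholders.

```python
from typing import List

Grid = List[List[float]]

def drivable_default_allow(pedestrian_scores: Grid, thr: float = 0.5) -> List[List[bool]]:
    """Clear unless a known-class detector fires: unrecognised objects pass through."""
    return [[s < thr for s in row] for row in pedestrian_scores]

def drivable_default_deny(clear_road_scores: Grid, thr: float = 0.95) -> List[List[bool]]:
    """Blocked unless a clear-road model is confident: unrecognised objects block."""
    return [[s >= thr for s in row] for row in clear_road_scores]

# Cell (0, 1) holds a tent: low pedestrian score, but it clearly isn't open road either.
pedestrian_scores = [[0.02, 0.10], [0.90, 0.01]]
clear_road_scores = [[0.99, 0.30], [0.05, 0.97]]
print(drivable_default_allow(pedestrian_scores))  # tent cell wrongly marked drivable
print(drivable_default_deny(clear_road_scores))   # tent cell blocked
```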

A line of parked cars absolutely needs to be labeled as individual cars. Any one of them could pull out in front of you at any moment.

Indeed, whilst additionally presenting the chance of a door opening or an obscured pedestrian stepping out from between them.

Yup. You need to consider not only what you can see, but what you can't see. And this is a harder problem.

Are there any studies on how this affects recognition? I'm labeling some vehicles: will my neural net learn better (for, say, vehicle identification) from a row of identical vehicles, from individual vehicles with overlap, or from non-overlapping parts only?

My knee-jerk is to be upset because the risk factor of a poorly functioning pedestrian classifier is obviously higher than something like a sentiment analyzer, but at the same time, this dataset is just for educational purposes, right? Is Udacity actively recommending people use this in production settings?

They claim[1] they're working on building an "open source self driving car" but it looks like the project hasn't had much activity recently.

[1] https://github.com/udacity/self-driving-car

Hold on a second here - are unlabeled pixels used in training a NN to do detection? Will a typical NN get trained to label those pixels as “not a human”? I agree that they should be labeled, but it’s the difference between needing to throw more data at the problem (because you aren’t getting as much learning per image as you could) and actively training the car to do something bad.

Yes, if you don’t “punish” incorrect predictions in your loss function your neural net could just get “perfect” accuracy by putting a giant bounding box around the entire image.

Technically that’s “right”; it did put a box around all the obstacles just like you asked it to. But that “solution” is not useful. You want it to find what it’s looking for and only what it’s looking for.
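The giant-box failure is easy to reproduce with an evaluation metric instead of a loss: score detections only by whether each labelled object ends up inside some predicted box, and a single whole-image box scores perfectly. Matching boxes one-to-one by IoU (0.5 is a common but arbitrary cut-off) also penalises predictions that don't correspond to a labelled object, which exposes it. This greedy matcher is a simplification, not a full mAP implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def containment_recall(preds: List[Box], gts: List[Box]) -> float:
    """Naive score: fraction of labelled objects that sit inside any predicted box."""
    return sum(any(contains(p, g) for p in preds) for g in gts) / len(gts)

def iou_precision_recall(preds: List[Box], gts: List[Box], thr: float = 0.5):
    """Greedy one-to-one IoU matching; unmatched predictions count against precision."""
    unmatched = list(gts)
    true_pos = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thr:
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    return precision, recall

gts = [(10, 10, 20, 40), (100, 50, 120, 90)]
whole_image = [(0, 0, 640, 480)]
good = [(9, 9, 21, 41), (100, 50, 120, 90)]
print(containment_recall(whole_image, gts), iou_precision_recall(whole_image, gts))
print(containment_recall(good, gts), iou_precision_recall(good, gts))
```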

In this case, if it detects an unlabeled pedestrian the loss function will penalize it a bit for that “wrong” answer and it will slightly deviate to try to not find that pedestrian but still find the correctly labeled examples. It’s trying to fit the examples you give it best as possible.
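A minimal sketch of that penalty, assuming per-anchor objectness trained with binary cross-entropy (detectors differ in their exact loss): the gradient of BCE with respect to a logit z is sigmoid(z) - y, so an anchor on an unlabelled pedestrian (target 0) is pushed down, and harder than the labelled one is pushed up. Some pipelines support an "ignore" target for ambiguous regions, which removes the gradient instead of inverting it; the learning rate is arbitrary.

```python
import math
from typing import List, Optional

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def objectness_grads(logits: List[float], targets: List[Optional[int]]) -> List[float]:
    """d(binary cross-entropy)/d(logit) per anchor; a target of None means 'ignore'."""
    return [0.0 if y is None else sigmoid(z) - y for z, y in zip(logits, targets)]

# The model is equally confident (logit 2.0) about three pedestrian-looking anchors.
logits = [2.0, 2.0, 2.0]
targets = [1,     # labelled pedestrian
           0,     # unlabelled pedestrian: silently treated as background
           None]  # pedestrian inside a region marked 'ignore'
grads = objectness_grads(logits, targets)

lr = 0.5
updated = [z - lr * g for z, g in zip(logits, grads)]  # one gradient-descent step
print([round(g, 3) for g in grads], [round(z, 3) for z in updated])
```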

Until self driving is possible with purely unsupervised data, self driving cars and other algorithms relying on massive datasets are unlikely to get really strong because label inaccuracies like this are inevitable

Also, far worse has been found in large scale datasets. Pretty sure there was CP found somewhere in imagenet

Isn't training on a subset of the data and then validating against the rest common practice? It wouldn't detect all the mislabeling, but it should detect some, indicating that manual inspection is required, assuming the errors aren't very systematic.

It is, and there are some interesting techniques published recently to help mitigate things like this. But if you don't have a good ground truth you're at the very least flying blind and at worst feeding garbage in and getting garbage out; your models will learn what you tell them to learn.
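One way to act on the train/validate idea is out-of-fold checking: split the images into k folds, train on the other k-1, and let that model look at the held-out fold it has never seen. Images where it finds more objects than are labeled become candidates for manual review. This is a sketch under loud assumptions: `train` and `predict` are hypothetical callables standing in for a real detector, and it compares object counts only to stay short; a real check would match boxes (e.g. by IoU) rather than counts.

```python
from typing import Callable, Dict, List, Sequence


def k_folds(ids: Sequence[str], k: int) -> List[List[str]]:
    """Split ids into k folds round-robin (deterministic, no shuffling)."""
    return [list(ids[i::k]) for i in range(k)]


def flag_suspect_images(
    labels: Dict[str, int],                     # image id -> number of labeled objects
    train: Callable[[Dict[str, int]], object],  # hypothetical: fit a detector on a subset
    predict: Callable[[object, str], int],      # hypothetical: objects detected in one image
    k: int = 5,
) -> List[str]:
    """Return images where a model that never saw them finds more objects than are labeled."""
    ids = sorted(labels)
    suspects: List[str] = []
    for fold in k_folds(ids, k):
        held_out = set(fold)
        # Train only on images outside the current fold.
        model = train({i: n for i, n in labels.items() if i not in held_out})
        for image_id in fold:
            if predict(model, image_id) > labels[image_id]:
                suspects.append(image_id)
    return suspects
```

The output is a review queue, not a set of corrections: the held-out model can be wrong too, and systematic labeling errors will be learned and therefore not flagged.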

While I am happy about their efforts, it's interesting that in the bottom-left image of their example I can clearly see another unlabeled car in the lower-left half (parked by the sidewalk). Also, I'm not sure, but there seems to be a cyclist on the sidewalk, visible between the stroller and the car (the wheel and hands are the easiest parts to see). The Google image marks a car and a few pedestrians but completely misses the traffic lights at the junction.

So I guess even their fixed dataset still misses many labels, if even their showcase examples miss some.

Hey, OP here, yeah you’re correct. The dataset doesn’t label any obstacles that small/far in the distance. I zoomed in on the region with errors for the sake of the screenshot.

Here’s the original run through Google Vision AI. They actually don’t get the pedestrian either: https://imgur.com/a/84IVTV6

(I fired up the labeling tool I use and grabbed a recording of the few seconds of video around that frame to give an idea of what’s labeled in the dataset and what’s not at that imgur link as well)

Nice, thank you for the upload and for clarifying that small objects are not labeled; that explains it. I was surprised because other images do contain rather small labels for traffic lights or even cars, but I guess it's always in the eye of the person who labels the data.

I think you did an amazing amount of work and huge improvements over the original, have you considered contributing the changes back upstream?

I plan to. They use a custom CSV format that my labeling tool can't work with so I converted everything to VOC XML. I need to write a script to convert back to their format to submit a PR.

Not sure if they'll accept the PR though; the original data had a "visualization link" back to the labeling company on each line which I can't reproduce.
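For the conversion back, a minimal sketch using only Python's standard library (`xml.etree.ElementTree` and `csv`) could look like this. The column order in `COLUMNS` is an assumption for illustration, not Udacity's documented format, so check it against the header of the original CSV. The preview-URL column is left empty because those links can't be reproduced.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Assumed column order, for illustration only: compare with the header of the
# original Udacity CSV before sending anything upstream.
COLUMNS = ["xmin", "ymin", "xmax", "ymax", "Frame", "Label", "Preview URL"]


def voc_to_rows(xml_text: str) -> List[List[str]]:
    """Turn one PASCAL VOC annotation into one CSV row per bounding box."""
    root = ET.fromstring(xml_text)
    frame = root.findtext("filename", default="")
    rows = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip objects without a box
        # Some tools write float coordinates; truncate to whole pixels.
        coords = [str(int(float(bndbox.findtext(tag)))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        # The original preview link can't be reproduced, so that column stays empty.
        rows.append(coords + [frame, obj.findtext("name", default=""), ""])
    return rows


def to_csv_string(xml_docs: Iterable[str]) -> str:
    """Convert several VOC annotations into a single CSV document."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for doc in xml_docs:
        writer.writerows(voc_to_rows(doc))
    return buf.getvalue()
```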

So, do they claim these are human-managed datasets where people drew all the object bounds?

I wonder if they used their own classification AI on sample sets and just called it a day. AI blind leading the AI blind?

They claim it was ("The dataset was annotated entirely by humans using Autti" via https://github.com/udacity/self-driving-car/tree/master/anno...).

But after looking at the data I'm almost certain there was some "tool assist" going on. There were dozens of frames in a row with phantom bounding boxes in the exact same location which made it look in some way automated (maybe user error combined with a "copy bounding boxes from the previous frame" feature?)
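A rough way to surface that kind of copy-forward artifact is to look for boxes with identical pixel coordinates and label across many consecutive frames, which is unlikely from a moving car. This sketch assumes frames arrive in capture order as `(frame_id, boxes)` pairs and uses an arbitrary `min_run` of 10 frames. It will also flag genuinely static objects while the car is stopped, so its output is a list to review, not a list to delete.

```python
from typing import Dict, List, Tuple

LabeledBox = Tuple[int, int, int, int, str]  # (xmin, ymin, xmax, ymax, label)


def find_frozen_boxes(
    frames: List[Tuple[str, List[LabeledBox]]], min_run: int = 10
) -> List[Tuple[LabeledBox, List[str]]]:
    """Return (box, frame_ids) for every run of >= min_run consecutive frames
    in which the exact same box appears. Frames must be in capture order."""
    open_runs: Dict[LabeledBox, List[str]] = {}
    flagged: List[Tuple[LabeledBox, List[str]]] = []
    for frame_id, boxes in frames:
        present = set(boxes)
        # Close the runs of boxes that are missing from this frame.
        for box in [b for b in open_runs if b not in present]:
            run = open_runs.pop(box)
            if len(run) >= min_run:
                flagged.append((box, run))
        # Extend (or start) runs for boxes in this frame.
        for box in present:
            open_runs.setdefault(box, []).append(frame_id)
    # Runs still open when the sequence ends.
    for box, run in open_runs.items():
        if len(run) >= min_run:
            flagged.append((box, run))
    return flagged
```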

It's a bit like saying "We found this store and it doesn't label kitchen knives as potentially dangerous"... Nobody in their right mind would use this dataset to develop a self-driving car, and if someone did, they would not get a license to operate... And if by any chance they did, it would still be like stabbing people with a knife snatched from a supermarket shelf. You just don't do it.

This is the same for MS COCO: https://github.com/AlexeyAB/darknet/issues/4085

Lots of datasets are semi-automatically generated, with human oversight. Sometimes annotators miss things or are just plain lazy.

What's the dataset size? If there are billions of pedestrians tagged and a couple hundred are missing, would it actually have a big impact on the training? Also, it looks like a lot of the people in those images are off the road. What's the current standard in AV? Is everything on the sidewalk tagged? (Genuine questions.)

If they are reporting a 33% image error rate, I would expect a large effect on accuracy no matter what the individual annotation error rates were.

I am unfamiliar with the detailed tagging standards, but that also seems irrelevant until the more egregious problems are resolved. Get everything that's untagged tagged first, then look for smaller-scoped issues. And thanks to Roboflow for doing whatever amount of this they have.

From the article:

>Perhaps most egregiously, 217 (1.4%) of the images were completely unlabeled but actually contained cars, trucks, street lights, and/or pedestrians.

This gives us a dataset size of roughly 15,500.
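That estimate is just 217 / 0.014. Since 1.4% is itself rounded to one decimal place, the true share lies somewhere in [1.35%, 1.45%), so the implied size is really a range. A small sketch of the arithmetic (the function name is made up for illustration):

```python
from typing import Tuple


def implied_total(count: int, pct_shown: float, decimals: int = 1) -> Tuple[float, float, float]:
    """Dataset sizes consistent with `count` items being `pct_shown` percent,
    where the percentage was rounded to `decimals` places.
    Returns (point_estimate, smallest_total, largest_total)."""
    half_step = 0.5 * 10 ** -decimals  # e.g. 1.4 could be anything in [1.35, 1.45)
    point = count / (pct_shown / 100)
    smallest = count / ((pct_shown + half_step) / 100)
    largest = count / ((pct_shown - half_step) / 100)
    return point, smallest, largest
```

For 217 images at 1.4% this gives about 15,500, with a range of roughly 14,966 to 16,074, which is consistent with a dataset of about 15,000 images.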

The dataset is 15,000 images. Breakdown of the number of labels per class (post fixes) is here: https://i.imgur.com/bOFkueI.png

Not all of the pedestrians and cyclists were on the sidewalk, no (eg the kid on his bike in the road and the lady with a stroller in a crosswalk).

I stuck with what it looked like the conventions of the original dataset were (all people labeled as pedestrians whether on the road or not). They just didn't do it very well or consistently.

I do also think that makes the most sense in this context; if you were building a self driving car this layer of the stack would want to know where the people are; higher up you can combine that with where you know the roads/crosswalks/stoplights, etc (and the delta of their position between frames) to make predictions about where they might go next so your car can act accordingly.

For example, a stationary pedestrian at a corner will probably cross the street when the light turns green; if you're turning you need to factor that in.
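As a toy illustration of using the frame-to-frame delta, here is a constant-velocity extrapolation of a pedestrian's box center in image space. It is an assumption-laden sketch: real stacks typically track in world coordinates with filters such as a Kalman filter and add map context. It also shows the limitation in the example above: a pedestrian standing still at the corner has zero velocity, so motion alone predicts they stay put, and you need context like the light state to anticipate the crossing.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in image pixels


def center(box: Box) -> Tuple[float, float]:
    """Center point of a box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def predict_center(prev: Box, curr: Box, dt: float, horizon: float) -> Tuple[float, float]:
    """Extrapolate a tracked box's center `horizon` seconds past the current frame,
    assuming constant velocity between two frames taken `dt` seconds apart."""
    (x0, y0), (x1, y1) = center(prev), center(curr)
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt  # pixels per second
    return (x1 + vx * horizon, y1 + vy * horizon)
```

For a pedestrian standing still, `prev` and `curr` are the same box and the prediction is simply the current position.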

> REDACTED accelerates your computer vision workflow through automated annotation quality assurance, universal annotation format conversion (like PASCAL VOC XML to COCO JSON), team sharing and versioning, and exports directly to file format, like TFRecords.

This is an ad, posing as a sensationalistic blog post.

This just shows how clueless AI is still. Imagine your driving instructor insisted there was no person with a stroller on that corner. You'd dump them on the spot!

Only when machines start complaining about being fed shitty data can we talk about them being fit to drive.

Do they really think people are using these datasets for commercial applications?

Do you really think there is any possible fuckup that won't happen sooner or later?

Will staff at a nuclear silo forget to lock the door and then fall asleep? Because that happened in the US.

Do you really think someone will put a plane in production with a single safety-critical sensor with no backup or fallback?

Do you really think someone will pour dissolved uranium down the drain, starting a nuclear reaction and dying horribly?

Do you really think someone will crash a spacecraft into Mars by mixing up imperial and metric units?

None of these is similar! An off-the-shelf dataset won't help you achieve anything commercially; it's for learning or tinkering at best.

An amplification of the old complex/neurosis: not only do people not notice me, now robots don't either!

  Mister Cellophane
  Shoulda Been My Name
  Mister Cellophane
  'Cause You Can Look Right Through Me
  Walk Right By Me
  And Never Know I'm There...

This seems like the kind of problem that is suitable for and important enough to throw hundreds or thousands of (volunteer?) people at. Imagine assembling a massive public corpus of human verified training data - we'd collectively be that much closer to a technology which will change society! This problem is screaming for a consortium effort on behalf of major corporate entities in the space, who could each benefit without revealing internal secrets.

If you've ever had reCAPTCHA ask you to identify stop signs, traffic lights, buses, etc., you're helping improve Waymo's dataset for free :) Maybe it'll be shared with the public instead of used for profit in a decade or two.

It's interesting that it was a public dataset (ImageNet) that kickstarted the amazing strides in computer vision over the last decade! More public data would be awesome (if high quality).
