Upon digging in, I was appalled that fully 1/3 of the images contained errors or omissions! Some are small (eg a part of a car on the edge of the frame or a ways in the distance not being labeled) but some are egregious (like the woman in the crosswalk with a baby stroller).
I think this really calls out the importance of rigorously inspecting any data you plan to use with your models. Garbage in, garbage out... and self-driving cars should be treated seriously.
I went ahead and corrected by hand the missing bounding boxes and fixed a bunch of other errors like phantom annotations and duplicated boxes. There are still quite a few duplicate boxes (especially around traffic lights) that would have been tedious to fix manually, but if there's enough demand I'll go back and clean those as well.
No, it's not even remotely "really scary". No one is putting an actual self-driving car on the market using this specific data set. It's disingenuous to pretend this is any indication of the data used by serious companies in the space, or that it's representative of the impact a few mislabelled samples have on the ability of these systems & algorithms to generalize.
Are you sure about that? What about "non-serious" companies? Various fly-by-night self-driving startups?
I mean, this sounds like the machine learning equivalent of "no serious business is pulling random bits of code from StackOverflow or whatever tutorial popped up first in the search results", and yet I think everyone who has worked with software for more than a few years can confirm that companies absolutely do it, and a lot.
(Tangentially, PHP got a bad rap in large part because of this - people who didn't know how to do things correctly copying code from tutorials written by people who didn't know either; these days, the JS ecosystem is getting a bad rap for having such bad code wrapped in neat and easy-to-install NPM packages.)
Bold assumption that's already got a counterpoint: https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg
Along with your pedestrian death there were five (!!!) driver deaths.
There's a reason why places like Germany don't want Tesla to use the term 'autopilot' because it's reckless endangerment.
If Tesla weren't a US company I'm sure they'd have already been sued for billions for labeling their assisted driving system autopilot.
The US is the country where you can pay millions for not having a sticker on the microwave showing that you shouldn't dry your cat in there.
With machine learning you never know when it might mistake the back of a semi for an overpass or something.
Fly-by-night self-driving startups will at some point run into issues with the NHTSA. George Hotz's self-driving car project was shut down as soon as he announced that he'd start selling some prototype.
States require permits just to be allowed to test self-driving prototypes on the open road. I don't know what the requirements are, but they're probably subject to some kind of review that's more than just filling in a form.
What is the alternative other than holding these companies responsible for the deaths they might cause? The only solution I see is mandating much higher insurance requirements than for a human driver. For example, 25M per death they might cause.
Of course, real responsibility is probably unfashionable...
(edit: I'm personally in high-stakes time-sensitive diagnostic ML development, so...)
Is it? IME nobody seems to give a flying fuck outside of HN. Much like the PHP days. Or maybe I've never been in the right workplaces to notice.
First, the AI hype train. People think that calling something "Artificial Intelligence" implies that it is artificial, yes, but also, critically, that it is intelligent. Many enthusiastic people, and also many policymakers, don't fully realize the extent to which machine learning is constrained by both the quality and nature of its training data, and the capabilities of the larger, non-intelligent, software framework that it's being plugged into.
Second, that we have one case study - the post-mortem analysis of the fatal pedestrian collision in Arizona - that strongly indicates that commercial products are not free of the sorts of problems being highlighted here, and that, unlike what others have suggested, misclassification problems aren't necessarily an issue that's isolated to individual frames and that will come out in the wash when the software is dealing with a stream of frames.
Me, I think that self-driving cars are probably a lot like nuclear power. In theory, yes, it is a great idea. In practice, there are a lot of little details that one must get right, and there seem to be a whole lot of opportunities for flaky engineering decisions and incompetent public policy, both enabled by insufficiently-tempered optimism, to scuttle the whole thing.
Yes! I really wish we would call this something more descriptive of the boring statistics involved. Artificial Intelligence is just too cool-sounding a name not to get excited about. And what people don’t realize is that this is really just a statistical inference model, nothing intelligent. Machine Learning is better, but it is still not descriptive enough.
I think if we called it something like Computational Reinforcement Modelling or Iterative Weight Inference, people might stop having the idea that there are machines making “smart” choices involved and finally see that we are really just inferring based on some computations over provided data.
Really, “Machine Learning” deserves no more hype than the boring sounding Kalman Filter.
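For reference, the whole "filter" is a few lines of weighted averaging. A toy 1-D sketch (my own variable names, nothing authoritative):

    # Minimal 1-D Kalman filter: estimate a noisy scalar, e.g. a position.
    # The "boring statistics": blend prediction and measurement by their variances.
    def kalman_step(x, p, z, q=0.01, r=1.0):
        # x: state estimate, p: estimate variance,
        # z: new noisy measurement, q: process noise, r: measurement noise
        p = p + q                # predict: uncertainty grows with process noise
        k = p / (p + r)          # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)      # update the estimate toward the measurement
        p = (1 - k) * p          # and shrink the uncertainty accordingly
        return x, p

    x, p = 0.0, 1.0
    for z in [1.2, 0.9, 1.1, 1.0]:
        x, p = kalman_step(x, p, z)
    print(x, p)  # estimate homes in near 1.0 as the variance shrinks

No “smart” choices anywhere in there, just inference.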
Scientists are in the best position to influence the public and push for change. It is astonishing that scientific organizations have been timid about nuclear energy, or outright against it, because of a few people with the most extreme, empirically unjustified stance.
Part of my point is that you might think that the game is "scientists vs. politicians, ignoramuses, and fear-mongers," but that's not the case. Sometimes they're all on the same side. Imagine a 4 player split-screen Star Fox, where the scientist leaves their controller and helps the others shoot down their plane because they don't think they can win. If that sounds baffling, it is.
We could have had your safe nuclear power involved in every road accident. Yeah!
Maybe a better analogy is driving a car, that is, nuclear power and self-driving cars are both vaguely like driving a car.
However, my intuition is that nuclear power has fewer variables than driving a car. I say intuition because I don't know much about either.
I think the point is that self-driving cars trained on poor data are no better at driving than poorly trained humans.
During the cold war they launched fission reactors into orbit. That probably had more variables. Then again, since the nuclear airplane project was unsuccessful, maybe that had the most variables of all.
Putting a nuclear power source in space (as we still do; the most recent Mars rovers carry radioisotope generators rather than full reactors) greatly reduces the risk of it directly causing deaths, since it's really far away from humans for most of its service life.
If you look at the Uber accident, killing 1 person severely harmed the company.
The only extremely harmful scenario that I can imagine in self driving is a bad remote mass software update (which is sadly possible even in an otherwise great company) causing lots of accidents at the same time in different vehicles.
The OP is selling something, and it's in their best interest to spread FUD to sell it.
It's scary that the only way we know how to build something that detects pedestrians in an image with any kind of reliability is to use training data.
You: All training data has labeling issues. Me: Training data is the only way we know how to build some aspects of systems. Other people here: Some of these systems are safety-critical.
It doesn't feel like FUD to be concerned that we need strategies for mitigating quality issues in training data, or to say that if you don't have anything like that, then you cannot be in the market of building a safety-critical system. As you have identified, labeling issues are always there, kind of like how human error is always there. Okay, so now that you believe that, you have to find ways to mitigate it.
But when I hang around people who do self-driving cars, they don't have a good answer to how to mitigate this risk. Some of them don't even really believe in it (which is different than being ignorant of the problem), because they argue with enough data things will get good enough. It's all really sloppy.
Instead of just saying "This is FUD", why not tell us why you feel so reassured that this is not a problem? So far, you said "this always has been a problem and always will be a problem", "the world hasn't ended yet" and "Google translate". What?
re: The world hasn't ended yet: okay, but we also haven't been building self-driving cars
The person that originally posted this thread works for the company that's selling this FUD. At the bottom of the blog post:
"Roboflow accelerates your computer vision workflow through automated annotation quality assurance"
This is a non-issue exacerbated by a for-profit corporation that's creating an "issue" they conveniently have a service to help you fix.
> tell us why you feel so reassured that this is not a problem
Because to me you're coming across as the old-school "you don't need QA if your code is perfect" type.
But for safety-critical systems it's six nines reliability or GTFO.
It's not so much that "nobody gets killed by Google Translate messing up" as "when people get killed by Google Translate messing up, I can't tell".
Think of the trope about idiots driving into lakes or off the road or whatever because they were following navigation. And now think about unconditionally trusting Google Maps or anything else, for say, a year. You know it's not up to that standard.
I think you can debate whether it needs to be an order of magnitude better, or multiple orders of magnitude better.
I mean, the exact same argument could be made about people. Just because you belong to Homo Sapiens doesn't imply you're an intelligent being at all times - e.g. drink a bottle of vodka and the sapiens part is gone.
That's the bar an AI system has to meet today. And one big hurdle is that, viewed as minds, AI systems are completely unlike our own. They're extremely unpredictable. On top of that, they lack self-preservation instinct (this may be indirectly a side effect of how we structure companies - the level of effort put into an AI system is a reflection of how much people working on it care about getting it right vs. getting it to market).
> they lack self-preservation instinct
I suspect that a self-preservation instinct, even in relatively dumb animals, is enormously complex. It might seem simple to us because it's so evolved, so hardwired. If I had to bet, I'd bet that a mouse's self-preservation instinct is more complex than the first-to-market Level 5 self-driving car will be.
And that's good, because I definitely don't want automated vehicles to have more than a hint of self-preservation wiring.
But the broader reason I brought up self-preservation is that in humans, it's not just about not getting killed in a crash. It's also about not having your life ruined by causing it, even if you walk away physically unscathed. A person has personal, deeply visceral stake in driving safely. A self-driving car AI currently doesn't, because it's not a mind; what it has is the reflection of priorities in the company that made it. And as it turns out, companies can kill people and get away with it, so I don't feel confident about how safety is prioritized (see also: this article).
I think it might even be more like Platonic shadows on a firelit cave wall of the priorities of the company that made it.
At the current level that self-driving is at, the cars' control software is only able to perceive and respond to situations that its programmers anticipated and specifically coded for. That could be heuristics and rules in the hard-coded bits, but, also, even the most sophisticated deep neural net out there isn't even particularly capable of perceiving, let alone understanding, things that weren't coded into its training set. Meaning that, even if the company's intention was that the car should not plow into a pedestrian holding an enormous cellophane-wrapped Edible Arrangements™ fruit basket, that intention doesn't have any bearing on the vehicle's behavior if the software instead classified that object as a plastic bag blowing in the wind.
E.g., it might be in the long-term interest of the company to not run over pedestrians. If nothing else, the PR cost is very high. But when an exec wants to be first in the field to demonstrate personal success, then hitting a made-up date becomes the priority. Which implicitly puts "not kill people" lower. Middle managers don't want to get blamed, so they'll favor a more complex, muddled organization structure. Per Conway's law, that means muddled code. So instead of the car's software reflecting "don't run over pedestrians" as a key goal, its true priorities are things like "gives a good demo", "was delivered 'on time'", and "reflects an architecture clever enough to get that architect promoted that quarter".
It's not impossible that good software comes out of this system, but the deck is certainly stacked against it.
Yes, it's called statistics and probability theory.
My understanding of statistics is:
- I can halve the % insanity by adding another 100% of good labels.
- If I want to reduce the insanity of labels to 1/33rd of ~33% I need to add another 3200% of good labels.
- If I want to reduce the insanity to 0% I need to balance the bad labels with an infinite amount of good labels.
Is there anything I'm missing entirely except probability theory? Is probability theory the answer or is there something else?
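For concreteness, the dilution arithmetic I'm working from (throwaway Python; the numbers are just this thread's ~33%):

    # Bad labels stay fixed; adding good labels only dilutes them.
    def extra_good_needed(target_bad_fraction, bad=33.0, good=67.0):
        # Solve bad / (bad + good + x) = target for x, the extra good labels.
        total_needed = bad / target_bad_fraction
        return total_needed - (bad + good)

    print(extra_good_needed(0.33 / 2))  # 100.0 -> +100% to halve the insanity
    print(extra_good_needed(0.01))      # 3200.0 -> +3200% to reach 1%
    # extra_good_needed(0.0) divides by zero: 0% needs infinitely many good labels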
People who talk about the danger of humans driving cars always seem to talk about the raw numbers, because humans drive cars a lot and the raw numbers are rather large.
But when we talk about automated driving, it's in percentages, because it's not being done on the same scale.
So to compare apples to apples, you'd have to convert the number of fatalities to an accuracy percentage. Have you considered trying? There is certainly more than one way to do it, but it would greatly contribute to the discussion if you made some attempt.
Tesla's early results for their very limited "self-driving" technology have shown a huge reduction in accidents for any given period of time the vehicles are on the road.
Humans are much safer than people on average, when driving in conditions suitable for Autopilot.
This makes zero sense and isn't how "average" works. For the same 1000 hours on the road, a Tesla car with Autopilot will have fewer accidents than a car driven for 1000 hours by humans. This changes as driving conditions get worse, and humans outperform Autopilot.
The way "average" works is that you average over something - a population or set. It is very important to be clear about what that something is and whether it's appropriate.
Why do you believe that Autopilot outperforms humans in comparable conditions? If this is based on Tesla marketing, I'm extremely prejudiced against them, and assume out of hand that they simply aren't making the right comparison and don't care. However, if you think that is incorrect, you could elaborate on why you have the opinion you do.
> Why do you believe that Autopilot outperforms humans in comparable conditions?
Because they have the data that proves it?
> I'm extremely prejudiced against them
And I've chosen to take them at face value with a grain of salt, and to believe that for the data they've collected from the hundreds of thousands of Teslas with millions of hours of data using Autopilot, it's fair to say they have a large enough sample to draw conclusions about the safety of their cars vs. any incident rates from pretty much any other distribution.
Does it make more sense as "Humans, when driving in conditions suitable for Autopilot, are much safer than people on average"?
"I've chosen to take them at face value with a grain of salt, and to believe that for the data they've collected from the hundreds of thousands of Tesla's with millions of hours of data using Autopilot, it's fair to say they have a large enough sample to draw conclusions about the safety of their cars vs. any incident rates from pretty much any other distribution"
You seem to be saying that if you have a lot of data it doesn't matter what you compare it to. That seems wrong to me. Also, I don't have this data, and you are not bothering to help me find it.
The immaterial distinction between "humans" and "people" still makes that sentence confusing. I take it that you mean "driving a mile (either as human or autopilot) is safer in conditions that are good for autopilot than driving a mile in average conditions"? Or more directly, isn't your question really "are the conditions the same for the averaged human drivers and the averaged autopilots"?.
My implication is that I severely doubt the conditions are the same, when somebody touts a comparison, and I would need clear and convincing evidence otherwise to change my mind. As well as strong evidence of good intent and trustworthiness by the source of the information.
It's not just about being intentionally deceitful, but about the fact that it's hard to do the right comparison, so people feel justified in giving up on it.
Would a statement like this be a surprise for you?
1. You can't have an infinite amount of good labels
2. Humans are in charge of labeling too.
The question is whether you can reliably overcome the number of bad labels in your training set, so that 33% bad labels equates to <33% "insanity" in the system.
How would a system reliably discredit missing labels while still learning from good labels? The simplest solution would be that the system is able to spot the bad/missing labels itself with some certainty, but that seems like a catch-22.
Edit: Parent was edited, was previously (paraphrased)
> I'm guessing you have no technical understanding of how this works
However, these systems as used in the real world don't run on a still-frame-only basis; they run on a sequence. And while a single frame might be missing a label, I bet that at speed, most everything important gets picked up correctly enough to be better than a distracted human driver, or the average human driver for that matter.
Then, of course, my intuition might be incorrect.
Would you bet a family member? That a distracted driver is a hazard does not mean other drivers are safe, or even safer.
"think of the children!"
Lives at stake don't change anything here. The question is whether self-driving cars, even with the errors, are safer for people than regular drivers on average. If so, then absolutely yes everyone should bet their lives and their families'.
Thousands of people are dying every day in cars. We don't need to wait for this to be perfect. It only needs to be better.
If I die in a plane, I have no control over the matter. If I die in a car, then at least I may have had control, and it could be chalked up to my inattentiveness, bad driving behavior, etc. The latter then implies the individual has direct control over whether they live or die, and can thus ensure life by being a responsible driver. I bet the same reasoning will be applied to self driving cars, even if they are orders of magnitude safer, just as the case with airplanes.
This is the same reality for self-driving cars. The edge cases will happen but they won't keep happening with rigorous improvements in the models and trained behaviour.
Flight safety has improved dramatically over the past century and the ability for self driving cars to adapt will be even faster, as it's mostly just software.
Of course this implies that an accident had to happen for such an improvement to take place. But simulation is improving drastically to help alleviate that, and the alternative is the current situation, where the same types of accidents keep happening again and again with only the occasional improvement to car technology and safety features.
Maybe logically that makes sense, but from an ethical perspective I argue it's much more complicated than that (e.g. the trolley problem).
In the current system if a human is at fault, they take the blame for the accident. If we decide to move to self driving cars that we know are far from perfect but statistically better than humans, who do we blame when an accident inevitably happens? Do we blame the manufacturer even though their system is operating within the limits they've advertised?
Or do we just say well, it's better than it used to be and it's no one's fault? When the systems become significantly better than humans, I can see this perhaps being a reasonable argument, but if it's just slightly better, I'm not sure people will be convinced.
Move fast, break different things, is not the answer.
And in context, we still punish surgeons for causing fatalities through gross negligence even though overall they are many orders of magnitude better at performing surgery than the average human.
Sure it takes miles to determine what's better. Once automated driving is happening in millions (instead of hundreds) of cars on the road, it will take only days to measure.
Some of us believe (perhaps wrong but there it is) that the human error rate will be trivially easy to improve upon. That's not sociopathic. It would be unhelpful to dismiss this innovation (self-driving cars) because of FUD.
People who actually want these initiatives to succeed are going to have to do better than sneering dismissal when anybody points out the obvious facts: complex software seldom runs for a billion hours without bugs, and successfully overfitting to simulation data in a testing process doesn't mean a new iteration of software will handle novelty it hasn't been designed to solve less fatally than humans do, over the billions of real-world miles we need to be sure.
So you can name-call all you like and disparage dialog because you disagree or whatever. But I don't think a billion miles between accidents is anywhere close to what I see every day.
FUD isn't a position; it's got no place in this public-safety discussion.
So far, I have been killed exactly zero times in car crashes. All the empirical evidence tells me that there's no need to surrender control to a computer.
If I die in a crash, perhaps I'll change my mind...
Are deaths where blame can be placed preferable to deaths where it cannot? By what factor? Should we try to exchange one of the latter for two of the former?
You could say "the question is whether aircraft, even with the errors, are safe for people than regular drivers on average. If so..." - then how should we change policy towards Boeing in light of the 737 MAX fiasco? Should we then avoid any adverse action towards them and focus on encouraging more people to fly?
If we declare that something is safer, particularly before it even exists, isn't there a danger of a feedback loop that prevents it from being safer?
Regular drivers kill people, but they are also generally vulnerable to the crashes that they cause. Boeing engineers, or the programmers of self-driving AIs, don't have their interests aligned with you, the occupant of the vehicle, nearly as much.
It only needs to be better.
A pedestrian likely has a different definition of "better" than the car driver.
I would only buy a car that's as good as a fully attentive, sober, skilled and well rested driver.
Besides, there will be unforeseen issues that increase your likelihood of death: hacking, sensor failures, etc.
I bet myself and my passengers and everyone else on the road every time I take the wheel. That's what seat belts and other safety measures are for. I'm also willing to accept the risks from other drivers I can't control. Why wouldn't I extend those risks one more step to a proven system? (The onus is on proof of that risk.)
If you look at public datasets, it's more often that things are incorrectly tagged/labeled rather than correctly.
Entropy is a real thing.
My intuition is that your trained accuracy will not exceed the accuracy of the training set. This is literally a matter of life and death; every frame matters.
> CEO Elon Musk says that it is capable of processing 200 frames per second and Tesla’s hardware 3 computer, which is optimized to run a neural net, will be able to handle 2,000 frames per second with redundancy.
No way they're shooting 2,000fps, let alone making that many adjustments per second. Maybe that's just a radar ping/signal frequency?
I like Elon, but why does he keep saying things like this?
What is the framerate actually used by real-world systems, such as Tesla cars?
It is, by the way, relatively easy (once you reach some arbitrary threshold of precision) to detect such missing or wrong labels automatically: a simple suggestion UI with an accept/reject button could take care of supplying the bulk of the missing labels and correcting the bulk of the errors.
This doesn't overcome a true "black swan", but that's not what NNs are meant to be doing anyhow.
How do you do this when you cannot verify that your data, in subset or in whole, is accurate? And furthermore you don't know how inaccurate it is?
If you have no idea of the accuracy of any of your data then you're probably asking the wrong question of it. You can do things like test for consistency using cross-validation, e.g. does half of your dataset predict the other half with the same kind of performance? But that can't detect the same errors repeated throughout your data.
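For instance (a rough sketch; scikit-learn with a toy dataset standing in for real frame data):

    # Consistency check: does each half of the data predict the other half
    # about as well? Wildly uneven fold scores hint at inconsistent labels.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)  # stand-in for your labeled frames
    scores = cross_val_score(RandomForestClassifier(), X, y, cv=2)
    print(scores)  # similar scores suggest consistency

The caveat stands, though: if the same mistake is repeated throughout the data, both halves agree on it and this tells you nothing.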
You train on a subset of the initial data. Even if the data has a certain number of incorrect frames, it should still do a decent job getting a lot of things right.
Then you manually loop through all the images of the data set for which the network has detected something that isn't present in the annotations (and vice versa). If the network correctly identified a missing item that wasn't in the original set, all you need to do is press "correct" (and, again, vice versa). You now have an improved data set.
Retrain, rinse, repeat.
Eventually, you'll converge to a case where you have consistency between training and annotations. And then, you manually go through all images again to weed out the final mistakes.
The benefit of this method is that it's much faster to click "correct" than it is to draw rectangles on the screen to label something.
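A sketch of what one pass of that loop might look like (not any particular framework; `model.predict` is an assumed stand-in for your detector):

    # Surface disagreements between the model's detections and the existing
    # annotations, so a human only has to accept/reject instead of drawing boxes.
    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    def review_queue(model, dataset, iou_threshold=0.5):
        for image, annotations in dataset:
            detections = model.predict(image)  # assumed: returns candidate boxes
            # Detections matching no annotation: possibly missing labels.
            missing = [d for d in detections
                       if not any(iou(d, a) > iou_threshold for a in annotations)]
            # Annotations matching no detection: possibly phantom boxes.
            phantom = [a for a in annotations
                       if not any(iou(d, a) > iou_threshold for d in detections)]
            if missing or phantom:
                yield image, missing, phantom  # human clicks accept/reject

Each round shrinks the queue; retrain and repeat until it's mostly empty.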
You would be far better off paying people $x/hour to look at an image for 30 seconds and answer the question "Does this image have any people in it?" Y/N and watching for the human who says "Yes" when the AI says "No".
Their accuracy rating will help distinguish who is best able to detect pedestrians that AI and other people missed (and who is just random-clicking Y/N for pay), and their group effort will ensure that someone eventually sees the pedestrian, even if no one else has.
Asking them to draw boxes distracts them from their job, which is "verify that we are able to detect human beings with perfect accuracy vs. a hundred people trying to detect human beings".
(At worst, ask them to click on the person. No need for a box. Either it's a person or it isn't. If it is, and your AI missed it, then what they think is the right kind of box to draw is the least of your concerns.)
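Scoring the verifiers is straightforward if you seed in frames whose answers you already know (a sketch; the data layout is made up):

    # Rate each worker against "gold" frames with known answers; random
    # clickers converge to ~50% and can be down-weighted or dropped.
    from collections import defaultdict

    def worker_accuracy(responses, gold):
        # responses: list of (worker_id, frame_id, answer)
        # gold: frame_id -> correct answer for the seeded frames
        hits, total = defaultdict(int), defaultdict(int)
        for worker, frame, answer in responses:
            if frame in gold:
                total[worker] += 1
                hits[worker] += (answer == gold[frame])
        return {w: hits[w] / total[w] for w in total}

    responses = [("a", 1, True), ("a", 2, False), ("b", 1, False), ("b", 2, False)]
    print(worker_accuracy(responses, gold={1: True, 2: False}))  # {'a': 1.0, 'b': 0.5}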
Google must have a really good data set from all those "click the boxes containing X" tests they make people do.
^ "the pricing of goods or services at such a low level that other suppliers cannot compete and are forced to leave the market", to quote Google's definition (that was itself taken from some other company's dataset).
As I understand it, this is the core of recursive Bayesian estimation. At the end of the day we don't really have ground truth for anything - it's all filtered through senses with error bars. So any learning process needs to be robust to that.
This was a huge paper a couple years ago that demonstrates deep nets will still find the structure in the data even with totally randomized labels: https://arxiv.org/abs/1611.03530
There are also various well known techniques to verify whether you've over or underfit.
Yes, but in the training data, not in the non-training data! We want the cars to avoid real pedestrians, not only the ones labelled in the training set!
From the paper:
> When trained on a completely random labeling of the true data, neural networks achieve 0 training error. The test error, of course, is no better than random chance as there is no correlation between the training labels and the test labels.
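That result is easy to reproduce at toy scale (a sketch with scikit-learn; any over-parameterized model and small dataset should behave similarly):

    # Shuffle the labels and a big-enough model still fits the training set,
    # while test accuracy collapses to chance: memorization, not learning.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    y_random = np.random.default_rng(0).permutation(y)  # destroy label/image link

    model = MLPClassifier(hidden_layer_sizes=(512,), max_iter=2000)
    model.fit(X[:1000], y_random[:1000])
    print(model.score(X[:1000], y_random[:1000]))  # should climb toward 1.0
    print(model.score(X[1000:], y[1000:]))         # ~0.1, chance for 10 digits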
To say nothing of drunks.
Anecdotally, I do also remember, as a pedestrian, waiting for 10 minutes or more in subzero temperatures at an un-managed T crossing (crosswalk but no lights), waiting for an opportunity to cross. Traffic just kept coming and coming from all directions; this was a frequent problem at the time of day I needed to cross. I suppose there's opportunity for improvement on both sides.
I worry what will happen when this idea breeds with the "why worry about privacy if you've got nothing to hide?" fallacy.
I've been involved in the autonomous vehicle industry for a while and have been focused on perception for most of it. Most research papers will test their models on popular datasets for self-driving cars and show the results as a sort of benchmark. I've never seen this dataset mentioned anywhere. Heck, the size of the dataset is an order of magnitude smaller than most of the popular ones as well.
This is just a github repo. That's it.
It'd be interesting if the NHTSA had a held-back "test set" they used to evaluate self-driving cars before letting them on the road.
Perception means understanding that just because a truck (that we recognized 3 frames ago) went behind a tree, doesn't mean it ceased being a truck. Furthermore, this knowledge should be used to refine the model, to say "hey I can still see the wheels, I know those wheels were attached to that truck a moment ago, therefore I still know where the truck is, even if I can't recognize it plainly as such right now".
Furthermore, even if it's completely out of view, it's still there, probably moving close to the same speed and track it was. And if its path intersects ours, we need to assume that it'll reappear at some point. And the longer it's out of view, the bigger are the errors on its estimated position, according to our knowledge of the acceleration and braking limits of trucks of that type.
I've never heard of anyone even working towards this sort of perception, much less having achieved it. And until we get there, these things are all toys. Dangerous, legally nebulous toys.
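To be fair, classical tracking does handle the simplest version of this: you coast the occluded track forward and let its uncertainty grow. A toy constant-velocity sketch (all numbers invented):

    # Coast an occluded track: position extrapolated at last known velocity,
    # positional uncertainty growing the longer the object stays hidden.
    def coast(track, dt, accel_limit=3.0):
        x, v, sigma = track["x"], track["v"], track["sigma"]
        x += v * dt                          # assume it kept moving
        sigma += 0.5 * accel_limit * dt**2   # error bound from accel/braking limits
        return {"x": x, "v": v, "sigma": sigma}

    truck = {"x": 40.0, "v": 15.0, "sigma": 0.5}  # metres, m/s, metres
    for _ in range(10):                      # one second behind the tree, at 10 Hz
        truck = coast(truck, dt=0.1)
    print(truck)  # x ~= 55 m, sigma grown: still there, just less certain where

Whether any production stack actually feeds this back into its perception layer is exactly the question.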
That said, I've driven a Tesla on Autopilot, and holy crap, it was so incredibly bad. I'm optimistic about self-driving cars in general, but not about Tesla's. It will frequently lose track of the road lines at night and fail to make turns, suddenly beeping at you that you're in control now, with no warning! I only ever used it like cruise control, but I can't understand how anyone driving a Tesla would dare use the tricks that allow bypassing the restrictions that prevent taking your hands off the wheel.
Fixed that for you
How should a crowd of people be annotated? A line of parked cars? A photograph of a car?
Examine the training set!
I.e. if during training an ML system notices the ambiguous combination (i.e. a woman pushing a baby stroller, or a crowd) and marks it as a pedestrian, then it gets penalized in a manner that teaches it to ignore these ambiguous combinations and treat them as nothing; while in practice it should probably treat such ambiguous combinations as even more "avoid-worthy" than an ordinary pedestrian.
The problem is that the default assumption is "clear road, you can drive there" - so what we need isn't "pedestrian detection" that finds pedestrians and only pedestrians, we need detection of random stuff that you shouldn't drive over. If a kid is wearing a weird Halloween costume, that doesn't look like a pedestrian, but it is one; If somebody has set up a tent in the middle of a supermarket parking lot, that's not a pedestrian but it should be avoided just like one.
Sooner or later it'd be nice to be able to drive one of these things in a place that isn't southern California.
Technically that’s “right”; it did put a box around all the obstacles just like you asked it to. But that “solution” is not useful. You want it to find what it’s looking for and only what it’s looking for.
In this case, if it detects an unlabeled pedestrian, the loss function will penalize it a bit for that “wrong” answer, and it will deviate slightly to try not to find that pedestrian while still finding the correctly labeled examples. It’s trying to fit the examples you give it as best as possible.
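In code terms, the unlabeled pedestrian just becomes a “background” example in the per-box classification loss. A deliberately simplified sketch:

    import math

    # Simplified per-box classification loss. A real pedestrian the annotators
    # missed matches no ground-truth box, so it gets scored as background,
    # and a confident (correct!) detection is punished hardest.
    def box_loss(p_pedestrian, matched_to_ground_truth):
        if matched_to_ground_truth:
            return -math.log(p_pedestrian)    # reward confident true positives
        return -math.log(1.0 - p_pedestrian)  # punish "false" positives

    print(box_loss(0.95, True))   # ~0.05: labeled pedestrian, low loss
    print(box_loss(0.95, False))  # ~3.0: unlabeled pedestrian, huge loss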
Also, far worse has been found in large-scale datasets. Pretty sure there was CP found somewhere in ImageNet.
So I guess even their fixed dataset still misses many labels, if already their showcases miss some.
Here’s the original run through Google Vision AI. They actually don’t get the pedestrian either: https://imgur.com/a/84IVTV6
(I fired up the labeling tool I use and grabbed a recording of the few seconds of video around that frame to give an idea of what’s labeled in the dataset and what’s not at that imgur link as well)
I think you did an amazing amount of work and made huge improvements over the original. Have you considered contributing the changes back upstream?
Not sure if they'll accept the PR though; the original data had a "visualization link" back to the labeling company on each line which I can't reproduce.
I wonder if they used their own classification AI on sample sets and just called it a day. AI blind leading the AI blind?
But after looking at the data I'm almost certain there was some "tool assist" going on. There were dozens of frames in a row with phantom bounding boxes in the exact same location, which made it look in some way automated (maybe user error combined with a "copy bounding boxes from the previous frame" feature?)
Lots of datasets are semi-automatically generated, with human oversight. Sometimes annotators miss things or are just plain lazy.
I am unfamiliar with the detailed tagging standards, but that also seems irrelevant until the more egregious problems are resolved. Get everything untagged tagged first, then look for smaller-scoped issues. And thanks, Roboflow, for doing whatever amount of this.
>Perhaps most egregiously, 217 (1.4%) of the images were completely unlabeled but actually contained cars, trucks, street lights, and/or pedestrians.
This gives us a data set size of 15500.
Not all of the pedestrians and cyclists were on the sidewalk, no (eg the kid on his bike in the road and the lady with a stroller in a crosswalk).
I stuck with what it looked like the conventions of the original dataset were (all people labeled as pedestrians whether on the road or not). They just didn't do it very well or consistently.
I do also think that makes the most sense in this context; if you were building a self-driving car, this layer of the stack would want to know where the people are; higher up you can combine that with where you know the roads/crosswalks/stoplights are (and the delta of people's positions between frames) to make predictions about where they might go next so your car can act accordingly.
For example, a stationary pedestrian at a corner will probably cross the street when the light turns green; if you're turning you need to factor that in.
This is an ad, posing as a sensationalistic blog post.
Only when the machines start to complain about being fed shitty data can we talk about them being fit to drive.
Will the staff of a nuclear silo forget to lock the door, and then fall asleep? Because that happened in the US.
Do you really think someone will put a plane in production with a single safety critical sensor with no backup or fall back?
Do you really think someone will pour dissolved uranium down the drain, starting a nuclear reaction and dying horribly?
Do you really think someone will crash a spacecraft into Mars by mixing up imperial and metric units?
Shoulda Been My Name
'Cause You Can Look Right Through Me
Walk Right By Me
And Never Know I'm There...