
Udacity open-sources additional driving data - olivercameron
https://techcrunch.com/2016/10/05/udacity-open-sources-an-additional-183gb-of-driving-data/
======
olivercameron
We have much, much more coming soon, in addition to LIDAR/radar data. You can
join in with the fun here: [https://udacity.com/self-driving-car](https://udacity.com/self-driving-car)

It's worth pointing out some other awesome datasets, such as:

• [http://research.comma.ai](http://research.comma.ai)

• [https://devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars](https://devblogs.nvidia.com/parallelforall/deep-learning-self-driving-cars)

• [http://data.selfracingcars.com](http://data.selfracingcars.com)

Any questions I can answer?

~~~
ingenieroariel
What's the datasets' license? Public Domain?

~~~
olivercameron
MIT! [https://github.com/udacity/self-driving-car/blob/master/License.txt](https://github.com/udacity/self-driving-car/blob/master/License.txt)

~~~
ryanlol
Might be a good idea to mention that in the readme.md next to the datasets
since they're hosted separately. The wording of the MIT license doesn't really
help avoid confusion here.

edit: or in the .tar with the dataset

------
Animats
This is just a database of normal driving. That's useful for learning how to
follow the car in front and avoid stationary objects, but not much else. It's
going to result in systems that drive like humans right up to the point they
do something really bad.

A more useful database would be the one Nexar is accumulating.[1] They collect
dashcam imagery of events where the driver did a hard brake or the system
detected some other hazardous condition. That database could be used to train
a system which recognizes trouble before braking starts.

Both systems need a much wider field of view. Probably at least 160 degrees,
so cross traffic shows up before the collision.

[1] [http://spectrum.ieee.org/cars-that-think/transportation/sensors/the-ai-dashcam-app-that-wants-to-rate-every-driver-in-the-world](http://spectrum.ieee.org/cars-that-think/transportation/sensors/the-ai-dashcam-app-that-wants-to-rate-every-driver-in-the-world)

~~~
mjshiggins
Definitely good points. We have three cameras arranged collinearly across
the full width of the windshield, so this dataset has a pretty big
effective field of view. And while it is currently limited in a lot of ways,
it's just the beginning of the types of data we will be releasing. Everything
will start scaling up to cover more use cases, as this data is mainly meant to
support the training of a visual network for steering wheel predictions. For
the moment, we actually do want to train the networks to drive like "normal"
humans in normal situations. Thanks for your thoughts!
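For a rough sense of what a multi-camera rig can cover, the horizontal field of view follows from the standard pinhole-camera formula. The sensor width and focal length below are hypothetical, not the actual rig's specs:

```python
import math

def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Pinhole-camera horizontal field of view, in degrees."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Hypothetical camera: 6.2 mm sensor width behind a 3.5 mm lens.
single = horizontal_fov_deg(6.2, 3.5)
print(f"one camera: {single:.1f} deg")

# Three cameras angled outward with zero overlap would cover at most
# three times that; real rigs overlap deliberately, so the true
# effective coverage sits somewhere below this upper bound.
print(f"three-camera upper bound: {3 * single:.1f} deg")
```

With numbers like these, a three-camera rig can comfortably exceed the ~160 degrees Animats asks for, which is why the effective field of view of the dataset is larger than any single frame suggests.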

------
mjshiggins
I'm here as a core contributor to the project and will be answering questions
as well (in between making cable extensions for our lidar units).

------
TuringNYC
A great, widely-used, dataset that teams can benchmark against is a superb
start. Kudos to Udacity on this. I'd love to have a blind test set as well
that teams can test and rank against.

~~~
mjshiggins
There will be a blind test set for the challenge itself, including a public
leaderboard. We are asking the world to compete on building the best vision-
based network for predictive steering.

------
RangerScience
A friend of mine once commented that if you're not Google or Facebook, you
don't have big data.

Is this adding a tier to that? A gigabyte a second, per car?

(I know, Google has self-driving cars, but forget about that for a moment)

~~~
rw
The world is bigger than web companies.

For contrast, one experiment at CERN produces 40TB/sec of sensor data, before
downsampling and filtering:
[https://en.wikipedia.org/wiki/Compact_Muon_Solenoid#Collecti...](https://en.wikipedia.org/wiki/Compact_Muon_Solenoid#Collecting_and_collating_the_data)

~~~
RangerScience
Holy crap. And yes, it definitely is; if you haven't looked up Industry 4.0 or
Industrial Internet, that entire sector is making a push to sensorize.

As a rearguard defense: yes, but what percent of the time is CERN actually
running an experiment [that generates that data]?

According to a quick Google search, average time spent driving is 101 minutes
/ day.

Totally makes sense that CERN (and, likely, any large Science! effort)
produces that level of burst data, but wouldn't these cars produce more data
over time?
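Back-of-envelope, using the figures mentioned in this thread (1 GB/s per car, 101 minutes of driving per day, and CERN's 40 TB/s pre-filter burst rate):

```python
GB = 10**9
TB = 10**12

car_rate = 1 * GB        # bytes/sec per car (figure from upthread)
drive_secs = 101 * 60    # average daily driving time, in seconds
car_daily = car_rate * drive_secs
print(f"one car per day: {car_daily / TB:.2f} TB")  # ~6.06 TB

cern_rate = 40 * TB      # CMS burst rate, before downsampling/filtering
# Seconds of CMS running that match one car's entire daily output:
print(f"CERN equivalent: {car_daily / cern_rate:.2f} s")
```

So one car's whole driving day equals well under a second of CMS output; the cars only pull ahead in aggregate, across a large fleet logging every day while the collider runs in bursts.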

As a different topic, a different friend of mine is of the opinion that AI is
dependent on the throughput of data through the system; think about the amount
of information your body feeds to your brain, and how much time it was doing
that before you were capable of communication.

------
asimuvPR
Could you explain what type of data it is? Is it camera, lidar, engine
controls, input controls, or all combined?

What time length does the data cover? One hour of driving? Two?

Do you have open map data related to the data set?

~~~
mandeepj
Great questions. It is lidar data.

~~~
asimuvPR
Thanks for clearing that up! :)

