Open-sourcing 5,000 hours of self-driving data

SnYaak · 2025-03-11T17:57:51

Today Hugging Face (LeRobot) & Yaak are releasing the world's largest open-source self-driving dataset for training end-to-end models.

We are inviting the entire AI & robotics community to search and curate datasets for training end-to-end models.

To search the data, Yaak is launching Nutron, a tool for natural-language search of robotics data. Check out the video to see how it works (we promise to step up our video game someday).

TL;DR
- Natural-language search of multi-modal data
- Open-sourcing the L2D dataset: 5,000 hours of multi-modal self-driving data
- Community-powered dataset curation

Tech Blog: https://lnkd.in/dPaPv554
Try Nutron: https://lnkd.in/dvBzAX5N

dpe82 · 2025-03-11T22:21:40

[flagged]

SnYaak · 2025-03-11T22:45:39

Noob here doing noob things

pvg · 2025-03-11T22:54:16

You can't have both text and a link in most posts, but it doesn't really matter; either is fine.

menaerus · 2025-03-12T08:39:24

> the adoption of end-to-end AI within the robotics and automotive community remains low

How else do you think lane assistance, traffic sign recognition, highway and traffic jam assist, emergency braking, or pedestrian and other traffic-object detection and tracking are actually implemented? All of them are backed in one way or another by neural networks, which by today's definition I guess counts as "AI", and this has been the case at all major car manufacturers for at least ~10 years.

That said, I don't believe we will see the Transformer architecture implemented anytime soon in resource- and $$$-constrained environments such as the automotive industry.

anon373839 · 2025-03-13T02:27:57

I took "end-to-end AI" here to refer to whole systems trained end-to-end as black boxes (pictures in, throttle/steering/braking signals out), which to me is not something I would ever want to see deployed in safety-critical applications.

a2128 · 2025-03-12T10:03:55

I don't know if automotive is that resource- and money-constrained. People will spend tens of thousands on upgrades if they see the value, and there's plenty of space in there to put a powerful GPU cluster. But of course, reliability constraints will be stricter.

menaerus · 2025-03-12T10:37:10

Generally speaking, the hardware found in commercial vehicles is very cheap and therefore constrained: slow CPUs, slow memory, small amounts of memory, limited (flash) storage, and technology that generally lags behind. For example, you can still find recently produced higher-end cars that cannot stream content from a smartphone to the infotainment system without a USB cable, a feature we have had elsewhere for at least a decade. There are other such examples, but compared with commodity smartphone hardware, automotive hardware lags in many respects.

Both BMW and Mercedes-Benz produce ~2.5M vehicles/year, so I guess that at such volumes, saving even a few cents per component adds up to a noticeable margin. Competition is fiercer than ever, especially for European carmakers, with Chinese companies entering the market, so I expect cutting costs to become even more important than it was before.
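To put "a few cents per component" in numbers, here is a back-of-the-envelope sketch. The 2.5M vehicles/year figure is the one quoted above; the per-part savings and one-part-per-vehicle default are illustrative assumptions, not data from any carmaker.

```python
# Back-of-the-envelope: how a per-part saving scales with production volume.
# Assumptions: 2.5M vehicles/year (figure quoted in the comment) and
# illustrative savings of 1-5 cents per part, one part per vehicle by default.
VEHICLES_PER_YEAR = 2_500_000


def annual_savings_usd(cents_saved_per_part: int,
                       parts_per_vehicle: int = 1,
                       vehicles_per_year: int = VEHICLES_PER_YEAR) -> float:
    """Yearly savings in dollars; computed in integer cents to avoid float drift."""
    total_cents = cents_saved_per_part * parts_per_vehicle * vehicles_per_year
    return total_cents / 100


for cents in (1, 3, 5):
    print(f"{cents} cent(s)/part -> ${annual_savings_usd(cents):,.0f}/year")
```

A 5-cent saving on a single part is already about $125,000 a year at that volume, and it multiplies for parts that appear several times per car.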

AtlasBarfed · 2025-03-12T15:59:30

Look, I really, really, really don't want to give Tesla anything.

But 5,000 hours might be one millionth of what Tesla theoretically has. They have six million vehicles that have been driven for about 2-3 years; figure they average 1-2 hours of use a day.
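A quick sanity check of the "one millionth" estimate, using only the rough figures from the comment (6M vehicles, 2-3 years, 1-2 hours/day). These are the commenter's guesses, not official Tesla numbers.

```python
# Rough fleet-hours estimate from the comment's own figures.
# Assumptions: 6M vehicles, 2-3 years of use, 1-2 hours/day, 365 days/year.
DATASET_HOURS = 5_000
VEHICLES = 6_000_000
DAYS_PER_YEAR = 365


def fleet_hours(years: int, hours_per_day: int, vehicles: int = VEHICLES) -> int:
    """Total hours driven across the fleet under the given usage assumptions."""
    return vehicles * years * DAYS_PER_YEAR * hours_per_day


low = fleet_hours(years=2, hours_per_day=1)   # lower bound of the estimate
high = fleet_hours(years=3, hours_per_day=2)  # upper bound of the estimate

for label, hours in (("low", low), ("high", high)):
    ratio = hours / DATASET_HOURS
    print(f"{label}: {hours:.3g} fleet hours, ~{ratio:,.0f}x the 5,000-hour dataset")
```

The range works out to roughly 0.9 to 2.6 million times the dataset size, so "one millionth" is in the right ballpark. It is an upper bound on usable data, since not every hour driven is necessarily logged and uploaded (hence "theoretically").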

I still think route-optimized/precalculated datasets are necessary for the next couple of 9s of self-driving reliability. I'm not an AI guy, so take this with a huge grain of salt, but routes could be trained with adversarial AIs to match route-specific metrics/conditions of "winning".

I think convergent infrastructure is necessary for another set of 9s.

6stringmerc · 2025-03-11T20:33:58

Is it possible to sift through the set, select the instances where the self-driving vehicles hit birds or curbs or ran over wildlife critters, and train a model specifically on those?

Taking some liberty here: if we’re going to train things, shouldn’t we understand the worst possible outcomes as a baseline to check against?

SnYaak · 2025-03-11T22:49:26

You can search the dataset and curate dataset collections. We are releasing TriageAI soon. Trained on expert behavior, it will score all the data against what a driving instructor would do. If a driving decision deviates too much from what a local expert would have done, the scenario gets a low score.
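The scoring idea described above (compare what was driven with what an expert would have done, and lower the score as they diverge) can be sketched roughly as follows. This is a hypothetical illustration, not TriageAI's actual method: the `Action` fields, the normalization scales, and the exponential mapping are all assumptions, and `expert` stands in for the output of some model trained on instructor behavior.

```python
# Hypothetical sketch of deviation-from-expert scoring; not TriageAI's real method.
# Assumed: actions are (steering, acceleration) pairs; scales and tolerance are illustrative.
import math
from dataclasses import dataclass


@dataclass
class Action:
    steering_rad: float  # steering angle in radians
    accel_mps2: float    # longitudinal acceleration in m/s^2 (negative = braking)


def deviation(logged: Action, expert: Action,
              steer_scale: float = 0.1, accel_scale: float = 1.0) -> float:
    """Normalized Euclidean distance between a logged action and the expert's action."""
    ds = (logged.steering_rad - expert.steering_rad) / steer_scale
    da = (logged.accel_mps2 - expert.accel_mps2) / accel_scale
    return math.hypot(ds, da)


def triage_score(logged: list[Action], expert: list[Action],
                 tolerance: float = 1.0) -> float:
    """Score in (0, 1]: 1.0 means the clip matches the expert at every step;
    a larger mean deviation pushes the score toward 0."""
    if not logged or len(logged) != len(expert):
        raise ValueError("need equal-length, non-empty action sequences")
    mean_dev = sum(deviation(lg, ex) for lg, ex in zip(logged, expert)) / len(logged)
    return math.exp(-mean_dev / tolerance)


if __name__ == "__main__":
    expert_clip = [Action(0.0, 0.0), Action(0.05, -1.0)]
    late_brake = [Action(0.0, 0.0), Action(0.05, -4.0)]  # brakes much harder than expert
    print(triage_score(expert_clip, expert_clip))  # 1.0
    print(triage_score(late_brake, expert_clip))   # lower score for the harsh stop
```

Mapping the mean deviation through `exp` keeps scores in a fixed range so clips of different lengths can be ranked side by side; a real system would likely weight safety-critical moments more heavily than a plain average does.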

In the next version of search, you will be able to search the dynamic environment in the scene as well.

You can already search for harsh braking events, etc.
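One simple way a "harsh braking" event could be flagged is by thresholding deceleration in a speed trace. This is a hypothetical sketch, not how Nutron defines or indexes these events; the 10 Hz sample rate and the -3.0 m/s² threshold are illustrative assumptions.

```python
# Hypothetical sketch: flag harsh braking in a speed trace by thresholding deceleration.
# Assumed: speeds in m/s sampled at a fixed rate; threshold is illustrative, not Nutron's.
from typing import Optional


def harsh_braking_events(speeds_mps: list[float], hz: float = 10.0,
                         threshold_mps2: float = -3.0) -> list[tuple[int, int]]:
    """Return inclusive (start, end) sample-index ranges where the
    sample-to-sample acceleration stays at or below the threshold."""
    events: list[tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(1, len(speeds_mps)):
        accel = (speeds_mps[i] - speeds_mps[i - 1]) * hz  # finite-difference m/s^2
        if accel <= threshold_mps2:
            if start is None:
                start = i - 1  # event begins at the sample before the drop
        elif start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:  # trace ended while still braking hard
        events.append((start, len(speeds_mps) - 1))
    return events


print(harsh_braking_events([10.0, 10.0, 9.5, 9.0, 8.5, 8.5]))
```

Raw finite differences are noisy, so in practice one would smooth the speed signal (or use an IMU's longitudinal acceleration) before thresholding.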

clemnt · 2025-03-11T19:59:58

very cool!