GPUDrive: Data-driven, multi-agent driving simulation at 1M FPS (arxiv.org)
98 points by jonbaer 5 months ago | 10 comments



Unless I'm missing something big, this looks like a big deal for independent developers of self-driving AI software: GPUDrive lets them run driving simulations with hundreds of AI agents on consumer-grade GPUs at 1M FPS, and it comes with Python bindings, wrappers for PyTorch and JAX, and a friendly, standard MIT license. Thank you for sharing this on HN!


Unfortunately, the 1M FPS is not for a simulated front-camera view of the road; it's just an overhead view of the road/track map.


I am not an expert, but the way I understand self-driving systems is that there are multiple models running, and their outputs are fused by yet another model that produces the raw controls/actuations. In other words, I see this model/trainer as the "conductor", telling the car how it should approach an intersection, enter a highway, deal with merging traffic or construction zones, etc.

There is another model that interprets visual data to assist with lane-keeping, slow down or stop for pedestrians, and inform the conductor about road signs... The final model combines all these inputs, incorporates the user's preferences, and then decides whether to brake or accelerate and how much to turn the steering wheel.

Idk heh. The point of the high-performance training is that you can train the "conductor" role faster and run inference faster. Assuming the car has limited compute/GPU resources, a very efficient conductor function lets you dedicate that much more budget to visual/sensor inference and/or any other models, like the Trolley Problem decider (jk).
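
Very roughly, the split I mean looks something like this. Every module name and dimension below is made up for illustration, not taken from any real self-driving stack:

    import torch
    import torch.nn as nn

    class Conductor(nn.Module):
        """High-level planner: how to approach an intersection, merge, etc."""
        def __init__(self, map_dim=64, hidden=128, plan_dim=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(map_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, plan_dim),
            )

        def forward(self, map_state):
            return self.net(map_state)  # abstract "plan" features

    class Controller(nn.Module):
        """Fuses the plan with perception outputs into raw actuations."""
        def __init__(self, plan_dim=16, percept_dim=32, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(plan_dim + percept_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),  # [accel/brake, steering angle]
            )

        def forward(self, plan, perception):
            return self.net(torch.cat([plan, perception], dim=-1))

    # A simulator like GPUDrive only exercises the Conductor part;
    # the perception features would have to come from somewhere else.
    plan = Conductor()(torch.randn(1, 64))
    controls = Controller()(plan, torch.randn(1, 32))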

edit: grammar/details


It's not particularly useful, since the simulation is too high-level. Good for games or homework projects, but not for anything real-world.


Is this just location data being trained on, or is there image and sensor input data too? It looks like it's just location, which seems like it limits the applicability, but I'm not sure.

Edit: reading a bit more, it's somewhere in between. Afaict there's no raw sensor data, but different "parsed" sensor inputs are supported. I'm not sure whether these are synthetic or not. E.g., is the LIDAR view real LIDAR data from some system, or a processed result of what the system thinks LIDAR would be able to see? I can't tell.



I don't know this field of research, hence my question: why is such a high framerate considered a feature at all? Does it help with the learning rate?


If you have a simulation where realtime is 60 fps, running it at 1M fps lets you simulate a little over 4.5 hours per wall-clock second. That would definitely help with the learning rate.
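
Back-of-the-envelope, assuming one simulated frame corresponds to 1/60 s of driving:

    sim_fps = 1_000_000       # claimed simulator throughput
    realtime_fps = 60         # one simulated frame = 1/60 s of driving

    sim_seconds_per_wall_second = sim_fps / realtime_fps   # ~16,667
    hours_per_wall_second = sim_seconds_per_wall_second / 3600
    print(round(hours_per_wall_second, 2))                 # ~4.63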


Feels like less ≠ more. If you want to fill in these gaps, you might as well interpolate.


He's not saying "break realtime into microsecond chunks."

He's saying: run through 4.5 hours' worth of 16-millisecond chunks of time in a single second. That's good for regression testing or producing training data quickly.



