Genesis – a generative physics engine for general-purpose robotics (genesis-world.readthedocs.io)
161 points by tomp 21 hours ago | 37 comments





In the sizzle reel, the early waterdrop demos are beautiful but seem staged; the later robotics demos look more plausible and very impressive. But referring to all these "4D dynamical worlds" sounds overhyped / scammy - everyone else calls 3D space simulated through time a 3D world.

> Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. ... Nvidia brought GPU acceleration to robotic simulation, speeding up simulation speed by more than one order of magnitude compared to CPU-based simulation. ... Genesis pushes up this speed by another order of magnitude.

I can believe that setting up some kind of compute pipeline in a high-level language such as Python could be fast, but the marketing materials aren't explaining any of the "how". If it's real it must be GPU-accelerated, but they almost imply that it isn't. Looks neat, hope it works great!


It is a nice physics engine, it uses Taichi (https://github.com/taichi-dev/taichi) to compile Python code to CUDA/GPU (similar to what Warp Sim does, https://github.com/NVIDIA/warp)

> But referring to all these "4D dynamical worlds" sounds overhyped / scammy - everyone else calls 3D space simulated through time a 3D world.

In the research community, "4D" is a commonly used term to differentiate from work on static 3D objects and environments, especially in recent years since the advent of NeRF.

The term "dynamic" has long been used similarly, but sometimes connotes a narrower scope. For example, reconstruction of cloth dynamics from an RGBD sensor, human body motion from a multi-view camera rig, or a scene from video, but assuming that the scene can be decomposed into rigid objects with their individual dynamics and an otherwise static environment. An even narrower related term in this space would be "articulated", such as reconstruction of humans, animals, or objects with moving parts. However, the representations used in prior works typically did not generalize outside their target domains.

So, "4D" has become more common recently to reflect the development of more general representations that can be used to model dynamic objects and environments.

If you'd like to find related work, I'd recommend searching in conjunction with a conference name to start, e.g. "4D CVPR" or "4D NeurIPS", and then digging into webpages of specific researchers or lab groups. Here are a couple interesting related works I found:

https://shape-of-motion.github.io/ https://generative-dynamics.github.io/ https://stereo4d.github.io/ https://make-a-video3d.github.io/

All that considered, "4D dynamical worlds" does feel like buzzword salad, even if the intended audience is the research community, for two main reasons. First, it's as if some authors with a background in physics simulation wanted to reference "dynamical systems", but none of the prior work in 4D reconstruction/generation uses "dynamical", they use "dynamic". Second, as described above, the whole point of "4D" is that it's more general than "dynamic", using both is redundant. So, "4D worlds" would be more appropriate IMO.


> "4D dynamical worlds"

It's a feature of that field of science. I'm currently working in a lab that is doing a bunch of things that are described in papers as $adjective-AI. In practice it's just a slightly hyped term, or set of terms, vaguely agreed upon by consensus in weird science-paper English. (In the same way that Gaussian splats are totally just point clouds with efficient alpha blending [only slightly more complex, please don't just take my word for it].)

You probably understand what this term is meant to describe, but spelling it out gives a bit of insight into _why_ it's got such a shite name.

o "4d": because it's doing things over time. Normally that's a static scene with a camera flying through it (3D). When you have stuff other than the camera moving, you get an extra dimension, hence 4D.

o "dynamical" (god I hate this): dynamic means that objects in the video are moving around. So you can't just use the multiple camera locations to build up a single view of an object or room; you need to account for movement of things in the scene.

o "worlds": to highlight that it's not just one room being re-used over and over; it's a generator (well it's not, but that's for another post) of diverse scenes that can represent many locations around the world.


They could be implying a little bit of computer graphics in the mix. Rotation, shear, and translation are commonly combined into 4x4 transformation matrices.
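
For readers unfamiliar with the graphics convention this comment seems to allude to: 3D points are often extended with a fourth homogeneous coordinate so that rotation and translation compose as a single 4x4 matrix product. The sketch below is only an illustration of that convention in plain Python; the helper names (`matmul`, `translation`, `rotation_z`, `apply`) are made up for this example and have nothing to do with Genesis's API. Note that this "4" is unrelated to time.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """4x4 homogeneous matrix that shifts a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    """4x4 homogeneous matrix that rotates by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, point):
    """Apply a 4x4 matrix to a 3D point by appending w = 1."""
    v = (point[0], point[1], point[2], 1.0)
    out = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return (out[0], out[1], out[2])

# Rotate 90 degrees about z first, then translate by +1 along x.
rotate_then_translate = matmul(translation(1.0, 0.0, 0.0), rotation_z(math.pi / 2))
moved = apply(rotate_then_translate, (1.0, 0.0, 0.0))
```

The point (1, 0, 0) rotates onto the y axis and is then shifted along x, so `moved` ends up at approximately (1, 1, 0), up to floating-point rounding.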



I saw this on twitter and actually came on HN to see if there was a thread with more details. The demo on twitter was frankly unbelievable. Show me a water droplet falling...okay...now add a live force diagram that is perfectly rendered by just asking for it? What? Doesn't seem possible/real. And yet it seems reputable, the docs/tech look legit, they just "haven't released the generative part yet".

What is going on here? Is the demo just some researchers getting carried away and overpromising, hiding some major behind the scenes work to make that video?


My understanding is they built a performant suite of simulation tools from the ground up, and then they expose those tools via API to an "agent" that can compose them to accomplish the user's ask. It's probably less general than the prompt interface implies, but still seems incredibly useful.

The values on the forces diagram can't be real

So we can run AI agents with RL in molecular-level simulations to replace product design, mechanical engineering, electrical engineering, aerospace engineering and everything else, right!!? If we can combine protein folding too then we could possibly solve any disease and poverty with full automation.

This looks neat. Single step available - as far as I can tell though, no LIDAR, no wheels? Very arm/vision focused. There’s nothing wrong with that, but robotics encompasses a huge space to simulate, which is why I haven’t yet done my own simulator. Would love a generic simulation engine to plug my framework into, but this is missing a few things I need.

Maybe I missed it, but are there any performance numbers? It being 100% implemented in Python makes me very suspicious that this won’t scale to any kind of large robot.

It’s implemented in Python, but it is using existing Python libraries which themselves are implemented in C, etc.

Notably it uses both Taichi and Numba, which compile code expressed in (distinct restricted subsets of) Python (much broader in Numba’s case) to native CPU/GPU code including parallelization.
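
To make the "restricted subset of Python compiled to native code" idea concrete, here is a minimal sketch using Numba's `njit` decorator on a scalar loop. It is not taken from the Genesis codebase: the function `steps_until_ground` and its parameters are hypothetical, and the fallback decorator is only there so the snippet still runs (interpreted, and slower) on a machine without Numba installed.

```python
try:
    # Numba JIT-compiles a restricted subset of Python to native code.
    from numba import njit
except ImportError:
    def njit(func):
        # Fallback: leave the function as ordinary interpreted Python.
        return func

@njit
def steps_until_ground(height, dt, g):
    """Semi-implicit Euler for a falling point mass.

    Returns how many time steps pass before the mass reaches y <= 0.
    """
    y = height
    v = 0.0
    steps = 0
    while y > 0.0:
        v -= g * dt   # update velocity from gravity
        y += v * dt   # then update position with the new velocity
        steps += 1
    return steps

# A 1 m drop with a 1 ms time step; the first call triggers compilation
# when Numba is available.
n_steps = steps_until_ground(1.0, 0.001, 9.81)
fall_time = n_steps * 0.001
```

The source stays ordinary Python either way; what changes is whether the loop runs in the interpreter or as compiled machine code. `fall_time` should land close to the analytic value sqrt(2h/g), roughly 0.45 s.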


Python is used here to wrap around some sort of kernel compiler (taichi). Not out of the realm of possibility that kernels which are compiled out of Python source code could be placed on device with some sort of minimal runtime (although taichi executes on CPU via LLVM, so maybe not so minimal)

There is enough space on large robots to add in beefier compute if needed (at the expense of power consumption). Python is run all the time on robots. Compute usually becomes more of a problem as the robot gets smaller, but it should still be possible to run the intensive parts of a program on the cloud and stream the results back.

Any roboticists here? Is this impressive/what is the impact of this?


What method is Genesis using for JIT compilation? What subset of Python syntax / operations will be supported?

The automatic differentiation seems to be intended for compatibility with Pytorch. Will Genesis be able to interface with JAX as well?

The project looks interesting, but the website is somewhat light on details. In any case, all the best to the developers! It's great to hear about various efforts in the space of differentiable simulators.


> What method is Genesis using for JIT compilation?

Taichi and Numba are both in the pyproject.toml


I believe they use Taichi.

100% python and fast? Either it isn't 100% python, or it isn't fast.

Depends where your boundary for "100% Anything" is I suppose. It seems to use GPU accelerated kernels written in Python via the Taichi library for most of the physics calculations. At some point, sure, the OS+GPU driver+GPU firmware you need to run the GPU accelerated kernel are not written in Python (and if you run it on CPU instead it will be slow, but more because you're using the CPU than because you're not using C or something). There is a bit of numpy too, which eventually boils down to some non-Python stuff (as any Python code eventually will). I'm not sure that's a useful distinction or that the choice of language in defining the kernels makes a meaningful difference on the overall performance in this case.

The doc emphasizes "100% Python" and that the backend is natively in Python. I'm reading this as "you don't need anything other than a Python interpreter." Given that a large number of packages aren't in Python under the hood, that's a big, unnecessary hyperbole. It's OK to acknowledge that there's a heavy reliance on non-Python code, e.g. Taichi or NumPy.

I also think that the distinction isn't particularly useful. It's just that pedantic claims will get pedantic feedback.


It’s particularly useful if it is an open source project and you want to communicate to people who might want to hack on it (either in a fork or the main project) what languages they will need to work directly with to do so.

It’s not important to end users, but they aren’t the only audience.


The Genesis code itself is 100% Python. The underlying Python libraries it uses are not (just as, for that matter, the Python standard library isn’t, but this is, in particular, using Numba – which compiles fairly normal Python to CPU and optionally GPU-native code – and Taichi, which compiles very specially-crafted Python to kernels for GPU.)

I was mildly impressed with the water demo, but that robot thing is kinda crazy, really. Finally looks like a framework for AI which can do my laundry.

What does it mean that gs.generate() is missing in the project?

"Currently, we are open-sourcing the underlying physics engine and the simulation platform. Access to the generative framework will be rolled out gradually in the near future."

I suspect that the actual generation and simulation/rendering takes several minutes for each step.

The simulation/rendering is actually pretty fast since it's all done by heavily optimized gpu-based physics and graphics engines. The "generative" part is that they have some LLM stuff that's finetuned for generating configurations/parameters for the physics engine conditioned on some text. Ie, the physics and graphics are classical clockworky simulations, with a generative frontend to make it easier (but less precise) to get a world up and running. The open source release currently provides the clockworky simulator stuff, with the generative frontend to be released some time in the future.

The GitHub claims:

> Genesis delivers an unprecedented simulation speed -- over 43 million FPS when simulating a Franka robotic arm with a single RTX 4090 (430,000 times faster than real-time).

That math works out to… 23.26 nanoseconds per frame. Uhh… no they don’t simulate a robot arm in 23 nanoseconds? That’s literally twice as fast as a single cache miss?

They may have an interesting platform. I’m not sure. But some of their claims scream exaggeration which makes me not trust other claims.


It's possible they're executing many simulations in parallel, and counting that. 16k robot arms executing at 3k FPS each is much more reasonable on a 4090. If you're effectively fuzzing for edge cases, this would have value.

The reason they are using the FPS (frames-per-second) term in a different way is that this robotics simulator is primarily going to be used for reinforcement learning, where you run thousands of agents in parallel. In that context, the total "batched" throughput of how many frames you can generate per second matters more for training your policy network quickly than the actual latency between frames (which is more important for real-time tasks like gaming).
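
The arithmetic in this subthread can be checked in a few lines. The sketch below uses only the two figures quoted from the README (43 million FPS, 430,000x real-time) plus the 16k-environment count that an earlier comment proposed as a guess; the batch size is an assumption, not a published number, and the helper names are made up for this example.

```python
TOTAL_FPS = 43_000_000      # aggregate frames per second, as quoted
REALTIME_FACTOR = 430_000   # "times faster than real-time", as quoted
ASSUMED_ENVS = 16_000       # hypothetical batch size from the comment above

# If every frame were produced one after another, each would take about 23 ns.
ns_per_frame_if_serial = 1e9 / TOTAL_FPS

# The two quoted figures together imply the simulated step rate per robot.
implied_sim_hz = TOTAL_FPS / REALTIME_FACTOR

def per_env_fps(total_fps, n_envs):
    """Frames per second each environment advances when run as one batch."""
    return total_fps / n_envs

def seconds_per_batched_step(total_fps, n_envs):
    """Wall-clock time to advance every environment in the batch by one frame."""
    return n_envs / total_fps

env_fps = per_env_fps(TOTAL_FPS, ASSUMED_ENVS)
step_latency = seconds_per_batched_step(TOTAL_FPS, ASSUMED_ENVS)
```

Under the assumed batch size, each arm advances at a few thousand frames per second and one batched step takes a few hundred microseconds, which is a far more plausible latency than 23 ns. The quoted numbers also imply a simulated step rate of 100 Hz per robot.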

Yeah it’s gotta be something like that. The whole claim comes across as rather dishonest. If you’re simulating 16,000 arms at 3000 fps each then say that. That’s great. Be clear and concise with your claims.

Agreed.

The fine text at the bottom of the speed comparison video on the project homepage says "With `hibernation = True`". Based on a search through the code, the hibernation setting appears to skip simulating components which reach steady state.
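
As a rough illustration of why such a setting can inflate a frames-per-second figure, here is a toy sketch of the general "sleeping bodies" idea used by many physics engines. It is not Genesis's implementation: the `Body` class, the `step` function and the `sleep_speed` threshold are all hypothetical, and the contact model is deliberately trivial.

```python
from dataclasses import dataclass

@dataclass
class Body:
    y: float              # height above the ground
    v: float              # vertical velocity
    asleep: bool = False  # True once the body has come to rest

def step(bodies, dt=0.01, g=9.81, sleep_speed=1e-3):
    """Advance all awake bodies by one step; return how many were simulated."""
    simulated = 0
    for b in bodies:
        if b.asleep:
            continue          # skipped entirely, yet the frame still "counts"
        simulated += 1
        b.v -= g * dt
        b.y += b.v * dt
        if b.y <= 0.0:
            b.y = 0.0
            b.v = 0.0         # perfectly inelastic contact with the ground
        if b.y == 0.0 and abs(b.v) < sleep_speed:
            b.asleep = True   # resting on the ground: stop simulating it
    return simulated

# Two bodies already resting on the ground, one still falling.
bodies = [Body(y=0.0, v=0.0), Body(y=0.0, v=0.0), Body(y=5.0, v=0.0)]
work_first_step = step(bodies)
work_second_step = step(bodies)
```

After the first step the two resting bodies are marked asleep, so the second step only does work for the falling one. A benchmark scene that quickly settles would therefore report a high frame rate while doing very little computation per frame.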


