Speeding Up Reinforcement Learning with a New Physics Simulation Engine (googleblog.com)
84 points by apsec112 on July 17, 2021 | 13 comments



> We invite researchers to perform a more qualitative measure of Brax’s physics fidelity by training their own policies in the Brax Training Colab. The learned trajectories are recognizably similar to those seen in OpenAI Gym.

Why would the comparison be qualitative? We have known equations of motion. Seems like it would be more productive to produce a quantitative metric where we compare the output to the theoretical solution predicted by the equations of motion.

Second question: if we have equations of motion, why is this needed or better than a Monte Carlo solution to those equations of motion?
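A minimal sketch of the kind of quantitative check the first question asks for, assuming a system with a known analytic solution (an undamped mass-spring oscillator) and plain JAX rather than Brax's own API:

    import jax
    import jax.numpy as jnp

    def simulate(x0, v0, omega, dt, steps):
        """Semi-implicit Euler integration of x'' = -omega^2 * x."""
        def step(carry, _):
            x, v = carry
            v = v - (omega ** 2) * x * dt   # update velocity first
            x = x + v * dt                  # then position
            return (x, v), x
        _, xs = jax.lax.scan(step, (x0, v0), None, length=steps)
        return xs

    omega, dt, steps = 2.0, 1e-3, 5000
    t = dt * jnp.arange(1, steps + 1)
    x_sim = simulate(1.0, 0.0, omega, dt, steps)
    x_true = jnp.cos(omega * t)             # analytic solution for x0=1, v0=0
    rmse = jnp.sqrt(jnp.mean((x_sim - x_true) ** 2))
    print(f"RMSE vs. analytic solution: {rmse:.2e}")

The RMSE here is exactly the sort of metric the comment asks for; the catch, as the reply below notes, is that analytic solutions only exist for simple systems without contacts or constraints.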


You have to integrate, which means discretizing time, and you have to handle collisions and solve constraints. There are no exact solutions in physics simulation.
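A toy illustration of the collision point, assuming a fixed timestep and a simple bouncing ball. This is not Brax's contact solver, just a sketch of why discrete steps force approximate contact handling: the ball penetrates the ground between steps and is resolved with a discrete impulse rather than an exact event time.

    import jax
    import jax.numpy as jnp

    def step(state, _):
        x, v = state                      # height and vertical velocity
        dt, g, restitution = 0.01, 9.81, 0.8
        v = v - g * dt                    # gravity
        x = x + v * dt                    # position update (may tunnel below 0)
        hit = x < 0.0
        x = jnp.where(hit, 0.0, x)                 # clamp penetration
        v = jnp.where(hit, -restitution * v, v)    # reflect velocity
        return (x, v), x

    _, heights = jax.lax.scan(step, (1.0, 0.0), None, length=1000)

Shrinking dt reduces the error but never eliminates it, which is the point of the comment above.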


A computer simulation is by definition not exact, because it is discretized. However, it is still possible to solve the mathematical formulae and use them to validate the AI-based solution. This is done every day in the physical infrastructure you use, and it works extremely well. Everything from the planes you fly in to the submarine cables that carry Internet traffic is modeled numerically. None of it is modeled using AI (at least not yet).


This isn’t an AI-based physics simulator. It’s a normal physics simulator for use in AI.


What does a “normal” physics simulator mean? Is it supposed to receive less scrutiny because it’s “AI”-based?


It's normal because it's not powered by AI.


If you're wondering why so many of these differentiable pipelines are tasked with learning physics (and what that has to do with Google), the answer is that this is "compute-oriented development". By that I mean: since Google has access to effectively unlimited compute, it can use that compute to run physics kinematics solvers (i.e., PDE solvers), which are then used to generate training data for RL models. What's the point of the RL model if the physics model already exists and gives you high-fidelity simulations? Well, it's clearly an easy paper to write... but beyond that, some people claim the RL models are faster than the physics solver. I guess that's true if you don't count the millions of hours of compute spent on the solvers themselves.
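For what it's worth, the data-generation loop being described looks roughly like the sketch below: the simulator produces (state, action, reward, next state) transitions that an RL learner then consumes. env_reset and env_step are placeholders standing in for a physics engine, not Brax's actual API.

    import jax
    import jax.numpy as jnp

    def env_reset(key):
        return jax.random.normal(key, (4,))        # toy 4-dimensional state

    def env_step(state, action):
        next_state = state + 0.01 * action         # stand-in for a physics step
        reward = -jnp.sum(next_state ** 2)         # stand-in reward
        return next_state, reward

    def rollout(key, horizon=100):
        def step(state, k):
            action = jax.random.normal(k, (4,))    # random exploration policy
            next_state, reward = env_step(state, action)
            return next_state, (state, action, reward, next_state)
        reset_key, scan_key = jax.random.split(key)
        keys = jax.random.split(scan_key, horizon)
        _, transitions = jax.lax.scan(step, env_reset(reset_key), keys)
        return transitions                          # training data for the RL model

    # 128 rollouts in parallel; this is where the "unlimited compute" goes.
    transitions = jax.vmap(rollout)(jax.random.split(jax.random.PRNGKey(0), 128))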


Good point. This is why robotics researchers do not take deep RL papers seriously unless they include some real-world robotics results. I'm looking at you, people who only show MuJoCo results and claim their algorithm is useful for robotics.

Simulators are still useful for real-world robotics, though. You can prototype your environment and algorithm, and also attempt sim2real transfer. For example, use the simulator to generate a lot of image data and train image-based controllers; add enough domain randomization and maybe the controller trained in the simulator will transfer to real images.
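A sketch of the domain-randomization step mentioned above, assuming illustrative parameter names and ranges (mass scale, friction, actuation delay) rather than anything taken from Brax:

    import jax

    def sample_physics_params(key):
        k_mass, k_fric, k_delay = jax.random.split(key, 3)
        return {
            "mass_scale": jax.random.uniform(k_mass, (), minval=0.8, maxval=1.2),
            "friction": jax.random.uniform(k_fric, (), minval=0.5, maxval=1.5),
            "action_delay_steps": jax.random.randint(k_delay, (), 0, 3),
        }

    # One independent draw per parallel training environment, resampled each
    # episode so the controller cannot overfit to one simulator configuration.
    keys = jax.random.split(jax.random.PRNGKey(0), 1024)
    params = jax.vmap(sample_physics_params)(keys)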


(Disclaimer: I work on RL and have trained models for simulated tasks.)

I'm fairly sure that people work on control because general algorithms for control would be very useful (e.g., a robot that can skin a cat and drive a car by holding the steering wheel). Such a robot would exist in our 3D physical world, so simulations of our 3D world are used for training. If this could be done with radically less compute, it would be.


Sure, but it doesn't hurt that you have effectively infinite data too (i.e., the thing most other ML research is bound by). You can't argue that it's not a very comfortable corner to be in with respect to being able to publish.


Sounds quite a bit like you're complaining that they chose/engineered a fruitful field of study. I think I'm missing what the problem with that is.


>I think I'm missing what the problem with that is.

I'm complaining that publishing endless papers on methods trained on endless amounts of synthetic data is more about paper churn than contributing something novel. Like the person below says: no real control system uses an RL controller (e.g., Boston Dynamics uses only classical controls).


Oh, I see. I would have guessed that that was because this is relatively new. If it really doesn't translate to anything real, then I definitely get your point.



