A neural net solves the three-body problem 100M times faster (technologyreview.com)
180 points by EndXA 3 months ago | 51 comments



If I understood correctly, on page 2 they say they generated their training and test cases using random initial positions and velocities for the 3-body problem. But this is exactly the sort of case that a non-sophisticated solver should be able to handle easily as well. Brutus is special in the sense that it tries hard to give some sort of a guarantee about the accuracy of its solution to a chaotic problem. But the neural network solver provides no such guarantee, and has not been trained or evaluated on "hard" examples. I'm not sure the comparison is fair or particularly meaningful. I wonder if exactly the same sort of "positive" result could be achieved with a standard symplectic integrator; it's not at all obvious from the paper that they have ruled that out.
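For what it's worth, the symplectic baseline asked about here is only a few lines. A minimal kick-drift-kick leapfrog sketch for the planar case (function names and the G = 1 units are mine, not from the paper):

```python
import numpy as np

def accel(pos, masses, G=1.0):
    """Pairwise Newtonian accelerations for N planar bodies."""
    a = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i != j:
                r = pos[j] - pos[i]
                a[i] += G * masses[j] * r / np.linalg.norm(r) ** 3
    return a

def leapfrog(pos, vel, masses, dt, steps):
    """Kick-drift-kick leapfrog. Being symplectic, its energy error
    stays bounded rather than drifting, which is the kind of property
    a fair comparison against Brutus should be measured on."""
    pos, vel = pos.copy(), vel.copy()
    a = accel(pos, masses)
    for _ in range(steps):
        vel += 0.5 * dt * a   # half kick
        pos += dt * vel       # drift
        a = accel(pos, masses)
        vel += 0.5 * dt * a   # half kick
    return pos, vel
```

Checking energy drift over a short integration is a quick sanity test one could apply to any solver's output, learned or not.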


Yeah, this is classic neural network hype. I wish there were a way to automatically copy-paste my usual comment on trusting the MIT Tech Review, and especially its Emerging Technologies column, whenever such a link is submitted.


You could train a neural net to do it.


No, that would be classic neural network hype. I wish there were a way to automatically copy-paste knzhou's usual comment on trusting the MIT Tech Review, and especially its Emerging Technologies column, whenever such a link is submitted.


You could train a knzhou to do it.


Yeah, this is classic neural network hype. I wish there were a way to automatically copy-paste my usual comment on trusting the MIT Tech Review, and especially its Emerging Technologies column, whenever such a link is submitted.


Argh, just looked up Brutus and it's pretty much exactly what I wanted to do on my own after quite a lot of reading on n-body simulations and finding nothing like it out there. I was (very naively, I realise) hoping to do some original research.

Oh well, time to study this intensively. Here's a link for the curious: https://arxiv.org/abs/1411.6671


That’s total validation of your thinking!!!


You know, as a PhD student, I sometimes feel like the moment I come up with a good idea, it's guaranteed to show up as a result on Google Scholar the next day, which can be rather discouraging.

I've never really thought about it in terms of your comment, however, which is actually very motivating. Thanks for the perspective random internet person.

(Apologies for the OT comment)


Reminds me of when I asked a professor once "why don't they try <alternative approach> instead?". He replied, "I tried that when I was a grad student. It turns out not to work that well because <reasons>." I never figured out if that validated or invalidated my thinking, but I found it funny and kind of eye-opening either way.


A lot of the value of being in a department is being able to have these shortcuts. I remember as an undergrad in physics asking why we didn't use imaginary time to formulate special relativity? The grad students around me by and large looked puzzled and said they had never thought of it. A very, very experienced nuclear/mathematical physicist said, "Oh, yeah, folks did try that back in the early 20th century. It works fine, but you have to go to a full metric for general relativity, so why carry around two formalisms?" I contemplated that for about thirty seconds, said, "That makes sense," and happily put aside the question.

It vindicates your thinking in that it indicates that you are picking up the techniques that professionals use to generate ideas. You just have to accept that as you're learning, most of the ideas will have been generated and addressed.

What gets awkward is when you're doing interdisciplinary research and you're using the generative mechanisms from one discipline in the other. By and large both sides stare at you in confusion, or it requires multiple hour conversations to lead the other discipline's practitioners into a mental space where they can say, "Oh, yeah, that wouldn't be useful because of X." It makes the process of tuning your idea generation much slower. On the flip side, you hit the unknown way faster, sometimes great vistas of unknown where everyone goes, "Umm...wow. Yeah. Never heard of that continent. Let us know what you find."


It is validation that you had an interesting idea that others thought was worth checking out.


I think even a PhD is more about execution than good ideas. There is, generally speaking, no shortage of good ideas, but many turn out to be too labour intensive or have other subtle problems. In any case, I've also gotten discouraged by this, but what eventually matters is really execution on one or two good ideas during your PhD.


This is very true.

Advice to new PhD candidates: take a boring idea and write the best paper you can on it. It will be 10x better received than your "great" idea.


The 3-body problem is far from solved from what I understand. There's plenty of scope for original research.


Agreed, this paper did not show any meaningful generalization and is sorely short on baselines.

If you wanted to train a neural network model that would generalize, you would need to build physical constraints into the model architecture. For example, dynamics are governed by a Hamiltonian based on pairwise interactions, which should be integrated as an ODE. A nice recent example of this from folks at DeepMind is the paper "Hamiltonian Graph Networks with ODE Integrators": https://arxiv.org/abs/1909.12790. That said, if you go down this road too far, you may find that you are just learning an approximation to Newton's law of gravity, which we already know exactly!

The interesting work in this space is finding appropriate interpolation problems inside the context of state of the art physics based models. In any complex simulation, there are tunable parameters and approximations that are suitable to replace with ML. If I were working on this problem, I would start by making a differentiable version of Brutus, and try to deeply understand its strengths and weaknesses.

Neural nets are just high dimensional interpolation. The only reason why they are more powerful than “classical” ML is that you can fit them when embedded in an arbitrary computational graph, in which it is easy to embed prior knowledge (“differentiable programming” style). If you’re not doing that, you might as well just stick with the blackbox algorithms of Scikit-Learn.
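To make the "embed prior knowledge" point concrete, here's a minimal structural sketch (not the actual Hamiltonian Graph Network implementation): the only learnable piece would be the pairwise force function, and positions come from integrating it as an ODE, so the conservation structure is built in rather than learned from data.

```python
import numpy as np

def rollout(pair_force, pos, vel, masses, dt, steps):
    """Integrate a trajectory in which all the physics lives in
    pair_force(r_ij, m_i, m_j). In a Hamiltonian-graph-network-style
    model, pair_force would be a small neural net trained end to end
    through this loop; the simple Euler step here stands in for a
    proper (e.g. symplectic) integrator."""
    pos, vel = pos.copy(), vel.copy()
    for _ in range(steps):
        acc = np.zeros_like(pos)
        for i in range(len(pos)):
            for j in range(len(pos)):
                if i != j:
                    acc[i] += pair_force(pos[j] - pos[i],
                                         masses[i], masses[j]) / masses[i]
        vel += dt * acc
        pos += dt * vel
    return pos

def newton(r, mi, mj, G=1.0):
    """If the learned force converges to this, the 'model' has simply
    rediscovered Newtonian gravity -- the point made above."""
    return G * mi * mj * r / np.linalg.norm(r) ** 3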


Also, they split their data into 9,900 training and 100 validation simulations. They didn't do cross-validation, which makes sense given the cost, but ultimately their setup means that they tested their models on 100 examples out of the 10,000 they generated, and out of who knows how many in nature. That's not very promising.

In terms of practical applications, I don't see the point of training a network on simulations produced by deterministic systems. But I guess they didn't really mean their technique as a replacement for deterministic solvers, because they say they envision it as a sort of crutch where Brutus falters.


The initial conditions are all stationary, and it's in the plane exploiting symmetry, so they are just 2 real numbers.

They don't evolve the simulation very long and the errors rapidly increase with time. See Figure 3 for typical errors of about 20% on the validation set with clear signs of overfitting. https://arxiv.org/pdf/1910.07291.pdf


The original paper can be found here: https://arxiv.org/abs/1910.07291

Abstract:

> Since its formulation by Sir Isaac Newton, the problem of solving the equations of motion for three bodies under their own gravitational force has remained practically unsolved. Currently, the solution for a given initialization can only be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system's chaotic nature. We show that an ensemble of solutions obtained using an arbitrarily precise numerical integrator can be used to train a deep artificial neural network (ANN) that, over a bounded time interval, provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver. Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems to shed light on outstanding phenomena such as the formation of black-hole binary systems or the origin of the core collapse in dense star clusters.


As long as you're focusing on a bounded time interval, I believe a fair comparison would be comparing to interpolation between precomputed solutions obtained with the same computing power as used for training the network. If there is not any improvement over that, then it's not interesting. Look-up tables for rapidly solving such problems in a limited region have also existed since Newton's time.
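The baseline proposed above fits in a few lines. A hypothetical sketch, with a one-parameter stand-in for the expensive solver:

```python
import numpy as np

def build_table(solver, grid):
    """Spend the 'training' compute budget precomputing solutions on a
    grid of initial conditions (here a single scalar parameter)."""
    return np.array([solver(x0) for x0 in grid])

def query(grid, table, x0):
    """Answer queries by linear interpolation between precomputed
    solutions -- the look-up-table baseline described above."""
    return np.interp(x0, grid, table)
```

A fair comparison would give this table the same compute budget that went into generating the network's training set.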


> Look-up tables

don't work well on a GPU, because threads diverge to different memory locations depending on the argument to the LUT. A nonlinear basis or projection doesn't have that problem.


I expect KSP will finally get a real N-body simulator! My Mun orbit insertion will be even harder!


There already is a pretty good one available as a mod: https://github.com/mockingbirdnest/Principia


When I read that KSP didn't use "real" N-body physics, I lost interest in trying it. Maybe it doesn't make a big difference, but I don't understand how it would save any noticeable amount of cpu cycles. I mean, there aren't that many objects in the game, are there? It's not like it involves galaxies or star clusters.


That's because N-body physics implies unstable orbits.

For the planet scale objects, the game designers would have to make sure that the orbits don't go crazy (over times that can be explored at maximum time warp). But then if the goal is to keep those orbits stable, why not just save all the dev effort and just keep them on Keplerian rails?

For the smaller objects launched by the player, unstable orbits aren't necessarily desirable either. Imagine going on a multi-decade mission and returning home to find your favorite space station missing. Game devs aren't going to spend effort on complicated features that will just piss off most players.


I can think of a few reasons why it was a pain to add at any given point (and then you just have to imagine some dev inertia/laziness/other priorities to get the rest of the way): the game started off with only one body, Kerbin (Earth knockoff), with the sun going around it. They then added the Mun and put in spheres of influence.

As they added more planets, if, at any point, they had changed from the simple model to the true one, it would have broken all tutorials, delta-v calculations, extant saves, etc. It would also have required more changes to make the game properly playable. In KSP, players will often leave vessels in orbits on timewarp while focusing on something else; they'll have 10 missions ongoing at once. This works well when orbits aren't being perturbed, but in real life/real physics, you can't just leave a vessel in orbit around the moon for a year without course correction: it'll crash. The problem is, the game has no way to automate course correction/stationkeeping, and that would be far less trivial to add than merely changing the gravity model from one system to another.

I'm not apologising for the devs not having done it right, really. KSP didn't really live up to all its promises, though it is pretty good.


Is this approach provably guaranteed to come up with an accurate solution?


You can come up with a probabilistic notion of accuracy, but I'm not sure proving guaranteed accuracy on a deep ANN is possible even in principle. Am I misunderstanding the question?


> be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system's chaotic nature

This is an exaggeration. This is an undergraduate level problem. It's not difficult to integrate the movement of 3 bodies under the influence of gravity. Sure, it can be tricky sometimes, or you need a small step size, but it's not rocket science.

> provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver

100 million times faster is a bit of a stretch.

> Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems

Yes but extrapolating from 3 to several is exactly where this technique is bound to break.


There's already a much faster method for solving many-body systems than calculating the influence of every one on every other one.

Basically, you divide your space into pieces depending on the location of the masses, and you build a tree, where far away areas are treated as if they were point masses at the center of gravity. This converts an N^2 computation into an N log N one. I'm not clear whether this neural net does better than the well known fast approximate method.
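That's the Barnes-Hut algorithm. A compact recursive sketch of the opening-angle idea in 2-D (recursion standing in for an explicit quadtree; setting theta = 0 recovers the exact O(N^2) sum):

```python
import numpy as np

def tree_accel(target, pos, masses, center, size, theta=0.5, G=1.0):
    """Barnes-Hut-style acceleration on `target` from bodies `pos`.
    If a cell is far enough away (size/distance < theta), treat its
    contents as one point mass at the centre of mass; otherwise split
    into four quadrants. Over all N targets this turns the O(N^2)
    pairwise sum into roughly O(N log N)."""
    if len(pos) == 0:
        return np.zeros(2)
    if len(pos) == 1:
        r = pos[0] - target
        d = np.linalg.norm(r)
        return np.zeros(2) if d == 0 else G * masses[0] * r / d ** 3
    com = np.average(pos, axis=0, weights=masses)
    d = np.linalg.norm(com - target)
    if d > 0 and size / d < theta:
        # far away: whole cell acts as a point mass at its COM
        return G * masses.sum() * (com - target) / d ** 3
    acc = np.zeros(2)
    half = size / 2
    for dx in (0, 1):
        for dy in (0, 1):
            lo = center - half + np.array([dx, dy]) * half
            mask = np.all((pos >= lo) & (pos < lo + half), axis=1)
            acc += tree_accel(target, pos[mask], masses[mask],
                              lo + half / 2, half, theta, G)
    return acc
```

Larger theta trades accuracy for speed; theta around 0.5 is a common default in tree codes.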


> one of the classic problems of applied mathematics.

Maybe I'm biased with my physics PhD, but I'd call it a physics problem. "Solving" it to me means a closed-form solution, not a particular prediction, but I guess the applied math problem is a different problem with a different solution.


(another Physics Ph.D. here)

Solving a problem in physics or math generally implies a closed-form solution, which this decidedly is not. It is in fact a special case, with a low particle count, and it doesn't return velocity information, leading to problems assessing whether quantities like energy are actually conserved.

They discuss this, and have to use a second network to compute velocity, used for computing kinetic energy. In doing so they discover that in close near-collision dynamics, their error can spike unless they reproject the phase-space coordinates onto the "right" constant-energy surface.

So ... yeah ... title of preprint is quite misleading. Their conclusions are also misleading.

Are they better/more general at numeric solution for arbitrary mass, arbitrary particle count, arbitrary initial conditions than, say, a good symplectic integrator of high order?

No. Actually they only studied the equal mass case of 3 particles. Given the combinatorics, I could not fathom the size of the training set they'd need for N(particles) of any reasonable size.

There they would start having to look into some of the approximations that the simulation folks have been using for a long ... long time. Just to get their training cost down far enough to be useful.

An example: take a simulation with N ~ 10^6, i.e. a million objects. Draw masses from a Poissonian distribution, or an EVD, or similar, chosen to approximate stellar masses. Create your initial conditions and generate your training sets from them. How many of these simulations would you need to run, and for how long, to get something that provides long-time-scale predictions that mesh with reality?

Oh, take N at least 8 for planetary mass distribution. With satellites, so N ~ 200. Never mind planetary rings or Oort cloud/Kuiper belt/asteroid belt objects.

How large does your training set need to grow in order to be able to predict, with any degree of accuracy, orbital stability over gigayear intervals?

My point is not to beat on the authors, but to point out that this study, while interesting, is most definitely not a solution to the 3-body problem. There is a vast literature on solutions to N-body problems in general. These problems are considered hard because they actually are. There are some "short cuts", basically field-like approximations, that one can use to make some generally reasonable arguments about the interaction with many distant objects. The actual dynamics, and the details of Poincaré maps and phase-space portraits, are, again, very well studied phenomena, with 100+ years of research behind them.

If they wanted a simpler dynamic problem to work with, the double pendulum exhibits chaotic behaviour. All they need to vary are 2 parameters for each arm/mass, length and mass. See how close they come to actual behavior, onset of trajectory divergence, etc.
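That experiment is cheap to set up. A sketch using the standard double-pendulum equations of motion and an RK4 integrator (parameter names and values are mine); two trajectories started a tiny angle apart diverge rapidly in the chaotic regime:

```python
import numpy as np

G = 9.81  # gravitational acceleration

def deriv(s, m1=1.0, m2=1.0, l1=1.0, l2=1.0):
    """Equations of motion for a planar double pendulum,
    state s = (theta1, theta2, omega1, omega2)."""
    t1, t2, w1, w2 = s
    d = t1 - t2
    den = 2 * m1 + m2 - m2 * np.cos(2 * d)
    a1 = (-G * (2 * m1 + m2) * np.sin(t1)
          - m2 * G * np.sin(t1 - 2 * t2)
          - 2 * np.sin(d) * m2 * (w2 ** 2 * l2 + w1 ** 2 * l1 * np.cos(d))
          ) / (l1 * den)
    a2 = (2 * np.sin(d) * (w1 ** 2 * l1 * (m1 + m2)
          + G * (m1 + m2) * np.cos(t1)
          + w2 ** 2 * l2 * m2 * np.cos(d))
          ) / (l2 * den)
    return np.array([w1, w2, a1, a2])

def rk4(s, dt, steps):
    """Classic 4th-order Runge-Kutta integration of the state."""
    for _ in range(steps):
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return s
```

Training a net on trajectories of this system and checking when its prediction departs from the integrated one would probe exactly the onset-of-divergence question raised above.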

[edit: slight clarification of second network computing velocity for kinetic energy]


Physics PhD here. I'd call it both. A lot of what applied math departments study are problems of physical nature; and a lot of physics research boils down to applied math. (Broadly speaking, all theoretical physics is applied math.)

Guess I'm biased too because I'm a theorist and started out from a math background.


I went the other direction, physics to math, but agree with your broad characterization. Not all applied math is physics, obviously. I do find mathematicians and physicists often have different characteristic ways of thinking about similar things, but the lines are blurry.


How is it different?

Edit: Sorry, I assume you refer to the OP in which case yes, I agree that it's not a solution at all. Rather, it is a presumably faster-than-before simulation.


It's not a surprise that after brute forcing a problem with neural networks and generating a black box model it becomes much faster than on-the-fly computations.

The post made me remember this article about predicting chaotic equations with machine learning with great accuracy:

https://www.quantamagazine.org/machine-learnings-amazing-abi...


I've been wondering if neural nets will be able to find solutions to PDEs with sparse inputs such as this one. Intuitively it makes sense that fast solvers would exist for any PDE where the input data is either very sparse or highly symmetrical. A lot of the PDE domain can be "meshed" advantageously, with portions of the domain either reusing computation or being wiped out as insignificant contributions. A neural net can learn to represent a PDE domain in terms of such regions. I am really curious if this is what happens here, and if there is something vastly beyond what current PDE solvers are able to do.

But I also imagine that as the input size grows beyond the order of the hidden layer widths, it will lose the advantage.


Some earthquake researchers I know have started using neural net approximations of PDEs for doing viscoelastic calculations [1] (simulating how the Earth's crust and upper mantle respond during and after an earthquake). They have found pretty huge speedups (5,000% on CPU to 50,000% on GPU) using neural nets with good accuracy, but I am not sure that the PDE meshing was super highly optimized.

[1]: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1002/2017...

edit: Whether these NN approximations will enable us to go well beyond what PDE solvers can do is a tough question, especially because PDE solvers are generally needed to build the training sets as far as I understand (i.e., they are trained on synthetic, not observational, data). It's possible that one could strategically and judiciously train smaller components of a NN system and then assemble the system to do things that the PDE solvers couldn't do, perhaps in the way that one can often write unit tests for scientific software but not really test the final results directly (because there is nothing to test against). However, I think that in general the lack of training data for the more complex system will always be a major obstacle, whereas it isn't really for PDEs.


But how are you going to trust it?

If the neural net gives you a solution, is there any cheap way to confirm it?


Yeah. Guarantees about robustness are pretty useful.

That said, even if you didn't trust it, it might be possible to use the approximate alleged solution spat out by the neural net as an initial guess for a trusted solver with guarantees. In some cases you might be able to get a guaranteed result from the proper solver, with the neural net solution used to accelerate convergence.


Depends on the problem, but in many situations, yes. For PDEs you can plug in the proposed solution and calculate the error. Usually you'd do this in a discretized version of the PDE, but since neural nets are symbolically differentiable, you could instead approximate (by sampling) the average error against the real PDE itself.
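A sketch of that residual check, with a stand-in equation u'' + u = 0 and finite differences in place of symbolic differentiation (any candidate solution, neural net output or not, can be scored this way):

```python
import numpy as np

def residual(u, xs, h=1e-3):
    """Mean absolute residual of the stand-in ODE u'' + u = 0 at the
    sample points xs, with u'' estimated by central finite differences.
    A true solution (e.g. sin) scores near zero; a wrong one doesn't."""
    u = np.vectorize(u)
    upp = (u(xs + h) - 2 * u(xs) + u(xs - h)) / h ** 2
    return np.mean(np.abs(upp + u(xs)))
```

With an actual neural net one would differentiate the network symbolically (autodiff) instead of by finite differences, as the comment above suggests.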


Sounds like a good way to train an adversarial network.


"trust, but verify".

You can get decent error estimates for lots of these sorts of systems, with a manageable amount of extra computation.


Not had a chance to read the paper yet, but the ability to find computational shortcuts in an epistemically minimalist way (i.e., essentially admitting that we have no idea how to solve the problem and letting a computer iteratively work it out) for solving analytically irreducible PDEs makes me very bearish on quantum computing. A quantum system can be described by Schrödinger's equation, which is no more esoteric than any other wave equation. If it's hard to solve now, maybe it's just because the computationally efficient methods can't be reached by mathematical analysis. But that doesn't mean those computational methods don't exist.


I like all the amazing comments and critique here, but have we considered that, somehow, the neural network was able to formulate the concept of gravity, emulate a chaotic system, and do all of it at a constant computational cost? It is a bunch of interacting units that reconfigured itself, using gradient descent, to behave like a chaotic three-body system. I find that quite fascinating. Although I would agree that the hype created is not justified, this neural network approach bypasses concepts of gravity, Newtonian mechanics, chaos, and ODE integration, and gives a simple function to calculate position at arbitrary times. Quite interesting.


Can someone explain: when they say solve, do they mean it can show the state after N iterations accurately, after sufficient computation? Or do they mean it can find a formula for the bodies where the state can be predicted in constant time regardless of iteration?


The first. For N >= 3, there is no such general closed-form formula (this has been proven).


Thanks, didn't know the lack of a formula was proven. I thought it was simply not found yet.


This may be news for reasons I’m not aware of (it got an MIT writeup), but as someone who worked for a bit in scientific computing, using a neural network to build a reduced model has basically no novelty to me.


Possibly unrelated, but did anyone ever attempt to accelerate the solution of linear systems of different kinds using neural networks?


Basically, memoization, trading off accuracy for space instead of time?

It’s also faster since you never factor in the training time for the NN, all right.



