
Scientists create first billion-atom biomolecular simulation - respinal
https://www.lanl.gov/discover/news-release-archive/2019/April/0422-atom-biomolecular-simulation.php
======
jryb
Biologist's perspective: this paper is _not_ about the biology. The
simulation performed here has zero biological interest - the point of the
paper was to show how efficient and scalable their software is. This article
about the paper is terrible, but honestly I feel like they should be given a
pass - it's hard to justify to a lay audience that understands neither DNA
nor memory bandwidth why you would choose to study something that is of no use
to the field.

There's an inherent tension when doing a method-development paper: if your
result is too fantastical, it's hard to know whether it's an artifact of your
(potentially faulty) technique, while if the thing you're studying is well
understood, it serves as a good control but is less interesting. I suspect
they chose the latter path, since a well-understood system needs no separate
validation via existing methodologies.

~~~
stochastic_monk
I'm not familiar with this method, but NAMD and CHARMM are a great way to
inspect biological systems at greater resolution than experimental methods can
measure. There was really cool work on fibrinogen about a decade ago showing
how it provides elasticity.

Reference:
https://www.ks.uiuc.edu/Highlights/?section=2008&highlight=2008-02

~~~
dekhn
That's a PR narrative twisted to make the results sound much more useful than
they really were :(

------
AllegedAlec
While very impressive, I'm not sure about their goal:

> Modeling genes at the atomistic level is the first step toward creating a
> complete explanation of how DNA expands and contracts, which controls
> genetic on/off switching.

That does not work in isolation like this: McGuffee and Elcock did some
fantastic work in 2010 showing that protein stability depends strongly on how
insanely densely packed the cytoplasm is. The same is true for the nucleus of
a cell. You can't make any serious claims of a bottom-up explanation of gene
regulation and expression unless you can model the entire nucleus, which is
out of our reach for years and years to come.
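For a sense of scale, here's a rough back-of-the-envelope (my own assumptions:
a ~6 µm nucleus filled at water-like atom density, neither figure from the
paper):

```python
# Rough estimate: how many atoms would a whole-nucleus simulation need?
# Assumes a ~6 micron diameter nucleus at roughly water density --
# ballpark inputs, not numbers from the paper.
import math

radius_m = 3e-6                                     # ~6 um diameter nucleus
volume_L = (4 / 3) * math.pi * radius_m**3 * 1e3    # m^3 -> litres

molecules_per_L = 6.022e23 / 0.018                  # water: 18 g/mol at ~1 kg/L
atoms = volume_L * molecules_per_L * 3              # 3 atoms per water molecule

print(f"~{atoms:.0e} atoms")                        # ~1e13: ~10,000x this paper's run
```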

And that is before I start on whether this is an actually useful way to
research gene expression in general, which I'd very much argue against.

~~~
cjslep
Depending on how they model the simulation's fields, long-range interatomic
forces, and boundary conditions, I'd still say this kind of molecular dynamics
simulation is better than doing nothing while waiting to simulate the whole
nucleus. It could have accounted for these sorts of effects in a way that is
cruder than a whole-nucleus simulation, but I haven't read any papers about
this simulation.

~~~
jpvelez
Fascinating discussion! Could any of you knowledgeable folks say a bit more
about 1. why whole-nucleus simulation is so hard and out of reach, 2. what
role simulation currently plays in understanding gene expression, and 3. to
what extent simulation will matter, over the long term, for understanding how
genes translate to phenotypes?

I know next to nothing about this field or the relevant biology, but would
love to learn more!

~~~
saalweachter
Time is the problem.

Imagine you're doing a straightforward computer simulation. You have a state-
of-the-world, a time-step, and rules for how the world evolves during that
time-step.

You might write your own orbital simulation; the positions of the various
planets and asteroids are your state, you might use a time-step of a minute or
an hour or a day, and your rules for how the state evolves during each time
step might just be Newton's laws of gravity and motion. You can probably
update your state in a fraction of a second, even if you are simulating
hundreds or thousands of planetary bodies, and you can simulate eons in hours.
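
To make that concrete, here's a minimal sketch of such a loop (my own toy
Python with textbook Sun/Earth values; not anyone's production code):

```python
# State: positions and velocities. Rules: Newtonian gravity, advanced one
# time-step at a time with a simple leapfrog integrator.
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def accelerations(pos, mass):
    """Pairwise gravitational acceleration on every body."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        r = pos - pos[i]                 # vectors from body i to the others
        d = np.linalg.norm(r, axis=1)
        d[i] = np.inf                    # no self-interaction
        acc[i] = (G * mass[:, None] * r / d[:, None] ** 3).sum(axis=0)
    return acc

def step(pos, vel, mass, dt):
    """Advance the state-of-the-world by one time-step."""
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * accelerations(pos, mass)
    return pos, vel

# Sun + Earth with an hour-long time-step: a year of orbit in a blink.
mass = np.array([1.989e30, 5.972e24])                # kg
pos = np.array([[0.0, 0.0], [1.496e11, 0.0]])        # m (1 AU)
vel = np.array([[0.0, 0.0], [0.0, 2.978e4]])         # m/s (orbital speed)
for _ in range(24 * 365):
    pos, vel = step(pos, vel, mass, dt=3600.0)
```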

The problem with simulating biology on an atomic level is that your time-step
needs to be in the femtosecond to picosecond range, because atoms can move
_fast_, and small differences in their position make big differences, but
interesting things like protein folding can take _minutes_ of real time.

So to simulate anything interesting, you need to step through _quadrillions_
of time steps.
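
To put numbers on "quadrillions" (a quick calculation, borrowing the ~1 ns/day
throughput quoted further down the thread):

```python
# Femtosecond steps vs. minutes of biology: the step count, and the
# wall-clock cost at the 1 ns/day rate reported for the 130k-core run.
dt = 1e-15                           # 1 fs time-step, in seconds
target = 60.0                        # one minute of biological time

print(f"{target / dt:.0e} steps")    # 6e+16 -- tens of quadrillions

ns_per_day = 1.0
days = target * 1e9 / ns_per_day     # the minute expressed in ns, at 1 ns/day
print(f"{days / 365:.1e} years")     # ~1.6e+8 years of wall-clock time
```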

Oh, and you're running this on more than one processor, so you need to do
smart things so your simulation isn't trying to synchronize all of its
processors on every step; oh, and we can't tell you the exact position of
every atom in a cell to start with, or even necessarily all of the different
types of molecules that are present. Oh, and all of these cells -- or pieces
of cellular machinery -- behave in ways that are dependent upon the
environment they are immersed in, so you also need to simulate that somehow.
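
One of those "smart things" is spatial domain decomposition: because forces
are cut off at a finite range, each processor can own one region of space and
exchange atoms only with adjacent regions instead of synchronizing globally.
Here's a runnable 1-D toy of the idea (my own sketch; real codes like NAMD are
far more sophisticated):

```python
# Bin atoms into cells the width of the force cutoff; a cell's pairs come
# from itself and its two neighbours only, so the processor owning a cell
# never needs to talk to distant domains on a normal step.
import random

CUTOFF = 1.0
N_CELLS = 8                          # think: one cell per processor
L = N_CELLS * CUTOFF                 # periodic box; cell width == cutoff

atoms = [random.uniform(0, L) for _ in range(1000)]

cells = [[] for _ in range(N_CELLS)]
for x in atoms:
    cells[min(int(x / CUTOFF), N_CELLS - 1)].append(x)

def dist(a, b):
    """Minimum-image distance in the periodic box."""
    d = abs(a - b)
    return min(d, L - d)

for c in range(N_CELLS):
    halo = cells[(c - 1) % N_CELLS] + cells[(c + 1) % N_CELLS]   # neighbours only
    pairs = [(a, b) for a in cells[c] for b in cells[c] + halo
             if a is not b and dist(a, b) < CUTOFF]
    # ...short-range forces would be computed over `pairs` here...
```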

There's a reason Randall Munroe described the difficulty of protein folding as
"we may one day find a harder problem".

------
callesgg
"130,000 processor cores with 1 ns/day"

Still a bit too slow... I guess it's progress :)

~~~
dekhn
When I wrote my thesis in 2001 I got 1 ns/week on a much smaller system (1M
atoms running on a T3E w/ ~256-512 processors).

It's not really clear whether MD simulations like this are true contributions
or whether they will ever produce useful results compared to well-
parameterized neural nets.

~~~
bocklund
Are NNs producing useful results in this area? I'm on the density functional
theory side of things, so much smaller scales than MD.

To me, it's not clear whether we are good at quantifying when a neural net is
well-parameterized in atomistic simulations.

The limitations of MD simulations and all the built in assumptions are well
understood. We can reason about where our simulations might fail, even for
very large systems. It's always a question about whether the underlying model
can capture all the physics that occurs.

With a NN or other black-box model, there's no way to reason about it except
by benchmarking on test/validation sets that (hopefully) capture the physics
you care about, and even then you cannot really reason about how well the
model will extrapolate to multiple interacting physical effects with different
magnitudes at different length scales.

~~~
dekhn
I don't think anybody has shown that NNs produce better predictions for
protein models (yet).

The limitations of MD aren't well understood. We don't know the implications
of using a polarizable water model other than to say "it should be more
accurate".

All the statements you make about NNs apply to MDs since MDs are basically
feedforward ASTs with the same level of complexity and non-linearity as NNs.

------
tlobes
I'm curious to know how much room there is for optimization in their code.

------
jayalpha
Sounds great. Again, how many mols is that? ;-)
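
For the record, a one-liner (just Avogadro's number; nothing from the paper):

```python
N_A = 6.022e23                       # Avogadro's number, atoms/mol
print(f"{1e9 / N_A:.1e} mol")        # ~1.7e-15 mol -- a couple of femtomoles
```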

------
moonbug
Must be Gordon Bell Prize time again.

~~~
safsafsafa
1E9 atoms / 130,000 cores ≈ 7,700 atoms/core.

In 2011 I was building 40,000-atom models on a 2001 SGI Fuel and simulating
them on a single 8-core node at 1 ns/week -> 5,000 atoms/core. They're
sustaining 1 ns/day where I got 1 ns/week, on a heavier per-core load -- call
it roughly an order of magnitude more per-core throughput, 8 years later.

So from a computation point of view, meh.

From a tools point of view this is cool. It isn't trivial to build or analyze
a model of 40,000 atoms, never mind 1E9 atoms.
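
Working those numbers through (using only the figures quoted above):

```python
# Per-core load and per-core throughput, the 2019 run vs. my 2011 setup.
atoms_2019, cores_2019, ns_day_2019 = 1e9, 130_000, 1.0       # 1 ns/day
atoms_2011, cores_2011, ns_day_2011 = 40_000, 8, 1.0 / 7      # 1 ns/week

print(atoms_2019 / cores_2019)       # ~7,700 atoms/core
print(atoms_2011 / cores_2011)       # 5,000 atoms/core

# Normalized throughput, atom*ns per core-day:
t_2019 = atoms_2019 * ns_day_2019 / cores_2019                # ~7,700
t_2011 = atoms_2011 * ns_day_2011 / cores_2011                # ~714
print(t_2019 / t_2011)               # ~11x per-core gain in 8 years
```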

~~~
moonbug
weak scaling is easy.

