
Anton (Computer) - bane
https://en.wikipedia.org/wiki/Anton_(computer)
======
mechagodzilla
To make this more timely, D. E. Shaw Research has been posting COVID-19-related
simulations done on their Antons:
[http://www.deshawresearch.com/resources_sarscov2.html](http://www.deshawresearch.com/resources_sarscov2.html)

------
brianyu8
“The performance of a 512-node Anton machine is over 17,000 nanoseconds of
simulated time per day for a protein-water system consisting of 23,558
atoms.[5] In comparison, MD codes running on general-purpose parallel
computers with hundreds or thousands of processor cores achieve simulation
rates of up to a few hundred nanoseconds per day on the same chemical system.”

17,000 ns of simulated time per day sounds crazy small, but I wonder how that
compares to the timescale of molecular interactions. How helpful have Antons
been to pharma research, etc.?

~~~
tchitra
[Disclaimer: I used to work at D. E. Shaw Research from 2011-2016]

The early Anton 1 numbers of 17us/day on 100K atoms were a huge leap forward
at the time. Back then, GPU-based simulations (e.g. GROMACS/Desmond on GPU)
were doing single-digit ns/day. Remember, even for 'fast-folding' proteins,
the relaxation time is on the order of us, and you need 100s of samples before
you can converge statistical properties, like folding rates [0]. Anton 2 got a
50-100x speed-up [1], which made it much easier to look at druggable pathways.
Anton was also used for studying other condensed matter systems, such as
supercooled liquids [2].

Your question of why this is so slow (or the simulations so small) is
prescient. One of the reasons we have to integrate the dynamical equations
(e.g. Newtonian or Hamiltonian mechanics) at small, femtosecond timesteps
(1 fs = 1e-15 s) is that the vibrational periods of bonds are on the order of
picoseconds (1 ps = 1e-12 s). Given that you also have to compute Omega(n^2)
pairwise interactions between n particles, you end up with a large runtime to
reach ns and us of simulated time while respecting bond frequencies. The hard
part for atomistic/all-atom simulation is that n is on the order of 1e5-1e6
for a single protein with 100s of water molecules. The water molecules are
extremely important to simulate exactly, since you need to get polar
phenomena, such as hydrogen bonding, right in order to get folded structures
and druggable sites correct to angstrom precision (1e-10 meters). If you don't
do atomistic simulations (e.g. n is much smaller and you ignore complex
physical interactions, including semi-quantum interactions), you have a much
harder time matching precision experiments.
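
To make the cost concrete, here is a deliberately naive sketch (mine, not
anything like Anton's actual implementation; production codes use cutoffs,
neighbor lists, and Ewald/PME electrostatics rather than this bare O(n^2)
loop) of the work every femtosecond-scale step has to do:

    import numpy as np

    def lj_forces(pos, eps=1.0, sigma=1.0):
        """Naive O(n^2) Lennard-Jones forces in reduced units."""
        n = pos.shape[0]
        f = np.zeros_like(pos)
        for i in range(n):
            for j in range(i + 1, n):
                rij = pos[i] - pos[j]
                r2 = rij @ rij
                s6 = (sigma * sigma / r2) ** 3      # (sigma/r)^6
                fij = 24 * eps * (2 * s6 * s6 - s6) / r2 * rij
                f[i] += fij
                f[j] -= fij
        return f

    def velocity_verlet(pos, vel, dt, steps):
        """Integrate Newton's equations (unit masses) with timestep dt."""
        f = lj_forces(pos)
        for _ in range(steps):
            vel += 0.5 * dt * f        # half kick
            pos += dt * vel            # drift
            f = lj_forces(pos)         # the O(n^2) bill, paid every ~1 fs
            vel += 0.5 * dt * f        # half kick
        return pos, vel

    # 64 atoms on a lattice; a solvated protein is ~1e5-1e6 atoms, and a
    # microsecond of simulated time is ~1e9 of these steps.
    side = 4
    pos = 1.5 * np.array([[x, y, z] for x in range(side)
                          for y in range(side) for z in range(side)], float)
    vel = np.zeros_like(pos)
    pos, vel = velocity_verlet(pos, vel, dt=0.001, steps=10)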

[0]
[https://science.sciencemag.org/content/334/6055/517](https://science.sciencemag.org/content/334/6055/517)

[1]
[https://ieeexplore.ieee.org/abstract/document/7012191/](https://ieeexplore.ieee.org/abstract/document/7012191/)
[the variance comes from the fact that different physics models and densities
cause very different run times -> evaluating 1/r^6 vs. 1/r^12 in fixed
precision is very different w.r.t. communication complexity and Ewald times
and FFTs and ...]

[2]
[https://pubs.acs.org/doi/abs/10.1021/jp402102w](https://pubs.acs.org/doi/abs/10.1021/jp402102w)

~~~
madez
This explanation is interesting. Thanks for sharing it. While reading it, I
got the impression that the simulation is not fully quantum mechanical, but
rather classical with select quantum mechanical effects.

Which parts of quantum mechanics are idealised away and how do we know that
not including them won't significantly reduce the quality of the result?

Are you possibly adding stochastic noise to the simulations and repeating them
multiple times, in the hope that whatever disturbance is caused by the
idealisation of the model is covered by the noise?

~~~
tchitra
That's a good question, and there are a number of ways to try to tackle this.
One of the main reasons you cannot do QM simulations directly is that the
high-quality methods can cost Omega(n^6/eps) to get eps relative accuracy (you
can do better with DFT, but then you're making your life hard in other ways).
At a high level (and I mean, 50,000 ft. level), here are the simplest ways:

1) Do quantum mechanics simulations of interactions of a _small_ number of
atoms (two amino acids, two ethanol molecules). Then fit a classical function
to the surface E[energy(radius between molecules, angles)], where this
expectation operator is the quantum one (over some separable Hilbert space).
Now use the approximation for E[energy(r, a)] as your classical potential. (A
fitting sketch follows after this list.)

- Upshot: You use quantum mechanics to decide a classical potential for you
(e.g. you choose the classical potential that factors into pairs such that
each pair energy is 'closest' in the Hilbert space metric to the quantum
surface).

- Downside: You're doing this for small N, which ignores triplet and higher
interactions. You're also missing the variance and other higher moments (which
is usually fine for biology, FWIW, but not for, say, the Aharonov-Bohm
effect).

2) Path integral methods: This involves running a classical simulation for T
timesteps, then sampling the 'quantum-sensitive pieces' (e.g. highly polar
parts) in a stochastic way. This works because Wick rotation lets you go from
the oscillatory evolution weight e^{iS}, for an action S with Lagrangian
density L, to the real weight e^{-S} [0]. You can sample the latter density
via stochastic methods to add an SDE-like correction to your classical
simulation. This way, you simulate the classical trajectory and have the
quantum portions 'randomly' kick that trajectory based on a _real_ Lagrangian.
(A toy sampling sketch appears after the footnotes.)

3) DFT-augmented potentials: A little more annoying to describe, but think of
this as a combination of the first two methods. A lot of the "Neural Network
for MD" work falls into this category [1].

[0] Yes, assume L is absolutely continuous with respect to whatever metric-
measure space and base measure you're defined over :) Physics is more flexible
than math, so you can make such assumptions and avoid thinking about nuclear
spaces and atomic measures until really needed

[1] [https://arxiv.org/abs/2002.02948](https://arxiv.org/abs/2002.02948)
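
And a toy version of the Wick-rotated sampling in (2): after rotating to
imaginary time, a single quantum degree of freedom becomes a classical ring of
beads whose weight e^{-S} you can sample with Metropolis. This is a 1D
harmonic oscillator of my own construction, not production code; real
path-integral MD couples beads like these to only the quantum-sensitive atoms:

    import numpy as np

    P, beta, sweeps = 32, 10.0, 40000    # beads, inverse temperature, MC moves
    tau = beta / P                       # imaginary-time slice
    rng = np.random.default_rng(1)
    x = np.zeros(P)                      # closed path x_0 ... x_{P-1}

    def euclidean_action(x):
        """Kinetic springs between neighboring beads plus V(x) = x^2/2."""
        springs = np.sum((x - np.roll(x, 1)) ** 2) / (2 * tau)
        return springs + tau * np.sum(0.5 * x ** 2)

    s, samples = euclidean_action(x), []
    for sweep in range(sweeps):
        i = rng.integers(P)              # single-bead Metropolis move
        old = x[i]
        x[i] += rng.normal(0.0, 0.5)
        s_new = euclidean_action(x)
        if rng.random() < np.exp(s - s_new):
            s = s_new                    # accept
        else:
            x[i] = old                   # reject
        if sweep > sweeps // 2:
            samples.append(np.mean(x ** 2))

    print("<x^2> ~", np.mean(samples))   # should land near the exact ~0.5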

~~~
madez
> Upshot: You use quantum mechanics to decide a classical potential for you
> (e.g. you choose the classical potential that factors into pairs such that
> each pair energy is 'closest' in the Hilbert space metric to the quantum
> surface). Downside: You're missing the variance.

Couldn't the quantum mechanical state become multimodal, such that the
classical approximation picks a state that is far away from physical reality?

And couldn't this multimodality be exacerbated during the actual physical
process, possibly arriving at a number of probable outcomes which are never
predicted by the simulation? Is there more than hope that that doesn't happen?

~~~
tchitra
Yes, for sure. In practice (and not at the 50,000 ft. level), you do try to
include the multimodalities; you don't _really_ just use
E[quantum_energy(r)]. But you ARE still reliant on some
computable/smooth/Lipschitz moment and/or expectation of the quantum
surface. The argument for why you get away with this in biological simulation
is semi-heuristic, but of the following form:

- Most quantum field theories are described by a Lagrangian of the form L(E),
where E is an energy scale ["effective field theory"], and the Lagrangian
changes as E changes:

  - When E is low, L(E) is classical mechanics & EM
  - When E is around 1 GeV, L(E) is the aforementioned plus QED
  - When E is around 100 GeV, L(E) has the aforementioned plus some QCD
  - When E is at 1 TeV, L(E) has the aforementioned plus Higgs-like stuff
Now biology is at the lowest end of that scale, so you mainly have to deal
with QED and perturbative electronic expansions. These electronic expansions
are the most important part (you need them to get hydrogen bonding and
electrodynamic molecular interactions correct), BUT they are highly local.

This locality is what you take advantage of when you normalize: you find from
QM that the potentials only matter when the two charged/polar molecules are
close, so you try to make a classical potential that has quantum 'jumps' when
these things are close. (A small sketch of this switching trick follows.)
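
A cartoon of what that looks like as code; every name and number here is made
up for illustration, and real force fields do this with far more care:

    import numpy as np

    def switch(r, r_on=3.0, r_off=4.0):
        """Smoothstep: 1 when the pair is close, 0 when far, C^1 between."""
        t = np.clip((r_off - r) / (r_off - r_on), 0.0, 1.0)
        return t * t * (3.0 - 2.0 * t)

    def pair_energy(r):
        # Cheap classical baseline, valid everywhere.
        baseline = 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
        # Hypothetical QM-fitted hydrogen-bond correction; it only matters
        # at short range, so it is smoothly switched off as r grows.
        hbond = -2.5 * np.exp(-((r - 1.9) ** 2))
        return baseline + switch(r) * hbond

    print(pair_energy(np.array([1.8, 3.5, 5.0])))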

Do you miss the purely quantum stuff? Aharonov-Bohm, Chern classes, and the
like? Of course. But from a practical standpoint, you do get the structures
that you measure in experiments to come out correct, because the 'cool'
quantum stuff with 'tons' of states is less important for pedestrian things at
low energy scales.

It is still hard to get right though! There's a lot of entropy you need to
localize correctly and in some sense, you have to make sure you get the modes
as a function of local particle positions correct.

The final thing to point out is that the Wick-rotated path integral stuff
works for biology much better than for real HEP-type stuff, because molecules
are confined to low energies: those tunneling probabilities are O(h E), and
log(E) is still dwarfed by -log(h), so you _can_ safely ignore them.

This is _not_ true for things like circuits, however, because the lithography
at EUV scales (3nm -_-) does have tunneling issues at high field strengths.

tl;dr: Biology has some saving graces that give you good approximations. Are
they perfect? No, but if you ever find a case where I have to compute a
vanishing first Chern class in a noisy, ugly biological system, then you
deserve a Nobel Prize!

------
kylek
There is an Anton 2 (no Wikipedia article, unfortunately):

[https://insidehpc.com/2016/02/anton-2-supercomputer-at-psc-will-increase-speed-and-size-of-molecular-simulations/](https://insidehpc.com/2016/02/anton-2-supercomputer-at-psc-will-increase-speed-and-size-of-molecular-simulations/)

~~~
zelphirkalt
There is also "Son of Anton".

~~~
mdonahoe
Gilfoyle, is that you?

------
ackbar03
I have a lot of respect for David Shaw. He quit managing his hedge fund day to
day, saying something along the lines of "finance makes me stupid", and went
back to doing something useful. If only more of our elites realized this (and
cared to do something useful with themselves).

~~~
downerending
Me as well. It does seem that the whole point of being a billionaire is to do
whatever you want. I can't imagine why so many seem to stick to managing their
creations, which after a while can't be much fun.

~~~
pm90
Most billionaires got their wealth through inheritance. All they know is to
manage the wealth creation agent that was handed down to them.

Most of the self-made ones, too, have spent a large chunk of their lives
perfecting the wealth generation agent which made them rich. It would be like
asking a pro NBA player to also take a shot at being a pro NFL player. It's
not what they trained for; they would need to learn a new skill and a new
industry, and they would most likely fail anyway.

Only a minority of such billionaires actually end up doing different things;
Elon Musk comes to mind. Most are victims of their previous successes.

~~~
colanderman
To fully connect the dots: after spending many years doing something _really
well_, it can be humbling and depressing to try to extract purpose or identity
from something you're comparatively terrible at. Impactful competence (or at
least the belief thereof) is a tough drug to come off of.

------
andbberger
All these years running laps around everyone else doing MD simulations - what
do they have to show for it in terms of discoveries?

~~~
unemphysbro
It is strange that their academic output isn't on par with some of the more
prominent bio-molecular simulation research groups.

But I don't know much about their internals; perhaps they're leasing a good
bit of computer time to biotech companies.

~~~
LolWolf
I’d say this is very much on purpose (I worked there for two summers, also
with one of the people in this thread). DESRES is very, very particular about
the papers it puts out, so while there is an incredible amount of great
science done by brilliant people who were mostly poached from academia, only
the very top papers ever get published. Many more are written up or kept as
internal documents; the firm only publishes research it considers truly
impactful.

Unlike in academia, there isn't a push to publish merely okay or
average-quality research, since funding is not public and there are no metrics
to chase.

------
unemphysbro
I used to run simulations on this big guy :)

~~~
ggm
Without breaching NDA, do you think it was able to show outcomes which the
naive computer scientist would say justified the approach?

------
Daub
Now seems as good a time as any to share a replica of an Antonie van
Leeuwenhoek microscope that my father made. He made the lens in the same way
as the original: heating glass to a semi-molten state, then letting a drop
fall through the air. By the time it had landed and cooled... voila! A
spherical lens.

[https://www.dropbox.com/s/in4x3vjysw1o1wc/IMG_0291.JPG?dl=0](https://www.dropbox.com/s/in4x3vjysw1o1wc/IMG_0291.JPG?dl=0)

~~~
idoby
Can we please have more information about this? And better pictures?

~~~
Daub
My pleasure. Thanks for taking an interest:

[https://www.dropbox.com/s/qhs8qf2qw5e4n35/micro_01.jpg?dl=0](https://www.dropbox.com/s/qhs8qf2qw5e4n35/micro_01.jpg?dl=0)
From the back

[https://www.dropbox.com/s/4vl3rbfkelf1nv9/micro_02.jpg?dl=0](https://www.dropbox.com/s/4vl3rbfkelf1nv9/micro_02.jpg?dl=0)
Same, but annotated. 1 = the lens housing; 2 = the 'stage', basically just a
pin that you stick the subject onto; 3 = the handle, used to position the
assembly in front of a strong light.

[https://www.dropbox.com/s/4xra81bu2qdsxj9/micro_03.jpg?dl=0](https://www.dropbox.com/s/4xra81bu2qdsxj9/micro_03.jpg?dl=0)
The front, showing the viewing hole

The whole thing actually works. The lens is hit or miss (literally), and it
shows: plenty of chromatic aberration. It's basically just a droplet of glass;
he tried many times to get it right.

Why did he make it? Well... he has a fascination with old technology. He was a
founding member of the British Vintage Wireless Society, and has written a
book on the subject of old radios, as well as one on ancient navigation
techniques. He is an old-fashioned polymath. I also have in my possession a
replica of Galileo's first telescope. It also works fine.

~~~
idoby
Very cool. Thanks!

------
entropi
My master's thesis work was on MD simulations. My setups had around 150k atoms
each, and it took months and hundreds of cores to finish any meaningful
simulation. I was incredibly jealous of that machine.

But frankly, I am still not convinced of the usefulness of MD studies except
in a few cases (docking studies, etc.).

------
person_of_color
It's really hard to get a job at DESRES.

~~~
sabujp
Probably as a SWE or research scientist, yes. I was contacted by them for an
SA position ~6 years ago.

------
antoniuschan99
Thank you, first time hearing about this.

------
sabujp
How does this compare to GPUs?

~~~
unemphysbro
Anton implements the MD calculations directly in hardware, so it's an
application-specific machine.

------
dzonga
Who else thought of Son of Anton, from the show Silicon Valley?

~~~
vermilingua
I would say it's likely that the SV Anton was named after this.

~~~
dunkelheit
I’ve always assumed it was a reference to Anton LaVey (seemed appropriate for
Gilfoyle to name his server after the founder of the church of satan). It
would be interesting to know which theory is correct.

~~~
vermilingua
Good point, and while SV does have a lot of obscure references, this one may
be a _bit_ too obscure. Happy coincidence perhaps.

~~~
bavent
Gilfoyle refers to himself as a LaVeyan satanist early on in the show.

~~~
vermilingua
Yeah, I was referring to the supercomputer as obscure.

