What exascale computing could mean for chemistry (acs.org)
46 points by Trouble_007 on Sept 11, 2022 | 25 comments



Frontier (supercomputer) - Hewlett Packard Enterprise Frontier (OLCF-5) : https://en.m.wikipedia.org/wiki/Frontier_(supercomputer)

TOP-500 » June 2022 : https://www.top500.org/lists/top500/2022/06/

HN discussion of TOP-500 » 67 days ago » 71 comments : https://news.ycombinator.com/item?id=32002830

Aurora exascale Supercomputer – Planned to be completed in late 2022 : https://en.m.wikipedia.org/wiki/Aurora_(supercomputer)



That's a generous use of the word "info"

There are so many buzzwords you'd think they were shilling crypto.


ORNL Celebrates Launch of Frontier – the World’s Fastest Supercomputer : https://www.olcf.ornl.gov/2022/08/17/ornl-celebrates-launch-...

Science at ORNL : https://www.olcf.ornl.gov/leadership-science/

---

OLCF researchers win R&D 100 award

Team honored for work on Flash-X software simulation package : https://www.olcf.ornl.gov/2022/09/08/olcf-research-team-wins...


This is exactly like asking: what can exascale do for larger-scale quantum simulation?

Not much! These devices are vanity projects and prey upon people's intellectual blindness in the face of giant numbers.


This is… amazingly misinformed. I am assuming you've never done scientific simulation work if you think that. Physical simulations in many fields get better from increases in compute, memory, and bandwidth faster than they do from algorithmic improvements (there are only so many algorithmic improvements one can make to a PDE solver). And certain problems simply can't be simulated until a certain amount of compute (and, more importantly, memory) is available.

And while some of the time the entire cluster will be given to a single large scale project, most of the time it will be acting as a massive GPU farm for all sorts of research. A win-win for everyone.


I have my Ph.D. in this exact field from UC Berkeley. My thesis was about polynomial scaling algorithms for quantum molecular systems. I was a postdoc at Harvard. I received an NSF career award. I know exactly what I'm talking about lol.

These computer platforms are drastically inefficient on a flop / $ basis. They exist to funnel money into the pockets of the companies who assemble them. They never ever achieve even a tiny fraction of their peak rated flops on any calculation that has any scientific meaning.


It sounds like your experience is exclusively with open-science DoE user facility machines. What you're saying is true for the bulk of working scientists, not enough of whom are given funding to make efficient code -- just to make working code. However, even on those systems there is some good competition for the Gordon Bell prize each year.

Meanwhile, Defense and closed-science systems of similar scale continue to be used at very good efficiency on problems that are strictly non-feasible on smaller clusters. The leadership-class systems are prestigious, and that prestige helps drive needed technological advances, even if the places that need them aren't in the university system.


Can I ask what's standing in the critical path to obtaining scientifically efficient compute?

Asking as a private user of commercially non-trivial compute, but very short on the research depth required to translate optimal thinking into efficiency.

Edit: we're similarly bound by e.g. PDE solutions. We've found the greatest improvements in rolling our own storage. Not purely capex improvements, but orders of magnitude in ingest.


If your PDE has spatial locality, parallelization works. I suspect you might be talking about multidimensional diffusions such as occur in finance. These decompose locally in price space. If the PDE is nonlocal, there is very little that works in general. I wrote a paper on parallelization in the time dimension, but it works poorly.
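To make the spatial-locality point concrete, here is a minimal sketch in Python of a 1D explicit heat-equation solver split into per-worker subdomains with halo exchange; the grid size, worker count, and coefficients are arbitrary toy values:

    import numpy as np

    # 1D explicit heat equation: a spatially local PDE, so each subdomain
    # only needs its neighbours' boundary ("halo") values each time step.
    nx, workers, alpha, dt, dx = 64, 4, 0.1, 0.01, 1.0
    u = np.sin(np.linspace(0.0, np.pi, nx))
    chunks = np.array_split(u, workers)      # stand-in for per-node subdomains

    for _ in range(100):
        updated = []
        for i, c in enumerate(chunks):
            left = chunks[i - 1][-1] if i > 0 else c[0]             # halo from left neighbour
            right = chunks[i + 1][0] if i < workers - 1 else c[-1]  # halo from right neighbour
            padded = np.concatenate(([left], c, [right]))
            updated.append(c + alpha * dt / dx**2 * np.diff(padded, 2))
        chunks = updated

    # Communication per step is one value per subdomain boundary, independent
    # of the interior size -- that is what "spatial locality" buys you.
    u = np.concatenate(chunks)

A nonlocal PDE would instead need every subdomain to see the whole state at every step, which is exactly where this kind of decomposition breaks down.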


Probably because everyone is coding software using Electron.

(While I am picking on Electron, the truth is, if compute power exists, it seems devs never need to rein in code or worry about efficiency.

Where I work, we purposefully set devs loose in VMs with minimal RAM and minimal CPU. If your app can't work with small RAM and a small CPU, how on Earth will it scale to 100s of requests per second? Compute costs.)


> These computer platforms are drastically inefficient on a flop / $ basis. They exist to funnel money into the pockets of the companies who assemble them. They never ever achieve even a tiny fraction of their peak rated flops on any calculation that has any scientific meaning.

Well... yeah.

Because these supercomputers also need communication networks so that they can actually work on such a large problem together. So any "large computer" is going to be less efficient per flop than any "small personal computer," because communication costs grow with the size of the machine.

A computer with a million cores needs more than a million times the communication of a single-core computer. That's just the innate issue of communication complexity, Amdahl's law, and other such fundamental limits.
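For reference, Amdahl's law in two lines of Python (the 99.9% parallel fraction below is an arbitrary assumption, just to show the shape of the curve):

    def amdahl_speedup(parallel_fraction, n_cores):
        """Maximum speedup when only part of the work parallelizes."""
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

    # Even with 99.9% of the work perfectly parallel, a million cores
    # tops out at less than a 1000x speedup over a single core.
    print(amdahl_speedup(0.999, 1_000_000))   # ~999.0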

---------

But it is *impossible* for the "small personal computer" to work on a larger problem. The "small computer" doesn't have enough RAM to even hold a problem that these supercomputers work on, let alone the time/energy needed to finish the problem within 2 months.

---------

At a minimum, supercomputers are needed to solve and verify the models of the next generation of computers. It's not like these chips with 8 billion transistors in them are correct on their first design. The design is iterated upon, simulated, and verified before hardware is made. These simulation steps happen on a computer, and a rather large one at that.


Most supercomputer codes have communication patterns that scale sublinearly with node count.


Also, this is not just a hardware issue. Many HPC systems in academic research have very mixed workloads that are hard to optimise an HPC system's design for. The majority of researchers have little interest or time to spend optimising their code, and they seldom have enough experience in programming or software engineering to know how to optimise without assistance (in our case we pick out the worst system hogs that are affecting other users on our HPC systems).


Alpineidyll already commented on QM calculations, but I can attest these machines are useless for molecular dynamics as well. In order to actually run in any even vaguely "efficient" regime (i.e., be faster than a single GPU on your exaflop machine), you need to scale the problem to a size at which you can no longer approach interesting timescales for the dynamics involved. A 10-nanosecond simulation of a 100M-particle system is useless. The problem almost immediately becomes latency bound when you use multiple chips. The first-generation Anton machine from DE Shaw Research (which they built in 2008!) is still over an order of magnitude faster than any existing machine (in 2022!) that isn't a newer Anton when it comes to MD.

If they wanted a GPU farm they could probably have built it for 1/10 the cost or less by throwing out the interconnect and infrastructure that makes it a “supercomputer”.
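To make the latency-bound claim concrete, a back-of-envelope sketch in Python; the 2 fs timestep is typical for MD, but the per-step communication cost is an illustrative assumption, not a measurement:

    # Why strong-scaling MD hits a latency floor (illustrative numbers).
    timestep_fs = 2.0                      # typical MD integration step
    target_ns = 10.0                       # trajectory length from the comment
    steps = target_ns * 1e6 / timestep_fs  # 10 ns / 2 fs = 5,000,000 steps

    per_step_latency_s = 5e-6              # assumed cost of one inter-node exchange
    floor_s = steps * per_step_latency_s
    print(f"{steps:.0f} steps -> at least {floor_s:.0f} s of pure communication")

    # That floor is set by latency, not flops: adding more GPUs does not
    # lower it, which is why Anton attacks latency with custom hardware.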


Well put. Of course, Anton's force field isn't accurate enough to correctly predict conformational equilibria of proteins ;P although it does scale!


That's just a general limitation of the fixed-charge AMBER/CHARMM molecular mechanics force fields everyone uses for molecular dynamics (whether you're running on an Anton, or running GROMACS or NAMD on a cluster or GPU, they all use the same MM force fields).


If you understood anything about how literally every chemistry problem scales non-linearly and requires an exponential amount of sampling to yield any predictive meaning, you wouldn't talk about things you don't understand lol. Peta=>Exa equals 1000x, so you can get 31x the accuracy in a Monte Carlo assuming perfect parallelism (at 1000x the cost), and that's one of the better-scaling things it could be used for.
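That 31x figure is just the 1/sqrt(N) Monte Carlo error law with N scaled by 1000; a toy check in Python (the observable is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(0)

    def standard_error(n_samples):
        # Standard error of a Monte Carlo mean estimate for a toy observable.
        samples = rng.standard_normal(n_samples) ** 2
        return samples.std() / np.sqrt(n_samples)

    # 1000x the samples shrinks the error by ~sqrt(1000) ~ 31.6x.
    print(standard_error(10_000) / standard_error(10_000_000))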

It's fun to have big toys, sure, and it's fun to make big GPU clusters. But if they spent what they spend on this computer on just funding students to solve problems and tossing each of them a 3090, 100000000x more scientific breakthroughs would happen. This machine is 60% paperweight, at best, with a hefty budget for good old-fashioned contract pork.


I kind of assumed (from a largely ignorant perspective) that this was part vanity/pork project as you say, but that the real underlying purpose was either to simulate nuclear explosions for the DoE or to synthesize stupidly large AI models with a trillion variables.

If the people in charge of this thing had an epiphany (or a blackout, take your pick) and left you with the keys for a year, what could you do with this inefficient but impressively large cluster, besides anchoring your paperwork?


At this point it seems like the folks building supercomputers for DoE and the folks building TPUs for Google should be talking. Both sides have interesting technology, but the communities are too far apart. Modern ML training is so similar to supercomputing that I think, by working together, DeepMind could make better use of an exascale supercomputer than most scientists running simulations. Some of the network innovations in TPUv4 could be used in supercomputers.


100% agree with this, although mini-batch SGD is much more parallel than most problems.

The scientific codes which parallelize poorly often do so because they are written in ancient languages, with support for whatever supercomputer interconnect bolted on poorly, whereas TPUs + JAX have beautiful functional abstractions for distributed tensor computations.

Just funding re-writes of all the basic math/physics stack into a language with a PORTABLE parallel functional design and perhaps a compilation layer would definitely get more basic science done than this thing.
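As a rough illustration of what those abstractions look like (not a production code), a data-parallel stencil step with jax.pmap; the toy kernel and sizes are mine, and cross-device halo exchange is omitted to keep it short:

    import jax
    import jax.numpy as jnp

    def step(u):
        # Toy 1D Laplacian smoothing, standing in for a PDE kernel.
        return u + 0.1 * (jnp.roll(u, 1) + jnp.roll(u, -1) - 2.0 * u)

    n_dev = jax.local_device_count()
    u = jnp.zeros((n_dev, 1024)).at[:, 512].set(1.0)  # one shard per device
    u = jax.pmap(step)(u)   # same kernel on every device, no hand-written MPI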


> Peta=>Exa equals 1000x so you can get 31x the accuracy in a monte carlo assuming perfect parallelism

True.

> (at 1000x the cost)

Not as true. Summit, the previous machine, hit 148 petaflops (Rmax) at a cost of $325 million. Frontier has already hit 1102 petaflops (Rmax) at a cost of just under $600 million.
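Spelling out the flop-per-dollar arithmetic from those two figures:

    # Rmax petaflops per million dollars, using the numbers quoted above.
    summit = 148 / 325        # ~0.46 PF per $M
    frontier = 1102 / 600     # ~1.84 PF per $M
    print(frontier / summit)  # ~4x better flops per dollar, generation over generation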


You're right.


Is there a limit to the usability of that accuracy, or will there soon be a huge increase in the demand for electricity that is limited only by the ability of NVIDIA and AMD to produce GPUs?


I've run sims on supercomputers and other HPC environments. MD and QM simulations really don't solve any problems; they just generate press releases.



