
AWS Outlines Current HPC Cloud User Trends - jonbaer
https://www.nextplatform.com/2017/01/26/aws-outlines-current-hpc-cloud-user-trends/
======
CaliforniaKarl
Although AWS is making progress, I doubt that they'll be able to take over the
entire HPC space[1].

First off, if a research group does not have access to a data center[2], then
AWS and GCP are the way to go. Just off the cuff, I'd say go with GCP unless
you have big-memory or specialized hardware needs, in which case go with AWS.

Other than that, AWS (and GCP) are great if you need to quickly spin up
something to investigate the usefulness of new technologies: If you're not
sure if a GPGPU or ASIC is going to work for your code, then test it at AWS.
But, if you plan on doing _a lot_ of computing, then once you've done the
validation, it's typically worth investing in the hardware so you can run it
locally.
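
As a rough way to reason about "worth investing in the hardware", here is a
minimal break-even sketch. The function name and every dollar figure are
hypothetical, made up for illustration; plug in your own quotes for hardware,
power/cooling/admin, and cloud time.

```python
def break_even_hours(hardware_cost, onprem_hourly_cost, cloud_hourly_cost):
    """Return the compute hours after which buying beats renting.

    hardware_cost      -- one-time purchase price of the local hardware
    onprem_hourly_cost -- running cost per hour once you own it
    cloud_hourly_cost  -- price per hour for equivalent cloud capacity
    Returns None if the cloud is not more expensive per hour, because
    then the purchase never pays for itself.
    """
    saving_per_hour = cloud_hourly_cost - onprem_hourly_cost
    if saving_per_hour <= 0:
        return None
    return hardware_cost / saving_per_hour


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    hours = break_even_hours(
        hardware_cost=10_000.0,
        onprem_hourly_cost=0.5,
        cloud_hourly_cost=2.5,
    )
    print(f"Break-even after {hours:.0f} hours of compute")
```

If your expected usage is well past the break-even point, buying starts to
look attractive; if it's well short of it, stay in the cloud.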

Another reason, related to hardware, is that some places treat capital costs
(like computing hardware) more favorably than regular expenses (like AWS and
GCP time). If AWS could come up with a GAAP method for capitalizing AWS time,
then that would help them out a lot.

MPI is mentioned a lot in the article, and it's definitely an issue that
affects the more "traditional HPC" crowd, but there's something else: A lot of
clusters use InfiniBand, for storage access and also for MPI (and other
inter-node communication). InfiniBand is not just high-bandwidth (single-lane
EDR is about 25 Gbps, and a 4x link is almost 100), it's also _very_
low-latency (less than a microsecond). The filesystems used over IB (like
Lustre) are similarly high-performance.

The speeds (and, just as important, the latencies) in HPC data centers are
amazing, much better than what you get with 10 Gbps Ethernet. People who have
experienced that want to keep it. However, IB fabrics use a different kind of
topology than Ethernet, one that doesn't work as well with the "your X can
land anywhere in an AZ" nature of AWS.
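
To show why latency matters as much as bandwidth, here is a simple
latency-plus-bandwidth estimate of the time to send one message. The 100 Gbps
and 1 microsecond IB figures come from the numbers above; the 50 microsecond
Ethernet latency is purely an assumption for illustration (real values vary a
lot with the NIC, switches, and software stack), and the function name is made
up.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbps):
    """Estimate one-way message time in microseconds.

    Simple model: fixed latency plus serialization time.
    1 Gbps moves 1000 bits per microsecond.
    """
    bits = message_bytes * 8
    return latency_us + bits / (bandwidth_gbps * 1000)


if __name__ == "__main__":
    # (latency in microseconds, bandwidth in Gbps)
    links = {
        "IB EDR 4x": (1.0, 100.0),        # figures from the comment above
        "10 Gbps Ethernet": (50.0, 10.0),  # latency is an assumed value
    }
    for size in (8, 1_000_000):  # a tiny MPI-style message and a 1 MB one
        for name, (latency, bandwidth) in links.items():
            t = transfer_time_us(size, latency, bandwidth)
            print(f"{size:>9} bytes over {name}: {t:.2f} us")
```

For the tiny message the time is almost entirely latency, which is why
tightly-coupled MPI codes feel the difference so strongly; for the large
message, bandwidth dominates.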

I'd also suggest that AWS get involved in SLURM development. Don't take it
over or absorb it; work with SLURM developers to get it to work naturally with
AWS.

So, unless (or until) AWS can address those issues, there's still a big place
for on-prem HPC!

[1] For anyone who might reply "AWS isn't trying to take over all of it", I
would say that, in the backs of some people's minds, I bet they are.

[2] "Data Center" could mean a rack or cage at a colo facility, but I am _not_
referring to a small server room or closet; you typically need more power
and/or cooling than those rooms can provide.

