
Japan Captures TOP500 Crown with Arm-Powered Supercomputer - l31g
https://www.top500.org/news/japan-captures-top500-crown-arm-powered-supercomputer/
======
pininja
The link is working for me, here are the details on the winner.

“The new top system, Fugaku, turned in a High Performance Linpack (HPL) result
of 415.5 petaflops, besting the now second-place Summit system by a factor of
2.8x. Fugaku is powered by Fujitsu’s 48-core A64FX SoC, becoming the first
number one system on the list to be powered by ARM processors. In single or
further reduced precision, which are often used in machine learning and AI
applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops).
The new system is installed at RIKEN Center for Computational Science (R-CCS)
in Kobe, Japan.”

~~~
timClicks
Wow, so we've hit exascale. Back in 2012 I heard claims that we would reach
exascale by 2020. I didn't believe them.

~~~
zekrioca
0.5 exascale

------
Animats
The end of the US semiconductor industry is now in sight.

The only US-owned state-of-the-art fabs in the US belong to Intel. Intel
survives because they have a high margin on x86 CPUs. Today, TSMC announced
5nm, and the top supercomputer is ARM-based.

Apple seems to be going ARM. Chromebooks are ARM. Microsoft now offers Windows
on ARM, on the Surface Pro X. Mobile never used x86. x86 is on the way out.
What's left for Intel?

(Micron is still a major force in DRAM, amazingly.)

~~~
pankajdoharey
You see, Intel is a 50-year-old company; do you think they will sit on their
hands? If the majority of the industry shifts towards the ARM ISA, Intel will
evolve. What stops Intel from licensing an ARM core and building an
industry-leading ARM chip? I think no one in the semiconductor industry, with
the exception of Apple, has more resources than Intel to build a world-class
ARM CPU. Intel is just trying to drag x86 as far as possible because it can
monopolise the architecture; AMD and VIA are the only other two vendors with a
license to build x86 processors.

~~~
Animats
_What stops Intel from licensing an ARM core and building an industry-leading
ARM chip?_

That others can compete directly on price. Intel can probably do it
technically, but will not have the margins they had with x86.

~~~
pankajdoharey
Yes, that is true. If the industry shift happens, then the only way they can
keep those margins is to build an equally competent ARM chip, and it's hard to
justify to investors abandoning the billions invested in the current
architecture. Surely the top management and marketing would be fired. But
Intel could do it.

------
gnufx
Perhaps it's worth pointing out some context. Given the remarkable
predecessor, the K computer, this was only a matter of time. (I heard a great
early talk on K, and I wish I knew the speaker's name to give credit; he was
obviously working quite hard in English, but it was flawless, ending with
basically "we did it all ourselves, largely de novo".) It seems that given the
current circumstances, they haven't kept to schedule -- it was supposed to be
operating _next_ year.

There's a lot that's non-mainstream in this, like K, but it's partly
influenced by the K experience. Unusually, it's all apparently specifically
designed for the job, from the processor to the operating system (only partly
GNU/Linux). Notably, despite the innovation, it should still run anything that
can reasonably be built for aarch64 straight off and use the whole node, even
if it doesn't run particularly fast; contrast GPU-based systems. (With
something like simde, you may even be able to run typical x86-specific code.)
However, the amount of memory/core is surprising -- even less than Blue
Gene/Q -- and I wonder how that works out for the large-scale materials
science work for which it's obviously prepared. Also note Fujitsu's
consideration of reliability, though the oft-quoted theory of failure rates in
exascale-ish machines was obviously wrong, otherwise, as the Livermore CTO
said, he'd be out of a job.

The bad news for anyone potentially operating a similar system in a
university, for instance, is that the typical nightmare proprietary software
is apparently becoming available for it...

~~~
m_mueller
Maybe you mean Matsuoka-sensei? He's been the director of RIKEN AICS for a
couple of years now and is a known media figure in Japan.

~~~
jabl
Satoshi Matsuoka is an "international rock star" in HPC circles. But I don't
think he was involved with the K computer; before RIKEN he was IIRC at Tokyo
Tech doing their "Tsubame" GPU clusters.

~~~
m_mueller
Yes, but I've seen a talk of his where he also mentions K, so maybe that's
what GP meant.

Edit: Btw. I did my PhD at Tokyo Tech (not with Matsuoka though) ;-)

------
ViralBShah
Given that today's HPC architectures are mostly power constrained, and a
majority of the FLOPS often come from GPUs (for their flop/watt ratios), this
direction is not surprising.

ARM has been making major strides in the high-performance area. The new AWS
Graviton processors are pretty nice from what I have heard. And then there's
ARM in the Mac. Yup, and Julia will run on all of these!

While I say all of this, I should also point out that the top500 benchmark is
pretty much not representative of most real-life workloads: it is largely
based on your ability to solve the largest dense linear system you possibly
can - something almost no real application does.
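That "largest dense linear solve" is what HPL times. A minimal single-node
sketch of the idea (not the actual distributed benchmark; the operation count
and residual check below follow the standard HPL conventions):

```python
import time
import numpy as np

def hpl_like_score(n):
    """Time a dense solve of Ax = b and convert the ~(2/3)n^3 + 2n^2
    flop count into flop/s, roughly what HPL reports."""
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)          # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0
    flops = (2 / 3) * n**3 + 2 * n**2  # standard HPL operation count
    # HPL also verifies the answer via a scaled residual:
    residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
    return flops / elapsed, residual

gflops, res = hpl_like_score(1000)
print(f"{gflops / 1e9:.1f} GFLOP/s, scaled residual {res:.1e}")
```

Scaling this up to a machine-sized problem distributed over ~160k nodes is of
course where all the real engineering lives.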

(The website is down, so I haven't been able to look at the specs of the
actual machine).

~~~
fhqghds
Get ready for a surprise then: all those FLOPS are coming from the ARM
cores... This beast has no GPUs:

[https://postk-web.r-ccs.riken.jp/spec.html](https://postk-
web.r-ccs.riken.jp/spec.html)

~~~
leeter
So looking at AnandTech's breakdown, the CPUs are closer to a Knights Landing
'CPU/GPU' than a traditional CPU (currently). They also have a ton of HBM2
right next to the dies, so this should be insanely fast: they can feed those
cores very quickly regardless of how fast each core is by clock and pipeline.
That should massively reduce stalls.

~~~
stephencanon
The "traditional CPU" portion of the core is a bit more capable than KNL, but
yeah, that's roughly accurate.

~~~
leeter
Oh, agreed, but what makes this so interesting is how tuned it is. I'm
honestly surprised we haven't seen Intel or AMD ship an HPC CPU with
on-package HBM2 yet.

------
dman
Really wish Fujitsu sold a developer kit with an A64FX chip - it's the only
shipping ARM chip with SVE that I know of, and I would love to get my hands on
one to play with.

~~~
loudmax
No kidding!

I don't have any sense of how much these cost to manufacture. There ought to
be a market for an A64FX-based rackmount server system. If the price isn't
outrageous, I'd love to see these sold as an SBC.

~~~
timthorn
Something like the Fujitsu PRIMEHPC FX700?

~~~
krapht
Minimum 128 unit purchase for those outside Japan, though.

~~~
kohtatsu
A group buy would be cool.

------
calaphos
Even more impressive than the Linpack result (2.8x faster than the runner-up)
is the HPCG result, at 4.6x the result of Summit in second place.

That benchmark consists of more sparse matrices, which are a much more
realistic depiction of HPC workloads. It seems to scale a lot better with
irregular access patterns than the Nvidia GPUs on the other systems do.
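For anyone curious what "irregular access" means concretely: the kernel at
the heart of HPCG-style workloads is sparse matrix-vector multiplication in
CSR form. A toy sketch (not the actual benchmark) showing the indirect
"gather" loads that dense-optimized hardware struggles with:

```python
# Minimal CSR (compressed sparse row) matrix-vector product.
def csr_matvec(data, indices, indptr, x):
    y = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # the indirect load through `indices` is the irregular access:
        y.append(sum(data[k] * x[indices[k]] for k in range(start, end)))
    return y

# 4x4 example: a 1D Poisson stencil [-1, 2, -1]
data    = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
indptr  = [0, 2, 5, 8, 10]
print(csr_matvec(data, indices, indptr, [1.0] * 4))  # [1.0, 0.0, 0.0, 1.0]
```

Unlike HPL's dense kernel, almost every load here depends on a previously
loaded index, so memory latency and bandwidth dominate.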

------
ksec
Well, the link is dead. But I am guessing it is the Fujitsu A64FX, with a
512-bit SIMD extension for ARM.

Edit: Turns out I was right. Maybe this link is better.

[https://www.anandtech.com/show/15869/new-1-supercomputer-
fuj...](https://www.anandtech.com/show/15869/new-1-supercomputer-fujitsus-
fugaku-and-a64fx-take-arm-to-the-top-with-415-petaflops)

~~~
floatboth
To be clear, the extension -- the _Scalable_ Vector Extension -- is for any
width between 128 bits and 2048 bits. (It's in the name!) The implementation
in the Fujitsu A64FX seems to be 512-bit specifically.
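The point of being vector-length-agnostic is that one binary runs correctly
at any hardware width. A loose conceptual sketch in Python (SVE is an
instruction set, so the predication is only being imitated here):

```python
def vla_add(a, b, vl):
    """Vector-length-agnostic loop in the spirit of SVE: the same code runs
    for ANY lane count `vl` (SVE hardware may implement 128..2048 bits, i.e.
    2..32 float64 lanes) and handles the tail with a partial set of active
    lanes, like SVE's `whilelt` predicate, instead of a scalar cleanup loop."""
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        active = min(vl, n - i)   # predicated lanes on the final iteration
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += vl
    return out

a, b = list(range(10)), [1.0] * 10
# the "binary" is identical; only the hardware width differs:
assert vla_add(a, b, 2) == vla_add(a, b, 8) == [x + 1.0 for x in a]
```

This is why code built for generic aarch64+SVE should run unmodified on the
A64FX's 512-bit implementation.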

------
Symmetry
For those interested in more details they did a presentation at Hot Chips. The
slides are here:

[https://www.hotchips.org/hc30/2conf/2.13_Fujitsu_HC30.Fujits...](https://www.hotchips.org/hc30/2conf/2.13_Fujitsu_HC30.Fujitsu.Yoshida.rev1.2.pdf)

------
akamoonknight
Is there information on how the Fugaku machines are connected together? The
highest-performing Power9 ones seem to use InfiniBand, but is that still true
with the ARM devices?

Edit: seems to be a Fujitsu-designed interconnect [0]. I wonder how much of
the overall performance depends on the difference in communication.

[https://www.fujitsu.com/global/documents/solutions/business-...](https://www.fujitsu.com/global/documents/solutions/business-
technology/tc/catalog/08514929.pdf)

~~~
gnufx
I don't know what's the best reference, but here's one:
[https://www.fujitsu.com/global/Images/the-tofu-
interconnect-...](https://www.fujitsu.com/global/Images/the-tofu-interconnect-
d-for-supercomputer-fugaku.pdf)

~~~
akamoonknight
Ah, much better than the white paper, thanks!

------
gok
It's kind of interesting that in terms of perf/watt, it's actually slightly
less efficient than Summit, which is over 2 years old. Also interesting that
they went with a homogeneous design (all Arm cores) instead of a heterogeneous
CPU+GPU setup.
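For concreteness, here is that comparison worked from the published list
figures (the Rmax and power numbers below are from memory of the June 2020
Top500/Green500 lists and should be treated as approximate):

```python
# Perf/watt from (approximate) published list figures:
systems = {
    "Fugaku": {"rmax_tflops": 415_530, "power_kw": 28_335},
    "Summit": {"rmax_tflops": 148_600, "power_kw": 10_096},
}
for name, s in systems.items():
    gflops_per_watt = s["rmax_tflops"] * 1e3 / (s["power_kw"] * 1e3)
    print(f"{name}: {gflops_per_watt:.2f} GFLOP/s per watt")
# Fugaku ~14.7, Summit ~14.7: essentially a wash despite the 2-year gap,
# which is notable given Fugaku does it without GPUs.
```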

------
sadfev
Can any of the HPC experts shed some light on how these ARM chips are better
than their predecessors? I toured a small cluster at LANL, where the ARM chips
ran the hottest and their cooling was the loudest.

~~~
blopeur
A64fx have on board HMB -> that means no dram. If you look at the fugaku
mother board their is no Dimm slots. All the memory is on the same package as
the CPU.

This delivers a huge boost in bandwith.

HMB stand for high memory bandwidth. It offers up to 900 GB/s.

Now if you add the tofu interconnect on top you have a systems finely tuned
for maximising data movement.

Remember : compute is cheap, communication is expensive.

You can have load of gpu and processors but if you can't feed them data fast
enough they are useless.
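A quick roofline-style estimate shows why that bandwidth matters so much. The
peak-flops figure below is an assumed round number for one A64FX chip, not a
quoted spec; the 900 GB/s is the bandwidth figure above:

```python
# How much arithmetic per byte moved do you need to stay compute-bound?
peak_flops = 3.0e12   # ~3 TFLOP/s FP64 per chip (assumption, round number)
hbm_bw     = 900e9    # bytes/s of on-package HBM2 (figure quoted above)
ddr4_bw    = 25.6e9   # bytes/s for one DDR4-3200 channel, for contrast

for name, bw in [("HBM2", hbm_bw), ("one DDR4 channel", ddr4_bw)]:
    intensity = peak_flops / bw   # flops needed per byte moved
    print(f"{name}: need {intensity:.1f} flops/byte to stay compute-bound")
# Sparse/stencil kernels typically run at well under 1 flop/byte, so their
# speed is set by bandwidth, not compute -- hence the HBM.
```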

~~~
lukevp
That's fascinating. I know that AMD has been touting HBM as a faster memory
subsystem for their GPUs; is that the same HBM, where it's stacked? Or are
they just calling it something similar?

~~~
rrss
It's the same thing. High end GPUs have been using HBM2 for a few years, A64fx
uses HBM2 for CPU memory.

------
praveen9920
Pushing the boundaries is the best way to advance technology

Car manufacturers need ridiculously expensive race cars to push the technology
that leads to advancements in everyday cars. Similarly, the Top500 is one way
to push technology, not just computationally but also in things like better
power, heat and network management in processors and computers in general.

With server farms ever doubling, heat management of these systems will become
a bigger contributor to environmental pollution than all vehicle exhaust.
Rather than just spending on renewable sources of energy for these farms, it
makes sense to optimize the energy consumed per processor. Hoping to see
advancements in this area.

In my opinion, the next ARM/Intel will be the company that makes
energy-efficient processors.

~~~
dawg-
>Heat management of these systems will become the major contributor for
environmental pollution

Here's an idea, maybe we should just turn the moon into a giant data center?

~~~
giantrobot
In a vacuum the only way to dissipate heat is radiation and to a lesser extent
conduction. In an atmosphere you also have convection. It's more efficient to
cool something on Earth than on the Moon.

We also have more power options on Earth. On the Moon you'd only really have
solar power available and then only two weeks a month. On Earth you've got
insolation for half a day every day and the option of other renewable sources.

This is all besides the phenomenal cost of building data centers on the Moon
vs Earth.
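A rough Stefan-Boltzmann estimate shows the scale of the radiative-cooling
problem described above. The 30 MW load, emissivity, and radiator temperature
below are all illustrative assumptions (30 MW is in the ballpark of a large
supercomputing site), and the calculation ignores absorbed sunlight, which
only makes things worse on the day side:

```python
# Radiator sizing in vacuum: steady state requires P = eps * sigma * A * T^4.
sigma = 5.670e-8   # Stefan-Boltzmann constant, W/(m^2 K^4)
eps   = 0.9        # emissivity of a good radiator surface (assumption)
T     = 350.0      # radiator temperature, ~77 degrees C (assumption)
P     = 30e6       # heat load in watts (assumption)

area = P / (eps * sigma * T**4)
print(f"Radiator area: {area:.0f} m^2 (~{area / 1e4:.1f} hectares)")
```

Roughly four hectares of radiator per 30 MW, versus a cooling tower and a
grid tie on Earth.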

~~~
moonchild
> heat dissipation

Eh, the moon is made of pretty cold rock.

> two weeks a month

> half a day every day

Amortized, that's the same thing.

~~~
giantrobot
The Moon might be solid rock, but that doesn't make it a good heat sink. If
solid rock were a good heat sink, you could just build data centers inside
mountains on Earth, orders of magnitude cheaper. Rock might eventually
dissipate heat, but it's not conductive enough to carry it away from
heat-generating elements fast enough to keep them from melting down.

Renewables on Earth include solar, wind, hydro, tidal, and geothermal. Even if
your data center was 100% solar you only need enough storage to cover night
usage and maybe a few days of total cloud cover. A grid tie is also trivial. A
data center on the Moon _requires_ two weeks of power storage and has no grid
tie option.

Again, this is all ignoring the literally astronomical cost of getting mass to
the Moon.

If you want expensive but non-polluting data centers you'd get more bang for
your buck building them on or under the ocean with renewable power attached. A
barge or submerged platform with ocean water for cooling anchored off-shore
would provide orders of magnitude better latency and be orders of magnitude
cheaper.

------
flyGuyOnTheSly
There's something amusing about a blog post announcing the fastest computer in
the world being unable to serve up web requests in under 10 seconds.

(When I wrote this comment, I was seeing 500 status code errors when trying to
load the page)

~~~
glouwbug
True, but we can blame the programmer as we usually do

------
d_tr
For what it's worth, Cray is also offering supercomputers with A64FX chips.

------
29athrowaway
Some more information about Fugaku, aka Post-K, here:

\- Slides on Fugaku: [https://www.fujitsu.com/global/Images/supercomputer-
fugaku.p...](https://www.fujitsu.com/global/Images/supercomputer-fugaku.pdf)

\- SVE 512 instructions for armv8:
[https://www.fujitsu.com/global/Images/armv8-a-scalable-
vecto...](https://www.fujitsu.com/global/Images/armv8-a-scalable-vector-
extension-for-post-k.pdf)

------
mxcrossb
It’s sort of funny to watch the ISC event and see news of a machine with an
exaop of AI performance, while the zoom presentation still can’t properly crop
out the background.

------
hangonhn
Anyone know the reason for the dominance of Power processors in the top 10
other than it's from IBM and they get a lot of contracts for HPC?

~~~
detaro
They're fast, have lots of memory and IO bandwidth, and can do some cool other
tricks (I can't remember the name right now, but they have a thing for PCIe
devices to participate in cache coherency, their in-system protocols scale
better to more CPUs, ...)

~~~
gnufx
CAPI

~~~
detaro
exactly, thanks

------
fomine3
Green500 #1 is MN-3 by PFN, which is also from Japan, and they also use a
custom chip!

~~~
IanCutress
[https://www.anandtech.com/show/15177/preferred-
networks-a-50...](https://www.anandtech.com/show/15177/preferred-
networks-a-500-w-custom-pcie-card-using-3000-mm2-silicon)

------
cwaffles
Backup link if you're getting http 503:
[http://archive.is/JSvCi](http://archive.is/JSvCi)

------
IncRnd
[http://archive.is/PNY80](http://archive.is/PNY80)

------
cinntaile
This kind of feels like a publicity stunt for Arm. Arm is owned by the
Japanese company SoftBank, and now Japan captures the supercomputer crown. I
don't want to take away from the achievement, and it's certainly possible that
this is just a coincidence; maybe someone with more knowledge on the subject
can comment on this?

~~~
lovemenot
Fujitsu announced they would be building a Kei2 based on ARM, soon after their
Kei was #1 around 5 years ago.

ARM was a British company until it was bought by SoftBank a couple of years
ago.

------
Aaronstotle
CoreTeks on youtube has a great video that explains the brilliance of this
chip.

~~~
tandr
[https://www.youtube.com/watch?v=IfHG7bj-
CEI](https://www.youtube.com/watch?v=IfHG7bj-CEI)

That video is from the end of 2018... which makes it even more amazing.

------
ausbah
Is x86 likely to ever go away, or anytime soon? Asking as a non-systems guy.

~~~
stjohnswarts
Not for a long time. ARM is just more competition, which is good. The guys
claiming that Intel is basically a has-been and can't make anything new don't
know what they're talking about. Intel has come back more than once.
Competition is good; I'm glad it's heating up a bit more these days.

------
ClarkMills
Ah, but does it run Linux? [Probably, but I couldn't see in the link.] Looks
like another nail in Intel's coffin...

------
mortenjorck
I cannot think of any plausible way in which Apple could have influenced the
date of this announcement, but the timing, given what is expected to be
announced later today, is uncanny.

~~~
xxs
The chips in the supercomputer are not exactly ARM; they do have the ARMv8
instruction set, but that's just for driving the proprietary vector extension
unit.

Pretty much GPU-like stuff, with an ARMv8 set to be able to run its OS.

~~~
floatboth
No, it's not a proprietary extension, it's Arm's Scalable Vector Extension!

------
davidhyde
The site of the organisation responsible for ranking the fastest computers in
the world succumbs to the Hacker News hug of death.

~~~
lorenzhs
Oh my, it's a django app with debug mode enabled. I just got an InterfaceError
with the full traceback and django configuration. (I've emailed them so they
can fix it)

~~~
Hamuko
Just how in the world are people deploying Django apps with DEBUG = True?
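Most likely the setting was simply hard-coded or left at a development
default. One common safeguard (a sketch of one approach, not Django's only or
official pattern) is to make debug an explicit opt-in via the environment in
`settings.py`, so a forgotten setting fails safe:

```python
import os

# Default DEBUG to off; require an explicit opt-in via the environment.
DEBUG = os.environ.get("DJANGO_DEBUG", "").lower() in ("1", "true", "yes")

# Belt and braces: refuse to start a production config with debug enabled.
# (DJANGO_ENV is a hypothetical variable name for this sketch.)
if os.environ.get("DJANGO_ENV") == "production" and DEBUG:
    raise RuntimeError("DEBUG must not be enabled in production")
```

With that in place, a deploy that forgets to set anything gets `DEBUG =
False`, and a misconfigured production deploy refuses to boot instead of
serving tracebacks to the world.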

~~~
ChuckNorris89
Simple: human error and no code review process for your production
environment.

Something similar happened to a huge retailer here in Austria, where just
typing your username without a password would log you in. The reason? An
intern committed debug code to production and nobody noticed. In my book
that's not the fault of the intern but the fault of the CTO/$TECH_LEAD who
hasn't implemented and religiously upheld a code review process for everything
that goes into production, since stuff like this can happen even to
experienced engineers who are tired or having a bad day.

~~~
kernaussage
I live in Austria as well; could you share which retailer it happened to?

------
mtgx
Also a king in efficiency:

[https://www.nextplatform.com/2019/11/22/arm-supercomputer-
ca...](https://www.nextplatform.com/2019/11/22/arm-supercomputer-captures-the-
energy-efficiency-crown/)

~~~
rrss
The full scale supercomputer is not quite as efficient as the prototype.

> The number nine system on the Green500 is the top-performing Fugaku
> supercomputer, which delivered 14.67 gigaflops per watt. It is just behind
> Summit in power efficiency, which achieved 14.72 gigaflops/watt.

------
UI_at_80x24
I can't help but think that the top minds from Cyrix are feeling both smug at
the vindication and dismayed that they were just a little too ahead of the
curve.

The writing was on the wall that RISC would win, but the x86 juggernaut
appeared unbeatable.

~~~
qayxc
> The writing was on the wall that RISC would win [...]

What do you mean by "win" exactly? RISC is just an architectural choice and
means nothing on its own. For reference, Google's TPUs, which - according to
Google - deliver 30-80x better performance per watt than contemporary CPUs,
use a CISC design instead [1].

This whole "RISC vs CISC" nonsense is quite inane, given that it's a design
choice that's highly application-specific.

It's even debatable whether the A64FX can be considered a "pure" RISC design,
considering the inclusion of SVE-512 and its 4 unspecified "assistant cores"
[2] ...

[1] [https://cloud.google.com/blog/products/gcp/an-in-depth-
look-...](https://cloud.google.com/blog/products/gcp/an-in-depth-look-at-
googles-first-tensor-processing-unit-tpu)

[2]
[https://www.fujitsu.com/jp/Images/20180821hotchips30.pdf](https://www.fujitsu.com/jp/Images/20180821hotchips30.pdf)

~~~
russler23
RISC is a philosophy, not so much a set of rules. If you let the creators of
RISC define their approach, the division between RISC and CISC becomes more
clear. Most summaries of RISC oversimplify it. Maybe that’s ironic, haha.

------
SomeoneFromCA
I suggest building a supercomputer out of Ivy Bridge Celerons. They are dirt
cheap, like $2 in bulk, yet quite performant.

~~~
zokier
Can you buy _10+ million_ units of those?

~~~
SomeoneFromCA
Finally someone who got the joke.

------
xhkkffbf
Not to be dismissive, but can't anyone "build" the biggest supercomputer by
reserving enough instances at AWS or GCP? I'm sure that AWS or GCP would like
to encourage this competition, but it seems a bit, well, boring.

~~~
floren
The ranking is calculated based on the Linpack benchmark. Since it's a
parallel application, performance doesn't simply scale with the number of
processors; the network interconnect is hugely important.

Now, although Linpack is a better evaluation metric for a supercomputer than
simply totaling up # of processors and RAM size, it's still a very specific
benchmark of questionable real-world utility; people like it because it gives
you a score, and that score lets you measure dick-size, err, computing power.
It also, if you're feeling unscrupulous, lets you build a big worthless
Linpack-solving machine which generates a good score but isn't as good for
real use (an uncharitable person might put Roadrunner
[https://en.wikipedia.org/wiki/Roadrunner_(supercomputer)](https://en.wikipedia.org/wiki/Roadrunner_\(supercomputer\))
in this category).
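There's also a simple arithmetic reason Linpack flatters big machines: the
flop count grows as n^3 while the matrix data only grows as n^2, so the
larger the solve, the smaller the share of time spent on communication. A
quick illustration:

```python
# HPL flop count vs. matrix size: the flops-per-byte ratio grows with n.
for n in [10_000, 100_000, 1_000_000]:
    flops = (2 / 3) * n**3
    bytes_moved = 8 * n * n   # one FP64 matrix; a lower bound on traffic
    print(f"n={n:>9}: {flops / bytes_moved:.0f} flops per matrix byte")
# The ratio is n/12, growing linearly with n -- real applications rarely
# enjoy that luxury, which is why a good Linpack score can mislead.
```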

~~~
sushshshsh
I am curious to learn a bit more about how supercomputer scores correlate
with "real world performance", which is a hard thing to quantify since there
are probably hundreds of different application "types" in the "real world".

Combine this with the fact that many applications are limited by network
throughput rather than by CPU/SSD/RAM/PCIe, and performance becomes a hard
thing to quantify even in terms of "how many ARM cores do I need to buy so
my CPU isn't the bottleneck".

There are benchmarks for ARM linux compilation and ARM openjdk performance
benchmarks which are a good start, but I don't know how to compare SKUs
between those ARMs and the ones found in top500 supercomputers.

~~~
mxcrossb
HPCG is another benchmark on the Top500 site, and it’s more of a real world
benchmark. It’s of course not perfect, but maybe that’s more what you’re
looking for.

------
fizixer
As I have said countless times in the past:

\- Moore's law is dead at the level of the transistor

\- Architecture, HPC updates will keep coming for many years into the future

\- AGI has already escaped Moore's law (i.e., development of a fully
functional AGI will not be constrained by lack of Moore's law progress). And
that's what really matters.

\- Related note on AGI: it has escaped the data problem as well (as in we have
the right kind of sensors: mainly cameras, microphones, and so on). That is,
according to the categorization of AGI challenges in terms of hardware, data,
algorithms, the only missing piece is the right set of algorithms.

~~~
AnimalMuppet
Why do you say that AGI has escaped Moore's law? Especially, how can you say
so when you don't know what the right set of algorithms are?

~~~
fizixer
Oh then you're not going to like what I'm going to say next:

\- Somewhere between 2015 and 2025, multiple individual groups will have
cracked the AGI problem independently. (but 2015 is in the past, which means
there are likely groups out there that have cracked the problem and keeping it
a secret).

\- AGI-in-the-basement scenario is very doable and has been or will be done,
many times over.

~~~
AnimalMuppet
And your evidence for this claim is... what?

~~~
fizixer
A combination of:

\- sources available freely online

\- my own thought process and piecing of things together

~~~
AnimalMuppet
What sources freely available online tell you that there are likely groups out
there that have cracked AGI and are keeping it a secret?

~~~
fizixer
[https://www.futuretimeline.net/blog/2017/06/13.htm](https://www.futuretimeline.net/blog/2017/06/13.htm)

~~~
AnimalMuppet
By my eyeball, that adds up to maybe 12% probability by 2025. As evidence for
your claim, that's... rather unimpressive, and less than convincing.

------
KKKKkkkk1
We're living in an age in which AWS dwarfs all the machines on the TOP500
taken together. The TOP500 is a vestige of the Cold War that needs to be
retired. Similar to how the US and the USSR used to compare their numbers of
nuclear warheads, it compares a reserve of capacity that will probably be
retired having brought marginal benefits at best, all in order to goad
taxpayers into a futile competition.

~~~
rwmj
What a complete load of nonsense. Do let us know what your AWS bills are like
when you run your 100s of petaflops HPC job there. And what is the
interconnect like? A few gigabit switches aren't the same as the interconnects
on these supercomputers.

~~~
user5994461
Google Cloud has a much superior interconnect, easily doing twice the
bandwidth of AWS with lower latency.

Ethernet may not be approaching InfiniBand in raw speed and latency, but I
think it's doing pretty decently, with 10 Gbps going to every node.

Ethernet networks are definitely much more competitive today than 10 years
ago, when InfiniBand already had cheap 20 Gbps network cards but 10 Gbps
Ethernet cards were expensive and the network switches were a rarity.

