
China Tops U.S. in Supercomputers - jonbaer
http://www.eetimes.com/document.asp?doc_id=1329941
======
dkural
The truth is, US supercomputer spending is hard to justify, and most of the
owners of these machines at various national institutes are shopping for
customers to justify their funding, even though their setup is not a good fit
for many types of computing.

Every dollar spent on building these computers is a dollar NOT spent in the
budget of a research-driven grant. Give these dollars to physicists,
mathematicians, genomics scientists, chemists, climate people etc. This way
the demand in computing will dictate the type of systems built.

p.s. I attended in person the event listed in the article, hosted by the OSTP,
which is part of the White House (the National Strategic Computing
Initiative). My conclusion was that most of these computers aren't needed and
that this is a misdirection of funds, to the tune of billions of dollars, that
scientific computing could really benefit from in other ways.

~~~
batbomb
Yep. I'm part of a significant project that is being shoehorned into a Cray
because the government has already spent the money on it.

If it's not an MPI job, it's going to be painful.

~~~
gbrown_
_> If it's not an MPI job, it's going to be painful._

What are you using for communication between nodes?

To make the tone clear I'm genuinely interested as I'm not sure what running
on a Cray prevents you from doing?

~~~
batbomb
That's the problem:

We don't need to communicate between the nodes; we don't need MPI. We need
distributed databases, and we might need Spark. We need machine learning. We
need servers. We would like availability and reliability.

We get full batch nodes that are relatively anemic in memory for our workload
(4GB/core) for a maximum of 24 hours at a time.

~~~
nickbauman
> We don't need to communicate between the nodes

Huh. At the core of almost every problem I would consider "interesting" that
I've encountered in my 20+ years of programming is the need to allocate work
across many compute nodes _that have some amount of shared state._ You could
just as well take A* as a baseline for these types of problems. Your typical
cloud computing infrastructure/design is not a good match for them.

~~~
batbomb
Well, I guess computing in astrophysics and HEP isn't that interesting then :)

It's true that we need shared data, but we don't need shared state (memory)
for any of our workloads, and shared data (e.g. disk/db) is a much easier
problem to solve than shared state.

There is still coordination and orchestration, but that's an extremely coarse
amount of communication compared to the cost of Cray interconnects. That being
said, there are cases where we might benefit from the Cray networking, but
that comes at the cost of other tradeoffs (no local disk, low memory per
core).
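A minimal sketch of what "shared data, not shared state" looks like in
practice (a hypothetical workload with made-up names; real workers would be
separate processes hitting an actual database or filesystem, not one in-memory
handle):

```python
# Shared *data* through a database, not shared *state* in memory:
# each worker reads its own slice and never talks to another worker.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, flux REAL)")
db.executemany("INSERT INTO events (flux) VALUES (?)",
               [(float(i),) for i in range(100)])
db.commit()

def process_partition(lo, hi):
    """A worker reads only its slice of the shared data; the only
    coordination is coarse orchestration (who gets which slice)."""
    rows = db.execute(
        "SELECT flux FROM events WHERE id BETWEEN ? AND ?", (lo, hi)).fetchall()
    return sum(f for (f,) in rows)

# The orchestrator hands out disjoint slices; results combine at the end.
results = [process_partition(lo, lo + 24) for lo in range(1, 101, 25)]
print(sum(results))  # 4950.0
```

The point is that nothing here resembles message passing between ranks; the
only "communication" is through the shared store.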

So what do we do? Well, we use a handful of Ferraris to get the job done,
because they are what's available, when a cheap bus would have suited us fine.
The double whammy is that the Ferraris end up in the shop all the time, and
occasionally somebody else gets exclusive access to them when they want to get
in a Bell prize submission.

~~~
cinquemb
Well, this is where you talk to the network admins at your university and get
them in on a top-secret program, code name: "It Was China", to hack into all
the detectable PS4s on campus. Call all those students to a hall for a random
lottery social for a couple of hours while you commandeer their equipment, for
science of course.

Or you could just get the network admins to allow you to send the students an
email, but the first suggestion sounds more fun :P

~~~
nickbauman
That, sir, is hilariously awesome.

------
narrator
What's worse is that the U.S. prevented China from buying Intel chips to build
their supercomputer with [1]. It was a huge order and probably had something
to do with the recent layoffs.

[1] [http://nextbigfuture.com/2016/05/us-supercomputer-chip-
ban-d...](http://nextbigfuture.com/2016/05/us-supercomputer-chip-ban-
delayed.html?m=1)

~~~
bhouston
The worst is yet to come. China's home-grown chips may now spread out from
China and undercut Intel at the high end of the cloud market, if Linux-based
software is easy to port to them. Intel's CPUs are way overpriced and ripe for
disruption if someone can muster the massive upfront investment required to
create a real competitor, and it looks like China is doing this. Repeatedly
topping the supercomputer list is perfect marketing for introducing a new
high-performance CPU to the world.

~~~
mtgx
A former NSA and CIA chief wrote a post about stuff like this, before the US
banned US chip makers from selling to China:

> _By the time I became NSA director in the late 1990s, however, the
> calculation was no longer that simple. We still wanted an MTOPS advantage,
> of course, but we were fast realizing that our preferred limits were
> undermining the global competitiveness of the U.S. computer industry — the
> very industry on which we relied for our success. It was becoming clear that
> the overall health of that industry was more important than any MTOPS
> advantage against a specific target country. We still insisted on limits
> with regard to places such as Cuba and North Korea, but we became far more
> forgiving elsewhere.

> This, of course, had a powerful, positive commercial impact, but the NSA
> didn’t flip its position for commercial reasons. We did it for security
> reasons. On balance, this change made us stronger, not weaker, over the long
> haul, since retarding exports would inevitably retard the technological
> progress that was both our economic and our security lifeblood.

> That early lesson has caused me to continue to challenge arguments that
> technological protectionism furthers national security. It might, but then
> again, it could have the opposite effect if it freezes development, alienates
> allies, feeds distrust or invites the creation of similar barriers abroad. I
> would recommend these broader considerations to those in the U.S. security
> enterprise with responsibility for evaluating these trade-offs today._

[https://www.washingtonpost.com/opinions/dont-let-america-
be-...](https://www.washingtonpost.com/opinions/dont-let-america-be-boxed-in-
by-its-own-
computers/2015/04/02/30742192-cc04-11e4-8a46-b1dc9be5a8ff_story.html)

The whole post is worth reading. The same logic could be applied to the US
trying to put backdoors in its products for a _specific_ national security
goal, which will ultimately end up undermining US technological supremacy and
national security as well. And yet the current CIA director just implied that
the US would be _fine_ with backdoored products, because "where are people
going to get their encryption from? The foreigners? Ha!"

This sort of _arrogance_, which is the same type of arrogance that ended up
banning chip sales to China last year, is what _will_ make the US lose out in
the long term.

[https://www.techdirt.com/articles/20160618/08022234741/cia-d...](https://www.techdirt.com/articles/20160618/08022234741/cia-
director-john-brennan-says-non-us-encryption-is-theoretical.shtml)

~~~
nickpsecurity
Thanks for that article by Hayden. Wow. It echoed much of what Schneier et al
kept saying to him. Shows that they're more honest, receptive, or both when
they finally leave the job that forces them to BS for SIGINT gains. ;)

------
bjd2385
Please excuse my ignorance in this area, I was just wondering: why does it
matter so much who has the biggest and baddest supercomputer? Shouldn't what
we run on them matter more?

A while back I recall reading rumors that China didn't even know what to run
on their Tianhe-2. That kind of thing makes a supercomputer look like a pile
of hardware, no matter how well it's organized or engraved.

~~~
dibanez
It's essentially a matter of which benchmarks are accepted as realistic. Right
now machines are compared by how well they run a benchmark called LINPACK,
which has been criticized as non-representative of real science codes. As
mentioned here [1], China's new system only achieved 0.3% of its peak flops on
a slightly more realistic benchmark, HPCG.

[1] [http://www.hpcwire.com/2016/06/19/china-125-petaflops-
sunway...](http://www.hpcwire.com/2016/06/19/china-125-petaflops-sunway/)

~~~
sanxiyn
While Sunway TaihuLight has weak HPCG performance, the 0.3% number should be
understood in context. Tianhe-2 (the #2 system) scores 1.1%, and Titan (#3
overall, the US #1 system) scores 1.2% on HPCG. So Sunway TaihuLight is 3~4x
worse on HPCG than the other top systems.

Note that the K computer (#5 system, Japan) scores 4.9% on HPCG. So Tianhe-2
and Titan are in turn about 4x worse on HPCG than the best-scoring system.
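These ratios can be checked directly from the efficiencies quoted above (a
quick sketch using only the numbers reported in this thread):

```python
# HPCG efficiency (fraction of peak flops) for the top systems,
# as reported in the parent comments.
hpcg_efficiency = {
    "Sunway TaihuLight": 0.003,  # 0.3%
    "Tianhe-2": 0.011,           # 1.1%
    "Titan": 0.012,              # 1.2%
    "K computer": 0.049,         # 4.9%
}

# Sunway vs. the other top systems: roughly 3-4x worse.
print(hpcg_efficiency["Tianhe-2"] / hpcg_efficiency["Sunway TaihuLight"])  # ~3.7
print(hpcg_efficiency["Titan"] / hpcg_efficiency["Sunway TaihuLight"])     # ~4.0

# Tianhe-2 and Titan vs. the K computer: again roughly 4x worse.
print(hpcg_efficiency["K computer"] / hpcg_efficiency["Tianhe-2"])         # ~4.5
print(hpcg_efficiency["K computer"] / hpcg_efficiency["Titan"])            # ~4.1
```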

~~~
bhouston
Why does the Sunway TaihuLight have bad HPCG performance? I suspect it has to
do with the memory bandwidth needed to feed 260 cores on the same chip?

~~~
sanxiyn
Yes, it's memory and interconnect.

I don't think it's as simple as LINPACK bad, HPCG good. LINPACK _is_
representative of some workloads, where compute dominates. HPCG aims to
balance compute and memory. There is Graph500 if memory dominates your
workload.

By the way, K computer is #1 in Graph500.
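One way to see the compute-bound vs. memory-bound split is arithmetic
intensity: flops per byte of memory traffic. A back-of-the-envelope sketch
comparing a LINPACK-like dense kernel with an HPCG-like sparse kernel; the
byte counts are idealized lower bounds that ignore caching, not measurements
of any machine:

```python
# Rough arithmetic-intensity comparison: why a dense benchmark is
# compute-bound while a sparse one is memory-bound.

def dense_matmul_intensity(n):
    """n x n matmul: 2*n^3 flops over roughly 3 n^2 8-byte matrices."""
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * 8
    return flops / bytes_moved  # grows linearly with n -> compute-bound

def sparse_matvec_intensity():
    """Sparse matrix-vector product: ~2 flops per nonzero, each nonzero
    needing an 8-byte value plus a 4-byte column index from memory."""
    return 2 / (8 + 4)  # constant ~0.17 flops/byte -> memory-bound

print(dense_matmul_intensity(10_000))  # ~833 flops/byte
print(sparse_matvec_intensity())       # ~0.17 flops/byte
```

With a fixed, tiny flops-per-byte ratio, a sparse kernel's throughput is set
by memory and network bandwidth, so raw peak flops barely matter.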

~~~
ams6110
[http://www.graph500.org/](http://www.graph500.org/)

The interconnect on the Sunway TaihuLight seems to be standard InfiniBand,
from reading other articles about the system.

------
supergirl
Clickbait title. The previous top supercomputer, now number 2, was also
Chinese. This article should be linked instead of the paginated, clickbait
one:
[http://www.top500.org/news/china-tops-supercomputer-
rankings...](http://www.top500.org/news/china-tops-supercomputer-rankings-
with-new-93-petaflop-machine/)

------
DominikR
They already did in 2013 with Tianhe-2.

[https://en.wikipedia.org/wiki/Tianhe-2](https://en.wikipedia.org/wiki/Tianhe-2)

What is interesting about this new supercomputer is that it is apparently
built with CPUs that were developed and produced by the Chinese themselves.

New Supercomputer:
[https://en.wikipedia.org/wiki/Sunway_TaihuLight](https://en.wikipedia.org/wiki/Sunway_TaihuLight)

CPU architecture:
[https://en.wikipedia.org/wiki/ShenWei](https://en.wikipedia.org/wiki/ShenWei)

------
qd6pwu4
Happy to see the US government ban Intel chips from China; we are forced to
make something ourselves. A good start.

------
fspeech
Designers of the supercomputer published their paper here:
[http://engine.scichina.com/publisher/scp/journal/SCIS/59/7/1...](http://engine.scichina.com/publisher/scp/journal/SCIS/59/7/10.1007/s11432-016-5588-7?slug=abstract)

------
filereaper
China Tops for _now_

Sierra and Summit are coming out soon, which are POWER based.

[https://www.olcf.ornl.gov/summit/](https://www.olcf.ornl.gov/summit/)

[http://www.cnet.com/news/ibm-nvidia-land-325-million-
superco...](http://www.cnet.com/news/ibm-nvidia-land-325-million-
supercomputer-deal/)

------
gbrown_
Touched on in the linked article, but not mentioned here, is that the codes
for the Gordon Bell runs are in the 30-40 petaflop range of sustained
application performance. That's not too shabby.

    
    
      According to TOP500 author Jack Dongarra, three scientific simulation codes run
      on TaihuLight have been chosen as Gordon Bell Prize finalists, two of which have
      managed to reach a sustained performance of 30 to 40 petaflops. The award is
      bestowed each year on the most noteworthy HPC application, based on “peak
      performance or special achievements in scalability and time-to-solution on
      important science and engineering problems.”
    

Source: [http://www.top500.org/news/china-tops-supercomputer-
rankings...](http://www.top500.org/news/china-tops-supercomputer-rankings-
with-new-93-petaflop-machine/)

~~~
grkvlt
Here's something that's always bothered me. In the HPC and scientific
computing world, it seems that the word 'program' is never used to refer to
what is being run on the computers, rather it is always 'codes' (note plural)
that are executed. Does anyone know when or why this split of terminology
occurred?

------
bhouston
So what is the cost of these CPUs? Could they be used in desktops or, more
importantly, in various cloud services (AWS, Google, Microsoft)? What OS do
they run?

Are there plans to commercialize these chips? We need accessible, low-cost,
ultra-high core count chips to be widely available. I feel that Intel and AMD
have been underperforming in this area (growing core count) for the last
decade. And Intel has kept the price of CPUs exceptionally high for the last
decade as well.

If low-cost, ultra-high-performance chips from China undercut Intel's
excessively priced Xeons ($3K+ per CPU), it really could change things. If
these start to spread, I wouldn't be surprised if Intel tried to prevent the
spread of these CPUs outside of China via trade barriers.

~~~
imtringued
Ultra-high performance and low cost don't belong in the same sentence. Even
with Intel, if you want bang for your buck you're better off buying the
12-core CPUs than the 18-core ones.

------
sanxiyn
For technical information, read the paper:
[http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-...](http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-
report-2016.pdf)

Compared to Tianhe-2, the previous top system, it has 2.7x more flops and 3.2x
more flops per watt.

------
jonbaer
[http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-...](http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-
report-2016.pdf)

------
partiallypro
Forgive my ignorance, but what's the point of a "supercomputer" in a world
where an application can access thousands of machines around the world at
once, with a button click, to compute something? I could see maybe cost
savings, but consider how much it costs to build the machine, and I doubt it's
used at 100% capacity at all times. Is it a security thing, or just a pride
thing? What's the advantage over just farming the work out to a cluster in the
cloud?

~~~
aab0
Traditionally, supercomputers and mainframes have far higher inter-node I/O
performance (e.g., bandwidth) than anything you can spin up on EC2. I/O is
extremely important because it's very rare to have a true shared-nothing
algorithm, which means Amdahl's Law will bite you and make the little bit of
I/O and coordination a dominant factor. For example, in deep learning, an
awful lot of the work is simply moving around and updating data, which has
been a bottleneck for attempts to train NNs on multiple GPUs.
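The Amdahl's Law point can be made concrete with a minimal sketch: even a
small serial or I/O fraction caps the achievable speedup no matter how many
nodes you add.

```python
# Amdahl's Law: if a fraction p of the work parallelizes perfectly,
# the speedup on n workers is 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel work, speedup is capped at 1/0.05 = 20x:
print(amdahl_speedup(0.95, 100))     # ~16.8x
print(amdahl_speedup(0.95, 10_000))  # ~20.0x, approaching the 20x ceiling
```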

------
iofj
I have a question. 125.4 Pflop/s... that would be about 23k Nvidia GPUs
(granted, 1080s). They claim they'll be able to do that with 10k by the end of
the year, or the beginning of next year, with server-class GPUs.

So that Chinese number seems awfully low to me. I would expect the number
Amazon has to be higher, for instance. Same for Microsoft and Google.
Therefore I'd be amazed if the DoD, NSA and even the DoE didn't have more
capacity available.

~~~
Etheryte
This isn't about total computing power, but about computing power per one
system.

~~~
jobigoud
But if you have an application that can be distributed transparently to
thousands of GPUs, the difference between one system and several is not very
relevant.

~~~
zhte415
It is, because some problems can be distributed to thousands of GPUs and done
in parallel, and some can't, because the answer to one calculation depends on
the answer to another. You could combine the computing power of Azure, AWS and
Google and still be pretty disappointed, because of all the waiting time due
to latency from one data center to another. That's when the system's
architecture - software and hardware - becomes important.
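A tiny illustration of the two cases (hypothetical workloads):

```python
# Parallelizable: each chunk's sum is independent of the others,
# so every chunk could run on a different node with no waiting.
chunks = [range(i, i + 1000) for i in range(0, 10_000, 1000)]
partial_sums = [sum(c) for c in chunks]
total = sum(partial_sums)

# Not parallelizable: each step needs the previous step's answer, so if
# the steps were spread across nodes, every single step would pay the
# full inter-node latency.
x = 1.0
for _ in range(1000):
    x = (x + 2.0 / x) / 2.0  # Newton iteration for sqrt(2); inherently sequential

print(total, x)  # 49995000 and ~1.41421356
```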

~~~
dekhn
Many problems are very latency tolerant. Unless you have an algorithm that is
truly latency intolerant, I'd argue you are best served not investing in a
low-latency interconnect, because it costs so much.

When I ran Exacycle, we distributed protein folding, protein design, drug
discovery, telescope design, and other problems globally. We never had an
issue with latency, because these problems all partition really well. People
who claim supercomputers are "necessary" for these problems typically
construct problems that are well-matched to supercomputers (for example,
running molecular dynamics on huge proteins) but they tend not to have very
high scientific value.

In my experience, partitioning to minimize communication has always increased
my total scientific throughput, while programming to supercomputers has always
reduced it.

------
ArtDev
The Chinese government has been committing industrial theft on a large scale
for years. This is well-documented but somehow China has gotten away with it.

