
FASTRA II: the world’s most powerful desktop supercomputer - jacquesm
http://fastra2.ua.ac.be/
======
microarchitect
Calling this a "supercomputer" is a bit disingenuous. The most powerful
supercomputer today can do 10.5 Petaflops on the Linpack benchmark [1]. The
500th most powerful supercomputer can do 51 Teraflops.

If the authors actually tried to measure a Linpack score on their cluster,
they'd find that measured performance falls far short of the 12 teraflops
they're claiming as "peak" performance. This link [2] seems to indicate that
one S2050 along with a fairly fast CPU gives you about 0.7 teraflops. Even if
they got perfect scaling across GPUs, which they won't, and even if they
could solve all the heating issues with their design, they wouldn't get more
than about 0.7*6.5 = 4.55 TFLOPS peak on Linpack.
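
To make that back-of-the-envelope estimate explicit, here is a minimal C
sketch. The 0.7 TFLOPS and 6.5x figures come from the paragraph above; the
scaling efficiency is an illustrative assumption, since real multi-GPU HPL
runs scale well below perfectly:

    /* Back-of-the-envelope HPL estimate from the figures above. */
    #include <stdio.h>

    int main(void) {
        double tflops_per_unit = 0.7;  /* one S2050 + fast CPU, per [2] */
        double units           = 6.5;  /* multiplier used above */
        double efficiency      = 0.85; /* assumed scaling; illustrative only */

        double ideal = tflops_per_unit * units; /* = 4.55 TFLOPS */
        printf("ideal: %.2f TFLOPS, with scaling losses: %.2f TFLOPS\n",
               ideal, ideal * efficiency);
        return 0;
    }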

In any case, many of the most powerful supercomputers in the world (39 out
of 500, to be precise) also use GPUs to accelerate certain applications, so
there's simply no way this thing would be faster than those machines.

[1] <http://www.top500.org/lists/2011/11/press-release>
[2] <http://hpl-calculator.sourceforge.net/Howto-HPL-GPU.pdf>

~~~
tycho77
One S2050 has four M2050s, not two (I'm inferring that assumption from your
6.5 multiplier).

I'd like more information on the CPUs and the CPU/GPU bus of this FASTRA II.
CPUs are regaining a lot of ground relative to GPUs - an M2090 is only about
5-10x faster than the newer Intel Xeons in a properly multithreaded program
that pays attention to the NUMA nodes. That is efficient enough that for
large simulations which do not fit on one GPU, running portions of the
simulation on the CPU easily overcomes the CPU/GPU communication overhead.
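
To unpack the NUMA remark: "pays attention to the NUMA nodes" means keeping
each thread's working set in memory attached to the socket that thread runs
on. A minimal sketch of the idea with libnuma on Linux (link with -lnuma);
the one-thread-per-node layout and buffer size are illustrative assumptions,
not how any particular simulation code does it:

    /* One worker thread per NUMA node, each computing on memory
       allocated from its own node, so accesses stay local. */
    #include <numa.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define BUF_SIZE (64UL * 1024 * 1024) /* 64 MB per node; arbitrary */

    static void *worker(void *arg) {
        int node = (int)(long)arg;
        numa_run_on_node(node);                /* pin to this node's CPUs */
        void *buf = numa_alloc_onnode(BUF_SIZE, node); /* node-local memory */
        if (!buf) return NULL;
        memset(buf, 0, BUF_SIZE);              /* commit the pages on-node */
        /* ... compute on buf: no cross-socket traffic ... */
        numa_free(buf, BUF_SIZE);
        return NULL;
    }

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }
        int nodes = numa_max_node() + 1;
        pthread_t tid[nodes];
        for (int i = 0; i < nodes; i++)
            pthread_create(&tid[i], NULL, worker, (void *)(long)i);
        for (int i = 0; i < nodes; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }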

~~~
VladRussian
>I'd like more information on the CPUs and CPU/GPU bus of this FASTRA II.

"the graphics cards, which are connected to the motherboard by flexible riser
cables." - so this sounds like typical PCIe based design. Usually the bundle
of GPU cards on PCIe backplane is placed into separate case which connected
back to the CPU host, and in this case they engineered it into one big case
thus each card getting it's own x16/x8 pipe.

In general, such a "supercomputer" is cheaper hardware-wise than an
equivalent CPU-based one, yet it is still more expensive in terms of human
labor (i.e. programming).

------
wunki
Although the computer itself is reason enough to be impressed, I must admit
I'm more impressed with the use of the "super" computer. It's beautiful that
we geeks can contribute so much to every other discipline.

Immediately, in the back of my head, I got this gnawing feeling that I should
also use my skills on such life-improving projects instead of building the
next social _______ life drain.

~~~
h0h0
There are several relatively well-paid PhD positions available at the vision
group ;)

------
DiabloD3
How is it the world's most powerful if it's loaded with Nvidia cards?

200-series-era GeForce cards were slower per watt, per dollar, and per slot
(all important here) than 58xx-series Radeons on both single- and
double-precision math (and integer as well, but integer is mainly useful for
crypto; Radeons have native single-cycle instructions common in crypto,
which makes them about 4-5x faster than GeForces there, though that's not a
typical use case).

And if we're comparing series 400/500 to 69xx, the beating is much worse.
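
To make the single-cycle claim concrete: SHA256-style crypto code leans
almost entirely on 32-bit rotates, which Evergreen-era Radeons execute as
one instruction (BIT_ALIGN_INT) while 200-series GeForces need a
shift/shift/or sequence. A minimal C illustration of the operation in
question (standard SHA-256 definitions, nothing vendor-specific):

    #include <stdint.h>

    /* 32-bit right-rotate: one BIT_ALIGN_INT on those Radeons,
       two shifts plus an or (3 ops) on 200-series GeForces. */
    #define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

    /* One of SHA-256's round functions: three rotates per call,
       invoked in every one of the 64 rounds per block. */
    static inline uint32_t big_sigma0(uint32_t x) {
        return ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22);
    }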

~~~
sp332
Nvidia and ATI cards have (approximately and with a few exceptions) the same
performance per dollar when it comes to 3D graphics applications. But the
tradeoffs made internally mean that they have radically different performance
in other areas. For example, ATI chips are faster than Nvidia for bitcoin
mining (edit: lots of sha256 hashes), but Nvidia is generally faster than ATI
at physical modeling.

<http://ewoah.com/technology/a-very-good-guide-to-building-a-bitcoin-mining-rig-cluster-guide/#Why%20are%20ATI%20GPUs%20better%20than%20Nvidia%20GPUs%20for%20Bitcoin%20mining>

<http://stackoverflow.com/a/4642578/13652>

All this is moot if the application they're running only supports CUDA - then
they're stuck with Nvidia.

~~~
DiabloD3
Author of DiabloMiner, the world's most popular GPU miner, here... I think
you mean SHA256.

If they're stuck on CUDA, I feel sorry for them. They'd be better off
rewriting the app to use OpenCL, to take advantage of AMD's superior chip
design and to avoid being stuck on Nvidia/Windows (Nvidia's Linux support,
especially under CUDA, is horrible; rather depressing once you consider that
most CUDA clusters run Linux).
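
For a sense of what such a rewrite looks like at the kernel level, here is a
sketch of the same trivial kernel in both dialects (saxpy is just a stand-in
example; in practice most of the porting effort goes into the host-side
setup code rather than the kernels):

    /* CUDA version: */
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    /* OpenCL C equivalent: */
    __kernel void saxpy(int n, float a, __global const float *x,
                        __global float *y) {
        int i = get_global_id(0);
        if (i < n) y[i] = a * x[i] + y[i];
    }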

~~~
supar
Can you give us some thoughts on how bad Nvidia/Linux support is (and why)?

As a Linux developer doing scientific visualization, having tested (and
still testing, btw) how bad the AMD driver situation on Linux _still_ is, I
basically see Nvidia as my only choice if I want to get anything done with
decent (and _stable_ ) performance.

I wouldn't run a cluster on anything but *nix, so by my reasoning I really
see no alternative to Nvidia, even if Nvidia's architecture is actually
inferior.

But I'd really like to have some opinions on that front. I really want
alternatives, and the closed nature of Nvidia's drivers _is_ a problem for
me. So, can you elaborate?

I also programmed on Win2k (from around 2001-2004), targeting both Nvidia
and ATI, and I still remember the sheer number of bugs I hit with GL in
general under the Radeon drivers. But I cannot really speak on that front
anymore.

~~~
DiabloD3
I have answered most of your question here already:
<http://news.ycombinator.com/item?id=3337682>

Nvidia driver quality continues to be subpar on Linux, yet I have zero issues
with Catalyst. And even if Catalyst had a problem, as I described in the
above URL, AMD is shelling out a lot of money to make the FOSS driver its one
and only driver, and that driver is already shaping up to be a lot better
than Catalyst.

I just don't see a point in doing business with a company that is clearly
trying to lock me in with closed-source, vendor-specific APIs when it has no
interest in supporting me after the sale of the hardware.

In addition, their OpenCL implementation (which is shared across both Windows
and Linux) seems to run code much slower than it should, which I suspect (but
have no way of proving) is deliberate, to make CUDA look better. The same
code run on AMD hardware has zero issues, and the equivalent code in CUDA
runs about as fast as I think it should.

Also, re: Catalyst vs FOSS, in about 2 years the Mesa/Gallium OpenCL stack
should be finished, so not only will I be able to quit using Catalyst
altogether, but Nvidia users will be able to make better use of their
hardware and ditch the broken drivers Nvidia keeps forcing on people.

Now, who knows, maybe Nvidia will change tactics, quit screwing over
customers and developers, and release the programming manuals for their
hardware so FOSS can develop a better driver faster... I just don't see it
_ever_ happening.

Even if AMD hardware had a higher total cost of ownership, the fact that AMD
actually cares about me as a Linux customer is why it has gotten my business
for the past 5 video cards I've owned.

Re: your programming experience, 2004 predates AMD taking over ATI. Driver
quality pre-AMD was worse than Nvidia's at the time, but that has greatly
changed. AMD has vastly increased the quality of the Windows driver, and the
original Linux driver was thrown out in exchange for one that shares almost
all of its code with the Windows one (a night-and-day change in quality).

AMD didn't finalize its purchase of ATI until 2006, and it wasn't until 2007
or 2008 that driver quality started taking off.

As a statement of driver quality: people who use my software regularly report
30+ day uptimes using Catalyst on Linux. If it were so buggy and unstable,
that would probably be impossible.

~~~
supar
I was hoping for some meatier details about what exactly is "subpar" about
Nvidia on Linux (other than the closed nature of the driver).

In terms of its GL implementation, the Nvidia driver has everything you would
expect. If you use GL directly, it is one of the most conformant
implementations I've used (supporting "deprecated" stuff like bitmap
operations, line and polygon antialiasing, imaging extensions, all GL
revisions, pretty much all ARB extensions, etc.). Everything works, from Cg
to CUDA to OpenCL to VDPAU. There is really no feature gap between Windows
and Linux.

I've never really compared OpenCL to CUDA performance, but switching to
OpenCL for new projects is something I'm planning to do.

My major problem, really, is that CUDA is locked to specific versions of GCC,
and _that_ is an even bigger issue than the closed nature of the driver
itself.
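
For readers who haven't hit this: the lock comes from a hard version guard
in the toolkit's host_config.h, roughly of this shape (the exact version
numbers varied by CUDA release; the usual workaround is pointing nvcc at an
older compiler with -ccbin):

    /* Approximately the guard shipped in CUDA's host_config.h of that
       era. A distro compiler upgrade trips the #error until you do
       something like: nvcc -ccbin /usr/bin/gcc-4.4 ... */
    #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 4)
    #error -- unsupported GNU version! gcc 4.5 and up are not supported!
    #endif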

------
Quequau
They've had enough time to build a FASTRA III since this was released...

------
ksadeghi
It looks like they have 7 cards in there. If all the GPUs in that system
were pegged at 100% for 30 minutes, the system would overheat. There isn't
enough space between the cards for free air flow.

7 x 5970s would give you just under 35 teraflops instead of 12.

~~~
DiabloD3
To be fair, 5970s are dual-GPU cards and would probably overheat packed in
like that. A 5970 has two 5870 GPUs, but they are downclocked, so performance
is closer to two 5850s.

8 5870s packed into two cases would survive if you keep the cases closed and
put 3 Delta AFB1212s in the front, pushing all the air through the cards and
out their rear vents; this is how most Bitcoin miners build cases.*

If done right, 8 5870s should easily be kept under 85C even when marginally
overclocked.

* Large-scale Bitcoin miners have a habit of not using cases, instead using flexible risers to keep the cards away from each other out in the open. I do not recommend this, as you lose out on high-pressure cooling, a staple of enterprise/high-performance computing.

~~~
SilasX
FWIW, I set up a bitcoin mining rig with four 5870s by using liquid cooling.
Full story here:
<http://silasx.blogspot.com/2011/12/exclusive-silass-bitcoin-mining-rig.html>

------
jvdb
(keep in mind, 2009)

