
Supercomputing on the cheap with Parallella - ingve
http://programming.oreilly.com/2013/12/supercomputing-on-the-cheap-with-parallella.html
======
n00b101
The Parallella Gen-1 has single precision throughput of 90 GFLOPS, power draw
of 5W, and unit cost of $99. This yields 0.91 GFLOPS per dollar and 18 GFLOPS
per Watt.

By comparison, the NVIDIA GTX 780 Ti GPU has single precision throughput of
5046 GFLOPS, maximum TDP of 250W, and unit cost of $699. This yields 7.22
GFLOPS per dollar and 20 GFLOPS per Watt.

This makes the GPU board about 8 times cheaper per GFLOPS and at least 1.1
times more power efficient than the Parallella board.
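
The ratios above can be reproduced in a few lines of Python. This is only a sketch using the peak throughput, power, and price figures quoted in this comment; the names `boards` and `efficiency` are made up for illustration.

```python
# Figures quoted in the comment above: (peak single-precision GFLOPS, watts, USD).
boards = {
    "Parallella Gen-1": (90.0, 5.0, 99.0),
    "GTX 780 Ti": (5046.0, 250.0, 699.0),
}

def efficiency(gflops, watts, dollars):
    """Return (GFLOPS per dollar, GFLOPS per watt)."""
    return gflops / dollars, gflops / watts

results = {name: efficiency(*spec) for name, spec in boards.items()}
for name, (per_dollar, per_watt) in results.items():
    print(f"{name}: {per_dollar:.2f} GFLOPS/$, {per_watt:.1f} GFLOPS/W")

# Ratios of GPU to Parallella on each metric.
cost_ratio = results["GTX 780 Ti"][0] / results["Parallella Gen-1"][0]
power_ratio = results["GTX 780 Ti"][1] / results["Parallella Gen-1"][1]
print(f"GPU advantage: {cost_ratio:.1f}x per dollar, {power_ratio:.2f}x per watt")
```

With these inputs the GPU comes out roughly 7.9x ahead per dollar and 1.12x ahead per watt, which is where the rounded "8 times" and "1.1 times" come from.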

Note that the average cost of electrical power in the US is about 10.5 cents
per kilowatt-hour (kWh), which means a single GPU board running 24 hours at
peak utilization would cost about 63 cents per day - so it would take over 3
years of 24/7 peak operation for the nominal power cost to equal the upfront
unit cost. So while power efficiency is very important, that doesn't mean the
unit cost can be ignored.
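
The break-even estimate can be checked the same way. A minimal sketch, assuming the card draws its full 250W TDP around the clock and using the 10.5 cents/kWh figure quoted above; the constant names are illustrative only.

```python
GPU_WATTS = 250.0       # maximum TDP quoted above
GPU_PRICE_USD = 699.0   # unit cost quoted above
USD_PER_KWH = 0.105     # average US electricity price quoted above

# Energy used in one day of continuous peak operation.
kwh_per_day = GPU_WATTS / 1000.0 * 24.0
cost_per_day = kwh_per_day * USD_PER_KWH

# Days (and years) of 24/7 operation until power cost equals purchase price.
break_even_days = GPU_PRICE_USD / cost_per_day
break_even_years = break_even_days / 365.0

print(f"{kwh_per_day:.1f} kWh/day, ${cost_per_day:.2f}/day")
print(f"break-even after {break_even_days:.0f} days ({break_even_years:.2f} years)")
```

This ignores cooling overhead and the host machine's own power draw, both of which would shorten the break-even time.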

In addition to FLOPS/$ and FLOPS/W, one must also consider the processing
power density, which essentially measures how many "FLOPS per cubic foot" are
yielded by each device. This is important because these machines take up
physical space and physical space (i.e. real estate) is costly. A single GPU
board has 56 times the throughput of a single Parallella board, and I would
say that a GPU board is only slightly larger than the Parallella board (mainly
because GPU boards include massive cooling components which Parallella lacks).
A machine that requires 56 times more real estate to achieve the same
performance is clearly not competitive.
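
To see how sensitive the density argument is to board size, here is a rough sketch. Only the throughput figures come from this comment; the size ratios are hypothetical placeholders, not measurements of either board, and `density_advantage` is an illustrative helper.

```python
# Peak single-precision GFLOPS quoted in this comment.
GPU_GFLOPS = 5046.0
PARALLELLA_GFLOPS = 90.0

throughput_ratio = GPU_GFLOPS / PARALLELLA_GFLOPS  # about 56

def density_advantage(volume_ratio):
    """GPU advantage in GFLOPS per unit volume, given how many times
    larger the GPU board is than the Parallella board (hypothetical input)."""
    if volume_ratio <= 0:
        raise ValueError("volume_ratio must be positive")
    return throughput_ratio / volume_ratio

# Hypothetical size ratios, NOT measurements of either board.
for volume_ratio in (1.0, 2.0, 4.0):
    print(f"GPU {volume_ratio:.0f}x larger -> "
          f"{density_advantage(volume_ratio):.1f}x denser")
```

Even if the GPU board took up four times the space, it would still come out about 14 times denser under these assumptions.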

So far we've only looked at floating-point throughput. One must also consider
memory capacity and bandwidth of each device. The GTX 780 Ti has 3GB GDDR5
memory, and I understand that Parallella has 1GB memory. I don't know how the
memory bandwidth compares between the two.

It's clear that the Parallella board has a long way to go before it could be
competitive for supercomputing applications. And as Parallella tries to catch
up, the industry will keep moving - GPUs (and other accelerators, like Xeon
Phi) will continue to improve along all these dimensions and I imagine that
NVIDIA/Intel/AMD have vastly larger R&D budgets than Parallella/Adapteva. So
it is difficult to see how this could ever be a viable supercomputing platform
and not just an interesting hobbyist board.

~~~
georgeecollins
One advantage Parallella has is that I can put it inside a small robot
(backyardrobots.com) for, say, image processing. I just can't do that with an
NVIDIA card, particularly once you include the desktop that it connects to.
There are probably other applications where a little card with that kind of
processing is helpful.

But you make a good point: if I were mining bitcoins, I would take the NVIDIA
hands down.

~~~
wmf
I don't even see any advantage for Parallella in embedded devices compared to
something like an AMD Temash.

~~~
hosh
Fair enough, that's an interesting device. How does it compare with a
Parallella? Is there anything as available as, say, the Raspberry Pi? Or
compared to the Parallella after coming into general availability? Would this
chip necessarily run things like computer vision or deep learning better in
the embedded space? How about if you were a hobbyist or a maker? That is, what
if you weren't looking at industrial embedded applications and instead were
looking more at DIY?

------
dkhenry
I get more and more concerned every time I see another delay. The good thing
is Adapteva has been totally transparent every time there has been a delay;
the bad news is there have been lots of them. As a Kickstarter backer I am not
too concerned: eventually my new toy will show up and I can try some fun hacks
and projects with it. However, part of the deal when they proposed the
Kickstarter was that we were kick-starting a company that was going to do
awesome things in the parallel processing space, and it's taken two years now
to just get a commercial 64-core solution out the door. While I am confident
they will deliver the Parallella board as promised, I am now much less
confident that Adapteva will be the vanguard of next-generation multi-core
processing.

~~~
VLM
"there have been lots of them"

Welcome to hardware development. Turnaround times for the software guys are
stereotypically measured in minutes; for hardware it's measured in days. Just
kinda how it is.

I see no point in calling out, but pretty much any time someone who's never
shipped hardware tries to ship their first hardware, it always takes two to
ten times longer. If these guys are still together doing version 4 or
something, those estimates will probably be close to reality. It's just a
hardware design pattern to always be almost an order of magnitude more
optimistic. They ALL do it, the RF guys, analog guys, digital guys. RF guys
are by far the worst because of EMI/EMC and licensing reqs, if it makes you
feel any better (LOL).

~~~
dkhenry
Which is the problem. If you're going to have success in shipping hardware you
need to be able to deliver, which is why I am concerned: not because it's
taking a long time to get a product out the door, but because every setback
they have here means a longer delay until you could get a commercially
sustainable product available.

If they could have gotten the 16-core Parallella out the door in 1Q 2013 then
we could have hit the 2014 target of 1K-core Parallella at 1.4 TFlops, which
is much more useful.

------
CraigJPerry
I think today's the last day to claim a free case or t-shirt for your
Parallella.

I stumbled across a mail from andreas@adapteva in my spam box the night before
last. Just an apology for the delays, looking like a Feb delivery now. You've
to send a mail to sales@ with your preference for t-shirt or case.

~~~
TallGuyShort
May I ask when that email is dated?

~~~
brucehart
I received my e-mail on Dec. 3rd at 3:58 PM (US Eastern time).

~~~
TallGuyShort
Thank you for your reply. I had sadly received no such email :(

------
peter303
A "supercomputer" is defined as being within 10% of the world's fastest
computers. Since those are 20,000,000 GFlops these days, a super is at least
1,000,000 GFlops (a petaflop).

~~~
lambda
Yes, when people use "supercomputer on a chip" marketing like this, they mean
"as powerful as a supercomputer of X years ago" for some value of X, as well
as, frequently, "using high levels of parallelism like a supercomputer does."

I just checked the Top500 list from November 1999 (just an arbitrary year to
pick) and the lowest on the list hits 38.5 GFlop/s, so if the Parallella gets
90 GFlop/s, it's well within the range of supercomputers of 14 years ago. In
fact, it looks like it would still make the cut in the November 2000 list, but
is hitting the edge in the June 2001 and solidly off the November 2001 list.

All, of course, assuming that the 90 GFlop/s quoted is comparable to that
measured on the Top500 machines; but even if you toss in a factor of 2 or 4,
that still only pushes you a few more years.
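
As a rough illustration of that derating argument, the sketch below divides the quoted 90 GFlop/s by a factor of 1, 2, or 4 and compares the result against the 38.5 GFlop/s cutoff cited above for the November 1999 list. Only that one cutoff comes from this thread; no other Top500 figures are assumed, and the variable names are illustrative.

```python
PARALLELLA_GFLOPS = 90.0        # peak figure quoted in the thread
TOP500_NOV_1999_CUTOFF = 38.5   # last entry on the November 1999 list, per this comment

# For each derating factor, would the board still clear the Nov 1999 cutoff?
makes_nov_1999_list = {}
for derate in (1, 2, 4):
    effective = PARALLELLA_GFLOPS / derate
    makes_nov_1999_list[derate] = effective >= TOP500_NOV_1999_CUTOFF
    print(f"derated by {derate}: {effective:.1f} GFlop/s, "
          f"makes Nov 1999 list: {makes_nov_1999_list[derate]}")
```

At a factor of 4 the effective 22.5 GFlop/s falls below the November 1999 cutoff, so the comparison year would have to move back further still.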

Of course, if you stretch that X back a few more years, almost any modern
processor is a supercomputer on a chip. I remember when the Cray Y-MP was a
supercomputer to dream about, and nowadays its performance would be considered
mid-tier for a smartphone.

------
estebanrules
Could this be used for scrypt-based altcoin mining?

