
A distributed supercomputer using Parallella boards - plumeria
http://supercomputer.io/
======
nickpsecurity
I've enjoyed reading about Adapteva's work, not only for their processor specs
but for the sheer impressiveness of them iterating so many designs with so
little money vs most ASIC vendors. This is definitely _not_ a supercomputer (buzzword
alert!) but is a good distributed computing idea. I could see their stuff
getting crammed into compute nodes in clusters, too.

Basically, a narrow set of applications that leverage these sorts of chips
best and are easily distributed can benefit from this setup. It will probably
have more raw performance per watt than most @home setups. Yet I'd _guess_ it
has lower potential in node count, given how few Parallella users are on the
market. I hope they benchmark it with plenty of _realistic_ software that can
be compared to other parallel computing methods. That would give us useful
information on whether to use something similar for a local site or another
@home, non-profit effort.

~~~
adapteva
Thanks!

Clearly we don't have the volume to make this as big as @home, but let's not
forget how expensive it is to run a big computer. The folding@home project
claims 36,000 TFLOPS of performance. If we generously estimate 1 GFLOPS/W for
most home computers, then we are talking about a cost of ~$36M/year for the
people donating cycles. This would be a smaller system, and if we get all
10,000 Parallella boards in the field hooked up, the electrical cost would
only be ~$50K/year.
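
A quick sanity check of the arithmetic above (the ~$0.11/kWh electricity price
is my own assumption; only the FLOPS and GFLOPS/W figures come from the
comment):

```python
# Back-of-envelope check of the electricity-cost comparison.
HOURS_PER_YEAR = 24 * 365      # 8760
PRICE_PER_KWH = 0.114          # assumed electricity price, USD/kWh

# folding@home: 36,000 TFLOPS at ~1 GFLOPS/W => 36 MW of draw
fah_watts = 36_000e12 / 1e9
fah_cost = fah_watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

# 10,000 Parallella boards at ~$50K/year implies roughly 5 W per board
board_watts = 50_000 / (PRICE_PER_KWH * HOURS_PER_YEAR) * 1000 / 10_000

print(f"folding@home electricity: ~${fah_cost / 1e6:.0f}M/year")
print(f"implied draw per Parallella board: ~{board_watts:.0f} W")
```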

I am not going to get into the whole "what is a supercomputer" thing :-) I am
never going to beat Nvidia at that game.
[http://nvidianews.nvidia.com/news/nvidia-launches-
tegra-x1-m...](http://nvidianews.nvidia.com/news/nvidia-launches-
tegra-x1-mobile-super-chip) What I do know is that 180,000 CPU cores working
on one problem is one insane distributed computer.

~~~
nickpsecurity
The electrical cost savings are a sound argument. As an aside, that might also
be a good reason for your marketing team to target customers in areas with
high energy prices. Just a thought.

I'm still waiting for Adapteva to combine their work with SGI- or NUMAscale-
style tech to have a UV-style system full of CPU's, your accelerators for
general use, and optionally FPGA's for special purpose stuff. The CC memory +
bandwidth + ton of Adapteva IC's per node would = incredible performance per
dollar and watt. Maybe. ;) Any plans for integrating with ccNUMA
architectures?

------
Quequau
I was pretty enthusiastic about Adapteva's kickstarter for their Parallella
board, even though I felt like some of the wording they had was a bit
deceptive. I began to sign on for the 64-core version, but I backed out: I had
been burned by another Kickstarter project around the same time, and it was
becoming clear they weren't going to reach the funding level they needed for
the 64-core version.

I forget how long that's been, but I have to admit to being really
disappointed that there hasn't been much demonstrable progress. There's still
no 64-core version of the Parallella board. There was some talk about an
update to 16-core Parallella board but I don't think it's shipped. As far as I
know there's still no way to connect multiple Parallella boards using their
"e-Link" ports. And lastly no way to assemble a small Adapteva cluster with a
more favorable ratio between the Adapteva and Zynq (as far as power, heat and
money go) than a stack of the 16-core Parallella boards.

Obviously I expected too much, and I guess I'm being a tad unfair. From what
I've seen, the dev team have always been decent, reasonable, and polite; so I really
hate to write anything negative at all. I just thought they'd be much further
along by now and that I might have a mini many-core cluster.

~~~
M8
At some stage Kickstarter was more about suggesting dreams rather than
contracts.

~~~
Quequau
Yeah, that's probably true, but I really got the feeling that when they first
published the campaign they sort of masked the fact that your average
Raspberry Pi enthusiast wasn't going to be able to buy the board and run
existing application code on the 16-core Epiphany chip. They did eventually
edit the page to clarify this a bit.

As it turned out, a few folks did appear on the forums with these kinds of
misconceptions. After all this time there weren't that many, so I probably
overestimated the number of people who signed on based on those
misunderstandings.

Having said all that, I guess I should point out that while I do have a very
skeptical view of Kickstarter as a whole, I don't view Adapteva's kickstarter
as very unethical or as a complete failure. They eventually were able to
deliver on what I see as the bare minimum of what they said they were going to
do. Though I do feel we have to be honest and acknowledge that they weren't
able to fully live up to all the claims and promises that were made... like
you said, "more about suggesting dreams than making contracts".

~~~
adapteva
Have you used the Parallella?

[https://www.kickstarter.com/projects/adapteva/parallella-
a-s...](https://www.kickstarter.com/projects/adapteva/parallella-a-
supercomputer-for-everyone/description)

Kickstarter goal (first sentence from day 1): "Making parallel computing easy
to use has been described as "a problem as hard as any that computer science
has faced". With such a big challenge ahead, we need to make sure that every
programmer has access to cheap and open parallel hardware and development
tools. Inspired by great hardware communities like Raspberry Pi and Arduino,
we see a critical need for a truly open, high-performance computing platform
that will close the knowledge gap in parallel programing. The goal of the
Parallella project is to democratize access to parallel computing. If we can
pull this off, who knows what kind of breakthrough applications could arise?"

Curious, besides being woefully late, what major promises did the Parallella
campaign NOT live up to?

~~~
semi-extrinsic
Since you're counting a streaming coprocessor as multicore, how is your
offering "democratizing access to parallel computing" when the cheapest OpenCL
capable GPU is half the price, and the programmer likely already owns one?

~~~
adapteva
Part of democratizing is access to information, access to different choices,
access to drivers.

Can GPUs do this? In my view, not a bad result considering 2014 general
availability. How long (and how many $billions) did it take for CUDA to catch
on...

[https://www.parallella.org/2015/05/25/how-the-do-i-
program-t...](https://www.parallella.org/2015/05/25/how-the-do-i-program-the-
parallella/)

In terms of cost, not really sure what you are referring to...
supercomputer.io is free to researchers, which last time I checked is less
than non-zero.

------
ChuckMcM
There was a guy at the Makerfaire who had a Parallella-based supercomputer and
a Jetson [1] based one. He felt that by the time the Parallella stuff was
running he had already surpassed it with the nVidia offering: faster and less
money per teraflop. It is an interesting space to be sure.

But how does trying to send fragments over the internet really work?

[1] [http://www.nvidia.com/object/jetson-tk1-embedded-dev-
kit.htm...](http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html)

~~~
adapteva
Yes, I know Brian's work well. His computers are awesome! Counting TFLOPS is
important for some, but for others it's the number of CPU cores that counts.
Better to compare real workloads. In the case of supercomputer.io the key will
be software packages for machine learning and smartly distributing data across
the network. There is a long history of this kind of work with folding@home
and BOINC.

------
rkwasny
This is the same problem as with other systems: you NEED an embarrassingly
parallel problem to solve.

The sequential part, via Amdahl's law, will catch up with you quickly.
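
The bound is easy to see numerically; a minimal sketch of Amdahl's law (the
parallel fractions below are illustrative, not measurements):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Maximum speedup per Amdahl's law: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# With 180,000 cores, even a 5% sequential part caps speedup near 20x;
# you need a 99.9% parallel fraction to get anywhere near 1,000x.
print(amdahl_speedup(0.95, 180_000))   # ~20
print(amdahl_speedup(0.999, 180_000))  # ~994
```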

~~~
white-flame
From my reading of the architecture, it's not just parallelism, but
_streaming_ parallelism. The individual cores really don't have memory or
cache to do "real work" on their own. In many cases, they're all competing for
memory bandwidth.

The target usage model seems to be one where the actual algorithm can be
pipelined, streaming in-flight results from one core to another. That has even
more limited applicability than embarrassingly parallel architectures, and it
is incredibly difficult to map general problems onto it while keeping the
cores busy.
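
As a toy analogy (plain Python generators, nothing Epiphany-specific), the
pipelined model amounts to chaining small stages so that each "core"
transforms in-flight data and hands it to the next:

```python
# Each stage does a little work on streaming data and passes it on,
# like cores forwarding results to their neighbors.
def scale(stream, k):
    for x in stream:
        yield x * k

def offset(stream, b):
    for x in stream:
        yield x + b

def clamp(stream, lo, hi):
    for x in stream:
        yield min(max(x, lo), hi)

# Chain the stages; values flow through one at a time.
pipeline = clamp(offset(scale(range(5), 3), -2), 0, 10)
print(list(pipeline))  # [0, 1, 4, 7, 10]
```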

------
icelancer
Signed up and donated my unused Parallella kickstarter board. Cool project!

------
imrehg
Looks like the image download is hosted on S3. Is it possible to get a direct
S3 link to it? Then one could use the "?torrent" query string trick[1] to get
a BitTorrent download; I'd be happy to seed it for a while.

[1]:
[http://docs.aws.amazon.com/AmazonS3/latest/dev/S3TorrentRetr...](http://docs.aws.amazon.com/AmazonS3/latest/dev/S3TorrentRetrieve.html)
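
For reference, the trick is just a sub-resource on the object URL: a GET with
`?torrent` appended makes S3 return a .torrent file for the object instead of
the object itself. A sketch (the bucket and key below are made up for
illustration):

```python
# Build the ?torrent URL for a public S3 object (hypothetical bucket/key).
def torrent_url(direct_s3_url: str) -> str:
    """Append S3's ?torrent sub-resource to a direct object URL."""
    return direct_s3_url + "?torrent"

url = "https://example-bucket.s3.amazonaws.com/os/image.img"  # made up
print(torrent_url(url))
# Fetch the result with any HTTP client and open it in a BitTorrent client.
```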

~~~
alexandros
Hey there, resin.io founder here (we're working on this with Adapteva). This
should be possible; let me ask around internally and get back with more
details.

~~~
imrehg
Oh, cool. Just checking out your site as well; it's a very real issue you
solve, and I have some spare Raspberry Pis to try it out.

It feels like a big change in dev thinking that I cannot ssh into the board
anymore, but it's also very interesting.

By the way, I see that you'll be supporting the SabreLite soon. Would it mean
to be able to support other i.MX6-based boards like the VIA VAB-820 [1] or the
UDOO?

[1]: [http://www.viaembedded.com/en/boards/pico-
itx/vab-820/](http://www.viaembedded.com/en/boards/pico-itx/vab-820/)

~~~
alexandros
The idea is to make deployments repeatable. While we may allow sshing in the
future, it will be for diagnostic/experimentation reasons, not for altering
the device state; so I think you understand the problem we're solving pretty
well.

We keep adding devices and will soon release a guide on how users can add
their own devices to the mix. That said, the primary determinant on whether we
can support a device is whether a yocto/openembedded BSP exists and is
relatively modern (uses a kernel above 3.8). If that exists, it's almost
certain that resin support will be relatively easy. Happy to chat more, email
in profile.

~~~
imrehg
Yeah, I wasn't really missing SSH so much as thinking "I never expected SSH to
be taken away", and actually it makes sense to have a different deployment and
management mode. :)

And yeah, by a lucky chance, the VAB-820 just had a Yocto layer released with
kernel 3.10.17 [1]. Looking forward to seeing where this is headed!

[1]: [https://github.com/viaembedded/meta-via-
vab820-bsp](https://github.com/viaembedded/meta-via-vab820-bsp)

------
adapteva
Any questions/comments? (AMA)

~~~
stonogo
When will we get a demonstration with a real interconnect? If the FLOPS/Watt
is as good as claimed, why is nobody packaging this as an actual
supercomputer?

~~~
markhahn
because small, slow computers are easy to make power-efficient. the truth is
that the processor isn't that interesting, if you're taking a lots-of-wimpy-
nodes approach (like BGQ).

~~~
adapteva
Do you have some ideas/opinions on how the industry will contain power
consumption with big cores? Current supercomputers running big brawny cores
and GPU accelerators hit 5 GFLOPS/W at best (the consensus is that we need to
get to 50 GFLOPS/W).

------
lmeyerov
Why this over say spinning up a cluster on AWS? You're paying for HW and watts
either way, and AWS already has the hardware sharing built in. For just $10,
you can get ~100 GPU spot instances for an hour, which is a ridiculous number
of cores/flops.

~~~
adapteva
Not familiar with that price point, reference? Pricing at AWS seems to be more
like $0.65/hour for one GPU?
[http://aws.amazon.com/ec2/pricing/](http://aws.amazon.com/ec2/pricing/)

Either way, the point of supercomputer.io is that it would be free (thanks to
the contribution of everyone donating cycles). Think BOINC, not AWS. This is
not for commercial use.

~~~
ac29
Spot pricing is more like $0.065/hr:

[https://aws.amazon.com/ec2/purchasing-options/spot-
instances...](https://aws.amazon.com/ec2/purchasing-options/spot-instances/)

~~~
adapteva
Mind-boggling pricing for sure, but $0.065/hr is still greater than $0/hour :-)

------
imaginenore
I'm impressed with how efficient the CPUs are. The 64-core Epiphany-IV is 50
GFLOPS/W (single precision). That beats pretty much every GPU, as far as I
know.

------
bradknowles
This site is useless if you don't have Javascript enabled.

Next!

~~~
adapteva
Direct links if you still want to participate. Just dd the card, power up, and
forget.

[http://parallellocalypse.s3-website-us-
east-1.amazonaws.com/...](http://parallellocalypse.s3-website-us-
east-1.amazonaws.com/os/resin-supercomputer-0.1.0-0.0.14-Z7010-16.img)

[http://parallellocalypse.s3-website-us-
east-1.amazonaws.com/...](http://parallellocalypse.s3-website-us-
east-1.amazonaws.com/os/resin-supercomputer-0.1.0-0.0.14-Z7020-16.img)

