
The 64-core Parallella is alive - matt42
http://www.adapteva.com/parallella-kickstarter/the-64-core-parallella-is-alive/
======
fragsworth
They are making a _critical_ mistake here by not letting users pre-order the
next batch. It says "Sold Out" and you can't do anything else. They could make
hundreds or thousands of sales over the next day or two due to their free
launch publicity, but they're fucking blocking everyone from paying for it.

Most of the folks like me who _would have bought something today_ while
reading about it will just forget about it later. These guys are missing out
on a huge opportunity.

~~~
runeks
> They are making a critical mistake here by not letting users pre-order the
> next batch.

It's possible that we can thank the FTC for this:

 _The Federal Trade Commission’s (FTC’s) Mail or Telephone Order Rule covers
all merchandise ordered by mail, phone, over the internet, or via the fax
machine. It stipulates that, if a merchant does not promise a specific
delivery time, the merchandise ordered must be delivered within 30 days of the
merchant’s receipt of the order (or the date merchandise is charged to your
credit card). If the company is unable to ship within the promised time, the
company must give the buyer the choice of agreeing to the delay or canceling
the order and receiving a prompt refund. However, if you are applying for
credit to pay for your purchase and a company doesn't promise a shipping
time, the company has 50 days to ship after receiving your order._

[http://www.hcs.harvard.edu/~scas/wp/wordpress/?page_id=24](http://www.hcs.harvard.edu/~scas/wp/wordpress/?page_id=24)

In other words, in the United States, it is not legal to take pre-orders and
incur a delay without offering customers their money back. So 100,000
pre-orders could come in, and in case of a delay (since they are forced to
offer a refund) they could potentially lose all of their funding.

Imagine ordering parts for 100,000 boards and having 50% of your customers
take their money and run, in case of a delay. That's a fairly unmanageable
risk.

~~~
ensignavenger
Set the delivery date with plenty of extra padding, and have your software
move the date forward as more pre-orders come in.

~~~
leo_santagada
^- That. Simple, and it lets users choose.
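A minimal sketch of ensignavenger's suggestion, assuming a fixed production rate and first-come, first-served fulfilment. The names `promised_ship_date`, `units_per_week` and `padding_weeks` are hypothetical, not anything Adapteva actually uses; the point is only that the quoted date is padded and moves later as the queue grows.

```python
from datetime import date, timedelta


def promised_ship_date(today: date, units_already_ordered: int,
                       units_per_week: int, padding_weeks: int = 4) -> date:
    """Conservative ship date to quote for the next pre-order.

    Hypothetical model: orders are filled first-come, first-served at a fixed
    weekly production rate, and a fixed padding absorbs schedule slips.
    """
    if units_per_week <= 0:
        raise ValueError("units_per_week must be positive")
    # The new order is unit number units_already_ordered + 1 in the queue;
    # ceiling division gives the production weeks until it is built.
    position = units_already_ordered + 1
    weeks_needed = -(-position // units_per_week)
    return today + timedelta(weeks=weeks_needed + padding_weeks)


# Usage: quotes only move later as the queue grows.
start = date(2014, 1, 1)
print(promised_ship_date(start, 0, 1000))     # 2014-02-05
print(promised_ship_date(start, 1000, 1000))  # 2014-02-12
```

Since the quoted date never moves earlier as orders accumulate, each buyer is promised a specific, already-padded date; under the rule quoted above, it is missing the promised date (or 30 days when none is promised) that triggers the refund option.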

------
awda
If you haven't heard of Parallella before (like me) and want to learn more:
[http://www.parallella.org/](http://www.parallella.org/)

Edit: Ok, here's my quick summary. Please correct me if I'm wrong:

This looks like a small PCB (Raspberry Pi-alike) that hosts the main
attraction: a 16- or 64-core Epiphany coprocessor, as well as an ARM CPU to
run the OS. Not sure how these compare in performance to other coprocessors
(GPUs with OpenCL?). Power draw seems low (5 W). Would love to read more about
the architecture, why Epiphany processors are special, etc.

~~~
vidarh
It's a dual ARM core Zynq SoC + the Epiphany. The Zynq has an on-die FPGA.
Part of the FPGA is used for "glue" for the Epiphany, but you can update the
FPGA config as well, with some care.

The main CPU is substantially faster than the Pi, but it doesn't have HW
accelerated graphics, so it's not a speed demon for desktop/workstation-type
use.

As for the Epiphany, assume that it'll be slower than most GPUs for tasks
that GPUs are good at. That is, if you can make do with few instruction
streams, the Epiphany is not well suited to the job, as most GPUs will blow the
current chips out of the water in terms of performance.

If, on the other hand, your problem is poorly suited for GPUs due to lots of
independent instruction streams, it may be better suited.
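A toy cost model of that trade-off, deliberately ignoring memory, clock speed and the much wider SIMD groups of real GPUs: in a lockstep SIMD group every path taken by any lane is executed by the whole group, while independent cores each run only their own path. The function names and cycle counts are illustrative assumptions.

```python
def simd_cycles(paths_taken, path_cost):
    """Lockstep SIMD group: each distinct path taken by any lane is run by the
    whole group in turn (lanes not on that path are masked off), so costs add up."""
    return sum(path_cost[p] for p in set(paths_taken))


def mimd_cycles(paths_taken, path_cost):
    """Independent cores: each core runs only its own path, and the group is
    done when the slowest core finishes."""
    return max(path_cost[p] for p in paths_taken)


cost = {"a": 10, "b": 10, "c": 10, "d": 10}   # illustrative cycle counts
uniform = ["a"] * 16                          # every lane/core takes the same branch
divergent = ["a", "b", "c", "d"] * 4          # branches split four ways
print(simd_cycles(uniform, cost), mimd_cycles(uniform, cost))      # 10 10
print(simd_cycles(divergent, cost), mimd_cycles(divergent, cost))  # 40 10
```

With all 16 lanes on the same path both models cost the same; once the lanes split evenly over four paths, the SIMD group pays four times as much while the independent cores still finish in one path's time.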

One of the most interesting aspects of the Epiphany is that it can also be
connected into a grid: each chip has four high-speed links that can be
connected to other Epiphany chips, or be used to interface with the main CPU
or off-chip memory.

The cores can all access each other's memory without any special instructions,
including that of the cores on other Epiphany chips hooked up via the external
links; the only difference between in-core and out-of-core memory access is
the speed.

~~~
MichaelGG
If you set up a grid, how is cache coherency handled and what's the impact on
performance?

~~~
watmough
I'd imagine it works like a Transputer, and the only accessible memory is
local to the chip, with communications including data coming down the serial
links.

~~~
vidarh
The chips have a flat address space that covers any Epiphany chips that have
been interconnected via the serial links, and the main memory of the CPU.

You can address core-local memory, memory in another core, or main memory the
same way - the difference is speed.

There's no cache, and you're responsible for avoiding race conditions in
memory access yourself.
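To make the flat address space concrete, here's a sketch of how a global address is formed, assuming the layout described in Adapteva's Epiphany Architecture Reference: the top 12 bits of a 32-bit address are the core ID (6-bit mesh row, 6-bit mesh column) and the low 20 bits are an offset into that core's 1 MB window. Treat the exact bit positions, and the row/column values in the example, as assumptions to check against that document.

```python
ROW_SHIFT = 26          # bits 31..26: mesh row (assumed layout)
COL_SHIFT = 20          # bits 25..20: mesh column (assumed layout)
OFFSET_MASK = 0xFFFFF   # bits 19..0: offset within the core's 1 MB window


def global_address(row: int, col: int, offset: int) -> int:
    """Build a 32-bit global address for `offset` inside core (row, col)."""
    if not (0 <= row < 64 and 0 <= col < 64):
        raise ValueError("row and col must each fit in 6 bits")
    if not (0 <= offset <= OFFSET_MASK):
        raise ValueError("offset must fit in 20 bits")
    return (row << ROW_SHIFT) | (col << COL_SHIFT) | offset


def split_address(addr: int) -> tuple:
    """Inverse of global_address: recover (row, col, offset)."""
    return (addr >> ROW_SHIFT) & 0x3F, (addr >> COL_SHIFT) & 0x3F, addr & OFFSET_MASK


# Hypothetical core at mesh row 32, column 8.
addr = global_address(32, 8, 0x100)
print(hex(addr))            # 0x80800100
print(split_address(addr))  # (32, 8, 256)
```

Because one pointer format covers this core, its neighbours and (via the links) other chips, ordinary loads and stores work everywhere; only latency changes. With no cache, nothing keeps two cores' views consistent for you, so shared words need explicit synchronisation.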

------
Everlag
I was informed a few days ago that my 16-core Parallella has shipped. When I
ordered, I had hoped it would come earlier in the year, before exams, but the
fact that it shipped at all (several Kickstarters that did not deliver have
made me wary) has me ecstatic to hold it in my hands.

I have a great amount of respect for the Parallella team: to be able to
kickstart a custom chip, that promises very interesting applications, and
deliver it within several months of the estimated delivery date with the
setbacks they have had is absolutely astonishing to me. While I can't comment
on the quality of the final product yet, I would say that they know how to run
an excellent campaign.

~~~
teh_klev
Mine shipped a few days ago as well. Sadly they seemed to have completely
ignored my address change request and the board is shipping to my old address
in another country. Grrr.

~~~
wsh91
They did the same thing to me, too. I live in California and my board has
arrived safely in DC, where I used to live. Unbelievable, given how far the
timeline slipped from what they originally set out.

------
avmich
> the 64-core Parallella is still setting the standard in terms of energy
> efficiency. In fact, it could be argued that it’s the most efficient
> computer in the world today

I'd be curious if it beats GreenArrays
([http://www.greenarraychips.com/](http://www.greenarraychips.com/)) numbers
of picojoules per operation. I wonder if those numbers are published for
Parallella?

~~~
daniel-cussen
I work a lot on GreenArrays, and I highly, highly doubt this claim. An F18
core has 1152 bits of SRAM on it, much less than these cores, which I believe
I read have 32 KB. Moreover, while the Parallella is clocked (like almost
every computer out there) the F18 is asynchronous.

From reading the Parallella docs, it looks like that chip runs at 5 W on a
"typical workload" while the GA144 runs at .25 W at an absolute theoretical
maximum, for a 20x difference in energy consumption.

[http://www.parallella.org/board/](http://www.parallella.org/board/)

~~~
solarexplorer
The Parallella supports floating point, though. I don't know how fast the
GreenArrays run, but a factor of 20 doesn't seem impossible for a floating
point intensive program. More on-chip memory can be beneficial too, if it
helps to avoid accesses to off-chip memory. Of course, the unqualified claim
from Parallella is pretty useless…

~~~
daniel-cussen
Parent asked about energy consumption, not speed.

You can implement floating point in software (I'm doing just that, with 6
cores performing Karatsuba-3 multiplication of 54-bit elements), but that is
quite a bit slower than hardware double-precision multiplication (which the
Parallella boards lack too). Software FP multiplication will likewise be
slower than hardware FP multiplication, which the Parallella does have.
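For readers who haven't met it: Karatsuba multiplication computes the product of two split numbers with three half-width multiplications instead of four. The sketch below is a plain one-level, two-way split of unsigned 54-bit operands into 27-bit halves; the comment doesn't say how its "Karatsuba-3" variant splits the operands, so this is the textbook version, not daniel-cussen's code. Python integers also hide the word-size issues an 18-bit F18 core would face.

```python
HALF_BITS = 27                    # split 54-bit operands into 27-bit halves
HALF_MASK = (1 << HALF_BITS) - 1


def karatsuba_54(x: int, y: int) -> int:
    """Multiply two unsigned 54-bit integers with three half-width products."""
    if not (0 <= x < (1 << 54) and 0 <= y < (1 << 54)):
        raise ValueError("operands must be unsigned 54-bit integers")
    x_hi, x_lo = x >> HALF_BITS, x & HALF_MASK
    y_hi, y_lo = y >> HALF_BITS, y & HALF_MASK
    z2 = x_hi * y_hi                               # product of high halves
    z0 = x_lo * y_lo                               # product of low halves
    # One extra product yields both cross terms: x_hi*y_lo + x_lo*y_hi.
    z1 = (x_hi + x_lo) * (y_hi + y_lo) - z2 - z0
    return (z2 << (2 * HALF_BITS)) + (z1 << HALF_BITS) + z0


a, b = 123456789012345, 987654321098765
print(karatsuba_54(a, b) == a * b)   # True
```

The saving is one multiplication per level in exchange for a few extra additions, which pays off when multiplication is the expensive operation, as it is when done in software.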

~~~
solarexplorer
Sure, you can do floating point in software. My point is that dedicated
hardware is very likely more power efficient at it. The same goes for memory:
accesses to off-chip memory cost much more energy than accesses to on-chip
memory. It would be very interesting to get energy and performance numbers for
some real world application running on both chips.
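The quantity being asked about is energy per operation, which is power divided by throughput, so a power figure alone (5 W vs 0.25 W) can't settle the comparison. A minimal sketch; the figures in the usage lines are purely hypothetical, not measurements of either chip.

```python
def picojoules_per_op(power_watts: float, ops_per_second: float) -> float:
    """Energy per operation: power (J/s) divided by throughput (ops/s), in pJ."""
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    return power_watts / ops_per_second * 1e12


# Purely hypothetical figures, not measurements of either chip.
print(picojoules_per_op(5.0, 1e11))   # about 50 pJ/op
print(picojoules_per_op(0.25, 1e9))   # about 250 pJ/op
```

With these made-up numbers the lower-power chip spends more energy per operation, which is why real throughput figures for a shared workload are needed before calling either one more efficient.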

------
wcchandler
Pardon the ignorance but are all 64 cores available to the OS -- as in, if I
run htop, will I see 64 little bars at the top of the terminal? I would think
not if I'm understanding this architecture correctly...

~~~
wmf
I don't think an OS is intended to run on the Parallella chip at all; it's more
of an accelerator.

~~~
tostitos1979
Dumb question. How is this different from the cell processor in the ps3?

~~~
vidarh
The Cell SPEs are not as general purpose. They're SIMD processors (single
instruction, multiple data), and don't have transparent access to host memory
or the other cores (for some of the Parallella demos, you can exit the main
program and watch the Epiphany cores continue to DMA data straight to the
frame buffer).

They're more similar to a GPU than to the Epiphany. Each SPE is more powerful
in terms of GFLOPS, but the Epiphany cores offer more independent instruction
streams. If your problem is basically well suited for a GPU (easy to
vectorize), chances are it will do better on a Cell than on the current
Epiphany chips. If your problem has lots of independent branching, the
Epiphany stands a better chance.

------
brucehart
I'm looking forward to getting mine. I ordered it back in October and they
recently said the Zynq-7010-based boards would ship in mid-May. I wish they
had been more up front about the delays. There were long periods with no
communication from the company. I didn't order through Kickstarter but
directly from Adapteva.

~~~
JoelHobson
What do you intend to use it for? Is there a hobbyist application for these?

~~~
brucehart
I am interested in parallel computation and computer architectures and thought
it would be fun to experiment with. I have worked on some projects developing
signal processing algorithms for large GPU arrays and I want to see how the
Parallella compares. I plan on implementing a real-time audio watermarking
algorithm and some image/video processing algorithms.

~~~
fit2rule
I'd love to see someone design a unique synthesizer (audio) algorithm for
these things - hope you'll keep HN updated with any progress you make in that
direction.

------
bitL
Excellent! Congratulations! Can't wait to get my hands on the 64 core version!
:-)

------
damian2000
I was hoping to buy a few of their 16-core model for a commercial application,
and have been in touch with Adapteva but got no reply. It seems like they
really don't have the ability to supply the demand for their product.

------
izietto
What do you think about this + Haskell for a video encoder/decoder? I'm a
little ignorant about media processing, but I'd guess it scales across
processors given the right encoder/decoder. Am I wrong?
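The usual way to spread video work over many cores is to split each frame into independently processed regions (HEVC has tiles and H.264 has slices for this). Below is only the partitioning step, a hypothetical helper mapping an 8x8 core grid onto a 1920x1080 frame; real encoders also have dependencies between regions (motion search, entropy coding) that limit how independent the tiles really are.

```python
def tile_bounds(width: int, height: int, grid_rows: int, grid_cols: int) -> list:
    """Split a width x height frame into grid_rows x grid_cols rectangles,
    one per core, as half-open (x0, y0, x1, y1) bounds in row-major order."""
    tiles = []
    for r in range(grid_rows):
        y0, y1 = r * height // grid_rows, (r + 1) * height // grid_rows
        for c in range(grid_cols):
            x0, x1 = c * width // grid_cols, (c + 1) * width // grid_cols
            tiles.append((x0, y0, x1, y1))
    return tiles


tiles = tile_bounds(1920, 1080, 8, 8)   # hypothetical 8x8 grid, one tile per core
print(len(tiles), tiles[0])             # 64 (0, 0, 240, 135)
```

Integer division keeps the tiles contiguous and non-overlapping even when the frame size isn't a multiple of the grid size.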

------
dnautics
As soon as Julia gets running on ARM, this is going to make for some monster
high-level scientific programming kit.

------
markvdb
Clueless newbie here.

I see this thing does OpenCL on a completely different architecture. Recent
Tesseract OCR versions support OpenCL. Will I be able to run Tesseract on
this thing? Would I even want to?

~~~
foxhill
In principle, yes, but the Parallella board isn't exactly the fastest of all
the embedded boards.

Also, OpenCL apparently isn't the preferred programming model for the
Epiphany, so performance won't be as good as bare-metal (but that is
practically a tautology for any OpenCL device...)

------
kefka
As a theoretical pie-in-the-sky question, what is the minimum amount of energy
required to do an operation?

~~~
jacquesm
If you are willing to reverse it the amount of energy can be 0 or a close
approximation.

See:
[http://en.wikipedia.org/wiki/Reversible_computing](http://en.wikipedia.org/wiki/Reversible_computing)

Note that this is the computing equivalent of a straight line.
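For the irreversible case, the usual reference point is Landauer's principle: erasing one bit of information at temperature T dissipates at least k_B T ln 2. Taking room temperature as T = 300 K:

```latex
E_{\min} = k_B T \ln 2
         \approx (1.380649 \times 10^{-23}\,\mathrm{J/K}) \times (300\,\mathrm{K}) \times 0.693
         \approx 2.87 \times 10^{-21}\,\mathrm{J}
         \approx 0.018\,\mathrm{eV}
```

That is roughly 3 × 10^-9 pJ, many orders of magnitude below the picojoules-per-operation figures mentioned above. Reversible computing sidesteps this bound because it avoids erasing information in the first place.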

~~~
kefka
Would that be related to the reversible computing I heard about a few years
ago, in terms of reserving the energy?

~~~
jacquesm
Recovering would be a better term I think. You could read the wikipedia
article to see if it is what you remembered.

------
andyl
Is Parallella alive? I put in an order more than 18 months ago and still
nothing.

Related question: is Erlang running on Parallella yet?

I remain very interested - 64 or even 16 cores on a small form factor would be
incredible.

~~~
acomjean
I got mine today.

I don't have the video cable yet and still have to attach the heat sink.
They'd really like you to have a fan, so I have some work to do assembling one.

[http://www.parallella.org/quick-start/](http://www.parallella.org/quick-start/)

~~~
vidarh
It does work with just the heat sink, but if you want to keep it on for
extended periods, the fan is probably a good idea.

I can recommend this kit for a case/fan:
[http://shop.abopen.com/](http://shop.abopen.com/). That fan is enough to
keep everything feeling cool to the touch.

The design is on GitHub too, if you want to do your own (and if nothing else,
the instructions show you how to hook up a 5V fan directly to the board).

It takes some assembly and very light soldering, but nothing more than that:
you can get away with any crappy soldering iron and, ideally, a pair of wire
cutters (but scissors will do).

~~~
acomjean
Very useful, especially the 5V on the board. Might have missed that. Thanks.

