EDIT: Realizing that mods might change the title at any time, it is right now "Parallella: A Supercomputer For Everyone is Dying"
EDIT: $624,602 44 minutes after your post.... I'm getting more optimistic.
My anecdotal experience backing and following half a dozen projects agrees with this.
I'd be quite surprised if this project doesn't make it.
I realize there is a difference...but I'm not quite sure I grasp it yet. GPU computing is a lot of parallel math computations with limited shared memory. I'm assuming the Epiphany CPU is more capable than the simple GPU math units?
How's it different from multi-core CPUs? Just the sheer quantity of cores they have packed in there?
The main point where Epiphany diverges from GPUs is that the individual cores are complete RISC environments. That should mainly be a big plus when it comes to branching and subprocedure calls (although NVIDIA is catching up on the latter point with Kepler 2). On GPUs, kernel subprocedures currently all need to be inlined, and branches mean that the cores which aren't executing the current branch just sleep - Epiphany cores seem to be more independent in that regard. I still expect an efficient programming model for Epiphany to be along the same lines as CUDA/OpenCL, however - which is a good thing, btw.: that model has been very successful in the high-performance community, and it's actually quite easy to understand - much easier than cache-optimizing for a CPU, for example.
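To make the branching point concrete, here's a rough CUDA-style sketch (my own example, not anything from Adapteva's material) of the kind of kernel where SIMT divergence hurts:

    // Each thread handles one element. Within a 32-thread warp, threads that
    // take different sides of the branch execute serially: while one side
    // runs, the lanes on the other side are masked off and sit idle.
    __global__ void divergent(const float *x, float *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;
        if (x[i] > 0.0f)
            y[i] = sqrtf(x[i]);   // lanes with x[i] <= 0 idle here
        else
            y[i] = 0.0f;          // lanes with x[i] > 0 idle here
    }

On fully independent cores, each core would just take its own path with nothing masked off - that's the advantage being claimed for Epiphany here.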
If we compare Epiphany to a CPU, what's mainly missing is the CPU's cache hierarchy, hyperthreading, long pipelines per core, SSE on each core, and possibly out-of-order execution and intricate branch prediction (not sure about those last ones). The missing caches might be a bit of a problem. The memory bandwidth they specify seems pretty good to me, but from personal experience I'd add another 20-30% to the achievable bandwidth if you have a good cache (which GPUs have had since Fermi, for example). The other simplifications I actually like a lot - to me it makes much more sense to have a massively parallel system where you can just specify everything as scalar, instead of jumping through all the SSE and hyperthreading hoops like on CPUs - optimizing for a CPU is quite a pain compared to those newer models.
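As a rough illustration of the "specify everything as scalar" point (again my own sketch, assuming a CUDA-style file that mixes a device kernel with a host SSE routine, purely for comparison): the same element-wise add written once per thread in the scalar model, and once with explicit SSE intrinsics on the CPU:

    #include <immintrin.h>

    // Scalar model: one element per thread; the hardware supplies the parallelism.
    __global__ void add_kernel(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    // CPU version of the same loop: explicit 4-wide loads/adds/stores,
    // plus a scalar tail for the leftover elements.
    void add_sse(const float *a, const float *b, float *c, int n)
    {
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
        }
        for (; i < n; ++i)
            c[i] = a[i] + b[i];
    }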
In the meantime, leave the fast atomic ops, ECC, and full IEEE compliance to the GPUs and Xeon Phis of the world until you have the transistor budget to go after them...
All IMO of course...
I think that would completely defeat the purpose of the architecture, as it'd massively bloat the transistor count per core. Their roadmap is for 1000+ independent cores on a single chip, not stopping at 64 per board.
Besides, this is 28 nm technology and 15x15 mm, no? That's 225 mm^2. AMD's 28 nm Tahiti is 365 mm^2 with 4.3B transistors, which would make this thing ~2.7B transistors give or take, or ~41M transistors per core. Adding 4M transistors per core (source: a 4-way SIMD unit costs about 1M transistors on a Cell chip) makes each core <10% larger in exchange for 16x the floating-point power. Unless I'm missing something, I'd build that chip in a minute...
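Spelling out that back-of-the-envelope arithmetic (it obviously stands or falls with the 225 mm^2 die-size assumption):

    \frac{225\,\mathrm{mm}^2}{365\,\mathrm{mm}^2} \times 4.3 \times 10^9 \approx 2.7 \times 10^9 \text{ transistors}
    \frac{2.7 \times 10^9}{64} \approx 4.1 \times 10^7 \text{ transistors per core}
    \frac{4 \times 10^6}{4.1 \times 10^7} \approx 10\% \text{ extra area for four 4-way SIMD units per core}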
Which is to say, I don't want 1000+ wimpy cores - it'll get smashed by Amdahl's Law - when I can have ~900 brawny cores instead. NVIDIA and AMD have been exploring this space for almost a decade now, and starting over without considering what they may have gotten right, and what they learned while doing so, seems a little daft to me.
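For reference, Amdahl's law with parallel fraction p on N cores (the 95% figure below is just an illustrative assumption of mine):

    S(N) = \frac{1}{(1 - p) + p / N}
    p = 0.95: \quad S(64) \approx 15.4, \qquad S(1000) \approx 19.6

i.e. once the serial fraction dominates, piling on wimpy cores buys very little, and the serial part itself runs faster on a brawny core.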
You're assuming problems that are suitable for SIMD. If you have problems suitable for SIMD, use a GPU. Lots of problems are NOT suitable for SIMD.
If those 64 data streams all happen to require branches regularly, for example, your 4x 16-way SIMD is going to be fucked.
> Besides, this is 28 nm technology and 15x15 mm, no?
Where did you get that idea? Their site states 2.05mm^2 at 28nm for the 16 core version. 0.5mm^2 per core.
So by your math, more like ~26M transistors, or ~1.6M per core. Your estimated die size is 70% larger than what they project for their future 1024 core version...
> it'll get smashed by Amdahl's Law
This is a ludicrous argument when arguing for a GPU architecture instead. A GPU architecture is affected far worse for many types of problems, because what is parallelizable on a system with 64 general-purpose cores may degenerate to 4 parallel streams on your example 4-core, 16-way SIMD.
There are plenty of problems that do really badly on GPUs because of data dependencies.
> when I can have ~900 brawny cores
Except you can't. Not at that transistor count, and die size, anyway.
> NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.
Have they? Really? They've targeted the embarrassingly parallel problems with their GPUs, rather than even try to address the multitude of problems that their GPUs simply run mostly idle on, leaving those to CPUs with massive, power-hungry cores and low core counts. I see no evidence they've tried to address the type of problems this architecture is trying to accelerate.
Maybe the type of problem this architecture is trying to accelerate will turn out to be better served by traditional CPUs after all, but we know that problems that don't spend most of their time executing the same operations across a wide data path are not well served by GPUs.
That said, this is where the R&D done by AMD and NVIDIA has expanded what is amenable to running on a GPU. Specifically, instructions like vote and fast atomic ops can alleviate a lot of branching in algorithms that would otherwise be divergent. It's not a panacea, but it works surprisingly well, and it's causing the universe of algorithms that run well on GPUs to grow, IMO.
What I worry about with Parallella is that by having only scalar cores, and lots of them, it has traded the branch-divergence problem for potential collisions when reading data from and writing data to memory. The ideal balance of SIMD width versus core count is a question AMD, Intel, and NVIDIA are all investigating right now. But again, ~26M transistors - no room for SIMD...
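A rough sketch of the vote/atomics trick mentioned above (my own example using CUDA's warp-vote intrinsic and atomicAdd, not anyone's production code):

    // Counts elements above a threshold. The warp vote lets a whole warp skip
    // the branch together when none of its lanes has a hit, so the branch
    // doesn't diverge for that warp; the hardware atomic handles the
    // concurrent updates. Assumes the block size is a multiple of 32.
    __global__ void count_hits(const float *x, int n, float threshold, int *count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        bool hit = (i < n) && (x[i] > threshold);

        if (__any_sync(0xffffffffu, hit)) {
            if (hit)
                atomicAdd(count, 1);
        }
    }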
The GA144's F18 core has ~20 thousand transistors and is asynchronous. Make the die the size of an Opteron's, wait until you can pack 20B transistors on a die, and you get - one million - cores.
But it's way better than this monstrosity: http://web.media.mit.edu/~bates/Summary_files/BatesTalk.pdf
What sets it apart is that the cores are tiny, with little per-core memory (though all cores can transparently access each other's memory as well as main memory), so the architecture is well suited to scaling up the number of cores at quite low power consumption.
So for problems that can be parallelized reasonably well, but with more complex data dependencies than what a GPU is good for, this might be a good fit.
I'd put it somewhere in the middle between GPUs (for embarrassingly parallel tasks) and general-purpose CPUs with high throughput per core.
Also, this looks like it could fit in the power envelope of really small embedded systems, like cellphones and tablets.
Before more developers have these systems, it'll be hard to say how useful they'll be, but the architecture looks exciting.
That's why I supported it - I really want to see how this type of architecture can be exploited, and whether or not it'll prove to be cost effective and/or simpler to work with than GPUs for the right type of problems.
It's meant to be a development platform for solutions based on their architecture, and a way for people to get familiar with the development model. There's an existing 64-core version of their chip, and future versions intended to put 1000+ cores on a board are the eventual target.
That it's also a reasonably capable platform to run Linux on (on the ARM chip) so you can do development directly on the board is an added bonus.
We'll find out soon enough.
Software router, possibly.
It just turns out to be a Kickstarter project for a powerful computer.
It's differentiated from GPUs in that each core is a simple but fully independent CPU core, with direct access to main system memory AND to the memory of the other cores.
This current project is most interesting as a means for people to start playing with the architecture rather than for the raw performance.
I'd consider Epiphany the simple, "slow" (per core), low power solution, with Tilera somewhere in the middle, and Xeon Phi at the other extreme (complex, fast per core, high power usage).
That said, this is speculation based on reading articles - I've not had my hands on any of the three. Yet :)
Personally, I'd prefer to blame my mental auto-correct. I'm so used to seeing simple grammar mistakes on the internet that I'm in the habit of just reading in the missing words. In this case, the title wasn't "Parallella: A Supercomputer for Everyone Who is Dying". Which makes this less interesting, but at least this could be real. And from the looks of it, it probably will be.
I can't speak for the other 1800 people in my bin, but I just decided on two.
But from the sounds of it, I don't think the company behind it will just give up if that happens. For my part, if they put up another campaign elsewhere (preferably with a longer lead time) and/or take pre-orders, I'll commit again, and I'm sure a lot of the other people who signed up will too.
I think it was unfortunate that they didn't release all the material they've put out in the last few days right at the beginning of the campaign, though - they'd likely have done better. They've also clearly had a hard time explaining to people what it's for, which is a pity. I don't think the 16-core version by itself is all that interesting from a performance point of view, but I'm interested in the architecture in the hope that they manage to pull off the 64-core version and larger.
EDIT: It's added $20k in the hour since I wrote this - happily it looks like it's got a good chance to succeed.
Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed.
It seems that many people delay backing till the last minute. Possibly this is just human nature, though the Kickstarter process also means that as the project progresses more information is released in a steady stream, and often new funding levels are created.
Additionally, backers who really want the project to succeed raise their pledges to help push it over the line.
Does Kickstarter explicitly prohibit such things?
> FAQ: Will you open source the Epiphany chips?
> Not initially, but it may be considered in the future.
Well, that makes it a lot less interesting than I hoped it would be.
For most people I'd assume the main thing is that the architecture is well documented and open, as well as the board, and they have released all of the architecture documentation and a lot of other material.
As much as it'd be great to have a market in other sources for the chips, unless/until the architecture has some traction that is pretty irrelevant.
Here is the trend graph:
Source: http://canhekick.it/projects/adapteva/parallella-a-supercomp... (13,700 projects graphed). Great project, Daniel.
I did not delve into the past performance of Kickstarter projects, but comments from across the net seem to confirm rrreese's comment: "Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed."
Canhekickit lists several to-dos, of which aggregates and prediction would be especially useful.
Any comments on how the funding dynamics of future Kickstarter/other crowdfunding projects would be affected if this data were available?
They're selling something that's really cool and a lot of people want to see succeed, but for which there aren't any software applications to take advantage of yet.
That's the type of project where a lot of people could donate at the end so that it succeeds. As opposed to, say, a game, where there's a ton at the beginning and then it slowly trails off.
I desperately want to see this sort of pricing for cluster computing available in the future, when I have the scratch and knowledge necessary to make these ideas into products.
I think that future is worth skipping the occasional movie or meal to pay into, and I'm looking forward to my somewhat unexpected end of year gift.
(Edit: Written in response to the title.)
Parallella is a dual-core ARM board with an FPGA and a 16-core Epiphany CPU (full general-purpose CPU cores with 32k of static RAM built into each core - all the cores can access the memory of all the other cores as well as system RAM). 1GB RAM total. Expected size around a credit card or so.
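To make that memory model concrete, here's a hypothetical sketch of what "every core can write into every other core's SRAM" looks like in practice. This is based on my reading of the architecture reference (mesh coordinates in the top address bits), not the real eSDK API, and the coordinates/offset are made-up example values:

    #include <stdint.h>

    /* Epiphany-style flat addressing, as I understand it: bits [31:26] = mesh
     * row, [25:20] = mesh column, [19:0] = offset into that core's local SRAM. */
    static inline volatile float *remote_ptr(uint32_t row, uint32_t col,
                                             uint32_t offset)
    {
        uint32_t addr = (row << 26) | (col << 20) | offset;
        return (volatile float *)(uintptr_t)addr;
    }

    void push_result_to_neighbor(float result)
    {
        /* A plain pointer write lands straight in another core's SRAM; no
         * explicit message-passing or DMA call is needed for small transfers. */
        *remote_ptr(32, 9, 0x4000) = result;
    }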
The main purpose is the Epiphany CPU, which they also have a 64-core version of. Their problem is that their current CPUs are produced using a process that gives them very low yields and very high per-CPU cost. The main goal of the campaign is to enable them to switch to a much higher-yield process and bring the per-chip cost of the 16-core version down to a few dollars.
Their long term roadmap is boards with 1000+ cores.