They only have 31 hours left to raise the rest of the money. Considering the fact that they took several weeks to reach this point, it seems unlikely that they will hit the goal, and therefore they will not secure any of the pledged funds.
Could someone explain how this is different from GPU computing and regular multi-core CPU computing?
I realize there is a difference...but I'm not quite sure I grasp it yet. GPU computing is a lot of parallel math computations with limited shared memory. I'm assuming the Epiphany CPU is more capable than the simple GPU math units?
How's it different from multi-core CPUs? Just the sheer quantity of cores they have packed in there?
I did my master's thesis on GPGPU, so maybe I can help out a bit. I'm not yet too familiar with Epiphany's design, however. From what I could grasp, what sets it apart the most is a different memory architecture compared to multicore CPUs: the individual cores seem to be optimized for accessing adjacent memory locations as well as the memory of their direct neighbors. This is one point where the architecture seems to be similar to GPUs, although GPUs have a very different memory architecture again - for the programmer it might look similar, however, especially when using OpenCL.
The main point where Epiphany diverges from GPUs is that the individual cores are complete RISC environments. This could be a big plus when it comes to branching and subprocedure calls (although NVIDIA is catching up on the latter point with Kepler 2). On GPUs the kernel subprocedures currently all need to be inlined, and a branch means that the cores that aren't executing the current branch are just sleeping - Epiphany cores seem to be more independent in that regard. I still expect an efficient programming model for Epiphany to be along the same lines as CUDA/OpenCL, however - which is a good thing, btw. This model has been very successful in the high performance community and it's actually quite easy to understand - much easier than cache optimizing for CPU, for example.
If we compare Epiphany to a CPU, what's mainly missing is the CPU's cache architecture, hyperthreading, long pipelines per core, SSE on each core, and possibly out-of-order execution and intricate branch prediction (not sure on those last ones). The missing caches might be a bit of a problem. The memory bandwidth they specify seems pretty good to me, but from personal experience I'd add another 20-30% to the achievable bandwidth if you have a good cache (which GPUs have had since Fermi, for example). The other simplifications I actually like a lot - to me it makes much more sense to have a massively parallel system where you can just specify everything as scalar instead of jumping through all the SSE and hyperthreading hoops like on CPUs - optimizing for CPU is quite a pain compared to those new models.
Assuming you're programming it in OpenCL, it's effectively a GPU with many more SMs but with a narrower SIMD width. If they were to give it, say, 16-way predicated SIMD with incomplete IEEE compliance on par with the Cell (~4M transistors per core plus a wider internal bus), it would become a very interesting processor IMO with ~1.4 TFLOPs per 64-core epiphany board. At the very least, they'd get bought out if they built such a beast and undercut NVIDIA, AMD, and Intel. Just sayin'...
In the meantime, leave the fast atomic ops, ECC, and full IEEE compliance to the GPUs and Xeon Phis of the world until you have the transistor budget to go after them...
> If they were to give it, say, 16-way predicated SIMD
I think that would completely defeat the purpose of the architecture, as it'd massively bloat the transistor count per core. Their roadmap is for 1000+ independent cores on a single chip, not stopping at 64 per board.
And there's the problem: my personal bias from years and years of GPU programming is that I'd rather target 4 cores with 16-way SIMD than 64 cores each with scalar, or to quote Seymour Cray - "If you were plowing a field, which would you rather use: Two strong oxen or 1024 chickens?"
Besides, this is 28 nm technology and 15x15 mm, no? That's 225 mm^2. AMD's 28 nm Tahiti is 365 mm^2 with 4.3B transistors, making this thing ~2.7B transistors give or take or ~41M transistors per core. Adding 4M transistors (source: it's about 1M transistors on a Cell chip per 4-way SIMD unit) is <10% larger in exchange for 16x the floating-point power. Unless I'm missing something, I'd build that chip in a minute...
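To make the arithmetic above easy to re-run with other numbers, here is a minimal Python sketch of the same back-of-envelope estimate. It assumes, as the comment does, that 15x15 mm is the die area and that transistor density matches Tahiti's; the ~4M transistors for a 16-way SIMD unit is the comment's own extrapolation from Cell. All names in the code are made up for illustration.

```python
# Back-of-envelope transistor budget, using only the figures quoted above.
# Assumption: transistor count scales linearly with die area at 28 nm.
TAHITI_AREA_MM2 = 365.0
TAHITI_TRANSISTORS = 4.3e9
DIE_SIDE_MM = 15.0            # assumed to be die size, not package size
CORES = 64
SIMD_EXTRA_PER_CORE = 4e6     # assumed cost of 16-way SIMD per core


def estimate_transistors(area_mm2,
                         ref_area_mm2=TAHITI_AREA_MM2,
                         ref_transistors=TAHITI_TRANSISTORS):
    """Scale a reference chip's transistor count linearly by die area."""
    return ref_transistors * area_mm2 / ref_area_mm2


area = DIE_SIDE_MM ** 2
total = estimate_transistors(area)
per_core = total / CORES
overhead = SIMD_EXTRA_PER_CORE / per_core

print(f"die area:           {area:.0f} mm^2")
print(f"total transistors:  {total / 1e9:.2f}B")
print(f"per core:           {per_core / 1e6:.1f}M")
print(f"SIMD overhead:      {overhead:.1%}")
```

If the 15x15 mm figure turns out to be the package rather than the die, the per-core budget shrinks and the relative cost of the SIMD unit grows accordingly.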
Which is to say I don't want 1000+ wimpy cores - it'll get smashed by Amdahl's Law - when I can have ~900 brawny cores. NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.
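For anyone who wants to plug numbers into the Amdahl's Law point, here is a small sketch. The standard formula is speedup = 1 / ((1 - p) + p / n) for parallel fraction p and n cores; the `core_speed` parameter is an addition of this sketch (not part of the classic law) to model "wimpy" versus "brawny" cores, and the example fractions and core counts are purely illustrative.

```python
def amdahl_speedup(parallel_fraction, cores, core_speed=1.0):
    """Speedup over a single baseline core.

    parallel_fraction: share of the work that can be spread over cores (0..1).
    cores:             number of cores available.
    core_speed:        per-core speed relative to the baseline core
                       (hypothetical knob for wimpy vs brawny cores).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1 or core_speed <= 0:
        raise ValueError("cores must be >= 1 and core_speed must be > 0")
    # The serial part runs on one core; the parallel part is split evenly.
    serial_time = (1.0 - parallel_fraction) / core_speed
    parallel_time = parallel_fraction / (cores * core_speed)
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        for n in (4, 64, 1000):
            print(f"p={p:.2f} cores={n:5d} speedup={amdahl_speedup(p, n):7.2f}")
```

With 95% of the work parallelizable, the speedup stays below 20x however many cores are added, which is why the speed of the core running the serial 5% matters so much in this argument.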
This is a ludicrous argument when arguing for a GPU architecture instead. A GPU architecture gets affected far worse for many types of problems, because what is parallelizable on a system with 64 general purpose cores may degenerate to 4 parallel streams on your example 4-core, 16-way SIMD design.
There are plenty of problems that do really badly on GPU's because of data dependencies.
> when I can have ~900 brawny cores
Except you can't. Not at that transistor count, and die size, anyway.
> NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.
Have they? Really? They've targeted the embarrassingly parallel problems with their GPU's, rather than even try to address the multitude of problems that their GPU's simply will run mostly idle on, leaving that to CPU's with massive, power hungry cores and low core count. I see no evidence they've tried to address the type of problems this architecture is trying to accelerate.
Maybe the type of problem this architecture is trying to accelerate will turn out to be better served by traditional CPUs after all, but we know that problems that don't often execute the same operations on a wide data path are not well served by GPUs.
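The "64 scalar cores versus 4 cores with 16-way SIMD" comparison can be illustrated with a toy cost model. This is only a sketch under strong assumptions: lanes in a SIMD group run in lock step and the group pays for every distinct branch its lanes take, every branch costs the same, and memory effects are ignored entirely. It is not a model of any real GPU or of Epiphany.

```python
def simd_time(branch_ids, branch_cost, cores, width):
    """Toy time for `cores` SIMD cores of `width` lock-step lanes.

    Each group of `width` work items executes every distinct branch
    taken by any of its lanes, one branch after another.
    """
    per_core = [0] * cores
    groups = [branch_ids[i:i + width] for i in range(0, len(branch_ids), width)]
    for index, group in enumerate(groups):
        per_core[index % cores] += sum(branch_cost[b] for b in set(group))
    return max(per_core)


def scalar_time(branch_ids, branch_cost, cores):
    """Toy time for independent scalar cores: each pays only for its own items."""
    per_core = [0] * cores
    for index, branch in enumerate(branch_ids):
        per_core[index % cores] += branch_cost[branch]
    return max(per_core)


cost = {branch: 10 for branch in range(16)}   # hypothetical: 10 cycles per branch
uniform = [0] * 64                            # every item takes the same branch
divergent = [i % 16 for i in range(64)]       # 16 different branches per group

for name, work in (("uniform", uniform), ("divergent", divergent)):
    print(name,
          "simd 4x16:", simd_time(work, cost, cores=4, width=16),
          "scalar 64:", scalar_time(work, cost, cores=64))
```

In this model the two designs tie on uniform work, while fully divergent work makes each 16-wide group serialize all 16 branches. Real hardware sits somewhere in between, and the model says nothing about die area or power.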
That said, this is where the R&D done by AMD and NVIDIA have expanded what is amenable to running on a GPU. Specifically, instructions like vote and fast atomic ops can alleviate a lot of branching in algorithms that would otherwise be divergent. It's not a panacea, but it works surprisingly well, and it's causing the universe of algorithms that run well on GPUs to grow IMO.
What I worry about with Parallella is that by having only scalar cores, and lots of them, it has solved issues with branch divergence in exchange for potential collisions when reading and writing data in memory. The ideal balance of SIMD width versus core count is a question AMD, Intel, and NVIDIA are all investigating right now. But again, ~26M transistors - no room for SIMD...
There is certainly something to what you say. The advantage of the GPU model is that you can have the ALUs occupying a much higher percentage of your die if each core is less independent. Independent threads are not necessarily what you need on an accelerator card - that's what you have CPUs for anyway.
Why plow a field with 1024 chickens, when you can plow it with 1M worms?
The GA144's F18 core has ~20 thousand transistors, and is asynchronous, and if you make the die size the size of an Opteron, and if you wait until you can pack 20B transistors on a die, you get---one million---cores.
It's closer to regular multi-core CPU computing than GPU computing. It's general purpose cores.
What sets it apart is that the cores are tiny, with little per-core memory (though all cores can transparently access each-others memory as well as main memory), and so the architecture is well suited for scaling up the number of cores with quite low power consumption.
So for problems that can be parallelized reasonably well, but with more complex data dependencies than what a GPU is good for, this might be a good fit.
I'd put it somewhere in the middle between GPU's (for embarrassingly parallel tasks) and general purpose CPU's with high throughput per core.
Also, this looks like it'd be possible to fit in the power envelope of really small embedded systems, like e.g. cellphones and tablets....
Before more developers have these systems, it'll be hard to say how useful they'll be, but the architecture looks exciting.
That's why I supported it - I really want to see how this type of architecture can be exploited, and whether or not it'll prove to be cost effective and/or simpler to work with than GPU's for the right type of problems.
IMO this combines some of the worst features of Cell (e.g. local memory and DMA) and GPUs, and while the power efficiency is good, the absolute performance is very low. For a parallel noob who's using OpenMP/OpenCL I don't think it's any better than a desktop PC, because programming it is going to feel the same and performance is going to be equal or lower. And if you don't use the libraries then you're in low-level ninjas-only land: the extremely simple and flexible hardware is good in theory because you can use it many different ways, but it also doesn't help you or give any hints about how to properly exploit it.
It's not meant to compete with a desktop PC, or with a mass produced GPU.
It's meant to be a development platform for solutions based on their architecture and for people to get familiar with the development model, with an existing 64-core version of their chip and future versions intended to put 1000+ cores on a board as the eventual target.
That it's also a reasonably capable platform to run Linux on (on the ARM chip) so you can do development directly on the board is an added bonus.
Typical multicore CPUs don't have nearly as many cores as Parallella. Also, from what I can tell from www.apteva.com/introduction, the power consumption is much lower and the interconnect is different. In Parallella, cores are laid out in a grid and cores can only talk directly to their neighbors.
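To get a feel for what "laid out in a grid, talking directly only to neighbors" means for communication distance, here is a rough sketch that counts neighbor-to-neighbor hops on a 2D mesh. It assumes row-major core numbering and that a message moves one grid step per hop (Manhattan distance); both are assumptions made for illustration, not details taken from Adapteva's documentation.

```python
def core_position(core_id, cols):
    """Map a core number to (row, col), assuming row-major numbering."""
    return divmod(core_id, cols)


def hops(src, dst, cols):
    """Neighbor-to-neighbor hops between two cores (Manhattan distance)."""
    src_row, src_col = core_position(src, cols)
    dst_row, dst_col = core_position(dst, cols)
    return abs(src_row - dst_row) + abs(src_col - dst_col)


def average_hops(rows, cols):
    """Mean hop count over all ordered pairs of distinct cores."""
    n = rows * cols
    total = sum(hops(a, b, cols)
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))


if __name__ == "__main__":
    for side in (4, 8):
        cores = side * side
        print(f"{cores} cores ({side}x{side}): "
              f"corner-to-corner {hops(0, cores - 1, side)} hops, "
              f"average {average_hops(side, side):.2f} hops")
```

Under these assumptions the average distance roughly doubles when going from a 4x4 to an 8x8 grid, which is one reason data placement matters more as the core count grows. Note also that consecutive core numbers are not always neighbors: in a 4x4 grid, core 3 and core 4 sit at opposite ends of adjacent rows.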
It's not so much that it's a powerful computer, but a computer architecture that can scale up to be a very powerful system. The version they're trying to fund is a cost reduced version including their 16 core chip. They also have a 64 core chip, and plan to scale it much higher.
It's differentiated from GPU's in that each core is a simple but fully independent CPU core, with direct access to main system memory AND to the memory of the other cores.
This current project is most interesting as a means for people to start playing with the architecture rather than for the raw performance.
Vs Xeon Phi: Cost, complexity, power. Look at pictures of the Xeon Phi cards. They're covered in heat sink, and with a fan. For comparison, the Epiphany chips are a single tiny die with no cooling. But of course the per-core performance is not likely to be anywhere near Xeon Phi either.
I'd consider Epiphany the simple, "slow" (per core), low power solution, with Tilera somewhere in the middle, and Xeon Phi at the other extreme (complex, fast per core, high power usage).
That said, this is speculation based on reading articles - I've not had my hand on any of the three. Yet :)
I've not had the time to read up on Xeon Phi, but compared to the Tilera, the Epiphany is a considerably simpler processor. There's no MMU in the cores, instead of caches there is direct DMA control, and the on-chip network extends past the edges of the chip (that's all the I/O, there are no peripherals in the chip). It all adds up to something you can scale by mounting more of them on a board, assuming your task is sufficiently adaptable to a data flow (since the external bandwidth scales slower than the number of cores). It's not at a level where you can run a general purpose operating system with virtual memory and memory protection (though extending it for that would be fairly easy - perhaps Epiphany V?), nor does it (currently) run multiple threads per core, but this simplicity affords it a much lower power expense.
A GPU may be more similar, as those tend to have prefetch operations and no memory protection, but they are designed to have huge bunches of threads doing the exact same type of work. They look like vector processors handling between 16 and 128 identical operations per control core (each a multiprocessor). The Epiphany is mainly easier to program, but optimization is a different story (similar to the place-and-route process FPGAs need).
It's a move toward a data and control flow granularity currently not available at a price for individuals. And to make it more useful, those individuals need to try things.
I'm embarrassed to say the same. I had forgotten about this Kickstarter, so I thought this was some sort of MRI-supercomputer backup plan, like cryostasis.
Personally, I'd prefer to blame my mental auto-correct. So used to seeing poor / simple grammar mistakes on the internet, I'm in the habit of simply reading in the missing words. In this case, the title wasn't "Parallela: A Supercomputer for Everyone Who is Dying". Which makes this less interesting, but at least this could be real. And from the looks of it, it probably will be.
I'm happy to see it get more attention, but to say it is "dying" is a bit hyperbolic. Sure, the Kickstarter campaign seems like it's unlikely to meet its target.
But from the sounds of it I don't think the company behind it will just give up if that happens. I know for my part if they put up another campaign, preferably with a longer lead time, elsewhere and/or take pre-orders, I'll commit again and I'm sure a lot of the other people who signed up will too.
I think it was unfortunate that they didn't release all the material they've released in the last few days right at the beginning of the campaign, though - they'd likely have done better. They've also clearly had a hard time explaining to people what it's for, which is a pity. I don't think the 16 core version by itself is all that interesting from a performance point of view, but I'm interested in the architecture in the hope that they manage to pull off the 64 core version and larger.
EDIT: It's added $20k in the hour since I wrote this - happily it looks like it's got a good chance to succeed.
Several commenters, and the OP, seem to think that this Kickstarter will fail. Having backed quite a few Kickstarter campaigns, and watched a lot more, I think this seems unlikely.
Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed.
It seems that many people delay backing till the last minute. Possibly this is just human nature, though the Kickstarter process also means that as the project progresses more information is released in a steady stream, and often new funding levels are created.
Additionally, backers who really want the project to succeed often raise their pledges to help it get there.
You were hoping for what exactly? VHDL/Verilog for the chips? Netlists?
For most people I'd assume the main thing is that the architecture is well documented and open, as well as the board, and they have released all of the architecture documentation and a lot of other material.
As much as it'd be great to have a market in other sources for the chips, unless/until the architecture has some traction that is pretty irrelevant.
GreenArrays are intriguing, but a completely different animal from this computer. The GreenArrays compute nodes are microscopic by comparison. Think 256 bytes of storage, shared with instructions and data. If you can map your problem on to them they seem very efficient.
I'm impressed that they made it as far as they have. $612k puts them in the top tier of all Kickstarter projects, but unfortunately they look to have set their goal too high. Maybe they can pull a Clang and raise a ton of money in the last 24 hours, but I'd be surprised to see that happen. Here's hoping I'm wrong.
Maybe this is cynical, but there seems to be little reason for them to not borrow enough from friends and family to collect from Kickstarter, and pay them back immediately after. Unless the gifts are ridiculously expensive?
I think the market is telling these guys: We don't care about computing power. People are getting by with iPads and Chromebooks powered by ARM cores with 1/8 the computing power of an Intel processor.
Don't get me wrong, if you want to play around with parallel computing you should love this, and support it. Just don't be surprised when it doesn't reach Pebble funding levels.
They want to enable small scale computers to do more powerful computations. I don't think it is directly for iPads and Chromebooks; it could be useful for something like a quadcopter, where there is an algorithm that gives better stability but needs more computational power.
While I am happy to see another post for this on the front page, I would have preferred a positive post. People jump on bandwagons. I would rather we started a positive bandwagon than one looking to find the shovels and a decent grave for an awesome project.
If the current trend rate continues, they should be able to reach their goal. If they could somehow get on the Reddit front page it would easily happen. I think there are many who might be interested if they only knew.
I did my part. I'm very much the archetypal broke college student at the moment, but I won't always be. I have big plans in the Artificial Intelligence and Machine Learning sectors, and I can't imagine a better, cheaper solution to get started on working with multi-agent systems.
I desperately want to see this sort of pricing for cluster computing available in the future, when I have the scratch and knowledge necessary to make these ideas into products.
I think that future is worth skipping the occasional movie or meal to pay into, and I'm looking forward to my somewhat unexpected end of year gift.
Parallella is a dual core ARM board with an FPGA and a 16 core Epiphany CPU (full general purpose CPU cores with 32k static RAM built into each core - all the cores can access the memory of all other cores as well as system RAM). 1GB RAM total. Expected size around that of a credit card or so.
The main attraction is the Epiphany CPU, which they also have a 64 core version of. Their problem is that their current CPUs are produced using a process that gives them very low yields and very high per-CPU cost. The main goal of the bounty is to enable them to switch to a much higher yield process and bring the per-chip cost of the 16 core version down to a few dollars per chip.
Their long term roadmap is boards with 1000+ cores.
It's alive. Alive! Adapteva reached their target; right now they are at $769,996 pledged with a $750,000 goal, which was cleared on October 27th between 2 and 3 am. Not sure what (US) timezone this refers to.
Did not delve into past performance of kickstarter projects, but comments from across the net seem to confirm rrreese's comment: "Backing is concentrated very heavily in the first three days and the last three. Projects that have reached 80% of their funding goal by the last three days are extremely likely to succeed."
Canhekickit states several todo's, of which aggregates and prediction would be especially useful.
Any comments on how the funding dynamic of future Kickstarter/other crowdfunding projects would be affected if this data were available?
Argh! I get paid in a couple of days! I was hoping the cash would go into my account before this runs out. Looks like that might not happen now....
Why couldn't you have given us another few days?
Oh well. I bet it'll get funded.
Video games are exactly where heterogeneous computers like this have flourished. But they don't have a game to market it yet - historically it hasn't even mattered much if that initial game used the platform anywhere near well. How did the Ouya sell, for instance? (It really resembles a Nexus 7 with a broken screen.)