
Intel takes wraps off 50-core supercomputing coprocessor plans - evo_9
http://arstechnica.com/#!/business/news/2011/06/intel-takes-wraps-off-of-50-core-supercomputing-coprocessor-plans.ars
======
juiceandjuice
After attending six different talks about GPUs from various researchers today,
it's pretty clear that most of the people I've talked to just use CUDA. The
general consensus was that OpenCL has a lot of setup overhead and lower
performance despite being more flexible, and NVIDIA is much more active in
actually courting researchers, often handing out machines for free.

Some of the speedups, especially for quantum chemistry and other
"embarrassingly parallel" computations, are pretty nuts.

~~~
malkia
I guess the expectation is that you can find OpenCL for general-purpose CPUs,
and it's supported by more vendors: AMD, Intel, IBM, etc.

------
microarchitect
CUDA (and GPUs in general) seem to perform well only with SIMD workloads. I
think Larrabee is likely to be more flexible than GPUs at the cost of a little
SIMD performance. I also think the advantage of being on 22nm with the FinFET
process will prove to be significant because NVIDIA and co. are likely to be
stuck on a conventional 28nm process for a while.

It has the potential to be a good part.

~~~
justincormack
You still need some SIMD to get the performance numbers talked about, as it
has 512-bit-wide extended SSE registers.

------
aaronblohowiak
How does memory coherence work with 50 processors?

~~~
beza1e1
The last paragraph mentions a single bus (instead of tiles), so they can use a
bus sniffing cache coherence protocol (MOESI etc). No problem here.

As far as I know, the frontier is at 60-80 cores, where bus sniffing gets too
expensive. However, I'd love to have some references for this.
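
To make the cost argument concrete, here is a toy sketch of a snooping bus. It
assumes a simplified three-state MSI protocol rather than full MOESI, and the
names `Cache` and `SnoopyBus` are made up for illustration; it does not model
the actual chip. The point is only that every miss is broadcast to all other
caches, so snoop work grows with the core count.

```python
class Cache:
    """One core's private cache; tracks only coherence state per address."""
    def __init__(self, name):
        self.name = name
        self.state = {}  # address -> "M" or "S"; a missing entry means "I"


class SnoopyBus:
    """Hypothetical shared bus: every miss is seen by every other cache."""
    def __init__(self, caches):
        self.caches = caches
        self.snoops = 0  # total snoop lookups triggered so far

    def read(self, requester, addr):
        if requester.state.get(addr, "I") != "I":
            return  # read hit: no bus traffic needed
        for cache in self.caches:
            if cache is requester:
                continue
            self.snoops += 1
            if cache.state.get(addr) == "M":
                cache.state[addr] = "S"  # owner writes back and downgrades
        requester.state[addr] = "S"

    def write(self, requester, addr):
        if requester.state.get(addr) == "M":
            return  # already the exclusive owner
        for cache in self.caches:
            if cache is requester:
                continue
            self.snoops += 1
            cache.state.pop(addr, None)  # invalidate any other copy
        requester.state[addr] = "M"


# 50 cores, as in the article; the address is arbitrary.
caches = [Cache(f"core{i}") for i in range(50)]
bus = SnoopyBus(caches)
bus.write(caches[0], 0x40)   # core0 takes the line in Modified state
bus.read(caches[1], 0x40)    # core1 reads it; core0 downgrades to Shared
```

In this model each miss costs one snoop per other core (49 here), which is the
part that gets more expensive as cores are added.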

------
calebmpeterson
If these MICs really are your bread-and-butter x86, then Clojure/JVM (and
immutable FP in general) will be quite a capable environment - particularly as
the core count climbs...

~~~
modeless
They're x86 processors, but rather slow ones. The high FLOPS figures come from
the custom vector instruction units, which are 4x as wide as SSE and much more
flexible. You're going to need excellent SIMD support in your language to use
this chip.

~~~
palish
4x as wide as SIMD. Wow.

(SIMD is four 32-bit floats wide, i.e. you can multiply four floats AAAA with
four other floats BBBB in a single instruction = AB AB AB AB. So 4x that is 16
pairs of floats in a single instruction!)
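
The lane arithmetic can be sketched in plain Python. This only models the
behaviour; it is not real vector code, and the helper names `lanes` and
`simd_mul` are made up for illustration. It assumes a 128-bit register holding
32-bit floats, and takes the "4x as wide" figure from the comment above.

```python
from array import array

SSE_BITS = 128             # four 32-bit floats per register
WIDE_BITS = 4 * SSE_BITS   # "4x as wide", per the parent comment
FLOAT_BITS = 32


def lanes(register_bits, element_bits=FLOAT_BITS):
    """How many elements fit side by side in one register."""
    return register_bits // element_bits


def simd_mul(a, b):
    """Model one packed multiply: each lane is multiplied independently."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return array("f", (x * y for x, y in zip(a, b)))


a = array("f", [1.0, 2.0, 3.0, 4.0])      # AAAA
b = array("f", [10.0, 20.0, 30.0, 40.0])  # BBBB
product = simd_mul(a, b)                   # AB AB AB AB
```

Here `lanes(SSE_BITS)` is 4 and `lanes(WIDE_BITS)` is 16, matching the counts
above.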

~~~
1amzave
s/SIMD/SSE/, methinks.

SIMD is a more general concept; SSE is a set of SIMD-style instructions (the
second 'S' actually stands for SIMD).

~~~
palish
Ah. Yes, I've always thought the two were interchangeable.

------
daimyoyo
What's the theoretical limit here? When you have 50 cores on a 22nm die,
wouldn't you have to start worrying about signal interference? (Disclaimer: I
have no training in semiconductor physics, so if this question is naïve, I
apologize.)

~~~
ori_b
You've already got to worry about signal interference within one chip. More
cores aren't going to cause a bigger problem on that end. The issues come with
stuff like heat dissipation, latency, interconnect density, power use, and so
on.

