

Parallel Programming and Optimization with Intel Xeon Phi (2013) [pdf] - nkurz
http://inside.mines.edu/~tkaiser/csci580fall13/one-day-2.pdf

======
acqq
Interestingly, Xeon Phi processors alone draw between 200 and 300 W, and cost
anywhere from under 2,000 up to 5,000 USD.

[http://en.wikipedia.org/wiki/Xeon_Phi](http://en.wikipedia.org/wiki/Xeon_Phi)

The cheapest one has 57 cores, 28.5 MB of L2 cache, and up to 6 GB of RAM, but
with 240 GB/s of RAM bandwidth over 12 channels:

[http://ark.intel.com/products/75797/Intel-Xeon-Phi-Coprocessor-3120A-6GB-1_100-GHz-57-core](http://ark.intel.com/products/75797/Intel-Xeon-Phi-Coprocessor-3120A-6GB-1_100-GHz-57-core)

Does anybody know of a "prosumer" product (and its price when equipped with 6
GB) that uses the Xeon Phi?

~~~
sspiff
I recently ordered a Xeon Phi 31S1P. They're under $200 at the moment[1]. The
biggest problem I have is that you can't just plug them into any computer like
you would a graphics card. You need a compatible motherboard, and those tend
to be high-end boards with LGA 2011 sockets (which means an expensive CPU).
Most of the time, you won't find out whether a board is compatible until you
plug the card in and try it.

I'm curious how well the card will work; I'd love to see something like Erlang
running on such a thing.

[1] [https://software.intel.com/en-us/articles/special-promotion-intel-xeon-phi-coprocessor-31s1p](https://software.intel.com/en-us/articles/special-promotion-intel-xeon-phi-coprocessor-31s1p)

~~~
nkurz
I've stared at that promotion (and even submitted it here, I think) but I
haven't really understood it. Who did you end up buying from? Are there
limitations on who can participate? How are you going to deal with the passive
cooling aspect?

~~~
sspiff
Colfax International. No restrictions, although they can't ship to Europe
unless you have a DHL/FedEx/UPS/... freight collect account.

They've been very helpful, agreed to ship the cards to a friend in the US, and
were patient as my bank blocked my credit card twice because of the
"suspicious" transaction. I ended up wiring the money instead.

I ended up paying 492 euros for 3 cards (2 of my friends also ordered one).

I'm just an individual enthusiast, and my friends are researchers at the
local university. You don't need a company, you don't need to order a lot of
them, you can just order a single card as a private individual.

------
PaulKeeble
Many times I have had programming problems that could usefully have used a lot
of cores, especially with reasonable IO wait on each thread of execution. But
to really be able to use something like this, it needs:

1) The cores need to run modern x86, so it's a simple matter of just running
the binary.

2) Offloading and the concept of heterogeneous cores need to be added to the
operating system. All the cores should be exposed to programs, and putting
threads onto accelerators ought to be something a hint can suggest, or maybe a
special type of parallel thread. In essence, the OS needs to expose them like
any other core, with shared memory and everything else this entails to make it
native.

The current model that GPUs use is very good for operating on matrices of
data, but it really doesn't lend itself to agents or other types of
concurrency. It's a bit of a stretch to be writing in a restricted, fairly
low-level form of C with OpenCL or DirectCompute; combined with all the API
and data-passing overhead, only a very specific type of program benefits, and
it requires rewriting your code completely. The future can't possibly be this
in the general case, and it's not really being adopted all that widely; some
people are using it, of course, but most aren't.

In my opinion, lots of low-power cores that run the same instruction set as
the primary CPU give us a useful middle ground: easier to use and optimise
for, and usable alongside the fast cores of the main CPU. That is the future I
am hoping for.

~~~
higherpurpose
AMD/ARM's HSA doesn't need OpenCL (although it will support OpenCL 2.0, which
is much more optimized for heterogeneous computing). I guess you're talking
about the _current_ state of GPU computing. The next generation, which will be
based on HSA, should be much better. You can even use Java or other languages
to write for it.

------
ternaryoperator
For what I thought was a good, short, clear intro to programming the Phi, see
[1]

[1] [http://www.drdobbs.com/240144160](http://www.drdobbs.com/240144160)

------
jcr
The linked pdf is of chapters 3 and 4, but chapters 1 and 2 are also
available:

[http://inside.mines.edu/~tkaiser/csci580fall13/one-day-1.pdf](http://inside.mines.edu/~tkaiser/csci580fall13/one-day-1.pdf)

All the other files from the "Advanced High Performance Computing CSCI 580"
class also look interesting:

[http://inside.mines.edu/~tkaiser/csci580fall13/](http://inside.mines.edu/~tkaiser/csci580fall13/)

------
rbanffy
The Xeon Phi is a fascinating exercise in futurology. It may face strong
competition from GPUs in HPC environments, but future end-user non-specialised
processors will probably have more cores than current designs, and any effort
spent optimising code to run on more cores than are currently available is an
investment that'll bear fruit in the future.

~~~
seanmcdirmid
The Phi is SIMD with a thin memory hierarchy, so it's not really that
different from a GPU... it definitely isn't general purpose.

~~~
rbanffy
It is different in that it _looks_ general purpose enough (it looks like a lot
of Atom-like cores with wider SIMD units hooked up to a pool of shared memory)
and in that it can run off-the-shelf software (albeit poorly).

Learning to make it run effectively may give you some insight into how to
persuade your personal computer of 2024 to use all its cores and make your
browsing experience better.

