
Why didn't Larrabee fail? - KVFinn
http://tomforsyth1000.github.io/blog.wiki.html#%5B%5BWhy%20didn%27t%20Larrabee%20fail%3F%5D%5D
======
pavanky
> Larrabee ran Compute Shaders and OpenCL very well - in many cases better (in
> flops/watt) than rival GPUs

Oh god the horror that was OpenCL on Xeon Phi.

1) It was extremely buggy. Any OpenCL program of decent complexity was bound
to encounter bugs in Intel's driver.

2) FLOPS are not the only measure for computations / simulations. Bandwidth
is extremely important for many applications, and whatever Xeon Phi's OpenCL
was doing, it was not achieving anything close to its peak. Any OpenCL
kernels we had tuned ended up improving the performance of the host CPUs (16
x 2 Xeon) as well, usually to the point where the host Xeons performed better
than the Xeon Phis. (See the triad sketch after this list for what
"bandwidth-bound" means in practice.)

3) NVIDIA has released multiple generations of GPUs since the Xeon Phi came
out. Performance per watt on NVIDIA and AMD GPUs has been improving
substantially over the last three years, while Xeon Phi's stagnated in 2012.

4) Even their claim of best FLOPS/watt is wrong. The Tesla K20, released
around the same time, had the same power usage (225W) but about 50% higher
FLOPS (3 TFLOPS vs the Phi's 2 TFLOPS, single precision).
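
To make the bandwidth point above concrete, here's a minimal STREAM-style
triad kernel (an OpenCL C sketch). It does 2 flops per 24 bytes of memory
traffic, so at, say, ~200 GB/s of achievable bandwidth it tops out near 17
GFLOPS no matter how many peak TFLOPS the device claims; tuning a kernel
like this is about memory behavior, not arithmetic:

    // STREAM-style triad: a[i] = b[i] + s * c[i]
    // 2 flops per 24 bytes moved (3 doubles) ~= 0.083 flop/byte, so at
    // ~200 GB/s this caps out near 17 GFLOPS regardless of peak FLOPS.
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void triad(__global double *a,
                        __global const double *b,
                        __global const double *c,
                        const double s)
    {
        size_t i = get_global_id(0);
        a[i] = b[i] + s * c[i];
    }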

The only selling point was that they were x86 cores and hence did not
require rewriting your code. But you _had_ to write additional directives
indicating which parts of your code you wanted to offload to the accelerator
and when you wanted the results back. Ironically, the best performance came
only after a lot of tuning for your application.
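
For those who never touched KNC in offload mode, the directives looked
roughly like this (a sketch of Intel's LEO-style offload pragmas as used
with icc; the exact spelling varied across compiler versions, and
triad_offload is just an illustrative name):

    /* The in/out clauses say what gets copied over PCIe to the card and
       what gets copied back -- exactly the tuning burden described above. */
    void triad_offload(double *a, const double *b, const double *c,
                       double s, int n)
    {
        #pragma offload target(mic) in(b, c : length(n)) out(a : length(n))
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }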

Four years down the line, GPU-based accelerators still dominate the market
and are still growing. I wonder how many of the researchers who went with
Xeon Phis regret their decision.

~~~
gbrown_
I agree with all of your points but I think KNL will be a turning point.

Traditional multi-core CPUs simply aren't an option for advancing the HPC
space any more. Per generation there's not enough of a performance
improvement to make it effective, and buying more of them isn't feasible
from a power perspective. So everyone is going to have to rewrite for
massive parallelism. I think Intel's KNL and future Knights will be the most
attractive path for a lot of people: you have the tooling and familiarity of
x86 and a simpler memory model than GPUs.

~~~
llukas
Do you still need to use vector instructions on KNL to get top performance?
If so, then forget about "familiarity" when rewriting.

~~~
gbrown_

      Do you still need to use vector instructions on KNL to get top performance?
    

Yes, but this is also the case for regular Xeons.

    
    
      If so, then forget about "familiarity" when rewriting.
    

When I say "familiarity" I'm referring to tooling in things like compilers and
debuggers as well as just the instruction set.
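
As a concrete illustration, the same C source can be compiled for a regular
Xeon or for KNL with only a change of target flags; no intrinsics are needed
for a loop this simple. A minimal sketch (icc flags shown; built with
-qopenmp-simd):

    #include <stddef.h>

    /* Vectorizes to AVX2 on a regular Xeon (-xCORE-AVX2) or AVX-512 on
       KNL (-xMIC-AVX512) from the same source. */
    void saxpy(float *restrict y, const float *restrict x,
               float a, size_t n)
    {
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }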

~~~
gnufx
Right. The only thing about KNL that looks particularly different on paper
(compared with the typical HPC node intended for workhorse vectorized code)
is the memory system, and possibly the integrated interconnect. Given the
usual library support, I don't see why there should be significant rewriting
to be done, any more than for other AVX transitions (unless you're
developing BLAS etc.).
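
The memory system is the one piece that may reward explicit changes: in flat
mode, KNL exposes the on-package MCDRAM as a separate NUMA node, and the
memkind library lets you place hot arrays there with a one-line change. A
minimal sketch, assuming the memkind library is installed (link with
-lmemkind):

    #include <hbwmalloc.h>   /* memkind's high-bandwidth-memory interface */
    #include <stdlib.h>

    int main(void)
    {
        size_t n = 1 << 24;
        double *hot  = hbw_malloc(n * sizeof *hot);  /* MCDRAM, if present */
        double *cold = malloc(n * sizeof *cold);     /* ordinary DDR4 */
        if (!hot || !cold)
            return 1;
        /* ... stream through hot[] at MCDRAM bandwidth ... */
        hbw_free(hot);
        free(cold);
        return 0;
    }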

------
stonogo
The jury is still out. Larrabee is in the wild but it's a hell of a long way
from 'not failed' -- every single deployment I'm aware of suffers from
massive hardware failure rates, and every single developer I've worked with
much prefers to work with Tesla gear.

It's all well and good for them to aim for 'prestige buys' like Tianhe 2,
but when the marketing smoke clears, the product is not achieving the things
we in HPC buy products to achieve. Utilization rates are in the toilet
across the ten or so Phi contracts I touch. I'm sure KNL's integrated
architecture will change this, since everyone will just get it by default
from vendors, but I don't know a single site that decided to buy Phi twice.

As far as I can tell, Phi is a play at a numbers game: you can ramp up the
Rmax of your system under LINPACK regardless of whether it creates any
actual value for your users. In other words, it's a cheap way to claw some
false cred out of your top500 position, and not so much a useful HPC tool
yet.

Time will tell, I guess.

~~~
sijoe
I am setting a unit up now for a customer. In the years I've been working on
them, the toolchain you need has not gotten much better ... no ... it went
from complete crap to meh.

You really can't not use the Intel compilers for them. Gcc won't optimize well
for it at all. This means that in addition to the higher entry price for the
hardware, you have a sometimes painfully incompatible compiler toolchain to
buy as well, and then code for. Which means you have to adapt your code to the
vagaries of this compiler relative to gcc. These adaptations are sometimes
non-trivial.

I am not saying gcc is the be-all/end-all, but it is a useful starting point.
One that the competitive technology works very well with.

From the system administration side, honestly, the way they've built it is a
royal pain. Trying to turn this into something workable for my customer has
taken many hours. This customer saw many cores as the shiny they wanted. And
now my job is (after trying to convince them that there were other easier ways
of accomplishing their real goals) to make this work in an acceptable manner.

The toolchain is different from the substrate (host) system. You can't
simply copy binaries over (and I know they have not yet internalized this).
The debugging and running processes are different. The access to storage is
different. The connection to networks is different.

What I am getting at is that there are too many deltas from what a user
would normally want.

It is not a bad technology. It is just not a product.

That is, it isn't finished. It's not unlike an Ikea furniture SKU. Lots of
assembly required. And you may be surprised by the effort to get something
meaningful out of it.

As someone else mentioned elsewhere in the responses, the price (and all the
other hidden costs) is quite high relative to the competition ... whose
toolchain stack is far simpler and more complete.

The hardware isn't a failure. The software toolchain is IMO.

------
mrb
_" Make the most powerful flops-per-watt machine. SUCCESS!"_

Hum, not at all... Larrabee gets _destroyed_ by the competition in terms of
flops-per-watt. Knights Corner is rated 8 GFLOPS/W at the device level (2440
GFLOPS at 300 W). For comparison, the Nvidia Tesla P100 rates 4× better: 32
GFLOPS/W (9519 GFLOPS at 300 W), and the AMD Pro Duo rates ~6× better: 47
GFLOPS/W (16384 GFLOPS at 350 W, although in the real world it probably
often hits thermal limits and throttles itself, so real-world perf is
probably closer to 30-40 GFLOPS/W).

Also, if Intel wants to make Larrabee gain mindshare and marketshare, they
need to sell sub-$400 Larrabee PCIe cards. Right now, everybody and their
mother can buy a totally decent $200 AMD or Nvidia GPU and try their hand at
GPGPU development. And if they need more performance, they can upgrade to
multiple high-end cards, and their code runs almost as is. But because
Larrabee's cheapest model (3120A/P) starts at $1695
([http://www.intc.com/priceList.cfm](http://www.intc.com/priceList.cfm)), it
is priced completely out of reach for many potential customers (think
students learning GPGPU, etc).

~~~
nkurz
_Larrabee gets destroyed by the competition in terms of flops-per-watt_

While I expect the graphics cards to have the edge in performance per watt
for brute-force math, I'm surprised that the current GPUs would have _that_
large an advantage. Are you sure you are comparing the same size "flops" for
each?

And while it's in keeping with the article, I don't think lumping all the
generations together as "Larrabee" makes sense when comparing system
performance. While availability is still minimal, Knights Landing is
definitely the current generation (developer machines are shipping), and as
you'd expect from a long-awaited update at a smaller process size, efficiency
is a lot better than older generations.

Here's performance data for different generations of Phi:
[http://www.nextplatform.com/wp-content/uploads/2016/06/intel-xeon-phi-compare-table-1.jpg](http://www.nextplatform.com/wp-content/uploads/2016/06/intel-xeon-phi-compare-table-1.jpg)

And here's for different generations of Tesla:
[http://www.nextplatform.com/wp-content/uploads/2016/03/nvidia-tesla-lineup-1.jpg](http://www.nextplatform.com/wp-content/uploads/2016/03/nvidia-tesla-lineup-1.jpg)

From these, the double-precision figures appear to be:

2016 KNL 7290: 3.46 DP TFLOPS using 245W == 14 DP GFLOPS/W.

2016 Tesla P100: 5.30 DP TFLOPS using 300W == 18 DP GFLOPS/W.

I don't know if these numbers are particularly accurate (I think actual KNL
numbers for the developer machines are still under NDA), but I think the
real-world performance gap will be something more like this than one
approach "destroying" the other. If you are making full use of every cycle,
the GPUs will win by a bit. If your problem requires significant branching,
Phi won't be hurt quite as badly.

 _Also, if Intel wants to make Larrabee gain mindshare and marketshare, they
need to sell sub-$400 Larrabee PCIe cards._

The interesting move Intel is making is to concentrate initially on KNL as
the heart of the machine rather than as an add-in card. This gets you direct
access to 384GB of DDR4 RAM, which opens up some problems that current
graphics cards are not well suited for. I think this plays better to their
strengths: if your problem is embarrassingly parallel with moderate memory
requirements, modern graphics cards are a better fit. But if you need more
RAM, or your parallelism requires medium-strength independent cores, Phi might
be a better choice.

 _But because Larrabee's cheapest model (3120A/P) starts at $1695_

While it's true that Intel's list pricing shows they aren't targeting home
use at this point, list price may not be the best comparison. For a while
last year, 31S1P's were available at blow-out prices that beat all the
graphics cards. In packs of 10, they were available at $125 each:
[http://www.colfax-intl.com/nd/xeonphi/31s1p-promo.aspx](http://www.colfax-intl.com/nd/xeonphi/31s1p-promo.aspx)

It's true that even at that price they were not a great fit for most home
users: they require a server motherboard with 64-bit Base Address Register
support
([https://www.pugetsystems.com/labs/hpc/Will-your-motherboard-work-with-Intel-Xeon-Phi-490/](https://www.pugetsystems.com/labs/hpc/Will-your-motherboard-work-with-Intel-Xeon-Phi-490/))
and they require external cooling. But if you were willing to be creative
([https://www.makexyz.com/3d-models/order/b3015267f647354f96ff04d809d10add](https://www.makexyz.com/3d-models/order/b3015267f647354f96ff04d809d10add))
and your goal was putting together an inexpensive many-threaded screaming
number cruncher, the price was hard to beat.

~~~
vardump
Would _love_ to kick the tires on one of those Knights Landing cards.

It might be possible to pull off tricks similar to those on normal Intel
Xeons. I wonder what kind of memory controller it has. Does it, for example,
have hardware swizzling and accessible texture samplers?

But we use exclusively Nvidia for our number crunching product. And since the
relevant code is implemented in CUDA, we have an Nvidia platform lock-in.
Other options are not even on the table, and will not be.

Intel would be very smart to offer a migration path from CUDA code.

------
jcbeard
This article is a reminder to me that people are bloody critical of things
that they don't really understand. Many products are killed, and many
research ideas never take off even if they're proven to work well. Ideas and
products are often killed not because they weren't great, but for a litany
of other reasons having nothing to do with the awesomeness or usefulness of
the idea: politics, the lack of a massive market, and most importantly poor
execution. Even when things fail (research, products, otherwise), they often
have a huge impact on the ecosystem. The product might fail (in this case,
it really didn't), but the ideas will live on.

------
nwhitehead
This is fun to read; I'm always amazed at the disconnect between public
presentations and the engineering reality of what's really going on.

For me, "Larrabee" died during its first public demo at IDF 2009 [1]. At the
time I was working on CUDA libraries at NVIDIA. I remember everyone watching
the stream to see how much of a threat it would be. When we saw the actual
graphics, everyone started laughing. "Welcome to the 1990s!" At that point
it was obvious that Larrabee would not be a graphics threat to NVIDIA; it
just had too far to go. It was not a discrete GPU killer.

[1]:
[https://www.youtube.com/watch?v=G-FKBMct21g](https://www.youtube.com/watch?v=G-FKBMct21g)

------
0x0
A graphics card where the graphics core is actually an x86 application running
on FreeBSD running on the "GPU"? That you could even log in to? That sounds
out-of-this-world-amazing. What a shame it wasn't released like that.

~~~
SXX
Actually, Xeon Phi is almost the same thing, except it's not sold as a GPU.
You can ssh into it or run a software rasterizer on its cores; the only
thing missing would be a host graphics driver.

~~~
phire
It's also missing the texture units.

From what I hear from my friend who works in the GPU industry, one of the
main outcomes of the Larrabee research project was this: it is possible to
do most graphical operations on a SIMDed general-purpose CPU with good
performance. Everything except texture decoding, which really needs
dedicated texture-decoding silicon. And with texture units taking up 10% of
the silicon, Intel really needed to decide whether they wanted to sell it as
a GPU or as a general-purpose compute unit with 10% more cores. Intel chose
the latter, and you can't really blame them, as you can sell dedicated
compute units for more money.

Sony ran into the same problem with the PS3 and the Cell. They originally
designed it so that game developers could implement whatever rendering
method they wanted, in software, on the SPUs. But the performance wasn't
high enough, partly due to texture decoding taking up too much time. By the
time this was discovered to be a problem, the Cell was more or less
finalised. Sony considered adding a second Cell to the console, to
brute-force through the problem, but eventually they asked Nvidia to hack
together a traditional GPU.

~~~
pjc50
What does "texture decoding" actually consist of here - mipmap+interpolation?

~~~
phire
Nothing terribly complex.

For the typical case of 2D uncompressed textures with some kind of
trilinear/anisotropic filtering enabled, you are basically just calculating
8 addresses based on texture coordinates, clamping, and mipmap levels, then
doing a gathered load (and CPUs hate doing gathered loads). Remember to
optimise for the case where each group of 4 addresses is typically, but not
always, right next to each other in memory.

Then you use trilinear interpolation to filter your 8 raw texels into a
single filtered texel. With the exception of the gathered load, none of
these operations is actually that expensive on CPUs, but shaders do a lot of
these texture reads, and it's cheaper and faster to implement it all in
dedicated hardware.

You can also put these texture decoders closer to the memory, so the full 8
texels don't need to travel all the way to the shader core, just the final
filtered texel. And since each texture decoder serves many threads of
execution, you have chances to share resources between similar/identical
texture sample operations.

And while the texture sampler is doing its thing, the shader core is free to
do other operations (typically for another thread of execution). It's not
that the CPU can't decode textures; it's just that CPU cores without
dedicated texture-decoding hardware can't compete with hardware that has it.
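
For the curious, here's roughly what that per-sample work looks like in
scalar code. A hedged sketch: the Texture struct, single-channel float
texels, and clamp-to-edge addressing are illustrative stand-ins, not any
real API, and lod is assumed to be within the mip chain:

    #include <math.h>

    typedef struct {
        int w, h, levels;
        const float *mip[16];   /* base pointer of each mip level */
    } Texture;

    /* Clamp-to-edge fetch; real samplers also support wrap/mirror modes.
       These scattered reads are the "gathered load" CPUs hate. */
    static float fetch(const Texture *t, int level, int x, int y)
    {
        int w = t->w >> level ? t->w >> level : 1;
        int h = t->h >> level ? t->h >> level : 1;
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return t->mip[level][y * w + x];
    }

    static float lerp(float a, float b, float f) { return a + f * (b - a); }

    /* One bilinear sample: 4 fetches + 3 lerps. */
    static float bilinear(const Texture *t, int level, float u, float v)
    {
        int w = t->w >> level ? t->w >> level : 1;
        int h = t->h >> level ? t->h >> level : 1;
        float x = u * w - 0.5f, y = v * h - 0.5f;
        int x0 = (int)floorf(x), y0 = (int)floorf(y);
        float fx = x - x0, fy = y - y0;
        return lerp(lerp(fetch(t, level, x0,     y0),
                         fetch(t, level, x0 + 1, y0),     fx),
                    lerp(fetch(t, level, x0,     y0 + 1),
                         fetch(t, level, x0 + 1, y0 + 1), fx), fy);
    }

    /* Trilinear: blend bilinear samples from two adjacent mip levels --
       the 8 texels and 7 lerps described above, for every sample. */
    static float trilinear(const Texture *t, float u, float v, float lod)
    {
        int l0 = (int)lod;
        return lerp(bilinear(t, l0,     u, v),
                    bilinear(t, l0 + 1, u, v), lod - (float)l0);
    }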

------
loser777
I'm not sold on the "1. Make the most powerful flops-per-watt machine"
success.

From my knowledge, that claim only makes sense for the ever-shrinking domain
of applications that haven't been ported to GPGPUs with _simpler_ cores. I
also don't understand how the best "flops-per-watt" can be extrapolated from
the fact that the Phi is used in top-ranked TOP500 machines.

The Phi really just seems like an intermediate between CPUs and GPGPUs on
the ease-of-programming vs. perf/watt Pareto curve.

~~~
wmf
Here's the list:
[http://www.green500.org/lists/green201606&green500from=1&green500to=100](http://www.green500.org/lists/green201606&green500from=1&green500to=100)

I see NVIDIA near the top at 4.7 GFLOPS/W and the old Xeon Phi at #44 with
2.4 GFLOPS/W, which supports your argument.

------
WhitneyLand
Tom is conflating two premises: as engineers the team has done some nice
work, but as a product Larrabee has failed.

- The perf/watt has not been competitive

- For the most part it has not been loved by its users

- The market share/penetration is small

- There is no evidence sales have been material to Intel; in fact, it's not
even clear it's been profitable

On the upside, though the product has failed to date, it's not dead. Maybe
the 2016 release can turn things around.

Also, it's nice that AVX512 came out of Larrabee, but the last line giving
the credit to a single person seems dubious.

------
pjmlp
Yet, from the point of view of someone sitting in the GDC Europe rooms back
in 2009, hearing how Larrabee was going to change the world of game
development, I think we can state it did fail.

~~~
wolfgke
> I think we can state it did fail.

I think we can state it did fail as a _graphics processor._

~~~
tw04
I think the point is the author is being EXTREMELY disingenuous when he claims
Intel didn't even want to make a GPU.

~~~
binarycrusader
Read again: the claims were very specific with regard to a _discrete_ GPU.
Also, the author worked at Intel on Larrabee, so unless you have proof
otherwise I think it's best to give them the benefit of the doubt as to the
accuracy of the claim.

~~~
tw04
Did you not read the original post you responded to? They were ALL OVER this
in the mid-2000s, proclaiming Larrabee was a discrete GPU.

[http://www.anandtech.com/show/2580](http://www.anandtech.com/show/2580)

[http://www.anandtech.com/show/3738/intel-kills-larrabee-gpu-will-not-bring-a-discrete-graphics-product-to-market](http://www.anandtech.com/show/3738/intel-kills-larrabee-gpu-will-not-bring-a-discrete-graphics-product-to-market)

[http://www.cnet.com/news/intels-larrabee-more-and-less-than-meets-the-eye/](http://www.cnet.com/news/intels-larrabee-more-and-less-than-meets-the-eye/)

[http://www.pcper.com/reviews/Graphics-Cards/Larrabee-New-Instruction-set-gives-developers-first-glimpse-new-GPU](http://www.pcper.com/reviews/Graphics-Cards/Larrabee-New-Instruction-set-gives-developers-first-glimpse-new-GPU)

~~~
binarycrusader
Read the original post again. I don't care what the news sites said at the
time, the guy who wrote the post literally worked on it.

Read the post carefully.

~~~
pjmlp
Do you know who Michael Abrash is?

GDCE 2009, "Rasterization on Larrabee: A First Look at the Larrabee New
Instructions (LRBni) in Action"

[http://www.gdcvault.com/play/1402/Rasterization-on-Larrabee-A-First](http://www.gdcvault.com/play/1402/Rasterization-on-Larrabee-A-First)

There were other Larrabee talks, but only this one is online.

The marketing message was not only about graphics, but about how the GPGPU
features of Larrabee would revolutionize the way games are written.

~~~
binarycrusader
Yes, I know who Abrash is -- I have one of his most famous books on my shelf.
I'll say it again, the author's claims were _very_ specific:

 _in terms of engineering and focus, Larrabee was never primarily a graphics
card_

Note the author says _primarily_ and is specifically talking about it in terms
of _engineering_ and _focus_.

------
kevin_thibedeau
> This page requires JavaScript to function properly.

Hint: for a page with nothing but text content, you're doing it wrong. Even
more so since this is a static site and you have to go to extra effort to
break the normal degradation of HTML content.

~~~
angry_octet
Once again someone downvotes you because they disagree.

Personally I also browse with JavaScript disabled and thought the mandatory
JS was silly. Hence I did a view-source, and lo and behold, it is a
TiddlyWiki, with 2MB of HTML containing every article.

While GitHub now has pages.github.com and the ability to git pull/push your
wiki (git clone git@github.com:username/project.wiki.git), I'm not sure how
well that works for use on another web server or locally.

------
Saad_M
Isn't this a case where Intel should not have talked about their internal
R&D until they had a clear product & message to bring to market? I wonder if
this lack of discipline cost Larrabee the narrative it should have had.

~~~
ihaveajob
If I remember correctly, the idea at the time was to catch up to Nvidia/ATI
(leapfrog them, really) in the discrete GPU market. By announcing the
technology long before it was ready, they would put a damper on their
competitors' sales and lead. Kind of like an externalized Osborne effect.

[https://en.wikipedia.org/wiki/Osborne_effect](https://en.wikipedia.org/wiki/Osborne_effect)

------
ericjang
 _But Intel management didn't want one, and still doesn't._

Why is this the case? It seems counter-intuitive given the recent Nervana
acquisition (presumably made to compete with NVIDIA).

~~~
SXX
This might have something to do with their patent dispute with Nvidia that
cost them $1.6 billion back in 2011. Probably Intel still wants (and is
able) to compete with Nvidia on compute, but not in the graphics market.

~~~
csours
Do Intel integrated graphics suck because of patents?

~~~
DigitalJack
Mostly it's because of the limited power budget.

------
gbrown_
O.K. fucking major rant below. Disclaimer: I worked for a vendor selling
large quantities of pretty much anything in the HPC space at the time KNC
came to market, and I now work for a different (larger) HPC vendor.

    
    
      When I say "Larrabee" I mean all of Knights, all of MIC, all of Xeon Phi, all of
      the "Isle" cards - they're all exactly the same chip and the same people and the
      same software effort. Marketing seemed to dream up a new codeword every week,
      but there was only ever three chips:
      
          Knights Ferry / Aubrey Isle / LRB1 - mostly a prototype, had some performance gotchas, but did work, and shipped to partners.
          Knights Corner / Xeon Phi / LRB2 - the thing we actually shipped in bulk.
          Knights Landing - the new version that is shipping any day now (mid 2016).
    

I can kind of see what the author is getting at here but it's worth reminding
general readers KNL is a wildly different beast from KNF or KNC.

1. It's self-hosting: it runs the OS itself; there is no "host CPU".

- Yes, KNC ran its own OS, but it was in the form of a PCIe card which still
had to be put into a server of some kind.

- Yes, KNL is slated to have PCIe card-style variants, but the self-hosting
variants are what will appear first, and I honestly don't think the PCIe
variants will gain much, if any, market traction.

2. It features MCDRAM, which provides a massive increase in total memory
bandwidth within the chip. Arguably this is necessary with such large core
counts.

3. Some KNL variants will have Intel's Omni-Path interconnect directly
integrated, which should further drive down latency in HPC clusters.

    
    
      Behind all that marketing, the design of Larrabee was of a CPU with a very wide
      SIMD unit, designed above all to be a real grown-up CPU - coherent caches,
      well-ordered memory rules, good memory protection, true multitasking, real
      threads, runs Linux/FreeBSD, etc.
    

Here's an interesting snippet: drop "Larrabee" from the above and replace it
with "Xeon". I'm just making the point here that the Xeon and Xeon Phi
product lines are only going to look more similar over time. For a while now
the regular Xeons have been getting wider, which is to say more cores and
fatter vector units. The Xeon Phi line started out with more cores but is
having to drive the performance of those cores up. Over time we're going to
see one line get features before the other, e.g. AVX512 is in KNL and will
be in a "future Xeon" (Skylake). And it's widely rumored that "future Xeons"
will support some form of non-volatile main memory, something not on any
Xeon Phi product roadmap today. The main differentiator will be that Xeon
products need to continue to cater to the mass market whereas Xeon Phi can
do more radical things to drive compute-intensive workloads (HPC).

    
    
      Larrabee, in the form of KNC, went on to become the fastest supercomputer in the
      world for a couple of years, and it's still making a ton of money for Intel in
      the HPC market that it was designed for, fighting very nicely against the GPUs
      and other custom architectures.
    

Yes, at the time Tianhe 2 made its debut it was largely powered by KNC, and
it placed at number 1 on the Top 500.

Did KNC as a product make a lot of money for Intel? Probably not. Yes, they
shipped a lot of parts for Tianhe 2, but for other HPC customers, not so
much. At the time, putting a KNC card against a Sandy Bridge CPU, it wasn't
leaps and bounds faster. Yes, you're always going to have to refactor when
making such a change in architecture, but the gains just weren't worth it in
most cases. That's not to say there haven't been useful deployments of KNC
that have contributed to "real science", but it was not a wild success by
any measure.

As for "fighting very nicely against the GPUs and other custom architectures",
I don't think so. When KNC got to the market Nvidia GPUs already had a pretty
well established base in the HPC communities. Many scientific domains already
had GPU optimized libraries users could pick off the self and use. Whilst KNC
was cheaper than the Kepler GPUs on the market at the time it wasn't much
cheaper. Again back to regular Xeons, for a cost/ performance benefit KNC
wasn't worth it for a lot of people.

    
    
      Its successor, KNL, is just being released right now (mid 2016) and should do
      very nicely in that space too.
    

I do agree with this, KNL is going to do very well. There are whole systems
being built with KNL rather than large clusters with some KNC or some GPUs.

    
    
      1. Make the most powerful flops-per-watt machine.
    

Not sure what is meant here; the most efficient machine in the top 10?
Tianhe 2 is not an efficient machine by any stretch of the imagination. For
reference, here's the top 10 of the Green 500 from when Tianhe 2 placed as
number 1 on the Top 500.

[http://www.green500.org/lists/green201306](http://www.green500.org/lists/green201306)

    
    
      SUCCESS! Fastest supercomputer in the world, and powers a whole bunch of the
      others in the top 10. Big win, covered a very vulnerable market for Intel, made
      a lot of money and good press.
    

Yes, KNC provides the bulk of the computing power for Tianhe 2, but at the
time of writing the above it is number 2, not number 1. Secondly, that is
the ONLY machine in the top 10 that uses KNC AT ALL.

------
chx
Just yesterday ASRock announced a Xeon Phi x200 (Knights Landing) system. It
uses a main CPU socket instead of a PCIe extension card and seems to support
only a single LGA-3647 socket (that's a LOT of tiny legs). ASRock took these
boards, each taking half a U (Supermicro had similar formats with their
Twins), and packed four into 2U.

~~~
nkurz
Thanks for pointing this out. It's an interesting design:
[http://www.anandtech.com/show/10553/asrock-rack-launches-2u4nf-x-200-knights-landing-xeon-phi-cpu](http://www.anandtech.com/show/10553/asrock-rack-launches-2u4nf-x-200-knights-landing-xeon-phi-cpu)

------
rbanffy
Thinking of the Phi as a special-purpose part misses an important point:
CPUs will not get much faster. Vector units will get wider and core counts
will increase. Programming for the Xeon Phi gives you a view of what a CPU
will be a couple of years from now. Right now my laptop has 4 cores running
8 threads, and the only reason it has exquisitely fast cores that dedicate
large areas of silicon to extracting performance from every thread (rather
than investing all those transistors in something else) is that our software
is built for ridiculously fast versions of computers from the 90s.

------
SXX
I suppose a lot of the ideas behind graphics on Larrabee ended up in the
OpenSWR project, so anyone interested in scientific visualization might
check it out.

~~~
rubber_duck
I think Larrabee was interesting because it had the potential to go the
other way: it could emulate existing rasterization APIs decently but would
allow for new realtime rendering approaches, like raytracing into some
sparse voxel tree structure for non-animated geometry. Maybe do hybrid
rendering, where you would do deferred shading/rasterization for the dynamic
scene and then ray trace into the tree for global illumination from the
static scene, and similar stuff.

I'm sure emulating OpenGL on the CPU has its uses outside of being a
reference for drivers and such, but it's not really that exciting; GPUs are
already excellent at doing that kind of rendering.

~~~
elihu
I also wish that had taken off; 3D graphics via a software ray-tracing
engine would in a sense be a return to the way things worked in the 90s
before hardware rendering, when every game had its own custom engine and
developers were always trying crazy new things. In some sense, GPU hardware
has gotten flexible enough that we're sort of back in that world again, but
writing GPU code isn't quite the same as writing a renderer from scratch.

When it comes to ray tracing in particular, as far as I know there really
doesn't exist (except perhaps as research prototypes) any hardware platform
that's great at ray tracing. CPUs are okay, but they need more parallelism.
GPUs are okay, but (as far as I understand it) they aren't good with lots of
branches and erratic memory accesses. It seems like KNC/KNL ought to be
ideal for this kind of thing (and in fact, it appears a fair bit of effort
has gone into optimizing Embree for Xeon Phi).
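
(For a sense of scale, a complete ray cast through Embree is only a
screenful of plain C. A minimal sketch against the Embree 3 API -- one
triangle, one ray -- with error handling and release calls omitted; link
with -lembree3:)

    #include <embree3/rtcore.h>
    #include <stdio.h>

    int main(void)
    {
        RTCDevice dev = rtcNewDevice(NULL);
        RTCScene scene = rtcNewScene(dev);
        RTCGeometry tri = rtcNewGeometry(dev, RTC_GEOMETRY_TYPE_TRIANGLE);

        /* One triangle in the z=0 plane. */
        float *v = rtcSetNewGeometryBuffer(tri, RTC_BUFFER_TYPE_VERTEX, 0,
                       RTC_FORMAT_FLOAT3, 3 * sizeof(float), 3);
        v[0]=0; v[1]=0; v[2]=0;  v[3]=1; v[4]=0; v[5]=0;  v[6]=0; v[7]=1; v[8]=0;
        unsigned *idx = rtcSetNewGeometryBuffer(tri, RTC_BUFFER_TYPE_INDEX, 0,
                            RTC_FORMAT_UINT3, 3 * sizeof(unsigned), 1);
        idx[0] = 0; idx[1] = 1; idx[2] = 2;
        rtcCommitGeometry(tri);
        rtcAttachGeometry(scene, tri);
        rtcCommitScene(scene);

        /* Shoot one ray from (0.2, 0.2, -1) along +z. */
        struct RTCIntersectContext ctx;
        rtcInitIntersectContext(&ctx);
        struct RTCRayHit rh = {0};
        rh.ray.org_x = 0.2f; rh.ray.org_y = 0.2f; rh.ray.org_z = -1.0f;
        rh.ray.dir_z = 1.0f;
        rh.ray.tfar = 1e30f;
        rh.ray.mask = (unsigned)-1;
        rh.hit.geomID = RTC_INVALID_GEOMETRY_ID;
        rtcIntersect1(scene, &ctx, &rh);

        printf(rh.hit.geomID != RTC_INVALID_GEOMETRY_ID
                   ? "hit at t=%f\n" : "miss (tfar=%f)\n", rh.ray.tfar);
        return 0;
    }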

~~~
pjmlp
You can still do that kind of stuff, but on the GPU, with geometry/compute
shaders.

Now with the Metal, DX12 and Vulkan GPU languages, maybe that is even more
approachable, especially on cards like Pascal.

------
unsignedqword
Larrabee was a glimpse into the future. 10, 15 years from now, are we really
going to be discretizing our computers into separate CPUs and GPUs?

~~~
pjmlp
Sure we are; that has been the path of, say, the last 5 years, even on
mobile.

------
rhaps0dy
Only slightly related, but: why do all Intel consumer-oriented processors now
feature an integrated GPU?

~~~
guardian5x
A simple answer would be: to raise market share in the GPU market. Also, a
lot of non-gaming software uses the GPU for various tasks nowadays, even
Office and Photoshop, but it doesn't need a full-fledged GPU.

------
castratikron
My comp sci professor has a good rule of thumb: "If I can't buy it on Newegg,
it doesn't exist."

Searched for 'Xeon Phi' on Newegg just now and only desktop CPUs show up. 'GTX
1080' brings up 59 results.

------
zump
Ugh. This guy has been spruiking Intel for at least five years.

------
crb002
Govt HPC buys.

