Intel to Develop Discrete GPUs, Hires AMD's Raja Koduri as Chief Architect (anandtech.com)
408 points by namlem 4 months ago | 218 comments

The connection between "AI" and "GPU" in everyone's mind is a testament to the PR chops of NVIDIA. You don't need a GPU to run ML/DL/neural networks, but NVIDIA have GPU tech so they're selling GPUs. What you need is the massive ALU power and, to a lesser extent, the huge internal bandwidth of GPUs. There are huge chunks of GPU die area that are of no use when running NN-type code: the increasingly complex rasterizers, the texture units, the framebuffer/zbuffer compression stuff, and on the software side, the huge pile of junk in the drivers that allows you not only to run games from a decade ago, but to run them better than last year's GPU did. If you can afford to start from scratch, you can lose a lot of this baggage.

Indeed, and that's why there are a couple of startups working on new chips and why Google has the TPU. Here's a nice technical talk from Graphcore's CTO about that https://youtu.be/Gh-Tff7DdzU

I was at that talk and it looks very interesting; the team has delivered before.

They just raised more money too - I just wonder how painless the developer experience will be with using their drivers with the latest versions of your chosen DL framework, and how price/perf will compare with DL-specific tensor processor/GPU hybrids like Volta.

What you need is a massive amount of parallel cores. Currently, the cheapest and most efficient way to achieve this is GPUs. It's true that some graphics-specific parts of a GPU are not needed for compute-only kernels (such as those for ML or other AI tools), but it's still a lower overhead than a CPU in yet another box.

Who knows, perhaps Intel will be developing more general purpose massively parallel compute processors, but intends to integrate some of the knowledge and experience accrued from the field of graphics processors.

I think they'll learn from the Knights series of processors: Intel seems to keep shooting itself in the foot by backing powerful computational cores with terrible on-chip network architecture.

And yet Intel seems to want to make GPUs for machine learning now... so I guess Nvidia's PR worked against Intel, too?

But as I said in another comment, the truth is Intel doesn't seem to know what it's doing, which is why it's pushing in 5 or 6 different directions with many-core accelerators, FPGAs, custom ASICs, neuromorphic CPUs, quantum computers, graphcores, and so on.

By the time Intel figures out which one of these is "ideal" for machine learning, and behind which arrows to "put more wood," Nvidia will have an insurmountable advantage in the machine learning chip market, backed by an even stronger software ecosystem that Intel can't build because it doesn't yet know "which ML chips will win out".

If I had to describe Intel in a sentence these days, it would be "Intel doesn't have a vision." It's mostly re-iterating on its chips and rent-seeking these days by rebranding weak chips with strong chip brands, and adding names like "Silver" and "Gold" to Xeons (and charging more for them, because come on - it says Gold on them!), as well as essentially bringing the DLC nickel-and-diming strategy from games to its chips and motherboards.

Meanwhile, it's wasting billions every year on failed R&D projects and acquisitions because it lacks that vision on what it really needs to do to be successful. Steve Jobs didn't need to build 5 different smartphones to see which one would "win out" in the market.

Non-incremental advances require a lot of wasted-path R&D. If any of Intel's projects creates a generational leap, it will pay off handsomely. When the way forward isn't clear, I like to use the concepts from path finding algorithms to drive strategy. Assuming you can afford multiple parallel efforts.

It's not clear if doing this in-house, or closely monitoring the state of the art and then buying a company that develops a winner, is superior.

Nvidia is probably "wasting" just as much money on R&D to figure out "which ML chips will win out" 5 years from now, so that they can build it first and sell it as a "GPU".

> many-core accelerators, FPGAs, custom ASICs, neuromorphic CPUs, quantum computers, graphcores

Most of those are completely different technologies that will almost certainly not share a niche.

> If you can afford to start from scratch, you can lose a lot of this baggage.

Effectively, this argument is much like saying that your personal workloads don't use AVX and demanding that Intel tape out a whole different die without it. You would very rightly be laughed out of town for even suggesting it.

Much like the economics of cryptomining cards that lack display outputs, this comes down to whether there is actually enough of a market to justify taping out a whole specialty product just for this one niche, vs the economies of scale that come from mass production. After all, that is the logic behind using a GPU in the first place, instead of a custom ASIC for your task (like Google's Tensor Processing Unit). On the whole it is probably cheaper if you just suck it up and accept that you're not going to use every last feature of the card on every single workload. It's simply too expensive to tape out a different product for every workload.

This only gets more complicated when you consider that many types of GPGPU computation actually do use things like the texture units, since it allows you to coalesce memory requests with 2D/3D locality rather than simple 1D locality. I would also not be surprised if delta compression were active in CUDA mode, since it is a very generic way to increase bandwidth.

The GPGPU community absolutely does use the whole buffalo here; there is very little hardware that is purely display-specific. If you want hardware that is more "compute-oriented" than the consumer stuff, that's why there are GP100 and GV100 parts. If you want even more compute-oriented than that, you're better off looking at something that's essentially fixed-function hardware dedicated to your particular task, rather than a general-purpose GPU unit.

So, it doesn't really make any economic sense.

I'm curious - what's getting "increasingly complex" about rasterizers?

I was referring to the move towards tile-based operation, and towards more parallelism in the front-end (e.g. for a long time even the most powerful GPUs had a bottleneck of processing one triangle at a time at a certain point in the pipeline, and recently increased it to... two).

I hope we can also get a more scalable form factor, so that we can keep stacking these new compute engines even if we run out of PCI slots or physical space inside the case.

Internal bandwidth is usually the limiting factor for neural networks, not the ALU.

Can't fight them on price? Fight them on talent.

Whoever at AMD refused to match the offer probably made a terrible decision. This is about the worst time to lose that talent, right after inking a GPU die deal which, in light of this news, will only be temporary. AMD just got played.

If I were AMD, I would review Mark Papermaster's comp and incentives to ensure he doesn't leave.

(I'm long AMD)

I don’t think this was all about money. Raja had been trying to run Radeon Technologies Group like an independent company and pushing for separation from AMD for a while. HardOCP did a good piece on this -> https://hardocp.com/article/2016/05/27/from_ati_to_amd_back_...

I think the recent Intel + AMD custom chip was probably the last thing Raja did before RTG got the reins put back on, and now he's jumping ship to pursue what he's wanted all along: to work with more independence.

Taking the GPU unit independent was never going to happen. The entire point of the ATI acquisition was to create the APUs that AMD has made for years, as well as get ahold of that great gaming market profit.

What leads you to think Intel will give him more autonomy? Or is their integrated graphics team setup differently than Radeon? I would suspect the old ATI boundaries after 11 years would still be stronger than something Intel has homegrown over the past 20.

Well, he wouldn’t go there if Intel wouldn’t give him what he wanted, whatever that might be.

And if it's not corporate structure, maybe it's a boat.

More power to him.

And if you were at an auction, would you review Warren Buffett’s bid to ensure he doesn’t leave with the item you want?

I know AMD has access to some resources, but if Intel decided he was a strategic hire, the game was over before it started. It's not just that they are richer; it's worth more to them to spend, because you could argue it ties into their most important long-term IP battles in related areas like massively parallel computing, ML, etc.

Strategic hiring should fail against survivalistic retention, but I may be an idealist. I'd think in practice hiring for growth would fail against retaining to survive.

There is a point where the cost of retention would ensure a failure to survive.

At that point you have to just cut the line.

Buffett doesn't do auctions.

AMD GPUs haven't fared well against Nvidia for some time. AMD GPUs use 1.5x more power to keep pace. The recently released Vega GPUs are underwhelming. I don't think Nvidia is too concerned about him going to Intel. AMD needed a shakeup in that division anyway.

Why do you think power consumption by itself is a pain point for AMD?

Gaming customers don't care because the card only consumes more when actually playing: 500 hours played in a year times 0.1 EUR/USD per kWh means you only save about 50 EUR/USD per year per 1000 W reduction in power consumption, so it's roughly 2.5 EUR/USD per year for a 50 W difference. Cryptocurrency miners have voted with their wallets and gone AMD. There are some 24/7 HPC GPGPU users who put weight on power consumption, but it is a small market segment.
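To make that back-of-the-envelope math concrete, here it is as a tiny Python sketch (the hours, electricity rate, and wattage figures are just the illustrative ones from above, not data):

    # Rough annual electricity cost of extra GPU power draw while gaming.
    hours_per_year = 500      # gaming hours per year (illustrative)
    price_per_kwh = 0.10      # EUR/USD per kWh (illustrative)

    def yearly_cost(extra_watts):
        return extra_watts / 1000.0 * hours_per_year * price_per_kwh

    print(yearly_cost(1000))  # ~50.0 per year per 1000 W of extra draw
    print(yearly_cost(50))    # ~2.5 per year for a typical 50 W difference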

Of course lower power consumption lets you do all kinds of useful engineering decisions within the product, so lowering it would make the product faster or cheaper, but that is already accounted for in the direct cost and performance numbers of the current product.

In the high end, gamers do care very much about power consumption and especially cooling. Even in the mid-range it's something that can sway the buying decision from AMD to Nvidia, especially when the higher power consumption doesn't come with a performance lead.

With cryptominers I'm not sure they are actually "voting" for anything; to me it rather looks like they are buying up pretty much all the decent mid-to-high-range cards in bulk, regardless of brand. It's more about availability.

For the mass market, heat/power matters a lot. The most successful GPU of recent years was the GeForce 750 Ti. It was notable for not requiring an external power connector, so it could be plugged into basically any pre-built computer with crappy integrated graphics for a major upgrade, without having to improve the cooling or the PSU. This is the segment where most of the revenue is made in GPUs, and it's where AMD is really hurting.

Right now, if you are building a new computer and get to make all your own component selections, a Vega 56 card at MSRP can be a really good deal, at that segment AMD GPUs are really competitive except at the very top -- you might pay a little more in the PSU and your power bill, but the sticker price of the GPU will make up for that. However, if, like >90% of gamers, you don't make all your own component decisions and don't have the power budget, the NVidia 1050Ti reigns supreme. The best competition AMD can muster against it in the "no external power connector" segment are some RX560 models. This is in itself a terrible marketing decision as it prevents the kind of word of mouth "just buy a 750Ti" advertising that worked really well for nVidia, on top of the fact that in the segment the AMD cards just do much less well than the nVidia ones.

You're overstating some of this. Some 750 Tis actually do need an external power connector, and the high-end/enthusiast market actually doesn't care about power consumption as long as the performance justifies it.

If Vega were smoking a 1080 Ti then you wouldn't hear any grumbling at all. It's when you get into a situation where Vega is pulling more power than a 1080 Ti and delivering performance that's barely above a 1070 in many titles that people start to get queasy about it.

PCGH just put out their new benchmark charts and the only Vega part that can even match the 1080 on average is the 64 Liquid Cooled version, which is roughly a $650 product at the moment. You're paying 1080 Ti money for 1080-level performance that pulls twice as much power as a 1080, which is pretty unappealing on the whole. The only real value argument AMD has been able to make is FreeSync, but it can't really make up for that kind of performance/value deficit.


Another little-discussed disadvantage is that Vega is a delicate little flower, even more so than Fiji was. Even most board partners that normally allow you to keep your warranty while using a waterblock have decided that Vega is just too delicate to have users taking the cooler off. You put a waterblock on, you lose your warranty. Many stores are not allowing returns for them either and I suspect this is a factor (along with generally immature drivers and other problems resulting in generally low user satisfaction).

The no return policies started with the current cryptomining boom. Most stores I've seen with them apply it to both nvidia and amd.

High-end gamers are not representative of the audience.

In the mid-range (/general audience) it's true heat can sway the buying decision, and in fact, everything that comes along with heat (for example, space occupation) is taken into account in the price.

Cryptocurrency miners don't use GPUs. They use ASICs, which are about six orders of magnitude more efficient, energy-wise.

Indeed, they're not voting for anything.

Please tell me where to buy one of these Ethereum ASICs.

I kinda assumed Bitcoin, since that's where most (almost all?) of the computation is happening.

And if Ethereum gets any significant traction, the ASICs will come. That's pretty much inevitable. Heck, I bet an ASIC would be worth the investment even for Argon2d hashes, even though that one was designed for modern stock hardware.

> And if Ethereum gets any significant traction

You mean like a $30 billion market cap? [0]

AFAIK not all blockchain implementations benefit from ASIC hardware; some even actively discourage ASIC use by making ASIC hardware inefficient, like Monero [1]. It could be that Ethereum does something similar.

[0] https://cryptocoincharts.info/coins/info

[1] https://monero.stackexchange.com/questions/47/is-monero-amen...

A 300 watt GPU is enormously bulky and makes an enormous racket. I remember when I upgraded to a Maxwell-based (then Pascal) GPU, it was a revelation. It no longer sounded like a hair dryer when gaming and my PC no longer got annoyingly toasty.

Gamers care about noise and heat very much.

More power means more cooling which means more noise.

I'd imagine gamers being a significantly smaller market than HPC/cloud where power consumption directly correlates to Total Cost of Ownership.

> There are some 24/7 HPC GPGPU users who put weight on power consumption but it is a small market segment.

Ugh, the latest estimates put machine learning at something like $40B by 2024.

Gaming is roughly double the size of data centre and professional visualisation sales combined for NVidia. I’d imagine the difference is even larger for AMD.

See https://wccftech.com/nvidia-second-quarter-2017-fy-18-analys...

Today. See my point about 2024.

For video games maybe, but for high-performance computing, Vega is incredible bang for buck, especially in FP16, where a $500 Vega 64 gives you 25+ TFLOPS while a $6000 Quadro with GP100 gives you 20 TFLOPS (other NVidia cards do FP16 at 1/64th the speed of FP32, making it largely pointless).
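To put rough numbers on that bang-for-buck claim, using only the approximate prices and TFLOPS figures quoted above (a sketch, not a benchmark):

    # FP16 throughput per dollar from the figures in the comment above.
    cards = {
        "Vega 64":      (500.0, 25.0),   # (price in USD, FP16 TFLOPS)
        "Quadro GP100": (6000.0, 20.0),
    }
    for name, (price, tflops) in cards.items():
        print(f"{name}: {tflops / price * 1000:.1f} GFLOPS (FP16) per dollar")
    # Vega 64: ~50 GFLOPS/$ vs Quadro GP100: ~3.3 GFLOPS/$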

At this position and pay grade, I think he wouldn't even be allowed to work for Intel if AMD had said no.

Yeah, that's kind of weird, at a time when AMD needs to invest more effort in their GPUs to get ahead of Nvidia.

If they are playing this smart, there should be a bench of people, nearly as experienced, and willing to bust their ass twice as hard for half as much.

Somehow, companies never play this smart. Maybe at very R&D laden companies things are different, so I wouldn't bet against AMD, but wouldn't bet on them either.

Nothing some hot OTC options can't solve (hah!) - AMD definitely got played.

I am short AMD, long NVDA. Keep an eye open on ER tomorrow.

Raja, if you are reading this make sure your Intel GPU has two things that competition doesn't:

1) FP8 half-precision training: NVidia is artificially disabling this feature in consumer GPUs to charge more for Tesla / Volta.

2) A licensed / clone of AMD SSG technology to give massive on-GPU memory: NVidia's 12 GB memory is not sufficient for anything beyond thumbnail or VGA sized images.

My experience with Intel Phi KNL has been miserable so far, I hope Raja has better luck with GPU line.

Freely available full documentation, from package pinouts to ISA and programming guides (at least as extensive as the x86 SDM[1]), would also be met with great praise, especially in the open-source community. For a recent GPU to have such open documentation would be, AFAIK, a first.

[1] https://software.intel.com/en-us/articles/intel-sdm

1) is not true; in the current generation FP16 is not artificially disabled on consumer GPUs but rather does not exist. And it's not just consumer GPUs: out of 4 Pascal Teslas only 1 has support for FP16 and FP64, the GP100-based P100.

The GP102 and GP104 cards, which include the consumer cards and Teslas like the P4 and P40, are inference-focused and support INT16 and INT8 dot products, while the first-generation Pascal GP100 doesn't.

If anyone has artificially locked down FP16 support, it's AMD, as consumer Vega doesn't support it for compute.

2) NVIDIA already has a competing solution: Pascal has had unified memory support from day one (49-bit address space, AST and paging), and they already partner with SSD makers to have an add-on card which is mapped to VRAM.

> My experience with Intel Phi KNL has been miserable so far, I hope Raja has better luck with GPU line.

I'd love to see the Phi approach taken further. I'm not a huge fan of having different ISAs, one for my CPU, one for the compute engines of the GPU (to say nothing about the blobs on my GPU, network controller, ME). I'd prefer a more general approach where I could easily spread the various workloads running on my CPU to other, perhaps more specialized but still binary-compatible, cores.

Heck... Even my phone has 8 cores (4 fast, 4 power-efficient, running the same ISA).

When you've got a huge out of order engine the extra effort it takes to decode x86 instructions is lost in the noise. When you're going with the flock of chickens approach and you have a huge number of very small cores then the overhead is killer. Intel tried to solve this by using medium cores with big SIMD units but SIMD is just less flexible than a GPU's SIMT is.

Power and area generally scale as the square of the single threaded performance of a core. The huge number of "cores"/lanes in a GPU are much smaller and more efficient individually than even your phone's smaller cores. And the x86 tax gets worse and worse the smaller you try to make a core with the same ISA. Intel wasn't even able to succeed in competing with the Atom against medium sized cellphone chips.

> but SIMD is just less flexible than a GPU's SIMT is

There is nothing preventing the x86 ISA from being extended in that direction. As long as all cores (oxen and chickens, as Seymour Cray would say) can shift binaries around according to desired performance/power, I don't care.

Binary compatibility is awesome for software that has already been written and for which we don't have the source code. Pretty much everything on my machines has source available.

The OS may need to be more aware of performance characteristics of software it's running on the slightly different cores so it can be better allocated, but, apart from that, it's a more or less solved problem.

Atoms didn't perform that much worse than ARMs on phones. What killed them is that they didn't run our desktop software all that well (even though one of my favorite laptops was an Atom netbook).

That's a nice goal on paper but in the real world a "generic" architecture is never going to approach the performance of a specialized one with bare-metal optimizations.

If your program is computationally intensive enough that you're abandoning general-purpose processors and moving to a specialized co-processor card, you should really just go whole-hog on it and bare-metal optimize to get some performance out of it. It doesn't make sense to do this half-way - which is why Xeon Phi has always faced such an uphill battle for adoption.

> 1) FP8 half-precision

half precision is FP16.

> 1) FP8 half-precision training: NVidia is artificially disabling this feature in consumer GPUs to charge more for Tesla / Volta.

No, this physically is not present on consumer chips. You can't subdivide the ALUs like that even on Tesla P5000 cards. Of course you can promote FP8 to FP32 without an issue, on any card, but you don't gain any performance either.

At the time Pascal was designed it didn't make any sense to waste die space on FP16 support let alone FP8, since games are purely FP32. This is changing now that Vega has FP16 capability ("Rapid Packed Math") and titles may be using this capability where appropriate. I would not be surprised to see it in Volta gaming cards at all.

It's funny, everything old is new again. Someone comes up with this idea about once every 10 years. Using FP16 or FP24 used to be big back in the DX9 days.

> 2) A licensed / clone of AMD SSG technology to give massive on-GPU memory: NVidia's 12 GB memory is not sufficient for anything beyond thumbnail or VGA sized images.

You're looking for NVIDIA GPUDirect Peer-to-Peer, which has existed since like 2011.


AMD's product is actually purely marketing hype, it's simply a card that contains a PLX chip to interface a NVMe SSD. It is the same technology that is used for multi-GPU cards like the Titan Z or 295x2, and it offers no performance advantages vs a regular NVMe SSD sitting in the next PCIe slot over.

This is something that people didn't know they wanted until AMD told them they wanted it. But you can do this on any GeForce card even, no need to shell out $7000 for some crazy custom card that doesn't even run CUDA.

The bigger problem is that there really isn't much of a use-case for it. NVMe runs at 4 GB/s, which is painfully short of the ~500 GB/s that the GPU normally runs at. That is even significantly less bandwidth than host memory can provide (a 3.0x16 PCIe bus limits you to 16 GB/s of transfers regardless of whether that's coming from NVMe or host memory).
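A quick illustration of how lopsided those numbers are, using the rough bandwidth figures from this comment (illustrative, not measured):

    # Time to stream a 12 GB working set (one card's worth of VRAM) from each source.
    working_set_gb = 12
    bandwidth_gb_per_s = {
        "on-card GDDR/HBM": 500,           # ~what the GPU normally runs at
        "host RAM over PCIe 3.0 x16": 16,
        "NVMe SSD": 4,
    }
    for source, bw in bandwidth_gb_per_s.items():
        print(f"{source}: {working_set_gb / bw:.2f} s")
    # on-card: ~0.02 s, PCIe: ~0.75 s, NVMe: ~3 s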

AMD SSG supports up to quad-SSD built right on the graphics card, presumably to improve bandwidth.

I am not convinced that Intel can win here. They don't seem to succeed with homegrown GPU tech and other big-bang approaches. Now if they were to acquire decent GPU tech then I would bet on them. The homegrown route just doesn't seem to work out for them.

I suspect part of the reason is the long time frames for dev of this tech. I suspect it is at least 2 years for this to see the light of day. That is forever in this space.

Intel failed with Larrabee and Itanium. Maybe this will go better?

It looks like Raja will lead the development of machine learning-focused GPUs. Isn't this Intel basically admitting that their Xeon Phi, Nervana, and Altera efforts to win the machine learning market are all a dead end?

How many machine learning strategies is Intel going to try? Does it even know what it's doing? Spending billions of dollars left and right on totally different machine learning technologies kind of looks like it doesn't, and it's just hoping it will get lucky with one of them.

And even if you think that's not a terrible strategy to "see what works", there's still the issue that they need to have great software support for all of these platforms if they want developer adoption. The more different machine learning strategies it adopts, the harder that's going to be for Intel to achieve.

Intel needs a CEO that skates to where the puck is going to be, not where it was three goals ago.

That's scary if that's what is required of a CEO. You would either need to be an oracle and predict where the industry will go, or you'd need to make the industry go the direction you're taking the company.

I think there is more to AI and ML than deep learning, for which TPUs and GPUs are the obvious choice.

But I bet that branching instructions (various variants of search) still play a big role when you go beyond classifiers to reinforcement learning etc. so there is need for other architectures beyond GPUs.

I do love AVX-512 and FMA instructions for CPU code with lots of branching. But that's definitely not going to cut it for the pure, almost solely linear algebra workloads of most deep learning.

I do have high hopes for their memristor initiative, but that's got to be years out.

The GPU move is smart for Intel.

Itanium was really more of a failing of core technology than a failing of execution, whereas Larrabee was much more of a failing of execution than core tech, and the issue with Larrabee was really the fact that software GPU functions were not only slower in speed, but even slower to develop (!).

I thought Itanic was just a really smart diversionary tactic to kill off HP-PA (HP), MIPS (SGI) and Alpha (DEC/Compaq) development. It worked, and it didn't matter that Itanic was itself on borrowed time.

The latter is further proof that if you sell to dinosaurs, you won't survive the Big One (in this case, x86 growing up courtesy of AMD, and Arm spending 15 years washing away the footing underneath both by moving personal computing to mobile devices). This should be a big warning sign to the OpenPower guys. You need to start small and scale/price up, not the other way around.

Itanium was an all-around failure, including idea, execution, management, and ecosystem.

Interesting. I guess I view homegrown tech as just really risky if you try a new approach. Success isn't assured here unless you mostly replicate with just a small spin on top.

Maybe they are building upon AMD's core tech based on that other licensing deal? If so I would bet on them succeeding.

Intel also utterly failed in the mobile space and IoT things, and is setting itself up for failure in edge computing. What's one more thing to fail at?

The link is incredibly light on actual content but this seems to be good news for AI enthusiasts, as perhaps now we'll get a reasonable competitor to CUDA/CUDNN and their associated hardware for running GPU-accelerated machine learning. Intel seems to be taking the ML/AI space seriously and this move seems very likely to be related. Yes, I'm aware of OpenCL, as I am also aware of its level of support in libraries such as PyTorch, Tensorflow, Theano -- it isn't the first-class citizen that CUDA is. While those libraries aren't perfect they offer the experience of writing the experiment on your laptop without a GPU, validating, then running the full experiment on larger hardware.

In my ideal world competition from Intel would force NVidia to play nice with OpenCL or something similar, and encourage competition in the hardware space instead of the driver support space. Unfortunately the worst case looks something more like CUDA, OpenCL and a third option from Intel with OpenCL-like adoption. :(

My view is CUDA has already won, and everyone else needs to get over it. Even clang supports PTX now, which is a reasonably device agnostic representation, albeit controlled mostly by Nvidia. Perhaps intel will introduce their own extensions to this ISA.

Even if my precompiled CUDA application could run on Intel GPUs at 50% of the throughput, I'd be happy if I could later tweak and recompile it to get the full benefits from their hardware.

Yes, Intel knows how to play when they own the instruction set (x86, sse1,2,3,4 and amd64). No, amd64 is just cross-licensing between two and only two companies. AMD will never catch Intel. Ergo, Intel owns amd64 too.

It shouldn't be a surprise then that along comes NVidia with their own instruction set, PTX, and Intel's desire to own the instruction set will be their undoing.

One way Intel can compete with CUDA is if they allow us to write plain Numpy code, and magically compile it for their GPUs with similar performance to what we get from Nvidia when writing in CUDA.

An alternative would be for them to (again magically) modify all major DL frameworks to support their GPUs.

I don't even know which option is more realistic.

I think #1 is the most realistic. Intel does some very significant work with Numpy as it is with MKL. #2 seems like dark voodoo magic that'd be neat, but pretty much impossible.
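For what it's worth, the kind of "plain Numpy code" in question is just something like this; today the matmul lands in MKL on the CPU, and the wish is that a hypothetical Intel GPU backend would pick it up unchanged:

    import numpy as np

    # A dense layer forward pass: matmul + bias + ReLU.
    x = np.random.randn(4096, 1024).astype(np.float32)  # batch of activations
    w = np.random.randn(1024, 1024).astype(np.float32)  # weights
    b = np.zeros(1024, dtype=np.float32)

    y = np.maximum(x @ w + b, 0.0)
    print(y.shape)  # (4096, 1024)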

One way Intel can compete with CUDA is to show a benchmark where they outperform NVidia in training speed. Not speed per watt, not inference, not oddly priced speed per dollar comparisons.

If the speed is there the world will write the code.

Sure, they can "show a benchmark", however, to be more than 30% faster in reality (not just on a marketing slide) than the next generation Nvidia chip is probably just as unlikely as the first two options.

I'd love to get surprised though.

I just hope it won't be all too focused on deep learning. That's boring. We need massively parallel machines that are more general than that. Modern NVIDIA-style GPUs are actually doing quite well, although they've started adding stuff that is only marginally useful outside of straightline matrix operations.

Only armchair spectators are still talking about OpenCL. It's as dead as disco.

I still use OpenCL on the daily; it allows me to program for both NVidia and AMD GPUs simultaneously with the minimum amount of pain. And you can still use it for mobile GPUs and specialized chips like the Myriad.

Do you have another open, cross-platform, widely compatible GPU programming framework to recommend?

There is DCompute[1] which is able to generate SPIR-V and PTX simultaneously from the same D source. See my comment to the parent.


Same here. What else can you do to ship GPU acceleration to AMD and NVIDIA people?

The alternatives recommended here aren't even serious IMHO. I'd rather switch to CUDA and wait till Intel/AMD sort out a REAL compatibility layer than deal with those.

Unless I'm mistaken, HIP still requires a separate compile for either platform and what runtime do they expect end users to have exactly?! At least CUDA and OpenCL are integrated in the vendor drivers.

Vulkan compute with SPIR-V seems to be the only real solution, but even that is still very early. Still waiting for proper OpenCL 2.0 support in NVIDIA drivers :P

You simply don't ship. Enterprise deep learning doesn't ship their training code - large models are trained on purpose-designed, dedicated hardware. Hardware compatibility doesn't matter, software does. Even that's flexible if it's significantly faster.

(The models can be executed on low-powered, commodity CPUs. No need for any GPU there.)

> You simply don't ship

That's totally an option for our product, great idea! Why did I never think of this!

No seriously, we are shipping, using OpenCL, and it gives about a 20 times performance advantage for most users regardless of whether they have AMD or NVIDIA hardware. If something that's actually better than OpenCL comes along (or if AMD RTG goes out of business) I'll switch to it, no heart broken.

But that hasn't happened yet.

General-purpose programming on the GPU has many use cases apart from deep learning, e.g. image processing, computer vision, offline rendering and other high performance computing applications. OpenCL allows you to have a single code-base working on NVidia, AMD and Intel GPUs without having to recompile or put any special effort - the same kernel that works on NVidia will work anywhere.

HIP allows developers to convert CUDA code to portable C++. The same source code can be compiled to run on NVIDIA or AMD GPUs https://github.com/ROCm-Developer-Tools/HIP

What mobile GPUs?

On iOS it is kind of deprecated and the way forward is Metal Compute.

On Android it never happened, Google created their own Renderscript dialect instead.

There is no reason that you can't create both OpenCL (i.e. SPIR-V) and PTX from the same source. That's what I do for [DCompute](https://github.com/libmir/dcompute). Agnosticism at the library level/CPU side is next on my list.

The problem isn't getting it to run, the problem is getting similar performance. You need self-tweaking algos to get some level of performance portability using just opencl, let alone across two apis.

Indeed, there is a very large amount of parameterisation you can do in D with ease. You can, if you have prior knowledge of the hardware that it will run on, create different versions of the kernels to exploit that (e.g. with template kernels, which we support directly unlike OpenCL C++).

Do you mean adaptive algorithms or dynamic recompilation? And yes I do expect that cross API will be difficult, for both running and getting good perf.

But it is not just the high-performance end of the spectrum that is important; the lower end, which is stifled by the barrier to entry, would also benefit from the extra compute.

I meant dynamic (re)compilation yes - like adapting the stride in your kernel when you vary work group size for OpenCL kernels. I mean, I have something like that (a primitive self-tuning pre-run calibration step) and I only vary work group-, local- and vector size; and that's already a massive pain in the ass. It's ok on toy kernels but once you move beyond that, it just seeps into all your kernel code, making it almost into a meta-language. And then I'm not even talking about differences like using image types vs arrays for data that is 2d by nature. I don't see how I can generalize that; I just write various versions of my kernels. Which doesn't scale very much, to put it mildly.
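For the curious, that "primitive self-tuning pre-run calibration step" is nothing fancier than a sweep like this sketch; run_kernel is a stand-in for whatever launches your OpenCL kernel, not a real API:

    import itertools
    import time

    def calibrate(run_kernel,
                  local_sizes=(32, 64, 128, 256),
                  vector_widths=(1, 2, 4)):
        """Time one representative launch per configuration and keep the fastest."""
        best = None
        for ls, vw in itertools.product(local_sizes, vector_widths):
            start = time.perf_counter()
            run_kernel(local_size=ls, vector_width=vw)  # hypothetical launcher
            elapsed = time.perf_counter() - start
            if best is None or elapsed < best[0]:
                best = (elapsed, ls, vw)
        return {"local_size": best[1], "vector_width": best[2]}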

My point is: I was drawn to OpenCL for its 'portability' claim, and yes kernels will 'run' on various hardware, but with massive differences in speed. What good does that portability do me? My workloads (scientific computing, branch heavy) are different from the typical ML half- or single precision MulDiv()/linear algebra applications, so all the hand-tuned CUDA libraries aren't even my concern. The elephant in the room is that performance doesn't depend on this API or that; it's in how you tune your algorithm to the actual hardware you're running on. Which is the direct opposite of portability.

To come back to the post I was replying to - yes it's 'trivial' (I mean, a lot of work, but technically not hard, not to belittle your work) to compile almost any statically typed code into either SPIR-V or PTX or any other future format for that matter. But that doesn't mean that it will work 'well' (not even 'optimal') on other hardware. In the real world, you're almost always better off just spending a few hundred to get the same GPU as whoever wrote the kernels tested them on (or if you're running your kernels on existing clusters, to focus on optimizing them for what you know you'll be running them on).

Oh and all of that is just considering GPUs. I mean, when you read an OpenCL book they make it seem (in the first few chapters) that you don't even have to think about whether you'll be running on a CPU or a GPU. And then you accidentally use USE_HOST_PTR instead of COPY_HOST_PTR (or the other way around) in the wrong place, and all of a sudden your code is 10x slower than even the sequential version of your algo.

What I'm saying is - I no longer believe in easy to use abstractions for these purposes. If you want speed, you code to the metal, and/or you tweak your abstractions to your specific use case. Yes this is a lot of work. And if you don't need speed, you just throw in a few std::thread's here and there and call it a day.

But that's just my conclusion for my use cases.

Thanks for your insight.

I agree that it is not easy to 'parameterise the metal', but it is certainly doable in D[1]; in C++ (guessed from the std::thread), good lord no: D's meta-programming is orders of magnitude ahead of C++'s. Writing different versions of a kernel is a poor man's parameterisation ;)

LDC, the LLVM D Compiler, will be getting dynamic re-optimisation, which I hope to get to play nicely with DCompute. Well, with PTX, because SPIR-V doesn't yet have a JIT backend. Dynamic re-optimisation, from what I understand, freezes some variables as constants and reruns the optimisation passes. This is as opposed to recompilation with complete restructuring of the kernel. Not quite the same, but for things like loop counts and branch "prediction" this should help a lot.

w.r.t. USE_HOST_PTR/COPY_HOST_PTR, that's not part of the kernel parameterisation, that's part of the host and is easily adjustable. Yes, you need to figure out which one to use, but that's just part of tuning.

> What I'm saying is - I no longer believe in easy to use abstractions for these purposes.

I hope to be able to show you otherwise :)


That is more or less what I was getting at, while trying to remain polite on the subject. The truth is that none of the deep learning backends have real support for it and OSX supports an outdated version (IIRC). It has seen very little adoption outside of a handful of niche use cases.

Disco will never die.

I'll continue using it as long as the users are still split between NVIDIA and AMD cards.

If AMD GPUs die out, CUDA it is.

Or maybe give up and use a wrapper library for whatever 10 alternatives-only-supported-by-one-marginal-vendor there are. (like Apple)

are there insurmountable technical constraints there, or could it be resuscitated?

OpenCL is a very sound design on a technical level. Unfortunately, it's mostly designed to be a good target for code generation/libraries, and is awkward to use directly. The problems are that everything is very explicit and spelled out, and the kernel language itself is based on C, not C++, and segmented from the rest of the code base. In practice, you use OpenCL API calls to submit strings containing code to the OpenCL backend, which then compiles it and hands you back an opaque compiled program that you can execute. This makes it hard to write modular and maintainable code with OpenCL directly.

The issues go away if you use a good OpenCL frontend. PyOpenCL for Python goes a long way towards this, and is not really any more awkward than the corresponding PyCUDA, and higher-level languages that generate OpenCL code, like Lift[0] or Futhark[1] (tooting my own horn here), remove the awkwardness completely.

[0]: http://www.lift-project.org/ [1]: https://futhark-lang.org
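To make the "submit strings containing code" point concrete, a minimal PyOpenCL program looks roughly like this (a sketch using the standard PyOpenCL API; the kernel source really is just a Python string compiled at runtime):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    a = np.random.rand(1 << 16).astype(np.float32)
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # The OpenCL C kernel is handed to the driver as a string and built at runtime.
    prg = cl.Program(ctx, """
    __kernel void twice(__global const float *a, __global float *out) {
        int gid = get_global_id(0);
        out[gid] = 2.0f * a[gid];
    }
    """).build()

    prg.twice(queue, a.shape, None, a_buf, out_buf)

    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    assert np.allclose(out, 2 * a)

PyOpenCL hides the worst of the boilerplate, but you can see why writing against the raw C API, with the kernels segregated into strings, gets unwieldy for larger codebases.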

> ... and is awkward to use directly

You're not wrong there. However, with the advent of SPIR-V it is possible to write code in whatever language you please (with the caveat that at the moment you need an LLVM backend using https://github.com/thewilsonator/llvm-target-spirv or the Khronos repo I forked that off of). Then comes the issue of making the code-generator-friendly interface user-friendly, which I have done for D so that you get the ease of use of CUDA.

I think that has been the major failure from Khronos, being stuck with C mentality for their APIs.

OpenGL ES only took off thanks to gaming on the iPhone, and now is deprecated on Apple platforms.

Vulkan still lives in a C world, and the semi-official C++ bindings only exist thanks to NVidia.

OpenCL waited too long to support C++ and Fortran, and to provide an infrastructure for compiler writers to add GPU support to their own languages. And two years later the majority of drivers are still not there.

Khronos' problem is that they want to do everything, or not do it at all. SYCL would be 'real' C++ support, way beyond the 'OO wrapper around a C API' that already exists in the OpenCL headers. Nvidia is more 'good enough? Ship it', which has shown itself many times in the past to be a dominant strategy. (As much as I hate to admit that.)

It seems like a C API is a much better choice if you want your library to be callable from many different languages. What is a reasonable alternative without giving that up?

A bytecode format, like what CUDA, Metal Compute and DirectX Compute use.

Which Khronos finally adopted as SPIR, but the drivers aren't there yet.

Regarding the other Khronos APIs, a C API is like being stuck in a PDP-11 world.

Many conflate the idea of a C API with the OS ABI; they only happen to be the same if the OS APIs are exposed as plain C.

There are many cases where this isn't so, e.g. mainframes, mobile OSes, and most userspace on OS X (Obj-C runtime) and Windows (.NET and COM).

Having a bytecode and having a C API are completely orthogonal. And thankfully we now have both with SPIR-V, although the C API needs to be wrapped unless you like writing C-like code in higher-level languages that abstract most of the tedium.

Yes, driver support is a bit lacking, although I hope that I can convince the OpenCL working group of the need to get a backend (such as https://github.com/thewilsonator/llvm-target-spirv) into mainline LLVM so that writing drivers becomes easier for vendors.

D3D, Vulkan, OpenGL (optionally) etc. do use a bytecode format, but that only covers shader modules, which are a small part of the overall API. I'm not that familiar with CUDA, but it looks like a large portion of it is delivered as a C API, judging by the bindings I see online.

The OpenCL kernel language is very similar to the CUDA one, before they started adding more C++ support. It's pretty easy to translate between the two if you don't use hw intrinsics.

And segmenting the codebase is a MAJOR feature. With CUDA you are stuck on an old compiler until NVIDIA issues an update. How anyone can think this is a good idea...

Because NVidia understands the GPU world has moved beyond C.

"Designing (New) C++ Hardware”


CUDA has had C++ and Fortran support since the early days, with PTX for additional compiler backends added in version 1.4.

That was 2007; Khronos waited until 2015 to specify similar capabilities.

What GPU world though?

People still write regular shaders in languages that are much more C like.

People writing GPGPU code are few. Most of the DL GPU use is in Python through several layers, and in the end you are running hand-written SASS assembly sitting in an NVIDIA DLL or whatever.

I guess some people must think it's handy to have C++ support in GPU kernels, or they wouldn't have added the feature. But for it to drive technology, hard to believe.

The only GPU shaders that are C like are the OpenGL ones.

The Metal, DirectX, PS3, PS4, Nintendo and several middleware engines are C++ like.

Also the fact that OpenCL lost to CUDA for being stuck in C for so long, shows what most GPU devs actually prefer.

> Also the fact that OpenCL lost to CUDA for being stuck in C for so long, shows what most GPU devs actually prefer.

Aw come on, you're sure it has nothing to do with the largest GPGPU vendor pushing CUDA very heavily and intentionally gutting their OpenCL tools? Or putting out a ton of very high performance libraries with no OpenCL equivalent? Putting out a shitton of marketing and tutorial videos for CUDA only?

Yeah, that surely was totally unrelated.

Also pushing a proprietary standard goes faster than a standardized one. No surprise there.

NVidia is not the only GPU producer.

If AMD, Intel and embedded OEMs actually produced quality OpenCL drivers, debugger and IDE support and libraries that could match CUDA productivity, maybe devs would bother to use C with OpenCL.

Even Google decided to create their Renderscript dialect instead of supporting OpenCL on Android.

So you're admitting that the problem is quality tooling, not lack of C++?


I am saying that if the other GPU vendors bothered to actually provide a competing technology stack, that was worth the pain of using plain C, maybe GPU devs would have bothered.

It's purely political. If Nvidia had a team of 3-5 people working on their OpenCL drivers/tooling, OpenCL would be as good as CUDA. Nvidia would be stupid to do that, of course. Hence the Khronos play to fold OpenCL into Vulkan. We'll see how that plays out.

It could be resuscitated but it is unlikely without the backing of a major player.

Intel has failed in most of the markets that they have taken seriously, especially software, so I don't see why this time would be different.

While I hate to put down an effort before it even gets off the ground, there is some truth to this.

Their recent failed push into 'wearables' was a great example. They bought up a number of small(ish) but interesting players (Basis, Recon Jet etc) and squandered them completely. Their completely missing the boat on smartphones, save perhaps a small amount of modem chip business the past couple of years, is especially damning. With GPUs there's the whole failed Larrabee thing as well.

If Intel ever acquires a company I care about, I will be extremely concerned.

They have to if they want to stay relevant. They already lost mobile; if they fail to catch the AI/ML bus, it will be their last.

Interesting, given that just 2 days ago it was announced [0] that Intel was going to start to use AMD for some of their integrated graphics. Now they're going to compete against them in the discrete graphics space.

Also, Koduri recently left AMD after what many felt was a disappointing discrete graphics release in Vega.

[0] https://www.anandtech.com/show/12003/intel-to-create-new-8th...

Wowza. If I moved to a direct competitor like that, my employment contract's "non-compete" clause would be brought out immediately. And I'm no C-level executive, just an individual contributor. I wish Washington had California's non-compete law.

Often there is a form of mutually assured destruction at play. Qualcomm and Intel have cross-hired major executives like Murthy and no one got sued.

I feel like non-competes are similar to patents these days. Everyone has tons of patents and everyone is infringing on everyone else, so they just agree to pay licensing fees to one another and never go to war.

I’m certain AMD has hired their share of Intel people by now; it's a no-win situation.

Haven't read much on it, but this happening right after the integrated GPU deal with AMD just strengthens the "teaming up against NVIDIA" theme going on.

It's like an Age of Empires FFA where the losing players always team up against the leader.

I'm excited. I don't care if Intel wins. I just want a video card that doesn't suck and works perfectly with linux. Even if I unplug my monitor sometimes... Even if it's a laptop and it switches GPU for different outputs... Even if I want to use the standard xrandr and normal ass linux tools for configuring my monitor.

Maybe that would happen if kernel developers were not such divas who think it is appropriate to use coarse language in public discourse. Nvidia's graphics drivers work perfectly on Windows and they have the only OpenGL implementation that is not a total joke on Linux.

Why do you say the Mesa OpenGL is a total joke?

This is a bit outdated, but http://richg42.blogspot.de/2014/05/the-truth-on-opengl-drive... is an overview of driver status by a game developer. Vendor A is NVIDIA, and as the article points out, they are the only one with a performant, relatively bug-free implementation. Also notice how he mocks Intel for having two driver teams: that Linux expects to get special treatment by integrating the driver into its graphics abstractions demonstrably leads to worse performance and fewer features than if you bypass all those abstractions and use essentially the same driver for all kernels.

Thanks for that link. I must say I deem it maybe not completely outdated, but at least worthy of an update.

But I'm in awe of what one can read there.

"This vendor[Nvidia] is extremely savvy and strategic about embedding its devs directly into key game teams to make things happen. (...). These embedded devs will purposely do things that they know are performant on their driver, with no idea how these things impact other drivers.


Vendor A[Nvidia] is also jokingly known as the "Graphics Mafia". Be very careful if a dev from Vendor A gets embedded into your team. These guys are serious business."

So, basically Nvidia is sabotaging OpenGL to fuck up the specs and then implement other working variations and make the game developers use their version? If that is true, fuck Nvidia.

"On the bright side, Vendor C[Intel] feeds this driver team[Windows Driver Team] more internal information about their hardware than the other team[Linux Driver team]. So it tends to be a few percent faster than driver #1 on the same title/hardware - when it works at all."

What the fuck is going on in this industry? Intel is sabotaging its own Linux driver team? Why?

"I don't have any real experience or hard data with these drivers, because I've been fearful that working with these open source/reverse engineered drivers would have pissed off each vendor's closed source teams so much that they wouldn't help.

Vendor A[Nvidia] hates these drivers because they are deeply entrenched in the current way things are done."

That, now finally, makes sense. Nvidia is strong-arming developers to not support Mesa because they are afraid of it. Nvidia is afraid of Mesa. I think this should be more widely known.

The way I read this was a bit different: NVidia actually is the only vendor that offers a performant, complete and relatively bug free implementation. For example if you consider this http://gdcvault.com/play/1020791/ presentation then it is relatively clear that most major innovations were first available as OpenGL extensions by NVidia. The playing field might have levelled somewhat with the introduction of vulkan, which eliminates a lot of code that had to reside in the driver before. The main reason why Mesa is unlikely to catch up is because the backend compiler code is platform specific, so unless NVidia decides to publish their platform specification, it is unlikely that they will achieve meaningful success. Even if NVidia did publish a specification and left driver development to the community it is unclear to me who would be willing to do the free work for them.

Keep in mind that the blog post is from 2014. Since then AMD has rewritten their Linux driver (fglrx -> AMDGPU) which didn't really pay off before their 4xx series (released 2016).

That's usually driver related, not hardware. e.g. it's a well known fact among game engine developers that OpenGL on AMD cards sucks, and it's not at all because of the hardware, it's purely the software drivers (they are much better on linux with open source ones).

Then why do many games not support the open source AMD drivers on Linux?

I found the answer here: http://richg42.blogspot.de/2014/05/the-truth-on-opengl-drive...

Nvidia is strong-arming developers not to support Mesa because they are afraid of open drivers.

Nvidia is afraid of Mesa.

> GT4-class iGPUs, which are, roughly speaking, on par with $150 or so discrete GPUs.

Erm. Nope. No Intel iGPU is on par with the 1050 much less the 1050 Ti.


(I compared mobile chips since the most powerful GT4 can only be found in the mobile chips.)

It's only slightly behind the 1030 which costs $73.

Look at it another way: no Intel iGPU is on par with any discrete GPU, because in price segments where iGPUs appear, discrete GPUs tend to vanish in a matter of 1-2 years. There used to be a significant number of NVIDIA Geforce MX420/440s, 5200s and 6200s. Then much fewer 730s. Now 1030s are practically only in laptops. Intel has been nibbling away at this market slowly, but steadily, for a decade.

If driving an FHD display is all that you want, then an integrated GPU is fine. But we're starting to get 4K/UHD nowadays...

They’re on par with laptop discrete GPUs with a low power envelope. The sort that would fit into a MacBook Pro. The 1050 uses 70 W.

Finally! Intel has a lot of catching up to do.

As GPUs continue to evolve into general purpose vector supercomputers, and as ML/deep learning applications emerge, it seems clear that more and more future chip real estate (and power) will go to those compute units, not the x86 core in the corner orchestrating things.

This isn't the first time Intel has tried to develop a discrete GPU... anyone remember Larrabee?


And i740 before that (I had one of these, it wasn't very good)


> Finally

Why on earth would you think Intel extending their near-monopoly is a thing to celebrate?

They don't have a monopoly in add-on GPUs. There are two strong competitors, and Intel's current on-chip GPUs are comparatively pitiful.

Intel doesn't have a near-monopoly in anything, really. AMD is now a solid CPU competitor with Ryzen and Threadripper, and ATI and nVidia exist and dominate in the GPU space.

> With his hire, Intel will be developing their own high-end discrete GPUs.

With Intel and AMD backing Mesa, things on Linux will get very interesting.

I am sceptical about the consequences for user-controlled computing. AMD's GPUs have developed in a positive direction in the past, while Intel is unfriendly to users' control over the hardware they buy.

Intel GPU drivers are open on Linux. How is that worse than what AMD are doing?

Intel does more than integrated GPUs.

AMD too, but we are talking about GPUs in this case.

Yes, and because of the above mentioned, I am sceptical about the consequences. I don't consider it likely that the new GPU will work without proprietary firmware nor that the documentation will be better than AMDs now.

AMD GPUs also need firmware unfortunately.

Damn, this is a major loss for AMD, losing Raja is definitely not the right move. It would have been interesting to see the next iteration of AMD graphics with Raja on board.

Threadripper and the Zen architecture put them back on the map; that's some serious hardware for the price. I wish they had just kept iterating on the CPUs and GPUs.

Vega is not a bad product; it just doesn't beat Nvidia's offering in the bar charts. That doesn't mean it's bad, it just means it's in second place, which is fine since it's cheaper as well. Technology needs to be iterated on. Something must be going on at AMD at the moment.

Can someone explain this to me: isn't the GPU industry all about patents and trade secrets (enforced by NDAs)? Won't all of Raja's expertise be tied up in that?

It's likely Intel's already got cross-licensing agreements for everything they need, seeing as they already build GPUs. (Or is somehow everyone forgetting that fact?)

The case is probably that the Intel graphics team just decided they'd rather play against the big boys at nVidia and actually put enough cores on a chip to be a competitor, but in order to do that, they'd need to go off-chip for power and heat dissipation reasons. Hiring the guy from AMD helps you sell the new solution, since presumably that's what this guy's good at.

The market's ripe for being disrupted, as it has been incredibly stagnant with nVidia and AMD's tit-for-tat for the past, well, decade.

Do you think Intel played dirty on this one? Licensed something from AMD that also opened the door to this strategy via a loophole?

I don't have any evidence either way, but it smells like the typical Silicon Valley job sniping that happens here every day.

I'd happily take a Chief Architect role at a company if the paycheck had enough zeros and I was the domain expert for that technology.

Maybe someday I'd actually be able to afford a home in this miserable region...

Intel has a both a large patent portfolio and a lot of legal firepower in that space, so no, I don't think folks like Nvidia will be able to "threaten" Intel with patents. Nvidia might be able to threaten them with the monopoly card (clearly Intel is using its dominance in desktops and laptops to move into an adjacent market) but they have been doing that for many years with the integrated GPUs so I would expect it to be a weak play.

I must have expressed myself poorly. He's coming from AMD/RTG, everything he knows is presumably what they use/own. Nothing to do with NVIDIA

No, I understood, I just called out Nvidia because they are so often on the other side of a patent dispute with Intel. AMD and Intel have broad cross patent licensing deals in place because of previous fights over patents on the frontside bus, the instruction set, etc.

From a strategic markets point of view, I see it this way:

Discrete GPUs give Intel a shot at owning both pieces of high-margin silicon in a laptop/tablet design win (GPU and CPU), and potentially give Intel additional ammunition to go after Nvidia or to mitigate their encroachment.

Nvidia and Intel have gone to war previously over patents, with big settlements.

Maybe Raja's move was part of the AMD/Intel GPU deal, as could be some undisclosed fees for the use of AMD graphics tech at Intel. That could mean Raja is there to lead a close integration of the two tech bases.

This is unlikely; no company would voluntarily hand over their top talent, especially to a rival. Even with the new deal in place, Intel and AMD are very much rivals.

Normally yes, but companies also shy away from a high potential to get into IP fights about patents and trade secrets.

Intel is all in on becoming a "data company". With the recent design wins in self-driving cars and the AMD deal, I'm confident that they will come out of the AI hardware race in strong shape. This move just reaffirms that.

It isn't clear that AMD's GPU architecture has really been competing with Nvidia. We'll have to see how big a deal this is when AMD's APUs come out. I expect them to be quite a bit better than Intel's integrated product.

This seems to be more of a direct competitive attack on AMD's integrated product than competition with Nvidia. It feels to me like building discrete GPUs is almost a misdirection.

An interesting counterpoint here. I have a friend who works for Intel as an algorithms engineer for their self-driving vehicle acquisition (Mobileye). Currently, he's using two 1080 Tis with TensorFlow to do deep learning. It is possible that Intel could be looking to develop a chip specifically for this purpose (a bet on self-driving cars) and not for mass production/sale outside of that tech. Either way, all of the GPU/CPU updates in the past year are just going to create more competition, which is better for the consumer in most cases.
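
For context, a two-card TensorFlow setup like that is usually plain data parallelism. A minimal sketch of how that looked in 2017-era TensorFlow 1.x, assuming the machine exposes the cards as /gpu:0 and /gpu:1 (the device names and layer sizes are illustrative, not anything Intel or Mobileye actually runs):

    import tensorflow as tf

    def tower(x):
        # A toy "model": one dense layer; real training code would be far bigger.
        return tf.layers.dense(x, 10)

    # Two input shards, one per card.
    x0 = tf.random_normal([32, 128])
    x1 = tf.random_normal([32, 128])

    # Same weights on both GPUs via variable-scope reuse.
    with tf.variable_scope('model'):
        with tf.device('/gpu:0'):
            y0 = tower(x0)
    with tf.variable_scope('model', reuse=True):
        with tf.device('/gpu:1'):
            y1 = tower(x1)

    loss = tf.reduce_mean(y0) + tf.reduce_mean(y1)

    # Fall back to CPU if a device is missing, and log where ops actually ran.
    config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(loss))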

Well, the whole point of Mobileye acquisition was for Intel to have a competing chip for autonomous cars. But it is possible that they are also looking to compete on 1080Ti level. Which would be very hard.

Nvidia is light-years ahead in the GPU market. Besides, if this GPU push is aimed at the deep learning market, Intel will have competition from the likes of Xilinx too. IMHO they need to provide great software to go with their GPUs; traditionally, hardware manufacturers have shipped barely usable software. They should perhaps try to use OpenCL and keep the rest of the tools and libraries open source.
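
On the "perhaps try OpenCL" point: the appeal is that the same kernel runs on Intel, AMD, or Nvidia hardware through whichever vendor's OpenCL driver is installed. A minimal sketch with pyopencl (device selection and performance questions aside; this is an illustration, not Intel's tooling):

    import numpy as np
    import pyopencl as cl

    a = np.random.rand(1024).astype(np.float32)
    b = np.random.rand(1024).astype(np.float32)

    ctx = cl.create_some_context()      # picks any available OpenCL device
    queue = cl.CommandQueue(ctx)

    mf = cl.mem_flags
    a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_g = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # Vendor-neutral kernel source, compiled at runtime by the driver.
    prg = cl.Program(ctx, """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """).build()

    prg.add(queue, a.shape, None, a_g, b_g, out_g)

    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_g)
    assert np.allclose(out, a + b)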

This is what many people outside the AI world don't seem to understand. Nvidia has a stranglehold in the form of CUDA and cuDNN. There isn't any open source equivalent to cuDNN. AMD is trying to push OpenCL in this direction, but it will be a long time before DL libraries start migrating to OpenCL. Even if, by some miracle, an alternative GPU as good as the 1080 Ti popped up tomorrow, it would be useless in the AI market.

No it won't, especially if the price is competitive. Say for the price of one 1080 Ti I can buy 1.5 units of a card with comparable performance; I'll surely buy it. There are already resources being spent on OpenCL-based ML/DL platforms (https://github.com/plaidml/plaidml). The architectures keep getting bigger and training times keep getting longer. I think you underestimate this factor. I need as much GPU computing power as I can buy within my budget.
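
For what it's worth, PlaidML's pitch at the time was exactly that: a drop-in Keras backend running on OpenCL-capable GPUs instead of CUDA. A rough sketch of the usage its README describes, assuming plaidml and Keras are installed (treat the details as illustrative):

    # Swap the Keras backend before importing keras itself, then use Keras as usual.
    import plaidml.keras
    plaidml.keras.install_backend()

    import keras
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    # model.fit(x_train, y_train) would now run on whatever OpenCL device
    # PlaidML picked (AMD, Intel, or Nvidia), with no CUDA required.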

True, I would love for some alternatives such as PlaidML. However, I can't quite see PlaidML becoming a worthy alternative to, let's say, TensorFlow or PyTorch or Caffe. I hope I am proven wrong.

I think support for OpenCL will eventually come to other frameworks. But the main problem is that AMD is still far from Nvidia in terms of performance. Vega couldn't reach the performance of the 1080 Ti, and with Volta next year the gap is going to increase drastically. If only AMD could close the gap, I'm sure the support would soon follow.

It will be interesting to see how "discrete" these GPUs will be. I'm assuming they will only be "discrete" in the sense that they are not on the same chip, but rather in the same package (via EMIB).

Either way surely this is a move by Intel to take away from Nvidia's consumer share (which makes up the vast majority of their income) as Nvidia make inroads into the data center market?

The big win that discrete GPUs provide to the cloud/backend marketplace (that Intel sorta plays in via Xeon Phi) is from large banks of VERY fast memory coupled with fast-clocked vector processors. But without a bunch of HBM or something similar, the discrete GPU won't be able to do training at the scale that NVIDIA and AMD do.

One would assume that in the data center for discrete cards Intel would do something with their Nervana acquisition and HBM, or possibly (but less likely) MCDRAM.
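
The memory-bandwidth point above is easy to put rough numbers on with a back-of-the-envelope roofline check; the figures below are assumptions for illustration, not any vendor's spec sheet:

    # Rough roofline arithmetic: how many flops per byte of memory traffic a chip
    # must sustain before memory bandwidth stops being the bottleneck.
    peak_flops = 12e12     # ~12 TFLOPS fp32 (assumed)
    hbm_bw     = 900e9     # ~900 GB/s with HBM2 (assumed)
    gddr_bw    = 480e9     # ~480 GB/s with GDDR5X (assumed)

    print("flops/byte needed with HBM2: %.1f" % (peak_flops / hbm_bw))   # ~13
    print("flops/byte needed with GDDR: %.1f" % (peak_flops / gddr_bw))  # ~25

    # Large dense matmuls clear both bars, but bandwidth-bound layers
    # (embeddings, small batches) are where the HBM parts pull ahead.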

Part of me feels it would be very awesome to see onboard GPU hardware delivering performance similar to discrete chips. Of course it would change the size of the socket, or at least the part that sits in the socket, and the cooling requirements. This would have downsides in terms of consumer choice, mind you, or the fact that upgrading one chip would mean upgrading both. It definitely has merit in the server or small form factor space, though.

My guess is they are competing for the nascent consumer vr/ar market, which may not require top tier gpu performance for that much longer.

Microsoft's Mixed Reality platform has the stated goal of running on integrated graphics and even a mid tier card in a year or two should do fine for usable vr/ar.

I don't know too much about Raja Koduri, but is leaving AMD and immediately joining Intel not... really shady?

When you reach that level in a company you don't really have any options for staying in your field that don't look somewhat shady. It looks particularly bad because Intel is AMD's largest competitor, but leaving to found a company in the same space, or to join a smaller company in the same space, has similar implications.

As a C-level exec, he could probably afford to take a few months or even a year off between jobs. That would ameliorate the "shadiness" of the optics.

He did just take three months (?) off on a "sabbatical", i.e. he was "soft-fired". If they wanted him gone after Vega anyway, there is no shadiness.

How is it shady to change jobs? This notion that you should be jobless for a period of time is ridiculous.

I agree in general, especially for low level employees. It's another thing for a C-level executive.

They are companies competing directly; it is very possible an AMD engineer could bring proprietary information to Intel. Something like this literally just happened and received massive press coverage with Uber and Waymo, resulting in intense legal action.

I mean Koduri did skip the whole step of immediately founding a company with almost nothing, which would be purchased in short order for a ridiculous sum of money, and have it then come out that he had met with the CEO of Intel on several occasions prior to this. The entire deal seems much more out in the open and above board.

The fact is that people have to have job mobility, and need to be trusted that when they leave a company, they leave behind that company's secrets. Many companies make you sign a document that attests this: if you have any company data, you destroy it, if you have any company equipment, you return it, if you have any company knowledge, you forget or neglect to discuss it.

Most people, honest people, have no problem understanding these obligations and abiding by them.

Dishonest people, who lie about destroying documents, are why we have Uber and Waymo battling it out.

Right, companies can't have it both ways.

"We reserve the right to let someone go, at any time, for any or no reason." and "We also reserve the right to dictate who they can (or rather can't) work for."

No. If you want to say "I can't work in my field for 2 years", then you can pay me 2 years severance.

> if you have any company knowledge, you forget or neglect to discuss it.

Given how the human brain works, that's very much impossible to do... "standing on the shoulders of giants" and all that, as the saying goes.

I'm sure some companies would love to be able to "reformat" employee's brains when they leave, but (fortunately) that's not the reality.

> standing on the shoulders of giants

Of course. No question that you take the sum of your education and experience with you to each new job. The "company knowledge" limitations are around specific trade secret inventions or verbatim recreation of such.

Remember when project Larrabee could raytrace Quake in real time? I hope this will be another stab at a hybrid GPGPU.

Fool me once, shame on you. Fool me twice, shame on me.

Unclear what AMD thought they stood to gain from Monday's announcement, and it didn't take long for it to play out in their disfavor.

I'm guessing Intel's GPU will never support OpenPower and Arm servers, and will never ship on a CCIX-enabled adapter.

Can someone explain how I'm supposed to interpret this along with the other recent article on Intel & AMD creating a joint chip of some sort? Are they competing or cooperating?

They’re not creating ‘a joint chip of some sort’ - AMD will be selling GPUs to Intel who will package them with their CPUs via EMIB.

[1] https://www.intel.com/content/www/us/en/foundry/emib.html

Maybe in some cases it's better to think of large corporations more like states, with different elements within them competing and cooperating with other entities.

Since the days of Chips and Technologies, Intel has vacillated between going the whole hog on GPUs and retracting from them.

Wonder if this time they will stick with it for the long haul.

I am not sure everyone here knows about Raja. He is a talent at a totally different level. Big loss for AMD. AMD should have done all it could to keep him.

Intel picked the wrong guy and the wrong path. It just won't work.

I think Intel should acquire Nvidia and let Jen-Hsun Huang lead the new company.

Brings back memories of Larrabee, their last attempt at making a GPU before they scrapped the project and wasted everyone's time.

I was in the room at GDCE 2009, where they were praising the vector instructions while presenting a session on Larrabee.

Intel once tried to do this with Larrabee [0] some years back. Hopefully they learn from what went wrong there.

[0] https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)

Larrabee was an attempt to see if the x86 architecture could power a GPU. The answer was "not very likely", but it got turned into a product of its own anyways because it turned out to be very interesting for other compute-heavy use-cases. Larrabee's descendents became "Knights", which became the Xeon Phi product line.

Keep in mind Intel currently builds GPUs - just of the integrated variety. What's new here is that Intel is deciding to build discrete (standalone, like those you'd plug into a PCIe port) GPUs.

It was definitely being targeted to compete with other discrete graphics products, and at some point in the program they figured out that they would never hit the performance necessary to compete effectively. So, in order not to have completely wasted several years of development, it was repurposed as a product targeting HPC (the first-generation Knights/Xeon Phi product).

Intel really doesn't mind "wasting" time on innovation - they make tens of billions of dollars a year and they're on top of the market. They can afford to go down blind avenues, especially when the research spills out so well, as it did in this case.

It definitely wasn't a "saving throw" that Larrabee's architecture got repurposed. There were several teams at Intel working in similar directions - one team worked on a "cloud on a chip", one team worked on high bandwidth chip-to-chip interconnects, one team worked on on-chip networking... they all came together and formed the Knights Ferry research project, which then got turned into the Xeon Phi.

The "core" of Larrabee, its quick little Pentium derivatives, went on to be repurposed in the Quark product line and its lineage (e.g. the Intel PCH has a "Quark" inside). The 512-bit instruction set got parted out and became AVX-512 in its various incarnations. They definitely got their money's worth out of Larrabee.

Nobody is disagreeing with the fact that Larrabee didn't turn into a discrete GPU despite their attempts to make it so. (It's also not surprising, seeing the carriage turn back into a pumpkin as Cell and other many-core architectures failed to pan out as good graphics architectures.) But that's a separate issue from Intel building GPUs, since they have a completely different team that works on building productized, shipping GPUs.

Wow, this might save Intel. They are floundering in the server market right now because they won't put enough PCIe lanes on their platforms, since that would mean lower sales. If they can build themselves up in the GPU market, that basically counters AMD's latest maneuvers.

This again shows AMD is not ready for the battle with Intel.

Ryzen's chief architect left in 2015, and now the mastermind behind its GPUs is leaving. You need to be really religious to believe that AMD is going to do any better in the coming competition with Nvidia and Intel.

Ah, they sort of half-assed an earlier attempt with Real3D. Was that Larrabee?

Never mind; to me it seems Raja had expectations that were too high, which makes him a bad hire. I hope AMD finds someone committed, a real enthusiast, to do the job.

Nice! This will be a great win for Intel.

Interesting timing for this announcement, given Nvidia's earnings tomorrow. Looks like Intel is back to its underhanded shenanigans.

A Trojan horse 101 lesson for AMD.

What is a discrete GPU?

A discrete GPU is a GPU that's not on-die with the CPU. A discrete GPU is usually something you stick in a PCIe slot.

A GPU on the CPU die, non-discrete, is often referred to as an "integrated GPU" or "integrated graphics." They're typically not very powerful, though they run common non-gaming applications just fine.

A GPU that's a separate chip, not integrated into the same chip as a CPU.
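
A quick way to see which kind a given machine has is to ask the PCI bus; a small Linux-only sketch (assumes lspci is installed, and the integrated-graphics note in the comment is a typical layout, not a guarantee):

    import subprocess

    # Both integrated and discrete GPUs show up as PCI display controllers;
    # Intel integrated parts typically appear as "Intel ... Graphics" at 00:02.0,
    # while discrete cards sit on their own PCIe slot addresses.
    out = subprocess.check_output(['lspci']).decode()
    for line in out.splitlines():
        if 'VGA compatible controller' in line or '3D controller' in line:
            print(line)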

Nvidia and ATI should develop x86 CPUs, for balance's sake.

Larrabee reloaded.

I can't wait for the twist in tomorrow's episode.

What? Is there no non-compete clause? Strange.


Can they coexist?

What a coup this was.

Raja: "I'm...um...going on sabbatical." Lisa (CEO): "OK." Intel: "We're hiring Raja!!..." Lisa: "WTF".

At the risk of turning HN into Reddit, I'd like to politely suggest to keep jokes, puns, and other shenanigans off of Hacker News. If you have nothing constructive to add to the discussion, please refrain from commenting. Thank you :)

Why do we all have to be so stoic and serious? Can't we have a bit of fun?

Most people on HN are not looking for fun. They're here for information, intelligent discussion, and constructive criticism.

Use Reddit for fun! There is plenty of fun on the internet, like Reddit. Hackers here don't want a [Serious] tag.
