What Makes AI and GPUs So Compatible? (engineering.com)
65 points by jonbaer 11 months ago | 42 comments

> NVIDIA’s current growth spurt is based on big bet that Huang made a few years ago, when he understood that he was in a unique position to help advance AI, machine learning and deep learning.

What kind of bullshit is this? Alex Krizhevsky used CUDA to run neural nets much faster than you could on a CPU, blew up ImageNet, and NVIDIA found out they were sitting on a goldmine powered by a revolution they really had nothing to do with.

NVIDIA really had nothing to do with any of this taking off, except happening to provide the hardware.

Exactly! Plus, even before CUDA existed, people in numerical computing had already noticed that GPUs were ideal hardware for certain classes of problems, and were doing awkward things like encoding numerical calculations as shaders and other tricks to press the graphics APIs into computational service. I recall colleagues who worked on those pre-CUDA systems meeting some resistance from NVIDIA when they asked for low-level APIs that would make the hardware directly accessible, instead of the contortions they had to go through to express their computations as graphics primitives.

NVIDIA got lucky by having the right kind of hardware, and created CUDA not out of some insight into the AI world, but based on the needs of scientific computing. Just look at their early papers on CUDA (e.g., Luebke's 2008 paper "CUDA: Scalable Parallel Programming for High-Performance Scientific Computing"): AI was nowhere to be seen. They were focused on traditional high-performance computing problems.

HPC techniques are exactly the ones ML and deep learning use in their implementations. I don't see how DL is much different from this point of view: it relies on fast linear algebra, and HPC relies on fast linear algebra. Nvidia perfected the processor that does this job orders of magnitude faster, and provided convenient libraries and a development environment to make it accessible.
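The overlap is concrete: a dense layer's forward pass is just the GEMM/GEMV primitive HPC codes have always leaned on. A toy pure-Python sketch (illustrative sizes and made-up weights; in practice this is a single BLAS or cuBLAS call):

```python
# A dense layer's forward pass is a matrix-vector product plus a bias --
# the same GEMM/GEMV primitive HPC codes get from BLAS (cuBLAS on a GPU).
# Toy sizes; real layers just make these dimensions large.

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dense_layer(W, b, x):
    # y = relu(W x + b): one matvec, one vector add, one cheap nonlinearity.
    y = [wx + bi for wx, bi in zip(matvec(W, x), b)]
    return [max(0.0, v) for v in y]

W = [[1.0, -2.0],
     [0.5,  0.5]]
b = [0.0, -1.0]
print(dense_layer(W, b, [2.0, 1.0]))  # [0.0, 0.5]
```

The expensive part is the matvec, which is exactly the kernel GPUs were already built to run fast.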

Nobody is arguing that isn't true. The issue is the claim that NVIDIA foresaw their systems being relevant to AI/DL/ML: they simply got lucky. They opened their architecture to HPC people, concerned largely with fluid dynamics, finite element methods, n-body methods, and so on. To claim that they had any insight into the emergence of ML as a core application of their silicon is simply a laughable revision of history, as we were discussing in an ancestor of this thread.

Well, I argue that they did foresee its business viability. They invested heavily in DL before any of their competitors. Of course there were researchers everywhere working on it, but there are researchers working on everything. The difficult part is recognizing that it can be taken mainstream and betting huge resources on that. Nobody argues that Nvidia's CEO was the first man in the world to see GPU/AI applicability. But of those in positions of power in the tech business community...

But they didn't really bet anything huge on it. In 2014 they had a few researchers work on cuDNN, a few years after deep learning had already taken off in academia. The first chip they made that did anything specific for DL was the P100 last year, but no one is actually using that chip; everyone is using high-end gaming chips.

They really haven't made much forward looking investment in this area, they just got lucky since they had built CUDA beforehand for other reasons, and that was what people standardized on.

AMD's chips would work just fine for DL if everything weren't already written in CUDA, which is what forces them to invest in a cross-platform CUDA alternative in the form of ROCm/HIP.

I've been following AMD's efforts, and I think what you say obscures the practical points.

In fact, I prefer OpenCL to CUDA. But the evidence that this is not just a matter of "a few people's" effort is that AMD has had a few people on BLAS libraries since forever, and their stuff is almost unusable for DL, let alone a match for CUDA. Add to that that Nvidia provides cuDNN, which everyone uses. That it is not a matter of just a few people can also be seen from the fact that AMD still does not provide a working alternative, even a toy one. Everything in the AMD/OpenCL world relies on a few third-party open-source efforts. Some of those work OK, some are cool, some are great, some are garbage, but there is no ecosystem anywhere near Nvidia's.

They have HIP (and they have OpenCL), but those are only general-purpose compilers. They have nothing when it comes to libraries.

It's pointless to keep arguing about who did what, but I will mention that the ROCm folks have significant headcount and are porting all the major libraries to HIP: https://rocm.github.io/dl.html

In comparison, NVIDIA had all this effort done for them by the library developers.

As someone who has several top-of-the-line GPUs from three years ago that are not even properly supported by ROCm, and who invested time in OpenCL 2.0 only to be told by AMD that OpenCL 1.2 is what I'm going to get on ROCm because they don't care, I tend to take a cynical view of this.

I'll just say that there is a reason the projects you linked have only a handful of stars on GitHub, despite having significant headcount and supposed megacorp backing.

People have been burned by AMD so many times that hardly anyone cares anymore. I do, but I'm not very optimistic...

Personally, I'm waiting for these projects to get upstreamed, I have no interest in running an AMD fork, and thankfully they seem to be trying to get it upstreamed, but that will also depend on the libraries caring enough.

I think everyone in the deep learning space is interested in AMD succeeding so that we're not all locked into NVIDIA or Google's TPUs, but everyone wants someone else to take the leap first and pay that cost.

I'm certainly not going to buy any AMD GPUs before these versions have been upstreamed and a few people have kicked the tires, the lost productivity just isn't worth it.

When Nvidia was targeting HPC, the push was for better double-precision (DP) performance in GPUs.

Now that deep learning is big, the push is toward half precision instead.

Most of the "incredible power" of a Tesla V100 is 100% unusable by a physicist, for example.
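To make the precision gap concrete, here is a pure-Python sketch (using the stdlib's IEEE 754 half-precision `struct` format) of why half precision can be fine for noisy network weights but deadly for the small increments a simulation accumulates:

```python
import struct

def to_half(x):
    # Round-trip a float through IEEE 754 half precision (binary16).
    return struct.unpack('e', struct.pack('e', x))[0]

# In double precision, a 1e-4 increment to 1.0 survives:
print(1.0 + 1e-4)                             # 1.0001
# In half precision it is rounded away entirely (ulp near 1.0 is ~0.001):
print(to_half(to_half(1.0) + to_half(1e-4)))  # 1.0
```

A physics code that steps a quantity by small deltas millions of times simply cannot live with that, which is why the DP-focused and half-precision-focused product lines pull in opposite directions.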

Also I love your MCMC sampler

Thanks :)

In 2009 NVIDIA wouldn't give Geoff Hinton's group the time of day, let alone a single free GPU to try out to see if they wanted to buy more. Geoff Hinton spoke at a NIPS keynote in 2009 and told 1000 people in the audience to all go out and buy a specific NVIDIA card if they wanted to do work in this area and still NVIDIA didn't bat an eye.

Providing a really well-supported platform (CUDA) whose payoff lay far in the future did take some foresight. That it could be applied to ML was a good guess, but the fact that the DL revolution happened mostly on NV hardware wasn't just because the hardware was good. The CUDA environment investment paid off.

But that does not really negate the quote does it? Based on new research in AI, he realized that he was in a unique position and bet a lot of R&D on products that came out years later. Of course he was lucky, but he could have mostly ignored ML and have lost a huge opportunity.

Exactly that! Why didn't the CEO of Intel or the CEO of AMD do the same? It was obvious! Well, it is obvious now; back then it was far from obvious.

Pretty much everything this comment said. Nvidia had nothing to do with the first few uses of CUDA for machine learning.

It could be done on FPGAs and stuff but that wasn't easily accessible to the masses.

It was accessible because they invested in CUDA. That wasn’t a small decision.

That's what I thought as soon as I read the opening sentence. NVIDIA's founders, at best, thought of making dedicated processors for high-quality graphics. They certainly weren't aware of AI back then.

Sir, you contradict yourself: CUDA is software, and NVIDIA invested in CUDA a long time before anybody believed it would take off.

Uh, no. Lots of people invested in things like CUDA at roughly the same time. CUDA was also very clearly built to enable general fast math computation, and AI came much, much later.

As GP says, this is some interesting revisionist history.

NVidia figured this out at the same time as everyone else.

I can still run early CUDA programs unmodified. That is investment.

edit: on newest hardware, without recompilation

A lot of people thought it would take off! Hell, people were doing GPGPU on the PS2!

CUDA is a GPGPU paradigm. It was certainly not an investment in machine learning.

Seeing as this https://developer.nvidia.com/cudnn exists and is widely used I'd say that isn't really true.

But all the major deep learning frameworks use Nvidia's cuDNN library. They provide lots of sugar, but the central thing is developed (or paid for, if you prefer) by Nvidia.

Corrected title:

How GPUs and AI are rooted in DSP, and how the lack of generalized DSP chips (perhaps built from FPGAs) has set back progress in concurrent programming immeasurably, especially for the embarrassingly parallel problems whose solutions are making headlines today.

Could you elaborate?

The point you're making about untapped technical potential is interesting, but I'm left wanting to know more about what you think the issue/solution is.

Just that I have deep reservations about the direction that 3D, physics simulation, and AI are going in. We're having to deal with a lot of proprietary and niche methodologies that are going to lead us down a lot of blind alleys. OpenCL and CUDA are simply not C and never will be.

It would be so much better if we had generalized multicore computing. Something on the order of 256 or more 1 GHz MIPS, ARM, or even early PowerPC processors. We need to be able to run arbitrary code with flow control and be able to experiment with our own models for data locality, caching, etc.

Something like that running Erlang/Go/Octave/MATLAB or one of the many pure functional languages would open up amazing opportunities if we were no longer compute-bound. I first started caring about this in the late 90s with FPGAs but that possible future was supplanted by the hardware accelerated graphics future we're in now. Yes it's pretty good, but a pale shadow of what's possible.

Edit: here are a few examples of what I'm talking about:

If you have “generalized multicore computing”, you still need to generate a code schedule for your grid / hypercube / what have you of CPUs or PEs, unless you want to program all that stuff by hand. Compilers 30+ years on still aren’t super good at generating good code for parallel machines from abstract expressions of a mathematical problem for CPUs and GPUs, despite all the loop nest transformations with tiling, reordering, vectorization, polyhedral techniques, what have you, outside of some restricted forms of expression (like, say, Halide) or manual annotations you give the compiler as hints along the way. Scheduling code across multiple processors is more or less the same problem, with or without synchronous exchanges of data between the processors.

The modern GPU is a “generalized multicore” machine anyways. Nvidia GPUs have, say, 60 SMs with many warps per SM concurrently resident, giving you hundreds of hardware threads (warps) available, each with 32-wide SIMD units. You have multiple memories you can play with for data locality (the multi-MB sized register file and shared memory space, the smaller caches, global memory). It’s a hard enough problem to use all of that well for all but simple problems.

If you want to change deeper behavior, there's always Verilog.

Things like the Tilera manycore processors?

They do exist, it's just a considerable challenge to program them effectively and they're more expensive than GPUs. Whereas with GPUs the R&D is spread across the huge games market (and more recently cryptocurrency).

The following two points are highly tangential and don't squarely answer your question; I'm just mentioning them as things to file away in case they turn out to be relevant.

- https://news.ycombinator.com/item?id=13741505 shows an experimental new CPU architecture which appears to be single-core but demonstrates interesting power savings and speedups by eschewing standard on-chip caching in favor of a smarter compiler. TL;DR: closed-source compiler, but I think the people behind the design are trying to make the architecture design/registers/overview/etc. as open as practical, with free documentation. I'm not sure if there's a simulator for it. I'm very sure they're already taking on commercial customers, and getting access to it is probably not too difficult. (I realize you're not likely to need one just yet, but "can't access it at all" would make it much less interesting to file away.)

- GreenArrays makes the GA144, a 144-core microcontroller that natively runs Forth (and will NOT comfortably run transpiled <any other language here>). At one point a production company was making evaluation boards. https://www.youtube.com/watch?v=NK1zlz67MjU is a (1-hour-long, somewhat slow-paced) presentation that provides a good overview of the GA144. I half wonder whether I'm looking at a chip designed by people who don't understand SMP when I watch this, but I also think it's possible that the brutal simplicity Forth (and Chuck Moore) applies to everything means I'm really staring at the raw complexity of SMP here. I've read that Forth is one of those languages that changes the way you think about programming, so maybe the GA144 could be a training processor: crack it, and you'll understand SMP a tiny bit better on more contemporary/traditional processor architectures.

Linear algebra is the mathematics of neural networks, and many matrix operations can be done in parallel. GPUs have many more cores, each of limited capability, but that is precisely what these linear computations need.

In a way I find it fascinating how much an entertainment business like gaming can contribute to overall technology advancement. AI/ML is just one of the latest fields that benefits immensely from game technology.

Gaming is the space industry of our time.

That is an interesting thought. Would neural networks ever have had their renaissance without the constant push for better game graphics?

Either there's a next-page button I'm missing or the title has very little to do with the content.

GPUs are very compatible with AI that is numeric and data-parallel, such as ML/ANNs, but not so much with symbolic AI -- and I think the demise of the latter is greatly exaggerated.

inner product.

Very poor article. TL;DR: the GPU has more FLOPS, so it's better.

More FLOPS, optimized for the linear algebra that powers most machine learning. The matrix transforms and vector multiplications that are so beneficial for rasterizing mesh geometries work just as well for computing neural networks.
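The parallel is literal: a homogeneous-coordinate vertex transform and a network layer are the same matrix-vector multiply, just with different matrices. A pure-Python illustration with made-up numbers:

```python
def matvec(M, v):
    # One matrix-vector product; a GPU runs millions of these in parallel.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Graphics: translate a homogeneous-coordinate vertex by (1, 2, 3).
translate = [[1, 0, 0, 1],
             [0, 1, 0, 2],
             [0, 0, 1, 3],
             [0, 0, 0, 1]]
print(matvec(translate, [1, 1, 1, 1]))  # [2, 3, 4, 1]

# Neural net: a (tiny, made-up) weight matrix applied to an activation
# vector -- the exact same operation the hardware is optimized for.
weights = [[0.5, -1.0, 0.0, 0.25]]
print(matvec(weights, [1, 1, 1, 1]))  # [-0.25]
```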

MMA units are new with Volta, but even that isn't covered by the article.
