
What Makes AI and GPUs So Compatible? - jonbaer
https://www.engineering.com/Hardware/ArticleID/15471/NVIDIAs-Artificial-Intelligence-Boom-What-Makes-AI-and-GPUs-so-Compatible.aspx
======
Eridrus
> NVIDIA’s current growth spurt is based on big bet that Huang made a few
> years ago, when he understood that he was in a unique position to help
> advance AI, machine learning and deep learning.

What kind of bullshit is this. Alex Krizhevsky used CUDA to run neural nets
much faster than you could on a CPU, blew up ImageNet, and NVIDIA found out
they were sitting on a goldmine powered by a revolution they really had
nothing to do with.

NVIDIA really had nothing to do with any of this taking off, except happening
to provide the hardware.

~~~
llukas
Sir, you contradict yourself - CUDA is software, and NVIDIA invested in CUDA a
long time before anybody believed it would take off.

~~~
arcanus
CUDA is a GPGPU paradigm. It was certainly not an investment in machine
learning.

~~~
nightski
Seeing as this
[https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn)
exists and is widely used, I'd say that isn't really true.

------
zackmorris
Corrected title:

How GPUs and AI are rooted in DSP, and how the lack of generalized DSP chips
(perhaps built from FPGAs) has set back progress in concurrent programming
immeasurably, especially for the embarrassingly parallel problems whose
solutions are making headlines today.

~~~
SomeStupidPoint
Could you elaborate?

The point you're making about untapped technical potential is interesting, but
I'm left wanting to know more about what you think the issue/solution is.

~~~
zackmorris
Just that I have deep reservations about the direction that 3D, physics
simulation and AI are going in. We're having to deal with a lot of proprietary
and niche methodologies that are going to lead us down a lot of blind alleys.
OpenCL and CUDA are simply not C and never will be.

It would be so much better if we had generalized multicore computing.
Something on the order of 256 or more 1 GHz MIPS, ARM, or even early PowerPC
processors. We need to be able to run arbitrary code with flow control and be
able to experiment with our own models for data locality, caching, etc.

Something like that running Erlang/Go/Octave/MATLAB or one of the many pure
functional languages would open up amazing opportunities if we were no longer
compute-bound. I first started caring about this in the late 90s with FPGAs,
but that possible future was supplanted by the hardware-accelerated graphics
future we're in now. Yes, it's pretty good, but it's a pale shadow of what's
possible.

Edit: here are a few examples of what I'm talking about:

[https://news.ycombinator.com/item?id=15099422](https://news.ycombinator.com/item?id=15099422)

[https://news.ycombinator.com/item?id=12308671](https://news.ycombinator.com/item?id=12308671)

[https://news.ycombinator.com/item?id=14803235](https://news.ycombinator.com/item?id=14803235)

~~~
jhj
If you have “generalized multicore computing”, you still need to generate a
code schedule for your grid / hypercube / what have you of CPUs or PEs, unless
you want to program all that stuff by hand. Thirty-plus years on, compilers
still aren’t very good at generating good code for parallel machines (CPUs or
GPUs) from abstract expressions of a mathematical problem, despite all the
loop nest transformations with tiling, reordering, vectorization, polyhedral
techniques, and so on, outside of some restricted forms of expression (like,
say, Halide) or manual annotations you give the compiler as hints along the
way. Scheduling code across multiple processors is more or less the same
problem, with or without synchronous exchanges of data between the processors.

The modern GPU is a “generalized multicore” machine anyway. Nvidia GPUs have,
say, 60 SMs with many warps per SM concurrently resident, giving you hundreds
of hardware threads (warps) available, each with 32-wide SIMD units. You have
multiple memories you can play with for data locality (the multi-MB register
file and shared memory space, the smaller caches, global memory). It’s hard
enough to use all of that well for anything but simple problems.
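
To make that data-locality point concrete, here is a minimal CUDA sketch (my
own illustration, not from the comment) of a tiled matrix multiply that stages
blocks of the inputs in shared memory; the tile size and names are arbitrary:

    #define TILE 16

    // C = A * B for N x N row-major matrices, one thread per output element.
    // Each block stages TILE x TILE tiles of A and B in shared memory so the
    // inner loop reads on-chip memory instead of global memory.
    __global__ void matmul_tiled(const float *A, const float *B, float *C,
                                 int N) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < N; t += TILE) {
            // Cooperative load of one tile of A and one tile of B.
            As[threadIdx.y][threadIdx.x] =
                (row < N && t + threadIdx.x < N)
                    ? A[row * N + t + threadIdx.x] : 0.0f;
            Bs[threadIdx.y][threadIdx.x] =
                (t + threadIdx.y < N && col < N)
                    ? B[(t + threadIdx.y) * N + col] : 0.0f;
            __syncthreads();

            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }

        if (row < N && col < N)
            C[row * N + col] = acc;
    }

Launched with dim3 block(TILE, TILE) and one block per output tile. Even this
simple schedule has to be worked out by hand, which is the point about
compilers above.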

If you want to change deeper behavior, there's always Verilog.

------
BucketSort
Linear algebra is the mathematics of neural networks, and many matrix
operations can be done in parallel. GPUs have far more cores, each with
limited capability, which is precisely what those linear computations need.
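
In practice those parallel matrix operations are usually handed to a vendor
library rather than written by hand. A minimal host-side sketch (mine, not
from the comment) of a single-precision matrix multiply through cuBLAS,
assuming the matrices are already in device memory and stored column-major as
cuBLAS expects:

    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // C = A * B for N x N matrices already resident on the GPU.
    // handle comes from an earlier cublasCreate(&handle) call.
    void gemm_example(cublasHandle_t handle, const float *d_A,
                      const float *d_B, float *d_C, int N) {
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    N, N, N,
                    &alpha, d_A, N,
                    d_B, N,
                    &beta, d_C, N);
    }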

------
kayoone
In a way I find it fascinating how much an entertainment business like gaming
can contribute to overall technological advancement. AI/ML is just one of the
latest fields that benefits immensely from game technology.

~~~
hyperpallium
Gaming is the space industry of our time.

------
Veedrac
Either there's a next-page button I'm missing or the title has very little to
do with the content.

------
thelazydogsback
GPUs are very compatible with AI that is numeric and data-parallel, such as
ML/ANNs, but not so much with symbolic AI -- and I think the demise of the
latter is greatly exaggerated.

------
dicroce
inner product.
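
Presumably meaning that the basic arithmetic of a neural-network layer is the
inner product of a weight vector with an input vector, which maps naturally
onto GPU hardware. A minimal CUDA sketch of one such dot product (my own
illustration), reduced within each warp:

    // Accumulate *out += dot(w, x); *out must be zeroed before launch.
    __global__ void dot(const float *w, const float *x, float *out, int n) {
        float partial = 0.0f;
        // Grid-stride loop: each thread accumulates a slice of the product.
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x)
            partial += w[i] * x[i];

        // Reduce the 32 lanes of each warp without touching shared memory.
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            partial += __shfl_down_sync(0xffffffff, partial, offset);

        // Lane 0 of each warp adds its warp's sum to the global result.
        if ((threadIdx.x & 31) == 0)
            atomicAdd(out, partial);
    }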

------
dogma1138
Very poor article. TL;DR: the GPU has more FLOPS, so it's better.

~~~
simcop2387
More FLOPS, optimized for the linear algebra that powers most machine
learning. The matrix transforms and vector multiplications that are so
beneficial to rasterizing mesh geometry also work really well for computing
neural networks.
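
A rough sketch of that correspondence (mine, not from the comment): the same
matrix-times-vector loop that transforms a vertex also computes a dense layer,
just with learned weights instead of a projection matrix.

    // y = W x + b, one thread per output row. With a 4x4 W this is a vertex
    // transform; with a large learned W it is a fully connected layer.
    __global__ void dense_layer(const float *W, const float *x,
                                const float *b, float *y,
                                int rows, int cols) {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r >= rows) return;
        float acc = b[r];
        for (int c = 0; c < cols; ++c)
            acc += W[r * cols + c] * x[c];
        y[r] = acc;
    }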

~~~
dogma1138
MMA units are new with Volta, but even that isn't covered by the article.
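
For anyone curious, those MMA (Tensor Core) units are exposed in CUDA through
the wmma API. A minimal sketch (mine, not from the article) in which one warp
computes a single 16x16 tile, assuming compute capability 7.0+ and 16x16x16
half-precision fragments:

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp computes a 16x16 tile of C = A * B on the Tensor Cores.
    // a: 16x16 half, row-major; b: 16x16 half, col-major; c: 16x16 float.
    __global__ void wmma_tile(const half *a, const half *b, float *c) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half,
                       wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half,
                       wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

        wmma::fill_fragment(acc_frag, 0.0f);    // start the accumulator at 0
        wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension 16
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // the MMA op
        wmma::store_matrix_sync(c, acc_frag, 16, wmma::mem_row_major);
    }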

