
GPUs:

http://michaelgalloy.com/2013/06/11/cpu-vs-gpu-performance.h...

http://www.anandtech.com/show/7603/mac-pro-review-late-2013/...

This is behind much of the interest in machine learning these days. Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities. It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
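
As a rough illustration of that "composition of matrix operations with non-linearities", here is a minimal NumPy sketch; the layer sizes and the choice of ReLU are made up for the example:

  import numpy as np

  def relu(x):
      # element-wise non-linearity
      return np.maximum(x, 0.0)

  # layer sizes are illustrative only
  rng = np.random.default_rng(0)
  W1 = 0.01 * rng.standard_normal((256, 784))
  W2 = 0.01 * rng.standard_normal((128, 256))
  W3 = 0.01 * rng.standard_normal((10, 128))

  def forward(x):
      # the whole network: matrix multiply -> non-linearity, repeated
      return W3 @ relu(W2 @ relu(W1 @ x))

  y = forward(rng.standard_normal(784))   # y has shape (10,)

Each of those matrix multiplies is exactly the kind of embarrassingly parallel workload a GPU is built for.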




"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."

Thanks, and I wish this sentence had been one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.

It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.


"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."

This actually describes neural networks in general, not so much "deep learning".

Deep learning comes from being able to scale neural networks up from only a few tens or hundreds of nodes per layer to thousands or tens of thousands of nodes per layer (and, of course, the combinatorial explosion of edges in the network graph between layers), coupled with the ability to process massive datasets to train with, and ultimately to run the trained model on new data.
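
To put rough numbers on that edge explosion (the layer widths below are purely illustrative):

  # fully connected layers: edges/weights = nodes_in * nodes_out
  edges_small = 100 * 100          # 10,000 weights between two small layers
  edges_large = 10_000 * 10_000    # 100,000,000 weights between two wide layers
  print(edges_small, edges_large)

A hundred million weights between a single pair of layers is manageable on a modern GPU, but it was far out of reach for commodity hardware in the 1980s and 90s.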

This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory and interconnects to the CPU (and probably disk, too), both to hold the model and to shuttle data in and out of it for training and later inference.

But neural networks have almost always been represented by matrix operations (linear algebra); it's just that neither the data nor the vast (and cheap) numbers of parallelizable processing elements were available to handle them. The closest architectures I can think of that could potentially have done it in the 1980s/90s would be Thinking Machines' Connection Machines and probably systolic array processors (which were pretty niche at the time, mainly from CMU):

https://en.wikipedia.org/wiki/Systolic_array

https://en.wikipedia.org/wiki/WARP_(systolic_array)

These latter machines started to prove some of what we take for granted today, in the form of the NAVLAB ALVINN self-driving vehicle:

http://repository.cmu.edu/cgi/viewcontent.cgi?article=2874&c...

Of course, today it can be done on a smartphone:

http://blog.davidsingleton.org/nnrccar/

The point, though, is that neural networks have long been known to be most effectively computed using matrix operations; it's just that neither the hardware (unless you had a lot of money to spend) nor the datasets were there to enable what we today call "deep learning".
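
As a concrete (if simplified) illustration of why the matrix framing matters: stacking a batch of inputs as columns of a matrix turns thousands of independent per-example computations into one large matrix-matrix multiply, which is exactly the operation GPUs (and the earlier systolic arrays) are designed to parallelize. The sizes here are again made up:

  import numpy as np

  rng = np.random.default_rng(1)
  W = rng.standard_normal((512, 784))    # one layer's weight matrix
  X = rng.standard_normal((784, 1024))   # a batch of 1024 inputs, one per column

  # a single matrix-matrix multiply processes the whole batch at once;
  # every output element can be computed independently, hence in parallel
  H = np.maximum(W @ X, 0.0)             # shape (512, 1024)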

That, and AI winters didn't help matters. I would imagine that if somebody in the late 1980s had asked for $100 million to build or buy a large parallel processing system of some form for neural network research, they would've been laughed at. Of course, nobody at that time really knew that such a large architecture was what was needed, nor how much data (plus convolutional NNs and other recent model architectures weren't around yet). Also, programming such a system would have been extremely difficult.

So today is the "perfect storm" of hardware, data, and software (and people who know how to use and abuse it, of course).


I don't think GPUs are a particularly good solution for these workloads; they aren't the future, and they won't be around for mass deployment that much longer.


It seems the author is down the 'deep learning' rabbit hole.

>> It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.

So can any matrix operation. Sadly, there aren't that many algorithms that can be efficiently represented by one.


That's quite a statement. What will replace GPUs for the ever-increasing amount of ML work being done?


TPU-like chips, though TPU-like matrix units can also be (partially) integrated into GPUs, as is the case with the latest Nvidia/AMD GPUs.


There's nothing special about the TPU. The latest GPUs are adding hardware identical to the TPU's, and the name "GPU" is a misnomer now, since those cards aren't even intended for graphics (no monitor output). GPUs will be around for a very long time, just not doing graphics.


Yep. The core idea of attacking memory latency with massive parallelization of in-flight operations, rather than with large caches, makes sense for a lot of different workloads, and that probably isn't going to change.



