CPU, GPU Put to Deep Learning Framework Test (nextplatform.com)
32 points by adamnemecek on Oct 22, 2016 | 20 comments



Am I the only one who hates the GPU form factor? I suspect most people would rather have a co-processor box they can attach to their PC and place on top of it, then link another one to that, and so on. No clumsiness with power cables inside the confined space of the PC case, and no running out of room after adding three units (GPUs are bulky enough that they occupy two PCIe slots).

So I have two questions:

- Do we really need the speed of PCIe to connect to GPUs, or would a lower-speed connection (USB 3 / FireWire) be sufficient for most computational applications such as deep learning?

- How would the performance of these deep learning frameworks scale as we add more and more units?


The speed of the interconnect is absolutely critical. It depends somewhat on the problem, but if you are moving data back and forth between the CPU and GPU, the interconnect becomes the limiting factor very quickly. So much so that for some problems you might be better off with a CPU, or with a puny GPU that sits on the same die as the CPU.
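
To make that concrete, here is a rough, purely illustrative timing sketch (assuming PyTorch and a CUDA-capable card, neither of which is specified above; the sizes are arbitrary). It compares how long a batch takes to cross PCIe against how long the GPU needs to compute on it once it is resident:

  import time
  import torch

  x = torch.randn(8192, 8192)                  # ~256 MB batch sitting in host RAM

  torch.cuda.synchronize()
  t0 = time.time()
  x_gpu = x.cuda()                             # host -> device copy over PCIe
  torch.cuda.synchronize()
  t1 = time.time()
  y = torch.tanh(x_gpu) * 2.0 + 1.0            # a cheap element-wise kernel on the GPU
  torch.cuda.synchronize()
  t2 = time.time()

  print(f"transfer: {t1 - t0:.4f}s   compute: {t2 - t1:.4f}s")

For cheap kernels like this, the PCIe copy typically dwarfs the compute; only when enough work is done per byte transferred (large matmuls, whole models kept resident on the card) does the GPU pull ahead, which is the sense in which the interconnect is the bottleneck.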

If you can do everything on the GPU then it isn't a problem, but at that point why isn't the GPU your main processor?


AMD is taking an interesting approach to this by mounting M.2 drives onto the video card [0].

http://www.anandtech.com/show/10518/amd-announces-radeon-pro...


> If you can do everything on the GPU then it isn't a problem, but at that point why isn't the GPU your main processor?

The GPU is a highly optimized block of SIMD machines with a more limited set of ALU ops and addressing modes. The CPU has a more general architecture.

It's totally reasonable to have a truck, a car and a bicycle.


Modern CPUs have multiple cores, so they are really four cars bolted together. Why not bolt a car and a truck together and let them share memory (he says, completely ruining the metaphor)?

Graphics cards are becoming full-fledged computers; a sibling poster mentioned AMD adding an interface for SSDs to their cards. So why keep the big multi-core x86 system if all it does is feed data to the symbiotic computer that is the GPU and give us a shell prompt?

I guess I just answered my own question: GPUs are symbiotes because they don't run Linux. But I would like to see one that does, whatever that would look like.


That would probably look (roughly) like Intel's Xeon Phi, which according to some started out as a GPU project. (They are also co-processors, but you can connect to one and log into a Linux instance running on it.)

I'm not sure why you'd want to get rid of the general-purpose CPU entirely. Once you've given them equal access to memory, you can keep the CPU around to run Linux, set up data, etc., and keep the GPU part simpler (and fast).


The connection is unfortunately crucial. Apart from deep learning, there aren't actually that many problems that involve enough computation to justify the cost of the transfer.

BUT: considering that the CPU is really just a very expensive traffic conductor in the way these frameworks work, an option may be to combine GPUs with fast storage (SSDs) and/or some sort of second-level RAM.

Scaling across GPUs is pretty easy: you can run batches in parallel and combine the results, and once the data has been distributed, only the learned parameters need to be transferred.
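
For illustration, a minimal data-parallel sketch (assuming PyTorch and at least two CUDA devices; none of this comes from the article): each GPU holds a full replica of the model plus its own slice of the data, and only the gradients cross the interconnect when the replicas are averaged.

  import torch
  import torch.nn as nn

  devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
  replicas = [nn.Linear(512, 10).to(d) for d in devices]

  # start every replica from the same weights (real frameworks broadcast these once)
  with torch.no_grad():
      for params in zip(*(m.parameters() for m in replicas)):
          for p in params[1:]:
              p.copy_(params[0].to(p.device))

  # each replica runs its own batch; the batches never leave their GPU
  for model, d in zip(replicas, devices):
      x = torch.randn(256, 512, device=d)
      model(x).sum().backward()

  # only the (comparatively small) gradients are moved, averaged, and copied back
  with torch.no_grad():
      for params in zip(*(m.parameters() for m in replicas)):
          avg = torch.stack([p.grad.to(devices[0]) for p in params]).mean(dim=0)
          for p in params:
              p.grad.copy_(avg.to(p.device))

An optimizer step on each replica then keeps the copies in sync. This is essentially the pattern the multi-GPU modes of these frameworks implement, typically with an all-reduce in place of the manual averaging loop.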


You can go up to 4 GPUs with very high-end motherboards in an ATX box, and after that I guess you are in "specialized hardware" territory anyway? There are also graphics-card enclosures that work over Thunderbolt/USB 3.1, and I'd bet they have enough bandwidth to max out the cards. But I wonder how crazy you can get with the number of cards.


For single-GPU setups, you could use an external GPU enclosure such as http://www.razerzone.com/store/razer-core instead of a large PC built for high-end GPUs. Or you could use a dedicated ultra-small-form-factor machine such as https://www.youtube.com/watch?v=s2W0Lsf7hec for ~$500 + GPU if you want a dedicated GPU box.

But for the people for whom it makes sense to buy hardware for deep learning instead of using a cloud service, a $4k 4x GTX machine would be better.


You can use the Razer Core (http://www.razerzone.com/store/razer-core) to connect an external GPU, which uses USB.


Just saying USB is a bit misleading: it uses Thunderbolt 3 over USB-C.


I think you have it backwards: to process a lot of data you really need DMA. The current state of these interconnects is going backwards IMHO.


Actually, since moving data to GPU memory and back is so costly, GPGPU is quite limited. For example, that's why you don't see it used for databases outside of the occasional paper or Nvidia marketing piece.

A few years ago AMD had plans to integrate the GPU and CPU with a fast local interconnect, but I don't know what happened with all that.


Do note the OS and CUDA version differences between the 980, 1080, and K80 tests. Like the last deep learning comparison posted on HN, they failed to establish a consistent baseline system. I don't know how much this affects performance, but it should be considered.

- 980: Ubuntu 14.04; CUDA 7.5

- 1080: Ubuntu 14.04; CUDA 8.0

- K80: CentOS 7.2; CUDA 7.5


The benchmark is not very good because of this. I hope no one takes this seriously. They don't even use the same BLAS libraries for all of the frameworks.


It's strange that they're using a 2012 i7-3820 instead of a modern i7. They're also missing price when comparing a ~$300 i7, 2x $700 Xeons, a $650 GTX 1080, and a $4k K80.


That's not the only inconsistency in it: different OSes, BLAS libraries, driver versions.


The person responsible for the benchmark table needs to be taken out back and shot. Goddammit, who makes a table with 286 figures and uses nothing but lines of equal width throughout the whole goddamn thing? Also, I second haldora's qualms about the lack of consistent software used to make these hardware comparisons.


It's quite interesting to see a consumer card (the GTX 1080) outperform a professional and much more expensive one (the Tesla K80) by a good margin.


The K80 uses an architecture that is two generations older (Kepler).



