CPU, GPU Put to Deep Learning Framework Test (nextplatform.com)
32 points by adamnemecek on Oct 22, 2016 | 20 comments



Am I the only one who hates the GPU form factor? I suspect most people would rather have a co-processor box they can attach to their PC and place on top of it, then link another one to that, and so on. No clumsiness with power cables inside the confined space of the PC case, and no running out of room after adding three units (GPUs are bulky enough that they occupy two PCIe slots).

So I have two questions:

- Do we really need the speed of PCIe to connect to GPUs, or would a lower-speed connection (USB 3 / FireWire) be sufficient for most computational applications such as deep learning?

- How would the performance of these deep learning frameworks scale as we add more and more units?


The speed of the interconnect is absolutely critical. It depends somewhat on the problem, but if you are moving data back and forth between the CPU and GPU, the interconnect becomes the limiting factor very quickly. So much so that for some problems you might be better off with a CPU, or with a puny GPU that sits on the same die as the CPU.
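
To make that concrete, here is a rough, purely illustrative timing sketch (assuming PyTorch and a CUDA-capable card, neither of which is specified above; the sizes are arbitrary). It compares how long a batch takes to cross PCIe against how long the GPU needs to compute on it once it is resident:

  import time
  import torch

  x = torch.randn(8192, 8192)                  # ~256 MB batch sitting in host RAM

  torch.cuda.synchronize()
  t0 = time.time()
  x_gpu = x.cuda()                             # host -> device copy over PCIe
  torch.cuda.synchronize()
  t1 = time.time()
  y = torch.tanh(x_gpu) * 2.0 + 1.0            # a cheap element-wise kernel on the GPU
  torch.cuda.synchronize()
  t2 = time.time()

  print(f"transfer: {t1 - t0:.4f}s   compute: {t2 - t1:.4f}s")

For cheap kernels like this, the PCIe copy typically dwarfs the compute; only when enough work is done per byte transferred (large matmuls, whole models kept resident on the card) does the GPU pull ahead, which is the sense in which the interconnect is the bottleneck.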

If you can do everything on the GPU then it isn't a problem, but at that point why isn't the GPU your main processor?


AMD is taking an interesting approach to this by mounting M.2 drives onto the video card [0].

http://www.anandtech.com/show/10518/amd-announces-radeon-pro...


> If you can do everything on the GPU then it isn't a problem, but at that point why isn't the GPU your main processor?

The GPU is a highly optimized block of SIMD machines with a more limited set of ALU ops and addressing modes. The CPU has a more general architecture.

It's totally reasonable to have a truck, a car and a bicycle.


Modern CPUs have multiple cores, so they are really four cars bolted together. Why not bolt a car and a truck together and let them share memory (he says, completely ruining the metaphor)?

Graphics cards are becoming full-fledged computers; a sibling poster mentioned AMD adding an interface for SSDs to their cards. So why keep the big multi-core x86 system if all it does is feed data to the symbiotic computer that is the GPU and give us a shell prompt?

I guess I just answered my own question: GPUs are symbiotes because they don't run Linux. But I would like to see one that does, whatever that would look like.


That would probably look (roughly) like Intel's Xeon Phi, which according to some started out as a GPU project. (They are also co-processors, but you can connect to one and log into a Linux instance running on it.)

I'm not sure why you'd want to get rid of the general-purpose CPU entirely. Once you've given them equal access to memory, you can keep the CPU around to run Linux, set up data, etc., and keep the GPU part simpler (and fast).


The connection is unfortunately crucial. Apart from deep learning, there aren't actually that many problems that involve enough computation to justify the cost of the transfer.

BUT: considering that the CPU is really just a very expensive traffic conductor in the way these frameworks work, an option may be to combine GPUs with fast storage (SSDs) and/or some sort of second-level RAM.

Scaling across GPUs is pretty easy: you can run batches in parallel and combine the results, and once the data has been distributed, only the learned parameters need to be transferred.
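
For illustration, a minimal data-parallel sketch (assuming PyTorch and at least two CUDA devices; none of this comes from the article): each GPU holds a full replica of the model plus its own slice of the data, and only the gradients cross the interconnect when the replicas are averaged.

  import torch
  import torch.nn as nn

  devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
  replicas = [nn.Linear(512, 10).to(d) for d in devices]

  # start every replica from the same weights (real frameworks broadcast these once)
  with torch.no_grad():
      for params in zip(*(m.parameters() for m in replicas)):
          for p in params[1:]:
              p.copy_(params[0].to(p.device))

  # each replica runs its own batch; the batches never leave their GPU
  for model, d in zip(replicas, devices):
      x = torch.randn(256, 512, device=d)
      model(x).sum().backward()

  # only the (comparatively small) gradients are moved, averaged, and copied back
  with torch.no_grad():
      for params in zip(*(m.parameters() for m in replicas)):
          avg = torch.stack([p.grad.to(devices[0]) for p in params]).mean(dim=0)
          for p in params:
              p.grad.copy_(avg.to(p.device))

An optimizer step on each replica then keeps the copies in sync. This is essentially the pattern the multi-GPU modes of these frameworks implement, typically with an all-reduce in place of the manual averaging loop.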


You can go up to 4 GPUs with very high-end motherboards in an ATX box, and after that I guess you are in "specialized hardware" territory anyway? There are also graphics-card enclosures that work over Thunderbolt/USB 3.1, and I'd bet they have enough bandwidth to max out the cards. But I wonder how crazy you can get with the number of cards.


For single-GPU setups, you could use an external GPU enclosure such as http://www.razerzone.com/store/razer-core instead of a large PC built for high-end GPUs. Or you could use a dedicated ultra-small-form-factor machine such as https://www.youtube.com/watch?v=s2W0Lsf7hec for ~$500 + GPU if you want a dedicated GPU box.

But for the people for whom it makes sense to buy hardware for deep learning instead of using a cloud service, a $4k 4x GTX machine would be better.


You can use the Razer Core (http://www.razerzone.com/store/razer-core) to connect an external GPU, which uses USB.


Just saying USB is a bit misleading: it uses Thunderbolt 3 over USB-C.


I think you have it backwards: to process a lot of data you really need DMA. The current state of these interconnects is going backwards IMHO.


Actually, since moving data to GPU memory and back is so costly, GPGPU is quite limited. For example, that's why you don't see it used for databases outside of the occasional paper or Nvidia marketing piece.

A few years ago AMD had plans to integrate the GPU and CPU with a fast local interconnect, but I don't know what happened with all that.


Do note the OS and CUDA version differences between the 980, 1080, and K80 tests. Like the last deep learning comparison posted on HN, they failed to establish a consistent baseline system. I don't know how much this affects performance, but it should be considered.

- 980: Ubuntu 14.04; CUDA 7.5

- 1080: Ubuntu 14.04; CUDA 8.0

- K80: CentOS 7.2; CUDA 7.5


The benchmark is not very good because of this. I hope no one takes this seriously. They don't even use the same BLAS libraries for all of the frameworks.


It's strange that they're using a 2012 i7-3820 instead of a modern i7. They're also missing price when comparing a ~$300 i7, 2x $700 Xeons, a $650 GTX 1080, and a $4k K80.


That's not the only inconsistency in it: different OSes, BLAS libraries, driver versions.


The person responsible for the benchmark table needs to be taken out back and shot. Goddammit, who makes a table with 286 figures and uses nothing but lines of equal width throughout the whole goddamn thing? Also, I second haldora's qualms about the lack of consistent software used to make these hardware comparisons.


It's quite interesting to see a consumer card (the GTX 1080) outperform a professional and much more expensive one (the Tesla K80) by a good margin.


The K80 uses an architecture that is two generations older (Kepler).



