
All of my GPU resources are OpenCL based. The only place I could run it is on AWS. Someone really needs to make a CUDA compiler with an OpenCL backend.



I'm working on a low level library that - in theory - will allow you to write code once and compile it for OpenCL or CUDA backends [0]. It is still pre-alpha and completely unusable but maybe you want to have a look or keep an eye on it.

I am trying to see (a) whether I can put together a portable interface (for both writing kernels and coordinating devices, contexts, memory, and queues/streams) and (b) whether the performance is portable. I can already see that performance for trivial kernels is portable from AMD to NVIDIA, but as soon as I move to the Intel Xeon Phi, things are suddenly very different.

[0] https://github.com/sschaetz/aura
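The "write the kernel once, pick the backend at runtime" idea can be caricatured in a few lines of Python (a hypothetical dispatch sketch of my own, nothing to do with Aura's actual API):

```python
# Hypothetical sketch of a single-source, multi-backend interface (illustration
# only, not Aura's API). Each kernel is registered once per backend and selected
# at launch time; real "cuda"/"opencl" entries would wrap actual kernel launches.
BACKENDS = {}

def kernel(backend):
    def register(fn):
        BACKENDS.setdefault(fn.__name__, {})[backend] = fn
        return fn
    return register

@kernel("cpu")
def saxpy(a, x, y):  # reference CPU implementation of a*x + y
    return [a * xi + yi for xi, yi in zip(x, y)]

def launch(name, backend, *args):
    return BACKENDS[name][backend](*args)

print(launch("saxpy", "cpu", 2.0, [1.0, 2.0], [3.0, 4.0]))  # [5.0, 8.0]
```

The interesting (and hard) part the library has to solve is what this sketch hides: making the same kernel source compile for both backends and keeping the performance comparable.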


Yes, I'm sorry about that, but it's not going to change. I personally don't have experience developing for OpenCL, and the deep learning community seems to have very much embraced CUDA.


I have this same issue and am forced to run all my experiments on AWS. Once you get a workflow down it is pretty easy and the AWS resources provide an acceptable level of performance. Though I suppose if you're just toying around / exploring it can be discouraging to go through all of that and incur AWS fees.


But you must have an Nvidia card around somewhere to test your code out first? It looks like the best bang for the buck is a GTX 760.


If I had a 760 I'd likely just run all my experiments right on my device. As it stands, my only machine is an Early 2009 MacBook Pro with a GeForce 9600M. The nice part about pylearn2 and Theano is that the symbolic expressions Theano compiles can run on your processor too, albeit much slower. You can always test that 1 to 2 epochs work locally before sending the job off to AWS for GPU computation.
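A toy illustration of that workflow (plain Python, not pylearn2/Theano code, and the data is made up): run one or two cheap epochs locally as a smoke test, confirm the loss actually moves, and only then launch the full GPU run on AWS.

```python
# Smoke-test sketch: fit a trivial model (y = 2x) with gradient descent for
# two epochs and check the loss decreases before paying for a full GPU run.
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x for x in xs]          # toy target: y = 2x

w, lr = 0.0, 0.1

def mse(weight):
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

losses = []
for epoch in range(2):              # one or two epochs is enough for a smoke test
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
    losses.append(mse(w))

assert losses[-1] < mse(0.0)        # sanity check: loss actually went down
```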

I'd be very willing to buy a 760, or even a 770 at their current prices. The only thing holding me back is that i'd have to buy an entire computer in which to place the card. Haha :D

If you're interested in just how fast video cards can be for deep learning compared to CPUs, take a peek at the results on [0]. Those are for older-model GPUs, and they're already an order of magnitude faster. Though as I understand it, the GeForce 5xx cards are superior for scientific computing compared to the 6xx and 7xx series, which are more gaming-oriented (the newer cards may still outperform the 5xx on raw speed, at the cost of some additional CPU time). Have a look at the appendix on [1] for more info on Fermi vs Kepler GPUs.

[0] http://deeplearning.net/tutorial/lenet.html#running-the-code

[1] http://fastml.com/running-things-on-a-gpu/


The convolutional neural network code that pylearn2 and the Toronto group use is specifically tuned for GTX 580 cards - users have reported 2x-10x slowdowns using Kepler-series cards. In general, most users (of pylearn2 at least) highly recommend a GTX 5xx device.

I personally use a GTX570, and it is pretty decent, though not spectacular. Costwise, it is reasonably priced, and "good enough" for most of the networks I have tried (minus ImageNet...)

A key problem is limited GPU memory in the Fermi series, as it is difficult to fit a truly monstrous network on a single card. Krizhevsky's ImageNet work had some very tricky things to spread it across 2 GTX580s, and the training still took a very, very long time.
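Some rough back-of-the-envelope arithmetic on why that memory limit bites (the numbers here are my own illustrative assumptions, not from the thread):

```python
# Illustrative estimate: an ImageNet-scale net of ~60M float32 parameters,
# with typical SGD needing weights + gradients + momentum resident on the card,
# against the 1.5 GB of a GTX 580.
params = 60_000_000           # assumed parameter count
bytes_per_float = 4           # float32
copies = 3                    # weights + gradients + momentum
param_gb = params * bytes_per_float * copies / 1024**3
gtx580_gb = 1.5
print(f"parameters alone: {param_gb:.2f} GB of {gtx580_gb} GB")  # ~0.67 GB
# Activations for a large minibatch come on top of this, which is what
# actually blows the budget and forces splitting the model across two cards.
```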


I didn't realize the performance dip was so dramatic. As of late I've been considering acquiring some hardware, and I guess I'm going to have to keep this in mind. Even if I wanted to buy one today, I don't think I could find stores that still sell the GTX 580. I'd probably need to search on eBay.

Thanks for the heads up friend!


Check out this discussion - it may help you decide what card to get. There was also an email somewhere about how TITAN is currently not any faster than a 580, though no hard numbers.

https://groups.google.com/forum/#!topic/pylearn-dev/cODL9RXP...

Once again, my 570 is slower than a 580 (about 2x), but "good enough" for now.


Huh. Any theories as to why that is? Highly tuned coalesced reads that backfire on the Kepler architecture?


From what I understand, it is due to the programming specifics of the training algorithms, which primarily focus on exploiting certain registers and architecture features specific to Fermi. The code actually got updated from the GTX 280 series to the GTX 580 series, IIRC, so it is likely it will be updated again at some point by a motivated researcher or group. I suspect there simply isn't a need to update right now for most labs (though TITAN / TITAN LE / TITAN II may change that). Also, Alex Krizhevsky now works for Google :), so someone else may need to do the updating.

http://www.wired.com/wiredenterprise/2013/03/google_hinton/

You can check out the code here - it is really good IMO.

http://code.google.com/p/cuda-convnet/


What types of applications are you guys typically doing with these neural nets? If there was an alternative to AWS which was cheaper / easier to set up would it be a compelling service to check out?


Possibly, though at this point after stumbling through using EC2 for these resources (and contending with the insane price fluctuations recently) I'm coming to the conclusion that owning my own hardware may be of substantial benefit for experimentation.


Have you seen GPU Ocelot? (http://code.google.com/p/gpuocelot/).





