I am trying to see a) whether I can put together a portable interface (for both writing kernels and coordinating cards, contexts, memory, queues/streams, etc.) and b) whether the performance is portable. I can already see that performance for trivial kernels is portable from AMD to NVIDIA, but as soon as I go to the Intel Xeon Phi things are suddenly very different.
I'd be very willing to buy a 760, or even a 770, at their current prices. The only thing holding me back is that I'd have to buy an entire computer in which to place the card. Haha :D
If you're interested in just how fast video cards can be for deep learning compared to CPUs, take a peek at the results on . That is for older-model GPUs, and even those are an order of magnitude faster than CPUs. Though as I understand it, the GeForce 5xx cards are superior for scientific computing compared to the 6xx and 7xx series, which are more gaming oriented. (The newer cards may still outperform the 5xx due to raw speed, at the cost of some additional CPU time.) Have a look at the appendix on  for more info on Fermi vs Kepler GPUs.
I personally use a GTX 570, and it is pretty decent, though not spectacular. Cost-wise, it is reasonably priced and "good enough" for most of the networks I have tried (minus ImageNet...).
A key problem is the limited GPU memory in the Fermi series, which makes it difficult to fit a truly monstrous network on a single card. Krizhevsky's ImageNet work needed some very tricky techniques to spread the network across 2 GTX 580s, and the training still took a very, very long time.
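To put a rough number on the memory issue, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the 60 million parameter count is only roughly the size reported for Krizhevsky's network, the 1280 MiB card size is a placeholder, and I assume float32 values with three copies per parameter (weights, gradients, momentum). It ignores activations and minibatch data, which usually take a large share of memory as well, so treat the result as a lower bound.

```python
def param_memory_mib(n_params, bytes_per_value=4, copies=3):
    """Rough memory needed for the parameters alone, in MiB.

    Assumptions (not from any particular library):
      bytes_per_value=4 -> float32 storage
      copies=3          -> weights + gradients + momentum buffer
    Activations and input batches are NOT counted.
    """
    return n_params * bytes_per_value * copies / 2**20


def fits_on_card(n_params, card_mib, **kwargs):
    """True if the parameter-only estimate fits in card_mib of GPU memory."""
    return param_memory_mib(n_params, **kwargs) <= card_mib


if __name__ == "__main__":
    n_params = 60_000_000   # assumed: roughly an ImageNet-scale convnet
    card_mib = 1280         # assumed: placeholder card size for illustration
    needed = param_memory_mib(n_params)
    print(f"parameters alone need about {needed:.0f} MiB")
    print("fits (before activations):", fits_on_card(n_params, card_mib))
```

Even when the parameter-only estimate fits, the activations for a reasonable batch size can push the total past what a single card holds, which is where splitting across cards comes in.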
Thanks for the heads-up, friend!
Once again, my 570 is slower than a 580 (by about 2x), but "good enough" for now.
You can check out the code here - it is really good IMO.