GPU-Accelerated Deep Learning Library in Python (github.com)
148 points by rbanffy on Dec 5, 2013 | hide | past | favorite | 50 comments

Hm. And here I was thinking I'd seen most of the accelerated deep learning implementations. Although this one seems a touch incomplete as far as features go. There are some others, for those not aware:

[1] [Python] pylearn2 by the Lisa Lab @ uMontreal. Probably one of the beefiest of the bunch. Has a pretty extensive feature set, but IMO isn't the simplest to use.

[2] [Python] MORB by Sander Dieleman. Just restricted Boltzmann machines, but very nice and intuitive to use. Also built on Theano.

[3] [C++] CUV by the AIS lab at Bonn University in Germany. Not strictly deep learning, but it's a fast CUDA-backed library written in templated C++ with bindings to other languages, including python. It's been used to implement deep learning models.

[4] [Python] DeepNet by Nitish Srivastava and co. at U of Toronto. I don't have as much experience with this one but it's built on cudamat and implements most of the main model types. Interestingly, they've taken the approach of using Google protocol buffers as a sustainable means of defining model configurations.

[5] [MATLAB] DeepLearnToolbox by Rasmus Palm. If matlab is your thing, here you are. Implements most models you are likely to want.

Not all of these are equally actively developed, but there is some good stuff above, and I haven't found many instances where what I wanted wasn't (somewhat) readily available in at least one. I'm sure I'm forgetting one or two.

[1] https://github.com/lisa-lab/pylearn2

[2] https://github.com/benanne/morb

[3] https://github.com/deeplearningais/CUV

[4] https://github.com/nitishsrivastava/deepnet/tree/master/deep...

[5] https://github.com/rasmusbergpalm/DeepLearnToolbox

Yes, Hebel doesn't have a ton of features and a kitchen sink of different models yet, but I hope that's going to change. There are lots of things that are quite easy to implement in the current framework, such as:

- Neural net regression

- Autoencoders

- Restricted Boltzmann machines

There's a lot of interest in convolutional networks, and the best way to implement them will be to wrap Alex Krizhevsky's cuda-convnet, like DeepNet and PyLearn2 have, but this will require a bit more effort.

With respect to other deep learning packages, Hebel doesn't necessarily do everything differently, but depending on your needs it may be the best choice for a particular job.

PyLearn2 is big and monumental, and although I haven't used it much personally, it seems to be excellent. But as you mentioned, it's not necessarily easy to use, and if you want to extend it, you have to learn the Theano development model, which takes some time to grok.

DeepNet is quite similar to Hebel in its approach (even though it offers more models right now). However, DeepNet is based on cudamat and gnumpy, which I have found to often be quite unstable and slow. Hebel is based on PyCUDA which is very stable and according to some preliminary tests I did runs about twice as fast as cudamat.

So, the idea of Hebel is that it should make it easy to train the most important deep learning models without much setup or having to write much code. It is also supposed to make it easy to implement new models through a modular design that lets you subclass existing layers or models to implement variations of them.
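To make the subclassing idea concrete, here is a hypothetical sketch of the kind of variation a modular layer design enables. The class and method names are made up for illustration and are not Hebel's actual API; the point is that a new activation only requires overriding one method:

```python
# Hypothetical illustration of a modular layer design (names are
# illustrative, not Hebel's actual API). A variant layer is created
# by subclassing and overriding a single method.
import numpy as np

class HiddenLayer:
    def __init__(self, n_in, n_out, rng=np.random):
        self.W = (rng.randn(n_in, n_out) * 0.01).astype(np.float32)
        self.b = np.zeros(n_out, dtype=np.float32)

    def activation(self, z):
        # Logistic units by default
        return 1.0 / (1.0 + np.exp(-z))

    def feed_forward(self, x):
        return self.activation(np.dot(x, self.W) + self.b)

class ReluLayer(HiddenLayer):
    """A rectified-linear variant: only the nonlinearity changes."""
    def activation(self, z):
        return np.maximum(z, 0.0)

x = np.ones((4, 8), dtype=np.float32)
layer = ReluLayer(8, 16)
out = layer.feed_forward(x)
print(out.shape)  # (4, 16)
```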

Question: do you/will you plan to support converting GPU nets to CPU, perhaps by keeping weights and architecture definition separate from PyCUDA dependent structures during serialization?

I have found that using a trained net for preprocessing can be accomplished using very limited resources (read: Core 2 Duo laptop). This is one of the very nice features of DeCAF, which could allow for some interesting applications on embedded devices.
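The CPU-inference workflow being asked about can be sketched with nothing but NumPy: if the serialized model stores plain arrays (PyCUDA's `gpuarray.get()` returns an ordinary `numpy.ndarray`), the forward pass needs no CUDA at all. This is a minimal sketch under that assumption, not Hebel's actual serialization format:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward_cpu(x, weights, biases):
    """Forward pass through fully-connected layers using only NumPy."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(np.dot(a, W) + b)
    # Linear output layer; softmax etc. can be applied by the caller.
    return np.dot(a, weights[-1]) + biases[-1]

# Stand-in for weights pulled off the GPU before pickling:
rng = np.random.RandomState(0)
weights = [rng.randn(784, 100).astype(np.float32),
           rng.randn(100, 10).astype(np.float32)]
biases = [np.zeros(100, dtype=np.float32),
          np.zeros(10, dtype=np.float32)]

x = rng.randn(5, 784).astype(np.float32)  # a batch of 5 inputs
out = forward_cpu(x, weights, biases)
print(out.shape)  # (5, 10)
```

This is exactly the kind of preprocessing that runs fine on a Core 2 Duo, since inference is just a handful of matrix multiplies.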

Great work by the way - I look forward to testing it out soon!

That would be possible, but since Hebel is mainly meant to be used in research I don't think it's a big priority now. The most important reason to do this would be to allow development on laptops and workstations without NVIDIA cards and to run the finished model on CUDA hardware later.

As far as embedded devices go (I assume you're talking about ARM CPUs etc.), they are probably too underpowered to run neural nets anyway, or models would have to be written in highly specialized C.

Yup, I didn't mean to belittle Hebel. I actually just meant that the lack of features is likely why I hadn't heard of it. From the looks of things it's on a nice path. Your philosophy about what Hebel should be sounds similar to what's been done with MORB for RBMs, which is one of the reasons I've always liked that library. Although MORB still incurs the 'working with Theano' conceptual overhead.

The reason you hadn't heard of it is probably that I only put the code on Github less than two weeks ago ;) - I've really been blown away by the response though.

There is also DeCAF, which actually includes a way to load a pretrained ImageNet network based on cuda-convnet. I have had pretty recent success using this blob as preprocessing for image classification, à la http://arxiv.org/abs/1310.1531.

Github is here:


My current code as an example of combining sklearn/pylearn2 with DeCAF preprocessing (under the decaf folder, sklearn usage is under previous commits):


Thanks for the DeCAF plug! Here's a demo of the classifier with the pre-trained ImageNet weights in action: http://decaf.berkeleyvision.org/

I also have to take the opportunity to plug Caffe [1] - Yangqing's replacement for DeCAF which he actually open sourced just a few hours ago. All the heavy processing (e.g., forward/backprop) can be run either on your (CUDA-enabled) GPU or on the CPU, and the GPU implementation is actually a bit faster than cuda-convnet. The entire core is (imo) very well-engineered and written in clean lovely C++, but it also comes with Python and Matlab wrappers. I've personally been hacking around inside the core for about a month and it has really been a pleasure to work with.

[1] http://daggerfs.com/caffe/

"Decaf (CPU) is 2 times faster than Theano with GPU"

Have you managed to reproduce this? That's awesome if it's true! I thought Theano was already very fast.

Quick question about your code for the conv net: why do you resize the images down to 32x32? I thought one of the big features of conv nets was that the input does not have to be the same size; it just slides a window around the image. Am I completely wrong about this?

Would you be willing to maybe print out the weights for each layer? I'd be interested to see what features your conv net is capturing.

I was (and still am) trying to use an already trained CIFAR10 net in a similar manner to DeCAF/ImageNet. Because CIFAR10 operates on 32x32 color images, I did the same thing for the input of the DeCAF experiment. As far as I know, the inputs to the network need to be identical between train/test sets; though they can be 0-padded or color-filled to make the dimensions match, that may affect results - I haven't tried anything but scaling personally. I am pretty sure there are 2 sets of scaling happening for my DeCAF experiment: down to 32x32 with convert, then UP to 512x512, then the center 256x256 is pulled out. I think this may affect my results a little :)

The plan is to operate on 32x32 data for now, then try scaling up the input images or just scaling to 512x512 to see how input data size/resolution affects the DeCAF/pylearn2 classification result, either positively or negatively.
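The resizing pipeline described above (32x32, up to 512x512, then a central 256x256 crop) can be sketched in plain NumPy; this uses nearest-neighbour upscaling for illustration, which may differ from the interpolation the actual tools use:

```python
import numpy as np

def upscale_nn(img, factor):
    """Nearest-neighbour upscaling by an integer factor (no PIL needed)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def center_crop(img, size):
    """Pull out the central size x size region."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

# A toy 32x32 RGB image standing in for a CIFAR10-sized input
img32 = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)
img512 = upscale_nn(img32, 16)      # 32x32 -> 512x512
patch = center_crop(img512, 256)    # central 256x256
print(img512.shape, patch.shape)
```

Each 512x512 pixel here is just a 16x16 block of one original pixel, which makes concrete why the double rescaling could cost some fidelity.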

As far as network weights, I haven't tried to print/plot the DeCAF weights yet (though there are images in the DeCAF paper itself). For pure pylearn2 networks, there is a neat utility called show_weights.py in pylearn2/scripts.

Another method, which does do "chopping" is http://www.stanford.edu/~acoates/papers/coatesng_nntot2012.p... - which is a little different than what I am currently trying.

There's also Ersatz (http://www.ersatz1.com) -- deep learning as a service. It provides a web interface that allows you to train different neural net architectures and then run them via an API; the networks are GPU-backed, and it's been in beta since January. I'm the founder, and I'm at NIPS now if anyone wants to ask me about it in person.

I was wondering when you were going to pop up here. Ersatz has a lot of potential, but as it stands now it's more or less a web UI for ensembling the pre-implemented models (correct me if I'm wrong). So if you want something other than an autoencoder, a convnet or an RNN, you're sort of out of luck, no?

I know the original website advertised 'custom architectures', but it's not entirely clear to me (... not that it necessarily should be) what the route for Ersatz's current implementation to something like that is. Comments?

haha, I guess I do have a way of popping up anytime somebody's talking about deep learning on the internet...

But yeah, fair points re: ersatz. We've got RNNs, autoencoders, conv nets, and deep feed forward nets w/ dropout, different types of nonlinearities, etc etc. I think these represent a pretty flexible set of architectures--but you're right, if you're looking for an RBM, you're out of luck for now. From there, it's a web interface and API that make it pretty straightforward to get started with these types of architectures. Which is still pretty damned cool, if I do say so myself...

I think of it like this:

* Use theano if you want maximum flexibility (and maximum difficulty in getting to results)

* Use pylearn2 if you want a fair amount of flexibility and pre-built implementations of neural networks. It is, however, difficult to get started with. Otherwise it's awesome.

* Use Ersatz if you want to use neural networks without knowing how to build them--but also know that you're giving up some flexibility and Ersatz is a bit opinionated--which, honestly, I'm not convinced is a bad thing for the type of market we're trying to target (non-ML researchers, really)

Very different offerings for different needs.

Re: custom architectures, yeah, you're right--bottom line is allocation of resources--what should our team spend time on? Because we're bootstrapped, the answer to that is whatever people are asking for (and--pretty importantly--willing to pay for). So far, lack of model types hasn't been a deal breaker for us so we've been spending time improving the API, getting it to run faster, deal better with larger and larger amounts of data, etc. etc. etc. I do have some ideas on how "custom architectures" could work, but we're focusing on polishing the current offering for now.

So yes, I agree, Ersatz is not yet living up to its full potential. But that will come, one step at a time. If theano and pylearn2 seem too complicated, try Ersatz, it's getting better every day.

This is awesome and I'm sure it will be a boon to app developers who want to include Machine Learning capabilities in their apps.

It looks to me as though Ersatz's focus is on providing a limited range of relatively standard models while making them highly accessible, stable, fast, and suitable for production, whereas most available frameworks like Theano, PyLearn2, etc. are geared more toward tinkering researchers and less toward use in actual products.

What about Cuda Conv Net?


Hello Hacker News, this is the Hebel developer. Thanks for the attention and the Github stars!

I'll do my best to answer any questions in the thread. Looking forward to you trying out Hebel and your pull requests ;)

This is very interesting, especially for those users more interested in running deep neural nets than in programming them. I was initially disinterested because it is missing several important features (at the moment), such as autoencoders and convolutional neural nets. But a quick peek inside the examples folder revealed that you specify what you want computed using YAML instead of specifying how to compute it in code. This means that, as long as you're using something Hebel does implement, you can quickly experiment with structure and parameters without worrying about programming errors. Very useful for a researcher more interested in playing with the structure of a deep neural net.
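The appeal of the declarative style is that the whole experiment lives in one data structure. Here's a sketch of the idea with made-up keys (this is not Hebel's actual YAML schema, just an illustration of describing a net as data rather than code):

```python
# A hypothetical model description of the kind a YAML file would hold
# (keys are illustrative, not Hebel's actual schema). Varying layer
# sizes or hyperparameters is an edit here, not a code change.
config = {
    "model": {
        "type": "neural_net",
        "layers": [
            {"units": 1000, "activation": "relu", "dropout": 0.5},
            {"units": 1000, "activation": "relu", "dropout": 0.5},
        ],
        "output": "softmax",
    },
    "optimizer": {"learning_rate": 0.01, "momentum": 0.9, "epochs": 100},
}

def n_parameters(cfg, n_in=784, n_out=10):
    """Rough parameter count implied by the config (weights + biases)."""
    sizes = [n_in] + [l["units"] for l in cfg["model"]["layers"]] + [n_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

print(n_parameters(config))  # 1796010
```

A side benefit: a tool can sanity-check a config (parameter counts, layer compatibility) before any GPU time is spent.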

It also serializes the model and results to disk by default. This is great for loading up the model later on another, possibly less performant, machine and performing your classification / etc...

Of course PyLearn2 covers this feature set and more, but isn't as easy to get started with.

I'll make an effort to use this when I can; unfortunately my two current projects involve a convolutional neural net and an autoencoder. :(

TL;DR: Specifying structure using YAML instead of coding the neural net; working at a higher level than other libs such as Theano.

Serious question to those with experience in the area: is the term "Deep learning" more than a buzzword?

It is not a buzzword, as it describes a very specific concept: learning based on a neural network with multiple (say >= 3) hidden layers.

It is not a new idea, but it has become viable only in the last few years, thanks to both advances in computational power (huge clusters, GPUs, big data) and algorithmic breakthroughs (sampling algorithms, stochastic optimization, contrastive divergence, ...).
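The "multiple hidden layers" part of the definition is easy to make concrete: structurally, a deep net is just a stack of learned nonlinear transformations. A toy forward pass (weights random, training omitted entirely):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Deep" in the sense above: an MLP with three hidden layers.
# Sizes are arbitrary toy values; no training (backprop etc.) is shown.
rng = np.random.RandomState(42)
layer_sizes = [20, 50, 50, 50, 2]   # input, 3 hidden layers, output
params = [(rng.randn(m, n) * 0.1, np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.randn(8, 20)                # a batch of 8 examples
a = x
for W, b in params:                 # each layer: affine map + nonlinearity
    a = sigmoid(np.dot(a, W) + b)
print(a.shape)  # (8, 2): one 2-dimensional output per example
```

The hard part that kept this impractical for years was not this forward pass but training the stacked layers effectively, which is where the algorithmic advances above come in.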

Of course, it's also a buzzword.

Yes. Often in research, when there is a hot topic like this, everybody tries to jump on the bandwagon, and now the term is so widely spread that attaching "deep learning" to anything makes it sound cool.

What I meant is that at least this has some specificity, in contrast with terms like "web scale", "big data", "machine learning", "2.0", that are so broad they can be attached to anything.

What's wrong with "machine learning"? I think it's no less specific than "artificial intelligence" or "computer science".

It has content and communicates an approach to machine learning distinct from other approaches. It isn't like "big data" which is truly meaningless. However, deep learning is also not a single method or algorithm.

I would have described the library in question as a "GPU-Accelerated Neural Network library" since that is more descriptive.

All of my GPU resources are OpenCL based. The only place I could run it is on AWS. Someone really needs to make a CUDA compiler with an OpenCL backend.

I'm working on a low level library that - in theory - will allow you to write code once and compile it for OpenCL or CUDA backends [0]. It is still pre-alpha and completely unusable but maybe you want to have a look or keep an eye on it.

I am trying to see a) if I can put together a portable interface (for both writing kernels and coordinating cards, contexts, memory, queues/streams, etc.) and b) if the performance is portable. I can already see that performance for trivial kernels is portable from AMD to NVIDIA, but as soon as I go to the Intel Xeon Phi things are suddenly very different.

[0] https://github.com/sschaetz/aura

Yes, I'm sorry about that, but it's not going to change. I personally don't have experience developing for OpenCL and the deep learning community seems to very much have embraced CUDA.

I have this same issue and am forced to run all my experiments on AWS. Once you get a workflow down it is pretty easy and the AWS resources provide an acceptable level of performance. Though I suppose if you're just toying around / exploring it can be discouraging to go through all of that and incur AWS fees.

But you must have an Nvidia card around somewhere to test your code out first? It looks like the best bang for the buck is a GTX 760.

If I had a 760 I'd likely just run all my experiments right on my device. As it stands my only machine is an Early 2009 MacBook Pro with a GeForce 9600M. The nice part about pylearn2 and Theano is that the symbolic expressions compiled by Theano can run on your processor too, albeit much slower. You can always test that 1 to 2 epochs work locally before sending it off to AWS for GPU computation.

I'd be very willing to buy a 760, or even a 770, at their current prices. The only thing holding me back is that I'd have to buy an entire computer in which to place the card. Haha :D

If you're interested in just how fast video cards can be for deep learning compared to CPUs, take a peek at the results at [0]. Those are for older-model GPUs, and they're already an order of magnitude faster. Though as I understand it, the GeForce 5xx cards are superior for scientific computing compared to the 6xx and 7xx series, which are more gaming-oriented (the newer cards may still outperform the 5xx due to raw speed, at the cost of some additional CPU time). Have a look at the appendix of [1] for more info on Fermi vs Kepler GPUs.

[0] http://deeplearning.net/tutorial/lenet.html#running-the-code

[1] http://fastml.com/running-things-on-a-gpu/

The convolutional neural network code that pylearn2 and the Toronto group use is specifically tuned for GTX580 cards - users have reported factors of 2x-10x slowdown using Kepler series cards. In general, most users (of pylearn2 at least) highly recommend a GTX5xx device.

I personally use a GTX570, and it is pretty decent, though not spectacular. Costwise, it is reasonably priced, and "good enough" for most of the networks I have tried (minus ImageNet...)

A key problem is limited GPU memory in the Fermi series, as it is difficult to fit a truly monstrous network on a single card. Krizhevsky's ImageNet work had some very tricky things to spread it across 2 GTX580s, and the training still took a very, very long time.

I didn't realize the performance dip was so dramatic. As of late I've been considering acquiring some hardware, and I guess I'm going to have to keep this in mind. If one were to buy a card today, I don't even think you can find stores that sell the GTX 580; you'd probably need to search on eBay.

Thanks for the heads up friend!

Check out this discussion - it may help you decide what card to get. There was also an email somewhere about how TITAN is currently not any faster than a 580, though no hard numbers.


Once again, my 570 is slower than a 580 (about 2x), but "good enough" for now.

Huh. Any theories as to why that is? Highly tuned coalesced reads that backfire on the Kepler arch?

From what I understand, it is due to the programming specifics of the training algorithms, primarily being focused on exploiting certain registers and architecture features specific to Fermi. The code actually got updated from GTX280 series to GTX580 series IIRC, so it is likely that it will be updated again at some point by a motivated researcher or group. I suspect there simply isn't a need to update right now for most labs (though I suspect TITAN / TITAN LE / TITAN II may change that). Also, Alex Krizhevsky now works for Google :), so someone else may need to do the updating.


You can check out the code here - it is really good IMO.


What types of applications are you guys typically doing with these neural nets? If there was an alternative to AWS which was cheaper / easier to set up would it be a compelling service to check out?

Possibly, though at this point after stumbling through using EC2 for these resources (and contending with the insane price fluctuations recently) I'm coming to the conclusion that owning my own hardware may be of substantial benefit for experimentation.

Have you seen GPU Ocelot? (http://code.google.com/p/gpuocelot/).

Looks good! What performance gain could be expected when using GPU vs CPU for Deep Learning? It'd be great to get some figures.

Depends, of course, on the architecture, but you could be looking at a ~50x speedup.

Yes, you can't come up with a hard number, but a 10x speedup is really the baseline. 50x and more are not out of the question.

Theano is for compiling mathematical expressions to CUDA; it doesn't implement any deep learning itself. If anything, this may be comparable to pylearn2.

Yes, this is true. Theano is just a GPU-computing framework that you can use to implement any kind of numerical model. Hebel implements some specific models, but it uses PyCUDA as a backend.

Went to the docs page, looked for benchmarks, haven't found any.

Went to the github page, looked for unit tests, haven't found any.

Other than that, looks promising.

What kind of benchmarks were you looking for? Benchmarks with respect to speed, or MNIST results? For MNIST, using the DNN example, Hebel gets results in line with the reported results on the MNIST homepage [1] for similar models (pure neural nets with cross-entropy error and no distortions, etc.), which is about 1.3% error.

There are some unit tests in Hebel though [2], even though coverage is not complete.

[1] http://yann.lecun.com/exdb/mnist/

[2] https://github.com/hannes-brt/hebel/blob/master/hebel_test.p...

Awesome! I'll be testing this one out when I get some free time.

In which WebApp project did you use deep learning?

What problem did you solve with it?

Needs cool demos of practical applications.

If you execute the code shown in the README it should run an experiment against the MNIST dataset[0]. If it performs as well as other deep learning examples[1][2] you should see a test error rate of less than 1% on classification of handwritten digits. This is pretty impressive, especially considering the fact that the collection contains a few digits that I cannot even recognize myself.

[0] http://yann.lecun.com/exdb/mnist/

[1] http://deeplearning.net/tutorial/lenet.html#lenet

[2] http://neuralnetworksanddeeplearning.com/chap1.html
