[Python] pylearn2 by the LISA lab @ U. of Montreal. Probably one of the beefiest of the bunch. It has a pretty extensive feature set, but IMO isn't the simplest to use.
[Python] MORB by Sander Dieleman. Just restricted Boltzmann machines, but very nice and intuitive to use. Also built on Theano.
[C++] CUV by the AIS lab at Bonn University in Germany. Not strictly deep learning, but it's a fast CUDA-backed library written in templated C++ with bindings to other languages, including Python. It's been used to implement deep learning models.
 [Python] DeepNet by Nitish Srivastava and co. at U of Toronto. I don't have as much experience with this one but it's built on cudamat and implements most of the main model types. Interestingly, they've taken the approach of using Google protocol buffers as a sustainable means of defining model configurations.
[MATLAB] DeepLearnToolbox by Rasmus Palm. If MATLAB is your thing, here you are. Implements most models you are likely to want.
Not all of these are equally actively developed, but there is some good stuff above, and I haven't found many instances where what I wanted wasn't (somewhat) readily available in at least one. I'm sure I'm forgetting one or two.
There's a lot of interest in convolutional networks, and the best way to support them will probably be to wrap Alex Krizhevsky's cuda-convnet, like DeepNet and PyLearn2 have, but this will require a bit more effort.
With respect to other deep learning packages, Hebel doesn't necessarily do everything differently, but depending on your needs it may be the best choice for a particular job.
PyLearn2 is big and monumental, and although I haven't used it much personally, it seems to be excellent. But as you mentioned, it's not necessarily easy to use, and if you want to extend it, you have to learn the Theano development model, which takes some time to grok.
DeepNet is quite similar to Hebel in its approach (even though it offers more models right now). However, DeepNet is based on cudamat and gnumpy, which I have often found to be quite unstable and slow. Hebel is based on PyCUDA, which is very stable and, according to some preliminary tests I did, runs about twice as fast as cudamat.
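If anyone wants to sanity-check speed claims like this on their own hardware, here's a rough micro-benchmark sketch with PyCUDA (illustrative only, not my exact test), timing an elementwise op with CUDA events:

    # Rough GPU micro-benchmark sketch using PyCUDA (illustrative only).
    import numpy as np
    import pycuda.autoinit          # creates a context on the default device
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    a = gpuarray.to_gpu(np.random.randn(4096, 4096).astype(np.float32))

    start, end = drv.Event(), drv.Event()
    start.record()
    for _ in range(100):
        b = 2.0 * a + 1.0           # elementwise kernel launched by gpuarray
    end.record()
    end.synchronize()
    print("avg ms per op:", start.time_till(end) / 100)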
So, the idea of Hebel is that it should make it easy to train the most important deep learning models without much setup or having to write much code. It is also supposed to make it easy to implement new models through a modular design that lets you subclass existing layers or models to implement variations of them.
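To show the kind of subclassing that design enables, here is a minimal NumPy sketch of the pattern; the class names are hypothetical, not Hebel's actual API:

    # Illustrative pattern only -- hypothetical classes, not Hebel's API.
    # A base layer defines the contract; variants subclass it and override
    # just the piece that changes (here, the nonlinearity).
    import numpy as np

    class HiddenLayer:
        def __init__(self, n_in, n_out, rng=np.random):
            self.W = rng.randn(n_in, n_out).astype(np.float32) * 0.01
            self.b = np.zeros(n_out, dtype=np.float32)

        def activation(self, z):
            return 1.0 / (1.0 + np.exp(-z))   # default: logistic sigmoid

        def forward(self, x):
            return self.activation(x @ self.W + self.b)

    class ReluLayer(HiddenLayer):
        def activation(self, z):              # only the nonlinearity changes
            return np.maximum(z, 0.0)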
I have found that using a trained net for preprocessing can be accomplished using very limited resources (read: Core 2 Duo laptop). This is one of the very nice features of DeCAF, which could allow for some interesting applications on embedded devices.
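The workflow is roughly this (a sketch, with random vectors standing in for the activations DeCAF would extract once, on any machine):

    # "Pretrained net as preprocessor" sketch: a lightweight classifier is
    # trained on fixed deep features, which is cheap enough for an old laptop.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X = np.random.randn(1000, 4096)      # stand-in for 4096-d net features
    y = np.random.randint(0, 10, 1000)   # stand-in labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))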
Great work by the way - I look forward to testing it out soon!
As far as embedded devices go (I assume you're talking about ARM CPUs etc.), they are probably too underpowered to run neural nets anyway, or the models would have to be written in highly specialized C.
Github is here:
My current code as an example of combining sklearn/pylearn2 with DeCAF preprocessing (under the decaf folder, sklearn usage is under previous commits):
I also have to take the opportunity to plug Caffe - Yangqing's replacement for DeCAF, which he actually open sourced just a few hours ago. All the heavy processing (e.g., forward/backprop) can be run either on your (CUDA-enabled) GPU or on the CPU, and the GPU implementation is actually a bit faster than cuda-convnet. The entire core is (IMO) very well-engineered and written in clean, lovely C++, but it also comes with Python and MATLAB wrappers. I've personally been hacking around inside the core for about a month, and it has really been a pleasure to work with.
Have you managed to reproduce this? That's awesome if it's true! I thought Theano was already very fast.
Would you be willing to maybe print out the weights for each layer? I'd be interested to see what features your conv net is capturing.
The plan is to operate on 32x32 data for now, then try scaling up the input images or just scaling to 512x512 to see how input data size/resolution affects the DeCAF/pylearn2 classification result, either positively or negatively.
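For the resizing step itself, something as simple as this PIL helper would do (the directory names are hypothetical):

    # Hypothetical helper for the resolution experiment: resize a directory
    # of images to several target sizes so the same pipeline runs on each.
    import os
    from PIL import Image

    def resize_all(src_dir, dst_dir, size):
        os.makedirs(dst_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            img = Image.open(os.path.join(src_dir, name)).convert("RGB")
            img.resize((size, size), Image.LANCZOS).save(os.path.join(dst_dir, name))

    for size in (32, 512):
        resize_all("images/raw", "images/%dx%d" % (size, size), size)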
As far as network weights, I haven't tried to print/plot the DeCAF weights yet (though there are images in the DeCAF paper itself). For pure pylearn2 networks, there is a neat utility called show_weights.py in pylearn2/scripts.
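If you'd rather roll your own than use show_weights.py, tiling the columns of a first-layer weight matrix with matplotlib is only a few lines (a sketch; the random matrix stands in for learned weights over square grayscale patches):

    # Tile the columns of W (n_pixels x n_filters) as small square images.
    import numpy as np
    import matplotlib.pyplot as plt

    def plot_filters(W, grid=(8, 8)):
        side = int(np.sqrt(W.shape[0]))
        fig, axes = plt.subplots(*grid, figsize=(8, 8))
        for i, ax in enumerate(axes.flat):
            ax.imshow(W[:, i].reshape(side, side), cmap="gray")
            ax.axis("off")
        plt.show()

    plot_filters(np.random.randn(784, 64))  # stand-in for a learned weight matrix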
Another method, which does do "chopping", is http://www.stanford.edu/~acoates/papers/coatesng_nntot2012.p... - it's a little different from what I am currently trying.
I know the original website advertised 'custom architectures', but it's not entirely clear to me (... not that it necessarily should be) what the route for Ersatz's current implementation to something like that is. Comments?
But yeah, fair points re: ersatz. We've got RNNs, autoencoders, conv nets, and deep feed forward nets w/ dropout, different types of nonlinearities, etc etc. I think these represent a pretty flexible set of architectures--but you're right, if you're looking for an RBM, you're out of luck for now. From there, it's a web interface and API that make it pretty straightforward to get started with these types of architectures. Which is still pretty damned cool, if I do say so myself...
I think of it like this:
* Use Theano if you want maximum flexibility (and maximum difficulty in getting to results)
* Use pylearn2 if you want a fair amount of flexibility and pre-built implementations of neural networks. It is, however, difficult to get started with. Otherwise it's awesome.
* Use Ersatz if you want to use neural networks without knowing how to build them--but also know that you're giving up some flexibility, and Ersatz is a bit opinionated--which, honestly, I'm not convinced is a bad thing for the type of market we're trying to target (non-ML researchers, really)
Very different offerings for different needs.
Re: custom architectures, yeah, you're right--bottom line is allocation of resources--what should our team spend time on? Because we're bootstrapped, the answer to that is whatever people are asking for (and--pretty importantly--willing to pay for). So far, lack of model types hasn't been a deal breaker for us so we've been spending time improving the API, getting it to run faster, deal better with larger and larger amounts of data, etc. etc. etc. I do have some ideas on how "custom architectures" could work, but we're focusing on polishing the current offering for now.
So yes, I agree, Ersatz is not yet living up to its full potential. But that will come, one step at a time. If theano and pylearn2 seem too complicated, try Ersatz, it's getting better every day.
It looks to me as though Ersatz's focus is on providing a limited range of relatively standard models but making them highly accessible, stable, fast, and suitable for production, whereas most available frameworks like Theano, PyLearn2, etc. are geared more toward tinkering researchers and less toward use in actual products.
I'll do my best to answer any questions in the thread. Looking forward to you trying out Hebel and your pull requests ;)
It also serializes the model and results to disk by default. This is great for loading up the model later on another, possibly less performant, machine and performing your classification / etc...
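Doing the same thing by hand is just a pickle round-trip; here's a sketch, with a toy sklearn model standing in for whatever you trained:

    # Save on the fast machine, reload later on the slow one (sketch).
    import pickle
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X, y = np.random.randn(100, 5), np.random.randint(0, 2, 100)
    model = LogisticRegression().fit(X, y)   # stand-in for any trained model

    with open("model.pkl", "wb") as f:       # on the training machine
        pickle.dump(model, f)

    with open("model.pkl", "rb") as f:       # later, on the laptop
        restored = pickle.load(f)
    print(restored.predict(X[:5]))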
Of course PyLearn2 covers this feature set and more, but isn't as easy to get started with.
I'll make an effort to use this when I can; unfortunately, my two current projects involve a convolutional neural net and an autoencoder. :(
TL;DR: You specify the network structure in YAML instead of coding the neural net by hand, working at a higher level than other libs such as Theano.
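As a toy illustration of that style (the schema below is made up, not the library's actual format), the structure lives in a YAML document and the code just interprets it:

    # Made-up YAML schema, purely to illustrate config-driven model setup.
    import yaml

    config = yaml.safe_load("""
    layers:
      - {type: hidden, units: 500, activation: relu}
      - {type: softmax, units: 10}
    learning_rate: 0.01
    """)

    for spec in config["layers"]:
        print("would build a %(type)s layer with %(units)d units" % spec)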
It is not a new idea, but it has become viable only in the last few years, thanks to both advances in computational power (huge clusters, GPUs, big data) and algorithmic breakthroughs (sampling algorithms, stochastic optimization, contrastive divergence, ...).
What I meant is that at least this has some specificity, in contrast with terms like "web scale", "big data", "machine learning", and "2.0", which are so broad that they can be attached to anything.
I would have described the library in question as a "GPU-Accelerated Neural Network library" since that is more descriptive.
I am trying to see a) if I can put together a portable interface (for both writing kernels and coordinating cards, contexts, memory, queues/streams, etc.) and b) if the performance is portable. I can already see that performance for trivial kernels is portable from AMD to NVIDIA, but as soon as I go to the Intel Phi, things are suddenly very different.
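A minimal PyOpenCL sketch of the portability idea (my own illustration): the same trivial kernel source runs unchanged on AMD, NVIDIA, or Intel devices, and only the platform/device selection differs:

    # Trivial portable kernel via PyOpenCL (illustrative sketch).
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()       # picks whatever device is available
    queue = cl.CommandQueue(ctx)

    a = np.random.randn(1024).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prg = cl.Program(ctx, """
    __kernel void scale(__global const float *a, __global float *out) {
        int gid = get_global_id(0);
        out[gid] = 2.0f * a[gid];
    }
    """).build()

    prg.scale(queue, a.shape, None, a_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)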
I'd be very willing to buy a 760, or even a 770, at their current prices. The only thing holding me back is that I'd have to buy an entire computer in which to place the card. Haha :D
If you're interested in just how fast video cards can be for deep learning compared to CPUs, take a peek at the results on . That is for older-model GPUs, and they're an order of magnitude faster. Though as I understand it, the GeForce 5xx cards are superior for scientific computing compared to the 6xx and 7xx series, which are more gaming-oriented (the newer cards may still outperform the 5xx due to raw speed, at the cost of some additional CPU time). Have a look at the appendix on  for more info on Fermi vs. Kepler GPUs.
I personally use a GTX 570, and it is pretty decent, though not spectacular. Cost-wise, it is reasonably priced and "good enough" for most of the networks I have tried (minus ImageNet...).
A key problem is the limited GPU memory in the Fermi series, as it is difficult to fit a truly monstrous network on a single card. Krizhevsky's ImageNet work required some very tricky engineering to spread the model across two GTX 580s, and the training still took a very, very long time.
Thanks for the heads up friend!
Once again, my 570 is about 2x slower than a 580, but "good enough" for now.
You can check out the code here - it is really good IMO.
Went to the GitHub page, looked for unit tests, haven't found any.
Other than that, looks promising.
There are some unit tests in Hebel, though coverage is not complete.
What problem did you solve with it?