[Python] pylearn2 by the Lisa Lab @ uMontreal. Probably one of the beefiest of the bunch. Has a pretty extensive feature set, but IMO isn't the simplest to use.
 [Python] MORB by Sander Dieleman. Just restricted Boltzmann machines, but very nice and intuitive to use. Also built on Theano.
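For context on what an RBM library like MORB is wrapping, here is a single CD-1 update for a binary RBM in plain numpy. This is just the underlying math, not MORB's actual (Theano-based) API; biases are omitted and probabilities are used throughout instead of samples, so treat it as a sketch only:

```python
import numpy as np

rng = np.random.RandomState(0)
n_vis, n_hid = 6, 4
W = rng.randn(n_vis, n_hid) * 0.1   # visible-to-hidden weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v0 = rng.binomial(1, 0.5, size=(8, n_vis)).astype(float)  # a toy data batch
h0 = sigmoid(v0 @ W)            # hidden activation probabilities
v1 = sigmoid(h0 @ W.T)          # one-step reconstruction of the visibles
h1 = sigmoid(v1 @ W)            # hidden probabilities for the reconstruction
W += 0.1 * (v0.T @ h0 - v1.T @ h1) / len(v0)  # CD-1 weight update
print(W.shape)  # (6, 4)
```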
 [C++] CUV by the AIS lab at Bonn University in Germany. Not strictly deep learning, but it's a fast CUDA-backed library written in templated C++ with bindings to other languages, including python. It's been used to implement deep learning models.
 [Python] DeepNet by Nitish Srivastava and co. at U of Toronto. I don't have as much experience with this one, but it's built on cudamat and implements most of the main model types. Interestingly, they've taken the approach of using Google protocol buffers as a sustainable means of defining model configurations.
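The appeal of that approach is that the model is declared as data, separate from the training code. DeepNet's actual protobuf schemas look different, but the idea can be sketched with a plain JSON config (all field names here are hypothetical):

```python
import json

# Hypothetical declarative model config -- not DeepNet's real schema,
# just the same idea: architecture as data, code reads it generically.
config_text = """
{
  "name": "mnist_net",
  "layers": [
    {"type": "input",   "size": 784},
    {"type": "rbm",     "size": 500, "activation": "logistic"},
    {"type": "softmax", "size": 10}
  ]
}
"""

config = json.loads(config_text)
sizes = [layer["size"] for layer in config["layers"]]
print(config["name"], sizes)  # mnist_net [784, 500, 10]
```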
 [MATLAB] DeepLearnToolbox by Rasmus Palm. If MATLAB is your thing, here you are. Implements most models you are likely to want.
Not all of these are equally actively developed, but there is some good stuff above, and I haven't found many instances where what I wanted wasn't (somewhat) readily available in at least one. I'm sure I'm forgetting one or two.
There's a lot of interest in convolutional networks, and the best way to support them will probably be to wrap Alex Krizhevsky's cuda-convnet, as DeepNet and PyLearn2 have, but this will require a bit more effort.
With respect to other deep learning packages, Hebel doesn't necessarily do everything differently, but depending on your needs it may be the best choice for a particular job.
PyLearn2 is big and monumental, and although I haven't used it much personally, it seems to be excellent. But as you mentioned, it's not necessarily easy to use, and if you want to extend it, you have to learn the Theano development model, which takes some time to grok.
DeepNet is quite similar to Hebel in its approach (even though it offers more models right now). However, DeepNet is based on cudamat and gnumpy, which I have often found to be quite unstable and slow. Hebel is based on PyCUDA, which is very stable and, according to some preliminary tests I did, runs about twice as fast as cudamat.
So, the idea of Hebel is that it should make it easy to train the most important deep learning models without much setup or having to write much code. It is also supposed to make it easy to implement new models through a modular design that lets you subclass existing layers or models to implement variations of them.
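The "subclass an existing layer to implement a variation" design can be illustrated with a minimal sketch. These class names are hypothetical, not Hebel's real API; the point is that a variant layer only needs to override the piece it changes:

```python
import numpy as np

# Hypothetical layer API, for illustration only -- Hebel's actual
# classes differ. A variant layer overrides just its activation.
class DenseLayer:
    def __init__(self, n_in, n_out, rng=None):
        rng = rng or np.random.RandomState(0)
        self.W = rng.randn(n_in, n_out) * 0.01
        self.b = np.zeros(n_out)

    def activation(self, z):
        return 1.0 / (1.0 + np.exp(-z))   # logistic by default

    def forward(self, x):
        return self.activation(x @ self.W + self.b)

class ReluLayer(DenseLayer):
    # The variation: everything else is inherited unchanged.
    def activation(self, z):
        return np.maximum(z, 0.0)

x = np.ones((2, 4))
out = ReluLayer(4, 3).forward(x)
print(out.shape)  # (2, 3)
```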
I have found that using a trained net for preprocessing can be accomplished using very limited resources (read: Core 2 Duo laptop). This is one of the very nice features of DeCAF, which could allow for some interesting applications on embedded devices.
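The reason this is cheap is that inference through a frozen net is just a few matrix multiplies, with a lightweight model trained on top of the extracted features. Here is a toy numpy sketch of the pattern (the random projection stands in for a real pretrained DeCAF net, and the nearest-centroid classifier stands in for whatever you train on the features):

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for a pretrained net: fixed weights, one CPU forward pass.
# Real DeCAF features come from a trained conv net; this only shows
# the pattern of "frozen feature extractor + cheap model on top".
W = rng.randn(100, 16) * 0.1

def extract_features(x):
    return np.maximum(x @ W, 0.0)   # frozen forward pass with ReLU

# Toy two-class dataset with shifted means.
X = np.vstack([rng.randn(50, 100) + 2.0, rng.randn(50, 100) - 2.0])
y = np.array([0] * 50 + [1] * 50)

F = extract_features(X)
centroids = np.vstack([F[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((F[:, None, :] - centroids) ** 2).sum(-1), axis=1)
print("train accuracy:", (pred == y).mean())
```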
Great work by the way - I look forward to testing it out soon!
As far as embedded devices go (I assume you're talking about ARM CPUs, etc.), they are probably too underpowered to run neural nets anyway, or models would have to be written in highly specialized C.
Github is here:
My current code as an example of combining sklearn/pylearn2 with DeCAF preprocessing (under the decaf folder, sklearn usage is under previous commits):
I also have to take the opportunity to plug Caffe, Yangqing's replacement for DeCAF, which he actually open sourced just a few hours ago. All the heavy processing (e.g., forward/backprop) can be run either on your (CUDA-enabled) GPU or on the CPU, and the GPU implementation is actually a bit faster than cuda-convnet. The entire core is (imo) very well-engineered and written in clean, lovely C++, but it also comes with Python and Matlab wrappers. I've personally been hacking around inside the core for about a month and it has really been a pleasure to work with.
Have you managed to reproduce this? That's awesome if it's true! I thought Theano was already very fast.
Would you be willing to maybe print out the weights for each layer? I'd be interested to see what features your conv net is capturing.
The plan is to operate on 32x32 data for now, then try scaling up the input images or just scaling to 512x512 to see how input data size/resolution affects the DeCAF/pylearn2 classification result, either positively or negatively.
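For the naive version of that scaling step, a nearest-neighbour upscale is enough to check the shape bookkeeping before bringing in real interpolation (e.g. via PIL or scipy). A minimal numpy sketch:

```python
import numpy as np

# Nearest-neighbour upscale of a 32x32 image to 512x512 (factor 16)
# via np.repeat -- a crude stand-in for proper interpolation, just to
# verify the input-size plumbing end to end.
img = np.arange(32 * 32, dtype=np.float32).reshape(32, 32)
factor = 512 // 32   # 16
big = img.repeat(factor, axis=0).repeat(factor, axis=1)
print(big.shape)  # (512, 512)
```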
As far as network weights, I haven't tried to print/plot the DeCAF weights yet (though there are images in the DeCAF paper itself). For pure pylearn2 networks, there is a neat utility called show_weights.py in pylearn2/scripts.
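What tools like show_weights.py boil down to is reshaping each hidden unit's weight vector into an image patch and tiling the patches into one grid for plotting. The function below is an illustrative sketch of that idea, not pylearn2's actual code:

```python
import numpy as np

# Tile first-layer weights into a mosaic for visualization: one
# normalized patch per hidden unit, laid out on a padded grid.
# Illustrative only -- not pylearn2's show_weights.py implementation.
def tile_weights(W, patch_shape, grid_shape, pad=1):
    ph, pw = patch_shape
    gh, gw = grid_shape
    mosaic = np.zeros((gh * (ph + pad) - pad, gw * (pw + pad) - pad))
    for i in range(gh):
        for j in range(gw):
            patch = W[:, i * gw + j].reshape(ph, pw)
            # Rescale each patch to [0, 1] so all units are visible.
            patch = (patch - patch.min()) / (patch.max() - patch.min() + 1e-8)
            mosaic[i * (ph + pad):i * (ph + pad) + ph,
                   j * (pw + pad):j * (pw + pad) + pw] = patch
    return mosaic

W = np.random.RandomState(0).randn(64, 16)   # 16 units, 8x8 patches each
m = tile_weights(W, (8, 8), (4, 4))
print(m.shape)  # (35, 35)
```

The resulting array can be handed straight to any image-plotting routine (e.g. matplotlib's `imshow`).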
Another method, which does do "chopping" is http://www.stanford.edu/~acoates/papers/coatesng_nntot2012.p... - which is a little different than what I am currently trying.
I know the original website advertised 'custom architectures', but it's not entirely clear to me (... not that it necessarily should be) what the route for Ersatz's current implementation to something like that is. Comments?
But yeah, fair points re: ersatz. We've got RNNs, autoencoders, conv nets, and deep feed forward nets w/ dropout, different types of nonlinearities, etc etc. I think these represent a pretty flexible set of architectures--but you're right, if you're looking for an RBM, you're out of luck for now. From there, it's a web interface and API that make it pretty straightforward to get started with these types of architectures. Which is still pretty damned cool, if I do say so myself...
I think of it like this:
* Use theano if you want maximum flexibility (and maximum difficulty in getting to results)
* Use pylearn2 if you want a fair amount of flexibility and pre-built implementations of neural networks. It is, however, difficult to get started with. Otherwise it's awesome.
* Use Ersatz if you want to use neural networks without knowing how to build them--but also know that you're giving up some flexibility, and Ersatz is a bit opinionated--which, honestly, I'm not convinced is a bad thing for the type of market we're trying to target (non-ML researchers, really)
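To make the flexibility/effort tradeoff at the top of that list concrete: a framework like Theano derives gradients from the symbolic graph for you, whereas with "maximum flexibility" you write both the forward pass and the hand-derived gradient yourself. A minimal numpy sketch of what that looks like for logistic regression on toy data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = (X[:, 0] > 0).astype(float)   # toy labels, linearly separable-ish
w = np.zeros(3)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / len(y)   # derived by hand, not by the library
    w -= 0.1 * grad                 # plain gradient descent step

print(loss(w) < loss(np.zeros(3)))  # True
```

With Theano the `grad` line would be generated automatically from the loss expression; writing it by hand is trivial here but becomes the hard part for deep architectures.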
Very different offerings for different needs.
Re: custom architectures, yeah, you're right--bottom line is allocation of resources--what should our team spend time on? Because we're bootstrapped, the answer to that is whatever people are asking for (and--pretty importantly--willing to pay for). So far, lack of model types hasn't been a deal breaker for us so we've been spending time improving the API, getting it to run faster, deal better with larger and larger amounts of data, etc. etc. etc. I do have some ideas on how "custom architectures" could work, but we're focusing on polishing the current offering for now.
So yes, I agree, Ersatz is not yet living up to its full potential. But that will come, one step at a time. If theano and pylearn2 seem too complicated, try Ersatz, it's getting better every day.
It looks to me as though Ersatz's focus is on providing a limited range of relatively standard models but making them highly accessible, stable, fast, and suitable for production, whereas most available frameworks like Theano, PyLearn2, etc. are geared more toward tinkering researchers and less toward use in actual products.