First, this is a lot of code! As a C++ machine learning programmer, I am impressed, as I know the pain (someone explains why in this comment: https://news.ycombinator.com/item?id=5613797 ).
Second, it contains versions of BLAS and uBLAS as well as L-BFGS and more, much more, apparently coded from scratch. That seems like a lot for an ML library, and a lot to maintain.
This makes me skeptical about the performance and maintainability of the code, but it would be fairer to try it first.
Still very impressed.
First of all, we are glad that our library is being discussed on this board! We are happy about any feedback we can get!
Regarding performance: we try to make the key algorithms as fast as possible, and for the hardest parts we rely not on uBLAS but on optional bindings to ATLAS. Speed was one of the key design criteria, and we hope we achieved it. Of course this is no guarantee that every algorithm is fast, but in that case: just file a ticket!
Please bear in mind that Shark is still in beta, and we are developing it heavily (right now I am working on the family of multi-class SVMs). So, for example, parallelism using OpenMP is not yet fully integrated.
Right now the linear algebra library we use, uBLAS, has the same behaviour as Eigen for BLAS1-type expressions: it tries to generate optimal (non-SSE) code. Only for BLAS2 and BLAS3 do we fall back to the ATLAS routines, which have the same performance as Eigen at the interesting problem sizes.
In the end it is not that interesting whether the BLAS1-type expressions are fast, as they make up less than 1% of the run time. The big chunks are the data processing inside the matrix-matrix multiplications of the neural networks and similar models.
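To see why the matrix-matrix products dominate, here is a minimal numpy sketch of one feed-forward layer (generic illustration, not Shark code; all the sizes are made up): the GEMM costs about 2*B*n_in*n_out flops, while the elementwise activation only touches B*n_out values, so for wide layers the BLAS1-style work is negligible.

```python
import numpy as np

rng = np.random.default_rng(0)
B, n_in, n_out = 256, 1000, 500        # batch size and layer widths (arbitrary)

X = rng.standard_normal((B, n_in))     # input batch
W = rng.standard_normal((n_in, n_out)) # layer weights
b = np.zeros(n_out)

Z = X @ W + b                          # BLAS3 (GEMM): ~2*B*n_in*n_out flops
A = np.tanh(Z)                         # BLAS1-like elementwise pass: B*n_out ops

gemm_flops = 2 * B * n_in * n_out
act_ops = B * n_out
ratio = gemm_flops // act_ops          # GEMM does ~2*n_in times more work
print(ratio)                           # -> 2000 for these sizes
```

With these (arbitrary) sizes the multiplication does 2000 times more arithmetic than the activation, which is why only the BLAS2/3 paths need tuned ATLAS-style routines.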
Another thing about code generation: in a project I'm working on, I am also using a hacked version of Eigen that can compute tanh and the derivative of tanh, so the NN activations go quite a bit faster, since you can generate vectorized code for the whole calculation that visits each memory location exactly once. While it's true that computing the weight updates is where most of the time is spent, I saw a 3-4x speedup in the activation code by doing it as a single operation, due to better memory access patterns and fewer loop iterations. Better memory access patterns can also have synergistic effects on other code because there is less cache pollution. Conversely, by being fast and loose and introducing a few extra copies of the matrix data, my performance falls off a cliff once the data no longer fits nicely in the CPU cache: a 10x difference in the particular case I remember.
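The fusion trick translates outside of Eigen too. A sketch of the same idea in numpy (not the poster's hacked Eigen): since d/dx tanh(x) = 1 - tanh(x)^2, the derivative can be computed from the activation value itself, so tanh is evaluated once and the arrays are swept over a minimal number of times.

```python
import numpy as np

def tanh_and_grad(x, out=None, grad=None):
    """Compute tanh(x) and its derivative together,
    reusing the activation: d/dx tanh(x) = 1 - tanh(x)**2."""
    t = np.tanh(x, out=out)           # the only (expensive) tanh evaluation
    g = np.multiply(t, t, out=grad)   # t**2, reusing t instead of recomputing tanh
    np.subtract(1.0, g, out=g)        # 1 - t**2, in place
    return t, g

x = np.linspace(-2.0, 2.0, 5)
act, grad = tanh_and_grad(x)
# grad equals 1 - tanh(x)**2 without a second tanh pass over memory
```

The optional `out`/`grad` buffers let a caller keep the whole computation allocation-free, which is the in-cache behaviour the comment above is describing.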
As always, performance is part art, part science, and perhaps it won't matter as much in the general case, but for my specific implementation and my matrix sizes Eigen has made a measurable difference compared to other solutions.
Right now I would say the main focus of Shark is research. That is, we want to be fast but also modular, so that we can easily swap out different aspects of the algorithms for our own work. As these goals sometimes clash, it is hard to claim that we are the fastest, simply because for nearly every algorithm there is some way to improve when you know exactly which combination of model, loss function and training algorithm you use. But we are (hopefully) reasonably fast and certainly want to improve.
These days I am astonished at the number of implementations of deep learning techniques (Shark does include some, AFAIK).
My past experience is that I had to write many AI algorithms myself because I could not find suitable, free and/or open implementations (or other researchers would not share theirs ;) ).
It's very true what you said about all the implementations available now. That said, for most algorithms I tend to implement them myself as a learning experience. Maybe the biggest exception is the standard SVM, since it is kind of tricky, but even for that there are some online algorithms that are easy to implement.
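One of those easy-to-implement online SVM solvers is Pegasos (stochastic sub-gradient descent on the regularized hinge loss). A minimal sketch for the linear case, in numpy (toy data and hyperparameters are made up for illustration):

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent for a linear SVM.
    Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # standard Pegasos step size
            if y[i] * (w @ X[i]) < 1:             # margin violated: hinge sub-gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                 # margin satisfied: only shrink
                w = (1 - eta * lam) * w
    return w

# toy, linearly separable data
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = pegasos(X, y)
# every point should end up on the correct side of the learned hyperplane
```

The whole solver is a dozen lines, which is why it makes a good learning exercise compared with a full SMO-style batch SVM.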
Thanks for sharing your experience!
I actually really don't understand why anyone uses the GPL for a library. I've been doing open source for a long, long time, and I love the GPL. I have code in the Linux kernel, and I believe free software AND open source software are great solutions to very real problems in software engineering. Having open code just gives people more options, and I firmly believe it will win over time as far as quality is concerned.
I just think providing libraries only to other GPL code is stupid; it limits the usefulness of the software. The LGPL is great here: you get core changes contributed back to your library from a larger group of people, and everyone wins. Limiting a library to the GPL means a large population cannot use your code, namely those writing applications that can't be licensed under the GPL. Limiting choice is BAD. The whole reason to create and use free software and OSS is to not weld the hood shut. The GPL should be for applications; for libraries it just limits choices. Down with the GPL for libraries!!!
Flame Suit Off / Rant Mode Off
But you sound like you want to limit choice for users. The whole point of free software, from the GNU perspective, is to keep options open for users and not allow downstream devs to "weld the hood shut" on derivatives by adding more restrictions on what users can do with them.
However, there are other reasons people choose it as well. One motivation you sometimes encounter is the view that, if someone's code is used in proprietary, commercial software, they'd like to be paid for it. Hence the dual-licensing model used by libraries like Qt and the Stanford Parser: you can use the GPL version if you're willing to GPL your own app, or you can buy a proprietary license if you aren't. It seems reasonably fair: I give you my code for free if you reciprocate and do likewise with your own code, or I sell you a license for cash otherwise.
In the meantime, there's always the MIT licence.
On Linux, you have to build distribution-specific binaries that match the shared-library versions in the package manager.
On Windows, you generally put all of your shared libraries in your application's folder, since there are plenty of bad actors who install unversioned DLLs to system32. This leads to duplication on the system, but it's a bad practice that has become generally accepted.
(Can't speak to shipping LGPL libs on OS X).
Do you think programmers shouldn't have the choice to decide whether their software may be used to kill people or cause the next flash crash? GPLv3 allows programmers to share their software while making sure big corporations and defense contractors steer clear.
I'm not really sure how they compare since I haven't really used either library, but there does seem to be some overlap.
Just a thought...
As some others have said, GPLv3 is off-putting, but there is the LGPL-licensed mlpack library (http://www.mlpack.org/) (also C++). Personally, project-wise, the only way this could be improved is if the project were pure C under a BSD, MIT, or similar license. Quite looking forward to checking these out, though.
So I reckon my GPU-accelerated Python still beats a C++ pthreads approach, and it's a lot faster to develop on.
Your mileage may vary; from what you said, you probably know what you are doing, and maybe the GPU is not applicable. I was really replying to the initial comments saying they want to start learning machine learning on a C++ system. Training for days suggests you are doing something hardcore like MCMC/DBNs/Gaussian processes; learners should not start there, though...
I suspect my tuned C++ code will work quite well on an Intel MIC, and that is probably where I'm going to go when I have more resources to throw at the problem. I do know that Theano uses Alex's C++ CUDA code under the covers, and I have read a lot of Theano's code, looking at implementation details to help develop my own. I'm just not a big fan of Python (or most scripting languages, actually); perhaps I'm too old school, having written C, C++, C# and Java for too long. If it doesn't smell or feel like C, I feel like Scotty in Star Trek IV making transparent aluminum on the Mac.
Implement the algorithm yourself first, in Python+NumPy. The only reason I feel comfortable with Gaussian processes and SVMs is that I wrote code to solve them by hand.
Once you're happy with the basics and can test your ideas with code you intimately understand, optimise for speed by using a library like this.
edit: For an example of using cvxopt, check out http://www.mblondel.org/journal/2010/09/19/support-vector-ma...
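In the same spirit as the advice above, here is what the "implement it yourself" route looks like for Gaussian process regression: a sketch in plain numpy (squared-exponential kernel, noisy 1-D data; all sizes and the length scale are illustrative).

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length ** 2)

# noisy observations of a known function
rng = np.random.default_rng(1)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)

noise = 0.1 ** 2
K = rbf(X, X) + noise * np.eye(len(X))  # train kernel plus observation noise
alpha = np.linalg.solve(K, y)           # K^{-1} y via a linear solve, no explicit inverse

Xs = np.linspace(0.0, 5.0, 50)          # test inputs
mean = rbf(Xs, X) @ alpha               # GP posterior mean at the test inputs
# `mean` should track sin(Xs) closely on this dense, low-noise data
```

Everything the textbook derivation hides (kernel construction, jitter/noise on the diagonal, solving rather than inverting) is right there in a handful of lines, which is exactly why writing it once by hand builds the intuition.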
On reflection, I guess I might have had more free time to spend on this than a normal person: I did the SVM as a [small] part of my master's project, so if you're time-constrained with a real job and a life, it might be best to disregard me.
You might also ask on the fsharp-opensource mailing list, maybe someone has an F# ML library I don't know about:
Folks on SO also like WEKA run through IKVM (a Java-to-.NET converter): http://stackoverflow.com/questions/1624060/machine-learning-...