Some Lesser Known Machine Learning Libraries (paralleldots.com)
33 points by gargisharma 1 hour ago | 17 comments





My first reaction to this: "there must be a reason that they are not known." Then I click the link and get "Error establishing a database connection" :D

We have fixed it now... We were not expecting so much traffic :)

If I was going to try machine learning I would code it up from scratch rather than rely on black boxes.

Please don't spread bad advice. There is a reason we teach Python and not ia64 Assembly to beginners.

> There is a reason we teach Python and not ia64 Assembly to beginners.

Fashion is a reason, but it's not a good reason.

Here is something you should check out in that case: https://github.com/eriklindernoren/ML-From-Scratch . Be warned, though: the everything-from-scratch approach, despite being addictive, is harder than standing on the shoulders of giants and takes a lot of resilience. If you are even a bit of a procrastinator like me, it tends to stay a sweet fantasy much of the time. Only when I was halfway into a project and had no other option did I implement my first algorithm from scratch.

Why not take a well-tested open source machine learning library and read the source code? No black box, and no spending a lot of time reinventing the wheel just to get some work done.

I feel that at this point this is not a sensible thing to do any more, unfortunately. I totally get the impulse though. For instance, there is still nothing great on the JVM for deep learning with symbolic differentiation (deeplearning4j does not have this; correct me if this has changed).

On the other hand, I realize that between writing native interfaces, symbolic differentiation (e.g. writing a port of autograd), network optimisers, custom layers, parameter servers, multi-GPU scheduling and so forth, I'd spend years before getting to do what I wanted to implement in the first place.
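To give a feel for just the differentiation piece of that list, here is a minimal sketch of reverse-mode automatic differentiation on scalars (the technique autograd-style tools use, which is not symbolic differentiation in the strict sense). The `Var` class and its methods are hypothetical names made up for illustration; this is not how autograd, nd4j, or any other library is actually implemented, and it covers only `+` and `*`.

```python
class Var:
    """A scalar value that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent Var, local derivative of this node w.r.t. that parent).
        self.parents = parents

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological order so each node is processed after all of its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Apply the chain rule from the output back towards the inputs.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


x = Var(3.0)
y = Var(4.0)
z = x * y + x   # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
z.backward()
print(z.value, x.grad, y.grad)
```

Even this toy version needs graph bookkeeping and a traversal order, and it says nothing about tensors, native code, or GPUs, which is roughly why the full list above adds up to years.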

We are adding this now to our tensor library: https://github.com/deeplearning4j/nd4j/pull/1750

I've also added numpy interop via our new python interface jumpy: https://github.com/deeplearning4j/jumpy

We are doing a lot more than autograd though. This is going to support dynamic computation graphs, give you direct access to a graph data structure and will later be usable from nd4s (our Scala wrapper).

Rather than spending time going back and forth implementing all of those things you could just pitch in with our existing efforts (hint: you'd actually be getting something done rather than debating ;))

We have a parameter server for word2vec, various kinds of optimizers and the like: https://github.com/deeplearning4j/nd4j/tree/master/nd4j-para...

I'd also just like to note for anyone else reading this: Mulling over doing something helps no one.

If you see something that's open source and close to what you want, try engaging the authors to see what they have to say. Maybe they will guide you. We've done that recently for our LAPACK integration with CPU and GPU as well as various neural net implementations.

No offense, but it kills me to see comments like this. I see tons of people complaining about features yet doing nothing to add them, let alone engaging open source authors.

It's kinda funny - every time someone has actually done that, I've hired them. The developers that actually take action when engaging open source are amazing people. I have a feeling it's because they take the time to learn and get their feet wet even if it's intimidating.

Other neat community initiatives include Flink: https://issues.apache.org/jira/browse/FLINK-5782

NASA (Apache Tika): https://github.com/apache/tika/pull/165

A language for our ETL library DataVec (supports binary vectorization AND SQL-like transformations!): https://github.com/deeplearning4j/DataVec/issues/224

A Scala lib like TensorFlow built on top of nd4j: https://github.com/ThoughtWorksInc/DeepLearning.scala

Our Spark ML integration: https://github.com/deeplearning4j/deeplearning4j/tree/master...

The community is very active. We have 4200 people in a Gitter room alone: http://gitter.im/deeplearning4j/deeplearning4j

I have contributed to dl4j though ;)

I did not mean to criticise dl4j at all; I was simply pointing out an example of a feature I knew I was missing at one point. I think we are actually agreeing. It does not always make sense to start something from scratch even though it's fun and a great learning experience. The ramp-up to something really useful in deep learning is simply very steep. Further, few people can be experts on the whole stack, and I have no problem admitting to myself that even if I spent 2 years writing something from scratch, many parts would simply not be as good as something I could copy from an existing open source library. That's why contributing to open source also makes more sense to me - you get to work on a part that you can be good at.

Also should point out that when I was having problems with custom loss functions a year ago you guys were extremely helpful on Gitter in discussing issues.

Hard to tell from an HN user name :D. That's great to hear! I get how hard it can be - what you get out of it is learning though. We have some seriously cool examples that are just weekend projects for folks right now with javafx for example: https://github.com/deeplearning4j/dl4j-examples/pull/421

A lot of community contributions are in the examples now. If you haven't used dl4j in a while, maybe take a look.


Probably a better thing to do would be to download one of the many excellent open source libraries and explore how they work. You can then even contribute back. Many open source projects would really benefit from new users contributing to documentation for a start.

Could you name a few such open source projects?

scikit-learn is a production-ready library that has some very well-commented and easy-to-read source code.

https://github.com/eriklindernoren/ML-From-Scratch is an easy-to-understand implementation of some of the basic ML algorithms, built from first principles, that aims for readability over performance.
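For a taste of what "first principles, readability over performance" can look like, here is a rough sketch of one-variable linear regression fitted by plain gradient descent, using nothing but the standard language. It is not taken from either project; the function name `fit_line`, the learning rate, the epoch count, and the toy data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

A production library would use a closed-form or vectorized solver instead; the loop form is only there because every step is visible.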

I was asking about the ones which don't have very good documentation and are in need of contributors.

I guess it purely depends on what you hope to achieve. If you're going to spend a few months learning how ML works, sure, you'll benefit immensely. But if you're planning to apply it in some field/area, writing your own library and making sure it's better than others, well, it's not going to be easy or fast.

to code a machine learning library from scratch, you must first invent the universe

