
SystemML: Machine learning made easier (open source) - neilmack
https://developer.ibm.com/open/systemml/
======
probdist
I'm slightly concerned about the Hadoop/Spark + machine learning ecosystem
right now.

There seem to be a lot of technologies and projects being built on the core
foundation of in-memory distributed computation: MLlib in Spark, H2O, Apache
Mahout/Samsara, and probably many others.

My impression is that Samsara and SystemML's DML are being designed to supply
the primitives you need to build a machine learning model with less awareness
of the underlying system model. The ostensible goal is to save the developer of
a new algorithm the headache of thinking about how the algorithm must map onto
the distributed computation model, the way one must when developing directly
against H2O or Spark. However, many common off-the-shelf algorithms are already
available and implemented, which suggests that working with the Spark or H2O
backends directly is not that hard a path to performant ML code.

As a computational scientist, learning a DSL to accelerate getting one of my
algorithms into production via Spark seems like a potential distraction from
just getting better with Spark.
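To make the contrast concrete, the kind of algorithm a declarative layer like DML targets is one expressible purely in linear algebra, such as batch gradient descent for linear regression. A minimal single-node sketch in NumPy (variable names and hyperparameters are mine, not SystemML's; the point is that nothing here mentions partitioning or the execution backend):

```python
import numpy as np

def fit_linreg(X, y, lr=0.1, iters=500):
    """Batch gradient descent for least-squares linear regression."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # Gradient of (1/2n) * ||Xw - y||^2 with respect to w.
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Toy data: first column is an intercept term, and y = 1 + 2*x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = fit_linreg(X, y)  # converges toward [1.0, 2.0]
```

A system like SystemML or Samsara promises to take a script written at roughly this level of abstraction and compile it down to a distributed execution plan, which is exactly the step you would otherwise hand-code against the Spark or H2O APIs.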

~~~
urlwolf
The current big data ecosystem is a difficult place to be for companies that
bet the house on Hadoop (Cloudera, Hortonworks, MapR), for sure.

There are new technologies coming at a breakneck pace. Check out Apache Flink
if you need streams and the microbatches in Spark don't do it for you. Its ML
library is not that far along yet, though.

------
rectang
This project just entered the Apache Incubator 2 days ago.
[http://wiki.apache.org/incubator/SystemML](http://wiki.apache.org/incubator/SystemML)

------
holdenk
If anyone is interested in working on this (or other exciting Spark related
things) @ IBM please reach out to me ( my HN username with the letter k
removed at us.ibm.com )

~~~
michaelsbradley
Has there been any work toward integration with IPython/Jupyter?

~~~
nl
Getting IPython/Jupyter connected to Spark isn't hard.

I did it following a combination of these two guides:

[http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/](http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/)

[http://thepowerofdata.io/configuring-jupyteripython-notebook-to-work-with-pyspark-1-4-0/](http://thepowerofdata.io/configuring-jupyteripython-notebook-to-work-with-pyspark-1-4-0/)

Edit: if this is interesting, then see...
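The gist of those guides boils down to pointing PySpark's driver at the notebook via environment variables. A hedged sketch (the install path is a placeholder; exact variable support depends on your Spark version, so check your own setup):

```shell
# Point PySpark at your Spark install (placeholder path) and tell it
# to launch Jupyter as the driver process instead of a plain Python REPL.
export SPARK_HOME=/path/to/spark
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Starting pyspark now opens a notebook with a SparkContext bound to `sc`.
$SPARK_HOME/bin/pyspark
```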
[https://news.ycombinator.com/item?id=10496385](https://news.ycombinator.com/item?id=10496385)

------
draven
The readme in the github repo has more information:
[https://github.com/SparkTC/systemml](https://github.com/SparkTC/systemml)

------
niels_olson
> written in Java

gotta go, see ya!

