
IBM's SystemML Machine Learning – Now Apache SystemML - vasili111
https://github.com/SparkTC/systemml
======
btown
So what's the benefit to using SystemML on top of Spark (or another scheduler)
vs. using Spark's own MLlib? From the links below, it seems MLlib supports a
superset of SystemML's algorithms (with the small exception of survival
analysis). Are there any plans for the projects to use insights from each
other, or to merge in some way? Because it seems quite silly for Apache to
sponsor two competing machine learning libraries with exactly the same goals.

[https://sparktc.github.io/systemml/algorithms-reference.html](https://sparktc.github.io/systemml/algorithms-reference.html)

[https://spark.apache.org/docs/latest/mllib-guide.html](https://spark.apache.org/docs/latest/mllib-guide.html)

~~~
dusenberrymw
Great question. I'm part of the committer team for the project at IBM, so I'll
leave a few comments representing our thoughts. As a quick overview, SystemML
provides an R-like DSL, called _DML_ , consisting of linear algebra primitives
(vectors, matrices), built-in functions for common operations (such as sums,
means, matrix construction, etc.), UDFs, etc., as well as a compiler/optimizer
engine that can generate optimized runtime plans from the same DML script for
a single node (laptop), Spark, or Hadoop MapReduce. We definitely have
algorithms already available as production-ready examples, but the goal of the
project is to allow for _declarative_ ML using customizable scripts written at
the mathematical DSL level, rather than to provide a fixed _library_ of
algorithms at the base language level (Scala, Python, etc.). MLlib (including
the newer ML API) is awesome, and provides a great set of algorithms that fit
in quite well with Scala, Python (& Java). SystemML is great in that it
provides the ability to run customizable, linear algebra-based ML scripts
(that can be automatically optimized within the engine) on Spark. Together,
it's a great combo. We also have an API for Scala that lets one embed DML into
a Scala program similar in manner to how an SQL script can be embedded
[[http://sparktc.github.io/systemml/mlcontext-programming-
guid...](http://sparktc.github.io/systemml/mlcontext-programming-guide.html)].

Here are our new Apache links:

[https://systemml.apache.org](https://systemml.apache.org)

[https://github.com/apache/incubator-systemml](https://github.com/apache/incubator-systemml)
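
To make the "mathematical DSL level" idea above a bit more concrete, here is a minimal sketch in plain Python. It is not DML and uses no SystemML API; the helper names (`transpose`, `matmul`, `scale`, `sub`, `linreg_gd`) are hypothetical stand-ins for the kind of linear algebra primitives such a DSL provides. The point is only that the algorithm is written purely in terms of matrix operations, so in principle an engine could choose how to execute those operations without the script changing.

```python
# Illustrative only: plain-Python stand-ins for linear algebra primitives.
# None of these names are SystemML or DML APIs.

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A @ B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def scale(A, s):
    """Multiply every entry of A by the scalar s."""
    return [[s * x for x in row] for row in A]

def sub(A, B):
    """Element-wise difference A - B."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def linreg_gd(X, y, step=0.1, iterations=2000):
    """Least-squares fit by batch gradient descent.

    The body uses only the matrix primitives above, which is the style
    of script the comment describes as "declarative" ML.
    """
    n = len(X)
    w = [[0.0] for _ in X[0]]           # one weight per column of X
    Xt = transpose(X)
    for _ in range(iterations):
        residual = sub(matmul(X, w), y)               # X w - y
        grad = scale(matmul(Xt, residual), 1.0 / n)   # (1/n) X^T (X w - y)
        w = sub(w, scale(grad, step))                 # gradient step
    return w

# Toy data generated from y = 1 + 2x; the first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [5.0], [7.0]]
w = linreg_gd(X, y)
```

Swapping the gradient line for a different update rule is a one-line change at this level, which is the kind of customization the comment contrasts with a fixed library of algorithms.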

~~~
btown
Very cool - so the algorithms shared with MLlib are just an example of what
can be achieved with such a DSL. This could actually be very useful for
inference on domain specific generative models that need custom-derived update
steps. And probabilistic programming could be built on top of this. I stand
very much corrected - this is definitely a project worth attention from the
community separate from Spark itself!

------
ealexhudson
Cynic in me thinks this is just another IBM project thrown over the fence to
die in the Apache graveyard. I hope that's very wrong in this case.

~~~
bitmapbrother
I wonder why companies even bother open sourcing their software with all the
critics looking for angles to their motivation. Better to keep it private and
not put up with the armchair open source critics.

~~~
ricksplat
The cynic in me thinks it's often a case of a product line they no longer want
to support but for which an established user base already exists (Flex comes
to mind). The optimist in me thinks this isn't necessarily a bad thing though,
it's certainly better than just killing it altogether. The FOSS idealist in me
thinks it's great to be getting all this stuff, under an Apache licence!

------
nfa_backward
This looks interesting and something I will definitely watch, but at this
point I think I will still stick with [http://h2o.ai/](http://h2o.ai/)
(another JVM based ML open source project that integrates well with 'Hadoop').
I have been really impressed with the quality of the product and even more so
with the quality of the people behind it.

------
sandGorgon
Interesting... so it allows R syntax. I wonder why they didn't build it on top
of Renjin [1]

[1] [http://www.renjin.org/](http://www.renjin.org/)

------
chris_wot
Ah, Apache. Where projects go to die.

