
Show HN: Eclipse Deeplearning4j - vonnik
https://projects.eclipse.org/proposals/deeplearning4j
======
stolsvik
For anyone who hasn't checked out Deeplearning4j yet and thinks that Python
is the only place where the cool AI stuff happens, you really should check
this project out!

Their Gitter channel is absolutely kicking: lots of the actual devs,
including CTO and co-founder Adam Gibson, hang out there and /really/ answer
/any/ question - I find their responsiveness stunning! (I discussed and
reported a minor improvement to the config builder, and it was implemented -
literally - the very next day!)

I just recently found the library myself, and I found it very refreshing to
be able to utilize my decent Java competence in ML, instead of continuously
having to hammer my way through limited Python experience to get things done.
(Python also has limited multi-threading capabilities, which I found annoying
when wanting to do some larger ETL stuff "inline". Python also gives me a
strong feeling of "this is a scripting language!": it is super fast and
expressive when your needs are met by the vast set of (native) libraries - but
when I just want to do something myself with raw code, things get slower. I
also love a properly typed language, which Python obviously isn't!)

Do note that with dl4j, you still get full-on GPU acceleration via Nd4j,
which is a "NumPy-style" multi-dimensional array library.
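
To give a quick flavor, here is a minimal sketch assuming the standard Nd4j
API (swapping the nd4j-native Maven backend for nd4j-cuda is what moves the
same code onto the GPU):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class Nd4jSketch {
        public static void main(String[] args) {
            // 3x4 array of uniform random values, like numpy.random.rand(3, 4)
            INDArray a = Nd4j.rand(3, 4);

            INDArray b = a.add(1.0);             // broadcast scalar add
            INDArray c = a.mmul(b.transpose());  // (3x4) x (4x3) -> 3x3 matmul

            // The exact same code runs on GPU with the nd4j-cuda backend
            System.out.println(c);
        }
    }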

~~~
albertzeyer
If you prefer Java over Python, some of the other more popular frameworks also
have Java bindings. E.g. TensorFlow or CNTK.

~~~
agibsonccc
I'm not exactly sure I'd call those "bindings". If you want an actual
compelling competitor to us because we're not a big name, at least look at
mxnet. That's more credible.

It's incredible that folks still insist on this. Neither of those frameworks
is even aiming for any real level of integration with the JVM.

At least take a look at what makes us different.

A big one being:
[https://github.com/bytedeco/javacpp](https://github.com/bytedeco/javacpp)

We own the whole stack here and integrate pretty deeply. This has led to some
great performance improvements.

We aren't just "java bindings" but offering a lot more in one place than the
other frameworks.

Also: I highly doubt you've even looked at the numbers.

We're actually doing pretty well in the rankings:
[https://twitter.com/fchollet/status/915366704401719296](https://twitter.com/fchollet/status/915366704401719296)

We're not #1 but we have an actual user base.

I think a lot of our upcoming things - like our python bindings, where people
don't have to "see" java, and our integrations with spark - will win over some
folks at big companies.. maybe not DL researchers, which is also fine. Our
upcoming api _will_ be like pytorch though.

Regardless: I'm not sure the framework will matter long term given that model
import and things like ONNX are becoming more common.

Finally, I'll just say this: It doesn't hurt the ecosystem to have more
competition and different interests than research.

Every single person who says this usually dismisses us for one silly reason
or another, ranging from:

1\. Not a "big name" (despite being put in production by a ton of them)

2\. Not a lot of research papers (yeah, no crap, we're not a research
framework - we do have publications though!)

3\. They see java bindings and can't tell the difference between a full
application suite and a tensor lib with java bindings generated by SWIG

4\. Usually a PhD student at a university who hasn't worked at a large company
and doesn't understand my actual target audience

~~~
albertzeyer
It sounds like you got a bit angry because of my comment? Sorry, I didn't
mean it negatively in any way, especially nothing negative about DL4J. I just
thought it might be a good contribution to the discussion to have a more
complete overview of the DL options in Java, as it was not mentioned here. Of
course, there are advantages and disadvantages to each. E.g. TensorFlow might
be more well-known, but its Java bindings are lacking a lot of the tools which
you have in the Python bindings. Also, apart from the programming language,
each framework's underlying design has its advantages and disadvantages.

I know some people / companies who do the research and training in Python
with TensorFlow and use the Java bindings (or C++) in production for
inference. That works quite well.

Also, competition never hurts.

~~~
agibsonccc
Oh no problem! I guess what I was getting at with my reply is this: we get
dismissed all the time by the folks doing research (understandably - they
wouldn't use us).

I guess what I was elaborating on (partially for clarity for other readers
who _really_ don't know the difference, partially as a response to your
comment here) was the fact that they really are very different things.

My problem with these things being posted as "alternatives" is: deep learning
is just 1 part of the suite.

When you want to do anything with these other frameworks, usually python is
somewhere in your workflow.

That isn't my only complaint though: dl4j "the deep learning framework" is 2
parts: nd4j (the tensor library, directly comparable to TF/mxnet,..) and
dl4j, the deep learning DSL, which is higher level, like keras.
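
To make that split concrete, here's a rough sketch of the dl4j DSL sitting on
top of nd4j (standard builders of this era; the layer sizes are just
placeholders):

    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class DslSketch {
        public static void main(String[] args) {
            // Keras-like network definition; every tensor op underneath runs on nd4j
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .list()
                    .layer(0, new DenseLayer.Builder().nIn(784).nOut(128)
                            .activation(Activation.RELU).build())
                    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                            .nIn(128).nOut(10)
                            .activation(Activation.SOFTMAX).build())
                    .build();

            MultiLayerNetwork net = new MultiLayerNetwork(conf);
            net.init();
        }
    }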

As for your point about folks using C++/Java bindings: even "production" ends
up being nuanced there. TF has different deployment modes with almost no
out-of-the-box tooling for very common deployment scenarios. That includes
things like kafka, spark,..

We take an AngularJS-like approach to this. Rather than leaving deployment as
an exercise for the reader, you get imports from different frameworks, a
clear way of doing everything from ETL to setting up a server, and a way of
actually debugging/controlling things from the JVM rather than using some SWIG
bindings with a black box.

For example, our tensor library integrates deeply with the java gc, which
allows reference-based collection of off-heap buffers, as well as an in-java
memory management system for both cpu and gpu memory.

You need to control those things if you, say, run things on a tomcat server.
You don't get that granularity when just deploying some c++ bindings. There's
a reason I emphasize having the full runtime tooling available to you (not
just integrations).
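
For a rough flavor of that granularity, here's a hedged sketch using nd4j's
workspace API (the workspace id is invented):

    import org.nd4j.linalg.api.memory.MemoryWorkspace;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class WorkspaceSketch {
        public static void main(String[] args) {
            // Arrays allocated inside the block come from a reusable,
            // pre-allocated off-heap (or gpu) region instead of leaving
            // each buffer's lifetime to the garbage collector.
            try (MemoryWorkspace ws =
                     Nd4j.getWorkspaceManager().getAndActivateWorkspace("SERVING_WS")) {
                INDArray activations = Nd4j.rand(1, 784);
                // ... run the model here; memory is reclaimed when the block exits
            }
        }
    }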

So yes: in general, when I see someone just offhand pitching these things, I
clarify it. As I said in my previous comment: a whole application suite is a
very different approach than a "tensor lib with autodiff and some java
bindings".

People have different use cases and different requirements. I'd at least like
a fair comparison side by side though..

------
vonnik
The interesting thing about this process is timing. You shouldn't really
contribute a code base to a foundation until it's mature enough.

I think any group of people developing an open-source project wants it to
develop quickly and with healthy governance. Joining any foundation, whether
it's Apache or Eclipse or Linux, sends a signal that they're mature, thinking
about governance and want to make sure the product develops in a way that
agrees with the community.

But sometimes governance can get in the way of speed. What we found talking to
Eclipse was that we could get the governance and keep the speed. Which means
we'll be able to keep pushing DL4J forward without confusion.

Fwiw, here's the DL4J site:
[https://deeplearning4j.org/](https://deeplearning4j.org/)

Here are the repositories:
[https://github.com/deeplearning4j](https://github.com/deeplearning4j)

And the Gitter community:
[https://gitter.im/deeplearning4j/deeplearning4j](https://gitter.im/deeplearning4j/deeplearning4j)

~~~
mindcrime
DL4J strikes me as reasonably mature anyway, although you probably know more
about that than I do. :-)

Does this mean anything specific for Skymind as the commercial vendor behind
DL4J?

~~~
vonnik
That's nice of you. :-) It's mature now, which is why this was the moment to
move it into Eclipse.

It means that DL4J and its suite are now vendor-neutral. On the Skymind side,
we will continue to develop all those open-source projects, so you can expect
a lot more cool stuff to come: interpretable models, better ETL, Keras as our
Python API, vertical-specific apps for EDA, robotics...

~~~
mindcrime
Very cool. I'm a DL4J fan. I actually gave a talk at Tri-JUG a couple of
weeks ago on real-time machine learning and BPM, which featured DL4J as part
of the tech stack. I'm also working on a SaaS offering around AI/ML and plan
to include support for DL4J at some point. Hopefully I can eventually get to
a place where I can make some useful contributions to the project.

~~~
agibsonccc
Thanks! Please share what you're doing and we'll promote as much as we can.

------
crockpotveggies
The amount of work we've done this year on Deeplearning4j's performance has
been much higher than in previous years. We brought DL4J up to par with
community standards while maintaining the advantages of Java. I think what a
lot of people don't realize is how much effort has gone into ETL and
integration tooling.

It's very difficult to train on multiple GPUs while maintaining performance of
ETL. ETL is a scary hidden bottleneck.

I'm very interested to see how Eclipse can continue to push development. I
think the people who will especially benefit from this are devops/production
teams operationalizing data science.

~~~
stingraycharles
As someone who mostly works in the field of data warehousing, ETL has a very
specific meaning to me (Extract, Transform, Load). Is this the same thing
you're talking about?

~~~
crockpotveggies
Yes. One of the core libraries within the DL4J project is datavec, which is
ETL-focused. One key problem that we discovered - and fixed - was that
reading and transforming data for training could bottleneck a multi-GPU
process. You spend a lot of $$$ on a deep learning computer, but making the
library performant enough to load data at the same rate the GPUs can consume
it was challenging. This scales to about 4+ GPUs (depending on datatype), and
we're building a datavec server so this can scale much larger. There are
still good returns if you clean and transform your data and pre-save it to
disk, which helps with large machines such as a DGX. However, other
bottlenecks still apply (which we are solving right now).

I hope that answers your question. I consider the process of extracting
records, transforming them, and loading them for training to be "ETL". I
understand ETL also applies to other data consumption.

*I should also note that if you want to use datavec for ETL and do not wish to train a deep learning model, it is quite useful for columnar data.
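
To make that concrete, here's a minimal sketch of the record-reader-to-
iterator pipeline (standard datavec/dl4j classes; the file name and column
indices are just examples):

    import java.io.File;

    import org.datavec.api.records.reader.RecordReader;
    import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
    import org.datavec.api.split.FileSplit;
    import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

    public class EtlSketch {
        public static void main(String[] args) throws Exception {
            // Read raw CSV records...
            RecordReader reader = new CSVRecordReader();
            reader.initialize(new FileSplit(new File("iris.csv")));

            // ...and expose them as minibatches of tensors the net can consume.
            // Column 4 holds the label, 3 classes, batch size 32.
            DataSetIterator trainData =
                    new RecordReaderDataSetIterator(reader, 32, 4, 3);

            // net.fit(trainData);  // feeds straight into training
        }
    }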

~~~
riku_iki
Why mix ETL and training?

I am using TF, and in my workflow I first do all the ETL in a separate
process, dump all training/validation data into a TFRecord file, and then my
training program consumes it. Clear separation of concerns without any
performance penalty.

And I can iterate over training logic with various parameters as many times as
I want without touching ETL.

~~~
deepGem
But what if your training logic needs a change in the ETL? You have to
iterate on the ETL too, so doing it inline makes a lot of sense. Microsoft
has done something like this with its SQL Server + R offering. Personally I
find the MS approach quite appalling: you have to load your model in a PLSQL
script, so the ETL + running the model is seamless, but imagine debugging it
- and forget about multi-GPU support.

~~~
agibsonccc
The part of the ETL you'd want to "crystallize" is just the transformation
from raw data to feature vector.

Beyond that, you already change the ETL when you experiment. I'm not sure how
this changes anything.

Out of nowhere you've sprinkled in "multi gpu support" which I'm not sure is
relevant here. Do you mean as part of training?

We handle the gpu bits for you. All you do is define your transform logic; it
runs on one of our backends like spark, and then when you go to allocate a
tensor - boom, gpu.

There's no special compilation or process needed to make this happen.
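
As a rough sketch of what "define your transform logic" looks like with
datavec's transform API (standard Schema/TransformProcess builders; the
column names are invented):

    import org.datavec.api.transform.TransformProcess;
    import org.datavec.api.transform.schema.Schema;

    public class TransformSketch {
        public static void main(String[] args) {
            // Describe the raw data once...
            Schema schema = new Schema.Builder()
                    .addColumnString("userId")
                    .addColumnDouble("amount")
                    .addColumnCategorical("country", "US", "DE", "JP")
                    .build();

            // ...then declare the raw-data -> feature-vector transformation.
            // The same definition can run locally or on a spark backend.
            TransformProcess tp = new TransformProcess.Builder(schema)
                    .removeColumns("userId")
                    .categoricalToOneHot("country")
                    .build();
        }
    }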

Dl4j supports multi-gpu training out of the box. All you need to do is use
our ParallelWrapper module.

We can also do distributed training with gpus on spark (yes, this includes
cudnn).
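
Roughly, ParallelWrapper looks like this (a sketch assuming an initialized
`MultiLayerNetwork net` and a `DataSetIterator trainData`; the numbers are
just examples):

    import org.deeplearning4j.parallelism.ParallelWrapper;

    // Data-parallel training: each worker holds a model replica pinned to
    // its own gpu, and parameters are periodically averaged across replicas.
    ParallelWrapper wrapper = new ParallelWrapper.Builder<>(net)
            .workers(4)              // e.g. one worker per gpu
            .prefetchBuffer(24)      // async data prefetch per worker
            .averagingFrequency(3)   // average params every 3 minibatches
            .build();

    wrapper.fit(trainData);          // same iterator you'd hand to net.fit()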

You've also, for some reason, decided to compare an open-ended coding library
where you can do whatever you want to a database?

Baking ML into database servers is already notoriously bad. The whole point
of what we're doing is to provide a middle ground.

Data engineers have to do this anyways.

~~~
deepGem
Adam, sorry, I guess I was not clear. What I said was that the MS SQL Server
+ R strategy of baking ML into PLSQL is appalling, and doing something like
what you are doing with DL4J is perhaps the right approach. The multi-GPU
support rant was for SQL Server + R, not DL4J.

"You've also, for some reason, decided to compare an open-ended coding
library where you can do whatever you want to a database?" - I didn't catch
this part.

~~~
agibsonccc
Right, so I think we were agreeing that how database servers bake in the ML
is a bit weird and not the way to go.

The "open-ended coding library" here is datavec. Comparing the two is only
really semi-valid; the processes there are definitely brittle.

------
sandGorgon
This is super, super cool!

Love the work you guys have been putting out. How do you generally see the
community mindshare in this space, especially with
[https://techcrunch.com/2017/02/13/yahoo-supercharges-tensorflow-with-apache-spark/](https://techcrunch.com/2017/02/13/yahoo-supercharges-tensorflow-with-apache-spark/)
(you guys are also mentioned there)?

You guys run this at scale, but Yahoo's TensorFlowOnSpark does look very
enticing with its "Easily migrate all existing TensorFlow programs with <10
lines of code change" punchline.

~~~
agibsonccc
Doesn't matter. We don't need to be the #1 framework. We're going to stay
focused on model import and integrating with the big data ecosystem.

None of the frameworks you're mentioning integrate properly, none are really
supported by an actual community, and none get meaningful updates.

The biggest problem you run into pretty quickly is maintenance. TF and spark
both update quickly.

Every time someone has attempted a "TF on spark", they don't add ETL or
proper JVM integrations (proper control of cuda and memory management from
the JVM), and there are usually strange interactions between the JVM and
python runtimes.

Also, please don't ignore the rest of dl4j. We have a whole suite of tools in
there. It's not just a matrix lib with autodiff like TF and co are.

TF in its own way is adding some of this stuff, like TFRecords and some of
their readers, but it's not going to add a lot in the way of things like
connectors to kafka and the rest of the big data ecosystem.

They've basically added HDFS..that was about it.

Dl4j is also significantly easier to deploy. It's just a jar file or zip
file..you don't need a blob of c++ just to run a model.
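
For illustration, loading and serving a saved model is roughly this (a sketch
using dl4j's ModelSerializer; the file name is made up):

    import java.io.File;

    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.util.ModelSerializer;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class InferenceSketch {
        public static void main(String[] args) throws Exception {
            // A trained network ships as a single zip; no native toolchain needed
            MultiLayerNetwork model =
                    ModelSerializer.restoreMultiLayerNetwork(new File("model.zip"));

            INDArray features = Nd4j.rand(1, 784);  // stand-in for a real request
            INDArray prediction = model.output(features);
            System.out.println(prediction);
        }
    }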

~~~
sandGorgon
So just trying to understand this - does dl4j become the essential middle
layer if someone wants to run Tensorflow on spark seamlessly?

Because it seemed on the surface that you guys replaced tf. If you guys offer
a better ecosystem where tf and spark work together (but don't really rip out
tf), then it is even more awesome.

~~~
agibsonccc
Not just TF: pytorch, keras, mxnet,.. Like I said: we won't be the #1
framework data scientists use. There's still a ton of churn in that space.

Granted, a ton of it is TF. Google is doing an amazing job now.

We're trying to be a moderate middle ground.

TF and pytorch and anything in python tend to be an interface.

Core logic still runs in C/C++.

That's great when you want speed, but you end up missing the benefits of the
JVM (the tooling is great for monitoring and the things folks need at scale).

So we do everything via JNI, pushing the math down in one block with minimal
overhead, if any.

That, and if you want to write production code, you don't need to push logic
down to c. You can do something JVM-based instead, which gives you kotlin,
scala, clojure,..

~~~
sandGorgon
I think this aspect of dl4j gets lost in the overall message. For me, this is
much more powerful - "use dl4j if you want a seamless experience running
tensorflow/keras/mxnet on spark".

Because right now, it looks like it is tensorflow vs dl4j.

~~~
agibsonccc
Honestly hard for me to care.. They can both compete as well as integrate. No
offense here, but model import will become a common thing as things like ONNX
become standard.

We play up what's unique about dl4j anyways. It's right in the name. We're
squarely focused on the JVM.

We also hedge our bets against spark. Most of our logic doesn't even run on
spark.

Spark is just a facilitator..it's not where anything that matters runs. We
could just as easily use flink or apex here too. Those are also JVM based
streaming engines.

We can't overplay spark because most of our deployments won't even be with
spark. We don't even use spark in our own production inference tools. We just
deploy as a microservice.

~~~
sandGorgon
Well, interesting parallel you give here!

I have personally been campaigning for ONNX to merge/leverage Apache Arrow
([https://news.ycombinator.com/item?id=15195658](https://news.ycombinator.com/item?id=15195658)).
It probably makes sense for the efforts to build on top of each other.

YMMV ;)

~~~
agibsonccc
Integrating with arrow is on our bucket list as well. We plan on integrating
with their tensor data type and format.

I feel like you're trying to cross 2 worlds that shouldn't be crossed with
arrow/ONNX though. You're trying to map a neural net description language to
a columnar format..that doesn't make sense to me. In our case, our runtime
will understand both, but for different reasons.

We will import the format for our neural nets and integrate with arrow for our
ETL -> tensor pipelines.
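
Dl4j already does the former for keras models via its model-import module; a
rough sketch (the file name is made up):

    import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

    public class KerasImportSketch {
        public static void main(String[] args) throws Exception {
            // Load a net trained in python/keras and run it on the JVM
            MultiLayerNetwork model =
                    KerasModelImport.importKerasSequentialModelAndWeights("model.h5");
            System.out.println(model.summary());
        }
    }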

------
agibsonccc
For those wondering "why the heck are you guys java":

Our tensor library has a python interface in the making:
[https://www.slideshare.net/agibsonccc/strata-beijing-2017-jumpy-a-python-interface-for-nd4j](https://www.slideshare.net/agibsonccc/strata-beijing-2017-jumpy-a-python-interface-for-nd4j)

Our goal with this will then be to write side-by-side benchmarks with the
other libs that people can easily run. We know a lot of folks from python
land won't jump over, and we don't expect them to. The hope is that we're
equivalent and can integrate better with a big data cluster.

Dl4j is not trying to be TF. When I started dl4j 4 years ago, theano and
torch were the primary frameworks.

I wrote it for deployment into production apps and for the hadoop/spark
ecosystem.

We will continue that going forward at the eclipse foundation as well as using
dl4j in our product.

Please check out the oreilly book as well (currently #2 on amazon right next
to the goodfellow book):

[https://amazon.com/Deep-Learning-Practitioners-Josh-Patterson/dp/1491914254](https://amazon.com/Deep-Learning-Practitioners-Josh-Patterson/dp/1491914254)

Email is in my profile if there are any specific questions.

