
ScalaNLP – A suite of machine learning and numerical computing libraries
http://www.scalanlp.org/
======
dlwh
So, this is (largely) my project.

Not sure why this is on the front page of HN, but I'm happy to answer any
questions.

I'm not really giving these libraries the love they need these days. I mostly
started them in grad school before the deep learning revolution really hit my
subfield (NLP), and I haven't had time to modernize them. They still have
their uses, especially Breeze, which is used in Spark's MLLib and directly by
a number of companies.

------
gravypod
Does Scala have a visualization suite like ggplot, seaborn, matplotlib, or
something similar?

What benefits does this provide over existing python or matlab code?

> Scientific Computing, Machine Learning, and Natural Language Processing

These seem to be three very different problems. Is there a reason why the
group of them is called "ScalaNLP"? If the libraries are generic enough then
shouldn't other uses be possible/supported?

Is there a reason this doesn't have a generic name similar to the SciPy stack?

~~~
dlwh
Main author here.

Breeze has breeze-viz, which is very basic but at the time there wasn't
anything else. I highly endorse using something else. I personally like
[http://sameersingh.org/scalaplot/](http://sameersingh.org/scalaplot/)

They're under the same aegis basically because they're all mine. ScalaNLP
started out as really being just NLP, but it scope-crept. That said, Epic is a
library for _structured prediction_ first and foremost, and one of the main
applications of structured prediction is NLP.

Breeze is basically like SciPy and large chunks of it power Epic. It's really
the only thing that doesn't belong in the namespace.

~~~
gravypod
I'm glad you're bringing something like this to the JVM/Scala-ecosystem.

There are some things I've wanted to ask for in a high-level scientific
computing library. If you're planning on continuing your visualization
library, can you please come up with some solution for layout specification?
Whenever I'm plotting something and I've spent 30 minutes getting all of the
data in order, the last thing I want to do is fight with the plotting
library's label positions because they overlap. And if I say "let me take
this plot, add some more stacked subplots, and show different categories," I
don't want perfect labels at the cost of my scatter plots being given a 10x10
pixel box to draw into.

On the HPC/numerical computing side of things, have you looked into implicit
GPU operation types? Something that would let you queue up operations to be
run on a parallel computing system: you describe complex operations with the
high-level object's normal operators, but the objects aren't actually
calculating anything; they just organize a GPU kernel in the background. As
the final stage you can turn the queue into a compiled operation:

    
    
        // pseudocode: building the op queue does no work yet
        gpumat a(3, 5);
        gpumat b(5, 3);
        gpumat gpu_op_queue = (a * b) + (a * b) * 5;
    
        // compile() lowers the queued ops into a single kernel
        function(a, b) operation = gpu_op_queue.compile();
        mat output = operation(some_3x5, some_5x3);
    

In the backend you'd hopefully be able to create your own types like
'cpumat', 'computerclustermat', or 'gpuclustermat'.
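The idea above can be sketched in Scala with a lazy expression graph. This is not Breeze's or ND4J's API; all names here (`Expr`, `compile`, etc.) are made up for illustration. Operators build a tree instead of computing, and `compile` lowers the tree to a function (evaluated on the CPU here, but a real backend could emit a GPU kernel instead):

```scala
// Sketch of a lazy compute graph (hypothetical API, not a real library).
sealed trait Expr
case class Input(id: Int) extends Expr
case class MatMul(a: Expr, b: Expr) extends Expr
case class Add(a: Expr, b: Expr) extends Expr
case class Scale(a: Expr, k: Double) extends Expr

type Mat = Array[Array[Double]]

def matmul(a: Mat, b: Mat): Mat =
  Array.tabulate(a.length, b(0).length) { (i, j) =>
    a(i).indices.map(k => a(i)(k) * b(k)(j)).sum
  }

def elementwise(a: Mat, b: Mat)(f: (Double, Double) => Double): Mat =
  Array.tabulate(a.length, a(0).length)((i, j) => f(a(i)(j), b(i)(j)))

// "compile" walks the tree once; a smarter backend could fuse or
// reorder ops at this point before emitting a kernel.
def compile(e: Expr): Seq[Mat] => Mat = {
  def eval(e: Expr, inputs: Seq[Mat]): Mat = e match {
    case Input(id)    => inputs(id)
    case MatMul(a, b) => matmul(eval(a, inputs), eval(b, inputs))
    case Add(a, b)    => elementwise(eval(a, inputs), eval(b, inputs))(_ + _)
    case Scale(a, k)  => eval(a, inputs).map(_.map(_ * k))
  }
  inputs => eval(e, inputs)
}

// (a * b) + (a * b) * 5, built as a graph, then compiled and run.
val a = Input(0); val b = Input(1)
val ab = MatMul(a, b)
val op = compile(Add(ab, Scale(ab, 5.0)))
```

Nothing is computed until `op` is applied to concrete matrices, which is exactly what makes whole-graph optimization possible.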

If you had some easy way to generically express extremely parallel numerical
operations, an abstract way of implementing high-performance back-ends that
take those operations and compile them to GPU kernels, and a visualization
engine that doesn't feel like it's from the 80s then your library will really
take off.

Personally I feel GPU-optimization and fighting with visualization libraries
are the two biggest pain points in scientific computing.

~~~
dlwh
Thanks for the questions.

I am very unlikely to take on visualization. I don't acutely need it for what
I do, and I am some-but-not-nearly-enough interested in visualization for its
own sake. I started to read about the grammar of graphics stuff at one point
and decided it was too far down the rabbit hole.

I have looked more into gpu stuff, and agree specifying a compute graph (and
then implicitly optimizing it) is more likely to be the future. FWIW, this is
basically what XLA (from TensorFlow) and whatever it was FB announced on
Friday are doing.

I wrote my thoughts up recently on the Breeze mailing list here:
[https://groups.google.com/forum/#!topic/scala-breeze/_hEFpnI9gog](https://groups.google.com/forum/#!topic/scala-breeze/_hEFpnI9gog)

I'm starting to think it through but I'm not sure I have time for that either
:(. A 4-month old and a startup take up a lot of time.

------
vonnik
David did a great job with ScalaNLP.

It happens to depend on Breeze. I would point out that Breeze does not
support n-dimensional arrays (i.e., most tensors), which are necessary for
deep learning.
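To illustrate the gap, here is a minimal sketch (not the ND4S/ND4J API; the `NDArray` class and its methods are invented for this example) of the standard way n-dimensional arrays are built on flat storage, a 1-D buffer plus per-axis strides:

```scala
// Hypothetical rank-n array: a flat buffer indexed via row-major strides.
// A matrix library like Breeze gives you rank 1 and 2; deep learning
// routinely needs rank 3+ (e.g. batch x time x features).
final class NDArray(val shape: Seq[Int]) {
  private val data = new Array[Double](shape.product)
  // Row-major strides: the last axis varies fastest.
  private val strides = shape.scanRight(1)(_ * _).tail

  private def offset(idx: Seq[Int]): Int =
    idx.zip(strides).map { case (i, s) => i * s }.sum

  def set(idx: Seq[Int], v: Double): Unit = data(offset(idx)) = v
  def get(idx: Int*): Double = data(offset(idx))
}

// A rank-3 batch-x-time-x-features tensor, a shape that a rank-2
// DenseMatrix can't represent directly.
val t = new NDArray(Seq(2, 3, 4))
t.set(Seq(1, 2, 3), 7.5)
```

The strides trick is why one flat buffer can back any shape, and why real n-d libraries can reshape or transpose without copying.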

We wrote ND4S and ScalNet to solve that:

[https://github.com/deeplearning4j/nd4s](https://github.com/deeplearning4j/nd4s)

[https://github.com/deeplearning4j/scalnet](https://github.com/deeplearning4j/scalnet)

Moving computation out of Spark's MLlib and into lower level code like C++, as
we do with JavaCPP and libnd4j, also improves speed.

[https://github.com/deeplearning4j/libnd4j](https://github.com/deeplearning4j/libnd4j)

[https://github.com/bytedeco/javacpp](https://github.com/bytedeco/javacpp)

~~~
dlwh
Thanks for the kind words.

Breeze does a large chunk of (dense) compute via netlib-java, which calls out
to "real" lapack if you set it up. Are things really faster than that? Or are
you referring to the non BLAS/non Lapack things?

~~~
agibsonccc
A few things about netlib-java:

1\. It's a read-only repository now. It's retired, and the lack of
maintenance will hurt its long-term prospects.

2\. The license on netlib-java's native binaries is not commercial-friendly.

3\. netlib-java does everything on heap with double arrays; we do everything
off heap. There's no copying to worry about, and there's much lower latency
and more flexibility with our data buffers.

4\. Thanks to JavaCPP we have better control and interop with other C++
libraries like OpenCV. This makes it easier to write native code and use it
from Java later on. It allowed us to write and maintain all of our own C/C++
code with the same API (see ND4J) -
[https://github.com/deeplearning4j/libnd4j](https://github.com/deeplearning4j/libnd4j)
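The on-heap vs. off-heap distinction in point 3 can be sketched with the JDK's direct buffers (this is not ND4J's `DataBuffer` API; the `OffHeapVec` class is invented for illustration). A direct `ByteBuffer` lives outside the JVM heap, so native BLAS or CUDA code can read it in place instead of copying out of a Java `double[]`:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical off-heap vector: doubles stored in a direct buffer.
final class OffHeapVec(val length: Int) {
  // allocateDirect puts the bytes off the JVM heap; native byte order
  // matches what C/C++ code would see through a pointer.
  private val buf = ByteBuffer
    .allocateDirect(length * java.lang.Double.BYTES)
    .order(ByteOrder.nativeOrder())

  def update(i: Int, v: Double): Unit = buf.putDouble(i * 8, v)
  def apply(i: Int): Double = buf.getDouble(i * 8)
}

val v = new OffHeapVec(3)
v(0) = 1.5; v(1) = 2.5; v(2) = 3.0
```

The catch, and the reason ND4J-style libraries manage these buffers explicitly, is that off-heap memory is invisible to the ordinary garbage collector.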

So yes, it ends up being faster in practice for a lot of scenarios. Aside
from that, we also have more control over the BLAS libraries we pick.

This means we also have access to cuBLAS, as well as (see below) more
configuration and flexibility.

netlib-java tries to be "pure" which, while elegant, isn't practical if you
want to benefit from GPUs and deep learning. We implemented the proper shims
to make things "just work" from the user's perspective, on top of having more
flexibility (see: MKL's OpenMP knobs, etc.).

ND4J has its own built-in garbage collector and memory management, which
means we don't have to worry about any strange workarounds when working with
CPUs/GPUs _and_ we can keep off-heap buffers in a managed manner.

See:

[http://deeplearning4j.org/workspaces](http://deeplearning4j.org/workspaces)

[http://deeplearning4j.org/native](http://deeplearning4j.org/native)

In general, "just BLAS" isn't enough. I know from personal experience: I
wrote ND4J after trying to use every Java library for matrix compute, and all
of them fell flat in terms of speed and interop with other C++ libraries, and
the need to use Java arrays was highly limiting. Over the years, we built up
ND4J to handle harder scenarios.

This includes other features like distributed parameter servers, among other
things.

Other things aside: I like what Breeze attempted, but it ultimately didn't
scratch the itch for me when I was looking hard at the various Java matrix
libraries (I've tried all of them).

When I originally built out ND4J, it had this backend architecture:

[http://nd4j.org/backend.html](http://nd4j.org/backend.html)

The idea was that we could use whatever matrix backend we wanted. None of
them worked well enough for the flexibility we needed.

I also had an inherent problem with Java-based for loops in any setting. We
wrote our own fork/join implementation as well, attempting to make it fast,
and it just couldn't beat plain C.

We've found that, especially beyond matrices of around 128 x 128, we hands
down beat every JVM library out there, no matter what language it's written
in. The last bit we're working on is smaller matrices.

The other area we're working on is sparse support, which could use some work.
The basics are in there, but it's not quite ready for prime time yet.

After that (I'm obviously biased), I don't see how anything could compete
with us, especially once we add our autodiff/PyTorch-like stack on top of all
these primitives.

Hope that helps!

