
Fast Tensors in Clojure – A Sneak Peek - dragandj
https://dragan.rocks/articles/19/Fast-tensors-Clojure-sneak-peek?src=hn
======
dragandj
Some more coordinates related to this post:

Open source software:
[https://github.com/uncomplicate](https://github.com/uncomplicate)

Books: [https://aiprobook.com](https://aiprobook.com)

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL,
MKL-DNN, Java, and Clojure

[https://aiprobook.com/deep-learning-for-programmers/](https://aiprobook.com/deep-learning-for-programmers/)

Numerical Linear Algebra for Programmers: An Interactive Tutorial with GPU,
CUDA, OpenCL, MKL, Java, and Clojure

[https://aiprobook.com/numerical-linear-algebra-for-programmers](https://aiprobook.com/numerical-linear-algebra-for-programmers)

~~~
sansnomme
Congrats on shipping! Are you going to be writing the JVM for Clojure people
book you suggested before? Am really looking forward to it!

~~~
dragandj
First I have to finish the ones that are in progress :)

------
fnordsensei
I can recommend this episode of The REPL podcast, where the author talks about
some of the whys, hows, and current state of data science in Clojure:
[https://www.therepl.net/episodes/25/](https://www.therepl.net/episodes/25/)

------
dkersten
Dragan, thank you for your continued hard work (and your well written posts!)
I haven’t got the time yet, but I’m very much looking forward to reading both
your series of “deep learning from scratch” posts and your books.

------
Scarbutt
What's the pitch for using Clojure for data science instead of Python?
Production workloads?

~~~
thom
You don't have the mature bindings to things like TensorFlow or Torch, you
don't have good viz libraries, you don't have broad support for the types of
analysis scipy allows, and beyond Weka and random stuff like XGBoost having
Java bindings, you don't have access to a lot of different models.

That said, Clojure is _much_ better than both Python and R for data prep. You
can build very nice, fast (parallel) pipelines with transducers etc, and stuff
that seems like magic to tidyverse consumers in R is just everyday data
transformation in Clojure. And despite the fact that Incanter more or less
died, I still think the language would be a great fit for data science if the
community was there, and Dragan's work really deserves that sort of attention.
The foundations are already far superior to what's available in R and Python
(e.g. you are doing stuff on the GPU on day one, you can do Bayesian analyses
in some cases thousands of times faster than Stan etc).

~~~
mumblemumble
You don't need to sell me on Clojure being a nicer foundation than Python in
most respects, but the thing I keep running afoul of when doing data science
on any JVM language is the performance hit from all the copying it takes to
pump data back and forth across JNI.

The showdown that's more interesting to me is Clojure vs Julia, which is very
nearly an acceptable Lisp, and also has a nicer interface to C libraries. And,
IIRC, also the ability to interface directly with C++ libraries, without
having to first wrap them in a C-compatible interface.

~~~
dragandj
There is no copying back and forth across JNI, thus no particular performance
hit there (in Uncomplicate libraries).

~~~
mumblemumble
Well, there wouldn't be once data is already copied into Uncomplicate data
structures. But surely you can't just pass a pointer to the guts of a Java
array, so you do have to copy data back and forth to get it into Uncomplicate
data structures in the first place, don't you? Otherwise, how does the
C/Fortran/whatever code deal with the fact that the JVM's garbage collector
reserves the right to move data around?

~~~
dragandj
Why would you pass a pointer (or the contents it points to) to the guts of a
Java array? Neanderthal does not require Java arrays (although it supports
transfer to/from arrays for convenience).

Please try Neanderthal; there are lots of getting-started resources. You can
benchmark it yourself (very easy to do in Clojure) and see...

I assure you that the only copy you would need is the same one you need in C,
C++, or any language: the one from the source of your data (IO such as
database, network, CSV string, etc.). And even this is not required if you
initialize the vectors randomly (which is often the case).
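
For readers wondering how JVM data can avoid both the garbage collector and
the JNI copy discussed above, here is a minimal sketch of one standard JVM
mechanism: a direct `ByteBuffer`, whose memory lives outside the
garbage-collected heap. Native code can read such memory in place through the
JNI function `GetDirectBufferAddress`. This is only an illustration in plain
Java. The class name `OffHeapVector` and its methods are hypothetical, and
whether Uncomplicate libraries are built exactly this way is an assumption,
not something stated in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical illustration; not part of Neanderthal or any Uncomplicate library.
final class OffHeapVector {

    // Allocates room for n doubles outside the garbage-collected heap.
    // The GC never relocates this memory, so a native routine could
    // work on it directly instead of on a copy.
    static DoubleBuffer allocate(int n) {
        return ByteBuffer.allocateDirect(n * Double.BYTES)
                .order(ByteOrder.nativeOrder()) // byte layout native code expects
                .asDoubleBuffer();
    }

    // Plain-Java dot product, standing in for a native BLAS call.
    static double dot(DoubleBuffer x, DoubleBuffer y) {
        double sum = 0.0;
        for (int i = 0; i < x.capacity(); i++) {
            sum += x.get(i) * y.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        DoubleBuffer x = allocate(3);
        DoubleBuffer y = allocate(3);
        for (int i = 0; i < 3; i++) {
            x.put(i, i + 1.0); // x = [1, 2, 3]
            y.put(i, i + 4.0); // y = [4, 5, 6]
        }
        System.out.println(x.isDirect()); // prints "true"
        System.out.println(dot(x, y));    // prints "32.0"
    }
}
```

In this sketch the values are written straight into off-heap memory, so the
only copy is the one from the data source, which matches the point made in the
comment above.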

