
The beauty of functional languages in deep learning – Clojure and Haskell - wickwavy
https://www.welcometothejungle.co/fr/articles/btc-deep-learning-clojure-haskell
======
mark_l_watson
I have spent a lot of time experimenting with the Haskell bindings for
TensorFlow and a little time with HLearn. Sorry for a negative opinion, but as
much as I don’t particularly like Python, I would suggest to almost anyone
that they stick with Python for deep learning for now. In a year or two, I
might recommend TensorFlow implemented in Swift, turtles all the way down, but
let’s wait and see how that project progresses.

I have only spent a few evenings playing with Clojure and mxnet and while I
appreciate the efforts of the Clojure mxnet subproject team, I think you are
still better off for now with Python, TensorFlow, and PyTorch.

A little off topic: I had a deep learning example for Armed Bear Common Lisp
(implemented in Java) and DeepLearning4j in the last edition of my Common Lisp
book. In the latest edition of my book, I removed that example and the chapter
that went with it and replaced it with two examples of Lisp code using REST
services written in Python, SpaCy, and TensorFlow - I think that is more
practical right now; the situation may change in the future.

EDIT: I also added REST examples using Python, SpaCy, and TensorFlow to the
second edition of my Haskell book.

~~~
melling
How quickly is Swift progressing for Deep Learning?

In this podcast, Jeremy Howard ([https://www.fast.ai/](https://www.fast.ai/))
also sounded excited about Swift:

[https://lexfridman.com/jeremy-howard/](https://lexfridman.com/jeremy-howard/)

~~~
mark_l_watson
Thanks for the link to Lex’s interview. I watched it last week; a fascinating
conversation. Jeremy is teaching a class with the Swift version of TensorFlow,
which I experimented with, but the setup on my Mac only sometimes worked, and
an update broke it for me.

------
dig1
Some observations:

* Parallelism in Clojure is cheap

Parallelism in Clojure is not cheap if you are using ordinary threads, due to
all the machinery necessary to set up and start an OS thread (the same applies
to Java/C(++)). If you are using core.async, you get cheap green threads.

However, parallelism in Clojure is extremely easy, especially switching
between non-parallel and parallel implementations (map -> pmap or using
reducers).
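
For example (a minimal sketch; `slow-step` is just a stand-in for whatever
per-item work you actually do):

    ;; a stand-in for some expensive per-item computation
    (defn slow-step [x]
      (Thread/sleep 100)
      (* x x))

    (time (doall (map  slow-step (range 8))))   ; sequential, roughly 8 x 100 ms
    (time (doall (pmap slow-step (range 8))))   ; same call shape, items run in parallel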

* (Lack of) Libraries and Limitations

This is a common misconception among Clojure novices or people wanting to
start with Clojure. Clojure embraces Java and, using Clojure primitives, you
can easily make your Java code functional and safe. The lack of ML libraries
in pure Clojure just means that wrapping the TensorFlow Java code will require
a couple more Clojure functions.
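
A minimal sketch of what such a wrapper looks like (plain Java interop with
java.util.Random for illustration, not an actual TensorFlow binding):

    ;; hide a mutable Java object behind a small functional facade
    (defn gaussian-noise
      "Returns n samples from a standard normal distribution."
      [n seed]
      (let [rng (java.util.Random. seed)]
        (doall (repeatedly n #(.nextGaussian rng)))))

    (take 3 (gaussian-noise 10 42))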

For me, the TensorFlow API isn't the friendliest API out there, and I have the
impression it was designed for Python by C++ programmers. You may want to check
out the alternative DL4J [1].

[1] [https://deeplearning4j.org/](https://deeplearning4j.org/)

~~~
otabdeveloper4
> due all machinery necessary to setup and start OS thread

The machinery to start an OS thread is on the order of nanoseconds. Starting
an OS thread is far cheaper than printing a line to stdout, for example.

OS threads are only a problem in interpreted languages, because all of them
have a global interpreter lock of some sort.

For compiled native languages OS threads are more efficient than green
threads.

~~~
CraigJPerry
OS threads mean context switching, which means relinquishing the CPU from user
space to kernel space for a time.

Green threads are pure user space. No expensive context switch.

~~~
otabdeveloper4
> OS threads means context switching which means relinquishing the cpu from
> user space to kernel space for a time.

No. The OS will switch contexts regardless of how many threads you have. Even
with zero threads, the OS will be context switching between processes anyway.
That's just how pre-emptive multitasking works.

What you're trying to say is that threads might get starved due to imperfect
scheduling; but even that is wrong - you're never going to write a better
scheduler than the one in the Linux kernel. I mean that seriously.

If you want a deterministic scheduling routine without unpredictable delays,
then just use one of the realtime schedulers provided to you by the kernel.
That's what they're there for.

~~~
CraigJPerry
> The OS will switch contexts regardless of how many threads you have

With IRQs pinned (via a CPU mask given to the kernel at boot time) and
application process pinning, there will be no preemption of the application
process.

> Even with zero threads the OS will be context switching between processes
> anyways

No, this is what pinning is explicitly for.

> What you're trying to say is that threads might get starved due to imperfect
> scheduling

No, you've misunderstood

------
Athas
While I don't know that much about deep learning, last year I had a student
who did, and he implemented a deep learning library in Futhark, a parallel
functional language. Performance was decent on the small networks we ended up
with, but I'm skeptical about the ability of functional languages to compete
directly with specialised languages like TensorFlow (although I find
TensorFlow's Python API to be bad). In particular, the assertion in TFA that
implementing multicore parallelism in Haskell is "easy" is a gross
oversimplification. It is by no means easy to implement deep learning directly
in Haskell (or most other functional languages) with any kind of acceptable
performance, and parallel computation in Haskell is in general a tricky
subject (easy to get right, very hard to make it run fast).

In practice, people who do machine learning with Haskell seem to treat it more
as an instrumentation language, for putting together building blocks written in
other languages, kind of like Python is for TensorFlow.

code:
[https://github.com/HnimNart/deeplearning](https://github.com/HnimNart/deeplearning)

paper: [https://futhark-lang.org/publications/fhpnc19.pdf](https://futhark-lang.org/publications/fhpnc19.pdf)

------
Heliosmaster
I want to pitch in and suggest the book by Dragan Djuric [0]: "Deep Learning
for Programmers: An Interactive Tutorial with CUDA, OpenCL, MKL-DNN, Java and
Clojure".

[0]: [https://aiprobook.com/deep-learning-for-programmers/](https://aiprobook.com/deep-learning-for-programmers/)

~~~
p1esk
I love how he mentions “no C++“ four times in his pitch :)

------
visarga
> at each layer of the neural work

This article is poorly written, with mistakes and a lack of domain knowledge.

> in deep learning, when functions are applied to the data, the data does not
> change

Except for in-place operations.

> Clojure doesn’t replace the Java thread system, rather it works with it.
> Since the core data structures are immutable, they can be shared readily
> between threads

Missing the point - basic ops in DL need to be optimised to an extreme level,
and that implies using C.

~~~
fock
so I'm not the only one wondering how some threading examples (in Clojure?)
generalize to the (suggested) applicability of these languages for ML...

------
lapink
Thread-level parallelism is different from GPU parallelism. Different threads
can perform completely independent operations at any time. GPU threads must do
exactly the same operations, but on different memory locations, at all times.
In exchange for this rigidity, we can pack a lot more of them on silicon than
CPU threads. A CPU thread is like a complete individual that can do anything it
wants. A GPU thread is always part of a pack, and they all move together.

The nice parallelism allowed by Clojure is for CPU threads, not GPU threads.
It would still need to rely on an external library for tensor operations, for
instance ATen [1], the C++ backend of PyTorch.

On the other hand, functional programming can be useful to describe the model
at a higher level and to better handle the scheduling of each component
(convolution, LSTM, etc.) on the GPU. When training a model, the batch size
already allows near-optimal usage of the GPU cores; when doing evaluation,
however, this becomes more relevant.

[1] [https://pytorch.org/cppdocs/](https://pytorch.org/cppdocs/)

~~~
xpertmadman
While Clojure code itself is not running on the GPU, we still do have Deep
Learning in Clojure enabled by Neanderthal/OpenCL.

/u/dragandj is quite active here, you can ask him!

------
jwr
I would suggest using partition-all, not partition (in Clojure), unless you
are sure that the number of items is an integer multiple of the partition
size. Otherwise you are in for a surprise as some of your items will end up
unprocessed.
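
Concretely, at the REPL:

    (partition 3 [1 2 3 4 5 6 7])
    ;; => ((1 2 3) (4 5 6))        the trailing 7 is silently dropped

    (partition-all 3 [1 2 3 4 5 6 7])
    ;; => ((1 2 3) (4 5 6) (7))    the last, shorter chunk is kept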

------
make3
This is pretty much the reverse of what happened in mainstream deep learning.
Most people learned with TensorFlow, which is arguably functional, and then
decided PyTorch (which is like numpy, imperative) was a lot more
straightforward to work with, to the point where Google decided to make
TensorFlow 2.0 imperative by default in order to stop losing ground.

------
rsp1984
Someone please correct me if I'm wrong, but does this article suggest
performing training of DL models in production on the CPU?

Also why does the choice of language to specify my DL model matter that much
if the low level number crunching is abstracted away anyway into a highly
optimised component running on whatever processor / accelerator is most
suitable?

~~~
keymone
> why does the choice of language to specify my DL model matter that much

because model specification is also code that needs to be amenable to change,
tuning, refactoring. it needs to be expressive enough to test new ideas
quickly without hassle.

basically, just list all the reasons that drove language design in the past
half-century.

~~~
lonelappde
Language doesn't matter; just use linpack, right ;-) You can use any language
you want as long as you do the real work in FORTRAN.

Is TF the new linpack?

------
turingbike
I was hoping that this would be related to Colah's "Neural Networks, Types,
and Functional Programming"
[https://colah.github.io/posts/2015-09-NN-Types-FP/](https://colah.github.io/posts/2015-09-NN-Types-FP/)
(but it is not). In it, he establishes correspondences between different NN
architectures and different FP tricks like `Zipped Left & Right Accumulating
Map = Bidirectional RNN`.

Does anyone know any other resources that take this approach?

~~~
joker3
Backprop as functor
([https://arxiv.org/abs/1711.10455](https://arxiv.org/abs/1711.10455)) is a
paper that explores similar ideas.

------
leblancfg
I get the overall idea of _why_ they make sense. But the fact that the author
does not address GPU acceleration either means that she has not thought about
it, or that she thinks its implementation is trivial.

Either way, I would need a deeper dive along those lines to be convinced that
the argument has real-world merit and can actually be implemented in practice.

FWIW my estimate is that 90% of production training loads in the wild are done
on GPU. Please correct me if my assumption is wrong.

------
workthrowaway
always wonder why functional languages or constructs are often described as
being beautiful or using other aesthetic terms. just strange that they are the
only ones to which we ascribe such attributes, among the few paradigms and the
many coding styles out there.

~~~
mlevental
this is a pet peeve of mine - the use and abuse of humanist terms in stem. you
see this a lot in math - beautiful proofs, beautiful formulas, etc. I think
it's a really weird, even perverse, thing to look at something purely formal
(mechanical) and call it beautiful - like brutalist architecture - I think it
says something about the person making the claim (that they appreciate
structure to an inhuman extent).

~~~
tmountain
Humans naturally associate aesthetic qualities with all sorts of things. An
appreciation for the things around you makes life richer. Maybe you don't like
the word beautiful in this context (or feel like it's overused), but I wonder
if you'd argue with using the word elegant to describe something like
quicksort or a cleverly composed Fibonacci function? Beauty takes many forms,
and getting hung up on nomenclature seems silly TBH. Characterizing this
appreciation as perverse and inhuman is essentially taking offense to someone
else's enjoyment in something they appreciate. Seems pointless.

~~~
mlevental
elegant is fine. beauty and elegance aren't the same thing.

>Characterizing this appreciation as perverse and inhuman is essentially
taking offense to someone else's enjoyment in something they appreciate. Seems
pointless.

it's a humanist critique of the use of language for rhetorical purposes...?
the purpose is exactly to investigate what the significance of that particular
word choice is.

~~~
fnordsensei
Does elegance describe well-defined observable properties that are independent
of the observer?

The point being, that if not, we're still within the domain of the subjective.
It seems arbitrary to allow one set of subjective descriptions and disallow
another.

~~~
mlevental
elegant: "the quality of being pleasingly ingenious and simple; neatness."

definitely less subjective than beautiful. and it's not arbitrary - i've
alluded to an argument that can be made. i'm not going to spell it out because,
as usual, no one on here is receptive to anything that doesn't valorize stem.

~~~
jjuel
Beauty: "a combination of qualities, such as shape, color, or form, that
pleases the aesthetic senses, especially the sight."

So I cannot like the shape or form when I look at a proof or function? I
cannot have my aesthetic senses pleased by looking at STEM field things?

~~~
mlevental
i'm sorry, but what is the shape or form of a proof? a function, maybe - and
calligraphy is certainly an art, but that's wholly different from what the
function encodes.

~~~
tempWinHater
Proofs have structure, i.e., shape or form.

~~~
mlevental
they have a logical structure (claim, lemma, lemma, proof, etc.). they do not
have a shape in the sense that the person i responded to intends.

------
gussmith
Related: TVM's ([https://tvm.ai](https://tvm.ai)) intermediate representation
Relay is a functional, ML-like (ML as in OCaml) language. It's being actively
developed here at UW, with constant improvements to support the latest models
(e.g. dynamic models, including language models and graph convolutional
networks).

Disclaimer: I'm doing my PhD at the University of Washington and work with the
Relay people.

------
chewxy
The first versions of Gorgonia[0] (obvs not called Gorgonia) were written in
Haskell.

I found it difficult because Accelerate wasn't there yet when I wrote it. And
my cofounders found it difficult to understand (because Haskell was a strange
language to them).

On the one hand, it is easy to implement a computation graph in Haskell. On
the other hand, it is not so easy to implement an efficient kernel in which the
computation of the values would run - you end up writing weird-looking C that
looks like Haskell, which I submit is not necessarily a good thing.

Working with Gorgonia has, IMO, been the clearest thing for deep learning for
me. Granted, I'm a bit biased because I wrote the damn thing.

Nonetheless, writing it first in Haskell had its benefits: the structure was
clear, and hence it imparted some design decisions on the Go library.

From time to time I still have to add new features to Gorgonia, so I still
plot them out roughly in Haskell first. Nowadays I recommend Grenade for anyone
wanting to play with deep learning in Haskell.

[0] Gorgonia is a TensorFlow/PyTorch equivalent written in Go.
[https://github.com/gorgonia/gorgonia](https://github.com/gorgonia/gorgonia)
if you are interested.

------
samcodes
I wanted this article to convince me, but they really don't acknowledge the
reason that everyone currently uses Python - the libraries. Do linear algebra
using lightly wrapped C? numpy. What about NLP? spaCy. Implicitly specify a
computation graph with high level code? PyTorch. Explicitly specify a
computation graph using leaky C++ abstractions? TensorFlow. And using either
PyTorch or TensorFlow, you get to interact with CUDA.

For now, if you want a functional language for doing deep learning, IMO that
language needs Python interoperability. Long-term, I'm hopeful that GraalVM
[0] can provide a way of calling Python from the JVM, but until then, I think
the best option is coconut-lang [1], "a functional programming language that
compiles to Python." You get pattern-matching, TCO, the pipe operator, and
ADTs, all while being one AOT compilation step away from Python.

[0] [https://www.graalvm.org/](https://www.graalvm.org/)

[1] [http://coconut-lang.org/](http://coconut-lang.org/)

~~~
zgramana
If you haven’t checked it out already, you should take a look at Jython*. As
the name suggests, it’s a Java implementation of Python.

* [https://www.jython.org/](https://www.jython.org/)

~~~
turingbike
I think the reason the parent mentions GraalVM instead of Jython is that
Jython can't use C libs like numpy. But GraalVM lets you compile JVM code to a
native binary, so maybe there is some way to use Cython on the JVM.

~~~
zgramana
JyNI* was created to bridge Jython to Numpy. I don’t have any personal
experience with it, but one need not wait for a GraalVM solution.

* [https://www.jyni.org/](https://www.jyni.org/)

------
_visgean
There is a lot here about performance, but I think a much more important
aspect here is the training - in Python it's extremely easy to manipulate files
and other data sources - sometimes it's quite ugly, but it's simple, whereas
using, for example, Haskell for data processing won't be as simple...

------
truth_seeker
One of the most striking differences between Clojure and Haskell:

In Haskell, immutability is an abstraction, not an implementation. Clojure's
data structures and Integer and Float object wrappers are expensive. The
Haskell build toolchain produces highly optimized code that preserves the
functionality at a high level and reasonably performant mutating code at a low
level.
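
To make the Clojure half of that concrete, a rough sketch (illustrative only,
not a benchmark):

    ;; generic arithmetic: every element is a boxed java.lang.Long
    (defn sum-boxed [xs]
      (reduce + xs))

    ;; primitive hints plus a primitive array let the compiler emit unboxed long math
    (defn sum-unboxed ^long [^longs arr]
      (areduce arr i acc 0 (+ acc (aget arr i))))

    (sum-boxed (range 1000000))
    (sum-unboxed (long-array (range 1000000)))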

~~~
misja
Yes, Haskell performance can be impressive. Check out this one, for instance:
[https://stackoverflow.com/questions/6964392/speed-comparison...](https://stackoverflow.com/questions/6964392/speed-comparison-with-project-euler-c-vs-python-vs-erlang-vs-haskell),
and don't forget to read the comments section.

~~~
truth_seeker
Thanks for the link. I have seen it before.

I am very much in love with the Haskell build toolchain and the different
sorts of build flag options it provides.

------
wilsonthewhale
As far as I know, Clojure usage is on the decline at Facebook. It was only
used by an acquired subsidiary (Wit), and lack of internal support meant
migrating to another supported language.

~~~
FPGAhacker
I certainly don’t know what the future holds for clojure, but I feel that when
the community adopted Slack, they lost a fundamental learning resource and
talent attractor.

Why do I say this? Because very interesting and deep and enlightening
conversations and explanations and questions used to be in a forum environment
where threads of conversation, topically organized, were easily followed and
might be resurfaced and continued years after they had gone dormant.

All that is lost now. Slack was embraced, no doubt for the immediacy of
response, and the topical history has vanished. Everyone jumped onto Slack to
try it out, and people stopped checking the forum altogether.

The forum is a wasteland now populated with occasional version bump
announcements. Nobody bothers to post there because nobody reads it anymore.
And the wisdom of the community is vaporizing as soon as it forms.

It’s a massive loss, and deeply tragic. Beyond the insight and wisdom, the
forum had a cohesive effect. It was a grounding, a center and base. Slack is a
transient stream of consciousness.

Even with logs, the slack history is worthless by comparison.

Until a medium of exchange develops that the community wants to adopt, which
can serve as a useful knowledge aggregator, I think clojure is on the path to
obscurity.

~~~
xavi
I didn't like the adoption of Slack either.

I think [https://clojureverse.org/](https://clojureverse.org/) is a much
better place for discussion.

~~~
FPGAhacker
Thanks for that! I wasn’t aware it existed.

------
magwa101
Data has to be modified and saved.

------
codesushi42
The author seems to be confused and uninformed about how parallelism is
implemented in ML libraries. Libraries mostly don't rely on spawning threads
for parallel execution! Parallelization happens at the hardware level, using
SIMD, by running the calculations for backprop on a GPU/TPU/DSP, not by
creating new threads on the CPU. Because of this, you get the same type of
isolated execution you would have with a functional language.

Which leads to the second point. Libraries like TF are already designed around
a data flow execution model and provide functional APIs. It is nonsense to
assert these libraries are missing something fundamental that Clojure and
Haskell would magically fix.

~~~
K0SM0S
This is beyond my level of understanding but I believe there are physical
implementation paradigms in all languages that make them either capable or not
of performing certain things. While you can always bend a Turing-complete
language to do the logic, it doesn't mean your abstractions carry all the way
down to hardware — there is "overhead" or "loss of efficiency" if you will.
You get the convenience, and perhaps organize the safety; but it only goes so
far.

For instance, Go was (notably) designed based on CSP (Communicating Sequential
Processes), a theoretical model of concurrency described by Tony Hoare in 1978
that provides a workable and fairly efficient concurrent paradigm. At the
compiler and then runtime level (low-level things you'll never be able to
'change' with libraries), when the program is built and then executed, the
implementation is directly able to map onto the physical topology of
multi-core machines (Go's design began in the late 2000s, shortly after Intel
released the Core Duo, an early mainstream multi-core CPU).

Now you can do parallel and/or concurrent programming in other languages, of
course, but since they were not designed as such, it's a convoluted programming
exercise, to say the least. For proof, observe how little software is able to
multi-thread efficiently today, and how very little is built with a concurrent
approach — it's just a mess in most languages, and the added complexity is not
worth the cost.

This is an example of how you can get the functionality of a fundamental
feature, but the cost, the implementation curve, is so steep that it doesn't
really fly — reality, money, skill pool, all these things.

On topic, functional languages like Haskell _do_ indeed "fix" or rather
implement some features in more efficient ways than e.g. Python; it's always a
trade off you know. No best tool, only tools best suited to a given case.

~~~
codesushi42
The point was that the concurrency model of these ML libraries is not
dependent on a CPU for backprop (ideally), so discussing language level
concurrency as an advantage is irrelevant. The parallelism is not implemented
at the language level, nor does it rely on language constructs for
synchronizing shared memory. The parallelism happens at the hardware level
(e.g. with SIMD). And in the model architecture (e.g. with convolutions).

So the advantages you posited in your lengthy diatribe are meaningless,
because you failed to take into account the problem domain, along with utterly
failing to grasp what gets calculated during backprop. Plus, backprop doesn't
get calculated in Python.

The argument about functional vs imperative languages is old, tired and is not
relevant to the subject. Please at least learn some fundamentals about NNs and
ML frameworks before contributing a long, uninformed response.

~~~
K0SM0S
> the concurrency model of these ML libraries is not dependent on a CPU

Gotcha, my bad.

> Please at least learn some fundamentals about NNs and ML frameworks before
> contributing a long, uninformed response.

My apologies. It's not like I enjoy wasting people's time, starting with mine.
It was an honest mistake, point taken, and thanks for explaining.

If you have a somewhat 'definitive' book or chapter/resource to share on
"understanding concurrency"... please do so. I'm eager to learn, and I'd wager
many reading such a thread.

~~~
codesushi42
See Udacity's course on parallel programming:

[https://www.youtube.com/watch?v=F620ommtjqk&list=PLGvfHSgImk...](https://www.youtube.com/watch?v=F620ommtjqk&list=PLGvfHSgImk4aweyWlhBXNF6XISY3um82_)

I don't think it touches upon ML, but it is relevant for understanding the
difference between GPUs and CPUs. You can go through the intro to backprop
lectures on deeplearning.ai for ML.

