
Deep learning experiments in OCaml - yminsky
https://blog.janestreet.com/deep-learning-experiments-in-ocaml/
======
xvilka
Sorry for repeating myself, but since this is about machine learning and
OCaml, it's worth mentioning Owl [1], a library for numerical and scientific
computing, including ML.

[1] [https://github.com/owlbarn/owl](https://github.com/owlbarn/owl)

------
phonebucket
This is great. Functional languages have such an elegant representation of so
many mathematical concepts. It's a bit of a shame that they don't have more
widespread use in scientific computing.

~~~
flavio81
> It's a bit of a shame that they don't have more widespread use in scientific
> computing

In truth, they have. Lisp was the first functional language (or the first
language that allowed that paradigm), and has been used a lot in scientific
computing, for example doing symbolic calculus and manipulation.

~~~
nikofeyn
that doesn't matter when very few scientists have even heard of an ml
(standard ml, ocaml, f#) or a lisp/scheme (common lisp, racket), much less
have an inkling to use them. their use does of course exist but by no measure
is it widespread.

~~~
ced
That's part of what makes Julia promising, it's a numerically-focused Lisp,
without the parentheses.

------
yunfeng_lin
So much bashing on static typing in deep learning :) Can anyone from Google
explain the benefit, since you guys are working on Swift for TensorFlow?

[https://medium.com/tensorflow/introducing-swift-for-tensorfl...](https://medium.com/tensorflow/introducing-swift-for-tensorflow-b75722c58df0)

~~~
shoyer
Static typing for catching errors is only a small part of the vision for Swift
for TensorFlow. The real advantage of static typing is that it enables the
compiler to reason about your code, e.g., to automatically rewrite it for a
hardware accelerator with guaranteed correct semantics:
[https://github.com/tensorflow/swift/blob/master/docs/DesignO...](https://github.com/tensorflow/swift/blob/master/docs/DesignOverview.md)

This is obviously possible in Python as well (e.g., see Numba) but clearly has
additional challenges:
[https://github.com/tensorflow/swift/blob/master/docs/WhySwif...](https://github.com/tensorflow/swift/blob/master/docs/WhySwiftForTensorFlow.md)

(I work at Google, but not on the TensorFlow team.)

~~~
KenoFischer
Static typing has very little to do with what the compiler can say about your
code. You can have dynamic languages with very strong type systems and
semantics as well as static languages with weak semantics. The only difference
between static and dynamic languages is whether the compiler enforces
completeness of the analysis or not.

~~~
yunfeng_lin
I am not sure I agree with you. You do need compile-time types to generate
efficient hardware-accelerated code.

Python has strong types, but they are only available at run time, which is not
useful for code generation.

But now Python also has optional type annotations, which might be utilized to
generate more efficient code.
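To illustrate the point about Python's optional annotations, a minimal sketch
(`scale` is a made-up function, not from any library): the annotations exist
only as runtime data that a tool like Numba or a type checker could inspect,
and nothing enforces them at call sites.

```python
from typing import get_type_hints

# Optional annotations: ordinary runtime data that a JIT or checker
# could inspect, but nothing Python itself enforces.
def scale(x: float, factor: float) -> float:
    return x * factor

# The hints are available for introspection at run time.
print(get_type_hints(scale)["x"])   # <class 'float'>

# A mismatched call still "works": Python only complains when an
# operation actually fails, and `*` happens to mean repetition here.
print(scale("ab", 2))               # abab
```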

~~~
KenoFischer
Look at e.g. Julia for a dynamic system with a strong type system that allows
the compiler to reason about the code without forcing completeness.

~~~
yunfeng_lin
Julia has optional typing, which is static as well

~~~
KenoFischer
As always in these sorts of discussions, it depends how you define your
terminology. However, by most definitions of static typing, Julia's type
system is not static. The Julia type system is very much a property of the
runtime language and behaves as such. It is quite strong, true, but still not
statically enforced as you would expect from a static language. In particular
(to the extent that you can identify one), you never get any sort of compile-
time type errors in Julia.

------
pc86
So I was lost at the VGG19 example code, but probably because I have (a) no
OCaml experience and (b) no ML/NN experience.

Still seems interesting, though. If anyone has any suggestions on basic
sources for getting a background on the concepts here I'd definitely give them
a read.

~~~
hackermailman
Look through youtube for university lectures, like these ones
[https://www.youtube.com/playlist?list=PL_Ig1a5kxu57NQ50jSuf0...](https://www.youtube.com/playlist?list=PL_Ig1a5kxu57NQ50jSuf0cjTe3zPLqv1O)

Most intro classes just require familiarity with basic calculus
(differentiation, chain rule), linear algebra and basic probability all of
which you can just lookup directly on
[https://www.expii.com](https://www.expii.com) for a short tutorial. Toolkits
are usually in Python or Lua, plus the numerous textbooks 'Deep learning with
python' that are around and specific DL books such as
[http://www.deeplearningbook.org/](http://www.deeplearningbook.org/).

Afterwards look around for Adversarial Learning, like detecting perturbations
that force mis-classification and other attacks described in papers by Carlini
and Wagner. Currently there isn't a perfect defense developed for all of these
attacks, except robust optimization, which provably defends against some of them. Attacks
are an interesting area in DL you can get into since we don't have access to
large resources and can only do DL on a small scale (in my case anyway).

------
mlthoughts2018
I had a very unpleasant interview regarding deep learning with Jane Street. I
spoke to a member of their HR team to try to get significant assurances that
the interview would actually be focused on deep learning and not puzzles or
brain teasers, and that the job would really focus on deep learning for their
actual business, not just a proxy for being generally smart followed by
working on whatever existing in-house models. The HR employee reassured me
significantly on both points.

Then the interview was nothing but deck of card puzzles and random riddles
where you have to articulate a careful model of some physical quantity like
speed or frequency to solve the puzzle. I hate that junk, never found that it
correlates with a way of thinking that matters in quant finance (which I
previously did for a living) and suitably failed the interview. Worse, I would
have been happy to decline that interview and tell them I know I’m not their
guy if only the HR staff had correctly depicted the interview & job to me.

Ok, enough grumbling. From this actual blog post,

> “Type-safety helps you ensure that your training script is not going to fail
> after a couple hours because of some simple type error.”

I really think this way of thinking about static typing is a very bad thing.
This is not at all an actual benefit, because in any sane situation, you will
use unit and integration tests that execute extremely quickly on small test
data to exercise your end-to-end model training code.
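As a minimal sketch of what such a fast end-to-end test can look like
(`train` here is a toy stand-in for a real training entry point, not code
from the post): run the real loop on a tiny fixture dataset for a few steps
and assert only that it completes with sane losses.

```python
import math

def train(data, epochs=2, lr=0.1):
    """Toy linear-regression loop standing in for a real training job."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            err = w * x - y
            w -= lr * err * x        # gradient step for squared error
            total += err * err
        losses.append(total / len(data))
    return w, losses

def test_training_smoke():
    # Tiny fixture dataset: runs in milliseconds but exercises the
    # whole loop end to end, catching type errors and NaNs early.
    fixture = [(x, 3.0 * x) for x in [0.1, 0.2, 0.5, 1.0]]
    w, losses = train(fixture, epochs=3)
    assert all(math.isfinite(l) for l in losses)
    assert losses[-1] <= losses[0]   # loss did not blow up

test_training_smoke()
```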

What I currently do for this on my team is to always require that model
training programs are deployed inside of containers that capture not just the
state of the code, but also make it configurable to mount the training data
volume and pass in ENV that governs what the training job really is.

So then Jenkins or whatever will build the container for any PRs that seek to
implement or modify training, attach fixture data and fixture ENV settings,
and give you quick feedback about the whole end to end training, even
inclusive of GPU settings (we have to do a slight manual step to specify
Jenkins running on a GPU server, but this is a vestige of some of our infra
headaches).

The point is that adding all sorts of extra code to embody type annotations,
and limiting people from awesome dynamic typing features is a silly thing to
do if you’re worried about type errors ruining a long-running job. That should
be handled by fast integration tests.

Now, there are perfectly valid other reasons to like static typing. I just
always hear this one, especially in regards to Python, and it’s really the
wrong way to look at it.

The extra code and constraints of static typing are liabilities that should
have to offer offsetting value to choose them. You already need integration
and unit tests to reliably make changes and maintain the training code. If you
can get the same benefit of overall job safety (or even 99% of the same
benefit), from the tests, without paying the extra costs of static typing,
then don’t!

Turning it around to act like static typing is _de facto_ always a benefit is
a very one-sided way to look at it.

~~~
deepGem
The type safety argument is total BS. First of all, the training script will
fail the very first time it runs if there is a type error. You'd be a moron to
pass an argument of a different type 'a couple of hours' into the training. No
sane programmer writes such code. What kind of nonsensical argument is this?

What I have found static typing to be really useful for is in remembering what
I have coded. It's quite hard to remember a dynamic type while you are writing
code, given the number of variables you are dealing with. Seeing that type
definition next to your variable name is a handy reference. I find it helps
speed up coding a bit and lets me remember a lot more clearly what I have
done.

~~~
nnq
> You'd be a moron to pass an argument of a different type 'a couple of hours'
> into the training

Huh?! At least in non-ML code this happens all the time: data fetched by
whatever thinggie that uses a zillion chained libraries of code nobody has
time to audit comes in hours or days late in a long-running service, blowing
it up... e.g. "oops, point.x is now no longer an integer but more like a
map[ErrorObject->vector[int]]" because something blew up in a very unexpected
way in some other Node.js code light years away from the business logic you
hold in your head... (yeah, the service gets restarted, but at some point some
data that should have been saved in the DB hasn't been and may need to be
recovered manually from some obscure log, if it's even recoverable)
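A minimal Python sketch of this failure mode (hypothetical data, purely for
illustration): the corrupted record is only touched deep into the loop, so
nothing fails until then.

```python
# One record far into the stream gets silently corrupted upstream,
# say by a library that replaced a number with an error object.
records = [{"x": i} for i in range(100_000)]
records[75_000]["x"] = {"error": [1, 2]}

total = 0
for i, rec in enumerate(records):
    try:
        total += rec["x"]            # only fails when i == 75_000
    except TypeError as exc:
        print(f"blew up at record {i}: {exc}")
        break
```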

~~~
deepGem
Yeah I should have been more explicit that this comment is in reference to ML
code. Training is nothing but a loop, so it's unlikely you pass type A in
iteration 1 and type B in iteration 200. If that happens most likely your
training data is messed up and type safety cannot help you as you would have
compiled and tested for type A.

~~~
mlthoughts2018
You could also have a complex training job that trains a shallow model for a
while, then uses it to train a deeper model, or extract an embedding from some
layer and train a classical prediction model that uses the embedding as the
feature vector.

But the point is that the right way to ensure safety is with realistic
fixture-based integration testing. That’s not what static typing is for in
that type of use case and is not a de facto benefit of static typing.

------
preparedzebra
I'm not convinced that functional programming will grow in terms of devs using
it daily, but it has been very useful for myself in certain contexts
(especially when I wrote math based libraries using permutations, heavy
recursion, etc). The results of this seminar are awesome!

------
mark_l_watson
Very nice. I have spent many evenings playing with the Haskell bindings for
TensorFlow that don’t have the coverage these OCaml bindings have (e.g.,
character seq models).

I have thought of learning some OCaml, maybe this will give me the kick in the
butt to do it.

------
senorsmile
Am I the only one who gets confused by references to ML (ML derived typed FP
vs Machine Learning)? The threads on this page represent a strange junction
where I really have to think about what people mean, because they really
could mean either!

~~~
yaseer
This happens to me too, having dabbled with ML for theorem proving.

Thing is, ML is an obscure language for most people. The association with
machine learning probably dominates for 95% of people.

~~~
icc97
It's becoming less obscure, F# (.net), Elm and Reason (JavaScript) are
bringing ML to a wider audience. Plus Jane Street do a great job of promoting
the use of OCaml.

------
gaius
_Type-safety helps you ensure that your training script is not going to fail
after a couple hours because of some simple type error._

This isn’t a failure mode that ever happens in DL... 2 hours into the job you
will only be dealing with floats anyway no matter what language you are using.
If you’re going to fail on anything typed it will be in the first 20 seconds
probably, basically the instant you start your first epoch.

~~~
habitue
This is true, but it's also because TensorFlow _is_ a typed language, one that
uses the syntax of Python. In TensorFlow (at least in the default mode), the
graph is type-checked on startup, and it'll fail if anything is wrong.

Contrast this with PyTorch, Chainer, or TensorFlow's dynamic computation
graphs: they're much more likely to have a bug that surfaces later, since
their graphs aren't verified up front.
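A toy sketch of the difference (not the real TensorFlow or PyTorch APIs, just
an illustration): a static graph can reject a dtype mismatch at build time,
while eager-style code only misbehaves once the bad value actually flows
through.

```python
class Node:
    """Toy graph node carrying only a dtype, like a symbolic tensor."""
    def __init__(self, dtype):
        self.dtype = dtype

def matmul(a, b):
    # Graph construction: the mismatch is caught before any step runs.
    if a.dtype != b.dtype:
        raise TypeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")
    return Node(a.dtype)

# "Static graph" style: fails immediately at build time.
try:
    matmul(Node("float32"), Node("int32"))
except TypeError as exc:
    build_error = str(exc)

# "Eager" style: the same kind of mistake only surfaces when the bad
# value shows up, e.g. at step 200, and here it is silently wrong.
def step(i):
    x = 1.0 if i < 200 else "oops"   # data goes bad mid-run
    return x * 2                      # 2.0 for floats, "oopsoops" for str

print(build_error)
print(step(0), step(200))
```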

Unfortunately, typed languages won't help you much there. A big reason people
use pytorch is because of its flexibility (i.e. they were bumping up against
the constraints of a static graph system and wanted out)

~~~
gaius
I admit I don’t have much experience with those, I’m a Keras and CNTK guy but
the principles will be the same: marshal your data into a huge matrix of
floats/one-hot and hand it off to training, where it will spend 99.9% of its
time.

I am a fan of strong/static typing and was once very active in the OCaml
community but that just struck me as a very odd thing for the OP to say...
it’s just not something that people doing DL worry about. It could be valuable
in the marshalling phase but that all happens before DL begins and (in my
experience) in a separate program.

------
rememberlenny
For reference, Jane Street is a financial firm known for their widespread use
of OCaml.

~~~
remify
I'd add that the author Laurent Mazare is a fucking brilliant person.

~~~
mi_lk
You know him personally? Or does he have some prior work we can take a look at?

~~~
remify
Ahah no, I was just looking at his CV and the guy is jacked.

~~~
3rdAccount
Hahaha! This is a comment I would normally make and be scoffed at.

I'm always amazed at how smart some people are.

