
Draft of OCaml Scientific Computing book - mseri
https://discuss.ocaml.org/t/ann-draft-of-ocaml-scientific-computing-book
======
G4BB3R
Each year I wonder whether I should learn OCaml or not. What is the current
state of multi-thread OCaml? Is that a game changer or just a cool feature? I
can't understand why OCaml doesn't have mass adoption.

~~~
steev
> I can't understand why OCaml doesn't have mass adoption.

Probably for the exact same reason each year you think about learning OCaml
you decide not to (I mean this sincerely, not trying to be snide).

I think the main reasons it doesn't see mass adoption in industry:

* There are only two major companies that do a substantial amount of OCaml that I can think of off the top of my head, Jane Street and Ahrefs. Facebook does some OCaml too but I don't think it's a core part of their stack.

* The tooling is lacking.

* People have an easier time learning Python or Java so you'll have a larger pool of candidates if you use one of those languages.

~~~
vmchale
> Facebook does some OCaml too

Using an ML or a Lisp for language tooling is the way to go. Nothing else in
that league.

~~~
weeniehutjr420
How about Haskell?

~~~
choeger
Haskell is like the nerdy cousin of the ML family. Everyone knows he's smart,
but no one would think to ask him to fix the kitchen sink or install the new
dishwasher.

Seriously, Haskell is massively impressive, both as a research language and as
an implementation. But it does not shed that certain research attitude. Every
known problem seems to be boring. "Oh you want a proxying http server? No
problem, this is just the inversion of the endofoo over the category of
abstract Monobars!". Sometimes I get the feeling that no one focuses on
shipping actual software with Haskell.

~~~
greggyb
Facebook uses Haskell in a critical portion of their user-content publishing
pipeline.

[https://engineering.fb.com/security/fighting-spam-with-haske...](https://engineering.fb.com/security/fighting-spam-with-haskell/)

------
UncleOxidant
Love OCaml. One of my favorite languages. But I'm using Julia for this kind of
thing, it just seems much better suited. Likewise, I wouldn't use Julia to
write a programming language implementation, OCaml is much better suited for
that.

~~~
dunefox
It's the same for me. Julia is perfect for scientific programming: it has
basically replaced Python for me, and it's even starting to be used at my
work. I don't think OCaml has its place in this domain.

------
forgotpwd16
Online version is available on
[https://ocaml.xyz/book/](https://ocaml.xyz/book/).

Pretty interesting. Reading it, it seems closer to a tutorial on using Owl, a
package for technical computing written in OCaml (in the way that e.g. Matlab
is, though the architecture differs according to the post).

~~~
zelphirkalt
I am not sure why you are getting downvoted. If you had not posted the direct
link already, I would have done so. Perhaps the downvotes are about the second
part of your comment.

------
fluffything
Skimming through this book, one thing I kept wondering is how well this OCaml
framework uses the hardware.

Leaving ocaml aside, the connection between scientific computing and hardware
is the one thing I miss the most in "scientific computing" books and courses,
because it sooner or later limits the science that any researcher doing
scientific computing can do.

To give an example, earlier this week, one of our scientists was waiting 10
minutes between each interactive iteration of their data-set, so I was called
to help, and the only feedback they gave was that "it's slow", to which I
replied "slow with respect to what? how fast are you expecting this to be and
_why_?".

The answer to these questions is the difference between "maybe they just need
a faster computer", "maybe they need a different algorithm", or even "maybe
this problem cannot be solved today because computers this fast do not exist".
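
To make "slow with respect to what?" concrete, here is a rough back-of-envelope sketch in OCaml of a roofline-style lower bound: a run cannot finish faster than the time needed to do its arithmetic at peak rate, or to move its data at full memory bandwidth, whichever is larger. The record types, the function name and all the hardware numbers are made-up placeholders for illustration, not measurements of any real machine.

```ocaml
(* Roofline-style lower bound on run time.
   All names and numbers here are hypothetical placeholders. *)
type machine = {
  peak_flops : float;     (* floating-point operations per second *)
  mem_bandwidth : float;  (* bytes per second *)
}

type workload = {
  flops : float;        (* total floating-point operations *)
  bytes_moved : float;  (* total bytes read from or written to memory *)
}

(* The run cannot be faster than the slower of the two limits. *)
let lower_bound_seconds m w =
  Float.max (w.flops /. m.peak_flops) (w.bytes_moved /. m.mem_bandwidth)

(* Assumed machine: 100 GFLOP/s peak, 20 GB/s memory bandwidth. *)
let laptop = { peak_flops = 1e11; mem_bandwidth = 2e10 }

(* Assumed workload: one pass over 8 GB of doubles, two operations each. *)
let one_pass = { flops = 2e9; bytes_moved = 8e9 }

let () =
  Printf.printf "lower bound: %.2f s\n" (lower_bound_seconds laptop one_pass)
```

If the measured time is ten minutes and the bound is well under a second, the algorithm or implementation is the place to look. If the two are close, only faster hardware or a smaller problem will help.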

From their facial expression, it looked to me that they actually had never
thought about any of this, probably because whatever they did before was
always fast enough, but now this issue was limiting their science and they
were lacking the bare minimum set of tools to even get proper help.

If you are doing scientific computing, chances are that the problems you are
going to be dealing with are going to be getting bigger and harder as you
advance in your career. For many scientists, the first problems will actually
be big enough for the hardware to matter.

I wish scientific computing courses and books would at least provide the most
basic tools to these scientists, so that they can at least get meaningful
help. Having someone on call for when this matters is quite expensive.

~~~
Bukhmanizer
Let’s face it, is this researcher ever going to read a book on scientific
computing in OCaml? Most researchers won’t even read a book on scientific
computing.

~~~
jimbokun
Why do you think that?

That's like saying "most programmers would never read a book about science, or
finance...", or whatever field they are writing software for.

~~~
gnufx
> Why do you think that?

Observation in research support, I'd guess. It typically no longer seems to be
the case that you do whatever you need to for your data.

------
cultus
In my opinion, static languages don't bring a whole lot to the table with
numerical math. There aren't many types, for one. You basically just use
matrices and vectors of floats most of the time.

What would really be a bigger deal is some limited dependent typing to avoid
errors from mismatched array sizes. Until then though, Julia is a bit more
practical choice for me.
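
For what it's worth, a limited version of this can be sketched in plain OCaml with GADTs, by carrying the length of a vector in its type. This is only an illustration on a list-like vector; the names `z`, `s`, `vec`, `map2` and `to_list` are made up for the example, and as far as I know this is not how Owl represents its ndarrays.

```ocaml
(* Type-level natural numbers, used only as length tags. *)
type z = Z
type 'n s = S of 'n

(* A vector whose length is part of its type. *)
type ('a, 'n) vec =
  | Nil : ('a, z) vec
  | Cons : 'a * ('a, 'n) vec -> ('a, 'n s) vec

(* Element-wise combination; both arguments must share the same length n. *)
let rec map2 : type a b c n.
    (a -> b -> c) -> (a, n) vec -> (b, n) vec -> (c, n) vec =
 fun f xs ys ->
  match xs, ys with
  | Nil, Nil -> Nil
  | Cons (x, xs'), Cons (y, ys') -> Cons (f x y, map2 f xs' ys')

(* Forget the length tag and return an ordinary list. *)
let rec to_list : type a n. (a, n) vec -> a list = function
  | Nil -> []
  | Cons (x, rest) -> x :: to_list rest

let v = Cons (1., Cons (2., Cons (3., Nil)))
let w = Cons (4., Cons (5., Cons (6., Nil)))
let sum = to_list (map2 ( +. ) v w)

(* map2 ( +. ) v (Cons (1., Nil)) does not type-check: lengths differ. *)
```

The catch is that lengths have to be known statically, so this does not cover arrays whose size is only known after reading the data.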

~~~
srean
These old papers might pique your interest

Shape in Computing
[https://dl.acm.org/doi/10.1145/234528.234749](https://dl.acm.org/doi/10.1145/234528.234749)

A Semantics for Shape
[https://www.sciencedirect.com/science/article/pii/0167642395...](https://www.sciencedirect.com/science/article/pii/0167642395000151)

[https://www.semanticscholar.org/paper/The-FISh-language-defi...](https://www.semanticscholar.org/paper/The-FISh-language-definition-Jay/cc414b98ba5e65f47c18470a5739195f4a63a209)

[https://link.springer.com/article/10.1007/s100090050037](https://link.springer.com/article/10.1007/s100090050037)

The page for FISh used to be online. I can't find it now.

~~~
cole-k
This sort of thing (if I understand the abstracts correctly) can also be done
with dependent types:
[https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/a...](https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/aplicative.pdf).

This paper was my introduction to dependent typing, so if you have a little
Haskell background, you should be able to grok its gist too.

------
ihnorton
> Indexing, slicing, and broadcasting are three fundamental functions to
> manipulate multidimensional arrays.

...

> Indexing and slicing is arguably the most important function in any
> numerical library.

These statements are undoubtedly true. The first question any practitioner
familiar with other systems will ask is: what do basic arithmetic, array
manipulation, and linear algebra look like?

But from what I can tell on a very quick skim, that question isn't really
answered until the section starting with these sentences, on page 123. I've
noticed this situation every time I look at the Owl docs webpage too, FWIW
(have not looked recently though).

I understand the need to be perceived as fully-capable for modern tasks -- and
that's fine for a 2-4 page set of teaser examples up front -- but I think this
book would become much more approachable if the basic mechanics of doing math
were presented first.

~~~
srean
If anyone wants quick access to how slicing and indexing are done:
[https://ocaml.xyz/book/slicing.html](https://ocaml.xyz/book/slicing.html)

An older thread on HN on slicing
[https://news.ycombinator.com/item?id=20457884](https://news.ycombinator.com/item?id=20457884)

------
vmchale
Good stuff! Love OCaml and functional programming in the spotlight.

------
jpz
I think the work the author has done is amazing. Just looking at the commit
history, the core contributor has definitely been super busy.

[https://github.com/owlbarn/owl/graphs/contributors](https://github.com/owlbarn/owl/graphs/contributors)

------
dunefox
Interesting idea to use an ML for scientific programming, but I don't see any
practical reasons not to use Julia or Python. I'd rather take advantage of
everything Julia already offers (+ Python with PyCall.jl) than wait for the
same support in a language not widely used in the first place.

~~~
mseri
You can use PyML to call Python from OCaml in the same way, and it works fine
to pass an Owl ndarray to NumPy.

As a user of both, I think they have different trade-offs. I tend to use OCaml
when I am playing around with the code because I find it infinitely easier to
refactor (and to figure out what I was doing if I leave the code to rot for
too long).

------
rich_sasha
I'm beginning to think of learning either OCaml or F# for data-sciency kinds of
things. Any points of comparison between those?

Library ecosystem seems better on F#, but I must admit I'm somewhat wary of
the behemoth that is .NET.

What else should I consider?

~~~
devmunchies
Data science and interactive programming are actually the main focus of the
next release of F#. Not saying it’s bad now, but Microsoft wants it to be a
strong option in data workflows.

[https://devblogs.microsoft.com/dotnet/announcing-f-5-preview...](https://devblogs.microsoft.com/dotnet/announcing-f-5-preview-1/)

I had originally been learning OCaml but switched to F# because it has much
better tooling and more uses (e.g. better web server support, since I use .NET
libs).

~~~
rich_sasha
Interesting, thank you!

From just preliminary research, F# seems to be loved by its devs, but also
treated as a bit of an unloved child on a sidetrack. That was probably
another reason I hesitated in getting started. Do you think that is justified?

~~~
devmunchies
I wouldn't say it's unloved or on a side track; it's baked into .NET as a first
class citizen and has dedicated engineers at Microsoft working on it.

When I started I was pleasantly surprised how easy it was to download .net
core on a mac and not do anything else to use F#.

It's less popular than C#, so there's not as much documentation from 3rd
party sources, YouTube videos, etc., but you can use any C# modules in F#, so
you'll get used to reading C# docs.

I would highly recommend this overview of the .Net ecosystem:
[https://www.youtube.com/watch?v=bEfBfBQq7EE](https://www.youtube.com/watch?v=bEfBfBQq7EE)

I really like how it showed a preview of "modern C#" at 12:20 which is being
influenced by F#.

After that, this talk opened my eyes to the power of F#'s philosophy and type
system:
[https://www.youtube.com/watch?v=2JB1_e5wZmU](https://www.youtube.com/watch?v=2JB1_e5wZmU)

------
TallGuyShort
I've never encountered a real OCaml project or anyone who uses it in my career
(same is true for Haskell). I have assumed these languages are a hobby for CS
academics and get used for pet projects by their devotees. Not that that's bad
- they're interesting and the ideas are cool. I would just be afraid of
locking myself into an isolated ecosystem that it's hard to hire experienced
people for. Is anyone on HN actually using these languages for scientific
computing, or other large production projects? Curious to know what the pros /
cons are in practice and how common that is.

~~~
non-entity
I've talked to at least one person who works on production Haskell
applications, and Jane Street, a company that was discussed on HN just
yesterday, makes heavy use of OCaml, but yeah, they're pretty rare and I suppose
the people working with them are just lucky. I've heard particularly about the
Haskell market that if you want a chance of competing for the few jobs
available you have to be among the top-haskellers, but I'm not sure.

~~~
Tarq0n
Jane Street being the only company that anyone ever mentions when discussing
OCaml is even worse in my opinion. It means the ecosystem is going to be
heavily driven by their needs, not to mention that banks tend to have
idiosyncratic development cultures.

~~~
mseri
It is not the only one though. There are Ahrefs, Tarides, Tezos, Citrix
(XenServer and a part of Xen are in OCaml), Inria, Facebook (for compilers,
typecheckers and ReasonML), Bloomberg (was BuckleScript/ReScript, now at
Facebook though, I believe).

There are also some academic projects with industrial uses. Directly to mind
come Coq, Frama-C, Mirage and the Zélus compiler.

EDIT: added Inria and Frama-C

------
srean
Owl, the array/scientific computing library that this book introduces, has
been discussed on HN before. Dropping those links here, in case people are
curious about the comments.

[https://news.ycombinator.com/item?id=20449595](https://news.ycombinator.com/item?id=20449595)

[https://news.ycombinator.com/item?id=14751236](https://news.ycombinator.com/item?id=14751236)

------
smabie
As someone that likes and uses OCaml a lot but uses Julia for scientific
computing, it's not worth it, just use Julia.

The Julia code is going to be shorter, faster, and more elegant. The libraries
will be sooo much better. The static typing of OCaml doesn't really help in
this area and sometimes actually hurts (statically typed DataFrames don't work
so well).

~~~
gnufx
What makes Julia code shorter and more elegant generally? I strongly disagree
with static typing not being useful in scientific computing. Most of that I
see isn't dealing with data frames, though I've seen confusion with types in
those with R users.

~~~
smabie
I would say the biggest difference is broadcasting. So let's say I have an
array of returns, r:

    
    
      r = [0.0001; -0.00002...]
    

I might be interested to find the cumulative returns:

    
    
      cumprod(1 .+ r) .- 1
    

Or just apply a function f to it:

    
    
      f.(r)
    

In OCaml I can't use any vectorized notation:

    
    
      (* assumes Base-style labelled List functions and a user-defined cumprod *)
      List.map ~f:(fun x -> x -. 1.) @@ cumprod (List.map r ~f:((+.) 1.))
    
      List.map r ~f:f
    

Julia allows you to write code in a very vectorized, array language style.
OCaml does not. This is a big, big issue, imo. Also with multi-dimensional arrays
and slice notation, Julia is just very convenient for working with higher
dimensional data. OCaml, to put it mildly, is not very good at this.
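
For comparison, here is a minimal runnable sketch of the cumulative-returns calculation using only the OCaml standard library and float arrays. The helper names `cumprod` and `cumulative_returns` are made up for this example; they are not part of the stdlib or of Owl, and the input values are arbitrary.

```ocaml
(* Running product: out.(i) = a.(0) *. a.(1) *. ... *. a.(i). *)
let cumprod a =
  let out = Array.copy a in
  for i = 1 to Array.length out - 1 do
    out.(i) <- out.(i - 1) *. out.(i)
  done;
  out

(* Cumulative returns from per-period returns: cumprod (1 + r) - 1. *)
let cumulative_returns r =
  r
  |> Array.map (fun x -> 1. +. x)
  |> cumprod
  |> Array.map (fun x -> x -. 1.)

let () =
  (* Arbitrary example returns: +50%, -50%, +100%. *)
  let r = [| 0.5; -0.5; 1.0 |] in
  Array.iter (Printf.printf "%g ") (cumulative_returns r);
  print_newline ()
```

It works, but every step is an explicit map, which is the verbosity being described.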

Scientific computing and array languages go hand in hand. Also the lack of
polymorphic functions is a big problem in OCaml. For example it would be
impossible to define an addition function in OCaml that transparently worked
with arrays:

    
    
      1 .+ [1; 2; 3] == [2; 3; 4]
    
      1+1 == 2
    
      [1; 2; 3] .+ 1 == [2; 3; 4]
    
      [1; 2; 3] .+ [1; 2; 3] == [2; 4; 6]
    

Julia makes this easy. A real array language like kdb+/q or J is even better.

This is great for linear algebra (matrices and tensors). You can write very
math-like equations using very high level functions. With OCaml you will
always be burdened with the nitty gritty of mapping and folding over the
lists/arrays.
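
A rough sketch of how close plain OCaml gets without a library: wrap scalars and vectors in one variant type and write a single `add` that broadcasts. The type `value`, the function `add` and the operator `+!` are hypothetical names for illustration, and the explicit `Scalar`/`Vector` wrapping is exactly the overhead that Julia's dispatch avoids.

```ocaml
(* A value is either a single float or a vector of floats. *)
type value =
  | Scalar of float
  | Vector of float array

(* Addition that broadcasts a scalar over a vector.
   Vectors of different lengths are rejected at run time. *)
let add a b =
  match a, b with
  | Scalar x, Scalar y -> Scalar (x +. y)
  | Scalar x, Vector v | Vector v, Scalar x ->
    Vector (Array.map (fun e -> e +. x) v)
  | Vector u, Vector v ->
    if Array.length u <> Array.length v then
      invalid_arg "add: length mismatch";
    Vector (Array.map2 ( +. ) u v)

(* Infix alias, since OCaml cannot overload the built-in + or +. *)
let ( +! ) = add

let a = Scalar 1. +! Vector [| 1.; 2.; 3. |]
let b = Vector [| 1.; 2.; 3. |] +! Vector [| 1.; 2.; 3. |]
```

Here `a` is `Vector [| 2.; 3.; 4. |]` and `b` is `Vector [| 2.; 4.; 6. |]`, mirroring the Julia lines above.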

~~~
mseri
Owl ndarrays come with operations that support broadcasting, though.

------
logicchains
I wonder if there's any overlap between this and
[https://www.ffconsultancy.com/products/ocaml_for_scientists/...](https://www.ffconsultancy.com/products/ocaml_for_scientists/index.html),
or would it still be helpful to read both of them?

~~~
forgotpwd16
That's the kind of book that came to my mind when I read the title. No,
they differ, though there is slight overlap, and yes, someone should read them
both. OfS first, OSC next.

Specifically, OfS is an introductory OCaml book (first half) which has example
usage of interest to scientists (second half). OSC, though it has some
introductory text per section, is mostly concerned with showing Owl, which is
what you'll end up using.

~~~
dna_polymerase
Is OfS available anywhere? Not that I advocate for piracy, but the site seems
abandoned and the payment link is dead.

~~~
gmfawcett
The author is 'jdh30' on reddit. You could always ask him.

