
Unix system programming in OCaml (2014) - jxub
https://ocaml.github.io/ocamlunix/index.html
======
themckman
For those interested, some of the underlying libraries that make up Docker for
Mac (and, I think, Windows) are written in OCaml (or have components written
in OCaml): VPNKit[0], DataKit[1] and HyperKit[2] (qcow2 support is implemented
in OCaml).

0: [https://github.com/moby/vpnkit](https://github.com/moby/vpnkit)

1: [https://github.com/moby/datakit](https://github.com/moby/datakit)

2: [https://github.com/moby/hyperkit](https://github.com/moby/hyperkit)

~~~
dfee
To be fair, GitHub attributes only 1.5% of hyperkit’s code to OCaml.

~~~
djs55
The 1.5% that's in the hyperkit repo is the shim to the Mirage block layer.
The qcow2 implementation that's normally linked in is from
[https://github.com/mirage/ocaml-qcow](https://github.com/mirage/ocaml-qcow) .

------
emmelaich
OCaml, compared with other statically typed functional languages, has had
many successful Unix applications.

For a great tour of moving to OCaml (from Python), see Thomas Leonard's blog.

The retrospective is here, but read the whole series.

[http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-ret...](http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-retrospective/)

~~~
cbcoutinho
And the related Hacker News discussion:

[https://news.ycombinator.com/item?id=7858276](https://news.ycombinator.com/item?id=7858276)

------
a0
This is such a great book. OCaml’s type system feels like a superpower,
especially in the context of Unix development, which is traditionally done in
C.

------
emersion
I'm contributing to a linker written in ~OCaml [1], and now I understand
writing systems code in OCaml is really a bad idea. It makes things more
complicated when you really don't want them to be. As James Mickens says [2]:

>You can’t just place a LISP book on top of an x86 chip and hope that the
hardware learns about lambda calculus by osmosis.

[1]: [https://github.com/rems-project/linksem](https://github.com/rems-project/linksem)

[2]: [http://scholar.harvard.edu/files/mickens/files/thenightwatch...](http://scholar.harvard.edu/files/mickens/files/thenightwatch.pdf)

~~~
phaer
Can you provide an example of the things OCaml makes more complicated, and
explain which languages you are comparing it to? I guess the comparison with
C, C++ or maybe Rust is quite different from the comparison with other
garbage-collected 'system programming' languages like Go?

~~~
emersion
Here are a few examples:

- Inferred types: you're constantly wondering "what is the type of this
variable?". Type errors are reported far from where the real mistake is.
Annotating is hard because you're often dealing with complicated types
(with nested tuples for instance).

- Using higher-order functions: often you'll see a function with nested
functions in it [1], or some functional list processing function like "unzip"
being used [2]. This increases the code complexity a lot ("what does unzip do
again?").

- Using monads: yet another distraction from what's really happening. It
forces you to write nested functions and doesn't play well with conditions
[3]. Also, various people use monadic operators way weirder than >>=, and all
of this becomes really hard to read.

- Using functional-friendly data structures: the linker is using immutable
lists of optional bytes to represent memory images, because this is idiomatic
in OCaml. However, this is slow and inefficient as hell. Really, I can't load a
60MiB binary because the linker runs out of memory and is killed by the kernel
on a machine with 6GiB of RAM. Also, just linking hello-world takes ages.

- Designing functional loops: when the loop you want isn't in your stdlib,
you first need to create a recursive function, and then make sure all
recursive calls are tail calls, otherwise you blow up your stack. The result is
usually not very fun to read.
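
For readers unfamiliar with the tail-call issue in the last point, here is a minimal sketch. It is not taken from the linker; `sum_naive` and `sum` are hypothetical names chosen for illustration.

```ocaml
(* Not tail-recursive: the addition runs after the recursive call returns,
   so every list element costs one stack frame. *)
let rec sum_naive = function
  | [] -> 0
  | x :: rest -> x + sum_naive rest

(* Tail-recursive: the running total travels in an accumulator, and the
   recursive call is the last thing each step does. *)
let sum xs =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs
```

The second form runs in constant stack space, at the cost of the extra helper and accumulator that the comment complains about.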

tl;dr: the C language is small, the OCaml language is not, and when you're
dealing with complicated stuff you really don't _need_ more complexity from
the language itself. James Mickens explains this (among other things) in his
essay.

[1]: [https://github.com/rems-project/linksem/blob/master/src/link...](https://github.com/rems-project/linksem/blob/master/src/link.lem#L820)

[2]: [https://github.com/rems-project/linksem/blob/master/src/link...](https://github.com/rems-project/linksem/blob/master/src/link.lem#L341)

[3]: [https://github.com/emersion/linksem/blob/54c3a8430e621198653...](https://github.com/emersion/linksem/blob/54c3a8430e621198653744738933cdbea3151866/src/main_load.lem#L230)

(Note: set the syntax highlighting to "OCaml" when trying to read this source
code)

EDIT: list formatting EDIT 2: grammar

~~~
ernst_klim
>the linker is using immutable lists of optional bytes to represent memory
images

Why would you use a linked list instead of Bytes, Bigarrays or Arrays? You are
doing it wrong. Take a look at BAP's memory representation:

[https://github.com/BinaryAnalysisPlatform/bap/tree/master/li...](https://github.com/BinaryAnalysisPlatform/bap/tree/master/lib/bap_image)

[https://binaryanalysisplatform.github.io/bap/api/master/argo...](https://binaryanalysisplatform.github.io/bap/api/master/argot_index.html)

Looked at your code, damn it's bad, I'd really suggest to dive into BAP
sources and documentation a bit to understand how to write OCaml. It feels
like you are trying to write in OCaml as in some different language, Haskell
maybe, not using modules and functors. Reminds me how J people write C.

You could write `fun x y z -> ...` instead of `fun x -> fun y -> fun z ->
...`. It's also better to write signatures (with docstrings) in separate .mli
files (much easier to read and generate docs from). And if you use monads like
maybe (why not result?) and option, it would be much more convenient to use
bind operators instead of `match ... | None -> None`.
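
A minimal sketch of the bind-operator suggestion, not taken from either code base. `pair_verbose` and `pair` are hypothetical names, and `>>=` is defined by hand here rather than taken from a library:

```ocaml
(* Option bind, written out so the example does not depend on any library. *)
let ( >>= ) o f =
  match o with
  | None -> None
  | Some x -> f x

(* The explicit style: every step repeats the None case. *)
let pair_verbose left right =
  match left with
  | None -> None
  | Some l ->
    (match right with
     | None -> None
     | Some r -> Some (l, r))

(* The same logic with bind: the None cases are handled once, in (>>=). *)
let pair left right =
  left >>= fun l ->
  right >>= fun r ->
  Some (l, r)
```

Both functions return `None` as soon as either argument is `None`.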

~~~
emersion
We need optional values for bytes, as a way to say "this byte is undefined",
so we can't use Bytes directly.

But yes, we're doing it wrong for sure.

EDIT: seems like you edited your reply, so let me add this to mine :)

>Looked at your code, damn it's bad, I'd really suggest to dive into BAP
sources and documentation a bit to understand how to write OCaml. It feels
like you are trying to write in OCaml as in some different language, Haskell
maybe, not using modules and functors. Reminds me how J people write C.

Oh, I also think it's bad, don't worry. However we're using a subset of OCaml
(Lem) so we don't have things like modules for instance. And no, I really
don't want to start using complicated module features and functors, for the
reasons I explained above.

~~~
ernst_klim
Well, you could use arrays of optional bytes, but are you sure that's clever?

Maybe it's much better to define an abstract module of type

    
    
        module type Mem = sig
          type t
          val get_byte : t -> offset:int -> char option
        end
    
    

And invent some clever (maybe lazy) implementation using just Bigarray?
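
One possible shape for such an implementation, as a rough sketch only. `create`, `set_byte` and the module name `Bigarray_mem` are hypothetical additions to the signature above, and the one-byte-per-flag mask is chosen for brevity, not efficiency. It assumes OCaml 4.07 or later, where `Bigarray` is part of the standard library.

```ocaml
module type Mem = sig
  type t
  val create : int -> t                            (* hypothetical addition *)
  val set_byte : t -> offset:int -> char -> unit   (* hypothetical addition *)
  val get_byte : t -> offset:int -> char option
end

module Bigarray_mem : Mem = struct
  open Bigarray

  type t = {
    data : (char, int8_unsigned_elt, c_layout) Array1.t;
    (* 1 where the byte at the same offset is defined, 0 otherwise *)
    defined : (int, int8_unsigned_elt, c_layout) Array1.t;
  }

  (* A fresh image of n bytes, all of them undefined. *)
  let create n =
    let data = Array1.create char c_layout n in
    let defined = Array1.create int8_unsigned c_layout n in
    Array1.fill data '\000';
    Array1.fill defined 0;
    { data; defined }

  let set_byte t ~offset c =
    t.data.{offset} <- c;
    t.defined.{offset} <- 1

  let get_byte t ~offset =
    if t.defined.{offset} = 1 then Some t.data.{offset} else None
end
```

Callers still see `char option`, but the storage is two flat, unboxed arrays instead of a list of boxed options. A bitmap would shrink the mask by a further factor of eight.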

BAP also has something like this in

[https://github.com/BinaryAnalysisPlatform/bap/blob/master/li...](https://github.com/BinaryAnalysisPlatform/bap/blob/master/lib/bap_image/bap_memory.mli#L22)

You could ask about BAP in general on

[https://discuss.ocaml.org/](https://discuss.ocaml.org/)

especially

[https://discuss.ocaml.org/u/ivg/summary](https://discuss.ocaml.org/u/ivg/summary)

(one of the guys behind BAP, very clever and talented in explaining stuff)

------
tntn
Only somewhat related, but I've tried to learn OCaml before and have never
been able to figure out when to put what kind of delimiters where. Can anyone
recommend a resource that explains the usage of the double comma and the like?

~~~
laylomo2
The double semicolon only matters in the REPL (aka the "toplevel").

Off the top of my head, I can think of at least two other situations where
punctuation matters:

1) Whenever you have nested match cases:

    
    
        match left with
        | Some left ->
          match right with
          | Some right -> Some (left, right)
          | None -> None
        | None -> None
    

The compiler does not use indentation to figure out scoping, so the above gets
treated as:

    
    
        match left with
        | Some left ->
          match right with
          | Some right -> Some (left, right)
          | None -> None
          | None -> None
    

So you need to surround the inner match with parens

    
    
        match left with
        | Some left ->
          (
            match right with
            | Some right -> Some (left, right)
            | None -> None
          )
        | None -> None
    

2) If you are doing imperative things inside an if statement:

Consider the following function:

    
    
        let cache_file file =
          if not (file_exists file) then
            printf "downloading file\n";
            download file
          else
            printf "file already exists\n"
    

The problem here is that OCaml allows one to create an if statement with no
else clause. And since the compiler doesn't use indentation to figure out
scoping, then the code is essentially treated like the following:

    
    
        let cache_file file =
          (if not (file_exists file) then printf "downloading file\n");
          download file;
          else (printf "file already exists\n");
    

Which is just totally wrong. So in order to do multiple imperative actions
within an if block, you should use either parens, or begin/end delimiters
(which are treated exactly the same as parens)

    
    
        let cache_file file =
          if not (file_exists file) then begin
            printf "downloading file\n";
            download file
          end else
            printf "file already exists\n"

~~~
dpwm
There's an interesting thing with the semicolon. I've heard people advocating
thinking of the semicolon as a binary operator:

(;) : unit -> unit -> unit

Of course, the reality is more complicated than this, because it can be used
at the end of an expression, where it behaves like a postfix operator of type
'a -> 'a. Let's take your example:

    
    
        match left with
        | Some left ->
          match right with
          | Some right -> Some (left, right)
          | None -> None
        | None -> None
    

you can actually do

    
    
        match left with
        | Some left ->
          match right with
          | Some right -> Some (left, right)
          | None -> None ; ; (* <-- two separate semicolons *)
        | None -> None
    

Unlike the operator theory of the semicolon, this doesn't just work for
imperative code: it works with this example too.

My understanding is that this is due to the fact that optional semicolons are
allowed at the end of statements and close the current expression.

So we're closing the first expression (None) and then closing the second
expression (the inner match). The double semicolon is something else
altogether.

Though it is worth noting that this example is straying from idiomatic OCaml
and we can avoid this altogether with better matching:

    
    
        match (left, right) with
        | (Some left, Some right) -> Some (left, right)
        | _ -> None
    

Both the native optimizing compiler and BuckleScript are able to do what you
would expect with this: the matched tuple is optimized away.

Your second unfixed example actually causes a syntax error. There is an if
expression in OCaml and an if statement. This is particularly unfortunate,
because the first semicolon within an if expression turns the whole thing into
an if statement, and an if statement can have no else clause. The else clause
is then without an if statement, because we've triggered the imperative if
statement. This is a syntax error.

Were we to define an operator

    
    
       let (>>>) a b = a; b
    

then we could replace that semicolon with an >>> and things behave as
expected. But it's really that if can be both a statement and an expression.
For this reason I prefer using matches in general.

    
    
        let cache_file = function
          | file when not (file_exists file) ->
            printf "downloading file\n";
            download file
          | _ -> printf "file already exists\n"

~~~
theaeolist
OCaml does not guarantee the order in which function arguments (and so the
operands of a user-defined operator) are evaluated, which is not what you want
with (;). It needs to be part of the language as special syntax.
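
A small sketch of one way around this, under the assumption that the operands are passed as thunks. This thunk-taking `>>>` is a hypothetical variant, not the operator defined above, and `log` and `step` exist only for the demonstration. Delaying both effects lets the operator body decide the order itself:

```ocaml
(* Both operands are unit-taking functions. Nothing runs while the arguments
   are being evaluated, so the unspecified argument order cannot matter. *)
let ( >>> ) (a : unit -> unit) (b : unit -> 'a) : 'a = (a (); b ())

(* Record the order in which steps actually run. *)
let log : string list ref = ref []
let step name () = log := name :: !log

let () = (step "first" >>> step "second") ()
let order = List.rev !log
```

Here `order` is `["first"; "second"]`, because the built-in `;` inside the operator body fixes the sequence. The price is wrapping every operand in a function.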

------
ofrzeta
For some real-world applications, take a look at libguestfs
[http://libguestfs.org](http://libguestfs.org), a library that can inspect,
mount, and edit VM filesystem images. It also includes tools for p2v and v2v
migration.

------
rpcope1
I think one major turn-off, especially for systems programming, was the lack
of multicore threading support (i.e. it had no way to get parallelism using
threads). Does anyone know if this has changed?

~~~
fpoling
nginx, one of the fastest web servers, uses child processes (typically one
child per core) and asynchronous IO within each child, without any threads.
The same model, with a parent process that monitors threadless child workers,
is often used in embedded systems. One can do exactly the same in OCaml, and
people do.

It certainly requires more code to set things up, especially if the child
processes need to communicate over shared memory to maximize performance, but
the big plus is that the resulting system is much more robust. If there is
trouble, one can just let a child die, or even use kill -9, and start a new
child. This is not possible with most threading implementations. For example,
try to recover from out-of-memory in a multithreaded C++/Java/Go etc.
application. It is very hard. With threadless children it is almost trivial.
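
A minimal sketch of that parent/child pattern, using OCaml's `Unix` module (the program must be linked against the `unix` library). `spawn_worker` and `wait_worker` are hypothetical helper names, and a real supervisor would add restart policy, signal handling and IPC on top.

```ocaml
(* Fork a child that runs [work] and exits with 0 on success, 1 on any
   exception. The parent gets the child's pid back immediately. *)
let spawn_worker (work : unit -> unit) : int =
  match Unix.fork () with
  | 0 ->
    (* Child process: never returns to the caller. *)
    (try work (); exit 0 with _ -> exit 1)
  | pid -> pid

(* Block until the given child terminates. Returns its exit code, or -1 if
   it was killed or stopped by a signal. *)
let wait_worker (pid : int) : int =
  match Unix.waitpid [] pid with
  | _, Unix.WEXITED code -> code
  | _, (Unix.WSIGNALED _ | Unix.WSTOPPED _) -> -1
```

A supervising parent can loop over `wait_worker` and call `spawn_worker` again whenever a child reports failure, which is the "let it die and start a new one" recovery described above.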

~~~
pjmlp
Plus we have learned the hard way that concurrency and processes are much
saner from a safety point of view than threads.

------
toolslive
You really don't want to use the Unix module directly. If you're serious about
system programming in OCaml, use Lwt or Async to allow for concurrency.

------
testaccount7
IIRC in Coders at Work, Brendan Eich talks about hiring a programmer who wrote
an OS in OCaml.

