

OCaml: The Bugs So Far - ctoth
http://roscidus.com/blog/blog/2014/01/07/ocaml-the-bugs-so-far/

======
illumen
Quite an interesting series of articles :)

The Unix.waitpid failure happened because the author ported the python code
without realising that the python version did something nice which avoided
one of the types of bugs.

Quite a number of the bugs appear to be because of things that the python code
did nicely without the author realising it.

This shows how the comparison between a second system, and the first system is
not really fair. This is because much of the code is already tested in the
first system. It also seems to suggest that prototyping in a language like
python as a first system is a great way to improve the quality of the second
system. Finish it first quickly, and then improve the quality.

It also shows the danger of not using the protective parts of the first system
in the second system. For example, the advanced integers in python avoided
some bugs for the author, which then manifested in the second system. Python
integers are arbitrary precision, so they never overflow, whereas OCaml's
native ints are 31 bits wide on 32-bit systems (63 bits on 64-bit systems)
and wrap around silently.
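
To make the integer point concrete, here is a minimal Python sketch. The helper `wrap_ocaml_int` is hypothetical (it is not part of any library); it only mimics how a fixed-width two's-complement integer wraps, assuming a 31-bit native int as on a 32-bit OCaml build.

```python
def wrap_ocaml_int(n, bits=31):
    """Hypothetical helper: wrap n into a signed two's-complement range of `bits` bits."""
    modulus = 1 << bits
    half = 1 << (bits - 1)
    return (n + half) % modulus - half


max_int_31 = (1 << 30) - 1  # largest value a 31-bit signed int can hold

# Python ints grow as needed, so adding 1 just gives the next integer.
python_result = max_int_31 + 1

# A fixed-width int wraps around to the most negative value instead.
wrapped_result = wrap_ocaml_int(max_int_31 + 1)
```

Code ported from python that quietly relied on the first behaviour can end up with the second one unless the port switches to a wider or arbitrary-precision type.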

OCaml uses non-deterministic garbage collection, whereas python mostly uses
reference counting (in CPython). This is another case where the first system
was safer in one aspect.

Perhaps OCaml could consider taking on some of these features from python to
improve safety.

Glue code for wrapped libraries is another area where these bugs seem to
happen. Some of the GTK bugs and system call bugs were caused by badly
wrapped functions. The more stable and well tested the wrapper code is, the
less likely bugs are to appear there. Automating wrapper generation might
stop some bugs, but I am not convinced, having used buggy automatically
generated wrappers many times where the manually written ones were far
superior.

The post also shows that a number of bugs were not caught by testing or
static type checking. This makes me wonder whether the author has 100% unit
test coverage. Some of the bugs should have been caught by unit testing, and
others could only have been caught by functional testing. Some of the platform
bugs could have been caught with continuous integration testing on those
platforms.

~~~
copx
> OCaml uses non-deterministic garbage collection, whereas python mostly uses
> reference counting (in CPython). This is another case where the first system
> was safer in one aspect.

I certainly would not call CPython's memory management approach "safer" [1].
As you can see, CPython has a nice accidental memory leak trap door there,
OCaml does not.

Also I see little value in _partially_ deterministic memory management like in
the case of CPython's ref-counting + GC combo.

[1] [http://stackoverflow.com/questions/8025888/does-python-gc-de...](http://stackoverflow.com/questions/8025888/does-python-gc-deal-with-reference-cycles-like-this)

~~~
illumen
That is not even a problem in python. You've pointed out _one_ strange case
with cyclic references, which can be collected by the optional garbage
collector that is enabled by default. You can even use the garbage collector
to test whether there are cycles, and remove them if you want (making things
more sane and deterministic in the process). The linked page even shows how
you can detect the cycle with python, and then remove it. So by default this
wouldn't be a problem in python. Also, in a production system which actually
tests its code for cycles, this would not be a problem in python with the GC
turned off.
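
A minimal sketch of that workflow, assuming CPython and using only the standard `gc` and `weakref` modules. `Node` is a hypothetical class made up for the example.

```python
import gc
import weakref


class Node:
    """Hypothetical example class; any object that can hold a reference works."""

    def __init__(self):
        self.other = None


gc.disable()  # turn off the cyclic collector; reference counting still runs
gc.collect()  # start from a clean state

# Case 1: an unbroken cycle survives until the cycle detector runs.
a, b = Node(), Node()
a.other, b.other = b, a  # a <-> b form a reference cycle
probe = weakref.ref(a)   # lets us observe whether the object is still alive
del a, b
alive_after_del = probe() is not None      # the cycle keeps both objects alive
found = gc.collect()                       # run the cycle detector by hand
alive_after_collect = probe() is not None  # now the cycle has been reclaimed

# Case 2: break the cycle yourself and reference counting alone is enough.
c, d = Node(), Node()
c.other, d.other = d, c
probe2 = weakref.ref(c)
c.other = None  # break the cycle by hand
del c, d
alive_without_gc = probe2() is not None

gc.enable()
```

In a test suite, a non-zero return from `gc.collect()` after exercising some code is one way to flag that the code created cycles.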

In practice I've found it useful to avoid the many types of weird behaviour
that GC systems have, where you often need to tweak the GC so it behaves
nicely with your system. These pathological non-deterministic behaviours are
often just not there with ref counting. How many java or rails articles have
you seen where they talk about adjusting the GC so the apps don't blow up?

GC is also the enemy of fast, not in the throughput sense that many people
measure, but in the latency sense. With GC, a request can take longer to
process because of the non-deterministic behaviour. This is noticeable in
many Java systems that occasionally stop for a second to collect. If you're
in a real-ish time system, then using GC or allocating memory is slow. If
every 100th web request takes 1 second, or you get janky animations, or you
occasionally take too long to respond to user input, then that is a _broken_
program in those domains.

The linux kernel, Qt, and Objective-C automatic reference counting are much
better forms of reference counting. The CPython one isn't the best, but it
still gives you some of the benefits. CPython has memory pools and reference
counting, which means that if you are very careful you can avoid pathological
cases of memory allocation in a deterministic way. You can also more easily
manage memory manually if you need to, since once you remove all references
to an object it is freed immediately. With GC, how do you know whether the
3GB object is freed now, or in 1 second? Will that cause swapping to happen?
What about with the next version of INSERT_GC_LANG that changes the GC
behaviour?
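
The "freed immediately" behaviour can be observed with a short sketch like this one. It assumes CPython (other Python implementations may reclaim the object later), and `BigBuffer` is a hypothetical stand-in for a large object.

```python
import weakref


class BigBuffer:
    """Hypothetical stand-in for a large object."""

    def __init__(self, size):
        self.data = bytearray(size)


events = []
buf = BigBuffer(1024)

# Register a callback that runs when the object is reclaimed.
weakref.finalize(buf, events.append, "freed")

events.append("before del")
del buf  # last reference gone: CPython reclaims the object right here
events.append("after del")
```

On CPython the "freed" entry lands between the other two, which is the ordering guarantee a tracing collector does not give you.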

------
amirmc
Also worth reading the follow up - OCaml: What You Gain [1] and the overall
retrospective [2]. If you search HN for the titles you can find discussions
about those. The author will also be speaking at OCaml 2014 this year [3].

[1] [http://roscidus.com/blog/blog/2014/02/13/ocaml-what-you-gain...](http://roscidus.com/blog/blog/2014/02/13/ocaml-what-you-gain/)

[2] [http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-ret...](http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-retrospective/)

[3] [http://ocaml.org/meetings/ocaml/2014/program.html](http://ocaml.org/meetings/ocaml/2014/program.html)

------
DanielBMarkham
Amazing that the bug count was so small.

The only one that caught my attention was garbage collecting your functional
reactive code. The solution was to use global variables. That's a huge red
flag.

Overall though, this is a pretty positive report for committing some 30KLOC.
Amazing, really.

------
moron4hire
"Fails to Start on Windows" sounds more like your lack of experience with
Windows than a 3rd party issue.

