
Python to OCaml: Retrospective - antouank
http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-retrospective/
======
pmarreck
This seems near-proof to me that a gradual inflight translation of a codebase
from ANY one language, to another, entirely different one, is feasible, as
long as you build a data interop layer of some sort (JSON, etc.). I imagine
that if it were powering a web app, dual-deployment would become an additional
(but potentially manageable) concern during the "transition" period.

This seems a much safer/saner way to do total rewrites/refactorings.
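
A minimal sketch of what such an interop layer could look like, assuming the old Python code hands individual operations to the rewritten component as a child process and exchanges one JSON document in each direction. The message shape (`op`, `payload`, `ok`, `result`, `error`) and all function names are made up for illustration; the protocol used in the article may well differ.

```python
import json
import subprocess


def encode_request(operation, payload):
    """Serialize one request for the rewritten component (hypothetical shape)."""
    return json.dumps({"op": operation, "payload": payload})


def decode_reply(text):
    """Parse a reply and surface any failure reported by the other side."""
    reply = json.loads(text)
    if not reply.get("ok", False):
        raise RuntimeError(reply.get("error", "unknown error"))
    return reply["result"]


def call_new_impl(command, operation, payload):
    """Run the new implementation once: JSON in on stdin, JSON out on stdout."""
    completed = subprocess.run(
        command,
        input=encode_request(operation, payload),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return decode_reply(completed.stdout)
```

Since both sides only agree on the JSON, each operation can be moved across one at a time, e.g. `call_new_impl(["./new-impl"], "solve", {...})`, where `./new-impl` is a placeholder path.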

Note that it practically demands decent (if not impeccable) test coverage (he
even admits that many parts were not tested... the only saving grace being
that due to the semantics of the 2 specific languages here, he was able to use
roughly the same logic for the less-tested portions, reducing risk).

Also note that according to the graph, during the middle of this process,
application and testing performance will be _the most terrible._ At that
point, some managers would probably decide to back out/bail on it, which is
why I thought it was important to note.

~~~
wwweston
> as long as you build a data interop layer of some sort (JSON, etc.). I
> imagine that if it were powering a web app, dual-deployment would become an
> additional (but potentially manageable) concern during the "transition"
> period

Which brings up an interesting idea:

* _All_ codebases are in some kind of inflight transition. Usually not migrating across languages, but often migrating across authors and authoring styles, sometimes migrating across underlying platforms. So, things that make these inflight transitions easier might well be practices one should consider adopting.

* The "data interop layer" might simply be another way of understanding another principle: if your data structures/formats are (a) legible and (b) well-fitted to your problem domain, your program is probably going to be easier to understand and modify... maybe even when it comes to modifications that might seem extreme.

Or to use words attributed to Linus Torvalds: "Bad programmers worry about the
code. Good programmers worry about data structures and their relationships."

~~~
agentultra
When you think about the data first you can usually synthesize the program.
It's almost a game now to try and write a DOOM engine from only the
description of the WAD format and its wonderful documentation. I'm under the
impression that if you think about your data and how to serialize it first the
code will fall out of it.

So I'm starting to think more about formal specifications and documenting
serialization formats first and worrying about the code second.
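
To make the data-first idea concrete, here is a small sketch that derives a parser directly from the documented WAD header layout: a 4-byte identifier (`IWAD` or `PWAD`) followed by two little-endian 32-bit integers, the lump count and the offset of the directory. The class and function names are my own, and this covers only the header, not the directory or the lumps.

```python
import struct
from typing import NamedTuple


class WadHeader(NamedTuple):
    identification: str    # "IWAD" or "PWAD"
    num_lumps: int         # number of entries in the directory
    directory_offset: int  # byte offset of the directory in the file


# 4-byte tag followed by two little-endian signed 32-bit integers (12 bytes).
HEADER = struct.Struct("<4sii")


def parse_header(data: bytes) -> WadHeader:
    """Decode the 12-byte header at the start of a WAD file."""
    ident, num_lumps, directory_offset = HEADER.unpack_from(data, 0)
    if ident not in (b"IWAD", b"PWAD"):
        raise ValueError(f"not a WAD file: {ident!r}")
    return WadHeader(ident.decode("ascii"), num_lumps, directory_offset)
```

The format string is essentially the specification written down once; the rest of the function is just validation.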

 _update_ : punctuation corrections

~~~
pmarreck
So, data-driven development, basically?
[https://en.wikipedia.org/wiki/Data-driven_programming](https://en.wikipedia.org/wiki/Data-driven_programming)

~~~
agentultra
Data-driven design[0] and formal methods[1].

 _update_

[0]
[http://dataorienteddesign.com/site.php](http://dataorienteddesign.com/site.php)

[1]
[http://research.microsoft.com/pubs/64638/high-level.pdf](http://research.microsoft.com/pubs/64638/high-level.pdf)

------
aikah
What about OCaml and concurrency? What are the options out there?

After reading

[http://roscidus.com/blog/blog/2013/09/28/ocaml-objects/](http://roscidus.com/blog/blog/2013/09/28/ocaml-objects/)

the language sounds quite interesting. Does it support something like
channels?

~~~
masklinn
LWT (a popular IO library) has channels:
[https://ocsigen.org/lwt/dev/api/Lwt_io](https://ocsigen.org/lwt/dev/api/Lwt_io)

That's what TFAA (and MirageOS) use.

~~~
vog
Forgive my ignorance, but what is TFAA? All my web searches yield irrelevant
results, such as "Triveni Faridabad Allottees Association" or "Trifluoroacetic
anhydride".

~~~
mercurial
The Fine Article Above?

~~~
masklinn
The F(ine|ucking) Article's Author.

------
NelsonMinar
I regularly use another OCaml program: Unison, the file syncer. It's sort of
like a bidirectional rsync. It's a fantastic tool but nearly abandonware. My
suspicion has always been the open source project languishes because so few
people know OCaml to work on it.

~~~
k_bx
In order to not become abandoned, open source software needs to be not only
open source, but also to have easily accessible source control, issue tracker
and other community-required tools.

In the case of Unison, it's a program written by Benjamin Pierce, a brilliant
computer scientist and the author of one of the best books I've read (well,
am reading), "Types and Programming Languages". I know he put quite some
research effort into that tool, but I don't see community-related
infrastructure in place around it; that's why it looks (or maybe is) rather
abandoned.

Btw, here's another fun video of B. Pierce making some intriguing statements
regarding a popular file-sync program called Dropbox
[https://www.youtube.com/watch?v=Y2jQe8DFzUM](https://www.youtube.com/watch?v=Y2jQe8DFzUM)

------
mands
Good to see this back up on here - am just about to start porting a Python
codebase to OCaml myself (www.stackhut.com) and have been reading this to
help. Am thoroughly looking forward to all the typed, functional goodness :)

~~~
e_d_g_a_r
You will learn how Python's scoping rules are actually quite bad.

~~~
mands
Yep, a source of constant frustration when moving between them!

------
melling
"Most errors are picked up by the type checker immediately at the start of the
build, rather than by the unit-tests at the end. That saves a lot of time."

Better tooling would help so that you'd get the error checking as you type.
Are there any good configurations for vim or Emacs, for example?

~~~
krat0sprakhar
Merlin for Vim is an absolute delight to use. You get auto-complete,
indentation, compile-on-save, type checking right within your editor with less
than 5 lines of configuration.

For someone new to the type system, it helps a lot to compulsively keep
checking the types of the expressions as you go along building the program.
Highly recommended.

PS: If you prefer screenshots -
[https://twitter.com/prakharsriv9/status/689141428161802241](https://twitter.com/prakharsriv9/status/689141428161802241)

~~~
mercurial
Also, destructuring for pattern-matching is pretty cool.

~~~
krat0sprakhar
It surely is. But how does Merlin help with that?

~~~
mercurial
Like that:
[http://the-lambda-church.github.io/merlin/destruct.ogv](http://the-lambda-church.github.io/merlin/destruct.ogv)

Really nice.

~~~
LeonidasXIV
I used Merlin pretty extensively, yet this example had me sitting there with
an open mouth, staring in awe.

The funniest thing is that in my experience Merlin just seems to work. When I
think about how many hoops I need to jump through with the Clojure REPL to
connect it to CIDER in Emacs, Merlin just works in the background without me
having to think about it at all. Very impressive.

------
elbasti
Excellent writeup. It's a very useful example of a refactor done right. I'd
love to know more about the json interface between the Python and OCaml code,
since "integration" is usually where refactors like this get hard.

------
detaro
From 2014; the discussion from back then is here:
[https://news.ycombinator.com/item?id=7858276](https://news.ycombinator.com/item?id=7858276)

------
dorfsmay
I'd love to see a comparison with recent Rust. Looking at that graph with 2013
results, I'm surprised how much slower Rust was compared to Haskell and OCaml.

~~~
steveklabnik
For fun I started porting the example over:
[https://gist.github.com/steveklabnik/f86cba4da1dc9c5c68e0](https://gist.github.com/steveklabnik/f86cba4da1dc9c5c68e0)

Then I realized that it wasn't totally clear to me what
json_list_to_str_vector() should do, exactly, and so I don't even know how it
would compare.

When this article was written Rust had a big runtime. It was a very different
language. OCaml is pretty fast, but I would still expect Rust to fare much,
much better today.

~~~
masklinn
> Then I realized that it wasn't totally clear to me what
> json_list_to_str_vector() should do, exactly, and so I don't even know how
> it would compare.

It just converts a JSON-encoded list of strings into a Vec<String> doesn't it?
That's what the other languages do:

* get data from some envvar

* decode it from json to an array/list/vector of strings

* concatenate argv[1..] to the result of (2)

* execute this new argslist (the first item being the name of the program)
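
The four steps above can be sketched in Python roughly as follows. This is only an illustration of the described behaviour, not the code from the article or the gist; the environment variable name `LAUNCHER_ARGS` and both function names are hypothetical.

```python
import json
import os
import sys


def build_argv(encoded, extra_args):
    """Decode a JSON list of strings and append the extra arguments to it."""
    base = json.loads(encoded)
    if not isinstance(base, list) or not all(isinstance(item, str) for item in base):
        raise ValueError("expected a JSON-encoded list of strings")
    if not base:
        raise ValueError("the list must at least name the program to run")
    return base + list(extra_args)


def launch(env_var="LAUNCHER_ARGS"):  # hypothetical variable name
    """Replace the current process with the program described by env_var."""
    argv = build_argv(os.environ[env_var], sys.argv[1:])
    # The first item is the program name; execvp looks it up on PATH.
    os.execvp(argv[0], argv)
```

Calling `launch()` from a script does not return on success, since the process image is replaced by the new program.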

~~~
steveklabnik
Ah that makes sense. Oh well.

------
meric
Thank you for the writeup. I loved the charts and the writing.

------
tempodox
Was a great writeup before, now even enhanced. +1.

------
systems
ATS, hmmm, never heard of it. It seems to have done really well; what's the catch?

~~~
masklinn
If you read the original article[0] it did very well on the speed/size bench
(5/5) but took a big hit on ease of writing (1/5) ending up with 48 between C#
(47) and Haskell (49) and the following summary note:

> Everything would be incredibly fast, but getting new contributors would be
> very difficult due to the learning curve. There’s a risk of crashes as the
> library is not entirely memory safe, and there are likely to be changes
> ahead to the language. Probably writing the whole thing in ATS would be too
> much work for anyone.

ATS didn't make it to round 2[1] on grounds of use difficulty and difficulty
of separating memory-safe and memory-unsafe code.

That seems prescient as the original ATS1 (ATS/Anairiats) was replaced by ATS2
(ATS/Postiats) a few months later, I don't know how compatible the two are but
the FAQ puts ATS1 and ATS2 in different categories[2]

[0] [http://roscidus.com/blog/blog/2013/06/09/choosing-a-
python-r...](http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-
replacement-for-0install/)

[1] [http://roscidus.com/blog/blog/2013/06/20/replacing-python-
ro...](http://roscidus.com/blog/blog/2013/06/20/replacing-python-round-2/)

[2] [https://github.com/githwxi/ATS-Postiats/wiki/ATS-
implementat...](https://github.com/githwxi/ATS-Postiats/wiki/ATS-
implementations#ats1)

------
aksx
Is it just me, or is there a mistake here?
[http://i.imgur.com/65GPWq6.png](http://i.imgur.com/65GPWq6.png)

He says the color of the UI part is orange, but I see yellow.

~~~
vog
Color perception is very subjective, because color learning differs from
childhood to childhood.

That specific color is between the yellow and red range, although it is closer
to the ideal yellow. Some people learned that this still counts as yellow,
while others learned that this yellow is "red enough" to count as orange.

~~~
hacker_9
Whilst colour can be subjective, in this case it is clearly yellow and a
mistake by the OP, because the preceding bar is clearly orange.

~~~
masklinn
> the preceding bar is clearly orange.

With a hue of 7˚[0] the middle color is in the middle of the red range[1],
it's on the pink side of scarlet (8.5˚, 100%, 100%).

The rightmost bar has a hue of 47˚[2] making it an orange-yellow[3], calling
it an orange (if a light one) is not insane.
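
For anyone who wants to check the hue figures without a website, a quick sketch using Python's standard `colorsys` module should do; the helper name `hue_degrees` is mine, and the two hex values are the ones from the color-hex links in this comment.

```python
import colorsys


def hue_degrees(hex_color):
    """Return the HSV hue of an RRGGBB hex color, in degrees (0 to 360)."""
    digits = hex_color.lstrip("#")
    # Split into the three 2-digit channels and scale each to the 0..1 range.
    red, green, blue = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, _saturation, _value = colorsys.rgb_to_hsv(red, green, blue)
    return hue * 360


for name, color in [("middle bar", "f12910"), ("rightmost bar", "fbcc1a")]:
    print(f"{name}: {hue_degrees(color):.0f} degrees")
# prints:
# middle bar: 7 degrees
# rightmost bar: 47 degrees
```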

[0] [http://www.color-hex.com/color/f12910](http://www.color-hex.com/color/f12910)

[1] [http://www.workwithcolor.com/red-color-hue-range-01.htm](http://www.workwithcolor.com/red-color-hue-range-01.htm)

[2] [http://www.color-hex.com/color/fbcc1a](http://www.color-hex.com/color/fbcc1a)

[3] [http://www.workwithcolor.com/orange-yellow-color-hue-
range-0...](http://www.workwithcolor.com/orange-yellow-color-hue-range-01.htm)

~~~
hacker_9
From your link, compare the red RGB bar to the clearly orange bars at the side
of the screen. Additionally the 'Analogous Colors' contains an even more
obvious orange, and pink, but no sign of red.

~~~
masklinn
> From your link, compare the red RGB bar to the clearly orange bars at the
> side of the screen.

I see red all around.

> Additionally the 'Analogous Colors' contains an even more obvious orange,
> and pink, but no sign of red.

Hue-wise, red sits between orange and pink, so if your analogous colors
are orange _and_ pink, your base is a red. That's exactly what you get for
scarlet:
[http://www.color-hex.com/color/ff2400](http://www.color-hex.com/color/ff2400)
or pure straight no-frills red:
[http://www.color-hex.com/color/ff0000](http://www.color-hex.com/color/ff0000)

~~~
kbenson
Arguing about this over the internet is somewhat comical, as your computers
are probably presenting you with at least slightly differing colors, and at
most entirely different colors, due to things like screen quality, viewing
angle (if LCD), brightness/contrast settings, etc. Trying to use specific
descriptions that can be replicated by a computer is a good step, but
ultimately of little use if you can't be sure the same color is presented
the same way to whoever you are talking to.

I mean, obviously, the color is blue. Or is it gold? In any case, it's
definitely a dress...

