Introduction to OCaml (baturin.org)
229 points by jxub 6 months ago | 72 comments

OCaml is a great language. We looked really seriously at using it at my startup for some critical things a while ago, when the company was very young, because some of the team knew it and we were inheriting a small code base in it. Ultimately immaturity of tooling (e.g. camlp4 vs ppx) and near-total lack of library support killed it for us. (We ultimately went with C++.) Good luck trying to use gRPC in it, for example: there are like three different "protocol buffers" libraries that all implement different parts of it and which are incompatible with each other. While Clojure and Scala have their weaknesses, being on the JVM and being able to re-use all of that Java infrastructure was such a massive advantage.

OCaml seems awesome and I wish I could use it more, but don't underestimate how much time you'll spend fighting with opam (and a lot of things just aren't in opam at all, or the opam version isn't compatible with your OS, but there's an apt package for it, but oh no what is going on) and just trying to get it to talk to the rest of your stack.

i am curious if you considered f#.

We didn’t. I’ve heard good things about it, but my understanding is it’s windows/mono, and we run all Linux in production, which seems like a pain to navigate. (F# is CLR right?) Maybe this isn’t that big of an issue in reality but we had a reason to specifically consider OCaml, and once we passed on it, had strong biases to otherwise avoid things considered exotic.

F# runs fine on Linux, is compatible with .NET Core, and has a very good Emacs mode.

If you have more questions on F#, I recommend the community forum http://forums.fsharp.org/

I am not an expert yet, but F# is missing several of the more sophisticated OCaml features. On the positive side, you have the .NET Core ecosystem.

can you point me to a tutorial to get f# running on Ubuntu?

I haven't used gRPC as such, but for protobuf specifically, ocaml-protoc works well for me.

The ML family languages are still so far ahead of "mainstream" languages in many ways. The first language I ever really learned was SML, my freshman year of college, and I swear there's still a part of me that reaches for "case", even after 15 years and about 1m lines of Java.

I've been playing around with an implementation of the ML semantics+type system, using the Go AST as a compilation target. Golang makes a surprisingly great compilation target for a functional language, because Go itself doesn't have any object-oriented baggage. I think there's real potential for a functional companion to Go, much like F#::C# and Scala::Java.

it is truly amazing to me that an ML language never made it mainstream. an OOP-enabled ML like F# or Ocaml beats the pants off of the likes of something like python, but here we are where python is taking over the world. and everyone else is clinging to java and c++.

f# and ocaml aren’t even hard to learn! i mean, one can learn Standard ML in a couple of weeks using dan grossman’s programming languages course! i understand f# and sml and even some scala after just a little reading. but i took a semester long course in c++ and never understood it.

>an OOP-enabled ML like F# or Ocaml beats the pants off of the likes of something like python

Can you explain why you think so?

Decently typed, fast (immensely faster than python), and the languages are based on a concise and sound theory, with most newly added features being just sugar over those concepts (objects and polymorphic variants, for example, are just products of the subtyping feature in ocaml), unlike python, where features are monkey patches.


well python is often loved for how clean it is. however, i believe f# (or any ml dialect) is much cleaner than python in terms of syntax. and not only is it cleaner, it is more consistent, which is due to its ml ancestry. python is often loved for how it allows people to express solutions to their problems. however, f# allows much more expression that python has explicitly chosen to disallow. for example, f# is every bit as capable if not more capable as an oop language than python (see .net generics and ecosystem), but it also has functional programming features. if you have never programmed in a language with discriminated unions and pattern matching, you really should try. just look at people implementing ideas and compilers in f#, ocaml, or sml. using pattern matching and union types, it almost looks trivial. it looks as though they're just writing out a description of what they're trying to solve, and then bam, it's done. and i like f# out of these because of the .net platform and how it has embraced its multi-paradigm nature in a very graceful way.
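To make the "it almost looks trivial" point concrete, here is a minimal OCaml sketch of a toy arithmetic language. It is not taken from any real compiler: the `expr` type and the `eval` and `simplify` functions are invented purely for illustration.

```ocaml
(* A tiny arithmetic language as a variant (discriminated union) type. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

(* Evaluation is one pattern match, with one case per constructor. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg e -> -(eval e)

(* A small rewrite pass of the kind compilers perform: drop additions of
   zero and multiplications by one, fold multiplications by zero. *)
let rec simplify = function
  | Add (Num 0, e) | Add (e, Num 0) -> simplify e
  | Mul (Num 1, e) | Mul (e, Num 1) -> simplify e
  | Mul (Num 0, _) | Mul (_, Num 0) -> Num 0
  | Add (a, b) -> Add (simplify a, simplify b)
  | Mul (a, b) -> Mul (simplify a, simplify b)
  | Neg e -> Neg (simplify e)
  | Num n -> Num n

let example = Add (Mul (Num 1, Num 7), Neg (Num 0))
```

The type definition reads like the grammar, and each function reads like a case-by-case description of what should happen. If a constructor is added later, the compiler warns about every match that forgot to handle it.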

python is a very pedestrian language. that is probably the secret to its popularity. it makes people who have put no study into programming languages feel great because they're able to accomplish things quickly, at least at the start. however, they are missing a lot, in my opinion by sticking with python. there is nothing, from my perspective, that python has that f# doesn't or couldn't have. and the reverse of that is explicitly not true.

ml and lisp/scheme are two of the greatest programming language approaches and paradigms, and python chose and continues to choose to ignore both of them.

It's always been funny to me as a functional programmer that Python took off. People always talk about how much quicker it is to write code in Python, and 99% of the time I find the Python code to be more complex than the OCaml/Scheme/Haskell/Common Lisp equivalents (minus, of course, the library support).

yes, it's really sad. i feel like the ML and lisp dialects are all we ever needed. having these two languages allows so much. they're perfect for getting stuff done and they're perfect for creating new languages. there's no reason why they couldn't have been used to perform the roles of things like java or c++ or python, but yet, here we are.

Yeah, that has been my thought since I used Caml Light at university, quite a while ago.

Especially since, adding to what you mentioned, those languages have been compiled since the early days.

It all boils down to lack of love from OS vendors.

Even nowadays, F# is still the black swan of .NET languages, with less tooling support than C++ gets.

So I have come to appreciate the small additions we get in Java, C#, and C++ that bring us closer to an FP kind of programming, rather than expecting a radical change.

Thanks for the reply. Good points to check out.

no problem. i saw on this account’s summary that you do pdf generation. there are two good books on ocaml, and the second, more ocaml, has a project on producing pdf files. so that seems like it may be of interest to you.



Cool, thanks, will check it out.

Just the very fact that they compile to native code and have type inference and pattern matching does it.


> I've been playing around with an implementation of the ML semantics+type system

Sounds cool. Is it open source? I've attempted to write some ML interpreters and compilers, but I've always hit a brick wall when I try to move from a basic "mini-ML" type of language -- which is relatively straightforward -- to one including pattern matching.

It's here [1]. I've made some good progress on the typechecker while on vaca. Still looking for a better name : )

[1] https://github.com/georgewfraser/funlang

Totally agree, and would love to see something with Go interop gain serious adoption. Thought it might have been Oden [1] for a while, but the project seems to have petered out.

[1]: http://oden-lang.github.io/

That’s a real shame. Everything about the go runtime is great, which is in complete contrast to the language itself imo.

Do not make it REPL-centric. Real life programs are not developed in the REPL. Include examples of complete programs.

In certain environments/languages, real programs are developed in the REPL. Real programs, managing billion dollar portfolios, natural gas scheduling, and the payroll of Fortune 500 companies have essentially been "developed in the REPL."

A REPL is an incredibly useful tool for all sorts of development, "real" or not. But I share the author's frustration with intros to functional programming that take advantage of a REPL to provide small, context-less examples. It makes it hard to see how I could use the language for a complex project with significant side effects (e.g. file/network IO).

Real World Haskell[1] is a great example of an introduction that provides small examples AND situates them in the context of programs that I could see myself actually using.

[1] http://book.realworldhaskell.org/read/

Ocaml is not one of those environments. The repl is of limited value in ocaml as the language is not dynamic enough to use it. Much of the language is built on its module system and the repl requires one to write an entire module (and its signature) at once. This makes the repl very much inferior to compiled ocaml.

The article suggests that the repl is of use for finding the type of things. This is only half true. Consider the following:

  type 'a my_extra_args = foo -> bar -> 'a
  val some_fn : (int -> unit) my_extra_args

Now suppose you want to know the type of the function so you ask at the repl:

  utop> some_fn;;
  - : (int -> unit) my_extra_args = <fun>

And now if you want to know how to call this function you need to try to guess arguments until a type error expands the type.

Alternatively you could use compiled ocaml, write your programs in a sensible editor and then query the type using Merlin (a program to provide ide-like features to editors for ocaml)
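As a compilable illustration of why the abbreviation hides the calling convention, here is a minimal sketch. The concrete definitions of `foo` and `bar` and the body of `some_fn` are assumptions made up for the example; only the shape of `my_extra_args` comes from the snippet above.

```ocaml
(* Hypothetical stand-ins for the comment's [foo] and [bar]. *)
type foo = string
type bar = float

(* The abbreviation hides two leading arguments. *)
type 'a my_extra_args = foo -> bar -> 'a

(* Annotated with the abbreviation, the way an interface file might be. *)
let some_fn : (int -> unit) my_extra_args =
  fun name scale n ->
    Printf.printf "%s: %.1f\n" name (scale *. float_of_int n)

(* Writing the expansion out by hand type-checks against the same value. *)
let expanded : foo -> bar -> int -> unit = some_fn

(* So a full call takes three arguments, although the abbreviated type
   only mentions one of them. *)
let () = some_fn "total" 1.5 4
```

With the source in a file, an editor backed by Merlin can show the type at the cursor, which avoids guessing arguments at the REPL.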

On the other hand, I would expect a Common Lisp tutorial to talk about the repl as real world programs are written with the repl (but also in files)

I sort of lost interest in what else the author had to say at this early point, despite having been interested in OCaml for many years. (Referring, for context, to the "Do not make it REPL-centric" comment in the article.)

In Common Lisp, Clojure, Scheme/Racket and Haskell, I constantly develop with the REPL. I also use it in other languages such as Ruby which are not necessarily REPL-centric.

I find that I write much better (less buggy) code, faster, in a language with great support for a REPL.

I'm not against the REPL (perhaps I should even describe its usage in more detail). In fact, I use it all the time. I'm against books like Learn You A Haskell that leave the reader wondering how to make and distribute executable programs. Or, worse, create an impression that it is not even possible, if the reader is not motivated enough to read another book or blog posts.

100% agree. Rails console and irb are a major reason why I keep coming back to Rails. I haven't found anything as mature and pleasant to use in a web dev framework for verifying the code and tinkering with my models.

And... Hacker News. I can't find it now, but I'm pretty sure pg talked about how Hacker News was, at least initially, REPL-centric (not in those words)

EDIT: http://pchristensen.com/blog/articles/are-you-this-agile-pau...

> This is a step beyond that. I'm not even doing releases; I've been pasting the changes into the running server's repl.


Obviously it's a bit extreme to be hacking on the production site with no version control etc, but HN was a lot smaller back then.

How does that really work?

Because at the end of the day, to build your program, you need to have code in a source file on disk. Code in the REPL is not persistent.

Unless, that is, the language comes with some sort of REPL server that persists code on a server, and programs instantiate themselves from that server (something I have never seen, and I doubt that this is what you meant).

I am guessing you actually mean that development was REPL-supported: you used the REPL to test your code and to help you design it.

And I see no problem in that, but you still need a solid traditional cycle around that to actually build, deploy and distribute your code... right?

> How does that really work?

> Because at the end of the day, to build your program, you need to have code in a source file on disk

Not necessarily in the way you'd imagine. In Smalltalk, there is what amounts to a transaction log of your code changes, which you can replay. There is also an operation analogous to a database "checkpoint" where you save the whole memory image.

In most production environments, there is a 3rd mechanism which also tracks and saves the configuration in a database, which also serves as part of a build/deployment mechanism.

> Code in the REPL is not persistent

In Smalltalk, all code executed in one of the REPL-like things is recorded on the transaction log of your code changes.

> I am guessing you actually mean, that development was REPL supported, and that you used the repl to test your code, and help you design your code

In many Smalltalk environments, there are REPL-like things everywhere, including many of the fields/panes of the debugger, and seasoned Smalltalkers would pretty much write the system in the debugger.

The image abstraction often found in the ML family of languages (and Smalltalk). An image is a serialization of an interpreter's state. An image includes the entire interpreter/REPL/console code within the application. The effect is somewhat similar to REPLs attached to applications found in Common Lisp or Clojure.

Not sure who invented dumping and loading images, but the first Lisp did that already around 1960. From there it spread to Smalltalk in the 70s.

A typical MIT Lisp Machine OS in the 80s recorded, for all functions etc., where they were defined. One edits a function by calling (ed 'foo) (or by pressing M-. in the editor on a function name), and then the source file with the function gets edited. So the Lisp IDE knows that foo is defined in the file hans:graphics;editor;foo.lisp.273 (where 273 is the version number of the file). If one saves an image, this information is stored and then restored on starting the image. Common Lisp systems today still record the source location, but they lack the versioned file system, and they also usually do not record a particular location from a source code version system like SVN or Git.

What Xerox Smalltalk did (and Xerox Interlisp-D did something similar) was to manage source code in a distribution file and a changes file. Thus, where the MIT Lisp Machine IDE tracks a multitude of versioned files (and the user has to somewhat manage them by creating them, registering them, compiling and loading them), Xerox Smalltalk abstracts the editing away from files and manages the source storage in a file mostly invisible to the user. Thus a Smalltalk class browser does not display files; it displays individual definitions. Since the compiled code is some high-level byte-code, source code can also, in a more limited way, be computed from the compiled code.

Similar: Xerox Interlisp-D (which was a Lisp IDE, which ran on the same hardware as the Smalltalk machines from Xerox), which also managed source code (using tools like the File Package and Masterscope), but the source code is actually Lisp data and the main code editor is a structure editor - not a text editor. Xerox called it a residential system - since the source is actually data in the image and the File Package manages to track the changes and keep external versions of the source.

I have actually no idea which ML also a) works with images and b) manages source code. Would be interesting to hear more about that.

SMLNJ has ExportML to create an application/image. It's not Common Lisp of course. Or Smalltalk. But it's not Python or C either...people at Bell Labs were pretty sharp when it comes to programming language design...thinking about it a bit more, I think Erlang is worth mention in terms of interactive programming against running systems...anyway, Ocaml has ocamlmktop https://caml.inria.fr/pub/docs/manual-ocaml/toplevel.html#se...

I interpreted that as not making his "intro to ocaml" REPL-centric, so that things like the build tooling gets covered.

but this article is about learning, not the day-to-day development in OCaml. I agree it is good to learn how to structure, compile, and run code without a REPL. ie. learning how to import files and all that.

I read that as do not make your program depend on REPL features.

Do not forget the Real World OCaml book! I personally found it to be very well written.

Available both online and as a printed book published by O'Reilly: https://v1.realworldocaml.org/v1/en/html/index.html

You probably want to go with https://dev.realworldocaml.org/ , which is an in-progress rewrite of v1.

The v1 version has not aged well unfortunately. I _highly_ recommend just reading the dev version.

This is a key piece of information. Thank you!!

My compilers course in college was taught in OCaml. The first semester of the course was spent on teaching functional programming and OCaml and the second semester was spent on implementing a compiler that compiled a restricted subset of OCaml.

It is a pleasure to write a tokenizer and a lexer in OCaml.

Did you use parser combinators or what?

I assume OCamllex and OCamlyacc (or the various other alternatives like Sedlex and Menhir). The OCaml standard distribution is very compiler-writing-oriented, so to speak, and comes with tools for parsing out of the box. A lot of newer projects use parser combinators, but there are far fewer examples of how to use them (like the plZoo) and most people are already familiar with lex/yacc, so I imagine most courses would choose to use those.
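For a sense of how little code a lexer needs even without those tools, here is a minimal hand-written tokenizer sketch. It uses neither OCamllex nor a parser combinator library; the `token` type and the `tokenize` function are invented for this example and are not what the course used.

```ocaml
type token =
  | INT of int
  | IDENT of string
  | PLUS
  | STAR
  | LPAREN
  | RPAREN

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Turn a string into a token list, raising [Failure] on unknown characters. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  (* [span p i] is the first index at or after [i] whose char fails [p]. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | '*' -> go (i + 1) (STAR :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | c when is_digit c ->
        let j = span is_digit i in
        go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
        let j = span (fun c -> is_alpha c || is_digit c) i in
        go j (IDENT (String.sub s i (j - i)) :: acc)
      | c ->
        failwith (Printf.sprintf "unexpected character %C at offset %d" c i)
  in
  go 0 []
```

For example, `tokenize "x1 + 2*(y)"` yields `[IDENT "x1"; PLUS; INT 2; STAR; LPAREN; IDENT "y"; RPAREN]`. A generated lexer becomes more attractive once the token set grows or position tracking is needed.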

Correct, good guess. :)

I thought this was a pretty great introduction. Part 2, in particular, finally helped me understand function type signatures in ML languages: the fact that, under the hood, functions only accept a single argument, and the way the type signatures are presented is merely an aesthetic cleanup of what it actually looks like (which would include more parentheses to indicate associativity).
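A small sketch of that point, with function names made up for the example: the arrow in a type associates to the right, so `int -> int -> int` is shorthand for `int -> (int -> int)`.

```ocaml
(* [add] has type int -> int -> int, which parses as int -> (int -> int). *)
let add x y = x + y

(* The same function with the nesting written out explicitly. *)
let add' : int -> (int -> int) = fun x -> (fun y -> x + y)

(* Supplying only the first argument yields another function. *)
let add_five : int -> int = add 5

let () =
  assert (add 2 3 = 5);
  assert ((add 2) 3 = 5);
  assert (add' 2 3 = add 2 3);
  assert (add_five 10 = 15)
```

Application associates the other way, to the left, which is why `add 2 3` and `(add 2) 3` are the same expression.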

One of the examples did refuse to compile for me, and I felt Part 1 was still too light on basic syntax. I tried to do the exercise at the end of Part 1, about prompting for, squaring, and printing an integer. This works:

  let this_integer = read_int ()
  let _ = print_int (this_integer * this_integer); print_newline ()

But this doesn't:

  let this_integer = read_int () in
    let _ = print_int (this_integer * this_integer); print_newline ()

And nothing in the tutorial explains why that should be so -- at least not that I could find.

I've also yet to see a really good explanation of when you need to use parentheses for grouping and when you don't -- this is the only part of the language that really feels like a "syntax error" to me. Like, "Look, ocaml is clean and doesn't need to use all this extraneous syntax! Except when it does."

Anyway, good introduction.
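On the parentheses question, one rule covers many of the common cases: function application binds tighter than any infix operator, and arguments are separated by spaces. A minimal sketch, with `square` as a made-up helper:

```ocaml
let square x = x * x

(* Application binds tighter than infix operators. *)
let a = square 3 + 1       (* parsed as (square 3) + 1 *)
let b = square (3 + 1)     (* parentheses needed to square the sum *)

(* An argument that is itself an application needs parentheses. *)
let c = square (square 2)

(* A negative literal needs them too: [square -2] would be parsed as
   the subtraction (square) - (2) and rejected by the type checker. *)
let d = square (-2)
```

This is also why the exercise above writes `print_int (this_integer * this_integer)`: without the parentheses it would be read as `(print_int this_integer) * this_integer`.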

So the reason for this is that OCaml has two kinds of `let`, which is sort of confusing. The first is a top-level definition and the second is part of a normal expression. A top-level `let x = ...` implicitly puts `x` into scope for the rest of the file. If you've seen any of the ";;" in tutorials, that marks the end of a top-level phrase (you mostly need it only in the REPL; in a source file it can almost always be omitted). Inside an expression, you need the `let x = ... in ...` syntax, where the `in` is roughly analogous to a semicolon. So the first version works because both lines are top-level definitions, while in the second the `let ... in` starts an expression, and the `let _ = ...` that follows has no `in`, which is not allowed inside an expression.
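A minimal sketch of the two forms side by side; the names `side` and `square_plus_one` are made up for the example.

```ocaml
(* Top-level definition: no [in]; [side] stays visible below. *)
let side = 7

(* Inside a function body, every local binding needs [in]. *)
let square_plus_one n =
  let sq = n * n in
  sq + 1

(* A common idiom for running side effects at top level: bind the
   result to the unit pattern and use [let ... in] inside. *)
let () =
  let result = square_plus_one side in
  print_int result;
  print_newline ()
```

Here `let side = 7` and `let square_plus_one ...` are definitions, while `let sq = ... in` and `let result = ... in` are expressions that only make sense together with the code after the `in`.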

Okay, I think I get it... And if you're not at top-level, ie you're in a function definition, you actually can't use "let" without an "in".

That is a little weird, but not terrible. Thanks for the explanation.

This would work:

  let _ =
    let this_integer = read_int () in
    print_int (this_integer * this_integer); print_newline ()

I agree part 1 should explain more of the basic syntax. The whole thing is a work in progress so I keep improving it as I get feedback.

In addition to books/tutorials/etc, I've found reading the source to unison [1] was a great way to see a more tangible example of using OCaml to create software

[1] https://github.com/bcpierce00/unison

I loved OCaml as well but the lack of libraries pushed me to another ML variant: F# on .net platform. With .net core and F# tooling around it gradually maturing, it is a solid cross-platform functional language.

Awesome. I've been getting into OCaml lately, and it's great to see new resources. This bodes well for future adoption.

For those looking for a slightly more comprehensive source, check out https://dev.realworldocaml.org/ (no affiliation).

What are the biggest differences between OCaml and F#? If I wanted to learn/use an ML-style language in 2018, for what purposes should I use or learn one over the other?

F# is very much focused on gaining real-world usage by making it compatible with .NET. It has a lot of features that heavily target this, so for example the object system is built to be compatible with the .NET style of classes/objects (but sadly, they ditched the OCaml way of doing this, which IMO is much better). This introduces many inconsistencies in the type system and some clumsy syntax, which is pretty much a deal-breaker for me. However, these features are very specific to more advanced parts of the language. If FP+.NET or library support is a must for you, F# is probably the best choice by far. Also, F# does a great job at interfacing with the GUI parts of .NET, so if you need GUI (or Windows support) it is definitely the right choice. The core language is still quite elegant and a joy to use.

I think, though, that if you have the choice not to use F#, OCaml is generally nicer to use. It offers a ton of features that I find myself missing in F#, like a much more powerful module system, a more advanced approach to OOP, a cool macro system, and so many more (polymorphic variants, better exceptions/soon to be an effects system, ...). In general, I find the syntax to be slightly more predictable for OCaml, but a lot of people still prefer F#'s syntax (which is indentation-based, btw) so I wouldn't focus on that point.

Overall, I think they fit different niches. I've had to use F# quite a few times and it's by no means a bad language, but after using OCaml extensively I feel disappointed by F# a little bit like how I'd feel disappointed if I had to write a project in Java.

OCaml was a required course in school for me. It was immensely useful and allowed me to think differently about programming (I only knew Java, C, and C++ at the time).

Super late to the party but, if you love OCaml and ML languages, check out Elixir. It runs on the Erlang VM and thus inherits all its libraries. It takes more than a page from patterns and conditional method bodies. Love it.

Like Erlang, Elixir is still a "dynamic" language with no static types. It would have been great if it had introduced ML semantics to the Erlang/BEAM world, but this would have been tough to retrofit.

For those considering learning OCaml, I thought I'd share my (admittedly biased) thoughts about the language as a long-time user (well... roughly 4.5 years).

Out of all the languages I've learned, OCaml is one of the few that I would consider to be a "sweet spot" language. A lot of people seem to have one language that they tend to fall back to when they're not sure what else to use because they find it most practical, whether or not they enjoy using it as a language. Out of the languages I know, it's pretty much between Java and OCaml for me, with OCaml being much more ergonomic. Writing OCaml code is much more relaxing than most other languages I've encountered because everything is quite predictable once you know the core language, but it features a multitude of tools that you can use to approach any task (OOP, modules, functional programming, imperative, low-level, high-level, metaprogramming, etc.). I also think that OPAM is one of the best language package managers around (for example, it comes with native support for having multiple copies of the OCaml toolchain installed in parallel). Finally, Reason+BuckleScript have become really nice and for web programming I think they offer one of the best options.

There's still a few things that are far from perfect, though. OCaml still lacks an equivalent to Haskell's typeclasses and that makes designing good generic libraries a pain (it's still possible using modules+functors, but it takes a little boilerplate because it's not implicit). As a side effect of this, the standard library is pretty fragmented between the official one (aka "things we used to write the compiler so you can keep them if you want"), the Batteries library (which is essentially what the standard library would be if the official one was "finished"), and Jane Street's Core library (which replaces the standard library altogether). The problem is that this extends into basically all of OCaml. For such a small language, there's little room for all the competition and that means that a lot of libraries either don't exist or aren't actively maintained. That said, most libraries are a breeze to implement and for the real-world code that I do write, they are hardly a distraction 99% of the time. The only other downside is that OPAM isn't compatible with NPM, which means that BuckleScript (the OCaml->JS compiler) has a totally separate ecosystem from the native compiler.
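To show what the modules+functors boilerplate looks like, here is a minimal sketch of a `Show`-style type class encoded by hand. The `SHOW` signature and all module names are invented for the example and do not come from the standard library, Batteries, or Core.

```ocaml
(* What Haskell would call a class: a signature listing the operations. *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* Instances are ordinary modules. *)
module Show_int : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

module Show_bool : SHOW with type t = bool = struct
  type t = bool
  let show = string_of_bool
end

(* A functor derives a list instance from an element instance. *)
module Show_list (S : SHOW) : SHOW with type t = S.t list = struct
  type t = S.t list
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

(* The boilerplate: each instance has to be built and named by hand. *)
module Show_int_list = Show_list (Show_int)

let () = print_endline (Show_int_list.show [1; 2; 3])
```

In Haskell the compiler would pick the list-of-int instance on its own; here the caller has to apply the functor and refer to `Show_int_list` explicitly, which is the extra step Modular Implicits is meant to remove.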

Tl;dr, if you're looking for a very general language to learn and don't care about having to implement your own libraries, OCaml is one of the best choices out there. If you're doing JS programming, it's worth taking a look into ReasonML/BuckleScript.

If you're interested, feel free to ask me any questions about the language/ecosystem/learning.

How do you feel about the progress of OCaml since you started using it?

What is your general prognosis for the future?

Are you by chance getting paid to write OCaml? If so, how hard was it to find opportunities?

1. It's definitely been slower than I would have liked. A lot of developments have been hyped up a lot, when in fact they are years away. Modular Implicits (which approximates Haskell's type classes) and Multicore have been in the works since only a little bit after I started using the language. At the time, they were kind of described to me as being "right around the corner" but failed to appear. There have certainly been lots of good additions to the language apart from this (e.g. we got a great inliner called Flambda), but it seems like it's still gonna be >=6 months before Multicore lands and possibly a year or more before Modular Implicits makes its way into the mainline compiler.

2. I think it's kind of a bittersweet future. I think growth is decelerating for OCaml because of ReasonML/BuckleScript making it really easy to get started with OCaml and use web libraries. This is great because it brings exposure to the language, but I think it's caused a number of people to switch over from native to web, which means abandoning some of the OCaml libraries for their web versions. OCaml has a great native backend and it's really a shame for me to see it fade away, but I think in the long-run that's probably going to happen. Still, I think popularity of Reason will continue to go up and that will bring OCaml into the mainstream for hopefully long enough to garner some attention and get people to take the native backend seriously.

3. I'm currently working for equity at a very small startup I co-founded but we're using ReasonML for the web side of things. Previously, I was at an internship where I had a lot of freedom to use whatever I wanted and was paid to write some software in OCaml. I think finding a job writing OCaml is rather difficult, but introducing it in a job (especially for internal tools or web development with Reason) is much, much easier. The "big" OCaml jobs seem to all be at companies where it's really hard to find employment (Jane Street, Facebook, Bloomberg), but occasionally you'll see some smaller ones pop up online.

Thanks for the detailed and thoughtful answers.

I'm going to keep hacking away at OCaml, but measure my expectations a little and perhaps start looking into Reason as well.

A couple questions wrt Reason:

Do you find it easy to switch between OCaml / Reason? I personally find the syntax of Reason a little unnecessary.

Do you use ReasonReact or some other framework -- perhaps the Elm-like 'tea' package?


more like introduction to getting a job at Jane Street

I interviewed at Jane Street years ago, back when I was quite active in the OCaml community, got to the on-site rounds but didn't get an offer (a lucky escape it turns out, tho' I was pretty disappointed at the time). Anyway - OCaml wasn't even mentioned during their process. I think they just assume they'll have to teach it anyway.

> a lucky escape it turns out

How so?

> How so?

Well, I didn't even think to ask about this, but a friend of mine turned down their offer because they have no remote working. Like me, he didn't actually WFH that often but the lack of an option to do so occasionally was a deal-breaker for him, and it would have been massively inconvenient for me too. I ended up joining a very similar firm who were a bit more relaxed about it (who worked in C++ and Python mostly).

Would you mind sharing the company's name? In terms of size, engineering culture and industry, I can only think of Two Sigma as being similar to Jane Street.

I've already doxxed myself enough for one day ;-)

where's the PDF version?

The wonderful thing about a (relatively) plaintext site like this is you can 'print' it into whatever random format you want.

It will appear when (and I hope it's when rather than if) I complete the series. Then I'll typeset it properly and release the source under CC-BY-SA or another free license and make PDFs from it.
