Interesting. As I understand it, this lands an approach in which sequential consistency is not guaranteed, but if you have a data race you get even nicer guarantees than in Java, and its authors believe this might be enough that real programmers working on real software can actually debug data races despite the loss of sequential consistency.
In many popular languages today (e.g. C++), if you have a data race you're completely screwed (e.g. the program exits successfully even though it was supposed to loop forever serving TCP requests; good luck figuring out why). In Java they decided that's not acceptable, so the effects of a data race are constrained to only the data touched by the race, and, importantly, that data is still a legal value; it just might be astonishing (e.g. you were adding several small positive integers together from a shared data structure in parallel, but due to a misunderstanding in your design this was actually a data race, and some time later your total is somehow zero, but you won't crash or anything).
OCaml intends to further constrain the consequences in time: if the total was 114 when you stopped adding, it will still be 114 later; it won't mysteriously become zero (or any other value) thanks to a data race which must have happened before you checked.
[I'm sure I have some details wrong, but this is the gist]
What remains to be seen is: is that enough? There was great hope when Java made its rules that they would be enough, and that programmers could understand what was wrong in a Java program with data races; as I understand it, that did not pan out. So it seems to me that it's possible OCaml ends up in the same situation.
> e.g. you were adding several small positive integers together from a shared data structure in parallel, but due to a misunderstanding in your design this was actually a data race, and some time later somehow your total is now zero
Could you clarify why/how this would happen? Is it because the last process sets it to zero to initialize a variable?
I can’t conceptualize the steps that would end up with this result if all processes are adding. It seems like it would at least equal the result of the final thread’s calculation.
Non-aligned data structures being written to, e.g. if you're adding two nibbles and writing the result to the first part of a byte, or writing a byte into a packed struct.
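To make the grandparent's scenario concrete: here is a minimal OCaml 5 sketch (the function names are mine, not from the thread) of the everyday lost-update flavour of this race, contrasting a plain `ref` with `Atomic`:

```ocaml
(* Sketch (OCaml 5): two domains each bump a shared counter 100_000 times.
   With a plain ref, read-increment-write is not one indivisible step, so
   one domain's write can overwrite the other's: the total comes up short.
   Atomic.incr makes each increment indivisible, so the total is exact. *)
let racy_total () =
  let c = ref 0 in
  let work () = for _ = 1 to 100_000 do incr c done in
  let d1 = Domain.spawn work and d2 = Domain.spawn work in
  Domain.join d1; Domain.join d2;
  !c                       (* frequently < 200_000: lost updates *)

let atomic_total () =
  let c = Atomic.make 0 in
  let work () = for _ = 1 to 100_000 do Atomic.incr c done in
  let d1 = Domain.spawn work and d2 = Domain.spawn work in
  Domain.join d1; Domain.join d2;
  Atomic.get c             (* always 200_000 *)
```

The total-becomes-zero scenario upthread is a more exotic outcome, but lost updates like these are the common version of the same bug.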
I believe the less scary races in Java imply a lower performance upper bound. Do the nicer guarantees in multicore OCaml have the same drawback, i.e. an even lower bound than Java's?
The performance numbers are in the paper abstract: "our evaluation demonstrates that it is possible to balance a comprehensible memory model with a reasonable (no overhead on x86, ~0.6% on ARM) sequential performance trade-off in a mainstream programming language". It's a little higher on PowerPC but still very usable, and RISC-V overheads should be roughly comparable to ARM.
As the paper clarifies, this is for the performance of non-atomic read/writes in sequential setting. The paper left the performance evaluation for atomic read/writes to future work. Is there any indication yet regarding the performance in a parallel setting compared to weaker guarantees?
That's right -- we first wanted to establish that existing OCaml code wouldn't be adversely impacted. There are various ongoing efforts to build interesting (locked and lock-free) concurrent data structures, and those will inform the parallel performance results. Nothing published yet.
If you use the Atomic* classes and the concurrent data structures available in the standard library, Java concurrency should not be an issue for any non-junior developer.
If you're interested in seeing a comparison of parallel programming in a number of functional languages, check this repo out [0]. It includes multicore OCaml, parallel MLton (but not Poly/ML, which has been around and parallel longer), Haskell, Futhark, F#, Scala, and Rust.
Credit to Sam Westrick for turning me on to this [1].
One thing that I think is missing is compilation time, which can vary widely between languages. Other than that it's a nice repo, thanks for sharing it!
I wondered the same, but it seems the multicore OCaml implementation of those benchmarks was last touched 2 years ago, and the top level README with the benchmark results was also last updated for the OCaml results around that same time. So maybe just rerunning those tests with the latest version would give better results?
Great roundup, thanks! As I write this, 6 hours after this very-long-anticipated story was at the top of the front page, it's now down on page 3 with 299 upvotes. Not complaining, just find the ranking system to be so inscrutable at times like this. If I hadn't checked the site this morning, I'd have missed it. With a story history like this, I'd think it reasonable for the story to ride the front page all day. Of course, computers only do what one tells them to do, and ranking systems are hard...
This topic had a major thread just a few weeks ago. We try to avoid too much repetition here, because it's not good for curiosity.
Popular projects often get submissions of every step of the release lifecycle, from major commits, to PRs, to merges into main branches, to alpha releases and beta releases and GA releases. These are all significant milestones for people who care about the project, and the project is deservedly popular! But from an HN point of view, they are distinctions without differences, because the underlying topic of discussion is always the same—the project in general.
Been reading about this work for years, really cool to see it get merged! I recently came across a neat interview which describes upcoming changes to the OCaml GC and memory management system, which apparently led to huge performance gains:
What's next after this? There are a lot of different projects ongoing for OCaml (typed effects, modular explicits/implicits, all of the RFCs, flambda and flambda 2, and I'm probably forgetting a lot of them). However, while following the progress of OCaml Multicore was easy thanks to the monthly posts, it's a bit harder to know where the focus will be next. Is there something like a "state of OCaml" that's planned, or a roadmap?
Congratulations on the merge, and thank you for all the work that went into it!
A few of those projects are still at the research stage (typed effects, modular implicits, unboxed types), with at most a handful of people working on them.
It is hard or even counter-productive to try to fit them on a roadmap.
The multicore project is (was?) a bit unusual from the point of view of OCaml development, since it is a massive engineering effort with a focused team.
Nevertheless, I hope that we continue and extend the "OCaml compiler bimonthly" news to give more information about the ongoing work on the compiler.
Thank you for the explanation. I too hope you will continue the OCaml compiler bimonthly; it's always an interesting read, and it's nice to be up to date with what's happening.
What were the main technical challenges? Why did it take so many years? Did it require a research breakthrough, or was it something else (organisational issues, technical debt, ...)?
It involved a fair amount of research and there were many technical challenges. Maintaining backwards compatibility in terms of language features and single-threaded performance puts constraints around what you can implement. Also from my personal experience, debugging segfaults in parallel programs that could be caused by GC bugs is a painstaking process.
Added to that is the complexity of tracking a moving target. Multicore had to be rebased through 12 releases of OCaml, which in itself was a non-trivial amount of work.
With the merging to trunk, Multicore doesn't really exist as a separate project. The last major milestone that is likely to have a multicore-dominant discussion thread is probably the 5.0 release.
Hah. I only contributed to the project for about half its life.
There's still a lot of work to be done before 5.0 is released, it's just it'll happen through the usual PR and review process rather than a separate one.
Not a stupid question. I think adding parallelism to the compiler might be possible, though there are certainly parts that use a lot of mutable state that could prove tricky.
Yes! https://v3.ocaml.org/users lists some of the industrial users. The list might not be as large as, say, Rust's, but it's more than just Jane Street :)
Personally, I've also been lucky enough to have worked for two different employers (not in the finance or compiler space) over the past 2 years where I've mostly used OCaml for writing things that many people might consider "boring", namely lots of web services, database drivers, data processing, etc.
It was very similar to the Python story: native threads, but there is a global lock on the OCaml runtime. You can fire off a thread and have it stay busy in external code (e.g. in a C library), but only one thread can be running OCaml code at a time.
For this reason (and others), forking child processes has been a common alternative on many OCaml projects.
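That pattern looks roughly like this (a sketch; `run_in_children` is a made-up helper, built on the stdlib `Unix` module):

```ocaml
(* Sketch: fan work out to child processes with Unix.fork, the common
   pre-multicore alternative. Each child gets its own OCaml runtime,
   so there is no shared runtime lock to contend on. *)
let run_in_children n_workers job =
  let pids =
    List.init n_workers (fun i ->
        match Unix.fork () with
        | 0 -> job i; exit 0    (* child: do its share of the work *)
        | pid -> pid)           (* parent: remember the child's pid *)
  in
  (* Parent waits for all children before returning. *)
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids
```

Results come back to the parent via IPC (pipes, files, shared memory), which libraries such as Parmap wrap up for you.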
See here for an update re: the whole multicore initiative, and links to more information:
I would probably still fork child processes and use IPC until there is something similar to channels, and often even then, because signal handling and forking in a multithreaded program is quite a hurdle.
Even in Rust, no one really knows at this point what is safe to do in a fork from a multithreaded program, or in a signal handler in one. Signal handlers and forks are thus simply “unsafe” in Rust with “care must be taken”, but there is no real explanation of what care, and Rust does not document or stabilize which of its functions are async-signal-safe, as C does.
If I were a Rust maintainer, I wouldn't say what was safe either. :) This is an OS issue, not a language/runtime one. For example, POSIX has a lot to say about what is safe to do, and what isn't, after a fork() without exec() [1] and your OS of choice may have more to add. As you've said, it's perilous and messy.
I certainly know the underlying reasons, but Rust aspires to be a language with clear guarantees for what is safe, and what is not.
POSIX C documents which functions are async-signal-safe, and specifies that only those functions may be used in certain contexts, lest there be undefined behavior. Rust at this moment does not document which of its functions are safe to use in such contexts, and I'm not sure what the situation with OCaml is.
Sharing state between threads in a language such as C is complex enough, but in a managed language there seem to be few guarantees. Go and Rust were designed with this in mind and fundamentally steer you toward either sharing data by way of channels, or by way of data structures whose reads and writes are race-free.
The received wisdom is that the only safe fork is the one followed immediately by an exec(e,l,p,v). If you can't exec(), then keep your design as simple as possible, consult your OS manual, and address your failure modes.
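In OCaml terms, the fork-then-exec-immediately shape is small enough to show in full (a sketch; `spawn` is a hypothetical helper built on the stdlib `Unix` module):

```ocaml
(* Sketch: the only work the child does between fork and exec is the
   exec itself. If exec fails, the child exits immediately rather than
   touching any (possibly inconsistent) post-fork runtime state. *)
let spawn prog args =
  match Unix.fork () with
  | 0 ->
      (try Unix.execvp prog (Array.of_list (prog :: args))
       with _ -> exit 127)   (* exec failed: bail out, do nothing else *)
  | pid -> pid               (* parent: reap later with Unix.waitpid *)
```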
Asking any compiler to prove that your program's fork() is safe is asking an awful lot. :) If you're referring to Rust's notion of safe/unsafe code, "unsafe" explicitly refers to unsafe memory operations. It won't cover operational issues such as "threads won't be cloned" or "signal handlers will get dropped" during a fork. (Even if they were tracked: are those behaviours safe, or unsafe? That depends on the semantics of the program.)
Ada was another language designed with safety in mind. They chose to shove fork() into an "Unsafe POSIX Primitives" package -- in other words, use these at your own risk. I think this was a good call.
> The received wisdom is that the only safe fork is the one followed immediately by an exec(e,l,p,v). If you can't exec(), then keep your design as simple as possible, consult your OS manual, and address your failure modes.
But this is not true at all: calling any async-signal-safe function is safe, including exec, which is simply one such function.
Furthermore, Rust's standard library does not even expose exec directly, but it does expose a function to call arbitrary code before exec; that function is marked as unsafe, and the only documentation is that “care must be taken”.
For instance, socket activation in the launchd style that systemd and socketd use must execute various socket-related operations after forking and before exec. Those operations are async-signal-safe, so it is possible to do this from a multithreaded program, but in Rust there are no guarantees about whether various functions dealing with sockets are async-signal-safe. They might call malloc or use mutexes internally; one has no idea.
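For concreteness, that socket-activation shape in OCaml might look like this (a sketch with a hypothetical `activate` helper; the underlying dup2 and execv syscalls are on POSIX's async-signal-safe list):

```ocaml
(* Sketch: after fork, duplicate the listening socket onto a fixed fd
   (here fd 0, as launchd/systemd-style activation does), then exec the
   service. Only async-signal-safe syscalls run between fork and exec. *)
let activate listen_fd prog =
  match Unix.fork () with
  | 0 ->
      Unix.dup2 listen_fd Unix.stdin;  (* hand the socket to the child *)
      (try Unix.execv prog [| prog |]
       with _ -> exit 127)
  | pid -> pid                          (* parent keeps serving / reaps *)
```

The parent comment's complaint is exactly about the step in the middle: whether a given runtime's dup2 wrapper is itself async-signal-safe is the undocumented part.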
> Asking any compiler to prove that your program's fork() is safe is asking an awful lot.
I'm not asking that at all, and in fact it's not difficult. All it needs is a trait for async-signal safety on functions, such that any function marked async-signal-safe can only call other functions that are too. That is all it takes to do it safely.
But I'm not asking for a compiler proof; I'm simply asking that Rust document which functions are async-signal-safe and which are not, so that programmers can do it in unsafe code.
> Ada was another language designed with safety in mind. They chose to shove fork() into an "Unsafe POSIX Primitives" package -- in other words, use these at your own risk. I think this was a good call.
And they document which functions are async-signal-safe; Rust has no documentation nor guarantees about this.
I'm sorry if I misunderstood you. I was talking about POSIX semantics of fork(), but you're talking about a Rust documentation issue re: async-signal safety. I didn't catch that earlier! I hope the issue gets sorted out for you.
> There is no description of what the PR does which is pretty bad form.
The description is in the first paragraph of the PR body text:
> This PR adds support for shared-memory parallelism through domains and direct-style concurrency through effect handlers (without syntactic support). It intends to have backwards compatibility in terms of language features, C API, and also the performance of single-threaded code.
OCaml had threads. There is even a Thread module in the standard library. The limitation was that only one thread could run at any given time.
P.S. There were ways around it, like calling an external C function which would spawn a new thread, do the work, and pass control back, but those were almost never used.
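A sketch of what that model looked like in practice: the Thread module gives you real OS threads, but only one runs OCaml code at a time; blocking calls such as I/O or `Unix.sleep` release the runtime lock, so threads still helped for concurrency, just not for parallelism:

```ocaml
(* Sketch (pre-5.0 model): a worker thread blocked in Unix.sleep
   releases the runtime lock, so the main thread keeps running OCaml
   code meanwhile. Build against the threads.posix library. *)
let () =
  let worker =
    Thread.create (fun () -> Unix.sleep 1; print_endline "worker done") ()
  in
  print_endline "main keeps going";
  Thread.join worker
```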
Ish. OCaml didn't have any constructs in the language itself for parallelism. You could fork processes, and libraries did/do exist for writing parallel code in OCaml.
Note that Eio is more than just a direct replacement for Lwt and Async. We couldn't resist using some of the experience also gained from the MirageOS (mirage.io) unikernel framework in Eio. This means that the backends are highly optimised to use the best syscalls available in the OS (e.g. io_uring by default on Linux). If you write your applications to use Eio natively, then performance is very high so far. The ergonomics of programming in it also compare favourably to using monadic concurrency.
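For a flavour of that direct style, here is a minimal sketch (assuming Eio's documented entry points, `Eio_main.run`, `Fiber.both`, and `traceln`, and the `eio_main` package installed):

```ocaml
(* Sketch: two fibers run concurrently in direct style -- no monadic
   bind operators, just ordinary OCaml control flow. *)
open Eio.Std

let () =
  Eio_main.run @@ fun _env ->
  Fiber.both
    (fun () -> traceln "fiber one")
    (fun () -> traceln "fiber two")
```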
I just lurk the OCaml forums, so don't take what I say as gospel, but from what I've understood Eio will _not_ be part of the stdlib. The OCaml devs go out of their way not to break things in the stdlib, so from what I've gathered they don't want to include Eio until there's more real-world experience with it and the API has started to settle down. Including it in the stdlib might also never happen, but that remains to be seen.
It's too early to say. Eio is still under very active development, so we'll have to see how it goes. The more useful feedback we get on it now, the more likely it'll be submitted for consideration in the OCaml stdlib when appropriate.
Along with the graphs from the PR in the sibling comment, there's also the extensive benchmarking from the ICFP2020 paper: https://arxiv.org/pdf/2004.11663.pdf
In short the expectation should be that single-threaded code performs roughly the same (single digit percentage changes) as on the sequential runtime.
Parallel code on multicore can see close to linear speedups on 64 cores, though it depends significantly on your workload. If you're interested in parallelising existing OCaml code, I gave an example-driven OCaml workshop talk in 2020: https://www.youtube.com/watch?v=Z7YZR1q8wzI
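As a tiny illustration of the kind of parallelisation involved (a sketch with a hypothetical `par_sum` helper, using OCaml 5 Domains):

```ocaml
(* Sketch: split a CPU-bound sum of 1..n across n_domains domains and
   combine the per-domain partial sums. Actual speedup depends heavily
   on the workload and the number of physical cores. *)
let par_sum n_domains n =
  let chunk = n / n_domains in
  let worker i () =
    let lo = i * chunk + 1 in
    let hi = if i = n_domains - 1 then n else (i + 1) * chunk in
    let s = ref 0 in
    for j = lo to hi do s := !s + j done;
    !s
  in
  List.init n_domains (fun i -> Domain.spawn (worker i))
  |> List.map Domain.join
  |> List.fold_left ( + ) 0
```

Each domain touches only its own accumulator, so there is no shared mutable state and no race; only the joined results are combined.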
No, typed effects are still in the "research project with open design questions" category. It doesn't really make sense to have roadmaps for such open-ended features.
In other words, I am keen to not reproduce the announcement that "multicore might be in 4.03" from 5 years ago.