OCaml Multicore merged upstream (github.com/ocaml)
320 points by sadiq on Jan 10, 2022 | 74 comments



Interesting. As I understand it, this lands an approach in which sequential consistency is not guaranteed, but if you have a data race you get even nicer guarantees than in Java, and its authors believe this might be enough that real programmers working on real software can actually debug data races despite the loss of sequential consistency.

In many popular languages today (e.g. C++), if you have a data race you're completely screwed (e.g. the program exits successfully even though it was supposed to loop forever serving TCP requests - good luck figuring out why). In Java they decided that wasn't acceptable, so data races are constrained to only the data touched by the race and, importantly, that data still holds a legal value, it just might be astonishing (e.g. you were adding several small positive integers together from a shared data structure in parallel, but due to a misunderstanding in your design this was actually a data race, and some time later your total is somehow zero - but you won't crash or anything).

OCaml intends to further constrain the consequences in time: if the total was 114 when you stopped adding, it will still be 114 later; it won't mysteriously become zero (or any other value) thanks to a data race that must have happened before you checked.
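
To make that concrete, here's a rough sketch in OCaml 5 terms (my own illustration, not taken from the paper; Domain and Atomic are the new stdlib modules, and the exact numbers you observe will vary):

    (* Two domains add into a plain, non-atomic ref: this is a data race. *)
    let racy_total () =
      let total = ref 0 in
      let add () = for _ = 1 to 1_000_000 do total := !total + 1 done in
      let d1 = Domain.spawn add and d2 = Domain.spawn add in
      Domain.join d1; Domain.join d2;
      (* Updates can be lost, so !total will usually be well below
         2_000_000 -- but, as I understand the model, it is always a value
         that some domain actually wrote, and once the domains have joined
         it stays put; it won't later "become" something else. *)
      !total

    (* The race-free version uses Atomic.t from the new stdlib module. *)
    let atomic_total () =
      let total = Atomic.make 0 in
      let add () = for _ = 1 to 1_000_000 do Atomic.incr total done in
      let d1 = Domain.spawn add and d2 = Domain.spawn add in
      Domain.join d1; Domain.join d2;
      Atomic.get total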

[I'm sure I have some details wrong, but this is the gist]

What remains to be seen is: is that enough? There was great hope, when Java made its rules, that they were enough and programmers could understand what was wrong in a Java program with data races; as I understand it, that did not pan out. So it seems to me it's possible OCaml ends up in the same situation.


> e.g. you were adding several small positive integers together from a shared data structure in parallel, but due to a misunderstanding in your design this was actually a data race, and some time later somehow your total is now zero

Could you clarify why/how this would happen? Is it because the last process sets it to zero to initialize a variable?

I can’t conceptualize the steps that would end up with this result if all processes are adding. It seems like it would at least equal the result of the final thread’s calculation.


Non-aligned data structures being written to. E.g. if you're adding two nibbles and writing the result to the first part of a byte, or writing a byte in a packed struct.


I believe the less scary races in Java imply a lower upper bound on performance. Do the nicer guarantees in Multicore OCaml have the same drawback, i.e. an even lower bound than Java's?


(co-author of the OCaml memory model paper here)

The details of the 'LDRF' (local data race freedom) property are described in detail here: https://anil.recoil.org/papers/2018-pldi-memorymodel.pdf

The performance numbers are in the paper abstract: "our evaluation demonstrates that it is possible to balance a comprehensible memory model with a reasonable (no overhead on x86, ~0.6% on ARM) sequential performance trade-off in a mainstream programming language". It's a little higher on PowerPC but still very usable, and RISC-V overheads should be roughly comparable to ARM.


As the paper clarifies, this is for the performance of non-atomic reads/writes in a sequential setting. The paper left the performance evaluation of atomic reads/writes to future work. Is there any indication yet of the performance in a parallel setting compared to weaker guarantees?


That's right -- we first wanted to establish that existing OCaml code wouldn't be adversely impacted. There are various ongoing efforts to build interesting (locked and lock-free) concurrent data structures, so those will inform the parallel performance results. Nothing published yet.


Does this feature inhibit any kind of optimization that current compilers could perform?


According to the paper “any compiler optimisation that breaks the load-to-store ordering is disallowed.”


If you use the Atomic* classes and the available concurrent standard library data structures, Java concurrency should not be an issue for any non-junior developer.


If you're interested in seeing a comparison of parallel programming in a number of functional languages, check this repo out [0]. It includes Multicore OCaml, parallel MLton (but not Poly/ML, which has been around and parallel for longer), Haskell, Futhark, F#, Scala, and Rust.

Credit to Sam Westrick for turning me on to this [1].

[0] https://github.com/athas/raytracers

[1] https://twitter.com/shwestrick/status/1480587660691480579


One thing I think is missing is compilation time, which can vary widely between languages. Other than that it's a nice repo, thanks for sharing it!


Multicore OCaml seems quite slow compared to the others. I wonder how much low-hanging fruit there is in the backend to speed those numbers up a bit.


I wondered the same, but it seems the multicore OCaml implementation of those benchmarks was last touched 2 years ago, and the top level README with the benchmark results was also last updated for the OCaml results around that same time. So maybe just rerunning those tests with the latest version would give better results?


From three weeks ago:

PR to Merge Multicore OCaml - https://news.ycombinator.com/item?id=29638152 - Dec 2021 (155 comments)

Past related threads:

Multicore OCaml: October 2021 - https://news.ycombinator.com/item?id=29238972 - Nov 2021 (12 comments)

Effective Concurrency with Algebraic Effects in Multicore OCaml - https://news.ycombinator.com/item?id=28838099 - Oct 2021 (59 comments)

Multicore OCaml: September 2021, effect handlers will be in OCaml 5.0 - https://news.ycombinator.com/item?id=28742033 - Oct 2021 (3 comments)

Multicore OCaml: September 2021 - Effect handlers will be in OCaml 5.0 - https://news.ycombinator.com/item?id=28719088 - Oct 2021 (3 comments)

Adapting the OCaml Ecosystem for Multicore OCaml - https://news.ycombinator.com/item?id=28440385 - Sept 2021 (1 comment)

Adapting the OCaml Ecosystem for Multicore OCaml - https://news.ycombinator.com/item?id=28373155 - Aug 2021 (21 comments)

Multicore OCaml: July 2021 - https://news.ycombinator.com/item?id=28039219 - Aug 2021 (14 comments)

Multicore OCaml: May 2021 - https://news.ycombinator.com/item?id=27480678 - June 2021 (27 comments)

Multicore OCaml: April 2021 - https://news.ycombinator.com/item?id=27140522 - May 2021 (89 comments)

Multicore OCaml: Feb 2021 with new preprint on Effect Handlers - https://news.ycombinator.com/item?id=26424785 - March 2021 (29 comments)

Multicore OCaml: October 2020 - https://news.ycombinator.com/item?id=25034538 - Nov 2020 (9 comments)

Multicore OCaml: September 2020 - https://news.ycombinator.com/item?id=24719124 - Oct 2020 (43 comments)

Parallel Programming in Multicore OCaml - https://news.ycombinator.com/item?id=23740869 - July 2020 (15 comments)

Multicore OCaml: May 2020 update - https://news.ycombinator.com/item?id=23380370 - June 2020 (17 comments)

Multicore OCaml: March 2020 update - https://news.ycombinator.com/item?id=22727975 - March 2020 (37 comments)

Multicore OCaml: Feb 2020 update - https://news.ycombinator.com/item?id=22443428 - Feb 2020 (80 comments)

State of Multicore OCaml [pdf] - https://news.ycombinator.com/item?id=17416797 - June 2018 (103 comments)

OCaml-multicore now at 4.04.2 - https://news.ycombinator.com/item?id=16646181 - March 2018 (4 comments)

A deep dive into Multicore OCaml garbage collector - https://news.ycombinator.com/item?id=14780159 - July 2017 (89 comments)

Lock-free programming for the masses - https://news.ycombinator.com/item?id=11907584 - June 2016 (29 comments)

Lock-free programming for the masses - https://news.ycombinator.com/item?id=11893911 - June 2016 (4 comments)

OCaml 4.03 will, “if all goes well”, support multicore - https://news.ycombinator.com/item?id=9582980 - May 2015 (113 comments)

Multicore OCaml - https://news.ycombinator.com/item?id=8003699 - July 2014 (1 comment)


Great roundup, thanks! As I write this, 6 hours after this very-long-anticipated story was at the top of the front page, it's now down on page 3 with 299 upvotes. Not complaining, just find the ranking system to be so inscrutable at times like this. If I hadn't checked the site this morning, I'd have missed it. With a story history like this, I'd think it reasonable for the story to ride the front page all day. Of course, computers only do what one tells them to do, and ranking systems are hard...


This topic had a major thread just a few weeks ago. We try to avoid too much repetition here, because it's not good for curiosity.

Popular projects often get submissions of every step of the release lifecycle, from major commits, to PRs, to merges into main branches, to alpha releases and beta releases and GA releases. These are all significant milestones for people who care about the project, and the project is deservedly popular! But from an HN point of view, they are distinctions without differences, because the underlying topic of discussion is always the same—the project in general.

I wrote a detailed explanation about just this sort of situation here: https://news.ycombinator.com/item?id=23071428.

Further explanations at these links:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...


Thank you!


Been reading about this work for years, really cool to see it get merged! I recently came across a neat interview which describes upcoming changes to the OCaml GC and memory management system, which apparently lead to huge performance gains:

https://signalsandthreads.com/memory-management/


The "huge performance gains" refers to the prefetching optimisation, which is already in OCaml (the old OCaml 4 branch, not in multicore yet).


As usual I'm happy to answer any questions I can.

(for maybe the second to last time)


What's next after this? There are a lot of different projects ongoing for OCaml (typed effects, modular explicits/implicits, all of the RFCs, flambda and flambda 2, and I'm probably forgetting a lot of them). However, while following the progress of OCaml Multicore was easy thanks to the monthly posts, it's a bit harder to know where the focus will be next. Is there something like a "state of OCaml" that's planned, or a roadmap?

Congratulations on the merge, and thank you for all the work that went into it!


A few of those projects (typed effects, modular implicits, unboxed types) are still at the research stage, with at most a handful of people working on them.

It is hard or even counter-productive to try to fit them on a roadmap.

The multicore project is (was?) a bit unusual from the point of view of OCaml development, since it is a massive engineering effort with a focused team.

Nevertheless, I hope that we continue and extend the "OCaml compiler bimonthly" news to give more information about the ongoing work on the compiler.


Thank you for the explanation. I too hope you will continue the OCaml compiler bimonthly; it's always an interesting read, and it's nice to be up to date with what's happening.


For the not-so-OCaml-savvy, can you recommend a blog post that explains what programming constructs this adds?

Maybe a before and after, but one that focuses on expressiveness: what you were not able to say/express in OCaml before that is now possible - not just speed or performance.


Good question!

https://github.com/ocaml-multicore/effects-examples has links to tutorials and examples for how effects can be used.

There are also slides from KC's talk on effect handlers (https://kcsrk.info/slides/handlers_edinburgh.pdf) and materials from the CUFP 17 tutorial: https://github.com/ocamllabs/ocaml-effects-tutorial

This is also a great introduction: https://gopiandcode.uk/logs/log-bye-bye-monads-algebraic-eff...
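
As a tiny taste of the construct itself, here's a minimal effect-and-handler sketch (module and identifier names were still being settled at the time of the merge - the stdlib file is effectHandlers.ml - so treat the exact names below as approximate):

    open Effect
    open Effect.Deep

    (* Declare a new effect: asking the surrounding context for an int. *)
    type _ Effect.t += Ask : int Effect.t

    (* Direct-style code that performs the effect -- no monads, no callbacks. *)
    let comp () = perform Ask + perform Ask

    (* A handler that interprets Ask by resuming the suspended computation
       (the continuation k) with a concrete value. *)
    let result =
      try_with comp ()
        { effc = (fun (type a) (eff : a Effect.t) ->
            match eff with
            | Ask -> Some (fun (k : (a, _) continuation) -> continue k 21)
            | _ -> None) }

    (* result = 42.  The same mechanism lets a scheduler suspend a fiber on
       an I/O effect and resume it later, which is what direct-style IO
       libraries build on. *)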


What were the main technical challenges? Why did it take so many years? Did it take a research breakthrough, or was it something else (organisational issues / technical debt...)?


It involved a fair amount of research and there were many technical challenges. Maintaining backwards compatibility in terms of language features and single-threaded performance puts constraints around what you can implement. Also from my personal experience, debugging segfaults in parallel programs that could be caused by GC bugs is a painstaking process.

We've written a couple of papers detailing the internals and the trade-offs involved: https://arxiv.org/abs/2004.11663 (for parallelism) and https://arxiv.org/abs/2104.00250 (for effects)

Added to that is the complexity of tracking a moving target. Multicore had to be rebased through 12 releases of OCaml, which in itself was a non-trivial amount of work.


Perhaps not the question you expected, but ..

> (for maybe the second to last time)

How come?


With the merging to trunk, Multicore doesn't really exist as a separate project. The last major milestone that is likely to have a multicore-dominant discussion thread is probably the 5.0 release.


As someone who has worked so long on the project, how do you feel? Relieved?


Hah. I only contributed to the project for about half its life.

There's still a lot of work to be done before 5.0 is released; it's just that it'll happen through the usual PR and review process rather than a separate one.


Maybe a bit of a stupid question, but will the compiler itself evolve to use multiple cores?


Not a stupid question. I think adding parallelism to the compiler might be possible, though there are certainly parts that use a lot of mutable state that could prove tricky.

Whether it's beneficial or not is unclear though.


Can you recommend a good resource for getting started with OCaml?


Real World OCaml (https://dev.realworldocaml.org/) is a great one.


Books, videos, and other good resources are listed here: https://ocaml.org/learn/

This course text is also great; I don't know whether it's listed at ocaml.org/learn or not:

https://cs3110.github.io/textbook/


+1 for CS 3110 (OCaml Programming: Correct + Efficient + Beautiful), and the video series is similarly excellent: https://www.youtube.com/playlist?list=PLre5AT9JnKShBOPeuiD9b...


Amazing to see this merged. As far as I can tell this work goes all the way back to 2014 (1), and has been a huge effort.

1. https://ocaml.org/meetings/ocaml/2014/ocaml2014_1.pdf


Congrats - this is a big hurdle to adoption that is now gone. The engineering that went into this is impressive.


It's been a long time since I was hyped about something in programming! Very cool, I really want to try OCaml now.


I've been hyped about OCaml for some time. I might finally give it a go.


Congrats on the merge, big achievement!

I have been interested in OCaml for a while (mainly because of ReasonML) but unfortunately never really got to it; maybe this is the time.


Is anywhere other than Jane Street using OCaml? :)


Yes! https://v3.ocaml.org/users lists some of the industrial users. The list might not be as large as for languages like, say, Rust, but it's more than just Jane Street :)

Personally, I've also been lucky enough to have worked for two different employers (not in the finance or compiler space) over the past 2 years where I've mostly used OCaml for writing things that many people might consider "boring", namely lots of web services, database drivers, data processing, etc.


Facebook, Bloomberg, Citrix, Docker and a handful of others are featured on the OCaml website [1]. I'm sure there are others that are quietly using it.

[1] https://ocaml.org/learn/companies.html


Was OCaml single-threaded until this, or what is this exactly? There is no description of what the PR does, which is pretty bad form.


It was very similar to the Python story: native threads, but with a global lock on the OCaml runtime. You can fire off a thread and have it stay busy in external code (e.g. in a C library), but only one thread can be running OCaml code at a time.

For this reason (and others), forking child processes has been a common alternative on many OCaml projects.
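
To make the contrast concrete, a rough sketch (my own illustration; the first version needs the threads library linked, and `work` is just a stand-in for some CPU-bound code):

    (* Some CPU-bound work. *)
    let work () =
      let acc = ref 0 in
      for i = 1 to 50_000_000 do acc := !acc + i done;
      !acc

    (* Pre-5.0 style: real OS threads, but the runtime lock means only one
       of them executes OCaml code at a time, so no parallel speedup. *)
    let with_threads () =
      List.init 4 (fun _ -> Thread.create work ())
      |> List.iter Thread.join

    (* OCaml 5: each domain runs OCaml code in parallel on its own core,
       and we combine the four partial results at the end. *)
    let with_domains () =
      List.init 4 (fun _ -> Domain.spawn work)
      |> List.map Domain.join
      |> List.fold_left ( + ) 0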

See here for an update re: the whole multicore initiative, and links to more information:

https://discuss.ocaml.org/t/multicore-ocaml-december-2021-an...


I would probably still fork child processes and use IPC, at least until there is something similar to channels, and often even then, because signal handling and forking in a multithreaded program are quite a hurdle.

Even in Rust, no one really knows at this point what is safe to do in a fork from a multithreaded program, or in a signal handler in one. Signal handlers and forks are thus simply “unsafe” in Rust, with “care must be taken”, but there is no real explanation of what care, and Rust does not document or stabilize which of its functions are async safe, as C does.


If I were a Rust maintainer, I wouldn't say what was safe either. :) This is an OS issue, not a language/runtime one. For example, POSIX has a lot to say about what is safe to do, and what isn't, after a fork() without exec() [1] and your OS of choice may have more to add. As you've said, it's perilous and messy.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fo...


I certainly know the underlying reasons, but Rust aspires to be a language with clear guarantees for what is safe, and what is not.

POSIX C documents which functions are async safe, and specifies that only those functions may be used in certain contexts lest there be undefined behavior. Rust at this moment does not document which of its functions are safe to use in such contexts, and I'm not sure what the situation with OCaml is.

Sharing state between threads in a language such as C is complex enough, but in a managed language there seem to be few guarantees. Go and Rust were designed with this in mind and fundamentally only allow either sharing data by way of channels, or by way of data structures whose reading and writing is race-free.


The received wisdom is that the only safe fork is the one followed immediately by an exec(e,l,p,v). If you can't exec(), then keep your design as simple as possible, consult your OS manual, and address your failure modes.
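
For the OCaml readers, that pattern looks roughly like this with the Unix module (a bare-bones sketch, error handling elided):

    (* Fork, then exec immediately in the child; do as little as possible
       in between.  execvp only returns on failure, in which case we bail
       out with a non-zero status. *)
    let spawn_and_wait prog args =
      match Unix.fork () with
      | 0 ->
          (try Unix.execvp prog (Array.of_list (prog :: args))
           with _ -> exit 127)
      | child_pid ->
          (* Parent: wait for the child and return its exit status. *)
          snd (Unix.waitpid [] child_pid)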

Asking any compiler to prove that your program's fork() is safe is asking an awful lot. :) If you're referring to Rust's notion of safe/unsafe code, "unsafe" explicitly refers to unsafe memory operations. It won't cover operational issues such as "threads won't be cloned" or "signal handlers will get dropped" during a fork. (Even if they were tracked: are those behaviours safe, or unsafe? That depends on the semantics of the program.)

Ada was another language designed with safety in mind. They chose to shove fork() into an "Unsafe POSIX Primitives" package -- in other words, use these at your own risk. I think this was a good call.


> The received wisdom is that the only safe fork is the one followed immediately by an exec(e,l,p,v). If you can't exec(), then keep your design as simple as possible, consult your OS manual, and address your failure modes.

But this is not true at all; calling any async safe function is safe, including exec, which is simply an async safe function.

Furthermore, Rust's standard library does not even expose exec directly, but it does expose a function to call arbitrary code before exec; that function is marked as unsafe and the only documentation is that “care must be taken”.

For instance, socket activation in the launchd style that systemd and stocketd use must execute various socket-related operations after forking and before exec. These operations are async safe, and thus it is possible to do this from a multithreaded program, but in Rust there are no guarantees about whether various functions dealing with sockets are async safe. They might call malloc or use mutexes internally; one has no idea.

> Asking any compiler to prove that your program's fork() is safe is asking an awful lot.

I'm not asking that at all, but it's actually not difficult either. All it needs is a trait for async safety on functions, where any function marked as async safe can only call other functions that are. That is all it takes to do it safely.

But I'm not asking for a compiler proof; I'm simply asking that Rust document which functions are async safe and which are not, so that programmers can do it in unsafe code.

> Ada was another language designed with safety in mind. They chose to shove fork() into an "Unsafe POSIX Primitives" package -- in other words, use these at your own risk. I think this was a good call.

And they document which functions are async safe; Rust has no documentation or guarantees about this.


I'm sorry if I misunderstood you. I was talking about POSIX semantics of fork(), but you're talking about a Rust documentation issue re: async. I didn't catch that earlier! I hope the issue gets sorted out for you.


> There is no description of what the PR does which is pretty bad form.

The description is in the first paragraph of the PR body text:

> This PR adds support for shared-memory parallelism through domains and direct-style concurrency through effect handlers (without syntactic support). It intends to have backwards compatibility in terms of language features, C API, and also the performance of single-threaded code.


OCaml had threads. There's even a Thread module in the standard library. The limitation was that only one thread could run at any given time.

P.S. There were ways around it, like calling an external C function that would spawn a new thread and pass control back, but those were almost never used.


Ish. OCaml didn't have any constructs in the language itself for parallelism. You could fork processes and the like, and libraries did/do exist for writing parallel code in OCaml.


Congratulations on the accomplishment!


The PR adds effect handlers (`stdlib/effectHandlers.ml`). Does that mean work will begin on replacing Async/Lwt with a native alternative?


Work has already started: https://github.com/ocaml-multicore/eio

There's also now https://github.com/talex5/lwt_eio, which allows you to run existing Lwt code alongside code using effects, to aid with porting.


Work has already begun on a direct-style IO library that internally uses effects. See:

- https://github.com/ocaml-multicore/eio#readme for more information on the Eio library

- https://watch.ocaml.org/videos/watch/74ece0a8-380f-4e2a-bef5... for a short talk on experiences using effects, with some nice motivating examples

Note that Eio is more than just a direct replacement for Lwt and Async. We couldn't resist bringing some of the experience gained from the MirageOS (mirage.io) unikernel framework into Eio. This means the backends are highly optimised to use the best syscalls available in the OS (e.g. io_uring by default on Linux). If you write your applications to use Eio natively, performance is very high so far. The ergonomics of programming in it also compare favourably to monadic concurrency.
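
For flavour, the Eio hello-world is roughly this (adapted from the README; the API is still moving, so the exact names may differ):

    (* Direct-style I/O: no monadic bind, just ordinary function calls.
       Eio_main.run picks the best available backend for the platform
       (e.g. io_uring on Linux). *)
    let () =
      Eio_main.run @@ fun env ->
      let stdout = Eio.Stdenv.stdout env in
      Eio.Flow.copy_string "Hello from Eio!\n" stdout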


So Eio is a new IO implementation built on top of effect handlers? Will Eio also be part of the stdlib?

I.e. will the httpaf server library need to import it as an external dependency, like it currently does with Lwt, or will Eio already be available?


I just lurk the OCaml forums, so don't take what I say as gospel, but from what I've understood Eio will _not_ be part of the stdlib. The OCaml devs go out of their way not to break things in the stdlib, so from what I've gathered they don't want to include Eio until there's more real-world experience with it and the API has started to settle down. Including it in the stdlib might also never happen, but that remains to be seen.


It's too early to say. Eio is still under very active development, so we'll have to see how it goes. The more useful feedback we get on it now, the more likely it'll be submitted for consideration in the OCaml stdlib when appropriate.


I'm familiar with the C++11 object model.

How does OCaml compare?

From a quick skim of papers, it seems it provides sequential consistency when using atomics and acquire/release semantics when not.

Sounds like a pretty bad design.


> it seems it provides sequential consistency when using atomics and acquire/release semantics when not.

(author of said paper here) This is wrong. I suggest reading the paper closely; if not, the morning paper has a good summary [1,2].

[1] https://blog.acolyer.org/2018/08/09/bounding-data-races-in-s...

[2] https://blog.acolyer.org/2018/08/10/bounding-data-races-in-s...


I would have preferred a straightforward answer instead.

It's not obvious to me from those articles how what I said is inaccurate.


Any perf benchmarks?


Along with the graphs from the PR in the sibling comment, there's also the extensive benchmarking from the ICFP2020 paper: https://arxiv.org/pdf/2004.11663.pdf

Work on this is on-going via the sandmark benchmarking suite: https://github.com/ocaml-bench/sandmark

In short the expectation should be that single-threaded code performs roughly the same (single digit percentage changes) as on the sequential runtime.

Parallel code on multicore can see close to linear speedups on 64 cores, though it depends significantly on your workload. If you're interested in parallelising existing OCaml code, I gave an example-driven OCaml workshop talk in 2020: https://www.youtube.com/watch?v=Z7YZR1q8wzI
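
To give a feel for the kind of code involved, here's a minimal hand-rolled sketch (my own illustration; in practice a library such as domainslib gives you parallel_for and friends):

    (* Split an array into one chunk per domain, sum the chunks in
       parallel, then combine the partial results. *)
    let parallel_sum ~num_domains (a : int array) =
      let n = Array.length a in
      let chunk = (n + num_domains - 1) / num_domains in
      let sum_range lo hi =
        let s = ref 0 in
        for i = lo to hi - 1 do s := !s + a.(i) done;
        !s
      in
      List.init num_domains (fun d ->
          let lo = d * chunk in
          let hi = min n (lo + chunk) in
          Domain.spawn (fun () -> sum_range lo hi))
      |> List.map Domain.join
      |> List.fold_left ( + ) 0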




Best news of the day! Is there any roadmap for typed effects?


No, typed effects are still in the "research project with open design questions" category. It doesn't really make sense to have a roadmap for such an open-ended feature.

In other words, I am keen to not reproduce the announcement that "multicore might be in 4.03" from 5 years ago.


Very interesting.



