
Runtime Support for Multicore Haskell: A Retrospective - lelf
https://blog.sigplan.org/2019/12/16/runtime-support-for-multicore-haskell-a-retrospective/
======
gautamcgoel
Nice write-up. Haskell and GHC have definitely been enormously successful
platforms for PL research in general, and research in parallelism in
particular. However, one unfortunate side effect of all that research effort
is that there are now at least half a dozen libraries for parallelism and
concurrency (STM, Async, Control.Parallel.Strategies, etc), to the point that
it is hard for a software engineer to grasp the tradeoffs presented by the
various options. I feel like I succumb to decision paralysis when I attempt
parallel programming in Haskell; from a purely pragmatic perspective, I much
prefer the Go approach, where there is a single canonical concurrency
mechanism which is baked right into the language and is extremely well-
documented.

~~~
wostusername
Been a while since I've done Haskell, but the essential breakdown as I
understand it is this:

At the lowest level you have the Control.Concurrent primitives: forkIO and MVar.
forkIO lets you spin up new Haskell threads (lightweight green threads, not OS
threads), and MVars can be used to communicate between them. An MVar can be
empty or full, and operations block when they can't complete immediately
(writing to a full MVar or reading from an empty one). MVars also have
single-wakeup semantics: if multiple readers are blocked on an empty MVar, only
one is woken up when someone writes to it.
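A minimal sketch of that level (the names and the workload here are just illustrative):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
  result <- newEmptyMVar              -- starts empty, so takeMVar will block
  _ <- forkIO $ do                    -- spin up a lightweight Haskell thread
    let n = sum [1 .. 1000000 :: Int]
    putMVar result $! n               -- fill the MVar, waking one blocked reader
  n <- takeMVar result                -- blocks until the forked thread writes
  print n                             -- prints 500000500000
```

Because of the single-wakeup behaviour, an MVar also doubles as a simple mutex: takeMVar acquires, putMVar releases.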

STM has transactional semantics. You can write to multiple STM variables
atomically in a transaction, and your transaction will restart if any of your
input variables have been written to while your transaction was running.
Useful if you need to maintain consistency between several variables at all
times.
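The textbook illustration is an atomic transfer between two counters (a sketch using the stm package; `transfer` is just an example name):

```haskell
import Control.Concurrent.STM

-- Debit one account and credit another. Both writes commit atomically,
-- and 'check' makes the transaction retry until enough funds exist.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  check (balance >= amount)
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  readTVarIO a >>= print   -- 70
  readTVarIO b >>= print   -- 30
```

No observer can ever see the state where the debit has happened but the credit hasn't.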

Async is a library layered on forkIO and STM that provides the functions you
would often write yourself to express common patterns, but more battle-tested.
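For instance, mapConcurrently from the async package replaces the fork/collect loop you might otherwise hand-roll over forkIO and MVars:

```haskell
import Control.Concurrent.Async (mapConcurrently)

main :: IO ()
main = do
  -- Run one IO action per element, each on its own thread; results come
  -- back in order, and an exception in any thread cancels the others.
  squares <- mapConcurrently (\n -> pure (n * n)) [1 .. 4 :: Int]
  print squares   -- [1,4,9,16]
```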

Strategies are about speculatively evaluating Haskell thunks in parallel. For
example, if you map a function over a list, you get a list of thunks; you can
then use Strategies to evaluate those thunks in parallel. The key property of
Strategies is that if you remove them from your code (or use a non-threaded
runtime) it doesn't affect the semantics of your program; it just might make
it slower due to the loss of parallel evaluation.
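A sketch (the parallelism only kicks in when the program is built with -threaded and run with +RTS -N; `expensive` is a made-up workload):

```haskell
import Control.Parallel.Strategies (parList, rseq, using)

expensive :: Int -> Int
expensive n = sum [1 .. n * 100000]

main :: IO ()
main = do
  -- 'using' attaches a strategy to the list of thunks: spark each element
  -- for parallel evaluation. Deleting "`using` parList rseq" changes only
  -- performance, never the result.
  let xs = map expensive [1 .. 8] `using` parList rseq
  print (sum xs)
```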

The Par Monad isn't used much, but the key idea there is you build an explicit
dependency graph of the values in your computation and the runtime tries to
parallelize the nodes in the graph that it can. Similar to what make does if
you run it with -j.
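A sketch with the monad-par package (assuming its runPar / spawnP / get API):

```haskell
import Control.Monad.Par (runPar, spawnP, get)

main :: IO ()
main = print $ runPar $ do
  -- Each spawnP adds a node to the dependency graph; each 'get' adds an
  -- edge, so nodes with no edge between them are free to run in parallel.
  a <- spawnP (sum [1 .. 1000 :: Int])
  b <- spawnP (product [1 .. 10 :: Int])
  x <- get a
  y <- get b
  pure (x + y)   -- prints 4129300 (500500 + 3628800)
```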

~~~
verttii
Thanks, excellent summary!

------
mark_l_watson
Really nice paper, especially the bullet points near the end.

I applied for a job with Simon Marlow’s Haskell team at Facebook about three
years ago. I really enjoyed talking with him on the phone, but it was obvious
that I didn’t have sufficient Haskell skills.

There is a lot of good synergy between meeting FB’s platform requirements and
research and development for the Haskell ecosystem.

------
xvilka
Sadly, Multicore OCaml [1] is nowhere near [2] completion yet, nor is it moving
fast. There are still a lot [3] of issues to solve, and syncing with mainstream
OCaml to do, before it is ready for even basic use. I write OCaml mostly these
days (along with Rust, C, Python, and a couple of less mainstream languages),
but I often miss Haskell's ease of parallelization, code clarity, and so on.

[1] [https://github.com/ocaml-multicore](https://github.com/ocaml-multicore)

[2] [https://github.com/ocaml/ocaml/labels/multicore-prerequisite](https://github.com/ocaml/ocaml/labels/multicore-prerequisite)

[3] [https://github.com/ocaml-multicore/ocaml-multicore/projects/...](https://github.com/ocaml-multicore/ocaml-multicore/projects/3)

~~~
pdimitar
Truthfully, I'm waiting for Multicore OCaml, and then I'll use only it, Rust,
and Elixir.

OCaml's type system is in my eyes unbeatable. I'm still considering writing a
transpiler from it to Rust and Elixir, but I'm grossly unqualified to write
compilers. :(

Haskell though, I couldn't grok it. Seemed to introduce a lot of tension for
very basic topics -- like strings -- and it lost me.

~~~
_verandaguy
As someone who learned it casually a while back: Haskell's not great for
strings, I'll give you that.

How does the OCaml type system compare with Haskell's though?

~~~
atombender
The big distinguishing difference is that OCaml doesn't have ad-hoc
polymorphism or function overloading. You cannot have a single function that
operates on many different types, which is why there are two sets of arithmetic
operators for integers and floats ("+" vs. "+.", for example). In theory you
can use modules for this, but it's awkward, and I don't think they're
generally used that way. (OCaml is getting modular implicits, and this may help
the situation.)

In Haskell, typeclasses make this easy. A typeclass lets you define a family of
functions/operators that together become what a Go or Java developer might
call an interface. Code written against a particular typeclass works for any
implementation of that typeclass.
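A made-up Pretty class shows the shape (the class and instances here are hypothetical, purely for illustration):

```haskell
class Pretty a where
  pretty :: a -> String

instance Pretty Int where
  pretty n = "Int: " ++ show n

instance Pretty Bool where
  pretty b = if b then "yes" else "no"

-- Works for any type with a Pretty instance, much like coding
-- against an interface in Go or Java.
describe :: Pretty a => a -> String
describe x = "value = " ++ pretty x

main :: IO ()
main = do
  putStrLn (describe (42 :: Int))   -- value = Int: 42
  putStrLn (describe True)          -- value = yes
```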

On the other hand, OCaml's module system is very powerful and has no direct
counterpart in Haskell. It lets you package a whole type implementation into a
single module with a public signature, and modules can themselves be
parameterized over other modules (functors) to get a form of genericity.

~~~
chongli
I don’t know anything about OCaml so please forgive me if this question seems
silly.

 _two sets of arithmetic operators for integers and floats_

What if I want to create more numeric types? Say vectors in R^n, for example.
Do I need to create more operators for vector addition and scalar
multiplication? How do I make sure the user can’t add vectors of incompatible
dimension?

~~~
atombender
Yes, you need to add new functions. You cannot redefine "+" to work with a
custom vector type.

Similarly, you cannot have a "print" function that works for all sorts of
types. OCaml has print_string, print_int, and so on.

OCaml does have parametric polymorphism ("generics"): you can write functions
with generic type parameters. And you can use modules and functors to package
up types in a way broadly similar to Haskell typeclasses (e.g. see [1]).

[1] [https://stackoverflow.com/questions/14934242/whats-the-close...](https://stackoverflow.com/questions/14934242/whats-the-closest-thing-to-haskells-typeclasses-in-ocaml)

------
perlpimp
"there’s a lot more to realising good parallel speedup than just choosing the
right language"

From ye olde days: if you can't split the computation into nice isolated
chunks, the point is moot, so relatively fine granularity in your algorithm's
steps is the key here. The absence of side effects, or strict control over
them, means you don't get to make too many mistakes in the mechanics of
running the computation.

~~~
zzzcpan
_> so relatively fine granularity of your algo stepping is the key here_

Fine granularity requires pretty much zero overhead synchronization, which
green threads or any shared-memory multithreading implementation can't
deliver; a task needs to do a few thousand nanoseconds of useful work before
the synchronization costs become even bearable.

~~~
chrisseaton
> Fine granularity requires pretty much zero overhead synchronization

Yes, like a work-stealing scheduler. Many tiny tasks are fine as long as you
keep them on one core. Other cores can steal a batch of them from the other
end of your queue with minimal synchronisation every now and again.

~~~
zzzcpan
No, work stealing is pretty high overhead.

~~~
chrisseaton
That's not my experience. I think work stealing is low overhead: it amortises
synchronisation, only performs it when a core actively needs more work, and
reduces conflicts by having thieves and the owner operate on opposite ends of
the queue. Why do you think it's high overhead?

~~~
zzzcpan
It's only one part of the story (the scheduler), and not even a good enough
idea once you account for the non-locality overhead.

~~~
chrisseaton
Isn't work stealing excellent for locality? Jobs stay on the same core until
there's a need to steal, and then the most likely non-resident jobs are taken.
Is it even cache oblivious?

------
earthboundkid
Huh, it's almost like the idea of pure functional programming is all hype and
it doesn't actually make performant concurrency any easier to implement…

~~~
zadler
Did you read the article?

