
Haskell's Missing Concurrency Basics (2016) - DanielRibeiro
https://www.snoyman.com/blog/2016/11/haskells-missing-concurrency-basics
======
fiorix
There's an open position at Facebook to work on GHC. If you're into Haskell
and want to make it better, here's your opportunity:
[https://www.facebook.com/careers/jobs/a0I1H00000MoVjBUAV/](https://www.facebook.com/careers/jobs/a0I1H00000MoVjBUAV/)

------
dnautics
How does erlang/elixir do it? I've never really had any problems.

~~~
SlySherZ
Elixir newbie here. If I remember correctly IO is implemented as a process,
which means that different write requests are processed sequentially in the
order they arrive to the process. The following link has more information:
[http://erlang.org/doc/apps/stdlib/io_protocol.html](http://erlang.org/doc/apps/stdlib/io_protocol.html)

~~~
phoe-krk
Correct. Erlang (and therefore Elixir) stdio is implemented by means of
sending messages to a process that does low-level writing to the console. While
a single message is being served, the others are queued up.

Therefore, when two processes (let's name them A and B) want to output
something at the same time, the result is going to be AAAABBBB or BBBBAAAA
(depending on the order in which the messages arrive in the mailbox), but
never BBAABBAA or anything similar.

~~~
gmfawcett
...where "AAAA" and "BBBB" are two distinct messages, and not two sets of four
messages each ("A", "A", "A", "A").

When I first read your example, it sounded like you were saying that the IO
process would exhaust all messages from one source process before processing
any messages from the other, no matter what order they arrived in.

~~~
phoe-krk
Yes, correct. I wasn't clear enough.

------
chriswarbo
I had some sympathy for this situation, until I saw that the concurrency was
being specified via a function called `mapConcurrently`.

IMHO this is perfectly acceptable behaviour for a `map` function, since that
name has gained the connotation that its purpose is to transform one
'collection' (Functor; whatever) into another, by pointwise, _independent_
applications of the given function. Providing a function/action which breaks
this independence (by writing to the same handle) breaks this implicit
meaning. Heck, I'd consider it a code smell to combine interfering actions
like this using a _non-concurrent_ `map` function; I would prefer to define a
separate function to make this distinction explicit, e.g.

    
    
        -- Like 'map', but function invocations may interfere with each other (you've been warned!)
        runAtOnce = map
    

When using `map` functions (which is a lot!) I subconsciously treat it as if
it will be executed concurrently, in parallel, in any order. Consider that
even imperative languages like JavaScript provide a separate `forEach`
function, to prevent "abuses" of `map`. Even Emacs Lisp, not the most highly
regarded language, provides separate `mapcar` and `mapc` functions for this
reason.

With that said, I recognise that there's a problem here; but the problem seems
to be 'mapping a self-interfering function'. If we try to make it non-
interfering, we see that it's due to the use of a shared global value
(`stdout`); another code smell! Whilst stdout is append-only, it's still
mutable, so I'd try to remove this shared mutable state. Message passing is
one alternative, where we can have each call/action explicitly take in the
handle, then pass it along (either directly, or via some sort of "trampoline",
like an MVar). This way we get the "concurrent from the outside, single-
threaded on the inside" behaviour of actor systems like Erlang. In particular,
it's easy to make sure the handle _only_ gets passed along when we're
'finished' with it (i.e. we've written a complete "block" of output).
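
A minimal sketch of that MVar "trampoline" idea (assuming the `async` package's `mapConcurrently_`, as the article itself uses `mapConcurrently` from `async`):

```haskell
import Control.Concurrent.Async (mapConcurrently_) -- from the 'async' package
import Control.Concurrent.MVar
import System.IO

-- Each action takes the Handle out of the MVar, writes one complete
-- "block" of output, then passes the handle along by putting it back.
-- Concurrent from the outside, single-threaded on the inside.
writeBlock :: MVar Handle -> String -> IO ()
writeBlock trampoline msg = do
  h <- takeMVar trampoline   -- block until we're handed the handle
  hPutStrLn h msg            -- a complete "block" of output
  putMVar trampoline h       -- only now pass the handle along

main :: IO ()
main = do
  trampoline <- newMVar stdout
  mapConcurrently_ (writeBlock trampoline) (map (replicate 4) "AB")
```

The output is either "AAAA" then "BBBB" or the reverse, but each line comes out whole, matching the Erlang behaviour described upthread.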

------
divs1210
Thread-unsafe `println` is one of Clojure's quirks too!

~~~
masklinn
Interesting. I think it's thread-safe in Rust, because one of the common
performance improvements for console applications with lots of output is to
acquire the relevant stream's lock (and perform all writes against a never-
released guard); otherwise it's going to be acquired and dropped on every
write: [https://doc.rust-lang.org/src/std/io/stdio.rs.html#448-461](https://doc.rust-lang.org/src/std/io/stdio.rs.html#448-461)

------
heavenlyhash
I'm kind of surprised to hear that writing to stdout is a source of
concurrency problems in a language that's considered to be functional.

Surely if you can pass your IO handles to all functions that need them, you
can decide on a mutexing/buffering strategy at the top of your program, wrap
the standard IO interface with a delegate that does so, and pass it on. Then
having all libraries called thereafter use it consistently isn't just a no-
brainer, it's an outright given, isn't it? There's no _global_ (impure, non-
functional) handle to stdout, is there?

~~~
foldr
Haskell's being functional is pretty much irrelevant here. The process has one
stdout. If functions that write to file handles don't acquire a lock, then the
output of different threads will get mixed up.

~~~
chriswarbo
It's easy to split hairs about what "being functional" means, but the way that
Haskell implements IO is certainly a byproduct of this. In particular, my
preferred mental model of Haskell doesn't include "functions that write to
file handles"; but rather, these would be pure functions which return IO
"actions".

This distinction is often inconsequential, but this case seems to rely on how
we combine those "actions" together (which, due to laziness, may happen far
away from where/when the functions are called; see 'lazy IO'). For sequential
IO we can combine things with Applicative and Monad, which gives us a definite
order, but using these in a concurrent setting would cause too much
synchronisation and determinism to be useful. I've not done enough concurrent
Haskell to know how the various alternatives stack up; although I did play
with Arrow many years ago, before it fell out of favour (seemingly for
Profunctors?).

~~~
gmfawcett
Monadic I/O (as I'm sure you know) just means that every I/O effect takes a
world-state as an implicit parameter ("the world just before this action"),
and returns a world-state ("the world just after this action") to thread into
the next effect. In a concurrent program, I/O effects from multiple threads
(and the outside world) may be interleaved or executed concurrently. Crudely,
the world you changed a moment ago isn't necessarily the world you're about to
change again. :)

Monadic I/O on its own doesn't make any guarantees, or impose any
requirements, about locking external resources. If stdout is locked (from
within the monad, or not), then that's the state the world arrives in for your
effect. If not, then it's not. They are orthogonal concerns.
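
For concreteness, here's a toy world-passing model of that description. This is a conceptual sketch, not GHC's literal definition (GHC's real `IO` is built on an unboxed `State# RealWorld` token), but the shape is the same:

```haskell
-- A toy model of monadic I/O as world-passing.
data World = World  -- stand-in for "the state of the outside world"

newtype MyIO a = MyIO { runMyIO :: World -> (a, World) }

instance Functor MyIO where
  fmap f (MyIO g) = MyIO $ \w -> let (a, w') = g w in (f a, w')

instance Applicative MyIO where
  pure a = MyIO $ \w -> (a, w)
  MyIO f <*> MyIO g = MyIO $ \w ->
    let (h, w')  = f w
        (a, w'') = g w'
    in (h a, w'')

instance Monad MyIO where
  -- The world token is a dummy data dependency: the second action
  -- cannot run until the first has produced the "next" world.
  MyIO g >>= k = MyIO $ \w ->
    let (a, w') = g w
    in runMyIO (k a) w'
```

Nothing in this model says anything about locks; a concurrent runtime is free to hand a "world" to several threads' effects at once, which is the orthogonality being described.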

~~~
chriswarbo
I get what you're saying, but I don't like to think of these as orthogonal:
locking is a way to force actions to occur in a particular order, and so is
monadic IO (the 'world-state' is a dummy data dependency, preventing later
actions from getting called before earlier ones; it doesn't "really" contain
the whole state of the world ;) )

~~~
gmfawcett
I get what you're saying too. :) I think one reasonable semantics for monadic
I/O is that it sequences effects in a single thread; and effects from other
threads are part of the world-state, not part of the monadic context. Another
reasonable semantics is what I think you're describing: monadic I/O in a
multithreaded program should sequence effects across all the threads (e.g.,
mutexes on shared resources). It would be nice to have both options available
-- maybe similar to how you can plug a `Strategy` into the `Eval` monad from
Haskell's Control.Parallel.Strategies.

[https://hackage.haskell.org/package/parallel-3.2.1.1/docs/Control-Parallel-Strategies.html#t:Strategy](https://hackage.haskell.org/package/parallel-3.2.1.1/docs/Control-Parallel-Strategies.html#t:Strategy)

------
clord
Use an STM channel or some other lock, and put your messages for the shared
resource (the terminal UI) through that channel. There's no way to
automatically figure out what granularity the programmer expects from the
output, so make them specify it. Haskell makes specifying that staggeringly
easy compared to other languages.
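
A sketch of that pattern, with a single writer thread owning the terminal and every other thread sending whole messages through a `TChan` (from the `stm` package):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever, forM_)

main :: IO ()
main = do
  chan <- newTChanIO
  -- The single writer thread: the only code that touches stdout.
  _ <- forkIO $ forever $ atomically (readTChan chan) >>= putStrLn
  -- Producers choose the granularity by sending complete messages.
  forM_ [1 .. 10 :: Int] $ \n ->
    forkIO $ atomically (writeTChan chan ("worker " ++ show n ++ " done"))
  threadDelay 100000  -- crude: give the writer time before main exits;
                      -- a real program would wait on the workers properly
```

Each `writeTChan` hands over one complete message, so lines never interleave, whatever order the workers finish in.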

------
bitL
How can you get a performant language if your I/O granularity is 1 character?
:-O

EDIT: this is an honest question, I was shocked to read what was in the
article.

~~~
chriswarbo
Strings in Haskell are one of the language's sore points. It's something
that's mostly a non-issue for those using Haskell day to day, but may be
surprising to newcomers.

Haskell's built-in string type is a list of characters. This is mostly for
historical reasons, but it's also handy in education (installing extra
packages is a barrier for learners; list processing is common in introductory
courses, but lists are polymorphic/generic in their element type; lists of
characters are a nice concrete type, which follows on easily from "hello
world"); also there are arguments about the theoretical elegance of linked
lists, KISS for the builtins, whether there's consensus on what the best
alternative is, etc.

Anyone who cares about Haskell performance will have hit this early on, and be
using a different string implementation, as mentioned in the article. In
particular there's ByteString for C-like arrays of bytes, and there's Text,
which stores Unicode text in a packed array (so, unlike a raw ByteString, it
knows its own encoding). In fact, a lazy ByteString doesn't have to be a
single contiguous array: it can be a list of "chunks", where each chunk
contains a pointer to an array, an offset and a length; this speeds up many
operations, e.g. we can append lazy ByteStrings by adding chunks to the list
(pointing to existing arrays), and we can take substrings by manipulating the
offsets and lengths. This is all perfectly safe and predictable since the data
is immutable, whereas other languages which allow mutation might prefer to
make copies of the data to reduce aliasing.

The other aspect is the buffering mode of the handle, which is discussed a
little in the article and its comments (e.g. line-based buffering, etc.).
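
For example, the buffering mode can be set explicitly (the default depends on whether stdout is a terminal or a pipe):

```haskell
import System.IO

main :: IO ()
main = do
  -- Terminals usually default to LineBuffering, pipes to BlockBuffering;
  -- setting it explicitly makes the flushing behaviour predictable.
  hSetBuffering stdout LineBuffering
  putStrLn "flushed as soon as the newline is written"
```

Note that buffering controls when output reaches the OS, not whether concurrent writers interleave; the locking strategies discussed elsewhere in the thread are still needed for that.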

------
bru
Title is missing a (2016).

