
Haxl: Making Concurrency Unreasonably Easy [video] - dmit
http://events.techcast.com/bigtechday10/Garmisch-1345/
======
daxfohl
I wonder how long it will be before compilers/interpreters of async-aware
languages just do this by default. CPUs and low-level language compilers jump
through all kinds of hoops: out-of-order execution, branch prediction,
caching, parallel execution, etc.

I picture a day maybe 10 years from now where developers in most languages
don't even have to think about these things. All the old-timers will still be
structuring their code "as though it didn't exist" whereas the new kids will
fly along without even thinking about it. Kind of like garbage collection the
first few years.

~~~
DSrcl
It's unsafe for a compiler to do this in general (i.e. without annotations)
because it can't determine dependencies that are external to the program --
e.g. `one=get(); two=get()`. The dependency between `one` and `two` is not
obvious to a compiler when IO is involved, so it has to assume the two `get`s
have to be executed sequentially.
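
To make the problem concrete, here is a minimal sketch (plain GHC Haskell,
nothing to do with Haxl; the `counter` setup is invented for illustration)
where two syntactically identical `get`s are not interchangeable, so a
compiler reordering or parallelising them would change the result:

```haskell
-- A stateful "get": each call returns the next value of a hidden counter,
-- so the two calls below depend on each other through external state.
import Data.IORef

counter :: IO (IO Int)
counter = do
  ref <- newIORef (0 :: Int)
  pure $ do
    n <- readIORef ref
    writeIORef ref (n + 1)
    pure n

main :: IO ()
main = do
  get <- counter
  one <- get
  two <- get
  print (one, two)  -- (0,1): swapping the two gets would swap the results
```

This is exactly the dependency a compiler cannot see through opaque IO.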

~~~
eyelidlessness
> It's unsafe for a compiler to do this in general (i.e. without annotations)
> because it can't determine dependencies that are external to the program

As I understood the talk, Haxl doesn't address this either. It depends on you
describing an IO operation as a type with a set of functionality that can be
used to identify those dependencies.

> e.g. `one=get(); two=get()`. The dependency between `one` and `two` is not
> obvious to a compiler when IO is involved, so it has to assume the two
> `get`s have to be executed sequentially.

A "pure" language will address a lot of this thanks to referential
transparency. Which is to say that if `get` takes no arguments, its
instructions for where and how to perform an IO operation are entirely
static. Given your literal example, the only way the results could differ is
if there were some side effect, or if a source of data required by `get`
could change during execution.

Haxl takes this a bit further by suggesting that if you are performing the
same IO request twice "at the same time" _, you expect to get the same result
both times, so it memoizes the request for the duration of your set of IO
operations.

_ Within a `do` block, as far as I could grok from the talk and a glance at
the documentation; it's worth noting I don't know Haskell and had never heard
of Haxl before today.
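
That memoization idea can be shown in miniature without Haxl at all. A sketch
(plain GHC Haskell; the cache shape and the `"user:42"` key are invented for
illustration, Haxl's real cache keys off its typed requests):

```haskell
-- Per-run request memoization: duplicate requests within one "run" hit the
-- cache instead of the backend, so both callers see the same result.
import Data.IORef
import qualified Data.Map as Map

memoized :: IORef (Map.Map String Int)   -- per-run cache
         -> (String -> IO Int)           -- the real backend fetch
         -> String -> IO Int
memoized cacheRef fetch key = do
  cache <- readIORef cacheRef
  case Map.lookup key cache of
    Just v  -> pure v                    -- repeated request: no IO at all
    Nothing -> do
      v <- fetch key
      modifyIORef' cacheRef (Map.insert key v)
      pure v

main :: IO ()
main = do
  calls <- newIORef (0 :: Int)
  cache <- newIORef Map.empty
  -- stand-in backend: counts its invocations, returns the key's length
  let backend k = modifyIORef' calls (+ 1) >> pure (length k)
  a <- memoized cache backend "user:42"
  b <- memoized cache backend "user:42"  -- served from the cache
  n <- readIORef calls
  print (a, b, n)  -- (7,7,1): one backend call for two identical requests
```

Haxl scopes such a cache to one run of its monad, which is what makes the
"same request, same answer" assumption safe.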

------
aban
Great talk.

For more on concurrency and parallelism in Haskell, check out Parallel and
Concurrent Programming in Haskell [0], widely regarded as the best book on
the subject, also written by Simon Marlow.

[0]:
[http://chimera.labs.oreilly.com/books/1230000000929](http://chimera.labs.oreilly.com/books/1230000000929)

------
vdijkbas
Haxl is a powerful abstraction with, IMHO, a beautifully simple
implementation.

However, for our use case at LumiGuide (reading and writing registers of
Modbus devices) it wasn't simple enough. We just needed an abstraction for
batching and did not need caching and the other features Haxl provides.

So I wrote monad-batcher, which, as the name implies, only provides a
batching abstraction (one that can also be used to execute commands
concurrently). All the other features can be built on top of monad-batcher
as separate layers (separation of concerns).

The library is available on Hackage but needs a bit more documentation (a
tutorial would be nice):

[http://hackage.haskell.org/package/monad-batcher](http://hackage.haskell.org/package/monad-batcher)

------
sedachv
I looked through the slides but not the video and the slides ignore the hard
problem: how do you schedule these requests? How do you know how many parallel
requests you can issue without hammering the database or service? How do you
batch queries so that you get acceptable latency and a query size that will
not choke the database?

The last question is probably easy for most use cases where you have
independent requests coming in (typical web application) - in the context of
a single request you can usually get away with batching as much as possible.
But the scheduling problem is very similar to the promises of "free
parallelism because Church-Rosser" - actually taking advantage of it is an
open problem. Even when you know how much time each job takes in advance,
multiprocessor scheduling is NP-hard.

Anyway, if someone watched the video and the question is addressed there,
please let me know so I can watch it.

~~~
eru
I think the idea is that with the Haxl approach the scheduling can be dealt
with independently from your business logic.

By the way, I don't see how Church-Rosser would give you any free parallelism
---even in theory. You'd still have to heed Guy Steele's advice (see
[https://vimeo.com/6624203](https://vimeo.com/6624203)).

~~~
sedachv
> By the way, I don't see how Church-Rosser would give you any free
> parallelism---even in theory.

The Church-Rosser theorem means that all possible reduction sequences lead to
the same normal form, so you can β-reduce the subterms in parallel.

If you look at Paul Hudak's and SPJ's publications from the late 1980s and
early 1990s many of them are actually about trying to exploit this implicit
parallelism:

[http://sunsite.informatik.rwth-aachen.de/dblp/db/indices/a-tree/j/Jones:Simon_L=_Peyton.html](http://sunsite.informatik.rwth-aachen.de/dblp/db/indices/a-tree/j/Jones:Simon_L=_Peyton.html)

[http://dblp.uni-trier.de/pers/hd/h/Hudak:Paul](http://dblp.uni-trier.de/pers/hd/h/Hudak:Paul)

[https://pdfs.semanticscholar.org/8912/5be7f9c222793c18b99d0644c16ff19d9f06.pdf](https://pdfs.semanticscholar.org/8912/5be7f9c222793c18b99d0644c16ff19d9f06.pdf)

This turns out to be a hard problem and fell out of fashion as a research
topic. The big problem is managing overhead. There has been some recent work
to try to incorporate automatic profiling feedback to adjust the parallelism
granularity:

[http://dominic-mulligan.co.uk/wp-content/uploads/2015/05/S-REPLS_Calderon.pdf](http://dominic-mulligan.co.uk/wp-content/uploads/2015/05/S-REPLS_Calderon.pdf)

~~~
eru
Yes, but you still need to have those parallel subterms in the first place!

If you e.g. still use a linked list as your input data structure, you are
never going to be faster than O(n), no matter how clever your implicit
parallelism is.

~~~
sedachv
> If you e.g. still use a linked list as your input data structure, you are
> never going to be faster than O(n), no matter how clever your implicit
> parallelism is.

I think you took away completely the wrong point from Steele's talk. The
bottleneck is not in the data structures, it is in the control flow, which is
just a trivial way of restating Amdahl's law. In fact Steele co-wrote an
excellent expository paper with Daniel Hillis, while they were at Thinking
Machines, showing how to do parallel processing on linked lists:

[http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf](http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf)
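
The scan trick from that paper can be sketched in a few lines. A minimal
sketch (simulated sequentially on an ordinary Haskell list, so nothing here
actually runs in parallel; on a machine with one processor per element, each
round would be a single parallel step):

```haskell
-- Hillis–Steele style inclusive scan (prefix sums): in round k, element i
-- is combined with element i - k, and k doubles each round, so only
-- ceil(log2 n) rounds are needed instead of n sequential additions.
hillisSteeleScan :: Num a => [a] -> [a]
hillisSteeleScan xs = go 1 xs
  where
    n = length xs
    go k ys
      | k >= n    = ys
      -- prepending k zeros pairs element i with element i - k;
      -- zipWith truncates, so the list length never changes
      | otherwise = go (2 * k) (zipWith (+) ys (replicate k 0 ++ ys))

main :: IO ()
main = print (hillisSteeleScan [1, 2, 3, 4, 5 :: Int])  -- [1,3,6,10,15]
```

The doubling stride is exactly the pointer-jumping idea applied to lists.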

~~~
eru
I was in the audience and asked questions, and chatted with him afterwards.

Data flow dictates control flow (to a certain extent).

The paper you linked to is very interesting, but even there they have to
augment their 'linked list' with more links. And they assume one processor
per list element and don't count the set-up time of having each processor
find its list element.

(There's some more complication to it. But I am glad that between 1986, when
the paper was written, and 2009, when he gave the talk, he figured out how to
explain some of these issues more simply.)

------
jankotek
I have not fully digested it yet, but it seems very similar to Scala Parallel
Collections and Java 8 Streams. There are databases which implement such
interfaces.

~~~
willtim
It isn't the slightest bit similar to those! Haxl is a high-level library for
specifying (and optimising) concurrent data retrieval.

~~~
jack9
I'm not sure that's an apt description either. Haxl's features include
auto-batching of IO through parallelism and a temporal cache... for error
logging when failures occur. It's spelled out in the first minute he talks
about it. In the IO batching code (the boilerplate) you specify how and what
to output in your logging. Haxl can be used without performing optimization
(write an inefficient batcher) or doing data retrieval (the I/O is
fire-and-forget and never caches anything). Maybe I am misunderstanding the
basic usage.

~~~
willtim
It's not easy to summarise in one sentence, but 'auto batching' and 'caching'
are what I meant by optimising the data retrieval.

EDIT: the Haxl docs describe it as "a library and EDSL for efficient
scheduling of concurrent data accesses with a concise applicative API"

------
pmarreck
How does this differ from BEAM langs which already make concurrency
"unreasonably easy"?

~~~
kornish
For one, they serve different purposes. Haxl is specifically for concurrent
data retrieval while BEAM is a general purpose platform for fault-tolerant
computation.

For another, they operate via different interfaces. BEAM languages
communicate concurrently only via message passing. Haxl lets the author
write declarative code specifying _what_ to retrieve; then the library
executes it concurrently and in parallel under the hood.

Both are properly described as "unreasonably easy" - just for different sorts
of things, on different platforms, with different interfaces.
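
The "declarative, library does the concurrency" half can be shown in
miniature without Haxl. A sketch (plain GHC Haskell; `concurrently2` is an
invented helper, and in Haxl the analogous combination happens through its
Applicative instance rather than explicit threads):

```haskell
-- Two independent "fetches" are declared together; the helper runs them
-- at the same time and combines the results, so the caller never spawns
-- or joins threads explicitly.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

concurrently2 :: IO a -> IO b -> IO (a, b)
concurrently2 ioa iob = do
  va <- newEmptyMVar
  vb <- newEmptyMVar
  _ <- forkIO (ioa >>= putMVar va)   -- run both actions on their own threads
  _ <- forkIO (iob >>= putMVar vb)
  (,) <$> takeMVar va <*> takeMVar vb  -- wait for both results

main :: IO ()
main = do
  -- pure values stand in for real data-source fetches
  (a, b) <- concurrently2 (pure 1) (pure 2)
  print (a + b :: Int)  -- 3
```

Haxl goes further: because the two sides of `<*>` are visibly independent,
it can also batch and deduplicate the underlying requests, not just overlap
them.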

~~~
pmarreck
Thank you for the breakdown!

~~~
kornish
Cheers!

