
Concurrent Haskell in the real world - jpvillaisaza
https://www.stackbuilders.com/news/concurrent-haskell-in-the-real-world
======
jwatte
Flaws in the article aside, we use STM and TChans with Haskell threads, and
it's awesome!
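A minimal sketch of the STM + TChan pattern mentioned above, using the `stm` package's `Control.Concurrent.STM`; the producer/consumer split here is purely illustrative:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)
import Control.Monad (replicateM_)

main :: IO ()
main = do
  chan <- newTChanIO
  -- producer: a green thread pushing work items onto the channel
  _ <- forkIO $ mapM_ (atomically . writeTChan chan) [1 .. 10 :: Int]
  -- consumer: read the ten items back; readTChan blocks (via STM retry)
  -- until an item is available, so no explicit locking is needed
  replicateM_ 10 $ atomically (readTChan chan) >>= print
```

Because every channel operation runs inside `atomically`, the usual lock-ordering and lost-wakeup hazards of condition variables simply don't arise.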

------
banachtarski
"All 101 threads run 99.9% of the time and we get an almost 100× speed-up"

Yeah, I'm going to call shenanigans.

~~~
the_duke
This is a classic example of a completely IO-bound workload.

The program spends almost all of its time waiting for the web API responses
and the DB inserts, and the computations take a trivial amount of time.

An almost linear speedup can be expected, since GHC has a pretty good
microthread implementation.

Up to a certain thread count, of course, since at some point you either
saturate the network / database with requests, or there are too many threads
fighting for the CPU.

It's very similar to what you could expect from goroutines in Go.

The first answer here gives a very simplified overview of GHC's microthread
model: [http://stackoverflow.com/questions/5847642/haskell-lightweight-threads-overhead-and-use-on-multicores](http://stackoverflow.com/questions/5847642/haskell-lightweight-threads-overhead-and-use-on-multicores)

And more in depth:
[https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts)
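The near-linear speedup for IO-bound work can be seen with a tiny sketch: spawn a hundred green threads via `forkIO` that each block on a simulated one-second "API call" (here just `threadDelay`, standing in for real network IO), then wait on an `MVar` per thread:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- Each worker simulates a blocking one-second "API call" with threadDelay.
-- All 100 green threads sleep concurrently on GHC's IO manager, so the
-- whole batch finishes in roughly one second of wall time, not a hundred.
main :: IO ()
main = do
  dones <- replicateM 100 newEmptyMVar
  forM_ dones $ \done ->
    forkIO $ threadDelay 1000000 >> putMVar done ()
  mapM_ takeMVar dones
```

Swap `threadDelay` for a real HTTP request and the shape of the program does not change, which is the point being made about microthreads.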

~~~
banachtarski
What I'm missing is how an IO-bound workload is ever a good showcase for a
language's threading and concurrency performance or feature set. When I think
about a language's threading model, I think about memory sharing/message
passing, synchronization and fences, signaling, context switching and
scheduling, preemption, etc.

None of that is really important if the threads aren't doing any real work.

~~~
the_duke
Of course it is.

In many languages that don't have microthreads baked in, this is not
straightforward.

You either have to use OS threads, which have far higher memory overhead and
context-switching costs and are not trivial to get right. Or you use some
async IO framework with callbacks, or the event-based OS APIs directly, which
is also FAR from trivial to get right.

Do you really want to think about memory fences and context switching when
writing concurrent code? I don't. Of course you should be aware of how it
works underneath though.

~~~
gvb
The problem with the article is Amdahl's law: he is speeding up the part of
the code that takes a small fraction of the time and handwaving the bottleneck
that takes most of the time. "Currently the bottleneck is writing to the file
system..." (and then the rest of the paragraph makes no sense at all).

"We cannot do writing to disk from several threads for the following
reasons..." - so his threading speeds up everything except the bottleneck
that takes all the time.

~~~
the_duke
I highly doubt that the disk writes were the original bottleneck. It probably
is the _new_ bottleneck, after parallelizing the data fetching and database
IO.

I agree that the article has some flaws, though.

