
Haskell in the Datacentre - dmit
https://simonmar.github.io/posts/2016-12-08-Haskell-in-the-datacentre.html
======
greenspot
> At Facebook we run Haskell on thousands of servers

Wow, I didn't know this. Does anyone know which programs they use Haskell for?
The article doesn't say.

~~~
DigitalJack
It's for spam detection. Everything that gets posted on Facebook is processed
by this code. There was a talk on it a year or two ago at a functional
programming conference. I'll see if I can find it and edit this post.

~~~
noir_lord
I remember this one: [https://m.youtube.com/watch?v=UMbc6iyH-xQ](https://m.youtube.com/watch?v=UMbc6iyH-xQ). Is that it?

~~~
DigitalJack
No, but I apparently didn't bookmark it and I'm having trouble finding it
again.

This might be it, don't have time to verify right now:
[http://cufp.org/2015/fighting-spam-with-haskell-at-facebook....](http://cufp.org/2015/fighting-spam-with-haskell-at-facebook.html)

------
rvdm
Really great to read more about real world large scale Haskell use.

I'd be curious whether the success FB has had with Haskell for spam filtering
means they might consider it for other parts of their stack too. Does anyone
have any insight into this?

------
bojo
Happy to see they pushed the changes upstream to GHC!

------
4ad
> GHC’s runtime is based around an M:N threading model which is designed to
> map a large number (M) of lightweight Haskell threads onto a small number
> (N) of heavyweight OS threads. [...] To cut to the chase, we ended up
> increasing N to be the same as M (or close to it), and this bought us an
> extra 10-20% throughput per machine.

Ah, yes. As a Go developer, I really wish Go moved to a 1:1 threading model.

~~~
crawshaw
1:1 in Go would mean moving servers from blocking-style code to
callbacks/futures.

(I believe the story is slightly different in Haskell.)

Nice sequential blocking-style code is my favorite thing about Go. I realize
it's not C, and it makes the runtime's work much harder, but the payoff in
complex servers is completely worth it.

~~~
4ad
> 1:1 in Go would mean moving servers from blocking-style code to
> callbacks/futures.

No, it would not mean such a thing at all. Where did you get this idea?

In fact, on some operating systems/linker combinations, gccgo uses a 1:1
threading model.

> Nice sequential blocking-style code is my favorite thing about Go.

Mine too.

~~~
crawshaw
It would in practice.

A typical server under load has more outstanding requests to answer than it
can run OS threads. It is not uncommon in a Go server at Google to find a
million or so goroutines.

If you want to have an OS thread that can also run C code, it needs a large
enough stack to run C code. A million such stacks is not practical.

On an OS like Linux, which lets you create threads without tying each one to a
large stack and which has quite lightweight kernel-side accounting for
threads, it would be possible to give each goroutine a thread. You would still
need an N:M pool for cgo.

As an added problem, 1:1 is slower for certain kinds of common channel
operations. Anywhere you have an unbuffered channel and block waiting for a
value, a 1:1 model requires a context switch to pass the value, whereas M:N
means the userland scheduler can switch goroutines on the OS thread.

It is precisely these benchmarks that led Ian to implement M:N in gccgo. If
there are combinations where it is 1:1, that is either a new decision he has
made in the last year, or (more likely) OS/ARCH combinations that he hasn't
moved to M:N yet.

I have seen a similar attempt at 1:1 lightweight tasks in C++ that hit exactly
this. Without the ability to preemptively move tasks between OS threads, it
ran into performance problems. Programs that needed the speed in that model
had to switch to a futures style of programming.

~~~
4ad
> A typical server under load has more outstanding requests to answer than it
> can run OS threads.

First of all, I object to the use of "can" here. There was a time when this
was true, but, as I said, that time has passed. There is no problem running
millions of kernel threads today.

That being said, there are advantages to not doing that. Right now Go uses a
network polling mechanism like epoll to implement network I/O without
consuming threads. This is totally orthogonal to the 1:1 vs. N:M discussion:
Go could use 1:1 scheduling while still performing asynchronous I/O behind the
scenes, just as it does today.

The nature of the mapping between g's and m's does not influence the other {g,
g, ...} -> {g0{m}} mapping used to implement network I/O.

> It is not uncommon in a Go server at Google to find a million or so
> goroutines.

No problem with that, they can continue to do so.

> If you want to have an OS thread that can also run C code, it needs a large
> enough stack to run C code. A million such stacks is not practical.

C code needs its own stack, but this is again orthogonal to scheduling. Right
now there is an N:1 mapping {g, g, ...} -> {g0} which makes it easy to switch
to the g0 stack. This would have to change to an N:M mapping {g, g, ...} ->
{g0, g0, ...}, and Go code would have to acquire C stacks just as it acquires
any other resource it needs.

This is not expensive. In fact, C calls are expensive now precisely because
there is a deep interaction between calling C code and the scheduler (which
costs thousands of cycles). All these cycles would go away. In a 1:1 model
_all_ you'd have to do is acquire a C stack, which is very fast and happens in
userspace in the uncontended case (the common case), probably less than a
hundred cycles.

> You would still need an N:M pool for cgo.

As explained above, you still need an N:M pool, but it no longer needs to be
integrated with the scheduler, making it much simpler to implement (and more
performant).

> Anywhere you have an unbuffered channel and block waiting for a value, a 1:1
> model requires a context switch to pass the value, whereas M:N means the
> userland scheduler can switch goroutines on the OS thread.

Only if you naively implement channels as mutex-based queues.

There is still a runtime underneath that can switch stacks on different
threads. I want to move general scheduling decisions out of the Go runtime and
into the kernel, but pointwise, stateless, swap-of-control type of things can
still happen synchronously in the Go runtime.

~~~
crawshaw
I do not believe the networking issue is orthogonal. I believe you will find
that any communication between two OS threads is significantly slower than
switching one thread between an active and a parked goroutine. Futexes are
slow and spinlocks burn CPU. (This is exactly the case the C++ model I
mentioned ran into, and why gccgo got an N:M model.)

But as I said I'm happy to be convinced. The easiest way to demonstrate it
would be to move gccgo linux/amd64 back to 1:1 without hurting the channel
benchmarks. You could use that as an argument that fanning out an epoll loop
among threads can be made fast.

------
lmm
I think the bigger question is why they're using a C++ Thrift server. Haskell
is good at transforming data - they should be able to use an OS socket and do
the message decoding/encoding in pure Haskell.

~~~
chongli
_the bigger question is why they're using a C++ Thrift server_

Because it's already there and it already works just fine.

It's so frustrating to see this sort of comment on nearly every post about a
language outside the mainstream. It's a religious argument and not a
productive one.

~~~
lmm
> Because it's already there and it already works just fine.

Well clearly it didn't work just fine if its threading model was interacting
poorly with that of their Haskell code. If there's a mismatch between those
two components then that will continue to bite them in the future, and it's
legitimate to ask whether moving the boundary would be a better way to solve
the problem.

~~~
easytiger
You're right, it makes more sense to have both in C++.

~~~
massudaw
That was the solution they tried last time, and it didn't work out well. And
what's holding them back is nothing Haskell-specific. It's just a waste of
effort to reimplement good, broadly used code at Facebook.

