
A lot of websockets in Haskell - boothead
https://blog.wearewizards.io/a-lot-of-websockets-in-haskell
======
jlouis
Some comments:

Haskell will use less memory on this task. Since it has a shared heap, it
doesn't have to allocate a small heap per process, and thus it is expected
to have lower overhead. Furthermore, static typing means Haskell needs
fewer type tags, and this tends to make it win. As the system grows in
complexity, these things tend to even out a bit more, but I would still expect
Haskell to use about half the memory of Erlang.

The Elixir numbers for Phoenix sound off. At 83765 megabytes and 1999984
connections, that is 42 kilobytes per connection. That count is about an order
of magnitude over what I would expect it to be. How much of that memory is
kernel allocated network buffer space, and how much is buffer space in the
Erlang runtime? A "raw" process in Erlang is around 1.5 kilobytes nowadays,
including stack and heap, so where do the additional 40 get allocated to? I
don't think we have 20 extra processes per connection for some reason :)
Definitely something to look into.
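
As a quick sanity check of that arithmetic, a throwaway snippet (not from
the article) confirms the per-connection figure:

```haskell
-- Sanity check of the figure quoted above:
-- 83765 MB spread over 1999984 connections.
perConnKB :: Int
perConnKB = floor (83765 * 1024 / 1999984 :: Double)

main :: IO ()
main = putStrLn (show perConnKB ++ " KB per connection")
-- prints "42 KB per connection"
```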

Tsung is an old application. It isn't really written in ways that make it
efficient at the network level, and this rears its head in this benchmark.
Furthermore, Tsung does more work than the broadcast in Haskell, so it is
expected the load generator will give up long before the server. Again,
measure the amount of memory allocated by the kernel and by the userland
process in order to determine if it is one or the other you hit first. Still,
I would expect Tsung to be the culprit.

~~~
bos
A small correction: GHC uses plenty of type tagging information. In fact, its
metadata overhead is relatively high.

~~~
thinkpad20
Why is this there? Is it to facilitate things like Typeable? I believe that
there's no language-level way to do things like runtime type reflection. And
even if there were, how would one express a complex type like (Vector (forall
a. MyTypeClass a => a -> Int, String))?

I'm also curious how dependently typed languages like Idris, which presumably
must have runtime access to type information, handle this stuff.

~~~
chadaustin
For values, laziness means there is a tag bit for whether a value is a thunk
or evaluated. Sum types use tags to determine which variant is active.

For functions, because a function that takes two arguments and returns a value
(a -> a -> a) has the same type as a function that takes one argument and
returns a function from the remaining argument to a value, the arity of a
function is stored in its tag.

Some of these tags are eliminated by inlining but if you sit down and read
some typical Haskell output you'll see a _whole lot_ of tag checks.

Source: spent a lot of time reading GHC output and writing high-performance
Haskell code.
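
As a rough source-level illustration of where those checks come from (the
names here are made up for the example), consider:

```haskell
-- Pattern matching on a sum type compiles to an inspection of the
-- constructor tag (Nothing vs. Just here). The argument may also be
-- an unevaluated thunk until the match forces it.
describe :: Maybe Int -> String
describe Nothing  = "empty"
describe (Just n) = "got " ++ show n

-- A two-argument function and a partial application of it can end up
-- behind the same type, so the runtime records each closure's arity.
add :: Int -> Int -> Int
add x y = x + y

addFive :: Int -> Int  -- a closure of arity 1 built from add
addFive = add 5

main :: IO ()
main = do
  putStrLn (describe (Just 7))  -- prints "got 7"
  print (addFive 10)            -- prints 15
```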

------
marcosdumay
Quite in line with my experience.

I wrote a mail server in Haskell that, because of a bug, was leaking
connections. I didn't use nix, so I had the external forkIO explicit, and used
a custom sockets interface that creates a buffer for every connection. I
optimized almost nothing; this was an early version of it.

Anyway, the server would reliably get to about 700k open connections every
time before getting killed on my Linode machine.
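
The forkIO-per-connection pattern can be sketched with plain green threads;
`runHandlers` below is a hypothetical stand-in (no real sockets) just to show
the shape:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)

-- One lightweight thread per "connection", with a trivial stand-in
-- handler. GHC green threads start with a tiny stack (about 1 KB by
-- default), which is what makes hundreds of thousands feasible.
runHandlers :: Int -> IO Int
runHandlers n = do
  dones <- mapM (const newEmptyMVar) [1 .. n]
  forM_ (zip [1 ..] dones) $ \(i, done) ->
    forkIO (putMVar done (i * 2))  -- the "handler" does trivial work
  results <- mapM takeMVar dones
  return (sum results)

main :: IO ()
main = runHandlers 10000 >>= print  -- prints 100010000
```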

------
BenoitP
What this benchmark shows is how lightweight the per-websocket memory
footprint is. 500k users is really impressive.

500k users per machine is great if they are mostly idle. This is the use case
of WhatsApp, and their stats are[1]:

> Peaked at 2.8M connections per server

> 571k packets/sec

> >200k dist msgs/sec

Not every app is meant to have mostly idle users. The question is whether a
real-time MMO FPS could be done on the Haskell server (let's limit each user's
neighbourhood to the 10 closest players).

I'd be very interested in the other corners of the envelope for the Haskell
server: A requests/second benchmark over websockets, with the associated
latency HdrHistogram, like [2]

[1] [http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...](http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html)

[2] [http://www.ostinelli.net/a-comparison-between-misultin-mochi...](http://www.ostinelli.net/a-comparison-between-misultin-mochiweb-cowboy-nodejs-and-tornadoweb/)

------
binaryapparatus
I really love Haskell and wish there were more opportunities for me to use it.
But for high-concurrency web projects Elixir has one big advantage: Chris and
the team do very focused and high-quality presentations. Building the
community and hype surrounding the project (I say 'hype' in the best possible
meaning) is half the job done. I really get itchy fingers to start a serious
Phoenix project every time I see the Elixir team at work. Respect.

~~~
vegabook
Couldn't agree more. Hunting around for a new language to do large scale
concurrency in, I had shortlisted Clojure and even Golang, but the Phoenix
presentations (and BEAM) tipped it for me. I also appreciate the fact that the
Phoenix guys constantly address the need not only to target the browser, but
other types of endpoints "beyond the browser" with transport adapters, CoAP
etc. Certainly has gotten me to buy a bunch of Elixir books recently to get
into the ecosystem.

------
artursapek
I love people who spend time explaining things like this. It's how I've
learned a lot of what I know about computers.

~~~
axman6
I was actually a little annoyed the author didn't explain more, it felt like
it was a half finished article.

------
windlep
I have two samples of a websocket echo program in Haskell, and was
unfortunately unable to keep them from leaking memory while merely sending the
same data back over the connections repeatedly. I will admit I haven't tested
this lately, and I didn't use Network.WebSockets because of this:
[https://github.com/jaspervdj/websockets/issues/72](https://github.com/jaspervdj/websockets/issues/72)

Haskell code with tester is here: [https://github.com/bbangert/ssl-ram-testing/tree/master/Hask...](https://github.com/bbangert/ssl-ram-testing/tree/master/Haskell)

(In some cases it would take some client churn to get the leak to occur, but
clients come/go, a server has to survive this basic fact)

One diagnosis was that it was a memory-fragmentation leak; sure enough, when I
ran some debugging it reported fragmentation losses at the end. Some tweaks to
the initial stack size and to how much the stack should grow on each increment
remedied it for the most part, but resulted in it taking quite a bit more
memory (I believe I had to set the initial stack to at least 4k).

Using -N4 or -N6 (to turn on multi-core/CPU use in Haskell) increased memory
consumption quite a bit, as Haskell will then start juggling all those threads
across real OS threads with its M:N scheduler (just like Go does with its
goroutines). It was drastically more memory-efficient with -N1, even though
that loses the multi-core utilization.

So far the most efficient implementation I've tested is of course... in C,
where a simple websocket echo used 9.5kb of memory per connection (including
SSL, and with TCP kernel buffers set to 4k send/recv). I'm not really eager to
write C code though, so we're using Python+Twisted, where a plain WS
connection takes about 16kb per conn, or 23kb with SSL.

In this article's test, the kernel buffer for TCP send/recv was set to 1k,
which I guess is fine if your payload will frequently be under that. But if
you are regularly sending payloads exceeding that, the number of wakeups
needed to keep shoveling data into the kernel buffer is going to be rather
expensive.

As the Phoenix people note, it's useful to keep in mind what you need to do,
rather than merely how many connections you want to hold open.

~~~
codygman
It looks like the Yesod websockets version fixed the issue?

------
jw989
This is cool, largely because of the functional perspective to dealing with
pools of websockets. I personally think the next level for websockets
infrastructure is building a paradigm for large pools of concurrent websocket
connections.

Perhaps a microservices implementation, with each step (handshake,
broadcasting, HTTP overhead) having its own cluster and communicating through
channels.

~~~
jkarneges
Pushpin ([http://pushpin.org](http://pushpin.org)) may be in line with what
you're thinking. It separates connection management from backend logic. In
fact the project itself is a handful of microservices (Mongrel2 is used as a
separate process for the HTTP handling).

~~~
timc3
Uses the GNU Affero GPL version 3 license - noted to save someone else from
going from happily excited to disappointed.

~~~
jkarneges
Code that can be run, modified, and redistributed at no cost. There are worse
deals. ;)

------
jdreaver
Kind of off topic, but: can you elaborate on how you handle the state file for
nixops? I _really_ want to use nixops for some smallish server deployments,
but I don't want to have to share the state files with team members (or worse,
keep track of it on some central machine that becomes a single point of
failure). That is the only reason I am sticking to Ansible for the time being.

To anyone else who uses nixops, why does nixops use this state file? Can't
they use labels like Ansible to identify deployed machines? Can't you just
require the user to provide their own secrets instead of auto-generating a new
keypair for each machine?

~~~
gautier
Glad you asked, as we have already written this article:
[https://blog.wearewizards.io/how-to-use-nixops-in-a-
team](https://blog.wearewizards.io/how-to-use-nixops-in-a-team)

~~~
jdreaver
Ah, I've already read that and I didn't realize you guys wrote it :)

Thanks for the writeup!

------
TheIronYuppie
Disclaimer: I work at Google, on Kubernetes

Really cool example! Just FYI, pre-emptible VMs would only be $0.06 (vs. the
bid of $0.10 on the AWS spot market). I mention it because the author talks
about cost being a concern.

[https://cloud.google.com/compute/pricing#machinetype](https://cloud.google.com/compute/pricing#machinetype)

------
amelius
Imagine how long it would take to send a simple message to all of those
websockets. Having many websockets is nice, but only if the rest of your
infrastructure can deal with it.

~~~
Refefer
Clearly a chatroom with half a million individuals is unusable from pretty
much every perspective. That said, a chat server with N chatrooms and a total
population of 500k users sounds like a good day on IRC and well within the
realm of what something like this could potentially handle.

~~~
felixgallo
That depends very much on what's doing the chatting. If it's code chatting
with other code -- for example, mobile devices receiving near-real-time
notifications -- then half a million is just getting started.

------
issaria
Have to upvote this for no reason :)

------
arrty88
bravo

