
Why Events Are A Bad Idea For High-Concurrency Servers (2003) - jervisfm
http://static.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html/
======
rdtsc
Events (to translate into more common lingo) are another way today of saying
callbacks, promises, async. Usually a big select/poll/epoll system call sits
at the bottom of those. Callback chains (abstracted as promises, deferreds, or
futures, for example) branch out from it. As in select->c1()->c2()->c3()...
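
To make that shape concrete, here is a minimal Go sketch (all names invented
for illustration): a single dispatch loop stands in for select/poll/epoll, and
registered callbacks chain off each event.

    package main

    import "fmt"

    // event pairs a readiness notification with the callback chain to run.
    type event struct {
        name string
        cb   func(data string)
    }

    func main() {
        events := make(chan event)

        // c1 -> c2: a short callback chain hanging off the central loop.
        c2 := func(data string) { fmt.Println("c2 got:", data) }
        c1 := func(data string) { c2(data + " via c1") }

        // Simulate one readiness event arriving.
        go func() {
            events <- event{name: "read-ready", cb: c1}
            close(events)
        }()

        // Stand-in for the big select/poll/epoll loop at the bottom.
        for ev := range events {
            ev.cb("payload for " + ev.name)
        }
    }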

The popularity of that approach has been going up and down in waves for
various reasons. The lack of multi-core machines in the early 2000s was one
reason. Fast CPU speeds were another. Expensive RAM was also important
(threads need a larger context, while a callback context is very small).

This technique is still performant, but in specific situations, mostly in
cases where callback chains are short. So something like select->c1()->c2()
is nice. Some servers like HAProxy or nginx fall into that category. Those are
battle-tested, very performant pieces of code. Another category is demos:
short "look at my async code" examples. Those also have very shallow callback
chains and look simple and pretty. [+]

What doesn't work too well for events is deep callback chains. That is where
real-world business logic gets implemented, with layers of APIs: shopping
carts, payment processing, various backend databases. Now you are up to
...->c14()->c15(). That doesn't work, in my opinion. The concurrency context,
instead of being one thread executing instructions c1...c15, is now spread
across many functions interspersed with then()'s or yield()'s. Oh, and usually
errors need to be handled, so the chain becomes a tree with splits into
errbacks, for example.
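
For contrast, here is a sketch of that kind of flow in the direct, blocking
style (the function names are made up): the whole c1...c15-style sequence
collapses into one goroutine reading top to bottom, and errors short-circuit
with ordinary returns instead of splitting the chain into errback branches.

    package main

    import "fmt"

    // Stubs standing in for real business-logic steps.
    func loadCart(user string) (string, error)   { return user + "'s cart", nil }
    func chargeCard(cart string) (string, error) { return "receipt for " + cart, nil }
    func recordOrder(receipt string) error       { _ = receipt; return nil }

    // One goroutine, one readable flow; no then()'s, no errbacks.
    func handleOrder(user string) error {
        cart, err := loadCart(user)
        if err != nil {
            return err
        }
        receipt, err := chargeCard(cart)
        if err != nil {
            return err
        }
        return recordOrder(receipt)
    }

    func main() {
        if err := handleOrder("alice"); err != nil {
            fmt.Println("order failed:", err)
            return
        }
        fmt.Println("order placed")
    }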

I think we are seeing a second wave of resurgence of threads in the form of
goroutines in Go, Rust tasks, and Dart isolates. That is a nice thing to see;
in general I believe those are better at handling concurrency.

[+] Python's greenlet module (gevent is based on it) is an interesting hybrid
case. At its core it is based on an event loop, but it then switches between
green threads (think of them as coroutines). So the user effectively spawns
threads and sends messages between them, for example.

~~~
jlouis
How could you forget Erlang? It has been avoiding events for years and has
done so with high success. Same with Haskell and Standard ML :)

~~~
lmm
I think of Erlang as an event-oriented system; actor messages do not
necessarily receive exactly 1 reply (or any reply at all), which seems to be
the critical property that the paper is talking about. Why do you say Erlang
avoids events?

~~~
pron
The key point discussed in the article is blocking. Erlang processes, like
Go's goroutines and Quasar fibers on the JVM, _block_ without sacrificing any
of the performance associated with asynchronous callback-based styles. The
article says that there is no problem with the thread abstraction, and
blocking in particular -- there's just a problem with the implementation of
kernel threads that makes blocking expensive.

Callbacks, and the various attempts to make them more palatable (composable
promises, monads), change the abstraction to get around implementation
problems. Lightweight threads (or user-mode threads) simply fix the
implementation, and preserve the threading/blocking abstraction.
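
A quick Go illustration of that point (a sketch, not a benchmark): the code
below blocks 100,000 goroutines at once, and the runtime multiplexes them onto
a handful of kernel threads, so the blocking abstraction survives without
paying kernel-thread costs.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const n = 100000
        var wg sync.WaitGroup
        ch := make(chan struct{})

        for i := 0; i < n; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                <-ch // blocks this goroutine, not a kernel thread
            }()
        }

        close(ch) // closing the channel wakes every blocked goroutine
        wg.Wait()
        fmt.Println("all", n, "goroutines blocked and resumed")
    }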

------
acjohnson55
This is an interesting topic, which I've been thinking about a lot. It seems
like ten years later we have the same duality going, but with successor
models. We've gotten rid of the shared state that makes threading a pain,
yielding the Actor Model/CSP, and we've reified callbacks into
promises/futures, events into streams, and mutable data into signals to make
each encapsulated and composable in reactive frameworks. It seems that the
former model has really taken off in imperative programming and the latter in
functional style.

A paper from EPFL [1] does a great job of breaking down the problems of the
Observer/Event pattern and how reactive patterns shore up the workflow. It
also contrasts the resulting framework with others, including actor models.
Interestingly, both models seem to have settled on message passing, the main
difference being whether it's the sender's or receiver's responsibility to
initiate the connection. In an actor model, the sender must know the
receiver's address in advance, and receivers generally accept messages from
anywhere. In a reactive model, receivers are responsible for connecting to a
message source, and the sender has no control over who receives its messages.
Unsurprisingly, since Odersky is one of the authors of the paper, Scala has a
great deal of support for both models.
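
A small Go sketch of the two wiring directions (types and names invented for
illustration): in the actor style the sender must hold the receiver's mailbox,
while in the reactive style receivers subscribe to a source that doesn't know
who is listening.

    package main

    import "fmt"

    // Actor style: the sender addresses the receiver's mailbox directly.
    func actorStyle() {
        mailbox := make(chan string, 1) // the receiver's "address"
        mailbox <- "hello, actor"
        fmt.Println(<-mailbox)
    }

    // Reactive style: receivers connect to the source; the source just
    // fans out to whoever subscribed.
    type source struct{ subs []chan string }

    func (s *source) subscribe() chan string {
        ch := make(chan string, 1)
        s.subs = append(s.subs, ch)
        return ch
    }

    func (s *source) publish(msg string) {
        for _, ch := range s.subs {
            ch <- msg
        }
    }

    func main() {
        actorStyle()

        src := &source{}
        sub := src.subscribe() // the receiver initiates the connection
        src.publish("hello, stream")
        fmt.Println(<-sub)
    }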

I'm not sure I fully understand the contrast between the models. The
Principles of Reactive Programming [in Scala] Coursera course [2] actually
devotes subsections to each of the two models, but rather oddly does not
make much of a comparison between the two. I haven't read anything that fully
explores this duality, or perhaps whether there is a model that
generalizes/unifies them.

Does anyone know of good resources on this topic?

[1]
[http://lampwww.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf](http://lampwww.epfl.ch/~imaier/pub/DeprecatingObserversTR2010.pdf)

[2]
[https://www.coursera.org/course/reactive](https://www.coursera.org/course/reactive)

------
joosters
The paper doesn't seem to discuss the important concept that writing threaded
code is _hard_. Once your process starts using threads, suddenly every bit of
state in it can be shared by all threads. Concurrency issues, locking and so
on become major headaches. Reasoning about code paths and data accesses
becomes amazingly complicated. We may think that we're coding experts, but
throw us some threads and we'll screw up the code for them.
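
For what it's worth, the hazard is easy to demonstrate; here's a minimal Go
example of unsynchronized shared state (run it with -race to see the detector
complain):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        counter := 0 // shared by every goroutine below
        var wg sync.WaitGroup

        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 100000; j++ {
                    counter++ // unprotected read-modify-write: lost updates
                }
            }()
        }

        wg.Wait()
        fmt.Println(counter) // typically prints less than 200000
    }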

Also, this paper is from over ten years ago. If the concept is correct, surely
lots of threaded servers would have appeared and would be beating benchmarks
all over the place. Where are these servers?

~~~
rdtsc
> The paper doesn't seem to discuss the important concept that writing
> threaded code is hard.

It mentions that using threads is hard. The implication is that events are harder.

> suddenly every bit of state in it can be shared by all threads.

Long overlapping chains of callbacks have similar issues. The points where
concurrency contexts switch are much coarser (between callbacks instead of
between machine instructions), but in a large system the danger is real.

> Reasoning about code paths and data accesses becomes amazingly complicated.

Having worked on large systems built on callbacks and on threads, I've come to
the conclusion that both are hard, but callbacks are much harder and messier.

> Where are these servers?

They are everywhere: Java and C++ backend systems. Twitter is using Scala.
WhatsApp is using Erlang. PostgreSQL has background processes that help.
Apache is the most popular web server.

Now, are you asking "where are they?" because we don't read about them on HN
every day? Yeah, nobody talks about stuff that works and is established and
boring. So perhaps picking technology based on what is read or discussed on HN
might not be the best approach ;-)

~~~
joosters
I didn't mean that threaded servers don't exist, that's pretty obvious. I
meant, as my comment stated, that there aren't threaded servers popping up and
demonstrating this fantastic performance/scalability. Hell, Apache moved
_away_ from threads.

Twitter and WhatsApp could be good examples, but no one outside those
companies can really say how well each server scales. How many servers does
Twitter run? Who knows? So who knows if they are each 'scaling to 100,000
threads and achieving excellent performance', as per the paper's claims?

As for which model is harder, I guess that is an arguable matter of opinion.
When the paper claims that one model is a 'more natural' programming style,
it's clearly moving beyond objective reasoning.

~~~
jlouis
Erlang _routinely_ runs 100,000 processes in the same memory space and scales
very well. The reason it is possible to even handle this is that it is not
based on shared memory but on message passing.

There are two things at play here: a) Is the solution easy to design/code. b)
Is the solution fast enough.

There is far too much focus on (b) and relatively little attention is given to
(a). The claim is that writing code in a direct style in a highly concurrent
environment (Go, Erlang, Haskell, SML, ...) is easier and achieves roughly the
same performance (within a factor of, say, 2).

My experience is that this is true. I have seen fast evented and threaded
systems. But I do feel the threaded systems were easier to write.

~~~
tel
Another side effect of leaving (b) to someone else (a runtime implementer) is
that it frees up your resources to focus on (a) independently. This is just
another corollary of "build it, build it right, build it fast".

In particular, consider the new MIO IO manager included in Haskell GHC 7.8.
It increases the performance of IO-bound lightweight threading by orders of
magnitude... all of it provided completely for free to consumers of these
abstract threading services, who've spent all their time making their
algorithms correct, not fast.

------
timdoug
A bit of historical context: this paper came out in 2003, the same year NPTL
(in RH9) and Linux 2.6.0 (first stable kernel with epoll, 2.5.44 introduced it
in 2002) were released.

~~~
stormbrew
It is worth noting that kqueue was introduced in FreeBSD 4.1 in 2000,
/dev/poll in Solaris 8 at about the same time, and though somewhat different
in nature, IOCP and Overlapped I/O were in Windows NT from before even that.

So there was already experience with high-performance event-driven mechanisms
before epoll.

That said, the examples presented in this paper do appear to be Linux
examples.

------
zackmorris
Events are just a pattern running on top of nonblocking I/O. I preferred
nonblocking I/O when I first started network programming, but now I prefer
blocking I/O running on separate threads.

The reason is that languages like Go (and really any language that uses pipes
instead of a shared memory space) move the programmer from a tangled mess of
logic and callback hell to individual processes that each resemble the main
loop we’re all used to. Cumbersome tasks like fetching a file over HTTP end up
looking like opening a local file. The thread just blocks until the data
arrives, which the programmer generally doesn’t have to worry about. That was
the original vision behind sockets (that everything is a file).
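
For example, here's what the HTTP-fetch case looks like in Go, using the
standard net/http package (the URL is just a placeholder): the goroutine
simply blocks on the request the way it would block reading a local file.

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Blocks this goroutine (not the whole program) until data arrives.
        resp, err := http.Get("https://example.com/")
        if err != nil {
            fmt.Println("fetch failed:", err)
            return
        }
        defer resp.Body.Close()

        // Reading the response feels just like reading a local file.
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            fmt.Println("read failed:", err)
            return
        }
        fmt.Println("fetched", len(body), "bytes")
    }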

I have high hopes that generators and coroutines will merge the two paradigms.
That’s because the danger with events is that their logic is a state machine
and tends towards unmanageability after a few dozen states. The programmer
discovers this late in a project when edge cases and failures can’t be easily
isolated or fixed. We see this every day on websites whose JavaScript chokes
and the page has to be reloaded. But the danger with threads is concurrency
issues (atomicity/mutexes/semaphores, etc.). The programmer discovers those
early on and has trouble writing even the simplest stable program, especially
in a low-level language like C or Java. So the learning curve with blocking
I/O is steep enough that I think that's why it never went mainstream.

Anyway, if we only use channels/pipes and no shared memory, and not even allow
threads access to the filesystem, then coroutines and preemptive threads
become basically equivalent. I think the endgame will be a methodology in
approachable languages like JavaScript/Go that works like Erlang, with the
ability to spawn threads on different processors or even different machines,
running over something like ZeroMQ or WebRTC. If we can throw in a proof of
work system like Bitcoin, and be able to utilize the graphics card with
CUDA/OpenCL but without ugly syntax, we’ll really have something because we’ll
finally have distributed computing and processor speed won’t really matter
anymore. I think something like NumPy/Octave/MATLAB could run this way and be
several orders of magnitude faster than anything today, with little additional
cost since so many machines are sitting idle anyway.
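
A tiny Go sketch of that "channels only, no shared memory" discipline: the
worker owns its state, and everything crossing the boundary is a message.

    package main

    import "fmt"

    // worker communicates with the outside world only through channels.
    func worker(jobs <-chan int, results chan<- int) {
        for j := range jobs {
            results <- j * j // state leaves only as a message
        }
        close(results)
    }

    func main() {
        jobs := make(chan int)
        results := make(chan int)
        go worker(jobs, results)

        go func() {
            for i := 1; i <= 3; i++ {
                jobs <- i
            }
            close(jobs)
        }()

        for r := range results {
            fmt.Println(r) // 1, 4, 9
        }
    }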

------
lmm
So this paper's result depends entirely on their dynamic stack size technique.
Was this ever implemented in real systems?

~~~
skybrian
Yes, Go has stacks that start out tiny and grow and shrink dynamically.

------
jpollock
If you can have one thread per transaction, you can avoid event loops. If you
cannot, you will need to write an event loop. In the systems I work with, the
transactions are long-held (months) and high in number (millions). Unless we
go with a language that has lightweight threads, we have to use event polling.
Rewriting in Erlang would come with different problems...

There are architectural benefits to event loops too. You get built-in
interception and, if your code is free of side effects, cheaper redundancy.

------
dap
The paper ignores what I think is one of the biggest advantages of event-based
systems over threaded ones, which is that the pathological failure modes when
things go operationally sideways are often much nicer with event-based
systems. It's not just that threaded code is "harder" \-- as others have
pointed out, event-based code is hard in a different way. But when you make a
network call with a lock held, you not only block the operation in question,
but anything else which needs the same lock, or which would like to use the
same thread. The worst part it is that it will work most of the time -- until
that network call blocks for a long time for some reason, at which point
latency bubbles propagate through the application. With an event-based system
in an environment that doesn't allow you to block the event loop without going
_way_ out of your way (as Node.js does, for example), the operation in
question will obviously stop until the network call completes, but that won't
affect anything that's not intrinsically tied to that call.

A related failure mode I've seen a lot is when an operation gets stuck with a
lock held and you can't even call into the server to ask the status of the
operation because _that_ request blocks on the lock that's held. Of course,
the better way to do this is to avoid holding the lock during a blocking
operation, and instead use the lock only to set a flag indicating the
operation is pending. That way, other threads can observe the current state
while the operation is pending. The point is that you have to make these
intermediate states explicit. But IMO, that's the hardest part of the event-
based system anyway.
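
A minimal Go sketch of that pattern (names are illustrative): the lock is held
only long enough to flip the flag, the blocking call happens outside it, and a
status query can always get in.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type server struct {
        mu      sync.Mutex
        pending bool
    }

    func (s *server) doSlowOperation() {
        s.mu.Lock()
        s.pending = true // make the intermediate state explicit
        s.mu.Unlock()

        time.Sleep(100 * time.Millisecond) // stand-in for the blocking network call

        s.mu.Lock()
        s.pending = false
        s.mu.Unlock()
    }

    // status never blocks behind the slow call, only behind flag updates.
    func (s *server) status() string {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.pending {
            return "operation pending"
        }
        return "idle"
    }

    func main() {
        s := &server{}
        go s.doSlowOperation()
        time.Sleep(10 * time.Millisecond)
        fmt.Println(s.status()) // observable while the operation is in flight
        time.Sleep(150 * time.Millisecond)
        fmt.Println(s.status())
    }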

------
tiagobraw
Ah, this paper is classic... It is a response to the other (equally classic)
paper "Why threads are a bad idea" [0].

In fact, threads and events are dual [1], and one can be made as efficient as
the other (given a proper implementation).

[0] [http://www.stanford.edu/~ouster/cgi-bin/papers/threads.pdf](http://www.stanford.edu/~ouster/cgi-bin/papers/threads.pdf)

[1] Lauer and Needham, "On the Duality of Operating System Structures," 1978.

------
swah
Also [http://pt.slideshare.net/e456/tyma-paulmultithreaded1](http://pt.slideshare.net/e456/tyma-paulmultithreaded1)

------
tegeek
This paper is from 2003. Now in 2014, functional programming gives enough
power to event-driven systems that they are hard to beat with any other
tools/techniques.

The new mantra is Reactive Programming
([http://www.reactivemanifesto.org/](http://www.reactivemanifesto.org/))

Reactive Systems are event-driven but highly concurrent, scalable, easy to
maintain and extend because of the "high level" concepts of "Signals &
Behaviors" instead of "low level" event handlers.

~~~
dozzie
The manifesto seems silly. Reactive systems are defined as being scalable, the
very thing we want to _achieve_ by using the approach (the reactive approach
in this case). Yet the system needs to be scalable first in order to be called
"reactive". Something went wrong.

