
Fast portable non-blocking network programming with Libevent - silentbicycle
http://www.wangafu.net/~nickm/libevent-book/
======
RoboTeddy
Libevent is rock solid -- it handles the event loop for memcache, among other
things. This guide looks like a great improvement/accompaniment to libevent's
docs.

One problem with using an event dispatcher like libevent is that libraries
that make their own blocking calls won't cooperate. That means you'll often be
stuck with limited capabilities, unless you write your own libraries or hack up
existing ones.
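
To make the problem concrete, here's a minimal readiness-based loop sketched with Python's stdlib selectors module (not libevent itself, but the dispatch principle is the same): any library call inside a handler that blocks would stall every other registered connection.

```python
import select
import selectors
import socket

# Minimal readiness-based event loop (stdlib sketch; libevent's callback
# dispatch works on the same principle). If a handler made its own blocking
# call here, every other registered connection would stall with it.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)

def on_readable(conn):
    data = conn.recv(1024)      # safe: the selector said conn is ready
    conn.sendall(data.upper())  # echo back, transformed

sel.register(a, selectors.EVENT_READ, on_readable)
b.sendall(b"ping")

for key, _events in sel.select(timeout=1.0):
    key.data(key.fileobj)       # dispatch the registered callback

select.select([b], [], [], 1.0) # wait for the echo
print(b.recv(1024))             # -> b'PING'
```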

There's a Python project called gevent (<http://www.gevent.org/>) that works
atop libevent. It monkey-patches the python socket module with a version that
automatically cooperates with libevent's loop. It uses greenlets to make code
written synchronously run asynchronously behind the scenes. If, for example,
you want to use an Amazon S3 library that makes use of python's httplib (which
would normally block and choke your event loop), it'll automatically work with
gevent/libevent. If you need to handle lots of concurrent connections and
don't need to be working in C, check it out.
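
To give a rough sense of the greenlet trick, here's a sketch with a plain stdlib generator standing in for a greenlet (a hypothetical illustration, not gevent's API): the code reads top-to-bottom, but suspends where a blocking recv() would stall and gets resumed by the loop once the socket is ready. gevent does that switch transparently, which is why unmodified libraries like httplib just work.

```python
import selectors
import socket

# Stdlib analogy for greenlet switching: the fetch() coroutine looks
# sequential, but parks itself at the point where recv() would block,
# and an event loop resumes it when data has arrived.
sel = selectors.DefaultSelector()

def fetch(conn):
    conn.sendall(b"GET /\n")     # runs immediately
    yield                        # suspend where a blocking recv() would stall
    return conn.recv(1024)       # resumed only once the reply is ready

client, server = socket.socketpair()
client.setblocking(False)

coro = fetch(client)
next(coro)                       # request goes out, coroutine parks itself

server.sendall(b"hello")         # the "remote" side answers

sel.register(client, selectors.EVENT_READ)
sel.select(timeout=1.0)          # event loop: wait for readability
try:
    next(coro)                   # resume the parked coroutine
except StopIteration as done:
    print(done.value)            # -> b'hello'
```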

~~~
houseabsolute
It's too bad that no standard method for asynchronous events has been
developed for Unix in general. Where I work, there's one single library for
concurrency, and every program or toolkit uses it. We do a lot of network
programming, so it would be unthinkable for any performance intensive
application to do blocking calls exclusively.

~~~
silentbicycle
libevent tries to provide a common interface to select, poll, kqueue (on
BSD), epoll (on Linux), and comparable calls on Windows. It's not a standard
part of Unix, of course, but it seems reasonably portable.

~~~
houseabsolute
Sure, but libraries underneath you have to cooperate for you to use it to the
fullest potential. What makes me sad is that no common means for that has ever
been developed.

~~~
silentbicycle
Right. Once you set up a non-blocking event loop, it's much simpler when you
can go async all the way down.

------
ajross
OK, I gotta ask: what's with the sudden interest in event-based I/O paradigms?
This is hardly a new idea, in fact it's very well-traveled ground. It was used
heavily in the 90's when threaded multiplexing was still new (and threads were
expensive), and it works well enough. But it's difficult and error-prone. It
splits sequential algorithms up into multiple callbacks with manually managed
shared state. It makes "what happens next" a really hard problem.

And it leads to some nasty state bugs, because conditions that are obvious in
sequential code ("parse failed? Return ERROR_PARSE_FAIL to the top level
request handler") become subtle (e.g.: parse failed, stash a failure result
somewhere to be returned when the socket-is-closed callback happens, but oops!
forget to actually close the socket so it lives as a zombie.)

Really, I'm curious. What exactly is it that made this thing the snake oil of
the week?
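
The zombie-socket scenario above can be sketched in a few lines of Python (hypothetical names; `Conn` and `on_data` are made up for illustration): the parse error gets stashed for a later callback, but the exit path forgets to close the connection.

```python
# Hypothetical callback-style handler illustrating the bug described above.
class Conn:
    def __init__(self):
        self.open = True
        self.error = None

    def close(self):
        self.open = False

def on_data(conn, data):
    if not data.startswith(b"HDR"):
        conn.error = "ERROR_PARSE_FAIL"  # stash for the close callback...
        return                           # ...but oops, forgot conn.close(),
                                         # so the socket lives on as a zombie
    # ... continue parsing ...

conn = Conn()
on_data(conn, b"garbage")
print(conn.error, conn.open)  # -> ERROR_PARSE_FAIL True
```

In sequential code the same condition is a one-liner (raise or return from inside a block that closes the socket on the way out), which is the point being made here.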

~~~
RoboTeddy
If I had to speculate:

\- epoll, which was added to the linux kernel ~2002, makes it possible to hang
on to potentially hundreds of thousands of concurrent connections on one
machine. Just can't do that with threads/processes.

\- the desire for real time web applications, which have to hold large numbers
of concurrent requests (e.g. friendfeed)

\- a focus on lowering web application latency, which can necessitate making
data requests in parallel

\- advances in software (such as gevent, mentioned below) that reduce the
complexity of writing async software

\- increasing use of remote services / slow requests over the internet (if
you're working in a threading/process model, you'll need a process open for
each concurrent request, which can become prohibitive)

\- web applications are simply serving more requests than they used to be, and
probably have more to gain from being asynchronous

Without async I/O, operating a large modern web application would be a lot
harder. Lots of huge sites rely on memcache, for example, which would suck if
it didn't use event-based IO.
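
The epoll point can be illustrated with Python's selectors module (which picks epoll on Linux, kqueue on BSD): each poll reports only the connections with activity, so cost scales with readiness rather than with the number of registered sockets. A toy-sized sketch:

```python
import selectors
import socket

# One epoll/kqueue-backed selector watching many sockets at once. The
# numbers here are tiny, but this pattern is what lets a single process
# hold huge numbers of mostly-idle connections.
sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(200)]
for watched, _peer in pairs:
    watched.setblocking(False)
    sel.register(watched, selectors.EVENT_READ)

pairs[42][1].sendall(b"wake")   # only one of the 200 becomes active

ready = sel.select(timeout=1.0)
print(len(ready))               # -> 1
```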

~~~
ajross
I don't think all of those items are really arguments about I/O paradigms,
though. Again, event stuff has been here forever, and clearly is appropriate
for some tasks. But it's a tradeoff: you have performance needs you can't get
from a simple implementation so you resort to the fancy tricks. So if you're
writing the client handler for a database, or something like memcached, you
need this technique (whether you need an abstraction library is another
argument...).

But most apps don't fall into those categories, and I'm not getting any of
that vibe from this. It seems that a lot of people sincerely believe that this
stuff is "easier" or "better" just because it has the fancy new Spicy Event
Sauce. And the truth isn't like that at all, and in fact these are truths that
we've all known for years now...

~~~
hurt
Hey ajross, do you know any good sources for reading up on different I/O
paradigms?

It's one of those things that I've read about different methods here and
there, but I've never come across one good source that compared and contrasted
the various techniques.

The short answer may very well be no, since it somewhat depends on the
programming language one chooses to use, as you'd likely handle I/O
differently in something like Haskell versus Python. Anyway, I'm just curious.
Thanks.

~~~
silentbicycle
FWIW, the C10K Problem page (<http://www.kegel.com/c10k.html>) compares
various approaches to handling tens of thousands of simultaneous clients. It's
from 2006, but is a good overview of well-understood techniques.

Also, some observations on server design from Jeff Darcy
(<http://pl.atyp.us/content/tech/servers.html>).

~~~
ajross
Argh... was just googling to find that link when you posted it. But yeah: this
is the first attempt in the modern world to sit down and come up with a real
guide to high performance I/O. Most of it is still relevant, though it was
written before scary parallel SMP became common (8-way is _routine_ for a
server these days).

~~~
silentbicycle
Yeah. If you have any other links, please share. It wasn't too long ago when
people were excited about fork-for-each-connection servers, though at least
half of it was "OMG Unix system calls from Ruby" from people who probably
don't know C. (That "* is Unix" meme.)

Simpler approaches are _good enough_ sometimes, though. There are plenty of
services that won't ever need to handle more than a couple simultaneous
connections, and that way you don't need to bother with asynchronous IO.

