Yeah, libevent with HTTP support scales like crazy. I did a 4-5k requests per second demo application last year. It took me about 25 minutes to write (that's what shocked me most).
Also, a libevent server would probably fit in the L1 cache. The data might fit in L2, depending on overall throughput, message sizes, and of course the size of the L2 :)
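To give an idea of how little code that takes, here's a minimal sketch on libevent's evhttp layer (libevent 2.x headers; the port and handler names are placeholders, not my actual demo):

    #include <event2/event.h>
    #include <event2/http.h>
    #include <event2/buffer.h>

    static void on_request(struct evhttp_request *req, void *arg)
    {
        struct evbuffer *body = evbuffer_new();
        evbuffer_add_printf(body, "hello\n");
        evhttp_send_reply(req, HTTP_OK, "OK", body);
        evbuffer_free(body);
    }

    int main(void)
    {
        struct event_base *base = event_base_new();
        struct evhttp *http = evhttp_new(base);

        evhttp_set_gencb(http, on_request, NULL);   /* catch-all handler */
        evhttp_bind_socket(http, "0.0.0.0", 8080);  /* listen on :8080 */

        event_base_dispatch(base);                  /* run the event loop */
        return 0;
    }

Compile with -levent and you have a single-threaded, non-blocking HTTP server.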
Very interesting. A couple of questions for you, in no particular order.
1. Have you used libev? I read that it's comparable, and in some ways superior to libevent, but I'm not sure why. I haven't found much documentation, though. Do you know if it also has some kind of HTTP server built in?
2. If I understand libevent correctly, it just uses a highly efficient mechanism to watch the file descriptor corresponding to a server socket (epoll, kqueue), and when data comes in on the socket, it runs a handler. If the handler is small and efficient, then in theory it will run faster than code that has several listener threads, because thread context switches take time and because it becomes more difficult to take advantage of the CPU's cache. Is this really still true in, e.g., recent Linux kernels with NPTL? Also, would you run two libevent app instances on a two-core CPU?
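In raw epoll terms I picture something like this (just a conceptual sketch of what I mean; the listen_fd setup and handle_request are stand-ins):

    #include <sys/epoll.h>
    #include <sys/socket.h>

    void handle_request(int fd);   /* the small, fast handler */

    void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event ready[64];
            int n = epoll_wait(epfd, ready, 64, -1);   /* block until something is readable */
            for (int i = 0; i < n; i++) {
                int fd = ready[i].data.fd;
                if (fd == listen_fd) {
                    /* new connection: accept it and watch it too */
                    int conn = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
                } else {
                    handle_request(fd);   /* runs in the same thread, no context switch */
                }
            }
        }
    }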
1. No experience with libev, only libevent.
2. libevent is an abstraction layer providing a single API over many event-based subsystems. So you can write portable ANSI/POSIX C code with it and still get the best event I/O mechanism each platform offers (epoll, kqueue, etc.).
I doubt any implementation on NPTL or any other threading mechanism can get close to this in performance. The overhead of managing threads is big.
As for multiple cores, I don't think it's necessary; the CPU work should be minimal. If there is something else you need to do as a backend, maybe it can live in a separate process and use fast inter-process communication, say shared memory (rfork is a good starting point, for example).
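Roughly like this, with plain POSIX mmap/fork standing in for rfork (which is BSD/Plan 9 specific); the struct is made up for illustration:

    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    struct shared_state {
        volatile long jobs_done;   /* backend publishes progress here */
    };

    int main(void)
    {
        /* anonymous shared mapping: visible to parent and child after fork */
        struct shared_state *st = mmap(NULL, sizeof *st,
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        st->jobs_done = 0;

        if (fork() == 0) {
            /* backend process: do the CPU-heavy work here */
            st->jobs_done++;
            _exit(0);
        }

        /* event-loop process: reads the shared region without any syscall */
        wait(NULL);
        printf("backend finished %ld job(s)\n", st->jobs_done);
        return 0;
    }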
Where Erlang really shines in all this is that it also has a scheduler built into the system, so you can handle "long-running calculations" without 1) blocking the whole thing up, or 2) having to use threads.
You would need a lot of RAM to service a million users on a single server and actually do something useful with them.
Input/output buffers alone would amount to a lot (several gigs most likely; at even a few KB of buffer per connection, a million connections is already several GB).
I've run things up to around 50k in practice (not Erlang, Java).
The memory issue is a Java thing, not an event-based-C thing. (Don't believe me? Go to the language shootout, compare Java vs. whatever, and look at the memory use.)
Also what does "a million users" mean? A million open connections? How many requests active simultaneously?
One million simultaneous HTTP requests would sure eat a lot of memory, but who does it that way? It's stupid. The kernel handles socket buffers for you. Each processor core can only read one socket at a time. So as long as you don't take too long to respond to requests, you should be fine.
Plus, with event-based libraries you can just place a read on the socket for at least X bytes and "wait" until the basics of the request have arrived.
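In libevent that's a read low-watermark on a bufferevent: the read callback doesn't fire until at least that many bytes are buffered. A sketch (libevent 2.x; MIN_REQUEST_BYTES and the callback names are made up):

    #include <event2/event.h>
    #include <event2/bufferevent.h>
    #include <event2/buffer.h>

    #define MIN_REQUEST_BYTES 16   /* e.g. enough for a request line */

    static void on_readable(struct bufferevent *bev, void *ctx)
    {
        /* only called once >= MIN_REQUEST_BYTES are buffered */
        struct evbuffer *in = bufferevent_get_input(bev);
        size_t len = evbuffer_get_length(in);
        /* ... parse/handle the request here ... */
        (void)len; (void)ctx;
    }

    static void on_event(struct bufferevent *bev, short what, void *ctx)
    {
        if (what & (BEV_EVENT_EOF | BEV_EVENT_ERROR))
            bufferevent_free(bev);
        (void)ctx;
    }

    void watch_connection(struct event_base *base, int fd)
    {
        struct bufferevent *bev =
            bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
        bufferevent_setcb(bev, on_readable, NULL, on_event, NULL);
        bufferevent_setwatermark(bev, EV_READ, MIN_REQUEST_BYTES, 0);
        bufferevent_enable(bev, EV_READ);
    }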
I don't know what they are trying to do; it all depends on what the server does. For pure messaging across connections it should be an order of magnitude better in C/libevent. But as I said, I need more info.
D'oh, from TFP:
This code (mochiconntest_web.erl) just accepts connections and uses chunked transfer to send an initial welcome message, and one message every 10 seconds to every client.
1,000,000 clients / 10 seconds = 100,000 writes per second to already-connected sockets. My guesstimate is that's not that much :)
A stress test running for several days, with the full network stack involved, would give you an idea.
And in my experience C/libevent would be hard for anything to beat there, as there are fewer things that can block service. A classic is seeing the garbage collector go bananas for a few seconds in object-oriented languages.