Yeah, libevent with HTTP support scales like crazy. I did a 4-5k requests per second demo application last year. It took me about 25 minutes to write (that's what shocked me most).
Also, a libevent server would probably fit in the L1 cache. The data might fit in L2, depending on overall throughput, message sizes, and of course the size of the L2 :)
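To give an idea of how little code that takes, here's a minimal sketch on libevent's evhttp layer (libevent 2.x headers; the port and handler names are placeholders, not my actual demo):

    #include <event2/event.h>
    #include <event2/http.h>
    #include <event2/buffer.h>

    static void on_request(struct evhttp_request *req, void *arg)
    {
        struct evbuffer *body = evbuffer_new();
        evbuffer_add_printf(body, "hello\n");
        evhttp_send_reply(req, HTTP_OK, "OK", body);
        evbuffer_free(body);
    }

    int main(void)
    {
        struct event_base *base = event_base_new();
        struct evhttp *http = evhttp_new(base);

        evhttp_set_gencb(http, on_request, NULL);   /* catch-all handler */
        evhttp_bind_socket(http, "0.0.0.0", 8080);  /* listen on :8080 */

        event_base_dispatch(base);                  /* run the event loop */
        return 0;
    }

Compile with -levent and you have a single-threaded, non-blocking HTTP server.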
Very interesting. A couple of questions for you, in no particular order.
1. Have you used libev? I read that it's comparable, and in some ways superior to libevent, but I'm not sure why. I haven't found much documentation, though. Do you know if it also has some kind of HTTP server built in?
2. If I understand libevent correctly, it just uses a highly efficient mechanism to watch the file descriptor corresponding to a server socket (epoll, kqueue), and when data comes in on the socket, it runs a handler. If the handler is small and efficient, then in theory it will run faster than code that has several listener threads, because thread context switches take time and because it becomes more difficult to take advantage of the CPU's cache. Is this really still true in, e.g., recent Linux kernels with NPTL? Also, would you run two libevent app instances on a two-core CPU?
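In raw epoll terms I picture something like this (just a conceptual sketch of what I mean; the listen_fd setup and handle_request are stand-ins):

    #include <sys/epoll.h>
    #include <sys/socket.h>

    void handle_request(int fd);   /* the small, fast handler */

    void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event ready[64];
            int n = epoll_wait(epfd, ready, 64, -1);   /* block until something is readable */
            for (int i = 0; i < n; i++) {
                int fd = ready[i].data.fd;
                if (fd == listen_fd) {
                    /* new connection: accept it and watch it too */
                    int conn = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = conn };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
                } else {
                    handle_request(fd);   /* runs in the same thread, no context switch */
                }
            }
        }
    }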
1. No experience with libev, only libevent.
2. libevent is an abstraction layer providing a single API over many event-based subsystems. So you can write portable ANSI/POSIX C code with it and still get the best event I/O mechanism each platform offers (epoll, kqueue, etc.).
I doubt any implementation on NPTL or any other threading mechanism can get close to this in performance. The overhead of managing threads is big.
As for multiple cores, I don't think it's necessary; the CPU work should be minimal. If there is something else you need to do as a backend, maybe it can live in a separate process and use fast inter-process communication, say shared memory (rfork is a good starting point, for example).
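Roughly like this, with plain POSIX mmap/fork standing in for rfork (which is BSD/Plan 9 specific); the struct is made up for illustration:

    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    struct shared_state {
        volatile long jobs_done;   /* backend publishes progress here */
    };

    int main(void)
    {
        /* anonymous shared mapping: visible to parent and child after fork */
        struct shared_state *st = mmap(NULL, sizeof *st,
                                       PROT_READ | PROT_WRITE,
                                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        st->jobs_done = 0;

        if (fork() == 0) {
            /* backend process: do the CPU-heavy work here */
            st->jobs_done++;
            _exit(0);
        }

        /* event-loop process: reads the shared region without any syscall */
        wait(NULL);
        printf("backend finished %ld job(s)\n", st->jobs_done);
        return 0;
    }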
Where Erlang really shines in all this is that it also has a scheduler built into the system, so you can handle "long-running calculations" without 1) blocking the whole thing up, or 2) having to use threads.
You would need a lot of RAM to service a million users on a single server and actually do something useful with them.
Input/output buffers alone would amount to a lot (several gigs most likely; at even a few KB of buffer per connection, a million connections is already several GB).
I've run things up to around 50k in practice (not Erlang, Java).
The memory issue is a Java thing, not an event-based-C thing. (Don't believe me? Go to the language shootout, compare Java vs. whatever, and look at the memory use.)
Also what does "a million users" mean? A million open connections? How many requests active simultaneously?
One million simultaneous HTTP requests would sure eat a lot of memory, but who does it that way? It's stupid. The kernel handles socket buffers for you. Each processor core can only read one socket at a time. So as long as you don't take too long to respond to requests, you should be fine.
Plus, with event-based libraries you can just place a read on the socket for at least X bytes and "wait" until the basics of the request have arrived.
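In libevent that's a read low-watermark on a bufferevent: the read callback doesn't fire until at least that many bytes are buffered. A sketch (libevent 2.x; MIN_REQUEST_BYTES and the callback names are made up):

    #include <event2/event.h>
    #include <event2/bufferevent.h>
    #include <event2/buffer.h>

    #define MIN_REQUEST_BYTES 16   /* e.g. enough for a request line */

    static void on_readable(struct bufferevent *bev, void *ctx)
    {
        /* only called once >= MIN_REQUEST_BYTES are buffered */
        struct evbuffer *in = bufferevent_get_input(bev);
        size_t len = evbuffer_get_length(in);
        /* ... parse/handle the request here ... */
        (void)len; (void)ctx;
    }

    static void on_event(struct bufferevent *bev, short what, void *ctx)
    {
        if (what & (BEV_EVENT_EOF | BEV_EVENT_ERROR))
            bufferevent_free(bev);
        (void)ctx;
    }

    void watch_connection(struct event_base *base, int fd)
    {
        struct bufferevent *bev =
            bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
        bufferevent_setcb(bev, on_readable, NULL, on_event, NULL);
        bufferevent_setwatermark(bev, EV_READ, MIN_REQUEST_BYTES, 0);
        bufferevent_enable(bev, EV_READ);
    }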
I don't know what they are trying to do; it all depends on what the server does. For pure messaging across connections it should be an order of magnitude better in C/libevent. But as I said, I need more info.
D'oh, from TFP:
This code (mochiconntest_web.erl) just accepts connections and uses chunked transfer to send an initial welcome message, and one message every 10 seconds to every client.
1,000,000 clients / 10 seconds = 100,000 writes per second to already-connected sockets. My guesstimate is that's not that much :)
A stress test running for several days, with the full network stack involved, would give you an idea.
And in my experience C/libevent would be hard for anything to beat there, as there are fewer things that can block service. A classic is seeing the garbage collector go bananas for a few seconds in object-oriented languages.