First, there isn't enough context in the LShift article to judge whether Marek has a valid point. He's conceded up front that both Libev and libevent show log(N) growth for adding new fds to the loop, and seems to be arguing that it could be brought down to constant time. So? He hasn't presented evidence that the overhead he's talking about is significant in the real world. This is compute vs. I/O in an I/O-bound environment.
Next, he also seems to be taking his cues from the second set of benchmarks on the Libev page, in which every socket has a random timeout. This is a worst-case scenario for both Libev and libevent. Contrary to both the Libev page and Marek's analysis, setting timeouts on individual I/O events is not the only way to implement "idle timeouts" (which many protocols don't even have); it is in fact one of the poorer approaches.
Finally, he's taken two very specific benchmarks and generalized them to the whole design approach. Libevent is considered "high-performance" because async I/O is a high-performance design --- especially compared to demand-threading --- and libevent was the first stable, cross-platform implementation of it. It's also "high-performance" because it implements the high-performance kernel interfaces, like epoll. But the internals of libevent, while reasonable, are not (at least last time I checked) particularly optimized.
You can get around most of these problems by building on top of libevent; for instance, at Arbor Networks, we ditched libevent's (verbose, clumsy) timer interface for a better data structure, and simply kept libevent notified of the next closest timeout.
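A hypothetical sketch of that kind of approach (in Python for brevity; the class and method names are mine, not Arbor's): because idle timeouts all share one fixed duration, a plain FIFO stays sorted by deadline on its own, so insert, refresh, and cancel are all O(1), and the event loop only ever needs one real timer armed, for the head's deadline.

```python
import collections


class IdleTimeoutQueue:
    """O(1) idle-timeout bookkeeping for a single fixed timeout duration.

    Every connection gets the same duration, so refreshing a connection
    just moves it to the tail; the head is always the next to expire,
    and the event loop needs only one real timer armed for its deadline.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self._order = collections.OrderedDict()  # conn -> deadline

    def touch(self, conn, now):
        """Record activity: (re)insert conn at the tail with a fresh deadline."""
        self._order.pop(conn, None)
        self._order[conn] = now + self.timeout

    def cancel(self, conn):
        """Connection closed normally; drop its pending timeout."""
        self._order.pop(conn, None)

    def next_deadline(self):
        """Deadline the loop's single timer should be armed for, or None."""
        for deadline in self._order.values():
            return deadline
        return None

    def expire(self, now):
        """Pop and return every connection whose deadline has passed."""
        expired = []
        while self._order:
            conn, deadline = next(iter(self._order.items()))
            if deadline > now:
                break
            self._order.popitem(last=False)
            expired.append(conn)
        return expired
```

With something like this in front of it, the underlying loop sees at most one pending timeout instead of one per socket, which sidesteps exactly the per-socket timer churn that the random-timeout benchmark measures.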
"Terrified" seems like a very silly word to use here, but I'm always eager to get schooled.
I've spent a fair amount of time inside the libevent library source, and I'm pretty sure adding new connections is constant time as far as the library goes. There is, however, the underlying cost of the event struct's memory allocation and of the low-level APIs; otherwise, IIRC, they keep it down to a single malloc and link the event into the head of internal linked lists, for fixed-cost insertion.
The observed log(N) insertion cost most likely comes from the low-level poll APIs, and there is pretty much nothing libevent can do to make that faster. However, on some platforms you can choose the low-level API via build options, so in theory you can pick a poll API better suited to your workload.
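As an illustration of that "pick the readiness API that suits your workload" choice (using Python's stdlib selectors module rather than libevent's build options, so the analogy is mine, not libevent's mechanism):

```python
import selectors
import socket


def make_selector(prefer_epoll=True):
    """Pick a readiness API explicitly instead of taking the default.

    EpollSelector only exists on Linux; SelectSelector is the portable
    fallback. libevent exposes the analogous choice at build time.
    """
    if prefer_epoll and hasattr(selectors, "EpollSelector"):
        return selectors.EpollSelector()
    return selectors.SelectSelector()


# Minimal demonstration with a socketpair: register one end, write to
# the other, and confirm the selector reports it readable.
sel = make_selector()
r, w = socket.socketpair()
sel.register(r, selectors.EVENT_READ)
w.send(b"x")
events = sel.select(timeout=1.0)
ready = [key.fileobj for key, mask in events]
sel.unregister(r)
r.close()
w.close()
```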
However, the claim is made that connections are added and removed more often than polling is conducted (definitely the opposite of the average MySQL workload), so I'd guess most high-performance low-level poll APIs aren't optimized for that sort of workload.
Obviously I agree with you on the timer queue thing; I got a performance win simply by shoplifting the timer queue implementation out of MIT Athena; it took maybe a couple hours to integrate and test, and I didn't need to touch the libevent internals to do it.
Just a quick question: what would you consider a superior approach to idle timeouts? Thanks for your insights!
This is a grossly ignorant statement. I have written my share of event loops (including epoll- and IOCP-based ones), and choosing a heap for timer management is an obviously dumb thing to do if you care about performance. Removing objects from the event loop before delivering a callback is even dumber, for the exact reasons he listed in the post. Essentially, what he's bitching about is a lame-ass, unoptimized implementation of an event engine.
Also, interestingly enough, he has actually reinvented a simplified form of the Linux kernel's timer management code: http://lwn.net/Articles/156329.
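Roughly, the idea behind that kernel code is a timing wheel. A simplified single-level sketch (in Python, with made-up names; the real kernel cascades several wheels to cover large timeouts): timers hash into a circular array of buckets by expiry tick, so arming and cancelling are O(1) and each tick only touches one bucket.

```python
class TimingWheel:
    """Simplified single-level timing wheel: O(1) arm and cancel.

    A real implementation cascades several wheels to handle delays
    longer than one revolution; one level is enough to show the idea.
    """

    def __init__(self, size=256):
        self.size = size
        self.buckets = [dict() for _ in range(size)]  # slot -> {id: expiry}
        self.now = 0

    def arm(self, timer_id, ticks_from_now):
        # Single level, so the delay must fit within one revolution.
        assert 0 < ticks_from_now < self.size
        expiry = self.now + ticks_from_now
        self.buckets[expiry % self.size][timer_id] = expiry
        return expiry

    def cancel(self, timer_id, expiry):
        self.buckets[expiry % self.size].pop(timer_id, None)

    def tick(self):
        """Advance one tick and return the timer ids that fired."""
        self.now += 1
        bucket = self.buckets[self.now % self.size]
        fired = [tid for tid, exp in bucket.items() if exp == self.now]
        for tid in fired:
            del bucket[tid]
        return fired
```

Compare this with a binary heap, where every arm and cancel is O(log N); for workloads that constantly re-arm idle timeouts that never actually fire, the wheel's constant-time cancel is the whole point.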