The popularity of that been going up and down in waves for various reasons. Lack of multi-core machines in early 2000s was one reason. Fast CPU speeds was another. Expensive RAM was also important (threads would need a larger context while a callback context is very small).
This technique is performant still, but in specific situations and mostly in cases were callback chains are short. So something like select->c1()->c2() is nice. Some servers like HAProxy or nginx fall in that category. Those are battle tested very performant pieces of code. Another category is demos. Short examples of "look at my async code example". Those also have very shallow callback chains and also look simple and pretty. [+]
What doesn't work for events too well is deep callback chains. This where real world business logic gets to be implemented. With layers of API. Shopping carts, payment processing, various backend databases. Now you are up to ...->c14()->c15(). That doesn't work in my opinion. The concurrency context instead of being in one thread executing instructions c1....c15 is now spread across many function interspersed with then()'s or yields()'s. Oh and usually errors need to be handled, so the chain becomes a tree with some splits into errbacks for example.
I think we are seeing a second wave of resurgence of threads in forms of goroutines in Go. Rust tasks. Dart Isolates. That is a nice thing to see, I general I believe those are better at handling concurrency.
[+] Python greenlet module (gevent is based on it) is an interesting case of a hybrid. At its core it is based on an event loop but then switches to green threads (think of them as co-routines). So user effectively spawns threads and sends messages between them for example.
A concrete example: I'm experimenting with a small analytics service where one part of the implementation is that I need to scrape about ~10 million forum posts across ~500k web pages.
A purely event-driven approach makes scraping very easy but throttling requests, handling errors and retries quickly becomes messy and difficult to reason about. A purely thread-driven approach makes throttling much simpler but scraping harder.
I found that implementing scraping with events (i.e. push a URL to a queue + specify a callback type + state) and downloading with threads (i.e. process the queue(s) with a request throttling window) much more manageable with some nice side benefits (e.g. persistent queues are easy to implement).
I haven't looked at goroutines, Rust tasks or Dart Isolates -- are you suggesting that one model can be sufficient for general purpose computing?
Callbacks, and the various attempts to make them more palatable (composable promises, monads), change the abstraction to get around implementation problems. Lightweight threads (or user-mode threads) simply fix the implementation, and preserve the threading/blocking abstraction.
If it is OS processes (or Erlang lightweight processes) then you gain failure isolation. If something has gone wrong and a child process crashes, only that request will fail, others will keep working. Real OS processes are usually on the heavy side.
In Go you might create a channel and spawn a goroutine for example.
Out of curiosity, you mentioned nginx as using callback chain ; has it any difference with apache in that regard ?
A paper from EPFL  does a great job of breaking down the problems of the Observer/Event pattern and how reactive patterns shore up the workflow. It also contrasts the resulting framework with others, including actor models. Interestingly, both models seem to have settled on message passing, the main difference being whether it's the sender's or receiver's responsibility to initiate the connection. In an actor model, the sender must know the receiver's address in advance, and receivers generally accept messages from anywhere. In a reactive model, receivers are responsible for connecting to a message source, and the sender has no control over whom receives its messages. Unsurprisingly, since Odersky is one of the authors of the paper, Scala has a great deal of support for both models.
I'm not sure I fully understand the contrast between the models. The Principles of Reactive Programming [in Scala] Coursera course  actually devotes subsections to both models respectively, but rather oddly does not make much of a comparison between the two. I haven't read anything that fully explores this duality, or perhaps whether there is a model that generalizes/unifies them.
Does anyone know of good resources on this topic?
Also, this paper is from over ten years ago. If the concept is correct, surely lots of threaded servers would have appeared and their speed would have been beating benchmarks all over the place. Where are these servers?
It mentions that using threads is hard. The implication is events are harder.
> suddenly every bit of state in it can be shared by all threads.
Long overlapping chains of callbacks have similar issues. Points where concurrency contexts switch are much larger (between callbacks instead of between machine instructions) but in a large system the danger is real.
> Reasoning about code paths and data accesses becomes amazingly complicated.
Based on working on large systems based on callbacks and based on threads, I've come to the conclusion that both are hard, but callbacks are much harder and messier.
> Where are these servers?
There are everywhere. Java, C++ backend systems. Twitter is using Scala. WhatsApp is using Erlang. PostgreSQL has background processes that help. Apache is the most popular web server.
Now are you saying "where they are?" because we don't read about them on HN every day? Yeah, nobody talks about stuff that works and is established and boring. So perhaps picking technology based on what is read or discussed on HN might not be the best approach ;-)
Twitter and WhatsApp could be good examples, but no-one outside the company can really say that each server there scales well. How many servers does Twitter run? Who knows? So who knows if they are each 'scaling to 100,000 threads and achieving excellent performance', as per the paper's claims?
As for which model is harder or not, I guess that is an arguable matter of opinion. When the paper is claiming that one model is a 'more natural' programming style then it's clearly moving beyond objective reasoning.
There are two things at play here: a) Is the solution easy to design/code. b) Is the solution fast enough.
There is far too much focus on (b) and relatively little attention is given to (a). The claim is that writing code in a direct style in a highly concurrent environment (Go, Erlang, Haskell, SML, ...) is easier and achieves roughly the same performance (within a factor of, say, 2).
My experience is that this is true. I have seen fast evented and threaded systems. But I do feel the threaded systems were easier to write.
In particular, consider the new MIO IO manager included in Haskell GHC 7.8. It's increasing performance of IO-bound lightweight threading by orders of magnitude... all performance provided completely for free to consumers of these abstract threading services who've spent all their time making their algorithms correct, not fast.
State modelling seems pretty powerful, and it always surprises me when people resist it.
Only when you use highly inconvenient model of shared memory. Message passing and shared-nothing combined, as it is done in Erlang or Go, are way easier to reason about, giving at the same time more natural way of structuring code than events handling.
Purely functional language can share state between threads just fine, since any state will be immutable (barring any explicit communication primitives).
I dispute this.
With a functional* approach to writing discrete, modular code with locking as fine-grained and minimal as possible, it's not that hard.
Lot's of threaded servers have appeared. They are, as others have said, absolutely everywhere.
*(not pure functional, just functional as an approach, i.e. writing C code in a side-effect and state-free manner as far as possible)
But, some people have to fail before they learn. Just wish they hadn't failed on such a big project.
So there was already experience with high performance event driven mechanisms before epoll.
That said, the examples presented in this paper do appear to be linux examples.
The reason is that languages like Go (and really any language that uses pipes instead of a shared memory space) move the programmer from a tangled mess of logic and callback hell to individual processes that each resemble the main loop we’re all used to. Cumbersome tasks like fetching a file over HTTP end up looking like opening a local file. The thread just blocks until the data arrives, which the programmer generally doesn’t have to worry about. That was the original vision behind sockets (that everything is a file).
There are architectural benefits to event loops too. You get built in interception and if your code is free of side effects cheaper redundancy.
A related failure mode I've seen a lot is when an operation gets stuck with a lock held and you can't even call into the server to ask the status of the operation because that request blocks on the lock that's held. Of course, the better way to do this is to avoid holding the lock during a blocking operation, and instead use the lock only to set a flag indicating the operation is pending. That way, other threads can observe the current state while the operation is pending. The point is that you have to make these intermediate states explicit. But IMO, that's the hardest part of the event-based system anyway.
In fact, threads and events are dual , and one can be made as efficient and the other (given a proper implementation).
 Lauer and Needham. “On the Duality of Operating System Structures” in 1978;
The new mantra is Reactive Programming (http://www.reactivemanifesto.org/)
Reactive Systems are event-driven but highly concurrent, scalable, easy to maintain and extend because of the "high level" concepts of "Signals & Behaviors" instead of "low level" event handlers.