Ironically, doing evented/fiber code right is probably harder than doing threads right, for this kind of stuff.
I'm a bit astounded that Heroku, in their attempt to deal with, um, let's call it "routing-gate", aren't talking about multi-threaded dispatch and config.threadsafe! at all, only about unicorn with 2-4 forked processes. It seems awfully likely that multi-threaded dispatch would scale a lot more efficiently as the number of overlapping requests grows.
I think part of it is the lack of mature, robust, 'self-managing' app server solutions. For MRI (with the GIL), what's likely needed is something that forks multiple processes (to use all cores), with each of those processes dispatching requests across multiple threads (to cope with blocking I/O and to even out latency when not all requests take the same time to finish). As far as I know, Passenger 4 Enterprise is the only thing that can do this for you without you having to set it all up manually.
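To make the "fork per core, threads per process" idea concrete, here is a minimal sketch in plain Ruby. It is not how Passenger (or any real app server) implements it; the constants, handle_request, and run_worker are hypothetical, and sleep stands in for a blocking DB or HTTP call. It assumes a Unix-like MRI, since Process.fork isn't available on Windows or JRuby.

```ruby
# Hypothetical sketch: fork N processes (one per core, so the GIL isn't a
# cross-core bottleneck), each running a small thread pool (so one slow,
# I/O-blocked request doesn't stall the others). Requires Unix-like MRI.

WORKER_PROCESSES    = 2 # e.g. one per CPU core (assumed value)
THREADS_PER_WORKER  = 4 # assumed pool size
REQUESTS_PER_WORKER = 8

# Hypothetical handler: sleep releases the GIL, like a real DB/HTTP call would.
def handle_request(id)
  sleep 0.05
  "pid #{Process.pid} handled request #{id}"
end

# Inside one forked process: a fixed pool of threads pulls request ids off a queue.
def run_worker(request_ids, out)
  jobs = Queue.new
  request_ids.each { |id| jobs << id }
  THREADS_PER_WORKER.times { jobs << :stop } # one stop marker per thread
  results = Queue.new                        # Queue is safe to push to from many threads

  pool = Array.new(THREADS_PER_WORKER) do
    Thread.new do
      while (id = jobs.pop) != :stop
        results << handle_request(id)
      end
    end
  end
  pool.each(&:join)

  # One small write per process; pipe writes under PIPE_BUF are atomic,
  # so lines from different processes don't get mixed together.
  out.write(Array.new(results.size) { "#{results.pop}\n" }.join)
end

reader, writer = IO.pipe
pids = Array.new(WORKER_PROCESSES) do |w|
  fork do
    reader.close
    first = w * REQUESTS_PER_WORKER
    run_worker((first...first + REQUESTS_PER_WORKER).to_a, writer)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
end
writer.close                          # parent keeps only the read end
lines = reader.readlines(chomp: true) # EOF once every child has closed its write end
pids.each { |pid| Process.wait(pid) }
reader.close

puts lines
```

The split of labor is the point: processes buy you cores, threads buy you overlap while requests wait on I/O. A real server also has to handle accepting sockets, restarting dead workers, and so on, which is exactly the "self-managing" part that's hard to get right by hand.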
Seriously, I just recently switched to puma and enabled config.threadsafe!, and that was that. Now I/O calls like outbound HTTP requests don't block the whole server, even though the code making them is still synchronous. I didn't even need to switch to JRuby.
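The reason this works even on MRI is that a thread blocked on I/O releases the GIL, so other threads keep running. A minimal sketch of that effect, assuming sleep as a stand-in for a synchronous HTTP call (fake_http_call and the latency value are hypothetical):

```ruby
# Shows MRI threads overlapping blocking calls despite the GIL.
# fake_http_call is hypothetical; sleep stands in for a synchronous HTTP request.

SIMULATED_LATENCY = 0.1 # seconds (assumed)
REQUESTS = 10

def fake_http_call
  sleep SIMULATED_LATENCY # blocks only this thread; MRI releases the GIL while it waits
  :ok
end

# Wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

serial = elapsed { REQUESTS.times { fake_http_call } }
threaded = elapsed do
  REQUESTS.times.map { Thread.new { fake_http_call } }.each(&:join)
end

puts format("serial:   %.2fs", serial)   # roughly REQUESTS * SIMULATED_LATENCY
puts format("threaded: %.2fs", threaded) # roughly SIMULATED_LATENCY
```

Each thread's code is still plain, synchronous Ruby; only the waiting overlaps. This says nothing about Rails itself: in Rails 3.x, config.threadsafe! is what removes the per-request lock so a threaded server like puma can actually run requests concurrently.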