Unless the dyno signals to the router that it is busy, isn't this just postponing the problem? Unicorn lets each dyno handle more requests, but requests will still get queued at the dyno level if one of them is slow (say, 2-3 seconds).
If only one request is slow and you have 7-8 Unicorn workers, only one of them will stay busy. Unicorn knows which of its workers are busy and does not queue requests behind an individual worker, but rather behind the master, which delegates each request to the first available worker.
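To make that concrete, here is a rough sketch (not Unicorn's actual code) comparing the two queueing styles. It assumes all requests arrive at once, uses made-up service times in milliseconds, and the function names `per_worker_queues` and `shared_queue` are just illustrative.

```python
import heapq


def per_worker_queues(service_times, n_workers):
    """Pin each request to a worker up front (round-robin); it waits behind that worker."""
    free_at = [0] * n_workers  # time at which each worker becomes idle
    waits = []
    for i, service in enumerate(service_times):
        worker = i % n_workers
        waits.append(free_at[worker])  # time spent queued before service starts
        free_at[worker] += service
    return waits


def shared_queue(service_times, n_workers):
    """Keep one queue per dyno; each request goes to whichever worker frees up first."""
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    waits = []
    for service in service_times:
        start = heapq.heappop(free_at)  # earliest-available worker
        waits.append(start)
        heapq.heappush(free_at, start + service)
    return waits


# One slow request (3000 ms) followed by fifteen fast ones (100 ms), 4 workers.
requests = [3000] + [100] * 15
print("worst wait, per-worker queues:", max(per_worker_queues(requests, 4)), "ms")
print("worst wait, shared queue:     ", max(shared_queue(requests, 4)), "ms")
```

With per-worker queues, every request unlucky enough to land behind the slow one waits at least 3 seconds. With a shared queue, the slow request ties up one worker and the rest of the traffic flows through the other three.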
Precisely. As mentioned in the FAQ, putting the queueing logic closer to the process that will ultimately serve the request is a more horizontal-scaling-friendly way of tackling the queueing problem.
It works fantastically well for backends that can support 20+ concurrent connections, e.g. Node.js, Twisted, JVM threading, etc. It works less well the fewer connections each backend can handle, which is part of why we're working on larger dynos.
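A back-of-the-envelope way to see why per-backend concurrency matters, under some loud assumptions: the router picks a backend uniformly at random, total capacity is held fixed, and a snapshot of in-flight requests is spread independently across backends. This is a toy model, not a description of the real router, and `p_backend_full` is a hypothetical helper.

```python
from math import comb


def p_backend_full(total_slots, slots_per_backend, in_flight):
    """Chance that a given backend already holds at least as many requests as it has slots.

    Assumes each in-flight request was routed uniformly at random, so the count
    on one backend follows a Binomial(in_flight, 1 / n_backends) distribution.
    """
    n_backends = total_slots // slots_per_backend
    p = 1.0 / n_backends
    p_below_capacity = sum(
        comb(in_flight, k) * p**k * (1 - p) ** (in_flight - k)
        for k in range(slots_per_backend)
    )
    return 1.0 - p_below_capacity


# Same total capacity (100 slots) at 50% load, carved up three different ways.
for slots in (1, 4, 20):
    chance = p_backend_full(total_slots=100, slots_per_backend=slots, in_flight=50)
    print(f"{slots:>2} connections per backend: P(new request queues) = {chance:.4f}")
```

In this model, with one connection per backend a new request has roughly a 39% chance of landing on a busy backend even at half load, while with 20 connections per backend the chance is well under 1%.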