This is how it works. Dynos register their presence into a dyno manager which publishes the results into a feed, and then all the routing nodes subscribe to that feed.
But dyno presence is not the rapidly-changing data which is subject to CAP constraints; it's dyno activity, which changes every few milliseconds (e.g. whenever a request begins or ends). Any implementation that tracks that data will be subject to CAP, and this is where you make your choice on tradeoffs.
> why would that mean "lower availability" or "higher latency"?
I'll direct you back to the same resources we've referenced before:
> Did you look into zookeeper?
This is the best question ever. Not only did we look into it, we actually invested several man-years of engineering into building our own Zookeeper-like datastore:
Zookeeper and Doozerd make almost the opposite trade-off as what's needed in the router: they are both really slow, in exchange for high availability and perfect consistency. Useful for many things but not tracking fast-changing data like web requests.
If that's what you actually mean then I'd ask: Can't the dynos reject requests when they're busy ("back pressure")?
AFAIK that's the traditional solution to distributing the "leastconn" constraint.
In practice we've implemented this either with the iptables maxconn rule (reject if count >= worker_threads), or by having the server immediately close the connection.
What happens is that when a loadbalancer hits an overloaded dyno the connection is rejected and it immediately retries the request on a different backend.
Consequently the affected request incurs an additional roundtrip per overloaded dyno, but that is normally much less of an issue than queueing up requests on a busy backend (~20ms retry vs potentially a multi-second wait).
PS: Do you seriously consider Zookeeper "really slow"?! http://zookeeper.apache.org/doc/r3.1.2/zookeeperOver.html#Pe...