Hm. Until now I thought dyno-presence was your issue, but now I realize you're talking about the actual "leastconn" part, i.e. the requests queueing up on the dynos themselves?
If that's what you actually mean then I'd ask: Can't the dynos reject requests when they're busy ("back pressure")?
AFAIK that's the traditional solution to distributing the "leastconn" constraint.
In practice we've implemented this either with an iptables connection-limit rule (the connlimit match; reject if count >= worker_threads), or by having the server immediately close the connection when all its workers are busy.
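To make the server-side variant concrete, here is a rough sketch in Python of what I mean by rejecting instead of queueing. The names (AdmissionGate, try_enter, handle) are made up for illustration and this is not our production code; a real server would close the socket or send a 503 where the sketch returns None.

```python
import threading


class AdmissionGate:
    """Admit at most `max_in_flight` requests; refuse the rest immediately."""

    def __init__(self, max_in_flight: int) -> None:
        # One slot per worker thread (hypothetical sizing, mirrors worker_threads).
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_enter(self) -> bool:
        # Non-blocking acquire: a busy server answers "no" right away
        # instead of letting the request wait in a queue.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()


def handle(gate: AdmissionGate, work):
    """Run `work()` if a slot is free; return None to signal a rejection."""
    if not gate.try_enter():
        return None  # a real server would close the connection here
    try:
        return work()
    finally:
        gate.leave()
```

The point of the non-blocking acquire is that the rejection is cheap and immediate, so the load balancer learns about the overload within one round trip.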
What happens is that when the load balancer hits an overloaded dyno, the connection is rejected and the load balancer immediately retries the request on a different backend.
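The load balancer side of that can be sketched roughly like this. Again the names (Busy, dispatch) are hypothetical and backends are modelled as plain callables, so take it as an illustration of the retry logic rather than how any particular load balancer implements it.

```python
class Busy(Exception):
    """Raised by a backend that refuses a request because it is at capacity."""


def dispatch(request, backends):
    """Try each backend in order; return (result, number_of_rejections)."""
    rejections = 0
    for backend in backends:
        try:
            return backend(request), rejections
        except Busy:
            # The rejection cost one extra round trip; move on to the next backend.
            rejections += 1
    raise Busy("all backends rejected the request")
```

The rejection count is returned only to make the cost visible: each rejected attempt is one extra round trip for that request.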
Consequently the affected request incurs one additional round trip per overloaded dyno it hits, but that is normally much less of an issue than queueing behind other requests on a busy backend (~20ms retry vs. potentially a multi-second wait).