"The system could easily be built so that it by default only aggregates those services where the configuration indicates they can handle a concurrency below a certain level, and does random balancing of everything else."
I've built fairly large haproxy based infrastructures, thank you very much. Doing this is not particularly challenging.
Actually what I'd probably do for a setup like this would be to balance by the Host: header, and simply have the second layer be a suitable set of haproxy instances balancing each by least connections.
Haproxy doesn't support dynamic configurations as far as I know, which is a serious problem if you're letting lots of people add/change domains and scale backends up/down dynamically. A Heroku haproxy would probably need to be restarted multiple times a second due to config changes. Nginx can do dynamic backends with lua & redis, but it can't use the built-in upstream backend balancing/failover logic if you do.
While it doesn't support dynamic configurations, it does support hot reconfiguration (the new daemon signals the old processes to gracefully finish up and shut down), and reconfigures very rapidly. You still don't want to restart it multiple times a second, but you don't need to:
A two layer approach largely prevents this from being a problem. You can afford total overkill in terms of the number of haproxies as they're so lightweight - running a few hundred individual haproxy instances with separate configs even on a single box is no big deal.
The primaries would rarely need to change configs. You can route sets customers to specific sets of second layer backends with ACL's on short substrings of the hostname (e.g. two letter combinations), so that you know which set of backends each hostname you handle maps to, and then further balance on the full host header within that set to enable the second layer to balance on least-connections to get the desired effect.
That lets you "just" rewrite the configs and hot-reconfigure the subset of second layer proxies handling customers that falls in the same set on modifications. If your customer set is large enough, you "just" break out the frontend into a larger number of backends.
Frankly, part of the beauty of haproxy is that it is so light that you could probably afford a third layer - a static primary layer grouping customers into buckets, a dynamic second layer routing individual hostnames (requiring reconfiguration when adding/removing customers in that bucket) to a third layer of individual customer-specific haproxies.
So while you would restart some haproxy multiple times a second, the restarts could trivially be spread out over a large pool of individual instances.
Alternatively, "throwing together" a second or third layer using iptables either directly or via keepalived - which does let you do dynamic reconfiguration trivially, and also supportes least-connections load balancing - is also fairly easy.
But my point was not to advocate this as the best solution for somewhere like Heroku - it doesn't take a very large setup before a custom solution starts to pay off.
My point was merely that even with an off the shelf solution like haproxy, throwing together a workable solution that beats random balancing is not all that hard - there's a large number of viable solutions -, so there really is no excuse not to for someone building a PaaS.