But that's the service people thought they were getting and what they wanted.
If Heroku prices out the intelligent routing and says, "OK, you can have intelligent routing with your current backend stack, but it's going to cost you $25/mo for every 10 dynos, or you can switch your stack and use randomized routing for free," then they are empowering their customers to make the choice rather than dictating to them what they should do.
Aside from that, I am extremely sympathetic to Heroku's engineering point here --- it's obviously hard for HN to extract the engineering from the drama in this case! Randomized dispatch seems like an eminently sound engineering solution to the request routing problem, and the problems actually implementing it in production seem traceable almost entirely to††† the ways Rails managed to set back scalable web request dispatch by roughly a decade††††.
††† IT IS ALL LOVE WITH ME AND THIS POINT COMING UP HERE...
†††† ...it was probably worth it!
The solution is to combine the two approaches. You split the 100 nodes into 10 groups of 10, you route randomly to one of the groups, and then within a group you route intelligently. This works really well. The probability of one of the request queues filling up is astronomically small, because for a request queue to fill up, all 10 request queues in a group have to fill up simultaneously (and as we know from math, if an event occurs with probability p independently at each of n places, the chance that it occurs at all n simultaneously is p^n, which is exponentially small in n). Even if you route randomly to 50 groups of 2, that works a lot better than routing randomly to 100 groups of 1 (though obviously not as well as 10 groups of 10). There is a paper about this: http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20...
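To make the group idea concrete, here is a rough Python sketch. It is my own toy model, not anything Heroku runs: it assumes requests never finish, so each queue only grows, and it just reports the longest queue after 1,000 requests land on 100 nodes. The function name `max_queue_length` and all of its parameters are made up for illustration.

```python
import random


def max_queue_length(num_nodes, group_size, num_requests, seed=0):
    """Route each request to a random group, then to the shortest queue in it.

    Toy assumption: requests are never completed, so queues only grow.
    Returns the length of the longest queue at the end.
    """
    if num_nodes % group_size != 0:
        raise ValueError("group_size must divide num_nodes evenly")
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    queues = [0] * num_nodes
    num_groups = num_nodes // group_size
    for _ in range(num_requests):
        # Random routing step: pick one group uniformly at random.
        start = rng.randrange(num_groups) * group_size
        members = range(start, start + group_size)
        # Intelligent routing step: shortest queue within that group.
        target = min(members, key=queues.__getitem__)
        queues[target] += 1
    return max(queues)


if __name__ == "__main__":
    for group_size in (1, 2, 10, 100):
        print(group_size, max_queue_length(100, group_size, 1000))
```

A group size of 100 is the fully intelligent case, where every node ends up with exactly 10 requests. With smaller groups the longest queue should get worse as the group size drops toward 1, which is pure random routing.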
This is essentially what they are suggesting: run multiple concurrent processes on one dyno. Then the requests are routed randomly to a dyno, but within a dyno the requests are routed intelligently to the concurrent processes running on that dyno. There are two problems with this: (1) dynos have ridiculously low memory, so you may not be able to run many (if any) concurrent processes on a single dyno; (2) if you have contention for a shared resource on a dyno (e.g. the hard disk), you're back to the old situation. They are partially addressing point (1) by providing dynos with 2x the memory of a normal dyno, which given a Rails app's memory requirements is still very low (you probably have to look hard to find a dedicated server that doesn't have at least 20x as much memory).
They could be providing intelligent routing within groups of dynos (say groups of 10) and random routing to each group, but apparently they have decided that this is not worth the effort. Another thing is that apparently their routing is centralized for all their customers. Rapgenius did have what, 150 requests per second? Surely that can even be handled by a single intelligent router if they had a dedicated router per customer that's above a certain size (of course you still have to go to the groups of dynos model once a single customer grows beyond the size that a single intelligent router can handle).
There's a tradeoff between:
* a well-engineered request handler (a solved problem more than a decade ago) and
* an efficient development environment (arguably a nearly-unsolved problem before the Rails era)
And I feel like mostly the Heroku drama is a result of Rails developers not grokking which end of that tradeoff they've selected.
Of course Heroku is under no obligation to do anything, but its customers have to justify its cost and low performance relative to a dedicated server. And most applications run just fine on a single or at most a couple of dedicated servers, which means you don't have routing problems at all, whereas to get reasonable throughput on Heroku you have to get many Dynos, plus a database server. A database server with 64GB RAM costs $6400 per month. You can get a dedicated server with that much RAM for $100 per month. Heroku is supposed to be worth that premium because it is convenient to deploy on and scale. Because of these routing problems, which may require a lot of engineering effort in your application (e.g. making it use less memory so that you can run many concurrent request handlers on a single Dyno), it's not even clear that Heroku is more convenient.
I'm not sure there are such providers, and if there aren't, I think it's safe to point the finger towards Rails.
As a system for efficiently handling database-backed web requests, Rails is archaic. Not just because of its memory use requirements! It is simultaneously difficult to thread and difficult to run as asynchronous deferrable state machines.
These are problems that Schmidt and the ACE team wrote textbooks about more than 10 years ago.
(Again, Rails has a lot of compensating virtues; I like Rails.)
> I'm not sure there are such providers, and if there aren't, I think it's safe to point the finger towards Rails.
This is not sound logic. I described above two methods for solving the problem: (1) increase the memory per Dyno (see below: they're doing this, going from 512MB to 1GB per Dyno IIRC, which although still low will be a great improvement if that means that your app can now run 2 concurrent processes per Dyno instead of 1), or (2) do intelligent routing for small groups of Dynos. Do you understand the problem with random routing, and why either of these two would solve it? If not you might find the paper I linked to previously very interesting:
"To motivate this survey, we begin with a simple problem that demonstrates a powerful fundamental idea. Suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random. Then the maximum load, or the largest number of balls in any bin, is approximately log n / log log n with high probability. Now suppose instead that the balls are placed sequentially, and each ball is placed in the least loaded of d >= 2 bins chosen independently and uniformly at random. Azar, Broder, Karlin, and Upfal showed that in this case, the maximum load is log log n / log d + Θ(1) with high probability [ABKU99].
The important implication of this result is that even a small amount of choice can lead to drastically different results in load balancing. Indeed, having just two random choices (i.e., d = 2) yields a large reduction in the maximum load over having one choice, while each additional choice beyond two decreases the maximum load by just a constant factor."
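The quoted result is easy to poke at with a quick simulation. This is a minimal sketch under the paper's stated setup (n balls, n bins, each ball goes to the least loaded of d bins picked independently and uniformly at random). `max_load` and its parameters are names I made up, and a single seeded run only illustrates the effect; it doesn't prove the bound.

```python
import random


def max_load(num_bins, num_balls, d, seed=0):
    """Place each ball in the least loaded of d randomly chosen bins.

    With d == 1 this is plain random placement. Returns the largest
    number of balls that ended up in any single bin.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    bins = [0] * num_bins
    for _ in range(num_balls):
        # d independent, uniform choices (the same bin may come up twice).
        candidates = [rng.randrange(num_bins) for _ in range(d)]
        target = min(candidates, key=bins.__getitem__)
        bins[target] += 1
    return max(bins)


if __name__ == "__main__":
    n = 10_000
    for d in (1, 2, 3):
        print("d =", d, "max load =", max_load(n, n, d))
```

The thing to look for is the drop in maximum load from d = 1 to d = 2, compared with the much smaller change from d = 2 to d = 3.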
Most things are inferior to other substitutable things! :)
And here we've re-invented the airport passport checking queue - everybody hops onto the end of a big long single queue, then near the front you get to choose the shortest of the dozen or two individual counter queues.
I wonder what the hybrid intelligent/random queue analogues of the in-queue intelligence gathering and decision making you can do at the airport might be? "Hmmm, a family with small children, I'll avoid their counter queue even if it's shortest", "a group of experienced-looking business travellers, they'll probably blow through the paperwork quickly, I'll queue behind them". I wonder if it's possible/profitable to characterize requests in the queue in those kinds of ways?
The difference, of course, is that ELBs are single-tenant. So a big app might only end up with half a dozen nodes, instead of the much larger number in Heroku's router fleet.
Offering some kind of single-tenant router is one possibility we've considered. Partitioning the router fleet, homing... all are ideas we've experimented with and continue to explore. If one of these produces conclusive evidence that it provides a better product for our customers and in keeping with the Heroku approach to product design, obviously we'll do it.
My hypothesis is that tenant-specific intelligent load balancers would be plausible; I would guess that you would never need more than a handful of HAProxy or nginx-type balancers to front even a large application. Your main challenge would then be routing requests to the right load balancer cluster. If you had your own hardware, LVS could handle that (I believe that Wikipedia in 2010 ran all page text requests through a single LVS in each datacentre), but I'm not sure what you do on EC2.
However, "hypothesis" is just a fancy way of saying "guess", which is why your findings from actual experiments would be so interesting.