Also known as <17 requests per second... or a trickle of traffic. Hooray for using bigger numbers and a nonstandard unit to hide inadequacy!
Does Heroku use req/min throughout their service? I can't understand why they would, unless they also can't build the infrastructure to measure on a per-second basis.
> After extensive research and experimentation, we have yet to find either a theoretical model or a practical implementation that beats the simplicity and robustness of random routing to web backends that can support multiple concurrent connections.
Does this CTO think companies like Google and Amazon route their HTTP traffic randomly? No... he knows there are scaleable routing solutions and random routing isn't the best. So he cites "simplicity and robustness." Here, this means "we can't be bothered."
(I was in the bigger engineering team at Amazon that looked into this between between '04-'08.)
After having notable issues with Cisco's hardware load balancers, there was an internal project at Amazon aimed at developing scalable routing solutions.
After years of development effort, it turned out that the "better" solutions didn't work well in production, at least not for our workloads. So we went back to million $ hardware load balancers and random routing.
I don't know if things changed after I left, but I can tell you it wasn't an easy problem. So I completely buy the robustness and simplicity argument these guys are making.
Awesome info, thanks. This has been exactly our experience.
In theory, clever load distribution algorithms (of which one can imagine many variations) are very compelling. Maybe like object databases, or visual programming, or an algorithm that can detect when your program has hit an infinite loop. These are all compelling, but ultimately impractical or impossible in the real world.
Re: requests. RPM is the metric that New Relic reports, and it's the one most of our customers use when they talk about traffic. I try to speak in whatever terms are most familiar to our customers.
Re: I can't speak to Google and Amazon, and they aren't representative of the size of our customers anyway. We have discussed with many folks who run ops at many companies that are more on par with the size of our mid- and large-sized customers, and single global request queues are exceedingly rare.
The most common setup seems to be clusters of web backends (say, 4 clusters of 40 processes each) that each have their own request queue, but with random load balancing across those clusters. This is a reasonable happy medium between pure random and global request queue, and isn't too different from what you get running (say) 16 processes inside a 2X dyno and 8 web dynos.
Didn't intend to imply that. Number of dynos needed varies extremely widely, with the app's response time and language/framework being used as the main variables.
There are apps on Heroku that serve 30k–50k reqs/min on 10–20 dynos, typically written in something like Scala/Akka or Node.js and serving incredibly short (~30ms) response times with very little variation. But these are unusual.
The more common case of a website, written in non-threadsafe Rails, with median response times of ~200ms but 95th percentile at 3+ seconds, would probably use those same 10 dynos to do only a few thousand requests per minute. Whether or not you use a CDN and page caching also makes a big difference (see Urban Dictionary for an example that does it well).
But it really depends. We were trying to quantify when you should be worried. If you're running a blog that serves 600 rpm / 10 reqs/sec off of two dynos, you don't need to sweat it.
This comes back to visibility: knowing where the problem lies (especially when you're using a variety of add-on services or calling external APIs) and being able to understand what's happening, or what happened in retrospect.
Visibility is hard no matter where you run your app. But this is an area where Heroku can get a lot better, and we intend to.
Visibility is one part of the problem. 50k requests/min is only ~833/s. The reality is that a single dyno should be able to more than handle that sort of load, especially if it is a simple app. People are doing 10k connections on a single laptop, 833s should be a piece of cake. So, yes, visibility is a big issue here because you have no idea if you need 10, 11, 12 or 20 dyno's to serve 50k requests/min. You just guess and when you guess wrong, it ends up with cascading failure of H12's and other issues. Never mind that very few apps have a steady stream of traffic and most have big dips depending on the time of day and HN popularity... and now we are back to the auto scaling discussion.
Another key part of your statement is 'with very little variation'. The code pretty much can't be doing anything other than serving up some static content because as soon as anything that requires any sort of IO or cpu will instantly throw the system into H12 hell. Yes, a CDN will take load off your Heroku dyno's because god forbid that your dyno actually do anything itself. Except that you forget that not all apps are webapps and in my case, there is no reason to add a CDN when I'm just serving requests and responses to an iphone app.
The other part of the problem is being able to actually do something about it. I've tried anywhere between 50 and 300 dynos (yes we got that number increased). If we could just throw money at the problem that would be one thing, but nothing was able to resolve the H12's that we see and our paid support contract was no help either.
"If you're running a blog that serves 600 rpm / 10 reqs/sec off of two dynos, you don't need to sweat it."
Once again, we are back at the same conclusion... don't use Heroku if you want to run a production system.