We've run many experiments over the past month to try other approaches to routing, including recreating the exact layout of the Bamboo routing layer (which would never scale to where we are today, but just as a point of reference). None have produced results that are anywhere near as good as using a concurrent backend. (I'd love to publish some of these results so that you don't have to take my word for it.)
That said, we're not done. There are changes to the router behavior that could have an additive effect with apps running Unicorn/Puma/etc, and we'll continue to look into those. But concurrent backends are a solution that is ready and fully-baked today.
We also adopted Unicorn pretty early on but still suffered issues as dynos simply ran out of memory. In fact, with some apps we have seen improvements in stability (though still far from acceptable) by un-adopting this method. As a consequence, the issues raised by scottshea below concern me. What will the charge be for these as well?
To be honest, it's the fumbling around in the dark that has annoyed me. I am with you 100% on your manifesto and your points about the type of service you provide. However, the time we have spent on this (starting before you came clean about the issues) and the time spent on other increasingly suspicious advice to "up dynos" or spend time "optimising your app" sours this slightly. I accept that the "magic black box" comes with its compromises and requires understanding at our end, but it also means needing to be far more communicative and honest about it at yours, which I can see you are now putting right.
I for one think the premise of Heroku is a great one, and you have succeeded for us in many of the things you set out to achieve. This whole situation has been a real shame. I'm sure this must have been a pretty shoddy time for you guys, and I hope you come out the better for it. The quicker the better, to be honest, so you can focus on the new features we'd like to see.
Thanks for your support. Indeed, communication and transparency about how the product works, and how that affects your app, are two things we'd like to get better at.
Regarding your app: indeed, Unicorn is a huge improvement, but far from the end of the story. "Performance" is like "security" or "uptime" — it's not a one-time feature, something you check off a list and move on. It's something that requires constant work, and every time you fix one problem or bottleneck that just leads you to the next one.
Over time, though, your vigilance pays off with a service that its users deem to be fast or secure or have good uptime. Yet there's no such thing as a finish line on these.
Bringing it back to details. Kazuki from Treasure Data made this Unicorn worker killer that might help you: https://github.com/kzk/unicorn-worker-killer If you're still not happy with your app's performance, give me a shout at adam at heroku dot com and we'll see if we can help.
Please publish these results. I think a chart showing that Unicorn + Random routing is better than Thin + Intelligent routing would go a long way to ending this whole thing. That's assuming that you can make deploying a Unicorn app as easy as it was with Thin ('git push heroku').
Having been part of their efforts to test 2X and 4X Dynos, and having used Unicorn long before the RapGenius issue, I can tell you that even with the added memory and Unicorn there are still issues. We still see periods where queueing is above 500ms. The additional Dyno capacity spreads the chance of queueing problems over a larger pool, but there is still the possibility of one dyno/Unicorn worker combo getting too much to handle. We use Unicorn Worker Killer to help in that case.
We might. But what does this actually get us? It helps clear Heroku's name, but it doesn't help our customers at all. I'd prefer to spend our time and energy making customers' lives better.
Given the choice between continuing the theoretical debate over routing algorithms vs working on real customer problems (like the H12 visibility problem mentioned elsewhere in this thread), I much prefer the latter.
I respect that mindset; I just don't think it would hurt. Maybe a middle ground would be a full-scale tutorial on how to switch from Thin on Bamboo/Cedar to Unicorn on Cedar for Rails users. It's a non-trivial process and I know I'd like some help with it. In that same tutorial/article you could include the benchmarks you ran as motivation/justification.
Unless the dyno signals to the router that it is busy, isn't this just postponing the problem? Per dyno, Unicorn can handle more requests, but requests will still get queued at the dyno level if one of the requests is a slow one (say 2-3 seconds).
If only 1 request is slow and you have 7-8 Unicorn workers, only one of them will stay busy. Unicorn knows which of its workers are busy and does not queue jobs behind an individual worker, but rather behind the master, which delegates each request to the first available worker.
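To make that difference concrete, here is a rough simulation sketch of a single dyno, comparing one queue per worker with a single queue shared by all workers. Everything in it is an assumption for illustration rather than measured data: the 50 ms / 2.5 s service times, the 2% share of slow requests, the 70% load, and the function name mean_wait are all made up. It models "first available worker" as a plain FIFO queue feeding whichever worker frees up first, which, as far as I understand it, Unicorn gets by having its workers accept from one shared listening socket.

```python
import random

# Hypothetical workload, not measured data: 98% of requests take 50 ms,
# 2% take 2.5 s (the "slow request" case discussed above).
FAST, SLOW, SLOW_FRACTION = 0.05, 2.5, 0.02
MEAN_SERVICE = (1 - SLOW_FRACTION) * FAST + SLOW_FRACTION * SLOW


def mean_wait(num_workers, shared_queue, load=0.7, num_requests=20000, seed=1):
    """Mean seconds a request waits before a worker starts on it."""
    traffic = random.Random(seed)     # arrival gaps and service times
    picker = random.Random(seed + 1)  # random worker choice (per-worker case)
    # free_at[i] is the time at which worker i has cleared its backlog.
    free_at = [0.0] * num_workers
    now = 0.0
    total_wait = 0.0
    for _ in range(num_requests):
        # Poisson arrivals, scaled so the dyno runs at roughly `load` utilisation.
        now += traffic.expovariate(load * num_workers / MEAN_SERVICE)
        service = SLOW if traffic.random() < SLOW_FRACTION else FAST
        if shared_queue:
            # Shared FIFO queue: the request goes to whichever worker frees up first.
            worker = min(range(num_workers), key=free_at.__getitem__)
        else:
            # Per-worker queues: the request is pinned to a randomly chosen worker.
            worker = picker.randrange(num_workers)
        start = max(now, free_at[worker])
        total_wait += start - now
        free_at[worker] = start + service
    return total_wait / num_requests


if __name__ == "__main__":
    for shared in (False, True):
        label = "shared queue" if shared else "per-worker queues"
        print(f"{label}: mean wait {mean_wait(8, shared):.3f}s")
```

Both runs see the same stream of requests, so any gap in mean wait comes from the queueing discipline alone. With per-worker queues, fast requests can get stuck behind a slow one while other workers sit idle; with the shared queue that only happens once every worker is busy.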
Precisely. As mentioned in the FAQ, putting the queueing logic closer to the process which is ultimately going to serve the request is a more horizontal-scale friendly way of tackling the queueing problem.
It works fantastically well for backends that can support 20+ concurrent connections, e.g. Node.js, Twisted, JVM threading, etc. It works less well the fewer concurrent connections each backend can handle, which is part of why we're working on larger dynos.
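The effect of per-backend concurrency under random routing can be sketched with a small simulation. This is a toy model under stated assumptions, not a description of Heroku's router: total capacity is fixed at 40 request slots, the router picks a backend uniformly at random, each backend serves its own queue with however many slots it has, and the workload numbers (50 ms fast requests, 2% slow requests at 2.5 s, 70% load) are invented for illustration. The function name mean_wait_random_routing is hypothetical too.

```python
import random

# Hypothetical workload, not measured data.
FAST, SLOW, SLOW_FRACTION = 0.05, 2.5, 0.02
MEAN_SERVICE = (1 - SLOW_FRACTION) * FAST + SLOW_FRACTION * SLOW


def mean_wait_random_routing(total_slots, slots_per_backend, load=0.7,
                             num_requests=40000, seed=7):
    """Mean queueing delay when a router picks backends uniformly at random."""
    if total_slots % slots_per_backend:
        raise ValueError("total_slots must be a multiple of slots_per_backend")
    num_backends = total_slots // slots_per_backend
    traffic = random.Random(seed)     # arrival gaps and service times
    router = random.Random(seed + 1)  # the router's random backend choice
    # backends[b][s] is the time at which slot s of backend b becomes free.
    backends = [[0.0] * slots_per_backend for _ in range(num_backends)]
    now = 0.0
    total_wait = 0.0
    for _ in range(num_requests):
        # Same overall arrival rate regardless of how the slots are grouped.
        now += traffic.expovariate(load * total_slots / MEAN_SERVICE)
        service = SLOW if traffic.random() < SLOW_FRACTION else FAST
        slots = backends[router.randrange(num_backends)]
        # Inside the chosen backend, the request takes the first slot to free up.
        i = min(range(len(slots)), key=slots.__getitem__)
        start = max(now, slots[i])
        total_wait += start - now
        slots[i] = start + service
    return total_wait / num_requests


if __name__ == "__main__":
    for per_backend in (1, 4, 20):
        wait = mean_wait_random_routing(40, per_backend)
        print(f"{per_backend:>2} slots per backend: mean wait {wait:.3f}s")
```

The total capacity and the request stream are identical in every run; only the grouping of slots into backends changes. Treating a "concurrent connection" as an exclusive slot is a simplification (an evented server such as Node.js interleaves work rather than reserving a slot), so the numbers are only meant to show the direction of the trend.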