We're aware of the random H12s problems. Some apps are affected pretty badly, others not at all, and we're not sure why yet. Sorry that you've had such a bad experience with this. We're continuing to investigate. If we're not able to find a solution in a timely fashion, I'll completely understand if you no longer want to use our product for this app.
Knowing how many dynos you need is definitely a problem. We have implemented autoscaling in the past... but it always sucked. It's hard to find a one-size-fits-all-solution. Rather than ship something sub-par we chose not to ship anything at all.
I understand a lot of people do well with autoscaling libraries and 3rd party add-ons. Would be curious to hear your experience with any of these.
I completely agree that it shouldn't take complaining in public to get a company to listen to its customers. That's was our biggest mistake in all of this, IMO — not listening.
I gladly accept your compliment that our git deploy remains the best on the market. :)
I'm sorry we haven't been able to serve you better. Let me know if you'd be willing to talk via skype sometime — even if you end up leaving the platform (or already have), I'd like to understand in more depth where we went wrong so we can do better in the future.
Your response only re-enforces my hard won opinion that Heroku should never be used for a production environment for any business that is trying to be successful and popular. Admitting that you have no idea why critical areas of your infrastructure is causing issues, while at the same time charging people an arm and a leg for services (we pay ~$4k/mo) feels like theft to me. I've built solutions for a large porn company that runs on significantly less infrastructure than what we are running on Heroku and handles 100x more traffic. Something is wrong here with the dyno/router model and maybe it is that you guys are just oversubscribed and not admitting to it in public.
Yes, autoscaling is hard. I have apps on Google AppEngine and see their issues as well. That said, at least they are trying. Maybe even take one of those 3rd party libraries and try to harden and adapt that and make it a real solution? I think the real problem though lies in the fact that there isn't any good metrics for what dynos are doing so there is no metric for when something is too busy or not. Yes, log-runtime-metrics puts out some numbers, but those numbers are meaningless when all I have is a slider to change the amount of money we are paying you.
I should qualify that git deploy compliment because there are issues with that as well. For example, why do you have to rebuild the npm modules from scratch each time? Why not have a directory full of pre-built modules for your dyno's that are just copied into my slug? This relatively simple change would increase the speed of deployments greatly. Never mind that deployments aren't reliable and fail randomly. At least it is easy to just try again.