
I hear you. Heroku's value proposition is that we abstract away infrastructure tasks and let you focus on your app. Keeping you from needing to deal with load balancers is one part of that. If you're worried about how routing works then you're in some way dealing with load balancing.

However, if someone chose us purely because of a routing algorithm, we probably weren't a great fit for them to begin with. We're not selling algorithms, we're selling abstraction, convenience, elasticity, and productivity.

I do see that part of the reason this topic has been very emotional (for us inside Heroku as well as our customers and the community) is that the Heroku router has historically been seen as a sort of magic black box. This matter required peeking inside that black box, which I think has created a sense of some of that magic being dispelled.

I doubt anyone chose Heroku solely because of the routing algorithm, but the sales point of intelligent routing was certainly an appealing one.

Developers don't want to pay for abstraction just for abstraction's sake; they want to pay for abstraction of difficult things. Intelligent routing is one of those difficult things. Random routing is easy, which is of course why it's also more reliable, but also why you're seeing people feel like they didn't get what they paid for.

I should be clear: this doesn't affect me personally, but I am totally sympathetic with the customers who are bent out of shape about this. I still see a divide between your response and why people are upset, and I'm trying to help you bridge that divide.

Well said.

It's interesting — very few customers are actually bent out of shape about this. (A few are, for sure.) It's more non-customers who are watching from the sidelines that seem to be upset. I do want to try to explain ourselves to the community in general, and that's what this post was for. But my first loyalty is to serving our current customers well.

What about potential customers? I've been evaluating your platform and have just been completely frustrated, with 3 days wasted trying to solve these performance issues.

I'm using gunicorn with Python. If I use the sync worker, the request queue easily hits 10 seconds and nothing works; if I switch to gevent or eventlet, New Relic tells me that Redis spends the same amount of time stuck getting a value. This is the same code that works just fine with eventlet and scales well on my current provider.

To add insult to injury, adding dynos actually degrades performance.

That sucks. It may or may not be related to how the router works, but it's definitely about performance and visibility, which is what this is all about.

Can you email me at adam at heroku dot com and I'll connect you with one of our Python experts? I can't promise we'll solve it, but I'd like to take a look.

I'd be really interested to see some public information resulting from debugging Python apps. We're holding pretty steady, but see a fairly constant stream of timeouts due, apparently, to variance in response times. To be sure, we're working on that. But, in the meantime, our experiments with gevent and Heroku have been less than inspiring.

I've connected Nathaniel (poster above) with one of our Python folks. Looking forward to seeing what they discover.

Would you be willing to pair up with one of our developers on your app's performance? If so email me (adam at heroku dot com).

I'm an existing customer using python with gunicorn. I'd be very keen to see any learnings about an optimal setup.

Fwiw, I've found the addon / db connection limits to be the primary blocker when load testing so far.

"It's interesting — very few customers are actually bent out of shape about this"

Seems to me you don't get it. Sure there are some very vocal non-customers but you also have a lot of potential customers and users (spinning up free instances) evaluating your product and hoping for a better product. I agree that your true value is the abstraction you provide. Some of these potential customers want to ensure Heroku is as good an abstraction as promised to justify the cost and commitment.

Fair enough. I think the best thing we can do for those potential future customers is be really clear about what the product does and give them good tools and guidance to evaluate whether it fits their needs.

I'd argue that we dropped the ball on that before (on web performance, at least), and are rectifying it now.

If it's only a few customers that are bent out of shape, how come you haven't quickly offered them refunds?

We did.

When did this happen? As of March 2, Rap Genius was still seeking to get money refunded.


Just in the last few weeks. I won't disclose details for any particular customer, but I assume that Tom @ RG will make a public statement about it at some point.

You sound like a politician talking to someone of the opposite party, in that you say "I hear you", but then completely fail to address anyone's concerns. Selling a "magic black box" that guarantees certain properties, changes them, then lies about having changed them presents a liability for users who want to do serious work.

A major selling point of Heroku is that scaling dynos wouldn't be a risk. This guarantee is now gone and it's not coming back soon even if routing behavior is reverted, because users prefer good communication and trust with their providers. The responses of Heroku are blithe non-acknowledgement acknowledgements of this problem.

This is really unfair. This comment:

>A lot of Heroku's apparent value came from the intelligent routing feature. Everybody knew that it was harder to implement than random routing, that's why they were willing to pay Heroku for it.

is being addressed by Adam in this comment:

>Heroku's value proposition is that we abstract away infrastructure tasks and let you focus on your app. Keeping you from needing to deal with load balancers is one part of that. If you're worried about how routing works then you're in some way dealing with load balancing.

I think Adam is getting at a really fair point here, which is that nobody really minds which particular algorithm is used. If A-Company is using "intelligent routing" and B-Company uses "random routing," but B-Company has better performance and shorter queue times, who are you going to choose? You're going to choose B-Company.

At the end of the day, "intelligent routing" is really nothing more than a feather in your cap. People care about performance. That's what started this whole thing - lousy performance. Better performance is what makes it go away, not "intelligent routing."

Intelligent routing and random routing have different Big O properties. For someone familiar with routing, or someone who's looked into the algorithmic properties, "intelligent routing" gives one high-level picture of what the performance will be like (good with sufficient capacity), whereas random routing gives a different one (deep queues at load factors where you wouldn't expect deep queues).

This is why it was good marketing for Heroku to advertise intelligent routing, instead of just saying 'oh it's a black box, trust us'. You need to know, at the very least, the asymptotic performance behavior of the black box.

And that's why the change had consequences. In particular, RapGenius designed their software to fit intelligent routing. For their design, the number of dynos needed to guarantee good near-worst-case performance increased with the square of the load, and my back-of-the-envelope math suggests the average case increases as O(n^1.5).

The original RapGenius post documents them here: http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics

The alleged fix, "switch to a concurrent back-end", is hardly trivial and doesn't solve the underlying problem of maldistribution and underutilization. Maybe intelligent routing doesn't scale but 1) there are efficient non-deterministic algorithms that have the desired properties and 2) it appears the old "doesn't scale" algorithm actually worked better at scale, at least for RapGenius.
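To make the queueing argument above concrete, here is a rough simulation sketch, not a model of Heroku's actual router. Everything in it is an assumption chosen for illustration: 10 single-threaded dynos, Poisson arrivals at 70% load, and a workload where 95% of requests take 50 ms and 5% take 2 s. The "least_busy" policy is only a stand-in for "intelligent routing": it sends each request to whichever dyno will be free soonest, which behaves like one shared queue.

```python
import random


def simulate(policy, n_dynos=10, n_requests=20000, load=0.7, seed=1):
    """Return the mean time a request waits before a dyno starts work on it.

    All parameters are illustrative assumptions, not measured Heroku values.
    """
    workload = random.Random(seed)    # drives arrivals and service times
    router = random.Random(seed + 1)  # drives random routing choices only

    # Hypothetical workload: mostly fast requests, a few slow ones.
    fast, slow, p_slow = 0.05, 2.0, 0.05
    mean_service = (1 - p_slow) * fast + p_slow * slow
    arrival_rate = load * n_dynos / mean_service

    free_at = [0.0] * n_dynos  # time at which each dyno's backlog clears
    now = 0.0
    total_wait = 0.0

    for _ in range(n_requests):
        now += workload.expovariate(arrival_rate)
        service = slow if workload.random() < p_slow else fast

        if policy == "random":
            dyno = router.randrange(n_dynos)
        elif policy == "least_busy":
            dyno = min(range(n_dynos), key=lambda i: free_at[i])
        else:
            raise ValueError("unknown policy: " + policy)

        # Each dyno handles one request at a time, in arrival order.
        start = max(now, free_at[dyno])
        total_wait += start - now
        free_at[dyno] = start + service

    return total_wait / n_requests


if __name__ == "__main__":
    for policy in ("random", "least_busy"):
        print(policy, round(simulate(policy), 3), "seconds mean wait")
```

Under these assumptions, random routing should show a much longer mean wait at the same load, because a fast request can land behind a slow one while other dynos sit idle. Changing `load`, `n_dynos`, or the slow-request share shifts the size of the gap.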

>If A-Company is using "intelligent routing" and B-Company uses "random routing," but B-Company has better performance and shorter queue times, who are you going to choose?

That's not the point.

As I think of it, "performance" is an observation of a specific case while intelligent/random can be used to predict performance across all cases.

Harsh. I'm going to ignore the more inflammatory parts of this (feel free to restate if you want me to engage in discussion), but one bit did grab my attention:

"A major selling point of Heroku is that scaling wouldn't be a risk"

This is interesting, especially the word "risk." Can you expand on this?

Very small customer here. I don't know much about the abstraction you provide us and I don't want to know as long as things go well.

From my point of view, the routing is "random" and thus kind of unpredictable. If scaling becomes more of an issue for my business, the last thing I want is to have random scaling issues that I cannot do anything about because the load balancer is queuing requests randomly to my dynos.

I want my business to be predictable and if I'm not able to have it I'm going to pack my stuff and move somewhere else.

For now, I'm happy with you except for your damn customer service. They take way too long to answer our questions!

Cheers! :)

Absolutely right, I totally agree. Random scaling issues that you can neither see nor control are exactly the opposite of what we want to be providing.

Can you email me (adam at heroku dot com) some links to your past support tickets? I'd like to investigate.

Thanks for running your app with us. Naturally, I expect you to move elsewhere if we can't provide you the service you need. That's the beauty of contract-free SaaS and writing your app with open standards: we have to do a good job serving you, or you'll leave. I wouldn't want it any other way.


Very clarifying, thanks.

Fire-fighting during the scaling phase is a problem that every fast-growing software-as-a-service business will probably have to face. I think Heroku makes it easier, perhaps way easier; but I hope our marketing materials etc have not implied a scaling panacea. Such a thing doesn't exist and is most likely impossible, in my view.

My company has been building apps for startups for years, and I can confirm that Heroku is consistently perceived as an "I never have to worry about scaling" solution.

Very useful observation. I'd love to figure out how we can better communicate that while we aim to make scaling fast and easy, "you never have to worry about scaling" is much too absolute.

Your marketing materials clearly use phrases like "forget about servers", "easily scale to millions of users", "scale effortlessly" and so forth. You're making it very easy to misunderstand you.
