
Running Rails on Heroku Update - neya
https://blog.heroku.com/archives/2013/3/13/running_rails_on_heroku_update?
======
RyanZAG
Still not fixing the core issue - intelligent routing.

I've played around with various routing methods before, and intelligent
routing does seem to be incredibly difficult at scale. I don't think anybody
gets it right. Are there any examples of intelligent routing at scale that
work properly?

If not, it might be a good place to start an OSS project that would benefit
everyone. Intelligent routing should be far from impossible; all you need is
very basic communication between servers and routers, possibly using some form
of minimal broadcasting or multicast.

A simple solution could be having each server send 'don't send me more
requests' and 'I'm ready for more requests' signals.
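That signalling scheme could be sketched very roughly like this (the `Router` class and server names are hypothetical, and Python is used purely for illustration):

```python
import random

class Router:
    """Routes requests only to servers that have advertised readiness."""

    def __init__(self, servers):
        self.ready = set(servers)  # servers currently accepting work

    def mark_busy(self, server):
        # server broadcast: "don't send me more requests"
        self.ready.discard(server)

    def mark_ready(self, server):
        # server broadcast: "I'm ready for more requests"
        self.ready.add(server)

    def route(self):
        """Pick any server that claims to be ready."""
        if not self.ready:
            raise RuntimeError("all servers busy: queue or shed load")
        return random.choice(tuple(self.ready))

router = Router(["web.1", "web.2", "web.3"])
router.mark_busy("web.1")
assert router.route() in {"web.2", "web.3"}
```

The hard part at scale, of course, is keeping every router's view of those signals fresh, which is exactly the statefulness problem discussed below.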

~~~
aristus
I have working knowledge of a couple of large routing systems (Yahoo,
Facebook) and built a few smaller ones. In general, _state is the enemy of
scale_. You want large systems to be simpler and less stateful, not more
stateful and "intelligent".

One good notion is to embed server statistics into the response, like in an
HTTP header. That way the LB can be aware of queue size, queue wait,
utilization, etc. But that only works if the knowledge of server health is
global to all load balancers, or if some LBs only talk to some servers.

If your requests are more or less the same "size", e.g. they take roughly the
same amount of CPU / wall-clock time, all of this gets vastly easier.
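For illustration, the header-embedding idea might look something like this (the `X-Queue-Depth` / `X-Queue-Wait-Ms` header names are made up for this sketch, not anything Yahoo, Facebook, or Heroku actually uses):

```python
def record_backend_stats(stats, backend, response_headers):
    """Update the LB's view of backend health from headers the
    backend embedded in its last response."""
    stats[backend] = {
        "queue_depth": int(response_headers.get("X-Queue-Depth", 0)),
        "queue_wait_ms": float(response_headers.get("X-Queue-Wait-Ms", 0.0)),
    }

def pick_backend(stats):
    """Prefer the backend that last reported the shortest queue."""
    return min(stats, key=lambda b: stats[b]["queue_depth"])

stats = {}
record_backend_stats(stats, "web.1", {"X-Queue-Depth": "5"})
record_backend_stats(stats, "web.2", {"X-Queue-Depth": "1"})
assert pick_backend(stats) == "web.2"
```

Note the caveat from the comment above: this only helps if each LB actually sees responses from (or shares stats about) the backends it routes to, otherwise its view goes stale.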

~~~
aphyr
For single-threaded servers, request latency dominates--it's hard to get
useful state information out of a dyno, because as soon as it's returned a
response, its state has changed. You can add a dedicated stats endpoint to
each server, but that requires cooperation from the customer's application.
There are... alternate approaches, however.

~~~
Xorlev
IMO, adding feedback mechanisms (e.g. the LB polling the app or a server
agent) is among the best ways to create smart, fast LBs without keeping a lot
of expensive state. Calculating busyness is another question entirely, but
it's also a much easier problem to solve than stateful routers.

------
jackowayed
There's already a lot of negativity on this thread, suggesting that they only
care about this because they're facing bad PR and lawsuits.

I'm inclined to give them the benefit of the doubt. They made a reasonable
engineering and product tradeoff to increase their system's scalability,
decrease its latency, and allow new kinds of applications that are able to
serve more than one request at once (e.g. node.js servers holding onto tons of
idle websockets).

Given an increasingly large infrastructure (which makes stronger queueing much
harder if you don't want to add a lot of latency) and strong requests from
customers to allow other languages that don't fit into the "one fast request
per process" paradigm, I can absolutely see making that tradeoff. It makes the
infrastructure simpler (and thus more robust), cuts some latency, and makes
the many loud customers who wanted to run things like node.js happy.

Doing so, I would have been aware that in some cases this would lead to
suboptimal routing and increased latency, but I never would have expected it
to have as devastating an effect on an application's overall performance as
Rap Genius saw.

So yes, they do care a lot more about this issue now that there has been a big
stir about it, but I think it's reasonable to assume that this may be more
because they've realized it can have a huge impact on overall latency rather
than a moderate one, and that it's come to their attention that some of their
messaging and docs hadn't been updated reflecting the routing change.

But Heroku is usually a great company, it is not in their interest to degrade
their quality of service, and there are very legitimate reasons to move away
from intelligent routing (namely, that it doesn't really make sense for
applications that can handle multiple requests concurrently, and that it is
extremely difficult to scale). So I'm inclined to believe this is more a
mistake than a malicious lack of concern for their customers' well-being that
they are only addressing due to vocal outside pressure.

------
minsight
The title makes it seem that they may be solving the underlying problem.
Unfortunately, they'd rather update their users about how the routing mesh
behaves (spoiler alert: poorly) than update their routing mesh to work as they
had gone to great lengths to falsely claim.

------
lifeisstillgood
Just a thought that I would love some feedback on: I would assume Heroku would
have tried a hierarchy of routers when the peer-grid style became overwhelmed
- something of a cross between BGP and DNS - where the entry router simply
passes the request down the hierarchy to whatever is advertising that app /
group of apps.

(Something like my app name is heroku.ruby.free.cheapcustomers.myapp )

There will be lots of header handling, but it seems simpler.

------
smnrchrds
Another case of re-submission. Here is the original one:
<https://news.ycombinator.com/item?id=5371143>

The only difference is the URL here has a trailing question mark.

Why does this keep happening?

------
mamcx
This also is a problem for python and others stacks, or only for ruby?

~~~
habosa
It's a problem for anyone using a web framework/server combo that doesn't
allow for concurrent requests on a single dyno. The most common such
combination is Ruby on Rails with the Thin webserver, which is (I believe) the
most commonly deployed app configuration on Heroku by a mile.

~~~
praptak
> It's a problem for anyone using a web framework/server combo that doesn't
> allow for concurrent requests on a single dyno.

Or one that allows only k concurrent requests with a routing mesh that
schedules m>k per server (dyno) or at least does that often enough.

Even with unlimited concurrent requests per server, similar problems may arise
if the performance drops sharply with the number of concurrent requests being
served (e.g. due to swapping.)
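The m > k effect shows up in even a quick balls-in-bins simulation of random (non-intelligent) routing; this is a purely illustrative Python sketch, not Heroku's actual algorithm:

```python
import random

def max_queue_depth(n_requests, n_dynos, seed=0):
    """Randomly assign each request to a dyno (random routing) and
    return the depth of the deepest per-dyno queue."""
    rng = random.Random(seed)
    depth = [0] * n_dynos
    for _ in range(n_requests):
        depth[rng.randrange(n_dynos)] += 1
    return max(depth)

# With as many simultaneous requests as dynos, random routing still
# piles several requests onto some dyno while others sit idle: the
# m > k case for a single-concurrency (k = 1) server.
assert max_queue_depth(100, 100) > 1
```

For n requests over n dynos, the deepest queue grows roughly like log n / log log n, so even a "fully provisioned" fleet sees multi-request pileups under random routing.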

------
kmfrk

        Fulfilling these commitments is Heroku’s number one priority

I think this is the straw that breaks the camel's back for me.

I don't doubt that dealing with the biggest PR disaster in the company's
history is their number-one priority, don't get me wrong.

EDIT: It's probably better to make your opinion known on the blog post than
here, though.

~~~
braum
They could have been doing this all along if it was their priority. It's only
a priority now that they are under a microscope and possibly facing lawsuits.

~~~
kmfrk
The present tense of "is" is also a bit irksome. Was it their priority, or has
it just become one in light of recent events?

I'm not going to make it sound like this is something that can be fixed and
mended by hiring some copywriters and PR consultants, but they're doing a very
poor job of making me personally give them the benefit of the doubt.

It's like the asshole who says "I'm sorry you feel offended", or "I'm sorry I
got caught". There isn't even a semblance of an apology and understanding of
the consequences of this for their customers.

~~~
braum
That would imply guilt, and I'm sure their lawyers are pushing them very
strongly to avoid such statements.

------
joevandyk
Larger dynos would mean it would be easier to run JVM apps. Looking forward to
that!

------
cmelbye
This still doesn't solve the issue at all; all they've done is make their
documentation more honest. Switching to an app server like Unicorn just means
that a dyno will lock up after it's randomly assigned more than
NUMBER_OF_WORKERS requests rather than more than one request. Slightly better,
sure, but it doesn't fix the core issue.

------
smoyer
It's interesting to me that in the comments at the bottom of their blog post,
the Heroku guys are reaching out to people that voice objections. I'm not a
Heroku customer, but I'd think they'd want to let all their customers know
about their progress. Did this go out in an e-mail too?

