
Heroku's Ugly Secret: The story of how the cloud-king turned its back on Rails - tomlemon
http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics
======
teich
This is Oren Teich, I run Heroku.

I've read through the OP, and all of the comments here. Our job at Heroku is
to make you successful and we want every single customer to feel that Heroku
is transparent and responsive. Getting to the bottom of this situation and
giving you a clear understanding of what we’re going to do to make it right is
our top priority. I am committing to the community to provide more information
as soon as possible, including a blog post on <http://blog.heroku.com>.

~~~
bambax
What's the point of posting a link to the front page of your blog, where the
most recent article is 15 days old (4 hours after the comment above)?

What we want to know:

\- is the OP right or wrong? That is, did you switch from smart to naive
routing, for all platforms, and without telling your existing or future
customers?

\- if you did switch from smart to naive routing, what was the rationale
behind it? (The OP is light on this point; there must be a good reason to do
this, but he doesn't really say what it is or might be)

\- if the OP is wrong, where might his problems come from?

\- etc.

~~~
shabble
>> _I am committing to the community to provide more information as soon as
possible, including a blog post on_ <http://blog.heroku.com>

> _What's the point of posting a link to the front page of your blog, where
> the most recent article is 15 days old (4 hours after the comment above)?_

I think OP is saying 'I am going to investigate the situation; when I am
finished here [the blog] is where I will post my response', not that there is
something there already.

That said, it's all a little too PR-Bot for my taste (although there's
probably only so many ways to say the same info without accidentally accepting
liability or something).

~~~
mcguire
Note: I think we have different referents for "OP" here; bombax's is, I think,
the whining customer; while shabble's is the pompous CEO.

Me, I'm the swarthy pirate. Arrrh.

~~~
bambax
Upvoted, although it's bambax, not bombax ;-)

~~~
mcguire
I'm having trouble reading around the eye patch.

------
toast76
Wow. This explains a lot.

We've always been of the opinion that queues were happening on the router, not
on the dyno.

We consistently see performance problems that we can tie down to particular
user requests (file uploads, for example, now moved to direct-to-S3), but we
could never figure out why these would result in queued requests given
Heroku's advertised "intelligent routing". We mistakenly thought the
occasional slow request couldn't create a queue... although the evidence
pointed to the contrary.

Now that it's apparent that requests are queuing on the dyno (although, from
what I can gather, we have no way to tell), the occasional slow requests we
have are all the more fatal: data exports, reporting, and any other non-paged
data requests.

~~~
runarb
Is it true that a dyno can only handle a single user request at a time?

Why does it not use some kind of scheduling system to handle other tasks while
one task is waiting on I/O?

~~~
Cushman
It's not exactly so, if you use a server that spawns child processes:
[http://michaelvanrooijen.com/articles/2011/06/01-more-
concur...](http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-
on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/) you can potentially
handle 3-4 requests per dyno at a time. That doesn't fix the root problem,
though.
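
For reference, the kind of setup the linked article describes boils down to a
few lines of Unicorn config. This is only a sketch, not an official Heroku
recipe, and the worker count is an assumption you would have to tune against
the 512 MB dyno memory limit:

    # config/unicorn.rb -- rough sketch, not an official Heroku recipe
    worker_processes 3     # forked app processes sharing one listen socket
    timeout 30             # kill workers stuck longer than Heroku's own limit
    preload_app true       # load the app once, then fork, to save memory

    # Procfile entry (hypothetical):
    #   web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb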

~~~
toast76
Investigating this approach now. It won't fix the problem, but will certainly
reduce the occurrence of blocked dynos. Thx!

EDIT: will need to look into our memory perf though, looks like we'll need to
do some work to get more than a couple of workers.

~~~
joeya
I can confirm this. We experimented with Unicorn as a way to get some of the
benefits of availability-based routing despite Heroku's random routing. Our
medium-sized app (occupying ~230 MB on boot) would quickly exceed Heroku's 512
MB memory limit when forking just 2 unicorn workers, so we had to revert to
thin and a greater number of dynos.

------
michaelrkn
We ran into this exact same problem at Impact Dialing. When we hit scale, we
optimized the crap out of our app; our New Relic stats looked insanely fast,
but Twilio logs told us that we were taking over 15 seconds to respond to many
of their callbacks. After spending a few weeks working with Heroku support
(and paying for a dedicated support engineer), we moved to raw AWS and our
performance problems disappeared. I want to love Heroku, but it doesn't scale
for Rails apps.

~~~
WillieBKevin
We moved our Twilio app off Heroku for the same reasons. Extensive
optimizations and we would still get timeouts on Twilio callbacks.

The routing dynamics should be explained better in Heroku's documentation.
From an engineering perspective, they're a very important piece of information
to understand.

We're with <https://bluebox.net> now and are very happy.

------
FireBeyond
This should be more prominent. I want to love Heroku, and am sure that I
could.

But really, throwing in the towel at intelligent routing and replacing it with
"random routing" is horrific, if true.

It's arguable that the routing mesh and scaling dynamics of Heroku are a large
part, if not -the- defining reason for someone to choose Heroku over AWS
directly.

Is it a "hard" problem? I'm absolutely sure it is. That's one reason customers
are throwing money at you to solve it, Heroku.

~~~
chc
> _But really, throwing in the towel at intelligent routing and replacing it
> with "random routing" is horrific, if true._

The thing is, their old "intelligent routing" was really just "we will only
route one request at a time to a dyno." In other words, what changed is that
they now allow dynos to serve multiple requests at a time. When you put it
that way, it doesn't sound as horrific, does it?

~~~
arcatek
The requests will not be served _at the same time_, that's the whole point.
If a request is routed to a busy dyno, you will have to wait for the previous
job to finish before yours can start.

~~~
antihero
Thing is though, it says that a Dyno is a Ubuntu virtual machine. In what sort
of horrendous configuration can an ENTIRE VM serve only a SINGLE REQUEST AT A
TIME?!

That is utter madness, and the validity of the argument depends on whether
it's Heroku's fault or this dude's that the VM is serving only a single
request at a time (and taking >1 sec to handle a request).

~~~
vemv
Not Heroku's fault in this case: Rails (like any other single-threaded
environment) can only handle a single request at a time.
~~~
joesb
But one VM can host multiple Rails instances, each on different port. That's
what Passenger or Unicorn do, acting as proxy of group of locally spawn Rails
instances.

------
lkrubner
Good lord!!!!!

Percentage of the requests served within a certain time (ms)

    
    
      50%    844
      66%   2977
      75%   5032
      80%   7575
      90%  16052
      95%  20069
      98%  29282
      99%  30029
     100%  30029 (longest request)
    

Those numbers are amazingly awful. If I ever run ab and see 4 digits I assume
I need to optimize my software or server. But 5 digits?

Why in the world would a company spend $20,000 a month for service this awful?

~~~
CoffeeDregs
Worse than that:

    
    
      * 89/100 requests failed (according to
        https://gist.github.com/a-warner/c8cc02565dc214d5f77d ).  
      * Heroku times out requests after 30 seconds, so the 30000ms
        numbers may be timeouts (I've forgotten if *ab* includes 
        those in the summary).
      * That said, the *ab* stats could be biased by using overly 
        large concurrency settings (not probable if you're running 50 dynos...),
        but still...
    

But still WTF. 89/100 requests failed? That's not happy-making.

Uncertainty is _DiaI_ (death-in-an-infrastructure). I just created a couple of
projects on Heroku and love the service, but this needs to be addressed ASAP
(even if addressing it is just a blog post).

Also, I've _never_ understood using round-robin or random algorithms for
load balancers when a fewest-connections option is available...

~~~
donavanm
> I've never understood using round-robin or random algorithms for load-
> balancers...

LeastConns/FastestConn selection is very dangerous when a backend host fails.
Imagine a host has a partial failure, allowing health checks to pass. This
host now fast-fails and returns a 500 faster than other hosts in the pool
generate 200s. This poison host will have fewer active connections and your LB
will route more requests to it. A single host failure just became a major
outage.
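
Here's a toy simulation of that failure mode (the service times and host count
are made up, and this isn't modelled on any particular LB): the host that
returns errors in ~1 ms always looks least loaded, so least-connections keeps
feeding it.

    # Toy least-connections simulation: the "poison" host answers (with 500s)
    # in 1 ms while healthy hosts take 100 ms, so it almost always has the
    # fewest active connections and soaks up nearly all the traffic.
    hosts = [
      { name: "healthy-1", service_ms: 100, active: 0, served: 0 },
      { name: "healthy-2", service_ms: 100, active: 0, served: 0 },
      { name: "poison",    service_ms: 1,   active: 0, served: 0 },
    ]

    in_flight = []   # [finish_time_ms, host] pairs

    1000.times do |now|                          # one new request per ms
      in_flight.reject! { |finish, h| finish <= now && (h[:active] -= 1; true) }
      target = hosts.min_by { |h| h[:active] }   # least-connections pick
      target[:active] += 1
      target[:served] += 1
      in_flight << [now + target[:service_ms], target]
    end

    hosts.each { |h| puts "#{h[:name]}: #{h[:served]} requests" }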

I like WRR selection for back ends, then use a queue or fast fail when your
max active conns is exceeded. Humans prefer latency to errors, so let the lb
queue (to a limit) on human centric VIPs. Automated clients deal better with
errors so have your lb throw a 500 directly, or RST, or direct to a pool that
serves static content.

~~~
shabble
You'd have some sort of error threshold/rate limit[1] at which point the
server is marked dead and falls out of the backends list, surely?

Or even, an alarm threshold if responses are averaging /too fast/, based on
your expected load & response times.

I've not done any deployment/ops beyond the trivial/theoretical though, so I
don't know how this would work in reality.

~~~
donavanm
No, LBs don't inspect established streams. The LB will periodically send
requests to a known URI as a health check instead. The problem is when the
health check URI isn't indicative of availability. (Hint: it never is.)

Nope, don't do this either. Unless you like getting pages because things are
working?

~~~
jeremyjh
You can configure a LB to inspect layer 7 - in HAProxy this is done with an
observe directive. Error 500s would then take the member out of a pool. You
are right that the health check of a static file may put it right back into
the pool, but you can give it a slow rise value so that it waits a few minutes
to do that. I'm not saying this is easy to get right but it is definitely
possible to at least reduce the frequency of selection of a server raising 500
errors.

~~~
donavanm
Yes, all things are possible. You'll also have to keep state and track rates.
Otherwise a very low error rate could down all of your backend hosts.

But you're now running a stateful l7 application proxy. That's waaaaaay more
expensive than a tcp proxy with asynchronous checks.

------
bignoggins
Rap Genius is employing a classic rap-mogul strategy: start a beef

~~~
parsnips
Not only that, East Coast vs. West Coast at that...

~~~
hunterhusar
LOL

~~~
hunterhusar
-4 for removing an LOL darn

------
mattj
So the issue here is two-fold:

\- It's very hard to do 'intelligent routing' at scale.

\- Random routing plays poorly with request times that have a really bad tail
(median is 50ms, 99th is 3 seconds).

The solution here is to figure out why your 99th is 3 seconds. Once you solve
that, randomized routing won't hurt you anymore. You hit this exact same
problem in a non-preemptive multi-tasking system (like gevent or golang).
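
To put a rough number on how badly that tail interacts with random routing,
here's a back-of-envelope check using Little's law. The request rate and dyno
count are assumptions borrowed from elsewhere in this thread (roughly 9000
requests/minute across ~50 dynos), not measurements:

    # Back-of-envelope (Little's law): how often does a randomly routed request
    # land on a dyno that is currently stuck serving one of the rare 3 s requests?
    req_per_sec   = 150.0   # ~9000 req/min, figure quoted elsewhere in the thread
    dynos         = 50      # assumed fleet size
    slow_fraction = 0.01    # 1 in 100 requests hits the 3 s tail
    slow_seconds  = 3.0

    slow_in_flight = req_per_sec * slow_fraction * slow_seconds  # ~4.5 at any instant
    p_behind_slow  = slow_in_flight / dynos
    puts p_behind_slow   # => 0.09, i.e. roughly 9% of requests land behind a 3 s request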

~~~
aristus
I do perf work at Facebook, and over time I've become more and more convinced
that the most crucial metric is the width of the latency histogram. Narrowing
your latency band (even if it makes the average case _worse_) makes so many
systems problems better (top of the list: load balancing) that it's not even
funny.

~~~
harshreality
I seem to recall Google mentioning on some blog several years ago that high
variance in response latency degrades user experience much more than slightly
higher average request times. I can't find the link though; if anyone has it,
I'd be grateful.

~~~
nostrademons
Jeff Dean wrote a paper on it for CACM:

[http://cacm.acm.org/magazines/2013/2/160173-the-tail-at-
scal...](http://cacm.acm.org/magazines/2013/2/160173-the-tail-at-
scale/fulltext)

There's a relatively easy fix for Heroku. They should do random routing with a
backup second request sent if the first request fails to respond within a
relatively short period of time (say, 95th percentile latency), killing any
outstanding requests when the first response comes back in. The amount of
bookkeeping required for this is a lot less than full-on intelligent routing,
but it can reduce tail latency dramatically since it's very unlikely that the
second request will hit the same overloaded server.
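
A client-side toy version of that hedged-request idea, just to make the
mechanics concrete; this is not Heroku's router, the 200 ms budget is made up,
and (as the reply below notes) replaying non-idempotent requests this way is
dangerous:

    require 'net/http'

    # Toy hedged GET: fire a backup copy of the request if the first hasn't
    # answered within a latency budget, and take whichever response wins.
    def hedged_get(uri, hedge_after: 0.2)
      results = Queue.new
      threads = 2.times.map do |i|
        Thread.new do
          begin
            sleep hedge_after if i == 1     # delay the backup request
            results << Net::HTTP.get_response(uri)
          rescue StandardError => e
            results << e
          end
        end
      end
      winner = results.pop                  # first response (or error) wins
      threads.each(&:kill)                  # abandon the straggler
      winner
    end

    puts hedged_get(URI("http://example.com/")).inspect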

~~~
gleb
From experience, this is an incredibly effective way to DoS yourself. It was
the default behaviour of nginx LB ages ago. Maybe only on EngineYard. Doesn't
really matter as nobody uses nginx LB anymore.

Even ignoring the POST requests problem (yup, it tried to replay those),
properly cancelling a request on all levels of a multi-level Rails stack is
very hard/not possible in practice. So you end up DoSing the hard-to-scale
lower levels of the stack (e.g. the database) at the expense of the
easy-to-scale LB.

~~~
krunaldo
Nginx introduced the least_conn LB method in 1.3.1, which makes it a bit
better.
[http://nginx.org/en/docs/http/ngx_http_upstream_module.html#...](http://nginx.org/en/docs/http/ngx_http_upstream_module.html#least_conn)

HAProxy is a lot better than nginx, and more flexible if you want to introduce
non-HTTP into your stack.

Shouldn't the request be canceled on all levels if you cut the HTTP connection
to the frontend?

------
nthj
I'm inclined to wait until Heroku weighs in to render judgement. Specifically,
because their argument depends on this premise:

> But elsewhere in their current docs, they make the same old statement loud
> and clear:

>> The heroku.com stack only supports single threaded requests. Even if your
>> application were to fork and support handling multiple requests at once, the
>> routing mesh will never serve more than a single request to a dyno at a
>> time.

They pull this from Heroku's documentation on the Bamboo stack [1], but then
extrapolate and say it also applies to Heroku's Cedar stack.

However, I don't believe this to be true. Recently, I wrote a brief tutorial
on implementing Google Apps' OpenID into your Rails app.

The underlying problem with doing so on a free (single-dyno) Heroku app is
that while your app makes an authentication request to Google, Google turns
around and makes an "oh hey" request to your app. With a single-concurrency
system, your app times out waiting for Google to get back to you, and Google
won't get back to you until your app gets back to it, so: deadlock.

However, there is a work-around on the Cedar stack: configure the unicorn
server to supply 4 or so worker processes for your web server, and the Heroku
routing mesh appropriately routes multiple concurrent requests to Unicorn/my
app. This immediately fixed my deadlock problem. I have code and more details
in a blog post I wrote recently. [2]

This seems to be confirmed by Heroku's documentation on dynos [3]:

> Multi-threaded or event-driven environments like Java, Unicorn, and Node.js
> can handle many concurrent requests. Load testing these applications is the
> only realistic way to determine request throughput.

I might be missing something really obvious here, but to summarize: their
premise is that Heroku only supports single-threaded requests, which is true
on the legacy Bamboo stack but I don't believe to be true on Cedar, which they
consider their "canonical" stack and where I have been hosting Rails apps for
quite a while.

[1] <https://devcenter.heroku.com/articles/http-routing-bamboo>

[2] [http://www.thirdprestige.com/posts/your-website-and-email-
ac...](http://www.thirdprestige.com/posts/your-website-and-email-accounts-
should-be-friends-part-ii)

[3] [https://devcenter.heroku.com/articles/dynos#dynos-and-
reques...](https://devcenter.heroku.com/articles/dynos#dynos-and-requests)

[edit: formatting]

~~~
kennystone
If you have 2 unicorn workers and you happen to get 3 slow requests routed to
that dyno, you are still screwed, right? Seems to me like they will still
queue on that dyno.

~~~
barmstrong
This is true - unicorn masks the symptoms for a period of time but does not
solve the underlying problem in the way a global request queue would.

Also, if the unicorn process is doing something CPU-intensive (vs. waiting on
a 3rd-party service or IO etc.) then it won't serve 3 requests simultaneously
as fast as three separate single-request processes would.

~~~
rjacoby5
One of the hidden costs of Unicorn is spin-up time. Unicorn takes a long time
to start and then fork, and we would get a ton of request timeouts during this
period. After switching back to Thin, we never got timeouts during deploys,
even under very heavy load.

------
habosa
Wow.

Normally when I read "X is screwing Y!!!" posts on Hacker News I consider them
to be an overreaction, or I can't relate. In this case, I think this was a
reasonable reaction and I am immediately convinced never to rely on Heroku
again.

Does anyone have a reasonably easy to follow guide on moving from Heroku to
AWS? Let's keep it simple and say I'm just looking to move an app with 2 web
Dynos and 1 worker. I realize this is not the type of app that will be hurt by
Heroku's new routing scheme but I might as well learn to get out before it's
too late.

~~~
michaelrkn
My company switched off of Heroku for our high-load app because of these same
problems, but I still really like Heroku for apps with smaller loads, or ones
in which I'm willing to let a very small percentage of requests take a long
time.

------
stevewilhelm
Heroku Support Request #76070

To whom it may concern,

We are long time users of Heroku and are big fans of the service. Heroku
allows us to focus on application development. We recently read an article on
HN entitled 'Heroku's Ugly Secret' <http://s831.us/11IIoMF>

We have noticed similar behavior, namely that increasing dynos does not
provide the performance increases we would expect. We continue to see wildly
different performance across requests that New Relic metrics and internal
instrumentation cannot explain.

We would like the following:

1\. A response from Heroku regarding the analysis done in the article, and

2\. Heroku-supplied persistent logs that include information about how long
requests are queued for processing by the dynos.

Thanks in advance for any insight you can provide into this situation and keep
up the good work.

~~~
stevewilhelm
Heroku's response:

Hi Steve,

I've been reading through all the concerns from customers, and I want every
single customer to feel that Heroku is transparent and responsive. Our job at
Heroku is to make you successful. Getting to the bottom of this situation and
giving you a clear and transparent understanding of what we’re going to do to
make it right is our top priority. I am committing to the community to provide
more information as soon as possible, including a blog post on
<http://blog.heroku.com>.

Oren Teich, Heroku GM

------
htsh
Why not hire a devops guy & rack your own hardware? Or get some massive
computing units at amazon (just as good but more expensive)?

This reminds me of the excellent 5 stages of hosting story shared on here from
a while back:

<http://blog.pinboard.in/2012/01/the_five_stages_of_hosting/>

~~~
regularfry
Because the whole point is that you shouldn't have to.

~~~
Scramblejams
I don't know, $20k/mo strikes me as an awful lot of money to avoid engaging in
work that a scrappy internet startup really ought to be competent at. If you
don't know how to run the pipes and they get clogged and your plumber's not
picking up the phone, you're screwed.

That amount buys a whole lotta dedicated servers and the talent to run them.
(Sidenote: Every time I price AWS or one of its competitors for a reasonably
busy site, my eyes seem to pop out at the cost when compared to dedicated
hardware and the corresponding sysadmin salary.)

The larger issue is: Invest in your own sysadmin skills, it'll pay off in
spades, especially when your back's up against the wall and you figure out
that the vendor-which-solves-all-your-problems won't.

~~~
encoderer
Three thoughts:

1\. Employees are expensive. A good ops guy who believes in your cause and
wants to work at an early stage startup can be had for $100k. (Maybe NYC is
much cheaper than the bay area, but I'll use bay area numbers for now because
it's what I know). That's base. Now add benefits, payroll taxes, 401k match,
and the cost of his options. So what... $133k? That's one guy who can then
never go on vacation or get hit by the proverbial bus. Now buy/lease your web
cluster, database cluster, worker(s), load balancers, dev and staging
environments, etc. Spend engineering time building out Cap and Chef/Puppet
scripts and myriad other sysops tools. (You'd need some of that on AWS for
sure, but less on Heroku which is certainly much much more expensive than AWS)

2\. When you price-out these AWS systems are you using the retail rates or are
you factoring in the generous discount larger customers are getting from
Amazon? You realize large savings first by going for reserved instances and
spot pricing and stack on top of that a hefty discount you negotiate with your
Amazon account rep.

3\. I've worked at 2 successful, talented Bay Area startups in the last few
years: One that was built entirely on AWS, and now, currently, one that owns
all of their own hardware. Here's what I think: It's a wash. There isn't a
huge difference in cost. You should go with whatever your natural talents lead
you towards. You have a founding team with solid devops experience? Great,
start on the cloud and then transition early to your own hardware. If not,
focus on where your value-add is and outsource the ops.

~~~
baudehlo
Your last sentence hides another option which is the route we took: outsource
devops. It cost us about $2000 in consultant fees for a fully setup system,
easy to expand and add hardware to, and much more cost effective long term
than AWS or Heroku. Our guy runs a small ops company who have 24/7 on-call.
It's really the perfect solution.

~~~
Scramblejams
I may have a use for this. Link to your ops provider?

------
barmstrong
We were very surprised to discover Heroku no longer has a global request
queue, and spent a good bit of time debugging performance issues to find this
was the culprit.

Heroku is a great company, and I imagine there was some technical reason they
did it (not an evil plot to make more money). But not having a global request
queue (or "intelligent routing") definitely makes their platform less useful.
Moving to Unicorn helped a bit in the short term, but is not a complete
solution.

~~~
homosaur
While I generally agree with your thoughts, I also wonder what the reason is
for continuing to misrepresent their service until you dig 20 layers deep into
the docs.

------
rapind
I'd been using Heroku since forever, but bailed on them for a high traffic app
last year (Olympics related) due to poor performance once we hit a certain
load (adding dynos made very little difference). We were paying for their (new
at the time) critical app support, and I brought up that it appears to be
failing at a routing level continuously. And this was with a Sinatra app
served by Unicorn (which at the time at least was considered unsupported).

We went with a metal cluster setup and everything ran super smooth. I never
did figure out what the problem was with Heroku though and this article has
been a very illuminating read.

------
gojomo
They want to force the issue with a public spat. Fair enough.

But they might also be able to self-help quite a bit. RG makes no
mention of using more than 1 unicorn worker per dyno. That could help, making
a smaller number of dynos behave more like a larger number. I think it was
around when Heroku switched to random routing that they also became more
officially supportive of dynos handling multiple requests at once.

There's still the risk of random pileups behind long-running requests, and as
others have noted, it's that long-tail of long-running requests that messes
things up. Besides diving into the worst offender requests, perhaps simply
_segregating those requests to a different Heroku-app_ would lead to a giant
speedup for most users, who rarely do long-running requests.

Then, the 90% of requests that never take more than a second would stay in one
bank of dynos, never having pathological pile-ups, while the 10% that take 1-6
seconds would go to another bank (by different entry URL hostname). There'd
still be awful pile-ups there, but for less-frequent requests, perhaps only
used by a subset of users/crawler-bots, who don't mind waiting.

~~~
gojomo
On further thought, Heroku users could probably even approximate the benefits
from the Mitzenmacher power-of-two-choices insight (mentioned elsewhere in
thread), without Heroku's systemic help, by having dynos shed their own excess
load.

Assume each unicorn can tell how many of its workers are engaged. The 1st
thing any worker does – before any other IO/DB/net-intensive work – would be
to check if the dyno is 'loaded', defined as all other workers (perhaps just
one, for workers=2) on the same dyno already being engaged. If so, the request
is redirected to a secondary hostname, getting random assignment to a
(usually) different dyno.

The result: fewer big pileups unless completely saturated, and performance
approaching smart routing but without central state/queueing. There is an
overhead cost of the redirects... but that seems to fit the folk wisdom
(others have also shared elsewhere in thread) that a hit to average latency is
worth it to get rid of the long tails.

(Also, perhaps Heroku's routing mesh could intercept a dyno load-shedding
response, ameliorating pile-ups without taking the full step back to stateful
smart balancing.)

 _Added:_ On even further thought: perhaps the Heroku routing mesh
automatically tries another dyno when one refuses the connection. In such a
case, you could set your listening server (g/unicorn or similar) to have a
minimal listen-backlog queue, say just 1 (or the number of workers). Then once
it's busy, a connect-attempt will fail quickly (rather than queue up), and the
balancer will try another random dyno. That's as good as the 1-request-per-
dyno-but-intelligent-routing that RapGenius wants... and might be completely
within RapGenius's power to implement without any fixes from Heroku.
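
For what it's worth, the self-shedding idea above can be sketched as a Rack
middleware. This assumes a threaded server (several worker threads in one
process, so a simple counter works) and a hypothetical SECONDARY_HOST config
var; it's an illustration of the idea, not anything Heroku documents:

    # Hypothetical per-dyno load shedding: if this dyno already has its hands
    # full, bounce the request to a secondary hostname so it gets randomly
    # re-routed, instead of letting it queue behind whatever is running here.
    class ShedWhenBusy
      MAX_IN_FLIGHT = 2   # "loaded" = this many requests already running here

      def initialize(app)
        @app = app
        @in_flight = 0
        @lock = Mutex.new
      end

      def call(env)
        busy = @lock.synchronize { (@in_flight += 1) > MAX_IN_FLIGHT }
        if busy
          # 307 keeps the HTTP method; SECONDARY_HOST is a made-up config var
          [307, { "Location" => "https://#{ENV['SECONDARY_HOST']}#{env['PATH_INFO']}" }, []]
        else
          @app.call(env)
        end
      ensure
        @lock.synchronize { @in_flight -= 1 }
      end
    end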

~~~
mononcqc
Most of the 'Power of two choices' I've read about assumes the presence of a
global queue
([http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20...](http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf))
-- there's a parallel variety, but they go light on details in that text.

I'm unaware of how Heroku does things. I'd guess they dropped the global queue
because it's impractical (failure-prone, not scalable, as it's a single point
of contention).

I'm mostly surprised to see people happy being able to handle 1 or 2 requests
in parallel per instance in general. That sounds absolutely insane to me.

~~~
gojomo
FWIW, the gunicorn in my Heroku web dynos is set to use 12 workers, though it
hasn't been stressed at that level.

------
zenazn
Randomized routing isn't all bad. In fact, if Heroku were to switch from
purely random routing to minimum-of-two random routing, they'd perform
asymptotically better [1].

[1]:
[http://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.p...](http://www.eecs.harvard.edu/~michaelm/postscripts/mythesis.pdf)
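
The selection logic itself is tiny; the hard part (as the reply below points
out) is having per-dyno load data at all. A sketch, assuming the router could
cheaply read each dyno's current queue depth:

    # "Power of two choices" vs. pure random, assuming each dyno exposes a
    # cheap-to-read count of queued requests.
    Dyno  = Struct.new(:queued)
    dynos = Array.new(50) { Dyno.new(0) }

    def pick_random(dynos)
      dynos.sample
    end

    def pick_min_of_two(dynos)
      a, b = dynos.sample(2)
      a.queued <= b.queued ? a : b      # route to the less loaded of the two
    end

    chosen = pick_min_of_two(dynos)
    chosen.queued += 1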

~~~
jemfinch
If Heroku had the data needed to do minimum-of-two random routing, they'd have
the data needed to do intelligent routing. The problem is not the algorithm
itself: "decrement and reheap" isn't going to be a performance bottleneck. The
problem is tracking the number of requests queued on the dyno.

~~~
gojomo
_If Heroku had the data needed to do minimum-of-two random routing, they'd
have the data needed to do intelligent routing._

Not strictly true; imagine that they can query the load state of a dyno, but
at some non-zero cost. (For example, that it requires contacting the dyno,
because the load-balancer itself is distributed and doesn't have a global
view.)

Then, contacting 2, and picking the better of the 2, remains a possible win
compared to contacting more/all.

See for example the 'hedged request' strategy, referenced in a sibling thread
by nostrademons from a Jeff Dean Google paper, where 2 redundant requests are
issued and the slower-to-respond is discarded (or even actively cancelled, in
the 'tied request' variant).

------
goronbjorn
Aside from the Heroku issue, this is an amazing use of RapGenius for something
besides rap lyrics. I didn't have to google anything in the article because of
the annotations.

~~~
brusch
Funny - I thought this was a really interesting article - but I couldn't stand
all these annotations. And when I was selecting text (which I do mindlessly
when I am reading an article) all hell broke loose and it tried to load
something.

Very annoying when I want to concentrate on the technical details. So we see
once again: everyone's different.

------
zeeg
If this is such a problem for you, why are you still on Heroku? It's not a be-
all end-all solution.

I got started on Heroku for a project, and I also ran into limitations of the
platform. I think it can work for some types of projects, but it's really not
that expensive to host 15m uniques/month on your own hardware. You _can_ do
just about anything on Heroku, but as your organization and company grow it
makes sense to do what's right for the product, and not necessarily what's
easy anymore.

FYI I wrote up several posts about it, though my reasons were different (and
my use-case is quite a bit different from a traditional app):

* <http://justcramer.com/2012/06/02/the-cloud-is-not-for-you/>

* <http://justcramer.com/2012/08/30/how-noops-works-for-sentry/>

------
rdl
Wow. I suspect Rap Genius has the dollars now where it's totally feasible for
them to go beyond Heroku, but it still might not be the best use of their
time. But if they have to do it, they have to do it.

OTOH, having a customer have a serious problem like this AND still say "we
love your product! We want to remain on your platform", just asking you to fix
something, is a pretty ringing endorsement. If you had a marginal product with
a problem this severe, people would just silently leave.

~~~
zende
Rap Genius is limited more by time than by money if anything. It would make
more sense to throw money at the problem instead of people.

~~~
gleb
It doesn't appear that running on Heroku is free for them in terms of time.

~~~
rdl
There's also the outage hell. It's been OK for a month or two, but getting
killed whenever AWS has a blip in US-East (there's no cross-region redundancy,
and minimal resilience when an AZ or a region-wide service has serious
problems) isn't great.

It probably doesn't hurt RG as much as lower overall performance during normal
operations does, though.

------
lquist
Heroku implemented this change in mid-2010, then sold to Salesforce six months
later. Hmm... wondering how this impacted revenue numbers as customers had to
scale up dynos following the change...

------
bifrost
I am only going to suggest a small edit -> s/Postgres can’t/Heroku's Postgres
can't/

PG can scale up pretty well on a single box, but scaling PG on AWS can be
problematic due to the disk io issue, so I suspect they just don't do it. I'd
love to be corrected :)

~~~
pvh
Data Dep't here. Postgres scales great on Heroku, AWS, and in general. We've
got users doing many thousands of queries per second, and terabytes of data.
Not a problem.

The issue with the number of connections is that each connection creates a
process on the server. We cap the connections at 500, because at that point
you start to see problems with O(n^2) data structures in the Postgres
internals that make all kinds of mischief. This has been improved over the
last few releases, but in general it's still a good idea to try and keep the
number of concurrent connections down.

*EDIT: thanks. not a thread. :)

~~~
joevandyk
Each connection forks a new process on the server, not a thread.

------
bad_user
I noticed problems with Heroku's router too.

However, contrary to the author, I'm serving 25,000 real requests per second
with only 8 dynos.

The app is written in Scala and runs on top of the JVM. And I was dissatisfied
that 8 dynos seemed like too much for an app that can serve over 10K requests
per sec on my localhost.

~~~
jaytaylor
This sounds interesting, but kind of suspicious. What app is serving that kind
of volume continuously other than fb or goog?

You running zynga on Heroku or something?

~~~
bad_user
It's an integration with an OpenRTB bidding marketplace that's sending that
traffic our way.

And 25K is not the whole story. In a lot of ways it's similar to high
frequency trading. Not only do you need to decide in real time whether you
want to respond with a bid or not, but the total response time should be under
200ms, preferably under 100ms, otherwise they start bitching about latency and
could drop you from the exchange.

And the funny thing is 25K is actually nothing, compared to the bigger
marketplace that we are pursuing and that will probably send us 800K per
second at peak.

------
nacho2sweet
Why is everyone like against rapgenius.com for "forcing the issue with a
public spat"? They are the customer not getting a service they are paying for.
I would be fucking pissed too. Heroku isn't being the darling service they
advertised. They tried to work on it with Heroku. This is useful information
to most of you. Are most of you against Yelp?

~~~
gojomo
The only person using the words 'force the issue with a public spat' is me,
and I judged that as 'fair enough'. I'm not against RapGenius and I'm glad the
issue is being discussed.

But we haven't seen Heroku's comments, and while some parts of RapGenius's
complaint are compelling, I'm not sure their apparent conclusions - that
'intelligent routing' is needed, and its lack is screwing Heroku customers —
are right. I strongly suspect some small tweaks, ideally with Heroku's help,
but perhaps even without it, can fix most of RapGenius's concerns.

Perhaps there was a communication or support failure, which led to the public
condemnation, or maybe that's just RG's style, to heighten the drama. (That's
an observation without judgement; quarrels can benefit both parties in some
attention- and personality-driven contexts.)

------
zeeg
Here's a very simple gevent hello world app.

This is run from inside AWS on an m1.large:

<https://gist.github.com/dcramer/4950101>

For the 50 dyno test, this was the second run, making the assumption that the
dynos had to warm up before they could effectively service requests.

You'll see that with 49 more dynos, we only managed to get around 400 more
requests/second on an app that isn't even close to real-world.

(By no means is this test scientific, but I think it's telling)

------
thehodge
Shame that this seems to have been flagged off the homepage before a
reasonable discussion can ensue

~~~
thehodge
Hmm seems to be back on the homepage again, not seen that before

~~~
thedufer
I think this can happen when it's been erroneously flagged as having been
vote-ringed.

------
abat
The cost of New Relic on Heroku looks really high because each dyno is billed
like a full server, which makes it many times more expensive than if you were
to manage your own large EC2 servers and just have multiple rails workers.

New Relic could be much more appealing if they had a pricing model that was
based on usage instead of number of machines.

------
omfg
Someone from Heroku really needs to weigh in on this.

~~~
timmaah
This is not a new revelation. I got them to admit to it 2 years ago.

[http://tiwatson.com/blog/2011-2-17-heroku-no-longer-
using-a-...](http://tiwatson.com/blog/2011-2-17-heroku-no-longer-using-a-
global-request-queue)

and specifically:

[https://groups.google.com/forum/?fromgroups#!msg/heroku/8eOo...](https://groups.google.com/forum/?fromgroups#!msg/heroku/8eOosLC5nrw/Xy2j7GapebIJ)

~~~
VeejayRampay
But then again, those two links don't address the core of the problem:

Heroku is used by tons of people around the world. Some of them are paying
good money for the service. Given the amount of scrutiny under which they
operate, what is the incentive for them to turn an algorithm into a less
effective one and still charge the same amount of money in a growing "cloud
economy" where companies providing the same kind of service are a dime a dozen
(AWS, Linode, Engine Yard, etc)?

How does that benefit their business if "calling their BS" is as easy as
firing up Apache Benchmark, collecting results, drawing a few charts and
flat-out "proving" that they're lying about the service they provide?

I mean, I doubt Heroku is that stupid; they know their audience doesn't give
them much room for mistakes. So as nice as the story sounds on paper, I'd
really like another take on all this, either from other users of Heroku,
independent devops, researchers, routing-algorithm specialists or even Heroku
themselves before we all too hastily jump to sensationalist conclusions.

~~~
timmaah
I didn't jump to any conclusion that they are doing this simply for the money.

The only conclusion I jumped to was that they ditched the routing they
originally said they had (without telling anyone) and that their routing is
worse than what you get as a default from passenger.

~~~
VeejayRampay
My bad if that wasn't clear in my comment: the sensationalist bit wasn't
directed at you but at the general tone of the thread. In fact, I actually
upvoted your comment for providing two links highly relevant to the article.

------
kmfrk
Looks like Cloud 66 couldn't have picked a better day to announce their
service: <http://news.ycombinator.com/item?id=5213862>.

------
tim_sw
randomized routing is not necessarily bad if they look at 2 choices and pick
the min. See <http://en.wikipedia.org/wiki/2-choice_hashing> and
[http://www.eecs.harvard.edu/~michaelm/postscripts/handbook20...](http://www.eecs.harvard.edu/~michaelm/postscripts/handbook2001.pdf)

~~~
jemfinch
This is a knee-jerk reply. I know, because my knee jerked as well. Think about
the problem a little more: if you have the data necessary to pick the min-of-
two, then you have the data you need to do intelligent routing.

~~~
jules
Not necessarily. Heroku claims that a global request queue is hard to scale,
and therefore they switched to random load balancing. The comment above shows
that a global request queue is not necessary. Let's say that minimum-of-n
scales up to 10 dynos. If your application requires 40 dynos, you can have one
front load balancer which dispatches the requests to 4 back load balancers,
each of which has 10 dynos assigned to it on which it performs min-of-10
routing. This gives you routing that's almost as good as min-of-40, but it
scales up nonetheless.
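
A toy sketch of that two-layer shape (group count and sizes are made up), just
to show how little shared state the second layer needs:

    # Two-layer routing sketch: a stateless random choice at the front, then
    # least-loaded-of-the-group within a small cluster of dynos. Only each
    # group's own balancer has to track queue depths.
    GROUPS = Array.new(4) { Array.new(10) { { queued: 0 } } }

    def route(groups)
      group = groups.sample                     # front layer: pure random
      dyno  = group.min_by { |d| d[:queued] }   # back layer: min within group
      dyno[:queued] += 1
      dyno
    end

    route(GROUPS)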

~~~
jacques_chester
> _Heroku claims that a global request queue is hard to scale, and therefore
> they switched to random load balancing._

I wish Heroku would tell us more about what they tried. I can imagine a few
cockamamie schemes off the top of my head; it would be good to know whether
they thought of those.

------
chc
OK, maybe I'm missing something here, but it seems to me that the OP's real
problem is that he's artificially limiting himself to one request per dyno.
They now allow a dyno to serve more than one request at a time, and he's
presenting that as a _bad thing_! It seems to me that the answer to Rap
Genius' problems is not "rage at Heroku," but rather "gem 'unicorn'".

~~~
michaelrkn
We did this, but all it did was buy us a bit of extra time before we ran into
the same problem again - a very small percentage (<0.1%) of requests creating
a queue that destroyed performance for the rest of them. Also, FWIW, Heroku
does not officially support Unicorn, and you have to make sure that you don't
run out of memory on your dynos (we tanked our app the first time we tried
Unicorn with 4 processes).

------
ajsharp
My hunch is that Heroku isn't doing this to bleed customers dry. I know more
than a few really, really great people who work there, and I don't think
they'd stand for that type of corporate bullshittery. If this were the case, I
think we'd have heard about it by now.

My best guess is that they hit a scaling problem with doing smart load
balancing. Smart load balancing, conceptually, requires persistent TCP
connections to backend servers. There's some upper limit per LB instance or
machine at which maintaining those connections causes serious performance
degredations. Maybe that overhead became too great at a certain point, and the
solution was to move to a simpler random round-robin load balancing scheme.

I'd love to hear a Heroku employee weigh in on this.

~~~
Dylan16807
The thing that baffles me is that you could do high-level random balancing
onto smaller clusters that do smart balancing. This would solve most of the
problem of overloaded servers. An entire cluster would have to clog up with
slow requests before there was any performance impact. So why don't they do
this?

~~~
ajsharp
My off the cuff answer to your question is, because it's probably not quite
that simple ;)

~~~
Dylan16807
That's why I asked ;)

------
squidsoup
Given that Elastic Beanstalk has support for Rails now, does Heroku still have
any advantage over AWS for a new startup?

~~~
kmfrk
SSL can be a major pain to set up. Is the process of setting it up remotely
easy compared to Heroku?

~~~
sqs
Setting up SSL on Elastic Beanstalk was very easy for us. The documentation
explained the entire process. It is easier if you get a wildcard SSL cert, so
then you can use the same SSL cert for your various deployments under the same
domain.

~~~
kmfrk
That's very intriguing. Do they support postinstall/post-deploy scripts/hooks
like dotCloud to run some framework set-up?

------
jotto
[http://stackoverflow.com/questions/6370479/heroku-cedar-
slow...](http://stackoverflow.com/questions/6370479/heroku-cedar-slower-
response-time-than-bamboo) (from june 2011), the first discovery of cedar
stack being slower than the bamboo stack

------
blatyo
I assumed people were running multiple rails processes on their dynos.

[http://michaelvanrooijen.com/articles/2011/06/01-more-
concur...](http://michaelvanrooijen.com/articles/2011/06/01-more-concurrency-
on-a-single-heroku-dyno-with-the-new-celadon-cedar-stack/)

------
nsrivast
OP is a friend of mine, and when I first heard of his problem I wondered if
there might be an analytical solution to quantify the difference between
intelligent vs naive routing. I took this problem as an opportunity to teach
myself a bit of Queueing Theory[1], which is a fascinating topic! I'm still
very much a beginner, so bear with me and I'd love to get any feedback or
suggestions for further study.

For this example, let's assume our queueing environment is a grocery store
checkout line: our customers enter, line up in order, and are checked out by
one or more registers. The basic way to think about these problems is to
classify them across three parameters:

\- arrival time: do customers enter the line in a way that is _D_eterministic
(events happen over fixed intervals), rando_M_ (events are distributed
exponentially and described by a Poisson process), or _G_eneral (events fall
from an arbitrary probability distribution)?

\- checkout time: same question for customers getting checked out, is that
process _D_ or _M_ or _G_?

\- _N_ = # of registers

So the simplest example would be _D/D/1_, where, for example, every 3 seconds
a customer enters the line and every 1.5 seconds a customer is checked out by
a single register. Not very exciting. At a higher level of complexity, _M/M/1_,
we have a single register where customers arrive at rate λ and are checked out
at rate μ (in units of # per time interval), where both λ and μ obey Poisson
distributions. (You can also model this as an infinite Markov chain where your
current node is the # of people in the queue: you transition to a higher node
with rate λ and to a lower node with rate μ.) For this system, a customer's
average total time spent in the queue is 1/(μ - λ) - 1/μ.

The intelligent routing system routes each customer to the next available
checkout counter; equivalently, each checkout counter grabs the first person
in line as soon as it frees up. So we have a system of type _M/G/R_, where our
checkout time is _G_enerally distributed and we have _R_ > 1 servers.
Unfortunately, this type of problem is analytically intractable, as of now.
There are approximations for waiting times, but they depend on all sorts of
thorny higher moments of the general distribution of checkout times. But if
instead we assume the checkout times are randomly distributed, we have an
_M/M/R_ system. In this system, the total time spent in queue per customer is
C(R, λ/μ)/(Rμ - λ), where C(a,b) is an involved function called the Erlang C
formula [2].

How can we use our framework to analyze the naive routing system? I think the
naive system is equivalent to an _M/M/1_ case with arrival rate λ_dumb = λ/R.
The insight here is that in a system where customers are instantaneously and
randomly assigned to one of _R_ registers, each register should have the same
queue characteristics and wait times as the system as a whole. And each
register has an arrival rate of 1/R times the global arrival rate. So our
average queue time per customer in the dumb routing system is
1/(μ - λ/R) - 1/μ.

In OP's example, we have on average 9000 customers arriving per minute, or
λ = 150 customers/second. Our mean checkout time is 306ms, or μ ~= 3.
Evaluating for different _R_ values gives the following queue times (in ms):

    
    # Registers        51      60      75     100     150     200     500    1000    2000    4000
    dumb routing   16,667   1,667     667     333     167     111      37      18       9       4
    smart routing     333      33      13       7       3       2       1       0       0       0
    

which are reasonably close to the simulated values. In fact, we would expect
the dumb router to be comparatively even worse for the longer-tailed Weibull
distribution they use to model request times, because you make bad outcomes
(e.g. where two consecutive requests at 99% request times are routed to the
same register) even more costly. This observation seems to agree with some of
the comments as well [3].
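
As a sanity check on the arithmetic, plugging μ = 3 and λ = 150 into the
dumb-routing expression above reproduces the first row of the table (the
smart-routing row needs the Erlang C formula, which is messier):

    # Dumb routing: each of R registers is an independent M/M/1 queue with
    # arrival rate λ/R, so time in queue = 1/(μ - λ/R) - 1/μ.
    LAMBDA = 150.0   # arrivals per second
    MU     = 3.0     # checkouts per second per register

    [51, 60, 75, 100, 150, 200, 500, 1000, 2000, 4000].each do |r|
      wait_ms = (1.0 / (MU - LAMBDA / r) - 1.0 / MU) * 1000
      puts format("R=%4d  %8.0f ms", r, wait_ms)
    end
    # => 16667, 1667, 667, 333, 167, 111, 37, 18, 9, 4 (the "dumb routing" row)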

[1] <http://en.wikipedia.org/wiki/Queueing_theory>

[2]
[http://en.wikipedia.org/wiki/Erlang%27s_C_formula#Erlang_C_f...](http://en.wikipedia.org/wiki/Erlang%27s_C_formula#Erlang_C_formula)

[3] <http://news.ycombinator.com/item?id=5216385>

~~~
themgt
As someone building a Heroku Cedar-esque PaaS[1], here's the problem with your
analogy: back in Aspen/Bamboo days (and what a lot of people still think of as
"PaaS"), Heroku was like this (i.e your app was the one-a-time cashier, and
Heroku's "routing mesh" setup checkout lanes and routed customers to your
cashiers intelligently).

Now however, Heroku lets you build your own checkout lane, so you can run apps
with single-response thread Rails, multi-response thread(e.g. unicorn) Rails,
and async long-polling/SSE etc apps w/ ruby/node.js/scala/go/erlang/etc that
can handle huge numbers of simultaneous connections. Throw websockets into the
mix here too (we do). And you can even mix & match within an app, distributing
requests to different stacks of code based on URL or the time of day, which
may have different internal response/queuing characteristics (e.g. we have a
Rails app w/ a Grape API w/ a handful of URLs mapped in via Rack::Stream
middleware rather than going through Rails).

So to get back to your analogy, Heroku is automating the setup of the "lanes",
but each supermarket is allowed to use its own blueprint and cashier and
checkout process, and basically just do whatever they want within a lane.
Maybe some "lanes" are more like a restaurant where 25 customers spend an
average of 45 minutes at a time juggled between 6 waiters while others are
still bottlenecked supermarket checkouts, with everything in between. Maybe
one type of customer ties up the cashier/waiter so much that he can only
handle 10 others instead of 100 normally. And it could all change every time
the store opens (a deployment with new code occurs), or based on what the
specific customers are buying.

The point is simply that there's not a "next available checkout counter" in
this situation, because all apps are not single-threaded Rails apps anymore.
Which doesn't mean there aren't better solutions than dumb routing, but it
does get a bit more complicated than the supermarket checkout.

[1] <http://www.pogoapp.com/>

~~~
pdenya
We're discussing Rails on Heroku specifically which, without unicorn, should
be a "next available checkout counter" situation. Ideally it should be
possible to make this an optional behavior that you can choose to turn on for
Rails apps.

~~~
themgt
I agree there should be a better way - it's just important to understand that
Rails doesn't get any special treatment on a PaaS done correctly, so it's
important to come up with a generic solution.

I think part of the solution would be a customizable option (i.e. how many
requests each dyno can handle simultaneously), probably combined with
intelligent monitoring/balancing of proxy load so new requests always go to
the least-loaded dyno.

Buildpacks could probably be used to parse the Gemfile etc., see what mix of
webrick/unicorn/rails/sinatra/rack-stream/goliath you're using, and set a
semi-intelligent default. But apps are increasingly unlike a checkout line.
Apps are more like the supermarket, which is harder.

~~~
vidarh
Rails doesn't _need_ to be treated specially. All that is needed is a "maximum
number of simultaneous connections to pass to this backend" setting coupled
with load balancing by available slots rather than purely randomly.

The issue here isn't that Rails needs to be treated specially - this problem
applies to varying extents in _any_ type of backend where some types of
requests might turn out to be computationally heavy or require lots of IO. You
can't magic away that: A request that takes 8 CPU seconds will take 8 CPU
seconds. If you start piling more requests onto that server, response times
will increase, even if some will keep responding, and if another 8 CPU second
request hits too soon, chances increase that a third one will, and a fourth,
and before you know it you might have a pileup where available resources for
new requests on a specific instance are rapidly diminishing and response times
shoot through the roof.

Pure random distribution is horrible for that reason pretty much regardless.

Now, doing "intelligent" routing _is_ a lot easier for servers with _some_
concurrency, as you can "just" have check requests and measure latency for the
response and pick servers based on current low latency and get 90% there and
that will be enough for most applications. Sure, the lower the concurrency,
the more you risk having multiple heavy queries hit the same server and slow
things down, and this request grows dramatically with the number of load
balancers randomly receiving inbound requests to pass on, but at least you
escape the total pileup more often.

But that's also a clue to one possible approach for non-concurrent servers:
group them into buckets handled by a single active load balancer at a time and
have front ends that identifies the right second layer load balancers. Shared
state is now reduced to having the front end load balancers know which second
layer load balancers are the currently active ones for each type of backend.
It costs you an extra load balancer layer with according overhead. But don't
you think OP would prefer an extra 10ms per request over the behaviour he's
seen?

~~~
mononcqc
I'm sure OP could prefer the extra 10ms, but then everyone else who can deal
with random dispatching right now has to pay a 10ms penalty because OP built
his stuff on a technology that can deal with only one request at a time on a
server, which boggles the mind to begin with.

~~~
vidarh
Why? The system could easily be built so that it by default only aggregates
those services where the configuration indicates they can handle a concurrency
below a certain level, and does random balancing of everything else.

The "everyone else who can deal with random dispatching right now" is a much
smaller group than you think. _Anyone_ who has long-running requests that
grind the CPU or disk will be at high risk of seeing horribly nasty effects
from random dispatching, no matter whether their stack, in ideal conditions,
has no problem handling concurrent requests.

It's just less immediately apparent, as any dynos that start aggregating
multiple long running requests will "just" get slower and slower instead of
blocking normally low-latency requests totally.

~~~
DigitalJack
"The system could easily be built so that it by default only aggregates those
services where the configuration indicates they can handle a concurrency below
a certain level, and does random balancing of everything else."

Let me know when you are done with that.

~~~
vidarh
I've built fairly large haproxy based infrastructures, thank you very much.
Doing this is not particularly challenging.

Actually what I'd probably do for a setup like this would be to balance by the
Host: header, and simply have the second layer be a suitable set of haproxy
instances balancing each by least connections.

Immediately vastly better than random.

~~~
themgt
Haproxy doesn't support dynamic configurations as far as I know, which is a
serious problem if you're letting lots of people add/change domains and scale
backends up/down dynamically. A Heroku haproxy would probably need to be
restarted multiple times a second due to config changes. Nginx can do dynamic
backends with lua & redis, but it can't use the built-in upstream backend
balancing/failover logic if you do.

~~~
vidarh
While it doesn't support dynamic configurations, it does support hot
reconfiguration (the new daemon signals the old processes to gracefully finish
up and shut down), and reconfigures very rapidly. You still don't want to
restart it multiple times a second, but you don't need to:

A two layer approach largely prevents this from being a problem. You can
afford total overkill in terms of the number of haproxies as they're so
lightweight - running a few hundred individual haproxy instances with separate
configs even on a single box is no big deal.

The primaries would rarely need to change configs. You can route sets of
customers to specific sets of second-layer backends with ACLs on short
substrings of the hostname (e.g. two letter combinations), so that you know
which set of backends each hostname you handle maps to, and then further
balance on the full host header within that set to enable the second layer to
balance on least-connections to get the desired effect.

That lets you "just" rewrite the configs and hot-reconfigure the subset of
second layer proxies handling customers that falls in the same set on
modifications. If your customer set is large enough, you "just" break out the
frontend into a larger number of backends.

Frankly, part of the beauty of haproxy is that it is so light that you could
probably afford a third layer - a static primary layer grouping customers into
buckets, a dynamic second layer routing individual hostnames (requiring
reconfiguration when adding/removing customers in that bucket) to a third
layer of individual customer-specific haproxies.

So while you would restart _some_ haproxy multiple times a second, the
restarts could trivially be spread out over a large pool of individual
instances.

Alternatively, "throwing together" a second or third layer using iptables
either directly or via keepalived - which _does_ let you do dynamic
reconfiguration trivially, and also supportes least-connections load balancing
- is also fairly easy.

But my point was not to advocate this as the best solution for somewhere like
Heroku - it doesn't take a very large setup before a custom solution starts to
pay off.

My point was merely that even with an off-the-shelf solution like haproxy,
throwing together a _workable_ solution that beats random balancing is not all
that hard (there are a large number of viable solutions), so there really is
no excuse for someone building a PaaS not to.

------
andrewcooke
i don't think some details of the argument hold. it alleges that you need more
dynos to get the same throughput, but that's not true once you have sufficient
demand to keep a queue of roughly sqrt(n) (i think - someone who knows more
theory than me can correct me) on each dyno (where you have n dynos), because
at that point all dynos will be running continuously and the throughput will
be the same with either routing.

the average latency will be higher, though (and the spread in latency larger).
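
A toy simulation of that point (a deliberately crude discrete-time model with constant service times, nothing like real Heroku traffic; dyno and tick counts are made up):

    # n dynos, each finishes one request per tick; arrivals exceed capacity,
    # so every dyno always has work queued and stays busy under either policy
    N_DYNOS  = 10
    TICKS    = 10_000
    ARRIVALS = 12  # per tick, i.e. demand above capacity

    def simulate(n, ticks, arrivals)
      queues = Array.new(n, 0)
      served = 0
      ticks.times do
        arrivals.times { queues[yield(queues)] += 1 }  # route each arrival
        queues.each_index do |i|                       # each dyno serves one request
          if queues[i] > 0
            queues[i] -= 1
            served += 1
          end
        end
      end
      served
    end

    random     = simulate(N_DYNOS, TICKS, ARRIVALS) { |q| rand(q.size) }
    least_busy = simulate(N_DYNOS, TICKS, ARRIVALS) { |q| q.index(q.min) }

    puts "throughput, random routing:     #{random}"
    puts "throughput, least-busy routing: #{least_busy}"

Both policies serve essentially N_DYNOS * TICKS requests once saturated; what differs is how long individual requests sit in the per-dyno queues, which is the latency point above.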

~~~
lil_tee
But you _never_ want to have a queue on any of your dynos! A queued request
means that a user is waiting with no response. If your goal is to have 0 (or
less than epsilon) requests queued, it takes far fewer dynos if the requests
are routed intelligently.

If you have 10 dynos and 1000 simultaneous requests, the difference between
naive and intelligent routing might well be reduced, but that's also a
scenario in which your end-user response times would be horrendously slow and
so you'd need more dynos either way.

------
mixedbit
I think this analysis and simulation do not account for one important thing:
random routing is stateless and thus easy to distribute. Routing to the least
loaded dyno needs to be stateful. That is quite easy to implement when you
have one centralized router, but for 75 dynos that router would likely become
a bottleneck. With many routers, intelligent routing has its own performance
cost: the routers need to somehow synchronize state, and the simulation
ignores this cost.

~~~
badgar
> With many routers, intelligent routing has its own performance cost, the
> routers need to somehow synchronize state, and the simulation ignores this
> cost.

Which is why we pay companies like Heroku to engineer clouds in which to run
our applications. Because they're supposed to be better at this than us and
spend the time and money building this difficult infrastructure well. That
includes a scalable, stateful intelligent routing service.

------
dblock
I believe routing is not random but round robin; I'd like Heroku to confirm.
It's still a problem. If you are looking to run Unicorn on Heroku, use the
heroku-forward gem (<https://github.com/dblock/heroku-forward>). It works
well, but application RAM quickly becomes its own issue; we failed to run it
in production as our app takes ~300MB.

------
jasonwatkinspdx
The problem is the request arrival rate vs the distribution of service times
in your app.

New Relic may be giving you an average number you feel happy about, but the
99th percentile numbers are extremely important. If you have a small fraction
of requests that take much longer to process, you'll end up with queuing, even
with a predictive least loaded balancing policy.

This is a very common performance problem in Rails apps, because developers
often use Active Record's associations without any sort of limit on row count,
not considering that in the future individual users might have 10,000
associated posts/friends/whatever.

Fix this and you'll see your end user latency come back in line.
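
A hypothetical example of that pattern (model and variable names invented):

    # a user with a handful of posts is fast; a power user with 10,000 posts
    # turns this into one of the rare slow requests that backs up a dyno
    @posts = current_user.posts

    # bounding the query keeps the worst case close to the average
    @posts = current_user.posts.order("created_at DESC").limit(50)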

------
jules
Why not get yourself ONE beefy server? (or two) That should be able to handle
your 150 requests per second, simplify your architecture a lot, and buying it
would be cheaper than 1 month on Heroku (at $20,000/month).

~~~
sabat
Because when that one beefy server goes tits-up, you're out of business. Same
with two.

~~~
jules
Why? The second is idling until the first fails, then the second takes over.
Unless both fail simultaneously of course, for example due to power outage,
but then your 40 servers will also fail simultaneously. Not to mention that
with just one running server there are a lot fewer failure scenarios.

------
michaelfairley
There's another fun issue that falls out of this: any requests sitting in the
dyno queue when the app restarts get dropped with a 5xx error.
[https://github.com/michaelfairley/unicorn-
heroku/issues/1#is...](https://github.com/michaelfairley/unicorn-
heroku/issues/1#issuecomment-8601906)

------
codex_irl
Personally, I prefer Linode to Heroku. Sure, more of my time is consumed by
sysadmin work, but I like having full control over my platform & setup, rather
than having it virtually dictated to me. I'm always open to change, but this
strategy has served me very well for almost 3 years now.

~~~
kawsper
I have a Capistrano config that I can drop into my Rails projects. I can then
do:

cap deploy:setup

cap deploy:cold

cap deploy

And now my app is running on my server; I then add routing and I am good to
go.

It is less fancy than Heroku: if you want to play with some new technology,
you need to install it, get it configured, and get it to run properly.

------
dangrahn
I was in contact with Heroku support a couple of weeks ago since we
experienced some timeouts on our production app. I got a detailed explanation
from a Heroku engineer of how the routing on Heroku works, and thought I'd
share:

"I am a bit confused by what you mean by an "available" dyno. Requests get
queued at the application level, rather than at the router level. Basically,
as soon as a request comes in, it gets fired off randomly to any one of your
web dynos.

Say your request that takes 2 seconds to be handled by the dyno was dispatched
to a dyno that was running a long running request. Eventually, after 29
seconds, it completed serving the response, and started working on the new,
faster 2 second request. Now, at this point it had already been waiting in the
queue for 29 seconds, so after 1 second, it'll get dropped, and after another
1 second, the dyno will be done processing it, but the router is no longer
waiting for the response as it has already returned an H12.

That's how a fast request can be dropped. Now, the one long 29 second request
could also be a series of not-that-long-but-still-long requests. Say you had 8
requests dispatched to that dyno at the same time, and they all took 4 seconds
to process. The last one would have been waiting for 28 second, and so would
be dropped before completion and result in an H12."

------
jhuckestein
Watch out, this affects small rails applications with few dynos as well.

If you hit the wall with one dyno and add another one, you won't get twice the
throughput even though you pay twice the price.

I've always had suspicions about this on some smaller apps but never really
looked into it. You can configure New Relic to measure round-trip response
times on the client side. At peak loads those would be unreasonably high -
much higher than even huge latencies could explain.

------
simpletouch
This is something that I have been struggling with for a long while. It is
very troublesome when a dyno cycles itself (as they always will, at least
every 24 hours), because the routing layer continues to send it requests,
resulting in router-level "Request Timeouts" if it takes too long to restart.

It is especially difficult to diagnose when the queue and wait times in your
logs are 0. What is the point of those fields in the logs if they never show
waiting or queuing?

------
tlrobinson
Question from a non-Ruby-expert: does Thin, which uses Event Machine, help
with this at all, or do requests still block on other IO like database calls,
etc?

~~~
siong1987
I don't think that Thin will help in this case because Rails is blocking in
general, so you are right: other IO will still block.

You probably need an app that is built on like:
<https://github.com/raggi/async_sinatra>

------
zrail
For those of you looking to migrate to other, barer hosting solutions like
AWS or another VPS provider, I've put together a Capistrano add-on that lets
you use Heroku-style buildpacks to deploy with Nginx as the front-end proxy. I
use it for half a dozen apps on my VPSs and it works swimmingly well.

<https://github.com/peterkeen/capistrano-buildpack>

~~~
kawsper
Doesn't that require root or at least sudo permissions for your deploy user?

~~~
zrail
sudo for the user doing the deploy but not for the user running the app. The
default is to run as the deploy user but you can change it.

------
anon640
"For a Rails app, each dyno is capable of serving one request at a time."

Is this a deliberate design choice on Heroku's part, or is this just how Ruby
and Rails work? It sounds bizarre that you would need multiple virtual OS
instances just to serve multiple requests at the same time. What are the
advantages of this over standard server fork()/threaded accept designs?

~~~
kawsper
It is how the Rails server behaves by itself, but it is also how Heroku tells
you to do it.

Rails can be served with Unicorn ( <http://unicorn.bogomips.org/> ) which is a
forking app-server.

I do believe there was a trick a while back where you could get Heroku to run
a Unicorn process on a dyno to get more requests out of it. The process is
described here: <http://blog.codeship.io/2012/05/06/Unicorn-on-Heroku.html>
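
The gist of that trick is just a Procfile pointing at a Unicorn config with several workers (a minimal sketch, not the exact setup from that post):

    # Procfile
    web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb

    # config/unicorn.rb
    worker_processes 3   # three single-threaded Rails processes per dyno
    timeout 30
    preload_app true

The catch, as mentioned elsewhere in the thread, is dyno RAM: each worker is a full copy of the app.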

~~~
anon640
Isn't this kind of a step backwards? Is Rails really that great that people
are willing to endure these kinds of limitations just to use it?

~~~
kawsper
The Rails server is mostly for development mode. When deployed, I think most
people use either Unicorn, or throw their applications on JRuby, which runs on
the JVM with some kind of appserver.

JRuby has the advantage of being multithreaded, so you can parallelize within
a single process and don't rely on forking. Stock Ruby (MRI) has a GIL, and as
far as I know only runs on one core.

The limitations of stock Ruby are being worked on, but there is still a long
way to go.

------
trotsky
Those charts of "simulated" load balancing strategies don't look at all
reasonable at first glance. You certainly don't see such spiky patterns with
normal web loads. I think you'd have to crank the std. dev. of completion time
way, way up in your simulation before you saw a bunch of servers stacked 30
deep while others sit at 1.

It's not that there is no benefit to better balancing, it's just that I've
never seen it have anything close to that impact. It seems like it's only
being perceived as a problem here because somebody drank too much of the (old)
kool-aid.

Some of the other numbers are hard to take at face value as well. 6000ms avg
on a specific page? If requests are getting distributed randomly shouldn't all
your pages show a similar average time in queue? Sounds more like they're
using a hash balancing alg and the static page was hashing on to a hot spot.

~~~
waxjar
> If requests are getting distributed randomly shouldn't all your pages show a
> similar average time in queue?

A common misconception, called "the law of small numbers".

Probability theory tells us this is only true over a large number of requests,
i.e. in the long term (the law of large numbers). In the short term, results
can vary wildly and thus form these kinds of queues.

------
Giszmo
I searched for variance and apparently nobody mentioned this before: they
cite a mean request time of 306ms and a median request time of 46ms, which
indicates a very high variance, so don't take for granted that a 50x increase
in performance would result from intelligent routing. The problem is that the
fast tasks suffer from being queued behind the slow tasks, so each fast task
picks up extra latency. If the variance is lower, random routing becomes
favorable at some point, as the delay of getting a task from the router queue
to the dyno is not zero either. In the case of no variance, "intelligent
routing" would always add that delay as soon as all dynos are at their limit.
Before that, the router would simply keep a list of idle dynos and send work
there without delay.

Sure, if you never hit 100% load, intelligent routing is cheap and comes at no
delay. Imagine 40ms jobs getting all dynos to 100% load. Now the dynos would
be idle for the duration of the ping that it takes to report being idle; let
that be 4ms. That is 10% less throughput than with items queuing up on the
dyno.

The router being the bottleneck would therefore justify making it stateless
and giving the dynos a chance to use that last 10% of processing power as
well, ultimately increasing throughput by 10%. Sure, a serious project would
not run its servers at 120% load hoping to eventually get back to 100% in
time, so all that being said, I would always favor intelligent routing to get
responsive servers, add dynos in rush hours, and only opt for dyno-queuing for
stuff that can tolerate a delay (scientific number crunching, ...)

------
pdog
What's the advantage of randomized routing over intelligent routing? Why would
this change be made?

~~~
Mc_Big_G
It's easier/cheaper for them to maintain and you have to pay for more dynos. A
lot more.

~~~
reddit_clone
Win/Win. For them.

------
pointful
Just adding a top-level post to point out something buried in one of the
threads here, because it's an important point about what is happening:

The "queue at the dyno level" is coming from the Rails stack -- it's not
something that Heroku is doing to/for the dynos.

Thin and Unicorn (and others, I imagine) will queue requests as socket
connections on their listener. Both default to 1024 backlog requests. If you
lower that number, Heroku will (according to the implications in the
documentation on H21 errors) try multiple other dynos first before giving up.

See [https://devcenter.heroku.com/articles/error-
codes#h21-backen...](https://devcenter.heroku.com/articles/error-
codes#h21-backend-connection-refused)

For a single-threaded process to be willing to backlog a thousand requests is
problematic when combined with random load balancing. Dropping this number
down significantly will lead to more sane load-balancing behavior by the
overall stack, as long as there are other dynos available to take up the
slack.
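
For Unicorn that is a one-line change in the config file (a sketch only; the right backlog value depends on your app):

    # config/unicorn.rb
    worker_processes 3
    # the default backlog is 1024; a small value makes a busy dyno refuse the
    # connection quickly so the router can try another dyno (per the H21 docs)
    listen Integer(ENV["PORT"] || 3000), :backlog => 16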

Also, the time the request spends on the dyno, including the time in the
dyno's own backlog, is available in the heroku router log. It's the "service"
time that you'll see as something like "... wait=0ms connect=1ms service=383ms
...". Definitely wish New Relic was graphing that somewhere...

------
juanbyrge
LOL, heroku is not designed for real apps. It's designed for side projects and
consulting projects that don't go anywhere.

Anytime you get traffic, move off ASAP!

------
mattbillenstein
Single request per server? What year is this? 2003?

~~~
bhauer
I was thinking the same thing. And 200ms for what exactly? To me the elephant
in the room is that 99 server instances are spun up to handle 15M uniques per
month.

------
jmount
Turns out to be a great queueing problem. Please check out my analysis of a
much simplified version of the random routing algorithm that fails with near
certainty: [http://www.win-vector.com/blog/2013/02/randomized-
algorithms...](http://www.win-vector.com/blog/2013/02/randomized-algorithms-
can-have-bad-deterministic-consequences/)

------
kawsper
This explains why some of my benchmarking tools gave very different and
sometimes weird results when figuring out how many dynos our application
needed.

As this blogpost also states, Heroku really needs to keep their documentation
up to date. I sometimes stumble across something old referring to an old
stack, or something contradictory.

------
mleach
The balance of a subjective, sensationalist headline with objective
statistical simulation was impressive.

I'm a huge Heroku fan using Cedar/Java, but can't help but wonder how many
optimization options remain for Rails Developers, assuming nothing else
changes on Heroku:

* Serving static HTML from CDN

* Unicorn

* Redis caching with multiget requests

------
frankc
I know nothing about Heroku's architecture other than what I just read in
this post, but couldn't you alleviate this problem greatly by having the dynos
implement work stealing? Obviously they would then have to know about each
other, but perhaps that is easier than global intelligent routing.

------
vineet
It seems that the delay comes from the variance in the length of the
different jobs. Having slow jobs is generally not a good idea, and I can
imagine that they are happening for uncommon tasks.

When you are running 100+ servers, a simple answer would be to treat these
uncommon tasks differently. Options would be prioritizing them differently,
showing different UI indicators, and also running them on a separate set of
machines.

Doing these would mean that an intelligent routing mechanism would not have as
much use. Am I wrong here?

I do believe that Heroku should document such problems of theirs more clearly,
so that we know what challenges we are facing as we develop applications, but
in this particular case, it seems that they do have the right plumbing, and
that it just needs to be used differently.

------
eignerchris_
Thanks for calling this out. As you said, random routing is about as naive as
it gets. They need to make upgrades to the routing mesh - expose some internal
stats about dyno performance and route accordingly. Even if the stats were
rudimentary, anything would be an improvement over random.

------
fatbird
Maybe this is a dumb question, but wouldn't straightforward Round Robin
routing by Heroku restore their "one dyno = one more concurrent request"
promise without incurring the scaling liabilities of tracking load across an
arbitrarily large number of dynos?

~~~
joevandyk
Nope, requests could still get queued behind a dyno that's busy with a long
request.

~~~
fatbird
Sure, but the real issue the article identifies is that, under random
routing, they need to keep doubling the number of dynos to halve the odds of
bad queueing, which leads to an absurd factor-of-50 requirement to get back to
what they had before. With round robin, the increase should be much more
linear.

~~~
mononcqc
Over many requests, both should average to N requests served to each instance
assuming a uniform random distribution.

The real issue is being able to figure out instances to avoid when some
requests end up being slow. To put it another way, ideal balancing in this
case isn't about evenly splitting all requests, but evenly splitting all
processing (or waiting) time.

If you can guarantee that requests tend to take pretty stable and uniform
time, then random or round-robin distribution should give good results. If you
can't, some requests will be stuck waiting behind others and their waiting
time will accumulate. You'll see worse behaviour when two or more of the bad
slow requests get queued one after the other.

------
clouddevops
Perhaps easy deployments are not worth the performance and blackbox trade-off.
An alternative approach is a cloud infrastructure provider with baremetal and
virtual servers on L2 broadcast domain, and one that provides a good API and
orchestration framework so that you can easily automate your deployments. Here
are some things we at NephoScale suggest you consider when choosing an
infrastructure provider: [http://www.slideshare.net/nephoscale/choosing-the-
right-infr...](http://www.slideshare.net/nephoscale/choosing-the-right-
infrastructure-provider)

------
grandalf
I'd think that most of the requests being served by rapgenius.com would be
highly cacheable (99% are likely just people viewing content that rarely
changes).

Seems weird that the site would have such a massive load of non-cacheable
traffic. Heroku used to offer free and automatic varnish caching, but the
cedar stack removed it. Some architectures make it easy to use cloudfront to
cache most of the data being served. My guess that refactoring the app to lean
on cloudfront would be easier and more cost-effective (and faster) than
manually managing custom scaling infrastructure on EC2.

------
AliEzer
Interesting article, but every time I read something on RapGenius and move my
eyes from the screen, I keep seeing white lines; very annoying. White font on
a black background is bad. Off topic, I know, but still.

------
habosa
Somewhat unrelated:

Does anyone else think that RapGenius makes a great blogging platform? I'd
love a plugin that enabled similar annotations on any blog, even if they're
just by the original author and not crowdsourced.

~~~
kmfrk
It's been tried before (Apture, and a crowd-sourced proof-reading plug-in I
can't remember the name of). It needs critical mass to work, but it might very
well work on a very community-focused platform.

~~~
kmfrk
gooseGrade! I remembered the name. Here's a link:
<http://www.crunchbase.com/company/goosegrade>.

------
benjamincburns
This kind of validates an idea I've been flirting with: a Heroku-like service
which routes requests via AMQP or similar message broker and actually exposes
the routing dynamics to the client apps.

From a naive, inexperienced view the idea of having web nodes "pull" requests
from a central queue rather than the queue taking uneducated guesses seems to
be a no-brainer. I can see this making long-running requests (keep-alive,
streaming, etc) a bit more difficult, but not impossible.

What am I missing? This seems so glaringly obvious that it must have been done
before...

~~~
dblock
The pull model is very hard to implement because the router behaves like a
proxy for a much larger set of dynos (think tens of thousands). When you have
10K clients yielding "I'm available" 10 times a second, you have a nightmare;
it's not sustainable.

A possible solution is for the proxy and the dynos to agree on a protocol
where the proxy passes a request to the dyno and the latter can give up with a
status code that says "retry with another dyno". This could go on up to the
30s timeout limit that Heroku has now.

~~~
jacques_chester
In Mongrel2, app servers subscribe to a named ZeroMQ queue and, when they're
done, they send the response on a different queue.

You can actually configure an arbitrary number of different queues if you
like, switching on request path and some other stuff I don't just now recall.

------
lquist
How does this compare to EngineYard/AppFog/any other Heroku competitors?

~~~
carbon8
Engine Yard is more like opinionated configuration management. It allocates
and configures EC2 instances that you can log into like normal. The software
stack is HAProxy, nginx, unicorn, etc, and customizable through the web
interface and/or chef.

~~~
kawsper
Is there a reason for using both HAproxy and Nginx?

~~~
photomattmills
HAproxy is a more efficient load balancer for really high scale apps. That
said, only about 5% of people would see a difference in load/memory usage.
Source: I work for EY.

------
googletron
Does anyone know if Python applications are affected by this? I know they can
handle multiple requests per dyno; I would be interested to know if random
routing affects Python apps too.

~~~
beambot
For Python / Django you can use a Procfile specifying gunicorn (rather than
stock manage.py) with multiple worker processes, e.g.

 web: gunicorn myapp.wsgi -b 0.0.0.0:$PORT -w 5

Then you will have 5 parallel "single-threaded" instances on each dyno rather
than just 1. This will partially ameliorate the problem, but probably not
100%. (NOTE: This is speculation since Heroku hasn't weighed in yet.)

------
joshwa
Fundamentally we're talking about a load balancer. Even the most basic load
balancers can use a least-connections algorithm. Even a round-robin algorithm
would be better, since it would give each dyno (number-of-dynos * msec per
request) of breathing room to finish a long-running request. Random routing is
a viable option where the number of concurrent requests a node can handle is
large or unknown, but when the limit is known and in the _single digits_,
random routing is a recipe for disaster.

------
justinhj
Seems like the message here is that if you use an off-the-shelf solution you
need to work around its limitations. In this case random load balancing may
sound dumb, but it's actually quite a reasonable way to spread load. The
customer's real problem is the single-threaded server bottleneck compounded by
the sporadic slow requests. It seems like they have outgrown Heroku and a more
custom solution is required. Either that, or rebuild the server, in whole or
in part, with a more concurrent one.

------
jonnycat
This might be the case "out of the box", but it's very simple to go
multithreaded on the Cedar stack and avoid this issue (provided that your app
is threadsafe, of course).

You can do this pretty easily with a Procfile and thin:

 bundle exec thin -p $PORT -e $RACK_ENV --threaded start

And then config.threadsafe! in the app.

Regarding Rails app threadsafety, there are some gotchas around class-level
configuration and certain gems, but by and large these issues are easily
manageable if you watch out for them during app development.

~~~
zen_boy
Are there some resources on the general topic of what it means to build a
multi-threaded Rails app instead of a traditional one?

------
haddr
I would like to see why Heroku actually fell back to random routing. It
doesn't really make sense. Of course all this routing stuff is really tricky,
but on the other hand there is a lot of prior work (look at TCP algorithms).
When I was studying ZeroMQ-based routing for one project, I came across the
"credit-based flow control" pattern, which could make perfect sense in this
kind of situation (publisher-subscriber scenario). Why not implement such a
thing?

------
krutulis
I can't help but wonder if this kind of surreptitious change to the platform
might in any way be connected to Byron Sebastian's sudden resignation last
September from Salesforce. Is that nutty of me?

[http://gigaom.com/2012/09/05/heroku-loses-a-star-as-ceo-
and-...](http://gigaom.com/2012/09/05/heroku-loses-a-star-as-ceo-and-
salesforce-evp-sebastian-resigns/)

------
Uchikoma
~$20,000/month sounds like a lot of money; they would need at least $100M a
year in revenue to justify this number. This will be a major challenge if they
want to grow profitable after the $15M VC money runs out. I'd assume they'd
get the same for $5000 in rented servers, which would free up enough money -
outside of the valley - for a DevOps person and another developer.

~~~
Terretta
Not following. Why do you say they need $100 million a year in revenue to
justify $240K (~0.25% of revenue) in hosting expenses?

For an online biz, 10% - 50% isn't uncommon for profitable businesses. Many
"virtual" companies (online only) do fine at 80%.

~~~
Uchikoma
Otherwise you do not have enough traffic to justify $240k.

------
gtirloni
VMs running full frameworks that are single-threaded. Why does that feel like
wasting resources or bloating the architecture?

------
izietto
From the Heroku docs:

[...] Request distribution

The routing mesh uses a random selection algorithm for HTTP request load
balancing across web processes. [...]

If the algorithm is random, the load balancing simply doesn't happen, am I
wrong?

[https://devcenter.heroku.com/articles/http-
routing#request-d...](https://devcenter.heroku.com/articles/http-
routing#request-distribution)

------
JuDue
I'd love to see some better tutorials on how to use AWS Beanstalk to scale
Rails apps.

There is this one, but it doesn't give me a sense of the scalability or
management
[http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create...](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_Ruby_rails.html)

Any recommendations?

~~~
JuDue
For example, a small instance is $69/yr for 1.7GB of memory, with additional
hourly costs that are quite low

This is very economical compared to Heroku, and most startups can survive on
that initially if they cache properly.

But if there is any level of success, how hard is it to scale compared to the
extra cost of Heroku?

I'm not convinced it's THAT hard, but would love to see more blog posts about
Beanstalk. The AWS doco feels quite mechanical.

~~~
JuDue
An example of what is confusing...

$69 to reserve an instance for a year.

But that is for "light utilization"?!

What does it mean to reserve an instance, but commit to light usage?

And if you are expecting heavy usage, the price goes up to $195.

But how can you buy an instance for a year but also commit to your usage
level? If it's my instance, why is my utilization anyone's business?

~~~
dangrossman
You're not committing to a usage level, you're committing to a pricing level.
If you reserve a small instance, you get a small instance, no matter what
utilization level you choose. It's the same exact resources no matter what you
pick.

The utilization levels are pricing tiers:

Light utilization = lowest upfront cost, highest hourly rate.

Medium utilization = medium upfront cost, medium hourly rate.

Heavy utilization = highest upfront cost, lowest hourly rate.

The names are meant to signify the trade-off you're making. If you run your
instance only an hour a day, you will pay the least by choosing "light
utilization": the hourly cost is high but you're only going to multiply that
by a small number, so the savings in the up-front cost will dominate the total
cost. If you run your instance 24 hours a day, then the hourly rate will
dominate your total costs, so you'll save money by choosing "heavy
utilization" with a higher up-front cost but lower hourly cost.

Segmenting the costs makes the pricing table more difficult to read, but it
optimizes for everything else: you pay the lowest possible price for
guaranteed resources, and Amazon has better knowledge of how much spare
capacity it actually needs to handle the reservations.

------
jedahan
Take a look at deliver if you like heroku push but want it on a machine you
control at a bit lower level: <https://github.com/gerhard/deliver>. I got it
working on EC2 / Ubuntu really easily, and even added some basic support for
SmartOS/illumos for the Joyent cloud.

------
filvdg
You can model queues and calculate the service level when you do intelligent
routing.

The Erlang C formula expresses the probability that an arriving customer will
need to queue (as opposed to immediately being served).

<http://owenduffy.net/traffic/erlangc.htm>
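
For reference, the formula itself, with N servers (dynos) and offered load A = arrival rate x mean service time, in erlangs, valid for A < N:

    P(wait) = \frac{ \frac{A^N}{N!} \cdot \frac{N}{N-A} }
                   { \sum_{k=0}^{N-1} \frac{A^k}{k!} + \frac{A^N}{N!} \cdot \frac{N}{N-A} }

Note that it assumes Poisson arrivals and exponential service times, which the long-tailed request mix described in the article only loosely matches.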

------
hpguy
Can anyone explain why this random routing is supposedly good for Node.js and
Java? The net effect is that busy dynos might get more requests while idle
ones remain idle, and that is certainly not good for Node.js or anything else.
What am I missing?

------
cwalcott
Managing load with thin workers isn't very hard... haproxy
[<http://haproxy.1wt.eu/>] makes it pretty easy to set up rather complex load
distributions (certainly more complex than random!).

------
leoh
Ugh. These guys are so cocky.

~~~
parsnips
They are. It's in their DNA to criticize (seamlesswebsucks.com)... But they're
also correct. So many reasons to hate them ;)

------
evan2
Great article. Have you thought about the alternative of building your own
auto-scaling architecture with 99.9% uptime? I'd be interested to hear if you
plan to move off heroku and, if so, what your plans are.

------
craigkerstiens
The initial response from GM of Heroku -
[https://blog.heroku.com/archives/2013/2/15/bamboo_routing_pe...](https://blog.heroku.com/archives/2013/2/15/bamboo_routing_performance/)

------
wastedbrains
Heroku for static content is always terrible. I am always surprised at how
many people host static sites on Heroku; it is really easy to host them out of
S3 buckets and it is much faster for static pages.

------
lil_tee
We have updated our post to incorporate some popular suggestions and reactions
into our simulations: <http://rapgenius.com/1504221>

------
ratherbefuddled
If it's true, I can't see how random routing can be anything but a cynical
cash grab.

Even a very simple algorithm like round robin would give you a significantly
better latency characteristic wouldn't it?

------
aneth4
Would love to see some more perspectives on this. We also spend a lot of
resources on heroku.

I'm not sure if this change by heroku is worse than the intermediary popups on
all the links on this blog.

------
joeblau
Thanks for the write-up. I've been looking for more reviews of Heroku's
platform, and this in-depth review definitely illuminates some challenges with
it.

------
oellegaard
This really sux. I like all their other offerings though - I'm considering
running the Cloud Foundry "dyno" part alone and using the heroku services with
it.

------
pm90
Did anyone else find the headline a bit confusing? I got the feeling that
they had abandoned RoR for another framework, and almost skipped the article
itself.

------
EGreg
And this kind of thing is why I prefer to have our own VPS. Linode is great,
but we are slowly switching over to AWS and automating all the scaling
up/down.

------
adminonymous
I do hope that someone brings this up during Heroku's "Waza" developer
conference. It's the perfect opportunity to air it out.

------
cachvico
Can't they make the choice of intelligent or random scheduling a per-platform
setting?

Java, node.js use random. Django + Rails use intelligent.

------
MediaSquirrel
Heroku: The Rap Genius "Success" Story

<http://success.heroku.com/rapgenius>

------
philipDS
Off-topic: RapGenius should really open source their "Explain tooltips" with
the inline explanation window. Awesome :)

------
knodi
This kind of routing wouldn't be a problem if they didn't charge $35 a dyno.
It's such a high cost for a dyno.

------
bryanwbh
Thanks for the write-up on this as I am currently reviewing heroku as an
option for a proper PAAS for my app.

------
damian2000
A bit of context:

<http://success.heroku.com/rapgenius>

------
Sami_Lehtinen
Interesting. No, I didn't read it, because the page reliably crashes my
mobile browser every time.

------
rubyrescue
this is a bit hyperbolic. "Heroku Swindle Factor" just seems rude.

------
aren55555
This was a great read.

------
benihana
I really like Rap Genius, but I wish they would tone down the blackness of the
background. Reading #CCC text on #000 background makes my eyes bug out.

~~~
marcamillion
Perspective is a hell of a thing....The way this comment reads, I thought this
was going to get racist real quick - but was relieved when I finished reading
and did agree with you :)

~~~
sergiotapia
I think that speaks more to your dormant racism than anyone else's.

~~~
marcamillion
My dormant racism how? Vs black people?

------
seivan
 rails server -p $PORT

 rake jobs:work

Webrick and DJ.

No Procfile? No Unicorn or Puma? No worker processes or threads defined?

------
dschiptsov
Why on Earth would any sane engineer think that adding layers of
"virtualized" crap in front of your application will be of any benefit?)

The only advantage of virtualization is at the development stage, and it is
the ability to quickly add more slow and crappy resources you don't own.)

Production is an entirely different realm, and the fewer layers of crap there
are between your TCP request and DB storage - the better. As for load
balancing - it is a Cisco-level problem.)

Last question: why must each web site be represented as a hierarchy of some
objects, instead of thinking in terms of what it is - a list of static files
and some cached content generation on demand?)

------
alekseyk
How do they know what algorithm Heroku uses for randomization to simulate the
results?

The differences in the simulations are astonishing; I would not have thought
Heroku's engineers were fine with this approach.

'Let's push this random balancing out... 1000% increase in resources? Oh well,
just update the documentation!'

~~~
mononcqc
I'd guess the problem wouldn't be as bad if each instance could handle more
than one or two connections/requests. Allow, say, 10 of them, and I believe
you would reduce the problem by a lot.

------
felipelalli
Holy cow!

