Heroku's Ugly Secret: The story of how the cloud-king turned its back on Rails (rapgenius.com)
1715 points by tomlemon 512 days ago | comments


nsrivast 512 days ago | link

OP is a friend of mine, and when I first heard of his problem I wondered if there might be an analytical solution to quantify the difference between intelligent vs naive routing. I took this problem as an opportunity to teach myself a bit of Queueing Theory[1], which is a fascinating topic! I'm still very much a beginner, so bear with me and I'd love to get any feedback or suggestions for further study.

For this example, let's assume our queueing environment is a grocery store checkout line: our customers enter, line up in order, and are checked out by one or more registers. The basic way to think about these problems is to classify them across three parameters:

- arrival time: do customers enter the line in a way that is Deterministic (events happen over fixed intervals), Markovian (events are random, distributed exponentially and described by a Poisson process), or General (events fall from an arbitrary probability distribution)?

- checkout time: same question for customers getting checked out, is that process D or M or G?

- N = # of registers

So the simplest example would be D/D/1, where - for example - every 3 seconds a customer enters the line and every 1.5 seconds a customer is checked out by a single register. Not very exciting. At a higher level of complexity, M/M/1, we have a single register where customers arrive at rate _L and are checked out at rate _U (in units of # per time interval), where both _L and _U obey Poisson distributions. (You can also model this as an infinite Markov chain where your current node is the # of people in the queue, you transition to a higher node with rate _L and to a lower node with rate _U.) For this system, a customer's average total time spent in the queue is 1/(_U - _L) - 1/_U.

The intelligent routing system routes each customer to the next available checkout counter; equivalently, each checkout counter grabs the first person in line as soon as it frees up. So we have a system of type M/G/R, where our checkout time is Generally distributed and we have R>1 servers. Unfortunately, this type of problem is analytically intractable, as of now. There are approximations for waiting times, but they depend on all sorts of thorny higher moments of the general distribution of checkout times. But if instead we assume the checkout times are randomly distributed, we have a M/M/R system. In this system, the total time spent in queue per customer is C(R, _L/_U)/(R _U - _L), where C(a,b) is an involved function called the Erlang C formula [2].

How can we use our framework to analyze the naive routing system? I think the naive system is equivalent to an M/M/1 case with arrival rate _L_dumb = _L/R. The insight here is that in a system where customers are instantaneously and randomly assigned to one of R registers, each register should have the same queue characteristics and wait times as the system as a whole. And each register has an arrival rate of 1/R times the global arrival rate. So our average queue time per customer in the dumb routing system is 1/(_U - _L/R) - 1/_U.

In OP's example, we have on average 9000 customers arriving per minute, or _L = 150 customers/second. Our mean checkout time is 306ms, or _U ~= 3. Evaluating for different R values gives the following queue times (in ms):

    # Registers        51     60     75    100    150    200    500   1000   2000   4000
    dumb routing   16,667  1,667    667    333    167    111     37     18      9      4
    smart routing     333     33     13      7      3      2      1      0      0      0

which are reasonably close to the simulated values. In fact, we would expect the dumb router to be comparatively even worse for the longer-tailed Weibull distribution they use to model request times, because you make bad outcomes (e.g. where two consecutive requests at 99% request times are routed to the same register) even more costly. This observation seems to agree with some of the comments as well [3].
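If it helps to make the formulas concrete, here's a quick Ruby sketch of the two expressions above, with Erlang C computed via the standard Erlang B recursion. The constants are the ones from this comment; since the smart-routing column is sensitive to how you round _U and evaluate C, treat the output as ballpark figures rather than an exact reproduction of the table:

    lam = 150.0   # arrivals per second (9000 per minute)
    mu  = 3.0     # checkouts per second per register (~1/306ms)

    # Erlang C, built on the numerically stable Erlang B recursion
    def erlang_c(servers, offered_load)
      b = 1.0
      (1..servers).each { |k| b = offered_load * b / (k + offered_load * b) }
      rho = offered_load / servers
      b / (1 - rho * (1 - b))
    end

    [51, 60, 75, 100, 150, 200, 500, 1000].each do |r|
      dumb  = 1.0 / (mu - lam / r) - 1.0 / mu        # M/M/1 per register, arrival rate lam/R
      smart = erlang_c(r, lam / mu) / (r * mu - lam) # M/M/R queue wait via Erlang C
      puts format("R=%4d  dumb=%9.1f ms  smart=%8.3f ms", r, dumb * 1000, smart * 1000)
    end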

[1] http://en.wikipedia.org/wiki/Queueing_theory

[2] http://en.wikipedia.org/wiki/Erlang%27s_C_formula#Erlang_C_f...

[3] http://news.ycombinator.com/item?id=5216385

-----

themgt 512 days ago | link

As someone building a Heroku Cedar-esque PaaS[1], here's the problem with your analogy: back in the Aspen/Bamboo days (and what a lot of people still think of as "PaaS"), Heroku was like this (i.e. your app was the one-at-a-time cashier, and Heroku's "routing mesh" set up checkout lanes and routed customers to your cashiers intelligently).

Now however, Heroku lets you build your own checkout lane, so you can run apps with single-threaded Rails, multi-worker (e.g. Unicorn) Rails, and async long-polling/SSE etc. apps w/ ruby/node.js/scala/go/erlang/etc. that can handle huge numbers of simultaneous connections. Throw websockets into the mix here too (we do). And you can even mix & match within an app, distributing requests to different stacks of code based on URL or the time of day, which may have different internal response/queuing characteristics (e.g. we have a Rails app w/ a Grape API w/ a handful of URLs mapped in via Rack::Stream middleware rather than going through Rails).
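For what it's worth, here's a hypothetical config.ru showing the kind of mix-and-match I mean (class names are invented; the streaming app stands in for whatever Grape/Rack::Stream endpoint you'd actually mount):

    # Hypothetical config.ru: most URLs go through Rails, a few go to a streaming Rack app
    require ::File.expand_path('../config/environment', __FILE__)

    run Rack::URLMap.new(
      '/stream' => StreamingApi,        # e.g. a long-polling/SSE endpoint that holds connections open
      '/'       => MyApp::Application   # the ordinary request/response Rails app
    )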

So to get back to your analogy, Heroku is automating the setup of the "lanes", but each supermarket is allowed to use its own blueprint and cashier and checkout process, and basically just do whatever they want within a lane. Maybe some "lanes" are more like a restaurant where 25 customers spend an average of 45 minutes at a time juggled between 6 waiters while others are still bottlenecked supermarket checkouts, with everything in between. Maybe one type of customer ties up the cashier/waiter so much that he can only handle 10 others instead of 100 normally. And it could all change every time the store opens (a deployment with new code occurs), or based on what the specific customers are buying.

The point is simply that there's not a "next available checkout counter" in this situation, because all apps are not single-threaded Rails apps anymore. Which doesn't mean there aren't better solutions than dumb routing, but it does get a bit more complicated than the supermarket checkout.

[1] http://www.pogoapp.com/

-----

pdenya 512 days ago | link

We're discussing Rails on Heroku specifically which, without Unicorn, should be a "next available checkout counter" situation. Ideally it should be possible to make this an optional behavior that you can choose to turn on for Rails apps.

-----

themgt 512 days ago | link

I agree there should be a better way - it's just important to understand that Rails doesn't get any special treatment on a PaaS done correctly, so it's important to come up with a generic solution.

I think part of the solution would be a customizable option (i.e. how many requests each dyno can handle simultaneously), probably combined with intelligently monitoring/balancing proxy load so new requests always go to the least-loaded dyno.

Buildpacks could probably be used to parse the Gemfile etc., see what mix of webrick/unicorn/rails/sinatra/rack-stream/goliath you're using, and set a semi-intelligent default. But apps are increasingly unlike a checkout line. Apps are more like the supermarket, which is harder.

-----

vidarh 512 days ago | link

Rails doesn't need to be treated specially. All that is needed is a "maximum number of simultaneous connections to pass to this backend" setting coupled with load balancing by available slots rather than purely randomly.

The issue here isn't that Rails needs to be treated specially - this problem applies to varying extents in any type of backend where some requests might turn out to be computationally heavy or require lots of IO. You can't magic that away: a request that takes 8 CPU seconds will take 8 CPU seconds. If you start piling more requests onto that server, response times will increase, even if some requests keep getting served; and if another 8-CPU-second request hits too soon, chances increase that a third one will, and a fourth, and before you know it you have a pileup where available resources for new requests on a specific instance are rapidly diminishing and response times shoot through the roof.

Pure random distribution is horrible for that reason pretty much regardless.

Now, doing "intelligent" routing is a lot easier for servers with some concurrency, as you can "just" have check requests and measure latency for the response and pick servers based on current low latency and get 90% there and that will be enough for most applications. Sure, the lower the concurrency, the more you risk having multiple heavy queries hit the same server and slow things down, and this request grows dramatically with the number of load balancers randomly receiving inbound requests to pass on, but at least you escape the total pileup more often.

But that's also a clue to one possible approach for non-concurrent servers: group them into buckets handled by a single active load balancer at a time, and have front ends that identify the right second layer load balancers. Shared state is now reduced to having the front end load balancers know which second layer load balancers are currently active for each type of backend. It costs you an extra load balancer layer with corresponding overhead. But don't you think OP would prefer an extra 10ms per request over the behaviour he's seen?

-----

mononcqc 512 days ago | link

I'm sure OP would prefer the extra 10ms, but then everyone else who can deal with random dispatching right now has to pay a 10ms penalty because OP built his stuff on a technology that can deal with only one request at a time on a server, which boggles the mind to begin with.

-----

vidarh 512 days ago | link

Why? The system could easily be built so that it by default only aggregates those services where the configuration indicates they can handle a concurrency below a certain level, and does random balancing of everything else.

The "everyone else who can deal with random dispatching right now" is a much smaller group than you think. Anyone who has long running requests that grind the CPU or disk when running, will be at high risk of seeing horribly nasty effects from random dispatching, no matter whether their stack in ideal conditions have no problem handling concurrent requests.

It's just less immediately apparent, as any dynos that start aggregating multiple long running requests will "just" get slower and slower instead of blocking normally low-latency requests totally.

-----

DigitalJack 512 days ago | link

"The system could easily be built so that it by default only aggregates those services where the configuration indicates they can handle a concurrency below a certain level, and does random balancing of everything else."

Let me know when you are done with that.

-----

vidarh 512 days ago | link

I've built fairly large haproxy based infrastructures, thank you very much. Doing this is not particularly challenging.

Actually what I'd probably do for a setup like this would be to balance by the Host: header, and simply have the second layer be a suitable set of haproxy instances balancing each by least connections.

Immediately vastly better than random.

-----

themgt 512 days ago | link

Haproxy doesn't support dynamic configurations as far as I know, which is a serious problem if you're letting lots of people add/change domains and scale backends up/down dynamically. A Heroku haproxy would probably need to be restarted multiple times a second due to config changes. Nginx can do dynamic backends with lua & redis, but it can't use the built-in upstream backend balancing/failover logic if you do.

-----

vidarh 511 days ago | link

While it doesn't support dynamic configurations, it does support hot reconfiguration (the new daemon signals the old processes to gracefully finish up and shut down), and reconfigures very rapidly. You still don't want to restart it multiple times a second, but you don't need to:

A two layer approach largely prevents this from being a problem. You can afford total overkill in terms of the number of haproxies as they're so lightweight - running a few hundred individual haproxy instances with separate configs even on a single box is no big deal.

The primaries would rarely need to change configs. You can route sets of customers to specific sets of second layer backends with ACLs on short substrings of the hostname (e.g. two letter combinations), so that you know which set of backends each hostname you handle maps to, and then further balance on the full host header within that set to enable the second layer to balance on least-connections to get the desired effect.

That lets you "just" rewrite the configs and hot-reconfigure the subset of second layer proxies handling customers that falls in the same set on modifications. If your customer set is large enough, you "just" break out the frontend into a larger number of backends.

Frankly, part of the beauty of haproxy is that it is so light that you could probably afford a third layer - a static primary layer grouping customers into buckets, a dynamic second layer routing individual hostnames (requiring reconfiguration when adding/removing customers in that bucket) to a third layer of individual customer-specific haproxies.

So while you would restart some haproxy multiple times a second, the restarts could trivially be spread out over a large pool of individual instances.

Alternatively, "throwing together" a second or third layer using iptables either directly or via keepalived - which does let you do dynamic reconfiguration trivially, and also supportes least-connections load balancing - is also fairly easy.

But my point was not to advocate this as the best solution for somewhere like Heroku - it doesn't take a very large setup before a custom solution starts to pay off.

My point was merely that even with an off the shelf solution like haproxy, throwing together a workable solution that beats random balancing is not all that hard (there's a large number of viable solutions), so there really is no excuse not to for someone building a PaaS.
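To make that a bit more concrete, here's a rough sketch of what those layers could look like in haproxy config. Hostnames, addresses and bucket names are invented; it's meant to illustrate the shape of the idea rather than be a tested setup:

    # First layer: static config, buckets customers by a short prefix of the Host header
    frontend public
        bind :80
        acl bucket_aa hdr_beg(host) -i aa
        use_backend bucket_aa_proxies if bucket_aa
        default_backend other_bucket_proxies

    backend bucket_aa_proxies
        balance hdr(host)             # keeps each hostname on one second-layer proxy
        server l2a 10.0.0.11:80 check
        server l2b 10.0.0.12:80 check

    backend other_bucket_proxies
        balance hdr(host)
        server l2c 10.0.0.13:80 check

    # On a second-layer proxy (one small haproxy per bucket, reloaded as customers change):
    backend app_example_com
        balance leastconn             # send each request to the dyno with the fewest open connections
        server dyno1 10.0.1.21:5000 check maxconn 1
        server dyno2 10.0.1.22:5000 check maxconn 1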

-----

badgar 511 days ago | link

You're right. They'd have to instead build a load-balancer that solves the problem, and that's too darn hard.

-----

dragonwriter 512 days ago | link

> it's just important to understand that Rails doesn't get any special treatment on a PaaS done correctly, so it's important to come up with a generic solution.

It's kind of weird to describe not optimizing the entire platform provided to apps as "PaaS done correctly". Making a PaaS more generic has a certain kind of value in terms of broadening the audience and enabling heterogeneous systems to be implemented on it, but if you are doing that by sacrificing the optimization of the individual application platforms available, you are losing some of what makes a PaaS valuable as opposed to roll-your-own platform support on top of a generic IaaS.

It's especially problematic to say that worsening support for the main existing app framework in use on an established PaaS and giving existing customers orders of magnitude less value for their money is doing something right.

> I think part of the solution would be customizable option

That's probably a good idea, though the default for existing apps should not have changed, especially without clear up-front notice.

> But apps are increasingly unlike a checkout line.

Existing apps are, for the most part, exactly as much like a checkout line as they were before the unannounced change.

-----

skrebbel 512 days ago | link

> it's just important to understand that Rails doesn't get any special treatment on a PaaS done correctly

Why is it only "done correctly" if it does not account for specific properties of the technology used by a particular customer?

-----

themgt 512 days ago | link

Because PaaS is a generic technology for running and scaling applications with a multitude of different language/framework/stacks, and many/most of those apps do not share the specific properties of single-threaded Rails (including many Ruby/Rails apps!)

And Rails 4 is going to bake-in "live streaming", making single-threaded app servers even more of an edge case.

-----

bradleyjg 512 days ago | link

That sounds a lot more like IaaS to me. PaaS should be providing the entire platform, hence the name.

The entire promise of the space is that the customer only has to worry about his own code and perhaps tweaking a few knobs.

-----

skrebbel 512 days ago | link

That's like saying Craigslist did it correctly and AirBnB didn't because AirBnB is only tailoring to a specific segment of the world's supply and demand market.

Rails is very widely used. How can you consider that an edge case?

-----

themgt 512 days ago | link

It's like saying EC2 should tailor its virtualization to Fedora 16, or Mac OS X should tailor its windowing system to Photoshop CS4, or Apache should tailor mod_proxy to Joomla. There may be specific attributes of popular applications that need to be adapted to, but those adaptations need to be built in a generic way and exposed through a standard API.

Since even many Rails apps now do not follow a single threaded request-response model, that model of running a web application needs to be considered as one case of many, and building a platform that supports many/all use-cases as well as possible is more complicated than building a platform that fits one use case like a leather glove.

-----

doktrin 512 days ago | link

> Mac OS X should tailor its windowing system to Photoshop CS4

I think statements like these obscure the very tight coupling Heroku has historically had with Rails. While Heroku now perhaps envisions itself as a do-it-all PaaS, there's no denying that Rails at one point (and, numerically, perhaps still) was their bread and butter.

While I don't have numbers to support or refute the assertion that "most Rails apps are primarily single threaded", my suspicion is that this is in fact still the case.

-----

skrebbel 512 days ago | link

> or Mac OS X should tailor its windowing system to Photoshop CS4

I'm taking this example out specifically, because it illustrates my point quite nicely.

If there were a sizeable community of people who only wanted to use a computer for Photoshop, and tailoring the windowing system to them made it a significant usability improvement for those people, then it would be a completely imaginable situation that upon first opening your brand new Mac, it'd ask you whether you're one of those Photoshop people and want the special windowing system setup.

Well, ok, haha Apple and customizing anything for anyone, ever. But many other vendors might make such a choice.

The apparent stubborn refusal of many PaaS services, including Heroku and yours, to particularly tailor to a very common configuration of Rails sounds like a hole in the market, to me. As a customer, I don't care whether this is "incorrect" because Rails does not conform to some yet-to-be-defined standard or simply because the PaaS doesn't have their shit together. The customer experience is the same: I'm running a blog tutorial app, and the performance sucks.

-----

regularfry 512 days ago | link

Microsoft have definitely done exactly this. Windows versions have famously been "rebroken" in development to keep bug-for-bug compatibility so they didn't break big third party applications.

-----

papsosouid 512 days ago | link

The 'P' stands for platform. Providing the platform as a service absolutely means catering to the specific needs of the platform. There is no generic platform. If you want to support multiple platforms, then you support multiple platforms. You don't stop supporting any platform at all.

-----

SnootyMonkey 510 days ago | link

This whole "we can't optimize for Rails anymore" argument seems like a red herring. Dumb (random) routing is dumb routing. It doesn't matter if you have a single-threaded Rails or Django stack or highly concurrent Node.js or Erlang serving requests: if you distribute the requests randomly you're not going to use your resources efficiently, and the Heroku "promise" of spinning up a 2nd dyno and getting 100% more concurrency is just not true.

All it changes is the details of the analysis, not the core finding. It makes the problem less severe, but it's still pretty bad, and it's worse the more uneven your traffic is (in terms of how long each request takes to service).

All apps, even Unicorn, JRuby, Node.js, Erlang, etc. would benefit from something better than random routing.

-----

wereHamster 512 days ago | link

Since when does Heroku support WebSockets?

-----

themgt 512 days ago | link

It doesn't yet, but that doesn't really change the situation if you've already got long-polling/SSE. I mentioned it because we support it and it seems like a big part of the model the web is moving towards (which is significantly less request-response based).

-----

newhouseb 512 days ago | link

Queuing theory is cool, but I'm not 100% sure it actually applies here in a meaningful sense (although, disclaimer: I'm no more experienced here than you). A lot of queuing theory assumes that you must route a request to a handler immediately as you receive it, and that reassigning a request is a very expensive process.

This intuitively explains why queueing theory is very big in router design - imagine that you send a packet through 10 hops and at the last hop it experiences a significant delay: does the packet then turn around and go back through another router? Which hop does it pick to look for a different path? What happens if the packet gets delayed going backwards looking for another route? Does it reverse _again_ looking for a quicker path through another router? Answer: it doesn't; routers deliver messages with "best effort" (in protocol terminology), and high-level latency trends are adapted for through the routing algorithms themselves (read: not adapted for each individual packet). This keeps transport much simpler and therefore faster.

In the case of load balancing, if the "(re)assignment cost" (my terminology) of a request is sufficiently small, then it makes sense to hold requests back until you can be 100% sure a worker is ready, rather than pre-distributing them. If a request takes 40ms to process and 0.5ms to distribute/assign to a worker, then waiting for feedback (which would also take 0.5ms) from a worker would incur a slowdown of (40 + 0.5 + 0.5)/40 - about 2.5% - versus pre-assigning before a worker was finished. This seems like a no-brainer if it would keep the width of the distribution of your latencies down.

Edit: thinking about this more, if you have an asynchronous worker model, queueing theory comes back into play. If a worker stops processing Request A to wait for a network response and takes up Request B, and then the network responds while Request B is still working, moving Request A mid-handling to another free machine may be very hard/expensive, if not entirely impossible. As themgt brings up, it sounds like Heroku enabled an asynchronous model in a recent stack and may have dropped the feedback loop that allows for intelligent routing because there's no obvious way to implement it in an asynchronous model.

That being said, you could still have a feedback loop when a worker is completely idle. It's certainly very hard to reason about a worker currently executing any task, but it is very easy to reason about a worker that isn't executing anything. Therefore, it should be straightforward (theoretically) to do intelligent routing to empty workers and then keep the random routing for busy workers in such a way that if there is an idle worker no request should ever go to a busy worker. A more advanced version would keep track of the number of open connections to each worker and simply distribute to the worker with the fewest number of active open connections.
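A toy sketch of that last idea, in Ruby - this is not Heroku's actual router, and Worker#handle plus its completion callback are invented for illustration:

    # Dispatch each request to the worker with the fewest in-flight requests;
    # an idle worker (zero open connections) always wins.
    class LeastConnRouter
      def initialize(workers)
        @workers = workers
        @open = Hash.new(0)   # worker => number of requests currently being handled
      end

      def dispatch(request)
        worker = @workers.min_by { |w| @open[w] }
        @open[worker] += 1
        worker.handle(request) { @open[worker] -= 1 }  # decrement once the response completes
      end
    end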

I just checked and nginx actually has a directive (least_conn) to do exactly this, but it's not enabled by default! ELB apparently does something similar (see: https://forums.aws.amazon.com/thread.jspa?messageID=135549&#...).

-----

fizx 512 days ago | link

Yeah, you want leastconn + app/router affinity. Affinity is the statement that all of your requests for an app go through one router (to avoid distributed connection state tracking).

In the past, I've accomplished this by having the next layer up consistent hash the api keys onto the router list. If you don't control the top layer (ELB), you need to add a dumb layer just for the hashing.

HAProxy works great for this extra layer. In practice, all you end up doing is adding a "balance hdr(host)" directive (see http://haproxy.1wt.eu/download/1.5/doc/configuration.txt) to get the hashing right, and you're spending <1ms inside HAProxy.

-----

newhouseb 512 days ago | link

Maybe this was after you did your work, but ELB currently supports affinity, see: http://aws.amazon.com/about-aws/whats-new/2010/04/08/support...

-----

fizx 512 days ago | link

You want affinity by Host: header, not by cookie/session.

-----

stock_toaster 512 days ago | link

haproxy also has a few balance algos[1] that it can be configured to use. I would think something like static-rr would even be somewhat better than random.

[1]: http://cbonte.github.com/haproxy-dconv/configuration-1.5.htm...

-----

jcromartie 512 days ago | link

Just a little thing: why can't we just say "servers" and "requests" instead of "registers" and "customers", because stores don't get 9000 customers per minute and don't have 500 registers. Everybody here would understand servers and requests.

-----

mieubrisse 512 days ago | link

Though I'm aware of the basics of request routing, forming a real-world analogy that's more tangible definitely helped me grok the explanation.

-----

pinars 512 days ago | link

Distributed load balancing is a tough problem with two pieces to it. One is the queueing theory part.

The other is the systems side to it. If you have multiple customers and multiple checkout lines, and if your customers act independently without seeing the lines (no feedback from servers, network failures and delays, implementation complexity), what do you do?

It isn't a trivial problem. The easy route is paying Cisco millions of dollars for load balancers, but those only scale so far.

The bigger internet companies spend years of development time trying to make distributed load balancing work, but the issues there are a bit more complicated than a few customers walking to checkout lines.

-----

drostie 512 days ago | link

You are certainly right that it becomes an M/G/1 model and thus that M/M/1 and M/D/1 will give reasonable approximations. The reason that it's an M/?/1 model is partly what you're saying, but also partly because the random assignment of the incoming requests acts as a random filter on a Poisson process. Poisson processes are just defined by not having history, and the random filter is not history-dependent either, so the output from the filter -- the input to any node -- is still a Poisson process.

What's interesting to me here is: Suppose rather than doing this with a random variable, you do it with a summing filter, a simple counter:

    # round-robin assignment: a plain counter instead of a random draw
    def on_request(r)
      @nodes[@n].handle(r)
      @n = (@n + 1) % @nodes.length
    end

That's a really simple solution, but it should tend to average out the Poisson input, which gives you something closer to a D/G/1 problem.

-----

dxbydt 511 days ago | link

>as an opportunity to teach myself a bit of Queueing Theory[1]

My dear Sir, you are a brave man. I tried the same 1.5 years back on HN - http://news.ycombinator.com/item?id=3329676

-----

SeanDav 511 days ago | link

OT: I went to the website, had my eyes assaulted with multiple font colours and sizes and left immediately without even trying to read.

People, there is a compromise between Google's "brain dead" simplicity and the MySpace "psycho" look: pages that are easy to read but still functional.

-----

vicks711 512 days ago | link

I can't stop myself from saying this. You wrote all this instead of doing what?

-----

humbledrone 512 days ago | link

Normally I would just downvote you and move on, but in this case your comment is frustrating enough that I have to say something. I found the comment you responded to (by nsrivast) quite fascinating. A well-written but brief analysis of the problem, with sources attached for further reading -- what's not to like? In-depth and thoughtful comments like that are what keep me coming back to this site, and are what make the community great.

I for one am very thankful that nsrivast took the time to write something so technical and detailed. However, I found your response to be in extremely poor taste. It added nothing to the conversation, and IMHO was rude and unnecessary.

-----

vicks711 512 days ago | link

really?

-----

mcherm 512 days ago | link

Yes. Really. Some people[1] spend their time writing lengthy and technical posts about specific technical issues. The rest of the world benefits from this. And the person writing the post benefits too, because trying to write something like that makes you smarter. Perhaps you should try it sometime.

[1] - I occasionally do this.

-----

vicks711 512 days ago | link

your handle is humbledrone and your opinions are humbled. Boy am I impressed!

-----

lectrick 510 days ago | link

Maybe he's off work? Maybe he's "decompressing" from a hard problem at his job? Maybe you should STFU and enjoy the free information you are getting?

-----

nthj 512 days ago | link

I'm inclined to wait until Heroku weighs in to render judgement. Specifically, because their argument depends on this premise:

> But elsewhere in their current docs, they make the same old statement loud and clear:

> The heroku.com stack only supports single threaded requests. Even if your application were to fork and support handling multiple requests at once, the routing mesh will never serve more than a single request to a dyno at a time.

They pull this from Heroku's documentation on the Bamboo stack [1], but then extrapolate and say it also applies to Heroku's Cedar stack.

However, I don't believe this to be true. Recently, I wrote a brief tutorial on implementing Google Apps' openID into your Rails app.

The underlying problem with doing so on a free (single-dyno) Heroku app is that while your app makes an authentication request to Google, Google turns around and makes an "oh hey" request back to your app. With a single-concurrency system, your app times out waiting for Google to get back to it, and Google won't get back to it until your app responds: deadlock.

However, there is a work-around on the Cedar stack: configure the unicorn server to supply 4 or so worker processes for your web server, and the Heroku routing mesh appropriately routes multiple concurrent requests to Unicorn/my app. This immediately fixed my deadlock problem. I have code and more details in a blog post I wrote recently. [2]

This seems to be confirmed by Heroku's documentation on dynos [3]:

> Multi-threaded or event-driven environments like Java, Unicorn, and Node.js can handle many concurrent requests. Load testing these applications is the only realistic way to determine request throughput.

I might be missing something really obvious here, but to summarize: their premise is that Heroku only supports single-threaded requests, which is true on the legacy Bamboo stack but I don't believe to be true on Cedar, which they consider their "canonical" stack and where I have been hosting Rails apps for quite a while.

[1] https://devcenter.heroku.com/articles/http-routing-bamboo

[2] http://www.thirdprestige.com/posts/your-website-and-email-ac...

[3] https://devcenter.heroku.com/articles/dynos#dynos-and-reques...

[edit: formatting]

-----

wwarnerandrew 512 days ago | link

Yes, it's true that the Cedar stack supports forking web servers like unicorn, and that an individual dyno can run multiple workers and therefore serve multiple requests at the same time.

However, dumb routing is still very problematic – even if your dyno can work on two requests simultaneously it's still bad for it to get sent a third request when there are other open dynos.

Also, for apps with a large-ish memory footprint, you can't run very many workers. A Heroku dyno has 512MB of memory, so if your app has a 250MB footprint, then you can basically only have two workers.

Another essential point to note is that the routing between cedar and bamboo is essentially unchanged. They simply changed the type of apps you can run.

-----

richcollins 512 days ago | link

Right, if the app's internal queue is full and it stops accepting connections, I'm assuming it will still queue at the dyno level anyway.

-----

kennystone 512 days ago | link

If you have 2 unicorn workers on a dyno and it happens to get 3 slow requests routed to it, you are still screwed, right? Seems to me like it will still queue on that dyno.

-----

michaelrkn 512 days ago | link

That's exactly what happened to us - switching to unicorn bought us a little time and a bit of performance, but we hit the exact same problems again after a couple more weeks of growth.

-----

ibdknox 512 days ago | link

Yeah, the only real question is whether or not it's true that they no longer do intelligent routing. If that is the case, then regardless of anything else the problem exists once you pass a certain scale/request cost. It won't matter if that one dyno can handle hundreds of requests at once, it will still queue stupidly.

-----

barmstrong 512 days ago | link

This is true - unicorn masks the symptoms for a period of time but does not solve the underlying problem in the way a global request queue would.

Also, if the unicorn process is doing something cpu intensive (vs waiting on a 3rd party service or io etc) then it won't serve 3 requests simultaneously as fast as single processes would.

-----

rjacoby5 512 days ago | link

One of the hidden costs of Unicorn is spin-up time. Unicorn takes a long time to start, then fork. We would get a ton of request timeouts during this period. Switching back to Thin, we never got timeouts during deploys - even under very heavy load.

-----

adrr 512 days ago | link

Maybe this is a stupid question, but with unicorn the dyno forks worker processes and can process multiple requests at the same time. Previously it seems that only one request could be handled by the dyno so requests had to queue on the dynamic routing layer but with multiple request support with unicorn or whatever, wouldn't it be more efficient to dump all the requests to dynos? Followup question: how would intelligent routing work if it previously just checked to see which dyno had no requests? That seems like an easy thing to do; now you would have to check CPU/IO or whatever and route based on load. Not specifically targeted at you but to everyone reading the thread.

-----

vidarh 512 days ago | link

> Previously it seems that only one request could be handled by the dyno so requests had to queue on the dynamic routing layer but with multiple request support with unicorn or whatever, wouldn't it be more efficient to dump all the requests to dynos?

It would be if all requests were equal. If all your requests always take 100ms, spreading them equally would work fine.

But consider if one of them takes longer. Doesn't have to be much, but the effect will be much more severe if you e.g. have a request that grinds the disk for a few seconds.

Even if each dyno can handle more than one request, since those requests share resources, if some of them slow down due to a long running request, response times for the other requests are likely to increase, and as response times increase, its queue is likely to grow further, and it gets more likely to pile up more long running requests.

> Followup question: how would intelligent routing work if it previously just checked to see which dyno had no requests? That seems like an easy thing to do; now you would have to check CPU/IO or whatever and route based on load. Not specifically targeted at you but to everyone reading the thread.

There is no perfect answer. Just routing by least connections is one option. It will hurt some queries that end up piled onto servers processing a heavy request in high load situations, but pretty soon any heavily loaded server will have enough connections all the time that most new requests will go to lighter loaded servers.

Adding "buckets" of servers for different types of requests is one option to improve it, if you can easily tell by url which requests will be slow.

-----

nevinera 512 days ago | link

That gets pretty unlikely, especially if you have many dynos and a low frequency of slow requests. The main reason unicorn can drastically reduce queue times here is that it does not use random routing internally.

-----

richcollins 512 days ago | link

How does it decide to queue at the dyno level anyway? Does it check for connection refusal at the TCP level?

-----

dblock 512 days ago | link

The connection is accepted, and a single-threaded web server will do the queuing.

-----

richcollins 511 days ago | link

Oh, so the server process hosting Rails is itself queueing? Is that what they refer to as "dyno queueing"? I thought perhaps there was another server between the router and your app's server process.

-----

spoiler 512 days ago | link

Slightly off topic, but what are everyone's experiences and thoughts about Puma[1]?

I am using it on a small production environment with Heroku and I like it, but when we officially launch the app, should we switch to Unicorn?

[1] http://puma.io/

-----

gizzlon 512 days ago | link

I don't have a stake in the Ruby webserver wars, but the unicorn site has a very good discussion about how it works internally, why it's built as it is, pros & cons, etc.

This seems to be missing from most of these project sites, which are often just marketing (look! It's better!!), and therefore not very trustworthy.

From the outside it looks like the biggest differentiator in each generation of ruby servers (and, I guess, db management systems :) is not that the new one is better or worse, but simply that it has different trade-offs.

-----

ylansegal 512 days ago | link

I did some performance analysis on puma vs unicorn vs thin a while ago:

http://ylan.segal-family.com/blog/2012/08/20/better-performa...

Although as noted in the comments, I neglected to run threadsafe! and should have probably tried rubinius or jruby. I have been meaning to redo it. Take it with a grain of salt.

-----

argarg 512 days ago | link

You should give puma 2.0, currently in beta 6 (https://rubygems.org/gems/puma), a try. Lots of performance improvements. I haven't benchmarked it but my guess is it outperforms unicorn.

-----

NatW 512 days ago | link

You seem to have written one of the very few articles I've seen benchmarking this. I'd love to know more about how Puma compares to Unicorn (especially the unicorn configurations mentioned by some in this conversation) and Thin for serving rails on Heroku. Many of the Unicorners pushing their solution don't appear to be aware of Puma and its potential benefits. I'm curious if Puma with MRI has benefits, too. Thanks!

-----

miloshadzic 512 days ago | link

AFAIK Puma should have a lower memory footprint on MRI than Thin but I haven't done benchmarks myself.

-----

steveklabnik 512 days ago | link

Also note that Rails 4 will have threadsafe! on by default, too, so if you didn't have it on here, that'll make it different for those apps.

-----

tcc619 512 days ago | link

I have been using it on a few small projects. Haven't run into an issue yet and the setup has been really easy.

-----

pselbert 512 days ago | link

We are running a smaller app on JRuby using Puma with thread safe on. It has a significantly smaller footprint, as we are only booting one server.

Overall really solid, though more useful if you can use something other than MRI.

-----

jarcoal 512 days ago | link

This is how I run my apps as well, and they seem to handle more than one request concurrently per dyno, but I'm not smart enough to dispute this post, so I'm just sitting back and watching.

-----

dedsm 512 days ago | link

Nevertheless, random routing is a bad idea even if a dyno can handle multiple requests simultaneously

-----

scottshea 512 days ago | link

To amplify, I do a ton of queue adjustment work with Unicorn at the backlog level. It is so frequent that we set up Unicorn startup to read from an ENV variable on Heroku that we set as needed.

With two Unicorn workers we found that 25 was the best backlog threshold to accept (it refuses additional requests). When we were able to go to 5 Unicorn workers on Heroku we had to start to adjust that.

-----

tibbon 512 days ago | link

You don't happen to have any documentation for how to do that do you? Very curious. Never seen anything about setting up Unicorn like this prior (I'm just using 3 or 4 Unicorns/dyno currently)

-----

jwrubel 512 days ago | link

Here's a gist of our unicorn.rb config (https://gist.github.com/apangeajwrubel/4953849) Using env variables lets us adjust parameters without a code push (still requires a restart). We saw dramatic reduction in 'long tail' requests when we dropped the backlog to 25. We're experimenting now (thanks to @scottshea) with even lower values. At some point the routing mesh will give up retrying and throw an H21 (https://devcenter.heroku.com/articles/error-codes#h21-backen...). One data point that would be useful from heroku is how many retries we get.

-----

pointful 512 days ago | link

To expand on this:

You have to remove the port declaration from the line for Unicorn in your Procfile, and then add a line like this to your unicorn.rb file to define the listener port along with adjusting the backlog size:

listen ENV['PORT'], :backlog => Integer(ENV['UNICORN_BACKLOG'] || 100)

-----

scottshea 512 days ago | link

We did this in unicorn.rb `:backlog => Integer(ENV['UNICORN_BACKLOG'] || 200)` and then set the UNICORN_BACKLOG variable by the Heroku command line `heroku config:set UNICORN_BACKLOG=25 -a <app_name>`. We have been as high as 1024 and as low as 10. We settled in at 25 for us.
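For anyone who wants to try the same thing, here's a bare-bones sketch of the pieces involved - the numbers and the WEB_CONCURRENCY variable name are just examples, not recommendations:

    # Procfile: drop the port from the unicorn line, the config file handles it
    #   web: bundle exec unicorn -c ./config/unicorn.rb

    # config/unicorn.rb
    worker_processes Integer(ENV['WEB_CONCURRENCY'] || 2)
    timeout 30

    # Bind to Heroku's $PORT with a small listen backlog so excess requests get
    # refused (and retried elsewhere) instead of quietly piling up on this dyno.
    listen ENV['PORT'], :backlog => Integer(ENV['UNICORN_BACKLOG'] || 25)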

-----

seivan 512 days ago | link

This is how I have been doing it for the last year: Puma with 4:8 threads, or Unicorn with 3 workers.

-----

teich 512 days ago | link

This is Oren Teich, I run Heroku.

I've read through the OP, and all of the comments here. Our job at Heroku is to make you successful and we want every single customer to feel that Heroku is transparent and responsive. Getting to the bottom of this situation and giving you a clear understanding of what we’re going to do to make it right is our top priority. I am committing to the community to provide more information as soon as possible, including a blog post on http://blog.heroku.com.

-----

doktrin 512 days ago | link

Thanks for the response, but I have to admit that the lack of a clear-cut answer here is a little worrisome.

Anyone who wants to like Heroku would hope that the OP is flat out, 100%, wrong. The fact that Heroku's official answer requires a bit of managing implies otherwise.

On a related tangent, I would also encourage future public statements to be a little less opaque than some Heroku has put out previously.

For instance, the cause of the outage last year was attributed to "...the streaming API which connects the dyno manifold to the routing mesh" [1]. While that statement is technically decipherable, it's far from clear.

[1] https://status.heroku.com/incidents/372

-----

toyg 512 days ago | link

Maybe it doesn't need "managing", Oren might just want to talk with whoever was responsible for the change and see what the best way forward is. I don't think panicked, knee-jerk reactions like "OMG we were wrong and will revert that commit pronto!" are beneficial in situations as complex as this.

-----

doktrin 512 days ago | link

You're assuming that the change was actually made. Until we hear definitively from Heroku, the only evidence is an (admittedly, well documented) blog post.

-----

toyg 512 days ago | link

Yeah, absolutely. I'm just saying we can't expect a manager to immediately respond to a highly-technical issue questioning subtle changes in internal behaviour which might have been introduced years ago.

-----

bambax 512 days ago | link

What's the point of posting a link to the front page of your blog, where the most recent article is 15 days old (4 hours after the comment above)?

What we want to know:

- is the OP right or wrong? That is, did you switch from smart to naive routing, for all platforms, and without telling your existing or future customers?

- if you did switch from smart to naive routing, what was the rationale behind it? (The OP is light on this point; there must be a good reason to do this, but he doesn't really say what it is or might be)

- if the OP is wrong, where might his problems come from?

- etc.

-----

shabble 512 days ago | link

>> I am committing to the community to provide more information as soon as possible, including a blog post on http://blog.heroku.com

> What's the point of posting a link to the front page of your blog, where the most recent article is 15 days old (4 hours after the comment above)?

I think OP is saying 'I am going to investigate the situation; when I am finished here [the blog] is where I will post my response', not that there is something there already.

That said, it's all a little too PR-Bot for my taste (although there's probably only so many ways to say the same info without accidentally accepting liability or something).

-----

mcguire 512 days ago | link

Note: I think we have different referents for "OP" here; bombax's is, I think, the whining customer; while shabble's is the pompous CEO.

Me, I'm the swarthy pirate. Arrrh.

-----

bambax 512 days ago | link

Upvoted, although it's bambax, not bombax ;-)

-----

mcguire 499 days ago | link

I'm having trouble reading around the eye patch.

-----

eli 512 days ago | link

What's the point of posting a link to the front page of your blog

Well, he promised a detailed blog post, at which point that link will be extremely helpful.

I do not think it is fair to expect an immediate detailed response to those questions. If I were CEO of Heroku, I wouldn't say anything definite until after talking to the engineers and product managers involved--even if I was already pretty sure what happened. The worst thing you could do at this point is say something that's just wrong.

-----

bambax 512 days ago | link

I don't expect an immediate response; I would have been happy with just: "This is Heroku's CEO. I'm on it."

But a link, that doesn't point anywhere useful, introduced by a PR phrase that sounds a little like "Your call is important to us", was a little annoying, esp. after reading the OP where they say they have contacted Heroku multiple times on this issue.

-----

eli 512 days ago | link

I guess it's a matter of perception. I thought "I'm on it, expect an update later" is what he said.

-----

praptak 512 days ago | link

> if you did switch from smart to naive routing, what was the rationale behind it?

Most probable cause: smart routing is hard to scale. Multiple routers, with each one doing random distribution independently of the others, will still produce a globally random distribution. No need for inter-router synchronization.

If multiple routers try smart routing, they must do quite a bit of state sharing to avoid situations where N routers try to schedule their tasks on a single dyno. And even if you split dynos between routers then you need to move requests between routers in order to balance them.

-----

character0 512 days ago | link

While I think it is appropriate for Heroku to respond to this thread (and other important social media outlets covering this), linking to a blog without any messaging concerning your efforts might not be the greatest move... This may not be a sink or swim moment for Heroku, but tight management of your PR is key to mitigating damage. Best of luck, Heroku is a helpful product and I want to see you guys bounce back from the ropes on this one.

-----

csense 512 days ago | link

Telling people where to look for a reply when they have one is a great idea, IMHO.

-----

teich 511 days ago | link

I've posted an update on our blog, with another to follow tomorrow:

https://blog.heroku.com/archives/2013/2/15/bamboo_routing_pe...

-----

tjbiddle 512 days ago | link

Looking forward to your blog post. Hoping things get cleared up!

-----

willvarfar 512 days ago | link

Hint: just use a rabbitmq queue or something. Don't have a 'smart' LB that has to know everyone's state; instead, have dynos that get more work as quickly as they can.

-----

praptak 512 days ago | link

MQ might be a solution but certainly not in the "just use" class. Unless you want to introduce a bottleneck and a single point of failure, this queue has to be distributed.

Managing a distributed queue is hard, for reasons similar to ones making the original problem hard - DQs require global state in a distributed environment. There are tradeoffs involved - the synchronization cost might become a bottleneck in itself.

Pushing the problem onto the distributed brokers is making a big bet on the queuing solution. Nope, definitely not in the "just use" category.

-----

willvarfar 512 days ago | link

yes, I know all the ins and outs.

But they will end up building a pull rather than push system in the end.

-----

GhotiFish 512 days ago | link

I'm looking forward to hearing why Heroku is using such a strange load balancing strategy.

-----

avodonosov 512 days ago | link

I hope the solution will not break the possibility for multithreaded apps to receive several requests

-----

sylvinus 512 days ago | link

Me too. I see this as a side-effect of Rails single-threaded craziness, our "modern" Node.js apps run faster than ever.

-----

antihero 512 days ago | link

Can "Dynos" serve multiple requests simultaneously? That's the question, really.

-----

neilmiddleton 512 days ago | link

That's up to the process you have running on the dyno

-----
