
Note: Just a bystander here

> What's the constraint that prevents you from having your dynos register with the loadbalancer cluster and then having the latter perform leastconn balancing per application?

I suspect this is a consequence of the CAP theorem. You'll end up with every loadbalancer needing a near-instantaneous perception of every server's queue state and then updating that state atomically when routing a request. Now consider the failure modes that such a system can enter and how they affect latency. Best not to go there.
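To make the coordination cost concrete, here's a minimal single-process sketch (mine, not Heroku's; the dyno names and counters are made up): leastconn is a read-modify-write on shared queue-depth state, so with multiple balancers every single request would need a distributed lock or CAS on that state.

    import threading

    # Hypothetical shared state: in-flight requests per dyno. Across
    # N loadbalancers this map would have to live in a coordination
    # service, not in process memory.
    queue_depth = {"dyno-a": 0, "dyno-b": 0}
    lock = threading.Lock()

    def route_leastconn():
        # The read (find the minimum) and the write (increment it)
        # must happen atomically. Distributed across balancers, that
        # lock becomes a per-request round trip, and its failure
        # modes are exactly the latency cliff described above.
        with lock:
            dyno = min(queue_depth, key=queue_depth.get)
            queue_depth[dyno] += 1
        return dyno

    def request_finished(dyno):
        with lock:
            queue_depth[dyno] -= 1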

My understanding is that Apache Zookeeper is designed for slowly-changing data.




> You'll end up with every loadbalancer needing a near-instantaneous perception of every server's queue

But that's not true. Only the loadbalancers concerned with a given application need to share that state amongst one another, and the number of loadbalancers per application is usually very small: it's <1 for >99% of sites (i.e. a single balancer is shared across many apps), and you need quite a popular site to push it into the double digits (a single haproxy instance can sustain >5k connect/sec).

Assigning pooled loadbalancers to apps while ensuring HA is not trivial, but it's also not rocket science. I'm a little surprised by the Heroku response here, hence my question about which constraint I might have missed.
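For what it's worth, here is one way that assignment could work (a sketch under my own assumptions, not Heroku's design): rendezvous hashing gives every router the same small, stable balancer set per app with no coordination at all.

    import hashlib

    # Hypothetical balancer pool; names are made up.
    BALANCERS = ["lb-1", "lb-2", "lb-3", "lb-4", "lb-5"]

    def balancers_for_app(app_id, k=2):
        # Rendezvous (highest-random-weight) hashing: score every
        # balancer against the app id and take the top k. Every
        # router computes the same answer independently, and adding
        # or removing a balancer only reshuffles ~1/N of the apps.
        def score(lb):
            return hashlib.sha1(f"{app_id}:{lb}".encode()).hexdigest()
        return sorted(BALANCERS, key=score, reverse=True)[:k]

    # balancers_for_app("my-app") -> e.g. ["lb-4", "lb-2"]; the
    # second entry doubles as the hot standby for HA.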

> My understanding is that Apache Zookeeper is designed for slowly-changing data.

Dyno-presence per application is very slowly-changing data by Zookeeper standards.
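For illustration, a sketch of that registration using Python's kazoo client (the ensemble hosts, znode paths, and app/dyno names are assumptions on my part): each dyno creates an ephemeral znode at boot, and the node vanishes automatically when the session dies.

    from kazoo.client import KazooClient

    # Assumed ensemble addresses and znode layout.
    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    # Ephemeral node: deleted automatically if the dyno's session
    # dies, so presence data self-corrects without deregistration.
    zk.create("/apps/my-app/dynos/web.1", b"10.0.0.5:8080",
              ephemeral=True, makepath=True)

    # A balancer watches the children; the callback fires on each
    # dyno start/stop -- a handful of events per deploy, which is
    # trivially slow-changing by Zookeeper standards.
    @zk.ChildrenWatch("/apps/my-app/dynos")
    def on_dynos_changed(children):
        print("current dynos:", children)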

-----


Again, I'm no expert on Heroku's architecture. Just thinking out loud here, and feel free to tell me to RTFA. :-)

> the number of loadbalancers per application is usually very small. I.e. the number is <1 for >99% of sites and you need quite a popular site to push it into the double digits (a single haproxy instance can sustain >5k connect/sec).

So most Heroku sites have only a single frontend loadbalancer doing their routing, and even those cases are getting randomly routed with suboptimal results?

Or is the latency issue mainly with respect to exactly those popular sites that end up using a distributed array of loadbalancers?

> Assigning pooled loadbalancers to apps while ensuring HA is not trivial, but it's also not rocket science.

To me, the short history of "cloud-scale" (sorry) app proxy load balancing shows that even very well-resourced, well-engineered systems often work great and scale great, right up until some weird failure mode unbalances the whole system and response time goes all hockey stick.

> Dyno-presence per application is very slowly-changing data by zookeeper standards.

OK, but what about instantaneous queue depth for each and every server within a given app?

-----



