Agreed, but note that at high scale, stress-testing isn't as easy as it sounds.
For example, at an ad-serving company I worked with, stress-testing the ad servers required a botnet of hundreds of machines in order to approximate the load of millions of real users.
The hardware required to run a stress test can easily be 25% or 50% of the hardware required to run your production environment. It's not just a workstation with JMeter.
Another challenge is the difference between volume of traffic and diversity of traffic. Naively set-up load tests tend to be really good tests of your caching mechanism and non-tests of other parts of your system.
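For example, here's a tiny sketch of generating a diverse URL mix for a load test instead of hammering one page over and over (the host, URL patterns, and traffic ratios are all invented for illustration, not taken from the article):

    # Sketch: generate a diverse URL mix so the test exercises more than the cache.
    # The host, paths, and traffic ratios below are placeholders.
    import random

    BASE = "http://test-target.example.com"
    search_terms = ["shoes", "laptop", "coffee", "bike", "lamp"]

    def random_url():
        r = random.random()
        if r < 0.3:
            return f"{BASE}/"                                      # hot, highly cacheable page
        if r < 0.8:
            return f"{BASE}/product/{random.randint(1, 100_000)}"  # long-tail pages
        q = random.choice(search_terms)
        return f"{BASE}/search?q={q}&page={random.randint(1, 20)}" # uncached searches

    # Dump a URL list that a tool like JMeter or siege can replay.
    with open("urls.txt", "w") as f:
        for _ in range(100_000):
            f.write(random_url() + "\n")

Feeding a list like that into your load tool spreads the requests across hot pages, long-tail pages, and uncacheable queries, instead of just measuring how fast your cache can serve the homepage.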
At high scale, you should have a few grand to pay Gomez for some real-world load testing without having to buy your own hardware. Also, it sounds like the author did not even try a "workstation with JMeter". I am by no means a scalability expert, but even I know better than to stress-test on a live system.
Sure, it's a different story if you have a server farm and need a client farm to test it. But in his scenario you could use an off-the-shelf laptop to saturate the local link with requests...
Even if it's not easy to tell the min/max bounds in high-scale deployments, it's possible to give an approximate value by measuring the individual components. You have load balancers that don't do more than X req/s, servers that take X ms/req, databases that can do X q/s, and caches with a delay of X ms on X% of requests. That gives you a rough estimate - then it's about locating the bottlenecks which prevent you from reaching those numbers. (Simplified model.)
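To make that concrete, a back-of-the-envelope version of that model might look like this (every number below is made up purely for illustration):

    # Back-of-the-envelope capacity estimate: each component's sustainable
    # throughput in req/s; the system's ceiling is the minimum, i.e. the bottleneck.
    num_app_servers = 40
    ms_per_request = 25                                          # average service time per app server

    throughput = {
        "load_balancer": 10_000,                                 # max req/s it can forward
        "app_servers": num_app_servers * 1000 / ms_per_request,  # 40 * 40 = 1600 req/s
        "database": 1_500,                                       # q/s, assuming ~1 query per request
    }

    bottleneck = min(throughput, key=throughput.get)
    print(f"Estimated ceiling: {throughput[bottleneck]:.0f} req/s (bottleneck: {bottleneck})")

It won't replace a real stress test, but it tells you roughly what number the test should be able to hit, and which component to look at first when it doesn't.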
It's actually pretty amazing to see how many people don't put their systems under any kind of stress to validate their performance needs. The place I work at now never took the time to do it and things exploded. Good buddies at start-ups launched sites with no performance metrics and exploded.
I could not agree more. But I have learnt that not many clients have the money for a hardware solution. One other thing I have learnt: stay away from Pound etc., just stick to LVS.
A solution that involves something other than software isn't automatically buy-vs-build, nor an appliance.
Broadly, to me, it's usually about choosing the right hardware and assembling/configuring it in a "custom"[1] way. This is usually significantly cheaper than trying to get anything off-the-shelf to scale.
stay away from Pound etc., just stick to LVS
I've learned to stay away from anything with a heavy dependency on kernel code. My preference is HAProxy.
[1] Optimized for the problem at hand, which, usually, is remarkably commonplace, just nowhere near as commonplace as a lowest common denominator web server.
I'd KILL to have scalability issues... we had one band that said they were likely to sell several hundred thousand copies of their single using BitBuffet, which got me all excited, but that never came to pass. I honestly believed it might happen because: 1) they were donating the money to charity, 2) it was pay-what-you-want, and 3) they were pretty well known.
I was actually a little sad he didn't kill my cute little Linode server...
I'm a pretty big noob in the scaling world, but as I read the article I kept thinking, "With services out there like Heroku, isn't scaling something of a solved problem (at least with a Rails app)?" Do web startups still manually handle the scaling process?
Well, one reason is that Heroku and the like don't allow for all the options and configurations you might run into. If your app requires some obscure back-end software, then you kind of have to host (at least part of) it yourself.