

Scaling lessons from Second Life - CoryOndrejka
http://arstechnica.com/business/data-centers/2010/02/what-second-life-can-teach-all-companies-about-scaling-web-apps.ars

======
nicpottier
I think my biggest lesson from building systems, which he touched on without
stating it explicitly, is to build your system to self-heal.

Shit happens. Whether it is a database starting to fall over for some reason,
or your heap growing too big, or who knows what, something will eventually
fail. Having monitoring to tell you that is of course the first step, but the
second is to let that monitoring shoot everything in the head (kill and
restart the affected processes) to see if it fixes things. It doesn't always
work, but it does a lot of the time, and although inelegant, it will usually
result in a better user experience.

Every system needs to be able to self-heal, especially if you are a small
outfit and want to keep your sanity.
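
To make the monitor-then-restart idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SelfHealer` class, the `is_healthy`, `restart` and `escalate` callables, and the `max_restarts` limit are hypothetical names, not anything from the article. It restarts on a failed health check and, since restarting doesn't always work, hands off to a human after a few consecutive failed attempts.

```python
from typing import Callable


class SelfHealer:
    """Hypothetical watchdog: restart on a failed check, escalate if that keeps failing."""

    def __init__(
        self,
        is_healthy: Callable[[], bool],    # assumed health probe supplied by the caller
        restart: Callable[[], None],       # assumed "shoot it in the head and start over" action
        escalate: Callable[[str], None],   # assumed hook that queues work for a human
        max_restarts: int = 3,
    ) -> None:
        self.is_healthy = is_healthy
        self.restart = restart
        self.escalate = escalate
        self.max_restarts = max_restarts
        self.consecutive_restarts = 0

    def check_once(self) -> str:
        """Run one health check and return what was done about it."""
        if self.is_healthy():
            # A healthy check resets the counter, so old restarts are forgiven.
            self.consecutive_restarts = 0
            return "ok"
        if self.consecutive_restarts >= self.max_restarts:
            # Restarting has not helped; stop looping and ask for a human.
            self.escalate("restart did not help; needs a human")
            return "escalated"
        self.restart()
        self.consecutive_restarts += 1
        return "restarted"
```

In practice `check_once` would be called on a timer, and `restart` might wrap whatever your process supervisor or init system provides; those details are left out here on purpose.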

~~~
rm-rf
Sort of covered:

"Ideally, you've designed your system such that real-time human intervention
is never required; every failure should be handled automatically, with repairs
needing human hands neatly enqueued for processing during work hours."

My takeaway: Above a certain scale, one needs to stop responding to individual
failures, and start responding to statistics.
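
One way to picture "responding to statistics" is a failure rate over a sliding window of recent events, as in the rough Python sketch below. The `FailureRateMonitor` class, the window size and the 5% threshold are all made-up assumptions for illustration, not numbers from the article. Individual failures are only recorded; attention is requested when the rate crosses the threshold.

```python
from collections import deque
from typing import Deque


class FailureRateMonitor:
    """Hypothetical monitor that reacts to the failure rate, not to single failures."""

    def __init__(self, window_size: int = 1000, threshold: float = 0.05) -> None:
        # Keep only the most recent `window_size` outcomes (True means failed).
        self.outcomes: Deque[bool] = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        """Note one event; a single failure triggers nothing by itself."""
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        """Fraction of failures in the current window (0.0 when empty)."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """True only when the windowed failure rate exceeds the threshold."""
        return self.failure_rate() > self.threshold
```

With a threshold like this, one dead machine in a large fleet is background noise to be cleaned up later, while a sudden jump in the rate is the signal worth waking someone for.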

