
Texas Pickup Truck Crash Shuts Down the Internet (2007) - tosh
https://it.toolbox.com/blogs/it-management/texas-pickup-truck-crash-shuts-down-the-internet-011813
======
meesterdude
> emergency crews shut off power to rescue the truck driver without notifying
> Rackspace, the double whammy that the company had never planned on: the
> double power outage within the 30-minute chiller up-cycle. Oops.

Wow, what a design fault.

I used to do data center electrical/mechanical commissioning, which means
putting equipment and systems through their paces to prove things work and so
the vendor can get paid. We had a fairly standard process that we applied to
validation - but every once in a while we'd have to skip/tweak a test because
of something specific to the datacenter. Usually, this was a good thing, like
an extra layer of redundancy - but sometimes it was a glaring assumption that
we had to run with, despite our objections.

I remember one DC had spec'd their cooling based off of power needs and
climate of the area. They had balanced things out to very thin margins - which
was in some ways very efficient. But... they didn't factor in global warming.
Come summer, some days they could only load to 50% capacity because they
couldn't keep things cool enough. Ended up redoing much of the DC to support
beefier cooling.

Single points of failure can take many forms and occur at many points along a
redundancy plan. The industry has oodles more examples to pull from. But now
that I'm nerdier, I think the better approach is to tolerate DC failure in
your topology rather than try to prop up unicorn buildings. IIRC, Google
doesn't even have backup generators, because utility is good enough.

------
teamunicorn
I'll never forget the phone call after the truck crash. It was about noon, and
I was scheduled to start a support shift at 2pm. My team lead called and not
long after, I was in Datapoint along with every other available racker,
answering the phones.

The way Rackspace responded to that incident has forever shaped my perspective
on what makes up an acceptable level of customer service.

Fanatical Fuckin Support.

~~~
adamson
Does your username reference the incident somehow?

~~~
teamunicorn
Not the event, but the team I was on at the time.

-subs

