Hacker News

"Are there two completely redundant power systems up to and including the PDUs and generators? How often are those tested?"

And: "Are they tested for long enough to detect a faulty cooling fan that lets the primary generator run at normal full working load for only ~10 minutes? And are the secondary gensets run and loaded up for long enough to catch something misconfigured that will trip ~5 minutes after startup?"

While they clearly failed, I do have some sympathy for the architects and ops staff at Amazon here. I can very easily imagine a testing regime that regularly kicked both generator sets in, but never ran them at working load for long enough to notice either of those failures. My guess is that someone was feeling quite smug and complacent because they had automated testing in place showing months' and years' worth of test switching from grid to primary to secondary and back, without ever having thought to burn enough fuel to keep the generators running long enough to expose these two problems.
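The underlying point generalises: a load test only shows that a generator survives for as long as the test actually ran. A minimal sketch, assuming a hypothetical log of test runs and a hypothetical per-unit minimum soak time (the ~10 and ~5 minute windows from the question above are used as thresholds here; a real regime would pad them with margin). It flags every unit whose longest loaded run never exceeded its threshold. `GensetTest` and `under_soaked_units` are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class GensetTest:
    """One logged generator test run (hypothetical record format)."""
    unit: str                  # hypothetical unit name, e.g. "primary"
    minutes_under_load: float  # time spent carrying real working load


def under_soaked_units(tests, required_minutes):
    """Return units whose longest loaded test never outlasted the required soak time.

    required_minutes maps unit name -> minutes a test must run under load
    (strictly longer than this) before we trust it to expose slow-onset faults.
    Units with no logged tests at all are reported as well.
    """
    longest = {}
    for t in tests:
        longest[t.unit] = max(longest.get(t.unit, 0.0), t.minutes_under_load)
    return sorted(
        unit
        for unit, needed in required_minutes.items()
        if longest.get(unit, 0.0) <= needed
    )


# Many short switchover tests still leave both units unproven under sustained load.
log = [GensetTest("primary", 3), GensetTest("secondary", 2), GensetTest("primary", 4)]
print(under_soaked_units(log, {"primary": 10, "secondary": 5}))  # prints ['primary', 'secondary']
```

Counting how many switchovers succeeded says nothing here; only the longest sustained run per unit matters, which is why the check keys on the maximum duration rather than on the number of passing tests.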

"There is a "right way" to do this …"

There's a _very_ well-known "right way" to do this in AWS: have all your stuff in at least two availability zones. Anybody running mission-critical stuff in a single AZ has either chosen to accept the risk, or doesn't know enough about what they're doing… (Hell, I've designed, but never got to implement, projects that spread over multiple cloud providers, to avoid the possible failure mode of "What happens if Amazon goes bust / gets bought / all goes dark at once?")
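Why a second AZ helps can be put in rough numbers. A minimal sketch, assuming (a big assumption) that availability zones fail independently, and using a purely illustrative 1% chance that any one zone is unavailable at a given moment: the chance that all N zones are down together is then p^N. The multi-provider idea above is a hedge against exactly the case where that independence assumption breaks, i.e. everything going dark at once.

```python
def prob_all_zones_down(p_zone_down: float, zones: int) -> float:
    """Chance that every zone is down at the same moment.

    ASSUMES zone outages are independent; a shared cause (same provider,
    same region-wide bug) breaks this, and the real risk is then higher.
    """
    if not 0.0 <= p_zone_down <= 1.0:
        raise ValueError("p_zone_down must be between 0 and 1")
    if zones < 1:
        raise ValueError("need at least one zone")
    return p_zone_down ** zones


# Illustrative 1% per-zone unavailability, not a measured AWS figure.
for n in (1, 2, 3):
    print(n, prob_all_zones_down(0.01, n))
```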

Or maybe they should just do it backwards

Run the generators and have the grid as backup

And just stop the generators once in a while to validate failover to the grid

Large cogeneration sites (where they use the waste heat from electrical generation for process steam, building/district heating, etc.) actually do run in grid-backup mode. An example is MIT's cogeneration plant (a couple of big natural gas turbines on Vassar Street) -- a lot of universities do this since they can use the steam for heating, and a lot of industrial sites do it for process steam.

It comes down to cost and zoning/permitting. It's much easier to get a permit to run a generator for backup use than to run one 24x7. It's also hard to get a 1-10 MW plant that is as efficient/inexpensive per kWh as the grid (although now that natural gas is about 20% of the price it was when I last bought it, gas turbines actually are cheaper than industrial-tariff grid power, if you have good gas access...). Being able to actually use the waste heat is what makes the combined heat-and-power efficiency worth it.
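The waste-heat point can be made concrete with the usual definition of total combined heat-and-power (CHP) efficiency, where W_e is electrical output, Q_useful is the recovered heat that is actually used, and Q_fuel is the fuel energy input. The numbers in the second line are illustrative only (per 100 units of fuel energy), not figures for any particular plant: if the heat can't be used, Q_useful = 0 and the plant is judged on electrical efficiency alone, which is where a small plant struggles against the grid.

```latex
\[
\eta_{\text{total}} = \frac{W_e + Q_{\text{useful}}}{Q_{\text{fuel}}}
\]
\[
\eta_{\text{total}} = \frac{30 + 45}{100} = 0.75
\qquad\text{vs.}\qquad
\eta_{e} = \frac{30}{100} = 0.30
\]
```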

There was a crazy plan to run a datacenter on a barge tethered to the SF waterfront, for a variety of reasons, the primary one being power: the SF city government wouldn't be able to regulate the engines/generators on a ship running 24x7.

My university had a big cogen plant, but it was never designed to power the entire campus (it could only do so at around 3 AM). Aside from providing heat and power, because it runs on natural gas it qualifies for clean energy credits, which the university makes money from by selling them on the market.

Hmm, wouldn't it be less practical to do that with large CHP plants (vs. small ones)? Here in Europe, district-heating CHP plants are generally run by utilities.

That sounds like the "Crash-Only Software" paper, but applied to power sources.
