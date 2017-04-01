https://www.ualberta.ca/news-and-events/newsarticles/2017/ap...
There are more details about the causes of the failure at that URL:
> the refrigeration chillers shut down due to “high head pressure” conditions. Essentially, the chillers were not able to reject their heat through the condenser water system—heat instead of cold circulated through the freezer.
> Compounding matters, the system monitoring the freezer temperatures failed due to a database corruption. The freezer’s computer system was actually sending out alarm signals that the temperature was rising, but those signals never made it to the university’s service provider or the on-campus control centre.
Edit: of course I'm being naive, and hindsight is always wonderful, but the total cost for a backup would be perhaps $100k/year?
Take 5 minimum-wage, no-benefits people. Some of these might be called "retirees". You need 5 to cover the 168 hours in a week. Or maybe you employ some starving "students", of which there is a nearly infinite supply nearby.
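To sanity-check those numbers, here is a rough back-of-the-envelope calculation. It uses only the figures already mentioned (5 people, 168 hours a week, the guessed $100k/year) plus one assumption of mine, a 52-week year. It says nothing about what the actual local minimum wage was.

```python
# Back-of-the-envelope staffing arithmetic for round-the-clock coverage.
HOURS_PER_WEEK = 7 * 24      # 168 hours, as stated in the comment
WATCHERS = 5                 # number of people in the rotation
ANNUAL_BUDGET = 100_000      # the "$100k/year" guess, not a real quote
WEEKS_PER_YEAR = 52          # assumption: ignores the extra day or two

# Hours each person has to cover per week if shifts are split evenly.
hours_per_watcher = HOURS_PER_WEEK / WATCHERS

# Total staffed hours in a year with one person always on watch.
annual_hours = HOURS_PER_WEEK * WEEKS_PER_YEAR

# Hourly rate the budget could pay if it all went to wages.
implied_hourly_rate = ANNUAL_BUDGET / annual_hours

print(f"{hours_per_watcher:.1f} hours per person per week")
print(f"{annual_hours} staffed hours per year")
print(f"${implied_hourly_rate:.2f} per hour available from the budget")
```

So each person covers a bit under a standard full-time week, and the guessed budget works out to roughly $11.45 per staffed hour before any overhead.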
Sit one down in a chair. Tell him, "sit here, read books, surf the internet. Every 5 minutes look up and see what the temperature is. If it's higher than -35, call this list of people".
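Written out as a rule, the instruction is just a threshold check repeated every 5 minutes. A minimal sketch follows. The function names and the sample readings are made up for illustration, and the unit of -35 is left unspecified because the instruction does not state one.

```python
# The watcher's rule as a plain threshold check (illustrative sketch only).
ALARM_THRESHOLD = -35.0      # from the instruction; unit not stated there
CHECK_INTERVAL_MINUTES = 5   # "every 5 minutes"


def needs_call(reading, threshold=ALARM_THRESHOLD):
    """Return True when the reading is higher (warmer) than the threshold."""
    return reading > threshold


def first_alarm(readings, interval=CHECK_INTERVAL_MINUTES):
    """Return minutes elapsed at the first check that triggers a call.

    `readings` holds one value per check, in order. Returns None if no
    reading ever exceeds the threshold.
    """
    for i, reading in enumerate(readings):
        if needs_call(reading):
            return i * interval
    return None


# Hypothetical readings, one per check, invented for this example.
sample = [-80.0, -79.5, -60.0, -34.0, -10.0]
print(first_alarm(sample))
```

The point of the human in the chair is that this check does not depend on the alarm path that failed: the person reads the display directly and makes the phone call.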
Seriously, I wonder why we don't take more advantage of humans in situations like this. In years past it was often routine to have humans present 24/7 at critical sites.
This is a common enough problem that it's a well-understood factor in systems failures, and it has been extensively explored in the human-factors engineering literature on airline pilots, nuclear technicians, and TSA X-ray techs.
