Totally correct -- in our data center, and at many of the universities that we work with, there is a right hand (RH) and left hand (LH) power feed for each dual PDU server, switch, router, console server, etc. You will typically run a power bar on the right and left hand side of the rack and wire them accordingly. You will have dedicated breakers, panels, UPS, generator, etc. for each of the RH and LH sides. If you ever need to service a panel, you can safely cut power knowing that everything will still be powered by its partner. This happens once in a while and gives you breathing room if you need to replace a breaker or a power feed fails. We also test our generators on a monthly cycle to test for failures.
I also wanted to address your point about batteries. We have a device on each battery that monitors its state, so we can find faults before they cause the entire UPS to fail.
"We also test our generators on a monthly cycle to test for failures."
Curious - when "testing" them, how long do you run them for and at what load?
I could see the beancounters being _very_ unhappy with the ops people saying "we want to run both gen sets at full datacenter load for more than 10 minutes at a time, every month", which is what Amazon would have to have done to detect the faulty cooling fan problem. I'm guessing there are _some_ organisations who do that, but I suspect most datacenters don't.
I work for the Fed. We are a remote site and have several power outages each year due to trees on the power lines or snow-related issues. Generator testing is pretty much required and we've had zero issues justifying it.
The baseline generator maintenance cost exceeds the fuel cost every month. A bigger issue is getting permits from your local/moronic government to run your generators for testing, but for a big datacenter, you're probably in an industrial neighborhood (to get correct power from multiple substations or higher) and this isn't an issue -- it's more an issue with office-datacenters or other backup systems in normal residential/commercial neighborhoods.
It takes a lot of discipline to run a shared power setup like you describe. Most of those servers that have two power supplies operate in a shared power mode, rather than active/failover. This means that if one of your sides (LH/RH) is over 50% and you fail over, you are going to have a cascading failure as the other side goes >100%. It used to surprise me how often I saw things like this, although I'm talking about server rooms in the low 100s of servers, not huge data centers.
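To put numbers on the 50% rule, here is a rough sketch of the check in Python. It assumes every device is dual-corded and shares load across both supplies, so when one side drops, all of its load lands on the survivor. The function names and the amp figures are made up for illustration, and real breakers are usually derated below their nameplate rating, so treat the capacity you pass in as whatever continuous limit applies to your feeds.

```python
def failover_load(lh_amps, rh_amps, capacity_amps):
    """Fraction of capacity the surviving feed carries if the other feed fails.

    Assumes all load on the failed side moves to the surviving side
    (load-sharing dual supplies, every device dual-corded).
    """
    if capacity_amps <= 0:
        raise ValueError("capacity_amps must be positive")
    if lh_amps < 0 or rh_amps < 0:
        raise ValueError("loads must be non-negative")
    return (lh_amps + rh_amps) / capacity_amps


def survives_failover(lh_amps, rh_amps, capacity_amps):
    """True if one feed alone can carry the combined LH + RH load."""
    return failover_load(lh_amps, rh_amps, capacity_amps) <= 1.0


# Hypothetical rack on two 20 A feeds.
print(survives_failover(9, 8, 20))    # 17 A combined on 20 A: True
print(survives_failover(12, 11, 20))  # 23 A combined on 20 A: False
```

The second rack looks fine day to day, with each side at 55-60%, and that is exactly the setup that cascades when one side fails.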