Is there any reason the sources and targets couldn't both be thoroughly distributed throughout the cluster? Nothing says a hard drive has to be replicated as a whole unit; you just need multiple copies of the data. I'm imagining that an HD dies, and the extra copies of what it contained are scattered all over. You re-replicate them by scattering the new copies all over as well. No one pod has to move any substantial amount of data.
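A toy simulation of what I'm picturing, purely as a sketch. It assumes chunks are placed on random distinct disks, and the disk count, chunk count, and replica count are all made-up numbers. When one disk fails, each lost chunk is copied from a random surviving holder to a random disk that doesn't already have it, and we count how much any single disk has to read or write.

```python
import random
from collections import Counter

def simulate_rebuild(num_disks=200, num_chunks=20_000, replicas=3, failed=0, seed=1):
    """Return (chunks lost, max reads on any one disk, max writes on any one disk).

    All parameters are hypothetical; placement is uniformly random.
    """
    rng = random.Random(seed)
    disks = range(num_disks)
    # Each chunk lives on `replicas` distinct, randomly chosen disks.
    placement = [rng.sample(disks, replicas) for _ in range(num_chunks)]

    reads, writes = Counter(), Counter()
    lost = 0
    for holders in placement:
        if failed not in holders:
            continue
        lost += 1
        # Read from any surviving copy of the chunk.
        survivors = [d for d in holders if d != failed]
        source = rng.choice(survivors)
        # Write to any disk that never held this chunk (this excludes the failed disk).
        target = rng.choice([d for d in disks if d not in holders])
        reads[source] += 1
        writes[target] += 1
    return lost, max(reads.values()), max(writes.values())

if __name__ == "__main__":
    lost, max_read, max_write = simulate_rebuild()
    print(f"lost={lost} max_read_per_disk={max_read} max_write_per_disk={max_write}")
```

With placement this scattered, the busiest disk handles only a small slice of the lost chunks, which is the whole point. It says nothing about what the network between those disks has to carry.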
Sure. You can absolutely replicate chunks. But you just start kicking the problem upstream. A rack down is a couple of PB, so you end up doing a ton of cross-rack transfers to get your replica counts back up. Now you're gated on NIC/ToR/agg switch throughput. A DC down and you're gated on NICs, ToRs, aggs, and the intra-DC network. And the $$$ keeps adding up the further you go.
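Back-of-envelope version of that, as a rough sketch. The tier capacities below are invented for illustration, not real numbers from any deployment; the only figure taken from above is the couple of PB per rack. Rebuild time is just the data to move divided by whatever tier has the least aggregate throughput.

```python
def rebuild_hours(data_pb, tier_gbps, usable_fraction=1.0):
    """Return (limiting tier name, hours to move data_pb petabytes).

    tier_gbps maps a tier name to its aggregate throughput in Gbit/s.
    usable_fraction is the share of that throughput you can spend on
    re-replication without starving normal traffic (an assumption).
    """
    # The slowest tier on the path gates the whole transfer.
    limiting_tier = min(tier_gbps, key=tier_gbps.get)
    bits = data_pb * 1e15 * 8  # decimal petabytes to bits
    seconds = bits / (tier_gbps[limiting_tier] * 1e9 * usable_fraction)
    return limiting_tier, seconds / 3600

if __name__ == "__main__":
    # Hypothetical aggregate capacities in Gbit/s.
    tiers = {"nics": 4000, "tor_uplinks": 800, "agg": 1600}
    tier, hours = rebuild_hours(2, tiers)
    print(f"gated on {tier}: {hours:.1f} h at full line rate")
    tier, hours = rebuild_hours(2, tiers, usable_fraction=0.25)
    print(f"gated on {tier}: {hours:.1f} h at 25% of line rate")
```

With these made-up numbers the ToR uplinks are the gate: about 5.6 hours for 2 PB if you could use all of that bandwidth, and four times that if you can only spare a quarter of it.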
MS had an interesting paper on data locality in storage last year. Can't recall the title offhand, though.