I could see it being a problem for Google, but Backblaze wants these for archival purposes, not something where there is going to be a lot of reading and writing. The write rate is going to be whatever speed their users upload stuff, divided by the total number of their storage pods. I assume this is relatively small. The read rate is going to be whatever speed their users download restores, divided by the total number of storage pods, which is probably much smaller still.
The assumption here is that data is kept for a long time relative to how frequently it's written and read, so the IO speed probably isn't that big of a deal.
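To put rough numbers on that division, here is a minimal sketch. The aggregate rates and the pod count are made-up placeholders, not Backblaze's actual figures, and `per_pod_rate_mbps` is a hypothetical helper that assumes load spreads evenly across pods.

```python
def per_pod_rate_mbps(aggregate_gbps: float, pod_count: int) -> float:
    """Average per-pod throughput in Mbit/s, assuming an even spread across pods."""
    if pod_count <= 0:
        raise ValueError("pod_count must be positive")
    return aggregate_gbps * 1000.0 / pod_count


# Hypothetical inputs: 10 Gbit/s of total uploads, 1 Gbit/s of total restores,
# spread over 500 pods. None of these numbers come from Backblaze.
ingest = per_pod_rate_mbps(aggregate_gbps=10.0, pod_count=500)
restore = per_pod_rate_mbps(aggregate_gbps=1.0, pod_count=500)
print(f"ingest per pod: {ingest:.1f} Mbit/s, restore per pod: {restore:.1f} Mbit/s")
```

With those placeholder inputs the steady-state load per pod is a small fraction of a 1 Gbit/s port, which is the point being made above.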
No. As you said, port speed doesn't matter for data at rest. What matters is ingest/exfil of data under "exceptional" conditions. The prime cases are cluster or mirror failure. Remirroring existing data to another pod is port-limited, as is ingest on the pods that are remirror targets.
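A back-of-the-envelope sketch of why the port matters during a remirror. The pod capacity, port speeds, and the 80% efficiency factor are all assumptions chosen for illustration, and `remirror_hours` is a hypothetical helper, not anything from a real tool.

```python
def remirror_hours(data_tb: float, port_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to push data_tb terabytes through a single port.

    efficiency is an assumed fraction of line rate that is actually usable.
    """
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = port_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical pod holding 100 TB, copied over a 1 Gbit/s and a 10 Gbit/s port.
for port in (1.0, 10.0):
    print(f"{port:>4.0f} Gbit/s port: {remirror_hours(100, port):.1f} hours")
```

Under these assumptions the 1 Gbit/s case takes well over a week, during which the data sits at reduced redundancy.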
Is there any reason the sources and targets couldn't both be thoroughly distributed throughout the cluster? Nothing says hard drives have to be perfectly replicated; you just need multiple copies of the data. I'm imagining that an HD dies, and the extra copies of what it contained are scattered all over. You re-replicate them by scattering them even further. No one pod has to move any substantial amount of data.
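A toy simulation of that idea, as a sketch only. It assumes each lost chunk has one surviving copy on a uniformly random pod and gets re-replicated to a different random pod; the pod and chunk counts are arbitrary, and `scattered_rebuild_load` is a hypothetical name.

```python
import random
from collections import Counter


def scattered_rebuild_load(num_pods: int, num_chunks: int, seed: int = 0):
    """Return (max chunks sent by any pod, max chunks received by any pod)."""
    rng = random.Random(seed)
    sent, received = Counter(), Counter()
    healthy_pods = range(1, num_pods)  # pod 0 is the one with the dead drive
    for _ in range(num_chunks):
        # Pick a distinct source (surviving copy) and target (new copy).
        source, target = rng.sample(healthy_pods, 2)
        sent[source] += 1
        received[target] += 1
    return max(sent.values()), max(received.values())


num_chunks = 10_000
max_sent, max_received = scattered_rebuild_load(num_pods=101, num_chunks=num_chunks)
print(f"busiest sender moved {max_sent} of {num_chunks} chunks")
print(f"busiest receiver took {max_received} of {num_chunks} chunks")
```

With strict pod-to-pod mirroring, one source would send all 10,000 chunks and one target would receive them; with scattering, the busiest pod handles only a small share, so no single port is the bottleneck.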
Sure. You can absolutely replicate chunks. But then you start kicking the problem upstream. A rack down is a couple of PB, so you end up doing a ton of cross-rack transfers to get your replica counts back up. Now you're gated on NIC/ToR/agg switch throughput. A DC down and you're gated on NICs, ToRs, aggs, and the intra-DC network. And the $$$ keep adding up the further upstream you go.
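A rough sketch of that upstream bottleneck. Every number here (rack count, pods per rack, NIC, ToR uplink, and aggregation capacity) is an assumption for illustration, and the model generously pretends the rebuild gets the whole fabric with no production traffic. `rack_rebuild_hours` is a hypothetical helper.

```python
def rack_rebuild_hours(lost_pb, racks, pods_per_rack, nic_gbps, tor_uplink_gbps, agg_gbps):
    """Return (hours to re-replicate lost_pb petabytes, limiting fabric Gbit/s)."""
    # Each surviving rack is capped by its pods' NICs or its ToR uplink.
    per_rack_gbps = min(pods_per_rack * nic_gbps, tor_uplink_gbps)
    # All racks together are capped by the aggregation layer.
    fabric_gbps = min(racks * per_rack_gbps, agg_gbps)
    seconds = lost_pb * 1e15 * 8 / (fabric_gbps * 1e9)
    return seconds / 3600, fabric_gbps


# Hypothetical: 2 PB rack lost, 20 surviving racks of 10 pods each,
# 10 Gbit/s NICs, 40 Gbit/s ToR uplinks, 400 Gbit/s of aggregation capacity.
hours, fabric = rack_rebuild_hours(2, 20, 10, 10, 40, 400)
print(f"limited to {fabric} Gbit/s, rebuild takes about {hours:.1f} hours")
```

In this made-up setup the pods could collectively push far more than the ToR uplinks and aggregation layer allow, so adding NIC speed does nothing until the switches upstream are upgraded too.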
MS had an interesting paper on data locality in storage last year. Can't recall the title offhand, though.