I had an ultimately unrewarding conversation with Sean Quinlan (of Google GFS fame) about the futility of putting a lot of storage behind such a small channel (in Google's case the numbers were epically Google-scale, of course, but the argument was the same). You waste all of the spindles because the rate of requests coming into the channel, versus the amount of data ops needed to satisfy each request, basically leaves your disks waiting around for the next request to come in from the network. (BTW, that idleness lets you build a nearly perfect emission-rate scheduler for the disk arms, but that's another story.)
What this means is that petabyte pods are going to be nearly useless, although with an external index they can be dense.
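A quick back-of-envelope sketch of the mismatch. All numbers here are illustrative assumptions I picked (45 drives, 150 MB/s per spindle, a 1 Gb/s link), not figures from the conversation above:

```python
# Back-of-envelope: how much of a dense pod's aggregate disk bandwidth
# a small network channel can actually feed. All figures are assumed,
# illustrative values, not anyone's real numbers.

DRIVES = 45        # spindles in the pod (assumed)
DISK_MBPS = 150    # sustained MB/s per drive (assumed)
NET_GBPS = 1       # network link, gigabits per second (assumed)
POD_TB = 1000      # ~1 PB of raw capacity

disk_bw = DRIVES * DISK_MBPS      # aggregate disk bandwidth, MB/s
net_bw = NET_GBPS * 1000 / 8      # network bandwidth, MB/s
utilization = net_bw / disk_bw    # fraction of spindle bandwidth the link can use

# Days to read the entire pod out over the link
drain_days = (POD_TB * 1e6) / net_bw / 86400

print(f"disk bandwidth: {disk_bw:.0f} MB/s, network: {net_bw:.0f} MB/s")
print(f"link keeps the spindles ~{utilization:.1%} busy")
print(f"full read of the pod over the link: ~{drain_days:.0f} days")
```

Under those assumptions the link can keep the spindles only ~2% busy, and reading the whole pod back out takes about three months, which is the "disks waiting around for the network" point in concrete terms.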
The assumption here is that data is kept for a long time relative to how frequently it's written and read, so the IO speed probably isn't that big of a deal.
MS had an interesting paper on data locality in storage last year. Can't recall the title offhand, though.