I don't think they're loss-leading on storage, and even if they are, they don't expect to be for long. AWS (EC2 and S3 in particular) does very well when it comes to profit margins. I suspect they'd like to keep it that way, and that whatever they're charging gives them some slice of profit, however small.
Close. Tiered, yes. But remember who we're talking about.
First, no tape. The areal storage density of tape is lower than that of hard disks. Too many moving parts involved. Too hard to run integrity checks in a scalable, automated fashion without impacting incoming work.
Second, in order to claim the durability that they do (99.999999999%), every stage of the pipeline needs to meet that requirement. That means the "near-line HDD array" for warm, incoming data needs to meet it too. Additionally, if the customer has specified that the data be encrypted, it needs to be encrypted during this staging period as well. It also needs to be able to scale to tens if not hundreds of thousands of concurrent requests per second (though, for something like Glacier, this might be overkill).
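To get a feel for what 11 nines actually buys, here's a back-of-the-envelope sketch. The interpretation (annual loss probability of 1e-11 per object, independently) and the object count are my assumptions, not anything Amazon publishes:

```python
# What "99.999999999% durability" implies, read as an annual per-object
# loss probability of 1e-11. The object count below is a made-up example.

annual_loss_prob = 1 - 0.99999999999   # ~1e-11 per object per year

objects = 10_000_000_000               # say, ten billion objects stored
expected_losses_per_year = objects * annual_loss_prob

print(f"Expected objects lost per year: {expected_losses_per_year:.2f}")
```

At that rate, even ten billion stored objects would see roughly one loss per decade, which is why every staging hop has to be held to the same standard.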
They've already built something that does all that. It's called S3. The upload operations likely proxy to S3 internally (with a bit of magic), and use that as staging space.
Dropbox stores their stuff in S3 a little differently: it's not a 1:1 correspondence between user files and objects under Dropbox's S3 account. The fact that they use S3 as their backing store means very little. It certainly sounds good to have S3 in back when you talk about scalability and durability, but 1) they could just as easily use something else, and 2) depending on their sharding strategy, a single lost object could impact multiple files at the user level.
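A minimal sketch of the kind of sharding I mean, assuming content-addressed block dedup (the block size, function names, and filenames are all made up for illustration):

```python
# Hypothetical block-level sharding: user files are split into
# content-addressed blocks, and each block maps to one backing-store
# object that may be shared by every file containing it.
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tiny for illustration; a real system would use megabytes

def blocks_of(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

# block hash -> set of user files that reference it
block_index = defaultdict(set)

def store(filename: str, data: bytes):
    for block in blocks_of(data):
        digest = hashlib.sha256(block).hexdigest()
        block_index[digest].add(filename)

store("a.txt", b"hello world!")
store("b.txt", b"hello there!")   # shares the b"hell" block with a.txt

# Losing one backing object (one block) can damage multiple user files:
shared = [files for files in block_index.values() if len(files) > 1]
print(shared)  # the b"hell" block is referenced by both files
```

The dedup is exactly what makes the storage efficient, and exactly why a single lost object can ripple out to several user-level files.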
Amazon has ridiculous internal bandwidth. The costly bit is external. The time delay is largely internal buffer time - they need to pull your data out of Glacier (a somewhat slow process) and move it to staging storage. Their staging servers can handle the load, even at peak. GETs are super easy for them, and given that you'll be pulling down a multi-TB file via the Internet, your request will likely span multiple days anyhow - through multiple peaks/non-peaks.
I was referring to the external bandwidth. Even if pulling down a request takes hours, forcing them to start off peak will significantly shift the impact of the incremental demand. I'm guessing that most download requests won't be for your entire archive - someone might have multiple months of rolling backups on Glacier, but it's unlikely they'd ever retrieve more than one set at a time. And in some cases, you might only be retrieving the data for a single user or drive at a time, so it might be 1TB or less. A corporation with fiber could download that in a matter of hours or less.
I get it - but I'm arguing that the amount of egress traffic Glacier customers (in aggregate) are likely to drive is nothing in comparison to what S3 and/or EC2 already does (in aggregate). They'll likely contribute very little to a given region's overall peakiness.
That said - the idea is certainly sound. A friend and I had talked about ways to incentivize S3 customers to shift their inbound and outbound data transfers off-peak (thereby flattening the load curve). A very small percentage of customers drive the peak, and usually by doing something they could easily time-shift.