
I don't think they're loss-leading on storage, but if they are, they don't expect to be for long. AWS (EC2 and S3 in particular) does very well on profit margins. I suspect they'd like to keep it that way, and that whatever they're charging gives them some slice of profit, however small.

-----


Almost spot on. See: http://news.ycombinator.com/item?id=4416065

-----


Close. Tiered, yes. But remember who we're talking about.

First, no tape. The areal storage density of tape is lower than that of hard disks. Too many moving parts involved. Too hard to perform integrity checks on in a scalable, automated fashion without impacting incoming work.

Second, to claim the durability that they do (99.999999999%), every spot along the pipe needs to meet those requirements. That means the "near-line HDD array" for warm, incoming data needs to meet them. Additionally, if the customer has specified that the data be encrypted, it needs to be encrypted during this staging period as well. It also needs to be able to scale to tens if not hundreds of thousands of concurrent requests per second (though, for something like Glacier, that might be overkill).
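
For a sense of what eleven nines actually buys you, here's a quick back-of-the-envelope (the object counts are made up for illustration):

    # 99.999999999% annual durability -> ~1e-11 loss probability per object/year
    annual_loss_prob = 1 - 0.99999999999
    for n_objects in (10_000, 1_000_000, 100_000_000):
        print(n_objects, "objects ->", n_objects * annual_loss_prob,
              "expected object losses/year")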

They've already built something that does all that. It's called S3. The upload operations likely proxy to S3 internally (with a bit of magic), and use that as staging space.

After that, the bottleneck is likely I/O to Glacier's underlying storage - but again, not tapes. See this post for deets: http://news.ycombinator.com/item?id=4416065

-----


See my post up top: http://news.ycombinator.com/item?id=4416065

-----


Data cryonics indeed. One of the early names kicked around (and, indeed, its working name for over a year) was Cold Storage.

-----


Yes. The forthcoming S3 Lifecycle Policy API additions should allow for exactly this, automatically.
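
Once those additions ship, I'd expect a transition rule to look something like this (sketch using boto3; the bucket name, prefix, and 30-day threshold are all hypothetical):

    import boto3

    s3 = boto3.client('s3')
    # Automatically move objects under backups/ to Glacier after 30 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket='my-backup-bucket',
        LifecycleConfiguration={
            'Rules': [{
                'ID': 'archive-old-backups',
                'Filter': {'Prefix': 'backups/'},
                'Status': 'Enabled',
                'Transitions': [{'Days': 30, 'StorageClass': 'GLACIER'}],
            }]
        },
    )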

-----


Dropbox stores its stuff in S3 a little differently: there isn't a 1:1 correspondence between user files and objects under Dropbox's S3 account. The fact that they use S3 as their backing store means very little. It certainly sounds good to have S3 in back when you talk about scalability and durability, but 1) they could just as easily use something else, and 2) depending on their sharding strategy, a single lost object could impact multiple files at the user level.
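
To make the sharding point concrete, here's a toy sketch of content-addressed block storage (the 4 MB block size and SHA-256 keys are my assumptions, not Dropbox's published design):

    import hashlib

    BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size

    def block_keys(path):
        # Split a file into fixed-size blocks keyed by content hash.
        keys = []
        with open(path, 'rb') as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                keys.append(hashlib.sha256(block).hexdigest())
        return keys

    # If a.bin and b.bin share a block, they reference the same stored
    # object - lose that one object and both user-level files are damaged.
    manifests = {path: block_keys(path) for path in ('a.bin', 'b.bin')}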

-----


Right - the point I was trying to make is that he was putting all his eggs in one basket. If anything catastrophic happened to S3, he might lose both his S3 and his Dropbox backups.

If you're going to the effort of having dual backup systems, you may as well find something that can't be impacted by a single disaster.

-----


Ah, right - definitely.

-----


He could just turn on object versioning.
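
Something like this, with boto3 (bucket name hypothetical):

    import boto3

    s3 = boto3.client('s3')
    # With versioning on, overwrites and deletes create new versions
    # instead of destroying the old data, so a bad sync can be rolled back.
    s3.put_bucket_versioning(
        Bucket='my-backup-bucket',
        VersioningConfiguration={'Status': 'Enabled'},
    )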

-----


Amazon has ridiculous internal bandwidth. The costly bit is external. The time delay is largely internal buffer time - they need to pull your data out of Glacier (a somewhat slow process) and move it to staging storage. Their staging servers can handle the load, even at peak. GETs are super easy for them, and given that you'll be pulling down a multi-TB file via the Internet, your request will likely span multiple days anyhow - through multiple peaks/non-peaks.
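
From the customer's side, that staging shows up as an asynchronous job. A minimal retrieval sketch with boto3 (vault name and archive ID are placeholders):

    import boto3

    glacier = boto3.client('glacier')
    # Kick off an archive-retrieval job; Glacier pulls the data into
    # staging storage and the job completes hours later.
    job = glacier.initiate_job(
        vaultName='my-vault',
        jobParameters={'Type': 'archive-retrieval',
                       'ArchiveId': 'EXAMPLE-ARCHIVE-ID'},
    )
    # Once the job is done, read the staged bytes:
    output = glacier.get_job_output(vaultName='my-vault', jobId=job['jobId'])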

-----


I was referring to the external bandwidth. Even if pulling down a request takes hours, forcing them to start off peak will significantly shift the impact of the incremental demand. I'm guessing that most download requests won't be for your entire archive - someone might have multiple months of rolling backups on Glacier, but it's unlikely they'd ever retrieve more than one set at a time. And in some cases, you might only be retrieving the data for a single use or drive at a time, so it might be 1TB or less. A corporation with fiber could download that in a matter of hours or less.
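
Rough transfer-time arithmetic for a 1 TB retrieval (link speeds are illustrative, and real-world throughput will be lower):

    size_bits = 1e12 * 8  # 1 TB, decimal units
    for label, bps in (('100 Mbps', 100e6), ('1 Gbps', 1e9), ('10 Gbps', 10e9)):
        print(label, '->', round(size_bits / bps / 3600, 1), 'hours')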

-----


I get it - but I'm arguing that the amount of egress traffic Glacier customers (in aggregate) are likely to drive is nothing in comparison to what S3 and/or EC2 already does (in aggregate). They'll likely contribute very little to a given region's overall peakiness.

That said - the idea is certainly sound. A friend and I had talked about ways to incentivize S3 customers to do their inbound and outbound data transfers off-peak (thereby flattening it). A very small percentage of the customers drive peak, and usually by doing something they could easily time-shift.

-----


Why wouldn't they stage in S3? :)

-----
