Hacker News | sintaks's comments
sintaks 687 days ago | link | parent | on: Amazon Glacier

I don't think they're loss-leading on storage, but if they are, they don't expect to be for long. AWS (EC2 and S3 in particular) does very well when it comes to profit margins. I suspect they'd like to keep it that way, and that whatever they're charging gives them some slice of profit, however small.

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Almost spot on. See: http://news.ycombinator.com/item?id=4416065

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Close. Tiered, yes. But remember who we're talking about.

First, no tape. The areal storage density of tape is lower than that of hard disks. Too many moving parts involved. Too hard to perform integrity checks on in a scalable, automated fashion without impacting incoming work.

Second, to claim the durability that they do (99.999999999%), every point along the pipe needs to meet that bar. That means the "near-line HDD array" for warm, incoming data needs to meet those requirements. Additionally, if the customer has specified that the data be encrypted, it needs to be encrypted during this staging period as well. It also needs to be able to scale to tens if not hundreds of thousands of concurrent requests per second (though, for something like Glacier, this might be overkill).
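To make eleven nines less abstract, here's a back-of-envelope calculation (the 10,000-object framing mirrors the example Amazon itself has used; the rest is just arithmetic):

```python
# What 99.999999999% annual durability implies, on average.
annual_durability = 0.99999999999
p_loss_per_object_year = 1 - annual_durability   # ~1e-11

objects = 10_000
expected_losses_per_year = objects * p_loss_per_object_year  # ~1e-7

# Years until you'd expect to lose a single object, on average:
years_to_one_loss = 1 / expected_losses_per_year
print(f"~{years_to_one_loss:,.0f} years per lost object")  # ~10 million years
```

In other words: store 10,000 objects and, if the durability claim holds end to end, you'd expect to lose one object roughly every ten million years.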

They've already built something that does all that. It's called S3. The upload operations likely proxy to S3 internally (with a bit of magic), and use that as staging space.

After that, the bottleneck is likely I/O to Glacier's underlying storage - but again, not tapes. See this post for deets: http://news.ycombinator.com/item?id=4416065

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

See my post up top: http://news.ycombinator.com/item?id=4416065

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Data cryonics indeed. One of the early names kicked around (and, indeed, its working name for over a year) was Cold Storage.

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Yes. The forthcoming S3 Lifecycle Policy API additions should allow for exactly this, automatically.
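As a sketch of what such a policy could look like (assuming the boto3 client API; the bucket name, rule ID, and prefix below are hypothetical), a rule transitioning objects to Glacier after 30 days:

```python
# Hypothetical lifecycle rule: move objects under backups/ to Glacier
# after 30 days. Structure follows the boto3 S3 lifecycle schema.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-old-backups",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Applying it requires credentials, so the call is shown but not run:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-backup-bucket",
#       LifecycleConfiguration=lifecycle_config,
#   )
```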

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Dropbox stores their stuff in S3 a little differently. There isn't a 1:1 correspondence between user files and objects under Dropbox's S3 account. The fact that they use S3 as their backing store means very little. It certainly sounds good to have S3 in back when you talk about scalability and durability, but 1) they could just as easily use something else, and 2) depending on their sharding strategy, a single lost object could impact multiple files at the user level.
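A toy model of point 2 (purely hypothetical; nobody outside Dropbox knows their actual sharding): if files are split into content-addressed blocks and deduplicated, one lost block object can corrupt several user-level files at once.

```python
import hashlib

# Toy content-addressed store: files are split into fixed-size blocks,
# each stored once under its hash. (Illustrative only; not Dropbox's
# real scheme.)
BLOCK_SIZE = 4

def blocks_of(data: bytes):
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

store = {}       # block hash -> block bytes (one stored "object" per block)
manifests = {}   # filename -> ordered list of block hashes

for name, data in {
    "report.doc": b"AAAABBBBCCCC",
    "photo.jpg":  b"BBBBDDDD",     # shares the "BBBB" block with report.doc
}.items():
    hashes = []
    for blk in blocks_of(data):
        h = hashlib.sha256(blk).hexdigest()
        store[h] = blk             # dedup: identical blocks stored once
        hashes.append(h)
    manifests[name] = hashes

# Lose the single object holding the shared "BBBB" block:
del store[hashlib.sha256(b"BBBB").hexdigest()]

damaged = [f for f, hs in manifests.items() if any(h not in store for h in hs)]
print(damaged)  # both files are now unreadable
```

One object gone, two files broken; the blast radius scales with how aggressively blocks are shared.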

-----

ghshephard 686 days ago | link

Right - the point I was trying to make is that he was putting all his eggs in one basket. If anything catastrophic happened to S3, he might lose both his S3 and his Dropbox backups.

If you're going to the effort of having dual backup systems, you may as well find something that can't be impacted by a single disaster.

-----

sintaks 686 days ago | link

Ah, right - definitely.

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

He could just turn on object versioning.
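Turning it on is a single API call; a sketch assuming the boto3 client (the bucket name is hypothetical, and the call needs credentials, so it's shown but not executed):

```python
# Hypothetical request to enable object versioning, so overwrites and
# deletes keep prior versions recoverable. Shape follows the boto3
# put_bucket_versioning schema.
versioning_request = {
    "Bucket": "my-backup-bucket",
    "VersioningConfiguration": {"Status": "Enabled"},
}

# With credentials configured, this would be:
#   import boto3
#   boto3.client("s3").put_bucket_versioning(**versioning_request)
```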

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Amazon has ridiculous internal bandwidth. The costly bit is external. The time delay is largely internal buffer time - they need to pull your data out of Glacier (a somewhat slow process) and move it to staging storage. Their staging servers can handle the load, even at peak. GETs are super easy for them, and given that you'll be pulling down a multi-TB file via the Internet, your request will likely span multiple days anyhow - through multiple peaks/non-peaks.

-----

tomkarlo 686 days ago | link

I was referring to the external bandwidth. Even if pulling down a request takes hours, forcing them to start off peak will significantly shift the impact of the incremental demand. I'm guessing that most download requests won't be for your entire archive - someone might have multiple months of rolling backups on Glacier, but it's unlikely they'd ever retrieve more than one set at a time. And in some cases, you might only be retrieving the data for a single use or drive at a time, so it might be 1TB or less. A corporation with fiber could download that in a matter of hours or less.
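The fiber math above checks out; a quick back-of-envelope, assuming 1 TB = 10^12 bytes and a fully saturated link:

```python
def hours_to_download(size_bytes: float, link_bps: float) -> float:
    """Transfer time in hours, assuming the link is fully saturated."""
    return size_bytes * 8 / link_bps / 3600

one_tb = 1e12  # 10^12 bytes

print(f"1 TB @ 1 Gbps:  {hours_to_download(one_tb, 1e9):.1f} h")   # ~2.2 h
print(f"1 TB @ 10 Gbps: {hours_to_download(one_tb, 1e10):.2f} h")  # ~0.22 h
```

So a corporation with a 1 Gbps line clears 1 TB in a little over two hours, well within "a matter of hours."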

-----

sintaks 686 days ago | link

I get it - but I'm arguing that the amount of egress traffic Glacier customers (in aggregate) are likely to drive is nothing in comparison to what S3 and/or EC2 already does (in aggregate). They'll likely contribute very little to a given region's overall peakiness.

That said - the idea is certainly sound. A friend and I had talked about ways to incentivize S3 customers to do their inbound and outbound data transfers off-peak (thereby flattening the load curve). A very small percentage of the customers drive peak, and usually by doing something they could easily time-shift.

-----

sintaks 687 days ago | link | parent | on: Amazon Glacier

Why wouldn't they stage in S3? :)

-----

