I thought this was debunked at the time?

When I was running Exchange systems, our biggest challenge was delivering IOPS. We had to use a SAN, and we wasted significant storage because we'd spend our IOPS budget at 40-60% of storage capacity.

I figured at their scale they would have similar problems.




IOPS isn't important for Glacier. You just upload to some buffer and then eventually move it to the slow storage.

Reading is pretty slow from Glacier.


He meant: what if EBS has the same issue as his Exchange servers? To explain in more detail: you have 10TB of disk space with 10,000 IOPS; your users buy 4TB with 10,000 IOPS, and then you have 6TB of storage wasted.

If Amazon has that problem with EBS, then selling that storage capacity as Glacier and using just the idle IOPS (or leaving a small bit reserved) allows them to sell capacity that would otherwise just be useless.
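To put rough numbers on that stranded-capacity argument, here is a minimal sketch using only the illustrative figures from the comment above, not any real EBS or Glacier data:

```python
# Illustrative only: how much capacity is stranded when IOPS, not space,
# is the binding constraint on an array. Numbers are the ones from the
# comment above, not anything Amazon has published.

def stranded_capacity_tb(total_tb, total_iops, sold_tb, sold_iops):
    """Capacity left idle once the IOPS budget is exhausted."""
    if sold_iops >= total_iops:
        # IOPS are gone, so the remaining space can't be sold as primary storage.
        return total_tb - sold_tb
    return 0

# 10TB / 10,000 IOPS array; customers buy 4TB that consumes all the IOPS.
print(stranded_capacity_tb(10, 10_000, 4, 10_000))  # -> 6 TB idle: a candidate for cold storage
```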


Aren't IOPS incredibly expensive on Glacier? There was that guy who paid $150 for a retrieval. https://medium.com/@karppinen/how-i-ended-up-paying-150-for-...


That's the point. They aren't trying to sell IO with Glacier, since they've already saturated that with EBS. They just want to sell the spare storage capacity, ideally in a write-once, read-never use case. That way they can get 100% utilization out of the drives.

So if you use a lot of IO with Glacier, they are going to charge you like crazy, since you're potentially impacting EBS customers.


I'm that guy. I should update the post; Amazon "fixed" the retrieval fees in late 2016 and I would've paid less than a dollar had the current pricing scheme been in effect when I did the retrieval.
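For anyone curious why the old fees could blow up like that, here is a rough sketch of how the pre-2016 retrieval charge was computed, as I remember it. The 5% free allowance, the $0.01/GB rate, and the proration rule are from memory and purely illustrative; see the post and AWS's own docs for the real details.

```python
# Rough reconstruction of the old (pre-late-2016) Glacier retrieval billing,
# which extrapolated your *peak hourly* retrieval rate across the whole month.
# All constants are from memory / illustrative, not authoritative pricing.

HOURS_IN_MONTH = 720
PRICE_PER_GB = 0.01        # illustrative per-GB retrieval rate
FREE_FRACTION = 0.05       # ~5% of stored data retrievable free per month, prorated daily

def old_retrieval_fee(stored_gb, retrieved_gb, retrieval_window_hours):
    peak_hourly_gb = retrieved_gb / retrieval_window_hours
    free_hourly_gb = stored_gb * FREE_FRACTION / 30 / 24
    billable = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable * HOURS_IN_MONTH * PRICE_PER_GB

# Pull a ~60 GB archive back over 4 hours and the whole month is billed at that peak rate.
print(round(old_retrieval_fee(stored_gb=60, retrieved_gb=60, retrieval_window_hours=4), 2))
```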


Sorry, I didn't really finish the point.

With Exchange, we had all of this expensive, reliable SAN storage that would have been perfect for a low-requirement, Glacier-like solution. Unfortunately, we lacked the ops mojo to pull it off.


Archive is not about IOPS. It's about streaming bandwidth.

For example, I used to look after a Quantum iScaler 24-drive robot; each drive was capable of kicking out ~100 megabytes a second, and it was more than capable of saturating a 40-gig pipe.

However, random IO was shite; it could take up to 20 minutes to get to a random file. (Each tape is stored in a caddy of (from memory) 10 tapes, there is contention on the drives, and then there's spooling to the right place on the tape.)

Email is essentially random IO on a long tail. So, unless your users want a 20-minute delay in accessing last year's emails, I doubt it's the right fit.

The same applies to optical disk packs (although the spool time is much less).
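To put rough numbers on that streaming-vs-random trade-off: the drive count, per-drive throughput, and worst-case seek below are the figures from this comment; the file sizes are made up for illustration.

```python
# Illustrative: why a tape robot suits bulk archive restores but not live email.
# Drive count, per-drive throughput, and worst-case seek are the commenter's figures.

DRIVES = 24
DRIVE_MB_PER_S = 100
WORST_SEEK_S = 20 * 60          # caddy load + drive contention + spooling to position

def time_to_read_s(size_mb, random_access):
    """Very rough time to get a file back: seek (if random) plus transfer on one drive."""
    seek = WORST_SEEK_S if random_access else 0
    return seek + size_mb / DRIVE_MB_PER_S

# Bulk restore of 1 TB streamed across all drives: bandwidth-bound.
print(f"{1_000_000 / (DRIVES * DRIVE_MB_PER_S) / 60:.0f} min to stream 1 TB")

# One 5 MB email from last year: seek-bound, roughly the whole 20 minutes.
print(f"{time_to_read_s(5, random_access=True) / 60:.0f} min for a single old email")
```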


I think that's the point - the email is using up all of the IOPS. There would be a small amount of IOPS left over that could deal with streaming data, which is unlikely to be accessed on a regular basis. The storage not used by email would then be used for the archive - data that's pretty much write-only.


It makes sense for email when you aren't giving your users access to their old email, but storing it for regulatory compliance purposes.


Why do you care how fast you can read it back when you're storing it for regulatory purposes? Isn't that a sunk cost? Buy high capacity and high reliability, and don't worry about the read speed?


With SANs, the IOPS budget is a function of your hardware config. If you want more IOPS, you get more RAM/SSD involved. More importantly, Amazon gets to sell EBS on its terms: a specific amount of IOPS with a specific amount of storage. If you want more IOPS, you have to buy more EBS. The "wasted storage" you're thinking of would be on your instance using EBS, not in EBS itself.
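To make "buy more EBS to get more IOPS" concrete, here's a sketch assuming the gp2-style model of roughly 3 baseline IOPS per provisioned GB with a per-volume cap; the exact constants have changed over time, so treat them as assumptions rather than current AWS terms.

```python
# Illustrative: with gp2-style EBS volumes, baseline IOPS scale with provisioned size
# (~3 IOPS per GB up to a per-volume cap). Constants are assumptions, not current AWS pricing.

IOPS_PER_GB = 3
MIN_IOPS = 100
MAX_IOPS = 10_000   # per-volume cap around the time of this thread; treat as illustrative

def gb_needed_for_iops(target_iops):
    """Smallest gp2-style volume (GB) whose baseline IOPS meets the target."""
    if target_iops > MAX_IOPS:
        raise ValueError("beyond the per-volume cap: use provisioned IOPS or stripe volumes")
    if target_iops <= MIN_IOPS:
        return 1
    return -(-target_iops // IOPS_PER_GB)   # ceiling division

# Needing 6,000 IOPS means provisioning ~2 TB, whether or not you need 2 TB of space;
# so any "wasted" capacity sits on the customer's volume, not on Amazon's side.
print(gb_needed_for_iops(6_000))   # -> 2000
```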



