I thought this was debunked at the time?

When I was running Exchange systems, our biggest challenge was delivering IOPS. We had to use a SAN, and we wasted significant storage because we'd spend our IOPS budget at 40-60% of storage capacity.

I figured at their scale they would have similar problems.




IOPS isn't important for Glacier. You just upload to some buffer and then eventually move it to the slow storage.

Reading is pretty slow from Glacier.


He meant: what if EBS has the same issue as his Exchange servers? To explain in more detail: you have 10TB of disk space with 10,000 IOPS; your users buy 4TB with 10,000 IOPS, and then you have 6TB of storage wasted.

If Amazon has that problem with EBS, then selling that storage capacity as Glacier and using just the idle IOPS (or leaving a small bit reserved) allows them to sell capacity that would otherwise just be useless.
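To put rough numbers on that stranded-capacity argument, here is a minimal sketch using only the illustrative figures from the comment above, not any real EBS or Glacier data:

```python
# Illustrative only: how much capacity is stranded when IOPS, not space,
# is the binding constraint on an array. Numbers are the ones from the
# comment above, not anything Amazon has published.

def stranded_capacity_tb(total_tb, total_iops, sold_tb, sold_iops):
    """Capacity left idle once the IOPS budget is exhausted."""
    if sold_iops >= total_iops:
        # IOPS are gone, so the remaining space can't be sold as primary storage.
        return total_tb - sold_tb
    return 0

# 10TB / 10,000 IOPS array; customers buy 4TB that consumes all the IOPS.
print(stranded_capacity_tb(10, 10_000, 4, 10_000))  # -> 6 TB idle: a candidate for cold storage
```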


Aren't IOPS incredibly expensive on Glacier? There was that guy who paid $150 for a retrieval. https://medium.com/@karppinen/how-i-ended-up-paying-150-for-...


That's the point. They aren't trying to sell IO with Glacier, since they've already saturated that with EBS. They just want to sell the spare storage capacity, ideally in a write-once, read-never use case. That way they can get 100% utilization out of the drives.

So if you use a lot of IO with Glacier, they are going to charge you like crazy, since you're potentially impacting EBS customers.


I'm that guy. I should update the post; Amazon "fixed" the retrieval fees in late 2016 and I would've paid less than a dollar had the current pricing scheme been in effect when I did the retrieval.
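For anyone curious why the old fees could blow up like that, here is a rough sketch of how the pre-2016 retrieval charge was computed, as I remember it. The 5% free allowance, the $0.01/GB rate, and the proration rule are from memory and purely illustrative; see the post and AWS's own docs for the real details.

```python
# Rough reconstruction of the old (pre-late-2016) Glacier retrieval billing,
# which extrapolated your *peak hourly* retrieval rate across the whole month.
# All constants are from memory / illustrative, not authoritative pricing.

HOURS_IN_MONTH = 720
PRICE_PER_GB = 0.01        # illustrative per-GB retrieval rate
FREE_FRACTION = 0.05       # ~5% of stored data retrievable free per month, prorated daily

def old_retrieval_fee(stored_gb, retrieved_gb, retrieval_window_hours):
    peak_hourly_gb = retrieved_gb / retrieval_window_hours
    free_hourly_gb = stored_gb * FREE_FRACTION / 30 / 24
    billable = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable * HOURS_IN_MONTH * PRICE_PER_GB

# Pull a ~60 GB archive back over 4 hours and the whole month is billed at that peak rate.
print(round(old_retrieval_fee(stored_gb=60, retrieved_gb=60, retrieval_window_hours=4), 2))
```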


Sorry, I didn't really finish the point.

With Exchange, we had all of this expensive, reliable SAN storage that would have been perfect for a low-requirement, Glacier-like solution. Unfortunately, we lacked the ops mojo to pull it off.


Archive is not about IOPS. It's about streaming bandwidth.

For example, I used to look after a Quantum iScaler 24-drive robot; each drive was capable of kicking out ~100 megabytes a second, and it was more than capable of saturating a 40-gig pipe.

However, random IO was shite; it could take up to 20 minutes to get to a random file. (Each tape is stored in a caddy of (from memory) 10 tapes, there is contention on the drives, and then there's spooling to the right place on the tape.)

Email is essentially random IO on a long tail. So, unless your users want a 20-minute delay in accessing last year's emails, I doubt it's the right fit.

The same applies to optical disk packs (although the spool time is much less).
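To put rough numbers on that streaming-vs-random trade-off: the drive count, per-drive throughput, and worst-case seek below are the figures from this comment; the file sizes are made up for illustration.

```python
# Illustrative: why a tape robot suits bulk archive restores but not live email.
# Drive count, per-drive throughput, and worst-case seek are the commenter's figures.

DRIVES = 24
DRIVE_MB_PER_S = 100
WORST_SEEK_S = 20 * 60          # caddy load + drive contention + spooling to position

def time_to_read_s(size_mb, random_access):
    """Very rough time to get a file back: seek (if random) plus transfer on one drive."""
    seek = WORST_SEEK_S if random_access else 0
    return seek + size_mb / DRIVE_MB_PER_S

# Bulk restore of 1 TB streamed across all drives: bandwidth-bound.
print(f"{1_000_000 / (DRIVES * DRIVE_MB_PER_S) / 60:.0f} min to stream 1 TB")

# One 5 MB email from last year: seek-bound, roughly the whole 20 minutes.
print(f"{time_to_read_s(5, random_access=True) / 60:.0f} min for a single old email")
```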


I think that's the point - the email is using up all of the IOPS. There would be a small amount of IOPS left over that could deal with streaming data, which is unlikely to be accessed on a regular basis. The storage not used by email would then be used for the archive - data that's pretty much write-only.


It makes sense for email when you aren't giving your users access to their old email, but storing it for regulatory compliance purposes.


Why do you care how fast you can read it back when you're storing it for regulatory purposes? Isn't that a sunk cost? Buy high capacity and high reliability, and don't worry about the read speed?


With SANs, the IOPS budget is a function of your hardware config. If you want more IOPS, you get more RAM/SSD involved. More importantly, Amazon gets to sell EBS on its terms: a specific amount of IOPS with a specific amount of storage. If you want more IOPS, you have to buy more EBS. The "wasted storage" you're thinking of would be on your instance using EBS, not in EBS itself.
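To make "buy more EBS to get more IOPS" concrete, here's a sketch assuming the gp2-style model of roughly 3 baseline IOPS per provisioned GB with a per-volume cap; the exact constants have changed over time, so treat them as assumptions rather than current AWS terms.

```python
# Illustrative: with gp2-style EBS volumes, baseline IOPS scale with provisioned size
# (~3 IOPS per GB up to a per-volume cap). Constants are assumptions, not current AWS pricing.

IOPS_PER_GB = 3
MIN_IOPS = 100
MAX_IOPS = 10_000   # per-volume cap around the time of this thread; treat as illustrative

def gb_needed_for_iops(target_iops):
    """Smallest gp2-style volume (GB) whose baseline IOPS meets the target."""
    if target_iops > MAX_IOPS:
        raise ValueError("beyond the per-volume cap: use provisioned IOPS or stripe volumes")
    if target_iops <= MIN_IOPS:
        return 1
    return -(-target_iops // IOPS_PER_GB)   # ceiling division

# Needing 6,000 IOPS means provisioning ~2 TB, whether or not you need 2 TB of space;
# so any "wasted" capacity sits on the customer's volume, not on Amazon's side.
print(gb_needed_for_iops(6_000))   # -> 2000
```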



