Glacier is not actually tape, the fancy tape-robot videos notwithstanding. Most ...

remram · 2024-05-20T22:54:16 1716245656

Interesting. Why does it take hours to retrieve then? Is network/disk bandwidth really that bad? Also, do you have a source for this?

swiftcoder · 2024-05-21T08:29:08 1716280148

It's mostly just bandwidth prioritization, slotting large transfers in when there is excess bandwidth.

You can tell this is the case for at least the flexible retrieval tier, because small objects can be returned in a few minutes, whereas larger requests take hours - if the files were actually on a tape drive somewhere, small requests couldn't be fulfilled dramatically faster than large ones, given that tape has shitty random-access performance.
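To make that argument concrete, here is a rough back-of-the-envelope sketch. Every number in it (mount/seek time, streaming rate, spare bandwidth, scheduling overhead) is a made-up assumption for illustration, not a measured or published AWS figure, and the function names are hypothetical. It only shows the shape of the reasoning: with a large fixed positioning cost, small and large requests end up within the same order of magnitude, whereas with a trickle of spare bandwidth, latency scales with size.

```python
# Toy latency models. All constants are invented for illustration only.

def tape_latency_s(size_bytes, mount_and_seek_s=300.0, stream_bps=3e8):
    """Tape-like model: a large fixed mount/seek cost, then fast streaming."""
    return mount_and_seek_s + size_bytes / stream_bps


def spare_bandwidth_latency_s(size_bytes, overhead_s=60.0, spare_bps=1e7):
    """Disk-like model: data is already online, but the transfer only gets
    whatever bandwidth is left over, so time grows with object size."""
    return overhead_s + size_bytes / spare_bps


SMALL = 1e6    # 1 MB object (assumed)
LARGE = 1e11   # 100 GB request (assumed)

# Ratio of large-request latency to small-request latency under each model.
tape_ratio = tape_latency_s(LARGE) / tape_latency_s(SMALL)
spare_ratio = spare_bandwidth_latency_s(LARGE) / spare_bandwidth_latency_s(SMALL)

print(f"tape model:            large/small latency ratio = {tape_ratio:.1f}")
print(f"spare-bandwidth model: large/small latency ratio = {spare_ratio:.1f}")
```

Under these assumed numbers the tape model gives a small ratio and the spare-bandwidth model a large one, and the "minutes for small, hours for large" behaviour looks like the second.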

(I used to work on a downstream team at AWS)

remram · 2024-05-21T13:39:54 1716298794

I believe you; I can see how it would make sense for AWS to create a tier that exploits the spare capacity in S3 disk bandwidth, just like they did for spare EC2 VM capacity with spot instances. Still, it doesn't make intuitive sense to me how the performance AND the price can be so far from regular S3. That's why I'd love a longer write-up if you know of any.

It's also weird that the retrieval gives you a regular fast S3 object you can then access. Given that it's already on that hardware, is a copy even happening?

magarnicle · 2024-05-21T00:04:14 1716249854

To discourage access, maybe? That pushes you to use more expensive options.

remram · 2024-05-21T17:05:49 1716311149

This doesn't hold: not having the cheap option at all would push you toward the expensive option even more...