Great to see Amazon employees being allowed to talk openly about how S3 works behind the scenes. I would love to hear more about how Glacier works. As far as I know, they have never revealed what the underlying storage medium is, leading to a lot of wild speculation (tape? offline HDDs? custom HDDs?).
Amazon engineer here - can confirm that Glacier transcodes all data onto the backs of the shells of the turtles that hold up the universe. Infinite storage medium, if a bit slow.
Glacier is a big "keep your lips sealed" one. I'd love AWS to talk about everything there, and the entire journey it was on because it is truly fascinating.
My impression is that the ambiguity gives them freedom to implement in different ways across different regions and over time.
The original Glacier was very clearly tape, but given the instant retrieval capabilities the newer S3-Glacier tiers are most likely just low-margin HDDs, maybe with some dynamic powering on and off of drives/servers.
I’m sure it’s a mix. Back when it launched there were a number of rumours about it being Blu-ray based. The discs had similar capacity for the space used compared to tapes and were considered a very physically stable storage medium, but had long access times as they would need to be physically moved, like tape, which would explain the retrieval times.
The perceived value of results is higher if they take longer to load; users feel the computer is hard at work. If it's true for flight searches, it's true for backup systems.
Reminds me of the automated phone systems that play random keystrokes while telling you they’re looking up your info - people don’t trust it if they come back instantly, I guess.
I recall at launch just about the only implementation detail that _was_ publicly given was that it did not involve tape. That's going to be difficult to dig up a cite on years later.
No idea how it's evolved over the years, so for all I know it's tape based these days.
Never officially stated, but frequent leaks from insiders confirm that Glacier is based on Very Large Arrays of Wax Phonograph Records (VLAWPR) technology.
We came up with that idea in Glacier during the run-up to April one year (2014, I think?) and half-jokingly suggested it as an April Fools' Day joke, but Amazon quite reasonably decided against doing such jokes.
One of the tagline ideas we had was "8 out of 10 customers say they prefer the feel of their data after it is restored."
This would have been incredible. But I guess I get the angle of not wanting to risk pissing off the audiophile CTO paying you 10 figures per month. Cause he can TOTALLY hear the difference listening to Dark Side of the Moon on vinyl via Monster Cables.
It's honestly super impressive that it's never leaked. All it takes is one engineer getting drunk and spouting off. In much higher stakes, a soldier in Massachusetts is about to go to jail for a long time for leaking national security intel on Discord to look cool to his gamer buddies. I would have expected details on Glacier to come out by now.
I don't expect highly paid engineers to leak it, but a random contractor at a datacenter or a supplier would eventually leak it if they used a special storage device other than HDDs/SSDs. Since we haven't seen any leaks, I suspect that it's based on HDDs with a very long I/O waitlist.
HSM (hierarchical storage management) is a neat technology, and it has been implemented in lots of ways over the years. But it starts with a shim that inserts some other technology into the middle of a typical POSIX filesystem. It has to tolerate the time penalty for recalling data from your favored HSM'd medium, but that's kind of the point. You can do it with a lower-tier disk, tape, wax cylinder, etc. There's no reason it wouldn't be tape, though: tape capacity has kept up, and HPSS continues to be developed. The traditional tape library vendors still pump out robotic tape libraries.
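To make the shim idea concrete, here's a toy Python sketch. To be clear, this is not how Glacier or HPSS works; `TieredStore`, `migrate`, and `recall_delay_s` are names I made up for illustration, and the two tiers are just dicts. The only point is that the caller reads by name and the shim quietly pays the recall penalty when the data has been pushed down to the slow tier.

```python
import time


class TieredStore:
    """Toy HSM shim (hypothetical): reads by name, recalls cold data on demand."""

    def __init__(self, recall_delay_s=0.0):
        self.fast = {}  # stand-in for the primary disk tier: name -> bytes
        self.slow = {}  # stand-in for tape / cheaper disk: name -> bytes
        self.recall_delay_s = recall_delay_s  # simulated recall penalty
        self.recalls = 0  # how many times we had to go to the slow tier

    def write(self, name, data):
        # New data always lands on the fast tier first.
        self.fast[name] = data

    def migrate(self, name):
        # Push data down to the slow tier; only the name stays visible to callers.
        self.slow[name] = self.fast.pop(name)

    def read(self, name):
        if name not in self.fast:
            if name not in self.slow:
                raise FileNotFoundError(name)
            # Cold read: pay the time penalty, then stage the data back.
            time.sleep(self.recall_delay_s)
            self.fast[name] = self.slow[name]
            self.recalls += 1
        return self.fast[name]
```

Something like `store.migrate("backup.tar")` followed by `store.read("backup.tar")` returns the same bytes as before, just slower the first time. Swap the slow dict for a tape robot and a multi-hour `recall_delay_s` and you have the general shape of it.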
I remember installing 20+ fully configured IBM 3494 tape libraries for AT&T in the mid-2000s. These things were 20+ frames long with dual accessors (robots) in each. The robots were able to push a dead accessor out of the way into a "garage" and continue working in the event one of them died (and this actually worked). Someone will have to invent a cheaper medium of storage than tape before tape will ever die.
Glacier was originally using actual glaciers as a storage medium since they have been around forever. But then climate change happened, so they quickly shifted to tiered storage of tape and hard drives.