
Great to see Amazon employees being allowed to talk openly about how S3 works behind the scenes. I would love to hear more about how Glacier works. As far as I know, they have never revealed what the underlying storage medium is, leading to a lot of wild speculation (tape? offline HDDs? custom HDDs?).



Amazon engineer here - can confirm that Glacier transcodes all data on to the backs of the shells of the turtles that hold up the universe. Infinite storage medium, if a bit slow.


Shh....


Blu-ray discs are thought to be the key: https://storagemojo.com/2014/04/25/amazons-glacier-secret-bd...

Some people disagree though. It’s still an unknown.


Glacier is a big "keep your lips sealed" one. I'd love AWS to talk about everything there, and the entire journey it was on because it is truly fascinating.


My impression is that the ambiguity gives them freedom to implement in different ways across different regions and over time.

The original Glacier was very clearly tape, but given the instant retrieval capabilities the newer S3-Glacier tiers are most likely just low-margin HDDs, maybe with some dynamic powering on and off of drives/servers.


I’m sure it’s a mix. Back when it launched there were a number of rumours about it being Blu-ray based. The discs had a similar capacity per unit of space to tape and were considered a very physically stable storage medium, but had long access times because they would need to be physically moved, like tape, which would explain the retrieval times.


I don't buy the Blu-ray thing largely because of price, but also because Amazon is quite a conservative company and tape is the more obvious choice.


Glacier is just run on S3 with some sleep statements added.


The perceived value of results is higher if they take longer to load; users feel the computer is hard at work. If it's true for flight searches, it's true for backup systems.


Reminds me of the automated phone systems that play random keystrokes while telling you they’re looking up your info - people don’t trust it if they come back instantly, I guess.


I am going to choose to believe this


I recall at launch just about the only implementation detail that _was_ publicly given was that it did not involve tape. That's going to be difficult to dig up a cite on years later.

No idea how it's evolved over the years, so for all I know it's tape based these days.


Never officially stated, but frequent leaks from insiders confirm that Glacier is based on Very Large Arrays of Wax Phonograph Records (VLAWPR) technology.


We came up with that idea in Glacier during the run up to April one year (2014, I think?), half jokingly suggested it as an April Fool's Day Joke, but Amazon quite reasonably decided against doing such jokes.

One of the tag line ideas we had was "8 out of 10 customers say they prefer the feel of their data after it is restored"


This would have been incredible. But I guess I get the angle of not wanting to risk pissing off the audiophile CTO paying you 10 figures per month. Cause he can TOTALLY hear the difference listening to Dark Side of the Moon on vinyl via Monster Cables.


The real problem is the lack of Star Wars references.


It's honestly super impressive that it's never leaked. All it takes is one engineer getting drunk and spouting off. In much higher stakes, a soldier in Massachusetts is about to go to jail for a long time for leaking national security intel on Discord to look cool to his gamer buddies. I would have expected details on Glacier to come out by now.


I don't expect highly paid engineers to leak it, but a random contractor at a datacenter or a supplier would eventually leak it if they used a special storage device other than HDD/SSD. Since we don't see any leaks, I suspect that it's based on HDD, with a very long I/O wait list.


HSM (hierarchical storage management) is a neat technology, and it has been implemented in lots of ways over the years. But it starts with a shim to insert some other technology into the middle of a typical POSIX filesystem. It has to tolerate the time penalty for data recovery from your favored HSM'd medium, but that's kind of the point. You can do it with a lower tier disk, tape, wax cylinder, etc. There's no reason it wouldn't be tape though; tape capacity has kept up and HPSS continues to be developed. The traditional tape library vendors still pump out robotic tape libraries.
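
To make the shim idea concrete, here is a toy Python sketch. It is not how Glacier, HPSS, or any real HSM product works; the `TieredStore` class, its method names, and the example path are all made up for illustration. It only shows the general pattern: migrated files leave a stub behind, and reading a stub triggers a (slow) recall from the archive tier.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy HSM shim (hypothetical): a fast tier in front of a slow archive tier."""
    fast: dict = field(default_factory=dict)     # stands in for online disk
    archive: dict = field(default_factory=dict)  # stands in for tape or similar
    recalls: int = 0                             # number of slow recalls so far

    def write(self, path: str, data: bytes) -> None:
        # New data always lands on the fast tier first.
        self.fast[path] = data

    def migrate(self, path: str) -> None:
        # Move the data to the archive tier and leave a stub (None) behind.
        self.archive[path] = self.fast[path]
        self.fast[path] = None

    def read(self, path: str) -> bytes:
        data = self.fast[path]  # raises KeyError if the path was never written
        if data is None:
            # Stub hit: recall from the archive. A real system would block
            # here for minutes or hours while the medium is fetched.
            data = self.archive[path]
            self.fast[path] = data
            self.recalls += 1
        return data


store = TieredStore()
store.write("/backups/2014.tar", b"old data")
store.migrate("/backups/2014.tar")
first = store.read("/backups/2014.tar")   # stub hit, triggers a recall
second = store.read("/backups/2014.tar")  # now served from the fast tier
```

The caller never sees which tier the data was on, only the latency, which is why the archive medium can be swapped without anyone outside noticing.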

I remember installing 20+ fully configured IBM 3494 tape libraries for AT&T in the mid-2000s. These things were 20+ frames long with dual accessors (robots) in each. The robots were able to push a dead accessor out of the way into a "garage" and continue working in the event one of them died (and this actually worked). Someone will have to invent a cheaper medium of storage than tape before tape will ever die.


Glacier was originally using actual glaciers as a storage medium since they have been around forever. But then climate change happened so they quickly shifted to tiered storage of tape and hard drives.


It's just low powered hard drives that aren't turned on all the time. Nothing special.


Are there any public details on how Azure or GCP do archival storage?


Just look at other clouds. I doubt Amazon is doing anything special. At least their pricing doesn't reflect anything special.



