Hacker News
Amazon’s Glacier secret: BDXL? (storagemojo.com)
276 points by hemancuso on April 25, 2014 | 132 comments

The author quickly dismisses hard drives because, at the time of the Glacier launch, SMR drives were too expensive due to the Thai flood. But after a few years of running S3 and EC2, Amazon must have tons of left-over hard drives which are now simply too old for a 24/7 service.

So what do you do with those three-year-old 1 TB hard drives whose power-consumption-to-space ratio is no longer good enough? You can of course destroy them. Or you actually build a disk drive robot, fill the disks with Glacier data, simply spin them down and store them away. Zero cost to buy the drives, zero cost for power consumption. Then add a 3-4 hour retrieval delay to ensure that those old disks don't have to spin up more than 6-8 times a day even in the worst case.

But that's just my personal theory anyway.

This is probably close, at least for launch.

The problem with this theory, however, is that tape still would have been cheaper with roughly the same footprint, and tape has the benefit of often being forward-compatible too: as the drives improve, so does the storage capacity of the tape.

The author dismisses the idea that Amazon is using tape, but I haven't seen much evidence to support that.

Maybe the reality is they use a bit of everything. A robotic disk library, a robotic tape library, maybe a robotic optical library. Maybe they're secretly ahead of the rest of current tape technology and are getting 20TB out of a single tape. They could be using custom hard drives, or even have a robotic platter library.

One thing for sure is that it's not active disk drives, and I don't believe they keep all the data on optical discs alone, given that optical discs degrade much more quickly than magnetic storage.

I don't understand how tape infrastructure would be cheaper than free used disks that are powered on only 4-5 times a day, mostly for very short periods of time (some disks would actually be off for months). Disks are easy to use, they are not that easy to damage, and you can keep each copy of the data synchronised across 2-3 different DCs easily over Amazon's network at night (low network usage).

The server and network to host those disks ends up costing much more than the disks, so even MAID built from free disks isn't that cheap.

Wouldn't it be possible to just stack them in custom/modified racks, connect them to power and a custom switch, then get very cheap servers to cycle through them? The hard drives don't generate much heat anyway, so only basic ventilation would be needed. Using enterprise servers for Glacier sounds like an expensive luxury. Most disks would be mostly offline anyway (small amount of changes), and I bet more than half of the disks would be used less than once a week, so the cost of keeping them available would be tiny. Then, if you want to increase the speed of the whole structure, put a 1U server with 4 x 4 TB SATA3 disks in RAID10 in front for caching the most used/changed parts, connect the server to a 1 Gbps line, and you are done on a budget. The cost is just one simple server, modifications to the racks (which would cost very little), cabling for the disks, and a custom HDD switch.

Disks internally have lots of extra space to handle errors and are not pushed to the limits of their physical storage. Amazon could have partnered with a disk manufacturer, or written their own firmware, to have disks that are somewhat lossy but hold 3x the capacity of reliable drives.

It's only now that HDDs are getting close to the density of tape (~4 TB); even so, LTO-6 is half the price, and designed for the job.

HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

> HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

This is also surprisingly easy to do with tape - simply leaving an LTO tape on its side can push failure rates to significant levels within a year.

If you care about data there's no substitute for multiple copies which are regularly verified. The idea that you can leave something on the shelf and expect to reliably read it is a dangerous myth. If you care about archival, build a system with the staffing and procedures needed to make that happen. This is enormously easier and cheaper to do with spinning disk below a certain level but if you have enough data the lower media cost of tape will balance out the increased overhead.

> simply leaving an LTO tape on its side can push failure rates to significant levels within a year.

Do you have a source for that? I'd not heard of it before; I'm now interested to know by what percentage the failure rate increases.

Thanks to TheCondor for posting something more official.

I'd never heard about that before, either (not having had tape storage as a primary job), but a colleague was quite surprised by 20-30% failure rates for tapes first used within a year and asked our drive vendor about it. At the time, there was nothing mentioned on the media we bought and the tape vendor didn't have any official docs but the drive technician we dealt with said that it was well known among the support group - apparently a lot of customers either didn't know or didn't assume it'd be so dramatic.


> HDDs are not designed for long term archive. They simply fail hard when being moved or switched off for a long time.

A single disk certainly is bad for long term archive. But when you have thousands of disks and use multiple copies for all data with some fancy error correction you can still get extremely high reliability, you just need to calculate in the expected failure rate.

High failure == high cost.

Which is why I would build a HD robot that doesn't manipulate the disks, but the connector.

You don't even need a robot. The electronics needed to digitally "switch" which disk is connected to the server would be extremely inexpensive. I'm not sure why the original article even considers the idea of a disk drive robot.

The connectors are only rated for <1000 insertions. Tape is where it's at.

The best part about a robot is that you can take things out of a library and move them somewhere without power or epic cooling.

The main problem I see with this is that if you have old disks the last thing you want to be doing is spinning them up and down all the time.

The drives and firmware you want for 24/7 use are not the same as what you want for intermittent use. On big old raid arrays, the scariest thing was powering them down because for sure, some of the disks would not come back again.

I was under the impression that old disks generally fail by getting stuck. If they spin them up a few times a day, that would seem to solve that problem.

They could even have them just plugged into power, lying in racks, with custom firmware that triggers the drives to automatically spin up every few hours/days. Then they could just manually plug them in to retrieve data when data from a drive is requested.

There's an interesting dead comment to this:

secretamznsqrl 7 hours ago | link [dead]

AWS does not re-use disks under any circumstance. It is strictly forbidden to protect customer data. Disks also do not leave the datacenter until they have been degaussed AND crushed.

How is that interesting? Like when CNN speculates MH370 was sucked in by a mini black hole, or that dolphins emit pings?

AWS does not re-use disks under any circumstance. It is strictly forbidden to protect customer data. Disks also do not leave the datacenter until they have been degaussed AND crushed.

Your personal theory is very likely correct.

A “former S3 engineer” commented on HN during the Glacier launch. Nothing verifiable, but it contrasts with the idea that Glacier is optical-backed. [Also interesting: he suggests that S3 has an erasure encoding strategy.]


“They’ve optimized for low-power, low-speed, which will lead to increased cost savings due to both energy savings and increased drive life. I’m not sure how much detail I can go into, but I will say that they’ve contracted a major hardware manufacturer to create custom low-RPM (and therefore low-power) hard drives that can programmatically be spun down. These custom HDs are put in custom racks with custom logic boards all designed to be very low-power. The upper limit of how much I/O they can perform is surprisingly low – only so many drives can be spun up to full speed on a given rack. I’m not sure how they stripe their data, so the perceived throughput may be higher based on parallel retrievals across racks, but if they’re using the same erasure coding strategy that S3 uses, and writing those fragments sequentially, it doesn’t matter – you’ll still have to wait for the last usable fragment to be read.”

"he suggests that S3 has an erasure encoding strategy"

Apologies for the diversion - what does this mean? Does it mean that when an item is erased from S3, S3 "encodes" the data so that the next person who gets the same physical disk space can't read what was there before?

No, "erasure encoding" refers to a specific type of forward error correction.



If you push erasure coding hard enough, you can use extremely unreliable components throughout the infrastructure while maintaining reliability. That introduces high latency, depending on the hardware specifics, but that's what Glacier is about.

Imo, they found a sweet spot of $/GB in a much higher latency / lower reliability region. (This is analogous to increasing overall capacity in a communication channel by using optimally many unreliable low-power symbols with error correction instead of a few highly reliable high-power ones.) Disk manufacturers already use this aggressively for soft failures within the disk, but are obviously restricted when it comes to more systematic failures (i.e. if the whole drive fails, there's nothing they can do).

If a single drive has a failure probability P_failure, then with many drives they can achieve reliable storage at close to a (1 - P_failure) fraction of raw capacity [1]. So all they have to do is seek the optimum

$/GB_opt = min over C,D [ C/(D*(1-P_failure(C))) ], where C is the cost per drive and D is the drive capacity

[1] http://en.wikipedia.org/wiki/Binary_erasure_channel
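The parent's optimization is easy to play with numerically. A minimal sketch, with entirely made-up drive models (nothing here reflects real Amazon or vendor pricing):

```python
# Toy version of the parent's $/GB formula: pick the drive model
# (cost C, capacity D, failure probability P_failure) minimizing the
# effective cost per reliably-stored GB. All numbers are hypothetical.
candidate_drives = [
    # (name, cost $, capacity GB, failure probability)
    ("cheap desktop", 80, 4000, 0.15),
    ("nearline",      150, 4000, 0.05),
    ("enterprise",    260, 4000, 0.02),
]

def effective_cost_per_gb(cost, capacity_gb, p_fail):
    # With enough drives and erasure coding, usable capacity approaches
    # capacity * (1 - P_failure), per the binary-erasure-channel argument.
    return cost / (capacity_gb * (1 - p_fail))

best = min(candidate_drives,
           key=lambda d: effective_cost_per_gb(d[1], d[2], d[3]))
for name, c, d, p in candidate_drives:
    print(f"{name:>13}: ${effective_cost_per_gb(c, d, p):.4f}/GB")
print("cheapest per reliable GB:", best[0])
```

With these toy numbers the flaky-but-cheap drive wins, which is the whole point of the argument: unreliability is fine if it's priced in.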

Yes, but this doesn't explain the 4-5 hour data access latency.

Latency could be artificial, in order to get:

- differential pricing

- the ability to transparently switch to slower technologies in the future

If they're aggressive enough, I'd say there could justifiably be quite a large latency. Say, for example, their block size is 55 disks with 10% redundancy. Then the read latency is the maximum of the lowest 50 seek times within the set of disks that haven't failed. Now there's a queue to read each drive, and the maximum across the set may very well be huge pretty much every time. Even if the queue is typically short, they would still have to quote a typically high value, which I'd say is essentially the read time of an entire disk.

Now factor in that the disks are both large and crappy. A 120 MB/s read speed and a 1 TB disk size would imply ~8,000 secs ≈ 2+ hours. Factor in the possibility of differential pricing as you mentioned (even longer queues), and you may get an upper bound of 3 or 4 hours.
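The arithmetic is easy to check; a quick sketch, reading the figure as 120 megabytes (not megabits) per second, since that's the only interpretation that gives ~8,000 seconds:

```python
# Back-of-envelope check of the full-disk-read latency estimate. The
# numbers are the parent's assumptions, not anything Amazon has stated.
DISK_SIZE_BYTES = 1e12        # 1 TB
READ_SPEED_BPS  = 120e6       # 120 MB/s sequential

full_read_secs = DISK_SIZE_BYTES / READ_SPEED_BPS
print(f"one full-disk read: {full_read_secs:.0f} s ≈ "
      f"{full_read_secs / 3600:.1f} h")

# If one or two queued jobs each need a comparable amount of sequential
# I/O before your drive gets its turn:
for queued_jobs in (1, 2):
    total = full_read_secs * (1 + queued_jobs)
    print(f"{queued_jobs} job(s) ahead: ≈ {total / 3600:.1f} h")
```

A short queue of full-disk reads lands squarely in the advertised multi-hour window.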

I'm just speculating though.

S3 has to handle random access patterns, which are the hardest to optimize.

I wouldn't be surprised if Glacier's latency wasn't purely artificial so much as it was a deliberate design decision so the architecture can be very different from S3: pure streaming I/O, huge block sizes, concurrent access is nowhere near the same, etc. That much time allows really aggressive disk scheduling and it'd make it much easier to do things like spread data across a large number of devices with wide geographic separation.

The current strategy for highly durable object stores is [largely] to have 3+ replicas of the data. If you utilized erasure coding [Cleversafe does this, others are working on it], you can achieve very high durability by spreading data over multiple disks or even datacenters such that if a few of them fail, you still can recover all of the information. But, like with RAID, there are many tradeoffs in how you configure such a system. Erasure coding makes sense for rarely-accessed data since IO is considerably more expensive in a networked setting.
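The replicas-vs-erasure-coding tradeoff above can be made concrete with a simple binomial survival calculation. This is only a sketch with illustrative parameters (nothing here is published by Amazon or Cleversafe), assuming independent fragment failures:

```python
from math import comb

# Durability of a hypothetical k-of-n erasure code: an object survives
# as long as at least k of its n fragments remain readable.
def p_object_survives(n, k, p_frag_loss):
    # Probability that the number of lost fragments stays <= n - k.
    return sum(comb(n, lost) * p_frag_loss**lost
               * (1 - p_frag_loss)**(n - lost)
               for lost in range(n - k + 1))

# 3 full replicas vs. a 10-of-14 code, same 2% fragment-loss probability:
p = 0.02
replicas = 1 - p**3                 # lose data only if all 3 copies die
coded    = p_object_survives(14, 10, p)
print(f"3 replicas : {replicas:.10f}  (3.0x storage overhead)")
print(f"10-of-14   : {coded:.10f}  (1.4x storage overhead)")
```

With these toy numbers the code matches or beats triple replication while storing less than half as many bytes, which is exactly the appeal for rarely-accessed data.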

Mozy has been doing this for nearly a decade.

I read it as saying that the data has redundancy added to it so that some amount of unreadable ("erased") bits can be interpolated. CDs do this with Reed-Solomon encoding, and there's a decoder in the hardware.
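A single XOR parity block is the simplest possible illustration of recovering an erasure; real systems like CDs use Reed-Solomon, which tolerates multiple erasures, but the idea is the same:

```python
from functools import reduce

# XOR all blocks together column-wise; with one parity block, any single
# missing data block can be rebuilt from the survivors.
def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Block 1 is "erased" -- we know WHICH block is missing, just not its bits.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered)  # b'EFGH'
```

Knowing the *position* of the loss is what makes this an erasure rather than an error, and it's why erasure codes get away with less redundancy than general error correction.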

Do BDXL discs have greater longevity than other types of optical media?

Back in 2000, our studio switched from archiving our audio recordings on DAT to CD-R. Thinking the greatest threats were loss and scratches, they made three copies of every disc and stored them in different locations. Around 2007, all of those CD-R discs started becoming unreadable around the same time.

Video recordings were archived to DVD-R starting in 2003 with the same three copy approach. In 2009, after having successfully rescued all our audio recordings on to RAID5 network attached storage, we weren't so lucky with saving our early DVD-R data.

I always assumed the reason DVD-R discs lasted less time than CD-R discs before chemical decomposition was because of the greater density, but I never looked into it enough to know for sure.

I have to imagine if Amazon is indeed doing this, they have some sort of plan in place to write all data to fresh optical media every five years or so. If so, that would create a very different cost equation.

For the same reasons, I've been investigating M-Disc. It requires a special writer (sub-$100), but can be read by any DVD reader.

It etches instead of using dye; the result is a claimed 1,000-year lifetime. The DoD has evidently certified it.


However, I worry about even finding optical readers in 10 years time.

Interesting. I hadn't heard of M-Disc. Thanks for the link!

The worry of not being able to find a device to read/play back your media is a common one for all storage types, analog and digital. For example, we have hundreds of reels of quarter inch tape from the fifties and sixties down in our library. The tapes themselves are safe (some may need a day in an easy-bake oven), but otherwise they still sound great. We recently had our Studer A-820 refurbished and it's probably in better than new condition, but I worry there won't be any people to do the service, or parts to do the service with if it needs attention in ten years.

The more obscure the medium, the trickier it gets. We have a bunch of early digital recordings stored on Betamax tapes that need both a working Betamax deck and a PCM encoder/decoder box in order to be read. Fortunately, those have all been transferred now.

This is why I'm a believer in thinking of maintaining an archive as being an active rather than static process. It's important to be periodically re-evaluating your digital assets to make sure they can be losslessly transferred to current file formats and modern storage media. It was probably never a good idea to simply "put it on a shelf and forget about it", but thankfully with digital assets, these migrations can be lossless, automated, and tested.

If you don't have restore drills you don't have backups.

Yes, of course. However, backups != preservation. You could have a textbook backup strategy with restore drills and fixity checks making sure that your integrity is 100%. If your audio is stored in Sound Designer II files and your print documents are Word Perfect files, you are probably not being a good steward of your data.

M-Disc BD is doing the right thing by being readable by common consumer drives, though -- I'd be a lot more optimistic about finding or salvaging a BD drive to read some discs in 50-100 years, since even just the PS3 shipped in big enough volume that there will be some forgotten somewhere, than any specialty formats.

The NASA tapes problem (where they couldn't find drives) is definitely a concern on longer time scales. USB seems widespread enough, too.

It would be cute if someone made a self-contained archival device with display, designed for 100+ year lifespan. Solar powered (although generally external power is a simple enough interface that as long as specs are given, it shouldn't be too hard to recreate), multi-language, redundant, etc. Ideally with periodic integrity checks, a duplication function, etc. built in.

Seems kind of like an Internet Archive project, or OLPC or something.

You just described the librarian from the movie Timeline. I think the issue is that we still don't really have a technology or medium that doesn't degrade, whether unplugged or plugged in. Chemical media such as most optical discs fail, etching is scratchable, etc. Records seem to be the most reliable, but are of limited storage capacity and require careful storage and handling.

Those kinds of drives are being used by individuals a lot for archiving things like genealogy. I would expect there will continue to be readers (perhaps not <$10) for a very long time.

According to something of an expert in the field (she holds a fistful of patents, several of which are used in every hard drive you own), DVD-Rs pushed CD-R technology too far. Even using Mitsui/MAM-A Gold media is no great guarantee.

CD-Rs ... who made the discs, and how did you store them? Or did you buy a huge batch at once? I've had great success for over a decade with Taiyo Yuden CD-Rs, over 3,000 recorded and not a single failure to read that wasn't due to a bad read drive, easily solved by using another. Although those retrieval results aren't yet statistically significant.

I didn't start working here until 2006, so I don't really know how they were purchased, but they didn't standardize on Mitsui MAM-A until 2002 or so. Before then, there was an assortment of gold media from different brands. The masters have been stored in standard jewel cases in drawers in our office.

I've also had great luck with the Taiyo Yuden CD-Rs. We go through about a thousand discs a year for people who prefer to have a CD copy of their music to being able to download a file, and I can't remember the last time I had a failure. However, now that our master copies are 24-bit wav files on network storage, I haven't been as concerned with how those discs have been aging.

I suppose we probably have enough samples for an interesting study on the longevity of optical media, but unfortunately I just don't see us having the time to devote to something like that.

Just to add some more anecdotal evidence for optical: my group stored satellite data on DVD-Rs, starting in 2003, and in 2012 the earliest discs were still readable. Luckily, we had several forms of back up of the data, so the loss of the DVD-Rs wouldn't have affected us.

This is why I was surprised that FB were so keen on BDXL. There is no long-term data on stability; they are incredibly fragile and prone to delamination.

-R discs store data via a dye layer that changes color when hit by the laser. The dyes are not known to hold color durably. I don't have the links now but I've read studies that estimated CD-Rs could fail within 7-10 years.

-RW discs supposedly last a lot longer because the RW tech uses a phase change metal instead of dye, and the phase changes are more durable.

Were you using gold archival quality discs?

I just went to the drawer to check - it appears as though most of the earliest CD-R discs were HHB gold archival discs. Some of them came from external mastering facilities, so those would be on something else. Around 2002, they standardized on Mitsui MAM-A for the archival copies. Interestingly enough, those discs have had very few issues.

It appears as though the early DVD-R discs were Mitsui MAM-A, with us switching over to Taiyo Yuden in early 2007. Since we have all that stuff on network storage, I haven't gone back to check how the Taiyo Yuden discs have fared over time.

When I was last buying media, Taiyo Yuden DVD-R media cost one cent more than their famed CD-R media, for what that's worth. Never had a problem with it, but I've never used it for anything really important (e.g. installation discs where I kept a hard disk copy).

This is the first time I can remember ever hearing of HHB gold archival CD-R discs. That's probably not a good sign given how much I was into this field back then; they certainly weren't on the recommended list of anyone I respected. After Kodak stopped making their gold discs, it was Taiyo Yuden, or Mitsui/MAM-A if you felt it gave you extra protection.

Ah, you did check to make sure that each disc burned could be read back, didn't you?

I didn't start working here until 2006, so I don't really know the details of what the process was like back then for testing discs after they had been burned. I assume at least a verification step was involved.

Now, when I want to see how bad a disc has gotten, I use PlexTools to do a Q-Check. However, we haven't considered anything on optical media to be a master copy for a while (at least five years). CDs only get burned when someone wants to listen to something in their car.

DVD-RAM discs were much more reliable than DVD-RW, and the chemistry has only gotten better since then. And write-once BD-Rs actually use melting metal instead of dye, so they are extremely stable with some companies advertising 50 year shelf lives.

There are several interesting "dead" comments on this thread from people who say they have specific knowledge:

secretamznsqrl 20 minutes ago | link [dead] [-]

  Here is a Glacier S3 rack:
  10Gbe TOR
  3 x Servers
  2 x 4U Jbod per server that contains around 90 disks or so
  Disks are WD Green 4TB.
  Roughly 2 PB of raw disk capacity per rack
  The magic as with all AWS stuff is in the software.

jeffers_hanging 1 hour ago | link [dead] [-]

  I worked in AWS. OP flatters AWS arguing that they take
  care to make money and assuming that they are developing 
  advanced technologies. That't not working as Amazon. 
  Glacier is S3, with the added code to S3 that waits. That 
  is all that needed to do. Second or third iteration could 
  be something else. But this is what the glacier is now.

The math on the first "dead" post only works out to 1 PB per rack, unless there is somehow a way to jam 90 disks into 4U. The Backblaze and Supermicro 45-disk 4U chassis suggest that would be pretty tough. Besides, there is still a good bit of rack left after 24U of JBOD plus 3 servers and TOR.

http://www.supermicro.com/products/chassis/4U/847/SC847DE26-... -- 90 disks in a 4U chassis. If you don't care about power / hot swap (I doubt there's a datacentre monkey running around swapping disks for Glacier, or for anything really -- Google anecdotally was repairing servers by removing two racks' worth of them at a time), then instead of piling them four deep you can pile them six deep (36" server depth / 6" disk depth), reaching 135 or so disks.

This is awesome, wow. I wonder how you cool something like that without most of it being powered down.

The disk density, at least, isn't entirely out of line with what Amazon have publicly stated they have. Amazon have stated that their density is higher than Quanta products, and Quanta make a 60 disk 4U chassis, the M4600H: http://www.quantaqct.com/en/01_product/02_detail.php?mid=29&...

2PB per rack sounds about right for a very tightly packed rack. Most DC's can't supply power and cooling for that kind of density, though.

The power requirements would be precisely why a Glacier HDD rack has only a fraction of its HDDs powered on at any given time. This also explains the 3-5 hours latency: you have a queue of jobs and you have to wait for other jobs to finish (eg. reading gigabytes of data) before your drive can be powered on.

It all makes sense.
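The queue explanation above sketches easily: if only a couple of drive slots per rack are powered at a time, later jobs inherit multi-hour waits. Entirely toy numbers:

```python
import heapq

# Greedy schedule: each retrieval job grabs the earliest-free powered
# drive slot; its wait is however long that takes to open up.
def wait_times(job_durations_hrs, powered_slots):
    slots = [0.0] * powered_slots      # times each slot frees up (a valid heap)
    waits = []
    for dur in job_durations_hrs:
        start = heapq.heappop(slots)   # earliest moment a slot is free
        waits.append(start)            # hours this job spends queued
        heapq.heappush(slots, start + dur)
    return waits

jobs = [1.5, 2.0, 1.0, 2.5, 1.5, 2.0]  # hours of sequential reading per job
print(wait_times(jobs, powered_slots=2))  # → [0.0, 0.0, 1.5, 2.0, 2.5, 4.0]
```

Even with this tiny backlog, the later requests wait 2-4 hours before their drive can spin up, which lines up with Glacier's advertised retrieval window.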

That density is doable if the drives spend most of their time powered down, which would fit with the potential Glacier restore delay while they wait for the discs you need to come round in the power cycle.

I never worked on glacier, but I can second that at launch it was commonly internally discussed as being an s3 front-end.

Facebook is doing this. They have a video of the machines too! https://www.facebook.com/photo.php?v=10152128660097200&set=v...

That's really cool. I feel like it'll be years before the rest of the world sees that in datacenters.

Have you never seen a robotic tape library?

Here's a video of a 40PB library at NERSC:


Therefore, by a process of elimination, Glacier must be using optical disks. Not just any optical discs, but 3 layer Blu-ray discs.

What? You partially eliminate a few strawmen, and thus conclude that your pet theory is the only answer? The logic in this article is so weak that I presume this must be an example of 'parallel construction'. I have to guess that someone he trusts but is not allowed to quote has told him that Amazon is using BDXL disks, and he's pretending to reason himself to the same conclusion. My next best guess is that he's a few weeks late on his April Fool's post.

Assuming aggressive forward pricing by Panasonic or TDK, Amazon probably paid no more than $5/disc or 5¢/GB in 2012. Written once, placed in a cartridge, barcoded and stored on a shelf, the $50 media cost less than a hard drive – Blu-ray writers are cheap – Amazon would recoup variable costs in the first year and after that mostly profit.

OK, maybe the April Fool's explanation should come first. Because if you were trying to come up with plausible logic, surely you could do better than declaring that Amazon's private price for BDXL media is 1/5th that known to the public and that all other costs are zero. And even if one were to assume this admittedly unlikely scenario, wouldn't you need to write more than one disc for redundancy? But that's easy to solve: just wave the magic wand and halve Amazon's secret price down to a level where it makes sense.

Powered down hard drives seem like a much simpler explanation. The robotics don't seem that difficult if instead of bringing the disk to the backplane you keep the disk fixed in place and just attach the cable. Presumably you could design your own sturdier and easier to align connector, and leave the adapter in place. Or maybe there is some way to do it with a mechanical switch? Or do you even need a robot? If you built a jig so you could plug in a whole drawer of drives at once, maybe you could just hire someone to do it.

Tape is dismissed far too easily; tape is around $25-40 a cartridge, which is roughly $15 a TB.

Glacier is interesting, and I use it every day for some of my backups (a few TB of ongoing backups via Arq).

But I have always assumed it was -- like the mysteriously/inappropriately killed comments from jeffers_hanging on this thread claim -- just S3 with hard delays enforced by software.

If you are already doing S3 for the entire world at exabyte scale, and you have petabytes of excess capacity and a bunch of aging, lower-performance infrastructure sitting around... do you really need to fuck with risky new optical disk formats?

It seems to me that you could just start with selling slowed-down rebranded S3, and then iterate on it.

IIRC there was a similar rumor that S3 reduced redundancy storage was just cheaper rates for the exact same service. Not sure if that is true, but certainly meets the minimum viable product.

Glacier as S3 with sleeps seems to be a pretty reasonable extrapolation of that same idea. Plus, the sleeps are long enough that if and when they build it for real, it could be economically viable.

That said, if the quoted 0.3 cents is true, it should be viable as it is by just stuffing way more density in a rack and keeping them powered down most of the time... Though that rack would probably weigh tons, so you'd probably want to reinforce the floor of the data center a bit. We saw equinix facilities with concrete floors and ventilation delivered from above, so something like that could be an option.

I was thinking the basic strategy of Glacier was to power down whole racks and accumulate requests until they have enough of them to power up the rack and serve them all in one go. If you combine this with some kind of erasure coding (typically Cauchy-Reed-Solomon), you get even more freedom, as you can treat powered-down racks as temporary erasures. Anyway, I'm pretty sure it's neither tape nor optical.

Amazon Glacier has some fine-print hidden costs that can pile up to THOUSANDS of dollars quickly if you retrieve the data "too fast". It can happen with only a few hundred gigabytes.

I fail to see how it is useful as a backup service. This got me into serious trouble: I built a whole backup system around it, and now it is not what it was supposed to be.

I mean, it's not like Amazon suffers from opaque pricing. The offer with Glacier is exactly what they said it would be, and the trade off for lower per-gig costs is the toll on retrieving the data. That's still perfectly reasonable IMO; I have a couple of terabytes backed up, and worst-case if I actually need to hit glacier, it'll cost me a few hundred dollars.

In general I agree, but the problem with Glacier's retrieval pricing in particular is that it's quoted as dollars-per-byte (like all their other prices), but in fact it's really dollars-per-(peak-bytes-per-second).

It's easy to get a nasty surprise. If I pull enough to saturate my 28 Mbps DSL downstream for 4 hours (~51 GB), I'd pay nearly $100 that month, not 51 GB * ($0.01/GB) = $0.51. You really need a Glacier-aware restore program that is told your maximum budget, knows the pricing formula, and carefully schedules the retrievals to keep the peak rate below that threshold.
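As a rough sketch of why the numbers come out that way -- assuming the 2012-era formula really is peak billable hourly rate x $0.01/GB x the ~720 hours in a month, and ignoring the free-tier allowance for simplicity -- spreading the same retrieval over more hours cuts the bill dramatically:

```python
# Upper-bound sketch of the old Glacier retrieval fee; not billing-grade
# math (the monthly free allowance is deliberately ignored here).
def glacier_retrieval_fee(gb_retrieved, hours_spread, rate_per_gb=0.01,
                          hours_in_month=720):
    peak_hourly_gb = gb_retrieved / hours_spread
    return peak_hourly_gb * rate_per_gb * hours_in_month

# 51 GB pulled flat-out over 4 hours (the saturated-DSL case above):
print(f"${glacier_retrieval_fee(51, 4):.2f}")    # → $91.80

# The same 51 GB trickled out over two full days:
print(f"${glacier_retrieval_fee(51, 48):.2f}")   # → $7.65
```

Same bytes, ~12x difference in cost, purely from the peak-rate term -- hence the need for a budget-aware restore scheduler.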

It's for long term offsite backups/archival. If you need to retrieve lots of data from it on a regular basis (e.g. because of a single dead drive) you're doing something wrong.

If our office burns down, and I had our offsite backups in Glacier, the retrieval costs are peanuts compared to cost of losing the data (going out of business). If my home burns down, and I have my offsite backups in Glacier, a few hundred dollars to retrieve data would be nothing compared to the emotional loss of years worth of photos etc.

But I have triple copies at home - minimum - of almost all of my data (mirrored drives + regular snapshots on a third drive; and a lot of the stuff is also synced to/from one or more other computers), and similar setups at work: An offsite backup is last resort.

The thing is, it's not marketed as a backup service. It's an archive service. If you have business requirements to keep data around for a certain length of time, and things will go very badly (legally speaking) if the data disappears, you might decide to have Amazon store it in Glacier. You likely wouldn't ever access it unless a lawyer or auditor wanted it; if your systems failed, you'd hopefully be restoring from a short-term backup rather than long-term archives.

Amazon's data transfer pricing is awful. I have no idea how Droplr and Cloudapp make money serving files directly from S3.

The biggest guys peer with Amazon, biggish guys get special pricing. Others burn through VC money.

I am an AWS engineer but note that I am not affiliated with Glacier. However James Hamilton did an absolutely amazing Principals of Amazon talk a couple of years ago going into some detail on this topic. Highly recommended viewing for Amazonians.

From what I remember, it's custom HDs, custom racks, and custom logic boards with custom power supplies. The system trades performance for durability and energy efficiency.

Robin Harris' deductions in this article are worthy of Arthur Conan Doyle...

Glacier uses hard drives, and the thing the author missed is that the MTTF of infrequently utilized drives is much better than frequently accessed drives.

It is interesting to note that while S3 prices have more than halved, Glacier prices have remained the same over that period. Does this mean that extremely low margins are at play here?

Or is it that Amazon has no competitor here yet, good enough to push it to lower the prices?

Could be either, but note that Amazon has a history of introducing new products at or below cost and then gaining a profit margin by keeping their prices fixed while their costs drop. A while back I heard that the fully loaded cost (to Amazon) of m1.small was $0.15/hour when EC2 launched, but they priced it at $0.10/hour because they knew their costs would drop and they wanted to win market share.

That history may actually be over as Google is aggressively undercutting them and they are scrambling to match pricing.

Glacier now costs the same as Google Drive, except Drive has none of the restrictions. AWS on-demand prices have been cut so much they've actually left some of their prepaid discount prices ('reserved instances', in their parlance) higher than what they sell on demand.

Glacier now costs the same as Google Drive, except Drive has none of the restrictions

Only if your usage is exactly equal to the maximum usage from one of Google's pricing tiers.

I was just talking about this yesterday. My conclusion has always been that they are powering off the drives.

I came to that conclusion while modeling cost for Eye-Fi. Power is the big driver.

I'm not even saying robots. Just leave the disk there and power it off.

Where did Amazon deny that Glacier was tape? I don't believe that the climate control requirements debunk tape.

I really wish BDXL would hit the consumer market. I archived terabytes of data on single layer 25GB blurays and it's tedious to span across 10 discs for a larger storage folder.

I'm mildly curious why you'd do that. At a quick look on Amazon, a 4 TB hard drive seems to work out at less per GB than writable Blu-rays, with way less hassle, and in my personal experience hard disks seem more reliable than optical media: my 10-year-old hard disks all work, while CD-Rs I made at a similar age usually don't.

I have two Drobos maxed out at about 32 TB. The Blu-rays are my cold storage. Blu-rays will last a hell of a lot longer than hard drives when kept away from the sun. Just because you have hard drives that last 10 years doesn't mean you're not experiencing bit rot.

Ta. I just googled and found Blu-rays use a completely different technology from my old CD-Rs, which used organic dye and were awful for lifespan. Though I think I'll stick to my Moore's-Law-style disk strategy, where I buy a new HD every few years and copy the entire old HD onto about 15% of it.

Do you keep a binary log of your network traffic?

I wish the same. I needed over 160 BD-Rs to archive the 5K RAW footage of my around-the-world movie, and just archiving them on a single Blu-ray writer took a few days. I wish there was more progress in the consumer space...

What method did you guys use to segment the data between the disks?

Toast has automatic spanning without breaking up the files.

1 cent/GB still isn't that impressive to me. I have terabytes I need to archive -- mostly family photos and videos (and my oldest kid is 5!). 1 terabyte = $10/month. It's still cheaper to just buy external hard drives.

There are those online backup services that are unlimited for $X/month, but I have been bait-and-switched by those services twice now, and it takes such an incredibly long time (months) to upload everything.. so I've quit relying on those as well.
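The break-even point is easy to sanity-check. A minimal sketch of the arithmetic (the drive price is an assumption for illustration, not a quote):

```python
# Back-of-envelope comparison of Glacier vs. buying an external drive.
glacier_rate = 0.01      # $/GB per month -- the "1 cent/GB" above
data_gb = 1000           # 1 TB of family photos and videos

monthly_glacier = glacier_rate * data_gb          # $10/month
drive_cost = 100.0       # assumed one-off price of a 4 TB external drive

# Months until cumulative Glacier spend overtakes buying the drive outright
breakeven_months = drive_cost / monthly_glacier

print(f"Glacier: ${monthly_glacier:.2f}/month, "
      f"drive pays for itself after ~{breakeven_months:.0f} months")
```

So for a terabyte at these assumed prices, the drive wins within a year; Glacier's advantage is the offsite redundancy, not the raw cost.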

I also have about a terabyte of videos of kids and so forth.

You are right that external hard drives are a cheaper option, but you must actually take the time to transfer the drives offsite. I'm lazy and since I can't automate physical transfer of drives, that does not work for me. That's why I use Glacier via Arq.

Aside: CrashPlan and a few others allow me to backup to drives I've placed at a friend's house. But I'm not interested in setting that up.

What about having a small device, say based on a BeagleBone board (or similar low-power hardware), hooked up to a USB drive and wifi? Then set it up in your car to power up for a while after the car is turned off. When it powers up, it looks at the networks it can connect to -- if it finds your home network, it rsyncs from your home backup server. If it finds an alternate one (say, the office network, or another place you visit frequently), it rsyncs to that server.

If someone marketed a turn-key solution like this (home backup appliance, plus vehicle-based transfer appliance), would this be a good alternative to online based services?
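The dispatch logic for such an appliance is tiny. A sketch under assumed names (network SSIDs, hosts, and paths are all invented; the actual rsync invocation is left to the caller):

```python
HOME_SSID = "home-net"        # hypothetical network names
OFFICE_SSID = "office-net"

def sync_direction(ssid):
    """Decide which way to rsync based on the wifi network the box joined.

    Returns an (source, destination) pair for rsync, or None to stay quiet.
    """
    if ssid == HOME_SSID:
        # At home: pull fresh backups from the home server onto the car drive
        return ("backup@home:/backups/", "/mnt/cardrive/")
    if ssid == OFFICE_SSID:
        # At the office: push the car drive's copy to the offsite server
        return ("/mnt/cardrive/", "backup@office:/backups/")
    return None               # unknown network: do nothing

# usage sketch:
#   subprocess.run(["rsync", "-a", *sync_direction(current_ssid())])
```

Keeping the decision a pure function of the SSID makes the "which way do we copy" rule trivial to test before trusting it with real data.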

That's a clever idea.

I have been wishing for something like that for years. I'd love to have all my music sync'd to car automatically when I get home. Even better if it can take all my docs and whatnot with me. Security would be an issue; you're upping your exposure immensely by driving your data around.

I use a pair of mirrored drives (software raid1) for my main home storage, with a cron job rsyncing that to an external USB drive running encfs (which I swap out approximately weekly with a duplicate in my office drawer). I could happily wifi-sync that encrypted drive to one in my car without worrying about it if/when it goes missing. (And I've got a spare Raspberry Pi and wifi adapters in the project box too… I can just see the hassle of getting a DIY power supply from the car's "12V" system to reliably power the Raspberry Pi being more time-wasting than I'd be prepared to invest in this, though… Noisy automotive power systems are a pain in the ass…)
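The cron-driven rsync step of a setup like that might look roughly like this (paths are invented; the mount check matters because on a swap week the drive may be sitting in the office drawer, and rsyncing onto a bare mount point would fill the root filesystem):

```python
import subprocess
import sys

SOURCE = "/srv/mirror/"            # the raid1 array (assumed path)
DEST = "/mnt/encfs-backup/"        # mounted encfs view of the USB drive

def is_mounted(mounts_text, mountpoint):
    """Check /proc/mounts text for an exact mountpoint match."""
    return any(parts[1] == mountpoint
               for parts in (line.split() for line in mounts_text.splitlines())
               if len(parts) > 1)

def nightly_backup():
    # Skip quietly if the encrypted drive isn't plugged in this week.
    with open("/proc/mounts") as f:
        if not is_mounted(f.read(), DEST.rstrip("/")):
            sys.exit("backup drive not mounted; skipping")
    subprocess.run(["rsync", "-a", "--delete", SOURCE, DEST], check=True)
```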

Take a look at one of those phone power pack batteries. These already have clean USB 5.0V output, and can charge while it is in use.

Security would be an issue

Encryption would solve that.

I think you can help yourself a ton by using compression software for images, and especially videos. Cameras nowadays have very high resolution -- I am pretty sure that even if the image size is quartered by a good compression method, your eyes won't notice a thing, be it on screen or in print.

Why would you ever back up lossy, compressed copies of your irreplaceable pictures? If you're going to do that, you might as well just put them all on Facebook.

In terms of long-term storage, decent lossy compression and Facebook are both better bets than lossless images on a hard drive you don't bother backing up because you can't justify the cost.


You do realise that Glacier is triply redundant and your data is stored in at least two geographically redundant locations? Not to mention that they ensure the data isn't undergoing bit rot, which will definitely happen with HDDs that are just sitting around.

They do feel a bit more reliable than just buying your own external hard drives. Or do you have a good data recovery plan?

I admittedly do not have a good data recovery plan. A house fire and it's lost (assuming the drives are beyond repair).

I'm kind of hoping a good+cheap service actually comes along. All I'd really need to do is take a backup to work though and swap them out once in a while -- but I'm not disciplined enough for that :)

Rereading my post, apologies if it came across harsh. I'm in pretty much the same boat. My kids are right about 5 and it is crazy how much I've managed to accumulate. I'm currently forcing myself to edit down pictures when I get a chance and paying for Amazon's cloud drive.

I certainly wish things were cheaper, but all of the "do it yourself" options have left me more than a little worried at my own lack of discipline working against me.

Backblaze has worked very well for me. It took two months to upload but now it stays synchronized. When I had a disk failure I had them overnight an HDD to me for $250.

This is why I worry about cloud backup — as things stand (local + periodic offsite backups), I periodically flatten and bare-metal-restore my systems for the sole purpose of testing backups, so I have a reasonable level of confidence that the backups will work as intended when disaster strikes. For $250, I'd probably never do this.

You don't need to pay them $250, you can download your data at 1MB/sec or so. I just needed my data ASAP and paid them to overnight the disk.

And let's not forget holographic technology. The first enterprise hardware is geared towards 2015, where one recordable optical disc reaches storage capacities of at least 300GB. Both Sony and Panasonic are working on that. The future is plenty... plenty of gigs that is. If Amazon was indeed using optical disks for their cloud storage, then this could only mean even lower prices for us in a very foreseeable future.

> The first enterprise hardware is geared towards 2015

I've been hearing about this since pets.com was a hot investment and not a whole lot has shipped since then. For a new storage technology that's particularly concerning since you don't know how accurate current guesses as to cost will be and, critically, nobody has a baseline to say what reliability will be like in the real world.

Costs and reliability - very good points. Regarding reliability, couldn't we say the same about BDXL technology? Certified discs for BDXL have a 50 to 100 years media longevity, or so it's said. Question is, how accurate are these numbers? The first rewritable 100GB BDXL discs hit the market in 2011. Would three years real world experience be sufficient to extrapolate the findings to the next 97 years?

I wouldn't disagree completely – anyone predicting longevity greater than, say, 5-10 years is simply making up numbers and hoping you don't ask about methodology.

That said, BDXL has shipped in volume for years and has less new technology involved so I would be surprised if it's significantly different than older Blu-Ray or DVD/CD systems which have been heavily tested.

Interesting analysis. That forward pricing suggested (5c/GB) for the 3-layer media is incredibly aggressive, and even then, it seems to be 2.5x the cost of drives and doesn't take into account the 3-layer RAIDed 10-disc product.

The cost issue just doesn't seem to compute for me. And that's leaving alone the custom storage system setup costs.

I worked in AWS. The OP flatters AWS by arguing that they take care to make money and by assuming that they are developing advanced technologies. That's not how Amazon works. Glacier is S3, with added code that makes it wait. That is all that was needed. A second or third iteration could be something else, but this is what Glacier is now.

Why is the author so quick to write off the idea of robotic hard drive management? It seems like the simplest explanation. Designate racks of hard drives for "new data," and "retrieving data." Write all new data to the "new data" hard drives, until they're full. Then a robot replaces the hard drive with a new one, and puts the full hard drive in cold storage. When a client requests data retrieval, the robot locates the hard drive and puts it into a "retrieving data" rack. The data retrieval servers are connected to this and actively work to retrieve data as the hard drives come online.

Considering Amazon bought Kiva Systems, which made highly intelligent floor robots for warehouses, they obviously have the talent to build the robots necessary for such an operation.
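As a toy model of the workflow proposed above (all class names and pool structure are invented for illustration), a drive's lifecycle is just a small state machine: writing, shelved in cold storage, then fetched to a retrieval rack on demand:

```python
class Drive:
    def __init__(self, drive_id, capacity_gb):
        self.id = drive_id
        self.capacity = capacity_gb
        self.used = 0

class ColdLibrary:
    """Drive states: writing -> shelved (cold) -> retrieval rack -> shelved."""

    def __init__(self, blank_drives):
        self.blanks = list(blank_drives)
        self.writing = self.blanks.pop()   # current "new data" drive
        self.shelved = {}                  # drive_id -> Drive in cold storage

    def store(self, size_gb):
        """Write an object; return the id of the drive it landed on."""
        if self.writing.used + size_gb > self.writing.capacity:
            # The "robot": shelve the full drive, rack a blank replacement
            self.shelved[self.writing.id] = self.writing
            self.writing = self.blanks.pop()
        self.writing.used += size_gb
        return self.writing.id

    def retrieve(self, drive_id):
        """The robot fetches a shelved drive into the retrieval rack."""
        return self.shelved.get(drive_id)
```

Notably, the hours-long retrieval SLA falls out naturally from a design like this: `retrieve` is a physical fetch the robot can batch up at its leisure.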

Having a robot juggling hard drives would not make that much sense. The reason we have optical disc and tape robots is that tapes and discs need a separate device that reads/writes them. With hard drives there's no such need.

With hard drives it would make more sense to do some development on the electronics side and build a system where lots of drives can be simultaneously connected to a small controller computer. Not all of the HDs need to be powered on or accessible all the time; the controller could turn on only a few of them at a time. And of course some of the controllers could also normally be powered off, once all the hard drives connected to them are filled.
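A sketch of that scheme (the budget value and all names are invented): the controller keeps a power budget, spins a drive up only when a queued job needs it, and spins down anything with no remaining work:

```python
from collections import deque

class ColdController:
    """Run retrieval jobs while keeping at most `budget` drives powered."""

    def __init__(self, budget=4):
        self.budget = budget
        self.powered = set()
        self.queue = deque()          # pending (drive_id, job) pairs

    def request(self, drive_id, job):
        self.queue.append((drive_id, job))

    def step(self):
        """Run as many queued jobs as the power budget allows."""
        done = []
        while self.queue:
            drive_id, job = self.queue[0]
            if drive_id not in self.powered:
                if len(self.powered) >= self.budget:
                    break             # out of budget; wait for the next step
                self.powered.add(drive_id)        # spin up
            self.queue.popleft()
            done.append(job)
        # Spin down any drive with no remaining queued work
        still_needed = {d for d, _ in self.queue}
        self.powered &= still_needed
        return done
```

The batching this implies is exactly what a multi-hour retrieval window buys you: jobs for the same drive can wait and share a single spin-up.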

Makes me think it would be really cool to work in an amazon data center and get to see and work with all this unreleased hardware. Very cool article.

It's very funny to hear you say that. The full article is assumptions built on the author's assumptions. The truth is much simpler and more interesting. Amazon launches minimum viable products, doing as little engineering as possible, and then iterates on those in production. The cheapest way to test the idea of Glacier was to add waiting to S3. So Amazon did that.

The problem is that BDXL is just not proven for any long-term storage. It's got a very bad track record when it comes to delamination, oxygen and sunlight.

Tape is awesome, cheap and proven. It fits the MO of Glacier. This is not to say that it's not BDXL, but I'd be very surprised if Amazon bet the entire house on something so untested.

Here is a Glacier S3 rack:

10Gbe TOR

3 x Servers

2 x 4U JBODs per server, each containing around 90 disks or so

Disks are WD Green 4TB.

Roughly 2 PB of raw disk capacity per rack

The magic as with all AWS stuff is in the software.
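The quoted numbers check out, assuming 4 TB per disk:

```python
# Sanity check of the rack spec quoted above
servers = 3
jbods_per_server = 2
disks_per_jbod = 90          # "around 90 disks or so"
disk_tb = 4                  # WD Green 4 TB

raw_tb = servers * jbods_per_server * disks_per_jbod * disk_tb
print(raw_tb, "TB raw per rack")   # 2160 TB, i.e. roughly 2 PB
```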

There have been disk packs in Spectra tape libraries for 10 years, so you don't even need custom hardware for "disk drive robots"; it already exists.

Interesting that the article doesn't have the question mark in the title, but the HN link does. It inverts the implied conclusion of the article.

It's a longstanding convention on HN that we sometimes add a question mark when the title is overstated, but the article is still interesting.

"It's a longstanding convention on HN that we sometimes add a question mark when the title is overstated, but the article is still interesting."

Really? I don't think this is true, I have only seen the question mark when it is on the original article as well. Any other examples?

I don't think this is true

I'm a primary source on this, having done it many times myself, and I know PG has as well. Sorry, but I don't have time to dig up examples. It might be fairly straightforward, though, if you wanted to write code against the HN Search API.
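A rough starting point against the public Algolia HN Search API: pull candidate stories whose HN titles end in "?", then compare each against the linked article's own title (that second step still means fetching each page yourself, so it's only sketched as a helper here):

```python
import json
import urllib.parse
import urllib.request

def hn_stories(query):
    """Fetch matching stories from the Algolia HN Search API."""
    url = ("https://hn.algolia.com/api/v1/search?"
           + urllib.parse.urlencode({"query": query, "tags": "story"}))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["hits"]

def added_question_mark(hn_title, article_title):
    """True if HN's title ends in '?' but the article's own title doesn't."""
    return (hn_title.rstrip().endswith("?")
            and not article_title.rstrip().endswith("?"))

# usage sketch (network call, so commented out):
#   candidates = [h for h in hn_stories("secret")
#                 if h["title"].endswith("?")]
```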

Amazon Glacier is hidden in plain sight.

I think it's just a virtual product. Probably unused S3 capacity at a much lower price, not a different technology.

I'm guessing cheap disks.
