
Amazon’s Glacier secret: BDXL? - hemancuso
http://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/
======
skrause
The author quickly dismisses hard drives because at the time of the Glacier
launch SMR drives were too expensive because of the Thai flood. But after a
few years of running S3 and EC2, Amazon must have tons of left-over hard
drives which are now simply too old for a 24/7 service.

So what do you do with those three-year-old 1 TB hard drives where the power-
consumption-to-space ratio is no longer good enough? You can of course
destroy them. Or you actually do build a disk drive robot, fill the disks with
Glacier data, simply spin them down and store them away. Zero cost to buy the
drives, zero cost for power consumption. Then add a 3-4 hour retrieval delay
to ensure that those old disks don't have to spin up more than 6-8 times a
day even in the worst case.

But that's just my personal theory anyway.

~~~
batbomb
This is probably close, at least for launch.

The problem with this theory, however, is that tape still would have been
cheaper with roughly the same footprint, and tape has the benefit of often
being forward-compatible too: as the drives improve, so does the storage
capacity of the tape.

The author dismisses the idea that Amazon is using tape, but I haven't seen
much evidence to support that conclusion.

Maybe the reality is that they use a bit of everything: a robotic disk
library, a robotic tape library, maybe a robotic optical library. Maybe
they're secretly ahead of the rest of current tape technology and are getting
20TB out of a single tape. They could be using custom hard drives, or even a
robotic platter library.

One thing that's for sure is that it's not active disk drives, and I don't
believe they keep all the data on optical discs alone, given that optical
discs degrade much more quickly than magnetic storage.

~~~
funkyy
I don't understand how tape infrastructure would be cheaper than free used
disks that are powered on only 4-5 times a day, mostly for very short periods
(some disks would actually be off for months). Disks are easy to use, they
are not that easy to damage, and you could keep each copy of the data
synchronised across 2-3 different DCs easily over Amazon's network at night
(low network usage).

~~~
wmf
The server and network to host those disks end up costing much more than the
disks, so even a MAID (massive array of idle disks) built from free disks
isn't that cheap.

~~~
funkyy
Wouldn't it be possible to just stack them in custom/modified racks, connect
them via cabling to a custom switch, and then get very cheap servers to
switch through them? The hard drives don't generate much heat anyway, so only
basic ventilation would be needed. Using enterprise servers for Glacier
sounds like an expensive luxury. Most disks would be offline most of the time
anyway (small amount of changes), and I bet more than half the disks would be
used less than once a week. Then, if you want to increase the speed of the
whole structure, put a 1U server with 4 x 4TB SATA3 disks in RAID10 in front
for caching the most used/changed parts, connect the server to a 1Gbps line,
and you are all done on a budget. The cost is just one simple server,
modifications to the racks (which would cost very little), cabling for the
disks, and a custom HDD switch.

------
hemancuso
A “former S3 engineer” commented on HN during the Glacier launch. Nothing
verifiable, but it cuts against the idea that Glacier is optical-backed.
[Also interesting: he suggests that S3 has an erasure-coding strategy.]

[https://news.ycombinator.com/item?id=4416065](https://news.ycombinator.com/item?id=4416065)

“They’ve optimized for low-power, low-speed, which will lead to increased cost
savings due to both energy savings and increased drive life. I’m not sure how
much detail I can go into, but I will say that they’ve contracted a major
hardware manufacturer to create custom low-RPM (and therefore low-power) hard
drives that can programmatically be spun down. These custom HDs are put in
custom racks with custom logic boards all designed to be very low-power. The
upper limit of how much I/O they can perform is surprisingly low – only so
many drives can be spun up to full speed on a given rack. I’m not sure how
they stripe their data, so the perceived throughput may be higher based on
parallel retrievals across racks, but if they’re using the same erasure coding
strategy that S3 uses, and writing those fragments sequentially, it doesn’t
matter – you’ll still have to wait for the last usable fragment to be read.”

~~~
badman_ting
"he suggests that S3 has an erasure encoding strategy"

Apologies for the diversion - what does this mean? Does it mean that when an
item is erased from S3, S3 "encodes" the data so that the next person who gets
the same physical disk space can't read what was there before?

~~~
oakwhiz
No, "erasure encoding" refers to a specific type of forward error correction.

[https://en.wikipedia.org/wiki/Erasure_code](https://en.wikipedia.org/wiki/Erasure_code)

[https://en.wikipedia.org/wiki/Forward_error_correction](https://en.wikipedia.org/wiki/Forward_error_correction)
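
To make the idea concrete, here is a toy 2-of-3 XOR parity scheme in Python
(purely illustrative; production systems like S3 reportedly use Reed-Solomon-
style codes over many more fragments):

    # Toy erasure code: split data into two fragments plus an XOR
    # parity fragment. Any single fragment can be lost ("erased")
    # and the original data is still recoverable.

    def encode(data: bytes):
        half = (len(data) + 1) // 2
        a = data[:half]
        b = data[half:].ljust(half, b"\0")  # pad to equal length
        parity = bytes(x ^ y for x, y in zip(a, b))
        return a, b, parity

    def decode(a, b, parity, length):
        # None marks an erased fragment; rebuild it from the others.
        if a is None:
            a = bytes(x ^ y for x, y in zip(b, parity))
        if b is None:
            b = bytes(x ^ y for x, y in zip(a, parity))
        return (a + b)[:length]

    original = b"glacier"
    a, b, p = encode(original)
    assert decode(None, b, p, len(original)) == original  # lost fragment a
    assert decode(a, None, p, len(original)) == original  # lost fragment b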

~~~
darkmighty
If you push erasure coding hard enough, you can use extremely unreliable
components throughout the infrastructure while maintaining reliability. That
introduces high latency, depending on the hardware specifics, but that's
exactly what Glacier is about.

Imo, they found a sweet spot of $/GB in a much higher-latency, lower-
reliability region. (This is analogous to increasing the overall capacity of
a communication channel by using optimally many unreliable low-power symbols
with error correction, instead of a few highly reliable high-power ones.)
Disk manufacturers already use this aggressively for soft failures within the
disk, but they are obviously restricted on more systematic failures (i.e. if
the whole drive fails there's nothing they can do).

If a single drive has a failure probability P_failure, then with many drives
they can achieve close to a (1 - P_failure) fraction of the raw capacity as
reliable storage [1]. So all they have to do is seek the optimum:

($/GB)_opt = min over C, D of [ C / (D * (1 - P_failure(C))) ]

where C is the cost per drive and D is the drive capacity.

[1]
[http://en.wikipedia.org/wiki/Binary_erasure_channel](http://en.wikipedia.org/wiki/Binary_erasure_channel)
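
As a toy version of that search in Python (the drive catalog and failure
numbers are invented, just to show the shape of the optimization):

    # Toy $/GB optimization: pick the drive minimizing cost per
    # usable GB, where usable capacity approaches (1 - P_failure)
    # of raw capacity under near-optimal erasure coding. All
    # numbers are invented for illustration.

    drives = [
        # (cost in $, capacity in GB, probability the drive fails)
        (60.0, 1000, 0.15),   # old, cheap, unreliable
        (120.0, 4000, 0.05),  # newer, denser, more reliable
    ]

    def cost_per_usable_gb(cost, capacity, p_fail):
        return cost / (capacity * (1.0 - p_fail))

    best = min(drives, key=lambda d: cost_per_usable_gb(*d))
    print(best, cost_per_usable_gb(*best))  # -> the 4TB drive, ~$0.032/GB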

~~~
amaks
Yes, but this doesn't explain 4-5 hours data access latency.

~~~
chockablock
Latency could be artificial, in order to get:

- differential pricing

- the ability to transparently switch to slower technologies in the future

~~~
darkmighty
If they're aggressive enough, I'd say there could justifiably be quite a
large latency. Say, for example, their block size is 55 disks with 10%
redundancy. Then the retrieval time is the 50th-smallest read time within the
set of disks which haven't failed. Now there's a queue to read each drive,
and the maximum over those 50 queues may very well be huge pretty much every
time. Even if the queues are typically short, they would still have to quote
a typical high value, which I'd say is essentially the read time of an entire
disk.

Now factor in that the disks are both large and crappy. A 120 MB/s read speed
and a 1 TB disk size would imply ~8000 secs ~ 2 hours. Factor in the
possibility of differential pricing, as you mentioned (even longer queues),
and you may get an upper bound of 3 or 4 hours.
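
As a quick sanity check of that arithmetic (using the numbers assumed above):

    # Time to stream an entire disk at the assumed specs.
    disk_bytes = 1e12        # 1 TB
    read_rate = 120e6        # 120 MB/s sequential
    hours = disk_bytes / read_rate / 3600
    print(hours)             # ~2.3 hours for one full-disk read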

I'm just speculating though.

------
esonderegger
Do BDXL discs have greater longevity than other types of optical media?

Back in 2000, our studio switched from archiving our audio recordings on DAT
to CD-R. Thinking the greatest threats were loss and scratches, they made
three copies of every disc and stored them in different locations. Then,
around 2007, all of those CD-R discs became unreadable at roughly the same
time.

Video recordings were archived to DVD-R starting in 2003 with the same three
copy approach. In 2009, after having successfully rescued all our audio
recordings on to RAID5 network attached storage, we weren't so lucky with
saving our early DVD-R data.

I always assumed the reason DVD-R discs lasted less time than CD-R discs
before chemical decomposition was because of the greater density, but I never
looked into it enough to know for sure.

I have to imagine if Amazon is indeed doing this, they have some sort of plan
in place to write all data to fresh optical media every five years or so. If
so, that would create a very different cost equation.

~~~
_red
For the same reasons, I've been investigating M-Disc. It requires a special
writer (but sub-$100), yet can be read by any DVD reader.

It etches instead of using dye; the claimed result is a 1000-year lifetime.
The DoD has evidently certified it.

[http://www.mdisc.com/what-is-mdisc/](http://www.mdisc.com/what-is-mdisc/)

However, I worry about even finding optical readers in 10 years time.

~~~
esonderegger
Interesting. I hadn't heard of M-Disc. Thanks for the link!

The worry of not being able to find a device to read/play back your media is a
common one for all storage types, analog and digital. For example, we have
hundreds of reels of quarter-inch tape from the fifties and sixties down in
our library. The tapes themselves are safe (some may need a day in an easy-
bake oven), and otherwise they still sound great. We recently had our Studer
A-820 refurbished and it's probably in better-than-new condition, but I worry
there won't be any people to do the service, or parts to do the service with,
if it needs attention in ten years.

The more obscure the medium, the trickier it gets. We have a bunch of early
digital recordings stored on Betamax tapes that need both a working Betamax
deck and a PCM encoder/decoder box in order to be read. Fortunately, those
have all been transferred now.

This is why I'm a believer in treating the maintenance of an archive as an
active rather than a static process. It's important to periodically re-
evaluate your digital assets to make sure they can be losslessly transferred
to current file formats and modern storage media. It was probably never a
good idea to simply "put it on a shelf and forget about it", but thankfully
with digital assets these migrations can be lossless, automated, and tested.

~~~
barrkel
If you don't have restore drills you don't have backups.

~~~
esonderegger
Yes, of course. However, backups != preservation. You could have a textbook
backup strategy, with restore drills and fixity checks confirming that
integrity is 100%, but if your audio is stored in Sound Designer II files and
your print documents are WordPerfect files, you are probably not being a good
steward of your data.

------
nkurz
There are several interesting "dead" comments on this thread from people who
say they have specific knowledge:

secretamznsqrl 20 minutes ago | link [dead]

    Here is a Glacier S3 rack:
    10GbE TOR
    3 x Servers
    2 x 4U JBOD per server, containing around 90 disks or so
    Disks are WD Green 4TB.
    Roughly 2 PB of raw disk capacity per rack
    The magic, as with all AWS stuff, is in the software.

jeffers_hanging 1 hour ago | link [dead]

    I worked in AWS. OP flatters AWS, arguing that they take
    care to make money and assuming that they are developing
    advanced technologies. That's not how Amazon works.
    Glacier is S3, with added code that waits. That is all
    that was needed. The second or third iteration could be
    something else. But this is what Glacier is now.

~~~
hemancuso
The math on the first "dead" post only works out to 1 PB per rack, unless
there is somehow a way to jam 90 disks into 4U. The Backblaze and Supermicro
45-disk 4U chassis suggest that would be pretty tough. Besides, there is
still a good bit of rack left after 24U of JBOD plus 3 servers and a TOR.

~~~
moe
2PB per rack sounds about right for a very tightly packed rack. Most DC's
can't supply power and cooling for that kind of density, though.

~~~
mrb
The power requirements would be precisely why a Glacier HDD rack has only a
fraction of its HDDs powered on at any given time. This also explains the 3-5
hour latency: you have a queue of jobs and you have to wait for other jobs to
finish (e.g. reading gigabytes of data) before your drive can be powered on.
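
As a rough sketch of that queueing effect (the power budget and job sizes are
invented numbers; this just shows how a small per-rack power budget turns a
queue of retrievals into hours of latency):

    import heapq

    POWERED_SLOTS = 4         # drives allowed on at once (assumed)
    JOB_SECS = 2 * 3600       # each retrieval streams ~a full disk (assumed)

    def last_finish(n_jobs):
        # Jobs grab the earliest-free powered slot, FIFO.
        slots = [0.0] * POWERED_SLOTS
        finish = 0.0
        for _ in range(n_jobs):
            start = heapq.heappop(slots)
            finish = start + JOB_SECS
            heapq.heappush(slots, finish)
        return finish

    print(last_finish(12) / 3600)  # 12 queued jobs -> last done at ~6 hours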

It all makes sense.

------
nutjob123
Facebook is doing this. They have a video of the machines too!
[https://www.facebook.com/photo.php?v=10152128660097200&set=v...](https://www.facebook.com/photo.php?v=10152128660097200&set=vb.9445547199&type=2&theater)

~~~
darksim905
That's really cool. I feel like it'll be years before the rest of the world
sees that in datacenters.

~~~
kakoni
In the meantime, keep an eye on this guy:
[http://gigaom.com/2014/03/25/facebooks-open-compute-guru-fra...](http://gigaom.com/2014/03/25/facebooks-open-compute-guru-frank-frankovsky-leaves-to-build-optical-storage-startup/)

------
nkurz
_Therefore, by a process of elimination, Glacier must be using optical disks.
Not just any optical discs, but 3 layer Blu-ray discs._

What? You partially eliminate a few strawmen, and thus conclude that your pet
theory is the only answer? The logic in this article is so weak that I
presume it must be an example of 'parallel construction'. I have to guess
that someone he trusts but is not allowed to quote has told him that Amazon
is using BDXL discs, and he's pretending to reason his way to the same
conclusion. My next best guess is that he's a few weeks late on his April
Fools' post.

 _Assuming aggressive forward pricing by Panasonic or TDK, Amazon probably
paid no more than $5 /disc or 5¢/GB in 2012. Written once, placed in a
cartridge, barcoded and stored on a shelf, the $50 media cost less than a hard
drive – Blu-ray writers are cheap – Amazon would recoup variable costs in the
first year and after that mostly profit._

OK, maybe the April Fools' explanation should come first. Because if you were
trying to come up with plausible logic, surely you could do better than
declaring that Amazon's private price for BDXL media is 1/5th of that known
to the public and that all other costs are zero. And even if one were to
assume this admittedly unlikely scenario, wouldn't you need to write more
than one disc for redundancy? But that's easy to solve: just wave the magic
wand and halve Amazon's secret price down to a level where it makes sense.

Powered-down hard drives seem like a much simpler explanation. The robotics
don't seem that difficult if, instead of bringing the disk to the backplane,
you keep the disk fixed in place and just attach the cable. Presumably you
could design your own sturdier and easier-to-align connector, and leave the
adapter in place. Or maybe there is some way to do it with a mechanical
switch? Or do you even need a robot? If you built a jig so you could plug in
a whole drawer of drives at once, maybe you could just hire someone to do it.

~~~
KaiserPro
Tape is dismissed far too easily; tape is around $25-40 a cartridge, which
works out to roughly $15 a TB.

------
veidr
Glacier is interesting and I use it every day for some of my backups (a few
TB of ongoing backups via Arq).

But I have always assumed it was -- like the mysteriously/inappropriately
killed comments from jeffers_hanging on this thread claim -- just S3 with
hard delays enforced by software.

If you are already doing S3 for the entire world at exabyte scale, and you
have petabytes of excess capacity and a bunch of aging, lower-performance
infrastructure sitting around... do you really need to fuck with risky new
optical disk formats?

It seems to me that you could just start with selling slowed-down rebranded
S3, and then iterate on it.

~~~
grogers
IIRC there was a similar rumor that S3 reduced-redundancy storage was just
cheaper rates for the exact same service. Not sure if that is true, but it
certainly meets the bar for a minimum viable product.

Glacier as S3-with-sleeps seems to be a pretty reasonable extrapolation of
that same idea. Plus, the sleeps are long enough that if and when they build
it for real, it could be economically viable.

That said, if the quoted 0.3 cents is true, it should be viable as it is by
just stuffing way more density into a rack and keeping the drives powered
down most of the time... Though that rack would probably weigh tons, so you'd
probably want to reinforce the floor of the data center a bit. We saw Equinix
facilities with concrete floors and ventilation delivered from above, so
something like that could be an option.

------
toolslive
I was thinking the basic strategy of Glacier was to power down whole racks,
accumulate requests until there are enough of them, then power up the rack
and serve them all in one go. If you combine this with some kind of erasure
coding (typically Cauchy Reed-Solomon) you get even more freedom, as you can
treat powered-down racks as temporary erasures. Anyway, I'm pretty sure it's
neither tape nor optical.
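
A minimal sketch of the "powered-down racks as temporary erasures" idea (the
code rate and rack counts are made up for illustration):

    # With a k-of-n erasure code spread across n racks, a read can
    # be served whenever at least k of the racks holding fragments
    # happen to be powered on. All numbers invented.

    K, N = 10, 14   # any 10 of 14 fragments reconstruct the object

    def can_serve(powered_racks, fragment_racks):
        on = sum(1 for r in fragment_racks if r in powered_racks)
        return on >= K

    fragment_racks = list(range(N))       # one fragment per rack
    powered_racks = set(range(0, N, 2))   # only every other rack is on
    print(can_serve(powered_racks, fragment_racks))  # 7 of 14 on -> False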

------
alecco
Amazon Glacier has some fine-print hidden costs that can pile up to THOUSANDS
of dollars quickly if you retrieve the data "too fast". It can happen with
only a few hundred gigabytes.

I fail to see how it is useful as a backup service. This got me into serious
trouble: I built a whole backup system around it, and now it is not what it
was supposed to be.

~~~
matthewmacleod
I mean, it's not like Amazon suffers from opaque pricing. The offer with
Glacier is exactly what they said it would be, and the trade-off for lower
per-gig costs is the toll on retrieving the data. That's still perfectly
reasonable IMO; I have a couple of terabytes backed up, and worst case, if I
actually need to hit Glacier, it'll cost me a few hundred dollars.

~~~
lotharrr
In general I agree, but the problem with Glacier's retrieval pricing in
particular is that it's quoted as dollars-per-byte (like all their other
prices), when in fact it's really dollars per peak retrieval rate (peak GB
per hour over the month).

It's easy to get a nasty surprise. If I pull enough to saturate my 28Mbps DSL
downstream for 4 hours (~51GB), I'd pay nearly $100 that month, not
51GB*($0.01/GB)=$0.51. You really need a Glacier-aware restore program that is
told your maximum budget, knows the pricing formula, and carefully schedules
the retrievals to keep the peak rate below that threshold.
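
A rough model of that formula (as I understand the 2014-era pricing; it
ignores the free monthly retrieval allowance and assumes a flat pull):

    # Glacier-style retrieval fee: billed on the PEAK hourly
    # retrieval rate for the month, not on total bytes. Simplified;
    # the rate is the 2014-era $0.01/GB.

    RATE_PER_GB = 0.01
    HOURS_PER_MONTH = 720

    def retrieval_fee(total_gb, hours):
        peak_gb_per_hour = total_gb / hours   # flat pull assumed
        return peak_gb_per_hour * RATE_PER_GB * HOURS_PER_MONTH

    # Saturating 28 Mbps DSL for 4 hours is ~51 GB:
    print(retrieval_fee(51, 4))   # ~$92, versus 51 * $0.01 = $0.51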

~~~
vidarh
It's for long term offsite backups/archival. If you need to retrieve lots of
data from it on a regular basis (e.g. because of a single dead drive) you're
doing something wrong.

If our office burns down and I had our offsite backups in Glacier, the
retrieval costs are peanuts compared to the cost of losing the data (going
out of business). If my home burns down and I have my offsite backups in
Glacier, a few hundred dollars to retrieve data would be nothing compared to
the emotional loss of years' worth of photos etc.

But I have _triple_ copies at home - minimum - of almost all of my data
(mirrored drives + regular snapshots on a third drive; a lot of the stuff is
also synced to/from one or more other computers), and similar setups at work:
an offsite backup is a _last resort_.

------
amznian
I am an AWS engineer, but note that I am not affiliated with Glacier.
However, James Hamilton did an absolutely amazing Principals of Amazon talk a
couple of years ago going into some detail on this topic. Highly recommended
viewing for Amazonians.

From what I remember of it, it's custom HDs, custom racks, and custom logic
boards with custom power supplies. The system trades performance for
durability and energy efficiency.

Robin Harris' deductions in this article are worthy of Arthur Conan Doyle...

------
oh_sigh
Glacier uses hard drives, and the thing the author missed is that the MTTF of
infrequently accessed drives is _much_ better than that of frequently
accessed drives.

------
rushi_agrawal
It is interesting to note that while S3 prices have more than halved, Glacier
prices have remained the same over that period. Does this mean that extremely
low margins are at play here?

Or is it that Amazon has no competitor here yet good enough to push it to
lower its prices?

~~~
cperciva
Could be either, but note that Amazon has a history of introducing new
products at or below cost and then gaining a profit margin by keeping their
prices fixed while their costs drop. A while back I heard that the fully
loaded cost (to Amazon) of m1.small was $0.15/hour when EC2 launched, but they
priced it at $0.10/hour because they knew their costs would drop and they
wanted to win market share.

~~~
pvg
That history may actually be over, as Google is aggressively undercutting
them and they are scrambling to match pricing.

Glacier now costs the same as Google Drive, except Drive has none of the
restrictions. AWS on-demand prices have been cut so much that they've
actually left some of their prepaid discount prices ('reserved instances', in
their parlance) higher than what they sell on demand.

~~~
cperciva
_Glacier now costs the same as a Google drive, except drive has none of the
restrictions_

Only if your usage is exactly equal to the maximum usage from one of Google's
pricing tiers.

------
dmourati
I was just talking about this yesterday. My conclusion has always been that
they are powering off the drives.

I came to that conclusion while modeling cost for Eye-Fi. Power is the big
driver.

I'm not even saying robots. Just leave the disk there and power it off.

------
everettForth
Where did Amazon deny that Glacier was tape? I don't believe that the climate
control requirements debunk tape.

------
post_break
I really wish BDXL would hit the consumer market. I archived terabytes of
data on single-layer 25GB Blu-rays and it's tedious to span a larger folder
across 10 discs.

~~~
tim333
I'm mildly curious why you'd do that. At a quick look on Amazon, a 4TB hard
drive seems to work out at less per GB than writable Blu-rays, plus way less
hassle. And in my personal experience hard disks seem more reliable than
optical media: my 10-year-old hard disks all work, while the CD-Rs I made
around the same time usually don't.

~~~
post_break
I have 2 Drobos maxed out at about 32TB. The Blu-rays are my cold storage.
Blu-rays will last a hell of a lot longer than hard drives when kept away
from the sun. Just because you have hard drives that last 10 years doesn't
mean you're not experiencing bit rot.

~~~
tim333
Ta. I just googled and found Blu-rays use a completely different technology
from my old CD-Rs, which used organic dye and were awful for lifespan. Though
I think I'll stick to my Moore's Law disk strategy, where I buy a new HD
every few years and copy the entire old HD onto about 15% of it.

------
legohead
1 cent/GB still isn't that impressive to me. I have terabytes I need to
archive -- mostly family photos and videos (and my oldest kid is 5!). 1
terabyte = $10/month. It's still cheaper to just buy external hard drives.

There are those online backup services that are unlimited for $X/month, but I
have been bait-and-switched by those services twice now, and it takes such an
incredibly long time (months) to upload everything... so I've quit relying on
those as well.

~~~
cjensen
I also have about a terabyte of videos of kids and so forth.

You are right that external hard drives are a cheaper option, but you must
actually take the time to transfer the drives offsite. I'm lazy, and since I
can't automate the physical transfer of drives, that doesn't work for me.
That's why I use Glacier via Arq.

Aside: CrashPlan and a few others allow me to backup to drives I've placed at
a friend's house. But I'm not interested in setting that up.

~~~
derekp7
What about having a small device, say based on a BeagleBone board (or similar
low power) hooked up to a USB drive and wifi. Then set it up in your car to
power up for a while when the car is turned off. When it power up, it looks at
the network that it can connect to -- if it is your home network, then rsyncs
from your home backup server. If it is an alternate (say, office network, or
another place you visit frequently), it can rsync to that server.
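
A hypothetical sketch of that logic (the SSIDs, hosts, and paths are
invented; iwgetid and rsync are standard Linux tools):

    import subprocess

    # Map of known wifi networks to (source, destination) pairs.
    NETWORKS = {
        "home-wifi":   ("backupserver.local:/backups/", "/mnt/usb/"),
        "office-wifi": ("/mnt/usb/", "officeserver.local:/offsite/"),
    }

    def current_ssid():
        out = subprocess.run(["iwgetid", "-r"], capture_output=True,
                             text=True)
        return out.stdout.strip()

    def sync_once():
        pair = NETWORKS.get(current_ssid())
        if pair is None:
            return            # unknown network: stay quiet
        src, dst = pair
        # -a preserves metadata, --partial survives dropped wifi
        subprocess.run(["rsync", "-a", "--partial", src, dst])

    if __name__ == "__main__":
        sync_once()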

If someone marketed a turn-key solution like this (a home backup appliance
plus a vehicle-based transfer appliance), would it be a good alternative to
online backup services?

~~~
cjensen
That's a clever idea.

~~~
msandford
I have been wishing for something like that for years. I'd love to have all
my music synced to the car automatically when I get home. Even better if it
could take all my docs and whatnot along too. Security would be an issue,
though; you're upping your exposure immensely by driving your data around.

~~~
bigiain
I use a pair of mirrored drives (software RAID1) for my main home storage,
with a cron job rsyncing that to an external USB drive running encfs (which I
swap out approximately weekly with a duplicate in my office drawer). I could
happily wifi-sync that encrypted drive to one in my car without worrying
about it if/when it goes missing. (And I've got a spare Raspberry Pi and wifi
adapters in the project box too… I can just see the hassle of getting a DIY
power supply from the car's "12v" system to reliably power the Raspberry Pi
being more time-wasting than I'd be prepared to invest in this, though. Noisy
automotive power systems are a pain in the ass.)

~~~
derekp7
Take a look at one of those phone power-pack batteries. They already have
clean 5.0V USB output, and can charge while in use.

------
mrsaint
And let's not forget holographic technology. The first enterprise hardware is
slated for 2015, when a single recordable optical disc will reach storage
capacities of at least 300GB. Both Sony and Panasonic are working on it. The
future is plenty... plenty of gigs, that is. If Amazon were indeed using
optical discs for their cloud storage, then this could only mean even lower
prices for us in the very foreseeable future.

~~~
acdha
> The first enterprise hardware is slated for 2015

I've been hearing about this since pets.com was a hot investment, and not a
whole lot has shipped since then. For a new storage technology that's
particularly concerning, since you don't know how accurate current guesses as
to cost will be and, critically, nobody has a baseline to say what
reliability will be like in the real world.

~~~
mrsaint
Costs and reliability - very good points. Regarding reliability, couldn't we
say the same about BDXL technology? Certified discs for BDXL have a 50 to 100
years media longevity, or so it's said. Question is, how accurate are these
numbers? The first rewritable 100GB BDXL discs hit the market in 2011. Would
three years real world experience be sufficient to extrapolate the findings to
the next 97 years?

~~~
acdha
I wouldn't disagree completely – anyone predicting longevity greater than,
say, 5-10 years is simply making up numbers and hoping you don't ask about
methodology.

That said, BDXL has shipped in volume for years and has less new technology
involved so I would be surprised if it's significantly different than older
Blu-Ray or DVD/CD systems which have been heavily tested.

------
r00fus
Interesting analysis. The forward pricing suggested (5¢/GB) for the 3-layer
media is incredibly aggressive, and even then it seems to be 2.5x the cost of
drives, and it doesn't take into account the 3-layer RAIDed 10-disc product.

The cost issue just doesn't compute for me. And that's leaving aside the
custom storage system setup costs.

------
jeffers_hanging
I worked in AWS. OP flatters AWS, arguing that they take care to make money
and assuming that they are developing advanced technologies. That's not how
Amazon works. Glacier is S3, with added code that waits. That is all that was
needed. The second or third iteration could be something else. But this is
what Glacier is now.

------
chatmasta
Why is the author so quick to write off the idea of robotic hard drive
management? It seems like the simplest explanation. Designate racks of hard
drives for "new data" and "retrieving data". Write all new data to the "new
data" hard drives until they're full. Then a robot replaces each full hard
drive with a fresh one and puts the full drive into cold storage. When a
client requests data retrieval, the robot locates the hard drive and puts it
into a "retrieving data" rack. The data retrieval servers are connected to
this rack and actively work to retrieve data as the hard drives come online.

Considering Amazon bought Kiva Systems, which made highly intelligent floor
robots for warehouses, they obviously have the talent to build the robots
necessary for such an operation.

------
jpalomaki
Having a robot juggle hard drives would not make that much sense. The reason
we have optical disc and tape robots is that tapes and discs need a separate
device to read/write them. With hard drives there's no such need.

With hard drives it would make more sense to do some development on the
electronics side and build a system where lots of drives can be
simultaneously connected to a small controller computer. The HDs don't all
need to be powered on or accessible at the same time; the controller could
turn on only a few of them at once. And of course some of the controllers
themselves could normally be powered off, once all the hard drives connected
to them are filled.
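
A sketch of that kind of controller logic (the power budget and eviction
policy are invented for illustration):

    # Controller with many attached drives but a small power
    # budget: keep at most MAX_POWERED drives spinning, evicting
    # the least recently used one when the budget is hit.

    MAX_POWERED = 2

    class Controller:
        def __init__(self):
            self.powered = []             # drive ids, LRU first

        def request(self, drive_id):
            if drive_id in self.powered:
                self.powered.remove(drive_id)
            elif len(self.powered) >= MAX_POWERED:
                self.powered.pop(0)       # spin down the LRU drive
            self.powered.append(drive_id) # drive is now on, MRU

    c = Controller()
    for d in (5, 17, 42):
        c.request(d)
    print(c.powered)  # [17, 42]: never more than 2 drives spinning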

------
math0ne
Makes me think it would be really cool to work in an Amazon data center and
get to see and work with all this unreleased hardware. Very cool article.

~~~
jeffers_hanging
It's very funny to hear you say that. The full article is built on the
assumptions of the author. The truth is much simpler and more interesting:
Amazon launches minimum viable products, doing as little engineering as
possible, and then iterates on them in production. The cheapest way to test
the idea of Glacier was to add waiting to S3. So Amazon did that.

------
KaiserPro
The problem is that BDXL is just not proven for any long-term storage. It has
a very bad track record when it comes to delamination, oxygen, and sunlight.

Tape is awesome, cheap, and proven. It fits the MO of Glacier. This is not to
say that it's not BDXL, but I'd be very surprised if Amazon bet the entire
house on something so untested.

------
secretamznsqrl
Here is a Glacier S3 rack:

10GbE TOR

3 x Servers

2 x 4U JBOD per server, containing around 90 disks or so

Disks are WD Green 4TB.

Roughly 2 PB of raw disk capacity per rack

The magic as with all AWS stuff is in the software.

------
wazoox
There have been disk packs in Spectra tape libraries for 10 years, so you
don't even need custom hardware for "disk drive robots"; it already exists.

------
kristianp
Interesting that the article doesn't have the question mark in the title, but
the HN link does. It inverts the implied conclusion of the article.

~~~
dang
It's a longstanding convention on HN that we sometimes add a question mark
when the title is overstated, but the article is still interesting.

~~~
kristianp
"It's a longstanding convention on HN that we sometimes add a question mark
when the title is overstated, but the article is still interesting."

Really? I don't think this is true; I have only seen the question mark when
it is in the original article's title as well. Any other examples?

~~~
dang
_I don 't think this is true_

I'm a primary source on this, having done it many times myself, and I know PG
has as well. Sorry, but I don't have time to dig up examples. It might be
fairly straightforward, though, if you wanted to write code against the HN
Search API.

------
enricotal
Amazon Glacier is hiding in plain sight.

I think it's just a virtual product: probably unused S3 capacity at a much
lower price, not a different technology.

------
nroose
I'm guessing cheap disks.

