
Amazon’s Glacier secret: BDXL (2014)
https://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/
======
fnord123
They're almost certainly doing something like Microsoft's Pelican:
[https://www.microsoft.com/en-us/research/wp-
content/uploads/...](https://www.microsoft.com/en-us/research/wp-
content/uploads/2016/09/pelican-hotstorage2016.pdf)

The first comment on TFA says as much.

Edit: This is the actual Pelican paper: [https://www.microsoft.com/en-
us/research/wp-content/uploads/...](https://www.microsoft.com/en-
us/research/wp-content/uploads/2014/10/osdi2014-Pelican.pdf)

~~~
match
Yep. This is basically what they do. Source: I was around when they were
designing and building Glacier.

~~~
hbk1966
This is very cool.

------
hemancuso
I highly doubt this, especially given the introduction of the nearline tiers.

It's probably just very widely striped, price-segmented data.

Also, see:
[https://news.ycombinator.com/item?id=4416065](https://news.ycombinator.com/item?id=4416065)

------
Spooky23
I thought this was debunked at the time?

When I was running exchange systems, our biggest challenge was delivering
IOPS. We had to use SAN, and wasted significant storage because we'd spend our
IOPS budget at 40-60% storage capacity.

I figured at their scale they would have similar problems.

~~~
chrisper
IOPS isn't important for Glacier. You just upload to some buffer and they
eventually move it to the slow storage.

Reading from Glacier is pretty slow.

~~~
t0mas88
He meant that EBS might have the same issue as his Exchange servers. To explain
in more detail: you have 10TB of disk space with 10,000 IOPS; your users buy 4TB
with 10,000 IOPS, and you have 6TB of storage wasted.

If Amazon has that problem with EBS, then selling that storage capacity as
Glacier and using just the idle IOPS (or leaving a small bit reserved) allows
them to sell capacity that would otherwise just be useless.
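As a toy sketch of that stranded-capacity argument (the numbers are from the comment above; the function and its name are just illustrative, not anything AWS has published):

```python
# Toy model of IOPS-stranded capacity: a storage box runs out of IOPS
# before it runs out of bytes, leaving capacity that can only be sold
# to a workload that needs almost no IOPS (e.g. a Glacier-like tier).

def stranded_capacity_tb(total_tb, total_iops, sold_tb, sold_iops):
    """Capacity left stranded once the IOPS budget is exhausted."""
    if sold_iops >= total_iops:      # IOPS sold out before capacity did
        return total_tb - sold_tb
    return 0                         # capacity sold out first; nothing stranded

# The Exchange-style example from the comment: a 10TB / 10,000 IOPS box
# where users buy 4TB that consumes the full IOPS budget.
print(stranded_capacity_tb(10, 10_000, 4, 10_000))  # -> 6 (TB sellable only as cold storage)
```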

~~~
nickodell
Aren't retrievals incredibly expensive on Glacier? There was that guy who paid $150
for a retrieval. [https://medium.com/@karppinen/how-i-ended-up-
paying-150-for-...](https://medium.com/@karppinen/how-i-ended-up-
paying-150-for-a-single-60gb-download-from-amazon-glacier-6cb77b288c3e)

~~~
markonen
I'm that guy. I should update the post; Amazon "fixed" the retrieval fees in
late 2016 and I would've paid less than a dollar had the current pricing
scheme been in effect when I did the retrieval.

------
UseStrict
Using BDXL seems like a pretty good solution. Most of this data is archival,
and existing data is very unlikely to change. You can use HDD/SSD as a buffer
as users upload data, and then optimize the packing to ensure you're using all
available space on a disc. Possibly encrypt each user's data block on the
disc. The system itself would only need to track metadata (file metadata,
cartridge/disc, key). Deleting a file would be deleting the key and marking
the file as inactive. Once/if a cartridge is marked as completely deleted, you
can just recycle it.
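The "deleting a file = deleting the key" idea (crypto-shredding) can be sketched like this. Everything here is a hypothetical design to illustrate the comment, not Amazon's actual implementation; all class and method names are made up:

```python
# Sketch of crypto-shredding for write-once media: each user block burned to
# a cartridge gets its own key. The bytes on disc never change; deleting the
# data just means dropping the key from the metadata index.
import os
from dataclasses import dataclass, field

@dataclass
class ArchiveIndex:
    # object id -> (cartridge id, per-object encryption key)
    entries: dict = field(default_factory=dict)

    def store(self, object_id: str, cartridge: str) -> bytes:
        key = os.urandom(32)                  # per-object key
        self.entries[object_id] = (cartridge, key)
        return key                            # used to encrypt before burning

    def delete(self, object_id: str) -> None:
        # Dropping the key renders the on-disc ciphertext unreadable.
        self.entries.pop(object_id, None)

    def cartridge_fully_deleted(self, cartridge: str) -> bool:
        # Once no live keys reference a cartridge, it can be recycled.
        return all(c != cartridge for c, _ in self.entries.values())

idx = ArchiveIndex()
idx.store("user1/backup.tar", "cart-001")
idx.store("user2/photos.zip", "cart-001")
idx.delete("user1/backup.tar")
print(idx.cartridge_fully_deleted("cart-001"))  # False: user2's key still live
idx.delete("user2/photos.zip")
print(idx.cartridge_fully_deleted("cart-001"))  # True: safe to recycle
```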

~~~
baybal2
I feel it will go the way of Data8

------
zitterbewegung
Nice investigation. Also, Facebook has been using 50GB Blu-rays
[http://www.businessinsider.com/facebook-uses-10000-blu-
rays-...](http://www.businessinsider.com/facebook-uses-10000-blu-rays-for-
backup-2014-1) and is moving to 300GB
[http://www.businessinsider.com/ces-2016-facebook-uses-
panaso...](http://www.businessinsider.com/ces-2016-facebook-uses-panasonic-
freezeray-2016-1).

~~~
kakarot
Could you provide me some insight into why optical storage is a better
solution than standard HDDs? Is it just the cost, or is cooling / form-factor
a big part of it?

~~~
zitterbewegung
It's cost that gives Blu-ray the upside over HDDs.

------
shiftpgdn
Why isn't anyone thinking tapes? You can get LTO-7 tapes for $0.008 per
gigabyte that allow 100-300 writes before the tape should be destroyed.
Quantum and HP make monstrous tape libraries that hold 5-10 petabytes per
rack. You can also cartridge-ize your library for even more dense storage on a
literal warehouse rack somewhere.

Tapes also match the slow retrieval speeds as you have to read the data out
onto a drive linearly.
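Back-of-envelope on those figures (the $/GB and rack capacities are the comment's numbers, not vendor quotes):

```python
# Tape media cost per rack at the comment's figures:
# $0.008/GB for LTO-7 media, 5-10 PB of capacity per rack.
COST_PER_GB = 0.008
PB_IN_GB = 1_000_000      # decimal petabyte

for rack_pb in (5, 10):
    media_cost = rack_pb * PB_IN_GB * COST_PER_GB
    print(f"{rack_pb} PB rack: ${media_cost:,.0f} in media")
# -> 5 PB rack: $40,000 in media
# -> 10 PB rack: $80,000 in media
```

That media cost (drives, library robotics, and floor space are extra) is what makes tape the default comparison point for archival pricing.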

~~~
eridius
Amazon already denied using tapes, which was mentioned in the third paragraph
of the article.

~~~
shiftpgdn
There isn't a source cited for the denial of tapes, but this article claims
an Amazon insider verified that it is tapes:

[http://www.theregister.co.uk/2012/10/18/ipexpo_2012/](http://www.theregister.co.uk/2012/10/18/ipexpo_2012/)

------
tyingq
Previous HN discussion about this:
[https://news.ycombinator.com/item?id=7647571](https://news.ycombinator.com/item?id=7647571)

------
saosebastiao
This is an extremely interesting deductive analysis. However, considering it
is amazon, there always exists that persistent "other" possibility: they're
purposefully taking a loss.

~~~
ghaff
Or at any rate, pricing based on net present value of archived data given
decreasing costs over time for storage.

I do find it rather fascinating that AWS has managed to keep the technology
used by Glacier, even at a high-level (i.e. disks vs. tape vs. optical), so
under wraps. My personal guess is that it's powered-down disk drives on the
grounds that's the simplest long-term solution but that's purely a guess.

------
WalterBright
Given the scale of Glacier, I'm surprised that Amazon is able to keep their
underlying storage technology a secret.

~~~
rbanffy
This is one reason to assume it's unremarkable. If Amazon were buying and
offloading truckloads of BDXL disks and drives, someone would eventually
notice. A good explanation is, therefore, that the technology is unremarkable
and boring.

------
binaryanomaly
Does anyone know about Google Nearline and Coldline storage? Google claims
Coldline access within milliseconds.

~~~
monocasa
> Google claims Coldline access within milliseconds

Well, that basically tells you almost all you need to know. It's disk in
JBODs. The only question is SMR vs conventional. Anyone who knows that can't
tell you in public.

~~~
CydeWeys
If that's how Google is implementing it, and the prices are similar, then
isn't that some evidence that Amazon might be implementing it the same way?

~~~
_wmd
Amazon's latency is measured in hours. Whatever process they use involves
literally cold storage: either disks that are completely switched off, or some
tape-like media archive system. But the article makes a good case for why it's
almost certainly not tape.

~~~
CydeWeys
It could still be normal disks, if we're feeling conspiratorial. It's easy to
make faster storage slower; just add waits. Or maybe they make it take up to
five hours so they can avoid peak traffic times in whatever data center your
info happens to be located in.

------
mixmastamyk
I've got a USB3 BDXL writer attached at my desk and it is quite handy and not
too expensive. I back up my whole data (work) partition to it every so often
and occasionally take one over to a relative's house as my own home-grown
"glacier" system.

~~~
tropin
Although your setup is interesting, aren't 2.5-inch( * ) SATA disks via a USB3
adapter way cheaper and more flexible?

( * ) So you don't need a power adapter.

~~~
mixmastamyk
Perhaps, but optical is lighter, much simpler, durable, can be mailed, and
fits into CD cases for travel, etc.

~~~
sengork
And optical media is also immutable (read only) when you need it to be immune
to any data changes.

I find that read only point in time backups gain value over time. Especially
if you need to pull a file that would have been long rotated and replaced by
newer backups on read write media (eg. HDDs).

Unfortunately the market for this use case is not large, which is reflected in
the prices, and high-quality optical media is relatively hard to source. For
BD this would be (inorganic) HTL Panasonic media, which only has a market
inside Japan. M-Disc is the other alternative, although it has only proven
itself in the DVD market; classical HTL BD media is expected to be very
similar in endurance to what M-Disc offers in the BD range.

------
Twirrim
Ex-Glacier engineer... and no I'm not going to tell you what or how it's done.
NDAs and all that jazz. These speculation threads always make for fascinating
reading for people on the team.

~~~
exawsthrowaway
Glacier in particular seems to attract the speculative fascination. Do people
not realise the name is not in jest, it really is done with graphene-doped
room-temperature ice crystals and laser interference lithography?

~~~
Twirrim
We used to joke among ourselves that actually it was done using vinyl records.
Have you seen how many vinyl records you can fit into a single rack?

Added bonus, 9 out of 10 customers actually preferred the feel of their data
when it is restored.

~~~
dwyerm
I liked the story that the truck-mounted Snowball came from Glacier tech.
Amazon has been putting data on a truck and then driving it around Virginia.
The delay in reading it back is the time it takes for the truck to arrive at a
datacenter and plug in. :)

------
digi_owl
Is this packet-written?

That seems to have been the major stumbling block with higher-capacity optical
media: you can't do the drag-and-drop writes you get with spinning rust and
flash chips.

~~~
lobster_johnson
Bulk transfer. They'd write it to a normal disk first, then slowly copy the
data in bulk.

------
jayonsoftware
What if they are using 2.5-inch 5TB drives like the ones
[http://www.theverge.com/circuitbreaker/2016/11/15/13642078/s...](http://www.theverge.com/circuitbreaker/2016/11/15/13642078/seagate-
backup-plus-portable-5tb-hard-drive) I use. They are nice: you can plug them
into a 15-port USB hub, and they auto power down when not in use. Amazon could
have developed a box like what backblaze.com has done.

------
kennethh
Facebook disclosed how they archive photos long term: they manage a 1-to-1.4
ratio with Reed-Solomon redundancy, and any 4 of 14 disks can fail without
losing data. [https://code.facebook.com/posts/1433093613662262/-under-
the-...](https://code.facebook.com/posts/1433093613662262/-under-the-hood-
facebook-s-cold-storage-system-/)
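Checking the arithmetic: a 1-to-1.4 ratio corresponds to Reed-Solomon with 10 data + 4 parity fragments across 14 disks, which tolerates the loss of any 4 disks. A quick sketch (the function is illustrative; the (10, 4) split is inferred from the ratio, not stated outright here):

```python
# Reed-Solomon erasure-coding overhead: with k data fragments and m parity
# fragments spread over k+m disks, any m disks can fail without data loss,
# at a raw-storage cost of (k+m)/k bytes per logical byte.
def rs_overhead(data_frags: int, parity_frags: int) -> tuple:
    total = data_frags + parity_frags
    ratio = total / data_frags    # raw bytes stored per logical byte
    tolerated = parity_frags      # any `parity_frags` disks can fail
    return ratio, tolerated

ratio, tolerated = rs_overhead(10, 4)
print(ratio)      # -> 1.4, the "1 to 1.4" ratio in the post
print(tolerated)  # -> 4, i.e. any 4 of the 14 disks can fail
```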

------
KaiserPro
From what I recall, writable optical discs have a much shorter lifespan than
tape (~15 years vs. ~75 years).

Plus, if I were designing an archival system, it wouldn't be on Blu-ray unless
there was a requirement for resistance to magnetic fields.

~~~
pmlnr
You are mistaken here. Tapes, unless stored with extreme precision, usually
last ~20 years safely. They tend to have a longer life, but not without a high
risk of deterioration.

In my experience, written CDs and DVDs only last <10 years if you're lucky,
although studies show you can get 30-45, even 45+ years out of them.

Most Blu-ray life expectancy exceeds this due to the different inorganic
(non-dye-based) layering and coating.

The one mentioned above, M-Disc, was developed for DARPA and is supposed to
last 1000 years in theory.

See:

[http://www.mdisc.com/](http://www.mdisc.com/)

[http://loc.gov/preservation/resources/rt/NIST_LC_OpticalDisc...](http://loc.gov/preservation/resources/rt/NIST_LC_OpticalDiscLongevity.pdf)

[http://www.zdnet.com/article/torture-testing-
the-1000-year-d...](http://www.zdnet.com/article/torture-testing-
the-1000-year-dvd/)

------
gwicke
I wonder if Amazon is also deduplicating. It seems likely that a share of
users would store large media files or software packages without encryption.

------
Flammy
I remember reading this back around when it came out. Have there been any new
pieces of the puzzle identified (or announcements...) since then?

------
Nition
Could it possibly just be compression on traditional HDDs?

\- Cheaper storage because data is heavily compressed

\- Slow retrieval time due to slow decompression

~~~
usefulcat
I seriously doubt it. If I have many TB of stuff I need saved off-site, you
can bet the very first thing I'm going to do is compress the hell out of it.
So they certainly couldn't count on compressing it further.

------
rrggrr
Custom engineered SSD. Powered off at rest.

See: [http://www.storagesearch.com/ssd-
petabyte.html](http://www.storagesearch.com/ssd-petabyte.html)

~~~
hedora
SSD isn't quite ready for backup use cases (and certainly wasn't when Glacier
was built a few years ago).

However, it is 2017, so we can say for sure that the extrapolation to 2016 in
the linked article from 2010 was pretty good. It was too optimistic by a
factor of ~2 in density and ~10x in cost, but it's still spot-on compared to
most predictions from a year or two ago.

Since any whacko can claim they made a prediction in 2010, I double checked:

[http://web.archive.org/web/20100322200343/http://www.storage...](http://web.archive.org/web/20100322200343/http://www.storagesearch.com/ssd-
petabyte.html)

Thanks for sharing the link!

