

Amazon Glacier - sqnguyen
http://aws.amazon.com/glacier/
======
ghshephard
I'm a long-time user of Backblaze, and I'm a big fan of the product - it does
a great job of always making sure my working documents are backed up,
particularly when I'm traveling overseas and my laptop is more vulnerable to
theft or damage.

With that said - Backblaze is optimized for working documents - and the
default "exclusion" list makes it clear they don't want to be backing up your
"wab~,vmc,vhd,vo1,vo2,vsv,vud,vmdk,vmsn,vmsd,hdd,vdi,vmwarevm,nvram,vmx,vmem,iso,dmg,sparseimage,sys,cab,exe,msi,dll,dl_,wim,ost,o,log,m4v"
files. They also don't want to back up your /applications, /library, /etc,
and so on. They also make it clear that backing up a NAS is not the target
use case for their service.

I can live with that - because, honestly, it's $4/month, and my goal is to
keep my working files backed up. For system image backups, I've been using
Super Duper to a $50 external hard drive.

Glacier + a product like <http://www.haystacksoftware.com/arq/> means I get
the best of both worlds - Amazon will be fine with me dropping my entire
256-gigabyte drive onto Glacier (total cost: $2.56/month) and I get the
benefit of off-site backup.

The world is about to get a whole lot simpler (and inexpensive) for backups.

~~~
sreitshamer
I'm looking into supporting Glacier in Arq. It sure is cheap -- $10/month for
a terabyte.

~~~
jordibunster
I think the key here is not to just provide a toggle for using Glacier
instead of S3, but to have the historical snapshots migrated from S3 to
Glacier and deleted after 90+ days.

I will gladly pay more money for Arq again.

~~~
sreitshamer
I think it would be more like choosing Glacier instead of S3 on a per-folder
basis.

Arq is incremental backup. The data structure is similar to git's
<http://www.haystacksoftware.com/arq/s3_data_format.txt>

Unless your files change a lot, keeping the latest backup version in S3 and
previous versions in Glacier would mean that most of your backup data is
still in S3, I think. Right?
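
For anyone unfamiliar with the git model, here's a toy sketch of a
content-addressed store (illustrative only, not Arq's actual format):

    import hashlib, json, os

    class ChunkStore(object):
        """Toy content-addressed store: blobs keyed by their SHA-1, git-style."""
        def __init__(self):
            self.blobs = {}  # hash -> bytes; stand-in for S3/Glacier objects

        def put(self, data):
            key = hashlib.sha1(data).hexdigest()
            self.blobs.setdefault(key, data)  # identical content stored only once
            return key

    def snapshot(store, folder):
        """One backup 'commit': a tree mapping paths to blob hashes. Unchanged
        files hash to the same keys, so their blobs aren't uploaded again."""
        tree = {}
        for root, _, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    tree[path] = store.put(f.read())
        return store.put(json.dumps(tree, sort_keys=True).encode("utf-8"))

Each new snapshot shares blobs with the previous ones, so the newest
version's blobs (the part you'd keep in S3) cover most of the stored data.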

------
buro9
This is a really good offering for media that you typically will keep locally
for instant access, yet you want to have an off-site backup in a way that
lives for a very long time.

Dropbox should work here, but it's simply too expensive. My photo library is
175GB. That isn't excessive considering I store the digital negatives and
that this represents over a decade of photos.

I don't mind not being able to access it for a few hours, I'm thinking
disaster recovery of highly sentimental digital memories here.

If my flat burns down destroying my local copy, and my personal off-site
backup (an HDD at an old friend's house) is also destroyed... then nothing
would be lost if Amazon have a copy.

In fact, I very much doubt that anyone I know who isn't in tech keeps all of
their data backed up even to that extent.

I find myself already wondering: my 12TB NAS, of which 4TB is used... could I
back up all of it remotely? It's crazy that this approaches being feasible.
It's under 30GBP per month for Ireland storage for all of the data on my
NAS.

To be able to say, "All of my photos are safe for years and it's easy to
append to the backup." That would be something.

A service offering a simple consumer interface for this could really do well.

~~~
DeepDuh
_sigh_ Dropbox should _not_ be used as a backup system. A system that
synchronizes live should _never_ get that role, unless it can guarantee that
old data is never overwritten and new data is only ever appended. This is
_not_ the case with Dropbox - I've experienced multiple scary occurrences of
old versions vanishing into nirvana after certain user actions. In some cases
the old data just appears to be gone; in others the web interface shows it,
but a restore results in an error message. Moving data in particular appears
to be buggy. Dropbox _appears_ to be simple, but the backend processes really
are not, and there is too much going on for it to be a reliable backup
system, especially if you also share some of your data with teammates.

edit: emphasis formatting, niceifying.

~~~
g-garron
I use a combination of Dropbox and S3.

I have my MacBook and a Linux server linked to my Dropbox account. So changes
in my documents are synced to my Linux.

My Linux box runs three cron jobs: one daily, one weekly, and one monthly.
The command is:

s3cmd sync --delete-removed ~/Dropbox/documents/ s3://backup-daily/

There are buckets for weekly and monthly too.

Note: the command is not exactly like that, check the man page.

That way I have all my documents backed up very cheaply.
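
For the curious, the crontab could look something like this (schedule times
made up, and as noted above the exact s3cmd flags may differ):

    # m h dom mon dow  command
    30 2  *   *   *    s3cmd sync --delete-removed ~/Dropbox/documents/ s3://backup-daily/
    0  3  *   *   0    s3cmd sync --delete-removed ~/Dropbox/documents/ s3://backup-weekly/
    0  4  1   *   *    s3cmd sync --delete-removed ~/Dropbox/documents/ s3://backup-monthly/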

~~~
ghshephard
One thing to be a little careful of - you are sending all of your data to S3,
which is, of course, the backing store for Dropbox.

~~~
sintaks
Dropbox stores their stuff in S3 a little differently. It's not a 1:1
correspondence between user files and objects under Dropbox's S3 account. The
fact that they use S3 as their backing store means very little. It certainly
_sounds_ good to have S3 in back when you talk about scalability and
durability, but 1) they could just as easily use something else, and 2)
depending on their sharding strategy, a single lost object could impact
multiple files at the user level.

~~~
ghshephard
Right - the point I was trying to make is that he was putting all his eggs in
one basket. If anything catastrophic happened to S3, he might lose his S3 _as
well as_ his Dropbox backups.

If you are going to the effort of having dual-backup systems, may as well try
and find something that can't be impacted by a single disaster.

~~~
sintaks
Ah, right - definitely.

------
martey
I think one of the most interesting parts of this is how they plan to ensure
that people do not use it for transient backup:
[https://aws.amazon.com/glacier/faqs/#How_am_I_charged_for_de...](https://aws.amazon.com/glacier/faqs/#How_am_I_charged_for_deleting_data_that_is_less_than_3_months_old)

 _Deleting data from Amazon Glacier is free if the archive being deleted has
been stored for three months or longer. If an archive is deleted within three
months of being uploaded, you will be charged an early deletion fee. In the US
East (Northern Virginia) Region, you would be charged a prorated early
deletion fee of $0.03 per gigabyte deleted within three months._

~~~
cayblood
So I guess that means it would still work well for a scheme like the one Time
Machine uses, where incremental changes are added but deletions are simply
noted. At least I think that's how it works.

~~~
rektide
And yet they support a max of 100 vaults per account, so some rolling
recompaction of incrementals is still necessary.

------
phil
Storage experts: I'd love to know more about what might be backing this
service.

What kind of system has Amazon most likely built that takes 3-4 hours to
perform retrieval? What are some examples of similar systems, and where are
they installed?

~~~
tezza
Typically they are tiered.

There'll be a near-line HDD array. This is for recent content and content
they profile as being commonly accessed.

Then there'll be a robotic tape library. Any restore request will go into a
queue, and when an arm and tape drive become free, they'll seek to the data
and read it into the HDD array.

Waiting for a slot with the robot arm and tape drive is what will take 4
hours.

EMC (kinda), Fujitsu etc. make these.

<http://en.wikipedia.org/wiki/Tape_library>

[http://www.theregister.co.uk/2012/06/26/emc_tape_sucks_no_mo...](http://www.theregister.co.uk/2012/06/26/emc_tape_sucks_no_more/)
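
A toy model of that queueing effect (numbers invented, purely to show that
waiting for a drive, not reading the tape, is what dominates):

    DRIVES = 4                 # tape drives shared by the whole library
    queued_jobs = 100          # restore requests ahead of yours
    minutes_per_job = 10       # mount + seek + read once a drive is free
    wait = queued_jobs / DRIVES * minutes_per_job
    print(f"~{wait / 60:.1f} hours queueing vs {minutes_per_job} minutes reading")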

~~~
nodata
Wouldn't there also need to be a lot of logic to prevent fragmentation? You'd
probably want data from one user near other data from that user, i.e. on the
same tape.

~~~
dagw
I'd guess that they ignore that problem and have baked the time it takes to
get data from several tapes into the 3-4 hour estimate.

If you think about it, writes are more common than reads on average, so it's
more efficient to just write to whatever tape is online and deal with the
fragmentation problem on the read end, as opposed to queueing writes until
the 'correct' tape can be brought online just to save some time reading.
Also, in backup situations like this, it's more important to get the backup
done in a timely manner.

~~~
jvdh
That's true, but they're obviously using a multi-tier solution, with about a
90 day buffer before things go to tape:

    In addition, there is a pro-rated charge of $0.033 per gigabyte for items deleted prior to 90 days.

So it may just be feasible to organise writes so they end up together. But,
yes, ultimately it is probably not worth it to do so.

------
zach
_Amazon Glacier is an extremely low-cost, pay-as-you-go storage service that
can cost as little as $0.01 per gigabyte per month._

What would be absolutely fascinating is a pay-before-you-go storage service —
data cryonics.

Paying $12 to store a gigabyte of data for 100 years seems like a pretty
intriguing deal as we emerge from an era of bit rot.

~~~
arethuza
"Paying $12 to store a gigabyte of data for 100 years"

I'm not sure what kind of organisation I'd actually trust to store data for
that length of time - a commercial organisation is probably going to be more
effective at providing service but what commercial organisation would you
trust to provide consistent service for 100 years? A Swiss bank perhaps?
Governments of stable countries are obviously capable of this (clearly they
store data for much longer times) but aren't set up to provide customer
service.

~~~
wheels
> _Governments of stable countries are obviously capable of this_

I don't consider that obvious. I live in Berlin, the capital of what most
would consider a stable country, but my apartment (which is even older) has
been part of 5 different countries in the last 100 years (German Empire,
Weimar Republic, Nazi Germany, East Germany and finally, the Federal Republic
of Germany).

~~~
arethuza
Sorry, what I meant by "stable" there is countries that have been relatively
stable for a few hundred years and seem reasonably likely to continue that
integrity for at least a century or so.

Of course, predicting future stability is complete guesswork!

~~~
abrahamsen
US. UK. Sweden. It is really hard to come up with countries where the
government has been stable "for a couple of centuries".

~~~
justincormack
Thailand.

~~~
jonknee
They have had a military coup within the last decade, not exactly a bastion of
stability.

~~~
dredmorbius
The Thai king is the longest-reigning current head of state, ascending the
throne on 9 June 1946. Elizabeth II of England is 2nd, 6 February 1952.

The oldest _country_ (not government) is likely Vietnam (2897 BCE). Other
contenders: Japan (660 BCE), China (221 BCE), Ethiopia (~800 BCE), or Iran
(678 BCE).

Few of today's modern states pre-date the 19th century; many date only from
World War II or the great decolonialisation of the 1960s, including much of
Africa and Oceania (some of the longest-inhabited regions of Earth).

Among the more long-lived institutions are the Catholic Church (traditionally
founded by Jesus ~30 AD, emerging as an institutional power in 2nd Century
Rome). The oldest company I can find is Kongo Gumi, founded in 578, a Japanese
construction firm. The record however is likely held by the Shishi Middle
School founded in China between 143 and 141 BCE.

My own suggestion would be the Krell, though some might disqualify this based
on a requirement for _human_ organization.

~~~
philwelch
Isn't Egypt another contender for the oldest country?

~~~
dredmorbius
That was my thought as well. However it spent a great deal of time under
foreign rule: under the Greeks and Romans, the Turkish / Ottoman empire, and
later under British occupation. And, I just discovered what boxer Muhammad
Ali's referent was.

<http://en.wikipedia.org/wiki/Egypt_history>

~~~
philwelch
So have Vietnam, China, and Iran.

~~~
dredmorbius
Japan's older than China. But yes. As I mentioned in my post above.

------
OoTheNigerian
What I find fascinating with Amazon's infrastructure push is the successful
'homonymization' of their brand name Amazon.

Amazon simultaneously stands for e-commerce and web infrastructure depending
on the context. e.g. "Hey, I want to host my server." "Why don't you try
Amazon?" "Do you know where I can get a fairly priced laptop?" "Check Amazon."

Is there any other brand that has done this successfully?

Edit: I should have specified internet brand.

~~~
vidarh
Virgin (200+ businesses operating, or having operated, under the Virgin
brand, ranging from infrastructure - trains, airlines - to records, banking,
and bridal salons under the name Virgin Bride...).

Mitsubishi and Samsung spring to mind as two of the best-known ones
internationally, with brands known in multiple markets, though many of their
businesses are less known outside Asia (e.g. Mitsubishi's bank is Japan's
largest). Any number of other Asian conglomerates qualify too.

ITT used to fall in that category back in the day: fridges, PCs, hotels,
insurance, schools, telecoms and lots more. I remember we at one point had
both an ITT PC and an ITT fridge. The name was well known in many of its
markets.

The large, sprawling, unfocused conglomerate has fallen a bit out of favor in
Europe and the US. ITT was often criticized for its lack of focus even back
in the '80s, and has since broken itself into more and more pieces and
renamed and/or sold off many of them (e.g. the hotel group is now owned by
Starwood).

~~~
tommoor
Like you say, many of the big Asian companies are prime examples - Yamaha
manufactures motorcycles and synthesizers, which seems like a good
combination!

~~~
mjb
Musical instruments (especially wind instruments) and motorcycles share a lot
of similar design principles. Ever compared a saxophone to a WWII-vintage
motorcycle engine?

------
kondro
I know a lot of people here seem to have jumped on the _backup_ uses of
Glacier and, whilst there is some potential for home users to make use of
this product for backup, that is not what Glacier is intended for.

Glacier is an _archive_ product. It's for data you don't really see yourself
ever needing to access again in the general course of business.

If you're a company and you have lots of invoice/purchase transactional
information that's 2+ years old that you never use for anything, but you still
have to keep it for 5 - 10 years for compliance reasons, Glacier is the
perfect product for you.

Even its pricing is designed around the assumption that the average use case
only accesses small portions of the total archive store (5% per month
prorated free, per the pricing page).

~~~
ghshephard
Many users, though, will never use the restore capability. And for those who
do, with Backblaze they'll usually get a hard drive via FedEx - so recovery
time is measured in about a day or so. I wouldn't downplay the consumer
backup/restore angle so quickly - for many (most?) consumers, the ability to
restore rapidly is balanced against their desire for low monthly payments. I
think we're going to see a lot of consumer backup applications built on top
of Glacier in the next several months that will compete with (the already
excellent) Backblaze and friends. (Note - Backblaze has excellent real-time
restore, with date versioning, for those of us who use it as an online data
recovery tool as well.)

------
terhechte
This is fantastic. I've long searched for a solution like this. It's really
suitable for a remote backup that only needs to be accessed if something
really bad happens (e.g. a fire breaking out). I'm a lone entrepreneur, so I
do have backup hard disks here, but being able to additionally save this
data in the cloud is great.

I often create pretty big media assets, so Dropbox doesn't necessarily offer
enough space, or is - for me - too expensive in the 500GB version (i.e. $50 a
month).

Glacier would be $10 a month for 1 terabyte. Fantastic.

~~~
yungchin
The only issue I see is that verifying archive integrity (you don't want to
find out the archive was bad after you lost the local backup...) would be
somewhat complicated, given their retrieval policies. Also, the billing for
data-transfer out plus peak retrievals sounds so convoluted, I can't begin to
work out what a regular test-restore procedure would cost me. Nevertheless,
it's some exciting progress in remote storage!

~~~
pgeorgi
They could provide salted hash verification: you send some salt, and some
hours later you get back, via email, a list of files with SHA1(salt |
filedata) (so they can do the verification as a low-priority job).

The salt is used to prevent Amazon from just keeping the hashes around and
reporting that all is well.

To avoid abuse, restrict the number of free verification requests per month.
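
The client side of that check might look like this (a sketch: SHA1 over
salt || filedata, streamed so large archives don't need to fit in memory):

    import hashlib

    def salted_hash(path, salt):
        """SHA1(salt | filedata). A fresh salt each request forces the storage
        side to re-read the bytes rather than answer from cached hashes."""
        h = hashlib.sha1(salt)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Compare salted_hash(local_copy, salt) against the list the service
    # would (hypothetically) email back hours later for the same salt.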

~~~
toomuchtodo
Or just build the verification into the storage system, and send an SNS
message if data loss has occurred (just like what happens when a Reduced
Redundancy object is lost in S3).

------
Keyframe
I'm not sure the cost is right. Each project I work on is approx. 50-60 TB in
size (video). The most recent one got backed up on 20 LTO-5 tapes, times
three. That's $600 for tapes per project. Each tape set went to a separate
location - two secure ones for about $20/year, and one at the studio archive
for immediate access if needed. I find this method extremely reliable; it
cost ~$700 initially to back everything up, with virtually nonexistent
ongoing fees. With Glacier it would cost $600 per month.

~~~
ghshephard
You have a pretty niche (but interesting!) use case. An LTO-5 tape can store
1.5 terabytes raw [1] - call it 2 terabytes with a bit of compression (your
video probably doesn't losslessly compress at 2:1). 60 terabytes requires 30
tapes - around $15/month to store at Iron Mountain [2]. The Glacier charge
for 60 terabytes is $600/month vs $15/month for tape storage.

Also - upload/recovery times are problematic when you are talking tens of
terabytes. Right now the equation is in favor of archiving tapes at that
level (even presuming you store multiple copies for redundancy/safety).

Glacier is for people wanting to archive in the sub-ten-terabyte range -
they can avoid the hassle/cost of purchasing tape drives, tapes, and
software - and just have their online archive.

The needle will move - in 10 years Glacier might make sense for people
wanting to store sub-100 terabytes, and tapes will be for the multi-petabyte
people.

[1] <http://en.wikipedia.org/wiki/Linear_Tape-Open>

[2]
[http://www.ironmountain.com/Solutions/~/media/9F17511FA1A741...](http://www.ironmountain.com/Solutions/~/media/9F17511FA1A74131A0ECF1C132C642E0.pdf)

------
omh
Interestingly they penalise you for short-term storage:

 _Amazon Glacier is designed for use cases where data is retained for months,
years, or decades. Deleting data from Amazon Glacier is free if the archive
being deleted has been stored for three months or longer. If an archive is
deleted within three months of being uploaded, you will be charged an early
deletion fee. In the US East (Northern Virginia) Region, you would be charged
a prorated early deletion fee of $0.03 per gigabyte deleted within three
months._

~~~
MattSayar
After reading tezza's explanation [1] of how they're probably using tape
storage, this makes sense; Amazon wants the mechanical robot arm to spend the
majority of its time writing to the tapes. If you're constantly tying it up
with writes/deletes, you're taking time away from its primary mission: to
archive your data. Charging you for early deletes discourages that practice.

[1] <http://news.ycombinator.com/item?id=4411697>

~~~
ratzkewatzke
They're not using tape storage. The ZDNet story confirms this:
[http://www.zdnet.com/amazon-launches-glacier-cloud-
storage-h...](http://www.zdnet.com/amazon-launches-glacier-cloud-storage-
hopes-enterprise-will-go-cold-on-tape-use-7000002926).

There are any number of reasons why deletes would be discouraged. One is
packing: if your objects are "tarred" together in a compiled object,
discouraging early deletes makes it more cost-effective to optimistically pack
early.

------
nemesisj
I had a quick skim through the marketing stuff and the FAQs and didn't see
anywhere that actually details what the backend of this is. I'd be curious if
they're actually using tape, older machines, Backblaze pods, etc. I guess if
it's the latter, the time to recover could be an artificial barrier to prevent
people from getting cute.

~~~
flyt
It appears to use S3 as its basic backend. My guess is that S3 has been
modified to have "zones" of data storage that can be allocated for Glacier.
Once these zones have been filled with data (and of course that data is
replicated to another region) the hard drives are spun down and essentially
turned off.

This is why the cost of retrieval is so high: every time they need to pull
data the drives need to be spun back up (including drives holding data for
people other than you), accessed, pulled from, then spun back down and put to
sleep. Doing this frequently will put more wear and tear on the components and
cost Amazon money in power utilization.

As is, Glacier should be extremely cheap for AWS to operate, regardless of
the total amount of data stored in it. Beyond the initial cost of purchasing,
installing, and configuring the hard drives, the usual ongoing maintenance
and power requirements mostly go away.

~~~
sintaks
Almost spot on. See: <http://news.ycombinator.com/item?id=4416065>

------
tc
Beware that retrieval fee!

The retrieval fee for 3TB could be _as high as_ $22,082 based on my reading of
their FAQ [1].

It's not clear to me how they calculate the hourly retrieval rate. Is it based
on how fast you download the data once it's available, how much data you
request divided by how long it takes them to retrieve it (3.5-4.5 hours), or
the size of the archives you request for retrieval in a given hour?

This last case seems most plausible to me [6] -- that the retrieval rate is
based solely on the rate of your requests.

In that case, the math would work as follows:

After uploading 3TB (3 * 2^40 bytes) as a single archive, your retrieval
allowance would be 153.6 GB/mo (3TB * 5%), or 5.12 GB/day (3TB * 5% / 30).
Assuming this one retrieval was the only retrieval of the day, and as it's a
single archive you can't break it into smaller pieces, your billable peak
hourly retrieval would be 3072 GB - 5.12 GB = 3066.88 GB.

Thus your retrieval fee would be 3066.88 * 720 * .01 = $22,081.54 (719x your
monthly storage fee).

That would be a wake-up call for someone just doing some testing.
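
To make that arithmetic easy to check, here it is as a few lines of Python
(a sketch assuming reading [6]; this is not Amazon's billing logic):

    stored_gb = 3 * 1024                    # 3TB archive, in GB
    daily_free = stored_gb * 0.05 / 30      # 5.12 GB/day prorated allowance
    billable_peak = stored_gb - daily_free  # 3066.88 GB billed in the peak hour
    fee = billable_peak * 720 * 0.01        # peak rate * 720 hours * $0.01
    print(round(fee, 2))                    # -> 22081.54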

\--

[1]
[http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_wh...](http://aws.amazon.com/glacier/faqs/#How_will_I_be_charged_when_retrieving_large_amounts_of_data_from_Amazon_Glacier)

[2] After paying that fee, you might be reminded of S4:
<http://www.supersimplestorageservice.com/>

[3] How do you think this interacts with AWS Export? It seems that AWS Export
would maximize your financial pain by making retrieval requests at an
extraordinarily fast rate.

[(edit) 4] Once you make a retrieval request the data is only available for 24
hours. So even in the best case, that they charge you based on how long it
takes you to download it (and you're careful to throttle accurately), the
charge would be $920 ($0.2995/GB) -- that's the lower bound here. Which is
better, of course, but I wouldn't rely on it until they clarify how they
calculate. My calculations above represent an upper bound ("as high as"). Also
note that they charge separately for bandwidth out of AWS ($368.52 in this
case).

[(edit) 5] Answering an objection below, I looked at the docs and it doesn't
appear that you can make a ranged retrieval request. It appears you have to
grab an entire archive at once. You can make a ranged GET request, but that
only helps if they charge based on the download rate and not based on the
request rate.

[(edit) 6] I think charging this way is more plausible because they incur
their cost during the retrieval regardless of whether or how fast you download
the result during the 24 hour period it's available to you (retrieval is the
dominant expense, not internal network bandwidth). As for the other
alternative, charging based on how long it takes them to retrieve it would
seem odd as you have no control over that.

~~~
sintaks
Former S3 employee here. I was on my way out of the company just after the
storage engineering work was completed, before they had finalized the API
design and pricing structure, so my POV may be slightly out of date, but I
will say this: they're out to replace tape. No more custom build-outs with
temperature-controlled rooms of tapes and robots and costly tech support.

If you're not an Iron Mountain customer, this product probably isn't for you.
It wasn't built to back up your family photos and music collection.

Regarding other questions about transfer rates - using something like AWS
Import/Export will have a limited impact. While the link between your device
and the service will be much fatter, the reason Glacier is so cheap is because
of the custom hardware. They've optimized for low-power, low-speed, which will
lead to increased cost savings due to both energy savings and increased drive
life. I'm not sure how much detail I can go into, but I will say that they've
contracted a major hardware manufacturer to create custom low-RPM (and
therefore low-power) hard drives that can programmatically be spun down. These
custom HDs are put in custom racks with custom logic boards all designed to be
very low-power. The upper limit of how much I/O they can perform is
surprisingly low - only so many drives can be spun up to full speed on a given
rack. I'm not sure how they stripe their data, so the perceived throughput may
be higher based on parallel retrievals across racks, but if they're using the
same erasure coding strategy that S3 uses, and writing those fragments
sequentially, it doesn't matter - you'll still have to wait for the last
usable fragment to be read.

I think this will be a definite game-changer for enterprise customers.
Hopefully the rest of us will benefit indirectly - as large S3 customers move
archival data to Glacier, S3 costs could go down.

~~~
ghshephard
The math doesn't come close to replacing tape - basically, once you go north
of 100 terabytes (just two containers - at my prior company we had 140
containers in rotation with Iron Mountain), Glacier doesn't make financial or
logistical sense. Far cheaper and faster to send your LTO-5 tapes via
courier.

~~~
sintaks
It may not make sense today. Amazon is notorious for betting on the far
future. They're also raising the bar on what archival data storage services
could offer. When you ship your bits to Amazon, they're in 3+ DCs, and
available programmatically.

Separate from the play for replacing tape, there's also the ecosystem
strategy. When you run large portions of your business using Amazon's
services, you tend to generate a lot of data that ends up needing to be
purged, else your storage bill goes through the roof. S3's Lifecycle Policy
feature is a hint at the direction they want you to go - keep your data, just
put it somewhere cheaper.

This could also be the case where they think they're going after tape, but end
up filling some other, unforeseen need. S3 itself was originally designed as
an internal service for saving and retrieving software configuration files.
They thought it would be a wonder if they managed to store more than a few GB
of data. Now look at it. They're handling 500k+ requests per second, and you
can, at your leisure, upload a 5 TB object, no prob.

But maybe you're right. The thing could fail. Too expensive. After all, 512k
ought to be enough for anybody.

~~~
ghshephard
Thanks very much for the insight - what you are saying actually makes a lot
of sense in the context of systems inside the AWS ecosystem. After all, they
need to archive data as well. Also - my 140-container example with Iron
Mountain was pre-versioning and pre-always-online differential backups. We
basically had a complex Tower of Hanoi rotation that let us recover data from
a week, a month, six months, and then every year (going back seven years)
from all of our servers. (And, by year seven, when we started rotating some
of the old tapes back in, they were a generation older than any of our
existing tape drives. :-)

Clearly, with on-line differential backups - you might be able to do things
more intelligently.

I'm already looking forward to using Glacier, but, for the foreseeable
future, it looks like the "high end" of archiving will be owned by tape. And,
just as Glacier will (eventually) make sense for >100 terabyte archives, I
suspect tape density will increase, and then "high end" archiving will be
measured in petabytes.

Thanks again.

------
bambax
This is the best thing ever! I've been dreaming about such a service for a
very long time -- posted about how I needed this here 6 months ago:

<http://news.ycombinator.com/item?id=3560952>

Rotating hard drives on the NAS in my attic is going to get a LOT simpler...

------
ibotty
that's certainly interesting. as there will be migration from s3 to glacier,
it would be nice if tarsnap had an option to store only the (say) last week in
s3 (with .3$/gb/month) and the rest in glacier (with, say, .03$/gb/month).

that would certainly be very nice. cperciva, what do you think?

~~~
cperciva
I can't see any way for Tarsnap to use this right now. When you create a new
archive, you're only uploading _new_ blocks of data; the server has no way of
knowing which _old_ blocks of data are being re-used. As a result, storing any
significant portion of a user's data in Amazon Glacier would mean that all
archive extracts would need to go out to Glacier for data...

Also, with Tarsnap's average block size (~ 64 kB uncompressed, typically ~ 32
kB compressed) the 50 microdollar cost per Glacier RETRIEVAL request means
that I'd need to bump the pricing for tarsnap downloads up to about $1.75 / GB
just to cover the AWS costs.
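
(Rough arithmetic behind that figure, as a sketch; the gap between ~$1.64
and $1.75 would be covered by the other per-GB costs:)

    blocks_per_gb = (1 << 30) // (32 * 1024)  # ~32768 blocks at ~32 kB compressed
    request_cost = 50e-6                      # 50 microdollars per retrieval request
    print(blocks_per_gb * request_cost)       # -> 1.6384 dollars per GB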

I may find a use for Glacier at some point, but it's not something Tarsnap is
going to be using in the near future.

~~~
PanMan
While I have no idea how you would fit it into your current infrastructure, I
certainly see a (BIG) use case for: I have this 100 GB, store it somewhere
safe (in Glacier), I won't need it for the next year (unless my house burns
down). I agree that is a bit different from ongoing daily backups with
changes, but it's also not THAT different from a customer perspective. That
it doesn't fit with how you store blocks on the backend won't matter to a lot
of customers.

~~~
cperciva
Oh, I absolutely agree that Glacier has lots of great use cases. I wish
Tarsnap was able to make good use of it.

------
gvalkov
Here's to hoping that _duplicity_ and _git-annex_ could somehow make use of
this service. I'm far more optimistic about _duplicity_ support though, as
incremental archives seem to fit the glacier storage model much better. A
_git-annex_ special remote [1] might turn out to be much more challenging, if
at all possible.

[1] <http://git-annex.branchable.com/special_remotes/>

~~~
squidsoup
Once there's support for Glacier in boto (<https://github.com/boto/boto>) I
would imagine a duplicity backend would be easy to implement.

------
Gussy
From the retrieval times they are giving, it seems plausible that they could
be booting the servers only 5 or 6 times a day, to run the upload and
retrieval jobs stored in a buffer system of sorts. Having the servers turned
off for the majority of the time would save an immense amount of power,
although I wonder about the wear on drives spinning up and down compared to
being always on.

Any other theories on how this works on the backend while still being
profitable?

~~~
Baughnie
Tape drives. Lots and lots of tape drives.

------
jhack
This sounds really appealing as a NAS backup solution, but I'm a bit
concerned about security and privacy. Let's say I want to back up and upload
my CDs and movies; would Amazon be monitoring what I upload and assume I'm
doing something illegal?

~~~
avar
Just use GPG to encrypt your files before you upload them.

------
kdsudac
Wow, an order of magnitude cheaper than S3 (1 cent per gigabyte per month vs.
12-14 cents for S3). Data transfer is priced the same.

Drawback is "jobs typically complete in 3.5 to 4.5 hours."

Seeing as how people tend to be pack rats, I can see this being huge.

------
nicolas314
One recurrent issue with Amazon services is that they charge in US$ and
currently do not accept euros. European banks charge an arm and a leg for
micropayment conversions: last time I got a bill from AWS for 0.02$ it ended
up costing me 20 euros or so. Pretty much kills the deal. One solution could
be to pre-pay 100$ on an account and let them debit as needed.

~~~
EwanToo
That's really odd; I've never heard of fees that high - in the UK I can use
my credit cards with Amazon and only pay a few pennies in transaction fees.
Have a look at the CaxtonFX Dollar Traveller: you need to load it with $200,
but then there are no transaction fees, I think, except a slightly worse
exchange rate.

~~~
nicolas314
Sure there are workarounds. But why should I have to jump through hoops when
my account at Amazon is already entrusted with a European credit card and
merrily charging euros for all other goods?

~~~
cbg0
This sounds like an issue with your bank trying to scrounge more money off
you. I have a European debit card and I get charged the equivalent in EUR,
nothing more.

------
regularfry
What does "99.999999999% durability" mean? Does it mean they allow themselves
to lose one byte per terabyte on average?

~~~
Swizec
I think it means there is a small chance that enough hard drives might fail
at the same time that there happen to be no backups of those drives.

They make so many backups so quickly that there is only a 0.00000000001% (I
didn't count the zeros) chance of this occurring.

~~~
gjm11
Which of course means that (if they're telling the truth) the probability of
losing your data mostly comes from really big events: collapse of
civilization, global thermonuclear war, Amazon being bought by some entity
that just wants to melt its servers down for scrap, etc. (Whose probability is
clearly a lot more than 10^-11 per year; the big bang was only on the order of
10^10 years ago.)

~~~
Jabbles
_per object_. So although the chance of losing any particular object is tiny,
the chance of you losing _something_ is proportional† to the number of
objects. Still extremely small.

†roughly proportional if you have << 1e11 objects

~~~
gjm11
Yes. Though I bet the real lossage probabilities are dominated by failure
events that take out a substantial fraction of all the objects there are, and
that happen a lot more often than once per 10^11 years.

~~~
metalruler
Agreed. More likely a catastrophic and significant loss for a small number of
customers rather than a fraction of a percentage of loss for a large number.

Similar deal for hard drive bit error rates, where the quoted average BER may
not necessarily accurately represent what can happen in the real world. For
example, an unrecoverable read error loses 4096 bits (512 byte sectors) or
32768 bits (4k sectors) all at once, rather than individual bits randomly
flipped over a long period.

------
nl
Interesting to see that in some cases it probably makes sense to just stop
paying the bills, rather than pay the early deletion fee[1].

[1][https://aws.amazon.com/glacier/faqs/#How_am_I_charged_for_de...](https://aws.amazon.com/glacier/faqs/#How_am_I_charged_for_deleting_data_that_is_less_than_3_months_old)

------
rglover
I'm currently using an app called Arq that backs everything up to S3. If I
had to guess, I'd say there's about 50-60 gigs or more on there. Last month's
bill was something like .60 cents. How does Glacier compare to this setup
(the app does something similar with the archive concept)?

~~~
dont_believe_u
Would love to know where you're getting that $0.60 number (I assume you meant
60 cents and not 0.60 cents). Even with the first-year free tier, it costs
$7.00/mo ($6.00/mo on RRS) to store 60 GB of data on S3.

------
moontear
CrashPlan is still cheaper for storage larger than roughly 300GB.

CrashPlan+ Unlimited is USD 2.92/month if you take the 4-year package,
whereas if I upload 300GB to Amazon I pay 0.01 * 300 = USD 3/month. Amazon
would be even more expensive for larger amounts of data.

Is there some fine print I'm missing with CrashPlan Unlimited?

~~~
josephagoss
Also remember its free to recover your entire Crashplan archive (I have 500GB
with them). If you wanted to recover 500GB with Glacier it would cost $200
@10MB/s (according to someones calculation further down) You have to pay for
retrieval

~~~
moontear
Not totally true; you have to pay for retrieval beyond 1GB per month:
<http://aws.amazon.com/glacier/#pricing> as well as 5 cents per 1,000
upload/retrieval requests.

------
mvanveen
I just started a project where I'm keeping a raspberry pi in my backpack and
am archiving a constant stream of jpgs to the cloud. I've been looking at all
the available cloud archival options over the last few days and have been
horrified at the pricing models. This is a blessing!

~~~
lucaspiller
Great idea, have you got any more details?

------
urza
I am a bit confused by the pricing of retrieval.

Could some good soul tell me how much it would cost to:

Store 150 GB as one big file for 5 years, adding 10 GB (also as one file)
every year. And let's say I will need to retrieve the whole archive (original
file plus additions) at the end of years 2 and 5.

How much will it cost?

------
monkeypizza
They should provide a "time capsule" option - pay X dollars, and after a set
number of years, your data archive will be opened to the public for a given
amount of time.

There'd be no better way to ensure that information would eventually be made
public.

~~~
spindritf
You can build it on top of Glacier.

------
benguild
Great. Now if someone would simply create a badass client for this for Mac
(like Backblaze or Mozy) … we'd be in business. :)

~~~
sa1f
Arq's support is enough. <http://www.haystacksoftware.com/arq/>

~~~
benguild
It already supports it?

~~~
jordibunster
Kinda: "In the coming months, Amazon S3 will introduce an option that will
allow customers to seamlessly move data between Amazon S3 and Amazon Glacier
based on data lifecycle policies."

~~~
ojilles
URL? Can't find this on the website, blog or twitter account.

~~~
icebraining
It's in the Glacier FAQ:
[http://aws.amazon.com/glacier/faqs/#How_should_I_choose_betw...](http://aws.amazon.com/glacier/faqs/#How_should_I_choose_between_Amazon_Glacier_and_Amazon_S3)

------
ghshephard
What's particularly awesome is that this likely represents an upper bound on
cost. It will only go down as time goes on.

------
jeffchuber
Try this link

[http://aws.amazon.com/glacier/?utm_source=AWS&utm_medium...](http://aws.amazon.com/glacier/?utm_source=AWS&utm_medium=website&utm_campaign=BA_glacier_launch)

------
sharingancoder
Looks like the perfect solution to backup all my photos. Considering 100 GB of
photos, that's still just 100 * 0.01 * 12 = $12 a year! I'm sold on this!

------
josteink
This looks like exactly what I need. I'm currently using S3 for backup and
archiving, by regularly running s3cmd to sync the data on my NAS.

And while not super-duper expensive, S3 provides much more than I really
need, and hence a more limited (but cheaper) service would definitely be
appreciated.

If there is anything with the ease of use of s3cmd to accompany this service,
I will be switching in a heartbeat.

------
mp99e99
I'm with Atlantic.net cloud [AWS competitor, full disclosure]; the price
point for storage is great, but retrieval seems expensive -- perhaps
retrieval is rare enough that it's offset by the savings on the storage. I
know you can mail in drives for storage; can you have them mail you drives
for retrieval? (for Glacier specifically)

Also, prior comments mentioned they were using some sort of robotic tape
devices, but according to this blog:

[http://www.zdnet.com/amazon-launches-glacier-cloud-
storage-h...](http://www.zdnet.com/amazon-launches-glacier-cloud-storage-
hopes-enterprise-will-go-cold-on-tape-use-7000002926/)

It's using "commodity hardware components". So that's why I thought maybe
they are loss-leadering on the storage and making it up on the retrieval
prices.

It's definitely an interesting product, and I love that there's a reason they
called it Glacier. AMZN is a wild boar going after everyone!

~~~
sintaks
I don't think they're loss-leadering on storage, but if they are, they don't
think they will be for long. AWS (EC2 and S3 in particular) does very well
when it comes to profit margins. I suspect they'd like to keep it that way,
and that whatever they're charging gives them some slice of profit, however
small.

------
sharth
Since this is built on top of S3, I'd love to be able to store EBS Snapshots
in this.

~~~
kondro
And then wait 4+ hours to restore them?

~~~
acdha
That'd make a ton of sense for forensic and compliance needs: lots of storage,
limited access in special situations where a little delay is reasonable.

------
akh
We've just added support for Amazon Glacier to <http://www.PlanForCloud.com>
so you can quickly forecast your costs and compare them with other options.

~~~
akh
We just ran a quick cost forecast and it's interesting: If you start with
100GB then add 10GB/month, it would cost $102.60 after 3 years on AWS Glacier
vs $1,282.50 on AWS S3!

------
Swizec
This is amazing! Just ~3 weeks ago I finally broke down and started using S3
to store my digital photos and other such crap.

Sure hope transferring 10 or 20 gigabytes of data from S3 to Glacier is easy.

------
gadders
Anyone know what the TOS are for this? I couldn't find them on a scan of the
announcement.

A lot of the consumer-level services refuse any liability for any data loss.
Does Amazon do the same for this?

~~~
ghshephard
Presumably the standard S3 SLA will apply: <http://aws.amazon.com/s3-sla/>

Realistically, you'd want to have at least two diverse cloud backup systems -
I doubt you'd be happy with Service Credits if your data went missing.

~~~
gadders
It always seems funny, though, that these companies say:

"We will keep your data safe! *

* T&Cs apply; if we lose it, you're on your own."

I shouldn't imagine bank vaults deal the same way with physical property. Can
you get insurance for digital assets the same way you can for physical ones?

------
jalada
Can anyone make sense of the retrieval fees? Seems like the most confusing
thing ever. If I'm storing 4TB and one day I want to restore all 4TB, how much
is it going to cost me?

~~~
kondro
1 x retrieval request per archive (it's really designed to store a small
number of large files/tars) plus $0.12/GB... therefore, 4TB = $480.

~~~
jalada
No, that's not right. That's only the data transfer fee.

If you check the FAQ, they bill you based on your peak hourly retrieval rate.

If I download 4TB at, say, 10MB/s, not only do I need to pay $480, but I also
have to pay ~$257 as a retrieval fee.

Their wording is confusing, but ignoring the free retrieval amount (a
negligible difference on a 4TB transfer):

Fee = peak hourly retrieval (GB) * number of hours in the month * $0.01

[https://aws.amazon.com/glacier/faqs/#How_much_data_can_I_ret...](https://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retrieve_for_free)

Admittedly, ~$737 isn't the end of the world if your house has burned down
and you need all your data back, but it's still important to know the
details.

I think in that situation it would be cheaper to use their bulk
import/export, which would be roughly $300 for 4TB.
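
Working that through (a sketch, assuming the peak-hour reading of the FAQ
and an hourly prorated free allowance):

    stored_gb = 4000                       # 4TB, matching the $480 figure above
    rate_gb_per_hour = 36                  # ~10MB/s sustained download
    free_gb_per_hour = stored_gb * 0.05 / 30 / 24  # prorated free allowance
    retrieval_fee = (rate_gb_per_hour - free_gb_per_hour) * 720 * 0.01
    transfer_fee = stored_gb * 0.12        # data transfer out, roughly
    print(round(retrieval_fee), round(transfer_fee))  # -> 257 480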

~~~
Dylan16807
This is so confusing. So apparently if you spend the entire month retrieving
the data at 1.6MB/s it only costs $40 plus transfer fees? And more
importantly, _how_ do you throttle your retrieval?

Edit: So I'm working through a scenario in my head and trying to figure out
how charging based on the peak _hour_ isn't completely ridiculous.

I have 8GB stored to try out the system. This costs a whopping dollar per
year. One day I decide to test out the restore feature. So I go tell Amazon to
get my files and wait a few hours. When Amazon is ready, I hit download. I'm
on a relatively fast cable connection so the download finishes in an hour. I
look at the data transfer prices and expect to be charged one dollar.

But I didn't take into account this 'peak hour' method. I just used roughly
8GB/hour over the minimal free retrieval. This gets multiplied out times 24
hours and 30 days to cost 8 * 720 * $0.01 = $57. _Fifty-seven times my annual
budget_ because I downloaded my data too quickly after waiting hours for
Amazon to get ready.

~~~
freehunter
If you're in a corporate environment, you'd likely have a network admin group
that could throttle your connection to them. Or you'd be using a
custom-designed front-end that would throttle it. Or, if you're SOHO, you
could set up QoS rules on your router to throttle it.

Realistically though, this service might not be for you if fast and cheap
retrieval of your data is important. The point here is cheap storage, not
transfer. They can reasonably expect that you'd only retrieve this data once
or twice, if ever, and cost won't be a deterrent. Say, if your data center
burns down and your company is moving to a new office.

~~~
Dylan16807
Oh I totally understand that people are going to be willing to pay more after
a disaster to get files back. But realistically, if there is a large enough
volume of files to bother amazon then they're going to need day(s) to
download. If they rate-limited by day then the price would only reach a couple
years of storage. The hourly thing is only going to bite minor retrieval
events, and it is going to bite them amazingly hard.

~~~
freehunter
My department (information security) was actually just discussing this service
this morning at our morning meeting. We've been looking into backup services
for our security monitoring appliance beyond our datacenter and DR site. These
backups would need to persist for a year according to PCI/SOX compliance, and
if we needed to show data to an auditor, we wouldn't need the entire log. In
fact, we'd likely not even need 5% of it. Most likely, we'd only need to pull
a day (maybe two) from the logs.

I can imagine our media services group talking about the same thing, how to
keep master files of their product pictures/videos where they'd only need to
grab a file or two here or there (if at all).

Amazon seems to be pushing this as a file dump where retrieval of the files is
exceedingly rare. They don't want you to use it as a hard drive, they want you
to use it as a magnetic tape drive.

~~~
Dylan16807
It's not that I want to use this as hot storage, it's that I would like to be
able to test my backups once or twice a year.

------
klodolph
I get a 404, and when I go to <http://aws.amazon.com/> I see nothing about
glacier in the news feed.

~~~
sqnguyen
This link should work better: <http://aws.amazon.com/glacier/>

~~~
klodolph
Hm, that link was 404 as well until just a moment ago. Must have taken time to
propagate to the edge servers or whatever is going on there.

------
catastrophe
Any easy to use Windows clients for Glacier (for backing up an OS and/or other
files)?

If not, to any developers reading this: there's money in them thar Glacier.

------
kloc
Amazon should complement this service with data drop-off centers connected to
their data center network. People could then go to these centers in person
and hand over hard drives full of data for backup. It would be like bank
safe-deposit boxes, but digital. At this low price, people will want to
upload terabytes of data, which will be a pain to upload/download.

~~~
digeridoo
You mean like <http://aws.amazon.com/importexport/> ?

~~~
kloc
Was not aware of this service. Thanks!

------
mslot
I think this is more or less the formula for calculating monthly costs
(corrections welcome):

0.01*S + max(0, 7.20 * (R - 0.0017*S) / 4)

S is the number of GB stored

R is the biggest retrieval in the month, in GB

4 is the average number of hours a retrieval takes

For an example with 10TB of storage (replace 10000 to change):
<http://fooplot.com/plot/4pu7u2gpox>

x is the biggest retrieval in GB, y is $/month

~~~
mslot
Correction: See my other post <http://news.ycombinator.com/item?id=4416684>

------
xedarius
Would be nice if I could just type my Amazon login details into TimeMachine
and magic off-site backups just happened.

------
Yrlec
This looks awesome! We are currently developing a P2P-based backup solution
(<http://degoo.com>) that uses S3 as a fall-back. This will allow us to be
much cheaper, and I am sure it will enable many other backup providers to
lower their prices too.

~~~
mik4el
The S3->Glacier lifecycle-feature must be a welcome thing for you I guess?

~~~
Yrlec
Definitely! Hopefully it can enable us to provide a good trade-off between
cost and restore time.

------
kristofferR
I hope they can get the access time down from 3-5 hours to about 1 hour -
that's the difference, for me, between it being a viable alternative for
storing backups of my clients' web sites or not.

I might create a script that uploads everything to Glacier and just keeps a
couple of the latest backups on S3, though.

~~~
bbgm
Per Werner's blog post [1] "in the coming months, Amazon S3 will introduce an
option that will allow customers to seamlessly move data between Amazon S3 and
Amazon Glacier based on data lifecycle policies."

1\. [http://www.allthingsdistributed.com/2012/08/amazon-
glacier.h...](http://www.allthingsdistributed.com/2012/08/amazon-glacier.html)

------
res0nat0r
Does anyone know if there is a CLI tool to interface with this yet? I see
SDKs mentioned on the product homepage, but I don't see any simple CLI tools
yet to upload/download data, query, etc.

<http://aws.amazon.com/developertools/>

------
WalterBright
I'm curious about data security. It says it is encrypted with AES. But is it
encrypted locally and the encrypted files are transferred? I.e. does Amazon
ever see the encryption keys?

Or is the only way to encrypt it yourself, and then transfer it?

~~~
zhoutong
AFAIK the keys are managed by Amazon, just like S3. It's more for compliance
reasons than real security. You still have to do the encryption yourself to
protect the data.

~~~
guan
It does protect you against situations where Amazon loses disks or tapes, or
disposes of them improperly, or they are stolen.

------
Ecio78
It will be interesting to see whether they upgrade AWS Storage Gateway to use
this kind of backend instead of S3: <http://aws.amazon.com/storagegateway/>

------
newhouseb
It's a bit strange how S3 has a 'US Standard' region option, while Glacier has
the usual set of regions (US East, US West, etc). I wonder if this means that
unlike S3, Glacier isn't replicated across regions?

~~~
miles932
US Standard isn't replicated across regions.

~~~
newhouseb
Hm, you could be right, AWS says:

> The US Standard Region automatically routes requests to facilities in
> Northern Virginia or the Pacific Northwest using network maps.

I guess this means that the data is automatically geographically sharded?

~~~
jordanthoms
Keep in mind that even if your data is in one AWS region, it'll still be
stored in multiple different datacenters some distance apart. Just not on the
other side of the US.

------
JakeSc
> Amazon Glacier is designed to provide average annual durability of
> 99.999999999% for an archive.

That's pretty impressive. I wonder how many bytes they'd lose to lose that
0.000000001% of data.

------
jebblue
I just saw this and thought wow finally cheap mass storage. After reading the
comments (and the Amazon Glacier web page) it's clear it's cheap archiving but
not cheap retrieval.

------
akurilin
Any desktop apps out there that will let you add folders for backup to Glacier
and have them be automatically synced up to the cloud as they change? That
would be quite useful.

~~~
d0ugal
<http://www.haystacksoftware.com/arq/>

I assume they will support Glacier soonish.

------
Sami_Lehtinen
Excellent! Is there any good open-source client / backup application for
this? I would start using it immediately. I'm currently using a ridiculously
expensive backup solution.

~~~
nodata
Duplicity supports S3, so I'd watch it for Glacier support:
<http://duplicity.nongnu.org/>

~~~
wladimir
I wonder if Glacier support in Duplicity will be possible without large
changes. AFAIK, duplicity also reads some state from the remote end to
determine what to back up (although it also keeps a local cache of this?).
To use Glacier, the protocol would have to be completely write-only.

~~~
takluyver
I'd guess it would use a hybrid approach, with recent backups on S3 (which
duplicity already does) being shifted to glacier after a period of time. The
FAQ indicates that Amazon plans to make this easy.

------
djbender
In case anyone is wondering, it appears you can only upload via their APIs
right now. I wonder if they intend to make it accessible through their web
interface at some point?

------
csears
I wonder if homeowner's insurance would cover the retrieval fee if your
computer/hard drives were protected assets on the policy.

------
jordanthoms
Wow, the cadence of releases from aws recently has been amazing. Can't wait to
see what else they have in store.

------
ck2
Now if Tim Kay just adds support for glacier and I can leave all the backup
scripts as is...

------
ukd1
11 9's. Impressive.

------
portentint
Worst. name. ever.

~~~
milesokeefe
How so? Doesn't it represent that your data is safe, as it is "frozen", and
slow to retrieve like a glacier?

~~~
talaketu
I like the name too. An accreting mountain of preserved data.

------
gregtour
Wake up.

