
I ended up paying $150 for a single 60GB download from Amazon Glacier - markonen
https://medium.com/@karppinen/how-i-ended-up-paying-150-for-a-single-60gb-download-from-amazon-glacier-6cb77b288c3e#.w582hjycu
======
dirktheman
I'm the first one to admit that Glacier pricing is neither clear nor
competitive regarding retrieval fees. I do think that a lot of people use it
the wrong way: as a cheap backup. I use:

1. My Time Machine backup (primary backup)

2. BackBlaze (secondary, offsite backup)

3. Amazon Glacier (tertiary, Amazon Ireland region)

I only store stuff that I can't afford to lose on Glacier: photos, family
videos and some important documents. Glacier isn't my backup, it's the backup
of my backup of my backup: it's my end-of-the-world-scenario backup. When my
physical hard drive fails AND my Backblaze account is compromised for some
reason, only then will I need to retrieve files from Glacier. I chose the
Ireland region so my most important files aren't even on the same physical
continent.

When things get so dire that I need to retrieve stuff from Glacier, I'd be
happy to pony up 150 dollars. For the rest of it, the 90 cents a month fee is
just cheap insurance.

~~~
StavrosK
Similarly, I have a NAS at my house pull stuff from all the other computers
and back it up to its drives (snapshotted with ZFS to avoid deleting files and
realizing it days later), and borg to upload to rsync.net. I paid $54 per year
for 150 GB, which is pretty cheap.

~~~
DavideNL
fyi, there's also Amazon Cloud Drive which includes unlimited storage for
$59.99/year.

~~~
StavrosK
That's a great price, but I don't think it works with borg, which is just
fantastic. Hands down, the best of the programs I tried (it's a fork of attic:
[http://www.stavros.io/posts/holy-grail-
backups/](http://www.stavros.io/posts/holy-grail-backups/))

------
res0nat0r
Glacier pricing has to be the most convoluted AWS pricing structure and can
really screw you.

Google Nearline is a much better option IMO. Retrieval takes seconds at the
same low price, and it's much easier to calculate your costs when looking into
large downloads.

[https://cloud.google.com/storage/docs/nearline?hl=en](https://cloud.google.com/storage/docs/nearline?hl=en)

~~~
_g9ex
There's also Backblaze B2 (public beta): [https://www.backblaze.com/b2/cloud-
storage.html](https://www.backblaze.com/b2/cloud-storage.html)

Their pricing is great (0.5¢/GB/month) but I'm a little worried about their
single DC.

~~~
andmarios
3 days ago they sent a newsletter about their new (alpha quality) b2sync tool,
which is essentially an “rsync to Backblaze” utility.

This makes their offer very interesting.

~~~
josephagoss
Can this tool offer deduplication (so that changing a folder name does not re-
upload thousands of files) or is that something I would have to code into my
own backup solution?

~~~
andmarios
No, it is even less than rsync. It can only upload a folder recursively and
skip files with the same modification time locally and remotely.

Considering that before this you couldn't even upload a directory, only
individual files, it's a huge step. :)

~~~
BorisMelnik
I currently need some help whipping up a solution for this.

~~~
manquer
git with lfs

------
markonen
OP here. Some updates and clarifications are in order!

First of all, I just woke up (it’s morning here in Helsinki) and found a nice
email from Amazon letting me know that they had refunded the retrieval cost to
my account. They also acknowledged the need to clarify the charges on their
product pages.

This obviously makes me happy, but I would caution against taking this as a
signal that Amazon will bail you out in case you mess up like I did. It
continues to be up to us to fully understand the products and associated
liabilities we sign up for.

I didn't request a refund because I frankly didn't think I had a case. The
only angle I considered pursuing was the boto bug. Even though it didn't
increase my bill, it stopped me from getting my files quickly. And getting
them quickly was what I was paying the huge premium for.

That said, here are some comments on specific issues raised in this thread:

- Using Arq or S3's lifecycle policies would have made a huge difference in
my retrieval experience. Unfortunately for me, those options didn't exist when
I first uploaded the archives, and switching to them would have involved the
same sort of retrieval process I described in the post.

- During my investigation and even my visits to the AWS console, I saw plenty
of tools and options for limiting retrieval rates and costs. The problem was
that since my mental model had the maximum cost at less than a dollar, I
didn't pay attention. I imagined that the tools were there for people with
terabytes or petabytes of archives, not for me with just 60GB.

- I continue to believe that “starting at $0.011 per gigabyte” is not an
honest way of describing the data retrieval costs of Glacier, especially when
the actual cost is detailed, of all things, as an answer to an FAQ question. I
hammer on this point because I don't think other AWS products have this
problem.

- I obviously don't think it's against the law here in Finland to migrate
content off your legally bought CDs and then throw the CDs out. Selling the
originals, or even giving them away to a friend, might have been a different
story. But as pointed out in the thread, your mileage will vary.

- I am a very happy AWS customer, and my business will continue to spend tens
of thousands a year on AWS services. That goes to something boulos said in the
thread: "I think the reality is that most cloud customers are approximately
consumers". You'd hope my due diligence is better on the business side of
things, as a 185X mistake there would easily bankrupt the whole company. But
the consumer me and the business owner me are, at the end, the same person.

------
re
Glacier's pricing structure is complicated, but fortunately it's now fairly
straightforward to set up a policy to cap your data retrieval rate and limit
your costs. This was only introduced a year ago, so if like Marko you started
using Glacier before that it could be easy to miss, but it's probably
something that anyone using Glacier should do.

[http://docs.aws.amazon.com/amazonglacier/latest/dev/data-
ret...](http://docs.aws.amazon.com/amazonglacier/latest/dev/data-retrieval-
policy.html#data-retrieval-policy-using-console)
[https://aws.amazon.com/blogs/aws/data-retrieval-policies-
aud...](https://aws.amazon.com/blogs/aws/data-retrieval-policies-audit-
logging-glacier/)
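
For illustration, here's a minimal sketch of setting such a policy with boto3
(the successor to the boto SDK mentioned in the article); the region and the
1 GB/hour cap are placeholders, not a recommendation:

    import boto3

    glacier = boto3.client("glacier", region_name="eu-west-1")

    # Cap retrievals for the whole account at roughly 1 GB/hour.
    glacier.set_data_retrieval_policy(
        accountId="-",  # "-" means "the account owning these credentials"
        Policy={
            "Rules": [
                {"Strategy": "BytesPerHour", "BytesPerHour": 1024 ** 3}
            ]
        },
    )

    # Or restrict retrievals to the free allowance only:
    # Policy={"Rules": [{"Strategy": "FreeTier"}]}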

~~~
vonklaus
> fortunately it is fairly straightforward to cap your data retrieval rates.

Amazon has done a great job with this feature. By doing a poor job of
implementing something for an extremely narrow use case, in a technology that
is outdated, and then wrapping every aspect of the product in the most
complicated pricing structure imaginable, one can't help but use the real
feature: any other provider or service.

Like, wtf would be the use case for Amazon Glacier in 2016? I don't think I
would put hundreds of petabytes of data into 20-year cold storage, and the
author of this post certainly wouldn't use it again. The fact that I need to
read 2 pages of pricing docs, and then the 2 pages you linked just to control
the costs because I can't estimate them myself, is a sure sign this is absurd.

~~~
zcdziura
Tape storage is still the optimal form of long-term storage. If you need
to store things for an exceptionally long time, such as financial data,
scientific data, etc., then you're going to get the most bang for your buck on
tape.

~~~
vonklaus
The post states he paid $150 to retrieve 60GB of data. For $150 you can buy a
5TB hard drive.

In what use case does the price delta make sense, given a 4-hour feedback loop
and all of your important data locked in someone else's data center?

The use case where your data is so massive that this makes sense && you
don't have the storage infrastructure in place is so narrow that it doesn't
make sense.

It is basically one research student's crawl data

Edit: also, S3 is pretty cheap. So again, I don't really see the use case here.
How much room is there in the market between your own physical or digital
system and Amazon S3 or an equivalent? You would have to have a massive amount
of data you don't care about and be very price sensitive.

~~~
88e282102ae2e5b
Durability is one reason. A physical hard drive in my desk is so much more
susceptible to destruction or loss or theft.

~~~
bigiain
This.

The data I care about is already backed up on two different multi-TB drives at
home, and another one at work.

Glacier is the contingency for "something took out the original data and all
three backups in two different locations 7 or 8km apart - if I'm still alive
after whatever just happened, I'll consider whether or not to pay Amazon a
grand or so to retrieve it quickly from Glacier, or wait ~20 months to get it
all for single-digit-dollars".

~~~
manigandham
If you're talking about personal data, why not just use Backblaze and Amazon's
consumer unlimited cloud storage?

That gives you 2 backup providers that can durably store everything, and it's
free and quick to access. Why deal with all the hard drives and Glacier?

~~~
88e282102ae2e5b
Neither of those seem free - am I looking in the wrong place? Also there's no
Linux client for either AFAICT.

~~~
manigandham
Right, the services cost money, but retrieval is free. Going by the cost of
the hard drives amortized out, it'll probably come to the same or less. You get
far more durability and less complexity with universal web access.

I believe there are other similar services for Linux, or you can just use the
browser to upload files with Amazon.

------
astrostl
Arq has a fantastic Glacier restore mechanism. You select a transfer rate with
a slider, and it informs you how much it will cost and how long it will take
to retrieve. It optimizes this with an every-four-hours sequencing as well.
See
[https://www.arqbackup.com/documentation/pages/restoring_from...](https://www.arqbackup.com/documentation/pages/restoring_from_glacier.html)
for reference.

~~~
copperx
It's unfortunate that Arq forces you to archive in their proprietary format.
That locks you in to the tool.

~~~
asymptotic
I was also concerned about this when I was looking into Arq, so I wrote a
cross-platform restoration tool that'll also work on Windows and Linux (not
just Mac):
[https://github.com/asimihsan/arqinator](https://github.com/asimihsan/arqinator)

This is purely based on the author's excellent description of the format in
his arq_restore tool:
[https://www.arqbackup.com/s3_data_format.txt](https://www.arqbackup.com/s3_data_format.txt)

------
Spooky23
The only use case I would be willing to commit to Glacier would be a legal
hold or similar compliance requirement.

The idea would be that the data would either never be restored, or you could
compel someone else to foot the bill, using cost sharing as a negotiation
lever. (Oh, you want all of our email for the last 10 years? Sure, you pick up
the $X retrieval and processing costs.)

Few if any individuals have any business using the service. Nerds should use
standard object storage or something like rsync.net. Normal people should use
Backblaze/etc and be done with it.

~~~
timv
Back when I worked in banking we had requirements like that (though we didn't
use glacier)

We had a legal requirement to be able to produce up to 7 years' worth of bank
statements upon receipt of a subpoena.

Not "reproduce the statements from your transactions records" but "give us a
copy of the statement that you sent to this person 6.5 years ago"

We had operational data stores that could generate a new statement for that
time period, but if we received the subpoena then we needed to be able to
produce the original, that included the (printed) address that we sent it to,
etc.

We had (online) records of "for account 12345, on 27th October 2011, we sent
out a statement with id XYZ", we'd just need a way to pull up statement XYZ.

There's no way(^) we'd _ever_ get subpoenaed for more than 5% of our total
statement records in a single month, so something like Glacier would have been
a great fit.

We had other imaging+workflow processes where we'd receive a fax/letter from a
client requesting certain work be undertaken (e.g a change of address form).
90 days after the task was completed, you could be _pretty sure_ that you
wouldn't need to look at the imaged form again, but not 100% sure. We could
have used glacier for that.

The use case that would have cost us (rare, but we needed to plan for it) was
"We just found that employee ABC was committing fraud. Pull up the original
copies of all the work they did for the 3 years they worked here, and have
someone check that they performed the actions as requested." Depending on
circumstances & volume that might trigger some retrieval costs, but the net
saving would almost certainly still be worth it.

(^) Unless there was some sort of class action against us, but that's not a
scenario we optimised for.

------
KaiserPro
Glacier is not a cheap/viable backup.

It's even less suited to disaster recovery (unless you have insurance).

Think about it. For a primary backup, you need speed and ease of retrieval.
Local media is best suited to that, unless you have an internet pipe big enough
for your dataset (at a very minimum, 100 Mbit per terabyte).

A 4-8 hour recovery time is pretty poor for a small company, so you'll need
something quicker for primary backup.

Then we get into the realm of disaster recovery. However, getting your data
out is neither fast nor cheap: at ~$2000 per terabyte just for retrieval, plus
the inherent lack of speed, it's really not compelling.

Previous $work had two tape robots. One was 2.5 PB, the other 7(ish). They
cost about $200-400k each. Yes, they were reasonably slow at random access, but
once you got the tapes you wanted (about 15 minutes for all 24 drives) you
could stream data in or out at 2400 megabytes a second.

Yes, there is the cost of power and cooling, but it runs fairly cold unless
you are at full tilt.

We had a reciprocal arrangement where we hosted another company's robot in
exchange for hosting ours. We then had DWDM fibre to get a 40 gig link between
the two server rooms.

------
Nexxxeh
The post is a useful cautionary tale, and he's not alone in getting burned by
Glacier pricing. Unfortunately it was OP not reading the docs properly.

Yes, the docs are imperfect (and were likely worse back in the day). And it
was compounded by the bug, apparently. But it's what everyone on HN has
learned in one way or another... RTFM.

Was it mentioned in the article that the retrieval pricing is spread over four
hours, and that you can request partial chunks of a file? Heck, you can
retrieve all your data from Glacier for free if you're willing to wait long
enough.

And if it's a LOT of data, you can even pay and they'll ship it on a hardware
storage device (Amazon Snowball).

Anyone can screw up; I'm sure we all have, and goodness knows I have. But at
the very least, pay attention to the _pricing_ section, especially if it links
to an FAQ.

~~~
avodonosov
I would say the FM in this case was unreadable. “starting at $0.011 per
gigabyte”, "learn more" - no one would expect to pay $150 here.

~~~
jlarocco
It's definitely sleazy of Amazon to hide the pricing info like that, but I
can't imagine seeing that sentence while making a purchase and not clicking
through to see how the pricing actually works. But I guess that's just me.

~~~
avodonosov
Not just you. But read the page referred to by that "Learn more" link. It's
very unclear. I mean, of course, it has an unambiguous meaning. But to
understand it you need to read a lengthy sheet of prose, and only deep in it,
after many definitions, do you notice the phrase "we multiply your peak hourly
billable retrieval rate" ... "by the number of hours in a month". What
concentration of attention, and how much time, does a reader need to spend to
understand this important nuance?

As far as I can see, everywhere else they specify their pricing "per GB", and
only this small phrase reveals the real meaning: you're billed not for the GBs
you actually transferred, but for your peak hourly rate multiplied by the
number of hours in the month. IMHO this should be one of the first phrases
describing the pricing model.
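
To make that nuance concrete, here's a rough back-of-the-envelope sketch of the
formula quoted above, applied to a retrieval of roughly the size described in
the article. The numbers are illustrative only (the free-tier deduction,
transfer-out fees and regional rates all shift the final figure), but it shows
why a quick 60GB retrieval lands in the $100+ range rather than under a dollar:

    archive_gb = 63.3          # data requested for retrieval
    hours_per_retrieval = 4.0  # each retrieval job is spread over ~4 hours
    price_per_gb = 0.011       # "starting at $0.011 per gigabyte"
    hours_in_month = 720

    # Requesting everything at once sets the peak hourly retrieval rate...
    peak_rate = archive_gb / hours_per_retrieval  # ~15.8 GB/hour

    # ...and you're billed as if that peak had been sustained all month.
    cost = peak_rate * hours_in_month * price_per_gb
    print(f"~${cost:.0f}")  # roughly $125

    # The naive reading of "per GB": 63.3 * $0.011 is about $0.70.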

------
sathackr
This sounds a lot like demand billing [1] [2], which is common with electric
utilities, particularly commercial ones, and increasingly for people with
grid-tied solar installations. [citation needed]

You pay a lower per-kilowatt-hour rate, but your demand charge is based on
your highest 15-minute average in the month, which is then applied to the
entire month.

You can easily double or triple your electric bill with only 15 minutes of
full-power usage.

I once got a demand bill from the power company that indicated a load that was
3 times the capacity of my circuit (1800 amps on a 600 amp service). It took
me several days to get through to a representative who understood why that
was not possible.

[1]
[http://www.stem.com/resources/learning](http://www.stem.com/resources/learning)

[2] [http://www.askoncor.com/EN/Pages/FAQs/Billing-and-
Rates-8.as...](http://www.askoncor.com/EN/Pages/FAQs/Billing-and-Rates-8.aspx)

------
lazyant
You don't have a backup until you test its restore.

~~~
anonfunction
I've never heard that and I'm stealing it.

~~~
ekimekim
Come on, don't downvote someone for learning something (you think is) well
known for the first time. Especially when their response is "that's great and
I'm going to use it". [https://xkcd.com/1053/](https://xkcd.com/1053/)

~~~
lmm
The comment is noise - no different from "+1" or "me too". If you found a
comment helpful and want to thank someone for it, the way to do that is an
upvote.

~~~
anonfunction
It's a bit different, I added that I had never heard of the phrase and that I
would be using it.

------
profsnuggles
Even with the large data retrieval bill, he still saves ~$100 vs the price of
keeping that data in S3 over the same time period. Reading this honestly makes
me think Glacier could be great for a catastrophic-failure backup.

~~~
workitout
Compare to Google Drive at 100 GB for 2 bucks, and no bandwidth charges that
I'm aware of.

~~~
profsnuggles
I looked up the Google Drive pricing and the costs increase quickly over that
100GB level. I'm going to use 3TB of data for my example because that is
approximately the amount of data I would be backing up at work. The cost for
Google Drive would be the $99-a-month 10TB plan; Amazon Glacier is $21.51 a
month. This is before you get into things like having the enterprise AWS
ecosystem with IAM versus a single-user Gmail account. Remember, I am only
talking about retrieval in the case of a catastrophic failure; the data is
already backed up elsewhere. As long as I can manage to go a year without
destroying all my backups, Glacier comes out on top over Drive even taking into
account the retrieval fees. In the best-case scenario I never ever retrieve
that data.
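
A quick check of those numbers (a sketch using the prices quoted in this
thread, not necessarily current ones):

    glacier_per_gb_month = 0.007
    data_gb = 3 * 1024  # ~3 TB

    glacier_monthly = data_gb * glacier_per_gb_month   # ~$21.50/month
    gdrive_monthly = 99.0  # smallest Drive plan that fits 3 TB (the 10 TB tier)

    savings_per_year = (gdrive_monthly - glacier_monthly) * 12
    print(f"${savings_per_year:.0f}/year")  # ~$930 of headroom for retrievals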

------
captain_jamira
If one can download a percentage for free each month - 5% in this case - and
the price of storage is dirt cheap, then couldn't one just dump empty blocks
in until the amount desired for retrieval falls under the 5% limit? In this
case, if one wants to retrieve 63.3 GB, upload 1202.7 GB more for a total of
1266 GB, of which 63.3 GB represents about 5%. There's no cost for data
transfer in, and the monthly cost at $0.007/GB would be just $8.87. And that's
just for the one month, because everything wanted would be coming out the same
month.

Has anyone tried this, or does anyone know of a gotcha that would rule it out?

And I realize that for the OP's situation, it wouldn't have mattered since he
thought he was going to get charged a fraction of this.
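
A quick sanity check of that arithmetic (illustrative only; it ignores request
fees and the 90-day minimum storage charge mentioned elsewhere in the thread):

    want_gb = 63.3
    free_fraction = 0.05
    storage_per_gb_month = 0.007

    total_needed_gb = want_gb / free_fraction            # 1266 GB
    padding_gb = total_needed_gb - want_gb               # ~1202.7 GB of filler
    monthly_storage = total_needed_gb * storage_per_gb_month

    print(f"{padding_gb:.1f} GB padding, ${monthly_storage:.2f}/month")
    # ~1202.7 GB of padding, costing just under $9/month while it sits there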

~~~
rakoo
That's possibly a way to trick the system, but storing 63.3 GB in S3 would
cost OP less than $2 for standard storage and less than $1 for reduced
redundancy, not counting request costs (which are not as surprising as the
hidden costs in question here). At this scale you should just store it in S3
and be done with it.

~~~
captain_jamira
Sure, I wasn't intending to suggest this as a good premeditated maneuver, but
there are probably other individuals out there who have found themselves in a
similar position to the OP and are considering their predicament.

If you've gotten into Glacier for the wrong reason, you may already be in the
trap, and you can quickly rip yourself free and take a bunch of skin, spend
almost 2 years ever so gently prying yourself free, or maybe a third way.
That's my angle here. Also, traps don't have to be laid for someone to feel
like he's in one, so I'm not putting that on AWS.

The cheapest way out seems to be to just grab 5%/month over 20 months, but
that's a lot of sustained effort and contact with the service. So I could see
a trick like this as a potential middle ground, at three months and ~$30
according to previous comment's details.

------
kennu
Glacier is more comfortable to use through S3, where you upload and download
files with the regular S3 console and just set their storage class to Glacier
with a lifecycle rule. I've used the instructions here to do it:
[https://aws.amazon.com/blogs/aws/archive-s3-to-
glacier/](https://aws.amazon.com/blogs/aws/archive-s3-to-glacier/)
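
For illustration, here's a minimal sketch of such a lifecycle rule using boto3;
the bucket name, prefix and 30-day delay are made-up placeholders:

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-backup-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier",
                    "Filter": {"Prefix": "archives/"},
                    "Status": "Enabled",
                    # After 30 days, objects are billed at Glacier rates but
                    # keep their S3 keys; restores go through restore_object.
                    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )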

~~~
takeda
That can bite you if you have a lot of small files[1], which with automatic
archiving of S3 can happen.

[1] [https://therub.org/2015/11/18/glacier-costlier-
than-s3-for-s...](https://therub.org/2015/11/18/glacier-costlier-than-s3-for-
small-files/)

------
joosteto
If downloading more than 5% of stored data is so expensive, wouldn't it have
been cheaper to upload a file 19 times the size of the stored data (containing
/dev/urandom)? After that, downloading just 5% of total data would have been
free.

~~~
yetanotherjosh
It still wouldn't have been free. The free download allowance is spread daily
across the entire month. That is, you can download 5% of your data per month,
and only 0.16% of it per day. For your optimization to work, you'd have to
retrieve your data over 30 days for it to be free.
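
In numbers (a sketch assuming the 5%/month allowance is prorated evenly over a
30-day month, as described above):

    stored_gb = 20 * 60.0                # the original 60 GB plus 19x padding
    monthly_free = 0.05 * stored_gb      # 60 GB free per month
    daily_free = monthly_free / 30       # 2 GB free per day

    days_needed = 60.0 / daily_free
    print(f"{daily_free:.1f} GB/day -> {days_needed:.0f} days for 60 GB")
    # 30 days: the retrieval still has to be spread across the whole month.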

------
slyall
I've had some big Glacier bills in the past, even the upload pricing has
gotchas[1]

These days the infrequent access storage method is probably better for most
people. It is about 50% more than Glacier (but still 40% of normal S3 cost)
but is a lot closer in pricing structure to standard S3.

Only use glacier if you spend a lot of time working out your numbers and are
really sure your use case won't change.

[1] - 5 cents per 1000 requests adds up with a lot of little files.

~~~
wallacoloo
I use Infrequent Access Storage for backups, through a tool called duplicity
(or more accurately, I use a GUI front-end for that tool called Deja Dup).
Instead of uploading every individual file, it gathers them into 25 MB .tar
files & uploads those along with an index describing where each actual file
is. That has made the request costs negligible for me.

~~~
slyall
We combined the files too.

And then a customer wanted all their files rather than just one or two,
although that was billed back to them.

------
Zekio
Pricing should always be straightforward and easy to understand, and that
pricing plan is dodgy as hell.

~~~
URSpider94
Pricing plans FOR CONSUMERS should always be straightforward and easy to
understand. This is not a consumer product.

Pricing plans FOR B2B should model, as effectively as possible, the underlying
costs -- this allows the provider to offer the lowest possible pricing for the
services that cost them the least to provide, with expensive services priced
accordingly. As others have mentioned on this thread, utilities are really,
really good at this -- they come up with extremely complex rate plans for
their largest customers that help them achieve whatever economies they are
aiming for, for example incentivizing customers to provide level-loading
(which is effectively what Amazon is doing in this retrieval scheme).

~~~
jack9
> This is not a consumer product.

That's a manufactured excuse for a fundamentally bad product API and pricing
structure. When Dropbox is more useful than AWS, Amazon has screwed the pooch
(which they do pretty often). Segmenting users by arbitrary, circuitous logic
into "consumers" (can't find a good use for it) and "enterprise" (can find a
good use for it) isn't constructive. Both classes should avoid it, because
it's not even an inexpensive choice for what you get.

~~~
manigandham
How is Dropbox more useful? Dropbox and AWS Glacier are vastly different
products and show exactly the divide between consumer and enterprise that you
say doesn't exist.

------
detaro
Seems like "precise prediction and execution of Amazon Glacier operations"
might be a niche product people would pay for (and probably already exists for
enterprise use cases?)

That's what generally keeps me from using AWS and many other cloud services:
the inability to enforce cost limits. For private/side-project use I can live
with losing performance/uptime due to a cost breaker kicking in. I can't live
with accidentally generating massive bills without knowingly raising a limit.

~~~
eropple
You can do it pretty easily with the AWS APIs, and in the process scram-switch
only the stuff you really want to kill.

~~~
detaro
How would you script protection against the issue described in the article?
Unless you make sure to check the cost before every single request you can't
stop it (and if you accidentally send one massive retrieval request, even that
isn't enough).

For other services it is easier, but even then, setting up and managing my own
cost control mechanism is a level of complexity (and risk of failure) I'd
really want to avoid, esp. since I probably use AWS to _avoid_ management
overhead.

~~~
eropple
You can't in this particular use case, but I can't envisage AWS providing a
cost-control system that would stop this, either. It doesn't make sense--
they're not calculating costs as they go. What I'm saying is that what Amazon
would provide you is not functionally different from what you can do yourself
right now.

I would be a lot more worried about a risk of over-charging myself if AWS
wasn't incredibly good about refunding accidental overages.

~~~
detaro
as user _re_ pointed out above, apparently for Glacier that functionality
actually exists: [https://docs.aws.amazon.com/amazonglacier/latest/dev/data-
re...](https://docs.aws.amazon.com/amazonglacier/latest/dev/data-retrieval-
policy.html#data-retrieval-policy-using-console)

~~~
eropple
Huh, that's...surprising. Glad it exists, though, as Glacier definitely is
opaque. Thanks for pointing that post out (to you and to `re).

------
stuaxo
"The problem turned out to be a bug in AWS official Python SDK, boto."

My only experience of using boto was _not_ good. Between point versions they
would move the API all over the place, and, Amazon being Amazon, some requests
take ages to complete.

After that I worked with Google APIs, which were better, but still not what I'd
describe as fantastic (hopefully things have improved over the last 2 years).

------
tomp
Wouldn't it be better for the OP to simply upload 20 * 60GB (= 1.2TB) of
random data, wait a month (paying less than 20 USD), and then download the
initial 60GB within his 5% monthly limit?

~~~
re
No, because he can download the 60GB at a slower, cheaper rate, paying less
than the storage for the 1.2TB and getting it sooner than a month.

~~~
otakucode
How?

~~~
detaro
By requesting the CDs one after another (or small numbers in parallel),
keeping the data rate low.

------
sneak
This article claims that glacier uses custom low-RPM hard disks, kept offline,
to store data.

Does s/he substantiate this claim in any way? AFAIK glacier's precise
functioning is a trade secret and has never been publicly confirmed.

~~~
olavgg
I think this is a good guess, [http://storagemojo.com/2014/04/25/amazons-
glacier-secret-bdx...](http://storagemojo.com/2014/04/25/amazons-glacier-
secret-bdxl/)

~~~
olavgg
Oh, there's also a Hacker News thread about this:

[https://news.ycombinator.com/item?id=4416065](https://news.ycombinator.com/item?id=4416065)

------
Pxtl
Considering the fact that bugs in the official APIs resulted in multiple retry
attempts, he should demand some of his money back.

~~~
detaro
The retries don't cost extra. Sending off all 150 retrieval requests for all
the data at once set the retrieval rate and the price; they should have been
staggered over a longer period to keep the rate low.

------
atarian
They should rename this service to Amazon Iceberg

------
Jedd
About a year ago NetApp bought Riverbed's old SteelStore (nee Whitewater)
product -- it's an enterprise-grade front-end to Glacier (and other nearline
storage systems). It provides a nice cached index via a web GUI that lets you
queue up restores in a fairly painless way. It even has smarts in there to let
you throttle your restores to stay under the magical 5% free retrieval quota.
It's not a cheap product, and obviously overkill for a one-off throw of 60GB of
non-critical data ... but the point is that there are some good interfaces to
Glacier, and roll-your-own shell scripts probably aren't among them.

As noted by others here, if you treat glacier as a restore-of-absolute-last-
resort, you'll have a happier time of it.

Perhaps I'm being churlish, but I railed at a few things in this article:

If you're concerned about music quality / longevity / (future) portability,
why convert your audio collection to AAC?

Assuming ~650MB per CD, the 150 CDs quoted, and ~50% reduction using FLAC, I
get just shy of 50GB total storage requirements -- compared to the 63GB 'Apple
Lossless' quoted. (Again, why the appeal of proprietary formats for long-term
storage and future re-encoding?)

I know 2012 was an awfully long time ago, but were external magnetic disks
really that onerous back then, in terms of price and management of redundant
copies? How was the OP's other critical data being stored (presumably not on
Glacier)? E.g. my photo collection has been larger than 60GB since way before
2012.

Why not just keep the box of CDs in the garage / under the bed / in the
attic? SPOF, understood. But world+dog is ditching their physical CDs, so
replacements are now easy and inexpensive to re-acquire.

If you can't tell the difference between high-quality audio and originals now
- why would you think your hearing is going to improve over the next decade
such that you can discern a difference?

And if you're going to buy a service, why forego exploring and understanding
the costs of using same?

~~~
stordoff
> Assuming ~650MB per CD, and the 150 CD's quoted, and ~50% reduction using
> FLAC, I get just shy of 50GB total storage requirements -- compared to the
> 63GB 'apple lossless' quoted. (Again, why the appeal of proprietary formats
> for long term storage and future re-encoding?)

I did a comparison between FLAC and ALAC (a.k.a. Apple Lossless) on my CD
library a few years ago (plus a few 48kHz tracks taken from DVDs), and the
difference in total filesize was less than 10% so I doubt that is a major
factor. I personally went for ALAC, as it has equal (EAC, VLC) or better
support (OS X Finder, iTunes, Windows Explorer, Windows 10 media player, some
tagging scripts, iOS) in stuff I currently use. Providing I keep a decoder
with the files, its proprietary nature doesn't really bother me - I can always
convert to xLAC if desired.

~~~
Jedd
Interesting, thanks.

I wouldn't use a proprietary format because I could never be sure when in the
future I'd want to read / re-encode, or what type of systems I'd have
available at that time, other than knowing I'd always have access to free
software.

I have some FLAC archives, but I don't _use_ them - so support to play that
format hasn't been something I've taken much notice of. Do you normally play
your ALACs, or keep an mp3 / ogg / aac version around to actually listen to?

~~~
tjl
It's not exactly proprietary since Apple open sourced the ALAC code in 2011.

[https://alac.macosforge.org](https://alac.macosforge.org)

Also, someone has wrapped it to build with different tool chains.

[https://github.com/TimothyGu/alac](https://github.com/TimothyGu/alac)

------
LukeHoersten
Does anyone have a success story for this type of backup and retrieval on
another service?

------
elktea
I briefly used Glacier for daily backups as a failsafe in case our internal
tape backups failed when we needed them. The 4-hour inventory retrieval when I
went to test the strategy, and the bizarre pricing, quickly made me look at
other options.

------
pmx
I have a strong feeling that he would get a refund if he contacted Amazon
support, considering it was caused by a bug in the official SDK and he didn't
ACTUALLY use the capacity he's being asked to pay for.

~~~
hueving
Multiple requests didn't cause the issue. It was asking for all of his data to
be queued in such a small window of time. The same thing would have happened
without the bug.

~~~
cdcarter
That said, this is Amazon, who will refund you for a product if you ship it to
the wrong address accidentally...I'm sure OP could get a refund.

------
natch
This is why I break my large files uploaded to Glacier into 100MB chunks
before uploading. If I ever need them, I have the option of getting them in a
slow trickle.

~~~
Nexxxeh
This is no longer necessary; they now allow you to specify ranges within
files. So you can split a single large file into multiple requests to keep
yourself within your allowance or within budget.
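
For illustration, an illustrative sketch of a ranged retrieval with boto3; the
vault name and archive ID are placeholders, and ranges have to be
megabyte-aligned:

    import boto3

    glacier = boto3.client("glacier", region_name="eu-west-1")

    chunk = 256 * 1024 * 1024  # request only the first 256 MB
    job = glacier.initiate_job(
        accountId="-",  # "-" means "the account owning these credentials"
        vaultName="my-vault",
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": "EXAMPLE_ARCHIVE_ID",
            "RetrievalByteRange": f"0-{chunk - 1}",
            "Description": "first 256 MB only",
        },
    )
    print(job["jobId"])
    # Issue further jobs for later ranges on a schedule to keep the peak
    # retrieval rate (and therefore the bill) low.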

~~~
natch
Nice. I also upload par2 checksum files for each chunk so changing things now
would involve a bit of script rewriting, but that's good to know for the
future.

~~~
thedufer
Is that necessary? They report a checksum for every file when you request an
inventory.

~~~
natch
No, it's not necessary at all, I hope. I probably should have called them just
"Par2" files not Par2 checksum files. They aren't just checksums. They allow
reconstruction of the original file even after fairly extensive damage. Which
isn't likely, so, yes, not strictly necessary, but who knows, could be useful
in the what-if scenario. The checksums you mention only help with checking,
not with repair.

[https://en.wikipedia.org/wiki/Parchive](https://en.wikipedia.org/wiki/Parchive)

------
cm2187
Perhaps a naive question but why would glacier try to discourage bulk
retrieval? Is it because the data is fragmented physically?

~~~
syncsynchalt
We don't actually know how Glacier data is stored (though there are several
theories, ranging from regular S3 to tape robots to experimental optical media).

~~~
ufmace
I suppose part of the neat trick of it is that because we don't have to know,
Amazon can switch it out for something else anytime it's convenient for them
or some new tech comes up, or split things up among several methods and compare
costs. As long as they structure their operations such that nothing is ever
lost and everything can be retrieved with a few hours' notice at any time, they
can try anything they want.

~~~
latch
How would us knowing prevent them from doing this?

~~~
lmm
Whatever information became public, people would write scripts and the like
with those assumptions baked in. Even if it wasn't officially documented, it'd
be bad PR for Amazon to break things people were relying on.

------
prohor
For cheap storage there is also Oracle Archive Storage at 0.1c/GB
($0.001/GB). They have a horrible cloud management system, though.

[https://cloud.oracle.com/en_US/storage?tabID=1406491833493](https://cloud.oracle.com/en_US/storage?tabID=1406491833493)

------
forgotpwtomain
> I’d need more than one drive, preferably not using HFS+, and a maintenance
> regimen to keep them in working order.

I'm really doubting the need for a maintenance regimen on a drive which is
almost entirely unused. Could have spent $50 on a magnetic disk drive and
saved yourself hours' worth of trouble.

~~~
jaytagdamian
The problem with physical drives is that they can (and do!) fail. The author's
point is surely that the backup would need to be checked periodically, and
drive failures dealt with.

Does magnetic media like this (especially spinning disk) suffer from bit-rot?
What about the possibility of mechanical failure?

I'd never rely on mechanical disks as the one and only backup of any data
critical to me - a two tier approach of mechanical for fast retrieval, and
cloud/online backup seems to be the safest bet.

------
kozukumi
For unique data you want super robust storage options, both local and remote.
But for something as generic as ripped CDs? Why bother? Just use an external
drive or two if you are super worried about one dying. Even if you lose both
drives the data on them isn't impossible to replace.

------
JimmaDaRustla
Wow, thanks for this!

I currently have 100 GB of photos on Glacier. I'm going to find another
hosting provider now.

------
random3
So depending on how the "average monthly storage" is computed you could get
20x more data in one month and then retrieve the 5% (previously 100%) that you
care about for free, and then delete the additional data?

~~~
cldellow
There's a 90 day minimum storage cost for data.

------
jaimebuelta
You will ALWAYS pay more than you expect when you use AWS (and probably other
cloud services). This case is quite extreme, but the way costs are assigned is
complicated enough that it's hard not to miss something at some point...

------
z3t4
I was looking at Glacier for my backups, but it seemed too complicated ... glad
I didn't use it.

I ended up using a couple of cheap VPSes, located in two different countries.
And it's still cheaper than, say, Dropbox.

------
alkonaut
Curious: if you use a "general storage provider" (like glacier) for backup,
rather than a "pure backup provider" (like Backblaze, CrashPlan) why is that?

~~~
natch
I control the encryption algorithms, the compression scheme, the file chunking
strategy, the encryption keys, the encryption of archive/file names, the file
naming scheme, the addition of Par2 files, everything.

And I don't pay the overhead of an add-on service.

Also I back up stuff that's not on my hard drive (only on external USB drives)
and I'm not sure how the services handle that.

If the services give me some of these points, that's not sufficient; they
would have to give me all of these points. Only then would I consider them.
All things being equal I'd be willing to pay for some convenience but my
current solution is all scripted so it's pretty darn convenient.

~~~
alkonaut
> I control the encryption algorithms, the compression scheme, the file
> chunking strategy, the encryption keys, the encryption of archive/file
> names, the file naming scheme, the addition of Par2 files, everything.

I can see why detailed control would be one reason, but you could still just
have a very controlled backup to your own storage location(s) as a first step
and just let a backup service bulk store your already named and encrypted
files? It's only the last-resort you need to go to so if it's a huge blob of
encrypted data that shouldn't matter too much -- you only need to access that
in case of a total disaster where you lost all your own backup endpoints
first.

> And I don't pay the overhead of an add-on service.

The reason I'm asking is because I was under the impression that backup
services are much _cheaper_ than pure storage, while still offering some
conveniences such as versioning/backup apps. Glacier charges $0.007 per GB per
month, that's $7/month just for a single 1TB machine, just for a single
version (If my math is correct, it's early)! If you have dozens of versions it
quickly adds up.

I do 10 machines at around 1TB on average, unlimited storage in unlimited
versions, at $1.25 per machine per month (flat rate, regardless of storage
volume). I have tried building my own machines, tried looking at storage
providers etc., but can't get near.

Even if I did only 1-2 machines, the Glacier cost would exceed the backup
service cost at just a couple of TB of total storage.

~~~
natch
Great analysis, but why "per machine"?

I back up data, not machines. I would prefer that third party backup software
not have access to the unencrypted files on my machines.

The pricing you mention is compelling. Which service is that? Does your model
work if the data is on multiple external drives?

Getting down to a specific example, say I have one laptop and three external
drives. Do the backup services you have in mind work with this setup? How
would they charge?

~~~
alkonaut
The service I use is the CrashPlan "family" plan. If I just gathered all
the data onto one machine instead, that would work too (I could just keep it on
a file server with a directory per machine or whatever). The only reason
"machines" are useful is for having separate, easy restores -- basically my
parents or mother-in-law can restore their machines from the other side of the
country.

As long as you keep the external drives connected to the same machine
during backup, the backup application doesn't care where the file set to back
up is mounted. It's just a directory list per machine.

I used to back up "data" too, using a first step of backing up multiple
machines to a NAS and then only backing that data up to the cloud. However, I
sacrificed that extra security for the added convenience of direct restores of
individual files and machines. It also reduced the risk of my having made a
mistake in the backup config of the first step (which I estimated to be a way
higher risk than hardware failure, fire or a data breach at the cloud
provider, since I have basically never configured anything right). Being able
to easily fetch an individual file from 1 day or 1 week ago can really save
time. Edit: also remember that the backup client on Linux uses RAM
proportional to the backup size, so on my cheapo NAS I outgrew its RAM and
would have had to get a faster one or a file server; that was also partly why
I left that setup.

The good thing about the backup pricing is that it doesn't increase with
aggregated backups like normal storage does. It's $150/year, so to be
competitive you need a few TB, but you can very easily back up your parents'
machines too and save some time around Christmas... Note, though, that people
sharing the same plan can see each other's data (or so I presume).

------
limeyx
So ... why not just upload additional data until the total is 60GB / 0.05, and
then download the entire 60GB, which is now 5% of the total storage, for free?

------
NicoJuicy
Does anyone have a backup script for Backblaze, or a similar Windows app like
SimpleGlacier Uploader?

------
dalanmiller
So, what's the most cost effective way to download all your files from Glacier
then?

~~~
benmanns
Spread the download over time. Over 20 months would be free. Over 1 month
would be the quoted per-GB charge based on your peak download rate.

------
pfarnsworth
If there's a bug in Amazon's libraries, can't you ask for a refund?

~~~
roywiggins
Once he was charged the $150, it didn't cost him any more to try again to
download it, because of how the pricing works. So, he would have been charged
that much no matter what if he'd downloaded the data at that speed, and there
was nothing to refund once he successfully got his data out.

------
jedisct1
I don't get Glacier. It's painfully slow, painful to use and insanely
expensive. [https://hubic.com/en](https://hubic.com/en) is $5/month for 10 TB,
with unmetered bandwidth. A far better option for backups.

~~~
breakingcups
I'm interested in Hubic, but where do you read that bandwidth is unmetered?

------
harryjo
Impressive that Amazon can choose to serve a request at 2x the bandwidth you
need, with no advance notice, and charge you double the price for the
privilege.

~~~
thedufer
Can you explain what you mean here? I'm fairly certain it isn't true.

------
languagehacker
This is a simple case of spending more than you should have because you didn't
understand the service you were using. It's made a little worse by how
silly the whole endeavor is, given the preponderance of music streaming
services.

~~~
joshmanders
Uh no, actually this is really a case of a bug in the OFFICIAL SDK that caused
a higher bill than expected.

~~~
detaro
No, requesting all his data at once alone determined the rate. The retries
didn't cost extra.

------
otakucode
I'm surprised that the author had 150GB of Creative Commons audio CDs to begin
with!

~~~
sundvor
Are you trying to say that the OP was not allowed to back up their own CDs? I
have several boxes of space wasting CDs of my own that have all been ripped in
lossless, and feel somewhat insulted by that notion. Note: I live in
Australia, a somewhat less insane country for DRM than the US (AFAIK).

~~~
nness
Also in Australia, and it's not that black-and-white in my understanding:

This is known as "Format Shifting" — taking one copyrighted medium and
converting it to another. In Australia, you are explicitly not allowed to do
this with CDs, DVDs and Blu-rays.

You are only allowed to keep a digital copy if you continue to retain the
original — a backup. If the original is lost or destroyed, your digital copies
must be discarded.

For example, you can rip a CD and put it on your iPod, or computer, as long as
you continue to own the CD. The issue here is that in both cases you also
control the device you are copying it to. You don't control but rather lease
space on Amazon's servers — so it introduces a grey area on whether you are
allowed to back up to such places and whether putting data on those servers
constitutes distribution of the copyrighted material.

Realistically, none of this is black-and-white, and Amazon could flag it as
infringing content and remove it just to cover themselves against DMCA
complaints anyway. This is true in both Australia and the US, regardless of
the differences in copyright law (Australian copyright law offers far fewer
protections than the US, incidentally), because both have similar DMCA-style
laws.

~~~
sundvor
Saw this too late to edit my other reply; an interesting overview of what you
may or may not do:

[http://www.lifehacker.com.au/2013/01/format-
shifting-101-wha...](http://www.lifehacker.com.au/2013/01/format-
shifting-101-what-are-your-legal-rights-in-australia/)

Also backed up by: [http://copyright.com.au/about-
copyright/exceptions/](http://copyright.com.au/about-copyright/exceptions/) -
which states we're allowed to "space shift" music.

------
anonfunction
I don't like how the title and article read like a hit piece on Amazon
Glacier. It's great at what it is intended for. In addition, it seems he still
saved money over 3 years, because the $9-a-month savings added up to more than
the $150 bill for retrieval.

I'm surprised that this aspect has not been mentioned here in the comments
yet:

> I was initiating the same 150 retrievals, over and over again, in the same
> order.

This was the actual problem that resulted in the large cost.

At my old job we would get a lot of complaints about overage charges based on
usage of our paid API. The pricing wasn't as complicated as a lot of AWS
services', just x req/month and $0.0x per req after that, but every billing
cycle someone would complain that we overcharged them. We would then look
through our logs to confirm they had indeed made the requests and provide the
client with those logs.

~~~
detaro
> _I was initiating the same 150 retrievals, over and over again, in the same
> order. This was the actual problem that resulted in the large cost._

Except that it wasn't. The repeated requests were free, because he already set
the maximum rate with the first wave of requests. Surprising?

Also, it really is not a hit piece. It's an honest report of what he did (and
what he did wrong) and that he thinks the docs aren't as clear as they could
be.

~~~
anonfunction
Thanks, I guess I glossed over that part of the article but when I got to the
end and saw the original quote I assumed the worst. The title alone is pretty
inflammatory in my opinion.

