Super Cheap Data Backups with Amazon Glacier Storage (atomicobject.com)
57 points by signa11 on Feb 19, 2015 | 50 comments



Be careful with Amazon Glacier: retrieving data usually leads to a rude awakening.

Glacier bills on "peak retrieval rate". Say you have a 100 Mbps connection and 1TB of data. Retrieving all 1TB over a 24-hour period at an even rate will cost you ~$300. Not great, but not awful.

But this calculation is wrong. The real story is much worse.

You're not retrieving your data directly from Glacier. You're restoring it to S3, then downloading your files from there.

Amazon is going to bill you at the peak retrieval rate _from Glacier to S3_. If that restore completes in four hours (at an even rate), you'll rack up _$2000_ in charges.

This has happened to me and to many others. The solution is to carefully meter your restoration requests. Or, better yet, don't use Glacier in the first place. It's simply not suited for the everyday user.

Play with the numbers for yourself: http://calculator.s3.amazonaws.com/index.html
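
For a rough sense of the arithmetic, here is a back-of-the-envelope sketch of the 2015-era peak-rate formula, assuming the fee is the peak hourly retrieval rate times $0.01/GB times 720 hours and ignoring the small free tier (5% of stored data per month):

    # Rough sketch of Glacier's old peak-rate retrieval fee (illustrative only).
    PRICE_PER_GB = 0.01      # retrieval price in $/GB (2015-era)
    HOURS_PER_MONTH = 720

    def retrieval_fee(total_gb, retrieval_hours):
        # Billed on the most GB retrieved in any single hour, extrapolated to a month.
        peak_gb_per_hour = total_gb / retrieval_hours
        return peak_gb_per_hour * PRICE_PER_GB * HOURS_PER_MONTH

    print(retrieval_fee(1024, 24))   # ~$307  -- 1 TB spread evenly over a day
    print(retrieval_fee(1024, 4))    # ~$1843 -- the same 1 TB restored in 4 hours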

And I'm certainly not the first to warn of this:

- https://news.ycombinator.com/item?id=4412886

- http://www.wired.com/2012/08/glacier/


"It's simply not suited for the everyday user."

True, but it seems like a reasonable secondary/off-site backup. If my machines and primary backups are both destroyed or stolen, I'll pay whatever is necessary to recover my data (within reason).


This is something that I haven't seen a straight answer on yet (from Amazon). When you make the restore request, you restore a "block" of data (or container, can't remember what they call it). And you have no control over how fast they restore that container -- if they take 4 hours, it is one price, but if they decide to take less than an hour then they bill you more, for something that isn't under your control -- correct? I did see where you can request a partial container restore (ranged retrievals, I think they call it), so you can at least have some control over the retrieval rate (you can, for example, request 10% of a container every 2 hours). But that still seems like a lot of places where things can go wrong.
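
For what it's worth, those range retrievals can be scripted so you pace the restore yourself. A minimal sketch, assuming boto3; the vault name, archive id, sizes, and pacing are all placeholders:

    # Meter a restore by issuing megabyte-aligned range-retrieval jobs over time
    # instead of one big archive-retrieval job. All names and sizes are placeholders.
    import time
    import boto3

    glacier = boto3.client("glacier")

    VAULT = "my-backups"
    ARCHIVE_ID = "..."                  # the archive id from your vault inventory
    ARCHIVE_SIZE = 100 * 1024 ** 3      # e.g. a 100 GiB archive
    CHUNK = 10 * 1024 ** 3              # request 10 GiB per job
    PAUSE = 2 * 60 * 60                 # wait 2 hours between jobs to cap the peak rate

    start = 0
    while start < ARCHIVE_SIZE:
        end = min(start + CHUNK, ARCHIVE_SIZE) - 1
        glacier.initiate_job(
            accountId="-",              # "-" means the current account
            vaultName=VAULT,
            jobParameters={
                "Type": "archive-retrieval",
                "ArchiveId": ARCHIVE_ID,
                # ranges must be megabyte-aligned; the last may end at the archive's final byte
                "RetrievalByteRange": "%d-%d" % (start, end),
            },
        )
        start = end + 1
        if start < ARCHIVE_SIZE:
            time.sleep(PAUSE)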


I was under the impression that if you retrieve the data slowly, the price goes WAY down.

http://liangzan.net/aws-glacier-calculator/


Consider this. To store 500 GB in Glacier it would cost me $5/month and I would have to:

- Use a tool to manage my uploads, or write my own thing, or use the CLI or Web UI

- Pay for downloading content back out. So checking that my backup works costs more than the backup itself. And it takes longer.

- Manually manage what has been backed up, and figure out the differences to get any kind of syncing or incremental backups.

- Have no idea about how Glacier data is actually stored or how redundant it actually is. Amazon will not talk about this publicly.

Or... I could spend $5/month on Backblaze where I could:

- Store as much data as I want

- Access it at any time, instantly, from any device, for free

- Get free, supported, automatic, "only upload the diffs" software to keep everything in sync

- Know exactly how my data is stored because Backblaze is super transparent about their storage pods, the hard drives, and the reliability of the various components. They even open sourced their hardware designs.

Glacier is neither an easy nor cheap backup solution for the end-user. Seriously, for an end-user computer just go get Backblaze, or Crashplan, or anything else.

(For businesses that already store things in S3, yes, I can see an advantage to "aging" data into Glacier. But this is a rather specific use case)


I'm on OS X. I was a long time CrashPlan user until on multiple occasions my stored data would simply vanish (e.g. yesterday it was 300GB, today it's 3GB and everything needs to be forcefully reuploaded for the next week). Spent quite a bit of time with support, sending logs and such, with no real resolution. And only because I happened to look at the UI was I aware of the issue. That didn't leave me comfortable at all, given I normally didn't pay attention once it was running.

I then looked at Backblaze briefly. Same class of product as CrashPlan (set and forget), but I immediately ran into bugs (submitted, confirmed by them), was put off, still raw from CrashPlan... Not a fair assessment to be sure.

Landed on Arq. It's not without its annoyances, but it does do the job well enough, including the recovery process from Glacier which is pretty tedious. I like that Arq can easily map backups sets to different providers, like putting frequently changing stuff that I do restore as a convenience into S3 or Google Drive, instead of Glacier. My most important (smallish) files live on multiple services.

On a more general level, I like that I'm deciding exactly what I'm willing to pay for and how it's managed. I get the AWS bills. If I want infinite backups on this set, and just the last week on a different set, no problem. The tool does my bidding, and does it smartly (deduplication, etc.)

Lastly, the Arq author is accessible. The whole experience has brought me from skeptical to fairly satisfied.


Doesn't support Linux, bummer.


Crashplan does. I run it on my RaspberryPi.

Also, Crashplan never deletes anything while Backblaze, last I checked, requires something like an external HD to be connected once every 30 days. A true backup, not offsite storage, in a sense.

It also has a family plan which is great, allowing 5 machines for U$ 13/month I think.

But the whole Java thing is a nightmare.


Yes, Backblaze's "window" is limited to 30 days. Crashplan may delete files but it's up to you:

"Backblaze will keep versions of a file that changes for up to 30 days. However, Backblaze is not designed as an additional storage system when you run out of space. Backblaze mirrors your drive. If you delete your data, it will be deleted from Backblaze after 30 days." [1]

"If you delete files from your system, they remain backed up and in your backup archive forever, as long as: 1. The files remain selected in your backup file selection. 2. Your 'Remove deleted files' setting is set to never." [2]

[1] https://www.backblaze.com/remote-backup-everything.html

[2] http://support.code42.com/CrashPlan/Latest/Backup/Backup_FAQ


Backblaze provides a mirror of your running system with a 30 day max limit on diffs; if you discover a file disappeared or became corrupted more than 30 days in the past, Backblaze can't help you.

You can download 1GB/month from Glacier for free or you can transfer any amount from Glacier to an EC2 instance in the same region; one or both should be sufficient to do your own integrity testing.
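
If you do test that way, the SHA-256 tree hash Glacier reports for each archive makes a cheap local integrity check. A minimal sketch of that algorithm; the file name is just a placeholder:

    # Compute the SHA-256 tree hash Glacier uses for archives: hash 1 MiB
    # chunks, then hash pairs of digests together until one digest remains.
    import hashlib

    MIB = 1024 * 1024

    def tree_hash(path):
        chunks = []
        with open(path, "rb") as f:
            while True:
                data = f.read(MIB)
                if not data:
                    break
                chunks.append(hashlib.sha256(data).digest())
        if not chunks:
            return hashlib.sha256(b"").hexdigest()
        while len(chunks) > 1:
            pairs = [chunks[i:i + 2] for i in range(0, len(chunks), 2)]
            # An odd trailing digest is carried up to the next level unchanged.
            chunks = [hashlib.sha256(b"".join(p)).digest() if len(p) == 2 else p[0]
                      for p in pairs]
        return chunks[0].hex()

    # Compare against the checksum Amazon reported for the archive.
    print(tree_hash("restored-archive.tar"))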

Anyway, Backblaze is a backup service, Glacier is not. Glacier is a storage service you can use with any backup software written to be compatible with Glacier's API. Backup management including diffs is the backup software's job, not Glacier's. Glacier's job is to be very durable storage; I also wish Amazon would say more about their storage media but it's generally understood to be heterogeneous with multiple redundant copies of all data stored.


> or you can transfer any amount from Glacier to an EC2 instance in the same region

Note that it's only the transfer that is free. Telling Glacier to retrieve something, even from EC2, comes with all the same fees as if you did it from your own machine.


Glacier retrievals (separate from data transfers out) can also be effectively free but it can be complex. This complexity is a significant downside to Glacier.

http://aws.amazon.com/glacier/pricing/

http://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retr...


"Backblaze is a backup service, Glacier is not. Glacier is a storage service"

This was my original point, and thank you for stating it so clearly. My concern with the article was that it was saying "Here is a storage service, and here is what I do to use it as a backup, isn't that cheap and easy!"

If you want a backup service, you are often better off purchasing a backup service instead of trying to roll your own on top of someone else's storage service. YMMV


I don't understand your claim about redundancy. Amazon says that Glacier "is designed to provide average annual durability of 99.999999999% for an archive", https://aws.amazon.com/glacier/details/#durability. This seems to be the same as for S3 objects. Furthermore, they say that "the service redundantly stores data in multiple facilities and on multiple devices within each facility".

So multiple facilities and multiple devices within each facility (I guess that means at minimum 4 copies), and a calculated annual durability. Not so secret?

What are Backblaze's (and Crashplan's) redundancy policies?


I tried Backblaze a few years ago and it choked due to the number of files I had (millions, IIRC). Crashplan was going to take months for the initial sync. Have either improved?

I'm currently using a Mac OS X app called Arq, which offers similar features (incremental backups, end-to-end encryption), but backs up directly to Glacier/S3 (or SFTP, or Dreamhost/Google Drive/Google Cloud Storage/etc). It works well.

I prefer backup software that uses open protocols with commodity/"dumb" file storage providers. Given client-side encryption, there's not a whole lot of value to be added by middlemen.


Ideally, Glacier is your backup of your backup. You only fall back to it if your tapes fail, your disk-to-disk blows up, your datacenter melts in a fire, or CrashPlan/Backblaze go out of business.


For CrashPlan and Backblaze, you are using their specific client, so if it doesn't suit your needs (retention schedules, file include/exclude control, multi-tiered backups, special [i.e., database] file handling) you are out of luck.

So the ideal use of Glacier is to use a backup client that fits your needs, with S3/Glacier support built in. Or use an online backup provider that uses S3/Glacier as a backend to their product.



Does Backblaze still forcibly exclude /Applications on my Mac? Because that's a deal breaker.


Does it run on Linux? Last time I checked it didn't.


By default yes, but you can easily change it in the options.


I don't know about the Mac version, but on Windows there are a bunch of default exceptions and you can edit the list and remove them.


It does, but I believe CrashPlan does not


On a complementary note: if you have a Synology there is an officially supported Glacier backup app which works very well. You can simply select the files you want maintained in the particular backup and go from there. I treat Glacier as a catastrophic-failure last resort. I have local backup (and lots of it), for obvious reasons, to maintain all of my digital archives (which mostly consist of a lot of large camera photos and videos).

I don't, unfortunately, trust CrashPlan anymore. I've had a significant number of restoration problems with the clients. I think they've done a poor job maintaining them over the years and, as others have stated, there are features missing unfortunately. It's cheap and you get what you pay for at this point in time. I think it used to be a great product, but when you lose trust in a backup provider there's no point. Just my opinion, I know others still think they're great.


My case against Glacier:

a) if you're using it for backups it's discouraging you from periodically checking your backups, which is standard good practice.

b) if you're using it for backups of your backups, dude what are you smoking? Why don't the backups of your backups have backups? Or second breakfast and mid-afternoon tea[1]? Where does it end?

For most of us mortals, it's a hard enough job doing one set of offsite backups and convincing our loved ones to get with the program. Find the one right offsite backup solution for yourself, periodically compare your on-site archives with your off-site backups, and get on with your life.

[1] http://www.youtube.com/watch?v=dLXeL4HbPr4


For me, it wouldn't be a backup of backups, but more a backup of expired files off my regular backups. So I'd keep, say, 3 months of files in my primary backup; anything older (up to say 3 years) would then go to Glacier. That's the best use case for it.


Arq and S3 Glacier are a fantastic combo. Uploads encrypted diffs.


I use Amazon Glacier storage for my backups. It's pretty cheap, I backed up 161GB for around $1.70/month. Since I took an Amazon AWS survey, I received a $25 promotion for AWS so my Glacier storage is paid for the next 2 years. :)


Which tool(s) do you use for Glacier backups?


Do they charge you to read or write to Glacier?


If you have a large amount of data, even 1 cent per gigabyte per month quickly stops being the cheapest option.

I have a CrashPlan family plan. 10 computers, unlimited data for like $6 a month with a four-year commitment (when I signed up).

The service just works. I'm not sure how much I'm backing up but let's be conservative and say each computer is backing up 100 GB. That's $0.006 per gigabyte per month.

As you back up more, services like CrashPlan become a lot more appealing. If you've got kids, I can imagine you're gonna keep snapping pictures and videos of them! :)


1) Push data to S3 and keep it there for 90 days. If I need something, it's there (testing, retrieval, etc).

2) After 90 days, allow the data to be moved to Glacier. At that point, the data is good to have but I won't need it any time soon.
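
That aging step is exactly what S3 lifecycle rules automate, so nothing has to be moved by hand. A minimal sketch assuming boto3; the bucket name and prefix are placeholders:

    # Transition objects under a prefix to Glacier 90 days after creation.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-backup-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "age-backups-into-glacier",
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }]
        },
    )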

It is really about risk management. How often and how far back do you need to pull data? A good strategy is required to minimize risk.

Glacier is for cold storage only. You should not expect to retrieve data from Glacier unless it is absolutely necessary. I think the pricing is justified but some use it incorrectly.


I would also recommend Microsoft OneDrive/Office 365 to anyone looking for really cheap online storage. For roughly $6 a month you can get 5 accounts with 10TB each.

A one-year Office 365/OneDrive Home Premium pack with 5 licences costs $70 on Amazon. By default OneDrive accounts only have 1TB right now, but Microsoft announced a few months ago that they are changing that to unlimited and are rolling it out right now. You can request an account to be upgraded to 10TB right now[1]. The upgrade only took 1 day for me.

So that makes it $6 for 5x10TB total, or $0.12/TB/month.

And you also get all the actual Office products. I recently started using that as an extra backup location since I've had too many issues with CrashPlan to trust them, and Dropbox or Google Drive's storage plans don't work out very well if you just need a little more than 1TB.

[1] http://preview.onedrive.com


How exactly does one use it? Are there open source clients? Can I use it on Linux? What kind of limitations are there to file size or number? How fast is it to store/retrieve?


Dropbox Pro is the same rate (https://www.dropbox.com/plans). You do have to buy it in $10/month increments, but you can get to it whenever you like without paying 9 cents per GB. AWS is usually worth it, but it is never super cheap.


Which is around the same price as Google Drive: 1TB ~= $10. You also get it immediately, without the 3-5 hour retrieval penalty, and it supports instant preview of most files.

Although the increment is much higher (the next tier is 10TB ~= $100).


I use both -- Dropbox and Arq/Glacier. Dropbox does have fantastic accessibility and it's irreplaceable for sync, but for stuff like family photos it's nice to have a very cheap backup-of-my-backup


Or you could pay $7/month for Office 365, get 1TB+, Office, and 60 Skype minutes. You can even buy a $75 x86 tablet and get 1 year of free Office 365 for a total cost of $6.25/month.


You could also use this thing, a "Cloud NAS": www.cloudengines.com. Basically a device that exposes itself as a drive via SMB/CIFS and works exactly like a normal NAS, but it's hooked up to a cloud backend, so there is no local storage and it scales. Costs the same as Glacier (1c/GB/Month), but without the retrieval fees. Plus it won't take days to get your data! Obviously it won't run out of the box as a backup application since it's just a drive, but you could do pretty much anything with this thing that you could do with a normal NAS as long as you don't need blazing fast performance.


I'm surprised nobody has mentioned tarsnap yet. Simple to use (I simply run a cron job to backup my home directory once a day). Backups are incremental (you don't pay for a full backup every day). Rock solid security (powered by cperciva). Reasonable pricing. Granted, the UI is a little rough around the edges when compared to e.g. crashplan but I live in a terminal already anyways.


Tarsnap is fantastic, and I'd love to be able to use it more, but it's 25x more expensive than Glacier (not counting the whacky retrieval costs) and 10x more expensive than S3, so it's not really practical for bulk backups unless your data is particularly valuable.


I respect Colin and what he does, but tarsnap is too expensive for general-purpose/consumer backups, I mean photos and videos.


So true, that's amazingly cheap!

Still, you need tools and a real strategy for your backups in the Cloud: S3 and IAM will help as well! http://cloudacademy.com/blog/amazon-s3-vs-amazon-glacier-a-s...


RunAbove (OVH) offers $0.01/GB/month. It's OpenStack, so everything that works with Rackspace works fine.

https://www.runabove.com/storage/object-storage.xml


So far I like crashplan for remote backups on my personal machines (a few Ubuntu and one Windows machine). I did a little research and it fits my needs.

Initially looked at Backblaze and saw Mac and Windows only support so that was a no-go.


I really wanted to use it with Ubuntu but it was incompatible with an encrypted /home. The daemon would start at boot and, unable to mount my target directory in /home, would create a new target directory in / and fill that up.

I opened a ticket and their support people seemed confused by Linux, and told me to check the user forums. Since this is a paid product I saw this as poor service and cancelled.

It did look interesting but I got the impression that Linux wasn't really supported - just that a Linux client existed.


Did you know Linux CrashPlan doesn't store full permissions?

Try a 200GB restore of 5M files sometime; apparently the client often crashes after many hours.


If you're an Amazon Prime member you've already paid for free unlimited online photo backups. Prime is only $8.33/month and offers tons of other benefits. Just saying.


Only for JPGs, though, no RAWs or anything else. And upload speeds are capped. And all your photos are in cleartext on Amazon's end.


I just use SmugMug to back up all my photos. It's $40 per year and supports unlimited photos.



