Backup Awareness Week (backupweek.com)
36 points by nanch on Mar 25, 2014 | 52 comments

I know all about the importance of backups.

Mirroring is not a backup.

RAID is not a backup.

Hard disks are not supposed to sit on a shelf unplugged for extended periods.

Fine. Now, how the hell do I back up my 5TB of photos? :-( :-(

Edit: Lots of fantastic information here https://news.ycombinator.com/item?id=7371725

Is there any service that can burn multiple copies of my terabytes of data onto "made in Japan" Taiyo Yuden CD-Rs? :-)

> Fine. Now, how the hell do I back up my 5TB of photos? :-( :-(

I use git-annex. It understands the concept of wanting multiple copies of things, and keeps track of what is where (eg. S3, Glacier, some remote rsync server, or which of my many external drives). When I want something, it gets it for me (eg. by telling me which external drive to plug in).

Then, all I need to keep backed up is my git repository itself, which is tiny. I use Tarsnap for this, which means that I can keep previous snapshots without issue.

> Hard disks are not supposed to sit on a shelf unplugged for extended periods.

This works fine for me, when combined with some other method. Redundancy is key. And "git-annex fsck" checks a drive's integrity for me.
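
To illustrate the kind of integrity check mentioned above, here is a minimal Python sketch of checksum-based verification for a shelved drive. It is not how "git-annex fsck" is implemented; the function names (`sha256_of`, `build_manifest`, `verify`) and the manifest format are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

The idea is to build the manifest when the drive is written, keep it somewhere else (it is tiny), and run `verify` each time the drive is plugged back in. Any path it returns should be restored from one of the other copies.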

Thanks for the new possibility! Very interesting. From their page: https://git-annex.branchable.com/not/

"git-annex is not a backup system. It may be a useful component of an archival system, or a way to deliver files to a backup system. For a backup system that uses git and that git-annex supports storing data in, see bup."

Since you said you're using it effectively as a backup, could you please clarify what they mean, and what you mean?


For context, I was originally answering "Fine. Now, how the hell do I back up my 5TB of photos?". My answer is that you don't need to, since photos don't usually change after they are taken. It is sufficient to simply archive them safely (and redundantly).

I'm not using git-annex as a backup system. I use Tarsnap and my own tool ddar for backups.

I am using git-annex to archive specific large file collections that don't ever change (eg. photos and videos). By storing these collections appropriately in multiple redundant locations, and by also backing up my git-annex repository (using the backup tools above), I have effectively "backed up" my photos. They're as safe as any backup system can make them.

I've come to the conclusion that Amazon Glacier is probably the first good answer since tape drives. 5TB is, I think, $50 a month; if you use this data professionally, that is probably worth it. Otherwise, maybe still tapes? What's the max capacity on those these days?

It's a more expensive investment to get started (as opposed to something like CrashPlan, Carbonite, etc.), but I definitely recommend having a Drobo product hooked up to your network, with regularly scheduled backups to it.

If you got one of the five-drive boxes, you could put five 2TB drives into it, likely have enough capacity to store all 5TB of photos, and still be able to turn on the option that lets two drives fail simultaneously without losing any data. (You definitely lose space using them: I have a 1TB and a 2TB drive in mine right now and only have ~900GB available.) If you're using a Mac, it can even act as a Time Capsule, so you can direct Time Machine to back up directly to it. (My wife and I do this; it tends to back up about once a day rather than every hour like Apple promises.) Yes, I know it's expensive, but it's nice to know I have a local box (nothing in the cloud!) with all my (and my wife's) data backed up to it, so that if a hard drive decides to go kaput, neither of us loses anything.

Getting a "Drobo product" (NAS) won't do much when your house burns down or gets robbed. To be safe, you need your data to be in multiple physical locations.

That's fair enough. However, I do believe that the common case for data loss is that someone's hard drive dies, or you drop a laptop, or your laptop gets stolen, etc. Multiple physical locations provide the best safety, but there is value in having a local backup for when the hard drive dies in your computer.

Also, I'm not sure what (NAS) means, and I apologize if I sound like a salesman or something. I wasn't sure how best to describe a line of products made by a company without sounding like one. I was just trying to recommend a solution that I, personally, own and use to cover a pretty common case. It also feeds my slight paranoia about having a lot of my personal data and whatnot on someone else's servers.

Start by putting a copy on Glacier. If you're on OS X you can use Arq [1] to make it insanely easy.

[1] www.haystacksoftware.com/arq/index.php

Copy them to 2x 3TB drives, put the drives in a safe deposit box or store them in a fire safe at a friend or relative's house.

Let me repeat: "Hard disks are not supposed to sit on a shelf unplugged for extended periods."

This is even more the case with the very large capacity disks you are mentioning here.

As mentioned elsewhere in the thread, what about burning them to 50 triple- or quadruple-layer Blu-ray Discs (BD-Rs) and then storing them in a fireproof safe?


Every month go fetch the old hard drives, copy all the new data onto them, then take the currently in use hard drives and put those back into storage.

If you want more redundancy, replace the 2x 2x3TB drives with 2x NASes, each with 3 drives in RAID 5 or 4 drives in RAID 1.

If you archive disks like this, you need to run software like SpinRite on them every so often to maintain them. I'd suggest that you do it every time you update your archive on the disks.

[citation needed]

A lot depends on whether you're doing a backup or an archive; if it's a backup you'll be rotating the disks in periodically.

Print them twice and store them in different locations.

Or more realistically: batch-resize the photos so that they are much smaller, and save that version of manageable size somewhere online as a last resort for when a disaster destroys all your current full-sized copies.

Wouldn't it be better to find ways that are not lossy?

I have been using CrashPlan for backing up my huge photo library, and it has been working great so far.

The main benefit compared to the others is the "unlimited" cloud backup space if you sign up for their cloud service.

I've tried Backblaze, but the upload speed is only a single connection and VERY slow from my part of the world. If the uploading was done via multiple connections, it would be 10X faster. As it is, I still have about 230 days left before my existing data (and that's not even all of it) will be uploaded.

I will check out CrashPlan.

The speed varies with CrashPlan as well, but I usually get around a couple of megabits per second at least.

I run it on my home NAS though so it's basically just set-and-forget, meaning that I don't have to remember to keep my Mac online. I just add my photos to the NAS share and it takes care of it from there.
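
For a sense of scale on the upload times discussed above, here is a rough Python sketch of the arithmetic. The inputs are assumptions for illustration: "a couple of megabits per second" is taken as a sustained 2 Mbit/s, the library size is the 5TB mentioned earlier in the thread, and units are decimal (1 TB = 10^12 bytes). The helper name `upload_days` is hypothetical.

```python
TB = 10 ** 12  # bytes, decimal units (assumption)


def upload_days(size_bytes: float, megabits_per_second: float) -> float:
    """Days needed to upload size_bytes at a sustained rate in Mbit/s."""
    bits = size_bytes * 8
    seconds = bits / (megabits_per_second * 1_000_000)
    return seconds / 86_400  # seconds per day


# 5 TB at 2 Mbit/s with no interruptions or protocol overhead
print(round(upload_days(5 * TB, 2.0), 1))  # prints 231.5
```

At these rates the initial upload is measured in months, which is why running the client on an always-on machine matters more than shaving a few percent off the transfer speed.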

To underscore the importance of backups:

We've been aware that our data and backup policies were lacking, and have been discussing ways to improve that. We have more than 500GB of work and client data that is highly valuable, and thousands of dollars in stock photos to boot.

Then one of our systems got hit by CryptoDefense (the latest variant of CryptoLocker). It encrypted the NAS drives faster than we could notice, and destroyed some 99% of our data. If it had been able to propagate to just one other system, the one that holds our nightly mirror of the NAS, we'd have lost everything. Since then, we've been trying to set up backups to Amazon Glacier.

Don't underestimate the ease with which data loss can affect you. It could be a malicious link in a phishing email, a malicious attachment, bit rot, a natural disaster, or simple hardware failure. Maintaining offsite backups costs less than recovering from data loss.

This is like Tongue Awareness Month [http://xkcd.com/972/] for me. Intellectually I always know the state of my backups is pitiful, but now it's in the front of my mind and I can't stop thinking about it. ;-)

Jokes like this derive their humor by encouraging alienation from our bodies.

There is nothing wrong or unusual with being aware of our tongues.

There is nothing superior or powerful about others who try to make us uncomfortable with ourselves.

There is nothing spooky, troubling or surprising about our breathing (as some similar posts elsewhere suggest: "you are now breathing in manual mode. you just realized you have a skeleton.").

There is no reason to allow ourselves to experience any anxiety through becoming aware of our own bodies.

The only reason any of this "humor" works at all is because we are so thoroughly conditioned to have our minds focused outside of ourselves, on our work, on our needs and those society puts upon us that we rarely get to focus on ourselves and our bodies. And when we do, it's all too often because someone else is "rating" us or when we're trying to please others.

Let's all commit to dismissing false humor that suggests focusing attention on ourselves is anything but familiar, comfortable and impervious to psychological hijack by others.

I hate you.

Enjoy the next four weeks.

While this is a cool project, I unfortunately doubt it will convince the 90% of people who don't back up their stuff.

I had an idea like that a few months ago after spending time with my family and my girlfriend's family. We need a simple page that explains in big images why:

1. chances are, you will lose your data
2. here's a one-click thing to make sure it doesn't happen

For 1, we can't use the word "data" because people won't connect emotionally with the concept. "Photos from your last summer vacation", "The video of your kid's first step" etc. would be more powerful.

For 2, I was looking for solutions that provide automatic, continuous and off-site backup that is easy to recover. If any one of these criteria is missing, a backup strategy is effectively useless. (+ secure, affordable, etc. but these are extras).

I personally use and always recommend Backblaze for these reasons (one click install that just works and gets out of the way), but there might be other solutions.

Anyway, I'd love to see this project get somewhere, probably a GitHub page where people can contribute and provide 3-step tutorials for iCloud (so many people don't back up their iPhone when it's right there, baked in) and other platforms.

Interested? Get in touch: contact@kevinbongart.net

So this is what stemmed from the "World Backup Day" guy grabbing a trademark, shutting everyone out and turning a spontaneous reddit project into his personal little cash cow?

The pissed masses countered with "Backup Week", which still appears to be a thinly veiled ad spread for backup companies.

What next? "Backup Month"? This is getting ridiculous.

[0] http://www.worldbackupday.com

What's ridiculous about it?

Getting people to back up their files is important.

What's the problem?

The problem is your motivation, which is disingenuous - "You should be backing up, we are worried for you, ah, oh, use these products (from random people who just won a bid for our ad spots)".

Ever seen those "hosting provider ranking" sites that rank top entries purely based on the fees paid to them? That's you.

You want to fix this perception? Remove Sponsors, add complete listings of all notable backup products, invest time in reviewing them and allow ranking and comments. Then you will have something that would start to justify your preachy stance. Until then you are in it just for a quick money grab and it's pretty damn obvious.

There is a link to Wikipedia's exhaustive list of providers right by Step 2 in the Quick Start Guide.

Also, providers have a legitimate incentive to help people. They may even like why they do what they do.

So what would be a good, affordable backup solution that doesn't get in my way on my Linux machine?

I use a cronjob that uses duplicity for encrypted backups with Google Drive as external storage (python2-gdata package on Arch). After Google dropped their price on 100GB to $2/month they became the cheapest option for me.

I'd love to use something like Tarsnap, but storing my ~80GB of data with them would cost $24/month plus $0.30 per gigabyte of bandwidth. And rsync.net is $0.10 per GB of storage, so that would be $8/month. (that price only when signing up through git-annex http://www.rsync.net/products/git-annex-pricing.html )
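
The price comparison above can be written out as a small Python sketch. The per-GB figures are the ones quoted in these comments (2014 prices, storage only, ignoring Tarsnap's bandwidth charge). The function names are hypothetical, and the Google Drive figure only covers the single 100GB tier mentioned, since no other tier price is given here.

```python
# Monthly storage prices in US cents per GB, as quoted in the comment above.
PER_GB_CENTS = {"Tarsnap": 30, "rsync.net": 10}

# Flat tier quoted for Google Drive: 100GB for $2/month.
GOOGLE_TIER_GB = 100
GOOGLE_TIER_CENTS = 200


def per_gb_monthly_cents(provider: str, size_gb: int) -> int:
    """Monthly storage cost in cents for a per-GB priced provider."""
    return PER_GB_CENTS[provider] * size_gb


def google_drive_monthly_cents(size_gb: int) -> int:
    """Monthly cost in cents if the data fits in the quoted 100GB tier."""
    if size_gb > GOOGLE_TIER_GB:
        raise ValueError("no price quoted in the thread above 100GB")
    return GOOGLE_TIER_CENTS


size_gb = 80
for name in PER_GB_CENTS:
    print(f"{name}: ${per_gb_monthly_cents(name, size_gb) / 100:.2f}/month")
print(f"Google Drive: ${google_drive_monthly_cents(size_gb) / 100:.2f}/month")
```

Working in integer cents avoids floating-point rounding in the multiplication. For 80GB this reproduces the $24, $8 and $2 monthly figures from the comment.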

Good question! cperciva supports Tarsnap. It's not included on the "deals" page yet, but I'd be happy to add Tarsnap if he wants to participate.

We did our best to invite all of the backup software companies (the more the merrier)! Better deals for customers means more people backing up! To participate, drop me a line at hello@backupweek.com.

I simply rsync my entire server (over ssh) to a large disk on another box. Super easy to set up and the backed up data doesn't need any special software to read/restore, beyond standard 'nix tools.

Except if your files get corrupted, then you're screwed.

Of course, and the same is true of every backup solution. The way to mitigate that problem is with a decent backup rotation scheme:


> the same is true of every backup solution.


A proper backup involves copies that cannot be modified after they are made (i.e. a snapshot after the backup has run if using disk, or for free on tape).

As it was ambiguous, I figured you must have been talking about the more likely scenario whereby files become corrupt on the live server, unknown to either the users or admins. Over time, these corrupt files then make their way through all generations of backup until no good copies remain.

There's nothing special about an rsync/ssh solution that precludes the backup server from creating a read-only copy of each and every backup.
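
One common way to keep per-run copies with plain rsync is dated snapshot directories that hard-link unchanged files against the previous snapshot via rsync's `--link-dest` option. The Python sketch below only builds the command line and does not run it. The host, paths and the `snapshot_command` helper are hypothetical, and making each finished snapshot read-only (for example with a filesystem snapshot) would be a separate step not shown here.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def snapshot_command(
    source: str,
    backup_root: PurePosixPath,
    today: date,
    previous: Optional[PurePosixPath] = None,
) -> list[str]:
    """Build an rsync command that writes into a new dated directory.

    Files unchanged since `previous` are hard-linked rather than copied,
    so every snapshot looks complete but only changed files use new space.
    """
    dest = backup_root / today.isoformat()
    cmd = ["rsync", "-a", "--delete", "-e", "ssh"]
    if previous is not None:
        # An absolute path avoids rsync resolving it relative to dest.
        cmd.append(f"--link-dest={previous}")
    cmd += [source, f"{dest}/"]
    return cmd


# Hypothetical host and paths, for illustration only.
print(" ".join(snapshot_command(
    "user@server:/srv/",
    PurePosixPath("/backups"),
    date(2014, 3, 25),
    PurePosixPath("/backups/2014-03-24"),
)))
```

Because each run writes into a fresh directory, a file that became corrupt on the live server yesterday does not overwrite the good copy held in older snapshots, as long as those snapshots are kept long enough for the corruption to be noticed.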

I solve this by supplementing the daily networked backups with weekly manual backups to an external hard drive that remains disconnected at all other times.

I'm a Mac user, but I've been using CrashPlan (a Linux client is available) for the past three years and I love it. Haven't had any issues yet.

I second the nomination for CrashPlan. I use it on Windows, but a Linux client is available.

Depending on your definition of affordable, I've heard people raving about rsync.net for years.

Is there any good site out there where people showcase their backup setups, similar in spirit to http://macmenubars.com/ or http://usesthis.com/ ?

I like the idea of an awareness week for backups, although, as many have found first-hand, most people face too much additional research and required knowledge to consider the effort worthwhile.

Having a video I could point people to, or even simple instructions for a specific way of backing up using Windows or OS X, would be a more effective way of reminding people.

There are so many local and cloud backup solutions out there that a site which filtered through them would actually be great, but that is obviously outside the scope of this site, which is more a friendly reminder to be aware and prepare.

What do you guys think of http://tarsnap.com ? I need to back up some sensitive client data off-location and the concept seems pretty good.

What is your experience in educating non-tech people about the need and adequate methods for backups? I've found that they'll listen, but not act if it takes several steps and new routines.

Another problem I've noticed: many people have no idea how fragile hard drives are, and they keep old ones around for years -- with a precious, unique copy of their early photo libraries... Not only is the hard-drive unused for years, but they sometimes no longer have the necessary connections available (e.g. FireWire) on their latest laptop.

Yeah, I know what you are talking about. I wish tech giants would make backward compatibility a rule when creating any new gadget or means of storing data. There's probably no ultimate way to save data right now.

So true, I have a hard time just getting people to hook up an external drive to back up via Time Machine. These are the same people that come to me asking if I can somehow magically recover a file for them or get their pictures back...

Is this linked to Reddit's World Backup Day on 31st March? http://www.worldbackupday.com/en/

Hey nodata! Actually we tried to get in touch with the creator of World Backup Day but regretfully we didn't get a response.

So Backup Awareness Week is not associated with World Backup Day, but we support each other's goals of increasing awareness of the importance of regular backups.

Good thing that today, thanks to Dropbox/Google Drive/OneDrive etc., it's easier to get free backups up and running.
