This means that, of course, it works perfectly with rsync.net and that we have yet another chance to offer the HN readers discount, which you may email us to find out about.
I see the calculator floating around in the comments, but it isn't set up for a backup use case (i.e., a fixed storage size with some percentage churn of changed files, or continuously growing snapshots with aged-snapshot deletion).
I've always been curious about using it for backup storage. The retrieval costs are high enough (in the time frames I'd care about for a personal computer) that I think I'll avoid that aspect for the time being.
Having used the past Glacier support, I expect the new S3 Glacier lifecycle approach will be much better.
I am wondering about when (if?) the open source arq_restore and format documentation will be updated.
The software I use: CrashPlan, Arq, DropBox, Carbon Copy Cloner.
CrashPlan: I have it backing up my data and configuration in /Users and /Library. I have a bunch of regex exclusions to avoid backing up various files such as caches, VMware images, etc. CrashPlan backs up to their cloud servers every 15 minutes. When I am at home (where I work from), it also backs up to a local copy of CrashPlan on a server.
Arq: I am doing daily backups with Arq. These now go to Glacier for long-term / last-resort backups. This backs up only my /Users, with heavy restrictions on which files.
Dropbox: I have many of my documents stored in Dropbox with the Packrat feature, which keeps copies of every version and deletion. I don't consider Dropbox a backup by itself, but I often find it much faster to find and restore something via Dropbox than by other methods. I am also careful about the types of data I put on Dropbox.
Carbon Copy Cloner: as I mentioned in another part of this thread, I think SuperDuper is better for most people. However, I do use CCC's ability to remotely make a bootable bare-metal backup to my home office server. When I travel, I typically take an external backup drive with a current mirror of my system.
I don't use Apple's Time Machine. I think it is a good choice for most home users. As Apple has added more features to Time Machine, I do think about adding it to my mix.
That covers most things. I do have some things under SVN or Git, which could be considered another layer of backup.
Currently, the biggest pain point for me in backups is VMWare images. I currently have 4 Linux and 3 Windows images on this system, and they can cause a huge amount of data needing to be backed up every time they are used.
This is where a sector-by-sector backup program shines.
I don't know what to recommend on the Mac, but on Windows, ShadowProtect is pretty wonderful. It backs up only changed sectors - update 10MB in a 10GB file and it only copies that 10MB - and it's insanely fast.
Even with a file-by-file backup, one thing you can do for VMware images is to take a snapshot. After you take a snapshot, further changes to the VM don't go into the large .vmdk file you just snapshotted, they go into a new, potentially much smaller .vmdk file, so your next incremental backups may be much smaller.
Usually, the VM disk images sit somewhere in importance between applications and configuration files, which need backing up, and my more important work and data files.
The problem with VMs is not just the quantity of data that needs to be backed up, but the overall size of the data that needs to be evaluated. I think CrashPlan does a pretty good job of copying only the changed parts of a disk image, but it has to do a HUGE amount of processing with every backup. That makes VMs hard to fit into the remote versioned backups of CrashPlan and Arq.
I do back these up via Carbon Copy Cloner when I mirror the entire drive.
That would be quite a project, compared with backing up everything on the host system as I do now.
I have all sorts of VMs. Some of them are extremely minimal OSes (think router/firewall distros). I have no idea how I would be able to back these up from inside the VM. And even for the VMs where I could do that, why bother? It seems like a lot of work.
By having an extremely fast sector backup running on my host system, I can be sure that all of my VMs are backed up, with no extra effort when I install a new one. I don't have to worry about how I would do a "restore" in any of those specific VMs, I can just restore files on the host OS and know that it will work perfectly.
A much simpler and more convenient solution.
> DropBox: I have many of my documents stored in Dropbox with the PackRat feature to keep copies of every version and deletion.
Aren't you worried about the lack of encryption?
I live in an office and work environment with windows that people can look in. I don't much like people staring in, but I am also not about to keep blackout curtains drawn at all times.
I do wish there were more secure options than Dropbox, but the combination of ease of use, price, third-party support, and collaboration makes Dropbox hard to beat.
1. Bug in your backup software
This is addressed by using more than 1 piece of backup software.
2. Corruption in your live data
(i.e. your filesystem corrupts your favourite baby photo)
This is addressed by having lots of incremental backups going back into history. Note that Time Machine throws away historical incrementals over time, so does not protect against this, given long enough time windows.
3. Failure of your backup hardware
This is addressed by using more than 1 piece of backup hardware.
4. Destruction of your backup hardware
This is addressed by having your backups exist in more than 1 physical location, so you can never lose your live data and all your backups because of, say, a house fire.
5. User error deletion of data
This is addressed by having backups that run frequently.
My strategy is:
* Time Machine to a Time Capsule on my LAN
* Time Machine to an external disk on my Mac
* Nightly Carbon Copy Cloner clone of my entire disk to (the same) external disk on my Mac
* Nightly Arq backup to Glacier's Ireland location (I live in London)
So (in addition to the live copy of my data on my Mac's main disk) I have 4 copies of my data, from 3 different pieces of backup software, on 3 different pieces of hardware, in 2 different locations. The CCC clone is there mainly because it's bootable, so if my mac's SSD fails, I can reboot and hold a key and I'm no more than 24 hours behind.
It's unfortunate a few things appear to be backwards: why can you include Wi-Fi APs yet not exclude them, despite the example suggesting you exclude tethered devices?
Likewise, why can you email on success but not on failure?
If you want to stick with Arq 3 you can. Delete Arq 4. Download Arq 3 (http://www.haystacksoftware.com/arq/Arq_3.3.4.zip).
Launch Arq 3.
You'll have to find your old backup set under "Other Backup Sets", select it, and click the "Adopt This Backup Set" button.
Sorry for the hassle.
CrashPlan: I have used it since soon after it first appeared for the Mac. It has really strong compression and de-duplication to minimize and speed data transfers. I personally use their consumer and small-business solutions, and I also maintain their CrashPlan PROe enterprise backup for several clients. The fact that they have a very strong enterprise product gives me a great deal of trust in the quality of CrashPlan's work. I think it is the best solution I have used for a notebook that is on the move. I back up both to their remote servers and to my own home office server. Thus I have the option to restore quickly from my own local server, to restore from their much slower remote servers, AND to have CrashPlan's next-day service send me a copy of my data on a disk drive. I do wish they could get rid of the Java dependency in their Mac client software, since it is a RAM hog. CrashPlan rates very well at preserving Mac OS X metadata.
Arq: I like the approach of backing up to Amazon S3, which I know is a very reliable storage environment, and Glacier has made it dirt cheap for last-resort archival backups. I like that, at least through version 3, there has been an open-source restore tool hosted on GitHub. If Haystack Software disappears, there are still options to restore. I believe Arq is one of the very few Mac OS X remote backup systems that preserves ALL metadata.
I have used lots of backup software over 30 years. Every backup system has failings and bugs, and the operator (normally me) is capable of making mistakes. That is why I use multiple products for backup.
I am interested in exploring Arq's new features, especially SSH/SFTP, which will allow me to self-host and may cause me to re-evaluate my overall backup approach.
I left JungleDisk because it went sideways and S3 was too expensive. After that was CrashPlan; I liked its free remote backup option. But then my backup destination disappeared behind carrier grade NAT. That left me with paying for regular CrashPlan or looking elsewhere. Enter Arq.
Based on my estimated usage, for two computers, I calculated the following estimated yearly cost.
* JungleDisk (S3): $288
* Arq (Glacier): $32
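As an illustrative reconstruction of those two yearly figures, here's the storage math assuming roughly 250 GB stored at 2012-era per-GB rates (the data size and both rates are my assumptions; the real totals above presumably also include request and transfer fees):

```python
def yearly_storage_cost(gb, rate_per_gb_month):
    """Simple yearly storage cost: GB stored * monthly per-GB rate * 12 months."""
    return gb * rate_per_gb_month * 12

# Assumed data size and per-GB rates (hypothetical, for illustration only)
DATA_GB = 250
S3_RATE = 0.095       # $/GB-month, roughly first-tier S3 standard at the time
GLACIER_RATE = 0.01   # $/GB-month, Glacier

print(f"S3:      ${yearly_storage_cost(DATA_GB, S3_RATE):.0f}/yr")       # ~$285
print(f"Glacier: ${yearly_storage_cost(DATA_GB, GLACIER_RATE):.0f}/yr")  # ~$30
```

The roughly 10x gap between the two is what makes Glacier attractive for last-resort backups.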
This month is the first full month in which I'm not seeding my initial Arq backup to Glacier. I'm hopeful that the cost will be significantly lower than CrashPlan.
Even if it was more expensive for me, I would still switch, because I don't trust Crashplan completely. There have been stories from users of backups getting corrupted when they needed to recover, and the upload speed to Crashplan is so slow it took months for the full 300GB to upload (I'm getting around 0.5 - 2Mbps up on my 100Mbps/100Mbps connection, I believe they are artificially throttling it to discourage people from storing a ton of data). This means new data takes a long time to be 100% safe, especially when for example I dump my camera's memory to disk.
On top of that, if their upload speed is this low, their download speed probably is, too. If my data crashes, I need the backup yesterday. I can't wait a week to download the 300GB at 10Mbps.
I believe Amazon's speeds would be much higher.
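The transfer-time arithmetic backs this up; a quick sketch (decimal units, ignoring protocol overhead):

```python
def transfer_days(size_gb, mbps):
    """Days to move size_gb over a link running at mbps (no overhead assumed)."""
    bits = size_gb * 8e9           # GB -> bits
    seconds = bits / (mbps * 1e6)  # Mbps -> bits per second
    return seconds / 86400

print(f"{transfer_days(300, 1):.1f} days at 1 Mbps")     # ~27.8 days
print(f"{transfer_days(300, 10):.1f} days at 10 Mbps")   # ~2.8 days
print(f"{transfer_days(300, 100):.1f} days at 100 Mbps")
```

So at the throttled 0.5-2 Mbps rates described above, 300 GB really does take one to two months.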
For example, suppose you're restoring 50 GB. If you want to start the retrieval 4 hours from now (the minimum), you'll pay $97. If you're willing to wait 10 hours, that drops to $43; at 20 hours, $25; at 40 hours, $16; and it approaches $7 at the limit.
You can play with the cost at http://liangzan.net/aws-glacier-calculator/
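For context, Glacier's original retrieval pricing was driven by your peak hourly retrieval rate, billed as if sustained across the whole month. A rough sketch of that model (the $0.01/GB rate and 720-hour month are my assumptions; it ignores the free monthly allowance, so it only approximates, and does not exactly reproduce, the figures above):

```python
def glacier_retrieval_cost(size_gb, spread_hours,
                           rate_per_gb=0.01, hours_in_month=720):
    """Legacy Glacier model: peak GB/hr retrieved, billed for the whole month."""
    peak_gb_per_hour = size_gb / spread_hours
    return peak_gb_per_hour * rate_per_gb * hours_in_month

# Spreading the same 50 GB restore over more hours lowers the peak rate
for hours in (4, 10, 20, 40):
    print(f"50 GB over {hours}h: ${glacier_retrieval_cost(50, hours):.0f}")
```

The key intuition: cost scales with how fast you pull data out, not how much you pull, which is why pacing a restore matters so much.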
I'm curious what backup tools people use on Linux if they want to back up files on Glacier. I use git-annex for certain files (it works well for pictures and media). The rest of my backup process is a fairly rudimentary (though effective) rsync script, but it doesn't use Glacier.
My current setup works fine for me, but I imagine there are better tools out there.
It tars a directory, naming the output with a hash of the original directory name. Then it encrypts it with gpg and breaks it into small parts (100M) so I can pace any needed Glacier restores so as not to break the bank. Then it runs par2 on each part, to make it more likely that I can recover from any file corruption. Then it uploads each part and the par2 files to an S3 bucket which is set (via the S3 admin web dashboard UI) to automatically transition the files to Glacier.
The shortcoming is it's not a whole-system backup. Also it doesn't do differential backups, though that's not a problem for me because I organize things such that old stuff doesn't change often if ever. It's dirt cheap, one-command simple, and feels pretty reliable... though I must admit I haven't tested a restore!
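A minimal sketch of that pipeline in Python, building the command sequence rather than running a monolithic script (the tool flags, bucket name, and GPG recipient are illustrative assumptions; it assumes tar, gpg, split, par2, and the AWS CLI are installed):

```python
import hashlib
import subprocess

def archive_name(directory):
    """Name the archive with a hash of the original directory name."""
    return hashlib.sha256(directory.encode()).hexdigest() + ".tar"

def backup_commands(directory, bucket, gpg_recipient):
    """Build the tar -> gpg -> split -> par2 -> upload command sequence."""
    tarball = archive_name(directory)
    encrypted = tarball + ".gpg"
    return [
        ["tar", "-cf", tarball, directory],
        ["gpg", "--encrypt", "--recipient", gpg_recipient,
         "--output", encrypted, tarball],
        # 100 MB parts so Glacier restores can be paced to stay cheap
        ["split", "-b", "100m", encrypted, encrypted + ".part."],
        # par2 adds parity data for recovery from corruption (run per part)
        ["par2", "create", encrypted + ".part.aa"],
        # the bucket has a lifecycle rule transitioning objects to Glacier
        ["aws", "s3", "cp", encrypted + ".part.aa", f"s3://{bucket}/"],
    ]

for cmd in backup_commands("photos", "my-backup-bucket", "me@example.com"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually execute
```

A real version would loop the par2/upload steps over every part file; this just shows the shape of the pipeline.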
Today v4 has been released, and it comes with new storage options (GreenQloud, DreamObjects, Google Cloud Storage, and SFTP, aka your own server), multiple backup targets, a unified budget across S3 and S3/Glacier, email notifications, and many more clever features.
Can anyone recommend a SFTP backup provider?
My Arq backups are designed to be worst-case. I have other, local backup options in case of failure. I was using Glacier, but I ran into Arq3 sync problems and I need to re-upload all my data. Glacier is very slow from where I live. I assume SFTP will be a bit faster.
The new-user-HN-discount is 10c per GB, per month, and there are no other (traffic/bandwidth/usage) costs. Our platform is ZFS and there are 7 daily snapshots for free.
We would be happy to serve you, as we've been serving thousands of users since 2001.
It's called Filosync and is like Dropbox but secure and with your own (or with Amazon) servers. Check out http://www.filosync.com/ for more information. And be warned, it's pricey!
Also, keep in mind that Boxcryptor and Boxcryptor Classic are two different products. If sharing files in an encrypted way with other people or having a master key is part of your use case, Boxcryptor Classic is not a great choice. But if you just need a way to store your own files in the cloud, with seamless sync between machines, and without privacy concerns, Boxcryptor Classic has worked very well for me.
Could anyone clarify whether my calculation is wrong?
48 a day * 31 days a month = 1488 uploads a month
5.5 cents per 1000 uploads
so this would cost roughly 1.488 × 5.5 cents, about $0.082 a month.
There would be data and storage costs but I'm just going by your upload request calculations.
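The arithmetic checks out; as a quick sketch using the $0.055-per-1,000-PUTs figure above:

```python
def monthly_put_cost(uploads_per_day, days=31, price_per_1000=0.055):
    """S3 PUT-request cost for a month of periodic uploads."""
    uploads = uploads_per_day * days
    return uploads / 1000 * price_per_1000

cost = monthly_put_cost(48)  # one upload every 30 minutes
print(f"{48 * 31} uploads -> ${cost:.4f}/month")  # 1488 uploads -> ~$0.0818
```

Request fees are negligible at this frequency; storage and transfer dominate the bill.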
S3 is more suitable for things you are actively working on, like code.
I'm traveling in hotels with shitty wifi most of the time. It's hard enough to browse so I'd really only like to backup while I'm sleeping. Also, in the interest of not hogging all the bandwidth I'd like it to stop by say 6am so that as guests are waking up they can use the bad internet.
You can. After you set up a target, open the Preferences window, select the target, and click the "Edit..." button.
The dialog that follows has an option to "Pause between [00:00] and [00:00]", where [00:00] is a drop-down which lets you pick the top of any hour of the day.
1. In former versions, it was difficult to see what was actually part of the backup. Has that changed?
2. Is there any reliable way of calculating how much an Arq backup will cost? Storage costs are easy to calculate but with Amazon S3, changes etc. are a major cost factor.
In other words, can someone sell me on the idea of paying for AWS storage when I have dirt cheap storage around my house and even a remote location that I can stuff a huge drive in.
I'd recommend that someone using Sync for backup purposes take snapshots of the data they're syncing.
Alternatively, you could roll your own solution on top of Sync by encrypting your files on your own and creating a Sync folder of the encrypted files.
I generally prefer SuperDuper for its simplicity, and recommend it to most people.
CCC is not really harder to use, but it presents a bunch more options which most people don't need and with which they may get into trouble. One great feature of CCC is the ability to make a bootable backup to a remote volume. I have my MacBook set up to back up to a server in this fashion. However, this requires you to configure your remote server with root SSH access via certificates.
That said, you can also use CCC and SuperDuper to back up to a disk image file. This would not be directly bootable, but it can be copied to a new hard drive, which would then be bootable. Backing up to remote disk images is much slower than the direct drive-to-drive method.
http://www.haystacksoftware.com/arq/ links to ?product=arq3 though, text "per computer for Arq 3".
Please consider integrating or linking http://liangzan.net/aws-glacier-calculator/ (as found in the HN comments somewhere around here), it's been invaluable when talking with people about Arq today.
pay close attention to the part about providing a clear call-to-action. :-)
For my own Mac, I back up to a sort-of off-site NAS with Time Machine, which is fine for all but the worst-case, meteor-hits-the-city situations. Glacier, however, makes a perfect option for dealing with that actual worst-case scenario.