
Duplicity + S3: Easy, cheap, encrypted, automated full-disk backups - tortilla
http://blog.phusion.nl/2013/11/11/duplicity-s3-easy-cheap-encrypted-automated-full-disk-backups-for-your-servers/
======
aroch
I use Arq[1] on my Macs and it does the same thing, but it handles all the
"complicated" bits that I'd rather have dealt with programmatically. I have 6
Macs, about 2TB of backups in Glacier and a ~20GB "system image" backup in
hot storage on S3.

[1]
[http://www.haystacksoftware.com/arq/](http://www.haystacksoftware.com/arq/)

~~~
makmanalp
How do you like glacier? It seems it's fairly cheap until you want your data
back. Any insights?

~~~
aidos
Personally, I've factored that into my cost of recovery. It's going to cost
me about £120 to restore my data over the course of a week. It depends on
your use case, but for me, in the event that Glacier is the last place I can
get my data back from, £120 is a small price to pay. I used Backblaze before
and it was about the same to get them to post out an HDD (which was an
awesome service on the one occasion I had to use it).

~~~
toomuchtodo
It's too bad Amazon doesn't offer a discounted rate to restore from Glacier if
you pay upfront.

~~~
zz1
Sorry to chase you between threads, but yes, I'd like to get in touch for the
Getty files! Care to share contact info?

~~~
toomuchtodo
m8r-fhf8r1@mailinator.com (apologies, would prefer not to share my public
email address here)

I'll reply back from my personal email account.

------
aryastark
This article is about backups for servers. I think some people are missing
that part.

For my personal home machines, I was recently researching backup systems. I
decided against cloud backups and opted for the simple solution of a USB 3.0
drive. It's cheap, and if I ever need to get at my backup quickly, I just pick
it up and take it. For critical documents, I can't imagine most people have
more than a few megabytes anyway. That stuff can be archived and encrypted
periodically and sent to cloud email (Gmail, Yahoo) far more cheaply and
easily than fussing with S3.

For the actual backups: I'm using Linux, so I went with rsync. See here for
roughly how it's done:
[http://www.mikerubel.org/computers/rsync_snapshots/](http://www.mikerubel.org/computers/rsync_snapshots/)

The USB drive is encrypted; I then tell rsync to use hard links and it does
the incremental backup. The nice thing about hard links is that you can get at
any file you want, in any backup you want, _while_ preserving space. Files in
new backups that already exist in _prior_ backups are simply hard links. Then
you just "rm" older backups when they're no longer needed. It's an incredibly
elegant solution on Linux, especially if you combine it with LVM snapshots.
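
If you're curious what that looks like in practice, here's a rough sketch of
the --link-dest approach (paths and names are made up; see the mikerubel.org
page above for the full treatment):

    #!/bin/sh
    # Rough sketch of hard-link snapshots with rsync; paths are examples.
    # Unchanged files are hard-linked against the previous snapshot, so each
    # snapshot looks complete but only new/changed files take extra space.
    SRC=/home
    DEST=/mnt/usb-backup
    NEW="$DEST/backup-$(date +%Y-%m-%d)"
    LAST=$(ls -1d "$DEST"/backup-* 2>/dev/null | tail -n 1)

    rsync -a --delete ${LAST:+--link-dest="$LAST"} "$SRC/" "$NEW/"

    # Pruning old snapshots is just: rm -rf "$DEST/backup-<old-date>"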

~~~
Silhouette
May I ask which model of USB drive you're using, if it works reliably with
Linux?

I was reading about these just last weekend, and was surprised to hear that
several of the current big-name-brand USB hard drives seem to come with
"clever" software, not necessarily easily removable, which means they no
longer behave like a vanilla disk. The result is that operating systems
without supplied drivers, such as Linux, won't necessarily be able to access
the drives properly.

I have no way to tell how much of this is real and how much is down to
misunderstanding, but it was a recurring theme and applied to multiple major
vendors, so as appealing as this simple approach is, I don't really feel like
trusting my critical back-ups to any of these drives until I've got some
facts.

~~~
voltagex_
Seagate and WD drives have worked fine for me, both on ARM and x86.

With WD drives, the system detects a wd_ses device as well as the drive
itself, and I think that gives me LED control (great for turning off the
blinkenlights at night).

My main problem with backup drives at the moment is getting hdparm/power
management settings to stick so the drives will spin down after a period of
inactivity. I've had more luck with USB than eSATA in this regard.

I'd recommend AGAINST any eSATA/USB "docks" - the USB-to-SATA bridges seem to
have a very high failure rate
([https://bugzilla.redhat.com/show_bug.cgi?id=895085](https://bugzilla.redhat.com/show_bug.cgi?id=895085)
is the first one I saw on Google).
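
For reference, the spin-down setting in question is roughly the following;
the device name and timeout are just examples, and whether the enclosure's
bridge chip honours it is exactly the problem:

    # Spin the drive down after ~10 minutes idle (-S 120 = 120 * 5 seconds).
    hdparm -S 120 /dev/sdb

    # Some USB bridges ignore -S entirely; re-applying it from a udev rule,
    # or using hd-idle instead, are common workarounds.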

------
zzzeek
I use duplicity extensively and I've written my own frontend around it.
Duplicity itself works very well, though there are several areas in which it
could improve.

One is that it could play more nicely within the Python ecosystem; it doesn't
install with the usual setup.py mechanism, and it's very resistant to any kind
of API/in-process usage. Its various features are locked up tight within its
command-line interface, so writing frontends pretty much means piping to a
subprocess.

Another is that it needs a feature known as "synthetic backup". If you read
up on how people use duplicity, a common theme is the need to "run a full
backup" periodically. This is because you don't want months of backups as one
endless stream of incremental files; it becomes unwieldy and difficult to
restore from. In my case, I'm backing up files from my laptop, and a full
backup takes hours. I'd rather have a process that can directly squash a
series of incremental backup files into a full backup and write them back out
to S3. I'm actually doing this now, though in a less direct way: my frontend
includes a synthetic backup feature which I run on my Synology box. It
restores from the incremental backups into a temp directory, then pushes
everything back up as a new, full backup. My laptop doesn't need to stay on;
it only needs to push small incremental files, and I get a nice full backup of
everything nightly.
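
Roughly, the workaround looks like this (the bucket name and paths are
placeholders, and my actual frontend does more bookkeeping around it):

    # Restore the latest state from the incremental chain to a scratch dir,
    # then push it back up as a brand-new full backup. Names are examples.
    SCRATCH=/volume1/tmp/duplicity-squash
    duplicity restore s3+http://my-bucket/laptop "$SCRATCH"
    duplicity full "$SCRATCH" s3+http://my-bucket/laptop

    # Older chains can then be dropped:
    duplicity remove-all-but-n-full 1 --force s3+http://my-bucket/laptop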

------
jewel
The issue I have with duplicity is that every Nth backup has to be a full
backup, so a large backup (such as my photo collection) becomes impossible.

~~~
orthecreedence
Is this a technical limitation? I'm not seeing anything in the docs that
forces you to do a full backup after N backups (maybe I missed it).

~~~
Spooky23
You can do many, many incrementals, but the restoration process, especially
from a remote source, is incredibly slow.

~~~
zobzu
It's not that bad. I regularly do it from a SheevaPlug over a 10 Mbit DSL
link for a few dozen gigs. It's as fast as the DSL will go.

------
sahaskatta
I've been using Backblaze. I find it both cost-effective and time-saving.
Here's why:

Price: It costs only $3.96 a month if you pay for two years up front ($4.17 a
month if you pay for only one year up front). They offer unlimited storage of
everything on internal and external HDDs, plus 30 days of file revisions. I
think I have close to 1TB of data backed up with their service. (Using
Duplicity with S3 would cost me $90 a MONTH in comparison.)

Time: I had always used Windows File History to back up onto an external USB
HDD, but I wanted an off-site backup. I initially started backing up to S3
myself and even dabbled with Glacier. However, maintaining this setup proved
to be quite messy, and dealing with encryption and decryption was painful as
well. Having a simple, foolproof interface was worth it.

~~~
alex_doom
I've been using them for the past two years as a remote backup; locally, Time
Machine. I don't even have to think about it, which is the biggest appeal for
me. I come home and everything gets backed up around midnight.

------
jd
Quick correction: it's not true that rdiff-backup has to be installed on the
remote server. Rdiff-backup works brilliantly with a dumb target for storage
and that's how we use it in production.

Our servers periodically run rdiff-backup of all important data to a /backup
partition on the same server. Then this /backup partition -- with the backup
version history and metadata -- is rsynced to various dumb backup storage
locations.

We've tried many backup solutions, and rdiff-backup is by far the fastest and
most robust backup program we know.
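
The two steps are nothing more exotic than this (paths and hostname are
invented for illustration):

    # Step 1: rdiff-backup to a local partition; version history and metadata
    # end up under /backup/www/rdiff-backup-data.
    rdiff-backup /var/www /backup/www
    rdiff-backup --remove-older-than 1M --force /backup/www

    # Step 2: mirror the whole /backup partition to dumb storage.
    rsync -a --delete /backup/ offsite:/srv/backups/$(hostname)/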

~~~
ars
> is rsynced to various dumb backup storage locations.

Rsync requires a program on the other end. True, it's more commonly installed
than rdiff-backup, but it's not a dumb backup target.

~~~
andreasvc
The difference is that rdiff-backup requires the exact same version on both
ends.

------
dotemacs
What's the benefit of Duplicity over Tarsnap
[http://www.tarsnap.com/](http://www.tarsnap.com/) ? Thanks

~~~
aquark
I played a bit with both and they both worked well.

Tarsnap: easier to set up and use.

Duplicity: fully open source, and I think the restore speed was significantly
faster... but I've not properly benchmarked it apples-to-apples.

If you are backing up a lot of frequently changing files (e.g. database
backups) then tarsnap's bandwidth charges (probably caused by the underlying
EC2 charges) can swamp the storage cost. Running duplicity against rsync.net
avoids this.
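
For example, pointing the same kind of duplicity run at rsync.net over SFTP
(host and path are placeholders):

    # Storage-only target, so no AWS-style bandwidth charges on top.
    duplicity /var/backups/db sftp://user@yourhost.rsync.net/db-backups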

~~~
gingerlime
restore speed was significantly _faster_ with duplicity? I haven't used
tarsnap yet, but one of my main gripes with duplicity is the rather slow
restore speed. Is tarsnap even slower?

~~~
tedunangst
Depends. Tarsnap restore times should be constant regardless of which backup
you pick, since it's just downloading the relevant blocks. Duplicity does full
backup + diffs, so if your data changes quite a bit you can end up downloading
a lot of data that doesn't land on disk. (I believe that's right, but I
haven't verified it.)

------
banachtarski
I know it's not the point of the article, but "Duplicity" is a horrible name.
It has negative connotations of dishonesty and is only in the same family as
the word "duplication."

------
bkmartin
How would one do something similarly easy and cost-effective on the Windows
platform? We've looked at various services but they are always so expensive.
I would love a piece of software that we can place on each server and let it
do its thing... of course there is always SQL Server to contend with, and
again, the solutions that exist are expensive.

~~~
uptown
Check out Crashplan. You can use their software for incremental backups to the
destination of your choice at no cost, and it's multi-platform provided you
install Java.

~~~
pwenzel
Another Crashplan fan here. You can also install the Crashplan client on
headless Linux-based NAS devices, so long as they can run Java (such as
ReadyNAS).

------
CyberThijs
Has anyone tried OVH's new Glacier clone [0] yet? At €0.008/GB it seems very
reasonably priced, and there are no crazy retrieval pricing structures
(although retrieval has a 4-hour lead time, just like Glacier).

[0] [http://www.ovh.co.uk/cloud/archive/](http://www.ovh.co.uk/cloud/archive/)

------
oliwarner
Cheap? Pish. A lot of server hosts have a local backup solution.

For a 2GB Linode, we're talking about backing up 96GB of storage. More than
that, three backups are made (daily, weekly and 2-weekly). 288GB of active
backup storage. Handled for you. For $10 a month.

Amazon is over twice that before you even consider the hassle of setting it up
and restoring backups.

~~~
harshreality
Until Linode breaks or is compromised, and your server and backups both end
up corrupted or deleted.

I don't trust hosting my server and backups with the same company. I back up
to S3 with S3 credentials that forbid removal; files are expired out of the
duplicity-related S3 buckets using native Amazon functionality, not duplicity.
It does mean there are some stale incrementals, but that's a small price to
pay.
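
A sketch of that setup with the AWS CLI; the bucket name, retention period and
exact permissions here are illustrative rather than my real configuration:

    # The backup user's IAM policy allows s3:PutObject/GetObject/ListBucket
    # but grants no s3:DeleteObject, so a compromised server can't wipe the
    # backups it wrote.

    # Expiry is handled by S3 itself via a lifecycle rule, e.g.:
    aws s3api put-bucket-lifecycle-configuration \
      --bucket my-duplicity-bucket \
      --lifecycle-configuration '{
        "Rules": [{
          "ID": "expire-old-backups",
          "Status": "Enabled",
          "Filter": {"Prefix": ""},
          "Expiration": {"Days": 180}
        }]
      }'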

------
nodata
Duplicity is nice, but it doesn't do dedupe. It's also slow.

I currently prefer [http://liw.fi/obnam/](http://liw.fi/obnam/), which does
dedupe, and every backup is a snapshot. The only downside is that encrypted
backups are broken if you use the latest GPG.

~~~
2bluesc
I use obnam too. Way better than duplicity, with generation snapshots.

------
kudu
If you're interested in this, you should look into DreamObjects
([http://www.dreamhost.com/cloud/dreamobjects/](http://www.dreamhost.com/cloud/dreamobjects/)).
It's a redundant cloud storage service which is compatible with the S3 and
Swift protocols at less than half the price ($0.05/GB). It can be used with
duplicity
([http://www.dreamhost.com/dreamscape/2013/02/11/backing-up-to-dreamobjects-with-duplicity/](http://www.dreamhost.com/dreamscape/2013/02/11/backing-up-to-dreamobjects-with-duplicity/)).

------
zobzu
duplicity + anything = easy, cheap, encrypted, automated
full-disk/incremental/etc. backups.

duplicity's simple yet pretty awesome.

------
fwenzel
S3 is actually not that cheap. I have well over 100 GB of data to back up
(and that's not even a full backup by any means, just the stuff I don't want
to lose and can't get back by redownloading and reinstalling things), which
racks up a significant bill on S3. CrashPlan isn't cheap either, but it's the
cheapest of the bunch, so I use them instead. Actually getting the data back
after a disaster would take a while, but I could certainly restore the most
important things first, then backfill the rest as I go.

------
adamonduty
I'm hoping that Space Monkey
([http://www.spacemonkey.com/](http://www.spacemonkey.com/)) helps solve some
of this. It backs up to a local appliance-style box, then syncs with other
Space Monkey boxes in a peer-to-peer manner. Like cloud storage, but cheaper
at $10/month for 1TB.

------
engates
Duplicity also works with Rackspace Cloud Files, as detailed here:
[http://gsusmonzon.blogspot.com/2013/07/backup-with-duplicity-and-rackspace.html](http://gsusmonzon.blogspot.com/2013/07/backup-with-duplicity-and-rackspace.html)

------
Spooky23
I couldn't disagree more. Duplicity is awesome; I had it running in a prod
environment that had some intense security requirements (addressed with GPG
cards for keying).

But for system backups, you need something simple and easy. Just get Arq or
something like it.

------
PlaneSploit
Duplicity has consistently failed to restore for me in the form of deja-dup on
Ubuntu.

~~~
tenfingers
Restoring from duplicity takes time proportional to the length of the backup
chain, since archives are stored as forward deltas.

Nothing against forward deltas, but just consider that:

- restore time increases with the increment count
- corruption of a single archive makes all further increments worthless

These are very important things to consider when doing a backup.

Currently, you have to limit the chain length by performing a full backup
every N increments (with N being as short as possible), which defeats the
purpose of efficient increments.
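
In other words, the only knob available today is something along these lines
(paths and interval are examples):

    # Start a new full chain every couple of weeks; everything in between is
    # one long forward-delta chain.
    duplicity --full-if-older-than 2W /home/me s3+http://my-bucket/home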

I have requested the ability to manually specify the ancestor of an increment
(by time or count), so that one could implement a non-linear hierarchy with
multiple levels, like one normally does with tar and timestamps, but the
request was dismissed as "unnecessary given the delta efficiency" (despite the
fact that efficiency is just one variable). Having a three-level backup
(daily, weekly, monthly) would make duplicity much more space efficient,
reduce the number of full backups needed, and shorten the chains to the point
where restores of "last year" become actually _possible_.

I sent several patches to fix duplicity's behavior with files larger than 1GB
(by limiting block counts), which got integrated, but they're still a far cry
from making duplicity work decently as a whole-system backup-and-restore
solution. It's just too slow. And as you said, several bugs have afflicted
duplicity in the past that would make restores fail in many circumstances. I
also debugged my share of issues, which led me to think that very few people
have actually tried to restore from arbitrary increments with duplicity and/or
used it to archive a large system.

Many got fixed, but I won't consider duplicity again until I can control the
incremental ancestor and reduce the chain lengths (and it's silly to think
that "rdiffdir", distributed with duplicity, would allow for that easily).

Nowadays I use rdiff-backup, with a second script to tar and encrypt the
deltas after the backup has completed.

I'm keeping an eye on "bup"
([https://github.com/bup/bup](https://github.com/bup/bup)), but I cannot back
up "forever", so without the ability to purge old versions it's only useful in
a limited set of cases.

~~~
rlpb
> I'm keeping an eye on "bup"
> ([https://github.com/bup/bup](https://github.com/bup/bup)), but I cannot
> back up "forever", so without the ability to purge old versions it's only
> useful in a limited set of cases.

I wrote ddar before I knew about bup. It doesn't have this limitation; you can
arbitrarily remove any archive. However, it does not do encryption, so I
wouldn't recommend using it to store on S3 directly.

~~~
tenfingers
I actually tried again today, and this is still prominent in the limitations:

bup currently has no features that prune away old backups.

Because of the way the packfile system works, backups become "entangled" in
weird ways and it's not actually possible to delete one pack (corresponding
approximately to one backup) without risking screwing up other backups.

------
tunesmith
Just for clarification, doesn't this backup approach only give you your most
recent backup? In other words, is it true that it doesn't do "versions", i.e.
doesn't let you restore from n backups ago?

------
dalore
Isn't silence-unless-failed just a way of doing > /dev/null? Since stderr
isn't redirected, cron will get the errors but everything else goes to null.
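
In other words, a crontab line along these lines (the command and schedule are
just an example):

    # stdout is discarded; stderr is left alone, so cron only mails you when
    # duplicity actually prints errors.
    0 3 * * * duplicity /home s3+http://my-bucket/home > /dev/null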

------
JimmaDaRustla
I'm currently using Duplicity and a Raspberry Pi to push my encrypted files
from a NAS to a VPS on a daily basis. It works out to $1.38/month for 50GB of
storage.

------
bowlofpetunias
The main reason I prefer Duplicity is its portability. I can easily use the
same setup for backups over SSH, rsync, S3, FTP, whatever.
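
For example, only the target URL changes between backends (these are all
illustrative):

    # Same source directory, different backends:
    duplicity /etc scp://user@backuphost/backups/etc
    duplicity /etc rsync://user@backuphost/backups/etc
    duplicity /etc s3+http://my-bucket/etc
    duplicity /etc ftp://user@ftphost/backups/etc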

------
buster
OOOrrrr... QNAP home disk array + deja-dup = automated, really cheap
full-disk backup NOT stored on some US NSA server...

------
kbar13
duplicity + s4[0], because s4 is greater than s3.

[0] [https://leastauthority.com/](https://leastauthority.com/)

------
phusion
Ahh, the bums who think they have a right to my name, despite me having used
it since '00 or so. This still looks like an interesting service.

~~~
toyg
care to elaborate?

~~~
jlgreco
They both call themselves "phusion".

