

Encrypted, deduplicated remote backups - StavrosK
http://www.stavros.io/posts/encrypted-offsite-backups/

======
ColinWright
What's wrong with tarsnap? [http://www.tarsnap.com/](http://www.tarsnap.com/)

 _Added in edit: I see that people are saying it's expensive, and I
appreciate that. When I bundle it into my considerations of how much I spend
on other things, what I get, and my level of confidence in the service, it
seems a good deal to me. In the interests of full disclosure, I don't actually
use anything like this, as I have a storage solution provided for me. But when
I did the sums a while ago, I decided that tarsnap would be my choice if I
needed to make one. For people to say it's expensive is a useful additional
data point for me. Thank you._

~~~
ams6110
Expensive? Don't we have patio11 regularly telling Colin (Percival) that he's
not charging nearly enough?

~~~
Wilya
By telling him to charge more, patio11 is more or less telling him to drop the
consumer market and sell to bigger businesses.

It would probably be better for Tarsnap's financial health, but it wouldn't
make it a better product for people like the OP who just want to save pictures
of their dog.

------
hlieberman
What about Duplicity [1]? It seems to hit all the right points - encrypted,
remote, based on differentials.

There's even a front end for it called Deja Dup [2], distributed by the GNOME
team and used in Ubuntu.

[1]: [http://duplicity.nongnu.org/](http://duplicity.nongnu.org/)

[2]:
[https://wiki.gnome.org/Apps/DejaDup](https://wiki.gnome.org/Apps/DejaDup)

------
bshep
Personally I use CrashPlan. It's more expensive than your max price
requirement if you use their cloud services, but I also back up way more data
(300+ GB), mostly family photos and videos. Also, since it's unlimited
storage, I don't really care if I suddenly have a couple of extra gigs to
store after Xmas or whatever. Finally, I can use the same app to back up to my
home server, so basically everything is backed up locally and remotely with
one app, and it's all encrypted/deduplicated.

I used to use Arq and it worked great but then the backups became too big.

EDIT: forgot you wanted open source... oh well... anyway, this works for me.

------
ctz
I'm disappointed there's no comment here from tptacek telling you off for that
decidedly dodgy-looking MAC-then-encrypt, or for the timing side channel in
your MAC verification. He must be slipping :)

~~~
StavrosK
Haha, the MAC-then-encrypt should be fine, since these are stored files, and
how is anyone going to exploit the timing attack when the script runs on
trusted hardware anyway? :P
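
For reference, the timing side channel ctz is teasing about is usually closed
by comparing MACs in constant time. A minimal sketch in Python (not encbup's
actual verification code, just the general idea):

    import hashlib
    import hmac

    def verify_mac(mac_key: bytes, data: bytes, received_mac: bytes) -> bool:
        """Recompute the HMAC and compare in constant time.

        hmac.compare_digest avoids the early-exit behaviour of a plain ==
        on bytes, which is where the timing leak would come from.
        """
        expected = hmac.new(mac_key, data, hashlib.sha512).digest()
        return hmac.compare_digest(expected, received_mac)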

------
conradev
Meeting most of the requirements on that list, I have found Arq[1] to be a
great backup solution on OS X. It stores encrypted, incremental backups on S3
and although the client itself isn't open source, the data format is[2]. The
company also maintains an open source recovery tool[3]. The only requirement
that isn't met is cost, because S3 costs about $60/yr for 50GB.

[1]
[http://www.haystacksoftware.com/arq/index.php](http://www.haystacksoftware.com/arq/index.php)

[2]
[http://www.haystacksoftware.com/arq/s3_data_format.txt](http://www.haystacksoftware.com/arq/s3_data_format.txt)

[3]
[http://sreitshamer.github.io/arq_restore/](http://sreitshamer.github.io/arq_restore/)

~~~
apaprocki
Happy Arq user here, but the cost of S3 is why I have it store in Glacier
instead. It all really depends how (and how often) you plan on accessing the
backups.

------
skrause
If you're using a Linux-based system
[http://liw.fi/obnam/](http://liw.fi/obnam/) might also be a good choice.

~~~
StavrosK
I tried that on 500 KB when I was testing it and it took a few minutes to back
things up. I'm not sure if it's linear, but I wouldn't want to try the full
set if it's that slow...

~~~
joeyh
obnam's speed is mostly determined by file size and latency. With large files
and low latency, it can push several megabytes a second off my laptop.

It's pretty excellent in every other respect.

~~~
StavrosK
Hmm, I'll give it another shot, thanks. Does it only transfer the diffs too,
or does it have to read everything off the remote server?

~~~
liw
[http://liw.fi/obnam/ondisk/](http://liw.fi/obnam/ondisk/) attempts to answer
that in some detail. In summary: Obnam splits file data into chunks and stores
each chunk only once. It looks up chunks using checksums, and does not need to
download the chunk data again (unless you turn on verification mode to protect
against checksum collisions). Obnam does download B-tree nodes, though, but
they should be a fraction of the size of the file content data.
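
To make the chunk-and-checksum idea above concrete, here is a toy sketch of
checksum-based deduplication; the fixed chunk size and the in-memory chunk
store are stand-ins, not Obnam's actual repository format (that's what the
linked ondisk document describes):

    import hashlib

    CHUNK_SIZE = 64 * 1024  # illustrative; real tools often use content-defined chunking

    def backup_file(path, chunk_store):
        """Split a file into chunks and store each chunk only once.

        chunk_store maps checksum -> chunk data and stands in for the remote
        repository. Returns the list of checksums that reassemble the file.
        """
        manifest = []
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in chunk_store:  # only upload chunks we haven't seen
                    chunk_store[digest] = chunk
                manifest.append(digest)
        return manifest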

~~~
StavrosK
Thank you for the summary, it sounds pretty reasonable. I will read the full
document now, hopefully it will elucidate even more.

------
purplerails
Really nice utility. More developers should dip their toes into crypto and
develop applications like this. :)

A comment in the code about why it's OK in your case to use the same key for
MAC and encrypt would be useful. I think you're fine. See here:
[http://security.stackexchange.com/questions/37880/why-cant-i...](http://security.stackexchange.com/questions/37880/why-cant-i-use-the-same-key-for-encryption-and-mac)

I needed to implement deduplication in my system. Since I controlled the
server, I developed a slightly more elaborate system which doesn't have the
limitation of a predictable IV (predictable from the encryption key).

So in my system, I derive two keys from the same passphrase (PBKDF2 with
different salts). I encrypt as usual with unpredictable IVs. When uploading,
the HMAC of the plaintext and the SHA-256 of the ciphertext are both uploaded.

To check for duplication, the client asks if a certain HMAC is already
present. And it's an error (at the server) to upload multiple ciphertexts with
the same HMAC.
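
A rough sketch of that scheme, assuming PBKDF2-derived keys and SHA-256 as
described; the salts and iteration count below are illustrative placeholders,
not the actual parameters:

    import hashlib
    import hmac

    def derive_keys(passphrase: bytes):
        """Derive independent encryption and MAC keys from one passphrase
        (PBKDF2 with different salts, as described above)."""
        # Salts and iteration count are placeholders; a real system stores
        # the salts alongside the data.
        enc_key = hashlib.pbkdf2_hmac("sha256", passphrase, b"salt-enc", 100000)
        mac_key = hashlib.pbkdf2_hmac("sha256", passphrase, b"salt-mac", 100000)
        return enc_key, mac_key

    def dedup_tags(plaintext: bytes, ciphertext: bytes, mac_key: bytes):
        """Compute the two values uploaded with each blob: an HMAC of the
        plaintext (the client asks the server whether this tag already exists)
        and a SHA-256 of the ciphertext (ties the stored bytes to that tag)."""
        plaintext_hmac = hmac.new(mac_key, plaintext, hashlib.sha256).hexdigest()
        ciphertext_hash = hashlib.sha256(ciphertext).hexdigest()
        return plaintext_hmac, ciphertext_hash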

~~~
StavrosK
The vulnerability in that post is for using AES-CBC with AES-CBC-MAC with the
same key. I'm using AES-CBC and HMAC-SHA512, which should be okay. The design
was reviewed by cryptographers and was given a green light, plus I tried to
use as little custom crypto as possible for this exact reason :)
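
For anyone following along, the "separate keys" point looks roughly like the
sketch below; it uses the Python cryptography package and an encrypt-then-MAC
layout purely for illustration, and is not encbup's actual format or ordering:

    import hashlib
    import hmac
    import os

    from cryptography.hazmat.primitives import padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_and_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
        """Encrypt with AES-CBC (enc_key must be 16/24/32 bytes) and
        authenticate with HMAC-SHA512 under an independent key, which is what
        avoids the same-key pitfall from the linked Stack Exchange question."""
        iv = os.urandom(16)
        padder = padding.PKCS7(128).padder()
        padded = padder.update(plaintext) + padder.finalize()
        encryptor = Cipher(algorithms.AES(enc_key), modes.CBC(iv)).encryptor()
        ciphertext = encryptor.update(padded) + encryptor.finalize()
        tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha512).digest()
        return iv + ciphertext + tag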

------
nilved
I feel that the main problem with online backups is simply a matter of
bandwidth. I have in excess of 3 TB of data and even if I only backed up
things I considered _critical_, it would still be so much that it would take a
week to upload and cost me hundreds of dollars of data overages. Such is
Canada.

It's very telling that I'm even considering this, but if there existed a
backup service where I could mail them a hard drive to load my data initially,
and then upload only incremental changes, I would be all over that. Of course,
uploading videos and such would still be out of the question.

~~~
SimonPStevens
"if there existed a backup service where I could mail them a hard drive to
load my data initially"

Crashplan provide this option. I've never used that specific part of their
service personally, but I've been a satisfied customer of their unlimited
family plan for several years.

~~~
brightsize
Right. See the "Seeded backup" section here:
[http://www.crashplan.com/consumer/details.html](http://www.crashplan.com/consumer/details.html)

------
falcolas
On a slightly larger scale, we work with customers to do encrypted logical
database backups using GPG and s3/Glacier.

Works pretty well, and we're able to de-duplicate because we keep local copies
and make use of hardlinking. A bit of logic around s3cmd to look for
hardlinked files before uploading takes care of the remote de-duplication
(which isn't realistic after the files are encrypted).
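
As a rough illustration of the hardlink trick (the actual setup wraps s3cmd;
the paths and names here are made up), the check boils down to comparing inode
numbers against the previous backup:

    import os

    def new_files(current_backup, previous_backup):
        """Yield files in the current backup whose inode does not appear in
        the previous backup -- i.e. files that are not just hardlinks to data
        that has already been uploaded."""
        previous_inodes = set()
        for root, _dirs, names in os.walk(previous_backup):
            for name in names:
                previous_inodes.add(os.stat(os.path.join(root, name)).st_ino)

        for root, _dirs, names in os.walk(current_backup):
            for name in names:
                path = os.path.join(root, name)
                if os.stat(path).st_ino not in previous_inodes:
                    yield path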

Not quite as easy to adapt to non-Linux machines, or as practical for smaller
files, but it's still quite useful.

------
timbowhite
Can encbup be used for automated backups? It seems to be using symmetric
encryption, so the passphrase would need to be entered or passed as an
argument on each run.

Not sure if it meets all your requirements, but duplicity [1] has support for
asymmetric encryption via gpg pub/priv key pairs.

[1] [http://duplicity.nongnu.org/](http://duplicity.nongnu.org/)

EDIT: just noticed the article mentioned duplicity as insufficient.

~~~
StavrosK
You can specify the passphrase on the command line, but it's less secure.
Automated backups are my use case, so that's what it's meant to be used for.

I wanted to also add GPG encryption, but I'd be unable to store the key
locally, so I couldn't find a way to get it for subsequent backups. I'll have
to see how duplicity does it.
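
One common way around this (and, roughly, what asymmetric GPG-based tools do)
is hybrid encryption: each backup uses a fresh symmetric key, which is wrapped
with a public key, so the machine running the backup never needs the
passphrase or the private key. A hedged sketch using the Python cryptography
package, not how encbup or duplicity actually implements it:

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    def encrypt_for_backup(public_key, plaintext: bytes):
        """Hybrid encryption: generate a fresh symmetric key per backup and
        wrap it with the public key. Restores happen wherever the private key
        lives, so unattended backups need no secret at all."""
        data_key = Fernet.generate_key()
        ciphertext = Fernet(data_key).encrypt(plaintext)
        wrapped_key = public_key.encrypt(
            data_key,
            padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                         algorithm=hashes.SHA256(), label=None),
        )
        return wrapped_key, ciphertext

    # Usage sketch: generate the key pair once, keep the private key offline.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    wrapped, blob = encrypt_for_backup(private_key.public_key(), b"backup contents")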

------
ams6110
Another option might be Cyphertite. I have no direct experience with it but I
love the browser made by the same group (Xombrero).

[https://www.cyphertite.com/](https://www.cyphertite.com/)

Not clear to me from a quick overview of their site if they do de-duplication.

~~~
Wilya
Yes, they do deduplication. I'm using Cyphertite for some of my backups. Works
fine, and the pricing is a bit more convenient than Tarsnap for home users.

------
jackalope
_EncFS in reverse mode with rdiff-backup would pretty much be ideal, but EncFS
currently has two bugs that prevent this from working._

Does anyone know what this refers to? It sounds like a winning combination to
me.

~~~
StavrosK
Two things: one is that it doesn't store the encfs.xml file in the reversed
directory, so it doesn't get backed up automatically; the other, more
important one (for my use case), is that there's no way to make multiple
directories appear under one (symlinks are exposed as encrypted/broken
symlinks). This means you can only back up one directory per repo, which I
found too constraining.

------
mariusblaesing
Boldshare matches your feature list exactly. Client code will be open source.
Beta in November: [https://boldshare.com/](https://boldshare.com/)

~~~
parley
> Client code will be open source.

On the link you provided I found the following:

"Parts of the programm relevant for security and networking actions will be
open source."

"Parts of the program" might not do it for everyone (me included). Will anyone
be able to audit the full source code of the client and compile it themselves,
like e.g. tarsnap? Otherwise I fear the usual trust issues arise.

If I missed some information at your linked site which says the entire client
will be open source, then please correct me.

~~~
mariusblaesing
You can still tell exactly what happens. Two separate processes might be used:
one for crypto + networking, which is open source, and one that manages
synchronisation (which is the secret sauce of any cloud-sync service, if it
works really well).

Updated the FAQ:
[https://boldshare.com/?lang=en#faq](https://boldshare.com/?lang=en#faq)

~~~
StavrosK
You won't know if the other process is transmitting your key, though.

~~~
mariusblaesing
Communication between the open and closed source processes runs via IPC. Since
all the IPC functions are declared in the open source part, you can check
exactly what data is exchanged between the processes, so you can see that keys
are not transmitted.

~~~
StavrosK
How can you check that the closed-source process won't ever read the key from
the disk without elaborate contortions?

~~~
mariusblaesing
What should it do with it? It can't send it anywhere: block network access for
that process if you don't trust it.

The key won't be accessible on disk anyway, only in RAM during crypto
operations, and it will be destroyed immediately afterwards.

~~~
StavrosK
You can make it so that it will only store the key in RAM, but then you'd have
to enter the key every time it launches, making automatic backups impossible.

Although, I agree, if you only allow the open source network access, AND can
ensure that the key won't somehow be smuggled in the data the closed source
process sends, you're probably fine.

------
achille
I'm surprised no one here has mentioned Bitcasa. They provide unlimited
storage for $99/year, and everything is encrypted locally before being sent
over the wire.

~~~
chadk
Not open source.

