
Rdedup – backup deduplication with asymmetric encryption (in Rust) - dpc_pw
https://github.com/dpc/rdedup
======
sliken
I've been looking for something like this to allow a collection of clients to
encrypt locally and store centrally with deduplication. Ideally without
trusting the central store with the password/passphrase. I don't quite follow
how Rdedup works though.

I get using bup -> chunks and tracking said chunks by their SHA256. Makes
perfect sense. As does making digests of the collections of SHA256s.

However to dedupe across clients you'll be using the encrypted chunks right?

With libsodium sealed boxes and an ephemeral key, the same plaintext on
different clients will be encrypted into different chunks. I'm probably just
confused. You mention deduplicating backups across multiple systems. Maybe a 2
system example would help?

My main confusion is you say "Thanks to public key cryptography, secure
passphrase is required only when restoring data, while adding and
deduplicating new data does not." But if you don't need the passphrase for
deduplication and the encrypted bits use an ephemeral key how could they
possibly be deduplicated?

~~~
luchs
As I understand it (from reading the README) it's storing SHA(chunk) ->
Enc(chunk) in its database. When adding to the backup, it checks whether
SHA(chunk') is already there. Thus, it doesn't have to look at the encrypted
data. However, it also can't verify that a chunk stores the correct data.
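
A minimal sketch of that lookup, assuming nothing about Rdedup's real on-disk format: `ChunkStore` and `placeholder_encrypt` are hypothetical names, and the "encryption" is a labeled stand-in that only mimics a randomized scheme.

```python
import hashlib
import os
from typing import Callable, Dict


class ChunkStore:
    """Hypothetical store mapping SHA256(plaintext chunk) -> encrypted chunk."""

    def __init__(self, encrypt: Callable[[bytes], bytes]) -> None:
        self.encrypt = encrypt
        self.blobs: Dict[str, bytes] = {}

    def add(self, chunk: bytes) -> bool:
        """Store the chunk; return True if it was new, False if deduplicated."""
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in self.blobs:
            # Already present: nothing is encrypted or written, and the
            # existing ciphertext is never inspected (so it is never verified).
            return False
        self.blobs[digest] = self.encrypt(chunk)
        return True


def placeholder_encrypt(chunk: bytes) -> bytes:
    # NOT real encryption. The random prefix only makes every output distinct,
    # the way a randomized scheme such as a sealed box would.
    return os.urandom(16) + chunk[::-1]


store = ChunkStore(placeholder_encrypt)
first = store.add(b"same chunk")    # new, gets encrypted and stored
second = store.add(b"same chunk")   # duplicate, skipped by plaintext hash
third = store.add(b"other chunk")   # new
```

The dedup decision depends only on the plaintext hash, which is why the ciphertexts being different on every run does not matter here.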

~~~
Terr_
So in a multiuser environment, the "first" client to upload an encrypted chunk
could lie about its plaintext hash. That would poison the well: anybody else
gets a nasty surprise when their "backup" turns out to be corrupted.

~~~
robryk
If your encryption is deterministic, the second client can check with the
server that Hash(Enc(chunk)) is the same on the client and server.

~~~
signa11
> If your encryption is deterministic, the second client can check with the
> server that Hash(Enc(chunk)) is the same on the client and server

but the chunk on the server was encrypted using a different public-key, so how
can hash(pub-key-1(chunk)) == hash(pub-key-2(chunk)) ?

~~~
Natanael_L
Aren't only the decryption keys encrypted to the public keys?

~~~
robryk
They use nacl cryptobox primitive.

This means that you are right. Alas, the decryption key (the symmetric key
used to encrypt this particular message) is derived deterministically from the
private key and nonce. The nonce they use is the hash of the chunk. Thus, the
same chunk will always be encrypted with the same symmetric key.
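
A toy illustration of the idea, not the NaCl construction itself: the nonce is taken from the hash of the chunk, so the same key and chunk always give the same ciphertext. The HMAC-based keystream and the names `det_encrypt` / `det_decrypt` are assumptions made for this sketch and are not meant to be secure.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def det_encrypt(key: bytes, chunk: bytes) -> bytes:
    # The nonce is derived from the chunk itself, so there is no randomness.
    nonce = hashlib.sha256(chunk).digest()[:24]
    body = bytes(a ^ b for a, b in zip(chunk, keystream(key, nonce, len(chunk))))
    return nonce + body


def det_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:24], blob[24:]
    return bytes(a ^ b for a, b in zip(body, keystream(key, nonce, len(body))))


key_a = hashlib.sha256(b"client A key material").digest()
key_b = hashlib.sha256(b"client B key material").digest()
chunk = b"identical chunk contents"

c1 = det_encrypt(key_a, chunk)
c2 = det_encrypt(key_a, chunk)  # same key, same chunk: identical ciphertext
c3 = det_encrypt(key_b, chunk)  # different key: different ciphertext
```

With this scheme `c1 == c2`, so Hash(Enc(chunk)) is comparable between runs, but `c1 != c3`, which is the objection raised above about clients that use different keys.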

------
mappu
Asymmetric is a nice touch (sets you apart from restic and attic/borg)!

I've PR'd this project into the restic/others list.

~~~
dpc_pw
Thanks, merged!

Yes, the main reason I wrote it is asymmetric cryptography.

~~~
brunoqc
> the main reason I wrote it is asymmetric cryptography

Why do you prefer asymmetric cryptography instead of whatever restic and
attic/borg are using?

~~~
jewel
Asymmetric cryptography means that the private key doesn't need to be stored
on the client or the server.

If you have someone's public key, and no other key material whatsoever, you
can encrypt something in a way that only they can read it. Even you can't
decrypt it.

(Because asymmetric cryptography is slow, if your messages are large you'd
typically encrypt the data with a symmetric key, and then encrypt just that
key asymmetrically.)
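
A toy sketch of the hybrid idea, using only the Python standard library. It is an assumption-laden stand-in, not libsodium: the symmetric key is derived from an ephemeral Diffie-Hellman exchange over a plain modular group rather than encrypted directly, and the names `keygen`, `seal` and `open_sealed` are hypothetical. Do not use it for real data.

```python
import hashlib
import secrets
from typing import Tuple

P = 2**255 - 19  # a well-known prime, used here as a toy DH modulus
G = 2            # toy generator


def keygen() -> Tuple[int, int]:
    """Return (private, public) for the toy DH group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with SHA-256(key || counter) blocks."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(recipient_public: int, message: bytes) -> Tuple[int, bytes]:
    # Only the recipient's public key is needed to encrypt.
    eph_private, eph_public = keygen()
    shared = pow(recipient_public, eph_private, P)
    key = hashlib.sha256(shared.to_bytes(32, "big")).digest()
    # eph_private is dropped here, so the sender cannot decrypt afterwards.
    return eph_public, _xor_stream(key, message)


def open_sealed(recipient_private: int, eph_public: int, body: bytes) -> bytes:
    shared = pow(eph_public, recipient_private, P)
    key = hashlib.sha256(shared.to_bytes(32, "big")).digest()
    return _xor_stream(key, body)


recipient_private, recipient_public = keygen()
sealed_1 = seal(recipient_public, b"backup chunk")
sealed_2 = seal(recipient_public, b"backup chunk")
recovered = open_sealed(recipient_private, *sealed_1)
```

Because each call uses a fresh ephemeral key, `sealed_1` and `sealed_2` differ even for the same message, which is exactly why randomized sealed boxes cannot be deduplicated by ciphertext.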

------
basemi
Very nice! But how is this different from Obnam? AFAIK obnam also dedups and
encrypts (gpg).

~~~
X86BSD
Or Borg backup?

Seems like a lot of wheel reinvention is going on in this space.

~~~
rcthompson
It seems like the main difference from attic/borg is the asymmetric
encryption.

------
warmwaffles
So why not just dedup backups and then encrypt the backups instead of
encrypting and backing up?

As I understand it this is doing `file -> encrypt -> dedup -> compress`

~~~
sliken
If you encrypt locally you don't have to trust the backup server with your
privacy. This is doubly an issue when the server is running on some random
cloud somewhere instead of the laptop that you have physical possession of.

The trick is deduplication across multiple clients.

In response to your second comment, that's pretty close: the encrypt step is
deterministic (unlike the usual approach with a unique IV and/or nonce), and compressing
after encryption is useless. If your encrypted blocks are not effectively
random and incompressible you are doing your encryption wrong.
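
A quick way to see the compression point, as a rough sketch: random bytes from `os.urandom` stand in for well-encrypted data (an assumption, since no real cipher is involved), and `zlib` is used only because it ships with Python.

```python
import os
import zlib

# Repetitive plaintext, as backups often contain.
plaintext = b"backup data with lots of repetition. " * 2000

# Stand-in for properly encrypted data: it should look like random bytes.
ciphertext_like = os.urandom(len(plaintext))

plain_ratio = len(zlib.compress(plaintext)) / len(plaintext)
cipher_ratio = len(zlib.compress(ciphertext_like)) / len(ciphertext_like)

print(f"plaintext compresses to {plain_ratio:.1%} of its size")
print(f"ciphertext-like data compresses to {cipher_ratio:.1%} of its size")
```

The plaintext shrinks to a small fraction of its size while the random data does not shrink at all, so any compression has to happen before the encrypt step.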

