

Dropbox clone with git, ssh, EncFS and rsync.net - rsync

We've been trying to solve this problem for longer than Dropbox has existed: how can you maintain plaintext locally and ciphertext remotely, easily, with efficient changes-only (a la rsync) sync?

We created a lot of half-working Truecrypt/Filevault/encfs schemes and fiddled with a lot of --partial --progress --whatever switches to rsync, but eventually we just told people to use duplicity[1][2] and call it a day.

Then Mr. Raymii came along and dropped this in our lap:

https://raymii.org/s/articles/Set_up_your_own_truly_secure_encrypted_shared_storage_aka_Dropbox_clone.html

The tl;dr there is:

"This article describes my truly secure, encrypted file synchronization service. It used EncFS and dvcs-autosync which lets me share only the encrypted data and mount that locally to get the plaintext."

... and we couldn't be happier. Finally we (and anyone using rsync.net, or any ssh host with git on it) have an elegant way to sidestep the issues of trust and authority[3] over remote data on systems you don't control.

Enjoy! [4]

[1] http://duplicity.nongnu.org/

[2] ... and it's still a very good solution.

[3] http://www.rsync.net/resources/notices/canary.txt

[4] HN discount is 10c/GB/mo. Just email us.
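
For the curious, a minimal sketch of the general idea follows. The paths, repo name, and remote host are illustrative, not the exact setup from the article (which uses dvcs-autosync to automate the commit/push/pull loop):

      # One-time: create a bare repo on any ssh host with git on it
      ssh USER@usw-s002.rsync.net "git init --bare dropbox.git"

      # Create an EncFS pair: ciphertext in ~/.encrypted, plaintext mounted at ~/private
      mkdir -p ~/.encrypted ~/private
      encfs ~/.encrypted ~/private

      # Put ONLY the ciphertext directory under git
      cd ~/.encrypted
      git init
      git add -A && git commit -m "initial encrypted snapshot"
      git remote add origin USER@usw-s002.rsync.net:dropbox.git
      git push -u origin master

      # Work on files in ~/private; commit and push ~/.encrypted to sync

Only the ciphertext ever leaves your machine; the remote host sees nothing but EncFS blobs in a git repo.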
======
nuclear_eclipse
Does this make any attempt to handle conflicts on files when changed on two
offline hosts? I assume since it's based on Git it will just try to do merges
that way, but how does it handle conflicts that can't be automatically
resolved?

------
prirun
You can do this with HashBackup too (I'm the author). There are sites using HB
to back up terabytes of data. One of the larger sites backs up 25M files
every day.

    
    
      1. Copy ssh public key to rsync.net server:
    
      [jim@mb hbdev]$ scp ~/.ssh/id_rsa.pub XXXX@usw-s002.rsync.net:.ssh/authorized_keys
      Password:
      id_rsa.pub                100%  392     0.4KB/s   00:00
    
    
      2. Create a local HashBackup backup directory:
    
      [jim@mb hbdev]$ hb init -c hb
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Permissions set for owner access only
      Created key file /Users/jim/hbdev/hb/key.conf
      Key file set to read-only
      Setting include/exclude defaults: /Users/jim/hbdev/hb/inex.conf
    
      VERY IMPORTANT: your backup is encrypted and can only be accessed with
      the encryption key, stored in the file:
          /Users/jim/hbdev/hb/key.conf
      You MUST make copies of this file and store them in a secure location,
      separate from your computer and backup data.  If your hard drive fails, 
      you will need this key to restore your files.  If you setup any
      remote destinations in dest.conf, that file should be copied too.
    
      Backup directory initialized
    
    
      3. Create a dest.conf file in the backup directory:
    
      [jim@mb hbdev]$ cat - >hb/dest.conf
      destname rsyncnet
      type rsync
      dir NNNN@usw-s002.rsync.net:
      password XYZZY
    
    
      4. Enable dedup using up to 1GB of memory:
    
      [jim@mb hbdev]$ hb config -c hb dedup-mem 1g
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Current config version: 0
      Showing current config
    
      Set dedup-mem to 1g (was 0)
    
    
      5. Create a 100k test file: ten copies of one 10k random block
    
      [jim@mb hbdev]$ dd if=/dev/urandom of=ran10k bs=10k count=1
      1+0 records in
      1+0 records out
      10240 bytes transferred in 0.001602 secs (6392272 bytes/sec)
    
      [jim@mb hbdev]$ cat ran10k ran10k ran10k ran10k ran10k ran10k ran10k ran10k ran10k ran10k >test100k
    
      [jim@mb hbdev]$ ls -l test100k
      -rw-r--r--  1 jim  staff  102400 Oct 15 14:20 test100k
    
    
      6. Backup the test file:
    
      [jim@mb hbdev]$ hb backup -c hb test100k
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Using destinations in dest.conf
      This is backup version: 0
      Dedup enabled, 0% of current, 0% of max
      /Users/jim/hbdev/test100k
      Writing archive 0.0
      Copied arc.0.0 to rsyncnet (11 KB 2s 5.0 KB/s)
      Copied hb.db.0 to rsyncnet (4.3 KB 2s 2.1 KB/s)
      Copied dest.db to rsyncnet (340 B 2s 162 B/s)
    
      Time: 4.8s
      Wait: 6.5s
      Checked: 5 paths, 125418 bytes, 125 KB
      Saved: 5 paths, 125418 bytes, 125 KB
      Excluded: 0
      Dupbytes: 0
      Compression: 87%, 8.1:1
      Space: 15 KB, 15 KB total
      No errors
    
    
      7. Create a different test file, with junk at the beginning, middle, and end:
    
      [jim@mb hbdev]$ echo abc>junk
    
      [jim@mb hbdev]$ cat junk test100k junk test100k junk >test200k
      [jim@mb hbdev]$ ls -l test200k
      -rw-r--r--  1 jim  staff  204812 Oct 15 14:24 test200k
    
    
      8. Backup the new file:
    
      [jim@mb hbdev]$ hb backup -c hb test200k
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Using destinations in dest.conf
      This is backup version: 1
      Dedup enabled, 0% of current, 0% of max
      /Users/jim/hbdev/test200k
      Writing archive 1.0
      Copied arc.1.0 to rsyncnet (42 KB 2s 15 KB/s)
      Copied hb.db.1 to rsyncnet (4.6 KB 1s 2.3 KB/s)
      Copied dest.db to rsyncnet (388 B 1s 207 B/s)
    
      Time: 3.0s
      Wait: 6.7s
      Checked: 5 paths, 227898 bytes, 227 KB
      Saved: 5 paths, 227898 bytes, 227 KB
      Excluded: 0
      Dupbytes: 122880, 122 KB, 53%
      Compression: 79%, 4.8:1
      Space: 47 KB, 62 KB total
      No errors
    
    
      9. What do the stats look like?
    
      [jim@mb hbdev]$ hb stats -c hb
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
    
    		     2 completed backups
    		353 KB file bytes checked since initial backup
    		353 KB file bytes saved since initial backup
    		    0s total backup time
    		176 KB average file bytes checked per backup in last 2 backups
    		176 KB average file bytes saved per backup in last 2 backups
    		  100% average changed data percentage per backup in last 2 backups
    		    0s average backup time for last 2 backups
    		353 KB file bytes currently stored
    		     2 archives
    		 53 KB archive space
    		 53 KB active archive bytes - 100%
    		   5:1 industry standard dedup ratio
    		 26 KB average archive space per backup for last 2 backups
    		   6:1 reduction ratio of backed up files for last 2 backups
    		6.2 MB dedup table current size
    		     4 dedup table entries
    		    0% dedup table utilization at current size
    		     2 files
    		     6 paths
    		    12 blocks
    		     6 unique blocks
    		16,386 average variable-block length (bytes)
    
    
      10. How much space are we using on the rsync server?
    
      [jim@mb hbdev]$ ssh XXXX@usw-s002.rsync.net ls -l
      total 309
      -rw-r--r--  1 XXXX  XXXX     33 Oct 15 18:22 DESTID
      -rw-r--r--  1 XXXX  XXXX  11216 Oct 15 18:22 arc.0.0
      -rw-r--r--  1 XXXX  XXXX  42416 Oct 15 18:25 arc.1.0
      -rw-r--r--  1 XXXX  XXXX    388 Oct 15 18:25 dest.db
      -rw-r--r--  1 XXXX  XXXX   4308 Oct 15 18:22 hb.db.0
      -rw-r--r--  1 XXXX  XXXX   4604 Oct 15 18:25 hb.db.1
    
    
      11. By default, HB creates a local backup too.  Delete the
          local arc files (this is the file backup data)
    
      [jim@mb hbdev]$ hb config -c hb cache-size-limit 0
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Current config version: 2
      Showing current config
    
      Set cache-size-limit to 0 (was -1)
    
      [jim@mb hbdev]$ hb backup -c hb /dev/null
      HashBackup build 1070 Copyright 2009-2013 HashBackup, LLC
      Backup directory: /Users/jim/hbdev/hb
      Using destinations in dest.conf
      This is backup version: 2
      Dedup enabled, 0% of current, 0% of max
      /dev/null
      Removing archive 2.0
      Copied hb.db.2 to rsyncnet (4.0 KB 1s 2.0 KB/s)
      Copied dest.db to rsyncnet (404 B 7s 57 B/s)
    
      Time: 2.5s
      Wait: 9.2s
      Checked: 3 paths, 5692 bytes, 5.6 KB
      Saved: 3 paths, 5692 bytes, 5.6 KB
      Excluded: 0
      Dupbytes: 0
      Compression: 29%, 1.4:1
      Space: 4.0 KB, 66 KB total
      No errors
    
      [jim@mb hbdev]$ ls -l hb
      total 30600
      -rw-r--r--  1 jim  staff       65 Oct 15 14:03 HBID
      -rw-r--r--  1 jim  staff       76 Oct 15 14:05 dest.conf
      -rw-r--r--  1 jim  staff     4096 Oct 15 14:31 dest.db
      -rw-r--r--  1 jim  staff  6291716 Oct 15 14:31 hash.db
      -rwxr-xr-x  1 jim  staff  9182624 Aug 10 09:45 hb
      -rw-r--r--  1 jim  staff   139264 Oct 15 14:31 hb.db
      -rw-r--r--  1 jim  staff     4308 Oct 15 14:22 hb.db.0
      -rw-r--r--  1 jim  staff     4604 Oct 15 14:25 hb.db.1
      -rw-r--r--  1 jim  staff     4012 Oct 15 14:31 hb.db.2
      -rw-r--r--  1 jim  staff        6 Oct 15 14:31 hb.lock
      -rw-r--r--  1 jim  staff      412 Oct 15 14:31 hb.sig
      -rw-r--r--  1 jim  staff      511 Oct 15 14:03 inex.conf
      -r--------  1 jim  staff      333 Oct 15 14:03 key.conf
    
    
      12. Mount the backup as a FUSE filesystem:
    
      [jim@mb hbdev]$ hb mount -c hb mnt >mount.log 2>&1 &
      [1] 62772
    
    
      13. What's in the mounted filesystem?
    
      [jim@mb hbdev]$ ls -l mnt
      total 8
      drwx------  1 jim  staff  1 Oct 15 14:22 2013-10-15-1422-r0
      drwx------  1 jim  staff  1 Oct 15 14:25 2013-10-15-1425-r1
      drwx------  1 jim  staff  1 Oct 15 14:31 2013-10-15-1431-r2
      drwx------  1 jim  staff  1 Oct 15 14:31 latest
      [jim@mb hbdev]$ find mnt/latest
      mnt/latest
      mnt/latest/Users
      mnt/latest/Users/jim
      mnt/latest/Users/jim/hbdev
      mnt/latest/Users/jim/hbdev/test100k
      mnt/latest/Users/jim/hbdev/test200k
      mnt/latest/dev
      mnt/latest/dev/null
    
    
      14. All file attributes are correct in the mounted filesystem:
    
      [jim@mb hbdev]$ ls -l mnt/latest/Users/jim/hbdev
      total 602
      -rw-r--r--  1 jim  staff  102400 Oct 15 14:20 test100k
      -rw-r--r--  1 jim  staff  204812 Oct 15 14:24 test200k
    
      [jim@mb hbdev]$ ls -l test*k
      -rw-r--r--  1 jim  staff  102400 Oct 15 14:20 test100k
      -rw-r--r--  1 jim  staff  204812 Oct 15 14:24 test200k
    
    
      15. Test the backup: are the remote and local files equal?
    
      [jim@mb hbdev]$ time cmp test100k mnt/latest/Users/jim/hbdev/test100k
      real    0m2.464s
      user    0m0.001s
      sys     0m0.005s
    
      [jim@mb hbdev]$ time cmp test200k mnt/latest/Users/jim/hbdev/test200k
      real    0m2.271s
      user    0m0.001s
      sys     0m0.004s
    
      [jim@mb hbdev]$ time cmp ran10k mnt/latest/Users/jim/hbdev/test100k
      cmp: EOF on ran10k
      real    0m0.005s
      user    0m0.001s
      sys     0m0.003s
    
      (ran10k and remote test100k are equal for 1st 10k, which is correct)
    

Beta site is: [http://www.hashbackup.com](http://www.hashbackup.com)

There is a security doc on the site that has been reviewed by Bruce Schneier.

~~~
edwintorok
If someone is looking for an alternative to a closed-source product, I don't
see why another closed-source product would be the answer (according to
Hashbackup's FAQ it won't be open-source). Usually the goal of replacing a
closed-source product is to have something that you can fully control / tweak
/ audit ...

------
kzrdude
Ideally git-annex should be able to cover this in the future.

------
chubot
How much does encryption defeat the rsync algorithm? I don't see anything in
the article that addresses that.

That is, say you change one line in a 1 MB text file. If you rsync the
plaintext, it will transfer a handful of bytes. If you encrypt and then rsync
the ciphertext, then presumably a lot more bytes will be scrambled and you
will have a bigger diff to transfer. I guess it depends on the block size.

~~~
rsync
It depends.

If the resulting file is truly random vs. the last time you mounted it, then
you (of course) need to retransmit everything.

However, imagine if every time you closed your TC volume or unmounted
FileVault it had to rewrite an entirely new multi-GB file. Of course that
would take forever and would be difficult to use.

So most encrypted-image software organizes the image internally such that
newly written data only affects portions of the file. And then, in theory,
rsync with options like --partial --inplace --whatever will transfer it
efficiently.

BUT, our experience is that this seldom works properly. The amount of
internally changed data varies dramatically from one use to the next, and
often has very little to do with the amount of actual data you changed. We
just never got it to work well, consistently.
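
A rough way to measure it yourself (a sketch; the container filename and
destination are placeholders): sync the container once, change one small file
inside it, then let rsync's --stats output tell you how much ciphertext
actually moved.

      # Initial transfer of the encrypted container
      rsync -a --partial --inplace container.img USER@usw-s002.rsync.net:backups/

      # ... mount the image, edit one small file, unmount ...

      # Re-sync with --stats and compare "Literal data" (bytes actually sent)
      # against "Matched data" (bytes reused from the existing remote copy)
      rsync -a --partial --inplace --stats container.img USER@usw-s002.rsync.net:backups/

If the image format localizes writes, "Matched data" dominates; if the
ciphertext is effectively rewritten, "Literal data" does.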

~~~
nsomaru
Does this mean you guys found that a large portion of the encrypted file had
to be retransmitted as compared with the number of bits actually changed?

How does one consistently store encrypted data using rsync.net as a backend,
whilst maintaining the benefits of the rsync protocol?

This seems to rule out the benefits of using rsync w/ encrypted files.

Is there something I'm missing?

------
JimmaDaRustla
YES!! This is exactly what I've been looking to do!! Going to replicate this
on a RamNode host where I get 50GB for $1.38/month.

~~~
lcedp
I see prices there starting from $24/year.

~~~
JimmaDaRustla
Sorry, forgot to mention the coupon code from serverbear.com: SB31

31% off for life: $24 x 0.69 / 12 = $1.38/month

Enjoy!

Edit: Not to mention the other uses I get out of my VPS - voice chat server,
VPN tunneling for my Roku (in Canada), game server, etc. Of course, if you are
storing important information, I wouldn't suggest loading up your VPS with
services that increase the risk of it being hacked.

~~~
lcedp
Thank you!

------
rsync
... someone on irc asked about encfs for windows, which appears to be here:

[http://members.ferrara.linux.it/freddy77/encfs.html](http://members.ferrara.linux.it/freddy77/encfs.html)

We have never used this and cannot vouch for it in any way, but there it is
...

------
cypherpunks01
Could Tahoe-LAFS be used for this purpose?

~~~
rsync
A few comments about that...

To be a Tahoe-LAFS "target" you do need to be running their code on the server
side ... and it's Python.

We make a point of keeping our environment as simple and sanitized as
possible, which implies having no interpreters, so at this time you can't use
rsync.net as a Tahoe-LAFS target.

BUT! We've always been very excited about Tahoe-LAFS and are well acquainted
with Zooko and his team, etc., and so we are experimenting with a frozen[1]
implementation of it that we can place into our environment as a binary
executable[2].

The two solutions aren't really that related, as Tahoe-LAFS (sort of) implies
that rsync.net would be just one of many (perhaps ten) remote containers out
there, whereas this solution is targeted to just one remote host... but since
you asked ...

[1] [http://cx-freeze.sourceforge.net/](http://cx-freeze.sourceforge.net/)

[2] We already do this with rdiff-backup, which is how we are able to support
that ...
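
For anyone unfamiliar with freezing: it bundles the script, the Python
interpreter, and the libraries into a standalone directory, so nothing needs
to be installed on the host. A hypothetical invocation (the tahoe.py entry
point is a placeholder):

      # Bundle script + interpreter + libraries into dist/; the result runs
      # as a plain executable, no Python required on the host
      cxfreeze tahoe.py --target-dir dist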

~~~
cdjk
That's exciting, and something I might be interested in. My question and
concern, however, is that Tahoe-LAFS seems better suited for distributed
storage among unreliable nodes. I consider rsync.net to be fairly durable
storage, especially since the way I use it is as backup and not the only
storage location for a file. I have the same question about using S3 as a
tahoe backend, which is another thing I've seen.

Of course, you could just use Tahoe-lafs to store everything on one or two
nodes when they're reliable and durable, but then why not just use gpg or
encfs, which don't require custom clients or gateway/introducer nodes?
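
For comparison, the gpg route really is about as simple as it gets (a sketch;
the filenames and host are placeholders):

      # Symmetric encryption, then ship only the ciphertext
      gpg --symmetric --cipher-algo AES256 -o docs.tar.gpg docs.tar
      rsync -a docs.tar.gpg USER@usw-s002.rsync.net:backups/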

------
lcedp
> IMPORTANT! Make sure to remove the .encfs files from the secure folder
> before syncing. IF THESE FILES ARE IN THE SYNCED FOLDER, YOUR FILES ARE MUCH
> MORE EASIER TO BE CRACKED.*

What?! No, don't do it! If you lose your .encfs6.xml file, you've pretty much
lost everything: you can't decrypt your files without it. Of course you don't
have to sync it, and you can back it up by other means, but you can't just
remove it.
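
One way to keep it out of the shared copy without losing it (a sketch; paths
and hosts are placeholders):

      # Sync the ciphertext, but leave the EncFS config out of the shared folder
      rsync -a --exclude '.encfs6.xml' ~/.encrypted/ USER@usw-s002.rsync.net:dropbox/

      # Back the config up separately, over a private channel
      scp ~/.encrypted/.encfs6.xml USER@otherhost:private/encfs6.xml.bak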

------
grigio
I think ZFS is more appropriate than git in this context:

[http://grigio.org/diy_your_dropbox_backup_zfs_sshfs_and_rsyn...](http://grigio.org/diy_your_dropbox_backup_zfs_sshfs_and_rsync)

~~~
4dl0v3-p34c3
High Five. And with PEFS, there's no comparison:
[https://github.com/glk/pefs](https://github.com/glk/pefs)

List of cryptographic file systems:
[http://en.wikipedia.org/wiki/List_of_cryptographic_file_syst...](http://en.wikipedia.org/wiki/List_of_cryptographic_file_systems)

I just use SSH or stunnel w/ a SQL database with procedures built in.

------
tmikaeld
Umm... git-annex does this without taking half an hour to do, plus it takes
very little effort to set up in multiple locations.

------
jlund
git-annex is explicitly designed to avoid putting file contents under version
control because git doesn't handle large files (like videos) very well. It
seems like this system is running headlong into that problem. I'm probably
going to be sticking with duplicity for now.

------
mdewinter
@prirun is HashBackup open source? Or closed source and filled with backdoors?

