Show HN: NFreezer – Encrypted-at-rest remote backup tool (github.com/josephernest)
96 points by josephernest on Nov 28, 2020 | 49 comments


Hi there! Author here.

I created nFreezer initially for my own needs: when doing remote backups (especially to servers we never have physical access to), it's hard to trust the destination server 100%.

Out of curiosity: how do you usually do remote backups of your important files?

--> With the usual solutions, even if you use SSH/SFTP and an encrypted partition on the destination, there is a short window during which the data is unencrypted on the remote server, just before being written to disk and before reaching the encrypted filesystem layer.

Hence this software, nFreezer: the data is never decrypted on the remote server.

How do you deal with this?


borg[1] has become the de facto standard for this use-case.[2]

It can run over SSH with the borg binary on the remote server or it can run in an SFTP mode with nothing installed on the destination.

[1] https://www.borgbackup.org/

[2] https://www.stavros.io/posts/holy-grail-backups/


Now that I'm discovering it, borg does indeed seem pretty good for this use case.

Still, there is at least one thing about nFreezer that can be of interest to some people: it is very simple and does the job in only 249 lines of code. You can read the FULL source code in a few hours and decide whether you trust it or not. See here: https://github.com/josephernest/nfreezer/blob/master/nfreeze...

If I wanted to do the same with the source code of the tool you mentioned, I would have to spend at least a full week (which is normal: that program has 100 times more features).

The key point is: if you're looking for a solution where you don't want to trust a remote server, then you probably don't want to trust the backup tool of a random internet person either, and you probably want to read the program's source code.

So having only < 300 lines of code to read in a single .py can be an advantage.


"But still there is one thing (at least) that can be of interest to some people with nFreezer: it is very simple, it does the job in only 249 lines of code. You can then read the FULL source code in a few hours, to see if you trust it or not. See here: https://github.com/josephernest/nfreezer/blob/master/nfreeze..."

I really appreciate this and find this very interesting.

I would be happy to give you a free dev/test account at rsync.net if that would help you continue this development.


I don't see the key point: the two things (trusting the remote storage and trusting the tool) are rather independent for me. I already trust probably billions of lines of code which handle my data (basically, every program installed on my system). I don't trust them because I checked all of them, but because I know that the same lines are used by many other people, some of whom have actually read parts of them. There is a global trust repository to which every programmer contributes a little and from which every user benefits a little. For this reason, I find a project already used and developed by many people, like borg, more trustworthy. I didn't just take that for granted, of course: I read the documentation (which, in the case of borg, is very clean and extensive), I looked at how it works, I even read part of the source code, and I was satisfied. And all of this has nothing to do with where I put that data in the end.

On the other hand, missing features in the name of small code size are not a great advantage if those missing features make my backups less reliable, fast, or compact. Maybe I would end up backing up less stuff because (say) the simple tool does not handle deduplication or compression as effectively as the complicated one, making it not viable for me. That would be a net loss.

To be clear, I am just giving my perspective. You have, of course, every right to decide what your priorities are and which tool is better for you.


From https://github.com/josephernest/nfreezer/blob/master/nfreeze..., this is how you decrypt files:

    [...]
    with open(f2, 'wb') as f, src_cm.open(chunkid.hex(), 'rb') as g:
        decrypt(g, pwd=encryptionpwd, out=f)
    [...]

   
    def decrypt(f=None, s=None, pwd=None, out=None):
        [...]
        while True:
            block = f.read(BLOCKSIZE)
            if not block:
                break
            out.write(cipher.decrypt(block))
        try:
            cipher.verify(tag)
        except ValueError:
            print('Incorrect key or file corrupted.')
So, basically, you decrypt the whole file and write the result before checking the tag. You're using a block cipher in a streaming fashion, and as has been said before (see https://www.imperialviolet.org/2015/05/16/aeads.html, "AEADs with large plaintexts"), this is dangerous if you don't do it correctly. Your data may be garbage, but it's too late: it has already been written to disk before you know it's bad, it isn't deleted, and you won't know which file it was.

As some HN crypto celebrity said some time ago, if you write "AES" in your code then you're wrong. You MUST use misuse-resistant libraries unless you know exactly what you're doing.

TL;DR: your crypto is broken, use NaCl instead of doing it yourself.
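
For illustration, here is what the misuse-resistant pattern looks like with PyNaCl's SecretBox (a minimal sketch, not a drop-in replacement for nFreezer's format):

    # Minimal sketch with PyNaCl's SecretBox (XSalsa20-Poly1305): decrypt()
    # authenticates BEFORE returning any plaintext, so corrupted or forged
    # data never makes it to disk.
    import nacl.secret
    import nacl.utils
    from nacl.exceptions import CryptoError

    key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
    box = nacl.secret.SecretBox(key)

    token = box.encrypt(b"backup chunk contents")  # random nonce handled for you

    try:
        plaintext = box.decrypt(token)  # all-or-nothing: verify, then decrypt
    except CryptoError:
        print("Incorrect key or corrupted data; nothing was written.")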


    except ValueError:
        print('Incorrect key or file corrupted.')
So if there is a tag problem, it is clearly logged, and you know something is wrong. The good thing is that you can easily edit it to get this instead (passing the filename fn to the function so it can be logged):

    print('Incorrect key or file corrupted, will be deleted:', fn)
    os.remove(fn)
    exit(...)
Real question: is it possible to `.verify(tag)` before having decrypted the whole file? I doubt it is. So one option could be to write the file to a temporary place and, only once the tag is verified, move it to the right place (and delete it if the tag does not verify). Another option would be to do a first pass of decrypt() without writing anything to disk, get the tag, verify it, and then, if it is OK, redo the whole decryption, this time writing to disk. The latter might be a bit extreme, as it halves the performance.
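
Here is a sketch of the first option, assuming pycryptodome (header/nonce/tag handling simplified):

    # Sketch of the temp-file option (pycryptodome assumed; real header /
    # nonce / tag parsing omitted): decrypt into a temp file, verify the
    # tag, and only then atomically move the result into place.
    import os, tempfile
    from Crypto.Cipher import AES

    def decrypt_to(f, key, nonce, tag, destpath, blocksize=16 * 1024 * 1024):
        cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
        fd, tmppath = tempfile.mkstemp(dir=os.path.dirname(destpath) or '.')
        try:
            with os.fdopen(fd, 'wb') as out:
                while True:
                    block = f.read(blocksize)
                    if not block:
                        break
                    out.write(cipher.decrypt(block))
            cipher.verify(tag)             # raises ValueError if key/data is bad
            os.replace(tmppath, destpath)  # atomic, happens only after verify
        except ValueError:
            os.remove(tmppath)             # never leave unverified plaintext
            raise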


> So if there is a tag problem, it is clearly logged, and you know something is wrong [...] The good thing is that you can easily edit it to get this instead (passing the filename fn to the function so it can be logged)

That's the thing: the script in its current version is incorrect, and even doing that won't be a perfect solution. That's why other people are saying that other software, widely used and able to do more than nFreezer, should be looked at before trying to do it your own way.

It's good to not rely on anyone else, but crypto is the one domain where you can't have "good enough" -- it's either correct, or it's not.

> Another option would be to do a first pass of decrypt() without writing anything to disk, get the tag, verify it, and then, if it is OK, redo the whole decryption, this time writing to disk

Yep, that's the way: do the decryption in memory or in /tmp, verify the tag, and only after that put the file where it belongs. I just checked the API of the crypto module, and there's a `decrypt_and_verify` that should do it properly.
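
For data that fits in memory, a self-contained example (assuming pycryptodome):

    # decrypt_and_verify() refuses to return any plaintext if the tag check
    # fails (pycryptodome assumed; key/nonce handling is illustrative).
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    key = get_random_bytes(16)
    enc = AES.new(key, AES.MODE_GCM)
    ciphertext, tag = enc.encrypt_and_digest(b"some backup data")

    dec = AES.new(key, AES.MODE_GCM, nonce=enc.nonce)
    try:
        plaintext = dec.decrypt_and_verify(ciphertext, tag)  # all-or-nothing
    except ValueError:
        plaintext = None  # bad key or corrupted data; nothing leaks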

Of course that's problematic, especially for big files, so what you want to do is chunk the files, encrypt the chunks separately, and store the file as a list of such chunks.

The step after that is to use Content-Defined Chunking, i.e. chunking based on the content of the file. This way, when a big file is modified, only the chunks around the modification change; the rest of the file is chunked exactly the same way. So you don't need to store the full content of each version of the file, just a small-ish diff.
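
A toy sketch of the idea, using a gear-style rolling hash as in FastCDC (the table and mask here are illustrative):

    # Toy content-defined chunking with a gear-style rolling hash. The hash
    # at position i depends only on the last ~32 bytes (older bytes shift
    # out of the 32-bit window), so after an insertion the later boundaries
    # land on the same content and unchanged chunks still dedupe.
    import random

    random.seed(0)
    GEAR = [random.getrandbits(32) for _ in range(256)]  # fixed random table
    MASK = (1 << 13) - 1  # ~8 KiB average chunk size

    def chunk_boundaries(data):
        h, start = 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # old bytes shift out
            if (h & MASK) == 0:                    # content-defined cut point
                yield (start, i + 1)
                start = i + 1
        if start < len(data):
            yield (start, len(data))

Storing each chunk under the hash of its content then gives deduplication (and rename handling) for free.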

That's not a novel system, bup (https://github.com/bup/bup) kinda pioneered it... and as others have advised, restic, borg-backup and tarsnap do exactly that.


> bup (https://github.com/bup/bup) kinda pioneered it... and as others have advised, restic, borg-backup and tarsnap do exactly that.

According to Wikipedia, bup was released in 2010, 3 years after Tarsnap started doing this. (And Tarsnap wasn't the first either.)


To clarify: if you start from a given nonce and key:

    cipher = AES.new(key, AES.MODE_GCM, nonce)
    while True:
        block = f.read(16*1024*1024)
        if not block:
            break
        out.write(cipher.encrypt(block))
you get exactly the same result as if you did it in one pass (given enough RAM to hold the whole file):

    cipher = AES.new(key, AES.MODE_GCM, nonce)
    out.write(cipher.encrypt(f.read()))
Please try it with pycryptodome; you will see that it is.
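
For instance, a self-contained check (assuming pycryptodome):

    # Streaming GCM encryption over blocks produces byte-identical output
    # (and tag) to one-pass encryption with the same key and nonce.
    import io
    from Crypto.Cipher import AES
    from Crypto.Random import get_random_bytes

    key, nonce = get_random_bytes(16), get_random_bytes(16)
    data = get_random_bytes(1_000_000)

    c1 = AES.new(key, AES.MODE_GCM, nonce=nonce)
    onepass = c1.encrypt(data)

    c2 = AES.new(key, AES.MODE_GCM, nonce=nonce)
    f, parts = io.BytesIO(data), []
    while True:
        block = f.read(64 * 1024)
        if not block:
            break
        parts.append(c2.encrypt(block))

    assert b"".join(parts) == onepass
    assert c1.digest() == c2.digest()  # same authentication tag too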

You might find the naming unfortunate; .init(), .update(), etc. might have been better names to emphasize this.

So this shows that, in its current state, the chunking is just a "RAM-efficient" way to encrypt: it writes exactly the same encrypted content as if you had called encrypt(...) in one pass. So as long as the file is under ~2^39 bits, it is fine (see https://csrc.nist.gov/publications/detail/sp/800-38d/final).

___

Then there is another layer of chunking that would be possible, and it would add many benefits: even better deduplication, and avoiding re-encrypting a whole 10 GB file when only a few bytes have changed. This would be an interesting addition; it's on the todo list, and I know some other programs do it, of course.

But to clarify: this "content-chunking" is independent of the "RAM-efficient" chunking I use here.

If you want to continue the discussion (more convenient than here), you're very welcome to post a Github issue.

Thanks for your remarks, I appreciate it.


Very well said. Sometimes simplicity carries immense value and reassurance.


Don't forget restic - it also works great with S3-compatible services!

https://restic.net


Yes, restic would be my pick: easy to deploy and use, well-maintained.

If you want some assurance, CERN are deploying restic:

https://indico.cern.ch/event/862873/contributions/3724442/


Yes, restic is great! No Python, no dependencies (unlike borg), just one Go executable and that's it.


I'll have a look too.

When using it with SFTP, does the restic binary need to be installed on both the local and the remote machine, or just locally?


Also want to plug an old HN favourite - tarsnap.

https://www.tarsnap.com/


> How do you deal with this?

My setup depends only on crappy home-made bash scripts and standard utilities, and never leaks encryption keys or filesystem metadata off the local machine. I rsync my home directory into a LUKS-encrypted filesystem image, split the image file into fixed-size chunks using the standard GNU split utility, and then rsync the encrypted chunks to a directory on a remote server. The remote server periodically syncs the encrypted image chunks to S3 using the usual aws CLI tools, and runs a script to "cat" the chunks back into a remote copy of the whole image and display the overall md5 sum. Another machine off site keeps Amazon honest by syncing the S3 bucket to a local copy, reassembling the encrypted filesystem image, and computing the md5 sum.

An extra benefit is that I can use the filesystem image on the remote server over sshfs as the backing device for a locally loopback-mounted LUKS partition, to access or modify individual files without downloading the whole image or corrupting it, provided no more than one session is open at a time.

Back when I was more paranoid, I used a raw dm-crypt image rather than a LUKS-encrypted image in this setup, so that there would be no LUKS header to give it away and I could plausibly claim that it was one-time-pad encrypted data, for which I could construct a key to yield any plaintext of my choice. (Yes, I know I won't be such a smartass when it comes to a rubber-hose attack.)
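
For illustration, here is the split/reassemble/verify logic rendered in Python (the actual setup uses GNU split, cat and md5sum; the chunk size and naming are made up):

    # Python rendering of the split / reassemble / verify steps described
    # above; sizes and file names are illustrative.
    import hashlib

    CHUNK = 100 * 1024 * 1024  # fixed-size chunks, like `split -b 100M`

    def split_image(image_path, prefix):
        with open(image_path, 'rb') as f:
            i = 0
            while True:
                block = f.read(CHUNK)
                if not block:
                    break
                with open(f'{prefix}.{i:05d}', 'wb') as out:
                    out.write(block)
                i += 1

    def md5_of_reassembled(chunk_paths):
        """Equivalent of `cat chunk.* | md5sum` without writing the image."""
        h = hashlib.md5()
        for p in sorted(chunk_paths):
            with open(p, 'rb') as f:
                for block in iter(lambda: f.read(1 << 20), b''):
                    h.update(block)
        return h.hexdigest()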


Personally, I use zfs on my home server. All my other machines rsync to a backups folder on there, and once a month a scheduled email reminds me to grab my backup usb hard drive and do a zfs send | recv to sync for an offline backup.

If you're using an untrusted server as a zfs send target: as of a few months ago, you can actually do an encrypted send to a remote without unlocking the dataset on the remote, which is pretty cool.


restic, borg, even plain rclone - plenty of backup tools exist with encryption at rest. What's the key selling point of yours?

I use rclone+restic because of the Google Drive (= cheap unlimited storage) integration in rclone, and restic provides snapshotting on top.


Thanks for asking; you're right, there are thousands of backup programs already.

The "key thing" that lead me to code my own is that with nearly all the solutions I tried, data was resent over network when a big file is moved to another dir+renamed, WHEN used in encrypted-at-rest mode.

It's the case for rsync (--fuzzy only helps when the file is renamed but stays in the same dir), duplicity, and even rclone in encrypted mode (see https://forum.rclone.org/t/can-not-use-track-renames-when-sy...: "Can not use --track-renames when syncing to and from Crypt remotes").

So I wanted to solve this problem, and it works. See point #3 of https://github.com/josephernest/nfreezer#features.


> The "key thing" that lead me to code my own is that with nearly all the solutions I tried, data was resent over network when a big file is moved to another dir+renamed, WHEN used in encrypted-at-rest mode.

restic (and I suppose borg, which is similar) solves this problem and goes even further by chunking your files, then hashing and deduping the chunks - chunks with the same hash aren't resent. Great e.g. for backing up VM images or encrypted containers where only a small part of the file changes - only that small part will be resent between snapshots. The chunking algorithm is "content-defined": it can quite efficiently (probabilistically) detect shifted and duplicated chunks across different files.

(Naturally, all this machinery will also handle the simple cases of renamed and duplicated files on your filesystem)


> Out of curiosity: how do you usually do remote backups of your important files?

I keep my important files in a ~50GB VeraCrypt container, then back that up to multiple locations, including a local USB disk, every day or so with a shell script.

Dropbox nuked its copy once, but it worked fine again after a name change (adding a version number).


Do you copy the entire 50GB on every backup, even if you've only changed 1MB?


To an external 1TB HDD, yes. I keep one for the end of each month. VeraCrypt containers sync quite nicely with rsync and Dropbox; I just keep one copy of those.


I use dar [0] to create incremental, compressed, encrypted backup archives, and then upload those to a GCS bucket. GCS is dirt cheap even for terabytes of data, so as an added bonus this gives me a history of all my files (e.g., I still have backup archives from 2013). Both steps are wrapped in a shell script that runs regularly via a cronjob on my local server. Within my network I sync between laptop, PC and server via Syncthing. Works well for me, even though it doesn't handle file renames gracefully.

[0] http://dar.linux.free.fr/


Do you think there is a way to avoid retransferring files that have been renamed/moved with this method?

This is the use case I mentioned: I often move files from one dir to another when working on audio- or video-production projects, and it can be dozens of gigabytes.


Within my home network, Syncthing handles file renames gracefully [0]. For backups to cloud storage, this might require changes to dar, because according to [1] it's unclear how that tool handles renames. Since I only do these off-site backups to cloud storage once a month, I'm okay with it using a bit more bandwidth/storage (as long as it runs overnight and doesn't get in anybody's way).

[0] https://forum.syncthing.net/t/rename-file/11091 [1] https://wiki.archlinux.org/index.php/Synchronization_and_bac...


Nice work. I personally use Arq, which supports SFTP. And like you, I back up to a remote server over which I have no control. Arq is really desktop software, however.


Rclone for encrypted sync (can be virtually mounted) and Duplicacy for encrypted backup. What makes nFreezer different?


nFreezer differs from Rclone and Duplicity on one very important point (for me, because I often move big multimedia projects around): if you move `/path/to/10GB_file` to `/anotherpath/subdir/10GB_file_renamed`, no data is re-transferred over the network, thus saving 10 GB of data transfer.

Indeed, Rclone does not handle renames when in encrypted mode, see https://forum.rclone.org/t/can-not-use-track-renames-when-sy...

Same for duplicity (I tested it).

TL;DR: see 3rd point of https://github.com/josephernest/nfreezer#features
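
Roughly, the idea can be sketched like this (a simplified illustration, not the actual nFreezer code):

    # Simplified sketch of rename detection via content hashes: if a "new"
    # file's hash already exists in the remote index, only metadata changes;
    # no file data is re-sent over the network.
    import hashlib

    def file_hash(path, blocksize=1 << 20):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(blocksize), b''):
                h.update(block)
        return h.hexdigest()

    def plan_upload(local_paths, remote_index):
        """remote_index maps content hash -> already-uploaded blob id."""
        for path in local_paths:
            digest = file_hash(path)
            if digest in remote_index:
                yield ('update-metadata', path, remote_index[digest])  # moved
            else:
                yield ('upload', path, digest)  # genuinely new content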

___

Another interesting feature is that you can read/audit the full source code of nFreezer quickly: it's only 249 source lines of code, as of today.


Duplicacy is a separate backup tool, but has a confusingly similar name to duplicity. I believe its chunking algorithm means large renamed files aren't re-transferred.


And don't forget Duplicati!


Thanks for sharing! Is there a reason why borg is excluded from the comparison?


I will add it.

You will probably be surprised, but I wasn't aware of that one when starting my project! In a way, this was fortunate, because it made for an interesting journey coding this :)


I'm currently writing my own backup tool because of one feature I want that nothing else seems to have. In my opinion, it's not enough to distrust the server where the backups are stored; the client should also not need to keep around a key that can decrypt the backups just to encrypt new ones.

This is quite easy to accomplish with public key cryptography - just use a new, random AES key for each backup job, then encrypt that with a public key you keep around. The private key can be stored offline in cold storage.
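
A minimal sketch of that scheme, assuming PyNaCl (key names and payload are illustrative):

    # Hybrid scheme: a fresh symmetric key per backup, wrapped with the
    # public key; the backup machine never holds anything that can decrypt
    # previously made backups.
    import nacl.public, nacl.secret, nacl.utils

    # Done once; the private key then lives offline in cold storage.
    private_key = nacl.public.PrivateKey.generate()
    public_key = private_key.public_key

    # On the backup client (which only has the public key):
    session_key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
    ciphertext = nacl.secret.SecretBox(session_key).encrypt(b"backup payload")
    wrapped_key = nacl.public.SealedBox(public_key).encrypt(session_key)
    del session_key  # client forgets the key; store ciphertext + wrapped_key

    # At restore time, with the private key fetched from cold storage:
    restored_key = nacl.public.SealedBox(private_key).decrypt(wrapped_key)
    assert nacl.secret.SecretBox(restored_key).decrypt(ciphertext) == b"backup payload"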


But asymmetric crypto does that under the hood (hybrid encryption).

In fact, if you encrypt the same file twice with RSA, you get different ciphertexts.

No one encrypts a file directly with RSA; it's the data encryption key that is encrypted.


rdedup (mentioned in a few other comments) uses asymmetric cryptography[1].

[1]: https://github.com/dpc/rdedup/


Thanks - this seems useful, but currently

> cloud backends are WIP

and I probably don't have good enough Rust skills to help out practically myself.


Cloud backends aren't really needed, IMHO, if you use something like rclone. Native cloud backends can make the transfer order more intelligent, but rclone does deletes last, meaning you won't lose data if the transfer dies midway.


Duplicacy can do that with its RSA option. My private key isn't on the VM doing the backing up.


I use Borg to back up to a local home server, and then Duplicity on the Borg repository to back up to Google Drive (where I happen to have a lot of free space thanks to my university). This way I am protected both from loss or breakage of the local server and from revocation of the remote account (as Google is known to do). Also, backups are quicker over the local network.


That's a nice setup, I like it!


My current solution for backups with local encryption (files encrypted locally before syncing) is a gocryptfs setup. Specifically, I create an overlay with gocryptfs: the overlay exposes the unencrypted filesystem, and what's physically written to disk are the encrypted files.

Then I create another mount of the physical disk and simply sync _that_ mount to multiple remote destinations. Of course, this solution is fairly simple and does not provide features like rename/move tracking. Further, this approach potentially "leaks" some information, such as how many files I have, the approximate size of each file, etc.

This was set up a while ago; will definitely take a look at NFreezer to see if it's time for a refresh of my setup.


Don't forget to mention rdedup (https://github.com/dpc/rdedup), which is an impressive piece of software. The advantage it has over all the other solutions I came across is that it can do UNATTENDED backups without providing the encryption key... On the other hand, it does not have a file scanner, which is a real pity.


I thought there were a lot of options for this already (perhaps even any decent backup tool?).

For example, restic with the SFTP backend. I thought restic encrypts data client-side before the SSH transport. Am I missing something?

Other tools: duplicacy, rclone, borg (this one might need to run on the server), ... Duplicity does asymmetric and symmetric crypto, but no dedup, so if you change a file path it "might" be re-encrypted.


> Am I missing something?

About duplicity and rclone: they don't track file renames/moves in encrypted mode (https://forum.rclone.org/t/can-not-use-track-renames-when-sy...).

I haven't tested the other two (duplicacy, borg), but from other comments it seems the latter has this feature.


So many people don't understand encryption at rest.

To the point where, in my organisation, we're told to do application-level encryption on resources that really don't need it.

Please read what encryption at rest is for here: https://www.hln.com/encrypting-data-at-rest-on-servers-what-...


Remember, encryption isn't a magic method of preventing hackers from getting your data remotely.

A security problem is still a security problem: encryption at rest is about physical security, and encryption in transit is about moving data securely between locations.

Your applications still need to be secure, otherwise encryption won't save you.


How does Backblaze compare on the features chart? https://github.com/josephernest/nfreezer#comparison



