
A shoutout for attic https://attic-backup.org/

Attic is one of the new-generation hash-backup tools (like obnam, zbackup, Vembu Hive, etc.). It provides encrypted incremental-forever backups (unlike duplicity, duplicati, rsnapshot, rdiff-backup, Ahsay, etc.) with no server-side processing and a convenient CLI, and it does let you prune old backups.

All other common tools seem to fail on one of the following points

- Incremental forever (bandwidth is expensive in a lot of countries)

- Untrusted remote storage (so I can hook it up to a dodgy lowendbox VPS)

- Optional: No server-side processing needed (so I can hook it up to S3 or Dropbox)

If your backup model is based on the old original + diff(original, v1) + diff(v1, v2) ... approach, then you're going to have a slow time restoring. rdiff-backup gets this right by reversing the incremental chain. However, as soon as you need to consolidate incremental images, you lose the possibility of encrypting the data (since encrypt(diff()) is useless from a diff perspective).

But with a hash-based backup system? All restore points take constant time to restore.

Duplicity, Duplicati 1.x, and Ahsay 5 don't support incremental-forever. Ahsay 6 supports incremental-forever at the expense of requiring trust in the server (server-side decrypt to consolidate images). Duplicati 2 attempted to move to a hash-based system but they chose to use fixed block offsets rather than checksum-based offsets, so the incremental detection is inefficient after an insert point.

IMO Attic gets everything right. There are patches for Windows support on their GitHub. I wrote a Munin plugin for it.
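
For anyone who hasn't tried it, a minimal sketch of the typical flow - init, create, prune - assuming an SSH-reachable host and made-up repository/host names (the attic quickstart has the real details):

    attic init --encryption=passphrase backup@remote.example.com:main.attic
    attic create backup@remote.example.com:main.attic::$(date +%F) /home /etc
    attic prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 \
        backup@remote.example.com:main.attic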

Disclaimer: I work in the SMB backup industry.




Sorry, but "Untrusted remote storage" and "No server-side processing" are exactly the opposite of what I need.

If the original box is ever compromised, I don't want the attacker to gain any access to the backup. If you use a dumb storage like S3 as your backup server, you need to store your keys on the original box, and anyone who gains control of the original box can destroy your S3 bucket as well. Ditto for any SSH-based backup scheme that requires keys to be stored on the original box. A compromised box could also lie about checksums, silently corrupting your backups.

Backups should be pulled from the backup box, not pushed from the original box. Pushing backups is only acceptable for consumer devices, and even then, only because we don't have a reliable way to pull data from them (due to frequently changing IP addresses, NAT, etc).

The backup box needs to be even more trustworthy than the original box, not the other way around. I'm willing to live with a significant amount of overhead, both in storage and in bandwidth, in order not to violate this principle.

The backup box, of course, could push encrypted data to untrusted storage, such as S3. But only after it has pulled from the original box. In both cases, the connection is initiated from the backup box, not the other way around. The backup box never accepts any incoming connection.

Does Attic support this kind of use case? The documentation doesn't seem to have anything to say about backing up remote files to local repositories. I don't see any reason why it wouldn't be supported (since rsync supports it), but "nominally supported" is different from "optimized for that use case", and I suspect that many of the latest generation of backup tools are optimized for the opposite use case.


I really try to restrain myself when a backup article pops up on HN, but there are two things you raise here that I'd like to address ... first:

"Ditto for any SSH-based backup scheme that requires keys to be stored on the original box. A compromised box could also lie about checksums, silently corrupting your backups."

This is a good thought - you should indeed be thinking about an attacker compromising your system and then using the SSH keys they find and wiping out the offsite backup. All they need to do is look into cron and find the jobs that point to the servers that ...

So how do we[1] solve this? All of our accounts have ZFS snapshots enabled by default. You may not be aware of it, but ZFS snapshots are immutable. Completely. Even root can't write or delete in a snapshot. The snapshot has to be deliberately destroyed with ZFS commands run by root - which, of course, the attacker would not have access to. It's a nice safety net - even if your current copy is wiped out, you have your ZFS snapshots in place.[2]
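
To illustrate, with a made-up dataset name (the .zfs/snapshot directory is the read-only view ZFS exposes):

    zfs snapshot tank/home@nightly
    rm /tank/home/.zfs/snapshot/nightly/somefile   # fails: read-only file system
    zfs destroy tank/home@nightly                  # the only way to remove it, and it requires root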

"Backups should be pulled from the backup box, not pushed from the original box."

This was the tipping point - I had to comment. Since day one, we have, free of charge, set up "pull jobs" for any customer that asks for it. Works just like you'd like it to on whatever schedule they can cram into a cron format. It's a value add we've always been happy to provide.

[1] You know who we are.

[2] Yes, if you don't notice for 7 days and 4 weeks that the attacker has wiped you out, at that point your snapshots will all rotate into nothingness as well. Nothing's perfect.


Is it possible to be notified if a certain percent of the backup is changed? Something that would let me tell if something like 50% of the bytes or 50% of the files are different between snapshots? Just a simple 'zfs diff | wc -l | mail' in cron?
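
Something like the following, perhaps - a rough, untested sketch with a made-up dataset name and mail address:

    #!/bin/sh
    # Count paths changed between the two most recent snapshots and mail the total.
    DATASET=tank/backups
    OLD=$(zfs list -H -t snapshot -o name -s creation | grep "^$DATASET@" | tail -2 | head -1)
    NEW=$(zfs list -H -t snapshot -o name -s creation | grep "^$DATASET@" | tail -1)
    zfs diff "$OLD" "$NEW" | wc -l | mail -s "paths changed since last snapshot" admin@example.com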


By pulling backups, you're giving the backup box full control over your computer, meaning that yes it must be more trustworthy. Push backups can indeed allow the initiator to wreck the remote state.

But in the modern commercial market, who are you going to trust as your backup box provider? A USA company subject to NSLs? Run your own in a rack somewhere? Having an untrusted server greatly decreases cost and increases the chance that you can actually produce a successful backup infrastructure at all.

It is possible to do it safely.

Since you say you're "willing to live with a significant amount of overhead", I would suggest a two-tier push/pull configuration. Desktop pushes to site A; then site B pulls from site A. This also increases redundancy and spreads out the attack surface.

Append-only is another good solution - I don't believe attic formally supports this today, but it should be as simple as patching `attic serve` to ignore delete requests. Good first patch.

(Also if you really trust your backup server, then you don't need encryption anyway and can just run rdiff-backup over ssh.)


> By pulling backups, you're giving the backup box full control over your computer

Not really. On my production boxes, I usually set up a "backup" account and give it read-only access to paths that need to be backed up. Nothing fancy, just standard POSIX filesystem permissions. The backup box uses this account to ssh in, so it can only read what it needs to read, and it can never write anything to the production box. I wouldn't call that "full control".

> Desktop pushes to site A; then site B pulls from site A.

What you described is similar to my own two-tier configuration, except I pull first and then push to untrusted storage like S3 (using encryption, of course). The first step uses rsync over ssh. The second step is just tar/gzip/gpg at the moment, but if I want deduplication I can easily switch to something like tarsnap.
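
In concrete terms, something like this - host names, paths, key ID and bucket are all placeholders:

    # Step 1: the backup box pulls over ssh as the read-only "backup" account.
    rsync -a backup@prod.example.com:/srv/data/ /backups/prod/current/

    # Step 2: encrypt locally and push to untrusted storage.
    tar -cz -C /backups/prod current \
        | gpg --encrypt --recipient backups@example.com \
        > /backups/prod/prod-$(date +%F).tar.gz.gpg
    aws s3 cp /backups/prod/prod-$(date +%F).tar.gz.gpg s3://example-backup-bucket/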


I guess it depends on your security model. With one single pull backup,

- if your backup box is on another network then it can be coerced into malicious reads (leaking private information, trade secrets, your competitive advantage etc).

- if it's on the same network then it's subject to your same failure patterns.

Push backup has some disadvantages, but there's a lot of peace-of-mind in never (intentionally) granting additional users access to the unencrypted data.

Two-tier is one approach. There's another comment in this thread about snapshotting filesystems (ZFS, or I suppose LVM snapshots might be easier), which would be another method of addressing concerns about the client tampering with the backed-up data.


You can have a self-controlled intermediary machine that pulls backups, encrypts them, and then pushes them to the untrusted cloud.

When I had no resources for this (e.g. as a low-income student), I had a server at my mom's place that did this for me. A low-cost, offsite, trustworthy backup server for personal use.


"If you use a dumb storage like S3 as your backup server, you need to store your keys on the original box"

I believe this isn't strictly necessary if you use asymmetric cryptography (e.g. curve25519). For a file, generate a temporary key pair, use it and the backup's public key to encrypt the file, then throw out the private key and send the encrypted file + public key to the server.

Apple uses this technique to move files to the "Accessible while unlocked" state without having the key for that state (i.e. while the device is locked).


Just for the record, asymmetric cryptography is not efficient for encrypting content. What you should do is:

- Generate a temporary key

- Symmetrically encrypt with that key

- Encrypt that key with your long-term asymmetric public key, and send the encrypted version along with your backups.

And before you hack together your own version, I'd like to point out that this is exactly what PGP (and really, any crypto scheme that involves asymmetric keys) does. So, basically, just GPG your backups.
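
In practice that means only the public key ever has to live on the box being backed up; a minimal sketch with made-up key IDs and paths:

    # On the box being backed up: import only the public key, then encrypt.
    # gpg generates a random session key, encrypts the data symmetrically with it,
    # and encrypts that session key to the recipient's public key.
    gpg --import backups-public.asc
    tar -cz /srv/data | gpg --encrypt --recipient backups@example.com > data.tar.gz.gpg

    # Restore happens on a machine that actually holds the private key.
    gpg --decrypt data.tar.gz.gpg | tar -xz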


I use duplicity + rsync.net and have wondered about this attack vector. My solution (which admittedly only protects against remote backups being deleted by an attacker, not read) is:

1. Use sub accounts on rsync.net so backups from different parts of the system are isolated from each other.

2. Use a different GPG keypair and passphrase for each host being backed up.

3. Have an isolated machine out on the internet somewhere (that, importantly, isn't referenced by anything in the main system, including documentation / internal wikis, so the attackers don't know it exists) that does a daily copy of the latest and previous full backup plus any current incrementals directly from rsync.net's storage. This way I'm still covered (and can restore relatively quickly) if an attacker gets into the system and deletes the rsync.net hosted backups for lulz.

If you're truly paranoid or need to protect backups going back over months, you could also introduce a final routine that duplicates the data from the ghost machine to Amazon Glacier (and then optionally pay for an HDD to be shipped periodically to your offices).


Your rsync.net account has ZFS snapshots enabled - at the very least, the smallest default is 7 daily snapshots.

The ZFS snapshots are immutable. Completely. Even root can't write or delete in a snapshot. The snapshot has to be deliberately destroyed with ZFS commands run by root - which, of course, the attacker would not have access to.

Also, thanks for your business :)


Ha! Well that simplifies things for me :) Honestly, your service is so solid I can't remember the last time I actually logged in to check something. It really is backup as a utility (you should consider re-branding as "BaaU" lol)


What about append-only remote storage? This is possible (in a kludgy way) in S3: http://stackoverflow.com/questions/10592541/amazon-s3-acl-fo...


Better, but it might still expose more data to the attacker than he would otherwise have access to. For example: the production box only contains data from the last 3 days, but the backup contains data from the last 12 months.

Even stricter access controls (write once, no read) might help with that. Not sure if you can do that with S3 though.


>production box only contains data from the last 3 days, but the backup contains data from the last 12 months

Even if the source can only perform new backups, it's a timing attack with a deduplicating system. The attacker can attempt to back up chosen data to infer properties of the existing backups.

You can remove this only by removing deduplication (or by crippling deduplication to work only on the server side, incurring wasteful network requests).


Which is why tarsnap says about your keys: "STORE THIS FILE SOMEWHERE SAFE! Copy it to a different system, put it onto a USB disk, give it to a friend, print it out (it is printable text) and store it in a bank vault — there are lots of ways to keep it safe, but pick one and do it."


Indeed, that's exactly what I do (with a small script that wraps rsync): the backup server PULLS data from my machine and saves incremental backups (using hardlinks, see --link-dest). In case my machine is compromised, the backup server is still inaccessible.
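
The core of such a script is small; a sketch with made-up host and paths:

    #!/bin/sh
    # Pull a snapshot from the client. Unchanged files are hardlinked against the
    # previous run via --link-dest, so each dated directory looks like a full
    # backup but only changed files consume new space.
    SRC=backup@laptop.example.com:/home/user/
    DEST=/backups/laptop
    TODAY=$(date +%F)

    rsync -a --delete --link-dest="$DEST/latest" "$SRC" "$DEST/$TODAY/"
    ln -sfn "$DEST/$TODAY" "$DEST/latest"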


You could use versioned buckets in S3 and also enable MFA delete.
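
If I remember the aws CLI correctly, enabling that looks roughly like this (bucket name and MFA serial are placeholders):

    aws s3api put-bucket-versioning \
        --bucket example-backup-bucket \
        --versioning-configuration Status=Enabled,MFADelete=Enabled \
        --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"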


From: https://attic-backup.org/quickstart.html#quickstart

"Attic can initialize and access repositories on remote hosts if the host is accessible using SSH."

Fantastic. Will work perfectly here. We[1] are happy to support this, just like we've supported duplicity all of these years. EDIT: it appears obnam also works over plain old SSH. Can't tell about zbackup, however...

As always, email us to discuss the "HN Readers" discount.

[1] rsync.net


If someone is looking at trying attic and wants something a little nicer to configure and run than the very basic shell script from attic's Quick Start[1], check out the wrapper script I wrote to make this a little easier[2]. It's still somewhat "1.0" right now, but it does the basics for me. See the included sample config files for an idea of how the configuration works.[3]

[1] https://attic-backup.org/quickstart.html#automating-backups [2] http://torsion.org/hg/atticmatic/file/tip [3] http://torsion.org/hg/atticmatic/file/tip/sample/config


I made my own wrapper as well: http://code.ivysaur.me/atticinst.html

Although upon reading yours, it looks like they perform somewhat different tasks.


I'm curious: why isn't this part of attic in the first place? I have been working with the bup folks for a while to try to make a similar interface, and I was wondering what your experience with upstream was...


Great question. I do think this should be part of what attic provides out of the box, but I still really wanted to use attic despite the fact that it doesn't include this sort of functionality. I'll try contacting the attic devs and see what they say about it.


Your wrapper is lacking only one critical feature I'd love. I am currently using rsnapshot, and while its big issue is lack of encryption, it is able to run scripts on remote hosts to pull backups from them. This is a big deal to me since I can then script things like MySQL/Postgres backups, etc. on my master server, rather than having to configure each host individually.

It's possible that this is a bad way to run things, since my master server is then a SPoF. I do trust this server more since I monitor it much more closely than I would a dozen random VPSes with a half-dozen different providers.


Push vs. pull for backups is an interesting philosophical issue, and there's some good discussion of it below. For many setups, pull really doesn't make sense. For example, one of the machines I back up is my personal laptop, and I back it up to a completely untrusted VPS. Therefore I want to be able to encrypt locally and push that encrypted data to the remote VPS. Pulling wouldn't work here, because then I'd have to hand the keys to my laptop to the VPS.

The scenario you're describing, however, sounds like the opposite in terms of trust. And in that case pull may make sense. However, it doesn't sound like attic itself natively supports that sort of config. I could envision a sort of hybrid approach where the local machine encrypts to a local attic repository, and then the remote backup server pulls a copy of it. There's nothing stopping you from setting that up, either with attic as-is or with this wrapper script.
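
A rough sketch of that hybrid setup, with made-up paths and hosts:

    # On the laptop: encrypt into a local attic repository
    # (created beforehand with "attic init --encryption=passphrase ...").
    attic create /var/backups/laptop.attic::$(date +%F) /home /etc

    # On the backup server: pull the already-encrypted repository files,
    # e.g. over a read-only ssh account as discussed elsewhere in this thread.
    rsync -a backup@laptop.example.com:/var/backups/laptop.attic/ /backups/laptop.attic/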


I am also a very happy attic user and can attest to its wonderfulness.

Here's a blog post about it:

http://www.stavros.io/posts/holy-grail-backups/


I was wondering how this would compare with bup, so I did a very quick benchmark: http://anarcat.koumbit.org/2014-11-18-bup-vs-attic-silly-ben...

Take it with a grain of salt, but I am surprised to say that attic is at least as fast as bup.


The only point that caught my eye was attic's atime modification, but I realized that in my environment I would have to mount all the backup clients remotely on the backup server anyway, so I can get around this by mounting sshfs with noatime.


Thanks for mentioning attic. Looks like an interesting project which slipped under my radar.


From what I can tell here: https://github.com/c4rlo/attic/commit/f4804c07caac3d145f49fc... ... attic updates the atime when opening files right now. That patch fixes the problem, but I have to wonder how fast and accurate it actually is. bup has all sorts of helpers and tweaks to make it really fast, but also to ensure that it doesn't touch the filesystem while making backups.

I've been amazed to notice that all my files have atimes from July, when I stopped using rsync to make backups and switched to bup. :p


For someone like me who isn't very technically minded (forgive me)...

Could you or someone explain why these fantastic-sounding tools don't get a developed front-end? Or if they do, why am I missing them?

The best solution I've found is ChronoSync.


Because (for the most part) people who aren't server admins don't do backups. It's like flossing; everybody knows you should do it, but nobody actually does it.


For bup, there is something called Kup (http://kde-apps.org/content/show.php/Kup+Backup+System?conte...), but I haven't tested it.

For duplicity, there is a quite good UI in the form of Deja Dup (see http://www.howtogeek.com/108869/how-to-back-up-ubuntu-the-ea...). It's really nice and easy to use, and if I recall correctly it's installed by default on Ubuntu.


You might want to check out Arq from Haystack Software - http://www.haystacksoftware.com/arq/ - bring your own storage, a GUI client but with an open-source client in case they disappear, good support, etc.


Being interested in Linux/Unix, file systems, crypto or whatever other technical topic doesn't necessarily mean you are also interested in developing graphical interfaces. That's an art form and discipline in itself. I suppose the people who make these tools are more interested in the tools' respective technical aspects than in interface or graphical design.

Moreover, many of these backup tools run on servers (or as an automated background process) where a graphical interface is more of a handicap than an asset.

I think that in open-source software, "pretty interface" often sounds like "customer-oriented", which in turn sounds like "getting paid".


As is the case with most 100% volunteer-based free software: lack of time and/or people to help.

There are currently multiple interfaces available to bup, but each one has some quirks. bup's web interface is still very embryonic and would need the magic touch of some designers / integration specialists to make it fun to work with.


How does attic compare to obnam?


I haven't used obnam much, and the feature sets are quite similar; perhaps someone else can chip in here.

My personal choice of attic over obnam was based on this performance comparison[1]

1. http://librelist.com/browser//attic/2014/1/31/attic-vs-obnam...



