Ransomware-resistant backups with duplicity and AWS S3 (franzoni.eu)
87 points by alanfranz on Jan 27, 2022 | 59 comments



> you'll need to make sure that your master access to AWS S3 is never compromised

Your master access to S3 should never go onto your servers. Create IAM credentials authorized only to PUT objects into S3.
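
For illustration, a minimal sketch of such a restricted credential with boto3 (the user, policy, and bucket names are placeholders, not from the article):

    # Attach a PUT-only policy to a dedicated backup user (names are placeholders).
    import json
    import boto3

    iam = boto3.client("iam")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-backups/*",
        }],
    }
    iam.put_user_policy(
        UserName="backup-writer",
        PolicyName="put-only-backups",
        PolicyDocument=json.dumps(policy),
    )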

> For the purpose we have, Governance mode is OK

Maybe not (?), since Governance mode still allows deletion of previous versions by anyone holding the s3:BypassGovernanceRetention permission. One careless mistake handling your access key/secret and it's bye-bye backups.
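
To make the risk concrete, here is a sketch with boto3 of what a compromised principal holding that permission could do (bucket, key, and version ID are placeholders):

    # Governance-mode retention can be bypassed by anyone allowed to use
    # s3:BypassGovernanceRetention; this permanently removes a locked version.
    import boto3

    s3 = boto3.client("s3")
    s3.delete_object(
        Bucket="my-backups",
        Key="backup.tar.gpg",
        VersionId="example-version-id",   # placeholder
        BypassGovernanceRetention=True,
    )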

End note: this is still not enough. An attacker could compromise your backup script and wait for 40 days before locking you out of your data. When you try to recover a backup, you'll notice you have none.

Perhaps most attackers won't have the patience and will just forget about you, but who knows?

A naive protection from that would be to store at least one version of the backups forever. But we're still not covered, since the attacker could compromise that particular version and boom.

I can't think of a comprehensive, fully closed-loop solution right now...


"I can't think of a comprehensive, fully closed-loop solution right now..."

The ZFS snapshots that you may configure on an rsync.net account are immutable.

There are no credentials that can be presented to destroy them or alter their rotation other than our account manager whose changes always pass by a set of human eyes. Which is to say, no 'zfs' commands are ever automated.

So the short answer is you simply point borg backup to rsync.net and configure some days/weeks/months of snapshots.

The long answer - if you're interested:

https://twitter.com/rsyncnet/status/1470669611200770048

... skip to 52:20 for "how to destroy an rsync.net account":

"... Another thing that lights up big and red on our screen is ... someone's got a big schedule of snapshots ... and then they change it to zero ... you've got seven days and four weeks and six months but we want to change that to zero days of snapshots. We see those things ... and we reach out to people."


Re: HN rsync.net discounts. Is that basically the Borg-specific product? Or is there some discount on the normal rsync.net service? The Borg product misses one important thing for me, and that is sub-accounts. But the price difference between them is pretty large. I don't need the hand-holding service; I already use the Borg service you provide, but I'd definitely prefer having the sub-accounts, and the ZFS snapshots might be useful as well.


email info@rsync.net and someone (possibly me) will get you squared away ...


Thanks, sent. Have a great evening!


I’m a bit confused. https://www.rsync.net/products/borg.html says that, with the Borg pricing, there are no free snapshots. Are there non-free snapshots?

The Borg pricing is quite competitive with S3 Glacier, although Deep Archive seems to have you beat. (Of course, Deep Archive loses badly if you actually read the data…)


Yes - any account can have snapshots. The discounted ones just don’t have free snapshots. They count against your quota.

Full price accounts have 7 days that don’t count against your quota.


Can we stop it with the thinly veiled advertising?


I don't think they're trying to veil anything with the account they're posting from. And this does present a seemingly pretty reasonable "comprehensive, fully closed-loop solution"


Well I'm a user of theirs and right now, at this moment, I have the same problem the article describes. John's comment made me aware of the option, which I will set up right away.


It’s more or less open advertisement. The user constantly advertises their product on HN.

It’s a tough market, and HN might include users or sysadmin types who might be interested in ZFS backups.

Whether advertisement violates the site rules, I don’t know.


There is nothing thin or veiled here, afaics.


Author here.

> Your master access to S3 should never go onto your servers. Create IAM credentials authorized only to PUT objects into S3.

Isn't this precisely the approach I take in the article? But, you need to make sure the master access to S3 is not compromised as well. Probably obvious, but certainly not wrong.

> End note: this is still not enough. An attacker could compromise your backup script and wait for 40 days before locking you out of your data. When you try to recover a backup, you'll notice you have none.

I agree that's a reasonable attack scenario. Just as for any backup strategy, you'd need a monitoring strategy (i.e. try restoring a backup at least every X days), and that was left as an exercise at the end of the post.

> I can't think of a comprehensive, fully closed-loop solution right now...

More complex and yet more robust solutions can leverage two AWS accounts and set up cross-account replication between S3 buckets. The replicated bucket never gets its files deleted (they're just marked as deleted in the source).
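
As a rough sketch of that setup with boto3 (bucket names, role ARN, and account ID are placeholders; the source bucket needs versioning enabled, and the destination account still has to grant the replication role access via its own bucket policy):

    # Replicate the backup bucket into a bucket owned by a second AWS account.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket="my-backups",
        ReplicationConfiguration={
            "Role": "arn:aws:iam::111111111111:role/backup-replication",
            "Rules": [{
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},
                # Don't replicate delete markers: deletions in the source
                # don't remove anything in the replica.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::my-backups-replica",
                    "Account": "222222222222",
                },
            }],
        },
    )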

But I think this can get too complex for my scenario.


> An attacker could compromise your backup script and wait for 40 days before locking you out of your data. When you try to recover a backup, you'll notice you have none.

There are many reasons why recovery from backup could fail. Verifying recovery from backup is critical, so the solution here is to not delete your oldest backup until you have verified recovery from a newer version.


You're right that master access to S3 should never be on your servers. Permission to delete object versions from your backup bucket should be reserved for the highest level of trust in your organization. These permissions shouldn't be accessible via a key sitting around on a computer, but should only be accessible to a person logging in with MFA.

With such access only available to a person with MFA, it should be pretty hard to accidentally leak the secret (MFA-authenticated sessions have tokens that are time-limited). Even if those AWS permissions are compromised, yes, you could lose your backups, but I hope you don't use S3 as your sole backup storage.
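
One way to express that, sketched as a bucket policy via boto3 (the bucket name is a placeholder, and note that this call replaces any existing bucket policy):

    # Deny version deletion to any request not authenticated with MFA.
    import json
    import boto3

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyVersionDeleteWithoutMFA",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObjectVersion",
            "Resource": "arn:aws:s3:::my-backups/*",
            # BoolIfExists also denies long-term access keys, which carry no MFA context.
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        }],
    }
    boto3.client("s3").put_bucket_policy(Bucket="my-backups", Policy=json.dumps(policy))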

I keep backups on both a local backup server and also in S3. If my AWS credentials are compromised, there should be no access to the local server. If someone accesses my local server, the access it has won't allow deleting S3 backups.

Since my backups are encrypted, the attacker would have to also have the key to the backups to read the content.


There is an immutable backup solution to S3: https://www.retrospect.com/en/support/kb/immutable_backups_o...


I would never use this after being burned badly. Duplicity hits a scalability brick wall on large file volumes, consuming ridiculous amounts of CPU and RAM on the host machine and leading to irrecoverable failure where it can’t back up or restore anything. Fortunately I caught this before we had a DR scenario.

I am using rdiff-backup over SSH to replace it now. This has been reliable so far but recovery times are extensive.


"I would never use this after being burned badly. Duplicity hits a scalability brick wall on large file volumes which consumes ridiculous amounts of CPU and RAM on the host machine and leads to irrecoverable failure where it can’t backup or restore anything."

I believe you are correct and I believe that in my private correspondence with the duplicity maintainer (we sometimes sponsor duplicity development[1]) he sort of conceded that borg backup[2] is a better solution.

If the cloud platform you point your borg backups to can configure immutable snapshots (that is, they create, rotate, and destroy them) then a good solution would be using borg backup over SSH and configuring some of those snapshots[3].

[1] https://www.rsync.net/resources/notices/2007cb.html

[2] https://www.stavros.io/posts/holy-grail-backups/

[3] https://twitter.com/rsyncnet/status/1453044746213990405


Assuming ZFS is reasonable for the use case: incremental ZFS snapshots are likely more efficient, since they work at the block level rather than the file level.


Depends on the recovery cost but yes I agree they are probably a better solution.


Duplicity also can't be run natively on Windows, so I was stuck having to migrate all of my backups to another program (Duplicati) once I changed operating systems.


Do you have an idea of how "large" it is? I used it for a previous server of mine (about 250-300GB); I did daily backups and multiple restores over the years, and they would always succeed, even though, admittedly, I didn't care about speed.


I found Duplicity to be untenable for a volume of about 2TB on a machine with relatively low resources (4GB of RAM). It would consistently fail due to OOM before completing the job. This is an unfortunately common issue with backup tools, which it seems get surprisingly little testing on multi-TB tasks. I'd guesstimate maybe half of open source backup tools I've tried either seriously struggle or reliably fail to complete an 8TB job I have on another machine, with similar memory constraints.

So far Rustic has been my best bet. It seems to complete very large jobs with static and relatively low memory consumption. Enumeration takes about 30 minutes and a full backup nearly a month on the 8TB job (limited bandwidth available), but it reliably completes. Rustic also resumes interrupted jobs (e.g. due to reboot) in around 10-15 minutes on the 8TB volume and with what seems like minimal rework. I'm sure there are other tools that handle this as well, but I've definitely gotten frustrated with finding them. I wish more backup tools gave you some kind of assurance in the marketing materials that they've been validated on, say, 10TB.


> I wish more backup tools gave you some kind of assurance in the marketing materials that they've been validated on, say, 10TB.

I'd wish that too. Also, I'd like to pay for a backup tool, since it's so critical, so that I can get some sort of support, but I have found issues with many of them.

I must say that, with my small population (N=1) for testing, it was hard for me to settle on a backup system. I tried duplicity, then I tried borg, then I used duplicati. I had considered attic and restic as well, but right now I don't remember why I didn't choose them.

I experienced issues with most of them and I reverted to duplicity. With borg, I had a persistent issue where backups would stop and say something like "backup destination is newer than source" (I forget the exact message, but it happened multiple times across versions). Duplicati seemed a very large and complex codebase, and it periodically stopped working, mostly because of dotnet runtime issues AFAICR.


Object lock helps tick some regulatory boxes, but at least in the described scenario, enabling it isn't really going to protect you against many additional threat models.

Simply using object versioning and an IAM role with no delete permission is sufficient.

The biggest risk with using AWS for backups is being compromised via a high-privilege IAM role.

If a role has s3:* permission, it can just remove the locks and delete the file.


Indeed. It'd be best to have another copy of the backup somewhere other than S3, or in S3 under a different AWS account.

I worked for a company that had all their databases in AWS RDS and all their database backups as RDS backups stored in the same AWS account. A compromise of a single high-privilege account could cause that company to no longer have ANY data. I warned the higher-ups repeatedly that they needed more than one backup location. They didn't seem to care too much.


Yes. Cross account replication is the way to go for better protection.


At a minimum, you should enable MFA for the IAM user. Generally, I'd suggest against using IAM users entirely. Ideally you would use an IAM Role via federation or SSO. For my personal accounts I use AWS SSO even though I'm just one person, since it enables me to do all my work through role-based authentication and is still protected by MFA on top.


Uhm, I don't understand how this would work. We need the backups to run unattended. The IAM user I configure has got no console access at all. Do you have any pointer or example?


That's true. You could still do IAM role authentication via web identity, which would be much safer than a user and could still be automated.
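
A rough sketch of what that could look like with boto3/STS (the role ARN and token path are placeholders; it assumes an OIDC identity provider is already registered in IAM and trusted by the role):

    # Exchange a web identity token for short-lived credentials, then use them for S3.
    import boto3

    sts = boto3.client("sts")
    with open("/var/run/backup/oidc-token") as f:
        token = f.read()
    creds = sts.assume_role_with_web_identity(
        RoleArn="arn:aws:iam::111111111111:role/backup-writer",
        RoleSessionName="nightly-backup",
        WebIdentityToken=token,
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )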


As I understand it, the problem enterprises hit isn't the lack of backups or that backups are compromised. The problem seems to be that the restore process has stopped being scalable - restore of all enterprise data can take weeks or longer, and so paying the hacker is cheaper than doing the restore even though the restore would work.


Some companies I've worked for have "disaster recovery" drills periodically. Those should involve restoring backups. From that you should have a timeline of how long it takes to perform a restore.

If it takes weeks to perform a restore then that should be a major red flag and be addressed.

Of course, if a company has the kind of security that allows them to be hit by ransomware, then they probably aren't diligent about their disaster recovery scenarios.


The obvious problem here is that most don't perform DR tests across all systems simultaneously - they do it by major component, and usually have a working system. Ransomware can take down ALL of that, so you are left with bare-metal recovery and no infra to build off of.


That depends; with modern enterprise backup software it can be faster to recover that way than to rely on the (sometimes slow) decryption software. This was the case with Colonial Pipeline, for example: https://www.theregister.com/2021/05/13/colonial_pipeline_ran...


AWS S3 object versioning makes it pretty easy to allow a server to add backup data without the ability to permanently modify or delete data that existed previously.

For my backups I use restic and sync the restic repository data to S3. Even if the source data is corrupted, I can always roll back to the set of S3 object versions from a particular time.

The downside to using S3 object versioning is that I haven't found any good tools to work with object versions across multiple objects.

For example, I need to prune old backups from my restic repository. To do that I have to delete object versions that are no longer current (i.e. not the latest version of an object). To accomplish this I wrote a script using Boto3 (the AWS SDK for Python) that deletes non-current object versions and any delete-marker versions.

The code was pretty straightforward, but I wish there was a tool that made it easier to do that.
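
A rough sketch of how such a script might look with Boto3 (bucket and prefix are placeholders; a real script would only run this after verifying newer backups):

    # Permanently remove non-current object versions and delete markers.
    import boto3

    BUCKET = "my-restic-backups"
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_object_versions")

    for page in paginator.paginate(Bucket=BUCKET, Prefix="restic/"):
        doomed = []
        # Object versions that are no longer the latest one.
        for v in page.get("Versions", []):
            if not v["IsLatest"]:
                doomed.append({"Key": v["Key"], "VersionId": v["VersionId"]})
        # Delete markers; once a key's other versions are gone, removing its
        # delete marker leaves nothing behind.
        for m in page.get("DeleteMarkers", []):
            doomed.append({"Key": m["Key"], "VersionId": m["VersionId"]})
        # delete_objects accepts at most 1000 entries per call.
        for i in range(0, len(doomed), 1000):
            s3.delete_objects(Bucket=BUCKET, Delete={"Objects": doomed[i:i + 1000]})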


Not 100% sure, but can't you set up rules for that stuff in S3 itself? Like delete everything older than X date but never delete the current version?

I have a feeling I set something like this up but it's been a while since I did it.


You are correct; S3 lifecycle policies support deleting non-current object versions once they're X days old, as well as retaining only the newest X non-current versions: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecy...
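
A hedged sketch of such a rule with boto3 (bucket name and numbers are placeholders; note this replaces the bucket's existing lifecycle configuration):

    # Expire non-current versions after 30 days while keeping the 3 newest ones,
    # and clean up delete markers that have no versions left behind them.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-backups",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-noncurrent",
                "Status": "Enabled",
                "Filter": {},
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": 30,
                    "NewerNoncurrentVersions": 3,
                },
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }],
        },
    )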



You can, but automatically doing it would defeat the whole purpose of having versions I could roll back to.

I need to confirm that my recent backups are valid before deleting old backups. Otherwise I might as well not have used S3 versioning.


Incredible that folks have to jump through so many hoops.

At this point, AWS should offer versioned S3 backups accessible only offline or through customer support, enabled with a click of a button.


AWS Backup does exist.


I knew about AWS Backup but I previously never saw an option to enable it for S3. It looks like it is in limited (1TB) preview in the Oregon region, and doesn't support backing up buckets encrypted with client-provided keys.

That said, AWS Backup is the answer to ransomware woes and it can't GA soon enough (3P solutions like Rubrik, Druva notwithstanding)

https://docs.aws.amazon.com/aws-backup/latest/devguide/s3-ba...


Short plug: At https://BorgBase.com, we also do append-only backups and every repo is isolated. The majority of repos are append-only at this point.


Thanks for your plug :-) Your company seems interesting, but could you give us some background? How many people are working there? What are your assurances?

I cited rsync.net in my article, and I leverage AWS S3 as an example, because they're well-known players. I would be more hesitant to use a small and unknown (hence potentially unreliable) backup provider.


BorgBase has been around since 2018 and we also maintain Vorta, a popular desktop backup client (also included in Debian since Buster). In addition we sponsor Borg and Borgmatic to keep the ecosystem healthy. Most providers don't go as far. So I'm quite proud of that.

BorgBase itself doesn't have staff, but runs under Peakford.com, which offers a variety of hosting and consulting services. Myself, I mainly take care of our open source offerings and community.


Having point-in-time backups is a good start, but I can see ransomware adapting to slowly corrupt your data in a way that is reversible but may take months to detect. Your backups going back months will then contain this corruption. To detect this, application-level tripwires like checksums may be needed. Finally, there is always reputation damage and the threat of exposing the attack and your data to the public via blackmail. Just because you have backups does not make you safe.


Application "tripwires" are just another obstacle for an attacker to overcome. If audits aren't external to the system being audited, they're just as vulnerable to manipulation.

A customer of mine in the financial sector sent their backups to a third party for independent verification quarterly. The third party restored the data into a clean system and reconciled it against the production system.

That would be the kind of auditing that would be more apt to detect the "low and slow" attack.


Author here. I'd like to thank the HN folks for some great feedback (as usual).

I'll update the article in the next few hours to add some caveats.


I use duplicity (via the DejaDup GUI) and I really like it. However, there is one thing I think most people need to be aware of:

Duplicity does NOT encrypt the names of your files.

Some might not care about this much, but I really don't like my encrypted backups containing so much metadata.

EDIT: FWIW I don't use the AWS backup options. I have separate offsite backup connections to cloud services as well as NAS.


Author here.

> Duplicity does NOT encrypt the names of your files.

I, personally, do not care, but you're right. For the purpose of this article I could even leave encryption off, though. "I don't trust the backup destination" wouldn't be part of my threat model here.


Any half-ass backup software encrypts the file names in the repository, and it seems duplicity does too.


It actually does not as you can see all of the file names before you enter your encryption password.


That’s most likely local cache (it takes file names from your local disk)!!

The file names and content are encrypted.

No decent backup software will leave file names in plaintext.


Seems easier to just do incremental snapshots unless you're on bare metal. Many hypervisors support them and they're built into EC2/EBS

If you want to limit data, you can create additional drives and mount them at the appropriate location (or change your application config to save to the auxiliary drives)
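
For the EC2 case, a minimal sketch with boto3 (the volume ID is a placeholder):

    # Snapshot an EBS volume; snapshots are incremental at the block level.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup snapshot",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "purpose", "Value": "backup"}],
        }],
    )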


One good alternative is to upload to an S3 bucket with object lock enabled. This way you can store immutable objects.

You can make them immutable for everyone if you wish and the only way to delete them is to close the AWS account.

I cannot think of a safer place for a backup than a bucket with object lock.
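
A sketch of such an upload with boto3, assuming the bucket was created with object lock enabled (names, file, and retention period are placeholders):

    # Compliance-mode retention can't be shortened or bypassed by anyone,
    # root included, until it expires.
    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client("s3")
    with open("backup.tar.gpg", "rb") as f:
        s3.put_object(
            Bucket="my-locked-backups",
            Key="2022-01-27/backup.tar.gpg",
            Body=f,
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
        )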


The article is about object lock


One good alternative to reading the articles is just skimming the comments.


Yet another alternative is to misread the title.


Can you still configure object expiration when they are locked? I.e., automatically delete objects after X days.



