
How to Automatically Backup a Linux VPS to a Separate Cloud Storage Service - jakejarvis
https://jarv.is/notes/how-to-backup-linux-server/
======
joshstrange
Forgive me if I'm missing something, but this appears to just back up files, so
it would be fine for source code (which should be in version control and safe
already) and static assets (like user uploads), but it doesn't appear to address
things like DB backups, which I feel are the number one thing lost if you
lose access to your host (followed by user uploads). The problem with DB
backups is that you can't just back up the data directory (like /var/lib/mysql)
unless you've shut down the DB, or you can do a dump (mysqldump), but backing
that up hourly is not a good solution IMHO. I guess you could have a replica
that you shut down at the top of the hour, back up the data directory, then
start back up, but all of this is to say this post is not a silver bullet to
"Automatically backup a Linux VPS".

This is NOT a knock against the author, I just wanted to point out that
"backups" are much more complicated than "copy files elsewhere". For DBs I'd
probably consider running a replica on one or more other clouds. I don't know
the logistics of replication over the internet, but at work we do replication
from our datacenter down to our local servers over a relatively slow
connection, so I assume it's possible to do it from cloud to cloud.

~~~
jakejarvis
Absolutely. Maybe I should have noted that this is more of a guide to make
your existing backup procedures more redundant, which implies that you already
have local "backups" being made of whatever you want to redundantly store in
S3 or B2 or anywhere externally.

In that case, it does become as simple as just copying files elsewhere. (For
example, using the Restic steps in my post to back up a folder of hourly
database dumps, like you mentioned.) Replicating databases (and other methods
made specifically for DBs) is certainly a much, _much_ better route for
mission-critical and/or enterprise data.
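
For instance, something along these lines would push a folder of hourly dumps
to B2 with Restic (the bucket name, paths, and credentials are placeholders):

```
# Placeholders throughout -- adjust the repository, paths, and credentials.
export B2_ACCOUNT_ID="<b2 key id>"
export B2_ACCOUNT_KEY="<b2 application key>"
export RESTIC_PASSWORD="<repository passphrase>"

restic -r b2:my-backup-bucket:/vps backup /var/backups/mysql
```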

Covering every permutation of different types of data to backup would have
made a long post much longer, but I'll add writing a part two to my to-do list
covering rudimentary database backups since that has been brought up here a
few times.

Thanks for the feedback! :)

~~~
joshstrange
Awesome! I really hope I didn't come across as attacking you/your post, I
found it really useful. I just wanted to remind people that the DB wouldn't
really be covered by this (except in the case you mentioned where you are
dumping the data).

I am definitely looking at this through the lens of where I work, where a
mysqldump (or equivalent) could take days to complete in full (the DB is nearing
2TB in size now). For a number of projects a mysqldump might only take seconds
or minutes and would be a perfect candidate for this backup scheme.

~~~
jakejarvis
Not at all! I'm really glad you mentioned it, since I wrote this in the
mindset of small to medium VPSes used for personal projects, and I'll make that
clearer in the intro. Backups (like everything else, unfortunately)
definitely get exponentially more difficult the more successful you become.

------
turrini
Vultr AND Linode.

1) Upload a custom ISO with ZFS ([https://github.com/beren12/zfs-
iso/](https://github.com/beren12/zfs-iso/))

2) Create a new VPS without OS and boot to your uploaded ISO.

3) Create a ZFS root pool and bootstrap Debian or another distribution of your choice.

4) Enable all cool features: compression, encryption, etc.

5) rsync your zfs snapshots from Vultr to Linode and vice-versa.

This is how I do it. You can even use the snapshots as templates for new VPSes.

And for backups: Backblaze B2 and Wasabi with a zfs-snapshot-upload script.
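
The upload script is roughly this shape (the pool name, rclone remote, and
filenames are only examples, not any official tooling):

```
#!/bin/sh
# Rough sketch of a zfs-snapshot-upload script.
SNAP="rpool@$(date +%F)"
zfs snapshot -r "$SNAP"
# Stream the snapshot straight into object storage via rclone's stdin support;
# an incremental send (zfs send -i) keeps uploads small after the first one.
zfs send -R "$SNAP" | gzip | rclone rcat "b2:zfs-backups/rpool-$(date +%F).zfs.gz"
```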

~~~
SkyLinx
Isn't a backup of the whole OS via snapshot overkill? I can bring up one or
more completely configured new servers in a few minutes with Ansible (plus
another few minutes with Rancher for Kubernetes). I don't see the point in
backing up anything other than the actual data.

------
fabian2k
I've used rclone for a very similar purpose. Restic, which is used in this
post, looks very interesting as well.

It's not the topic of the post, but database backups deserve a special
mention. You can't just naively copy the database folder this way in most
cases, you have to make sure to backup a consistent snapshot of the database.
This is still not hard to do at smaller scales, when you can just add an
exported dump of the database to your regular backup. But it is a point that
needs some attention if you host the database yourself.
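
As a concrete (illustrative) example with PostgreSQL, a consistent export
dropped into a directory that the file-level backup already covers is usually
enough at small scale:

```
# pg_dump takes a consistent snapshot of a running database without stopping it;
# the database name and output path are made up for this example.
pg_dump --format=custom --file=/var/backups/postgres/mydb.dump mydb
```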

~~~
tluyben2
I have many servers with different (versions of) Linux distros on them, and I
found Duplicity/Restic very annoying to install: vague (for me, as a non-Python
expert) error messages and options randomly not working as a result. Rclone
was absolutely painless to install everywhere.

~~~
KAMSPioneer
Were you getting Python errors from Restic? I'm not terribly familiar with
Duplicity, but Restic is written in Go (GitHub is
[https://github.com/restic/restic](https://github.com/restic/restic)).

~~~
tluyben2
I tried 4 different similar packages; maybe I remember that detail wrong, but
I could not get Restic working on older machines for some reason. Rclone was
so simple and it just worked, so I did not investigate further. Is Restic much
better?

~~~
witten
You might be thinking of Borg Backup, which is written in Python:
[https://borgbackup.readthedocs.io/](https://borgbackup.readthedocs.io/)

------
krn
Are there any reasons to prefer Restic over BorgBackup[1]?

A conclusion from one comparison (2017)[2]:

"Restic’s memory requirements makes it unsuitable for backing up a small VPS
with limited RAM, and the slow backup verification process makes it
impractical on larger servers. But if you are backing up desktop or laptop
computers then this may not matter so much, and using Restic means that you
don’t have to setup your own storage server."

Is this still true?

[1] [https://www.borgbackup.org/](https://www.borgbackup.org/)

[2] [https://stickleback.dk/borg-or-restic/](https://stickleback.dk/borg-or-
restic/)

~~~
raimue
For remote backups, BorgBackup always needs to run a server process (usually
over SSH). Restic works with a "dumb" storage that only provides
get/put/list/delete operations. Therefore restic is way easier to set up with
built-in support for S3, B2, GCS, and similar services that only offer an API
but not shell access.
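
For example, pointing restic at an S3 bucket is just this (bucket name and
credentials are placeholders) -- no server-side process involved:

```
export AWS_ACCESS_KEY_ID="<key id>"
export AWS_SECRET_ACCESS_KEY="<secret key>"
export RESTIC_PASSWORD="<repository passphrase>"

restic -r s3:s3.amazonaws.com/my-backup-bucket init
restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home
```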

~~~
witten
That's true, although there are now a handful of BorgBackup remote storage
vendors (rsync.net, BorgBase, etc.) that you can pay to run the server-side
hosting for you. Probably not nearly as cheap as, say, S3, but it does get
closer to "just point your client here and hit go". And they offer additional
sauce on top that you'd have to roll yourself with S3: backup activity
monitoring, etc.

~~~
m3nu
Thanks for the mention. BorgBase.com author here.

We're not as cheap as S3 Glacier Deep Archive, but cheaper than standard
storage and the same price as B2 and Wasabi if you get the large plan. So not
that much of a difference from "dumber" storage.

Storage is either RAID6 or Ceph.

------
Neil44
I use Duplicity in a similar way to back my Linode stuff up to Backblaze. It
does versioning really well and it's been very reliable. I'd still have to
set up a new server somewhere, etc., but at least I have the data.
[http://duplicity.nongnu.org/](http://duplicity.nongnu.org/)

~~~
raimue
I used duplicity in the past, but the main problem with its incremental
backups is that in order to be able to prune the backup history, you need to
do full backups regularly to start a new backup chain. That means transferring
a full copy of the data.

I have switched to restic now, which also takes incremental backups but can
remove any snapshot to prune the history. Although it does not support
compression, thanks to deduplication and no longer needing to store multiple
full backups, the restic repository takes less space now than duplicity did
before.
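
For example, a retention policy like this drops old snapshots in place, with no
new full backup required (the repository and policy are just an example):

```
restic -r b2:my-backup-bucket:/vps forget \
    --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
    --prune
```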

~~~
OJFord
That's a really good point I hadn't considered at all; I'm glad you mentioned
it! I was looking at a benchmark (that I linked in another comment) that makes
duplicity look slow, but much more economical on storage space - i.e.
cheaper.

But as you point out, if you don't need a long history, incremental eventually
gets more expensive. Unless you could squash snapshots older than X, I suppose, but
presumably that's so expensive to run (encryption & compression) that it's not
supported.

------
PStamatiou
Related - I've been thinking about how best to back up my S3 buckets (some with
50k+ files) off of Amazon. Sure, I can set up another bucket with the cross-
region replication feature, and I have versioning... but I would really prefer
a backup off of Amazon (i.e. not sending manually created zips from a
Lightsail/EC2 instance or something to Glacier) in case it ever gets hacked or
I accidentally nuke the buckets or something like that.

Currently I'm just doing a combination of s3cmd for a local archive (it takes
forever to download, and incremental syncs don't seem to be any faster), as
well as having the Google Cloud console clone my bucket there (but I'm not sure
if it's versioned, or as easy as downloading the whole archive).

Never used duplicity -- would it be fast for something like this? Guessing I
should just cron it on a remote server instead of running off a local machine
frequently.

~~~
padelt
Have you had a look at rclone? Pretty sure you can copy or even sync files
from one remote storage to another. E.g. copy from S3 to B2.
[https://rclone.org/commands/rclone_sync/](https://rclone.org/commands/rclone_sync/)
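
Something like this, assuming both remotes are already configured with `rclone
config` (the bucket names are examples):

```
# Makes the B2 bucket an exact mirror of the S3 bucket (extraneous files are
# deleted); use `rclone copy` instead if you never want deletions propagated.
rclone sync s3:my-source-bucket b2:my-backup-bucket
```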

~~~
anderspitman
+1 for rclone here. It can indeed copy between remote backends. Just keep in
mind that all of that data has to flow through the rclone process; you could
probably get much better performance by running rclone itself on an EC2
instance. Just keep an eye on your throughput usage.

------
tickthokk
Thanks for sharing! While the victims were being scorned by the internet for
not having proper backups, nobody was sharing how to achieve that.

~~~
dymk
Really? A blog post about how to use a glorified `rsync` was needed to
instruct people _building services for Fortune 500 companies_ to back their
user data up?

~~~
anderspitman
To be fair, there is a dizzying array[0] of OSS backup solutions, and it's
very much not apparent which features are most important. A simple post like
this that outlines a single _good enough_ solution with a modern tool is
valuable IMO.

EDIT: Oh and restic has much more functionality than rsync, including
deduplication and encryption. rclone is more of a "glorified rsync", but even
then its array of backends makes it truly glorious.

[0]
[https://wiki.archlinux.org/index.php/Synchronization_and_bac...](https://wiki.archlinux.org/index.php/Synchronization_and_backup_programs)

------
z3t4
Don't forget about practicing restoration (catastrophe scenarios), so that you
will know how long it will take to restore, and whether something is missing.
Last time I did it, I did not remember the password for the encryption key.
Sure, I had it written down on a piece of paper, but the scenario was that the
building had burnt down.

~~~
SkyLinx
Good point on testing the backups.

------
smnrchrds
In a dockerized single-VPS environment, where should cronjobs live? Should
they be part of the main Docker container that has the app code, a separate
container that only has all the cronjobs, or simply on the host?

~~~
jakejarvis
Good question. I have the same setup on one server hosting GitLab, Pi-Hole,
Plex, etc. I have Restic (and its cronjob) installed on the host and only back
up the files that I mount into each Docker container, which are all stored
in /srv/docker.

In theory, you need to be ready to literally delete every container at any
time and pull them from scratch and be 100% fine, since all of your actual
data should be stored on the host and mounted as Docker volumes [0]. It's a
good Doomsday test if you're looking for one. ;)

[0] [https://docs.docker.com/storage/](https://docs.docker.com/storage/)
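
Concretely, the host-side cron entry ends up looking something like this (the
repository, env file, and schedule are only examples):

```
# /etc/cron.d/restic -- credentials (RESTIC_PASSWORD, B2 keys) live in an env
# file on the host, not inside any container.
0 * * * * root . /etc/restic.env && restic -r b2:my-backup-bucket:/docker backup /srv/docker
```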

~~~
tracker1
As mentioned in another post... you could manage the backups from another
server (not in the network) with a cron job that grabs a snapshot from the
docker server's shared volume directory and forwards it to its final
destination. It could be done on a really small instance, and this way your
backup information and account details aren't on your production server itself.

------
kijin
Meh, just another backup solution that requires AWS keys, ssh keys, etc. to be
kept on the same server where your data is. What if that server is
compromised? The attacker now has all the keys he needs to delete or modify
your backups, too.

For maximum peace of mind, always _pull_ backups from a separate server that
is not exposed to the world. Don't let your primary server _push_ arbitrary
data to the backup store.

This rule is trickier to follow when your backup store can't run scripts,
which is why so many tools designed to work with S3 tell you to keep the keys
exposed. But if you really want to, you can use an intermediate host to pull
backups before pushing them again to S3.

~~~
longwave
Borg has an append-only mode [1] that prevents clients from overwriting data.

[1]
[https://borgbackup.readthedocs.io/en/stable/usage/notes.html...](https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-
only-mode)
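
For repositories reached over SSH, it's typically enforced server-side via
authorized_keys, something like the line below (the path and key are
placeholders):

```
# ~/.ssh/authorized_keys on the backup host -- this client can only append,
# never delete or rewrite existing archives.
command="borg serve --append-only --restrict-to-path /backups/client1",restrict ssh-ed25519 AAAA... client1
```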

------
LinuxBender
My own personal preference is to simply make VMs on each VPS that has some
storage space, then enable chrooted SFTP and rsnapshot. Then on the client
side, I use LFTP (its SFTP mirror subsystem), which is compatible with chrooted
SFTP and behaves like rsync.

Each VPS backs up to the other. rsnapshot makes daily diffs that use hardlinks
to avoid taking up space. This also mitigates tampering, as only root has
access to the snapshots.
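
The client-side pull is roughly this (the host and paths are examples):

```
# Pull the remote rsnapshot tree over chrooted SFTP; mirror behaves much like rsync.
lftp -e "mirror --verbose /snapshots /srv/backups/other-vps; quit" sftp://backup@other-vps
```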

Demo site using anon login for testing: [1]

[1] - [https://tinyvpn.org/sftp/#lftp](https://tinyvpn.org/sftp/#lftp)

------
cure
+1 for restic. I use it, and it's awesome.

~~~
rsync
(I hope) You'll be happy to learn that restic works perfectly with rsync.net:

[https://www.rsync.net/products/restic.html](https://www.rsync.net/products/restic.html)

One of restic's modes is an SFTP target, and since we run stock, standard
OpenSSH, it works perfectly.

EDIT: A sibling comment to yours mentioned 'rclone' and I am happy to
informally announce that over the past few months we have rolled out the
'rclone' binary to all of our production fileservers (it requires a server-
side binary exe to be in place) and it is being used by rsync.net customers to
broker file transfers cloud to cloud to cloud (as rclone is apt to be used
for). 'rclone serve' and 'rclone mount' are disallowed for (I think) obvious
reasons, but otherwise everything works ...

~~~
nickcw
Nice one - I'd love to hear more about this (rclone author!).

~~~
rsync
Please email info@rsync.net so we may chat a bit ... I'm really excited to
have this functionality in place.

------
heinrichhartman
Does anyone here have experience with backing up ZFS pools to cloud storage
like S3, B2, ...?

I have a bunch of snapshots ([https://github.com/jakelee8/zfs-auto-
snapshot](https://github.com/jakelee8/zfs-auto-snapshot)) that I want to back
up along with the active tree, but I don't want to keep extra copies of the
data.

- Do these services offer snapshotting? ...that can be automated?

- Is there ZFS integration, e.g. `zpool send | b2 receive`?

~~~
conception
[https://www.rsync.net/](https://www.rsync.net/) is the only one I know of.

------
Blackstone4
One idea I had was to create a service with preconfigured images set up for
personal use with a VPN, an email server, and file sync/backup. It could be
sold to privacy-conscious individuals and could compete with ProtonMail.

The technical side could be hidden from less technical users, and it could be
sold as isolated servers so the data would be protected.

I don’t have the skills or the time to work on this, so I'm happy for others to
use the idea.

------
monkeydust
I am looking for something that can back up Dropbox, Google Drive and Amazon
Cloud to a 3rd-party service. What do people recommend?

~~~
ac29
rclone

~~~
monkeydust
Thanks, but it appears Amazon stopped issuing API keys for Amazon Drive, so
I'm stuck, at least for a fully automated solution.

------
a2tech
Does anyone have a recommendation for a backup client that handles millions of
tiny files? I'm using rsnapshot right now, which works, but backing up to an
NFS share is incredibly slow (most of the time is spent iterating over the
filesystem to get a list of changed files, then running the hardlink process
from the previous snapshot).

~~~
rsync
You're going to have to walk all those inodes no matter what you do. rsync is
as good as anything at that task.

A better way would be to unmount and send the filesystem with 'dd' or something
like that, or to use 'zfs send', but I have a suspicion that neither of those
options is available to you ...

I _will say_ that splitting the rsync job (rsnapshot runs rsync underneath)
into multiple, smaller jobs could save you some time if you're running into any
resource limits while you walk that big set of inodes... so if you're lucky and
you have 4 or 5 or 8 top-level dirs that are all roughly the same size, you
could do a handful of smaller jobs, one after the other, instead of one huge
one ...

~~~
module0000
>> A better way would be to unmount and send the filesystem with 'dd' or
something like that

To add to that: to avoid having to unmount your filesystem, use LVM. Then you
can call `sync`, snapshot your main volume, and `dd` the clean snapshot. Once
you're done, remove the snapshot. This strategy avoids downtime while backing
up your volume.
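
Roughly like this (the volume group, snapshot size, and destination are made
up):

```
sync
# Create a temporary snapshot of the live volume...
lvcreate --snapshot --size 5G --name rootsnap /dev/vg0/root
# ...copy the frozen snapshot instead of the live filesystem...
dd if=/dev/vg0/rootsnap bs=4M status=progress | gzip > /mnt/backup/root-$(date +%F).img.gz
# ...and drop the snapshot when done.
lvremove -y /dev/vg0/rootsnap
```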

------
pnutjam
Good directions, everyone should do this.

------
themodelplumber
I actually like using cPanel's built-in backup settings on servers where I
have cPanel installed. Amazingly simple to set up, really intuitive, and it
supports a variety of services. I have used Amazon and SFTP backups so far and
they both work really well.

------
electriclove
Or use a paid service that handles files and databases for ~$30/year like
[https://www.dropmysite.com/](https://www.dropmysite.com/)

(I'm just a customer who has been generally pleased for many years now)

------
SkyLinx
I'm surprised that no one has mentioned Duplicacy yet. It's another very
solid, reliable and fast alternative. At the moment I use Restic on servers
but use Duplicacy on the desktop. It can also be used on servers of course.

------
OJFord
Restic looks neat. I've been looking at using duplicity [0] for the same
purposes recently.

Just found a good comparison/benchmark of the two at [1] - tl;dr seems to be
that restic is fast, and duplicity is small.

[0] - [http://duplicity.nongnu.org](http://duplicity.nongnu.org)

[1] -
[https://github.com/gilbertchen/benchmarking](https://github.com/gilbertchen/benchmarking)

~~~
SkyLinx
Duplicity is not particularly good IMO. Like I said in another comment, it is
much slower than other options and requires regular full backups, which is a
problem with lots of data.

------
ausjke
I'm trying to back up a Linode image using dd, which works but isn't easy. I
hope VPS vendors will provide a way for customers to migrate when the time
comes.

~~~
zerkten
Doesn't Linode have its own backup which lets you restore onto another Linode
VPS? It'd be nice to use provider-agnostic tools, but this seems like the most
pragmatic option. I'm guessing other VPS providers offer something similar.

~~~
ausjke
it can't deal with the case when linode locks up your account, similar to what
DO did to the original post, you can't put all eggs in the same basket to be
safe?

~~~
tracker1
I don't consider something backed up unless there are at least 3 copies of it
in at least 2 locations. Also, replicated DBs are not backups, but they may be
the best available option when you have too much data to reliably back up en
masse.

I've done replication-on-write scenarios. Generally, when I've set up Mongo or
Elasticsearch for search/read performance, I'll also push a fixed JSON+GZ to S3
or similar. It tends to work pretty well as a fallback for larger data
scenarios. I had to use it once, and was so glad to have it.

------
anderspitman
TLDR for comments: use rclone and/or restic

