
Tarsnap – Online backups for the truly paranoid - type0
https://www.tarsnap.com/
======
nodesocket
Colin is probably one of the best minds in encryption that I know of; he wrote
Scrypt[1] as well. My gripe has always been that while Tarsnap is clearly a
product built for developers, it doesn't have to look and feel like the '90s
or use obscure, non-intuitive billing (picodollars). The web interface and
landing site could definitely use some designer/UX love, but at the same time
I think Colin is happy and satisfied with how things are going. Highly
recommend Tarsnap.

[1] -
[https://en.wikipedia.org/wiki/Scrypt](https://en.wikipedia.org/wiki/Scrypt)

~~~
thisnotmyacc
[http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/](http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/)
-- pretty much everything you just said.

~~~
cperciva
In case anyone sees this and not your other comment (which is now at the
bottom of the page), I replied to it here:
[https://news.ycombinator.com/item?id=13505465](https://news.ycombinator.com/item?id=13505465)

~~~
ruanmuller
If you want a hand with some design work for the site, drop me a note. I'd be
happy to help, and local to you.

------
kogir
Tarsnap saved HN's bacon when a drive catastrophically failed shortly after I
started working on it some years ago. (I no longer do.)

Cannot recommend enough.

Here's my sleep deprived post shortly after:
[https://news.ycombinator.com/item?id=7069013](https://news.ycombinator.com/item?id=7069013)

------
cperciva
Not sure why Tarsnap has bubbled up on HN again, but I'm always happy to
answer questions about Tarsnap if there's anything anyone here wants to know.

~~~
Osmium
Always wanted to try Tarsnap for personal use, but I could only budget for it
if the cost were comparable to Dropbox/Microsoft OneDrive/Google
Drive/iCloud/Amazon Drive, which are all ~$10/TB/mo; instead, Tarsnap seems
about an order of magnitude more expensive? Or is the product just not aimed
at me?

I currently use Arq[1] for backups (which has its own encryption[2] with a
user choice of cloud backend), but Tarsnap has such a stellar reputation I'd
definitely try it if I could :)

[1] [https://www.arqbackup.com](https://www.arqbackup.com) [2] AES-256 with
PKCS5_PBKDF2_HMAC_SHA1 for key derivation, implemented by OpenSSL, with an
open file format
[https://www.arqbackup.com/arq_data_format.txt](https://www.arqbackup.com/arq_data_format.txt)

~~~
CodeWriter23
Tarsnap stores its data in Amazon S3, which is the primary cost driver. For
the extra expense, you get multi-region durability of (I think) up to six
nines.

------
no_protocol
I see some comments here from people wondering why anyone would use this or
searching for use cases. I'll share mine.

I use tarsnap to keep daily backups of a web server with a handful of sites. I
send over their content, configuration, and database backups. This isn't
$$$-critical stuff so I just make a once-daily database backup and I'm fine
with that. Every few months I verify one of the backups to make sure
everything is still working, but it is extremely hands-off. I just don't have
to think about it.
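A setup like this can be a single cron-driven script. Here's a minimal sketch; the paths, the MySQL choice, and the archive naming are my assumptions, not the commenter's actual configuration:

```shell
#!/bin/sh
# Daily tarsnap backup: site content, server config, and a DB dump.
# Paths and the choice of database are illustrative assumptions.
set -eu

# Take a fresh once-daily database dump first.
mysqldump --all-databases | gzip > /var/backups/db.sql.gz

# One archive per day. Tarsnap deduplicates against previous archives,
# so only new/changed blocks are uploaded and billed.
tarsnap -c -f "daily-$(date +%Y-%m-%d)" \
    /var/www /etc/nginx /var/backups/db.sql.gz
```

Dropped into /etc/cron.daily (or a crontab entry), cron mailing the script's output covers basic monitoring.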

It is currently costing me about 3 cents per day. I could probably reduce that
a bit with some pruning. It was 2 cents per day for a while.

Here is my most recent billing statement:

    
    
      2016-01-26 Balance                   43.152827492555316729
        Client->Server Bandwidth   8.2MB    0.002069643750000000
        Server->Client Bandwidth  41.2kB    0.000010296000000000
        Daily storage             3.35GB    0.027041509164110040
      2016-01-27 Balance                   43.123706043641206689
    

So far I have used a total of $7 of my $50 deposit. Let's hope Colin keeps the
service running at least until I've worked through that. Might be a few more
years, though. Here are my statistics:

    
    
      $ tarsnap --list-archives | wc -l
      496
      $ tarsnap --print-stats --humanize-numbers
                                             Total size  Compressed size
      All archives                               200 GB            86 GB
        (unique data)                            8.4 GB           3.3 GB
    

It appears I have been using it for about 496 days. I can access any specific
day's snapshot and download it if I need to. The 3.3GB here is the same as the
3.35GB of daily storage on my billing statement. I'm guessing that is also
very close to the total amount of Client->Server traffic I have generated in
my account's lifetime -- probably within a few percent? Not sure if I can
quickly verify that, though.

~~~
nodesocket
2¢, 3¢, 10¢ per day -- does that really matter? We are talking about less than
a Starbucks coffee for a full month of backups. Sorry, but developers who
literally optimize cents over their own time and effort are a pet peeve of
mine.

~~~
cperciva
_2¢, 3¢, 10¢ per day, does that really matter?_

It doesn't matter right now, but if you're building a company which you hope
to scale up, it's good to have costs which won't get too big when said scaling
happens -- because when your company explodes overnight, you're going to be
too busy keeping everything else running to spend time reworking your backup
strategy.

(Also: If you don't have backups now because you're "not big enough to have
data worth protecting", you're going to be too busy to start doing backups
when you suddenly _do_ have data worth protecting.)

~~~
skrebbel
> _but if you're building a company which you hope to scale up_

This is a popular sentiment among engineers at startups and in my opinion
founders should work hard to (kindly) beat it out of the team as soon as
possible.

Basically, you can use that line to justify nearly any engineer hobby. We need
microservices! We need message queues! We need master-to-master replicated
NoSQL databases spread geographically! We need Redis, Kafka and Cassandra with
a CQRS event source pumping data in there, oh and also to a Postgres so we can
do arbitrary queries for management reports! We need our backups to cost not 3
but 2 cents a month!

But the truth is that, statistically, the chance that a startup will not
_actually_ scale up is much bigger than the chance that, once it does, the
sharply increasing Tarsnap bill will drive it to bankruptcy.

I agree with you that you need backups from the start, but whether they cost
$.02 per month or $10 per month initially really doesn't matter. There are a
lot of features to build (and kill), users to acquire, content to market, and
the team is tiny.

I used to laugh about Twitter during their early growth days, fail whales all
over the place. "What, they made that in Rails? Idiots." Now that I'm a
startup founder myself, I realize that they did it perfectly right, and I
strongly doubt Twitter would be what it is today if they had wasted time
building a perfectly scalable tweet-processing timeline before putting the
site out there.

~~~
_dax
I think it's a false dichotomy that great architecture and fast iteration are
at odds. In fact, great architecture is what allows fast iteration to happen.

Yes, simplify your system by using as few moving parts as possible. But that
also means don't use bloated frameworks that silently slow down your iteration
pace with technical debt.

So many startups I've consulted with were stuck having to redo core
architecture right when they found market fit. It's a tough position to be in.
The main rule to follow is that good design allows better design to happen
later; that is worth investing in.

~~~
skrebbel
I fully agree and I don't believe anything in my comment contradicts yours.

> _So many startups I've consulted with were stuck with having to redo core
> architecture right when they found market fit. It's a tough position to be
> in._

Good point, but watch out for survivorship bias here: plenty of startups
failed before they reached product-market fit, and some did so because they
wasted time doing the wrong things. All told, I'd rather run a startup that
needs to hire you to fix the core at the worst possible moment than a startup
that fails. But I agree fully nevertheless: good design begets good design,
and it really doesn't take much time.

------
forgotpwtomain
This has appeared on HN a few times. I'd _really_ like to use tarsnap, but
it's simply too expensive to be viable (to which I guess the response is:
don't use tarsnap for full backups -- but that makes it practically useless as
far as reducing the complexity of my current backup scheme[0] goes).

[0] [http://duplicity.nongnu.org/](http://duplicity.nongnu.org/)

~~~
mike-cardwell
I also use duplicity for my backups. I don't use the S3 option, preferring to
write to my own storage, but if I were to turn on S3, what exactly does
Tarsnap give me beyond what I'm already getting? Other than an additional
middleman taking a cut?

~~~
cperciva
Duplicity isn't bad; I often recommend it to people who want something similar
to Tarsnap but don't want Tarsnap itself.

The biggest disadvantage to Duplicity is that it has a far more constrained
archive management model -- with tarsnap you can delete any archive at any
time, but Duplicity works with a "full plus incrementals" model, which means
that (a) you can't delete an archive without deleting the archives which
"depend" on it, and (b) you'll inevitably need multiple full archives.

Other points which are probably of lower importance to most users: Duplicity's
website and documentation are even worse than Tarsnap's; and it relies on GPG,
which has a pretty lousy security track record.

------
hobarrera
I've been a tarsnap user for years (keeping personal photos and emails). I'm
really pleased with the service.

I added a cron script years ago with the right paths/filenames, and cron
emails me the output (the default). Eventually, the script got a bit more
complex, because I wanted a nicer email subject and other silly aesthetic
touches.

Still, the service is great and deduplication is awesome: data transfer is
extremely low every day, but I have a separate archive for every day for the
last several years.

I have a backup of my tarsnap keys off-site, so I won't lose any photos even
in cases like the house burning down. As for emails, chances are I'll never
actually need those backups, since they're on my laptop, my desktop, and
fastmail.

------
andmarios
I do use tarsnap and like it, but in recent years I've found a similar
approach that I prefer.

What I do is create a local backup using borg, which gives me deduplication,
compression, and encryption. Once I've updated my borg archive (a few
megabytes of new data per day), I sync it to a cloud storage provider like
GCS, S3, or Backblaze.

The reason I prefer it is that I have a copy of my backups locally, which is
convenient if I ever need them. Also, borg is friendlier when it comes to
managing old archives.
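The borg-then-sync workflow described above might look roughly like this. The repository path, passphrase file, source directories, and remote name are all hypothetical, and rclone is just one way to do the sync step:

```shell
#!/bin/sh
# Nightly: append a deduplicated, compressed, encrypted archive to a
# local borg repository, then mirror the repository to cloud storage.
set -eu

REPO=/backups/borg-repo                       # hypothetical local repo
export BORG_PASSPHRASE="$(cat /root/.borg-pass)"

# Deduplicated + compressed + encrypted snapshot of /home and /etc.
borg create --compression zstd \
    "$REPO::{hostname}-{now:%Y-%m-%d}" /home /etc

# Thin out old archives so the repo doesn't grow without bound.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 "$REPO"

# Only the few megabytes of new repository segments get uploaded.
rclone sync "$REPO" remote:my-backup-bucket/borg-repo
```

Because borg stores data in append-mostly segment files, the sync step only transfers the new segments, which keeps daily upload volume close to the size of the actual changes.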

------
kobayashi
Hi Colin, what would you say is the strongest argument in favour of Tarsnap
over Arq?

(As an aside: I've been following your work for the past four years - as
someone who is quite passionate about infosec, I'd like to thank you deeply
for your contributions to the community.)

~~~
cperciva
I don't know enough about Arq to feel qualified to answer this question.

------
ptman
How does tarsnap compare to borgbackup
[https://borgbackup.readthedocs.io/en/stable/](https://borgbackup.readthedocs.io/en/stable/)
? rsync.net has high praise for it:
[https://www.reddit.com/r/linux/comments/42feqz/i_asked_here_...](https://www.reddit.com/r/linux/comments/42feqz/i_asked_here_for_the_optimal_backup_solution_and/czbeuby/)

------
warmwaffles
As always, picodollar pricing is so confusing.

~~~
Osmium
Agreed. As a consumer I prefer flat pricing, even if it's nominally more
expensive, simply because it's easier to budget for and requires less thinking
(honestly).

------
MarkMc
I use Tarsnap and it's great for two big reasons:

1\. Deduplication. I have a 10GB database but only about 30MB of it changes
every day, so I can back up the full database each day but only pay for an
extra 30MB.

2\. Security. Colin is an expert in security and Tarsnap has a respectable bug
bounty program. Plus I can have a Tarsnap crypto key which only allows read
and write - no delete - which adds an extra layer of security in case a hacker
gains access to my server.
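The deduplication arithmetic in point 1 is easy to check against Tarsnap's published prices at the time (250 picodollars per byte-month of storage and 250 picodollars per byte of upload bandwidth, i.e. $0.25/GB each); a rough sketch:

```shell
# Rough monthly cost for the scenario above: a 10GB database backed up
# daily, with only ~30MB of new unique data per day, at $0.25/GB-month
# storage and $0.25/GB upload bandwidth.
awk 'BEGIN {
  gb      = 1e9
  stored  = 10 * gb + 30e6 * 30        # base + a month of daily deltas
  storage = stored / gb * 0.25         # storage cost for the month
  bw      = (30e6 * 30) / gb * 0.25    # upload cost for the deltas
  printf "stored: %.2f GB, monthly cost: $%.2f\n", stored / gb, storage + bw
}'
```

This prints `stored: 10.90 GB, monthly cost: $2.95` -- versus roughly $75/month if all 300 daily archives were stored as full copies.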

------
iUsedToCode
I've been using tarsnap as a backup tool for a small NGO, mainly for backing
up the database. I take a dump of the DB every 2 hours (we have almost 1k
users now) and send it over.

The biggest pain is that it now takes ages to list archives; I think I have
about 1500 of them. But otherwise, I'm really glad tarsnap exists. I deposited
$5 in November and have $4.673539677478560010 left. (I'm pretty sure Colin can
now tell who I am from the unique amount of picodollars.)

My stats now:

    
    
      pawwer@pro16:~$ tarsnap --print-stats --humanize-numbers
                                             Total size  Compressed size
      All archives                               137 GB            42 GB
        (unique data)                            2.0 GB           489 MB
    
      pawwer@pro16:~$ tarsnap --list-archives | wc -l
      1425
    

Thanks, Colin. It's a great service.

However, I back up my personal photos someplace else; tarsnap is a bit pricey
for that. You can get Backblaze for $0.005/GB/month with free transfer, so I
use that.

------
thisnotmyacc
Why did you never follow Patrick's advice?
[http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/](http://www.kalzumeus.com/2014/04/03/fantasy-tarsnap/)

Not your thing? I've always wondered.

~~~
cperciva
I do follow some of his advice. :-)

This bit: _"Yes folks, Tarsnap — “backups for the truly paranoid” — will in
fact rm -rf your backups if you fail to respond to two emails."_ is no longer
true; you get three emails, and often more if you're a long-time Tarsnap user
/ I recognize your email address for some reason / you have a history of
getting your data almost deleted.

A short time later, Patrick writes _"If Colin does, in fact, exercise
discretion about shooting backups in the head, that should be post-haste added
to the site"_ -- which is something I considered doing but ultimately decided
firmly against, for two reasons:

1\. I hate to advertise "discretion" because people get upset if you exercise
your discretion in a way which is not in their favour, and

2\. I have solid statistical evidence that _people respond to incentives_. An
email which says "Your Tarsnap account will be deleted soon" is far more
likely to make people do something than an email which says "Your Tarsnap
account needs more money and if you don't pay up soon and don't write back
then I'll think about deleting it" (which is actually closer to the truth). I
really really don't want to delete someone's data when they still need it --
it has happened a handful of times and I feel awful about it -- and implying
the presence of a ruthless cron job is quite an effective mechanism for
preventing that.

Later, Patrick writes

_Current strap line: Online backups for the truly paranoid_

_Revised strap line: Online backups for servers of serious professionals_

Here, I simply disagree with Patrick; nobody interprets "truly paranoid" as
meaning "diagnosed by a psychiatrist as suffering from mental illness". I
think this branding has been highly effective.

 _Tarsnap is for backing up servers, not for backing up personal machines. It
is a pure B2B product. We’ll keep prosumer entry points around mainly because
I think Colin will go nuclear if I suggest otherwise, but we’re going to start
talking about business, catering to the needs of businesses, and optimizing
the pieces of the service “around” the product for the needs of businesses._

I have an unfair advantage over Patrick here: I know Tarsnap's user base. With
the exception of Stripe, which started using Tarsnap thanks to Patrick (err,
the other Patrick...), every large corporate user of Tarsnap I can think of
started using Tarsnap thanks to a sysadmin who had used Tarsnap personally. In
economic terms, Tarsnap's "personal" customers provide most of their "lifetime
value" as sales channels to their employers.

This is already getting a bit long to be an HN comment, so I'll stop going
through point by point. Suffice to say that a number of things Patrick
suggests have either happened or are in progress. Customer testimonials?
There's now a page full of them (starting with Stripe). Improved getting-
started documentation? Done. Advice for dealing with a variety of common
scenarios? A whole page of tips. Binary packages for common platforms? Due to
be announced in a few days (currently available as "experimental"). A GUI? In
progress, hopefully landing soon.

I've used this metaphor before, but I like it so I'm going to use it again.
Patrick gives great business advice, but Tarsnap is not just a strategy for me
to make money. So I treat his advice like ships treat navigational beacons:
Paying close attention to them, and using them to plot a course, but not
steering directly towards them.

------
scottpiper
If you're doing backups for your business, I've written about how to properly
encrypt backups[1] and how to use Google Compute Engine for backups[2]. I'm
working on write-ups for AWS and Azure that should be posted within the next
few weeks.

[1]
[https://summitroute.com/blog/2016/12/25/creating_disaster_re...](https://summitroute.com/blog/2016/12/25/creating_disaster_recovery_backups/)

[2]
[https://summitroute.com/blog/2016/12/25/using_google_for_bac...](https://summitroute.com/blog/2016/12/25/using_google_for_backups/)

~~~
cperciva
This is certainly one way to do backups. Two things which come to mind on
first reading:

1\. You're _encrypting_ backups but not _authenticating_ them; someone with
access to your archives could trivially truncate an archive or replace one
archive with another, and there's a nontrivial chance that they could splice
parts of different archives together.

2\. Every archive you're creating is stored as a separate full archive; this
will probably limit you to keeping a handful of archives at once. With a more
sophisticated archival tool, you could store tens of thousands of backups in
the same amount of space.

~~~
scottpiper
These are both accurate.

For 1, I ensure that an attacker cannot modify my archives after they've been
uploaded by giving the backup service "put"-only privileges. This is
unfortunately not possible with GCE from the article, as I point out in a
warning banner there, but it is with AWS, which I'll post about soon. My use
case is primarily to have a backup in the event of a devops mistake or a
malicious attacker (ransomware), so I assume that if someone has write access
to my archives they would just delete them; authenticating them isn't as big
of a concern, though it would still be a good idea, just to ensure the files
aren't corrupted in some other way.

For 2, my storage needs currently aren't expensive (100GB archives per day,
which means pennies per day for all of them), but eventually I plan on sending
just diffs. I also wanted to create and send backups in the simplest possible
way to help people get up and running as fast as possible, which meant
limiting myself to the "openssl" command and other basic commands. The other,
smarter, solutions I'm aware of are either tied to a service (ex. tarsnap) or
don't maintain the data as encrypted at the backup location.
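The "put"-only arrangement for AWS can be expressed as an IAM policy that grants nothing but s3:PutObject on the backup bucket (the bucket name here is hypothetical):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
```

One caveat: s3:PutObject can still overwrite an existing key, so bucket versioning needs to be enabled (and s3:DeleteObjectVersion withheld) for uploads to be genuinely append-only.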

------
raphinou
Tarsnap is great if you back up a file system holding small files. If you use
it to back up e.g. VM disk images, the slow restore makes it an impractical
tool. Worth knowing before switching to Tarsnap...

------
m3kw9
They are only as secure as the words on the page where they describe their
security. Can you really trust it?

------
adim86
Does Tarsnap backup tarsnap?... just curious :)

~~~
cperciva
I use Tarsnap to back up Tarsnap servers, yes. But obviously not to back up
the backups, because that wouldn't make any sense.

