

Tarsnap – Online backups for the truly paranoid - tete
http://www.tarsnap.com
They look amazing. Bug bounties for everything, completely transparent architecture, data deduplication and compression on the fly, they will be up even if two of Amazon's data centers die, one pays per byte and they are pretty cheap.
======
kogir
Tarsnap works really, really well. Just make a consistent snapshot of your
data (I'm using UFS snapshots), point Tarsnap at it, and you're good to go.

The documentation is thorough, and Colin (the owner/operator/author) responds
quickly to emails.

Finally, compression and deduplication is amazing:

    
    
      [nick@home ~]$ sudo tarsnap --print-stats
                                             Total size  Compressed size
      All archives                               348 GB            76 GB
        (unique data)                             34 GB           6.3 GB
    

Yep, I've backed up 350GB of data, but since most of it is duplicated, I pay
for storing 6.3GB. Win.
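
To make the savings concrete, the figures in the stats output above can be turned into ratios. This is only a back-of-the-envelope sketch in Python: the variable names are made up for illustration, and the numbers are copied from the output.

```python
# Figures copied from the --print-stats output above, in GB.
total_size = 348.0       # all archives, before deduplication and compression
unique_size = 34.0       # unique data, before compression
unique_compressed = 6.3  # unique data after compression (the amount paid for)

dedup_factor = total_size / unique_size
compression_factor = unique_size / unique_compressed
overall_factor = total_size / unique_compressed

print(f"deduplication: {dedup_factor:.1f}x")
print(f"compression:   {compression_factor:.1f}x")
print(f"overall:       {overall_factor:.1f}x")
```

Deduplication accounts for roughly a factor of 10 and compression for roughly a factor of 5, so the stored data is about 55 times smaller than the total archived.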

One word of caution though - this isn't a mainstream consumer backup service.
If you lose your keys you lose your data. No chance of recovery. So make sure
you back those up properly too, ideally in a different geography.

~~~
cperciva
_Just make a consistent snapshot of your data (I'm using UFS snapshots), point
Tarsnap at it, and you're good to go._

You're using the --snaptime option, right? It's necessary when you're backing
up a filesystem snapshot in order to work around a race condition with them --
if a file is modified, the filesystem snapshot is created, and then the file
is modified again, all within a single time quantum, it can trick Tarsnap into
thinking that the file hasn't been modified later (which triggers an
optimization of "this must be the same blocks as it was last time" in place of
the usual "read the file and split it into blocks" behaviour).
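
The race can be illustrated with a toy model in Python. This is not Tarsnap's code: `backup_file`, the cache layout, and the guard rule (only remember files strictly older than a reference time taken before the snapshot) are assumptions made for the sketch, and timestamps are whole ticks standing in for the time quantum.

```python
def backup_file(cache, path, mtime, size, snaptime=None):
    """Toy model of one file in one backup run; returns 'reused' or 'read'.

    cache maps path -> (mtime, size) as remembered from earlier runs.
    snaptime is a reference timestamp taken before the snapshot was created.
    """
    if cache.get(path) == (mtime, size):
        # Looks unchanged: skip reading and reuse the blocks from last time.
        return "reused"
    # Assumed guard: only remember files strictly older than the reference
    # time, since anything newer may still change within the same tick.
    if snaptime is None or mtime < snaptime:
        cache[path] = (mtime, size)
    return "read"


# The file is written at t=100, snapshot A is taken at t=100, and the file is
# rewritten (same size) at t=100 before snapshot B is taken at t=200.
naive = {}
first_naive = backup_file(naive, "notes.txt", mtime=100, size=10)
second_naive = backup_file(naive, "notes.txt", mtime=100, size=10)

guarded = {}
first_guarded = backup_file(guarded, "notes.txt", mtime=100, size=10, snaptime=100)
second_guarded = backup_file(guarded, "notes.txt", mtime=100, size=10, snaptime=200)

print(first_naive, second_naive)      # the second run trusts a stale cache entry
print(first_guarded, second_guarded)  # the second run reads the file again
```

Without the reference time, the second run sees an identical timestamp and size and reuses the old blocks even though the contents changed. With it, the file is not remembered after the first run, so the second run reads it in full.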

 _Finally, compression and deduplication is amazing:_

Well, if we're going to be posting statistics here...

    
    
                                             Total size  Compressed size
      All archives                               269 TB           121 TB
        (unique data)                            177 GB            72 GB
    

That's 269 TB of data backed up from my laptop, deduplicated and compressed
down to 72 GB. This is what I get for taking a backup of my entire home
directory every hour...

~~~
kogir
> You're using the --snaptime option, right?

Yep, emailed you about this in fact, and appreciated your detailed response.

~~~
cperciva
Ah, that was you -- I remembered sending an email about snaptime recently but
couldn't remember who it was to (and HN user names don't always correlate
anyway...)

------
e40
_why Tarsnap pricing is defined in terms of picodollars per byte rather than
dollars per gigabyte: Tarsnap's author is a geek. Applying SI prefixes to non-
SI units is a geeky thing to do._

I find that so amazingly annoying. To me it says "yeah, I know many people
might find it hard to get their head around the units I defined, but I don't
really care about that because I find it cool." We have standard units for a
reason, because people can immediately get the scale of something in their
mind. With this, you can't. I went to their site open to what they were
selling, but I'm very turned off by this.

~~~
veidr
Interesting: in my browser, this comment is both greyed out _and_ upvoted to
the top of the list. Does this mean that comment placing is determined by
upvotes, and comment greying is determined by downvotes, but those two
processes are independent of each other?

~~~
sp332
Comment placement is by both new-ness and upvotes. So a post that is downvoted
immediately sinks faster, but doesn't immediately go to the bottom.

~~~
DanBC
Also average comment score of poster.

------
ozataman
As an alternative, I use Arq continuously on all my computers and I highly
recommend it (Sorry I'm on my iPhone and won't be able to give a link). It
lets you use your own AWS credentials for backup and you can encrypt the data
before it is sent to AWS.

The issue I have with Tarsnap is that the data is still at the hands of a
small operation, as far as I can tell, and honestly I'm afraid we won't get
our data if something happens to the guy. This is fine of course for many
services, but data backup is inherently as mission critical as it gets. The
whole reason for it is reliability, assurance and redundancy. It is not a nice
to have, it is for many people the only place they fully trust to keep their
data forever.

I wish Tarsnap had an innovation that made it possible to use it with one's
(or an organization's) own AWS credentials. An on-site mode, if you will.
Otherwise it has always seemed to me like a great piece of software.

~~~
aidos
I have the same thought. I'm using backblaze at the moment but am actively
looking to move to either arq or tarsnap. I like that arq is in your own
account and the file format is open so you can work on it yourself. Also,
storing to glacier means it's dirt cheap. It's reassuring to hear of someone
having a positive experience with it. Backblaze has been a mixed bag for me.

~~~
Paul12345534
Have you considered Crashplan? I've had positive experiences with it. The only
downsides are that the client program uses a ton of RAM and there's no API.

------
spindritf
All the data is encrypted before it ever leaves your machine. Not even
cperciva should be able to read it.

You can also create a write-only key. If you run tarsnap from a server which
gets pwned, the attackers can't touch the existing backups. Don't be the next
Astalavista[1].

[1] <http://joncraton.org/blog/49/analyzing-the-astalavista-hack>

~~~
neeee
I would have trusted that more if the tarsnap client had source code
available.

~~~
lambda
The source code is available; it's available under a "shared source" license
rather than free software/open source (you can look at it, but not modify it),
but it is available for review. <https://www.tarsnap.com/download.html>

He also has a bug bounty <http://www.tarsnap.com/bugbounty.html>, and several
substantial security bugs have been found and fixed due to the bug bounty
(<http://www.tarsnap.com/bounty-winners.html>). In fact, the first of those,
the AES CTR nonce bug, was found before he had offered the bounty program; the
bounty program was inspired by that bug, and has since led to the discovery of
several other more minor issues.

So, the source is available, and there's a bounty out for discovering bugs
ranging from cosmetic issues to major security issues. Feel free to review it
and submit any bugs you find!

------
mhartl
I believe Tarsnap's only flaw is that it hasn't yet solved the cperciva-gets-
hit-by-a-bus problem. Or perhaps I am mistaken?

~~~
cperciva
That is indeed a problem which has yet to be solved. Or a potential problem,
rather... I'm rather hoping it will never actually happen. ;-)

Seriously though, it is on my list of issues which need to be addressed.
Bringing in someone else and getting them up to speed on how to run everything
is an expensive prospect, though.

~~~
carterschonwald
Have you considered some sort of "enterprise" variant where large
organizations use their own storage backend? Just 1-4 serious enterprise sized
customers would cover the salary and overhead of 1+ good engineers. There's a
lot of opportunity in that direction. Like any medium sized or larger company
that needs to deal with compliance with education or medical privacy
regulation, your tech is a great backup solution, and if carefully done
doesn't increase your overheads much, if at all.

Otoh, I am just speculating :-)

~~~
cperciva
Setting up tarsnap to use non-AWS infrastructure would be a significant amount
of work. Setting up a "private" Tarsnap (but still on AWS) is something I
could do for a company needing to store a large amount of data (say, 10+ TB).

~~~
carterschonwald
doesn't AWS have some special clouds for companies that have compliance needs?

Either way, it might be worth looking into even just the "private" Tarsnap on
AWS business direction as a way of growing revenue in a way that isn't tied
strictly to data storage volume.

One way to go about this is to ask some of your larger business users if they
would be interested in such a "private for them" self hosted Tarsnap variant.
I think many of them would love a way to help you have revenues sufficient to
support having an additional engineer (or two) working with you, which isn't
possible for them to do with your current usage based revenue model.

Point being, there's probably an "enterprise" business model that stays true to
your quality goals, but gives you more ahead of time revenue by a substantial
amount. For some of your customers, there might be more value in supporting
you being able to hire some engineers than there is in the cost savings
element of the current revenue model. This can be an ancillary product that
isn't the core one, but which still helps you have more resources to make the
core better.

Talk with your larger customers, they're probably happy to chat with you given
the chance.

------
tete
OP here. I found them looking for a good backup solution.

They look amazing. Bug bounties for everything (including cosmetic stuff),
completely transparent architecture, data deduplication and compression on the
fly, they will be up even if two of Amazon's data centers fail, one pays per
byte (traffic/store) and for all that they are pretty cheap.

~~~
DanBC
Tarsnap is brilliant. cperciva is a well known and respected HN user too.

(<https://news.ycombinator.com/user?id=cperciva>)

~~~
herenowgonenow
> cperciva is a well known and respected HN user too.

Irrelevant.

~~~
Ecio78
But maybe the fact that he has been a Security Officer of the FreeBSD project
for many years is relevant (for those concerned about privacy/security):
<http://www.nux.ro/archive/2012/07/Colin_Percival_no_longer_Security_Officer_for_FreeBSD.html>

------
trhtrsh
Interesting. Tarsnap and rsync.net seem to alternate coverage on HN, and for
the longest time I kept forgetting they were different, even though I had
vague sense of confusion.

This one is Colin Percival's project.

~~~
rsync
We hold tarsnap in high esteem and wish Colin the best of luck. Further, we
appreciate all of the good work Colin has done for the FreeBSD project.

Glad to see you on the front page.

------
tonywebster
I understand data is encrypted before it ever leaves your machine, but I
certainly wouldn't want encrypted data at-rest being exposed. Which gives me
concern about Tarsnap's terms: "I may provide information concerning your
account and your use of the service to 3rd parties, at my sole discretion, if
... It is requested by law enforcement authorities ..." note - no requirement
for a court order or subpoena.

<https://www.tarsnap.com/legal-why.html#PRIVACYLAW>

~~~
cperciva
Note the last paragraph of that: _However, I'm serious about saying "at my
sole discretion" — if a law enforcement agency wants information, they'd
better have a good reason for asking for it... and I don't consider the NSA
saying "we want to have all the information you have, just because we feel
like it and someone somewhere might be a terrorist" to be a good reason. Also
note that unlike the situation with certain illegal wiretaps, I can't give
your data to anyone, because it's all encrypted such that I can't read it._

This situation has never arisen, but if I'm confronted by a police officer and
enough evidence that I'm sure they _could_ get a court order, I'd rather be
cooperative than force them to go through the courts. This doesn't mean that
I'd give them any more data than they would get from a court order -- in fact,
quite the opposite, since police tend to err on the side of requesting more
than they need when going through the courts, and cooperating could change
"seize a server" into "get a copy of the required data".

~~~
nuttendorfer
Why would you comply with foreign government agencies?

~~~
cperciva
If the Libyan, or Iranian, or Chinese, or Russian police come knocking, I
probably wouldn't.

Beyond that, it's a judgement call. Lots of countries have agreements to
assist each others' police forces in obtaining evidence.

~~~
werid
They would all be channeled through your local law enforcement though.

If Swedish police come to you directly, you don't have to comply, but if they
go through the proper channels, the request to you comes from Canadian
police.

------
zenocon
A secure online backup service for _Minix_ -- FINALLY!!

~~~
cperciva
I'd like to support GNU Hurd too, but they make some unusual (but still POSIX
compliant!) choices and I haven't had the time to work around them.

------
skarmklart
I'm thinking of using Tarsnap. Can I absolutely, positively, definitely trust
that everything on Tarsnap's end is encrypted to best practice standards and
that there is no reasonable way to get to my data (outside of the usual
contract provided by encryption I mean)?

I don't have the option to know for sure by analyzing the source code myself
so I'll have to trust the popular opinion of Very Smart People here on HN
(well, I suppose I _could_ if I spent a non-trivial chunk of the coming year
reading up on crypto stuff).

~~~
icebraining
Well, the guy who wrote Tarsnap is himself a Very Smart Person here on HN, so
I'd say the answer is yes, by definition ;)

~~~
Steko
"a Very Smart Person"

Did he win the Putnam?

spoilers: <https://news.ycombinator.com/item?id=35083>

------
Freaky
Cyphertite may also be of interest: <https://www.cyphertite.com/>

Client-side encryption and deduplication, with source code. 8GB free, $10/mo
for personal unlimited use, 10c/GB/month for business/enterprise. My main
reservations are they seem to be based in one datacenter, and don't seem to
have support for multiple keyfiles with separate read/write/delete/machine
restrictions. Also not in FreeBSD ports :P

------
nkabbara
Tarsnap has been working really well for us, but one huge downside that we've
noticed is how slow it is to restore data from say a 1TB archive.

Sometimes it takes more than 3 hours to restore a customer's 40MB directory.

If we were to have a full HD failure and had to restore the whole 1TB, that
would probably take days. Days of downtime for us.

So depending on your situation, this might not be ideal.

I contacted Colin about this a few months ago and he mentioned that he is
working on a faster version.

------
krasin
The "legal" section of the site is confusing.

"1. You may only access the service using unmodified Tarsnap client code which
I have distributed" -- really? no API and no custom clients?

~~~
taralx
I imagine that if you want an API you could ask nicely... :-)

------
hazz
One of the things I love about Tarsnap is the bug bounties, which range from
$2000 for being able to decrypt user data right down to $1 for cosmetic
issues.

------
tigerweeds
quick question here: is there a delay in Recent Activity?

I just signed up and used it on two servers like 30 minutes ago, but I don't
see anything in the account activity except the payment info. I'm quite sure
my servers sent stuff because I monitored b/w usage

~~~
cperciva
The accounting data updates at midnight UTC -- so yes, there is a delay.

~~~
tigerweeds
roger, thanks!

------
Ecio78
Does anyone have experience with Duplicity? <http://duplicity.nongnu.org/>

~~~
dnr
I use it to save backups of my desktop and laptop home directories to a home
NAS mounted with NFS, which later gets synced to another NAS.

It's a decent tool. I encrypt with a separate gpg key, do mostly incremental
backups, and a full one every few months. Incremental backups take under a
minute on my desktop (100K files, 11G). Full ones are kind of slow (which is
why I set it to only do it every few months).

I haven't used the S3 support.

------
aaronbrethorst
Colin - I dig what you're doing, but every time I go to the Tarsnap website,
I'm turned off from using it for all of the reasons that have been discussed
here ad nauseam since 2009. I'd love to see you succeed more; I think you
deserve it, and I wish you'd just _grab_ it.

see <https://news.ycombinator.com/item?id=820705> and
<https://news.ycombinator.com/item?id=1639277>, e.g.

------
nuttendorfer
I'm still not sure whether I can trust somebody else with my data, but I'm
growing more and more concerned about hardware failure of my own backups. Might
try Tarsnap one of these days.

~~~
marcosdumay
It allegedly encrypts all your data on the local machine before sending it to
the server.

Now, of course, if you are truly paranoid, you'll want to review their code
first. I don't get why I can't simply mount a volume with encryption and write
there. Using code that is already on my machine (in the kernel, no less)
would make it a much simpler decision.

~~~
cperciva
_I don't get why I can't simply mount a volume with encryption and write
there._

You can (and you could even use tarsnap to back up the encrypted filesystem
image if you want), but writing your data to an encrypted filesystem tends to
expand the amount of data changing -- in the extreme case, if you create a
copy of a file you'll write that many blocks of new encrypted data which needs
to be backed up, whereas tarsnap would just say "hey, I recognize all these
blocks, it's those ones I backed up earlier" -- so Tarsnap's encrypted backups
of a filesystem tend to be many times more efficient than backups of an
encrypted filesystem.
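
A small Python sketch of the difference. Everything here is an illustration under stated assumptions: fixed 4 KiB blocks (chosen only to keep the code short), a content-addressed store keyed by SHA-256, and `toy_sector_encrypt`, a deliberately toy stand-in for disk encryption whose keystream depends on the sector number. None of it is Tarsnap's or any real filesystem's implementation.

```python
import hashlib

BLOCK = 4096  # fixed block size, an assumption made for brevity


def split(data: bytes):
    """Cut data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def toy_sector_encrypt(key: bytes, sector: int, block: bytes) -> bytes:
    """TOY cipher, illustration only: XOR with a keystream derived from key and sector."""
    stream = b""
    counter = 0
    while len(stream) < len(block):
        seed = key + sector.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(block, stream))


def store(seen: set, blocks) -> int:
    """Add blocks to a content-addressed store; return how many were new."""
    new = 0
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            new += 1
    return new


key = b"k" * 32
original = b"".join(bytes([i]) * BLOCK for i in range(4))  # four distinct blocks
blocks = split(original)

# Backing up the files themselves: the copy consists of already-known blocks.
plain_store = set()
new_plain_original = store(plain_store, blocks)
new_plain_copy = store(plain_store, blocks)

# Backing up an encrypted image: the copy lands on different sectors,
# so its ciphertext blocks look entirely new.
image_store = set()
enc_original = [toy_sector_encrypt(key, s, b) for s, b in enumerate(blocks)]
enc_copy = [toy_sector_encrypt(key, s + len(blocks), b) for s, b in enumerate(blocks)]
new_encrypted_original = store(image_store, enc_original)
new_encrypted_copy = store(image_store, enc_copy)

print(new_plain_copy, new_encrypted_copy)
```

In this model the copied file adds no new blocks when the plaintext is deduplicated, but adds as many new blocks as the original when the encrypted image is backed up.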

------
tbassetto
I would love to have something like the Backblaze client but working with
Tarsnap as a backend: you install it and you forget about it. The sensible
default configuration is good enough for average joe but you can tweak it if
you want.

~~~
philfreo
Check out Arq

<http://www.haystacksoftware.com/arq/>

~~~
tbassetto
Well, it is not using Tarsnap as a backend and it seems that you have to add
folders on the first launch, that is definitely not what I am looking for :)

Time Machine and Backblaze know how Mac OS is laid out and back up
everything but useless files (logs mainly).

------
alanh
This is absurdly expensive. If you have a 400GB laptop, fully backed-up and
with negligible deltas, you are paying $1440 a year.

(400GB is not an absurd amount, either. I personally would ideally have about
that much in my off-site backup.)
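
For reference, the arithmetic behind that figure, as a rough Python sketch. It assumes the $0.30/GB/month storage rate quoted elsewhere in this thread, decimal gigabytes, and storage charges only (bandwidth is ignored); the function name is made up for illustration. The 6.3 GB figure is the deduplicated, compressed size another commenter reported for 348 GB of archives.

```python
def yearly_storage_cost(stored_gb: float, dollars_per_gb_month: float = 0.30) -> float:
    """Storage cost in dollars for keeping stored_gb for twelve months."""
    return stored_gb * dollars_per_gb_month * 12


# 400 GB stored as-is, with no savings from deduplication or compression.
raw_cost = yearly_storage_cost(400)

# 6.3 GB actually stored after deduplication and compression.
deduped_cost = yearly_storage_cost(6.3)

# The same rate expressed per byte, in picodollars (1 dollar = 10**12 picodollars).
picodollars_per_byte_month = 0.30 / 1e9 * 1e12

print(f"raw:     ${raw_cost:.2f} per year")
print(f"deduped: ${deduped_cost:.2f} per year")
print(f"rate:    {picodollars_per_byte_month:.0f} picodollars per byte-month")
```

The $1440 only applies if all 400 GB is unique and incompressible; what is billed is the size after deduplication and compression.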

~~~
snuxoll
You're missing the value-added component here where tarsnap compresses and
deduplicates data.

~~~
wmf
That's not exactly rocket science these days (see bup). What you're paying for
with tarsnap is making it totally rock solid and as usual the last 90% of the
work is also 90% of the cost.

~~~
elangoc
The point that snuxoll is making, though, is that the cost is not nearly that
high to begin with, b/c the data is dedup'ed and compressed. You're justifying
an issue that doesn't quite exist.

The query that snuxoll responded to supposed that you have 400 GB of data with
small deltas. But that's only possible if you're filling your hard drive with
files created from random noise from /dev/random and updating all your files
monthly with more random noise.

------
lbatista
I've been using <http://labs.bittorrent.com/experiments/sync.html> for a while
and it is as reliable as you need.

~~~
nonane
Sync is not backup. For backups you also need the ability to look back at the
previous versions of your files. For example in your sync solution if a file
gets corrupted, all copies of the file will also have the corrupted bytes
sync'd. With a backup solution you'll be able to rollback to a previous non
corrupted version.
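
The distinction can be sketched in a few lines of Python. This is a toy model, not how any particular sync or backup tool works; `sync` and `VersionedBackup` are hypothetical names used only for illustration.

```python
def sync(replicas: dict, content: bytes) -> None:
    """Mirror the latest content to every replica, overwriting what was there."""
    for name in replicas:
        replicas[name] = content


class VersionedBackup:
    """Keeps every snapshot so that older versions stay retrievable."""

    def __init__(self):
        self.versions = []

    def snapshot(self, content: bytes) -> None:
        self.versions.append(content)

    def restore(self, index: int) -> bytes:
        return self.versions[index]


good = b"important data"
corrupted = b"imp\x00rt\x00nt d\x00ta"

replicas = {"laptop": good, "nas": good}
backup = VersionedBackup()
backup.snapshot(good)

# The file gets corrupted, then both mechanisms run again.
sync(replicas, corrupted)
backup.snapshot(corrupted)

all_replicas_corrupted = all(value == corrupted for value in replicas.values())
recovered = backup.restore(0)

print(all_replicas_corrupted, recovered == good)
```

After the corruption every synced replica holds the bad bytes, while the versioned backup can still return the earlier good copy.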

------
andrewcooke
i've been poking round the site and i couldn't see an answer to this question
- why do you need to encrypt the communication if the data themselves are
encrypted? maybe i am misunderstanding, but it seems like each block is
encrypted _and_ the pipe between client and server is encrypted. is it because
there are additional interesting metadata (if so, what)? or have i
misunderstood?

~~~
cperciva
There is metadata; whether you consider it interesting is up to you. The
tarsnap client has to say "I'm machine X, and I want to store a block of data
with tag Y"; and when you extract an archive, "I'm machine X, and I want to
retrieve the block of data with tag Y". This could allow someone to figure out
(a) that it's the same machine, and (b) that you're extracting an archive
which contains data which was stored at a particular point in time.

Paranoia means encrypting everything which _might_ be sensitive, even if you
can't see any way for it to be abused.

~~~
andrewcooke
thanks. (i understand the point about metadata; to be honest i was more
worried i had misunderstood).

------
Paul12345534
$0.30/GB/month is pretty steep :) You can use your own private keys with
Crashplan, $60 a year for unlimited storage/bandwidth.

~~~
Paul12345534
Crashplan also does dedupe and compression. The only downside is the client
program uses a lot of RAM.

~~~
dublinben
I concur. I've been very happy with the price and features, but the ever
increasing amount of RAM (the larger your backup, the larger its RAM use) is
worrying.

------
sdfjkl
How does this compare to dump | aespipe | s3cmd?

~~~
wmf
If none of those commands perform dedupe and compression then tarsnap is much
cheaper (like 20x cheaper).

~~~
sdfjkl
It's safe to assume a bz2cat can be thrown in there. And I'm not sure how
deduplication can work on encrypted dumps/files?

~~~
cperciva
Tarsnap deduplicates first, then it encrypts.

~~~
sdfjkl
I see, that makes sense. Any word on how it compares to simply shipping
encrypted dumps to S3?

~~~
cperciva
Depends how much tarsnap manages to deduplicate, really.

------
bernatfp
Truly paranoids are/will/should use Bitcoin or Litecoin. I don't get why
pricing is USD$ only, it just seems that cryptocurrencies are perfect for this
kind of service.

~~~
glurgh
It's how your data is actually stored that protects it, not whether you've
paid for the storage in tinfoil shekels or picodollars.

~~~
ethanaustinite
The "truly paranoids" probably have plenty of bitcoin burning a hole in their
digital wallets.

------
olalonde
<http://www.tarsnap.com/legal-why.html#BCPSTISSTUPID> I see what you did there
:)

~~~
cperciva
Well, the BC PST _is_ stupid. I find it amazing that during the HST referendum
campaign a large number of people said they thought the HST was good, but they
were going to vote against it anyway to punish the BC Liberal government...
and now we don't have the HST, and the government they wanted to punish has
gotten re-elected.

------
awayand
try crashplan with boxcryptor

