Tarsnap price cut (daemonology.net)
320 points by cperciva on Apr 2, 2014 | 157 comments

Tarsnap: Our costs just went down, so we're lowering our pricing for customers too!

Comcast: Our costs just went down. Effective immediately, we are adding a new cash management fee to your bill to cover our costs of handling all this new cash.

I was once billed by Washington Mutual for depositing too much cash with their bank tellers. Can't make this stuff up.

Cash management actually does cost money. You expect bills you get from them to be definitely issued by authorized representatives of the US government and bills you deposit with them to not disappear, right? Those are not naturally occurring properties of green paper.

Below scale, cash management is thrown in just to get people's business, but at scale, it costs what it costs. Walmart, which has substantially more problems than you on that score, reportedly spends millions of dollars every year just dealing with pennies.

Cost of doing business. But I'm sure they spend many more millions on consulting figuring out how to pass that along.

The "cost of doing business" all gets passed along to the consumer, inevitably. That's how businesses work. The consumer has to cover the "cost[s] of doing business" and then some, or companies don't make any money.

I'm using the phrase to mean something that's inconvenient for a company but ultimately necessary to make more money. Not just any task that's part of everyday business. But maybe I'm mistaken.

"You have too much money. Here, we'll help you by taking some of it off your hands..." ?

They complained that they had to handle too much physical cash and that there was a limit to how much I could deposit in a given month (into a business account, mind you).

I can't think of TOO many more first world problems than that, to be honest...

You realize that they have to pay people to move that cash around, count it, double check it etc, and then they have to pay to store it and maintain it? People don't work for nothing, and storage doesn't cost nothing.

That would be true if they charged a percentage of every dollar deposited, but that's not the case here.

It might be. My (Canadian) bank's "small business banking" plan charges $2.25 per $1000 of cash deposits beyond the first few thousand/month. (And $2.25 per $100 of coins.)

Of course, if I ever deposited cash I would be on a different banking plan; but it's not entirely unheard-of.

The one thing that has put me off Tarsnap until now is, and not to be intentionally morbid, the bus factor [1] of 1. As far as I know it's just Colin that runs it. In the unlikely event that we need to access our backups, Tarsnap is down, and Colin is no longer available to maintain it, we could be stuffed.

Colin, do you have a contingency plan in place if you are not available?

1: http://en.wikipedia.org/wiki/Bus_factor

Here is his response to that question on the mailing list:


What put me off Tarsnap was not realizing that the price per GB is for compressed storage. So while it looks high compared to Amazon's per-GB pricing, it's effectively a lot cheaper than it appears.

I don't recall if this was on Tarsnap's website when I first checked, but it might be worth reporting the average compression ratio, to make it more visible to prospective customers what the Amazon S3 storage portion of the pricing works out to.

What if tarsnap released a command-line tool that could be run on a directory to estimate its storage cost, pre-signup?

Yeah, I almost wrote to them suggesting exactly this, but then I figured what the heck, I'll just sign up. The tool would definitely be helpful, though.
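A minimal sketch of what such an estimator might look like. Plain zlib is used here as a stand-in for Tarsnap's actual format (which deduplicates and compresses variable-length blocks), so treat the result as a rough upper bound; the $0.25/GB-month figure is the storage price quoted elsewhere in this thread.

```python
import os
import zlib

PRICE_PER_GB_MONTH = 0.25  # Tarsnap's storage price at the time, in USD

def estimate_cost(path):
    """Walk a directory and compress every file with zlib as a rough
    stand-in for Tarsnap's compression; returns (raw bytes,
    compressed bytes, estimated dollars per month)."""
    raw = compressed = 0
    for dirpath, _, filenames in os.walk(path):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    data = f.read()
            except OSError:
                continue  # skip unreadable files
            raw += len(data)
            compressed += len(zlib.compress(data, 9))
    dollars = compressed / 1e9 * PRICE_PER_GB_MONTH
    return raw, compressed, dollars

# e.g.: raw, comp, cost = estimate_cost("/home/me/documents")
```

Deduplication would only push the real number lower, so a figure from this sketch is safe to budget against.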

I'd assume the average compression rate to be very low, no? Don't modern cryptosystems have to encrypt then compress to avoid some forms of attack, thus making the compression not very effective?

I'd wager that the output from "modern cryptosystems" is, for all practical purposes, not compressible at all.

Those attacks only work when the attacker has some control over the plaintext. Backup services don't have that problem.

Well, if you back up your entire disk, then there is probably a fair amount of known plaintext...

In the recent web attack instances, it's not just known plaintext, it's using compression to discover unknown plaintext by repeatedly guessing across multiple requests and watching for the request size to change. So an attacker would need the ability to alter your filesystem and make backup requests, repeatedly, for that attack to matter to Tarsnap.
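The length leak those attacks exploit is easy to demonstrate with plain zlib. In this toy sketch the secret and the guesses are made up, and the encryption step is omitted entirely, since a length-preserving cipher wouldn't change the lengths being compared:

```python
import zlib

# Toy CRIME-style oracle: the attacker controls part of the plaintext
# and observes only the compressed length. The secret is hypothetical.
SECRET = b"secret=1234567890abcdef;"

def oracle_len(attacker_data: bytes) -> int:
    return len(zlib.compress(SECRET + attacker_data))

# A guess matching the secret's prefix is absorbed by a DEFLATE
# back-reference, so it compresses to fewer bytes than a wrong guess.
matching = oracle_len(b"secret=1234567890")
wrong = oracle_len(b"secret=zyxwvutsrq")
print(matching, wrong)
```

Repeating this byte-by-byte is how the web attacks recover the secret, which is why the attacker needs to inject guesses and observe many requests, a position an attacker of a backup service isn't in.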

So tarsnap compresses then encrypts?

Guys, don't confuse the crypto issues here.

The whole point to tarsnap is to be able to safely encrypt any and all of your data, including compressed data. We wouldn't expect that a .tar.gz being backed up is somehow "less secure" than the .tar file, we'd demand the same security for both.

Compression becomes a problem when it can be used as an "oracle" into the key used for a given stream of ciphertext. The reason TLS is susceptible is because the attacker can MITM and control at least some aspects of the plaintext or shared client-server state, and iterate repeatedly to refine their guesses.

These issues simply don't apply in the same way to backing up files.

I'm sure there are still theoretical issues that would need to be worked through in deciding how you'd do something like this, issues which could be better explained by any of the many crypto types who hang out here. But don't cargo-cult your treatment of crypto.

I haven't actually checked that; all I mean to say is that it would be relatively safe to do so.


And it de-duplicates as well.

A strong construction should be impossible to compress because encryption maximizes bit entropy, which should make the ciphertext indistinguishable from a PRF (i.e., a random noise source).
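That intuition is easy to check empirically. A toy comparison, using random bytes as a stand-in for well-encrypted output:

```python
import os
import zlib

# Well-encrypted data should be indistinguishable from random bytes,
# so os.urandom stands in for ciphertext here.
ciphertext_like = os.urandom(100_000)
plaintext_like = b"the quick brown fox jumps over the lazy dog\n" * 2300

print(len(zlib.compress(ciphertext_like)))  # a little *larger* than the input
print(len(zlib.compress(plaintext_like)))   # shrinks to a tiny fraction
```

The random case actually grows slightly, since DEFLATE falls back to stored blocks plus framing overhead when it can find nothing to compress.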

MAC, decrypt, decompress... always in that order.

Right, encrypt-then-compress is useless. And compress-then-encrypt is unsafe in some cases[1]. Apparently backup isn't one of them, since the attacker needs to be able to choose the plaintext.


Not really. If you compressed your data on Amazon, you'd get a lower per-GB price too. I think tarsnap is good, and you're paying him for the service of interfacing to Amazon for you. It's not accurate to say his price of compressed storage is comparable to uncompressed storage.

You could mitigate the bus factor of 1 yourself by using more than one backup service, which is good practice anyway.

Very true, and in fact that is what I do. I use my VPS provider's backup system for snapshotting my servers and an offsite file store for additional backups.

However, if my choice for the latter is between a sole trader and an organisation with 20-30 employees and multiple directors, the latter is obviously the more sensible choice. No matter how much more you like the tools or respect the owner, with something as important as backups you need to go with the most reliable option.

Sure, I could use tarsnap and another file store for three backup locations (which may be something I should do) but then the argument is why not go with two companies that have a bus factor of more than 1?

For my part, it's a closely related reason: I'd sign up for the service today if the client were Open Source, rather than "look but don't touch" source.

If you're using Tarsnap, and you hear it fails, then quickly make a new backup somewhere else. The chance of your primary storage failing during that unlikely and limited window seems low.

Yes, that could work; however, you have now lost your historical backups and still exposed yourself to an unnecessary risk.

Again, sorry Colin for this, but there is a 1/500,000 chance he will be struck by lightning this year [1]. Amazon S3 (the backend of Tarsnap) has a durability of 99.999999999% [2], so you are 200 thousand times more likely to lose your backup because of a lightning strike than due to corruption on S3.
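For what it's worth, the arithmetic behind that ratio checks out, using the figures cited above:

```python
# Sanity check of the lightning-vs-S3 comparison (figures as cited).
p_lightning = 1 / 500_000        # annual odds of being struck by lightning
p_s3_loss = 1 - 0.99999999999    # S3's stated eleven-nines durability
ratio = p_lightning / p_s3_loss
print(f"{ratio:,.0f}x")          # roughly 200,000x
```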

[1]:http://en.wikipedia.org/wiki/Lightning_strike [2]: http://aws.amazon.com/s3/faqs/

I think the fact that Colin handles everything himself increases my trust in tarsnap, because he clearly knows what he is doing. I trust it in a way that I would not trust a faceless corporation like crashplan, even though they also advertise client-side encryption.

I have a secondary emergency backup in a bank vault, so even in the unlikely event that tarsnap were to crash and burn at the same time my disk fails, I still have another backup, it's just 1-2 months old.

Honestly, I'm not sure I would be a tarsnap customer if it were a larger operation where I could not gauge the trustworthiness and skill of all the involved parties.

Why do you need Colin alive in order to use tarsnap?

(God, that's a strange sentence to write. Sorry about that, Colin.)

It runs through servers that he manages (it doesn't talk directly to S3). If those servers are down you can't access your backups.

I don't carry EC2 servers around in my backpack. Me getting hit by a bus wouldn't take the servers down immediately.

If you get hit by a bus, who alerts your users and how? If your users aren't alerted, they may not realise that anything is amiss until AWS freeze your account for non-payment (for example).

I can't say that I spend a lot of time keeping up-to-date on the project leaders' obituaries for services I use. I don't imagine I'm unusual in this regard :)

Edit: This is just a rhetorical question, not a demand for explanation :) - just saying that being hit by a bus does not automatically mean that users are shifting off your service in 24 hours.

Colin has said he does have a system: http://mail.tarsnap.com/tarsnap-users/msg00849.html

Assuming the S3 bill is the next day...

Remember when Colin Percival announced his Tarsnap logo competition on HN?

There'd be a cash prize and everything. I am sure several people worked on logos - some excellent, some less so. I'm frankly surprised to see that this is the result:


Really? Did some of us just waste our time so that Colin could have his 8-year-old nephew whip up a blurry logo? It's of course his right to do so, but it does make me wonder.

Why are you surprised by the result? It's distinctive, easily recognisable, and the keyhole makes it clear that the product's focus is on security.

"During this time, 83 people submitted over 100 designs . . . The winner, as promised, received $500; I decided to create a second prize of $100 for the designer of the other logo, in part as an apology for the time I took up asking him for revisions" http://www.daemonology.net/blog/2013-12-16-tarsnap-logo-cont...

There's also a square version here: https://www.freebsdfoundation.org/donate/sponsors

eh... Without having any skin in the game, I would say this logo design adequately conveys the meaning: disks, a lock, an "authoritative"-looking font, a succinct tagline.

I agree the blurriness could be improved, though.

As for people entering a contest... they can't all be winners.

The blurriness is the result of antialiasing. I start with an EPS; can you tell me how to convert this without running into that problem? I'm afraid I'm not very good at anything graphical.

Often it requires a bit of manual hinting. For example, [1] was just resizing the vector and allowing antialiasing to do its thing, and [2] is the result of resizing it and then dragging vertices to pixel or half-pixels.

[1]: http://i.imgur.com/Urs6S3r.png [2]: http://i.imgur.com/5EMBxaV.png

Ugh. I was hoping there was some automatic way of doing this.

If you can first convert the logo to PDF, then you could grab xpdf or poppler, then try:

  pdftoppm -png -aa no -r <dpi> logo.pdf logo

You'll have to experiment with the dpi until the logo comes out the right size. The dpi doesn't have to be an integer.

Well, I find this logo to be excellent. Easy on the eyes, and the motto is to the point.

So, I guess it's about time I start using Tarsnap :)

No, seriously. What do most people use it for? Simply creating a daily backup of their hard drives? Also, are there any business users who use it to backup an entire organisation's systems?

Btw, more on-topic, I'm reminded of this quote, which I love, from Jeff Bezos: "There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."

I've always thought that was a profound sentiment, and I've always wondered if, in the long run, that's the way to make a business survive.

What do most people use [Tarsnap] for? Simply creating a daily backup of their hard drives?

This is all anecdotal, but I think most people are at least somewhat selective in what they back up. In my case, I have some servers where I back up /, but on my laptop I only back up my home directory because I know if my laptop dies I'll be reinstalling FreeBSD from scratch anyway.

Also, are there any business users who use it to backup an entire organisation's systems?

I think so, but I'm not going to name names. Maybe some Tarsnap customers will reply here.

this quote, which I love, from Jeff Bezos: "There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."

Yes, I found that quite inspiring too. And it has certainly worked well for Amazon.

>> Yes, I found that quite inspiring too. And it has certainly worked well for Amazon.

It's worked well for Wal-mart too, but I don't think anyone would praise them for doing so.

The fact that enough people choose to buy stuff there, making it a sustainable business model, is praise enough.

> are there any business users who use it to backup an entire organisation's systems?

Stripe has been a happy Tarsnap customer for several years. It's robust and thoughtfully designed, Colin has provided great support, and "backups for the truly paranoid" is a pitch that very much resonates with us.

Wow, that's a hell of a testimonial. :)

Definitely vault physical backup copies of private key material in a safety deposit box at a bank. Preferably two copies, in two banks, in two different timezones.

I keep all of the company registers and records, receipts, contracts, invoices, etc in Google Drive. We need to use Google Drive as it is the best of the options open to us for securely sharing files with accountants and lawyers.

I use https://github.com/Grive/grive to sync Google Drive to an encrypted local directory on my workstation at home. This is done daily.

Then I use TarSnap to backup that local directory.

Effectively I create an on-site copy of important docs from Google Drive, and then use TarSnap to create a secure and trusted off-site backup of those important docs.

Oh, and we also use TarSnap to backup our company (product) databases once a day too (though interim backups are stored closer to the servers too).

Our entire organisation is handled thus:

1) Systems and code via Github, and pulled often to one place (that local workstation) and then backed up.

2) Data dumps pulled locally and sent over to TarSnap of all product/customer data and all company files.

And the only on-faith thing is file attachments in the customer data:

3) Files via Amazon S3, trusting the durability of S3 and security of our interface to it.

This is all disaster recovery stuff. Secure, trusted backups.

"There are two kinds of companies: Those that work to try to charge more and those that work to charge less. We will be the second."

That works well when your prices are based on third-party prices and you can optimize your business to keep your margin. But a service-oriented company, or a freelancer, will usually try to improve their skills and experience so they can charge more, because the end result is worth it.

In the case of a freelancer, you are optimising your prices so that you can charge more, while optimising your skills/performance so people need less of your time.

You can make money by working more effectively and efficiently while still charging less for the same "product".

But that wouldn't change my bottom line at all; I would just work faster/more and need more clients. My clients usually are fine with paying more because they feel that my experience has value vs. the next guy who just cobbles together their application without thinking much about maintenance, architecture, testing, etc. I might even be slower than that guy.

It can easily change your bottom line.

If you can do the task in a quarter of the time and charge half as much for it then you can double your bottom line.
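The arithmetic behind that claim, with made-up numbers:

```python
# Made-up numbers: a task that used to take 8 hours, billed at $1000.
old_time, old_price = 8.0, 1000.0
# Do it in a quarter of the time, charge half as much:
new_time, new_price = old_time / 4, old_price / 2

old_rate = old_price / old_time  # dollars earned per hour worked
new_rate = new_price / new_time
print(old_rate, new_rate)        # the effective hourly rate doubles
```

Of course, this only translates to the bottom line if the freed-up hours can actually be filled with more paying work, which is the parent's objection.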

Well, I'm not convinced. For one, better skills/more experience doesn't necessarily make you faster; it might even make you slower, because you put more effort into things others simply ignore, but the quality will be higher. Anyway, I see your point, I just don't think it translates into reality all that well. Not for long-term projects at least, which are hard to estimate anyway.

How does Tarsnap compare to Arq when using a Mac? Both do local encryption and incremental backups with deduplication. However, I pay $0.03 per GB-month on S3 with Arq and $0.25 on Tarsnap, almost an order of magnitude more expensive. And yet I constantly hear people saying that Tarsnap should be more expensive when I simply compare it to Arq and think that it already is very expensive.

You paid $40 for Arq, though. Assuming you intend to keep it updated, that's $30/year¹, which buys you 120 GB-months per year on Tarsnap. It also works everywhere there's a Unix shell (including Windows with Cygwin), not just Mac OS X.

¹ (assuming 16 months between major versions, as in 3 → 4)

> It also works everywhere there's a Unix shell

This really is Tarsnap's key differentiator (for me).

It's also worth noting that Arq does have command line functionality[1] (on OSX) as well as an open source CLI restore tool[2].

[1] http://www.haystacksoftware.com/support/arq_help/pages/scrip...

[2] http://sreitshamer.github.io/arq_restore/

120GB * ($0.25/GB)/Month * 12 Months/Year = $360/Year correct?

So, $30/Year buys you only 10GB-month of storage on Tarsnap.

Sorry, I wasn't clear. I didn't mean you could get 120GB of storage for a year; I meant you could buy 120 units of "GB-month" each year (like kWh, but for storage).

You could distribute that equally throughout the year, using 10GB-month per month, yes. Or you could use 1GB-month in the first month, 2GB-month in the second, etc, as your storage needs grew.
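In code form, with the same numbers:

```python
# "GB-month" works like kWh: $30/year at $0.25 per GB-month is a
# budget of 120 units, spendable in any shape over the year.
budget = 30 / 0.25                  # 120 GB-months
flat = sum([10] * 12)               # hold 10 GB every month: uses 120
growing = sum(range(1, 13))         # 1 GB, 2 GB, ... 12 GB: uses only 78
print(budget, flat, growing)
```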

  TARGETS="/etc /home /var /root /srv"
  ARCHIVE="$(hostname)_$(date +%Y-%m-%d_%H%M)"

  tarsnap -v -c -f "$ARCHIVE" $TARGETS

Ugh. This works, but it's ugly. There are a lot of automated tarsnap assistants on github; I made one myself.[1] They can do helpful things like allow for daily, weekly, and monthly backups, keeping a certain number of each. (And personally I like giving each folder its own archive, and I dislike having slashes in the archive names.)

[1] https://github.com/pronoiac/tarsnap-cron - which might be helpful to someone else. It splits up the archiving and the pruning of old archives, so you can do those with different keys. That part is as yet untested; it probably requires a tarsnap fsck if you do those on different systems. I was going to forgo the plug, but it might be helpful, and I'm about to get some sleep, so I'll probably miss the best time to contribute to the discussion.
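As a sketch of the retention logic such helpers implement (the policy numbers and the archive-name format here are hypothetical, not tarsnap-cron's actual defaults):

```python
from datetime import date, timedelta

def to_keep(archive_dates, today, dailies=7, weeklies=4, monthlies=6):
    """Hypothetical retention policy: keep the last `dailies` archives,
    the last `weeklies` Monday archives, and the last `monthlies`
    first-of-month archives. Dates would be parsed from archive names
    like "host_2014-04-02_0300"."""
    past = sorted(d for d in archive_dates if d <= today)
    keep = set(past[-dailies:])
    keep.update([d for d in past if d.weekday() == 0][-weeklies:])
    keep.update([d for d in past if d.day == 1][-monthlies:])
    return keep

# With 90 days of daily archives ending 2014-04-02:
today = date(2014, 4, 2)
dates = [today - timedelta(days=n) for n in range(90)]
keep = to_keep(dates, today)
prune = sorted(d for d in dates if d not in keep)
print(len(keep), "kept,", len(prune), "pruned")
```

Everything outside the keep set would then be handed to `tarsnap -d`, ideally with a key that can delete but not read.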

I've always thought that was a profound sentiment, and I've always wondered if, in the long run, that's the way to make a business survive.

That depends on your customer and what your product positioning is. If, for example, you run a discount store, a la Amazon or Wal-Mart, then that advice will work well for you. If, however, you are running a company that sells products that are positioned as premium, lowering your prices can hurt you.

Cutting costs on premium products immediately changes customer perception, such as when Starter was purchased by Wal-Mart or Rock & Republic was purchased by Kohl's.

I only use it to backup configs and game saves. So far, I've used less than $0.05, backing up 73M with 93 days of history (I haven't deleted anything).

That is a daily snapshot of my /etc, important dotfiles, and game saves. I haven't needed it yet, but I'm sure I will be glad when I do.

EDIT: I just have a simple script in my cron.daily that backs up files and folders that I tell it to.

I have something close to 100G of document scans, database backups and /etc for all my servers in there. Use it to back up my office file server as well.

> What do most people use it for?

Weekly backups of my VPSs, daily backups of my email and Tiny Tiny RSS database. I'm probably worthless as a customer :|

I'm probably worthless as a customer

I find that the customers who are paying $0.01/month often provide the most enthusiastic word-of-mouth advertising. You're not useless at all, no matter how little you have stored. ;-)

[I haven't looked up your account, so I don't know if you're over or under $0.01/month, but the precise number really doesn't matter.]

> What do most people use it for?

I use it for daily DB backups for 2 small website databases. It's cost me about 2 cents for 4 months.

I use it to backup my entire machine from /

It's only about 2MB per day to upload. I have a daily cron that does the backup and deletes all but the latest two images.

Quality of service, along with Colin's amazing pricing, make it an easy win for me. I never feel like I need to stop and waste time thinking about other options.

Genuinely curious, how is this different from using rsync with a shell to S3?

Maybe nothing. I didn't look into other options because Tarsnap is already cheaper than a cup of coffee per month and it's run by someone I trust and want to support.

I'm with you on that it's cheap and also run by someone trustworthy. However, what happens if he decides to retire or something similar? At least S3 is almost surely going to be around.

If he disappears and suddenly my backups aren't available, then I'll start backing up somewhere else. No biggie.

If he disappears, suddenly my backups aren't available, AND my drive blows up, all at the same time, then I'll be sorry I guess.

I would like to use it to backup my photos and videos offsite (they are currently on a ZFS RAIDZ machine with snapshots), but, unfortunately, the cost is prohibitive. I currently use SpiderOak, which is overkill for my purposes, but comes out to about $5/mo for 100 GB.

Something like Popcorn Time is the future, both as a client and in moving to a shared backend where people can more easily split hosting fees. After all, how many copies of The Big Lebowski need to be duplicated all over the place if there's plenty of bandwidth to serve it? A hosting service like Mega with the security of Tarsnap that could host and de-duplicate torrents would be very interesting. Imagine being able to instantly watch a movie if someone else on the same service happens to have already got it.

>A hosting service like Mega with the security of Tarsnap that could host and de-duplicate torrents would be very interesting.

Tarsnap's security is based on client-side encryption. They cannot read the plaintext. Therefore if more than one person uploads the same file, they will have two different encrypted versions of the same file, which cannot be deduplicated.

You could use convergent encryption, but then users can probe for each other's files.
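A toy sketch of convergent encryption and the trade-off it makes (the SHA-256-based stream cipher below is illustrative only, not a vetted construction):

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Illustrative counter-mode keystream from SHA-256; not a real cipher.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(plaintext: bytes) -> bytes:
    # The key *is* a hash of the plaintext, so identical files produce
    # identical ciphertexts -- which is exactly what makes them dedupable.
    key = hashlib.sha256(plaintext).digest()
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

movie = b"some popular film, byte for byte"
# Two independent users upload the same bytes: the server sees one blob...
assert convergent_encrypt(movie) == convergent_encrypt(movie)
# ...but anyone who can guess a file's exact contents derives the same key
# and ciphertext, and can thereby confirm that someone has it stored.
```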

> I've always thought that was a profound sentiment, and I've always wondered if, in the long run, that's the way to make a business survive.

In the presence of competition this is the case. Fat margins attract competitors.

If I wanted to store, say, 100 GB of photographs (jpeg and raw formats), roughly how much would it cost me per month just for the storage? Let's say I would upload one big archive of photos in the beginning, and then re-archive the photo collection whenever I add significantly more photos to the collection (after a vacation or a birthday).

To go just by the stated pricing of Tarsnap, this would cost me $25 per month for the storage. But then I also see mention of users with terabytes worth of archives who pay less than $10 a month. I read the FAQ entry which explains how this happens, but that does not really tell me whether I can hope for such savings when it comes to photos. Do photo collections (raw/jpeg/both) "shrink" significantly from the deduplication and compression of Tarsnap? I think they don't (and that the savings apply to incremental backups), but it would be great if you could make this clear. Thank you!

Tarsnap won't be able to shrink your photos much using deduplication and/or compression. I'd guess users paying $10/mo with terabytes of data stored benefit massively from both, but it all depends on your usage.

If you are going to store 100GB of photos with Tarsnap, I'd guess it would be close to $25 as you said. If you just want your photo collection for disaster recovery, you could check out Glacier instead, which is a lot cheaper.

I'll start with my disclaimer that I'm founder of Trovebox and this isn't a sales pitch because we've focused Trovebox for business use.

Great, that's out of the way. Storing and archiving photos & videos is near and dear to my heart [1] and I believe that cloud storage is one piece of answering yes to the question of "will I have my photos in 50 years?". My entire Shuttleworth Fellowship is based on this.

Here's my $.02.

Cost - we haven't yet, but we have all the pieces to use Amazon Glacier for storage of high-resolution originals. That means storing 100 GB will cost you $1/month. There are additional costs to keep thumbnails in S3 for immediate access -- my estimate is <$3 all-inclusive for that 100GB.

Ownership / Portability - a big part of my fellowship is to see how a hosted service (ala Flickr) can provide 100% ownership and portability. The solution is to let users (optionally) bring their own storage. So you can use the Trovebox software connected to your own Glacier and S3 bucket.

Functionality - I think organization, viewing, sharing and archiving should be merged together. Instead of having your sharable photos in one place and your archives in another --- why aren't they combined?

Raw - we support RAW and do conversion to JPEG.

Open Source - Check [2].

Mobile - Check [3] [4] [5] [6].

API - Check [7].

[1] http://www.shuttleworthfoundation.org/fellows/current/jaisen...

[2] https://github.com/photo

[3] https://github.com/photo/mobile-ios

[4] https://github.com/photo/mobile-android

[5] https://itunes.apple.com/us/app/trovebox/id511845345?mt=8

[6] https://play.google.com/store/apps/details?id=com.trovebox.a...

[7] https://trovebox.com/documentation

I work with wedding & portrait photographers who shoot 100GB or more in a single weekend. So, within a couple years they could be looking at $100/mo+. And it just grows from there.

This is the fundamental problem I've always seen with cloud storage for photo/video pros (or hobbyists): they need long-term storage but the bill just keeps growing. Should they be expected to pay $1k/mo after they've been in business 10 years?


On a separate note, how are you doing RAW <-> JPEG conversion on the server?

The bill would keep going up, yes. But that's the nature of any growing collection - even if you're using a drobo at home.

The only hope is that cloud storage goes down over those same 10 years at a rate which makes it continually affordable. But it won't always be a fixed cost since the number of photos keeps going up.

The RAW -> JPEG conversion is done using ufraw [1]. We originally tried extracting the thumbnails so the JPEGs would use conversion settings from the camera but most of the thumbnails are too small to do anything useful with.

[1] http://ufraw.sourceforge.net/

That also assumes we've reached "peak megapixel" ... it seems to have plateaued recently but I'm not convinced a RAW file in 5 years will be about the same size as now.

I did an experiment where my RAW & converted JPEGs ended up compressed by about 12% overall, if that's any help.

A link to tarsnap on that page would be a big improvement. http://www.tarsnap.com/

That's... rather embarrassing. Usually I remember to linkify the first time I mention Tarsnap in a blog post.

Fixed, thanks!

It seems that may qualify for one of your bug bounties at the $1 tier. Pay the man!

I've awarded bug bounties for typos on the Tarsnap website, but my blog is strictly out of scope. ;-)

No worries! I only noticed because I was actually looking for it.

Question about Tarsnap backup strategies and worst case scenarios.

How would one go about making sure that when a server is compromised, the malicious attacker wouldn't be able to delete all the tarsnap archives for that machine? The tarsnap.key is stored on the server itself, and that's all you need to delete archives as well. Of course, you're already properly effed when an attacker has root access to the machine, but offsite backups should still be safe imho.

That's why on some of my servers I have a 'pull' backup strategy in place, where a remote server connects to the machine to be backed up and pulls a backup, so in the event that the server is compromised, no backups can be deleted. Is this something that can be achieved with Tarsnap as well?

You can use tarsnap-keymgmt (http://www.tarsnap.com/man-tarsnap-keymgmt.1.html) to create a key file with a subset of the keys for a machine.

Exactly. Thanks that will work :)

I don't have much insight into what's profit-maximizing in this market, but rather than "public utility pricing" (though I can also see that analogy) I think of it as more like classic small-business pricing, especially in markets where developing some kind of reputation for fairness is deemed important by the owner. I can't think of a good representative example, but it's so well established I'm pretty sure I've run across examples in 19th-century American novels of this sort of "fair price with a modest profit" ethos.

developing some kind of reputation for fairness is deemed important by the owner

That's certainly part of it. Let's face it, a lot of people use Tarsnap because of my reputation; I'd like to have Tarsnap contribute to my reputation rather than merely taking advantage of it.

I'm pretty sure I've run across examples in 19th-century American novels of this sort of "fair price with a modest profit" ethos.

I guess I really am a bit anachronistic...

cperciva hopes, by strict attention to business, combined with moderate charges, to merit a fair share of patronage and support.

(stolen from tom)

Would tarsnap do well to become a non-profit?

He could still pay himself a fair salary but he'd avoid any tax liability for the business and I guess he could accept donations too?

I don't know much about it, but if your only goal is a fair salary I've always assumed a non-profit business structure would make sense? Can someone more knowledgeable chime in?

There's no real reason to turn Tarsnap into a non-profit. At least for now, the work is easily handled by him, he still has a passion for the work, and the project isn't large enough to require any "management".

Would there be any tax benefits though?

Doubtful. Companies which don't make a profit don't pay income tax, whether they're non-profit by purpose or non-profit by circumstance.

Err, well not if your goal is to make a profit.

Right but in his post he states that his goal is to make a fair salary. Non profits allow you to pay yourself a fair salary AFAIK.

It's still a fair bit away from being the cheapest UNIX backup (that also does client-side encryption).

Crashplan, Spideroak and Wuala (for 100GB and under) are a bit cheaper ( http://skeptu.com/tarsnap/100gb ).

It has a nice CLI, however.

A better comparison would be to Cyphertite (https://www.cyphertite.com/), which is roughly on par security-wise.

CrashPlan and co. are somewhat secure, but still not really open about their crypto (and some of them use quite weird choices).

But does CrashPlan do client-side decryption? That is the one weak spot in my current backup provider, Backblaze. They encrypt and protect your data, but when you need to restore it, it is out in the wild.

They list it as a feature, but I'm not sure I'd trust it, given that their client is closed source.

Why not just run encfs or ecryptfs locally and then backup the encrypted regular files?

Or use duplicity. It even directly supports many cloud-services as backend, including Google Drive ($2 for 100GB, $10 for 1TB) Dropbox, Onedrive, etc. It encrypts, deduplicates, stores old versions, etc.
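A minimal sketch of what that looks like (the source path and the backend URL are placeholders, not recommendations; duplicity encrypts with GnuPG and stores incremental chains, so old versions stick around until you prune them):

```shell
# Hypothetical duplicity usage -- paths and backend URL are placeholders.
SRC="$HOME/Documents"
TARGET="file:///mnt/backup/documents"   # s3://, scp://, etc. also work

if command -v duplicity >/dev/null 2>&1; then
    duplicity "$SRC" "$TARGET"                       # full or incremental backup
    duplicity remove-older-than 2M --force "$TARGET" # prune chains older than 2 months
fi
```

Restoring is just the arguments reversed: `duplicity "$TARGET" /some/restore/dir`.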

I'd love to use duplicity, but in the 3rd paragraph of their website is the sentence "Duplicity is still in Beta.".

I'd much rather trust my backups to tarsnap, which is not in beta. Furthermore, and perhaps even more importantly, tarsnap offers support which duplicity does not. When my hard drives decide to die, I really want someone who I can contact if there are any issues restoring from backups.

Check out Duplicati -- same idea as duplicity, but in active development; the main developer posts to his support group frequently. Overall very solid.

Since CrashPlan uses blowfish-448-cbc-sha1 (a weird choice) and may (I'm uncertain on this) have the ability to push configuration changes remotely, I've considered that possibility because of CrashPlan's "unlimited" offer.

Unfortunately, it's barely usable due to recovery issues. You can't mount CrashPlan as a filesystem (well, not without a ton of reverse engineering), so that's not an option unless you're satisfied with all-or-nothing recovery, without the possibility of picking and restoring just certain files of interest.

A middle ground there that would work for restore of particular files is to use ecryptfs/encfs without encrypting filenames. I think ecryptfs at least knows how to do this. Then you can download the file with the proper name, but the contents are encrypted and decrypt them locally.

You could probably also hack up something to figure out locally what encrypted filename corresponds to what regular filename and go fish for the encrypted filename in CrashPlan. It will probably be clunky though.
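A sketch of the eCryptfs variant (the option names here are from memory of the mount.ecryptfs helper and should be double-checked against your man page; the paths are made up):

```shell
# Hypothetical setup: eCryptfs encrypts file contents but, with filename
# crypto disabled, leaves names readable -- so a remote backup of the
# lower (ciphertext) directory is still browsable by filename.
LOWER="$HOME/.secret"   # ciphertext lives here; this is what you back up
UPPER="$HOME/secret"    # cleartext view while mounted
OPTS="ecryptfs_cipher=aes,ecryptfs_key_bytes=16,ecryptfs_enable_filename_crypto=n"

# Requires root and the ecryptfs kernel module; skipped if unavailable.
if [ "$(id -u)" -eq 0 ] && command -v mount.ecryptfs >/dev/null 2>&1; then
    mount -t ecryptfs "$LOWER" "$UPPER" -o "$OPTS"
fi
```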

You can absolutely "pick and restore just certain files of interest" when restoring with CrashPlan.

That's if you only rely on CrashPlan-provided encryption, which is certainly not the cream of the crop.

We were talking about eCryptfs/encFS-encrypted copy, where filenames are (usually) encrypted. That means navigating around names like `l00Dqf,A49VqDd8AveLMrbBE` or `qR,bmE-73cA2H6wOxZxlKSwD`.

How do you share a file encrypted using encfs with someone else?


And if you don't want some government to be able to take down a single provider, as happened with the Lavabit email service, go with a distributed (multi-provider) backup solution like Tahoe-LAFS.


The proxy is run locally and all encryption happens locally, so it's end-to-end encrypted.

I love tarsnap. Thank you Colin! :)

  $ tarsnap --print-stats
                                       Total size    Compressed size
  All archives                               535 GB           233 GB
    (unique data)                            1.2 GB           438 MB

What is "unique data"? Does it do de-duplication across users? Are you paying only for unique compressed data?


Deduplication across users isn't possible due to encryption, but it is done locally on your own data, so it is true that you only pay for unique and compressed data.
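A toy illustration of why (this is not Tarsnap's actual scheme): if stored chunk identifiers are derived with a per-user key, identical plaintext on two accounts produces different blobs, so the server has nothing to match up across users:

```python
import hashlib
import hmac

# Toy illustration, not Tarsnap's actual crypto: derive a stored chunk
# identifier from a per-user key. Same data + different keys -> different
# identifiers, so the server can't deduplicate across users.
def stored_id(user_key: bytes, chunk: bytes) -> str:
    return hmac.new(user_key, chunk, hashlib.sha256).hexdigest()

chunk = b"the same 64 KiB of data on two different machines"
alice = stored_id(b"alice-key", chunk)
bob = stored_id(b"bob-key", chunk)

# Different users, same data -> different identifiers (no cross-user dedup);
# the same user re-uploading the same data gets an identical identifier,
# which is why local deduplication still works.
```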

> This will no doubt annoy my friends Patrick McKenzie and Thomas Ptacek, who for years have been telling me that I should raise Tarsnap's prices. But while Thomas accuses me of running Tarsnap like a public utility rather than a business, and thinks this is a characteristic to be avoided, I see this as a high compliment (...)

That's really nice. I'd rather have a fair public utility than a business that exists for the sake of concentrating money.

I won't think twice about it the day I need to sign up for Tarsnap.

I wonder how sustainable it is though. cperciva could make a boatload of money doing something else for someone else. Right now maybe that's not an attractive option, but perhaps at some point he'll, say, marry, have kids, and want some more cash and financial security.

I did say that I'm paying myself what I consider to be a reasonable salary.

perhaps at some point he'll, say, marry, have kids, and want some more cash and financial security.

I am currently single, but if/when I find the right girl I don't expect finances to be the limiting factor.

Could you replace yourself in the business for what you're paying yourself?

Subject to the usual caveats about no person ever being a perfect replacement for another person: Yes.

So to a first approximation you're valuing the business at 0. You're generating enough free cash flow to pay the salary needed to keep it running but no more. If you wanted to delegate that responsibility you, as the owner, couldn't get anything out of the business as the free cash flow would be 0.

Nothing wrong with that really, but if you really want to strictly price it as a utility there needs to be some form of return on equity. Even Amazon generates some free cash flow; they've just been reinvesting it all, so they're not generating any income.

He said he's in a good financial situation. He did not say he paid himself the lowest possible amount of salary he could live off, and let the business run without any safety buffer of money.

I'm just going to assume that Colin is smart enough that he knows how to price his service. After all, he's the only one with the data to really know Tarsnap's financial situation.

Sorry for the poor wording and horrible grammar.

May be a bit OT, but is there any way to create an archive locally and see its size before paying, to figure out how much space I'd need?

Been looking at tarsnap for a while but never gotten around to actually trying it, and my quick naive calculations for backing up my ~ make it sound too expensive for me, even though it's probably the best alternative I've seen so far.

Using options --dry-run and --print-stats would probably give you this information

''Don't really create an archive; just simulate doing so. The list of paths added to an archive (if the -v option is used) and statistics printed (if the --print-stats option is used) will be identical to if tarsnap is run without the --dry-run option.''


Missed the dry-run option, but it still seems to require a key, which requires you to pay. Talked with some people on IRC and it seems it's not possible at the moment. Someone mentioned gzip providing a good enough estimate of the compression, without taking the deduplication into account.

Gzip would provide a fine-enough approximation, since both that and tarsnap use zlib. Tarsnap also has deduplication, so equal (or almost-equal) files will take up much less space.
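Something like this hypothetical helper gives the estimate; since deduplication isn't modelled, treat the result as an upper bound:

```shell
# Approximate the compressed size of a directory by streaming a tar
# archive through gzip and counting bytes. Deduplication is ignored,
# so the real tarsnap figure should come in at or below this number.
estimate_compressed_size() {
    tar -cf - -C "$1" . 2>/dev/null | gzip -c | wc -c
}
```

E.g. `estimate_compressed_size ~` prints a byte count for your home directory.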

Is there a way to check the size of your backup without having an account?

We love tarsnap; it's the easiest and most secure way to offsite our database dumps. No frills, no beautiful design, just solid software written by cperciva, who is a crypto wizard.

Where do I sign up for the Patrick McKenzie surcharge?

Send me an email. But I'll need to ask Patrick how much of a surcharge I should add -- or wait for the blog post it sounds like he's going to write (https://twitter.com/patio11/status/451262179858083840).

That would be interesting to read. I guess the slightly less generous/fair option would probably be:

"we are thus launching a new price layer at .25, new users will be automatically enrolled in this one and you can switch by doing X"

This way, people who want to actually donate more, or who don't have a strong preference, would still give you more money.

That's a bit too much like a loyalty penalty for my taste: the default option makes things better for new users than for established customers.

(Of course loyalty penalties are commonplace -- from phone companies, car insurers, etc. -- but my mental model of cperciva says he probably doesn't like them.)

my mental model of cperciva says he probably doesn't like them

Exactly true. I was looking at that comment trying to put my finger on why I hated the idea so much, and you're right: It's because it would be a loyalty penalty. I want to be fair to everyone.

As a customer of about 2 years now, it is much appreciated. I feel much better doing business with corporations when I get the feeling that they do their business fairly. Tarsnap succeeds at this.

Tarsnap is excellent for mission-critical data.

But for most personal uses, a Glacier-backed variant of tarsnap would be extremely appealing.

You really want something that is orders of magnitude slower and several times as expensive?


Even excluding all the details of tarsnap's design, using glacier for backups has always seemed totally wrong-headed to me for a simple reason: it disincentivizes checking up on your backups, which is a key part of doing backups. No point waiting until you lose your data to realize that your backups had developed a glitch.

Add any minimal restore cost and Glacier costs about the same as regular storage -- with a lot less convenience. I honestly can't think of a use case for something like Glacier, where storage is cheap but reading/writing is expensive.

I would personally be totally fine with a backup solution which is orders of magnitude slower and several times as expensive in the rare restore case if it meant that the typical "write only for long periods of time" scenario was significantly cheaper. My personal backups are not something I'd ever need to restore in a hurry.

The one big downside to the "cheap backup, expensive restore" approach is that it discourages testing your restore.

Thanks for that; I'm using tarsnap on a FreeBSD box and a couple of RPi clients, and it works great! Always nice to get a price cut!

PS. Are there any plans to make restore a little bit faster? It's been a while since I restored some files, but the process was so slow that I thought there was a connection problem; then I googled and found out that it's okay for it to be slow :-)

Improving extract performance is my big project right now.

I'm using tarsnap in combination with https://github.com/Gestas/Tarsnap-generations for rolling backups.

Are there any better tools out there to schedule backups and purge old backups? I'd be happy with a simple tool which just keeps 7 daily backups and that's it.
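For reference, a minimal sketch of such a tool (the "mybackup" prefix and date-stamped names are my invention; `-c`, `-d`, `-f` and `--list-archives` are real tarsnap options):

```shell
#!/bin/sh
# Hypothetical rolling-backup script: create one archive per day and
# keep only the newest 7 with our (made-up) name prefix.
PREFIX="mybackup"
KEEP=7

backup_and_rotate() {
    # Create today's archive (-c creates, -f names it).
    tarsnap -c -f "${PREFIX}-$(date +%Y-%m-%d)" /home /etc

    # Date-stamped names sort chronologically, so reverse-sort and
    # delete everything after the first $KEEP entries.
    tarsnap --list-archives | grep "^${PREFIX}-" | sort -r |
        tail -n +$((KEEP + 1)) |
        while read -r archive; do
            tarsnap -d -f "$archive"
        done
}

# Run only if the tarsnap client is actually installed.
if command -v tarsnap >/dev/null 2>&1; then
    backup_and_rotate
fi
```

Dropped into a daily cron job, this keeps a rolling week of backups; deleting an archive only frees the storage for blocks no surviving archive references, thanks to deduplication.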

Does anyone know if Heroku will cut their prices after the Amazon AWS price cut?

I have a question for Colin.

What's the second-most-critical bug Tarsnap has ever experienced? Just curious.

There was a bug in Tarsnap 1.0.20 (February 2009) which had a ~ 0.2% chance of causing silent data corruption if people were using the new --checkpoint-bytes option. (The bug actually had two possible consequences, silent data corruption and exiting with an error, and my analysis showed that the second was about 500x more likely to occur first.)

Fortunately this was a new feature and I was able to identify which people had used it (since it relies on server-side functionality and I had good logs) so I could email all the potentially-affected users to warn them.

What was the cause?

Uhh... let me look back at that code. My memory isn't what it used to be... right, now I remember. Sort of.

So, Tarsnap uses a "chunkification cache" to speed up archiving; when it chews through a file and splits it into chunks, it records "file X was N bytes, had inode #I, and was last modified at time T, and here's the chunks it was split into". The next time it sees the file, it starts by stat()ing the file and if those parameters match it reuses the chunk list rather than reading the file from disk and splitting it into chunks again.

With checkpointing enabled, if a checkpoint occurred in the middle of a file, a truncated entry (to be specific, the chunk list for the portion of the file which had been processed) would be stored in the chunkification cache. If a later tarsnap process read that entry, it was possible that it would archive the file incorrectly (I think it would be truncated, but I'm not absolutely certain right now). The fix was simply to not use an entry from the chunkification cache if it wasn't internally consistent.
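To make that concrete, here's a hypothetical sketch of such a cache (fixed-size chunks and plain SHA-256 digests stand in for Tarsnap's real variable-size, keyed chunking), including the "reject internally inconsistent entries" fix:

```python
import os
import hashlib

# Hypothetical sketch in the spirit of the description above -- NOT
# Tarsnap's actual code.
CHUNK = 65536

def chunkify(path):
    """Split a file into chunks, returning a list of (length, digest)."""
    chunks = []
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            chunks.append((len(data), hashlib.sha256(data).hexdigest()))
    return chunks

def cached_chunkify(path, cache):
    """Reuse a cached chunk list only if (size, inode, mtime) match AND
    the entry is internally consistent: chunk lengths sum to the size."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None:
        size, inode, mtime, chunks = entry
        if (size, inode, mtime) == (st.st_size, st.st_ino, st.st_mtime) \
                and sum(length for length, _ in chunks) == size:
            return chunks
    chunks = chunkify(path)
    cache[path] = (st.st_size, st.st_ino, st.st_mtime, chunks)
    return chunks
```

A truncated entry written at a mid-file checkpoint fails the length-sum check, so the file gets re-read and re-chunked instead of being archived short.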
