The single reason (other than, er, fecklessness on my part) why I haven't tried Tarsnap is that it seems difficult to predict how much using Tarsnap would actually cost me. I would need to know (1) how much space my data will take up after compression and deduplication, and (2) how much bandwidth my incremental backups will need -- again, after compression and deduplication.
Wouldn't It Be Nice If there were a "predict my Tarsnap costs" tool? You download it and point it at your data. It compresses it and identifies duplicated blocks, and says "You would initially be storing about 25 GB of data on the Tarsnap servers, which will cost you about $6.70 per month." It records a bunch of hashes. Then, a little later, you run it again. It identifies changed blocks and compresses the differences (or something), and says "If you do an incremental update like this, you will transfer about 2GB of data, which will cost you about $0.54." With, each time, a disclaimer: "This is only a crude estimate and if you think Tarsnap, Inc., will be in any way bound by it then you're out of your mind."
I'm not sure whether this is a thing that any random third-party person could write, or whether getting the numbers right would require information about exactly what happens on the Tarsnap servers that only Colin has. Perhaps the answer is that doing it right requires secret information, but doing it well enough (e.g., getting within 20%, 95% of the time) is easy with some naive algorithm like "divide everything into 4kB blocks, hash them all, identify duplicates, compress individual unique blocks with gzip; do the same for incremental updates but ignore blocks whose hash hasn't changed".
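For what it's worth, the naive algorithm described above fits in a few lines of Python. This is only a rough sketch of that idea, not how Tarsnap itself measures anything: the function names (`estimate`, `dollars`) are made up for illustration, SHA-256 is an arbitrary choice of hash, and the price of 250 picodollars per byte is an assumption rather than something stated in this thread. That assumed rate does reproduce the "$6.70" and "$0.54" figures quoted above if "GB" is read as 2^30 bytes.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # the "4kB blocks" from the comment above

# ASSUMPTION: 250 picodollars per byte, used for both a month of storage
# and a byte of transfer. Not taken from this thread; pass your own rate.
PICODOLLARS_PER_BYTE = 250


def iter_blocks(data, block_size=BLOCK_SIZE):
    """Yield consecutive fixed-size blocks of a bytes object."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def estimate(files, seen=None):
    """Estimate the compressed size of blocks not seen before.

    files: iterable of bytes objects (file contents).
    seen:  set of block hashes recorded by an earlier run, or None for
           the initial backup.
    Returns (new_compressed_bytes, updated_hash_set).
    """
    seen = set() if seen is None else set(seen)
    new_bytes = 0
    for data in files:
        for block in iter_blocks(data):
            digest = hashlib.sha256(block).digest()
            if digest in seen:
                continue  # duplicate or unchanged block: costs nothing new
            seen.add(digest)
            # zlib uses DEFLATE, the same compression gzip uses,
            # without the per-block gzip header.
            new_bytes += len(zlib.compress(block))
    return new_bytes, seen


def dollars(num_bytes, picodollars_per_byte=PICODOLLARS_PER_BYTE):
    """Convert a byte count to dollars at the assumed rate."""
    return num_bytes * picodollars_per_byte / 1e12


# Initial run: record hashes and estimate stored size.
initial = [b"a" * 8192, b"b" * 4096]
stored, hashes = estimate(initial)

# Later run: only blocks with unseen hashes count towards the transfer.
later = initial + [b"c" * 4096]
transferred, hashes = estimate(later, hashes)

print(f"stored ~{stored} bytes, next update ~{transferred} bytes")
print(f"25 GiB would cost about ${dollars(25 * 2**30):.2f} per month")
```

To point it at real data you would feed `estimate` the contents of your files read in binary mode and keep the returned hash set between runs. Fixed-offset blocks will miss duplicates that have merely shifted position within a file, so treat the result as a crude estimate at best.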
I'm curious: is this a factor stopping other people signing up with Tarsnap? Is it perhaps only a factor for cheapskate individuals like me, and not for the larger organizations that probably represent most of Tarsnap's profits?
Valid point. You can use tarsnap's --dry-run option (along with --print-stats) to find out how much space your data will take up; unfortunately you need to have a key file before you can run that, and you can't create a key file until you've created a tarsnap account and added money to it.
I will be adding a mechanism to allow people to run "keyless" dry runs so that you can install tarsnap and find out how much space your data would take before you spend any money.
Would I be right in thinking that, if the storage cost is small enough not to bother me, then the bandwidth costs are extremely unlikely to be a problem?
It sounds like you are a programmer. At an hourly rate, it probably cost you more just to write that comment.
I think the way it actually works, what would happen is that after my first backup completes my account would go overdrawn and I would lose access to Tarsnap. And then I could walk away and all I'd actually lose would be whatever I'd prepaid. But -- and this is surely irrational, but I bet it isn't only my brain that works this way -- I would then feel really bad because (1) I would be failing to pay for something I had bought and (2) I would have spent money and got nothing in return. Even though (1) Colin surely budgets for a certain amount of attrition and (2) the sum of money involved needn't be bigger than $5.
I don't want to do something that seems like it has a substantial chance of leaving me feeling simultaneously guilty and wasteful.
Right now I'm paying CrashPlan about $60/year. My best guess at Tarsnap costs is about 10x that, $600/year.
Normally I'm a very privacy-conscious person, but that difference does give me pause. I could probably slim down what I do with Crashplan to only personal files rather than the whole disk, but why would I want to do that? The whole point is to "fire and forget", and restore painlessly.
Perhaps Tarsnap makes sense if you're Stripe, with lots of sensitive data to store, easily partitioned from your other data. But if my calculations are correct, probably not if you're an average individual.
Consider: Your current approach is basically to model the problem, evaluate the model, and then decide whether to use the service. That approach seems to protect you from risk, but it really doesn't. As you've noted, there isn't an existing model, which makes the modeling-first approach too expensive. As a result, you don't have backups, which is riskier than one or two months of abnormally expensive tarsnap service.
Your deciding factor is probably the long-term cost. You can ballpark that by paying for only a month or two and extrapolating. That should be relatively cheap in absolute terms. Most numbers are not outliers, so you're unlikely to be very surprised. You would probably know already if you're atypical in some way. Even if it turns out you'd have an abnormally high cost, figuring that out by evaluating the service will be cheaper than building a model.
Perhaps you're worried that the costs are fine right now but will grow too high over time. If so, that'll happen relatively slowly, over the course of a month or two at the fastest. At worst, you'll have to delete some backups or switch to a different service. That might be annoying, but it won't be overly expensive and it's unlikely to even happen. If it does, at least you will have had affordable backups in the meantime. Right now, you're at risk.
"Unlimited" backups at $5/month/computer. Install their proprietary agent on your Windows/Mac OS X machine, log in, and forget about it. No Linux support, though.
In my experience, deduplication typically doesn't do much for the first backup. Further backups take approximately as much space as usual incremental backups. So the answer to this is "as much space as your existing backup".
> (2) how much bandwidth my incremental backups will need
I recently looked into this and for me the cost of bandwidth used for daily backups is about 10-15% of the cost of data storage.
I think the ability to do a dry-run upload without prepaying (which Colin has already said is on the way) will be pretty much a complete solution.
But honestly, it isn't what truly stops me from using Tarsnap. The part that stops me is the fact that I have a dev server at home that is on 24/7, and I can just wait 9-10 hours for it to pull gpg'd backups.
No homepage, No installer, not even a screenshot. "There exists code you can compile. Someplace. QED."
Congratulations for staying true to form!
But I would be interested to know more of what (if any) the commercial relationship is?
Tarsnap Backup Inc. funded this development, and is continuing to fund work on this.
Nothing came of it because he is sticking to his licensing, which would mean telling users to first install XCode before installing the GUI. It didn't seem like that would be a winner, so I moved on to other things.
I greatly admire that he stuck to his position.
It'd take some dev work to set it all up, but could be transparent for the user.
Right, that sounds very much like the "age-old tradition of UNIX/Linux GUI".
Oh, the installer would obviously check the source archive against a bundled copy of the current Tarsnap GPG key before compiling. I just wasn't detailing implementation details.
Speaking of broadening Tarsnap's user base: Binary packages
(for both tarsnap and this GUI) will happen at some point --
not that I recommend relying on binaries (since you lose the
ability to audit the code), but in keeping with the UNIX/X11
philosophy of "tools, not policy" I want to allow users to
decide the tradeoff between paranoia and ease of use for
I bring fresh news from the Tarsnap pit.
I have been working hard for the last 6 months on a desktop application
frontend for the awesome Tarsnap service. Most of you are using Tarsnap as
it was designed, from the command line, usually on the server side and in
scripts, however some people, like me, feel the need to benefit from the same
Tarsnap juice from the comfort of the desktop too, with ease, for common tasks
and swift backups. Another important aspect is that it is so easy to create
complex and custom backup schemes using the tarsnap command line utilities,
that are adhering to the Unix philosophy and thus can easily be used like an
API, that I genuinely found an opportunity for creating a backup application
that I would be the first user of and would put my lack of patience, trust
and overall pessimism regarding existing solutions at rest.
This is where I introduce Tarsnap for the desktop, a cross-platform,
open source (BSD 2 clause) modern desktop application acting as a wrapper
around the Tarsnap command line utilities, written in C++ and using the
Qt 5 framework. You need to install the command line Tarsnap client before
you can use the application. Given that Tarsnap doesn't provide any binary
redistributables for the CLI utilities on any platform at the moment,
there's none for this desktop app either. This might be subject to change
in the future.
The project page and code are hosted at your favorite host, GitHub:
To get started all you have to do is:
$ git clone https://github.com/Tarsnap/tarsnap-gui.git && cd tarsnap-gui
$ git checkout v0.5
$ less README
The application currently has 3 main usage patterns:
1. The Backup tab allows you to quickly backup files and directories in a
   single shot fashion;
2. The Archives tab lists all of the archives that have been created using
   the current machine key. You can inspect, restore and delete archives
   from this view;
3. The Jobs tab. A job is a predefined set of directories and files, as well
   as backup preferences, that you know are going to be backed up regularly;
   These are persistent (in a local Sqlite DB) and you can attend to them
   whenever you wish afterwards;
The other tabs are Settings and Help, which hopefully are self explanatory. See
the distribution files README, CHANGELOG and COPYING for information on
The current version is 0.5 and is considered beta until otherwise
announced. There are rough edges around the corners and lots more ground to
cover when it comes to functionality and Tarsnap options breadth and depth
coverage. All development will now take place in the open, thus I'd like to
start the conversation here and encourage contribution and review on GitHub.
Read this Wiki page and the announcement summary on my
blog for more details and some beautiful screenshots:
To conclude, if you're a lazy desktop user like me, you're a perfect fit,
be an early adopter and start using it now so we can get it further. :-)
I do advise you to create a new key for this desktop session, it's generally
best practice and a safe-guard given that the application is still in beta.
There's lots of implementations of the idea, I don't remember if I copied it from somewhere or not.
Apart from that the service with the gui looks quite promising to me and is totally worth a try.
Yes, well, err... cough.
I've been spending most of my time for the past two years managing the server side of things, but there will be a new release soon and hopefully more active client development after that. Part of the reason this GUI is on github is that I'm in the process of moving my local client code tree up there (aka. going through the fixes I've made over the past two years and filling in commit messages).
I recall that there was this one GUI tool for OSX but I was never able to get it to work correctly.
> At the present time, Tarsnap does not have a graphical user interface.
Might want to update (and add some screenshots). And maybe make the price calculator live.
Bluecoat blocks the strangest things sometimes but this was amusing to me because I work at a "cloud hosting provider" and essentially tarsnap could be seen as a competitor to some services that we sell, but we don't focus solely on *nix systems.