
Online Backups for the Truly Paranoid  - prakash
http://caffeinatedcode.com/wsup/cafe/entry/tarsnap_online_backups_for_the
======
JoachimSchipper
The "blog" of Colin Percival (cperciva), the author of Tarsnap, is also very
much worth a read. Fortuitously, he's just posted a convenient overview of the
most important articles: [http://www.daemonology.net/blog/2009-10-15-100-blog-
posts.ht...](http://www.daemonology.net/blog/2009-10-15-100-blog-posts.html).

------
smanek
I love tarsnap (I'm using it in production now). But, I wish it were a little
easier to use ...

For example, each of my servers creates a backup named something of the form
'machineName-epochTime' every night. I wish there were some built-in way to
delete all but the last N backups. I ended up writing a small Python script to
take care of it, but it relies on text-munging, etc. and seems brittle.
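
A minimal sketch of the "keep the last N" selection described above, assuming archive names really do follow the 'machineName-epochTime' pattern. The function name `archives_to_delete` and the default of 7 are hypothetical. The code only decides which names are candidates for deletion; fetching the list (for example from `tarsnap --list-archives`) and actually deleting anything are deliberately left out.

```python
from collections import defaultdict


def archives_to_delete(names, keep=7):
    """Return archive names to delete, keeping the newest `keep` per machine.

    Assumes every name looks like 'machineName-epochTime'. Names that do not
    match that pattern are never returned, so they are never deleted.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")

    by_machine = defaultdict(list)
    for name in names:
        # Split on the LAST hyphen so machine names may contain hyphens.
        machine, sep, stamp = name.rpartition("-")
        if not sep or not machine or not stamp.isdigit():
            continue  # unrecognised name: skip rather than risk deleting it
        by_machine[machine].append((int(stamp), name))

    doomed = []
    for entries in by_machine.values():
        entries.sort(reverse=True)  # newest first
        doomed.extend(name for _, name in entries[keep:])
    return sorted(doomed)
```

Parsing the timestamp as an integer, rather than comparing names as strings, avoids one of the more common text-munging mistakes: epoch values of different lengths do not sort correctly as text.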

Something that would automate a 'grandfather, father, son' style rotation
scheme would be appreciated too.
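
One possible reading of a 'grandfather, father, son' scheme, sketched under stated assumptions: "sons" are the newest backup of each recent UTC day, "fathers" the newest of each recent ISO week, and "grandfathers" the newest of each recent calendar month. The function name `gfs_keep` and the default counts are hypothetical, and only periods that actually contain a backup are counted.

```python
from datetime import datetime, timezone


def gfs_keep(stamps, daily=7, weekly=4, monthly=12):
    """Return the set of epoch timestamps to keep under a simple GFS policy.

    For each tier, the newest backup in each of the most recent N periods
    that contain a backup is kept (UTC days, ISO weeks, calendar months).
    """
    tiers = (
        (daily, lambda d: d.date()),
        (weekly, lambda d: tuple(d.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
    )
    newest_first = sorted(set(stamps), reverse=True)
    keep = set()
    for limit, bucket_of in tiers:
        seen = []
        for stamp in newest_first:
            moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
            bucket = bucket_of(moment)
            if bucket in seen:
                continue  # a newer backup already represents this period
            if len(seen) == limit:
                break  # this tier is full
            seen.append(bucket)
            keep.add(stamp)
    return keep
```

Everything not in the returned set would be a candidate for deletion. Returning what to keep, rather than what to delete, makes it harder to remove a backup by accident when a name fails to parse upstream.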

In principle, it isn't too hard for me to script any of this functionality -
but maybe I'll make a mistake in the portion of my code that handles my
Tarsnap key, or the portion that deletes backups. I'd rather pay someone
smarter than me (e.g., cperciva) to specialize in making my backups work.
That is, after all, the basic premise behind the Tarsnap business model.

~~~
cperciva
Don't worry, providing such a script is on my to-do list. :-)

~~~
smanek
Excellent, thanks.

Overall, a great product that I'm happy with (and have been recommending to my
friends).

------
joechung
Why stop at geo-redundancy? Let's set up an online backup store on the moon
and Mars, too - [http://domaingang.com/short-news/nasa-secures-moon-and-
mars-...](http://domaingang.com/short-news/nasa-secures-moon-and-mars-tlds/)

------
c1sc0
Would tarsnap be a solution for long-term archival of logfile data? I'm
working on a data mining project of the "Let's store everything & figure out
what we do with it later" type. My servers generate about 2GB of data (zipped)
every day. We plan to store an 'analysis' dataset of the last 3 months on S3
and run a batch of Hadoop/Pig/MapReduce jobs every night on EC2.

My question: what would be the most cost-efficient long-term archival solution
(I can live with slow access-times) of Apache logs? Does tarsnap offer any
benefit here? Are there any compression solutions specific for Apache logs?
Other ideas?

~~~
JoachimSchipper
Two points.

First, this is not _that_ much data (~180GB). Is there a particular reason not
to just throw it on a hard disk on some machine that doesn't do too much
during the night and write a trivial Perl script?

Secondly, (g)zip may not be the best solution here. A quick unscientific test on
~3MB of Apache log data (in the default Common Logfile format): gzip or zip
produce ~240KB of data, xz (formerly lzma) gets it down to ~80KB (using -9e)
or ~96KB (using the default option).

In my quick unscientific test, xz can decompress data about half as quickly as
gzip and about ten times faster than bzip2. It's very likely able to keep up
with your disk.
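
To repeat this kind of size comparison on your own logs, a rough sketch using only Python's standard-library `gzip`, `bz2`, and `lzma` modules might look like the following. The function name `compressed_sizes` is hypothetical, and the `lzma` preset is only assumed to correspond roughly to `xz -9e`; results depend heavily on the input, so treat the numbers as indicative only.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data):
    """Return the size in bytes of `data` under several compressors."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        # preset 9 plus PRESET_EXTREME is assumed to be roughly `xz -9e`
        "xz": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }


def report(path):
    """Print a size comparison for the file at `path` (read fully into memory)."""
    with open(path, "rb") as handle:
        sizes = compressed_sizes(handle.read())
    for name, size in sizes.items():
        print(f"{name:>6}: {size:>12} bytes")
```

Calling `report` on a sample of a real access log is more informative than synthetic data, since real logs vary in how repetitive their URLs and user agents are.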

------
neilk
Rsync.net has offered similar services for quite a while now. It's a little
more product-ized, they have more options for service, and they guarantee you
can talk to an engineer at any time.

It's a little more expensive than Dr. Percival -- or maybe not, depending on
your access patterns and volume discounts. But I'm a bit leery of trusting an
organization that is just one guy who already has a day job.

<http://rsync.net/>

~~~
cperciva
_Rsync.net has offered similar services for quite a while now._

rsync.net is a great service, but I wouldn't say that it was similar to
Tarsnap. If you back up your data to rsync.net, they can access it.

 _I'm a bit leery of trusting an organization that is just one guy who already
has a day job._

Erm... what's my day job? Other than Tarsnap, that is?

~~~
neilk
> Erm... what's my day job?

Oh, I'm sorry, I made a silly assumption. From the "Dr." handle, and the
FreeBSD contributions, I assumed you were an academic who had a sort of side
business going.

~~~
cperciva
The doctorate just means that I spent years at a university in the past, not
that I'm still at a university. :-)

I am still very academic-minded, and serve my alma mater in a voluntary
capacity on a few committees, but Tarsnap is absolutely a real business and is
what I spend the vast majority of my time on.

------
DanBlake
I love it.

Actually, I went looking for this exact product but for Windows earlier. Sort
of like R1Soft, but for Windows + encrypted with a key only I know. (I don't
trust mozy/carbonite/etc.)

Anyone know of anything?

~~~
m_eiman
Since the code is included, it might be possible to compile and run it through
Cygwin?

~~~
cperciva
Yes, several people are using Tarsnap via Cygwin. It's not something I
recommend to the general public, but I imagine the readership of Hacker News
wouldn't have any difficulty with this.

~~~
jhancock
Is it possible for you to provide a pre-compiled client for windows without
the cygwin install?

~~~
cperciva
Pre-compiled clients are on my to-do list. Obviously I want to do this in a
systematic manner so that my release process is more repeatable than "find
boxes running the following operating systems: ... and borrow them for a few
minutes to build a binary".

I'm not sure if it's possible to build binaries using cygwin which will then
run without having cygwin installed. If not, this would turn into "port
tarsnap to Windows", which is also on my to-do list, but much lower down.

~~~
0wned
Try mingw and msys. You can compile static, stand-alone C or C++ code for
distribution.

~~~
cperciva
Thanks, I'll be looking into that. Does mingw provide a full POSIX API layer?

------
bravura
Seriously, I know we're all hackers here, but do we really need to price
things in "picodollars per byte" ?

~~~
cperciva
Need to? Nope. Want to? Sure.

------
Hexstream
Note that tarsnap is not available for Canada at this time :/

~~~
lpgauth
Why are citizens and residents of Canada not allowed to use tarsnap? What do
you have against Canucks?

I don't have anything against Canadians — in fact, I am one. I do have
something against sales tax. Dealing with federal and provincial sales taxes
would not only mean dealing with extra paperwork; it would also mean figuring
out whether the government considers me to be selling software (the tarsnap
client code), providing a service within Canada (since I'm resident here), or
providing a service outside of Canada (since the tarsnap service is provided
via hardware in the US) -- not to mention questions like whether tarsnap is
"data warehousing", "data processing", "telecommunications", or something else
entirely. How tarsnap is classified would determine if I'd have to charge
sales tax, how much, and to whom — and I'm guessing that those answers are
different between federal and provincial taxes, too. From what I've read about
sales taxes I'm reasonably confident about one thing, however: I don't need to
charge sales tax to non-Canadians.

So for the moment, I'm taking the path of least resistance: Don't allow
Canadians to use tarsnap, and spend my time writing code instead of trying to
figure out how complicated sales tax laws, which were written by people who
never imagined the internet or online services, apply to tarsnap.

~~~
mcav
(that was quoted from the tarsnap website)

~~~
lpgauth
Yes, I should probably have cited where this came from. This is from the Tarsnap
FAQ. (I'm not affiliated in any way.)

