

Tarsnap public beta - cperciva
http://www.daemonology.net/blog/2008-11-10-tarsnap-public-beta.html

======
tptacek
Too much crypto, not enough value prop! Everyone believes you on the crypto
stuff already, Colin.

Move out of privacy --- privacy is a dead end, nobody has ever capitalized on
it successfully (see: the original PGP, Zero Knowledge Systems, that stupid
data center in the pylon off the coast of England).

Rewrite your content to focus on shared hosting. Emphasize that _you don't
keep the keys_ to the backup data. Consider an OEM deal with a Slicehost-like
provider. Backup services at hosting providers fucking blow.

People won't pay much for privacy, but businesses will pay a decent premium
over basic hosting for CYA security.

Take your crypto content off the front page; ironically, I think you're
_decreasing_ its value by fronting it like this. Instead, _brand it_: 100%
audit-ready, fully transparent, something like that: "the only secure backup
provider with [blah blah blah]".

I should have to click to see that stuff. For normal users, they should just
see the stamp of assurance and a page they can click to with lots of docs.

I'm a lot less concerned about the clientside security here and more concerned
with the serverside --- even though I know intellectually it doesn't matter
much, because the security has been factored out to the client. That could
mean one of many things, including these two:

(1) People are never going to get over their concern about serverside security
and you're going to have to do something to demonstrate that the server
code and associated web apps are secure.

(2) You're not doing enough with your pitch to emphasize that your design
moves security off the server and onto the clients, where the customers
control it.

How do you do dynamic updates? Spend less time sniping at SSL, because SSL is
still the weak link in this system.

~~~
cperciva
Keep in mind that this is my blog, not the tarsnap website itself. The tarsnap
website -- once it's more than a placeholder -- won't have technical details
about the cryptography, but will instead (as you suggest) have it on a
separate page which only the interested people will go and visit. My blog,
however, is the blog of someone who regularly critiques other people's
security (and complains about companies which don't provide details about
their security), so people would start asking questions if I avoided the
question of security in a blog post about tarsnap. :-)

Server side security -- you're right, I should probably talk about that more.
The cryptographer in me says that there's no need to talk about it because
it's irrelevant... but people tend to care about irrelevant things anyway.

What do you mean by "do dynamic updates"?

~~~
tptacek
Start figuring out how you're going to talk about this now, and use the blog
to practice. This killed our first product too.

I'm not saying talk up serverside security; I'm saying, after allllllll that
content about crypto algorithms, I'm _still_ left wondering whether I can
trust my data to your app. Most everyone else who talks about secure backup is
going to leave it at "SSL, plus we encrypt the data on our servers". Your #1
feature is that you do the opposite, but all I see is RSA OAEP.

Also, I'm not saying downplay the crypto; I'm saying, reverse pyramid style,
find a way to come up with a brand-y feature-y way to summarize (a) custom
crypto, (b) open, auditable design, (c) freely available source code. Then use
the crypto details to back that up (on a separate page).

Dynamic updates: how do you update the tarsnap clientside software?

Consider a name other than tarsnap, too.

~~~
cperciva
_Dynamic updates: how do you update the tarsnap clientside software?_

I don't. Users do (by downloading the latest version, extracting the tarball,
compiling, and installing).

Actually, you're right that I depend on SSL for this at the moment -- but I
will be adding GPG signatures for the tarballs soon.

_Consider a name other than tarsnap, too._

What's wrong with that name? In all seriousness, the people who would be
turned off by that name are also people who would be turned off by the
command-line interface.

~~~
tptacek
_What's wrong with that name? In all seriousness, the people who would be
turned off by that name are also people who would be turned off by the
command-line interface._

Yeah, and, about that...

~~~
tdavis
I think you make some VERY good points, but I'm unsure if you and the creator
of tarsnap are on the same page regarding the audience / customer base for
this service. I like the name because it makes sense to me and it's easy to
understand. Of course the management at Big Co. probably have no idea what
"tarsnap" means just by looking at it. That could matter, or not, depending on
the author's target audience.

Also, from your example, I've crafted the perfect position statement: _For
summer camps that need surface mount soldering, Colin Percival offers tarsnap.
Unlike carbonite, tarsnap provides puppy mincing that allows users to make
better tacos_

After reading that, I'm buying tomorrow.

~~~
tptacek
There's a pretty well recognized trap in the security software space; call it
the "IDA Pro Effect". IDA is the industry standard disassembler, easily the
most popular commercial reverse engineering tool, certainly licensed by every
major security research team. It costs ~$500.

This is a problem: successfully cornering the market for disassemblers and
selling to every security research team wins Hex-Rays literally _thousands_ of
dollars!

If you're going to craft a product pitch that appeals to the very high end of
the market --- those comfortable with per-host command-line backup who care
very intensely about security --- you may need to price way higher than where
Colin is today. Of course, then he'll find himself in competition with
homebrew PGP+S3 scripts, just like IDA would quickly be replaced by a web UI
on "objdump" if Ilfak jacked the price up to $20,000 (which is kind of where
it belongs, value-wise).

I'm also wary of the history of "premium" command-line backup tools; remember
Bru?

~~~
tdavis
I don't disagree one bit. I'm not entirely convinced it will be successful,
considering it is a pretty niche product as far as backup products go and the
security value-add isn't really priced according to what very security-
conscious people/organizations would spend on the solution. All I was really
pointing out is that Colin and you may have different ideas of who the
audience here is, in which case some of your advice isn't relevant (yet,
anyway); if it's going to be marketed to people who <3 cli then tarsnap is a
perfectly good name. Otherwise, something _awesome_ like "Fort Knox Backup
Vault 2008" may be more appropriate ;)

~~~
tptacek
For the record: I'm very bullish on Colin's idea; I just don't think he has
the positioning and packaging worked out.

Unless he's not out to make money. Some people aren't. He's a FreeBSD dev,
after all. Sucker. :P

~~~
cperciva
Don't worry, I'm in this to make money. :-)

I started with *nix and a command-line interface for two reasons: 1. This is
what I know, and 2. a CLI can be far more easily used as a component of a
larger system.

I'm following a fairly standard startup course here: Build something people
want, then build something more people want. I expect a GUI to happen in the
future (probably using Qt). I expect a Windows version to happen in the
future. I expect tarsnap to be included in web server management interfaces in
the future. The name might change in the future, or "tarsnap" might just end
up being one of the names like "google" which start out looking dumb but end
up sounding normal.

But that's all "easy" stuff -- the place to start is with a solid and secure
foundation, which is what I'm doing.

------
paul
Looks nice. How fast is it? I wonder if you could make this into a simple off-
site backup system for databases. It might require some more complex
interaction with the db or os to do it right though (since the data is
changing while you are reading it).

~~~
cperciva
_How fast is it?_

How fast is your internet connection? On a decent CPU the tarsnap code can
easily push 50 Mbps -- the largest part of its time is spent in zlib,
compressing data before it is encrypted and uploaded -- so the limiting factor
is almost certainly going to be bandwidth.
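To see why the compression step has to come before encryption, here is a minimal sketch using only Python's standard library. It is not tarsnap code: `compressed_ratio` is a hypothetical helper, the sample log line is made up, and `os.urandom` is used as a stand-in for ciphertext on the assumption that well-encrypted data is indistinguishable from random bytes.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (hypothetical helper)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive plaintext, similar to a log file, compresses very well.
plaintext = b"GET /index.html 200\n" * 4096

# Stand-in for ciphertext (assumption): encrypted output looks like random bytes.
ciphertext_like = os.urandom(len(plaintext))

plain_ratio = compressed_ratio(plaintext)
cipher_ratio = compressed_ratio(ciphertext_like)

print(f"plaintext compresses to {plain_ratio:.2%} of its size")
print(f"ciphertext-like data compresses to {cipher_ratio:.2%} of its size")
```

Running zlib over random-looking bytes saves nothing, so a tool that encrypted first would upload and store the full uncompressed size.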

_I wonder if you could make this into a simple off-site backup system for
databases._

Tarsnap isn't likely to be useful for "real time" replication, but if you get
your database to write a transaction log you might get interesting results by
having tarsnap loop creating archives containing that one file.

~~~
paul
I was thinking of the snapshot speed, where most of the data hasn't changed.
Do you keep a local cache of block checksums or something to avoid having to
check the server? How about an option to not examine the data if the file size
and mtime haven't changed? (like rsync does)

~~~
cperciva
_I was thinking of the snapshot speed, where most of the data hasn't changed._

On modern CPUs, somewhere in the 20-50 MB/s range.

_Do you keep a local cache of block checksums or something to avoid having to
check the server?_

Yes.

_How about an option to not examine the data if the file size and mtime
haven't changed? (like rsync does)_

Tarsnap already does this for unmodified (file name, inode number, file size,
mtime) tuples.
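A rough sketch of that kind of check, assuming an in-memory cache keyed by path. `StatCache` and `needs_read` are hypothetical names for illustration; the thread does not describe how tarsnap actually stores or looks up these tuples.

```python
import os


class StatCache:
    """Remember (inode, size, mtime) per path to skip unchanged files.

    Hypothetical illustration; not tarsnap's real cache.
    """

    def __init__(self):
        self._seen = {}  # absolute path -> (inode, size, mtime in ns)

    def needs_read(self, path: str) -> bool:
        """Return True if the file must be read, False if it looks unchanged."""
        st = os.stat(path)
        key = os.path.abspath(path)
        fingerprint = (st.st_ino, st.st_size, st.st_mtime_ns)
        if self._seen.get(key) == fingerprint:
            return False  # same name, inode, size, and mtime: skip the data
        self._seen[key] = fingerprint
        return True
```

The first call for a path returns True and records the tuple; later calls return False until the inode, size, or mtime changes. The trade-off is the same as with rsync's quick check: a file rewritten in place with identical size and mtime would be missed.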

------
aston
In case people aren't paying attention, this is well-respected news.yc-er
cperciva's big project. Looking pretty good, Colin!

_edited for clarity_

~~~
cperciva
I'm well-respected now? When did that happen?

~~~
abstractbill
_When did that happen?_

481 days ago. <http://news.ycombinator.com/item?id=35083>

~~~
sanj
I simultaneously feel proud and shamed for making that happen!

No hard feelings, I hope.

~~~
cperciva
No hard feelings. :-)

More than anything else, I'm amused that people here care so much about the
Putnam -- I personally consider winning the Putnam to be a relatively minor
accomplishment.

~~~
dfranke
Those of us who pride ourselves on merely having gotten a non-zero score on
the Putnam think otherwise :-)

------
naish
Sorry to hear that you aren't up to dealing with the hassles imposed by the
Canadian tax systems. Perhaps you can offer a limited storage/bandwidth
account free of charge to fellow Canucks. :) There aren't that many of us...

~~~
Andys
Idle thought: Here in Australia, you don't have to deal with sales tax until
you've reached about $50,000 per annum income. I always thought this limit
should be much higher, say $200,000, since it is in society's interest for
small businesses to keep growing to that stage anyway.

~~~
cperciva
For Canadian sales tax (GST) there's a lower limit before people have to
collect. For British Columbia sales tax (PST) there isn't any such limit.

Rather than allowing in Canadians but worrying about making sure that I
figured out how to deal with GST before I hit the limit, while still keeping
BCers out, I figured that it was simpler just to keep all Canadians out.

------
cperciva
Another small step towards launch: After many months in private free beta and
a bit over a month in private paid beta, tarsnap has now moved into public
beta.

------
jfarmer
You're free to ignore me, but what I see is someone really impressed with
their technical prowess and the engineering that went into this software.

It looks like cool technology, but the packaging is all wrong. A command line
tool?

I'd build a seamless front-end a la JungleDisk and bill it as an online safe
deposit box -- nobody but you has the key.

Just my $0.02.

------
bayareaguy
Sounds interesting, this could replace a scheme I use to store files in AWS.
However if you're using AWS under the covers I'd rather use my own AWS account
via dev pay instead of paypal and I'd rather grant your service permission to
store stuff in my own bucket.

~~~
cperciva
I am using S3 under the covers, but they are very thick covers. It's not
feasible to separate out individual users' data to store it in different
buckets via devpay. (Also, I'm in Canada, where FPS and devpay aren't
supported yet -- as soon as they are, I'll be accepting payments via FPS.)

------
Harkins
This is like JungleDisk but at twice the storage price and lacking the GUI or
cross-platform clients.

The per-transfer and per-storage fee sounds like tarsnap also stores back to
s3.

~~~
Harkins
To be slightly more constructive:

You have at least one entrenched, successful competitor with lower prices and
more features. This proves there's a market, but it means you're also going to
have to be good at marketing to succeed. Especially because the product is
something people are very risk-averse about and the switching cost of re-
uploading everything is a big pain.

You might enjoy the book The Knack: How Street-Smart Entrepreneurs Learn to
Handle Whatever Comes Up by Norm Brodsky and Bo Burlingham. There are a lot of
war stories from author Brodsky about his document-storage business (a similar
market to yours) and how he differentiated it from competitors.

------
callmeed
Ok, so I currently use cron + s3sync + s3 buckets to back up servers/sites.

How would this differ (other than the price)?

~~~
cperciva
1. Security.

2. Snapshotted backups. With tarsnap you can take a backup every day, but
only pay for each unique block of data once -- which is useful if you don't
want your nightly cron job to "synchronize" the fact that you just
accidentally deleted some files to S3.

3. I don't know all the details of how s3sync works, but how efficient is it
at handling files which have had small changes made to them? (Or, in the case
of log files, data appended to them?) Tarsnap doesn't have to re-send the
entire file.

4. Does s3sync compress data before storing it? Tarsnap does (and you pay for
the bandwidth and storage used, post-compression).

5. When looking at the price, keep in mind that S3 has a per-request fee
which can get pretty expensive if your average file size is small.
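Points 2 and 3 above can be sketched as a content-addressed block store. This is a simplification under stated assumptions, not tarsnap's implementation: `BlockStore`, `create_archive`, `read_archive`, and the fixed 4096-byte `BLOCK_SIZE` are all hypothetical, and the thread does not say how tarsnap splits files into blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class BlockStore:
    """Toy snapshot store: each unique block is kept (and counted) once."""

    def __init__(self):
        self.blocks = {}    # SHA-256 hex digest -> block bytes
        self.archives = {}  # archive name -> ordered list of digests

    def create_archive(self, name: str, data: bytes) -> int:
        """Store a snapshot and return how many new bytes had to be stored."""
        new_bytes = 0
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # only unseen blocks cost anything
                new_bytes += len(block)
            digests.append(digest)
        self.archives[name] = digests
        return new_bytes

    def read_archive(self, name: str) -> bytes:
        """Rebuild a snapshot from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.archives[name])
```

For example, with `monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))`, archiving `monday` stores 16384 new bytes, archiving `monday + b"x" * 100` the next day stores only the 100 appended bytes, and a later archive made after data was deleted leaves the earlier snapshots readable, unlike a plain sync.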

~~~
wmf
_Snapshotted backups_

While we're giving marketing advice, shouldn't that be "source-based de-dupe"?
:-)

~~~
cperciva
Yes, but I absolutely HATE the word "de-dupe", and I've been hoping that if I
ignore it for long enough, it will go away. :-)

