
Why Tarsnap won't use DynamoDB - cperciva
http://www.daemonology.net/blog/2012-01-23-why-tarsnap-wont-use-dynamodb.html
======
drewcrawford
> $44 per month. That's 14.6% of Tarsnap's gross revenues;

Hi cperciva,

Please, please, _please_, charge more for Tarsnap.

Unbeknownst to you, I used you as an example in a blog post a year ago:

> If cperciva tripled the cost of Tarsnap tomorrow I would not bat an eyelash.

~~~
cperciva
Since your quote missed some important context: That's $44/month per TB of
data stored on Tarsnap. Tarsnap's gross revenues are considerably more than
$300/month. :-)

> Please, please, please, charge more for Tarsnap.

Funny thing is, I get lots of people (especially large users) saying exactly
the opposite.

The fact that there are lots of people who think I'm very wrong in both
directions suggests to me that Tarsnap's pricing is about right.

~~~
cletus
> The fact that there are lots of people who think I'm very wrong in both
> directions suggests to me that Tarsnap's pricing is about right.

I have no opinion on Tarsnap's pricing but I will say this: there will
_always_ be people who think your server/app/whatever should be cheaper, even
when it's _free_ (then they'll complain about the support, uptime or
whatever).

What's more, the people who aren't willing to pay anything typically make the
worst customers. Instead of being grateful for what they're getting, in my
experience they tend to consume far more support "bandwidth".

You might be doing yourself a favour by cutting such people off (effectively).

~~~
StavrosK
I don't know, SpiderOak costs $100/yr and I can use up to 100 GB. In
comparison, tarsnap would cost at least $130/yr for the 36 GB I use now.
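Back-of-envelope, using just the figures above (and treating SpiderOak's
quota as fully used):

    # $/GB-year implied by the numbers in this comment
    spideroak = 100 / 100.0  # $100/yr over a 100 GB quota -> $1.00/GB-yr
    tarsnap = 130 / 36.0     # $130/yr for 36 GB           -> ~$3.61/GB-yr
    print("SpiderOak: $%.2f/GB-yr, tarsnap: $%.2f/GB-yr" % (spideroak, tarsnap))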

It's not about being a cheapskate; it's about competition.

~~~
rlpowell
As a large customer considering moving to tarsnap, there is one area in which
I've been unable to find any functional competitor: scriptable (i.e.
command-line based) remote backup with decent deduplication. All of the
cheaper options I've found that have deduplication support are GUI-based; we
simply have too many VMs, and they change too often, for that to be tenable
for us at all.
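A minimal sketch of the kind of scripting I mean, using tarsnap's stock
'tarsnap -c -f <archive> <paths>' invocation (the VM names and paths here are
made up):

    import subprocess
    from datetime import date

    # Hypothetical inventory; in practice this comes from our provisioning
    # system.
    vms = ["web1", "web2", "db1"]

    for vm in vms:
        archive = "%s-%s" % (vm, date.today().isoformat())
        # Run tarsnap on each VM over ssh; it dedupes each new archive
        # against everything already stored under that machine's key.
        subprocess.check_call(
            ["ssh", vm, "tarsnap", "-c", "-f", archive, "/etc", "/var/lib"])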

~~~
StavrosK
I'm assuming you mean deduplication among your own data, in which case
wouldn't compression take care of it? I haven't used tarsnap; what's the
benefit over duplicity?

EDIT: Oh, you mean deduplication of files between separate backup sets? That
is a nice feature, true.

~~~
rlpowell
It is, and tarsnap does it really, _REALLY_ well. I'm backing up what is, on
disk, umm... (checks emails) about 15 GiB, and I'm currently spending about 8
cents a day. Now if he'd just implement de-dupe across machines on a given
account... :)
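Rough math on what the dedup buys, with the storage price as an assumption
(roughly Tarsnap's rate at the time, not a number from this thread):

    # Check on the figures above; only the price is assumed.
    daily_cost = 0.08                      # $/day, from this comment
    price = 0.30                           # assumed $/GB-month storage price
    stored_gb = daily_cost * 30 / price    # ~8 GB actually billed
    on_disk_gb = 15 * 1.074                # 15 GiB expressed in GB
    print("~%.0f GB billed vs ~%.0f GB on disk -> ~%.0f%% saved"
          % (stored_gb, on_disk_gb, 100 * (1 - stored_gb / on_disk_gb)))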

~~~
vertex-four
For the record, rsync.net's Features page claims support for data
deduplication. You could send them an email.

~~~
rlpowell
What rsync.net actually says:

"This simple offering gives you complete control over organization,
compression, deduplication, versioning and meta-data. You are NOT locked into
a particular application or protocol, and there are no constraints on file
sizes, retention, or access."

Which is great if you can find something that will do deduplication for you
_and_ encrypt _and_ cope with the disk not actually being local. I couldn't.

Also, rsync.net is significantly more expensive than tarsnap in my experience.

------
ceejayoz
TL;DR version: "The custom NoSQL store (that lacks one of DynamoDB's major
selling points) I wrote for my unusual use case works better for my unusual
use case than DynamoDB."

~~~
mkup
My TL;DR version is "DynamoDB is much more expensive than a custom NoSQL
store at thousand-writes-per-second scale"

~~~
encoderer
I don't mean to hammer this comment page with this point, but just to
clarify, since the product is so new: it really has very little to do with
writes per second. High-volume writes are Dynamo's bread and butter.

His cost issues come from the large item size.

~~~
cperciva
No, I have very small items. The cost issues come from the large _number_ of
items.

~~~
encoderer
I thought I read 33 kB somewhere in there. If so, by DynamoDB standards,
that's not a small item. If I misattributed that 33 kB figure, then I kindly
retract what I said :D

~~~
cperciva
I have 33 kB blocks, which are stored on S3 (after being aggregated into
objects of up to 8 MB). The key-value pairs are 24 bytes and 53 bytes long.
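Back-of-envelope on what "large number of items" means here (the one pair
per block, the provisioned rate, and the $0.01/hour per 10 write units price
are all assumptions, not figures from this thread):

    # Index entries per TB of Tarsnap data, at one pair per 33 kB block.
    block_bytes = 33 * 1024
    items_per_tb = 1e12 / block_bytes      # ~30M key-value pairs per TB
    # Each put consumes at least one write-capacity unit no matter how
    # small the item is, so loading 1 TB means ~30M unit-writes.
    write_units = 100                      # assumed provisioned writes/sec
    load_days = items_per_tb / write_units / 86400.0
    monthly = write_units / 10.0 * 0.01 * 24 * 30   # assumed unit pricing
    print("%.0fM items/TB, ~%.1f days to load, ~$%.0f/month for write capacity"
          % (items_per_tb / 1e6, load_days, monthly))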

------
TeamAqua
Good article.

In many ways, I think the author is echoing the same frustrations as with
Google App Engine's Datastore (<http://news.ycombinator.com/item?id=3431132>,
"App Engine charges $6,500 to update a ListProperty on 14.1 million
entities"). Both Google Datastore and the new AWS DynamoDB are great, fast
services, but they are a bit too expensive at the medium/high end.

Frankly, I wish App Engine/AWS/Heroku/etc. would introduce a database that
traded speed for lower cost. I'd be fine with certain writes being delayed if
it meant paying less.

~~~
tyw
If Amazon had an option to use HDD instead of SSD storage for a given
DynamoDB table, and adjusted the pricing accordingly, would that pretty much
take care of users who want basically what DynamoDB offers, only cheaper and
with slightly worse performance?

------
tom_b
Colin, since you also wrote Tarsnap's back-end data engine yourself, how
would you weigh cost against loss of control over that functionality in
Tarsnap?

From my perspective, the flexibility and control you have over kivaloo has
some value to you as the developer of Tarsnap. At what price point, or for
what level of functionality, would you actually switch to a SaaS back-end?

My interest in this question goes beyond this specific case to the broader
question of when using third-party services/libraries in a new SaaS product
is contraindicated.

~~~
cperciva
If Amazon had a service which gave me the cost and performance I expect to
get from kivaloo, I'd start using it without any hesitation. I assign a
negative cost to loss of control, since the testing provided by their large
user base and their experience with scalability far outweigh the advantages
of greater control.

------
res0nat0r
Not to denigrate Colin's work, nor Tarsnap at all, as I love these CLI-based
tools, but is there anything better about Tarsnap than just using Duplicity
as I do now?

All I have to do is run ' duplicity $DIR s3+http://bucket ' and I get a
GPG-encrypted full/incremental backup of my data stored on S3, which looks
to be a bit cheaper. Plus, if I really want to save some cash, I could mark
every object as Reduced Redundancy to save even more money.
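For concreteness, here is roughly that, scripted (the bucket and source path
are made up, and I believe --s3-use-rrs is duplicity's flag for Reduced
Redundancy, but verify before relying on it):

    import subprocess

    # Incremental, GPG-encrypted backup to S3; duplicity encrypts by default.
    subprocess.check_call([
        "duplicity",
        "--s3-use-rrs",                 # Reduced Redundancy Storage (verify)
        "/home/me/data",                # hypothetical source directory
        "s3+http://my-bucket/backups",  # hypothetical bucket/prefix
    ])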

Thoughts?

------
encoderer
I only skimmed this, and I'll go back and read it later, but I think the issue
he's mentioning is price?

It's true that on Dynamo, large item sizes get very expensive, very quickly.
That stems from your data being stored on SSD, and with a replication factor
of 3 (IIRC).

Cloud or no cloud, Amazon or not, SSD storage is still a little pricey.

~~~
philjohn
Lots and lots of small items also cost, though, because there's a 100-byte
overhead per k/v pair. So if you're storing 2 GB of 100-byte k/v pairs, you
have to pay for 4 GB of storage.

If you store lots and lots of items, you probably also have high read and/or
write rates, in which case you'll need to pay more per hour as well.
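Worked out (the 100-byte per-item overhead is from DynamoDB's billing rules;
the $1/GB-month storage price is an assumption, roughly the launch price):

    # The 2 GB -> 4 GB example above.
    item_bytes = 100
    data_gb = 2
    items = data_gb * 1e9 / item_bytes            # 20M items
    billed_gb = items * (item_bytes + 100) / 1e9  # +100 bytes overhead each
    print("%.0fM items -> %.0f GB billed (~$%.0f/month at an assumed $1/GB-month)"
          % (items / 1e6, billed_gb, billed_gb * 1.0))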

