

What's Wrong with Amazon's DynamoDB Pricing? - pbailis
http://bailis.org/blog/whats-wrong-with-amazons-dynamodb-pricing/

======
cperciva
When I saw the DynamoDB pricing, I assumed the setup was something along the
lines of "N=3, writes go to all replicas and wait for 2 responses, consistent
reads go to 2 replicas, inconsistent reads go to 1 replica". Given that
DynamoDB is using SSDs, I don't see why they would need to broadcast read
requests to all replicas -- that's only useful if you have high variability in
your read latency.
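The quorum setup guessed above can be sanity-checked with the standard overlap rule: a read quorum of R and a write quorum of W are guaranteed to intersect (so reads see the latest acknowledged write) whenever R + W > N. A minimal sketch, where the N/W/R values are the comment's guess and not confirmed DynamoDB internals:

```python
def quorums_overlap(n, w, r):
    """True when any R-node read set must intersect any W-node write
    set among n replicas, i.e. R + W > N, which gives consistent reads."""
    return r + w > n

# The guessed setup: N=3, writes wait for 2 acks.
assert quorums_overlap(n=3, w=2, r=2)       # consistent reads hit 2 replicas
assert not quorums_overlap(n=3, w=2, r=1)   # inconsistent reads hit only 1
```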

Someone who wanted an interesting research project might be able to pull back
the covers on DynamoDB a bit by issuing a large number of requests, very
carefully measuring the response latency curves for consistent and
inconsistent reads, and looking at what model fits them best.
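The measurement side of that project is straightforward to sketch. Assuming some client read call (the `request_fn` below is a placeholder for whatever issues a consistent or inconsistent read), collecting sorted latency samples gives the percentile curves to fit models against:

```python
import time

def latency_curve(request_fn, n=1000):
    """Issue n requests and return their latencies in seconds, sorted,
    so percentile curves can be read off directly."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        request_fn()
        samples.append(time.perf_counter() - start)
    return sorted(samples)

def percentile(sorted_samples, p):
    """Return the p-th percentile from an already-sorted sample list."""
    idx = min(int(p / 100 * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[idx]

# Usage sketch (hypothetical client calls, one curve per read mode):
#   consistent = latency_curve(lambda: read_item(consistent=True))
#   eventual   = latency_curve(lambda: read_item(consistent=False))
# then compare, say, percentile(consistent, 99) vs percentile(eventual, 99).
```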

~~~
pbailis
If the network latency is negligible, this sounds reasonable. However, the
mapping of 2x IOPs to 2x cost seems tenuous to me.

>Someone who wanted an interesting research project might be able to pull back
the covers on DynamoDB a bit by issuing a large number of requests, very
carefully measuring the response latency curves for consistent and
inconsistent reads, and looking at what model fits them best.

This is possibly interesting, but why shouldn't Amazon just provide this data?
The data would be useful to developers, who otherwise have to guess.

And finally, why not allow users to place this cost on the write path instead?
"W=3" seems like a reasonable option (given that consistent reads may be
unavailable anyway).

~~~
cperciva
_the mapping of 2x IOPs to 2x cost seems tenuous to me._

Well, it's not just the number of disk I/Os which increases -- you've also got
the network traffic and CPU time associated with each copy of the read request
you send out. Given that (a) DynamoDB is using SSDs, and (b) Amazon seems to
love using slow languages, I'd guess that the CPU time is what ends up
dominating their cost.

 _why shouldn't Amazon just provide this data?_

Amazon is very secretive. And I can't really blame them; after all, why would
they want to subsidize their competitors' development?

 _And finally, why not allow users to place this cost on the write path
instead? "W=3" seems like a reasonable option_

I'd guess they wanted to be able to tolerate partitions without sacrificing
availability. (At least for the common case where "partition" means that nodes
are completely offline and are both unable to communicate with other nodes and
unable to receive incoming requests.)

------
the_bear
This is just the tip of the iceberg with DynamoDB's pricing issues. Based on
my understanding of how their pricing works, here are two more major issues:

-You have to provision throughput for each table individually, so you basically have to pay for the maximum throughput you expect for every single table, all the time (as far as I understand, you can only reduce your throughput once per day). This means that adding a new table can be pretty expensive, even if your total throughput isn't increasing at all.

-Each unit of throughput gives you one 1kb write/read per second. If you exceed your throughput for a second, the call fails. This means that if you want to support the ability to write 50kb (like a "notes" field or something), you need to constantly pay for 50 write units even if you won't ever realistically use that much. And you have to do that for _every single table_.
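The arithmetic in the second point can be made concrete. A small sketch of the provisioning math as described above (item sizes rounding up to the next 1kb unit mirrors the comment's description, not an official formula):

```python
import math

def write_units_needed(item_size_kb, writes_per_second):
    """Provisioned write units per the description above: one unit covers
    one 1kb write per second, and item sizes round up to a full unit."""
    return math.ceil(item_size_kb) * writes_per_second

# A 50kb "notes" field written once per second forces the table to carry
# 50 provisioned write units at all times, even if that peak is rare.
assert write_units_needed(50, 1) == 50
assert write_units_needed(0.5, 10) == 10  # sub-1kb items still cost a full unit
```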

As the OP points out, it's all about FUD. I'm so terrified of exceeding my
throughput allotment that I'm forced to pay for an order of magnitude more
resources than I will actually use. This seems to go against everything AWS is
about.

------
apanda
This post raises an interesting divide in how you price any of these
services. With software (and movies, and whatever), we know that the marginal
cost of producing another copy is low if not zero, and hence in some sense
intellectual property can be priced arbitrarily by its creators, since there
are no nice cost curves to determine pricing. This is often the same argument
for why IP protection is reasonable.

At the same time, this is clearly not the case for something like EC2, where
the marginal cost of another machine isn't necessarily zero, but it is true
for Dynamo in some sense. Are we in a world where one can't choose to mix the
models?

------
shaddi
To be fair, they're not explicitly charging twice as much for consistent
reads. You get consistent reads until you exceed your provisioned read amount,
which is unfortunately measured in these odd units of "read capacity". It's
not clear (to me, anyway) if the 2x number represents something accurate about
resource usage or if it's just a provisioning guideline.

That said, what provisioned capacity means in DynamoDB is pretty opaque. Great
point about the utility of latency information for developers.

~~~
pbailis
For a fixed amount of throughput, if I want consistent reads, I pay x dollars.
If I'm okay with eventual consistency, I can pay x/2 dollars and get the same
throughput.

Conflating resource usage, pricing, and consistency using a single metric
(here, price) seems confusing, especially without the right metrics to guide
devs.

------
slackerIII
Isn't DynamoDB backed by SSDs? Perhaps that is the reason for the increased
cost.

