

A Tour of Amazon's DynamoDB - timf
http://www.paperplanes.de/2012/1/30/a-tour-of-amazons-dynamodb.html

======
jedschmidt
I've been waiting for Mathias to write something like this, and I'm glad he
didn't waste any time diving in. For those who don't know him, he has more
breadth in the area of emerging datastores than anyone I know; he literally
wrote the book on Riak[1].

I've also been acquainting myself with the DynamoDB API over the past week,
and am building a node.js binding[2] that I hope will abstract away most of
the esoteric aspects of interacting with it. It currently has full API
coverage and is tested on Travis[3], so now I'm writing the high-level
interface. So far I've covered about half of the operations DynamoDB offers,
but I'd love to hear any ideas or feedback.

        [1] http://riakhandbook.com/
        [2] https://github.com/jed/dynamo
        [3] http://travis-ci.org/jed/dynamo
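One of the "esoteric aspects" such a binding hides is DynamoDB's typed
attribute format, where every value travels as `{"S": ...}`, `{"N": ...}`,
or a set variant. Here is a minimal sketch of that marshalling layer; the
function names are hypothetical, not taken from the jed/dynamo library:

```javascript
// Sketch: converting between plain JS values and DynamoDB's typed
// attribute format (S = string, N = number-as-string, SS = string set).
// Illustrative only; a real binding handles more cases (NS, errors, etc.).
function toAttribute(value) {
  if (typeof value === "number") return { N: String(value) };
  if (Array.isArray(value)) return { SS: value.map(String) };
  return { S: String(value) };
}

function fromAttribute(attr) {
  if ("N" in attr) return Number(attr.N);
  if ("SS" in attr) return attr.SS.slice();
  return attr.S;
}

// Marshal a whole item, e.g. { id: 7, name: "x" }
// into { id: { N: "7" }, name: { S: "x" } }.
function marshal(item) {
  var out = {};
  Object.keys(item).forEach(function (key) {
    out[key] = toAttribute(item[key]);
  });
  return out;
}
```

A high-level interface can apply this on every request/response so callers
never see the wire format.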

~~~
ryanfitz
Looks like we are building a very similar node.js client for dynamo.
<https://github.com/Wantworthy/dynode>

------
jbellis
I'm still struck by what a strong endorsement DynamoDB is of the design
decisions we've been making in Cassandra over the last couple years. Composite
keys, distributed counters, ...

More details: <http://www.datastax.com/dev/blog/amazon-dynamodb>

~~~
j2labs
Didn't you folks design Cassandra _after_ the Dynamo paper though?

~~~
thamer
Composite keys (multi-dimensional) and distributed counters were not in the
Dynamo paper. They were added to Cassandra, which is indeed based partially on
Dynamo.
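To make the distinction concrete: a hash + range composite key lets you ask
for all items sharing one hash key whose range key falls in an interval,
which plain Dynamo-style key/value stores cannot express. A tiny in-memory
model of that query shape (hypothetical helper, not DynamoDB's actual API):

```javascript
// Sketch: an in-memory model of a hash + range composite key query,
// roughly what DynamoDB's Query operation does over one hash key.
function queryRange(items, hashKey, from, to) {
  return items
    .filter(function (it) {
      return it.hash === hashKey && it.range >= from && it.range <= to;
    })
    .sort(function (a, b) { return a.range - b.range; });
}
```

The important property is that results come back ordered by the range key,
which is what enables pagination and time-series style access patterns.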

------
cleaver
DynamoDB is not open source, at least according to a blog comment by Jeff Barr
of AWS. (<http://aws.typepad.com/aws/2012/01/amazon-dynamodb-internet-scale-data-storage-the-nosql-way.html>)

I'm certainly not going to commit to platform lock-in like that. I know a lot
of folks were hit hard by Google's App Engine price changes, and it could
happen again here with DynamoDB.

~~~
techscruggs
Then you don't use S3?

~~~
cleaver
I don't actually, but I feel there is less lock-in. It wouldn't be hard to
move off.

Also, the risk of S3's pricing changing greatly is constrained by its wide
adoption. If DynamoDB failed to become pervasive, it would be relatively easy
for Amazon to increase the price to make an underutilized service profitable.
OTOH, if DynamoDB does become as popular as S3, your exposure to that risk
will be much less.

~~~
spullara
It would be pretty easy to implement the DynamoDB API on top of HBase so I
wouldn't worry too much about that.

------
siculars
I would second Mathias' thoughts in that this does feel more Bigtable/column-
oriented than traditional Dynamo. Dynamo and its derivative Riak (not from
Amazon) make no attempt to determine data types or schema in any way. The
fact that DynamoDB can do range scans and has various counting features leads
me to believe that it is more hybridized along the lines of Cassandra. Either
way, it is a welcome addition to the nosql toolset that developers have to
choose from today. I will certainly look at it more closely.

~~~
rbranson
Actually Riak can be used in a way in which it is content-aware of JSON and
XML documents.

<http://wiki.basho.com/Riak-Search---Indexing-and-Querying-Riak-KV-Data.html>

~~~
siculars
Search and Indexing were add-ons to the initial Riak product that were born
from user demand. Only recently (1.0, I believe) was Search integrated into
the main Riak distribution. Riak has no capability of updating values in
place, i.e. a partial update, and no means of ordering or pagination. Some of
these features are available via mapreduce queries or Search/Index as you
noted.

I'm by no means harping on Riak. I actually use it on a number of projects.
But reading about DynamoDB's capabilities does not conjure Riak, it conjures
Cassandra and HBase.

~~~
rbranson
I agree with you on other counts, but Riak is a far cry from the spartan
implementation of Dynamo outlined in the paper.

------
taylorbuley
Does anyone know in which language DynamoDB is implemented? I've read
somewhere that SimpleDB is done in Erlang. Is that the case with DynamoDB as
well? I've been reading about ETS and DETS in Erlang and it makes me wonder
whether they have anything to do with either of these data stores.

~~~
gigq
I haven't seen it written anywhere, but judging by the __type in errors like
this, I would say Java:

      {"__type":"com.amazon.coral.validate#ValidationException",
      "message":"One or more parameter values were invalid:
      The provided key size does not match with that of the schema"}

------
antirez
About pricing: it is interesting how caching is apparently not an effective
option to reduce the bill in a significant way, since data size is a big
component of the price.

~~~
justincormack
That's because it is SSD-based, so size starts to matter again.

~~~
antirez
Yes, but real hardware with an SSD plugged inside could deliver excellent
performance with many reasonable on-disk databases, at a fraction of the cost
per megabyte, with many millions of keys.

Other than the fact that you don't have to manage the DB, which is a good
point, the best reason for many small-to-mid businesses to use such a service
is that they are already on EC2. That may work in the short term, but I see
something odd about this model.

------
SaltwaterC
The API doesn't bother me as much as the temporary authentication via STS.
Temporary credentials for database access? Seriously? Am I the only one who
sees how ridiculous this is?

~~~
amock
What's wrong with using temporary credentials?

~~~
SaltwaterC
They add useless latency and another point of failure. And sometimes, the AWS
APIs DO fail.

~~~
amock
Temporary credentials are used specifically to reduce latency, since you get
a token that is valid for a period of time and doesn't need to be checked
against the auth service on every call. Since the credentials last for 12
hours, the work to retrieve them should be negligible. Because they don't
need to be checked against the auth service on every call, it seems like they
would be more resilient to API failure than standard AWS credentials.
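The argument above amounts to a client-side credential cache: you pay the
token-service round trip once, then reuse the token until it nears expiry.
A minimal sketch of that pattern, where `fetchToken` is a stand-in for a real
STS call (none of these names come from the AWS SDK):

```javascript
// Sketch: cache temporary credentials and hit the token service only
// when the cached token is within `marginMs` of expiring.
// `fetchToken(now)` must return { token, expiresAt } (ms timestamps).
function makeTokenCache(fetchToken, marginMs) {
  var cached = null;
  return function getToken(now) {
    if (cached === null || now >= cached.expiresAt - marginMs) {
      cached = fetchToken(now);
    }
    return cached;
  };
}
```

With 12-hour tokens, the extra request is amortized over thousands of
database calls rather than added to each one.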

~~~
SaltwaterC
You still need to provide an access key id and secret access key with that
session token as returned by the Security Token Service, and properly sign
every request with those credentials. With a valid session token but invalid
credentials, the request fails with HTTP 400: "__type:
com.amazon.coral.service#InvalidSignatureException, message: The request
signature we calculated does not match the signature you provided. Check your
AWS Secret Access Key and signing method. Consult the service documentation
for details."

So Amazon still checks the signature, with the same logic that could be used
for IAM credentials that don't expire, without the need to check for a valid
session token. If it is an Amazon screw-up that static credentials take more
time to evaluate than temporary credentials, that's not our fault. The
signing logic is still there for every API call. So (apparently?) we're back
to square one: extra requests and wrapper logic for zero benefit, and a worse
overall experience. The session credentials aren't stateful; I specifically
checked for this behavior. Therefore what you say doesn't seem to happen in
reality.

And for the love of God, don't blindly trust the documentation or what the
AWS folks say. As an AWS library author myself, I had a lot of fun debugging
failed requests because the smarty who wrote the signing procedure docs
forgot to mention some HTTP headers that are mandatory to sign. I had to
reverse-engineer an official SDK in order to patch my own code: implemented
exactly as the docs say, the signing method failed on every request, even
though I had a valid session token. Trial and error is always a broken way of
developing things when the docs are broken. If you're an AWS employee, please
send my regards to the documentation folks.

