I've been using Azure Table Storage since the beginning and this doesn't seem to be the same. Like others have mentioned, TS is more similar to SimpleDB. Now, I would love for someone to give me a tl;dr on the feature set of DynamoDB so I can make an accurate comparison.
Table Storage does not allow any indexes other than the primary ones (Row Key and Partition Key). You also cannot store complex objects within fields and use them in a query. You basically just serialize the data and stuff it into the field.
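For example, persisting a nested object means flattening it yourself before the write. A minimal sketch of the pattern (the entity shape and property names here are made up, and a plain dict stands in for the SDK's entity type):

```python
import json

# Hypothetical nested object we want to keep on a single table entity.
address = {"street": "1 Main St", "city": "Seattle", "zip": "98101"}

# Table properties are flat scalars, so the object gets serialized into
# a string field; it's an opaque blob as far as server-side queries go.
entity = {
    "PartitionKey": "customers",
    "RowKey": "cust-001",
    "Name": "Contoso",
    "Address": json.dumps(address),  # can't filter on street/city/zip
}

# Reading it back means deserializing client-side.
restored = json.loads(entity["Address"])
```

The price of this is exactly the complaint above: none of the nested fields are visible to the query engine.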
The dynamic schema is very nice if you can leverage it, but the actual query support is TERRIBLE. (Sorry Microsoft, I'm a fanboy but you blew it here). There is no Order By or even Count support, which makes a lot of things very difficult. Want to know how many "color=green" records there are? Guess what, you're going to retrieve all those rows and then count them yourself. They're starting to listen to the community and have just recently introduced upserts and projection (select). I would love to see them adopt something like MongoDB instead :)
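Concretely, counting "color=green" records ends up as a client-side scan. A rough sketch, with an in-memory list standing in for the paged query results you'd actually get back (real code would follow continuation tokens page by page):

```python
# Simulated table contents; in practice these rows come back from the
# service a page at a time, each page a separate billable transaction.
rows = [
    {"PartitionKey": "widgets", "RowKey": str(i), "color": c}
    for i, c in enumerate(["green", "red", "green", "blue", "green"])
]

def count_matching(rows, **filters):
    """Client-side count: every matching row must be transferred first."""
    return sum(
        1 for r in rows
        if all(r.get(k) == v for k, v in filters.items())
    )

green_count = count_matching(rows, color="green")  # 3
```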
Can you elaborate how this is different and not comparable? Azure's table service offers the same automatic partition management, unlimited per-table scalability, composite keys, range queries, and availability guarantees. The linked paper goes into more details.
Besides the points I already complained about before... How about 200ms response times even when performing a query using the Row & Partition Keys. I'm not sure if by composite keys you were referring to something other than the RK & PK because those are the only indexes you get.
I agree with your other pain points - not being able to get counts, secondary indices, etc. However, you can easily simulate some of those - maintain your own summary tables, indices and so on. These ought to emerge as platform features pretty soon though. It's not perfect, but its feature set is close to Dynamo's.
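One way to simulate a server-side count, as suggested above, is to maintain a summary entity alongside the data and bump it on every write. A rough sketch of the idea, with in-memory dicts standing in for the two tables (real code would need concurrency control, e.g. optimistic ETags, and the table/property names are made up):

```python
# Two in-memory "tables": the data table and a summary table of counts.
data_table = {}
summary_table = {}  # key: (partition, property, value) -> count

def insert_entity(pk, rk, entity):
    """Insert a row and keep a per-value count in the summary table.
    A real implementation must make both writes atomic (e.g. an entity
    group transaction within one partition) or tolerate drift."""
    data_table[(pk, rk)] = entity
    key = (pk, "color", entity.get("color"))
    summary_table[key] = summary_table.get(key, 0) + 1

insert_entity("widgets", "1", {"color": "green"})
insert_entity("widgets", "2", {"color": "green"})
insert_entity("widgets", "3", {"color": "red"})

# Now a "count" is a single point lookup instead of a full scan.
greens = summary_table[("widgets", "color", "green")]  # 2
```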
As for MongoDB, I guess this service has been built from the ground up to provide the availability guarantees and automatic partition management features. I don't know if Mongo provides those. You could run Mongo yourself on Azure if you wanted to; there's even a recently released supported solution.
Hmm, I guess when I think of composite keys I think of ways to mark a specific field/column as part of the key. Data duplication along with string concatenation isn't really an elegant way to do it. If I remember right, you also can't update the key values once the record has been saved. This is coming from a big SQL guy though :)
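The string-concatenation approach being criticized here looks something like the following: to make a multi-field key sortable and range-queryable, the fields are packed into the RowKey as fixed-width, zero-padded segments (the field names and widths are made up for illustration):

```python
def make_row_key(order_date, order_id):
    """Pack two logical key fields into one RowKey string.
    Zero-padding keeps lexicographic order equal to numeric order,
    which is what makes range queries on the key work."""
    return f"{order_date}_{order_id:010d}"

k1 = make_row_key("2012-01-18", 42)   # "2012-01-18_0000000042"
k2 = make_row_key("2012-01-19", 7)    # "2012-01-19_0000000007"

# Lexicographic comparison now matches chronological order.
assert k1 < k2
# The downside: changing either field means deleting and re-inserting
# the entity under a new key, since keys are immutable once saved.
```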
Balakk is correct. There are a lot of similarities between Windows Azure Tables and DynamoDB, and the release of DynamoDB validates the data model we have provided for a few years now with Azure Tables.
• They both are NoSQL schema-less table stores, where a table can store entities with completely different properties
• They have a two-attribute (property) composite primary key. One property is used for partitioning and the other is used for optimizing range-based operations within a partition
• Both of them have just a single index based on their composite primary key
• Both are built for effectively unlimited table size, seamlessly auto-scaling out with hands-off management
As mentioned by someone else, one difference is that DynamoDB stores its data completely in SSDs, whereas in Azure Storage our writes are committed via journaling (to either SSD or a dedicated journal drive) and reads are served from disks, or from memory if the data page is cached. Therefore, the latency for small single-entity writes is typically below 10ms due to our journaling approach (described in the above SOSP paper). Single-entity read times for small entities are typically under 40ms, as shown in the results here:
Can you please explain your math? AFAIK Azure txns are not paid by the hour - they are a flat cost of $0.01 per 10,000 storage txns. If you do batched GETs and PUTs you make only 550 txns per second (55,000 / 100 entities per batch).
I agree that Dynamo's provisioned throughput capacity is a very useful feature though. Azure does not provide any such performance guarantee; the throughput limit is also a guideline as far as I know, not an absolute barrier.
I should have explained that my costs were calculated on a "per day" assumption. Thus the costs are for:
5,000 x 60 x 60 x 24 = 432,000,000 writes
50,000 x 60 x 60 x 24 = 4,320,000,000 reads
(432,000,000 / 10,000) x $0.01 = $432
(4,320,000,000 / 10,000) x $0.01 = $4,320
Azure Total Cost For One Day's Use: $4,752
((5,000 / 10) x $0.01) x 24 = $120
((50,000 / 50) x $0.01) x 24 = $240
AWS Total Cost For One Day's Use: $360
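For reference, the arithmetic above can be reproduced directly from the list prices quoted in this thread ($0.01 per 10,000 Azure storage transactions; $0.01 per hour per 10 DynamoDB write-capacity units and per 50 read-capacity units):

```python
SECONDS_PER_DAY = 60 * 60 * 24
writes_per_sec, reads_per_sec = 5_000, 50_000

# Azure: a flat $0.01 per 10,000 storage transactions, one txn per entity.
daily_writes = writes_per_sec * SECONDS_PER_DAY   # 432,000,000
daily_reads = reads_per_sec * SECONDS_PER_DAY     # 4,320,000,000
azure_cost = (daily_writes + daily_reads) / 10_000 * 0.01

# DynamoDB: provisioned capacity is billed per hour whether used or not.
aws_cost = ((writes_per_sec / 10) + (reads_per_sec / 50)) * 0.01 * 24

# azure_cost ~ $4,752/day; aws_cost ~ $360/day
```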
You are right that I don't take into account the bulk feature of Azure reads & writes, but that is because batch requests are only possible on a single partition at a time, which in my personal experience (not exhaustive) is non-trivial to take advantage of.
The cost difference between Windows Azure Tables and DynamoDB really depends upon the size of the entities being operated over and the amount of data stored. If an application can benefit from batch transactions or query operations, the per-entity savings with Windows Azure Tables can be substantial.
For the cost of storage, the base price for Windows Azure Tables is $0.14/GB/month, and the base price for DynamoDB is $1.00/GB/month.
For transactions, there is the following tradeoff:
• DynamoDB is cheaper if the application mainly performs operations on small items (a couple of KB in size) and can't benefit from the batch or query operations that Windows Azure Tables provides
• Windows Azure Tables is cheaper for larger sized entities, when batch transactions are used, or when range queries are used
The following shows the cost per hour of writing or reading 1,000,000 entities per hour (277.78 per second) for different entity sizes (1KB vs. 64KB). It also includes the cost difference between strong and eventually consistent reads for DynamoDB. Note that Windows Azure Tables allows batch operations and queries over many entities at once, at a discounted price.
• 1KB single entity writes -- Azure=$1 and DynamoDB=$0.28
• 64KB single entity writes -- Azure=$1 and DynamoDB=$17.78
• 1KB batch writes (with batch size of 100 entities) -- Azure=$0.01 and DynamoDB=$0.28
• 64KB batch writes (with batch size of 100 entities) -- Azure=$0.01 and DynamoDB=$17.78
• 1KB strong consistency reads -- Azure=$1 and DynamoDB=$0.05
• 64KB strong consistency reads -- Azure=$1 and DynamoDB=$3.54
• 1KB strong consistency reads via query/scan (assuming 50 entities returned on each request) -- Azure=$0.02, DynamoDB=$0.05
• 64KB strong consistency reads via query/scan (assuming 50 entities returned on each request) -- Azure=$0.02, DynamoDB=$3.54
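Those per-hour figures can be reproduced from the same list prices (Azure: $0.01 per 10,000 transactions; DynamoDB at launch: $0.01/hour per 10 units of 1KB/s write capacity and per 50 units of 1KB/s strong-read capacity; the small differences in the last decimal come from rounding):

```python
entities_per_hour = 1_000_000
rate = entities_per_hour / 3600          # 277.78 entities per second

def azure_cost(entities, per_request=1):
    """Flat $0.01 per 10,000 transactions; batches and queries
    collapse many entities into one billable transaction."""
    return (entities / per_request) / 10_000 * 0.01

def dynamo_write_cost(rate_per_sec, item_kb):
    """One write unit per KB/s; $0.01 per hour per 10 units."""
    return rate_per_sec * item_kb / 10 * 0.01

def dynamo_read_cost(rate_per_sec, item_kb):
    """One strong-read unit per KB/s; $0.01 per hour per 50 units."""
    return rate_per_sec * item_kb / 50 * 0.01

azure_single = azure_cost(entities_per_hour)        # $1.00 per hour
azure_batch = azure_cost(entities_per_hour, 100)    # $0.01 per hour
azure_query = azure_cost(entities_per_hour, 50)     # $0.02 per hour
dyn_write_1k = dynamo_write_cost(rate, 1)           # ~$0.28 per hour
dyn_write_64k = dynamo_write_cost(rate, 64)         # ~$17.78 per hour
dyn_read_1k = dynamo_read_cost(rate, 1)             # ~$0.056 per hour
```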
Open source really helps here. Amazon are innovative, but they are not the only place innovation is happening. In fact, here's a pretty good writeup (if a wee bit biased) on how the new offering compares to the open source Cassandra project:
Cassandra is a great project, as are Hadoop, MySQL, etc. The issue I am raising is not so much which project is better on a feature basis, but the fact that Amazon is able to offer it as a service, at a scale that no other vendor can match (with the exception of Google and, on a good day, Microsoft). Most other "traditional" cloud vendors, such as Rackspace, do not have anything remotely comparable to this, EBS, SQS, RDS, etc.
I also found it interesting that the storage medium is specified, and it's SSDs. Solid state will be hugely disruptive for hosted services. I've been hoping for an instance-by-the-hour service backed by SSDs, and I'll surmise from this announcement that it won't be long before that shows up on the EC2 menu. Gimme :)