A couple of years ago there was an interesting tidbit at re:Invent about customers moving from DynamoDB to Aurora to cut costs significantly. The Aurora team made the point that DynamoDB suffers from hotspots despite your best efforts to evenly distribute keys, so you end up overprovisioning, whereas with Aurora you just pay for I/O. And the scalability is great. Plus you get other nice stuff with Aurora like, you know, traditional SQL multi-operation transactions.
It was kind of buried in a presentation from the Aurora team, and Amazon's high-level messaging was still that NoSQL is the most scalable thing. Aurora was, and still seems to be, positioned against other solutions within the SQL realm. I sort of get it in theory: NoSQL is theoretically infinitely scalable, whereas Aurora is bounded by 15 read replicas and one write master. But in practice those limits are huge these days; I believe a single write master can handle something like 100K transactions a second.
So, I'm really curious where this has gone in the past couple years if anywhere. Is NoSQL still the best approach?
Since then we've pretty much cut our DynamoDB bill in half and had a drastic reduction in throttled responses.
But as far as the "you end up overprovisioning" hotspot argument goes, DynamoDB does offer autoscaling these days, which should alleviate a lot of provisioning-related headaches and save you money compared to static provisioning, from what I understand.
Granted, I don't think I'd want to use Dynamo for anything other than temporary data. Lock-in makes me nervous, and the way it scales up and down really makes it difficult to use for hourly workloads: by the time it scales up we're close to done needing the extra capacity, and then it doesn't scale down for around 40 minutes afterward. We set up caps, and our DB overflow mechanism keeps things from grinding to a halt.
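For reference, DynamoDB autoscaling is configured through Application Auto Scaling with a scalable target plus a target-tracking policy. Here is a hedged sketch of the two request payloads, written as plain dicts you would pass to boto3's `application-autoscaling` client; the table name, capacity limits, and target utilization are made-up examples:

```python
# Request payloads for boto3's "application-autoscaling" client.
# The table name and all numbers below are hypothetical examples.
scalable_target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyHourlyJobTable",            # hypothetical table
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 25,                                 # floor during quiet hours
    "MaxCapacity": 1000,                               # the kind of cap mentioned above
}

scaling_policy = {
    "PolicyName": "write-target-tracking",
    "ServiceNamespace": "dynamodb",
    "ResourceId": "table/MyHourlyJobTable",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # aim for ~70% consumed/provisioned utilization
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}

# With real AWS credentials you would register these like so:
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**scalable_target)
# client.put_scaling_policy(**scaling_policy)
```

Note that this only changes *when* capacity is adjusted; the scale-down lag described above is inherent to how the service applies these policies.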
The problem they noted isn't a lack of autoscaling; it's that you have to provision the entire datastore to accommodate your hottest partition, because each partition only gets an even share of the table's throughput:

    per-partition throughput = total provisioned throughput / number of partitions
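A minimal sketch of that arithmetic in Python, assuming provisioned throughput is split evenly across partitions (the numbers are made-up examples):

```python
def per_partition_capacity(total_rcu: float, num_partitions: int) -> float:
    # Each partition gets an equal slice of the table's provisioned throughput.
    return total_rcu / num_partitions

def required_total_capacity(hot_partition_rcu: float, num_partitions: int) -> float:
    # To give the hottest partition enough headroom, you must provision
    # that amount times the number of partitions, since you can't direct
    # extra capacity at one partition.
    return hot_partition_rcu * num_partitions

# Example: a table with 10 partitions whose hottest partition needs 1,000 RCU
# must be provisioned at 10,000 RCU, even if the other 9 partitions sit idle.
total = required_total_capacity(1_000, 10)
```

This is why hotspots translate directly into overprovisioning cost: the idle partitions still carry (and bill for) the hot partition's share.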
We use Aurora or Postgres for key/value unless we need something specific, like multi-regional capacity or really high-end performance. For that we run ScyllaDB.
I'd be really surprised if the client library introduces a latency significant enough to be compared to the network latency between the app server and the database server.
Libraries just have to do more work compared to simpler protocols, or to HTTP, which is incredibly easy to scale and pretty much handled automatically by the standard libraries at this point.
Another advantage (until two days ago) was that with most of the other data stores on AWS, you kept your database behind a VPC, and if you used Lambda, your Lambda also had to be in that VPC, which increased the function's warm-up time.
Now there is the Data API for Aurora Serverless: you don't have to worry about traditional connection pooling or being in a VPC.
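As a sketch, a query through the Data API is a single HTTPS call via boto3's `rds-data` client; the ARNs, database name, and SQL below are placeholders, not real resources:

```python
# Parameters for boto3's "rds-data" ExecuteStatement call.
# All ARNs and names here are placeholders.
statement = {
    "resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret",
    "database": "mydb",
    "sql": "SELECT id, value FROM kv WHERE id = :id",
    "parameters": [{"name": "id", "value": {"stringValue": "key1"}}],
}

# Because this goes over plain HTTPS, a Lambda outside the VPC can call it
# without holding a database connection or managing a pool:
# boto3.client("rds-data").execute_statement(**statement)
```

The trade-off is per-request overhead instead of a persistent connection, which tends to suit bursty Lambda workloads better than long-lived app servers.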
NoSQL has such a niche usage!
* Fast one-time data import without permanently creating a lot of shards (important if you are restoring from a backup)
* Better visibility into what causes throttling (e.g. was it a hot shard? Was it a brief but large burst of traffic?)
* Lower p99.9 latency. DynamoDB occasionally has huge latency spikes.
* Indexes on more than 2 columns
* A way to stream out updates that is better than DynamoDB Streams
Wish Dynamo had something similar
A way of doing this without expending all that effort is on my wish list too.
What bothers you about DynamoDB Streams specifically?
There is a new breed of databases that use consensus algorithms to enable global multi-region consistency. Google Spanner and FaunaDB (where I work) are part of this group. I didn't catch anything about the implementation details of DynamoDB transactions in the article. If they are using a consensus approach, expect them to add multi-region consistency soon. If they are using a traditional active/active replication approach, they'll be limited to regional replication.
Uh... this is just not true.
I don't think it's fair to compare them.
However, the more recent Google storage offerings based on Cloud Spanner do seem to offer this. I don't see how Amazon can make this statement, though that doesn't stop it from being an excellent enhancement to DynamoDB.
It also supports the Cloud Datastore API.
(I work on it!)
Google Cloud Spanner: https://cloud.google.com/spanner/docs/transactions
Google Cloud Firestore: https://firebase.google.com/docs/firestore/manage-data/trans...
Plus if you use Cloud Firestore in Datastore Mode then Google Cloud Datastore would satisfy this requirement as well.
As for Firestore, it’s not clear whether it supports cross-collection transactions. Cloud Datastore does not support cross-namespace transactions AFAICT.
b) Given that the primary use case for namespaces was/is multitenancy, it's not clear to me why you'd want to transact across them. Nevertheless, you can. What's leading you to draw this conclusion?
/database1/key1 = foo
/database2/key2 = bar
“Multi-document transactions can be used across multiple operations, collections, databases, and documents.”
If we're referring specifically to shards then "DynamoDB is the only non-relational database that supports transactions across multiple partitions and tables." no longer sounds like hyperbole.
Globalization and intermingling are happening in technology too.
On a similar note, a few years back C# and Java got `Any`-style generic types, while Python and JS got static types (via Python 3 type hints and TypeScript).
You are still responsible for implementing a queue or a lock on the items you want to mutate.
That said, this is a huge milestone for DynamoDB: we can now safely mutate multiple items while remaining ACID.
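As an illustration (not from the article), here is a sketch of an all-or-nothing transfer built with boto3's `transact_write_items`; the table, key, and attribute names are made up:

```python
# A sketch of a DynamoDB transactional write. Table name "Accounts" and
# attributes "AccountId"/"Balance" are hypothetical examples.
transact_items = [
    {
        "Update": {
            "TableName": "Accounts",
            "Key": {"AccountId": {"S": "alice"}},
            "UpdateExpression": "SET Balance = Balance - :amt",
            # Guard condition: if alice can't cover the debit,
            # the entire transaction is rejected.
            "ConditionExpression": "Balance >= :amt",
            "ExpressionAttributeValues": {":amt": {"N": "100"}},
        }
    },
    {
        "Update": {
            "TableName": "Accounts",
            "Key": {"AccountId": {"S": "bob"}},
            "UpdateExpression": "SET Balance = Balance + :amt",
            "ExpressionAttributeValues": {":amt": {"N": "100"}},
        }
    },
]

# Against a real table, both updates commit together or neither does:
# boto3.client("dynamodb").transact_write_items(TransactItems=transact_items)
```

Note that this covers atomicity and conditions, but as the comment above says, any application-level queueing or locking around contended items is still on you.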