Edit: I can confirm it does not allow the UTF-8 null character in strings: https://docs.aws.amazon.com/documentdb/latest/developerguide... It is written on top of PostgreSQL.
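The restriction makes sense given the PostgreSQL backing: the text type can't store NUL bytes. A client-side pre-flight check (a hypothetical helper, not part of any driver) gives a clearer error than a rejected insert:

```python
def find_null_strings(doc, path=""):
    """Recursively collect the paths of string values containing U+0000.

    PostgreSQL's text type cannot store NUL bytes, so a PostgreSQL-backed
    document store will reject such strings; checking client-side before
    inserting surfaces the problem with a useful path.
    """
    bad = []
    if isinstance(doc, str):
        if "\x00" in doc:
            bad.append(path or "<root>")
    elif isinstance(doc, dict):
        for key, value in doc.items():
            bad.extend(find_null_strings(value, f"{path}.{key}" if path else key))
    elif isinstance(doc, list):
        for i, value in enumerate(doc):
            bad.extend(find_null_strings(value, f"{path}[{i}]"))
    return bad

doc = {"name": "ok", "blob": "bad\x00value", "tags": ["fine", "also\x00bad"]}
print(find_null_strings(doc))  # ['blob', 'tags[1]']
```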
I kinda expected them to build it on top of DynamoDB's backend and provide the same kind of "serverless" on-demand experience, but I guess the architecture didn't fit, or maybe this was just faster.
However, the fact that writes aren't horizontally scalable makes it a laughable NoSQL database. It probably satisfies the checkmark for enough of their enterprise customers that it will be a mild success, and they'll keep it on life support forever, like SimpleDB, until there's enough demand to implement a proper solution.
ElasticSearch on the other hand...
If only they had a competitor that could launch the same products a few months later but offered higher reliability off the bat, that could eventually force Amazon to improve their reliability or risk losing customers long term.
Being first to market doesn't ensure eventual market dominance. Sure, it could give you important feedback. But if your product is subpar, the feedback will have a ton of noise and possibly be useless. Plus it's not worth creating negative externalities and earning the reputation.
Reliability is the trickiest of the three because it requires the customer to architect their solution with multi-AZ support in mind, but AWS always provides the foundation for that architecture.
Could they, and should they provide more features and a better developer experience around building fault tolerant solutions? Absolutely! But I certainly don't think they have a bad reputation for reliability.
Doesn't Azure Cosmos DB do this? From https://docs.microsoft.com/en-us/azure/cosmos-db/introductio...
> You can elastically scale throughput and storage, and take advantage of fast, single-digit-millisecond data access using your favorite API among SQL, MongoDB, Cassandra, Tables, or Gremlin.
Haven't used it though, so would welcome some real world experience.
They have: it's Azure. I'm even a little bit scared that no one here is mentioning CosmosDB... It seems to me that most of the community only knows AWS products.
Customers are paying AWS so that their SREs don't get called; they don't care if the AWS SREs do, as long as the system keeps running.
Based on the supporting quotes at launch from Capital One, Dow Jones, and WaPo, it sounds like enough customers are OK with vertical write scalability and (pretty awesome) horizontal read scalability for now, because it fits their use case and is better than what they had before.
Also consider that since the cluster management overhead has been removed from the customer, they can essentially "shard" by using a separate cluster for each sufficiently large service/org/dept, which might actually work out better for them in some respects.
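That "one cluster per sufficiently large service" pattern is easy to express in application config. A sketch (all service names and endpoints here are made up):

```python
# Hypothetical routing table: each large service/org/dept gets its own
# DocumentDB cluster, so "sharding" happens at the org level rather than
# inside a single cluster.
CLUSTER_BY_SERVICE = {
    "orders":   "orders.cluster-abc.us-east-1.docdb.example.com",
    "catalog":  "catalog.cluster-abc.us-east-1.docdb.example.com",
    "identity": "identity.cluster-abc.us-east-1.docdb.example.com",
}

def endpoint_for(service: str) -> str:
    """Pick the cluster endpoint for a service; fail loudly on unknown names."""
    try:
        return CLUSTER_BY_SERVICE[service]
    except KeyError:
        raise ValueError(f"no cluster provisioned for service {service!r}")

print(endpoint_for("orders"))
```

Each cluster then gets its own 64 TB ceiling and its own write capacity, which is exactly why the per-service split can work out better than one giant deployment.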
Perfect is the enemy of good enough. The architecture might be laughable to you, but it is probably miles ahead of what the customer was using before.
And the nice thing about this hypothesis: you can test it by looking at how successful DocumentDB turns out to be.
I think it works, and AWS has yet to be brought down by this horizontal complexity. Quite an achievement, but it might not be a satisfying experience for the engineers who work there.
The downside is that a lot of their products lack polish, which sucks. On the flip side, even when they are launched with minimal features, they do tend to be reliable, durable, and secure, which is important when it comes to data-related services.
I wonder how widespread this view is. I suspect it's more widespread than Amazon realise. They may have optimised into a local maximum where they get a lot of value from being first to market, but could potentially get more by being first to "viable to trust a business on".
As far as being "viable to trust a business on" the numbers don't lie, AWS is number one because customers are running their businesses on AWS. The fact that DocumentDB launched with supporting quotes from Capital One, Dow Jones and WaPo shows that customers were clamoring to use it even before GA.
Remember, a lot of these customers are coming to AWS because they tried doing it themselves and struggled. When it comes to data, customers trust AWS more than they trust themselves, and rightly so.
AWS also has not had a reputation for deprecating services it launches. I find very little risk in taking a dependency on something AWS releases.
They already are viable and trusted by multiple billion-dollar companies and governments.
This focus on actually meeting needs today is what keeps AWS on top while the others take 2 years to launch minor service upgrades.
The Aurora storage subsystem is much more limited in terms of horizontal scalability and performance, they probably chose it because it was a better/quicker fit.
There was work underway at the time I left to replace InnoDB with WiredTiger. It seemed to be very slow going, and I suspect WiredTiger being acquired by 10gen had a part in it. They also had only 1-2 engineers on the project of ripping out MySQL and replacing it, in a long-lived branch that constantly dealt with merge conflicts from more active feature development happening on mainline.
Aurora, simply by virtue of being newer and learning from DDB's mistakes (in the same way DDB learned from SimpleDB and the original Dynamo) probably has better extension points for supporting (MySQL, Postgres, Mongo) in a sane way.
Then again, the relationship between AWS and Oracle is even more contentious and Aurora MySQL is one of AWS's most popular products so I don't think they are terribly worried about building on competitor's technologies.
At least when I was there, the strong focus was always on adding new features (global & local secondary indexes, change streams, cross-region replication, and so on) to keep up with the Joneses (MongoDB et al).
Meanwhile, a bunch of internal Amazon teams were taking a dependency on it instead of being their own DBAs, and those teams didn't care that much about the whiz-bang features, they just wanted a reliable scale-out datastore that someone else would get paged about when some component failed.
Adding features at a breakneck pace while keeping up umpteen-nines reliability and handful-of-milliseconds performance meant tech debt and non-user-facing improvements, including WiredTiger, all got sidelined. Around the time I left, our pager load was around 200 pages per week. That's one page every 50 minutes, 24/7, if you're keeping score at home.
I would love to get a behind the scenes look at the process of gradually improving the components of DynamoDB with better technologies, while still maintaining reliability and performance.
Apparently, they are using a 1:1 mapping between a collection and a table, either by flattening the document or by using jsonb or an equivalent. I'm not a big believer in this approach for performance reasons, at least compared to a more normalized one like the one we built for https://www.torodb.com But they may change it in the future, as long as they don't expose a SQL API to their internal representation.
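If the 1:1 mapping does use jsonb, the representation is easy to sketch. A minimal illustration of the idea (the table layout and helper names are my guesses, not AWS's actual schema):

```python
import json
import uuid

# Hypothetical 1:1 mapping: one Mongo collection becomes one table of
# (_id, whole-document-as-jsonb) rows, with no normalization of nested
# fields -- the opposite of ToroDB's normalized approach.
CREATE_TABLE = """
CREATE TABLE users (          -- one table per collection
    _id  text PRIMARY KEY,    -- Mongo's _id
    doc  jsonb NOT NULL       -- the entire document, opaque to SQL
);
"""

def to_row(document: dict) -> tuple:
    """Serialize a document into an (_id, jsonb-text) row."""
    _id = str(document.get("_id") or uuid.uuid4())
    return (_id, json.dumps(document))

def from_row(row: tuple) -> dict:
    """Reconstruct the original document from a stored row."""
    return json.loads(row[1])

row = to_row({"_id": "u1", "name": "Ann", "addr": {"city": "Oslo"}})
print(from_row(row)["addr"]["city"])  # Oslo
```

The performance concern is visible right in the sketch: every query against a nested field has to reach inside the jsonb blob rather than hit a normalized column.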
I led a C# project where we could seamlessly switch back and forth between Mongo and SQL Server without changing the underlying LINQ expressions.
We sent the expressions to the Mongo driver and they got translated to MongoQuery; we sent the expressions to Entity Framework and they got translated to SQL Server queries.
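The same pattern can be sketched outside of .NET: one abstract predicate, two translators, with the backend chosen at the edge. All names here are illustrative, not a real library:

```python
from dataclasses import dataclass

@dataclass
class Eq:
    """One abstract equality node, playing the role of a LINQ expression."""
    field: str
    value: object

def to_mongo(pred: Eq) -> dict:
    """Translate the predicate into a Mongo-style filter document."""
    return {pred.field: pred.value}

def to_sql(pred: Eq, table: str) -> tuple:
    """Translate the same predicate into parameterized SQL."""
    return (f"SELECT * FROM {table} WHERE {pred.field} = %s", (pred.value,))

p = Eq("status", "active")
print(to_mongo(p))         # {'status': 'active'}
print(to_sql(p, "orders")) # ('SELECT * FROM orders WHERE status = %s', ('active',))
```

Because application code only ever builds `Eq` nodes, swapping the storage backend never touches the call sites, which is exactly what made the Mongo/SQL Server switch seamless.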
I’ve seen a LINQ to REST API provider.
I am however really hoping Amazon provides a MySQL 8.0 compatible version of Aurora with full support for its new hybrid SQL and Document Store interfaces courtesy of the X DevAPI and lightweight "serverless" friendly connections courtesy of the new X Protocol.
That way you don't have to choose just one approach, and you can keep your data in one place with high reliability and durability.
My ultimate pipe dream would be for them to also provide a Redis-compatible key/value interface that lets you fetch simple values directly from the underlying InnoDB storage engine without going through the SQL layer, similar to how the memcached plugin currently works.
X DevAPI and X Protocol/X Plugin could team up and map K/V-style access to the server's internal InnoDB API instead of going through the SQL layer as is currently done. They could try to do it "transparently" or let you set hints, whatever is desired from an application standpoint.
Maybe not (but OP makes a lot of good points for why it is), but it is still bound by the Aurora limits: 64 TB of storage, up to 15 low-latency read replicas (added in minutes), and presumably a single writer, which makes it a laughable NoSQL system since it cannot scale past one server's write capacity.
From the docs:
> Changed in version 2.0: Version 2.0 of the MongoDB Connector for BI introduces a new architecture that replaces the previous PostgreSQL foreign data wrapper with the new mongosqld.