Amazon DocumentDB, with MongoDB compatibility (amazon.com)
519 points by ifcologne on Jan 9, 2019 | 306 comments

My bet is that it is built on top of Aurora PostgreSQL. By looking at the "Limits" section (https://docs.aws.amazon.com/documentdb/latest/developerguide...), identifiers are limited to 63 characters and the same characters that PostgreSQL limits identifiers to; and a collection size limit of 32TB, coincidentally maximum PostgreSQL table size.

Edit: I can confirm: does not allow the UTF-8 null character in strings: https://docs.aws.amazon.com/documentdb/latest/developerguide... ... It is written on top of PostgreSQL.
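For reference, the PostgreSQL limits this inference leans on can be derived from a few constants. A rough sketch, assuming stock PostgreSQL build defaults (NAMEDATALEN of 64, 8 kB pages, 32-bit block numbers); nothing here is taken from DocumentDB itself, and the helper names are made up for illustration:

```python
# Stock PostgreSQL compile-time defaults (assumed; a custom build can change them).
NAMEDATALEN = 64          # identifier buffer size, including the trailing NUL byte
BLOCK_SIZE = 8 * 1024     # default page size in bytes
MAX_BLOCKS = 2 ** 32      # block numbers are 32-bit

MAX_IDENTIFIER_BYTES = NAMEDATALEN - 1
MAX_TABLE_BYTES = BLOCK_SIZE * MAX_BLOCKS


def fits_postgres_identifier(name: str) -> bool:
    """Hypothetical helper: True if the name fits without being truncated."""
    return 0 < len(name.encode("utf-8")) <= MAX_IDENTIFIER_BYTES


def storable_as_postgres_text(value: str) -> bool:
    """Hypothetical helper: PostgreSQL text values cannot contain the NUL character."""
    return "\x00" not in value


print(MAX_IDENTIFIER_BYTES)          # 63
print(MAX_TABLE_BYTES // 2 ** 40)    # 32 (TiB)
```

Matching limits are suggestive rather than conclusive: another engine could impose the same numbers for unrelated reasons.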

It sounds like it is built on top of the Aurora storage subsystem that is used by both Aurora MySQL and Aurora Postgres[1].

I kinda expected them to build it on top of DynamoDb's backend and provide the same kind of "Serverless" on demand experience, but I guess the architecture didn't fit, or maybe this was just faster.

1. https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

Definitely because it was faster. Amazon's strategy is to launch new features ASAP and then rely on everyone having to be on-call to fix shit when it inevitably breaks in prod because they rushed to launch. I will admit that while their "operational excellence" is shit, the security engineers do have quite a bit of power to block launches so their security isn't as bad as the reliability.

However, the fact that writes aren't horizontally scalable makes it a laughable nosql database but it probably satisfies the checkmark for enough of their enterprise customers that it will be a mild success and they'll keep it on life support like simpledb forever until they implement a proper solution assuming there is enough demand for it.

I was there for the launch of a major AWS service where they had an entire separate team working on the next iteration since well before launch (because the initial design wasn’t even intended to be sustainable). They are happy to incur technical risk (and in this case, to eat major losses in hardware costs) in order to be first to market.

We used MongoDB in my last job and I just want to say that I would have given up management of that beast in a heartbeat. We didn't stress MongoDB nearly enough to warrant all the effort required to construct it, monitor it, back it up, etc. Even if the performance was crappy, I would have lobbied hard to change to DocumentDB ASAP.

While I don't love MongoDB, I don't find it to be especially difficult to run. I'm running ~200 instances of MongoDB with a small team and it consumes very little of my attention.

ElasticSearch on the other hand...

Why is Elastic so difficult to run for you?

I'm sure they are, so they pass those lovely negative externalities onto the customer because they know it's in demand and only they provide that service.

If only they had a competitor that could launch the same products a few months later but offered higher reliability off the bat, that could eventually force Amazon to improve their reliability or risk losing customers long term.

Being first to market doesn't ensure eventual market dominance. Sure, it could give you important feedback. But if your product is subpar, the feedback will have a ton of noise and possibly be useless. Plus it's not worth creating negative externalities and earning the reputation.

You think AWS has a reliability problem for their database products? That's news to me. AWS often launches products with limited features, but security, durability and reliability tend to be the standard.

Reliability is the trickiest of the three because it requires the customer to architect their solution with multi-AZ support in mind, but AWS always provides the foundation for that architecture.

Could they, and should they provide more features and a better developer experience around building fault tolerant solutions? Absolutely! But I certainly don't think they have a bad reputation for reliability.

From my perspective, performance and scaling issues are most likely to occur.

> If only they had a competitor that could launch the same products a few months later but offered higher reliability off the bat

Doesn't Azure Cosmos DB do this? From https://docs.microsoft.com/en-us/azure/cosmos-db/introductio...

> You can elastically scale throughput and storage, and take advantage of fast, single-digit-millisecond data access using your favorite API among SQL, MongoDB, Cassandra, Tables, or Gremlin.

Haven't used it though, so would welcome some real world experience.

> If only they had a competitor that could launch the same products a few months later but offered higher reliability off the bat, that could eventually force Amazon to improve their reliability or risk losing customers long term.

They have, it's Azure. I'm even a little bit scared because no one here is mentioning CosmosDB... It seems to me that most of the community only knows aws products.

For how many customers do all of AWS flaws combined represent more than 2% of their production outages? I think it’s a very small number.

Well, they are second to market this time around, Cosmos has had mongo api compatibility for a long time.

It definitely sounds like it sucks from the perspective of an internal AWS developer or SRE, but if the AWS systems are architected such that these internal failures aren't seen by end users then AWS's reliability reputation remains fully intact.

Customers are paying AWS so that their SREs don't get called, they don't care if the AWS SREs do as long as the system keeps running.

Based on the supporting quotes at launch from Capital One, Dow Jones and WaPo it sounds like enough customers are ok with vertical write scalability and (pretty awesome) horizontal read scalability for now because it fits their use case and is better than what they had before.

Also consider that since the cluster management overhead has been removed from the customer, they can essentially "shard" by using a separate cluster for each sufficiently large service/org/dept, which might actually work out better for them in some respects.

Perfect is the enemy of good enough, the architecture might be laughable to you, but it is probably miles ahead of what the customer was using before.

I suspect that most MongoDB users never get to the point where they need to horizontally scale (i.e. it gets chosen for fad reasons, not because they actually have something big enough to scale).

And the nice thing about this hypothesis: you can test it by looking at how successful DocumentDB turns out to be. ~

AWS prioritizes launch above EVERYTHING. It is their strategy to have the market tell them what to build.

I think it works, and AWS has yet to be brought down by this horizontal complexity. Quite an achievement, but it might not be a satisfying experience for the engineers who work there.

It makes sense in terms of feeling out the market as well. If this version of the service takes off it validates the decision to proceed with a more complex/scalable version and it gives them more customer feedback. Standard MVP best practices.

The downside is that a lot of their products lack polish which sucks. On the flip side even when they are launched with minimal features, they do tend to be reliable, durable and secure, which is important when it comes to data related services.

This is one of the main reasons why I don't like AWS services, everything just seems so half-finished. There's not a lot in AWS that I would trust enough to use in production.

I wonder how widespread this view is. I suspect it's more widespread than Amazon realise. They may have optimised into a local maximum where they get a lot of value from being first to market, but could potentially get more by being first to "viable to trust a business on".

I certainly agree that they seem half finished in terms of features and developer experience, but from the point of view of security and data durability they have an excellent reputation. They typically have a pretty good reliability story as well, but it relies on the customer architecting their solution to take advantage of multiple AZs/Regions, which is often not trivial.

As far as being "viable to trust a business on" the numbers don't lie, AWS is number one because customers are running their businesses on AWS. The fact that DocumentDB launched with supporting quotes from Capital One, Dow Jones and WaPo shows that customers were clamoring to use it even before GA.

Remember a lot of these customers are coming to AWS because they tried doing it themselves and struggled. When it comes to data, customers trust AWS more than they trust themselves, and rightly so.

What's your definition of production ready? AWS services when launched "half-finished" still do not have outages, data loss or security issues. They also come with metrics and enough monitoring to support them in production. Those are the major checkboxes for production ready.

AWS also has not had a reputation for deprecating services it launches. I find very little risk in taking a dependency on something AWS releases.

You mean if they used a different strategy, they might have more than the entire one third market share of the entire cloud hosting industry?

>> "viable to trust a business on"

They already are viable and trusted by multiple billion-dollar companies and governments.

Apparently SimpleDB is still used quite a lot internally. As for their market tactics, there's no denying it works as their pace is accelerating and leaving everyone else in the dust. Most customers just want to pay some money and have a solution ready to go, they don't need infinite scaling from day 1, if ever.

This focus on actually meeting needs today is what keeps AWS on top while the others take 2 years to launch minor service upgrades.

MongoDB is not horizontally scalable either is it?

DynamoDB is written on top of MySQL (more specifically, MySQL's storage engine, not the query engine) so using Aurora which has a newer design would make sense.

Saying DynamoDB is built on top of InnoDB is a pretty big oversimplification of a much more complex distributed system[1] and for all we know they could have switched out the low-level storage engine on the backend to something like RocksDB or WiredTiger.

The Aurora storage subsystem is much more limited in terms of horizontal scalability and performance, they probably chose it because it was a better/quicker fit.

1. https://youtu.be/yvBR71D0nAQ

Yeah, I used to work on DynamoDB, I know it's more complicated (much more complicated than that video makes out - their code quality was atrocious, like 2000-5000 line Java classes in 3 or 4 deep inheritance hierarchies; no unit tests, only "smoke tests" that took 2 hours to run and were so prone to race conditions that common advice was to close everything else on your machine, run them, then leave them alone while you went to meetings)

There was work underway at the time I left to replace InnoDB with WiredTiger. It seemed to be very slow going, and I suspect WiredTiger being acquired by 10gen had a part in it. They also had only 1-2 engineers on the project of ripping out MySQL and replacing it, in a long-lived branch that constantly dealt with merge conflicts from more active feature development happening on mainline.

Aurora, simply by virtue of being newer and learning from DDB's mistakes (in the same way DDB learned from SimpleDB and the original Dynamo) probably has better extension points for supporting (MySQL, Postgres, Mongo) in a sane way.

Interesting, how long ago was that? I would be curious to know if the WiredTiger switch ever happened, and what that support relationship looks like now given the contentious relationship between MongoDB and AWS. The old Wired Tiger Inc website[1] still lists AWS as a customer.

Then again, the relationship between AWS and Oracle is even more contentious and Aurora MySQL is one of AWS's most popular products so I don't think they are terribly worried about building on competitor's technologies.

1. http://www.wiredtiger.com/

3+ years ago, so it's entirely possible that things have changed since I left. I don't have any more recent information on the state of the system.

At least when I was there, the strong focus was always on adding new features (global & local secondary indexes, change streams, cross-region replication, and so on) to keep up with the Joneses (MongoDB et al).

Meanwhile, a bunch of internal Amazon teams were taking a dependency on it instead of being their own DBAs, and those teams didn't care that much about the whiz-bang features, they just wanted a reliable scale-out datastore that someone else would get paged about when some component failed.

Adding features at a breakneck pace while keeping up umpteen-nines reliability and handful-of-milliseconds performance meant tech debt and non-user-facing improvements, including WiredTiger, all got sidelined. Around the time I left, our page load was around 200 per week. That's one page every 50 minutes, 24/7, if you're keeping score at home.
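The "one page every 50 minutes" figure checks out. A quick back-of-the-envelope calculation, using only the 200-pages-per-week number quoted above:

```python
MINUTES_PER_WEEK = 7 * 24 * 60   # 10080 minutes in a week
pages_per_week = 200             # figure quoted in the comment above

minutes_between_pages = MINUTES_PER_WEEK / pages_per_week
print(minutes_between_pages)     # 50.4
```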

According to this post [1] the WiredTiger project seems to have been cancelled after the acquisition.


Given the scale and popularity of DynamoDB and the distributed nature you would think that they could hire multiple teams just to work on improving it, but I guess it isn't as simple as that.

I would love to get a behind the scenes look at the process of gradually improving the components of DynamoDB with better technologies, while still maintaining reliability and performance.

People downvoting one of the guys who worked on DynamoDB at Amazon, somehow thinking they know better. HN in a nutshell.

You have been downvoted.

It would be nice if Amazon provided an API to access the data via SQL alongside the MongoDB API; I've seen quite a number of organizations migrate from mongo to Postgres once they get out of the rapid development phase. This would make that transition butter smooth.

That would make the internal representation used an "API" and thus won't be able to change it in the future.

Apparently, they are using a 1:1 mapping between a collection and a table. Either by flattening the document or by using jsonb or equivalent. I'm not a big believer this is good for performance reasons, at least compared to a more normalized approach like the one we did for https://www.torodb.com But they may change it in the future --if they don't expose the SQL API to their internal representation.
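To make the "flattening the document" option concrete, here is a minimal sketch of turning one nested document into a single flat row keyed by dotted paths. It is purely illustrative: the `flatten` helper and the sample document are made up, and this is not a description of how DocumentDB or ToroDB actually store data.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level keyed by dotted paths."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the column path.
            row.update(flatten(value, path))
        else:
            # Scalars and arrays are stored as-is in this sketch.
            row[path] = value
    return row


doc = {"_id": 1, "name": "Ada", "address": {"city": "London", "zip": "N1"}}
print(flatten(doc))
# {'_id': 1, 'name': 'Ada', 'address.city': 'London', 'address.zip': 'N1'}
```

The jsonb-style alternative would keep the whole document in one column instead. A more normalized approach would split sub-documents and arrays into their own tables, which is where the performance trade-off mentioned above comes in.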


I led a C# project where we could seamlessly switch back and forth between Mongo and SQL Server without changing the underlying LINQ expressions.

We sent the expressions to the Mongo driver and they got translated to MongoQuery; we sent the expressions to Entity Framework and they got translated to SQL for SQL Server.

C# is ahead of the game with LINQ, expression syntax, and the entire Roslyn platform. Passing an IQueryable<> around that can be interpreted and transformed for multiple backends is incredibly productive. I wish more people knew about this, and .NET in general.

And I’ve seen a few Java and Javascript libraries that purport to “implement LINQ” and they don’t get the power of LINQ is not the syntax, it’s that LINQ translates your code to expressions that can be parsed and translated to any back end query - it’s not just an ORM.

I’ve seen a LINQ to REST API provider.
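The idea of one expression being translated for several backends can be sketched outside .NET too. Below is a toy version in Python, not LINQ itself: the `Compare` and `And` node types and the two translator functions are invented for illustration, and real providers handle far more operators as well as escaping of field names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Compare:
    field: str
    op: str      # one of "==", ">", "<"
    value: Any


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Compare, And]

_MONGO_OPS = {">": "$gt", "<": "$lt"}
_SQL_OPS = {"==": "=", ">": ">", "<": "<"}


def to_mongo(expr: Expr) -> Dict[str, Any]:
    """Translate the expression tree into a MongoDB-style filter document."""
    if isinstance(expr, And):
        return {"$and": [to_mongo(expr.left), to_mongo(expr.right)]}
    if expr.op == "==":
        return {expr.field: expr.value}
    return {expr.field: {_MONGO_OPS[expr.op]: expr.value}}


def to_sql(expr: Expr) -> Tuple[str, List[Any]]:
    """Translate the same tree into a parameterized SQL WHERE fragment."""
    if isinstance(expr, And):
        left_sql, left_params = to_sql(expr.left)
        right_sql, right_params = to_sql(expr.right)
        return f"({left_sql} AND {right_sql})", left_params + right_params
    # Field names are trusted here; a real translator would validate or quote them.
    return f"{expr.field} {_SQL_OPS[expr.op]} ?", [expr.value]


query = And(Compare("age", ">", 30), Compare("city", "==", "Berlin"))
print(to_mongo(query))
# {'$and': [{'age': {'$gt': 30}}, {'city': 'Berlin'}]}
print(to_sql(query))
# ('(age > ? AND city = ?)', [30, 'Berlin'])
```

The application code builds `query` once. Which backend it runs against is decided only by which translator consumes the tree, which is the property the comments above credit to LINQ.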

I doubt that they actually built this on top of Postgres. They probably just integrated the WiredTiger[1] storage engine used by Mongo with their Aurora storage subsystem.

I am however really hoping Amazon provides a MySQL 8.0 compatible version of Aurora with full support for its new hybrid SQL and Document Store interfaces[2] courtesy of the X DevAPI[3] and lightweight "serverless" friendly connections courtesy of the new X Protocol.

That way you don't have to choose just one approach, and you can have your data in one place with high reliability and durability.

My ultimate pipe dream would be that they also provided a redis compatible key/value interface that allows you to fetch simple values directly from the underlying innodb storage engine without going thru the SQL layer, similar to how the memcached plugin currently works[4]

1. https://github.com/wiredtiger/wiredtiger

2. https://mysqlserverteam.com/mysql-8-0-announcing-ga-of-the-m...

3. https://dev.mysql.com/doc/x-devapi-userguide/en/devapi-users...

4. https://dev.mysql.com/doc/refman/8.0/en/innodb-memcached.htm...

What's the motivation for a faster access path to InnoDB: performance?

X DevAPI and X Protocol/X Plugin could team up and map K/V style access to the server internal InnoDB API instead of using a SQL service as it is currently done. They could try to do it "transparently" or let you set hints. Whatever is desired from an application standpoint.

> I doubt that they actually built this on top of Postgres.

Maybe not (but OP makes a lot of good points for why it is), but it is still based on the aurora limits, 64TB of size, 15 low latency read replicas in minutes, and presumably 1 write capacity which makes it a laughable nosql system since it cannot scale past 1 server's write capacity.

Are you aware that they are working on multi-master for Aurora? https://aws.amazon.com/about-aws/whats-new/2017/11/sign-up-f...

And there are organizations who can do rapid development in Postgres.

I think they're built on a common storage system, just like the MySQL compatible version too.

Aurora Postgres isn't really Postgres(only compatible), or is it?

The storage engine is different, but the frontend is actual Postgres

Interesting. Reminds me, I wish Postgres would increase this default identifier limit to 255, or make it easily user configurable. It can be done by a sophisticated user, but only via special compilation and only when first installed, which is a right pain. I find long identifier names useful for constraint names and foreign key names auto generated by code.

Corollary: PostgreSQL is also web-scale! ;P

Wasn’t latest Mongo built on Postgres backend too?

I think you're thinking of the BI Connector for analytics/SQL compatibility.

From the docs:

Changed in version 2.0: Version 2.0 of the MongoDB Connector for BI introduces a new architecture that replaces the previous PostgreSQL foreign data wrapper with the new mongosqld.

No, MongoDB has its own storage engines, it's not built on top of anything else.

I was reading a post [0] by Bryan Cantrill that predicted this would be the result of licences like the SSPL. I instinctively disagreed with him, but it turns out he was right: "The cloud services providers are currently reproprietarizing all of computing — they are making their own CPUs for crying out loud! — reimplementing the bits of your software that they need in the name of the service that their customers want (and will pay for!) won’t even move the needle in terms of their effort."

[0] http://dtrace.org/blogs/bmc/2018/12/14/open-source-confronts...

I liked that post - this especially near the end, referring to Adam Jacob and some of his posts.

> Adam has endured the challenges of the open core model, and is refreshingly frank about its economic and psychic tradeoffs. And if he doesn’t make it explicit, Adam’s fundamental optimism serves to remind us, too, that any perceived “danger” to open source is overblown: open source is going to endure, as no company is going to be able to repeal the economics of software. That said, as we collectively internalize that open source is not a business model on its own, we will likely see fewer VC-funded open source companies (though I’m honestly not sure that that’s a bad thing).

Years ago I realized that a hidden driver for the growth of cloud is this. The cloud is DRM, and almost uncrackable DRM at that since you have neither the code nor the hardware.

Basically. The one who controls the servers is King.

The code needed to run those servers is the secret sauce and a huge competitive advantage, but with open source software you're giving away the secret sauce and the business victory goes to the one with the most business friendly servers

(There are many dimensions to "business friendly", a big one of which is "it's easy for us to start using this additional service since we're already paying this company for other services")

Software and the control plane is the razor, compute resources are the blades. Amazon's software is its loss leader.

Not really. There's roughly a 50% premium vs raw EC2 instances for any RDS related service. The crux is keeping the operating cost below that delta.

If you haven’t already, do a search on FSF.org for “service as a software substitute”.

What is the way out? Would love to hear from people.

Ultimately, Cantrill put it well:

> ...for those open source companies that still harbor magical beliefs, let me put this to you as directly as possible: cloud services providers are emphatically not going to license your proprietary software. I mean, you knew that, right?

MongoDB Inc cannot make Amazon pay commercial license fees. That is not a thing that will happen. They have a lever in front of them with two positions, one of which is "large cloud companies might use your software for free", and the other is "large cloud companies will not use your software at all". They didn't like the first option, so they gave the lever a yank, but they're not going to like the second option, and there is no third option.

The way out is not to try and build a business on the assumption that people who have no interest, requirement or reason to give you large amounts of money will inexplicably do so anyhow. :)

This thread already has people eyeing up DocumentDB's pricing and comparing it favourably to MongoDB's competing Atlas service, and it's almost unthinkable to suggest that Atlas can compete on price with Amazon. The way to win this game is not to play; the rules are not in your favour.

> They have a lever in front of them with two positions, one of which is "large cloud companies might use your software for free", and the other is "large cloud companies will not use your software at all".

Was that even the goal? My impression of the licensing change was not that they expected Amazon to pay fees for offering a hosted MongoDB service. It was instead to lock Amazon out, and keep MongoDB Inc. as the only "cloud provider" of a hosted MongoDB service (perhaps still on top of AWS but with separate management interface).

> My impression of the licensing change was not that they expected to Amazon to pay fees for offering a hosted MongoDB service. It was instead to lock Amazon out, and keep MongoDB Inc. as the only "cloud provider" of a hosted MongoDB service.

Oh absolutely. I don't think they really thought they could force Amazon to license MongoDB, but I do think they believed they could force Amazon to not offer something that competed directly with Atlas.

That hasn't worked out for them very well.

(Not that I think leaving the license alone would have worked out any better. To the best of my knowledge, the MySQL, Postgres, Redis, and Memcache projects have not particularly benefited from Amazon building RDS and Elasticache on top of them, and I see no reason to think Amazon would have contributed a bunch of great patches upstream for MongoDB either.)

I think PostgreSQL does benefit from Amazon RDS and Google Cloud SQL indirectly.

Unlike MongoDB, it is a real volunteer-led open-source project, and the goal is to provide an excellent database to users rather than make money. Having easy-to-use cloud hosted versions available helps with attracting users, mindshare, and perhaps in the long run developers to the project itself. Having cloud hosted versions from big vendors means that it's easy to justify "we'll use PostgreSQL for this project" to management or clients.

Could Mongo or other companies use the Oracle v. Google precedent regarding API copyright to extract money from competitive vultures like Amazon?

I hope like hell that horrible precedent doesn't stand; if you find yourself on Oracle's side you may want to rethink some of your priors. Regardless of the rights and wrongs of this specific issue, some solutions are worse than the problems they solve.

No, because the API was made open source as it’s just part of the MongoDB source code. Future changes to the API made under Mongo’s new license would in theory be eligible for such protection - but what that means in practice is anyone’s guess. For starters they would need to be “substantial”. I can’t imagine Mongo going down that road.

Oracle are the good guys in this scenario?

They always were the OK guys in that argument. Google invented a whole new VM and bastardized the language just to get out of a $1/device licensing fee for mobile uses. The Java ecosystem has been irreparably harmed by Dalvik and its lack of support for more modern versions of Java.

On another note, anyone that doesn't think API design is a creative endeavor and worthy of protection probably has never made a great API before. It may be OK to accept that and also let other people use the API for free but I think ruling that it isn't is BS.

I also always found it amusing that people thought API design was not creative and protectable.

Like, “how many ways can you do a date api”, and then turn around to look at the original java Date api, the Calendar api, JodaTime and JSR310.

An API is just a collection of facts of the form, "if the system gets input X, the system produces output Y". And facts shouldn't be copyrightable.

You could describe inventions as "facts" too, are you saying that inventions shouldn't be patentable as well?

Maybe the fundamental properties of the universe aren't copyrightable/trademarkable/patentable, but what you CHOOSE to do with those - what API you design or what widget you build out of it certainly is.

Patents and copyrights are two very different things, though. I don't know if APIs are patentable, but that's a very different question. Has anybody ever successfully patented an API?

> The Java ecosystem has been irreparably harmed by Dalvik and its lack of support for more modern versions of Java.

So if ReactOS gets popular but doesn't support Windows 10 APIs, will it be harming the windows ecosystem? If popular implementations of a tool exist that don't chase other (official or not) implementations' features but still get lots of users, that probably means that the popular implementations provide other benefits.

> API design is a creative endeavor

I agree with that.

> and worthy of [legal] protection

But not that.

With current copyright law you basically can't agree with both of those statements as they are mutually exclusive.

Not likely because the Apache 2 licence version they are compatible with includes an explicit copyright and patent licence grant.

I don't think so, isn't the API version they're using still covered under an Apache license?

Even if true this would not be a win for open software.

Pricing for smaller workloads is better on MongoDB Atlas right now. The DocumentDB performance pays off for super large collections and really high read/write workloads.

As a DevOps consultant: if Amazon is already set up as a vendor, I would just use DocumentDB. Setting up a vendor can be a major hassle and is not worth the saving of a few $$ per month. It's also much cheaper than spinning up and managing an EC2 instance with MongoDB installed on it since most of the operational knowledge can be deferred to AWS.

There's no secret formula to stop people from competing with you. If MongoDB Inc is successful, it should be because they run a good document-database-as-a-service people want to use, not because they earn indefinite seigniorage from launching a popular open source project.

> If MongoDB Inc is successful, it should be because they run a good document-database-as-a-service people want to use

Unfortunately, something that is good, and something people want to use, are not the same thing. People will use AWS's offering even if it is worse and harder to use, because it is bundled as part of AWS. That is a safe option (it can't be that bad if AWS has released it) and an easy one (no need to think about what to use, you are using AWS already).

Being a big provider of virtual machines puts them in a very strong position to sell loads of other stuff.

But isn't it wrong to place all economic value in the hosting layer rather than the software layer?

Maybe, but MongoDB did that by themselves by making the software layer free.

> because they run a good document-database-as-a-service

Spoiler: they do not

They don't?

I've been using Atlas for over a year now and I don't have any complaints. It was super quick to set up and I've never had a single issue in terms of performance or availability.

What have your issues with Atlas been?

In comparison to what?

The problem is VC backed companies expecting ridiculous multiples.

There are thousands of very successful and profitable software companies that make proprietary products and offer managed services, training, support, etc. It's a great business, but it's not going to offer 100x wild startup growth.

These companies would all do fine if they bootstrapped or took a small seed/loan instead of taking on 100s of millions.

Stop assuming the value in the development ecosystem belongs to you (and should be extractable as money). It doesn't.

Realistically, the next step you will see, unless something changes, is that they will start going after people for API duplication. They have precedent (currently) on their side in the US.

None of the reasonable players will touch this, but you can be sure some VC backed "open source" player will be willing to touch this 3rd rail in exchange for a Series A.

> Realistically, the next step you will see, unless something changes, is that they will start going after people for API duplication. They have precedent (currently) on their side in the US.

IANAL, but since they already released the API as open-source under the Apache 2.0 license, this avenue is closed off to them.

The API is implemented on the server side; the licensing of the MongoDB drivers is irrelevant.

Arguendo: Cloud providers seem to be assuming that the development ecosystem belongs to them. They are extracting tons of money. Why don’t they just accept that hosting storage and compute will become commodity services, driving margins toward zero, and give up?

Plenty of reasonable players will touch Oracle v. Google going forward. I’m as eager to debate the opinion as other counsel. But procedural history demonstrates directly, not theoretically, that it’s effective against tech giants.

In the matter of API Owner v. Google, if API Owner touches that “third rail”, Google gets the shock.

By "they" do you mean Mongodb?

The linked blog sort of hints at it, but the way out is to not try to build business models around people paying directly for some sort of license.

Successful open source does not require someone making money off developing it. It is successful when it is something that helps a profitable company but is not core to their business; then, they benefit from making it open source and having everyone contribute to its development and maintenance.

Or, you make money off support and consulting.

The key take away is, you aren't going to make money off selling licenses for open source. Which is good, I think.

SSPL, the new license for Mongo, isn’t written to force developers already using Mongo to build apps to pay license fees. It’s designed to stop cloud companies from offering managed Mongo with closed service rigging.

I suppose Mongo could sell exceptions to cloud companies, the way other companies dual license libraries or frameworks. But even Mongo’s bread and butter paid deals aren’t primarily about alternative license rights for open code. They’re about closed add-on code and services, as you describe.

Dual licensing, on its own, is an old and plenty good model for funding development of open source code. I’ve heard wind of dual licensing deals done decades and decades ago, maybe even before GPLv2.

Right, but the point of the article the GP linked was that expecting a cloud company to pay for a license for add-on code doesn't work: instead, they are just going to write their own versions to work with the open source parts.

You’re right about the article. But the SSPL approach is different from what we’ve seen from Redis Labs, Elastic, Confluent, and Cockroach. SSPL applies to Mongo’s “open core” itself. The other companies have applied new terms to previously “closed shell” add-ons.

The question is whether giants will pay the cost of reimplementing entire stacks, core and shell. I don’t have the time myself, so I’ll have to wait on a report about how compatible AWS DocumentDB really is.

Given AWS history, I’d expect they’ll get most of the popular functionality, most of the way, but gotchas will abound, and they’ll never hit 100%. Switching cost of code won’t bottom out unless DocumentDB takes lead mindshare, which closed clones rarely manage.

Not sure this would fall under SSPL in any case. It's clear that what Amazon is doing is using Postgres under the hood, not really mongo. So I'm not sure how that would work if you make an interface shim to make postgres look like mongo, are you then subject to the mongo license? the postgres license? the apache 2.0 mongo api license? all of them? what if clauses of them are mutually exclusive? etc etc etc.

Just at a cursory glance it certainly seems like only the apache 2.0 mongo api license would apply. But I guess mongo could try to force the sspl on amazon?

Now I kinda hope Oracle decides to buy out MongoDB and integrate it into their own cloud. Then Oracle can decide to pull the same bullshit that they did with Google over the Java APIs with the MongoDB APIs but now against their current enemy Amazon (and Microsoft, too).

Then a combined Google + Amazon + Microsoft may finally be able to reverse the API Copyright insanity that is hovering ominously over the tech industry, and Oracle can continue to be a shining city upon a hill of shitty technologies you should never allow your business to adopt.

AWS is preemptively defensive about API licensing claims: "Amazon DocumentDB implements the Apache 2.0 open source MongoDB 3.6 API".

I think they are referencing the drivers which are licensed under Apache 2.0.

I've always seen Google+Android as the good guys that gave Java new life while I saw Oracle as the bad guys that bought Sun and killed Java.

I don't understand why people are reacting to it so aggressively. That's basically how AWS works: they did the same to Apache Kafka with Kinesis, Prestodb with Athena, PostgreSQL and MySQL with Aurora, Redis with ElastiCache and many others over the last 4 years, so it's not new.

It took too long for the open-source community to figure out that the cloud providers are killing them, now it's too late. Well played, AWS.

> It took too long for the open-source community to figure out that the cloud providers are killing them

How are service providers killing FOSS? That doesn't make sense. Permissive FOSS licensing allows anyone to use their software, regardless of how it's used, and that's how it should be.

Do you get to see AWS's source code for these services?


That's how it's killing "FOSS". Extend and Extinguish. This is not a new playbook.

I don't think you get to see MongoDB's source code for the enterprise edition either (though I couldn't quickly verify on Google).

Of course they provide source! Source rpms and tgz are downloadable.

Enterprise isn’t gpl, but source is provided.

(This could have been easily answered with a google search, as you pointed out)

No, I meant that I did search it on Google, and couldn't easily see from the results which case it is. Google "mongodb enterprise source code" -- which one answers the question?

If it were so easy, you could have provided the citation yourself in that comment.

Well, people are angry about that, that’s why they react aggressively.

Open-source gave AWS the ability to monetize their software, so the software companies should be careful enough to prevent any other big company from stealing their software and using their name to make money.

I think that it's too late considering AWS already did that to most of the industries but here is Hazelcast's take: https://www.linkedin.com/pulse/open-source-needs-protect-its...

If I'm reading the pricing page correctly, DocumentDB would run a _minimum_ of $200/month. That's for the smallest instance and no storage or I/O. Kind of steep if you ask me.

We were paying $5k a month for Atlas. So while it's not 'cheap' for a hosted solution, it's cheaper. And the autoscale and RR are better, and DR is super configurable. And then there is this line.

"Together with optimizations like advanced query processing, connection pooling, and optimized recovery and rebuild, Amazon DocumentDB achieves twice the throughput of currently available MongoDB managed services."

Can you please elaborate? This was launched today, you had access to the new feature in advance?

MongoDB Atlas is the name of the cloud service run by MongoDB themselves, with which Amazon DocumentDB competes. https://www.mongodb.com/cloud/atlas

Not OP, but that wouldn’t necessarily be a surprise. As customers make product requests to AWS they can be tapped to test upcoming launches - anything from pre-release testing to very early alphas.

I can confirm that this is very much a thing that they do. We have an account manager able to bump feature requests over to the appropriate product managers, and have been involved in pre-release testing of features that we expressed interest in.

As others have said, we had MongoDB Atlas. And it is basically MongoDB run in AWS with a pretty interface to do basic things like whitelist IPs and other such functions.

Yeah, that's pricy. They're definitely not going after early-stage startups then.

But if you have a medium-sized data set (eg. 50+ GB), this is definitely competitively priced. More RAM, storage, compute than Mongo Atlas and Compose for less money.

Here's hoping they introduce cheaper options!

50GB is a really small data set.

Eh, by what measure? Realistically it's probably bigger than 90% of all Mongo datasets.

It's tiny if you're a massive company and it's massive if you're a tiny startup.

50GB easily fits in RAM. It's a small dataset.

If you can run the dataset comfortably on a Macbook then it's very, very small.

Heck, you can even just use grep over 50GB reading straight from disk. It's tiny.

Is an argument based on the premise that relative terms have absolute meanings a good use of people's time here?

A recent work Slack chat had a dev asking what a particular table contained. They were going through our data inventory and found a randomly-named table 18TB in size. When I ran "select count(*)" against it, I got back 5,325,451,020,708 rows (that's a copy-and-paste).

50GB isn't trivial, but it's utterly manageable.

It seems a bit wrong if you have a 18TB table but no idea what it contains...

It was a temp table that we hadn't garbage collected yet. We don't make a habit of leaving that much junk data around, but it bumped our monthly storage bill several percent, not like tripled it.

Was this a relational or NoSQL DB?

It's primarily in things like Spark and Snowflake that act like relational DBs as long as you squint the right way.

in my experience it qualifies as "medium"

If it can be stuck in a sqlite database and run on a developer laptop, then no, it is not medium by any standard.

Please elaborate why you think 50Gb is anything other than a small dataset that can fit in memory on any half-decent server though.

[edit] in the spirit of not being a condescending tool to you, i'll replace my original reply with this: https://en.wikipedia.org/wiki/Long_tail

I'm assuming this is a joke. You can run databases that size without any of the fancy scalability stuff: no sharding, no anything. I'd actually recommend that, it makes admin super easy!

Besides that, AWS will charge per transaction (at 0.2 per million), which is outrageous given that you already pay per instance.

Correct pricing strategy needs to be per request or per instance, AWS is charging for both
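To put rough numbers on the "charging for both" complaint, here is a minimal sketch of the two-part bill. The figures are the ones quoted in this thread (about $200/month for the smallest instance, 0.2 per million I/Os, assumed here to mean $0.20); storage is ignored, and `monthly_cost` and `break_even_ios` are hypothetical helper names, not anything from AWS.

```python
# Assumed figures, taken from comments in this thread rather than a price list.
INSTANCE_USD_PER_MONTH = 200.0   # smallest instance, per the thread
USD_PER_MILLION_IO = 0.20        # metered I/O charge, per the thread


def monthly_cost(io_requests: int,
                 instance_usd: float = INSTANCE_USD_PER_MONTH,
                 usd_per_million_io: float = USD_PER_MILLION_IO) -> float:
    """Fixed instance charge plus the metered I/O charge for one month."""
    return instance_usd + (io_requests / 1_000_000) * usd_per_million_io


def break_even_ios(instance_usd: float = INSTANCE_USD_PER_MONTH,
                   usd_per_million_io: float = USD_PER_MILLION_IO) -> float:
    """Number of I/Os at which the I/O charge equals the instance charge."""
    return instance_usd / usd_per_million_io * 1_000_000


if __name__ == "__main__":
    for ios in (10_000_000, 500_000_000, 2_000_000_000):
        print(f"{ios:>13,} I/Os -> ${monthly_cost(ios):,.2f}/month")
    print(f"I/O charge matches the instance charge at {break_even_ios():,.0f} I/Os")
```

Under these assumptions the instance charge dominates until you get to roughly a billion billed I/Os a month, so whether the per-request part is "outrageous" depends a lot on the workload.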

I would guess the pricing model is actually closely related to the main dimensions of their costs and is quite valid.

The key point is illustrated by this quote from their main landing page: "storage and compute are decoupled, allowing each to scale independently".

This suggests it is built on top of the Aurora storage layer, or something similar, as other comments have suggested. This means there is a real cost per I/O operation because you aren't limited by the physical hardware of the compute instances, you get "free" storage nodes underneath that do much more than traditional storage and thus have to be built into the pricing structure.

It is definitely not going to be the cheapest possible solution for all use cases, but do the math before you reject it. If it does follow the Aurora pattern, then the number of I/O operations you are billed for will be a lot less than you may think because, to use another quote from their product page, "Amazon DocumentDB reduces database I/O by writing only database changes to the storage layer, avoiding slow, inefficient, and expensive data replication across network links". I think that quote is harder to understand without background as it sounds like market speak, but lines up very well with some of their in depth Aurora whitepapers, such as https://www.allthingsdistributed.com/files/p1041-verbitski.p... Again, I haven't seen evidence this is based on Aurora but the details they talk about line up really well.

The correct pricing strategy of any product is "whatever the customer is willing to pay for it". If you feel the price is too steep for your use-case, then don't buy it.

That's rarely actually true for anyone that wants to operate for more than a short time period. There are significant costs to gouging your customers. Anything from it being illegal, to it encouraging competition and your customers being motivated to actively flee you and shit on your reputation. The correct pricing strategy for people that don't have a long term enforceable monopoly is "whatever most customers are willing to reasonably happily pay"

The minute that you have customers paying any amount at all, you set yourself up for possible competition undercutting you on price. The truth is, whether you have a great or poor relationship with your customers, unless you have legal protections you have very little control over whether competitors will eventually enter your market or not. So you need to always operate as if there is competition breathing down your neck.

Pricing strategy has little to do with customer happiness in aggregate. Every price will make some customers happy, and other customers feel gouged, because different customers extract different amounts of value from your product. The key to protect yourself from competition isn't to spend time worrying about how pricing affects your aggregate customer volume, but about whether your customers are happy. Maybe some customers are unhappy because they feel gouged. Maybe you could make them happier by reducing prices. But maybe, you're better off letting them go, if they represent a small minority of your users, and instead focus on what a majority of your users might appreciate more - better service, relevant features, etc. which make them happier.

I think you've hit on what always bothers me about this sentiment. It is obvious that at any point in time you can charge the maximum customers are willing to pay, but that allows for disruption through channels like competition. The opposite, where you charge the minimum needed to continue providing the goods or services, seems optimal, but leads to a company with zero profits that is unattractive to investment. Is there any literature on how to identify the optimal point of "whatever customers are willing to reasonably happily pay"? Businesses successfully exist on many points in the spectrum of zero profits to most profits the market will bear, but I'd be interested in anything discussing optimality.

[Edit] Amazon employee working in Physical Consumer (not AWS). Asking out of personal curiosity.

I'm not an economist, and can't point to anything in particular, but I would be skeptical of anything that claimed a general approach to that. "Optimal" depends entirely on what you're optimizing for, which is basically an infinite possibility space. I could need a significant amount of revenue immediately to accomplish a desired business development, or I could have plenty of cash and want to build a large and loyal long term customer base at the cost of immediate profit. As you say, successful businesses exist doing pretty much everything. The only limiting factor is being a viable ongoing concern (and that can just mean having a rich backer). I'm sure there are things discussing optimizing based on small slices of the possibility space though (but all the normal caveats about economists making dumb assumptions that rarely apply to humans apply even to those).

> Amazon employee working in Physical Consumer (not AWS)

You too? I'm in AFT. I posted the original "whatever the customer is willing to pay" comment. Mostly just offhand and yeah there's a lot of nuance to it.

I don't mean that anyone should want to individually gouge each customer, but when running a business one should pick a price whereby the total long term profit is maximized.

Your pricing determines the number of customers. Your pricing also determines the profit on each customer. But if you choose your pricing strategy correctly, you should have some people who won't buy your product.
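A toy illustration of that trade-off, as a sketch only: it assumes a made-up linear demand curve (1000 potential customers, two of whom drop out for every extra dollar) and a made-up unit cost of $100. None of these numbers come from AWS or anyone in this thread.

```python
# Hypothetical toy model: linear demand, constant per-customer cost.
MARKET_SIZE = 1000      # customers who would buy at a price of 0 (assumed)
SENSITIVITY = 2.0       # customers lost per extra dollar of price (assumed)
UNIT_COST = 100.0       # cost to serve one customer (assumed)


def customers_at(price: float) -> float:
    """How many customers still buy at this price."""
    return max(0.0, MARKET_SIZE - SENSITIVITY * price)


def total_profit(price: float) -> float:
    """Per-customer margin times the number of customers who buy."""
    return (price - UNIT_COST) * customers_at(price)


# Brute-force search over whole-dollar prices.
best_price = max(range(0, 501), key=total_profit)
best_profit = total_profit(best_price)
buyers = customers_at(best_price)

if __name__ == "__main__":
    print(f"best price: ${best_price}, buyers: {buyers:.0f} of {MARKET_SIZE}, "
          f"profit: ${best_profit:,.0f}")
```

In this toy model the profit-maximizing price is $300, at which only 400 of the 1000 potential customers buy. The other 600 walking away is the expected outcome of pricing "correctly", not a sign the price is wrong.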

>> "whatever most customers are willing to reasonably happily pay"

Do you have a better idea of what this is than they do?

Considering they already have launch customers actively using this product and there are several comments on this page saying pricing is better than MongoDB?

AWS very often charges across multiple axes. I tried to model out our Cloudfront charges and they charge there for 3-4 different factors, each of which varies in pricing by region.

I think the idea is that by charging precisely where they incur costs, they can be much more reactive to different usage patterns, and therefore be more competitively priced overall.

Although it certainly does create lock-in due to not being able to figure out your billing and accurately model alternatives.

Seems to be really aimed at businesses which want to get off of MongoDB desperately.

Not really at all.

It's targeted at enterprises like mine who currently use MongoDB on premise and are looking for a managed solution. The advantage of AWS over Atlas is you can use the same security and governance approaches e.g. IAM policies, ADFS/SAML integration, Cloudwatch/Cloudtrail etc.

Exactly- atlas kills me without any type of SSO options for the control plane.

Also I feel that they HAD to offer this to counter Azure CosmosDB

So they’ll have a huge target market.

I’m not a fan of mongo, but I’ve run a fairly sizable enterprise platform on it for nearly a decade and we haven’t had any major issues that would make replacing it an urgent desire.

Once there is an AWS version, it seems like a matter of time before it becomes the safe choice. Nobody got fired for using AWS.

Finally an AWS service where the name makes sense and describes what it is. I hope this is the start of a trend.

Yeah, they copied Azure basically. DocumentDB was the name of an Azure service in the past; interestingly, it offered MongoDB, Gremlin and other API gateway options. It's called Azure CosmosDB now.

I love Azure for this. The names are almost all extremely straightforward. There are a handful that have made the jump from confusing to straightforward, and a handful that have made the jump from straightforward to confusing (CosmosDB, formerly DocumentDB, chiefly comes to mind).

Agreed. Too bad the Azure portal is the polar opposite. AWS, for all its faults, is mostly just a boring HTML portal but it works.

Azure tried to get fancy, with side sliding panels all over the place, and it is barely useable. The nicest thing I can say is it is "quirky." It isn't really productive however, particularly not on my 1080p monitor at Windows 10's default 125% DPI.

I literally quit Azure's Application Insights and went back to Google Analytics simply because I hated the Azure UI with a burning passion of a thousand suns.

The concept of writing queries is good, but if that's the only way you can get at your data you better make it damn easy, and they didn't. I'm sure for full time data pros it is a dream however.

Azure Portal feels like if someone tried to make the Xbox 360 blade interface into an admin tool, without first asking the admins what they needed.

That's interesting. I actually quite like it. I can build monitoring dashboards for our various services and see how something I don't need to monitor is doing just by going to the panel for it. To each his own I suppose.

Except for the fact that Azure names seem to change once per year.

Our Azure SA was giving us a presentation and actually got confused himself. "So that's TFS... I mean VSTS... Actually wait, it's Azure DevOps now?"

I worked at Microsoft for a while and I swear most of their "upgrades" are nothing more than renaming things and juggling menu items around so people can't find them.

Just wait until next year when it’s called TFSHub.

Yes, the names are good, but they still suffer from the MS illness: they change every few years. I also agree with other comments about the portal UI. Heck, it's supposed to be a professional tool...

Surprised they didn't go with a 3-letter acronym. AWS DDB.

Recently I made a typo on a formal document. Wrote "AMI" when I meant to write "IAM". Oops.

DDB is typically used to denote DynamoDB

i honestly thought that was his joke

I would have preferred AWS D2B.

My brain keeps trying to parse this into either DB2 or some Star Wars reference (R2DB, RD2B, etc.)

Looking through the supported APIs (https://docs.aws.amazon.com/documentdb/latest/developerguide...), it appears DocumentDB has no support for Mongo's oplog (https://docs.mongodb.com/manual/core/replica-set-oplog/), or change streams (https://docs.mongodb.com/manual/changeStreams), which I guess is no surprise because change streams were introduced in Mongo 4, whereas DocumentDB copied the 3.6 API. So DocumentDB seems much less useful as a reactive data store than MongoDB.

In other words, DocumentDB is only a drop-in replacement for MongoDB if you weren't using any of the features Amazon decided not to support.

Happy to be corrected if I'm misreading the documentation!

Also the aggregation pipeline is seriously hobbled with way more No-s than Yes-es over here https://docs.aws.amazon.com/documentdb/latest/developerguide...
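If you want to gauge how much of your own code this affects, a rough pre-migration check is easy to script. The sketch below walks a pipeline and flags top-level stages from an "unsupported" set. The set itself is an assumption: it only holds the two operators a MongoDB employee names further down this thread ($lookup and $graphLookup), so fill it in from the AWS compatibility page linked above before trusting it. `unsupported_stages` is a hypothetical helper, not part of any driver.

```python
# Assumption: placeholder list based on operators named in this thread.
# Replace with the real list from the DocumentDB compatibility docs.
UNSUPPORTED_STAGES = {"$lookup", "$graphLookup"}


def unsupported_stages(pipeline):
    """Return top-level stage operators that appear in UNSUPPORTED_STAGES.

    Each pipeline stage is a dict with a single key such as "$match".
    Results keep first-seen order and contain no duplicates.
    """
    found = []
    for stage in pipeline:
        for operator in stage:
            if operator in UNSUPPORTED_STAGES and operator not in found:
                found.append(operator)
    return found


# Example pipeline with made-up collection and field names.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$lookup": {"from": "customers", "localField": "customer_id",
                 "foreignField": "_id", "as": "customer"}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
]

problems = unsupported_stages(pipeline)

if __name__ == "__main__":
    print("unsupported stages:", problems or "none")
```

This only looks at stage names, not at expression operators nested inside a stage, so treat a clean result as a first pass rather than proof of compatibility.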

I agree with you.

Having said that, when we were working on https://www.torodb.com we discussed how we'd implement the oplog. And actually, based on PostgreSQL's logical decoding (LD), it wouldn't have been a big deal (there are some gotchas, but LD brings much of what you need). So I won't be surprised if this is implemented sooner rather than later.

Interesting. I think the sole purpose of this product is to wean existing Mongo customers (3.6-). And only those customers who are happy with Mongo API but not MongoDB itself. Is that such a huge market? Would be curious to see how this solution is adopted.

Weird question: Could Microsoft sue Amazon here for infringing on the DocumentDB name? (I mean, Microsoft's DocumentDB was among the first to even have such a MongoDB layer, and that was like 3 years ago.)

Given that current Amazon leaders actually came from Microsoft's data platform group this leaves a bit of a bad taste behind.

I'm not working for either company.

My assumption is that DocumentDB falls into a category of being so generic you can't trademark it, or otherwise claim exclusivity to it. It's literally just describing the fact this is a database for documents.

Bear in mind we're talking about the company that trademarked "Word" and "Excel" here.

For all any of us know, Amazon's lawyers already talked to Microsoft's lawyers about it and got permission beforehand.

See: Apple licensing the iOS name from Cisco before announcing the name change.

Sounds like this runs on the same storage service as Aurora.

Not sure why I'm getting downvoted. The characteristics sound exactly like Aurora.

- "replicates six copies of your data across three AWS Availability Zones (AZs)" [0]

- "Amazon DocumentDB uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per database cluster." [0]

- "When writing to storage, Amazon DocumentDB only persists a write-ahead logs, and does not need to write full buffer page syncs." [1]

[0] https://aws.amazon.com/documentdb/

[1] https://aws.amazon.com/documentdb/faqs/
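For anyone wondering why "six copies across three AZs" is such a tell: the Aurora design paper linked elsewhere in this thread describes a quorum scheme with 6 copies, a write quorum of 4 and a read quorum of 3. The sketch below just checks what that arithmetic tolerates. Whether DocumentDB uses the same quorum sizes is an assumption on my part; the DocumentDB pages quoted above only state the six-copies, three-AZ layout.

```python
# Quorum sizes as described for Aurora storage; assumed (not confirmed)
# to carry over to DocumentDB.
COPIES = 6
AVAILABILITY_ZONES = 3
WRITE_QUORUM = 4
READ_QUORUM = 3

COPIES_PER_AZ = COPIES // AVAILABILITY_ZONES  # 2 copies in each AZ


def can_write(copies_lost: int) -> bool:
    """Writes succeed while at least WRITE_QUORUM copies are reachable."""
    return COPIES - copies_lost >= WRITE_QUORUM


def can_read(copies_lost: int) -> bool:
    """Reads succeed while at least READ_QUORUM copies are reachable."""
    return COPIES - copies_lost >= READ_QUORUM


# Standard quorum consistency conditions.
reads_see_latest_write = READ_QUORUM + WRITE_QUORUM > COPIES
no_conflicting_writes = 2 * WRITE_QUORUM > COPIES

if __name__ == "__main__":
    print("quorum rules hold:", reads_see_latest_write and no_conflicting_writes)
    print("lose a whole AZ, still writable:", can_write(COPIES_PER_AZ))
    print("lose an AZ plus one copy, still readable:", can_read(COPIES_PER_AZ + 1))
    print("lose an AZ plus one copy, still writable:", can_write(COPIES_PER_AZ + 1))
```

With these numbers, losing a whole AZ (2 copies) leaves the cluster writable, and losing an AZ plus one more copy leaves it readable but not writable, which is the failure model the Aurora paper argues for.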

I'm guessing if you would have included this reasoning in your original comment, then it wouldn't have been downvoted.

Yeah, downvoting is totally broken on Hacker News. Seems like anything that isn't immediately agreed-with gets downvoted. I don't know where all the small-minded people come from, but they seem to have found their home here.

There is just some random noise. It might just have gotten one downvote. Now it's the top-comment.

I'm pretty sure this is going to kill Mongo as a company dead. With this in existence there's literally no reason to use Atlas.

If they wanted to twist the knife they should get to work implementing a pass through migration option.

MongoDB still have a strong hand

- control of the client and particularly its exposed featureset

- due to that, also control of the protocol and the ability to, for example, insert legally protected strings in the style of the Apple SMC signature into the handshake.

- ability to gate new features on the presence of an object like a copyrighted text, trademark, or even a crypto signature

- ownership of the name. AWS are pissing in Mongo's pool marketing themselves as compatible, and there are a variety of ways it could be made to backfire, if it were in Mongo's interests to encourage that outcome

- AWS focuses on breadth and very rarely nails any particular service. Their hosted Postgres for example still does not expose core features years later

- Following from that, AWS services on the whole are rarely best-in-class in terms of raw performance. I imagine Mongo could continue to easily compete on benchmark results running on AWS own infrastructure

I think this is a really interesting case, far more interesting than the technical minutiae of Just Yet Another AWS service. It does not sit well with me whatsoever that they're basically ripping off a much smaller company's core tech while simultaneously borrowing their trademark (in a legally acceptable manner) as part of the marketing, but I also find it hard not to see a ton of potential upside from this for Mongo

The client change would never work. The client is licensed as lgpl, so if they tried to pull any funny business like that, it would be instantly forked and if’d out.

As the person suggesting it, it's difficult to imagine how it could never work considering I haven't managed to figure out all the possible combinations in which such a strategy could be applied. Finally, it is quite exasperating to call this kind of strategy "funny business" in a thread about their core tech being ripped off by a megacorp

Mongo and Amazon are both large companies, but Mongo are the only ones here trying to stiff their customers. Selling a product that uses open source software is not ripping anybody off, and the only party in this situation who are upholding open source values are Amazon. The only thing I can take away from this is that if I get too successful using Mongo technology, they’re happy to change their license to try to extort money from me. I find this to be especially greasy since open source products like this become successful because of their open source nature. They exist because of the community that exists around them, and for them to turn around and decide to spit in our faces by dictating how we can consume the product just makes me hope their product is forked and that they go under.

Their API is core tech? That’s like saying there shouldn’t be separate implementations of Java, right?

Have you heard of Amazon Elasticsearch Service, launched in 2015? Elastic is doing fine.

This is hopefully a good counterexample. Amazon's Elasticsearch Service is pretty bad (poor general performance, very slow to make cluster changes/launch new clusters, etc).

But I can't help but think Amazon can and would easily fix those things if they mattered. Amazon's hosted Elasticsearch is a lot cheaper than Elastic's, and I'll bet that's enough to get people to use it.

> poor general performance

By poor performance, I assume you mean IO? AWS Elasticsearch has supported i3 instance type (nvme on-instance storage) for well over a year now [0]. Additionally, you could enable slow-logs to catch perf issues yourself [1]

> very slow to make cluster changes

Scale-out and access-policy changes happen in-place now and so happen much faster than they used to be.

> launch new clusters

In my experience, it depends on the cluster size, but usually I see clusters being up in 20 minutes. That's nice given that it sets up pretty much everything (spin up instances, apply access policies, run health checks, enable cloudwatch monitoring, snapshots, create route53 records, integrate with cognito, enc-at-rest via KMS, spin up load balancers, setup vpc resources etc) on my behalf.

[0] https://docs.aws.amazon.com/elasticsearch-service/latest/dev...

[1] https://aws.amazon.com/blogs/database/analyzing-amazon-elast...

Actually not IO performance, but mostly CPU. Last time I tested (which admittedly was about a year ago), an AWS ES cluster was about 20% slower than a self-made cluster with the same instance types. Given that AWS ES clusters still cannot use C5 instances, which offer FAR better performance/$, the performance disparity today might be even larger.

I can also launch an Elasticsearch cluster myself in about 2 minutes via terraform, so 20 minutes is not super impressive.

That said I recognize Elasticsearch is actually quite a finicky beast to set up, and my setup only has to deal with the needs I have, and probably would be set up horribly for certain other people. I can see how a hosted system that has to deal with all the weird edge-cases of a few thousand customers would take longer to set things up.

Elastic Co isn’t profitable, so by definition it isn’t “doing fine”.

From their SEC filing:


We have a history of losses and may not be able to achieve profitability or positive cash flows on a consistent basis. If we cannot achieve profitability or positive cash flows, our business, financial condition, and results of operations may suffer.

You're right they're not profitable — and neither is MongoDB — but the point is that AWS launched an Elasticsearch service 3 years prior to Elastic having a very successful IPO supported by stellar metrics (also found in the SEC filing you linked). So the statements made at the beginning of this thread are probably a bit premature.

Only in tech do people think that a money losing company is “successful” because they were able to convince investors to buy stock instead of defining success as having a business model where income is greater than expenses.

In reality long term profitability is the only metric that matters for a corporation

And in contrast to these startups with their "success" AWS is printing cash for amazon which releases surprisingly few "metrics" beyond $ in and $ out.

And at the end of the day. What else matters when measuring whether a profit seeking corporation is successful?

Literally every S1 filing will have some sort of language like that. They are required to list the risks that may harm them.

They haven’t shown a profit yet. So you don’t have any proof that they have a sustainable business model.

Dj from MongoDB here. We have, obviously, been keeping up with this and other threads, but we've also been busy testing out Amazon DocumentDB's correctness and performance. While we're getting that together to bring you an official response in a few days, complete with test results and methodology, I'd like to pick up on a couple of points and some inaccuracies that have been repeated in various threads:

This move shows MongoDB’s approach to document databases is compelling. We’ve thought so for a long time.

A cloud-hosted, truly global and managed MongoDB, MongoDB Atlas, has existed for the last two and a half years and has been serving more and more satisfied users every day with some massive workloads.

MongoDB Atlas runs the full implementation of MongoDB in the cloud.

Many features of MongoDB are documented as not being implemented by DocumentDB: these include change streams and many aggregation operators, including $lookup and $graphLookup. But beyond that, well let’s just say we’ve been staggered by how many tests DocumentDB has failed (no spoilers!).

The MongoDB API is not under an Apache license.

MongoDB drivers are still under the Apache license. The MongoDB server used to be licensed under AGPL and is now licensed under SSPL. The source code is open to all, as it has always been, at https://github.com/mongodb/mongo

DocumentDB is not cheaper than MongoDB Atlas. Preliminary estimates show this to only be the case with very large collections and very, very high read/write workloads.

There’ll be more next week over on the MongoDB blogs.


Any idea when Atlas will expand support for Sharding configurations and taggable zones? My impression is Atlas ONLY supports a 2 field shard, and the first shard MUST be location. Also it's impossible for clients to set write-concern to tags, because you don't support custom tags as MongoDB itself does.

SSPL feels a lot like a bait and switch to me

The current biggest threat to Free and Open Source software is cloud computing. Plain and simple.[0]

I know this is a blunt and harsh statement to make, but when you sell a service, you have zero native incentives to Open Source the way your system works. It just opens up Competition. This is not unique to AWS/Amazon. But their success gives them the power to have wide OSS damage.

This is, to me, the biggest reason why cloud portability should be something that every customer of a cloud service should have in their plans. Amazon as a company has shown no timidness in "embracing, extending, extinguishing" their competition.

OSS literally built the internet and opened up the world wide communication age, let's not be so short sighted that we don't see proliferation of cloud services (specifically one having so much dominance) for what it really is.

[0] http://dtrace.org/blogs/bmc/2018/12/14/open-source-confronts...

This comment is pretty bizarre when taken into account with the link you referenced. Unless I'm completely misreading what Cantrill is saying in that blog, I don't think he agrees with you.

>And while they’re at it, it would be great if they could please stop making outlandish threats about the demise of open source

>Adam’s fundamental optimism serves to remind us, too, that any perceived “danger” to open source is overblown: open source is going to endure

>and in the end, open source will survive its midlife questioning just as people in midlife get through theirs: by returning to its core values and by finding rejuvenation in its communities

Cloud computing was arguably an outgrowth of inability to prevent piracy and everyone being encouraged to open source everything.

No, it was an outgrowth of infrastructure work being a niche trade, and capacity management by startups (and mature companies) being a hard task.

This point is well known, and appears in pretty much every cloud provider's marketing material.

So, even though it's impossible to prevent piracy, effectively forcing you to hide the code behind an internet API, the real reason to do so is something else, as proven by marketing literature?

If I could re-engineer MongoDB so that a monkey could administer it, you'd recommend I still use the cloud model rather than sell binaries?

Why is it so expensive? Not only is the entry point $200/m, the instances are twice the price of their EC2 equivalents. (At first I thought perhaps the price included multi-AZ, but it doesn't.)

Seems like this is likely to be the real result of licenses like the SSPL. Not even a terrible outcome if the different implementations remain relatively compatible.

SSPL style licenses only work if the software can't be cloned but that's a difficult assumption.

Operating systems, compilers, and web browsers come to mind. There are currently:

* 4 independently-developed competitive compilers (gcc, clang, msvc, icc)

* 4 independently-developed competitive operating systems (windows, macos, linux, and bsd --I'm grouping the BSDs as one since their source code has a common ancestor)

* 3 independently-developed competitive browser engines, soon-to-be 2 (edgehtml, gecko, webkit)

And it's been that way for a few decades now; doesn't look like anyone is interested in taking the resources to make another one of those.

Throwing Chrome/Blink under WebKit is a pretty hard sell at this point. They’ve diverged enough that supporting one far from guarantees you’ll support the other. You might as well replace WebKit with KHTML in your list.

For the purposes of licensing and simple inertia (what it takes to start a project from scratch) -- that work was done once, with khtml, sure.

I’d be surprised to learn that AWS started and launched this project since the SSPL announcement. I suspect they began when the latest Mongo was still AGPL, with no sign of impending change.

Even though you're certainly right, Amazon offers 'real' options for Redis, MySQL, PostgreSQL, Elasticsearch, that all use upstream code. They will almost certainly never offer a similar thing for MongoDB.

So now AWS DocumentDB, Azure CosmosDB, and even Apple's FoundationDB have a MongoDB compatible API. I expect other multimodal databases to offer the same soon enough.

Strange turn of events for MongoDB but I guess that's what happens when the interface is open and anyone can build a backend to it, especially a relatively simple document-store.

And when they have a restrictive license.

I really wish this were priced more along the lines of https://www.compose.com/pricing - a $200/m floor is a tough DB cost to absorb on smaller yet important projects. Suppose an app has a few MB of data and maybe one day hits 100MB of awesomeness: do I really have to pay $200/m here?

I get it, I love AWS, I just wish this was priced differently.

If you need a cheap/free document store for a small app, just use DynamoDB. It's free (forever, not just the first year) up to 25GB of storage and enough read/write capacity units to handle up to 200M requests per month: https://aws.amazon.com/free/?awsf.Free%20Tier%20Types=catego...

Exactly, this probably isn't targeting you. A few mb of data? Host an instance? Use sqlite? Atlas can run 3-5k/month as the minimum. This is going to be 10x cheaper (which amazon seems to try and shoot for).

Another thought along these same lines is how do you host non-production (Dev, QA, Demo, etc.) environments without spending a fortune? Sure my production workload is 15TB but my development system is only running 15GB and I want to develop against what I deploy to.

If you can create schemas, that could be one solution.

I do this with Azure SQL Server instances, so a single instance can host all the 'non-critical' environments (dev, test, QA, demo) - works great!

Azure has been running a Mongo-compliant DB under their Cosmos umbrella for quite a while. It's not clear to me that either Azure or AWS are actually running Mongo software under the hood or rather a proprietary DB that uses the Mongo wire protocol.


Doesn’t Azure also have a not-Mongo service also called DocumentDB? Is this the same code? These cloud services are confusing enough when they don’t borrow each other’s names.

The Azure service formerly known as DocumentDB is now called Cosmos DB. Part of that switch was introducing multiple APIs, including a MongoDB API: https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb-int...

Correct. They added more gateway compat layers in addition to Mongo and renamed it from DocumentDB to Azure CosmosDB then.

That’s now Azure CosmosDB because it offers several different APIs for your data.

it's not mongo software, there are some incompatibilities.

I think it is likely it was the other way around. MongoDB caught wind of what AWS was about to release and changed the license.

Does that even apply? This is api-compatible, but doesn't appear to be using any actual MongoDB code.

It seems like Amazon may think so, with this line:

> Amazon DocumentDB implements the Apache 2.0 open source MongoDB 3.6 API by emulating the responses that a MongoDB client expects from a MongoDB server

Yeah, along with the FDB document layer: https://www.foundationdb.org/blog/announcing-document-layer/

I really wish this was serverless. Azure CosmosDB offers SQL and MariaDB interface against a serverless, pay for what you use database and DynamoDB is the only product of that class Amazon has. Even Aurora "serverless" appears to be little more than autoscale, and it requires an elastic IP which slows launch of lambdas since they have to connect to VPC.

> it requires an elastic IP which slows launch of lambdas since they have to connect to VPC

Do you have any other info/links related to this?

There's an enlightening graph in this post:


I was surprised to learn this. When working in Lambda, you have to choose between a relational database & a responsive API. It seems inevitable that AWS will fix this soon, but apparently this is a significant architectural problem. As I understand it, RDS instances should (must?) be accessed from within a VPC, and anything inside of a VPC needs an IP address, so the Lambda function has to wait ~10 seconds on cold-start for the Elastic IP service.

The only workaround I've heard of is to set up a service, such as CloudWatch, to call your Lambda function every ~25 secs to keep it "warm", but this seems antithetical to the value proposition of serverless architecture in the first place.

Of course, you could "just" use DynamoDB, but IMO the query language is really limited, and I'm not sure I fully grasp the problem (why doesn't DynamoDB need to be accessed from within a VPC?)
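
A minimal sketch of what the keep-warm workaround looks like on the function side, in Python. The `warmup` key and the `handler` name are hypothetical, chosen for illustration; the scheduled rule would have to be configured to send that payload.

```python
import json

def handler(event, context=None):
    """Lambda-style entry point; `warmup` is a hypothetical marker key."""
    # A scheduled ping only needs to keep the container alive, so return
    # early before touching the database or doing any real work.
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}

    # Normal request path (placeholder standing in for the real work).
    name = event.get("name", "world") if isinstance(event, dict) else "world"
    return {"statusCode": 200, "body": json.dumps({"message": f"hello {name}"})}
```

The early return keeps the scheduled pings cheap, but it only keeps warm as many containers as are pinged concurrently, so it does not remove cold starts under bursty load.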

> I'm not sure I fully grasp the problem (why doesn't DynamoDB need to be accessed from within a VPC?)

This is due to the fact that DynamoDB's query API is a standard AWS API which means granular internal/external access can be provided through IAM mechanisms (ie: roles, temporary tokens, federation, etc.).

On the contrary, to access RDS, Redshift or DocumentDB you would use standard ODBC/JDBC/Mongo facilities, which do not rely on IAM mechanisms, leaving VPC/Security Groups as the only isolation option.

Not quite. It’s not the auth mechanism or even the wire protocol. The issue is that to access traditional resources in a VPC you need to have an IP address within the VPC to route network traffic to/from it. It’d be the same if you ran a DB on an EC2 instance or even ran your own DynamoDB clone with no auth.

AWS services don’t have that issue because they’re accessible from anywhere on the network, even through an internet gateway / internal NAT.

I think that was my point.

Services with "native" AWS APIs use IAM for granular access management. Other services can only support access restrictions using the network so that means VPC/Security Groups.

The key is that DynamoDB is just any old web service as far as your code is concerned. Everyone uses the same public endpoint, same domain name, same API, and authenticates with IAM. You can also integrate it by using DynamoDB Streams as a trigger for Lambda functions, which is of course a purpose-built feature. There is DynamoDB Accelerator (DAX), which is a cache running in EC2 that runs as a normal VM with an EIP.

TL;DR: RDS and this new DocumentDB are essentially AWS managing your database VMs in EC2. The advantage is drop-in compatibility with regular apps expecting to reach a local server, because it is local to your VPC and uses normal ports. You can make them publicly accessible via the VPC firewall, but it's less secure that way. DynamoDB was designed from the ground up as an HTTPS API and that's the only way it's accessed.

This looks great for teams getting started in AWS, being able to reuse idioms, libraries and knowledge in managed services to remove some of the operational load.

Note that this is also nothing new; Azure Cosmos DB has had this for a while.

Serious question: is there any real reason to use Mongo over Elasticsearch?

ES grew out of Lucene, which provided an inverted index of all the text in a document, with a bunch of NLP related features bolted onto that. While ostensibly designed to be developer friendly, ES had and still has a horribly hacked together API with bugs and mis-documented misfeatures all over the place. In my experience it's anything but developer friendly. But if all you need is a text index on a document store with a static schema, it does its job reasonably well.

Mongo started as a very developer-friendly data store with lots of overstated claims to being a database. While earlier versions of Mongo were wildly dangerous to use as a business critical database, it has since matured and is now quite good at being a developer-friendly document-centric database. In my experience Mongo truly is developer-friendly as long as you don't try to use it as a full-blown transactional database with lots of complex data shapes and indexes.

I would not trust ES with anything but text search on a document store, and I would not trust Mongo with anything resembling multi-document transactions. With that said, they are both good at specific, different things.

MySQL and Postgres have their own baggage that makes them pretty terrible in some aspects. IMHO a JSON-over-HTTP API should really be table stakes for a database to be considered developer-friendly nowadays. (But please don't butcher HTTP like ES did and then claim to have that.)


No no no no, again no.

We don't need yet another shitty query language bolted onto one of the most error prone and annoying to type serialization formats while transmitting data on top of a by default stateless protocol that makes no sense for a database.

I'm sick of it.

SQL. The same queries will work in 95% of cases on any SQL db. There is a robust driver in almost every language. There are great implementations for every use case: embedded, scalable, transactional. That's friendly.

There is nothing wrong with SQL, it's freaking awesome.

Sure, keep the SQL. Just make it a field in a JSON POST payload, and send the results back as JSON.
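
To make the idea concrete, here is a rough Python sketch of "SQL as a field in a JSON payload, results back as JSON". It assumes a request shape of `{"sql": ..., "params": [...]}`, which is invented for this example, and uses the standard-library `sqlite3` module as a stand-in database; the HTTP layer is left out.

```python
import json
import sqlite3

def run_query(conn, payload: str) -> str:
    """Execute the SQL carried in a JSON request body and return JSON rows."""
    request = json.loads(payload)
    # Parameters travel separately from the SQL text so the driver binds them.
    cursor = conn.execute(request["sql"], request.get("params", []))
    # `description` is None for statements that return no rows.
    columns = [d[0] for d in cursor.description] if cursor.description else []
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps({"columns": columns, "rows": rows})

# Stand-in database with a couple of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada'), ('bob')")

body = json.dumps({"sql": "SELECT id, name FROM users WHERE name = ?",
                   "params": ["ada"]})
print(run_query(conn, body))
```

Each call is stateless from the client's point of view, which is the property being argued for; multi-statement transactions would need some extra convention on top of this.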

The "drivers in almost every language" suck. They all suck. I've never seen a SQL driver and wire protocol that was not awful in some way. The statefulness is part of what makes them awful. We have better ways to keep track of state now.

> I've never seen a SQL driver and wire protocol that was not awful in some way.

Have you seen the PostgreSQL wire protocol[1]? I recently built a logical replication client driver for a project and found the protocol to be excellent. After looking at the documentation, I'm no longer limited to languages that have drivers for Pg, because I know how easy it'd be for me to just write one.

Just because some SQL drivers and wire protocols are awful (looking at you, Oracle[2]) shouldn't mean one should go running to the hills, let alone to JSON.


1: https://www.postgresql.org/docs/current/protocol.html

2: https://noss.github.io/2009/04/28/reverse-engineering-oracle...
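
As a small illustration of how approachable the protocol in [1] is, this Python sketch builds a version 3.0 StartupMessage by hand: a length field that counts itself, the protocol version, then null-terminated name/value pairs and a final null byte. The function name and the choice of parameters are mine; it only builds the bytes and does not open a connection or handle authentication.

```python
import struct

PROTOCOL_VERSION_3_0 = 3 << 16  # major version 3, minor version 0 (196608)

def startup_message(user: str, database: str) -> bytes:
    """Build a PostgreSQL v3 StartupMessage (no type byte, length-prefixed)."""
    body = struct.pack("!i", PROTOCOL_VERSION_3_0)
    for key, value in (("user", user), ("database", database)):
        # Each parameter name and value is a null-terminated string.
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # terminator after the last name/value pair
    # The length field counts its own 4 bytes plus the body.
    return struct.pack("!i", len(body) + 4) + body
```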

An HTTP/JSON protocol doesn't have to replace the standard one. But having such a standard protocol makes sense in the age of web apps, particularly when NoSQL offerings that are perceived by the market as competitors (leaving aside whether they really are - perception matters more here) do that already.

> An HTTP/JSON protocol doesn't have to replace the standard one.

So, two protocols? Two standard protocols is rarely better than one.

> having such a standard protocol makes sense in the age of web apps

Only if the existing standard protocol cannot work with the "web", and we have plenty of history proving otherwise. Replacing the existing standard with the loose JSON would be, strictly, a downgrade; and unnecessary, because we already do interoperate JSON and SQL. See: PostgREST and the many REST & GraphQL frontends on PostgreSQL.

> particularly when NoSQL offerings that are perceived by the market as competitors (leaving aside whether they really are - perception matters more here) do that already

This is really more of going into a pig's pen and wrestling with them. A database should do the job of a database. Competing for perception in a market that cannot make sane decisions for itself is how we get MongoDB.

When was your experience with Mongo? I'm wondering because I use Mongo now with transactions (supported since version 4) and I like to know about problems I might face in the future.

Elasticsearch is a distributed index, not a reliable document store. You can expect availability problems and data loss. This is fine, because you can rebuild the index from your source of truth.

In my experience with ES, definitely don't treat it as a source of truth; allow it to be rebuilt.
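
A toy Python sketch of the "rebuildable index" idea: the dict of documents plays the source of truth, and the inverted index is derived data that can be thrown away and regenerated. This is not Elasticsearch's API; `rebuild_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict

def rebuild_index(source_of_truth: dict) -> dict:
    """Rebuild a toy inverted index (term -> set of doc ids) from scratch."""
    index = defaultdict(set)
    for doc_id, text in source_of_truth.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)

# The authoritative copy lives elsewhere (here, a plain dict).
documents = {
    1: "Aurora storage layer",
    2: "Mongo wire protocol",
    3: "Aurora Postgres",
}

index = rebuild_index(documents)
index = {}                        # simulate losing the index entirely
index = rebuild_index(documents)  # nothing is lost: it is derived data
```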


I run a startup that uses both of these.

Mongo is a heck of a lot easier to configure and develop with, and works great as a general database with a rapidly changing schema.

Elasticsearch is great at solving specific problems like searching for items in a specific way, but it's got quite a learning curve and is pretty painful to host and configure compared to Mongo where you can have a prod instance going in seconds.

Is it a good idea to use a database instead of a search engine?

Yes. Elasticsearch is not nearly as easy to use and is not really designed to be a transactional database.

Apples and Oranges. Some use-cases may overlap, but not all. For instance, why would you use MongoDB for centralized logging, as opposed to the ELK stack?
