> A spokesperson at MongoDB said: "The rise of MongoDB imitators proves our document model is the industry standard. But bolting an API onto a relational database isn't innovation – it's just kicking the complexity can down the road. These 'modern alternatives' come with a built-in sequel: the inevitable second migration when performance, scale, and flexibility hit a wall."
I think the reason there are so many MongoDB wire-compatible projects (like this postgres extension from microsoft, and ferretdb) is that people have systems with MongoDB clients built into their storage layers but don't want to be running on MongoDB anymore, exactly because "performance, scale, and flexibility hit a wall".
If you can change the storage engine, but keep the wire protocol, it makes migrating off Mongo an awful lot cheaper.
MongoDB imitators? Didn't CouchDB come before MongoDB? CouchDB also stores JSON documents. They did create an imitation of the Mongo query syntax, but the document model doesn't seem to originate with Mongo, as far as I can tell.
In some ways it is another reflection of the unix "worse is better" philosophy. The CouchDB wire protocol is really complex because A) it starts with an almost full HTTP stack and is "REST-like", and B) it prioritized multi-database synchronization and replication up front, which is incredibly powerful and useful but makes compatible implementations a lot harder to write. MongoDB's wire protocol is simple, stupid, relatively efficient (binary-encoded JSON rather than plaintext JSON), had a query language that specifically wasn't built in REST terms, and was so simple to implement that every database could do it (and has done it).
MongoDB's protocol has fewer features (RIP easy synchronization and replication), but worse is better: if the "core" feature of a database is its query language, then the best/worst query language won, the simple, dumb, easy-to-replicate one.
To be fair, I'd love a plugin/framework/microservice you could simply drop on top of Postgres to get CouchDB's offline synchronization capabilities.
Like the syncing capabilities for mobile and other "sometimes offline" devices are fire.
But in my stack, everything else is already running in Postgres, and that's the source of truth.
At one point I started to explore what you would need to adapt any MongoDB-compatible DB to work with CouchDB-style synchronization. The biggest problem is that you need a consistent Changes view of the database, and that's optional from MongoDB provider to MongoDB provider. Had I continued that project it probably would have wound up Azure CosmosDB-specific, specific enough that it didn't feel particularly good for ROI, and it was starting to feel like a "back to the drawing board / time to leave CouchDB" situation. It's interesting to wonder whether the CosmosDB/DocumentDB pieces that Microsoft is open sourcing (as mentioned in the article here) will eventually include a Postgres extension for their Changes view. Had that existed when I was evaluating options in that past project, knowing it would also support open source Postgres databases via these open source DocumentDB extensions might have lent more weight to that ROI decision.
What a load of b.s. in that quote. Why go straight into panic and dismissal mode? Because we really do need more than a database that stores a bunch of JSON and can't properly do transactions when required.
For known attributes, a model based on rigid, strictly typed tables, with the unknowns placed in one or more JSON fields, works best for me. For example, there are maybe 100 known attributes of a product, and the category-specific ones, the unknowns generated dynamically at runtime, go into a "product_attributes" JSON column.
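Roughly, that hybrid layout looks like this (table and column names are made up for illustration):

    -- strictly typed columns for the known attributes,
    -- a JSONB column for the dynamic, category-specific ones
    CREATE TABLE products (
        id                 bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        sku                text NOT NULL UNIQUE,
        name               text NOT NULL,
        price_cents        integer NOT NULL,
        product_attributes jsonb NOT NULL DEFAULT '{}'
    );

    -- GIN index so containment queries on the dynamic attributes stay fast
    CREATE INDEX products_attrs_idx ON products USING GIN (product_attributes);

    -- find products whose dynamic attributes contain a given key/value pair
    SELECT id, name
    FROM products
    WHERE product_attributes @> '{"color": "red"}';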
I've said it as a joke many times that PostgreSQL can be a better NoSQL database than many actual NoSQL databases just by creating a (text, JSONB) table.
You can do actual searches inside the JSON, index the table with JSONB contents etc. Things that became available in MongoDB very very late.
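For anyone who hasn't tried it, that (text, JSONB) setup is only a few lines, and the GIN index is what makes "searches inside the JSON" fast; a minimal sketch with illustrative names:

    -- the "(text, JSONB)" document table
    CREATE TABLE docs (
        id  text PRIMARY KEY,
        doc jsonb NOT NULL
    );

    -- jsonb_path_ops is smaller and faster than the default opclass,
    -- at the cost of only supporting containment/path operators
    CREATE INDEX docs_doc_idx ON docs USING GIN (doc jsonb_path_ops);

    -- search inside the documents, using the index
    SELECT id FROM docs WHERE doc @> '{"status": "active"}';
    SELECT id FROM docs WHERE doc @? '$.items[*] ? (@.qty > 10)';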
To your point, I replaced MongoDB with postgresql and a naive wrapper so I could keep using JSON docs the same way, and to my surprise my services got way faster and more stable. I write about it here:
And many told you that you did not tell the whole story, or did not understand MongoDB, but apparently you fail to address those points.
Going from 150ms to 8ms with an 80% CPU reduction does not make any sense perf-wise. I stand by my point: your post is missing a lot of details, and it was probably misuse.
In GP’s shoes, why bother to learn Mongo then? What’s the benefit?
Postgres has a good-enough (if not better) document DB inside it, but it also has a billion other features and widespread adoption. Mongo needs a huge benefit somewhere to justify using it.
If that reduction can be achieved by simply replacing Mongo with Postgres... what is the point of using Mongo?
Edit: on top of that, very few things that you store in the DB are non-relational. So you will always end up recreating a relational database on top of any NoSQL database. So why bother, when you can just go with an RDBMS from the start?
While data can be used in a relational way, that doesn't mean it's best for performance or storage. Important systems usually require compliance (auditing) and need things like soft deletion and versioning. Relational databases slow to a crawl with that need.
Sure you can implement things to make it better, but it's layers added that balloon the complexity. Most robust systems end up requiring more than one type of database. It is nice to work on projects with a limited scope where RDBMS is good enough.
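For context, the kind of layer being alluded to is usually something like this soft-delete plus history pattern (a rough sketch, not any particular system's schema):

    -- soft deletion: rows are flagged rather than removed
    ALTER TABLE orders ADD COLUMN deleted_at timestamptz;

    -- a partial index keeps queries over live rows fast
    CREATE INDEX orders_live_idx ON orders (customer_id)
        WHERE deleted_at IS NULL;

    -- versioning/audit: append-only history table,
    -- populated by a trigger (not shown here)
    CREATE TABLE orders_history (
        order_id   bigint NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now(),
        old_row    jsonb NOT NULL
    );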
They slow to a crawl when you have huge tables with lots of versioned data and massive indexes that can't perform maintenance in a reasonable amount of time, even with the fastest vertically scaled hardware. You run into issues partitioning the data and spreading it across processors, and spreading it across servers takes solutions that require engineering teams.
There are a lot of solutions for different kinds of data for a reason.
I have built "huge tables with lots of versioned data and massive indexes". This is false. I had no issues partitioning the data and spreading it across shards. On Postgres.
> ... takes solutions that require engineering teams.
All it took was an understanding of the data. And just one guy (me), not an "engineering team". Mongo knows only one way of sharding data. That one way may work for some use-cases, but for the vast majority of use-cases it's a Bad Idea. Postgres lets me do things in many different ways, and that's without extensions.
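For what it's worth, even stock declarative partitioning goes a long way here; a minimal sketch with made-up names:

    -- hash-partition a large events table across 8 partitions
    CREATE TABLE events (
        id         bigint NOT NULL,
        tenant_id  bigint NOT NULL,
        payload    jsonb,
        created_at timestamptz NOT NULL DEFAULT now()
    ) PARTITION BY HASH (tenant_id);

    CREATE TABLE events_p0 PARTITION OF events
        FOR VALUES WITH (MODULUS 8, REMAINDER 0);
    -- ...repeat for remainders 1 through 7; range or list partitioning
    -- works the same way when time- or key-based splits fit the data better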
If you don't understand your data, and you buy into the marketing bullshit of a proprietary "solution", and you're too gullible to see through their lies, well, you're doomed to fail.
This fear-mongering that you're trying to pull in favour of the pretending-to-be-a-DB that is Mongo is not going to work anymore. It's not the early 2010s.
For modern RDBMS that starts at volumes that can't really fit on one machine (for some definition of "one machine"). I doubt Mongo would be very happy at that scale either.
On top of that an analysis of the query plan usually shows trivially fixable bottlenecks.
On top of that it also depends on how you store your versioned data (wikipedia stores gzipped diffs, and runs on PHP and MariaDB).
Again, none of the claims you presented have any solid evidence in real world.
Wikipedia is tiny data. You don't start to really see cost scaling issues until you have active data a few hundred times larger and your data changes enough that autovacuuming can't keep up.
I'm getting paid to move a database that size this morning.
English language Wikipedia revision history dump:
April 2019: 18 880 938 139 465 bytes (19 TB) uncompressed. 937GB bz2 compressed. 157GB 7z compressed.
I assume since then it's grown at least ten-fold. It's already an amount of data that would cripple most NoSQL solutions on the market.
I honestly feel like talking to functional programming zealots. There's this fictional product that is oh so much better than whatever tool you're talking about. No one has seen it, no one has proven it exists, or works better than the current perfectly adequate and performant tool. But trust us, for some ridiculous vaguely specified constraints it definitely works amazingly well.
This time "RDBMS is bad at soft deletions and versions because 19TBs of revisions on one of the world's most popular websites is tiny"
Archival read-only servers don't have to worry about any of the maintenance mentioned. Use ChatGPT or something to play devil's advocate, because the thing you're claiming is magical and non-existent is actually quite common.
> I've said it as a joke many times that PostgreSQL can be a better NoSQL database than many actual NoSQL databases just by creating a (text, JSONB) table.
Indeed, the tag line for one of the releases, I think 9.4 or 9.5, was “NoSQL on ACID”.
Often true, but updates of JSON documents in postgres or inserts into nested arrays or objects are much, much slower, and involve locking the entire row. I think if your use case is insert/read only, it works well, though at scale even that can become an issue due to the GIN fastupdate index logic, which can lead to the occasional 20-30 second insert as the buffer gets flushed and the index updated.
> it works well, though at scale even that can become an issue due to the GIN fastupdate index logic, which can lead to the occasional 20-30 second insert as the buffer gets flushed and the index updated.
Hmm, interesting! I've been having some intermittent issues recently with a Postgres table that uses JSONB, where everything seems to lock up for several minutes at a time, with no useful logs as to why.
Do you have any more info about the GIN issue, and how it can be verified as being a problem please?
It’s not a super well-documented feature in postgres, so I also wound up doing some digging in the mailing lists, but I was really appreciative of all the detail in that GL issue and the associated links.
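If it's useful, one way to check whether the GIN pending list is behind stalls like that (this assumes the pgstattuple extension is available, and the index name is just a placeholder):

    -- inspect the pending-list backlog of a GIN index
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT * FROM pgstatginindex('docs_doc_idx');  -- pending_pages / pending_tuples

    -- if the backlog is the problem, trade insert throughput for
    -- predictable latency by disabling the pending list...
    ALTER INDEX docs_doc_idx SET (fastupdate = off);
    -- ...or shrink it so flushes happen in smaller chunks
    ALTER INDEX docs_doc_idx SET (gin_pending_list_limit = 256);  -- in kB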
It's not a joke. JSONB blows Mongo out of the water. I've run large production workloads on both.
> You can do actual searches inside the JSON, index the table
To be clear, you can do those things with Mongo (and the features have been available for a very long time), they just don't work as well, and are less reliable at scale.
We use postgres jsonb and have search built over ~30 million records of ~100-500 KB each. Only GIN/GiST indexes are possible; if we want to do anything special (search, or send events when a value is edited) at a deeper node of the jsonb, the team prefers to extract it out of the jsonb into a new column, which creates its own set of problems.
Anyone looking to use jsonb to replace a NoSQL DB would probably need to look deeper.
There is still a long road to improve database usability on semistructured data.
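One thing that can ease the "extract it into a new column" pain is a generated column (Postgres 12+), which at least keeps the extracted value in sync automatically; a small sketch with made-up names:

    -- keep a frequently queried node as a typed, indexed column
    -- that Postgres derives from the jsonb itself
    ALTER TABLE docs
        ADD COLUMN brand text GENERATED ALWAYS AS (doc->>'brand') STORED;

    CREATE INDEX docs_brand_idx ON docs (brand);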
My team has compared PostgreSQL and MongoDB on analytical queries on JSON, and the results so far have been unimpressive: https://github.com/ClickHouse/JSONBench/
It's fair to compare alternatives, but to be clear I'm not saying postgres JSONB is worse than mongo. I'm not very familiar with mongo either. I'm saying it can be tedious in absolute terms. I did not assume this to be universal to document dbs, but perhaps it could be. Although your example does seem simpler.
The word is it's a serious effort on the part of Microsoft. It's missing a MongoDB wire protocol implementation, which they plan to open source as well. In the meantime it's possible to use FerretDB for that.
I think the biggest use case is big data and dev platforms that need application compatibility, where wrapping Atlas is less attractive for some reason.
My preference would have been for PG to have a hierarchical data type that supported all of its normal data types, and then to build JSON support on top of that.
I guess my question would be how it handles updates, inserts, and deletes on nested components of objects. These operations in postgres’ JSONB can be a huge problem due to needing to lock the entire row.
I was hoping this was an implementation of the schemaless indexing [1], which is the foundation for Azure DocumentDB.
That design allows arbitrary nested JSON data to be indexed using inverted indexes on top of a variation of B-trees called Bw-trees, and seems like a nice way of indexing data automatically in a way that preserves the ability to do both exact and range matching on arbitrarily nested values.
GIN does not support range searches (needed for <, <=, >, >=), prefix or wildcard, etc. It also doesn't support index-only scans, last I checked. You cannot efficiently ORDER BY a nested GIN value.
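The usual workaround for the range/ordering gaps is a btree expression index on the specific path you need (table and path names here are just placeholders):

    -- a btree expression index enables <, <=, >, >= and ORDER BY
    -- on one extracted, typed value
    CREATE INDEX docs_price_idx ON docs (((doc->'item'->>'price')::numeric));

    SELECT id FROM docs
    WHERE (doc->'item'->>'price')::numeric > 100
    ORDER BY (doc->'item'->>'price')::numeric;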
I wish PostgreSQL would support better syntax for JSON updates. You can use `->>` to navigate the JSON fields, but there's nothing similar for updates. You have to use clumsy functions or tricks with `||` and string casting.
Idk, `jsonb_set` and `||` work well enough for most use cases. For reference, you can use `jsonb_set` to do stuff like:
jsonb_set('{"a": [{"b": 1}]}', '{a,0,c}', '2')
I think a `||` that works with JSON patches would be nice, but you can easily implement that as an extension if you need it.
It's super not great. Instead of having something like: "update blah set field->abc = '123'" you end up with "update blah set field = jsonb_set(field, '{abc}', '"123"');".
Not the end of the world, but I have to look up the syntax every time I need to write a query.
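For reference, the two forms being compared look roughly like this in practice (table and path names are made up):

    -- replace one nested value: the whole column is rewritten,
    -- and the row is locked for the duration of the update
    UPDATE blah
    SET field = jsonb_set(field, '{settings,theme}', '"dark"')
    WHERE id = 42;

    -- shallow merge of top-level keys with ||
    UPDATE blah
    SET field = field || '{"abc": "123"}'
    WHERE id = 42;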
My biggest gripe with MongoDB is the memory requirements, which did not become immediately obvious to me when I was first using it. If you index a field, Mongo will shove every record from that field into memory, which can add up very quickly, and can catch you off-guard if you're not accounting for it. It is otherwise a fantastic database that I find myself being highly productive with. I just don't always have unlimited memory to waste, so I opt for more memory friendly SQL databases.
This looks promising. I am using MongoDB in my application because the data is document oriented. I did take a brief look at using JSONB in Postgres, but I found it a lot harder to work with.
I would prefer to use Postgres as my database, so this is worth investigating. Taking a brief look at the github page, it looks like it will be easy to swap it out in my code. So I think I know what I'll be spending my next sprint on.
It is MIT licensed, and targets PostgreSQL, which itself has a very liberal license (and is not a Microsoft project). How does the "extend, extinguish" work in this scenario?
It's just my experience. Microsoft wants you to use SQL Server for everything.
If you start thinking in Domain-Driven Design terms, you realize the technology should be dictated by the business models. In many (if not most) cases, a document database is more than sufficient for most bounded contexts and its services, events, commands, and data.
Spinning up a relational database, managing schema changes, and tuning indexes is a constant drag on productivity.
If Microsoft wanted to compete with DynamoDB or MongoDB, they'd make the Cosmos document database a first line service in Azure. But you have to first spin up Cosmos and then identify a type of data storage. There is no technical reason for this setup other than to lump various non-relational data storage types into one service and create confusion and complexity.
I've done bake-offs between Microsoft and AWS and when I talk about a standard serverless architecture, the MS people are usually confused and respond, "What do you mean?" and the AWS folks are "Cool, so Lambdas and DynamoDB. We have 17 sample solutions for you."
I'm not saying you can't do serverless in Azure. I'm saying the support and advocacy is not there.
> But you have to first spin up Cosmos and then identify a type of data storage
They call it (or called it, 3 years ago) a "multi-model" database, but really it was just a wrapper around three engines. It did come with fairly standardised pricing and availability guarantees, though, so my impression was that it was trying to sell to cloud architects and CIOs who might appreciate that.