Mongo FUD has more to do with the fact that RDS...es work better with almost any...

neuronexmachina · on Aug 2, 2022

Anecdotally, I've been at several companies where MongoDB was a crucial part of a tech stack due to early tech decisions. In every case, most of the engineers working with it wished there was a reasonable way to migrate away from MongoDB to PostgreSQL. I've never seen the opposite (engineers wanting to transition from SQL to MongoDB).

redwood · on Aug 2, 2022

Sorry, but "any other key:document store" seems to imply that you're not aware of the value of consistent secondary indexes. A lot of people are not aware, and conflate K/V stores, which are fundamentally inflexible, with more general-purpose alternatives like Postgres and MongoDB. Frankly, these two are both able to express a wide variety of modern needs, and whether you prefer one versus the other is a question of tradeoffs. The world is not as reductive as you imply.

robertlagrant · on Aug 2, 2022

MongoDB isn't a general purpose alternative. You're right that there are tradeoffs, but relational databases are the general purpose "good at everything" engines. MongoDB is amazing at some things, but worse at others. Indexes can help, but - as with any index - don't always help. Some performance characteristics are more structural than that, which is why document databases exist.

redwood · on Aug 2, 2022

I struggle to understand how relational isn't a subset of documents (namely flat documents): what am I missing?

robertlagrant · on Aug 3, 2022

I'm no expert but my understanding is that a lot of these specialised databases' strengths and weaknesses come from laying the data out on disk differently to how a relational database would.

A relational database saves data by table, and within each table by row. This means it's very good at getting a row of data out of a table, and pretty good at column operations (e.g. sum the "price" column in this list) and pretty good at joining data (e.g. "fetch me related data from these other tables").

A document database such as MongoDB saves all the data per-key together, so it doesn't have to do work to relate data. It just reads and returns it. That means it's exceptional at getting related data (as long as you saved all the related data in one document), but terrible at joining to other documents by field, and terrible at column operations across documents (e.g. "sum the price of these books, where each book is a different document").

A columnar database is amazing at column-oriented operations (e.g. "compress this data" or "sum this column"), but bad at getting all the data for one record, as it has to traverse all the columns to find the right data, and therefore bad at relating data, as it has to do column traversal across multiple tables.
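A rough way to picture the three layouts described so far is to hold the same records in three shapes. This is only a sketch in plain Python: the book data is made up, and in-memory lists stand in for on-disk storage, which real engines handle very differently.

```python
# Hypothetical bookshop data; names and values are made up for illustration.

# Row-oriented: each record is stored together, one after another.
rows = [
    (1, "Dune", 899),
    (2, "Emma", 499),
    (3, "Ulysses", 1299),
]

# Document-oriented: each document carries its own field names and nesting.
documents = [
    {"_id": 1, "title": "Dune", "price": 899, "tags": ["sci-fi"]},
    {"_id": 2, "title": "Emma", "price": 499, "tags": ["classic"]},
    {"_id": 3, "title": "Ulysses", "price": 1299, "tags": ["classic", "modernist"]},
]

# Column-oriented: each column is stored contiguously.
columns = {
    "id": [1, 2, 3],
    "title": ["Dune", "Emma", "Ulysses"],
    "price": [899, 499, 1299],
}

# Summing a column walks every record in the first two layouts...
total_from_rows = sum(price for _id, _title, price in rows)
total_from_documents = sum(doc["price"] for doc in documents)
# ...but only one contiguous list in the columnar layout.
total_from_columns = sum(columns["price"])

# Rebuilding one whole record from the columnar layout visits every column.
record_from_columns = {name: values[1] for name, values in columns.items()}
```

All three sums agree; what differs is how much unrelated data each layout has to step over to get there.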

A graph database is amazing at getting related tables or entities (e.g. "I'm a person, now through a self-referential edge back to the Person node get me all my friends of friends of friends"). I can't think off the top of my head what that would be bad at, but perhaps it's bad at summing columns, that sort of thing.
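The "friends of friends of friends" query can be sketched as a breadth-first walk limited to three hops. The graph, the names, and the `within_hops` helper below are hypothetical, and a real graph database would use its own storage and query language rather than a Python dict.

```python
from collections import deque

# Hypothetical friendship graph stored as adjacency sets.
friends = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
}


def within_hops(graph, start, max_hops):
    """Return everyone reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(person, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


# Friends, friends of friends, and friends of friends of friends of "ann".
reachable = within_hops(friends, "ann", 3)
```

Each hop here is a direct lookup of neighbours, which is the access pattern the comment credits graph databases with; the same walk over relational tables would typically be expressed as repeated self-joins.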

Again: I'm no expert. That's just my understanding.

throw1234651234 · on Aug 3, 2022

Hopefully someone corrects you if you got something wrong, because to me that was an amazingly succinct and clear explanation of the advantages and disadvantages of different database types.

One side note is that with document databases in the Cloud, like Cassandra, I also have to worry about Partition Keys, even though most businesses don't work at that scale (https://www.baeldung.com/cassandra-keys)

robertlagrant · on Aug 3, 2022

Yes, I had the dubious pleasure when I dabbled (thankfully briefly) with Azure's CosmosDB!

redwood · on Aug 5, 2022

Ha, indeed. Let's be honest: most Azure building blocks are not up to par, but this takes the cake in terms of peak embarrassment.

redwood · on Aug 3, 2022

I think there's some false equivalence going on here.

When you point out that behind the scenes rows sit adjacent to each other on disk in an RDBMS, that's the same thing that's happening behind the scenes with documents... they're sitting adjacent to each other on disk. Documents instead of rows is all: and guess what, a flat document without nested documents or arrays is a relational style row.

You jump to the ability to speed up operations with column-scoped operations, but really what you're getting at here is a question of indexing... In a document-oriented database with rich secondary indexes you can do aggregations across fields (fields are like columns in an RDBMS, but in a document), which can include arrays. Frankly, that's why it's so important to have consistent secondary indexes that can express the breadth of your query patterns and their evolution, and it's why key/value stores like DynamoDB are absolutely not general purpose; in other words, they cannot be used to express the bread-and-butter workloads that folks would have used an RDBMS for in the decades leading up to this moment.

You are correct that column store engines are a different category entirely, optimized for super efficient lookups on individual columns as opposed to entire rows, and therefore less optimized for updates. But again, frankly, there's no reason why this can't actually express a document structure; it's just custom running back to the 1970s that causes us to think and use terms like this.

Now, bringing up joins is a great point. It's true that one of the things document databases allow people to do is store data in a denormalized way, which offers scalability opportunities and, perhaps more importantly, a way to store data the way you think about it in your code. But this is certainly not a requirement... it's just the best practice where appropriate. You still have to express different business objects in distinct parts of your data model and occasionally have to join them. Join performance is simply a question of the engine's implemented optimizations and is not related to the data model.

By the way, graph databases maintain data structures optimized for closeness but there's frankly no reason why such a data structure can't be also an optimization on another data engine, again it's essentially distinct from the data model; the industry has leaned into a false understanding that somehow these all need to be separate categories by default.

robertlagrant · on Aug 3, 2022

Thanks for the detailed answer, but I don't think this is right. I will attempt to be as civil as you in my rebuttal :)

The key point I think your perspective misses is that documents aren't like tables, and columns aren't like fields. A document holds one complete entity. A table holds a part of the information for all entities.

As an example, if I were a bookshop, and I chose to store books in an RDBMS, I might store their prices in a price table that just has id, book_id and price_in_cents in it (ignoring valid_from and valid_to niceties). If I store them in a document database, I would have a price field embedded in each Book document.

If we wanted to operate on all the prices, say to generate a histogram of all current book prices, I can query my RDBMS table for each price per book. The RDBMS knows exactly where to look in the data file(s) as it operates on fixed byte offsets defined in the table schema. The document database needs to jump into each document, pull out the price, find the next document and repeat. This is very far from fixed offsets, and likely involves loading far more data into memory in the first place.
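The fixed-offset argument can be sketched with Python's `struct` module. This is a toy under loud assumptions: the 12-byte row format, the sample values, and the `price_at` helper are invented for illustration, JSON strings stand in for a document format, and real engines (page headers, variable-length rows, binary document encodings such as BSON) are considerably more involved.

```python
import json
import struct

# Hypothetical fixed-width price row: id, book_id, price_in_cents as
# three little-endian unsigned 32-bit integers (12 bytes per row).
ROW = struct.Struct("<III")

prices = [(1, 101, 899), (2, 102, 499), (3, 103, 1299)]
table_file = b"".join(ROW.pack(*row) for row in prices)


def price_at(data, row_number):
    """Read price_in_cents straight from its computed byte offset."""
    offset = row_number * ROW.size + 8  # skip id and book_id (4 bytes each)
    return struct.unpack_from("<I", data, offset)[0]


fixed_offset_prices = [price_at(table_file, i) for i in range(len(prices))]

# Variable-length documents: the price sits at a different position in each
# one, so every document is parsed before its price can be read.
document_file = [
    json.dumps({"_id": 101, "title": "Dune", "price": 899}),
    json.dumps({"_id": 102, "title": "Emma", "author": {"name": "Austen"}, "price": 499}),
    json.dumps({"_id": 103, "price": 1299, "title": "Ulysses"}),
]
parsed_prices = [json.loads(doc)["price"] for doc in document_file]
```

Both paths return the same prices; the difference being argued is that the first computes where to read, while the second has to scan each record to find the field.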

I await a rebuttal with interest :)

redwood · on Aug 3, 2022

This is what secondary indexes enable: if you have a sub-field of the document for price and you have a secondary index on that sub-field, you can do a covered query straight out of the index on an aggregate of that price field.

Essentially, in your example you could think of the price table that you described as effectively a secondary-index-style data structure in a user space table... but why not just make it an index that the database engine manages consistently for you?
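To make the idea concrete, here is a toy document store that keeps a price index in step with every write and answers a range query from the index alone. `TinyStore`, its method names, and the data are hypothetical; this is not how MongoDB or any real engine implements indexes, only a sketch of "an index the engine manages consistently for you".

```python
import bisect


class TinyStore:
    """Toy document store with one engine-maintained index on 'price'."""

    def __init__(self):
        self.documents = {}    # _id -> full document
        self.price_index = []  # sorted list of (price, _id)

    def upsert(self, doc):
        old = self.documents.get(doc["_id"])
        if old is not None:
            # Drop the stale entry so the index never disagrees with the data.
            self.price_index.remove((old["price"], old["_id"]))
        self.documents[doc["_id"]] = doc
        bisect.insort(self.price_index, (doc["price"], doc["_id"]))

    def prices_between(self, low, high):
        """Covered query: answered from the index, no document is read."""
        start = bisect.bisect_left(self.price_index, (low,))
        result = []
        for price, _id in self.price_index[start:]:
            if price > high:
                break
            result.append(price)
        return result


store = TinyStore()
store.upsert({"_id": 1, "title": "Dune", "price": 899})
store.upsert({"_id": 2, "title": "Emma", "price": 499})
store.upsert({"_id": 3, "title": "Ulysses", "price": 1299})
store.upsert({"_id": 2, "title": "Emma", "price": 599})  # price change
cheap = store.prices_between(500, 1000)
```

The extra work inside `upsert` is the write-side cost of keeping such an index, which is the usual price paid for the faster reads.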

robertlagrant · on Aug 7, 2022

Well, I suppose because indexes can slow some things down (e.g. writes), and can also fail to speed other things up. They're not a magical answer to everything, or they'd be on by default on everything!

But let's take your example of using an index to get all prices. How would that work?

redwood · on Aug 7, 2022

You would do an aggregate query on the price field, grouped by price.
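As a sketch of what "grouped by price" could look like: the `pipeline` below has the shape of a MongoDB aggregation that groups on a `price` field and counts documents per value, and the lines after it compute the same histogram in plain Python over made-up documents. The field name and data are assumptions, and whether a given engine can answer such a query from an index alone depends on the engine and its query planner.

```python
from collections import Counter

# Shape of an aggregation pipeline: group by price, count, sort by price.
# It is shown as data only; nothing here connects to a database.
pipeline = [
    {"$group": {"_id": "$price", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]

# Hypothetical book documents.
books = [
    {"_id": 1, "title": "Dune", "price": 899},
    {"_id": 2, "title": "Emma", "price": 499},
    {"_id": 3, "title": "Persuasion", "price": 499},
    {"_id": 4, "title": "Ulysses", "price": 1299},
]

# In-memory equivalent of the pipeline: (price, count) pairs sorted by price.
histogram = sorted(Counter(doc["price"] for doc in books).items())
```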

lmm · on Aug 3, 2022

> Mongo FUD has more to do with the fact that RDS...es work better with almost any line of business application.

Nope. That's simply not true, and it has nothing to do with why people dislike MongoDB (you don't see similar attacks on Cassandra for example). The issue is that it has a lot of cases of data loss especially in its default configuration, and that's not a document database problem, it's a MongoDB problem. (Indeed it's very similar to the reputation that MySQL used to enjoy, for those of us who can remember back that far).

rglover · on Aug 2, 2022

Work better in what way? Speed? Reliability? Scalability? And is that based on direct experience or indirect?

cynusx · on Aug 2, 2022

Simple operations like joining data together or filtering a table for a specific value are way more performant.

They are also extremely common operations.

MongoDB became obsolete when PostgreSQL added the json column type.

throw1234651234 · on Aug 2, 2022

Extensive experience with the lower-end (in terms of revenue) Fortune 1000 companies.

Every single cloud and MongoDB push horizontal scaling and sharding. This is rarely a necessity in practice.

Traditional RDS normalization is almost always necessary to keep clean data. As I said, I have used MongoDB extensively for non-normalized, client created forms, and it was fine for that, but I wouldn't do it again given the choice now.

In simpler terms: when I work for a staffing agency and delete a profession from the database, I want the DB to yell at me about FK constraints that need to be considered (e.g. millions of jobs and candidates set to that profession_id).
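A minimal sketch of that behaviour, using Python's built-in `sqlite3` module with an in-memory database. The `profession` and `job` tables and their rows are hypothetical stand-ins for the staffing-agency example; note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE profession (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE job ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " profession_id INTEGER NOT NULL REFERENCES profession(id))"
)
conn.execute("INSERT INTO profession (id, name) VALUES (1, 'Welder')")
conn.execute("INSERT INTO job (id, title, profession_id) VALUES (10, 'Night shift welder', 1)")

try:
    # A job still points at this profession, so the delete is rejected.
    conn.execute("DELETE FROM profession WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError as exc:
    delete_blocked = True
    error_message = str(exc)

remaining = conn.execute("SELECT COUNT(*) FROM profession").fetchone()[0]
```

The database refuses the delete and the profession row survives, which forces an explicit decision about the dependent jobs instead of leaving orphaned references behind.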