> We're not going to be migrating terabytes of data

You may have dramatically less 'real data' than mongo makes you think you do. I migrated one of our mid-sized databases out of mongo and into PG a couple of years ago. The reduction in size was massive. One collection in particular that was storing a small number of numeric fields per doc went from ~10GB to ~50MB. I wouldn't expect this with all datasets, of course, but mongo's document + key storage overhead can be massive in some use cases.

This is (probably) an artifact of Mongo's schema-less nature; when you don't have tables with structure, every document you store has to detail its own schema inline.

In a relational database, you have columns with names and types, and that info is shared by all of the rows.

In Mongo, every field has to carry its own name and type, even if that layout is shared by every other document in the collection.

Mongo's way is more flexible, but it's terrible for storage efficiency.
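
A quick way to see the effect (a toy sketch in Python with made-up field names, comparing serialized sizes rather than any real on-disk format):

    import json

    # toy rows with a few numeric fields (made-up names, not anyone's real data)
    rows = [{"sensor_id": i, "temperature_c": 21.5, "humidity_pct": 40.2}
            for i in range(1000)]

    # document-style: every record repeats its field names
    doc_bytes = sum(len(json.dumps(r)) for r in rows)

    # table-style: field names stored once, each row is just values
    header = ["sensor_id", "temperature_c", "humidity_pct"]
    table_bytes = len(json.dumps(header)) + sum(
        len(json.dumps([r[c] for c in header])) for r in rows)

    print(doc_bytes, table_bytes)  # the document form comes out several times larger

The exact ratio depends on how long your field names are relative to the values, which is why a handful of numeric fields per document is close to the worst case.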


caveat: i know nothing about how mongo stores data and haven't used it since 2010

it doesn't really have to approach schemas that way. one would think it would be optimized for repeated schemas, in the same way one might create a schema definition once and then reference it when packing and unpacking the data. seems like any schema overhead taking up storage unnecessarily could be optimized away relatively easily.

having schema references also might be a good management tool for understanding which records vary, potentially due to an application's evolving needs.
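
something like that could look roughly like this (a hypothetical sketch in python, not how mongo actually stores anything): keep one copy of each distinct key layout and store records as a schema id plus values.

    # hypothetical schema-dictionary packing -- illustrative only, not how mongo works
    schemas = {}   # key layout (tuple of field names) -> schema id
    layouts = {}   # schema id -> key layout, for unpacking

    def pack(doc):
        layout = tuple(sorted(doc))
        sid = schemas.setdefault(layout, len(schemas))
        layouts[sid] = layout
        return (sid, [doc[k] for k in layout])

    def unpack(packed):
        sid, values = packed
        return dict(zip(layouts[sid], values))

    print(pack({"sensor_id": 1, "temperature_c": 21.5}))   # (0, [1, 21.5])
    print(unpack((0, [1, 21.5])))                          # back to the original dict
    # as a bonus, len(schemas) tells you how many distinct layouts your records have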


> one would think it would be optimized for repeat schema

I don't think that's the problem Mongo is designed to solve. Mongo's promise was the ability to work with un- and semi-structured data, and it would make sense if its optimizations were focused on that problem, not on reducing overhead when someone tries to strongarm it into being MySQL.

Generally speaking, if your data is structured well enough that you can define a schema ahead of time, you're better off with a traditional RDBMS, because that's the problem an RDBMS is designed to solve.


There's something beautiful about normalizing the storage of denormalized schemas.


Yes, and the sooner you do it the better. But doing it during the project planning / experimentation phase (or when you don't yet know what the final result should be) will really just slow you down. In many ways it's very similar to the static / dynamic language trade-offs.


Unless this has changed in recent years, the BSON format Mongo uses is essentially JSON optimized for parsing speed, and it takes roughly as much space as storing your entire database in JSON.

JSON is a great format for simplicity and readability, but as a storage format it's hard to come up with one that's more bloated.
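
Easy enough to check if you have pymongo installed (its bson module does the encoding); the sizes below are for a single small document and say nothing about what the storage engine then does on disk:

    import json
    import bson  # ships with pymongo; assumes pymongo is installed

    doc = {"sensor_id": 12345, "temperature_c": 21.5, "humidity_pct": 40.2}
    print(len(json.dumps(doc).encode()))  # size as JSON text
    print(len(bson.encode(doc)))          # size as BSON: field names still inline,
                                          # plus per-field type bytes and length prefixes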


> but as a storage format it's hard to come up with one that's more bloated.

It's easy: XML.


I work in the book industry; our implementation of it (ONIX) is a decent way to let non-IT professionals encode complex data in a standardised way, but as a way to store and transmit large amounts of data it's a nightmare. The only thing that saves it is that there is so much repeated data it compresses brilliantly. Ha.
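
The compression point is easy to reproduce; a toy sketch (made-up fields, nowhere near real ONIX) shows the repeated tag names mostly vanish under gzip:

    import gzip
    import json

    # made-up, ONIX-ish records -- nothing like the real spec, just to show the effect
    records = [{"isbn": f"97800000{i:05d}", "title": f"Book {i}", "price": "9.99"}
               for i in range(1000)]

    xml = "<products>" + "".join(
        f"<product><isbn>{r['isbn']}</isbn><title>{r['title']}</title>"
        f"<price>{r['price']}</price></product>" for r in records) + "</products>"

    js = json.dumps(records)

    for name, text in (("xml", xml), ("json", js)):
        raw = text.encode()
        print(name, len(raw), len(gzip.compress(raw)))

The repeated tag names cost almost nothing after compression, which matches the "compresses brilliantly" experience.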


Not OP, but I think it's more about the importance of said data, the number of collections to think about, and so on. Regarding your point, I would guess some index changes might have had a significant impact here.


> Regarding your point, I would guess some index changes might have had a significant impact here.

I'm not sure exactly what you mean, but that particular collection only had a single index on it (aside from the ID field).
