> We're not going to be migrating terabytes of data

You may have dramatically less 'real data' than mongo makes you think you do. I migrated one of our mid-sized databases out of mongo and into PG a couple of years ago. The reduction in size was massive. One collection in particular that was storing a small number of numeric fields per doc went from ~10GB to ~50MB. I wouldn't expect this with all datasets, of course, but mongo's document + key storage overhead can be massive in some use cases.

This is (probably) an artifact of Mongo's schema-less nature; when you don't have tables with structure, every document you store has to detail its own schema inline.

In a relational database, you have columns with names and types, and that info is shared by all of the rows.

In Mongo, every field has to carry its own name and type, even if that layout is shared by every other document in the collection.

Mongo's way is more flexible, but it's terrible for storage efficiency.
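
A quick way to see the effect (a toy sketch in Python with made-up field names, comparing serialized sizes rather than any real on-disk format):

    import json

    # toy rows with a few numeric fields (made-up names, not anyone's real data)
    rows = [{"sensor_id": i, "temperature_c": 21.5, "humidity_pct": 40.2}
            for i in range(1000)]

    # document-style: every record repeats its field names
    doc_bytes = sum(len(json.dumps(r)) for r in rows)

    # table-style: field names stored once, each row is just values
    header = ["sensor_id", "temperature_c", "humidity_pct"]
    table_bytes = len(json.dumps(header)) + sum(
        len(json.dumps([r[c] for c in header])) for r in rows)

    print(doc_bytes, table_bytes)  # the document form comes out several times larger

The exact ratio depends on how long your field names are relative to the values, which is why a handful of numeric fields per document is close to the worst case.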


caveat: i know nothing about how mongo stores data and haven't used it since 2010

it doesn't really have to approach schemas that way. one would think it would be optimized for repeated schemas, in the same way one might create a schema definition once and then reference it when packing and unpacking the data. seems like any schema overhead taking up storage unnecessarily could be optimized away relatively easily.

having schema references also might be a good management tool for understanding which records vary, potentially due to an application's evolving needs.
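
something like that could look roughly like this (a hypothetical sketch in python, not how mongo actually stores anything): keep one copy of each distinct key layout and store records as a schema id plus values.

    # hypothetical schema-dictionary packing -- illustrative only, not how mongo works
    schemas = {}   # key layout (tuple of field names) -> schema id
    layouts = {}   # schema id -> key layout, for unpacking

    def pack(doc):
        layout = tuple(sorted(doc))
        sid = schemas.setdefault(layout, len(schemas))
        layouts[sid] = layout
        return (sid, [doc[k] for k in layout])

    def unpack(packed):
        sid, values = packed
        return dict(zip(layouts[sid], values))

    print(pack({"sensor_id": 1, "temperature_c": 21.5}))   # (0, [1, 21.5])
    print(unpack((0, [1, 21.5])))                          # back to the original dict
    # as a bonus, len(schemas) tells you how many distinct layouts your records have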


> one would think it would be optimized for repeat schema

I don't think that's the problem Mongo is designed to solve. Mongo's promise was the ability to work with un- and semi-structured data, and it would make sense if its optimizations were focused on that problem, not on reducing overhead when someone tries to strongarm it into being MySQL.

Generally speaking, if your data is structured well enough that you can define a schema ahead of time, you're better off with a traditional RDBMS, because that's the problem an RDBMS is designed to solve.


There's something beautiful about normalizing the storage of denormalized schemas.


Yes, and the sooner you do it the better. But doing it during the project planning / experimentation phase (or when you don't yet know what the final result should be) will really just slow you down. In many ways it's very similar to the static / dynamic language trade-offs.


Unless this has changed in recent years, the BSON format Mongo uses is essentially JSON optimized for parsing speed, and it takes roughly as much space as storing your entire database in JSON.

JSON is a great format for simplicity and readability, but as a storage format it's hard to come up with one that's more bloated.
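
Easy enough to check if you have pymongo installed (its bson module does the encoding); the sizes below are for a single small document and say nothing about what the storage engine then does on disk:

    import json
    import bson  # ships with pymongo; assumes pymongo is installed

    doc = {"sensor_id": 12345, "temperature_c": 21.5, "humidity_pct": 40.2}
    print(len(json.dumps(doc).encode()))  # size as JSON text
    print(len(bson.encode(doc)))          # size as BSON: field names still inline,
                                          # plus per-field type bytes and length prefixes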


> but as a storage format it's hard to come up with one that's more bloated.

It's easy: XML.


I work in the book industry; our implementation of it (ONIX) is a decent way to let non-IT professionals encode complex data in a standardised way, but as a way to store and transmit large amounts of data it's a nightmare. The only thing that saves it is that there is so much repeated data it compresses brilliantly. Ha.
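
The compression point is easy to reproduce; a toy sketch (made-up fields, nowhere near real ONIX) shows the repeated tag names mostly vanish under gzip:

    import gzip
    import json

    # made-up, ONIX-ish records -- nothing like the real spec, just to show the effect
    records = [{"isbn": f"97800000{i:05d}", "title": f"Book {i}", "price": "9.99"}
               for i in range(1000)]

    xml = "<products>" + "".join(
        f"<product><isbn>{r['isbn']}</isbn><title>{r['title']}</title>"
        f"<price>{r['price']}</price></product>" for r in records) + "</products>"

    js = json.dumps(records)

    for name, text in (("xml", xml), ("json", js)):
        raw = text.encode()
        print(name, len(raw), len(gzip.compress(raw)))

The repeated tag names cost almost nothing after compression, which matches the "compresses brilliantly" experience.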


Not OP, but I think it's more about the importance of said data, the number of collections to think about, and so on. Regarding your point, I would guess some index changes might have had a significant impact here.


> Regarding your point, I would guess some index changes might have had a significant impact here.

I'm not sure exactly what you mean, but that particular collection only had a single index on it (aside from the ID field).
