
At one company we had one-off requests to get some statistics out of our MongoDB.

Writing the proper queries proved pretty tricky whenever GROUP BY or JOINs were involved, so I used online SQL-to-Mongo query converters.

But then I realized that PostgreSQL has a nice JSONB type (with index support for subfields). So I put all the Mongo data into tables with a single JSONB column (or two columns, id + data, if you prefer).

It turned out to be much faster for running analytical queries :)

"document" is just row

"collection" is just table




I would recommend against modeling too many relations in a document DB. It can work, but it gets bogged down very quickly for the reasons you've stated.

My off-the-cuff suggestion is that document databases are not built to model normalized relational data. If you try to do so, you bring a whole new level of pain upon yourself. It is do-able, but it is hard and annoying.

I know this from extensive personal experience dealing with a large, highly relational Mongo database.

I am very glad you found a solution within Postgres that works for you, and your mapping of document to row and collection to table is very apt!

If possible, would you care to tell me what the size of said documents (rows) and collections (tables) are in your solution? I am curious if in another life, we might have built our tech stack on Postgres instead of MongoDB and been much happier for doing so.


We had everything de-normalized, as you pointed out.

However, every now and then we had one-off questions that required using Mongo for something it's not designed for. Migrating to Postgres+JSONB made it easy to do both.


We've been on a big "Use postgres for everything" path. Mostly migrating dynamodb, elasticsearch, and redis to postgres.

The number of times we've had to do some one-off query that was orthogonal to production usage is higher than I predicted. There were so many times in the past where we just didn't fix something because it was too difficult to backfill a document store. We just hacked around it.

Now with postgres we fix the root of the problem instead of hacking around it. And we do so with a level of confidence we've never had with document stores.


This is one of the reasons we created https://schemafreesql.com. We wanted SQL with some of the document store (NoSQL) features.


Joins only make sense in an analytical context, as a tool to get some additional data into your report, and almost never as a domain modeling concept. People should hardly ever use an RDBMS for their domain data... but since everyone learns RDBMS as their first DB, we have these horrible ORM frameworks on every corner.


Curious why you think this? I've found relational databases pretty powerful, and I learned Mongo first.


They are powerful. But often their power is unnecessary and counterproductive for domain modeling.

Where they are really good is reporting. Analysts love RDBMSs. When I am in an analyst role, I love them too. As an engineer, I find them redundant.

More about domain modeling

https://dev.tasubo.com/2022/07/crash-course-domain-driven-de...


1) From my quick perusal, that article says nothing against using an RDBMS; in fact, it suggests MySQL!

2) In practice having the ability to run reports on your data can be super important and useful.

I'm still confused at the point you're trying to make.


Yeah, I've kind of railed against ORMs for a while now... I'd just as soon use a simple data mapper (like Dapper for C#) or be really explicit in a scripting language with template queries.


If there were JOINs, it may be a sign MongoDB was not the best DB for you. It means the model was relational, and a relational DB would be a better fit. MongoDB lookups are discouraged in general, and especially in analytical workloads, which cover a lot of data. MongoDB is better for scenarios where everything you need is already in the document.


It may also mean poor schema design. You can absolutely model relational data in MongoDB without relying on joins. But if you want to normalize your data, don't pick Mongo.


If you aren't normalizing, how are you ensuring that you avoid anomalies?


Proper NoSQL design has a different perspective; it asks you, "well, so what if you have an anomaly?"

One example could be movies, with actors, producers, genres, etc., which could all be in separate tables in a relational database, or each movie could be a document in a document database. Now let's imagine that an actor changes their name a few years after the movie is released. Is it important to go back and change the name of the actor in each of the movies? Maybe, maybe not. Certainly you can't change the credits inside the movie itself. Maybe it's sufficient to just have a link to the actor's page, where the actor's up-to-date name is. Maybe it's insufficient if the actor will get upset that their name wasn't updated across their old movies.

You choose relational models when anomalies are unacceptable and non-relational models when anomalies are acceptable.
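
To illustrate the tradeoff with a tiny sketch (the field names here are invented for illustration, not from any real schema):

    # Document-database style: the actor's name is copied into each movie
    # document at release time. If the actor later changes their name, old
    # movies keep the name as credited. That is the anomaly this design
    # knowingly accepts.
    movie = {
        "_id": "movie-123",
        "title": "Example Movie",
        "genres": ["drama"],
        "cast": [
            # snapshot of the credited name, plus a reference to the
            # actor's page where the up-to-date name lives
            {"actor_id": "actor-42", "name_as_credited": "Jane Smith"},
        ],
    }

    # Relational style: the name lives in exactly one place, so a rename is
    # a single UPDATE and no anomaly is possible, at the cost of a join on
    # every read.
    actor = {"actor_id": "actor-42", "name": "Jane Smith-Jones"}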


That makes sense. I tend to go away from non-relational models for this reason, but it's definitely a matter of risk management.


Thanks! That is a very useful example.


Not sure if this is exactly what you're referring to, but my understanding is that picking the "right" schema for a document database, so that you don't end up with the slower queries mentioned elsewhere in the thread, tends to benefit from thinking at a somewhat lower level of granularity than you would need with a relational database. Instead of just identifying "one to one", "one to many", "many to many", it's useful to ask questions like "how 'many' is many?" As a simple example, if the "many" in "one to many" is on the order of 10, maybe it makes sense to embed them in an array rather than use a separate collection (sketch below). It can also help to start from the queries you might want and design the schema around them, rather than deciding on the schema first and letting the queries follow from it: if certain data is going to be accessed at the same time, you're probably going to find some way to store it together.
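
As a concrete version of that, here is a small pymongo sketch, assuming a hypothetical posts collection where each post has on the order of ten tags:

    from pymongo import MongoClient

    db = MongoClient()["blog"]

    # The "many" side (tags) is bounded at around ten per post, so embed it
    # as an array instead of keeping a separate tags collection and joining
    # with $lookup.
    db.posts.insert_one({
        "_id": "post-1",
        "title": "Postgres as a document store",
        "tags": ["postgres", "jsonb", "mongodb"],
    })

    # A multikey index keeps queries on the embedded array cheap.
    db.posts.create_index("tags")

    # The query the schema was designed for: everything it needs is already
    # in the document, no join required.
    for post in db.posts.find({"tags": "jsonb"}, {"title": 1}):
        print(post)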


Agreed, you can have reference fields to other collections and filter by them or query related data. But I wouldn't say it's designed for relational workloads. Similarly, I wouldn't use MongoDB for graph queries, even though it has an operator for just that.


> Agreed, you can have reference fields to other collections and filter by them or query related data.

Yes it has that, but that's an anti-pattern. What I'm referring to is schema design. In Mongo you denormalize and store related data in the same document instead of as references. In SQL a record might be split across 3 tables; in Mongo that should all live in the same document. If you can't embed and need lookups, you should avoid Mongo and other NoSQL DBs.


Should it be used as a database in the way that Postgres would be used as a database? It seems pretty unfit for that role, given its past durability issues and the fact that writes have to be tailored to how the data will be consumed, due to the lack of ad-hoc queries.

As something layered on top of the actual source of truth it may be reasonable: writing a materialized version of a costly query that's read very often, sure. But that's an optimization, and there it'd be competing with Redis for what matters.


MongoDB has aggregation pipelines https://www.mongodb.com/docs/manual/core/aggregation-pipelin... that allow for fairly ad-hoc queries.
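
For instance, a rough pymongo sketch of the GROUP BY-style question from upthread, written as a pipeline (collection and field names are made up):

    from pymongo import MongoClient

    orders = MongoClient()["shop"]["orders"]

    # Roughly: SELECT user_id, count(*), sum(amount) ... GROUP BY user_id
    pipeline = [
        {"$match": {"status": "paid"}},
        {"$group": {
            "_id": "$user_id",
            "orders": {"$sum": 1},
            "revenue": {"$sum": "$amount"},
        }},
        {"$sort": {"revenue": -1}},
    ]

    for row in orders.aggregate(pipeline):
        print(row)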


Same thoughts. I see Mongo as a key-value store (string key -> string value) with a JSON wrapper over the values.


No, our app didn't use JOINs for the main work. But, as I wrote, every now and then we had one-off requests that required a JOIN, a GROUP BY, or both.

With postgres+JSONB we could do both at the same time on live data.


MongoDB really only makes sense when you have an extremely large set of documents and you don't need to run this kind of statistics.


Yep, although for large datasets I'd prefer a multi-master architecture, depending on what is needed: for example, blob storage or Cassandra for key-value, and BigQuery or ClickHouse for analytics.

We also thought we wouldn't need it, but it turned out we sometimes wanted to run some stats.


Since FerretDB is built on top of Postgres, could it be an extension? So that one can mix some FerretDB tables and some general SQL tables in the same database (with proper foreign key constraints between them etc)


Yes, building an extension is something we are looking at, and I am glad to hear that you would be interested in this kind of mixed use case. Would definitely make things interesting, and more performant.


Pretty exciting!

What about optionally validating some columns with jsonschema? Perhaps using https://github.com/supabase/pg_jsonschema - is using other postgres extensions supported in FerretDB? (if not, maybe it's feasible to incorporate the code of pg_jsonschema in FerretDB?)
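
For what it's worth, on plain Postgres this kind of validation can already be sketched with pg_jsonschema and a CHECK constraint. As far as I know the extension exposes a jsonb_matches_schema function, but treat the exact names, database, and schema below as illustrative only:

    import psycopg2

    conn = psycopg2.connect("dbname=ferret_test")  # hypothetical database
    cur = conn.cursor()

    # Assumes the pg_jsonschema extension is installed and available.
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_jsonschema;")

    # Validate a JSONB column against a JSON Schema via a CHECK constraint.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS customers (
            id   text PRIMARY KEY,
            data jsonb NOT NULL,
            CHECK (jsonb_matches_schema(
                '{
                  "type": "object",
                  "required": ["name"],
                  "properties": {
                    "name": {"type": "string"},
                    "age":  {"type": "integer", "minimum": 0}
                  }
                }',
                data
            ))
        );
    """)
    conn.commit()

    # A document that violates the schema (negative age) is rejected.
    try:
        cur.execute(
            "INSERT INTO customers VALUES ('c1', %s)",
            ['{"name": "Ann", "age": -1}'],
        )
        conn.commit()
    except psycopg2.errors.CheckViolation:
        conn.rollback()
        print("rejected by the JSON Schema check")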



The only problem I have with PostgreSQL's JSONB type is it strictly implements the JSON standard, so things like nulls and integers aren't supported. Depending on your data, this could be a real problem.


FWIW, JSONB supports JSON nulls, and all numbers are encoded as `numeric` type, storing integers and float64 values precisely (well, as much as possible for float64 values).

But your comment is correct for other data types like date-times and binary strings, where some form of encoding/decoding is needed.
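
If anyone wants to check the null/number behaviour from the first paragraph, here is a quick sketch (any driver works; psycopg2 here):

    import psycopg2

    cur = psycopg2.connect("dbname=postgres").cursor()

    # JSON null is preserved as a JSON null (distinct from SQL NULL), and
    # numbers go through numeric, so large integers survive exactly.
    cur.execute("""
        SELECT jsonb_typeof('null'::jsonb),
               jsonb_typeof('42'::jsonb),
               '{"n": 9007199254740993}'::jsonb ->> 'n'
    """)
    print(cur.fetchone())
    # prints: ('null', 'number', '9007199254740993')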


Yeah, I was misremembering. Our issues were related to NaNs, which we ended up encoding as nulls. I don't think we've had issues with dates yet.


The JSON standard has nulls.



