> *When querying the database,* query the database. *It’s almost always more eff...

bob1029 · 2025-08-16T09:03:29 1755335009

This is a big part of what makes ORMs a problem.

Writing raw SQL views/queries per MVC view in SSR arrangements is one of the most elegant and performant ways to build complex web products. Let the RDBMS do the heavy lifting with the data. There are optimizations in play you can't even recall (because there's so many) if you're using something old and enterprisey like MSSQL or Oracle. The web server should be able to directly interpolate sql result sets into corresponding <table>s, etc. without having to round trip for each row or perform additional in memory join operations.

The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.

mattmanser · 2025-08-16T10:35:03 1755340503

Most ORMs will happily let you map stored procedures and views to a class, you can have as many models as you want. So your point doesn't really make sense.

The author's said nothing about ORMs. It feels like you're trying to post a personal beef about ORMs that's entirely against the "pragmatic" software design engineering the author's opining. Using ORMs to massively reduce your boiler-plate CRUD code, then using raw SQL (or raw SQL + ORM doing the column mapping) for everything else is a pragmatic design choice.

You might not like them, but using ORMs for CRUD saves a ton of boilerplate, error-prone, code. Yes, you can footgun yourself. But that's what being a senior developer is all about, using the tools you have pragmatically and not foot gunning yourself.

And it's just looking for the patterns, if you see a massive ORM query, you're probably seeing a code smell. A query that should be in raw SQL.

dondraper36 · 2025-08-16T10:48:35 1755341315

In Go, for example, there is a mixed approach of pgx + sqlc, which is basically a combo of the best Postgres driver + type-safe code generator (based on raw SQL).

https://brandur.org/sqlc

Even though I often use pgx only, for a new project, I would use the approach above.

DanielHB · 2025-08-18T14:17:17 1755526637

I did some exploratory analysis on sqlc some time ago and I couldn't for the life of me figure out how to parametrize which column to sort-by and group-by in queries.

It is quite neat, but I don't think it actually replaces a proper ORM for when ORMs are actually useful for. That on top of all the codegen pitfalls.

I personally quite like the Prisma approach which doesn't map database data to objects, but rather just returns an array of tuples based on your query (and no lazy loading of anything ever). With typescript types being dynamically computed based on the queries. It has its own pitfalls as well (like no types when using raw queries).

Yokohiii · 2025-08-16T12:19:25 1755346765

The way you describe it, it would be ideal if ORMs would only handle very basic CRUD and force you to use raw sql for complex queries. But that's not reality and not how they are used, not always. In my opinion some devs take pride to do everything with their favorite ORM.

I think if an app uses 90% ORM code with the remains as raw queries, a junior is inclined to favor ORM code and is also less exposed to actually writing SQL. He is unlikely to become an SQL expert, but using SQL behind a code facade, he should become one.

mattmanser · 2025-08-17T14:34:51 1755441291

And the ORM free code has massive downsides, not limited to if you add/change a column code can break at runtime, not compile time.

The negatives of not using an ORM is far worse than the negatives of not reigning in some developers who shouldn't be making complex queries.

If they don't even know how to check the SQL their complex ORM query produces, that's a training problem, not an ORM problem.

It's one of our great weaknesses as a profession, assuming everyone will figure stuff out on their own.

Too · 2025-08-16T13:28:09 1755350889

With an ORM your application code is your views.

You can write reusable plain functions as abstractions, returning QuerySets that allow further filters being chained onto the query, before the actual SQL is materialized and sent to the database.

The result of this doesn’t have to match the original object models you defined, it’s still possible to be flexible with group bys resulting in dictionaries.

tremon · 2025-08-16T14:54:35 1755356075

But converting a SQL relation to a set of dictionaries already carries a lot of overhead: every cell in the resultset must be converted to a key-value pair. And the normal mechanics of vertical "slicing" a set of dictionaries is much more expensive than doing the same in a 2d relation array. So while you might want to offer a dictionary-like interface for the result set, please don't use a dictionary-like data structure.

zbentley · 2025-08-16T19:18:13 1755371893

There are valid reasons to avoid complex ORM/query result representations, but this isn’t one of them.

I have very rarely seen or even heard of the result representation data structure for an SQL query being a bottleneck. The additional time and space needed to represent a raw tabular result in a more useful way on the client are nearly always rounding errors compared by the time needed to RPC the query itself and the space taken up by the raw bytes returned. Given that, and the engineering time wasted working with (and fixing inevitable bugs in) fully tabular result data structures (arrays, bytes), this is bad advice.

henry2023 · 2025-08-16T14:34:57 1755354897

Unpopular opinion. ORM by definition is the gcd of "supported databases" features. It exists only because people doesn't like the aesthetics of SQL but the cost to use them is immense.

CuriouslyC · 2025-08-16T16:14:42 1755360882

Not unpopular. ORM hate is real. I like SQL Alchemy and Drizzle in projects for the features they give you for free (such as Alembic migrations and instant GraphQL server), but I still write SQL for most stuff.

richardlblair · 2025-08-16T11:42:20 1755344540

If your ORM is going to the DB per row you're using it wrong. N+1 queries are a performance killer. They are easy to spot in any modern APM.

Rails makes this easy to avoid. Using `find_each` batches the queries (by 1,000 records at a time by default).

Reading through the comment section on this has been interesting. Either lots of people using half baked ORMs, people who have little experience with an ORM, or both.

wild_egg · 2025-08-16T14:04:58 1755353098

I mean Rails also makes it easy to accidentally nest further queries inside your `find_each` block and end up with the same problem.

Your team can have rules and patterns in place to mitigate it but I'd never say "Rails makes this easy to avoid".

richardlblair · 2025-08-16T16:25:54 1755361554

This is true with any any interaction with the DB, ORM or otherwise. Regardless of the layer of abstraction you choose to operate at you still need to understand the underlying complexity.

What Rails gives you is easy to use (and understand) abstractions that enable you to directly address performance issues.

Easy is highly contextual here, because none of this is trivial.

mathiaspoint · 2025-08-16T16:28:06 1755361686

I think the real value in frameworks like rails and Django is that it makes it easier to collaborate. When you do it from scratch people inevitably write their own abstractions and then you can't share code so easily.

hk1337 · 2025-08-16T10:06:59 1755338819

Even in the article the solution wasn’t to abandon the ORM in favor of raw SQL but knowing how to write the code so it doesn’t have to run 100 extra queries when it doesn’t need to.

> Particularly if you’re using an ORM, beware accidentally making queries in an inner loop. That’s an easy way to turn a select id, name from table to a select id from table and a hundred select name from table where id = ?.

mexicocitinluez · 2025-08-16T12:17:36 1755346656

>The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.

I can't respond to the "typical" part as most of my experience is using EF Core, but it's far from inflexible.

Most of my read-heavy, search queries are views I've hand written that integrate with EF core. This allows me to get the benefit of raw SQL, but also be able to use LINQ to do sorting/paging/filtering.

tossandthrow · 2025-08-16T09:36:21 1755336981

Have you ever build a complex app like this?

In particular, have you have to do testing, security (eg. row level security), manage migrations, change management (eg. for SOC2 or other security frameworks), cache offloads (Redis, and friends), support for microservices, etc.

Comments like this give me a vibe of young developers trying out Supabase for the first time feeling like that approach can scale indefinitely.

rbees · 2025-08-16T10:30:17 1755340217

> Comments like this give me a vibe of young developers

I don’t think so. The context is about avoiding joining in memory, which is fairly awful to do in a application, and should be avoided, along with uninformed use of ORMs, which often just add a layer of unwarranted complexity leading to things like the dreaded N+1 problem that most inexperienced Rails developers had when dealing with ActiveRecord.

If anything, what you’re talking about sounds like development hell. I can understand a database developer having to bake in support for that level of security, but developing an app that actually uses it gets you so far in the weeds that you can barely make progress trying to do normal development.

A developer with several years of experience or equivalent will have pride in developing complexity and using cool features that make them feel important.

After a developer has maybe twice that many years experience or equivalent, they may develop frameworks with the intent to make code easier to develop and manage.

And beyond that level of experience, developers just want code that’s easy to maintain and doesn’t make stupid decisions like excessive complexity. But, they know they have to let the younger devs make mistakes, because they don’t listen, so there is no choice but to watch hell burn.

Then you retire or get a different job.

tossandthrow · 2025-08-16T11:48:30 1755344910

I don't know what I am talking about that sounds like hell?

I am merely talking about properties of developing complex web applications that have traditionally not been easy to work with in SQL.

I am in particular not proposing any frameworks.

How can that sound like hell?

lurking_swe · 2025-08-16T10:36:06 1755340566

Not the person you replied to, but I have! A java project I worked on a couple years ago used a thin persistence layer called JOOQ (java library). It basically helps you safely write sql in java, without ORM abstractions. Worked just fine for our complex enterprise app.

Sql migrations? This is a solved problem: https://github.com/flyway/flyway

What about micro services? You write some terraform to provision a sql database (e.g. aws aurora) just like you would with dynamo db or similar. What does that have to do with ORMs?

What about redis? Suddenly we need an ORM to query redis, to check if a key exists in the cache before hitting our DB? That’s difficult code to write?

I’m confused reading your comment. It has “you don’t do things my way so you must be dumb and playing with toy projects” vibes.

__MatrixMan__ · 2025-08-16T11:58:59 1755345539

As a previous user of alembic I was surprised that flyway's migrations only go forward by default and that reversing them is a premium feature. That's like having the luxury trim being the one with seatbelts.

lurking_swe · 2025-08-16T18:43:25 1755369805

it’s been a while since I used flyway. is there a better option in 2025? Just curious.

tossandthrow · 2025-08-16T11:55:09 1755345309

From what I can se jooq is only really type safe with pojo mappings, to what point it is an orm with an expressive query dsl.

Alternatively you use record style outputs, but that is prone to errors if positions are changed.

Regardless, even with jooq you still accept that there is a sizable application layer to take responsibility of the requirements I listed.

lurking_swe · 2025-08-16T18:39:54 1755369594

i guess it’s semantics, but i agree with you actually. After all ORM = object relational mapping. However it’s certainly the most lightweight ORM i’ve used in the java and c# world. With JOOQ you are in complete control of what the SQL statements look like and when those queries happen (avoids the common N + 1 risk). _Most_ ORMs i’ve seen attempt to abstract the query from the library user.

In our project we generated pojo’s in a CI pipeline, corresponding to a new flyway migration script. The pojos were pushed to a dedicated maven library. This ensured our object mappings were always up to date. And then we wrote sql almost like the old fashioned way…but with a typesafe java DSL.

Yokohiii · 2025-08-16T12:03:07 1755345787

I don't understand why all these problems should be easier handled with an ORM then with raw sql?

LinXitoW · 2025-08-16T13:30:44 1755351044

Why is it so hard to believe that well tested, typed code is better than manual string concatenation?

Before you tell me about how you just use a Query Builder/DSL and a object mapper for convenience: That's a freaking ORM!

tossandthrow · 2025-08-16T12:18:33 1755346713

It is a granluarity tradeoff.

With SQL you need to explicitly test all queries where the shape granularity is down to field level.

When you map data onto an object model (in the dto sense, not oop sense) you have bigger building blocks.

This gives a simpler application that is more reliable.

Obviously you need to pick a performant orm - and it seems a lot of people in these threads have been traumatized.

Personally, I run a complex application where developers freely use a graphql schema and requests are below 50ms p99 - gql in translated into joins by the orm, so we do not have any n+1 issues, etc.

johnmaguire · 2025-08-16T12:22:45 1755346965

The issue with GraphQL tends to be unoptimized joins instead. Is your GraphQL API available for public consumers? How do you manage them issuing inefficient queries?

I've most often seen this countered through data loaders (batched queries that are merged in code) instead of joins, or query whitelists.

tossandthrow · 2025-08-16T13:01:03 1755349263

While this api in particular is not publicly exposed, that would not be a concern.

The key is to hold the same schema on the database as on the graphql and use tooling that can translate a gql query into a single query.

johnmaguire · 2025-08-16T18:44:53 1755369893

The issue I've seen with GraphQL isn't necessarily the count of queries run, but rather the performance or said queries (i.e. most SQL queries are not performant without proper indexes for the specific use case, but GraphQL allows lots of flexibility in what queries users can run.)

tossandthrow · 2025-08-18T14:09:00 1755526140

Yes - one needs to ensure that the data is well indexed - that is reasonable.

But indices does not need to yield a single result. It is OK that indices reduce the result set to tens or couple of hundreds of result. That is well within the performance requirements (... of our app)

Yokohiii · 2025-08-16T12:29:57 1755347397

In my ears that's just neglect? You assume your ORM does the basic data mapping right and don't verify it?

marcosdumay · 2025-08-16T15:54:44 1755359684

> You assume your ORM does the basic data mapping right

You know, it should. There's no good reason for an ORM to ever fail at runtime due to mapping problems instead of compile time or start time. (Except, of course if you change it during the software's execution.)

Yokohiii · 2025-08-16T16:32:00 1755361920

Why should a raw query fail?

tossandthrow · 2025-08-16T13:02:00 1755349320

No? The difference is to verify it ones for the orm VS ones for every single place your query.

Yokohiii · 2025-08-16T15:18:36 1755357516

I have to respond here as I seemingly the depth limit is reached.

As you've mentioned graphql you probably comparing ORM in that sense to an traditional custom API with backed by raw sql. In a fair comparison both version would do the exactly same, require the same essential tests. Assuming more variations for the raw sql version is just assuming it does more or somehow does it badly in terms of architecture. Which is not a fair comparison.

tossandthrow · 2025-08-16T17:48:02 1755366482

The orm represents deferred organization. Ie someone else is testing mapping and query generation for you.

An example is prisma. Prisma has a team og engineers that work on optimizing query generation and provide a simple and intuitive api.

Not using an orm forces you to take over that organization and test that extra complexity that goes into you code base.

It might be merited if you get substantiel performance boosts - but I have not seen any reasonably modern orm where performance is the issue.

Yokohiii · 2025-08-16T14:19:29 1755353969

A raw query doesn't has to be repeated in every place it's required. Not sure what your point is.

tossandthrow · 2025-08-16T14:57:15 1755356235

You will have a bigger variety of queries hwne you don't use an orm - this puts a higher load on software testing to get the same level of reliability.

sgarland · 2025-08-16T16:14:56 1755360896

> 50 ms p99

You realize that’s abysmally bad performance for any reasonable OLTP query, right? Sub-msec (as measured by the DB, not including RTT etc.) is very achievable, even at scale. 2-3 msec for complex queries.

tossandthrow · 2025-08-18T20:34:55 1755549295

The is the response time for the server, not the database - which is appears that everyone but you understood clearly from the context.

cpursley · 2025-08-16T10:37:37 1755340657

Guessing you are a Rails dev?

scarface_74 · 2025-08-16T18:22:23 1755368543

C#’s Linq based ORMs have always been - type safe built into the OS feature -> run time generation of an agnostic expression tree -> database provider converts it into SQL. It does database joins (unless you do something stupid like get out of IQuery land).

tialaramex · 2025-08-16T11:51:57 1755345117

Stored procedures seem like a win but the big problem is that while I could write the rest of the software in a very nice modern language like Rust, or more practically in C# since my team all know C# if I write a stored procedure it will be in Transact-SQL because that's the only choice.

T-SQL was not a good programming language last century when it was vaguely current, and so no I do not want to write any significant amount of code in T-SQL. For my sins I maintain a piece of software with huge T-SQL procedures (multi-page elaborations by somebody who really, really like this stuff) and they're a nightmare. The tooling doesn't really believe in version control, the diagnostics when you make a mistake are either non-existent or C++ style useless spew.

We hire a lot of very junior developers. People who still need to be told not to comment out code in release, that variable numbers are for humans to read not machines, that sort of thing. We're not quite hiring physicists to write software (I have done that at a startup) but it's close. However, none of the poor "My first program" code I see in a merge request by a new hire is anywhere close to as unreadable as the T-SQL we already own and maintain.

Yokohiii · 2025-08-16T12:26:26 1755347186

I've only once tried to use stored procedures in mysql and it was almost impossible to debug back then. Very painful. Average devs already have issues being smart with their databases and stored procedures would add to that.

Stored procedures also add another risk. You have to keep them in sync with code, making releases more error prone. So you have to add extra layers of complexity to manage versioning.

I can see the advantage of extreme performance/efficiency gains, but it should be really big to be justified.

CuriouslyC · 2025-08-16T16:16:47 1755361007

I'm a big postgres guy and in theory I love stored procedures (so many language options!) but you're 100% right that the downsides in terms of DX make them pretty much the last thing I reach for unless they're a big performance/simplicity win and I expect them to be pretty static over time.

loglog · 2025-08-16T13:06:46 1755349606

> Stored procedures also add another risk. You have to keep them in sync with code, making releases more error prone.

This one is easily solved: never change a stored procedure. Every version should get a new name.

Yokohiii · 2025-08-16T14:13:28 1755353608

That's what I meant when I've mentioned versioning.

doitLP · 2025-08-16T13:42:25 1755351745

I worked at a place with just such a system. Half the application code was baked into sprocs, no version control and hidden knock on effects everywhere.

There was _one guy_ who maintained it and understood how it worked. He was very smart but central to the company’s operations. So having messy stuff makes it brittle/hard to change in more ways than one and

mdavid626 · 2025-08-16T10:30:33 1755340233

I disagree. In modern highly scalable architectures I’d prefer doing joins in the layer front of the database (backend).

The “backend” scales much easier than the database. Loading data by simple indexes, eg. user_id, and joining it on the backend, keeps the db fast. Spinning up another backend instance is easy - unlike db instance.

If you think, your joins must happen in db, because data too big to be loaded to memory on backend, restructure it, so it’s possible.

Bonus points for moving joins to the frontend. This makes data highly cacheable - fast to load, as you need to load less data and frees up resources on server side.

riv991 · 2025-08-16T10:56:33 1755341793

High Scale is so subjective here, I'd hazard a guess that 99% of businesses are not at the scale where they need to worry about scaling larger than a single Postgres or MySQL instance can handle.

Tade0 · 2025-08-16T11:03:14 1755342194

In the case of one project I've been in, the issue was the ORM creating queries, which Postgres deemed too large to do in-memory, so it fell back to performing them on-disk.

Interestingly it didn't even use JOIN everywhere it could because, according to the documentation, not all databases had the necessary features.

A hard lesson in the caveats of outsourcing work to ORMs.

richardlblair · 2025-08-16T11:33:01 1755343981

I've worked both with ORMs and without. As a general rule, if the ORM is telling you there is something wrong with your query / tables it is probably right.

The only time I've seen this is my career was a project that was an absolute pile of waste. The "CTO" was self taught, all the tables were far too wide with a ton of null values. The company did very well financially, but the tech was so damn terrible. It was such a liability.

mdavid626 · 2025-08-16T11:04:18 1755342258

Scalability is not the keyword here.

The same principle applies to small applications too.

If you apply it correctly, the application never going to be slow due to slow db queries and you won’t have to optimize complex queries at all.

Plus if you want to split out part of an app to its own service, it’ll be easily possible.

nicoburns · 2025-08-16T11:53:07 1755345187

One of the last companies I worked at had very fast queries and response times doing all the joins in-memory in the database. And that was only on a database on a small machine with 8GB RAM. That leaves a vast amount of room for vertical scaling before we started hitting limits.

dondraper36 · 2025-08-16T10:57:54 1755341874

Vertical scaling is criminally underrated, unfortunately. Maybe, it's because horizontal scaling looks so much better on Linkedin.

mdavid626 · 2025-08-16T11:06:39 1755342399

Sooner or later even small apps reach hardware limits.

My proposed design doesn’t bring many hard disadvantages.

But it allows you to avoid vertical hardware scaling.

Saves money and development time.

dondraper36 · 2025-08-16T11:35:52 1755344152

Not really disagreeing with you here, but that "later" never comes for most companies.

AdrianB1 · 2025-08-16T10:52:43 1755341563

My manufacturing data is hundreds of GB to a few TB in size per instance and I am talking about hot data, that is actively queried. It is not possible to restructure and it is a terrible idea to do joins in the front end. Not every app is tiny.

mdavid626 · 2025-08-16T11:00:18 1755342018

In some cases, it’s true.

But your thinking is rather limited. Even such data can be organized in a way, that joins are not necessarily in the db.

This kind of design always “starts” on the frontend - by choosing how and what data will be visible eg. on a table view.

Many people think, showing all data, all the time is the only way.

AdrianB1 · 2025-08-16T11:10:35 1755342635

The SQL database has more than a dozen semi-independent applications that treat different aspects of the manufacturing process, for example from recipes and batches to maintenance, scrap management and raw material inventory. The data is interlocked, the apps are independent as different people in very different roles are using it. No, it never starts in the front end, it started as a system and evolved by adding more data and more apps. Think SAP as another such example.

mdavid626 · 2025-08-16T11:14:23 1755342863

This is and “old-school” design. Nowadays I wouldn’t let apps meet in the database.

Simple service oriented architecture is much preferred. Each app with its own data.

Then such problems can be easily avoided.

dakiol · 2025-08-16T11:49:19 1755344959

It’s not old school, it’s actually solid design. I have worked too with people that think the frontend or even services should guide the design/architecture of the whole thing. Seems tempting and it has the initial impression that it works, but long terms it’s just bad design. Having Data structures (and mainly this means database structures) stable is key to long term maintenance.

johnmaguire · 2025-08-16T12:26:53 1755347213

> Seems tempting and it has the initial impression that it works, but long terms it’s just bad design.

This appears as an opinion rather than an argument. Could you explain what you find bad about the design?

In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.

RaftPeople · 2025-08-16T16:07:49 1755360469

> In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.

I think the idea of breaking up a shared enterprise DB into many distinct but communicating and dependent DB's was driven by a desire to reduce team+system dependencies to increase ability to change.

While the pro is valid and we make use of the idea sometimes when we design things, the cons are significant. Splitting up a DB that has data that is naturally shared by many departments in the business and by many modules/functional areas of the system increases complexity substantially.

In the shared model, when some critical attribute of an item (sku) is updated, then all of the different modules+functional areas of enterprise are immediately using that current and correct master value.

In the distributed model, there is significant complexity and effort to share this state across all areas. I've worked on systems designed this way and this issue frequently causes problems related to timing.

As with everything, no single solution is best for all situations. We only split this kind of shared state when the pros outweigh the cons, which is sometimes but not that often.

mdavid626 · 2025-08-16T17:06:52 1755364012

I disagree. I generally understand the problem a "split-up" database brings to the table. This is how people designed things in the last many decades.

What I propose is to leave this design behind.

The split up design fits modern use cases much better. People want all kind of data. They want to change what data they want rather often.

"One" database for all of this doesn't really work -- you can't change the schema since it's used by many applications. So, you'll stuck with a design coming from a time when requirements were probably quite different. Of course, you can make some modifications, but not many and not fundamental ones.

In the split-up design, since you're not sharing the database, you can do whatever you want. Change schema as you see fit. Store data in multiple different forms (duplicates), so it can be queried quickly. The only thing you have to keep is the interface to the outside world (department etc.). Here you can use eg. versioning of your API. Handy.

The 90's are over. We don't have to stick to the limitations people had back then.

Yes of course, data not being up-to-date in every system can be a problem. BUT business people nowadays tend to accept that more, than the inability to change data structures ("we can't add a new field", "we can't change this field" etc.).

RaftPeople · 2025-08-17T16:20:46 1755447646

> In the split-up design, since you're not sharing the database, you can do whatever you want.

> we can't add a new field, we can't change this field

Ok, let's do an example.

Assumption:

A-ERP system with approximately 30 modules in use (e.g. sales order mgmt, inventory, purchasing, etc)

B-For split DB, the DB is split by module and data flows exist for all shared data. So there are X different copies of the item master (many and possibly most of those modules use the item master), each with the subset of data required by the specific module.

Sample change, add a new field to the item master:

Shared DB:

1-Update DB schema for item master

2-Update code in different modules that need to use the new data element (per feature requirements)

Split DB:

1-Update DB schema in all modules that require the new data element (per feature requirements)

2-Update code in different modules that need to use the new data element

3-Update the data flows for item data in each module that needs to use the new data element

I think you're understating the level of effort when you say "now we can do whatever we want". The actual effort in this change (which is a very common example) is actually greater than in a shared DB and requires more coordination.

Again, there are times when it's the right thing to do, but definitely not a silver bullet without trade-offs.

mdavid626 · 2025-08-19T13:16:12 1755609372

It's interesting to see what people consider difficult to do.

In my opinion the "Split DB" case you outlined is still much easier to do.

It's never the lines of code or number of steps that need to be done make it complicated or difficult.

It's always the strange, weird, unexpected things. I change "this" and "that" breaks, but nobody knows why.

The biggest benefit of my approach is that it can be split-up between people. One team handles this part, the other some other part. You can only break your part of the database, not everything for everyone else.

sgarland · 2025-08-17T03:43:08 1755402188

If you have to change your schema frequently, you didn’t adequately (or at all, more likely) model your data.

DB schema is supposed to be inflexible and strict; that’s how you can guarantee that the data it’s storing is correct.

> The 90s are over

And now we have a generation of devs who think that 1 msec latency for disk reads is normal, that applications need to ship their own OS to run, and that SQL is an antiquated language that they don’t need to bother to learn.

mdavid626 · 2025-08-16T11:11:06 1755342666

Good, simple solution could be data duplication, eg. store some props from the joined tables directly in the main table.

I know, for many, this is one of the deadly sins, but I think it can work out very well.

sgarland · 2025-08-16T19:37:36 1755373056

Unless all your tables have the same width - or you’re doing weird things with constants in your SELECTs - you can’t UNION the various queries, so they’re sequential. You could parallelize those I suppose, but now you’re adding more complexity.

If you want a KV store, use a KV store. If you want an RDBMS, then use its features. They haven’t changed much in the last 50 years for a reason.

DanielHB · 2025-08-18T14:12:02 1755526322

Microservice achitecture promotes splitting data cross multiple databases making it impossible to do proper DB JOINs from application code.

Then companies buy a solution to aggregate all the different databases in a single "data-lake" (or whatever buzzword is hot right now) so you can do OLAP queries. Without consistency guarantees of course.

And I am not saying this is never the _right_ solution, but it should almost never be the _first_ solution

torginus · 2025-08-16T10:59:22 1755341962

Are you sure about this?

Let's say you run a webshop and have two tables, one for orders with 5 fields, one for customers, with 20 fields.

Let's say you have 10k customers, and 1m orders.

A query performing a full join on this and getting all the data would result in 25 million fields transmitted, while 2 separate queries and a client side manual join would be just 5m for orders, and 200k for customers.

jameshart · 2025-08-16T11:42:16 1755344536

If you need all the orders and all the customers sure.

But usually you need some of the orders and you need the customer info associated with them. Often the set of orders you’re interested in might even be filtered by attributes of the customers they belong to.

The decision of whether to normalize our results of a database query into separate sets of orders and customers, or to return a single joined dataset of orders with customer data attached, is completely orthogonal to the decision of whether to join data in the database.

bambax · 2025-08-16T18:13:22 1755368002

One way to think about 1-to-many relationships is to think in the other way, "many-to-one". You don't join the orders to the customers, you join the customers to the orders (enrich the orders with customer information).

It's very natural to want customer information when querying an order, and if you have a view like orders_with_customer_info, you get that with zero effort when querying that view by order id.

You also get consolidated data (orders by customer) by doing

  select count(*), sum(amount) from orders_with_customer_info group by customer_id

which I think is pretty straightforward.

digitalPhonix · 2025-08-16T11:06:37 1755342397

What sort of application is regularly doing a query for “all data”?

aembleton · 2025-08-16T13:48:25 1755352105

Client report generation.

tremon · 2025-08-16T15:05:11 1755356711

1) As soon as reporting requirements get serious, you build a data warehouse. Because odds are, the client will want to combine data from multiple systems in their reports anyway. If not today, then they will tomorrow.

2) such reports never need all the data, it's mostly about top N volume queries or month-over-month performance data. When a reporting application does query all the data, it's because it's building its own data warehouse so the query usually happens only once per day, at a specific time, which means the load is entirely predictable.

digitalPhonix · 2025-08-16T19:49:18 1755373758

Even then, DB side aggregation and joins would be beneficial.

I’m asking specifically about “all data” unfiltered, un-aggregated, un-joined as the parent’s analysis was on.

nicoburns · 2025-08-16T11:56:51 1755345411

These days you can use JSON aggregation in the database to avoid returning duplicate data in what would otherwise be large joins.

dondraper36 · 2025-08-16T11:05:06 1755342306

What I particularly like about the comments in this thread is how it proves that everything is a trade-off :)

valiant55 · 2025-08-16T11:56:25 1755345385

My rule of thumb is if it's a 1:1 relationship, use a join. If it's 1:M, separate queries.

quietbritishjim · 2025-08-16T09:03:19 1755334999

I think it's ok to have this rule as a first approximation, but like all design rules you should understand it well enough to know when to break it.

I worked on an application which joined across lots of tables, which made a few dozen records balloon to many thousands of result rows, with huge redundancy in the results. Think of something like a single conceptual result having details A, B, C from one table, X, Y from another table, and 1, 2, 3 from another table. Instead of having 8 result rows (or 9 if you include the top level one from the main table) you have 18 (AX1, AX2, AX3, AY1, ...). It gets exponentially worse with more tables.

We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.

The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.

(Later, we just stuffed all the data into a single JSONB in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)

9rx · 2025-08-16T09:07:47 1755335267

> which made a few dozen records balloon to many thousands of result rows

That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try and work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.

kccqzy · 2025-08-16T10:37:25 1755340645

It's a join. A join without any ON or USING clause or any filtering is a Cartesian product which is what's happening here.

nicoburns · 2025-08-16T11:58:19 1755345499

If you use CTEs and json_agg then you can combine your separate queries into one query without redundant data.

wongarsu · 2025-08-16T10:08:53 1755338933

That reminds me of many cases of adhering to database normalisation rules even in views and queries, even in a case where you should break it. Aggregation functions like postgres's array_agg and jsonb_agg are incredibly powerful at preventing the number of rows from ballooning in situations like those

magicalhippo · 2025-08-16T11:06:40 1755342400

I think it's more like avoid doing a "limiting" join in the application, ie where the join is used to limit the output to a subset or similar.

As a somewhat contrived example since I just got out of bed, if your software has a function that needs all the invoice items from invoices from this year which invoice address country is a given value, use a join rather than loading all invoices, invoice addresses and invoice items and performing the filtering on the client side.

Though as you point out, if you just need to load a given record along with details, prefer fetching detail rows independently instead of making a Cartesian behemoth.

scarface_74 · 2025-08-16T12:23:50 1755347030

I have a very strict rule about any system I design or that I’m over against using stored procedures.

When I use to interview to be a developer at a company, it was always an automatic no for me if a company kept business logic in stored procedures and had a separate team of “database developers”.

As far as not doing joins in code, while I agree for the most part. GitHub itself has a rule against joining tables using sql that belong to different domains.

https://github.blog/engineering/infrastructure/partitioning-...

victorbjorklund · 2025-08-16T12:32:24 1755347544

Not sure I agree. First of all it can be more performant. Say you fetch 1000 records. And we need to join on a table where these 1000 records just got 2 different foreign keys. Instead of joing in db and fetching a lot more data we can do two queries and join in app instead. Secondly, makes it easier to cache data. Lets say the thing we joing with almost never changes (like some country info) we can cache that and just join it with the data from the db.

Not saying this should always be the case, but sometimes it is the right call.

teraflop · 2025-08-16T12:44:39 1755348279

But as a counterpoint to that, (a) the database has its own caching built in, which you don't have to implement, and (b) the database knows when to invalidate its cache.

To quote Douglas Adams: "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair."

Likewise, if you cache a piece of data in your application because you assume that it won't change, that just makes it likely that if and when it does change, you'll have bugs. Moving the cache to the database layer so that it can be properly invalidated fixes this.

It's true that an application-side join can still be more performant if the DB cache isn't good enough, but IMO you should only take that step after actually profiling your queries.

luckylion · 2025-08-16T12:01:32 1755345692

I feel like the biggest question to ask is: how expensive is it exactly, how often do you need to do it, and how important is the speed of it?

If you have some complex queries on every page load with a huge number of users, put it in the DB as much as possible.

If you need to iterate over a bunch of records and do something based on some combination of values, and it's for a weekly reporting thing, I'd much rather see 3 nested foreach loops with a lot of early exits to skip the things you don't care about than a multi-kb SQL-statement that took two days to develop and nobody every dares to touch again because it's hard to handle.

hk1337 · 2025-08-16T10:13:28 1755339208

You should be careful with how much you lean into “doing it in the database” as well with how you implement it. Lest, you get the situation where your application inserts as one value and it gets saved completely different.

mattmanser · 2025-08-16T10:47:59 1755341279

I'm not sure if this is what you mean, but I think a big thing missing from the article is how you should isolate you business logic.

A great software design will separate all business logic into its own layer. That might be a distinct project, module, or namespace, depending on what your language supports. Keep business logic out of SQL and out of web server code (controllers, web helpers, middleware, etc.).

Then you're treating SQL as the data store it is designed to be. When you embed application logic in SQL, you're hiding core functionality in a place where most developers won't expect to find it. This approach also creates tight coupling between your application and your database provider, making it hard to switch as needs change/the application grows.

dondraper36 · 2025-08-16T11:12:34 1755342754

That also depends on what you would consider "business logic in the database".

What would you say about CHECK constraints, though? I don't think it's something few developers expect to see, and having these checks is very convenient.

I know that there are even opponents of foreign keys (which makes sense sometimes), but in general, I don't understand why I would ever throw away the nice features of Postgres that can enforce correctness.

hk1337 · 2025-08-16T16:24:33 1755361473

Pretty much yeah. I think you need to stay somewhat vigilant because it can creep in under the guise of validating data.

Triggers is something I was thinking about that is grossly misused.

porridgeraisin · 2025-08-17T13:37:20 1755437840

I like using triggers for validation important for maintaining data integrity.

Example: let's say you have a parent table P whose rows are soft deleted (update P set _deleted=1 where id=?) and a child table C (foreign key on P.id) On delete cascade isn't gonna work here obviously. So I use a trigger to emulate on delete cascade.

Also, I've seen triggers to cancel transactions raise abort(..) if they violate some property that can't be retrofitted into a proper db constraint due to team politics at a previous workplace.

Ozzie_osman · 2025-08-16T13:28:53 1755350933

There are definitely examples of when you want to do joins in the application.

For example, you may want to (or have the option to) vertically partition your database, or use different data stores. The app layer is usually stateless and can scale perpetually, but the database might be a bottleneck.

Joining in the database over the application is a great default. But I wouldn't say "never join in the application code".

tossandthrow · 2025-08-16T08:57:18 1755334638

Views make good sense when you can check them in - and DB migrations are a poor way of doing it due to their immutable nature.

Depending on the ecosystem the code base adopts a good orm might be a better choice to do joins.

nurettin · 2025-08-16T09:31:17 1755336677

for me, it is orm -> schema bound views -> views -> table functions -> stored procedure as a last resort (hopefully it doesn't come to that)

bencornia · 2025-08-16T16:12:55 1755360775

Views are great. Stored procedures are cursed.

0xDEAFBEAD · 2025-08-17T09:13:24 1755422004

>Stored procedures are cursed.

Elaborate?

bencornia · 2025-08-19T14:55:44 1755615344

Stored procedures make sense if you are trying to make your backend less chatty with the database. The downsides are that they aren't version controlled. PLSQL and its variants can be inscrutable and difficult to debug. I guess you just have to decide what tradeoffs you are willing to make in your application. For me stored procedures take too much discipline for any potential upside. Sure you might have a wizard on your team that has it all in their head. But once they are gone it becomes a tar pit.

CafeRacer · 2025-08-16T09:00:52 1755334852

I came here to say an exactly opposite things. There were a few instances where a relatively heavy join would not perform well, no matter what I tried. And it was faster to load/stitch data together with goroutines. So I just opted to doing it that way.

Also SQL is easy, but figuring out what's up with indexes and planner is not.