
Domain Logic and SQL (2003) - mxschumacher
https://www.martinfowler.com/articles/dblogic.html
======
tylerjwilk00
I wince every time I use application code to do what I know SQL can do faster
and easier. ORMs are nice but it's often impossible to translate performant
SQL into the ORM's API. SQL is just so concise.

~~~
jmnicolas
We recently changed one of our suppliers. The previous one allowed us full
read access to their database (400 tables), but the new one insists that we
use their SOAP web services.

So now when someone asks me for some data, I need to write code, manually
joining objects instead of crafting a quick SQL query and using the
export-to-CSV function.

Dictionaries (the C# key-value collection) have quickly become a staple,
since LINQ is way too slow when you try to 'join' hundreds of thousands of
objects together.

It feels backward (and I'm saying this as a dev, I'm certainly no DBA or SQL
expert).
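The dictionary-based 'join' described above amounts to a hand-rolled hash join. A minimal sketch (in Python here as a stand-in for C# `Dictionary`; the table names and fields are made up for illustration):

```python
# Hash join by hand: index one side by key in a dict, then probe it
# while scanning the other side. This is O(n + m), versus the O(n * m)
# nested scan a naive list-against-list "join" would cost.

orders = [
    {"order_id": 1, "customer_id": 10, "total": 50.0},
    {"order_id": 2, "customer_id": 11, "total": 75.0},
    {"order_id": 3, "customer_id": 10, "total": 20.0},
]
customers = [
    {"customer_id": 10, "name": "Acme"},
    {"customer_id": 11, "name": "Globex"},
]

# Build the lookup side once.
by_id = {c["customer_id"]: c for c in customers}

# Probe side: one dict lookup per order (inner-join semantics).
joined = [
    {**o, "name": by_id[o["customer_id"]]["name"]}
    for o in orders
    if o["customer_id"] in by_id
]
```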

~~~
stefano
If the data fits in RAM, you could try loading it into an in-memory sqlite
database, and run SQL queries there.
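For example, with Python's stdlib sqlite3 the whole thing fits in a few lines (the `orders` schema here is just for illustration):

```python
import sqlite3

# ":memory:" keeps the whole database in RAM -- nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
rows = [(1, "Acme", 50.0), (2, "Globex", 75.0), (3, "Acme", 20.0)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Ordinary SQL works again: joins, aggregates, GROUP BY...
result = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
# result == [("Acme", 70.0), ("Globex", 75.0)]
```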

~~~
jmnicolas
I thought about making a "shadow database" on a server, since my work PC only
has 8 GB of RAM, but ultimately decided against it.


The first reason was GDPR compliance; the second is that our architecture is
already complex as it is (and we're only two people managing it), so I don't
want to add one more brittle layer that will need debugging at the worst
possible time.

~~~
wwweston
About GDPR: is it possible that holding data in a non-storage/volatile medium
could be legally distinct from non-volatile storage, especially if it's
essentially syncing with an authoritative data source that's responsible for
managing GDPR? Because if not, it seems to me every proxy and persistence
layer runs legal risks...

~~~
jmnicolas
To be honest, I have no idea. But since our supplier still hasn't implemented
the mandatory four-month maximum data retention, I'm not taking any risks.

------
emccue
This is why I like transparent sql libraries like
[https://www.hugsql.org/](https://www.hugsql.org/)

Just writing SQL, and maintaining and testing it like regular code, is a
perfectly reasonable thing to do.

Not to mention that having SQL be callable as an actual function in the
programming language the rest of your application is written in is just so
much more comfortable than having small bits of SQL scattered about in
strings.
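HugSQL does this for Clojure by turning annotated `.sql` files into functions. A rough sketch of the same idea in Python (not HugSQL's actual API; the query names and dict-based registry are just illustrative):

```python
import sqlite3

# Named queries kept together as plain SQL text. In HugSQL these live
# in .sql files with name annotations; a dict stands in for that here.
QUERIES = {
    "user_by_id": "SELECT id, name FROM users WHERE id = ?",
    "count_users": "SELECT COUNT(*) FROM users",
}

def make_query_fn(conn, name):
    """Wrap a named SQL statement as an ordinary callable."""
    sql = QUERIES[name]
    def run(*params):
        return conn.execute(sql, params).fetchall()
    return run

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])

# The SQL is now a first-class function in the host language.
user_by_id = make_query_fn(conn, "user_by_id")
count_users = make_query_fn(conn, "count_users")
```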

------
tkyjonathan
It is worth mentioning that there is some data-processing-heavy work that can
be done much faster and more efficiently in SQL.

Real life example for a regulatory batch job:

Java: 6 Tomcat servers + 1 RDS instance. 30x lines of code + unit tests in
Java. 30+ minutes runtime.

SQL: no Tomcat servers + 1 RDS instance. 1x lines of code (SQL) + unit tests
(in Java). 3+ minutes runtime.

Here is a good book on it: [https://www.amazon.co.uk/Relational-Database-Programming-Set...](https://www.amazon.co.uk/Relational-Database-Programming-Set-Oriented-Approach/dp/148422079X/ref=sr_1_1?keywords=set+oriented+approach&qid=1582289243&sr=8-1)

~~~
smoyer
"But that's not going to be database agnostic!"

Haha... I'll admit I used to say that, but rarely did an internal system (what
I work on most) ever actually have that requirement. Systems that are sold to
customers to run on their premises are another story, but otherwise don't
worry about using (e.g.) PostgreSQL-only features to speed up both your
processing and development times. I have, however, seen the horrors of
600-line stored procedures with calls to other stored procedures nested 15
deep. If you get to the point that you can't easily understand the business
logic, you've gone too far.

~~~
mumblemumble
There seem to be two ways that interacting with the database works out in
practice: You can embrace the specific DBMS you have, get the most out of it,
and end up tightly coupled to it. Or you can try to be database agnostic,
increase your development costs and limit your performance and data
expressivity in the service of that goal, and still end up tightly coupled to
the DBMS.

My pet hypothesis is that "database agnostic" isn't really about being able to
switch databases on a whim. It's more an expression of developer-centric
anxiety: a dog-whistle way of saying, "I don't want to have to RTFM of a new
DBMS." Which I get; most DBMSes have awful manuals, especially SQL databases.
SQL dialects tend to mix syntax, semantics, and subtle implementation details
all together into one big ball of mud and tangled hair. But this is one spot
where I think it really is worthwhile to hold your nose and get through it.

------
victor106
This really depends on the context.

Scenario 1: no plans to add additional functionality. Sticking to the
e-commerce example in the article, if I am designing a system for use by just
one customer who wants to offer the discount for just that one scenario, and
they don't have any future need to offer other kinds of discounts, then put
the logic in the DB. It's quick and easy.

Scenario 2: plans to add additional functionality. Sticking to the example,
we have future plans to offer other types of discounts to our customers,
which means we have to develop generic interfaces so it's easy to add
functionality. Put the business logic in the application layer.

Scenario 3: plans to add additional functionality, but also to let users
configure and add functionality without IT intervention. Sticking to the
e-commerce example, we want business users to create new offers and expose
them to our customers. In this case, use a rule engine like Drools.

It really depends on what functionality you are delivering and how you see
that evolving in the future.

------
sriku
The article was written at a time when "polyglot" wasn't common, IIRC... to
the extent that knowing SQL in addition to a programming language wasn't
common.

Today the thinking "keep all domain logic in code" would take you to crazy
places, given the diversity of systems we need in order to build applications.

"A foolish consistency is the hobgoblin of little minds" and all that.

If anything it seems more sensible to centralize storage of all business rules
in the DB.

------
jasonhansel
One thing this leaves out is the danger of concurrency issues, which can
sometimes be worsened by using an ORM. Writing correct, race-free SQL can be
very hard (with READ COMMITTED transactions) or require complex retry logic
(with SERIALIZABLE). Preventing deadlocks can also be a concern.

ORMs tend to hide the underlying SQL operations, making it even harder to
verify whether operations are concurrency safe.
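The SERIALIZABLE retry logic mentioned above usually takes the shape of a generic retry loop around the transaction body. A sketch of the pattern (the exception class is a stand-in for a real driver's serialization-failure error, e.g. PostgreSQL's SQLSTATE 40001; the simulated transaction is purely illustrative):

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for a DBMS serialization abort (e.g. SQLSTATE 40001)."""

def run_in_transaction(work, max_attempts=5):
    """Retry a SERIALIZABLE transaction until it commits or gives up.

    `work` is a callable performing the transaction body; it raises
    SerializationFailure when the database aborts the transaction.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Back off briefly, with jitter, before retrying.
            time.sleep(random.uniform(0, 0.005 * attempt))

# Simulated transaction that is aborted twice before committing.
attempts = []

def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()
    return "committed"
```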

~~~
mumblemumble
I think that this is one of those spots where, if you want things to be ORM-
friendly, you've got to bake it into the data model.

A good starting point is to use CR instead of CRUD. Append-only tables with
soft deletes aren't necessarily the most performant, but they're naturally
less susceptible to race conditions. The lack of destructive modification
(under normal operation) also makes it easier to diagnose problems and keep
audit trails.
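One way to sketch the CR pattern, assuming a versioned table where both "update" and "delete" are really inserts (schema and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Every change is an INSERT; `deleted` marks a soft delete and
# `version` orders the history. Nothing is ever UPDATEd or DELETEd.
conn.execute("""
    CREATE TABLE items (
        item_id INTEGER,
        version INTEGER,
        name    TEXT,
        deleted INTEGER DEFAULT 0
    )
""")
conn.execute("INSERT INTO items VALUES (1, 1, 'widget', 0)")
conn.execute("INSERT INTO items VALUES (1, 2, 'widget v2', 0)")  # "update"
conn.execute("INSERT INTO items VALUES (1, 3, 'widget v2', 1)")  # "delete"

# Current state = latest version per item, minus soft-deleted rows.
current = conn.execute("""
    SELECT item_id, name FROM items i
    WHERE version = (SELECT MAX(version) FROM items WHERE item_id = i.item_id)
      AND deleted = 0
""").fetchall()
# current == []  -- item 1's latest version is a delete...

# ...but the full history survives for audit and debugging.
history = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```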

~~~
slim
isn't soft delete an update?

~~~
pocket_cheese
One way I get around this is by having an append-only version of the main
table, and then a "most current record" version of the table. My append-only
table is a running record of all the data I've loaded, unmodified. Then I run
a CREATE TABLE AS SELECT statement against the append-only table to generate
a new version of the final table. So in essence, I'm "deleting" the old
version of a record by omitting it when I create the new version of the
final table.
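That two-table scheme can be sketched as follows (column names and data are invented for illustration; the load timestamp decides which record is "most current"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Append-only landing table: every load lands here, unmodified.
conn.execute("CREATE TABLE events_all (item_key TEXT, loaded_at INTEGER, val TEXT)")
conn.executemany("INSERT INTO events_all VALUES (?, ?, ?)", [
    ("a", 1, "old"),
    ("b", 1, "old"),
    ("a", 2, "new"),   # a later load supersedes key "a"
])

# Rebuild the "most current record" table from scratch: old versions
# are "deleted" simply by being omitted from the new final table.
conn.execute("DROP TABLE IF EXISTS events_current")
conn.execute("""
    CREATE TABLE events_current AS
    SELECT item_key, val FROM events_all e
    WHERE loaded_at = (SELECT MAX(loaded_at) FROM events_all
                       WHERE item_key = e.item_key)
""")
rows = conn.execute(
    "SELECT item_key, val FROM events_current ORDER BY item_key"
).fetchall()
# rows == [("a", "new"), ("b", "old")]
```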

------
oarabbus_
It is a great tragedy of our day that so many software engineers are poor at
SQL. Use tools like Jinja and DBT to modularize it, and Airflow to schedule
it; but use of SQL should be maximized wherever possible.

~~~
jeanloolz
I agree completely. I love SQL just as much. I built a little framework based
on SQL and Jinja last year.

Shameless plug: [https://github.com/socialpoint-labs/sqlbucket](https://github.com/socialpoint-labs/sqlbucket)

I just changed jobs though, and the data engineering team I joined is
literally banning raw SQL and enforcing Scala instead... I'm a bit sad.

------
nizmow
One thing this article doesn't mention is that in many cases databases scale
poorly, and only vertically. You can throw a bunch of domain logic into
complex SQL queries and see significant performance benefits, but at some
point you may find that your application has grown and your SQL server is
running into IOPS and CPU limits that are difficult to fix. At that point
your answer could be to pull logic out of those complex queries, create
simpler queries that ask less of the DB, and run the logic on the application
side so you can scale horizontally.

~~~
ak39
Most (if not all) modern RDBMSs allow for reliable horizontal and vertical
scaling. SQL Server (the DB you mention) is no slouch when paired with
additional CPUs (for vertical linear scaling) and allows for transactionally
safe replication to distributed clusters of SQL Servers (for horizontal
scaling) across geographically distributed servers. PostgreSQL has this too,
and it's absolutely free.

Where is this meme that RDBMSs do not scale coming from? (I see lots of people
saying this as though it is a given. Has there been any published data on this
for me to read?).

~~~
keanzu
> Where is this meme that RDBMSs do not scale coming from?

They don't scale to Google or Facebook operational sizes. Once you get to a
billion customers or so the ol' RDBMS tends to struggle. Because everyone
wants to be Google they imagine they have Google's problems. I've been in a
meeting where the client was talking about their severe scaling issues for
their "big data" which could only possibly be resolved by state of the art
cloud solutions. I pressed them on the numbers - they had 400GB. You can buy
an iPhone with 512GB.

Oracle's Exadata X8-2 is ~1PB per full rack.

~~~
hmmmmm2
> They don't scale to Google or Facebook operational sizes.

You may or may not know this, but the primary datastore used at both Google
and FB is MySQL. Sure, they use replication and sharding, but I would
strongly argue that MySQL with sharding scales better than some multi-master
NoSQL thing like Cassandra.

Related: you should check out [https://github.com/vitessio/vitess](https://github.com/vitessio/vitess)
if you haven't seen it. It's what YouTube and others use for their primary
data store in production.
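The core of the sharding approach is just deterministic key-to-shard routing; the application (or a proxy like Vitess) sends each query to the shard that owns the key. A toy sketch (the shard count and function name are illustrative; Vitess actually uses keyspace IDs and pluggable "vindexes" rather than this exact scheme):

```python
import hashlib

NUM_SHARDS = 4  # assumption for illustration

def shard_for(user_id: int) -> int:
    """Route a key to a shard with a stable hash.

    Hashing (rather than, say, ranges of raw ids) spreads sequential
    ids evenly across shards; the same key always lands on the same
    shard, so single-key reads and writes touch one database only.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```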

~~~
keanzu
> You may or may not know this

I didn't, but they do, surprising!

[https://www.facebook.com/notes/facebook-engineering/tao-the-...](https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/)

