
ORMs Are Backwards - awinter-py
https://abe-winter.github.io/2019/09/03/orms-backwards.html
======
bayesian_horse
I have a lot of experience with the Django ORM, but no other.

So I disagree with most points.

The Django ORM does map pretty neatly to SQL (in my opinion), and especially
Postgres. The migration tool is anything but "half-baked". And I have observed
that beginners have an easier time creating Django models than dealing with
straight SQL.

On top of that, the Django ORM mitigates most SQL injection attacks out of the
box, without even requiring the developer to know about SQL injections in the
first place.
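
That protection ultimately comes down to parameter binding, which the ORM applies on every query. A minimal sketch of the difference using the stdlib sqlite3 module (table and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

# Hostile input that would break a naively concatenated query.
evil = "x' OR '1'='1"

# String concatenation: the attacker's quote escapes the literal
# and the OR clause matches every row.
leaked = conn.execute(
    "SELECT name FROM users WHERE name = '" + evil + "'"
).fetchall()
assert len(leaked) == 1  # injection succeeded

# Parameter binding (what ORMs emit): the input travels as data,
# never spliced into the SQL text, so it matches nothing.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)
).fetchall()
assert rows == []
```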

And the ORM enables elegant code reuse by constructing query objects that can
be used in Template logic, Form validation, generic CRUD, REST endpoints and
more.

I agree that most ORMs don't help you with multiple programming languages.
Though both the Django ORM and SQLAlchemy allow you to customize a lot of the
naming conventions and can be adapted to any legacy schema.

But then again: Is it even a good idea to access the same database from two
codebases at all?

Some of the concerns about "ORMs kill the SQL-star" can be boiled down to a
lack of a general understanding of relational databases in developers who
learn to use an ORM before diving deeper into the philosophy of relational
databases.

~~~
nnq
...you're the _ONLY_ person I saw with a good opinion of Django's ORM. I'm
willing to accept you could be right in preferring it, but:

 _Could you share some resources of using it properly for advanced use cases?
Good cookbooks etc._

For me, it's always been a huge footgun; coupled with Python's dynamic
nature, it's an endless source of bug after bug... Mistype a property on a
model instance - no problem, no error, it will just not get saved to the db
and data will be lost. Use Postgres native json and array fields - good luck
running your tests on sqlite; the ORM that was helpful up to here just gives
up and tells you "f u buddy, you're on your own". Need to write complex
hand-coded SQL queries that return complicated data, but you still want to be
able to serialize them into a graph of objects that can then be modified by
regular ORM code - good luck with that, nobody cared about this scenario.
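
The mistyped-property failure mode is easy to reproduce, because plain Python objects happily accept assignment to any attribute name. A toy sketch (model and field names are hypothetical, and `save()` only mimics how an ORM persists just the declared fields):

```python
class User:
    """Stand-in for an ORM model instance (hypothetical field names)."""
    def __init__(self):
        self.email = "old@example.com"

    def save(self):
        # An ORM's save() only persists fields declared on the model;
        # anything else hanging off the instance is silently ignored.
        return {"email": self.email}

u = User()
u.emial = "new@example.com"  # typo: silently creates a brand-new attribute
saved = u.save()
assert saved["email"] == "old@example.com"  # the update is silently lost
```

Guards like `__slots__` on the class or a static type checker catch this class of typo.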

 _They've somehow managed to make it too simple and too complex at the same
time! Oh, and don't expect to just click a few jump-to-definitions and make
sense of the ORM's inner code easily, god forbid :|..._

While I like the _idea of ORMs_, I found them all _totally "wrong headed"_.
And SQLAlchemy - that's just not worth the effort of learning to use it
properly despite its good ideas, if you're only going to end up using 10% of
its functionality, and in an idiosyncratic way that was not intended.

~~~
satyanash
> Postgres native json and array fields - good luck running your tests on
> sqlite

If you're not testing what you actually run in production, then what are you
testing? If you just need to test in-mem behaviour, mock the DB completely
instead of expecting compatibility for vendor-specific features across
vendors.

~~~
bayesian_horse
I'd say the sqlite backend and using it in testing is mostly useful for
reusable django apps that are supposed to be used in a wide variety of
projects.

Most of the code I write in Django would never need to be SQLite-compatible,
even if a lot of it happens to be. The SQLite compatibility is almost a relic
from a time when setting up RDBMSs was difficult and slow.

~~~
Scarblac
> Most of the code I write in Django would never need to be SQLite-compatible,

Then don't run your unit tests against SQLite, run them against Postgres.

------
DevKoala
It looks like the author is generalizing shortcomings in SQLAlchemy as faults
with the concept of ORMs. ActiveRecord, for example, treats SQL as “the source
of truth”, per his definition.

Also, there is a whole section called "ORMs take over connection management
and migration" and not a single point as to why connection management handled
by the ORM is bad, only comments on migrations. Moreover, migrations are not
an unsolved problem at all.

In Ruby, ActiveRecord solves all scenarios I have encountered gracefully, and
trust me, I have dealt with tons of edge case projects.

For Python, I recommend Alembic:
[https://pypi.org/project/alembic/](https://pypi.org/project/alembic/)

~~~
Diggsey
ActiveRecord does not treat SQL as "the source of truth". ActiveRecord objects
maintain their own state, and still try to be a "proxy" for rows in the
database. When you call "save", it's tracking the identity of the object using
the primary key, and generating SQL from the runtime state of the object. This
is backwards.

The biggest problem for me is this "proxy" behaviour. When you start treating
objects as proxies for database rows, your database logic inevitably leaks
into your business logic. Someone returns an ActiveRecord `User` object from a
database method and now random other parts of your application are intricately
linked to your database implementation.

On top of that you suffer terrible performance problems. Even if you're smart
enough to avoid the whole N+1 query issue, you lose control (and more
importantly visibility) over where database updates are coming from, and your
ORM is either not smart enough to efficiently apply updates, or so smart that
building the query takes longer than actually running it! (I have this problem
with SQLAlchemy right now, where it's taking _several seconds_ per request CPU
bound in the ORM). Undoing this is literally impossible without a near total
rewrite.

And it all serves no purpose! The application-specific interface to its
database is usually quite simple. You can just have it be a literal
interface/trait/whatever and implement each method by executing a bit of SQL,
or calling a stored procedure, or however you like. Arguments and return
values should be plain-old-data (no magic proxy objects) and each method
should correspond roughly to a transaction so that the rest of the application
doesn't have to worry about that database-specific stuff.
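
A rough sketch of that shape, using the stdlib sqlite3 module and made-up table and method names: each repository method runs its SQL as one transaction and hands back plain, immutable data rather than proxy objects.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class UserSummary:
    # Plain old data: no proxy back to the database.
    name: str
    order_count: int

class UserRepo:
    """Each method is roughly one transaction and returns plain values."""
    def __init__(self, conn):
        self.conn = conn

    def user_summary(self, user_id):
        with self.conn:  # commit/rollback as a unit
            row = self.conn.execute(
                """SELECT u.name, COUNT(o.id)
                   FROM users u LEFT JOIN orders o ON o.user_id = u.id
                   WHERE u.id = ? GROUP BY u.id""",
                (user_id,),
            ).fetchone()
        return UserSummary(name=row[0], order_count=row[1])

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO orders (user_id) VALUES (1), (1);
""")
summary = UserRepo(conn).user_summary(1)
assert summary == UserSummary(name="alice", order_count=2)
```

The rest of the application only ever sees `UserSummary`, so swapping the storage layer means reimplementing `UserRepo`, nothing else.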

There are a ton of other benefits to doing stuff this way, too many to list
here, but it really helps with complex migrations (like if you need to migrate
from one database to another), maintainability, testing, etc.

~~~
jrochkind1
If "SQL is the source of truth" means you have no in-memory state, I'm not
sure that's feasible, with or without an ORM.

Once you fetch something from the db, you have it in variables in memory, with
or without an ORM. It could be out of sync with the db a millisecond later. And
you will probably re-use it for more than one access (if you need it more than
once), because issuing the same SQL again every time you need to reference the
returned data, in a given routine, would be insane.

~~~
derefr
I think what they’re proposing is something like “if I want to read and then
update a value, incrementing it by one, the proper way to do that is to tell
the DB to increment whatever value it has stored; not to increment my local
representation of the value and then tell the DB to overwrite the value with
it.”
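
In SQL terms, that's the difference between a read-modify-write round trip and a single atomic UPDATE. A small sqlite3 sketch (the counters table is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, n INTEGER)")
conn.execute("INSERT INTO counters VALUES (1, 10)")

# ORM-proxy style: read, increment locally, write the stale copy back.
# Between the SELECT and the UPDATE, another writer can slip in.
(n,) = conn.execute("SELECT n FROM counters WHERE id = 1").fetchone()
conn.execute("UPDATE counters SET n = ? WHERE id = 1", (n + 1,))

# Database-side style: one atomic statement, no stale local state.
conn.execute("UPDATE counters SET n = n + 1 WHERE id = 1")

(n,) = conn.execute("SELECT n FROM counters WHERE id = 1").fetchone()
assert n == 12
```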

~~~
Scarblac
ORMs do that fine; in Django's ORM, for instance, it'd be

    MyModel.objects.filter(pk=...).update(myfield=F('myfield') + 1)

~~~
rubber_duck
This is not obvious unless you think about it - I've seen issues like this
because the code looks correct if you ignore the update semantics, especially
when it won't fail on your local dev env, because the in-memory state will be
the same as the DB state since you're the only one hitting it...

Having to think about these things because the abstraction is leaky, instead
of just using SQL where the transformation is explicit, is not worth the
hassle.

Using Clojure and yesql was a real pleasure for this. Clojure is immutable, so
you're working with values all over the place, and working with a SQL query is
just passing values in and getting values out - no magic mapping, just values.
If you want the latest value, query again; if you want to store a new value,
send the new value. No mutation, no magic, just data.

------
mumblemumble
So, I've seen this go back and forth so many times, and I've come to the
conclusion that the two extremes are irreconcilable. Not because the people
holding them are incorrigible, but because they represent two very different
ways of using a database.

For some, the database is not actually a database, per se, it's just a place
to persist state. Usually folks in this camp think in an object-oriented way.
They haven't read Codd's paper, they don't care about the relational algebra,
they just want some way to take bags of properties and stick them somewhere
that offers random access with good performance and is reasonably unlikely to
spontaneously erase data. If you're in this camp, ORM works great, because
there isn't really much of an object/relational impedance situation in the
first place. The data store wasn't structured in a way that creates one,
there's no need to burn a bunch of effort on doing so, and dragging in the whole
relational way of doing things is like dragging a wooden rowboat on a hike
through the forest because you'd like to use it to go fishing on the off
chance you find a nice pond.

Others see a lot of value in the relational model. They're willing to put a
bunch of time and effort into structuring the data and organizing the indexes
in ways that will allow you to answer complex, possibly unanticipated
questions quickly and efficiently. They, too, hate the syntax for recursive
common table expressions, but are willing to hold their nose and use them
anyway, for various reasons, but mostly because people think you're really
cool when you can spend 30 minutes and make something go 2-3 orders of
magnitude faster than it used to. They don't think of the data in terms of
bags of properties, they think of it in terms of tuples and relations and a
calculus for rearranging them in interesting and useful ways. For them, there
is potentially a _huge_ problem with object/relational impedance; the data's
organized in ways that just don't fit cleanly into Beans.

The thing is, neither of these ways of using a database is wrong. Each has its
strengths and weaknesses. The trick is figuring out which way fits your
business needs. Well, that's the easy trick. The hard trick comes when someone
on your team doesn't understand this, believes there is one universal solution
that will work for everyone, and is hellbent on jamming the One Righteous and
Holy Peg, which happens to be square, into a round hole.

~~~
Scarblac
I know about the relational model and used to work with raw SQL for years.

But this is the stack I often work with now:

- Postgres

- Django ORM

- Django REST Framework

- Auto-generated OpenAPI Typescript functions

- A Redux store

- My React app

In other words, _four ORM-like non-relational frameworks stacked on top of
each other_ between Postgres and my app.

A generic solution for "complex, possibly unanticipated questions" would need
to work through all of those layers.

~~~
sul4bh
What do you use for "Auto-generated OpenAPI Typescript functions" ?

~~~
Scarblac
Django-rest-swagger to have a Swagger / OpenAPI spec on top of Django REST
Framework, and then OpenAPITools/openapi-generator to generate Typescript
code.

------
dyeje
You'll have to pry ActiveRecord from my cold, dead hands. I'll take it any day
over the half baked quasi-ORM that your company has inevitably made over the
years because an early engineer was anti-ORM and it turns out doing raw SQL
all the time is tedious.

~~~
rjbwork
At my last position we simply did sprocs (stored procedures) and Dapper. When
you take it as true that the database is the source of truth, it only makes
sense to put the queries in the database. The only exception is dynamic
queries or modifications.

~~~
johnnyb9
How do you avoid leaking business logic into database stored procedures?
Sounds like you would now have two sources of truth, application logic, and
database stored procedures.

~~~
paragraft
I'd like to see someone have a decent go at reexamining the idea that business
logic (BL) doesn't belong in the DB and tease apart exactly what that should
mean, if it's actually still true, or even if it's ever actually been true.
Maybe something like that's been posted here before, but I haven't seen it...

I bought into the idea myself for a while, but when I interrogate my belief
there it was just something I picked up at uni as part of the general 3-tier
approach, and I'm not sure how much of it is really practically grounded. I
see the same ideas held dogmatically by newer employees, and when I ask them
about it in detail, I see the same fuzziness.

After all, if you want to really be pedantic, then you could claim anything
beyond having a single table with two columns, "key" and "value" is pushing BL
into the DB.

I.e, what practically does the separation gain you? You may want to swap out
the DB in future? Almost never happens, not without some significant other
refactoring going on (e.g breaking out into separate DBs because the product's
grown to the point of needing distributing across services, etc). If you want
to really design with that in mind, you're almost certainly giving up a lot of
functionality that the particular DB is going to give you. It's on the level
of holding onto the possibility that you'll want to change the language you're
writing in at some stage (and _that_ event would favour pushing more into the
DB anyway).

In reality, what I've found is when there's a reliable, transactional,
relational datastore at the bottom (i.e we're not talking about a larger
distributed system), then for your own sanity you want to push as much as
possible down the stack, to reduce the surface area that
someone could reach through to change data in the wrong way. Data consistency
issues are some of the worst when it comes to eroding customer trust, and if
your data access/mutation paths are many and varied then you can do your head
in or burn through a lot of dev morale trying to diagnose them.

The strongest advocates otherwise I've found are those in office cultures
where there's little trust between devs and DBAs, and getting things through
DB review is a rigmarole that ends up driving devs towards doing as much in
code as possible. I've always suspected that beyond Google envy, these sorts
of dynamics are what drove a lot of the initial NoSQL movement...

~~~
taffer
Some interesting posts/articles on the topic:

[https://www.vertabelo.com/blog/business-logic-in-the-database-yes-or-no-it-depends/](https://www.vertabelo.com/blog/business-logic-in-the-database-yes-or-no-it-depends/)

[https://sivers.org/pg](https://sivers.org/pg)

[https://www.martinfowler.com/articles/dblogic.html](https://www.martinfowler.com/articles/dblogic.html)

~~~
paragraft
Thanks, the Derek Sivers piece rings a bell now (and maybe was going into my
thinking on this), and the other two will go into my notes to avoid having to
repeat myself the next time this one inevitably comes up on the office
Slack...

------
mattigames
So you know one guy that quit over having an ORM instead of using SQL... do
you have any idea how many quit due to having to deal with thousands of lines
of SQL string concatenations because some genius who also hates ORMs decided
to do a giant project that way?

Then you call your alternative anti-ORM and it generates... ORM specs, and
then you say "migration commands based on the git history"... what does that
even mean? What does git history have to do with the database?

~~~
alexandercrohde
So I want to respond to this because there is a genuine question behind your
rhetoric. And that question is "How do I cleanly do complex things in SQL
without an ORM?"

And there is an answer to that. The idealistic answer is good DB design from
the start. The realistic answer is by using views (sometimes materialized
views) to create a clean representation out of odd tables.

The monster query can be decomposed into simple and clean operations against
these views (and if necessary, views upon views).

The only caveat is you need a SQL expert to understand the performance if you
want to do this at scale.
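
A minimal illustration of that decomposition with the stdlib sqlite3 module (schema and view names invented for the example): the "big spenders" question becomes a trivial filter over a view, and the view can itself be layered on another view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total) VALUES
        ('alice', 120.0), ('alice', 30.0), ('bob', 10.0);

    -- First view: a clean per-customer representation over a raw table.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(total) AS lifetime_total
        FROM orders GROUP BY customer;

    -- A view on the view: the "monster query" becomes a simple filter.
    CREATE VIEW big_spenders AS
        SELECT customer FROM customer_totals WHERE lifetime_total > 100;
""")
rows = conn.execute("SELECT customer FROM big_spenders").fetchall()
assert rows == [("alice",)]
```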

~~~
zzzeek
I wrote SQLAlchemy after a five year job where all we did was work with
enormous views in an Oracle database. Their performance was nightmarish. The
slightest attempt to query from one of these views using something as
heretical as a join would kill the database. The source code for the views was
locked up tight by DBAs who had no idea how our application worked, didn't
care, and they could never be bothered to help us with their horrendously
inefficient views or to give us new ones that our application sorely needed in
order to get data from the database in an efficient manner.

SQLAlchemy exists exactly because of the pain the "just use views" approach
causes. The entire point is that you can structure a query of arbitrary
complexity in a succinct manner using Python structures, while not losing any
of the relational capabilities. Of course you can write queries that are too
complex, if you're then lucky enough to be able to have materialized views at
your disposal, you can turn the SQL you wrote into one. But that's an
optimization you can use if you need it, or not.

~~~
alexandercrohde
On the one hand the article itself is offering an anecdote about being unhappy
with ORMs. On the other hand you have an experience in which you were unhappy
with views.

That would leave us at an impasse (he said / she said), unless we get more
nuanced. So let's.

It sounds like some of your frustration is at some DBAs you knew. Firstly,
I've never worked with a DBA; I'm just a senior software engineer / DevOps,
and to me that includes an intimate knowledge of SQL performance, so maybe
this is why I don't mind an in-database approach.

I'm fairly sure that whatever SQLAlchemy can do with a monstrously complex
data relationship, views can do with better performance (the further the
abstraction gets from the engine, the less performance optimization it can
do).

Now maybe you agree with that, and see ORMs as a tool for the kind of job
where the dev is locked out of the database. If so then I can't really
disagree as I haven't worked at such places much at all.

~~~
balfirevic
> I'm fairly sure that whatever SQLAlchemy can do with a monstrously complex
> data relationship, views can do with better performance (the further the
> abstraction gets from the engine, the less performance optimization it can
> do).

SQL Alchemy will spit out SQL query. Whatever the DB engine can do with
queries hand-written on top of views, it can also do it with the query
generated by SQL Alchemy.

~~~
alexandercrohde
> Whatever the DB engine can do with queries hand-written on top of views, it
> can also do it with the query generated by SQL Alchemy.

No, on two levels. Firstly, there are many features in SQL that an ORM will
not support (e.g. no substitute for a materialized view).

Secondly, once you get really good at SQL, you learn that very tiny seeming
things can make the difference between a query running for an hour or a second
(e.g. case-insensitive search against an indexed column). Part of being expert
in DB technologies is knowing how/why some ways of writing a query are very
fast, and others are very slow. The idea of an ORM is to hide that complexity,
which is fine for a toy todo-mvc app. But once you start querying tables with
100k rows you need to understand that complexity and master it if you want to
write a fast query.
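
The case-insensitive-search example is easy to demonstrate: wrapping an indexed column in a function silently turns an index lookup into a full scan, and an expression index restores the fast path. A sqlite3 sketch of the idea (Postgres behaves analogously with an index on `lower(name)`; requires a SQLite build with expression-index support, 3.9+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")

def plan(sql):
    # EXPLAIN QUERY PLAN shows whether the optimizer scans or seeks.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(r) for r in rows)

# Wrapping the indexed column in a function defeats idx_users_name:
# the engine must scan every row.
slow = plan("SELECT * FROM users WHERE lower(name) = 'alice'")
assert "SCAN" in slow

# An expression index on lower(name) makes the same query a seek again.
conn.execute("CREATE INDEX idx_users_name_ci ON users (lower(name))")
fast = plan("SELECT * FROM users WHERE lower(name) = 'alice'")
assert "SEARCH" in fast
```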

~~~
zzzeek
> Secondly, once you get really good at SQL, you learn that very tiny seeming
> things can make the difference between a query running for an hour or a
> second (e.g. case-insensitive search against an indexed column).

SQLAlchemy supports case-insensitive searches, per-column / per-expression
collation settings, and index and other SQL hint formats for all databases
that support them; in short, SQLAlchemy's query language supports most
optimizing SQL syntaxes, with the possible exception of very esoteric /
outdated Oracle things like which table you list first in the FROM clause
(use index hints instead). We are adding esoteric performance features all
the time to support not just SQL-level tricks but driver-level tricks too,
which are usually much more consequential, such as the typing information
applied to bound parameters matching up for indexes, as well as special
driver-level APIs to improve the speed of bulk operations.

SQLAlchemy has been around for thirteen years, plenty of SQL experts have come
along and requested these features and we implement them all. Feel free to
come up with examples of SQL-expert level performance optimizations that
SQLAlchemy doesn't support and we'll look into them.

~~~
alexandercrohde
Yeah, don't get me wrong, I haven't used SQLAlchemy and I'm not trying to
besmirch it. I'm not even sure we disagree.

What I'm trying to challenge is the philosophy that an ORM is an adequate
facade in place of learning how databases work. My point about views is that
in my experience, the answer to disgusting queries is cleaning up the DB
design (and views can be the tool to accomplish this). My point about case-
insensitive searching (you can create a case-insensitive index if you want) is
that a lot of db-performance stuff just can't be solved on the query side
alone anyways.

It sounds like SqlAlchemy is designed with a lot of flexibility around how
queries are run, so maybe you agree that understanding what's going on beneath
the hood is important to handle these complicated cases.

~~~
zzzeek
> What I'm trying to challenge is the philosophy that an ORM is an adequate
> facade in place of learning how databases work.

Thank you for this response and I agree, we are likely on the same page. I
don't know that anyone actually espouses that philosophy. This is the anxiety
that ORM skeptics have, and certainly you get beginners who rush into using
ORMs not knowing what they are doing, but if someone wants to be a DB expert,
I'm pretty sure they go read about databases :) These ORM beginners will
fail either with the ORM or without it. I guess the idea is you'd prefer they
"fail fast", i.e. the ORM covers for them while they proceed to screw up more
deeply? This is arguable; if you've seen much of the non-ORM raw DB code I've
seen, it too fails pretty hard. But even with this argument, if ORMs produce
the problem of incompetents who are not found out fast enough, why deny
useful tools to those developers who do know what they're doing?

------
peteforde
I know that the author of SQLAlchemy is reading these comments and I don't
want to be mean, but anecdotally I hear a lot of pain stories and the OP seems
to largely deserve the title "Don't Use SQLAlchemy".

Nobody suggests that ActiveRecord is perfect, but it is pretty amazing at
what it does. I think a big part of the reason it's more reliably excellent
is that Rails in general embraces the notion of "convention over
configuration": there's typically a correct naming scheme (with opinionated
pluralization) that starts in the table schema and flows straight through the
models to the controllers and finally the view layer. I can join your Rails
project and make guesses about where your logical components are located and
what they are called that would beat any TV psychic cold, 100:1.

Plus, much like with SQLAlchemy and basically every other modern ORM... you
are always one method away from writing raw SQL if you want to be.

As always: use the tools that work best for you and for the vast majority of
web development cases, you'd have to pry ActiveRecord out of my hands.

~~~
fizx
I use SQLAlchemy to serve tens of thousands of requests per second on one of
my services at Reddit. It's painful, but possible.

SQLAlchemy uses the data mapper + unit of work patterns. This means that your
connections and your record objects have the same lifecycle. If your request
takes 200ms, then you are likely holding open your connections for 190ms, even
though the only SQL query takes 20ms. If connections are scarce, and they
usually are, you will not utilize them well at all.

There are two ways that I know of to solve this, neither of which is pretty:
(1) run pgbouncer as a sidecar on your python processes. This has the effect
of making connections a far cheaper resource. (2) After every single query,
manually close the related transaction/connection/cursor in your code.

ActiveRecord defaults to giving up the connection after every query, which in
a high-traffic webapp, is much less of a footgun.
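
The difference in connection lifecycle can be sketched with a toy pool: the point is that the connection is checked out only around the query itself, not for the whole request. (Names and the pool itself are illustrative, not SQLAlchemy or ActiveRecord API; real deployments reach for pgbouncer or the framework's pooling.)

```python
import contextlib
import queue
import sqlite3

class Pool:
    """Toy connection pool, for illustration only."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(sqlite3.connect(":memory:"))

    @contextlib.contextmanager
    def connection(self):
        conn = self._q.get()   # checkout blocks if the pool is exhausted
        try:
            yield conn
        finally:
            self._q.put(conn)  # return it the moment we're done

pool = Pool(size=1)

def handle_request(pool):
    # Unit-of-work style would hold the connection for the whole ~200ms
    # request; here it's borrowed only for the ~20ms the query needs.
    with pool.connection() as conn:
        value = conn.execute("SELECT 1").fetchone()[0]
    # ...template rendering etc. happens with no connection held...
    return value

assert handle_request(pool) == 1
assert handle_request(pool) == 1  # the single connection was free again
```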

~~~
traviscj
(No significant experience with either SQLAlchemy or ActiveRecord, but work in
Java at a shop that really strongly emphasizes the unit-of-work pattern.)

"give up connection after every query" is, IMO, not _less_ of a footgun, but a
_more subtle_ footgun. That is to say, sure, for the most part, it will work
ok.

_But when it doesn't_... like when your database goes down for a couple hours,
or your rack loses network, or your proxy to whatever cloud provider is
running your database, or maybe some other dependency had an outage between
the queries... you end up not really knowing what executed and what didn't,
with some fraction of your 10k req/sec leaving behind some dangling state -- a
partially processed $whatsit -- and no clear/easy way to fix it or even
necessarily identify how many $whatsits were affected. At least a few times,
I've needed to either wait until records make their way into some OLAP system
for offline analysis, or deploy some fixup code, or emulate either/both to fix
stuff up after something weird happens.

At least that's what I am constantly afraid of. Do you also worry about this
kind of stuff? If not, what prevents it from being a problem in your
environment?

Signed,

The guy constantly badgering seemingly everyone about minding their
transaction boundaries.

~~~
jayd16
But he's not saying "don't use transactions." He's saying "Don't keep your
transaction open for your entire user connection for no reason. Close your
completed _transactions_ as soon as you can."

~~~
traviscj
Great point, thanks!

Fun story: a coworker and I are in the process of tracking down exactly such
violations in a misbehaving app, due to an accumulation of RPCs within
transactions across the entire app. (I used to be responsible for it as a
greenhorn ~5 years ago, but walked away for the last several years to work in
a different part of the org.)

The approach is actually kinda interesting: we add an interceptor in the RPC
layer for every outgoing call that asks the DB machinery “am I currently in a
transaction?”, and emits a warning (containing at least the RPC being made,
perhaps a (partial) stack trace) if so. We had so many occurrences of those
warnings that our logging framework proactively started dropping log lines! So
the next step is to only emit the warnings once per $duration per message.
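
A stripped-down version of that interceptor idea, using the stdlib sqlite3 module's `in_transaction` flag as the "am I currently in a transaction?" check (the RPC layer here is just a stand-in function):

```python
import sqlite3
import warnings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()  # make sure we start in autocommit mode

def rpc_call(name):
    """Stand-in for an outgoing RPC; warns if issued mid-transaction."""
    if conn.in_transaction:
        warnings.warn(f"RPC {name!r} issued inside an open DB transaction")
    return "ok"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rpc_call("billing.charge")                # no transaction open: silent
    conn.execute("INSERT INTO t VALUES (1)")  # implicitly opens one
    rpc_call("billing.charge")                # this one gets flagged
    conn.commit()

assert len(caught) == 1
```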

Another day in the life...

------
mianos
For simple queries, sure SQL is great but it is not composable. Ideally you
want to send your whole query to the database at once and for a complex
statement that may mean many many lines of reasonably complex joins etc. With
Sqlalchmey you can write a bunch of simple components and join them together
for submission at once instead of having a thousand lines of SQL that can be
hard to read. You can re-use parts and even come back later and understand
clearly what is happening. How do I know this? 25 years of writing SQL and at
leat 6 using SQLAlchemy. A lot of the time I will still compose and test in
SQL using, say DataGrip or psql, then build an SQLAlchemy query from that.
This guy should look at alembic too. It has it's faults (like the half module
half executable runtime), but it solves most of the problems of just having a
folder with a million naively numbered SQL scripts like managing up and down
versions as well as branches.

------
reilly3000
SQLAlchemy can accept raw SQL with:

    result = db.engine.execute("<sql here>")

I haven't met an ORM that wasn't more work to implement, especially when
trying to keep schema changes in line with existing deployments. That said,
I'll take a decent DB client SDK any day; I don't want to spend too many
cycles on connection pooling and such. I just want it to handle variable
substitution and concurrency primitives at the language level.

I don't think there's anything wrong with using tools to help you generate
good SQL queries, but you should see them clearly and own what the database is
doing. MySQL and Postgres have a myriad of useful functions and extensions
that ORMs don't utilize or hide from the developer. That leaves the
application to do multiple queries or a lot of logic that isn't needed.

It doesn't have to be gnarly; any SQL query that can't be handwritten on an
index card is probably too cumbersome to maintain. A 1000 line SQL query is a
code smell that means the data model could probably be improved, perhaps with
views.

------
bb88
As someone who's used both SQL and the Django ORM, I could ask: "Why is SQL
so backward?"

Here's a datapoint I experienced:

The company I worked for didn't trust the Django ORM for DB migrations. They
thought they could do it by hand. After lots of deployment failures where the
migration script worked in the dev environment but failed in the prod
environment, we switched to using django migrations. Haven't had an issue
since.

The author even admits that migrations are a hard problem, and yet he has no
solutions other than just using raw SQL.

Edited to Add:

In Django, these migrations are handled pretty well with a single command to
make the migrations. In SQL, someone would probably have to spend some time
testing them, and then the change would need to be reviewed in the PR.

~~~
rectangletangle
In my experience Django's migrations are really solid, assuming you're using
the PG or SQLite backends. This is especially true if you're working from a
new Django project, and not a "legacy system" that wasn't initially built for
Django.

~~~
Hamuko
They seem pretty solid for MySQL as well. I only remember having some issues
with VARCHAR PKs, but not even sure whose fault that was.

------
pg_bot
Every project I've seen that refused to use an ORM in favor of its own home-
grown solution was worse off for doing so. Almost every time someone claimed
that the ORM couldn't be used, I would point out how it could, and that code
became simpler and easier to maintain. If they had read the documentation and
used a bit of creativity instead of fighting against their tools, they
wouldn't have been left with such a mess.

This post reads to me as the author not having the proper ability to
structure and plan their projects, so they blame their tools instead of
themselves. Instead of understanding limitations and tradeoffs, they try to
build their own magic lamp that solves every problem under the sun.

~~~
aforwardslash
My experience is actually the opposite - every project I worked on that
relied on an ORM actually benefited from replacing it with a clear
architecture pattern and a query builder. Having a concrete notion of what is
being passed to the DB allows easier debugging, and it is way easier to solve
performance bottlenecks. As an example, recently I replaced several hundred
lines of complex ORM-oriented code with a 10-line CTE that performed the
iteration and calculation previously done on the client side - the result was
a 10x increase in performance with a fraction of the memory.

By not using an ORM, the developer also stops treating the database as a
"black box". It is a "black box" conceptually, but the more you understand
how it works, the better your applications will perform. As an example:
knowing whether a read you perform inside a transaction comes from the server
where you started the transaction or from some replica that may or may not be
up to date (due to propagation delay) may seem picky, but it's one of those
things that are very hard to debug afterwards as a transient error.

~~~
pg_bot
My ORM (and probably yours) handles CTEs ;)

~~~
aforwardslash
Yes and no; it seems it's an external module. And this is part of the problem:
why would I need an external DSL extension to translate a pure relational
concept into SQL? Debugging the query will require SQL knowledge regardless of
the ORM. In fact, the ORM is another layer to be debugged; it just adds
complexity.

------
sergioisidoro
I find these ORM opinions generally come from someone who is used to, and
proficient in, SQL, and only when they have a complex use case or operation at
hand.

If you consider a web project, with many developers of different levels of
experience, having an ORM is a life saver. Because not everyone is constantly
mindful of SQL injections. Because not everyone is constantly mindful of
transaction management in requests. Those take experience.

In my experience ORMs are great 95% of the time. Those 5% can be extremely
painful (I've gotten bitten by ORM Rails transaction nesting problems, and
sometimes it frustrates me when I need to make something more complex). But
let's not throw away the baby with the bath water.

I have had great experiences with Django ORM and SQL Alchemy. Rails ORM goes a
long way, but it feels eons behind Django's.

~~~
piva00
Code reviews are for that. I've taught countless junior developers how to
write better SQL through code reviews. I've introduced them to mitigations for
the common exploits and shown how powerful SQL can be when needed, and I
believe they are better engineers because of it.

I really don't like using a tool as a crutch for a systemic deficiency of a
team; in the long run you are only compounding the problem. The more complex
the system gets, the more workarounds and gotchas you have with ORMs, so it's
better to educate developers not to depend on them from the get-go.

Yup, ORMs have their place in some projects, but in the majority of those I've
worked on they brought more pain and confusion than just properly effing
learning SQL.

~~~
sergioisidoro
> I really don't like using a tool as a crutch to a systemic deficiency of a
> team,

Very valid point, but then this becomes a discussion of how much you want to
create mitigations in the system vs. educating developers to topics of, for
example security. The larger a team becomes, the harder it is to police that
every single query in your system is sanitized.

Still on the security side: software architecture tells us not to repeat
ourselves, and if sanitizing query strings can be done centrally through an
ORM, then why shift the responsibility to code review processes that are
especially prone to failure due to their human nature?
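The central-sanitization point can be made concrete: the parameter binding an
ORM performs for you is the same mechanism any DBAPI driver exposes directly.
A minimal sketch with Python's stdlib sqlite3, using an illustrative table:

```python
import sqlite3

# The driver binds values separately from the SQL text, so user input is
# never spliced into the query string. Table and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Even a hostile value is stored verbatim, not executed as SQL.
evil = "x'; DROP TABLE users; --"
conn.execute("INSERT INTO users (name) VALUES (?)", (evil,))

rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
print(rows)  # [('alice',), ("x'; DROP TABLE users; --",)]
```

The ORM's value here is making this the path of least resistance, rather than
something each developer must remember to do.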

As you said, these kind of become a crutch to people who don't know SQL.
People begin to rely on those instead of knowing the underlying mechanisms --
but hey, that's the price of any abstraction. Probably this same argument was
used when high level language compilers (analogous to the ORM in this
discussion) were introduced.

~~~
piva00
It will boil down to: how much more complexity having leaky abstractions such
as ORM adds to the project compared to teaching others to not make the same
mistakes all over the codebase with raw SQL.

As with anything related to software design and architecture, it depends,
depends on the size of the project, how many teams work on the same codebase,
etc. I don't think using an ORM as the sanitisation layer of your SQL queries
to be a good enough feature compared to its drawbacks on complexity and
abstraction leakage. The object model doesn't serve relational data very well,
that is already a first sign and smell for me that ORM should be used in very
specific cases instead of as the default go-to solution.

Having to deal later in the application lifecycle with partial entities and
the like (anything to do with partial data access compared to the models'
defined fields) is really painful when done improperly, which is easy to do,
and then we are back at square one, having to teach developers how to use an
ORM framework properly. Also, one thing I really dislike (and that I believe
will always increase a system's complexity) is making code less explicit: an
ORM layer makes your code flow less accessible. You need to know where the
layer takes over and where it leaves you to implement more complex use cases,
and then you are back to writing SQL queries manually or depending on some set
of features from the library implementing the specific use case you are
looking for.

I don't hate ORMs, but mea culpa: I have used them improperly and learned from
my past experiences, and nowadays it's really hard for me to justify their use
aside from very simple data access patterns. I say that because in the end
you'll always need to reason about what the ORM layer is doing in terms of SQL
concepts, and translating between those layers in my head adds another
complexity: cognitive load.

------
mindcrime
You'll have to pry Hibernate and Spring Data JPA from my cold, dead hands.
Extending CrudRepository and getting a ready-to-use repo for an entity type,
combined with annotations for custom JPQL queries? Yeah, I'll take that over
any hand-rolled, low-level SQL interfacing for _almost_ anything.

But, as always, "use the right tool for the job." I have gotten tremendous
value from ORM's over the years, but I'm not going to insist on using one for
everything just "because".

~~~
fixedsys
Ha. Funny, but I'm the exact opposite. I had to use Hibernate and Spring once;
trying to debug with 45 XML config files scarred me for life. That, coupled
with MySQL (which even Ellison will tell you is a toy), meant I saw the
writing on the wall and quit the job a couple of months before that whole team
got fired and the company had no choice but to throw the code in the bin. My
view is that ORMs can never compete with a proper enterprise-scale RDBMS and
people who know how to use it.

~~~
twic
JPA/Hibernate has moved on a _lot_ from the church-of-XML days. The basics are
a lot smoother, with annotations and sensible defaults and so on, and there
are some pretty nice additions.

My favourite neglected feature is JPQL with constructor expressions, which
lets you write queries in an SQL-like language, but in terms of objects rather
than tables, which is slightly easier, and materialise results directly into
objects of your choice.

~~~
fixedsys
I don't see the point of using an SQL-like language instead of actual SQL
unless you need to support multiple different RDBMSs.

~~~
steve_taylor
One of the nicer JPQL features is being able to navigate the Java object model
and have that translated to the equivalent joins in SQL.

However, there are things that JPQL can’t do, but it’s easy enough to create a
native SQL query and have its results map to JPA entities. I use Spring Boot
(JPA, Spring Data, Spring Data REST) for the sheer convenience of it and speed
of development, not to abstract away the database, which is always PostgreSQL.
I am very comfortable writing SQL queries and making use of PostgreSQL-
specific features and Spring Boot with JPA certainly lets me do that when I
want to.

------
trixie_
At my company we use C#/EF with 50 developers and a gigantic codebase and
database.

A statically typed binding to the database with 'find all references' etc.
The ability to write a few lines of simple, understandable code that turns
into a much more verbose SQL query aids in the readability, maintainability,
and understandability of the code base.

I have used other ORMs and feel that EF/LINQ is way simpler in that the same
query syntax is used for in-memory and database operations. They can be mixed
and matched for doing really intensive data processing with relatively few
lines of very easy to read code.

My only comment would be that using an ORM - you do need to understand how it
works, and how the code you write turns into SQL and a lot of the details of
contexts and tracking.

It's like flying a plane - big, complicated, it will get you where you need to
go very fast and effectively, but if you don't know what you're doing you'll
crash right into a mountainside.

~~~
fixedsys
IMO, SQL is a domain specific language that is far from verbose. If you need
to understand or predict the SQL and let that influence your ORM code, then
just use SQL.

~~~
olmo
If you're using a dynamic language, an ORM doesn't provide much. But in C#,
LINQ has the benefit of being strongly typed, which means your DB schema can
be evolved. If you write raw SQL (or Python) and you rename a table/entity or
column/property, you're going to have a bad time. In C# you can just use a
refactoring or, in the worst case, get compile-time errors.

------
imtringued
Insert queries are tedious. Especially if you are using named parameters.

    
    
        Foo.withSession { session ->
            session.createSQLQuery("INSERT INTO FOO (bar, baz) VALUES (:bar, :baz)")
                   .setParameter("bar", "bar")
                   .setParameter("baz", "baz")
                   .executeUpdate()
        }
    

vs

    
    
        new Foo(bar: "bar", baz:"baz").save()
    

I don't need more justification to use an ORM.
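The same contrast can be sketched in Python with the stdlib alone; the `save`
helper below is a hypothetical ten-liner, not any particular ORM's API:

```python
import sqlite3

# Hedged sketch: a tiny generic insert helper recovers most of the
# `new Foo(bar: ...).save()` convenience without a full ORM. Column names
# come from code (keyword arguments), never from user input; the values
# themselves always go through named placeholders.
def save(conn, table, **fields):
    cols = ", ".join(fields)
    params = ", ".join(f":{c}" for c in fields)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({params})", fields)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar TEXT, baz TEXT)")
save(conn, "foo", bar="bar", baz="baz")
print(conn.execute("SELECT bar, baz FROM foo").fetchone())  # ('bar', 'baz')
```

Whether this counts as "an ORM" or just a convenience wrapper is arguably the
whole debate in miniature.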

~~~
kangoo1707
Exactly, I don't get why people are against it. All ORMs allow you to write
raw queries, and using an ORM does not mean that you suck at SQL, or vice
versa.

~~~
cburgas
I have seen a lot of people in interviews who have failed SQL exercises and
said that they are rusty because they have been using ORMs for too long.

~~~
icebraining
I'm also rusty on my assembler because I generally use high-level languages
and compilers/interpreters; that's what abstractions do, and it's not really a
reason to avoid them, since people can generally still solve the problem. It
just takes a bit more time (less than the abstraction saves most of the
time).

~~~
sjwright
But assembler is the wrong analogy for SQL. SQL isn't a lower level version of
procedural code, it's a domain specific language.

When you write a technical paper in English, you switch to math equations when
appropriate. Math isn't assembly language; it's the correct language. It's the
_native_ language.

Yes, you can always describe your equations with English prose—and if you're
doing simple addition and subtraction, it's probably nicer to write that in
English. But once you want to do anything remotely complex, writing it as a
math equation is more terse and less ambiguous.

~~~
icebraining
> Yes, you can always describe your equations with English prose—and if you're
> doing simple addition and subtraction, it's probably nicer to write that in
> English. But once you want to do anything remotely complex, writing it as a
> math equation is more terse and less ambiguous.

Right! But 99% of the time, I do _not_ want to do something complex - I just
want to load a few rows based on simple search parameters, and save changed
values (which may involve heavy data processing, but not relational). Hence I
use SQL only rarely, and therefore get rusty.

~~~
sjwright
If you're only doing basic CRUD operations, then fair enough.

That said, based on my experience analysing applications, "loading a few rows"
to perform "heavy data processing" and then "save a changed value" smells
suspiciously like "the entire task can be rewritten as a single, moderate
complexity query and the data never even had to leave the database server."

I'm not saying that's the case in your own generic hypothetical, as there are
indeed forms of business logic and advanced manipulations that exceed the
scope of a database server. But you might be amazed at what you can do within
the SQL language. For example if you aren't intimately familiar with window
functions like DENSE_RANK() then do yourself a favour and learn about
those—and then contemplate how you could use them within a subquery joined to
a table that you're updating.
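For readers who haven't met window functions, here is a minimal illustration
of DENSE_RANK() using Python's bundled SQLite (which supports window functions
from SQLite 3.25 onward); the table and data are made up:

```python
import sqlite3

# DENSE_RANK() assigns the same rank to ties without leaving gaps:
# two players tied at 10 points both get rank 1, the next gets rank 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, pts INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("a", 10), ("b", 10), ("c", 7)])
rows = conn.execute("""
    SELECT player, DENSE_RANK() OVER (ORDER BY pts DESC) AS rnk
    FROM scores ORDER BY rnk, player
""").fetchall()
print(rows)  # [('a', 1), ('b', 1), ('c', 2)]
```

The same expression can sit inside a subquery joined to a table being updated,
which is the pattern the comment above alludes to.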

~~~
icebraining
Well, my current "heavy data processing" is generating a 3D render from some
metadata stored in the database. Postgres is amazing, but I don't think it can
do that yet :)

My experience over my career has been this: there's CRUD, lots of IO, and a
bunch of data processing that just needs specialized software (image, video,
weather simulation, etc). I try to offload what I can to the database - and
yeah, I know about window functions - but other than for the occasional
report, it just doesn't move the needle.

~~~
sjwright
Fair enough. It looks like we use databases in rather different ways.

------
syntheticcdo
The sweet spot for me is a query builder: a tool to generate SQL and treat
queries like code, and not get in the way like a heavy ORM. Shout out to
[http://knexjs.org/](http://knexjs.org/) for Node.

~~~
erikpukinskis
I’m curious if people are using raw Arel in Rails apps to this end.

~~~
jrochkind1
I don't use "raw Arel", I use most of the parts of ActiveRecord -- but I
definitely _appreciate_ that about Arel, and use that aspect of Arel
intentionally. Composable query parts made of code.

I think it's a mistake to think you can't write good efficient SQL for well-
normalized schemas with AR. You can. Usually anyway, for a great many use
cases.

[Arel is _really nice_, but sadly Rails absorbed it and considers it "private
API", especially in its most sophisticated features. I use them anyway; they
generally don't break... but if I wasn't using ActiveRecord as a whole, I'd
probably use Sequel instead.]

~~~
peteforde
I work with Rails almost every day and I'm still quite fuzzy on where Arel
ends and ActiveRecord begins. It's an area of ambiguity where a technical post
on the history and current status would be appreciated by many.

~~~
jrochkind1
For the most part, the stuff you do when constructing a query is Arel. Where
you chain methods and such.

When it was first added to Rails, improving the query-building possibilities,
it was a separate gem that Rails depended on. I'm not sure if it was written
specifically for Rails and just maintained in a separate gem, or intended by
its original writers to be an independent project. But it was later absorbed
into the Rails repo, and we were later told by Rails maintainers that its more
sophisticated manual APIs (look up how to make a subquery with Arel, for
instance) were not intended as public API and had no backwards-compat
commitment.

------
maxk42
ORMs are amazingly useful in the limited case that you're performing basic
CRUD operations or simple joins.

In any sufficiently complex codebase you'll be doing more than this, however,
and the work to support the ORM properly then outweighs the work of simply
coding the raw SQL and preparing statements as appropriate.

Raw SQL also tends to be much more performant.

~~~
robomc
People hugely exaggerate this, IMO. If you're writing a lot of queries that
can't be composed, with sufficient performance, in ActiveRecord, but can be in
raw SQL, then you've probably done something harebrained elsewhere.

And for those special (and IMO pretty rare) occasions, you can drop down to
arel or raw SQL anyway. Why throw away the consistency and readability of
something like AR for edge cases, when you can just treat your edge case as an
edge case with raw sql and still keep AR for your other 95% of queries.

~~~
sjwright
> you've probably done something harebrained elsewhere.

That's not my experience.

I maintain a relatively unremarkable but bespoke online discussion forum,
which has hundreds of queries, few of which could be composed by an ORM, let
alone composed and run performantly. The _median_ complexity query in my code
base probably has two or three joins, two or three subqueries, and some kind
of aggregation or window function.

The result is a typical page runs around two or three queries total—one query
to authenticate the user and load everything about their profile and
permissions, one to load the entirety of the data being output on that page,
and occasionally one to update a statistic somewhere.

(The authentication query runs on every page because there's absolutely no
persistence in the application layer. The authentication query goes three
layers deep in subqueries and includes half a dozen joins. It hits perfect
indexes when it runs and takes only a few msec round trip.)

> People hugely exaggerate this IMO.

In my experience, people who think an ORM can do most things are simply under-
experienced with SQL and set theory.

~~~
icebraining
> The median complexity query in my code base probably has two or three joins,
> two or three subqueries, and some kind of aggregation or window function.

Well, just from that description, Django ORM could do it. Can you post an
example of a median query? I'm curious to see why it can't be ORM'ed.

~~~
sjwright
This is a heavily redacted, completely renamed and summarised version of a
typical page-data query. It's less complicated than the top three most common
queries that run to build the most common pages.

    
    
      SELECT
        b.field, b.field, b.field,
        group_concat(concat(y.field, useful_thing)) as useful_things
      FROM (
        SELECT t.field, f.field, w.field,
          (case when w.wid is null then 0 else 1 end) as has_watched,
          exists(select id from posts p where p.tid = t.tid and p.uid = :uid) as has_posted
        FROM (
            SELECT tid,
              DENSE_RANK() OVER (PARTITION BY foo ORDER BY bar DESC) AS useful_pseudo_id
            FROM editorial_things
            WHERE thing = :thing
            ORDER BY something
            LIMIT number
          ) as a
          INNER JOIN thread as t ON t.tid = a.tid AND t.last > Now()-INTERVAL 2 DAY
          INNER JOIN forum as f ON f.fid = t.fid AND f.fid in (:security)
          LEFT JOIN watched as w ON w.tid = t.tid AND w.uid = :uid
        ORDER BY something
      ) as b
      GROUP BY something;
    

The other thing is that getting objects back just adds complexity and
handling. The language I use has a perfectly nice native, iterable data type
for returned database records (kind of like an array of dictionaries), and I'd
rather just use that instead of an object middleman to satisfy some kind of
OOP completionist fantasy.

~~~
taffer
> The language I use has a perfectly nice native, iterable data type for
> returned database records (kind of like an array of dictionaries)

What language are you using for that?

~~~
sjwright
CFML (i.e. ColdFusion) using the Lucee engine on the JVM. Don't believe the
popular scorn: it's as good a language as any other for web development.
Perhaps not as innovative as newer languages, but it's densely packed with
pragmatic conveniences and (for better or worse) doesn't impose any particular
architectural style upon you.

CFML has a "query" datatype that represents the rows and columns returned from
a database combined with useful metadata and really neat features like n-level
iteration where values repeat. To the programmer it's like a dictionary that
magically changes its values by passing it to an iterator. Without the
iterator it works like a dictionary of the first row. Or if you treat it like
an array you can manually read any column in any row directly.

------
kissgyorgy
I think the opposite is true: SQL is backwards compared to a programming
language. Creating statements by concatenating strings? An ORM gives you real
objects and types you can actually instantiate, use, and even type-check.

~~~
rjmunro
You don't create statements by concatenating strings. Use prepared queries
with placeholders and pass values. Databases will compile the query once, then
can run it many times.

Sometimes you have to concatenate, e.g. to specify ORDER BY clauses. This is
something database vendors need to fix: there need to be ways to tell the
database to modify a prepared statement without changing its SQL. You might
be able to do it with a crazy IIF, but then it's likely not to use indexes
well.

~~~
Can_Not
How would you make a table or view that has:

    
    
        deleted_at TIMESTAMP
        created_at TIMESTAMP
        user_id INTEGER
        body TEXT
    

all _optionally_ filterable (greater than, less than, like, equal, etc.) where
applicable without an ORM, without string concatenation?
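One common answer, sketched here in Python with the stdlib sqlite3 module:
assemble the WHERE clause only from a hard-coded whitelist of column/operator
pairs, and pass every value through a placeholder. The schema and names below
are illustrative:

```python
import sqlite3

# Only these (column, operator) pairs may appear in the SQL skeleton;
# values never touch the query string.
ALLOWED = {("created_at", ">"), ("created_at", "<"),
           ("user_id", "="), ("body", "LIKE")}

def build_filter(filters):
    clauses, params = [], []
    for col, op, value in filters:
        if (col, op) not in ALLOWED:
            raise ValueError(f"filter not allowed: {col} {op}")
        clauses.append(f"{col} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT user_id, body FROM messages WHERE {where}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (deleted_at TEXT, created_at TEXT,"
             " user_id INTEGER, body TEXT)")
conn.execute("INSERT INTO messages VALUES (NULL, '2019-09-01', 1,"
             " 'hello world')")
sql, params = build_filter([("user_id", "=", 1), ("body", "LIKE", "%world%")])
print(conn.execute(sql, params).fetchall())  # [(1, 'hello world')]
```

Strictly speaking the skeleton is still assembled from strings, but only from
trusted fragments; this is essentially what a query builder automates.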

------
kryptonomist
After more than 10 years using those ORMs at work, in DotNet/Java worlds, my
current opinion on that topic:

-The only project where an ORM was worth it was a simple backend admin project with many tables and very little load: the ORM avoids having to write tedious simple CRUD queries.

-For recruiters, it seems simpler to find people with language X than people with language X + SQL. OK, but...

-Understanding the intricacies of an ORM takes a lot of time and effort, probably as much if not more than writing proper SQL.

-An illustration of the previous point: it is hard to know how much data the ORM is really carrying, hence the common situation where half of your DB ends up in memory.

-Developers tend to forget about performance. For instance, in C#, IQueryable objects are used everywhere, meaning that in any layer of your app the query that will be pushed to your server might be modified: it's hard to tune performance in such a context. More generally, this additional layer makes everybody postpone the moment when you have to think about DB performance.

-It prevents you from using advanced DB features (e.g. built-in audit trails and versioning), either because nobody ends up knowing the SQL DB anymore, or because the ORM makes it harder if not impossible.

-Queries sent by the ORM to the DB always seem huge and return more than they should.

-For performance or complicated requests, you always end up adding some SQL to your ORM.

-Because of the previous point, the maintenance advantage of an ORM, namely being able to automatically migrate your DB with your code, is lost: you end up having to maintain some SQL more or less manually.

-Letting the ORM automatically update the SQL DB during migration on real data always makes people sweat: if any problem occurs during that process, welcome to hell. I always feel more comfortable being able to write an SQL fix in the worst case.

-For my side projects, I never use an ORM, and enjoy it.

~~~
romanovcode
For .NET there is a sweet middle-ground called Dapper[0]. SO uses it and they
have great performance.

[0]
[https://github.com/StackExchange/Dapper](https://github.com/StackExchange/Dapper)

~~~
mrlala
Completely agree. I've been using Dapper for a few years now, and it's amazing
because I have control over everything. I created a simple overlay which also
allows me to do some very specific things that you just cannot do with a
traditional ORM (like having dynamic table names).

Anyone wanting to set up something similar just watch Mosh's Repository
Pattern

[https://www.youtube.com/watch?v=rtXpYpZdOzM](https://www.youtube.com/watch?v=rtXpYpZdOzM)

I modified this pattern to use dapper instead of entity framework, but he
already decouples entity framework to the point where you still have control..
so you can do either.

------
iddan
ORMs are solving a problem that really should be solved at the DB layer:
providing a non-bullshit programmatic API. As a developer, I DO NOT CARE
whether the value of my object is saved in a different table or not, I JUST
WANT IT SAVED. A DB that provides a good programmatic API AND analysis
interfaces will be golden, and there have been a few good attempts at getting
there. The ORM problem will be solved when you don't need an ORM to access
your DB.

~~~
quickthrower2
> I JUST WANT IT SAVED

Storing data is a hard problem and likely to be a leaky abstraction. "Save my
object" is OK until you need to ask questions like how to find such objects,
and by which attributes. If I update this and another object, do I care that
they are done in a single transaction?

An "I JUST WANT IT SAVED" friendly abstraction would be JSON as a value in a
key/value store. As a bonus you can choose your consistency/availability trade
offs by going "nosql". For example Azure table storage. Nice!

But now what if your object is an invoice with a link to an account. Is that
an ID, or is the account info embedded in the invoice? Etc. There is a lot of
design thought that needs to go into data. You can stop using SQL, tables or
whatever, but these problems don't disappear.

ORM allows you to almost forget about these issues for simple CRUD-like
tables. And for the start up style SaaS doing a bunch of boilerplate crud
stuff (to begin with) something like ActiveRecord saves the day. But anyone
using such conveniences should also learn database design, query optimization
and such things too.

------
MK_Dev
"...they force you to design schemas in your implementation language (python,
javascript, java) and then export them to SQL ‘somehow’."

No, they do not. If yours does, throw it away.

------
ajnin
I believe that ORMs try to solve a problem that fundamentally cannot be
solved. As long as you keep the usage simple and use your database as an
object store, as a majority of projects do, all is fine and dandy. But once
you start to look into your database a bit more, and try to use it for what
it's capable of, you'll run into problems. This is because the fundamental
building block of the data model in the languages in question is objects,
while the fundamental building block of an RDBMS is relations. As soon as you
try to write custom queries which join columns from other tables, the simple
table<->object mapping ceases to work; the ORM gets lost, starts to get in the
way, and can make you lose a lot of time.

I've personally settled on using a "light" ORM, which does the mapping of
result sets to objects, variable substitution, and connection handling, and I
write the SQL myself. It's still not ideal, because for statically typed
languages I need to write ad-hoc object types for query results, or deal with
untyped data, but at least I don't get any surprises.
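Such a "light ORM" can be approximated in a dozen lines. The sketch below
(Python stdlib, with a hypothetical Post dataclass) shows the shape: raw SQL
in, typed objects out, nothing in between:

```python
import sqlite3
from dataclasses import dataclass

# The ad-hoc result type the comment laments having to write by hand.
@dataclass
class Post:
    id: int
    title: str

def query(conn, sql, params=()):
    # Map each result row positionally onto the dataclass fields.
    return [Post(*row) for row in conn.execute(sql, params).fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER, title TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'hello')")
posts = query(conn, "SELECT id, title FROM posts WHERE id = ?", (1,))
print(posts[0].title)  # hello
```

Everything the database does is spelled out in the SQL string, which is the
"no surprises" property the parent comment is after.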

~~~
oliwarner
Your issues with ORMs are indicative of _bad ORMs_.

Try a good ORM (eg Django).

------
viraptor
> was good at SQL and who quit over trying to work with a large aggregation
> query in SQLAlchemy

Is this really a good example? SQLAlchemy allows you to use raw queries when
you want to, just like almost every ORM out there.

~~~
captainbland
Yeah it's kind of a false dichotomy isn't it? Just use your ORM for CRUD
operations and use proper SQL if you're doing anything much more complicated
than basic joins. It's a lot better than the busy-work of hand-cranking every
basic SELECT/INSERT/UPDATE/DELETE statement, preparing them and the associated
boilerplate that you usually have to write to stick it in/extract it from your
object model.

------
donatj
This succinctly says the things I have been trying to communicate to other
developers for years.

Bookmarking it so I can reference it.

------
danellis
Ted Neward's The Vietnam of Computer Science is an interesting read on the
topic of ORMs.

[http://blogs.tedneward.com/post/the-vietnam-of-computer-
scie...](http://blogs.tedneward.com/post/the-vietnam-of-computer-science/)

~~~
jessaustin
The central metaphor of this essay reminds us that 2006 was a long time ago.

------
yogthos
I wholeheartedly agree with the article. Every time I've used an ORM I've
regretted it. SQL is already a great DSL for working with relational data, and
there is no reason to create a leaky abstraction on top of it. The reality is
that there is no practical way to do efficient mapping from relational data to
object models, and pretty much every ORM ends up generating horridly
inefficient queries when you start getting into non-trivial cases. At that
point you're going to have to go in and write the queries by hand anyways. So,
you might as well just skip that step and work directly with SQL.

The worst part is that you often find out that ORM is affecting performance
when you start hitting real world loads in production.

~~~
mumblemumble
It doesn't always affect performance. A lot of business apps are mostly doing
CRUD under fairly light load. It may be that beefing up the database server is
much cheaper than interacting with the database in a way that, while being
more performant, is also more laborious.

That said. . . yeah, I've also had my share of cases where the app ended up
seeing higher load than anyone anticipated, and it's hard to even _see_ the
source of the performance problem, because, with an ORM, it takes a little bit
(or, with the worse ones, a lot) more work to find out what the actual queries
being executed are so that you can ask the DBMS to `EXPLAIN` its query plans.
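With hand-written SQL, by contrast, the query text is right there, so asking
the engine for its plan is a one-liner. A small illustration with Python's
sqlite3 (Postgres and MySQL spell it EXPLAIN rather than EXPLAIN QUERY PLAN):

```python
import sqlite3

# Prefixing a query with EXPLAIN QUERY PLAN returns the planner's strategy
# instead of the query's rows - e.g. whether an index or PK lookup is used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT v FROM t WHERE id = ?", (1,)
).fetchall()
for row in plan:
    print(row)  # plan text varies by SQLite version
```

With an ORM you first have to capture the generated SQL (via statement
logging or an echo flag) before you can feed it to the same facility.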

~~~
yogthos
Except I don't find using SQL more laborious, and if it is, then I'd suggest
that's a problem with whatever language makes using SQL directly laborious.

------
micimize
I'm pretty in love with schema-driven development with
[https://www.graphile.org/postgraphile/](https://www.graphile.org/postgraphile/).
I generate a graphql schema from postgres, and code models/serializers from
that.

Currently using migra and some custom scripts for migration management - will
check out automigrate.

The main issue I've had with developing a large backend in pure SQL is that
you end up having to piece together custom scripts, migration tools, etc. to
add features your database/SQL doesn't provide, like modules and higher order
abstractions. So I'm actually considering switching to something like
SQLAlchemy simply for that.

------
taffer
> SQL should be the source of truth

There are frameworks like JOOQ or MyBatis that work this way.

~~~
doctor_eval
We went from hibernate/eclipselink to MyBatis - and have never looked back.
Needing to deal with the idiomatic behaviour of JPA, HQL and the target SQL is
a nightmare. We’ve literally removed tens of thousands of lines of Java code
by dropping JPA.

~~~
debug-desperado
We’re really regretting going with MyBatis. Huge waste of time when most
queries could be easily handled with Spring Data JPA.

Also, now that we need auditing, there's a pile of work we could have gotten
mostly for free with an Envers annotation.

~~~
doctor_eval
We've taken the view that auditing should be done in the database layer using,
for example, triggers. Audit at the application layer is less resilient than
at the SQL layer, IMO.

In fact, an important part of our transition away from ORM was to invert the
ORM-centric relationship between the application and the database. We wanted
our tech people to be able to manipulate the database natively via SQL, which
is generally much easier and more efficient than writing and deploying Java
code to do the same thing.

For us, this meant moving most data manipulation logic into the database using
PL/PGSQL. Doing so has empowered a team of non-Java admins, massively improved
performance, and significantly reduced LOC. I'd say that we now have about 30%
(a third) of the LOC in PL/PGSQL compared to what we had in Java+JPA.
Obviously this means fewer bugs, but performance is literally 100x in some
cases.

I think that if you take a Java-first view of your application then all the
complexity of JPA is part and parcel of any solution, and that’s fine for you.
But we’ve found that taking an SQL-first approach has been great for our use
cases, and has had lots of ongoing benefits.

Of course, we needed to build a bunch of tools to help with this. For example,
we built a gradle plugin that treats SQL code like Java code so we can write
libraries in SQL with transitive dependencies. We also needed to build testing
tools and tools to automate schema migrations for CD.

I do think the lack of tooling around managing large SQL-based projects is a
blocker to wider adoption of our approach, but the benefits of going against
the ORM grain have been very significant for us.

~~~
debug-desperado
That's a fair criticism about auditing. Doing it through database triggers
avoids the problem with audits being missed because of code that doesn't
participate in the Hibernate lifecycle (including native SQL queries and
Criteria updates, yikes!).

I will also concede that under some circumstances it's too slow to bring the
data to the code, and instead you have to bring the code to the data (i.e.
PL/PGSQL). That sort of speedup mostly comes from eliminating round trips and
data marshaling, however, so it's not quite a like-for-like comparison of an
ORM vs a mapper.

The reason I stick to ORMs is mostly because of RAD tools that save me so much
time. For instance, JHipster generates liquibase migrations and JPA entities.
This gives me a relational schema very quickly. To avoid any "surprise"
queries that tank performance, I do turn Hibernate's statement logging on when
developing new features.

Next time if I can find some more tools to help with an SQL-only + stored
procedure approach, I'll give it a deeper consideration. Maybe convince your
company to open source some of the tools it has developed!

~~~
doctor_eval
There is a lot of internal enthusiasm for open sourcing our tools, but we are
heads down at the moment so it’s just a matter of time (or lack of). I think
we will time it for the next pgAU conference so we can talk about it at the
same time.

I definitely agree with you that tooling is a big deal. We did spend a lot of
time manually proving it out before we spent the time on the tools. It was a
bit of a leap of faith but it quickly became obvious that we were onto
something.

And you’re right, the point of our approach was to move the code to the data.
We deal with reasonably large, complex real-time data sets, so the round
trips and marshalling become the dominating contributor to total time. Hence
our ability to improve some operations by 100x.

I can see that if you have a familiar and reliable tool chain and a compatible
use case then ORMs could be great. I think our problem was that we had
neither, so it was never going to work out too well for us!

------
mnm1
"Designing" the database by creating objects first and exporting create
statements completely ignores the responsibility to design the database, the
foundation of almost all apps. It's irresponsible and leads to a database
that isn't optimal at best and doesn't work at worst.
The database design should be one of the most important tasks of an app and it
shouldn't be an afterthought. The design shouldn't even involve code, even sql
at first. Getting the data structures correct is most important. The code will
flow from there. "Show me your data structures and I don't even need to see
the code to understand how it works."

Putting an orm layer on top of this refusal to do one's job and design the
database is the second nightmare of an orm. Adding a new language and new
paradigms that don't make much sense just to translate data into objects adds
a ton of completely unnecessary complexity. Not to mention many orms are slow
either at fetching or hydrating data. They are just unnecessary solutions for
nonexistent problems. It's too bad most people have no experience with
database design or even sql to understand and see this. Most people prefer to
make apps more complex unnecessarily rather than reduce complexity and think
that makes them experts. They are usually the ones advocating for the
unnecessary orm layer. On the other hand, this problem mostly goes away when
you no longer deal with objects in oop languages, so there is hope.

------
vkaku
Very subjective arguments.

Here's the deal: if you design a decent entity manager, you shouldn't have to
worry about how the database works.

A database was created to ingest and query user data. Programs pretty much
ingest and query user data, and the number of regular (non-tech) people using
SQL is too damn low.

So, IMHO, programming the data ingestion and query should be easy, and a good
ORM framework should handle it. Of course, to make a programmer's life easy,
they should start following mindful and clean practices. Otherwise any tool
will be a gateway to hell.

~~~
JacKTrocinskI
So wrong: you always want to know how your database works, and you should be
tuning your database and your queries against your specific database. Every
database has unique technology you can take advantage of and there are
performance tuning techniques and design techniques you should be aware of. If
you're doing simple INSERT/UPDATE/SELECT then sure, but in a large
organization with tons of data that's not gonna cut it.

~~~
vkaku
\- SQL is like putting an artificial layer in place just to insert the values
you want to insert, with some ('') around them.

\- SQL alone will never help you understand if your underlying table is
hashed/unordered, column/indexed or tree/ordered either.

\- ORMs need not generate SQL statements. An optimized ORM can directly send
the data to be inserted in a native protocol to the DB.

\- The deal is that data tuning / access optimizations have little to do with
SQL or the ORM.

\- Bad ORM usage / anti-patterns do not mean ORMs are bad. It means people
use them badly.

~~~
JacKTrocinskI

      - SQL is not an artificial layer, an ORM however is
      - SQL is a language, it's up to you as a developer to tune your queries against your database (e.g. create and use partitioning, indexes, statistics gathering etc.)
      - If you're using an ORM just to send data via a native protocol then why use the ORM in the first place?
      - Performance tuning is very closely related to how you write your SQL
      - This is my opinion but I think almost any usage of an ORM will mostly be bad, no real arguments to support otherwise.  Let the database do the work, that's what it's there for.
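As a concrete illustration of the tuning point above, here is a sketch (using
SQLite and a hypothetical `orders` table) of how adding an index changes the
plan the database chooses — something you only see by asking the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# without an index, the filter forces a full table scan
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# with the index, the planner switches to an index search
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()
```

The plan text (the last column of each `EXPLAIN QUERY PLAN` row) goes from a
scan of the table to a search using the new index.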

------
Chris2048
SQLAlchemy can get pretty complicated with some of the hairier use cases; raw
SQL can too, but SQLA adds another dimension to it:

* SQLA can infer joins, but sometimes it can't and needs hints - then you may worry that you miss some combination of table join it doesn't have enough information for.

* In one case, SQLA inferred a join incorrectly and I ended up with dupe results of a cross join. This is possibly because of a certain primary key duplicated on a few tables. The way SQLA infers joins is a bit magical, and it's not always obvious how to modify that behavior.

* dealing with joining a table on itself can be complicated, when you need to distinguish between those two versions of itself (using an alias?)

* how about this use case: modify a set of filters to apply at two levels: one in filtering rows before a select query, then again to filter the result of joining those original results.

* If you operate with objects/models, for saving/retrieving things, it can often be hard to mix with _raw_ SQL which can't easily be modeled (at least without committing everything, and _then_ executing the SQL). That means you need to communicate that SQL in object syntax, which is far more complicated.

* for migrations, I need to redefine some models, because I need two - one representing the old DB structure, and another representing the new structure.

It's possible some of these have better solutions than I've found, but I find
it's often hard to figure them out.
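For the self-join bullet above: plain SQL distinguishes the two "versions" of
the table with aliases, and SQLAlchemy's `aliased()` plays the same role. A
minimal sketch with a hypothetical employee/manager table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employee VALUES (1, 'Ada', NULL), (2, 'Bob', 1);
""")

# aliasing the same table twice lets the query tell the two roles apart
rows = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
""").fetchall()
```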

------
qatanah
You gotta love both.

I just use the ORM for OLTP and raw SQL for OLAP (better for being more
expressive).

Migrations saves a lot of time.

~~~
tatersolid
One problem I’ve seen is that most non-trivial apps have both OLTP and OLAP
elements in the same places. Think of how many “mini-dashboards” you see
sprinkled throughout the UI of a modern CRM.

ORMs are generally not good at, or even capable of, efficient set-based
operations spanning entities.

Building a dashboard with an ORM results in pain and wildly oversized DB
instances in my experience.

------
linux2647
Maybe I’m in the minority, but I’ve found that if the right ORM fits your
brain, it can be incredibly useful for a lot of CRUD operations, especially if
it offers escape hatches to lower levels of the abstractions; ultimately down
to raw SQL.

SQLAlchemy for Python is that for me. I find it incredibly expressive and
allows me to write complex queries that are readable, maintainable, and
performant. Yes, it requires learning, but it’s the right tool for the job,
personally.

------
toomim
It sounds like the poster would like to check out the python DAL library:
[https://github.com/web2py/pydal](https://github.com/web2py/pydal)

It's exactly what you want. It blew my mind, and taught me that python syntax
can natively express SQL semantics. This gives me all that ORMs are really
good at -- letting you avoid having to write one language in string-form
inside of another language.

~~~
toomim
DAL also has automatic migrations! The author's tool is called "automigrate".
Hmm... :)

------
tabtab
ORM's are yet another instance of "Big Dark Grey Boxes" (BDGB). They are
powerful tools that do a lot for you IF they work as intended. When they
don't, you have to experiment and fiddle, and perhaps end up fudging up an
ugly work-around to meet deadlines.

Templating engines (like Razor), UI formatting engines (Bootstrap), and
routing engines (URL path translation) are also BDGB's with a similar great-
until-broken profile.

In most cases it seems we could have simpler versions that are only a few
hundred lines of code. If this few hundred can generate 90% of our SQL or
whatnot, then we get most of the time savings without the complexity: we can
study or fix a few hundred lines of code. Chasing that last 10% has created
these bloated BDGB monstrosities. Leave it at 90% and enjoy simplicity. It
seems components have a handling-curve something like this:

    
    
       80%: 100 LOC (LOC = lines of code)
       90%: 250 LOC
       95%: 1000 LOC
       98%: 3000 LOC
       99%: 6000 LOC
    

The percent value here is the percent of situations the component handles. In
ORM's case, it would be the percent of SQL code generated by the component.

We should strive for components that assist us with grunt-work, not hide the
process of doing it behind too much code to grok. If it's simple enough, we
can also tune it to better fit shop conventions.

For big development shops, it may make sense to assign specialists to master
a BDGB. If they work with it all day, they'll eventually figure out most of its
quirks and oddities. It's specialization at work. But smaller shops often
waste too much time trying to troubleshoot irksome BDGB's because they don't
have time to focus on any one.

------
mushufasa
ORM protections against SQL injection attacks are worth the clunky syntax.
Change my mind.

For example, Django's ORM protections against SQL injections:
[https://docs.djangoproject.com/en/2.2/topics/security/#sql-i...](https://docs.djangoproject.com/en/2.2/topics/security/#sql-
injection-protection). Of course, not all ORMs have this feature.

~~~
UK-Al05
Most language ecosystems I've used have parameterized queries or prepared
statements, which do this anyway...
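For reference, the difference between parameterized and string-built queries
fits in a few lines (sqlite3 here; the table and input are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "x' OR '1'='1"

# string interpolation lets the input rewrite the query: returns every row
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{evil}'"
).fetchall()

# placeholder binding treats the input as a plain value: matches nothing
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)
).fetchall()
```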

~~~
rjmunro
Almost all, except the old php mysql interface. Unfortunately, that happened
to become one of the most popular and influential ones.

~~~
hu3
It was removed in newer PHP versions in favor of mysqli/mysqlnd/PDO which
makes prepared statements dead simple.

------
gigatexal
Not discounting the monumental work that SQLAlchemy was to create. It’s a
complicated set of machinery that allows one to plug in any relational backend
and then just define models and go. If your way of thinking fits ORMs and
sessions and you want to use lazy loading etc., it’s great. Personally I would
rather couple my code to my database a bit more and go the route of raw SQL in
code or stored procedures on the database that then acts as an abstraction
that my code can call into.

I’ve worked at places that have taken both approaches and each has advantages
and disadvantages but I would take writing my query over an ORM any day. I am
just too paranoid to leave any performance on the table and don’t want the
cognitive overhead of an ORM getting in the way of my queries. That being said
SQLAlchemy is pretty awesome when used as a query builder — I think that’s a
good compromise if you must use an ORM and it is SQLAlchemy then I’d use it
like that.

------
eternalban
ORMs are the technical solution to a social problem. From a pure technical
point of view, a "Database" (with capital D) that allows for user defined
functions (see Postgres) with a canonical "Domain Schema" expressed in SQL
(DML & DDL) is technically superior.

The social problems imo are:

\- (contrary to blog's assertion) most programmers are scared to death by SQL,
and are far more comfortable with iterative programming than
declarative/functional programming in SQL.

\- ORMs in part also addressed organizational issues. Databases used to be the
domain of DBAs. ORMs to a large extent wrested the control over the "Domain
Model" from DBAs to Programmers.

From a purely technical pov I believe that the facility of user defined
functions (cf. Postgres) permits a _complete and coherent structural and
semantic description of a domain model_ in the database. However this is
nearly universally hated by programmers as it requires programming in the
database.

------
cryptos
After reading "Functional architecture is Ports and Adapters" [1] and using
ORMs (Hibernate) for some time now, I'm wondering whether a functional
approach wouldn't be better in many cases. With such an approach you would
have the following flow:

HTTP adapter receives request -> object is loaded from database as immutable
object -> business operation creates a modified copy -> DB adapter persists
the new object.

The last step could be improved if the DB adapter looked at what actually
changed. Since everything is immutable, this is as easy and fast as comparing
references (which is what React does, btw). This works even with Domain
Driven Design, which is usually used with mutable entity objects.

[1] [https://blog.ploeh.dk/2016/03/18/functional-architecture-
is-...](https://blog.ploeh.dk/2016/03/18/functional-architecture-is-ports-and-
adapters/)
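A sketch of the diff step described above, using frozen dataclasses (the
names are illustrative, not from the post): because the objects are
immutable, the adapter can find what changed by plain field comparison.

```python
from dataclasses import dataclass, replace, fields

@dataclass(frozen=True)
class Customer:
    id: int
    name: str
    email: str

def changed_fields(old: Customer, new: Customer) -> dict:
    # with immutable objects, a field either kept its value or was replaced
    return {
        f.name: getattr(new, f.name)
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }

loaded = Customer(1, "Ada", "ada@example.com")          # loaded from the DB
updated = replace(loaded, email="ada@new.example.com")  # business op: a copy
diff = changed_fields(loaded, updated)  # what the DB adapter would persist
```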

------
jb3689
They have their place. I particularly like Ruby's ActiveRecord for handling
relational data. The extra caching and easy eager loading is really nice. It
sucks for high-performance batch processing but it works great for web
requests. I wish it was a little more complete when it comes to bulk
updating/inserting ActiveRecords.

------
jayd16
I don't agree that ORMs are backwards. They often go both directions so how
can it be backwards? You can usually generate schemas from your code objects
as well as vice versa.

No, ORMs are usually solving the wrong problem entirely. Your code objects and
your schema are solving different problems and should not be directly mapped.

As for other things they do...In my opinion, they usually ruin the declarative
beauty of SQL by creating a lot of imperative methods, but I admit this is
_mostly_ a preference thing.

That said, _I've_ never seen transactions or complex queries done properly in
an ORM without simply giving up and using SQL.

But I still use a thin ORM. Who wants to write a result set parser? You just
need to reason about and be aware of when you want to create an object for
custom result columns that aren't mapped to a table vs. when you want to do
the default ORM thing and nest table-mapped objects.
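The "thin ORM" described here can be almost nothing: a row factory that turns
result sets into name-addressable records, including columns that aren't
mapped to any table (sqlite3 sketch with a made-up table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # the whole "thin ORM": name-addressable rows
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO item (name, price) VALUES ('widget', 9.5)")

# a computed column rides along without being mapped to any table
row = conn.execute("SELECT name, price * 2 AS doubled FROM item").fetchone()
```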

------
nnq
Imho the "smell test" for good-enough ORM should be: can I, at the same time,
(1) just switch the DB from underneath, replacing a relational MySQL with a
document-oriented DB like Mongo or a hybrid one like Arango and have it work
the same, 99% same business logic, (2) replace the underlying DB with a REST
API of another app and use that as the DB, and (3) have the ability to drop
down to "raw db" language, whether that is SQL, Mongo's JSON-based queries, or
a GraphQL API I use to "parasitise" another app and use it as the DB for mine?

 _None_ of the ORM/ODMs out there come even close to satisfying half of these
requirements... so I'd rather implement an app-specific one that does 50% of
this for the 1% restricted subset of functionality of the app I have...

~~~
imafish
I do not think (1) makes sense. Why would you use an ORM for a document-
oriented DB?

~~~
nnq
...well, call it an OWM, Object-to-Whatever-Mapper, but that's what you'd
need anyway; document-oriented DBs don't map directly to your business
objects either :)

I think tackling the most general problem would lead to a way more interesting
solution, but I have no time to work on this :P

------
cedricium
I'm going to project my naivety here and ask how does one write raw SQL
queries in code? Should said queries live in their own files and be referenced
when needed or are the queries usually written where they are called?

Bonus points for open source examples.

~~~
wayneftw
I've worked on a few apps that relied heavily on SQL stored procedures as the
data access layer. In a database like SQL Server or Sybase, a stored procedure
is like a function with a body of SQL that is stored on the server, receives
typed parameters and can return more than one result set, each result set
having a different shape (different columns). So, a call to get an entire
graph of data would only require one request and one response, saving many
round trips. You could write a stored procedure that selects a customer, their
recent orders, the products on each of the orders and so forth. You can also
insert/update into multiple tables in one trip and you can wrap any parts of
your SQL in transactions.

The SQL variant used by SQL Server and Sybase, called Transact SQL, also has
elements of procedural programming such as variables, flow control with
IF/THEN/ELSE/switches/etc, looping, calling other SQL procedures or user
defined functions, exception handling and so on.

These features have been part of SQL Server since the 90s. It's extremely
powerful and if used well, can be much more elegant and flexible than an ORM
in my opinion. It's a shame that PostgreSQL doesn't support stored procedures
quite as well or have an SQL variant that's as well crafted as Transact SQL
because I'd love to be able to write apps in that style again. Somewhat
recently I believe PG got something like stored procedures that can return
multiple heterogeneous result sets and although I can't remember the
specifics, I don't think it's quite as robust as what you can do with TSQL and
it lacked client/driver side support maybe... (Also PG has some support for
running TSQL itself, but not with all the same features.)

Here's an example of how you might call a stored procedure in C#, using a
library written by Marc Gravell of StackOverflow -
[https://stackoverflow.com/questions/5962117/is-there-a-
way-t...](https://stackoverflow.com/questions/5962117/is-there-a-way-to-call-
a-stored-procedure-with-dapper)

------
marcus_holmes
I don't use an ORM, because they're not a good fit for my project (also, for
the reasons in the article).

But I do have a pain point around migrations - I have to either painfully
hand-write migrations for the deploy code to deal with, or hand-alter the
production database to make it fit the new schema. Neither of which are good
solutions.

Totally stoked to see the automigrate tool. It completely fits my pain point.
Ideally I can have a line in my make file that compares the current schema
with the last-known production schema and automagically generates a migration
for it.

I'll have a play with this, and contribute if I can.

------
Alir3z4
An ORM is what makes you able to move fast; otherwise find an alternative or
use SQL if you have to.

It can make what you built slow if you don't know how to use it; if it
doesn't allow you to optimize, find an alternative or use SQL if you have to.

ps: I love ORMs and haven't found a better one than the Django ORM when
working with Django: everything seems integrated, there's tons of
flexibility, and database-specific features are included as well. Forms,
Admin, Search, Sessions, Auth, REST Framework and many other things are well
integrated with the ORM, so things work as smoothly as possible.

------
bvanderveen
Another way of looking at the problem is that (in many naively-implemented
backend codebases) almost every layer and component of the stack ends up
depending on the ORM objects, which has the consequence of allowing any of
those layers or components to do arbitrary database operations.

You might have some 'data access layer' class/module but if you're using an
ORM, it more than likely accepts as arguments and returns as results some ORM
objects. The ORM objects returned from the data access layer invariably have
methods like `save()` or `delete()` or `validate()`, or some magical property
that, when accessed, ends up issuing another query (think: `customer.orders`
or `order.customer`). The objects you're supposed to pass to your data access
layer as arguments probably also have constructors that, under the hood, grab
static references to singletons (often the connection factory).

In this way, the data access layer leaks the opportunity to invoke (if not
the responsibility for invoking) save/delete/validate functionality to any
and every part of the code base, when it should really be contained in one
module.
Moreover, you cannot reason about whether a particular data object is
persisted, could be persisted, or was persisted just from its type.

That is, your Customer object always exposes `save`/`delete`/`validate` even
if you got it from a data access layer method called
`getCustomersWithRecentOrders`. (Idem for the elements of
`getCustomersWithRecentOrder()[0].orders`.) There's nothing stopping you from
pulling out `getCustomersWithRecentOrder()[0]`, mutating it, then calling
`save()` (or `validate`, or `delete`)—which never-ever makes sense.

Instead, `getCustomersWithRecentOrders` should return behavior-less structs,
and your data access layer should provide an `updateCustomer` method. Moreover
it should be impossible to construct the type that goes into `updateCustomer`
with data that does not validate.
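One way to sketch that last point in Python (hypothetical names; validation
lives in the constructor, persistence lives only in the data access layer):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerUpdate:
    # a behavior-less struct: no save()/delete() hanging off it
    id: int
    email: str

    def __post_init__(self):
        # it is impossible to construct this type with invalid data
        if "@" not in self.email:
            raise ValueError("invalid email")

class CustomerRepository:
    """The one module allowed to persist customer changes."""
    def __init__(self):
        self._rows = {}

    def update_customer(self, update: CustomerUpdate) -> None:
        self._rows[update.id] = update.email

repo = CustomerRepository()
repo.update_customer(CustomerUpdate(1, "ada@example.com"))
```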

When you implement this kind of data access layer, maybe you use a low level
DB driver, maybe you use an ORM for some niceties, but either way, you
shouldn't leak types from these dependencies out to the consuming module.
However I rarely see this kind of discipline in 'naive' ORM apps
(rails/django/laravel/CoreData/Hibernate/etc).

------
gmac
I've been working on a simple ORM alternative for TypeScript + Postgres which
addresses the first two complaints (takes SQL as source of truth, and doesn't
take over migration or connection management), and am really happy with it so
far: [https://github.com/jawj/mostly-ormless](https://github.com/jawj/mostly-
ormless)

~~~
jnordwick
Grrr.. I'm vehemently against ORMs, but take your upvote for the name.

------
JacKTrocinskI
I never really saw the reason for having an ORM, why not just do your data
logic in the database? Write procedures/functions/packages/triggers and handle
everything there and it will almost always perform better than trying to do
the same thing in application level code. Then have your application do simple
SELECTs, INSERTs, and/or UPDATEs without the need for an ORM.

~~~
icebraining
Isn't all code "data logic"? Why have an application at all?

~~~
JacKTrocinskI
An application might be good as a display/interaction layer for an end user
but not for performing complex data manipulation in a database, that's what
SQL, procedural database languages, and ETL tools are for.

------
AmazingTurtle
SQL as source of truth... Generating models from those SQL create table
instructions sounds pretty sick actually. Nice one! Would use it in upcoming
projects if there was broader support across different languages and
frameworks. I'd like to see some adoption by big players.

------
thrownaway954
I'm sorry, but unless this dude has some credentials for writing ORMs (which,
not to toot my own horn, I personally have done) I don't think he even knows
what he is talking about. In almost every ORM I've worked on, we treated SQL as
the source.

------
collyw
Why is it always assumed that you have to use one or the other in these types
of articles? Use the ORM where it makes sense. Use SQL when it makes sense. I
know that Django's ORM lets you run SQL through it and prevents SQL
injection.

------
nablaone
There is another way: neither the ORM nor SQL is the source of truth. The
database design (physical diagram) is the source of truth. ORM code and SQL
migrations are generated from the design.

Why not replace "vs" with "and"?

------
sasaf5
New concept of the day: Object-Relational Mapping, converting an object, with
all its members of diverse types, to a format suitable for storage in a
database, roughly a table of scalars.

------
mythrwy
Sometimes ORMs are like trying to tie a fishing lure with mittens on.
Frustrating.

On the other hand I can not stand, no, really really really cannot stand
clever 1000+ line SQL queries. Writing an entire program in one dense line
doesn't show how smart you are, it shows you don't have a clue about the
bigger picture or else lack respect for the time and mental health of others.

For most cases on a team and for maintainability I've finally decided the
advantages of a good ORM and migration tool where you can commit migrations in
a rational and standard pattern far outweigh the disadvantages. But this is
not because I don't know or respect SQL.

------
JustSomeNobody
Judging by a lot of comments regarding this article I find it funny that we
can have HTML and CSS in our JavaScript but we can’t have SQL in our code.

------
ryanmarsh
_The ‘somehow’ is usually a half-baked migration tool_

Maybe don’t use a half baked ORM.

I’m experiencing schadenfreude that someone is having ORM problems in 2019
long after the NoSQL hype died.

I can remember on zero hands the number of times my not half baked ORM was the
source of my problems such that I wanted to say “fuck it lets just write SQL”.
ORM patterns are pretty well laid out. Hell, they haven’t changed much in at least
a decade or more and they work great for _the problem they solve_. Which makes
me think maybe a relational store is not right for whatever you’re using it
for.

tl;dr If you have ORM problems I feel bad for you son, I got 99 problems but
SQL ain’t one.

------
buboard
> The ability of ORMs to target multiple DB backends isn’t an asset

SQL has the ability to target multiple backends.

------
ijonas
ORMs are for suckers. DB Drivers are for suckers. I interact directly with the
port 3306. ;-)

------
eigenloss
Don't you love it when authors completely fail to expand acronyms even once?

------
Isinlor
Reading the comments I have a strong feeling that people here equate Active
Record with ORM. There is another ORM pattern called Data Mapper:
[https://martinfowler.com/eaaCatalog/dataMapper.html](https://martinfowler.com/eaaCatalog/dataMapper.html)

I think that our goal as enterprise software developers is to automate
business logic (business constraints) in a way that a domain expert can more
or less read and understand — to create a precise language that facilitates
fast and robust communication between experts and developers. A new hire to
the team should be able to learn domain expertise from the codebase, because
your codebase should contain an easy-to-read domain model.

Relational algebra and SQL are not a very expressive language. SQL
limits your vocabulary to 4-5 verbs unless you start writing procedural code
in procedures etc. but SQL is not a good procedural language. Relational
databases are a solution to specific technical problems of execution speed,
atomicity, consistency, isolation, and durability (ACID). They excel at that,
not at communicating intention.

You should use ORMs (preferably Data Mapper) if your goal is to solve the
first problem while using off the shelf solution for the second problem. Data
Mapper allows you to isolate your domain model from technical aspects of data
storage. This will work very well as long as you will actually be able to
ignore technical aspects in your domain model.

You can do the data mapping, querying, migrations, and all this technical
cruft manually with a handwritten SQL if you want, but SQL certainly will not
address very well the first goal of creating an expressive domain model that
facilitates robust communication between developers and business experts.

And to address Active Record. It's a solution in the middle, you don't get
full isolation, because database details creep into your model. The tight
coupling of the domain model to the ORM also makes it difficult to swap it for a
handwritten mapper. But it's a very good tool for fast iteration and
prototyping.
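The structural difference between the two patterns can be reduced to a few
lines (an in-memory sketch; real implementations like ActiveRecord or a
Hibernate-style mapper add much more):

```python
# Active Record: the domain object carries its own persistence method,
# so database concerns live on the model itself
class ActiveCustomer:
    storage = {}

    def __init__(self, id, name):
        self.id, self.name = id, name

    def save(self):
        ActiveCustomer.storage[self.id] = self.name

# Data Mapper: the domain object is persistence-ignorant;
# a separate mapper moves it in and out of storage
class Customer:
    def __init__(self, id, name):
        self.id, self.name = id, name

class CustomerMapper:
    def __init__(self):
        self._storage = {}

    def save(self, customer: "Customer") -> None:
        self._storage[customer.id] = customer.name

ActiveCustomer(1, "Ada").save()
mapper = CustomerMapper()
mapper.save(Customer(1, "Ada"))
```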

And to address the only point from the article that is not moot based on my
explanation above.

"We can’t have nice things until we move our schemas out of our app languages
and into declarative spec languages like SQL, proto, or even jsonschema /
swagger (though I’m not a great fan of the last two)."

SQL, jsonschema / swagger can't express business constraints. I'm working for
a university. How do you express that "a student internship in some company A
can not fall in time between some two other internships if the two other are
in some company B" in a declarative non-domain specific language and make it
still readable? You can write a declarative domain-specific language, but I
wish you luck on your adventure through parsers, interpreters, planners etc.
to create that declarative domain-specific language in the first place.

------
nikolasburk
What this article misses IMO is a clear distinction between the various
_abstraction layers_ an ORM can provide, most importantly these two:

\- _Heavy ORMs_ : Object-oriented mapping layers that map classes to tables. A
record is represented as an object that not only carries data but also
implements various behaviours for storage, retrieval, serialization and
deserialization of its own data, sometimes it also implements business/domain
logic. Examples: ActiveRecord, EntityFramework, Hibernate, Sequelize, ...

\- _Lightweight query builders_ : Lightweight, programmatic abstractions on
top of SQL. Examples: knex.js, pypika, ...

There are issues with both approaches.

 _Heavy ORMs_

Heavy ORMs make it _seem_ like a developer doesn't need to know SQL to work
with a database. They help to get a project off the ground quickly, but as a
project grows and requirements become more complex, they often generate slow
queries or generally lack the capabilities to express the right queries.
Eventually, a developer will be forced to understand what's going on in the
SQL layer underneath, spending a lot of time debugging and understanding the
generated queries, often without the possibility to optimize them. The
premise of not knowing SQL while productively working with a database
long-term has been proven wrong in so many cases (I feel like everyone has
been burnt by an ORM at least once in their dev career...).

 _Lightweight query builders_

Lightweight query builders on the other hand don't try to abstract too much
away from SQL. Their APIs typically use SQL terminology and they're not trying
to hide the fact that they're working with a database. Coming from an
application developer's perspective, there is still the drawback that you
must have a solid understanding of SQL to work with them efficiently. Another
issue is that these query builder APIs typically aren't type-safe. There's no
static analysis that can help identify typos or other issues in the DB
queries you're building. In terms of developer experience, they lack features
like autocompletion for DB queries. Personally, I still prefer query builders
over heavy ORMs since I get more control.

Time for a plug: I work at Prisma where our mission is to make it easier for
application developers to work with databases by providing developer-friendly,
efficient and performant abstractions and tools for working with databases!
We're currently building the _Prisma Framework_ [1] which consists of two main
tools:

\- Photon [2]: A lightweight, auto-generated and type-safe database client

\- Lift [3]: A tool for declarative data modeling and database migrations

You can learn more about both tools in this blog post:
[https://www.prisma.io/blog/announcing-prisma-2-zq1s745db8i5/](https://www.prisma.io/blog/announcing-prisma-2-zq1s745db8i5/)

Both tools are based on the _Prisma schema_ [4], a declarative abstraction
over your database schema that serves as foundation for the generated Photon
API. Lift migrations work by mapping the Prisma schema to your database
schema. We also build a ton of other nice tools and libraries that simplify
database workflows (such as Prisma Studio, a database GUI) or integrations
with API layers, such as GraphQL Nexus [5] for building type-safe GraphQL APIs
without boilerplate.

To get started, check out this tutorial:
[https://github.com/prisma/prisma2/blob/master/docs/tutorial....](https://github.com/prisma/prisma2/blob/master/docs/tutorial.md)

[1] [https://github.com/prisma/prisma2](https://github.com/prisma/prisma2)

[2] [https://photonjs.prisma.io/](https://photonjs.prisma.io/)

[3] [https://lift.prisma.io/](https://lift.prisma.io/)

[4] [https://github.com/prisma/prisma2/blob/master/docs/prisma-sc...](https://github.com/prisma/prisma2/blob/master/docs/prisma-schema-file.md)

[5] [https://github.com/prisma/nexus](https://github.com/prisma/nexus)

~~~
arunix
Are you aware of that other project called Lift (A Scala web framework)?

[https://www.liftweb.net/](https://www.liftweb.net/)

------
tony
Earlier comments I wrote on ORMs:
[https://news.ycombinator.com/item?id=14661391](https://news.ycombinator.com/item?id=14661391),
[https://news.ycombinator.com/item?id=19853396](https://news.ycombinator.com/item?id=19853396)

ORMs make it much easier to avoid errors and tab-complete relations across the
object graph.

> Some ORMs offer to ‘mirror your live DB’ into a schema. That’s insane to me.
> It makes any type-checking or linting impossible. You have no idea what
> you’re getting.

Hm, I believe the author is talking about SQLAlchemy's sqlalchemy.ext.automap:
[https://docs.sqlalchemy.org/en/13/orm/extensions/automap.htm...](https://docs.sqlalchemy.org/en/13/orm/extensions/automap.html).
This is a _relatively_ new feature in SQLAlchemy, no need to use it if you
don't want (but it is pretty cool).

I can't speak for everyone here, but I doubt most ORM users are using it that
way.

> Your test suite will be useless because your local DB probably is getting
> spun up from a schema.

Most likely a test database is spun up (if it's not a sqlite in-memory db). In
the case of django it'll automatically prefix test_{databasename}, run
migration files, and go to town.

> If not, then you have another hard problem of doing regular prod dumps to
> your dev environment. Yuck.

Which leads me to another benefit of ORMs: adding fixture data to simulate very
intricate states is much easier to do and maintain. Between tests, most
likely, transactions are being rolled back.
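The rollback-between-tests mechanics (what pytest-django's database fixture
does for you) can be sketched with stdlib sqlite3; the table and rows here are
invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT)")
conn.commit()

# Per-test pattern: build intricate fixture state inside a transaction...
conn.execute("INSERT INTO orders VALUES (1, 'shipped')")
conn.execute("INSERT INTO orders VALUES (2, 'refunded')")
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 2

# ...then roll it back so the next test starts from a clean slate,
# without re-creating the schema.
conn.rollback()
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0
```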

Since you are using python, you'll be able to use pytest fixtures:
[https://docs.pytest.org/en/latest/fixture.html](https://docs.pytest.org/en/latest/fixture.html).
I really like pytest-django, but I'm not sure how good the sqlalchemy
equivalent is. Maybe that's the area you could be having issues with,
since when they have bugs, they're notoriously hard to figure out (is it how
I'm bootstrapping sqlalchemy? is it my models? is it my fixtures? is it a
pytest bug? or the pytest extension?)

> There’s always some surprise about the connection or pool type. It’s never
> straightforward, it’s always badly documented, it has twice as many layers
> as necessary because it has to manage bespoke session tracking and table
> registries. Transactions are undocumented or mysterious, autocommit is
> implied or hidden.

I recommend Django's ORM. I've used SQLAlchemy for years and couldn't tell you
the default behavior cold. I believe autocommit is off by default unless you
specify otherwise
([https://docs.sqlalchemy.org/en/13/orm/session_api.html](https://docs.sqlalchemy.org/en/13/orm/session_api.html),
[https://docs.sqlalchemy.org/en/13/core/connections.html](https://docs.sqlalchemy.org/en/13/core/connections.html))

> We can’t have nice things until we move our schemas out of our app languages
> and into declarative spec languages like SQL, proto, or even jsonschema /
> swagger (though I’m not a great fan of the last two).

You could have your django (or whatever is using the ORM) server run an API.
What are you building exactly? GraphQL will get you a typed API, and other
languages can consume it via whatever you like (OAuth2?).

You're probably going to have a main programming language of some sort doing
the CRUD/migrations. Why? There's normally a business logic layer on top;
that's another +1, since having other languages go through the API means there
isn't duplication.

Are there cases where making an API wouldn't work? If that's the case you're
probably well beyond the ORM questions, and have probably learned to keep the
ORM to the basics - simple ForeignKey relations, staying away from
polymorphism / stuff that doesn't map well to other languages from which you
may have to access SQL.

------
smashedtoatoms
Word

------
crimsonalucard
Imperative code is complicated when dealing with data, so we abstract
imperative code into SQL. SQL is complicated so we abstract it from SQL back
into imperative code via an ORM.

Of course it's backwards.

It's like one of the biggest legacies of web development, next to javascript,
html and CSS. SQL is a very specific way to think about data, and a very
high-level and leaky abstraction. So high and leaky that you literally need to
look at the query plan in order to optimize SQL queries. If database IO is the
bottleneck for web development, the best abstraction for this area is most
likely a zero-cost abstraction (rust, for example). Instead we have SQL, and we
create a generation of SQL hackers who memorize a bunch of technology-specific
hacks to try to get things to work. Why is SELECT * bad? It's a hack you
memorized, a leaky abstraction.

I'm not saying schema-less is better, hear me out: I'm saying SQL is a bad API.
You can probably make some API language that's more explicit and imperative on
top of something like postgres. Change the API itself, don't write something
on top of it.

Either way, an ORM on top of SQL is like what TypeScript is to javascript.
You know when they make an API on top of an API it's to cover up something
that doesn't work well. Unless it's a zero-cost compilation, such fixes only
make performance worse.

If you insist on using some abstraction on top of SQL, the way to do it with
zero cost is an API that is bijective to the syntax of SQL itself. TypeScript,
I'm guessing, is relatively isomorphic to javascript, so it works out.

~~~
taffer
> I'm saying SQL is a bad API. You can probably make some api language that's
> more explicit and imperative on top of something like postgres. Change the
> API itself don't write something on top of it.

Something like VSAM or IMS?

The relational model and declarative query languages were invented because the
previously existing imperative navigational model was far too inflexible and
difficult to use. Access paths were baked into the data structures, you had to
be your own optimizer, and if you optimized for the wrong usage patterns, you
had to refactor everything or live with poor performance.

~~~
crimsonalucard
You can enforce patterns via types. You can also have default optimizations.
The problem with SQL is that to choose an optimization you have to do it by
poking at the syntax.

For example:

    
    
    query1 = select(table, columns, optimization_method="auto")

    query3 = join(query1, query2, joinOptimizationMethod="auto", ...)
    

Obviously the above example is trivial, wrong and probably won't work. But I
hope it paints a fuzzy picture of my point. That is that the optimization or
'plan' should be explicit. You choose auto or you dive in and you change it
yourself. No language or syntax hacks.
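One way to read this idea concretely (my interpretation, all names and the
hint syntax invented): the join algorithm becomes an ordinary, visible
parameter that defaults to the engine's planner, rather than something you
coax out of the optimizer by rewriting the query text.

```python
from dataclasses import dataclass

@dataclass
class Join:
    left: str
    right: str
    on: str
    algorithm: str = "auto"   # or "hash", "merge", "nested_loop"

    def to_sql(self) -> str:
        # "auto" defers to the engine's planner; anything else would be
        # compiled to an engine-specific hint (invented syntax here).
        hint = "" if self.algorithm == "auto" else f" /*+ {self.algorithm}_join */"
        return (f"SELECT *{hint} FROM {self.left} "
                f"JOIN {self.right} ON {self.on}")

# Default: let the planner decide, exactly as SQL does today.
print(Join("orders", "users", "orders.user_id = users.id").to_sql())
# Explicit: override the algorithm without restructuring the query.
print(Join("orders", "users", "orders.user_id = users.id",
           algorithm="hash").to_sql())
```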

~~~
james_s_tayler
No thanks. The benefit is not having to specify the plan. The optimizers
themselves are fairly good. A well designed table with the right indexes
should perform well.

Raw SQL is great at what it does.

ORMs are typically good at what they do.

Depending on the use case both have their pain points. Typically I see a
mixture of ORM for most CRUD based stuff and raw SQL for cases where the ORM
isn't suitable.

No need to throw the baby out with the bathwater.

~~~
crimsonalucard
In my example code. Do you notice the string "auto"?

It means you can literally have the API do exactly what SQL does.
Automatically specify the query plan. But you explicitly do it.

Meaning that sometimes the auto planner screws up. When it screws up, in those
cases you explicitly make it use another algorithm rather than modify the SQL
to be slightly different and hope that it compiles into something more
reasonable.

The baby is rosemary's baby.

~~~
james_s_tayler
I'm not convinced on the UX of that. Developers are lazy. I think what would
more than likely end up happening is they'd leave it set to "auto" and modify
the query to get a more performant plan. Past a certain query complexity it
would become exceedingly hard to piece together plans by hand.

A bunch of performance problems aren't even solved by tweaking the query
either. They're solved by changing the database structure. Adding the
appropriate indexes and such. Making sure datatypes match between joined data
and no implicit conversions are happening. Right-sizing columns.

No amount of specifying your own query plan will do anything to affect those
kinds of issues.

In practice I find it mostly comes down to cardinality estimate issues as a
very common source of problems, as the database either over- or
under-provisions memory for the query. If it estimates it's going to get back
a lot more data than it actually does and grants too much memory, that will
reduce parallelism because that memory can't be used by other queries. If it
under-estimates, it doesn't grant enough memory, and when it gets back more
rows than will fit into memory it has to write them temporarily to disk,
taking a massive hit in I/O performance.

How does your scheme work with figuring out how much memory the database
should grant to a query when specifying a plan by hand?

What's more is plans change over time as statistics change. Leaving it up to
the query optimizer means it's adaptive. Having to specify yourself means you
have to know the optimizer isn't giving you the best plan at design time.
There are some cases where you know that. There are some cases where you
don't.

You can already specify query hints etc. I think SQL just has this covered
already. I have no qualms if the query changes slightly.

------
PavlovsCat
Help this noob see the light: how do I handle super dynamic queries in an ORM?

E.g. I have nodes that can be listed and sorted in various ways. That might
mean having to join and/or select on additional tables (taxonomies, tags,
sources, authors). I have a lot of code to generate exactly the query I need
for any given listing, but how do I do that without string concatenation? Make
a dedicated query for each permutation of additional tables I need or don't
need?

I'm not so much looking for an explanation in your own words, but just a point
on what to search for, or maybe some articles that skip right to such
"advanced" stuff.
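The usual search terms here are "lazy querysets" / "queryset chaining"
(Django) and "composing select() statements" (SQLAlchemy): the query is an
immutable value you refine conditionally, and SQL is only rendered at the end,
so no permutation needs its own hand-written query. A toy stdlib sketch of the
pattern, with invented names (real builders also parameterize values instead
of interpolating them):

```python
# A query is a value; each refinement returns a new one.
class Query:
    def __init__(self, table, filters=(), order=None):
        self.table, self.filters, self.order = table, tuple(filters), order

    def filter(self, cond):
        return Query(self.table, self.filters + (cond,), self.order)

    def order_by(self, field):
        return Query(self.table, self.filters, field)

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql

def list_nodes(tag=None, author=None, sort="created"):
    q = Query("node")
    if tag:                      # this clause exists only for this permutation
        q = q.filter(f"tag = '{tag}'")
    if author:
        q = q.filter(f"author = '{author}'")
    return q.order_by(sort).to_sql()

print(list_nodes(tag="python"))
```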

------
iamleppert
Most ORMs I have used are by definition a leaky abstraction: an inferior and
insufficient representation of the underlying language they try to improve
upon.

In addition, ORMs greatly increase the surface area for troubleshooting.
Instead of just being able to run and profile a query, you now need someone
who is experienced in the internals of the ORM and how it "plans" a certain
query. Of course, because ORMs do not understand how the data is actually laid
out, when they say "plan" they mean create the SQL string.

So in place of a query you will need to learn the conventions of the
particular ORM, leaks and all, and how that maps to the SQL. Then you will
still need to understand how that SQL actually interacts with the table
schema.

The end result is normally one backend person who is very territorial over
their code, and who acts as a bottleneck for the rest of the team. If you're
using an ORM I'm sure you already know who this guy is...

