ORMs Are Backwards (abe-winter.github.io)
307 points by awinter-py 17 days ago | 384 comments



I have a lot of experience with the Django ORM, but no other.

So I disagree with most points.

The Django ORM does map pretty neatly to SQL (in my opinion), and especially Postgres. The migration tool is anything but "half-baked". And I have observed that beginners have an easier time creating Django models than dealing with straight SQL.

On top of that, the Django ORM mitigates most SQL injection attacks out of the box, without even requiring the developer to know about SQL injections in the first place.

And the ORM enables elegant code reuse by constructing query objects that can be used in Template logic, Form validation, generic CRUD, REST endpoints and more.

I agree that most ORMs don't help you with multiple programming languages. Though both the Django ORM and SQLAlchemy allow you to customize a lot of the naming conventions and can be adapted to any legacy schema.
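For example, a minimal sketch of pointing a Django model at a legacy table (the table and column names here are invented for illustration):

    from django.db import models

    class LegacyCustomer(models.Model):
        # map the field onto an existing legacy column name
        full_name = models.CharField(max_length=200, db_column='CUST_NAME')

        class Meta:
            db_table = 'TBL_CUSTOMERS'  # existing table, not a Django-generated name
            managed = False             # keep migrations away from this table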

But then again: Is it even a good idea to access the same database from two codebases at all?

Some of the concerns about "ORMs kill the SQL-star" boil down to a lack of general understanding of relational databases among developers who learn to use an ORM before diving deeper into the philosophy of relational databases.


...you're the ONLY person I saw with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it, but:

Could you share some resources of using it properly for advanced use cases? Good cookbooks etc.

For me, it's always been a huge footgun; coupled with Python's dynamic nature, it's an endless source of bug after bug... Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost. Use Postgres native json and array fields - good luck running your tests on SQLite; the ORM that was helpful up to here just gives up and tells you "f u buddy, you're on your own". Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.

They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...

While I like the idea of ORMs, I've found them all totally "wrong headed". And SQLAlchemy - that's just not worth the effort of learning to use properly despite its good ideas, if you're only going to end up using 10% of its functionality in an idiosyncratic way that was not intended.


This thread is littered with people singing praise of Django's ORM. He is not the only one. I'm a huge proponent of it as well.

> Could you share some resources of using it properly for advanced use cases? Good cookbooks etc.

The official documentation covers advanced use cases. Stackoverflow will help you shape up more complicated queries.

> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.

That's a drawback of most dynamic languages, not a flaw of Django's ORM.

> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.

Maybe that's the threshold at which the Django ORM devs thought it sensible to stop? At some point, mixing hand-coded SQL and ORM code introduces a whole set of hard questions and decisions. Very few people expect ORM code to play properly with hand-coded SQL. If you need to switch back to SQL, you are generally on your own.

> They've somehow managed to make it too simple and too complex at the same time! Oh, and don't expect to just click a few jump-to-definitions and make sense of the ORM's inner code easily, god forbid :|...

It's simple to use, yet has complex internals. That's about what I would expect from a tool abstracting something complicated.


>> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.

> That's a drawback of most dynamic languages, not a flaw of Django's ORM.

In Python, mistyping a name usually results in an exception at runtime. If Django doesn't check models for correctness, it can't be explained by being written in a dynamic language.


In Python, mistyping a property of an object doesn't result in an exception. JS has the same behavior.

If you mistype an assignment to a nullable or already-filled property of a model instance, Django will happily save the model with the mistyped property (which will be ignored because it's not part of the model) without actually modifying anything in the database.
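To illustrate the failure mode being described, a hedged sketch (the model and the mistyped field name are invented):

    from django.db import models

    class Article(models.Model):
        title = models.CharField(max_length=100)

    article = Article.objects.get(pk=1)
    article.titel = "typo"  # plain attribute assignment, no error raised
    article.save()          # saves fine; 'titel' is not a field, so nothing changes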

Calling that behavior a flaw of Django's ORM is disingenuous.


> In Python, mistyping a property of an object doesn't result in an exception.

Sure it does. It doesn't on assignment, as you can modify the object however you like, but it does have AttributeError for accessing a property that doesn't exist.


Actually it does on assignments to `__slots__`-based classes.


Which you can only use at compile time - not dynamically. That said, I recall the SQLAlchemy ORM rejecting fields that it did not recognize. A decent IDE (PyCharm) will highlight these for you on the Django ORM. Having now used both the Django ORM and SQLAlchemy extensively, I prefer SQLAlchemy a lot. The Django ORM is usable, but many of its behaviours are odd, and I agree that writing more complex queries gets ugly fast, unlike with SQLA where you can write some really hairy SQL in a composable manner that makes it just so much more legible.


There is no such distinction between compile time and run time in Python; everything can be done dynamically. Want to use a class with slots computed at runtime? No problem:

  >>> def with_slots(**kwargs):
  ...   class SlottedClass(object):
  ...     __slots__ = kwargs.keys()
  ...     def __init__(self, **kwargs):
  ...       for k, v in kwargs.items():
  ...         setattr(self, k, v)
  ...   return SlottedClass(**kwargs)
  ... 
  >>> with_slots(foo=1, bar='qux').zork = 4
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  AttributeError: 'SlottedClass' object has no attribute 'zork'
Of course there are many improvements that could be made to this approach (e.g. not creating a new class for each instance, or setting the __name__ to something meaningful) but I hope it shows that dynamic typing and robust error checking aren't contradictory.


They were talking about assignment.


If you misspell the property when setting the value, there is no error.

Again, this can be caught by eye, by testing and probably soon by static type checkers.


I mean, if you try to access an undefined attribute on a model it will throw an exception.


> ...you're the ONLY person I saw with a good opinion of Django's ORM. I'm willing to accept you could be right in preferring it,

No, he's not. Almost the entire Django community has a good opinion of it.

The main criticisms I see of Django's ORM seem to be from people who prefer SQLAlchemy. They have different design philosophies, and the distinction seems largely a matter of taste and differing use-cases.


> main criticisms I see of Django's ORM seems to be from people that prefer SQLAlchemy

...then I'm in the minority of hating them both, and in the process of looking for an alternative, since I'll be using a lot of Python for the foreseeable future :|

I liked web.py's database abstraction layer a long time ago - know of anything similar in philosophy but more powerful?


Massimo still maintains it at https://github.com/web2py/pydal and he's working on py4web at https://github.com/web2py/py4web.

DAL was the first Python db abstraction I used, and I liked it well enough, but I've since used both Django and SQLAlchemy, and they both seem less opaque to me, maybe because they're not obscured by some of the weird things about web2py itself.

I love the auto-generated backwards references you get in Django and SA when you use foreign key and many to many fields. They make life so much easier. I don't recall how that was achieved in DAL, I just remember it being... complicated.


Years ago I had the same discussion [0] on Medium and here is my take on it: you will end up reinventing "an ORM" because your SQL won't handle all the edge cases. At least with e.g. the Django ORM we know it helps avoid some security flaws.

[0]: https://medium.com/@iMitwe/been-working-with-django-for-over...


Thanks for the reference, especially to Doordash's original article "Tips for Building High-Quality Django Apps at Scale" that it was related to... I really like their thinking and think they are solving for the same problems. I especially like their idea of abstracting/wrapping over the ORM instead of using signals, and I'm looking for a similar architecture for a "default" Django boilerplate, featuring:

- data models in separate files, one for each model, under a data_models/

- business logic models that may or may not map one-to-one with data models, under a models/ that would contain roughly framework-independent code

- encourage "fat apps"

- figure out some way to let the db be the source of truth and allow "multiple services, one DB" scenarios - maybe having "core models" for which all db migrations are excluded, table names fixed, and possibly extra columns that the Django app could ignore but other services could use, and "app/project models" for which regular Django migrations would run

I might write it up after I get the boilerplate app done, and I'm really interested in hearing opinions, especially from people with opposing viewpoints!


Migrations and integration databases don't mix. One works in a code-first paradigm, the other is database-first.


...of course they don't: I was referring to having a sane standardized way of having each app work with two dbs: one that it owns, which includes some models of no interest to other apps, and another that it shares with other apps and is used by other models (with migrations turned off for these models, ofc).

The main point is that it can result in way less code: think of all the APIs between microservices that would never need to be developed or documented or maintained because they'll never exist, yet the problem they would've solved will be addressed :)

Doesn't work all the time, but if most of what you need is data-sharing, and there's a clear model of "this service produces this data, which has a clear documented fixed format, all others just consume it", you'll prevent thousands of LOCs from being written. And I do see my mission more and more as preventing (useless) code from being written nowadays...


In Django we can have multiple databases using multiple backends and either the same, different, or overlapping ORM models.
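A rough sketch of what that can look like (the aliases, engines and model here are invented):

    # settings.py: two connections with different backends
    DATABASES = {
        'default': {'ENGINE': 'django.db.backends.postgresql', 'NAME': 'app'},
        'legacy': {'ENGINE': 'django.db.backends.mysql', 'NAME': 'old_app'},
    }

    # elsewhere: route a query to a specific connection per call,
    # or register a database router to do it per model
    Order.objects.using('legacy').filter(status='open')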


Indeed. I recently worked on a legacy project where I didn't have the luxury of modifying the old database. I ended up using Peewee [0] to access the old database, in conjunction with the normal Django-based database, and even wrote a simple model that uses Peewee [1]. Everything works, even Django Rest Framework associated with the model.

[0]: http://docs.peewee-orm.com/en/latest/ [1]: https://stackoverflow.com/questions/57602206/mysql-gone-away...


Django's ORM is fine. I would say that it's good but not particularly powerful.

It's easy to use and makes inserting / updating data fast and easy. It's simple and fast for simple queries. It starts getting a bit crap when you need complex queries. Annotations / GROUP BY stuff isn't intuitive.
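For reference, a GROUP BY ends up spelled as values() followed by annotate(), which is the unintuitive part (model and field names invented):

    from django.db.models import Count, Sum

    # SELECT customer_id, COUNT(id), SUM(total) FROM app_order GROUP BY customer_id
    Order.objects.values('customer_id').annotate(
        num_orders=Count('id'),
        revenue=Sum('total'),
    )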

It does make it easy for inexperienced people to generate huge numbers of queries by looping through querysets and getting a field from another table.

It lets you run raw SQL through it and prevents SQL injection, so it's really a case of knowing when to switch to that.


If you're building SQL with string concatenation of data, you've lost. You're not secure. Give up.

Parameterised queries are straightforward in all competent languages. If Python can't do it natively, switch to a better language that can.


Where did I say I was concatenating data? You might want to take a look at Django's ORM before making comments like that, it handles parametrised queries fine.


The way I see it, an ORM is just a way of formalising the false assumption that a database is stupid and useless.

I've taken a look at a couple of Django's most popular ORM options and I suppose if I was a mid-level corporate cog writing CRUD interfaces for Stack Overflow tutorial-grade operations like customer-sale-item-product relationships it would probably work fine. I've only had passing exposure to that kind of soul-crushing drudge work.

For the way I work an ORM is baby stuff, and I'm not even doing anything particularly advanced or sophisticated. I maintain (among other things) a relatively unremarkable bespoke online discussion forum. It comprises hundreds of queries, few of which could be composed correctly by an ORM, let alone composed and run performantly. The median complexity query in my code base probably has two or three joins and two or three subqueries—at least one of which has some kind of tangential aggregation or window function.


> For the way I work an ORM is baby stuff

https://www.reddit.com/r/iamverysmart/

> a relatively unremarkable bespoke online discussion forum.

> It comprises hundreds of queries

Sounds like you have a design issue. Making the unremarkable complex.


Thanks for the well deserved lesson in internet arguments. You win, sir.

(For the record, complexity and remarkability aren't strong correlates. And it's not particularly complex; probably 90% of the app's queries are written for indirect tasks like moderation tools, automations, statistical reports and various strata of anti-abuse mechanisms. Most common pages only have one security-related query and one content-related query.)


> Postgres native json and array fields - good luck running your tests on SQLite

If you're not testing what you actually run in production, then what are you testing? If you just need to test in-mem behaviour, mock the DB completely instead of expecting compatibility for vendor-specific features across vendors.


For database testing, I often use a "ramdisk", or in-memory filesystem: sudo mount -t ramfs -o size=512m ramfs /path/to/mount/point, initdb, pg_ctl start. (There's a bit more bookkeeping to generate passwords, store them to /path/to/mount/point/.pgpass, etc.) You can tear it down afterwards with pg_ctl stop, sudo umount /path/to/mount/point. I'll put the necessary commands in start / stop scripts, then call those from within the beforeAll / afterAll hooks of whatever testing framework I'm using.

It's definitely slower than mocking, and therefore not really suitable for unit tests - but it's great for repeatable, isolated integration tests (continuous or otherwise).
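As a sketch, wiring such start / stop scripts (the script names here are invented) into pytest session hooks might look like:

    import subprocess
    import pytest

    @pytest.fixture(scope='session', autouse=True)
    def ramdisk_postgres():
        # mount the ramfs and start the throwaway Postgres before the first test
        subprocess.run(['./scripts/start_test_db.sh'], check=True)
        yield
        # pg_ctl stop and umount after the last test
        subprocess.run(['./scripts/stop_test_db.sh'], check=True)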


Sounds like a job for docker or CI systems.


Hmm... can one use in-mem db mocking with django.contrib.postgres fields?

I'll definitely look into this. Not a solution for existing projects bc of legacy baggage, but probably the right way to do it. The problem in practice is that, yes, what you really want is to mock the db completely, but practically speaking existing code is too entangled with the db adapter specifics, so you end up running with a "fake real db" :|


Can you write the tests so that you create object instances and not save them to a database at all? You don't need to test the ORM and database, after all, just your code.


The closer you keep things to production, the more likely you are to catch errors that will occur on production.


Unit tests, yeah, but if you try to get "the most bang for the buck", as in "test a lot with zero/little effort", you'll end up with a layer of "integration tests" that (1) set stuff in the db, (2) do stuff, (3) read stuff from the db and assert its correctness... A sane ORM would not actively hinder this scenario even cross-dbs or with memory-mocked dbs, but Django's does!


I don't see the issue - you're asking it to support a feature of one database on another, something it never claims to do. You need to use the same database for testing that you use in production.


I'd say the SQLite backend, and using it in testing, is mostly useful for reusable Django apps that are supposed to be used in a wide variety of projects.

Most of the code I write in Django would never need to be SQLite-compatible, even if a lot of it would be. The SQLite compatibility is almost like a relic from a time when setting up RDBMSs was difficult and slow.


> Most of the code I write in Django would never need to be SQLite-compatible,

Then don't run your unit tests against SQLite, run them against Postgres.


If so, don't use things specifically in the Postgres contrib package of Django? The package is built that way for a reason


I don't know if there is already some plugin to make mypy aware of Django models, but that should be possible and would avoid bugs where you misspelled a property and it slipped through your quality control and unit tests.

I suspect "raw sql" to be a bigger headache and source of bugs in any moderately complex project.

Most of the use-cases of queries are actually very simple, and there the ORM allows you to simplify the code a lot.

My hints for enjoying the ORM more: write custom managers and QuerySets (for example, for commonly used filters), write generic views that take QuerySets, and take advantage of Models and QuerySets in form validation and REST APIs.
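A small sketch of the custom-QuerySet pattern (the names are invented):

    from django.db import models

    class ArticleQuerySet(models.QuerySet):
        def published(self):
            # commonly used filter, defined once
            return self.filter(status='published')

    class Article(models.Model):
        status = models.CharField(max_length=20)
        objects = ArticleQuerySet.as_manager()

    # the same reusable filter then works in views, forms and REST endpoints
    Article.objects.published()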

If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...


> Most of the use-cases of queries are actually very simple

I write web applications / data tools with raw SQL and I find that only about a quarter of the queries are simple. Many have subtle specificity in their joins, subqueries and window functions that return almost exactly what I want to output.

The problem I see in most people's ORM-powered code is that they lean so heavily on application code to get all the data assembled in the right place and in the right order, resulting in scaling problems.

> concatenate raw SQL

Often dangerous and almost never needed. Parameterised SQL is the right approach.


> If you use the Django ORM like you would concatenate raw SQL, then it ain't gonna be pretty...

Depends on what exactly you mean here. The Django ORM fully supports some things you wouldn't initially expect if you hadn't seen them before, like passing one queryset into the filter args of another. It'll actually compose them and run a single query on the database, instead of running the first and passing the result into the second.
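For example (models invented), the inner queryset below is compiled into a subquery rather than evaluated separately:

    active_authors = Author.objects.filter(is_active=True)

    # one round trip: ... WHERE author_id IN (SELECT id FROM ... WHERE is_active)
    Book.objects.filter(author__in=active_authors)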


> Mistype a property on a model instance - no problem, no error, it will just not get saved to the db and data will be lost.

Test your code. Use an IDE. PyCharm highlights these errors immediately.

> Use Postgres native json and array fields - good luck running your tests on SQLite,

You're using stuff from the Postgres contrib package on a different db. It's pretty clear in the docs about that

> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code

Make a view and manage it through the ORM? If you're updating data from this, then what you're doing seems really strange.
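"Make a view" here meaning roughly this, with an unmanaged model over the view (the view name and columns are invented):

    from django.db import models

    class SalesSummary(models.Model):
        region = models.CharField(max_length=50)
        total = models.DecimalField(max_digits=12, decimal_places=2)

        class Meta:
            managed = False             # migrations won't try to create the table
            db_table = 'sales_summary'  # points at the hand-written SQL view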


I'll leave the other parts since sibling comments responded to those, but:

> Need to write complex hand-coded SQL queries that return complicated data, but you'd still want to be able to serialize them into a graph of objects that can then be modified by regular ORM code - good luck with that, nobody cared about this scenario.

It's been built-in since at least 1.7, and I think I remember it since 1.4: https://docs.djangoproject.com/en/2.2/topics/db/sql/
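Roughly, following the example in those docs (the Person model comes from the Django documentation):

    people = Person.objects.raw(
        'SELECT id, first_name FROM myapp_person WHERE last_name = %s',
        [lastname],  # parameters are passed separately, so they get escaped
    )
    for p in people:
        p.first_name = p.first_name.title()
        p.save()     # raw() yields real model instances you can modify and save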


I am the other person with an excellent opinion of Django's ORM. I vastly prefer it to writing SQL queries manually, and the only thing I miss is making it ever more functional-like.


> prefer it to writing SQL queries manually

Everybody prefers anything that does a barely decent job at preventing you from writing SQL manually :)

If you set the bar that low then sure, you'll have an "excellent opinion" of ANY ORM. Can you give an example of an ORM you didn't prefer to writing SQL manually for 50% of the cases? - most that were that bad are probably not in use anymore...


I'll chime in as someone who has used Django and its ORM for about 5 years and I have to say it's a pleasure to use.


My experience as well. I use an ORM that I absolutely love for most work (Apache Cayenne) and I use plain SQL where appropriate, like for complex reporting.

The "ORMs are bad" articles usually boil down to "My ORM is bad" or "I tried to use an ORM where it doesn't really fit and now I believe all ORMs are bad".


Even for "complex reporting" you can still use ORMs if you save your SQL query as a VIEW in the database. Then you use ORM for reading the output.


Or just use the ORM directly with a raw query, on a model that represents the results: https://docs.djangoproject.com/en/2.2/topics/db/sql/


That's exactly what I do with Apache Cayenne.


> Some of the concerns about "ORMs kill the SQL-star"

This doesn't even make sense. It kills the SQL duds; stars are even more important for understanding the convoluted SQL when the ORM isn't working as expected, and for the cases the ORM can't handle at all (there are always some).


> there are always some

That's not my experience. In ~20 projects at this company, with pretty complicated GIS-related web apps, there have been about 4 cases where I needed to create a raw SQL query because Django's ORM didn't do what I needed. Almost all the time, it just works.


When you say "pretty complicated GIS-related web apps", does it mean you perform powerful spatial analysis using the PostGIS extension over a few million spatial objects, OR is it more about fetching a handful of spatial objects for display/processing in a web-mapping framework?

Because for pretty obvious reasons, in the first case you'd want computation on the back-end. Sure, you can do a lot of spatial operations with Python after passing through an ORM... but is this really better suited than raw SQL & PostGIS? Genuinely curious.


Most of the processing happens outside PostGIS; we often compute things with geometries from PostGIS against raster data that we don't keep in PostGIS. So our PostGIS queries are mostly limited to filtering on a BBOX, or closest distance to a point.

What users want to compute can be defined by users, e.g., "take all houses in the Netherlands as polygons, compare with a DEM of the Netherlands to find the lowest point of the house's contour, compare that point to a water depth raster to see if it would be inundated" is a computation a user could define in our GUI and the actual computation would then run in (Python) background tasks.


Thanks for your reply.

I work in the GIS field and I tend to put a lot of business logic into PostgreSQL, because I can re-use all functions/views in a variety of tools, from desktop GIS and BI to even (shamefully) Excel for some of our oldest apps.

Whenever we move to a new webmap or want to serve an API, it's always one connector away. All business logic put into the DB is heavily reusable because a lot of tools "speak" SQL.

I guess in the end it really depends on whether your approach is data-centric vs functionality-centric.


Wow, where do you work?


Coming from Rails I think that Django's ORM is too complicated, like almost everything in Django and in Python. However I'm using Python, Ruby and Elixir in different projects and Ecto is much more complicated. It makes me want to write plain SQL as I did in the early 90s.

Actually, I'd be OK writing SQL. It's just inconvenient having to decode the results and handle migrations manually. I did plain SQL migrations before ORMs. That's not something I want to go back to.


Django's ORM is pretty complicated under the hood, but I suspect that this is not easily avoidable given the requirements.

I also don't subscribe to the view that Django is too complicated. With other frameworks you end up implementing part of Django, and most of the time that will be worse in every conceivable way.


Ecto is just a DSL for writing SQL queries, not an ORM. It's difficult to understand if you think it's an ORM...


This. It's not harder, it's different. Takes some getting used to, but then you realize "oh it's pretty much just sql that returns structs" and you instantly hate every other ORM you've used prior.


I'm not surprised about your comment on Ecto. Ecto takes experience and practice, but by the time you finish writing your 5th or 6th application, Ecto will be so natural that you wouldn't want to go back. Especially not to writing vanilla SQL.


The problem with complicated things is that the second application never comes. I've got a customer that's been paying me to write a Phoenix application for the last two years. That's good, but I'm not going to use Ecto in one of my own projects. So no Elixir if it touches a database. There are simpler alternatives to Ecto, maybe I'll use one of them, but Ruby and ActiveRecord are good for almost everything.


> your 5th or 6th application

s/application/day/


I used to really dislike the Django ORM since I had a background in SQL and thought that ORMs are really constraining and not as flexible as just writing SQL. However, recently I have come to the realisation that not everyone who programs knows SQL (woah, what a revelation, right?), and the Django ORM is a really nice tool to fill that gap with something they are more familiar with (i.e. Python objects).


If you are using any language other than SQL, then you better have a way to convert SQL data to the appropriate data types of your language and vice versa. And it better be injection-proof and reusable. That's already at least half of an ORM right there.

I think I know SQL fairly well, but I don't see how I could implement things like django admin, form validation, template views etc as elegantly on top of plain SQL.

So I really don't agree with "ORMs are only useful for SQL-noobs".


> But then again: Is it even a good idea to access the same database from two codebases at all?

A motivation for relational algebra was different applications accessing data in different ways. This integration remains a selling point. Further, data often outlasts applications... even languages.


Still, I think accessing a single database in an RDBMS like Postgres or MySQL from multiple codebases or even languages is asking for trouble.

Providing some kind of API as a library, REST, RPC or whatever would almost always be preferable, if only to centralize permissions and validation.


the problem is that these days you need a lot of application code to check for permissions and that kind of stuff.


In principle permissions can go in the DB.


In practice this is rarely ever done (right).

Row-level permissions are possible, but it can get more complicated when certain users are only allowed to change a certain row in a certain kind of way but not in another.


I agree, but it does seem to become more compelling if you're doing your integration in the database.


"Is it even a good idea to access the same database from two codebases at all?"

Yeah, if you're migrating code between languages and platforms and want to run both simultaneously so you can go live with a subset in the new platform and phase out the legacy system in stages.


In that scenario there is no benefit to raw SQL, because you might as well write the new application with another/better ORM?


No rewriting. Program must be migrated as-is with as little manual change as possible. If the customer wants to make changes later that's up to them.


> But then again: Is it even a good idea to access the same database from two codebases at all?

I think that depends entirely on who you ask. Some orgs may have dedicated DB guys who design all the DB schemas and stored procedures, which must be vetted by the DB guys before the application guys can merge in changes. Others might decide dev teams should be responsible for their own data and store it as they see fit. I think product data should only be updated by one main API, with that API exposed internally for other tools to interact with, unless it's something special. And in development, there are many special cases.


Microservices communicating via an RDBMS sounds like a recipe for chaos.


Came here to say this, specifically about Django too. It's pretty bulletproof these days.


> But then again: Is it even a good idea to access the same database from two codebases at all?

It's an integration pattern. It works in some contexts.


If it's treated correctly as a set of SQL-language APIs with clear ownership and explicitly defined public/private parts. There is a reason why it's often considered an anti-pattern, and it is the general lack of discipline in maintaining stable semantics and separation of responsibilities.


Why use an SQL DB if you don't know SQL?


Because web frameworks, mostly.


It looks like the author is generalizing shortcomings in SQLAlchemy as faults with the concept of ORMs. ActiveRecord, for example, treats SQL as "the source of truth", per his definition.

Also, there is a whole section called "ORMs take over connection management and migration" and not a single point as to why connection management handled by the ORM is bad, only comments on migrations. Moreover, migrations are not an unsolved problem at all.

In Ruby, ActiveRecord solves all scenarios I have encountered gracefully, and trust me, I have dealt with tons of edge case projects.

For Python, I recommend Alembic: https://pypi.org/project/alembic/


> ActiveRecord for example, treats SQL as “the source of truth”, per his definition.

SQLAlchemy author here. I can't imagine how someone would not consider SQLAlchemy to treat "SQL" as the "source of truth" as well. All of SQLAlchemy's constructs map directly to SQL syntactically. Our most grumpy users are the ones who don't know SQL. This blog post would certainly benefit from some actual examples. "ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?


What the blog probably means by "SQL as a source of truth" is that you first design your database and then generate the domain classes from it, and not vice versa. This is, for example, what MyBatis and jOOQ do.


I don't think there is enough information from the database schema to feed an ORM reliably. E.g., take a "boolean" in MySQL: MySQL doesn't have proper booleans, it just represents them as a `tinyint`. But you'd really like your ORM to convert those to your language's `bool`!

There is certainly a lot of information in a schema, and you definitely want your ORM's idea of the database and the database's idea of it to be in close alignment, but I feel like any serious attempt would find all sorts of holes like the above when it actually came to doing it.
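For what it's worth, SQLAlchemy's reflection does let you override individual columns for exactly this case. A sketch (connection URL and names invented; autoload_with is the SQLAlchemy 1.4+ spelling):

    from sqlalchemy import Boolean, Column, MetaData, Table, create_engine

    engine = create_engine('mysql+pymysql://user:pass@localhost/app')
    metadata = MetaData()

    # reflect the table, but force the TINYINT flag to a proper Boolean
    users = Table(
        'users', metadata,
        Column('is_active', Boolean),  # explicit column wins over reflection
        autoload_with=engine,
    )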


Wouldn't you just offer user-modifiable translators for such cases (presumably with some sane default)? It seems preferable to writing the domain model in code first, resulting in N copies of the schema definition for N applications utilizing the db...


> N copies of the schema definition for N applications utilizing the db..

Is multiple applications talking to a shared DB schema a common practice, especially nowadays?

In my mind, each app/service should have a DB schema which only it talks to, and other apps/services needing data in that DB go through services exposed by that app/service (REST, RPC, GraphQL, whatever) rather than talking to its DB directly. That means you can rearchitect the DB schema, change which DB you use completely, etc., and only the app/service which owns that DB needs to be modified.


Yes, it is common practice just about in every organically grown system, ever. Also anything corporate.


I've done this repeatedly with SQLAlchemy. It does require that I do a bit of additional record keeping in my models, but it's not exactly hard to do.


Why use an ORM when I have to know SQL and it's a direct syntactical mapping?

>"ORM-light tools that coerce responses into native structs and allow for type-checking are less offensive to me." - that is...an ORM?

Yeah... it's a light ORM that focuses on turning a result set into an object, but not on the syntax remapping of queries. Object mapping is almost universally liked, but full ORMs usually also include query syntax mappings and the addition of a transaction lifecycle into your data objects.


Because an ORM has nothing to do with writing your SQL for you. You can use textual SQL with an ORM, and an ORM like SQLAlchemy has a query language that mirrors the structure of SQL in any case. Nobody is taking your SQL away. Additionally, you most certainly do want a tool that will write most of your DML for you; there is no deep wisdom or joy in writing the same INSERT / UPDATE statement over and over again.


I've used SQLAlchemy a bunch and found that it has excellent support for complex (for me anyways) query options. For example, avoiding N+1 selects with eager joins has been a breeze.
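For reference, the eager-join option looks something like this (models invented; a session is assumed to exist):

    from sqlalchemy.orm import joinedload

    # one query with a LEFT OUTER JOIN instead of 1 + N lazy loads
    users = session.query(User).options(joinedload(User.addresses)).all()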


ActiveRecord does not treat SQL as "the source of truth". ActiveRecord objects maintain their own state, and still try to be a "proxy" for rows in the database. When you call "save", it's tracking the identity of the object using the primary key, and generating SQL from the runtime state of the object. This is backwards.

The biggest problem for me is this "proxy" behaviour. When you start treating objects as proxies for database rows, your database logic inevitably leaks into your business logic. Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.

On top of that you suffer terrible performance problems. Even if you're smart enough to avoid the whole N+1 query issue, you lose control (and more importantly visibility) over where database updates are coming from, and your ORM is either not smart enough to efficiently apply updates, or so smart that building the query takes longer than actually running it! (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.

And it all serves no purpose! The application-specific interface to its database is usually quite simple. You can just have it be a literal interface/trait/whatever and implement each method by executing a bit of SQL, or calling a stored procedure, or however you like. Arguments and return values should be plain-old-data (no magic proxy objects) and each method should correspond roughly to a transaction so that the rest of the application doesn't have to worry about that database-specific stuff.

There are a ton of other benefits to doing stuff this way, too many to list here, but it really helps with complex migrations (like if you need to migrate from one database to another), maintainability, testing, etc.


> (I have this problem with SQLAlchemy right now, where it's taking several seconds per request CPU bound in the ORM). Undoing this is literally impossible without a near total rewrite.

for the write? writes are not usually much of a performance issue except when people are inserting thousands of rows. You can replace flush with direct INSERT/UPDATE or use the bulk API. Pre-setting primary key values will also improve performance by an order of magnitude. Share some of your code and profile results on the mailing list and we can look into speeding it up.

SQLAlchemy always provides many techniques to optimize areas that have performance problems, not to mention the primary direction for most major releases is that of new performance features and improvements year after year, despite having to deal with the dog-slow cPython interpreter. No "total rewrite" of your application should be needed.


The SQLAlchemy ORM is a very complex beast. It provides plenty of methods to optimise the queries it generates. It provides no way to "magically speed up the python code".

This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.

The only way to get the performance improvements we needed was to stop using it as an ORM altogether. However, the ORM was too intricately tied with our business logic to do that without a near total rewrite.

What we've actually ended up doing to do this rewrite more incrementally is to create a Rust service which implements just the "GET" end-points: this gives us an order of magnitude performance improvement despite it using the exact same database and fetching the exact same data. Most of the business logic did not need to be duplicated as it is not used from "GET" requests, but users of our application will get much snappier loading times, and it will generally feel much better.


> It provides no way to "magically speed up the python code".

it most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given? Have you looked into baked queries, querying for columns and not objects? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.
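Querying for columns rather than objects is the kind of quick win being referred to; a sketch (model invented):

    # returns lightweight named tuples instead of tracked ORM instances,
    # skipping object construction and identity-map bookkeeping
    rows = session.query(User.id, User.name).filter(User.active == True).all()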

> This is not just for writes, but also for loading complex relationships from the database: we've gone to great lengths to ensure we're using relationships optimally so as to cache query results and not fall into the N+1 query problem, but our application spends all its time inside SQLAlchemy.

that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.

Here's a runsnakerun I made some years ago of the PyMySQL driver loading data over a network-connected database: https://techspot.zzzeek.org/files/2015/pymysql_runsnake_netw... Notice how the Python driver alone spends TWO THIRDS of the time and the actual waiting for the database is 1/3rd? That's Python. If the database is on localhost, that pink box shrinks to nothing and all of the time is spent in Python, just reading strings and building tuples. It's a very slow language.

The Rust language, however, is blazingly fast and is one of the fastest languages you can use short of straight C code. This is not the fault of an ORM; this is just the reality of Python. If you used an ORM that runs under Rust, it wouldn't be as slow as your Python application; it would continue to be extremely fast. The Hibernate ORM is an enormous beast but runs immensely faster than SQLAlchemy because it's running on the JVM. These are platform comparisons. SQLAlchemy is really fast for the level of automation it provides under pure Python. You made the right choice rewriting in a fast compiled language for your performance-critical features, but that has little to do with whether or not you use an ORM. As long as your database code was in your Python application, you'd end up writing an ORM of your own in any case; it would just be vastly more code to maintain.


> it most certainly does provide many techniques to reduce in-Python computation. Have you read the performance section in the FAQ and worked through the examples given ? Have you looked into baked queries , querying for columns and not objects ? These are often quick wins that make a huge difference. The baked queries concept is going to be much more cleanly integrated in an upcoming release.

Most of those techniques boil down to "use SQLAlchemy as a query builder rather than an ORM" - something I wholeheartedly agree with and would love to do, but it is next to impossible because these ORM objects have leaked into the business logic of the application.

> that's typical because a CRUD application is all about its object model and SQLAlchemy is doing all of that work. if you wrote your own object layer using Core then you'd see all the time spent in your own objects<->core layer. This is just the reality of Python.

I'm not trying to say SQLAlchemy is a bad implementation - it's a great implementation with tons of powerful features. I just believe that using anything as an ORM is a bad idea and directly leads to these performance and maintainability issues. Whether that's ActiveRecord, SQLAlchemy or even if someone implemented a similar ORM in Rust.

Some of the performance issues are due to python, but that's not the whole story: if I use SQLAlchemy as a query builder, and only return plain-old-data, I do not see nearly the same performance issues. Not because SQLAlchemy ORM is badly written, just because it has to do more work, create more objects, with more "magic properties", maintain more entries in the session, etc. etc.


While you're not technically wrong at the SQL level, I believe there's a higher level of business problem that ORMs like ActiveRecord solve for that's being overlooked. (aka ActiveRecord got Airbnb and Shopify a long way in scaling both developers and product).

CPU time building a query is rarely the business bottleneck until you're at a huge scale - scaling up the engineering team uniformly and having business logic at an abstraction level closer to the rapidly evolving product specs is the bottleneck that ActiveRecord solves for, and does pretty well.

If you'd like something that avoids object/hidden state, check out how Elixir/Phoenix's Ecto [1] was designed - it avoids many of the shortcomings of ActiveRecord and SQLAlchemy:

1: https://hexdocs.pm/ecto/Ecto.html


+1 for Ecto, the most powerful ORM that I have seen so far (among PHP, Node, Ruby, Python and Elixir)


If "SQL is the source of truth" means you have no in-memory state, I'm not sure that's feasible, with or without an ORM.

Once you fetch something from the db, you have it in variables in memory, with or without an ORM. It could no longer be sync'd with the db a ms later. And you will probably re-use it for more than one access (if you need it more than once), because issuing the same SQL again the second time you need to reference any data returned, in a given routine, would be insane.


I think what they’re proposing is something like “if I want to read and then update a value, incrementing it by one, the proper way to do that is to tell the DB to increment whatever value it has stored; not to increment my local representation of the value and then tell the DB to overwrite the value with it.”


The number of times I've had to increment a value by one is minimal. Or similar things where you want to update one value based on the other values, and do it in an atomic way.

But if I had to increment specifically, and were using ActiveRecord... I'd use the ActiveRecord increment! method. Which has the proper db-atomicity semantics you want, it does execute `UPDATE x SET n = n+1 WHERE pk = y ...`


ORMs do that fine, for instance it'd be

    MyModel.objects.filter(pk=...).update(myfield=F('myfield') + 1)
In Django's ORM.


This is not obvious unless you think about it - I've seen issues like this because the code looks correct if you ignore the update semantics, especially when it won't fail on your local dev env because the in-memory state will be the same as the DB state since you're the only one hitting it...

Having to think about these things because the abstraction is leaky instead of just using the SQL where the transformation is explicit is not worth the hassle.

Using Clojure and yesql was a real pleasure for this - Clojure is immutable so you're working with values all over the place - and working with a SQL query is just passing values in and getting values out - no magic mapping - it's just values. If you want the latest value, query again; if you want to store a new value, send the new value. No mutation, no magic, just data.


So obvious and simple! ...or you could just use SQL.


> Once you fetch something from the db, you have it in variables in memory. With or without an ORM. That could no longer be sync'd with the db a ms later.

Nope. The DBMS is making the same transaction isolation guarantees to an ORM as to any other client.

I think the larger "source of truth" issue is how the schema is represented.

All of these systems provide their set of schema objects that are then pushed to the database. This works fine for LAMP stacks, but as soon as you grow it becomes apparent that you've put the cart before the horse.


I’ve found that you want to be super careful in application code about where you do updates and inserts when using AR.

Sprinkling calls to order.save! or order.update(foo: bar) in random places is a recipe for disaster.


>Someone returns an ActiveRecord `User` object from a database method and now random other parts of your application are intricately linked to your database implementation.

Indeed. It's a rookie mistake, but hey, you work with rookies all the time! ORMs can be useful, but they certainly allow for some dangerous patterns that are hard to notice if you're a young developer.


ORMHate is nothing new under the sun. Martin Fowler wrote a more balanced article about the issue years ago https://www.martinfowler.com/bliki/OrmHate.html


They mention the case where the ORM shadows the db schema. The argument isn't terribly salient; it's not like you can't build type checking in, it just means more work if that's what you want to do. Essentially, according to the author, the only way to do it right if you have an existing db is to write all queries from scratch.

My preferred solution for existing DBs is Ruby's Sequel or ROM if you want to go nuts. But I'm a diehard Rails lover and will choose ActiveRecord over all other solutions if I can. To me there's only one decent ORM and everything else is either good for special purposes or junk.


So, I've seen this go back and forth so many times, and I've come to the conclusion that the two extremes are irreconcilable. Not because they're incorrigible, but because they represent two very different ways of using a database.

For some, the database is not actually a database, per se, it's just a place to persist state. Usually folks in this camp think in an object-oriented way. They haven't read Codd's paper, they don't care about the relational algebra, they just want some way to take bags of properties and stick them somewhere that offers random access with good performance and is reasonably unlikely to spontaneously erase data. If you're in this camp, ORM works great, because there isn't really much of an object/relational impedance situation in the first place. The data store wasn't structured in a way that creates one, there's no need to burn a bunch of effort on doing so, and dragging in the whole relational way of doing things is like dragging a wooden rowboat on a hike through the forest because you'd like to use it to go fishing on the off chance you find a nice pond.

Others see a lot of value in the relational model. They're willing to put a bunch of time and effort into structuring the data and organizing the indexes in ways that will allow you to answer complex, possibly unanticipated questions quickly and efficiently. They, too, hate the syntax for recursive common table expressions, but are willing to hold their nose and use them anyway, for various reasons, but mostly because people think you're really cool when you can spend 30 minutes and make something go 2-3 orders of magnitude faster than it used to. They don't think of the data in terms of bags of properties, they think of it in terms of tuples and relations and a calculus for rearranging them in interesting and useful ways. For them, there is potentially a huge problem with object/relational impedance; the data's organized in ways that just don't fit cleanly into Beans.

The thing is, neither of these ways of using a database is wrong. Each has its strengths and weaknesses. The trick is figuring out which way fits your business needs. Well, that's the easy trick. The hard trick comes when someone on your team doesn't understand this, believes there is one universal solution that will work for everyone, and is hellbent on jamming the One Righteous and Holy Peg, which happens to be square, into a round hole.


I know about the relational model and used to work with raw SQL for years.

But this is the stack I often work with now:

- Postgres

- Django ORM

- Django REST Framework

- Auto-generated OpenAPI Typescript functions

- A Redux store

- My React app

In other words, four ORM-like non-relational frameworks stacked on top of each other between Postgres and my app.

A generic solution for "complex, possibly unanticipated questions" would need to work through all of those layers.


What do you use for "Auto-generated OpenAPI Typescript functions" ?


Django-rest-swagger to have a Swagger / OpenAPI spec on top of Django REST Framework, and then OpenAPITools/openapi-generator to generate Typescript code.


cool -- what do you use to generate OpenAPI Typescript functions?


I don't doubt you're right.

If you're thinking in the first terms, is a relational DB the right backing store, or would it be better to use something more like Mongo?

Do they still obtain some benefit from the schema, since from time to time the semantics of a property will change, and a schema can tell them what the current shape of the data is and guide the migration? Or would they prefer to just write a new property which does a lazy conversion from the old terms? (To an extent, I suppose the answer to this question determines the answer to the first. But I guess there are other tradeoffs to Mongo I'm not aware of, since schemalessness fills me with fear and I just don't want to look there.)

Regarding your final paragraph: I suspect that, for many apps, the choice between relational vs transparent persistence is largely determined by the team who is working on it. The "hard trick" is therefore balancing a strong personality with a strong view against the rest of the team, who have weaker personalities and weaker views but all agree on the other side of the fence. This is simply a standard management question with very little technical relevance.


I suppose it depends on whether you want schema-on-read or schema-on-write.

Even if you're working under the first model, there's still a lot an RDBMS can do to help you ensure data integrity. Largely by being less flexible. Databases like MongoDB allow for a more flexible schema, at the cost of pushing a lot of the work of ensuring data integrity into the application code.

For my part, I do a fair bit of working with databases that were built on the MongoDB of the '90s, Lotus Notes, and I've seen what they can grow into over the course of 25 years. It's not pretty. That experience has left me thinking that, while there's certainly a lot of value in the document store model, I wouldn't jump to a document store just because I don't need everything an RDBMS does. I'd only do it if I actively needed a document store.


> If you're thinking in the first terms, is a relational DB the right backing store, or wouldn't it be better to back your DB with something more like Mongo?

Certainly something relational for 90% of the cases, if not all.

The relational model was THE answer to nosql from the start (i.e., it was the solution to the original "nosql" stores).

It's totally more flexible, powerful, dynamic, expressive... And that's without talking about ACID!

You can model all "nosql" stores with tables. With limited exceptions it will work very fine for most uses...

> This is simply a standard management question with very little technical relevance.

I don't get what you are implying here...

But nosql solutions are the ones to be suspicious of and the ones that require harder qualifications and justifications to use. It's the wrong choice in the hands of the naive. "NoSQL" is for experts and for niche/specific workloads.


The advantage of using an ORM is that you can always not use it in the places where you are doing things that are not suited to the strength of the ORM.

I tend to hand-write almost all of my migrations and many of my queries that synthesize data from multiple tables to reach a conclusion. I can think of only a handful of times where it was worth writing custom code to persist state (usually only when there are a large number of records that need a couple of specific fields updated).

Like many tools, it all depends on how well it is used and how well it fits its use-case.


100% agree.

> The trick is figuring out which way fits your business needs.

Rule of thumb - if you don't control the database, use an ORM; if you do control the database, work directly with it.

For example, let's say that your business is a software company that sells an on-prem product. Some of your customers have Postgres expertise, some have MySQL expertise, some MSSQL, some people are stuck on Oracle. Forcing customers to develop DBA expertise in a database they're not familiar with just for the privilege of buying your product is a sales disaster in the making. So you go with an ORM and set up QA testing that tests releases across all of the databases that you support, and the ORM helps you by making it much more likely that your development efforts will automagically succeed in working with each of the supported databases.

In most other situations, though, it makes much more sense to start with the data design. If your business grows, your databases are going to grow. You are almost guaranteed not to switch databases (absent overwhelming financial need, see: Oracle) over the lifetime of your company. Data analysts (data scientists now?) can extract serious value from your databases by getting into the weeds of the database schema, indexes, and queries and working with developers and DBAs to optimize them for business reporting. If you give up control to an automated tool that knows nothing about your business, your business will be less competitive as a result.

Data is far too valuable these days to refuse to develop expertise with the underlying databases.


Why do your customers even need to have expertise in the DB you're using? We just sell them a black box (usually VM images), with a few endpoints to extract data in standard formats. They can use whatever they want to connect to those.


Maybe one customer wishes to run their databases in a cluster distributed across two continents.

Maybe another customer has bought Oracle and the installation still has room. Also, they have a custom backup scheme that takes their load patterns into account.


Customers wish for many things; that doesn't mean they're relevant selling points. It just sounds like bad judgment to me to tie yourself that way, unable to take full advantage of the RDBMS. And have you even tested running your (hypothetical) application in a distributed Oracle cluster across two continents? If not, how will you support it?

Absolutely not worth it, in my experience.


What are some heuristics to determine which way fits the current business problem (and future business problems)?


The most basic question I'd ask: is the core of your application the data or is it the business logic? Or to put it another way: does it make more sense to build the application around data, or to build your data around the application?

Another question I'd ask is whether you're expecting to deal with millions of rows/objects or hundreds of millions. Modelling relational data correctly can have performance impacts in orders of magnitude.

When I look at most projects, I instinctively begin by modelling the data structure; then I think about why/when/how data can move from one state to another; then I think about how an application could prod the data between these states; then I build code which runs against the data.

If your application isn't data at its core (e.g. a document-based app) then it probably makes more sense to treat data elements as objects and use an ORM (or similar) to store and retrieve the objects.


An ORM does not usually limit how much you can model your data and create fast queries. The modeling you talk about can be done just as well in e.g. the Django ORM.


ORMs that I've experimented with tend to fall into one of two categories: either they treat the object model as prime, or they treat the relational model as prime.

The former almost invariably spurt out inefficient queries, or too many queries, or both. They usually require you to let the ORM generate tables. If you just want to have your object oriented design persist in a database, that's great.

The latter almost invariably results in trying to reinvent the SQL syntax in a quasi-language-native, quasi-database-agnostic way. They almost never manage to replicate more than a quarter of the power of real SQL, and in order to do anything non-trivial (or performant at scale) they force you to become an expert in SQL and/or in how the ORM translates its own syntax into SQL.

And once you become more expert at SQL than your ORM, it's not long before you find the ORM is a net loss to productivity—in particular by how it encourages you to write too much data manipulation logic in code rather than directly in the database.

For the projects I've worked on, I've almost never wanted to turn data into objects. And on the occasions when I've thought otherwise, it has always turned out to be a mistake; de-objectifying it consistently results in simpler, shorter code with fewer data bugs.

I tend to find that the longer data spends being sieved through layers and tossed around inside your application, the more data bugs you'll end up having. It's much better to throw all data at the database as quickly as possible and do all manipulation within the database (where possible) or keep the turnaround into application code as short as possible. It means treating read-only outputs/reports more like isolated mini-applications; the false nirvana of code reuse be damned.

And that doesn't mean replacing an OOP or ORM fetish into a stored procedure/trigger fetish. It means realising that if your application is data at its core, it's your responsibility as a programmer to become an expert at native SQL.

The problem is that far too few programmers realise how deeply complex SQL can be; it's treated like a little side-hustle like regular expressions, when for so many programmers it's the most valuable skill to level up.


You'll have to pry ActiveRecord from my cold, dead hands. I'll take it any day over the half-baked quasi-ORM that your company has inevitably made over the years because an early engineer was anti-ORM and it turns out doing raw SQL all the time is tedious.


What’s nice about ActiveRecord is that while it obviously does hook into the Rails object-oriented style, it’s actually more relational and functional than oo in how it works underneath. It’s a fairly thin layer of abstraction over SQL. I think of it more as a way to build and compose SQL statements than a true ORM—at least when it’s used well.


At my last position we simply did sprocs and Dapper. When you take as true that the database is the source of truth, it only makes sense to put the queries in the database. The only exception is dynamic queries or modifications.


How do you avoid leaking business logic into database stored procedures? Sounds like you would now have two sources of truth, application logic, and database stored procedures.


I'd like to see someone have a decent go at reexamining the idea that business logic (BL) doesn't belong in the DB and tease apart exactly what that should mean, if it's actually still true, or even if it's ever actually been true. Maybe something like that's been posted here before, but I haven't seen it...

I bought into the idea myself for a while, but when I interrogate my belief there it was just something I picked up at uni as part of the general 3-tier approach, and I'm not sure how much of it is really practically grounded. I see the same ideas held dogmatically by newer employees, and when I ask them about it in detail, I see the same fuzziness.

After all, if you want to really be pedantic, then you could claim anything beyond having a single table with two columns, "key" and "value" is pushing BL into the DB.

I.e., what practically does the separation gain you? You may want to swap out the DB in future? That almost never happens, not without some significant other refactoring going on (e.g. breaking out into separate DBs because the product's grown to the point of needing to be distributed across services, etc.). If you want to really design with that in mind, you're almost certainly giving up a lot of functionality that the particular DB is going to give you. It's on the level of holding onto the possibility that you'll want to change the language you're writing in at some stage (and that event would favour pushing more into the DB anyway).

In reality, what I've found is when there's a reliable, transactional, relational datastore at the bottom (i.e we're not talking about a larger distributed system), then for your own sanity you want to be pushing as much as possible down the stack as possible, to reduce the surface area that someone could reach through to change data in the wrong way. Data consistency issues are some of the worst when it comes to eroding customer trust, and if your data access/mutation paths are many and varied then you can do your head in or burn through a lot of dev morale trying to diagnose them.

The strongest advocates otherwise I've found are those in office cultures where there's little trust between devs and DBAs, and getting things through DB review is a rigmarole that ends up driving devs towards doing as much in code as possible. I've always suspected that beyond Google envy, these sorts of dynamics are what drove a lot of the initial NoSQL movement...



Thanks, the Derek Sivers piece rings a bell now (and maybe was going into my thinking on this), and the other two will go into my notes to avoid having to repeat myself the next time this one inevitably comes up on the office Slack...


I worked on one system that did intentionally put business logic almost entirely in the DB. The VP was a former database guy and the primary justification was auditing. It worked well enough in practice.

There’s a quote from Gavin King somewhere exhorting developers that ORMs and raw SQL are meant to be used concurrently: the ORM for CRUD ops, SQL for most everything else.

The problem I have with ORMs is lack of paging on collections; there’s always that one user that pulls in half the database.
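
To be fair, most ORMs can page when you ask explicitly; the trap is the implicit collection load. A minimal sketch of the explicit version in Django (model name hypothetical):

    from django.core.paginator import Paginator
    from myapp.models import Order  # hypothetical model

    # Slicing a queryset compiles to LIMIT/OFFSET, so only one page is fetched:
    recent = Order.objects.order_by("-id")[:50]  # SELECT ... ORDER BY id DESC LIMIT 50

    # Or with the built-in paginator:
    pages = Paginator(Order.objects.order_by("id"), per_page=50)
    first_page = pages.page(1).object_list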


In my experience, keeping BL out of the DB is a practical concern, not a dogmatic one.

Usually there are two things:

1. Performance. Business logic is usually horizontally scalable, databases are usually not. You want to pull as much BL out of the database as possible, and put it into stateless microservices so they can be scaled.

2. Version control and deploying changes to business logic. Business logic changes frequently. Do you want to have to be making that frequent of changes to your database? Do you have practices around safe deployment and code review for stored procedures?

I've worked at a place where there was lots of business logic (millions of LOC) stored "in the database". In this case the database was MUMPS. It can be done, and it can be pleasant. The catch is that vertical scaling is the only option, and you have to spend years building your own tooling from scratch.

I think the line between database and code is destined to become even more blurred, but not by bringing the code down into the database, but by lifting more and more database concepts up into "application space" as distributed computing becomes more normalized.


Yeah the performance point is interesting, and I think it has merit in a well-defined scope. The idea of bringing DB concepts up is something I've seen done, but it wasn't a pleasant experience overall and I think everyone involved is regretful that we went down that path. It's a good story though. Ramble starts:

The system was an enterprise-focused offering with weak product management, so in the early growth-oriented days the company ended up saying "yes" to a lot of feature requests that we probably shouldn't have. Where this particularly hurt was the permissions system, which evolved over time from an ACL with odd warts to also include a bolted-on hierarchy of users and their objects, then a role-based system, and then all sorts of enterprise-level oddities like cross-links in the hierarchy, shared subhierarchies, roles that could appear at multiple hierarchy points, virtual ACLs that behaved slightly differently, etc. So potentially a large number of different access paths between a given user and resource.

Years ago, when this mess was starting to expand it was decided it was too hard to model this in the DB (and really that should've been a giant red flag), so the new approach was to load each customer's dataset up into an application server that would handle the BL, crucially including the permission logic. It very much wasn't stateless, as you mention, but I'm not sure how it could've been really, given you needed so much loaded up from the DB in order to make these permission decisions. Would've avoided a lot of headache if it had...

The consequence though of using this write-through in-memory cache was it became the source of truth.

The chief problem this led to was that shift into application land was a bell that couldn't be unrung. Everything others in this thread have complained about seeing BL spread across all sorts of SPs in a DB happened here, just in application code (which again hints I guess that the chief problem is architectural and lack of governance, not a wrong-layer problem). Nobody could properly describe the permissions system in response to support requests without a lot of code scouring, let alone hold it in their heads.

Even worse, as the application grew in size and needs, we found we still needed things we had left behind in the DB. A couple of smart devs working on the core service, because they were smart and trusted, convinced themselves that what we needed was an in-memory transaction system for managing this in-server cache (by now the core was being called the Object Cache, a name so generic that it also should've been a red flag). So a couple of years went into implementing a technically impressive reimplementation of a transaction system.

Meanwhile the system as a whole was well past the point of needing to be split up into multiple services, so a grand goal was set of moving the Object Cache into a standalone service: the Object Service. Slap an OData API on it, and then leave that API open for all internal teams to hit. By this point the core team who owned this was becoming well aware they had fallen into a bad pit of reimplementing the Postgres DB everything still sat on: transactions, a generic query API, configurable in-memory indexes for each type of object, partition loading logic for the timeseries sets, user permissions, etc. Worse, the generic query API (and this is what's ultimately turned me off OData/GraphQL etc. for cross-team interfaces) ran into all the same problems as an SQL interface - people in other teams would always be coming up with brand new queries your indexes hadn't anticipated, forcing new work on the keepers of the cache to support them.

The way forward probably would've been to leave the permissions structure in place and pull as much as possible out of the service/cache into separate standalone services, i.e. leave the objects in the object cache as little more than IDs you could look up elsewhere for actual attributes. We'll never really know though: it was recently decided to put the whole thing into maintenance mode, partially because nobody has the political will to fix the thing and its complexity.

I've thought a lot about the lessons of this system and where it went wrong, and I trace it ultimately to forging ahead with a permissions system (i.e. core BL) that couldn't be modelled in the DB. I think that doomed us to what I'd ultimately name second-system syndrome (i.e. the whole stack above slowly evolved into a poor Postgres clone). Perhaps if we had been ruthless in going to a CQRS design or similar very early on (which would again demand stateless services above), we could've pulled it off.

I think a big takeaway for me was to not take for granted the good things a RDBMS is giving you, especially transactionality. I think I'd bias these days towards looking for ways to scale up the powers the DB gives you (even if it involves pushing external integrations into the DB, maybe via some funny FDW or transaction log integrations, or maybe using a tiered solution of DBs, one layer as the storage layer and another layer above that acting as coherent transactors, a little bit similar to Datomic's internal structure), rather than rushing toward replicating core DB functionality in application code, which can be tempting initially as the easier-looking solution.

I think I might blog about this when the system's properly retired and the involved can look back and laugh rather than cringe...


As a rule, relational database systems have two strong areas: parallel processing and data consistency enforcement. I've always understood "don't do business logic in the database" as meaning not to use the database engine for sequential processing.

If you have a single-user database system (e.g. webapp with database backend), there is relatively little to gain implementing data validity checking in the database, other than the declarative statements versus imperative rules discussion (which can already be a huge benefit, depending on the team).

But once you have a central database with multiple frontends (common in enterprise ERP solutions), enforcing data consistency in the backend becomes pretty much unavoidable -- otherwise a single bug or new feature in one frontend could disable entire production lines.

> when there's a reliable, transactional, relational datastore at the bottom [..] you want to be pushing as much as possible down the stack, to reduce the surface area that someone could [..] change data in the wrong way

Yes, this. Very much this.


I'm working on a project that is almost literally your example of reducing a relational database to a key value store, despite client applications having complex problem domains to model.

The consequence has been catastrophic data consistency problems and the whole programme rapidly grinding to a halt.

I think the problem is the overly broad definition of 'business logic' that encompasses everything from data integrity to presentation logic.


I've seen "business logic" refer to both A) how a discount promotion works and B) how to arrange images of the product for a particular set of customers. Obviously it's fine to have A in the database...


thanks for bringing that up. everyone on a team needs to have the same working definitions, and yeah... "business logic" is a term that's used a lot, but I very often see people not really understanding that it's usually meant in opposition to "display logic", not just as an interchangeable term for "any code".


Business logic being in stored procedures is the ideal place for it.

The problem with it is technical: the tooling for migrating and managing schemas and stored procedures is hot garbage, and there aren't good ways to enforce consistency between the invariants of the applications and the database.

Really, it should be possible, on both application deploy and when stored procedures are updated, to import library functions that the stored procedures use from the database and run unit tests against them to ensure that all clients are compatible.
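
In the meantime, one can approximate that with a contract test that runs on deploy; a rough sketch using pytest and psycopg2 (the procedure, DSN, and invariant are all hypothetical):

    import psycopg2

    def test_calculate_discount_contract():
        # Run on application deploy and whenever the sproc changes: check that
        # the stored function still honors the invariants the application assumes.
        conn = psycopg2.connect("dbname=app")  # hypothetical DSN
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT calculate_discount(%s, %s)", (100.00, "GOLD"))
                (discount,) = cur.fetchone()
                assert 0 <= discount <= 100.00  # never exceeds the base price
        finally:
            conn.close()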


Have done stuff like this where the queries were quite complex and essentially BI-style. Wrote a system to store and manage the procs in VCS and manage them through the ORM migration system (ORMs are quite à la carte).

You're either generating and submitting the queries through prepared statements or just storing them in the database. It's not "leaking" IMHO to do the latter, and in fact it can come with benefits; you can call the procs while connected to the database through various other clients, be it command line or a BI system. So in effect you are encapsulating the logic in the place with the most potential for re-use.


It's great when you have good database ci systems and have your database schema versioned + sprocs in source control.

But I've worked places where the DBAs refused to use version control.


With the MS stack it's really easy. You can design the schema and procs, views, etc. in Visual Studio, then have it compare against each environment to create the deployment script.


I like ActiveRecord. I like it so much I've implemented the Active Record pattern myself and compared my results with ActiveRecord.

That said, I've come around to the Repository pattern and quite like the Ecto way of doing things. It's a lot more flexible at the expense of some simplicity. It makes some really hard things easier than AR can.


This but Hibernate + Spring Repositories. Writing SQL is fine but the 95% CRUD cases are much better served with a decently managed ORM. We have an older project here using iBATIS, MSSQL and lots of stored procedures/SQL-first approach. See this tutorial for a preview of the horror to come: https://www.tutorialspoint.com/ibatis/ibatis_create_operatio...

'Employee.xml'

Now imagine this but with stored procedures and inserts with tens of parameters.


I've worked on a system where almost all of the business logic was handled as stored procedures (apart from things that belonged at the UI level, of course). Worked fine, was easy to debug, and performed great. Sure, you had to know how to actually write good SQL, which threw off kids who think programming equals using the JavaScript framework du jour, but I don't see it as a shortcoming. No XML anywhere either.

At some point they rewrote it as a web app with NHibernate and all that. Took them literally many hundreds of man-years, and it still runs like molasses. And profiling whatever crap NHibernate emits is a rather joyless enterprise.


As a fellow AR lover the only things that consistently bug me are when it does things like:

- .count on a collection calls SQL every time, but there is also a .count method for arrays - which means you need to remember to use .length if you just want to count the objects loaded into memory

- ‘build’ methods for an association usually only initialize a record in memory, but with has_one associations it deletes any record that already exists

So basically when it’s not clear about the fact that it’s doing database things.


There's a .size that works for pretty much everything: if the relation isn't loaded then it will do a SELECT COUNT(*), but for arrays and loaded associations it takes the length of the array.


A fair comparison would be ActiveRecord vs raw Arel.

I.e. ORM vs query builder.


So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations because some genius who also hates ORMs decided to do a giant project that way?

Then you call your alternative anti-ORM and it generates... ORM specs, and then you say "migration commands based on the git history"... what does that even mean? What does git history have to do with the database?


Any SQL wrapper worth its salt provides some mechanism to specify positional (or even named) parameters within that string (without resorting to string interpolation or concatenation) and pass them in when executing the query. Usually it'll be something like

    foos = execute_sql("SELECT * FROM foo WHERE bar = ? AND baz = ?", [bar, baz])
If you're resorting to concatenating strings, then you're almost certainly opening yourself up to SQL injections (and rather needlessly, I might add!).


Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much easier to deal with via the SQL Expression bits. (Though technically, that's not an ORM.)

Getting the library to properly expose the formatting interface is also a trick. We would do massive loads/dumps that we did not want to use Python for (but we worked in Python), as the DB-API pulls the entire result set into RAM and we were pulling >RAM (and we didn't want to do something lower-level w/ the driver). We'd spawn psql, but that needs the query as a complete, pre-templated string — it does not support any sort of positional parameters. And the Python library really doesn't (last I checked) have a public interface to the templater portion of the library. (I think this is b/c PG just sends the parameters, and the binding is done server-side. So, really, psql needs an interface to supply those values as arguments to the CLI.)


> We'd spawn psql, but that needs the query, as a complete, pre-templated string — it does not support any sort of positional parameters. [...] So, really, psql needs an interface to supply those values as arguments to the CLI.

I haven't tried this before, so caveat emptor, but per psql's manpage this should be supported:

    QUERY="SELECT * FROM sometable WHERE name = :'foo'"
    echo "$QUERY" | psql -v foo=asdf # ... other args ... (quoting $QUERY stops the shell from glob-expanding the *)
Basically: you can use colon-prefixed variables in the query text (with the variable name optionally quoted to indicate how the result should be quoted, apparently?), and set them with the -v option (which is equivalent to using the \set command within the query string itself).

This also happens to work for table names, at least per the example in the manpage:

    testdb=> \set foo 'my_table'
    testdb=> SELECT * FROM :"foo";
Except in this case it'd be more

    echo 'SELECT * FROM :"foo";' | psql -v foo=my_table # ... other args ...
Still no option there for positional parameters, but it's a start, right?


> Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much easier to deal with via the SQL Expression bits. (Though technically, that's not an ORM.)

Thin SQL wrappers/DSLs like the SQLAlchemy expression library are great. IMO those thin wrappers are what most people should reach for first, over a full-blown ORM. A good minimal SQL wrapper will:

* Save people from pain, problems, and security issues involved in building queries through string manipulation.

* Map very closely to raw SQL

* Make it easy to compose query expressions and queries, using language native features.

* Help devs develop their SQL skills. Learning the wrapper IS learning SQL, for the most part.

I have a hard time seeing many good reasons to add any more layers of abstraction on top. That last point is particularly important to me. My first exposure to many advanced SQL techniques was through such libraries. Since the code maps to SQL quite naturally, that learning can be applied in a much wider variety of contexts. Teach someone how to do aggregations using Django's ORM, and you've taught them to do aggregations in Django's ORM, and basically nowhere else.
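
To illustrate the composability point with the SQLAlchemy expression language mentioned above (table definition hypothetical; 1.4-style select()): queries are ordinary values you can extend and reuse, and they still read like SQL.

    from sqlalchemy import MetaData, Table, Column, Integer, String, select, func

    metadata = MetaData()
    users = Table(
        "users", metadata,
        Column("id", Integer, primary_key=True),
        Column("country", String),
    )

    # A reusable base query...
    by_country = select(users.c.country, func.count().label("n")).group_by(users.c.country)
    # ...extended with another clause, with no string surgery:
    recent_only = by_country.where(users.c.id > 1000)

    print(recent_only)  # renders parameterized SQL close to what you'd write by hand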


I think the issue the parent is referring to is when you have optional clauses in your statement. Like if you have a table with 5 update-able columns (where someone might want to update any 1 to 5 columns that you don't know in advance), and you want some generic code that updates a row, if you're writing raw SQL then you have to do concatenations to build up the "set clause":

    UPDATE sometable SET {set clause} WHERE id = ?
The "set clause" might just be "foo = ?", but it might be "foo = ?, bar = ?", or "foo = ?, baz = ?", etc.


SQL's got you covered here:

    UPDATE sometable SET
        foo = COALESCE(?, foo),
        bar = COALESCE(?, bar),
        baz = COALESCE(?, baz),
        bam = COALESCE(?, bam),
        bat = COALESCE(?, bat)
    WHERE id = ?
And then you'd pass in NULLs for any columns not being updated at the moment.

Confirmed that this works exactly as expected in SQLite:

    $ sqlite3 optional_column_test.db
    sqlite> CREATE TABLE sometable (id INTEGER PRIMARY KEY, foo, bar, baz, bam, bat);
    sqlite> INSERT INTO sometable (foo,bar,baz,bam,bat) VALUES ('You', 'are', 'a', 'bold', 'one');
    sqlite> SELECT * FROM sometable;
    1|You|are|a|bold|one
    sqlite> UPDATE sometable SET
       ...> foo = COALESCE(NULL, foo),
       ...> bar = COALESCE('ain''t', bar),
       ...> baz = COALESCE(NULL, baz),
       ...> bam = COALESCE(NULL, bam),
       ...> bat = COALESCE(NULL, bat)
       ...> WHERE id = 1;
    sqlite> SELECT * FROM sometable;
    1|You|ain't|a|bold|one


I'd never heard of COALESCE before! I'm very very far from a SQL expert, but I'm surprised that's never popped up for me before.

Are there any bad performance implications here? Like, will it still rewrite columns that don't need to be updated?


Good question.

The answer to that is almost certainly "it depends on your DB". Most query planners should be smart enough to not try to write anything, but some are smarter than others. For example, SQL Server (if I'm understanding right) does perform a log write even if it should be obvious that the value won't change: https://www.sql.kiwi/2010/08/the-impact-of-non-updating-upda...

That being said, I'd be skeptical of the performance impact (if any) being all that significant; even in a worst-case scenario of "yeah, it's going to stupidly write every field every time", the columns for each row should be pretty close together both in-memory and on-disk, so unless you're writing a whole bunch of giant strings or something at once (and even then) there shouldn't be a whole lot of seeking happening.

tl;dr: assuming there's a minor performance impact is reasonable, but it should probably be measured before trying to optimize it away.


So you mean I'm writing code like this, in a language somewhat like TypeScript but simplified for my purposes.

   function update(id: ID_TYPE, value: Partial<sometable>) {
      const set_clause = value.keys.map(k => k + " = ?").join(", ")
      const params = value.values.concat([id])

      db.query("UPDATE sometable SET " + set_clause + " WHERE = ?", params)
   }
How is this code not terrible? It shouldn't pass code review. Once you concatenate strings to generate your SQL, all bets are off. There will be a day when your object doesn't match your assumptions. Either you have a series of different functions, each of which will change under different circumstances; or you should just be generating a function like this directly from the schema, in which case you might as well generate the code you want at build time so there are no string concatenations anywhere.

Is there some other circumstance when you want that?


It is terrible. That's the problem. No one wants to write code like that, so they turn to an ORM to do it for them, in a hopefully safer way.


So I want to respond to this because there is a genuine question behind your rhetoric. And that question is "How do I cleanly do complex things in SQL without an ORM?"

And there is an answer to that. The idealistic answer is good DB design from the start. The realistic answer is by using views (sometimes materialized views) to create a clean representation out of odd tables.

The monster query can be decomposed into simple and clean operations against these views (and if necessary, views upon views).

The only caveat is you need a SQL expert to understand the performance if you want to do this at scale.
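
A sketch of what that decomposition can look like in practice (the schema is hypothetical): each view names one clean concept, views build on views, and the former monster query collapses into a simple select.

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        # Layer 1: a clean representation over an odd base table.
        cur.execute("""
            CREATE VIEW active_customers AS
            SELECT id, name FROM customers WHERE deleted_at IS NULL
        """)
        # Layer 2: a view upon a view, naming a business concept.
        cur.execute("""
            CREATE VIEW customer_revenue AS
            SELECT c.id, c.name, SUM(o.total) AS revenue
            FROM active_customers c
            JOIN orders o ON o.customer_id = c.id
            GROUP BY c.id, c.name
        """)
        # The "monster query" becomes a simple operation against the views:
        cur.execute("SELECT name FROM customer_revenue WHERE revenue > 10000")
        rows = cur.fetchall()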


I wrote SQLAlchemy after a five year job where all we did was work with enormous views in an Oracle database. They performed nightmarishly badly. The slightest attempt to query from one of these views using something as heretical as a join would kill the database. The source code for the views was locked up tight by DBAs who had no idea how our application worked, didn't care, and could never be bothered to help us with their horrendously inefficient views or to give us new ones that our application sorely needed in order to get data from the database in an efficient manner.

SQLAlchemy exists exactly because of the pain the "just use views" approach causes. The entire point is that you can structure a query of arbitrary complexity in a succinct manner using Python structures, while not losing any of the relational capabilities. Of course you can write queries that are too complex; if you're then lucky enough to have materialized views at your disposal, you can turn the SQL you wrote into one. But that's an optimization you can use if you need it, or not.


On the one hand the article itself is offering an anecdote about being unhappy with ORMs. On the other hand you have an experience in which you were unhappy with views.

That would leave us at an impasse (he said / she said), unless we get more nuanced. So let's.

It sounds like some of your frustration is at some DBAs you knew. Firstly, I've never worked with a DBA; I'm just a senior software engineer / DevOps, and to me that includes an intimate knowledge of SQL performance, so maybe this is why I don't mind an in-database approach.

I'm fairly sure that whatever SQLAlchemy can do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).

Now maybe you agree with that, and see ORMs as a tool for the kind of job where the dev is locked out of the database. If so then I can't really disagree as I haven't worked at such places much at all.


> I'm fairly sure that whatever SQLAlchemy can do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).

if you have the option to use materialized views, that can be helpful when appropriate, however materialized views have their own issues, not the least of which is that they need to be refreshed in order to have current data, they can take up a vast amount of storage, not to mention a lot of databases don't support them at all. It should not be necessary to emit CREATE MATERIALIZED VIEW with super-user privileges and to build the infrastructure to store them and keep them up-to-date for every query one needs to write or modify that happens to have some degree of composed complexity.

you don't need an ORM for these queries either, just a composable query builder (like SQLAlchemy Core) that allows the composition task to be more easily organizable in application code.

none of this means you can't use views but it's unnecessary to dictate that they should be the only tool available for dealing with large composed queries.


> I'm fairly sure that whatever SQLAlchemy can do with a monstrously complex data relationship, views can do with better performance (the further the abstraction gets from the engine, the less performance optimization it can do).

SQLAlchemy will spit out an SQL query. Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.


> Whatever the DB engine can do with queries hand-written on top of views, it can also do with the query generated by SQLAlchemy.

No, on two levels. Firstly, there are many features in SQL that an ORM will not support (e.g. no substitute for a materialized view).

Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column). Part of being expert in DB technologies is knowing how/why some ways of writing a query are very fast, and others are very slow. The idea of an ORM is to hide that complexity, which is fine for a toy todo-mvc app. But once you start querying tables with 100k rows you need to understand that complexity and master it if you want to write a fast query.


> Secondly, once you get really good at SQL, you learn that very tiny seeming things can make the difference between a query running for an hour or a second (e.g. case-insensitive search against an indexed column).

SQLAlchemy supports case-insensitive searches, per-column / per-expression collation settings, index and other SQL hint formats for all databases that support them, e.g. SQLAlchemy's query language supports most optimizing SQL syntaxes, with the possible exception of very esoteric / outdated Oracle things like which table you list first in the FROM clause (use index hints instead). We are adding esoteric performance features all the time to support not just SQL-level tricks but driver level tricks too which are usually much more consequential, such as the typing information applied to bound parameters matching up for indexes as well as special driver-level APIs to improve the speed of bulk operations.

SQLAlchemy has been around for thirteen years, plenty of SQL experts have come along and requested these features and we implement them all. Feel free to come up with examples of SQL-expert level performance optimizations that SQLAlchemy doesn't support and we'll look into them.
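
For instance, the case-insensitive example upthread might look like this in SQLAlchemy (a minimal sketch; the model is hypothetical): an expression index plus a filter written to match it keeps the lookup indexed.

    from sqlalchemy import Column, Integer, String, Index, func
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        email = Column(String, nullable=False)

    # Expression (functional) index so lower(email) lookups can stay indexed:
    Index("ix_users_email_lower", func.lower(User.email))

    # A query written to match the index, e.g. inside a session:
    #   session.query(User).filter(func.lower(User.email) == address.lower())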


Yeah, don't get me wrong, I haven't used SQLAlchemy and I'm not trying to besmirch it. I'm not even sure we disagree.

What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work. My point about views is that in my experience, the answer to disgusting queries is cleaning up the DB design (and views can be the tool to accomplish this). My point about case-insensitive searching (you can create a case-insensitive index if you want) is that a lot of db-performance stuff just can't be solved on the query side alone anyways.

It sounds like SQLAlchemy is designed with a lot of flexibility around how queries are run, so maybe you agree that understanding what's going on beneath the hood is important to handle these complicated cases.


> What I'm trying to challenge is the philosophy that an ORM is an adequate facade in place of learning how databases work.

Thank you for this response and I agree, we are likely on the same page. I don't know that anyone actually espouses that philosophy. This is the anxiety that ORM skeptics have, and certainly you get beginners who rush into using ORMs not knowing what they are doing, but if someone wants to be a DB expert, I'm pretty sure they go read about databases :) These ORM beginners will fail either with the ORM or without it. I guess the idea is you'd prefer they "fail fast", e.g. the ORM covers for them while they proceed to screw up more deeply? This is arguable; if you've seen much of the non-ORM raw DB code I've seen, it too fails pretty hard. But even with this argument, if ORMs produce the problem of incompetents who are not found out fast enough, why foist the denial of useful tools on those developers who do know what they're doing?


You have shifted the discussion from the ability of the DB engine to optimize queries to the ability of a developer to write fast queries.

But in any case, we are not discussing some abstract ORM whose "idea is to hide complexity", but SQL Alchemy whose idea (one of them, at least) is to allow you to write composable queries yet still retain full power of SQL.


You are correct about needing to understand the underlying DB to be able to optimise it. I consider myself lucky to have come from a raw SQL background and to have a deep understanding of how query planners work.

You are incorrect in your other assertion. SQLAlchemy allows for optimising your queries at runtime in a way that just isn't available to SQL views. The beauty of SQLAlchemy is that it allows for arbitrary composition of blocks of SQL. I too have had to fight with DBAs in a previous life because they didn't want me to construct a query in code (non-SQLA), but I eventually won because I was orders of magnitude faster, because I had runtime knowledge they couldn't use.


Many of us also know stories about how ORMs can be slow if developers don't pay sufficient attention. In some cases told by database experts who see the query code locked up tight by the app team, etc :)

Having a separate "DBA" from the developers or having an adverserial relationship with said person is not really a technical problem I think.


I've had similar issues, but I got the added benefit of being "DBA-splained" on why what we were doing was terrible.

I feel like if a DBA keeps thinking of the database as the solution to all of the problems (by implementing views, triggers, stored procedures, etc) then maybe that person should be willing to support them, otherwise they really shouldn't complain that the developers moved to an ORM.


Okay, obviously I've never worked in an enterprise enough company. I've always assumed a DBA was just a member of a backend team who likes thinking about SQLy stuff. Or a sysadmin who upgrades the computer the db runs on. Basically, a hat someone wears.

Do I seriously make the correct assumption that the people writing the backend of many apps are actually separated from the people who decide how the data those apps process is stored, and that you may have no direct say about the database schema? That you're effectively just a client - perhaps the only client, but nothing more than a client.

Obviously you've described the disadvantage of this approach; what advantage does it have?


It makes some sense when the data is extremely valuable, like in a bank. As a developer you can only access the db through stored procedures created by the database team. Which limits the damage you can do. I haven't worked in a big enterprise in years though so I don't know how common this still is.


Wait, so you wrote this library to solve the political issue of crap DBAs?


Code inside a database is a nightmare to maintain.

Just like Data in a codebase ;-)


> The idealistic answer is good DB design from the start.

The problem with this statement is that DB schemas, like code, can evolve badly over time, even if they're maintained by a DBA (since they're human after all, too).

And then the DBA starts to suggest that you use things like triggers in your DB, which may or may not be good, but often surprise developers. "I just wrote 0 into the DB, why did it turn into NULL? What line of code did that? Ohhhh... there's a trigger."


There were multiple literal questions embedded in his rhetoric which, without answers, invalidate the author's solution.


> The idealistic answer is good DB design from the start.

I don't feel like this answers the question. My ORM supports views.


> What does git history has to do with the database?

Hopefully a lot. Non-versioned database schemas are the devil.


Indeed this is a bad idea for many reasons.

I tried to think about it from a gitflow release cycle perspective to be generous. This would mean constant churn of migration code for each branch of the repo.

If you have complex migrations which go beyond simply renaming columns, it would require you to write special migration code on a release branch which would then need to be merged back. Yuck!


In what way is manually tracking the needed DB changes per feature branch even possibly better? Even if you are not running your migrations automatically on deploy, I would still check those migrations into version control.

Having auto-runnable migrations checked in alongside the code that depends on them allows you to review and test those migrations as part of your normal code review and CI processes.

I think there must be some misunderstanding here because I don't understand why you say either of these things:

> This would mean constant churn of migration code for each branch of the repo.

> it would require you to write special migration code on a release branch

There should be at most a few migrations per feature branch, and merging is only an issue when you have two feature branches with inconsistent schema definitions. You should only have to add custom migrations to a release branch to fix bugs, and those should be merged back into your develop branch as soon as possible to minimize effort spent merging later.


I think you and the post you're replying to have opposite opinions, despite your post seemingly starting off by agreeing with them.


No, I agree with the post I replied to in that migrations should be human versioned as opposed to auto-versioned via a VCS. Again, I was being generous to the author trying to imagine a scenario where this would not create a dumpster fire.

Hopefully I have cleared that up.


This isn't quite the same as being solely git-based, but there are numerous examples in the industry of successful use of auto-versioned declarative schema management. By this I mean, you have a repo of CREATE statements, and a schema change request = making a new commit that adds, modifies, or removes those CREATE statements. No one actually writes a "migration" in this approach.

Facebook, who operate one of the largest relational database fleets in the world, have used declarative schema management company-wide for nearly a decade. If engineered well, you end up with a system that allows developers to change schemas easily without needing any DBA intervention.


This is interesting but I am curious about some of the implications. I am under the impression that migrations encompass more than schema changes. They may require data transformations, index rebuilds, or updates to related tables. I would agree that these things are preferably avoided, but in practice it seems that they happen frequently.

I viewed the author as targeting smaller dev shops who don't have a dedicated DBA/expert. Groups who might use an ORM's migration framework for example. In which case, it would appear that there is significant loss in migration flexibility if you remove application code from the architecture.


When operating at scale, it becomes necessary to have separate processes/systems for schema changes (DDL) vs row data migrations (DML) anyway.

For example, say you need to populate a new column based on data in other columns, across an entire table. If the table is a billion rows, you can't safely run this as a single DML query. It will take too long to run and be disruptive to applications, given the locking impact; it will cause excessive journaling due to the huge transaction; there can be MVCC old-row-version-history pileups as well.

Typically companies build custom systems that handle this by splitting up DML into many smaller queries. It's then no longer atomic, which means such a system needs knowledge about application-level consistency assumptions.
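
A bare-bones sketch of that splitting (table, columns, and batch size hypothetical): walk the primary key in ranges, one short transaction per batch, and accept that the whole is no longer atomic.

    import psycopg2

    BATCH = 10_000
    conn = psycopg2.connect("dbname=app")  # hypothetical DSN

    with conn, conn.cursor() as cur:
        cur.execute("SELECT coalesce(max(id), 0) FROM big_table")
        max_id = cur.fetchone()[0]

    last_id = 0
    while last_id < max_id:
        # One small transaction per batch: short lock windows, small journal
        # entries, no giant MVCC version pileups.
        with conn, conn.cursor() as cur:
            cur.execute(
                """UPDATE big_table
                      SET new_col = old_a || '-' || old_b
                    WHERE id > %s AND id <= %s AND new_col IS NULL""",
                (last_id, last_id + BATCH),
            )
        last_id += BATCH

    # Not atomic as a whole: rows written after the max(id) snapshot still need
    # handling, which is exactly the application-level knowledge mentioned above.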


> "migration commands based on the git history"... what does that even mean?

It actually seems like a really cool idea. E.g.

* Commit A: create a schema.sql file "create table t (a INT);"

* Commit B: update the schema.sql to "create table t (a INT, b INT);"

* The tool can generate a migration from A..B "alter table t add column b INT;"

It seems like it probably would work really well for creating new tables and columns in simple projects. In more complex projects I've worked on, we usually had more complicated migrations to update the data to fit the new schema, and we often had to be careful to interleave schema, data and code updates (especially if you wanted to be able to rollback).


Automatic migrations using only source-code snapshots don’t work for things like column renames or splitting/combining columns. Every tool I’ve used requires some migration metadata.


> "migration commands based on the git history"... what does that even mean? What does git history has to do with the database?

I'm not sure what git-specific interaction the author is envisioning, but more generally: declarative schema management tools are based on the notion of using a repo to store only CREATE statements. It's an infrastructure-as-code style approach, where to modify an existing table, you just modify the CREATE definition. The tool figures out how to translate this to an ALTER.

I'm the author of an existing tool in this space, Skeema [1], which supports MySQL and MariaDB. Other recent tools in this space include migra [2] and sqldef [3]. From what I understand, in the SQL Server world there are a number of others.

[1] https://www.skeema.io

[2] https://github.com/djrobstep/migra

[3] https://github.com/k0kubun/sqldef/


> So you know one guy that quit over having an ORM instead of using SQL... do you have any idea how many quit due to having to deal with thousands of lines of SQL string concatenations because some genius who also hates ORMs decided to do a giant project that way?

Those aren't the only alternatives. Query builders exist.


A lot of languages have pleasant multi-line string literals. Also if you have a 1000 line hand-crafted SQL query, it's probably not going to be pleasant in your ORM either.


I know that the author of SQLAlchemy is reading these comments and I don't want to be mean, but anecdotally I hear a lot of pain stories, and the OP largely seems to deserve the title "Don't Use SQLAlchemy".

Nobody suggests that ActiveRecord is perfect, but it is pretty amazing at what it does. I think a big part of the reason it's more reliably excellent is that Rails in general embraces the notion of "convention over configuration" - there's typically a correct naming scheme (with opinionated pluralization) that starts in the table scheme and flows straight through the models to the controllers and finally the view layer. I can join your Rails project and make guesses about where your logical components are located and what they are called that would beat any TV psychic cold 100:1.

Plus, much like with SQLAlchemy and basically every other modern ORM... you are always one method away from writing raw SQL if you want to be.

As always: use the tools that work best for you and for the vast majority of web development cases, you'd have to pry ActiveRecord out of my hands.


I use SQLAlchemy to serve tens of thousands of requests per second on one of my services at Reddit. It's painful, but possible.

SQLAlchemy uses the data mapper + unit of work patterns. This means that your connections and your record objects have the same lifecycle. If your request takes 200ms, then you are likely holding open your connections for 190ms, even though the only SQL query takes 20ms. If connections are scarce, and they usually are, you will not utilize them well at all.

There are two ways that I know of to solve this, neither of which is pretty: (1) run pgbouncer as a sidecar on your python processes. This has the effect of making connections a far cheaper resource. (2) After every single query, manually close the related transaction/connection/cursor in your code.
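
For anyone in the same boat, option (2) amounts to ending the session's work as soon as the query is done rather than at the end of the request; a rough sketch (engine URL and query hypothetical):

    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import sessionmaker

    engine = create_engine("postgresql://user:pw@host/db", pool_size=10)
    Session = sessionmaker(bind=engine)

    def handle_request():
        session = Session()
        try:
            rows = session.execute(
                text("SELECT id, title FROM posts LIMIT 20")  # hypothetical query
            ).fetchall()
        finally:
            session.close()  # the connection returns to the pool right here
        # The slow ~180ms of the request (serialization, rendering, etc.) now
        # runs without pinning a database connection.
        return rows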

ActiveRecord defaults to giving up the connection after every query, which in a high-traffic webapp, is much less of a footgun.


(No significant experience with either SQLAlchemy or ActiveRecord, but work in Java at a shop that really strongly emphasizes the unit-of-work pattern.)

"give up connection after every query" is, IMO, not _less_ of a footgun, but a _more subtle_ footgun. That is to say, sure, for the most part, it will work ok.

_But when it doesn't_... like when your database goes down for a couple hours, or your rack loses network, or your proxy to whatever cloud provider is running your database, or maybe some other dependency had an outage between the queries... you end up not really knowing what executed and what didn't, with some fraction of your 10k req/sec leaving behind some dangling state -- a partially processed $whatsit -- and no clear/easy way to fix it or even necessarily identify how many $whatsits were affected. At least a few times, I've needed to either wait until records make their way into some OLAP system for offline analysis, or deploy some fixup code, or emulate either/both to fix stuff up after something weird happens.

At least that's what I am constantly afraid of. Do you also worry about this kind of stuff? If not, what prevents it from being a problem in your environment?

Signed,

The guy constantly badgering seemingly everyone about minding their transaction boundaries.


From my experience, there’s always one developer in every enterprise dev shop who understands transaction boundaries, isolation levels, and concurrency/reliability scenarios that do happen....and 10 other devs constantly instilling fear in him/her


But he's not saying "don't use transactions." He's saying "Don't keep your transaction open for your entire user connection for no reason. Close your completed transactions as soon as you can."


Great point, thanks!

Fun story: A coworker and I are in the process of tracking down violations of exactly this in a misbehaving app, due to an accumulation of RPCs within transactions across the entire app. (I used to be responsible for it as a greenhorn ~5 years ago, but walked away for the last several years to work in a different part of the org.)

The approach is actually kinda interesting: we add an interceptor in the RPC layer for every outgoing call that asks the DB machinery “am I currently in a transaction?”, and emits a warning (containing at least the RPC being made, perhaps a (partial) stack trace) if so. We had so many occurrences of those warnings that our logging framework proactively started dropping log lines! So the next step is to only emit the warnings once per $duration per message.

Another day in the life...


Just to hopefully further contribute, one of the top-deck features of the new Rails 6.0 release is its really thoughtful engineering work around multiple database support. At least, that's how it's named or described; the actual net result is a lot of smart people exhausting themselves making n-instance DB stack reliability and performance better on every chart... including setups where the value of n is 1.

What this translates to is that things should be impressively stable when your DB nodes go down, the backhoe shows up to party, or your BGP is routed to North Korea.


You can divorce your objects from the session if you want. But you lose transactions then, so with any approach that involves giving up the connection it seems to me you’ll create a whole other class of headaches for yourself.

I’m lucky in that I have low requests per second so I never have to worry about any of that too much. I don’t envy you having to deal with that sort of load.


Quick reminder that advisory locks may need to be released by the same connection that acquired them...


pretty sure pgbouncer should be a sidecar to the postgres database itself and not the individual process. but otherwise, yes, it's a great answer.


Entity Framework, even with its quirks, has been the best developer experience with an ORM for me, no contest. A lot of that is probably tooling, like the standard C# tooling and also LINQPad.

SQLAlchemy is probably close to the worst experience, and that's likely largely due to Python and Python's tooling state. And the docs. Just something about the docs. Still, other than bulk insert performance, I got it to do just about everything I needed and came to the conclusion it was the best Python ORM for my needs.


For simple queries, sure, SQL is great, but it is not composable. Ideally you want to send your whole query to the database at once, and for a complex statement that may mean many, many lines of reasonably complex joins etc. With SQLAlchemy you can write a bunch of simple components and join them together for submission at once, instead of having a thousand lines of SQL that can be hard to read. You can re-use parts and even come back later and understand clearly what is happening. How do I know this? 25 years of writing SQL and at least 6 using SQLAlchemy. A lot of the time I will still compose and test in SQL using, say, DataGrip or psql, then build an SQLAlchemy query from that.

This guy should look at Alembic too. It has its faults (like the half-module, half-executable runtime), but it solves most of the problems of just having a folder with a million naively numbered SQL scripts, like managing up and down versions as well as branches.


SQLAlchemy can accept raw SQL with:

  result = db.engine.execute("<sql here>")
I haven't met an ORM that wasn't more work to implement, especially when trying to keep schema changes in line with existing deployments. That said, I'll take a decent DB client SDK any day; I don't want to spend too many cycles on connection pooling and such. I just want it to handle variable substitution and concurrency primitives at the language level.

I don't think there's anything wrong with using tools to help you generate good SQL queries, but you should see them clearly and own what the database is doing. MySQL and Postgres have a myriad of useful functions and extensions that ORMs either don't utilize or hide from the developer. That leaves the application to do multiple queries or a lot of logic that isn't needed.

It doesn't have to be gnarly; any SQL query that can't be handwritten on an index card is probably too cumbersome to maintain. A 1000 line SQL query is a code smell that means the data model could probably be improved, perhaps with views.


As someone who's used both SQL and the Django ORM, I could ask: "Why is SQL so backwards?"

Here's a datapoint I experienced:

The company I worked for didn't trust the Django ORM for DB migrations. They thought they could do it by hand. After lots of deployment failures where the migration script worked in the dev environment but failed in the prod environment, we switched to using Django migrations. Haven't had an issue since.

The author even admits that migrations are a hard problem, and yet he has no solution other than just using raw SQL.

Edited to Add:

In Django, these migrations are handled pretty well with a single command to make the migrations (manage.py makemigrations). With raw SQL, someone would probably have to spend time writing and testing the migration script, and then it would need to be reviewed in the PR.


In my experience Django's migrations are really solid, assuming you're using the PG or Sqlite backends. This is especially true if you're working from a new Django project, and not a "legacy system" that wasn't initially built for Django.


They seem pretty solid for MySQL as well. I only remember having some issues with VARCHAR PKs, but not even sure whose fault that was.


We're using Oracle, and it's solid there too.


I find these ORM opinions generally come from someone who is used to, and proficient in, SQL, and only when they have a complex use-case/operation at hand.

If you consider a web project with many developers of different levels of experience, having an ORM is a lifesaver. Because not everyone is constantly mindful of SQL injections. Because not everyone is constantly mindful of transaction management in requests. Those take experience.
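
Concretely, with something like Django the parameterization happens whether or not the developer is thinking about it; a small sketch (app and model hypothetical):

    from myapp.models import Article  # hypothetical model with a `title` field

    user_input = "'; DROP TABLE myapp_article; --"

    # The ORM sends the value as a bound parameter rather than splicing it into
    # the SQL string, so this is just an ordinary lookup that matches nothing:
    matches = Article.objects.filter(title=user_input)

    # Even the raw() escape hatch takes parameters:
    matches = Article.objects.raw(
        "SELECT * FROM myapp_article WHERE title = %s", [user_input]
    )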

In my experience ORMs are great 95% of the time. The other 5% can be extremely painful (I've gotten bitten by Rails ORM transaction nesting problems, and sometimes it frustrates me when I need to make something more complex). But let's not throw the baby out with the bathwater.

I have had great experiences with Django ORM and SQL Alchemy. Rails ORM goes a long way, but it feels eons behind Django's.


Code reviews are for that: I've taught countless junior developers how to write better SQL through code reviews. I've introduced them to how to mitigate the common exploits and how powerful SQL can be when needed, and I believe they are better engineers because of it.

I really don't like using a tool as a crutch for a systemic deficiency of a team; in the long run you are only compounding the problem. The more complex the system gets, the more workarounds and gotchas you have with ORMs, so it's better to educate developers on how not to depend on them from the get-go.

Yup, ORMs have their place in some projects, but in the majority of what I've worked on they brought more pain and confusion than just properly effing learning SQL.


> I really don't like using a tool as a crutch to a systemic deficiency of a team,

Very valid point, but then this becomes a discussion of how much you want to create mitigations in the system vs. educating developers on topics like, for example, security. The larger a team becomes, the harder it is to police that every single query in your system is sanitized.

Still on the security side, software architecture tells us not to repeat ourselves, and if sanitizing query strings can be done centrally through an ORM, then why shift the responsibility to code review processes that are especially prone to failure due to their human nature?

As you said, these kinds of tools become a crutch for people who don't know SQL. People begin to rely on them instead of knowing the underlying mechanisms -- but hey, that's the price of any abstraction. The same argument was probably made when high-level language compilers (analogous to the ORM in this discussion) were introduced.


It boils down to this: how much complexity does a leaky abstraction such as an ORM add to the project, compared to teaching others not to make the same mistakes all over the codebase with raw SQL?

As with anything related to software design and architecture, it depends: on the size of the project, on how many teams work on the same codebase, and so on. I don't think using an ORM as the sanitisation layer for your SQL queries is a good enough feature to outweigh its drawbacks in complexity and abstraction leakage. The object model doesn't serve relational data very well; that alone is a first sign, and a smell, that an ORM should be used in very specific cases instead of as the default go-to solution.

Having to deal later in the application lifecycle with partial entities and the like (anything to do with partial data access compared to the models' defined fields) is really painful when done improperly, which is easy to do, and then we are back at square one, having to teach developers how to use an ORM framework properly. Also, one thing I really dislike (and that I believe will always increase a system's complexity) is making code less explicit: an ORM layer makes your code flow less accessible, because you need to know where the layer takes over and where it leaves you to implement the more complex use cases yourself. At that point you are back to writing SQL queries manually, or depending on the library to ship the specific feature your use case needs.

I don't hate ORMs, but I have my own mea culpa: I've used them improperly and learned from those experiences. Nowadays it's really hard for me to justify their usage outside of very simple data access patterns. I say that because, in the end, you'll always need to reason about what the ORM layer is doing in terms of SQL concepts, and translating between those layers in my head adds another kind of complexity: cognitive load.


> I find these ORM opinions generally come from people who are used to, and proficient in, SQL, and only when they have a complex use case or operation at hand.

I'm reasonably proficient in SQL. My practical blocker with ORMs in general has been that you obviously still have to understand your application, and you have to understand SQL to fix most things; and now you're effectively adding a new language and framework between the two.

> I have had great experiences with Django ORM and SQL Alchemy.

I think the best ORMs aren't ORMs at all, but larger frameworks like Django and Rails, since you're making the investment to learn that framework anyway.

And in fairness, I started with Hibernate, which is particularly complex, having its own query language. Many ORMs have gone the no-object route and try to represent relational concepts directly, especially jOOQ[1]. That's smart engineering, recognizing the problem and minimizing it, but at that point it's less of an ORM and more like numpy for relational algebra.

[1]: http://www.jooq.org/


> Because not everyone is constantly mindful of SQL injections.

I'm a senior developer, but I only started using SQL seriously a few months ago, and only for my own projects, so I don't have much experience; but it's difficult for me to understand this. Both in Python and in Go, every single tutorial tells you how to use parameterized queries, and every answer on StackOverflow warns against building SQL queries from ordinary string operations. Fifteen years ago I could see this being an issue; but is it really possible that in 2019 "inexperienced web developers" manage to write code vulnerable to SQL injection? Is it really only my hard-won paranoia, gained from decades of building operating systems in C, that made me immediately take these warnings to heart?

(That's a genuine question, not a rhetorical one.)
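
For what it's worth, the safe pattern every tutorial teaches really is a one-liner. A minimal runnable sketch with Python's stdlib sqlite3 (the users table and the hostile input are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    evil = "x'); DROP TABLE users; --"  # hostile input
    # Vulnerable: string formatting splices the input into the SQL text itself.
    #   conn.execute(f"INSERT INTO users (name) VALUES ('{evil}')")
    # Safe: a placeholder sends the value out-of-band, so it can never
    # change the structure of the query.
    conn.execute("INSERT INTO users (name) VALUES (?)", (evil,))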


I've never seen anyone who needs an ORM handle transactions correctly in the ORM.
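
For reference, here is roughly what doing it correctly looks like in Django, as a hedged sketch: transaction.atomic and select_for_update are real Django APIs, while the Account model and the transfer helper are hypothetical.

    from django.db import transaction

    from myapp.models import Account  # hypothetical model with a balance field

    def transfer(from_id, to_id, amount):
        # Either every statement in the block commits, or none do;
        # the row locks keep concurrent transfers from racing.
        with transaction.atomic():
            src = Account.objects.select_for_update().get(pk=from_id)
            dst = Account.objects.select_for_update().get(pk=to_id)
            src.balance -= amount
            dst.balance += amount
            src.save()
            dst.save()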


Every project I've seen that refused to use an ORM in favor of a home-grown solution was worse off for it. Almost every time someone claimed the ORM couldn't be used, I could point out how it could, and the code became simpler and easier to maintain. If they had read the documentation and used a bit of creativity instead of fighting their tools, they wouldn't have been left with such a mess.

This post reads to me as the author not having the ability to structure and plan their projects properly, so they blame their tools instead of themselves. Instead of understanding limitations and tradeoffs, they try to build their own magic lamp that solves every problem under the sun.


My experience is actually the opposite: every project I worked on that relied on an ORM benefited from replacing it with a clear architecture pattern and a query builder. Having a concrete notion of what is being passed to the DB allows easier debugging, and it is far easier to solve performance bottlenecks. As an example, I recently replaced several hundred lines of complex ORM-oriented code with a 10-line CTE that performed the iteration and calculation previously done on the client side; the result was a 10x increase in performance with a fraction of the memory.

By not using an ORM, the developer also stops treating the database as a "black box". It is a "black box" conceptually, but the more you understand how it works, the better your applications will perform. For example: knowing whether a read inside a transaction comes from the server where you started the transaction, or from some replica that may or may not be up to date (due to propagation delay), may seem like a nitpick, but it's one of those things that is very hard to debug afterwards as a transient error.
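
Not the parent's actual query, but a toy sketch of the shape of that change (Python's stdlib sqlite3, invented schema): the aggregation happens server-side in a CTE instead of row-by-row in application code.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO orders VALUES (1, 50), (1, 75), (2, 20);
    """)
    # The CTE computes per-customer totals inside the database,
    # replacing a fetch-everything-and-loop pattern in client code.
    rows = conn.execute("""
        WITH totals AS (
            SELECT customer_id, SUM(amount) AS total
            FROM orders
            GROUP BY customer_id
        )
        SELECT customer_id, total FROM totals WHERE total > 100
    """).fetchall()
    print(rows)  # [(1, 125.0)]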


My ORM (and probably yours) handles CTEs ;)


Yes and no; it seems it's an external module. And this is part of the problem: why would I need an external DSL extension to translate a pure relational concept into SQL? Debugging the query will require SQL knowledge regardless of the ORM. In fact, the ORM is another layer to be debugged; it just adds complexity.


I've had to put up several services over the years that performed exactly one or two queries, and a few more that performed fewer than five.

It was a matter of setting up the database, writing a prepared statement, and calling that prepared statement with the relevant arguments. Bringing in an ORM for these cases is absolutely overkill.

As your requirements constantly balloon, perhaps adding an ORM (or just a simple query builder) is appropriate. But none of the services I mentioned above have grown in that manner, and most are still up and have been running (and modified) for several years now.

I've been on projects where an ORM and/or query builder was likely the right answer, especially for automated migration and rollback scripts. But the way you're presenting it seems to be all or nothing, and I couldn't disagree more.


You'll have to pry Hibernate and Spring Data JPA from my cold, dead hands. Extending CrudRepository and getting a ready-to-use repo for an entity type, combined with annotations for custom JPQL queries? Yeah, I'll take that over any hand-rolled, low-level SQL interfacing for almost anything.

But, as always, "use the right tool for the job." I have gotten tremendous value from ORM's over the years, but I'm not going to insist on using one for everything just "because".


Ha. Funny, but I'm the exact opposite. I had to use Hibernate and Spring once, and trying to debug with 45 XML config files scarred me for life. That, coupled with MySQL (which even Ellison will tell you is a toy), meant I saw the writing on the wall and quit the job a couple of months before that whole team got fired, while the company had no choice but to throw the code in the bin. My view is that ORMs can never compete with a proper enterprise-scale RDBMS and people who know how to use it.


JPA/Hibernate has moved on a lot from the church-of-XML days. The basics are a lot smoother, with annotations and sensible defaults and so on, and there are some pretty nice additions.

My favourite neglected feature is JPQL constructor expressions, which let you write queries in an SQL-like language, but in terms of objects rather than tables (which is slightly easier), and materialise the results directly into objects of your choice.


I don't see the point of using an SQL-like language instead of actual SQL unless you need to support multiple different RDBMSs.


One of the nicer JPQL features is being able to navigate the Java object model and have that translated to the equivalent joins in SQL.

However, there are things that JPQL can't do, but it's easy enough to create a native SQL query and have its results map to JPA entities. I use Spring Boot (JPA, Spring Data, Spring Data REST) for the sheer convenience and speed of development, not to abstract away the database, which is always PostgreSQL. I am very comfortable writing SQL queries and making use of PostgreSQL-specific features, and Spring Boot with JPA certainly lets me do that when I want to.


Your first and last sentences seem to contradict.


How so? I think the overall point of my post is clear and consistent: I am a big fan of ORMs, won't be giving them up because some blogger wrote another "ORMs are evil" post, and find them very useful for almost all of the things I do with databases. But "almost" is the key word there. So, again: use the right tool for the job. Sometimes, in my experience, that's an ORM like Hibernate; other times something else makes sense. For example, I do some work with Redash for creating dashboards, and in that environment you're working with SQL directly. shrug


It sounds like you will


At my company we use C#/EF with 50 developers and a gigantic codebase and database.

A statically typed binding to the database, with 'find all references' and so on. The ability to write a few lines of simple, understandable code that turns into a much more verbose SQL query aids the readability, maintainability, and understandability of the code base.

I have used other ORMs and feel that EF/LINQ is way simpler, in that the same query syntax is used for in-memory and database operations. The two can be mixed and matched for doing really intensive data processing with relatively few lines of very easy-to-read code.

My only caveat would be that to use an ORM well, you do need to understand how it works, how the code you write turns into SQL, and a lot of the details of contexts and tracking.

It's like flying a plane - big, complicated, it will get you where you need to go very fast and effectively, but if you don't know what you're doing you'll crash right into a mountainside.


IMO, SQL is a domain specific language that is far from verbose. If you need to understand or predict the SQL and let that influence your ORM code, then just use SQL.


If you're using a dynamic language, an ORM doesn't provide much. But in C#, LINQ has the benefit of being strongly typed, which means your DB schema can be evolved. If you write raw SQL (or Python) and you rename a table/entity or column/property, you're gonna have a bad time. In C# you can just use a refactoring or, worst case, fix the compile-time errors.


You can write very simple ORM code that results in very complex and verbose SQL, especially when you want to return structured data that includes internal aggregations. When an ORM is not used, you often see a lot of data manipulation after the query is performed.
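
A hedged illustration with Django's ORM (Author and Book are hypothetical models; .query is the real queryset attribute that renders the generated SQL):

    from django.db.models import Avg, Count

    from myapp.models import Author  # hypothetical; Book has a FK to Author

    # Two short lines of ORM code...
    authors = Author.objects.annotate(
        book_count=Count("book"),
        avg_price=Avg("book__price"),
    ).filter(book_count__gt=3)

    # ...expand into a JOIN, a GROUP BY over every selected column,
    # and a HAVING clause when the query is rendered:
    print(authors.query)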


Insert queries are tedious. Especially if you are using named parameters.

    Foo.withSession { session ->
        session.createSQLQuery("INSERT INTO FOO (bar, baz) VALUES (:bar, :baz)")
               .setParameter("bar", "bar")
               .setParameter("baz", "baz")
               .executeUpdate()
    }
vs

    new Foo(bar: "bar", baz:"baz").save()
I don't need more justification to use an ORM.


If you stop your examination of a technology at the most trivial use case, I'm not sure what there is to discuss, but...

Right off the bat, it's obvious how to do a transaction in the first example but not in the second. It's also obvious how to use complex SQL functions in the first style and not in the second.


It's not a dichotomy between raw SQL and ORMs.

Here's your query in knex, a query builder for JS:

    knex.insert({ bar: 'bar', baz: 'baz' }).into('foo')


If you find yourself having to type insert queries over and over again, then you're probably doing something wrong. I mean it takes like what, 5s to write that line of SQL?


Exactly. I don't get why people are against it. Every ORM lets you write raw queries, and using an ORM does not mean that you suck at SQL, or vice versa.


It is the same as with all frameworks: a framework promises productivity and speed of development, but in exchange the problem you are trying to solve must fit within the model and the assumptions made by the framework's authors. In the end that is never entirely the case, and depending on the situation, fixing the last 20% of your work takes less than, all of, or more than the time you saved. So sometimes a framework is a good choice, but it is never as easy as the tutorial makes it seem.

ORMs are a bad case of this, because they try to bridge the impedance mismatch between how data is stored and linked in the database and how it is represented in the programming language you are using. This forces a model and a whole lot of assumptions, and in the end it is still not good enough. Fixing the part of your project that does not fit the ORM's assumptions gets orders of magnitude harder, because the bridge they built is very complex.

Yes, most ORMs allow you to write direct SQL queries, but that does not really solve the problem. Either you work the way the ORM intended, or you do not; and in the latter case you will encounter difficulties, no matter how you break the 'rules'.


If the code is written clearly and uses just SQL, I can almost immediately understand what it does. With ORMs it takes a lot more effort to understand what's going on when you read the code.


I have seen a lot of people in interviews who failed SQL exercises and said they were rusty because they had been using ORMs for too long.


I'm also rusty on my assembler because I generally use high-level languages and compilers/interpreters; that's what abstractions do, and it's not really a reason to avoid them, since people can generally solve the problem anyway. It just takes a bit more time, which is more than made up for by the time the abstraction saves the rest of the time.


But assembler is the wrong analogy for SQL. SQL isn't a lower-level version of procedural code; it's a domain-specific language.

When you write a technical paper in English, you switch to math equations when appropriate. Math isn't assembly language; it's the correct language. It's the native language.

Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.


> Yes, you can always describe your equations with English prose—and if you're doing simple addition and subtraction, it's probably nicer to write that in English. But once you want to do anything remotely complex, writing it as a math equation is more terse and less ambiguous.

Right! But 99% of the time I do not want to do something complex. I just want to load a few rows based on simple search parameters, and save changed values (which may involve heavy data processing, but nothing relational). Hence I use SQL only rarely, and therefore get rusty.


If you're only doing basic CRUD operations, then fair enough.

That said, based on my experience analysing applications, "loading a few rows" to perform "heavy data processing" and then "save a changed value" smells suspiciously like "the entire task can be rewritten as a single, moderate complexity query and the data never even had to leave the database server."

I'm not saying that's the case in your own generic hypothetical, as there are indeed forms of business logic and advanced manipulations that exceed the scope of a database server. But you might be amazed at what you can do within the SQL language. For example if you aren't intimately familiar with window functions like DENSE_RANK() then do yourself a favour and learn about those—and then contemplate how you could use them within a subquery joined to a table that you're updating.


Well, my current "heavy data processing" is generating a 3D render from some metadata stored in the database. Postgres is amazing, but I don't think it can do that yet :)

My experience over my career has been this: there's CRUD, lots of IO, and a bunch of data processing that just needs specialized software (image, video, weather simulation, etc). I try to offload what I can to the database - and yeah, I know about window functions - but other than for the occasional report, it just doesn't move the needle.


Fair enough. It looks like we use databases in rather different ways.


Why have two styles when it's obvious you'll have to fall back to raw SQL?

