Python: Just Write SQL (joaodlf.com)
223 points by joaodlf 3 months ago | 277 comments

ORMs do much more than "write SQL". This is about 40% of the value they add.

As this argument comes up over, and over, and over, and over again, writers of the "bah ORM" club seem to keep thinking that ORMs are just going to go "poof" one day. Some years back I wrote the "SQL is Just As Easy as an ORM Challenge", which demonstrates a few of the things ORMs do for you besides "write SQL", like persisting and loading data between classes and tables that are joined in various very common ways to represent associations between classes:


this is why whenever someone writes one of these "just write SQL" comments, or, wow, here's a whole blog post, I just shake my head. Because this is not at all what the ORM is really getting you. Plenty of ORMs let you write raw SQL or something very close to it. The SQL is not really the point. It's about the rows and objects: moving the data from the objects into the INSERT statement, and moving the data from the rows you SELECTed back into the objects. Not to mention abstracting over all the other messy things the database drivers do, like dealing with datatypes.

It looks like in this blog post, they actually implemented their own nano-ORM that stores one row and queries one table. Well great, now scale that approach up and see how much fun it is to write the same boilerplate XYZRepository / XYZPostgresqlRepository code with the same INSERT / SELECT statements over, and over again. I'd sure want to automate all that tedium. I'd probably want a one-to-many collection too.

You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM, and still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows. But why would anyone really want to? Writing SQL for CRUD is really repetitive and tedious. The computer can do that for you.

The one distinguishing feature of an ORM is that it drops SQL and the relational paradigm and places the developer completely within the OOP world. If you drop this, you have merely a database connector library, with much more freedom of behavior than an ORM.

Yes, since ORMs are a notoriously bad abstraction, every ORM will give you an escape hatch for the minority of tasks that it can't abstract at all. That escape hatch is not in any way a defining feature of the system.

Now, about that extra freedom you get from dropping the requirement that your connector is an ORM... well, neither your data at rest, the presentation to the user, the abstraction for a service client, nor your validation rules benefit from OOP. Proof of that is that OOP interfaces for all of those things preceded the currently popular ones, and all of them were abandoned due to a fundamental dissonance between the model and the data.

The rationale for an ORM is that, even though none of the interfaces you actually want for your data is best done in OOP, somehow OOP is still the best way to integrate them so that you can reuse data and metadata. This thought is not completely without merit, but there is very little empirical motivation for it, and the tiny amount that exists is more than completely explained by the OOP hype that only started to die around a decade ago.

EDIT: Oh, I saw you wrote SQLAlchemy! First, thank you for that great piece of software.

Now, SQLAlchemy does provide a lot of useful ways to postpone the object representation or even to map your values into dumb data. My comment up there is about theoretical limitations, but in practice, I do think it's the best option available in Python. (And maybe about the best option possible, since the language is intrinsically biased toward OOP.)

Okay, you're the expert here, and I'll happily concede that I am not (and apologise in advance if I seem to be disrespectful), but ...

> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

I think that that is the problem: mapping a relational dataset to a hierarchical dataset is the digital equivalent of pounding a square peg into a round hole.

I know you've read Ted Neward's "The Vietnam of Computer Science" (it's a short read, so search for it), and he articulated the same thoughts I had each time I had to work with an ORM in a mainstream language.

There is an impedance mismatch of sorts between relational data and hierarchical data. In cases where there is no hierarchy, you can do away with the ORM completely and turn out much easier to read code, for example with `sqlc` and `Go`.[1]

The problem with the ORM is the programming language - one in which hierarchical objects are the idiomatic way to code (all of the mainstream OO languages) is always going to require some wrangling.

> still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows.

Unless you're using `sqlc` and Go, where the automation is provided.

[1] Now you may argue that sqlc is technically an ORM, but then where do we draw the line for calling something an ORM? Is it the case that any method for "generating boilerplate for mapping relational datasets to in-program structures" is going to be called an ORM? Because to my mind, the result is so different for in-program representation that they aren't the same thing.

yah I read Neward's thing, and it was one of the main reasons I wrote SQLAlchemy in the first place, because he was just so wrong. It read like he tried to write some object relational thing and it didn't work out, so he goes off and rants that ORMs are wrong. Kind of proving that post wrong was one of the primary goals of SQLAlchemy, really, where I sought to change the question away from "impedance mismatch" and "leaky abstraction" and all that, and redefine the ORM / SQL abstraction layer as *automation*, not any attempt to "hide" anything.

I mean, that was really an important point in time when there really weren't ORMs that were easy to work with; there was Hibernate in a very early stage and there were overly simplistic things for Perl, so I thought it was important that this "better way" I had in mind could be put out there, before the idea that "yeah let's all avoid ORMs unconditionally" could take hold.

Yeah, if your app has one table and two queries, write the SQL and marshal the data yourself. I think everyone should do that approach for a while anyway so they know what's actually going on. But if you are writing for 1200 tables, it's just not practical. If the Go thing is also automating that and generating SQL / data marshalling boilerplate code for 1200 classes, yes, that's kind of ORMish. That's a totally valid way ORMs are written, and if I had multiple lives I'd probably write a stored procedure ORM that does something like that too.

> But if you are writing for 1200 tables

Have you ever managed 1200 "live" (in-development) tables in a SQLAlchemy app? That is impressive. Not only because of the amount of work that entails, but also because you can just read the schema and provide a dynamic object representation of it (think dataclasses, but based on a table schema). Also, if you're querying 1200 tables, Python might not be such a good idea. Impedance mismatch at a language level.

> boilerplate code for 1200 classes, yes that's kind of ORMish

Yes, please do explain how a Python ORM can save you from having 1200 classes you need to maintain, plus the need for specialized queries.

Not personally, but we absolutely have users dealing with 1000+ table databases. SQLAlchemy includes mass reflection and dynamic mapping features, and there are also extensions like SQLACodeGen that generate mappings as Python source code. Additionally, it's not unreasonable that applications within an organization are composed such that they refer only to a particular segment of such a database, while across the organization there is source code that refers to the database as a whole in the aggregate.

I don't think it is farfetched to say that, for the good and bad, modern software development is moving away from a single project having to handle "1200 tables". As we see the growth of "services" (gasp, microservices!), the scope for codebases is reduced, hence why the pattern in my post (OP here) is so common to see in Go. Are Go developers masochists? No... But when you're working on (micro)services and your immediate work only touches 5 tables and the relationship between them, it's really not inconceivable to just reach for a database adapter and simple abstractions.

I'm going to propose, really just out of my butt so to speak, that most Go apps that use databases are themselves middleware kinds of services involved with software infrastructure, as opposed to what we might call "business cases". That is, American Express might have a bunch of Go services deriving data from small configuration-oriented databases all around the organization, but "the database with everyone's account information and credit card statements" is absolutely not a five-table DB with a single Go application on top of it. 1200-table databases at the center of business cases will continue to exist; it's just that Go applications are not themselves centralized business applications. Go is currently an infrastructure language (I googled around for this conjecture and it's obviously debated, but it's still a pretty prevalent assertion I can find being made a lot). The boring business stuff is still in places like Java, Python, C#, etc.

> I mean, that was really an important point in time when there really werent ORMs that were easy to work with, there was Hibernate in a very early stage and there were overly simplistic things for Perl, so I thought it was important that this "better way" I had in mind could be put out there, before the idea that "yeah let's all avoid ORMs unconditionally" could take hold.

I totally remember this time. It was around 2002~2005.

I started off writing a discussion forum and a calendar booking webapp for my high school in PHP + MySQL. After working on it for a year or so, I realized I was basically re-typing a lot of SQL and boilerplate code, which repeated no matter whether I was working on a Post object, a User object, or an Event object. I almost wrote a half-assed ORM in PHP.

I then got an internship, and they had a simple Java webapp project. Hibernate. In retrospect it was a huge pain, and we could probably have completed the project in 30% of the time if we had used something more effective (but then, web frameworks weren't a thing in 2004); still, Hibernate was probably better than writing raw SQL for our cases.

Then later, at the same company, they had another webapp project in Perl. I don't remember the name of the ORM, but it was kind of "magic" - as you said, overly simplistic; everything was implicit but seemed to work.

Shortly after, I encountered Django, which seems to have solved all my problems with ORMs, and I've been happy with the way it works for the almost 20 years I've been using it.

To me the utility of ORMs is just obvious once you've been through writing CRUD without ORMs on a deadline, and tried anything that's not called Hibernate.

I'm not sure why there are so many people who apparently develop or maintain a codebase that deals with only a table or two ranting about ORMs. To me this isn't an issue with ORMs at all, but rather the curious phenomenon in the industry where there is extreme specialization, to the point that businesses can afford to put a person (or even a team), full time, on a small part of the business involving only one table or two. (Can't say I'm not in this situation in my professional work today either!)

To those who say don't use ORMs -- try implementing HN in a week, without an ORM. It can probably be done either way, but with an ORM you end up with 50% less code and probably less sore fingers.

> try implementing HN in a week

Not all of us work with prototypes someone will rewrite later on. Design a high-traffic, low-budget site with Django, and you will quickly understand the trade-offs. It's not all about less coding effort; in most dev positions that matter, your work is cheap compared to the runtime cost - e.g. optimizing something to take 10% less has the potential to shave multiple times your salary off the global cost pool.

I completely agree with you. Every single time someone says, “Just write your own abstraction over an SQL generator”, it eventually devolves into a full-blown ORM.

I swear the majority of these opinion posts against ORMs are from people who must have worked in a badly implemented project that left them with a bad experience and they blamed the pattern rather than the technology.

One of the best bits about an ORM is making it consistent for non-DB users on the team to simply grok and work with, without creating monstrous and hard-to-debug joins everywhere. But when badly set up, it can lead to a lot of debugging spaghetti. The same can happen with SQL, but I suppose people think that at least there's one less layer to debug, while ignoring that the problem is actually how they got where they are, not the technology.

I switched a project that's used daily for reviews by multiple Oscar-nominated films from manually written SQL to SQLAlchemy. The SQL version was gross and impossible for the team to manage because it had bloated over the years, with no nice way to detangle the statement generation and joins. SQLAlchemy made it so any one of the technical directors on the team could step in and add new functionality without serious performance footguns. Instead of me having to clean up bad SQL every year (projects would fork per film and merge at the end) to keep performance up, I could trust the ORM to do that for me.

At its worst, it was way too easy for TDs to get the raw SQL to be tens of minutes per review session by structuring their logic incorrectly, but it was so difficult to see. Switching to an ORM meant I could get the performance down to seconds per session and they couldn’t destroy the performance in subtle ways.

> I swear the majority of these opinion posts against ORMs are from people who must have worked in a badly implemented project that left them with a bad experience and they blamed the pattern rather than the technology.

Fair. But you then proceed to detail a personal experience on the other side of the spectrum: Badly written code, without an ORM, and how it was fixed by introducing an ORM.

I think we can all agree that you can write bad code, with or without an ORM :). Not that this is entirely relevant to my post; I am simply advocating for writing more SQL, not for how to write a good object mapper. That's a different beast, and I purposely kept it simple just to illustrate that it is possible to get started without too much pain.

My point with the personal anecdote is precisely that you can write bad or good code with both; however, an ORM can allow for a more consistent experience across many more people.

My point is different than your takeaway. In my point I’m not blaming the technology, I’m blaming the people involved with using it in production.

I specifically point out that a well versed engineer can write a SQL based system well. An ORM just means I can diffuse that responsibility over multiple people more reliably.

> My point with the personal anecdote is precisely that you can write bad or good code with both. however that an ORM can allow for more consistent experience across many more people.

I find it interesting that you say that; my take is that it's the other way around!

SQL join statements look the same no matter what programming language the reader is used to, but each ORM differs in the way the join looks to the reader. The EF method of filtering your results set in C# looks very different indeed to how the ORM for Python would do it.

Every ORM looks different, which results in a very inconsistent experience for people, especially when you bring in a DB expert to figure out why there's a slowdown somewhere, and he cannot just visually inspect the EF/SQLAlchemy code and say "Well, here's what's wrong".

DB experts can usually very easily do that just by looking at the SQL, no matter what programming language was used.

DB experts tend to be the minority though. It’s about indexing (pun intended) on the expert experience vs the generalist experience.

Often those may be very much competing goals, where nativeness to the primary language used by the team might be more important than the nativeness of the expert.

To be fair, in sqlalchemy you could just do “print(query)” and you’d see your sql.
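For instance (a small sketch using Core table metadata; the table and columns are invented), stringifying any statement compiles it to SQL for the default dialect:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

# print() / str() compiles the statement so you can inspect the SQL
query = select(users).where(users.c.name == "alice")
print(query)
# SELECT users.id, users.name
# FROM users
# WHERE users.name = :name_1
```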

> To be fair, in sqlalchemy you could just do “print(query)” and you’d see your sql.

I wasn't saying that you couldn't see the raw SQL, I'm saying that the ORM syntax does not result in a consistent experience for everyone.

Sure, but if you’re working in a sqlalchemy codebase, you should really learn enough sqlalchemy to get by. I don’t jump between languages and frameworks and expect a consistent experience but ideally I’d have an experience streamlined for the context.

If you’re a db expert that’s doing drive-by optimisation of queries in sqla then you can print them to see what’s going on - is that not broadly what you’re concerned with in your original comment?

> Badly written code, without an ORM, and how it was fixed by introducing an ORM.

Funny how those anecdotal bits never get published, eh? It's always "I shaved my db requirements in half by actually understanding what was going on because my orm is magic". Odd, innit?

Enough people have had bad experiences with ORMs who have decided to evolve and find alternatives. ORMs are bulky and do not provide value to many modern functional, data-first development paradigms.

Why speak in absolutes? Swift data for example is a data first ORM https://developer.apple.com/documentation/SwiftData

Nobody is saying only use an ORM, but the anti-ORM crowd seem to think that there’s only one true way to do things.

I don’t know why so many programmers think purely in binary “good or bad”. Different projects may have different requirements.

Why hear in absolutes? I said essentially what you did: "Different projects may have different requirements"

You find ORMs useful, I do not, they aren't mutually exclusive.

Also, an ORM is inappropriate for certain forms of inter-team collaboration. Data scientists often prepare SQL that we need to integrate into our services. Asking everyone at a company to learn a language-specific ORM isn't practical. SQL is, for better and worse, an important common tool.

> I swear the majority of these opinion posts against ORMs are from people who must have ...

"microservices" - https://news.ycombinator.com/item?id=37125636

That cleared things up, at least for me. I totally agree that if your job is to essentially write a custom, RESTful, fault-tolerant, 99.999%-uptime accessor for a table or two, you're basically hired to be the ORM, so you don't want to just suggest people use an ORM framework and effectively make yourself redundant.

> and still be using an enormous amount of automation to deal with the database drivers and moving data between your objects and rows

I've found I would generally only need the handling of database drivers and query building, since I'm already writing a validation layer, and adding data marshalling to that is pretty trivial.

Likewise, practicing YAGNI: what are the chances I need multiple database drivers for different databases? It's extremely unlikely that I'm going to be chopping and changing between different databases, so I'm really only writing that code once.

I would argue to start with writing the basic SQL queries and adding an ORM later when you know you actually need it.

It's much easier to onboard someone into pure Python code + a db driver than to onboard someone to SQLAlchemy, since it is quite a complex piece of software - necessarily complex for what it is trying to achieve, but if you don't need that, it's not a good fit.

> I would argue to start with writing the basic SQL queries and adding an ORM later when you know you actually need it.

I think you missed the point of the parent comment, which is that ORMs are not about writing SQL queries (although they do that). ORMs are about moving data around, transforming it from rows and columns into meaningful objects in the project's language and data model.

As the parent comment suggested, if you are dying to write your own SQL (which does often happen as queries get more complex and don't fit into the ORM's language model), then you can write raw SQL but still let the ORM do the heavy lifting, so you can take advantage of those features, which is the majority justification for the ORM in the first place.

You are basically suggesting what the original post author is suggesting. The comment above provided a rebuttal to their argument and you replied by suggesting the same thing they originally rebutted to. Hence a circular argument.

> But ORMs are about moving data around, transforming it from rows and columns into meaningful objects in the project's language and data model.

I made my point about this but will repeat it: I am generally writing validation for this anyway, and adding the code that converts the data into known types isn't significantly more work. If I were never writing the validation, this point would make sense. Also, the ORM the parent commenter authored doesn't do this; it generates a class with a ton of ancillary ORM-specific functions that I don't care for, and many ORMs do that.

I don't even agree with what the article states; the article is effectively creating a custom ORM. I'm saying: write a function that converts the data from your database driver to the type of thing you want, and write a function that validates that data. Those two can be one function for simple data, but will likely be one function that is a composition of other functions.

When you get to the point where that is tedious you now likely have a good idea of what you want in an ORM and are significantly better equipped to make an informed decision for your specific project, you might find you end up only wanting a query builder.

Regarding abstraction over database drivers, well how often do you need an abstraction over database drivers, are you really using two or more different databases in a single project?

My ideal workflow is: 1. Query exactly what I want. 2. Make it valid data in the form I want.

That valid data shouldn't have a ton of extra stuff attached to it, I want plain old data that is validated and the correct type.

Why would I add a dependency that is complex before I can make an informed decision that I would even need it?
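A sketch of that workflow with the stdlib driver (the table and validation rule are invented for illustration): one function per concern, composed into a single loader that returns plain, validated data with nothing extra attached.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    id: int
    price: float

def to_product(row: tuple) -> Product:
    # convert: driver row -> typed, plain object
    return Product(id=int(row[0]), price=float(row[1]))

def validate(p: Product) -> Product:
    # validate: enforce business rules on the plain data
    if p.price < 0:
        raise ValueError(f"negative price for product {p.id}")
    return p

def load_products(conn: sqlite3.Connection) -> list[Product]:
    # composition: query, then convert, then validate
    rows = conn.execute("SELECT id, price FROM products").fetchall()
    return [validate(to_product(r)) for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 9.99)")
print(load_products(conn))  # [Product(id=1, price=9.99)]
```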

> Regarding abstraction over database drivers, well how often do you need an abstraction over database drivers, are you really using two or more different databases in a single project?

Extending on this, you actually shouldn't even use different databases. There are bugs and quirks between different databases on things like string matching :) Having a multiple database project is often a nightmare, as you need to track all of these small differences and nuances that may affect the way your system works.

> Likewise practicing YAGNI

This is why I have come to hate YAGNI. Nowadays when it's said what I hear is "I don't understand why I need it (yet)".

An app with 3 endpoints that makes 12 SQL queries won't need it.

An app with 30 endpoints that makes 100s of queries might need it.

In this case, it's a matter of not knowing whether you need it or even what iteration of it you need. Different ORMs have different tradeoffs, you don't even know which tradeoffs you want to make.

First of all, thank you for SQLAlchemy! If I ever had to make a final choice in how I would interact with a database for a very large project that involves a considerable dev team, I would always bet on SQLAlchemy. Not that I would necessarily like all aspects of it, but when it comes to Python and SQL - “Nobody ever got fired for picking SQLAlchemy.”.

With that out of the way: despite ORMs doing much more than "just writing SQL", it is exactly on that point that I flinch. Most devs should be exposed to SQL. And if your project allows you to build around simple enough abstractions so that you aren't reinventing the wheel, you should definitely be writing SQL. Especially if you don't know SQL yet - which is increasingly the case for new devs coming into the job market.

You can achieve a lot with SQLAlchemy Core, a tool that I absolutely recommend, but my post is just a simple alternative to get developers to think about their approach. If that results in some devs reconsidering "full fat" SQLAlchemy and trying SQLAlchemy Core, that's a win for me!
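For readers who haven't seen it, a minimal Core sketch (schema invented): you define tables and write the query logic yourself, and get plain rows back, with no ORM classes involved.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

# explicit INSERT, executed in a transaction
with engine.begin() as conn:
    conn.execute(insert(users), [{"name": "alice"}, {"name": "bob"}])

# explicit SELECT, returning plain tuple-like rows
with engine.connect() as conn:
    rows = conn.execute(select(users).order_by(users.c.id)).all()
    print(rows)  # [(1, 'alice'), (2, 'bob')]
```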

Your gist tries to highlight the difficulty of doing certain things without an ORM. Migrations (as just one example) don't need to be hard: simple tools like Flyway, or migrate (https://github.com/golang-migrate/migrate), achieve a similar result (while also keeping you on the path of writing SQL!). Deep and complex relationships between objects also don't need to be hard. Typically people approach this subject with a requirement to be very flexible in the way they build queries and objects, but that to me is a sign that maybe they should reconsider their business logic AND consider that, just maybe, their project doesn't require all that flexibility; it is fairly straightforward to extend objects and introduce more complex representations as and when they are needed. Will all of this make me write code faster? Absolutely not. That is why you have spent so much time perfecting SQLAlchemy. But then again, I am not advocating for devs to go and replace their usage of ORMs, just presenting an alternative that may or may not fit their needs for a new project + give devs the chance to learn something that the ORM might have taken away.

> writing SQL for CRUD is really repetitive and tedious

Agreed, and this is the primary reason that ORMs are a necessary tool. It's also possibly easier to train a junior developer to use one than it is to get them to a basic level of proficiency (and security awareness!) in SQL.

That said, I think this is one of those areas where It's Complicated, because inefficient database calls due to ORM usage are one of the primary ways I see applications completely break down. For most types of apps I've seen, which are very read-heavy and aren't doing intensive writes, roughly all your writes should probably be using the ORM (including such niceties as validations, default scopes, all that nice stuff). If it's simple enough, your "show" actions (fetch and display one entity) may be fine as well, but every "dashboard" and "index" action (show many entities, basically things with joins) likely needs to be written in SQL.

In my experience (Rails), the object instantiation cost is insane, much greater than the actual time talking to the DB, so not only do you need to write SQL, but you need to handle the data that comes back without instantiating ActiveRecord models.

This is much harder work (more specifically, it needs much greater skill and experience, and is easier to make a mess of) versus using the ORM and models, but only on apps with tiny amounts of data per request, or very low request volume, can the ORM be a serious exclusive option for this task. Unless you want to end up like an app I was once asked to help fix, where they were on the $9,000-per-month Heroku Postgres instance, and since that's the biggest one, they could "scale" no bigger. (Okay, this wasn't their only problem; their main one was not understanding that you don't sprinkle analytics DB writes all over the place, because then the simple high-volume "read" pageviews can't be generated using just a read replica.)

> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

That's easy. It's a database mapping library. Write the query, give it an object, and it fills it in or reads from it. You can do this with annotations, or struct tags in the Go world. There's no need to introduce abstractions over SQL joins, which I find very off-putting, because the abstractions are never perfect, and suddenly you have to learn invented concepts just because you didn't want to write a JOIN in an SQL query, but rather have some clever framework introspect some classes and automagically write the JOINs for you.
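That kind of mapper can be tiny. A sketch using dataclass annotations as the Python analogue of Go struct tags (the helper name and schema are invented): you write the SQL, including any JOINs, and the helper just matches columns to fields by name.

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class Account:
    id: int
    balance: float

def fetch_as(conn, cls, sql, params=()):
    # write the query yourself; fill the class in from each row by
    # matching column names to the dataclass's annotated fields
    conn.row_factory = sqlite3.Row
    names = [f.name for f in fields(cls)]
    return [cls(**{n: row[n] for n in names}) for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 10.5)")

accounts = fetch_as(conn, Account, "SELECT id, balance FROM accounts WHERE id = ?", (1,))
print(accounts)  # [Account(id=1, balance=10.5)]
```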

To add to what you're saying: it's not like the object-relational impedance mismatch[0] is some great unknown property of ORMs. Since people first began putting ORMs into production, the designers of these systems have been well aware that you must always choose a trade-off between a fully functional object system and a fully relational one.

And the mismatch has two sides. If one's answer is "just use SQL!" then you're going to have new problems dealing with the mismatch coming from the SQL side of things.

The pre-ORM solution to this was not simply to shove a bunch of SQL in your application logic (though this was done), but to have a much, much more complex set of data layer logic. I'm guessing most people today writing about eschewing ORMs in favor of pure SQL have never used: stored procedures, triggers, cascades etc. I personally do miss some of the features from that era of software, but there's a lot more complexity to the "just write SQL" approach than most people realize.

0. https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe...

While I somewhat agree that a lot of these articles are people who just don't actually try/use the full feature set of ORMs, I don't agree with the overall premise you're presenting that they really do more than write SQL for you. The other things they provide are largely just abstractions around how the queried data is returned and some additional metadata tracking of the relationships. Your estimation of those parts being 60% of the value added is probably generally wrong for most users of SQLAlchemy. Not having to consider or use another language syntax while writing your code is probably closer to 70-80% of the value for most people.

Your example gist is essentially just "ORMs excel at this one thing, anything else is worthless" and also effectively hides code ("the benefit of the library!!!"), making for a very disingenuous comparison.

> But why would anyone really want to, writing SQL for CRUD is really repetitive and tedious. Computer can do that for you.


I wanted to love SQLAlchemy but even in some relatively simple things it generates really asinine queries that take insane amounts of time. It's just usually not noticeably insane amounts of time at first. So you query for some data and wind up with a query that returns in 20ms when there's 20 rows in the database but falls apart when there's 200,000.

It's also bad at projecting analytics type queries onto a transactionally-normalized database, but then most ORMs are not great at that.

I had both of these issues enough times that I instead just opted to start handcrafting queries in those cases and using a single "ResultSet" type class that projects as both a SimpleNamespace and a dict.

Would need to know what versions of SQLAlchemy you spent time with and what exactly these "asinine" queries you refer to were. SQLAlchemy generates very good queries these days for the information it's given. Years ago there was heavy use of large complicated subqueries for many cases, and those days are long gone.

Also my 40% / 60% breakdown is based on the SQLAlchemy source code itself regarding what parts of the code deal with generating SQL and what parts deal with all the rest. 40% is likely a large overestimate.

40% LOC != 40% of the value, which is what your comment stated.

My most recent experience of needing to hand-craft queries because SQLAlchemy was doing something truly hilarious was on 1.4.40-something from 2022 (I think possibly .41 or .42?).

I don't have the specifics in front of me as it was for a prior job but rough outline of what I observed:

A simple select including several related tables with filters on each table resulted in an individual subquery for each *filter* being applied. The resulting SQL was incredibly large, selected data in a poor way, and when replaced with a simplified single query that let the DB do its thing sped up about 100x.

Those days may be gone now with 2.0 but I don't consider barely a year ago "long gone".

I think there's a great place for Create/Update/Delete, and generally when doing trivial Reads. But the lesson I've learned with basically every ORM is that once a third table becomes involved you probably want to write your own SQL because they'll break in some weird way.

This is essentially how I use SQLAlchemy in current projects and it's been great in that regard.

this sounds like the select() object was being used incorrectly, using a pattern that currently emits very prominent deprecation warnings which is:

   stmt = select(...)
   stmt = stmt.where(stmt.c.foo == 'bar')
   stmt = stmt.where(stmt.c.bar == 'xyz')
That above pattern is one I've seen people do even recently, using the "select().c" attribute which from very early versions of SQLAlchemy is defined as "the columns from a subquery of the SELECT" ; this usage began raising deprecation warnings in 1.4 and should be fully removed by 2.1 as it was a remnant of a much earlier version of SQLAlchemy. it will do exactly as you say, "make a subquery for each filter condition" and will produce disastrous queries. people have started tripping over it recently as a result of 1.4 / 2.0's move away from the ORM `Query` object back towards `select()`.
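The corrected form isn't shown above; the non-deprecated pattern filters on the table's own columns rather than on `select().c`, which composes one flat SELECT instead of a subquery per filter. A sketch (table and column names are illustrative, requires SQLAlchemy 1.4+):

```python
from sqlalchemy import column, select, table

t = table("accounts", column("foo"), column("bar"))

# Filter on the table's columns, not on stmt.c: each .where() ANDs a
# criterion onto the same SELECT rather than wrapping it in a subquery.
stmt = select(t)
stmt = stmt.where(t.c.foo == "bar")
stmt = stmt.where(t.c.bar == "xyz")

# One flat SELECT, no nested subqueries
assert str(stmt).count("SELECT") == 1
assert "AND" in str(stmt)
```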

the moment you see SQLAlchemy doing something that seems "asinine", send an example to https://github.com/sqlalchemy/sqlalchemy/discussions and I will clarify what's going on, correct the usage so that the query you get is what you expect, and quite often we will add new warnings or documentation when we see people doing things we didn't anticipate (such as in this case: I thought we had removed .c by 2.0, but apparently it's still just deprecated; it will be gone in 2.1).

Agree. My datasets have multiple billions of rows, and if I can't see the details of a query or tune it, the tool is utterly insufficient for my needs.

I still fail to see how anybody who actually knows sql and works with “Big Kid” datasets would use an ORM.

When I've worked with datasets that size, the common operations tend towards the analytic or bulk operations, IME. That's not the sweet spot for an ORM, and SQL will often make a lot of sense. But, and again this is my experience, you often want to work with some more traditionally sized slice of that data, doing CRUDish things, and ORM can exist alongside the nicely tuned bulk or analytic operations. If you have the pure data operations side of a DB that size, I agree ORM isn't it.

Yeah I agree here. If you're doing mostly transactional work then it can work for the 99%. As soon as you step towards analytics queries you start to see things fall apart and it's better to hand-tune your queries.

In my recent projects that's essentially what I do - I use SQLAlchemy for very basic CRUD and then anything analytical gets handcrafted.

Doesn’t that increase complexity? I just have a database wrapper class and keep all my db code segregated there. In my case also using an ORM would blow up that segregated approach.

It's very very easy to keep the sql alongside models and other business logic. SqlAlchemy lets you drop down to fairly complex sql while remaining in Python, and will even take in a direct sql query as well.

The added complexity has always been a non-issue for me. Even a simple `sql.py` next to `models.py` works really well, for a pretty straightforward solution.

I’m using SQLAlchemy in my job, and have worked with Python for many years. Never have I seen a good case of someone using SQLAlchemy to hydrate objects from raw SQL queries. I’ll definitely admit — I have not gone out of my way to search for it. It seems that it is a common want to do this kind of 60% benefit ORM you speak of, but it’s definitely unclear to me how to pick those parts together with the daunting (and fantastic) piece SQLAlchemy is.

My only caveat to what you are saying, is that I have yet to see anything that successfully scales up to many tables. I can also count on one hand the number of products I have seen that successfully migrated between big databases in a meaningful way without a ton of ancillary rewriting in the process.

That said, I fully agree that the ORM isn't necessarily the problem. I point the blame at over eager data modelling that ignores the OLTP/OLAP divide. Or that ignores that row creation has a different contract than row editing, such that using the same object for both leads to confusion with missing fields. Heck, even different edits have different contracts, such that mapping a single "object" to a database row is often a mistake.

> writing SQL for CRUD is really repetitive and tedious

If you think of relational DBs as just CRUD machines, an ORM makes total sense, but that's the original mistake.

Is it a mistake if that's all you need?

I'm kind of on the side of avoiding an ORM unless you have a clear need for it. But I've seen projects where they are very valuable. If you mainly want to register, update and delete a bunch of structured data, it's not a bad idea to put it in a database. And if you do those operations a lot in some part of the application, it's not a bad idea to use an ORM there. For analysis and reports and such I would probably suggest just writing SQL directly though.

Yeah, part of an application might be CRUDing big pieces of structured data, and for that I think jsonb is the best compromise (https://news.ycombinator.com/threads?id=hot_gril#37123355). Comes with the limitation that you can't have FKs into the objects, but you shouldn't anyway.

It's hard to be sure that CRUD is all you need. Maybe you'll be fine, maybe you'll get stuck between keeping a slow DB or rewriting everything (I've seen this movie several times). It doesn't cost much to go proper relational from the start, especially cause that involves the least tooling/boilerplate. And if you're really sure you only want objects, NoSQL DBMSes are actually designed for that.

I think that’s part of why I’m having trouble with this framing - it’s treating CRUD as the entire universe of why you’d need to connect to a database.

Some of us do very intense compute on very large datasets, and ORMs are not capable of those tasks. At all.

Doesn't even have to get very intense. A simple timeseries schema gets slow/annoying quickly with an ORM.

I think for a lot of things an ORM is “good” at, you could likely just have a bunch of static JSON files and retrieve them from disk with zero practical difference.

I just can’t wrap my head around why you even *need* a database in those cases.

Honestly I wouldn't deal with JSON files on disk except as a last resort. Aside from convenience, you need a DB for transactional reads/writes. But you don't need an ORM for that anymore because Postgres and MySQL have json/jsonb types.
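SQLite (in the Python stdlib) illustrates the same point on a smaller scale: JSON stored in an ordinary column still gets transactional writes and in-document querying via the JSON1 functions, no ORM required. A sketch (schema is illustrative; assumes an SQLite build with JSON1, which ships with modern Python):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

# Transactional write of a JSON document
with conn:
    conn.execute(
        "INSERT INTO docs (body) VALUES (?)",
        (json.dumps({"kind": "invoice", "total": 42}),),
    )

# Query inside the document, no files on disk involved
row = conn.execute(
    "SELECT json_extract(body, '$.total') FROM docs "
    "WHERE json_extract(body, '$.kind') = 'invoice'"
).fetchone()
assert row[0] == 42
```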

> It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects. Not to mention abstraction

That's actually the problem.

I'm sure you're aware that ORMs have a fundamental problem called the object-relational impedance mismatch. AFAIK it remains unsolved to this day.

> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings

This is true, but it's like saying you can disassemble a Boeing 737 to build chopsticks. It's true but completely unnecessary.

Thanks for your perspective, Mike. Completely agree that the interface between data and code should be handled by a single tool. That tool must meet some minimum complexity, because it’s solving a very hard problem! Also just want to say that my team has benefited greatly from your work on SQLAlchemy, and we appreciate you immensely!

Honestly, sqlalchemy is such a different breed of ORM, whenever people slander ORMs, I'm imagining they're thinking of like Django ORM or ActiveRecord which are like the duplos to SQLalchemy's Legos.

> I'd sure want to automate all that tedium. I'd want a one-to-many collection too maybe.

You don't actually need an ORM for that. And some of us (I know I'm a minority) actually want to have pure data objects. Some of us (talking about myself again) actually design schema-first - as in, you use DDL and your code just consumes the definition. Some of us don't care about code knowing about relationships which are an intrinsic part of the data model.

> You can use SQLAlchemy (which I wrote) and write all the SQL 100% yourself as literal strings, and still use the ORM

SqlAlchemy is an amazing piece of software, and it is relevant here because it is actually part of the problem. You don't really have an ecosystem of database abstraction libraries in Python, as you do in other languages. The options are SqlAlchemy or the Django ORM. Golang? You have 50 different libraries that implement 100 ways of doing the same stuff. C#? You have probably a dozen libraries besides Entity Framework that provide different levels of abstraction. Java? You have everything from low-level stuff up to high-level ORMs. PHP? The same.

Python? You have SqlAlchemy, Django, or roll your own. Both SqlAlchemy and Django can do raw queries with little effort, but if you're doing this, aren't you discarding most of the functionality that comes with them, namely code maintenance? If you have a big project with a bunch of handwritten queries (because you can do it, and quite easily, I may add), aren't you just ignoring the advantages of the ORM? If, when you edit a model, you also need to check handwritten SQL queries, you shouldn't be using an ORM in the first place.

And with SqlAlchemy and Django, there is no middle ground. You either are all in, code-first definitions, or you are in a world of pain.

That is why specialized applications tend to be rewritten in other languages: optimizing the data layer alone often yields a 10-100x performance increase compared to this kind of abstraction.

> writing SQL for CRUD is really repetitive and tedious

It is. But between that and SqlAlchemy - in most languages, there are options. Not in Python. Between orm and no orm, I'll take no orm any day of the week - at least I can understand the queries.

I picked up some SQL knowledge through osmosis by using ActiveRecord, and I do wish there were a better connection between what the ORM did and the end result. There were some tools to see what code generated which queries, but they were not as intuitive and in-your-face as, for example, the way you regularly have to deep-dive into your generated front-end JavaScript and CSS code.

In other words, I think there is some tooling left on the table that could assist in increasing SQL literacy and comprehension.

> The SQL is not really the point. It's about the rows and objects, moving the data from the objects to the INSERT statement, moving the data from the rows you SELECTed back to the objects.

If anyone is looking for this "pure" ORM (no query builder API) in Node there is https://github.com/craigmichaelmartin/pure-orm

I’m sorry but 90% of the time I encounter somebody who swears by an ORM I’ll figure out the reason they use it is because they didn’t know SQL to start with and didn’t commit to learning a new language.

The number of people who know SQL well and still choose an ORM seems to be very, very low in my experience.

Low, maybe, but we’re out there. Sqla just saves me so much time in general. Sometimes it’s easier to drop down a level, but for 99% of my db interactions the code will be significantly shorter using sqla, so I choose to use it.

Fair, I always figured there were some people in that bucket, just not that many!

I think I’m also just cranky over the general trend of junior devs who know a “framework” but none of the underlying concepts or code and can’t cope when their “framework” doesn’t…work.

Do we know what Hibernate did wrong? It used to be very popular among Java engineers but later seemed to have become an obscure technology.

Do you know why? In our company we still use hibernate heavily in our java web app

badass mic drop. i will refer to your comment whenever i see this discussion coming up again and it will come up again as long as people are still learning the art of software development.

If you're going to end up querying all the fields and putting them into a model like this dataclass anyways... Django can do that for you. If you're going to later pick and choose what fields you query on the first load, and defer other data til later.... Django can do that for you. If you're going to have some foreign relations you want to easily query.... Django can do that for you. If you're doing a bunch of joins and are using some custom postgres extensions for certain fields and filtering... Django can help you organize the integration code cleanly.

I totally understand people having issues with Django's ORM due to the query laziness making performance tricky to reason about (since an expression might or might not trigger a query depending on the context). In some specialized cases there are some real performance hits from the model creation. But Django is very good at avoiding weird SQL issues, does a lot of things correctly the first time around, and also includes wonderful things like a migration layer.

You might have a database that is _really_ unamenable to something like an ORM (like if most of your tables don't have ID columns), but I wonder how much of the wisdom around ORMs is due to people being burned by half-baked ORMs over the years.

I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern. But I think my bias is influenced by always working on software where there are just so many columns per table that it would be way too much busywork to not use something.

The main problem I've encountered with complaints surrounding ORMs usually tend to be the result of trying to overfit the ORM in a certain way.

ORMs are, for the most part, good at the CRUD operations - that is to say, they easily translate SELECT, UPDATE, INSERT and DELETE operations between conventional class objects and database rows.

Things they usually aren't very good at are when you start trying to do things that require a lot of optimization - it's very easy to have an ORM accidentally retrieve way more data than you need or to have it access a bunch of foreignkey data too many times (in Django you can thankfully preload the latter by specifying it in a queryset). That's less an issue for basic CRUD, but is an issue if you're doing say, mass calculation and only need one column and none of the foreignkey data for speed reasons.

Basically: an ORM is good, but don't let yourself feel suffocated by it. If something is not a good fit for the ORM, don't do it in the ORM; either use SQL to do it in the DB server, or do a simpler SELECT (in the ORM) and do the complex operation in your application before INSERTing the result back into the db (if that's a goal of the operation anyway). If the operation is already outside CRUD-type DB access, the maintenance overhead of having some non-ORM database code in the application would be there either way; forcing it through the ORM just trades a hard error for a very slow application, and the hard error is easier to troubleshoot (and often fix), while the slow application means pulling up profiling tools.

>Things they usually aren't very good at are when you start trying to do things that require a lot of optimization

I find this ends up being, like, 1 or 2% of queries. It's also very hard if not impossible to guess which queries will end up in that group.

You're better off building it with the ORM first and breaking out SQL later when you are trying to performance optimize.

There is also a small % of queries which use some feature of your database engine which the ORM won't support.

I find this ends up being 90% of SELECT queries. Usually when selecting data you want to retrieve a bunch of related objects too. Often with complex criteria for which objects to pick. And doubly so for "list" type endpoints where you're selecting many records.

I tend to use the ORM for CUD operations, and just write raw SQL for SELECTs unless they're super simple.

> It's also very hard if not impossible to guess which queries will end up in that group.
>
> You're better off building it with the ORM first and breaking out SQL later when you are trying to performance optimize.

I disagree on this. It's almost no extra work to just write these optimised in the first place (if you're not trying to squash everything into an ORM workflow). So it makes sense just to write them all optimised. Ditto for doing batch inserts and updates rather than looped inserts/updates (although you can usually use the ORM for this).

> It's almost no extra work to just write these optimised in the first place

Not everyone is capable of quickly optimizing SQL, and I don't think it's an absolutely necessary skill to build a decent application. Junior devs can pick up on this skill over time and as long as they can manage to avoid any obvious footguns, using the ORM is fine most of the time.

Writing queries that are optimized in the first place just means now you have to maintain a bunch of SQL and you have to rely on anyone modifying that SQL later understanding those optimizations. Sometimes it's necessary, but if it's not, I think it's much nicer to stay in ORM-land even if the query might not be optimal.

> Not everyone is capable of quickly optimizing SQL

I mean that's true, but equally not everyone is capable of using an ORM. I don't think SQL is inherently any harder to learn.

At my last job, I had juniors who had never used SQL at all productive in SQL within a couple of weeks, and using "complex" SQL like JSON aggregation and window functions within a few months. They were a little intimidated by it when they started, but didn't find it too hard to learn in the end.

Honestly, I think I could get someone with no raw SQL experience writing code that’s at least 50% faster within a few hours. There are so many footguns that ORMs completely ignore, and never warn you about.

Using a case-insensitive filter (e.g. Django's iexact) against a DB with case-sensitive collation (the default in Postgres)? Django will helpfully cast both the column and your query value to UPPER to match them for you, and the former wrecks indexing.

Checking if a string ends with something else? Goodbye, index.

I _think_ the latter can be worked around in PG with a GIN index, but I’m not positive (I work with MySQL much more). And in any case, you’d have to know to create that, and I imagine most devs won’t.

Fixing seemingly tiny things like that has a massive impact on query speed against large tables.
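Both footguns are easy to demonstrate with SQLite's EXPLAIN QUERY PLAN, which reports whether an index is searched or abandoned for a full scan (table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE INDEX idx_users_email ON users (email);
    """
)

def plan(sql):
    # fourth column of the plan row is the human-readable detail
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Plain equality can SEARCH the index...
assert "SEARCH" in plan("SELECT id FROM users WHERE email = 'a@b.c'")

# ...wrapping the column in a function defeats it (the UPPER/iexact trap)
assert "SCAN" in plan("SELECT id FROM users WHERE upper(email) = 'A@B.C'")

# ...and so does a suffix match, which needs a leading wildcard
assert "SCAN" in plan("SELECT id FROM users WHERE email LIKE '%@b.c'")
```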

With your example of optimization, I think anyone who is able to make this optimization could certainly also implement it on top of an ORM.

>I find this ends up being 90% of SELECT queries. Usually when selecting data you want to retrieve a bunch of related objects too. Often with complex criteria for which objects to pick.

What is it that you do here that can't be handled by, say, django's workhorses - filter and select_related?

If it's impossible to write 90% of your queries in an ORM my suspicion would be that you're either not using the ORM correctly or you're using a crappy ORM.

I'm unfamiliar with Django's ORM specifically. But the problem wasn't that it couldn't be done with the ORM, but that the ORM code quickly became unreadable. Things like:

- Complex joins

- Complex WHERE clauses with mixes of AND and OR (with parentheses)

- JSON aggregation

- Window functions

tend to require quite heavyweight syntax in ORMs (e.g. nested lambda functions). Whereas the corresponding SQL tends to introduce much less noise.

It's basically just another case of a dedicated language being nicer to use than a DSL embedded into a general purpose language. Normally it's not worth creating a whole language just for nicer syntax, but in the case of SQL the language already exists! So why not use it.

>But the problem wasn't that it couldn't be done with the ORM, but that the ORM code quickly became unreadable.

This is the problem with not using an ORM. If you cut it out and move everything to parameterized SQL queries, the SLOC explodes, which massively inhibits readability as well as introducing bugs.

If your issue with ORMs is just that you're familiar with SQL and you don't like how ORMs look then I think the issue is just about becoming more familiar with a decent ORM.

> If you cut it out and move everything to parameterized SQL queries the SLOC explodes

My experience has been the opposite: that raw SQL queries end up much shorter (and consequently more readable) than the equivalent ORM code. The exception to that is INSERT/UPDATE queries, where I do tend to use some kind of ORM/query builder. I have used both, and I prefer raw SQL for anything beyond very simple queries.

I have never seen an ORM in my life that didn't reduce the total amount of code written. Not even the Java monstrosities increased the SLOC.

Perhaps it depends what you're doing?


    .where('column_a', '=', 'value1')
    .and(q => q.isNull('column_b').orWhere('column_b', '=', 'value2'))
is a lot less readable than:

       column_a = 'value1'
       AND (column_b IS NULL OR column_b = 'value2')

In Django that would be .filter(column_a='value1', Q(column_b__isnull=True)|Q(column_b='value2'))

And obviously you can use whatever indentations you like.

Ah, that's quite a bit nicer. You can't do that in JavaScript on two counts:

- No keyword arguments

- No operator overloading (so you can't override | to get the nice "or" syntax)

I actually messed it up a little because I'm not sure you can mix positional and kwargs in filter, and you definitely can't use kwargs first. Still, the idea is there.

In JS, theoretically you could design an API like

.where({column_a: 1}, Q({column_b__isnull: true}).or({column_b: 2}))

Which really isn't bad IMO

It's been a bit since I've used Django, but I believe you can just swap the order so the kwarg comes last.

    .filter(Q(column_b__isnull=True)|Q(column_b='value2'), column_a='value1')
Or just turn it into another Q

    .filter(Q(column_a='value1'), Q(column_b__isnull=True)|Q(column_b='value2'))

The docs show both of those as working examples, but Q supports using & for "and", and can be combined in any way - you could also do this:

  .filter(Q(column_a='value1') & (Q(column_b__isnull=True) | Q(column_b='value2')))

One of these my IDE can typecheck and apply code highlighting to; the other is just a blob of text.

In my case that would be SQL, as I use nice SQL-aware IDEs for Oracle and SQL Server.

Separate chain for a different Django example, it only natively supports joins on explicit foreign keys, but because it has that extra information in the model the syntax for using it is extremely compressed. Let's say you have a "Dog" table with a foreign key to "Owner", and "Owner" has a foreign key to "City". Getting all the dogs that live in New York would be:

   Dog.objects.filter(owner__city__name = 'New York')
The double-underscore follows the foreign keys until the last one, which is a field on the last model.
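Under the hood that one line compiles to ordinary INNER JOINs. Roughly the equivalent SQL, demonstrated here with stdlib sqlite3 (table and column names follow Django's usual conventions and are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owner (id INTEGER PRIMARY KEY, city_id INTEGER REFERENCES city);
    CREATE TABLE dog   (id INTEGER PRIMARY KEY, name TEXT,
                        owner_id INTEGER REFERENCES owner);
    INSERT INTO city  VALUES (1, 'New York'), (2, 'Boston');
    INSERT INTO owner VALUES (1, 1), (2, 2);
    INSERT INTO dog   VALUES (1, 'Rex', 1), (2, 'Fido', 2);
    """
)

# What owner__city__name='New York' expands to, roughly: one JOIN per
# foreign-key hop, with the filter applied on the last table's field.
rows = conn.execute(
    """
    SELECT dog.name
    FROM dog
    JOIN owner ON owner.id = dog.owner_id
    JOIN city  ON city.id  = owner.city_id
    WHERE city.name = 'New York'
    """
).fetchall()
assert rows == [("Rex",)]
```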

Writing that by hand is a ton of extra work, like 10x slower than using the Django ORM. Let's say it takes you ten seconds to write that query. I wrote it in one second with the ORM and my IDE.

That adds up, with almost no downside most of the time.

The problem is that what you typically get is one group of people who have no idea how to do anything outside of the ORM and downplay the problems, because hey, it's only a problem query or two using up 100% of the database's system resources and bringing it crashing down. Might as well throw more hardware at it, since we've spent no time understanding queries over the last N years of building.

Or you use something much simpler and more light weight, like query builders or a tool that generates code from sql queries.

This approach is more bottom-up. You end up with unidirectional data flow, better separation of concerns, data coupling instead of object dependencies, and better performance right off the bat.

The cost? In my experience just some basic familiarity with SQL.

Nah, ORMs just encourage a lot of bad behavior, and come with edge cases and code bloat. You're better off using an API generator such as postgrest/hasura for the simple cases, and hand crafted queries for anything more complex than basic crud.

I realized too late that objects, as of now, are not capable of synthesizing a new class/type based on joins.

It's strange that this point doesn't get more attention.

ORMs by their nature tend to be built around a 1:1 mapping between fields on some object type and columns in some table. Bulk queries get you multiple objects corresponding to multiple rows. Relationships get you multiple objects with some of the fields being references to the other objects. Obviously I'm simplifying here and there have also been some attempts to do things in other ways but this is basically how most of the popular ORMs work today.

However in reality a lot of useful queries return a list of flat data structures or even just a single flat data structure with some subset of the columns of all of the relevant tables and maybe a few extra columns that are calculated on demand and not stored directly in any database table. If that's the data I've read then what I really want is something like a properly-typed dataclass with exactly those fields/columns and nothing else that might add confusion or ambiguity.

Unfortunately that doesn't really fit the classic ORM and OO model. Instead we often have to work with multiple objects with some form of nesting to follow the relationships, ambiguity about which fields have actually been read from the database and can safely be accessed, possibly some inaccuracy with the types such as nullable fields that have just been read from not null columns in the database, and a lottery to see what happens if we try to access fields on those objects that might not have been read by any previous database query anywhere in the system at any point since that particular ORM-backed object was created.

I find it's one of those things where the popular approach - using an ORM in this case - works for relatively simple needs and in practice a lot of work does only have relatively simple needs so that's OK. But when I start doing more complicated things it can become a pain to work with because the whole model fundamentally doesn't fit what I'm actually doing.
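The "properly-typed dataclass with exactly those fields" idea is easy to sketch with the stdlib: define a frozen dataclass matching the query's columns, computed ones included, and map rows straight into it (schema and names are illustrative):

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT,
                         qty INTEGER, unit_price REAL);
    INSERT INTO orders VALUES (1, 'acme', 3, 9.5);
    """
)

# Exactly the columns this query returns -- including one computed on
# demand -- and nothing else that could be stale, unloaded, or ambiguous.
@dataclass(frozen=True)
class OrderSummary:
    customer: str
    qty: int
    total: float

rows = conn.execute(
    "SELECT customer, qty, qty * unit_price AS total FROM orders"
)
summaries = [OrderSummary(*row) for row in rows]
assert summaries[0] == OrderSummary(customer="acme", qty=3, total=28.5)
```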

Something like this is implemented in the Russian 1C system. They use an extended query language (bilingual, like the whole system), and the query executor returns a special dataset object with all values wrapped into regular business-logic classes. So when you “select …, agent, … from … join …”, you can access record.agent.manager.phone in your code later. Everyone knows the difference between selecting it in the query versus in code. All primitive types get wrapped too: dates to dates, bools to bools. It has no static typing, but the query result fields are all of “platform” types, not raw values like ids or json/int dates.

Don’t get me wrong, 1C products are regular enterprise crap on top of a shitty language stuck in the last century. The latest attempt to refit it as-is to the web was a paradigmatic disaster. But the platform (the runtime) itself could teach Django a volume or two about query integration. They have been doing it since the '90s, and 1C developers, who traditionally weren’t even considered developers, had no issues programming these systems without any deep stack knowledge. There’s no stack, basically; it’s all homogeneous once you learn the fundamentals.

Edit: yes, I find it very strange too. There’s no popular/generic and simple open source platform which could bind it all together into a nice runtime. The whole ORM vs SQL and pitfalls feels so strange, as it’s not something hard in my book.

Django lets you define an abstract model for the resulting set of columns, then you can use raw SQL on that model to get something that looks like a normal model to the rest of the code. As long as the raw query has the right number of columns, and of the right data type, Django doesn't care how it's populated. Then you can just stick the query behind a classmethod on the abstract model so you don't have to worry about the columns not matching up wherever it's used.

I occasionally do this with views managed by https://github.com/xelixdev/django-pgviews-redux/.

very nice

what is it called in the docs?

If I understand GP correctly, it's 'performing raw SQL queries', or specifically here, 'mapping query fields to model fields': https://docs.djangoproject.com/en/4.2/topics/db/sql/#mapping...

Basically you call `.raw("...")` from some model's queryset, but there's no requirement you actually query that model's table at all.

    class SomeModel(models.Model):
        name = models.CharField(max_length=100)

    class OtherModel(models.Model):
        foobar = models.CharField(max_length=100)

    SomeModel.objects.raw("select foobar as name from thisapp_othermodel")
will yield `SomeModel`s with `name`s that are actual `OtherModel` `foobar` values.

Yes, but also it's been a long time since I've had any reason to do this, and had gotten "managed = False" mixed up with abstract classes. Abstract classes won't let you do this, but you probably want "managed = False" to prevent migrations from doing stuff in the database, if it's going to be a reporting-only query that doesn't have a backing table.

Also you need to return an "id" column since Django needs a primary key.

On the flipside, you can put that query in the database as a VIEW and point the model at it, also with "managed = False".

Oh yes I forgot about that, that can be annoying. But of course, when it's annoying because it doesn't matter, it doesn't matter and you can pass anything back `as id`.

The most stupid issue with "active record"-type ORMs is the implicit queries on member access, especially in collections, leading to the N+1 queries problem. But in SQLAlchemy one can actually turn that off; that is, make it throw an exception when undeclared table dependencies are accessed. This restores sanity. And one gets to keep the goodies you mention, plus Alembic migrations (mentioned by another commenter). One can also write direct SQL with SQLAlchemy, or use the "core" layer to keep a DSL but avoid the ORM.

> I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern. But I think my bias is influenced by always working on software where there are just so many columns per table that it would be way too much busywork to not use something.

You end up writing a query per usecase, rather than writing generic queries that can be stitched together however in the business logic

The first agency I worked at did this on their Java projects. They should have just used a fully baked ORM. Basically, they ended up creating a massive query layer in the program which contained all the different queries organized into different interfaces. To edit a simple API endpoint you would have to open like 5 different files at a minimum. And because queries were usually tailored to logic in a specific controller, they were not typically reusable. It was always a relief to go back to Django after dealing with that.

Have this right now. Fortunately no custom query builder. It's enough of a hell as is. Cleanup will take years.

What problems are you finding with it?

Injection potential through the roof, copypasta galore, refactoring the same join pattern in a hundred different queries gets old rather fast… the usual suspects.

Oh god, good luck

So what happens is that you have the one query that runs for a specific page and fetches the data and the relevant fields? I could definitely see that working for many projects, at least while your objects don't have too many tiny little details to pull out of the DB.

I think it ends up being hybrid, like seen in CQRS and especially with DDD. Toss in "Vertical Slice Architecture" as well. What abstractions you want can be decided on a per-command or per-query basis, and it feels natural.

That is what stored procedures are for.

> issues with Django's ORM due to the query laziness making performance tricky to reason about

It's infuriating that this is still not a thing you can disable (https://code.djangoproject.com/ticket/30874). Pretty much my only gripe with the Django ORM, which I'm a huge fan of (and I also write lots of SQL).

There's a library that solves this for you, thankfully https://github.com/charettes/django-seal.

I read this title and immediately thought "...but why wouldn't you just use Django?"

Having written the sort of SQL-inline code the author talks about, then refactored the whole thing to use Django: Django's ORM solves waaaay more problems than it creates in this regard.

There are plenty of advantages to using a dataclass, the most obvious being that it behaves like a pure data object (i.e. it has no underlying associated resources). Serialization/deserialization of data is dead simple, and a dataclass is a construct you can use as a data object when building 3-tier applications. Having pure data objects also gives way more flexibility when implementing cache strategies.
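To illustrate the "pure data object" point: a dataclass round-trips through plain dicts (and hence JSON) with no live database handles attached. Field names here are invented for illustration:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class User:            # illustrative data object, no DB resources attached
    id: int
    name: str

u = User(id=1, name="ada")
payload = json.dumps(asdict(u))     # serialize: dataclass -> dict -> JSON
u2 = User(**json.loads(payload))    # deserialize: JSON -> dict -> dataclass
print(u2 == u)  # True
```

Because nothing but data crosses the boundary, the same object is safe to cache or ship between tiers.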

While this separation isn't common in the Django ecosystem, it is very common in enterprise application design (regardless of usage of an ORM). In complex applications, Django models are often a leaky abstraction (not only because of the aforementioned resource-coupling problem, but also because of issues such as relations requiring the inclusion of the target model, which cannot be lazy-loaded; a good example is a nullable foreign key to an optional module that may or may not be loaded), and they actually behave like a variation of the ActiveRecord pattern, which mixes two different scopes: data and operations on data. In many cases this is OK, but in many others it is a problem.

I personally use a repository pattern, coupled with a query builder and something vaguely similar to dataclasses (it's a bit more complex in the sense that the data object attribute name can differ from the database field name). It is basically an object mapper with a non-related repository class.

Yeah I can understand wanting to split out the ORM model from a separate class that holds data. I just think absolving oneself of an ORM or query builder entirely for a DB schema that doesn't (glibly) fit on a postcard feels like a good way to generate a lot of busy work.

I somewhat disagree about your point on caching. If you're working with models (that, namely, are 1:1 with DB rows) stale object problems are a reality no matter what, and having the ID be put into a pure data object generates the same issues. But these are things that are not very interesting to discuss outside of specific contexts.

I am a bit of a functional programming nerd, but I've just found that for Python stuff in particular, swimming upstream is its own bug generator relative to writing concise stuff in a very imperative fashion. Using the fat models directly is a part of that calculus for me, but YMMV and every team has different strengths.

> I just think absolving oneself of an ORM or query builder entirely for a DB schema that doesn't (glibly) fit on a postcard feels like a good way to generate a lot of busy work

True, that's why I built mine as a library I can reuse in my projects :) I'm still eating my own dogfood, but doing it with a framework approach.

> If you're working with models (that, namely, are 1:1 with DB rows) stale object problems

One of my common patterns is to implement cache at the service layer, not the data layer - and all data operations are performed via services. This allows caching of actual business-domain computed values, not necessarily just db rows (in fact, more often than not, caching just db rows is just a waste of memory with little to no advantage). As a quick example, imagine a purchase order with a header, a list of products and a state associated with each line - it is trivial to cache the whole purchase order info, including runtime calculations such as lead time per product, and invalidate the cache at each update operation on the different tables that may represent this purchase order. Services would have methods manipulating "purchase order" (and keeping cache state) and not ad-hoc code messing with OrderHeaderModel, OrderDetailModel, OrderProductStatusModel, etc.
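A minimal, hypothetical sketch of that service-layer cache: it stores a computed business view (not raw rows), and any write going through the service invalidates the entry. All names here are invented for illustration:

```python
# Hypothetical service-layer cache: cache computed business views, not rows,
# and invalidate on every write that goes through the service.
class PurchaseOrderService:
    def __init__(self, repo):
        self.repo = repo            # any data layer: raw SQL, ORM, ...
        self._cache = {}

    def get_order(self, order_id):
        if order_id not in self._cache:
            header = self.repo.fetch_header(order_id)
            lines = self.repo.fetch_lines(order_id)
            self._cache[order_id] = {
                "header": header,
                "lines": lines,
                # runtime calculation cached alongside the data
                "lead_times": {l["product"]: l["eta"] - header["ordered"]
                               for l in lines},
            }
        return self._cache[order_id]

    def update_line(self, order_id, line):
        self.repo.save_line(order_id, line)
        self._cache.pop(order_id, None)   # invalidate the whole order
```

Callers only ever see "purchase order" operations; the header/detail/status tables behind `repo` never leak into ad-hoc caching code.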

> I am curious as to what a larger codebase with "just SQL queries all over" ends up looking like. I have to imagine they all end up with some (granted, specialized) query builder pattern.

One successful pattern I’ve seen treats the database as its own service layer. Service code does not send arbitrary SQL to the database—instead, all of the SQL queries are set as stored procedures in the DB. People sometimes freak out when I say this, so I should clarify that the stored procedures, table schemas and other things of that nature were version controlled and deployed with some minimal build tooling we’d developed in house.

I really liked this pattern. I think anything that you need to talk to across a network should be treated as a service in itself, with a well defined interface. This simplifies testing and monitoring as well. The big risk with an ORM, architecturally, is that you end up treating your database as a sidekick to your service code, or even a dumb data store—some shameful implementation detail your service keeps locked in the basement—when they’re capable of much more than that.

The only problem I have with using the django ORM is that it relies on the django project structure. While there are ways to use the ORM independently, they are full of hacks and trade-offs.

Granted, this problem goes away if you are building a web app or a REST API, but if I just want an ORM for a command line application, I am using django's management command functionality which is OK, but it doesn't really scale easily.

Yuck indeed. The way it dictates the order of imports, forcing you to import settings before any models can be imported, is reason enough not to use it. This problem spreads to any of your other files, leaving you in the end with everything depending on being launched in a full Django context. Shame, given it’s otherwise very user friendly.

Add refactoring, migrations and testing to all the reasons you mentioned, and it quickly becomes an adapted case of Greenspun's Tenth Rule for ORMs.

Migration tooling and SQL testing predate ORMs, it is a matter of actually caring about their existence.

This is just reimplementing Django's ORM, but badly.

ORM queries compose. That's why [Python] programmers prefer them. You can create a QuerySet in Django, then later add a filter, then another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.

Another thing that's great about Django's ORM is that it's THIN. Very thin, in fact. The entire implementation is pretty tiny and super easy to read. You can just fork it by copying the entire thing into your codebase if you want.

This is what I came here to say.

For example, I'm working on an project now that long ago added a "sellable things" store that used plain sql. There are many, many stores like this one, but it did some logic to figure out what items are sellable and return the set. Easy, developer happy, ticket closed.

Some time later, it was needed to have "sellable items of a specific type." Well the "sellable things" store was too much to clone so the developer simply pulled all the sellables and filtered in memory. Hey it's Go so it's fast right?

This continued for a couple years and now I'm joining a project with a p99 of >15s. It would have been a natural fit to return a query set of sellable things and the other callers could further refine it however they wanted. Now I'm looking at a ball of logic that should have been in the database and it's beginning to break down at scale.

This article is just that pattern with syntax sugar. It will lead to sadness.

> This is just reimplementing Django's ORM, but badly.

I guess this is a good thing, as "reimplementing" Django's ORM is the opposite of what I wanted to do here :)

> ORM queries compose. That's why [Python] programmers prefer them. You can create a QuerySet in Django, then later add a filter, then another filter, and then later take a slice (pagination). This is hugely important for maintainable and composable code.

I don't really disagree, but there are many ways to skin a cat. You can absolutely write maintainable code taking this approach. In fact, I can build highly testable code (unit and functional) following an abstraction very similar to this. The idea that "maintainable and composable code" can only be achieved by having a very opinionated approach to interacting with a database is flimsy. I offer a contrary point of view: with the Django ORM, you are completely locked into Django. You build around the framework; the framework never bends to your will. My approach is flexible enough to be used in a Django project, a Flask project, a non-web-dev project, any scenario really. I want complete isolation in my business logic, which is what I try to convey just before my conclusion.

Django's ORM isn't highly opinionated. That's just wrong.

> With the Django ORM, you are completely locked in to Django

Another bit of nonsense again. You have a dependency. Sure. Just like you have a dependency on Python. But it's an open source dependency, and the ORM part is a tiny part that you can just copy paste into your own code base if you want.

Also, worrying about being "locked into" something that you depend on is madness. Where does it end? Do you worry about being "locked into" Python? Of course not.

> You build around the framework, the framework never bends to your will.

You don't actually seem to understand Django at all. It's just a few tiny Python libraries grouped together: an ORM, request/response stuff, routing, templates, forms. That's it. You do NOT need to follow the conventions. You can put all your views in urls.py. You can not use urls.py at all.

You do NOT bend to the framework's will. That's just false. You bend to it of your own accord; don't blame anyone else for your choice.

> Django's ORM isn't highly opinionated. That's just wrong.

It's a fully featured ORM... including migrations, a query API (which is HIGHLY opinionated; it looks nothing like SQL), async support (via asgiref!), custom model definition... It's almost the definition of opinionated. Not that you can build a fully featured ORM without being opinionated. That's not a dig at Django, btw.

> Also, worrying about being "locked into" something that you depend on is madness. Where does it end? Do you worry about being "locked into" Python? Of course not.

Ermm, my premise is that you DON'T have to depend on it. It's not that crazy to not want lock in when it comes to the software that handles my database. Other programming language communities seem to handle that just fine.

The rest is a bit too ad hominem for my liking, so I'll pass.

Can you explain a bit more about the Django ORM being very thin and easy to read? It does seem like the Django ORM is thin (from an architecture perspective), but it doesn't seem to be small, it seems to be pretty big. Maybe I'm not understanding it though, so here's what I see:

The "ORM" part of Django seems to be everything in `django.db.models.Model`, which seems to require you to declare your Models as subclasses of the aforementioned class. Looking into that code though, it seems like the implementation supporting all this is around ~20,000 lines of Python: https://github.com/django/django/tree/main/django/db/models

That doesn't strike me as a super lightweight. For comparison, all of Flask (a Python WSGI web app framework, but mostly a 10+ year old project to compare to, and excluding tests) is ~4,000 lines of Python.

Is there a small subsection of the code in `django/db/models/` that is all that's necessary to use the ORM part? Or maybe I'm missing something about the "core" of the ORM?

It supports several backends and lots of stuff including migrations. It's thin for what it does.

> For comparison, all of Flask

That's... not a reasonable comparison. Flask does basically nothing. Of course it's small. And it does something totally different so why compare?

Ha, should have read the comments before I wrote mine. Yep, it's this composability aspect that never seems to have occurred to the authors of this kind of think-piece.

I used to be pretty anti-ORM myself because I loathed the complexity of ActiveRecord (in the Rails world), but then I discovered arel, the nice composable relational query builder underneath, and saw the light. A composable layer of abstraction over SQL is critical in an application. (I still prefer raw SQL for analytical queries.)

Just like stored procedures compose, like in any procedural programming language.

Seems like there's 3 groups of opinions on ORMs:

Firstly (1); "I want to use the ORM for everything (table definitions, indexes, and queries)"

Then second (2), on the other extreme: "I don't want an ORM, I want to do everything myself, all the SQL and reading the data into objects".

Then thirdly (3) the middle ground: "I want the ORM to do the boring reading/writing data between the database and the code's objects".

The problem with ORMs is that they are often designed to do number 1, and are used to do number 3. This means there's often 'magic' in the ORM, when really all someone wanted to do was generate the code to read/write data from the database. In my experience this pushes engineers to adopt number 2.

I'm a big fan of projects like sqlc[1] which will take SQL that you write, and generate the code for reading/writing that data/objects into and out of the database. It gives you number 3 without any of the magic from number 1.

[1] https://sqlc.dev/

I'm in a 4th camp: we should be writing our applications against a relational data model and _not_ marshaling query results into and out of Objects at all.

Elaborations on this approach:

- https://news.ycombinator.com/item?id=34948816

- https://github.com/papers-we-love/papers-we-love/blob/main/d...

- https://riffle.systems/essays/prelude/

Nice share, thanks.

I was never more productive than when using Access (and dBase II before that). Why can't we have that?

My theory: Something was lost in the jump from workgroup computing to client/server.

Imagine if Access was rebuilt on top of a client/server stack. That's kind of what Riffle is trying to do.

I've been (slowly) working on the persistence stuff. It (mostly) moots the SQL vs ORM vs query builder slap-fight.

I've got some notions (and POCs) about UI, mediated via HTTP & HTML.

I'd love to have some CRDT-like smarts; learning more is on my TODO list.

I'm still thinking about the "reactive" part. I haven't imagined anything past Access VBA style programming. I'm struggling to envision a FRP-meets-CRUD future perfect world.

> I was never more productive than when using Access (and dBase II before that).

I've never used access and have 0 familiarity with it. Are there any examples I could look at?

Still on the 4th camp?

Using stored procedures and triggers as much as possible.

I'm pretty firmly in #2... it's relatively straight forward in a scripting language, and easy enough with something like C# with Dapper. In the end ORMs tend to over-consume, and often poorly. And even when they don't in most cases, they start to in more difficult cases. That doesn't even get into the amount of boilerplate for ORMs. You have to buy in to far more than their query model(s).

Maybe those are the main/popular groupings. Where I fall is that I want typesafe constructions of queries that match the current schema. The query compositions should follow the SQL-style structure so there's no 'shape-mismatch' composing the query using the library. Some may not consider this to be an ORM (though it does map relations to objects).

There is definitely a fourth category -- "I want to build database queries natively using the paradigms of the language I am developing with, without use of SQL or an intermediary which translates into SQL."

> ...this pushes engineers to adopt number 2

Yup. Use of HQL/JPQL is proof that you shouldn't be using an ORM. Just use SQL instead. And once you have some SQL, it's just easier (overall) to do all SQL.

Well put!

Any serious application beyond the example given in this article will include conditional SQL constructs which go beyond SQL query parameters and will therefore require string formatting to build the SQL.

Think a simple UI switch to sort some result either ascending or descending, which will require you format either an `ASC` or a `DESC` in your SQL string.

The moment you build SQL with string formatting is the moment you're rewriting the SQL formatter from an ORM, meeting plenty of opportunities to shoot yourself in the foot.

> The moment you build SQL with string formatting is the moment you're rewriting the SQL formatter from an ORM, meeting plenty of opportunities to shoot yourself in the foot.

I used to think this, but at my last company we ended up rewriting all these queries to use conditional string formatting, as we found it much more readable. The key was having named parameter binding for that string, so you didn't have to worry about matching up positional arguments. That, along with JavaScript's template string interpolation, actually made the string-formatted version pretty nice to work with.

Sounds like just begging for SQL injection attacks.

Values were still provided separately. The string-interpolated SQL would include a placeholder just like static SQL does. That's pretty easy to audit for in code review: no variables in interpolated code.
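The same split expressed in Python terms (stdlib `sqlite3`; table and column names invented): values travel as named parameters, while the interpolated fragment is restricted to string literals chosen in code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, age INTEGER)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [("ada", 36), ("bob", 17)])

# Structural fragment: chosen from string literals, never from user input.
order = "DESC"  # or "ASC"
sql = f"SELECT name FROM users WHERE age >= :min_age ORDER BY age {order}"

# Value: bound as a named parameter, never interpolated into the string.
rows = con.execute(sql, {"min_age": 18}).fetchall()
print(rows)  # [('ada',)]
```

Auditing in review then reduces to one rule: no variables inside the interpolated parts, only placeholders and literals.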

That makes no sense. What are you interpolating? Some variable. And you now have to audit that THAT VARIABLE is safe.

    sortable_fields = ["name", "age", "gpa"]
    selected_filter = sortable_fields[form.filterIndex]

    if form.sortBy == "asc":
      query += " ORDER BY {} ASC".format(selected_filter)
    elif form.sortBy == "desc":
      query += " ORDER BY {} DESC".format(selected_filter)
Doesn't have any opportunity for SQL injection unless you have rogue programmers able to change code running in prod.

> What are you interpolating? Some variable.

Nope, I'm generally interpolating an inline expression consisting entirely of string literals.

> generally

Does not inspire confidence.

In the rare case that you're interpolating a variable, you'd need to audit it in review. This is similar to carefully auditing the rare use cases of raw SQL expression when using an ORM.

You can avoid this entirely with JavaScript's tagged template literals. Here is an example library: https://github.com/blakeembrey/sql-template-tag

There’s a world between a query builder and an ORM. The point of ORMs isn’t to build queries; if that’s the only need, you might as well just use a query builder, which is a lot more lightweight and doesn’t come with all the downsides of ORMs.

The OP literally says to ignore query builders, not just ORMs. When they state “just write SQL” that’s their actual thesis.

What you describe just needs a query builder (e.g. in Java something like jOOQ), not necessarily an ORM.

OK but TFA is not just against ORMs, it’s also against query builders. That’s what GP is replying to.

Depends on the abstraction... for example .Net's extensions for LINQ are pretty good at this, I haven't generally used the LINQ syntax, but the abstraction for query constructs are pretty good, combined with Entity Framework. Of course, there's a lot that I don't care for and would prefer Dapper. In the end, the general environment of .Net dev being excessively "enterprisey" has kept me at bay the past several years.

Just looked at LINQ and it looks like Ecto [0] used a lot of its ideas for inspiration! I haven't used LINQ, but in Ecto, there are so many useful constructs for composing queries. If you get a stinker of a query, you have multiple escape hatches such as fragments [1] or just writing the queries directly as needed [2].

For beginners, the Elixir language constructs can be a little clunky, but once you get it, it's so productive and I miss that productivity when doing more advanced queries in other languages.

[0]: https://hexdocs.pm/ecto/Ecto.Query.html [1]: https://hexdocs.pm/ecto/Ecto.Query.html#module-fragments [2]: https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.html#query/4

IMHO the real proper solution is to have an SQL parser, so you can have your SQL represented as an AST, do some operations on it, then compile it back to a query.

Sadly, I'm not aware of any good solutions to this. SQLAlchemy Core can build and operate on an AST, but it doesn't parse raw SQL into a query (so you have to write your queries in Python, not SQL). Some parser libraries I've seen were able to parse the query but didn't offer much in terms of manipulating it and compiling it back.

The next logical step after writing the code given in the article is to abstract common boilerplate SQL into a library so you're not spending 50% of your time writing and re-writing basic SQL insert, update, and select statements every time your models need to be updated. At which point all you've done is write your own ORM.
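That progression can be sketched in a few lines: derive the INSERT/SELECT boilerplate from a dataclass's fields, and you already have the seed of a small ORM. All names here are invented for illustration:

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

@dataclass
class User:
    id: int
    name: str

def insert(con, table, obj):
    # Generate the INSERT from the dataclass fields instead of hand-writing it.
    cols = ", ".join(f.name for f in fields(obj))
    marks = ", ".join("?" for _ in fields(obj))
    con.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", astuple(obj))

def select_all(con, table, cls):
    # Generate the SELECT and map rows back onto dataclass instances.
    cols = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in con.execute(f"SELECT {cols} FROM {table}")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT)")
insert(con, "users", User(1, "ada"))
print(select_all(con, "users", User))  # [User(id=1, name='ada')]
```

Add relations, dirty tracking, and identity maps to those two helpers and the line between "my library" and "an ORM" disappears.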

If you want to go full-blown SQL you can use something like PostGraphile, which allows you to create all of your business entities as tables or views and then write all your business logic as SQL stored procedures, which get automatically translated into a GraphQL API for your clients to consume, but once you move beyond basic CRUD operations and reporting it becomes incredibly difficult to work with since there aren't really any good IDEs that help you manage and navigate huge pure-SQL code bases.

If you're really dead set against using a powerful ORM, it's probably still a good idea to find and use a lightweight one--something that handles the tedious CRUD operations between your tables and objects, but lets you break out and write your own raw queries when you need to do something more complex. I think there's a sweet spot between writing every line of SQL your application executes and having an ORM take care of boilerplate for you that will probably be different in every case but will never be 100% at one end or the other.

The SQL inserts/updates have never felt tedious for me even in large projects, partially owed to careful use of jsonb for big objects where it makes sense (e.g. user settings dicts). Other than that, keeping a tight schema design.

Of course, jsonb didn't exist until 2014ish. IMO this was a serious gap in SQL before, and it likely spawned the concepts of NoSQL and ORMs to begin with, which may have been the inspiration for jsonb. Hurray for competition.

The article is missing the code for creating the "users" database table. What about indexes? Migrations? Relations to other tables?

I mean you can just write SQL instead of using the ORM if your project consists of a single table with no indexes that will never change, sure.

When it comes to migrations, I've been fine with https://github.com/golang-migrate/migrate

There are a multitude of extra things to consider, but none of those things are, in my opinion, imperative to having success with SQL in Python. Will it be hard to achieve the same level of convenience that modern ORMs provide? Absolutely. But there is always a cost.

I firmly believe that for most projects (especially in the age of "services"), an approach like this is very much good enough. Also, a great way to onboard new developers and present both SQL and simple abstractions that can be applied to many other areas of building software.

Agreed. I've seen plenty of what wind up being very byzantine and complex migration strategies over the years, and in the end simple SQL scripts tend to work the best. I will note that it's sometimes easier to do a DB dump for the likes of sprocs, functions, etc., if you want the "current" database bits to search through.

And good luck with writing tests for your sql code.

I really dislike SQL, but recognize its importance for many organizations. I also understand that SQL is definitely testable, particularly if managed by environments such as DBT (https://github.com/dbt-labs/dbt-core). Those who arrived here with a preference for Python will note that dbt is largely implemented in Python, adds Jinja macros and iterative forms to SQL, and adds code testing capabilities. No ORM required whatsoever.

Why would that be an issue? I have written plenty of tests for SQL code and it is no harder than writing tests for e.g. Ruby or Python code. Especially if you have an ORM involved.

Sometimes the idea is that the database lives its own life outside the application. Probably not the case here, but under that viewpoint the application is just one of perhaps many that access the data, and as such creating tables, indexes, migrations and relations is none of its business.

But it is the application's business. You may not be altering the database schema from your application, but you still need to make sure that its code is in-sync with it.

This means that you will need extra tooling, and if you're DIYing you will need to write it yourself.

The effort is the same, regardless of the approach. If you're consuming a third-party database and the underlying schema changes, you'd have to patch your model definitions accordingly - or in the presented example, the dataclass definition. Creating code to dump dataclasses from a database is actually trivial.

In ORMs like Django, models are defined code-first, not schema-first. Yeah, you can use inspectdb, but in any sufficiently complex application, odds are you need to add to the generated models any custom behaviors you already implemented, and verify all the names and whatnot, because data definition and operations on data are actually mixed in the same class. More often than not, if the change is profound (e.g. imagine switching from reading a User model from the database to fetching it from an external service), you may have to refactor a large portion of your code due to the way it interacts with the model - e.g. search operations won't be proxied via the ORM, but via external service endpoints, etc. There is no free lunch. And don't even get me started on field names that differ from the database.

> ... Python does not have anything in the standard library that supports database interaction, this has always been a problem for the community to solve.

Python has built-in support for SQLite in the standard library: https://docs.python.org/3/library/sqlite3.html

Python also has the DBAPI specification, which defines the interface a library must support to be considered a database driver. The author's claim that Go's sql package encourages writing SQL directly while Python doesn't seems a bit awkward.

Also, DB-API 2.0 is a standard that's followed by most database drivers, similar to what JDBC is for Java, though not as strictly enforced.

sqlite-utils[0] is a great library for working with SQLite

[0] https://sqlite-utils.datasette.io/en/stable/

It's fine advice... if you don't ever need to build queries programmatically (you will, probably) and don't care about type checks (you should, it's 2023).

If you don't know what you're doing on the DB-app interface, you're still better off with an ORM most of the time. If you don't know if you know, you don't know (especially if you think you know but details are fuzzy); please go read sqlalchemy docs, no, skimming doesn't count.

If you know what you're doing but are new to Python, use sqlalchemy.core.

PS. zzzeek is a low-key god-tier hacker.

It's fine advice - if you can type check your queries. My colleague wrote a mypy plugin for parsing SQL statements and doing type checking against a database schema file, which helps to identify typos and type errors early: https://github.com/antialize/py-mysql-type-plugin

Raw sql doesn’t compose, so it’s a no go for me except in special cases, but the tool would be a great addition to sqlalchemy.core for when those special cases occur.

Here, go and read about the WITH statement https://modern-sql.com/feature/with
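For instance, a CTE lets a raw query name an intermediate result and then build on it, which recovers some of that composition. A sketch using the stdlib `sqlite3` module (table and column names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (3, 40.0);
""")

# The WITH clause names an intermediate relation ("big_orders");
# the outer SELECT composes on top of it, all within one statement.
(count,) = con.execute("""
    WITH big_orders AS (
        SELECT * FROM orders WHERE total > 20
    )
    SELECT count(*) FROM big_orders
""").fetchone()
print(count)  # 2
```

It composes within a single statement, though not across statements the way a QuerySet or a query-builder expression does.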

That’s exactly when I write raw sql but being stuck on old Postgres versions has performance implications.

> I have spent enough time in tech to see languages and frameworks fall out of grace, libraries and tools coming and going.

I feel like Django ORM and SQLAlchemy are the de-facto ORMs for Python and have been around for over a decade. If anything I'd recommend juniors to pick one of these over hand rolling their own solution because it's so ubiquitous in the ecosystem.

Congrats, you just wrote your own ORM. Please mind that ORM doesn’t necessarily mean ActiveRecord, which could be considered an anti pattern.

He didn't really. The SQL is right there, and this is important.

What I've experienced (unfortunately) across multiple projects is that people who understand databases will write SQL with a nice collection of helper and wrapper functions as needed, and the people that think that databases are mysterious black boxes will reach for ORM. I've seen the ORM-happy teams getting scared at the idea of a million (1,000,000!) rows in a table, and they always neglect to set up even basic indexes or to think through what their JOINs are really doing.

YMMV but that's the pattern I see again and again.

Well, the SQL is always somewhere. If you use an ORM library, even if you use ActiveRecord, you will find some SQL in it. In the end, it always translates to SQL. In the blogpost, the writer created a User Python object ("O"). A corresponding (R) database row will be mapped (M) to the object. That's basically ORM. Not as heavy as the usual libraries that support relationships etc, but still, ORM.

Good ORMs provide the best of both worlds. Basic tasks such as loading a single object from the database by ID and then writing it back after changes are made shouldn't require you to write any SQL, because everyone already knows what that SQL looks like, just let the ORM do it for you. A query builder component that allows you to programmatically build queries of medium complexity is also essential. And for anything not covered in the previous two cases, it should be possible to just write raw SQL or something like it without the ORM fighting you for it.

My preferred ORM is Doctrine and it provides all of these features. It has its own variant of SQL called DQL that lets you effectively write complex select queries as raw SQL with a bunch of object-specific conveniences built in, and get back an array of objects.

For this reason I really like GRDB for iOS, documentation is also excellent.

I have decades of experience with databases and I happily use ORMs.

You're conflating ORM with people who know nothing about databases. Why?

From my experience, ORMs allow folk who know nothing about databases to continue knowing nothing about databases

And from my experience, forcing people who know nothing about databases to write SQL will not make them learn about databases, all you end up with is worse SQL and more injections.

Although the worst offenders by far are those who decide ORM = bad and bypass it at every opportunity.

That's fair, ORMs can be slow but they rule out a whole category of basic mistakes

And introduce plenty more: over-fetching and unindexed joins across many tables being two of the more common.

C allows people who can't build a CPU from NAND gates to do programming :P

NAND gates allow people who can't build transistors from self-mined ores to build CPUs.

(this was a flippant remark, if I could edit it I would reword it)

So let's ban ORMs because newbies don't know the entire toolchain. Yes, that will definitely solve it.

Why stop there? Let's ban RDBMS. If you can't contemplate a binary file structure and index strategy perfectly tailored to your application's needs ahead of using an RDBMS, why should we deign to let you use an abstraction layer with clever query planning and algorithms?

It's all too easy to morally 'ban' people from technology and gatekeep it behind wishy-washy nonsense like "ORMs are bad for beginners."

Isn't that just a useful abstraction?

SQLAlchemy allows you to separate the ORM from SQL and combine them when needed. The idea that "ORM == you don't have to write SQL and/or you can't write SQL" is, please excuse my strong words here, wrong.

My biggest gripe with SQLAlchemy is that sometimes I know what I need to write in SQL (I usually have a working prototype) and have trouble mapping the concept to sqlalchemy.core constructs, but that's mostly inexperience.

What's Active Record and why is it an anti pattern?

The active record pattern is an approach to accessing data in a database. A database table or view is wrapped into a class. Thus, an object instance is tied to a single row in the table. After creation of an object, a new row is added to the table upon save. Any object loaded gets its information from the database. When an object is updated, the corresponding row in the table is also updated. The wrapper class implements accessor methods or properties for each column in the table or view.


It was fashionable for a while to say it was an anti-pattern because that was a contrary view and ActiveRecord is very tied into building Rails applications.

It's not about fashion. Observations about fashion are no deeper than fashion itself.

It scales badly with table size, I think by design. That's why SQLAlchemy's and Hibernate's Data Mapper pattern is slightly more cumbersome to write, but works out much better.

It's a pattern where a single object (the "active record") not only represents a single database row, but also is usually responsible for saving/inserting data into the database (via a save method) and retrieving it (via find methods). Because of this it breaks the SRP. Whether that is negligible or not, I don't want to argue here. Personally, I would not use it anymore because of bad experiences in the past.
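
A minimal sketch of the pattern the parent describes, using stdlib `sqlite3` (class and table names are hypothetical). The object represents a row and carries its own persistence logic, which is the SRP concern:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")

class Post:
    """Active Record sketch: the object IS the row, and it also knows
    how to persist itself (save) and retrieve itself (find)."""

    def __init__(self, title, id=None):
        self.id = id
        self.title = title

    def save(self):
        if self.id is None:  # no id yet -> INSERT, then remember the new id
            cur = conn.execute("INSERT INTO posts (title) VALUES (?)", (self.title,))
            self.id = cur.lastrowid
        else:                # already persisted -> UPDATE in place
            conn.execute("UPDATE posts SET title = ? WHERE id = ?", (self.title, self.id))
        conn.commit()

    @classmethod
    def find(cls, id):
        row = conn.execute("SELECT id, title FROM posts WHERE id = ?", (id,)).fetchone()
        return cls(title=row[1], id=row[0]) if row else None

p = Post("Hello")
p.save()               # INSERT
p.title = "Hello, world"
p.save()               # UPDATE
```

Contrast this with the Data Mapper style (SQLAlchemy, Hibernate), where `Post` would be a plain object and a separate mapper/session owns the persistence.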

Active Record is NOT an anti-pattern. It's just one of many ways to get things done with a (relational) database in your application. Active Record, Data Mapper, raw SQL or whatever each has its pros and cons.

Yes, this!

I've been burned countless times by Hibernate (and consorts) and now I argue in favour of plain SQL wherever I can. I do not imply that Hibernate is in itself bad, I just have collected many years of observations about projects built upon it, and they all had similar problems regarding tech debt and difficult maintenance, and most of them sooner or later ran into situations where Hibernate had to be worked around in very ugly ways.

Yes, I can understand some of the arguments for ORMs, especially when you get a lot of functionality automagically à la Spring Boot repositories.

And since nowadays I have more influence, I do advocate for plain SQL or - the middle ground - projects like jOOQ, but without code generation, without magic, just for type safety. We've been quite happy with this approach for a very large rewrite that is now running successfully in production.

If you are just going to "Just Write SQL" then I really don't think you should be coding your own Object and Repository classes.

My vision of the "Just Write SQL" paradigm would be a "class" or equivalent that would take a SQL command and return the response from the server. Obviously the response has a few different forms but if you're "just writing SQL" then those responses are database responses and not models or collections of models.

(For the record, I think simple ORM-type functionality is actually quite useful as your use case moves past the scale of small utility scripts.)

I'd take it a step further and move all SQL into stored procedures and call those using a functional interface. That's because of PostgreSQL's excellent stored procedure support; it might be harder with, e.g., MySQL.

One major benefit of stored procedures, in addition to separation of concerns, is that you can declare them SECURITY DEFINER and give them access to tables the Python process doesn't (in a way reminiscent of setuid), thus improving the security posture dramatically.

One example: you could have a procedure authenticate(login, password) that has access to the table of users and (hashed) passwords, but if the Python app server is compromised it doesn't have access to even the hashed passwords or even the ability to enumerate users of the system.
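
A hedged sketch of what such a function might look like in PL/pgSQL (table, schema and role names are hypothetical; the hashing assumes the pgcrypto extension). The key is that the function executes with its owner's privileges, so the application role can call it without ever holding SELECT rights on the credentials table:

```sql
-- Illustrative only: app.credentials, app_user, and the crypt() call
-- (pgcrypto) are assumptions, not from the original comment.
CREATE FUNCTION authenticate(p_login text, p_password text)
RETURNS boolean
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = app, pg_temp   -- recommended hardening for SECURITY DEFINER
AS $$
BEGIN
  RETURN EXISTS (
    SELECT 1 FROM app.credentials c
    WHERE c.login = p_login
      AND c.password_hash = crypt(p_password, c.password_hash)
  );
END;
$$;

REVOKE ALL ON FUNCTION authenticate(text, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION authenticate(text, text) TO app_user;
-- app_user has no privileges on app.credentials itself.
```

If the app server is compromised, the attacker can only probe `authenticate()` one guess at a time, not dump the hashes.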

On our last project we explicitly decided to never have any stored procedures, since you basically can't test or deploy them in any sane way. I'm all ears as to how you make it work.

I had a boss once with a similar viewpoint, so I learned from him how they automated database deployments for Java/Scala apps using Liquibase+Maven and found I could apply and integrate the same principles to testing and deploying a data-layer stored procedure engine I had previously built into a joint product we were building together.

For that project I was able to put SQL DDL+DML+stored procedures in version control, create/run stored procedure (TDD even) unit/integration tests on mock data against other stored procedures, had pass-fail testing/deployment in my CICD tool right alongside native app code, and did some rollback support (although that was more trouble than I think it was worth), all using Liquibase change sets (+git+Jenkins). Flyway could also have worked but we were using Liquibase.

It did require some creative thinking to apply changesets, preconditions and postconditions, and to give stored procedures the ability to execute and read the results of other stored procedures.

Last I checked, the system had been used over many years to process/evaluate $50 billion in order transactions.

I used sqitch to have very explicit tests for procedures, written in PL/pgSQL. Worked great though writing that code was a little weird due to lack of IDE support I'm used to in any other language. Could take it even further with pgTAP.

SQL scripts with anonymous PL/pgSQL DO blocks to do unit tests, with a library of utility functions for things like assertions, just as you would in any imperative language. Granted, the PL/pgSQL debugger tooling is not ideal, but it works, and we wrote our own line-by-line profiler to identify hotspots.

You can test them very easily by using database containers.

The setup/teardown, and even working with schema migrations can get complex, and potentially excessively so for the benefit of doing everything in SPs. Also, the developer experience and discoverability are definitely less than ideal.

From the point of view of Oracle and SQL Server, it looks like a consequence of not being aware of the proper tooling.

That is my favourite approach, when I am allowed to have the last word on what the DB layer should look like.

I was sad to find that the author proclaimed "just write SQL" then fell into the trap of modeling his data as objects.

If you're going to model your data this way... you might as well use an ORM.

A better way is to just write SQL (or Datalog) and model our data, from the DB all the way up to the application, as relations rather than objects.

Rather than re-hash, this idea has previously been discussed on HN here: https://news.ycombinator.com/item?id=34948816

I'm happy to see someone mention peewee, having used it for numerous startup prototypes since its inception.

Coleifer does not get enough credit IMHO:


Peewee has been solid since I began using it a decade ago. Coleifer's stewardship is hard to see at once, but I've interacted with him numerous times back then and the software reflects the mindset of its creator.

Peewee is excellent! I've especially enjoyed using it with SQLite - there are a number of handy extensions and very good support for user defined functions.

I'd definitely like to second how great peewee is. It's been a core part of running our telescope for the last 8 years. It strikes a nice balance between power and simplicity, a significant set of useful extensions and great documentation.

Sometimes I find it impossible to believe that @coleifer is just one person. Peewee has 2300 issues and 500 PRs, none of which are open and outstanding, and almost all of which he has personally responded to in a genuinely helpful way. He pipes up on Stack Overflow for peewee questions too.

You don't need an ORM, but this isn't how you avoid one. If you're thinking of your DB as a mere object store / OOP connector like this article is, you're better off with an ORM or NoSQL than this basically equivalent DIY solution. It's best to instead learn how to use a relational DB like a relational DB, and the rest will follow.

Also, I'm not one of those people who dislike easy things (and will often whine about JS or Python existing). I'm all for ease and focusing on the business goals. It's just that ORMs and bad schema design will make things harder.

I've been both ways on this, and ultimately I come down heavily in the camp of using ORMs for as far as it makes sense. Why? Sure, the "just use SQL because it's so simple" crowd use seductively simple examples, and indeed for very static use-cases it can be quite neat and simple.

But projects (almost always) grow, and once you need to start conditionally adding filter clauses, conditionally adding joins, things start to get very weird very fast. And no, letting the database connector library do the quoting for you won't save you from SQL injection attacks unless you're just using it to substitute primitive values.

And once all your logic is having to spend more space dealing with conditional string formatting, the clarity of what the query is actually trying to do is long gone.

I'll refrain from digging up the piece of code where I was having to get the escaping correct for a field that was embedded SQL-in-SQL-in-SQL-in-Go. And I could hear the echoes of the original author's "YAGNI"s haunting me.
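
The "conditionally adding filter clauses" problem the parent describes can stay manageable if the dynamic part is restricted to a whitelist of column names, with all values carried by placeholders. A small stdlib-only sketch (table, columns and helper name are illustrative):

```python
import sqlite3

def build_query(filters: dict) -> tuple[str, list]:
    """Assemble a parameterized query from optional filters. Values travel
    as placeholders, never via string interpolation; only whitelisted
    column names are spliced into the SQL text."""
    allowed = {"brand_id", "style", "size"}  # whitelist guards the SQL text
    clauses, params = [], []
    for col, val in filters.items():
        if col not in allowed:
            raise ValueError(f"unknown filter: {col}")
        clauses.append(f"{col} = ?")
        params.append(val)
    sql = "SELECT id, brand_id, style, size FROM shoes"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, brand_id INT, style TEXT, size INT)")
conn.executemany("INSERT INTO shoes (brand_id, style, size) VALUES (?, ?, ?)",
                 [(1, "boot", 42), (1, "sneaker", 44), (2, "boot", 42)])

sql, params = build_query({"brand_id": 1, "style": "boot"})
rows = conn.execute(sql, params).fetchall()
```

Once you also need conditional joins, groupings and subqueries, this helper is exactly the seed of the query builder the parent says you end up growing.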

With SQLAlchemy, I come for the type checking. I stay for the Alembic migrations.

I feel this, sort of. I taught myself how to code by writing a Python bot (among other things), and eventually needed some sort of database handling to make things work. I decided on teaching myself basic SQLite, and just did it raw with `sqlite3`.

Currently 50% of my anxiety when it comes to my bot is related to database matters. There's no type checking so I gotta be careful when writing them, and stuff might blow up weirdly at runtime. Refactoring tables is also a major pain, or at least was until I (sort of) figured out a 'routine' of how to do it.

It's doable, and I assume that teaching myself some basic SQL and using it in production was a great learning experience, but once I'd reached the point where I was inventing database migration tooling from first principles and considering how to implement that, I realized that I probably just want to look at SQLAlchemy again.

I like writing things in Python purely because of SQLAlchemy. I think it's completely great.

Indeed. So much more than this simple example. It gets interesting for more advanced use cases. If I now rename a field on the model, it will not be renamed in the database. If I want that to match, I have to change the query. And make a migration. But that is probably another simple blog post.

Putting it all together is another blog post. And if you have colleagues: probably needs documentation. Which you also have to maintain yourself.

My god, object-relational impedance mismatch seems to be more polarizing than US politics. I've been observing this field for more than 15 years, and it's always "JUST WRITE SQL!!!!!" versus "DRY!!! DRY!!! USE ORMs SO YOU DON'T HAVE TO WRITE SQL!!!! BUSINESS LOGIC!!!" shouting matches. It's like this topic itself takes 30 IQ points away from each participant and the conversation then devolves into complete chaos.

Maybe there's some professional trauma at work, as many of us have been traumatized by shitty databases and shitty code working with them alike, ORM or not. But ORMs do come and go, promise bliss, deliver diddly, and I'm reading the same stuff I've been reading in 2008, as if nothing ever changed since.

I've used Rails AR and the Django/SQLAlchemy ORMs, and the more I use them, the more I wish for a fusion of both.

Django ORM is amazing for schema management and migrations, but I dislike its query interface (using "filter" instead of "where"). I really like Rails AR's way of lightly wrapping SQL, with an almost 1-to-1 mapping and similar names, but it does not have a migration manager - and there is always the chance that your schema and your code will diverge.

If I could get a schema/migration manager that allowed type checks and worked well with a language server for autocomplete, but used SQL or a very, very thin wrapper around SQL, that would be my Goldilocks solution.

This is also why I really like Peewee in Python. If I am not going to write SQL, at least give me an API that looks and feels like it. When I look at Peewee code, I can often see the end result query.

The answer to your prayers already exists: http://sequel.jeremyevans.net/.

By far the best database toolkit (ORM, query builder, migration engine) I have seen for any programming language.

Hmm, is AR's Migration framework not a migration manager?

I’ve been a data engineer for many years and have had lots of practice optimizing SQL, touching many parts of the language, and I still enjoy using SQLAlchemy for its tight, elegant integration with Flask/Django. Of course some queries make sense to optimize with raw SQL, but I think here, like with many other things, there’s no black/white conclusion to draw from these situations.

I've been using PugSQL to write SQL in Python [1].

With this package, you write the SQL inside SQL files, so you benefit from syntax highlighting, auto-formatting, static analysis, etc., as opposed to writing strings of SQL inside Python files. I'm surprised this is not more popular.

[1] https://pugsql.org

Great concept, based on the Clojure library HugSQL.

Unfortunately it depends on a specific version of SQLAlchemy and won't run with the latest version.

Question to the SQL-only people, how would you handle something dynamic? If I have a database of shoes and want people to be able to find them by brand, size, style, etc., what does that look like?

If you don't want to maintain several queries, you could write something like

    SELECT *
    FROM shoes
    WHERE (CASE WHEN :brand_id IS NOT NULL THEN brand_id = :brand_id ELSE TRUE END)
    AND (CASE WHEN :style IS NOT NULL THEN style = :style ELSE TRUE END)
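
That query runs as-is from Python with named parameters; passing NULL for a parameter simply disables that filter. A small stdlib `sqlite3` demonstration (table and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shoes (id INTEGER PRIMARY KEY, brand_id INT, style TEXT)")
conn.executemany("INSERT INTO shoes (brand_id, style) VALUES (?, ?)",
                 [(1, "boot"), (1, "sneaker"), (2, "boot")])

QUERY = """
    SELECT * FROM shoes
    WHERE (CASE WHEN :brand_id IS NOT NULL THEN brand_id = :brand_id ELSE TRUE END)
      AND (CASE WHEN :style IS NOT NULL THEN style = :style ELSE TRUE END)
"""

# Each None disables the corresponding filter.
boots = conn.execute(QUERY, {"brand_id": None, "style": "boot"}).fetchall()
brand1 = conn.execute(QUERY, {"brand_id": 1, "style": None}).fetchall()
```

One caveat worth knowing: because every filter lives in one static statement, the query planner has to produce a plan that works for all parameter combinations, which can hurt index usage on large tables.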

That is pretty good. Much more readable than gluing a bunch of strings together.

And if you want to make it even more readable, those inner expressions can be wrapped into functions.

If statements that either add conditional statements or blank lines to the SQL block. There are a lot of tradeoffs to the pure SQL method but I prefer being able to look up exact snippets in the codebase to find things.

Alright, let's use the custom approach. And then you need another field. and then you need some slight type checking or (de)serialization, which can change over time. You'll end up writing your own custom, kludgy ORM over time.

I have seen people write their own custom crazy version of GraphQL ("I've created a JSONified way of fetching only some fields from an API call) over ego or just ignorance. It's never a good path.

Why bother moving away from SQLAlchemy, which will do all of that for you in a simplified, type-checked and portable way? SQLAlchemy is literally one of the best ORMs out there; its ease of use and maintainability are insane.

People who complain about ORMs might have never really used SQLAlchemy. It is that good. I'm a fan, and zzzeek is a huge force behind why it is so good.

And as always, if you need an escape hatch, you can use raw sql for that ONE sql statement that the ORM is giving you grief for.

I come from the PHP world and have used a variety of ORMs/query builders in that ecosystem but the most common issue I encounter that they don't handle well is when I want to do a "left join foo where foo.id is null"
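
For reference, the pattern being described is an anti-join: find the rows on the left side with no match on the right. In raw SQL it is short and direct (stdlib `sqlite3`; table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1);
""")

# Anti-join: users with no orders. The LEFT JOIN emits NULLs for the
# unmatched right-hand side, and the WHERE keeps only those rows.
no_orders = conn.execute("""
    SELECT u.name
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE o.id IS NULL
""").fetchall()
```

The equivalent `NOT EXISTS` subquery form is often just as fast and sometimes easier to coax out of a query builder.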

I've seen many iterations of this type of debate by now, and learned to recognize the patterns. The people arguing for the ostensibly simpler solution are really asking you to trust their ability to architect apps out of simpler building blocks without using an abstraction that they don't like. This can work, but it often leads to situations like someone writing a bespoke system, then leaving the job or otherwise imposing extra complexity on the team. In the immediate term, what often gets overlooked is

- The ORM is an externally maintained open-source project with a plurality of contributors; "just write SQL" is not

- The ORM is designed to support the full lifecycle of the application including migrations; "just write SQL" is not

- The ORM is documented to be legible to newcomers; "just write SQL" is not (for all but the simplest of applications)

- The ORM is composable and extensible with opinionated and customizable interfaces for doing so (I've lost track of the number of times I've had my mind blown by how elegant and smart Django and SQLAlchemy's query management tooling is)

- The ORM has a security posture that allows you to both reason about your application's security and receive security updates when bugs are found

- The ORM is a platform for many other modules responsible for different layers of the application (DRF, OpenAPI, django-admin, testing utilities, etc. etc.) to plug into and allow the application to grow sustainably

I now try to guide people to a middle ground. Yes, both Django's and SQLAlchemy's ORMs can be annoying, have performance issues, etc. But for large applications maintained by multiple people over time, their benefits usually outweigh the drawbacks. Both have extensible architectures that allow customization and opinionated restriction of the interface that the ORM presents. If you're unhappy with your organization's ORM, I suggest you try that route first.

I always write SQL code directly, but that's mostly because I actually enjoy writing SQL queries :).

Same here. SQL is arguably the best language for writing queries. That's what it was designed for.

If you want to try out something cool, check out


It's written in Go, and it converts your SQL migrations and queries into type-safe code that you use to access your database.

It currently has a plugin for Python that's in beta, and it essentially does something similar to what this post is describing.


You write your migrations, and queries and a config file and it does the rest.

There's a whole family of libraries like that. Yesql is the first I became aware of. The repo has an (incomplete) list of ports to other languages: https://github.com/krisajenkins/yesql#other-languages

A good ORM knows when it needs to fuck off. I just want an easy, boilerplate-avoiding way to do CRUD operations on certain tables and map a custom type against a custom query. The lengths I have to go to just to have a custom query in some ORMs are mind-boggling. I remember fighting Microsoft's LINQ to SQL, or whatever the incarnation was called, so hard. I could do it in the end, but it fought me all the way.

It might be worth mentioning LiteralString from PEP 675 (https://peps.python.org/pep-0675/) and how you should use it to prevent SQL injections. I'm not sure this blog adds much to the discussion of when to write SQL and when not to. It does not cover the struggles, the benefits, and the downfalls.
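
A sketch of how PEP 675 can be applied (the function and its signature are illustrative, not from any particular library). The enforcement happens entirely in the type checker (mypy/pyright on Python 3.11+), not at runtime:

```python
from __future__ import annotations
import sqlite3
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # LiteralString is only needed by the type checker (Python 3.11+).
    from typing import LiteralString

def run_query(conn: sqlite3.Connection, sql: LiteralString, params=()) -> list:
    """Only SQL written as a literal in source code type-checks here.
    A string concatenated from user input fails the LiteralString check,
    so injection-prone call sites are flagged before the code ever runs."""
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INT)")
conn.execute("INSERT INTO t VALUES (1), (2)")

rows = run_query(conn, "SELECT x FROM t WHERE x > ?", (1,))  # OK: literal SQL
# run_query(conn, "SELECT x FROM t WHERE x > " + user_input)  # flagged by mypy/pyright
```

Dynamic values still go through the `params` tuple, exactly as the DB-API intends.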

As a data engineer, the pattern the OP shares is very familiar. I find it much preferable to the use of ORMs for a wide variety of reasons. However, I view implementing with SQL as an antiquated problem rather than a pragmatic feature. The evolution of this pattern would be to integrate database querying into languages more directly and eliminate SQL entirely. While this could be achieved in Python, I find that a language like Clojure, via functional programming (FP) primitives and transducers, is a natural candidate, particularly for JVM-implemented databases. Rather than encapsulating SQL via query-building or ORM-based APIs, an FP core could be integrated into database engines to allow, via transducers, complex native forms to be executed directly across database clusters. Apache Spark is an analog of this. In particular the Clojure project powderkeg (https://github.com/HCADatalab/powderkeg), as an Apache Spark interface, demonstrates the potential of utilizing transducers in a database cluster context.

Recently I was wondering myself whether I should just write SQL as I didn't particularly enjoy working with SQLAlchemy.

Then I discovered peewee. I am happy now.

A better title would be "just write your own ORM".

I have used several Python ORMs over the years, both for SQL and NoSQL. SQLAlchemy is the most powerful way of interacting with a relational database I have experienced.

I also write Go, and when I do, I do not use an ORM. But when it comes to Python I know my solution won't be better than SQLAlchemy, so why bother rolling out my own?

I always ctrl-f to search for the word "composability" when I come across arguments like this. I could take or leave ORMs, but relational query-building libraries are invaluable for composability, compared to proliferating mostly-duplicative raw SQL in format strings all over the place.

So far all my projects have targeted a specific database with no reason to change it.

So what I do is write SQL commands, but keep all inside a specific file or module of the project. So that I can decide later to refactor it into an ORM.

I think ORMs are great if you write libraries that target more than one database. Or situations, where you have more than one database and need a proper migration part.

If you don't need migration, but in the worst case can start with a fresh, empty database, then write SQL.

But for a production system, the no/manual migration approach might get old quickly. Writing migration code that just adds fields, indexes or tables is easy. But writing code that changes fields or table structures? Not so much.

Still, you don't need an ORM at the beginning of a project, just don't put SQL everywhere.

You'll eventually write your own dynamic query building logic if you take this development path

I can already see this doesn't have connection pooling, which all those ORMs he listed have. Without you even knowing what a connection pool is, it just works and scales; doing that on your own is not easy.
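
For a sense of what a pool has to do, here is a toy version using only the stdlib (class name and sizes are illustrative). Real pools, like SQLAlchemy's `QueuePool` or psycopg_pool, additionally handle liveness checks, overflow, recycling of stale connections, and thread/async edge cases:

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Toy pool: pre-open N connections, hand them out, take them back."""

    def __init__(self, factory, size: int = 5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._pool.get(timeout=timeout)  # blocks when pool is exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always return the connection

pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False),
                      size=2)
with pool.connection() as conn:
    result = conn.execute("SELECT 1").fetchone()
```

Even this toy glosses over what happens when a connection dies mid-checkout, which is exactly why it's nice to get pooling for free from a mature library.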
