
... Or by people that actually understand the impedance mismatch between objects and data (quick Django example - request data and models are different and not easily interchangeable). Or people that require good caching implementations. Or people that actually design database systems schema-first. Or people that rely on advanced usage that isn't always easy to perform in ORMs. The list goes on.



Django's ORM is much better than most, to the point where Django considers "I can't model this query with the ORM" to be a bug.

There are of course some mismatches, but it's pretty hard to have a query that is not at all modelable, and Django's ORM is ... fairly predictable (I have some gripes about how obvious or not joins are but it's subjective).


There is a subtle difference between "can't be done" and "I'll spend an afternoon digging through documentation, code & examples to implement this".

Quick obvious example - use views for data retrieval (which may contain more fields than the actual model) and tables for data insertion.
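For anyone unfamiliar with the pattern, here's a minimal sketch with stdlib sqlite3 (schema entirely made up): reads go through a view that exposes a derived field the base table doesn't store, writes go straight to the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, minutes INTEGER);
    -- Read-side view: exposes a computed field the table doesn't store.
    CREATE VIEW task_report AS
        SELECT id, minutes, minutes / 60.0 AS hours FROM task;
""")

# Inserts target the table...
conn.execute("INSERT INTO task (minutes) VALUES (?)", (90,))

# ...retrieval goes through the view, extra field included.
row = conn.execute("SELECT minutes, hours FROM task_report").fetchone()
print(row)  # (90, 1.5)
```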


In that case, for the SQL, you would be pulling data from one table and putting it into another. In the ORM I would model it similarly: an abstract model to hold the general shape of the table, then one read-only model and another for writing, with some helper functions to pull the data from one or the other.

Though if you are working with views and not materialized views, performance wise "use .annotate when fetching data" will be roughly as performant as using a view. Views just end up getting swapped out when the planner deals with the query (in Postgres at least).

Of course I get this cuz I spent a lot of time with the ORM. I think it's easy for people to think "there must be a magic option somewhere in the ORM to do this", but sometimes there's no magic. The counterpoint, though, is you have Python behind all of this so stuff like "share schema definitions between classes" is easy and straightforward in my experience (including on projects with like...80+ models, some with way too many fields, so I get your pain). It might not be the exact API you want though


> In that case for the SQL, you would be pulling in data from one table, putting it into another

In Django ORM lingo, that's declaring two models (assuming they're even in the same namespace or app), plus the request you use to fill them. If you don't find this awkward, that's on you, not me :)

> Views just end up getting swapped out when the planner deals with the query (in Postgres at least)

Huh? Do you actually know how views work? You don't even have an easy (non-SQL) way of declaring views in Django. And you don't seem to grasp the implementation details of views versus materialized views - views are essentially server-side stored queries, while materialized views are physical pre-computed tables. Also, in PostgreSQL (as well as many other databases) a view can actually hold data from several tables, including remote tables. The whole "this is an RDBMS and we shall abide by it" went out the window the moment I could use a SQL-ish database to query e.g. CSV files and/or import them as local "tables".

> Of course I get this cuz I spent a lot of time with the ORM

Don't get me wrong, but it seems you're not spending enough time. Let me enlighten you with a personal anecdote: a project management solution where you log your tasks during the day, along with the hours, and it keeps track of each collaborator's project allocation over the project's execution time. Each collaborator (out of fewer than 100) would enter from 1 to maybe 20 tasks per day, on the projects they were allocated to. Reports were generated - per project, daily, monthly, etc. - you get the point.

Obviously, those reports were computed using Python and the ORM - so at some point, generating e.g. a company-wide report covering a couple of projects over a year would trigger specific conditions that made the report take more than 10 minutes. A dataset I could tabulate in Excel (a couple of hundred thousand lines). Half of the time was actually spent allocating objects in memory for rows where a single field was read. Of course, the report routine reused the existing code functions for specific computations, which increased execution time and memory allocation 10-fold.

The 20-line SQL query that replaced hundreds of Python lines executed in less than a hundred milliseconds. I could blame the ORM, but instead I blame the application design that follows ORM constraints. Someone detached from a data source would tell you "the parent API needs to provide that" - instead, you use what you have at hand. Just because it works, it doesn't mean it's good.

> stuff like "share schema definitions between classes" is easy and straightforward

That is actually the shit-show design I always want to prevent. Classes should not share schema definitions, but data formats (data objects). In the specific context of Django, Models are a piss-poor data-object representation (because they are a quite decent Model implementation, mind you), and the whole apps scaffolding is just... nuts. The notion is that apps are self-contained units of functionality, but they quickly become a cesspit of cross-referencing imports. Rule of thumb in most languages: if different "modules" of the application need to share a schema definition, you're doing it wrong. But heyyy, Python.


Alright, I'm not going to engage with you on this, except to say that I've also written a report generator that does the whole "aggregate by time bucket and certain conditions across multiple tables" thing, and it was in the Django ORM, and it ended up as a single query that looked like the SQL I would have written by hand. Lots of `query.explain()` calls locally to make sure I was getting what I wanted, but hey, I'd be doing that for hand-written SQL in perf-sensitive queries as well.

I'll grant that the ORM makes it easy to lull yourself into writing quadratic-behavior stuff. You still gotta know how your DB works! But "mostly use the ORM, drop to SQL for some tricky things" works really well, especially given that the Django ORM lets you inject SQL fragments for a lot of behavior and have that "just work".
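The classic quadratic trap is the N+1 query pattern. A stdlib sqlite3 sketch (made-up authors/books schema) of what lazy ORM attribute access in a loop tends to generate under the hood, versus the single JOIN that something like select_related() would emit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

# N+1 pattern: one query for the list, then one query per row -- what
# naive `book.author.name` access in a loop does in most ORMs.
pairs_n_plus_1 = []
for book_id, author_id, title in conn.execute(
    "SELECT id, author_id, title FROM book ORDER BY id"
):
    (name,) = conn.execute(
        "SELECT name FROM author WHERE id = ?", (author_id,)
    ).fetchone()
    pairs_n_plus_1.append((title, name))

# Single JOIN: one round trip, same result.
pairs_join = conn.execute(
    "SELECT b.title, a.name FROM book b"
    " JOIN author a ON a.id = b.author_id ORDER BY b.id"
).fetchall()

assert pairs_n_plus_1 == pairs_join
```

Same output either way; the difference is 1 query versus N+1, which is exactly the kind of thing you only notice if you look at the generated SQL.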


Extending on the impedance mismatch between objects and data, and on validated request data being different from models, imagine this pseudo-code:

    function action_endpoint():
        if request.is_valid():
            data = request.to_data_object(data_object_class)
            self.service.update(data)
            return success()
        return request.errors()


In this simple example, the internal data representation isn't a full-blown object but a "data object" (e.g. a dataclass). There are no transitive database dependencies; it behaves just like a fancy dict. When importing data_object_class, I'm not importing the whole database driver. When passing this to other system components, it can be serialized and de-serialized because it has no intrinsic behavior implemented.

As such, when using architectural patterns like three-tier or hexagonal design, you can pass data between layers without any "external" (from a data perspective) dependency; this lets the frontend stay completely agnostic about where and how data is stored. In fact, in this example, self.service could be an RPC proxy object to another subsystem on a different server. The advantage of this design becomes quite apparent when you need to decouple how data is stored from how it is processed - you start designing your application in a service-oriented approach, instead of a model-oriented approach.
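The pseudo-code above maps pretty directly onto Python dataclasses. A minimal sketch (all names made up) of a data object crossing a layer boundary with no ORM base class and no database driver in sight:

```python
from dataclasses import asdict, dataclass
import json


@dataclass
class TaskUpdate:
    # Plain data object: no ORM base class, no transitive DB dependency.
    task_id: int
    minutes: int


def service_update(data: TaskUpdate) -> str:
    # Could be a local repository or an RPC proxy to another server --
    # the caller neither knows nor cares where the data ends up.
    # Serializes trivially because the object is just data.
    return json.dumps(asdict(data))


payload = service_update(TaskUpdate(task_id=7, minutes=45))
print(payload)  # {"task_id": 7, "minutes": 45}
```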

In fact, one could just create an endpoint processor that receives a table name and infers the rest of the logic in the middle (the request validation, the data object, the glue service for the database) - something that writes to a database today and tomorrow just calls an API, without rebuilding your application.


The list goes on and on, and yet in practice these problems are solvable and the impedance mismatch is just really not a big deal.

There is only one feature missing in the ORM, which is composite primary keys. Everything else on that list has clear and simple solutions.

"Impedance mismatch" is just a thought-terminating cliché. High-level languages have impedance mismatch with binary code; reactive components have impedance mismatch with state, relational tables have impedance mismatch with hierarchical data. Yet we find solutions and workarounds and the severity of these problems is generally overrated outside of purely theoretical contexts.


> and the impedance is just really not a big deal

I'm quite happy you haven't come across major issues with this. If you develop clean-architecture solutions that stay live products across years, this is a major problem (e.g. table X is now a separate full-blown service; table Y is an external materialized table with no inserts, as inserts now go into an external messaging system such as Kafka), etc.

> "Impedance mismatch" is just a thought-terminating cliché.

So are all the claimed ORM advantages. My personal distaste for ORMs doesn't even start with the obvious technical drawbacks; it starts with the fact that a developer should have a solid grasp of the domain they are working in, which, more often than not, ORM advocates lack. If you can't model data from a storage perspective (which, btw, is often the bottleneck of your application), you sure as hell won't do a good job modelling it in a business domain.

> Yet we find solutions and workarounds and the severity of these problems is generally overrated outside of purely theoretical contexts.

Ahh yes, the typical "let's not get theoretical" argument. ORMs are usually crap, and in Python they are actually crap. If Django is a good example for you, good for you. If you ever have a look at Entity Framework you'll be amazed. Try a schema-first approach with any mainstream ORM and you'll quickly realize all you do is work around assumptions and limitations. Thing is, for my daily work, these problems are actual problems. So much so that we don't use Django or ORMs.


ORMs are just not performant unless you reason about all the code at the level of "what queries are going to be generated and when", which makes the ORM an unhelpful layer of obfuscation over the layer of abstraction that you're actually reasoning at.

This is quite different from high level vs assembly where you can easily go your whole life without ever learning assembly language or how a compiler works.

Or to put it another way, the difference between the two situations is that an ORM API is not a higher level language than SQL. Transpiling between two languages of comparable expressiveness (SQL is actually more expressive but no need to go there) adds an extra source of problems without gaining you much.


> ORMs are just not performant unless you reason about all the code at the level of "what queries are going to be generated and when", which makes the ORM an unhelpful layer of obfuscation over the layer of abstraction that you're actually reasoning at.

You're assuming I use the ORM to not reason about SQL or not think about performance. This isn't true; first of all because even if you write SQL, SQL performance is not immediately obvious for any but the simplest of indexed queries. In no storage system do you ever get away from reasoning about this.

Second because SQL is actually a mediocre abstraction layer over your data storage. You can't really compose SQL queries; in an ORM, taking a base Query object and adding a bunch of various `filter()` statements automatically does the right thing. Basic queries are much shorter visually; ORMs abstract away the table and column renames that mean rewriting all your SQL in other systems. I feel like you're just trotting out "reasons" from a blog post by people whose priorities aren't those of people like us who write CRUD systems day in and day out.

Again, you're talking about theoretical disadvantages which I have only really encountered about a half dozen times in over a decade of using Django even in performance-sensitive areas. Rewriting one ORM query out of a hundred is not a problem, especially if I had to rewrite the SQL in the first place.


> Second because SQL is actually a mediocre abstraction layer over your data storage

Actually, SQL is not an abstraction layer for data storage. It is a query language based on set theory. Even DDL doesn't impose any limitations on you or make any assumptions regarding storage - just data representation. You're conflating SQL RDBMSes with the SQL language, just like many ORMs do. Quick example: you can use SQL with Apache Drill to query log files in CSV - the SQL language, none of the RDBMS stuff.
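Apache Drill does this natively. As a rough stdlib approximation of "SQL the language, minus the RDBMS", you can load a CSV into an in-memory SQLite and query it (file contents and columns made up):

```python
import csv
import io
import sqlite3

# Stand-in for a log file on disk.
log_csv = io.StringIO("level,msg\nERROR,disk full\nINFO,started\nERROR,timeout\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (level TEXT, msg TEXT)")
reader = csv.reader(log_csv)
next(reader)  # skip the header row
conn.executemany("INSERT INTO log VALUES (?, ?)", reader)

# SQL over data that never lived in an RDBMS table on disk.
(errors,) = conn.execute(
    "SELECT COUNT(*) FROM log WHERE level = 'ERROR'"
).fetchone()
print(errors)  # 2
```

Drill skips even the loading step and queries the files in place, but the point stands: the query language doesn't care where the bytes live.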

> You can't really compose SQL queries; in an ORM taking a base Query object and adding a bunch of various `filter()` statements automatically does the right thing. Basic queries are much shorter visually

Of course you can. But even setting aside one of the most famous anti-patterns (EAV - entity-attribute-value), the easiest way is to use a query builder in your own development language - you just add filters based on the specific code conditions.

Oh, and "compose" is a terrible term, especially when SQL actually gives you SELECT * FROM (SELECT ...), LATERAL and UNION - all of which allow proper composition, years ahead of what most ORMs give you.
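A sketch of both points (schema made up): a trivial query builder composes filters in application code, and the built query then composes at the SQL level as a subquery, SELECT ... FROM (...):

```python
import sqlite3


def build_query(base: str, conditions: list[str]) -> str:
    # Minimal query-builder composition: AND together the filters
    # that apply for this particular code path.
    if conditions:
        return f"{base} WHERE " + " AND ".join(conditions)
    return base


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, minutes INTEGER, done INTEGER);
    INSERT INTO task VALUES (1, 30, 1), (2, 90, 0), (3, 120, 1);
""")

inner = build_query("SELECT minutes FROM task", ["done = 1", "minutes > 20"])
# SQL-level composition: the built query becomes a subquery.
outer = f"SELECT SUM(minutes) FROM ({inner})"
(total,) = conn.execute(outer).fetchone()
print(total)  # 150
```

(In real code the filter values would go through bound parameters rather than string interpolation; the strings here only carry column names and literals for brevity.)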


> In no storage system do you ever get away from reasoning about this.

Agreed

> Second because SQL is actually a mediocre abstraction layer over your data storage.

This is fair but adding yet another layer only makes things worse

> I feel like you're just trotting out "reasons" out of a blog post

Rest assured that I haven't read a blog post or indeed anything on the topic; in fact I'm shockingly ignorant in general.


> This isn't true; first of all because even if you write SQL, SQL performance is not immediately obvious for any but the simplest of indexed queries. In no storage system do you ever get away from reasoning about this.

True, but an ORM adds yet another layer of cruft to debug and monitor, with arguably few benefits for such an advanced user. Also, in some databases EXPLAIN is your friend; it may not give you execution time (it does, but let's assume you're right for the sake of argument), but it does give you planned execution cost - 0.1 is better than 0.5, and so on. It will also tell you whether your indexes are actually being used (are you a MySQL user? that would explain a lot).
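SQLite's equivalent is EXPLAIN QUERY PLAN, and it's enough to show the "is my index actually used" check in a few lines (table and index names made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE INDEX idx_task_user ON task (user_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM task WHERE user_id = ?", (1,)
).fetchall()
# The detail column names the index when it is used; a full table
# scan would read "SCAN task" instead.
print(plan)
assert any("idx_task_user" in row[-1] for row in plan)
```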

Regarding performance, there are three main cost areas: query execution, result retrieval time, and result serialization & transformation. The first two are characteristics of the database design and the specific system; the last one is purely framework-dependent. If you're a Python buff, you'll quickly realize that your application spends a non-trivial amount of time getting around proxy-class implementations for models in the deserialization phase - in some cases, more than the query and transport time itself. All because you e.g. wanted to compute minutes spent on a given task by a given user for a year, counted in hours or melons or whatever.
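The serialization cost is visible even without an ORM in the picture. A sketch (schema made up) contrasting "materialize an object per row, read one field, sum in Python" with pushing the SUM to the database; the first variant is the allocation pattern described above, multiplied by proxy-class overhead in a real ORM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, minutes INTEGER)")
conn.executemany(
    "INSERT INTO task (minutes) VALUES (?)", [(m,) for m in range(1000)]
)

# Variant 1: pull every row, build a Python object per row,
# read a single field from each -- lots of allocation for one number.
rows = [
    {"id": i, "minutes": m}
    for i, m in conn.execute("SELECT id, minutes FROM task")
]
total_py = sum(r["minutes"] for r in rows)

# Variant 2: one aggregate query; a single scalar comes back.
(total_sql,) = conn.execute("SELECT SUM(minutes) FROM task").fetchone()

assert total_py == total_sql
print(total_sql)  # 499500
```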

> that people like us who write CRUD systems day in and day out do

I write a shit-ton of CRUD systems (in Python), and none of them use Django, because interoperability is often desired and in some cases a requirement. Just because you use Django and an ORM to design some stuff doesn't make it wrong, but it doesn't make it right either. Want to design database CRUDs without code? There are plenty of tools for that, ranging from BPM tools like Bonita BPM (OSS) to OutSystems.

> Again, you're talking about theoretical disadvantages which I have only really encountered about a half dozen times in over a decade of using Django even in performance-sensitive areas.

Or - or - we're discussing problems you don't have because you work in a very narrow field where Django is actually a good fit. Such fields exist, and I have nothing against them. But it's telling if you say you've spent "over a decade" with Django. In "over a decade", I've written a shit-ton of libraries in at least 4 languages on this specific topic (database middlewares, query builders, object mappers, etc.), and the driver for all of them was to solve the problems you describe as "one in a hundred".

> Rewriting one ORM query out of a hundred is not a problem

Actually, it is. Because it's not *one* query, it's one endpoint - it may be using a dozen models across a dozen child subroutines to compute a specific value; it may be using a function from a different app for a specific computation; it may turn out that the local computation has characteristics the replacement query doesn't, because the latter runs server-side - sums of times, sums of decimals, sums of floats. And now, every time someone edits that model assuming "that's it", they either have a crappy abstraction model and realize they also need to hand-edit some method's SQL queries (oh, and patch the migrations if necessary), or they're just playing whack-a-mole with the assorted array of custom functions breaking in unit tests. In the end - all things being equal - I would very much prefer if architecture and concern delegation weren't affected by performance-related refactorings.



