Hacker News new | past | comments | ask | show | jobs | submit login

ORM's are good form, why?

- Avoids mistakes when dealing with writing raw SQL queries (SQL is quite repetitive in practice)

- The declarative nature of classes maps well to types and relationships

- The declarative nature of classes maps out well to tables, even with polymorphism [1]

- Keeping "Models" in an ORM often maps out well to migration utility (Alembic, Django Migrations)

- Object-chaining map very well to queries

- ORM objects can be reused and composed

- They can abstract out intricacies across SQL dialects

- They can potentially make it easier to migrate to different SQL servers if no specialized features were used

- Can help avoid common security vulnerabilities like SQL injections

- When something can't be expressed via ORM relationships, they tend to allow the dev to drop down to raw SQL. In the case of SQLAlchemy, there is a core query language [2], too.

- In the case of Django, QuerySet is used as a standard throughout extensions that power a whole community. Plugins that don't even know each other (e.g. django-filter and django-tables2) can operate on the same django queryset to filter/search and sort/display data.

I mention QuerySet/Django ORM quite a bit in a recent blog post at https://www.git-pull.com/code_explorer/django-vs-flask.html.

[1] http://docs.sqlalchemy.org/en/latest/orm/inheritance.html [2] http://docs.sqlalchemy.org/en/latest/core/




> Object-chaining map very well to queries

This isn't remotely true; it turns what looks like an in-memory access into a network round trip. Navigating your database using an idiom of lists and member accesses is a fast path to n+1 queries all over the place, or ORM tuned to eagerly fetch the whole world when you only touch a tiny piece of it.

The closer an ORM's API is to a monad, the happier you'll be. Fundamentally, accessing the database is executing remote code; the more you can package up into the query before it goes off and does anything, the better performance you'll see.

IMO trying to shoehorn objects (in the OO sense, with polymorphism, data hiding and behaviour) into a database is wrong-headed. Data hiding in particular is a wrong mental model for thinking about facts in the database, and the more is hidden, the harder it will be to reason about performance and bulk remote operations generally.


> This isn't remotely true; it turns what looks like an in-memory access into a network round trip. Navigating your database using an idiom of lists and member accesses is a fast path to n+1 queries all over the place, or ORM tuned to eagerly fetch the whole world when you only touch a tiny piece of it.

That isn't true for e.g. Django's ORM, which lazily evaluates the query and only actually accesses the db after that, with a single query, and filtering done in SQL.


This is not the case - with Django's ORM you are still susceptible to the N+1 problem. It can be largely mitigated through `select_related` and `prefetch_related`, but the fundamental issue is there - that simply by accessing an attribute for an object that wasn't already fetched, you can do another database query without it being at all clear in the code that this will happen.


That's true, though select_related is mostly the default, and that behaviour is documented well. In practice, I very seldomly find N+1 cases with Djangos ORM, even when doing relatively complex data analysis.


This didn't use to be the case, I remember a single line of Django orm causing several hundred db requests, but maybe they improved it.


When was that and what was the form of the query?


> Fundamentally, accessing the database is executing remote code; the more you can package up into the query before it goes off and does anything, the better performance you'll see

The queries that most tend to be making just aren't that sophisticated. Relationships tend to be basic.

And I haven't even mentioned stuff that'd really, really hard to express/manage in pure SQL like tree/nested information that has to stay balanced [1]. Thanks django-treebeard/mptt and sqlalchemy-orm-tree.

> The closer an ORM's API is to a monad, the happier you'll be.

Developers using ORM's simply aren't caring about ORM's matching a certain construct. They care that models emit correct representations of their schemas and that the data is retrieved "fast enough".

Take it a different way: the best part about ORM's? They're effective 95% of the time, right out the box, so you end up avoiding time that'd be spent over and prematurely optimizing.

> IMO trying to shoehorn objects (in the OO sense, with polymorphism, data hiding and behaviour) into a database is wrong-headed.

Objects in things like SQLAlchemy declarative and Django Models map perfectly to generated SQL, so they also act as a way to generate migrations. It's that accurate. A lot of the relationships project's need expressed tend to be vanilla joins.

> Data hiding in particular is a wrong mental model for thinking about facts in the database, and the more is hidden, the harder it will be to reason about performance and bulk remote operations generally.

ORM's strive to hit a value sweet-spot in terms of code expressiveness, reducing duplication, handling the bread and butter relationships and types. That covers what most developers really need.

Perhaps there are projects out there not fitting to ORM's. Not all projects are sophisticated data mart projects, but even then, a good share of those still go back to simple joins at the end of the day.

And I've even gone as far as trusting heavy-duty stuff like django-mptt, along with plugins that filter and sort. I don't even look at the queries, all I see is they're running performantly. In all these years, SQL queries have never been a bottleneck. Maybe it's because I'm only storing simple stuff.

[1] https://en.wikipedia.org/wiki/Nested_set_model


An ORM provides easy DB access and various implicit operations. The trade off is that if you don't know what you're doing, your app gonna have bad performance. Seems fair to me.

Also, Django ORM documentation is very clear and you can use Django Debug Toolbar to analyse the raw sql generated.


>This isn't remotely true; it turns what looks like an in-memory access into a network round trip. Navigating your database using an idiom of lists and member accesses is a fast path to n+1 queries all over the place, or ORM tuned to eagerly fetch the whole world when you only touch a tiny piece of it.

Only if you use naive ways of doing this. In .NET at least your overcomplicated expression can be compiled down to the minimum query needed to pull the bits of data you use.


It's not magic. I've personally removed dozens of N+1 queries from code written by one of my fellow consultants caused by misuse of Entity Framework. Here's a reference: https://msdn.microsoft.com/en-us/library/jj574232(v=vs.113)....

At least with EF it's possible to completely disable lazy loading. I always recommend doing so -- when you're passing entities between methods it's pretty easy to lose track of what's implicitly 'loaded' on the object. Innocent changes to a method that operates on an entity can cause a database round trip...horrible idea.


But they're unnecessary baggage, because your top six reasons all work in reverse.

If you define your schema in your database and derive your data layer from that, you get everything in your list (apart from that thing that totally always happens where you switch your underlying database technology every few months).

But then you don't have your database defined in two places. And if anybody ever does modify the db by hand, your build will break and it will quickly surface itself as an issue at compile time instead of via an obscure error message somewhere 40 levels deep in the call stack.


How do you in practice derive the data layer from the schema? Somehow parsing the schema in the application code?


Personally, I generate base classes for table objects at compile time.

You can look at a table and its keys to determine what sort of thing it represents (entity, lookup, association, etc.) and build out helper stuff as needed. So in addition to basic CRUD anything that looks like an entity gets .Load(), .Save() and accessors for fields as well as .GetByWhateverID() methods for any foreign keys. I base my actual Entity classes off of those auto-generated base classes, so they can get blown away and recreated as often as necessary.

I also wrap any one-off stored procs that are lying around in calling code, so that they can be used in place of the standard-issue Frankenstein SQLBuilder thing that an ORM would have.

It's kinda all the upsides of an ORM, but without any dynamic garbage, schema-as-config-file, mystery auto-SQL, migrations, or (again) Almost-SQL-In-The-Magic-Query-Languague to Not-The-SQL-I-Meant conversion.


Thank you.


Yes, with Django I would definitely stick to ORM. But when working in Golang or some other new language, unfortunately there isn't any ORM library that good. So people usually stick to SQL.


I always liked Django's ORM because it didn't even try to do any complicated stuff, it forces you to drop down to raw SQL instead (I haven't used Django in years, hopefully it's still like that).

Compared to something like Hibernate or SQLAlchemy that tries to support everything under the sun and can result in a lot confusion when trying to understand what exactly it's doing.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: