> If you're using an RDBMS, bite the bullet and learn SQL.
If this person spent all that time using Hibernate and then SQLAlchemy, and all that time did not know SQL, then their suffering and bad experiences make complete sense. You absolutely need to know SQL if you're going to use an ORM effectively. Good ORMs are there to automate the repetitive tasks of composing largely boilerplate DML statements, facilitating query composition, providing abstraction for database-specific and driver-specific quirks, providing patterns to map object graphs to relational graphs, and marshaling rows between your object model and database rows - that last one is something your application needs to do whether or not you write raw SQL, so you'll end up inventing that part yourself without an ORM (I recommend doing so, on a less critical project, to learn the kinds of issues that present themselves). None of those things should be about "hiding SQL", and you need to learn SQL first before you work with an ORM.
> Good ORMs are there to automate the repetitive tasks of composing largely boilerplate DML statements, facilitating query composition, providing abstraction for database-specific and driver-specific quirks
None of that requires an ORM. A simple query builder will suffice and it will be much easier to debug and much less error prone than an ORM.
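To sketch the distinction: a "query builder" can be as small as a function that composes a parameterized statement and hands it back to you. A toy Python example (the `build_select` helper is invented for illustration, not from any real library):

```python
import sqlite3

def build_select(table, columns, where=None):
    """Compose a parameterized SELECT. `where` is a dict of
    column -> value, combined with AND. Returns (sql, params)."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in where)
        params = list(where.values())
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'bob')")

sql, params = build_select("users", ["id", "name"], {"name": "ada"})
rows = conn.execute(sql, params).fetchall()
# rows == [(1, 'ada')]
```

The generated SQL stays one obvious string away from what runs on the server, which is the debuggability argument in a nutshell.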
> providing patterns to map object graphs to relational graphs, and marshaling rows between your object model and database rows - that last one is something your application needs to do whether or not you write raw SQL
So this is the real ORM juice. And ya, without an ORM you have to do this by hand.
So here’s the question: How well can an ORM map to your data model out of the box? In a lot of cases this is where the mess comes from. You need to set up basically a low level AI that can figure out the “right” thing to do as call sites all over your codebase are making arbitrary queries on a huge API surface.
And so the argument against ORMs is that it will take less time and be less error-prone to write bespoke loaders that take rows and build your data model than it will be to configure that AI such that it can do the “right” thing with arbitrary queries.
(If you don’t like the word “AI” here, use “expert system” instead.)
> ORM: You ask for something and because you've researched and found a well-written, quality ORM you trust that it will create a sane query.
Ha. More like:
ORM: You've just joined a project already using an ORM selected by an 'architect' that no longer works here. Everything is fine until you start testing your system with a database sufficiently populated with real-world data. You and the DBA spend the next 6 months trying to get the damn ORM to generate performant SQL (you even call Oracle support, which promptly dispatches an "engineer" who tries to sell you on another $500k of crap you don't need that won't really address the problem). You eventually just start writing queries by hand wherever you find a bottleneck caused by the naive ORM, which you could have done at the outset, but no one who actually knew SQL well enough was on the team back then.
IME, it should never take more than a few hours to run down why the ORM made such a query. At a minimum, most RDBMSs have query logging and can explain queries.
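For instance, with SQLite you can ask the planner directly what it will do with whatever SQL the ORM emitted; a minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Ask the planner how it would run the query; the last column of each
# row is a human-readable description of the plan step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
for row in plan:
    print(row[-1])  # mentions idx_orders_customer when the index is used
```

Postgres (`EXPLAIN ANALYZE`) and MySQL (`EXPLAIN`) offer the same facility, so "why did the ORM's query do a table scan" is usually answerable in minutes.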
Ironically the last time I had a big ORM performance problem was Hibernate eager loading all joins. Diagnosing and fixing it didn't take more than an hour. (Though we did have a very experienced DBA at the time.) YMMV
You have to know when to break free of the ORM. They are great for graphs of CRUD operations and pretty nasty for much of anything else. You may come to a different conclusion depending on the project and dataset, of course.
And when you break free of the ORM, pray to God that your ORM doesn't have a hidden cache somewhere. If it does, you'll spend a week tearing out your hair until you figure out that the ORM's cache caused all the trouble.
A query builder aids you at constructing queries, while an ORM builds queries for you, runs them and maps the output to objects. It's more sophisticated than a query builder.
Isn't that the same reasoning as "I don't like using high-level languages because they limit my visibility of what's running on my processor"? Do you write all your code in assembly?
I think the issue is that if you don't understand what a high-level language is doing under the hood, then you will constantly be surprised by side-effects. You shouldn't be using an ORM to substitute for your lack of understanding SQL; you should be using it to automate tasks.
If you have to log the generated SQL to understand what's happening, you're already behind the curve.
And then what do you do if the ORM is generating junk? If the answer is "use a querybuilder/handcrafted SQL for that one", what's the point of the ORM in the first place?
> And then what do you do if the ORM is generating junk? If the answer is "use a querybuilder/handcrafted SQL for that one", what's the point of the ORM in the first place?
The point may be that 98% of the queries are just fine, and you've saved time vs writing by hand, and it may be easier to read/understand for the next people to have to touch the code.
"saved time" at the least optional time to save time. ORM's are a maintenance burden. They obfuscate DB performance behind usually an enormous API surface.
Most software work is maintenance work. Optimizing for initial deployment is shortsighted.
What "enormous" API surface? It's not about "optimizing initial deployment". It's about "optimizing continuous deployment" and having "always releasable Software". I've worked for departments with 15 devs all working on the same codebase and we could release and rollback releases (A/B deployments) every week because we treated the database as a dumb data store. We had multiple branches at the same time, etc.
I've also worked at companies where all of the logic was in ungodly stored procedures and getting anything released took months, and we had a whole two-week "hardening sprint" because we had to coordinate with the "database developers" and of course the entirety of the business logic was in stored procedures.
Most software being maintenance work is even more of a reason to optimize deployment. What good is unreleased software? The most important part of the business is releasing software. Most of my emphasis as an Architect is making sure we can release fast.
I like the idea of stored procedures. I get some of the value they bring. However, the few times I've been on projects where sprocs where the primary focus of logic/truth/app... it was always a pain.
* The 'developers' weren't allowed to write the sprocs. We were at the biz meetings, but the DB guys were hardly ever there - their meetings were separate for some reason, but because devs were at the meetings, the devs were the ones who also were the face of the project. When the project was behind, we caught it in the neck, even if we were bottlenecked waiting for the DB team to write their logic and expose it to us.
* The DB team was always fewer people, juggling more projects, and other things like system uptime, maintenance, backups, etc.
* The sprocs were never part of version control or part of any source code that we could ever see as part of normal development. They typically weren't subject to any unit testing process.
The answer to all of this is mostly human management, structuring resources differently, coming up with different processes, etc. But any of those things would have changed the power dynamic, which seemed to be the purpose in those environments. Hey, I can write a stored procedure too - let me write them, and if a DBA wants to 'review' them - or really, anyone on the team - please review and let's hammer it out and make it better. But roadblocking projects until the DB guys can 'get around' to writing our mission critical procs is just silly.
One other 'weird' division I saw a few places was this "developers can never have access to production systems - that violates XYZ" (a regulation, or some 'law' that was never produced, etc). I asked what the core issue was, and it was "you can't just have developers going on to production and just making changes on live systems - that's ... (against our policy, etc)". This was particularly challenging in a situation where a critical bug only happened on one production system, and we weren't allowed to replicate the database to another system, nor was anyone with any knowledge of the deployed code allowed to get on to the production system to even see if what was deployed was what we'd developed. But... this was still "our problem". Yet... the DBA in this case was "allowed" to get on the system and hand-write new triggers and sprocs to 'fix' our problem, all without documenting/testing his code, nor committing the code to any repo for us to even have visibility into the data manipulation he was doing to 'fix' the problem we supposedly caused but couldn't investigate.
Again, I know this isn't a problem specifically with stored procedures. When sprocs have been promoted as the primary interface, however, it's usually been a political/power grab more than a technical benefit. And yes, again, I know there are technical advantages in some cases, but usually not enough to outweigh the drawbacks I've encountered.
I believe I pointed out that I realize it's a human issue more than a technology one. It was easier for some people to get suckered into the power dynamics being played because "DBA" was already seen as more of a black-magic art sort of thing, and those guys were the "real wizards" and so forth, so whatever they say goes. It's not been everywhere I've ever worked where sprocs were used, but it seems to have been at the places where "stored procedures are law".
We're conjecturing about hypothetical bugs. No one in this thread knows what bugs were encountered nor how easy they were to fix; they aren't real bugs. They're symbolic of the bugs that we encounter every day, which give us the experience on which we base our conjecture.
How often is the ORM generating junk? To say that the ORM is useless because occasionally it doesn't generate performant code (with EF and Linq, more often than not it does generate performant code) could be applied to any high level construct. But I don't see people giving up modern languages to go back to writing everything in Assembly or even C.
Yes, I optimize when my automated performance/stress testing tells me I need to, and may write handcrafted SQL, but I've also handcrafted some classes in C back in the day when my old Windows Mobile app using the C# Compact Framework wasn't performing.
I meant a good ORM. But then again, my definition of a good ORM is an ORM with a language that treats queries as a first class citizen. EF with Linq doesn't really feel like a separate framework since Linq and Expressions are built into the language.
How is it a "query framework"? Linq works with objects and the Linq expression provider treats the Linq as data that translates the objects to Sql at runtime. That is by definition an Object Relational Mapping.
Of course most ORMs do fall short because most languages don't have the powerful concept of "code as data".
SQL syntax is extremely verbose, and compounds the more tables are involved in a query. You're not counting the time the ORM has saved you before having to resort to logging SQL.
ORMs are more useful as insert builders. Putting data from an object into the database is something of a boilerplate process. Queries vary with what you want to ask. Most of the time, you don't need all the fields, so filling up some object just because it has slots for everything is a waste of effort. Especially if it means references to multiple tables.
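That boilerplate is small enough to sketch: generating a parameterized INSERT from a dict of column values is a few lines (the `build_insert` helper here is invented for illustration):

```python
import sqlite3

def build_insert(table, record):
    """Generate a parameterized INSERT from a dict of column -> value."""
    cols = list(record)
    sql = (f"INSERT INTO {table} ({', '.join(cols)}) "
           f"VALUES ({', '.join('?' for _ in cols)})")
    return sql, list(record.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

sql, params = build_insert("users", {"id": 1, "name": "ada"})
conn.execute(sql, params)
# sql == "INSERT INTO users (id, name) VALUES (?, ?)"
```

Unlike SELECTs, the shape of an INSERT is fully determined by the object being saved, which is why this is the part of an ORM that rarely surprises anyone.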
Good ORMs have Partial<T> and lazy loading of complex properties through proxies, which can be overridden with something like .With(x => x.ComplexProperty).
But of course, as the queries get more and more complex, the flexibility of the ORM syntax approaches the flexibility of SQL. In the end, there are many situations where one would rather just use SQL.
I think the best ORMs are those that just leave out the "Relational" part entirely. So... "OM"?
For example, in Go, I use Gorp, which has a Select() function where you pass in the SELECT query string (plus bound values) and the target type, and it loads every result row into an object of that type. So you can have an arbitrarily complex SQL query as long as it starts with `SELECT one_table.* FROM`. That's a marvelous design.
And when you have to do a query that returns results from multiple tables? Guess what, you just use the normal SQL module from the standard library.
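That Gorp-style split is easy to sketch in Python: you write the SQL yourself, and a small helper handles only the row-to-object marshaling (the `select_into` helper is hypothetical, loosely mirroring Gorp's `Select()`):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def select_into(conn, cls, sql, params=()):
    """Run arbitrary SQL and hydrate each row into cls by column name.
    The helper knows nothing about the query, only the mapping."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")

users = select_into(conn, User, "SELECT users.* FROM users WHERE id = ?", (1,))
# users == [User(id=1, name='ada')]
```

The query can be arbitrarily complex; the only contract is that the result columns line up with the target type's fields.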
Not just Postgres. That's standard SQL. Columns have data types. Tables have schemata. Schemata are very strongly typed. Can't insert a 'full_name' column into a table without such a column.
Unless you're talking about SQLite, SQL RDBMSs are both static and strongly typed. The systems typically do allow some implicit type conversions, but type is critical to how a table works.
while ORMs can do quite a bit of optimization, they're still general query builders and can't construct the optimal queries for your use case. if you don't know SQL, or you don't know what's going on behind the scenes, your ORM could be performing much larger queries than it really needs to, costing performance and time
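The classic example is the N+1 pattern: a naive ORM (or naive lazy-loading configuration) issues one query per parent row where a single JOIN would do. A sketch with SQLite, counting round trips by hand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO books VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1 pattern: one query for the parents, then one per parent row.
queries = 1  # the initial SELECT over authors
pairs_n_plus_1 = []
for author_id, name in conn.execute(
        "SELECT id, name FROM authors ORDER BY id").fetchall():
    for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,)):
        pairs_n_plus_1.append((name, title))
    queries += 1

# The same result in a single round trip.
pairs_join = conn.execute("""
    SELECT a.name, b.title FROM authors a
    JOIN books b ON b.author_id = a.id
    ORDER BY a.id, b.id
""").fetchall()
# queries == 3 for two authors; the JOIN version is always one query
```

With two authors the difference is invisible; with a real-world table it's the "database sufficiently populated with real-world data" failure mode described upthread.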
I built a moderately complex application in Django at a previous workplace, using the ORM for most things, until the queries were too complex for the ORM.
Another guy connected to the same database and built some graphs using PHP and SQL. Guess who had to help him write the SQL when the queries got too complex for him? The ORM user.
It depends on how willing you are to allow your object graph to match your relational schema. The key is to let SQL be SQL. I wrote one that allows you to load data using standard SQL resource files, but it handles the persistence automatically:
Taking the SQL-first approach also allows you to serialize without circular reference problems, since you don't just load the data, you also define a path to decompose the graph into a DAG.
My experiences have led me to the standpoint that most ORMs handle three major things:
1. provide idiomatic domain-object oriented query interface which it in turn translates to SQL
2. provide CRUD sql generation
3. provide some sort of session-based object lifecycle change tracking and management
#1 - The generated SQL is important on many levels. As things like HQL/linq/<your QL here> deviates further from the generated sql transparency is lost. SQL is normally brittle compared to your domain language which has better testability and type safety but still you have a handful of queries where it feels more sensible to write the SQL yourself.
#2 - Code which reflects on a type and generates basic insert/select/update/delete is usually pretty naive and easily done. With the exception of complicated legacy databases and iBatis-style tools ORMs which only support #2 arent really worth bothering.
#3 - I've found this to be the real benefit of an ORM. Being able to scope object lifecycles into clear units of work, buffer pending changes until a discrete point and get scoped caches for "free" have been hugely beneficial.
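A toy sketch of #3's core idea: buffer pending changes until one discrete flush point, applied in a single transaction. Real ORMs track dirty objects and generate the DML; this invented `UnitOfWork` class just queues raw statements to show the shape of it:

```python
import sqlite3

class UnitOfWork:
    """Buffer pending writes; flush them in one transaction on commit()."""
    def __init__(self, conn):
        self.conn = conn
        self.pending = []

    def add(self, sql, params=()):
        self.pending.append((sql, params))

    def commit(self):
        with self.conn:  # one transaction for the whole unit of work
            for sql, params in self.pending:
                self.conn.execute(sql, params)
        self.pending.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

uow = UnitOfWork(conn)
uow.add("INSERT INTO users VALUES (?, ?)", (1, "ada"))
uow.add("INSERT INTO users VALUES (?, ?)", (2, "bob"))
# nothing has been written yet; commit() applies both atomically
uow.commit()
```

The "scoped caches for free" benefit follows from the same structure: within one unit of work, an identity map can hand back the same object for the same primary key.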
Each ORM unfortunately/inevitably comes with a steep learning curve. It seems to unfortunately/inevitably bring lots of new concepts and conventions to the table that the user simply must understand. Sometimes it requires you to re-arrange the way you write your code to be more session-oriented.
I don't like the idea of AI managing how/when to apply changes to storage media. AI can be smart and efficient, yes, but you lose that important transparency/predictability. On the contrary, I prefer a very dumb/mechanical/predictable ORM, something not elegant but where the behavior of what happens when is well-understood and easily scaled out to a large team. In my experience hibernate, EF, and sqlalchemy have that sort of dumb/predictable behavior (however the SQL they sometimes generate can be performant but unreadable).
You’ve probably refactored the ORM Data layer 3 times in that time period as ORM producers have a hard time figuring out the interface which they wish to provide you
Agreed. I'd say my point still stands because duplicated database calls affect performance and scalability and this can grind the project to a halt at a certain point. Also this refactor is much more dangerous/probably buggy.
On the other hand refactoring your data logic is just business as usual, and you will probably do it in both cases anyway.
caching is a very tricky area with tons of pitfalls. In my opinion, the ORM should not be caching. Let the clients (or any other layer) cache/clear based on their needs.
I can get very creative with SELECTs, making use of Prolog style queries, which are fully done server side on the database.
Most ORMs will download all the data and evaluate them on the client side, with code that is even more convoluted than the SQL one and thus with less performance.
I know EF, which is why my comment also mentions "code that is even more convoluted than the SQL one and thus with less performance".
LINQ only allows for a fraction of what is possible with SQL, and good luck having the best queries generated out of it if the RDBMS doesn't happen to be SQL Server.
> What AI do you need? You map your tables to objects and relationships between objects via FK relationships.
That is only true in a one-to-many entity relationship (and even so, it is debatable). A one-to-one relationship can be modeled in the two objects, in one of them, or delegated to a third entity. A many-to-many entity relationship can also be handled, in OO, in various different ways. Idem for a ternary relationship or, basically, any higher-order relation between objects.
This is known, borrowing a term from electrical engineering, as an impedance mismatch between the two models, and it's not an easy problem by any measure.
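The many-to-many case makes the mismatch concrete: relationally it is simply a junction table, but the object model has to pick an owner for the collection (student holds courses? course holds students? both?). A sketch of the relational side, with one of the possible object-side readings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relationally, the many-to-many relationship IS the junction table.
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course_id  INTEGER REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO students VALUES (1, 'ada');
    INSERT INTO courses  VALUES (10, 'databases'), (11, 'compilers');
    INSERT INTO enrollments VALUES (1, 10), (1, 11);
""")

# One possible object-side mapping: a collection hung off the student.
titles = [t for (t,) in conn.execute("""
    SELECT c.title FROM courses c
    JOIN enrollments e ON e.course_id = c.id
    WHERE e.student_id = ? ORDER BY c.id
""", (1,))]
# titles == ['databases', 'compilers']
```

Every ORM must choose where that collection lives and when to load it; the schema itself imposes no such choice, which is the mismatch in miniature.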
> so you'll end up inventing that part yourself without an ORM (I recommend doing so, on a less critical project, to learn the kinds of issues that present themselves).
I recommend anyone that has to deal with ORMs create their own at some point, in a non critical project. Not because they will necessarily create the next big thing (but who knows?), but because nothing quite gives you the perspective and appreciation for what these systems can do and why they have their pain points like making one yourself. You'll most likely not use your own module long after you've created it and then again surveyed what's already available, but it's invaluable in making a good assessment of those options as well.
Similarly, making a web framework yields similar benefits.
In both cases, a good understanding of the underlying technologies they build on (SQL and HTTP) is required, and if you don't have it going in, you'll have it coming out the other side (which is really the reason for this in the end).
Similar things exist all along the spectrum. From embedded OS's and compilers to javascript utility libraries.
What it comes down to is that a tool is best utilized when the person knows when and how to apply it appropriately, and that's as often as not an understanding of the tool as it is of the context.
A person intimately familiar with hammers and their uses can build some interesting wood furniture, but they'll likely never achieve the same level of product as a master woodworker that's just well acquainted with a hammer. Investing time and effort into tools provides only so much benefit. At some point, more knowledge of the craft itself is far more beneficial.
zzzeek - can I take this chance to praise your work on SqlAlchemy. People say there's not enough thanks given to open source developers... here's thanks to you. It's the work of a craftsman.
I'm pretty sure that SqlAlchemy has caused more grief and frustration than any other single library. After all if I didn't know about the excellence of SqlAlchemy, maybe I wouldn't get so cross when I have to do anything non-trivial with the Django ORM ;)
Joking aside, SQLAlchemy is very impressive software, and zzzeek deserves all this praise and more. Every time that there's one of these anti-ORM articles I feel like SQLAlchemy pre-emptively addressed all the substantive criticisms in its flexible, well layered design.
(And in the interests of fairness, Django's ORM is also very good at making simple things simple).
+1 - personally I really like how SQLAlchemy doesn’t abstract away the database so much where all modeling and power is lost. He and the contributors have done an excellent job.
I've always been impressed by zzzeek's near-omnipresent participation. Years ago, I had a question about SQLAlchemy that he'd answered on the mailing list which amazed me given the relative rarity of developer interaction. Yet here we are almost a decade later and he's still answering questions directly, but on many more platforms. I'm actually not convinced he ever sleeps.
I wish more people aspired to be like him, because you know there's no more authoritative answer when you run into a Stack Exchange post and he's offering up his assistance.
But I'm really surprised every time people tell me they look at the schema as defined into the ORM instead of at the table in the database.
My jaw really drops the few times I learn somebody doesn't even know SQL, only the ORM. Maybe they look at it as if it were the reaction of somebody that thinks you must know assembly if you want to program Ruby, Python or Node (I don't.) Still, if you work with a database you must know its internal language, SQL or NoSQL. You're going to need it, or you'll build a mess.
And involving a DBA early in the project can make your database at least twice as fast, with the right schema and the right queries. Then you translate that into the ORM you want to use.
I expect that everybody with a degree knows SQL and I'm realizing that I could be wrong. Maybe sometimes I'm the only one in the room that knows it. I'll check it next time I'm at a technical event leaning on the backend side.
Very happy to not have an ORM on our stack for many years.
I believe that the interaction with the database is just about the most important part of code that you need to rely on. We have had messy data written to the database due to a misused or misconfigured ORM (performance issues, bad orphan handling, session state problems, overflowed sequences, to list a few) and decided that we should not rely on all devs touching that code to be experts in the ORM API to avoid the pitfalls; we'd rather our devs be experts in SQL.
Never had any of them complain about mapping using something like jOOQ. Type checking and composability are also built in.
Although, I think what you plan to do with the data also makes a big difference. Which probably explains why opinions vary so widely.
For short-lived processes, like typical web applications and command line utilities – that load data from the database, do something with it, and then purge it from memory again – I'm becoming less and less convinced that ORMs are actually a benefit.
On the other hand, if your application plans to map the database data to memory for long periods of time, with the need to keep them in sync, then you're probably going to end up writing something that resembles an ORM anyway, and poorly at that. In this case a good ORM is beneficial.
For 3: the proper abstraction for relational data is, surprise, a relation.
I used to work in an environment where we had relations as full first class data structures. They are very pleasant to work with.
We actually had 'relation-object-mappers', ie when we had to interact with other systems that didn't use relations, we often mapped them to relations internally to make them play nicer.
How were those relations represented? If I understand correctly, you were not just wrapping tables in a database, but using some other backing implementation. What were the most common operations, and what was the performance like?
Oh, the implementation was fairly straight-forward. I think just sorted arrays or something.
It wasn't about speed of execution, but expressiveness when coding. Later on they even added proper type system support.
Common operations were things like map/project, extend, filter, join, collect-by-key / expand, etc.
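Those operations are easy to sketch over plain Python sets of tuples (each tuple represented as a frozenset of (field, value) pairs so relations stay hashable; all names here are invented for illustration):

```python
def project(rel, *fields):
    """Keep only the named fields of every tuple (Codd's projection)."""
    return {frozenset((f, dict(t)[f]) for f in fields) for t in rel}

def restrict(rel, pred):
    """Keep only tuples satisfying pred (Codd's restriction/selection)."""
    return {t for t in rel if pred(dict(t))}

def natural_join(r, s):
    """Join on all shared field names."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

emp  = {frozenset({"name": "ada", "dept": 1}.items()),
        frozenset({"name": "bob", "dept": 2}.items())}
dept = {frozenset({"dept": 1, "dname": "eng"}.items())}

joined = natural_join(emp, dept)           # only ada's dept exists in dept
names  = project(restrict(joined, lambda t: t["dname"] == "eng"), "name")
```

Nothing here is fast (it's a nested-loop join over sets), but it shows how small the core relational vocabulary is when relations are first-class values.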
Just as Codd pointed out in his original papers, relations allow you to not have to make a choice about a hierarchy for your data.
Using key-value store like a hash-table in your program, or the much vaunted has-a relationship between objects would force you to make these choices. Thus making interacting with the data awkward for all but one access pattern.
Relations work best when your program is written in a style that deals largely in immutable data. (What we call 'purely functional', but people in dysfunctional languages have also picked up on the advantages recently.)
> Just as Codd pointed out in his original papers, relations allow you to not have to make a choice about a hierarchy for your data.
Or, alternatively, make it really painful when you do actually need to query hierarchies, along the lines of "give me all the tuples above this one in the hierarchy".
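That ancestor query is what recursive CTEs were added to SQL for; whether that counts as "painful" is a matter of taste. A sketch in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A hierarchy stored the usual relational way: each row points at its parent.
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO nodes VALUES (1, NULL), (2, 1), (3, 2);
""")

# "All the tuples above this one": walk parent pointers with a recursive CTE.
ancestors = [r[0] for r in conn.execute("""
    WITH RECURSIVE up(id, parent_id) AS (
        SELECT id, parent_id FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.parent_id FROM nodes n
        JOIN up ON n.id = up.parent_id
    )
    SELECT id FROM up WHERE id != ?
""", (3, 3))]
# ancestors of node 3, nearest first
```

It works, but it is undeniably more ceremony than following a parent pointer in an object graph, which is the trade-off being pointed at here.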
so, i think #3 is less useful for switching out the actual DB server itself in an ongoing project, but i’ve found it immensely useful in a couple of ways:
- replaced the DB driver mid way through a project to a slower, but more complete implementation. this went flawlessly, because it’s all so generic and in the end the same DB under the hood, so a simple change (without an ORM i could see it being a nightmare)
- started a new project where i had to use MSSQL, which I’d never used before, but i’m a big fan of postgres/sqlalchemy. other than ODBC oddities, it was really simple to write the new app with all the same patterns i was used to with things like update on write, lazy joins, constraint deferral, and i think most importantly would be MIGRATIONS! huge help to have the same migration framework that i was used to
> As a theoretical abstraction above the data-store
I worked on a project with a home-grown ORM (in C; it was horrible) that abstracted over both MySQL and Postgres ... except the overarching application required Postgres-specific column types and functions.
ORMs of various stripes were in common use well before 2004 so 1 & 2 don’t sound terribly persuasive. 3 is definitely an evergreen selling point - you might change to a different RDBMS vendor (back when there was such a thing), etc.
This is especially true where the ORM can mask things such as moving a column to an associated table. The ORM knows that object.attribute is now represented in the object_attribute table with the relationship using object_attribute.pk in the object.attribute column (which may or may not be renamed to attribute_id).
No need to rewrite all your SQL, just the ORM description of the model!
Realistically though, if you’re querying the same table from that many places your architecture is already in trouble. So editing the queries shouldn’t be that big of a deal.
> providing abstraction for database-specific and driver-specific quirks
That is quite theoretical. My PRs for fixing non-spec compliant behavior in pgjdbc get rejected because they might break some ORMs (mostly Play). My PRs for adding MariaDB sequence support to Hibernate get rejected because there are additional MariaDB features that Hibernate doesn't support as well.
heaps of this kind of thing are nicely dotted around the code so you don’t have to deal with weird driver quirks; e.g. it maps a “Text” column type to whatever it needs to be in your given database to have an unbounded text blob
In Java land the JDBC driver is supposed to do that. If it doesn't then that's a bug in the driver. If it's a bug in the JDBC driver then the fix needs to go into the JDBC driver, especially if the driver is on GitHub.
The presence of an ORM relying on these bugs should not prevent a bug in the driver from being fixed.
Exactly. It is amazing how many bad ORMs that I've seen in systems from people who "just used SQL" instead of an ORM.
I've found that the people who know SQL pretty well can actually do good work with or without an ORM. But the maintenance is a lot easier when the abstractions are consistent with the abstractions that are used by a popular ORM.
Everything that isn't SQL tuned to your environment forces you to sacrifice performance at some point. At some point it does an inefficient join. How can you go back and fix performance issues if you don't know SQL? (It isn't like you need to fix it in C.)
SQLAlchemy makes dropping down to SQL where necessary incredibly easy. There where necessary you can tune the queries, but where not necessary you can let the ORM create them for you.
This gives you a lot of flexibility and power, but as zzzeek mentioned, you need to know SQL to understand where the ORM is inefficient and to be able to replace it as necessary while still letting the ORM do what it is really good at.
you need to learn SQL first before you work with an ORM.
Analogously, I'd say that being able to write a compiler, down to having it generate machine code or assembly, gives you a leg up when using a compiled language. I've met and interviewed a number of coders who had cringeworthy gaps in their knowledge, because to them, a C compiler was just some kind of "magic."
Hibernate (and presumably any ORM) was never intended to be a complete abstraction of anything-SQL. Like others have mentioned here, an understanding of SQL must be had before using an ORM. The ORM is one piece to the puzzle, not a shield to prevent you from having to touch SQL.
One pattern I typically use is a take on CQRS: Hibernate for writing/updating/fetching/deleting a single instance of deeply-relational object model, SQL (I like jOOQ) for larger-scale fetches and any bulk actions.
I must admit that I've inherited two messes where someone used NHibernate as a replacement for SQL. In both cases, the schema was excruciatingly simple but the code performed excruciatingly poorly.
In one case the project was canceled, in part because it was so late due to the developers not knowing how to use a database. In the other case I put my foot down and removed NHibernate. The schema was so simple that it was just easier to put a few extra minutes into boilerplate code than to put time into learning a new thing.
I'd really like to see a good writeup about the use cases that tools like Hibernate excel at. The problem is that, in both cases, I had to work with a high-level manager who had very bad assumptions about what Hibernate can and can't do.
I'm so glad someone finally confirms my idea about using hibernate. I've been in constant battle with developers that map full object graphs coming from some endpoint to hibernate/jpa annotated classes and then throw it at hibernate. Here, you save this... It just doesn't work that way very well. It's always a mess with relationships. Whereas if you run your logic on entities attached to the session, things work out nicely.
> Hibernate (and presumably any ORM) was never intended to be a complete abstraction of anything-SQL. Like others have mentioned here, an understanding of SQL must be had before using an ORM.
Disagree. You need to understand the relational model, but you don't need to understand SQL-the-language. I've written plenty of successful systems using hibernate without needing to touch SQL, and am much happier for it.
I have my own views on this but I basically agree with everything you said.
I think ORM is a great way to translate tables into real objects that have their own methods and properties that may or may not interact with the DB. I think that's where the real power is.
But for performance-necessary actions, yeah, SQL all the way (or rather a query builder).
99% of the time, an ORM is fantastic and will make you more productive while providing performance, security, and maintainability. They come in many sizes from thin wrappers around a db connection to full-featured frameworks.
For the other 1%, use raw SQL, or perhaps a query building tool to help with parameterization, composability, etc. In fact, modern ORMs will even let you input raw SQL and handle the conversion back to objects if you need it.
Saying ORMs are always wrong is just as dumb of a statement as saying all database access needs to be in raw SQL. They are just a tool and abstraction, like everything else you use in software development. You know the right time to use it.
That being said, not knowing SQL at all means a lack of general understanding in how relational databases work and will almost always cause problems.
All of these endless debates seem to boil down to two different groups who work in two different problem domains talking past each other.
I've personally never seen an ORM lead to success in the long run. But I also work in a space where queries frequently end up involving something that ORMs typically don't handle well: merge statements and pivot statements, window functions, management of the lock escalation policy to fine-tune performance, temp tables... The list is endless.
What I have not ever worked on is a relatively basic CRUD datastore. Which I realize is what most people are using databases for. So at this point, I'm putting my money on ORMs being a hole in one for that application. Because, otherwise, I just can't reconcile a statement like, "99% of the time, an ORM is fantastic" with the reality I'm living in. In my career, 100% of the time, when an ORM was present, it was invariably the single biggest piece of technical debt.
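The kinds of queries in question are easy to show concretely. A minimal sketch in Python with the stdlib sqlite3 module (table and data invented for illustration; needs SQLite >= 3.25 for window-function support) — the sort of RANK()-over-partition query most ORM query APIs can't express without dropping to raw SQL:

```python
import sqlite3

# Invented schema/data; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sale (Dept TEXT, Amount INTEGER);
    INSERT INTO Sale VALUES ('a', 10), ('a', 30), ('b', 20);
""")

# Rank each sale within its department -- one line of SQL, but a
# construct most ORM query APIs cannot express.
rows = conn.execute("""
    SELECT Dept, Amount,
           RANK() OVER (PARTITION BY Dept ORDER BY Amount DESC) AS rnk
    FROM Sale
    ORDER BY Dept, rnk
""").fetchall()
print(rows)  # [('a', 30, 1), ('a', 10, 2), ('b', 20, 1)]
```

Merge statements, pivots, and lock-escalation hints are in the same bucket: expressible as a string, not as an object-graph traversal.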
Yes, you're talking about the 1% of use cases, which is probably more like 10% these days with more complex software. Most business apps are just CRUD, but if you're doing analytics queries and such with tabular/pivot/nested result sets, then an ORM isn't going to do much for you.
SQL is the database interface, so of course using it directly without abstraction gives you all the power and control. I have seen some cases, though, where a query builder with a solid DSL can be a good middle ground.
I guess my experience is limited, but I’ve not seen much of this despite working in an ORM environment with around 75 entities for the last few years (my recent experience anyway, the rest goes back 16 years). Maybe that is small potatoes, I don’t know, but I’ve found that anyone who understands JPA well enough can work to avoid any pitfalls of using ORM. It seems to me that having a good mix of understanding SQL and ORM is a good thing; and especially understanding exactly what the ORM system is doing for you and how it is doing it. Dropping ORM altogether sounds like a bad idea since it provides a number of built-in security features as well as an abstract modeling paradigm that is fairly easy to conceive and maintain; provided, of course, that you learn to say “No” to protect the integrity of the model (such as rejecting the attribute creep the article warns about).
I have found, in my experience, that people who tend to want to write SQL over ORM usually want to do so because they simply know SQL better. That’s okay, there is nothing wrong with that. But that doesn’t immediately mean ORM systems are bad. No need to be tribal about it.
The problem I see is that many new software developers these days sometimes can’t see the forest for the trees because they dwell too much on what they think is better instead of simply seeing the software and abstractions as nothing more than tools in the tool belt. It happens everywhere — PC vs Mac, iOS vs Android, Scala vs Java, SQL vs ORM. It’s fine to have opinions, I have many, but as I’ve aged I’ve become acutely aware that my biases are almost solely rooted in the limitations of my understanding.
The post is from 2014, so I won't be too harsh here. The author's problem is with some specific flavors of ORM he's used, and shouldn't be generalized. Hibernate's expressiveness is/was crippled by Java itself. C# ORMs on the other hand are way better because they benefit from LINQ which adds queries natively into the language. Other more expressive languages have excellent ORMs as well.
The objective of ORMs is not to replace 100% of your queries. The remaining 10% might still require SQL or stored procs, and that's fine.
ORMs give you:
1. Type safe queries
2. Ability to refactor easily, click to rename prop
3. Not having to handcode joins if objects are related
I've been writing C# professionally for ~5 years at this point. While I was initially quite infatuated with LINQ2SQL and EF, I have gone through the same situation as this fellow. I just write SQL in-line using Dapper for parameterization/data mapping, and use stored procs when I need some of the more arcane features of SQL (merges, CTE, etc).
I've been writing C# professionally for ~12 years at this point. I'm extremely comfortable with SQL; the first startup I worked for, for 3 or 4 years in the mid-2000s, did amazing things with it and was extremely anti-ORM. We did things like write SQL that would automatically get translated into XML, which we'd combine with XSLT to create dynamic pages.
Yes, I've hit major problems with the EF (including one on Friday which was causing huge CPU and Mem spikes at the busiest time of our year). Yes, it's a PITA to debug the queries. Yes, it can do stupid things. Yes, some programmers can massively over-complicate it.
But you can prise the EF from my cold, dead hands before I give it up. ORMs rock. It's such a huge time saver as long as you KISS and accept you have to drop down to SQL sometimes.
And screw switching to Core until they've sorted out lazy loading. Lazy Loading can also screw you, but again, it's just wonderful when you use it right.
I've also turned into a huge fan of Code-First and Migrations, having been against them at first.
> "And screw switching to Core until they've sorted out lazy loading. Lazy Loading can also screw you, but again, it's just wonderful when you use it right."
Are you referring to EF Core? I was considering learning it. What's this lazy loading issue? It's not one I've heard of before.
Basic functionality like group by, lazy loading, etc. is missing. By the look of it you can't even load custom types from hand-crafted queries, which is pretty ridiculous.
Haven't really been keeping that up-to-date with it. I personally feel the whole Core thing has been a massive cluster-fuck for their existing customers.
>Lazy Loading can also screw you, but again, it's just wonderful when you use it right.
That's funny - the first thing I do when starting a new project is turn off lazy loading globally. I find it hides poorly-performing code until it's causing problems; the equivalent code without lazy loading usually just throws an exception.
Granted I've only been in the industry for 3 years, so /shrug
Also, question: you mention code-first and migrations... do you think you'd rather use SQL for your schema definition/migrations if the tooling was better? I find SSDT doesn't quite cut it :(
We have far more problems with too many .Includes causing terribly performing queries with bad JOINs than lazy loading problems. Granted this code base is in a bit of a state and we have a complex order structure that can go like 10 layers deep, and ideally we're looking to go even deeper with complex pricing. You do an include with all of that and you're going to get a terrible query.
It's generally very cheap to do a single item query with no joins (as in a nanoseconds db query, yes, nano), and it only has to do it once, if you re-use that object again anywhere else, it's already in the context so it doesn't have to go to the db. Even doing them hundreds of times can be super cheap[1]. Add to that it only has to load each item once you can make intelligent decisions about what to .Include and what to lazy load.
[1] Caveat: Azure db connection latency often sucks, so this isn't completely true; we had 5-6ms instead of the <1ms you'd expect. Presently at about 2-3ms. This is the fault of Azure and not lazy loading, though. It causes a problem when you make hundreds of calls: 100 lazy-loading calls each taking 6ms would add 600ms, well over half a second, to a request.
>We have far more problems with too many .Includes causing terribly performing queries with bad JOINs than lazy loading problems. Granted this code base is in a bit of a state and we have a complex order structure that can go like 10 layers deep, and ideally we're looking to go even deeper with complex pricing. You do an include with all of that and you're going to get a terrible query.
Wait - are you "Include"ing things you don't need? If not...assuming a sane query plan, shouldn't the single query (e.g. "Include" version) outperform the deconstructed series-of-queries that brings back the same data?
select * from Orders o join OrderItems i on o.OrderId = i.OrderId
where o.OrderId = 5
vs.
var order = context.Orders.Single(x => x.OrderId == 5);
var items = order.OrderItems;
which translates roughly to
select * from Orders o where o.OrderId = 5;
select * from OrderItems where OrderId = 5;
As far as I understand the former will outperform the latter, even if you don't take into account the additional connection overhead. If the second version was faster...wouldn't SQL just compile down to a series-of-queries automatically?
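For what it's worth, the two strategies can be run side by side. A sketch in Python's stdlib sqlite3 (schema invented to mirror the Orders/OrderItems example above): both return the same data, but the joined version repeats the parent's columns on every child row, which is part of why wide Includes over deep object graphs get expensive:

```python
import sqlite3

# Invented schema mirroring the Orders/OrderItems example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, Customer TEXT);
    CREATE TABLE OrderItems (ItemId INTEGER PRIMARY KEY,
                             OrderId INTEGER REFERENCES Orders(OrderId),
                             Sku TEXT);
    INSERT INTO Orders VALUES (5, 'alice');
    INSERT INTO OrderItems VALUES (1, 5, 'SKU-1'), (2, 5, 'SKU-2');
""")

# Eager ("Include") version: one round trip, but the order's columns
# are duplicated on every item row.
joined = conn.execute("""
    SELECT o.OrderId, o.Customer, i.Sku
    FROM Orders o JOIN OrderItems i ON o.OrderId = i.OrderId
    WHERE o.OrderId = ? ORDER BY i.ItemId
""", (5,)).fetchall()

# Lazy version: two round trips, no duplicated parent columns.
order = conn.execute(
    "SELECT OrderId, Customer FROM Orders WHERE OrderId = ?", (5,)).fetchone()
items = conn.execute(
    "SELECT Sku FROM OrderItems WHERE OrderId = ? ORDER BY ItemId",
    (5,)).fetchall()

print(joined, order, items)
```

With two child rows the duplication is trivial; multiply it across several one-to-many branches and the joined result set grows multiplicatively.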
The example is too trivial, which is why it looks like it might be better. Here's a real world example, not even complete. A restaurant booking might have:
- A venue associated with it
- A user who booked it
- A menu
-- With courses (starter, main, dessert)
--- of Menu items (steak)
---- with Menu item option groups (think, 'pick one of', 'choose at most 3', etc.)
----- of Menu item options ('rare', 'medium', or 'extra chips', 'onion rings')
- Guests
-- Guest.User
-- Guest selections of menu items
--- Guest selection options
- And Many more! (payments, events, offers, postal addresses, etc.)
There are loads of things that are optional, or even extremely rarely filled in (say, an associated special area of the restaurant, or perhaps an assigned waiter, or a third-party partner who placed the booking). And we can just let the EF load that in lazily. It knows a nullable int means nothing to load, but if there is a third party supplier, it can go off and load that lazily (and extremely cheaply).
As for the query, you can load them in chunks (which we do), but the way the EF works you're limited on how you can do that.
If you try loading that all in one go, you get a very, very slow query. It's beyond the limits of the execution planner to do it well.
Because of the nature of ORMs, the EF can also make decisions which result in horrible sub-selects, or terrible joins of sub-tables where the execution planner can't use the right indexes, especially when you're trying to do groups, counts, sums, etc.
This makes it often better to load things separately and to selectively use the lazy loader to do certain things.
Other scenarios include where say you have a complex object that you've only partially filled in, but in 1 in 5 cases you want to send an email using that object.
Now you could write your email function to load all the data again, or you could let lazy loading do its thing and, overall, save time and decrease db load, because you've already got 80% of the data, it just needs to fill in the missing 20% with some simple queries.
Answer to the earlier question: I used to hand-code my db upgrades, as my opinion is that having correctly structured data is king, and I understood relational db design. But it turns out EF Migrations are wonderful when you know how to use them. I still check every single one to make sure it's doing exactly what I wanted and expected, though, and take them up and down manually. Good way of catching mistakes.
I see - so in your case you gain from the fact that many of the joins are likely to result in zero matches. And since you're joining on a nullable FK, you can tell in advance whether the record exists without a lookup - I ran a test and I verified that you pay a cost for the below join regardless of whether the FK column is null.
select *
from Person p1
--p1.Spouse is null for the record in question
join Person p2 on p1.Spouse = p2.PersonId
where p1.PersonId = 42
(I understand why it can't do it in the execution plan, but I'm surprised it doesn't 'short-circuit' at runtime since the join predicate is trivially unsatisfiable for that row)
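That claim is easy to reproduce. A sketch of the same experiment in Python's stdlib sqlite3 (table and ids taken from the SQL above): the inner join on a NULL FK can never match, so nothing comes back, but the join itself still executes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY,
                         Name TEXT,
                         Spouse INTEGER REFERENCES Person(PersonId));
    INSERT INTO Person VALUES (42, 'pat', NULL);  -- no spouse
""")

# Inner join on a NULL FK: the predicate is unsatisfiable for this
# row, so no rows come back -- but the engine still runs the join.
rows = conn.execute("""
    SELECT p2.Name
    FROM Person p1 JOIN Person p2 ON p1.Spouse = p2.PersonId
    WHERE p1.PersonId = 42
""").fetchall()
print(rows)  # []
```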
Your other benefit involves conditionally needing data. I will say it's not too hard to structure app code to avoid loading redundant/unneeded data in your email example, but it's certainly easier and more maintainable when property access is fundamentally linked to its actual retrieval - it's impossible for another developer to make changes to your version and 'lose' the efficiency, while the same isn't true for mine.
So it's less black and white than I thought...which it usually is :)
How do you feel about using 'explicit' lazy loading? E.g.
PersonEntity person;
if (IWantToSendEmail) {
    context.Entry(person).Reference(x => x.Email).Load();
    // use email info here
}
It actually is quite hard to do it and have re-usable code.
There are various different ways the same email might get triggered, maybe the booking came from an API call, maybe it came from a new booking form, maybe it came from a 'send reminder' button.
In all cases, I have a booking object that will be in a different state of being filled in. The underlying need for data for the rest of the request is very different: some of them need a fully filled-in booking, some need the bare essentials. Our "fully load this booking" function takes like 150ms, which isn't cheap, and a significant amount of that is DB time, which is again our most in-demand resource.
CPU/memory is (generally) under-utilized on web apps, and letting the EF do its lazy-loading thing is usually the best solution.
As for explicit lazy loading, it's inelegant and way more code. One thing we know for sure, more lines = more bugs.
I'm not saying turning off LL is a bad thing, if it works for you, but I semi-regularly have a SQL profiler running while developing, so I see when it starts kicking out loads of queries unnecessarily.
Is LINQ actually an ORM? It looks like a DSL to do relational stuff in C#, but I don't see where the 'O' part of ORM fits in especially not in your examples to show off LINQ's power and convenience.
The language's ability to treat code as data allows the programmer to express queries in native language syntax and pass it to an ORM (such as EF or Linq to Sql) for execution on a data store.
Add: Where does the 'O' part fit in? You build entities (in plain C#) with relationships to each other, and you could do stuff like:
Thanks. My quip was just that objects (as in object-oriented programming) aren't the right abstraction. Having some kind of mapper in your language between the database and the entities your program is dealing with is useful, and you showed that functional programming is a friendlier host than OOP.
Objects are fine, it's classes that are often limiting. Consider the very first example above (corrected to be valid):
var r = customers.Select(c => new { cust = c, orders = c.Orders })
This gives you an IEnumerable (basically, a forward-only sequence) of objects - but these objects are of an anonymous type that was implicitly defined by "new".
Linq looks like a free monad: you declare the program as a data structure and the runtime implementation just interprets it on the go (something you could - not so easily - try to do with a strategy pattern)
LINQ itself doesn't have anything directly to do with SQL. It's a mechanism for composing expression trees that are compiled into your application. At runtime a LINQ query provider can walk the expression tree and translate it into a query for the underlying data store. It's a very very powerful concept, but IME it doesn't get used all that much for things other than SQL because writing a query provider is a lot of work.
Anyway, the typical C# ORM is Entity Framework, which includes the LINQ-to-Entities query provider.
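The "query as data structure" idea can be sketched in a few lines of Python (all class and function names here are invented, not any real library's API): the query is inert data until an interpreter walks it and emits parameterized SQL, which is essentially what a LINQ query provider does at much larger scale:

```python
from dataclasses import dataclass

# A toy "expression tree": the query is plain data until a backend
# interprets it. All names here are invented for illustration.
@dataclass
class Col:
    name: str

@dataclass
class Eq:
    left: Col
    right: object

@dataclass
class Query:
    table: str
    where: Eq

def to_sql(q):
    # One possible interpreter: walk the tree, emit parameterized SQL.
    # A different interpreter could target another data store entirely.
    return (f"SELECT * FROM {q.table} WHERE {q.where.left.name} = ?",
            (q.where.right,))

q = Query("Orders", Eq(Col("OrderId"), 5))
print(to_sql(q))  # ('SELECT * FROM Orders WHERE OrderId = ?', (5,))
```

Because the tree is just data, nothing runs against the database until an interpreter chooses to translate it — the same deferred-execution property LINQ providers rely on.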
1. So does PostgreSQL...
2. Coding directly in SQL one first normalizes as much as possible, then denormalizes as much as needed to make desired queries performant. If you should need to refactor the schema, things like "rename prop" are trivial, and other things less so, but probably also not automated by most ORMs anyways.
3. Simple joins, sure, but need much more than that and the ORM gets in the way.
4. Well, SQL is lazy.
Object-oriented programming 101 assumes that all your objects are in memory, in a graph, so you can do things like person.getFriends().get(0).getName() [assuming the person in question has >0 friends]. Each step in the graph is essentially a pointer dereference, costing constant effort.
(If your data is small enough to fit in memory, that's what you should generally be doing. People who use hadoop for half a GB of data are usually doing it wrong.)
A relational database assumes that all your data fits on disk, but only a subset of it will be in RAM at any one time (and you generally have a network round trip every time you change that subset). This means you need a completely different way of thinking; this difference is sometimes called the "object-relational impedance mismatch". This is not to do with SQL and OOP just being different APIs for the same thing, they are designed for very different use cases.
ORM tries to pretend that this difference doesn't matter, and works quite well in simple cases when it really doesn't matter.
My standard example why it sometimes does matter: PersonDAO.fetchAll().size() is silly because it forces the database to fetch all Person objects, send them over the network, your application creates the necessary objects for them - and then you throw it all away again because all you needed was the number of people. PersonDAO.count() is much better, even if you have to implement it yourself.
If you don't like the syntax of SQL, sure - use a query builder. In C# or Java you can even get some kind of type safety that way. But you need to understand the difference between an object graph and a relational database to use either of them efficiently, long before you get to advanced ideas such as window functions.
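The fetchAll-versus-count point is easy to demonstrate with Python's stdlib sqlite3 (schema invented for illustration): both give the same number, but one materializes every row first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Person (Name) VALUES (?)",
                 [(f"p{i}",) for i in range(10_000)])

# fetchAll().size(): ship every row to the client just to count them.
n_slow = len(conn.execute("SELECT * FROM Person").fetchall())

# count(): let the database count and ship back a single integer.
(n_fast,) = conn.execute("SELECT COUNT(*) FROM Person").fetchone()

print(n_slow, n_fast)  # 10000 10000
```

On an in-process SQLite database the difference is modest; add a network round trip per result batch and it becomes the difference between one packet and ten thousand rows on the wire.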
> My standard example why it sometimes does matter: PersonDAO.fetchAll().size() is silly because it forces the database to fetch all Person objects, send them over the network, your application creates the necessary objects for them - and then you throw it all away again because all you needed was the number of people. PersonDAO.count() is much better, even if you have to implement it yourself.
I mean, PersonDAO.fetchAll().size() is just bad code. You need to know how to write ifs properly before programming; likewise, you need to know about databases before using an ORM.
> Object-oriented programming 101 assumes that all your objects are in memory, in a graph,
Yes. We insist on using this OOP style where your objects are Person, Order, Invoice, etc. Ignoring that the data is stored in tables.
But it may not be the only way. What if your objects really are things like Tables, Records, and Relationships (e.g. PersonTable, OrderTable, OrderDetailTable, etc.) and your operations are whatever you can do with these tables? We are trying to abstract away something that is very real. What if we embrace it instead?
I think saying "Just use SQL" is probably a bad idea. You'll most likely end up implementing an ORM anyway, or you will end up with your model code mixed up everywhere with your views.
I do think a lot of people use ORMs as a crutch, which sucks. Also, ORMs often provide too much abstraction, forcing people who actually know SQL to relearn how to do everything the way the ORM happens to like it. I should not have to learn twice as much to be productive due to an abstraction.
What I prefer are really lightweight ORMs that give me models which I can then enhance with custom code. I don't need an ORM that supports plugins or inheritance or a dozen different kinds of joins. All of that can be done more efficiently with custom code.
Also, I think SQL builders are really useful. I think a lot of people conflate SQL builders with ORMs but they're actually very different problems.
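As a rough illustration of that distinction, here is a minimal composable builder sketched in Python (the Select class and its API are invented for illustration): it only assembles parameterized SQL and maps nothing to objects, which is exactly why it is the smaller problem:

```python
# Invented, minimal query builder: it assembles a parameterized SQL
# string and nothing more. Mapping rows to objects is a separate
# (and much harder) problem -- that part is the ORM.
class Select:
    def __init__(self, table):
        self.table, self.conds, self.params = table, [], []

    def where(self, cond, *params):
        # Chainable, so call sites can compose conditions incrementally.
        self.conds.append(cond)
        self.params.extend(params)
        return self

    def build(self):
        sql = f"SELECT * FROM {self.table}"
        if self.conds:
            sql += " WHERE " + " AND ".join(self.conds)
        return sql, tuple(self.params)

q = Select("orders").where("status = ?", "open").where("total > ?", 100)
print(q.build())
# ('SELECT * FROM orders WHERE status = ? AND total > ?', ('open', 100))
```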
> You'll most likely end up implementing an ORM anyway,
This is a really good point. Many people start with the "no ORM" philosophy, realize their application needs some way to map the SQL to the code, time passes..., they have implemented their own half-baked ORM.
A more positive spin is that you'll have an "ORM" that's exactly adapted to your application. Many apps (1) don't need to work with multiple DBMS types and (2) don't use even close to the full panoply of SQL features.
In a language like Java that has a generic DBMS API you can get along just fine with a few classes that handle CRUD operations and transaction management. Somebody familiar with JDBC and SQL can write the bridge classes in about a day, while keeping the overall application vastly simpler.
Either way somebody needs to make an informed choice about ORM vs. direct SQL. It seems as some people get in trouble because they skip that part of the design process.
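A hedged sketch of what such a bridge class might look like — in Python with the stdlib sqlite3 module rather than JDBC, and with Person/PersonRepo and the schema invented for illustration:

```python
import sqlite3
from dataclasses import dataclass

# Invented names throughout: one dataclass per table plus a repository
# with explicit CRUD methods. No reflection, no configuration.
@dataclass
class Person:
    person_id: int
    name: str

class PersonRepo:
    def __init__(self, conn):
        self.conn = conn

    def insert(self, name):
        cur = self.conn.execute(
            "INSERT INTO Person (Name) VALUES (?)", (name,))
        return Person(cur.lastrowid, name)

    def find(self, person_id):
        row = self.conn.execute(
            "SELECT PersonId, Name FROM Person WHERE PersonId = ?",
            (person_id,)).fetchone()
        return Person(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT)")
repo = PersonRepo(conn)
p = repo.insert("ada")
print(repo.find(p.person_id))  # Person(person_id=1, name='ada')
```

The SQL is explicit at every call site, so there is nothing to debug behind the mapping layer — the trade-off is that every new table means writing another repository by hand.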
I mean, stored procedures are fine, but I don't think that actually solves anything? Except maybe for reducing the amount of SQL code you have to send back and forth and (in some databases) allowing for a few more optimizations?
If you use stored procedures, all you've done is move part of the model into the database, so you have to update the stored procedures as part of a deployment. You still need to have the SQL code written out somewhere, and you still need to have something in the application code that knows which procedures exist and how to use the data they return in business logic.
But you can decouple the database logic from the app logic anyway, without using stored procedures. They don't actually help you do this since you still need code that knows what stored procedures to call. Also, I'm not sure how this makes security easier? It seems like security would be the same or maybe a little harder since you now have to track the stored procedures you're currently using as well.
I'm not really sure what you mean by "you can deploy schema changes independently" and "you can change everything and the app should not notice". The stored procedures are basically just an extension of the apps logic right? So you can deploy them at any time, sure, but that isn't different from an app that doesn't use stored procedures, because you could also deploy changes to any part of that app at any time.
I do think stored procedures can be more efficient, because you have a lot more control. But it's not like they are clearly superior from an organizational standpoint. If you write an ORM using stored procedures, it's still an ORM.
In my experience, stored procedures in Postgres are good when you really use the database and care about the data: you need transactions, you need to handle races and concurrency, etc. ORMs break down at this point, or prevent you from ever getting to the point where you can use your database as a database.
Why pretend your SQL database is about objects? It is not... (it is about data)
A stored procedure can act like a view or a query, or use procedural logic. The point is: your app can call it and get a consistent result, no matter what refactoring has been going on.
A direct query needs to know too much about the database (ORM-generated or otherwise), which prevents refactoring and couples the app to the database more tightly.
You can rename or merge tables, views, and functions in the database, but the interface the app uses (stored procedures/DAL) will stay the same and work the same way.
As for app logic... I prefer business logic in the database, not the app, when the data is important. Application logic stays in your application; data-dependent business logic stays with the data.
Every single implementation where I've seen "business logic in the database" has been an unmitigated disaster.
On the other hand, well-factored microservices (out of process) or in-process modules have worked out really well with modern devops and software engineering principles - easy push-button deployments and rollbacks, unit testing, A/B deployments, etc.
I think business logic in the database has prevented disasters in the projects I have worked on. I honestly don't see how it could have been solved better...
It probably depends on the domain/problems.
My experience is with transaction-heavy financial systems or similar, with web frontends and microservices sprinkled around in different languages.
The web app or Java worker should be allowed to focus on its own problems; the business logic needs to live in one central place, which happens to be the database, accessed through a tightly controlled interface in the form of stored procedures.
And what's stopping you from having a tightly controlled interface with a REST Api that is easily deployed, rolled back, unit tested, source controlled and deployed?
I like data. A database is created to handle it, give you tools to query, modify, scale, secure the data.
A rest api... how and why should it be responsible for your data? It solved a different problem.
You might not even need a database I guess, and then anything goes.
I need and like my database, and have suffered trying to get along with different ORMs. SQL is so good at what it is designed to do if you just let it.
(And why just one REST API? How about 100 REST APIs, some microservices, some web apps, some background workers, many different languages. One database. No ORM.)
(Have we come to some max nesting level here? Can't reply to the child comment.)
One db can be a problem, or a strength, depending on the domain; and I really dislike religious design, especially microservices.
I have fewer problems by avoiding ORMs (and religious microservice architecture, or fundamentalist interpretations of REST).
Database handles the shared state in a heterogenous environment.
We need it to be centralized to keep track of money; the apps can't do that, and two independent databases can't do that either. It must be one system that guarantees consistency.
It works great, there is no downtime. The interfaces are defined, the database stands alone, updates are deployed separately.
> Database handles the shared state in a heterogenous environment. We need it to be centralized to keep track of money, the apps can't do that, two independent databases can't do that either.
Why can't apps "keep track of money"? I'm assuming you're referring to transactions. Apps can create transactions and you can share transactions across apps using distributed transaction (I'm not saying distributed transaction is a good idea).
One database is still an issue. When you have a clear slice with one microservice being in charge of one set of data, it's easier to scale, slice, rewrite, and you can deploy and iterate faster without interdependencies.
And you lose all of the benefits of microservices if there is still tight coupling between tables that are unrelated from a domain perspective.
Why would you want to deploy schema changes separately? I would be horrified if someone changed my DB back end without running a full (hopefully automated) set of tests.
The db is separate and the interfaces are defined and the test is for this interface (as part of the schema repository)
You don't need an ORM for testing your code...
But I think this varies from project to project.
How many different applications, in different languages are using your db and do you tolerate downtime?
Why downtime? A developer commits their code; the CI server builds it and runs the non-database-dependent unit tests. It gets deployed to the integration environment, where automated integration tests run - fewer in number, somewhat slower. It gets deployed to the QA environment for a round of manual testing (sometimes); QA signs off, and the build gets deployed to the UAT environment to wait for the business owners' sign-off. Then we turn off the A side of the load-balanced farm, deploy to the A side of the load-balanced production servers, run a round of smoke testing (automated and/or manual), and once everyone is satisfied, we make A live, set the load balancer to use side B, and deploy to B.
All of the manual sign off steps are integrated with the automated release pipeline. As soon as the required approvals sign off, the next step of the pipeline is done.
Rolling back is just redeploying the previous released version. Branching, source control, etc is also a lot easier when all of your business logic is in code and you don't have to sync up the "right" version of your source control with the right version of your stored procedures.
Of course this is even easier when you're using a NoSql solution where your schema is also defined by your class models. But that's another discussion.....
Of course this doesn't have to just apply to code. With things like Packer and Terraform you can do the same with infrastructure. Automated infrastructure deployment is not my expertise...yet
Just use stored procedures? Then you lose the ability to do unit testing without a database dependency; it's a lot easier to roll back code than to roll back code and stored procedures as one; and you don't get full visibility into what the code is doing just by looking at the source.
If all of your business logic is in the stored procedures, what are you actually testing?
And I realize that being able to test queries without database dependencies only really applies to a few languages that treat queries as first-class citizens, like C# with Linq, where you can mock out your actual Linq provider - replace the EF context with an in-memory List<T> - and still test your Linq queries.
> "If all of your business logic is in the stored procedures, what are you actually testing?"
Depends on what you want to test. Can either write unit tests for the stored procedures or unit tests for the code that makes use of those stored procedures.
And then when you write "unit tests" for stored procedures with a lot of developers, you get slow "unit tests" that don't scale across multiple developers because of contention issues.
I think GP meant that you can't/it's hard to test the stored procedures themselves. In this case if you mock the database calls you will not test the database logic.
How would you test data access in a meaningful way without a DBMS? It does not really matter whether you use tables or SPROCs. You'll still need a DBMS instance available.
For PostgreSQL and MySQL you can bring up the DBMS in Docker. That is not a built-in fixture obviously but easy enough to do locally as well as in CI/CD systems like Travis. You'll need to load SQL into the DBMS as a prerequisite to testing. That has the benefit of testing your load/upgrade sequence.
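Where the SQL is portable enough, an in-memory SQLite database can play the same throwaway-fixture role without Docker. A minimal sketch in Python (schema and test are invented for illustration); note that loading the schema in the fixture exercises your load sequence on every run, as suggested above:

```python
import sqlite3

# Invented schema; the same pattern works with a Dockerized DBMS when
# the SQL isn't portable enough for SQLite.
SCHEMA = "CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT);"

def fresh_db():
    # A throwaway database per test; loading the schema here also
    # exercises your load/upgrade SQL on every run.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def test_count():
    conn = fresh_db()
    conn.execute("INSERT INTO Person (Name) VALUES ('ada')")
    (n,) = conn.execute("SELECT COUNT(*) FROM Person").fetchone()
    assert n == 1

test_count()
print("ok")
```

Each test gets a completely fresh database, so there is no shared state between developers or test runs — the contention problem mentioned above disappears.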
Usually with most modern automated deployments, you keep your build artifacts in a package (zip file, tar, etc.) and run a script to deploy it to your target system.
Rolling back is a simple matter of installing the previous archive. That doesn't just apply to code anymore. You can treat "infrastructure as code" also.
You can do A/B upgrades, rollbacks, etc. There is so much better tooling around regular code than SQL/stored procedures. How many times have you seen stored procedures with hundreds of lines, duplicated stored procedures with V1, V2, etc. appended, commented-out logic, and so on?
I've had to wade through some hairy code too, but at least with code I can do some automated, guaranteed-safe refactoring, find dependencies, keep the interfaces backwards compatible, etc.
You did see me preach about all of the capabilities: unit testing without a database dependency, type safety, flexibility (with Linq I can switch back and forth between an RDBMS and NoSql without any code changes)?
AlphaZero's chess strategies turned out to be quite different from how humans have traditionally come up with chess heuristics/strategies. I wonder what paradigms will be used by AI that does computer programming. Will it organize code into some bits of functional programming, some OOP? Will it structure things into MVC? My guess is whatever AI does will be completely inscrutable to us. It may be optimized for efficiency rather than understandability, which humans need for maintainability.
I have to agree. ORMs can be great when an application is just starting, because that's when you're writing the most tedious queries and statements, but beyond that I personally find that ORMs just get in the way.
As soon as I need to write something more complicated than select-by-id I end up reaching straight for SQL. Otherwise I have to learn the ORM's query API or DSL and build the proper mental model for how it translates to actual SQL.
Or I could just write SQL and be done with it. No mysteries.
Maybe this is just me, but I've never written a join in an ORM that I had the slightest bit of confidence in.
Any discussion on ORMs needs to consider what sort of app and queries you are writing.
ORMs are fantastic for the very common case:
* Fetch 20 rows and display them as a table,
* Fetch 1 row by primary key and display it as a form,
* Write updated fields from the form back into the 1 row in the database.
Anything more complicated, and direct SQL starts being more attractive. But for the common case that ORMs are designed for, they are a major productivity boost.
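That common case can be made concrete. Below is a rough sketch, using Python's stdlib sqlite3 and a hypothetical users table, of the boilerplate an ORM automates for fetch-by-key and write-back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))

# Fetch 1 row by primary key and display it as a form.
row = conn.execute("SELECT * FROM users WHERE id = ?", (1,)).fetchone()
form = dict(row)  # marshal the row into something the view layer can render

# Write updated fields from the form back into the 1 row.
form["email"] = "ada@new.example.com"
conn.execute("UPDATE users SET name = ?, email = ? WHERE id = ?",
             (form["name"], form["email"], form["id"]))
conn.commit()
```

An ORM collapses all of that into something on the order of "get object, set attribute, commit", which is exactly where the productivity boost in the common case comes from.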
The first time I saw an ORM I was fascinated by this seemingly beautiful idea. But I quickly realized that it's almost useless in real-life projects; plain old SQL seems just much, much better. Now I don't understand why anybody would use an ORM at all.
Also, basic SQL can be easily taught in as little as 10 minutes (I was first taught it in middle school, in an MS Office Query, Access and VBA class). The image of a programmer who can't use SQL (I don't mean the advanced cases, which can indeed be a bit tricky, but those are far beyond the powers of any ORM anyway) seems really bizarre to me.
SQLalchemy is kind of a pain in the ass to actually use, though. The DSL doesn't feel very Pythonic, it's weird and confusing. The way it traverses the Object graph when loading associations between models is magical and opaque and I could never predict when it was going to automatically work and when it wouldn't.
I actually would rather be writing Ruby on Rails, because it's _less magic_ than SQLalchemy.
But my very point was not to make it traverse object graphs at all. You can write nice SQL using it, with selects, joins, functions, etc. All these parts, SQL clauses, are composable, so you can factor out common parts.
Yes, I'm fine working with lists of tuples, not "model objects". Object graphs don't map all that well to the RDBMS model. It's best done on a case-by-case basis, if you care about performance at all.
If you aim to have a Pythonic data layer, I've heard good things about Pony ORM[1]. I mean, you can't get much more Pythonic than the example they have on their website. But I haven't used it myself.
The thing that bit me with SQLAlchemy is joins. You have to structure your SQLAlchemy objects in a specific way to do joins. Usually I write my SQL query, then spend half an hour trying to convert it to SQLAlchemy.
This is a quip that misses the parent's point. SQL isn't very composable, because its syntax requires infix notation, position-dependent phrases, separators, and the like. The abstract syntax tree of SQL is much more valuable than its syntax, which is essentially just a bad English serialization that resembles a natural language sentence.
A DSL which manipulates queries and then produces syntactically valid SQL is tremendously valuable.
Bear in mind that SQL was designed to allow non-technical business users to use it. In most offices, a veteran business user or BI person will know how to use SQL. So SQL is a DSL for relational deconstruction of data models that is conceptually simple enough for people to grasp.
Templating can absolutely save time and boilerplate, but adding a DSL over a DSL in the form of an ORM may well be the problem.
Views are only half the answer. Try factoring out a set of conditions and use them both in a `select` and in `update`.
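One way to get that factoring without views is to keep the condition fragment and its bind parameters together so both statements reuse it. A minimal sketch with stdlib sqlite3 (hypothetical orders table; note the fragment is developer-written constants, never user input, so the interpolation is safe):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders (status, total) VALUES (?, ?)",
                 [("open", 10.0), ("open", 200.0), ("closed", 50.0)])

# The shared condition lives in one place: a SQL fragment plus its parameters.
stale_orders = ("status = ? AND total < ?", ("open", 100.0))

cond, params = stale_orders
# The same condition drives both the SELECT and the UPDATE.
rows = conn.execute(f"SELECT id FROM orders WHERE {cond}", params).fetchall()
conn.execute(f"UPDATE orders SET status = 'expired' WHERE {cond}", params)
conn.commit()
```

Query builders and SQLAlchemy's expression layer do essentially this, but with an AST instead of strings, which also composes across joins and subqueries.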
Also, views tend to pollute the namespace a bit: you want a set of descriptive names, per application, and possibly per application version. Schemata help here, though.
He says he likes SQLAlchemy, but doesn't say why. I'd be interested to know, though.
To me, the big win with SQLAlchemy is that it separates the SQL expression layer from the ORM layer so cleanly. I can write very complicated queries with the expression layer that wouldn't be possible at the ORM layer, and do it in a much more composable way than concatenating SQL strings. Example: http://btubbs.com/postgres-search-with-facets-and-location-a...
I'm very comfortable in SQL, and would prefer to write my queries. But our team has grown, and we've found that developers say they know SQL, but they really don't. In general, I've found that it's not the syntax that messes people up. There's a significant mental "jump" between the usual procedural coding paradigm and the "set based" paradigm offered by SQL. Some people just never catch on.
So we're going to start using an ORM (Entity Framework Core 2) so that the "mere mortal" developers can pitch in. I know there's a way to run raw SQL, so I know that when we find spots where the ORM fails, we can just rewrite it with some good SQL if we decide that's best.
But maybe that'll never happen? We've been trying to do simpler SQL stuff as of late so that the heavier lifting in the system is done by the application/client instead of the database (bottleneck). As the database sees simpler, less unique queries, more stay in the plan cache, indexes are more reliably hit as expected, and performance increases.
Why are we even having this debate? There are some ORMs that bring so much value to the table that you'd be stupid not to use them. Example being Django's ORM or SqlAlchemy.
Also, be specific about which ORM you are comparing to raw SQL. Are you talking about Hibernate or SQLAlchemy? Are you talking about a query builder?
Good ORMs help tremendously with maintainability and security. They also let you drop down to raw SQL when needed.
I don't think rails would have been as popular if you had to use SQL.
I've used SQLalchemy a lot. I've even written a keyset paging extension for SQLalchemy.
But recently I've switched to writing stored procedures and calling them directly, instead of going through an ORM for everything... And it's so much easier.
In the databases I've worked with extensively (PostgreSQL and, somewhat in the past, Oracle) stored procedures are routinely version-controlled as part of the application; these bits of code are just in a different language than the rest.
The creation of functions/procedures is not tied to state of the database in quite the same way as tables are; the functions/procedures, where they care about the data, do need to recognize the table structure and changes to that structure, but that's no different than any other of the application code which makes use of the data in the database.
I think where many people get caught up on this is that they do something like migrations to get code, including procedural code, into the database... but that's not the only game in town. And really, given what's possible with databases today, I'm not sure migrations are even the best way anymore.
Consider a tool like: http://sqitch.org/ which facilitates not treating stored procedures as migrations, but rather as individual files which change just like any other code.
There are ways to accomplish having good version control on the table/structure side as well, which again, is something you lose with the migration tools I've worked with.
Sure you can. SQL is just plain text, so keep all your create table scripts in your repo, along with deploy and rollback scripts that you’d use to extend or migrate your schemas.
Microsoft has SQL Server Data Tools which can be integrated into your favorite SCM (we use VSTS but we'll switch to Git soon), can have automated deploys (which we don't) and in general can be a part of a modern development lifecycle.
Flyway for Java... For stored procedures you use the repeatable syntax that way it checks the checksum of the file and if it doesn't match what is in the migration table it will run it... Easy and always up to date... That's with Java anyway...
There are several such tools, which are specialized and thus tend to perform a better job than orms (who try to do everything)
But I would recommend writing, reviewing and deploying migrations by hand, esp for critical parts of the schema (automatic tools are almost guaranteed to get something wrong, with locking etc)
Yes, and if all your stored procedure is doing is execute a query and return a cursor to the result set, it's using just as much database cpu as with a regular query.
I disagree. ORMs don't bring value; they dilute it and pile abstractions upon abstractions. Your application becomes hard to maintain, your database hard to refactor, your queries slow.
ORMs seem nice for simplified problems but become a horrible mess for real problems, IMHO...
I've worked on some rails apps, and the ORMs caused more problems than they solved...
A few years ago, I worked on an application which would occasionally run into performance issues. Every time the solution was to rewrite the Entity Framework usage into standard SQL. Sometimes it would generate the strangest but somehow legal queries which would be challenging for SQL Server to optimise.
For my next app, I just started using SQL only and never looked back. Simple queries are easily generated with a nice internal API (SELECT * FROM Table WHERE ID = X, etc).
However, more complex selects are all written using hand-written SQL. New developers sometimes find this a bit strange, especially the younger ones, but nobody can fault the system's speed.
Django's ORM is actually a good example of where identity is an issue. It lacks composite primary keys. If I have a dependent table that has my real ID (say, an order number that is a varchar), I have to join to the parent table to do lookups by the order number. I can't do something like key(order_number, line_number), so I end up hydrating a parent object for operations that only require the dependent objects.
Good question. It would depend what kind of rules they were and how they were configured. Wasn't specified that they were "user-configured" in the original statement, though.
The kind of rules we use are boolean expressions that identify which rows the user's groups can read/write/delete. The ORM automatically combines all the rules as a single expression and applies it to the query.
As best I can tell, Postgres lets you do this albeit constraining your boolean checks to things you can efficiently do in the database. But then I can definitely see how this could get unwieldy very quickly.
- Are maintainable by a team. "Oh, because that seemed faster at the time."
- Are unit tested: eventually we end up creating at least structs or objects anyway, and then that needs to be the same everywhere, and then the abstraction is wrong because "everything should just be functional like SQL" until we need to decide what you called "the_initializer2".
- Can make it very easy to create maintainable test fixtures which raise exceptions when the schema has changed but the test data hasn't.
- Prevent SQL injection errors by consistently parametrizing queries and appropriately quoting for the target SQL dialect. (One of the Top 25 most frequent vulnerabilities.) This is especially important because most apps GRANT UPDATE and DELETE, if not CREATE TABLE and DROP TABLE, to the sole app account.
- Make it much easier to port to a new database; or run tests with SQLite. With raw SQL, you need the table schema in your head and either comprehensive test coverage or to review every single query (and the whole function preceding db.execute(str, *params))
- May be the performance bottleneck for certain queries; which you can identify with code profiling and selectively rewrite by hand if adding an index and hinting a join or lazifying a relation aren't feasible with the non-SQLAlchemy ORM that you must use.
- Should provide a way to generate the query at dev or compile-time.
- Should make it easy to DESCRIBE the query plans that code profiling indicates are worth hand-optimizing (learning SQL is sometimes not the same as learning how a particular database plans a query over tables without indexes)
- Make managing db migrations pretty easy.
- SQLAlchemy really is great. SQLAlchemy has eager loading to solve the N+1 query problem. Django is often more than adequate; and has had prefetch_related() to solve the N+1 query problem since 1.4. Both have an easy way to execute raw queries (that all need to be reviewed for migrations). Both are much better at paging without allocating a ton of RAM for objects and object attributes that are irrelevant now.
- Make denormalizing things from a transactional database with referential integrity into JSON really easy; which webapps and APIs very often need to do.
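On the parametrization point in that list: the difference between interpolating user input into SQL text and binding it is one line of code but a whole vulnerability class. A sketch with stdlib sqlite3 (table and input are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO accounts (name) VALUES ('alice')")

user_input = "alice' OR '1'='1"

# Vulnerable: user input becomes part of the SQL text itself,
# so the injected OR clause matches every row.
unsafe = conn.execute(
    f"SELECT id FROM accounts WHERE name = '{user_input}'").fetchall()

# Safe: the driver binds the value; it can never change the query's shape.
safe = conn.execute(
    "SELECT id FROM accounts WHERE name = ?", (user_input,)).fetchall()
```

An ORM or query builder makes the safe form the path of least resistance, which is the real security argument.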
I'd rather learn the ins and outs, problems and issues, highs and lows, of SQL rather than an ORM.
ORMs require just as much investment in time, and even then you still need to learn SQL to get the ORM to do what you want it to do.
SQLAlchemy on Python is a truly fine piece of software but in the end it was much simpler and felt more powerful for me to write the SQL. And not even hard BTW.
I only have a limited amount of time available for learning and if I can trim out an entire class of technology (i.e. the ORM) then that's a whole bunch of stuff I just don't need to spend time learning.
I prefer SQL to ORMs as well for exactly the reason you state. It's more comfortable and it's less of a mystery.
But you don't always get to pick what code you'll be working with and if you are working with others in a web framework pretty good chance you'll be learning an ORM anyway.
Also really long SQL statements are no fun. Like debugging a whole program written in one line. I think sometimes there is a temptation to get excessively clever with SQL.
This is great advice in general for everyone in a role that even slightly touches on ops.
In my day-to-day work, I frequently observe that knowing some SQL (esp. joins, and aggregate functions like SUM/MAX with GROUP BY and HAVING) turns you into some sort of mighty wizard for most people. They're trying to debug a problem in their service and not making progress for hours, and you just walk straight into psql, take a look at the schema, do a few SELECTs, and zoom in on the problem.
Yet nobody seems to consider SQL a valuable skill. I guess it's not buzzwordy enough.
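For what it's worth, the queries that earn that wizard reputation are often just aggregates with a filter on the groups. A small sketch against a hypothetical request-log table, run through stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (service TEXT, latency_ms INTEGER)")
conn.executemany("INSERT INTO requests VALUES (?, ?)", [
    ("auth", 12), ("auth", 15), ("billing", 900), ("billing", 950), ("search", 30),
])

# Which services have a worryingly high average latency?
# GROUP BY forms the groups; HAVING filters whole groups, not rows.
slow = conn.execute("""
    SELECT service, AVG(latency_ms) AS avg_ms, MAX(latency_ms) AS max_ms
    FROM requests
    GROUP BY service
    HAVING AVG(latency_ms) > 100
    ORDER BY avg_ms DESC
""").fetchall()
```

A few of these against the production schema frequently zoom in on a problem faster than hours of log-reading.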
The most cringeworthy thing I heard about ORMs actually happened two weeks ago when I explained our use of a query builder rather than an ORM. The new senior developer was talking about speed (??? uhm… k...) and the benefit of being able to switch between PostgreSQL and... MongoDB. I just cringed, and didn't know what to say. Using the same domain model in an RDBMS as in a document store? I really didn't know how to respond to that.
On the other side you have juniors, just starting development and working with an ORM/ODM the first time. It's not uncommon for them to spend years in development while learning very little SQL at all, hardly understanding the database (apart from what they learn in school). I don't blame them. I hardly know TCP/IP at all and still I write technology for it everyday. But ORMs are a more leaky abstraction; a lot of complexity is made easier by just switching to SQL.
More principal points:
- Why do we want to abstract away further from the most important part of the business: data. Data usually outlives the applications built on top of it.
- Best practice is to keep backend API's stateless and requests short-lived. ORM's promote the use of state. In client-side applications state is a much more interesting long-lived thing, but a backend API these days? It has actually become a lot simpler there since I've started doing development; few modern backend servers render views these days.
I understand why the author (re-)embraced stored procedures but I never will. Blame the overzealous PL/SQL Oracle seniors I've met; god forbid any human bestow such complexity on their colleagues. It's also hard to automate tests for them and promote them through DTAP alongside an application, hence I keep putting all logic in the application.
I can imagine if the language and ecosystem are very strongly geared towards ORM you shouldn't try to do anything else. That's just painful. But in most languages other than Java and C# I'd make a good consideration if you really need/want an ORM (coming from Java, I never took non ORM design seriously there, but perhaps I shouldn't have).
Not sure where I'll stand in a few years time on this… but for now just happy with plain old SQL and query builders.
ORM stands for Object-Relational Mapping (Wikipedia). It is just a way to map domain objects to tables. There is no "promoting the use of state" in that definition. It is your choice to start using state in a (mis)designed manner, but please don't blame ORMs for that.
Sorry, I should have been more precise. Not all ORM's are designed the same, far from it.
Many ORMs are built using the Unit of Work / Data Mapping patterns. Such ORM's map your data into a separate domain model and manage this model for you. If your orm has something like an "EntityManager" it has likely implemented this Unit of Work pattern.
A key thing the Unit of Work achieves is to commit changes to the database in a single transaction. You often need to update multiple records in an atomic way within enterprise software.
You might not be faced with such challenges in a simple app, but in monolithic enterprise software it's a core feature of what a good backend server does.
Active Record-based ORMs or query builders aid you only a little in this task; they expose the transaction-handling logic, so much so that it starts to read like a normal SQL database transaction (and might be merely cumbersome to use at worst). Here you, the programmer, manage it, similar to a normal SQL transaction.
The Unit-of-Work based ORM is more intelligent. It manages the database transaction for you and figures out any changes that were made to the managed entities. In my experience all Java-built enterprise software (I used Hibernate, EclipseLink and Toplink) are designed this way and make heavy use of it. I've used it with Doctrine in PHP quite a lot, and my guess is C#'s Entity Framework is also built around such concepts. That is a big slice of the ORM market.
Here is where the state comes in; different parts of the applications contribute to creating a single database transaction until flush-time. You as a programmer should know when "flush time" actually happens and understand which entities were marked dirty. That is a lot of hidden state that is managed for you; it is in fact the core of what such ORM's do; managing state until it's ready to be flushed. To make it really advanced, powerful ORMs (the popular enterprisey ones) do a lot of caching too, at different times and at different scopes.
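As a toy illustration of that hidden state (this is not how Hibernate or Doctrine are implemented, just the shape of the pattern): changes accumulate in memory, and nothing reaches the database until flush time.

```python
class UnitOfWork:
    """Collects changes in memory; nothing reaches the database until flush()."""
    def __init__(self):
        self.dirty = []       # entities modified since the last flush
        self.statements = []  # stands in for the real SQL sent at flush time

    def register_dirty(self, entity):
        if entity not in self.dirty:
            self.dirty.append(entity)

    def flush(self):
        # One transaction: every pending change is written together.
        for entity in self.dirty:
            self.statements.append(
                f"UPDATE {entity['table']} SET ... WHERE id = {entity['id']}")
        self.dirty.clear()

uow = UnitOfWork()
order = {"table": "orders", "id": 1}
customer = {"table": "customers", "id": 7}

# Different parts of the application touch different entities...
uow.register_dirty(order)
uow.register_dirty(customer)
# ...and none of it hits the database until flush time.
assert uow.statements == []
uow.flush()
```

Everything between register_dirty and flush is the invisible state the parent describes; real implementations add identity maps, dirty checking by snapshot comparison, and caching on top.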
When grappling with such tools, the distance to normal SQL becomes very large. I think that is where quite a bit of the hate comes from. It has become very powerful magic.
I've been in places where I had to really understand how this magic works to solve serious performance issues with it. I learned a lot, solving problems that shouldn't have existed in the first place.
I don't like magic. It makes me hide in the corner and cry a little.
My comment was targeted towards the UoW / Data Mapper stuff and much less so to Active Record ORM's.
To me, it hinges on what you mean by "a collection of objects." If it is just a list C-like structs, that's great. If it involves lazy-loaded child collections, inheritance hierarchies, or any kind of behavior at all, that makes me worried.
Yeah, I agree. Lazy loading in ORMs is what happens when people want OO databases. And by OO databases, I mean everything acting like it's in an in-memory collection.
Sounds nice, but in reality, latencies, networks, massive data sets, atomicity of operations, and a whole host of other annoyances get in the way.
... so it ends up being simpler and easier in the long run to be very explicit about your interactions with data stores. Took me a long time to get to this point.
I do agree every one who needs to get data from an SQL database should know SQL, and mostly, should know about indexes and how and why queries can be slow.
But a query builder is extremely useful, and in any complex application, if you don't use one, you end up building one yourself, which may not be a very good idea if you don't understand things like query injection.
So learn SQL, learn ORM, and choose in a case by case basis.
I wish there was a commonly agreed upon name for "query builder" that isn't "ORM". Actually mapping relational data to objects often rubs people the wrong way, for good reasons, but the query-building layer underneath is pretty universally useful.
But in general I agree with you: Why just learn SQL? Learn all the layers!
You can write more efficient C(++) if you know assembly, even if you don't write anything in assembly.
You can write more efficient python if you know C(++) and the various tradeoffs between data structures, even if you don't write any C++ and don't do any data structure by hand in the current project.
...
You can use an ORM more efficiently if you know SQL. Same thing isn't it?
That assumes the ORM doesn't completely get in the way, of course.
Like static vs dynamic typing and frameworks vs libraries this is an endless debate that rarely sheds much light on the topic.
The people who use ORMs daily are likely to favour them and those that think they are the devil's work are unlikely to have intimate experience of a range of different ORMs.
It's "Active Record" style ORMs like Hibernate that are the culprit, along with the way many developers utilize them to avoid any contact with the realities of RDBMSs, which leads to data access antipatterns and poor performance (multiple needless queries per request, etc.).
Another thing people need to really give up on is the pipe dream of switching databases -- you're not going to do it. I've never seen a single case of people actually using an ORM to change the RDBMS they store data in.
Not all ORMs are like that, and the best solutions are ones like Knex/Objection, where the ORM (Objection in this case) is a nice abstraction for single-object access/writing and the underlying SQL builder (Knex) is fully exposed and used for everything else.
I think it's misleading to think about switching databases on an ongoing application as the use case for database independence. More important use cases:
1. Using SQLite for unit testing and a real RDBMS for integration testing, acceptance testing, and production.
2. When you are writing a library that will be used by different projects, not all within a single organization (e.g., a Free Software project).
3. When you are providing a product that the end-user may want to use with different choices of database (e.g. forum software, Nextcloud, etc.)
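Use case 1 is especially cheap to set up. A sketch with Python's stdlib sqlite3 and unittest (schema and test are hypothetical): each test gets a fresh in-memory database, and no server is required.

```python
import sqlite3
import unittest

def create_schema(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

class UserRepositoryTest(unittest.TestCase):
    def setUp(self):
        # A fresh, throwaway database for every test; no server required.
        self.conn = sqlite3.connect(":memory:")
        create_schema(self.conn)

    def test_insert_and_fetch(self):
        self.conn.execute("INSERT INTO users (name) VALUES ('ada')")
        name = self.conn.execute(
            "SELECT name FROM users WHERE id = 1").fetchone()[0]
        self.assertEqual(name, "ada")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserRepositoryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The caveat is that SQLite's dialect differs from the production RDBMS, which is why this works far better when an ORM or query builder is papering over the dialect differences for you.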
I'd agree switching a current project from one DB server to a different one is a pipe dream, but the couple of use cases that are related but incredibly handy are switching the DB driver mid-project for any number of reasons, or using a new and unfamiliar database in a new project.
I've found ORMs (such as Entity Framework) great for operations that can be described as "find a single thing by PK and update it." For read operations I've favored this strategy: "Imagine the ideal result set for the task at hand. Use SQL to deliver that result set. Do the rest of the work in your app language of choice."
Edit: I guess I should clarify that I would favor using any library that maps result sets to lists of objects. And I would consider that to be part of "do the rest of the work in your app language of choice."
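That strategy can be sketched in a few lines: let SQL produce the ideal result set, then do the row-to-object mapping in the application. A stdlib sketch with hypothetical names:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class OrderSummary:
    customer: str
    order_count: int

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany("INSERT INTO orders (customer) VALUES (?)",
                 [("ada",), ("ada",), ("grace",)])

# Let SQL deliver the ideal result set for the task at hand...
rows = conn.execute("""
    SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer
""").fetchall()

# ...then map rows to objects in the app language of choice.
summaries = [OrderSummary(customer, count) for customer, count in rows]
```

Any thin row-mapper library does the same thing; the point is that the shape of the query drives the shape of the objects, not the other way around.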
Yeah, EF is great for CRUD. The way I distinguish whether EF is going to be used or not is simply whether the workload is OLTP or OLAP. At OLTP EF excels. It's terrible at OLAP (ORMs generally are), so I'll drop to raw ADO.NET and (if lots of data has to go into SQL Server) table-valued parameters.
Regarding the article's point about managing your schema, there are only two real options IMHO:
1) "The database schema is the official definition."
Programmatically generate whatever ORM objects (at build time) in a 1-to-1 fashion from a schema dump. This is the approach DKOs use: https://github.com/keredson/DKO As long as the code generation step is done as part of the build process, you'll have none of the normal code generation headaches, and your build will fail if you've made a code-incompatible schema change.
2) "Your ORM objects are the official definition."
And generate the schema definition automatically. The common problem here is that most "generate schema" functions are stupidly lazy and drop the work of calculating the diff from an existing schema on the developer (forcing them to write migrations). This is unacceptable in my eyes, just as it would be if my version control software wanted me to write my own diffs by hand in order to make a commit. I strongly prefer automatically generated diffs, like in https://github.com/keredson/peewee-db-evolve, so you can do non-destructive schema changes. It's a model I've re-implemented for any new ORM I wind up using.
In my current project (work) I've chosen to use JDBI for communication between model and database. For me it is the perfect level of abstraction whereby it still feels like I'm just writing SQL queries but also avoiding a lot of the boilerplate code that comes with having to manually manage DB connections.
I still have to write the actual code that maps the result set to objects, but I've found this is a very small tax to pay, especially with Kotlin's data classes.
I used an ORM in a previous job (RoR ActiveRecord). I didn't find it an altogether horrible experience, but there were many cases where we would get these 'leaky abstractions' in the wrong direction, from the model -> database schema. There were also a lot of cases where we would realise that some ActiveRecord query we were doing was unintentionally loading entire tables into memory (our fault, not ActiveRecord's fault). This was usually remedied quite easily by just RTFMing the docs, but my feeling is we could have avoided it in the first place if we'd used a lower-level strategy.
Personally, when developing a new feature I always like to start by thinking about the database representation first and then working my way up. I think having this sensitivity to how it should be represented in the database can allow you to avoid many of the pitfalls that may come with using a more opaque ORM.
Another valuable attribute I get from using JDBI is that it's dead easy to mock (unlike heavyweight ORMs), so unit testing stuff that interacts with it is super simple.
We took a tack similar to how PostgREST and PostGraphQL are structured. We use views in the public schema to build our objects. Functions constrain our mutations. Triggers respond to events and maintain consistency.
It makes our web API code simple and hard to introduce errors that invalidate our customers’ data.
Don’t miss having an ORM. Always seemed like more abstraction and complication than was necessary given recent advances in servers like Postgres.
Every extra abstraction layer comes at the cost of complexity, maintenance, and capability, and therefore inherits the burden of proving its worth. ORMs frequently claim to be less tedious than SQL. Failure to deliver on that promise is a reason to cut them out.
A lot of the reasons listed in this article actually made me shift from sophisticated ORMs like Hibernate back to a query based approach. A really nice framework for this (in Java) is jOOQ which gives you the possibility to write typesafe SQL via code generation.
https://www.jooq.org/
(I'm not at all affiliated with jOOQ - just a happy user)
Another happy jOOQ user here! I'm ditching Hibernate wherever I can in favour of it. And using it with Kotlin instead of Java makes it almost a form of poetry.. I've seriously been thinking about writing a blog post, something along the lines of "jOOQ: how I learned to love the database again"
I've been thinking about what makes jOOQ so good and a huge part of it is brilliant engineering: SQL clauses are mapped almost 1:1 into reasonable and understandable code while the author spends huge effort to cover new features as databases introduce them without turning his product into a mess or introducing API breaks in every major new version. That's hard.. but awesome! :)
> But the damn migration issue is a real kick in the teeth: changing the model is no big deal in the application, but a real pain in the database. After all, databases are persistent whereas application data is not. ORMs simply get in the way here because they don't help manage data migration at all.
If your ORM doesn't help you manage data migration, get a better one. Any ORM worth its salt should let you generate migrations; at least Django and Hibernate (+ Liquibase) can. I find the key to making everything work nicely is to let the definition in the application/ORM be the source of truth for what the DDL looks like. If you want a particular SQL table layout, figure out how to tell the ORM to generate it. If you try to retrofit an ORM onto an existing table schema (which it seems is this author's preferred approach), you're in for a world of pain.
> These two things don't really get along because you can really only use database identifiers in the database (the ultimate destination of the data you're working with).
> What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier.
Use UUIDs. Generate them in the application, but use them directly as identifiers (pkeys) in the database.
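A sketch of that approach (hypothetical tables): because the application mints the UUID, the identifier is known before any INSERT or flush, so parent and child rows can be written in one batch with no round trip to ask the database which id it assigned.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE invoice_lines (invoice_id TEXT, amount REAL)")

# The application mints the identifier; no flush or partial commit is
# needed to learn it.
invoice_id = str(uuid.uuid4())
conn.execute("INSERT INTO invoices (id, total) VALUES (?, ?)", (invoice_id, 99.0))

# Child rows can reference the parent immediately, in the same batch.
conn.execute("INSERT INTO invoice_lines VALUES (?, ?)", (invoice_id, 99.0))
conn.commit()
```

The trade-off is index size and locality versus sequential integers, which is worth weighing per table.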
> Something that Neward alludes to is the need for developers to handle transactions. Transactions are dynamically scoped, which is a powerful but mostly neglected concept in programming languages due to the confusion they cause if overused. This leads to a lot of boilerplate code with exception handlers and a careful consideration of where transaction boundaries should occur. It also makes you pass session objects around to any function/method that might have to communicate with the database.
> The concept of a transaction translates poorly to applications due to their reliance on context based on time. As mentioned, dynamic scoping is one way to use this in a program, but it is at odds with lexical scoping, the dominant paradigm. Thus, you must take great care to know about the "when" of a transaction when writing code that works with databases and can make modularity tricky ("Here's a useful function that will only work in certain contexts").
Use a monad to represent "this function has to happen in a transaction", then all those problems go away.
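Python has no monads, but a context manager captures much of the same idea: the transaction boundary becomes a visible scope that commits on success and rolls back on any exception. A sketch with stdlib sqlite3 and hypothetical accounts:

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts (balance) VALUES (?)", [(100.0,), (0.0,)])
conn.commit()

@contextmanager
def transaction(conn):
    """Make the transaction boundary a lexical scope: commit on success,
    roll back on any exception."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

def transfer(conn, src, dst, amount):
    # "This function only works inside a transaction" -- and that
    # requirement is now visible at every call site.
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                 (amount, src))
    if amount > 100:
        raise ValueError("over limit")
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                 (amount, dst))

with transaction(conn):
    transfer(conn, 1, 2, 40.0)

try:
    with transaction(conn):
        transfer(conn, 1, 2, 500.0)  # fails midway; the debit is rolled back
except ValueError:
    pass
```

It isn't a type-level guarantee the way a monadic effect is, but it does move the "when" of the transaction from dynamic context into something you can see in the code.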
An ORM is both a more native way to represent data rows in the application and a convenient wrapper to automate much of dealing with SQL and result sets. But only a fool would wish to see an ORM as an object-based abstraction over SQL.
Dynamic languages such as Lisp and Python can often make do without a specific ORM layer because it's easy to stash data into lists and dictionaries. If you can read rows from your result set into a list, make dicts or structs out of each row, and pass around that list or its entries to various functions, you've just implicitly implemented an LRM: lispy relational mapping. But sometimes it just fits to create objects out of rows.
Simple wrappers will do for an ORM: the best kind always makes it clear that you're just using machinery to operate an SQL database, not updating an object and then "saving" it back to disk at the end.
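A sketch of that "LRM" style with stdlib sqlite3 (table and data invented): `sqlite3.Row` already gets you most of the way to plain dicts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become dict-like, keyed by column name
conn.execute("CREATE TABLE tags (name TEXT, count INTEGER)")
conn.executemany("INSERT INTO tags VALUES (?, ?)", [("sql", 3), ("orm", 5)])

# "Lispy relational mapping": plain dicts, no mapper classes involved.
rows = [dict(r) for r in conn.execute("SELECT name, count FROM tags")]
```

From here, `rows` can be passed to any function that understands dicts; no class hierarchy required.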
I really enjoy seeing this debate because I hope more anti-ORMers unite and we see less use of ORMs in future projects. There are other disadvantages of ORMs that I haven't seen mentioned much:
1) Low-level performance. Even if (and it's a big if) you manage to get your ORM to generate somewhere near the optimal query, ORMs in my experience are always significantly slower than hand-written SQL. When I last benchmarked, I couldn't get Hibernate any better than 4x as slow as raw JDBC, and keep in mind that's pure CPU overhead.
Think your service is I/O bound? It's probably not, and it's probably your ORM to blame. This may be less of an issue for a dynamic language like Python, but I see it as a much bigger issue for Java/C# and friends.
2) Debugging/understandability. Did you know that Hibernate maintains a cache of every object you load in a session until you flush it? I didn't, until we had an outage because our service OOM'ed while loading too much data without flushing.
Do you know how exactly your ORM is loading and saving data and when? Depending on your use of the various lazy-loading and storing features of your ORM, it can be very difficult to reason about when and how your ORM is talking to your database.
Do you know how your ORM is integrating with your cache, which is likely memcached? Why is your ORM integrating into itself the concept of a cache in the first place? In my experience, hibernate gets caching wrong, and that's not entirely its fault. It's difficult to get caching right in the general case. But I would rather be forced to think about caching up front and get it right for my use case rather than try to understand how Hibernate is doing it and working around its mistakes and limitations.
The common theme is that the use of an ORM makes it incredibly more difficult to understand, reason about, and debug your application rather than using a simpler library. In my experience, this alone makes an ORM not pull its weight.
Using an ORM saves you from having to implement a lot of code. But you still have to understand how everything works. ORMs make it look easy, but there is a lot of magic involved that you need to understand sooner or later.
This type of article has come up countless times before. Why not both? Using both, you let the ORM handle the regular, boring CRUD, validation, and repeatable exercises, and write the tricky joins, aggregates, and function calls that ORMs don't do well in SQL. Every decent ORM supports dropping to SQL.
Ah yes - the proverbial "ORMs are bad, just learn SQL" post. This is analogous to saying "don't use frameworks". Sound ridiculous? Yes, yes it is.
The law of leaky abstractions applies to many, many things, ORMs included. I would also argue they apply in different degrees, usually related to the design of the ORM (the post mentions SQLAlchemy vs. Hibernate, for example).
But consider the following:
1) Why do people still use ORMs? Exactly.
2) Question 1 but s/ORM/framework_or_widely-used-library
3) ORMs allow you to develop faster, and deliver value
4) Beginners already have a hard time coding, designing, and understanding what they're doing. ORMs provide a nice abstraction over underlying data stores
5) Although fraught with peril, ORMs provide a common interface that'd give _some_ help if you switch data stores
6) Multi-line SQL statements are a huge pain in some languages
In response to (4). I'd posit that beginners have a hard time with this stuff because they use too many leaky abstractions. Like the article says, if you want to use an ORM for anything non-trivial, you need to learn both the ORM and SQL, which is inherently more difficult than just learning SQL.
Providing tons of abstractions to beginners so they can build something in 10 lines of code doesn't help them get better. All it does is encourage them to learn top-down when it's often much easier to learn something from the bottom-up.
> This is analogous to saying "don't use frameworks". Sound ridiculous? Yes, yes it is.
When developing API's in Golang or for microservice / serverless architectures not using a framework might actually make a lot of sense. Also microframeworks (trimmed-down versions compared to opinionated frameworks) are very popular in almost any language.
I'd say that Golang rather comes with a (simple) framework in the standard library, so you don't need a third-party one. After all, http.Handle/HandleFunc clearly follow the Hollywood Principle: http://wiki.c2.com/?HollywoodPrinciple
> ORMs are bad, because objects aren't particularly useful to deal with most data.
Objects are containers of data; that's a crazy statement to make. The relational structure maps pretty cleanly to objects, properties, and collections.
However, objects are not good for reporting. And the author mentioned doing 14 joins and hundreds of columns - that smells like a reporting query.
But why do you need these containers of data, when you could have more direct access to both the data itself and the whole set (not individual objects, nor collections of objects)
I don't like ORMs, but I did struggle for some years trying to use them, which IMHO was a detour. SQL and stored procedures in plpgsql are so much better: easier to maintain, easier to reason about, etc.
I personally would write a simple plpgsql stored procedure, but I would not trust user input and would only allow a defined set of columns from a defined set of tables.
You can have lots of dynamic SQL, but that might become a rabbit hole, just as with an ORM. It sounds like a problem you shouldn't have; throwing an ORM at such a problem... might lead to even more strange issues down the road...
> I personally would write a simple plpgsql stored procedure, but I would not trust user input and would only allow a defined set of columns from a defined set of tables.
Sure, you can limit it to a single table and to a certain set of columns e.g. A B C D E. Can you give me an example of a simple plpgsql stored procedure to do this?
> You can have lots of dynamic SQL, but that might become a rabbit hole, just as with an ORM
That's not my experience. In a language with good introspection and/or where most entities are first-class, you can do this rather easily. In Python that would take two or three short lines.
> It sounds like a problem you shouldn't have
This is a bit of a cop-out :) allowing the generation of simple reports configured by the user is a typical need for us.
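One way the "two or three short lines" claim above could look in Python, with stdlib sqlite3 and an invented `report` table: user-supplied column names are filtered against a whitelist before they ever touch the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("INSERT INTO report VALUES (1, 2, 3)")

ALLOWED = {"a", "b", "c"}  # the defined set of columns users may request

def report_query(conn, requested):
    # Only whitelisted names survive; anything else is silently dropped.
    cols = [c for c in requested if c in ALLOWED]
    if not cols:
        raise ValueError("no valid columns requested")
    # Column names are interpolated only after the whitelist check;
    # actual values would still go through ? placeholders.
    return conn.execute(f"SELECT {', '.join(cols)} FROM report").fetchall()

rows = report_query(conn, ["b", "c", "evil; DROP TABLE report"])
```

The hostile "column" never reaches the query, and the surface area stays a fixed set of names rather than arbitrary SQL.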
The relational structure maps cleanly to badly designed objects, where most classes have 5-10 fields and most fields are opaque data values with setters and getters. If you wrote classes like that in isolation, you'd have to do some serious work in code review, explaining why a hodgepodge collection of properties is a single unit of responsibility. But I've never worked in an ORM-using system where they weren't endemic.
I don't buy into the idea that those kinds of classes are just unqualified bad design. The majority of applications exist merely to store information entered by users. Using a class to hold that data in a structured and type-checked way is perfectly reasonable.
And they aren't a hodgepodge collection of properties; they're entities that represent a single item in a system whether that be a person, a widget, an order, or a product.
> The relational structure maps cleanly to badly designed objects
That only happens if you are exposing base tables to the application instead of appropriate views, which is as much a violation of good RDBMS design principles as what you describe is a violation of good OO design.
I have had exactly one positive experience with ORMs: NHibernate + Fluent NHibernate config with .NET's IQueryable expression interface. That is the ONLY time I have used an ORM and felt like I had enough control to do what I needed for edge cases and still got a perceived productivity boost; granted, there was what seemed like a large learning curve at the time. That said, I much prefer to just write queries myself and use an object mapper (Dapper, etc.) to eliminate a lot of the boilerplate.
Maybe there are some amazing ORMs for other stacks. But my time with .net was the only time I heavily utilized sql DBs.
Many posts say that an ORM is a good idea for the type safety / vulnerability / composability / portability aspects. But that's actually independent of the object-mapping aspects - e.g. web2py provides all of the above in its DAL without the object mapping. It does sort of abstract over SQL variants with Python query syntax, but no objects are involved.
Nim’s Ormin is not yet fully functional, but it provides the same in a static language with straight-up SQL.
My experience is that the O aspect of ORM is where it doesn’t help (and often gets in the way); if that’s your experience too, consider DAL and Ormin.
This article is valuable since it's raising some interesting pitfalls it's good to know and avoid.
That being said: you need to use the right tool for the job.
It's just hilarious how people expect the new "foo framework/paradigm" to solve ALL the problems... jeez! It's nice to know and understand new paradigms, but you really need to evaluate your case.
ORMs took away the complexity of 90% of web apps. All the "Model X has many model Y". If you step outside of that realm with the ORM then you're officially "fighting the framework", and bad things will happen.
There are ORMs that address partial records, multi-threading troubles brought by lazy loading, uniquing, auto-updating records, and, importantly, put raw SQL on the same level as the query builder. I think of [Diesel](http://diesel.rs) and [GRDB.swift](https://github.com/groue/GRDB.swift/blob/master/Documentatio...).
> ...in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.
This is presuming that you will never come across ORM being used almost exclusively in any future projects. After all, ORM doesn't seem to be going anywhere (even with all the hate against it). One could make an argument that learning both effectively would provide a better general foundation.
The problem with ORMs is that they're an attempt to hack relational models into languages that lack them. The thing nobody has ever done is add relations as first class citizens to a language. Instead we have only maps, arrays, and lists in our languages and we have to shoehorn richer data into that.
That's why hierarchical document databases have had a resurgence. At least they match the data model of the language.
LINQ isn't an ORM. I assume you mean Entity Framework?
EF definitely has the foreign key issue. We have around a thousand tables, we tried to generate the classes for all of them including foreign keys, problem is that when you create a context that references even only a single table, it will load everything that is foreign keyed including siblings of siblings of siblings, until you run out of memory.
Only way around it is to not set up foreign keys which massively diminishes the value of using EF, so we wound up creating two copies of each table's classes, one with and one without foreign keys. That causes its own issues.
Sounds like you disabled lazy loading and that forced everything to be loaded at once. That's a user error, not an EF fault.
This is a common theme I've seen with people blaming ORMs for being slow: it's the devs not using them appropriately more than the ORMs themselves. Not to say that they don't have their own issues.
With or without lazy loading enabled the result was the same. Generating the structure took seconds and went OOM with enough tables.
LazyLoading impacts what data is retrieved from the database (or more to the point when), this is a structural issue before a query was even sent to the database. It would die while generating the query, not sending the query or populating the result.
You likely should have asked for more information before concluding it was "user error."
> LazyLoading impacts what data is retrieved from the database (or more to the point when)
It impacts more than just that; it will impact query generation as well. If you have a property that is not lazy-loaded, then it will attempt to join the relation or load it via another query in the same round trip. Turning it off tells the ORM that every single time you want A, it needs to go and get B as well. If you have it off universally, it will attempt to load the entire database, or, as may be the case here, crash while trying to generate a query to do so.
> You likely should have asked for more information before concluding it was "user error."
Perhaps, but you've got multiple conflicting accounts of what went wrong, some comments indicate that it returned data, others say it never touched the database.
I have been working with EF for years now. It has its quirks, but this is not something that I have ever experienced, nor have I heard of anything like it before today.
What I have heard of is traversing the entire graph and causing cascading loading of navigational properties. Yes, I have done that. Something like AutoMapper will do that to you, if you are not careful. Been there and done that.
How did you determine that it OOMed while generating the query?
You were doing it wrong then. EF does not eagerly load child collections. You cannot even configure it to do so.
As someone else suggested, you probably had lazy loading enabled, and some of your code tried - e.g. through reflection - to get all properties.
EF has some of its own issues, but you can most certainly create composite primary keys, composite foreign keys, and work with projections right from within the code.
None of the issues the original author had with SQLAlchemy and Hibernate are really a pain in EF.
Some prefer to switch off lazy loading in EF and instead either explicit eagerly load specific child collections or explicitly load them right before use.
In EF you can do
    var customer = Customers.Include(c => c.Orders).Single(c => c.CustomerNo == "1234");
This will load Customer '1234' with the Orders collection eagerly loaded.
Were you doing serialization or something like that that recursively accessed all of the properties in the model? If so, and if lazy loading was turned on (the unfortunate default) then the serializer would indeed keep exploring the object graph until it either loaded your whole database or hit OOM. That's about the only scenario I can think of that would cause such an issue - EF doesn't eagerly load related entities unless you explicitly tell it to.
Sounds like they're actually referencing some of the well-known query-generation issues in EF, though giant queries and long generation times are more common manifestations than OOM, which seems to be an extremely pathological case.
Essentially yes. It created giant object map with millions of objects in it and hit a memory limit (rightfully so, it wouldn't have been usable anyway). It would start at the initial table, get the siblings, then the siblings of the siblings, and so on until it was trying to generate an object for every property of almost every table we had (even if we only referenced the original table).
Some people, when faced with a database problem, choose to use an ORM on top of an RDBMS. Those people now have three problems.
The problems are: the original problem, the expressive and performance disaster that is every ORM ever, and the layered and hidden RDBMS whose peculiarities nonetheless always find a way to leak through the ORM abstraction.
I've been using Django + the Django ORM for a decade, and it has covered the vast majority of database usage: entire applications written with zero SQL, with clean, DRY code describing the data and allowing you to use it in code. And yet other applications written with lots of SQL.
The real question is: why would anyone use an ORM without learning SQL, i.e. how the thing they are mapping to objects actually works?
This topic has been done to death and the rate at which comments have been made since this was posted tells me that we either haven’t learned much of anything as a collective, or we just like to rehash the same talking points because... we can.
Know how and when to use ORMs. Probably learn SQL first. Don’t believe that any shiny bullet is silver.
It's not my quote, but it goes something like: in order to use a layer of abstraction effectively, you need to know one layer deeper.
Honestly what are you doing using an orm without knowing sql? The point isn't to hide sql, it's to automate repetitive tasks. Who goes to learn Angular.js without first knowing JavaScript or html?
What's the current state of object-oriented databases? Any progress since good old Gemstone? Why was Gemstone unsuccessful? Cost? Performance? Tie-in with Smalltalk? Is there any promising OODB being used these days? Because now that would take care of the OO-relational impedance problem.
Yup, take those Repository designs, where you can’t join on two tables unless the stars align very specifically. Of course there are very good technical reasons for that, but isn’t this because the abstractions are leaking onto each other? What’s the added value of the extra layer then?
It's not about avoiding writing SQL; it's about having a standardized API on which all your architecture can count.
Why do you think Django was so successful?
Because it was built in a way that allowed a rich and powerful ecosystem to flourish.
The Django ORM is not the best out there and it's doing plenty of silly things. If you don't know SQL and you use it you will be in a world of pain.
However.
Because Django features this ORM it can:
- provide auto-generated forms from db model, outputting HTML and validating user inputs, saving changes automatically to the DB.
- provide auto-generated CRUD views from the db model, that you can extend at will.
- provide auto-generated admin
- provide tookits and helpers to deal with your data: signals, various forms of getters, native object casting, advanced validation, better error messages...
- provide entry points for extending the data manipulation API, in a generic way (fields, managers, etc)
- provide tooling for migrations
- provide auth and permissions
- provide user input cleaning and escaping
- automatically deals with value normalization: encoding, timezones, text/number formats, currencies... There is one entry point for those where you can put custom code, and you don't need custom code most of the time since somebody did the work for you more often than not.
- ensure all Django projects look the same, so that it's very easy to move from team to team or train people
- formalize the schema, which becomes the documentation and single source of truth for your data, committed to your VCS. Wanna know what a Django project is all about? Check urls.py, settings.py and models.py. Done.
The cherry on this cake is of course the fact that 3rd-party modules (so-called Django "apps") can leverage that, which leads to the amazing ecosystem Django has:
- auto-generate REST views from the model (e.g. django-rest-framework), again tweakable as much as you want.
- dozens of auth backends.
- data manipulation (workflow, filtering, dashboard, analytics) that just work.
- tags, search, comments, registration and all that stuff you always rewrite otherwise.
And because they all use the ORM, they are all compatible with each other. And they all work on MySQL, Oracle, SQLite and Postgres, like the entire rest of the framework, out of the box, for free.
You want to do that in any other framework (except RoR)? You'll get a lib that does half of it and leaves the persistence and API integration work to you. And it will not play well with others. And it will be integrated differently on another project. If you have a lib at all! Oh, and you have to use the proper DB. If you are corporate or a startup, it won't be the same one, and you'd better hope the lib author is in your shoes.
All that stuff is easy in Django because you have a centralized, easy to inspect, standard, shareable definition of each of your model in one place.
That's what ORMs are for. Not "doh, SQL is hard".
Now you could get some of those benefits by creating central models using schemas untied to the DB, such as marshmallow. It would be an interesting take, but my guess is that you will end up interfacing it with your DB through some kind of layer that would look like an ORM anyway.
Disagree on joins and foreign keys: normalized tables make for smaller tables in my experience. Sure, you might denormalize for reporting, but it’s best to normalize and then denormalize as needed, not avoid foreign keys because you have to write a large number of joins.
"I should have learned SQL before dealing with ORMs."
Every single point the author brings up seems to come down to simple database design, lazy development, or not understanding their tools. They really don't seem to have anything to do with ORMs or query languages.
Any screwdriver can make for a bad hammer and some screws may go in with a large enough mallet.
> Perhaps the most subversive issue I've had with ORMs is "attribute creep" or "wide tables"
Normalization of data is required whether you're using a query language or an ORM to access it. The fact that ORMs make it easy to "hide" the fact that you've added 500 columns to a table isn't the ORM's fault.
> Knowing how to write SQL becomes even more important when you attempt to actually write queries using an ORM. This is especially important when efficiency is a concern.
What do they think the ORMs are doing? Magical incantations over the disks? The ORMs are just using queries too. You can write really horrifically bad queries in a query language and also abuse ORMs, but that doesn't make either one bad. Most ORMs can let you see precisely the SQL they are creating. If not, the database will surely log the queries for you and let you know what's going on.
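For instance, with stdlib sqlite3 you can watch every statement that actually reaches the database, whatever layer generated it (a sketch; the table is invented):

```python
import sqlite3

executed = []
conn = sqlite3.connect(":memory:")
conn.set_trace_callback(executed.append)  # log every statement actually run

conn.execute("CREATE TABLE users (id INTEGER)")
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("SELECT id FROM users").fetchall()

# `executed` now holds the exact SQL text the database saw, whether it came
# from raw calls, a query builder, or an ORM sitting on this connection.
```

Server-side query logs (e.g. Postgres's `log_statement`) give you the same visibility at the database end.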
> The problem is that you end up having a data definition in two places: the database and your application.
Welcome to the fact that we have multi-layered technology? There's always going to be discrepancies between the layers that have to be ironed out because no data designs are perfect or future-proof. The author then attempts to bring migrations into the picture as if database migrations are somehow just not a problem if you aren't using ORMs (hint: database migrations have always been tough even in very well-design systems).
> Dealing with entity identities is one of those things that you have to keep in mind at all times when working with ORMs, forcing you to write for two systems while only have the expressivity of one. What this results in is having to manipulate the ORM to get a database identifier by manually flushing the cache or doing a partial commit to get the actual database identifier.
Sounds like a pretty frustrating example, but I've worked with at least 10 different ORMs I can think of off the top of my head, and not a single one of them required "manually flushing a cache" or a "partial commit" to "get the actual database identifier." I wouldn't write this up as an issue with ORMs, or claim this problem would be magically fixed by only writing SQL either.
> Transactions. Something that Neward alludes to is the need for developers to handle transactions. Transactions are dynamically scoped, which is a powerful but mostly neglected concept in programming languages due to the confusion they cause if overused.
Transactions are pretty straightforward and I cannot agree with: "The concept of a transaction translates poorly to applications due to their reliance on context based on time."
Transactions don't care about time at all. They care about order and making sure that things are completed in a certain series of steps. This actually translates very well to applications, especially when you have processes that take a long time, where you don't want something to happen unless another thing happens first.
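A small stdlib sqlite3 sketch of that all-or-nothing ordering (the mid-sequence failure is simulated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE steps (n INTEGER)")
conn.commit()

try:
    conn.execute("INSERT INTO steps VALUES (1)")     # first step succeeds
    raise RuntimeError("second step failed")          # simulate a failure
except RuntimeError:
    conn.rollback()  # the first step is undone: the sequence is a unit

count = conn.execute("SELECT COUNT(*) FROM steps").fetchone()[0]
```

Nothing here depends on wall-clock time, only on whether the whole ordered sequence ran to completion.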
While a decently-written article, this comes across as someone who learned about ORMs more deeply than databases, discovered the flaws that ORMs have, and decided that query languages must be the only way forward.
This ignores the fact that we created and adopted ORMs after struggling through years of rigid queries smattered throughout code.
Writing bare queries has a time and a place, but ORMs have saved countless hours of development time, and allowed for vastly improved longevity of code.
yes yes yes!
the point is NOT "learn sql"
you already know sql, FINE!
the point is: use sql, not orms!
migrations are best done in pure sql,
i'll contend that model-inflation is best done in pure sql also.
another thing: if the format is json, that's already a nested "joined" blob of usable data! it's what the end result of a sql join would achieve, the client often just has to drill into that data blob and everything needed for the entity in question is already there!
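a sketch of that in python with stdlib sqlite3 (tables invented): one join, nested into the blob the client actually wants:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann');
    INSERT INTO posts VALUES (1, 'first'), (1, 'second');
""")

# One SQL join, folded into the nested shape the client drills into:
blob = {}
for name, title in conn.execute(
    "SELECT a.name, p.title FROM authors a "
    "JOIN posts p ON p.author_id = a.id ORDER BY p.rowid"
):
    blob.setdefault(name, []).append(title)

payload = json.dumps(blob)  # {"ann": ["first", "second"]}
```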
If you are working with Node.js and PostgreSQL, give the tabel ORM (https://github.com/fractaltech/tabel) a shot. It is an unconventional ORM that works with pure JS objects and arrays, instead of "Model" classes, "Collection" classes, and what not. It has quite a few other nifty features too.
How does the ORM detract from SQLAlchemy? I’ve been very happy with it for years, and I’m not clear on where the advantage is in explicitly not using mappers.
You can do both. My preference is to use "Micro ORMs", which provide a thin, lightweight, typed, RDBMS-agnostic API around the most popular CRUD operations but also let you execute custom parameterized SQL when you need to run more complex queries, while still using their fast mapping to populate clean POCOs/POJOs, taking away the tedium of manually extracting the results into a more manageable form.
.NET has particularly nice support for developing typed ORMs by utilizing typed Expressions, which let you parse the syntax tree of the expression (instead of executing it) so you can generate the appropriate SQL that matches the intent of the expression. You can check out a live example of what that looks like for C# in:
Although the development experience is more productive using the rich IntelliSense inside any C# IDE. It's not just the type safety and productivity that typed APIs offer; ORMs also provide built-in conventions for converting RDBMS types into the most appropriate language data type, and their typed abstractions take care of generating the appropriate RDBMS-specific SQL for each supported RDBMS.
A lot of the stigma of using ORMs comes from "Heavy ORMs", which constantly fight the leaky abstraction of mapping a relational data model into a hierarchical object model. I've never seen an implementation of that I liked: they're always inefficient and expose APIs that make it difficult to know what SQL is generated, or which APIs perform hidden perf-killing N+1 queries behind the scenes. Many Heavy ORMs want to maintain entire control over the source code used to interface with your RDBMS. They should be separated from "Micro ORMs", which are loosely coupled: all they need is a DB connection and a Type definition that matches the RDBMS table or schema that's returned, providing a clear 1:1 typed mapping of an RDBMS table to your programming language's Type.
The Types provide a contract your app logic can bind to and given they can map to clean disconnected POCOs/POJOs they can be reused to develop declarative, safe, typed Web Services that can be inferred from the Type's schema saving you the effort from having to implement it: http://docs.servicestack.net/autoquery-rdbms as well as automatically generating the UI to query it: https://github.com/ServiceStack/Admin
Disclaimer: I've developed the above.
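The micro-ORM idea is language-agnostic; a sketch of the 1:1 typed mapping in Python (not the .NET library above) using stdlib dataclasses and sqlite3, with invented table and class names:

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class User:          # the "in code" contract matching the table
    id: int
    name: str

def fetch_all(conn, cls, table):
    """Micro-ORM style: map each row 1:1 onto a plain typed object.
    Sketch only: `table` is trusted here and must not come from user input."""
    cols = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in conn.execute(f"SELECT {cols} FROM {table}")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann')")
users = fetch_all(conn, User, "users")
```

The result is disconnected, plain objects your app logic can bind to, with no session, cache, or lazy loading in sight.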
If your ORM is causing you friction, by all means drop down to custom SQL, but don't use Stored Procedures unless you've identified situations where they provide clear benefits over their trade-offs. They're essentially free-text commands without the support or capability of a proper programming language; they split your logic from your system, making it harder to reason about in isolation, and they don't benefit from the investments around maintaining source code, e.g. development environments, source control, CI, static analysis & compiler feedback, fast unit testing, REPLs, etc.
I wouldn't recommend using an ORM to save you from learning SQL, but rather leverage ORMs to save the effort and boilerplate of interfacing your programming language with your RDBMS; they provide an "in code" contract representation of your RDBMS tables that your app logic can bind to.
I wonder why Git, CircleCI, test runners, and IDEs all have that “reject stored procedure code” logic built into them...
> doesn't benefit from the investments around maintaining source code, e.g. development environments, source control, CI, static analysis & compiler feedback, fast unit testing, REPLs, etc.
ORMs were pretty limited at the time I began writing my first platform, phponpie.com . I remember looking at Propel and Doctrine at the time.
Since then, I've written our own ORM which works in both PHP and Node.js, but we're the only ones that use it. It's been battle tested, though, with millions of users and variations. I would say that many of the issues the author brings up were things we had to face, and we solved them.
1) Schema – the ORM should have a script to regenerate base classes from the database, so that your schema only lives in one place. The nice thing is, after that, your IDE can help you out instead of you writing SQL by hand. It can use your language syntax to catch unbalanced parentheses, and more.
2) Adapters – the ORM should be modular so you can hook in adapters for MySQL, Postgres, SQLite, MongoDB, and various key-value stores.
3) Joins – the ORM is supposed to be smart enough to describe relationships and automatically write the most optimized JOIN queries for you, for example $article->getTags(). You could, of course, implement this stuff yourself manually, but it gets tedious when the code could easily be autogenerated with stuff like $article->hasMany('tags', ...), kind of like this: https://qbix.com/platform/guide/models#relations
4) Insight – using an ORM makes you pass actual values in a structured way, instead of interpolating them into a string. Thus you don't make the catastrophic mistake of forgetting to escape them, allowing SQL injection by Mr. Bobby Tables. Also, our ORM can do SHARDING in the app layer, especially useful in Node.js where it can issue simultaneous queries to several databases and combine the results. Although I recommend using CockroachDB these days :)
5) Flexibility – the ORM should support fetching partial objects, but with the Primary Key so they can be saved back. Recently we even added support for vector-valued lists, something we needed for extra flexibility.
6) Transactions – the ORM should be smart enough to handle transactions, in fact support nested transactions on various shards. Since the database engine usually does not support nested transactions, you need to emulate that in the app layer. For instance, when you start a session, you might want to begin a transaction and lock the session for update.
7) Methods – objects which are fetched can have user-friendly methods added, like $stream->exportToClient() and so on.
It's free and open source. Here are examples of usage:
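As a generic illustration of point (4) above, not of this particular library: passing values as structured parameters rather than interpolated strings is exactly what defeats Bobby Tables, even with no ORM at all (stdlib sqlite3, invented table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

evil = "Robert'); DROP TABLE students;--"
# The value travels as data, never as SQL text, so it cannot break out:
conn.execute("INSERT INTO students (name) VALUES (?)", (evil,))

names = [r[0] for r in conn.execute("SELECT name FROM students")]
```

The hostile string is stored verbatim as a name, and the table survives.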
fuck ORMs. Here's the thing, the DB queries will make up less than 0.1% of your typical codebase. Spend a week and write a good fucking query instead of relying on your bullshit magic blackbox to do it for you. Even a shitty query is going to perform better than the 100s or (literally!!) 1000s of lines of ORM code which need to happen to generate your "select * from users" query.
seriously. grow up. learn the amount of SQL you need to and napalm any ORM that comes within arm's reach.
and if you're not sure about the SQL you've come up with then just go a few cubicles down and ask your DBA what they think. Chances are they'll write something 1000x better than you came up with and you'll have learned something along the way. win-win.