I've been doing ORM in Java since Hibernate was new, and it has always sucked. One of the selling points, which is now understood to be garbage, is that you can use different databases. But no one uses different databases. Another selling point, "you don't need to know SQL", is also garbage. Every non-trivial long-lived application will require tweaks to individual queries at the string level. The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
Oh yeah, another bad reason for ORM: to support the "Domain Model". Which is always, and I mean always, devoid of any logic. The so-called "anemic domain model" anti-pattern. How many man-hours have been wasted on ORM, XML, annotations, debugging generated SQL, and so on? It makes me cry.
Couldn't agree with this more. The way I like to put it is "ORMs make the easy parts slightly easier, but they make the hard parts really hard". But I don't care that much about improving the easy parts, because they're already easy! Yet I can remember plenty of times when ORMs made my job a million times more difficult and all I was doing was fighting with the ORM itself (and usually at the worst time, e.g. when a query reached enough scale that it "fell over").
Query builders aren't exactly the same thing, but I love this article, https://gajus.medium.com/stop-using-knex-js-and-earn-30-bf41..., and it's made me a huge fan of Slonik and similar technologies. It does 90% of what I want a query builder or ORM to do (e.g. automatic binding of results to objects, strong type checking, etc.), but it lets me just write normal SQL. I also never understood this idea that SQL is some mystically difficult language. It has warts from being old, sure, but I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created.
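That Slonik-style workflow (plain SQL in, mapped objects out) is easy to sketch in any language. Here's a rough Python equivalent using only stdlib sqlite3; the `query` helper and the `User` dataclass are hypothetical stand-ins, not any real library's API:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def query(conn, sql, params=(), into=None):
    """Run plain SQL; optionally map each row onto a dataclass."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(sql, params).fetchall()
    return [into(**dict(r)) for r in rows] if into else [dict(r) for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada'), ('grace')")

# The SQL is just SQL; the helper only handles binding and mapping.
users = query(conn, "SELECT id, name FROM users WHERE name = ?", ("ada",), into=User)
```

The point being: the 90% (result binding, parameterization) lives in one small helper, and the query itself stays a normal SQL string you can tune.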
> but I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created
This point is worth emphasizing!
When your application carries non-trivial business logic, it will create non-trivial DB access patterns.
The developer will either use SQL or some other language that, pragmatically speaking, translates to SQL. But either way, they will learn some DB expression.
I'd rather use a standard. Existing familiarity, easier to find docs, support, and is probably more bug-free, etc.
Another reason to use standard SQL? ChatGPT is really good at writing, and modifying, SQL queries and the code that injects input into them.
You don't really have to write SQL anymore; you just tell ChatGPT what you want changed and it works most of the time. The times it doesn't, it gets you most of the way there.
The problem is the ORM requires you to know its language, the underlying SQL for that database, as well as how the ORM maps to that underlying language.
It’s the worst of a leaky abstraction, and like 3x the conceptual overhead. But it’s better because … something something OOP.
> It’s the worst of a leaky abstraction, and like 3x the conceptual overhead.
I really think you hit the nail on the head with this sentence. For example, another commenter said "But the same SQL fanboys who believe every developer ought to memorize tri-state truth tables..."
Yes, I think it's fine to argue that the boolean logic with respect to NULL in SQL is a pain in the ass. But ORMs certainly don't hide that from you! The ways DBs treat nulls is a fundamental part of nearly all RDBMSes - if you don't understand that, you're (eventually) going to have a bad time. All ORMs really do is make it harder to understand when you may need to take special null logic into account.
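That NULL behavior is easy to demonstrate concretely. A quick sketch with stdlib sqlite3 (table and values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# NULL = NULL evaluates to NULL (unknown), not TRUE, so this matches nothing:
eq = conn.execute("SELECT COUNT(*) FROM t WHERE v = NULL").fetchone()[0]

# IS NULL is the correct predicate:
is_null = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]

# And v <> 1 silently drops the NULL row too, since NULL <> 1 is also NULL:
neq = conn.execute("SELECT COUNT(*) FROM t WHERE v <> 1").fetchone()[0]
```

Here `eq` and `neq` are both 0 while `is_null` is 1; whether you reach the database through an ORM or not, you still have to know this.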
OOP isn't an anti pattern, OOP as a _silver bullet_ is an anti pattern. Almost all of my programs end up to some degree multi paradigm of OOP, functional, and procedural.
I call that post-OOP. The outcome might even be very, very similar to what you might have implemented as an OOP believer, but the assessment has flipped: instead of assuming yourself "did I use enough OOPisms" you ask yourself "could it be better with less?"
I'm not gonna argue against OOP in general here, but in the context of ORMs, some of the most important characteristics of OOP (encapsulation and inheritance, especially) are the source of many anti-patterns.
Keeping methods that operate on data together with the data is the biggest problem with the Active Record pattern, and it's good that some new ORMs (like Ecto) are inspiring others to try other alternatives. Those alternatives invariably eschew the exaggerated use of method/data coupling and code-sharing via inheritance/mixins, among other things. They are either functional or procedural.
Not to mention that a lot of OOP-heavy codebases are already replacing a lot of OOP-heavy ORM code with procedural (Services) or functional-inspired ("Railway Oriented Programming", Trailblazer operations).
Sure, there are still classes being used. But I don't think it's fair to really call "just using classes" OOP when they're used as structs, especially in languages that make class usage mandatory.
I never said otherwise. "Lovely" and "the ORM is free of issues" are orthogonal. The issues are why there are a lot of different alternative ways of writing domain logic in Rails.
I agree. We'd be better off with data structures and functions that work on those data structures, and packages/modules/namespaces (with public/private exposing) to do encapsulation.
OOP comes with the banana-gorilla-jungle problem, which I consider the fundamental issue with it.
This stems from the fact that SQL itself is a very leaky abstraction: each RDBMS implementation is different, and the common part is shallow. There is no way to optimize a query in an RDBMS-agnostic way.
Actually, ORMs are worse than a leaky abstraction: they border on cheating! They actually abstract the easy, boring part away, but they run with their tail between their legs as soon as the issue becomes interesting.
> They actually abstract the easy, boring part away, but they run with their tail between their legs as soon as the issue becomes interesting.
This is the biggest offense, IMO. It's not just that ORMs are a leaky abstraction, it's that they're just a bad abstraction--leaky or not. We don't really need an abstraction to make easy things easier, and we certainly don't need/want an abstraction that makes hard things even harder!
There's a famous quote that always comes to mind when I encounter stuff like this. I've seen it attributed to several people, so I'm not positive, but I think it was from Larry Wall: "Make easy things easy and hard things possible."
The analogy I use is that SQL is like a two-wheeled motorbike - fast, nimble, elegant. Yes, it may be dangerous. Yes, you may need to wear a helmet. Yes, it cannot carry too much load. But it gets you very far very quickly, if you have skill!
Now, ORM sees deficiencies and presents you... two motorbikes welded together side-by-side by some iron sticks. They call it a "car". Four wheels are better than two, right? Two engines can push more load, right? No helmet needed, right?
But does it really become a "car", or does it just have most of the deficiencies of the motorbike plus some spectacular new ones? And no benefits of a car whatsoever?
Granted, I haven't dived deep into Django, but from what I learned scratching the surface, it's slow, bloated, utterly un-Pythonic, and I just don't understand how it got so popular. Yeah, it's the most batteries-included web app framework for Python, but it just sucks to use.
Yep, fair fix. My favourite bit of Django metabollocks is when they mix in an override for __new__ on models, which won't let you instantiate an instance if the Django "app" the model (or any of its dependencies) belongs to isn't in django.settings.INSTALLED_APPS.
I'm sure they had a good reason for doing this at some point, but I'm just trying to write a unit test for something that consumes a FooModel - no DB access needed, just a small test that doesn't involve starting up all the Django bits. But Django says no, best I can do is create an SQLite DB and create all your tables on it.
I have toyed with doing my own metabollocks hackery to circumvent this, but it just makes everything even more fragile.
(Oh, the Django test-runner won't discover tests that aren't subclasses of a Django test class, so no writing a unittest.TestCase for you! You might need Django in it!)
> I'd rather learn SQL than whatever custom-flavor-of-ORM-query-language some ORM tool created.
Me too! That's why I use an ORM that lets you write queries in SQL, like... Hibernate.
This narrative seems to come up frequently. People make (good) arguments against certain features of ORMs and then conclude it's an argument against ORMs in general. The boundary between databases and codebases is very broad. But ultimately you work with objects in your code, and relational tables in your database, so there's going to be some sort of Object Relational Mapper in between.
> Me too! That's why I use an ORM that lets you write queries in SQL, like... Hibernate.
Yeah, this criticism of ORMs strikes me as off-base. I feel like any ORM worth its salt gives you the tools to write raw SQL as needed, which can also be combined with the query builder the ORM uses.
I use Rails/ActiveRecord and have very few complaints. When I need to do something simple, the convenience methods work great. When I need a complex query, I just write it.
There's a learning curve here, where you have to understand when to "break glass" for writing queries directly (either for better performance reasons or because an ORM lacks the expressiveness required).
I think a lot of ORM hate comes from people who never got deep enough to really hit the sweet spot. They (correctly) note that the ORM abstractions aren't suitable for every query, but (IMO) they disregard the benefits that they provide for most queries. If something like 90% of my queries can be generated by an ORM, I'm feeling pretty good about that - it's a lot more maintainable.
> I think a lot of ORM hate comes from people who never got deep enough to really hit the sweet spot.
This. And also, the feel of using Rails/ActiveRecord resonates with me in a way that no other ORM I've used does. It's not just the ORM but the tooling around it esp. wrt generating migrations, scaffolding and so on.
> There's a learning curve here, where you have to understand when to "break glass" for writing queries directly.
Yep. ORM is a tool. The craftsperson needs to acquire skill both in using the tool and in knowing when to reach for another tool.
Maybe some people feel ORM is never the right tool. Who can argue with that? I'm curious to see how they do it. Maybe I'll learn something.
> ultimately you work with objects in your code, and relational tables in your database
Yes, you can map a record in a DB table to a class, and as long as you are happy with retrieving records (and related records) and mapping them to classes, an ORM will be all you need. But SQL is not just limited to the columns of a table, you can join, define new columns in your select statement, group, use aggregate functions etc. etc., and then you quickly get into territory that the ORM doesn't understand anymore. So the point that ORMs take away a lot of the flexibility that SQL gives you still stands...
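For instance, a joined, grouped query with computed columns produces rows that don't correspond to any single mapped entity. A quick sqlite3 sketch (schema and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# A derived result set: joined, grouped, with aggregate columns.
# These rows aren't "a customer" or "an order" -- there's no entity to map to.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
```

Result sets like this are where record-to-class mapping stops being the natural model.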
This is not an objection to ORMs, except perhaps very simplistic ones. The ORMs I'm familiar with let you map queries to arbitrary objects. I use CTEs, joins, groups, aggregations, custom columns, etc in SQL queries (with Hibernate) all the time.
But... at that point, what does Hibernate even do for you? You can just use a lighter query wrapper, or almost even roll your own wrapper around JDBC directly. The latter is a bit more work, but it might be worth it depending on the project, since it gives you absolute control.
I'll take a query wrapper that's designed so I supply queries directly and it does the work from there (binding to objects) over using hibernate that way any time.
Having seen quite a lot of 'roll your own' solutions for wrapping queries: they were almost all unstable, inefficient messes that have a tendency to pick up lots of ad-hoc functionality. I'm not particularly fond of Hibernate but at least it's well tested and efficient for what it does and does things in a standard way so you can mostly look up stuff in the manual instead of guessing your way through a homebuilt framework.
> depending on project since it gives you absolute control
Absolute control to hunt down your own bugs, instead of using something that is an industry standard, where any competent programmer jumping to your project can be up to speed with it instantly.
Very few developers are competent with Hibernate, many fewer than with SQL. The write-back cache nature of Hibernate makes it very complex: the order of operations in the app won't match the order of operations in the DB.
Even an "industry standard" framework can get in the way. Given the constraints of executing efficient queries; handling errors and timeouts on the application's terms; and enforcing concurrency and transaction handling policies even a "good" ORM can be more a problem than a solution.
All of those will be worse off with hand-made "solutions". Especially since you are not forced to use the ORM for everything: use it for the 90% of generic DB usage, and use its built-in escape hatches for the rest, like native queries, or go straight to the DB driver if needed.
I agree that ORMs are bad, but disagree in that I think that SQL is also bad--it has this bizarre natural-ish language syntax that is inconsistent and often difficult to reason about, with a huge number of keywords (including many that are noise-only). I would welcome a simple, modern alternative to SQL. I think the fact that people keep building and using ORMs is evidence that SQL is difficult to use.
In general, one of the thing I hate most about SQL is "the order of clauses is wrong" (should really start with the FROM clause and end with SELECT, see https://news.ycombinator.com/item?id=21150606), and PRQL fixes that.
And if that's the biggest perceived problem with SQL, then SQL isn't going anywhere.
It can't just be a little better than SQL, especially in terms of syntactic aesthetics; it has to be substantially and provably better than SQL to the point where companies are willing to pay for retraining, folks are willing to buy books and classes about it, and database engine developers have to consider it worthwhile to implement.
I don't see PRQL coming close to reaching that bar.
But at that point, it’s no longer SQL, it’s something better. That seems like a very nice point in favor of ClickHouse! But then again, ClickHouse is not an RDBMS, right?
I always hated ORM and saw it as an anti-pattern since I first used it many years ago for these exact reasons.
I remember back then, devs would look down on me for voicing my critiques about it.
There are many tools and techniques which I am forced to use today as a dev which are also anti-patterns.
The unfortunate reality of our industry is that most of the popular tools which software devs are using suck and are encouraging anti-patterns. The thing about the tech industry is that there are a small number of well-connected developers and ex-developers who call all the shots in this industry and they suck.
Or maybe you haven't taken the time to learn those tools and techniques and this is why you think they are anti-patterns. The people who set the trends in the development industry are extremely smart.
Take ORMs, for example - people already know SQL and are often too lazy or unmotivated to learn their ORM of choice in depth. So for them the ORM is confusing and SQL just works. But an ORM eliminates a ton of code that needs to be written and maintained: boilerplate code to read a row from the DB and map it to an object or a map, boilerplate code to do DB inserts and updates, and optionally boilerplate code to track changes. Any decent ORM allows one to bypass its mechanisms and, in the edge cases, write SQL directly.
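For concreteness, this is roughly the mapping boilerplate in question, written by hand against stdlib sqlite3 (schema and names invented); an ORM generates the equivalent of `insert_product`/`find_product` for every entity:

```python
import sqlite3
from dataclasses import dataclass, astuple
from typing import Optional

@dataclass
class Product:
    id: int
    name: str
    price: float

# The hand-written mapping layer an ORM would otherwise generate:
def insert_product(conn, p: Product) -> None:
    conn.execute(
        "INSERT INTO products (id, name, price) VALUES (?, ?, ?)", astuple(p)
    )

def find_product(conn, product_id: int) -> Optional[Product]:
    row = conn.execute(
        "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    return Product(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
insert_product(conn, Product(1, "widget", 9.99))
```

Multiply this by dozens of tables and the maintenance argument for generating it becomes clearer, whichever side of the debate you land on.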
It's essentially the same argument people have used countless times in the past - e.g., why learn a higher-level language when I can do anything in C?
That's usually a good comeback, since people who don't like something tend to avoid using it. But because of the way the tech industry works, it's not possible to avoid these tools, so I've ended up using some of them daily, for years, to the point that I'm basically an expert at them even though I hate them. They suck. They're over-engineered (introduce more problems than they solve), have too many dependencies, slow down development speed, etc...
Anyway, these days I know to shut my mouth and cash my cheques... Software development work pays by the hour after all.
Of course everyone is free to have their own opinion, but for good reasons the opinions of the people who create new ORMs, frameworks, languages, OSs, lead big complex projects at big corp are generally valued more that the opinions of those who see dev work as nothing more than a way to make a buck.
And yes ORMs and other tools used in software industry often look over engineered if one doesn't want to understand in depth the problems they are solving, but wants to quickly solve a business problem.
> The people who set the trends in the development industry are extremely smart.
Sure. But, why do the trends keep changing, then? Why don't these smart people just get it right once and for all?
It's because, frankly, the people who set the trends are often wrong. E.g., Java-style OOP with deep single-inheritance chains is not even considered good style IN JAVA anymore.
They're just celebrities, and we're (as a group) just as susceptible to group-think and cargo-culting as people who try every fad diet because some celebrity did it.
Well still, one could argue that if (hypothetically) scientists had been smarter than they were, there would have been less back-and-forth. There may have been less collective effort wasted on scientific dead-ends. It would have been a good thing.
I think where the comparison fails is that science is far more tolerant of alternative theories than coding is. Coding is almost a mono-culture. The choice of tools and acceptable approaches is really very limited and there isn't much tolerance for alternatives; this is especially problematic given the fact that none of these tools and techniques are backed by math. Coding tools and techniques are backed entirely by subjective opinions concerning productivity across a highly inconsistent range of environments and these opinions are riddled with biases and conflicting business interests.
Programming ideology is built on flaky foundations (vague, inconsistent and biased beliefs about efficiency and maintainability) and yet there is intolerance towards alternatives. This is at odds with science which is generally built on far stronger foundations supported by mathematics and yet is more tolerant of alternative theories.
That counter argument sounds reasonable at a superficial level, but let me explain why it's really not the same thing at all.
For context, first you said: "Or maybe you haven't taken the time to learn those tools and techniques and this is why you think they are anti-patterns. The people who set the trends in the development industry are extremely smart."
This is pretty clearly an appeal to authority fallacy. It directly implies that calling a currently-popular programming technique or tool an anti-pattern is evidence of a lack of understanding (not taking the time to learn them). It's not assessing whether there is good evidence or arguments about the value of said tools and techniques, and asserts that the techniques are (at least very likely) good because "extremely smart" people said so.
Then, I asked: "But, why do the trends keep changing, then? Why don't these smart people just get it right once and for all?"
Your rebuttal, to my honest reading, is that hard science theories also change over time and those theories are also constructed by "extremely smart" people; therefore my incredulity over software engineering trend-setters should be no higher than my incredulity over basic science theories.
Your argument fails for multiple reasons.
First and foremost, scientific theories only change in the presence of empirical or logical (by which I mean hard or formal logic, not "logic" like we use when arguing politics at Thanksgiving) evidence. Software engineering is not science (and real computer science has nothing to say about programming practices out in the real world), and has almost no means by which it can test any hypothesis with the same rigor as the basic sciences. At BEST, you can do surveys and case studies of software defects in some very fuzzy, statistical, sense, which would put it much closer to the social sciences like psychology, than hard sciences like physics. But, even those kinds of studies are exceedingly rare. The VAST majority of the time, someone just makes a good sounding argument for or against a technique and we either buy it or we don't.
You're fooling yourself if you think there's any comparison between real science and software engineering.
Next, you're actually somewhat making my point for me. If even basic scientific theories evolve over time, then why would you be so confident that the "extremely smart" people who push for certain programming techniques are right about them, and that the skeptics among us just don't understand them? Surely you're admitting that the "extremely smart" people are wrong sometimes. Now, to prove a physics theory wrong requires a LOT of work, data, peer review, etc. Since these smart programming trend-setters did none of that for their preferred technique in the first place, it really won't take nearly the same effort to "prove" them wrong--they are in a MUCH shakier position than a scientific theory.
So, yeah, I stand by my claim that the programming trend-setters are mostly just cult leaders. They have no objective foundation on which to build their ideas, unlike science, so it's much more reasonable to question them than to question a currently-accepted scientific theory.
Competency is a function of the human and not the tools.
People who create piles of garbage will do so regardless of what you give them.
Let's not pretend ORM made poor behavior more manageable. It was just a different style mess to clean up.
You give them TDD, they'll create bad broken tests. You give them linters, they'll put in arcane and asinine rules. You give them git and they'll have branching that looks like the Hapsburg family tree with commit comments like "fix".
You aren't going to fix behavior with tools. It takes time, discipline, probably a bunch of therapy...
This argument has been presented forever to excuse bad tools, and it is a strawman. Of course good tools don't fix bad behavior. The argument in favor of good tools is that competent people with good tools are insanely more productive than competent people with bad tools.
I don't agree because a large part of competency is choosing the right tool for the right job.
It's like giving Novak Djokovic a hammer instead of a racket and still expecting him to win tennis tournaments.
I bet he invests a lot of thought and effort into selecting his racket, his shoes, his clothes, his food, his coach, his sponsors, his medications, his strategies... Without the ability to choose his tools and techniques, he would be nothing.
... And yet that's exactly what we expect from developers.
If you're advocating for tool fetishizing and rewriting things by chasing hype cycles, then that is part of the problem.
The world is awash in half baked tools that are extremely popular until people somehow collectively realize it's a waste of time.
The frenzied mania of mobs shifting from framework to framework every 6 months is like the world of top40 radio brought to programming; relentlessly promoted crap that just disappears silently to be replaced by more irritating crap with a different version of the same problems because there's always a new generation of teenaged programmers who think they've fixed everything.
The people who follow and fall for it are part of the trainwreck programmers I'm talking about.
Really I want to stop the burnout and churn of the industry. I strongly believe the decrease in quality is because senior people tend to exit. We're 45 years into the microcomputer revolution and 65 or so years into career programming yet whenever I go to a startup, conference or meetup, the "in industry < 5 years" outnumber the old by a significant margin (except for sysadmin stuff).
It's an intrinsic cultural disposition that reinforces itself and we're going to continue to be stuck in this interregnum until we can somehow unmoor ourselves, collectively, from it.
I think a cessation of tool fetishizing and substitution with tool skepticism will help.
There are a bunch of them. I'm opinionated toward physical, real-world, concrete, versioned implementations, not theoretical amorphous relationships existing in pure thought-stuff, which is how a lot of these projects' "documentation" is written - as if the code lives in abstraction, like some improvised jazz scatting.
That practice is pure cult bullshit.
But yes, plenty of orms are fine, as long as it's a convenience. It has to empower the user, not just set up blockades in order to adhere to some ideological purity. That's once again, cult nonsense
It had nothing to do with competency. It was just comparatively more sucky to maintain regardless of competency. If the coders were low quality, it sucked more than low-quality ORM-based code. If they were high quality, it sucked more than high-quality ORM-based code.
Thank you for saying this. I don't consider myself some kind of 10x dev. Hell, most days I feel like a 0.1x dev. But, sometimes I can't help but feel like "everyone else" is doing it wrong.
The cargo cult has been wrong many times in the past, yet if you go against today's cargo cult, they act like you're insane. Do we not learn our lessons? Or at least accept that maybe today's "best practices" won't always be considered as such?
Let's keep in mind that if any of us had said anything about OOP being overrated in the 90s and 00s, we'd have been laughed out of the room. It was "obvious" that OOP was the best way to write reliable, maintainable, projects. Now, it's almost the opposite: it's super cool to write a blog post shitting on OOP (while almost never actually defining it...).
Likewise, it was "obvious" that statically typed programming languages were just tedious and getting in our way, so let's write everything in Python, Perl, and JavaScript. Except that now, most people seem to be in the opposite camp: it's all about Rust, Swift, TypeScript, etc.
I hate ORMs. Every single one I've tried feels like a bad abstraction and doesn't really do what I want.
For added, unrelated, controversy, I think Java's checked exceptions are a good language feature, and I absolutely don't care that the creator of C# did an interview 12 years ago declaring them bad. I'm vindicated every time I see someone praise Rust's Result type, or when I see someone write a TypeScript or Kotlin library that implements a Result/Either type. We're moving back toward statically declaring how a function may fail and trying to force the caller to deal with it. It's exactly the same as checked exceptions.
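The parallel is easy to see in a sketch. Here's a minimal, hypothetical Result type in Python (not any particular library's API): the failure mode is part of the signature, and the caller has to unwrap before touching the value, exactly as with a checked exception:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err(Generic[E]):
    error: E

Result = Union[Ok[T], Err[E]]

# Like `int parsePort(String s) throws PortFormatException` in Java:
# the declared failure travels with the function's type.
def parse_port(s: str) -> "Result[int, str]":
    if not s.isdigit():
        return Err(f"not a number: {s!r}")
    port = int(s)
    if not 0 < port < 65536:
        return Err(f"out of range: {port}")
    return Ok(port)
```

Whether you spell it `throws`, `Result<T, E>`, or `Either`, the mechanism is the same: the type system forces the caller to acknowledge the failure path.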
I have also seen some laughably bad and inefficient code written in non-functional languages just for the sake of writing in functional style, when it does absolutely nothing to actually increase the quality of the code/project. FP will absolutely be ridiculed in ten years, the same way OOP is being ridiculed today.
In fact, there are many smart people, and relatively few of them have their manager's ear, while management also listens equally to a bunch more not-smart people pointing to the popular smart people, saying "these people are smart", ignoring the merit of the position of other smart dissenters.
In short, it's all a big train of reasoning based almost entirely on the "appeal to authority" fallacy.
if it works and it's stupid... it's not stupid. If companies around the world can get stuff done with "anti-patterns" then, by definition, they don't "suck".
I'm all for doing better and trying to improve my "craft" but I'm not going to look down on decisions that work.
For a technology to not work at all, it has to be really, really shitty. Anti-pattern means a pattern that is bad. If the tech didn't work at all, it wouldn't be a pattern at all.
You could write your website in handcrafted assembly and it would work, but it would still be really stupid.
I for one am looking forward to the advent of a frontend framework where every endpoint is handled by a (possibly) handcrafted WebAssembly module.
You can drive a nail with a brick if you try hard enough. Does that make it not stupid to go looking for a brick while there's a hammer right there in your toolbag?
Nope. If it works and it's stupid, it's still stupid.
Just because companies can get stuff done in dumb ways doesn't mean those ways aren't dumb, it means that by brute force, sunk cost and obstinacy they can make things happen. There are any number of technical and non-technical dumb things in any large organization. You definitely can't assume something isn't dumb just because a company does it.
No one is looking down on ORMs because they work; people are looking down on them because they don't work well enough to justify their downsides in most use cases. That's why people call it an anti-pattern. Yes, you can get it to work, but in general it's going to make things worse, not better.
Now are some things that people call "anti-patterns" actually the right thing in certain circumstances? Absolutely. Just because something sucks in one situation doesn't mean it isn't the right choice in another.
> Nope. If it works and it's stupid, it's still stupid.
Successful businesses with successful and profitable products say otherwise.
The fact that stuff gets done means that it's not dumb, at a base level. You may not like it - and a lot of people don't, because it removes the need for over-engineered, over-clever "solutions" when sometimes the best solution is literally the "stupid" one.
> No one is looking down on ORMs because they work, people are looking down on them because they don't work well enough to justify their downsides in most use cases.
A decade of experience in a few companies I've worked at and every company I interview with says otherwise. EF is top notch and removes a large swath of "boilerplate" code and works more than well enough.
I've had much better experiences with something like EF than I have with stored procedures and triggers and other "better" solutions.
> Now are some things that people call "anti-patterns" actually the right thing in certain circumstances? Absolutely.
See? you even admit that "anti-patterns" are the right patterns "in certain circumstances". The only thing you disagree with is how common those circumstances are and for the vast majority of circumstances? ORMs are a lot better than "custom" layers underneath.
Nothing worse than creating a custom way of doing something that's already done. Inserting your own bugs. etc.
Smart programmers don't solve problems that are already solved. I wouldn't create a PDF parser from scratch (outside of learning exercises). I wouldn't create a Word document parser. XML parser. etc...
Why would I create a data layer management package when I have better things to do with my time that actually adds value to a company?
I guess you have never had to come in and solve a bad situation that has arisen specifically due to the bad short term decisions someone made in the past.
Not all tech debt is bad, as long as it is recognized when being created and gets accounted for in near future planning. That way it is a deliberate decision taken, after weighing pros and cons, knowing it will need to be dealt with at some point.
The worst of these is when those making the decisions don't know as much as they think they do, forgo weighing pros and cons, and paint themselves into a corner by accident.
Those kinds of situations can literally kill a company (and have).
There are ORMs with good pros/cons ratios out there. Python especially has good ones:
- Django comes with a clunky and poor performing one, but it's very well integrated, super practical, productive and has nice ergonomics.
- Peewee gives you an ORM in a small package, which makes writing those little programs a joy when you don't need anything fancy, but feel lazy.
- SQLAlchemy requires a lot more investment, but is very flexible, generates clean SQL and has extremely correct behavior. It also exposes a lower level query builder for when you don't want the OOP paradigm and wish to express idiomatic SQL behavior, but abstracted away with Python.
This becomes a standard engineering decision, where you analyze the ROI.
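For a flavor of that lower-level query builder, here's a minimal sketch of SQLAlchemy's Core layer (not the ORM) against an in-memory SQLite database. The table and column names are invented for illustration, and it assumes SQLAlchemy 1.4+:

```python
# A minimal sketch of SQLAlchemy Core: idiomatic SQL expressed in Python,
# without the object-mapping machinery on top. Requires SQLAlchemy 1.4+.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

engine = create_engine("sqlite://")  # in-memory SQLite
metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(users.insert(), [{"name": "ada"}, {"name": "grace"}])
    # Reads exactly like SELECT ... WHERE ... ORDER BY
    rows = conn.execute(
        select(users.c.name).where(users.c.id > 0).order_by(users.c.name)
    ).all()

print(rows)  # [('ada',), ('grace',)]
```

The generated SQL is what you'd write by hand, which is part of why Core gets cited as the "clean SQL" option among Python ORMs.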
I absolutely hated having to deal with SQLAlchemy. It took me days to get through the ugly documentation to a level where I could more or less competently use it, and for what? The code is harder to understand than just writing SQL, and the performance is worse. I have yet to learn of a single advantage of ORMs that doesn't boil down to developers not wanting to deal with SQL.
Currently have to use SQLAlchemy on an existing project, and while I am not necessarily disagreeing with you on ORMs or performance, I just wanted to add that ChatGPT has reduced my time fighting with their documentation by what I estimate to be 90+%. To be fair, if I had written pure SQL, I might not have needed ChatGPT at all, but I also would have had to learn Python's database connector calls for the umpteenth time.
Yeah, the productivity and performance with the Django ORM is amazing. I have trouble using any other web framework now...
The thing people are missing is relations. The most important aspect of your application is how data is related to each other. A good ORM makes that easy.
> Every non-trivial long-lived application will require tweaks to individual queries at the string level. The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
This is bullshit, IME. I've been doing this for over 10 years, and there's not a single one of those "requires a tweak at the string level" queries that couldn't be handled better by actually spending 5 minutes reading the Hibernate documentation and doing what it says to do in that situation. But the same SQL fanboys who believe every developer ought to memorize tri-state truth tables and COBOL-style function names can somehow never be bothered to do that.
> there's not a single one of those "requires a tweak at the string level" queries that couldn't be handled better by actually spending 5 minutes reading the Hibernate documentation
So instead of learning and using widely applicable SQL skills you learn and use Hibernate specific skills? And that’s supposed to be a positive?
Once we're talking about this kind of detailed tuning those "widely applicable SQL skills" are usually tied to a single database vendor.
Using any tool requires a certain investment in learning the tool. If you don't want to learn an immensely popular and widely used library that's in pretty high demand career-wise, well, that's your lookout, but it's a crazy double standard to claim that Hibernate sucks compared to handwritten SQL when you spent weeks tuning your SQL but wouldn't spend hours tuning your Hibernate.
> If you don't want to learn an immensely popular and widely used library that's in pretty high demand career-wise, well, that's your lookout
You’re making the argument against yourself. Even accounting for different dialects, in-depth knowledge of SQL is far more valuable than knowledge of Hibernate.
Over a decade ago now I used NHibernate, the .NET version of Hibernate on a project. Since that project I’ve written code in JavaScript (Node), Python, Rust, played around with a little Go and Kotlin.
In all those years the only database engine I’ve used seriously is Postgres. Knowledge of Postgres has been invaluable throughout my career. I can’t remember anything about NHibernate. It’s not even specific to NHibernate either, I had to use ORMs in other situations (like Sequelize on Node) and I haven’t retained any of that either.
> Over a decade ago now I used NHibernate, the .NET version of Hibernate on a project. Since that project I’ve written code in JavaScript (Node), Python, Rust, played around with a little Go and Kotlin.
> In all those years the only database engine I’ve used seriously is Postgres.
Interesting - I would think changing databases happens a lot more often than changing languages for most developers. Over the last decade I've worked on systems that used MySQL, Postgres, Oracle, H2, and Sybase, but they've all been on the JVM and almost all used Hibernate. (One used EclipseLink, but a lot of the knowledge transferred over - even when something isn't called exactly the same thing, they have the same kinds of capabilities structured in similar ways, a much smaller gap than the differences between databases IME. Even having to use e.g. the Django ORM occasionally, a lot of stuff transferred over.) Spending time learning to use an ORM well was absolutely a good investment, far more useful than a bunch of SQL details would've been.
Like, if a company has a Ruby/MySQL stack, they're going to hire someone whose experience was Ruby/Postgres over someone whose experience was Python/MySQL. So if you're a Ruby dev then you open more doors for yourself by learning Ruby's standard ORM in depth than you do by learning a specific database in depth.
> I can’t remember anything about NHibernate. It’s not even specific to NHibernate either, I had to use ORMs in other situations (like Sequelize on Node) and I haven’t retained any of that either.
I suspect that reflects more on what you paid attention to and were interested in using rather than anything fundamental about how easy or hard ORMs are to learn.
> a lot of the knowledge transferred over - even if something isn't called exactly the same thing, they have the same kind of capability structured in a similar way
At a certain point this is getting beyond parody. That statement sounds exactly like it’s discussing SQL!
> I suspect that reflects more on what you paid attention to and were interested in using rather than anything fundamental about how easy or hard ORMs are to learn.
I’m not saying anything about anything being easy or difficult to learn. I’m talking about what’s transferable. In my experience ORM knowledge is far less transferable than SQL knowledge.
I think it’s safe to say we’re going to have to agree to disagree here.
> At a certain point this is getting beyond parody. That statement sounds exactly like it’s discussing SQL!
Nope. The underlying relational model is great. Knowing what tables are like, how and why indexes work (and when they don't), transaction isolation levels, all that stuff is extremely useful transferrable knowledge (and will help you out whether you're using an ORM or not). Knowing the three or four different bizarre pseudo-COBOL variants that different database vendors use to express something like "pull rows recursively from this table" or "format this string using these two numbers and this date" is stamp collecting at best.
Disagree. All too often, JPA/Hibernate dumbs the relational model down into record-at-a-time processing, with navigational queries reminiscent of pre-relational network databases, and with counting and joins implemented by unnecessary fetching and client-side processing rather than aggregate queries and server-side joins. Furthermore, it brings fuzzy locking semantics (locking as a side effect of lazy/eager fetching), artifacts such as a pattern of unnecessary "id" columns due to missing or idiosyncratic handling of compound keys, extra magic for "many-to-many connectives" (the wording alone is non-relational), lack of attention to decimal arithmetic/overflow with money values, idiosyncratic date/time conversions, mappings of enums to ordinals, etc. etc.
Not my experience at all. The support for aggregate queries and joins is very good if you actually use it; likewise the support for decimals, date/time and enums (though I say that from a position of thinking a lot of database date/time handling is wrong, so take that as you will). I'm not a fan of lazy fetching but if it's something you want then the locking semantics are a natural consequence of that. Many-to-many connectives are indeed non-relational, but again if they're something you want (and again I mostly avoid them) then they work the way that they have to work.
Changing databases on an existing application doesn't happen, but I've worked across teams within a single company for ~8 years and in that time I've used:
1. SQL Server
2. MySQL
3. Neo4j (very briefly)
4. PostgreSQL
5. DynamoDB
Each of those was for a different set of applications, and none of those applications changed database, but point being sometimes an engineer will be made to use a variety of databases in their career, even sometimes within the same company (although you could also chalk this up to a particularly laissez-faire style of tech direction).
In our case the company was strictly "Oracle-only", but one team did a quick startup-style implementation in Rails and used MySQL. No one forced them to migrate to Oracle; the company just hired DBAs who knew MySQL.
Changing the database for an existing system happens very rarely, but changing the ORM or language is even rarer. So learning the ORM well is just as rewarding as learning the database well, IME - and more so once you take into account changing jobs, since you're more likely to take another job using the same ORM but a different database than vice versa.
I've used many different databases. They're all very similar, and I've not had trouble adjusting. Every ORM I've used has required more time to learn. I'm not against ORMs, but I've yet to see a situation where one was a net benefit. Aspects of ORMs are great, but going all in tends to add significant complexity.
I'd argue that means it's doing its job quite successfully! The point of using an ORM is so you don't have to also be a DBA on top of writing code. If I have to remember a bunch of implementation details and program weirdly because of the leaky abstraction that is an ORM, then the ORM has failed. If I grab all the rows from the database and then WHERE-clause them in my code, instead of querying the database with a WHERE clause, and then wonder why my program's slow, that's on me. But if the ORM makes it straightforward not to, so I don't need to grok the database and can focus on the other problems, then I'd consider it a success.
What's there to remember? With NHibernate you basically use LINQ, so if you remember LINQ you remember NHibernate. The only extra bits are the setup and mappers, which are sort of trivial. The pain points come around because NHibernate decides to lazy load everything or ends up doing crazy joins. But overall I found NHibernate not too bad; it certainly was fast at doing a lot of basic CRUD-type stuff.
However, these days I use Dapper.NET and SQL. Dapper gives a nice mapper for parameterized queries and maps results to types. I think it's a nice middle ground.
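For readers outside .NET, Dapper's core move - run a parameterized query and map each row onto a typed object - is easy to sketch with just the Python standard library. The `query` helper and schema below are invented for illustration:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:  # the target type, analogous to Dapper's generic parameter
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?), (?)", ("ada", "grace"))

def query(conn, sql, params, cls):
    """Run a parameterized query and map each row to an instance of cls."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

users = query(conn, "SELECT id, name FROM users WHERE id = ?", (1,), User)
print(users)  # [User(id=1, name='ada')]
```

That's the whole "middle ground": you write the SQL, the library only handles parameters and row-to-object mapping.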
That's the same debate as whether to hand-craft XML files or have a library do it for you. You'd better know in detail how the format works, but going through a library helps cover most of the common issues you wouldn't want to deal with by hand.
Except that an XML file that passes the spec and contains the required data is as good as it will ever get. The world's foremost XML-typing genius is not going to improve on that.
On the other hand, with hibernate, you get mediocre, inefficient SQL, and could produce much better results by focusing your energies on learning the database instead of learning hibernate.
On hibernate: there is a common approach of having pure ORM handling of 99% of the queries and use a lower level query building tool for the 1% that needs it.
The query to fetch your user profile should be mediocre, properly cached, and fully definable in the ORM. And most of your queries should be straight and obvious joins between tables with no crazy-pants handling.
Something in your basic data structure that throws off an ORM would probably also throw off new employees looking at the code for the first time, and could be fixed for readability and maintenance.
A thought exercise: XML entities can be self-referencing. Have you ever thought about how you validate a document that has recursive entity definitions?
That's one quirk that comes first to mind, but the XML format definition is probably at least a few hundred pages, and I'm hopeful there's enough in there to fuel anyone's worst nightmares.
It's kinda interesting in itself that XML is seen as a plain and boring data format. I don't wish on anyone to be the receiving end of an XML typing genius' documents.
The XML specification does contain a schema language: the DTD. This is why it is 37 pages; without the DTD part it would be at most half that.
Other schemas are not merely popular, they are more powerful. In DTD the type of an element depends only on its own name. In XSD the type depends potentially on the context: the type of 'xxx' in 'aaa/xxx' and 'bbb/xxx' may be different. And in Relax NG we can define different types, for example, for the first 'xxx' and for the subsequent 'xxx's. These extensions make validation somewhat more elaborate, but still linear, as we remain within the complexity of a regular expression. Those are formal validators; then there is Schematron, which seems to be more of a business-rule kind of validator that also has its uses.
I think I was seeing all the XSLT and XQuery and all the kitchen sink directly bound to XML, but those are just meta tools that don't bear directly on the language.
No. I see that as a reflection of the reality of the database; if your database contains rows that violate your domain invariants, what would you expect to happen?
It may not be a bad idea to fail fast by having the ORM call the constructor (the same way Jackson does when parsing JSON).
Broken invariants may propagate and cross system boundaries, making things much worse (I have seen a case where a $200M transaction was rolled back in 19 systems because data was not properly validated at entry - it was a crazy day for everyone in production support).
That comparison would only really make sense if people were advocating writing SQLite storage files by hand.
In this situation SQL is the library. It’s the interface that allows you to query and write the underlying data while knowing nothing about the underlying format. An ORM is just a library sitting on top of the library you already have. There’s just as much to learn, it’s just a higher level of abstraction (until it isn’t because you need something low level).
I think that "requires a tweak at the string level" is not precisely true/correct, but the overall idea is true. For any non-trivial, long-lived, application you will at least have to inspect the ORM-generated SQL string for some query and you'll have to either adjust your ORM calls (e.g., add even more magical Hibernate annotations) or use an "escape hatch" to write the query by hand.
The larger point isn't really that you can't accomplish everything with the ORM, IMO, but rather that the conventional ORM value proposition of "you don't have to write/understand SQL" is totally false. You will have to understand SQL and you will have to fiddle with your ORM to generate the SQL you want at some point. But, if you're going to end up fiddling with your ORM and inspecting the generated SQL anyway, then why not just start with something closer to the SQL in the first place?
Phrased differently, we have to learn SQL no matter what--and you won't convince me otherwise--, so why should I have to learn SQL, and Hibernate's APIs and gotchas when I do Java, and Laravel's Doctrine APIs when I do PHP, and Entity Framework when I do C#, etc, etc.
I interpret parent's point as doing the SQL level tuning through the ORM's dedicated mechanisms, and not through a raw string.
Any decent ORM has extra procedures to tweak the resulting query and inject optimizations that it can't arrive at automatically, kicking the ORM away every time there's some tweak to do is counterproductive.
> Another selling point which is "you don't need to know SQL"
It has never ever been a selling point by anyone with an ounce of brain.
ORMs are made for OLTP workloads, not OLAP (one of the creators of Hibernate has also said so, but I won’t look it up now), and their primary utility is to save you from writing long and error-prone insert and update queries, plus they automatically map an SQL row to an object. That’s it.
You will reimplement plenty of parts of it if you go the raw way, so it is again, not a zero sum game.
> Every project I've been involved with that got saddled with an ORM was originally justified "so junior developers don't need to know SQL".
The people who made "not knowing SQL" the main reason to use an ORM should not be making the decision to use an ORM. They can only use it incorrectly.
If there are ORMs that document "not needing to know SQL" as a key selling point of their ORM, they should remove those sentences because it's not.
As someone with 15 years experience using ORMs, you absolutely do need to know how your database works to be effective at using an ORM well. Once you do, and you understand how your ORM works, I still think the benefits of a well-integrated ORM into your codebase make it worthwhile. Generally the code becomes easier to understand, modify and reuse if you're using an industry-standard ORM. Like with most good libraries or frameworks, the number of use cases the maintainers encounter usually means you'll find something pre-prepared when you come across a new problem.
To be fair, in our DAL we use Entity Framework (which is kind of just an ORM++) and a good 95% of the queries we do are just straight-up LINQ. The last 5% which use raw SQL are critical paths we found either directly through testing, or as our customers' systems gradually got slower as the database grew and needed optimisation.
Eventually the Juniors are expected to learn SQL, but until such a time as they do they can still make useful contributions to the code base.
Honestly, where are they even getting junior developers that don't know some basic amount of SQL or can't learn it in an afternoon? It's such a non-issue. What a great way to ensure juniors never gain proper SQL experience/expertise - and they'll be the later senior developers too, which just propagates the problem.
Sorry to hear that you encountered such teams. It says a lot about their professionalism and experience. There are teams which do not use this kind of argumentation.
I wish. I have it justified to me as "our SQL is not good" as the reason to use ORMs in my workplace. The result is code that primarily centers around the idea of fetching objects into memory and flushing them back to the database. JOINs are not used, you do lookup chains in Java instead. And if you need to update a field, you fetch the entire object into memory (a SELECT *), update one field and flush the whole thing back to the database. That's what an ORM gets you. A plainly wrong way of thinking about how to manipulate data for developers.
It's not that they don't know SQL. They know SQL. The problem is, the only SQL they know is fetching objects by primary key/some other condition because that's the only thing an ORM does well. The SQL table could very well be a dumb key-value store because that's all that their code treats the SQL database like.
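The difference being described is easy to make concrete. A sketch with SQLite and an invented schema - first the fetch-everything-and-flush pattern, then the single targeted UPDATE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
             "email TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada', 'old@example.com', 'hi')")

# Fetch-and-flush, as described above: read every column (SELECT *),
# then write every column back, just to change one field.
row = conn.execute("SELECT * FROM users WHERE id = ?", (1,)).fetchone()
conn.execute("UPDATE users SET name = ?, email = ?, bio = ? WHERE id = ?",
             (row[1], "new@example.com", row[3], 1))

# The plain-SQL alternative: touch only the column that changed.
conn.execute("UPDATE users SET email = ? WHERE id = ?",
             ("newer@example.com", 1))

email = conn.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]
print(email)  # newer@example.com
```

With wide rows (or a `bio` column holding kilobytes of text), the first pattern moves all of that data both ways on every edit; the second moves one value.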
Charitably, they probably meant the process by which the parameterized query string itself is built, in the case where you need a dynamic query structure.
I swear to god I'm going to write a SQL library some day that doesn't take strings as arguments. Although LINQ kinda already does what I'm thinking about, so maybe I just wait for someone to port it.
"With the new SqlQuery method, you can now write your own SQL queries and map the result to any type that you want.
This is useful for tasks that are not directly related to the day-to-day business of your application. For example, this can be used to import data, migrate data, or to retrieve data from a legacy system."
Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
> Sooo it’s dapper with string interpolation override to make queries parameterized so you don’t need to manually put the parameters on as 2nd arguments?
Pretty much. C# does some really cool innovations around string interpolation which proves very useful for logging, SQL query etc.
You can create your own string interpolation handler, which is what the new SQL query does and what several log libraries do. So basically you can use interpolation safely and remain assured that you do not introduce SQL injection errors. Also, this way string interpolation does not prevent query plan caching, as normal string interpolation would do.
You can do it in typescript/es6 using tagged template literals. I have done it before and it was awesome. Seriously wish more languages supported custom variable interpolation for it.
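Python has neither C#'s interpolation handlers nor JavaScript's tagged templates, but the same idea - interpolation-looking syntax that compiles to a parameterized query rather than a concatenated string - can be sketched with a small helper. The `sql` function below is invented for illustration:

```python
import re
import sqlite3

def sql(template, **values):
    """Turn 'WHERE name = {name}' plus keyword values into a
    (parameterized_query, params) pair. Interpolated values never
    enter the query text, so they can't inject SQL."""
    params = []
    def repl(match):
        params.append(values[match.group(1)])
        return "?"
    return re.sub(r"\{(\w+)\}", repl, template), tuple(params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")

# Looks like interpolation, executes as a parameterized query.
query, params = sql("SELECT name FROM users WHERE name = {name}", name="ada")
print(query, params)  # SELECT name FROM users WHERE name = ? ('ada',)
result = conn.execute(query, params).fetchall()
print(result)  # [('ada',)]
```

The real language features are nicer because the compiler enforces the handler at the syntax level, but the principle is the same: the query text stays constant (so plan caching works) and the values travel as parameters.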
Details. But sure. In truth the best way to do the data layer is to use stored procs with a generated binding at the application layer. This is absolutely safe from injection, and is wicked fast as well.
Having been on the team that inherited a project full on stored procedures (more than once): no thank you, not ever again.
Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly. I’m sure they could be done “less worse”, but I doubt they can be done well.
Thanks for echoing my pain. I am paying for the bad decisions taken by a bloke who only knew SQL and used that hammer to bang on every nail he could find. Every time someone asks why the result of the stored proc is the way it is, I need to manually run the steps one by one until I find the point where an expected row failed a condition and dropped off. And have extra fun wrapping those statements with declarations in case you are heavily using declared variables in those statements. To top it off, if you are troubleshooting in an environment where you don't have insert/update permissions, your day just became amazing. Fuck you to those who use stored procs without strong justification (like high-volume data processing on the database side, where it totally makes sense).
By version control, I assume you mean the inability to use different versions simultaneously without hacks, unlike a library built using Java where you get to pick and choose the version you want to use. Because if you mean version control of stored proc sources, that would be just like a normal git asset.
I like stored procedures, or I guess to be more specific functions in postgres, but they have to be done in a really specific way:
1) It's all or nothing. If there are functions, then they better feel like a complete high-level API, enforcing the consistency of data, perfectly handling locking, etc. If there's tons of complex SQL in the app and then also some functions, then it's better to have no functions at all.
2) They need to have a consistent interface, such as always returning an error flag, error message if applicable. Functions must never surprise the caller. It either succeeds and the caller commits, or fail and the caller rolls back and knows why.
3) No monkey business with triggers. They can maintain updated_at columns, or maybe create some audit trails, but anything beyond that will make behavior surprising to callers and that is the worst.
As for version control it needs to be maintained as a separate application, which the server depends on having certain versions of. Even if you don't use functions you have to worry about schema versions and you can't always run two versions at the same time or roll back schema changes easily as a practical matter.
Holy crap yes. This thread has triggered a long lost memory from over a decade ago where everything, absolutely everything was done through stored procedure. Such a wild, wild waste of time.
> - Are maintainable by a team. "Oh, because that seemed faster at the time."
> - Are unit tested: eventually we end up creating at least structs or objects anyway, and then that needs to be the same everywhere, and then the abstraction is wrong because "everything should just be functional like SQL" until we need to decide what you called "the_initializer2".
> - Can make it very easy to create maintainable test fixtures which raise exceptions when the schema has changed but the test data hasn't.
ORMs may help catch incomplete migrations; for example when every reference to a renamed column hasn't been updated, or worse when foreign key relations changed breakingly.
django_rest_assured generates many tests for the models (SQL)/views/API views in a very DRY way from ORM schema and view/controller/route registrations, but is not as fast as e.g. FastAPI w/ SQLalchemy or django-ninja.
> Opaque, difficult to debug, high risk to invoke, difficult to version and version-control properly.
Traditionally, maybe. With databases like Neon (postgresql) and Planetscale (mysql) supporting easy branching / snapshot it's at least made this a lot nicer.
That might be true, but that’s like saying “now that my mechanic can repair my car for free, it’s way less worry to replace my radiator by removing it with a chainsaw”.
The solution to stored procs being awful isn’t making a whole branch just to see what it does, it’s fixing the problems with stored procedures at the root level.
Though I'm on the fence about stored procs and have seen complicated messes that are depressingly unfun to work with, I've also had very good experiences with systems based around stored procs used in a non-abusive, non-dogmatic way. The usual answer I'm going to invoke here is that it depends how the tool is used. Any tool, methodology, or philosophy can be borked in various ways, and that experience traumatizes people enough that they prefer to move on. Of course some tools are clearly worse or less useful than others, but you can find some consensus if you know where to look.
I (softly) disagree with stored procs being definitively the "best way to do the data layer." Stored procs are an extremely powerful tool to have in your belt, but lord have mercy do they have their own tradeoffs.
What I've found is that devs tend to be able to write "OKish" SQLs, but fall over when trying to optimize them, so the DB side can then take over and write optimized stored procs, which can have a separate cadence of code updates and version control compared to the backend team.
Stored procedures often increase CPU load of the DB instances, deploys can be more challenging, and security is only better if one takes care how the procedure is authorized or delegated.
They can save some network round trips, and unify interfaces if there are a lot of different stacks sharing one data store. Though it's not universally better.
> Stored procedures often increase CPU load of the DB instances
This is of course true if you're doing a bunch of computational work in your stored procedure (as opposed to just doing pure CRUD stuff) but I'm struggling to think of a real world situation where this might be the case.
Can you name some examples you've encountered? (I'm not doubting you, I'm just curious)
I did some complex expiration calculations in a stored function, it cut down on round trips and kept the app simpler. Though it did cause some modestly higher CPU load.
I've also seen MVs cause CPU spikes where there is a lot of SELECT massaging going on. Even without the MVs themselves certain functions like REGEXP stuff can impact performance. Language of the PL matters too, if you've got choices like Python.
Ah! Thanks. I'm not sure why my brain wasn't making the connection.
Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
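To make the "caching layer" idea concrete: SQLite (used here so the sketch is self-contained) has no materialized views, so this emulates the Postgres pattern - `CREATE MATERIALIZED VIEW` once, `REFRESH MATERIALIZED VIEW` when staleness matters - with an ordinary table. The schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ada', 10), ('ada', 20), ('grace', 5);
    -- Postgres: CREATE MATERIALIZED VIEW order_totals AS SELECT ...
    CREATE TABLE order_totals AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")

def refresh():
    # Postgres: REFRESH MATERIALIZED VIEW order_totals;
    conn.executescript("""
        DELETE FROM order_totals;
        INSERT INTO order_totals
            SELECT customer, SUM(amount) FROM orders GROUP BY customer;
    """)

# Reads hit the precomputed table, not the (potentially expensive) aggregate.
before = conn.execute(
    "SELECT total FROM order_totals WHERE customer = 'ada'").fetchone()
conn.execute("INSERT INTO orders VALUES ('ada', 70)")
refresh()
after = conn.execute(
    "SELECT total FROM order_totals WHERE customer = 'ada'").fetchone()
print(before, after)  # (30,) (100,)
```

Compared with the Redis approach, the "cache" lives in the same database, so there's no second system to deploy and no serialization layer - the tradeoff is that refreshes cost database CPU.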
> Materialized views are almost criminally underused. I feel most people don't know about or understand them. They can be a very effective "caching layer" for one thing. I have used them to great effect.
They are great if the frequency of changes isn't very high and the underlying tables are read-heavy and running it directly against the tables would be expensive (e.g. complex joins).
> A lot of times I see people pulling things out of the database and caching them in Redis. When in fact a materialized view would accomplish the same thing with less effort.
Typically Redis is used in medium to high volume of many smallish but repetitive queries where you want to store some key bits of information (not entire records at which point going to database might be simpler) that are sought often and you don't want to hit the database again and again for them - a bit different from the materialized view scenario.
False. I've developed enormous systems fully using stored procs for data layer and it's absolutely not the "best way to do the data layer". Next you'll be saying triggers are a good idea.
Yes they can be fast, and they're controlled by the DBA, that's why you'd choose them. Injection is not an issue in modern data access layers, that's a 1999 problem.
If your default stance in 2023 is that you should start with stored procedures, you are doing something very wrong and you are a risk to the survival of your business.
For CRUD stuff, if you don't care about the execution plan cache and other optimizations, querying directly could save you some time, sure. For chunks of code that run transactions or process large data directly on the server, I'd reach for stored procedures without thinking too much.
it's not like stored procedures are inherently faster than normal queries, right?
They're doing the same amount of "work" with regards to finding/creating/updating/deleting rows.
But you (potentially) avoid shuffling all of that data back and forth between your DB server and your app.
This can be an orders-of-magnitude benefit if we are describing a multistep process that would involve lots of round trips, and/or involves a lot of rows that would have to be piped over the network from the DB server to the app.
Suppose that when I create an order, I want to perform some other actions. I want to update inventory levels, calculate the user's new rewards balance, blah blah blah. I could do all of that in a single stored procedure without bouncing all of that data back and forth between DB and client in a multistep process. That could matter a lot in terms of scalability, because now maybe I only have to hold that transaction lock for 20ms instead of 200ms while I make all of those round trips.
There are a lot of obvious downsides to using stored procedures, but they can be very effective as well. I would not use them as a default choice for most things but they can be a valuable optimization.
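A sketch of that order example, with an invented schema. As a Python function over SQLite it can't demonstrate the saved network round trips, but it shows the multistep logic a stored procedure would bundle server-side into a single call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER);
    INSERT INTO inventory VALUES ('widget', 10);
""")

def create_order(sku, qty):
    """All steps in one transaction; in a stored procedure these
    statements would run server-side in a single round trip."""
    with conn:  # BEGIN ... COMMIT, or ROLLBACK on exception
        conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)",
                     (sku, qty))
        conn.execute("UPDATE inventory SET stock = stock - ? WHERE sku = ?",
                     (qty, sku))

create_order("widget", 3)
stock = conn.execute(
    "SELECT stock FROM inventory WHERE sku = 'widget'").fetchone()
print(stock)  # (7,)
```

Against a remote database, each statement issued from the app is a round trip inside the open transaction; pushing the whole function server-side shrinks the window during which the inventory row stays locked.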
For example, you can look up millions of rows, manipulate some data, aggregate some other data, and in the end return just a result set, without shuffling data back and forth between client and server.
you can do that equally well in a stored procedure and a single query
like, you can write a query which does all of these transforms in sequence, and returns the final result set
the data that goes between client and server is only that final result set, it's not like the client receives each intermediate step's results and sends them back again?
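A small illustration of that point, using Python's sqlite3 and invented tables: the intermediate CTE results stay inside the engine, and only the final aggregated rows ever reach the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'eu', 10), (2, 'eu', 30), (3, 'us', 25);
""")

# Filter then aggregate in a single statement; only the final,
# already-aggregated rows cross the client/server boundary.
rows = conn.execute("""
    WITH big AS (
        SELECT * FROM orders WHERE amount > 5
    ),
    by_region AS (
        SELECT region, SUM(amount) AS total FROM big GROUP BY region
    )
    SELECT region, total FROM by_region ORDER BY region
""").fetchall()
print(rows)  # → [('eu', 40.0), ('us', 25.0)]
```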
If you’re going to mix multiple queries with procedural logic — eg running query A vs B depending on whatever conditions based on query C, then a stored proc saves you the round trips versus doing it in your app code. That’s all he’s saying.
It seems you don't have a lot of experience or understanding regarding stored procedures.
Obviously if you just take a single query and turn it into a stored procedure then yes, the round-trip cost is the same. This seems to be where your knowledge ends. Perhaps we can expand that a bit.
Let's look at a more involved example with procedural logic. This would be many, many round trips.
I'm not exactly endorsing that example. Personally, I would almost never choose to put so much of my application logic into a stored procedure, at least not as a first choice. This is just an example of what's possible and not something I am endorsing as a general purpose best practice.
With that caveat in mind, what's shown there is going to be pretty performant compared to a bunch of round trips. Especially if you consider something like that might need to be wrapped in a transaction that is going to block other operations.
Also, while you may be balking at that primitive T-SQL, remember that you can write stored procedures in modern languages like Python.
good lord man, the condescension is so thick and rich, it's like i'm reading an eclair, and the eclair is insulting me based on its own misunderstanding of the topic of conversation
That'll happen when you're publicly and confidently wrong. If you scroll up, you'll find any number of posts where folks pointed out your misconceptions in a more kindly fashion. When we factor in the fact that you're wrong (as opposed to my post which is correct, informative, and cites examples) I think many would say your incorrect assertions are a lot ruder and less HN-worthy.
Sometimes folks know more about a given thing than you do. That is okay. The goal is to learn. I am sure you know more than I do about zillions of things. In fact, that is why I come here. People here know things.
haha, man, the absolute _chutzpah_ you need to make (incorrect) accusations like this, even as an anonymous person on the internet, is really breathtaking
i hope you reflect on this interaction at some point
You seem to be firm in your objectively wrong belief that:
a stored proc is just a query saved on the db server, nothing more
Absolutely not. They can contain procedural logic as well. You can do a wide range of things in a stored proc that are far beyond what can be done with a query. Again.... I provided some links with examples. You don't need to believe me.
i hope you reflect on this interaction at some point
If you're just writing a single access app for a DB I agree it doesn't really matter much. Where SPs really help is when you're maintaining a DB that multiple projects from multiple teams access and you can present them a single high-level API to use, so that Accounting and Fulfillment aren't reinventing the same wheels over and over. So it's more about stack management than anything else.
My first thought is to scream in horror as Bobby Tables says hi to "string interpolation". Complaining about evil ORMs and then just asking to get taken to the cleaners...
I really feel like a good chunk of the discussions around ORM/not ORM and such are because the core method of interacting with our databases sucks.
Passing strings around is an awful interface, and maybe if we explored the space to fix that, a good chunk of these subsequent problems would be obviated.
Man, you know what you can really just mostly ignore when you’re using an ORM?
Injection
If I’m building an app or an API, I will ORM till I die (as long as it’s Django). If I need anything so much as a groupby, though, I will drop right into SQL or create an aggregate table that my ORM can just do a basic SELECT from.
If you’re just worried about injection, you just need to use bound parameters instead of string interpolation. Boom, avoided at the database driver level even with plain sql. I admit though that some cases of string interpolation can be harder to catch in your code than when using ORMs.
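A minimal demonstration of the difference with Python's sqlite3 (any DB-API driver behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

evil = "nobody' OR '1'='1"

# Interpolation: the attacker's quotes become part of the SQL text,
# so the WHERE clause matches every row.
leaked = conn.execute(
    f"SELECT name FROM users WHERE name = '{evil}'").fetchall()
print(leaked)  # → [('alice',), ('bob',)]

# Bound parameter: the driver sends the value out of band, the
# payload is compared literally, and nothing matches.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (evil,)).fetchall()
print(safe)  # → []
```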
Humans make mistakes and we can't rely on them not to make mistakes in software development, yes. But preventing SQL injection is not really an "oopsie". It's a fairly easy thing to spot and prevent, and none of the code I've ever reviewed had a SQL injection bug in it, because string interpolation looks completely different from parametrized queries.
I have no problems with ORMs, but you make it sound like it's as easy for a SQL injection bug to go into production code as a memory bug in C. You'd have to actively be fucking up at that point, and if a Junior dev does it, it's a 5 minute talk and they'll never do it again.
I didn’t say you had to be magic about it in the way of git gut, no.
One can, for example, hook up a static analyzer to your GitHub and configure it to complain about any SQL calls that can't be automatically verified as free from interpolation of potentially unsafe data, unless those dynamic calls are marked with a comment annotation as having been individually verified by a human as definitely not interpolating potentially unsafe data. I suspect such an analyzer may already exist as a SaaS offering.
If nobody lies about those annotations, a static analyzer can handle the common cases. If you have a human faking such annotations, whichever motivation causes them to do that would create other security issues even with the ORM.
And the analyzer could even handle some kinds of dynamic sql depending on how aware it is of your programming language and type system and what markings you’re willing to apply to functions and/or data with potentially unsafe and/or safe origins. Being vague only in the service of generality here since the specifics vary so much by company.
If there is no SaaS service for this, maybe one ought to exist.
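As a toy illustration of the idea (assuming Python source and the DB-API `execute` convention; a real analyzer would also need to track tainted data across functions), Python's `ast` module can flag the common cases:

```python
import ast

def find_interpolated_sql(source: str) -> list[int]:
    """Flag .execute() calls whose SQL argument is built by f-string
    interpolation or string concatenation (possible injection)."""
    risky = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args
                and isinstance(node.args[0], (ast.JoinedStr, ast.BinOp))):
            risky.append(node.lineno)
    return sorted(risky)

code = '''
cur.execute(f"SELECT * FROM users WHERE id = {uid}")
cur.execute("SELECT * FROM users WHERE id = ?", (uid,))
'''
print(find_interpolated_sql(code))  # → [2]
```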
Who is making those claims? Really? I've used Hibernate probably about as long as you have, and to me: it's fine. Is it perfect? Of course not, but if I have to make the trade-off between Hibernate (or nowadays Spring Data) or rolling my own abstraction to interact with the database, for any kind of non-trivial application, I would not roll my own.
When you roll your own, you know what you end up with: still an ORM!
And as with many things (date and time zone math, encryption, writing databases, and ORM): if you can pull in a specialized tool, you can focus on your core business, and don't have to become an expert in a field that's not your core business.
It’s frustrating to see “not needing to know SQL” as a supposed reason for using an ORM. If you don’t have a decent foundational knowledge of SQL and relational DBs, of course you are going to have a bad time with ORMs.
ORMs can help you to write less SQL, and less boilerplate for getting data in and out of the database. But pretty soon you’re going to need to debug something, and without a solid understanding of DBs and SQL, you’re going to be hopelessly lost.
This year happens to be the first year when I worked with SQL enough to say that I know my ass from a hole in the ground in the language, and my job title is Senior Backend Developer and I have 11 years in tech and 5 as a developer. My journey with SQL is basically the GIF of Colin Farrell running in circles avoiding the violent American tourist in In Bruges; I have some good qualities as a developer but diligence in learning SQL is not one of them.
I learned SQL many many years ago using Microsoft Access. It has (or at least had, I don’t know how it’s changed) a great visual query builder which would generate and run the SQL for you.
Seeing what SQL it generated, and the data it produced from the diagram I had created, was really helpful in learning interactively. Any tool that does this (well enough) should help a lot.
Most of our projects are mysql, and in those we don't use any db-specific features. Those all use in-memory sqlite for tests. Only in one of our projects do we use jsonb with postgres, that one does use postgres for the tests as well. Though, that project probably didn't need it - we're just sort of stuffing semi-regular data from another system into it. It would have worked just as well as a text field that was parsed as-needed, since as far as I remember we don't query it directly, only query based on other columns in that table.
Yes, that looks like a docker-compose config. But, it doesn't really matter. The point is that you could run MySQL in a way such that its storage (the actual data files) are on a virtual filesystem running in RAM.
It's a nice way to do it, IMO, because I prefer testing against a real DB instance in my projects anyway (rather than pretending that a unit test against a fake 'FooRepository' interface proves anything at all).
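For what it's worth, the tmpfs trick looks roughly like this in a docker-compose file (service name, image tag, and port mapping are illustrative):

```yaml
services:
  test-db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: test
      MYSQL_DATABASE: app_test
    tmpfs:
      - /var/lib/mysql   # data files live in RAM, so tests run fast
    ports:
      - "33060:3306"
```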
I come from Java and moved to Node.js for backend work four years ago. I now use Sequelize as my ORM. Most of your pain is due to Hibernate itself. It may be the only really enterprise-level ORM, but it is a pain in the ass.
Sequelize is far simpler, and writing code with it is a joy compared to Hibernate.
When you say nobody uses different databases, that's true 99% of the time, but I worked with a company that developed a tool that had to be deployed within customer infrastructure. These types of customers force the DB choice on you, since they have highly paid DB support teams (financial sector), so the tool had to support multiple DBs.
A year ago I had to develop a big Java application without an ORM (the CTO's choice): I had forgotten how tedious, error-prone and slow development without an ORM is!!! Never again!
I think the best approach is to use an ORM for common CRUD tasks and add specific SQL queries when things get a little complicated.
Hibernate is not that bad. The problem with ORMs in the Java web space is lots of people used them as an 'extreme DRY' solution to link database 'objects' to the front-end which breaks separation of concerns and forces you to use the same model or resort to ugly hacks for further mapping.
In a properly separated application it works fine.
> A year ago I had to develop a big Java application without an ORM (the CTO's choice): I had forgotten how tedious, error-prone and slow development without an ORM is!!! Never again!
To be fair, JDBC is an awful, awful API. In a sane language, with a sane SQL API/library, there's really no reason you shouldn't be able to just pull out a statically typed tuple (with proper handling of NULL, unlike JDBC) from a query result in one line.
I don’t get how people jump from “it doesn’t work sometimes” to “never use it”.
You can write raw sql when the ORM fails. But for many queries the ORM does work, so why not use it?
You don't always know beforehand when, where, or how it's going to "fail". SQL doesn't fail. Sure, *I*, the programmer, could use either incorrectly, but that's not what I mean. I mean that the ORM library is always a bad abstraction that hides the features of the DB you're actually using, has gotchas and workarounds, or both.
If you use both, now you have to deal with integrating them both into your code in some semi-consistent way. Most (some? just me?) of us would balk if someone said they included two different HTTP request libraries into their project because one of them was more convenient most of the time, but the second one was the only one that worked for some of the requests they have to make.
ORMs are an investment. They're always complex and leaky abstractions that require a significant amount of time to learn and wrestle with. If they only serve to make the easy queries a little bit easier, but you still need SQL for the actually-hard stuff, then it's hard to imagine a scenario where that investment is actually worth it.
> If you use both, now you have to deal with integrating them both into your code in some semi-consistent way
I have no idea what you mean by this. What does it matter to my business logic whether a repository function got data from an ORM or from Raw SQL? In either case, I would return the same kind of data.
The edge of your application, where side-effectful stuff like DB queries sits, is exactly where you don't need to do anything consistently. You're always fetching random stuff. At some point, you need to turn that into a format that your code understands. That doesn't change if you decide to only use raw SQL.
I apologize for not being clear, but it was somewhat intentional because I didn't want to get dragged down into arguing about a specific ORM if I made too concrete of an example.
What I'm thinking of is situations where you might have multiple queries/statements that need to be done together within a transaction. Depending on exactly what your ORM's API is like, it may or may not be easy to run a piece of ORM-using code with a piece of Raw-SQL-using code inside the same transaction at that "application edge".
It can also be especially tricky if the ORM in question is one of those real heavyweight solutions that does caching, "dirty" checks for updates, etc. You have to be careful if you ever mix Raw-SQL functions with ORM functions in a use-case.
So, those are the kinds of things I mean. This probably isn't much of an issue for more lightweight, simple, ORMs, but it's an issue I've run into with at least one of the big, heavyweight, ones.
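Whether this is easy depends entirely on the ORM's API. With a lightweight ORM that exposes its underlying connection, sharing a transaction can be as simple as this sqlite3 sketch (`TinyRepo` is a made-up stand-in for an ORM repository, not a real library):

```python
import sqlite3

class TinyRepo:
    """Stand-in for an ORM repository (hypothetical API)."""
    def __init__(self, conn):
        self.conn = conn

    def save_user(self, name):
        self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE audit (msg TEXT)")

# The "ORM" call and the raw statement share one connection, so they
# commit or roll back together as a single transaction.
with conn:
    TinyRepo(conn).save_user("ada")
    conn.execute("INSERT INTO audit (msg) VALUES (?)", ("user created",))

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # → 1
```

The heavyweight ORMs are the hard case precisely because their session caches and dirty checks sit between your code and any such shared connection.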
This is why I moved to jOOQ years ago. Nice lean abstraction over SQL, to the point where it's just a typesafe SQL builder and also does the mapping to domain objects if you want. Abstracts and polyfills SQL capabilities just enough for you to be able to port your queries between DBs if desired.
I'm a big fan of jOOQ. I've been able to implement the most complicated SQL expressions in jOOQ. Though there is one particular use case that it cannot do, but my memory fails me. There is an open issue on GitHub for it that's been open for years. Luke, the creator of jOOQ, has not been able to figure out how to implement it. But no worries, it's still my favorite way to access my DB.
I've only had a brief stint in Java in my career, but I got to learn about and use jOOQ, and I think it's such a fantastic option in this space. I'm still a diehard SQLAlchemy fan, and I'd use it in Python-land. For Go, I think sqlc is a decent option, but it's no jOOQ. I'd love jOOQ for Go.
jOOQ is far and away the best library in this class for any ecosystem.
It's so good you would need to think long and hard not to choose the JVM for a SQL heavy application because it's just that damn good and most other library requirements are relatively interchangeable.
I dislike ORMs, but good ORMs are significantly better than none at all. They act as a fairly simple layer between the database and the application, and are great for inexperienced engineers to get started with an experienced engineer helping out occasionally for performance hints.
Furthermore, they are a good adapter in general. Converting between the way the world looks to a DB and the way it looks to a web app is always a time-consuming effort.
I would have to say that overall ORMs have saved me more hours than they wasted over the years, with AREL in ruby being pretty much 80% time savings minimum. Sure there are occasions where I just drop down to SQL, but most of the time it is just a win. It is the perfect library: Make simple things simple, and complex things possible. Which includes AMAZING debug logging in rails 7. Like game-changer style.
I've used Hibernate in the past, and I feel like it probably saved me 20% of the time, but required me to learn a complex system. It did have some nice features like caching and such.
The ORM itself matters, I strongly do not believe ORMs are an anti-pattern.
As far as the original selling points... meh
- use any database, never worth even thinking about this. If you switch DBs you will have issues.
- you don't need to know sql. True, but eventually you will need to learn it. But I've seen Jr. engineers get far and use this as a learning experience. I think thinking of the problem from both the Sr. level and the Jr. level is important. I happen to have a TON of db experience and am often the guy debugging sql, even if I'm on the front-end team, but many people simply aren't there yet.
Edit: To everyone saying "just let me write normal sql". I want to challenge you to think of how to extract knowledge of relationships and query patterns into a codebase without constantly repeating the same damn joins. SQL is a bad language, but it is the one we have. And so ORMs solve a lot of SQL's weaknesses. They are _terrible_ for writing custom reports, but are excellent for making structured backends. In the end, that knowledge abstraction you'd make if you wanted to abstract the knowledge of data relationships into a codebase ends up being... an ORM.
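To make that challenge concrete with a toy Python sketch (the tables are invented): the relationship knowledge lives in one constant, and every query that needs the join reuses it instead of restating it. Grow this pattern far enough and you have, in effect, rebuilt an ORM's query layer.

```python
# The only place that knows how users relate to orders (hypothetical tables).
USER_ORDERS = "FROM users u JOIN orders o ON o.user_id = u.id"

def recent_orders_sql() -> str:
    # LIMIT left as a bound parameter for the driver to fill in.
    return f"SELECT u.name, o.id {USER_ORDERS} ORDER BY o.id DESC LIMIT ?"

def order_counts_sql() -> str:
    return f"SELECT u.name, COUNT(o.id) AS n {USER_ORDERS} GROUP BY u.name"

print(order_counts_sql())
```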
We switched database providers and our ORM saved us a ton of effort. Sure we ran into issues due to differences between the DBs, but every single insert, update and JPQL query in our app ported over without issue. The few places where we used native SQL, on the other hand, needed to be tweaked and rewritten. If we were making heavy use of native SQL and stored procedures, it would have been an order of magnitude more time consuming.
I used to be of the same opinion that swapping DB providers isn’t worth worrying about, because you’re screwed regardless. But our company was bought, and forced to make the change, and the ORM made a huge difference.
> One of the selling points, which is now understood to be garbage, is that you can use different databases.
It was a major selling point back in the day. You can say it's a legacy concern now, but it was definitely a thing pre-cloud/SaaS.
Lots of software used ORM to offer multi-database support, which was required when you sell a license and users installed it on-premise. Some organizations strictly only allowed a certain brand of database.
You couldn't spin up a database of any random flavor in AWS, Azure or GCP. There were in-house DBAs and you were stuck with what they supported.
Lord, I can't say enough how glad I am those days are over. Even with ORMs, the RDBMSes of the early/mid 00s had enough differences that it was a colossal undertaking getting apps to run on multiple DBs.
Sales team: We've got a huge potential client, but they require DB2.
Eng team: <spends 6 months getting shit to work on the insanity that is DB2>
Sales team: Damn, after all that, the potential client went with another vendor.
Eng team: Please pour gasoline on us, we'll light the match.
I have always thought that an ORM with overrides could be the answer. At the call site, the objects to be extracted/inserted are defined. The implementation checks whether there is a DBA-written query and executes that, or tries to generate something. A fancy DAO with auto-generation, if you will.
If you want any sort of database maintainability, you just cannot have queries concatenated from strings scattered around the code, especially in environments with code hot loading. Otherwise, database migrations quickly start requiring shims for the old interface. So ORMs/DAOs are absolutely necessary in any larger application just to maintain (hehe) maintainability.
At that point, why not have the abstraction layer auto-generate queries? DBA time is much better spent optimizing those "few" queries that do matter for performance than writing thousands of straightforward CRUD queries.
> Another selling point which is "you don't need to know SQL", is also garbage
On one hand, programmers do not need to know SQL beyond basic data extraction techniques; a DBA is going to be better than them at the job anyway, if only because DBAs have access to (and actually look at) performance metrics. On the other hand, the auto-generator is going to be worse than a DBA too, therefore "you don't need to know SQL" is garbage. We already have to fight the SQL engine; now we have an ORM layer to fight on top.
Worst mistake I ever made, besides of course the night when we left a fully loaded automatic rifle on the roof of our van, was a move from Tomcat + JDBC to JBoss + Hibernate. It went from fast (JDBC) and stable (Tomcat) to unstable (JBoss) and slow (Hibernate).
It was that period in my life with involuntary dealings with guns and ammo, called the military. I was commander of a security detail and we often went about in our region with our gear and live ammo.
This particular night we had our last stop at about 4AM and went back to base. At base it appeared the UZI belonging to the driver was missing. Headlines like "Boy finds military weapon and empties it at the gas station" are a certain ticket to jail time, so I made sure my concerns resonated with the team. Luckily the driver remembered leaving the UZI on the roof, so we went back to exactly the last spot and spent a good hour searching that neighbourhood.
I think it was then about 6AM and it was getting light. We were getting desperate and I called another commander to assemble his team to help searching. At that point I figured why not replay the whole event with another gun? And so it happened: when that fell off the roof, it landed just a few meters from the UZI.
As a tool for mapping query results to objects and doing crud, hibernate is fine. Certainly it is much better than writing your own hibernate on jdbc. Which you do eventually because everyone gets sick of typing the same boilerplate over and over.
I think you hit a few key points though - don't try to hide the database flavor and don't use JPQL (just write native SQL).
I do. And I use Hibernate to abstract over mysql and oracle, since I want to keep clients that use both.
You need a little more work than that, but it makes it possible.
Hibernate is not the technology I love the most, but it does some non-trivial work, like managing the unit of work, graph caching and mapping, that can be useful. It is also highly prone to being used wrong by devs.
You might not need hibernate, sometimes it might help. It should probably be used less than what you see in the wild, though.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
Or use a query builder if a good one exists for your language, e.g. jOOQ for the JVM. After suffering through years of Hibernate I could not have been happier.
Counterpoint: We are in the process of switching from Oracle to Postgres and the ORM is definitely helping. It would otherwise be a pain. We've literally written a query in SQL and asked ChatGPT to give us the equivalent in the ORM (then checked it was right, of course).
Haha, that’s funny. I wouldn’t trust ChatGPT, but that’s something I may change my mind about in the future, of course, and I’d give it a go too if I had to go anywhere near Oracle. To me it’s quite unappealing, and I’m sure the kind of project it lends itself to is also ‘fun’.
For this use case, I don't have to trust ChatGPT, since I can just read the SQL that the ORM outputs. The project isn't terrible, it's just old. Oracle could very well have been the right choice 25+ years ago.
Big +1 to this. I’ve been using JDBI lately and could not be happier. It is not an ORM, only a lightweight mapper. And it supports all modern libraries and paradigms (Vavr, Immutables, immutable “value classes” …)
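JDBI itself is Java, but the core of a lightweight mapper (bind result rows onto plain value classes by column name, nothing more) is small enough to sketch in a few lines of Python, here with sqlite3 and a made-up `User` class:

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    id: int
    name: str

def query(conn, cls, sql, params=()):
    """Map each result row onto the given dataclass by column name."""
    cur = conn.execute(sql, params)
    cols = [d[0] for d in cur.description]
    return [cls(**dict(zip(cols, row))) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'lin')")

users = query(conn, User, "SELECT id, name FROM users WHERE id = ?", (1,))
print(users)  # → [User(id=1, name='ada')]
```

You keep writing the SQL yourself; the mapper only removes the row-unpacking boilerplate.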
I used to have a similar opinion, but today I'm not so much against it. There was a project where I did not have access to the "official" database (Oracle) and was forced to develop against MySQL instead. Granted, it was a reasonably simple application, but still, most of the queries _just_worked_ on both MySQL and Oracle.
I would also argue that for querying you don't need an ORM (you might be better off with something like jOOQ), but for complex updates, an ORM can be a godsend.
> is that you can use different databases. But no-one uses different databases.
The main selling point to me was that you could write a lot of wicked fast integration tests on top of H2 and exercise your ORM. But that was already at a point where "the testing pyramid" was a diagram that some of your coworkers had seen and were trying to figure out how to explain to everybody else.
So my desire to write fast backend integration tests never lined up with my job responsibilities + opinions on relative test counts.
> The main selling point to me was that you could write a lot of wicked fast integration tests on top of H2 and exercise your ORM
Nowadays in the age of containers you can spin up the database you are using in production easily in a few seconds. Unless you are stuck with using one of the old commercial monsters like Oracle this is not really worth doing anymore.
> Nowadays in the age of containers you can spin up the database you are using in production easily in a few seconds.
Is that a database for ants? At $dayjob “a few seconds” isn’t even enough to send a dump over the network. Which is a moot point anyway because there’s absolutely no way random j dev will get access to all the PII that’s in the production db.
I think the parent pretty clearly meant that you can spin up an instance of the same kind of database used in production. E.g., if you use MySQL in prod, you can spin up a local MySQL docker instance for tests rather than an in-memory not-MySQL option like H2 or SQLite.
ORMs have other selling points. Primarily, an ORM looks like code and works with tooling designed for code.
So changing a column name or type, for example, is pretty easy to refactor, which may not be true for SQL queries and may take significant testing.
Additionally, ORMs allow you to easily hook into things (post-commit, etc.), which is often useful, and to define custom things, like JSON serialization with custom types, without a massive amount of work.
>So changing a column name or type, for example, is pretty easy to refactor, which may not be true for SQL queries and may take significant testing.
You can also get that with raw SQL in the right ecosystem. I once developed a simple application using hasql and hasql-th (in Haskell). Very close to raw SQL strings, but with type checking at compile time. Feels like you're driving a lambo.
I agree with this sentiment, been working against ORMs just as long.
ORMs universally suck in all languages, all projects I've worked on (java, c#, python).
You lose productivity, lose performance, and lose visibility into what your application code is actually doing vs. what it should be doing.
I eventually learned how every ORM was implemented differently and spat out different queries even for the simplest SELECTs, and how tweaking them is a wild trip around the ORM's limitations, reading decompiled ORM code. As an aside, Java ORMs present the distinct displeasure of poorly thought out, incompatible, opaque annotations.
You gain ORM knowledge that does not apply across different ORMs and also subverted SQL knowledge and insights that are immensely useful as a developer.
When my queries are simple, I don't see any reason to use it.
When my queries are complex, I see strong reasons to not use it.
I actively work to remove or not use ORMs in any project I've worked with. The hardest person on the team to convince is usually the newbie or the manager/lead who never has written a single SQL query.
I'll often use different databases, since it allows me to test everything but the database via unit testing, and I've moved projects from one backend to another with relative ease thanks to it.
But I abstract it out to a class filled with database functions that create specific transactions around whatever application I'm working on needs to do. Not via an ORM.
I'm not sure if you're speaking specifically about Hibernate, but if you are making the argument against ORMs generally, I feel like there are some counterarguments to the points you mention:
> Every non-trivial long-lived application will require tweaks to individual queries at the string level
I suspect that the difference in expectations here is about how many of the queries will need that, and whether the number of less sensitive queries is large enough that it's worth having less boilerplate for them, even if some other queries do need to be able to drop into something lower-level. People will quibble about where this boundary is, and maybe the real disagreement is about where to draw the line between "trivial" and "non-trivial" applications, but I don't think the conclusion that ORMs aren't useful in the general case is as obvious as you make it sound.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation
This also isn't obvious to me. I can understand the advantages it provides, but it only lets you move the problem of conversion to native types in your language to outside the query layer, not eliminate it. I think there's a solid argument that splitting up the problem that way is better, but again, it seems more nuanced to me than there being one obvious answer.
> The closer you are to raw JDBC the better
How are you sure that it's impossible to provide any sort of useful abstraction here? There are plenty of other cases where using the lowest-level language possible isn't the obvious correct choice all the time; it doesn't sound like you're writing your code in raw assembly either, so there's obviously some utility in using higher-level abstractions.
“One of the selling points, which is now understood to be garbage, is that you can use different databases.”
Of course there are many cases where it fails, but years ago we /successfully/ used the django orm to write test fixtures that used an in memory sqlite db for quick tests with a mysql production db.
It is not automatic and not free, but it is absolutely not garbage.
If you sell on premise self-hosted software, sometimes you actually need it. A big client will say, your software uses only PostgreSQL but our admin team can only support MySQL. Can your software use MySQL instead of PostgreSQL.
Saying "yes" can help you close a contract worth hundreds of thousands of dollars.
> The older I get the more I like having having separate data and functions.
Am I missing something or is this just OO but with func(obj) instead of obj.func()? Like the data is your instance but you have to track the types and remember what functions work on what. The coupling is just implicit now.
There is an old joke somewhere about some monk and a programmer who is looking for enlightenment and circling through OOP, realizing it has always been FP, which has always been actors, which has always been OOP..
Yes, but when you think some more about it, that means a lot.
There is no such thing as inheritance; you route your data through different functions. There is no such thing as private data; it's all public by default.
You can still have strict types I guess, to make sure you are passing the right kind of data into the right function. But most of my experience coding this way is js and lua.
I only code like this for my home projects, I have no idea what it would be like working in a team of 10 programmers all rushing to jam new features into a giant code base.
> There is no such thing as private data, its all public by default
Not in C: put the struct declaration in the .h and the definition in an implementation source file.
Callers of the implementation functions can pass the struct around without seeing any of the fields. Only functions in the implementation can access the fields, which results in all access to the fields going through the implementation only.
Cleaner and safer than letting the caller see everything while only accessing some things.
Coupling is only implicit if you use a dynamically typed language.
Also, in languages with UFCS func(obj, something, else) can be expressed as obj.func(something, else). The calling syntax can thus remain the same even without defining the function within some arbitrary object.
In the rare case where you need runtime polymorphism it is nice. I also think people overuse it (hello C#).
If you don't need it for polymorphism it's only syntactic sugar anyway, and it gets confusing (imho) when people think it does something else.
Like, how does it make code cleaner when you put the sine function into a class?
And while this might be an extreme example, you can have a function which operates on two data elements. But because you're doing "OO", you arbitrarily put the function into one class. Everything else would be a code smell.
I don't think this is what OO was originally about, but what happened there doesn't speak well for the average developer...
If you mean static methods inside classes, I don't think those are any different from a namespace. They should be thought of as such (e.g. Java's Math::sin; not sure how it's done in C#).
I mean both. Non-static, non-virtual functions are essentially just normal functions where you don't see the "this" pointer. And you call ob.foo() instead of foo(ob).
Of course it's better syntax and often convenient. But many people think there is more to it ("I'm doing object oriented programming"), while it is semantically the same.
PS: I get that for namespaces it's not that bad. But still, why not use a namespace? Now you use classes for two completely unrelated things. Congrats Java and C#, your programming language needs one less keyword.
That's just the "Turing-equivalency" argument; of course in the end we just jump to different points in the code. That doesn't make OOP nonexistent.
> How many man-hours have been wasted on ORM, XML, annotations, debugging generated SQL, and so on? It makes me cry.
Agreed. All my greatest successes with ORMs have been ripping them out of projects I've inherited. The results have been more maintainable code that's faster and less resource intensive.
Every non-trivial project that uses an ORM will, at one point, fetch the whole database.
Either by someone accidentally creating mappings that do it without noticing, or by someone not understanding that there's a whole-ass database in there, so they just fetch everything locally and filter with a for-loop.
I have been coding on Java since v1 and wrote basic ORMs before Hibernate was a thing (yes, in EJB world). There were pretty big and complex projects in fintech and healthcare, but I have never encountered the situation you describe. Maybe I was lucky, but logically you cannot say “every non-trivial project”.
Some years ago Entity Framework followed navigation properties (foreign keys) by default, which quickly escalates. The scenario could be a user table referencing an organisation table, so that when querying for an org, you'd get every user.
I don't think EF ever did that. However, if you used automatic lazy loading and you somehow inadvertently referenced the collection, then yes, it would lazy-load the related entities.
This must be a mistake that you make once as a junior and then never repeat. When you declare a field with a collection, wouldn't it be natural to ask how it works?
Definitely! The problem, though, is that this issue creeps up on you over time, and maybe only in production, because the performance is directly correlated with the amount of data in the database.
But I guess issues like these are what transition you from junior developer into whatever comes next.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation.
That seems like a false dilemma? Yes, ORM is garbage. But that doesn't mean you can't represent SQL in your host language by, e.g., at least its AST, or something else more sophisticated than strings.
If your host language has a rich enough type system, you can even type your tables in a way that's understood by both SQL and your host language. (But that's not too easy. Even mighty Haskell struggles a bit to give JOIN a proper generic type.)
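For a flavor of what "more than a string, less than an ORM" can look like, here's a hypothetical toy fragment type in Python (the names are invented; real libraries like Slonik do this far more rigorously): the query text and its bound parameters travel together, so composition can't mix them up.

```python
from dataclasses import dataclass

# A minimal "SQL fragment" value: text plus its parameters, kept together.
@dataclass
class Sql:
    text: str
    params: tuple = ()

    def __add__(self, other: "Sql") -> "Sql":
        # Composing fragments concatenates text and parameters in lockstep.
        return Sql(self.text + " " + other.text, self.params + other.params)

def eq(column: str, value) -> Sql:
    # Column names come from code; values always become placeholders.
    return Sql(f"{column} = ?", (value,))

q = Sql("SELECT id, name FROM users WHERE") + eq("status", "active")
print(q.text)    # SELECT id, name FROM users WHERE status = ?
print(q.params)  # ('active',)
```

Giving JOIN a properly generic type is indeed the hard part; this sketch punts on typing entirely and only demonstrates the structural idea.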
I've done many database migrations from db2 to MySQL. It's a thing that happens quite a lot. ORMs and Java's usage of database access layers over the underlying sql drivers vastly simplified the process.
Sounds like you have misused Hibernate if you had an 'anaemic domain model'. The only reason to use Hibernate is to have a full domain model and then let Hibernate map it to the DB. I recently worked for a company that was using Hibernate in the way you mention, and I was surprised, as I had never seen it used that way in the ~10 years I used it.
I'm not trying to defend Hibernate, but if used correctly it can be a good tool for a good number of scenarios.
There are tons of scenarios where this is simply not true and an ORM is a huge time saver.
You could be building internal tooling that needs to be flexible enough to work with different databases. You could work for a massive multinational consulting firm like Accenture or professional services division at a company like Microsoft and build applications that any customer with a general relational database should be able to use.
This is an honest question. Have you worked on such systems, and which ORMs have you used with multiple databases?
I ask this because (of course) every database has different features, and some of the ORM stuff might work differently depending on what the underlying RDBMS actually is.
For example, many ORMs have an API where updating a table row will also return the new, updated entity. In my experience, there's usually no alternative (first-class) API to update without returning the updated entity. With PostgreSQL, that's not a big deal: you've mostly just wasted a few bytes of memory allocating an object in your application that you didn't need. With MySQL, though, there's no functionality to get the row(s) that were just inserted or updated as part of the same query, so the ORM always ends up doing two queries when the underlying RDBMS is MySQL; now you're doing twice as many real database queries as you actually want/need.
Another example is pretty much all of SQLite. Lots of ORMs "support" SQLite, but SQLite is sufficiently different from your MySQL, PostgreSQL, MSSQL, etc, that the documentation often ends up listing plenty of "gotchas" and caveats for what things actually work with SQLite, specifically.
There are other small things, but those are the main ones I remember encountering frequently. If you're working on one of these systems, how much can you actually get away with pretending the underlying RDBMS doesn't matter?
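A rough sketch of that doubled round trip, using Python's stdlib sqlite3 as a stand-in for a database without a usable RETURNING clause (table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# On PostgreSQL, a single "UPDATE ... RETURNING *" hands back the new row.
# Without that, the ORM has to fall back to two round trips:
conn.execute("UPDATE users SET name = ? WHERE id = ?", ("bob", 1))  # query 1
row = conn.execute(
    "SELECT id, name FROM users WHERE id = ?", (1,)
).fetchone()                                                        # query 2
print(row)  # (1, 'bob')
```

Multiply that second query by every write in a hot path and the "database-agnostic" abstraction stops being free.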
Yes I’ve typically used Sequelize. There are tons of small differences between databases that add up to a big and annoying waste of time. Quick example: https://www.w3schools.com/sql/sql_unique.asp
I agree ORMs are not a silver bullet and I’ve had my share of pain with Sequelize, most recent being broken migrations specifically with MSSQL when it involved dropping a column with a Unique constraint.
But all in all, it’s still been a big convenience. Part of my career has been in consulting and there, you notice lots of companies are trying to solve similar problems or automate similar business processes.
For example, contract approvals that integrate Dropbox, DocuSign, Salesforce, and Slack. Something like Zapier may not cut it, so you develop a custom app, but it requires storing state and audit records in a database. The app has to work with different databases to cater to different clients' requirements, and the application itself is relatively simple. ORMs are great to use in this case.
Looking back, I actually quite liked it - you had conditionals and the ability to build queries dynamically (including snippets, doing loops, etc.), while still writing mostly SQL with a bit of XML DSL around it, which didn't suck as much as one might imagine. The only problem was that there was still some boilerplate to write, which I wasn't the biggest fan of.
Hibernate always felt like walking across a bridge that might collapse at any moment (one eager fetch away from killing the performance, or having some obscure issue related to the entity mappings), however I liked tooling that let you point towards your database and get a local set of entities mapped automatically, even though codegen also used to have some issues occasionally (e.g. date types).
That said, there's also projects like jOOQ which had a more code centric approach, although I recall it being slightly awkward to use in practice: https://www.jooq.org/ (and the autocomplete killed the performance in some IDEs because of all the possible method signatures)
More recently, when working on a Java project, I opted for JDBI3, which felt reasonably close to what you're describing, at the expense of not being able to build dynamic queries as easily as it was with myBatis: https://jdbi.org/
I don't think there's a silver bullet out there, everything from lightweight ORMs, to heavy ORMs like Hibernate, or even writing pure SQL has drawbacks. You just have to make the tradeoffs that will see you being successful in your particular project.
Can't vouch for MyBatis enough; I've been using it since forever and it has never let me down or produced any sort of frustration.
It strikes the perfect balance: on the code level, you still work with domain objects and the DB access sublimates away into single lines of code (select, update, ...), but they are backed by handcrafted queries that can leverage the full potential of the DB. What MyBatis does is map from an arbitrary result set to your domain object, and it's _really smart_ about it (it knows how to convert most data types and perform arbitrary levels of recursion/nesting of objects, or return collections). For all those things that it can't possibly do (like intermediary JSON conversions or things like that) you have a handy TypeHandler abstraction, which you can tuck away in a dedicated package for autodiscovery so it doesn't pollute your code or queries.
Also, you pay only 3% over raw JDBC in the average case.
If you love MyBatis please donate: it can't go away.
Within the same installation, that's true. Within different installations, not necessarily. GotoSocial uses `bun` as its ORM which allows you to use the same code for a Postgres, MySQL, SQLite or MSSQL backend[1]. Which is extremely handy if you're writing something for other people to use in a system you (obviously) don't control.
[1] Although I believe GTS only officially supports PG or SQLite.
This mirrors my opinion about ORM quite well, but it's missing an important angle that might help understanding why ORM made it that far in the first place:
The big selling point, in hindsight, was running document-style persistence on relational. Today we happily dump JSON blobs in there and we accept that we'd have to deal with the consequences if it ever came to querying on some oddball property (chances are consequences won't even be so bad), but back then it was always relational first, and relational everything. All that busywork of spreading things like the change history of the user's favorite ice cream out to glorious relationality, stuff that will never ever be accessed outside its natural tree structure ("the document")? ORM were excellent at that.
Now that we have arrived at something that I'd consider close enough to consensus that some data is fine to keep in document blobs (instead of spreading out), the value proposition of ORM is much, much smaller than it used to be.
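What the document-blob consensus looks like today, in a quick sketch with Python's stdlib sqlite3 (the JSON1 functions ship with most modern SQLite builds; the schema here is invented):

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefs (user_id INTEGER, doc TEXT)")

# The user's ice-cream history stays in its natural tree structure,
# dumped as one JSON blob instead of being spread across tables.
doc = json.dumps({"ice_cream": {"favorite": "pistachio",
                                "history": ["vanilla", "mint"]}})
conn.execute("INSERT INTO prefs VALUES (1, ?)", (doc,))

# And if it ever comes to querying on some oddball property,
# the consequences really aren't so bad:
fav = conn.execute(
    "SELECT json_extract(doc, '$.ice_cream.favorite') "
    "FROM prefs WHERE user_id = 1"
).fetchone()[0]
print(fav)  # pistachio
```

The relational-everything busywork the ORM used to automate simply never happens here.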
"But no-one uses different databases." Not true. A client wanted an on-premises instance of our product and specified SQL Server while we had only used MySQL before. Due to the ORM there were only tiny changes required to be up and running very quickly.
Thanks man, I really feel the same when what should be 2 joins becomes an all-day hellscape of inspecting ORM-emitted SQL, to debug why its blizzard of exists-subqueries is absolutely jacked, and why fixing that breaks other code depending on such bugs.
As a side note, I actually do use an ORM to support multiple databases with LLDAP, since it's a self-hosted application that can be used with either SQLite (for the most lightweight setup), or PG/MySQL for a more HA-friendly setup.
> Every non-trivial long-lived application will require tweaks to individual queries at the string level.
It's possible we could have LLMs tweak queries to perfection given ORM-style input. Obviously ORMs of the past don't do this, but it's not impossible in the future.
Fetching all of db.users and then doing the filtering in Python is an anti-pattern, which is horrible. But an LLM could optimize that code into an SQL query that does the filtering in the database.
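For illustration, a minimal before/after of the rewrite being described, in Python with stdlib sqlite3 (the table is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO users (age) VALUES (?)", [(15,), (22,), (41,)])

# Anti-pattern: fetch every row, then filter in application code.
all_rows = conn.execute("SELECT id, age FROM users").fetchall()
adults_slow = [r for r in all_rows if r[1] >= 18]

# Optimized: push the predicate into SQL so the database does the filtering.
adults_fast = conn.execute(
    "SELECT id, age FROM users WHERE age >= 18"
).fetchall()

assert adults_slow == adults_fast  # same result, one full-table transfer fewer
```

The transformation itself is mechanical; whether you need an LLM for it is the point of contention below.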
You don't need an LLM to be able to write SQL though?! Aside from attempting to shove LLM into every conversation, I'm failing to see its relevance here.
You still have to learn the ORM syntax and quirks as well, since they may be neither intuitive nor consistent. Why not just focus learning efforts on the (more) global skill of SQL queries instead of the leaky abstraction that is ORMs generating SQL queries?
I don't think this is a good example of where ORMs fall apart. If you have devs doing the first thing you have issues.
ORMs fall apart when they either do unexpected things, or "expected" but often terrible things, like lazy-loading data that causes a bagillion queries.
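The lazy-loading blowup is easy to demo. A toy sketch in Python with stdlib sqlite3 (schema invented), comparing N+1 lazy loads against a single JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orgs  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, org_id INTEGER, name TEXT);
    INSERT INTO orgs VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO users VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1: one query for the orgs, then one lazy query per org for its users.
pairs_n_plus_1 = []
for org_id, org_name in conn.execute("SELECT id, name FROM orgs ORDER BY id"):
    for (user_name,) in conn.execute(
            "SELECT name FROM users WHERE org_id = ? ORDER BY id", (org_id,)):
        pairs_n_plus_1.append((org_name, user_name))

# One JOIN: a single round trip fetches the same data.
pairs_join = conn.execute("""
    SELECT o.name, u.name FROM orgs o
    JOIN users u ON u.org_id = o.id
    ORDER BY o.id, u.id
""").fetchall()

assert pairs_n_plus_1 == pairs_join  # same rows, 1 query instead of N+1
```

With two orgs this is harmless; with ten thousand parents it's the "bagillion queries" failure mode.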
I've never worked at a place that didn't use different databases. Maybe we're not talking about the same thing, but how are you developing new things if you don't have separate databases? Are you making changes directly on your production database while you're creating new things? How do you test things?
Do you keep everything in the same database, or do you have multiple and then collect data from different APIs?
Not that you need an ORM to have several or indeed different databases. I'd argue that it doesn't even make it that much easier.
They mean database agnostic - that you can easily/magically migrate your application from a DBMS to another and the ORM will handle it for you. That usually works only for trivial databases and does not include data migration.
It's unlikely that you would be using different DB vendors. Different databases for different environments, that you absolutely should be doing, but pgsql in one and mssql in another is super unlikely.
> The proper way to build a data layer is one query at a time, as a string, with string interpolation. The closer you are to raw JDBC the better.
This fits in my head a lot better than an ORM would but I'm a database guy so my brain is already wired different. How do you handle CRUD here? Like if I have a dozen columns in a table and I want to update two of them is that one bespoke query? Or do you always update the full record? Or do you do some dynamic SQL with the skeleton of an update?
This feels like a basic question but it's still one I don't grasp all these years later.
I always make sure to update only those columns that have changed, if I know them in advance. This is to minimize my data getting polluted with bad data should a bug creep into application logic.
That said, there are often times it's difficult to know in advance. For instance, a user profile update, which is a web form with like 10 fields, all of which can be edited. Even if you can figure out the exact fields that have changed, writing that bespoke query will be intractable, as it will lead to a combinatorial explosion of update queries.
> This is to minimize my data getting polluted with bad data should a bug creep in application logic.
Yeah makes sense, seems like begging for a race condition. In Postgres it also generates vacuum churn.
So if you do know that I updated a few fields how does that work? Do you run a few single-field updates in a single transaction or do you generate dynamic SQL?
In the case I described I just update all the fields. Anything else is just way too tedious and bug-prone. I hear that ORMs come in handy in such cases, but then they have their own set of bigger problems, as others have pointed out here.
The good part though is that such update-all-the-fields cases are very few. In 90-95% of updates I know exactly which fields are getting updated and for what reasons.
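When you do want "update only what changed" without a bespoke query per field combination, one common middle ground is generating the SET clause from the changed fields while keeping the values parameterized. A toy Python sketch with stdlib sqlite3 (column names must come from a trusted allowlist in your code, never from user input):

```python
import sqlite3

def update_user(conn, user_id, changes: dict):
    # Build "col = ?" only for the columns that actually changed.
    # Keys are trusted code-side column names; values are always bound
    # as parameters, never interpolated into the SQL string.
    assignments = ", ".join(f"{col} = ?" for col in changes)
    sql = f"UPDATE users SET {assignments} WHERE id = ?"
    conn.execute(sql, (*changes.values(), user_id))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, age INTEGER)"
)
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@x', 30)")

update_user(conn, 1, {"email": "alice@x", "age": 31})  # touches 2 columns only
row = conn.execute("SELECT name, email, age FROM users WHERE id = 1").fetchone()
print(row)  # ('alice', 'alice@x', 31)
```

This is "dynamic SQL with the skeleton of an update": one small helper instead of 2^n bespoke queries, and untouched columns are never rewritten.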
My favorite thing about Hibernate is that it’s sophisticated enough that it has its own query language, which is very, very similar but not identical to SQL. Makes me want to take a drink just thinking about it.
What's your opinion on primarily SQL based toolkits which mainly serve to serialize / deserialize results? For example
Android Room (you define DAOs with methods, and annotate each method with an SQL query; Room takes care of deserializing results into your domain types, etc.).
Or Sqlc in Go: you write SQL with annotations, and sqlc will generate Go code corresponding to them.
There's also libraries like JDBI which lend to less structured usage while still avoiding most boilerplate.
I am optimistic that we are converging on the good parts of the ORMs without the baggage that hibernate brings.
> What's your opinion on primarily SQL based toolkits which mainly serve to serialize / deserialize results?
Without `crud` or some other interface, I would probably have yeeted Go into the sea for all my db-related work because `sql` is horrible to work with.
I've used ORMs quite a bit and I agree with you, but the advantage for me has been getting the database objects into your application, and to a lesser degree good ORMs providing a schema for other code to integrate with.
It's not super popular but I've been using Scala's Quill quite a bit lately. I don't think it's an ORM exactly, it's more like LINQ and automatic mapping between the database and case classes, but it's really nice and does everything I want from an ORM.
I 100% agree. My goto is «hibernate makes simple things easier and hard things harder».
JDBC can be a little bare-bones so I prefer something like JDBI.
Years ago, I worked on a project that used an ORM and multiple databases. An ORM made sense at that point, as setting up and running the database properly was an exercise in self restraint at the best of times. Nowadays with RDS and/or docker, there's no excuse.
That said, ORMs offer one killer feature - code gen/serialisation.
> One of the selling points, which is now understood to be garbage, is that you can use different databases
Yup, and even if the ORM supports multiple databases, once you have committed to one database, it's almost impossible to switch to another due to the different conventions used for data mapping.
Like most 10x technologies such as ORMs, AWS, and high-level languages, the goal should be to gain the advantage they provide early on but have a plan to move toward technologies with better long-term outlooks (ideally home-grown alternatives).
Well, in ORM's defense, it auto-generates the "data transfer objects" from the schema for you, so you don't have to. You can save minutes of typing in exchange for years of painful maintenance.
I used to agree 100% with this sentiment, as dissatisfaction with available ORMs at the time (early days of Doctrine in PHP) drove me to actually write my own. It turned out to be an amazing exercise in why ORMs are hard.
Anyway, a few years later I was in a position to start things fresh with a new project, so I thought to myself: great, let's try to do things right this time. So I went all the way in the other direction - raw SQL everywhere, with a great SQL analyzer lib (https://github.com/ivank/potygen) that would strictly type all the queries and format them with prettier - which kinda plugged all the possible disadvantages of raw query usage and was a breeze to work with … for me.
What I learned was that ORMs have other purposes - they kinda force you to think about the data model (even if giving you fewer tools to do so). With the amount of docs and tutorials out there, they allow even junior members of the team to feel confident about building the system. I'm pretty used to SQL, and thinking in it and its abstractions is easy for me, but it's a skill a lot of modern devs have not acquired, what with all of our document DBs and ORMs, so it was really hard on them to switch from thinking in objects and the few ways ORMs allow you to link them, to thinking in tables and the vast amounts of operations and dependencies you can build with them. Indexable JSON fields, views, CTEs, window functions, all that on top of the usual relational theory … it was quite a lot to learn.
And the thing is, while you can solve a lot of problems with raw SQL, ORMs usually have plugins and extensions that solve common problems: things like soft delete, i18n, logs and audit, etc. It's easy even if it's far from simple. With raw SQL you have to deal with all that yourself, and while it can be done and done cleanly, it still requires intuition about performance characteristics that a lot of new devs just don't possess yet. You need to be an SQL expert to solve those in a reasonable manner, while a mid-level dev could easily string along a few plugins and call it a day. Would it have great performance? Probably not. Would it hold some future pitfalls because they did not understand the underlying SQL? Absolutely! But hey, it will work, at least for a while. And to be fair, they would easily make those mistakes with raw SQL as well, but with far fewer resources for understanding why it fails, because ORMs fail in predictable ways and there are usually tons of relevant blog posts and such about how to fix them.
It just allows for a better learning curve - learn a bit, build, fail, learn more, fix, repeat. Whereas raw SQL requires a big upfront "learn" cost, while still going through the "fail" step more often than not.
Now I'm trying out an FP query builder / ORM - Elixir's Ecto - with the hope that it gives me the best of both worlds … time will tell.
I feel these complaints are going to become a non-issue when we can write the optimal SQL then ask AI to do the conversion to ORM code. Or just ask AI to write the optimal ORM code with known data/performance considerations.
More languages are getting destructuring semantics built into them. I think this problem is slowly taking care of itself. You write a query that renames everything to your struct names, or you write a mapping that does it, then let the programming language do its thing, instead of a third party library doing it.
Essentially your code for talking to the database starts to look more and more like the code that deals with JSON responses, which all just look like quasi-idiomatic code in the programming language.
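For instance, in Python the mapping can be almost entirely the language's own destructuring; a small sketch with stdlib sqlite3 (schema invented):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# Name the query's columns to match the struct, then let the language
# do the destructuring - just like unpacking a JSON response.
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT id, name FROM users WHERE id = 1").fetchone()
user = User(**dict(row))
print(user)  # User(id=1, name='alice')
```

No mapper library needed: the query names the columns, the struct names the fields, and the language bridges them.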