You always end up having to do the thing you were trying to avoid, plus deal with the middleman abstraction you introduced while trying to avoid it. Any time I hear someone say "so you don't have to," I wince and anticipate upcoming technical debt.
So "use an ORM so you don't have to write and maintain SQL" is bad advice, because writing and maintaining SQL is not expensive, but using an ORM is (at least in my experience: I've run into many cases where the ORM generates pathological queries that aren't fixable without throwing away all the code that was calling into the ORM, whereas a bad query in plain SQL is usually readily accessible).
On the other hand, "use Erlang's hot code loading so you don't have to restart nodes to update code" is not bad: hot code loading isn't super expensive (you have to do a little planning for some types of changes), but restarting nodes is (first you have to move all the traffic off, then actually restart, then move the traffic back).
Not knowing how to write proper SQL is a problem, and an ORM won't solve that for you.
The correct answer is to forget about database independence and double down on just one so you can use it to its full potential.
And obviously it should be Postgres...
When you are not writing a SaaS but shipping a piece of software for customers to run on their own servers, supporting different database vendors is a very common requirement - especially in enterprise software.
There are non-abstractable differences once you no longer have a toy project, e.g. locking and isolation semantics.
There might be some valid reason for making your applications talk to different DBMS, but you have a tough slog ahead no matter how you do it.
Yeah, sure, go tell that to your manager who promised multi-vendor compatibility to your clients.
Problems with the ORM nowadays always stem from developers who are ignorant of SQL. Those people should not be working for you anyway.
Okay, I'll tell him that 5x the systems means more work, possible bugs, etc. Seems logical.
I'll tell him the same thing if he promises Windows, Mac, Linux, Solaris, and Plan 9 compatibility.
I'm not saying you shouldn't be able to use the same application with several databases; I'm saying you should avoid it.
The working approach is: Use Hibernate (or any other ORM) so that you only need to write the SQL that matters.
There is nothing wrong with using a tool to make your life easier, and there is a lot of necessary-but-borderline-boilerplate work that needs to be done when interacting with a database: validation, caching, transaction management, etc.
An ORM can handle the 90% use case for these things, freeing the programmer to focus on the 10% where the business value actually lives.
Using an advanced tool doesn't remove the need to understand what is going on under the hood: it just eliminates the need to care about the inner workings until they interfere with solving the problem-at-hand.
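A rough sketch of that 90/10 split in Python. The table and helper names are made up for illustration, and the stdlib sqlite3 module stands in for a real database: a thin generic helper handles the routine inserts an ORM would generate, while the query with actual business value is written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")

# The "90%": generic boilerplate an ORM would normally generate for us.
def insert(table, **cols):
    keys = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(f"INSERT INTO {table} ({keys}) VALUES ({marks})",
                       tuple(cols.values()))
    return cur.lastrowid

insert("users", name="Ada", plan="pro")
insert("users", name="Bob", plan="free")

# The "10%": the query with actual business value, written by hand.
rows = conn.execute(
    "SELECT plan, COUNT(*) FROM users GROUP BY plan ORDER BY plan").fetchall()
```

The point isn't that you should hand-roll this helper; it's that the boring path and the interesting path are different code, and a tool is welcome on the boring one.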
I'd say focus on the 10% that Hibernate sucks at, which will be independent of the business value.
"Use Hibernate", for example, equates to "let third party software I don't understand fiddle around with my bytecode after compilation". That not only crosses the "Nope-line" for me, that even runs past the horizon behind it.
Ignoring that, if a tool like Hibernate requires you to write your business data definitions (i.e. classes) in their specific way, I'd say something's fishy. You'll never be able to untangle the two. Note that, for all their repetitiveness, that's something SQL-based DB interaction approaches would never do.
Edit: It's perhaps worth mentioning Uncle Bob's talk on the "Clean Architecture". I think his arguments on how to structure a codebase based on what is actually important have much merit.
I think the real anti-pattern that leads to ORMs is that languages have no native way of dealing with data relationally. This is the real origin of the impedance-mismatch problem. Languages still deal with data using hierarchical and linear models.
Everything in engineering is about trade-offs. Your bullshit detector should go off whenever someone acts like there are only pros and no cons. The phrase "so you don't have to" seems to pop up when someone is trying to pull the wool over your eyes.
"So you don't have to" is entirely valid.
You have running water so you don't have to walk half a mile to some well with a bucket. The trade off is the whole plumbing infrastructure and the trade which maintains it.
Not saying the situations are that closely analogous, but a huge part of high level languages is not worrying about that issue.
1. Hibernate system working great
2. Idiot notices that one of their queries is slow, blames Hibernate
3. Idiot writes their query in raw SQL because hurr durr SQL is faster than Hibernate. (Bonus points for never benchmarking anything. Extra bonus points for switching to SQL and adding an index at the same time, observing this makes it faster, and deciding it's probably the switch to SQL that made the difference)
4. Idiot notices their query is getting results that are inconsistent with recent changes that went via Hibernate.
5. Rather than making any effort to fix this properly, idiot disables all caching in Hibernate. There, I fixed it.
6. Idiot notices that Hibernate queries are now slow (gee, ya think?) and uses this as a reason to migrate all other queries to raw SQL.
Hibernate doesn't save you from having to design a good data model, nor from having to tune it for performance. You can't turn off your brain when using Hibernate, you do still have to figure out which entities own which side of their relationships, and when you get to a scale where performance becomes an issue you do have to spend a bit of time doing performance tuning, just as you do when using SQL. It does genuinely simplify a lot of things and save you from writing a lot of boilerplate. But if you spend a week tuning your SQL query and no time tuning your Hibernate query and use that as your benchmark, no shit the SQL will be faster, and if your first resort whenever you hit a slow query is going to be to bypass Hibernate rather than working with it then yeah you probably shouldn't be using Hibernate in the first place.
So don't use Hibernate if you are not a Hibernate expert?!
And don't expect to mix SQL and Hibernate operations freely in the same code, any more than you'd expect to freely mix e.g. encoding-aware String operations and manipulation of the underlying bytes. It's possible to make good use of the fact that Hibernate is implemented on top of SQL, CQRS-style: use Hibernate for your "live" operations in your application, use SQL for reporting/aggregation and ad-hoc querying, playing to both their strengths. But don't try to interleave them, especially in the same transaction.
People do this with SQL all the time; they see queries that aren't "fast enough" and they add indexes everywhere, or they "nudge" the query planner the wrong way, or anything rather than accept that maybe their data was wrongly modeled in the first place, or maybe they need to remove indexes, or, you know, actually tune the DB.
In their fictional scenario they didn't learn to use hibernate or SQL.
>And the hypothetical developer in it is a woman.
Don't assume their pronouns!
I read the title as referring to the (male) author's career; I understood him to be telling his own story through these hypothetical characters.
WTF. No concept of the N+1 problem.
> We’re still struggling with which database engine to use for the final deployment. I thought Hibernate was portable, but we have some native queries that use some MS SQL magic, and we’d actually like to go with MySQL in the production
WTF. Why are you developing against MS-SQL then?
> Sometimes we spend days looking for errors. It seems like we have to analyze Hibernate-generated SQL because we have no idea why it’s not working as expected and it’s producing unexpected results.
WTF. If you don't understand your ORM you won't understand the queries it generates.
> Can you help me understand how Hibernate cache works? I just don’t get it. There are some first/second level caches. What is this all about?
WTF. No concept of using transactions.
I've used ORMs longer than the term ORM existed. I've never once thought it as a replacement for SQL. It's a supplemental tool.
Huh? I feel like I'm missing something. Eager loading, done right, is how you avoid the N+1 problem.
In ActiveRecord, Customer.includes(:orders).where(customer_name: 'Fred') would execute, at most, two queries even if there are many Freds with many orders.
The only real ORMs I've used are ActiveRecord and a tiny bit of EntityFramework.
Is this kind of eager loading not common? Or did I misunderstand your remark?
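To make the N+1 point concrete, here's a sketch in Python using the stdlib sqlite3 module (the customers/orders schema and the Fred data are invented for the demo; `set_trace_callback` lets us count every statement actually issued). The first pattern is what an ORM does naively; the second is what eager loading like `includes(:orders)` collapses it to:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
queries = []
conn.set_trace_callback(queries.append)  # record every SQL statement issued

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Fred",)] * 3)
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])

queries.clear()
# N+1: one query for the customers, then one more per customer for orders.
customers = conn.execute("SELECT id FROM customers WHERE name = 'Fred'").fetchall()
for (cid,) in customers:
    conn.execute("SELECT * FROM orders WHERE customer_id = ?", (cid,)).fetchall()
n_plus_one = len(queries)  # 1 + 3 customers = 4 statements

queries.clear()
# Eager loading: fetch the customers, then all their orders in one IN query.
ids = [cid for (cid,) in conn.execute("SELECT id FROM customers WHERE name = 'Fred'")]
marks = ", ".join("?" for _ in ids)
conn.execute(f"SELECT * FROM orders WHERE customer_id IN ({marks})", ids).fetchall()
eager = len(queries)  # always exactly 2 statements, however many Freds exist
```

The query count of the first version grows with the data; the second is constant, which is the whole trick.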
This can be avoided with something like e.g. EntityFramework which is based on the IQueryable interface and expression trees. In EF it would be
var query = DbContext.Customers.Include( c => c.Orders).Where( c => c.customer_name == "Fred");
This isn't uncommon. Where I'm working now we're actually busy implementing our own transactions to solve performance issues :(
Even rarer, almost no one is aware of implicit transactions. Developers: always use an explicit transaction; if you aren't, you're just running many more implicit ones.
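A minimal sketch of why the explicit transaction matters, in Python with the stdlib sqlite3 module (the accounts table and the mid-transfer crash are invented for the demo; `isolation_level=None` puts the connection in autocommit mode, where every statement is its own implicit transaction unless you say BEGIN):

```python
import sqlite3

# Autocommit: without BEGIN, each statement is its own implicit transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")

# Explicit transaction: both legs of the transfer commit or roll back together.
try:
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
    raise RuntimeError("crash mid-transfer")  # simulate failure between the writes
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'b'")
    conn.execute("COMMIT")
except RuntimeError:
    conn.execute("ROLLBACK")

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without the BEGIN/ROLLBACK, the debit from 'a' would have auto-committed on its own and the crash would have left the money gone; with it, both accounts are untouched.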
However, time taught me that writing maintainable SQL with a good migration story is way better. ORMs make porting queries to another project that needs to use the same database difficult.
It's less obvious what is really happening and the performance usually isn't as good as hand written SQL.
Hand written SQL also means fine grained control over prepared queries, access to vendor specific features (advisory locks, LISTEN/NOTIFY, triggers), materialised views etc.
You could argue these should be the domain of a DBA, but in my opinion all devs should at least know SQL at this level.
Wouldn't you have written a shared library through which both projects access the same database?
I know SQL pretty well (20+ years experience), but still use ORMs with nary a reason to hand-code any SQL any more. As a small-business-focused consultant using Microsoft technologies I do realise this puts me in a different bracket than others using larger-scale datasets and so forth, however for me using an ORM makes my life easier, makes me more productive, and with a little attention to detail doesn't result in performance issues for clients.
Are you arguing that that's a bug or a feature? Because literally every system I've worked with that's gone down this road has been a maintainability disaster, with the entire database schema treated as an API...
If you doubt me on this, I challenge you to go to a typical enterprise and develop a web page with vanilla js or HTML forms today. Or build a data project without using something like Spark, even if you only have a few hundred MBs of data.
No, that problem is called "having shitty people skills that are so grave that you can't sell the benefits of mature technology and experience to people who live and breathe cost-benefit analyses".
The only common theme is developers who can't interact with non-technical staff.
Most people out there are actually working with relatively mature and effective tools, it's just that the only way you notice people not catching on to fads is the dead silence of keeping your head down while solving real problems.
In my experience, ORMs and specifically SpringData-JPA-Hibernate occupy this weird niche in the enterprise spectrum. They are overkill for small projects and they become a hindrance for large projects, so they sort of sit there for medium projects. If you start with an ORM on a medium-size project and it becomes a big project, you can sometimes find yourself in an awkward spot where you are writing a lot of SQL (and not of the JPQL kind). With Hibernate/JPA specifically there are so many things that are broken that they just start getting in the way if your schemas become large enough. To me this is really just a manifestation of the Behavioral Problem (Martin Fowler, Enterprise Patterns), where relational databases don't really map to objects at all when you get down to it.
Basically, if you are going to do quite a bit of SQL, any ORM will just get in the way. Should be plainly obvious to anyone with experience.
It's my understanding that Hibernate doesn't allow this.
> Basically, if you are going to do quite a bit of SQL, any ORM will just get in the way. Should be plainly obvious to anyone with experience.
This is the problem you see: a lot of projects don't have somebody very experienced with SQL any more, because devs also need to know a bit of sysadmin, and front-end dev, and deployment, and CI, etc. So an ORM helps a lot in that regard, and will help most projects, because the problems it causes are smaller than the ones it solves:
- it forces people to put the schema, migrations and validation code in a centralized place.
- it lets you reuse your API knowledge instead of having to relearn the stuff for SQLite/Postgres/MySQL/Access.
- it enforces good practices on security with data escaping and cleaning.
- it lets you benefit from your language tooling (analyzers, linters, completion, docs, shells) to write queries that would otherwise be plain text in the middle of a file for most people.
- it gives a structure to DB operations, so if you look for something, you know where and what to look for.
- it gives you an event system to decouple queries and their effect.
- it gives a common API so you can easily write libs targeting the ORM. This is why the Django ecosystem is so rich: you can write an auth backend or a tagging system and know other people will be able to just plug it in.
- it comes with tooling: auto generated admin, forms and APIs, migration manager, paginators...
It does have a cost in performance and sometimes flexibility. But for anything that doesn't have a million users yet, it's worth it. Especially in a world where a day of developer salary can cost more than a year of server time.
I cache everything anyway :)
Or you can write queries and play around with loading strategies later / change them over time as the shape of the data changes without having to change any other code.
Every time the ORM thing comes up on HN, most people jump on the ORMs-are-evil bandwagon. I think SQLAlchemy is probably a bit of an exception in terms of just how flexible it is; in my experience you get all the benefits with next to no downside (performance overhead being the most obvious). I generally wouldn't start a project without it these days.
SQL isn't hard. Learning database optimization is nontrivial but doable for anyone. Why do we need ORMs? Why is it better to replace a "WHERE ..." clause with ".where(...)"?
My personal impression is that these frameworks were developed for people that know how to write code in some language but don't know how databases work and want something easy instead of learning how databases work. It really feels very lazy to me.
If you have a library or framework that properly prevents SQL injection attacks, why use an ORM unless you feel too lazy or scared to learn how to use a database properly?
SQL lacks support for some pretty basic things like query aliases. Try writing 5 nested queries vs using variables to store the different criteria.
SQL has unnecessary grammar restrictions. Where you're putting your ORDER BY should be irrelevant. Join syntax is frankly horrendous. It's difficult to autocomplete.
Embedding SQL in your code is a recipe for disaster. Your compiler does not understand it, therefore refactoring means going over every query you have and manually looking to see what changes to do. It cannot be validated in compile time. SQL strings have no knowledge of the backing domain until they're actually run.
And finally, 99% of the time you really do want to fetch an object with relations from the DB, not simple maps of field names. Aliasing field names in SQL from joins is a waste of my time. Making the returned data map to whatever object actually contains the business logic I'm interested in gets old, fast.
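That mapping layer is easy to picture. A sketch in Python with the stdlib sqlite3 module (the Customer class and table are invented for the demo): every SQL-only codebase ends up hand-writing something like this row-to-object glue, which is exactly the part an ORM automates.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:  # the business object we actually want back, not a tuple
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Fred')")

# The hand-rolled mapping layer: turn each row into a domain object.
conn.row_factory = lambda cur, row: Customer(*row)
fred = conn.execute("SELECT id, name FROM customers WHERE name = 'Fred'").fetchone()
```

One dataclass and one row_factory is trivial; the complaint is about doing this for fifty entities with joined, aliased columns.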
SQL isn't supposed to be composed. It is a query language for a relational database structure. What composition is to run of the mill programming languages is what relations are to databases, roughly speaking.
>SQL lacks support for some pretty basic things like query aliases. Try writing 5 nested queries vs using variables to store the different criteria.
"Basic things" that are basic things for programming languages. SQL has support for what it should have support for - having intermediary variables (in the sense that general purpose programming languages have variables) is not one of those things. This comment belies a lack of understanding of how databases work and why.
>SQL has unnecessary grammar restrictions. Where you're putting your ORDER BY should be irrelevant. Join syntax is frankly horrendous. It's difficult to autocomplete.
The decision regarding where to order is absolutely relevant and is strongly dependent on your data model. Join syntax is a syntax to accomplish joins, which are fundamental to modeling of relational data.
And, frankly, if autocomplete is a criteria for you, then, in the context of these tools, you are not a developer, you are an end user.
>Embedding SQL in your code is a recipe for disaster. Your compiler does not understand it, therefore refactoring means going over every query you have and manually looking to see what changes to do. It cannot be validated in compile time. SQL strings have no knowledge of the backing domain until they're actually run.
This statement makes it clear that you have not seriously developed an application that is strongly dependent on a serious database. At least almost everything you said here is "not even wrong": https://en.wikipedia.org/wiki/Not_even_wrong
>And finally, 99% of the time you really do want to fetch and object with relations from the DB, not simple maps of field names. Aliasing field names in SQL from joins is a waste of my time. Making the returned data map to whatever object actually contains the business logic I'm interested in gets old, fast.
I suggest you take a look at database design and relational information modeling. You really seem to not know what the problems are that relational databases solve, how to use them, or how to use them well.
How do you build a filtered search system? Query languages can definitely be composed. Look at Codd's relational algebra operators; they can most definitely be composed.
You can't do this in SQL:
base_query = get_base_query()
query = base_query.filter(condition_a)
query = query.filter(condition_b)
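A minimal sketch of what that composition looks like, in Python over the stdlib sqlite3 module (the Query class, the items table, and the conditions are all invented for the demo; real query builders like SQLAlchemy do the same thing with far more machinery):

```python
import sqlite3

class Query:
    """Tiny immutable query builder: each .filter() returns a new query."""
    def __init__(self, table, conds=(), params=()):
        self.table, self.conds, self.params = table, tuple(conds), tuple(params)

    def filter(self, cond, *params):
        return Query(self.table, self.conds + (cond,), self.params + params)

    def sql(self):
        where = " AND ".join(self.conds) if self.conds else "1=1"
        return f"SELECT * FROM {self.table} WHERE {where}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price INTEGER, in_stock INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                 [("pen", 2, 1), ("ink", 9, 0), ("pad", 4, 1)])

base = Query("items")
cheap = base.filter("price < ?", 5)               # one branch of a search form
cheap_in_stock = cheap.filter("in_stock = ?", 1)  # refined further; base untouched
rows = conn.execute(cheap_in_stock.sql(), cheap_in_stock.params).fetchall()
```

This is the filtered-search use case: each UI control adds one filter, and no branch of the code ever concatenates SQL strings.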
> SQL has support for what it should have support for - having intermediary variables (in the sense that general purpose programming languages have variables) is not one of those things. This comment belies a lack of understanding of how databases work and why.
Um, no. In fact the WITH clause (common table expressions) was added to the SQL standard to help with exactly this, because, well... it turns out writing 5 nested queries is a pain.
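For concreteness, here's a CTE acting as exactly that kind of named intermediate result, run through Python's stdlib sqlite3 (the orders table and the spent-more-than-20 criterion are invented for the demo; SQLite has supported WITH for years):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("fred", 10), ("fred", 30), ("wilma", 5)])

# "totals" is a named intermediate result instead of a nested subquery.
row = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(total) AS spent
        FROM orders
        GROUP BY customer
    )
    SELECT customer FROM totals WHERE spent > 20
""").fetchone()
```

Each additional refinement becomes another comma-separated CTE rather than another level of nesting.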
> And, frankly, if autocomplete is a criteria for you, then, in the context of these tools, you are not a developer, you are an end user.
And you've never heard of data analysts who literally spend all day doing nothing but running queries and reports? You've never run drafts of queries to see if you're doing reasonable things?
> I suggest you take a look at database design and relational information modeling. You really seem to not know what the problems are that relational databases solve, how to use them, or how to use them well.
If you're going to lecture someone without any real-life use cases backing your assertions, then I suggest you actually go read Codd's original paper on relational algebra rather than praising SQL as something handed down from the gods. It's an imperfect representation of the tree structure of relational queries composed of projections, joins, and selections.
If you're going to be cargo culting SQL, at least actually understand what the original was. SQL was the '70s idea of a "natural-looking" language to model relational algebra. And relational algebra is only one way to understand relational data stores (tuple calculus is another decent formalism for understanding it as well). And relational stores are only one of many different data modeling tools.
There is literally nothing in SQL that you cannot do more ergonomically with most query builders, since everything boils down to the same thing: an abstract tree of query expressions that can be understood by a DB engine and manipulated through basic relational algebra rules to optimize according to heuristics.
Welcome to Java …
I generally find that an ORM makes easy things easy and difficult things indescribably hard or impossible. Honestly, the relational model is great, and it's worth understanding well. Frankly, if you're a software developer and you can't learn SQL … you're not really a software developer.
I agree, but raw SQL makes the easy things hard (or at least cumbersome), so the nice middle ground is to use Hibernate for the easy things and SQL for the hard things.
I've found that nearly every abstraction does this in some form. In general, the easier the easy things are, the harder the hard things are. The trick is finding a good balance for your project.
Maybe in giant non-tech-company legacy systems, or code written by offshore teams.
These days, with Java 8, there is almost none of that bullshit.
Anyone who has worked with both care to comment?
Now, this is not a judgment for or against SQL: given there are plenty of users of Hibernate, it has clearly worked for some teams. Equally, given people still write SQL, I am sure it has worked for many people.
Fundamentally there are two questions to my post:
1 - how much could have been saved in this scenario by a better team?
2 - how often does the right solution now become the wrong solution later?
If they had chosen to go with plain SQL, maybe the project would be finished successfully by year two.
I found the problem.
No codegen, just made SQL resources a top level type safe resource you could work with.
I liked the idea a lot.
Who in their right mind would use Hibernate (or any other ORM for that matter) for reporting?
Back in the day, a friend of mine brought me in to write business software in Java for a small retail company. They didn't have much money, but they also didn't have any management that cared about our technical decisions, so we had a freedom that helped our careers tremendously: Nobody is making any less than 3x what they did then. But along with that freedom came the responsibility of getting the decisions right.
Hibernate was a great improvement over the awfulness that was naked JDBC, but the risks of bad performance were so terrible that we ended up just writing some decorators to make JDBC usable instead. Years later, I went to a big enterprise place and saw what happens when developers stop learning SQL and use Hibernate to run complex queries... and then I learned how big of a bullet we dodged.
Hibernate wasn't the only bullet though: one that Java is still suffering from is Spring. We decided that exchanging boilerplate you compile for XML was not a good tradeoff. Later they added annotations, but still, handing all of that wiring to the runtime is just not a good idea. Spring also taught people that it's easy to have objects that can be constructed in ways that are broken (zero-param constructors were easier to use), made application startup slower than even EJB, produced error messages that make the Clojure compiler seem friendly, and added all kinds of insane nonsense that made Rod Johnson quite rich but didn't help Java itself move forward. Java 1.4 and Java 5, out of the box, are like a person with a broken femur. Instead of fixing the bone, most of the ecosystem was just feeding the patient oxycodone: the result was that the patient still had a broken bone that was setting incorrectly, and the patient was now addicted to opiates. And yet, the people that wrote the bad tools that caused this have nice houses in Nob Hill and get to try to build new startups, now selling us their magical solution to the microservice problem.
The Java ecosystem only started to move forward when Ruby on Rails came to town and, despite all of its own problems, made it clear that the whole thing had to move forward. Now the JVM has far better languages than Java, and Java itself has actually improved in important ways, but that was due to competition, not bad solutions to problems.
I think we are back in the same boat with NoSQL databases: The companies selling tooling that did well didn't do so on good engineering, while the companies that did the best engineering struggle. But is the whole movement a good idea, or are we just going to see something like Spanner come in and the technologies that people got so excited about were just low quality solutions to problems that most of the time we didn't even have?
I think there's more to learn from the pattern of how developer tools succeed (and fail to deliver), than from just the lesson of Hibernate's tremendous shortcomings for anything complicated.
I write SQL (or SOQL...ugh, Salesforce) virtually every day on the job (I do a ton of ETL in Ruby) and use it to give valuable business intelligence.
Constantly I see other programmers jump through all sorts of technical hoops to get things done that could be done in a DB query quicker with more accuracy and less mental overhead.
Just write your own toy ORM sometime. Copy the most-common functions of ActiveRecord. It's that easy. That's all it takes.
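In that spirit, here's a toy ORM sketch in Python over the stdlib sqlite3 module - not ActiveRecord's API, just a hypothetical imitation of its `save`/`where` shape (the Model/User classes and table naming convention are all invented for the demo):

```python
import sqlite3
from dataclasses import dataclass, fields

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

@dataclass
class Model:
    """Toy ActiveRecord imitation: a subclass maps to a table named <class>s."""
    @classmethod
    def _table(cls):
        return cls.__name__.lower() + "s"

    @classmethod
    def _cols(cls):
        return [f.name for f in fields(cls)]

    def save(self):
        cols = [c for c in self._cols() if c != "id"]
        marks = ", ".join("?" for _ in cols)
        cur = conn.execute(
            f"INSERT INTO {self._table()} ({', '.join(cols)}) VALUES ({marks})",
            [getattr(self, c) for c in cols])
        self.id = cur.lastrowid

    @classmethod
    def where(cls, **conds):
        clause = " AND ".join(f"{k} = ?" for k in conds)
        rows = conn.execute(
            f"SELECT {', '.join(cls._cols())} FROM {cls._table()} WHERE {clause}",
            list(conds.values()))
        return [cls(*row) for row in rows]

@dataclass
class User(Model):
    id: int = None
    name: str = None
    email: str = None

User(name="Fred", email="fred@example.com").save()
fred = User.where(name="Fred")[0]
```

Forty lines buys you inserts and simple lookups; everything else (relations, eager loading, caching, migrations) is where the real ORMs earn - or lose - their keep.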
TLDR: ORMs are a good tool, but must never be used to avoid learning SQL. Knowledge of SQL is necessary to avoid shooting yourself in the foot.