Problems with JPA/Hibernate

doctor_eval · on April 11, 2021

In my extensive experience managing teams using JPA ORMs including Hibernate and EclipseLink, you not only get to learn the unavoidable details of your target database’s SQL, but also the complex, non-obvious side effects - especially caching interactions and performance edge cases - of the ORM as well. You also get to learn two distinct but similar query languages (one of which you can’t use anywhere else but Java). You get to throw away all the semantics built into the database structure, including knowledge about indexes, and you need to duplicate almost everything from SQL into Java, in the form of complex annotations against entity objects that you often wouldn’t need with traditional SQL access patterns. And when something goes wrong, you have a large, complex framework, including an inscrutable caching layer, between your code and your data - which can make debugging JPA code very challenging.

In return you get reduced performance, overly complex and difficult-to-read generated SQL, an inflexible entity model, and non-optimal database access patterns.

JPA is an extremely complex framework whose benefits, in my opinion, outweigh the costs in only a small number of relatively simple use cases. And even in those simple use cases, in my experience it is far less performant (and double the LOC) than using SQL directly.

I’m not saying you can’t use JPA in large, complex applications, because you obviously can. But those applications are harder to write, much larger, less maintainable and less performant than the equivalent applications written with simpler frameworks or libraries.

I personally found MyBatis annotated interfaces to be the ideal way to map SQL to Java, but perhaps ApacheDB is better for smaller projects.

JPA is one of those fabulous engineering experiments that, sadly, didn’t work out. It breaks every rule in the engineering book in order to make SQL work like Java, but it doesn’t succeed, and I could never recommend it to anyone.

cletus · on April 12, 2021

Man, you and I see eye to eye here.

I'm a firm believer in leaky abstractions [1]. I often found with JPA that I would be fighting the framework to get it to produce the SQL that I wanted. I encountered (then) ibatis and it was a breath of fresh air. It took care of much of the tedious column-to-Java mapping and didn't introduce another DSL.

As soon as anyone tells you "you don't need to learn X with our framework Y that sits on top of it", all that means is you have to learn X, Y and the X->Y and Y->X translations and all but the first is proprietary.

This was my core issue with GWT (Google Web Toolkit) too. Too many engineers who weren't interested in learning JavaScript saw it as a way of avoiding learning JavaScript.

It's been many years since I've used it, so this may have changed, but I recall that invalid queries only surfaced as runtime errors and there wasn't a good way to check that all your queries were valid. You could write this yourself, but it seems like some CI/automation could've done a lot of the heavy lifting.

IIRC the same went for integration testing and stub databases for reading and writing via ibatis/mybatis.

[1]: https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

doctor_eval · on April 12, 2021

I love the analogy to GWT and suffered it first hand - we believed the hype, that we didn’t need to learn JavaScript - but in the end we ended up needing to fully understand the whole stack in order to work through edge cases.

I don’t mind leaky abstractions, but the killer with JPA was the cache. IIRC, mixing pure SQL with JPA was really tricky because you’d end up with cache inconsistencies. So the JPA abstraction was leaky, but the framework effectively assumed that it wasn’t, which led to all kinds of workarounds.

js8 · on April 12, 2021

> As soon as anyone tells you "you don't need to learn X with our framework Y that sits on top of it", all that means is you have to learn X, Y and the X->Y and Y->X translations and all but the first is proprietary.

Exactly, except it's not just about language Y being proprietary. (I am using the term "language" instead of "framework", because they are, in the broad sense, conceptual languages.)

It gives you a hint about when language Y is worth having in addition to language X: only if language Y makes it significantly easier to model problems in your domain. Therefore, by definition, language Y has to have some limitations that X doesn't. If Y is as general a language as X (it can express the same problems with similar effort), you end up doing busywork translating between X and Y.

And this problem plagues ORM frameworks of all sorts: it's not clear what is more easily expressed in the language of the ORM that is not already expressed in relational algebra (and SQL), and conversely, what you are giving up by using an ORM. It happens in all sorts of DSLs, though.

wrwatson · on April 12, 2021

I am called upon to investigate database performance problems. Teams using Hibernate sometimes send me the Hibernate query. This is not enough to understand the SQL that has been issued: I would also need to see all the entity objects, and maybe some configuration parameters. Without the SQL, which nobody can exactly predict, it is difficult to performance-tune.

So they turn on logging, get the SQL, and email that unreadable mess. In most of the cases I have seen, the SQL is fetching much more from the database than what the code really needs.

The first step when optimising SQL is to only ask for data that you actually need. Hibernate, as I have seen it used, defaults to fetching too many columns. I cut the SQL down, and come up with a performant statement. The developer has the challenge of translating the performant SQL back into Hibernate.

Hibernate makes easy things easier and harder things harder.

doctor_eval · on April 12, 2021

Meh, I think JPA makes easy things expensive and hard things really hard. :)

de6u99er · on April 12, 2021

Having managed multiple projects myself using various JPA implementations, I think JPA is great for relational-database-driven applications. Some of my projects contained 150+ tables. I always use a combination of JPA and JDBC, and everything derives from a base object that comes with create and update timestamps and users, plus a UUID-based id field.

The trick is that, while knowing what you are doing with regard to limitations and performance, you design the data model alongside the cardinalities of the user-facing components. You'll end up with a highly normalized schema.

The nice thing is that you can create views later on and transform data into shapes that are more useful to others.

If you make the mistake of starting with the data model alone, without knowing how the data is being collected or used, you will most likely spend lots of time converting entity objects into other objects and back again. The worst thing I have seen in my career was half of the business logic implemented in Java and the other half in PL/SQL.

JPA is great for large projects! I haven't had a single issue that we could not solve.

doctor_eval · on April 12, 2021

JPA is OK (but only OK) if you don’t have anything other than JPA accessing the database but in our largest application (1,000+ tables) this was never going to be the case. We had plenty of experience with database design, that’s not the problem.

But we found even in simple applications with a dozen tables, the generated SQL was suboptimal. For example, to delete the children of a parent row resulted in a DELETE statement for each child row. I’m sure there are loads of good reasons why it did that, but it’s not something that you’d even consider in pure SQL.

The systems I build these days tend to put 100% of the database logic in the database using (you guessed it) PL/pgSQL, and leave Java (or actually Go these days) to do the non-database transforms, networking, etc. You could say that I replaced learning about JPA with learning more about the insane capabilities of my database.

I’m sure you (and half of HN) are gagging right now, but I know I’m not alone, and I’ll take 75% fewer LOC and 1-2 orders of magnitude performance improvement over JPA any day.

tablespoon · on April 12, 2021

> But we found even in simple applications with a dozen tables, the generated SQL was suboptimal. For example, to delete the children of a parent row resulted in a DELETE statement for each child row. I’m sure there are loads of good reasons why it did that, but it’s not something that you’d even consider in pure SQL.

IIRC, it does that if you use Lists instead of Sets.

https://dzone.com/articles/best-performance-practices-for-hi...

Ygg2 · on April 12, 2021

Sets in Hibernate have a pitfall: you can add an object to a set before it is saved, and Hibernate will then modify it on save.

E.g., say we have:

    class Person {
        Integer id;   // generated by Hibernate on save
        String name;
    }

Let's say I have some entity that stores Persons in a HashSet. If I add them before saving, their hash code is computed without an id.

The id can be automatically generated by Hibernate on save/create.

After the save, the id has a new value, i.e. the hash now differs from the one used when the object was added to the set.

Boom. You have a HashSet with zombie entries: they are technically there, but you can't actually fetch them.
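A minimal plain-Java sketch of the zombie-entry problem described above. It assumes an entity whose `equals`/`hashCode` include the generated id (the usual way this goes wrong); `Person` here is an ordinary class, not a real Hibernate entity, and setting `id` by hand only simulates what Hibernate would do on save.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain stand-in for an entity whose equals/hashCode depend on a generated id.
class Person {
    Integer id;   // null until "saved"; Hibernate would assign it on persist
    String name;

    Person(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return Objects.equals(id, p.id) && Objects.equals(name, p.name);
    }

    @Override
    public int hashCode() { return Objects.hash(id, name); }
}

class ZombieDemo {
    // Adds a Person to a HashSet, then mutates its id as a save would.
    static Set<Person> addThenSave(Person p, int generatedId) {
        Set<Person> people = new HashSet<>();
        people.add(p);          // stored under the hash of (null, name)
        p.id = generatedId;     // simulate the id assigned on save
        return people;
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice");
        Set<Person> people = addThenSave(alice, 42);
        System.out.println(people.size());          // 1: the entry is still there
        System.out.println(people.contains(alice)); // false: lookup uses the new hash
        System.out.println(people.remove(alice));   // false: it can't be removed either
    }
}
```

A common mitigation is to keep `equals`/`hashCode` independent of anything that changes on save, for example by assigning an application-generated UUID in the constructor, or to put objects into hashed collections only after they have been saved.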

doctor_eval · on April 12, 2021

Pretty sure I would have used a list, but making the Java interface ordered surely doesn’t change the semantics of the underlying relation?!

tablespoon · on April 12, 2021

> Pretty sure I would have used a list, but making the Java interface ordered surely doesn’t change the semantics of the underlying relation?!

Yeah, it seems a little bonkers on first glance and I haven't ever dug in to see if there was a justifiable reason or not.

IIRC, if you just need ordering, you could just use a SortedSet. I think technically lists are for when you need ordering and duplicates.
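For reference, the plain `java.util` semantics behind that distinction (this shows only the collection behaviour, not how Hibernate maps each collection type): a `List` keeps insertion order and duplicates, while a `SortedSet` such as `TreeSet` keeps elements sorted and drops duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class CollectionSemantics {
    // Insertion order preserved, duplicates kept.
    static List<String> asList(String... items) {
        return new ArrayList<>(List.of(items));
    }

    // Natural ordering, duplicates silently dropped.
    static SortedSet<String> asSortedSet(String... items) {
        return new TreeSet<>(List.of(items));
    }

    public static void main(String[] args) {
        System.out.println(asList("b", "a", "b"));      // [b, a, b]
        System.out.println(asSortedSet("b", "a", "b")); // [a, b]
    }
}
```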

doctor_eval · on April 13, 2021

Yeah it is bonkers! Not just at first glance. The fact that the SQL integration for the child object behaves differently based on the kind of collection used in the parent is exactly the kind of unintuitive behaviour that drove me nuts.

There are so many rules to remember for something that is supposed to make SQL integration natural.

imtringued · on April 12, 2021

How do you get 75% fewer LOC without using JPA or Hibernate?

All I have to do to create a new table is define a class with some attributes and the corresponding SQL migration, I don't even have to annotate anything. To create a new row I just do new Table(attr1: expression1, attr2: expression2).save() and I am done.

Compare that to the usual mess of writing down the same column name four times with raw SQL (1. to reference the column, 2. to set the column value with a named parameter, 3. to set the named parameter, 4. declare the variable you wanted to save in the first place).

Of course, I'm not using Hibernate directly, that would be foolish. Instead I am simply using Gorm to wrap the awful parts of Hibernate (the irony).
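To make the duplication in this comment concrete: a hand-written named-parameter INSERT repeats each column name as the column and again as the parameter, and the caller then binds a value under that name once more. A small helper can derive the statement from a single list of names. The `insertSql` helper below is hypothetical and not part of any library; it only builds the SQL string, leaving binding and execution to whatever JDBC wrapper you use.

```java
import java.util.List;
import java.util.stream.Collectors;

class SqlBuilder {
    // Hypothetical helper: builds "INSERT INTO t (a, b) VALUES (:a, :b)"
    // so each column name is written once by the caller.
    // Table and column names are concatenated as-is, so they must come from
    // trusted code, never from user input.
    static String insertSql(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String params = columns.stream()
                .map(c -> ":" + c)
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("person", List.of("name", "email")));
        // INSERT INTO person (name, email) VALUES (:name, :email)
    }
}
```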

doctor_eval · on April 12, 2021

So now you have code in Gorm generating DDL and JPA code? In terms of the compiled LOC, doesn’t this add even more code? Not to mention another dependency to maintain.

Anyway - I’m definitely not advocating for raw JDBC, there are lots of libraries you can use to reduce the complexity of the integration with the database, it’s just that JPA is not (IMO) one of them.

One thing I don’t understand about your comment is “the usual mess of writing down the column name four times with raw SQL”: when is this necessary? ‘update table set column=? where key=?’ has no duplicate names. Can you give an example?

I’m intrigued by how your solution to JPA’s verbosity problem is to add another layer, while mine is to remove one!

overtomanu · on April 13, 2021

I think he is talking about upsert

kimi · on April 12, 2021

"Bad practice - if you hide the database, you may get something done quickly, but it's a bad idea. If your Java code expects to have a collection of one million objects as an array, it does not matter if they are lazily loaded or not - some code somewhere might want to iterate over them, and this will kill the process. You cannot really forget that there is a database somewhere, and you should not do it."

https://github.com/l3nz/ObjectiveSync

doctor_eval · on April 12, 2021

Great synopsis of JPA problems generally!

edem · on April 12, 2021

This comment is spot on. JPA/Hibernate is a very big leaky abstraction. That's why I ditched the whole thing and started to use things like jOOQ instead. In the end you must learn SQL to make sense of all of this anyway.

eitland · on April 12, 2021

Generally you should use the right tool for the job and that might be something else. This however:

> You get to throw away all the semantics built into the database structure, including knowledge about indexes,

suggests you either don't know JPA very well or your writing is a bit sloppy.

Please avoid making sweeping generalisations about tools that save hundreds (or a lot more) of hours of programmer time just because you didn't get it.

Too often I see people going with a lesser alternative because such comments are scaring them away.

sabellito · on April 12, 2021

They made some concrete criticism based on their experience (hard to maintain, complex annotations, etc), but you in turn dismissed those with "you didn't get" and some borderline name-calling.

Which bit they didn't get?

eitland · on April 12, 2021

doctor_eval wrote:

>> You get to throw away all the semantics built into the database structure, including knowledge about indexes,

I wrote:

> suggests you either don't know JPA very well or your writing is a bit sloppy.

To suggest that JPA means "throw[ing] away all the semantics built into the database structure, including knowledge about indexes" indicates that the person doesn't know what JPA is about: indexes should be used with JPA for all but the most trivial setups. The same goes for other general database knowledge.

You wrote:

> They made some concrete criticism based on their experience (hard to maintain, complex annotations, etc),

I just pointed out that it was either a misunderstanding or so sloppily written as to create misunderstandings.

> but you in turn dismissed those with "you didn't get" and some borderline name-calling. Which bit they didn't get?

The bit about JPA not standing in the way of using good database practices.

It is not name-calling. I point to the exact words doctor_eval uses. To be polite and also be sure to make room for error on my side I also offer the option of sloppy writing. Sloppy writing happens to all of us and I'll be happy to know if doctor_eval didn't mean it or if I've misread it.

If not, however, it is not smart to use big words to try to trash a super useful tool that saves many of us lots of time and often increases quality.

doctor_eval · on April 12, 2021

I don’t think my point was sloppily written, but perhaps you misunderstood it.

JPA is conceptually incapable of automatically utilising the structure of the underlying database (despite the fact that the database structure is actually dynamic), so it makes up for it by requiring you to write these huge entity models which declare - and duplicate - that structure. You effectively “throw away” the database structure because you have to rewrite it in Java/JPA. I mentioned indexes but that’s just a side issue.

One consequence of this is that it makes refactoring much harder; I found that the Java side of a JPA refactor was way more effort than the underlying SQL refactor, but that really shouldn’t be the case.

eitland · on April 12, 2021

I might start to get what you are hinting at. Still the

> JPA is conceptually incapable of automatically utilising the structure of the underlying database (despite the fact that the database structure is actually dynamic), so it makes up for it by requiring you to write these huge entity models which declare - and duplicate - that structure. You effectively “throw away” the database structure because you have to rewrite it in Java/JPA.

The way you wrote it: "you get to throw away" made it seem like you have to throw away your knowledge. (emphasis mine)

What you describe above is easy to read, but is of course not "throwing away" but duplication, which is of course an issue, but a totally different one.

> I mentioned indexes but that’s just a side issue.

So are indexes an issue with JPA or not? ;-)

You (or others) might wonder why I pick at your comment, but I am so tired of seeing people choose inferior solutions because people are scaring them away from what would be perfect solutions for them.

doctor_eval · on April 12, 2021

It’s fine to pick at my comments, that’s what HN is for!

I accept that when I said “you get to throw away...” I was being a bit glib. I meant that we write the DDL, run it, and despite it being stored in the database as structure, that DDL is never used again by JPA and you have to rewrite it in annotations.

I thought lack of knowledge about indexes was an issue because I had a use case where the generated SQL was (very) suboptimal and I thought it was due to an assumption made by JPA, but someone pointed out that maybe I used a Set instead of a List. I still think that’s stupid (it’s hardly fixing the “impedance mismatch”, which is the very purpose of JPA), but it might not be index-related. That said, knowledge about indexes does impact the design of handwritten queries and JPA doesn’t have that; JPA queries can be overly complex, and even pathological, so again I think the point holds up.

In my defence I was listing a lot of issues. I feel like I could write a book on the problems with JPA but all I had was this tiny box on my phone.

In terms of scaring people off, that is definitely my intent. There are lots of great ways to get Java talking to SQL, many linked to in this thread, and IMO the majority of developers would be better off choosing almost any alternative over JPA.

eitland · on April 13, 2021

I still think your experience is a bit on the extreme side, but I'll admit you've got me thinking: not because I've seen ORMs cause many problems (I can use them just fine with indexes and custom SQL) but because maybe there is a better way.

The last time I read up on MyBatis was probably >10 years ago, and IIRC at that point it seemed like a manifestation of what the Rails guys teased us with:

  <situps>
    <up/>
    <down/>
  </situps>

but I'll admit:

- I've been wrong before

- things might have changed in 10 years

- I've always known there were cases for something else but maybe I should adjust my threshold

For anyone else who reads this, just be aware that many smart people are happily using JPA. :-)

doctor_eval · on April 12, 2021

I’m happy to scare people away from JPA! There are much better alternatives - such as MyBatis - that will make developers far more productive, and reduce LOC, heap size, and complexity.

debarshri · on April 12, 2021

Have you ever tried - ormlite-core[1]?

I recently found this obscure but quite stable library, kind of fell in love with it.

[1] https://github.com/j256/ormlite-core

dale_glass · on April 11, 2021

I disagree with this bit:

    A User can be considered unique in one context by its email address, or by its social security number

Personally, I'm a fan of giving everything a random UUID, because it's more flexible. It's random and impossible to guess, it scales well because there's no central bottleneck like with an autoincrement, and it's future proof and flexible.

What happens when the user changes the email address? What if the social security number changes, because it was wrong or because it actually changes? What if the original unique identifier wasn't a good choice? What if we decide that the user can have multiple email addresses? Then you may end up having to restructure the entire database, which will be a very annoying thing to do. What if you implement additional rules for what an email is allowed to look like and now the constraint fails for existing users, and this correction needs to be propagated to millions of already existing rows?

Real-life personal data is weird and fuzzy. They can violate seemingly sensible rules like being unique, unchanging, or conforming to any rule whatsoever. Best not to let them spread all over the DB and cause trouble later.

Instead, you could just have a random ID that doesn't mean anything and therefore can stay fixed forever, and any user-related metadata stays in the user table, where it can be modified as needed. Plus a UUID is a fixed 16 bytes, which is easy and efficient to deal with.
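A minimal sketch of that approach using only `java.util.UUID`: `randomUUID()` produces a version 4 (random) UUID without any central counter, and its two 64-bit halves give the fixed 16-byte form you might store in a binary column. The `toBytes`/`fromBytes` helpers are illustrative; how the bytes map to a column type depends on your database and driver.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidKeys {
    // Random (version 4) UUID: no central counter needed to generate it.
    static UUID newId() {
        return UUID.randomUUID();
    }

    // Pack the UUID into its fixed 16-byte big-endian form.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    // Inverse of toBytes.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID id = newId();
        System.out.println(id + " version=" + id.version()); // ... version=4
        System.out.println(toBytes(id).length);               // 16
    }
}
```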

jacques_chester · on April 11, 2021

> What if the social security number changes, because it was wrong or because it actually changes?

What if the user doesn't have an SSN? What happens if they have one but lawfully refuse to provide it? What happens when you ask for an SSN from a US citizen who is also a European citizen? What happens when your database leaks?

In general, relying only on natural keys is a nightmare. Double nightmare if it's PII. Natural keys only work if you are flawlessly omniscient about the domain. And you aren't.

akra · on April 12, 2021

In my experience there are ways to use natural keys and handle domain changes; I've seen some systems like this work quite successfully. The cost of using synthetic IDs (auto-increment, UUIDs) is a lack of reproducibility and slower importing of data, especially across multiple tables/entities, which limits scalability. This can be very problematic for certain classes of applications I've seen, but not most. While I agree with your comment for many classes of apps, as always there is no general "silver bullet" answer: it depends on your problem space.

Some cases I've seen in previous roles where a natural key is required include reconciling third-party data sources, or processing events from a topic or stream and being able to replay the event log, knowing that a different ID may break third parties you don't control that have already imported the ID. Being able to replay your data sets from scratch and get exactly the same data can have some real advantages for some apps.

Of course you need to be aware of the domain, assume that the key can change over time, and have strategies to deal with that (e.g. entity version tables bound by time, data migrations to add key attributes, etc.), and the data structures/processes need to be designed for this. There's more work in it for sure to get right, and it shouldn't be the default. But in some cases I've seen it work really well, which frankly surprised me at the time.

cratermoon · on April 12, 2021

A Social Security number is far from unique. https://www.computerworld.com/article/2552992/not-so-unique....

Identity crisis: how Social Security numbers became our insecure national ID https://www.theverge.com/2012/9/26/3384416/social-security-n...

Back in the 1990s I was trying to convince my colleagues not to use SSNs as unique IDs. I've since noted that quite a few organizations that had gravitated to SSNs as IDs had to go through expensive and chaotic migrations to real unique IDs.

haspok · on April 12, 2021

How do you look for a person, if not based on his/her SSN?

SSN alone is not sufficient, of course. But it _is_ definitely part of the natural key that you use _implicitly_ ANYWAY, whether recognizing it or not.

> Natural keys only work if you are flawlessly omniscient about the domain

I would call that BS. Nobody is "flawlessly omniscient" about anything, not even in mathematics, yet we design and build systems that work.

On the other hand, yes, it is a very good requirement to have someone on the team during database modeling who understands the domain model thoroughly. No UUID columns will save you from that.

earthboundkid · on April 12, 2021

> How do you look for a person, if not based on his/her SSN?

These are different concepts. "Looking for a person" means search. You can look for people lots of ways. In medicine for example, they often look for first name + last name + birthday. Is that a unique ID? No, but it's close enough for search usually.

On the other hand, for any kind of indexing or foreign keys, you want an actually unique, immutable ID, which means you don't want a natural key, you want an artificial ID created just for your database.
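A sketch of that separation, with a hypothetical `Patient` class and `PatientRegistry`: identity lookups go through an opaque, immutable key, while "looking for a person" is a search over fuzzy attributes that can return zero, one, or several candidates.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical patient record; fields are for illustration only.
final class Patient {
    final UUID id;            // artificial, immutable: what foreign keys reference
    final String firstName;
    final String lastName;
    final LocalDate birthday;

    Patient(UUID id, String firstName, String lastName, LocalDate birthday) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
        this.birthday = birthday;
    }
}

class PatientRegistry {
    private final Map<UUID, Patient> byId = new HashMap<>();

    Patient register(String first, String last, LocalDate birthday) {
        Patient p = new Patient(UUID.randomUUID(), first, last, birthday);
        byId.put(p.id, p);
        return p;
    }

    // Identity lookup: exactly one record, or null.
    Patient get(UUID id) {
        return byId.get(id);
    }

    // Search: "close enough" matching that may return several candidates.
    List<Patient> search(String first, String last, LocalDate birthday) {
        List<Patient> matches = new ArrayList<>();
        for (Patient p : byId.values()) {
            if (p.firstName.equalsIgnoreCase(first)
                    && p.lastName.equalsIgnoreCase(last)
                    && p.birthday.equals(birthday)) {
                matches.add(p);
            }
        }
        return matches;
    }
}
```

Two different people named Jane Doe born on the same day both come back from `search`, but each still has its own id, so nothing that references them has to change.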

jacques_chester · on April 12, 2021

> I would call that BS. Nobody is "flawlessly omniscient" about anything, not even in mathematics, yet we design and build systems that work.

And such systems typically use synthetic keys to completely dodge the kind of problems I outlined.

The problems with natural keys are that you, the programmer, don't know as much as you think you know. You muddle the problem domain with the solution domain and when something comes along in the problem domain you didn't think of, it's now much harder to fix.

> On the other hand, yes, it is a very good requirement to have someone on the team during database modeling who understands the domain model thoroughly. No UUID columns will save you from that.

They save you from having to work out how to store a record when you chose SSN as the primary key and discover that, uh, no, you can't do that. The same goes for purchase order numbers, waybill numbers, student IDs, payroll IDs, bank account numbers, license plates... anything whatsoever that is visible in the problem domain has, or will have, exceptions you didn't know about, didn't foresee, and for which legislation or policy allows no exception, all for the sake of not using a UUID column.

echelon · on April 11, 2021

I want to print this comment and frame it.

jillesvangurp · on April 12, 2021

It's a common mistake indeed. People change company, which means their email will change. Or they are affiliated with multiple companies. Or they simply sign up with their personal email. If you have more user objects than people in your system, you are doing something wrong. This kind of digital schizophrenia is unfortunately quite common (looking at you Slack).

Having a notion of multiple verified ways to contact the user is one step up from this. Not a lot of websites actually do this. LinkedIn is one of them, because they recognized early that they needed to track their users as they changed their professional affiliation.

I tend to use list fields for things like phone numbers and email addresses. This makes it clear that I understand that users might legitimately have several of those and that they change over time. I always liked what Keybase did with encouraging its users to have multiple verified third party identities (the more the better).

Using a social security number is a mistake for a different reason: it's a sensitive bit of information that can be abused if it falls in the wrong hands. You should protect it like you would protect a credit card number and generally not store it unless you absolutely have a valid business reason to. Also, it's country specific which sort of makes it an obstacle if you need to serve an international market (or have the ambition to do so later).

imtringued · on April 12, 2021

>Or they simply sign up with their personal email. If you have more user objects than people in your system, you are doing something wrong.

I have three Discord accounts and there is no way to avoid that.

jillesvangurp · on April 12, 2021

GitHub manages fine. I've used my account on numerous projects. GitHub simply recognizes that you are you, and that your account and your company's data are two things.

yukinon · on April 11, 2021

> Personally, I'm a fan of giving everything a random UUID, because it's more flexible

Unless of course, you're using a relational database like OP and incur a performance hit from using a UUID as your primary key. Additionally, they're not sortable like autoinc id's.

I've always wanted to try out Twitter's Snowflake ID [1] algorithm to get around this, but it requires using something like ZooKeeper. I've seen some people on the net talk about UUIDv6 being sortable by time, but there's still the potential performance hit of index size.

While I'm bringing this up: I've never actually tested how slow UUIDv1 primary keys are, or at what scale their performance hit becomes noticeable.

[1] https://blog.twitter.com/engineering/en_us/a/2010/announcing...

cratermoon · on April 12, 2021

> they're not sortable like autoinc id's.

ksuid https://segment.com/blog/a-brief-history-of-the-uuid/

Using DB autogenerated IDs risks that nightmare situation I've seen in more than one organization I've worked with: the IDs "leak" and become actual identifiers, in perpetuity, for the related entity. Now you can no longer renumber your table, and if you dump and restore you get all new IDs.
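On the sortability point: ksuid-style IDs put a timestamp in the leading bytes followed by random bytes, so they sort roughly by creation time without any central coordination. A minimal sketch of that idea, assuming a 4-byte seconds-since-Unix-epoch prefix and 16 random bytes, rendered as fixed-width hex. Real ksuid uses its own epoch and base62 encoding, so the hypothetical `TimeSortableId` below is not ksuid-compatible.

```java
import java.security.SecureRandom;

class TimeSortableId {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 8 hex chars of seconds since the Unix epoch, then 32 hex chars of randomness.
    // Fixed width means string order matches timestamp order for IDs created in
    // different seconds; IDs from the same second are in random order.
    // A 32-bit unsigned seconds field wraps in 2106.
    static String newId(long epochSeconds) {
        StringBuilder sb = new StringBuilder(String.format("%08x", epochSeconds & 0xFFFFFFFFL));
        byte[] payload = new byte[16];
        RANDOM.nextBytes(payload);
        for (byte b : payload) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    static String newId() {
        return newId(System.currentTimeMillis() / 1000);
    }
}
```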

hodgesrm · on April 13, 2021

Your comment seems to conflate autogenerated IDs like MySQL AUTO_INCREMENT or PostgreSQL SERIAL with internal row IDs. Autogenerated keys are set on insert if you don't provide a value. After that they are stable. (And very popular, too.)

The case you describe sounds like using Oracle ROWID as a key, which I suppose people do as a hack to get around schema problems instead of fixing the schema. [1] That's an exceptionally bad idea.

[1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseud...

cratermoon · on April 13, 2021

Can you clarify how autogenerated keys are different from an autoincrement id column? In my experience, the autogenerated keys are built on top of auto-increment columns. Of course, if you use a stored procedure or something to generate a value, then you're just using a unique ID, same as anything else, but making your system dependent on the DB, which isn't awesome if you have a distributed system with replication and all that.

hodgesrm · on April 13, 2021

I might have missed something upthread but autogenerated keys to me just mean that the key is somehow generated automatically for you rather than using a natural key like social security number (SSAN) when inserting rows. Autoincrement IDs are one way of autogenerating a key that delegates generation to the DBMS server.

To expand on this, there are three common approaches to create automatic keys in SQL applications.

1.) Generate it in the application itself. You can make UUIDs yourself with a simple call in most languages. Ensuring uniqueness is your problem--to generate integers, for example, you'll need some sort of coordination if there's more than one application thread.

2.) Generate it from a SEQUENCE. Sequences are database-side key generators that hand back a unique sequence number from a block of available sequence numbers when you call for the next number. Uniqueness is pretty much guaranteed since numbers come from the shared database server. This approach is popular in PostgreSQL and Oracle.

3.) Generate it from an auto-increment column, such as MySQL AUTO_INCREMENT columns. Auto-increment columns generate the key server-side at insert time. Uniqueness is guaranteed, but you can't see the key until it has been inserted. If you need to know the key for follow-on INSERT or UPDATE commands you have to select it back using LAST_INSERT_ID(). [1] Auto-increment keys are popular in MySQL.

ORMs like Hibernate JPA tend to provide all of these as choices. I don't use ORMs much myself hence can't comment much on the details.
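Options 1 and 2 above can be sketched in-process. Here an `AtomicLong` stands in for a database SEQUENCE (a simulation, not a database call), and each application instance reserves a block of numbers at a time, so it only needs one round trip per block. `FakeSequence` and `BlockIdAllocator` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simulates a database SEQUENCE shared by all application instances.
class FakeSequence {
    private final AtomicLong next = new AtomicLong(1);

    // Reserve `size` consecutive values; returns the first one.
    long reserveBlock(int size) {
        return next.getAndAdd(size);
    }
}

// Per-instance allocator: one call to the "sequence" per block of IDs.
class BlockIdAllocator {
    private final FakeSequence sequence;
    private final int blockSize;
    private long current;  // next value to hand out
    private long limit;    // first value past the reserved block

    BlockIdAllocator(FakeSequence sequence, int blockSize) {
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    // synchronized: safe to share between threads of one instance.
    synchronized long nextId() {
        if (current == limit) {           // block exhausted (or never reserved)
            current = sequence.reserveBlock(blockSize);
            limit = current + blockSize;
        }
        return current++;
    }
}
```

Values reserved by an instance that shuts down are never handed out, so gaps in the IDs are expected with this style of allocation; uniqueness still holds because every block comes from the shared counter.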

[1] https://dev.mysql.com/doc/refman/8.0/en/information-function...

cratermoon · on April 14, 2021

How is SEQUENCE different from AUTO_INCREMENT if you have only a single writeable db instance? If you dump & restore a table with SEQUENCE values as IDs, does the restored DB have the same IDs, or are they generated anew on restore?

Also, SSN is a terrible key [1], and getting uniqueness without central coordination can be done with UUIDv4 or ksuid [2], as long as you have a reasonably trustworthy source of randomness.

[1] https://news.ycombinator.com/item?id=26776092

[2] https://github.com/segmentio/ksuid/blob/master/README.md

hodgesrm · on April 14, 2021

With SEQUENCE it's up to you to select the ID and insert it as a column value in your code. With AUTO_INCREMENT the DBMS does it for you. They both depend on a central DBMS to generate values. In both cases you can restore data. The IDs are just normal column values after generation.

As far as SSAN goes, I'm not asserting it's a good key, just a natural key, which is to say a key inherent in the record.

bbirk · on April 11, 2021

There's no need for ZooKeeper or any centralised/decentralised service. The article you link mentions why depending on something like ZooKeeper is suboptimal. As long as you have fewer than something like 2048 web server instances (I don't remember how many bits they give to worker_number, and the snowflake GitHub repo is basically unavailable), all you need to do is make sure every instance has a rank/worker_number (an infrastructure/devops problem), which the instance uses when it generates snowflake IDs. Side note: snowflake also suffers from the Unix epoch 2038 problem, but that can be solved simply by adding bits for an epoch number.
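A minimal Java sketch of the per-instance approach, assuming the commonly cited snowflake layout of a 41-bit millisecond timestamp, a 10-bit worker number and a 12-bit per-millisecond sequence. The comment above doesn't remember the exact split, so treat these widths, the custom epoch and the `SnowflakeSketch` name as assumptions, not the original library. With these widths the timestamp field covers roughly 69 years from whatever custom epoch is chosen.

```java
// Hypothetical snowflake-style generator; bit widths and epoch are assumptions.
final class SnowflakeSketch {
    private static final long CUSTOM_EPOCH_MS = 1_600_000_000_000L; // 2020-09-13T12:26:40Z, arbitrary
    private static final int WORKER_BITS = 10;   // up to 1024 instances
    private static final int SEQUENCE_BITS = 12; // up to 4096 IDs per millisecond per instance
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId; // assigned per instance by infrastructure/devops
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            // A backwards clock jump could produce duplicates, so refuse instead.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while (now <= lastMs) {
                    Thread.onSpinWait();
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        // Layout: [timestamp since epoch][worker number][sequence]
        return ((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Recovers the worker number embedded in an ID.
    static long workerOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER;
    }
}
```

The only coordination is handing each instance a distinct worker number; everything else happens locally.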

dale_glass · on April 11, 2021

Things like Twitter are special. I'm talking more about a generally sensible way of doing things, which one may need to deviate from in special circumstances.

Why would you want to sort by ID? Sort by something sensible, like the signup date, instead. An autoincrement ID may stop corresponding to time if, for instance, an external dataset is imported into the database at some point.
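To make the import scenario concrete: rows imported from an external dataset receive new, higher autoincrement IDs even though their signup dates are older, so only sorting on the date gives the intended order. `UserRow` and `SignupOrder` are hypothetical names for this sketch.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical user row: id is an opaque identifier, signedUpAt carries the meaning.
final class UserRow {
    final long id;
    final Instant signedUpAt;

    UserRow(long id, Instant signedUpAt) {
        this.id = id;
        this.signedUpAt = signedUpAt;
    }
}

final class SignupOrder {
    // Sort by the attribute that actually means "when", not by the surrogate key.
    static List<UserRow> bySignupDate(List<UserRow> users) {
        List<UserRow> sorted = new ArrayList<>(users);
        sorted.sort(Comparator.comparing((UserRow u) -> u.signedUpAt));
        return sorted;
    }
}
```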

IMO, using an ID for anything other than an opaque identifier is asking for trouble.

simonw · on April 11, 2021

I like IDs I can read out over a call, or recognize when I spot them in a log file. The few times I've used UUIDs for IDs I've later regretted it.

mosdl · on April 11, 2021

Add a prefix to the ID, then.
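One way to read the prefix suggestion, as a sketch: keep the stored key a UUID but render it with a short type prefix wherever it is shown or logged, so a log search for `user_` plus the value is much less likely to match unrelated numbers. The `PrefixedId` helper and the `_` separator are hypothetical choices.

```java
import java.util.UUID;

// Hypothetical helper: a type prefix makes an ID self-describing in logs
// ("user_..." vs "order_...") while the stored value stays a plain UUID.
final class PrefixedId {
    static String format(String prefix, UUID id) {
        return prefix + "_" + id;
    }

    // Returns the UUID if the text carries the expected prefix, otherwise throws.
    static UUID parse(String expectedPrefix, String text) {
        String head = expectedPrefix + "_";
        if (!text.startsWith(head)) {
            throw new IllegalArgumentException("expected prefix " + head + " in " + text);
        }
        return UUID.fromString(text.substring(head.length()));
    }
}
```

Parsing with an expected prefix also catches an order ID being passed where a user ID was meant.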

orf · on April 12, 2021

> or recognize when I spot them in a log file

“14632” in a log file could be anything, whereas a UUID is way more explicit and searchable.

simonw · on April 14, 2021

I have frequently memorized IDs of things in systems I interact with often:

"Oh, it's user 14632 - yeah I've seen that ID crop up against this issue a bunch of times in the past"

I can't do that with UUIDs.

orf · on April 14, 2021

Sure, but I think you would usually optimize for the inverse - actively finding a needle in a haystack rather than passively hoping a needle falls in your lap. I get 1k hits for "14632" in our Splunk for the last 60 minutes, all of it rubbish. Durations, image sizes, worker IDs, primary keys, port numbers.

Search for a UUID and the results are guaranteed to always be relevant. Can't do that with integers.

santiagobasulto · on April 11, 2021

I'm from Argentina and we have something similar to a SSN, we call it DNI (Documento Nacional de Identidad). You wouldn't believe how many duplicate DNIs we have, it's crazy.

So yes, I agree with you, I always use a random UUID as ID.

haspok · on April 12, 2021

If you don't use natural keys, how do you know if you have duplicate records in your database? How do you UPSERT? A UUID won't help you here (unless created as a hash of natural data).
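The hash-of-natural-data variant can be sketched with the JDK's `UUID.nameUUIDFromBytes`, which produces a version 3 (MD5-based) UUID: the same natural data always yields the same key, so a repeated insert targets the same row. The `NaturalKeyUuid` helper and the use of the unit-separator character to join fields are assumptions for illustration; it still depends on the natural data being normalized consistently before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: derive a deterministic key from natural data, so the same
// natural record always maps to the same UUID (usable as an UPSERT target).
final class NaturalKeyUuid {
    static UUID fromNaturalKey(String... parts) {
        // Join with a separator assumed never to appear in the parts; otherwise
        // ("ab", "c") and ("a", "bc") would produce the same input.
        String joined = String.join("\u001F", parts);
        // nameUUIDFromBytes produces a version 3 (MD5-based) UUID.
        return UUID.nameUUIDFromBytes(joined.getBytes(StandardCharsets.UTF_8));
    }
}
```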

> Then you may end up having to restructure the entire database, which will be a very annoying thing to do.

Well, you might call it annoying; I may call it a good upfront design requirement. Yes, you sometimes have to spend time on stuff that doesn't immediately pay off, but it will pay off in the longer run.

If you have a problem with your natural keys, this is mostly due to missing or misunderstanding your requirements, or an incomplete domain design. You can of course just sweep stuff under the rug, and pretend it's not there, but it's not a maintainable strategy.

(Unless you are a consultant of course, with a fixed term assignment on a project - in which case it is an excellent strategy for yourself...)

the_af · on April 11, 2021

The underlying problem is one of O-R impedance mismatch. Going full SQL and getting rid of the ORM is a possible answer, but it has tradeoffs and is not a silver bullet. It might mean re-creating from scratch an in-house, bug-ridden ORM, or ditching OOP idioms from your language, or both.

The author of TFA seems to be going through one of the stages described in "ORM is the Vietnam of Computer Science", an article that should be mandatory reading before one claims to know the solution to this decades old problem.

hagy · on April 12, 2021

I've found jOOQ to provide the right tradeoffs and flexibilities for ORM vs. SQL. First, it can generate object models of tables from the database in development and therefore doesn't rely on any specific migration tool. These model classes can be used in a conventional ORM fashion, but you also have the option to use jOOQ to build SQL queries.

You can even fetch the results of arbitrary SQL expressions into a model class, which can handle partial population of columns. This allows combining ORM and SQL approaches: some code operates on models where that is natural, and that same code can be applied to models fetched in other ways when that is more performant. E.g., fetching records using criteria that involve joins, and only selecting the columns relevant to the subsequent processing of the fetched results.

The jOOQ "DSL" (i.e., Java interfaces and static methods) gives you essentially the full power of SQL without having to rely on external SQL files or stored procedures (or, even worse, strings). You can even build these queries programmatically, e.g., optionally including different WHERE/HAVING criteria. The jOOQ "DSL" provides a fair amount of compile-time type safety, which isn't possible with external SQL.

cies · on April 12, 2021

jOOQ is great. It brings a lot:

* auto-complete in your IDE when writing queries in jOOQ's Java DSL, which maps quite naturally to SQL

* type safety, e.g.: migrate after a schema change, regenerate your jOOQ lib and see in your IDE (red underlines) all the places where your queries would break using the new schema

* build queries from parts (e.g.: store/manipulate some where clauses in a local variable) w/o any string manipulation

But it has costs:

* generating the jOOQ library from your schema at build time (or every time the schema changes): increases build time (a little) and complexity (a DB needs to be around during builds)

* very small runtime overhead: queries are built at runtime

* one more thing to learn

To me the benefits outweigh the costs (unlike Hibernate; I very much agree with the article), and I consider it similar to LINQ in C#, except that it is not a built-in language feature with its own syntax.

phreack · on April 11, 2021

Here's the article

http://blogs.tedneward.com/post/the-vietnam-of-computer-scie...

Edit: huh, seems to have got truncated over the years. Here's an archive link

https://web.archive.org/web/20160120004603/https://blogs.ted...

alasdair_ · on April 12, 2021

>“ORM is the Vietnam of Computer Science”

In other words it’s a complicated subject with a long and fascinating history, yet Americans will tend to only remember the short, disastrous bit they were directly involved in?

the_af · on April 12, 2021

No, the takeaway from Ted Neward, the author of this analogy, is this:

> "Although it may seem trite to say it, Object/Relational Mapping is the Vietnam of Computer Science. It represents a quagmire which starts well, gets more complicated as time passes, and before long entraps its users in a commitment that has no clear demarcation point, no clear win conditions, and no clear exit strategy."

Though I must admit your take is good, too!

imtringued · on April 12, 2021

I don't believe that to be the case. The vast majority of applications built on top of ORMs design databases so that they are easy to map to. Sure, as soon as you access a database that wasn't designed for the ORM you run into trouble but this is an all or nothing problem. Getting rid of the ORM where it works doesn't actually provide you with a benefit.

the_af · on April 13, 2021

> Sure, as soon as you access a database that wasn't designed for the ORM you run into trouble

Don't you mean the reverse? A relational database is never designed "for the ORM" -- ORMs are designed (sometimes poorly, sometimes better) for some databases. The object-relational impedance mismatch problem is well known in software engineering for a reason: OOP and relational databases weren't designed with each other in mind, and in fact, "fight against each other" in many ways. This is the starting point of the Vietnam article -- you can disagree with the analogy, but the impedance mismatch is very real.

kleinsch · on April 11, 2021

Everyone hates JPA/Hibernate, but what’s the alternative? I’ve seen this a few times. “You don’t need an ORM, write your own SQL queries directly and create a beautiful domain driven design object model” leads straight into a project only the owner will understand. Homegrown mini-ORM that’s full of pitfalls, inconsistent object model, hacks and TODOs all over the place.

If you’re living in the Java ecosystem, the biggest benefit is that you have a massive library ecosystem and a developer base that understands it. If you’re going to throw all that out and require new people to learn your homegrown mess, why not pick a language that’s more interesting than Java? Then you’ll get new developers who are excited about Go/Elixir/WhateverHotness.

Saying this as someone with 15 years in Java world.

watwut · on April 12, 2021

I don't think it is true that everyone hates JPA/Hibernate. I don't hate them and don't know many people IRL who hate them. I know SQL, and worked with SQL before Hibernate. Here, developers are expected to learn SQL even though the project uses Hibernate.

For me, the migration to Hibernate was an improvement. There is a subculture of people who hate Hibernate, hate Java, hate frameworks for anything. And some of these people are very vocal and emotional about it. And then there is everybody else.

And a lot of it honestly sounds like projection: people who don't like having to learn these things and quite clearly did not bother to learn them. They assume Hibernate is used to avoid learning SQL, because that would make sense to them. But back in the real Java world, developers are expected to know both.

tablespoon · on April 12, 2021

Yeah, I don't hate JPA/Hibernate. In fact, I tend to hate its absence, because for me that has usually meant a four-user CRUD app written with JDBC and a billion lines of boilerplate mapping between the domain and SQL CRUD statements.

I mean, there are definitely cases where JPA/Hibernate doesn't fit for all kinds of reasons, but don't avoid it where it does help (e.g. the 4 user CRUD app that needs to be "web scale").

dopidopHN · on April 12, 2021

Gosh, has it been 15 years already? I never liked Java much, but several companies I liked, and that hired me, were mostly using it. Then I did some consulting / freelancing / firefighting. ORMs, and mostly Hibernate, were often a topic.

My experience and methodology are as follows:

- Write tests for your ORM layer, possibly not in Java: high-level integration-type tests (calling your web layer or API and checking what comes out).

- Burn your ORM layer down.

- Have someone who knows the business logic sit with you, and rewrite the ORM layer mostly from scratch.

- While doing the above, train that person on the very common recurring gotchas (inverse N-to-N, accidental Cartesian products, bad ID generation, compulsive flushing, unneeded lazy or eager fetching to deal with performance, @Embedded abuse, and the fucked-up SQL schema that results from all of the previous).

- Train all devs on the persistence lifecycle of entities in JPA, the bottom line being: "No, most likely you won't have to call .persist or .merge yourself."

- Have shiny, documented examples of 1-to-1, N-to-1 and N-to-N mappings (both with and without an inverse link).

- If some report or API needs a fucked-up SQL query with 18 joins, write that thing in native SQL and shove it in your DAO. You don't need to mess up your whole DAO layer for those 3 queries.

- Pray

- Run the tests from step 1.

--- TL;DR: if the ORM layer is somewhat clean, you can mostly forget about it. On the project I'm on now, I haven't had to touch it more than once in 2 years (and that was terrible, granted: we touched a test and stuff that had nothing to do with it started to blow up. But still, once in 2 years is not a lot).

oweiler · on April 12, 2021

Both JOOQ and JDBI are superior to JPA in my book.

cies · on April 12, 2021

jOOQ is the most obvious contender. Since I use Java I like type safety, jOOQ gives me that (like Hibernate) without trying to abstract the fact that I make queries to a db (which Hibernate does).

ChrisWreck · on April 12, 2021

jOOQ and JPA don't solve the same problem.

lazystone · on April 12, 2021

That's why JOOQ is superior.

ChrisWreck · on April 12, 2021

It depends on what you want to do.

Complex queries with multiple joins? jOOQ.

Persist changes to complex aggregate models with multiple layers of child entities? JPA/Hibernate.

lazystone · on April 12, 2021

> Doctor, it hurts when I do this. Then don't do that

cryptos · on April 12, 2021

The interesting part is how to perform updates without an ORM. All these "just use SQL" statements seem to ignore this problem. If a domain model is used, there is no simple, obvious way to find out what needs to be updated in the database and how to generate performant SQL for it. I'm really interested in some examples of how to do it.
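One partial answer, sketched under assumptions: keep a snapshot of the column values you loaded and, when saving, diff it against the current values so the UPDATE only touches changed columns. This is roughly the idea behind an ORM's dirty checking, done by hand. `DirtyTracker` is hypothetical, rows are modeled as `Map<String, Object>` rather than domain objects, and table/column names must come from trusted code since they are concatenated into the SQL; only the values go through `?` placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical hand-rolled dirty checking over rows modeled as column -> value maps.
final class DirtyTracker {
    // Columns whose current value differs from the value captured at load time.
    static Map<String, Object> changedColumns(Map<String, Object> snapshot,
                                              Map<String, Object> current) {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    // UPDATE touching only the changed columns; values are bound later via "?".
    // Table and column names are concatenated, so they must come from trusted code.
    static String updateSql(String table, String idColumn, Set<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("nothing changed: skip the UPDATE");
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : columns) {
            assignments.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + assignments + " WHERE " + idColumn + " = ?";
    }
}
```

The caller would then bind the changed values in order, followed by the ID, to a `PreparedStatement`. This deliberately ignores child collections and the ordering of writes, which is where hand-written updates of a deep domain model get hard.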

vishnugupta · on April 12, 2021

> “You don’t need an ORM, write your own SQL queries directly and create a beautiful domain driven design object model” leads straight into a project only the owner will understand

Could you please elaborate why will this end up happening?

I've been on projects that had ~50 active contributors. All of them used plain SQL queries and used a library (don't remember which) only to map columns to Java types. Everyone understood table structure they were working on and its relationship with other tables. If they introduced a new query they were expected to run 'explain plan' first to understand the consequences and change the query if needed. I won't deny that it all needed a bit of shepherding by the senior engineers but the overhead was negligible. As an upside every developer was always up to date with the schema and precisely knew the performance consequences.

The mess you state is perhaps a byproduct of "get-big-fast" code bases?

chrisandchris · on April 12, 2021

How do you deal, in such a project, with similar queries that don't all fetch the same data?

E.g. Query A retrieves an author and a count of all the books he has written. Query B retrieves an author and the title of the most recent book he has written. I see the following options:

- make two models, one for each query -> leads to a lot of similar models and code bloat

- make a base query and fetch additional data with an additional query -> leads to a lot of small queries that would sometimes be more performant as a single query

deeg · on April 12, 2021

Shameless plug : for a little while now I've been working on a project with others to try a different approach. We use a higher level abstraction that does a better job of modeling the data than language-level objects. This eliminates the impedance mismatch. Because it's not language dependent you can use the same models (we call them "logical objects") with multiple languages.

The logical objects are hierarchical and serialize easily to JSON and back, making it simple to write REST interfaces. There are a couple of production teams making good use of it.

Github repo is here; feel free to contact me about it: https://github.com/zeidon/zeidon-joe

oulu2006 · on April 12, 2021

Nice, I had a poke around and found a small typo while reading about it:

Asynchronous loads are trivial with Zeidon. Simply add the “.synchronous” qualification to the activate.

deeg · on April 12, 2021

Thanks! It's on the list :)

stuff4ben · on April 12, 2021

Exactly! And can you even call yourself an experienced Java developer if you haven't written your own ORM at some point?

jayd16 · on April 12, 2021

I just don't understand this mindset. Just learn SQL: not only is SQL not hard to maintain, you'll find it's so much easier to manipulate data.

Just get something like JDBI for the row set to POJO mappings.

Quarrelsome · on April 12, 2021

C# has micro-ORMs like Dapper; surely Java has something similar? Micro-ORMs show the SQL, but the object generation still gets done for you, so you get a bit of both worlds.

chromanoid · on April 12, 2021

I wouldn't call it micro, but https://ebean.io/ is pretty nice.

vips7L · on April 11, 2021

This article is... questionable at best. I don't think any of this is an argument against JPA, except that the author doesn't like how it works? I also suspect the author doesn't know Hibernate that well.

For instance, selecting just the fields you need is relatively simple with JPQL:

    SELECT i.url FROM Image i WHERE i.id = ...

agilob · on April 11, 2021

JPA and Hibernate make it very easy to use them incorrectly; it's almost like they promote bad SQL queries and ideas. They let users of the database connection write Java-first database queries, when a database query should be database-first. It's just way too easy to abuse them and get too much data, too many columns and JOINs.

Developers look at what the JSON looks like, what we send to the browser, what formatting it has, and the validation and size of the JSON data, as it's easy to monitor and trim.

Can't say the same about Hibernate. How the hell does Hibernate caching even work? Why are there even 2 levels of Hibernate cache? It's too easy to create and abuse transactions. Dirty-checking is JUST WAYYYY TOOO EAZY TO ABUSE. Merely calling a setter on an instance shouldn't update the database by default, omg. It shouldn't be possible for transactions to leak outside some easily specified scope. I've seen one project where a transaction leaked into Jackson!! Jackson was calling getters on fields and executing DB queries. The JSON ended up as 2.6MB instead of a list of 10 fields.

Hibernate is popular because we don't need to learn SQL to get the data we need, but it's also super hard to get it right and not do something stupid by accident.

If in any doubt, refer to JOOQ - it's the SQL-oriented ORM for Java.

vips7L · on April 11, 2021

> How the hell does Hibernate caching even work? Why are there even 2 levels of Hibernate cache?

https://docs.jboss.org/hibernate/stable/orm/userguide/html_s...

> Merely calling a setter on an instance shouldn't update the database by default, omg

This doesn't happen.. the database won't update until you ask the entity manager to persist the entity.

> I've seen one project where a transaction leaked into Jackson!! Jackson was calling getters on fields and executing DB queries. The JSON ended up as 2.6MB instead of a list of 10 fields.

I suspect you might not agree, but JPA lazy loading is one of the easiest concepts to understand. However, I'd argue that leaking your database models to the view is a mistake to begin with and said application is already incorrect.
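A minimal sketch of keeping entities away from the view, assuming a hypothetical `ImageEntity` (JPA annotations and the lazy association are omitted so it compiles on its own): copy only the fields the response needs into a plain DTO while the transaction is still open, and hand the serializer the DTO. Nothing the serializer touches can then trigger a lazy load.

```java
// Stand-in for a JPA entity; a real one would carry @Entity and possibly a lazily
// loaded association (omitted here so the sketch compiles without JPA on the classpath).
class ImageEntity {
    private final long id;
    private final String url;

    ImageEntity(long id, String url) {
        this.id = id;
        this.url = url;
    }

    long getId() { return id; }
    String getUrl() { return url; }
}

// Plain DTO: only the fields the response needs, no references back to entities.
final class ImageDto {
    final long id;
    final String url;

    ImageDto(long id, String url) {
        this.id = id;
        this.url = url;
    }
}

final class ImageMapper {
    // Call this in the service layer, inside the transaction, before data reaches the view.
    static ImageDto toDto(ImageEntity entity) {
        return new ImageDto(entity.getId(), entity.getUrl());
    }
}
```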

> If in any doubt, refer to JOOQ - it's the SQL-oriented ORM for Java.

The authors/maintainers of both jOOQ and Hibernate agree that each has its place.

I really am amazed at developers that don't take the time to learn about the tools they use.

eeperson · on April 11, 2021

> This doesn't happen.. the database won't update until you ask the entity manager to persist the entity.

Doesn't this happen through automatic dirty checking?

vips7L · on April 11, 2021

No, calling a setter on an Entity doesn't automatically issue an SQL UPDATE query. You need to ask the EntityManager to merge or persist the entity and its changes.

Obviously JPA knows which fields were updated via dirty checking; that's almost half the point of an ORM.

alserio · on April 12, 2021

If you have a managed instance of an entity, like something returned by a find or a query, calling a setter will modify its state and the state change _will_ be detected and persisted automatically on the next flush, without explicit persist or merge.

eeperson · on April 11, 2021

I'm not talking about "dirty checking". I'm talking about "automatic dirty checking". Take for example this code I got from this article[1]:

    // HibernateUtil is a helper class from the linked article
    SessionFactory sessionFactory = HibernateUtil.getSessionFactory();
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    Person person = session.load(Person.class, 2); // loads Person object for id 2
    person.setAge(32);                             // only a setter call: no save/update/merge
    tx.commit();                                   // dirty checking issues the UPDATE on flush
    session.close();
    HibernateUtil.closeSessionFactory();

It results in an updated age for the person in the DB. However, EntityManager was never directly notified about the change in the person object.

[1] - https://learnjava.co.in/automatic-dirty-checking-in-hibernat...

vips7L · on April 11, 2021

That's because JPA isn't involved here. It's directly using Hibernate APIs and not sticking to the standard, which is what I was talking about.

eeperson · on April 12, 2021

I'm almost certain this occurs with Hibernate as the JPA provider. I haven't tried this with other JPA providers but as far as I can tell from tutorials[1], stack overflow posts[2], and the JPA documentation[3], this is the default for JPA as well.

[1] - https://www.objectdb.com/java/jpa/persistence/update

[2] - https://stackoverflow.com/a/8307991

[3] - https://docs.oracle.com/javaee/6/tutorial/doc/bnbqw.html#bnb...

EDIT: Oops I accidentally replied to this twice

eeperson · on April 12, 2021

Have you verified that? I'm pretty sure that occurs using the JPA API and Hibernate. I'm not sure if other JPA providers do this as well.

watwut · on April 12, 2021

That is because the "session.load(Person.class, 2)" line literally says "and track changes", as the article says.

That is not how Hibernate or JPA is used normally. It goes out of the standard way to achieve the thing you complain about.

eeperson · on April 12, 2021

> That is because the "session.load(Person.class, 2)" line literally says "and track changes", as the article says.

I'm aware that that's what occurs. That is my point. I'm responding to the previous post's statement "No, calling a setter on an Entity doesn't automatically issue an SQL UPDATE query". This is an example where calling a setter causes an update query to be run.

> That is not how Hibernate or JPA is used normally. It goes out of the standard way to achieve the thing you complain about.

What about this is not how Hibernate and JPA are used normally? Are you saying that setters are not normally used? Do you mean that people normally call update or merge to persist an Entity? If so, I agree that is what people normally do. However, when people do that they tend to accidentally introduce bugs. Usually this occurs when they update an entity and then do some validation on it. When the validation fails, they think they can avoid sending the changes to the DB by doing nothing. However, that isn't true. They have to manually evict the entity from the session to prevent that from happening.

watwut · on April 12, 2021

The "session.load(Person.class, 2)" thing is not done normally. It is not even part of JPA. It is a Hibernate-only feature.

So in all the projects I have seen, calling a setter did not change the database.

> Do you mean that people normally call update or merge to persist an Entity?

Yes, people normally call update and merge to persist an entity.

alserio · on April 12, 2021

People normally do that when they do not have to do that.

  Dog rex = em.find(Dog.class, "rex"); // 1: rex is now a managed entity
  rex.setAge(2);                       // 2: change tracked in memory, no SQL yet
  // other query                       // 3: the pending UPDATE is flushed before it runs

At 3 the update is flushed to the underlying DB. This is pure JPA. Calling merge forces a useless query before the update. Also, merge returns the managed entity.

eitland · on April 12, 2021

> Hibernate is popular because we don't need to learn SQL to get the data we need, but it's also super hard to get it right and not do something stupid by accident.

I think this is plain wrong, or I might have been very lucky with who I work with: if anything I think most people I work with learned JPA or other ORMs long after learning SQL.

morelisp · on April 12, 2021

You have been lucky. Part of our hiring test (intentionally) does something which is trivial to get correct if you know SQL but also easy to make racy with Hibernate. About 4/5 of applicants get it wrong, and maybe 2/5 don't know how to fix it when we show them the problem. (Spring devs with 1-5 years of experience in web APIs and reporting.)

watwut · on April 12, 2021

Yeah, I really don't know anyone in real life who would claim or expect developers to not know SQL just because Hibernate is used.

Hibernate is how to get data into Java. And we still have database scripts, migrations to new versions and what not.

Zardoz84 · on April 12, 2021

> Merely calling a setter on an instance shouldn't update the database by default, omg. It shouldn't be possible for transactions to leak outside some easily specified scope. I've seen one project where a transaction leaked into Jackson!! Jackson was calling getters on fields and executing DB queries. The JSON ended up as 2.6MB instead of a list of 10 fields.

I saw a similar problem in an old codebase where I work. There are some Velocity templates that show information stored in database entities (yeah... bad idea). And sometimes we got mysterious errors about the transaction being closed. Well... it turns out that Velocity calling the getters of these beans can trigger a JPA/Hibernate query to fetch additional data that hadn't been loaded yet. And this could happen after we had closed the database transaction.

alkonaut · on April 11, 2021

On giving up basic OO niceties like invariants across your whole domain just to get automatic persistence from some library, I agree with the author: it’s insane and no one should accept that tradeoff.

There have to be ways around that, though, perhaps using a duplicated domain of DTOs, or coercing the ORM to use constructors or private setters to keep encapsulation and invariants.
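A sketch of the "coerce the ORM to use constructors" route, under assumptions: JPA requires a no-argument constructor, but it may be `protected`, so application code can be forced through a constructor that checks invariants, with behaviour methods instead of public setters. `OrderLine` and its rules are hypothetical, and the JPA annotations are left out so the sketch compiles without a JPA provider; with field access the provider would populate the fields via reflection.

```java
// Hypothetical entity-style class (annotations such as @Entity/@Id omitted).
class OrderLine {
    private Long id;       // assigned by the persistence layer
    private String sku;
    private int quantity;

    // Required by JPA, but protected: application code cannot create an
    // unvalidated instance through it.
    protected OrderLine() {
    }

    OrderLine(String sku, int quantity) {
        if (sku == null || sku.isEmpty()) {
            throw new IllegalArgumentException("sku required");
        }
        requirePositive(quantity);
        this.sku = sku;
        this.quantity = quantity;
    }

    // Behaviour instead of a setter: the invariant is re-checked on every change.
    void changeQuantity(int newQuantity) {
        requirePositive(newQuantity);
        this.quantity = newQuantity;
    }

    String getSku() { return sku; }
    int getQuantity() { return quantity; }

    private static void requirePositive(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
    }
}
```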

vips7L · on April 11, 2021

IMO invariants are better handled by a class whose sole responsibility is to enforce and validate said invariants, especially when you have dependencies involved in enforcing them (like making sure the Item actually exists in the DB).

Value classes like @Entity shouldn't have the responsibility to enforce those business rules.

We can disagree on which way is more object oriented though.

jacques_chester · on April 11, 2021

I largely agree that JPA is a maddening mess[0]. However, this sentence caught my eye:

> I love open source, really, but big companies sponsoring open-source projects get most of their income from support or third party tools.

I work for VMware and consequently take some interest as to how my salary comes into being.

If you think VMware makes "most of its income" from supporting Spring, then I think I'd encourage you to spend some time at the investor relations site[1] reading any of the annual or quarterly reports. I'd advise the same for Oracle and Red Hat/IBM.

[0] The advice to use emails or SSNs as primary keys, though: yikes.

[1] https://ir.vmware.com/

lmm · on April 12, 2021

> If you think VMware makes "most of its income" from supporting Spring, then I think I'd encourage you to spend some time at the investor relations site[0] reading any of the annual or quarterly reports.

VMware is a huge company with its fingers in many pies, but I suspect if you rewrite the sentence as "most of their income attributable to that open-source project" then it would be true of such companies. Presumably VMware sees a return on investment for its sponsorship of Spring (otherwise why do it?); if that's not coming via support or non-free tools that build on Spring then where is it coming from?

jacques_chester · on April 12, 2021

I can't speak for folks moving in VMware's most exalted circles, but from my personal perspective the aura and reflected glamour are valuable in themselves. Spring's reach and influence in enterprise programming is enormous.

Put another way: I don't think Spring makes money. It makes making money easier.

ChrisWreck · on April 12, 2021

Going from using only JPA/Hibernate for everything, to use a combination of both jOOQ and JPA/Hibernate on a project, is probably the best decision I've made. Using each tool at what I believe they do best.

I use JPA/Hibernate for most writing and inserts. It helps a lot when you're dealing with aggregates with several layers of child entities. It would be a nightmare to track all the changes myself and try to manually do what the ORM is doing for me. Deleted a child of a child of an aggregate? No problem, persist only that.

I've started to separate my JPA/Hibernate entities from my domain models as well, and it looks promising. There's some more mapping, but my domain won't be polluted with database concerns.

Then I use jOOQ for almost all reading of data, reading into custom read models that fit the view they are supposed to be shown in. No problem doing multiple joins or other stuff that would give you an immediate headache when trying to solve using JPA/Hibernate.

exabrial · on April 12, 2021

I don't want to sound like I particularly love the ORM pattern when I say this... but his information is about 10 years out of date. Many of the claims simply aren't true anymore, or the world has moved past them by other means. For instance, one of his very specific claims, that getters and setters must be present, is definitely not true for current versions of every JPA implementation. As for the claim about a default no-op constructor, you can't possibly be serious unless you are doing true OOP, and in that case you are awarded no points and may god have mercy on your soul.

The beauty of dependency injection is that it did more to make Java a functional language than Java 8 with lambda notation. That pretty much eliminates the need for pure OOP, which JPA attempts to imitate. In reality JPA + JTA is a pretty awesome combination and avoids the stupid OOP paradigm.

Again. Argh. I would write a response blogpost, but argh, apathy for the orm pattern.

lmm · on April 12, 2021

This is a common criticism but still extremely shallow. JPA/Hibernate is still the best way to actually produce working applications if you have to use an SQL database for some reason. To go point by point:

Mutable datastructures with default constructors and setters: yes, mutable entities suck. Unfortunately SQL is fundamentally built around mutable entities. Every field of your POJO is writable because every column of your database is writable. The impedance mismatch is big enough already without trying to make objects that behave differently from your database. Yes, an object shouldn't just be a datastructure with methods; unfortunately an SQL table row is just a datastructure, and classes are the only mechanism Java offers for representing such a thing. Same for mutable collections.

Reflection (which is the reason classes must be non-final): again, sucks, again, the only way to do something like this in Java.

Lazy loading is wonderful. Put your session in your view, write a normalised set of entities that actually model your domain, and get on with your life. Don't worry about the details of what loads when unless and until you have to.

Those who don't understand Hibernate caching are doomed to reinvent it poorly.

Don't use your database as an API. Yes, Hibernate/JPA needs to own your database. That's as it should be. SQL databases are way too complex to be shared between independent applications.

Do you really think people who can't be bothered to learn and understand Hibernate properly are somehow going to take the time to learn and understand "vanilla SQL"? Why?

None of the problems listed here are problems of JPA/Hibernate. They're problems of SQL databases which are surfaced through JPA/Hibernate, but if you skip out on JPA/Hibernate you still get exactly the same problems (maybe in a slightly less recognisable form). The real solution is to stop using these overrated datastores, but if you must use them then JPA/Hibernate is the least-bad way of doing so.

manyxcxi · on April 12, 2021

I disagree with most of your points. I am trying to read your entire comment favorably, but my personal experience lines up with almost none of this.

> JPA/Hibernate is still the best way to actually produce working applications if you have to use an SQL database for some reason...

This statement makes me think that you either do not prefer to use SQL RDBMS, don’t have to use them very often, believe they are some dusty piece of tech, or all of the above when my experience has me believing that RDBMS are absolutely the most common persistence layer I encounter in JVM, .NET, PHP, and Python codebases.

I don’t think I’ve ever heard a senior JVM based engineer proclaim that JPA/Hibernate are “the best way... to produce working applications”. It simply isn’t. For basic CRUD applications you will AT BEST barely write fewer lines of code with JPA than with native JDBC queries and ResultSet mapping and have all the lock-in and performance drawbacks of JPA.

Lazy loading will inevitably wind up with Session scope problems with any kind of concurrency, forcing nasty internal list enumeration to force a faux eager fetch to work around the problems.
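To make the "faux eager fetch" concrete, here is a plain-Java simulation, not Hibernate code: a hypothetical `LazyList` that can only load while its hypothetical `FakeSession` is open. Touching the collection after the session has closed fails (Hibernate reports the real case as a `LazyInitializationException`), and the workaround is to touch it while the session is still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a persistence session: lazy loads only work while it is open.
final class FakeSession implements AutoCloseable {
    private boolean open = true;
    boolean isOpen() { return open; }
    @Override public void close() { open = false; }
}

// Hypothetical lazy collection, loosely modelled on how an ORM proxy behaves.
final class LazyList<T> {
    private final FakeSession session;
    private final Supplier<List<T>> loader;
    private List<T> loaded;

    LazyList(FakeSession session, Supplier<List<T>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<T> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException("could not initialize lazy collection: session closed");
            }
            loaded = new ArrayList<>(loader.get());   // the deferred query runs here
        }
        return loaded;
    }
}
```

Calling `get()` once inside the session is exactly the "enumerate the list to force loading" trick; with real JPA, a fetch join or an entity graph would state that intent more explicitly.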

Fetching just the columns you need for a particular projection will have you writing either SQL or Hibernate “SQL” in annotations.

If you have a mix of JDBC and JPA in a codebase you will inevitably wind up with enough consistency and visibility issues as to either ditch one of them or ditch the entire codebase.

Vanilla SQL is so pervasive and CRUD operations are so simple that I would have serious doubts about the credibility of a JPA proselytizer that didn’t know SQL.

I expect every single one of my backend engineers (on any tech stack) to understand the fundamentals of SQL INSERT, UPDATE, and DELETE statements.

ChrisWreck · on April 12, 2021

> I expect every single one of my backend engineers (on any tech stack) to understand the fundamentals of SQL INSERT, UPDATE, and DELETE statements.

Understanding INSERT, UPDATE, and DELETE doesn't help you very much when you have to persist changes to a single child entity of many in an aggregate (parent entity). How do you track which child has changed? Or if a new child is added? Or one is deleted? Or what if the relation is a child of a child of the aggregate, which might very well be the best way to model your domain.

In these situations, Hibernate/JPA will help you a lot! If you're doing it using plain SQL/JDBC, you'll probably end up writing your own mini ORM, and/or polluting your domain with database concerns. (I do keep my Hibernate entities separated from my domain.)
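For readers wondering what "your own mini ORM" looks like in practice, here is a minimal plain-Java sketch of the change tracking an ORM does for one child collection. It assumes a hypothetical `Child` identified by a `Long id` (null for children not yet persisted), compares the snapshot taken at load time with the current state, and works out which INSERT, UPDATE and DELETE statements would be needed. The names are illustrative, not any library's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical child row: id == null means "not yet persisted".
final class Child {
    final Long id;
    final String name;
    Child(Long id, String name) { this.id = id; this.name = name; }
}

final class ChildDiff {
    final List<Child> inserts = new ArrayList<>();
    final List<Child> updates = new ArrayList<>();
    final List<Long> deletes = new ArrayList<>();

    // Compare the snapshot taken when the aggregate was loaded with its current children.
    static ChildDiff compute(List<Child> snapshot, List<Child> current) {
        ChildDiff diff = new ChildDiff();
        Map<Long, Child> before = new HashMap<>();
        for (Child c : snapshot) before.put(c.id, c);
        for (Child c : current) {
            if (c.id == null) {
                diff.inserts.add(c);                    // new child -> INSERT
            } else {
                Child old = before.remove(c.id);        // ids unknown to the snapshot are ignored here
                if (old != null && !Objects.equals(old.name, c.name)) {
                    diff.updates.add(c);                // changed child -> UPDATE
                }
            }
        }
        diff.deletes.addAll(before.keySet());           // children no longer present -> DELETE
        return diff;
    }
}
```

Children of children would need the same comparison applied recursively, which is roughly where the mini ORM starts to grow.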

cryptos · on April 12, 2021

Exactly this! And I'm curious to learn how to solve this update problem (in an elegant way) with pure SQL.

watwut · on April 12, 2021

> I expect every single one of my backend engineers (on any tech stack) to understand the fundamentals of SQL INSERT, UPDATE, and DELETE statements.

And everyone knows them. It is just absurd to talk about Hibernate as a way to avoid learning to write insert, update and delete. That is a made-up issue. That is like claiming that people who don't write getters and setters dislike them because they never learned how to write them.

I don't know whether this claim is a manipulative attempt to insult people who like a framework you don't, or what. But it is ridiculous.

lmm · on April 12, 2021

> This statement makes me think that you either do not prefer to use SQL RDBMS, don’t have to use them very often, believe they are some dusty piece of tech, or all of the above when my experience has me believing that RDBMS are absolutely the most common persistence layer I encounter in JVM, .NET, PHP, and Python codebases.

Popular doesn't mean good. I have to use SQL RDBMS a lot, and I respect the amount of low-level engineering work that has gone into them, but yeah I do hate using them.

> I don’t think I’ve ever heard a senior JVM based engineer proclaim that JPA/Hibernate are “the best way... to produce working applications”. It simply isn’t. For basic CRUD applications you will AT BEST barely write fewer lines of code with JPA than with native JDBC queries and ResultSet mapping and have all the lock-in and performance drawbacks of JPA.

I don't think seniority is a good metric, but I've got 10+ years of professional JVM experience for what that's worth. Using JPA means you'll write significantly less code, and the code you get to skip is the most tedious (and therefore rarely read or reviewed) part. Lock-in is significantly lower: you can seamlessly migrate between databases in a way that you can't with handwritten SQL, and you can migrate between different JPA implementations with minimal work (not that I think there's actually much value in doing that, but the capability is there). Performance for equivalent effort will be significantly better because you've got a caching layer that actually works already in place (unless you turn it off, but, uh, don't do that).

Of course if your CRUD application is performance-critical enough to justify hand-tuning every query and implementing a correct caching layer by hand then you'll do better without the framework. But realistically that's a vanishingly rare case.

> Lazy loading will inevitably wind up with Session scope problems with any kind of concurrency, forcing nasty internal list enumeration to force a faux eager fetch to work around the problems.

Depends on your application - a lot (not all, but a lot) of systems decompose naturally into a sequence of isolated steps that provide a natural session boundary. E.g. for a REST API or MVC-style webapp just put the session in the view and get on with your life - people have some philosophical objection to this but it works really well. (I actually don't think MVC is a great way to structure a webapp, but that's a separate fight).

> Fetching just the columns you need for a particular projection will have you writing either SQL or Hibernate “SQL” in annotations.

True. But, on the assumption that you've actually structured your entities to follow your domain, how often is that something you actually gain a significant amount of performance (or anything) from?

> If you have a mix of JDBC and JPA in a codebase you will inevitably wind up with enough consistency and visibility issues as to either ditch one of them or ditch the entire codebase.

This I completely agree with (at least for people who don't make any actual effort to address the problem), and I think it's where articles like the OP come from. I see a lot of people follow a pattern something like: their application needs some vaguely tricky query, and rather than spending 5 minutes looking up how to do it in the Hibernate documentation they decide to handwrite the SQL for it instead. Then they realise that this makes the Hibernate cache for the affected entity invalid, and rather than look up how to selectively invalidate the cache for the entities affected by their query they disable the cache globally. Then they complain that Hibernate is slow and decide the solution is to handwrite the SQL for other queries instead. JPA works great, but only if you're willing to actually try to use it.
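The failure mode described above can be reproduced without Hibernate at all. Below is a plain-Java sketch, assuming a hypothetical `EntityCache` sitting in front of a map that stands in for a table: a "native" write that bypasses the cache leaves a stale entry, and evicting just the affected key, rather than disabling the cache, fixes it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-through cache in front of a "table" (a map from id to value).
final class EntityCache {
    final Map<Long, String> table = new HashMap<>();
    private final Map<Long, String> cache = new HashMap<>();
    int tableReads = 0;

    String find(long id) {
        return cache.computeIfAbsent(id, k -> {
            tableReads++;              // only a cache miss touches the table
            return table.get(k);
        });
    }

    // Writes that go through the cache keep both sides consistent.
    void update(long id, String value) {
        table.put(id, value);
        cache.put(id, value);
    }

    // A hand-written SQL statement: changes the table behind the cache's back.
    void nativeUpdate(long id, String value) {
        table.put(id, value);
    }

    // Selective invalidation: drop only the entries the native write touched.
    void evict(long id) {
        cache.remove(id);
    }
}
```

The design point is that the cache is not wrong; it simply was never told about the write, so the fix belongs next to the native query rather than in a global switch.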

> I expect every single one of my backend engineers (on any tech stack) to understand the fundamentals of SQL INSERT, UPDATE, and DELETE statements.

Ah, but that isn't actually enough. The people talking about getting better performance from handwriting your SQL are people who understand different types of indices, different join strategies, how the query planner chooses which one to use. And if you put the same amount of time and effort into understanding Hibernate, you can get great things out of it.

manyxcxi · on April 12, 2021

And on many of these points I agree...

I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.

As you mentioned previously, knowing the tool set and the domain is critical to either approach. At a certain point with technology the benefits and costs are weighted by subjective preference and project specific needs. I have weighted SQL higher than JPA by many factors because I can take my SQL knowledge to any backend project, and I’ve been a part of a lot of different tech stacks in my career.

Maybe my travels have led me to be surrounded by many more engineers who trust the database (and their knowledge of the database) to handle persistence without too many layers in between.

I, personally, have never seen a JPA-based project that actually worked well with large-ish datasets, high concurrency, or when non-trivial ETL functions are part of the system, and this general domain has been the majority of my career, so I may have mistaken MY majority for THE majority.

Thanks for the response and a good look at the topic from a different point of view.

lmm · on April 12, 2021

> I was careful to choose popular, and not project opinions about SQL/NoSQL/etc. In my field, most of our data is relational and we use NoSQL for caching, queues, shared work, ETL performance, dashboards, etc. but at the end of the day for persistence, the RDBMS is where the “gold copy” data ends up.

I'd worry about using an RDBMS in that situation because it's fundamentally mutability-first. I prefer to regard the user's actions as the "gold copy" and the current-state-of-the-world as a transient derived thing (i.e. event sourcing), but that doesn't really play to the strengths of an RDBMS. You also have to make global decisions about transactionality (in particular, you can't easily commit a data write without committing updates to all your secondary indices), and the much-vaunted relational integrity can be a problem because you can only represent constraints for cases where the appropriate response to a constraint violation is dropping the write on the floor. And of course you can't safely allow the ad-hoc querying that SQL is designed for.
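A minimal sketch of the event-sourcing idea in plain Java, assuming a hypothetical account domain: the list of events recorded from user actions is the "gold copy", and the current balance is derived by replaying it, so it can be discarded and rebuilt at any time.

```java
import java.util.List;

// Hypothetical events recorded from user actions; the log itself is the source of truth.
abstract class AccountEvent {
    abstract long apply(long balance);
}

final class Deposited extends AccountEvent {
    final long amount;
    Deposited(long amount) { this.amount = amount; }
    @Override long apply(long balance) { return balance + amount; }
}

final class Withdrawn extends AccountEvent {
    final long amount;
    Withdrawn(long amount) { this.amount = amount; }
    @Override long apply(long balance) { return balance - amount; }
}

final class AccountProjection {
    // Current state is a transient value derived by folding over the event log.
    static long balance(List<AccountEvent> log) {
        long balance = 0;
        for (AccountEvent e : log) balance = e.apply(balance);
        return balance;
    }

    // Because history is kept, any past state is just a replay of a prefix of the log.
    static long balanceAfter(List<AccountEvent> log, int eventCount) {
        return balance(log.subList(0, eventCount));
    }
}
```

In an RDBMS-centred design the balance column would be the thing you store and mutate; here it is only a cache of the log.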

I do think traditional RDBMS make some sense at the end of an ETL pipeline - where the secondary indices can be a big help for the ad-hoc querying/aggregation that you want to do in a reporting environment. But transactions don't make sense in that environment because it's essentially read-only (or at least single-writer), so you're still paying for a lot you're not using. I wouldn't use JPA for this, but I wouldn't really write code for this kind of environment at all - the point is to expose the data in a structured form for non-code tools.

Essentially I find mature systems outgrow SQL databases - the case where an RDBMS actually fits is the early stages where you want to run ad-hoc reports against your live datastore, you want to keep the current state of the world rather than worrying about history, having to manually fail over to a replica if master goes down is ok, updating all your indices synchronously is fine because write performance isn't an issue yet, and you can put constraints in the database because blowing up with an error page is an adequate response when the user breaks the business rules. Using JPA increases the rate at which you can iterate on the system, which is the priority for that kind of use case.

marcinzm · on April 12, 2021

Scala/Kotlin have some SQL abstraction libraries which have immutable data entities, actual constructors, etc. So I don't see why any of this has anything to do with SQL rather than the limitations of Java and Hibernate trying to force Java to do something it's not designed for.

lmm · on April 12, 2021

SQL abstraction libraries yes. ORMs, not really. You can map between an immutable datatype and the state of a row at a given point in time, but the only natural way to work with the native way that SQL databases express writes - in-place updates to rows - is with a model that represents them as in-place updates. In my experience those SQL abstraction libraries tend to be oriented towards either thinking in a purely command-oriented way (i.e. they're the equivalent of an IO monad) or using your database in an append-only log style (which is a much better way of storing data, but not what SQL databases are designed for).

eeperson · on April 12, 2021

> the only natural way to work with the native way that SQL databases express writes - in-place updates to rows - is with a model that represents them as in-place updates

I strongly disagree with this for a few reasons. I actually think the command-oriented way makes the most sense. The problem with in-place updates is that:

- Your queries aren't views. What you queried out of the DB is a snapshot of what is in there. It may already be different.
- The way you interact with a DB is command based. If you want to update a row, that is a command that may fail.

The only way I feel that using mutable data would make sense is if you had some sort of 2-way syncing. But those are notoriously difficult to get right even without network trips in the middle.
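A sketch of the command-oriented view in plain Java, assuming a hypothetical in-memory `Table` with a version column: a query returns an immutable snapshot, and the update is a command that carries the version the caller last read and reports whether it applied, much like checking the affected-row count of `UPDATE ... WHERE id = ? AND version = ?`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable snapshot of a row as it was when queried.
final class Row {
    final String name;
    final int version;
    Row(String name, int version) { this.name = name; this.version = version; }
}

final class Table {
    private final Map<Long, Row> rows = new HashMap<>();

    void insert(long id, String name) { rows.put(id, new Row(name, 0)); }

    // A query returns a snapshot; the table may change right after this returns.
    Row select(long id) { return rows.get(id); }

    // Command: succeeds only if nobody changed the row since the caller's snapshot.
    // Mirrors checking the affected-row count of
    //   UPDATE t SET name = ?, version = version + 1 WHERE id = ? AND version = ?
    boolean rename(long id, int expectedVersion, String newName) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;   // 0 rows affected: the caller must re-read and decide
        }
        rows.put(id, new Row(newName, current.version + 1));
        return true;        // 1 row affected
    }
}
```

Nothing here pretends the snapshot is live; a stale snapshot simply produces a rejected command.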

watwut · on April 12, 2021

I fully agree.

> Do you really think people who can't be bothered to learn and understand Hibernate properly are somehow going to take the time to learn and understand "vanilla SQL"? Why?

Yep, that is exactly what I think. People who know Hibernate also know SQL. People who don't bother to learn don't know either. And then there are people who know only SQL, because they worked on projects without Hibernate. They typically learn Hibernate fast.

krzyk · on April 12, 2021

> Do you really think people who can't be bothered to learn and understand Hibernate properly are somehow going to take the time to learn and understand "vanilla SQL"? Why?

Because JPA is an order of magnitude larger space to learn than just SQL.

Or more precisely: using Hibernate in simple (CRUD) applications will cause you issues very soon: missing sessions, some magical saving of data when you call a setter, filtering in streams (and now you are fetching too many fields).

Doing the same with SQL (using JDBC or something like JDBI) won't cause that for simple apps.

koolba · on April 12, 2021

> Mutable datastructures with default constructors and setters: yes, mutable entities suck. Unfortunately SQL is fundamentally built around mutable entities. Every field of your POJO is writable because every column of your database is writable. The impedance mismatch is big enough already without trying to make objects that behave differently from your database. Yes, an object shouldn't just be a datastructure with methods; unfortunately an SQL table row is just a datastructure, and classes are the only mechanism Java offers for representing such a thing. Same for mutable collections.

Every field is not writable in a database. There’s often constraints that restrict how fields can be updated. Requiring setters for all fields breaks that contract.

> Reflection (which is the reason classes must be non-final): again, sucks, again, the only way to do something like this in Java.

It’s the only way if you insist on having the entities track their own state changes via point in time snapshots. If they represent the changes themselves there’s no requirement for reflection.
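A sketch of the alternative described above, in plain Java with hypothetical names (`Customer`, `UpdateBuilder`): the entity keeps private fields and a real constructor, records each change as it happens through a domain method, and hands the persistence layer only the changed columns, so no reflective snapshot comparison is needed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entity that records each change as it happens.
final class Customer {
    private final long id;
    private String email;
    private final Map<String, Object> pendingChanges = new LinkedHashMap<>();

    Customer(long id, String email) {   // a real constructor; fields can stay private
        this.id = id;
        this.email = email;
    }

    long id() { return id; }
    String email() { return email; }

    // Domain method instead of a generic setter, so it can enforce its own rule.
    void changeEmail(String newEmail) {
        if (!newEmail.contains("@")) throw new IllegalArgumentException("invalid email");
        this.email = newEmail;
        pendingChanges.put("email", newEmail);
    }

    // The persistence layer asks for exactly the columns that changed, then they are cleared.
    Map<String, Object> drainChanges() {
        Map<String, Object> out = Collections.unmodifiableMap(new LinkedHashMap<>(pendingChanges));
        pendingChanges.clear();
        return out;
    }
}

final class UpdateBuilder {
    // Parameterised UPDATE for the changed columns only; column names are trusted here,
    // and callers are expected to skip the UPDATE entirely when nothing changed.
    static String sql(String table, Map<String, Object> changes) {
        StringBuilder sb = new StringBuilder("UPDATE ").append(table).append(" SET ");
        int i = 0;
        for (String column : changes.keySet()) {
            if (i++ > 0) sb.append(", ");
            sb.append(column).append(" = ?");
        }
        return sb.append(" WHERE id = ?").toString();
    }
}
```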

> Lazy loading is wonderful. Put your session in your view, write a normalised set of entities that actually model your domain, and get on with your life. Don't worry about the details of what loads when unless and until you have to.

In the real world the lazy loaded view get tested with one widget loading five wozzles, then in production it’s one widget loading 5000 wozzles, each performing a separate query to hydrate its state.
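A minimal plain-Java sketch of that difference, assuming a hypothetical `CountingDb` that counts round trips: hydrating each lazily loaded wozzle costs one query per row (1 + N in total), while fetching the collection up front costs one query no matter how many wozzles there are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data access layer that counts round trips to the database.
final class CountingDb {
    int queries = 0;
    private final Map<Long, String> wozzleState = new HashMap<>();
    private final Map<Long, List<Long>> wozzleIdsByWidget = new HashMap<>();

    void seed(long widgetId, int wozzleCount) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < wozzleCount; i++) {
            long id = widgetId * 1_000_000 + i;
            ids.add(id);
            wozzleState.put(id, "state-" + id);
        }
        wozzleIdsByWidget.put(widgetId, ids);
    }

    // SELECT id FROM wozzle WHERE widget_id = ?
    List<Long> wozzleIds(long widgetId) { queries++; return wozzleIdsByWidget.get(widgetId); }

    // SELECT * FROM wozzle WHERE id = ?   (what hydrating each lazy proxy costs)
    String loadWozzle(long id) { queries++; return wozzleState.get(id); }

    // SELECT * FROM wozzle WHERE widget_id = ?   (one round trip for the whole collection)
    List<String> loadWozzlesForWidget(long widgetId) {
        queries++;
        List<String> out = new ArrayList<>();
        for (long id : wozzleIdsByWidget.get(widgetId)) out.add(wozzleState.get(id));
        return out;
    }
}

final class WidgetView {
    // One-at-a-time hydration: 1 + N queries.
    static List<String> renderLazily(CountingDb db, long widgetId) {
        List<String> out = new ArrayList<>();
        for (long id : db.wozzleIds(widgetId)) out.add(db.loadWozzle(id));
        return out;
    }

    // Fetching the collection up front: 1 query regardless of N.
    static List<String> renderEagerly(CountingDb db, long widgetId) {
        return db.loadWozzlesForWidget(widgetId);
    }
}
```

With five wozzles in the test database the lazy version issues six queries, which nobody notices; with 5000 it issues 5001.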

> Those who don't understand Hibernate caching are doomed to reinvent it poorly.

Hibernate’s caching model manages to be both incredibly complex and incredibly limiting. Short of careful usage with an external store, it’s nearly impossible to scale across multiple JVMs. And even then it pushes the complexity to the cache.

> Don't use your database as an API. Yes, Hibernate/JPA needs to own your database. That's as it should be. SQL databases are way too complex to be shared between independent applications.

(Emphasis mine)

On the contrary, a well designed database can and will have many separate applications interacting with it. All the more reason to have the database reflect the true constraints of the system.

> Do you really think people who can't be bothered to learn and understand Hibernate properly are somehow going to take the time to learn and understand "vanilla SQL"? Why?

The complexity of learning SQL is grossly overstated. Plus it translates as a skill to all other languages and programming environments. On a purely economic basis it's a better choice for an individual to learn SQL over Hibernate/JPA.

> None of the problems listed here are problems of JPA/Hibernate. They're problems of SQL databases which are surfaced through JPA/Hibernate, but if you skip out on JPA/Hibernate you still get exactly the same problems (maybe in a slightly less recognisable form). The real solution is to stop using these overrated datastores, but if you must use them then JPA/Hibernate is the least-bad way of doing so.

I think the ultimate example of how bad JPA can be is JPQL. They managed to take the worst aspects of everything, and then not just add them together, but multiply them!

lmm · on April 12, 2021

> Every field is not writable in a database. There’s often constraints that restrict how fields can be updated.

There are, but there's no metamodel that exposes them. All you can do (in a generic/programmatic way) is attempt a write. A particular write may fail because of constraints, but this gives you no information (except in ad-hoc database-specific ways) about what kinds of writes might succeed.

> It’s the only way if you insist on having the entities track their own state changes via point in time snapshots. If they represent the changes themselves there’s no requirement for reflection.

Sure, but that creates a major impedance mismatch. In SQL databases the current state is first-class and changes are very much second-class.

> In the real world the lazy loaded view get tested with one widget loading five wozzles, then in production it’s one widget loading 5000 wozzles, each performing a separate query to hydrate its state.

Sure, and then you take 5 minutes and actually profile it, actually read a little bit of the hibernate documentation, and fix it. I find it strange how for most kinds of technology the accepted wisdom is that you should take a bit of time to understand its behaviour (indeed the article says as much regarding SQL), but for ORMs the conventional wisdom is that at the first sign of trouble you should throw the whole thing away.

> On the contrary, a well designed database can and will have many separate applications interacting with it.

It shouldn't. You are virtually guaranteed to get deadlocks (even in a single application deadlocking is easy - you need to have clear rules about which things can be locked in which order and all queries need to abide by them), updating your schema becomes essentially impossible because you can never know what's using a given column or constraint, you can't make use of temporary tables because it won't be clear what owns or is responsible for them, validation has to be done in the database which makes it very difficult to unit test or deploy (and good luck getting your test environment to look like your prod environment if you have to co-ordinate between n different applications)...
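The lock-ordering rule mentioned above is the standard way to avoid this kind of deadlock. A minimal in-process sketch, assuming hypothetical `Account` objects whose Java locks stand in for database row locks: every transfer locks the lower id first, so two opposite transfers running concurrently can never wait on each other in a cycle.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account rows guarded by in-process locks, standing in for row locks.
final class Account {
    final long id;
    long balance;
    final ReentrantLock lock = new ReentrantLock();
    Account(long id, long balance) { this.id = id; this.balance = balance; }
}

final class Transfers {
    // The rule every caller follows: always lock the lower id first.
    // Transfers A->B and B->A then acquire locks in the same order.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Inside one codebase this rule lives in one method; with n applications sharing the database, every one of their queries would have to honour it.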

> The complexity of learning SQL is grossly overstated.

The complexity of learning Hibernate/JPA is grossly overstated too. Seriously, it's not that hard if you actually try.

victor106 · on April 12, 2021

A lot of Hibernate hate in this thread. This really helped me, and I strongly recommend it. Vlad also writes really awesome posts on using Hibernate along with Spring, etc.

https://vladmihalcea.com/courses/

alserio · on April 12, 2021

Vlad Mihalcea is awesome! However, Hibernate, while helping in some use cases, should in my opinion not be the default solution one reaches for. It makes it way too easy to do things in a way that works for a while and explodes later. And it is way too easy to misunderstand, leading to performance and compositionality horrors. Still, it is the only Java solution I know that manages to be reasonably portable between different DBMSs.

victor106 · on April 11, 2021

Just because you're using Hibernate doesn't mean you have to use it for everything

—Gavin King, creator of Hibernate

Source:-

https://twitter.com/markuswinand/status/456827165938434048?s...

alserio · on April 12, 2021

"But then say goodbye to the consistency of your second level cache" - Gavin King, probably

karmakaze · on April 11, 2021

Hibernate was made to solve the problem of JavaEE enterprise bean persistence. It made sense 20 years ago. It is a mismatch for the API-oriented transactions we tend to write today.

amenghra · on April 12, 2021

I solve Hibernate problems by using jOOQ instead. The migration more than paid off the couple times I did it.

ChrisWreck · on April 12, 2021

How do you solve storing/updating/removing child entities of another entity (which also may be a child of another entity) effectively?

I use jOOQ as well, but mostly only for the reading part. For the above case, I still use Hibernate, as I've yet to find an efficient way to do it in jOOQ.

cryptos · on April 12, 2021

Even Lukas Eder, the creator of jOOQ, suggests using Hibernate for CRUD and jOOQ for querying: https://blog.jooq.org/2015/03/24/jooq-vs-hibernate-when-to-c...

amenghra · on April 13, 2021

My models have always been fairly simple with not too many levels of child entities. I contain all the logic for a given entity in a single class (where I hand write the code to update related entities). I do have to keep my fingers crossed that nobody will implement similar-but-wrong logic in some other spot in the codebase. The 1:1 mapping between entities and classes to CRUD helps to some degree.

paulryanrogers · on April 11, 2021

> This loop has to stop, what defines the value of the projects we are working on has nothing to do with technologies and frameworks.

> I WANT to solve business problems, I do not want to keep solving technical issues.

It is interesting how emotionally invested we become with our tools. Yet we don't have infinite time to become productive with all possible frameworks. So we have to specialize at least somewhat.

As to JPA itself, I agree that it's generally a bad fit for most uses. Circa 2010 I used it with Java Enterprise in the hope that a failed bean could be recovered by a parallel worker, but the technical costs were crazy high. And often I had to drop to raw SQL anyway. Less invasive ORMs can still be more generally useful, and remove some tedium.

islon · on April 12, 2021

This post reminds me of what happened some weeks ago at my job.

We use Clojure and yesql (a library where you write SQL queries directly in .sql files and the library generates functions for you). Our devops engineer noticed one of our queries was slow and taking too many DB resources. He sent us the query, which we promptly found, and then asked us if we could replace it with a more performant version he had just written.

We took his query, just replaced in our sql file and deployed. 10 minutes later he came back saying things look much better and the query update really helped.
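yesql itself is a Clojure library; the plain-Java sketch below only illustrates the idea, assuming the `-- name:` comment convention yesql uses to mark each query in a .sql file. `NamedQueries` is a hypothetical helper that turns such a file into a map from query name to SQL, which the application then calls by name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamedQueries {
    // Parses text of the form
    //   -- name: query-name
    //   SELECT ...
    // into a map from name to SQL. Other comment lines and blank lines are dropped,
    // and multi-line SQL is joined with single spaces.
    static Map<String, String> parse(String source) {
        Map<String, String> queries = new LinkedHashMap<>();
        String currentName = null;
        StringBuilder sql = new StringBuilder();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("-- name:")) {
                if (currentName != null) queries.put(currentName, sql.toString().trim());
                currentName = trimmed.substring("-- name:".length()).trim();
                sql.setLength(0);
            } else if (currentName != null && !trimmed.isEmpty() && !trimmed.startsWith("--")) {
                sql.append(trimmed).append(' ');
            }
        }
        if (currentName != null) queries.put(currentName, sql.toString().trim());
        return queries;
    }
}
```

Swapping in a faster query is then a text edit to the .sql file; the name the application calls stays the same, so no mapping code has to change.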

I wonder how long it would have taken if we were still using JPA/Hibernate, as we did many years ago.

KamBha · on April 11, 2021

I find this argument a little odd:-

"Frameworks do not keep retro-compatibility"

I would argue that is a good thing and in the case of Spring, doesn't feel true. In part, Spring is so hard to learn because it has too much support for legacy stuff (though this may have changed in the last few years).

The only real problem with frameworks not keeping retro-compatibility is that the JVM doesn't keep it either. I recently tried to upgrade an application to the latest version of Java (I think it was Java 12) and found that date formatters were completely changed to match an ISO standard, which broke a lot of code. We also hit a similar problem with the Java 8 transition, which forced huge upgrades to our libraries.

I am of the view that libraries and frameworks should not have to support retro-compatibility, but the language should. That way, if you don't want to upgrade your library, you don't have to.

Using the web as an example, old AngularJS applications still work today as they did back when they were first written. I doubt applications written for the first version of Spring will work with the latest version of the JVM.

api · on April 11, 2021

From my experience ORMs are good time savers when you have relatively simple query needs, but fall down when things get really complex.

The_rationalist · on April 11, 2021

Nothing prevents you from using raw SQL queries with Hibernate. HQL is just a convenient superset.

karmakaze · on April 11, 2021

Actually mixing raw queries with JPQL/Hibernate queries is the worst of all worlds. To get it right, you will end up making explicit calls to let the EntityManager know what you want each side to be doing to play nice.

dopidopHN · on April 12, 2021

I've seen that a lot in the past 10 years, and I don't find it particularly problematic.

Often the fucked-up native query ends up in a different type of DAO for performance/concurrency reasons anyway.

Don't get me wrong, I dislike JPA and Hibernate very much, but that particular aspect has often been one of the less painful ones to deal with.

The_rationalist · on April 11, 2021

I've never hit such a bug. Can you expand on what added complexity (and when) a raw query would have in Hibernate versus a raw query without Hibernate?

karmakaze · on April 12, 2021

It's not a bug, it's by design.

Daishiman · on April 11, 2021

"Really complex" means getting down to 6 or 7 joins for your typical SQLAlchemy or Django ORM query. These sorts of queries comprise 5% of my queries, tops.

Seems like a fair tradeoff.

dopidopHN · on April 12, 2021

Exactly, with hibernate if something is too hairy, I use the native query system (and hibernate still maps the result for me).

tarkin2 · on April 11, 2021

I have one of those queries. It's the most important in the database sadly. As django bloats and bloats the table space, over the years, more and more do I need those queries, and slower and slower becomes my app.

Daishiman · on April 12, 2021

The mapping of Django models to tables is trivial. Django's not introducing a difficulty, though the entity modeling may be.

tarkin2 · on April 11, 2021

And time. Let's not forget time.

Years of Django's ORM have given our app over a hundred tables. Complex and /fast/ queries are next to impossible.

Django's ORM made app development lightning quick for the first developers. And impossible for the ones five years later.

FridgeSeal · on April 11, 2021

ORM’s and other frameworks should come with an “eject” command that once you run it, gives you all the raw SQL/generated config/etc etc.

stefan_ · on April 11, 2021

Where "simple" already excludes pagination, since all those frameworks generate OFFSET queries that cause the equivalent of a full table scan when someone clicks "last page".
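A rough in-memory illustration, assuming a hypothetical `Pager` backed by a `TreeMap` that stands in for an ordered index: an OFFSET page has to walk past every skipped row, while keyset (seek) pagination, `WHERE id > ? ORDER BY id LIMIT ?`, jumps to the last id seen and reads only the rows it returns. Real query planners differ in the details; the row counts come from this toy model only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical table held as an ordered index (id -> row), standing in for a B-tree.
final class Pager {
    final NavigableMap<Long, String> index = new TreeMap<>();
    long rowsVisited = 0;

    // ORDER BY id LIMIT ? OFFSET ?: every skipped row is still walked over.
    List<String> offsetPage(int offset, int limit) {
        List<String> page = new ArrayList<>();
        int position = 0;
        for (String row : index.values()) {
            rowsVisited++;
            if (position++ < offset) continue;
            page.add(row);
            if (page.size() == limit) break;
        }
        return page;
    }

    // WHERE id > ? ORDER BY id LIMIT ?: seek past the last id seen
    // (the O(log n) seek itself is not counted), then read only the page.
    List<String> keysetPage(long afterId, int limit) {
        List<String> page = new ArrayList<>();
        for (String row : index.tailMap(afterId, false).values()) {
            rowsVisited++;
            page.add(row);
            if (page.size() == limit) break;
        }
        return page;
    }
}
```

The trade-off is that keyset pagination needs the last id of the previous page, so "jump to page 9999" has to be expressed differently.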

Uehreka · on April 12, 2021

Sorry for being off-topic, but why is Hibernate called Hibernate?

I can't find an explanation on the Wikipedia or GitHub pages, and whenever I see the name come up on HN my brain does a weird double-take as it goes "Is this an OS hibernation tool--No, it's the Java database thing... does it make objects go to sleep?"

dirkt · on April 12, 2021

Yes, it makes your objects go to sleep.

The use case is "you have a collection of objects that reference each other in memory. You want to say: Bam, put that structure like it is into a database ("put it to sleep"), and retrieve it again for me if I need it ("wake it up after winter is gone")."

For this type of ORM Hibernate works fine. If you want to use a database as a database, it resists at every turn.

nwatson · on April 12, 2021

Probably a reference to how JPA-Hibernate lets you fatten up your creatures (in memory) and then put them to sleep (in the RDBMS) until they're ready to play again.

watwut · on April 12, 2021

> Nobody understands the cache mechanism, you end up de-activating it. Worse, those understanding it are not caching query responses, they are caching entities.

I feel like the hiring process should filter these kinds of developers out. At least most of them.

I also think that if they do get in, internal processes should limit these developers' impact and their ability to make decisions. As in, their careers should stagnate and they should not be gaining influence.

harryvederci · on April 11, 2021

The only positive experience I've had with it was in a small application with a not-so-complex DB which was created entirely through Liquibase.

In large enterprise projects, I've always had to create custom SQL statements at some point, at which point I'd rather do everything in SQL. Otherwise, you have to know JPA/Hibernate and SQL, which (in my opinion) defeats the purpose.

kgeist · on April 12, 2021

Interesting, as far as I know, PHP's Doctrine was influenced by Hibernate but it doesn't require parameterless constructors, and data is hydrated directly as object fields, without requiring getters/setters. What's the reason for enforcing such requirements in Hibernate?

watwut · on April 12, 2021

It is a JPA requirement. In Java, if you are creating a new object, you have to call some constructor. If you don't have a no-parameter constructor, the framework would have to somehow call a constructor with the right parameters. The data are filled directly into the fields, but the no-parameter constructor is needed so that the object can be created in the first place.

Now, in Hibernate itself, the no-parameter constructor is not needed. Hibernate supports an "interception mechanism" which allows you to use whatever constructor you like. But that is not part of JPA standards and for most it is simpler to just go standard way.
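Roughly what a reflection-based mapper has to do, as a plain-Java sketch rather than Hibernate's actual code: it cannot know which constructor arguments to pass, so it calls the no-argument constructor and then writes each column value straight into the field of the same name. The `Person` class, the column map and `ReflectiveHydrator` are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical entity: private fields and no setters, but a no-arg constructor.
class Person {
    private String name;
    private int age;
    protected Person() { }       // JPA requires a public or protected no-arg constructor
    String name() { return name; }
    int age() { return age; }
}

final class ReflectiveHydrator {
    // Instantiate via the no-arg constructor, then fill fields directly by name.
    static <T> T hydrate(Class<T> type, Map<String, Object> columns) throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        T instance = ctor.newInstance();
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            Field field = type.getDeclaredField(column.getKey());
            field.setAccessible(true);
            field.set(instance, column.getValue());   // unboxes Integer -> int for primitive fields
        }
        return instance;
    }
}
```

Without the no-arg constructor, `getDeclaredConstructor()` would throw `NoSuchMethodException`, and the mapper would need some other way to decide which constructor to call and in what argument order.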

xupybd · on April 12, 2021

I have been using Entity Framework recently. I've not had any past experience where I have liked an ORM. This one seems alright.

I do wonder if I will fall into the same trap as complexity grows. But the migrations are just amazing so far. For them alone I want to stick with it.

jillesvangurp · on April 12, 2021

There are lots of valid reasons to avoid JPA/Hibernate beyond those listed in the article.

- Hibernate uses blocking IO and threaded database pools. This makes it a problem if you want to use non-blocking web frameworks like Spring's WebFlux. You should consider it a legacy technology for this reason alone. There's a good reason why there is no drop-in reactive replacement: modern frameworks are trying not to repeat some of the design problems with hibernate (see the article for an overview of those).

- It comes with its own category of hard-to-diagnose and hard-to-fix bugs. I've been on more than one project where I had to clean up other people's messy transactional logic. One symptom is people copy-pasting @Transactional everywhere as if it were some kind of magical incantation that says "dear db gods please just make this work consistently". A second symptom is flaky tests where that clearly is not working as advertised. A lot of this relates to things like aspect-oriented programming and reflection, which are what hibernate uses to generate byte code at run time.

- For the same reason, hibernate is also a problem if you want to natively compile your code via e.g. Graal. Reflection and byte code generation are problematic for that. The less you have of that, the better.

- For the same reason, hibernate is also inappropriate if you need fast startup times (which would be why you'd consider native compilation), for example because you are doing serverless designs or edge computing. Having 10-15 seconds of startup overhead is not great. Warming strategies can mitigate this somewhat. But honestly, there's no good reason for Java servers to take much longer to start than it takes the JVM to start (which is still around a second or so). As soon as you get rid of hibernate, Spring Boot startup times become a lot more reasonable. If you then switch to using its bean DSL (as opposed to scanning packages for annotations), it gets better still. Most of what Spring does at startup is millions (literally) of calls into various reflective methods. That's why it takes so long.

- Object-relational impedance mismatch. Designs that need an ORM layer might not be that optimal. I've been on multiple teams where people got carried away a little too much with e.g. overusing inheritance and coming up with complex solutions to make the database mirror the class hierarchy. The result is dozens of tables and dozens of joins on read. I've seen GET operations that had 1500ms response times because of this. It's stupid. It's stupid even after you fix all the silly joins, missing database indices, etc. You can do good database design with hibernate of course. If you understand how to do that, hibernate is just another tool and not a particularly critical or important one. I've removed it on a few projects to simplify the design.

- These days it is valid to treat databases as document stores. Once you refactor a 15 table database to be the 3 tables that you really needed all along, most of Hibernate is just not needed. My golden rule is that if I don't query on it, I don't need (or want) separate tables or columns for it. Nothing wrong with storing some json blobs. I love using databases because they are fast, transactional, and come with some strong consistency guarantees. Hibernate is not a great fit for document databases. It assumes your domain consists of columns and tables and it wants to do clever things with joins to make that seem like an object tree. The best join is the one you don't need. That's why document stores can be so nice.

If I had to do a greenfield project, I'd probably go for R2DBC with Spring or maybe one of several other Kotlin reactive database frameworks in combination with ktor, http4k or one of the other emerging Kotlin server frameworks. All my recent projects are using Spring WebFlux and Kotlin coroutines in any case. So using something non-blocking is a hard requirement for me.

But if I had to use hibernate, using Kotlin is the way to do it. It shovels most of the ugliness under the carpet via compiler plugins. So you can use nice immutable data classes and let the kotlin compiler worry about adding default constructors, opening the class and adding getter and setter cruft just so hibernate can do its runtime magic. Also it removes all of the need for hacky things like Lombok and its gazillions of additional annotations. Hibernate can be a lot less painful if you just do it properly. But not using it is better still.

nnanda · on April 12, 2021

JPA madness is still ON and that is unfortunate. Spring framework guys at Pivotal are designing all kinds of database access (SQL, NoSQL) around JPA interfaces.

cryptos · on April 12, 2021

A default constructor must be present, yes, but if you use Kotlin, a compiler plugin can generate this constructor so that it is not visible in your code base (though it would be visible if another module used the compiled byte code).

JPA/Hibernate does not require getters and setters. Fields have been usable directly for ages (more than 10 years).

The arguments against reflection are pseudo-arguments, because the developer doesn't use reflection directly where ordinary language features would be better. Instead, the framework uses reflection to free the developer from writing boilerplate code - completely different things!

The claim that it would be impossible to return unmodifiable collections because one would be forced to provide getters and setters is wrong. If Hibernate uses the fields directly, it is no problem to expose only an unmodifiable view of a collection to the outside.
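A minimal sketch of the field-access point, assuming a hypothetical `Order` entity: the collection is only mutated through a domain method, callers get `Collections.unmodifiableList`, and code that uses field access can still populate the underlying list. `loadLine` is an illustrative stand-in for what a provider does via reflection, not a Hibernate API.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FieldAccessEntity {

    // Hypothetical entity mapped with field access: no setter for `lines`.
    static class Order {
        private List<String> lines = new ArrayList<>();

        // Domain method: the only sanctioned way for callers to add a line.
        void addLine(String sku) {
            if (sku == null || sku.isBlank()) {
                throw new IllegalArgumentException("sku required");
            }
            lines.add(sku);
        }

        // Callers only ever get a read-only view of the collection.
        List<String> getLines() {
            return Collections.unmodifiableList(lines);
        }
    }

    // Illustrative stand-in for a provider populating state through the field.
    static void loadLine(Order order, String sku) {
        try {
            Field field = Order.class.getDeclaredField("lines");
            field.setAccessible(true);
            @SuppressWarnings("unchecked")
            List<String> raw = (List<String>) field.get(order);
            raw.add(sku);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Order order = new Order();
        order.addLine("ABC-1");
        loadLine(order, "XYZ-9");
        System.out.println(order.getLines()); // prints "[ABC-1, XYZ-9]"

        try {
            order.getLines().add("nope");
        } catch (UnsupportedOperationException e) {
            System.out.println("callers cannot modify the view");
        }
    }
}
```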

I don't see a general problem with lazy loading. You have to understand what you're doing, but that should be the case with every framework. Hibernate is quite flexible with eager and lazy loading, and it is possible to specify an entity graph per use case to tell Hibernate what to load eagerly and what not. Another possibility is to use dedicated query objects (ordinary entity classes mapped to the same table, but designed for a certain use case).

The section about "Accessing a single table field" ignores the fact that the object mapped with Hibernate could also only use the needed fields. You could even select a single value with JPQL without any domain class involved. This argument is kind of dumb and wrong.

"Constraints" with Bean Validation are not a core part of Hibernate and shouldn't really be in this article. But, as they are already there, let me say that the criticized "too late" enforcement of the specified constraints is only the default behavior (because Hibernate can not know when to perform the check). Nothing prevents you from performing these checks explicitly with the means of the Bean Validation API. But apart from that, I'd prefer to write this kind of rules as usual code, too.

"Framework updates are horrible": I never had a severe problem switching to a new Hibernate version. Same is true for Spring (but wait, why does this article mention Spring at all?).

"Rename your JPA Repositories to JPA DAOs": The author doesn't understand the difference between a DAO and a Repository. A repository is at the same level as an entity or value object, it _is_ a domain concept. BUT the implementation of the repository interface is indeed part of the infrastructure, but this implementation would reside in another package. There is no need to name "repositories" always "repository" - if you want to name the class "BankAccounts" then do so! By the way: the class name "BankAccountsJPAImpl" is a crime!

The advice to generate IDs in your own implementation is kind of silly. "if an id is generated, it should be done knowingly" Why? Usually an ID is just a number with no special meaning other than being unique per entity type. Even if you decided to drop relational databases and use advanced CSV files instead, the sequence number generation could easily be implemented then. And why on earth should one want to test the sequence number generation? I trust every solid database to be able to generate perfectly fine sequence numbers!

The section "Stop adding multi-directional association" has nothing to do with Hibernate specifically, because that is some kind of general design consideration.

One big question remains unanswered by this article: What is the alternative? The author implicitly suggests that it would be SQL. But things are not so simple. If you want to have a domain model (and not a mixture of domain logic and technical infrastructure), you not only have to load data from the database and create objects from it (the easy part), but you also need to find out what data needs to be written to the database after business operations were performed. I'm not aware of a simple solution here (but let me know ...). There are other things like optimistic locking where Hibernate simplifies things a lot.
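For readers unfamiliar with the "what needs to be written" problem, here is a tiny sketch of snapshot-based dirty checking: remember each object's state when it is loaded, then diff at flush time to find the changed columns. This is roughly the idea behind Hibernate's default dirty checking, but `UnitOfWork`, `Account`, and `state()` are hypothetical, and a real implementation also handles inserts, deletes, associations, and optimistic-lock versions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class SnapshotDirtyCheck {

    // Hypothetical domain object: plain fields, no persistence code.
    static class Account {
        final long id;
        String owner;
        long balanceCents;

        Account(long id, String owner, long balanceCents) {
            this.id = id;
            this.owner = owner;
            this.balanceCents = balanceCents;
        }

        // Flattened column -> value view; a real mapper derives this from metadata.
        Map<String, Object> state() {
            Map<String, Object> s = new LinkedHashMap<>();
            s.put("owner", owner);
            s.put("balance_cents", balanceCents);
            return s;
        }
    }

    // Minimal unit of work: snapshot on load, diff on flush.
    static class UnitOfWork {
        private final Map<Long, Map<String, Object>> snapshots = new HashMap<>();

        Account track(Account account) {
            snapshots.put(account.id, account.state()); // copy taken at load time
            return account;
        }

        // Returns only the columns whose values differ from the snapshot.
        Map<String, Object> dirtyColumns(Account account) {
            Map<String, Object> before = snapshots.get(account.id);
            if (before == null) {
                throw new IllegalStateException("account " + account.id + " is not tracked");
            }
            Map<String, Object> changed = new LinkedHashMap<>();
            account.state().forEach((column, value) -> {
                if (!Objects.equals(before.get(column), value)) {
                    changed.put(column, value);
                }
            });
            return changed;
        }
    }

    public static void main(String[] args) {
        UnitOfWork uow = new UnitOfWork();
        Account acct = uow.track(new Account(7, "Ada", 10_000));
        acct.balanceCents -= 2_500;                 // the business operation
        System.out.println(uow.dirtyColumns(acct)); // prints "{balance_cents=7500}"
        // A flush would then issue roughly: UPDATE account SET balance_cents = ? WHERE id = ?
    }
}
```

The comparison uses `Objects.equals`, so this sketch is only reliable for immutable column values such as `String` and boxed numbers.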

All in all the article is not well balanced and flawed.

Areading314 · on April 11, 2021

What is the argument in favor of server-side Java in 2021? It seems like alternatives like Go, Python, or even JS are far ahead at this point

the_af · on April 11, 2021

Far ahead in what sense?

In my previous job we used Java and the benefits were a huge ecosystem of libs and tools, plenty of monitoring tools, a ton of expertise and people who understood its memory model and quirks and were able to troubleshoot production problems. The JVM is rock solid and has great performance for backend with lots of transactions.

I'm not experienced with Go, Python doesn't seem particularly suitable, and server-side javascript looks like a nightmare to me.

NodeJS seems crippled to me.

paulryanrogers · on April 11, 2021

One benefit I've been seeing with server-side JS is from FaaS like Lambda, which can serve unpredictable loads very inexpensively.

eeperson · on April 11, 2021

How is that a benefit specific to JS? Don't those platforms generally support Java as well?

cle · on April 11, 2021

They do, but the Java ecosystem takes a hit here, with many libraries being slow to initialize. The longer it takes to start up, the longer it takes to absorb the new load, to the point where you either keep some headroom (IOW, waste $$$), or try to predict load (complex, also wastes $$$).

There are attempts to fix this with e.g. Graal, which effectively does the expensive initialization and reflection at compile time, but there are so many downsides and pitfalls right now with Graal that I don't consider it a serious solution to the problem. It's basically creating a new ecosystem, which means one primary motivation--to take advantage of the Java ecosystem--is much less compelling.

This isn't constrained to the public ecosystem; lots of companies have their own internal libraries, and Java makes it very easy and even encourages doing lots of expensive things at startup, like classpath scanning and pre-caching things. For a long time, Java made explicit decisions to de-prioritize startup time to improve maintainability (reflection/scanning/dynamic classloading) and runtime performance (e.g. JIT). This tradeoff doesn't work out so well in a world of ephemeral processes that come and go as demand changes.

I guess what's specific to JS, Go, Python, et al is the cultural emphasis on fast startup. Interestingly these all come from different constraints but the net result is that, in general, you can go from cold to serving traffic much faster than with Java, with a lot less effort.

eeperson · on April 11, 2021

> There are attempts to fix this with e.g. Graal, which effectively does the expensive initialization and reflection at compile time, but there are so many downsides and pitfalls right now with Graal that I don't consider it a serious solution to the problem. It's basically creating a new ecosystem, which means one primary motivation--to take advantage of the Java ecosystem--is much less compelling.

I'm not sure I understand what the downsides are for Graal and FaaS. There are some pitfalls around reflection, but even those don't seem too hard to avoid. Is that what you are referring to?

cle · on April 12, 2021

Graal has incomplete support for Java. Last time I tried a few months ago, ObjectOutputStream was not supported (I think it is now?), which was a critical part of a library I was using, forcing me to abandon Graal.

Dealing with reflection is pretty awful, you have to comb through your entire dependency graph.

Targeting the compiler for a different OS/architecture than the host is difficult, to say the least.

There are so many great languages and runtimes to pick from these days...I can't imagine why anyone would willingly choose a technology like that.

(Don't get me wrong, Graal is a fascinating technology, and I hope that one day it is able to seamlessly compile Java code to native executables...but it's not there yet.)

alserio · on April 12, 2021

Not gp, but avoiding the reflection pitfalls is not so straightforward at all: you have to chase what your dependencies are doing to satisfy the "closed world" assumption. I still like Java more than JavaScript, but writing performant serverless functions using JS is much easier right now.

eeperson · on April 12, 2021

You don't have to chase your dependencies. You just need an adequate suite of tests and then run them with the Graal Tracing Agent enabled [1].

[1] - https://medium.com/graalvm/introducing-the-tracing-agent-sim...

cle · on April 12, 2021

Your tests would need to exercise all your transitive dependencies' code paths that are used in production. I very rarely see tests written that way.

AaronFriel · on April 11, 2021

The JVM is rock solid, and recent improvements in garbage collection have reduced tail latencies dramatically.

But I find that most people who say that it has "great performance" have not built a parallel implementation in JS, Go, Rust, or even .NET Core. I'm omitting memory unsafe languages by default and Python here, because I think that's the domain Java competes in, and Python lacks the investment these other languages have in performance.

The lack of value types and the amount of pointer chasing that JVM languages do as a result, the way generics are implemented via type erasure (which the JIT then has to re-optimize), and so on usually mean that CPU and memory usage for the same throughput is much higher than a competing implementation in a different language. And on older JVMs, tail latency will be orders - plural - of magnitude worse.
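A small plain-Java illustration of two of the points above, boxing and type erasure. It does not measure speed or memory; it only shows that an `int[]` holds the values themselves while a `List<Integer>` holds references to separate `Integer` objects, and that `List<String>` and `List<Integer>` share one runtime class after erasure. Class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureAndBoxing {

    // Sums a primitive array: the ints are stored inline in the array.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) total += v;
        return total;
    }

    // Sums a List<Integer>: the list stores references, and each element is a
    // separate Integer object that must be dereferenced and unboxed.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer v : values) total += v; // auto-unboxing on each read
        return total;
    }

    // After erasure both lists share one runtime class; the type argument is gone.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        int n = 1_000;
        int[] primitives = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitives[i] = i;
            boxed.add(i); // autoboxing via Integer.valueOf(i)
        }
        System.out.println(sumPrimitive(primitives) == sumBoxed(boxed)); // true
        System.out.println(sameRuntimeClass());                          // true
    }
}
```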

It is absolutely true though that for most workloads that efficiency isn't necessary and the ability to reuse that ecosystem reduces time and cost to develop. But it's just definitely not true that Java has "great performance" and I don't think that's ever really been true.

ssijak · on April 11, 2021

TechEmpower Web Framework Benchmarks would like to disagree with you.

AaronFriel · on April 11, 2021

I don't think those are particularly realistic workloads, as they don't involve substantial amounts of working with in-memory data. Which of the TechEmpower benchmarks uses an ORM?

Before I finished drafting my comment I did have a sentence like this, which I removed, "Barring obscene amounts of optimization", so yes, some web servers like netty and jetty have gotten to a level of good performance in terms of handling plain requests.

But most line-of-business backends are not using plain jetty/netty, and they aren't just responding to every request with the same "SELECT" query to a backend database. They're doing computation, and they're storing intermediate data in data structures like ArrayList, TreeMap, etc.

And then of course due to business requirements, they often have to implement some in-process caching, and suddenly the lightweight Java application is a bloated multi-gigabyte CPU consuming monster.

I just don't see that happening often with more memory and cache friendly languages like Go, Rust, Swift, or even JavaScript on V8/Node.js.

thu2111 · on April 11, 2021

JavaScript is hardly cache friendly.

As for Go, that's a language which doubles up the size of every pointer and doesn't even use a moving GC, and last time I looked, the quality of machine code it generated was atrocious. It's not really cache/hardware friendly to do those things. Value types are I suspect being over-estimated here: when Java gets them I am expecting disappointment when they don't magically make everything twice as fast.

jacques_chester · on April 11, 2021

> Which of the TechEmpower benchmarks uses an ORM?

There are two: single query and multiple query.

AaronFriel · on April 11, 2021

Sorry, I should have been more precise. I am very, very familiar with the TechEmpower benchmarks and I first learned Java around SE 5, right after they switched from 1.x numbering. Please don't mistake me for someone who just learned about Go or Rust and is evangelizing them because I think they're the cool new thing.

Which of the Java implementations for the TechEmpower benchmarks use an ORM? Are they representative of the kind of code you would write? I think that the TechEmpower benchmarks suffer from many of the same problems the language benchmarks game benchmarks do - micro-optimization, unrealistic workloads.

My experience tells me that you can get any sufficient level of performance in almost any language, but that you are going to pay for some languages more in opex than others, particularly in memory usage. It takes more compute spend for a workload written in Java than one written in Go, all other things being equal. That's not to say Java is a bad language, but it does lack many features - some of them being intentional design decisions - which make it less cost effective to operate systems built on Java. However, we know that a significant cost is the cost to develop, so it's hard for me to say Java is a bad language for that reason either.

And memory usage is generally a good predictor of density in terms of scheduling workloads, be it Tomcat servers (back in the day) or VMs or containers these days. I also think that Java suffers, performance wise, from boxing values and pointer chasing / poor cache locality. The default container implementations are just, well, it would be polite to simply say that they're as good as the language allows.

However, data is better than claimed experience, no?

I just opened up the raw benchmark stats[1] for the database updates route. It's one that my favorite languages don't do well in, but I was curious about the operational overhead of running them in memory usage, something I've mentioned quite a lot up above.

I looked at a vertx-postgresql benchmark for the TechEmpower "updates" test. This is a high-performing implementation without an ORM[2].

I also looked at quarkus + reactive routes + hibernate, which appears to use hibernate, applicable to the original post[3].

And lastly, I looked at actix diesel, another ORM using implementation[4].

    java quarkus-hibernate:  3.9GiB memory (peak, start of test)
    java quarkus-hibernate:  3.1GiB memory (lowest value, near end of test)
    java vertx-postgres:     2.35GiB memory (consistent)

    rust actix-diesel:       1.2GiB memory

Standard deviation was:

    java quarkus-hibernate:  221.4MiB
    java vertx-postgres:       1.9MiB
    rust actix-diesel:         0.5MiB

I included the steady state for quarkus because its memory usage (perhaps due to a config flag starting it with a 4GiB heap?) started out extremely high and decreased over the course of the run. That likely affects the standard deviation, which I included to highlight that I didn't try to cherry-pick results.

Perhaps the funniest thing to me digging into it is, again due to the absurdity of Java's design decisions, to make sure that "Integer" objects are efficient, the Java benchmarks use the command line parameter "-Djava.lang.Integer.IntegerCache.high=10000". This tells you that if the benchmark used a wider range of random values[5], performance would degrade. Have you ever heard of a language requiring an integer cache? It's absurd to me that Java, rather than implement value types, requires Integers to be interned for performance.
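To clarify what that flag affects: boxing an `int` goes through `Integer.valueOf`, and the JLS requires boxing of values from -128 to 127 to yield a shared cached instance. Raising the upper bound, as the benchmark does with `-Djava.lang.Integer.IntegerCache.high=10000`, lets boxing of larger values reuse cached objects instead of allocating new ones. A minimal sketch (the class name is illustrative):

```java
public class IntegerCacheDemo {

    // Autoboxing an int calls Integer.valueOf. The JLS requires values in
    // -128..127 to box to a shared cached instance; outside that range the JVM
    // may allocate a new Integer each time unless the cache bound was raised.
    static boolean sameInstance(int value) {
        Integer a = value;
        Integer b = value;
        return a == b; // reference identity, not numeric equality
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));    // true on every JVM
        System.out.println(sameInstance(10_000)); // typically false with default settings

        Integer x = 10_000;
        Integer y = 10_000;
        System.out.println(x.equals(y));          // true: value equality never depends on the cache
    }
}
```

So the flag changes identity and allocation behavior for boxed values up to 10,000, not the results of the computation.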

Are there any other languages in the TechEmpower benchmark or the Debian benchmark game (formerly went by another name) that require setting an "IntegerCache" to optimize... allocating integers? I mean, come on. You can't tell me this is a language that was designed for performance when integers can't be stored directly in generic collections and instead have to be autoboxed, and a cache is needed to intern them!

I will say one final thing: cost to operate/memory efficiency is just one metric for measuring languages. I think that Java is actually a pretty bad language for a lot of reasons, but path dependence has produced an extremely rich ecosystem that gives developers a lot of flexibility and a lot of tools to use when writing it. I think Kotlin, Scala, and even Clojure are by far more pleasurable languages to write in, though the JVM still holds them back for all the reasons above.

[1] Raw results from https://tfb-status.techempower.com/unzip/results.2021-01-13-...

[2] You can see they have simply hardcoded the SQL. See: https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[3] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[4] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[5] The update benchmark only requires random numbers between 1 and 10,000. Performance of Java apps would degrade if they were asked to use boxed integers greater than 10,000, which is possibly the most absurd statement I have said of any programming language ever. See: https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Proj...

igouy · on April 14, 2021

> Are there any other languages in the TechEmpower benchmark or the Debian benchmark game (formerly went by another name) that requires setting an "IntegerCache" to optimize... allocating integers?

afaict Java programs shown on the benchmarks game website do not.

jacques_chester · on April 11, 2021

> I think that the TechEmpower benchmarks suffer from many of the same problems the language benchmarks game benchmarks do - micro-optimization, unrealistic workloads. ... It takes more compute spend for a workload written in Java than one written in Go, all other things being equal.

Well which is it, then? You say Java is slower, the benchmarks say otherwise. What other benchmark would you accept?

I hate autoboxing as much as the next numerical processor, and I avoid JPA whenever I can, but that doesn't change that Java is plenty fast on a variety of workloads. It's typically only beaten by the hardest of the hardcore Rust and C++ implementations.

AaronFriel · on April 12, 2021

I think that if all you're doing is serving the results of plain SQL queries, which is what the TechEmpower benchmarks are, then it's really hard to pick a bad language. Almost every language is capable of tens of thousands of requests per second. Even Ruby, a language we haven't brought up and is dreadfully slow, will do thousands of requests per second with Rails. Beautiful language, abysmal performance (relative to what's possible).

Once you're doing non-trivial things on Java, and I've outlined what those things are in my previous comments and they primarily revolve around memory, wall clock CPU time correspondingly increases as your program spends more time chasing pointers on the heap, poor cache locality, lack of value types, poor monomorphization of generics (until the JIT kicks in), and so on. These things all add up.

I'm not saying it's impossible for Java to be fast, after all, if you just store everything in a "private final double[]" like most of the Benchmarks Game implementations do, sure, the JIT will do wonders for you. But that isn't real world Java, is it?

Real-world Java web servers do more than just respond to epoll_wait(2) events on a loop by sending some bytes to a database, getting them back, and sending them straight back to the client. There's usually more serialization, more authentication, more logging, more metric exporting, more middleware doing one thing or another.

One last thing: GraalVM is the most exciting thing to happen to Java performance since NIO and the newer garbage collectors aiming for sub-ms stop the world times. Quarkus, which I googled over the course of writing my comments, is by far the most interesting new tool I saw for shipping Java in production efficiently by building on GraalVM to deliver web servers in megabytes, not gigabytes of resident memory: https://quarkus.io/

It's a shame Quarkus in the benchmark I saw used so much more memory. It looks like it should be possible to fix that.

TheRealDunkirk · on April 12, 2021

> Even Ruby, a language we haven't brought up and is dreadfully slow, will do thousands of requests per second with Rails. Beautiful language, abysmal performance (relative to what's possible).

It's precisely this terrible performance that makes ActiveRecord so much easier to program and better to use than Hibernate (or EntityFramework). I've written over a dozen Rails apps over the past 15 years, and its "abysmal" performance has never been a problem for me. For my problem space(s), I'd make that tradeoff every day of the week, and twice on Sunday.

jacques_chester · on April 12, 2021

I agree with enthusiasm for Graal. Related to it are efforts like Project Valhalla and Project Panama, which will continue to do a lot for the current drags on performance. For example, by adding value types.

buster · on April 11, 2021

A huge developer base and ecosystem? What, specifically, can't be achieved in Java that you think Node is needed for?

Scarbutt · on April 11, 2021

Rendering Javascript.

SahAssar · on April 11, 2021

There is https://en.wikipedia.org/wiki/Rhino_(JavaScript_engine) and https://en.wikipedia.org/wiki/GraalVM. The latter of those is the second fastest js server runtime (es4x) according to techempower benchmarks: https://www.techempower.com/benchmarks/#section=data-r20&hw=...

I haven't tried any of those, but saying the JVM can't run JS is not true.

Scarbutt · on April 11, 2021

GraalVM is not a drop-in replacement for the JVM and Rhino is slow as molasses compared to V8.

vips7L · on April 11, 2021

> GraalVM is not a drop-in replacement for the JVM

As far as I understand, yes it is. GraalVM is just HotSpot with the graal jit compiler instead of C2.

rakoo · on April 11, 2021

It works? I'd argue that 90% of problems solved by SaaS today could be solved by any of the usual suspects of languages. What matters more is the architecture you choose. Choice of language is mostly a convenience choice, based on how comfortable you will be editing software in that language (that includes not only your proficiency, but also the availability of libraries and frameworks to help you).

rjsw · on April 11, 2021

Java has support for the schema definitions that I need, the other languages you list don't. I am happily using JPA/Hibernate too.

ssijak · on April 11, 2021

What is the best argument against it? And the whole JVM platform in general (Kotlin, Scala, Clojure...)

AzzieElbab · on April 11, 2021

I can’t speak for others, but Scala has two of the best SQL DB libs I ever used, namely doobie and quill. https://github.com/getquill/quill https://tpolecat.github.io/doobie/