Hacker News new | past | comments | ask | show | jobs | submit login
Object-Relational Mapping is the Vietnam of Computer Science (2006) (codinghorror.com)
75 points by diffix on Feb 27, 2014 | hide | past | web | favorite | 85 comments

To be honest, I prefer to approach this from the other angle. Namely you start with relations and map logic relating to these back up to your object model. This provides cleanliness of code, performance, and hand-tuned design that you don't get with an ORM.

However hand-coding the SQL has some advantages a lot of folks don't want to discuss. While relations and objects are to some extent incompatible, focusing on writing good relational code usually results, at least in anything I write, in a net reduction in codebase size. This is because there are a lot of things that can be done in relations faster and easier than could be done in objects.

The answer in my view is not to pick one or the other but to define an interface across them. This is often an ORM in some projects but having talked with some ORM developers I respect, at least some tend to map objects to relations where the relations are views, thus encapsulating the relational logic behind relational processing (which is smart).

Again I go almost the other direction. I prefer to define logic interfaces which encapsulate the relational logic and then add a service locator to handle the interface. This was a big motivation behind PGObject::Simple::Role.[1]

I am not convinced that ORMs are so bad if they are also used with updateable views (and thus providing a stable interface for the application, protecting the internals of the storage from the application's need for intimate knowledge). However, I personally find it just simpler to write SQL.

[1] https://github.com/ledgersmb/PGObject-Simple-Role

This was a fun blog post, and was shockingly prescient: SQLAlchemy rose right about this time to become one of my favorite libraries and an impressive feat of solving the ORM issue with finality.

They have a sort of low-level python SQL api, a higher level declarative ORM layer, and a custom glue that let's you wire objects and sql together in arbitrary but useful & maintainable ways.

I think this was possible because the SQLAlchemy devs had this insight: "SQL databases behave less and less like object collections the more size and performance start to matter; object collections behave less and less like tables and rows the more abstraction starts to matter."

Yeah, I think sqlalchemy broke the mold. It starts from the idea that you already know and understand SQL and relational constructs well, then adds a layer of abstraction that lets you skip tons of boilerplate and write really elegant code. Personally, while there are lots of good decisions in sqlalchemy like the unit of work pattern (in contrast to the active record pattern), the one that pays off time and again is representing the SQL ast as python objects. It eliminates and entire class of text munging issues that occur when you write raw SQL and just completely makes you wonder "where's the mismatch they keep talking about?"

Do you know of any Java-OR mappers in the same spirit?

Java: http://www.jooq.org/

Scala: http://slick.typesafe.com/

They also have the benefit of being fully type-safe across query projections.

I don't think Slick can be considered a very SQL-centric API. While the Slick folks embrace the relational model, they certainly do not embrace the SQL language. Just as with LINQ, this can be desirable if you want to reason about collections in a more general sense. On the other hand, the SQL standard has gone far beyond the "occasional" OUTER JOIN that can turn out to be a true challenge to Slick.

In other words, SQL has never been purely relational. It is a beast of its own.

I've been there with SQLAlchemy in late 2006, and I vowed never to return.

Let me state this, SQLAlchemys pool/engine/SQL-Api is great, and it's got nothing whatsoever todo with ORM.

But SQLAlchemy does contain an ORM, and it does become unmaintainable just like every other ORM. It roughly goes like this:

1) You use the ORM

2) Things get slow

3) You write a bit of custom mappers

4) Soon all your logic can't be operated without custom mappers

5) Congratulations, you've now written all the handwired code you would have written without an ORM, plus all the ORM code on top, it's now unmaintainable, full of bugs and oversights AND slow.

And that about sums up everybodies experience with ORMs unless they do trivially small things. It's at this junction people begin thinking about solving this ORM problem. And they learn it's bloody hard.

Obligatory XKCD reference: There are N ORMs on the market, they all suck. Let's solve all these problem and do a new ORM to rule them all. There are now N+1 ORMs that all suck on the market.

Second this experience. A single row pk lookup using SQLAlchemy adds 30ms overhead to a query. That is huge. To avoid this, last time I touched it, I ended up just creating my own DB access library that spit out simple objects and used raw SQL underneath. I believe for any large system that is going to be the answer.

Having said that, do not underestimate the number of apps doings "trivial" things. There are lots. The problem with SQLAlchemy I think is that it is actually advanced enough to have you use it right up to the point where you must rewite all your code because it suddenly is just dead slow. In contrast the Django ORM is too simplistic to get too far with it.

Sure for simple things, of which there are many, ORMs can work sufficiently up to a point.

But, the problem is that, you rarely get to go to your boss and say, well, this thing we've been using to do, you know, our data stuff, it's completely wrong. We need to throw everything away and start from scratch.

The only situation in which you get to say that, is when you're explicitly make it clear that you're writing a raw prototype that'll not scale, and that nothing short of a complete rewrite is gonna be required as soon as you finish (and even then, it can turn out tricky to actually do that).

Unfortunately, most people who get themselves into the ORM mess don't realize that this coming for them in the future. So they don't go their boss and make the prototype proposal. This all but cements the entire failure of the project right then.

I don't know my main experience is with the Django ORM. Others have mentioned that some ORMs seem to start from the relational model, and transform those into objects (which sounds like how the Django ORM works to me).

I find this works well. It speeds up development of all the simple stuff , and by the time it gets complicated, you can drop down to SQL.

I don't find the two ways of thinking incompatible, in fact the "extra" method on Djangos ORM is an intermediate where you let the ORM do its normal simple style of mapping, but allows you to add extra clauses to the SQL manually. Once you do this you see how the ORM works, and can choose quite easily when it is appropriate to use it.

You loose a bit of the magic that comes with Django by the time your queries get more complex (like auto generated admin pages), but the benefits outweigh the costs immensely in my case.

I am in the same boat as you currently. I work with Django almost exclusively these days and use the ORM in every project. The difference is that I am working with mostly normalized databases and datasets in the 100's of megabytes. This is a great place to use an ORM. You get all the simplicity and quick prototyping and pay very few penalties. When you have a slow query you can usually address it by adding some select_related() or prefetch_related() and you are done.

What this thread is talking about is when you have datasets that are several gigabytes in size and up, when you have many dozens of tables, all interrelated and when you need to squeeze as much performance out of your hardware as you can. When your code gets complex enough where you cannot just insert an object, but have to do all manner of verification through half the DB first. Or where you need to operate on 100,000 rows in an intelligent manner very fast (think using stored procedures and complex query parameters). This is a place that you might end up and if your entire codebase depends on an ORM, you are often out of luck.

In the long run, I think there several things to do: (1) know SQL well, (2) use an ORM when you know the project will be small, and (3) anticipate that if you are growing and using an ORM, you will need a re-write.

I agree, and my application is pretty much what you describe in your last paragraph.

80% of the queries are simple and use the ORM directly. A few need an extra clause for stuff that isn't handled by the ORM, and I have a couple that need to be high performance (and are abstract enough that they don't fit well with an ORM), which are done via the raw SQL.

I completely agree with your premise, don't get me wrong. I think the ORM is a great way to do a small project that you know won't grow big (there are plenty more of these than big projects and most of the time you can tell. For example, a small-ish e-commerce site I was involved in last year).

My experience was somewhat different in terms of ditching SQLAlchemy: the performance started to suffer as we kept adding clients. Also, developers started tripping over their own feet to do simple things as relationships got much more complex and the ORM did not allow us to impose certain rules easily (data did not always flow through the ORM as it was due to historical and performance reasons). Instead, a co-worker and I sat down and wrote a small library that did the common things for you. It was broken up into modules and we standardized on some specific methods for each "object": get, find, insert, delete, update. Each object also had non-standard more interesting methods that allowed us to manipulate relationships in a controlled fashion. The management did not mind this as it took very little time, was a gradual transition (we converted performance critical code first, then anytime we touched anything else).

Some things I learned from this:

1. Creating and maintaining a library like this is actually very little work. Who cares if it's raw SQL if it's super simple and you only write it once. Something like "SELECT * FROM vehicle WHERE id = ?" does not need unit testing, etc. and almost never changes. On the other hand the crazy complex queries we had were actually simpler in raw SQL than trying to express them in the ORM. To use an ORM efficiently you must know SQL already, except you can usually write better, clearer SQL than an ORM can.

2. The benefits in performance were huge. We could do all manner of dirty hacks inside the library functions since the library was self-contained.

3. Migrations were a pain, until we created a simple text file where we would append the latest migration. I used a tool that compared schemas to verify that the live database was in line with what a blank initialization of the DB looked like. This could definitely use some better tooling but practically this was a small pain.

4. The benefits from code that used this library were even bigger than the performance benefits. Other developers on the team were locked into what that could do with the data. If they needed to create a new method for a given table/object, the went ahead, but anything complex typically got reviewed by at least one other person and bugs were caught early. Moreover, all the data manipulation now happened in one single library so it was much harder to introduce inconsistencies.

5. We got to actually use our RDBMS. ORM's try to give you the lowest common denominator between the ORM's they support. They don't typically let you do views, sharding/partitioning (in the RDBMS sense, not in the application), SQL functions/procedures, etc. We got to do all of this and it often got to be the better solution.

I played around with sqlalchemy a bit this week, it seems you still have to dive down to the bare relational manual mapping (by creating a table by hand) to do something simple, like many-to-many mappings. Is there a better way to do it?

Honestly I'm extremely happy with EF6 and LINQ. If you know what you're doing [1] it works extremely well.

Having used EF6+LINQ in 2014 I feel this post from 2006 is quite outdated. The article tells you as much:

> Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such ... the LINQ project from Microsoft

LINQ's come a long way since 2006.

[1] Mostly this comes down to understanding the leaky abstraction between what IQueryable can do [2] with respect to what an ORM can do with SQL generation

[2] Also knowing your IQueryables from your IEnumerables.

I wrote an early Java ORM (1996-2000) and I've come to the conclusion that ORMs are a bad idea. ORMs make the easy even easier, and the difficult impossible. The basic problem is that writing good, high-performing SQL is hard enough as it is. You have to understand indexes, clustering, execution plans, and query optimization to get good performance. Add an ORM to the picture and then you have to figure out how to say what you want in this brand new language, through a mapping layer.

I don't want to give up either objects or relational databases, (the choice that some other comments have suggested). What I am willing to give up is an ORM generating SQL. Writing SQL doesn't bother me. What bothers me is dealing with Java PreparedStatements, Connections, and ResultSets. This is all trivial boilerplate stuff, that an automated approach can do easily. I want to preserve my brain and fingers for the hard stuff, writing clean, fast Java and SQL.

I have heard that iBatis takes this approach, but I haven't tried it myself.

> The basic problem is that writing good, high-performing SQL is hard enough as it is. You have to understand indexes, clustering, execution plans, and query optimization to get good performance.

Not to mention cache specifics (something that has personally bitten me).

To be honest, my preference is actually to write a service locator against stored procedures. This allows queries to be maintained separately from the application (and loosely coupled from them also). This is not too far from what iBatis does except that I would use stored procedures instead of mapped sql.

What db's are you using? If Pg only, interested in porting the PGObject approach to Java?

I think the bigger problem with ORM's or database driven application design in general is that you start to see the world through the eyes of the database and how you move things around inside it. Thus, your code resembles your database more than your database resembles your code.

Ironically, developers would never think to do this with a storage mechanism like the filesystem. There is no great popularity in filesystem based ORM's. Somehow when we deal with the filesystem we treat it as it is - data storage and retrieval. When we deal with 3rd party api's we tend to keep them at arms length as well. Yet, when we deal with the database, we treat it as some other thing that seeks to influence the design of our code on a fundamental level.

I wrote about this like a year ago: http://brianknapp.me/the-filesystem-test/ and I don't think much has changed since then.

Thing is, for a lot of applications, the structure of the database is _much_ more important than the structure of the code. It's easier to refactor your code later than to deal with a database which is structured incorrectly, since when you refactor that you risk losing the data, have to run change scripts and have downtime, etc.

You can view the database as merely being there to support your program, or you can view your program as being there to manage the database. If it's the latter, then it's not surprising that the database design impacts the design of your code. So we don't keep it at arm's length, because it's much more important and complex than a 3rd party API or even a filesystem.

I think is analogous to Linus's quote about good programmers worrying more about the data structures than the code.

Certainly when my team leader suggests another quick hack at the code level to make the database do what we want, I usually push back, and suggest a change at the database level to model the situation more accurately. His solution would be quicker, but mine will be more robust and maintainable (so less work in the long term).

Isn't this because when using a filesystem, we - in almost all cases - don't have relations between files we need to use?

If I needed to look in a folder containing a million files, find a subset containing the word 'foo' and match to the 'joining' files in another 4 folders (all of which contain 500k+ files) - I think my code would resemble my filesystem more than my filesystem resembles my code.

Except files are modelled. You have a bunch of functions for finding the right one, then a bunch for manipulating files at rest, then a few for giving you a file descriptor.

The file descriptor represents the file in your language and is the equivalent of parf of what some ORMs do. It's an explicit interface between external data and your program, just like a good SQL library (like SQLAlchemy).

Sounds like you want to start using go. It basically assumes a database to be a black box much like file storage or an api, you keep away from it until you really need to put that data somewhere. This has the added benefit of being able to unit test almost all of your code. The only parts you can't effectively test are the storage layers like apis, fs, and db. Hmm to be a little more clear, I dont mean you avoid it, I mean you abstract it away into simple storage access interfaces the same way you would with fs or apis. The last thing you want is orm riddled logic.

Having experimented with golang a bit I don't see any significant advantages over other languages in interfacing with sql dbs - the db drivers return rows which you convert back to your objects, and it's up to you to wire up the objects and relations from that, and there are even a few issues like calls to insert not returning the rowid inserted so you have to call LastInsertId for Mysql/sqlite (last time I used it). You still have to decide how rows returned are converted back to your golang objects, deal with nils, relations etc, all that logic has to go somewhere, and it's not in the db drivers.

The problem with ORMs is not with just storing data, that's pretty simple and you don't even really need an ORM for that. The most difficult problem comes when you introduce complex relations, and need a way to retrieve those datasets from the database. AFAIK golang does not help you in any way with that, so you'll have to invent your own ORM and conventions for representing things like belongs_to, has_many and join relationships. If you have join tables you'll be doing the relations yourself which is not always simple or performant - this is what ORMs are useful for.

Clearly ORMs break down in apps of a certain size and have their own issues (largely due to non-optimal SQL and/or a reluctance to break out of the ORM when necessary), but every app contains the sort of logic which is in an ORM, because it must map storage to in-memory and view representations. If you have logic to convert *sql.Rows to your objects, including relations, you have written a simple ORM.

Can you provide a link? A quick web search only turns up the "sql" package - that seems to be just a low-level SQL interface that doesn't prescribe much.

That's all golang has, it's pretty much like other languages :)

github.com/lib/pq is a typical driver, which returns rows of values which you read to recreate your objects.

I work at a large enterprise. At one point the large enterprise decided to invest heavily in an ahem non-relational model of storing data. This has proven to be pretty dumb. It's both less performant and less reliable than the relational databases it purports to replace. On top of this, any interesting application of the data besides basic CRUD is either slow and a total pain in the ass or just not possible.

SQL isn't going away because it's valuable to exploit relationships between data and because it's pretty good at doing that. Anything that tries to replace it, eventually looks suspiciously like it. (I'm looking at you, MongoDB aggregate functions)

The object-relational mapping problem is a perennial question so I think it is great to have an interesting survey of the topic even if it is a little old.

The thing is that the object oriented programming model and the relational database model are both good models. However, the relational model is a logical model specifically intended to be independent of the particular structure used to store data (presentation independence: it uses only a few relations so as to allow data to be presented using any logical conditions on those relations) while object oriented programming is primarily focused on creating specific storage structures (it's not a specific model but a set of tools for build whatever-the-heck). You can create a limited definition of object this is compatible with the relational model (see [1]) but such definition more or less neuters OO as a tool chest for building any structure that tickles your fancy(such a limit can be good or bad depending on the context).

[1] For example, the daunting Third Manifesto (TTM), by Hugh Darwen and C.J. Date http://www.thethirdmanifesto.com/

However, the whole point of object encapsulation is to provide that independence, right?

I think the issue is how that independence of interface is maintained. In the relational model, the question is one of transparency in internals and set (or more properly bag) operations, while in objects it is defined interfaces and encapsulation.

It seems to me that's the problem, but you can bridge them by creating stable, defined interfaces (updateable views for example).

"However, the whole point of object encapsulation is to provide that independence, right? ...the issue is how that independence of interface is maintained."

OK, a different way to put it is that the relational model provide one kind of independence and object orientation provides another. The way that a foreign key column is not OO encapsulated but does reference a different column allows SQL expressions that aren't dependent on which value is the "real" value.

"you can bridge them by creating stable, defined interfaces"

Yes you can. At any one point in the development of an application, you can pick a mapping and say "no problem", this is true.

The problem is that if you develop an application further, have an object return an object instead of a value say, then the problem of how to change you mapping to reflect the development is hard, harder than one would think. It's not the snap that's hard, it's the evolving application that uses OO and relational models that becomes weirdly hard, where the "impedance mismatch" become evident.

> The problem is that if you develop an application further, have an object return an object instead of a value say, then the problem of how to change you mapping to reflect the development is hard, harder than one would think. It's not the snap that's hard, it's the evolving application that uses OO and relational models that becomes weirdly hard, where the "impedance mismatch" become evident.

I am not so sure that is really the problem though. As I noted, if you do ORM mappings to updateable views, that both simplifies and complicates things. On one hand it means more code to maintain because you have to map physical storage relations to application-facing logical ones.

But on the other hand, it solves the problem you are describing quite nicely because your apps only see an intermediate form of the data. You control this intermediate form and it is independent of how the data is stored.

The real problem I see with ORM usage generally is that instead of thinking about interfaces they think the ORM is the interface, but this means that the application depends on intimate knowledge of how the data is stored, a dependence the relational model is designed to get you out of. That isn't really a problem of the relational model -- as you point out, this independence is exactly what the relational model provides and so if you aren't getting it, it is because you aren't really using the relational model.

Rather it is a problem of trying to eliminate relational thinking from app development.

By "Vietnam" do you think he means "an entity that they tried to crush and demonize but is actually doing kind of ok in 2014, all things considering".

It's a quote from someone else's essay which explains it as a situation where initial results are easy but finishing the job is hard. A strained analogy at best, and the essay is very verbose.

"One of the key lessons of Vietnam was the danger of what's colloquially called "the Slippery Slope": that a given course of action might yield some early success, yet further investment into that action yields decreasingly commensurate results and increasibly dangerous obstacles whose only solution appears to be greater and greater commitment of resources and/or action. Some have called this "the Drug Trap", after the way pharmaceuticals (legal or illegal) can have diminished effect after prolonged use, requiring upped dosage in order to yield the same results. Others call this "the Last Mile Problem": that as one nears the end of a problem, it becomes increasingly difficult in cost terms (both monetary and abstract) to find a 100% complete solution. All are basically speaking of the same thing--the difficulty of finding an answer that allows our hero to "finish off" the problem in question, completely and satisfactorily."

That describes NoSQL perfectly.

I just got the whole NoSQL marketing talk from a developer I am collaborating with on a project. "Its so quick, just connect Angular to Elasticsearch and blah blah blah". So no thought has been given to security (its going to be a biomedical research application). I can foresee lots problems in the future.

I think, as an American, the metaphor carries with it the idea of a situation you think at the outset should be relatively easy but once you get invested in it seems to have no way out until you leave relatively defeated.

For history buffs from Europe, the experience of Germanicus (at least as recounted by Tacitus) in the swamps across the Rhine may have a similar implication. Oh look, we're the Roman army, the most powerful army in the world. Defeating a few barbarians should be easy! But then the actual requirements start piling up, one gets into the swamps, can't move about like one would have hoped, logistics gets disrupted, and eventually it takes outside help even to retreat.....

I think he meant Vietnam as in a total failure on the part of the US

Yeah, the US totally fail at ORM.

Wait... wat?

Perceptions. Just like US did not fail in Vietnam but failed in the public opinion, ORM are doing a good (well, at least passable) job of doing what they are intended to do, but are generally hated.

> Just like US did not fail in Vietnam

In the alternate history where the US did not turn tail and run out of Vietnam like a whipped dog, abandoning their former allies in the south to be utterly subsumed by their enemies from the north, how did the 1976 election turn out. Was it still Carter?

Let's leave politics to a different thread on a different site.

If you don't want politics, feel free to refrain from talking about them.

I think it's a legit reply, are you gonna answer it?

> Just like US did not fail in Vietnam but failed in the public opinion

This is self contradictory. Failure of national morale in war is actually how wars are normally lost, not somehow something different than normal defeat. And its relevant to the analogy, in that...

> ORM are doing a good (well, at least passable) job of doing what they are intended to do, but are generally hated.

If that's true, its because what they are designed to do isn't what the people hating them need done. It indicates a gap between what is available and what is needed; intent may be interesting, but in the end not all that important.

No, he means 'there are no good solutions to the situation we find ourselves in'.

He means that it's a wormhole that you can't get yourself out of, initially because you're too invested to give it up, and going forward, because it's become a part of the application and as such, can't be taken out.

Code comes and goes, data is forever.

ORM is an attempt to bypass proper data design, when in fact more attention should be paid to the data model in the DB than the structure of the code.

Functional programming has a much better impedance match to the relational concept; maybe this will mitigate the issue in the future.

this post goes so deep into architectural aspects, that becomes philosofical and makes us almost forget the not so elevated principle of "get shit done, fast". The ORM that I'm using (Perl's DBIx::Class) never let me down:

* it has model generators

* need an insert into or an update? bang, one line of code

* find_or_create? one line of code

* acess, create, update or delete related records? one line

* it has chained where clauses and hits the db only when it needs to access the data

* want hashes instead of objects in order to send them as a json response in a REST service? one line of code

and many more features. Is there an O/R "impedance mismatch" ? I don't know, I don't care and I don't use it if there is

> I tend to err on the side of the database-as-model camp, because I think objects are overrated.

That approach is really quite fine and wonderful, if you can put all your business logicy awesomesauce into the database layer. But it can make a variety of problems really awkward if you want that business logic to interact with code that lives in a different process... starting with "validating user input on a web form" and going on from there.

Also, I'm curious whether you'd write stored procedures in your database to set up objects for your automated testing suites, when you want to test the business logic in your stored procedures. :P

> That approach is really quite fine and wonderful, if you can put all your business logicy awesomesauce into the database layer. But it can make a variety of problems really awkward if you want that business logic to interact with code that lives in a different process... starting with "validating user input on a web form" and going on from there.

I don't know. We've certainly had our awkward moments with LedgerSMB. However, there are a bunch of things that have to be understood to do this right.

1. Business logic in the db requires you to think carefully about interfaces. Getting the right interfaces all the way down is rather difficult but it can be done with appropriate attention to detail.

2. Error handling is a very specific case of #1. You need to think about notifying the application of an error, not the user.

> Also, I'm curious whether you'd write stored procedures in your database to set up objects for your automated testing suites, when you want to test the business logic in your stored procedures. :P

I just test the stored procedures in transactions that roll back. This is pretty nifty because I can even test write operations on a production db without worrying about polluting the production instance with test data.

For the past few years, I abandoned the ORM and went straight to Stored Procedures using Data Transfer Objects / DAOs. It is no panacea, but has worked well for these reasons:

1. I can grep all of my SQL! No dynamic SQL generation means all SQL statements are known. This alone has been huge to me. Of course it depends on storing the SPs in version control and settling on good naming conventions (eg. always write table.column). 2. The SPs are easily testable and verifiable endpoints to themselves. 3. Small Performance boost of using compiled / validated SQL. 4. No dynamic SQL also means that performance tuning is a snap. I see a bad actor I can track it down easily and fix the problem.

One thing that has made this possible is a convention of naming SPs by tablename_DMLname_[description]. e.g. customer_sel_byemail.

What is "DMLname" in "tablename_DMLname_[description]" ?

A named query, or any other form of named SQL statement.

It could be a view (a select query, stored as a database object) or a procedure (a grouped set of sql statements that might involve changes that are written to the database, possibly a simple insert, or simple update, or simple delete, but potentially much more complex).

DML is one class of SQL statements, and DDL is another. The two together encompass most of your common, practical database operations.

DML is short for data manipulation language as opposed to DDL which is data definition language. So in this case it's including in the stored proc name the type of data manipulation operation being performed by the proc. In the example "sel" is short for select so it's a query operation. You would also have insert, update and delete.

Got it, thank you both.

Datomic purports to solve the O/R impedance mismatch. I attempted to explain this in a blog post: http://www.dustingetz.com/2013/03/26/orm.html

I've never used an ORM, but I tend to associate their use with yer basic CRUD application that people write for money -- i.e. something not that interesting where you wouldn't mind "saving some time" if you could avoid caring exactly what string gets sent to the SQL engine -- or, perhaps, knowing SQL at all, for that matter.

Am I wrong? Are people using ORM's in their "passion" projects? Or are they just a way to take some of the pain (and perhaps insecurity, think SQL injections) out of boilerplate apps?

I gave up on ORMs after a year or so (back around 2006) of ActiveRecord doing the "makes the easy even easier, makes the hard impossible" syndrome someone else mentioned.

Then a couple of years ago, I started playing with a DataMapper/Unit-of-Work ORM (Doctrine2, in PHP-land). My experience, writing a moderately sized warehouse-management application, has been very good.

In Doctrine, you don't write your schema/DDL. Doctrine does, based on your Entity and Association mapping. You write plain-old-PHP-objects, and then map their relations (in XML, annotations, or YAML). Doctrine reads and validates your data model, and generates DDL statements to create the schema. Big surprise: it creates pretty much exactly the same schema you'd expect. (And it will generate and manage migrations, too)

Unit of Work is a powerful pattern, too, especially for web apps. You just worry about updating entities in memory. The ORM then flushes them to the datastore in a single transaction at the end of the request (so, by default, you have a one-to-one mapping between http requests and transactions in the RDBMS).

Datamapper/UoW is probably overkill for CRUD stuff; that's where ActiveRecord shines. But it pays dividends when you're dealing with more complicated state-transitions on your underlying entities, and there are lots of cross-cutting concerns.

I read Jeff's article and the comments here with interest. My only experience with an ORM is Doctrine2 (about 2.5 years) , and I've never really hit any issues. Whenever I've needed to duck out using the ORM in an edge case, its only been a few lines of code and very undisruptive.

In a recent project I manually mapped a large sprawling ancient schema manually using Doctrine2, and still never hit any major blockers. The code just abstracts away over the cracks.

Does anyone have any real world example articles on where an ORM has totally failed, and why?

I use SQL-Alchemy to help provide portability between SQLite and MySQL. That way I can run my apps in standalone mode or as part of a "real" deployment (nginx, uwsgi, python-app, mysql). SQl-Alchemy definitely helps take some of the pain out of that arrangement (although certainly not all of it). For some tasks I use the ORM layer of SA and for other tasks I just use their SQL Expression Language.

I also appreciate all the work the SA devs have done to help mitigate SQL injection. I recommend using the SQL Expression Language without the ORM to help mitigate SQL injections (following from defense in depth). The oursql connector also helps with that as it supports real parameterized queries where as some of the other connectors do not.

I use an ORM that does basic mapping but let's you write your own SQL to cover situations that mismatch (still contained in a data layer of the app). It basically just saves a ton of time and let's me write a REST Services in about 2 minutes that power mobile apps or provide public/private APIs. I've used it for all variety of apps.

I certainly do care what strings get sent to the Database though, I can tell you that! If you use an ORM and don't know how to inspect the queries it generates then I'd consider that to be somewhat negligent. That's just me, though.

I don't feel like I experience this problem in Django. But this might just be because I programmed database-driven apps pre-ORM, and as such I am predisposed to making my object model more relational to begin with. Or perhaps it's the case that using Django's model layer is really just an OOP veneer on SQL. In any case, I don't feel like I've had a major pain point here.

The real problem is more general.

Relations are easily persisted because they only have fields meaning they are easy to persist (store).

Objects are not easily persisted because they have methods meaning you are not always assured that the object is in a safe persist-able state (unless they are value objects).

It would not have had to be the vietnam of computer science. There are essentially two approaches at interacting with RDBMS:

- "Top down", domain-model centric, hierarchical, object-oriented, etc.

- "Bottom up", relational-model centric, relational, etc.

We've also nicely summarised this on our page here: http://www.hibernate-alternative.com

ORMs solve the first approach pretty well. ORMs have never claimed to solve the second approach. But people are lazy and constantly looking for magic bullets, which is why they are trapped by thinking that ORMs should also be a solution to the second approach.

But even Gavin King always stressed the fact that one shouldn't use Hibernate for everything.

I think people have forgotten about ZODB, an impressive concept in the early 2000s. But sadly it didn't take off. Slightly off topic here but worth looking at.

What's changed since 2006?

At least from my knowledge:

the rise of Rails and ActiveRecord

a multitude of back end JavaScript technologies (node, et.al)

NoSQL databases have become more prominent and widely accessible.

How has the Author's opinion changed?

I could submit a link-baity article from the 90's decrying object oriented programming as a pointless dead end in computer science, but I doubt the original author would hold the same opinions

Can't speak for the author's current opinion, but he did voluntarily start a Rails app recently:


The problem is stunningly difficult and widely believed (but not proven) to be intractable.

However, there's good work advancing the state of the art in this field. Look at Object Role Modelling implementations like ActiveFacts ( https://duckduckgo.com/?q=ActiveFacts ).

I don't see how posting links to to old articles (2006) with sensationalist titles adds value to a _new_s site. Sorry.

Hacker news isn't just a news site. That's one half, but the other half is the discussion we bring to the table. I assume diffix thought that the article was an interesting topic starter.

I like to see good articles reposted — even though the site's name contains the word "News", I feel like the site would clearly be less good if it only took submissions of articles that were less than a week old. But, yes, the title of this article is awful and distracting from its (imo worthwhile) content.

Well, someone thinks the article was important enough to give it a catchy title, so that somebody else could leverage upon it.

It was old, but it at least pointed out a theme that still holds some validity.

I like getting injections of knowledge and themes, that one should be aware of, and from the replies and comments here, it seems that a lot of other people feel the same way too.

I got fucked by using .NET and Linq some years ago, so I found it to be an interesting read. Nowadays I use Objective-C and core data, and that is a much more pleasant experience, as you can see from the comments. :)

"Those who cannot remember the past are condemned to repeat it."

George Santayana (1863-1952)

As long as we suffer from ORM, we'll suffer from flame-wars :-)

To be honest, I've never really understood the point of using an ORM. Sure, I guess it would be handy if you needed to migrate your data from one RDBMS to another, but other than that, I find it much easier to write SQL (with parameterized queries when applicable).

In my case it just saves a lot of time on "plumbing" code. Having DB entities represented consistently as objects is helpful for doing all kinds of things. - having events fire on save or delete, hooks for business-logic validation, etc. boring stuff perhaps but time-saving and keeps things really well organized.

My ORM of choice doesn't take away the ability to write raw SQL when you want, though. I wouldn't want to use an ORM where you couldn't extend it with your own SQL.

This is certainly intriguing. .NET has moved steadily in the opposite direction with Linq-to-Sql classes, followed by the Entity Framework which is now the default. I wasn't even aware that this school of thought advocating the opposite direction existed.

My first introduction to ORMs was hibernate. I immediately loved the idea but to be honest it can be extremely frustrating to get things done - especially if you are good at writing SQL already. It can feel like handcuffs!

I kinda suspect that started the whole anti-ORM thing in response to the ridiculous complexity of some frameworks.

I still use ORMs but I've decided that it has to be a light approach that doesn't try to totally hide the SQL code from you.

After I moved to Clojure and start to deal with raw data instead of objects I remember all the pain of ORMs from my Java days… Jeff is right: objects are overrated.

ActiveRecord has done it about as well as I've seen, by just bailing to strings when it got hard. Worse is better wins again.


Check out the Db library.

From my own experience, if you use an ORM, that is a strong indication you might have been better off with a NoSQL db like Riak. Consider your access patterns and partitioning requirements.

I don't think that's remotely true. People tend to use ORMs for much more superficial reasons like not wanting to manually build SQL query strings (which is valid, as this is error prone, but I still maintain that it's superficial) or because the environment provides them. I've seen many apps that are extremely well suited for a relational database but function terribly because the ORM doesn't let them get at an actual relational model.

So you just proved my point. Just because the ORM is there doesnt mean you have to use it. There are libraries that generate relational SQL without ORM!

I don't think so? I agree that just because the ORM is there doesn't mean you have to use it, but where I disagree is the idea that people incorrectly using an ORM means they should be using an object or document database. They should use a tool suited to their job, but often they're using an ORM because it's been presented to them as a path of least resistance and a silver bullet.

You can't tell anything about what tool is appropriate for the job they're trying to do from their lack of investigation.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact