
The divide is fundamentally about choosing what's in charge of your system, the system being composed of your databases, your applications, and your supporting infrastructure (your scripts, your migrations, etc.). To relational database folk such as myself, the central authority is the database, and our principal interests are what ACID exists to provide: concurrent, isolated, atomic transactions that cannot be lost, on top of a well-defined schema with strong data validity guarantees. To us, what's most important is the data, so everything else must serve that end: the data must always be valid and meaningful and flexible to query.

The side that argues for ORM has chosen the application, the codebase, to be in charge. The central authority is the code because all the data must ultimately enter or exit through the code, and the code has more flexible abstractions and better reuse characteristics.

The disagreement comes down to what a database is for. To the OO programmer, strong validation is part of the behavior of the objects in a system: the objects are data and behavior, so they should know what makes them valid. So the OO perspective is that the objects are reality and the database is just the persistence mechanism. It doesn't matter much to the programmer how the data is stored, only that it is stored, and it just happens that nowadays we use relational databases. This is the perspective that sees SQL as this annoying middle layer between the storage and the objects.
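To make the OO side concrete, here is a minimal sketch in Python of an object that "knows what makes it valid". The Order class, its fields, and its rules are hypothetical, invented only for illustration; they are not from any real codebase.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical domain object: data plus the rules that make it valid."""
    customer_email: str
    quantity: int

    def __post_init__(self):
        # Validation lives with the object, so every caller that builds
        # an Order through this class gets the same checks.
        if "@" not in self.customer_email:
            raise ValueError("customer_email must look like an email address")
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")

    def total(self, unit_price):
        # Behavior sits next to the data it operates on.
        return self.quantity * unit_price


order = Order("a@example.com", 2)
order_total = order.total(5)

try:
    Order("a@example.com", 0)
    rejected = False
except ValueError:
    rejected = True
```

A rule like this only holds for callers that go through the class; anything that writes to the underlying storage by another route never sees it.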

To the relational database person, the database is what is real, and the objects are mostly irrelevant. We want the database to enforce validity because there will always wind up being tools outside the OO library that need to access the database and we don't want those tools to screw up the data. To us, screwing up the data is far worse than making development a little less convenient. We see SQL not as primarily a transport between the reality of the code and some kind of storage mechanism, but rather as a general purpose data restructuring tool. Most any page on most websites can be generated with just a small handful of queries if you know how to write them to properly filter, summarize and restructure the data. We see SQL as a tremendously powerful tool for everyday tasks, not as a burdensome way of inserting and retrieving records, and not as some kind of vehicle for performance optimization.
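As a rough illustration of the "small handful of queries" point, here is a minimal sketch using Python's built-in sqlite3 module. The posts and comments tables, their columns, and the sample rows are all made up for the example; the only claim is that a single query can filter, summarize, and restructure the data a page needs.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author TEXT NOT NULL,
        title TEXT NOT NULL
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body TEXT NOT NULL
    );
    INSERT INTO posts VALUES (1, 'alice', 'ORMs'), (2, 'bob', 'SQL'), (3, 'alice', 'ACID');
    INSERT INTO comments VALUES (1, 1, 'nice'), (2, 1, 'agreed'), (3, 3, 'hm');
""")

# One query filters (only alice's posts), summarizes (comment count per
# post), and restructures (one display-ready row per post, most discussed first).
front_page = conn.execute(
    """
    SELECT p.title, COUNT(c.id) AS comment_count
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.id
    WHERE p.author = ?
    GROUP BY p.id, p.title
    ORDER BY comment_count DESC, p.title
    """,
    ("alice",),
).fetchall()
```

The application code that renders the page then only has to loop over front_page; the joining and counting already happened in the database.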

At the end of the day, we need both perspectives. If the code is tedious and unpleasant to write, it won't be written correctly. The code must be written--the database is not the appropriate thing to be running a web server and servicing clients directly. OOP is still the dominant programming methodology, and for good reasons, but encapsulation stands at odds with proper database design. But people who ignore data validity are eventually bitten by consistency problems. OODBs have failed to take off for a variety of reasons, but one that can't be easily discounted is that they are almost always tied to one or two languages, which makes it very hard to do the kind of scripting and reporting that invariably crop up with long-lived data. What starts out as application-specific data almost invariably becomes central to the organization with many clients written in many different languages and frameworks.

We're sort of destined to hate ORM, because the people who love databases aren't going to love ORM no matter what, and people who hate databases will resent how much effort they require to use properly.

This is a fantastic comment, you need to post this as a blog post, and then submit to HN. This way I and many others will never have to rehash this again, just point back to it.

> We're sort of destined to hate ORM, because the people who love databases aren't going to love ORM no matter what, and people who hate databases will resent how much effort they require to use properly.

Speak for yourself. I love databases (note the plural form) and love ORM. ORM is a godsend for developing an application that has to work against different databases (postgresql, mssql, db2, etc).

To me it sounds like you're talking about the argument of who has the responsibility of applying business rules, the application layer or the database layer - which is another entirely valid argument in itself, but distinctly different than the argument of whether to use an ORM in your application layer or not.

To a database person, the database is all about business rules: what we recognize as a "thing", what we record about these things, and how they relate to other things. ie, what are the facts about the business, and how do we reason about them?

Any conformant client code then must honor these rules, and oftentimes that means it must re-implement them, which is an acceptable cost if we have decided to use an RDBMS in the first place.

Now it's true, a given database may only implement a subset of all applicable business rules--maybe some fall outside the scope of the database, maybe it's preferable to offload some to a trusted client, maybe the business and database model have drifted apart over time, and no one wants to overhaul the database model due to all the dependencies involved.

That said, any rules that the database does implement are a good thing, especially those simple rules that can be implemented as constraints. And it's good because then you can program against them, from any client, from any code, inside the database and elsewhere, and you can make guarantees about what possible states the data could be in. This is generally a useful thing.
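For anyone who wants to see what "simple rules implemented as constraints" looks like in practice, here is a minimal sketch with Python's built-in sqlite3 module. The customers and orders tables and the try_insert helper are hypothetical, made up for the example. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO customers (id, email) VALUES (1, 'a@example.com');
""")


def try_insert(sql, params):
    """Hypothetical helper: return None on success, else the database's error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


order_sql = "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)"
ok = try_insert(order_sql, (1, 3))             # valid row, accepted
bad_quantity = try_insert(order_sql, (1, 0))   # violates CHECK (quantity > 0)
orphan = try_insert(order_sql, (99, 1))        # no such customer
duplicate = try_insert(
    "INSERT INTO customers (email) VALUES (?)", ("a@example.com",)
)                                              # violates UNIQUE

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The same rejections would apply to a reporting script, an admin tool, or a client in another language, since the rules live in the schema rather than in any one codebase.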

Agreed. If the costs were equal, I would always implement business rules in the database. I doubt many people would disagree with that. The problem is, it takes longer to write and debug. However, despite this, I think there is a certain subset of rules that belong in the database without exception - constraints and things of that nature.

Would Reddit be a good example of this, where they use a relational database as a key/value store, don't use an off-the-shelf ORM and still depend on the application layer for all the business rules?

I admit I have no statistics, but it's been my experience that most places choose between a highly OO model + ORM and a highly relational model without.

> it's been my experience that most places choose between a highly OO model + ORM and a highly relational model without.

My experience has always been a highly relational model, ORM or not, and business rules enforced in app layer or DB (or a mix of the two). I've always seen them as distinctly different decisions.

Personally, in the past I was always a "rules in the app layer" guy, because of the many advantages of doing it that way, but as I get older the more difficult but guaranteed correctness of implementing in the database is becoming more appealing (especially if it's not me that has to actually write the code!!)

Is the man or the woman more important for a successful marriage? Your comment sounds similar to this :) I do not hate either. ORM or not, developers dealing with DB data need to understand how it works. If they do not, they may still come up with a working application, but when performance issues come up, they become deer in the headlights. The same argument applies to ORM or any other technology. Long-time Hibernate user here. For me, Martin Fowler's article hits the nail on the head.

"To us, what's most important is the data, so everything else must serve that end"

Surely this is a flawed world view!

Let me pitch you this scenario. You run Facebook, and you have all the software and all the data. A catastrophe occurs and you lose everything, and due to a mistake in the way backups were made, you can choose to restore the software or the data, but not both. (Somehow, restoring the software will destroy the data, and restoring the data will destroy the software). Which one do you choose?

Although that makes for an interesting conversation, it is a red herring / strawman / false dichotomy with respect to this discussion.

> To us, what's most important is the data, so everything else must serve that end

To you, yes, and I don't fault you for defending that perspective. But the real master who must be served is maximizing "profitability" while maintaining an acceptable level of risk.

The only people who are totally wrong are those on either side of this argument who ignore the very real advantages of the other side, or the risks of their own. (Which would make the author of the original article the one that is most "wrong" in this discussion, as far as I'm concerned.)

I don't disagree with you. The point of the hypothetical situation is to dislodge a naive programmer from the sense that the code is the most important artifact and the database is just storage, a servant. I agree with you and jshen, that reality is nuanced and turns on the business, the system as a whole.

"To us, what's most important is the data, so everything else must serve that end"

This is ideological, right? What's most important is the business. Anyone that starts from the assumption that everything, EVERYTHING, must serve the end of the data, is wrong. Right?

We can make up interesting dilemmas all day. How about this one: there is an optimization that Facebook can make which is shown to increase monetization by 10%, but it creates some risk of data corruption. Engineers estimate that it will corrupt 0.01% of Facebook posts. Do you choose a 10% increase in monetization, or does everything have to serve the end of data integrity?

Yes, it is ideological. But you're right, at the end of the day, it's the business that matters. Developers are going to have guiding principles and it's not a bad idea to evaluate them every now and then; I hope my hypothetical at least illustrates why the alternative opinion exists. I quite like your hypothetical situation, it's the best retort, and offers a great motivating example for the rise of NoSQL, where ACID compliance simply doesn't make as much business sense as scalability with less validity.

"I don't believe in hypothetical situations" -- Kenneth the Page, 30 Rock.

Hm, that's a great question, even outside the scope of this ORM context.

I would probably pick the data. It's what can be monetized and it's impossible to regenerate.

Having to recreate the software might even be beneficial in the long term if your engineers are careful enough to avoid second system syndrome.

Your response is long, but altogether too subjective, philosophical, and tautological.

What matters is consistency, usability, and agility. Throwing ORMs out the window will give you as much consistency as you can squeeze out of an SQL server, but will greatly reduce your agility. Using an ORM for everything will greatly increase your agility but will reduce your usability.

As in everything, there is a balance. People who fall on either side of that balance need to back away from the pulpit and rethink their stance.

Did you read the parent post through to the end? To quote: "At the end of the day, we need both perspectives."

But the truth is the days of DB being king are almost gone.

These days you just don't hear about DBAs at all any more. You used to see constant jokes about DBAs being a pain in the ass and stopping programmers doing X or Y. ORMs are going to win because there aren't enough of you left. Stored procedures, triggers, etc. are going to be viewed as ancient technology back from the days of yore when people didn't understand how to code properly.

At the risk of responding to a troll, being a DBA is far more than "Stored procedures, triggers, etc.". The relational database is still around (I notice you talk about databases as if all databases are relational, which is false), and will remain for decades to come because relational theory is sound and has proven to work for most use-cases.

The database is where you store your data. If you have data whose integrity is critical to your organization, a properly designed and maintained database is going to save a lot of hassle.

I believe that databases will remain important, and maintaining data will always involve restrictions on how you can use it. Restricting data is not a relational database problem - it's more often than not a business constraint. Often times you don't want programmers doing stupid things with your data :-)

My experience seems to be different from yours. I am a working DBA and I have friends that are working DBAs and we generally do not have a hard time finding work.

I incidentally have stopped programmers from doing X or Y, but it was because the right answer was Z.

As for ORMs winning, I don't think it's a war. For some things I use and recommend ORMs, but for others I recommend using pure SQL.

> Stored procedures, triggers, etc. are going to be viewed as ancient technology back from the days of yore when people didn't understand how to code properly.

You may be right about perception, but nearly every system I've worked on has contained a big ugly mess somewhere because the author didn't know how to use a SQL DB properly.

I'm not sure where you're working that there are no DBAs... not an enterprise shop.

Honestly, every time I see how badly Facebook handles data and caching, I can't help but wonder why they don't use a real data store and DBAs.

I would be interested to know what you mean by "real data store", and what you believe Facebook does badly at in terms of handling data and caching and how it could be improved.

(I am an engineer/developer/whatever at Facebook, and I'm always interested in hearing the perception of the company's technology from the community.)

I have two questions:

1. I've always been under the impression that for what Facebook does, a traditional RDBMS simply cannot handle the scale (like, not even close). Is this correct?

2. I'm also under the impression that due to the architecture Facebook runs on, from time to time some lesser-important data (ie: a status update or comment) can be lost (temporarily or permanently) and this is not considered unacceptable. (It seems perfectly reasonable to me for this particular use case.)

Perhaps it is best for the database team to talk about that themselves - wouldn't want to put words in their mouths. They gave a Tech Talk in December last year, which you can see at http://livestre.am/1aeeW

With respect to #1, I believe they use a heavily sharded MySQL cluster for at least part of their work. But I also believe you're right about #2.
