Hacker News new | comments | show | ask | jobs | submit login
What ORMs have taught me: just learn SQL (2014) (wozniak.ca)
379 points by bsg75 303 days ago | hide | past | web | 295 comments | favorite

Ten years ago, there was a blog post every other week bemoaning ORM's. Ten years ago, those posts often had merit.

In 2016, this sentiment is outdated. A few points:

1. If you think that using an ORM means you don't have to learn SQL, then you're going to have a bad time. This is where most of the bad press originates... from people who never really learned SQL or their chosen ORM. An ORM provides type checking at your application layer, and typically better performance (unless you plan on hand-rolling your own multi-level cache system). But you still must understand relational database fundamentals.

2. If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It's no different from people who "don't need a web framework", and then go on to re-implement half of Rails or Spring (without accounting for any CSRF protection). Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don't want to do that. So they act like their hand-rolled crap is a badge of honor.

3. Ten years ago, it was a valid complaint that ORM's made custom or complex SQL impossible, didn't play well with stored procedures, etc. But it's 2016 now, and this is as obsolete as criticizing Java by pointing to the EJB 2.x spec. I can't think of a single major ORM framework today that doesn't make it easy to drop down to custom SQL when necessary.

> 2. If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It's no different from people who "don't need a web framework", and then go on to re-implement half of Rails or Spring (without accounting for any CSRF protection). Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don't want to do that. So they act like their hand-rolled crap is a badge of honor.

As a professional cowboy -- I detest layers; if it's running on my server, I need to support it, even if i didn't write it. An ORM or a web framework adds thousands of lines of codes and usually two or three layers of indirection. It's way less effort to just output some HTML, or just make a database query, than to learn a framework or ORM, and as a bonus it's usually faster too. Also, it's a lot easier to fix slow queries when there's one place that has the whole query (with placeholders where possible -- I may be crazy, but I'm not insane).

I'm currently at a company that is not using an ORM - what has happened is that developers have written endpoints with inconsistent data formatting for similar data types, which prevents the possibility of creating abstractions cleanly on the client. It would have been nice to have a lightweight ORM, if only for object consistency.

I've been at companies that did use ORMs, and still ended up with that kind of mess.

No tool is ever a substitute for discipline.

> No tool is ever a substitute for discipline.

My experience is exactly the opposite. Humans are too fallible for discipline to ever be reliable. If you want something done, automate it.

In my experience you're both right.

The enterprise wants everyone to be a jack of all trades and to understand everything. Part of the reason for the massive layering that takes place is so that things are compartmentalized and easier for an individual to pickup and put down as they're shuffled around from project to project. Standardization, documentation, consistency, and discipline are essential for enterprise developers to be able to quickly grok a project and make the necessary changes.

Automation also plays an important role because it allows less skilled and knowledgeable individuals of lower pay grades monitor the blinky lights and only call on the big boys when things fail.

ORMs let you structure your queries in a way that allows the app to understand them and analyze them, for sharding purposes. True, you can write SQL and then parse it yourself in a proxy, but why?

If your ORM allows you to drop down to SQL, which the parent claims all modern ORMs do, then you will still need that SQL parser in your ORM.

You can have a small subset of your queries in raw SQL and if you really need to shard them, you mark them up with structured metadata. But the rest would be automatically structured, analyzed and sharded.

Sure. So you can use the SQL parser from a mature, widely used library, or you can write your own. I know which option I would prefer, even if I was writing every single query by hand (not that I'd ever do that).

> It's way less effort to just output some HTML, or just make a database query, than to learn a framework or ORM, and as a bonus it's usually faster too. Also, it's a lot easier to fix slow queries when there's one place that has the whole query (with placeholders where possible -- I may be crazy, but I'm not insane).

Do you work in a team? Or do you plan to hand over that code to someone else for maintenance at some point in time? If not, then by all means, go ahead. Another advantage of ORM/web frameworks is that they are properly documented and only have to be learned once. I once had the displeasure of maintaining someone elses custom 'framework' with no documentation.

>Do you work in a team? Or do you plan to hand over that code to someone else for maintenance at some point in time? If not, then by all means, go ahead.

Did anything of what the parent wrote sounded like it was worse for teams?

Because toast0 knows what they wrote, but a team member wouldn't. So instead of googling "Flask add cookie to response" you have to start digging through all of toast0's code to see if they even have a way.

At least that's why I would consider working on a team like that much worse.

This is the first link for 'php send cookie' or 'php add cookie'


grep -r setcookie in my code will let you find out if i sent any cookies. Incidentally, this is the same grep I would have to do on a framework, but the framework (pick one) is probably bigger than all of my code.

As an example -- we used to have a WordPress blog; now we have a custom blog that's a total of 3628 lines of php that lives on the frontend and I included the Makefile for deployment and some extra include files cause I don't want to spend the time to consider which ones aren't needed. Content is from text files pushed with another process.

By comparison Wordpress includes 298,643 lines of php. Wouldn't you rather dig through all of my code than all of theirs? (Yes, Wordpress has a bunch of features -- but even when you turn them off, the code is there lurking, and sometimes it still runs)

Someone else from my team fixed up the mobile support and handles the css/javascript -- they were able to just look at the code and do what needed to be done. (The mobile page was broken in our install of Wordpress too, so I got it to parity anyway)

It took me about a week to write it and polish it (three days dedicated, including exporting the content from mysql, two days I was doing other things too), but on the plus side, I don't need to spend half a day to figure out how to deploy a Wordpress upgrade every time they have a security release; there have been 12 security releases since then, so I've earned a day of my life back.

>Because toast0 knows what they wrote, but a team member wouldn't.

They wouldn't know vanilla SQL, HTML, WSGL, etc, --decade old standards-- etc, but they would know some random framework like Flask?

"Custom" framework.. idiosyncratic code. Code that is probably not properly documented, not properly vetted, or maintained as technology evolves. It is not to say it can't be done (e.g. fb, google, ms), but the cost of doing so greatly outweighs the benefits. For example, security concerns of your "custom" framework and onboarding of new engineering into your team. And are you really in the business fo "custom" framework?

>"Custom" framework.. idiosyncratic code. Code that is probably not properly documented, not properly vetted, or maintained as technology evolves.

Well, Flask is 95% just two persons [1] and 5% among 10 others. And the one responsible for creating it and writing 60% of it doesn't even have it as his first priority (has multiple other libs, codes Go, etc).

A lot of widely used Javascript libs are not even that lucky. Heck, GTK+ itself, one of the pillars of Gnome and used by millions, was down to 1 maintainer a few years ago, who openly complained about it (not sure if things improved since).

Unless it's some large, focused project with bit traction, tons of books and courses on it, etc., like React, Angular etc (usually with some corporate backing too), a lot of the things that pass for serious frameworks and libraries are anything but, and a lot is hardly a "community" project either.

Some person slapped together something and thousands adopted it because "frameworks are better than writing your own code". Then the person moves on, or some small team that leads the project decides to rewrite it from scratch to try new stuff because they have short attention spans, and you're left with an abandoned code in your hands, which you haven't wrote yourself, and don't have time to understand fully.

A framework can be good to start with if you're doing stuff close to its confines, but otherwise its how Joe Armstrong described OOP: "You wanted a banana but what you got was a gorilla holding the banana and the entire jungle".

Notice how all major companies, from Facebook and Google, to Twitter and Microsoft and Apple etc have written their own frameworks.

Perhaps their programmers didn't know that already existing frameworks are better? Or were they focused on their actual problems and pain points, instead of lowest common denominator solutions?

[1] https://github.com/pallets/flask/graphs/contributors

But that doesn't really matter, because those frameworks have mindshare. If I have a specific issue that I don't know how to solve in Angular or Flask or whatever, I can search online, and chances are good that someone has run into the exact same issue and solved it. If I'm using someone else's custom-built framework, I have to either consult the original author, which may not be practical, or figure it out myself by digging through the code. For anything that needs to be maintained in the long run, I'd take something written using a popular framework, with all its flaws, over something written from scratch any day.

A purpose built application if built well will not only do the job more efficiently but also be easy to maintain. If you write exclusively attached to a framework it creates just as much work for someone who knows the underlying technologies but not the framework as it would for someone who knows a framework. You'll end up tying your application to that framework. Sure, say you want to quickly hire 20 coders to work on a project - use a framework (or be like Facebook, or Google and create your own...). I've found hiring framework developers is a good path to getting work only in that framework. Devs who are more versed in the underlying technologies have in my experience been more flexible and put out better products for the domain they're working in.

In some cases you are right. If all you need is a simple landing page, with a simple form or two, then by all means - do it.

However, if your application is more complex, you should be careful. Chances are, you will add your own layers. And chances are, they will suck more than some popular framework or ORM. It's not that you make a conscious decision, a plan to do so, usually it's bit by bit. And having seen all the custom frameworks and shitty ORMs (because it seemed easier at some point early in the project), don't write your own ORMs for the sake of the maintainers mental health if not your own.

> If you're not using an ORM, then you ultimately end up writing one.

I disagree with this. A lot of things people use ORMs for are rather easily solved with stored procedures, especially in Postgres where you can write stored procedures in Perl, Ruby, etc. Validations, “fat models”, etc are all managed with SQL easily (and this means you get that functionality from _anywhere you access the database_, not just from your framework with an ORM). For convenient access you can roll 30 lines of Perl to wrap DBI or whatever (I use Perl for most web backends these days) and call your stored procedures in normal syntax with a little metaprogramming.

Maybe this sort of scheme (heavy usage of stored procedures and offload almost everything to the DB) doesn't work for everyone, but I like databases and it works for me.

> A lot of things people use ORMs for are rather easily solved with stored procedures

More like 1 thing. ORMs are meant to make interfacing through the object/relational impedance mismatch easier and through the regular code in your application. Stored procedures do not come anywhere close to this and are usually the same as just calling any other SQL query as you would when not using an ORM.

If you think any majority of what an ORM is used for can be replaced by stored procedures, then you're not really using an ORM for much at all.

I agree with this, but I'd also add that if your ORM is doing a lot beyond the capabilities of your database, it's an indicator of a hacky design in your application. If there's a large object/relational impedance mismatch, then either the objects or the relational DB are a poor fit for the problem you're trying to solve, and an ORM can't really fix that.

If your ORM is just providing a mapping between select/map, where/filter, join/zip, etc., you have a fairly list-of-records-ish functional application and your objects are only nominally objects. The ORM is only providing an interface that lines your types up (and the benefits of that are not to be underestimated--it allows, for example, running your app on different database engines).

But if your ORM is doing more than that, it's usually because your objects have a lot of complicated graph vertices that are poorly represented by tables. ORMs can reduce the difficulty of this object/relational impedance mismatch, but ultimately they can't provide a general solution for it, because the data structure they're sitting on top of doesn't have the capability. ORMs can make your code simpler, but they can't make SQL performant on, for example, directed graph or deep tree structures. Ultimately, they only mitigate the object/relational impedance mismatch, they don't solve it.

Again, as I parenthesized earlier, the benefits of ORMs still aren't to be underestimated. But I think a lot of those benefits can be realized with a simple wrapper around SQL (i.e. LINQ). Beyond that, all ORMs provide is a little fudge factor which lets you get away with some things that SQL doesn't support well, but ultimately ORMs aren't a general solution to those kinds of problems.

> If there's a large object/relational impedance mismatch, then either the objects or the relational DB are a poor fit for the problem you're trying to solve.

Why? You're just making that assumption but the fact is that relational databases and the normalized storage of data is completely different from the way OO languages deal with rich nested objects. And there's nothing wrong with that mismatch because there will always be a mismatch. It's just 2 different paradigms of handling data. All an ORM is doing is giving you a tool to make that translation easier, if you want it. If you really can't have the mismatch then there are document-stores available but in most cases, the O/R mapping is just not a big deal.

> But I think a lot of those benefits can be realized with a simple wrapper around SQL (i.e. LINQ).

That's basically still an ORM. Again why the assumption that just because you have an ORM that anything and everything must be piped through it? The modern ones let you use ORM methods in your code and chain them with custom raw SQL too. ORM usage is on a spectrum, it's not binary and there's definitely no "right" way to use them.

I suspect you're viewing my previous comment as a criticism of ORMs. I tried to make it clear that it was not a criticism of ORMs. It's more a criticism of people who try use ORMs to whitewash a bad design or get around a problem that relational databases can't solve. Relational databases lend themselves to a very specific way of structuring data, and if you don't structure your data that way or your data can't be structured that way, an ORM won't fix it. That doesn't mean ORMs are bad, it means that ORMs have to be used as intended and often people don't use them as intended. That's not the fault of the ORM any more than it's the fault of a screwdriver when it's used to saw wood.

Nobody uses a screwdriver to saw wood. But they're great as replacement paint stirrers and hole punches.

That's my bad ORM misuse analogy.

As a maintenance programmer, it really fries my bacon when I find all the screwdrivers next to the old paint cans.

I think when people talk about using an ORM or not they're mostly talking about using a LINQ-like wrapper or not. There's maybe a conversation to be had about some kind of fancier features on top of your ORM and whether they're worth it, but I don't think those kind of features are what people usually mean when they talk about ORMs (whether they're technically what the acronym "should" mean or not).

> interfacing through the object/relational impedance mismatch

Do we really need the regular code with it's traditional OOP and fat application server to convert result set to JSON and spit it out - that most of the time is all that code is doing actually? I suppose these typical tasks might be pretty much covered not only by PostgREST [1], but with just a sweet combination of two PostgreSQL functions - array_to_json() and array_agg().

[1] http://postgrest.com/

> you're not really using an ORM for much at all.

The ORM can do transient fault recovery (including other cloud patterns) as well as modernise the interface (Micro-ORMs in general).

One problem with the stored procedure approach is that it scales terribly.

If your logic is in app servers and your state in a DB, you can add more app servers, and it will be a long time until your DB get overwhelmed.

When your DB is doing both, the choking point is much earlier.

Look, this is just wrong. All the best-performing research databases use stored procedures because they perform much better than other approaches. Your bottleneck when it comes to validating your data is, generally speaking, your database, so you're not going to scale better by moving the logic farther away from it (and introducing a bunch of round-trip latency) and this is empirically confirmed in benchmarks of stuff like TPC-C. It's a really tired argument, especially when virtually every aspect of stored procedures but their performance is irritating.

That stored procedures scale badly is different from if they perform badly.

With application logic on separate stateless app servers, you can quickly spin up as many of those as you need when your site goes viral. If it's in your DB, you can't.

If the stored procedures are fast or slow doesn't really enter into that equation. It's about the difference between 1 and N.

I actually did work at a startup that got into serious trouble because much of their application logic was in the DB.

Were you serving something that had to live in the database, or were you serving cached copies that didn't actually need to be updated or read within a transaction? There's a crucial difference there. The moment you are required to access the database at all, the equation changes (which is usually true for data validation, inserts, updates or deletes, and constraint management). If all you're doing is reading static data, there's no need to touch the database more than once and you can do much better. Moreover, if all you need is causal or eventual consistency (for application-specific reasons), you can use databases that provide those semantics. In pretty much all cases, once you've chosen the correct database / semantics for your use case, you'll be better off from a performance perspective with stored procedures; 1:N doesn't really enter into it because databases can run with their own isolated state on multiple nodes just as well as applications can (indeed, there are some caching systems built around this observation)!

IME, the win in the application layer is when you can have faster copies pulled from a cache or validate rules that don't relate to other records.

I've yet to see empirical proof that stored procedures don't scale well. If you write an application in your SQL dialect of choice, that likely won't scale well, but putting data access behind an API should scale well no matter if that API is in Rails, Spring, or a stored procedure.

I don't know what you consider "empirical proof", but it seems obvious to me. And should be obvious to anyone competent who has ever had to scale stuff. A system fails to scale when it has a bottleneck, and that bottleneck gets overwhelmed. You make it scale better by scaling the bottleneck, which can be done by moving work out of the bottleneck, or by parallelizing it in some way.

The natural bottleneck for any system that has to synchronize data is the locking around synchronizing that data. That is because things that do not need synchronization can easily be parallelized. You therefore scale that bottleneck until the more fundamental one emerges.

In a standard database driven website, that bottleneck is always in the database. And therefore your scaling limit is the capacity of your database. As follows normal scaling advice, you need to move work out of the database, or remove the database as a scaling limit.

Moving work from stored procedures to the application is an example of moving work out of the database. So is having queries run against read-only replicas instead of the read/write master. Sharding your database and moving to a distributed NoSQL architecture are examples of removing the bottleneck.

Of the two approaches, the much simpler and safer one is to move work out of the database. Going NoSQL is cool, but unless you really know what you're doing, it is both unlikely to buy you what you wanted, and leaves you open to obscure data consistency problems.

When it comes to data validation, none of what you just said applies, because you need to perform your work inside a transaction. You can validate the data outside your database and then confirm that nothing's changed (optimistic concurrency control) but you can do that just as easily inside the database, with lower latency and greater throughput (and in situations with lots of contention this can lead to many more aborts than other concurrency control mechanisms, so be careful!) because many databases have OCC built in. If you can afford to relax consistency due to aspects of your data model, you can use a database with a relaxed consistency model and--again--get far better performance than an ad-hoc solution in your application.

It's hugely unclear to me why you think you skirting transactional requirements by performing work in your application is less complex than using a NoSQL database (or using a database that utilizes MVCC or can otherwise provide long-lived read snapshots).

Frankly, I also disagree that for most websites the bottleneck is the database. For many websites, database latency / throughput constraints don't ever become the dominant factor in end-to-end requests because of all the layers they have to get through in order to get to the database in the first place, combined with a relatively low number of requests per second (commodity relational databases on commodity hardware can easily handle many thousands per second, and IIRC Google Search only had to handle 40k rps from real clients in a recent press release) and inefficient code elsewhere in the stack.

Just put transactions in the application. With row level locking, the odds of running into contention problems are low. If you've been careful to always lock tables in the same order, then deadlocks are a non-issue. At that point, having longer running transactions does not matter. What matters is contention for the internal locking mechanisms inside of the database. And limiting how much extraneous stuff you have improves that.

Now it is easy to screw up an application. It is easy to screw up a database design. It is easy to screw up queries and query plans. But all of those are fixable in relatively straightforward ways. And once you do that, you will wind up with database throughput as your bottleneck.

As for NoSQL, the problem is this. Moving to that architecture requires taking an up front hit on transactional complexity, usually requires several times the hardware (data needs to be stored multiple times for hardware failure), puts a lot of stress on your network and latency, and is really easy to screw up. Just using a popular out of the box solution is not enough - see https://aphyr.com/tags/jepsen for a list of real failure modes on stuff that will look perfectly fine in testing.

It is a necessary challenge to accept if you want to go beyond a certain scale. But you should not accept that challenge unless you have good reason to do so.

The thing you're missing is that unless your validation logic is really complicated or you have a really fast network (think Infiniband, which is still not commodity), it will take less time and fewer resources to perform the validation logic (once the locks are taken, which has to happen regardless) than it does to marshall the data over the network to your application for processing, often by orders of magnitude. That is why more or less everyone trying to win benchmarks at transaction processing uses stored procedures.

FWIW, there's lots of concrete evidence of this beyond the fact that benchmarks almost always use stored procedures. If you look at the very best-performing storage systems, like MICA or FaRM, which achieve 100 million tps or more, you'll see that a huge amount of their optimization comes from circumventing the OS networking stack, taking advantage of built-in queueing mechanisms in NICs, and offloading work to avoid taking up cache lines and cores from the processing CPUs. Many extremely recent database designs also take this approach (separating the transaction processors from the executors) for similar reasons, like Bohm, Deuteronomy, and the SAP HANA Scale-out Extension, as well as deterministic database systems like Calvin. Given that these are literally the best-performing database systems out there, I have a hard time accepting that reducing network latency and doing as much processing work as possible in the database isn't the correct way to achieve high performance.

Using stored procedures also allows for (in theory) analysis of the allowed transactions and conflicts between them, which can boost performance further (e.g., if you can guarantee that they can't conflict with each other ahead of time, or if you can reuse cached data because nothing could have changed, or you might be able to order data in epochs and guarantee that there are no deadlocks, or you might be able to periodically check to see if a record was particularly highly contended and rebalance or split, if commutativity is possible--all of these options and more have been explored in recent database research). It's often difficult for an application to guarantee this because several web servers may be contacting the database at once, and they don't all know what the other web servers are doing, and this is doubly true if you allow ad-hoc queries. Not knowing whether a transaction will finish quickly (as is the case if an application is allowed to hold locks) also greatly increases chances of contention and/or disconnection, which can lead to further issues.

To be doubly clear--I think stored procedures are a massive pain in the ass and there are lots of good reasons to do exactly what you're proposing above. But performance is not one of them.

I wrote https://news.ycombinator.com/item?id=11989138 while you were writing that, and addressed your point there.

Performance and scalability are very different things. Stored procedures are great for performance, but bad for scalability.

Please see my response there. It's not as simple as you're implying; often your throughput suffers if you can't handle requests fast enough. MICA, for instance, injects packets directly into L3 to avoid cache misses; it's important that MICA handle these requests as fast as possible because otherwise the L3 cache will fill up, the queues will start to back up, and ultimately they'll put backpressure on incoming packets and slow down the whole system (greatly decreasing throughput). The fact that it has tail latencies measured in microseconds is good on its own, but its real benefit is its effect on throughput!

I believe the poor performance of database is often caused by people who are new to the subject writing applications in such way that they do N+1 problem.

I think we're in agreement writing applications in the database isn't a good idea.

By "empirical proof", I mean that I have never seen anyone conclusively prove that switching data access from stored procedures to ad hoc SQL through an application was the cause of performance improvement. I've seen proof that removing complex application logic from the database fixed a problem, or that better indexing improved performance, but I've never seen anyone prove that removing stored procedures alone fixed the problem.

Further, there's nothing in your proposed solution of scaling out with readable replicas that precludes the use of stored procedures for data access. As a huge fan of both databases and separation of concerns, stored procedures make tremendous sense. They let a domain expert tune the database as needed. Even going so far as allowing that expert to re-write queries, provide optimizer hints, or even change the physical data model for better performance without ever changing application code.

Although I may or may not be competent, I have scaled "stuff" in a database. But I do appreciate you getting a vague ad hominem into the first sentence. Bravo.

The typical bottlenecks that I found were almost always IO - either through crappy storage, crappy indexing, or some combination of the two. Modern databases typically aren't bottlenecked by the lock manager.

I'd also agree that moving work out to a NoSQL database is particularly tricky. For three years I maintained the .NET Riak client and helped developers make better decisions when they were considering moving away from an RDMBS.

Scalability IS NOT performance.

I like to say that scalability is like a Mac truck. Good if you have to move a lot of stuff, but not necessarily the best tool for getting your groceries. Moving logic from stored procedures to the application adds latency and network traffic. This is never going to be good for how fast you process an individual request. However it can let your system handle more requests per second.

That's only true if there aren't resources taken up by your in-flight transactions, and if those resources don't exceed the cost of just running the logic in-place. Otherwise, you're not sacrificing latency for throughput, you're just sacrificing latency and throughput. Unless you have a very heavy application-level thing to do that doesn't require periodic new access to data, that's usually not the case, especially if you're trying to process requests at line rate (NIC queues will get exhausted just trying to hold all the pending requests in memory, and you'll start to suffer from lock manager contention, page buffer swapping, etc.).


But my experience with Oracle specifically is that handling connections is just general overhead, while the real failure modes have to do with overloading random latches somewhere deep inside of the system. I also found that logic in stored procedures caused us to hit limits faster than leaving it in the application.

The specific case I saw this in was a simple set of queries to test if you were in an A/B test, and if not to assign you to a random variant. After I left my application logic was moved into a complex stored procedure, and they then had scalability limitations that they didn't have before. For political reasons they declared success and ran fewer A/B tests...

(I've used a lot of other databases as well, but Oracle is the only one I've really pushed to its scalability limits.)

The database is usually the bottleneck because of IO, not because of load on the actual database in my experience. Assuming you know how to design and index a database.

That's a valid point. Seeing as I don't build anything that has to scale well I can't really comment on this, but I don't think it scales too badly. With stored procedures it's more or less just your model in the database, not your whole application logic. Using a database to replace your controllers wouldn't scale so well, but replacing fat models would still scale fine.

It significantly depends on the application, but it shouldn't be more than a factor of 5-10 or so...

A factor of 5-10 can matter. A lot.

Many applications don't need Facebook/Twitter scale and will never have this problem.

How do you manage version-control on stored procedures? Can they be checked in with the rest of the application logic?

That's the way I've done things in the past -- save stored procs in a file in a project repo, along with deploy and rollback scripts.

In Microsoft land we have SSDT and stored procedures live in source control right next to the schema and any CLR functions. The whole thing is built, compile-time checked, and deployed. It's all pretty wonderful.

I myself am wary of stored procedures except in very specific and uncommon circumstances. That said, you could absolutely version control your stored procedures by creating them from within database migration files that are version controlled by default.

What is the source of the wariness?

Don't know about the parent, but I always felt that stored procedures end up by incorporating a good share of the business logic, extracting it from the main code of the application. This creates a messy situation in which you have your business logic split up between two completely separate and different systems, one of which (the stored procedures) is much harder to read, write, maintain and test. That said, I also deeply hate ORMs, I much prefer to use thinner layer to interface with a database (query builders and row mappers).

That is normally considered a good thing other wise different applications accessing the same data have to roll there own - do you really want multiple versions of biz logic.

eg a large organizations like a bit telco may have multiple applications that update customer records.

Should they do that through a single service api instead of directly hitting the database?

Yeah, the proper way to be doing that is via a middleware api that can be maintained and versioned by it's own team, rather than coupling all these applications to a very specific database. If you have 4 apps talking to the same oracle database, and you decide to ditch the license, that's going to be way harder to deal with than if you just had one api being maintained by a team who's job is to know databases.

Midleware is the normal term

I also do think putting too much logic in stored procs is a danger, and was/is a scourge for some shops. But was curious if that is the primary objection.

I try to avoid anything but mapping or querying data in sprocs, but sometimes they are a better place to put logic that is needed to ensure consistency. Your database will outlive your application, or you will eventually need to integrate data from a third party system - as a result your database engine is always the last line of defense to protect your data, and if that means using a sproc than so be it. I'll admit, I've written some business logic into stored functions in PostgreSQL, but it's mostly a background job to update cached data because it is much faster to do it in the database than pulling a bunch of records down to a service and sending them back up.

It's fat models vs thin models, just with fat database vs thin database. You have to be careful.

I am working on an app where the original developer has followed a "thin models fat controllers philosophy". I have never seen this referred to as a good thing anywhere else (and Django works far better the other way around). Is there any information on why this might be a good idea / benefits to this approach?

How about the fact that in mysql you need root access to add / drop stored procedures? So my app's installer requires the regular user to have admin credentials to mysql instead of just creating a database for them?

There is hardly any reason to use stored procedures anyway, when your favorite mysql library supports multiquery.

Yes, there are tools to do this. It varies by database server and may need custom integration with your deployment system but they definitely exist.

The alternative is the old StoredProc_v1, StoredProc_v2 etc in the database and have the application check what's available. This allows you to roll the parts independently.

The big advantage with stored procedures is they can benefit from the DBMS caching and applying additional optimisations (depending on the DBMS of course).

Yes. Data warehousing projects I have worked on have lots of stored procedures in with the code.

One way that works is make sure you make a simultaneous tag of your code and stored procedure that works with it. You really want development to run hand in hand with stored procedure development, not as two parallel processes.

I have found Sqitch (http://sqitch.org) to be a useful solution.

> 30 lines of Perl to wrap DBI

In Java it's even easier with JPA @NamedStoredProcedureQuery annotations


But once you have run the stored proc how do you display it to the user? How do you get the data from the UI to the stored proc to execute? You write some code or it happens by magic? If you write some code, then you have just written an ORM.

It's overly reductive to consider any code that interacts with a database an ORM. Broadly speaking, the big claims ORMs have made are 1) the ability to automatically map between an object model and a relational data model and 2) portability between database vendors. #1 falls apart when the object model becomes complex enough to suffer from object-relational impedance mismatch. #2 may work for smaller or simpler applications, but you're generally going to have to take advantage of vendor-specific database features to achieve optimal performance with applications that have anything more than trivial scale or performance requirements. Going with stored procedures generally means giving up on both #1 and #2 in exchange for a more flexible design (relational model can be very different from your object model) and higher performance (at the cost of having to hand-write all your own SQL, of course).

> If you write some code, then you have just written an ORM.

What? No, I've written code to query a database. Not an Object-Relational Mapper.

  my $q = $db->prepare("get_my_stuff_by_name(?);");
or with fancy metaprogramming:

  use DBMagicStuff 'postgres://…';
That's not really an ORM.

You are both partially right. An ORM deals with the black magic of connections, parameters, transactions, etc. If you don't use an ORM, then you still have to deal with those semantics. But I do agree that you aren't writing an ORM.

> connections, parameters

The database driver takes care of that, not the ORM.

(It's also not black magic.)

> transactions

Huh? That's a database feature. BEGIN/COMMIT/ROLLBACK are ANSI SQL.

ORM is an _Object Relational Mapper_. You're confusing it with a database driver (or, in the case of transactions, just a database in general).

In Go I only need to annotate the column name of the struct variable to have it load automagically from DB using just SQL and a little bit of tooling (sqlx). 95% of the convenience of an ORM, none of the problems.

That's because sqlx (and the go foundation) are already a basic ORM.

You could argue that anything that interfaces with a database is an ORM, but it is certainly no ORM in the Hibernate/ActiveRecord/EntityFramework sense.

It does very little "magic" and you're in full control of the SQL from the start.

Sure, let's say an ORM is anything capable of turning normal app code into SQL statements by itself.

What do ORMs do that's magic? What are the problems? Are you not in full control of them? It's your code after all that's using them.

I really don't get how they suddenly force any issues on you that you don't create yourself. The output SQL doesn't really matter if it gets the job done, and in cases it does you still have full control to write it yourself, and even use the same ORM to save you time in executing that.

>What do ORMs do that's magic? What are the problems? Are you not in full control of them? It's your code after all that's using them.


Thanks for these questions. Initially it wasn't clear to me what the difference was either, but now I understand the distinction a little more.

The ORMs discussed in the article map a whole object model to a whole relational model. The idea is that a single piece of primitive data, say a bool flag on an entity, will have one location in the relational model, and one location in the object model, and the mapper's job is to get the data from one to the other and back again.

Hibernate will guarantee things about the object model, for example within a particular session, the entity A which maps to a record with id 1 in the database will be represented by the same object instance, so even if A is referenced indirectly through two different routes, eg, obj.foo.a and obj.bar.a, it would reference the same instance of A. The instance will only be fetched once, can be mutated through either route, and then saved back to the database using the mapping.

That's the sort of thing which is possible with a single mapping, but the abstraction breaks down outside that mapping. For example, questions like this on StackOverflow:


He wants to select part of a database record, not the whole record. The mapping will only be defined for whole entities, so he has to use the complex projection syntax instead of a normal fetch. The top answer suggests mapping the data into a User object anyway, which will result in his code having some User objects will all their properties set, and some with just a subset.

This puts pressure on the programmer to stick to the single mapping. Partial fetches like this could lead to an object which represents a particular user which may or may not still be the same instance, and it may or may not support being saved back into the database.

> Are you not in full control of them?

So, yes, you are in full control, but the benefits come when there is a single mapping between the relational model and the object model. In practice, there isn't a single object model, let alone a single mapping.

The issue with it is that SQL is a bit different kind of language. When using it we don't tell the database what to do, but instead we say what we want. That does not translate well.

Second issue is that tables and their relations don't map too well to objects and their relations.

> If you're not using an ORM, then you ultimately end up writing one.

Only if you assume that Object Relational Mapping is the only way to express type checked database access at the application layer.

ORMs are broken by design; there's no lossless bridge between relational theory and OO -- only inconsistent, error-prone, complex approximations.

Instead of bringing relational theory to the programming language, ORMs try to emulate OO behavior/method dispatch/etc in a world of transactions, deadlocks requiring retry of previous operations, on-disk serialization of data, etc.

In 2016, it's ORMs that are outdated.

This. Ye Gods, this.

A project with nontrivial data relationships should start with the data model, not with a code model superimposed on data. ORM leads to bad data models and performance problems that are unfixable.

If only people would listen to this advice. The number of times I have started a project by trying to think about the data model only to get told "just build the models, we can change it later" drives me insane. Code is easy to change, your data model is not once it is in production.

Only if you use a single, concrete data model. If your domain supports the use of something like event sourcing, your "data model" is surprisingly easy to change. Or in this case, is = are, and data model = data models, and they're actually derived from your domain events, which can be replayed. At any time.

testify Brother/Sister Testify

Yep, for years my bias has been trending to the relational method of modeling data rather than the object oriented way.

The relational model is more flexible -- you can slice data up in any number of ways, rather than through a fixed hierarchy.

The relational model was developed by Date/Codd to address the needless complexity and brittle nature applications built around network or hierarchical databases of the 60s/70s, which are in many ways analogous to today's OO relationships.

More like relational databases are outdated. There was one truly committed effort to bring the database model into the programming language - EJB - and there's a reason it's now a curse word. The successful systems of the past ten years have been those that moved away from the relational model, using simple datastores controlled by application code.

I'd bet that there is more code written in the past 10 years that uses SQL, than that which uses "simple datastores". You don't hear it much, because it just works, all the problems are well known (and so are the various solutions & workarounds), and so it's boring. Whereas, "Company X has adopted No-SQL solution Y for its project Z! That means it works!" makes headlines.

Probably true as far as it goes. But I think that SQL is a smaller percentage of systems now than it was 10 years ago, and that the trend continues to be in that direction.

Only if you believe type theory and typed programming languages are outdated.

I'm not impressed with NoSQL; they have no answer to the limitations of relational theory beyond simply abandoning formalism entirely.

I believe type theory and typed programming languages are very valuable. But if you keep your logic in code and your store as a dumb store then the store doesn't need to know about them.

Marshalling support is useful. SQL result -> struct and struct -> SQL insert code is repetitive to write. Getting fancier than that may be overkill.

Marshalling in general needs more compile-time support. Kludges such as Google protocol buffer preprocessors are a fast but clunky way to do it. It would be useful if languages could be given a reference to an SQL CREATE TABLE and could use that information usefully. Field names, type information, and enum values should come from the CREATE TABLE info.

This is the sweet spot of ORMs. A team lead early in my career wrote a tool that read the database schema and generated all the marshaling code at the application layer. We got a degree of type safety and relief from a lot of boilerplate code but no promises of automatically generating SQL or abstracting away the database schema, etc. It worked so well that I wrote one from scratch in another language a decade later for another database-backed application.

We released a tool, xo, that generates safe and idiomatic Go code from a database schema that does not make any use of an ORM. I have been meaning to add support for generating code for other languages but have not yet had time to do that. That said, it is very easy and very straightforward to do so.

Check it out: https://github.com/knq/xo

xo is great, we've used it a few times for existing databases, great timesaver. Thanks!

Despite not being a huge Go fan myself, I do appreciate this aspect of Go's database/sql and sqlx packages.

One of the problems with learning a web framework is that there are a lot of them, and the effort it takes to find out which one best serves your needs can seem to be on par with rolling your own, without the assurance that you'll actually get your needs met. And I suppose the ORM issue faces the same decision-making problem.

Hand-rolled crap is at least crap that was rolled to suit the problem you're facing, and you'll do it using tools you understand. You're not learning "any" framework, you're learning a bit of them all and hoping that's enough to point you in the right direction. Sometimes it's not, and then you're stuck with a hammer for screws and someone else's code that you don't understand.

I see learning a framework as less about trying to get all the things you need, and more about 'borrowing' someone elses architectural thinking. When you encounter something the framework doesn't provide, you probably have all the architectural tools available to you to create something good that fits in well with the rest of the framework. Also, when you need to move to another framework, you have a base of experience you can draw from to help you learn much faster.

> An ORM provides type checking at your application layer

This. When composing complex queries, we really want type checking and SQL injection safety.

Don't parameterized queries provide all the safety one might need?

Parameterized queries are to prevent injection. They don't tell you, at compile time, when you're building an invalid/unrepresentable SQL query.

> They don't tell you, at compile time, when you're building an invalid/unrepresentable SQL query.

Sure they do, with a proper library. Haskell's persistent library does this very well.

The Persistent library is an ORM. It provides syntax and safety for sql. For example

    delete $ from $ \t -> do
             where_ $ (t ^. TutorialAuthor) ==. 
                      (sub_select $ from $ \a -> do
                                    where_ (a ^. AuthorEmail ==. val "anne@example.com")
                                    return (a ^. AuthorId)){-/hi-}
    tuts <- select $ from $ \t -> do
            where_ (t ^. TutorialSchool !=. val True)
            return (t ^. TutorialTitle)
looks a lot like something you might get with a good ORM. It might be more general and slightly different, but haskell usually does things slightly different.

Looks like Persistent has no support for table joins...bit of a show stopper that, but there's a companion library[1] which looks quite powerful (similar to Scala's Slick wrt to type safety and composability).

Can't say I care for the syntax much (`^.`, `where_`, etc.) but I guess that's par for the course in Haskell given the open world (i.e. module system leaves something to be desired relative to OCaml and SML).

[1] http://www.yesodweb.com/book/sql-joins#sql-joins_esqueleto

Haskell doesn't have OO, so it can hardly be an ORM.

The library in question is modeling relational theory at the language/type level. That's not remotely like an ORM.

Haskell does OO just fine. It doesn't do nominal subtyping, but that's really a misfeature in OO anyway.

That said, persistent doesn't do OO.

Data encapsulation (e.g. via ADTs) does not equal "OO".

I would argue that OO is polymorphism + innate object identity. Though the latter only matters if you have mutability; but then, whether OO without mutability is "true OO" is one of those deeply philosophical questions.

I find the dividing line in formalism; we can always find similarities with language-level features from OO, but OO is by definition an informal set of ideas about how programming is best done and understood, rooted deeply in empiricism, not formal theory and analysis.

If I can model your type/language model as a higher-kinded dependently typed lambda calculus (e.g. using Boehm-Berarducci encoding for non-turing complete recursive ADT definitions), then it's not OO -- it's functional programming, rediscovered.

Existential types are kind of like OO interfaces (allow storing a collection of things that only have in common that they implement the interface), and typeable can be used for casting them back to their base types.

Yes, but that library is using Haskell functions to compose SQL strings at run-time. I should have clarified that by interpolation I meant manually writing the string you send to the DBMS.

No, but your unit tests should catch bad queries and then you're good to go.

You can use that argument against any kind of static analysis/checking.

Unfortunately, ORMs sometimes undermine safety for the sake of convenience. Even ORMs which use parameterized queries (which any sane framework will) may be vulnerable if they build dynamic queries using string concatenation.

No ORM will pass queries using string concatenation? (Right? I know nothing about ORMs written in PHP by beginners that don't know SQL if it jumped up and bit them in the ass... But surely no half-decent ORM would concatenate strings to pass arguments?)

Anyway. Type safe queries like QueryDSL is extremely nice to work with.

But as was mentioned, it all boils down to this: There IS NO silver bullet.

You have to learn SQL, and then the ORM tool. And the abstractions will leak, and you will be pissed of sometimes, but It Is Worth It because you will save a lot of development time.

There are some pain points. Large joins where you'd need to eager-fetch a few one-to-many "leaves" at the end of a huge and complex join is a bit of a pain, as the ORM will need to split the joins for efficiency, and there is no obvious way of reusing the complex part between the calls available to the user of the ORM.

(Like how you could use a temporary table when using Oracle for instance. Nothing should stop a ORM to use that as a join-strategy though, when I come to think of it...)

The other obstruction is mindset. To use a ORM efficiently, the developer need to step away from the data-layer model.

There is no separate data access layer when working with persistent objects. The objects represent the model, and are simply persistent. If they are changed, the change stays.

Preferably, the model should be available to the whole application, and the objects should be changed and used where it is suitable, not restricting access based on that they will trigger a database access. (The important thing should be to keep the model and it's rules together - in some sort of abstraction, not that some things happen to write to a database.)

> But surely no half-decent ORM would concatenate strings to pass arguments?

I wish... Drupal had an issue in ORM itself. https://www.drupal.org/SA-CORE-2014-005

>And the abstractions will leak, and you will be pissed of sometimes, but It Is Worth It because you will save a lot of development time.

Only on some languages. As I mentioned on another comment, on Go I'm very productive using simple database/sql + sqlx. On C# I could die writing mapping boilerplate before getting any business logic done.

Does database/sql check syntax at compile time? Or do you just try to unit-test you out of that?

No, but even though Entity Framework (as an example) does check types at compile time it doesn't prevent you from making mistakes that will cause runtime errors (oh the number of times I have run into issues trying to do simple things like ThingsToCheck(x => x.Created <= DateTime.Now.AddHours(-1)) which you would think works, after all it compiles, but it will just throw a runtime error because LINQ to Entities doesn't know how to parse the expression tree).

Type-safe ORM's don't preclude integration testing, end of story.

No ORM is building query bits from user input with string concat.

I don't need a full O/RM to provide query composition, type checking and protection against sql injection. SQLAlchemy core provides all of these (you can use the ORM if you really want, and I have done so, but when I am pulling 16K records in the whole object mapping layer becomes expensive when a simple tuple works fine).

The problem is that doing queries over relational databases in a type-safe way requires either 1) some form of structural typing (for projections), or 2) ignoring projections altogether, which brings a slew of perf problems to the table.

Ironically, this means that OCaml is probably the best [reasonably popular] language to use an ORM in.

> 2. If you're not using an ORM, then you ultimately end up writing one.

There are other ways, like when using "event sourcing", or servers like postgrest [0] that give you a REST-api on top of your database.

ORMs are mainly useful if you have objects to create, read, update and delete.

[0] http://postgrest.com/

>If you're not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living.

I don't understand why people keep repeating this myth. I do not use an ORM. I did not write one, or anything resembling one. I have absolutely no problems with accessing my database from my applications.

Same here and after heavily testing, the old school simple approach is much faster. ORM, you never know how many lines of code are involved.

It's 2016, and this article from 2014 has very valid points.

Maybe read the article first? The headline isn't a complete summary. You wouldn't have written point 2-3 otherwise - have you read what he wrote about DB schema in the article?

> 2. If you're not using an ORM, then you ultimately end up writing one.

Why? That presumes that you are using objects, doesn't it?

I like to say "ORMs make easy thing easier and hard things harder".

What i mean by that is, any simple CRUD operations are much easier in ORM. The hard things, i mean any complex queries that need more than one join you are probably better of writing yourself.

In the end i prefer to do inserts, updates and deletes with ORM (or some other database abstraction tools) but most SELECTs i write myself, fetching exactly what i need and mapping result to objects if needed manually.

I pretty much do the same. The CUD is ORM, and some simple selects in C# Linq, but as soon as the query is a bit more complicated I crack out the SQL. It does not have to be all or nothing.

Do uou use metadata from the db driver to do type mappings?

Not sure what you asking about. Do you mean if I define database structure in my code? The answer is: yes. I don't know about any database abstraction library that would provide you with convenience methods to query database with type checking and stuff without giving it the schema.

Not this again. Why is this coming up at all in 2016? There isn't even a valid debate here.

ORMs are a tool, that's it. The relational operations of SQL and the object-oriented (or functional) logic of your application code are usually very different and it's nice to have a mapper that lets you interact with your app's language while it takes cares of automatically mapping it to SQL.

For 99% of database ops where it's pulling some stuff out, editing some fields and then saving it back, an ORM will save you a massive amount of time with better safety, security and performance. Even complicated queries/mappings/procedures run pretty well and you can always write raw SQL when you need it. Modern ORMs will even let you execute custom SQL and still give you conveniences like easy parameterization and hydrating the results back as objects.

Maybe I've been spoiled with the .NET ecosystem with great ORMs like NHibernate, EntityFramework, and Dapper (along with C# features like Linq) but it seems like most of these complaints are from people who use shitty ORMs or can't comprehend that it's just a tool that they can choose to use or skip, at very granular method by method level. There's absolutely no reason for any extreme here.

> ORMs are a tool, that's it. The relational operations of SQL and the object-oriented (or functional) logic of your application code are usually very different and it's nice to have a mapper that lets you interact with your app's language while it takes cares of automatically mapping it to SQL.

Functional and relational models work really well together.

At Standard Chartered we even went so far as to add relations as a datatype to our Haskell-like language. It's a charm; and comparable for me to my experience first going from C-style arrays only to eg Python's dicts.

Only that this time the dict-style data organization was the `before'. Dicts are essentially equivalent to hierarchical databases, a model whose flaws relational databases were invented to address.

> At Standard Chartered we even went so far as to add relations as a datatype to our Haskell-like language. It's a charm; and comparable for me to my experience first going from C-style arrays only to eg Python's dicts.

Have you written about this? It sounds really interesting.

I haven't written about this yet, alas. There's a bit of information at https://www.reddit.com/r/haskell/comments/2u0380/slides_from...

Could you provide some examples?

Sorry, lacking the time for that. But basically all the advantages of relational over hierarchical databases apply for relations vs dicts as datastructures.

My experience with hibernate, entity framework, nhibernate, and lastly dapper has basically left me thinking Dapper is all you'll ever want. I can't imagine a use case where I'd rather opt for NHibernate or EF at the moment.

(I might add, Dapper in combination with C#6 even gives me enough type saftey to be happy. String interpolation and the nameof operator complements dappers DTO approach nicely)

Since you seem to have similar experiences Im curious. When would you ditch dapper for the others?

> basically left me thinking Dapper is all you'll ever want

Same here. And I don't know anybody who has worked on large enterprise EF systems that hasn't come to roughly the same conclusion. The type safety EF offers is extremely nice to have, no doubt, but the problem is it makes it so easy to create performance problems that _every_ system ends up with them. Probably half of my billable hours in the last few years has been addressing EF performance problems.

Dapper, with a few extensions, can give you type safety for 80% of your typical queries, and the rest can easily be done in stored procedures or with (my preference) Dapper's SqlBuilder. And using SSDT instead of EF code-first, you can easily and in a version-controlled way manage your schema, views, stored procedures, indices, etc., simplifying performance management and getting static analysis in your sql while still not losing straightforward and configurable migrations.

Type safety (no strings) and just as fast as Dapper (actually faster in some cases):


Do you use a Dapper extension for populating and persisting entities? (Last I looked at it as I recall this wasn't default functionality.)

Dapper does have very low-level basic functionality to insert or update entities. You need to write the SQL yourself so this gets very painful very quickly for complicated models where you want to persist child objects.

That someone would say "Dapper is all they need" leads me to think they only work on very small projects. You will drown in large projects if you use Dapper everywhere. Better to use a high level ORM like EntityFramework and sprinkle in some Dapper in performance critical code.

The only time EF creates performance problems is when the developer doesn't understand the concepts behind SQL. This isn't an ORM problem, it's a training problem.

As I see it, if your model is trivially mapped, you could probably go with something like ormlite on top of dapper. But if you need to custom map you model anyway you might just as well write SQL to map it as some custom dsl mapping thing for the ORM in question.

And, I don't use dapper for performance, I actually like its transparency and friction free interface to the db. Let's me get things done without workarounds. And most importantly, lets me use ssms with a repl flow, copying sql verbatim between vs and ssms.

Just once I'd like to see someone write an opinion about ORMs without resorting to sweeping generalizations ("Vietnam of computer science") or making assumptions about how they are being used ("it wasn't a good fit for my use case so therefore it must never be").

Maybe, just maybe, it's possible that ORMs are useful for solving certain types of problems and less suitable for other types. If you work on web stuff or applications that are report-oriented where most of the time you're just fetching data (possibly with complex queries) and rendering it to display, then maybe ORMs aren't a good fit.

On the other hand if you work on client-side apps where your objects are backed by a database but are otherwise long-lived, then sometimes the other features (beyond SQL generation) that ORMs provide come in handy (tracking units of work over the object graph, maintaining an identity map for consistent access to objects, and providing change notifications when an object or collection of objects is manipulated).

If you've never needed any of these features that's perfectly fine. I've never to needed to use a bulldozer either. But I'm not going around declaring bulldozers useless just because I've only ever needed to use a shovel.

Is this not the general sentiment of any piece of software? It was implemented to serve a purpose, if that purpose is not sufficing a goal of yours it is not relevant to you.

Indeed. It would be nice though, if people, having realized that a particular tool isn't useful for their particular goals, could still acknowledge that it could be useful in other situations, instead of blaming the tool itself.

I think ORMs are a great tool to get something off the ground quickly. Like with most tools you will hit a point where they make things more difficult and then it's probably time to switch to SQL only or mix SQL with ORM especially for performance critical queries. In most applications I have seen the ORM provided a lot of value but there were cases where it needed to be augmented with raw SQL.

I never understand why people want one or the other exclusively. Both have their place.

I think stored procedures would be very useful if they integrated better with source control and the app code. Maybe we need an ORM for stored procedures that automatically creates stored procedures from the project code?

I completely agree.

90% of the queries in my app are no more complex than selecting from a table with a simple condition. I definitely find

  users = User.where(has_foo: true).limit(10)
to be a lot more readable than

  rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")
  users = rows.map { |row| User.build(row) }
(And that's an example with no user-provided input)

Likewise, any app of sufficient size seems to end up with a handful of queries that really are a pain (or impossible) to cram into an ORM. Trying to do so would result in an unreadable mess, and using raw sql improves the situation immensely.

Here's the thing, anyone who knows SQL will find the second one readable, and only Ruby programmers who have used ActiveRecord will know how the first one does.

I've never written a line of Ruby in my life and I know what that code does, because it's essentially identical to what you'd write in C# LINQ to query with Entity Framework, LINQ2SQL, etc.

    var users = Users.Where(u => u.HasFoo).Take(10);

Unfortunately that argument works both ways, and we are saddled with a majority of mediocre developers who know an ORM but can barely read trivial SQL and can't really write it at all.

> and only Ruby programmers who have used ActiveRecord will know how the first one does.

do you really think the first one is /that/ hard to understand?

In my experience, programmers are often very fuzzy on all of the types involved in such libraries. I don't even mean "strong typing", but just the operations that they can perform with the various chunks of that expression. This is often complicated by the fact that the library itself often has bizarre limitations that have more to do with its internal implementation and limitations than with SQL. It's rare for me to see anyone use a library like that with fluency. Which seems to me to be a valid way of saying that, yes, in practice, it does seem to be harder to understand the first version in Ruby than you might initially think. Using it isn't particularly harder, but understanding it is.

Also note this is an observation of mine, not a logical argument, so trying to logically argue about why it shouldn't be harder would be arguing a point I'm not making. As much as I don't like people bashing strings together programmatically to generate SQL queries due to the ease of screwing it up, I observe that a lot more programmers are capable of this (even if they screw up the security) than seem to understand how to use things like ActiveRecord equally fluently. YMMV.

Really? connection.exec_, .map, .build – there's a lot more non-SQL going on in the second example than the first. The first may be syntactically far from SQL, but the use of familiar vocabulary makes it pretty understandable from a SQL point-of-view.

And what are the connection and allocation semantics of the first? (Yes, the second way is more explicit.)

What's your concern here?

Readability by maintenance programmers? How many Rails programmers will not know ActiveRecord? How many are going to be better at maintaining SQL than they are at maintaining ActiveRecord queries?

Readability by business people who don't know any programming languages? The first is a lot clearer than the second IMO.

Not true. They're both readable, but trying to do anything more interesting becomes painful, and this is where having a common tool for the job helps.

It's like regexps - imagine if there were 30-40 different implementations instead of 2 or 3.

For me thats the opposite. I prefer "rows = connection.exec_query("SELECT * FROM users WHERE has_foo=true LIMIT 10")", because i can read and understand the query. By reaing it i can already tell if it is a good query etc. while "User.where(has_foo: true).limit(10)" tells me nothing.

Read and understand, yes, the first is slightly simpler. Modify? Now I want to add an 'and' to the where clause. What's easier to figure out how to modify, for a random developer? I'd contend the latter, and that that is true for any modification you have to make.

In this example, it seems like it could be an issue that you take everything and then only use the first 10, instead of only taking the first 10 to begin with. Is there a way to not make it take them all, like putting the limit parameter in where()?

Rails doesn't execute the query until you actually use it in some way, by which point it knows you've added a limit clause.

No professional developer should EVER use Select * in working code.

At least it is an obvious fix. :) The AR example/solution would tend to just assume you want all the fields. To those naive of how AR behaves, it isn't obvious.

It's quite trivial to keep stored procedure code in source control with the app code or in its own repo. I am always mystified when I hear the complaint that this is difficult because in my experience it's no more difficult than managing any other code in a SCM repo.

My biggest concern has always been around zero downtime deployment with stored procedures. Your database has to work with both the old and new versions of web code during the deploy in case of rollback to make that process work and that always struck me as the weird edge case that makes things tough. If a procedure never changes or remains backward compatible then it should still be fine though.

Give the new proc a different name (e.g. append a version number). On the n+1 deployment, clean up the old proc.

And so you've reinvented the version control system on top of your deployment system, on top of your version control system.

That also sounds fun if you use the proc from more than one location...

Zero downtime deployment with stored procedures should not be confused with "version control".

It's not a version control scheme; once deployed the procs are never updated.

Yea, I suppose backwards compatibility is really the best policy there. If a proc is going to be updated to be more efficient it's one thing. If it's going to change what it does then we just need a new proc.

Yep. (if there are schema, changes too, zero downtime & backward compatability is difficult to pull off, but thats a chore for any strategy )

In my limited experience with the enterprise world, the problem with using stored procedures was that the devs had no control over the database. We could have come up with some nicer solutions to certain problems, but no one wanted to deal with the bureaucracy necessary to create and maintain parts of our applications within the database.

Even if devs had control, it's difficult to know which version is actually on the database without pulling it up (this is a timesink), especially if you have multiple environments/configurations.

AFAIK there's no way to diff/history of stored procs in the database (and certainly not against your VCS), so large companies usually do comment blocks at the top of each one.

> AFAIK there's no way to diff/history of stored procs in the database (and certainly not against your VCS), so large companies usually do comment blocks at the top of each one.

Sqitch, go check it out. Versioning database code with linear migrations always has this issue, you add a column and where's your diff on that without going through your list of migrations. Stored procedures are no different, and the sooner people use better tools the happier they will be with maintaining their database migrations.

Also Java and C# developers often produce bad SQL procedures because they tend to think procedurally about problems.

I think if they were better integrated with the IDE and easier to debug then they would probably write better stored procedures.

And easier to unit-test, and profile, and had a better standard library available, and...

SQL is a terrible general-purpose programming language by the standards of today, which makes it a terrible way to express business logic.

As a C# dev myself, fuck stored procedures, to be quite honest.

Please elaborate

1.They are a pain in the ass to keep track of in version control. 2. They are hard to debug, especially when lots of business logic gets dumped to them

Totally agree. It's hard to do code and stored procedures right with the same level of quality. The tools are different and the thinking is different.

What would be the better thought process for writing sql procedures? It seems like PL/SQL is fairly procedural.

The rule of thumb would be, use SQL statements to model and execute business logic, not cursors/variables. While the execution is still procedural, the goal is to make all the decisions 'at once' rather than explicitly looping. Behind the scenes a similar set of steps are taken, but the dbms is free to perform the operations as it sees fit.

This is a great point.

In a smaller company, those barriers aren't really meaningful. In an enterprise, you're easily adding a week or more to change control process to ship.

I think another factor towards why database focused solutions aren't popular in small companies is that MySQL historically hasn't been the best platform for that approach, and you need more expertise to scale the database.

It's not only easy to manage stored procs separately, it has huge advantages: you can play with your SQL in an IDE built for the task with luxuries like real-time error highlighting, profiling, debugging, and auto-complete, and in some cases you can even iterate on your code without restarting your application (just update the proc).

I've come to love jOOQ, the Java library that I can write type-safe pure SQL in (and with code that is immediately understandable to anyone who knows SQL). It provides mechanisms to smooth the clash of Java/SQL worlds, but otherwise never assumes it is smarter than the wisdom accumulated through decades of database usage. I've come to appreciate many aspects of databases again that tools like Hibernate try to hide from me.


I do still use Hibernate, probably because my usual framework of choice makes it so easy to, but anything above medium complexity goes through jOOQ nowadays.

I was intrigued by jOOQ, but the huge amount of generated code for internal metadata of the DB was a bit offputting. I don't want 4MB of generated classes for 3 very small tables.

Use the code generator's <inputSchema> or <excludes> flag or other means and you'll only get those 3 small tables generated...

Ah, thank you, this does the trick; I first tried to use my database name as inputSchema, but I had to use "public" in PG.

I see, yes. The database corresponds to jOOQ's Catalog, not the Schema (except in MySQL, which mixes up these concepts)

The thing I love about ActiveRecord is that it makes it easy to, anywhere you want, and at any level of the abstraction stack you want, to just toss in SQL fragments. This gets you the best of both worlds, the fluidity of being able to just define methods on model objects, and the ability to utilize database tech to the fullest. You can take any query and call .to_sql on it and it shows you exactly what it's passing to the database. I prefer Sequel's semantics but ActiveRecord is an excellent workhorse that I have no problems relying on.

But how do you do:

      (SELECT COUNT(1) FROM comments WHERE post_id = posts.id) AS comments_count
    FROM posts;
In ActiveRecord, without 1+N queries, or caching comments_count in a column somewhere?

Admittedly, that was not the best example. The last time I need something more intertwined than a simple COUNT in subquery, the answer was "give up and just use Arel." But at this point it is no longer quite ActiveRecord, but rather a SQL without strings.

This is my biggest gripe against the Active Record pattern in general, as it ties its model too tightly to the underlying database. It is convenient for a simple CRUD tasks, which may fit about 90% of use case, but that's not the only thing the database is capable of.

I don't know about active record but with Django it's simply:

Which produces (simplified, Django would actually explicitly select each column):

  SELECT posts.*, COUNT(comments.id) AS comments_count
  FROM posts
  LEFT OUTER JOIN comments ON (posts.id = comments.post_id) GROUP BY posts.id
Nice, easy and without 1+N queries.

That is actually very nice and elegant!

I've actually never used ActiveRecord myself, but when an ORM breaks down and it becomes hard to make certain queries I've had good luck with other ORMs by creating a view in the database and telling the ORM that it's a table. Obviously you can't insert into arbitrary views, but most are happy to give you an object back, with appropriate relationships to the actual database-writable objects.

I'm surprised there isn't more explicit support for this in more ORMs (things like not having 'save' methods on the ORMed classes).

That's really easy.

    Post.select("posts.*, count(comments.*) as comments_count").joins(:comments)

That is not the same query at all. You need to do a left join and a group by for it to become the same query. Wihout a left join posts without comments wont show up in the result.

You're totally right. My example is an inner join. I will admit that doing outer joins in ActiveRecord is quite painful. Here's a more realistic example, which is not nearly as pretty:

  Post.select("posts.*, COUNT(comments.*) AS comments_count").joins("LEFT JOIN comments ON comments.post_id = posts.id").group(:comments)

Thanks for the answer!

Unfortunately, what I needed to do back then were more complex than that (say, I need to perform query on top of subquery, e.g. "SELECT ... FROM (SELECT ...) AS a HAVING ... GROUP BY ..." sort of thing) which seems to be something ActiveRecord wasn't designed for.

In the end, I solve the problem by using Arel with an ActiveModel and unwrap all the data manually.

Wouldn't you need to do a group by in your query as well?


What's more, even with just bare prepared statements.. how do I use dynamically built SQL queries and prepared statements together? And please don't just say "don't", at least not without telling me how to achieve what I need the proper way :)

For example, let's say you have a query that gets search results, and depending on whether the visitor is logged in or not you also may want to know whether a given search result happens to be a favourited item. The way I understand it, I would have to do for example:

    $query_text = 'blah';
    if (user is logged in)
        $query_text .= 'SQL pertaining to favourites';
    $stmt = $dbh->prepare($query_text);
    if (user is logged in)
        $stmt->bindParam(':user_id', $user_id);
Or am I missing something?

Just use separate statements. You should already know before that point whether or not the request is coming from a logged in user, so to me even the if statements are redundant - have functions or methods that just take parameters for and return the result set of a single SQL query, and figure out which function to call and how valid the results are elsewhere:

    function getResultsForUser($DB, $user_id)
      $query="search for user :user_id, SQL pertaining to favourites";   
      return $stmt->fetchAll(PDO::FETCH_ASSOC);

    function getResultsForVisitor($DB)
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
Yes, it does mean more code, but that code will be much cleaner, more readable, more easily transportable (by not depending on a third party library) and less bug-prone.

Deciding between user or not is the most simple case though. The most monstrous function I have by far considers things such as

    - is the user logged in, and maybe even owner of the profile we're at?
    - are we looking at all nodes, or the subnodes/subtree (don't ask) of a node?
    - do we want to display tags/authors/sources?
    - are we filtering by tag/author?
    - what's the display mode: list or full?
There might be more too, that's from the top of my head... Yes, it's messy, but also kina DRY. To "unroll" that would be even more monstrous, that can't be the way D:

This actually feels a lot like localization -- unrolling all the ifs is actually the best way to do it. It feels like you're repeating yourself, but having the whole sentence / query together as one unit gives you the ability to understand the whole context -- often times in a database context, you can omit parts that are useless if you get the full context.

Unrolling would mean a lot of permutations.. in my example 576 if I'm not mistaken. Understanding the context or dead SQL code isn't really a problem in my case, since the various query bits are mostly orthogonal to each other, it's that they're optional that makes it a bit complex.

But as I said in my other comment, I realized I can just remember the parameters/values as I build the query string, and then bind them all after creating the query. That way I can have my messy cake (that should feel so wrong but tastes so right) and eat it with a plate and a fork like a proper person.

Queries with and without user_id probably have different plans (e.g., table scan vs. index lookup) and need to be prepared separately. I've never seen an ORM that takes prepared statements and query planning seriously, generally they only emit trivial queries.

Since the queries only get re-used per connection IIRC, that's moot for my case (personal derpy CMS) anyway, since they all just get used once per request, anything loopish is admin-related and fast enough anyway.

What really bugged me though was the idea of having to repeat the decision logic that determines what the query ends up being twice... though I just realized that putting name/value pairs into an array while building the query, and then using that when the query is built to bind them to it, is probably fine.

Haven't red all comments and maybe somebody already mentioned this, but here are my few cents:

I've worked with multiple ORM frameworks during my career. And I always reached the point where developer needs focus to framework internals instead of delivering new features.

Yes, usually most situation which developer didn't expected can be solved by configuring ORM framework, changing some configuration, providing additional parameters or calling some methods, but what that means? You need fully understand your framework if you want to be sure you application does exactly what you want and not misbehave.

So next time when you'll be thinking grab ORM to make things simpler, ask yourself if you really have enough time to fully learn the framework?

I'm a bit surprised by these criticisms from someone using sqlalchemy. I had the same opinions of orms in the past until I learnt sqlalchemy.

Partial objects, attribute creep, and foreign keys: defer the loading of columns [1] to suit your use-case. You can even have different mappings that deal with different subsets of columns depending on your situation.

Data retrieval: sqlalchemy is so good for querying that I've mostly stopped using sql for anything other than complex exploratory analysis work. Sure, you need to know sql to be effective, but whatevs - eg Window functions? Done [2]

Dual schema dangers: I understand the concern, but again, you can choose to map this as you please and I don't see what doing raw sql gains you here. You need to change the schema, if that affects your application, you need to change your application, if it doesn't, do you need to change your orm code? I routinely migrate my db and release changes to the orm code later.

Identities: I admit there are a couple of times I need to flush in my app outside of the normal lifecycle, which I don't like. With sqlalchemy, for the most part, you just connect objects and don't worry about the ids.

Transactions: Whatever happens you'll need some sort of transaction boundary in your code. Removing the orm doesn't gain you anything there, does it?

I've written about some of this before on here so I won't rehash it https://news.ycombinator.com/item?id=9180831

There's a time and a place for sql, orms and storedprocs. The more you know about each of them, the more effective you can be. As ever, learn the tools and never throw the baby out with the bathwater.

[1] http://docs.sqlalchemy.org/en/latest/orm/loading_columns.htm...

[2] http://docs.sqlalchemy.org/en/latest/core/tutorial.html#wind...

I prefer to use functional frameworks. They 1) express the sql with map, flatmap, filter, groupBy etc functions 2) use classes that are equivalent to tuples.

Since I develop in Scala both are very intuïtive to write.

Apparently Linq for F# is really good as well. As someone else mentioned in a comment.

I get a programming interface that doesn't attempt to hide sql and still my code is fully typed. And it's really easy to make your query logic modular: make the sql fragment a function that you can call. Like any other code you reuse.

I can't think of an advantage of ORM over this functional approach. Apart from ORM being more well known.

I, like many other people, also took a stab at my ORM / ActiveRecord lib (First for PHP on MySQL, later ported it to Javascript on WebSQL) and I've been following exactly the sentiment that a people here seem to have to: Don't make your ORM want to do everything. If you are going to need computed N+1 queries, write a View, or use custom SQL.

ORM / ActiveRecord is a great pattern imo, but if you want to make it do everything, like so many other techniques, you're gonna end up with a behemoth of a thing.

I've tried many different ORMs in the last couple of years, and while mine may not be the most complete or have a sexy API for joining custom query parameters into the results, I still feel my own provides the one and only syntax / workflow 'as it should be' in Javascript:

    var serie = new Serie();
    serie.name = 'Arrow';
    serie.TVDB_ID = '257655';
    serie.Persist().then(function(result) {
      console.log("Serie persisted! ", result);
I've got loads of examples with jsfiddles if you want to find out more :-)


(browser that supports WebSQL required, obviously)

Using an ORM is for advanced users, and I think it's a bad default for beginners. A good one certainly saves time when you know how to use it, but it also adds complexity and narrows the perception of what a database can do.

Narrowing the perspective is especially bad for beginners, because some options will simply never occur to them and the ORM will prevent learning by osmosis.

Let's say you need reliable, efficient cache invalidation (say, to re-render a static page when the data changes). Triggers and LISTEN/NOTIFY in postgres might save a huge amount of time (and even more in maintenance costs) over a custom solution that probably isn't right anyway.

Or maybe you need to prevent schedule conflicts. Postgres offers range types and exclusion constraints, which aren't supported in all ORMs.

Or to keep a remote system reliably in sync (even if it's a very different kind of system), logical decoding is a very powerful feature.

But how would you encounter any of these potential solutions if you are always behind an ORM?

When considering using ORM we need to answer simple questions: does it help to decrease code size? In most cases ORM code is same size as SQL query. Secondly, does it hide complexity? No, it just adds up one more layer that in fact increases complexity and makes it harder to debug. Finally, does it protect us from errors or impoves code quality? This is rarely the case.

Instead of using ORM I prefer moving data retrieval code to database layer with help of views. There is also updatable views that we can use to simplify database updates on code side and sometimes avoid using transactions. Separation of code and data logic is great concern when deciding to implement ORM. Nowdays database are very smart and convinient so there is no need to use ORM.

In most cases its smaller (in the Django ORM anyway).

myobject, created = MyObject.objects.get_or_create(field1='1' , field2=field2)

That would take a number of lines. One query to check if the object exists, another to create it / retrieve it, plus application code to deal with that SQL.

Ok, for this particular example, is operation wrapped in transaction? How will code evalute if we need to check some condition on field2 and occasionally update it?

Never seen a large enough project that relies on an ORM be anything other than a giant mess. I mean never. The conclusion is correct. From an application perspective the db is just another API and should be treated that way and the ORM should just be thought of as a convenient DSL for creating queries on top of that API.

To be fair, I've never seen a large project be anything other than a giant mess. This is the nature of large projects. If it's not a giant mess, you are probably leaving user happiness (= $$$) on the table.

And it comes at the expense of craftsmanship and programmer happiness. Lately I've been thinking how much initial architecture affects the mental well-being of all subsequent programmers that join the project. This is one of the reasons there is such a high turn-over and burnout rate in programming. If you're not consistently paying back technical debt and improving the architecture of the project then you're paying for it in other ways with the revolving door of programmers.

Then again if you're a big enough business it probably doesn't matter as long as customers are paying you. All you have to do is plug-in another cog/programmer into the machine when the previous one wears out.

All this is true, but it's also the reason why programmers are paid as much as they are. In general, in a market economy, you are paid to do things that other people don't want to do. The whole point of a market is to create an incentive to do things that are unpleasant.

Beautiful code does exist in the software world. It's usually found in hobby, open-source, and research projects without profit motives, userbases, deadlines, and all those other complications that result in shitty hacks.

In the meantime, you could look at the crap that is most large-scale codebases as a barrier to entry that keeps demand for programmers high. Enjoy the money you get for cleaning up other peoples' messes, and then use that to buy time to make the beautiful, useless stuff.

It strikes me as a thick/thin client tug of war. You can supplant ORMs by e.g. moving your logic into stored procedures on the db side, until you find an inconvenience there, then back and forth until the end of time. As with many dyadic architectural choices, there are good arguments on both sides.

I agree. The problem is that striking that balance is hard. If you go all-in with an ORM then trying to backpedal and figure out at which point you went wrong becomes almost impossible and so now you have a tangled web of ORM queries and no clear path to untangling things.

Sure, but the balance is entirely context-dependent. But hey, I guess this is where all polemics fall down.

That said, where is the academic problem in refactoring into the db and decomposing ORM queries into straight SQL where necessary/desired? I'm not a Fortune 500 Systems Architect, but I'm under the impression that this is a fairly straightforward process. Now, management of the systems will necessarily incur extra complexity, but that's a second order effect that should be assessed in the decision process as part of the cost estimations. Extra technical debt, for sure.

So, choosing one is technically cheaper and more simple and maintainable. Choosing multiple may be faster (or, in many cases, "academically satisfying"), but more complex with a steeper learning curve. Be sure to hire the best of the best!

Not so simple. Every big enough place will be a disjointed mess of cross-cutting concerns. The ORM (usually another custom mess on top of some open source thing) ends up being owned by everyone and in effect ends up being owned by no one. So then you end up going around asking each team in turn to fix stuff but each team consists of domain experts and not ORM/SQL experts so no one does anything. So then you bring in ORM/SQL experts but they don't have any domain knowledge so they just chip away at the edges and end up making no improvements and you still are stuck with an ORM nightmare.

The bottleneck in fixing technical things is never the technical stuff. After all, most of this stuff is 70s tech to begin with and the science has been worked out by much smarter people already. The social stuff on the other hand is a different matter entirely and once you're in an ORM quagmire it is impossible to get out.

I've worked on a 400kloc Java system that was beautiful in the half that used Hibernate, and horrific in the half that didn't.

400kloc Java. That's cute. Not saying that's not impressive. The scales I've seen you're off by a few orders of magnitude.

Oh sure, it's not the largest system I've worked on by any means, it just seemed relevant because at the time I was working on it it was almost exactly a 50/50 split between the hibernate parts and the not. And honestly I find beyond a certain size the techniques you have to use are pretty similar - there's very little practical difference between working on a 400kloc and a 4mloc project.

What do you define as "large enough"? There have been a ton of fairly big Ruby on Rails apps written over the last few years, the majority of them using ActiveRecord to handle everything.

Really? How many of those backends have you seen and worked on? Heroic effort does not qualify as success in my book. You can maintain any nightmare by just throwing enough bodies at it. What counts is the maintenance burden and the cost of change.

ORMs tend to assume that there is "the application" with "its database". If the application changes, so does the database. If the data is used by more than one application, it's better to have the data defined in the database and write applications as database clients.

At the big tech firm I work at, there's a best practice where any database (whether that's a traditional RDBMS or a NoSQL client) is abstracted away by a microservice with a defined API, and every other application that wants to get that data needs to interact with the microservice. That way, the database schema can change without it affecting multiple applications. There's still the traditional mismatch between ORM and database, but doesn't feel as painful because only a single application is using that data, and that application can have special knowledge of the persistence layer underneath (meaning, use database hints if necessary, defer to raw SQL, etc.)

We have decades of research into filtering, joining, and aggregating across a complex set of tables and views. With microservices you have to roll your own query planning and stream all the intermediate results on the wire even when you're throwing away most of them.

The thing is, letting the system handle it isn't good enough. We have decades of research into this stuff, so 90% of the time the database gets it right - but the interface for explicitly taking control when the DB gets it wrong is black magic at best. The value of having an exposed representation of filtering/joining/aggregation in a general-purpose programming language where you can interact with it is big enough to outweigh that of a database that can do it for you most of the time.

It does sound like an extra layer of unnecessary complexity to solve a problem that could be solved in other ways.

Unfortunately, SQL servers aren't generally scalable. Offloading joins and aggregation onto (inexpensive) app servers can increase over all system performance, despite lack of advanced query planning.

Bollocks. SQL scales fine unless you are stupid in the way you use it. All the crap performance I have seen in the last couple of years have been from doing crap like you suggest - doing joins and aggregation at the application level. Our front page currently makes over 1000 database calls because of this sort of nonsense.

That probably is true - if you have a crappy API design

That kind of sounds like a microservices anti-pattern in disguise. Its thought by many to be bad practice for microservices to share a common database. But that's pretty much what you're doing, but just with an API instead of jdbc for the access.

An API is much easier to version-control, automatically check for compatibility and so on.

Ok, I have only used Django's ORM and a little bit of SQLalchemy, but in both cases you are defining the database in a Python file. Django's models.py file usually maps directly to database tables, and is less verbose than writing SQL create statements.

Maybe I don't see any advantage in what your doing, as I treat the database model as the most important part of the application (or maybe thats the same as what you are saying). Get that correct, and code falls into place easily. Start hacking rules in at the application level and eventually it gets messy. Linux said something similar about bad programmers worrying about the code, and good ones worrying about the data and its relationships.

Ok, I have only used Django's ORM and a little bit of SQLalchemy, but in both cases you are defining the database in a Python file. Django's models.py file maps directly to database tables, and is less verbose than writing SQL create statements.

Maybe I don't see any advantage in what your doing, as I treat the database model as the most important part of the application. Get that correct, and code falls into place easily. Start hacking rules in at the application level and eventually things start to get messy. Linux said something similar about bad coders worrying about the code, and good ones worrying about the data and their relationships.

I've slathered in yet another confusing layer by creating an application specific view.

Now you've got the ORM layer, which possibly translates to SQL correctly, to talk to a view, which is a front on the real untouched table. Its kind of a hassle.

ORMs can alleviate a lot of tedium, until you get to a level of complexity where they start to create it.

I created my own ORM-like access library which seems to work pretty well.


Comments welcome.

"quasi-professional cowboy" here. I have used both ORM and direct SQL systems with the problems outlined above. I finally gave up and wrote my own row mapper:

1. Using the DB Schema, generate stored procedures to load and save data together with code/validation in partial C# classes.

2. Transport data in XML format using single letter (or two) table/element and column/attribute names (auto generated). Use .Net to automatically build/ serialize the XML into objects. SQL Server/.Net have some nice XML features that make this simple.

This means that the only user access to the database is through authorized stored procedures. The users have no access to database views or tables.

I have implemented the means to load and save arbitrary hierarchical objects (e.g. an Order/Order Lines). A single request to the database can return a complex three level object which would otherwise take hundreds to round trips to load.

I agree with the observation that this would be hell to maintain. However, the people likely to maintain this system would be in just a different hell if they had to work with Hibernate or MS Entity Data Model.

I written a fair number of C# LOB apps and use LINQ quite a bit with mysql. I don't even want to talk about Java and some of its ORMs as its too painful to think about. I agree with the sentiment of the post but in compiled languages I really want an ORM to simplify unpacking result sets. LINQ is great when it works but joins sort of suck as well as calling in-built sql functions and it can some times generate highly unoptimized queries. I frequently pull in data via LINQ to SQL and then massage the data with LINQ for objects where I have better control of performance and operations.

Trying to write pure SQL leads to lots of manual unpacking of the result set which I generally dislike and is much harder to maintain and doesn't work well in practice compared to when LINQ actually works.

I think maybe what I've really learned is use scripting / loosely coupled type systems when working with SQL. In python, I usually just call sql directly rather than use sqlalchemy and its fine because of the loose typing and result set unpacking isn't terrible.

On C#, Dapper's useful, but not perfect, for letting you write your own queries and then making it easy to unpack the result sets.

Unfortunately, it relies on property setters for doing the unpacking, so it doesn't interact super well with your code if you like to avoid unnecessary mutability.

The only publicly-available lightweight ORM I know of that does a good job with that is the SQL type provider in F#.Data. That one is head-and-shoulders above any other option I've found for working with databases. It does require that you write your data access layer in F#, though, which may make it a hard one to sell at work.

Thanks. That actually doesn't look that bad. It would be nice to have anonymous type objects but I recognize the difficulty in that. This looks like a nice compromise. Now if they could also fix passing in arrays as part of a parameterized query "select x from y where z in ?" where ? is a collection of strings or integers it would be perfect but I think that is a driver/interface problem.

Edit: Looks like it actually does says it does support the enumerable but I'm questioning if it will do what I want. Will have to test out but I'm guessing there is a limit to the size of the array.

Dapper accepts anonymous objects for passing query parameters. It will return an expando if you don't specify the type of object you want back, too. That can be convenient, but I don't personally like having dynamic objects running too wild in my code so I end up manually mapping it to another type before returning anyway. (AutoMapper can help here.)

The IEnumerable support will generate (parameterized, I believe) inline arrays in the query. Not really my favorite, but it works.

The thing that I couldn't get over, and which ultimately led me to write my own micro-ORM at my last job, was the weak support for table-valued parameters. But I gather they've fixed that since then, so hopefully it's not a big deal anymore.

Dapper supports IN statements being passed an array:


Works if you pass in a parameters object containing a property called ids that is a list/array of some sort (see https://github.com/StackExchange/dapper-dot-net#parameterize...).

On SQL 2016 this uses STRING_SPLIT across an nvarchar(MAX) which makes it effectively unbounded. Earlier versions and other platforms have restrictions on the number of elements in the array/list.

Dapper will use a constructor with parameter names that match your query's columns if you need immutability.

unpack result sets into what?

What reading articles on how bad ORMs are has taught me: More people need to use Django.

Django does a lot and does it well.

I wouldn't say the ORM is Djangos strongest point. Its great for simple stuff, but not great for any moderately complex queries (e.g. extra conditions on a join, subqueries, conditional aggregates - though I think that last one may be solved in the most recent version).

I don't think they have done a bad job with the ORM, but you really ought to know SQL as well.

I think the ORM is Djangos strongest point. Take the ORM away and then ModelForms and Admin go with it. I think Django makes little sense if you're not using the ORM.

That said, you're absolutely right that it is really bad at certain types of queries. There's no way to really influence joins. Subqueries are coming (https://github.com/django/django/pull/6478), and conditional aggregates became possible in Django 1.8 with Expressions support (which I authored). Queries that map over many to many relations that have .exclude() or .annotate() generally produce incorrect results. Most of the problems with the ORM are as a result of Django handling joins for the user - which is also one of its biggest strengths.

You should absolutely know SQL if you're using an ORM. If you're doing anything other than basic CRUD, you have to know SQL. An ORM should give you an escape hatch to write that custom SQL. Django's ORM does not save you from all the problems mentioned in the article.

I'm not arguing that people shouldn't learn SQL, I'm just saying that a lot of people who bemoan all ORMs should try using a really good one —like Django's— and see if all their complaints still ring true. Your examples, for example, might have been true a while ago but:

- Extra conditions on a join (eg a prefetch filter): https://docs.djangoproject.com/en/1.9/ref/models/querysets/#...

- I'm not sure what you think you'd want explicit subqueries for, at least in a context that annotations, prefetch, etc don't already do. If it's that weird, just use `.extra(..)`

- Conditional aggregates: https://docs.djangoproject.com/en/1.9/ref/models/conditional...

- Not that you asked but also have a look at the new database functions stuff. http://stackoverflow.com/questions/38017076/annotate-a-comma...

And it keeps getting better. The list of "ORMs are rubbish because I can't do X" is evaporating. Yeah there's probably some stuff that never comes up but you can still run raw SQL through the ORM.

When you consider the development time saved and performance improvements (caching, good query writing, etc) an ORM delivers, I personally think it's hard to argue against them.

> ORMs are more detriment than benefit

The author mentions Hibernate and SQLAlchemy, which are both DataMapper ORMs. But what about Active Record? It's true that AR will provide even more abstraction and distance from the database, but it also provides a lot more convenience, which for me in small CRUD projects (as most are) has been worth the downsides.

And as another posted mentioned, you can always optimize by replacing slow AR queries with custom SQL ones, and restructuring your database as your project scales.

I love ORMs, they help abstract the database technology which has been really useful in a recent project switching an app from MSSQL to PostgreSQL. I also personally as developer appreciate the programmatic syntax.

I would caveat the performance issue, sometimes there is no better way, however i would always try to make things work in the ORM first before switching into native SQL to get the job done.

> they help abstract the database technology

If you are going to do anything even remotely complex you're going to need to know that database technology otherwise your ORM is going to spit out queries that are dead stupid and a performance nightmare (pages taking 1 full minute to load). Then you end up fighting with the ORM to do what you want.

That's my experience.

I guess there are projects that are so simple the database layer can be abstracted away but I've never worked on one.

I've always said: "ORMs are for people who don't know SQL!"

ORMs are not for people who don't know SQL.

The secret to using an ORM is knowing exactly the SQL it will generate underneath. The advantage to using one is static typing as well as writing orders or magnitude less code than SQL. As well as orders of magnitude more understandable code than the equivalent SQL.

If you work on heavy enterprise applications the ORM can be your best friend or worst enemy. It comes down to knowing the tool you're using inside and out. It will make you much more productive in the long run. And sometimes yea, you need straight SQL or a stored proc to get the job done. That's OK.

> I've always said: "ORMs are for people who don't know SQL!"

False. To use an ORM you must know and understand data modelling and the Entity/Relationship model. I'd say, people who don't like ORMs are often those who don't understand that model.

If you do understand it, then you know that relationships can be represented 100% by code in an OO language, thus completely automated.

Why bother writing your own hydrators for each entity when you can model them directly in a library? it doesn't make a codebase more maintainable or readable.

Furthermore you could have an ORM that uses SQL directly for queries, and still eagerly resolve relationships. It's just not practical, that's why ORMs often come query builders.

That is the thing that abstracts SQL, not an ORM per say. ORMs only deal with relationships between entities. Query Builders build SQL queries.

This doesn't ring true for me, at least - I've been programming professionally (using SQL) since the mid-90s, and these days very much appreciate the time-savings that ORMs (on Windows, for the most part) provide. With my favourite ORM - XPO from Developer Express - there is a decent tool for monitoring queries and sessions, and if I need to I can write custom SQL to improve performance (which almost never happens).

Another important point is that it's, IMO, much easier to maintain ORM-based code, and another programmer coming along after me can more easily get started on understanding what I've implemented. Once place I worked in the past we had a 'SQL guru' who wrote these massively complicated SQL queries that could be understood by no-one by him.

Like the others below/above I agree that you should certainly know SQL if you're going to use an ORM. ORMs can be a really great tool if you use them correctly.

More to your point, I think the problem is the vast number of developers using ORMs without knowing SQL and running into big problems with correctness and performance.

I began programming in languages where SQL was a requirement in 1999; only in the past 3-4 years have I use ORMs. I often write a lot of raw SQL, but I find a well written ORM statement communicates intent more quickly.

What django gave us out of the box would have been impossible with SQL in the same timeframe with the resources we had. If we had the luxury to do things properly and fully the first time, we never would have made it to market with our robots.

This isn't to say that one day we won't throw away django or use more SQL. We just had real world schedules to deal with.

Oh finally!!! I've typically had a ambivalent relationship with ORMs. The mass of config files and auto generated code is a huge pain - especially when things go wrong. Doing multiple DBs can also be an issue and requires messing around with config/annotations, what not...

Just write the SQL :) - relational algebra is not that hard...

If you dont want to deal with quirks of ORM,use a lightweight microORM. You basically get some productivity, simple type checking,consistent way to access data , sql injection protection and you will deal with hand rolled sql queries.

As much as I advocate for properly using SQL, I still want an ORM (lightweight) to manage the repetitious small stuff that happens repeatedly throughout an application. I'm all about custom SQL or stored procedures when the time comes though.

I find ORMs great in some scenarios.

Take a really thick client, like say a graphical diagramming tool that makes your machine's fan whir loudly when you start it up. Here an ORM can be a great tool to efficiently manage a constantly evolving, large cache of your hot objects, and keep them syched with the backing RDBMS.

Big batch programs can be another good use case.

Server side web apps and API servers though are the opposite of this. Web pages and API responses should be fast and small, so we don't normally build up a big cache, and in a stateless architecture we are normally throwing the cache away at the end of each request. In this case raw SQL is often easier than work with.

Sigh. Whenever I have to hurt myself at work on such issues, I find myself thinking of the web that could have been, if only...

Let me backtrack a bit. There's a much better way to do things than the way we do it right now. A way that completely obviates the need for ORMs, or indeed any way to deal with the interface between application and database.

You see, there would be no object-relational Impedance Mismatch (and therefore no need for such clunky kludges like ORMs) if the database and the application were both written in languages of the same paradigm, either both OOP or both relational. Ideally, that would be a single language, that could handle both data and application logic.

There is such a language: Prolog [1]. Prolog is implemented as a relational database. So your program itself is the database. There is no separation between data and operation and therefore no ORIM and no need for ORMs, or stored procedures or anything, really.

And, yes, it's perfectly possible to do your web dev in Prolog. Here, see this explainer on the Swi-Prolog website, titled "Can I replace a LAMP stack with SWI-Prolog?":


Hint: Yes. Yes, you can. You can replace LAMP (or LAM-whatever) with an LP stack, where all you need on top of your OS is Prolog itself. Its built-in Definite Clause Grammars notation [2] can be used to parse and generate javascript, html, xml, css, YAML, whatever you like. Swi-Prolog even offers translation to RDF, which is as natural as you can expect given RDF is also a relational language.

Here's a big fat tutorial to help you started:


Can this be done for real? OMG yes it can. The Swi-Prolog website itself runs on an LP stack. There's a few more websites that do too:


In short: you don't need to hurt yourself as badly as you 're currently doing it. There's no other need than of course, nobody wants to learn Prolog. I know that, I've made my peace with that and I've spent all of my so-far career hurting myself against the ORIM just like everyone else.

... but we could have had a better web.


[1] https://en.wikipedia.org/wiki/Prolog

[2] https://en.wikipedia.org/wiki/Definite_clause_grammar

This sounds really interesting, but I'm having a lot of trouble trying to follow how all that code fits together - from what code I could find. I also strongly dislike the object/function based approach to constructing HTML (which - funnily enough, is just like using an ORM to construct SQL which I like heh).

Really, I don't see enough offered to counteract what little pain I feel using an ORM. The LP actually looks more painful (to me). Obviously I'm totally unfamiliar with this stack which makes it hard for me to see where the benefits lie.

>> Really, I don't see enough offered to counteract what little pain I feel using an ORM.

In that case you'd probably find no benefit in giving up the use of one, regardless of what replaced it.

Hibernate is a very complex library. SQL+JDBC on the other hand is easy (JDBC is tedious, some complex SQL might be hard, but generally it's easy). On the other hand Hibernate could significantly reduce development time, if developers know it and application is a goot fit (CRUD, filters).

My advice is to avoid ORMs unless your project is big and its database schema is a good fit. And if it is, don't think, that ORM is easy, it's not. Learn how it works, read its source code, understand its inner workings. Or find someone who does.

I didn't have much experience with other ORMs, though.

I expect I am approaching things improperly, but the only time I end up fighting ORM's is when I am trying to creation complex aggregation queries, or other responses used on "reporting" pages.

Domain objects in the middle tier only make sense if there is pervasive reuse of component objects. I extensively chased the unicorn of component orientation but it never pans out.

The logical place for domain objects is in the Database. Yes, stored procedures, with object layer mappings of views. But the reality of the practice is that (a) the toolchain for the DB backend is quite firmly stuck in 80s and (b) most IT programmers lack SQL skills.

The trick to using ORM is being smarter than it, so that you can ultimately rewrite everything it once did for you incredibly inefficiently.

Use the best tool for the job.

ORM's might do it 80% of the time. The other 20%, most of the mature ORMs provide a way to execute your own sql.

I'm curious how the author would like MyBatis, which is just a thin wrapper for calling SQL, and then translating results to POJOs.

I like it, and it, IMO, keeps the API distinct from the access in the DB. You have to write your own SQL, which is good and bad. Huge amounts of control and performance, but higher portability costs.

Also Records for Python: https://github.com/kennethreitz/records

One important benefit of using an ORM is not being locked in with a particular DB server tech. Good luck migrating large app not using ORM, from, say, MySQL to Postgres if, for example, Postgres replication is something you are all of sudden interested in.

Has anyone ever actually migrated a database on a non trivial application? I want to go from MySQl to Postgres, but I am not sure if its worth the risk / effort. We got told about the standard nature of SQL in University, and how you could swap engines 'easily'. Then we started using them in real life and the reality of that situation hit.

For me the time I loved SQLAlchemy the most was the time when I migrated my database from MySQL to Postgres. Almost all of the issues people were warning me about did not matter as SQLAlchemy caught them automatically.

I came to the same conclusion, and wrote this ORM for SQL lovers:


Use ORMs for complex CRUD. Use SQL for complex queries.

There are two types of ORMs 1. ORMs for SQL dummies 2. ORMs for SQL gurus You really need to be clear, when discussing ORMs, which type you are discussing.

Enterprise Objects Framework just worked.

Pretty much every other ORM I've worked with has been terrible by comparison.

(= 'ORM 'OMG) #Clojure

SQL is a write-only language. You better hope you don't have to rewrite your complicated raw SQL queries in the future.

You can write well written modular SQL that supports changes and can be read by a competent developer.

Not in my experience. Please show me actual examples of "well-written modular SQL" in a real application.

I'm taking a bit of creative license with the term "modular". CTEs and views can help with this, and are closer to true modules.

I like to encapsulate inside of stored procedures, and write my code in such a way that it reads almost like a a procedural language. Heavy use of indentation, subqueries, and temp tables lets you carry results through the proc. I like to hang conditions off of joins, so related logic is in the same place.

Would modular sql be wrapping up subqueries inside of views, and then selecting from those views?

Common Table Expressions (CTEs) help with modular SQL.

If you do SQL full-time probably yes. The problem is that most developers don't work enough with it so their skills are lacking.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact