Embracing SQL in Postgres (conery.io)
273 points by GarethX on Feb 26, 2015 | 120 comments

My CTO wrote custom PostgreSQL functions to generate all the JSON that our API returns. First time I saw it, I said "Holy crap, that query is two pages long. You could just tell Rails to serialize the object in 3 lines of code." He replied "I know. That's how we did it at first. But generating the JSON directly in postgres is about 700x faster."
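
For the curious, the heavy lifting can be done with Postgres's built-in JSON functions. A minimal sketch of the idea (the `products` table here is hypothetical, not from the actual app):

    SELECT json_agg(row_to_json(p))
    FROM (SELECT id, title, price
          FROM products
          ORDER BY id) p;

One query, one round trip, and the whole result set comes back as a single JSON array, with no per-row serialization in the app layer.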

Eventually we even moved a very complex scoring algorithm from an overnight Hadoop Java process, into Postgres as a compiled extension in C. We used to spend all night calculating those scoring results and caching them in an enormous table. But with the Postgres+C solution, we could calculate and store it in realtime by having Postgres do the hard part.

You can skip the web tier entirely and have Pg return JSON documents via an HTTP endpoint to a rich MVC JS browser application that syncs to a local cache.

Thus re-inventing Lotus Notes.

I like OpenResty for such endpoints; YMMV.

But I refuse. REFUSE. to re-invent Notes.

I haven't used it yet but PostgREST looks awesome for this.


That looks like a good tool, but have you met nginx before? I sorta view OpenResty as a reason to sneak nginx into my stack. :)

>Thus re-inventing Lotus Notes.

I would like to veto this with deadly weapons - up to and including nuclear.

Writing Postgres functions in C is not nearly as scary as it sounds. Here are some stats functions I wrote for a Rails app. The histogram function in particular was the driver, since it was parameterized on bucket size/count/start and therefore uncacheable:


Very happy to see all of this talk over the past few months on the power of modern--or anything beyond basic--SQL.

This article touches on one of my favorite things, too, which is built-in date and time arithmetic. That's something that basically just works and comes for free in the database but is often a hodgepodge of messy third-party libraries elsewhere.
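
A few of the built-ins, for anyone who hasn't played with them (all standard Postgres functions, nothing assumed):

    SELECT now() + interval '30 days';                -- interval arithmetic
    SELECT date_trunc('month', now());                -- start of the current month
    SELECT age(date '2015-02-26', date '2014-01-01'); -- human-readable difference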

The other enormous one, for me at least, is arbitrary precision math. Proper precision, rounding, division, and exponentiation aren't easy to come by. I'm so tired of dealing with libmpdec, BigDecimal (flimsy as it is), GMP, etc. It all works well in the database and it's pretty fast. Perhaps it's not suited to 1000-digit scientific computing, but it sure does make building financial software easy.
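
For example, with `numeric` you get exact decimal behavior out of the box (standard Postgres, nothing assumed):

    SELECT 0.1::numeric + 0.2::numeric;   -- exactly 0.3, no binary-float drift
    SELECT round(10::numeric / 3, 4);     -- 3.3333
    SELECT power(1.05::numeric, 24);      -- exponentiation at full precision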

In Ruby World, Sequel (http://sequel.jeremyevans.net/) probably gives you the best of both ORM and direct SQL use, because what it really provides is an object-oriented ("native" to the language) interface to a SQL database. And yes, it can build models like ActiveRecord, too!

In Common Lisp, postmodern (http://marijnhaverbeke.nl/postmodern/) serves a similar role, with both DAO and lower-level access methods that compose pretty well. The 'perec' ORM from the DWIM project was based on Postmodern, and among the weirder things I did quite easily was embedding custom SQL into the query builder, so that the object-based accesses would use the advanced SQL. Really nifty feature :)

One thing that drives me absolutely over the cliff is how ORMs try so hard (and fail) to abstract the power and expressiveness of SQL.

I agree with this. It's nice to deserialise a DB row into objects, but actually querying with an ORM is usually terrible and people should not be afraid to write smart queries. Especially like those in sibling comment https://news.ycombinator.com/item?id=9112036 : date/time/arithmetic.

> actually querying with an ORM is usually terrible

What do you think of stuff like Hibernate Query Language? I'm currently working on a project that might use DQL (a PHP variation) to get our snarl of tables under control.

The fact that Hibernate Query Language even exists tells you that ORM is a bad idea.

I think the basic underlying issue is some developers just don't want to understand data. The rest are cascade effects.

That makes absolutely no sense.

The whole point of ORM is that it is database agnostic. Given how different databases twist the SQL standard and provide a whole raft of proprietary extensions, you need an abstracted query layer.

Whether you give it a name like HQL or don't there will always be some layer.

I hope that's not the whole point of an ORM, given how pervasive they seem to be in every project and how rarely any given project ever switches its backend out. Choosing an ORM because it abstracts you away from the storage engine seems like a terrible reason.

I agree with Conery, choose the components that will compose your application wisely and then embrace those choices and use them to their fullest.

Sure, but one can have an "ARM" (Abstract Relational Model) that abstracts away from database-specifics without trying to shoehorn the relational model into an object model, which isn't a one-to-one mapping. It doesn't have to be an ORM.

Couldn't the abstracted query layer... be SQL? You don't need to speak it to the server; you can parse it at the client-library level and send whatever optimized thing the server likes to it.

I think SQL is already a DSL for querying, and it is beyond clumsy trying to second-guess one (pretty good) DSL from a second (decidedly less mature, less standardized) DSL.

This is why I have come to love MyBatis. It lets me use SQL for the queries and then deserializes the results into a structure of my choice, even something as simple as List<Map<String, Object>> in Java.

DQL and Doctrine are absolutely horrific. Not to mention extremely slow both in development time and especially in execution. I will bet money you will regret using them soon enough. I already am. In the apps I'm working on, the time spent in Doctrine is > 20x the time spent actually making the queries.

EDIT: I inherited this system. I'd never use an ORM by choice.

Even though I think it's important for developers to have some understanding of the underlying SQL, I still believe a good ORM can save time and make code more readable. I think SQLAlchemy does a good job here - and it's close enough to SQL such that you are typically not surprised by the SQL it produces.

For more complicated queries, I often hand write them in SQL first and then translate them into SQLAlchemy's methods. It's still worth it, since it's both easier to build dynamic queries (e.g. dynamic WHERE clauses) and it makes the parameterisation trivial.

My problem with stored procedures is that you can end up with a lot of business code in the db.

Why is it that the DB is the correct place for business data, but not for business code?

Of course, even for practical reasons, it's important to keep the DB lean. But some code really does belong together with the data.

Sorry but putting business code in databases is a terrible, terrible idea.

(a) Oracle, Microsoft, Teradata etc. all charge through the roof for scaling out your database. Which of course you're going to need to do if you're adding more and more business code. Scaling out normal code? Cheap.

(b) What happens if you exceed the capabilities of your database (yes it happens). Then you have a major project on your hands migrating both data and code.

(c) The platform for developing/executing your business code in the database is like going back to the 1980s compared to the new, flexible microservices world we live in today. Really, stored procedures over NodeJS, Scala, Go etc.? Performance isn't that important in businesses where most processes are batch-oriented.

(d) Databases in businesses are as tightly controlled as they get. As a developer, would you want to go through laborious change management processes every time you push a commit? It would cripple software development teams.

There is a reason why "data lakes" are all the rage right now. Because people need cheap places to store and process their data and then use their SQL database simply for querying and reporting.

(a) Yes, databases scale badly. You don't want to run your entire application there. Yet code that reads a huge volume of data, calculates something based on parameters, and writes it back will put more load on your DB servers if it's run from the application layer; code that enforces the consistency of the data needs a strongly enforced policy if you keep it anywhere else, but works completely transparently if you place it in the database; and code that denormalizes the data is much easier to use if you can run it directly in your queries...

(b) Been there, done that. Migrating code is EASY. Data is what's hard.

(c) Is that a joke? Honestly, I can't tell.

(d) That's more a reason to integrate those teams than to choose one place over the other. The programmers can disrupt the database on several ways without putting code there, and don't get any extra power by running their code in a different server.

By the way, the data that was famously leaking recently was mostly from email and file servers...

(a) http://aws.amazon.com/rds/postgresql/

(b) Can you give an example?

(c) I don't understand what you're criticizing here.

(d) This depends on the organization. But generally business logic that would be at home in the database is business logic that should be controlled as tightly as the data is.

Because then it's not in version control.

Is that for the same reason that stored procedures, column constraints, indexes, and performance are not tested the same way any other code is?

When I write a schema, I plan for testing it with dummy data. If it's PostgreSQL, I write a test using pgTAP. It's code. You test it. I don't get why everyone doesn't get that.

Stored procedures can and should be deployed from files that are under version control.

Then you have to keep your database and those files in sync. There are tools that help with this, but it's no longer as simple as redeploying your code.

You keep them in sync by only deploying from those files, exactly the same best practice for deploying any software.

A couple of weeks ago I asked the Sequelize people why they couldn't also copy their validators into the database.

The validation would still happen on the node app side of things, but when it generates the create table query (as it must) it could easily also put in the equivalent check constraint.

Waiting on an official answer.
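
The SQL side of that is tiny. A sketch of the kind of constraint such a generator could emit (table name and pattern made up for illustration):

    ALTER TABLE users
      ADD CONSTRAINT email_looks_valid
      CHECK (email ~* '^[^@]+@[^@]+$');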

Not really. The program can define the stored procedures on startup.

That tends to hurt a lot when you have multiple front ends, and in general, I don't like the front end to have that much access to the database.

Organizations I've been with have always preferred to keep the revenue app out of the schema-maintenance business.

it's certainly that simple in my deploy script.

If you are primarily using stored procedures, you are probably viewing the database as a "persistence API" with its own encapsulated logic.

For context, the author Rob Conery wrote SubSonic - one of the first open-source ORMs for .NET, which was very popular at the time. http://en.wikipedia.org/wiki/SubSonic_(software)

I never got that sold on ORMs for anything more complex than plain CRUD; I'd rather use the expressiveness and performance of plain SQL and its associated procedural extensions.

One thing you get with a good ORM is composability, though. Trying to build dynamic SQL queries with raw strings is a nightmare.

Another, less useful property, is a considerable reduction of boilerplate when persisting a complex object graph.

> Trying to build dynamic SQL queries with raw strings is a nightmare.

Create the optimized version in a SQL editor, then use the builder pattern to fill in the holes in the query.

Much less boilerplate than any ORM tool generates.

> Another, less useful property, is a considerable reduction of boilerplate when persisting a complex object graph.

That depends on how expressive the application language's meta-programming capabilities are.

The thing is, if I have to fine-tune the ORM generated SQL for optimal performance, its value diminishes.

> Much less boilerplate than any ORM tool generates.

I disagree. The SQLAlchemy query language is very nice, for instance.

> That depends on how expressive the application language's meta-programming capabilities are.

Certainly, but metaprogramming doesn't necessarily rhyme with readability.

> The thing is, if I have to fine-tune the ORM generated SQL for optimal performance, its value diminishes.

Of course. However: the SQL generated will be fine for the majority of queries. What's going to bite is the use of lazy collections. For complex queries, you may want to fine-tune it by hand (though not necessarily).

That said, I rather agree that ORMs are best avoided. The existence of lazy collections makes it very easy to turn what ought to be a straightforward, fast query into something slow. What I'd like to work with is a query builder for select/insert, and avoid magic, even at the cost of creating awkward structures like PersonWithAddress and PersonWithAddressAndBankAccount by hand.

Yeah, query-building languages with good composability are a sweet spot. In Ruby, there is Arel, which describes itself as "a relational algebra" rather than an ORM, and is actually how ActiveRecord (which is absolutely an ORM) is implemented. It's possible, even easy, but not particularly idiomatic, to drop down into Arel for more complex queries.

I agree that composability is an important concern, which is why I like to use what someone above called an "ARM" (abstract relational model) library, instead of an ORM. I'd put in that category Squeryl (Scala), jOOQ (Java), and SQLAlchemy (Python), which all allow building relational queries "on their own terms" and allow for easy composition.

Another side benefit is it lets the library authors worry about portability between databases so it's much less of an issue.

There are plenty of non-ORM database layer solutions for composing SQL queries.

I prefer ORMs for the CRUD stuff--especially for data validations and things. Filtering and querying, though, tend to benefit from some kind of structured query language.

That said, there is a good argument to be made that, if you've factored everything correctly (ha) and done your design work up front properly (ha ha) you've rendered the database as just another sort of persistence engine, so even an ORM is not always necessary.

Really happy to see some SQL appreciation. Of course ORMs can make code easier to read and allow you to change databases quickly... But I always ask myself whether these 2 points are enough to justify the loss in performance and the use (in some cases) of really dodgy background implementations of things that are already beautifully implemented by many databases.

Another issue I have with ORMs is different projects end up with different views of the same database.

Why don't you use a foreign key to that table? We don't have that table... Yes we do, it's not because you haven't defined it in your project that it doesn't exist!

And... every project ends up with its own version of the "versioned schema", so to rebuild the DB, you need to aggregate all the Liquibase/Flyway files from all the projects and run them all.

When I wear my sysadmin/pseudo-DBA hat, ORMs are one of my more prominent pains.

I think this pain comes from using SQL databases for integration, which is an antipattern in my book, and prevalent in the enterprise. Databases should have a single application talking to them, and a service in front of them that makes it possible to get the data needed out. Services are easier to keep backwards compatible, whereas databases are a massive pain to version and migrate, and having to sync multiple teams to do such a job is a nightmare.

That being said, I realize it's often much more convenient at the beginning to have one database for integration. SQL is much more expressive than your service's interface will be, and in the beginning this seems like a killer feature.

I've yet to see the real payoff of a database service like you describe over simply having a central database library. Perhaps the payoff comes if you want to use multiple languages, but if you don't, then pulling all the schema versioning and db-related client code into a single library that is used anywhere it's needed solves all the problems your parent brought up.

I haven't seen any large enough entity using a single programming language across teams in a long time!

Fair point!

So much this. Having had to be involved in the transfer of a DB that was used for integration from a legacy SQL DB to postgres, it was a pita (and not even my job) to deal with all the small and not so small things that broke. Especially things like monthly emailed reports and all sort of quick hack/tools that no one even remembers writing.

Good point. I don't think it is that difficult to set up at the beginning, especially with the low cost of entry of REST, but foreseeing which services will be separate and distinguishable early on is very hard.

These are some very nice tips! One of the things I like so much about ActiveRecord is that it plays nice with pretty much every example here, e.g. `Product.where("title ~* 'title'")` or `Product.where("title ~* ?", title)`. A few other remarks:

- I'm so glad to see someone talking about `generate_series` for time-series reports. I often see queries that are `GROUP BY date`, which is usually okay, but will skip an output row if you have no data for a certain day. If you want to force one-row-per-day, use `generate_series`.
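
A sketch of that pattern, assuming a hypothetical `orders` table with a `created_at` column:

    SELECT d::date AS day, count(o.id) AS orders
    FROM generate_series(timestamp '2015-02-01', timestamp '2015-02-28', interval '1 day') AS d
    LEFT JOIN orders o ON o.created_at::date = d::date
    GROUP BY d
    ORDER BY d;
Days with no orders still get a row, with a count of 0 (`count(o.id)` ignores the NULLs from the outer join).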

- His `GROUP BY` query can be rewritten with less repetition like this:

    SELECT  sku,
            date_part('month', created_at) m
    FROM invoice_items
    GROUP BY sku, m
    HAVING date_part('month', created_at) = 9
In other words, `GROUP BY` can reference expressions that are aliased in your `SELECT`. (Note that `HAVING` can't use the alias, though, so it needs the full expression, and `sku` has to appear in the `GROUP BY` since it's selected.)

- Also on this query: In Postgres if `sku` is a primary key, then you can `SELECT` other columns from `invoice_items` without including them in `GROUP BY`, e.g. `SELECT sku, name, supplier_id, etc`. I think even joined stuff is okay: `SELECT sku, suppliers.name`.

- One more nit on this query: I think this is backwards from what he meant?: "having to be sure that whatever you GROUP BY is in your SELECT clause."

- Usually I don't use window functions where a `GROUP BY` will do, but to me the biggest value in window functions is they don't force any particular structure on the overall query, so they don't constrain you the way `GROUP BY` does. He kind of alludes to this when he shows per-sku monthly sales combined with total monthly sales. Good example!
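
Concretely, that combination can look something like this (reusing the article's `invoice_items` table; the `price` column is assumed):

    SELECT sku,
           date_part('month', created_at) AS m,
           sum(price) AS sku_sales,
           sum(sum(price)) OVER (PARTITION BY date_part('month', created_at)) AS month_total
    FROM invoice_items
    GROUP BY sku, date_part('month', created_at);
The window function runs over the grouped rows, so each row carries both its own total and the month's total, without a second query or subselect.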

- I'm not sure his CTE example shows much added value from the CTE, but maybe I'm missing something. The three places I really like CTEs are (1) recursion, (2) with INSERT/UPDATE/DELETE and RETURNING, (3) very big queries. But a lot there depends on taste.

- When you use `string_agg` and friends, you can give an ordering for just that function: `string_agg(downloads.title, ',' ORDER BY products.sku)`. That syntax works for every aggregate function (I believe), although it's especially useful for the *_agg family.

Thanks for sharing these tips! I wish more people knew the cool things you can do beyond SELECT x FROM y.

I've spent some time recently figuring out that SQL is nice. But my issue has always been the same: how do I reconcile that with best practices in other areas?

For example, how do I apply DRY and use SQL? How do I apply type safety principles? How do I avoid syntax errors?

It feels like this ends up with building some sort of ORM.

Anyways, I'd love to hear experiences on interfacing SQL with "good" safety principles in larger-scale projects

> For example, how do I apply DRY and use SQL?

Lots of ways. Defining appropriate views is one of the key ones, but not the only one.
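
To make the views point concrete, a toy example (tables hypothetical): the repeated join/filter logic lives in exactly one place, and every consumer queries the view like a table.

    CREATE VIEW paid_customer_totals AS
      SELECT c.id, c.name, sum(o.total) AS lifetime_value
      FROM customers c
      JOIN orders o ON o.customer_id = c.id
      WHERE o.status = 'paid'
      GROUP BY c.id, c.name;

    SELECT * FROM paid_customer_totals WHERE lifetime_value > 1000;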

> How do I apply type safety principles?

That's too vague of a question to present a concrete answer to.

> It feels like this ends up with building some sort of ORM.

Anytime you are using a database from an external language, you are going to have some code to translate back-and-forth between database objects and operations and application language objects and operations; you might call it an ORM, but if you are using SQL more robustly, it may not look much like most things that go under the ORM label, and some popular ORMs are fairly opinionated in ways which fight effective use of SQL (ActiveRecord in particular at least used to be; I haven't done much with it in quite a while and am not sure if that's still the case.)

When I can do my ideal best, I write a remote facade to my database that provides functions and views on the data as an API, which can be adapted to an ORM or mapped nearly 1:1 to a DAO.

This API lends itself to TDD, and yes, you must test your database components the same as any other code, preferably using database code. For PostgreSQL I really liked pgTAP. Include this testing in CI and other unit test executions.

My sniff-test for a well-done schema is that developing application(s) around it should be clean.

This link is about Redis but I apply the same approach to databases: http://michaelxavier.net/posts/2014-04-27-Cool-Idea-Free-Mon...

The downside is that you need to express the command language, so you lose the ad-hoc querying capability that SQL gives you. But something more structured and more closely corresponding to the domain is generally what I want anyway.

> something more structured and more closely corresponding to the domain

If the needs have changed, the API should be upgraded. It seems like ad-hoc querying is mainly to get around shortcomings in the API, right?

Sometimes the business requirements really are so ad-hoc that you need something as general as SQL (in which case there's no point reimplementing it yourself as an "inner platform").

There's no inner platform. If the business requirements obligate the schema to change, that has a cascade effect on the remote facade and any ORMs using it or DAOs written against it. The database is part of the product, but if you treat it as a foreign component on its own change schedule, I understand how it looks like an inner platform.

I meant in the sense of http://thedailywtf.com/articles/The_Inner-Platform_Effect . If you try to create a platform that has all the power and flexibility of SQL queries, you end up reimplementing SQL, poorly.

Stored procedures, parametrization of queries and views will eliminate many errors.

DRY is the #1 reason I use an ORM. I also think that it's a factor that is rarely if ever considered by people who choose not to use an ORM.

Your database is just about the worst place in the world to be strict about DRY. It turns out, your model and validations don't actually prevent broken data. They just make you think you're safe from it...

...until your broken data starts biting you and your customers in places that don't typically see sunlight.

>Your database is just about the worst place in the world to be strict about DRY.

There are no places in any code base at all where it doesn't pay to be rigorous about DRY.

>It turns out, your model and validations don't actually prevent broken data.

A) DRY isn't about preventing broken data.

B) Validations are for preventing broken data. That's what they do.

DRY is the reason I write a remote facade. Security (e.g. sanitizing inputs) is less-repeated at the database boundary than at myriad application/ORM implementations.

>DRY is the reason I write a remote facade.

Remote facade sounds like a lot of boilerplate to me.

It truly varies. If all you're doing is sanitizing a simple INSERT, it's almost pure boilerplate, but that is not a very interesting or common case.

I'll offer some of the reasoning that got me to building remote facades. The database needed to sanitize its own inputs; we could not trust application developers with that. There would be multiple client languages (Ruby, Python, Java, for starters) and I wanted to keep the domain adapters for those languages as simple as possible. The database would be changing regularly, and many changes could simply be absorbed in the remote facade, so applications depending on it via their client-language libraries would be somewhat insulated.

Now, it wasn't a perfect regime, and nothing is. But it paid the bills for years and my schemas are still in play at that place.

The 'slick' library for Scala ticks a lot of these boxes. It allows you to write type-safe, idiomatic Scala code to query your database and transform results into proper objects, while avoiding the weight of an ORM.

I disagree. It sure gives you all of those things, but it is still a heavy ORM that requires a whole lot of Slick-specific magic to do simple things. You can let it own your table definitions, which is a bad idea if you ever want to update them in place; you can spend a lot of time writing the definitions by hand; or you can use their code generator, which writes code that is very verbose, full of useless comments, and hard to read.

For a large org, Slick also adds to the already significant Scala learning curve for Java devs. Scala itself is good and pretty, but when the first task you give someone new is to write a CRUD-like REST service using Slick on one end and spray on the other, they don't get to spend any time learning actual Scala, but instead domain-specific languages that are hard to reason about without serious Scala experience. No, this is not speculation: we are going through this problem at work right now, with dozens of developers learning Scala, and only a few of us who really started with significant experience.

If I have to use Slick, I use the few direct SQL bits. From my perspective, that is the one saving grace of the library.

We went through a similar migration process and it sounds like we had different experiences. We started with the devs taking some time just to study and learn Scala basics, then added the frameworks on top of that. I think if you go that way and you're using Slick > 2.0, it really starts to look like idiomatic Scala and everything in Slick looks like List manipulations. The other bonus is that a lot of the SQL generated performs well and is easy to read.

Fair enough, thanks for the perspective!

What library would you use instead of slick?

You could try https://github.com/njeuk/dbmapper . I use this for my Scala projects, having rejected Slick and its unwanted data DSL.

In the Java world, check out MyBatis: it lets you write "named" SQL queries in a config file and makes them available to you in Java as a type-safe interface. Been using it recently and love it.

It's been months since I last used JPA, but I'm well beyond 90% sure I did this in JPA as well.

You can. It's just not mundane JPA. Hibernate had the same feature before JPA borrowed it. I was the only person who knew Hibernate well enough to do that. JPA targets normal CRUD->POJO apps for its common case. Under most circumstances you don't need raw SQL for that.

The point of using an ORM is not about avoiding the difficulty of SQL or the ease of switching databases.

ORMs are about code reusability.

First off, validation of input data is painful without ORMs. Converting data between SQL, your language's type system, and the user interface in a reusable way almost necessarily leads to an abstraction which looks like an ORM. Not doing so quickly leads to bugs and vulnerabilities. I can usually tell at a glance if Django ORM code leads to an injection vulnerability or not (hint: unless there is custom/raw SQL, there is no vulnerability). That's harder to do in pure SQL.

Secondly, queries can be modified and introspected in code. Hard to do with SQL query strings, easy to do with ORMs. With the Django ORM you can progressively filter a query using variables from your context, you can modify a base query to get you a count, an aggregate and a list of model instances, and you can introspect a query for the tables it will touch.

Thirdly, an ORM makes queries reusable by different sub systems of your framework. Again in Django, you can pass a Queryset (without evaluating it before it's necessary) to the template system, to the input validation (django forms), to form widgets, the introspected admin, REST frameworks and so on.

Fourthly, ORMs help manage migrations. Migrating to and from different points of migrations can be hard to do in raw SQL or with your own tools. ORMs aren't perfect at this, but SQLAlchemy and the Django ORM both have very good migration tools which save you a lot of time and a lot of frustration, with minimal headache now and then. Migrations are easily testable and easily repeatable.

I also believe that ORMs are more important to the framework than for the individual application. A framework can only survive and grow a strong community if it can satisfy a large base of users. For Django's success it was necessary to support SQLite, Postgres, Mysql and Oracle, whereas single Django projects might thrive without ever changing to and from Postgres.

Switching from mssql to postgres, these are the kinds of things I really loved, where I didn't need to roll my own. In mssql I rolled my own regex, series generation, date formatting, and others. Once you've done it, though, it's there and you forget about the time that went into implementing it.

However there's plenty things I prefer in mssql, but I won't go into them all here.


I didn't want to take away from the focus or be a jerk about it.

Things I prefer in mssql environment:

> Triggers work on entire dataset instead of row by row.

> Stored procs can return multiple data sets and output variables all from a single call.

> Opinion: t-sql is friendlier than pgpsql to work in.

> Backups/restores are more straightforward.

Perhaps I'm missing something, but:

> Triggers work on entire dataset instead of row by row.

in pgsql you can have a trigger that works per row or per statement [1]

> Stored procs can return multiple data sets and output variables all from a single call.

you can provide results through arguments by using "OUT" as argmode [2]
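
e.g. a minimal sketch (the `orders` table here is hypothetical):

    CREATE FUNCTION order_stats(OUT order_count bigint, OUT order_total numeric)
    AS $$ SELECT count(*), coalesce(sum(total), 0) FROM orders; $$
    LANGUAGE sql;

    SELECT * FROM order_stats();  -- one call, two output values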

> Opinion: t-sql is friendlier than pgpsql to work in.

while it is hard to argue with an opinion, you can also use Tcl, Perl, Python (these three come standard), Java, PHP, R, Ruby, Scheme, sh[3].

[1] http://www.postgresql.org/docs/current/static/sql-createtrig...

[2] http://www.postgresql.org/docs/current/static/sql-createfunc...

[3] http://www.postgresql.org/docs/current/interactive/external-...

I'm generally a fan of pgsql, but mssql triggers are better in that they can operate on the entire set of affected rows. Postgres FOR EACH STATEMENT triggers don't get any OLD or NEW; you only get those with FOR EACH ROW triggers. In mssql OLD and NEW are the entire set of affected rows and can be used like tables. Batch operations are thus more efficient. http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.ht...

You haven't gone into any, much less all, here--I for one would be curious to know about what you prefer in mssql.

I'm sure anyone can come up with specific situations where writing SQL is much better than using ORM. If I had infinite time and dev resources I would never use an ORM. In reality Entity Framework makes abstracting my data-layer very easy and increases my productivity. It also makes me feel much more comfortable about switching databases if necessary. Even if I'm taking a 5% performance hit it is totally worth it to me.

> It also makes me feel much more comfortable about switching databases if necessary.

This is one of the big features touted by ORMs, but I wonder how often it actually comes into play. I've been using Rails with ActiveRecord for most of the last 10 years now, and I've not once changed databases after starting a project. There's one project where I'd like to, but despite the use of ActiveRecord, there's enough MySQL-specific stuff in the code that it's not going to be trivial to just copy the data over to a Postgres database and flip the switch.

> but I wonder how often it comes into play.

Databases are not interchangeable, though sharing a common query language makes the skill set somewhat interchangeable. I think this is part of the nuance often lost in the discussion of ORMs and database portability.

I have worked with many developers who actively deny this basic reality, in favor of trying to bury all that variability under an ORM, rather than exploit any of the proprietary features. I understood it as a crutch when I was still in the flat part of that learning curve, but I am a lot happier building a rich SQL application interface (non-ORM!) these days.

> Databases are not interchangeable, though sharing a common query language makes the skill set somewhat interchangeable. I think this is part of the nuance often lost in the discussion of ORMs and database portability.

I see two realistic reasons for why having an ORM/avoiding DB-specific queries is good idea:

- your product supports multiple databases

- you want to use something like sqlite in memory for tests

> your product supports multiple databases

In this case, I tend to see pretty light loads. You can do this with fairly vanilla (ORM-generated) SQL and design. When your business begins to lean heavily on the database for operations (I saw this in finance) around the clock, and you have to start being more judicious about your queries, these applications are usually the first to buckle.

> you want to use something like sqlite in memory for tests

Only if I have to support sqlite in the field!

It's not so much switching databases during a project, but using the same skillset across projects regardless of the database engine used. SQL syntax is different enough across databases to require re-learning it.

> I wonder how often it actually comes into play

I've wondered the same thing in the past, but just recently I converted two Django projects at work from MySQL to PostgreSQL. The transition was pretty much seamless--I didn't have to change any application code.

One of the projects was converted to add spatial capabilities via PostGIS. The other was converted due to an issue with how MySQL stores data (it ended up being easier in a time crunch to dump and reload into PostgreSQL than fix the issue with MySQL).

I made another comment already how it's trivial to start a web project with SQLite and migrate to a 'real' database, or continue to use it interchangeably for local development.

But I also just realized, there is a whole ecosystem of ORM based apps that can run on any of the supported databases. If you ever used an MVC framework and re-used apps, you have "switched databases" without even knowing, depending on what the original developers used.

I'm in the process of moving some of my data storage from oracle to redis. Having already used Linq has made this easier. I had a pretty small data-model though. I think that in a large project switching databases is a large effort with or without ORM.

>I'm sure anyone can come up with specific situations where writing SQL is much better than using ORM.

You are supposed to bypass every ORM sometimes. One measure of a good ORM is whether it makes that easy for you or difficult.

Regex queries are definitely one example of a niche SQL query that I would bypass an ORM altogether in order to use.

95% of my SQL queries are not that, though, and for the very standard ones I use an ORM to heavily restrict the amount of code I have to write and maintain type safety.
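For the curious, a regex predicate in Postgres uses the `~` operator (`~*` for case-insensitive matching); both are Postgres extensions, which is exactly why most ORMs can't express them. A made-up example:

```sql
-- Find SKUs matching a pattern; awkward or impossible through most ORMs
SELECT sku, price
FROM invoice_items
WHERE sku ~* '^[a-z]{3}-[0-9]{4}$';
```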

The problem is that this 5% hit very often becomes a 1000% hit when you let the ORM in the hands of people who don't understand it and they do a thousand queries when a single one would be enough.
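The classic failure mode is the N+1 pattern: a lazily-loading ORM issues one query per parent row instead of a single join. A sketch of what that looks like on the wire (the `invoices` table is hypothetical):

```sql
-- What the ORM ends up running: one query per invoice
SELECT * FROM invoices;
SELECT * FROM invoice_items WHERE invoice_id = 1;
SELECT * FROM invoice_items WHERE invoice_id = 2;
-- ...998 more of these

-- What a single round trip could have done instead
SELECT i.*, it.*
FROM invoices i
JOIN invoice_items it ON it.invoice_id = i.id;
```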

I always hear (one of) the argument for ORMs being "If I want to switch databases...", but in reality, after having questioned many many devs, nobody has ever _actually_ done this.

As one of the few devs who has actually switched databases (SQL Server to Postgres) -- without an ORM -- the biggest timesink was regression testing sanely and exporting/importing the data. Which you would have to do with an ORM anyway. The amount of time I spent re-writing a total of ~20 non-ANSI SQL queries and re-creating a handful of views (some of which were quite complex) was no more than 3 hours. And that would have gone by much faster had I been more familiar with Postgres at the time.

> regression testing sanely and exporting/importing the data. Which you would have to do with an ORM anyway

You can export data into ORM specific, database agnostic format, such as Django fixtures or SQLAlchemy fixtures, and test against the ORM. This is actually one of the key ORM features to leverage.

Pretty much every other Django project I work on starts with SQLite on a developer's machine and only later gets migrated to PostgreSQL. I'm also dealing with a legacy MySQL website that didn't use an ORM, and I really wish it had. They hard-coded column names and relationships in absolutely the most random places. Yes, this could be fixed by having better programmers, but if they had used an ORM I could simply ignore a whole class of legacy!

I dunno, I've seen ORM absolutely abused. Never underestimate the effect a tool can have in the hands of a clueless person.

That said, there are probably fewer SQL injections in ORM code than typical mysql messes.

My previous company is actually in the process of switching from Oracle to MySQL. That said doing so is of nightmarish proportions because they have 10-15 years of legacy behind them of building their own ORM, using a modern one and just raw Oracle-SQL (no ANSI joins there!) to either replace, rewrite or drop.

Oracle -> MySQL? You have my most profound sympathies. Even before the ORM nonsense.

We switched from MySQL to Postgres on our analytics database and it would have been a lot easier with an ORM.

How do you know it would have been easier with ORM?

I'm looking for a single example of ORM-based project switching RDBMS. It looks like it almost never happens.

Agree that ORMs like Entity Framework are very good at taking basic SELECT and INSERT/UPDATE statements and jamming them into objects without a lot of code. However, consider that perhaps your position is a judgement of how few features of a database system you have needed to use, and not necessarily a judgement on the robustness of ORMs.

I have found that ORMs get you up and running more quickly, but that time saved is eventually given back working through ORM performance issues. As someone who has written complex SQL for many years, it is simply faster and easier to just write optimized queries up front. Something like jOOq makes this nice with typing and some minimal object mapping.

That's ultimately true of any framework, but ORMs are a beast all their own. Having a light abstraction framework on top of SQL, that ultimately respects it, makes a huge difference at the end of the day. No magic, just results. I've watched coworkers spend days tracking down obscure performance issues related to mapping in Hibernate that just wouldn't occur when using something like jOOQ. The only overhead I get is possibly having some DTOs to map into via a simple interface. Hardly more effort than one would expend configuring Hibernate mappings what with its myriad options.

I find the opposite to be true. I have to learn a whole bunch of new methods and concepts to use an ORM like Doctrine 2, and even a whole new query language for when the basics fail, which is most of the time. I can't imagine a way to make development slower than adding an ORM.

Using distinct and windowing functions to avoid writing group-by clauses may work fine with Postgres but is a very ugly hack.

This ugly hack is not useful in general and only looks interesting here because the original example query is in bad taste. Simplifying it removes the tedium motivating the use of window functions.

For example, this is equivalent to the original query:

  SELECT sku, sum(price)
  FROM invoice_items
  WHERE date_part('month',created_at) = 9
  GROUP BY sku
See how the tedium disappears? In the original, the HAVING clause doesn't use any aggregate functions, which means it can instead be a WHERE. Removing the HAVING means we can further simplify the GROUP BY.

Of course, that only works because this example needs only a single month group. If you want multiple months and per-month price totals, then my rewritten query is no longer equivalent. It would need the more complicated GROUP BY. However, the window function version would also get more complicated in just the same way. The complication is inherent to the problem, not due to standards conformance.

Actually, what's really tedious is the verbose expression required to get the month and the fact that the author is repeating it. The standard provides "WITH" to avoid that duplication, and Postgres implements it.

Here's a reasonable version for multiple months:

  WITH items AS (
    SELECT *, date_part('month',created_at) AS month
    FROM invoice_items
  )
  SELECT sku, month, sum(price)
  FROM items
  WHERE month in (7,8,9)
  GROUP BY sku, month
But suppose we make the example even more complicated. Suppose you want sums over different time periods, like per-month and per-quarter. Then you can't use a GROUP BY and must use window functions and DISTINCT.

  WITH items AS (
    SELECT *
          ,date_part('year',created_at)    AS year
          ,date_part('month',created_at)   AS month
          ,date_part('quarter',created_at) AS quarter
    FROM invoice_items
  )
  SELECT DISTINCT year, month, quarter
        ,sum(price) OVER (PARTITION BY year)          AS yearly_total
        ,sum(price) OVER (PARTITION BY year, month)   AS monthly_total
        ,sum(price) OVER (PARTITION BY year, quarter) AS quarterly_total
        ,sum(price) OVER (PARTITION BY month)         AS month_number_total
        ,sum(price) OVER (PARTITION BY quarter)       AS quarter_number_total
  FROM items
  ORDER BY 1,2,3,4

This "hack" is also supported by Oracle and SQL Server.

SQLAlchemy can express the SQL from the author's post, so he probably just didn't try good ORMs.

SQL is nice but most of those features work best when the user is a finance person doing reports (which was the original purpose of SQL, I think?). Developers don't want the SQL server to format their numbers as money; that's work for the UI. The main reason for me to use an ORM is to standardize on one way to retrieve data whatever the database engine is. Maybe an intermediate solution would be better, like a query builder or an SQL-like syntax that transforms to the correct SQL for the database in use, like Doctrine's DQL.

My favorite query builder for SQL is jOOQ, works on the JVM. It has a great, fluent syntax and lets you get real objects back while writing what is essentially SQL.


I second jOOQ. The fluent interface actually catches SQL errors at compile time (brief example is in the wikipedia entry[1]).

[1] http://en.wikipedia.org/wiki/Java_Object_Oriented_Querying

> whatever the database engine is

I know, but... How often does that flexibility come up? How often do you write code first against one DB type, and then switch to a different one that would require a re-write?

It's admirable and fun to think about, for sure, but how much use does that feature actually get?

> It's admirable and fun to think about, for sure, but how much use does that feature actually get?

(Lukas from jOOQ here)

Surprisingly, that feature gets quite a bit of use. There are a few use-cases (among many others) where our customers choose jOOQ:

- They're using a test database (e.g. H2) and a production database (e.g. Oracle), and things still work very nicely, while staying close to SQL

- They're using a client database (e.g. HSQLDB) and a server database (e.g. SQL Server), and can easily replicate without needing to remember subtle syntax differences all the time.

- They're selling complex products that run on as many databases as DB2, MySQL, Oracle, SQL Server, Sybase and they want to use SQL, because their queries have an average of 10 joins each.

I think the advantage isn't so much the ability to switch, but the luxury of not having to know anything specific about the database you are interfacing with. This is especially useful in heterogeneous environments (like my workplace).

Unfortunately to do any real optimizations or something even minimally complex you have to know the underlying rdbms. For example, techniques that work well in Oracle may not work well in MSSQL. If you throw mysql in there, then you have an entire set of standard sql that simply does not work or works but with horrible performance.

I already said this in another thread, it's not about switching databases in one project, but using the same skillset across projects regardless of the database.

I find SQL is pretty portable. You have to learn each server's ins and outs to do performance work, but then, those differences are part and parcel of why there are different servers, beyond basic business-competitive reasons.

ORMs just add a thick layer of fluff over an otherwise (mostly) portable language, where I'm concerned. I rather keep it simple between DBs and apps. If there is gross SQL in the app, make a view or function in the database. It isn't rocket surgery but you do need experienced staff to get it right.
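E.g., instead of embedding a messy query string in the app, give it a stable name on the database side (names here are invented):

```sql
CREATE VIEW monthly_sku_totals AS
SELECT sku
      ,date_part('month', created_at) AS month
      ,sum(price) AS total
FROM invoice_items
GROUP BY sku, month;

-- The app-side SQL shrinks to:
-- SELECT * FROM monthly_sku_totals WHERE month = 9;
```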

Can you use the same skillset across projects regardless of the ORM, or do you have to standardize on one? Why not just standardize on one database?
