
Embracing SQL in Postgres - GarethX
http://rob.conery.io/2015/02/24/embracing-sql-in-postgres/
======
knodi123
My CTO wrote custom PostgreSQL functions to generate all the JSON that our API
returns. First time I saw it, I said "Holy crap, that query is two pages long.
You could just tell Rails to serialize the object in 3 lines of code." He
replied "I know. That's how we did it at first. But generating the JSON
directly in postgres is about 700x faster."

Eventually we even moved a very complex scoring algorithm from an overnight
Hadoop Java process, into Postgres as a compiled extension in C. We used to
spend all night calculating those scoring results and caching them in an
enormous table. But with the Postgres+C solution, we could calculate and store
it _in realtime_ by having Postgres do the hard part.
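
A minimal sketch of the JSON-generation idea (table and column names here are hypothetical, not from the actual API) — Postgres can build the whole payload itself:

    -- One round trip, no application-side serializer:
    -- row_to_json + json_agg assemble the API response in the database.
    SELECT json_agg(row_to_json(p))
    FROM (
      SELECT sku, title, price
      FROM products
      WHERE price > 100.00
    ) p;

`row_to_json` and `json_agg` are built in (9.2/9.3+), which is a big part of why skipping the app-layer serializer is so fast.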

~~~
inopinatus
You can skip the web tier entirely and have Pg return JSON documents via an
HTTP endpoint to a rich MVC js browser application that syncs to a local
cache.

Thus re-inventing Lotus Notes.

~~~
spacemanmatt
I like OpenResty for such endpoints; YMMV.

But I refuse. REFUSE. to re-invent Notes.

~~~
lgas
I haven't used it yet but postgrest looks awesome for this.

[https://github.com/begriffs/postgrest](https://github.com/begriffs/postgrest)

~~~
spacemanmatt
That looks like a good tool, but have you met nginx before? I sorta view
OpenResty as a reason to sneak nginx into my stack. :)

------
ibejoeb
Very happy to see all of this talk over the past few months on the power of
modern--or anything beyond basic--SQL.

This article touches on one of my favorite things, too, which is built-in date
and time arithmetic. That's something that basically just works and comes for
free in the database but is often a hodgepodge of messy third-party libraries
elsewhere.

The other enormous one, for me at least, is arbitrary-precision math. Proper
precision, rounding, division, and exponentiation aren't easy to come by. I'm
so tired of dealing with libmpdec, BigDecimal (flimsy as it is), GMP, etc. It
all works well in the database and it's pretty fast. Perhaps it's not suited
to 1000-digit-scale scientific computing, but it sure does make building
financial software easy.
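
A quick sketch of both points in Postgres:

    -- Date/time arithmetic: intervals just work.
    SELECT now() + interval '3 months' - interval '2 days';

    -- Arbitrary-precision math: numeric keeps exact decimals and
    -- rounds explicitly, which is what financial code wants.
    SELECT round(1234.5678::numeric / 3, 2);

No third-party libraries, no binary-float surprises.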

------
p_l
In the Ruby world, Sequel
([http://sequel.jeremyevans.net/](http://sequel.jeremyevans.net/)) probably
gives the best of both ORM and direct SQL use, because what it really
provides is an object-oriented ("native" to the language) interface to the
SQL database. And yes, it can build models like ActiveRecord, too!

In Common Lisp, postmodern
([http://marijnhaverbeke.nl/postmodern/](http://marijnhaverbeke.nl/postmodern/))
serves a similar role, with both DAO and lower-level access methods that
compose pretty well. The 'perec' ORM from the DWIM project was based on
Postmodern, and among the weirder things I did quite easily was embedding
custom SQL into the query builder, so that the object-based accesses would use
the advanced SQL. Really nifty feature :)

------
pjc50
_One thing that drives me absolutely over the cliff is how ORMs try so hard
(and fail) to abstract the power and expressiveness of SQL._

I agree with this. It's nice to deserialise a DB row into objects, but
actually _querying_ with an ORM is usually terrible and people should not be
afraid to write smart queries. Especially like those in sibling comment
[https://news.ycombinator.com/item?id=9112036](https://news.ycombinator.com/item?id=9112036)
: date/time/arithmetic.

~~~
Terr_
> actually querying with an ORM is usually terrible

What do you think of stuff like Hibernate Query Language? I'm currently
working on a project that might use DQL (a PHP variation) to get our snarl of
tables under control.

~~~
yessql
The fact that Hibernate Query Language even exists tells you that ORM is a bad
idea.

~~~
threeseed
That makes absolutely no sense.

The whole point of an ORM is that it is database-agnostic. Given how different
databases twist the SQL standard and provide a whole raft of proprietary
extensions, you need an abstracted query layer.

Whether or not you give it a name like HQL, there will always be some layer.

~~~
akmiller
I hope that's not the whole point of an ORM, given how pervasive they seem to
be in every project and how rarely any given project ever switches its backend
out. Choosing an ORM because it abstracts you away from the storage engine
seems like a terrible reason.

I agree with Conery, choose the components that will compose your application
wisely and then embrace those choices and use them to their fullest.

------
ptype
Even though I think it's important for developers to have some understanding
of the underlying SQL, I still believe a good ORM can save time and make code
more readable. I think SQLAlchemy does a good job here - and it's close enough
to SQL such that you are typically not surprised by the SQL it produces.

For more complicated queries, I often hand write them in SQL first and then
translate them into SQLAlchemy's methods. It's still worth it, since it's both
easier to build dynamic queries (e.g. dynamic WHERE clauses) and it makes the
parameterisation trivial.

My problem with stored procedures is that you can end up with a lot of
business code in the db.

~~~
marcosdumay
Why is it that the DB is the correct place for business data, but not for
business code?

Of course, for practical reasons it's important to keep the DB lean. But
some code really does belong together with the data.

~~~
threeseed
Sorry but putting business code in databases is a terrible, terrible idea.

(a) Oracle, Microsoft, Teradata etc. all charge through the roof for scaling
out your database, which of course you're going to need to do if you're adding
more and more business code. Scaling out normal code? Cheap.

(b) What happens if you exceed the capabilities of your database (yes it
happens). Then you have a major project on your hands migrating both data and
code.

(c) The platform for developing/executing your business code in the database
is like going back to the 1980s compared to the flexible, microservices
world we live in today. Really, stored procedures over NodeJS, Scala, Go etc.?
Performance isn't that important in businesses where most processes are batch
oriented.

(d) Databases in businesses are as tightly controlled as they get. As a
developer, would you want to have to go through laborious change management
processes every time you push a commit? It would cripple software development
teams.

There is a reason why "data lakes" are all the rage right now: people need
cheap places to store and process their data, and then use their SQL
database simply for querying and reporting.

~~~
marcosdumay
(a) Yes, databases scale badly. You don't want to run your entire application
there. Yet code that reads a huge volume of data, calculates something based
on parameters, and writes it back will put more load on your DB servers if
it's run from the application layer, since all that data has to cross the wire
in both directions. Code that enforces the consistency of the data needs a
strongly enforced policy if you keep it anywhere else, but works completely
transparently if you place it in the database. Code that denormalizes the
data is much easier to use if you can run it directly in your queries...

(b) Been there, done that. Migrating code is EASY. Data is what's hard.

(c) Is that a joke? Honestly, I can't tell.

(d) That's more a reason to integrate those teams than to choose one place
over the other. The programmers can disrupt the database in several ways
without putting code there, and don't get any extra power by running their
code on a different server.

By the way, the data that was famously leaking recently was mostly from email
and file servers...
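
For what it's worth, the simplest form of "consistency code in the database" is declarative constraints, which apply no matter which application writes the rows. A sketch with hypothetical tables:

    CREATE TABLE accounts (
      id      serial PRIMARY KEY,
      balance numeric(12,2) NOT NULL CHECK (balance >= 0)  -- no overdrafts, ever
    );

    CREATE TABLE transfers (
      from_id integer NOT NULL REFERENCES accounts(id),
      to_id   integer NOT NULL REFERENCES accounts(id),
      amount  numeric(12,2) NOT NULL CHECK (amount > 0),
      CHECK (from_id <> to_id)  -- can't transfer to yourself
    );

Any app-layer validation enforcing the same rules has to be duplicated in every client that touches the tables; here it can't be bypassed.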

------
zvikara
For context, the author Rob Conery wrote SubSonic, one of the first
open-source ORMs for .NET, which was very popular at the time.
[http://en.wikipedia.org/wiki/SubSonic_(software)](http://en.wikipedia.org/wiki/SubSonic_\(software\))

------
pjmlp
I never got sold on ORMs for anything more complex than plain CRUD; I'd
rather use the expressiveness and performance of plain SQL and its associated
procedural extensions.

~~~
mercurial
One thing you get with a good ORM is composability, though. Trying to build
dynamic SQL queries with raw strings is a nightmare.

Another, less useful property, is a considerable reduction of boilerplate when
persisting a complex object graph.

~~~
pjmlp
> Trying to build dynamic SQL queries with raw strings is a nightmare.

Create the optimized version in a SQL editor, then use the builder pattern to
fill in the holes in the query.

Much less boilerplate than any ORM tool generates.
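
A sketch of that workflow against a hypothetical `products` table: the tuned query keeps placeholder "holes" that get filled at execution time, e.g. as a server-side prepared statement:

    -- Tuned once in a SQL editor, then parameterised:
    PREPARE expensive_products(numeric) AS
      SELECT sku, title
      FROM products
      WHERE price > $1
      ORDER BY price DESC;

    EXECUTE expensive_products(100.00);

The same shape works with client-side placeholders from any driver; either way the query text stays hand-written and optimizable.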

> Another, less useful property, is a considerable reduction of boilerplate
> when persisting a complex object graph.

That depends on how expressive the application language's metaprogramming
capabilities are.

The thing is, if I have to fine-tune the ORM generated SQL for optimal
performance, its value diminishes.

~~~
mercurial
> Much less boilerplate than any ORM tool generates.

I disagree. The SQLAlchemy query language is very nice, for instance.

> That depends on how expressive the application language's metaprogramming
> capabilities are.

Certainly, but metaprogramming doesn't necessarily rhyme with readability.

> The thing is, if I have to fine-tune the ORM generated SQL for optimal
> performance, its value diminishes.

Of course. However: the SQL generated will be fine for the majority of
queries. What's going to bite is the use of lazy collections. For complex
queries, you may want to fine-tune it by hand (though not necessarily).

That said, I rather agree that ORMs are best avoided. The existence of lazy
collections makes it very easy to turn what ought to be a straightforward,
fast query into something slow. What I'd like to work with is a query builder
for select/insert, and avoid magic, even at the cost of creating awkward
structures like PersonWithAddress and PersonWithAddressAndBankAccount by hand.

~~~
sanderjd
Yeah, query-building languages with good composability are a sweet spot. In
Ruby, there is arel, which describes itself as "a relational algebra" rather
than an ORM, and which is actually how activerecord, which is absolutely an
ORM, is implemented. It's possible, even easy, though not particularly
idiomatic, to drop down into arel for more complex queries.

------
humbertomn
Really happy to see some SQL appreciation. Of course ORMs can make code easier
to read and allow you to change databases quickly... but I've always asked
myself whether these two points are enough to justify the loss in performance
and the use (in some cases) of really dodgy reimplementations of things that
are already beautifully implemented by many databases.

------
dorfsmay
Another issue I have with ORMs is different projects end up with different
views of the same database.

Why don't you use a foreign key to that table? We don't have that table... Yes
we do; just because you haven't defined it in your project doesn't mean it
doesn't exist!

And... every project ends up with its own version of the "versioned schema",
so to rebuild the DB you need to aggregate all the Liquibase/Flyway files
from all the projects and run them all.

When I wear my sysadmin/pseudo-DBA hat, ORMs are one of my more prominent
pains.

~~~
habitue
I think this pain comes from using SQL databases for integration, which is an
antipattern in my book, and prevalent in the enterprise. Databases should have
a single application talking to them, with a service in front of them that
makes it possible to get the data needed out. Services are easier to maintain
backwards compatibility with, whereas databases are a massive pain to version
and migrate, and having to sync multiple teams to do such a job is a
nightmare.

That being said, I realize it's often much more convenient at the beginning to
have one database for integration. SQL is much more expressive than your
service's interface will be, and in the beginning that seems like a killer
feature.

~~~
sanderjd
I've yet to see the real payoff of a database service like you describe over
simply having a central database _library_. Perhaps the payoff comes if you
want to use multiple languages, but if you don't, then pulling all the schema
versioning and db-related client code into a single library that is used
anywhere it's needed solves all the problems your parent brought up.

~~~
dorfsmay
I haven't seen any large enough entity using a single programming language
across teams in a long time!

~~~
sanderjd
Fair point!

------
pjungwir
These are some very nice tips! One of the things I like so much about
ActiveRecord is that it plays nice with pretty much every example here, e.g.
`Product.where("title ~* 'title'")` or `Product.where("title ~* ?", title)`. A
few other remarks:

\- I'm so glad to see someone talking about `generate_series` for time-series
reports. I often see queries that are `GROUP BY date`, which is _usually_
okay, but will skip an output row if you have no data for a certain day. If
you want to force one-row-per-day, use `generate_series`.

\- His `GROUP BY` query can be rewritten with less repetition like this:

    
    
        SELECT  sku,
                SUM(price),
                date_part('month', created_at) m
        FROM invoice_items
        GROUP BY sku, m
        HAVING date_part('month', created_at) = 9
    

In other words, `GROUP BY` can reference expressions that are aliased in your
`SELECT`. (`HAVING` can't, since it is evaluated before the `SELECT` list, and
any non-aggregated column such as `sku` still has to appear in the `GROUP BY`.)

\- Also on this query: In Postgres, if `sku` is a primary key, then you can
`SELECT` other columns from `invoice_items` without including them in `GROUP
BY`, e.g. `SELECT sku, name, supplier_id, etc`. Joined stuff is another story:
Postgres only recognizes the functional dependency within the table whose
primary key you grouped by, so `SELECT sku, suppliers.name` would still need
`suppliers.name` (or the suppliers primary key) in the `GROUP BY`.

\- One more nit on this query: I think this is backwards from what he meant?:
"having to be sure that whatever you GROUP BY is in your SELECT clause."

\- Usually I don't use window functions where a `GROUP BY` will do, but to me
the biggest value in window functions is they don't force any particular
structure on the overall query, so they don't constrain you the way `GROUP BY`
does. He kind of alludes to this when he shows per-sku monthly sales combined
with total monthly sales. Good example!

\- I'm not sure his CTE example shows much added value from the CTE, but maybe
I'm missing something. The three places I really like CTEs are (1) recursion,
(2) with INSERT/UPDATE/DELETE and RETURNING, (3) very big queries. But a lot
there depends on taste.

\- When you use `string_agg` and friends, you can give an ordering for just
that function: `string_agg(downloads.title, ',' ORDER BY products.sku)`. That
syntax works for every aggregate function (I believe), although it's
especially useful for the *_agg family.

Thanks for sharing these tips! I wish more people knew the cool things you can
do beyond SELECT x FROM y.
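
To make the `generate_series` point concrete, a sketch against a hypothetical `orders` table that forces one row per day, zero-filled:

    -- LEFT JOIN a generated calendar against the data so empty
    -- days show up as 0 instead of disappearing from the report.
    SELECT d::date AS day,
           COALESCE(SUM(o.total), 0) AS sales
    FROM generate_series('2015-02-01'::date,
                         '2015-02-28'::date,
                         interval '1 day') AS d
    LEFT JOIN orders o ON o.created_at::date = d::date
    GROUP BY d
    ORDER BY d;

A plain `GROUP BY created_at::date` over the same data would silently drop any day with no orders.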

------
rtpg
I've spent some time recently figuring out that SQL is nice. But my issue has
always been the same: how do I reconcile that with best practices in other
things?

For example, how do I apply DRY and still use SQL? How do I apply type-safety
principles? How do I avoid syntax errors?

It feels like this ends up with building some sort of ORM.

Anyway, I'd love to hear experiences on interfacing SQL with "good" safety
principles in larger-scale projects.

~~~
crdoconnor
DRY is the #1 reason I use an ORM. I also think that it's a factor that is
rarely if ever considered by people who choose not to use an ORM.

~~~
rosser
Your database is just about the worst place in the world to be strict about
DRY. It turns out, your model and validations don't actually prevent broken
data. They just make you think you're safe from it...

...until your broken data starts biting you and your customers in places that
don't typically see sunlight.

~~~
crdoconnor
>Your database is just about the worst place in the world to be strict about
DRY.

There are no places in any code base _at all_ where it doesn't pay to be
rigorous about DRY.

>It turns out, your model and validations don't actually prevent broken data.

A) DRY isn't about preventing broken data.

B) Validations are _for_ preventing broken data. That's what they _do_.

------
bayesianhorse
The point of using an ORM is not about avoiding the difficulty of SQL or the
ease of switching databases.

ORMs are about code reusability.

First off, validation of input data is painful without ORMs. Converting data
between SQL, your language's type system, and the user interface in a reusable
way almost necessarily leads to an abstraction that looks like an ORM. Not
doing so quickly leads to bugs and vulnerabilities. I can usually tell at a
glance whether Django ORM code leads to an injection vulnerability (hint:
unless there is custom/raw SQL, there is no vulnerability). That's harder to
do in pure SQL.

Secondly, queries can be modified and introspected in code. Hard to do with
SQL query strings, easy to do with ORMs. With the Django ORM you can
progressively filter a query using variables from your context, you can modify
a base query to get you a count, an aggregate and a list of model instances,
and you can introspect a query for the tables it will touch.

Thirdly, an ORM makes queries reusable by different sub systems of your
framework. Again in Django, you can pass a Queryset (without evaluating it
before it's necessary) to the template system, to the input validation (django
forms), to form widgets, the introspected admin, REST frameworks and so on.

Fourthly, ORMs help manage migrations. Migrating to and from different
migration points can be hard to do in raw SQL or with your own tools. ORMs
aren't perfect at this, but SQLAlchemy and the Django ORM both have very good
migration tools which save you a lot of time and frustration, with only
minimal headaches now and then. Migrations are easily testable and easily
repeatable.

I also believe that ORMs are more important to the framework than to the
individual application. A framework can only survive and grow a strong
community if it can satisfy a large base of users. For Django's success it was
necessary to support SQLite, Postgres, MySQL and Oracle, whereas a single
Django project might thrive without ever switching to or from Postgres.

------
buckbova
Switching from mssql to postgres, these are the kinds of things I really loved
not needing to roll my own. In mssql I rolled my own regex, series
generation, date formatting, and more. Once you've done it, it's there,
though, and you forget about the time that went into implementing it.

That said, there are plenty of things I prefer in mssql, but I won't go into
them all here.

Edit:

I didn't want to take away from the focus or be a jerk about it.

Things I prefer in mssql environment:

> Triggers work on entire dataset instead of row by row.

> Stored procs can return multiple data sets and output variables all from a
> single call.

> Opinion: t-sql is friendlier than plpgsql to work in.

> Backups/restores are more straightforward.

~~~
takeda
Perhaps I'm missing something, but:

> Triggers work on entire dataset instead of row by row.

in pgsql you can have a trigger that fires per row or per statement [1]

> Stored procs can return multiple data sets and output variables all from a
> single call.

you can provide results through arguments by using "OUT" as argmode [2]

> Opinion: t-sql is friendlier than plpgsql to work in.

while it is hard to argue with an opinion, you can also write your functions
in Tcl, Perl, Python (these three come standard), Java, PHP, R, Ruby, Scheme,
or sh [3].

[1] [http://www.postgresql.org/docs/current/static/sql-createtrigger.html](http://www.postgresql.org/docs/current/static/sql-createtrigger.html)

[2] [http://www.postgresql.org/docs/current/static/sql-createfunction.html](http://www.postgresql.org/docs/current/static/sql-createfunction.html)

[3] [http://www.postgresql.org/docs/current/interactive/external-pl.html](http://www.postgresql.org/docs/current/interactive/external-pl.html)
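
A quick sketch of the OUT-argument point, against a hypothetical `orders` table — one call, several values back:

    -- OUT parameters let a single function call return multiple values.
    CREATE FUNCTION order_stats(OUT n_orders bigint, OUT grand_total numeric)
    AS $$
      SELECT count(*), COALESCE(sum(o.total), 0)
      FROM orders o;
    $$ LANGUAGE sql;

    -- Both OUT values come back as columns of a single row:
    SELECT * FROM order_stats();

(Multiple independent result sets per call are a different story; in Postgres that's typically done with refcursors.)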

~~~
brlewis
I'm generally a fan of pgsql, but mssql triggers are better in that they can
operate on the entire set of affected rows. Postgres FOR EACH STATEMENT
triggers don't get any OLD or NEW; you only get those with FOR EACH ROW
triggers. In mssql the equivalent pseudo-tables (`inserted` and `deleted`)
contain the entire set of affected rows and can be used like tables. Batch
operations are thus more efficient.
[http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html](http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html)

------
rondon2
I'm sure anyone can come up with specific situations where writing SQL is much
better than using ORM. If I had infinite time and dev resources I would never
use an ORM. In reality Entity Framework makes abstracting my data-layer very
easy and increases my productivity. It also makes me feel much more
comfortable about switching databases if necessary. Even if I'm taking a 5%
performance hit it is totally worth it to me.

~~~
jon-wood
> It also makes me feel much more comfortable about switching databases if
> necessary.

This is one of the big features touted by ORMs, but I wonder how often it
actually comes into play. I've been using Rails with ActiveRecord for most of
the last 10 years now, and I've not once changed databases after starting a
project. There's one project where I'd like to, but despite the use of
ActiveRecord, there's enough MySQL-specific stuff in the code that it's not
going to be trivial to just copy the data over to a Postgres database and flip
the switch.

~~~
spacemanmatt
> but I wonder how often it comes into play.

Databases are not interchangeable, though sharing a common query language
makes the skill set somewhat interchangeable. I think this is part of the
nuance often lost in the discussion of ORMs and database portability.

I have worked with many developers who actively deny this basic reality, in
favor of trying to bury all that variability under an ORM, rather than exploit
any of the proprietary features. I understood it as a crutch when I was still
in the flat part of that learning curve, but I am a lot happier building a
rich SQL application interface (non-ORM!) these days.

~~~
mercurial
> Databases are not interchangeable, though sharing a common query language
> makes the skill set somewhat interchangeable. I think this is part of the
> nuance often lost in the discussion of ORMs and database portability.

I see two realistic reasons why having an ORM/avoiding DB-specific queries
is a good idea:

\- your product supports multiple databases

\- you want to use something like sqlite in memory for tests

~~~
spacemanmatt
> your product supports multiple databases

In this case, I tend to see pretty light loads. You can do this with fairly
vanilla (ORM-generated) SQL and design. When your business begins to lean
heavily on the database for operations (I saw this in finance) around the
clock, and you have to start being more judicious about your queries, these
applications are usually the first to buckle.

> you want to use something like sqlite in memory for tests

Only if I have to support sqlite in the field!

------
bifel
Using distinct and windowing functions to avoid writing group-by clauses may
work fine with Postgres but is a very ugly hack.

~~~
adamtj
This ugly hack is not useful in general and only looks interesting here
because the original example query is in bad taste. Simplifying it removes the
tedium motivating the use of window functions.

For example, this is equivalent to the original query:

    
    
      SELECT sku, sum(price)
      FROM invoice_items
      WHERE date_part('month',created_at) = 9
      GROUP BY sku
    

See how the tedium disappears? In the original, the HAVING clause doesn't use
any aggregate functions, which means it can instead be a WHERE. Removing the
HAVING means we can further simplify the GROUP BY.

Of course, that only works because this example needs only a single month
group. If you want multiple months and per-month price totals, then my
rewritten query is no longer equivalent. It would need the more complicated
GROUP BY. However, the window function version would also get more complicated
in just the same way. The complication is inherent to the problem, not due to
standards conformance.

Actually, what's really tedious is the verbose expression required to get the
month and the fact that the author is repeating it. The standard provides
"WITH" to avoid that duplication, and Postgres implements it.

Here's a reasonable version for multiple months:

    
    
      WITH items AS (
        SELECT *, date_part('month',created_at) AS month
        FROM invoice_items
      )
      SELECT sku, month, sum(price)
      FROM items
      WHERE month in (7,8,9)
      GROUP BY sku, month
    

But suppose we make the example even more complicated. Suppose you want sums
over different time periods, like per-month and per-quarter. Then you _can't_
use a GROUP BY and _must_ use window functions and DISTINCT.

    
    
      WITH items AS (
        SELECT
            *
            ,date_part('year',created_at)    AS year
            ,date_part('month',created_at)   AS month
            ,date_part('quarter',created_at) AS quarter
        FROM invoice_items
      )
      SELECT DISTINCT
          sku
          ,year
          ,month
          ,quarter
          ,sum(price) OVER (PARTITION BY year)          AS yearly_total
          ,sum(price) OVER (PARTITION BY year, month)   AS monthly_total
          ,sum(price) OVER (PARTITION BY year, quarter) AS quarterly_total
          ,sum(price) OVER (PARTITION BY month)         AS month_number_total
          ,sum(price) OVER (PARTITION BY quarter)       AS quarter_number_total
      FROM items
      ORDER BY 1,2,3,4

------
k_bx
SQLAlchemy can express the SQL from the author's post, so probably he just
hasn't tried good ORMs.

------
towelguy
SQL is nice, but most of those features work best when the user is a finance
person doing reports (which was the original purpose of SQL, I think?).
Developers don't want the SQL server to format their numbers as money; that's
work for the UI. The main reason for me to use an ORM is to standardize on
one way to retrieve data whatever the database engine is. Maybe an
intermediate solution would be better, like a query builder or a SQL-like
syntax that transforms to the correct SQL for the database in use, like
Doctrine's DQL.

~~~
RangerScience
> whatever the database engine is

I know, but... How often does that flexibility come up? How often do you
write code first against one DB type, and then switch to a different one that
would require a re-write?

It's admirable and fun to think about, for sure, but how much use does that
feature actually get?

~~~
towelguy
I already said this in another thread, it's not about switching databases in
one project, but using the same skillset across projects regardless of the
database.

~~~
spacemanmatt
I find SQL is pretty portable. You have to learn each server's ins and outs to
do performance work, but then, those differences are part and parcel of _why_
there are different servers, beyond basic competitive business reasons.

ORMs just add a thick layer of fluff over an otherwise (mostly) portable
language, as far as I'm concerned. I'd rather keep it simple between DBs and
apps. If there is gross SQL in the app, make a view or function in the
database. It isn't rocket surgery, but you do need experienced staff to get it
right.

