
Literate SQL - mshenfield
http://modern-sql.com/use-case/literate-sql
======
caseymarquis
It's critical to have a good understanding of SQL, but once you do, ORMs with
a functional syntax solve this problem for many of us. I know it's not
possible for some people to use ORMs as they can't risk the ORM making a
performance mistake, but I think the solution is to improve ORMs to the point
that writing raw SQL is akin to writing assembly instead of using a higher
level language.

That said, we're not there yet. Using an ORM in 2017 without understanding
what SQL you want it to produce IS NOT OKAY. Please learn SQL if you're going
to use an ORM, and understand the SQL you are producing with the ORM.

Summed up, if you can, use an ORM to solve this problem.

~~~
enjoiful
I love Django -- but what it does is mostly magic to me. That being said...
I'm fairly competent in being effective in Django -- although not for advanced
and efficient querying.

I want to get strong in SQL -- where/how do I start?

~~~
3131s
Write raw SQL queries. My feeling on the Django ORM is that it's useful for
defining models and managing migrations, but I prefer to write raw SQL for all
queries.

The PostgreSQL documentation is really good too.

~~~
ubernostrum
Meh. My feeling is that Django and other web frameworks are too magical -- I
always worry they might be doing something inefficient. That's why I write all
my web applications in hand-tuned assembly. It even has reusable abstractions
for when I need them, and nice hand-formatted assembly is super readable!

~~~
3131s
You're seriously comparing SQL to assembly? There's a point where it doesn't
make sense to add another layer, another dependency, etc.

~~~
ubernostrum
In both the "never use an ORM" and "always use assembly" cases, the argument
is based entirely on having full control of what's going on and not trusting
any intermediate layer to get it right or eke out every last femtosecond worth
of performance.

So it's a perfectly fair comparison.

------
nerdponx
I wish CTEs ("common table expression", i.e. a "with clause") had the same
performance as a subquery in PSQL. I always assumed they'd be implemented as a
kind of macro that expanded to a subquery. Is there a good reason for this
distinction?

For that matter, having some kind of SQL-oriented macro/preprocessor language
would be fantastic. I guess GPP (General Preprocessor,
[https://logological.org/gpp](https://logological.org/gpp)) is always an
option.

~~~
gjvc
For the uninitiated, what is GPP ?

~~~
okket
Closest guess: General-purpose PreProcessor

[https://github.com/logological/gpp/](https://github.com/logological/gpp/)

~~~
nerdponx
That's it, sorry I wasn't clear.

------
bayonetz
Love the literate sql style. A lot of times we have to use temp tables instead
of with statements though. In Vertica, all the withs of a query have fit in
memory together. Using temp tables instead to create the logical sequence, you
only need enough memory for each temp table and final query on there own. The
cognitive effect is still the same. Define A, Define B based on A, Define C
based on B, etc. but sequentially rather than recursively. When the queries
are smaller, withs are more direct, a little less verbose, and usually more
performant. However, with the datasets I'm working with, the queries are
usually too big or I just don't know how big they will be as I start out
writing. So I typically just use temp tables from the outset vs writing with
withs only to have to rewrite it once I get "too big to fit in memory" errors.

~~~
teej
We do the same thing with Redshift. Using temp tables also gives us more
direct control over the query plan, since we can specify distribution and sort
keys. On top of that, we can run unit tests on the temporary table (explained
in another comment) and fail the query if tests fail.

------
AndrewOMartin
I see that this is about a way of writing SQL, but as Literate Programming is
cited in the title, I wanted to point out that you can already use the true
Literate Programming tool noweb to write SQL with or without "with"
statements.

The advantage is that you'll be able to write full documentation amongst the
SQL, present it in any order, and reuse chunks.

The disadvantage is that it outputs to stdout, so if that's no good for your
task then it's no help.

~~~
ajarmst
Well, stdout can be redirected in any sane environment, so that's not much pf
a disadvantage. However, thanks for making the point that what they're
describing (good variable naming practices, and the use of 'with' to prefix
them) is hardly what Knuth was talking about when he coined "Literate
Programming".

------
krylon
It took me a long time to learn SQL well enough to appreciate its beauty.

Simple queries, you know, "SELECT foo, bar FROM baz WHERE lastChange >
CURRENT_DATE" are easy.

But if you are facing the database of your ERP software (as I often am) whose
vendor is very reluctant to tell you about its internal structure and how that
interfaces with the ERP system (my gut feeling, though, is that we're lucky -
SAP and Oracle are probably much less friendly to people poking around in
their databases to create custom reports, hehe), using SQL and its interactive
nature to explore the database is a lot of fun as long as the database design
is relatively sane. Thank God our ERP vendor's programmers were not creative
enough to do insane things.

(Well, they did one crazy thing - there are NO foreign keys to be found
anywhere in that database, instead it is all faked with triggers. I think
that's how people used SQLite before it supported foreign keys. But we're
talking about Microsoft freaking SQL Server here; being derived from another
enterprise-y RDBMS, I find it hard to believe that it would at some point have
lacked foreign keys. Since the triggers DO check for referential integrity,
why on earth did they no just use Foreign keys? What were they thinking?)

It gets a little mind-bending at times, but in a good way.

But explaining SQL queries of the non-trivial kind to somebody is
intimidating. One of our accountants at one point expressed interest in
learning SQL, because she would bug me with questions that I answered by
running a few carefully worded queries. For some reason I find SQL relatively
easy to understand but really, really hard to explain.

------
lisper
Why not just reverse the SELECT and FROM clauses?

    
    
        FROM table1, table2
        SELECT col1, col2
        WHERE ...
    

or even better

    
    
        FROM table1 JOIN table2 ON condition ...
        SELECT ...
        WHERE [additional join conditions]
        HAVING [filter conditions]
    

Then the compound version looks like:

    
    
        FROM (
          FROM tables...
          SELECT ...
          WHERE
        ) SELECT...
        WHERE ...
    

or

    
    
        FROM (
          FROM tables ...
          SELECT ...
        ) JOIN [whatever]
        SELECT ...
        WHERE ...
    

Actually, putting the WHERE before the SELECT makes even more sense.

~~~
kierenj
Because that's invalid SQL?

~~~
lisper
So was the WITH clause until someone decided it should be valid and added it.

Every improvement to every language ever started out with someone saying,
"Hey, here's an idea..." And none of those ideas were ever "valid X" at the
time they were proposed.

------
mshenfield
I came to this page looking for ways to parameterize and re-use raw SQL.
_with_ (and _create view_ for multiple queries) fulfill the case where the
unit of reuse is a filtered view on one or more tables.

~~~
teej
To make progress on this problem, I wrote a light SQL preprocessor to support
INCLUDE statements. It enables me to write code like this:

    
    
        WITH frequently_bought_together AS (
          INCLUDE frequently_bought_together.sql
        )
        SELECT ...
    

This allows way better isolation and reuse of business logic than before. In
Redshift, I combine this with an assert user-defined function to enable
writing unit tests in raw SQL.

With all that together, I can trust analysts to update complex data assets and
I can ask them to take any data issue investigation they've done and turn it
into a re-usable test. Tests end up looking like:

    
    
        CREATE TEMPORARY TABLE frequently_bought_together AS
        INCLUDE frequently_bought_together.sql
        ;
    
        SELECT f_assert(COUNT(*) > 0, 'Table is empty');
        SELECT f_assert(COUNT(DISTINCT item_bought || item_recommended) = COUNT(*), 'Table is fanned out');
        ...
    

It has made a huge difference in how we write SQL.

~~~
mshenfield
It took me a second, but I can see why this would be handy. Whether you just
use it in a _with_ , or create a view from it, you have a single definition of
the query that anyone can use. And you can test the _INCLUDE_ d query. Neat.

------
ajarmst
The use of meaningful labels and "with" to prefix them may be good practice,
but that's barely related to what Knuth meant by "Literate Programming".

------
zamalek
It's great to see highly legible SQL in a tutorial. These days it seems as
though everyone on the internet is secretly running a SQL obfuscation
competition.

------
lucio
Nice article, thx. I think I'll start using WITH more.

