
Literate SQL
https://modern-sql.com/use-case/literate-sql
======
esaym
In defense of Postgres on the "Performance Considerations" point: I worked on
a team that had a BI guy who regularly wrote giant multi-page queries using
WITH statements.

The main database was IBM DB2 9.7 on RHEL and it held about 3 months (800GB)
of data (we pruned it nightly). We had a "reporting" database running
Postgres 9.3 or 9.4 (can't remember). It had the same schema but was never
pruned: about 15 billion rows total and about 5TB of data, of which about
700GB was indexes. The same query, when run on DB2, took 12 hours; on
Postgres (5x more data) it took 24 hours.

The query plans were mostly identical (neither one did much to optimize the
WITH statements). The speed difference seemed to come from Postgres just
accessing the disk better.
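Worth noting for context: in the Postgres versions mentioned here (9.x), a WITH clause was an optimization fence and every CTE was always materialized, so "didn't do much to optimize the WITH statements" is expected. Since PostgreSQL 12 you can control this explicitly. A sketch, with hypothetical table and column names:

```sql
-- PostgreSQL 12+: choose whether a CTE is folded into the outer query.
-- In 9.x there was no choice; every CTE was materialized.
WITH recent_orders AS NOT MATERIALIZED (  -- let the planner inline this
    SELECT order_id, customer_id, total
    FROM orders
    WHERE created_at > now() - interval '90 days'
)
SELECT customer_id, sum(total)
FROM recent_orders
GROUP BY customer_id;
```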

When doing an index scan, Postgres read from disk at about 80 MB/s, while DB2
managed 10 MB/s. Full table scans in Postgres ran at 180-220 MB/s while DB2
was around 60 MB/s. Same storage, too (NetApp over NFS). DB2 was using direct
I/O, so it liked to read in 4 KB chunks, but even disabling that didn't help
much.

~~~
assface
> The same query when ran on DB2 took 12 hours, on postgres (5x more data) it
> took 24 hours.

You're doing it wrong. These kinds of queries should be run on a column store
(e.g., Vertica).

~~~
esaym
All the money was flowing to IBM licenses, hence when we needed another
database for reporting, we went with open source. Vertica is big money too.
And really, after my stint with that company, I can't stand commercially
licensed software anymore.

~~~
CuriousSkeptic
Did you look into MonetDB? I'm curious about real-world experiences with that
one.

------
melted
IMO, Microsoft got it right way back when with LINQ syntax. They did it for a
different reason (to make things easier to convert to a sequence of plain,
old-fashioned method invocations), but the idea was the right one. For those
who don't know, in a LINQ query you first specify the data source(s) and
joins, then your filtering logic, then grouping and ordering, and then your
projections, including projections into objects and anonymous structs. That's
just the way it ought to have been from the start. As a bonus, it also
simplifies things like autocomplete, because you know the data sources up
front.
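The LINQ ordering matches the order in which a SQL engine logically evaluates a query anyway; only the written order of SQL differs. A sketch annotating a plain SQL query with its logical evaluation order (table and column names hypothetical):

```sql
SELECT department, count(*) AS headcount   -- 5. projection (written first, evaluated late)
FROM employees e                           -- 1. data source
JOIN offices o ON o.id = e.office_id       -- 2. joins
WHERE o.country = 'DE'                     -- 3. filtering
GROUP BY department                        -- 4. grouping
ORDER BY headcount DESC;                   -- 6. ordering (evaluated last)
```

LINQ simply writes the clauses in roughly that evaluation order, which is why the data sources are known up front for autocomplete.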

------
toyg
This is a gem of a website, thanks for posting. For all the noise about NOSQL
and ORM tools, there will always be a humongous number of situations where SQL
is unavoidable (or even the best tool for the job).

------
joshka
A somewhat related tip that works in SQL Server (not sure about other
dialects) is to put the column alias first rather than at the end of a
calculated column. E.g., rather than:

    
    
      SELECT
        Foo,
        Sum(Baz) AS Bar
      FROM Quux
    

Use instead:

    
    
      SELECT
        Foo,
        Bar = Sum(Baz)
      FROM Quux
    

The advantage here is that the names also line up neatly and the shape (column
names etc.) of the result is obvious.

~~~
rupert_murdaaa
I'm so used to using and reading the "AS" syntax for aliasing that this would
throw me off if I came across it.

It seems to me that sticking everything in CTEs would be just as jarring.
Writing SQL, or reading and understanding someone else's SQL, is almost
always a non-linear process, not least because the database engine's
processing order isn't top-to-bottom.

~~~
joshka
It took me a minute to accept it as well, but once I did, the queries just
look so much better.

------
planetjones
Completely agree with this - last week I was writing a query with one of our
junior developers. Rewriting it to use a couple of WITH clauses with very
descriptive names immediately made the intention clear. The WITH clause is
really the clean-code enabler for my SQL.
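To illustrate the idea (with a hypothetical schema), each WITH clause can name one step of the logic so the final query reads almost like prose:

```sql
-- Each CTE name documents what that step produces.
WITH active_customers AS (
    SELECT id, name
    FROM customers
    WHERE cancelled_at IS NULL
),
revenue_per_customer AS (
    SELECT customer_id, sum(amount) AS revenue
    FROM invoices
    GROUP BY customer_id
)
SELECT c.name, r.revenue
FROM active_customers c
JOIN revenue_per_customer r ON r.customer_id = c.id
ORDER BY r.revenue DESC;
```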

------
frik
WITH statements seem convenient, but nested recursive SQL statements /
subqueries can be a performance issue. Sometimes speed is more important than
a SQL99 or SQL2003 feature; some implementations are fast with SQL89/92
syntax.

------
johnnyb9
Using SQL Server, my normal pattern of development is to do what I need using
CTEs (WITH) and then switch to temp tables if/when the queries start to show
poor performance.
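That switch is usually mechanical. A T-SQL sketch of the two stages (table and column names hypothetical):

```sql
-- Stage 1: the CTE version, easy to read while developing.
WITH customer_totals AS (
    SELECT CustomerId, SUM(Amount) AS Total
    FROM dbo.Invoices
    GROUP BY CustomerId
)
SELECT * FROM customer_totals WHERE Total > 1000;

-- Stage 2: if the plan degrades, materialize the same step into a temp
-- table; it can be indexed and gives the optimizer real statistics.
SELECT CustomerId, SUM(Amount) AS Total
INTO #CustomerTotals
FROM dbo.Invoices
GROUP BY CustomerId;

CREATE INDEX IX_Total ON #CustomerTotals (Total);

SELECT * FROM #CustomerTotals WHERE Total > 1000;
```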

~~~
chris_wot
That's valid, but keep in mind that SQL Server has ONE tempdb shared across
_all_ databases. TempDB is used for temp tables, but also for a raft of other
features (like snapshot isolation's version store). You need to make sure you
don't start getting tempdb contention.

