
SQL queries don't start with SELECT - protomyth
https://jvns.ca/blog/2019/10/03/sql-queries-don-t-start-with-select/
======
pjungwir
Optimizer nuances aside, I agree with this and think it's helpful for anyone
writing SQL. I've trained a lot of non-developers (and new-developers) to use
SQL, and one of my favorite tips is, "Start with FROM."

Even sooner than "start with FROM" is "What does each row of the result mean?"
Is it a sale? A month of sales? A customer? If you can answer what you _want_
, then it's usually easy to start there. Especially if your answer corresponds
to an existing table, put that in the FROM, and then add whatever joins you
need. You can even think about the joins "imperatively".

Thinking imperatively about JOINs is also a helpful way of understanding a
mistake I call "too many outer joins", which I wrote about here:
[https://illuminatedcomputing.com/posts/2015/02/too_many_oute...](https://illuminatedcomputing.com/posts/2015/02/too_many_outer_joins/)
If you don't know about this mistake, it's easy to get the wrong answer from
aggregate functions like AVG or COUNT. (EDIT: I should probably call this "too
many joins" instead, since it's not really specific to outer joins. But in
practice I find outer joins are more often used in these sort of situations.)

~~~
yread
For updates, and deletes on the other hand I try to start with the where
clause. Learnt the lesson after running a update query once before i typed the
where to cause something like 40k worth of damage

~~~
brigandish
Good advice. I also break up the statement so that an incomplete highlight and
execute will fail, e.g.

    
    
        DELETE FROM   
          Customers WHERE    
          Name='Gude' AND   
          Area='Tama'   
          ;  
    

Much harder to leave off half the clause doing this kind of thing.

~~~
fr1tkot
I've forced the habit upon myself to first compose a SELECT statement and
verify that it will give me exactly the dataset that I want to run my
operation on. Then I wrap that statement into a CTE and run the DELETE /
UPDATE on the CTE.

It doesn't save me time, but a lot of nerves.

~~~
CubicsRube
I'm used to writing the SELECT query and adding the commented-out DELETE just
below the SELECT line.

    
    
      SELECT *
      -- DELETE
      FROM Sales
      WHERE Customer = 1
    

Next, selecting the whole query and executing it. If results are satisfactory
- selecting only the DELETE part.

~~~
piaste
This is the one I use too. Tiny difference - I write

    
    
       select * -- delete
       from sales
       where customer = 1
    

so it's even harder to accidentally highlight the 'delete' part.

Although I've switched to DBeaver for a year now, and it automatically pops up
a warning when it detects a DELETE query without a WHERE, which is very nice.

~~~
samus
DBeaver has something even better, albeit sometimes annoying: you can define a
connection as "Production" (highlighted red) and "Test" (green). On Production
systems, DBeaver will ask you for confirmation for every write statement.

------
Pxtl
This is one of the things MS Entity Framework gets right - it enforces the
"start with FROM". It actually _ends_ with SELECT, which makes so much more
sense.

Seriously, between navigational properties, group-with-children, EF to me
hammers home how _bad_ SQL is. EF is a square peg in a round hole but it does
a great job at makign some trivial improvements over SQL, while still
remaining true to the principles of relational databases.

Of course, the framework is a hairy mess for other reasons (lazy loading,
untranslateable methods, ugly generated queries) but the alternative to SQL it
presents is _lovely_.

~~~
quickthrower2
What do you mean SQL is bad? SQL is very good and some dialects can do some
amazing things. I don't think EF is much better, and EF (older Linq2Sql as
well) has some gotchas because it converts your expression into SQL, except
for when it doesn't and throws a runtime error that it couldn't. And there are
some edge cases where you think a C# expression would convert to SQL but alas,
it wont.

~~~
manigandham
The point is that LINQ expressions (in the query syntax) can be more readable
and straightforward than the equivalent statement in SQL.

EF is just an ORM that benefits from using the LINQ syntax and operators
available in .NET/C#.

~~~
quickthrower2
Thanks. Even on just that point, I disagree. I think they are equally
readable. Just like Python and C# are equally readable to the respective
cohorts that are used to programming in those languages.

Yes, EF queries are more readable to many developers who know C# well but
don't want to get used to SQL. But the problem with that is DB apathy + EF
leads to some real performance problems as systems scale that grinds the
application to a halt with SQL queries longer than dissertations, to which the
developers need a quick hack to fix. The query plan for those beasts is
intractable. Which of the 10 table scans or index scans to tackle first? And
how do you tackle it?

They discover "indexed views" and all is well until you start to have locking
problems, so more hacks are needed. etc. By writing the damn SQL to begin with
they'd have to tame the beast earlier on and address the design flaws in their
schema and/or application.

Ok this is a bit of a strawman (although it is based on experience) but I
think saying SQL is less readable is a red herring. It is more readable for a
lot of use cases. It's a DSL designed for the job, you really can't get much
better (except for Elastic Search use cases, etc.). It's slightly old
fashioned but that's just fashion.

~~~
manigandham
This isn't about ORMs or database performance. It's just a comparison of query
languages, with LINQ being more natural for data models with lots of joins and
nested records that are common today (especially since that's how people work
with objects inside their programming language).

SQL is far from perfect. It would be nice to see some modern SQL dialects that
follow the LINQ syntax and are natively supported by RDBMS.

~~~
scarface74
Linq is horrible for outer joins, the syntax is completely unintuitive to the
point that I’ll just use raw sql.

------
the_watcher
The author has been posting some pretty awesome SQL content on Twitter
recently. I'm a data scientist who uses SQL pretty frequently, and I'm
learning lots of stuff - from random tidbits like this that clarify something
I think I intuitively knew to things that immediately became useful to me.

[https://twitter.com/b0rk](https://twitter.com/b0rk)

------
xeonoex
Unless I'm writing 'select *', I always do the select portion of a SQL query
last. Starting with from enables helps with autocomplete on most editors too.
I actually with 'select' was at the bottom. The only reason it would be useful
at the top would be if it allowed you to use the new/computed column names as
aliases later in the query, which doesn't work with most databases.

~~~
magicalhippo
> The only reason it would be useful at the top would be if it allowed you to
> use the new/computed column names as aliases later in the query

I'm soooo gonna miss that when we have to add MSSQL support soonish. Having to
repeat myself is tedious, especial for non-trivial expressions.

~~~
jfim
As far as I know, it supports the WITH keyword, so you can reuse those
expressions.

~~~
hobs
Which you probably shouldnt, because you are just writing the subquery each
time - a temp table in MSSQL is a much better solution than CTEs 99/100 times.

~~~
jfim
It's not any different than selecting from a view.

Are CTEs an optimization barrier in MSSQL? I would've thought that the query
planner could move the execution plan nodes across the CTEs.

~~~
wvenable
> Are CTEs an optimization barrier in MSSQL?

They are not. They are, however, in Postgres.

~~~
dragonwriter
They are not (or at least less likely to be) in Postgres 12, as they are
inlined in a wide variety of cases.

------
Gormisdomai
I really like jvns's work but I have to say I found this diagram misleading
when I first saw it! SQL queries "happen" in whatever order the database
optimiser decides.

I'm really glad the accompanying article has a bunch of qualifications of the
form "Database engines don’t actually literally run queries in this order" but
a lot of the beauty of these diagrams is that they work on their own.

I wish the title of the picture said something more like "SQL queries are
interpreted in this order" to make it clear we're talking about semantics and
not what actually takes place.

~~~
zepearl
Concerning the database optimizer:

is it only my (wrong) subjective feeling, or did the one of Oracle become much
more stubborn about following "hints"?
[https://en.wikipedia.org/wiki/Hint_(SQL)](https://en.wikipedia.org/wiki/Hint_\(SQL\))

Reason: I started dealing with Oracle when it was at v8 and I think up to
including v9 when I specified a hint it was usually followed. Then later with
v10 and v11 it progressively became more difficult and nowadays with v12 I'm
having a really hard time (I usually have to use some "tricks" that don't
change the logic of the SQL but do change its technical execution, like
placing somewhere a useless but tactical "distinct", to confuse it enough so
that my hints are more or less followed/used).

Btw., if you want to state something like "using hints is wrong" then in
general I agree (with some exceptions for special cases) but currently I'm
taking care of a very old app that uses often quite big SQLs (involving up to
~15 tables) and as the maintenance budget is limited and the app will anyway
be decommissioned in 1-2 years I cannot start rewriting half of the
application to break down and improve the statements => when once per quarter
the data distribution/quantity/etc... change and the optimizer thinks that it
has a new brilliant idea about how to exec some SQL and then of course the SQL
hangs then I usually just try to spend 1-2 hours trying to find some hint(s)
that will bring back the old execution plan, but since we upgraded to Oracle12
I often see no change in the exec plan unless I do what I mentioned above.

~~~
munk-a
> Btw., if you want to state something like "using hints is wrong" then in
> general I agree (with some exceptions for special cases) but currently I'm
> taking care of a very old app that uses often quite big SQLs (involving up
> to ~15 tables) and as the maintenance budget is limited

This is a very common truth, I've been in similar situations where budgets are
tight and things are working so don't touch a thing. I think it's actually
just a specific example of a very common problem pattern in tech - the way I
usually push back on it is "The stuff that is is and I'm not going to waste
time fixing it - but any stuff I'm adding a feature to or fixing a bug in,
it's not worth the company's time to fix it the wrong way because it'll end up
costing more when another bug or feature requires an adjustment near this in a
year." All that said, especially within MySQL (and I haven't worked in Oracle
so maybe there too) the query planner is a bit dumb, so sometime you really
need to help it along.

~~~
zepearl
> _especially within MySQL (and I haven 't worked in Oracle so maybe there
> too) the query planner is a bit dumb, so sometime you really need to help it
> along._

Yeah, I've seen it as well last year when working with MariaDB (but I'd guess
that MySQL might be a bit more clever as maybe Oracle might have improved it a
bit).

------
ainar-g
In general, I _love_ SQL. It's one of my favourite languages (or rather,
language families) to play and tinker with. But the fact that SELECT comes
before FROM has always ground my gears. Thanks to this article, I now finally
have a better explanation than “Dunno, feels weird” :-)

~~~
dragonwriter
SELECT before FROM makes sense when writing, especially fairly simple, queries
in terms of designing results first.

OTOH FROM first is better for tooling.

Really, there's no really good reason for a mandatory order of clauses in an
SQL statement.

~~~
davidw
> SELECT before FROM makes sense when writing, especially fairly simple,
> queries in terms of designing results first.

The thing I always imagine is a bunch of people in the 1970ies sitting around
designing this stuff, and reading it out loud in a Captain Kirk voice

"Computer, SELECT course WHERE Klingons IS zero!"

------
baddox
I pretty much always start typing SQL queries with “SELECT *” because I know
I’ll come back and figure that part out later.

It’s similar to how I write Python list comprehensions. I almost always type
“x for x in” from muscle memory, then get the collection part at the end
correct, then go back and fix the beginning. My brain just won’t think in the
other direction.

~~~
talaketu
Oh yeah I agree - the forward reference to attributes before the source is
stated does feel backwards when writing the expression.

Compare scala's for-comprehension that feels more natural: `for (x <\- xs)
yield x + 1;`

Perhaps there's an argument that the shape of the result is defined up front,
so it might be easier to comprehend the result if not the implementation.

------
munk-a
I've actually found myself more and more disappointed that FROM isn't the
first clause in the query, it'd be really nice to express those queries as
WITH, FROM, GROUP BY, HAVING, WHERE, SELECT, ORDER BY, LIMIT as that's usually
the way I consider queries - granting HAVING is sort of a free card here, I
don't use it enough to place it accurately, but being near WHERE or GROUP BY
makes a lot of sense.

~~~
war1025
A professor I took a databases course from in college was convinced XML was
the next big thing, so he made us also learn the XQuery language [1], which
does things in roughly the order you described.

I don't remember the specifics, just that the acronym was something like
"flower" and you started with your from clause and ended with what you would
return.

[1]
[https://www.w3schools.com/XML/xquery_flwor.asp](https://www.w3schools.com/XML/xquery_flwor.asp)

~~~
iamthepieman
FLOWR - from, let, order by, where, return.

~~~
gigatexal
Or flows just replace the r with the s for select

------
em500
I learned this and more earlier from
[https://blog.jooq.org/2016/03/17/10-easy-steps-to-a-
complete...](https://blog.jooq.org/2016/03/17/10-easy-steps-to-a-complete-
understanding-of-sql/) my goto guide and recommendation for people with
practical SQL experience who want to improve their understanding. (I first
read it via Hadley Wickham's tidy data writeup.)

It leaves out SQL window functions though.

------
HeavyStorm
I usually begin my queries by typing "SELECT * FROM", and then go back to the
* after I've done everything else.

There's a bonus in doing that - you get auto completion from the editors on
many cases if you've filled the from first.

~~~
mritchie712
I do this too, so the autocomplete in SeekWell automatically does this, check
it out:

[https://www.loom.com/share/9c7979a163eb4513b3320e6e90c66079](https://www.loom.com/share/9c7979a163eb4513b3320e6e90c66079)

If you type just a table name into an empty cell it will autocomplete with a
full SELECT statement. e.g. if you typed "pubuser" and selected "public.users"
it would autocomplete with:

    
    
      SELECT pu.*
      FROM public.users AS pu
      WHERE 1=1
    
    

Disclaimer: I built this.

------
chx
The posts, tutorials and zines of Julia Evans are amazing. She has a knack for
explaining things. This one is also quite great, isn't it? It doesn't talk
about relational algebras, sets, cartesian products etc but explains things in
a very easy to understand fashion while pointing out common problems and
pitfalls.

I wish she did a monad one, you bet it wouldn't be "it's just a monoid in the
category of endofunctors, what's the problem".

~~~
danellis
There are already many, many monad tutorials. If you want something visual,
perhaps this one?

[http://adit.io/posts/2013-04-17-functors,_applicatives,_and_...](http://adit.io/posts/2013-04-17-functors,_applicatives,_and_monads_in_pictures.html)

~~~
ant6n
This looks like a bunch of jibberish to me, mostly pretending it's easy by
making it look like a comic.

~~~
chx
I didn't want to be the one to say that... this is a terrible tutorial.

------
manigandham
The reason why is historical. SQL (structured query language) originally came
from SEQUEL (structured _english_ query language) and was designed to emulate
a normal English sentence as a query.

 _' SELECT columns FROM table'_ makes sense when you're asking it literally.
However SQL and relational DB's have evolved since then and the English
grammar is now awkward. It would be nice to see newer dialects accept the
inverse order to help with tooling and structure.

~~~
kragen
I'm a native English speaker and both "from the top shelf select a book" and
"select a book from the top shelf" are perfectly valid syntax, even if the
second one is more common.

------
goto11
This is why I like Linq in C#. The syntax order corresponds to the logical
order. This also makes queries more composable.

And a very practical advantage is that the editor can support syntax
completion. SQL syntax cannot support that since you write the projection
before the from-clause.

~~~
nick_
Have you checked out the LINQ expression support in F#? They cover all (I'm
pretty sure) LINQ operators within an expression style (AKA not the method-
call syntax).

~~~
thrower123
This looks much better than the way it is done in C#.

I absolutely hate the SQL-ish LINQ expressions in C# code. It's really awkward
when you make a query and need to materialize it. You either store the
Queryable<> in a temporary variable and then convert it to a list or an array
or whatever, or you have to glom parentheses around a big LINQ expression.

~~~
nick_
I prefer the approach of LINQ expressions over LINQ method chains, but in C#
very few LINQ operators are exposed to the LINQ expression syntax. So if you
start with expressions you often end up with a terrible hybrid of both.

------
phs318u
Maybe I’m just showing my age here, but has use of SQL really fallen so far
out of favour that very few developers know it well? I’m honestly surprised.

~~~
EastSmith
I've wrote CRUD apps for 5 years very long time ago. Then I was spending weeks
debugging business logic in stored procedures and views, fine-tuning big query
execution plans, you name it.

At my current work nobody talks about SQL, literally every discussion starts
with "We will create this new Rails model". Indexes are not created
deliberately - we are adding them only when the need is obvious, because some
functionality is plain slow. No one talks about SQL, plain SQL queries in code
are considered really bad practice - in cases where the query generated by
Rails is complex and slow and a plain SQL query is PR-ed, the code reviews
discussions are pretty long.

I am not saying this is good or bad, in fact all looks OK, but I was really
really surprised at the beginning.

~~~
laumars
I once worked with a guy who would generate SQL queries via a series of
multiple regex transformations. Even now that's still up there as one of the
most terrifying things I've seen a developer do.

~~~
phs318u
Why terrifying? Sounds like he saw that the structure of much of what he was
doing was boilerplate and created a bespoke code generator to save himself
time. As long as the resulting queries are readable and effective I don't see
a problem. Unless I'm missing something, understanding his code generator is
not (necessarily) a pre-requisite for understanding the resulting code.

~~~
laumars
> _Why terrifying?_

Because you're leaving an unmaintainable mess. Nobody else can rationalise the
code aside himself. It's harder to debug because even he cannot account for
every edge case his regex might throw up -- which also makes it less secure.
Also regex is comparatively slow so transforming an SQL query using multiple
regex search and replace patterns is not the way to write hot code hit by
literally millions of visitors each hour.

> _understanding his code generator is not (necessarily) a pre-requisite for
> understanding the resulting code._

Oh it absolutely is if you want maintainable and secure code. I get some
situations call for complexity but when you're generating SQL on a public
facing web application with a centralised database backend, you want to be
damn sure you can rationalise the SQL being generated. The best ways of doing
that is to either not to try to be too clever with your generation (KISS) or
to put your confidence into an established and well tested existing ORM or
equivalent.

------
faizshah
In my experience the execution order isn’t obvious to most sql users. You’ll
find a lot of people using sql who don't understand why something from SELECT
can be used in ORDER BY but not WHERE or HAVING for example. It takes a
slightly more advanced SQL understanding to be able to explain these kinds of
errors in thinking. Thats why a good ORM can often be the best choice for
users.

~~~
chias
Can't you though?

    
    
        SELECT 2+2 as num FROM <table> HAVING num > 3;
    

this will return as many 4s as there are rows in the table

    
    
        SELECT 2+2 as num FROM <table> HAVING num > 4;
    

this will return 0 rows. How is this not using something from SELECT in a
HAVING clause?

~~~
faizshah
Like with many SQL questions this depends on what implementation you're using.

In MySQL you can use HAVING, ORDER BY, and GROUP BY with column aliases but
not WHERE: [https://stackoverflow.com/questions/942571/using-column-
alia...](https://stackoverflow.com/questions/942571/using-column-alias-in-
where-clause-of-mysql-query-produces-an-error)

In PostgreSQL you can not use the column alias within WHERE or HAVING but you
can use it in ORDER BY and GROUP BY:
[https://dba.stackexchange.com/questions/225874/using-
column-...](https://dba.stackexchange.com/questions/225874/using-column-alias-
in-a-where-clause-doesnt-work)

Similarly in Microsoft SQL server you cannot use it in WHERE, HAVING, or GROUP
BY but you can use it in ORDER BY.

In SQLite you can use column aliases within WHERE, HAVING, ORDER BY and GROUP
BY: [https://stackoverflow.com/questions/10923107/using-a-
column-...](https://stackoverflow.com/questions/10923107/using-a-column-alias-
in-sqlite3-where-clause)

In Google Bigquery you can use them in GROUP BY and ORDER BY but in Hive you
can only use them in ORDER BY.

As far as I know the standard only defines that you cannot use column aliases
in the WHERE clause. I'm sure someone else can chime in with what the standard
says about column aliases in ORDER BY, GROUP BY and HAVING.

------
bodyloss
A minor thing, but adding the non-image version is a really nice touch!

Great article, thanks.

------
nicwolff
More depth and detail in this post a couple of year ago on Lukas Eder's
excellent SQL blog [https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-
tr...](https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-
of-sql-operations/)

------
belltaco
>LINQ (a querying syntax in C# and VB.NET) uses the order FROM ... WHERE ...
SELECT

The reason is for that is Intellisense, you cannot show columns for SELECT
unless you already know the table(s) you're selecting the columns for.

~~~
noveltyaccount
Came here to say this. It makes a heck of a lot of sense when you think about
it!

------
throwaway_pdp09
Actually I've got bad news. The where clause should be run before the select
but for reasons of, I guess cheating at the benchmarks, oracle doesn't do that
and MSSQL followed that. It is particulary nasty as it _mostly_ works, so this
(CTE to not clutter your schema)

    
    
      with data as
      (
       select *
       from
       (
        values(1), (2), (0), (4)
       ) as abc(d)
      )
      select 1 / d
      from data
      -- where d <> 0
    

gives divide by zero. If you uncomment the where it then seems to work
correctly.

Well it doesn't. You cannot rely on this for MSSQL2005 or later. When the
expressin grows more complex it blows up - I am saying this from losing days
of work by finding out the hard way.

I'll repeat myself, the expressions in the select and that in the where can be
evaluated in parallel. Thank you Microsoft.

Microsoft's solution to this is to expect the programmer to deal with it by
skipping the evaluation of the expression

    
    
      with data as
      (
       select *
       from
       (
        values(1), (2), (0), (4)
       ) as abc(d)
      )
      select case when d <> 0 then 1 / d else null end
      from data
      where d <> 0;
    

The case prevents any possible division by zero.

I've had this behaviour confirmed by Microsoft, and the supposed fix was their
solution.

Not coincidentally I'm looking at learning postgres. For this and other
reasons I don't want to work with MSSQL any more.

------
MiSeRyDeee
I'm a bit of surprised by how shallow people's understanding of database is.
Especially for someone have literally wrote thousands of sql queries. The
query engine implementation is/should be covered by some extent in any of the
database courses, thus I assume anyone with a CS degree shouldn't be surprised
that a queries doesn't execute from SELECT(how's that even possible). Perhaps
nowadays the major users of SQL don't really have a CS background.

~~~
laumars
In fairness, you don't need to be a DBA to have written thousands of SQL
queries and if you're in your 30s or older it's quite possible you might have
forgotten some of the theory since you don't need that level understanding for
your day to day.

It's getting on for 20 years since I did CS and I'm certain there's stuff I've
forgotten about (I've definitely forgotten half the stuff about network
topologies but most of that stuff was about coax networks rather than ethernet
so it's not stuff I've much since)

~~~
MiSeRyDeee
It's fair to forget OSI models since most of programmers don't need to deal
with layers below IP, but the fact that there are 3 more _hidden_ layers
shouldn't surprise you. As for someone who writes SQL queries for work, I
expect some level of understanding of how your underlying system work. Not in
great detail, but at least a general picture. I usually consider the desire of
understanding an blackbox as an indicator of whether the person will become a
great programmer.

~~~
laumars
I agree with the "blackbox" metaphor 100% but you have to bare in mind that we
all have blind-spots. Sometimes they're due to the path we took into our
profession, eg different education/training routes; and sometimes due to the
way we've specialised, eg the example I'd given about how specialised
knowledge that isn't recalled frequently can be forgotten over time. Sometimes
it's just because there's so much information to take in that our leaky brains
missed that specific detail in lessons. However the fact that the author
literally did research that "blackbox" \-- hence the article being written --
should be an indicator that she is actually a pretty decent engineer.

I find criticising people for having blind spots that we might consider
obvious isn't a great way to share knowledge. Not only does it make
individuals less willing to come forward with questions but it also means
they're less likely to correct your errors (we are all fallible) for fear of
being chastised again.

~~~
MiSeRyDeee
I agree. Wasn't mean to criticizing tho. I definitely think the author
is/could be a great engineer judging by the volume of her blog.

------
bena
I really wish SQL syntax was in this order.

~~~
knodi123
Why do SQL clauses need to be ordered, anyway? Is there anything that would be
impossible to parse if the order was not defined?

~~~
cm2187
To be logical to beginners. So that autocomplete can help you (which it cannot
if it doesn't know what tables you are querying).

~~~
wodenokoto
But the syntax is in the wrong order for autocomplete to help you.

If from was first, autocomplete could help you with the names of columns, like
RStudio does with dplyr:

    
    
        data_frame_name %>% select(column_name)
    

Because you start by piping the data reference into your select function,
RStudio can inspect the data and autocomplete the column names in the select
statement, completely opposite of what is possible inSQL

With a free order, you'd be able to start SQL queries with `FROM table_name
SELECT ...` and have columns autocompleted.

~~~
cm2187
Sorry, I didn't mean that in defense of SELECT being first. I agree with the
article that SELECT must come toward the end. And the first thing I tell
people I teach SQL to is that they have to read SELECT last.

~~~
wodenokoto
Then I misunderstood you post.

------
anon1m0us
> (I really want to find a more accurate way of phrasing this than “sql
> queries happen/run in this order” but I haven’t figured it out yet)

Maybe "Order of Operations"? Similar to order of operations in math where you
do (), then * and /, then + and -, ...

[https://en.wikipedia.org/wiki/Order_of_operations](https://en.wikipedia.org/wiki/Order_of_operations)

------
pvorb
> Can I filter based on the results of a window function? (no! window
> functions happen in SELECT, which happens after both WHERE and GROUP BY)

Yes, you can by wrapping the entire SELECT in another SELECT (or use WITH)

    
    
        select *
        from (select ...) as x
        where ...
    

This works because the result of a SELECT is a (virtual) table on its own. It
was an eye opener when I saw it for the first time.

~~~
Macha
It's surprising to me that someone can run into window functions earlier than
subqueries. I guess my background with MySQL which only recently supported
window functions affected my perspective there.

~~~
firasd
Her question that starts the post is about doing it in a single query (no
subqueries.) I kinda had this misunderstanding on Twitter too when responding
to an earlier diagram of hers
[https://twitter.com/firasd/status/1172874054002307073](https://twitter.com/firasd/status/1172874054002307073)

------
sebazzz
I'm always surprised how apparently not-deterministic SQL Server appears to be
when it comes to performance.

We use for our apps Kendo grids at work which allows you to create any report
by dynamically choosing the filters and columns you want. On the server side a
query is composed from the parameters the user chooses, including only the
columns that are necessary. The data source for the query is a inline table
valued function (TVF).

At one point we had trouble with performance after going live, and it could
not be explained by changes to the TVF. Even when we reverted the released TVF
changes the issues kept happening.

A day or so later I was with a database expert one day long optimising the
query (may I plug the excellent free Sentry One tool here?). Eventually we had
optimised the query. However, when we put the query back into the TVF
performance was down again. The slightest things that changed affected the
performance.

~~~
avar
This is something that affects every RDBMS that exposes SQL. The SQL language
describes _what_ and not _how_ , how specifically to retrieve your data is
left to the implementation.

The most common causes for this in MySQL (and I expect SQL Server too) is that
often your query can be resolved on multiple indexes, and picking which one is
a matter of a heuristic. On MySQL changing table statistics can change which
index is used, which can drastically affect query performance.

So a common poweruser optimization strategy (which should be used very
carefully) is FORCE INDEX, also known as "I know better which index to use".
More often than not I've found uses of FORCE INDEX that were more harmful than
not.

But an ever more insidious version of this is queries that retrieve cached
data being faster. This is something you can generally only observe in
production because what's in cache is a matter of _other_ traffic on the
server.

So you can easily get cases where something looks much worse under EXPLAIN,
but is faster. E.g. it can be faster to execute a "worse" query with no
indexes than one with, if the one with no indexes happens to need data that's
already in memory v.s. sitting on disk.

------
d--b
Of course they don't start with SELECT, which is also why SQL is a terrible
language.

When you type the fields you want to select, there is zero context as to what
those fields apply to, meaning that there is zero contextual help that you
could get from know the table and group by ahead of typing select.

------
etrain
Relational database queries are supposed to be _declarative_. As a user,
you're not supposed to think about the mechanics of execution because the
database system is supposed to be able to decide how to execute your query
using whatever magic it wants, as long as it satisfies the contract that it
gives you the right answer.

It's absolutely useful as a debugging tool to build up some semantic
understanding of what the query means, and I encourage every database user to
learn to use EXPLAIN, but relying on a mental execution model is borderline
dangerous.

~~~
curryhoward
The author makes it crystal clear that she is not advocating this as a mental
model for the execution of the query:

 _Database engines don’t actually literally run queries in this order because
they implement a bunch of optimizations to make queries run faster – we’ll get
to that a little later in the post._

 _So:_

 _\- you can use this diagram when you just want to understand which queries
are valid and how to reason about what results of a given query will be_

 _\- you shouldn’t use this diagram to reason about query performance or
anything involving indexes, that’s a much more complicated thing with a lot
more variables_

She claims this is a useful tool to understand the denotational semantics of
the query and which kinds of queries are allowed vs. not allowed—and she's
absolutely right.

------
chias
I'm confused, this doesn't seem right. This article, and several of the
comments here, suggest that you cannot reference things calculated during
SELECT in the HAVING field. But you can.

As a quick demo, pick your favorite table and run:

    
    
        SELECT 2+2 as num FROM <table> HAVING num > 3;
    

this will return as many 4s as there are rows in the table

    
    
        SELECT 2+2 as num FROM <table> HAVING num > 4;
    

this will return 0 rows.

~~~
okonomiyaki3000
I was thinking the same thing. HAVING must be after SELECT. It doesn't really
make sense otherwise.

~~~
faizshah
Long story short, relying on HAVING to run after SELECT is not a standard SQL
behavior it is dependent on the implementation and what the optimizer thinks
is best (more discussion below:
[https://news.ycombinator.com/item?id=21153870](https://news.ycombinator.com/item?id=21153870)).

The diagram in the OP is a pretty good general/semantic understanding of SQL
execution order for writing queries.

------
nick_urban
Elixir's ORM, Ecto, appears to be a lot like LINQ. You write pseudo-SQL
queries starting with from. E.g.

    
    
      from u in User, where: u.id = 1, select: u.name

------
theemathas
In my undergrad class, I was thought that the order is "FWGHOS" (pronounced,
"fwog hose"), i.e., FROM, WHERE, GROUP BY, HAVING, ORDER BY, SELECT. (LIMIT
was omitted from this.)

This acronym has "ORDER BY" happen before "SELECT", but this article have them
the other way around. Does this difference ever matter?

~~~
daxfohl
You don't have to include the orderby field in the result set so I'd say O is
before S.

------
AtlasBarfed
Yes, this is particularly annoying for autocompleting column names when
writing SELECT (and DELETE)

with INSERT and UPDATE the user provides a table as context for autocompletion
of the column list.

There's been a billion revisions of ANSI SQL and they've never provided
alternate ordering options. Who cares about users?

------
iblaine
"SQL queries don't start [optimizing] with SELECT"

The title is missing a word. Interestingly, part of the industry is moving
towards databases that optimize themselves (ex Snowflake & Oracle Autonomous
Database), plus projects that do it for you (Apache Calcite).

~~~
munk-a
With a bit of an exception for the era where we were all hard wiring indexes
in MySQL using hints... that's what databases should be doing. DB interactions
are supposed to be done using a purely declarative language - postgres has
stuck by this the strongest and has been able to make the biggest
optimizations in (IMO) a large part due to how little of the internals they've
exposed. With pg12 they're making the optimization fences in WITH clauses an
accepted give on this point and the discussion on whether to do this or not
went back and forth quite a bit.

------
nickjj
I came to this realization by accident shortly after using various ORMs and
database mapping tools.

Being able to break down and compose a query in your own defined order while
leveraging a library to transform it into a valid query is quite nice for that
way of thinking.

------
mirceal
so this depends on the database engine and while in general it may follow a
pattern there are no guarantees.

To dig deeper:
[https://en.wikipedia.org/wiki/Query_plan](https://en.wikipedia.org/wiki/Query_plan)
[https://www.khanacademy.org/computing/computer-
programming/s...](https://www.khanacademy.org/computing/computer-
programming/sql/relational-queries-in-sql/a/more-efficient-sql-with-query-
planning-and-optimization)

so, SQL is not about how you're going to get the results it's about what
results you want.

------
Tomis02
For bonus confusion points, DISTINCT happens after SELECT and before ORDER BY
(and DISTINCT affects the usage rules for ORDER BY). Personally keep a printed
copy of the actual logical order of operations, it comes in handy.

------
nnq
SQL is the most WTF notation/language I've ever seen... How the hell did
someone _start from the beauty of relational algebra and end up with the this
misordered confusing and verbose abomination that we 've all standardized
upon?!_

...even Mongo's JSON-based query language has more elegance to it. SQL seems
_designed on purpose_ to be annoying to write, hard to read, and never look
like anything, not the math logic, not any query operations that could ever be
executed. I may be superficially similar to English, but English is horrible,
why would you model a notation after that?!

Everytime I see someone yelling that Haskell or Lisp syntax is weird and
unintuitive I think they need to have their face bashed into a printout of a
complex SQL query!

~~~
edmundsauto
I suspect you're asking more figuratively, but I saw this PDF from Stonebraker
on the developer of databases and query languages that literally answers your
question[0].

I've always done data heavy work, but as a recently hired data engineer, I'm
really diving deeply into this layer that previously I had taken for granted.
In doing so, I find SQL highly expressive and beautiful for data work; much
moreso than Python, my other primary language.

Which languages do you feel manipulate data beautifully?

[0] -
[https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/pape...](https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/papers/SH05.pdf)

~~~
nnq
They all suck. C#'s Linq seems cool but I don't live in that ecosystem. Also
Cypher fro graph DBs or better still Gremlin are nice and more elegant.

But I'd just want to write code like:

    
    
        (employees * addresses)[employee.name, address.city.name]
        | filter{address.city.name = "London"}
    

...that would match rel algebra notation nicely, and extend to complex cases
more like:

    
    
        ( employees e *{e.id = a.employee_id} addresses a )
        | filter{a.city.name = "London"}
        | [e.name, a.city.name]
    

...you can replace operators with words of course. Or you could represent this
as a JSON structure, even better imo. But the point would be to have a
notation highlighting the idea of a "product", of restricting and specializing
that product or specifying how it's made, and of indexing into its fields. The
way I see it in my mind is like taking "a (maybe generalized) cartesian
product" of different things, then filtering this, then zooming on each
dot/point/object and filtering its fields etc. And obviously focusing on
piping/chaining stuff.

Maybe the notation I suggest is not a good one, but I'd care more about making
the concepts obvious and readable and composable.

------
CWuestefeld
I have long wondered why we can write "ORDER BY 1" (meaning to order by the
values in the first column), but it's illegal to say "GROUP BY 1". This
article shows why that is.

~~~
jvns
`group by 1` is actually allowed in several SQL implementations, and there's
no reason it can't be -- the same as with column aliases the db engine can
just rewrite

    
    
      select name, count(*) from table group by 1
    

as

    
    
      select name, count(*) from table group by name
    

before executing the query

~~~
erik_seaberg
I will never forget Percona telling us

    
    
      ORDER BY NULL
    

would defeat something MySQL was getting wrong in a query that already wasn't
ordered by anything. A coworker put it as "waving a dead chicken that doesn't
even exist".

------
perl4ever
My SQL queries generally start with _WITH_.

Or they did when I was using Oracle.

~~~
scomp
To be honest I don't think the article highlights very well where this
applies. I am a SQL Developer with SQL Server right now and in the past using
Teradata I vaguely remember an aliased column could be referenced in the where
clause if you didn't use the AS keyword.

------
spicyramen
This totally makes sense, as from a cognitive perspective the table you will
extract information is the first place you take a look

------
mtts
I teach a first year introduction to sql course and this is _exactly_ how we
tell our students to construct their queries.

------
starpilot
Fuck this opacity. When will a database add seamless translation of Pandas to
SQL? It is a thousand times the better syntax.

~~~
avip
Like spark has? personally I found it just confusing. Why would I need a "no
sql" syntax, when sql has such a perfect syntax for what it does?

I generally dislike having multiple ways to achieve the same end result.

------
parliament32
Interesting stuff. One of those things you don't really think about, but when
it's explained, it seems intuitive.

------
Rerarom
My database professor in university always used to say "the FROM, not the
SELECT, is the most important".

------
annguyenfoss
Developers these days truly don’t know SQL. This blog post getting 500+ points
on HN is a proof.

~~~
scarface74
One blog post getting 500+ points is about as much “proof” that developers
don’t know sql as a comment that cites that as evidence is proof that HN users
don’t understand statistics....

------
csours
Of course not, they start with WITH! Preferably 8 withs that reference each
other.

------
jbverschoor
Funny. Iirc, the ejb1 query language was like “from xxx”. Not entirely sure
though

~~~
jbverschoor
Oh wait I think it was hql from hibernate

------
rambojazz
Is this specific to Microsoft SQL server or is it the same for every database?

~~~
niceworkbuddy
Every database. It's because of logical order of execution, as explained in
article. Eg. you cannot use `having` before `where` even if syntax of SQL is
allowing you to do so.

------
kerng
If you use SQL server you can inspect the query plan, very useful.

------
clatan
I wish SQL would die as a query language. This is one of the reasons why,
writing a query in the syntactic order is counter to how you should reason
about getting and shaping your data.

------
riantogo
Well, processing doesn't but the beauty of the language is that it starts with
what the user wants as the output and the details of how it is pulled follows
later.

------
Epopeehief54
That explains linq syntax.

------
tannhaeuser
No, no, that's wrong

------
crorella
Qualify ?

------
magicalhippo
Title is misleading as far as I can tell. The author was just surprised to
realize that the syntax for SQL queries did not mirror the execution order of
the various elements of the query.

It is a bit of a "well, duh" thing, but I've had my fair share of those with
SQL through the years so I won't hold it against the author.

One of my more recent random discoveries was that I could join a subquery, ie

    
    
       select x.a, y.b, y.c
       from tableX x
       join (
         select id, sum(b) as b, count(b) as c 
         from tableY group by id
       ) as y on y.id = x.id
    

Again nothing special, just hadn't had the need for it before so didn't even
think about the possibility.

~~~
goto11
Everywhere you can have a table expression, you can replace it with a
subquery.

~~~
Sharlin
Indeed one can think of a table expression as the trivial subquery

    
    
        SELECT * FROM table
    

…well, modulo the fact that in a concrete sense that substitution would lead
to infinite recursion!

------
lugg
I thought this was obvious? (Serious, not snark)

You start explaining what you need and work backwards, on the other end, they
work in reverse, because that's how it's stored.

Conceptually, it's like piping in bash, "find | grep | cut". Start with
finding the files(tables), grep for the row data (where), cut the fields you
want returned (select).

------
ineedasername

      drop schema if exists *
    

Doesn't start with select; you never have to write sql again; you're probably
fired

~~~
hirako2000
I did this, on a very very large set of DBs.The boss moved me to the backup
team.

~~~
ineedasername
I didn't drop a schema, but I did, um, accidentally get over aggressive with a
merge-purge duplicate process. By which I mean that I merged every single
record, all of them, into a single record.

I was very fortunate, we had backups and restoring went smoothly, and my boss
very calmly asked me one thing: "did you learn from this?" I assured him that
I had.

------
Scarbutt
_You can even think about the joins "imperatively"._

JOINs _are_ imperative.

~~~
atroche
How would you describe the joining of two tables more declaratively than how
SQL does it?

~~~
baddox
Perhaps a “fake table” (e.g. a view) that is the result of two tables being
joined but presents itself as a normal table with the expected columns? That
would be a very declarative way of querying joined tables, but yeah, you’d
still have to create the views or have some means of declaring the appropriate
relationships between tables. ORMs sort of let you do this, like in Django
where you declare that a User has many Permissions and the ORM knows how to
generate that JOIN query for you.

~~~
benjiweber
Something like SQL's NATURAL JOIN? Essentially a relational "AND"
[https://en.wikipedia.org/wiki/Join_(SQL)#Natural_join](https://en.wikipedia.org/wiki/Join_\(SQL\)#Natural_join)

