
SELECT wat FROM sql - luu
https://scattered-thoughts.net/writing/select-wat-from-sql/
======
progre
This one

    
    
        jamie=# ((select a from nums) union (select a from nums)) order by b;
        ERROR:  column "b" does not exist
    

This seems to point to a missunderstanding of how sql works. It's not a
_table_ query language, but a _set_ query language. Tables are just a way to
store sets while thay are not in use.

    
    
        (select a from nums)
    

This part creates a new set from the "a" column of nums. The column name is
carried over as convinience but the nums table is not visable from the union.

~~~
mFixman
The worst part about SQL is statement order.

IIRC most SQL engines do FROM → WHERE → GROUP BY → HAVING → ORDER BY → SELECT
→ LIMIT → UNION, which is why you can SELECT after ORDER BY in the previous
example but not after union.

If I were to start SQL from scratch I would allow statements in an arbitrary
order and execute them in whichever order I coded them.

~~~
tpetry
But then i will not be SQL. SQL is built like this because all these
operations are projections on a set. You can not really change their order and
have meaningful results.

Some oeprators can be swapped, but this part of the optimization process and
nothing you need to be aware of.

------
X-Istence

      jamie=# create table users ("user" text, "password" text);
      CREATE TABLE
      jamie=# insert into users values ('bob', 'whatever');
      INSERT 0 1
      jamie=# select user, password from users;
       user  | password
      -------+----------
       jamie | whatever
      (1 row)
    
    

Is because the name /user/ is a pseudo-function keyword, and is an alias for
/current_user/

    
    
      test=# select "user", "password" from users;
       user | password
      ------+----------
       bob  | whatever
      (1 row)
    

Newer versions of PostgreSQL make this more clear:

    
    
      test=# select user, password from users;
       current_user | password
      --------------+----------
       postgres     | whatever
      (1 row)
    

Here is a full list of keywords: [https://www.postgresql.org/docs/current/sql-
keywords-appendi...](https://www.postgresql.org/docs/current/sql-keywords-
appendix.html)

~~~
em500
And here's what the SQLite manual says about keywords:

 _The SQL standard specifies a large number of keywords which may not be used
as the names of tables, indices, columns, databases, user-defined functions,
collations, virtual table modules, or any other named object. The list of
keywords is so long that few people can remember them all. For most SQL code,
your safest bet is to never use any English language word as the name of a
user-defined object._

[https://www.sqlite.org/lang_keywords.html](https://www.sqlite.org/lang_keywords.html)

------
craigkerstiens
A great one to pile on here is null. Null is quite the unique thing within
SQL, this post from some years ago highlights a lot of the unintuitive
behavior around it within Postgres -
[http://thoughts.davisjeff.com/2009/08/02/what-is-the-deal-
wi...](http://thoughts.davisjeff.com/2009/08/02/what-is-the-deal-with-nulls/)

~~~
sjwright
Interesting. The SQL concept of null always made perfect sense to me.

Two people with the same middle name would have equality with respect to the
string value of middleName. The same applies with two people known to have no
middle name—where you store a zero length string.

Two people who _didn’t supply_ a middle name should not have equality with
respect to middleName. Hence null. This acknowledged lack of data cannot be
equal to a different lack of data.

~~~
rmrfchik
Except, Oracle treats zero length string as null.

~~~
sjwright
Seriously? That's the sort of thing I'd expect to find in MySQL 3-point-
something, not Oracle.

~~~
Sharlin
Yes. It's horrible. But presumably it's a decades-old "feature" retained for
backward compatibility…

------
ThreeFx
Super interesting!

Although it sure would be nice to have explanations for amateurs such as me.

~~~
benesch
I highly recommend puzzling through some of these on your own first (it's a
great way to build intuition for SQL's craziness), but here's a few
explanations if you're stuck.

    
    
        jamie=# select a+1 from nums group by a+1;
         ?column?
        ----------
                2
                4
                1
                3
        (4 rows)
        
        jamie=# select a+1 from nums group by 1+a;
        ERROR:  column "nums.a" must appear in the GROUP BY clause or be used in an aggregate function
        LINE 1: select a+1 from nums group by 1+a;
    

In general, a query like `SELECT a FROM t GROUP BY a` is valid, while a query
`SELECT b FROM t GROUP BY a` is _not_ valid. Because you didn't group by
column b, there are multiple values of b within each group, and it's not clear
how to combine them. What you usually meant is something like

    
    
        SELECT a, sum(b) FROM t GROUP BY a
    

where you can see the sum of all b's for each distinct value of a.

The error message explains it really nicely: if there is a GROUP BY clause,
every column in a SELECT list must either be used in the GROUP BY clause or
appear in an aggregate function. This is kind of annoying, though; what if you
want to group by a value derived from a column, like `a + 1`? (More
realistically, maybe you want to trim whitespace or something.) I think the
SQL standard would have you write

    
    
        SELECT a1 FROM (SELECT a + 1 AS a1 FROM t) _ GROUP BY a1
    

but that's annoying. PostgreSQL (and maybe other SQL implementations; not
sure!) has some smarts about detecting when an expression is used verbatim in
a SQL query, so you can just write

    
    
        SELECT a+1 FROM t GROUP BY a+1
    

and it will notice that the "a+1" in the SELECT list is literally the same as
the "a+1" in the GROUP BY clause. But it's not that smart, because it
doesn'that realize that "a+1" and "1+a" are semantically equivalent, and so
this is an error:

    
    
        SELECT a+1 FROM t GROUP BY 1+a
    

\----

    
    
        jamie=# select b from nums group by a;
         b
        ---
        (0 rows)
        
        jamie=# select a from nums group by b;
        ERROR:  column "nums.a" must appear in the GROUP BY clause or be used in an aggregate function
        LINE 1: select a from nums group by b;
    

An extension of the previous example, where PostgreSQL can infer that when
grouping by a all b's are unique. It can't infer the converse. The reason is
that a is the primary key and so guaranteed to be unique, and therefore `GROUP
BY a` is meaningless and can be elided.

(This one really confused me for a while. I only learned this recently.)

\----

    
    
        jamie=# select a, b from nums order by 2;
        a |  b
        ---+-----
         3 | bar
         0 | foo
         1 | foo
         2 | foo
        (4 rows)
        
        jamie=# select a, b from nums order by 1+1;
         a |  b
        ---+-----
         0 | foo
         1 | foo
         2 | foo
         3 | bar
        (4 rows)
    

`ORDER BY <n>` means "order by the Nth column", but `ORDER BY <expr>` means
"order by the value of <expr>`. So you get the goofy behavior that `ORDER BY
2` in the above query means order by column b, but `ORDER BY 1 + 1` means
order by the literal value "2", i.e., don't order by anything useful at all.
(It just so happens that you get primary key order in the example above, but
that's not guaranteed by any means.)

\----

Most of the others have a similar feel. Let me know if you're particularly
puzzled by any others.

~~~
dan-robertson
I’m not sure the sub query would be necessary if the weird ast-based naming
didn’t exist. Couldn’t you just write:

    
    
      SELECT a+1 AS a1 FROM t GROUP BY a1
    
    ?

~~~
benesch
You can do that in PostgreSQL, but not according to the spec, apparently, and
the PostgreSQL developers regret the feature. [0] (As someone who's spent a
lot of time trying to simulate PostgreSQL's name resolution rules... it's
reassuring to know that the rules are painful inside of PostgreSQL too.)

[0]: [https://www.postgresql.org/message-
id/7608.1259177709%40sss....](https://www.postgresql.org/message-
id/7608.1259177709%40sss.pgh.pa.us)

~~~
jamii
I never figured out why AS isn't allowed in the GROUP BY clause. It would be
much simpler to implement `SELECT a1 FROM nums GROUP BY a+1 AS a1`.

------
dahdum
I think I’ve been drinking the SQL kool aid for too long, only a couple at the
end surprised me. The rest seem perfectly logical.

Was a lot of fun to read through them though.

------
csours
So I looked at these, I didn't notice any examples where the problem was with
ANSI SQL? Please let me know if there were.

It seems like the problems were with PostgreSQL flavored queries.

Speaking of which - I know Microsoft has Transact-SQL and Oracle has PL/SQL,
is PostgreSQL just PostgreSQL?

~~~
jamii
I think a lot of these are in the ANSI 2016 spec. I haven't looked at the
older specs, maybe they were simpler.

Eg here is the `a+1` vs `1+a` thing explained for ORDER BY -
[https://i.imgur.com/bklpjkP.png](https://i.imgur.com/bklpjkP.png). Here are
the conditions under which ORDER BY can reach inside SELECT to pick columns
from the FROM clause -
[https://i.imgur.com/aoeqDGn.png](https://i.imgur.com/aoeqDGn.png).

Chapter 2 of the spec is 1732 pages of that kind of language, so it's hard to
be sure that any particular behaviour is not in there somewhere. But I'm
fairly sure that all the examples with set-valued functions are postgres-isms.

~~~
csours
Wow, I never would have guessed that ORDER BY stuff was part of the standard.
Seems like one of the special corners of SQL.

From that brief read of the ANSI Spec it seems like one would need a great
deal of time and some advanced mathematics to understand it.

~~~
jamii
There isn't really any math involved. It's just a lot of edge cases written
out in lots of detail.

I think a lot of what makes SQL hard to implement comes down to consistently
prioritizing local convenience over global complexity, combined with not being
able to break backwards compatibility to simplify the design.

------
mikl
Most of these are “play stupid games, win stupid prices”, not things you’d
encounter in normal usage.

Queries like `select a, b from nums order by 2` are nonsense. And sure, you
can create tables with weird names if you want, but that’s not really a
problem.

~~~
anamexis
Some of those queries are nonsense, but `select a, b from nums order by 2` is
not one of them. `order by 2` here means "order by the second field in the
select statement."

~~~
progre
One note: While using field number for _anything_ is perfecrly valid SQL, this
will _not_ pass a code review if I'm the one doing the review. Reason being,
it's not clear without looking at the field list what is being ordered. For
extra shame do something like:

    
    
        select * from foo order by 2
    

Order of the fields when doing "select *" may be guaranteed (I don't know) but
they have been dependent on when the feild was added to the database on every
system I have worked with, and sometimes patches are applied in diffrent
orders on diffrent databases.

~~~
plq
If you have a Python script create your tables

    
    
        AND 1=1 
        AND this script has the table information stored in dicts in the {column name: column type} form
        AND this script is run by a python version where the dict order is undefined
    

you will get funny outputs from your queries that break production deployments
in subtle and unique ways.

So yeah, just use column names (and name your computed columns!).

~~~
progre
Yes, name your computed columns. Makes your life easier.

------
crorella
All of them seem valid errors. I’m don’t get the idea of the post lol.

------
Areading314
The author of this seems to be surprised that the DB doesn't just sort things
(which could be billions of rows) automatically. SQL deals with "sets", which
are unordered, and the user needs to specify the order by if they need sorted
output. I think this is an excellent design.

~~~
benesch
You're not looking at the examples closely enough, if that's your takeaway. I
assure you they are much weirder than forgetting to specify an ORDER BY clause
and expecting sorted output.

------
mcstafford
Choosing an ineffective tool, or not knowing being familiar with its effective
use says more about the crafter than the tool.

~~~
dan-robertson
I don’t understand what point you’re trying to make.

------
samcheng
SQL is indeed pretty crufty. After all, it dates back to the 70's, and is
generally regarded as an 'entry level' language for analysts.

It's nowhere near as crazy as Javascript (and, arguably, Ruby):

[https://www.destroyallsoftware.com/talks/wat](https://www.destroyallsoftware.com/talks/wat)

~~~
JohnBooty
I don't think SQL is particularly crufty, but I'm willing to admit that's
highly debatable -- depends on your definition of "crufty," really.

However, the examples in the linked article are not examples of cruft.

I'm using this example of "cruft":

    
    
        Cruft is a jargon word for anything that 
        is left over, redundant and getting in the 
        way. It is used particularly for defective, 
        superseded, useless, superfluous, or dysfunctional 
        elements in computer software. 
    

Nearly all of the examples in the linked article are based on core SQL sorting
and grouping concepts (with a sprinkle of trivalue NULL logic) that have been
around since Day One.

I also don't think they really illustrate flaws/gotcha/etc in SQL itself. Nor
were they meant to, I don't think... these highly pathological examples are
examples of the sorts of pathologically ambiguous things a compiler needs to
handle, but are not representative of anything I've seen in 20+ years of real
world SQL experience.

~~~
nmyk
There are parts of SQL that are crufty by that definition. For example,
COMMIT; and COMMIT WORK; are two different statements that do the exact same
thing according to the SQL standard.

Perhaps trivial cruft but cruft nonetheless.

~~~
JohnBooty
I personally wouldn't say that example fits the "getting in the way" part of
the definition.

As a user, I've been using SQL for many years and it's never been in the way.
It's really just a syntactic alias, so I can't imagine it really impedes the
work of the folks maintaining RDBMS's under the hood.

But, I wouldn't argue either. Like you say... I suppose that, however tiny,
the burden is non-zero! =)

