
Easy Steps to a Complete Understanding of SQL - lukaseder
http://tech.pro/tutorial/1555/10-easy-steps-to-a-complete-understanding-of-sql
======
ams6110
_SQL JOIN tables should be used rather than comma-separated tables_

Having used SQL since long before the ANSI JOIN syntax was well supported
(first Sybase, then MS SQL and then Oracle) I resisted it for a long time out
of habit, and also because at first the syntax was buggy when used in Oracle.

But I have come around to being in favor of it. The bugs have been fixed, and
the main advantages are that: inner and outer joins are more clearly stated
than by using ' __*= ', or '(+)' suffixes on one side of a predicate; and the
join criteria are clearly separated from the WHERE clause. It makes it much
easier to see how the tables are being joined vs. how the results are being
limited.

~~~
chris_wot
They stop Cartesian products from accidentally occurring. That alone should
force the use of ANSI joins. cross joining two million row tables is not a
pleasant experience!

------
zamalek
The way I see SQL in my mind is programming Venn Diagrams (
[http://en.wikipedia.org/wiki/Venn_diagram](http://en.wikipedia.org/wiki/Venn_diagram)
) followed by creating a projection that uses one or more of the areas within
that diagram (contrasting strongly with imperative languages, i.e. logic
programming). Unfortunately it's not straight-forward to do that because SQL
is a superset of Venn Diagrams, but that line of thinking is where you need to
be.

~~~
ams6110
Yes, exactly. To use SQL well you need to think in terms of sets and set
operations. Venn Diagrams are a good idea to help visualize this. If you find
yourself writing a lot loops over cursors there's a good chance you're doing
it wrong.

------
jfb
I thought this was a very good article, unusually so for a blog post about
databases. I do take issue, however, with the idea that there's a beauty to
SQL. There's a beauty in the conceptual model, but you have to squint sideways
to see it through the deeply hostile syntax.

~~~
clarkevans
I'm not sure the "Relational Model" is even a great conceptual model for
common business or analytical processing. I think SQL is ugly because the
model is a poor fit; if the model was solid, a clean syntax would follow
naturally. As for alternatives, MDX has superseded it for core "pivot table"
analytics. For application processing, it seems modern document/hierarchical
databases are good for many transactional needs.

What modern relational databases have... is a very intuitive abstract storage
model. That's awesome. We've been working on an alternative "navigational
model" (inspired from CODASYL and ORMs) based on this storage model. Our
experimental query language is at [http://htsql.org](http://htsql.org)

Our critique of SQL is at [http://htsql.org/doc/overview.html#why-not-
sql](http://htsql.org/doc/overview.html#why-not-sql)

~~~
lukaseder
I'm sure you'll agree with Stonebraker's recent (actually, not so recent)
claims that the relational model is "all wrong":

[http://blog.jooq.org/2013/08/24/mit-prof-michael-
stonebraker...](http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-
the-traditional-rdbms-wisdom-is-all-wrong/)

htsql seems very interesting. I should soon blog about your product and
critique. Have you been publishing elsewhere?

~~~
GFischer
Thanks for linking to that talk, it deserves a submission all of its own.

~~~
pkj
Absolutely. Very interesting talk. Definitely deserves a submission.

BTW, the person asking the last couple of questions is Ed Bugnion, one of the
co-founders of VMWare. He is a faculty now at EPFL.

~~~
lukaseder
Interesting, I didn't know that!

------
rhizome31
Another thing to understand is how NULLs are handled in different contexts,
using either the familiar two-valued logic or the more exotic three-valued
logic. It's kind of messy but really worth knowing if you're working with SQL.
The wikipedia page actually gives a pretty good account of the issue:
[https://en.wikipedia.org/wiki/Null_%28SQL%29](https://en.wikipedia.org/wiki/Null_%28SQL%29)

------
bsaul
The most beneficial thing you can do when you start with SQL is to NEVER EVER
put join clauses in the WHERE section. ALWAYS put the join clauses as close to
the table you're joining as possible.

eg : never do SELECT * FROM A JOIN B WHERE A.ID = B.a_id and B.id > 10 but
always JOIN B ON A.ID = B.a_id AND B.id > 10

the second way of doing "scale" much much better when you add more tables, and
start mixing left and right joins.

------
lysium
Great article! I wish all SQL libraries would adhere to this structure instead
of forcing devs to write glorified (string) SQL. Ie., let me

    
    
        from(join(table1, table2))
        .where(table1.col1 > 10)
        .groupBy(col1)
        .project(col1, avg(col2))
    

and only in that order.

~~~
batbomb
I wrote my own library in Java recently for something similar. It was inspired
by LINQ and SQL grammar.

    
    
         Select statement = table1.select()
           .join(table2,[join condition])
           .where(table1.col1.gt(10))
           .groupBy(table1.col1)
           .selection(col1, Fn.avg(table1.col2))
    
         Select s = new Select()
            .from(table1)
            .join(table2, table2.t1_col1.eq(table1.col1))
            .where(table1.col1.gt(10))
            .groupBy(table1.col1)
            .selection(col1, Fn.avg(table1.col2));

~~~
andrewcooke
have you seen empire db? [http://empire-db.apache.org/](http://empire-
db.apache.org/) no-one much seems to use it, but it's actually quite neat.

------
chris_wot
Be careful with point 6. If you use a subquery through an IN clause and it
returns a NULL as one of the rows, nothing will return.

~~~
lukaseder
That's only true for NOT IN predicates (which is mentioned in the article)

Example:
[http://sqlfiddle.com/#!4/d41d8/16785](http://sqlfiddle.com/#!4/d41d8/16785)

    
    
        with a(x) as (
          select 1    from dual union all
          select 2    from dual union all
          select null from dual
        )
        select 1 from dual where 1 in (select x from a)

~~~
chris_wot
Doh! Of course. :-)

The reason being is that IN uses an disjunction and NOT IN uses a conjunction.
When you compare the value to NULL it returns UNKNOWN, which ultimately causes
the Boolean expression to be false.

I hope that redeems my reputation a little :-)

Btw, great article!

~~~
lukaseder
> which ultimately causes he Boolean expression to be false

Strictly speaking, the boolean expression is also UNKNOWN ;-) I've tried to
explain this here: [http://blog.jooq.org/2012/01/27/sql-incompatibilities-not-
in...](http://blog.jooq.org/2012/01/27/sql-incompatibilities-not-in-and-null-
values/)

... but to this date, I'm still not 100% sure if I correctly understood SQL
NULL. A quite worrying thing is documented here:
[http://blog.jooq.org/2012/12/24/row-value-expressions-and-
th...](http://blog.jooq.org/2012/12/24/row-value-expressions-and-the-null-
predicate/)

It's about row value expressions and NULL predicates. The following are not
the same!

    
    
            (A, B) IS NOT NULL
        NOT((A, B) IS     NULL)
    

> I hope that redeems my reputation a little :-)

You're forgiven ;-) Thanks!

~~~
chris_wot
Darn it, right again. Sigh.

Those expressions can be worked out via first order predicate logic. Damned
tricky, I can definitely see how that would trip up pretty much anyone at
first glance. If you weren't aware of logical equivalences then you might be
scratching your head for some time!

Another great article, btw.

------
saosebastiao
Fantastic article. I clicked on it, hoping to find a way to help some of my
less SQL-savvy co-workers to understand SQL a little better. I instead came
away with a better understanding of my own.

~~~
triplesec
Stuff in here they nevr taught me all those years ago either! And a smart geek
is never be too proud for a refresher.

~~~
collyw
Where do you go to keep up with things like this.

I was taught at a reputable university to put joins in the where clause. Now
it seems that is frowned upon, and I only find out 13 years later.

------
mtdewcmu
I like SQL. The relational model is often underestimated. All of the complex
structure that goes beyond individual rows is orthogonal and is itself first-
class data. Contrast with document databases where hierarchies are part of the
low-level storage mechanism. By making the structure first-class data, the
relational model keeps you from painting yourself into a corner.

------
ExpiredLink
> _If a has 3 columns and b has 5 columns, then the "output table" will have 8
> (3 + 5) columns. The records contained in this combined table reference are
> those of the cross product / cartesian product of a x b. _

This is where the easy steps become hard for real noobs.

~~~
pbhjpbhj
A graphical example of a Cartesian join -
[http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m71/m71_3.ht...](http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m71/m71_3.htm).

------
jcampbell1
I think the author has the order wrong. Consider:

    
    
        SELECT WEEKDAY(created_at) wkday, COUNT(1) 
        FROM orders
        GROUP BY wkday
    

The above query works fine, thus GROUP BY, ORDER BY, and HAVING are all aware
of SELECT.

~~~
chris_wot
Those are expressions. They only work once you have picked the table(s) - FROM
- and determined the rows you want returned - WHERE. The query then groups and
orders the results and calculates the scalar value of COUNT(1) which it
repeats on every row, then it determines the weekday for every row for the
column created_at and uses that value for output.

~~~
jcampbell1
My point is that a better explanation of the execution ordering is:

FROM, JOIN, WHERE, SELECT, GROUP BY, HAVING, DISTINCT, UNION, ORDER BY, LIMIT.

The author has SELECT after the GROUP BY and HAVING, which doesn't make sense
because GROUP BY is aware of expressions in the SELECT statement. Grouping and
filtering happens _after_ selection.

~~~
lukaseder
(I had to correct myself)

SELECT is evaluated _after_ GROUP BY, at least in most databases that
implement SQL correctly. SQL Server's documentation nicely explains this:

[http://technet.microsoft.com/en-
us/library/ms189499.aspx](http://technet.microsoft.com/en-
us/library/ms189499.aspx)

Skip to "Logical Processing Order of the SELECT statement".

MySQL, PostgreSQL, and SQLite behave differently, though!

------
harrytuttle
Not a bad article but it needs a step 11:

 _Know thy database engine for if you do not, your local DBA will be mightily
pissed off when you do a cross join across 15 tables and watch his /her IOPS
go through the roof..._

~~~
ams6110
This is why you have a test instance for developers (or more likely now, every
developer runs his own engine locally or in a VM).

~~~
harrytuttle
That's irrelevant as it usually only goes bang in production under production
load and data volumes.

------
shire
Thanks for this article very beneficial, I'm learning about Databases at the
moment so this definitely comes in handy while learning SQL.

