Hacker News new | past | comments | ask | show | jobs | submit login
The Difference Between SQL’s JOIN .. ON Clause and the Where Clause (jooq.org)
164 points by pplonski86 9 days ago | hide | past | web | favorite | 28 comments





The syntax for joins is unnecessarily complex. The difference in behaviour is determined by the operator used for comparison between the two tables. If there existed modifiers for the operators to be sensitive to nulls, no need for special syntax would be needed.

If we define /<op> to make the operator include nulls on the left side and <op>\ to include nulls on the right side, we can do joins of all kinds without the JOIN ON syntax.

Example.

inner join

select * from a,b where a.id=b.id

left join

select * from a,b where a.id=\b.id

right join

select * from a,b where a.id/=b.id

outer join

select * from a,b where a.id/=\b.id


That's the old join syntax before ANSI SQL was designed. Except your operators look like "* =" and "= *" (without whitespace of course, blame HN's markdown parser). Sybase supported this syntax well into its version 15.x (maybe still in version 16?). And Sybase did not (does not?) support full outer joins.

It may seem simple on the surface, but once you get into complex queries, it's a horrible syntax, and it's easy to forget the relations between the tables, particularly with several tables (4+) involved.

As someone who has converted an old and large SQL code base from non-ANSI JOINs to ANSI JOINs, I am happy to see this style of syntax abandoned.


IIRC, Oracle also supports this "* =" and "= *" syntax. I started with that and then learned ANSI JOINs; was difficult initially because I was used to the former syntax but I would agree that ANSI JOINs are more clearer/cleaner. That and the "with" clause makes queries so much more readable.

Can't agree more, having been in a very similar position. From what I understand (based on the book SQL for Smarties), it's also more efficient for the query planner in some cases (especially when filter/non-join predicates are added to the JOIN).

Eh—I would have designed SQL differently if I were in charge (especially for casts, function definitions, and string munging) but leaving that aside, I disagree that the JOIN ON syntax is unnecessarily complex.

We do not write programming languages solely for computers. We write them for humans too. Humans make mistakes and come at codebases at all manners of experience, both in breadth and depth.

For example, it may literally be someone's first time looking at SQL when they're confronted with the following:

    SELECT *
    FROM classrooms, enrolments
    WHERE classrooms.id=\classroom_id;
It is inherently less clear than the following:

    SELECT *
    FROM classrooms
    LEFT JOIN enrolments
    ON classrooms.id = classroom_id;
It's easier to figure out bugs too, because it leaves more room to understand what the person intended to do rather than what they actually did. For example, say the classrooms table had, due to legacy issues, a nullable id field, but we didn't want to include those classrooms. Someone may have inadvertently typed:

    SELECT *
    FROM classrooms
    LEFT JOIN enrolments
    ON classrooms.id = classroom_id
    WHERE classroom_id IS NOT NULL;
    //    ^ should have been classrooms.id
Because the NOT NULL criteria is in the where clause, it's clear it's not supposed to impact the join. It's supposed to be part of the filter. I know it's the same thing practically, but they're conceptually different.

Further, as you get into more complex queries (unions, recursive, cross joins, aggregations) stuffing everything into one hairy WHERE statement gets impossible to understand because it does everything all at once.


But sometimes you want to outer join on the ID matching and some other condition, and thats where the JOIN syntax gets more useful, because the 'outer' then obviously applies to a whole series of logic statements that are and/or'd together. If you write reports its not long before you start hitting really obscure joins and conditions and subqueries and so on.

How would you solve a left join like this with that kind of syntax?

LEFT JOIN ( inner join table2 ON table2.ID = table1.table2id and table2.othertype = 1234 ) on table1.outerid = outertable.Id


The subquery would be in the WHERE clause, and it would be called for each row, thus making it N times slower than the JOIN syntax. There is a reason why this style has been abandoned.

Modern RDBMS's are smart enough to figure out when the result of a subquery doesn't depend on rows outside of it. Then the subquery is run only once, and its result is treated as a temporary table to be JOINed.

For most simple queries, that is correct. But if the query gets complicated, and/or the tables involved have several indexes, where several might be useful, then it might make the wrong decision. I've written plenty of complicated queries, where I have to help the query optimiser along.

Although, the most modern I've had this experience with is SQL Server 2014, but that definitely still needed guiding a few times.


How about

select * from outertable, (select *, outerid from table1,table2 where table1.table2id = table2.ID and table2.othertype = 1234) t12 where outertable.Id =\ t12.outerid


Contrary to articles conclusion, there are databases (i.e. Oracle) where it doesn’t matter how you write joins, but oracle also has support for left/right/full outer joins in the where clause

Left join ... is the same as leftcol=rightcol(+) (oracle has special syntax for this with that + sign).

In summary, depending on database, it may not matter how you write these joins, but it’s best to stick to SQL standard using JOIN if you’d like cross database support.


I have noticed that many authors of SQL articles don't seem to be aware of Oracles quirks. For example, recently I read an article about NULLs in SQL. The author didn't specify any particular database, but wrote as if what he was saying applied to all relational databases. But Oracle treats empty strings and NULLs as the same thing, which I don't think the article's author was aware of.

> treats empty strings and NULLs as the same thing

Wow, that's ridiculous. They are very clearly two different things. For example "this has no value" and "we don't know the value of this".


More like "this value is empty text" and "this has no value/is unknown".

I agree though. Makes as much sense as treating 0 and NULL as the same.


What does ANSI SQL say about the matter? If Oracle's behavior is non-standard, it makes sense for the aforementioned author to ignore it.

My understanding is that Oracle's behavior existed prior to the standard. That is, Oracle came first, then the standard was written years later. Oracle hasn't changed to match the standard because of issues with backwards compliance.

> I have noticed that many authors of SQL articles don't seem to be aware of Oracles quirks

I've spent much more time with Oracle than with other RDBMS. So, what made you think so?

> The author didn't specify any particular database, but wrote as if what he was saying applied to all relational databases

Yes, in this article's case, the article and conclusion applies to all relational databases.

> But Oracle treats empty strings and NULLs as the same thing, which I don't think the article's author was aware of.

I am aware of this, but how is it related to the article?


Indeed, Sybase for instance treats a single space (e.g. ' ') the same as an empty string (e.g. ''). For the purpose of concatenating in Sybase, NULL is '' (and thus ' '). So `SELECT a = "a" + NULL` would yield a = "a". But so would `SELECT a = "a" + " "`.

MSSQL allows this behaviour, since it's based on Sybase, but you have to explicitly turn it on.


"this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls"

https://docs.oracle.com/database/121/SQLRF/sql_elements005.h...


How are we supposed to treat them differently? Oracle's the one putting the damn nulls in the db.

(Recently tripped over this one.)


They finally did a 180 on that bullshit after years trying to convinced everyone that empty string = null. (marketing, blogs, oracle ace etc.)

Oracle's (+) syntax does not allow for full outer joins.

The conclusion would be the same with Oracle's syntax. It's just more obscure, as it would turn into a "fa.film_id < 10" vs "fa.film_id(+) < 10" discussion.


As soon as you filter on the table you’re joining with in the where clause it becomes an inner join.

    FROM foo
    LEFT JOIN bar ON ...
    /* Makes it an inner join: */
    WHERE bar.baz ...

Only if you're resting for a non null value. If you test for a null value, you'll be looking for all values in one table that don't have a match in the other table.

Indeed, a faster way to check if a table does _not_ have a comparable row is a LEFT JOIN with a WHERE NULL clause:

SELECT t1.* FROM t1 LEFT JOIN t2 ON t2.id = t1.foreign_id WHERE t2.id IS NULL

Is a lot faster than the naïve alternative:

SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE id = t1.foreign_id)


Unless the query is optimized?

For a simple query like this, then yes, it will be optimised. But once your queries get more complicated, the optimiser might not be able to make that deduction on its own.



Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: