

A Visual Explanation of SQL Joins (2007) - g3rald
http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

======
Tycho
I think it's best just to read what a cartesian product is, then consider that
all joins are just simple filter operations after you've already made a
cartesian product.

(although modern databases may do something more complicated, I'm not sure)

~~~
jandrewrogers
Databases turn the big Cartesian product into a large number of smaller
Cartesian products by inexpensively grouping values such that keys not in the
group are guaranteed to not meet the filter operation criterion.

Instead of n^2 operations it is more like m * ((n/m)^2) that can be evaluated
in parallel across the m groups.

The simplest example of this is hash joins and equality filters. Keys that
match on equality will also be grouped into the same hash bucket, reducing the
search space. Keys that hash to different buckets will never be equal. The
join is therefore the union of the cartesian product on each hash bucket,
which requires fewer operations than filtering the cartesian product of the
entire key set.

------
quizotic
sigh... this is totally wrong... as Jeff himself kinda realized with his
comment "There's also a cartesian product or cross join, which as far as I can
tell, can't be expressed as a Venn diagram"

SQL does have set operations, and the Venn diagram treatment is much less
terrible [though still not quite accurate] at explaining SQL set semantics.

The biggest flaw with the Venn diagram thing is that it doesn't capture the
productive aspect of joins.

A inner join B on (some-condition) will produce ALL combinations of A records
pasted to B records where some-condition holds.

The Venn diagram approach can't distinguish between a single A-pasted-to-B
record and 100 thousand combinations of A-pasted-to-different-Bs. All it can
say is that the resulting combinations consist of some set of A and B records.

Venn diagrams don't accurately describe the records resulting from a join at
all. At best, they rule out records that can't occur.

So they're not so great as an explanatory device.

~~~
epochwolf
It's good enough for a beginner to start with. Provided they know it's not the
complete picture.

~~~
quizotic
Really? Even if it NEVER produces the right answer? How is it useful then?

A [inner|left|right|outer|cross|natural|whatever] join B NEVER produces a
result that is in either A or B. It produces something larger, that wasn't it
A or B before.

It's like trying to describe the result of meiosis with a Venn diagram. It's
the combinations that matter, and the combinations are not reflected in the
Venn diagram.

------
BinaryBullet
I made something a while back that compares lists/tables and visually shows
results:

<http://skratchdot.github.io/list-tool/>

Some of the algorithms aren't optimized (as to keep results in the order they
were processed).

It was more of an experiment to play with web workers and responsive design.

------
asrowe
I liked the article a lot. For more detail its worth looking at:

Join Processing in Relational Databases, Priti Mishra and Margaret H. Eich,
ACM Computing Surveys (CSUR) Surveys, Volume 24 Issue 1, March 1992, Pages 63
- 113

via [http://www.allthingsdistributed.com/2013/04/join-
processing-...](http://www.allthingsdistributed.com/2013/04/join-processing-
relational-databases.html)

------
gjreda
I like that this only covers left outer joins as opposed to left and right
outer joins.

The difference between the two is sometimes confusing for people, so I've
always suggested that people learn one and stick with it, since you can
accomplish the exact same thing with either.

~~~
siddboots
For that matter, most queries work nicely if you avoid null attributes
altogether, and stick to (LEFT INNER) JOIN, WHERE ... IN, UNION, INTERSECT and
EXCEPT.

