A Visual Explanation of SQL Joins (2007)

Tycho · on May 27, 2013

I think it's best just to read what a cartesian product is, then consider that all joins are just simple filter operations after you've already made a cartesian product.

(although modern databases may do something more complicated, I'm not sure)

jandrewrogers · on May 27, 2013

Databases turn the big Cartesian product into a large number of smaller Cartesian products by inexpensively grouping values such that keys not in the group are guaranteed to not meet the filter operation criterion.

Instead of n^2 operations it is more like m * ((n/m)^2) that can be evaluated in parallel across the m groups.

The simplest example of this is hash joins and equality filters. Keys that match on equality will also be grouped into the same hash bucket, reducing the search space. Keys that hash to different buckets will never be equal. The join is therefore the union of the cartesian product on each hash bucket, which requires fewer operations than filtering the cartesian product of the entire key set.

tome · on May 27, 2013

This was the realisation that made me finally understand joins (although they're certainly not generally implemented like that in practice!).

mcintyre1994 · on May 27, 2013

It's probably worth noting that a relation is a subset of the cartesian product too. For me at least, it makes the concepts make a lot more sense to think about it in these terms.

Tloewald · on May 27, 2013

Agreed. Indeed the example is quite misleading as it pointedly does not illustrate the most common case, which os of a small lookup table appearing in multiple result rows.

quizotic · on May 28, 2013

sigh... this is totally wrong... as Jeff himself kinda realized with his comment "There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram"

SQL does have set operations, and the Venn diagram treatment is much less terrible [though still not quite accurate] at explaining SQL set semantics.

The biggest flaw with the Venn diagram thing is that it doesn't capture the productive aspect of joins.

A inner join B on (some-condition) will produce ALL combinations of A records pasted to B records where some-condition holds.

The Venn diagram approach can't distinguish between a single A-pasted-to-B record and 100 thousand combinations of A-pasted-to-different-Bs. All it can say is that the resulting combinations consist of some set of A and B records.

Venn diagrams don't accurately describe the records resulting from a join at all. At best, they rule out records that can't occur.

So they're not so great as an explanatory device.

epochwolf · on May 28, 2013

It's good enough for a beginner to start with. Provided they know it's not the complete picture.

quizotic · on May 28, 2013

Really? Even if it NEVER produces the right answer? How is it useful then?

It's like trying to describe the result of meiosis with a Venn diagram. It's the combinations that matter, and the combinations are not reflected in the Venn diagram.

gburnett · on May 29, 2013

It's not that the picture is incomplete. It is, I believe, that the picture is incorrect. Sets are not relations. This is a barrier to understanding for a beginner.

BinaryBullet · on May 28, 2013

I made something a while back that compares lists/tables and visually shows results:

http://skratchdot.github.io/list-tool/

Some of the algorithms aren't optimized (as to keep results in the order they were processed).

It was more of an experiment to play with web workers and responsive design.

asrowe · on May 27, 2013

I liked the article a lot. For more detail its worth looking at:

Join Processing in Relational Databases, Priti Mishra and Margaret H. Eich, ACM Computing Surveys (CSUR) Surveys, Volume 24 Issue 1, March 1992, Pages 63 - 113

via http://www.allthingsdistributed.com/2013/04/join-processing-...

gjreda · on May 27, 2013

I like that this only covers left outer joins as opposed to left and right outer joins.

The difference between the two is sometimes confusing for people, so I've always suggested that people learn one and stick with it, since you can accomplish the exact same thing with either.

siddboots · on May 27, 2013

For that matter, most queries work nicely if you avoid null attributes altogether, and stick to (LEFT INNER) JOIN, WHERE ... IN, UNION, INTERSECT and EXCEPT.