Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A Visual Explanation of SQL Joins (2007) (codinghorror.com)
111 points by g3rald on May 27, 2013 | hide | past | favorite | 13 comments


I think it's best just to read what a cartesian product is, then consider that all joins are just simple filter operations after you've already made a cartesian product.

(although modern databases may do something more complicated, I'm not sure)


Databases turn the big Cartesian product into a large number of smaller Cartesian products by inexpensively grouping values such that keys not in the group are guaranteed to not meet the filter operation criterion.

Instead of n^2 operations it is more like m * ((n/m)^2) that can be evaluated in parallel across the m groups.

The simplest example of this is hash joins and equality filters. Keys that match on equality will also be grouped into the same hash bucket, reducing the search space. Keys that hash to different buckets will never be equal. The join is therefore the union of the cartesian product on each hash bucket, which requires fewer operations than filtering the cartesian product of the entire key set.


This was the realisation that made me finally understand joins (although they're certainly not generally implemented like that in practice!).


It's probably worth noting that a relation is a subset of the cartesian product too. For me at least, it makes the concepts make a lot more sense to think about it in these terms.


Agreed. Indeed the example is quite misleading as it pointedly does not illustrate the most common case, which os of a small lookup table appearing in multiple result rows.


sigh... this is totally wrong... as Jeff himself kinda realized with his comment "There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram"

SQL does have set operations, and the Venn diagram treatment is much less terrible [though still not quite accurate] at explaining SQL set semantics.

The biggest flaw with the Venn diagram thing is that it doesn't capture the productive aspect of joins.

A inner join B on (some-condition) will produce ALL combinations of A records pasted to B records where some-condition holds.

The Venn diagram approach can't distinguish between a single A-pasted-to-B record and 100 thousand combinations of A-pasted-to-different-Bs. All it can say is that the resulting combinations consist of some set of A and B records.

Venn diagrams don't accurately describe the records resulting from a join at all. At best, they rule out records that can't occur.

So they're not so great as an explanatory device.


It's good enough for a beginner to start with. Provided they know it's not the complete picture.


Really? Even if it NEVER produces the right answer? How is it useful then?

A [inner|left|right|outer|cross|natural|whatever] join B NEVER produces a result that is in either A or B. It produces something larger, that wasn't it A or B before.

It's like trying to describe the result of meiosis with a Venn diagram. It's the combinations that matter, and the combinations are not reflected in the Venn diagram.


It's not that the picture is incomplete. It is, I believe, that the picture is incorrect. Sets are not relations. This is a barrier to understanding for a beginner.


I made something a while back that compares lists/tables and visually shows results:

http://skratchdot.github.io/list-tool/

Some of the algorithms aren't optimized (as to keep results in the order they were processed).

It was more of an experiment to play with web workers and responsive design.


I liked the article a lot. For more detail its worth looking at:

Join Processing in Relational Databases, Priti Mishra and Margaret H. Eich, ACM Computing Surveys (CSUR) Surveys, Volume 24 Issue 1, March 1992, Pages 63 - 113

via http://www.allthingsdistributed.com/2013/04/join-processing-...


I like that this only covers left outer joins as opposed to left and right outer joins.

The difference between the two is sometimes confusing for people, so I've always suggested that people learn one and stick with it, since you can accomplish the exact same thing with either.


For that matter, most queries work nicely if you avoid null attributes altogether, and stick to (LEFT INNER) JOIN, WHERE ... IN, UNION, INTERSECT and EXCEPT.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: