I think it's best just to read up on what a Cartesian product is, then consider that every join is just a simple filter operation applied to a Cartesian product you've already made.
(although modern databases may do something more complicated, I'm not sure)
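To make the "join = filter over a Cartesian product" idea concrete, here is a minimal Python sketch. The `users` and `orders` tables and the `inner_join` helper are made up for illustration; this is the naive mental model, not how any particular database executes a join.

```python
from itertools import product

# Hypothetical sample tables: users are (id, name), orders are (user_id, item).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

def inner_join(left, right, condition):
    """Build every (left, right) pair, then keep the pairs passing the condition."""
    return [l + r for l, r in product(left, right) if condition(l, r)]

# Join on users.id = orders.user_id.
joined = inner_join(users, orders, lambda u, o: u[0] == o[0])
```

Here `joined` keeps three of the nine candidate pairs: ann appears twice (once per order), bob once, and cy not at all.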
Databases turn the big Cartesian product into a large number of smaller Cartesian products by inexpensively grouping values such that keys not in the group are guaranteed to not meet the filter operation criterion.
Instead of n^2 operations it is more like m * ((n/m)^2) = n^2/m (assuming the keys spread evenly over the m groups), and the m groups can be evaluated in parallel.
The simplest example of this is hash joins and equality filters. Keys that match on equality will also be grouped into the same hash bucket, reducing the search space. Keys that hash to different buckets will never be equal. The join is therefore the union of the filtered Cartesian products on each hash bucket, which requires fewer operations than filtering the Cartesian product of the entire key set.
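A rough Python sketch of that bucketing idea, assuming an equality join and a made-up bucket count `m`. The function name `partitioned_join` and the sample tables are hypothetical; real engines typically build a hash table on one side and probe it with the other rather than literally taking per-bucket products.

```python
from itertools import product

def partitioned_join(left, right, key_left, key_right, m=4):
    """Equality join done as m small filtered products instead of one big one."""
    left_buckets = [[] for _ in range(m)]
    right_buckets = [[] for _ in range(m)]
    # Equal keys have equal hashes, so matching rows land in the same bucket index.
    for row in left:
        left_buckets[hash(key_left(row)) % m].append(row)
    for row in right:
        right_buckets[hash(key_right(row)) % m].append(row)

    result = []
    # Only rows in the same bucket can match; each bucket pair is independent,
    # so this loop could in principle run in parallel.
    for lb, rb in zip(left_buckets, right_buckets):
        result.extend(
            l + r for l, r in product(lb, rb) if key_left(l) == key_right(r)
        )
    return result

# Hypothetical sample tables: users are (id, name), orders are (user_id, item).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]
joined = partitioned_join(users, orders, lambda u: u[0], lambda o: o[0])
```

The equality check inside each bucket is still needed, because different keys can collide into the same bucket.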
It's probably worth noting that a relation is a subset of a Cartesian product too. For me at least, the concepts make a lot more sense when I think about them in these terms.
Agreed. Indeed the example is quite misleading, as it pointedly does not illustrate the most common case, which is a small lookup table whose rows appear in multiple result rows.
sigh... this is totally wrong... as Jeff himself kinda realized with his comment "There's also a cartesian product or cross join, which as far as I can tell, can't be expressed as a Venn diagram"
SQL does have set operations, and the Venn diagram treatment is much less terrible [though still not quite accurate] at explaining SQL set semantics.
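For comparison, here is a small sketch of the SQL set operations using Python's built-in `sqlite3` module. The tables `a` and `b` and the `column` helper are invented for the example. With single-column tables of distinct values, the results line up with the union, overlap, and difference regions of a Venn diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def column(sql):
    """Run a single-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

union = column("SELECT x FROM a UNION SELECT x FROM b")
intersection = column("SELECT x FROM a INTERSECT SELECT x FROM b")
difference = column("SELECT x FROM a EXCEPT SELECT x FROM b")
```

One place the picture stops being accurate is duplicates: `UNION ALL` keeps repeated rows, which a diagram of sets cannot show.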
The biggest flaw with the Venn diagram thing is that it doesn't capture the productive aspect of joins.
A inner join B on (some-condition) will produce ALL combinations of A records pasted to B records where some-condition holds.
The Venn diagram approach can't distinguish between a single A-pasted-to-B record and 100 thousand combinations of A-pasted-to-different-Bs. All it can say is that the resulting combinations consist of some set of A and B records.
Venn diagrams don't accurately describe the records resulting from a join at all. At best, they rule out records that can't occur.
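A quick way to see the productive aspect is to join two tiny tables that share one key value. The sketch below uses Python's built-in `sqlite3` module with made-up tables `a` and `b`. A Venn diagram would just place all five source records in the overlap, but the join returns every pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, a_val TEXT);
    CREATE TABLE b (k INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (1, 'a2');
    INSERT INTO b VALUES (1, 'b1'), (1, 'b2'), (1, 'b3');
""")

# Every a row matches every b row on k, so each a row is pasted to each b row.
rows = conn.execute(
    "SELECT a.a_val, b.b_val FROM a INNER JOIN b ON a.k = b.k"
).fetchall()
row_count = len(rows)
```

Two A records and three B records yield six result rows here, and none of those rows existed in either table beforehand.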
Really? Even if it NEVER produces the right answer? How is it useful then?
A [inner|left|right|outer|cross|natural|whatever] join B NEVER produces a result that is in either A or B. It produces something larger, that wasn't in A or B before.
It's like trying to describe the result of meiosis with a Venn diagram. It's the combinations that matter, and the combinations are not reflected in the Venn diagram.
It's not that the picture is incomplete. It is, I believe, that the picture is incorrect. Sets are not relations. This is a barrier to understanding for a beginner.
I like that this only covers left outer joins as opposed to left and right outer joins.
The difference between the two is sometimes confusing for people, so I've always suggested that people learn one and stick with it, since you can accomplish the exact same thing with either.
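To illustrate why one of the two is enough, here is a sketch in plain Python with two hand-written helpers, `left_outer_join` and `right_outer_join`. Both are hypothetical toy implementations over tuples, not a database API. It checks that `users RIGHT JOIN orders` gives the same rows as `orders LEFT JOIN users` once the columns are put back in the same order.

```python
def left_outer_join(left, right, condition, right_width):
    """Keep every left row; pad with None when no right row matches."""
    out = []
    for l in left:
        matches = [r for r in right if condition(l, r)]
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None,) * right_width)
    return out

def right_outer_join(left, right, condition, left_width):
    """Keep every right row; pad with None when no left row matches."""
    out = []
    for r in right:
        matches = [l for l in left if condition(l, r)]
        if matches:
            out.extend(l + r for l in matches)
        else:
            out.append((None,) * left_width + r)
    return out

# Hypothetical sample tables; the order for user 9 has no matching user.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (9, "lamp")]

right_result = right_outer_join(users, orders, lambda u, o: u[0] == o[0], 2)

# Same thing written as a left join with the tables swapped,
# then with the columns moved back to (user columns, order columns).
swapped = left_outer_join(orders, users, lambda o, u: o[0] == u[0], 2)
left_result = [row[2:] + row[:2] for row in swapped]
```

In this sketch `right_result` and `left_result` come out identical, including the None-padded row for the orphaned order.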
For that matter, most queries work nicely if you avoid null attributes altogether, and stick to (INNER) JOIN, WHERE ... IN, UNION, INTERSECT and EXCEPT.