That's not to say that other ways of illustrating things don't have merit, but broad statements like "say NO to Venn diagrams" just aren't helpful. If people find Venn diagrams a useful tool (and their proliferation indicates they do), I don't see the harm. They didn't stop me going deeper on stuff like relational algebra when I was ready to.
I think it's OK to use the venn diagram only as an explanatory tool, briefly, and informally. If you raise it to the level of a complete description of semantics, then it's very inadequate. Any step off the beaten path, where the cardinality of the join is no longer 1:1 or 1:n, and you're back in the weeds again.
The first section is about how Venn works fine for AND and OR because it’s set theory and Venn works just fine for examples (maybe not every problem though).
Joins are not sets. They are Cartesian products.
I do know at this point that he is technically right... But being 100% correct doesn't really matter at the beginning.
And Venn diagrams are right enough to start working with joins
I don't know if that's representative, but my gut instinct is that Venn diagrams are very accessible to the general public.
But that wasn't my point; if other people find these illustrations helpful, that's awesome and I wouldn't want to stop someone inventing something new and useful. My point was that I violently disagree with one-size-fits-all proclamations like "Say NO to Venn Diagrams."
His visualizations ultimately help nothing. You have to stare at them for a while to see what the pattern is, but there's absolutely no mnemonic hook to remember what's what, so in the end all you learn from his visualizations are that there exist different ways to slice and dice the products of sets, which is already a given if you're trying to figure out what the differences are.
Check this out, similar to the author's approach, but animated. This is the best visual of SQL joins I've ever found. https://news.ycombinator.com/item?id=16983007
Two tables (A & B) are somehow being joined (in the sql sense, by some query) ON a column from each table expressing the same type of id key.
Consider the union of values for those columns in either table. That is the set containing everything in either circle in the Venn diagram.
Then consider the values which appear just in A or just in B -- those, respectively are the non-overlapping sides.
Then consider the intersection (values which appear as values in both A & B), that is the middle part shared by both circles.
The highlighting on the Venn diagram shows which sets of values for the key could be made to appear in the resulting join for each type of query (and, yes, by extension indicating whether some other non-null data could also be coming from a given table in the cases where the given value is present for the table with the ON'ed column in question)
That's perfectly valid for a Venn diagram. The representation is meaningful and concise.
If you don't understand the fundamental nature of what is happening in a SQL JOIN I can see why you may need a more detailed/verbose diagram for each type of join, but the diagrams don't purport to be about the sets of rows from a table in the result and anyone who grasps the basic principle behind the joins won't struggle with the Venn diagrams because they strike the nail on the head with respect to the differences between each type of join without the need for verbosity & redundancy.
Take two tables of 1m rows each and do a cross join on them. The Venn diagram model says you would have at most 2m output rows. But that's wrong. You would have 1t rows output. When when you're using a normal join on a specific value matching, if the columns are unsorted, that's 1t comparisons potentially.
The right mental model leads to the right intuition of what's going under the surface. And it's all well and good to say "If you don't understand the fundamental nature of what is happening in a SQL JOIN..." But that is exactly who these diagrams are intended for! People who don't understand yet and are trying to.
Teaching people with the right model means they don't have to unlearn the wrong model later when questions get more complicated.
Say tbl_1 has 5 rows, and column "A" is the numbers 1 to 5. Then tbl_2 has the numbers 1 and 2, each repeated 5 times. Now how many rows are in the result of the query below?
select tbl_1.A from tbl_1 left join tbl_2 using(A)
The "filtered cross-join" model allows students who have learned SELECT and WHERE to think of a JOIN as an extension of those primitives, a combination of two tables on which they can filter.
Venn diagrams might be useful to visualize an outcome, but they will not support stepping through primitives to a solution.
In "filtered cross-join", JOIN can be an extension of SELECT that combines two tables. The combination is a set of rows, each each of which combines all the fields of one row of one table with all the fields of one row of the second. This is easily visualized with two two-column tables of three rows.
They can then use WHERE to find those rows with matches on the key field.
With this model, students can build up JOIN as an abstraction of simpler primitives. When they are struggling with a problem, you can ask them to first step through those primitives to accumulate the solution. Say, query the "raw" join and examine a few rows to see which they want returned. What is true of those and not true of the ones you do not want? ("I want those where these two fields match, and none that do not.") Ok, how do you express that condition in SQL? ("Where . . . this equals that?" "Hmm, try that" "HEY THAT WORKED!")
With that basis, they have a model that can extend to more complexity -- joins across several tables, joins on the same table, joins with conditions other than field equality.
Building up to and using this model, you can have the vast majority of students writing joins with confidence in two days.
My suggestion to the site would be, use an example that has two columns on each table, to provide the key field on which the join will be performed.
The diagrams that the author of the article comes up with (with the exception of the cross join) are just Venn diagrams rotated 90 degrees.
left join tbl_2
on tbl_1.A = tbl_2.A
It is in fact a set difference and the sets in question are values for the column being ONed on in each table (as I explained in the top level
My reply was with respect to these being cross joins with filters. That’s just one way of thinking about a JOIN that actually adds notational complexity (and does not actually represent how your database would be doing the join because it would be unnecessarily expensive computationally). Thinking of it in terms of a set difference of the values in thr ON column is arguably the better way of thinking of it if you are interested in how such a join should be implemented. Alternatively you can interpret the venn diagrams as just the filtration if you prefer interpreting as a cross join with filters.
In anticipation of some possible technical replies, let me be more explicit:
It isn't a set difference, no matter how you view it.
The original lists of objects are not actually sets because the elements are not distinct. From this viewpoint, it isn't a set difference because they aren't even sets to begin with.
There is an implicit "row number" feature of each object so in that sense the objects are distinct and do form sets. From this viewpoint, it still isn't a set difference. If you have one set with 5 elements and another with 10, the set difference cannot have 13 elements.
If you think of inner joins as a cross join with a filter, then we have all seen cross joins represented by Venn diagrams - and for me at least I lost several hours of work restructuring my understanding of what I was working with!
i could tell the link had an important distinction to make, but i was unable to understand it without your supplemental explanation.
You dismiss that as "silly" because anyone who already "grasps the basic principle" could figure it out. You're talking about the use of Venn diagrams by people who already "know how SQL joins basically work".
It sounds like you missed the point the article was making.
My point is that people aren’t really getting it. Is that the fault of the pupil or the teacher? If someone wants to investigate the latter I say let them. All your surprises will be good.
The author's approach is a better visual model. Using boxes makes so much more sense, since we are dealing with rows of data. The circles of a venn diagram don't represent a result set visually, and i think that's why I always found it confusing.
The the main point I think should be expressed to really explain what's happening in a join is about null values. If one table has nulls in the matching set, which ones do we want to keep, or discard? The box approach shows this beautifully, while a venn approach doesn't really make this obvious.
The best visual representation I've ever seen is similar to the author's, but animated. https://news.ycombinator.com/item?id=16983007
If I turn the "correct" diagrams sideways, it looks very much like a venn diagram again.
I'm drawing the conclusion that some people just need to see arbitrary concrete values to be able to understand what's going on.
Otherwise it's literally a square, sideways Venn Diagram. You can turn your phone sideways to confirm.
In a join the result can multiply and you get up to A*B rows instead.
Hence the geometrical representation can mislead you into thinking your data is smaller than what it actually is. As a quick mental model i think the diagrams can help but you have to understand they are only a reminder of how the join is being matched, not a representation of the complete data set.
I had been using SQL but not understanding it. Mostly because of poor previous explanation via things like Venn diagrams- the wrong model lead me to think incorrectly and make mistakes.
With the right model in my head, the exam was dead easy. I don't even know that she realized how impactful that was to me. So on the off chance you're reading HN, thanks Wen!
To me the Venn diagrams made it quite clear to understands while this article diagram wouldn't have helped me at all.
The best way to learn is in a way that make you understands. Your friend found a way that make you understands, but do you believe his explanation would have been universally understood? We would live in a much better world if it was that simple.
For me, Venn worked to make me understands. For you, your friend explanation worked to make you understands.
Should we make your friend stop explaining because he can't help everyone? Or should we simply let both way coexist?
What I would like to see is inner/left/right joins explained with colours used to indicate value (as the article already does) but in the form of entries in a PivotTable (as the article used only for outer join), perhaps with faded entries for those that were filtered out. Most of all, it would be great to see this with non-unique values in both tables, as rare as this admittedly is. This could be followed up by the more usual primary key / foreign key combination. At that point it would be visually clear that only max(n,m) entries are left, so you could finish up by showing those evil Venn diagrams after all, with the disclaimer that you understand that there's a full n×m combination under the hood in theory.
He didn’t know SQL well, but he described a left join problem perfectly by drawing a Venn diagram to explain the data that needed to be retrieved. When I was about to finish up the interview, he was so curious about the answer he insisted that I explained the answer so “he could sleep at night”.
I gave him a thumbs up just because he knew the concept and displayed intellectual curiosity. It was a great choice.
1:1 matches on rows are trivial. Venn diagrams can describe these too. But you really can't let people develop an intuition based on that, early on, or they kinda get stuck and it takes a while to unravel.
A Venn diagram may not be a perfect analog for joins but they relay the concept quite well.
The inner join is already perfectly distinguishable by how dark the squares are, and anything other than monochromacy preserves the distinctions that the color adds in the cross join, unless the site I used is misleading.
SQL used to have an intriguing aura of exotic knowledge back in time when SQL was still a Language to highlight on a resume and to have an RDBMS server meant spending thousands.
So Venn diagrams were just adding to such aura. But it's a visual aid and is usually supplemented with examples. If it doesn't ring a bell, some hands-on or detailed explanation helps crack it.
Personally, I believe JOIN should first be touched upon when introducing normalization. It just makes sense: break tables apart then join them together. Here one could also grasp a utility of an index on join column.
Then progress to more advanced use of joins - outer join (needs a concept of NULL), union (basically a synthetic set) etc.
Finally, a cross join - as one of those cases that may lock your terminal if you don't know what you're doing. Of course learning Cartesian product concept here, if not yet.
As anything in Computer, the hands-on aspect usually clears thing much better then science-first. The science would ease in much better once the utility is grasped.
Teaching means helping someone to learn, not making them understand your way of teaching. So different aids may work for different audiences.
The 'FROM' clause defaults to Cartesian product for multiple tables. You can then specify your join conditions in the where clause, but it makes outer joins harder to write - Oracle does them with special (+) operator, but it's clumsy.
It doesn't (shouldn't) make much difference at all to the database whether you specify your join condition after the `ON` or in the `WHERE` - if it doesn't hoist `WHERE` conditions into the join clause, and potentially change the join order because of it, it's missing big optimization opportunities.
I can't imagine any reasonable way to do outer joins in a where clause.
For inner joins the grandparent is correct and it should never matter whether you use ON or a where clause.
Of course it's weird and ugly. I don't think there's anything to recommend it over ANSI join syntax.
On the topic as such, besides ON and WHERE there is also USING and NATURAL.
Som very influential db-people have argued that NATURAL should be the only supported join. Checkout the third manifesto.
ON as such allows for outer joins, something you can’t do starting with a Cartesian product. But Oracle has syntax for outer joins using WHERE if you prefer that.
Outer joins could be achieved with NVL and a suitable default value, if you don't want to do the (+) thing?
Edit: you don't get to pair all rows from one table with an all-NULL "fictional" row from the other table in a cartesian product, so the answer is no
The whole point of the article is that there are different visualizations that are much better than Venn diagrams (in fact Venn diagrams are totally misleading), and they don't require any relational algebra.
That said, if someone doesn't know what a cartesian product is (high school level math), perhaps they shouldn't be allowed near a database...
This is such an intellectually snobbish position to take. I can tell you that I didn’t know this term when I first interacted with a database, and if someone had told me the above, I would have just felt stupid and given up.
The fact you learned what a Cartesian product was at high school shouldn’t preclude someone who hasn’t from trying to build stuff and experiment with tech. Not everyone learns things the same way, or in the same order, and that’s okay.
Intellectually snobbish is "everyone should listen to Mahler and read Epictetus in the original language".
This is just basic professional requirement...
>The fact you learned what a Cartesian product was at high school shouldn’t preclude someone who hasn’t from trying to build stuff and experiment with tech.
Just not in any capacity where people depend on their output...
That is very different. This was a 16 year old kid trying to mess around and learn stuff, he should be able to be around databases all he wants. I wouldn't hire him as a DBA, but he should be able to learn without having to wait for more advanced math.
And that's what I meantt, so since you agree that's "that's very different", we're on the same page.
>a 16 year old kid trying to mess around and learn stuff, he should be able to be around databases all he wants
Well, it's not like a comment on HN is going to stop them.
Should we critique "lefty loosey righty tighty" because a professional mechanic knows that's not precisely true and should be able to analyze the context of the screw to determine whether it's left or right handed? Of course not, that mnemonic is for novices, not professional mechanics.
Knowing what is meant when a term or symbol is used is not math, it's language.
You can know the meanings of all kinds of terms and symbols, and you will still not know anything about math, you'll only know something about mathematicians and how they like to communicate.
Conversely, someone may understand the math perfectly well but not know the correct terms or symbols (perhaps they learned under a different system, in a different country, etc.)
When the symbol is a mathematical symbol, and describes a mathematical operation, then "knowing what is meant" by it means you know a piece of math.
Besides I wasn't describing what you "know" if you know what a Cartesian Product is. I was stating (as a fact) where you learn it (at high school math classes).
Interestingly your "fact" is false - from my experience. You'd need to specific country where this high school is located, the year of the syllabus and possibly a region within a country. When I went to "high school" (we don't call it that) the syllabus differed by region.
And we never studied cartesian products.
In terms of what you're "stating (as a fact)," I can tell you with absolute certainty that you're wrong. Many high schools do not teach set theory and, among those that do, many - mine included - don't refer to this operation as the Cartesian product. It is simply "the product of A x B."