Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Select Star SQL, an interactive SQL book (selectstarsql.com)
700 points by zckao on Sept 4, 2018 | hide | past | favorite | 112 comments

This is good intiative. We need to simplify pedagogy.

On a related note, I always found the syntax of SQL to be great. Someone told me that simplicity of SQL made relational databases even more popular. It is one of the best designed language where even someone new to the field won't feel intimidated. I'm surprised designers of SQL haven't won the Turing Award yet.

I agree except one thing: SELECT cols FROM table. This should really be FROM table SELECT cols as every modern SQL editor will then be able to show you a list of available columns.

See LINQ. There they had the opportunity to fix it and it works just great.

More generally, statements should appear in the same order as they are executed, as far as this is possible.

This nullifies some other common sources of confusion, such as which statements are executed before a GROUP BY and which ones after (and why HAVING exists as a keyword). It's also similar to how CTE syntax is generally much more readable than subqueries.

The order of clauses in a SQL statement should have no relation on the order of execution because SQL is a declarative query language. The query specifies what the user wants and not how the system should compute the answer.

This declarative property also provides a (somewhat weak) motivation of why SELECT should come first. Precisely because it answers part of the WHAT, i.e., the schema of the result.

I personally think first about the loop/iterator organization (FROM in SQL) and only after that what I am going to do with its elements.


    SELECT a1,a2,... FROM Table AS t WHERE Condition
is syntactically equivalent to Python list comprehension:

    [(t.a1,t.a2,...) for t in Table if Condition]
Here the use of attributes (SELECT) is also written before the iterator.

SELECT is also analogous to normal loops:

    foreach t in Table
      if not Condition: continue
      # Use t.a1, t.a2 etc.
Here we first provide the loop specification while the usage of elements is written only in the body.

In the clients I've used (SQLYog, Mysql Workbench), aliasing a table lets you use autocomplete when referencing that alias followed by a dot.

So, I will start with a SELECT * FROM myTable t, then go back and replace * with t.<columns appear here>.

I'll use the same in other places in the query like WHERE conditions.

I concur. I always start with select * from table because I cannot possibly remember all the right column names

I mean, it works either way. Once you type the column names, it can guess the table name easily.

I use DataGrip at work and it can complete the column names without knowing the table name, so it ends up not being a problem.

> Once you type the column names, it can guess the table name easily.

If the query is simple enough that a natural join would work and you aren't doing FROM clause aliasing (which can be useful to make reusable queries self-documenting), sure.

For more complex queries and obviously any time table or column aliasing are used, that becomes somewhere between less likely and logically impossible.

I'd rather have the enjoyment of being able to think in normal English speak rather than Yoda speak just to fix one edge case that appeared decades after they came up with the syntax.

It’s also how azure query language works.

I teach SQL to financial people and, for the most part, it isn't too bad. When I get to joins, I teach a subset of SQL where all joins are done through INNER JOIN / LEFT JOIN syntax and that helps a lot. Joins are the part that everyone has trouble with.

Tbh I usually see teachers worried about joins but its the group by/having clauses that I always watched people struggle with (at least, writing it).

Ofc that might just be the emphasis on joins makes group by the next lowest hanging fruit

What do your abbreviations mean?

tbh = to be honest, ofc = of course

Urban Dictionary (https://www.urbandictionary.com) or even just typing something like "what does tbh mean?" into your preferred search engine will often help you find likely interpretations of unknown abbreviations, in my experience.

> or even just typing something like "what does tbh mean?" into your preferred search engine will often help you find likely interpretations

I mean, that's what they did.

> Joins are the part that everyone has trouble with.

There is significant problem-solution mismatch in joins and some other SQL constructs which are therefore semantically quite controversial in many use cases:


The syntax isn't the problem though. Join relational operation is though on itself.

Can you explain when to really use a lateral join?

In Postgresql, Lateral Joins enable function chaining for set returning functions. ie you can feed the results of earlier srf function to the arguments of following function. This gets around one of Postgresql most glaring omissions - table valued parameters. Is incredibly useful once you get used to them.

I might be misunderstanding your point, but Postgresql functions can take (and return) tables:

    CREATE FUNCTION fullname(person) RETURNS TEXT AS $$
    SELECT $1.firstname || ' ' || $1.lastname
Could you go into a more detail wrt using lateral joins with set returning functions? I'm used to using them as a way to "extend" the row with later terms depending on prior terms as you say, but I'm not comprehending the srf case so well (or tbh the results that pop up when I search for "posrgresql set returning function").

In my experience it’s relatively rare that you need them. I think of them as the for-each of joins. It evaluates row by row, not a single time. Another analogy would be a correlated subquery.

I write a lot of sql and it is generally a nice language.

But for my taste the language is to close to English and therefore redundant and inconsistent in an annoying way. Example:

Select x from tbl ...

Delete from tbl ...

Update tbl set x=...

Why change the word order?

Yes,I also suggest standards committee allow the option of putting FROM clause first - this would make it easier for beginners to grok what’s going on and enable better tooling such as auto-complete.

On PC the alternatives were xBase, FoxPro, Paradox and Access, so it was a no brainer that SQL eventually took over.

Since there weren't any commercial relational databases available pre-SQL, this is quite true.

Codd wrote a couple of query languages before, but they apparently weren't usable by mere mortals.

SQL is over 40 years old, and still dominates. The most recent contenders had to position themselves in terms of SQL (NoSQL).

Also interesting how even NoSQL (non relational) Data stores are embracing SQL or some subset or flavouring of it.

This is gorgeous, and a great topic. No matter what field you're working in, understanding databases is extraordinarily valuable...and SQL is among the longest lasting techs I've ever learned (e.g. name another programming language that is as valuable today as it was 20 years ago, if not moreso). Even with the rise of NoSQL databases, I suspect we'll all still need to know a little SQL in another 20 years (and it was already more than 20 years old when I first encountered it in the 90s). And, even if not the same language, conceptually understanding how to query data is probably timeless.

> name another programming language that is as valuable today as it was 20 years ago, if not moreso

* C

* Python

* Java

20 years ago was 1998.

C I agree with.

But Java and Python? Both languages have changed beyond recognition over the past 20 years. A programmer who hasn't touched either for 20 years would probably be overwhelmed.

Another difference is that even though SQL has changed, there isn't much you need to know outside of the DBMS you are using. With Java and Python on the other hand, you need to keep up with the changing library landscape to stay relevant.

>With Java and Python on the other hand, you need to keep up with the changing library landscape to stay relevant.

For C, its just library management is troublesome enough that you end up writing your own ;)

Perl. And no, that's not just a factor of its current popularity, as the language has evolved and has regular releases.

Perl has amazing backwards compatibility, but has also solidified on much better coding practices and libraries in the last 20 years. You can still run that code from 20 years ago with a new interpreter and it will be safer and faster, or you can choose a more modern style or library and it will work just as well.

And to forestall anyone trying to bring up Perl 6 as a counterargument, unless you actually know what it is and how it interacts with Perl 5 and the community, I doubt the argument is actually saying what you think...

Sure, and I still use Perl as my primary language, just as I did 20 years ago, with some forays into other languages (like Python and C) along the way. But, this is survivor bias (one could argue SQL is also benefiting from survivor bias, though SQL was the dominant data query language then and it is the dominant data query language today, whereas one could argue only C was/is dominant in its space at both points in history). But, where's Pascal, Visual Basic (or Basic of any sort), Tcl, etc.? There's dozens of other languages that came and went, including data-oriented languages. All of which were popular during the 90s, and effectively gone now. And, only C among your examples is actually exhibiting the same kind of longevity (SQL is more than 40 years old, so it predates Python, Java, Perl, etc. by quite a lot)...I probably should have put the starting point for this thought experiment at 30+ years ago, but 20 years ago is when I first started using SQL, so it's what I used, which isn't really useful. Who cares when I learned the value of SQL, when it's an industry-wide phenomenon?

My point wasn't that one can't name languages with staying power. There are several. But, SQL is still special, I think. It's almost universally needed. If you go to work in a Node or Go shop today, you're probably still gonna need to know some SQL, just like if you went to work in a Python or Perl or Java shop in the 90s.

JavaScript is probably also more valuable than it was in 1998. Still pretty much backwards compatible, too.

Pretty sure SQL has been around far longer than either Java or Python.

>e.g. name another programming language that is as valuable today as it was 20 years ago, if not moreso

If we include non-Turing-complete languages as “programming languages”, which we must for SQL to get that 20-year label since SQL wasn't Turing-complete until recursive CTEs were part of the language, then XML.

Also, even without that caveat, candidates include Bash, C, C++, Objective-C, Perl, elisp, Java, JavaScript, MATLAB, VHDL, and others. Some, while still being as useful as they were 20 years ago, are down from a peak in the intervening period, but arguably that's true of SQL, as well.

To your point. I learned SQL almost 30 years ago. Along Unix, vi, Perl it is a skill that I still use often. It is one of the best learning investments that I ever made.

The choice of example database (the death penalty) is terrific.

So much more interesting, important and compelling than the usual tired customer/product examples.

I'm tired of seeing the implicit assumption around the place that software is just about business. Software should be about more than that.

I'm not the guy for trigger warnings but I find the choice of data unfortunate. There is a big percentage of people I won't recommend the page because of the dataset.

same here.

I like the death penalty database, it makes things so much more interesting and memorable. Data doesn't have to be boring

Congratulations! I love this kind of interactive tutorials as their helping more and more people learning stuff without a boring book and no exercises.

That said: Doesn't work in Firefox and Edge? Common, are we back in 1998? Sorry for the rant, but please don't do that.

It works pretty fine on FF. However, you have to enable `dom.webcomponents.customelements.enabled` manually...

Awesome work Kao!! Earlier this year I launched a similar interactive SQL tutorial with similar goals.


With the writeup on why i made it here: https://medium.com/@__dave/why-i-wrote-yet-another-sql-tutor...

Do let me know if there's any collaboration we could do!

Thanks Dave! I share many of the sentiments in your writeup, especially about existing tutorials. Admittedly, I have been impressed by the quality of a few of the tutorials that others have suggested here. Will let you know if I think of anything!

Sorry for being dense, or maybe there are pop ups that I'm not seeing? I'm missing how is this "interactive". Sorry

Slightly embarrassing -- but the problem may be that the site isn't compatible with FireFox (at least pre-v63 FireFox). The site makes use of custom HTML elements which are not supported by default in pre-v63 FireFox (https://developer.mozilla.org/en-US/docs/Web/Web_Components/...). The good news is that all you need to do is to turn on a flag as set forth in the link provided. The bad news is that many other less tech-savvy people are going to be stymied by this.

edit: the site will now warn you if your browser is incompatible.

It's 2018 and we're back in 1998 where website proudly showed "Optimized for Internet Explorer".

In 1998 sites might looks bad but at least user can still read the content. In 2018, some sites just refuses to work for anything but Google Chrome.

In 1998 you had to be running Windows. Now you can run any OS, even if the browser you have to use for a certain site isn't your favorite. It's an improvement.

People have always wanted to use "the web" as an application platform. It didn't work that well in 1998, but it's pretty good now. Progress has been made.

The difference is these are standard features, albeit ones that are just coming out.

Also not working on Edge.


Scroll down to Dataset.

After "I’ve prepared for this book:"

There is a little interactive code editor you can type in and run the query.

I was confused too. It appears the page does not work on Firefox. I get a "customElements is not defined" error in the console in Firefox and none of the interactive sections render.

The page works fine on Chrome.

Doesn't work on Chrome for me - Version 67.0.3

Yeah I wondered if it was something like that.

Building a site for Google Chrome is like building for IE ActiveX back in the day. I'm surprised this isn't the main topic discussed here.If it doesn't work in Firefox, it isn't worth supporting. We need an open web.

It supports Firefox, you just have to enable a component from your about:config.

Nice job, this is a great way to teach something like SQL. Any plans to add more chapters? Would love to see something around window functions AND CTEs as this format would go a long way in showing when and how to use them.

Did you have a look at https://www.windowfunctions.com ? Their exercises were a great introduction for me.

I also discovered the ability to select from multiple tables and do joins using where clauses instead, which to me is a lot more intuitive than explicit/imperative join statements. I didn’t see anything covering queries like those (maybe because there’s only one table in the dataset, idk i’m on mobile).

The O'Reilly "SQL Pocket Guide" is very well written, and, IMO, one of the best tools for learning SQL. It's extremely concise, and yet packed chock full of information. It's missing some things, sure, but once you have a strong foundation, the rest you can get from RDBMS docs. The fact that this is such a short book, and not a reference guide but an actual guide to using SQL, makes this book approachable and unintimidating. I highly recommend it.

See https://sqlbolt.com/ for an interactive SQL tutorial with less distracting examples.

This commentary could have been constructive

I am looking for something to recommend to a new dev, and your link is definitely preferable. While the story in the opening post is cute, I just don't care about said story.

thanks a lot. it's an awesome project. in the introduction it was stated that this project is using Github pages for hosting but i couldn't find the repository address anywhere.


here is the Github repository in case anyone wants to send PRs or start the project.

One suggestion to make things even more clear than they already are with this paragraph:

>The SQL query may look like an ordinary sentence, but you should view it as three Lego blocks: SELECT FROM executions LIMIT 3. As with Lego, each block has a fixed format and the different blocks have to fit together in particular ways.*

Explain what SQL is.


One example for this could be:

SQL means Structured Query Language, and with SQL we can build (structured) questions (query) using a syntax (language). To find what we want in the data, much like we build structures with various Lego bricks, we can use the following statement:

SELECT * FROM executions LIMIT 3

The SQL query may look like an ordinary sentence, but viewed as three Lego blocks, each block has a fixed format and the different blocks have to fit together in particular ways.


Just thinking of how to distill it even further...

It’s a lovely product and you can tell a lot of work went into making this course. But I fail to understand the author’s logic behind “a good course has to be free”. Why though? What’s wrong with creating value (like The OP) and then extracting it from the people that benefit from it?

In fact, there’s tons of anecdotal evidence that free MOOCS haven’t faired well in terms of retention precisely because of a lack of a buy-in from the people taking those courses.

See this, from one of the authors of one of the best undergraduate textbooks on operating systems:

The Case for Free Online Books (FOBs): Experiences with "Operating Systems: Three Easy Pieces" by Remzi Arpaci-Dusseau, http://from-a-to-remzi.blogspot.com/2014/01/the-case-for-fre...

Imagine people in low paying jobs who dont have the means to pay - there should be stellar free resources like this for them to grasp things with a solid enough foundation to get onto more complex projects.

Free is great.

Is it possible that value can be generated without a financial transaction?

Outstanding work. SQL was the gateway drug to coding for me and I wish this was around. SQL is my top recommendation for people looking to become more technical, and this will likely be what I send them first.

The dataset is sobering. Good on you for using this.

I just finished working through it. It is well written with an interesting if macabre example dataset. The interactive bits worked well once I enabled web components in Firefox.

I find the writing style interesting, but I'm not like others at my work who could benefit from learning SQL (business managers at a bank back office who rely mostly on Excel, then get stuck when Excel crashes with large datasets).

Are these funky, quirky books generally well-received outside the programming community? I quite liked Why's Poignant Guide to Ruby, for example, but I would never recommend it to anyone at work.

As a hobbyist programmer I absolutely love this style of instruction.

I don't know that I'd use "funky" or "quirky" to describe it though. Instead it comes across as clear, concise and approachable.

I really like this. Bookmarked. I have spotted a few bugs that you may want to check into.

Search the page for the word “llanguage”. The sum/count checker told me it was incorrect for using lowercase nulls. I normally write all my sql lowercase for brevity, but I also mostly use sql server.

I really like what you have done here and I hesitate to report it in your show hn thread because this really is great.

Thanks so much for the feedback! Both the typo and the validity-checking bug have been fixed.

Constructive criticism and commentary is one of the most valuable parts of Show HN!

I really like your explanations. I have never been able to wrap my head around joins.

You say that: "Programming is best learned by doing", but I am struggling to do anything...

It would be great if there was some online area like SQL Fiddle with the data loaded, so I can run queries and see results in my browser without having to set anything up or am I missing something.

(self plug) pgexercises.com provides something like this.

The code is open source, anyone is welcome to fork and populate with new data like this.

I learned SQL mostly by using your site, thank you for making it!

Me too! Thanks Alisdair!

pleasure, glad you both found it useful :-)

One option is to learn on bigquery (a google cloud service) - its "pre-loaded" with all kinds of public datasets that you can just start querying, with a web interface for doing sql.

Its pretty easy to stay within the free tier (1TB of data per month)

All the blue boxes are actually interactive code editors. You can run arbitrary queries against the dataset in them. I changed some of the wording to make it clearer. Hopefully that helps!

I was wondering what you were talking about. What blue boxes? Then I saw the problem. The boxes showed up in Chromium but not Firefox.

Same here - the front matter page mentions web components - perhaps the support is different /disabled in Firefox?

The <sql-exercise data-question...> elements aren't visible in Firefox at all.

My favorite version of sqlfiddle is dbfiddle.uk. Lots of databases and generally fast.

sqlzoo.net might be something you would like.

I have picked up SQL and statements from my time in industry. I really like how this book provides not only interaction but a dataset that actually means something. Also, interactive systems like this for teaching are amazing! I learned about how "_" is a wildcard for one character (I have been making do with % for LIKE statements).

I only recently learned that LIKE also supports something vaguely similar to regex character classes:

    SELECT * FROM customers
    WHERE company_name LIKE 'foo[lt]'
will only return customers whose company_name is 'fool' or 'foot', while

    SELECT * FROM customers
    WHERE zipcode LIKE '12[^30]45'
will return customers whose zipcode is LIKE '12_45', except '12345' and '12045'.

Why did you use death sentence cases as data? I've spent around 20 minutes just reading the last words of various executed people. It's... disturbing and distracting.

The reason is explained in the front matter: “On one level, the data is simply a part of a mundane programming book. On another, each row represents immense suffering, lives lost, and in some cases amazing redemption and acceptance. In preparing for this dataset, I was deeply moved by a number of the statements and found myself re-evaluting my position on capital punishment. I hope that as we examine the data, you too will contemplate the deeper issues at play.”

It's deep and moving, it makes this course more than just a course. I just fetched a few rows and read Beazley's last statement and I now have a great respect for the creator of this course.

I strongly agree. Rarely do I see such a humanistic approach in tech. It brings an important matter to our attention and is meaningful. As a side note, what is meaningful is pedagogically valuable, because what has meaning is easier to remember.

it's data that asks to be investigated. There is no reason to query meaningless data. A query that tells you something you didn't know is infinitely more useful than a query that tells you what you already knew, and a dataset that encourages you to ask your own questions and craft your own queries is infinitely more effective than, say, an imaginary book catalog or some other nonsense that doesn't at all tempt the student to ask more questions of the data.

and besides: technology is a facet of society, it is not a purely rationalist act.

I love it :) It gives the data weight.


This breaks the site guidelines. You unfortunately have a history of posting uncivilly to HN. If you would please review https://news.ycombinator.com/newsguidelines.html and follow the rules when posting here, we'd appreciate it. The idea here is: if you have a substantive point to make, to make it thoughtfully; and if you don't, to not comment until you do.

Why did you choose executions over something mundane like blogs or comments? Reading last statements isn't exactly something I want to be distracted by. The data is interesting sure and if this just happened to be based on something you did recently, I get it. It's not something I'd recommend to students doing data science or getting acquainted with SQL when there's less morbid source material out there. The interactivity provided is cool though, no doubt.

I found it engaging, and a truly real world example of how what we do can impact others, provide us with insights we might not normally have, and can help us influence the world around us.

This might sound like an odd statement, but I can tell from reading part of it, that it was written by a beautiful person.

Genuinely enjoyed the book. A couple of notes:

1. In The Long Tail > Nested Queries, the first example contains the function LEN, which gives an error as LENGTH is expected.

2. On the questions with multiple answers, when showing hints, I would use different font colors for correct/incorrect statements as it will be easier to check your results.

Wonderful website!

As someone who has never really touched SQL except very briefly during a degree module, which basically taught me nothing, I enjoyed this. I think it taught me some good stuff. I will need to go back through it as I didn't pick it all up.

That said, as a novice, the challenge questions are not in anyway suitable for a beginner. I wouldn't, and didn't, have any idea how to do them; or know where to look to find a starting point. The previous chapters do not prepare you for the challenges and I came away thinking I was useless until I click the solutions and realised there was no way I would realistically ever been able to solve them without some form of further training in SQL.

Glad to see that the Creative Commons license is used!

I used the MIT license for the interactive SQL tutorial I built ( https://www.sqlteaching.com/ )

Is there a trend towards pure SQL lately? Not that i m complaining (quite the opposite but let me not jinx it), but it seems we might stop seeing a new ORM each week for a while.

The book says this is a valid where clause:


> 1 and 0 are the most basic Boolean statements. This block guarantees that no rows will be returned.

This doesn't work in SQL Server and it has always bothered me. But is it part of any SQL standard?

The book also says this is an invalid where clause:

> WHERE '%obert%' LIKE first_name

> More than one wildcard is fine. But the pattern has to come after the LIKE operator.

This one is probably a bug/mistake, but it's definitely not invalid in the SQL language.

I can only imagine how much work this must have been (and not very accurately, I think). I did not have time to look over it in great detail, but I like what I have seen.

Thank you for creating and sharing this!

"Let’s take a quick quiz to cement your understanding."

"Now you have the tools you need to complete our project."

Where is this quiz and this project? Am I missing something?

Reading the final words is really moving. So much suffering.

I’m in love. I’d pay for this. Thank you.

Nice work...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact