Mastery with SQL: Learn Modern SQL with Postgres (masterywithsql.com)
551 points by nsainsbury 26 days ago | 80 comments

Hi HN!

Mastery with SQL is a course I've been working on for a long time now and just recently released. I'm a big self-learner and one thing I love in courses and books is exercises (with full solutions). I've always wanted to make a SQL course that's not just theory but has lots of great real-world inspired exercises...and not just only simple exercises, but challenging ones too that put you into problem solving mode and require some creativity. Mastery with SQL is just that. I estimate that at least half of the time spent creating this course was focused on exercises alone.

I also wanted to showcase all of the wonderful functionality that's available in modern PostgreSQL - my favourite go-to relational DB.

I'll be around to answer any questions anyone has about the course - otherwise you can always contact me directly at neil@masterywithdata.com.


Do you have a plan to add interactive SQL coding session? Something like Codecademy.

So a student can jump into SQL problem. The data has been setup already. The student only needs to think hard to find the sql query for the problem. Then the student can get the feedback based on the error.

Yeah, I have been thinking about building something like this - possibly even going beyond the course and offering just the interactive exercises for people who only want to work through the exercises and don't need all the videos.

Nice! If you want to take a look at how someone else approached this, pgexercises [1] also handles interactive SQL sessions.

[1] https://pgexercises.com/about.html

The course seems great, congratulations on putting everything together, it must've taken a lot of time and effort. Maybe you'll find my feedback instructive: I would say that I have intermediate knowledge of SQL, and most of the things are familiar to me, and what's stopping me from buying it is the lack of advanced topics.

Are you planning on adding any advanced topics like query optimizations / performance tuning, triggers, transaction types, full text search, etc.?

Do you plan on expanding this course with something like that? Because that would definitely make it a "from zero to hero" type of course.

That's my 2 cents, great work once again :)

Thanks! I am planning on adding several additional chapters over the coming months - the next one will be on query performance and indexes and I've got plans to add a full chapter on transactions too. I think some devops topics covering setting up postgres in production, users/roles, security, etc. would be great as well...I definitely see the course as evolving over time (much like how books get multiple editions over their lifetime).

I'm definitely interested especially if there will be DevOps topics covered in the future.

This looks awesome. If we purchase now, will we get updated/new chapters in the future?

Yep - all content I release in the future will be available to all customers. I will be sending out emails as new content is released.

Similar feedback here: I'd be most interested in concurrency control (e.g. https://www.postgresql.org/docs/9.4/mvcc.html). I've rarely seen it covered in depth outside of the official documentation.

Similar feedback here - the only chapters that'll be useful to me are 7 and 8. It would be great if you could add the tentative list of chapters and topics you plan to cover to the website itself, so I can see what's coming later.

This looks interesting, but it really needs a clear pathway to free samples so people can evaluate it. The first thing I did was click "Start Learning" and it took me to "Pricing".

I want to understand the quality and depth of what I'm paying for before I shell out.

Yes, and the Show HN guidelines require a way for users to try out the product or project. We've taken "Show HN" out of the title for now.


Didn't know that - the guidelines are a good idea, thanks.

The OWASP Postgres hardening page recommends that one removes the default "public" schema in postgres: https://www.owasp.org/index.php/OWASP_Backend_Security_Proje...

...whereas this course seems to use the public schema and just query "SELECT * FROM public.foobar" etc.

I remember being a bit confused about "schemas" when I switched from mysql to postgres. I think it would be good to have a special section that explains what schemas are typically used for, and in particular when/how to use the public schema correctly.

I think that's a reasonable recommendation if you're deploying Postgres in production. Not something I'd want to cover in any of my current chapters, where the focus is on teaching people SQL, but definitely something I'll mention down the road when adding a chapter around production deployment and security recommendations.

I agree. Some more info here for those interested: https://wiki.postgresql.org/wiki/A_Guide_to_CVE-2018-1058:_P...
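For anyone who wants the short version of what that guide recommends, the core mitigations look roughly like this (the role and schema names here are just illustrative):

```sql
-- Per the CVE-2018-1058 guidance: stop other users from creating
-- objects in the public schema that could shadow yours:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- And pin the search_path for your application role instead of
-- relying on the default ("$user", public):
ALTER ROLE app_user SET search_path = myapp, pg_catalog;
```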

I had that same problem when I switched from postgres to mysql, and with BigQuery and its "projects".

Wow, OWASP recommends keeping your database schema secret.

I've never been big on security through obscurity.

I think it's really nice that it's possible to self-publish something like this. It would be cool to have a blog about creating the product, thoughts on marketing strategy, sales channels, etc., and the experience of selling it. I would love to make something like this for a niche topic, although I'm worried it would be hard to sell.

In your main example video, you're using varchar(45) for your name fields.

What's your take on using varchar(X) vs having a text field with a char_length(X) constraint? Isn't the text approach considered more of a best practice nowadays?

Yep, I tend to prefer using the text type in most cases (in Postgres in particular, both varchar and text have the same performance characteristics with varchar effectively being just the text type with a check constraint on the length). I'm not super strict about it though - it's more just a preference and there's some good arguments I've seen around 'signalling intent' by using fixed length text types vs "unlimited" length text types.
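To make the equivalence concrete, here's a rough sketch of the two styles (table and column names made up) - in Postgres these behave essentially the same:

```sql
-- varchar(45): length limit built into the type
CREATE TABLE customer_a (name varchar(45));

-- text + named check constraint: same effect, easier to alter later
CREATE TABLE customer_b (
    name text CONSTRAINT name_len CHECK (char_length(name) <= 45)
);
```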

> In Postgres in particular, both varchar and text have the same performance characteristics with varchar effectively being just the text type with a check constraint on the length

Are both pretty much the same from a stand point of being able to change them in the future once they have a substantial amount of data saved?

For example if you have 1,000,000 rows of varchar(50) vs. text with a check_length(50). Which one wins if you want to decrease X from 50 to 25? Does the answer change if you want to increase X from 50 to 100?

This is with PG11 by the way if it matters.

A major difference is that if you have a view which uses that varchar(50) column, you'll have to drop and recreate the view in the same transaction that you increase the length of the column. (And if that view has any dependent objects, you'll have to do the same for them.)

With the text type and check constraints, none of this is necessary.
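To illustrate with invented table and view names: altering a varchar column's type fails with "cannot alter type of a column used by a view or rule" until the view is dropped, whereas the text variant only needs its constraint swapped:

```sql
-- varchar route: the dependent view must be dropped and recreated
BEGIN;
DROP VIEW customer_names;
ALTER TABLE customer_a ALTER COLUMN name TYPE varchar(100);
CREATE VIEW customer_names AS SELECT name FROM customer_a;
COMMIT;

-- text + check route: the view is untouched
ALTER TABLE customer_b DROP CONSTRAINT name_len;
ALTER TABLE customer_b ADD CONSTRAINT name_len
    CHECK (char_length(name) <= 100);
```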

Performance wise, there is no distinction in Postgres 11. As I understand it, internally varchar(N) is basically just text with a character limit. varchar(N) really only exists in Postgres these days for backwards compatibility, and the fact that it is ANSI SQL.

Would it be equally as performant to change a large table's varchar(50) field to a text field with a range constraint (min / max) vs. just changing the existing text's constraint to include a min? That type of use case might come up in the wild where you're like, "well... maybe zip codes shouldn't just have a max size, they should also have a min too".

Really just pulling at straws here to see if it's ever viable to use text / constraint when it sounds like varchar(N) might be the way to go? Even if it's only there for backwards compatibility, I must say varchar(50) is a lot friendlier to type than having to wire up a char_length constraint.

The Postgres org recommends text. I use a text domain defined with two constraints, such as min chars 1 / max chars 255. Domains are great - a fast, flexible, and easy way to enforce scalar constraints, including more complex checks such as regexes - reusable across multiple columns and easy to change.
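A minimal sketch of that pattern (domain and table names invented):

```sql
-- Reusable domain enforcing min 1 / max 255 characters:
CREATE DOMAIN short_text AS text
    CHECK (char_length(VALUE) BETWEEN 1 AND 255);

CREATE TABLE products (
    name  short_text,
    brand short_text  -- same rule reused across columns
);
```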

Which is why I still use this nomenclature for SQL

The PostgreSQL manual recommends using text over char or varchar unconditionally. I think that taking it straight from the source is the best approach here.

I just took a look at the manual (versions 9.3 and 11) and saw no such recommendation for the character or character varying types... it seems like the manual should be agnostic to such choices outside of deprecation. It did mention that common assumptions about performance differences aren't real in the PostgreSQL implementation.

I think you're thinking of the Wiki rather than the manual: https://wiki.postgresql.org/wiki/Don%27t_Do_This#Text_storag...

I generally agree with the advice in this section of the Wiki, but recognize that it's more opinionated... and less authoritative... than the manual.

I was indeed referring to the section on text types in the manual. But I admit that I somehow lumped char and varchar together in my head while reading that section a couple of days ago. So you shouldn't ever use char (and only char) unless you have very specific requirements. This means that yes, I didn't get it straight.

Can you provide a link to that recommendation?

There's pretty much no difference between those types, they are all stored exactly the same way. Varchar has a bit of extra cost for the length check, but that's OK when you want to enforce that limit.

> varchar(45)

That's crashing with my full name

If a machine learning engineer job posting asks for ‘SQL’ in ‘desired skills’ section, will this be a good resource to work through, or are there better resources for that? I don’t have any experience with databases.

I'm sure this is a good resource and will very well repay the time to work through it, but what a job will most likely want is actual industry, hands-on experience. Learning SQL from a book/website is very valuable, but it's not enough.

Working with large, real datasets where the SQL optimiser has decided that it doesn't like your face today, well, you need lots of those headaches before you get expert.

I realise that often leaves people in the tough position of "can't get a job without experience, can't get experience without a job". Sorry, I don't know what to suggest.

But good luck and don't let me put you off!

Any resource to learn SQL will do; there are minor syntax differences between databases, but they are not important. And there is no difference in learning SQL for big data, machine learning, or any other purpose.

Is there a good guide on which database to choose for an ML project? Like ease of use, or speed, or maybe some particular features?

I would first look at the tools/libraries you are using and what they support. PostgreSQL and mySQL/mariaDB are both well-supported RDBMSes in some circles and work well with Python... if you are doing map/reduce at its core, you may do better with Hadoop or Cassandra. MongoDB may be a disconnect depending on your tooling.

PostgreSQL does have a lot of options, and this includes indexing on binary serialized JSON data, that can help a lot depending on what you want to do. PLV8 is also incredibly interesting.

As an aside, I tend to find that if I can do something in a Docker container (locally), I will do it in a container. This lets you spin everything up, and down and cleanup and retry without leaving remnants of stuff behind on your host environment (desktop). The only gotcha is don't use volume mounts for your db data if you're using docker desktop for windows/mac, it does not perform well.

As no-one's answered this, I'll try though my area is DBs but certainly not ML.

AFAIK ML doesn't really care about the source of data, so the DB is irrelevant. The conjunction of DB and ML in your question makes me wonder if you're sort of confusing the two. AFAIK they are totally independent, though some DBs come with data mining tools, which is ML, e.g. MSSQL (you can probably download an eval version for nothing; it's time limited though, 6 months or so).

But to repeat AFAIK ML and DBs are entirely different things.

If you want to grab a DB, well I've no experience with postgres but I've heard a lot that's good, so perhaps start here.

So far I’ve been using Excel and Pandas to maintain/pipeline my data, but I feel like a database might be a better way. I’m not sure about built in data mining tools, as I’m used to writing them myself. Also, most of my pipeline configs are in json format so I’ve been thinking about MongoDB.

Sounds like you know ML then. Anything non-trivial on a spreadsheet is usually best ported to a DB, that's a pretty good rule of thumb.

I don't understand about your configs being in json, nor why that makes it more suitable for mongo. If you need to store semistructured blobs of data, relational DBs can do that fine (in BLOB fields), but IIRC postgres supports json, including indexing into individual json subfields. <http://www.postgresqltutorial.com/postgresql-json/> but to repeat, I've never used postgres so speak to someone else first who has.
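For what it's worth, the jsonb type plus a GIN index is the usual way to get indexed lookups into json documents in Postgres. A quick sketch (table name and keys invented):

```sql
CREATE TABLE pipeline_configs (
    id     serial PRIMARY KEY,
    config jsonb
);
CREATE INDEX ON pipeline_configs USING gin (config);

-- Containment queries like this one can use the GIN index:
SELECT config->>'model'
FROM pipeline_configs
WHERE config @> '{"stage": "train"}';
```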

If you can do ML as it seems you are, install postgres and just get started on this tutorial stuff here - best of luck!

I'm curious if you considered providing the course using one of the online courseware providers (Coursera, Udemy, etc.)?

Second, no love for Linux? DBeaver and pgsql are obviously available, is there any particular reason that Linux isn't included; or anything in the course that's specific to MacOS/MS Windows?

I think the assumption is that the average Linux user will easily figure out installing DBeaver and PostgreSQL by themselves, plus installation methods will vary depending on distribution. I'm a Linux user and I'd certainly skip that video.

That's a pretty lazy assumption if you're providing a how-to video. Especially since:

1. There are new Linux users every day

2. The Mac/Windows users will most likely have to install PostgreSQL on a Linux server eventually for production anyway

It would be very beneficial to at least add instructions for installing on a Debian based distribution like Ubuntu, since that would cover the vast majority of Linux users, especially newer users.

Yes, but exclusion suggests that there might be use of software later that doesn't allow it, or that there's some difference -- like they do some DB thing and it doesn't work if you're using ext/zfs/whatever. Remember it's for non-experts, people may not even know(?) that pgsql and such are freely available to them.

At least having the add-apt-repository line in the course manual saves you looking for it on the DBeaver page.

Aside: I've been doing the Stanford machine learning course and it includes installation as part of the required work, so I did the install on Windows as instructed (what a palaver) and only then realised it was entirely optional, just not presented as such. So I then did `sudo apt install octave`, messed around with a few test queries, and was done with the entire second lesson.

I agree, especially some info on production installation on Linux would be really nice. For example, is it better to install the "postgres" ubuntu distro package, or is it better to install the Ubuntu package provided by postgres themselves at https://www.postgresql.org/download/linux/ubuntu/? Which one of these offer the best security / stability?

Yes, you'll be able to work through the course using Linux without any problems. The only software that's in use is DBeaver and Postgres - both of which run on Linux, Mac, and Windows.

I'll make a note to record a setup video for Linux as well (though in truth, the Mac and Windows setup videos are practically identical anyway, owing to the cross-platform software in use).

I sent a message from the course site, but you'd probably want to consider installing PostgreSQL via Docker, which can be a one-liner on any platform with Docker installed (including Docker Desktop for Mac/Windows).

The problem with coursera and udemy is that they lock you into their platform and limit you in terms of what you can do. When you build the product with its own website, you can basically do whatever you want in terms of innovation.

This looks quite nice!

I had a look at the content overview and noticed it doesn't include any operations considerations. While some might have the comfort of somebody doing the DevOps for them, many people will need to do it themselves. Any plans on that front?

The content overview doesn't include anything about escaping data to prevent SQL injection (escaping of literals vs escaping of table/field names etc). This is important to get right and postgres has a slightly different approach compared to mysql and others.

For example, you can and should use parametric queries like:

client.query('INSERT INTO mytable(a) VALUES ($1)', ['hello'])

...but that doesn't work for all types of queries, for example you get an error for:

client.query('SET LOCAL SEED = $1', [someVal])

Another example, you might need to build a dynamic where-clause based on data from an "advanced search" UI query builder, and then you need to make a list of all the "foo = $N" parts (and increment $N as you go), plus another list of the actual variables. Maybe there are some nice tricks / techniques to simplify that?
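One server-side alternative (not necessarily what the course covers): in PL/pgSQL, format() with %I for identifiers and %L for literals keeps dynamically assembled SQL injection-safe. A sketch with invented table/column names:

```sql
DO $$
DECLARE
    col text := 'city';    -- column name chosen by the UI
    val text := 'London';  -- user-supplied value
    n   bigint;
BEGIN
    -- %I quotes identifiers, %L quotes literals:
    EXECUTE format('SELECT count(*) FROM customer WHERE %I = %L', col, val)
    INTO n;
    RAISE NOTICE 'matches: %', n;
END $$;
```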

The free "teaser video" starts off showing "actor_id serial NOT NULL" but in postgres 10 and above identity columns can/should be used instead?


I actually do cover this in the course! I actively recommend and give examples of using autogenerated identity columns for the primary key, but still in many places in the course do expose people to using the serial types due to their popularity.
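For anyone comparing the two styles, they look roughly like this (table names invented):

```sql
-- Older serial pseudo-type:
CREATE TABLE actor (
    actor_id serial NOT NULL PRIMARY KEY
);

-- Standard identity column, available since Postgres 10:
CREATE TABLE film (
    film_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);
```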

Do new coders still learn SQL, now that they all use frameworks that abstract away the database queries?

Ugh... I tend to prefer SQL database interaction via scripting languages, and the abstractions are a LOT more work than just knowing SQL and writing parameterized queries directly.

Also, conceptually it's a VERY good idea to understand how your database, and databases in general work. You can avoid a lot of weird behaviors and bottlenecks if you know more.

Aside/Joking/Rant: Except mySQL - I don't want to know its weird behaviors any more than I already do; it can die in a fiery inferno and suffer in hell incarnate.

I'm deciding between doing this and the Kaggle SQL summer camp. How does it compare?


Great looking site. A big motivator for me to do something similar.

Small typo under HD Video and Top Notch Audio: vidoes

How translatable is this to MS SQL Server? That’s the database we use at my company.

PL/pgSQL in PostgreSQL is a bit different for some things vs. MSSQL's T-SQL syntax. Most of the syntax will work without much modification, and all of the material that seems to be covered should apply to both. You can take this course, then use Google to cover the differences; once you understand the concepts it's pretty easy.

Most databases have some minor differences. mySQL/mariaDB and SQLite seem to be the most odd in terms of how standards are supported and their fallback mechanisms. For mySQL this is mostly historical, and for SQLite it's the nature of a small/fast/light SQL library.

FYI your site is blocked by our forcepoint system as a "parked domain"

Sounds like a problem with your forcepoint system, whatever that may be.

It would be really nice to have variable playback speed for the videos.

Yep, there is! It's just the promo videos on the landing page that don't have this.

This looks like a very promising resource.

Looks for beginner correct


Please don't be a jerk on HN, especially in response to someone else's work.


Are you telling me that this HN entry is anything more than an ad? I know an ad when I see one. Yahoo is full of them disguised as articles, and I kinda stopped reading yahoo exactly because of click-bait articles disguised as ads. So guess what my reaction was when I saw this one. I really hope HN doesn't go down the same path.

I think courses and books are super valuable.

For example, I've been building a new web app with Elixir and Phoenix for the last few weeks, learning as I go. Couple hours a day (when I can basically). Did nothing but skim the docs and look at open source examples.

I got pretty far and made real progress but the other day I decided to just buy the Programming Phoenix 1.4 book which is written by the people who made Elixir and Phoenix.

I'm now 80% done with the book and I'm very happy I bought it. It filled in SO many gaps and I have a ton of actionable "TODOs" in my app to make changes based on what I learned in the book. Some of these are really big wins and the insights learned help me understand how to write better code in the future (even outside of Elixir).

When it comes to courses, I can't speak for OP, but I personally offer free lifetime updates and support (I happen to create courses too, but not on SQL). Having a get-out-of-jail-free card where you can ask someone a question and get an answer is worth the course's price alone, especially considering that if you were to hire someone for that level of support, you would probably end up paying 3 or 4 times the cost of the entire course for just 1 hour's worth of help.

    >  ...instead of just learn it on your own on postgresql.org...
The world is full of people who "learned things on their own" and still created nightmares.

The important thing is to just do it, accept failures, apply some self-criticism and keep going. How you get started is far less important in terms of quality of output as time progresses in one's career.

Courses are a perfectly good way to jump-start the process as is learning from source materials. Different things work for different people.

Or you just retire someday and these people take over maintaining all your codebases. Hopefully they're as successful at spotting the unsupportable assumptions in your code as I was at finding them in your comment.

This seems based on the assumption that the "real learner" who RTFM will somehow end up with deeper knowledge than someone who pays less than one billable hour for a well-written course. I've found that many times when I don't do a proper overview, I end up with gaps in my knowledge because I merely learned "just enough".

Learning syntax is one thing, learning to write good code is another thing, and learning community best practices and advanced techniques is yet another thing. You can be a competent Java programmer without ever reading a book like "Effective Java", but reading that book will probably improve your skills. The same is true for PGSQL (or any other language). Sure, you can pick it up on your own, but mastery is achieved with experience and education.

Not quite sure what you mean - is this a positive comment or a passive-aggressive negative one?

I still contend that the £2K (back in the day) my employer spent on a week-long intro to Oracle was a good investment for them and me.

It's definitely a positive comment. 100% positive for me.

As for how my employer at the time pointed me toward SQL, it was something like: "Here is yahoo.com, learn Sybase SQL by Monday on your own; you start a new project on Monday that requires it". It was a Friday afternoon in 1998, and Google was still in its infancy, hence all searches were done via yahoo or altavista. And I had to learn it by myself over the weekend. Not even a £0.02 investment from my boss.

I am also an avid self-learner, but a good teacher or course is invaluable. Just watch a course like linear algebra from MIT with Gilbert Strang and compare it to studying the subject alone. The difference is enormous.

The biggest problem with learning on an "as-needed" basis: You only learn what's needed to accomplish your immediate goals. An inefficient, injectable query will certainly get data showing up on your web page.

Most people are not going to "learn databases" over the weekend. A boss or company that has employees capable of this type of independent learning and routinely expects it will soon lose the former. That costs more than 2 cents.

In 1998 I learned a lot on my own as well. I love the fact that today there are better resources when I need to learn new topics.

Was there any useful content on Sybase, searchable via Yahoo, in 1998?

Sure, links to Amazon books about Sybase, for way more than $50

