Hacker News new | comments | show | ask | jobs | submit login
PostgreSQL Exercises (pgexercises.com)
420 points by pvsukale3 7 months ago | hide | past | web | favorite | 46 comments

Very impressive! Works well, nice UI, informative.

After doing a couple exercises what I don't like is the naming convention. Something like "montlymaintenance" or "recommendedby" are just hard to read and type and I'd much rather use snake-case and/or shorter names. Eg. "monthly_maintenance" or "maintenance_per_month" or just "maintenance" if it's explicit enough that it's always per month.

Also to me using shorthands for ids is a bit hmm hmm I mean sure you write them a lot so it's useful to spare your fingers when typing queries but "facid" just seems wrong. Again maybe underscore? "fac_id" or even "facility_id".

Well that's at least my opinion. Also would it be a better practise to use "timestamptz" instead of "timestamp"? And FYI I had to zoom to 125% because the font was so small on my laptop.

Author here - thanks a lot for the feedback, it's appreciated.

You're right on the naming convention, it's very haphazard. I'll raise an issue on Github to revisit that - although it may take me quite a while to get to it as I have a two week old to look after right now :-).

Kudos on the new budding programmer you got there. ;-)

Thank you!


thanks :)

Same thing, took me a while to understand that facid was indeed facility_id.

In my previous work, we used this to test candidates when their job involved writing SQL. Worked great.

Timestamptz is better practice. Same size (8 bytes) on disk and saves you from headaches further down the line.

This seems as good of place as any to ask novice-level questions about PostgreSQL. I teach students SQLite and one of the most massive pain points in transitioning to PgSQL is how the latter, in the `WHERE` clause, fails to recognize aliases in the `SELECT` clause, e.g.

   SELECT UPPER(name) AS bigname
   FROM people
   WHERE bigname = 'JOE';

Apparently this is the SQL standard, which PgSQL seems to do a better job of following than SQLite, but I'm at a loss to understand why this computation is particularly problematic for PgSQL (or any variant) to adopt? Unlike other SQL standard rules that PgSQL follows that SQLite/MySQL doesn't (such as forbidding the selection of column names that aren't being GROUPed by in a GROUP BY clause), this strictness seems to be all inconvenience.

MSSQL has the same (very frustrating) limitation. The obvious and ugly solution would be to wrap in a subquery (bonus: still std sql compliant, and performs well):

    SELECT bigname
    FROM (
        SELECT UPPER(name) AS bigname
        FROM people
    ) x
    WHERE bigname = 'JOE'
Add columns as necessary, it should work fine.

But this method can make large queries with lots of aliasing very difficult to read and lead to deep nesting. There is an alternative in MSSQL and (via a quick search) pgsql as well to keep the crazy nesting to a minimum:

    -- MSSQL
    SELECT bigname
    FROM people
    CROSS APPLY (SELECT UPPER(name) as bigname) x
    WHERE bigname = 'JOE'

    -- PGSQL (not 100% sure of syntax)
    SELECT bigname
    FROM people
    LEFT JOIN LATERAL (SELECT UPPER(name) as bigname) x on true
    WHERE bigname = 'JOE'
The problem is this might have a negative impact on performance because it mucks with the query planner. I've seen everything from huge, complex, CROSS APPLYs have literally zero performance penalty to completely benign single-column renames tanking query performance 10x, so beware if you use this.

It's because the WHERE clause is evaluated before the SELECT. The reason that SQLite allows you to do that is because SQLite is much, much simpler than PgSQL, MySQL and SQL Server (all of which don't allow you to do this).


Yes. I wish SQL select-statements put the select-clause last:

   from people
   where name = 'Joe'
   select upper(name) as bigname
1. It would mirror the other clauses (insert, update, and delete) which all begin with the table name.

2. It would be easier on my head. I skip the select-clause anyway when reading SQL, then come back up to it at the end, when I know the columns' source.

3. It would show that using column aliases in the where-clause would fail (unless the database did some kind of two-pass processing).

You just described LINQ

You would think that an initial pass of SELECT could be made only to note any aliases. It would be a much appreciated nod to usability.

That seems straightforward enough, don't know why I assumed SQLite's implementation of it was a feature as opposed to a consequence of its overall simplified model. I think I assumed/hoped that aliasing worked as a kind of function definition which WHERE could magically understand and evaluate.

As a counter point teradata (a ridiculously expensive closed source but fairly complex database) supports this feature. So it is definitely possible for this feature to be supported.

in SQLSERVER you could build your selects (using unique names or aliases) and then wrap it all in a common table expression like so:

; with cte as ( select a as A, b as B, c as C from dbo.table1 tbl1 join dbo.table2 tbl2 on tbl2.id = tbl1.id) select A, B, C from cte

I’m always saddened when I see this feature lacking in so many databases. Especially as Teradata supports this feature so i get to use it at my day job but not on any open source projects.

Cool.SQL Zoo http://sqlzoo.net/ is also worth recommending for beginners.

Neat - thanks for sharing!

In the SQL MOOC space, you can also check out https://academy.vertabelo.com.

If other people have resources, please share them out, I think this is an area where spreading the knowledge is particularly useful to most developers.

Great idea, does what it should and can be extended to cover a lot more. Keep up the great work.

First time looking at it I thought I was on a official Ubuntu site, what a font and color can do to a branding is fascinating.

Well, it is using the Ubuntu font! https://pgexercises.com/css/site.css

Yeah, I have zero design sense and just went with the defaults of a bootswatch theme that I liked :-)

Yes, I know, that's what I am talking about. ;-)

That's exactly what good branding looks like!

Author here: out of curiosity, what would be your priorities for future exercises? Right now my list looks roughly like:

    - DDL
    - Advanced datatypes (JSON, arrays, etc)
    - GIS
This order is partly informed by the current support of the site: adding DDL exercises is a relatively minor bit of technical work, and then it's just writing. Everything else involves schema changes (or, possibly, a second database).

First, I appreciate the work you've done. Great stuff. Thanks.

Second, I don't imagine these topics fit well with the exercise based course material, but they are of interest to me:

Window Functions, Logic in the database vs in the code, Serialization, Normalization vs Denormalization, Object Relational Mapping, Concurrency & Transactions, Triggers

Glad you've found it useful! FWIW window functions are already covered in the Aggregation section.

The rest of it, yeah, as you say it's not so great a fit.

The order sounds reasonable - IMHO it correlates with the number of users who need that area of knowledge.

Did a few exercises and seems nice. An improvement would be if hints included links to PostgreSQL documentation, would make it easier for beginners to find good information.

Zipcode is an integer? What about zips in New Jersey?

grins for that I'll have to say that I'm from the UK and that situating the exercises in the US was probably an error :-)

As a general rule of thumb, if something isn't used for calculations, it probably shouldn't be a number.

That wouldn't generally be my approach. In my view if something is a number it should be typed as such.

That said, I recognise that there are pros and cons to this approach similar to those in strongly vs weakly typed PLs. My personal preference is for a strong schema.

There is a hidden assumption in your phrase "if something is a number". You imply that a zipcode is "obviously" a number. But it's not obvious. A zipcode is obviously a sequence of digits, but not everything that we represent as a sequence of digits is meaningfully treated as a number.

Leading zeros are one sign that zip codes are just digit strings, not numbers. Another (silly) example: if you REALLY have a number, then it doesn't matter what base you write it in. The number four can be written as 4 in base ten, or as 100 in base two. If you write a zip code in anything other than base ten, it's not really a zip code anymore.

Likewise, you can't add zip codes, nor subtract them, nor even meaningfully say that 11229 is more than 11228. You might as well use the letters A through J instead of the digits zero through nine, and suffer no serious inconvenience.

And indeed, some countries use something like zip codes, except that they have letters and numbers, yet I would hesitate to say that Canadian zip codes are so fundamentally different from US zip codes that they deserve to be stored under a different data type.

In conclusion, even you prefer a strong schema, a case can be made that zip codes should not be integers. Of course, that leaves open the question of what they should be, since most SQL databases have no built-in data type for digit sequences or anything similar. I have no answer to this and would personally just use integers anyway.

Ah, I should have been clearer in my initial response. I actually have no opinion on whether zip codes are a number, as I'm from the UK and don't really know how they work :-). I was more responding to the idea that any numeric value you don't do calculations on should be stored as a bare string. I would be happy enough with a string that had a constraint limiting the string to characters [0-9], though (again, assuming the data in question is limited to those chars).

On the whole, pgexercises punts on the data modelling aspect, as it's focused on teaching SQL instead - in some cases choosing deliberately bad schema design to make it easier to ask interesting questions. On reflection I'm not sure this was a good approach, but I'm far past the point where I have time/inclination to revisit it.

Ah; makes sense

What's the issue with making them char(5) (or perhaps char(9) depending if you want to do zip+4)?

That's what I've always done..

it's not 'a case can be made' - it's strictly incorrect to store zip codes as integers

So you'd create a different table for international addresses, or have 2 zip code fields or ...?

Zip codes aren't numbers, they are numeric strings.

You can store a NJ zip in an integer; you just have to be careful when printing it. Annoying, but if space is scarce, maybe worth it. (Though if space is too scarce for zipcodes, good luck storing anything else more interesting.)

A suggestion: repeat the table names on every exercise in the description. I had been going trough them casually - and always forgot what the table name was and had to go back or press the "show answer" button.

i was just googling for these yesterday! this is great

Very good site. Just a question here, is there any book or tutorial which teach (like the site do) index's, triggers, function etc?

What a great tool/site. Thanks for sharing!

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact