SQL Style Guide

deburo · on Sept 6, 2018

Familiarity makes consistency the only real guide in any formatting guide. So, apart from formatting rules, there are still a few bits to argue about.

Try to use only standard SQL functions instead of vendor specific functions for reasons of portability.

I don't think this is too important as the many places I've worked at had been using the same vendor for many years. Depends on your future plans, I guess.

Include comments in SQL code where necessary. Use the C style opening closing where possible otherwise precede comments with -- and finish them with a new line.

I prefer always using -- so that adding a new line of comment only consist of prefixing it with -- instead of going on the preceding line, backspacing a bit and then starting your new line.

Do not prefix with tbl or sp_ [...]

More-so because in SQL Server, sp_ means special and not stored procedure.

Avoid, where possible, concatenating two table names together to create the name of a relationship table. Rather than cars_mechanics prefer services.

I don't think this is a good example. This table is there to support the many-to-many relationship, ye? The name "Services" doesn't really convey that.

Object oriented design principles do not effectively translate to relational database designs—avoid this pitfall.

This is too generic and deserve additional details and examples.

mdaniel · on Sept 6, 2018

> I don't think this is too important as the many places I've worked at had been using the same vendor for many years. Depends on your future plans, I guess.

Just as a "for your consideration," using the standard ones also increases the likelihood of a new person joining the team who has also had experience with them and need not context switch to look up whatever vendorism there is. A super common case would be all the variants of `substr`: 0-indexed, 1-indexed, start+length or start+end, that kind of jazz.

I deeply appreciate that a new person is only new for a little while, then they _have_ said context. I just wanted to point out that the mythical "database vendor portability" isn't the only reason to use a standard vocabulary/syntax.

pwaivers · on Sept 6, 2018

> Avoid: Plurals—use the more natural collective term where possible instead. For example "staff" instead of "employees" or "people" instead of "individuals".

What if a word doesn't have a collective term, or the term isn't very good? For example, what would you do for "computers"?

Additionally, does this mean that "employee" is better than "employees"?

paulddraper · on Sept 7, 2018

I always use singular.

But I don't really know if there is a convention.

ANSI information_schema uses plurals. https://en.m.wikipedia.org/wiki/Information_schema

cissou · on Sept 6, 2018

"hardware"

Bjartr · on Sept 6, 2018

Boxen

ScoutOrgo · on Sept 6, 2018

> Joins should be indented to the other side of the river and grouped with a new line where necessary

Don't agree here. Having all tables down the left makes it very easy to quickly scan all the tables that are being joined. With indents following the joined table, you can then quickly see and map in your head how all the tables are joining as well. This is how I would format the code in the Joins section:

    SELECT r.last_name
    FROM riders AS r
    INNER JOIN bikes AS b
        ON r.bike_vin_num = b.vin_num
        AND b.engine_tally > 2
    INNER JOIN crew AS c
        ON r.crew_chief_last_name = c.last_name
        AND c.chief = 'Y';

jXCw1N0jtH3 · on Sept 6, 2018

Yeah I like that. Personally I don't indent the join conditions either because of syntax highlighting, but at least yours is readable. The indentation in the "joins" and "subqueries" sections of the style guide are fucking atrocities. Who writes SQL like that?

ainar-g · on Sept 6, 2018

>Where possible avoid simply using "id" as the primary identifier for the table.

I wonder what is the reasoning behind this? To escape the ambiguity during joins?

>Designs to avoid

>EAV (Entity Attribute Value) tables—use a specialist product intended for handling such schema-less data instead.

What if you have to use the data in some complex join-heavy query? Make a copy of the data needed in the relational database and hope that the data doesn't get unsynchronised?

Slackwise · on Sept 6, 2018

>> Where possible avoid simply using "id" as the primary identifier for the table.

> I wonder what is the reasoning behind this? To escape the ambiguity during joins?

Most justifications center around the "USING" clause which can only be used on matching column names:

    select user_id,
           username,
           comment

    from   users
           inner join
           comments     using (user_id);

Versus `on user.id = comments.user_id`.

The argument against this is that the USING clause really mucks up the result set columns by unifying those that match the USING clause.

mhd · on Sept 6, 2018

Most styleguides really do benefit from rationales. Just stating something ex cathedra sounds rather patronizing.

Basic example would be the mandatory capitalisation of keywords.

Or why collective nouns instead of plurals? Why file_system and not files? The latter would serve better to avoid the common "file_" prefix of the tables, never mind that "file system" is ambiguous.

As for EAV, the first question isn't "what other product should I use for this" but whether stuff is really schema-less in the first place. For some kind of reason, people are really scared of defining new tables and views or extending existing ones… (Only place where I personally ever had problems with that was with a very enterprisey structure and strict DBAs. ITIL requests for every DDL)

ainar-g · on Sept 6, 2018

>Most styleguides really do benefit from rationales. Just stating something ex cathedra sounds rather patronizing.

The author kind of explained their justification in their blog post[1]:

>It beggars belief how many readers claim to have read the guide and yet I still get messages asking for justifications. At the very top, the introductory paragraph clearly states that Celko’s book is the place to find in-depth discussion of each point. Discussions of justification have no place in a style guide anyway.

Not sure I agree with this last point.

[1] https://www.simonholywell.com/post/2016/12/sql-style-guide-m...

amp108 · on Sept 6, 2018

>> EAV (Entity Attribute Value) tables—use a specialist product intended for handling such schema-less data instead.

> What if you have to use the data in some complex join-heavy query? Make a copy of the data needed in the relational database and hope that the data doesn't get unsynchronised?

No, he's saying that schema-less, arbitrarily-extended information has no business in a relational system at all.

polaritron · on Sept 7, 2018

Not only should one use ISO-8601 compliant data types, but also set the time zone and server time according to UTC time.

accnumnplus1 · on Sept 6, 2018

Do we really still need the caps? I thought that was left over from monochrome days, but I don't use them on the cli either.

mfonda · on Sept 6, 2018

As with any style guide, many things are subjective, and the only correct answer is the one that your team applies consistently, whatever that may be.

Overall, these look like a great set of standards to follow. A few items that I think deserve additional attention:

> Use a collective name or, less ideally, a plural form. For example (in order of preference) staff and employees.

I fully agree with using plural names. Plural makes a lot of sense IMO because tables contain multiple things. An employees table contains many employees.

However, I disagree with using the collective name. There is too much ambiguity: the collective name doesn't always exist, doesn't always mean the exact same thing, and there are often multiple options for the collective term. Simply converting it to plural by appending an s (if possible) makes things more consistent and easier to quickly see, and can furthermore help with things like automatically naming tables / deducing table names.

Often times code exists that represents a single row in a table (e.g. we might have an Employee class that represents a single employee). Singular here makes sense to me because it represents a single row. To then find the table corresponding to this class (or vis-versa), it's simply a matter of pluralizing. If our table was instead "staff", this would be harder to see at a glance.

I'm curious why they advocate collective names--having trouble thinking of any advantages to this. Does anyone have any arguments in favor of collective names?

> Where possible avoid simply using id as the primary identifier for the table.

This is one of the biggest things you can do for SQL readablitiy IMHO. It's so much easier to quickly see what's going on when columns aren't simply named "id". If I see a column called "userID", I can quickly figure out that it's a userID with much less context than I would need if all I saw was ID.

One additional rule I would add, and this is perhaps a bit controversial: always declare columns as NOT NULL if possible, using a sane "empty default value" such as an empty string or 0. If I wanted to find all users without a name, it's much easier to do "SELECT * FROM users WHERE name = ''". This won't match NULLs--if we don't declare NOT NULL, then we must do "SELECT * FROM users WHERE name='' OR NAME IS NULL". Easy to forget this--better to have the extra safety of NOT NULL to prevent mistakes like this. A good rule of thumb IMO is "declare columns as NOT NULL unless you specifically need to allow null".