
SQL Style Guide - ainar-g
https://www.sqlstyle.guide/
======
deburo
Familiarity makes consistency the only real guide in any formatting guide. So,
apart from formatting rules, there are still a few bits to argue about.

 _Try to use only standard SQL functions instead of vendor specific functions
for reasons of portability._

I don't think this is too important as the many places I've worked at had been
using the same vendor for many years. Depends on your future plans, I guess.

 _Include comments in SQL code where necessary. Use the C style opening
closing where possible otherwise precede comments with -- and finish them with
a new line._

I prefer always using -- so that adding a new line of comment only consist of
prefixing it with -- instead of going on the preceding line, backspacing a bit
and then starting your new line.

 _Do not prefix with tbl or sp_ [...]_

More-so because in SQL Server, sp_ means special and not stored procedure.

 _Avoid, where possible, concatenating two table names together to create the
name of a relationship table. Rather than cars_mechanics prefer services._

I don't think this is a good example. This table is there to support the many-
to-many relationship, ye? The name "Services" doesn't really convey that.

 _Object oriented design principles do not effectively translate to relational
database designs—avoid this pitfall._

This is too generic and deserve additional details and examples.

~~~
mdaniel
> I don't think this is too important as the many places I've worked at had
> been using the same vendor for many years. Depends on your future plans, I
> guess.

Just as a "for your consideration," using the standard ones also increases the
likelihood of a new person joining the team who has also had experience with
them and need not context switch to look up whatever vendorism there is. A
super common case would be all the variants of `substr`: 0-indexed, 1-indexed,
start+length or start+end, that kind of jazz.

I deeply appreciate that a new person is only new for a little while, then
they _have_ said context. I just wanted to point out that the mythical
"database vendor portability" isn't the only reason to use a standard
vocabulary/syntax.

------
pwaivers
> _Avoid: Plurals—use the more natural collective term where possible instead.
> For example "staff" instead of "employees" or "people" instead of
> "individuals"._

What if a word doesn't have a collective term, or the term isn't very good?
For example, what would you do for "computers"?

Additionally, does this mean that "employee" is better than "employees"?

~~~
paulddraper
I always use singular.

But I don't really know if there is a convention.

ANSI information_schema uses plurals.
[https://en.m.wikipedia.org/wiki/Information_schema](https://en.m.wikipedia.org/wiki/Information_schema)

------
ScoutOrgo
> Joins should be indented to the other side of the river and grouped with a
> new line where necessary

Don't agree here. Having all tables down the left makes it very easy to
quickly scan all the tables that are being joined. With indents following the
joined table, you can then quickly see and map in your head how all the tables
are joining as well. This is how I would format the code in the _Joins_
section:

    
    
        SELECT r.last_name
        FROM riders AS r
        INNER JOIN bikes AS b
            ON r.bike_vin_num = b.vin_num
            AND b.engine_tally > 2
        INNER JOIN crew AS c
            ON r.crew_chief_last_name = c.last_name
            AND c.chief = 'Y';

~~~
jXCw1N0jtH3
Yeah I like that. Personally I don't indent the join conditions either because
of syntax highlighting, but at least yours is readable. The indentation in the
"joins" and "subqueries" sections of the style guide are fucking atrocities.
Who writes SQL like that?

------
ainar-g
>Where possible avoid simply using "id" as the primary identifier for the
table.

I wonder what is the reasoning behind this? To escape the ambiguity during
joins?

>Designs to avoid

>EAV (Entity Attribute Value) tables—use a specialist product intended for
handling such schema-less data instead.

What if you have to use the data in some complex join-heavy query? Make a copy
of the data needed in the relational database and _hope_ that the data doesn't
get unsynchronised?

~~~
mhd
Most styleguides really do benefit from rationales. Just stating something _ex
cathedra_ sounds rather patronizing.

Basic example would be the mandatory capitalisation of keywords.

Or why collective nouns instead of plurals? Why file_system and not files? The
latter would serve better to avoid the common "file_" prefix of the tables,
never mind that "file system" is ambiguous.

As for EAV, the first question isn't "what other product should I use for
this" but whether stuff is really schema-less in the first place. For some
kind of reason, people are really scared of defining new tables and views or
extending existing ones… (Only place where I personally ever had problems with
that was with a very enterprisey structure and strict DBAs. ITIL requests for
every DDL)

~~~
ainar-g
>Most styleguides really do benefit from rationales. Just stating something
_ex cathedra_ sounds rather patronizing.

The author kind of explained their justification in their blog post[1]:

>It beggars belief how many readers claim to have read the guide and yet I
still get messages asking for justifications. At the very top, the
introductory paragraph clearly states that Celko’s book is the place to find
in-depth discussion of each point. Discussions of justification have no place
in a style guide anyway.

Not sure I agree with this last point.

[1] [https://www.simonholywell.com/post/2016/12/sql-style-
guide-m...](https://www.simonholywell.com/post/2016/12/sql-style-guide-
misconceptions/#where-is-the-justification)

------
polaritron
Not only should one use ISO-8601 compliant data types, but also set the time
zone and server time according to UTC time.

------
accnumnplus1
Do we really still need the caps? I thought that was left over from monochrome
days, but I don't use them on the cli either.

------
mfonda
As with any style guide, many things are subjective, and the only correct
answer is the one that your team applies consistently, whatever that may be.

Overall, these look like a great set of standards to follow. A few items that
I think deserve additional attention:

> Use a collective name or, less ideally, a plural form. For example (in order
> of preference) staff and employees.

I fully agree with using plural names. Plural makes a lot of sense IMO because
tables contain multiple things. An employees table contains many employees.

However, I disagree with using the collective name. There is too much
ambiguity: the collective name doesn't always exist, doesn't always mean the
exact same thing, and there are often multiple options for the collective
term. Simply converting it to plural by appending an s (if possible) makes
things more consistent and easier to quickly see, and can furthermore help
with things like automatically naming tables / deducing table names.

Often times code exists that represents a single row in a table (e.g. we might
have an Employee class that represents a single employee). Singular here makes
sense to me because it represents a single row. To then find the table
corresponding to this class (or vis-versa), it's simply a matter of
pluralizing. If our table was instead "staff", this would be harder to see at
a glance.

I'm curious why they advocate collective names--having trouble thinking of any
advantages to this. Does anyone have any arguments in favor of collective
names?

> Where possible avoid simply using id as the primary identifier for the
> table.

This is one of the biggest things you can do for SQL readablitiy IMHO. It's so
much easier to quickly see what's going on when columns aren't simply named
"id". If I see a column called "userID", I can quickly figure out that it's a
userID with much less context than I would need if all I saw was ID.

One additional rule I would add, and this is perhaps a bit controversial:
always declare columns as NOT NULL if possible, using a sane "empty default
value" such as an empty string or 0. If I wanted to find all users without a
name, it's much easier to do "SELECT * FROM users WHERE name = ''". This won't
match NULLs--if we don't declare NOT NULL, then we must do "SELECT * FROM
users WHERE name='' OR NAME IS NULL". Easy to forget this--better to have the
extra safety of NOT NULL to prevent mistakes like this. A good rule of thumb
IMO is "declare columns as NOT NULL unless you specifically need to allow
null".

