
SQL style guide - okket
http://www.sqlstyle.guide/
======
Stasis5001
Weird, I really dislike a lot of the suggestions. In particular, an example:

    
    
      SELECT first_name AS fn
        FROM staff AS s1
        JOIN students AS s2
          ON s2.mentor_id = s1.staff_num;
    

We already agreed that staff is a good name for a table, so why are we
renaming it to s1? There are all sorts of subtle bugs that arise when
s2.mentor_id = s1.staff_num is wrong, and the variable names provide no help
here that we're doing things right.

How about:

    
    
      SELECT first_name
        FROM staff
        JOIN students
          ON student.staff_id = staff.id;
    

If you're reading some 50+ line SQL query with a bug, things like this are
easy to read and be confident in. Honestly, I think AS should basically never
be used for tables.

Likewise, the guide gives the example of naming join tables 'services' instead
of 'car_mechanics'. But if I see a table 'car', a table 'mechanic', and a
table 'car_mechanic', I instantly know how to join against this in any
direction. The hard part of understanding a schema is how it fits together,
not what it represents, and when you have hundreds of tables and need to
remember that join table's name, it's really, really nice if it's trivial to
derive.

Lastly, the guide's recommended indentation for FROM vs JOIN statements seems
off to me. Consider:

    
    
      FROM riders AS r
             INNER JOIN bikes AS b
             ON r.bike_vin_num = b.vin_num
                AND b.engines > 2
    

Well, the thing is, the riders table and bikes table have the exact same
priority in the query, but this spacing really emphasizes the riders table and
it's not clear why.

~~~
quizotic
> Honestly, I think AS should basically never be used for tables.

I believe AS is required for self-joins. A common expository query is to find
all employees who earn more than their managers, which can be expressed via
"select name from staff as employee join staff as manager on
(employee.manager_id = manager.id) where employee.salary > manager.salary".

Also, SQL allows a sub-select to be listed in the FROM clause. But in such
cases, the sub-select must be given a name (tables have a name, but sub-
selects don't).

As an aside, the keyword "AS" is allowed by not required.

~~~
Stasis5001
Sure, I forgot the self-join case. In my experience these are a minority of
joins, but definitely. In postgres at least, a lot of self-joins can be solved
with a DISTINCT ON instead.

Sub-selection with a FROM clause, yeah, those definitely need names. But those
don't have perfectly good names that everybody already knows, whereas tables
do. If you're maintaining somebody's code and they declare

    
    
      foo_bar = 10

I think it'd be pretty strange to ever consider writing

    
    
      fb = foo_bar

just because you find foo_bar too long to write or something.

------
mattmanser
Try to use only standard SQL?

Stopped reading right there.

There is no way you can port SQL Server to MySQL to Oracle to PLSQL without
rewriting virtually every single query. They're all too different.

So why bother? Write idiomatic code that other programmers used to that engine
will understand.

Not only that but different engines like different things. That blazingly fast
nested subquery in MS SQL will become a massive performance problem in MySQL.

There is no such thing as portable SQL, don't pretend there is.

Not only that, often the standardised way is worse than the vendor specific
way (e.g. INFORMATION_SCHEMAS vs built in sp_ from MS).

~~~
wvenable
I completely agree with this. One should never plan to change database engines
-- it just happens so rarely and it's such a big deal that planning for it is
a waste of time.

It would be like coding in Java in such a way to make porting to C# easier in
the future (or vice-versa).

~~~
berntb
So it isn't true anymore that quickly growing companies often need to go from
one of the open source DBs to Oracle, after a decade or so?

(OK OK, that is a good problem to have.)

Right now I miss Oracle. (Never thought I'd say that. :-) )

~~~
wvenable
It's highly unlikely you could ever use generic enough SQL to go from open
source DB or Oracle. Best case you wouldn't be getting the best performance
out of your open source database and then you wouldn't be getting the best
performance out of Oracle.

It's better to build for the platform you have than build for a mythical
common platform that contains the minimum features of syntax of all of other
DBMS (which probably doesn't exist). If you change DB platforms, you're just
going to have to rebuild.

~~~
berntb
OK, your point is not that DB changes won't happen -- but that there is no use
to prepare and think about it beforehand. Sounds reasonable to me.

~~~
Can_Not
Kind of like premature optimization, "premature compatibility" can be a
performance/labour-hours killer.

------
Pxtl
I never get the fussing with aligning. I get the importance of clean
indenting, but so much SQL I see looks like the writer spent a long time
tweaking the alignment with spaces to make sure the data types lined up
perfectly or the ON clauses or what have you.... And then had do do it all
over again the moment they changed something.

Just indent when scope changes, and don't align things. Alignment is a time-
sink.

~~~
terminalcommand
I always think of the DB, as the last castle of a program. When everything
else fails, the DB should function. Programs are maintained way more than
they're written.

No matter how hard I try to keep my code-base clean, it gets cluttered.

That's why I strive to create a powerful base structure and make them
bulletproof.

Whatever may come (depression, personal problems, fatigue, stress, deadlines,
etc.) this skeleton keeps the app alive.

In my experience code that is written now, will stay and that complete rewrite
I've been dreaming about will never happen :D.

So IMHO it's better to design the core funcionality exhaustively and try to
create a failproof code-base.

My intellectual capacity fluctuates and I'm never as productive as I want to
be. I need a safety rope for my dumber version, who has forgotten about the
code and has no idea where to start :)

Indenting, and aligning helps a lot, when I try to load the code back into my
memory.

~~~
Numberwang
"I need a safety rope for my dumber version"

I like this. 14:00 Numberwang appreciates the effort.

------
dotancohen
Though mostly good advice, I definitely disagree with this:

    
    
      DONT:  
      Plurals—use the more natural collective term where possible instead. For
      example staff instead of employees or people instead of individuals.
    

I like naming my tables as plurals so that foreign keys to the table rows can
have a name that relates to the table name. For instance, having column
`Orders.employee` as an FK to an `Employees` record makes much more sense than
having column `Orders.employee` as an FK to a `Staff` record.

~~~
exceptione
The debate about singular/plural naming for relations comes up frequently. In
my opinion, it stems from the fact that English has only 2 forms for all the
possible grammatical cases. It has only a different form for the singular and
plural cases, and thats what people concentrate on. I propose to take a step
back.

There are quite some languages that have different forms for the nominative,
accussative, dative and genetive case. Lets take Latin and the concept of an
unpaid developer ( _servus_ ).

Servus is nominative singular. Lets make it plural: _servi_.

    
    
      Select from servi...
    
    

oh wait, we have a from so we need an ablative plural here. So we get:

    
    
      select from servis...
    
    

What about updating? Then we need a plural accusative form:

    
    
      update servos...
    
    

But are we updating the whole collection or just a single record? Then we
should probably talk about _servum_. Et cetera ad infinitum.

This _might be a little overstretched_.. Why don't we __just use the normal
form __(the style guide talks about _natural_ ) of a noun? Good question, that
is the singular nominative form. So, why make a special case for plural and
ignore al other grammatical concerns?

~~~
throwaway745234
Ian Mackinnon, is that you?
([http://stackoverflow.com/a/3894235/1163893](http://stackoverflow.com/a/3894235/1163893))

~~~
exceptione
Nope, i am not. But good catch, since his comment on SO made me originally see
the light. :)

------
no_protocol
I have often seen commas placed at the start of each line when columns are
being listed. I'm a little surprised this guide doesn't follow that
convention. It would fit nicely with the established pattern of lining
everything up:

    
    
      SELECT first_name
           , last_name
           , email_address
        FROM users
    

Can anyone speculate why it's not done this way in the guide? I think the
reasoning behind it was to make later modification easier because adding or
removing the last field required modifying two lines.

~~~
Treffynnon
This is a pretty commonly asked question. The answer is two fold:

1\. putting the comma before doesn't actually make commenting out columns
easier - it just moves the problem to the other end of the list. Just try
commenting out `first_name` in your example and you'll see what I mean.

2\. because normal convention would be for the comma to come immediately after
an item in written English.

~~~
r00fus
Re; #1, I also use comma-first indenting and I actually put each field in it's
own line specifically for ease in commenting (first_name would have its own
line)

~~~
Treffynnon
This doesn't solve the problem at hand either unfortunately. You'll still have
a hanging comma in front of `last_name`:

    
    
      SELECT 
          -- first_name
           , last_name
           , email_address
        FROM users
    

When parsed it will look like this query error to the system:

    
    
      SELECT
           , last_name
           , email_address
        FROM users

------
barrkel
If you're using Rails, your life will be much simpler if you do use an 'id'
column, use plurals for table names, and name your join tables 'bars_foos',
i.e. using the plural on both sides and in alphabetical order.

Don't really agree on indenting joins that are at the same level as FROM.
There's not much semantic difference between the table in the FROM clause and
one in a JOIN clause. Subqueries should be definitely be indented when nested.

Don't really agree with the leading whitespace alignment either. Alignment
across lines is generally troublesome when combined with team development and
version control: it leads to merge conflicts. If you start using 'GROUP BY'
that forces you to introduce whitespace in your 'SELECT' line to bring
everything into sync, then chances of a conflict increase quite a bit.

Don't 100% agree on using IN either, in particular if you're using a tuple
membership check. That is, `JOIN x ON (x.a, x.b) IN ((1, 2), (2, 3), (4, 5))`
certainly looks neater than the equivalent mix of ORs wrapped around ANDs, but
it usually performs quite a bit worse because not many people write their SQL
that way and the DB is not normally tuned for it.

A much bigger concern from my perspective (reviewing PRs from other
developers) is writing SQL that will perform well for the database and data
sizes. There are massive perf pitfalls around NOT EXISTS vs NOT IN, and it's
easy to get horrific performance out of MySQL especially if using a subquery
in the WHERE clause.

Structuring things as a join gets much more predictable performance, and
structuring a set of joins as nested subqueries instead of joins at the same
level (or using extra parentheses in Postgres) gives the SQL author more
control over what will be done. Don't trust the optimizer to know best. It
often doesn't and if it chooses wrong, it can hose a release. Measure, as
always.

~~~
Stratoscope
> Don't really agree with the leading whitespace alignment either. Alignment
> across lines is generally troublesome when combined with team development
> and version control: it leads to merge conflicts.

I'm definitely with you there. I've written about the disadvantages of this
kind of column alignment several times on HN, so won't repeat them here.
(Well, if anyone asks, I'll dig up a specific comment or two.)

I have a suggestion that can help anyone cure the temptation to column align
all the things: Try coding in a proportional font for a while.

Contrary to popular wisdom, proportional fonts work fine for most kinds of
programming. The only thing you can't do in them is column alignment, and in
most cases this limitation is a _good_ thing.

Instead of column alignment, you'll be forced to use indentation, and you may
discover that indentation alone gives you enough ways to make your code
structure clear without arbitrary alignment.

In my case, it happened the other way around. I got tired of the problems that
column alignment caused, so I stopped using it and switched to indentation
only. Some time later I was curious and tried a proportional font, and the
code was still just as readable as before. So now I use proportional fonts all
the time, and I'm no longer tempted to use column alignment.

------
wvenable
My SQL style guide is quite a bit different from this:

* Use consistent and descriptive identifiers and names.

* Make judicious use of white space and indentation to make code easier to read.

* Try and use the same column/table naming convention as your programming language. If you language uses camelCase, use that. If it all possible, names should be the same everywhere to prevent confusion.

* Table names should always be in the singular: "Person" instead of "People", "Invoice" instead if "Invoices". This removes all pluralizing ambiguity and makes sense when mapping to class names, etc.

* Concatenate two table names together to create the name of a relationship table (especially in the case of many-to-many) unless a more appropriate name makes sense.

* Use TableNameId as the primary key column name.

\---

* Do not use descriptive prefixes or Hungarian notation such as sp_ or tbl.

* Do not plurals -- don't even use collective names

* Avoid quoted identifiers

* Avoid underscores -- except for use a poor-mans namespacing

* Avoid simply using id as the primary identifier for the table.

* Avoid aliasing as much as possible unless joining the same table multiple times. In that case, use a descriptive name. Never use short one or two character aliases.

* Avoid putting keywords in all caps -- use an editor with syntax highlighting instead.

------
cema
An interesting read! I do follow certain rules when writing SQL, and I agree
that having them and following them is a good idea. Plenty of what is in the
article looks like good advice.

However, these rules do not always appear to be consistent with how the code
in (most) other programming languages is written. Consider indentation, for
example. The usual approach is to line up those elements of the code which
correspond to the same logical level of the flow, whether it is C++,
Javascript, or Lisp. So rather than (quoting from the article)

    
    
      SELECT file_hash
        FROM file_system
       WHERE file_name = '.vimrc'
    

I would rather see either

    
    
      SELECT file_hash
      FROM file_system
      WHERE file_name = '.vimrc'
    

or

    
    
      SELECT file_hash
          FROM file_system
          WHERE file_name = '.vimrc'
    

(The difference here is the same as between indented and non-indented braces
in C.) I think this is especially useful when we have multiple joins or inner
queries.

Many suggested rules are indeed consistent with what I have seen to be
accepted as best practices. Such as naming a table in singular (more
precisely, giving it the name of what a single row corresponds to), or
avoiding the infamous Hungarian notation (pretty much an accepted best
practice in most languages that I have seen).

Overall, I think, a good first step towards building best practices.

~~~
dahdum
It's good advice, but the formatting shown is not conducive to quick editing
imo.

Here's your huckleberry:

    
    
      SELECT
      	fs.id,
      	fs.file_hash 
      FROM
      	file_system fs,
      	other_table ot 
      WHERE
      	file_name = '.vimrc' AND
      	fs.id = ot.file_system_id

~~~
cema
Yes, this is better. BTW, I saw people writing things like

    
    
      fs.id
      ,fs.file_hash
    

and

    
    
      file_name = '.vimrc'
      AND fs.id = ot.file_system_id
    

which has subtle advantages (such as easier editing).

Also, I cannot bring myself to writing SQL keywords in all caps. It feels
ancient. Just a matter of taste.

------
Dowwie
"Where possible avoid simply using id as the primary identifier for the
table."

Has anyone had trouble by using surrogate primary keys? I've found the
opposite of what the author said could be more true: composite keys should be
avoided instead.

~~~
andreareina
I think the author was talking about (not) using `id` as the column name, e.g.
`user.user_id` instead of `user.id`

~~~
webbore
ON second reading I think you are correct. I agree on this point then, as it
helps when joining through a denormalized table that references multiple PK
ids, ex: I think: user_id = role_id is more readable than the likely
alternative: u.id = r.id

~~~
NickNameNick
it also lets you use the join foo using(foo_id) syntax, which I quite like

------
bigger_cheese
Naming conventions are pretty controversial I think.

The databases I deal with are mostly for storing time series data taken from
measurement instruments (i.e Load Cells, Thermocouples, Flow transmitters,
Pressure gauges etc). We seem to use prefixes to distinguish columns i.e.

MS_ (Mass in Tonnes, MS_KG_ etc. Is used If not stored as Tonnes) VL_ (volume,
in cubic meters) RT_FL_ (is Flow rate in normal cubic meters per hour) TP_ (is
temperature in degrees Celsius) PR_ (is gauge pressure in KPa)

I've seem some databases which didn't use a prefix convention and in my
experience it was a real nightmare trying to keep track of units etc.

There is an ISO standard for naming conventions ISO:11179 - My work's
databases are (apparently) configured based on this but I'm sure the DBA's
extended it in a bunch of ways.

~~~
Pxtl
I tend to prefer English suffixes and camelcase rather than Hungarian prefixes
and underscores. Like, for a steel coil, InnerDiameterMM, MassTonnes, GaugeMM,
etc. I prefer to leave abbreviations for cases where it's obvious even to non-
experts, like mm for millimeters.

~~~
bigger_cheese
My only issue with that is you'd wind up with some very long column names.

Using prefixes you can encode a lot of information in a relatively short ID.
Which may or may not be useful. As you mentioned though the downside is it
requires some domain knowledge to decode.

A lot of our columns are in a form like this:

TP_TSC_HEX1_XXXXX

(where XXXX is some ID usually from scada system)

I can see from the name this is a temperature corresponding to thermocouple
id: XXXX in TSC Heat exchanger 1.

Edit: I should also mention some programming languages (not sql) have a
language feature for specifying named ranges (I think it is called lazy
evaluation) which tends to favor prefixed names over suffixed - I.e
x1,x2,...,xn to iterate over the prefixed range. We use SAS heavily at my work
which supports using this syntax, which probably influenced the design of our
DB tables.

------
mmgutz
Still dont get why sql uses all caps for keywords. It's one of the few
languages that has it as a best practice. I can read C/JavaScript/go just fine
without all caps.

I do it anyway to conform.

~~~
VLM
You can ignore the boilerplate if its in CAPS. Quite a few DB engines enforce
as a syntax error a SELECT that isn't in the precise order of FROM WHERE GROUP
HAVING ORDER LIMIT (plus some others depending on your DB). You and I
understand what we mean if we write

LIMIT 10

SELECT *

FROM wtf

but many DB engines will be unhappy because LIMIT has to be after FROM, etc.

Once you're used to the strict order rules, you can skip reading anything in
CAPS as long as the bug you're fixing isn't that they're myspeled or order out
of.

------
VLM
Encoding. Needs a standard for encoding. "We are all UTF-8 here" As a parody
of a bad design: this table is UTF-8 except for that column which is UTF-16 LE
and that column of names is CJK and the street addresses column which is 7bit
ASCII.

If you embed encoding into a design where the data changes like putting png
files into a column named icon_gif, you are deserving of physical punishment.
A better design is column binary_blob_28 should really have a sister column
named binary_blob_28_mimetype. And don't get cute and use silly names for
mimetype, if IANA's never heard of it you probably shouldn't be using it as a
name.

Oh and while you're all unicoding for a good time, standardize that everything
you write is run thru a normalizer as appropriate, or at least "you tried".

Another fun area not covered (or I missed it) is nothing is more hilarious
than a timestamp without associated timezone. So is 1846 a good Central time
to eat dinner, or a good UTC time to eat lunch in the midwest USA, or ... some
DB support a surprising amount of TZ conversion although TZ always turns into
a surprising amount of pain. "We're storing every time as UTC" isn't the worst
idea ever.

Where possible ALWAYS use "id" as the name of the table primary identifier and
its always a bigint... why burn infinite limited brain cell cycles trying to
figure out the prikey of an arbitrary table "hmm well table employee uses
MD5(social_security_or_equiv) which is a 16 byte char as its primary key but
table source_code uses git hashes now is it the full 40 bytes or is 7 byte
prefix good enough (usually is...). If you have to look up every foreign key
or even worse, guess, or worse yet, guess and get it wrong, thats the last
time your prikey won't be a bigint named id. Oh and as a corollary all your
foreign keys will always be named "something_id" type bigint where "something"
is the exact table name, not just close.

For the largest subset of equipment you'll interconnect with, figure out the
best hardware supported hash and use it for non-security (de-duplication, for
example) purposes. That may very well be md5, or maybe the lower 4 to 6 bytes
of md5, or whatever.

Speaking of security the guide didn't go into it, so unless you're conferred
with someone who knows the problem domain plus a little about security, assume
your DB will downloaded next week and published on wikileaks ... don't do
something stupid with sensitive data. If theres no reason to store sensitive
data, then don't store it at all to begin with!

------
Animats
Further suggestions:

\- If a field should never be NULL, put NOT NULL in the field definition. If a
field should be unique, put UNIQUE in the field definition. This improves both
correctness and index performance.

\- A useful format for CREATE TABLE is the one you get back from SHOW CREATE
TABLE.

------
jawns
The suggestion to right-align the root keywords (e.g. SELECT, FROM, WHERE) and
left-align everything else does make the SQL look good.

But I can't think of a single editor that makes this type of alignment
formatting easy to do. Can you?

~~~
rspeele
emacs C-u M-x align-regexp can do this, if you give it a regexp matching the
keywords you wish to right-align and enter -1 when prompted for "group to
modifiy". It would still be a hassle though.

------
dghf
Can anyone explain what this means?

> 1\. The key should be unique to some degree.

If it's talking about primary keys (which it seems to be), it should be
unique, full stop.

If it's not, where does uniqueness come in? Is it actually talking about
indices, which, if I understand correctly, should have a wide spread of values
(without any of them necessarily being unique) if they're to be useful?

~~~
carterehsmith
It is just the terminology.

Technically, a key is any column (or a set of columns) that you use to find
records. It does not need to be unique. But it tends to be more useful if it
is more selective, I guess that's why they say 'to some degree'.

So now, if a key is also unique, it is called a 'candidate key'.

In general, you can have more than one candidate key in a table. For example,
a natural key (such as order#) and a surrogate key (e.g. 'id').

Among candidate keys, you pick one, and call it 'primary key'.

~~~
dghf
> Technically, a key is any column (or a set of columns) that you use to find
> records.

Isn't that an index rather than a key?

I thought 'key' meant:

\- a superkey (any set of columns the values of which must be unique in any
given row; in a properly relational model, any relation will have at least
one, the trivial superkey, the set of all attributes in the relation)

\- a candidate key (a minimal superkey, i.e. one without redundant columns --
if you remove any column from the key, it will cease to be unique)

\- a primary key (the candidate key considered to be and designated as the
most important)

\- a foreign key.

~~~
carterehsmith
I agree, the definition I provided (not mine) sounds more like an 'index' than
a 'key'.

I am not familiar with that usage, I am used to 'key' being used as a logical
concept, a constraint, while 'index' is a physical concept, some structure
that is there 'just' to make shit go faster.

Anyways, some sources don't even define the word 'key', but use it to mean
'unique key'.

And some other sources (like the one by OP) use it to mean something else
(totally not unique), as they recommend that it be 'unique to some degree'

And then some others use the definition I provided above. Is that the one
meant by OP? I don't know.

I any case, I guess OP meant something like "indexes should be highly
selective'. Which BTW is a performance tuning advice, I am not sure that that
belongs in 'SQL Style Guide'.

------
cyberferret
Glad to see I conform to the guide in most places, but one thing I have been
doing for nearly all 35 years of my programming career is to prefix my SQL
views with `vw` so that I can tell immediately in my queries when I am
referencing a view that may be made up of other queries.

~~~
beart
With this style, you lose the ability to transparently switch out a view for a
table.

------
geordee
Now we need a formatter/linter too

~~~
antoineaugusti
I'd love this tool. If you know a great one, please share it

~~~
ZenoArrow
If you're using MS SQL Server, this tool is popular...

[http://www.red-gate.com/products/sql-development/sql-prompt/](http://www.red-
gate.com/products/sql-development/sql-prompt/)

... and this tool appears to have similar refactoring capabilities...

[http://www.ubitsoft.com/products/sqlenlight/index.php](http://www.ubitsoft.com/products/sqlenlight/index.php)

------
insulanian
The part about indentation is a bad practice to follow. Indenting the code
that way is not maintainable. I witnessed that multiple times in my career and
adopted three simple rules:

    
    
        1. Use only one space character between any two words.
        2. Use only one tab (4 spaces) per indentation level.
        3. Use no more than two blank lines to vertically separate parts of the code.
    

These three rules are simple, easy to remember and I'm applying them to any
language I work with.

------
johnnyb9
This is great, agree with many of the suggestions.

The worst SQL for my eyes is the stuff that has multiple columns, etc. on the
same line. Makes diffing stuff 2x as hard and you can't parse as much
information. Though maybe I'm just used to that at this point.

------
buro9
Let's just agree that we will all disagree.

Then let's agree just to accept that there is no ideal, but that there is
benefit in using the same formatting, and that perhaps we should create a
sqlfmt utility like gofmt and all just use that.

~~~
greggyb
Ah, but what RDBMS should that sqlfmt utility target? There are large
differences.

Here's one for TSQL. I find it very useful when spelunking through customers'
almost invariably awful SQL.

[http://poorsql.com/](http://poorsql.com/)

------
lisper
Arrgghhh... yet another set of rules with no rationale or justification.

~~~
Treffynnon
Have you read Celko's book? If not I suggest you do. As the guide states at
the very top that is where you'll find the detail you're after.

------
mrgalaxy
I have only done a little bit of work with SQL. Could someone please go into a
bit more detail on a couple of these points for me?

> Plurals—use the more natural collective term where possible instead. For
> example staff instead of employees or people instead of individuals.

> Where possible avoid simply using id as the primary identifier for the
> table.

~~~
VLM
The first one is just poorly stated but is correct in spirit. Use the most
inclusive and flexible name allowed by the business logic without being overly
generic. Some would say thats the natural term, I guess. The first example of
using "employee" as a term invokes murphy's law that the company will hire
their first contractor or intern the first week the software ships, resulting
in much confusion, wait you've got a column for employees wheres the column
for contractors? The second example guarantees that someone in production will
confuse the individual serving ice cream cup production table with your HR
list of employees if you use the overly generic word "individual". Or maybe
those individuals are sales prospects, not employees.

The second one is simply wrong because it burns brain cells when you come back
to debug or extend something a year later and nobody memorizes the prikey of
table production_quality_results, is it the serial number of the mfgrd object
or the timestamp of the QAQC inspection or the serial number of the inspection
activity or ... and when you look at column names in table 131 is
drivers_license_id a foreign key to a row in the drivers_license table or just
a raw store of data, like this is just where you store it in the system? This
is especially hilarious if your FK and data source are similar bigint type,
like a bigint to connect a FK to a prikey or is your component serial number
literally a bigint itself, assuming the prikey is the data itself without
dereferencing it will be a hilarious bug, "it seems serial number 10 doesn't
exist in our assembly line" "Whoops thats actually row 10 of the production
table, serial number whatevs".

~~~
Treffynnon
I am totally lost when it comes to your comments on the second item. It seems
you're trying to say that surrogate keys are easier to find or perhaps guess.

The indexes set out against a table can generally be accessed with a query
making it very easy to work out what existing index suits your use case best.

------
iblaine
A guide is better than no guide. I like the part about adding suffix strings
to variables. Two things I would changes

* Add commas before the variable * Do not use single letters for aliases. Some versions of hive will throw exceptions. Plus seeing tables a, b, c, d joined together is annoying to read.

------
jbverschoor
I really like his indentation. I'm gonna use it. Most of it has been my style
for a long time.

~~~
webbore
If you're writing a lot of ad-hoc queries, I prefer to have keywords on their
own line and also put commas at the start of the following line vs end of the
previous. It lets you comment stuff out easier as you experiment. In a
production query the only hard rule should be, "Please be consistent!"

~~~
Treffynnon
Putting the comma before doesn't actually make commenting out columns easier -
it just moves the problem to the other end of the list. Just try commenting
out the first column in your SQL and you'll see what I mean.

------
based2
[https://www.reddit.com/r/SQL/comments/3efmn0/sql_style_guide...](https://www.reddit.com/r/SQL/comments/3efmn0/sql_style_guide_a_consistent_code_guide_ensuring/)

------
mjw_byrne
Trouble with this kind of thing is that if you make it general enough to apply
to most situations you can't really say anything (too many exceptions) and if
you get opinionated then your guide doesn't apply to a wide class of use
cases.

Stuff:

1\. Advice like "Use consistent and descriptive identifiers and names" \- is
general programming advice, not SQL style advice, and is so bland as to go
without saying (would anyone seriously recommend being inconsistent or non-
descriptive?)

2\. Preferring standard SQL to vendor-specific SQL just isn't practical for
nontrivial use cases.

3\. "Use / _..._ /-style comments where possible" \- why? This makes
commenting out large blocks of code really fiddly. Using double-dash comments,
even for multiple-line comments, leaves / _..._ / usable for knocking out big
blocks of code.

4\. The advice on table naming - if a table contains one row per xyz, then
calling the table xyzs and calling its id xyz_id is sound practice, arguably
even if this does violence to the natural language in which xyz is a word.
This reduces cognitive load on devs and makes the naming more predictable,
especially bearing in mind that not everyone speaks the same first language.

5\. No idea why "cars_mechanics" is worse than "services". "services" could
mean practically anything.

6\. There's a lot more to be said about table aliasing. Personally I prefer a,
b, c... and then in derived tables, different groups of letters such as x,
y... or p, q... . Stuff which requires an alias which will never be used (e.g.
a derived table which is simply selected from) is aliased by _.

7\. The guide says Hungarian prefixes are to be avoided, but a _num suffix
denotes a number? Doesn't seem consistent.

8\. All the advice on portability is unnecessarily opinionated. Portability is
a concern that varies from completely unimportant to completely necessary,
depending on what you're trying to achieve.

9\. Indentation and layout - there isn't a "one size fits all" rule - it just
comes down to judgement and taste. The given examples also look weird:

    
    
      SELECT r.last_name
        FROM riders AS r
             INNER JOIN bikes AS b
             ON r.bike_vin_num = b.vin_num
                AND b.engines > 2
    

(a) Tables "riders" and "bikes" are siblings in this join so why are they
indented to different levels? (b) The AND isn't symmetric with respect to its
operands.

IMHO this is better:

    
    
      SELECT
        r.last_name
      FROM
        riders AS r
      INNER JOIN
        bikes AS b
      ON
        r.bike_vin_num = b.vin_num
        AND
        b.engines > 2
    

Arguably, "b.engines > 2" should be in a WHERE clause instead of the ON
clause, because it doesn't refer to table r. The query result is the same and
the query planner probably produces an identical plan (depends on engine, of
course), so this is genuinely just a style/semantics point.

10\. "Avoid UNION" \- why??

11\. Avoid vendor-specific data types - this just isn't practical real-world
advice. Sometimes you commit to a certain DB because you need to use its
specific feature set. That's OK.

12\. "Prefer NUMERIC and DECIMAL to REAL and FLOAT because rounding errors are
annoying". This is _completely_ determined by your specific requirements!
Essentially this advice is saying "always prefer arbitrary-precision
arithmetic". 32-bit and 64-bit floats are _much_ faster and more compact than
arbitrary-precision and are accurate enough for a very wide class of use
cases. IMO this advice should be turned on its head: "use floating-point
unless you know you need higher precision and you understand the
performance/storage implications" \- which would be the default approach in
most programming scenarios.

13\. "Avoid table partitioning" \- this is a design/performance concern, not a
style concern, and it depends entirely on use case.

14\. "Avoid EAV, use a different product" \- not real-world advice. Most devs
don't have the luxury of just introducing different products when they run
into an awkward scenario, and even if they did, introducing a whole new DB
engine carries major overhead. Sometimes you just suck it up and have a small
EAV wart in your design.

~~~
Treffynnon
I think a lot of this comes down to what appears to be a simple
misunderstanding.

 _Avoid_ \- in this context means that where it does not make sense (for
performance, readability, etc reasons) then don't follow the guide. Follow it
where possible and be mindful of when you deviate that you're adding to tech
debt.

------
justinlardinois
> If there is already a correlation with the same name then append a number.

And then applies it for this query:

    
    
      SELECT first_name AS fn
        FROM staff AS s1
        JOIN students AS s2
          ON s2.mentor_id = s1.staff_num;
    

I think that's terrible. s1 and s2 sound to me like they're instances of the
_same table_ that you had to JOIN with itself for some reason.

~~~
developer2
Agreed, there is a _lot_ I disagree with from the article, but this is the
only one that my mind lashed out at as simply being WRONG. The only time I
have seen numbers used on aliases in 10+ years is on self-joins. Even on self-
joins I prefer to see something in the alias that identifies why that
particular join is part of the query, but the reality is that such a
descriptive name would often be so long that you're more likely to see a
numeric suffix.

------
HalfwayToDice
Regarding lower-case/hungarian notation etc, the membership table in my
database is:

    
    
      tblMember
      ---------
      MemberID
      MemberUsername
      MemberDateJoined

etc

Is this very non-standard formatting for SQL?

~~~
wvenable
Why prefix your tables with tbl? Do you prefix your functions with "fun" and
your variables with "var" and your classes with "cls"? It's completely
redundant.

~~~
carterehsmith
Some people do it to avoid clashes with reserved words.

Example: in some DBMSs, you cannot name your table 'user' because it is a
reserved word.

"user" (double quotes) works, but there may be gotchas as sometimes double
quotes suddenly make identifiers case-sensitive... but only if the filesystem
is case-sensitive... things like that.

So you can either deal with the gotchas or say fuck it, put a prefix "tbl" and
call it a day.

~~~
wvenable
I'd much rather just name that table AppUser than prefix every single other
table with "tbl". That's just taking one minor annoyance and multiplying it a
thousand times.

------
Numberwang
Anyone use these fellas [] ?

~~~
ZenoArrow
Yes, though generally only if a field or table name happens to be a SQL
keyword.

~~~
philliphaydon
In SQL server? In PostgreSQL you would wrap with double quotes. Which kinda
sucks when your language uses double quotes for strings and you have to
escape.

~~~
ZenoArrow
Yes, in SQL Server.

