
How I Write SQL, Part 1: Naming Conventions (2014) - sehrope
https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/
======
meritt
I personally prefer _person_id_ to be the primary key name (instead of _id_ )
in both the person table and any table which has it as a foreign key. One
reason is for join syntax:

    
    
        select * from person join team_member using (person_id)
    

The other reason is _person_id_ now unambiguously refers to the same field
regardless if we're looking at the PK or a FK. It's always _person_id_.

~~~
jermaustin1
I use T-SQL, and this is actually why I always join with the table name:

SELECT * FROM Person JOIN TeamMember on PersonId = Person.Id

~~~
mtone
I'd bring it a little further and would write:

SELECT * FROM Person as P INNER JOIN TeamMember as TM on TM.PersonId = P.Id

I have:

\- Aliased each table and prefixed every field names with their table alias in
my join conditions.

\- Explicited the JOIN type.

The above:

\- Reduces mistakes due to ambiguities that tend to generate unwanted
duplicates rows in SQL.

\- Increases the likelihood of getting an error at parse time, instead of run-
time or analysis-time, thanks to added scoping.

\- Works in any schema, no matter what naming conventions are followed.

\- Keeps working as the query becomes more complex with multiples table
aliases or self-joins, and similar field names appearing in the set.

\- Better expresses intent. Sure JOIN defaults to INNER JOIN, but writing
"INNER JOIN" shows that you genuinely expect any row not matching your
condition to be removed from the result set.

~~~
pbreit
As a newbie, this makes more sense to me:

    
    
      SELECT * FROM person, team WHERE person.team_id=team.id
    

(no unnecessary aliasing, no weird JOIN phrasing)

~~~
mtone
I can understand not using aliasing in a simple query, but I'd recommend
against using these non-ANSI joins (deprecated syntax). What you're writing
can be interpreted as either:

SELECT * from person CROSS JOIN Team WHERE person.team_id=team.id

SELECT * from person INNER JOIN Team ON person.team_id=team.id

That they happen produce the same result in a query is practically just luck.
Changing the join from INNER to LEFT OUTER is also much easier than managing
(+)'s in the WHERE clause, once you're used to it.

~~~
kbenson
This is good advice, IMO. You _will_ have to use left/right joins at some
point, and having the join type at the join location is useful, but having the
join conditions at the join location is _immensely_ useful.

It's harder to see in your example, likely because you tried to keep it
similar to the example presented, but once there's more than a couple joins,
having the join conditions close to the join is essential for keeping track of
what's going on. e.g.

    
    
      SELECT * FROM person, team, role, group, person AS lead
      WHERE person.team_id=team.id
        AND person.role_id=role.id
        AND person.group_id=group.id
        AND group.lead_id=lead.id
    

Compared to:

    
    
      SELECT * from person
        INNER JOIN team ON person.team.id=team.id
        INNER JOIN role ON person.role_id=role.id
        INNER JOIN group ON person.group_id=group.id
        INNER JOIN person AS lead ON group.lead_id=lead.id
    

And let's be clear, we know these should be left joins, because the chance
some person doesn't have a team, role, group, or a group ends up without a
lead is high when sampled over time. And simulating left joins with the non-
ANSI joins quickly gets unwieldy.

------
elchief
Joe Celko, as well as ISO-11179, tell us to use collective names ("personnel")
or plural names ("employees") for tables

As well, fewer keywords are plural, compared to singular, so there's less
chance of accidentally using a keyword if you use plurals

Haven't yet seen an "octopus" table in production...

~~~
danso
I did a Google search for elaboration on this and apparently there is
disagreement that ISO-11179 says this at all:

[https://social.msdn.microsoft.com/Forums/vstudio/en-
US/d5f2f...](https://social.msdn.microsoft.com/Forums/vstudio/en-
US/d5f2faf3-3c85-413d-bc09-ce1d477b31e8/iso-111795-and-table-
names?forum=transactsql)

> _Yes, this is the same version as I found, but the closest thing I could
> find to addressing table names in the paper itself was an "Object Class
> name", something like an OOP Class or something you'd find in a UML diagram,
> but not really the same as a table name, and in any case all the examples
> were singular._

> _Was actually kinda hoping Celko would deign to comment on this himself as
> he seems to be the chief proponent of the "collective identifiers as
> specified by ISO 11179" meme._

~~~
elchief
"To remind users that tables are sets of entities, ISO-11179 Standard likes to
use collective or plural nouns that describe the set of those entities for the
names of tables. Thus 'Employee' is a bad name because it is singular"

Page 10, SQL For Smarties (Celko), 5th Ed

If Celko says it's right, it's right

~~~
danso
OK, but the ISO-11179 Standard doesn't seem to actually say that. Here's a
purported copy of Part 5 -- Naming and identification principles -- of the
standard (as linked to from the aforementioned MSDN forum thread:

[https://www.ftb.ca.gov/aboutFTB/Projects/ITSP/Part_5_Naming_...](https://www.ftb.ca.gov/aboutFTB/Projects/ITSP/Part_5_Naming_Principals.pdf)

The word "collective" doesn't show up in the document.

The word "plural" shows up twice, both times in item #a of "Lexical rules":

> _a) Nouns are used in singular form only. Verbs (if any) are in the present
> tense._

> _NOTE In Japanese, this rule shall not be applied because of no plural form
> of nouns and no distinction of verb tense._

The only reference on Wikipedia to the ISO 11179 standard making that
recommendation has since been deleted for being unsubstantiated:

[https://en.wikipedia.org/w/index.php?title=Data_element_name...](https://en.wikipedia.org/w/index.php?title=Data_element_name&type=revision&diff=677768128&oldid=544105362)

edit: The apparently inaccurate content is talked about on the "Data element
name" Talk page:
[https://en.wikipedia.org/wiki/Talk:Data_element_name](https://en.wikipedia.org/wiki/Talk:Data_element_name)

------
daigoba66
Also important is adapting to the existing naming conventions of the database,
even if you don't like it. (Unless the existing naming conventions cause more
trouble that its worth like requiring quoted identifiers or redundant
prefixes/suffixes).

~~~
ysleepy
I agree, consistency is worth a lot more than using a slightly better
convention. It would also create surprises in the sense of "principle of least
surprise" which is in general a good guideline while designing APIs schemas
and so on.

------
daphneokeefe
For naming stored procedures, there were a lot of helpful answers to my
question "What is your naming convention for stored procedures?" on
StackOverflow a few years ago.
[https://stackoverflow.com/questions/238267/what-is-your-
nami...](https://stackoverflow.com/questions/238267/what-is-your-naming-
convention-for-stored-procedures)

------
0xffff2
Without a rigorous attempt at justifying each of these rules, I don't find
this article particularly useful. For example, can someone link to or provide
a formal explanation for why table names should be singular? I actually really
wanted to read the full relational algebra rational for that one.

~~~
cwbrandsma
My own view is they should either be all plural or all singular. Just pick
one. But I do gravitate to singular because all nouns have naturally
consistent singular words. The same cannot be said of plural. e.g. Moose,
Cactus, and any other word that ends in an 's'.

~~~
wvenable
I used to do all plural but then I read a good argument online for doing
singular so I switched to that for the next project and I've done singular
ever since.

The few advantages of singular:

1\. It's not always clear what the plural of something should be.

2\. Chances are the singular maps better to your application layer (Person
class <=> Person table).

------
jcadam
I know a reasonable amount of SQL but am by no means a database guru.
Naturally, this makes me the local database expert at my current place of
employment :/

The other devs' eyes glaze over when I say things like 'stored procedures' and
'trigger functions.' Bah.

------
wvenable
I don't understand this one:

> Mixed case identifier names means that every usage of the identifier will
> need to be quoted in double quotes

I've used quite a few RDBMS engines, including most mentioned by the author,
and I've never had to quote mixed-case identifier names. They work just the
same as all lower-cased names or as any other case-sensitive language.

Most of the programming languages I use typically have the convention of using
PascalCase for classes and public fields/properties so I prefer to use that
convention for tables and columns (and then everything else for consistency).
When doing operations between the application and the database, the name is
exactly the same without the need for translation.

Otherwise, I think it's a good list.

~~~
stubish
If you are not quoting your mixed-case identifiers, then they are not mixed-
case. They are being implicitly converted to either lowercase or uppercase
depending on your database, and your capitalization is lost. It normally
doesn't matter, until you are introspecting your database schema, at which
point your code generation generates a Organizationrole protobuf message
instead of an OrganizationRole, or your ORM fails to find OrganizationRole
because PostgreSQL stored it as organizationrole.

~~~
wvenable
This seems like a terrible feature of Postgres and Oracle (which apparently
works the same but oppositely). All other database engines I've used have
retained the case of unquoted identifiers.

I haven't used Postgres enough to notice this, it's almost a deal breaker.

I might be tempted to mandate that all identifiers be quoted than deal with
half the possible characters for names. Although more likely all code-
generation would happen on the application side with DB migrations so the
database wouldn't the source of truth for identifier names, anyway.

------
walshemj
How come I have _never_ in a multi decade career come across "i18n".

I though the canonical way of doing this was to write KEYWORDS in caps and use
camel case for Variables.

Also never really brought into adding the type as part of a name - your type
is already defined in your schema.

~~~
cryptonector
Really? I have. In some circles (e.g., the IETF), i18n is an ancient acronym.
People who've worked on operating systems (e.g., OS X, Solaris, RHEL,
whatever) have to deal with L10N (localization). G11N (globalization) is I18N
+ L10N.

And then there's a11y: accessibility. This is all about making user interfaces
accessible for people with low or no vision, low or no hearing, difficulty
typing, and so on.

There are generally applicable laws requiring G11N and A11Y, and these fall
heavily on OS vendors, which is why people who've worked on OSes tend to know
these acronyms.

I18N -> dealing with Unicode in general, codeset conversions, font issues, ...

L10N -> dealing with translating system/application messages to the users'
preferred languages (and how to even know they preferences) (think locales)

G11N -> I18N and L10N.

Localization is damned difficult. There's all sort of little bothersome
things, like how to format numbers (which varies quite a lot) and dates (can't
we all just use ISO-8601?!). And translating printf-like format strings is
often non-trivial, especially when the coder doesn't stop to think about just
how hard they might be to a translator as they write their code.

~~~
dvh
V10N --> Velociraptor

T15X --> Tyrannosaurus Rex

D11S --> Dilophosaurus

B11S --> Brachiosaurus

T9S --> Triceratops

S9S --> Stegosaurus

~~~
ysleepy
V10R you mean.

~~~
dvh
Of course

------
sytelus
Lot of these is debatable. For example, I have preferred FirstName or even
“[First Name]” instead of first_name in sql because lot of tooling uses these
names to generate UX. Similarly using Person.PersonID instead of Person.ID
gives consistency in diagrams and foreign key naming. I have used both
approaches with its own pro and cons.

~~~
sbov
> Similarly using Person.PersonID instead of Person.ID gives consistency in
> diagrams and foreign key naming.

I assume you mean you just use PersonID as the foreign key. This oftentimes
introduces ambiguity into what the relationship actually is. I prefer names
that describe the actual relationship (e.g. author, owner, approver, etc)
rather than letting other people guess what it is.

~~~
always_good
I usually want to reserve the noun like "author" as the embedded record after
a join. That way "author_id" is always the key and then "author" is the
json_agg joined object that embeds the whole record.

Otherwise you're actually introducing ambiguity imo.

------
barrkel
You should follow the conventions that make life easier in the rest of your
tooling.

The fact is, you're probably going to be issuing more SQL via abstractions
like ORMs or querying libraries than raw SQL. If you need to work against the
grain of those libraries to map your model, what upside are you getting?

If most of your data is queried via ActiveRecord, for example, you should use
plural table names.

~~~
btilly
It is true that most of my SQL goes through ORMs. However my most complicated
SQL is always constructed by hand. Furthermore note the point about how often
applications get rewritten against the same database. You should not assume
that future code will use the same ORM that you are using now.

~~~
rockostrich
It depends on the situation. If the queries you're writing deal are heavily
integrated with application models and logic, then using the ORM is probably
the way to go (especially with the ORM does some client side caching and other
optimizations). Of course, if you start to see that the ORM's queries
underperform compared to raw SQL then you should check to see what SQL the ORM
is spitting out. I've seen SQLAlchemy create some very poor queries compared
to what would be expected and ended up writing parts of those queries in raw
SQL, but those cases are pretty uncommon when most of the logic for the
application is simple gets/updates with filter/join conditions.

~~~
btilly
It does depend on the situation.

My experience has been that transactional code is usually best written with an
ORM, and complex reporting code winds up better with SQL.

------
iblaine
> Avoid reserved words

Glad you cleared this up for the rest of us.

FWIW, naming conventions are like opinions. Everyone has them, and they
usually differ from person to person. The best naming convention is a
consistent naming convention. Also, naming conventions differ greatly by
environment. A group of SQL Server engineers are going to have different
standards than those of people working on mysql.

------
youpassbutter
No. That's how you write sql within your organization. Also there are
syntactic differences between SQL flavors ( postgres, mysql, mssql, oracle,
etc ) that make a SQL standard unrealistic.

The only generic rule is "be consistent". Whatever convention/style you
choose, it should be consistent.

