
Documenting your PostgreSQL database - dmmalam
http://www.craigkerstiens.com/2013/07/29/documenting-your-postgres-database/
======
SugarfreeSA
I believe that documenting anything with comments (be it code or a PostgreSQL
database) should only be done for more complex pieces of code or to elaborate
on something that could be misinterpreted.

Misguiding or incorrect comments caused by "Comment Rot" will actually cause
more harm than having no comments at all. In fact, your code should be self
explanatory (if you do it correctly) in terms of what it is doing. Adding
extra comments to elaborate on "why" something is being done may be more
useful - i.e. When the semantics of the code are not as easily deduced from
the syntax.

For more information on "Comment Rot":
[http://blogs.msdn.com/b/ericlippert/archive/2004/05/04/12589...](http://blogs.msdn.com/b/ericlippert/archive/2004/05/04/125893.aspx)

However, I do find documenting database tables interesting and will consider
doing it in the future under very specific circumstances. Thanks for sharing!

~~~
eitally
I've worked intensively on a schema (as an end user developing reports against
it) where several columns in one of the tables were flex fields whose
content/use changed dependent on a different column's value. It was always a
bear to train new developers on it and it wasn't documented anywhere (until I
created a flat file documenting it). In a situation like that I'd have loved
inline tabular comments.

Similarly, if a table or a field has become "legacy", that is, no longer
actively used but kept around because there's old data in it somewhere, that
should be commented, too. Or, if a column has it's datatype changed for some
reason while there is data in the table, that should probably be noted, too
(e.g. char --> varchar, int-->string, etc).

Schema versioning is a huge problem at many companies, especially non-tech
companies where the developers are completely at the mercy of their "business
stakeholders", and since full-fledged data dictionaries don't exist for almost
anything, using simple comments like this could be a boon. I dunno, ymmv, but
I'd have appreciated it.

~~~
SugarfreeSA
Yes, I definitely see value of comments in a situation like flex fields
because that's something that may not be straight forward or clearly apparent!

Legacy tables or fields also make sense because the code will remain the same
and thus lead to no comment rot.

As long as the comments stay relevant and cause no extra confusion, then it is
useful! I suppose it is also situation dependant as well as the preference of
the development team at the time.

------
NDizzle
I add my comments to the commit message that contain the schema changes. If
you seriously add a comment like 'required first name of user' you are wasting
everyone's time.

~~~
jsnell
Especially when at least right now one of the things being documented in the
example as "first name of the user" is the last_name field!

    
    
      last_name  | character varying(50)       | ... | required first name of user
    

That shows two documentation antipatterns, actually. Documentation that
conflicts with the code, and documentation so generic that it gets carelessly
filled in with copy-paste.

~~~
cmdkeen
I work with Oracle these days, or the ... might contain it - but is it not
also missing a NOT NULL constraint?

Documentation that conflicts with the DDL makes for an easy code review at
least. I look at it more like an interface specification, especially as there
are several ways to write DDL - though there should really be a house style.
For instance the NOT NULL constraint might be being added later on the script,
so it isn't necessarily readable in the same way as normal code.

------
hartror
> You comment your code, why shouldn’t you comment your database?

I don't as a general rule, the code itself expresses what it is, comments
explain why it is. The examples here suffer the common problem of explaining a
language/tool feature with naive examples. For example the columns in the
users table are self explanatory apart from the HSTORE "data" column, which
itself is likely a crime in its own right.

The exception I would make is with complex queries, SQL isn't the most
transparent language especially the more advanced/vendor specific techniques.

~~~
craigkerstiens
Author of the original post here. Any time you come in new to a system and
just assume the columns are self explanatory you often make a wrong assumption
somewhere along the way. Being explicit is not much more effort to make it
safer when analyzing or working with the data.

As for a 'data' column name for hstore. This is something I've often talked
about, hstore is a key value store. The big value of it is adding and removing
things fluidly over time, for that reason data can be a good a column name as
any in certain cases.

~~~
sbuttgereit
Absolutely could not agree more. The thing that even clearly named schemas,
tables, columns, etc. miss are context and intent. If you have that context
already you probably don't need much by way of documentation. But as someone
that gets called into clients to work through their databases, I spend a lot
of time trying to figure that context and intent. If it's documented, that
learning curve is reduced greatly.

I usually do these comments on all objects regardless of how obvious the names
and code may be. If even only for the reason of ensuring that doing this
discipline remains second nature. I also like to remember that I'm not doing
this for me, but rather for the client's team and others that come after me to
work on things.

~~~
cmdkeen
This. Database impact analysis is painful enough without a developer assuming
5 years ago that their table was so obvious it didn't need a comment.

------
narrator
Anyone had any good experiences with tools to graphically document PostgreSQL
with E/R Diagrams, etc.?

~~~
GregQuinn
Nearly all the standard data modelling tools(ERwin,System Architect,
Enterprise Architect ...) have a facility for reverse engineering a schema.
This produces an ER diagram for the schema. It's worth pointing out however
that this diagram gives you structure not semantics. The latter has to come
from elsewhere.

------
teddyh
I always did this, but maybe I’m unusual? I started with SQL by having a real
project to do and by reading the PostgreSQL manual – I made extensive use of
keys, triggers, functions, etc. Later I took some university classes in
database design, which was interesting as the knowledge about normal forms,
etc. complemented my existing experience.

I guess that if you instead started by using something like MySQL or SQLite
then you would have a wholly different view of what an SQL database is for,
and would have a hard time viewing SQL as something other than a data
container with optional sorting.

------
mianos
Not to harp on about it, but, if you use python, an ORM, like sqlalchemy can
make documenting a schema much more manageable. Although it is saying the same
thing, it is mainly because the relationships and derivations are more
compact, in-line and not spread all over the place. I have taken to making a
new git project with the just the schema and making it a dependency of the
project.

~~~
fibo
Ahah, probably if Postgres were written in python it would self document its
data models

~~~
mianos
It would not be real quick though :) SQLA, Python and Postgres the best of
both worlds, and another world.

------
helge5
If it is worth commenting on, you might actually want to turn your SELECT into
a VIEW:

CREATE VIEW "Names of projects with open tasks grouped by email" AS SELECT
users.email, array_to_string( ...; COMMENT ON VIEW "Names of projects with
open tasks grouped by email" IS 'More info on query ...';

------
j_s
It is reasonable to require some info about each major stored procedure change
as a comment at the top (when, who, why - something standardized).

It is nice to have this on-hand, and it should also be in source control
check-in comments / project management work items or tickets too... this can
be picked up auto-magically.

------
ExpiredLink
You'd use a data(base) modeling tool which creates the DDL script - including
comments - for you.

------
brightball
Great feature not taken advantage of often enough.

------
sinzone
PostgreSQL is loved by the developer community and is growing in adoption very
fast. When we decided to open-source KONG [1], we went with the only supported
db at that time which was Cassandra. Mainly we use Cassandra to handle rate-
limiting across multi region and because it's also easy to scale horizontally.
But even though is battle tested at FB or Netflix, we were surprised to see
how the community immediately pushed [2] for supporting PostgreSQL as well.

It's definitely a good fit, especially for the powerful features around JSONB
datatypes.

[1] [https://github.com/Mashape/kong](https://github.com/Mashape/kong)

[2]
[https://github.com/Mashape/kong/issues/331](https://github.com/Mashape/kong/issues/331)

