
Rapid schema development with PostgreSQL - pearkes
https://speakerdeck.com/andrewgodwin/rapid-schema-development-with-postgresql
======
moron4hire
I'm failing to see the utility of hstore or json over adding nullable columns
here, especially if the latest version makes adding a nullable column
essentially instantaneous. It doesn't seem to simplify the handling of
missing/unavailable data, but it does manage to complicate the SQL syntax.
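
To illustrate the syntax point (hypothetical `users` table with an `attrs` column — names made up for the example), the plain-column version stays standard SQL while the hstore/json versions need Postgres-specific operators:

```sql
-- Plain nullable column: standard ANSI SQL
SELECT name FROM users WHERE phone IS NULL;

-- The same lookup against an hstore column (Postgres-only '?' operator)
SELECT name FROM users WHERE NOT attrs ? 'phone';

-- ...or against a json column (Postgres-only '->>' operator)
SELECT name FROM users WHERE attrs->>'phone' IS NULL;
```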

I have clients that use a variety of databases for a variety of reasons, not
the least of which is "just because that is where our data is." All the value
of most projects is in the data, and the value of data grows with age, because
you cannot recreate the past. If you lose something or you fail to record
something, then you can't ever get it back in a way that will stand up to
audit scrutiny. This has the awful effect of making technology-specific
details of databases get pushed into the business-decision realm, rather than
the technology-decision realm.

So complicating the SQL syntax is a significant issue for me. Sticking to as
much standard, ANSI SQL as possible makes my programs more portable across
RDBMSes. With some of the tools I've written, I can make a full transition
from MySQL to MS SQL Server and back again and the application doesn't care.
Having that sort of power makes upgrading your database a technology decision,
not a business one.

Yes, the features that Postgres has are nice, but to me they represent a very
real chance of vendor lock-in. I don't believe that the Postgres team will
ever pull anything to make me hate them, but then I thought the same about Sun
at one point, too.

~~~
rpedela
I just see it like all applications and libraries that adhere to some
standard. Some adhere strictly to the standard whatever that is. Some do that
but also add more features to push the industry forward. NVIDIA does that for
graphics. Mozilla does that for browsers. Looks like PostgreSQL is starting to
take that role with relational databases.

I generally view it as a good thing unless you care about cross-platform. If
you do, then don't use the non-standard features.

~~~
moron4hire
But how is that different from the old MS battle-cry "embrace, extend,
extinguish"? It's the "extinguish" part I'm concerned about.

~~~
sehrope
> But how is that different from the old MS battle-cry "embrace, extend,
> extinguish"?

It's OSS. That alone makes it different. Anybody can continue work on it or
even fork it.

Postgres is an awesome database and an awesome project. I think the core
developers have a good blend of being committed to supporting the SQL standard
and adding new non-standard features. If you read through the docs, they
usually specify whether a given feature is standard SQL or a Postgres extension.
If you want to use something new that isn't standardized yet you have that
option. If you don't, you have the info that it's not. The choice is yours.

> It's the "extinguish" part I'm concerned about.

I'm not sure I understand this. Are you worried about Postgres extinguishing
its competitor databases (MySQL, Oracle, SQL Server, ... etc)? I'll be honest
that I do see that happening but it's not malicious. It's just a better
product. For transactional/persistent storage of data I don't see myself
recommending anything else.

------
josephlord
Hmmm... I'm having evil thoughts about putting a whole web application
including templating inside of Postgres and just exposing a few functions via
a web interface.

Don't worry I won't do it really but does anyone worry that Postgres is doing
too much and that focus might be lost on being a reliable and fast relational
DB? I haven't seen any signs of problems but I do have this slight concern
with all the array/hstore/json features they have been adding recently.

~~~
mclarke
I think it's impressive how quickly the Postgres world was able to adapt to
the shifting needs of webapps and the whole nosql thing. The latest JSON
features are a natural extension of the key-value & array stuff that has been
around for years.

There's also a bunch of work focused on replication enhancements underway;
adding first-party replication tooling will be a huge reliability
improvement for Postgres clusters.

I don't think it's that they lost focus, it's that the project is picking up
steam.

~~~
josephlord
Yes you are probably right. I really like Postgres, the attitude (reliability,
standards etc.), the documentation, basically everything and don't want it to
change too much.

~~~
einhverfr
It's worth looking at things like hstore and json through the history of the
project. The project has always been one which has focused on how to manage
complex data in a relatively relational way, but has tended to go where no
other database has gone before (table inheritance for example).

Now, it is true that when you get into the advanced capabilities of the
database you run into hard edges that just don't make much sense at first, in
part because they represent real disputes regarding how everything is supposed
to work. Composite types in fields and table inheritance are well known for
these sorts of problems, but once you get used to the idiosyncrasies they
aren't bad.

This is true for JSON too, as there is no real way to map nested composite
types to JSON constructs both ways (you can do tuple -> json, but not json ->
tuple if the json object is nested). But a lot of these things just take time.
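
To sketch the asymmetry (assuming a 9.3-era setup; the table and values here are made up):

```sql
-- tuple -> json is straightforward:
SELECT row_to_json(u) FROM users u;

-- json -> tuple works for flat objects via json_populate_record (9.3+)...
SELECT * FROM json_populate_record(NULL::users, '{"id": 1, "name": "a"}');

-- ...but a nested object inside the json will not be reconstructed
-- into a nested composite type within the resulting tuple.
```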

------
craigkerstiens
If you're looking to try Postgres as a full on document store there's more and
more tooling around making this feasible. Here's one newer gem that makes it
pretty straightforward for Rails:
[https://github.com/webnuts/post_json](https://github.com/webnuts/post_json)

------
nsitarz
I'm pretty sure that "ALTER TABLE ... ADD COLUMN ... NULL;" holds an
AccessExclusiveLock on the relation being altered for the duration of the
transaction in PG 9.1 and 9.2. Is there some trick to the zero downtime schema
changes that are mentioned in this slide deck? I only ask because I recently
had to get creative with zero downtime schema migrations for my current
project.

~~~
gwy
I believe the AccessExclusive is only if you are setting to NOT NULL, such as
"ALTER TABLE my_table ADD COLUMN my_col boolean NOT NULL DEFAULT false"

We were just down for 56 hours due to our 3rd party platform vendor applying
that type of update.

AFAICT the only workaround is to do it in steps: set the DEFAULT, fill
existing rows with the default, then apply the NOT NULL constraint (which will
still lock it for a full table scan to check the validity of the constraint).
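
Using the same hypothetical table, the staged version looks something like this (each step in its own transaction):

```sql
-- 1. Add the column with no default or NOT NULL (catalog-only, fast)
ALTER TABLE my_table ADD COLUMN my_col boolean;

-- 2. Set the default so new rows get a value
ALTER TABLE my_table ALTER COLUMN my_col SET DEFAULT false;

-- 3. Backfill existing rows (batch this on big tables)
UPDATE my_table SET my_col = false WHERE my_col IS NULL;

-- 4. Finally add NOT NULL; this still scans the table to validate
ALTER TABLE my_table ALTER COLUMN my_col SET NOT NULL;
```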

~~~
joevandyk
ALTER TABLE always creates the AccessExclusive lock.

If you add a 'not null default something' to the column, it'll rewrite the
whole table, which can take some time. And since the table has the
AccessExclusive lock, this is what will block reads/writes.

Doesn't matter for small tables.

For large tables, you want to add the column without a not null or default,
commit that transaction, then populate the data, then add the not null and
default constraint.

~~~
fdr
You are right: in the fast case of "NULL" the amount of physical work amounts
to diddling catalogs around rather than copying potentially gigabytes of data.

So in any case a lock is taken, but I've never seen anyone get too bent out of
shape over that momentary mutual exclusion to deform the table's type
(exception: in transactional DDL where cheap and expensive steps are
intermingled).

------
CraigJPerry
When would the hybrid schema approach be useful?

To my mind it doesn't offer any extra ability to add or remove columns over
what we have already.

E.g. I'd never write SELECT *; I'd always name columns in my query so that I'm
immune to reordering or adding columns in the underlying table or view.

~~~
einhverfr
Where we are looking at it in LedgerSMB is to allow user-defined extra fields
for forms. A previous approach was a framework for managing join tables, but
json would be much cleaner.

