
Why Use Postgres - craigkerstiens
http://craigkerstiens.com/2012/04/30/why-postgres/
======
famousactress
Left out the fact that Schema-changing operations are transactional (big win
over MySQL).

[Edit: In migrations, specifically. Also! No mention of the MySQL dual-license
JDBC driver malarky... Not that it's a success of Postgres' as much as it's a
failure of MySQL's... ]

I'd never heard of Window Functions (not an Oracle user, and been on MySQL for
a few years... left Postgres for replication and haven't made it back quite
yet).. but I desperately want them. The idea of effectively aggregating over
groups in a group by (which is how I understand these Window doohickeys) is
something I come up on a few times a year and get really frustrated by.

I think the only killer feature I'm left wanting from some RDBMS at this point
is managed-sequences... The idea that I have a related list of things that I
want to keep a sequence column for, and would like the database to understand
the concept and help me maintain it.

~~~
tzs
> I think the only killer feature I'm left wanting from some RDBMS at this
> point is managed-sequences...

I would like some RDBMS to give sharable transactions. What I mean by that is
something like this.

1\. When I begin a transaction, I can assign it a name.

2\. If multiple connections to the database are in transactions with the same
name, they can see the changes made by the others, as if none of them were in
a transaction. Connections not party to the shared transaction do not see the
changes until all of the connections in the shared transaction commit.

3\. The connections in the shared transaction can do unnamed transactions,
which isolate them from the others within the shared transaction.

The idea here is that sometimes I have some operation that I would like to do
atomically, but that involves multiple processes acting on the database.
Process A wants to make some changes, then pass responsibility on to process B
(which may be on a separate machine) to make more changes, and then process C
finishes up, and if any of them encounter an error I want to rollback to the
state before A started.

~~~
famousactress
Interesting. Without knowing what kind of system you're talking about I'm
suspicious that I'd use this. I certainly have distributed problems like this
where a logical concept is broken into a number of steps, and while I want to
distribute the steps.. the effects of them to the entire system might not be
realized until all of them are complete.

I generally wouldn't defer that responsibility to the database though. I'd
more often handle this as multiple database transactions that advance a state
machine, at the end of which consequence-inducing-events are fired off.

The idea that process A tickles the database into some state expected to be
understood (and only seen by) process B seems like too flimsy a contract to
me.

~~~
_sh
I get your point, but there is value in delegating to the database--how do I
roll back an external state machine?

~~~
bad_user
See the command pattern: <http://en.wikipedia.org/wiki/Command_pattern>

Basically you encapsulate whatever you need to do with the database in objects
(logic + parameters). And you describe for these objects not only the logic
for updating the database (forward), but also for reverting the database to
its previous state (rollback).

~~~
_sh
Yep, I know the command pattern--it requires knowledge of all state up front
and breaks down in the presence of distributed state. It makes statements like
'update all records where value _x_ is _y_ ' difficult. You need to gather all
such records first, possibly guarding their current state with a
timestamp/vector clock (we are talking distributed systems after all), make
the change encapsulating the current state in the command pattern and commit.
If you have to extend this model to multiple queries, you'll be rolling your
own transaction protocol lest you expose partial state to other queries in
transit.

Or you could delegate all this to the database, which already has its own
transaction protocol.

------
gouranga
I really disagree with this. The more features you use from your DBMS vendor,
the more you shoot yourself in the following departments:

\- scalability. Logic and processing in the server is bad as it means you can
only scale up and not scale out. Scale up is damn expensive. When you need
that 64 core 96Gb machine to run it all on with 6 replicas will see what I
mean.

\- complexity. The DBMS is a black box which sucks up tight coupling almost
instantly. Coupling is bad for maintenance and scalability. SQL always ends up
with heavy coupling.

\- lock in. Those features only work with postgres so you're stuck with it
forever or induce a lot of cost to move away when a better solution comes
along.

\- schema. Maintaining schema and dependencies is hell on all but a small
system.

These facts come from 20 years of using databases.

Stuff that will save you: Use your DBMS as a simple flat store akin to a
document store. Use a CQRS based architecture and encapsulate ALL of your
logic in code. If you have to go at least partially relational for god's sake
use an ORM that supports several databases well. Make a provider independent
app level backup/restore facility such as the one with JIRA or TeamCity. NEVER
hire a DBA - they exist only to create walled gardens to protect their
existence.

My current gig has cocked up on all areas and it's costing them over 40% of
their turnover keeping it alive.

Happy databasing :)

~~~
platonichvn
I agree with you but I often question when people list "lock in" as an issue.
How often do you see projects where they decide to move from one db to
another? Even in cases where an ORM is used, making the migration process
painless in theory, I doubt many projects take the risk and exercise their
freedom of not being locked in.

~~~
gouranga
If you look at most "corporate" IT platforms, they tend to have favourite
database engines that they need to use. If you have DBMS portability, then it
gets one foot in the door straight away.

Also contrary to the norm here, some applications last a long time. The one I
work with daily is actually 20 years old this year. In that time a lot has
changed and in the next 20 years a lot will as well. It makes sense for long-
lived applications to plan well ahead and consider paradigm and architecture
changes.

This platform started in C++ on OS/2 with Oracle running on VAX/VMS at the
back end. It went to Java/J2EE with an Oracle back end in the early 00's. It's
now SQL Server and .Net.

~~~
neeleshs
Interesting to know what abstractions were found useful when migrating from
VAX/VMS to Java/J2EE to .Net.

~~~
gouranga
Hibernate and nhibernate.

------
dahlia
The one of the biggest advantages of PostgreSQL is GiST (Generalized Search
Tree) which is based on the theory of indexability.

> One advantage of GiST is that it allows the development of custom data types
> with the appropriate access methods, by an expert in the domain of the data
> type, rather than a database expert.

<http://www.postgresql.org/docs/9.1/static/gist-intro.html>

> Traditionally, implementing a new index access method meant a lot of
> difficult work. It was necessary to understand the inner workings of the
> database, such as the lock manager and Write-Ahead Log. The GiST interface
> has a high level of abstraction, requiring the access method implementer
> only to implement the semantics of the data type being accessed. The GiST
> layer itself takes care of concurrency, logging and searching the tree
> structure.

> [...]

> So if you index, say, an image collection with a PostgreSQL B-tree, you can
> only issue queries such as "is imagex equal to imagey", "is imagex less than
> imagey" and "is imagex greater than imagey". Depending on how you define
> "equals", "less than" and "greater than" in this context, this could be
> useful. However, by using a GiST based index, you could create ways to ask
> domain-specific questions, perhaps "find all images of horses" or "find all
> over-exposed images".

[http://www.postgresql.org/docs/9.1/static/gist-
extensibility...](http://www.postgresql.org/docs/9.1/static/gist-
extensibility.html)

If you are already using PostgreSQL though have not known about this fact, I
highly recommend you to learn about GiST. It is the most powerful feature of
PostgreSQL as I know.

------
imperialWicket
If you are doing anything serious with GeoSpatial, you should be using
PostgreSQL with PostGIS.

~~~
sheraz
Beat me to it, so upvotes :)

Let me pile on a big one: Stored procedures in multiple languages:

* python * perl * PHP (though I could not get it to work on 9.1)

~~~
erichocean
And JavaScript! We're putting this to great use with the latest
SproutCore/Blossom project I'm working on.

~~~
pvh
Hitoshi Harada and Andrew Dunstan are currently working on making it ready to
put into production. Also, he's working on backporting the new JSON datatype
to 9.1 as an extension!

------
jamesli
Both Postgres and MySQL are great. In history, Postgres emphasized more on
feature development instead of performance, while MySQL took the opposite
approach. It depends on your engineering and operation requirements to choose
which one to deploy. Most of OP's points, however, need to have further
consideration, IMHO.

\- "While replication is indeed very important, are users actually setting up
replication each time with MySQL or is it to only have the option later?"

Replication is not an option. It is a must-have for any serious products, both
in scaling and in operation.

\- Windows functions: They are wonderful and I love them. But it is not an
important factor at all.

\- Flexible datatypes: True in certain scenarios. It allows to create certain
types to map a business object with unusual requirements. Otherwise, the
requirements have to be implemented in application logic. But data type of
Array? IMHO, use of Array data type in relational database usually means there
are some issues in data modeling and design. The performance is another issue.
If you have to use array, consider NoSQL options.

\- Functions. Although I am a database architect, I use database functions
only for simple poking around. For any serious work, I prefer to write
application code. It is easier to maintain, to test, and to extend. I don't
want to have a big muddy ball. I prefer to have structured, well-decoupled
application code.

\- Custom Language. Same as above. Why not just write application code?

\- Extensions: Geospatial is awesome. It is better than that in MySQL. But for
many extensions, I prefer not to do the computation in database. Databases are
usually the most expensive resources (in operation costs) and are usually the
performance bottle neck. It would be better to have the computation in app
servers. It is easier to scale and it is cheaper.

\- Custom Functions: see above.

\- Common Table Expressions: Recursive queries are awesome. I t is nice to
have for ad-hoc data inspection, but I wouldn't encourage its usage in
application code. The risk to use it wrong vs the benefits are high if used in
application code.

~~~
fdr
> \- Windows functions: They are wonderful and I love them. But it is not an
> important factor at all.

I disagree or would choose to qualify a number of the things you wrote, but
I'll choose this one because I think it's the most interesting:

While window functions are seldom (but not never) used by common application
workloads, but I find them pretty important for lowering the barrier of what
one-off queries I can write (hence, more questions are economic to answer) to
figure out some statistics that assist doing business or solving strange
problems. The alternative is to be forced to make a greater number of
decisions with less supporting data.

This can be applied to any labor-saving device when it comes to reporting, but
I think window functions make the cut for me overall, and it's great they are
there.

------
DrJokepu
PostgreSQL is awesome (I really love it), now let's do some complaining:

* I'm not fan of doing backups with pg_dump. I think it would be a lot more awesome if I could tell the database to back itself up in-band. Even better, I would like to be able to ask the database to schedule its own backups, instead of having to set up my own cron jobs (or equivalents). Commercial databases can do that.

* The sad state of libpq (the C API). Libpq is where PostgreSQL really shows its age. It's very cumbersome to use and can't do (or can't do well) a bunch of things the raw protocol is capable of. Since almost all PostgreSQL bindings are built on libpq, none of the bindings support these features either.

* It would be cool if it was possible to kill sessions in-band (other database servers can do that). With PostgreSQL, the only way to shut down misbehaving sessions is sshing in to the database server and doing a `kill` on the process of the session.

* The infamous COUNT(*) fiasco.

~~~
dkadams
Depending on how badly the session is misbehaving:
<http://www.postgresql.org/docs/9.1/static/libpq-cancel.html> or
pg_cancel_backend(pid int) or pg_terminate_backend(pid int)

~~~
DrJokepu
Oh that's very cool, I din't know about that one. I guess this item can be
removed from my list.

~~~
fdr
Internally there is also an interesting mechanism: when one connects, the
backend hands you a "backend key", which is a 64 bit number. The client is
intended to remember that number. Cancellations are processed by sending a
cancel request to the server -- which involves a special new connection and a
different first packet -- including that secret number. As an interesting
security precaution, absolutely no acknowledgement or response is made to the
cancellation request.

libpq knows how to do all this.

The reason for this is that to avoid inspecting the connection for new
instructions during query processing, acting under the assumption that
cancellations will be rare.

[http://www.postgresql.org/docs/9.1/interactive/protocol-
flow...](http://www.postgresql.org/docs/9.1/interactive/protocol-
flow.html#AEN91658)

------
ret
List of reasons is so short...

For example since 9.1 Postgres support foreign data tables - other data
sources is available for user as regular table; there are many drivers: MySQL,
Oracle, external pg server, and even plain files.

Since 9.0 there is support for per-column triggers.

------
CoffeeDregs
I know this isn't a MySQL versus PostgreSQL thread, but I was bitten by MySQL
today...

The issue: I ran a SELECT with a WHERE against a column that is an "int(11)",
but I meant to run it against another column that is "varchar(255)". Something
like: SELECT * FROM table WHERE wrong_column = "1abcdefghijk";

Somehow MySQL cast a "varchar(255)" to an "int(11)" with a value "1" _without_
telling me. WTF? My result set was approximately 1M rows. I expected 1 row, so
knew the result was wrong... But what if I had expected more than 1 row? Then
the bizarre results from MySQL would have _appeared_ to make sense.

~~~
te_chris
Maybe there's a deeper reason PHP and MySQL are so closely linked.

~~~
CoffeeDregs
Funny: I thought the exact same thing as I was writing my comment...

------
zzzeek
Great points. I'd love to see someone tackle how to get a corporate
environment to consider Postgres over MS-SQL. We weren't able to, due to MS-
SQL's great amount of "out of the box" functionality regarding automatic
failover, mirroring, clustering, etc. PG of course can do all of this but to
convince the DBAs to throw away their point-and-click interfaces so that we
could start writing custom scripts and listeners from scratch wasn't really
feasible.

~~~
imperialWicket
I don't have an answer to the issue of getting PostgreSQL in corporate
environments - but I did run into a couple other hurdles in trying.

1\. It's surprisingly difficult to find an experienced PostgreSQL DBA who
wants to work in a corporate environment.

2\. It's even more difficult to get a large corporation to pay a reasonable
salary (why would we pay more for a PostgreSQL DBA, when we can just throw
that money at Oracle/Microsoft and get a cheaper DBA with all the certs...)
which might entice an experienced PostgreSQL DBA to join the organization.

~~~
tbrownaw
I would guess that #1 is caused by #2... does #2 mean that Oracle+DBA (or
MSSQL+DBA) is cheaper than a Postgres DBA, or just that the database and the
DBA are generally paid for out of different department's budgets?

~~~
imperialWicket
I think it's correct that (2) directly influences (1). And I think that most
corporate groups are willing to hire an Oracle/MS SQL DBA with small amounts
of experience but the carrying the right certifications.

To expand a bit, IMO - that scenario doesn't really exist for PostgreSQL DBAs,
you are either experienced or inexperienced (with no weight-carrying
certificate to complement credentials). Hence the corporate environment won't
hire an entry-level DBA who has no certs to validate the hire, and they won't
justify the seemingly added cost of an experienced PostgreSQL DBA.

------
ironchef
You left out one of my faves. Parent / child tables (ie tabular inheritance).
Makes time series stuff so much nicer as pruning over time can be performed
with truncate as opposed to "delete from".

------
joevandyk
ActiveRecord 4.0 will support postgresql custom types, by the way.
<https://github.com/rails/rails/pull/4775> Thanks @tenderlove!

------
sunsu
Call me lazy, but this is why I don't use Postgres: Amazon RDS is MySQL

Its that simple. I can spend a lot of time setting up and monitoring a
Postgres cluster, or I can spend about 5 minutes configuring a MultiA-Z RDS
instance. However, the moment Amazon offers Postgres on RDS instances, I'll
make plans to switch.

~~~
papsosouid
So use enterpriseDB's postgresql service. It is the same deal as RDS, hosted
on amazone, only using postgresql and operated by postgresql experts.

------
gmac
Also: fast (having a properly smart query planner) and, in my experience,
fantastically reliable.

------
ddorian43
Is there a software that balances databases on different servers just like
mongodb sharding works, by range? So for example i have a SaaS and each
customer has 1 database. And the databases are migrated from one server to
another automatically? If not is there something similar? Thanks

~~~
joevandyk
Not yet. <http://postgres-xc.sourceforge.net/> should be released soon.

~~~
pvh
I wouldn't expect too much from XC in 1.0 state. When I last looked in on it
there was still a ton of work to do before it was really operationally ready
to put in production.

I'd love to hear from anyone who has managed to deploy the beta.

------
jfb
Two I particularly like:

\- CREATE DOMAIN

\- ON UPDATE CASCADE

Oh, and yeah, a BOOLEAN datatype. That's one in the eye to Larry Ellison.

What would make it absolutely the bee's knees would be a decent data language
on top of that lovely data engine. Something closed over composition, with a
simple, orthogonal syntax. NoSQL in the truest form — not a blind rejection of
Codd's algebra, but something LESS DUMB STUPID THAN THE WRETCHEDNESS THAT IS
SQL.

~~~
pvh
I've been agitating for an API that would allow developers to build their own
query languages on top of Postgres alongside of SQL. Some day.

~~~
luriel
Yes! Please! The time for saner alternatives to SQL for querying relational
databases is long overdue.

Is sad that Postgres completely replaced QUEL with SQL, it was not the best
language ever, but some competition and alternatives in this area would be
good.

Probably the sanest alternative around today to SQL would be Date's D (not to
be confused with the "C++ on steroids" D language), see:
<http://en.wikipedia.org/wiki/D_(data_language_specification)> and
<http://www.thethirdmanifesto.com/> (evidence that database types should never
be allowed to design websites :))

------
rfurmani
Is there a good inexpensive hosting option for postgres aside from just
running it yourself? Sort of like RDS on Amazon? Alternatively a DBaaS host
for postgres?

~~~
mazelife
Something like this, maybe? <https://postgres.heroku.com/>

~~~
rfurmani
With a reserved instance, RDS starts with $20/month for the small, $92/month
for the large, and it's easy to change the size as the need arises. Heroku
Postgres starts at $200/month, which is more than I'd want to pay for an
early-stage project that I'm still experimenting with.

------
gbog
Two points:

\- using pg'a arrays is using SQL against its grain. If you need multiple
phone records per user just create a user_data table with three columns (user,
key, value)

\- the number one reason to prefer PostgreSQL over MySQL is just correctness.
Every second query I write in MySQL get a wrong answer.

~~~
empthought
It's not true that using arrays is using SQL against its grain. If this is an
arbitrary list of phone numbers then an array is fine -- the domain being
modeled is "list of phone numbers" not "phone number."

If you need to query specific phone numbers, then you should have separate
columns whose names describe their roles (e.g. cell_phone_number) rather than
the EAV monstrosity you proposed. :)

~~~
gbog
> rather than the EAV monstrosity you proposed. :)

I would really like to know what is monstrous in the "Entity–attribute–value"
model, I scanned the Wikipedia article without finding any obvious problems.

I had, however, many many problems with separate columns for each values: I
have to handle tables with over 600 columns, for example "fixed_phone",
"fixed_phone_modified", "fixed_phone_modified_by", "cell_phone",
"cell_phone_modified", "cell_phone_modified_by", etc. With an EAV variant,
I'll have "entity", "attribute", "value", "modified", "modified_by",
"deleted", and it will be much cleaner, or prove me wrong.

~~~
empthought
All of those modification times/users should be in a separate audit trail
table, which likely will look like an EAV table. That's OK because it's
tracking the history of updates, e.g. (time, table_name, column_name,
old_value, new_value, changed_by).

The current (rather than historical) values should not be in generic EAV
tables because putting data of different domains in the same column subverts
both the relational model and the DBMS's indexing capabilities.

If the point is to label telephone numbers with a role, then a person_id,
phone_number, phone_number_label table should be fine.

------
cageface
I'm pretty impressed that something of this complexity can be implemented
entirely in C. People complain about the lack of tools for building higher
level abstractions in C but this shows that with the right code discipline it
can scale quite high.

------
alexchamberlain
Great article.

As someone coming from a Maths background - instead of pure CS - I'd never
heard of some of the features you highlighted. If you consider expanding the
article, please could you put in a line or 2 explaining what a feature is?

------
ams6110
I love that there are so many options for writing stored procedures. There is
the traditional pl/pgsql (matches quite closely to Oracle's plsql) but also
Tcl, Perl, Python, Ruby, Lua, and others.

