
 PostgreSQL vs MySQL: an apples to oranges comparison - ottbot
http://ledgersmbdev.blogspot.co.uk/2012/09/or-modelling-interlude-postgresql-vs.html
======
3amOpsGuy
Really well thought out article.

The single app vs. multiple apps distinction is a good frame for constructive
discussion.

Not so much because Postgres is better or vice versa, but because in these days
of agile and YAGNI development there's a real problem with some developers not
thinking further than the next iteration.

I agree that development shouldn't be weighed down by 'what if we'll need X
someday maybe', but it should have at least a rough roadmap. The terminology I
use next is irrelevant, but I'm getting at the idea of having macro-, meso- and
micro-level goals defined: in the future we'll deliver architecture concern Y,
so let's not box our implementation into an incompatible state.

------
crazygringo
Wow, this is one of the best articles I've read on HN in a long time. It's a
fantastic insight, and explains so much.

I would _kill_ for a whole series of articles like this, explaining the
philosophical underpinnings behind various technological alternatives.

So often, HN comments seem to degenerate into PHP vs everything else, iOS vs
Android, OSX vs Linux, MySQL vs PostgreSQL, based on nothing but people's
personal experiences and preferences.

Taking a step back and exploring the _why_ and _history_ behind products, and
what problems they were intended to solve, feels so refreshing and
educational.

Kudos to the author (Chris Travers)!

------
smoyer
I've never been able to completely dismiss MySQL since it obviously works for
so many users, but I also haven't found it very useful in my work. I started
with PostgreSQL in the '90s and every time I tried to use MySQL, I found
myself asking either "why did it do that?" or "why would I want that?"

The article describes two databases that are very different conceptually, and
I think they're mirrored by users that conceptually think about data in
different ways. So if one or the other works well for you, go for it.

I should note that I have successfully deployed applications on both platforms
... once I started doing most of my database interactions through an ORM, the
differences didn't really matter any more.

Vive la difference!

~~~
lmm
I think MySQL's original success came from features that are less obvious when
developing (although that said it does have some friendlier development syntax
(e.g. show create table) and lower startup latency in interactive use). It
made running multiple instances on a single host easy. MyISAM tables give
amazing performance in return for weakening many consistency guarantees, which
is a worthwhile tradeoff for many use cases. Postgres is only just starting to
gain the level of clustering support that MySQL has had for years.

~~~
jeltz
The performance benefits of MyISAM are exaggerated. Its advantage was that it
was a decently fast and very simple engine, which made it possible for MySQL to
reach the market quickly.

MyISAM only gives good performance when you either have only reads or just one
single writer. And even then it does not always win over InnoDB and
PostgreSQL. And as soon as you get a mixed load MyISAM performance drops like
a stone.

------
VLM
Quick summary, assuming you know what MVC stands for:

Mysql - Model is in your code. PostgreSQL - Model is at least partially in
your database.

There is a HUGE mistake in the article in the assumption that, WRT the model
design, the database always knows best. It's possible to come up with weird
situations where you just want the DB to store stuff and not nanny you.
Consider a database of actual, real-world gravestone inscriptions. If
someone's gravestone has "1890-02-30" inscribed on it, I know that's wrong but
I don't care; I need to store it exactly as is for historical purposes. I
don't want a DB crash or need to recompile postgres to accept it, I don't want
to force the users to falsify gravesite records, and I don't want to have to
store it as a CHAR or VARCHAR and write my own date handling routines in my
app... The correct way to handle data modeling/integrity is to allow the app
designer to decide how flexible he wants to be WRT reality, and let him decide
exactly how to shoot himself in the foot.

On a bigger scale, if I made a database table and related CRUD app to store
philosophical positions, if I wanted an AI to only accept "truth" then I'd put
the AI in my app, I would not want the DB model and the app model to have to
fight over Marx being right or wrong before the data could be stored. What if
filesystems needed to verify "truth" before allowing a file to be saved?
Weird.

Is it persistent storage or is it a Turing-complete theorem prover, and why
must both be in the same executable? Note I'm not claiming a "middleware"
model is a bad idea; in fact it's a great idea. It just doesn't belong in the
persistent DB store any more than it belongs in the filesystem layer.

~~~
xtdx
How is MySQL going to store "1890-02-30" as a date? What internal format does
it use to allow storing a date like that?

~~~
VLM
CHAR(10) worst case, or probably something a lot more like rowname.year INT,
rowname.month INT, etc. Yes, you could build your own homemade date type in
postgres and your query would look like "SELECT *" and then you'd write your
own date DBMS routines, but it would be icky. Compare the execution time of
"SELECT * FROM blah ORDER BY somedate LIMIT 10" on each design, especially if
the DB and webserver are on separate boxes.

It comes down to the fundamental question of who defines bad data: the DEV in
his model or the DBA in his table design. Worst case is both with no
coordination; second worst case is both with coordination (wasted effort).

~~~
xtdx
Wasn't it you who said "I don't want to have to store as a CHAR or VARCHAR and
have to write my own date handling routines in my app"? rowname.year and
rowname.month sounds a lot like writing your own date handling routines.

~~~
einhverfr
And I am confused as to why you wouldn't use varchar or char to record, you
know, inscribed writings. I mean if it says 1890-03-300 I assume you'd want
the extra zero recorded, right?
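
A sketch of that approach (the schema is hypothetical): keep the carving
verbatim, and derive a proper date only where one actually exists:

    -- Store the inscription exactly as carved; parse a real date out of it
    -- only when there is one to parse.
    CREATE TABLE gravestone (
        id          serial PRIMARY KEY,
        inscription text NOT NULL,  -- verbatim, e.g. '1890-02-30'
        parsed_date date            -- NULL when the carving is not a valid date
    );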

------
gmac
(See also: <http://howfuckedismydatabase.com/>)

~~~
mh-
well, that's completely pointless.

------
jaakl
I have a story about the performance of these two. I maintain a Drupal website
which runs under MySQL. Single-server system, not very high load. But
recently, last Monday, it was in the news and load increased some 10-fold.
People started complaining that it did not respond, or gave an "offline"
message. So I managed to log in to the server and it had a load average of 130
or so, mostly under MySQL. Drupal gives no mercy to the DB, for sure. It just
so happened that I had a recent copy of the site on PostgreSQL, as I had
planned to switch over to use PostGIS, but not everything was working there
yet. The site was down anyway, so I saw a good opportunity to test it against
high load: I reconfigured the site and restarted Apache to direct traffic to
the new database. After a minute or two the load average was in the range 5-6.
Not normal for a 2-core server really, but significantly lower than before,
and the site was saved; in effect I had performed a load test with real users.
PostgreSQL was not tuned at all, just a plain yum installation of the latest
9.1.4; how much MySQL tuning had been done I do not know, it was set up before
my time. But it smashed my basic belief that MySQL should work better for a
large number of simple transactions. The stupid thing is that now I know less
than before: I don't know of any case where MySQL would be preferred.

~~~
taligent
Did you try Percona or Twitter/Facebook's version of MySQL ?

------
dgregd
If you cannot compare PostgreSQL to MySQL then what databases can be compared?

~~~
dotborg
the point of this article is that Postgres is not just a database

~~~
einhverfr
You have to define "just a database" (I am the author btw).

PostgreSQL is a very different kind of database than MySQL is. MySQL is closer
to a persistence layer for an application with an ability to plug into other
reporting tools. PostgreSQL is a data modelling platform.

Both are "just databases" or not depending on how you want to define
"database." I would say it is clear that PostgreSQL is not just an RDBMS but
that's the point of other articles on the blog, not this one.

~~~
dotborg
postgres provides much more capabilities than just "dumb storage of the
application's state"

~~~
einhverfr
No argument there. See my other posts in the Object/Relational modelling
series for example.

~~~
dotborg
I might refer to one of your comments to this entry:
<http://news.ycombinator.com/item?id=4495749>

when your code goes into RDBMS engine, then it's no longer just a rdbms engine

~~~
einhverfr
Well, in this case that's one of the things that makes it an ORDBMS engine.

One of the interesting things about writing the series I have been working on
(on PostgreSQL as an ORDBMS) is that it has helped me solidify my
understanding of how these features fit together. It has also forced me to
think about separation of concerns a lot more.

What this has taught me so far is that PostgreSQL is an absolutely amazing
data modelling platform. Sure that's a lot more than an RDBMS engine. The
difference is somewhere between a math program capable of doing algebra and
one capable of doing symbolic manipulation to solve derivatives and integrals.
(Indeed this is forcing me to rethink the way I approach relational math to
account for the sorts of things an ORDBMS can do.) So yeah it is a lot more.

But "just a database" is a hard statement to nail down. If BDB is "just a
database" then MySQL is "not just a database." As Paul Brown put it in "Object
Relational Database Development: A Plumber's Guide", this basically makes an
ORDBMS an information backplane for software services. It becomes an
interesting question whether "database" is a superset of that or not.

------
lobster_johnson
It's worth mentioning that while Postgres does promote the idea that the
database is the gatekeeper to the data -- where you're supposed to manage data
with stored procedures, triggers and so on -- it's not in any way mandatory,
and not necessarily the way people prefer to use Postgres.

Personally, I lean towards treating the database as a passive data store
controlled by the application, as opposed to using the database as an
application platform. For example, validating and normalizing is done in the
client application.

While I would also be happy about putting logic in the database, this
currently requires a split of the data logic between the application and the
database in a way that I don't like: Stored procedures, functions, triggers
and so on are all _persisted_ alongside data. (Internally they are treated
very much like data by the database, in fact.)

This is unlike the rest of the application, which manifests itself in its
source code, resides in a Git repo, and so on. The database code is always
"live", whereas the app code runs at my behest. There is a reason why the
article uses "create or replace procedure", not "create procedure".

So if I move some of my code into stored procedures, triggers, rules, checks,
etc., I have to push this code to the database using SQL scripts. This drives
the code into a kind of dark, murky hole with respect to visibility and
versioning. What code am I running? I'd have to look into the catalogs to see.
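
Such a push script is typically a pile of statements along these lines (the
function itself is a made-up example):

    -- Redefine the function in place; its source then lives in the catalogs,
    -- not in any file the running application can be diffed against.
    CREATE OR REPLACE FUNCTION user_is_valid(uid int) RETURNS boolean AS $$
        SELECT EXISTS (SELECT 1 FROM users WHERE id = uid);
    $$ LANGUAGE sql;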

Also, can I run different versions of the code concurrently? Not in the same
database, apparently?

And how do I seamlessly, elegantly upgrade the code? Sure, I can manufacture
SQL scripts that do the "create or replace procedure" dance, but what if I
removed a procedure? Or what if my database schema changes? Then my code has
to move along in tandem with the schema.

As far as I know, current database migration tools just aren't very good. In
other words, for me, personally, I don't think the toolchain is quite there.
It sounds like a nightmare to manage. Having dealt with libraries such as
PostGIS, which are built on functions and stored procedures, I know how gnarly
this system can be.

Personally, I would like to see a database that supported non-persistent data
logic. It could be plugging in actual source code (put foo.rb in a folder,
database will run it and export its methods as database functions) or
interfacing through an API (instead of "create trigger" to create a trigger,
have the database call my REST API /user/validate for each row) or similar.

~~~
einhverfr
_Personally, I would like to see a database that supported non-persistent data
logic. It could be plugging in actual source code (put foo.rb in a folder,
database will run it and export its methods as database functions) or
interfacing through an API (instead of "create trigger" to create a trigger,
have the database call my REST API /user/validate for each row) or similar._

The question is where you put your API. The biggest tradeoff I see is whether
you can run on many different RDBMSs or whether you are tied to one RDBMS. But
similarly the question becomes to what extent your db is tied to the
application and to what extent it can be used safely by many apps.

So that's a big tradeoff. The rest can be solved the same way you solve the
problems elsewhere, and there are some advantages to being able to put your
unit tests in db transactions and roll them back.
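
The rollback trick looks roughly like this (table and data are illustrative):

    BEGIN;
    INSERT INTO accounts (name) VALUES ('scratch row');
    -- run the test's assertions against the modified state here...
    ROLLBACK;  -- the database is left exactly as it was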

There's a second big issue too which rarely gets noticed. Doing application-
style development in the db rarely works well. To do this well, you really
need to make your queries front and center, write good, clear queries, and so
forth. I have seen what happens when app developers try to write stored
procedures and I don't really recommend that.

------
ibotty
great article. i am amazed. (many articles from ledgersmb folks (or only this
person, i don't remember) are great.)

------
tesmar2
I think it really depends on the type of apps you write. If you write small -
medium rails apps, then the db is usually just dumb storage and if you are
careful to use the ORM properly, then DBs are just plug and play. If your app
is larger and you need more features, then you look more deeply into the
features of each in which, most of the time, POSTGRES is the clear winner.

~~~
VLM
"If you write small - medium rails apps"

If you are writing using a MVC like rails, mysql is easier because the M is
solely in the rails app rather than some of the M being in the rails and some
of the M being in the DB configuration.

Theoretically there's no postgresql issue if splitting design and config stuff
into two areas is acceptable: IF all the devs are also DBAs, or if you are
careful to never use any of the features of postgresql (turning it back into a
persistent store), or if you never use any of the model features of rails to
enforce data constraints, or if the DBA and the DEVs are on exactly the same
page... but adding a spinlock like that across functional areas, or maybe even
across departments, is rarely a win.

~~~
fdr
For many small Rails web applications, having sane (and rich) datetime support
in the database is both nearly invisible (turning bugs into errors, and
working about the same otherwise) and very useful. The same could be said
about silent truncation of strings (one can get the truncating behavior in
Postgres by using an explicit cast, or just a built-in function).
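
For instance, a plain insert of an over-long value into a varchar(n) column
raises an error in Postgres, while an explicit cast truncates on request
(column and value are illustrative):

    -- INSERT INTO t (c) VALUES ('abcdef');  -- ERROR if c is varchar(3)
    SELECT 'abcdef'::varchar(3);             -- explicit cast truncates to 'abc'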

LedgerSMB -- an author of which wrote this article -- is on the advanced side
of the spectrum in using specific Postgres-isms, and that's not for everyone.

There are grades in-between: using arrays for simple kinds of matching in
Postgres is not rocket science and can make your life a lot easier (and
faster):

    
    
      SELECT * FROM blawg WHERE tags && '{a,b,c}';
    

Yet another use case might be full-text search, where integrating and
synchronizing an external information retrieval system (like Lucene or Solr)
for simple search use cases is just more trouble than it is worth. With
slightly more work than the above, one can use Postgres's Full Text Search
feature.
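
The "slightly more work" version can be as small as the following (the column
name is assumed):

    -- Match posts containing both words; pair this with a GIN index on
    -- to_tsvector('english', body) to keep it fast.
    SELECT * FROM blawg
     WHERE to_tsvector('english', body)
           @@ to_tsquery('english', 'postgres & mysql');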

So here I think you are supposing a false dichotomy: there is a continuum
available on both databases, but the continuum extends a lot more into 'the
database can know something about the data' territory in the Postgres
implementation.

~~~
einhverfr
Certainly it isn't for everyone.

The thing is, LedgerSMB aims to be a database for many apps. You can't do that
and still have an app that runs on many databases.

I am not convinced that grades in between are all that great. I think you have
to choose the degree of portability and the level of db-as-api you want first,
and then program to that.

------
dscrd
Are there any books out there that describe how to efficiently use
PostgreSQL's object-relational techniques?

~~~
dotborg
postgres docs are full of performance tips

~~~
joedoe55555
I learned SQL with Postgres docs ;)

------
dkhenry
every time I read a PostgreSQL vs MySQL post, the only thing I can seem to
take away from it is that DBAs really seem to hate the fact that MySQL has
made them obsolete, and really want you to think you should switch to a
platform where they are necessary.

~~~
joedoe55555
Postgres is definitely a pain in the butt to set up properly on a network, in
particular if you have never done it before. (And that's just talking about
non-cluster use.)

However, once you know the steps necessary to set up a postgres server, it's a
piece of cake. In fact it's even fun, because Postgres is software that comes
in pretty much the same packaging in every version. And I can even top that:
with pgAdmin it has a powerful, consistent and stable tool to manage
databases. (MySQL seems to be miles away from that.)

I think the impression that Postgres is more difficult than MySQL is rooted in
the fact that it's more difficult to install at home. (Why so many
installation steps? MySQL is basically just installing the package, changing
the password, and done.)

I worked with Postgres as a developer for around two years and I loved it.
Very stable, very solid, many features and no surprises.

MySQL, on the contrary, seems like a toy db, at least when you want to do a
lot of relationship stuff with foreign keys etc. It feels really awkward that
trailing whitespace has no meaning, that there is no real boolean type, and
that the admin tools out there seem really immature.

Having now worked with MySQL for nearly two years, I find it still painful. If
you don't store petabytes of data and don't run a highly frequented website,
Postgres is probably your choice.

~~~
jeltz
> I think the impression that Postgres is more difficult than MySQL is rooted
> in the fact that it's more difficult to install at home. (Why so many
> installation steps? MySQL is basically just installing the package, changing
> the password, and done.)

I think much of the blame here can be put on the people writing installation
guides on the Internet. The guides often suggest setups more suited for
networked databases than for a local database. When I install PostgreSQL on a
development machine, it is just these two steps.

    
    
      1. sudo apt-get install postgresql-9.1
      2. sudo -u postgres createuser -s `id -un`
    

After that I can create whatever databases I like.
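
For example, from any psql session after that (the database name is
arbitrary):

    CREATE DATABASE mydb;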

So it is the same number of steps to set up a PostgreSQL database as a MySQL
one. I admit it took me some time to figure out that this was the simplest
way, and I have yet to see an installation guide which suggests this method.

