

Ask HN: When to use MySQL vs PgSQL? - Bjoern

Dear HN,<p>first of all thank you that you are reading this. Out of caution, please let me remind you that I don't want you to take this question and to start a "flame war". Arguments with facts would really be <i></i>constructive<i></i>.<p>I am sincerely interested if my current assumptions are true or where I am wrong.<p>Background:<p>My current employer asked me about MySQL and I mentioned those points below which were later countered with this:<p>http://pastie.org/456941<p>(Large) Parts of this gave me the classical "WTF" moment. So I thought I ask you about this.<p>Detailed arguments against MySQL from my side:<p>A long time ago I saw this nice PDF where there were some measurements between PgSQL and MySQL. 
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf 
Title: "FreeBSD 7.0 and Beyond" on Page 9 following there is a Case Study of MySQL vs. PgSQL.<p>This was for me the first reason to really take a closer look at MySQL.<p>Secondly I know that MySQL uses two different internal SQL engines. A X/Open XA distributed transaction processing (DTP) support; two phase commit as part of this, using Oracle's InnoDB engine<p>Full-text indexing and searching using MyISAM engine<p>Should Oracle at some point decide to revoke the usage rights of InnoDB for the open source community then users of MySQL would need to
switch to MyISAM which is, frankly said, terribly slow.<p>http://en.wikipedia.org/wiki/MySQL#Criticism
Regarding the licensing renewal which is necessary. Now even Oracle is planning to buy Sun, which would make me think hard why they would renew the license of InnoDB for MySQL anyway? (creating competition - "multi-year" extension of their licensing agreement, why would they continue?)<p>The next reason was that I have no theoretical, but empirical proof (I didn't enjoy debugging that) that I end up more often with table data corruption if I run MySQL. Ok, I can repair it, but why should that take up more of my time away doing other more productive things?
(It is even mentioned in the Criticism point of the wikipedia link above).<p>Also critical bugs get fixed slowly. E.g. It took them from 2003-2008 to fix this bug.
http://bugs.mysql.com/bug.php?id=989<p>Dear HN, please help me with this question.
Again, thank you for your time.
======
mdasen
Use a good abstraction library and start coding. Seriously.

PostgreSQL tends to be what is favored by people here. I used to use it for
everything until I tried to set up a replicated cluster with it. PostgreSQL's
replication is severely lacking. Slony-I, the most mature of the bunch, is
very difficult to set up (with many steps needed for every single table) and a
lot slower than MySQL's replication (due to its use of SQL and triggers rather
than binary log shipping as well as the fact that its design causes
communication costs to grow quadratically).

However, PostgreSQL's query planner is a lot better than MySQL's (especially
if you're doing something like subqueries) and I find that complex queries run
decently faster. The community process is also a lot more attractive.

To address the Oracle question:

Oracle cannot withdraw InnoDB. It's GPL licensed. Anyone can fork it should
Oracle decide they don't want to play ball - just as anyone can fork MySQL. In
fact, there are already forks underway including Drizzle. So, that isn't a big
deal. The bigger deal would be if Oracle decided to halt future development.
I'm guessing they won't since it would simply mean they would loose control to
the community that still has rights to the GPL'd code.

In the end, don't worry about this issue. It distracts from what is really
important: actually creating something. They both work fine. There are plenty
of abstractions that will allow you to create code that will work with either
with no modifications. Do that, build your application, and let the pundits
from either side debate this issue until they're blue in the face while you're
actually creating useful things.

~~~
jasonkester
Your job as a developer is to pick a technology and build your application.
Building/implementing an entire extra layer onto your project just so you
don't have to make a choice is a bad move. The technical term, I believe, is
Yak Shaving.

~~~
silentbicycle
Postponing the choice until you have enough data to make an informed decision
is a good idea, though, and trying to isolate direct interaction with the
database to a relatively small number of code paths would be worth doing
regardless. If your code is so entangled with the database code that you
_can't_ separate them easily, you have much more immediate problems than
theoretically having to coping with scalability someday (if you're lucky).

~~~
jasonkester
Out of curiosity, how many times in your career have you decided to switch
databases halfway into a project?

In 15 years of doing this for a living, I've never actually needed to do so.
With that in mind, I don't feel particularly worried that my SQL is not
instantly portable.

~~~
silentbicycle
I've moved from sqlite to serialization via Lua on one, when clarifying the
problem made it clear that using a relational database was actually a poor
fit. Most of my examples aren't database-related, though. I agree that
switching databases doesn't happen much, I'm just noting that having a major,
conceptually distinct part of a project entangled with the others is usually a
sign of other problems. (Maybe that's just my experience maintaining legacy
codebases, though.)

I think that getting preoccupied with moving from one database to another for
scalability reasons is usually wishful thinking -- it's like worrying about
all the positive attention you're getting.

------
aditya
MySQL

Pros:

* Easier to replicate (and hence to scale for some people)

* Easier to find DBA's for

* More popular (twitter, sixapart, etc. use them)

Cons:

* MyISAM is slow as hell

* InnoDB is better but may have licensing issues

* Oracle owns them now, unclear whether it is for better or for worse

PgSQL:

* Better performance of DB engine

* Excellent query planner and analyzer

* Rock-solid in most aspects, with the core team dedicated to putting out a good product

Cons:

* Lack of a canonical, good, production-ready replication method (current choices: Slony, pgPool, Continuent, Londiste)

* Hard to find good DBA's

* Less popular to some extent

There you go. Maybe it boils down to performance (pgsql) vs replication
(mysql), but that's just from what I know...

~~~
brianm
Was going to post a response to this, but you nailed it.

Anecdotally, we moved off postgres because we could not hire (full time,
contact, or consult) _anyone_ who would be considered an expert at postgres
replication, and the cost/benefit of switching off compared to growing such an
expert favored switching off.

Postgres is my go-to database where a regular backup, rather than warm/live
replica, meets availability needs.

------
apinstein
I've been using postgres in production situations since 1998, at 4-5 different
companies. I've _never_ had a problem with it.

OTOH, the many times I've tried to use mysql I've run into gotchas even in
development that were unexpected and obnoxious. Also, I don't really like the
dual gpl/commercial license thing.

My faves about Postgres:

\- Very mature OSS development community with amazing devs/community support

\- Ridiculously stable

\- BSD license

\- I practically never run into a postgres bug.

\- GIS support

\- ltree data type (native trees with powerful query syntax)

\- Clustered indexes

\- Very fast under load

\- Simple to manage

\- Lots of tuning options

To sum up, I've used it forever and it's never caused me a problem. I am a
postgres fanboy.

------
artificer
Since one can easily find comparisons, benchmarks, gotchas etc ( you can
always start here: <http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL> ) the one
thing that made me decide in favor of PostgreSQL is this (quoting a blogger):

MySQL is an open-source PRODUCT (and can be bought as such). Postgres is an
open-source PROJECT (and can't be bought, and will be around as long as there
is a sizable developer pool).

<http://www.jiaozhoujob.com/newsprint-995.html>

EDIT: PostgreSQL also seems to take support very seriously. Their latest
security advisory has a patch back to version 7.4, which was released in
November 2003.

------
jcoby
PostgreSQL gets a lot of flack for not having replication. It's been my
experience that it just doesn't matter in practice. Let me explain.

MySQL's performance sucks. It's great lightly loaded and with little
concurrency. Add load and it noses over. (I've read that this is mostly
because of the malloc they use. Regardless of the cause, it's a problem.)

I tested MySQL 5 vs PG 8.3 to handle session storage. This is about as simple
of a test as possible. PostgreSQL outperformed MySQL by a significant margin
across every load scenario, sometimes up to 2x the capabilities. PG's
performance flatlined with load. MySQL flatlined and then hit a wall and
dropped significantly as load and concurrency rose.

So the first thing someone does when MySQL noses over is to buy faster
hardware. But that doesn't really help as much as it should. So they add
replication to be able to distribute load across multiple servers. That's all
fine and dandy until you realize that for every rep node you add, you are
adding overhead for writes. And the headaches of keeping MySQL's brain dead
replication in sync. Eventually you spend all of your time waiting for
replication to finish or rebuilding a failed sync or chasing down ghosts in
the system.

Back to postgres. Its performance scales nearly linearly with load and number
of cores. Adding more hardware resources actually improves performance
accordingly. So you end up being able to stay on one machine longer.

If you're truly going to have more load than one server can handle, break the
data set up into smaller pieces (sharding).

Postgres is an awesome database. It behaves consistently and correctly. It has
a lot of polish when you get into it. psql is a pleasure to use. It supports
more of the SQL standard than MySQL. It's query analyzing tools are great. The
only real pain point is that you have to do a dump/load to upgrade point
releases (8.1 -> 8.2).

With all of that said, MySQL is great for smaller sites. It's available
everywhere and every framework and language supports it. There are a million
people who can administer it. Postgres is a lot harder to find competent DBAs.

PostgreSQL has been improving significantly throughout the 8.x series.
Autovacuum has made life easier for everyone. Full text search is now built
in. It now supports recursive queries and a bunch more.

------
silentbicycle
While this is somewhat more subjective, documentation and tutorials I've read
for PostgreSQL consistently seem better informed about relational databases
than MySQL's. I keep seeing cases where somebody is having performance issues
with MySQL, and it turns out they neglected to do something basic like
indexing their tables. This is probably _partially_ due to its reputation of
being good for the web, and thus getting a used by a lot of people who kinda-
sorta know PHP but know nothing about the relational paradigm, but MySQL also
has a history of ignoring fundamental relational concepts
([http://www.reddit.com/r/programming/comments/764fp/mysql_vs_...](http://www.reddit.com/r/programming/comments/764fp/mysql_vs_postgresql/c05sayb)),
so it's not a coincidence.

Also, it irritates me when people write off relational databases because of a
bad experience naively using MySQL, but that's neither here nor there.

You should probably program for a neutral wrapper and do _actual testing_ to
determine which will work better for you, though. I like PostgreSQL, and have
had very good results with it, but I've never come close to needing the sort
of dramatic scalability that people seem to worry so much about.

~~~
olefoo
> I've never come close to needing the sort of dramatic scalability that
> people seem to worry so much about.

s/worry/fantasize/

~~~
silentbicycle
I actually went back and forth on the wording, there. I decided "fantasize"
sounded a bit too harsh -- some people are probably just underestimating how
much one RDBMS can handle, due to a combination of buzz about scalability
issues and bad experiences running databases without indexes, normalized
tables, etc.

~~~
olefoo
Well, a lot of people here think that their web-delivered software as a
service offering is going to be an instant mega-success and they will need to
scale up to meet 100k requests per second in the first day.

The truth is a much more pedestrian than that. For one thing, outages can be
good publicity; I'm willing to bet many more people heard about twitter
because of people bitching about it being than any other reason. For another,
you should never look further than one order of magnitude ahead when scaling a
web app because the load profile changes too unpredictably as you climb the
curve; you can't tell what 100k visits/hr looks like from 1k visits/hr but at
1k v/h you probably have a good idea of what will kill you at 10k.

------
cperciva
If you asked me a year ago, I would have said "use whichever you're most
familiar with". Right now, given the unclear situation with MySQL development
(will the real MySQL please stand up?) I'd say to go with postgresql.

------
astrodust
MySQL has the advantage of being better supported by the developer community,
but otherwise of being a technically inferior product to PgSQL.

If anything, the trouble with PgSQL is it's too feature rich to be easily
understood by the typical developer. It's more DBA-grade than MySQL.

If you're building a real database driven application, though, having an
engine like PgSQL is a real asset since it gives you a lot of capability
within the database itself that MySQL does not easily provide.

That being said, MySQL is a fantastic engine for some applications that use it
as a fancy file system. For example, message boards, blogs and chat rooms are
often driven by MySQL because it's nearly foolproof to install and is mostly
self-maintaining.

MyISAM is actually very fast database engine, but that comes at the price of
forsaking transactions, foreign keys, and overall data integrity. For some
applications this doesn't really matter and "good enough" will do.

While you can outgrow MySQL's feature set, it will take a long time to exceed
the capabilities of Postgres. Most likely by the time you do, you can afford
to hire a DBA so it won't be your problem.

------
amalcon
There used to be a tremendous gap between these two databases. MySQL had an
incredible dearth of features (subselects, triggers, TRANSACTIONS, and others
were missing). PostgreSQL had the features, but it was relatively unpopular,
so the language bindings were generally immature and not well documented. Then
there was Interbase/Firebird, which was even less popular but was supposed to
have an even better feature set (I never used it).

These days, both have come a long way. MySQL has most of the necessary
features, and PostgreSQL has perfectly good bindings in most popular
languages. It doesn't matter so much which you use any more. The reputations
remain, however, that MySQL is lacking critical features and that PostgreSQL
is black magic.

The legal issues around PostgreSQL are probably a little more stable at the
moment, given the Oracle question (it now owns the trademarks). If Oracle
starts causing any serious problems for MySQL, you can bet "YourSQL" will
appear very quickly (sort of like how Iceweasel did). I wouldn't worry too
much about it. Pick one, and start coding.

------
sanswork
I use MySql for all my projects these days. In the past I've used both
Sybase(when working at one of the big investment banks) and Informix(when
working at one of the worlds leading customer tracking systems) and postgres
for a number of small person projects.

My current project is a logging/reporting system which has between 50-75
million new rows per day amongst 2-3 tables in MySql with very few problems on
a single Amazon EC2 instance(I've had to switch to partitioned tables for
reporting recently though). My last employer did a tracking system with over 1
billion impressions a day on MySql with no problems(Multiple servers)
Including near realtime reporting(delayed by about an hour).

Thats my plus side for MySql. I really have no negatives for Postgres but I
haven't used it much so I can't honestly report from either side. Others I am
sure will be able to add more from that side of the argument.

------
bryanalves
My opinion/sense is that Postgres is better in every way, except replication.
Which ironically makes it far worse than mysql since you can't easily use it
for huge datasets.

~~~
olefoo
There are several replication solutions for PostgreSQL but they all require a
level of sophistication and configuration from the administrator. One of the
biggest issues I have seen in the field with setting up replication for
postgres is that people design databases without primary keys and with
duplicate rows. A DBA who understands the relational model wouldn't do that in
the first place; and none of the common solutions for replicating a postgres
database will let you copy a database with broken referential integrity.

It's a feature.

~~~
silentbicycle
No kidding. Why are people so surprised that using an RDBMS well requires an
understanding of databases? Database administration requires a very different
skill set than programming, and DBAs have different priorities (e.g., Do NOT
corrupt the data. _Ever._ ); they're not just there to inconvenience
programmers.

It seems like almost every time I've read a review of a database book, there's
been somebody complaining that it's "full of a bunch of boring theory" (or
else, "at least this book doesn't have much boring theory"). At the same time,
some programmers nonchalantly do things that could land them on a DBA's
equivalent of The Daily WTF (e.g. [http://ask.metafilter.com/117908/Theres-
got-to-be-a-faster-w...](http://ask.metafilter.com/117908/Theres-got-to-be-a-
faster-way-to-update)). It's a weird disconnect.

~~~
olefoo
Well, there's a couple of issues out there; one being that most of the DBA's
I've met were coders before they specialized, so on average most DBA's are
more experienced. Another is that a shocking number of people in the industry
don't understand and don't want to understand the basic concepts underlying
the field. There is an unfortunately large class of developers that seem to
want a pile of Lego Bricks that they can assemble into a semblance of
functionality and who don't want to "bother" with understanding the
fundamentals.

/me apologizes for the rant.

------
bradgessler
If you're a Rails Developer; MySQL does NOT support transactional migrations,
which means if something screws up during a deployment your database might be
left in an inconsistent state.

If a migration screws up and you're running Postgres, the schema (and data)
will just roll-back.

------
madkangas
This question is 11 hrs old, and nobody has mentioned Postgres' need for
periodic "VACUUM" operations? Seriously?

If you are running a system with significant data churn (think: a Reuters news
feed expiring at N days), Postgres is at a massive disadvantage if you're
concerned with system throughput. Postgres 8.4's rewritten Free Space Map
looks promising, but 8.4 was just announced in beta on 4/15/2009. In 8.3 or
earlier, you can expect significant latency if "VACUUM" is going to touch any
significant percentage of your table size.

Given sufficient data churn with a Postgres system in a processing pipeline,
we can be talking about an hour or more for the "VACUUM" operation per
evening. Yes, I speak from experience. The MySQL (5.0.45 InnoDB) systems
downstream from these Postgres DBs have zero measurable latency due to space-
recovery operations.

------
modoc
Another vote for Postgres. MyISAM is a joke for serious use, and InnoDB isn't
better/faster than postgres in my experience.

Postgres has felt much more mature and more focused on delivering things you'd
expect in a real database (FKs, etc..) for a longer time.

------
jasonkester
I've never used PostgreSQL, but have spent my fair share of time programming
against MySQL. It never fails to amaze me how far behind the curve MySQL is,
that it still has table corrupting bugs happening frequently in the wild, and
how unconcerned its developer pool is about this state.

So from that background, if I were forced to choose between the two for a new
project, I'd go with Postgre. I know nothing about it at all, except that it
seems to have a good reputation, and it's not MySQL.

Disclaimer: I spend the bulk of my time in SQL Server, and I have trouble
understanding why anybody would use anything else out of choice.

------
Zarathu
These days, it doesn't quite matter. Especially if you're using something like
Rails, which can switch between the two very easily. Just modify your config
if you later feel you want out.

------
alanthonyc
I'm glad you brought this up because I'm struggling with the same question.
Since I'm developing in Django, I am currently abstracting out the db layer
while I do the programming. However, I will eventually want to make a
selection for when I launch the app and have been trying to decide between
these two.

I'm currently leaning towards PostGres but am interested in hearing the pros
and cons of both. The recent purchase of Sun by Oracle makes things a little
more interesting.

~~~
lacker
For Django work I prefer postgres just because more Django projects use
postgres, so when you have issues it's more likely that someone else on a
message board has faced exactly the same thing. My projects haven't come near
the limit of a single server, though.

