Ask HN: When to use MySQL vs PgSQL?
36 points by Bjoern on April 24, 2009 | hide | past | favorite | 42 comments
Dear HN,

First of all, thank you for reading this. Out of caution, let me say up front that I don't want this question to start a "flame war". Arguments backed by facts would be genuinely constructive.

I am sincerely interested in whether my current assumptions are true, or where I am wrong.

Background:

My current employer asked me about MySQL and I mentioned the points below, which were later countered with this:

http://pastie.org/456941

Large parts of this gave me a classic "WTF" moment, so I thought I would ask you about it.

Detailed arguments against MySQL from my side:

A long time ago I saw a nice PDF with some benchmark measurements comparing PgSQL and MySQL: http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf ("FreeBSD 7.0 and Beyond"). Starting on page 9 there is a case study of MySQL vs. PgSQL.

This was the first thing that made me really take a closer look at MySQL.

Secondly, I know that MySQL uses two different internal storage engines:

X/Open XA distributed transaction processing (DTP) support, including two-phase commit, using Oracle's InnoDB engine.

Full-text indexing and searching using the MyISAM engine.

Should Oracle at some point decide to revoke the open source community's usage rights to InnoDB, users of MySQL would need to switch to MyISAM, which is, frankly, terribly slow.

http://en.wikipedia.org/wiki/MySQL#Criticism (see the point about the licensing renewal that is periodically necessary). Now that Oracle is even planning to buy Sun, I have to think hard about why they would renew the InnoDB license for MySQL at all. (It creates competition for them - why would they continue the "multi-year" extension of their licensing agreement?)

The next reason is that I have no theoretical, but empirical, proof (I did not enjoy the debugging) that I end up with table data corruption more often when I run MySQL. OK, I can repair it, but why should that take my time away from other, more productive things? (It is even mentioned in the Criticism section of the Wikipedia link above.)

Also, critical bugs get fixed slowly. For example, it took them from 2003 to 2008 to fix this one: http://bugs.mysql.com/bug.php?id=989

Dear HN, please help me with this question. Again, thank you for your time.




Use a good abstraction library and start coding. Seriously.

PostgreSQL tends to be what is favored by people here. I used to use it for everything until I tried to set up a replicated cluster with it. PostgreSQL's replication is severely lacking. Slony-I, the most mature of the bunch, is very difficult to set up (with many steps needed for every single table) and a lot slower than MySQL's replication (due to its use of SQL and triggers rather than binary log shipping as well as the fact that its design causes communication costs to grow quadratically).

However, PostgreSQL's query planner is a lot better than MySQL's (especially if you're doing something like subqueries) and I find that complex queries run decently faster. The community process is also a lot more attractive.

To address the Oracle question:

Oracle cannot withdraw InnoDB. It's GPL licensed. Anyone can fork it should Oracle decide they don't want to play ball - just as anyone can fork MySQL. In fact, there are already forks underway, including Drizzle. So that isn't a big deal. The bigger deal would be if Oracle decided to halt future development. I'm guessing they won't, since it would simply mean they would lose control to the community, which still has rights to the GPL'd code.

In the end, don't worry about this issue. It distracts from what is really important: actually creating something. They both work fine. There are plenty of abstractions that will allow you to create code that will work with either with no modifications. Do that, build your application, and let the pundits from either side debate this issue until they're blue in the face while you're actually creating useful things.


Your job as a developer is to pick a technology and build your application. Building/implementing an entire extra layer onto your project just so you don't have to make a choice is a bad move. The technical term, I believe, is Yak Shaving.


You shouldn't have to build that layer yourself. I'm only familiar with the Python tools like Django and SQLAlchemy that abstract out which database you use, but I know Rails works similarly, and I'm pretty sure there are existing solutions for other languages.
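
To make that concrete, here is a minimal sketch (not from this thread) of the kind of indirection those tools provide, written against Python's generic DB-API with the stdlib sqlite3 module as a stand-in backend. The `DB_BACKEND` switch and `get_connection` helper are made-up names for illustration:

```python
import sqlite3

# Hypothetical switch: with Django or SQLAlchemy this lives in a
# settings file or an engine URL, not in application code.
DB_BACKEND = "sqlite"

def get_connection():
    """Return a DB-API 2.0 connection for the configured backend."""
    if DB_BACKEND == "sqlite":
        return sqlite3.connect(":memory:")
    # elif DB_BACKEND == "postgres":
    #     return psycopg2.connect("dbname=app")  # same DB-API interface
    # elif DB_BACKEND == "mysql":
    #     return MySQLdb.connect(db="app")
    raise ValueError("unknown backend: %s" % DB_BACKEND)

# Application code only ever sees the generic DB-API interface.
conn = get_connection()
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.commit()
cur.execute("SELECT name FROM users")
print(cur.fetchall())  # -> [('alice',)]
```

In practice even the parameter style differs between drivers (`?` vs `%s`), which is exactly the sort of detail SQLAlchemy and the Django ORM paper over for you.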


Postponing the choice until you have enough data to make an informed decision is a good idea, though, and trying to isolate direct interaction with the database to a relatively small number of code paths would be worth doing regardless. If your code is so entangled with the database code that you can't separate them easily, you have much more immediate problems than theoretically having to cope with scalability someday (if you're lucky).


Out of curiosity, how many times in your career have you decided to switch databases halfway into a project?

In 15 years of doing this for a living, I've never actually needed to do so. With that in mind, I don't feel particularly worried that my SQL is not instantly portable.


I've moved from sqlite to serialization via Lua on one, when clarifying the problem made it clear that using a relational database was actually a poor fit. Most of my examples aren't database-related, though. I agree that switching databases doesn't happen much, I'm just noting that having a major, conceptually distinct part of a project entangled with the others is usually a sign of other problems. (Maybe that's just my experience maintaining legacy codebases, though.)

I think that getting preoccupied with moving from one database to another for scalability reasons is usually wishful thinking -- it's like worrying about all the positive attention you're getting.


In theory, lots of things can cause scalability problems. I like to write a lot of my code in the database, and I would never be able to move to MySQL from Postgres, since MySQL just doesn't support half the stuff Postgres does. I might be able to migrate to SQL Server, DB2, or Oracle, but not the other way.


> Use a good abstraction library and start coding. Seriously.

I'm pretty skeptical of this approach. Cross platform development is a pain in the ass, whether we're talking databases or operating systems. I say pick the most powerful database you can and marry it. Use its features to the fullest and ignore portability. I suggest db2 or oracle.


If you are a startup, you are not even considering DB2 or Oracle; you consider exactly what the OP suggested. Further, as your business grows, so too will your needs. You may start with one backend in a particular configuration and evolve and/or migrate to a different configuration/platform.

abstraction is "the right thing to do" (tm).


Agreed. I love developing locally with a version controlled SQLite database and then deploying to *sql and having everything just work. Sometimes there are differences that need to be worked around (same as cross platform development), but starting with the assumption that your db might change is a good thing, in my opinion.


> if you are a startup you are not even considering db2 or oracle.

Why not? The free versions exceed PostgreSQL. If you succeed and face heavy load, then you just throw some money at the problem rather than blow hundreds of man-hours fiddling with the hacks people use to scale the free DBMSes.


You can just throw money at Postgres problems and make them go away. There are a number of very capable firms that do nothing but Postgresql support. And, they don't hold you captive w/ licensing. And if your needs are really that unique they can help you with building extensions to postgresql that support the capabilities you need.

See http://www.commandprompt.com/support/ for an example, or http://pgexperts.com/services.html


For long term maintainability, absolutely abstraction is the way to go. Any pros and cons listed now can help you decide, but what about a year from now? Two? Five?

This goes double if you are considering commercial products.

For various reasons we are forced to use $Vendor, but we are not a commercial shop and we always have the fear factor that $Vendor will price us out (they occasionally try it on by "forgetting" our academic discount). We hid all the DB interactions behind two layers (one to hide the relational nature of the database, and the other to hide the specific implementation). As a result, I am in a position to tell $Vendor "How much? Well, you can give us this price or we can move to Postgres - either is fine by me". And mean it. And they can tell.


MySQL

Pros:

* Easier to replicate (and hence to scale for some people)

* Easier to find DBAs for

* More popular (twitter, sixapart, etc. use them)

Cons:

* MyISAM is slow as hell

* InnoDB is better but may have licensing issues

* Oracle owns them now, unclear whether it is for better or for worse

PgSQL:

Pros:

* Better performance of DB engine

* Excellent query planner and analyzer

* Rock-solid in most aspects, with the core team dedicated to putting out a good product

Cons:

* Lack of a canonical, good, production-ready replication method (current choices: Slony, pgPool, Continuent, Londiste)

* Hard to find good DBAs

* Less popular to some extent

There you go. Maybe it boils down to performance (pgsql) vs replication (mysql), but that's just from what I know...


Was going to post a response to this, but you nailed it.

Anecdotally, we moved off Postgres because we could not hire (full-time, contract, or consulting) anyone who would be considered an expert at Postgres replication, and the cost/benefit of switching off, compared to growing such an expert, favored switching off.

Postgres is my go-to database where a regular backup, rather than warm/live replica, meets availability needs.


Since one can easily find comparisons, benchmarks, gotchas etc ( you can always start here: http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL ) the one thing that made me decide in favor of PostgreSQL is this (quoting a blogger):

MySQL is an open-source PRODUCT (and can be bought as such). Postgres is an open-source PROJECT (and can't be bought, and will be around as long as there is a sizable developer pool).

http://www.jiaozhoujob.com/newsprint-995.html

EDIT: PostgreSQL also seems to take support very seriously. Their latest security advisory has a patch back to version 7.4, which was released in November 2003.


I've been using postgres in production situations since 1998, at 4-5 different companies. I've never had a problem with it.

OTOH, the many times I've tried to use mysql I've run into gotchas even in development that were unexpected and obnoxious. Also, I don't really like the dual gpl/commercial license thing.

My faves about Postgres:

- Very mature OSS development community with amazing devs/community support

- Ridiculously stable

- BSD license

- I practically never run into a postgres bug.

- GIS support

- ltree data type (native trees with powerful query syntax)

- Clustered indexes

- Very fast under load

- Simple to manage

- Lots of tuning options

To sum up, I've used it forever and it's never caused me a problem. I am a postgres fanboy.


While this is somewhat more subjective, the documentation and tutorials I've read for PostgreSQL consistently seem better informed about relational databases than MySQL's. I keep seeing cases where somebody is having performance issues with MySQL, and it turns out they neglected to do something basic like indexing their tables. This is probably partially due to its reputation of being good for the web, and thus getting used by a lot of people who kinda-sorta know PHP but know nothing about the relational paradigm, but MySQL also has a history of ignoring fundamental relational concepts (http://www.reddit.com/r/programming/comments/764fp/mysql_vs_...), so it's not a coincidence.

Also, it irritates me when people write off relational databases because of a bad experience naively using MySQL, but that's neither here nor there.

You should probably program for a neutral wrapper and do actual testing to determine which will work better for you, though. I like PostgreSQL, and have had very good results with it, but I've never come close to needing the sort of dramatic scalability that people seem to worry so much about.


> I've never come close to needing the sort of dramatic scalability that people seem to worry so much about.

s/worry/fantasize/


I actually went back and forth on the wording, there. I decided "fantasize" sounded a bit too harsh -- some people are probably just underestimating how much one RDBMS can handle, due to a combination of buzz about scalability issues and bad experiences running databases without indexes, normalized tables, etc.


Well, a lot of people here think that their web-delivered software as a service offering is going to be an instant mega-success and they will need to scale up to meet 100k requests per second in the first day.

The truth is much more pedestrian than that. For one thing, outages can be good publicity; I'm willing to bet many more people heard about Twitter because of people bitching about it being down than for any other reason. For another, you should never look further than one order of magnitude ahead when scaling a web app, because the load profile changes too unpredictably as you climb the curve; you can't tell what 100k visits/hr looks like from 1k visits/hr, but at 1k v/h you probably have a good idea of what will kill you at 10k.


If you asked me a year ago, I would have said "use whichever you're most familiar with". Right now, given the unclear situation with MySQL development (will the real MySQL please stand up?) I'd say to go with postgresql.


MySQL has the advantage of being better supported by the developer community, but is otherwise a technically inferior product to PgSQL.

If anything, the trouble with PgSQL is it's too feature rich to be easily understood by the typical developer. It's more DBA-grade than MySQL.

If you're building a real database driven application, though, having an engine like PgSQL is a real asset since it gives you a lot of capability within the database itself that MySQL does not easily provide.

That being said, MySQL is a fantastic engine for some applications that use it as a fancy file system. For example, message boards, blogs and chat rooms are often driven by MySQL because it's nearly foolproof to install and is mostly self-maintaining.

MyISAM is actually a very fast database engine, but that speed comes at the price of forsaking transactions, foreign keys, and overall data integrity. For some applications this doesn't really matter and "good enough" will do.

While you can outgrow MySQL's feature set, it will take a long time to exceed the capabilities of Postgres. Most likely by the time you do, you can afford to hire a DBA so it won't be your problem.


There used to be a tremendous gap between these two databases. MySQL had an incredible dearth of features (subselects, triggers, TRANSACTIONS, and others were missing). PostgreSQL had the features, but it was relatively unpopular, so the language bindings were generally immature and not well documented. Then there was Interbase/Firebird, which was even less popular but was supposed to have an even better feature set (I never used it).

These days, both have come a long way. MySQL has most of the necessary features, and PostgreSQL has perfectly good bindings in most popular languages. It doesn't matter so much which you use any more. The reputations remain, however, that MySQL is lacking critical features and that PostgreSQL is black magic.

The legal issues around PostgreSQL are probably a little more stable at the moment, given the Oracle question (it now owns the trademarks). If Oracle starts causing any serious problems for MySQL, you can bet "YourSQL" will appear very quickly (sort of like how Iceweasel did). I wouldn't worry too much about it. Pick one, and start coding.


I use MySQL for all my projects these days. In the past I've used both Sybase (when working at one of the big investment banks) and Informix (when working on one of the world's leading customer tracking systems), plus Postgres for a number of small personal projects.

My current project is a logging/reporting system which takes between 50-75 million new rows per day across 2-3 tables in MySQL, with very few problems, on a single Amazon EC2 instance (I've had to switch to partitioned tables for reporting recently, though). My last employer did a tracking system with over 1 billion impressions a day on MySQL with no problems (multiple servers), including near-realtime reporting (delayed by about an hour).

That's my plus side for MySQL. I really have no negatives for Postgres, but I haven't used it much, so I can't honestly report from either side. Others, I am sure, will be able to add more from that side of the argument.


My opinion/sense is that Postgres is better in every way except replication, which ironically makes it far worse than MySQL, since you can't easily use it for huge datasets.


Replication does nothing to enhance searches over large datasets, and in the case of master-slave replication, you don't even have a second server to query.

Not to mention that MySQL replication is known to be horribly fragile and to silently corrupt data -- which is why you need separate tools (like Maatkit) to know when it's time to rebuild the slave.


There are several replication solutions for PostgreSQL but they all require a level of sophistication and configuration from the administrator. One of the biggest issues I have seen in the field with setting up replication for postgres is that people design databases without primary keys and with duplicate rows. A DBA who understands the relational model wouldn't do that in the first place; and none of the common solutions for replicating a postgres database will let you copy a database with broken referential integrity.

It's a feature.


No kidding. Why are people so surprised that using an RDBMS well requires an understanding of databases? Database administration requires a very different skill set than programming, and DBAs have different priorities (e.g., Do NOT corrupt the data. Ever.); they're not just there to inconvenience programmers.

It seems like almost every time I've read a review of a database book, there's been somebody complaining that it's "full of a bunch of boring theory" (or else, "at least this book doesn't have much boring theory"). At the same time, some programmers nonchalantly do things that could land them on a DBA's equivalent of The Daily WTF (e.g. http://ask.metafilter.com/117908/Theres-got-to-be-a-faster-w...). It's a weird disconnect.


Well, there are a couple of issues out there; one being that most of the DBAs I've met were coders before they specialized, so on average DBAs are more experienced. Another is that a shocking number of people in the industry don't understand, and don't want to understand, the basic concepts underlying the field. There is an unfortunately large class of developers who seem to want a pile of Lego bricks they can assemble into a semblance of functionality, and who don't want to "bother" with understanding the fundamentals.

/me apologizes for the rant.


With the 3.0 release of londiste, that's no longer true as far as I can tell. Postgresql now has a great replication solution.


What are the issues you've seen with Slony-I and/or PGCluster?


Slony-I is a terrible piece of software; it's incredibly difficult to administer, requires full table locking across the entire data set (EVERY SINGLE TABLE!) to do DDL changes, and occasionally breaks in mysterious ways which can lead to data corruption.

I haven't used PGCluster, but it's synchronous multi-master and thus inherently fails to scale out as well as async systems like Slony-I and Londiste. Probably great in circumstances with only 2-3 db boxes though.

Londiste, on the other hand, is awesome, and with the 3.0 release now has hot backups. Simple to administer, rock solid, and has really important features like live DDLs. Demand Londiste and the SkyTools package! Don't settle for less!


Cool! Thanks for the info! I love Postgres but haven't done a lot of replication with it yet. I'll check out SkyTools. Thanks again.


PostgreSQL gets a lot of flak for not having replication. It's been my experience that it just doesn't matter in practice. Let me explain.

MySQL's performance sucks. It's great lightly loaded and with little concurrency. Add load and it noses over. (I've read that this is mostly because of the malloc they use. Regardless of the cause, it's a problem.)

I tested MySQL 5 vs PG 8.3 to handle session storage. This is about as simple of a test as possible. PostgreSQL outperformed MySQL by a significant margin across every load scenario, sometimes up to 2x the capabilities. PG's performance flatlined with load. MySQL flatlined and then hit a wall and dropped significantly as load and concurrency rose.

So the first thing someone does when MySQL noses over is to buy faster hardware. But that doesn't really help as much as it should. So they add replication to be able to distribute load across multiple servers. That's all fine and dandy until you realize that for every rep node you add, you are adding overhead for writes. And the headaches of keeping MySQL's brain dead replication in sync. Eventually you spend all of your time waiting for replication to finish or rebuilding a failed sync or chasing down ghosts in the system.

Back to postgres. Its performance scales nearly linearly with load and number of cores. Adding more hardware resources actually improves performance accordingly. So you end up being able to stay on one machine longer.

If you're truly going to have more load than one server can handle, break the data set up into smaller pieces (sharding).
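
A hedged sketch of what that sharding can look like at the application level (the shard list and routing function are made-up names; in a real deployment each entry would be a connection string for a separate Postgres server):

```python
import hashlib

# Made-up shard map; in practice each entry would be the DSN of a
# separate PostgreSQL server holding 1/N of the data set.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(key):
    """Route a key to a shard by hashing it, so reads and writes
    for the same key always land on the same server."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The routing is deterministic: user 42 always lives on one shard.
print(shard_for(42))
```

The catch, of course, is that cross-shard queries and re-sharding become your problem instead of the database's.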

Postgres is an awesome database. It behaves consistently and correctly. It has a lot of polish when you get into it. psql is a pleasure to use. It supports more of the SQL standard than MySQL. Its query-analysis tools are great. The only real pain point is that you have to do a dump/load to upgrade point releases (8.1 -> 8.2).

With all of that said, MySQL is great for smaller sites. It's available everywhere and every framework and language supports it. There are a million people who can administer it. Postgres is a lot harder to find competent DBAs.

PostgreSQL has been improving significantly throughout the 8.x series. Autovacuum has made life easier for everyone. Full text search is now built in. It now supports recursive queries and a bunch more.


If you're a Rails developer: MySQL does NOT support transactional migrations, which means that if something screws up during a deployment, your database might be left in an inconsistent state.

If a migration screws up and you're running Postgres, the schema (and data) will just roll-back.
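
The difference comes down to transactional DDL: Postgres can run schema changes inside a transaction, while MySQL implicitly commits when it sees a DDL statement. A tiny illustration of the rollback behavior, using the stdlib sqlite3 module (which also supports transactional DDL) as a stand-in; the table name is made up:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so we
# can manage the transaction boundaries explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
cur.execute("ROLLBACK")  # simulate a migration blowing up halfway through

# The CREATE TABLE was rolled back along with everything else.
cur.execute("SELECT name FROM sqlite_master WHERE name = 'comments'")
print(cur.fetchall())  # -> []
```

On MySQL the CREATE TABLE would have been committed the moment it ran, leaving the half-applied schema behind.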


This question is 11 hrs old, and nobody has mentioned Postgres' need for periodic "VACUUM" operations? Seriously?

If you are running a system with significant data churn (think: a Reuters news feed expiring at N days), Postgres is at a massive disadvantage if you're concerned with system throughput. Postgres 8.4's rewritten Free Space Map looks promising, but 8.4 was just announced in beta on 4/15/2009. In 8.3 or earlier, you can expect significant latency if "VACUUM" is going to touch any significant percentage of your table size.

Given sufficient data churn with a Postgres system in a processing pipeline, we can be talking about an hour or more for the "VACUUM" operation per evening. Yes, I speak from experience. The MySQL (5.0.45 InnoDB) systems downstream from these Postgres DBs have zero measurable latency due to space-recovery operations.


Another vote for Postgres. MyISAM is a joke for serious use, and InnoDB isn't better/faster than postgres in my experience.

Postgres has felt much more mature and more focused on delivering things you'd expect in a real database (FKs, etc..) for a longer time.


I've never used PostgreSQL, but have spent my fair share of time programming against MySQL. It never fails to amaze me how far behind the curve MySQL is, that it still has table corrupting bugs happening frequently in the wild, and how unconcerned its developer pool is about this state.

So from that background, if I were forced to choose between the two for a new project, I'd go with Postgres. I know nothing about it at all, except that it seems to have a good reputation, and it's not MySQL.

Disclaimer: I spend the bulk of my time in SQL Server, and I have trouble understanding why anybody would use anything else out of choice.


These days, it doesn't quite matter. Especially if you're using something like Rails, which can switch between the two very easily. Just modify your config if you later feel you want out.


I'm glad you brought this up because I'm struggling with the same question. Since I'm developing in Django, I am currently abstracting out the db layer while I do the programming. However, I will eventually want to make a selection for when I launch the app and have been trying to decide between these two.

I'm currently leaning towards Postgres but am interested in hearing the pros and cons of both. The recent purchase of Sun by Oracle makes things a little more interesting.


For Django work I prefer postgres just because more Django projects use postgres, so when you have issues it's more likely that someone else on a message board has faced exactly the same thing. My projects haven't come near the limit of a single server, though.



