Hacker News new | comments | show | ask | jobs | submit login

Why do you use pgsql over mysql?

I've used mysql a lot, and pgsql a little. I can't tell the difference, other than pgsql being slower and having less support. Some people swear by it, so I'm curious what I am missing.

When I first started doing serious DB work (back in the PostgreSQL 7.x/MySQL 3.x days), there were some show-stopper flaws in MySQL that made it a non-starter for my scenario. (In particular, you couldn't self-join a table, and "February 31" was treated as a valid date.)

In performance terms, granted, pg was a bit of a dog in those days, but it outperformed MySQL in every benchmark I could throw at it, with minimal tuning. So, if you're experiencing it as less performant, you're most likely running with a default configuration — which is deliberately tuned for something like a 486-class box with 128MB RAM or so — and/or haven't ANALYZEd your data. (Those are complete WAGs, knowing nothing more than what you've said about your scenario, but they tend to be among the more common reasons for lackluster performance.)

As for support, as a complete noob, I had a weird performance problem I couldn't make sense of, so I went to the mailing lists. Within a few hours, I was exchanging stack traces and other sundry debugging/profiling dumps with Tom Lane. (He's in the Wikipedia. Even if you've never touched PostgreSQL, you use his code every day of your life.) I don't think you can get much better support than that. Since, I've never encountered a problem that I haven't been able to have addressed, or at least get pointed in the right direction, by asking on — or searching the archives of — the relevant mailing list.

From there, it was largely a matter of, "This is the one I already know how to use...", along with the better feature-set (not mentioned by any of the sibling posts thus far: transactional DDL); the lack of a known-evil corporate overlord who could pull the plug at any time; the consistent tens-of-percent performance improvements in every major release; a development community that will punt a feature to the next release if it's not 100% ready and provably correct; and, let's be honest, the fact that, as someone who's been doing pg work for as long as I have, I can command a very comfortable hourly rate on the basis of that depth of experience — particularly when it's been focused in high availability and replication.

(EDIT: proofreading.)

I'll second this. For the longest time mysql was the quickest way on the web to lose your data. As in corrupt databases.

It also didn't support transactions where postgres did. Essentially mysql was a dumb datastore with sql interface, whereas postgres was a database.

That's changed now, but postgres is still a head in reliability(shit just doesn't break) and feature-set. and I hear in speed these days. But I no longer care about speed these days, as an SSD backed postgres handles anything I can possibly throw at it.

Be very, very careful using SSDs under your DB, whether PostgreSQL or anything else. If you don't have supercaps on your drives, you will lose data in a power loss situation, even with a battery-backed RAID controller. That data loss could take the form of anything from silent corruption of a table or index, to unrecoverable filesystem loss. (Briefly, the drive's controller uses the on-board cache to accumulate writes into erase-block sized chunks before flushing to the NAND media. If you disable the on-board cache, performance drops through the floor — USB thumb drives look fast by comparison — and you shave at least an order of magnitude off the drive's lifetime.)

At present, it looks like the best choice is the forthcoming Intel 710 series drives, but if you need an SSD now, their 320 series, the Sandforce controller based drives with supercaps (like the OCZ Vertex Pro models, though I wouldn't touch those with a competitor's database), or a FusionIO card are the only remotely safe options.

This is why I will be able to go back tonight and honestly say that I didn't waste time on Hacker News in the morning. :)


Except if you want to do a nice pagination interface over a set of data. select count() in postgresql takes forever on large tables. select count() over myisam is speedy. select count(*) over innodb is speedy but an estimate (I'd still prefer an fast estimate over exceedingly slow accuracy for many scenarios, such as paginated web interfaces).

That's because of PostgreSQL's MVCC (Multi-Version Concurrency Control) architecture. And, yes, while some aggregate queries can tend to suck on very large tables (for which there are a good half-dozen workarounds; this is the FAQ-est of FAQs), the upside of MVCC is that read queries can never block write queries, and write queries can never block read queries.

The benefit that has in terms of increased concurrency is worth far, far more than having to implement a workaround for quick-and-dirty row counts for simple things like pagination, IMO.

(Aside: InnoDB is also MVCC-based, which is why its COUNT(*) is an estimate. The MySQL folks apparently decided that it was better to provide an estimate than an exact count, while the PostgreSQL folks decided the other way. There's a part of me that wants to call that symbolic of the way the two projects operate on a much broader level...)

I personally picked PostgreSQL instead of MySQL since I don't trust Oracle. Why support a free database server when it competes with Oracle RDBMS? Granted Oracle RDBMS is extremely expensive, but still, something doesn't sit well with me...

I felt the same mistrust for Oracle, but at Railsconf, I had the chance to talk with the team from Percona, which is a free fork of MySQL that makes some performance improvements and sells consulting (think Red Hat).

The Percona team said that Oracle has a lot of customers who are using Oracle RDBMS for big stuff and MySQL for smaller stuff, and they like being able to sell support for both, but they don't see MySQL as competing with their flagship DB. He also said that Oracle has so far kept their promise to keep developing MySQL and that Percona has pulled their upstream changes.

Besides that, if Oracle decides to stop supporting MySQL, Percona is a drop-in replacement, as is MariaDB, developed by some of the original MySQL developers. So you won't get stuck.

All of that said, however, I'm interested in PostgreSQL just because I've heard that it's a well-made database. There's nothing wrong with switching for other reasons; I just don't think that uncertainty about MySQL's future is a good reason right now.

Thanks for the clarification billybob. :) Part of my perception was driven by the unknown; what would happen if Oracle stopped putting money into it? It's nice to hear that Percona appears to be a potential candidate to continue its development if anything bad ever happened to it on Oracle's end.

This is a good point, but only became the case fairly recently. I might lean towards pgsql in the future based on that, actually.

MySQL is licensed under the GPL, and that isn't going to change. Oracle can't kill MySQL, nor can they force you to pay for it. I think your fears are unfounded.

• EXPLAIN output is awesome (once you understand the format, you'll know exact algorithm postgres uses for the query with bottlenecks highlighted)

• Subqueries are optimised as well as JOINs. I can go all-Inception in my queries and they perform well (I find subquery style often easier to understand than equivalent JOIN).

• You can do UPDATE … SELECT on the same table.

• Postgres has query rewrite (RULE) that can be used to implement writeable VIEWs (which is awesome for migrating legacy applications to new schema)

The only thing I really miss from MySQL is ON DUPLICATE KEY UPDATE. Postgres has only weak 1-row hacks emulating this, and "standard" MERGE syntax for this is horribly ugly.

I'm a longtime Postgres guy, but I've used MySQL on a few sizable projects. MySQL has come a long way over the years, such that a lot of my original criticisms are no longer valid. My reasons for not using it this day are mainly lack of transactional DDL and the fact that adding indices require a full table lock. There are some interesting solutions to the latter problem that the Percona guys have pulled together, but Postgres handles it all out of the box. Things like partial and expression indices are icing on the cake.

My impression (having used mysql at a number of jobs over the years, and talked several very sharp pgsql fans) was that pgsql used to have a big lead in cool features that mysql didn't have. That lead has been eroding, but I don't think mysql is 100% caught up, though I could be wrong. In any case, even when mysql had implemented the various cool features, pgsql's implementations naturally tended to be more mature and stable for a while.

Disclaimer: I don't think of myself as a database guy, and I couldn't give you a definitive list of the differences. I remember some times in the past when things like transactions, triggers, and stored procedures were on the list of differences the pgsql fans quoted to me.

I'm no longer a database "poweruser" due to changing requirements in my job, but not too long ago I used to be and MySQL had no CHECK constraints. Also triggers are less flexible than in Postgre, like in a cascading delete/update won't fire them.

Historically, PostgreSQL has been considerably more SQL-feature-rich than MySQL. It still is, yet to a lower extent (see previous paragraph) due to MySQL caching up. For example, MySQL only has views, triggers and stored procedures since version 5, and Postgre had them since at least 7 or 8 which were already mature at the time MySQL 5 was still development. I guess a lot of people became adepts at that time.

Atomic transactions, perhaps at the expense of pgsql being slower.

Subqueries, but we don't require this in production.

Wariness of Oracle's conflict of interest.

Mysql has had subqueries since version 4.1, circa 2004!

But the optimiser is really bad at doing sensible things with them.

Worse still, subqueries in the FROM clause are (documented to be) implemented as an unindexed temp table.

Google apparently compiles subquery support out of their mysql instances so people don't mistakenly think they're usable.

First off - whatever you're doing, you'll probably be fine using either.

That said, I'm generally more frustrated when using MySQL than when using PG. Here's a sample of the problems I've encountered from using both MySQL and PG. I haven't updated my list in a while now - please feel free to correct me on things - but hopefully it's a little more illustrative than that Wikipedia feature matrix, and a little more specific to MySQL vs. PG. (This list is a cleaned-up selection from my notes wiki at http://yz.mit.edu/notes/Hackery.)

No referential integrity.

No constraints (CHECK).

No sort merge join, let alone hash-join. http://www.dbms2.com/2008/07/10/how-is-mysqls-join-performan..., http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-cou...

Generally poor at analytical workloads, since it's designed for transactional workloads.

Can't reopen TEMP table - WTF? (Still not fixed!) http://bugs.mysql.com/bug.php?id=10327

Multiple storage engines has always restricted progress: http://www.mysqlperformanceblog.com/2010/05/08/the-doom-of-m... (PG also supported multiple storage engines in 80s, then concentrated on one)

No WITH clause: http://stackoverflow.com/questions/324935/mysql-with-clause

Crappy errors: “Incorrect key file for table ‘stock’; try to repair it” on “alter table stock add constraint pk_stock primary key (s_w_id, s_i_id);” where stock is in InnoDB (which has no “repair table”) means I have no /tmp space (no Google answers)

Crappy EXPLAIN output - somewhat better when using the visual-explain tool from Percona.

InnoDB auto-extends ibdata1 file; only way to trim (garbage collect) is dumping and loading.

Scoping is broken:

  mysql> create table t(a int, b int); Query OK, 0 rows affected (3.30 sec)
  mysql> select a, (select count(*) from (select b from t where a = u.a group by b) v) from t u;
  ERROR 1054 (42S22): Unknown column ‘u.a’ in ‘where clause’
Optimizer leaves plenty to be desired, e.g. not pruning unnecessary joins.

“InnoDB is still broken…Just last week we had to drop/re-create an InnoDB-table in one project because it would not allow to add an index anymore, no matter what we tried…Mysql::Error: Incorrect key file for table 'foo'; try to repair it: CREATE INDEX [...]” http://news.ycombinator.com/item?id=2176062

MySQL only recently got such things as per-statement triggers and procedural language support.

MySQL has only its own internal auth system, whereas PG supports a wide array of auth providers.

PG has more supple ALTER TABLE implementation.

MySQL doesn’t support ASC/DESC clauses for indexes http://explainextended.com/2010/11/02/mixed-ascdesc-sorting-...

Optimizer only recently started working properly with certain subqueries

OK documentation, but still considerably unpolished compared to PG's. Random omission: auto_increment jumps up to next power of 2 but inconsistently across versions (platforms?).

(Older issue, not sure if it's still relevant) Crappy concurrency, >3 cores sucks vs PG: http://spyced.blogspot.com/2006/12/benchmark-postgresql-beat...

These days, Postgres is faster than MySQL+InnoDB, and scales much better across multiple CPU cores. (MyISAM is still faster, but that's not an appropriate comparison.)

Some features that make Postgres awesome:

* Transactional DDL. You can do "create table" in a transaction. _Everything_ is transactional, it's not a tacked-on feature, it's the basis of everything.

* No legacy cruft. Compared to MySQL, which is filled to the brim with historical warts. The Postgres people have been careful to weed out obsolete functionality. There are essentially no sneaky border cases that a developer needs to be aware of, no weird special cases like "0000-00-00 00:00" having special meaning.

* No need for a "strict" mode, since Postgres is always strict. Postgres doesn't allow invalid dates, doesn't allow byte sequences that violate character encodings, etc. It diligently enforces contraints and generally doesn't allow you to screw up. To Postgres, data integrity is paramount.

* PostGIS. Simply awesome. (MySQL's geospatial stuff also tries to implement the OGC API, but last I looked, it was a half-hearted attempt that negelcted to provide the fast R-tree-based (actually GiST-based) indexing that makes PostGIS so super fast.)

* Replication. It's late to the party, but I rather prefer how Postgres has implemented its replication, even though it has some downsides where it will abort a long-running query if some data has changed under its feet (but if you're using transactions it's easy to simply restart the query). 9.1 will be getting synchronous replication, which is pretty cool.

* Extensions. Postgres can integrate languages like R and Ruby as first-class languages that can be called from SQL. It also has a module system that can extend the type system (a bit of trivia: This was originally the main reason why Michael Stonebraker invented Postgres) with new types, eg. multidimensional matrix columns, or new features, like remote tables.

* The "text" type. Seriously, why should do people keep writing things like varchar(255)? Postgres' text type is an unlimited string. Unlike MySQL's text type, it can be efficiently indexed without limitations. (Varchar is internally implemented as a bounded text type.)

* Cost-based planner backed by row-level statistics. This is the stuff that allows Postgres to do complex nested queries and still perform incredibly well.

* Partial indexes. You can do something like "create index ... on themes (name) where color = 'blue'". Whenever you do a query that falls within the expressions's range, Postgres will use that index, potentially vastly reducing the search space.

* Functional indexes. You can do something like "create index ... on (lower(name))". If you then do a query such as "select ... where lower(name) = 'xyz'", then Postgres will recognize that it's the same expression, and it will be able to use the index.

* Windowing functions and recursive queries, both from ANSI SQL99 iirc. Look this up, they're great.

There are some bad points, none of them significant and all of them a matter of taste:

- I have never really liked Postgres' text indexing, which feels a bit creaky and antique. At least with 8.x, GIN index updating was slow as hell.

- Partitioned tables are a great feature, but I will never use it because of the requirement that one does the plumbing yourself (creating partitioning rules and so on); I keep waiting for something like Oracles's automatic partitioning.

- Stored procedures -- ie., running logic inside the database -- feels wrong to me, and always has. For some people this is a requirement, so I'm not really complaining. In some cases, writing a stored procedure can be essential to speed up queries/operations by saving on database roundtrips.

- Still no "upsert" SQL command (aka "insert or replace", "insert or update") for asserting the existence of a row atomically.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact