Hacker Newsnew | comments | show | ask | jobs | submit login

When I first started doing serious DB work (back in the PostgreSQL 7.x/MySQL 3.x days), there were some show-stopper flaws in MySQL that made it a non-starter for my scenario. (In particular, you couldn't self-join a table, and "February 31" was treated as a valid date.)

In performance terms, granted, pg was a bit of a dog in those days, but it outperformed MySQL in every benchmark I could throw at it, with minimal tuning. So, if you're experiencing it as less performant, you're most likely running with a default configuration — which is deliberately tuned for something like a 486-class box with 128MB RAM or so — and/or haven't ANALYZEd your data. (Those are complete WAGs, knowing nothing more than what you've said about your scenario, but they tend to be among the more common reasons for lackluster performance.)

As for support, as a complete noob, I had a weird performance problem I couldn't make sense of, so I went to the mailing lists. Within a few hours, I was exchanging stack traces and other sundry debugging/profiling dumps with Tom Lane. (He's in the Wikipedia. Even if you've never touched PostgreSQL, you use his code every day of your life.) I don't think you can get much better support than that. Since, I've never encountered a problem that I haven't been able to have addressed, or at least get pointed in the right direction, by asking on — or searching the archives of — the relevant mailing list.

From there, it was largely a matter of, "This is the one I already know how to use...", along with the better feature-set (not mentioned by any of the sibling posts thus far: transactional DDL); the lack of a known-evil corporate overlord who could pull the plug at any time; the consistent tens-of-percent performance improvements in every major release; a development community that will punt a feature to the next release if it's not 100% ready and provably correct; and, let's be honest, the fact that, as someone who's been doing pg work for as long as I have, I can command a very comfortable hourly rate on the basis of that depth of experience — particularly when it's been focused in high availability and replication.

(EDIT: proofreading.)

I'll second this. For the longest time mysql was the quickest way on the web to lose your data. As in corrupt databases.

It also didn't support transactions where postgres did. Essentially mysql was a dumb datastore with sql interface, whereas postgres was a database.

That's changed now, but postgres is still a head in reliability(shit just doesn't break) and feature-set. and I hear in speed these days. But I no longer care about speed these days, as an SSD backed postgres handles anything I can possibly throw at it.


Be very, very careful using SSDs under your DB, whether PostgreSQL or anything else. If you don't have supercaps on your drives, you will lose data in a power loss situation, even with a battery-backed RAID controller. That data loss could take the form of anything from silent corruption of a table or index, to unrecoverable filesystem loss. (Briefly, the drive's controller uses the on-board cache to accumulate writes into erase-block sized chunks before flushing to the NAND media. If you disable the on-board cache, performance drops through the floor — USB thumb drives look fast by comparison — and you shave at least an order of magnitude off the drive's lifetime.)

At present, it looks like the best choice is the forthcoming Intel 710 series drives, but if you need an SSD now, their 320 series, the Sandforce controller based drives with supercaps (like the OCZ Vertex Pro models, though I wouldn't touch those with a competitor's database), or a FusionIO card are the only remotely safe options.


This is why I will be able to go back tonight and honestly say that I didn't waste time on Hacker News in the morning. :)



Except if you want to do a nice pagination interface over a set of data. select count() in postgresql takes forever on large tables. select count() over myisam is speedy. select count(*) over innodb is speedy but an estimate (I'd still prefer an fast estimate over exceedingly slow accuracy for many scenarios, such as paginated web interfaces).


That's because of PostgreSQL's MVCC (Multi-Version Concurrency Control) architecture. And, yes, while some aggregate queries can tend to suck on very large tables (for which there are a good half-dozen workarounds; this is the FAQ-est of FAQs), the upside of MVCC is that read queries can never block write queries, and write queries can never block read queries.

The benefit that has in terms of increased concurrency is worth far, far more than having to implement a workaround for quick-and-dirty row counts for simple things like pagination, IMO.

(Aside: InnoDB is also MVCC-based, which is why its COUNT(*) is an estimate. The MySQL folks apparently decided that it was better to provide an estimate than an exact count, while the PostgreSQL folks decided the other way. There's a part of me that wants to call that symbolic of the way the two projects operate on a much broader level...)


Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact