Hacker News new | comments | show | ask | jobs | submit login

I spent 13 years writing the core trading system for many of the well known exchanges. We used open source wherever possible because the software tended to be more reliable. That said, clients usually got to request the database and we used Sybase a lot. I have been using Postgres for the last eight years since. every day of the week and I really like it but the planner is quite a bit worse than Oracle, SQL Server's. The postgres planner is still way way better than MySQL's. It still has correlated subqueries explode into cartesian joins. Mysql is great as a data store but it's more of a replacement for noSQL than an advanced query engine.

MySQL's planner is predictably stupid; structure complex multi-table predicates as joins (nested if necessary) rather than subqueries and it's almost imperative. Postgres OTOH is very unpredictable; sometimes it does the right thing, and sometimes it does something amazingly asinine, where simply swapping a table between from vs join clause can result in 1000x speedup.

Specifically, I've seen pg take a query that looks like this:

  select ... from a join b join c join (select ...) d
where a has millions of rows and is an equijoin with d where d has 10 rows, and it decides to materialize a x b x c, only joining in d at the last step. But do it like this:

  select ... from (select ...) d join a join b join c
and it does the right thing! And analyze gets it right (i.e. the plan for the reordered joins is recognized as better) - never mind genetic optimization, it's lacking analytic optimization.

With the lack of hints, almost the only tool you have to control query plans effectively in postgres is parenthesized joins. Since it's more liable to rewrite the query, the language ends up being less imperative, and thus less predictable. And I like predictability in production.

SQL-level feature set is no comparison of course, pg wins easily.

There are settings for choosing between the exhaustive search planner and the genetic planner. The exhaustive planner is better, but can be slow for complex queries with a lot of paths. But, if your query is at all time consuming you probably want to increase geqo_threshold and geqo_effort as well as join_collapse_limit and from_collapse_limit.

I'd also suggest disabling nest_loop_entirely if you are having problems with bad cardinality estimates resulting in nestloop plans that run 100 times when the planner estimated once.

An interesting argument for the predictability of mysql. Great observation.

It is interesting to see how postgresql will often choose hashmap scan, even with very up to date statistics and much better paths available.

SQLServer's planner does an amazing job of digging right into joins/sub-selects to constrain preliminary results for joins.

It's a very hard job and MS and Oracle obviously have had some of the best people on the world paid well to work on this.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact