
Debugging the Postgres Query Planner - kiyanwang
https://gocardless.com/blog/debugging-the-postgres-query-planner/
======
ris
Great writeup of a situation I've been in before. The path that the query
planner traces through the space of all possible query plans is not _always_ a
smooth curve. That's the best way I have of explaining it.

Interestingly, the choices the planner makes _can_ even be affected by trivial
things like clause order. As much as that initially sounds like a bad thing,
it means that anyone with the time to do some trial and error on a
particularly important query can find ever-faster formulations of it.
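As a sketch of that trial-and-error loop (the `payments` table, its columns,
and the claim that these two orderings plan differently are all hypothetical,
not from the article), you compare the plans Postgres actually picks with
EXPLAIN:

```sql
-- Hypothetical table for illustration only.
-- Same predicates in two orders; on some queries the planner's chosen
-- plan (e.g. which index is scanned first) can differ between them.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM payments
WHERE status = 'failed'
  AND created_at > now() - interval '7 days';

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM payments
WHERE created_at > now() - interval '7 days'
  AND status = 'failed';
```

Diffing the two outputs (node types, estimated vs actual row counts) is what
tells you whether a reordering actually changed the plan or just the text of
the query.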

As for how to detect problems like these before they bite, I've considered
whether it might be worth having one of your "staging" environments running
with a version of your data that's, say, 10% bigger in all dimensions. Has
anyone ever tried anything like this?

~~~
lawrjone
Hey, I'm the author of this article; nice to see it's still doing the rounds!

While we haven't got anything to identify these situations before they happen,
we've become quite good at tackling them once they do occur. We have a
development tool called draupnir
([https://github.com/gocardless/draupnir](https://github.com/gocardless/draupnir))
which we use to provide an anonymised copy of our production data for
debugging this sort of problem after it comes up.

Draupnir is always primed with an anonymised image from the previous day,
allowing us to compare the problematic query plan that happens today with what
had been working before. It's quite easy to then try out a few different
rewrites of the query until you identify a good fix.
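For instance (all table and column names here are hypothetical, not from the
article), that comparison usually amounts to running the same EXPLAIN on
yesterday's image and today's, then using the planner's session knobs to
confirm a hypothesis before committing to a rewrite:

```sql
-- Run on both the old and the new anonymised image, then diff the plans:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM payments WHERE merchant_id = 42 AND status = 'failed';

-- If today's plan has flipped (say, a Seq Scan where an Index Scan used
-- to be), a planner knob can confirm the cheaper plan still exists:
SET enable_seqscan = off;  -- discourage sequential scans in this session
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM payments WHERE merchant_id = 42 AND status = 'failed';
RESET enable_seqscan;
```

The `enable_*` settings are a diagnostic tool rather than a fix, but they
tell you quickly whether the planner's cost estimates are the problem or the
query itself is.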

Like you, I'd be interested to hear whether anyone has tried your approach of
a staging environment with much more data. My hunch is that you'd get a
different sort of query malfunction rather than an accurate prediction of
what will occur in production, but it would be a cool experiment.

