
Debugging PostgreSQL performance the hard way - endymi0n
https://www.justwatch.com/blog/post/debugging-postgresql-performance-the-hard-way/
======
mcherm
Dominik Raute, I just wanted to say that this may be the best article I have
EVER read on how to do database performance tuning. The fact that you showed
all the initial steps that DIDN'T reveal any problems was great. The fact that
you mentioned the tools you were using and commented on what was particularly
good about each was also great. And you explained it all on a level that was
comprehensible to a beginner even though this was a very advanced level of
work.

Thank you for sharing this!

~~~
endymi0n
Now you make me cry :D

Seriously, the hardest part for me about teaching and writing is going back
and trying to see the problem from the eyes of the "guy you were before". Now
that it's all obvious, it's suddenly all not that hard anymore and even a
little embarrassing to write down - I mean you can find all of this stuff on
the Postgres Mailing lists, or superb books like "High Performance Postgres",
so it's actually trivial, right?

Thing is, there are still way too few good and reasonably short TL;DR guides for
Postgres out there (somehow it always gets a little less love than MySQL,
although this is starting to change), so I thought I'd give it a try, wrap it
in an entertaining real-world story, and in this way give a little back to
this awesome community. So: glad you like it! AMA

~~~
nugator
If you have any writeup about your PostgreSQL to Elasticsearch
denormalization/aggregation/transportation methods I'd be very interested to
read about it!

~~~
vitorbaptistaa
+1 to that. We use a similar stack in OpenTrials
([http://opentrials.net](http://opentrials.net)) and it would be great to see
how others solved these issues.

------
malisper
> _Now the most common solution to less row lookups would be using an index.
> We could create one on created_at and call it a day. But we’d probably only
> see a very small speedup. Why is that?
>
> As we’re not interested in “the last 14 days of price changes overall” here,
> but only in “the last 14 days of price changes of one single reference”, what
> the database would need to do with two separate indices is called a bitmap
> scan. It would first look up everything that ever happened to the offers of
> this reference from the index, then fetch everything that happened within the
> last 14 days, then doing a bitmap scan (which is a fancy word for calculating
> the intersection between the slices) and only then look up the actual rows
> that lighted up in both result sets.
>
> If this sounds complex and taxing, it actually is - and for this reason it’s
> an optimization that often times so small the database won’t even attempt it._

This isn't exactly accurate. A bitmap scan builds a compact representation (a
bitmap) of the locations of all of the rows it needs to fetch. Postgres will
build a bitmap for each index, intersect them to obtain a bitmap of all of the
rows that pass both filters, _then_ fetch those rows from the table. This is
cheap since you only read the rows from the table that satisfy all of the
filters; you don't need to read all of the rows that pass either filter.
Additionally, if effective_io_concurrency is tuned properly, a bitmap scan can
potentially be much faster since Postgres will prefetch the rows it will need
(Postgres doesn't do prefetching for a normal index scan). I once saw a query
speed up 10x after tuning effective_io_concurrency and forcing a bitmap scan.
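For illustration, here's roughly what that plan shape looks like on a hypothetical table and pair of indexes (all names invented for the example; the exact plan output will differ):

```sql
-- Hypothetical schema: price changes, indexed separately on
-- reference_id and created_at
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM price_changes
WHERE reference_id = 42
  AND created_at > now() - interval '14 days';

-- Possible plan shape with two single-column indexes:
--   Bitmap Heap Scan on price_changes
--     Recheck Cond: ...
--     ->  BitmapAnd
--           ->  Bitmap Index Scan on price_changes_reference_id_idx
--           ->  Bitmap Index Scan on price_changes_created_at_idx

-- effective_io_concurrency lets the bitmap heap scan prefetch heap
-- pages (0 disables prefetching; can be set per session to test)
SET effective_io_concurrency = 32;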

~~~
endymi0n
Good catch - what you're describing is essentially what I was trying to
convey. Still, doing it like this would mean two potentially larger index
lookups plus calculating the intersection between them, rather than satisfying
the filter from one index alone - for fetching the rows, it would be
equivalent from there on, of course. That said, I may have overstated the cost
- I admit I didn't even try separate indices in this case.
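For reference, the "one index alone" alternative alluded to here would be a composite index covering both columns (table and column names hypothetical, matching the article's scenario):

```sql
-- A single composite index can satisfy both predicates in one descent,
-- avoiding the two separate index scans plus bitmap intersection
CREATE INDEX price_changes_ref_created_idx
    ON price_changes (reference_id, created_at);
```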

~~~
malisper
Yes, in your case you did the right thing.

------
0xmohit
Good to see a post that actually talks about debugging a performance issue,
and not saying that moving from db-a to db-b did it all.

~~~
karmelapple
And great that it gets into many of the details, mentioning both tools to use
and raw queries to run in the database.

Great writeup!

------
partycoder
"Debugging performance" is otherwise known as profiling. Googling "PostgreSQL
profiling" will yield more (and more accurate) results than "debugging
PostgreSQL performance". A good wiki article on the topic is:
[https://wiki.postgresql.org/wiki/Performance_Analysis_Tools](https://wiki.postgresql.org/wiki/Performance_Analysis_Tools)

Then, slow queries are not necessarily slow all the time. Sometimes they are
slow depending on how populated a table is, or slow only for specific records
that have lots of relationships (e.g. a record for a user who is more active
than the rest). In those cases, collecting averages, min, max, standard
deviation, etc. gives you an idea of what is going on.
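In Postgres, those per-query statistics are what the pg_stat_statements extension collects. A sketch (column names follow the pre-13 naming; newer versions call them mean_exec_time and so on):

```sql
-- pg_stat_statements must be listed in shared_preload_libraries
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 normalized queries by average runtime, with min/max/stddev
SELECT query, calls, mean_time, min_time, max_time, stddev_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
```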

Then, once you've narrowed down the queries that are acting up, there are many
things you can do:

1) Optimize the query: use the execution plan (via the EXPLAIN command) to
look for anything abnormal, such as filters on fields that are not indexed or
a query scope that is too broad.

2) Basic schema optimization: add an index, denormalize, etc.

3) Reduce call count (caching results) or provide more specific parameters to
reduce the scope of the query.

4) Advanced optimization: partitioning, sharding, etc.

Overall, I did not like the problem solving approach of the article.

------
allan_s
If it's not a secret, I would be interested to know:

      1. What is your volumetry (number of rows in your biggest central table, if you have one)?
      2. How many inserts per second do you handle, and roughly how many SELECTs?
      3. What RDS instance size are you currently using?
      4. Is this instance sharded?

We're also running our business on Amazon RDS for the database, and I would
like a measure of how far you can go. Too often, people who don't understand
SQL databases jump straight to "only NoSQL can save us" once they start having
performance problems that could be solved by simply having a smarter schema,
better indexes, or a bit of sharding. So next time those voices appear, I'll
have some numbers to throw at them :)

------
dfsegoat
Fantastic write-up! Put this on our data engineering reference article list.

I am just curious - are you using Provisioned IOPS? If so, at what value? I
noticed in your metrics that your Queue Depth was pegged around 15-20 on
average. According to AWS, queue depth is essentially the number of operations
waiting because the device is busy (somewhat analogous to iostat's await
metric, I assume).

When we hit that 15-20+ range, it typically means we are maxing out our
provisioned IOPS (1k or 2k - nothing crazy), and we see a lot of performance
slowdown (on both MySQL and PG RDS).

We have not tried increasing IOPS, but I was curious whether you had, in your
previous performance tuning efforts for this or any other issue?

~~~
endymi0n
The dirty secret AWS doesn't want you to know is: Provisioned IOPS is really
bad value for the money. Like a lot of others, we're just using a massively
overprovisioned General Purpose SSD volume, but reaching IOPS limits came back
to bite us multiple times. Be sure to monitor it (and no, they don't provide
any convenient way to do that; we had to import several metrics from the
CloudWatch API into Prometheus and combine them there).
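One way to pull those metrics in is the community cloudwatch_exporter for Prometheus. A minimal config sketch (keys per the exporter's README; region, metric names, and dimensions shown are illustrative, not necessarily what JustWatch used):

```yaml
region: us-east-1
metrics:
  - aws_namespace: AWS/RDS
    aws_metric_name: DiskQueueDepth
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
  - aws_namespace: AWS/RDS
    aws_metric_name: ReadIOPS
    aws_dimensions: [DBInstanceIdentifier]
    aws_statistics: [Average]
```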

~~~
pjungwir
Around 2009-2010, EBS volatility was leading a lot of people to run their
database on instance storage instead of EBS. (At the time I suspected that
even Heroku was doing this.) That seems pretty crazy, since the other name for
instance storage is "ephemeral storage". If you have replication you reduce
the risk, but it's still a wild thing to do. But I understand it. Would you
run Postgres on NFS? If not, then why is EBS okay?

I haven't heard of people doing this for a few years, so I think PIOPS must
have really improved things. But I agree, it sure is expensive! So I'm curious
if anyone is still taking the instance storage approach to get more consistent
disk performance.

~~~
nasalgoat
The secret to EBS is to use General SSD, not Provisioned, but use a RAID
stripe.

This works because IOPS are provisioned _per EBS drive_ , scaled by the size
of the drive. So a RAID0 stripe of, say, ten General SSD drives will
outperform the more expensive single PIOPS drive.
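The arithmetic behind this, assuming gp2's published baseline of 3 IOPS per provisioned GB (with a 100 IOPS floor) and a ~3,000 IOPS burst for volumes below the baseline cap - a back-of-the-envelope sketch, not an AWS pricing tool:

```python
def gp2_baseline_iops(size_gb):
    # gp2 baseline: 3 IOPS per provisioned GB, with a 100 IOPS floor
    # and a (then-current) 10,000 IOPS per-volume ceiling
    return max(100, min(3 * size_gb, 10000))

def stripe_iops(volume_sizes_gb, burst=False):
    # IOPS are provisioned per volume, so a RAID0 stripe aggregates them;
    # small gp2 volumes can each burst to ~3,000 IOPS
    total = 0
    for size in volume_sizes_gb:
        base = gp2_baseline_iops(size)
        total += 3000 if burst and base < 3000 else base
    return total

print(stripe_iops([1000]))                  # one 1 TB volume: 3000
print(stripe_iops([100] * 10))              # ten 100 GB volumes, baseline: 3000
print(stripe_iops([100] * 10, burst=True))  # ten volumes bursting: 30000
```

The baselines come out even here; the striped volumes' aggregate burst headroom is where the win shows up (the durability trade-off of RAID0 aside).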

------
akurilin
Thanks for sharing those hard earned learnings!

------
tenken
Is a similar approach available with, say, MySQL's vendor tooling/APIs? The
slow query log? Metadata tables in MySQL? Etc.

I guess I'm asking as a MySQL web dev, but not a DBA - is the tooling you
describe, inherent to PG, a good reason to switch platforms?

~~~
tex0
You can go very much the same debugging route with MySQL. Similar tools are
available for MySQL as well.

The tooling discussed in the post should definitely not be a reason to switch
from one DBMS to another. You need to understand your DBMS if you want to make
the best use of it; just switching to another system without understanding why
won't help much.

~~~
endymi0n
This. Doing this debugging session, I took a lot of inspiration from my
experience as a MySQL DBA. pg_catalog (PG) and information_schema (MySQL) are
very similar indeed - I remember finding out index sizes was even way easier
in MySQL - and the path would have been pretty much the same. I can only
repeat: don't switch your database "just because", as Postgres isn't all a
happy place. Replication is way harder to set up, it often needs more
maintenance, performance is often slightly worse, and the tooling is still
simply better and more mature for MySQL. MySQL is a formidable SQL-flavored
key-value store that scales like hell. Switch when you have a need for CTEs,
partial indices, window functions, transactional DDL, CONCURRENT index
building, or other advanced features.
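To make that feature list concrete, a couple of Postgres idioms from it (table and column names invented for illustration):

```sql
-- Partial index: only index the rows you actually query
CREATE INDEX offers_active_idx ON offers (reference_id)
    WHERE deleted_at IS NULL;

-- Build an index without taking a write lock on the table
-- (must run outside a transaction block)
CREATE INDEX CONCURRENTLY offers_created_idx ON offers (created_at);

-- CTE plus a window function in one query
WITH recent AS (
    SELECT reference_id, price, created_at
    FROM offers
    WHERE created_at > now() - interval '14 days'
)
SELECT reference_id, price,
       rank() OVER (PARTITION BY reference_id ORDER BY created_at DESC)
FROM recent;
```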

------
ComodoHacker
While it might seem "the hard way" to a developer who deals with db
performance only occasionally, it's a regular performance tuning workflow. The
problem and the solution in this case were fairly simple.

Nevertheless, nice writeup.

------
nhumrich
I've been needing this article for quite some time. Thank you SO much.

------
kakoni
Wondering: is there anything similar to the RDS Metrics Dashboard for non-AWS
users?

