Thank you for sharing this!
Seriously, the hardest part for me about teaching and writing is going back and trying to see the problem through the eyes of the person you were before. Now that it's all obvious, it suddenly doesn't seem that hard anymore, and it's even a little embarrassing to write down - I mean, you can find all of this on the Postgres mailing lists or in superb books like "High Performance Postgres", so it's actually trivial, right?
Thing is, there are still far too few good and reasonably short TL;DR guides for Postgres out there (somehow it always gets a little less love than MySQL, although this is starting to change), so I thought I'd give it a try, wrap it in an entertaining real-world story, and give a little back to this awesome community that way. So: glad you like it! AMA
As we’re not interested in “the last 14 days of price changes overall” here, but only in “the last 14 days of price changes of one single reference”, what the database would need to do with two separate indices is called a bitmap scan. It would first look up everything that ever happened to the offers of this reference from the one index, then everything that happened within the last 14 days from the other, then perform the bitmap step (which is a fancy word for calculating the intersection between the two slices), and only then look up the actual rows that lit up in both result sets.
If this sounds complex and taxing, it actually is - and for this reason the expected gain is often so small that the database won’t even attempt it.
This isn't exactly accurate. A bitmap scan builds a compact representation (a bitmap) of the locations of all of the rows it needs to fetch. Postgres will build a bitmap for each index, intersect them to obtain a bitmap of all of the rows that pass both filters, then fetch those rows from the table. This is cheap, since you only read the rows from the table that satisfy all of the filters; you don't need to read all of the rows that pass either filter. Additionally, if effective_io_concurrency is tuned properly, a bitmap scan can potentially be much faster, since Postgres will prefetch the rows it will need (Postgres doesn't do prefetching for a normal index scan). I once saw a query speed up 10x after tuning effective_io_concurrency and forcing a bitmap scan.
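To make this concrete, here's a rough sketch against a hypothetical price_changes table (table, column, and index names are invented for illustration; the plan shape is what matters):

```sql
-- Hypothetical table with one index per filter column.
CREATE INDEX idx_changes_reference ON price_changes (reference_id);
CREATE INDEX idx_changes_created   ON price_changes (created_at);

-- Let Postgres prefetch heap pages during bitmap heap scans;
-- the right value depends on what the underlying storage can handle.
SET effective_io_concurrency = 200;

EXPLAIN ANALYZE
SELECT *
FROM price_changes
WHERE reference_id = 42
  AND created_at >= now() - interval '14 days';

-- If the planner combines both indexes, the plan contains a BitmapAnd:
--   Bitmap Heap Scan on price_changes
--     -> BitmapAnd
--          -> Bitmap Index Scan on idx_changes_reference
--          -> Bitmap Index Scan on idx_changes_created
```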
Then, slow queries are not necessarily slow all the time. Sometimes they are slow depending on how populated a table is, or only for specific records that have lots of relationships (e.g. a record for a user who is more active than the rest). In this case, collecting averages, min, max, standard deviation, etc. gives you an idea of what is going on.
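One common way to collect those statistics is the pg_stat_statements extension. A sketch (the *_exec_time columns are from PostgreSQL 13+; older releases name them total_time, mean_time, and so on):

```sql
-- pg_stat_statements must be listed in shared_preload_libraries first.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Mean, min, max, and standard deviation per normalized query.
SELECT query,
       calls,
       mean_exec_time,
       min_exec_time,
       max_exec_time,
       stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```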
Then, once you've narrowed down the queries that are acting up, there are many things you can do:
1) Optimize the query: use the execution plan (via the EXPLAIN command) to check for anything abnormal, such as filters on fields that are not indexed, a query scope that is too broad, etc.
2) Basic schema optimization: add an index, denormalize, etc.
3) Reduce call count (caching results) or provide more specific parameters to reduce the scope of the query.
4) Advanced optimization: partitioning, sharding, etc.
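As a sketch of steps 1 and 2 above (table and column names are made up):

```sql
-- Step 1: read the plan. A Seq Scan with a Filter on a large table
-- often points to a missing index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;
--   Seq Scan on orders
--     Filter: (customer_id = 123)

-- Step 2: basic schema optimization - add the index, then re-check
-- the plan; it should now use the new index instead of a Seq Scan.
CREATE INDEX idx_orders_customer ON orders (customer_id);
```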
Overall, I did not like the problem solving approach of the article.
1. What is your data volume (number of rows in your biggest central table, if you have one)
2. How many inserts/second do you have, same for SELECTs (rough estimate)
3. What RDS size are you currently using
4. Is this instance sharded?
I am just curious - are you using Provisioned IOPS? If so, what value? I noticed in your metrics that your Queue Depth was pegged around 15-20 on average. According to AWS, queue depth is essentially the number of operations waiting because the device is busy (somewhat analogous to the iostat await metric, I assume).
When we hit that 15-20+ range, it typically means we are maxing out our provisioned IOPS (1k or 2k - nothing crazy), and we see a lot of performance slowdown (in both MySQL and PG RDS).
We have not tried increasing IOPS - but I was curious whether you had, in your previous performance tuning efforts, for this or any other issue?
I haven't heard of people doing this for a few years, so I think PIOPS must have really improved things. But I agree, it sure is expensive! So I'm curious if anyone is still taking the instance storage approach to get more consistent disk performance.
The reason this works is that IOPS are provisioned per EBS drive, based on the size of the drive. So a RAID0 stripe of, say, ten General Purpose SSD drives will outperform the more expensive single PIOPS drive.
So just provision a 1TB volume or so and get the standard IOPS throughput that comes with the volume size, right? I will give that a shot. Looks like you are hitting 1k+ on reads and writes easily. That is pretty impressive, and plenty for us.
This is a relatively recent change (last year?); it used to cap out at 3k IOPS rather than 10k. The only real reason to go for provisioned IOPS now is if you need > 10k IOPS per volume.
I guess I'm asking as a MySQL web dev user, not a DBA -- is the tooling you describe built into PG a good reason to switch platforms?
The tooling discussed in the post should definitely not be a reason to switch from one DBMS to another. You need to understand your DBMS if you want to make the best use of it. Just switching to another system without understanding why wouldn't help much.
Nevertheless, nice writeup.