
How a single PostgreSQL config change improved slow query performance by 50x - ryanashcraft
https://amplitude.engineering/how-a-single-postgresql-config-change-improved-slow-query-performance-by-50x-85593b8991b0
======
quizotic
The prescription of lowering random_page_cost to equal seq_page_cost is
certainly reasonable for SSDs, but I wonder if the underlying issues aren't
somewhat deeper and more interesting. One difference between a sequential scan
and an index scan is the amount of data being scanned. PostgreSQL stores
information horizontally as rows and a sequential scan will have to read in
all column values of all rows. An index scan will read through all values of a
_single_ column. The 50x performance difference _might_ be just that the whole
row is 50x wider than the width of the indexed join column.
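A quick way to see the width effect for yourself - a rough sketch with a
hypothetical table, not the article's schema:

    -- a wide table: one indexed join column plus a lot of payload
    CREATE TABLE wide_events (
        event_id bigint,
        payload1 text, payload2 text, payload3 text  -- imagine many more
    );
    CREATE INDEX ON wide_events (event_id);

    -- force a seq scan, then allow the index, and compare pages touched
    SET enable_indexscan = off; SET enable_bitmapscan = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT event_id FROM wide_events WHERE event_id = 42;
    RESET enable_indexscan; RESET enable_bitmapscan;
    EXPLAIN (ANALYZE, BUFFERS) SELECT event_id FROM wide_events WHERE event_id = 42;

The BUFFERS counts show how many 8K pages each plan actually read: the seq
scan pays for every column of every row, the index scan only for the
event_id pages (plus any matching heap tuples).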

An interesting second factor relates to the nature of the SSD storage. With
SSDs a read request will pull back a 4K page, even if the read request was
smaller. So it's not quite right to say that a sequential read and a random
read cost the same on SSD, particularly if the same 4K page must be read
multiple times. I suspect that the particular index technique used by
PostgreSQL tends to organize data such that successive indexed values reside
in the same 4K SSD page. IOW, it's not so much that the cost of random SSD
access is the same as sequential SSD access (though that's true), as it is
that the PostgreSQL index mechanism doesn't require multiple reads of the same
4K page.
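You can actually get a hint of that layout from the planner's own
statistics: the correlation column in pg_stats is near 1.0 (or -1.0) when
successive values of a column sit in physically adjacent pages (table name
hypothetical):

    -- correlation between a column's sort order and its physical row order
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE tablename = 'prop_keys';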

If a hash-based index were used instead of a B-tree index, and if the table
were narrower, the sequential scan might have outperformed the index scan.

~~~
idontgetproton
You're saying that during sequential scans, the time it takes per row is O(n)
where n is the number of columns? I find that hard to believe. Can anyone
confirm / deny this?

~~~
pg314
The number of columns in PostgreSQL is limited to 250-1600 [1], since a tuple
(a row) can't span more than one page (8K by default). Since O-notation talks
about asymptotic behavior, it doesn't really apply here.

But yes, tables with more columns normally take more time to scan
sequentially. The complete tuple is always loaded (excluding the data of
TOASTed [2] attributes); there is no way to load only one column. This is one
of the reasons that column-oriented databases can be faster than row-oriented
databases [3].
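If you want to check how wide your tuples actually are, pg_column_size() on
the whole row reports the on-page tuple size (table and column names
hypothetical):

    -- average on-page row width vs. the width of a single column
    SELECT avg(pg_column_size(t.*))        AS avg_row_bytes,
           avg(pg_column_size(t.event_id)) AS avg_col_bytes
    FROM prop_keys t;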

[1] https://www.postgresql.org/message-id/42C3C382.5020108@cineca.it

[2] https://www.postgresql.org/docs/9.5/static/storage-toast.html

[3] https://en.wikipedia.org/wiki/Column-oriented_DBMS

------
ahachete
For those interested in postgresql.conf tuning, I recently gave a
presentation at two conferences; hope it's interesting. Slides:
https://speakerdeck.com/ongres/postgresql-configuration-for-humans

[shameless plug]

~~~
aargh_aargh
There's a link to postgresqlco.nf in the description. Do you run the service?
It doesn't accept a file being dropped. In fact, it doesn't seem to do
anything (I looked at the HTML).

~~~
ahachete
It's a WIP. It's coming, stay tuned... (you may subscribe to get notified).

------
danjoc
Thanks for the tip. This probably affects more people than realize it. It
would be really interesting if PG used machine learning to discover this sort
of tuning on its own.

~~~
quizotic
+1 for the idea of tuning query optimization based on ML. I don't know of any
DBMSs that do this, and that's remarkable in this day and age.

~~~
power
Postgres has a query optimizer that uses Genetic Algorithms:
[https://www.postgresql.org/docs/current/static/geqo.html](https://www.postgresql.org/docs/current/static/geqo.html)

Clarification: this is for planning how to execute a query, not for tuning the
db settings
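For what it's worth, GEQO only kicks in for queries joining many tables; the
threshold is a plain GUC you can inspect:

    SHOW geqo;            -- genetic query optimization, enabled by default
    SHOW geqo_threshold;  -- FROM-item count at which GEQO takes over (default 12)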

~~~
agnivade
Wow, did not know that. TIL!

------
olavgg
I always increase the seq scan cost, but another thing that helped me more
was updating the table statistics. That way you help the planner estimate
when your indexes are worth using.

In my case I increased STATISTICS to 5000 and the planner immediately started
using the index instead of a full table scan.

https://blog.pgaddict.com/posts/common-issues-with-planner-statistics
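For reference, the per-column knob looks like this (table and column names
hypothetical); the ANALYZE afterwards is what actually rebuilds the
statistics:

    -- sample this column much more thoroughly, then re-gather stats
    ALTER TABLE my_table ALTER COLUMN my_col SET STATISTICS 5000;
    ANALYZE my_table;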

~~~
pgaddict
Please don't mess with the seq_page_cost. It's considered a reference value,
so leave it set to 1.0 and tweak the other values.

------
pgaddict
Unfortunately the author does not say some pretty basic things - which
PostgreSQL version, how much data, how much of it fits into RAM, what storage
(and hardware in general) ...

If I understand it correctly, PostgreSQL was using the default configuration,
which is rather inefficient; it's geared more toward "must start everywhere"
than performance.

Decreasing random_page_cost makes sense if you have storage that can handle
random I/O well (although I wouldn't go to 1 even if it's an SSD). But who
knows if the data was read from storage at all? Maybe it'd fit into RAM (and
just increasing effective_cache_size would be enough for the planner to
realize that).
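A sketch of that alternative, with a made-up value for a machine with 128GB
of RAM:

    -- tell the planner how much data the OS+PG caches will likely hold;
    -- this only changes cost estimates, it allocates nothing
    ALTER SYSTEM SET effective_cache_size = '96GB';
    SELECT pg_reload_conf();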

~~~
rpedela
Setting random_page_cost = 1 is pretty common advice for SSDs and works well
in my experience. Typical advice for HDD RAID or SCSI is random_page_cost =
2, and SSDs are faster than either.

~~~
pgaddict
I don't know who recommends random_page_cost=1, but IMNSHO it's a bit silly.
Even SSDs handle sequential I/O better than random I/O. Values between 1.5 and
2.0 are more appropriate. I wouldn't really recommend 1.0 except when you know
the data fits into RAM. There are other options that affect costs of random
I/O, e.g. effective_cache_size.
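Note that these are settable per session, so you can experiment before
touching postgresql.conf (query and table hypothetical):

    SET random_page_cost = 1.5;
    EXPLAIN ANALYZE SELECT event_id FROM my_table WHERE event_id = 42;  -- did the plan flip?
    RESET random_page_cost;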

------
branko_d
Hmm... The author is filling a combo-box (probably just one column), yet the
query is selecting all columns (SELECT *).

I would have tried selecting just the needed column (let's call it "foo"),
with the following indexes:

event_types (app, id)

prop_keys (event_id, foo)

This should cover the entire query with indexes (i.e. allow for index-only
scan).
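Roughly, and with the same guessed column names:

    -- indexes that contain every column the query touches,
    -- so the planner can answer from the index alone
    CREATE INDEX ON event_types (app, id);
    CREATE INDEX ON prop_keys (event_id, foo);

    -- select only what the combo-box needs, instead of SELECT *
    SELECT p.foo
    FROM event_types e
    JOIN prop_keys p ON p.event_id = e.id
    WHERE e.app = 'my_app';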

------
stubish
You also get good results tweaking this particular knob if you have large
amounts of RAM. If your blocks are almost always in OS cache, you are almost
never going to make random seeks even if the PostgreSQL planner thinks you
are.

~~~
pja
That feels like something the database could do for itself - checking whether
a range of mmap()ed blocks is actually in memory is a single syscall
(mincore()).

I guess for large indexes the overhead of walking the page tables is going to
be large though, so it’s not necessarily going to be a net win.

~~~
pgaddict
But you don't know which blocks you'll need at planning time, so you can't
really check that.

You could of course check if the total database size fits in RAM, but it's
much more common to have a database much larger than RAM (say, 1TB on a
machine with 128GB of RAM) where the actual working set (the recent data
touched by queries) is much smaller.
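You can at least inspect PostgreSQL's own buffer cache after the fact with
the pg_buffercache extension (table name hypothetical) - though that doesn't
help at planning time, and it says nothing about the OS page cache:

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- how many 8K buffers of one table currently sit in shared_buffers
    SELECT count(*)
    FROM pg_buffercache
    WHERE relfilenode = pg_relation_filenode('my_table');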

~~~
pja
_But you don't know which blocks you'll need at planning time, so you can't
really check that._

Really? Postgres doesn't know which blocks are part of which tables / indexes?
That seems ... suboptimal if so.

~~~
jeltz
I guess one could implement a job which samples the pages of tables and
indexes to check what percentage is typically in RAM, and use that for an
estimate. But to know which actual pages will be hit by a query you'd need to
start executing it (you have to do an index lookup to see which index and
table pages will be accessed), which would defeat the purpose of query
planning.

~~~
pja
Granted, but (AFAICT) in this case the database was doing a full table scan
because it didn’t think the index was in memory. Checking to see whether the
index itself is already loaded seems like something the query planner in
principle ought to be able to do efficiently. (Obviously the existing codebase
might make it difficult.)

------
topbanana
This seems like something that should be measured at startup?

~~~
jeltz
Sadly that is not (at least easily) possible, because random_page_cost vs.
seq_page_cost is not just about the I/O system but about all the factors
which can affect the cost of reading pages randomly vs. sequentially - e.g.
how often PostgreSQL's tables are in the file cache. So how much RAM your
machine has available for PostgreSQL, and your access patterns, matter too.

Also, I imagine some expensive SAN solutions would be pretty tricky to
measure, given how smart they try to be with caching and moving data between
different kinds of disks.

------
emilfihlman
Why aren't these tuned automatically? Should be pretty easy.

~~~
pgaddict
It sounds simple until you actually try doing that. The thing is, reducing the
costing to these two parameters is a significantly simplified model of what
happens in practice. So you can't just run some I/O benchmark to measure
random vs. sequential requests. For example the defaults that worked fine for
a long time (seq_page_cost=1 and random_page_cost=4) certainly do not reflect
the difference between random and sequential I/O on rotational devices (where
the device can easily do 100MB/s in sequential access, but less than 1MB/s in
random).

~~~
ric2b
> For example the defaults that worked fine for a long time (seq_page_cost=1
> and random_page_cost=4) certainly do not reflect the difference between
> random and sequential I/O on rotational devices (where the device can easily
> do 100MB/s in sequential access, but less than 1MB/s in random)

The PostgreSQL documentation explains why. It assumes HDD random access is
40x slower than sequential access but that you'll have a 90% cache hit rate,
so random_page_cost=4 reflects that 40x penalty applied to the 10% of reads
that miss the cache.
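Spelled out:

    effective random cost = miss_rate * raw_random_penalty
                          = 0.10 * 40
                          = 4   (relative to seq_page_cost = 1)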

~~~
pgaddict
Not entirely. The documentation says you can _interpret_ it that way, not that
it's how the numbers were determined.

AFAIK it's much more "We're using those numbers as defaults because they seem
to be working well," rather than "We did extensive benchmarking and these are
the right values!"

You can measure how much slower random I/O is fairly easily. But the question
is how to derive PostgreSQL cost parameters from that. Should you use the same
assumption about 90% cache hit ratio (why?) or should you use some different
value?

