
Prewarming PostgreSQL I/O caches - suresh70
https://www.cybertec-postgresql.com/en/prewarming-postgresql-i-o-caches/
======
legulere
If you think about it, it’s a bit tragic that the APIs operating systems offer
for memory mapped files are not enough for databases. The operating system
already has all the caching logic in place and can do so more efficiently with
hardware support.

If you have data that rarely changes, you can actually create an ad-hoc
database system based on that and get performance like an in-memory database
without having to worry much about the system's memory.

~~~
diroussel
Actually, Postgres is one of the few databases that does use the OS disk
cache. But then shared_buffers is another cache layer on top of the OS cache.

~~~
goatinaboat
That doesn't make any sense: any program accessing blocks on the disk via the
OS gets the OS cache with no action required on its part, which includes all
databases except those like Oracle that can be set to bypass the OS. The
reason most databases do their own additional caching is that they can be
smarter about access patterns, since they have more information. The
proportion of DB to OS cache in any individual system is a matter of tuning.

~~~
usrnm
O_DIRECT allows you to bypass OS-level caching entirely; it's not that hard.
Whether or not it's a good idea is another question, of course.
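
A minimal sketch of what that looks like on Linux (the path and block size are
made up; O_DIRECT needs the buffer, offset and length aligned, typically to
the logical block size):

    /* Sketch: read a block while bypassing the kernel page cache via
     * O_DIRECT.  The path and block size are made up for illustration. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blksz = 4096;   /* assumed logical block size */
        void *buf;
        int fd = open("/tmp/datafile", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT requires a suitably aligned buffer. */
        if (posix_memalign(&buf, blksz, blksz) != 0) { close(fd); return 1; }

        /* This read does not populate the OS page cache. */
        ssize_t n = pread(fd, buf, blksz, 0);
        if (n < 0) perror("pread");
        else printf("read %zd bytes directly from disk\n", n);

        free(buf);
        close(fd);
        return 0;
    }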

~~~
natmaka
Are you sure it leads to entirely bypassing the OS buffercache?

If I understand correctly, O_DIRECT (the usual flag to open(2)) is one of the
possible behaviors for WAL updates, specifically via the 'wal_sync_method'
setting ( https://www.postgresql.org/docs/11/runtime-config-wal.html ).
So it will indeed keep data _written_ by PG out of the buffer cache, but what
about data _read_ (not written) by PG? The documentation leads me to think
that such data will, just after being read and absent any specific
disposition (posix_fadvise(2)...), end up in the OS buffer cache. (I didn't
check PG's source code.)
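
For reference, the kind of "specific disposition" I mean would look roughly
like this (a sketch, not PG code; the path is made up): an ordinary read
fills the OS buffer cache, and posix_fadvise(2) can then ask the kernel to
drop those pages again.

    /* Sketch (not PostgreSQL code): read data normally, then advise the
     * kernel that the cached pages for that range are no longer needed. */
    #define _XOPEN_SOURCE 700
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[8192];
        int fd = open("/tmp/datafile", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Ordinary buffered read: pages land in the OS buffer cache. */
        ssize_t n = pread(fd, buf, sizeof(buf), 0);
        if (n < 0) { perror("pread"); close(fd); return 1; }

        /* Tell the kernel this range won't be needed again, so it may
         * evict the cached pages. */
        int rc = posix_fadvise(fd, 0, n, POSIX_FADV_DONTNEED);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: error %d\n", rc);

        close(fd);
        return 0;
    }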

~~~
anarazel
Postgres does not use O_DIRECT for data (in contrast to the write-ahead log /
journal, where it can in some configurations), neither when reading nor when
writing. So data will definitely pass through the kernel page cache.

By default we force the kernel's hand about when to write data back to disk
from the page cache for a lot of the writes (from the checkpointing process
and the background writer, but not from backends themselves; it can be
enabled for backends too, and is often beneficial), as disk performance is
otherwise quite unpredictable, with long stalls when the OS decides to
trigger writeback.
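
On Linux that nudging looks roughly like the following sketch (not the actual
PostgreSQL code; path and range made up), using sync_file_range(2) to start
writeback for a range that was just written:

    /* Sketch: ask the kernel to begin asynchronous writeback for a file
     * range now, instead of leaving it to its own dirty-page heuristics. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/datafile", O_WRONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* ... writes to the first 1 MB of the file happen here ... */

        /* Kick off writeback for that range right away. */
        if (sync_file_range(fd, 0, 1024 * 1024, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range");

        close(fd);
        return 0;
    }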

For some workloads, in particular large and high read/write volume databases,
it'd be very beneficial to support O_DIRECT for data too, but there's some
work on postgres' IO subsystem needed to make that doable without performance
regressions in a lot of scenarios.

Even once we add it, it's unlikely to become the default, at least for reads,
as using O_DIRECT would require somewhat accurate tuning of Postgres, even
for databases without high throughput. Also utilizing the OS page cache, in
contrast, doesn't require accurate sizing of shared_buffers etc. and adapts
to changing workloads and global resource availability.

------
mobilemidget
Sounds interesting, but in my opinion a benchmark to go with the article is
missing.

"pg_prewarm makes most sense if your database and your RAM are really really
large (XXX GB or more)"

Preferably a benchmark with database sizes that make sense.

~~~
natmaka
Pre-fetching may be interesting if the first SQL query using prefetched data
picks a small set of data stored here on the disk platter, then another set
stored far away (a distant cylinder), then back, then far away again, and so
on... On a mechanical HDD the implied head movements may considerably slow
this first query down.

Is there another case justifying a pre-fetching?

If I'm right, pg_prewarm is not adequate with an SSD storage device;
moreover, "makes most sense if your database and your RAM are really really
large" is sound, as the amount of head movement is (typically) proportional
to the DB size.

IMHO the most significant overall performance indicator here is the ratio R/D,
where R is the size of the RAM available to PostgreSQL (mainly shared_buffers
and the kernel buffercache), and D stands for the size of often used database
objects (data, indices...).

Multiple runs of a benchmark in contexts where this ratio is 1, then 0.5,
then 0.01... often produce interesting results.

~~~
nieve
If you've got a database you can fit in memory that has a very sporadic
access pattern, it could be a nice boost even on SSDs. My biggest personal
project has a 15GB database on disk, all of it a single table of 7.5M
comments and a PGroonga full text search index. The queries are very
disproportionately searching for low frequency terms, so the index will be
largely cached, but most of the rows will require disk access. Since I can
spare the RAM to get the entire db in memory, it's a clear win for me even if
the speed is decent without it. My users are basically a font of random
accesses, and being able to treat PostgreSQL as an in-memory db is
convenient.

