
Understanding caching in Postgres - craigkerstiens
http://madusudanan.com/blog/understanding-postgres-caching-in-depth
======
jordanthoms
One thing I've always wondered about: it seems that Postgres is designed to
use the operating system's capabilities for disk caching, I/O scheduling,
multitasking, etc. as much as possible, whereas most other databases take the
opposite approach and try to bypass the operating system as much as possible,
sometimes even to the point of bypassing the file system and doing raw disk
I/O. AFAIK something like Oracle is almost an operating system by itself.

What led to that design decision, and what are the tradeoffs? Intuitively it
seems like it'd be better to bypass the OS, but on the other hand Postgres
does work incredibly well.

~~~
jandrewrogers
Generally speaking, bypassing the operating system I/O scheduler and cache
will substantially increase the effective throughput of the database kernel.
The reasons why are somewhat complex and technical (e.g. it is prerequisite to
implementing a good approximation of Bélády's optimal algorithm) but this fact
is not controversial. The reason for doing your own task scheduling has
similar theoretical foundations in the abstract though it manifests
differently.
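
To make the Bélády reference concrete, here is a minimal sketch (my own
illustration, not anything from PostgreSQL or any real database kernel) of
the optimal "clairvoyant" eviction policy: always evict the page whose next
use lies furthest in the future. A database that manages its own cache can
approximate this because the query planner knows something about upcoming
accesses; a general-purpose OS page cache cannot.

```python
def belady_misses(accesses, cache_size):
    """Count cache misses under Belady's optimal (MIN) eviction policy.

    `accesses` is the full, known-in-advance sequence of page ids --
    the clairvoyance that makes the algorithm optimal but impractical
    as-is for an OS that cannot see future accesses.
    """
    cache, misses = set(), 0
    for i, page in enumerate(accesses):
        if page in cache:
            continue  # hit: nothing to do
        misses += 1
        if len(cache) >= cache_size:
            future = accesses[i + 1:]
            # Evict the cached page that is reused furthest in the
            # future, or never reused at all.
            victim = max(
                cache,
                key=lambda p: future.index(p) if p in future else len(future) + 1,
            )
            cache.remove(victim)
        cache.add(page)
    return misses
```

On the trace `[1, 2, 3, 1, 2, 4, 1, 2]` with a 3-page cache this takes only
4 misses (the 3 cold misses plus one for page 4, which correctly evicts
page 3 because it is never used again); an LRU cache would evict page 1 at
that point and miss again.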

The tradeoff is that it _greatly_ increases the complexity of maintaining
portability and consistency because every operating system and file system
behaves differently and offers different guarantees at this level of
abstraction. PostgreSQL makes a priority of being highly portable, and has
therefore chosen to avoid this particular code swamp and testing nightmare
altogether thus far even though it comes at a significant cost to peak
performance potential.

PostgreSQL is quite efficient and well designed within the constraints it
currently operates under, but its primary value proposition is not peak
performance per se.

~~~
jordanthoms
Thanks for that - the tradeoff between performance and complexity in this case
is clearer now.

It's interesting how many different approaches exist between fully bypassing
the OS and leaning on it - for example, where does MySQL/InnoDB fit in this?

It seems they intend you to use their buffer pool rather than the OS's disk
cache (they advise making the buffer pool a large percentage of RAM), as
opposed to Postgres, where the recommendation is to set shared_buffers to
~20% of RAM and leave the rest for the OS to use. Is InnoDB doing something
different here, or does it just have a better cache implementation?

------
callesgg
1 CPU cycle is not really comparable to 1 s. A thought that you can think in
1 second can be extremely complex.

Nitpicking? Nah, I don't think so.

~~~
brobinson
I think the goal was to establish a base from which all the other comparisons
could be reasonably drawn. Humans don't generally think in fractional seconds.

~~~
callesgg
No, but a human thought does not represent a numeric operation; it represents
a very, very complex state. It would probably take megabytes, perhaps even
gigabytes, to represent a thought at some level.

~~~
teraflop
It takes an awful lot of data to fully capture the instantaneous state of a
superscalar processor, too. Instruction scheduling, register renaming, cache
lookups, branch prediction, speculative execution, and so on.

