
Understanding caching in Postgres – An in-depth guide (2016) - craigkerstiens
https://madusudanan.com/blog/understanding-postgres-caching-in-depth/
======
napsterbr
That seems amazing, thanks for sharing. Do anyone on hn happen to know a
similar guide / article explaining how data locality can help a database?

I've been looking for the topic, both spatial and temporal locality, and to my
surprise I've found very few discussions about it, most of which led me to
believe it wouldn't matter much shaping my indexes to have good locality
between them. I believe there would be some optimization but I still hadn't
the time to benchmarking.

~~~
j_s
Here's a recent discussion recommending a specific site that might help you as
a resource, but I don't see much there regarding locality.

How far can you go with MySQL or MariaDB? |
[https://news.ycombinator.com/item?id=14892184](https://news.ycombinator.com/item?id=14892184)
(Aug 2017)

> dmux _This is a great resource RE indexing:[http://use-the-index-
> luke.com/](http://use-the-index-luke.com/) _

There is a detailed PostgreSQL optimization comment there that may be more
what you had in mind:

[https://news.ycombinator.com/item?id=14896154](https://news.ycombinator.com/item?id=14896154)

> olavgg: _Use multi-column /functional/partial indexes, as this will help you
> design a database where most of your 'hot' indexes can fit in RAM. If you
> have too many indexes, you will do index scan on disk, which is painfully
> slow. I[n] many cases a full table scan (sequential read) would be faster
> instead._

------
dullgiulio
I was surprised to learn Postgres doesn't use direct read/write (O_DIRECT) to
bypass kernel caches.

The article doesn't go into the details, but maybe someone can point me to
some resources that explain the benefits of this double cache layer?

All I can think of is PHK (author of Varnish) arguing for exactly the
opposite: use only kernel caches with clever mmap.

~~~
madhusudhan000
Author here.

> I was surprised to learn Postgres doesn't use direct read/write (O_DIRECT)
> to bypass kernel caches.

It is more related to the database being platform neutral. The community is
very sensitive to putting in performance improvement/hacks which applies only
to certain systems. There was a discussion back then -
[https://www.postgresql.org/message-
id/50E4AAB1.9040902@optio...](https://www.postgresql.org/message-
id/50E4AAB1.9040902@optionshouse.com). There were several more, you can search
around the mail archives, but the general tendency is not to put changes which
is not universally applicable to all platforms.

With that said, a lot is changing, as their community grows we can expect
optimizations like these.

> The article doesn't go into the details, but maybe someone can point me to
> some resources that explain the benefits of this double cache layer?

The article was already a bit long so I didn't go into much details. Perhaps
ill write on it in another article.

~~~
dullgiulio
Thank you. And yes, please do write more about such things, it would be very
interesting read.

------
thinkMOAR
It would also be interesting to see the tests (re-)done or compared vs new
postgresql 10.

~~~
madhusudhan000
It hasn't changed much. But new OLAP features have come up such as parallel
queries and for those, the existing cache architecture doesn't make sense
since OLAP is full of sequential scans.

You can read the doc for the same -
[https://www.postgresql.org/docs/10/static/when-can-
parallel-...](https://www.postgresql.org/docs/10/static/when-can-parallel-
query-be-used.html)

~~~
thinkMOAR
Thanks for the feedback. Interesting nonetheless, thanks for the link

------
ComodoHacker
The cache architecture looks much like Oracle's, which I guess was the model
at the time it was designed.

