
Parallel Index Scans in PostgreSQL - amitkapila
http://amitkapila16.blogspot.com/2018/05/parallel-index-scans-in-postgresql.html
======
humanfromearth
Parallel scans are great for performance, but I found them hard to tune
given how they interact with `work_mem` and `max_connections`.

The issue is that when a parallel scan starts, it isn't limited to the
`work_mem` you would expect per connection. Instead, every parallel
worker gets its own `work_mem` limit. On a very busy db you can quickly end
up with lots of OOM errors. What you end up doing is either disabling all
parallel scans or reducing `work_mem`, which sometimes results in slower query
performance for non-parallel queries. Suffice it to say, I wasn't smart enough
to tune them and just disabled the parallel scans. Another option is to set
a higher `work_mem` for expensive queries while keeping a generally low
`work_mem`, but that's hard to get right and you end up with even more
bookkeeping.
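
To make the multiplication concrete, here is a rough sketch (the table name and the settings are made up for illustration; the key point is that `work_mem` is a per-node, per-process limit, so a plan with N workers can use roughly (N + 1) × `work_mem` for a single sort or hash):

```sql
-- Assumed illustration: a single parallel query can consume far
-- more than one work_mem.
SET work_mem = '64MB';
SET max_parallel_workers_per_gather = 4;

-- If this sort runs in 4 workers plus the leader, each process may
-- use up to 64MB, i.e. roughly 320MB for this one node alone --
-- multiplied again across concurrent sessions on a busy server.
SELECT * FROM events ORDER BY created_at;  -- "events" is hypothetical
```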

This behavior might have changed. If anyone knows more about this, I'd be
glad to hear what you've done.

~~~
jasonmp85
Agreed. It might be nice to have separate work_mem settings for single-process
and parallel queries, or perhaps even to just divide the single-process query
work_mem by the number of workers that will be used. The number of OOMs that
start popping up just because someone turns on parallelism is embarrassing.

~~~
amitkapila
I don't have any solution for this problem, but one thing to note is that if
you set the appropriate value of work_mem in the session that runs the
parallel query, the same will be used by parallel workers.
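
In other words, something like this (a sketch; the query and the 256MB value are arbitrary assumptions):

```sql
-- Sketch: parallel workers inherit the leader session's work_mem,
-- so raising it only in this session limits the blast radius.
SET work_mem = '256MB';
SELECT count(*) FROM big_table;  -- hypothetical query that gets a parallel plan
RESET work_mem;
```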

~~~
humanfromearth
The problem I see with this approach is that you don't know when the planner
is going to decide to use parallel scans, meaning that you might get worse
performance for non-parallel queries.

~~~
amitkapila
That's right, but you can manage it if you know which queries will get a
parallel plan (using EXPLAIN). You can set it before executing the query and
reset it once the query has finished. I understand that this can't be a
generic solution; it only helps in a limited set of cases.
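
The workflow described would look something like this (the query and value are hypothetical):

```sql
-- Check whether the planner chooses a parallel plan for this query.
EXPLAIN SELECT count(*) FROM big_table;
-- If the plan shows a Gather node, raise work_mem just for this run.
SET work_mem = '256MB';
SELECT count(*) FROM big_table;
RESET work_mem;  -- back to the conservative default
```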

------
mistrial9
Recent experience (yesterday) with Postgres 10 from the PGDG repo for Ubuntu:
reading indexes plus data from a 30GB database for analytics on a four-core i5
consumer motherboard with "spinning rust" disks, with parallel configs set in
postgresql.conf and shared_mem of about 6GB, the reads from disk are
consistently 30-40 percent faster for the larger tables, with no change in the
smaller tables (it is easy to see the parallel workers in htop). Great
feature from the PostgreSQL Global Development Group!
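
For reference, a minimal postgresql.conf fragment for this kind of setup might look like the following (the exact values used above aren't stated, so these are assumed; only the parameter names are standard):

```
# Assumed example values for enabling parallel query in Postgres 10.
max_worker_processes = 8              # total background worker slots
max_parallel_workers = 8              # of those, usable for parallel query
max_parallel_workers_per_gather = 4   # workers per Gather node
shared_buffers = 6GB                  # cache sized for a ~30GB database
```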

------
Tostino
Excited for this. I have quite a few queries which would benefit. The work has
been amazing over the past few years (and really even further back).

~~~
amitkapila
Thanks.

