
Vertically Scaling PostgreSQL - pplonski86
http://pgdash.io/blog/scaling-postgres.html
======
bshipp
I administer a medium-sized postgres db, now around 110GB of data, and
recently had the opportunity to revisit the old .conf file when I redeployed
that database to a different server. What I quickly realized was that years of
tweaks I had implemented as I learned postgres from various (now outdated)
online sources were making it difficult to optimize my new instance.

I decided to start again from scratch. I discovered a handy little utility
called pghero and set up its internal monitoring:

[https://github.com/ankane/pghero](https://github.com/ankane/pghero)

I backed up my old .conf, replaced it with the defaults, blew away all my old
indices, and fired up the database. Of course, nothing worked well. But step
by step, working with postgres' internal tools and pghero, I took a brand new
look at every previous assumption and was much more strategic about where and
when I created an index and which tuning parameter I tweaked.

Long story short, I was able to knock 30 GB off the on-disk size (thanks, in
large part, to removing unnecessary duplicate indices), accelerate the most
common queries, and everything ran a lot more smoothly. Plus, my postgres
installation was no longer fighting old .conf tuning tweaks which were
designed for much older versions of postgres.
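For anyone wanting to hunt down duplicate indices the same way, a query along
these lines against the pg_index system catalog lists indexes that cover the
exact same table and column list (it won't catch near-duplicates with
different orderings or predicates, but it's a cheap first pass):

```sql
-- Indexes that duplicate another index's table + column list exactly
SELECT indrelid::regclass              AS table_name,
       array_agg(indexrelid::regclass) AS duplicate_indexes
FROM   pg_index
GROUP  BY indrelid, indkey
HAVING count(*) > 1;
```

Each row returned is a set of indexes where all but one are candidates for
dropping, pending a check of uniqueness constraints and index types.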

Just adding my vote for revisiting your old tuning parameters after your
database (and its programmer/administrator) have matured a bit.

~~~
joeyrobert
This is a common operation for my team as well. In my old project, many of the
indexes that were added were based on mere assumptions about the usage pattern
of the data. Not only did they blow up the size of the table and INSERT time,
they also sent developers down the wrong path: people would stop investigating
and look elsewhere because the table "already has indexes".

Only adding indexes after validating expected usage patterns with
EXPLAIN/EXPLAIN ANALYZE yielded good results. Often we'd just blow the indexes
away and fiddle until we added the right one to enable an index scan or
index-only scan.
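A sketch of that workflow (the table, columns, and query here are made up for
illustration):

```sql
-- 1. Run the real query under EXPLAIN ANALYZE and look at the plan
EXPLAIN ANALYZE
SELECT order_id, total
FROM   orders
WHERE  customer_id = 42
ORDER  BY created_at DESC;

-- 2. If the plan shows a Seq Scan, add an index matching the predicate
--    and sort order, then re-run the EXPLAIN to confirm an Index Scan
CREATE INDEX orders_customer_created_idx
    ON orders (customer_id, created_at DESC);

-- 3. Covering the selected columns too can upgrade it to an
--    Index Only Scan (INCLUDE requires PostgreSQL 11+)
CREATE INDEX orders_customer_created_covering_idx
    ON orders (customer_id, created_at DESC) INCLUDE (order_id, total);
```

The point is to let the planner's actual output, not assumptions, decide which
index earns its INSERT-time and disk cost.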

Other big boons for us have been using ENUM types where appropriate (a small,
known set of values for a column) -- now your column takes 4 bytes instead of
N bytes for a string. I find them a bit easier to work with than foreign keys
for this optimization because of their direct string mapping.
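A minimal sketch of the idea (type and table names are hypothetical):

```sql
-- A small, known set of values becomes an enum type
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered', 'cancelled');

CREATE TABLE orders (
    id     bigserial    PRIMARY KEY,
    status order_status NOT NULL      -- stored as 4 bytes, not the full string
);

-- Ingestion maps the string value transparently, no lookup table needed
INSERT INTO orders (status) VALUES ('shipped');
```

One caveat worth knowing: adding values later requires ALTER TYPE ... ADD
VALUE, so enums fit best when the value set really is stable.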

We've had really good mileage with Datadog's APM tools, RDS performance
insights and pghero as you mentioned.

~~~
Jedd
> Other big boons for us have been using ENUM types where necessary (small
> known dataset for a column) -- now your column takes 4 bytes instead of N
> bytes for a string. I find them a bit easier to work in than foreign keys
> for this optimization because of their direct string mapping.

Can you give an example?

It sounds like normalising the schema would have similar (perhaps more)
benefits.

~~~
joeyrobert
Normalizing has similar size benefits. For our use case (big aggregate
reporting tables), the incoming external data is not normalized, meaning we'd
need another process to iterate over it with the foreign table in memory and
map each value to its foreign key. Enums can be ingested transparently without
this normalization step, while taking equivalent space. Normalizing also adds
one level of indirection between the column and its value. Both approaches are
fine.
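For contrast, the normalized route might look like this (hypothetical schema);
the lookup at ingestion time is the extra level of indirection:

```sql
-- Small reference table holding the known values
CREATE TABLE statuses (
    id   int  PRIMARY KEY,
    name text UNIQUE NOT NULL
);

CREATE TABLE orders (
    id        bigserial PRIMARY KEY,
    status_id int NOT NULL REFERENCES statuses (id)  -- also 4 bytes
);

-- Each incoming string must be resolved against the reference table
INSERT INTO orders (status_id)
SELECT id FROM statuses WHERE name = 'shipped';
```

Same on-disk footprint as the enum, but ingestion needs that join (or an
in-memory map), which is exactly the step enums let you skip.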

------
tyingq
_" In Linux, the most popular host for Postgres, the size of the OS disk cache
cannot be set by the user."_

True, though you can tweak when the cache writes dirty pages back to disk.
Percona has a blog post specific to Postgres that covers this, along with some
other tuning not mentioned in the linked article.

[https://www.percona.com/blog/2018/08/29/tune-linux-kernel-
pa...](https://www.percona.com/blog/2018/08/29/tune-linux-kernel-parameters-
for-postgresql-optimization/)
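For example, the dirty-page writeback behavior is controlled by the vm.dirty_*
sysctls that the Percona post discusses. A sketch of an /etc/sysctl.conf
fragment (the values here are illustrative, not recommendations -- the right
numbers depend on RAM, storage, and workload):

```
# Start background writeback once 64 MB of pages are dirty,
# and throttle writers at 512 MB, instead of the ratio-based defaults
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 536870912
```

Apply with `sysctl -p` and watch checkpoint/fsync latency before and after;
smaller thresholds smooth out write bursts at the cost of more frequent I/O.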

------
michelpp
There is an interesting tool called nancy that runs automated experiments with
different postgres setups, allowing you to compare them against each other. It
can run workloads derived from log files or pgbench (disclaimer, I wrote the
pgbench support). The idea is, you run many experiments, tweaking different
configurations and nancy automates the analysis of them, eventually allowing
you to settle into a cost minimum/performance maximum you're comfortable with.

[https://github.com/postgres-ai/nancy](https://github.com/postgres-ai/nancy)

------
natmaka
Beware: effective_cache_size is NOT, as stated, "the amount of disk cache
provided by the OS".

It is a combination of shared_buffers and the kernel cache size. See
[https://www.postgresql.org/message-
id/20181105120104.GA13769...](https://www.postgresql.org/message-
id/20181105120104.GA13769%40momjian.us)

This important configuration parameter is too often badly understood. IMHO it
should be periodically and dynamically calculated by the engine, at least on
servers not dedicated to PG.
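To make that concrete, a postgresql.conf fragment for a hypothetical machine
with 64 GB of RAM dedicated to PostgreSQL (numbers are illustrative only):

```
shared_buffers       = 16GB   # memory PostgreSQL manages itself
effective_cache_size = 48GB   # planner hint: shared_buffers PLUS the
                              # kernel page cache expected to hold PG data
```

Note that effective_cache_size allocates nothing; it only tells the planner
how much cached data to assume when costing index scans, which is why setting
it to just the OS cache (excluding shared_buffers) undersells the machine.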

------
ps
Check out pgtune
([https://pgtune.leopard.in.ua/#/](https://pgtune.leopard.in.ua/#/)) if simple
config tuning is what you are looking for. It is probably more valuable than
this blog post.

~~~
dewey
> It is probably more valuable than this blog post.

Learning about what goes on under the hood before you blindly change settings
some tool spits out doesn't seem like a bad idea I'd say.

~~~
ps
The point is that pgtune can, and should, push you to check the config options
in the documentation, and it actually addresses your current setup. RTFM.

The values in the post are as arbitrary and useless as the comments in
postgresql.conf used to be.

I have been using PostgreSQL in production since 7.4, and the best source of
performance tuning tips for me has been PostgreSQL 9.0 High Performance by
Gregory Smith. These days the PostgreSQL wiki is pretty good as well.

~~~
rch
I'd second that book recommendation. I just went through it again on a long
plane flight, and much (if not all) of the configuration advice applies to
more recent releases as well.

------
manigandham
Sounds like many of these should be automatic by now. This is one area where
I'd like to see some progress in Postgres.

------
ahachete
Feel free to check [https://postgresqlco.nf](https://postgresqlco.nf), a
humble effort to document all PostgreSQL configuration parameters and provide
some guidance.

