
How to Manage Connections Efficiently in Postgres, or Any Database - bgentry
https://brandur.org/postgres-connections
======
sk5t
Good pre-reading to this article is Brett Wooldridge's (of HikariCP) article
on connection pool sizing:
[https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing](https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing)

An important consideration is that increasing the pool size many times above
the number of cores is very often counterproductive in the face of decreasing
IO latency. Also, there may often not be justification for anything other than
a constant-sized pool.
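
The HikariCP wiki linked above quotes a PostgreSQL-derived starting formula, `connections = (core_count * 2) + effective_spindle_count`. A minimal sketch of that heuristic (the function name and the example inputs are mine; measure your own hardware):

```python
def suggested_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Starting-point heuristic quoted in the HikariCP 'About Pool Sizing' wiki:
    (cores * 2) + effective spindles. A tuning starting point, not a hard rule."""
    return core_count * 2 + effective_spindle_count

# e.g. a 4-core box backed by a single SSD (treated as ~1 spindle):
print(suggested_pool_size(4, 1))  # 9
```

Note this suggests a small constant-sized pool, consistent with the comment above.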

~~~
Serow225
Yes this is a great article, thanks for reminding me of it.

------
Mizza
PgBouncer is basically a requirement of any production Postgres system, but
I've always wondered why it simply isn't part of the Postgres package itself.
For instance, it means a dedicated EC2 instance is necessary to get the most
out of AWS RDS, even though the whole point of RDS is to avoid managing that
server.

~~~
akurilin
We were big proponents of pgbouncer for years, but having one extra, sometimes
capricious actor in our architecture ended up not being worth it. Especially
if you're IO-heavy like we are, and that Web App -> DB pipe is the hottest
path in the entire system.

Removing it altogether made monitoring and debugging performance issues
easier, and client-side pools were essentially "good enough" for our purposes.
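
The commenters here use Haskell's persistent, but the client-side pooling idea can be sketched with a stdlib queue; `connect` below is a stand-in for your driver's connect call:

```python
import queue

class SimplePool:
    """Minimal client-side connection pool: hand out already-open
    connections instead of opening a new one per request."""
    def __init__(self, connect, size):
        self._q = queue.Queue()
        for _ in range(size):          # open all connections up front
            self._q.put(connect())

    def acquire(self, timeout=5):
        return self._q.get(timeout=timeout)  # blocks if the pool is exhausted

    def release(self, conn):
        self._q.put(conn)

# Usage with a stand-in "connection" factory:
pool = SimplePool(connect=lambda: object(), size=3)
conn = pool.acquire()
try:
    pass  # run queries on conn here
finally:
    pool.release(conn)
```

Real pools (psycopg2's `pool` module, SQLAlchemy's QueuePool, persistent's) add health checks, recycling, and overflow on top of this same shape.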

~~~
Mizza
Were you managing your own DB server instance? I don't think client side pools
are possible with RDS.

~~~
akurilin
We're on RDS right now. We use connection pools from
[https://www.yesodweb.com/book/persistent](https://www.yesodweb.com/book/persistent)

~~~
Mizza
Ohhh, I understand. Long day. I got that backwards; I thought there was a
server-side option I wasn't aware of. So you don't have a dedicated server
with PgBouncer on it, you just move the pooler to where the requests
originate. I think this might be problematic in a highly elastic/distributed
architecture, but it would work if you have a beefy dedicated application
instance.

~~~
akurilin
Correct, you'd have to instruct your pools to resize as machines are added and
removed. We haven't had to do that so far, as we're not that dramatically
elastic.
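
One way to frame that resizing: hold the server-side connection budget fixed and divide it across application instances. A sketch with made-up numbers (the function, the reserve of 5, and the budget of 100 are all illustrative):

```python
def per_node_pool_size(server_budget: int, node_count: int, reserved: int = 5) -> int:
    """Divide a fixed server-side connection budget (e.g. Postgres
    max_connections) across app nodes, holding back a few connections
    for admin/superuser access."""
    usable = server_budget - reserved
    return max(1, usable // node_count)

# With max_connections=100 and 5 held back, 4 app nodes get 23 each:
print(per_node_pool_size(100, 4))  # 23
```

As nodes are added or removed, each instance's pool would be reconfigured with the new result.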

------
jjirsa
“Or any database”, except it’s really driven by postgres’ process model, and
definitely doesn’t apply to most distributed databases

------
karlmdavis
This is a fine basic intro to the topic, but the whole argument is hung on a
very sketchy benchmark (as noted by the article itself in a footnote).

Some things that'd help improve the argument:

1. Account for table/row locking in the benchmark: is it a factor or not?

2. Re-run the benchmark for some of the proposed solutions, e.g. local
connection pooling and PgBouncer.

~~~
aidos
I didn't know about this global locking, but I'm assuming the row/table
locking is taken into account in the testing. Presumably inserting into a
simple table with explicit unique ids each time should avoid locking issues
(I guess you'd also want to avoid indexes and sequences).

------
coleifer
This trips up a lot of newbies who have no concept of connection management or
scope. I routinely get questions about it in regard to the ORM library I
wrote, which of course includes a connection pool.

Another thing to be aware of with postgres is that connections in an error
state (for example, due to trying to insert a row that violates a constraint)
must be rolled back before they can be recycled.
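
That Postgres behavior is worth internalizing: once a statement fails, the connection refuses further commands until you roll back (with newer psycopg2 this surfaces as `psycopg2.errors.InFailedSqlTransaction`). The sketch below uses stdlib sqlite3 so it stays runnable without a server, but the try/rollback recycle pattern is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

try:
    conn.execute("INSERT INTO t VALUES (1)")  # violates the PRIMARY KEY
except sqlite3.IntegrityError:
    conn.rollback()  # clear the failed transaction before reusing the connection

# The connection is usable again after the rollback:
conn.execute("INSERT INTO t VALUES (2)")
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2
```

A pool that skips the rollback step hands the next caller a poisoned connection, which is exactly the failure mode the comment describes.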

------
lixtra
I wonder how alfresco ends up with such a huge recommended default pool size
of 275 [1]. Is this bad architecture on their part or are different principles
at play?

[1] [https://docs.alfresco.com/5.0/tasks/postgresql-config.html](https://docs.alfresco.com/5.0/tasks/postgresql-config.html)

------
analogkid
Nice article, but it doesn't actually demonstrate that connection setup and
teardown is the bottleneck. If you really have 500 active connections all
"shouting" at the database simultaneously, whether those connections were
obtained from a pool or not isn't going to change the throughput of the
database.

Pooling introduces numerous other issues as well. For example, clients
probably shouldn't all be connecting with the same entitlements, right?
Surely there's some segregation of duties, with different clients using
different roles to access the system?

Finally, you have to show some before-and-after results. This article simply
asserts that connection pooling would improve the situation, but does nothing
to demonstrate the point. Even the details of the server matter: how many
CPUs, how many drives, HDD or flash, what file system is being used?

~~~
hinkley
I don’t have the numbers but I do know we are facing this problem at work.
When you have microservices that are supposed to respond in <30 ms,
connection setup time becomes noteworthy.

Ironically we exacerbated the symptoms by reducing average calls per page
below 1.0. The burstiness of the traffic made it clear that the low water mark
for the connection pool was not working out.

------
zzzeek
note that while the MySQL connection-pool equation is pretty different from
Postgresql's, at least in my experience MySQL connections use less memory
(sorry, no measurements handy, feel free to correct me, but they are
generally handled as a thread rather than an entire subprocess, and we
regularly see apps that spin up thousands of them fairly cheaply) and they
also have a lower startup time (also anecdotal, sorry). The MySQL world does
finally have middleware connection pools now, such as proxysql:
[https://proxysql.com/](https://proxysql.com/)

~~~
mbell
There isn't much practical difference. MySQL's memory cost is lower, largely
due to using a thread per connection rather than a process, so you can have a
larger number of connections before running out of memory. But this really
doesn't matter since by the time you run out of memory with PG or MySQL you'll
probably have run into a situation where performance has tanked due to the
amount of context switching going on. In both cases you likely want something
like `num_cpu_cores * 2..3` connections to the database for optimal
performance.
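
The `num_cpu_cores * 2..3` rule of thumb from the comment above, as a runnable one-liner (the multiplier is a tuning knob, not a hard rule):

```python
import os

def pool_size(multiplier: int = 2) -> int:
    """Cores times 2..3, the heuristic suggested above."""
    cores = os.cpu_count() or 1   # cpu_count() can return None
    return cores * multiplier

print(pool_size())   # lower end of the range, e.g. 16 on an 8-core machine
print(pool_size(3))  # upper end of the range
```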

~~~
zzzeek
it is different, because if you have 100 processes all talking to the
database, they can safely each have an application-local connection pool
holding onto a dozen connections that are probably largely idle, but are
ready in a flash. Context switches aren't necessarily a problem because the
database probably sees 5-10% of these connections receiving queries at a
given moment. Also, mariadb has a thread-pool feature
([https://mariadb.com/kb/en/library/thread-pool-in-mariadb/](https://mariadb.com/kb/en/library/thread-pool-in-mariadb/))
that allows a significant reduction in threads for a given number of
connections.

Contrast that with Postgresql, where none of the above is an option: every
connection is a process, and you don't want thousands of those piled up.
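
The arithmetic in that comment, made explicit (all numbers are the commenter's illustrative ones):

```python
processes = 100          # application processes
pool_per_process = 12    # "a dozen connections" each
active_fraction = 0.10   # "5-10% ... receiving queries at a given moment"

total_connections = processes * pool_per_process
active = int(total_connections * active_fraction)

print(total_connections)  # 1200 mostly-idle connections
print(active)             # 120 actually running queries at a given moment
# Cheap for thread-per-connection MySQL; for process-per-connection
# Postgres that's 1200 server processes, which is the objection above.
```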

------
CWuestefeld
Does anyone have an understanding of whether, and how much, this applies to
AWS's Aurora service when operating as a PostgreSQL database?

~~~
stephengillie
The Aurora handbook strongly recommends using these techniques, or using a
connection proxy if connection pools won't work for your situation. Having
too many connections can also cause your Aurora cluster to scale out,
limiting database performance and leaving you paying for far more instances
than you want.

[https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-connection-management-handbook.pdf](https://d1.awsstatic.com/whitepapers/RDS/amazon-aurora-connection-management-handbook.pdf)

~~~
CWuestefeld
Thanks. Your link is for MySQL, but I found a little related info for
PostgreSQL:
[https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html)

------
bulatb
Those diagrams look really nice. Does anyone know how they were made?

~~~
craigkerstiens
The diagrams were done with monodraw -
[https://monodraw.helftone.com/](https://monodraw.helftone.com/)

~~~
MSM
Seconded, these diagrams look really nice and clean. Does anyone happen to
know if there's an alternative that isn't Mac-only?

~~~
eddyg
Have a look at draw.io
([https://about.draw.io/features/](https://about.draw.io/features/)) which can
make a lot of fantastic diagrams. Be sure to look at the "More shapes..."
options and choose the libraries that are of interest to you. Free and
open-source.

------
bullen
You need to use non-blocking async clients.

And you should probably use HTTP too.

All database clients are blocking, which kills the CPU with IO-wait, and most
of them use some binary protocol.

Here is the world's first and only async parallel HTTP app server and
HTTP/JSON database:

[https://github.com/tinspin/rupy](https://github.com/tinspin/rupy)

