
Costs of a PostgreSQL connection - wlll
http://hans.io/blog/2014/02/19/postgresql_connection/index.html
======
smilliken
> We were running out of file handles which caused the database to crash

Most distributions ship with conservative defaults for the maximum number of
concurrent file descriptors and socket connections. Especially for a database
server, it's wise to tune these values before optimizing anything else.

> 1000 files per connection * 370 = 370,000 files

This is a very pessimistic estimate; in reality you'll need far fewer than
1000 file descriptors per connection. As long as you tune your file descriptor
limit, you should be able to handle one or two orders of magnitude more
connections than this.
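As a sketch, raising those limits on a typical Linux box involves two files (the file paths are standard, but the values here are illustrative, not taken from the thread):

```
# /etc/sysctl.conf -- system-wide cap on open file descriptors
fs.file-max = 500000

# /etc/security/limits.conf -- per-process limit for the postgres user
postgres  soft  nofile  65536
postgres  hard  nofile  65536
```

Apply the sysctl change with `sysctl -p`, and check the per-process limit with `ulimit -n` from a fresh postgres login session.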

A few resources that might help:

[http://serverfault.com/questions/92723/file-descriptors-setting-on-etc-sysctl-conf](http://serverfault.com/questions/92723/file-descriptors-setting-on-etc-sysctl-conf)

[http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/](http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/)

[http://stackoverflow.com/questions/2569620/socket-accept-error-24-to-many-open-files](http://stackoverflow.com/questions/2569620/socket-accept-error-24-to-many-open-files)

~~~
_hans_
Thank you. We configured the system to accept more file descriptors. I think
our issue was caused by something else.

------
lazyjones
* work_mem is not used per connection. It _may_ be used for some queries. Same for temp_buffers.

What's missing here is information about the Postgres version used.

Also, if 3.5GB of memory hurt you on a database server with 370 connections,
you need to think about buying a DB server with more RAM (our main DB server
has 512G and typically ~220 connections, we don't sweat over it and don't use
PGBouncer to add latency).

~~~
_hans_
Thanks for your comment!

Re work_mem: I think this is per query and thus might affect every
connection?! At least that's what I gather from 'PostgreSQL 9.0 High
Performance'.

Re temp_buffers: 'Sets the maximum number of temporary buffers
used by each database session.' For us it practically means per connection.
Not for you?

Shit, I'm missing the version! 9.3. I edited the blog post.

Re memory: I think that really depends on the application. We have
many idle connections.

~~~
lazyjones
> _Re temp_buffers: 'Sets the maximum number of temporary buffers used by each
> database session.' For us it practically means per connection. Not for you?_

It means that potentially every connection may at some point use this amount
of memory, but idle connections will not. If you have 20 busy connections and
350 idle ones, at most 20 x temp_buffers will be used and there is no benefit
from PGbouncer in this regard. If you have 370 busy connections, all of them
might use temp_buffers and your tradeoff can be 370 concurrently executing
queries at higher memory cost vs. e.g. 20 concurrently executing queries via
PGbouncer at lower memory cost and 350 stalled queries with potentially high
latency.
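To make that tradeoff concrete, here's a rough back-of-the-envelope sketch in Python (the work_mem and temp_buffers values are hypothetical, not from the thread; real usage depends entirely on the queries):

```python
# Back-of-the-envelope worst-case memory for per-connection buffers.
# Assumed (hypothetical) settings: work_mem = 4 MB, temp_buffers = 8 MB.
WORK_MEM_MB = 4       # per sort/hash operation; a query may use several
TEMP_BUFFERS_MB = 8   # cap per session that touches temporary tables

def worst_case_mb(busy_connections):
    # Pessimistic: every busy connection holds one work_mem allocation
    # plus its full temp_buffers quota at the same moment.
    return busy_connections * (WORK_MEM_MB + TEMP_BUFFERS_MB)

print(worst_case_mb(370))  # 370 busy connections -> 4440 MB
print(worst_case_mb(20))   # 20 busy connections behind a pooler -> 240 MB
```

Idle connections contribute nothing to either term, which is the point being made above.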

> _We have many idle connections._

In this case, you will not have any issues with memory use from temp_buffers,
work_mem ...

~~~
_hans_
Yes, you're right, idle connections do not use temp_buffers or work_mem.

I was trying to gather everything relevant to resources per connection. I'm
not saying work_mem caused our problem.

------
arielweisberg
I have always found it odd how some databases conflate network connections
with threads (or processes) of execution. If Postgres has an ideal number of
concurrent workers it should instantiate that many and multiplex requests from
various connections across them which has several benefits including allowing
you to preemptively shed requests you know you won't get to in time and do
other scheduling and QoS stuff.

I get it from a historical perspective where it would have been needless
complexity to implement something else.

On an unrelated note it's pretty amazing what threads get conflated with in
general.

~~~
wulczer
One of the important reasons Postgres connections are serviced by separate
processes is that if one of those processes segfaults, you only disconnect the
other connections (because it might have tainted shared memory when it died),
but the main database process keeps running and you can reconnect immediately.

~~~
adamtj
Not quite. If a backend segfaults, postgres will detect it and immediately
terminate all connections and exit. When it restarts, it will enter recovery
mode. It does this because a segfaulted backend may have corrupted shared
memory. The only way to fix that is to restart in recovery mode and replay the
WAL.

------
AznHisoka
For some reason, Rails doesn't behave well with pgbouncer or any other pg
connection pool. So I really have no choice but to set my max connections to
1900 and pray that it's enough. Surprisingly, my 48 GB RAM, quad-core CPU,
non-SSD server can handle that load, although the hardware will probably
degrade after a certain time (oh well _shrugs_ )

~~~
_hans_
We enabled PGBouncer transaction mode and had to set 'prepared_statements:
false' in our database.yml b/c the transaction mode doesn't support that.
Works fine for us.

~~~
gkop
What are the implications of 'prepared_statements: false'? It sounds really
bad.

~~~
justincormack
Prepared statements are not a massive performance gain in postgres so I
wouldn't worry.

~~~
nousernamesleft
If done right they can be huge. The fact that they do nothing when you just do
"prepare" followed immediately by "execute" in place of just running the query
directly is not a problem with postgresql or prepared statements, it is a
problem of doing something dumb. For simple selects the parsing and planning
stage can be 90% of the time spent. The problem is that postgresql doesn't
have stored procedures, so you have to make sure every connection is
initialized by running all of your "prepare"s before it is made available to
the pool.
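A minimal illustration of that pattern (the table and statement names are hypothetical): the PREPARE runs once per session, so the parse and plan cost is paid once rather than on every query.

```sql
-- Run once when the connection is initialized (e.g. by the pool):
PREPARE get_foo (integer) AS
    SELECT * FROM foo WHERE id = $1;

-- Run per request; skips the parse and plan stages:
EXECUTE get_foo(42);
```

Prepared statements are scoped to the session, which is why every pooled connection must run its PREPAREs before being handed out.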

~~~
dbenhur
You can get detailed timing data about parse, plan, execute timings from
postgresql logging. Parsing is almost always trivial. Planning can get
expensive for complex queries with many joins (large space of solutions to
explore). For simple queries there's almost no benefit for prepared statements
in postgres. Prepared statements are a maintenance headache and don't play
well with session multiplexed connection pooling (like PgBouncer); generally
best to avoid them unless you have measured a concrete and significant
benefit.

~~~
nousernamesleft
> _You can get detailed timing data about parse, plan, execute timings from
> postgresql logging. Parsing is almost always trivial._

And yet you can get >50% speedups for super common queries like "select * from
foo where id = ?" if you prepare it once and then use execute instead of just
running the select every time. Seems like maybe you're making assumptions you
shouldn't.

------
robbles
If pgbouncer makes this big a difference to performance, is it included in
managed PostgreSQL services, such as Amazon RDS and Heroku Postgres?

Or would you have to roll your own pgbouncer setup for these systems?

~~~
_hans_
Neither AWS RDS PostgreSQL nor Heroku has a PGBouncer.

~~~
btown
Heroku actually does seem to support/encourage it:

[https://postgres.heroku.com/blog/past/2013/11/22/connection_...](https://postgres.heroku.com/blog/past/2013/11/22/connection_limit_guidance/)
[https://github.com/gregburek/heroku-buildpack-pgbouncer](https://github.com/gregburek/heroku-buildpack-pgbouncer)

~~~
robbles
Saw that reference in the linked post from Heroku in the original article.

Has anyone used this? Is it configurable, or do they just choose some
reasonable defaults?

~~~
thruflo
Works great for me. There's some minimal configuration explained in the
pgbouncer docs. What the pgbouncer buildpack does is just route your
DATABASE_URL through the local pgbouncer.
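For reference, a minimal transaction-mode pgbouncer.ini along those lines (the host, database name, and pool sizes are illustrative, not from the thread):

```ini
[databases]
; route the app's DATABASE_URL target through the local pgbouncer
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction   ; matches the transaction-mode setup discussed above
default_pool_size = 20
max_client_conn = 700
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```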

In my python app, I switched from an sqlalchemy connection pool per process
(e.g. per gunicorn worker) to using a NullPool everywhere. No increase in
latency; far fewer connections needed.

------
Spearchucker
Would be nice to see a little more detail on the clients that need hundreds of
connections. I have no experience with PostgreSQL, but do get around this in
other RDBs through a mix of async calls, caching and optimising data
structures and application logic (meta-data driven schema; indexing content in
the app before a db write to eliminate having to retrieve that content to
search it, or indexing it using the db).

------
ocharles
If you only have `2 * n cores` connections, doesn't that mean that you can at
most have `2 * n cores` transactions running at once? I suppose on larger
servers that is fairly reasonable (as you want these transactions to be very
short lived anyway), but it does seem like you don't give yourself much wiggle
room.

------
tbarbugli
hard to believe there are still people not using pgbouncer in production :)

~~~
nousernamesleft
Not everyone runs some PHP damaged setup where connection pooling has to be
pushed out to a proxy. Some people use sane applications in the first place.
There is no reason for those people to add an extra layer of latency and
potential failure to achieve what they already have.

~~~
yummyfajitas
True - it's only application insanity preventing a distributed collection of
database clients from properly coordinating the proper allocation of postgres
connections amongst themselves.

Now true, in the event of a network partition, you lose the ability to open a
database connection. But that's a small price to pay, right?

~~~
teraflop
I guess this is sarcasm, but it's not clear what you're trying to actually
say. How is a connection-pooling proxy supposed to make an application more
resilient to network partitions? If anything, it's worse, since you have twice
as many points of failure -- if either the application can't talk to the
proxy, or the proxy can't talk to the database, you're hosed.

~~~
yummyfajitas
Practically speaking, you typically run pgbouncer on the same box as postgres.

Even if you ran pgbouncer on a different box, the situation is still better.
Suppose you have a probability P of having a partition between any two boxes.
Then the probability of a partition taking down the system with pgbouncer is
P.

If you instead had N clients coordinating among themselves how many
connections to use, your odds go way up. The probability of a partition
existing between some pair of clients is 1-(1-P)^(N(N-1)/2) ~= N(N-1)P/2 (for
very small P). Once two clients can't communicate, neither one can safely
connect to the DB (because they don't know whether the others have pushed the
db over the limit). All clients will need to stop connecting to the DB if any
client is ever disconnected from the other N-1 of them.

In a distributed system, ensuring that SUM(x[i]) remains bounded for all time
is a tricky problem to solve.
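The arithmetic above can be checked numerically (P here is an assumed per-link partition probability, purely illustrative):

```python
# Probability that at least one of the pairwise links is partitioned.
def p_any_pair_partitioned(P, N):
    pairs = N * (N - 1) // 2          # number of client pairs
    return 1 - (1 - P) ** pairs

P = 0.001  # assumed per-link partition probability (illustrative)
print(p_any_pair_partitioned(P, 2))   # one link, e.g. app -> pgbouncer: ~0.001
print(p_any_pair_partitioned(P, 50))  # 50 coordinating clients: roughly 0.71
```

With pgbouncer, only one link matters; with 50 mutually coordinating clients there are 1225 links, so even a tiny per-link failure probability compounds into a likely stall.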

~~~
saurik
> Practically speaking, you typically run pgbouncer on the same box as
> postgres.

I've done this, long-term, in production. This is not how PGbouncer is
designed to be used, and it causes serious problems under load. (I am still
doing this, but I have it on my todo list to remove this glitch from my
architecture, as it caused my entire site infrastructure to fail once during a
launch.) Put simply: PGbouncer is single-threaded; it is not difficult to end
up in a situation where the one core PGbouncer runs on is pegged at 100%
while the remaining cores (in my case, seven) running PostgreSQL have no
trouble with the traffic.

~~~
yummyfajitas
Just curious, how does that happen? I've never found myself in a situation
where PGBouncer can't handle the load but Postgres can.

I'm guessing it's a situation where you are running a large volume of very
easy queries, for example primary key lookups with no joins?

~~~
_hans_
We do have this problem! It is only a matter of how small the PGBouncer
machine is compared to the PostgreSQL machine. In our case the PGBouncer had
to deal with 16,000 req/s coming from 700 clients, funneled through 100
connections to PostgreSQL. We now have 4 PGBouncers for this DB, and we're
'load balancing' them with DNS.

------
zzzeek
tl;dr of the tl;dr: use a connection pool.

------
jmnicolas
Now that you've fixed your Postgres, could you fix the reminders in Wunderlist,
pretty please?

It doesn't work (for me) on Android, so I've started using the app less and less.

