
About Database Connection Pool Sizing - zyngaro
https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing
======
commandlinefan
I did some profiling when I was working at one of the big online travel sites.
We were running about 300 or so servers with a 50-connection pool each. After
testing, we determined that at no time, ever, was more than one connection in
simultaneous use across the server farm. We could have reduced it to a single
shared connection per server and maintained the same QoS.

~~~
nextweek2
I've always worked with a transaction-per-connection principle, to save
having to manage transaction IDs within a connection.

Were you not using transactions or did you have some other approach?

------
dlubarov
Why is that Oracle instance using a max thread pool size of 2048 on a 12-core
host? That seems like the real problem. Why not configure a smaller thread
pool like cores * 2, and do more queuing on the server?

It seems wrong IMO to expect _clients_ to tune connection pools based on how
much concurrency they think a server can productively handle. The client
doesn't necessarily know how many cores the server has, or more importantly,
how many other clients are connected.

In HTTP/2, clients can configure a maximum number of streams
(SETTINGS_MAX_CONCURRENT_STREAMS), and the RFC recommends setting it "no
smaller than 100". If servers worked on all pending requests concurrently, a
value of 100 would be unreasonably large. But the implicit expectation is that
servers will decide for themselves how many requests to process concurrently,
and queue the rest. In other words, SETTINGS_MAX_CONCURRENT_STREAMS is
intended to limit the server's queue size, not its thread pool size.
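The server-side pattern described here (work on a bounded number of requests concurrently, queue the rest) can be sketched with a thread pool; this is a generic illustration of the queueing idea, not tied to any particular HTTP/2 implementation:

```python
import concurrent.futures
import os

# Accept many "streams", but only process cores * 2 of them at once;
# the executor's internal queue holds the rest until a worker frees up.
worker_count = (os.cpu_count() or 1) * 2
executor = concurrent.futures.ThreadPoolExecutor(max_workers=worker_count)

def handle_request(request_id):
    # Placeholder for real request processing.
    return f"done {request_id}"

# 100 streams submitted at once; only worker_count run concurrently.
futures = [executor.submit(handle_request, i) for i in range(100)]
results = [f.result() for f in futures]
```

The client's stream limit caps how much can sit in that queue; the server's worker count stays its own decision.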

~~~
derefr
The SQL "protocol"† is a hairy one for a backend to deal with, state-wise.
Each concurrent connection to an (ACID conformant) SQL-speaking server implies
that that connection can have a transaction in progress.

A transaction, in SQL, requires the ability for the client (or the backend) to
roll it back at any time before it gets committed—and thus, a transaction in
progress requires keeping around at least two "world states" of the DB engine:
the one before the transaction began, and the way the world looks after the
current statement's modifications—but not those of any other concurrently-
executing transaction.

Thus, a backend concurrent connection (i.e. transaction) cap of N, translates
to O(N) concurrent MVCC "world states" that the DB engine has to keep track
of. No matter how efficient the DB engine is at doing that, that's still a lot
of state!
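As a loose illustration of the point (a toy model, not how any real engine implements MVCC): each open transaction pins its own snapshot of the committed state plus a set of local writes, so N open transactions imply O(N) retained world states:

```python
import copy

class ToyMVCC:
    def __init__(self):
        self.committed = {}   # the current committed world state
        self.open_txns = {}   # txn_id -> (snapshot, local_writes)

    def begin(self, txn_id):
        # Pin a private copy of the committed world for this transaction.
        self.open_txns[txn_id] = (copy.deepcopy(self.committed), {})

    def write(self, txn_id, key, value):
        self.open_txns[txn_id][1][key] = value

    def read(self, txn_id, key):
        # A transaction sees its own writes, else its pinned snapshot.
        snapshot, writes = self.open_txns[txn_id]
        return writes.get(key, snapshot.get(key))

    def rollback(self, txn_id):
        del self.open_txns[txn_id]   # discard the uncommitted world

    def commit(self, txn_id):
        _, writes = self.open_txns.pop(txn_id)
        self.committed.update(writes)

db = ToyMVCC()
db.begin("t1"); db.begin("t2")
db.write("t1", "x", 1)
assert db.read("t2", "x") is None   # t2 can't see t1's uncommitted write
db.commit("t1")
```

Real engines keep per-row version chains rather than deep copies, but the per-transaction bookkeeping still exists for every open transaction.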

Imagine an RDBMS instance shared between numerous clients built and run by
different companies. This RDBMS is locked down in various ways to guarantee
each company an SLA on their queries. And, of course, the most important part
of that SLA is that a query that is ACKed by the server as having gone
through, must never lose data.

If you allow clients unlimited concurrency, they can execute enough
transactions to fill memory _and_ disk with MVCC world-states, causing
potential data loss (since the full disk may have prevented the DB engine from
writing down a _committed_ world-state.)

Thus, ACID-conformant SQL-speaking databases are always going to limit the
number of concurrent transactions, and so the number of concurrent
connections; and so—at least for SQL-speaking databases designed for instance-
addressable rather than cluster-addressable deployment—the responsibility for
deciding how much concurrency a client wants/needs is always going to be
pushed to the client, so that an enterprise with a DB cluster can set up its
less concurrency-intensive clients to self-limit their pool size.

† SQL isn't a wire protocol per se; it's just a syntax and semantics of a
formal language, without even a shared encoding. But the standard _does_
specify per-connection and per-transaction state-machines the backend must
have, and how these are affected by executing particular statements. So, in
this way, SQL is an abstract contract for the semantics that any particular
RDBMS's wire protocol must obey. A "meta-protocol", if you will.

~~~
dlubarov
That's a fair point -- most (all?) SQL servers use the connection limit as an
indirect transaction limit, rather than separating the two (in which case
BEGIN TRANSACTION would sometimes have to block or fail).

Are large numbers of concurrent transactions really an issue in practice
though? There may be some per-transaction overhead, but I haven't seen any
stern warnings about it, e.g. in the Postgres docs for max_connections.

> Thus, a backend concurrent connection (i.e. transaction) cap of N,
> translates to O(N) concurrent MVCC "world states" that the DB engine has to
> keep track of. No matter how efficient the DB engine is at doing that,
> that's still a lot of state!

I don't quite see what you mean about world state. An abort just rolls back
changes made by the aborted transaction, so I would expect SQL engines to
store a diff of each updated row. And if that's the case, it seems like a
single huge transaction could use as much memory as many small transactions,
if the same rows were modified.

~~~
anarazel
> Are large numbers of concurrent transactions really an issue in practice
> though? There may be some per-transaction overhead, but I haven't seen any
> stern warnings about it, e.g. in the Postgres docs for max_connections.

There are definitely scalability implications to higher numbers of
established connections in postgres. Essentially, computing a "snapshot"
(visibility information for queries etc.; it needs to be computed at least
once per transaction) is O(#established-connections). If you get into the
high hundreds on a large NUMA machine, and you run tiny fast transactions,
you can definitely observe the overhead of that.
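A rough sketch of why that computation scales with connection count, heavily simplified from how Postgres's proc array actually works: building a snapshot means scanning every connection slot, idle or not, to collect the set of in-progress transaction ids:

```python
# Each established connection holds a slot, whether or not it is
# currently running a transaction. Simplified illustration only.
def compute_snapshot(connection_slots):
    in_progress = set()
    for slot in connection_slots:   # O(N) in established connections
        xid = slot.get("current_xid")
        if xid is not None:
            in_progress.add(xid)
    return in_progress

slots = [
    {"current_xid": 101},   # connection in a transaction
    {"current_xid": None},  # idle connection: still scanned
    {"current_xid": 105},
]
assert compute_snapshot(slots) == {101, 105}
```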

> > Thus, a backend concurrent connection (i.e. transaction) cap of N,
> > translates to O(N) concurrent MVCC "world states" that the DB engine has
> > to keep track of. No matter how efficient the DB engine is at doing that,
> > that's still a lot of state!

There's definitely ways to handle that better than O(N), at least in the
average case.

~~~
derefr
> There's definitely ways to handle that better than O(N), at least in the
> average case.

O(N) root-level snapshot objects of the state, I mean. Like O(N) git commits.
The minimum size of the persisted MVCC snapshot _data_ probably just scales
with O(log N) of the number of connections, depending on what sort of data
structure you're using for copy-on-write'ing your snapshots.

~~~
anarazel
Most databases don't have "root level snapshots of the state" in the way you
appear to imagine them. Leaving trivial per-transaction information ('did this
transaction id commit', ~a few bits per xact) aside, usually it won't scale
with the number of snapshots, but with the amount of change done by writing
transactions.

------
wild_preference
The other half of this is correctly using the connection pool in the
application layer to begin with.

One of the most common performance errors I see in random http applications is
grabbing db connections for too long, like in middleware, or too aggressively
in parallel queries.
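The anti-pattern can be sketched with a hypothetical pool API (FakePool is invented here purely to make the slot-holding visible): checking a connection out for the whole request holds a pool slot during slow non-database work, while scoping the checkout to the query releases it immediately:

```python
import contextlib

class FakePool:
    def __init__(self):
        self.checked_out = 0
        self.max_checked_out = 0

    @contextlib.contextmanager
    def connection(self):
        self.checked_out += 1
        self.max_checked_out = max(self.max_checked_out, self.checked_out)
        try:
            yield self
        finally:
            self.checked_out -= 1

    def query(self, sql):
        return {"row": sql}

def render_slow_template(row, pool):
    # Stand-in for slow non-database work (rendering, RPCs, etc.).
    return pool.checked_out   # how many connections are held right now

def handle_request_bad(pool):
    with pool.connection() as conn:              # held for the WHOLE request
        row = conn.query("SELECT 1")
        return render_slow_template(row, pool)   # slot held while rendering

def handle_request_good(pool):
    with pool.connection() as conn:              # held only for the query
        row = conn.query("SELECT 1")
    return render_slow_template(row, pool)       # slot already returned

pool = FakePool()
assert handle_request_bad(pool) == 1    # connection held during rendering
assert handle_request_good(pool) == 0   # connection returned before rendering
```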

~~~
sb8244
This is so true and easy to overlook until a post mortem. A great exercise is
to audit how connections are maintained in your application, before it's too
late.

~~~
macintux
I'm reminded of a systems failure described in, IIRC, the O'Reilly book
"Release It!": a swallowed Java JDBC exception eventually resulted in running
out of connections.

Very tough problem to troubleshoot without extensive monitoring.
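The failure mode can be sketched with a hypothetical pool API (checkout/release are invented names for illustration): if the exception is swallowed between checkout and release, the connection leaks silently, while a finally block returns it even on error:

```python
class LeakyPool:
    def __init__(self, size):
        self.available = size

    def checkout(self):
        assert self.available > 0, "pool exhausted"
        self.available -= 1
        return object()

    def release(self, conn):
        self.available += 1

def flaky_query(conn):
    raise RuntimeError("query failed")

def run_query_leaky(pool):
    conn = pool.checkout()
    try:
        flaky_query(conn)
        pool.release(conn)      # never reached: the connection leaks
    except RuntimeError:
        pass                    # exception swallowed, leak goes unnoticed

def run_query_safe(pool):
    conn = pool.checkout()
    try:
        flaky_query(conn)
    except RuntimeError:
        pass
    finally:
        pool.release(conn)      # returned even when the query fails

pool = LeakyPool(size=2)
run_query_leaky(pool)
assert pool.available == 1      # one connection gone for good

pool = LeakyPool(size=2)
run_query_safe(pool)
assert pool.available == 2      # connection returned despite the error
```

Repeat the leaky version enough times and the pool is exhausted, with nothing in the logs to say why.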

~~~
fnord123
Release it! is Prag Press. Good book too!

------
zrail
Alternatively, and specifically for PostgreSQL, if you can live with
pgbouncer's constraints then you should be using it. It effectively self-
manages an optimal shared connection pool for all of the application processes
that connect to it.
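For reference, a minimal pgbouncer configuration in this spirit might look like the sketch below; the values are illustrative, not recommendations:

```ini
; Many app processes connect to pgbouncer; it multiplexes them onto a
; small shared pool of real server connections. Illustrative values only.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; release server conn at end of each transaction
default_pool_size = 20    ; real server connections per database/user pair
max_client_conn = 1000    ; app-side connections pgbouncer will accept
```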

~~~
dijit
I would like to dissent.

If your application is long lived you will get more benefit using the pooling
functionality provided by the postgresql client libraries for your
platform/language.

pgbouncer is an intermediary, and thus another thing to break; it has its own
constraints (mainly things like prepared statements blocking a
connection/not being reusable), and its application should be well understood.

For instance; I use it on a local socket on my zabbix master so that it can
connect using a TLS encrypted pgsql socket. (something zabbix doesn't
support). But for my main applications written in C++, I use connection
pooling with nothing but the C++ client libraries.

If you're bringing up and tearing down connections very often, then you will
get a lot of benefit from pgbouncer, but in my experience these cases are
limited.

~~~
sb8244
Coming from the ActiveRecord gem, pgbouncer has been great. I'd say the real
advantage is transaction level pooling, which will free up connections when
they're not actively in a transaction (vs request level isolation). I don't
see how this is possible across a large number of isolated servers without a
cooperating agent like pgbouncer.

I would advocate for pgbouncer at the DB ingress level and your language's
connection pooler at the application level.

------
exabrial
Take time to peruse the rest of the Wiki; it's chock full of performance
tuning techniques. Absolutely fascinating.

------
lixtra
See also [1]. The top comment there links to the article (2017).

[1]
[https://news.ycombinator.com/item?id=18220906](https://news.ycombinator.com/item?id=18220906)

------
bullen
Or you could use an async DB connection:
[https://github.com/tinspin/rupy/wiki/Fuse](https://github.com/tinspin/rupy/wiki/Fuse)

------
thezviad
This article would be better if it were called "connection pool sizing for
MySQL". If you had a better database, you would have to worry way less about
configuring pool size on the clients.

The proper way to do this would be for the server (i.e. database) itself to
have maximum active transaction limits and ways to setup quotas for different
use cases, especially as your company gets large, and you have many different
use cases mixed in the same database. Basically the queue should exist mostly
on the server, and clients shouldn't have to worry about overwhelming the
server. If server queue gets large, server should start rejecting requests
faster, and clients would do a backoff and retry with a delay based on that
instead. This also makes sure server can't be easily overworked if you have
one misconfigured and misbehaving client.
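The client-side half of that proposal (back off and retry with a delay when the server rejects fast) can be sketched as follows; ServerOverloaded stands in for whatever rejection error the server would return:

```python
import random
import time

class ServerOverloaded(Exception):
    pass

def with_backoff(operation, max_attempts=5, base_delay=0.05):
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerOverloaded:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter, so retries from many
            # clients don't all land on the server at the same instant.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise ServerOverloaded()   # server shedding load under a burst
    return "committed"

assert with_backoff(flaky_txn) == "committed"
assert len(attempts) == 3
```

Fast rejection plus this kind of client keeps the queue on the server short instead of letting a burst snowball into ever more concurrent transactions.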

A lot of the issues with many idle connections are specific to MySQL itself
and its implementation. In MySQL the perf drops not only when you have many
active transactions, but even when you just have many connected idle
sessions. This is why there are tons of different random "MySQL Connection
Proxy" projects in the open source world.

~~~
evanelias
What are you referring to in your 3rd paragraph? Idle connections in MySQL
pose few issues. They just take up some memory for session-level buffers, and
that amount of memory depends on what you've configured those buffer sizes to.
They also take up a slot in terms of whatever you've configured
max_connections to, but that's fully configurable as well.

MySQL has long had some abilities to limit resources on a per-user basis, see
[https://dev.mysql.com/doc/refman/5.6/en/user-
resources.html](https://dev.mysql.com/doc/refman/5.6/en/user-resources.html)
for example. This includes the ability to set max simultaneous connections per
user.

MySQL's default connection model dynamically uses a thread per connection,
which actually tends to handle high connection counts out-of-the-box better
than process-per-conn approaches like Postgres's. In my experience, using a
proxy like pgbouncer is much more common in Postgres than using a proxy is in
MySQL.

I'm not bashing Postgres overall, it's a great DB. But in terms of connection-
handling your criticism of MySQL here feels substantially off-the-mark.

(Source: I've been using MySQL for 15 years, including at the largest scale
in the world)

~~~
thezviad
The start of every new transaction does a scan of all connected sessions
(including idle ones). I am pretty sure this is true in 5.6 too, and most
likely in newer versions as well. So if the queries you are issuing are
large, or the total connection count is <10k, it is not going to be a problem
or even noticeable. But if you have lots of small and fast queries and idle
connections >100k, you will definitely have significant perf issues because
of the idle connections. 100k is a big number for sure, but it is definitely
possible to hit those limits if the application layer is written in languages
like Node, Ruby, or Python, where you need to run many application processes
since each app process can't properly utilize more than a single CPU core.
Thus you end up having a lot of separate processes, each with its own
connection pool to the backing database.

As for per-user limits, those resource limits aren't that useful for
stability. Setting max simultaneous connections is not as useful, because you
don't know how many of those are idle vs. active. You want limits on active
sessions that are actually doing work, not on how many idle sessions exist.
Unless of course you plan on not doing any session reuse, and always make a
new session for each transaction, which will lead to a whole other set of
performance issues, because establishing new sessions is pretty expensive.

As for the other "per hour" limits, they are also not useful for providing
protection against burst traffic, which is a common way a MySQL instance can
enter a bad feedback loop and slow to a crawl. (Example: there is burst
traffic from one use case, creating lots of new active transactions at the
same time; because of that, MySQL perf slows down, so now you have even more
active transactions because everything is slower, which leads to even more
slowdown. So even after the burst traffic is over, the database is in a bad
state, since you now continuously have too many active transactions, and it
is unable to recover on its own to handle the same steady-state traffic it
was able to handle before the burst.)

~~~
evanelias
In my experience, many high-scale MySQL configurations use a global
max_connections in the single-digit thousands (4000-5000 is common at social
networks doing high-volume OLTP), and an aggressive wait_timeout (~10 seconds)
to kill idle conns from misbehaving/stalled clients.
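In my.cnf terms, that kind of configuration is roughly the following sketch (values taken from the ranges above, illustrative rather than a recommendation):

```ini
# Illustrative my.cnf fragment for the settings named above.
[mysqld]
max_connections = 5000   # global cap, acting as a circuit-breaker
wait_timeout = 10        # kill connections idle longer than 10 seconds
```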

I've never seen a max_connections configured anywhere near 100k. That would be
an extreme edge-case, and is generally unwise unless there's some very
specific unusual reason that you need that. My assumption would be something
is very wrong at the application architecture level if this is needed.

What "scan of all connected sessions" are you referring to? I've never heard
anything about this, and never seen general performance issues purely related
to high idle connection count, but I've also never configured max_connections
to such an insanely high value.

As for other databases, I don't see how Postgres would be able to handle 100k
connections (without a proxy) either, given that means 100k OS processes in
Postgres.

> always make a new session for each transaction, which will lead to huge
> other set of performance issues because establishing new sessions is pretty
> expensive

Fire-and-forget (connection per web request) is somewhat common in MySQL,
especially with languages like PHP. Establishing a connection is relatively
low overhead on the server side in MySQL, since it just involves spawning a
thread. Usually the bigger issue is network latency, especially if cross-
region SSL connections are involved. In this case, a client-side connection
pool or proxy certainly makes sense, and that is true regardless of what DB
technology is used.

It sounds like you have a lot of application servers maintaining connection
pools to a single database. In this case it is generally beneficial to tune
the connection pools to aggressively prune idle connections, or use proxies
that multiplex connections (ProxySQL is great), and then set max_connections
to a more sane value as a circuit-breaker.

