Costs of a PostgreSQL connection (hans.io)
109 points by wlll 1158 days ago | hide | past | web | 59 comments | favorite



> We were running out of file handles which caused the database to crash

Most distributions ship with conservative defaults for the maximum number of concurrent file descriptors and socket connections. Especially for a database server, it's wise to tune these values before optimizing anything else.

> 1000 files per connection * 370 = 370,000 files

This is a very pessimistic estimate; in reality you'll need far fewer than 1,000 file descriptors per connection. As long as you tune your file descriptor limit, you should be able to handle one or two orders of magnitude more connections than this.
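The limits in question can be inspected and (up to the hard limit) raised from inside a process; a minimal stdlib sketch on Linux. The real tuning happens at the distribution level — `ulimit -n`, `/etc/security/limits.conf`, and the `fs.file-max` sysctl — which varies by distro:

```python
import resource

# Current per-process open-file limit: (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# An unprivileged process may raise its soft limit up to the hard limit;
# raising the hard limit itself needs root (or limits.conf / systemd config).
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```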

A few resources that might help:

http://serverfault.com/questions/92723/file-descriptors-sett...

http://www.cyberciti.biz/faq/linux-increase-the-maximum-numb...

http://stackoverflow.com/questions/2569620/socket-accept-err...


Thank you. We configured the system to accept more file descriptors. I think our issue was caused by something else.


work_mem is not used per connection. It may be used for some queries. Same for temp_buffers.

What I'm missing here is information about the Postgres version used.

Also, if 3.5GB of memory hurt you on a database server with 370 connections, you need to think about buying a DB server with more RAM (our main DB server has 512G and typically ~220 connections, we don't sweat over it and don't use PGBouncer to add latency).


Thanks for your comment!

Re work_mem: I think this is per query and thus might affect every connection?! At least that's what I gather from 'PostgreSQL 9.0 High Performance'.

Re temp_buffers: 'Sets the maximum number of temporary buffers used by each database session.' For us it practically means per connection. Not for you?

Shit, I'm missing the version! 9.3. I edited the blog post.

Re memory: I think that really depends on the application. We have many idle connections.


> Re temp_buffers: 'Sets the maximum number of temporary buffers used by each database session.' For us it practically means per connection. Not for you?

It means that potentially every connection may at some point use this amount of memory, but idle connections will not. If you have 20 busy connections and 350 idle ones, at most 20 x temp_buffers will be used and there is no benefit from PGbouncer in this regard. If you have 370 busy connections, all of them might use temp_buffers and your tradeoff can be 370 concurrently executing queries at higher memory cost vs. e.g. 20 concurrently executing queries via PGbouncer at lower memory cost and 350 stalled queries with potentially high latency.
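Back-of-the-envelope arithmetic makes the tradeoff concrete (8 MB is an assumed temp_buffers setting, purely illustrative; the counts are the ones from this thread):

```python
TEMP_BUFFERS_MB = 8   # assumed per-session temp_buffers setting
WORST_CASE = 370      # every connection busy at once
POOLED = 20           # concurrently executing queries behind PGbouncer

worst = WORST_CASE * TEMP_BUFFERS_MB
pooled = POOLED * TEMP_BUFFERS_MB
print(f"worst case: {worst} MB")   # 2960 MB
print(f"pooled:     {pooled} MB")  # 160 MB
```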

> We have many idle connections.

In this case, you will not have any issues with memory use from temp_buffers, work_mem ...


Yes you're right, idle connections do not use temp_buffers or work_mem.

I was trying to gather everything relevant to resources per connection. I'm not saying work_mem caused our problem.


I have always found it odd how some databases conflate network connections with threads (or processes) of execution. If Postgres has an ideal number of concurrent workers, it should instantiate that many and multiplex requests from the various connections across them, which has several benefits, including allowing you to preemptively shed requests you know you won't get to in time and do other scheduling and QoS stuff.

I get it from a historical perspective where it would have been needless complexity to implement something else.

On an unrelated note it's pretty amazing what threads get conflated with in general.


It is no odder than Unix conflating your shell with a process, really. You connect, run some commands within a context that's "yours", then it's all cleaned up when you go.


One of the important reasons Postgres connections are serviced by a separate process is that if that process segfaults, you only disconnect other connections (because it might have tainted shared memory when it died), but the main database process keeps running and you can reconnect immediately.


Not quite. If a backend segfaults, postgres will detect it and immediately terminate all connections and exit. When it restarts, it will enter recovery mode. It does this because a segfaulted backend may have corrupted shared memory. The only way to fix that is to restart in recovery mode and replay the WAL.


For some reason, Rails doesn't behave well with pgbouncer or any other pg connection pool. So I really have no choice but to set my max connections to 1900 and pray that it's enough. Surprisingly, my 48 GB RAM, quad-core CPU, non-SSD server can handle that load, although the hardware will probably degrade after a certain time (oh well, shrugs)


We enabled PGBouncer transaction mode and had to set 'prepared_statements: false' in our database.yml because transaction mode doesn't support them. Works fine for us.


What are the implications of 'prepared_statements: false'? It sounds really bad.


Prepared statements are not a massive performance gain in postgres so I wouldn't worry.


They do provide a nice safety net against SQL injection though. A problem that Rails has from time to time.


Bound parameters are the safety net against SQL injection. Prepared statements are not a prerequisite for that, depending on the database client API (e.g. psycopg2 doesn't use prepared statements).
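The distinction matters in practice: bound parameters keep user input out of the SQL text entirely, whether or not the driver uses a server-side prepared statement. A sketch using the stdlib sqlite3 driver for portability (the same placeholder principle applies to psycopg2, just with `%s` instead of `?`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "' OR '1'='1"

# Unsafe: string interpolation lets input rewrite the query.
unsafe = conn.execute(
    f"SELECT count(*) FROM users WHERE name = '{malicious}'"
).fetchone()[0]

# Safe: the driver sends the value out-of-band as a bound parameter.
safe = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (malicious,)
).fetchone()[0]

print(unsafe, safe)  # 1 0 -- injection matched every row; binding matched none
```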


I appreciate the comment, but since this was in regards to Rails, the two options were basically use prepared statements or don't. You don't get any safety without them and have to rely on Rails, which does an admirable job, but has faltered from time to time.


If Rails stops using the two-step parse/execute PostgreSQL commands (which are supported without issue by PGbouncer, and which is how you are supposed to get bound parameters) when prepared_statements is turned off (a setting which intuitively should only turn off using standard SQL named prepared statements), then that's pathetic behavior (even for Rails) and should really be fixed.


If done right they can be huge. The fact that they do nothing when you just do "prepare" followed immediately by "execute" in place of just running the query directly is not a problem with postgresql or prepared statements, it is a problem of doing something dumb. For simple selects the parsing and planning stage can be 90% of the time spent. The problem is that postgresql doesn't have stored procedures, so you have to make sure every connection is initialized by running all of your "prepare"s before it is made available to the pool.


You can get detailed timing data about parse, plan, execute timings from postgresql logging. Parsing is almost always trivial. Planning can get expensive for complex queries with many joins (large space of solutions to explore). For simple queries there's almost no benefit for prepared statements in postgres. Prepared statements are a maintenance headache and don't play well with session multiplexed connection pooling (like PgBouncer); generally best to avoid them unless you have measured a concrete and significant benefit.


>You can get detailed timing data about parse, plan, execute timings from postgresql logging. Parsing is almost always trivial.

And yet you can get >50% speedups for super common queries like "select * from foo where id = ?" if you prepare it once and then use execute instead of just running the select every time. Seems like maybe you're making assumptions you shouldn't.


I think the statements I benchmarked were more complex and slower, so there wasn't much in it. It was a while ago though. For simple selects it could make much more difference.


> The problem is that postgresql doesn't have stored procedures

Of course it has: http://www.postgresql.org/docs/9.3/static/sql-createfunction...

And you can choose between 4 languages: http://www.postgresql.org/docs/9.3/interactive/xplang.html


Those are functions, not stored procedures. Every time that function is executed, it is re-parsed and re-planned.


PL/pgSQL function plans are cached. Functions written in other languages may not be, I don't know, but in my experience PL/pgSQL is the most widely used.

http://www.postgresql.org/docs/9.3/static/plpgsql-implementa...


>PL/pgSQL function plans are cached

Read your link. They can be cached. You have no way to ensure that they are. And you have no way to have the planning done up front at all, you have to wait until the function is run, and even then every code path in the function has to be exercised to get it planned. And then half the time it doesn't think it should bother to cache the plan anyways. And it is per-session, not shared. So every single connection has to start over with an empty cache every time, and there's tons of duplicate plans in the cache wasting space. Not cool, and by far the biggest thing keeping a ton of "enterprise" customers on SQL server and oracle.


You have some valid criticisms of the implementation, though pgsql-hackers has their reasons (http://www.postgresql.org/message-id/CAHyXU0ybwZZUbuQQVFQMK3...) for the way things are. Regardless, my point is that the situation is not so bad as "Every time that function is executed, it is re-parsed and re-planned."


The default stored procedure language is PL/pgSQL http://www.postgresql.org/docs/9.3/static/plpgsql.html and has been available in Postgres almost forever.


Those are functions, not stored procedures.


There was no apparent problem with turning it off for us. But maybe there is for others.


Don't prepare statements?


If pgbouncer makes this big a difference to performance, is it included in managed PostgreSQL services, such as Amazon RDS and Heroku Postgres?

Or would you have to roll your own pgbouncer setup for these systems?


Neither AWS RDS PostgreSQL nor Heroku Postgres includes PGBouncer.



Saw that reference in the linked post from Heroku in the original article.

Has anyone used this? Is it configurable, or do they just choose some reasonable defaults?


Works great for me. There's some minimal configuration explained in the pgbouncer docs. What the pgbouncer buildpack does is just route your DATABASE_URL through the local pgbouncer.

In my Python app, I switched from an SQLAlchemy connection pool per process (e.g. per gunicorn worker) to using a NullPool everywhere. No increase in latency; far fewer connections needed.
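For reference, the switch described above is essentially a one-line change in SQLAlchemy (a configuration sketch; `DATABASE_URL` is assumed to be routed through the local pgbouncer as in the buildpack setup):

```python
import os

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# With pgbouncer holding the server-side pool, each worker can open and
# close connections freely; NullPool disables SQLAlchemy's own pooling
# so connections are returned to pgbouncer as soon as they are released.
engine = create_engine(os.environ["DATABASE_URL"], poolclass=NullPool)
```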


Would be nice to see a little more detail on the clients that need hundreds of connections. I have no experience with PostgreSQL, but do get around this in other RDBs through a mix of async calls, caching and optimising data structures and application logic (meta-data driven schema; indexing content in the app before a db write to eliminate having to retrieve that content to search it, or indexing it using the db).


If you only have `2 * n cores` connections, doesn't that mean that you can at most have `2 * n cores` transactions running at once? I suppose on larger servers that is fairly reasonable (as you want these transactions to be very short lived anyway), but it does seem like you don't give yourself much wiggle room.


Hard to believe there are still people not using pgbouncer in production :)


Not everyone runs some PHP-damaged setup where connection pooling has to be pushed out to a proxy. Some people use sane applications in the first place. There is no reason for those people to add an extra layer of latency and potential failure to achieve what they already have.


True - it's only application insanity preventing a distributed collection of database clients from properly coordinating the allocation of postgres connections amongst themselves.

Now true, in the event of a network partition, you lose the ability to open a database connection. But that's a small price to pay, right?


I guess this is sarcasm, but it's not clear what you're trying to actually say. How is a connection-pooling proxy supposed to make an application more resilient to network partitions? If anything, it's worse, since you have twice as many points of failure -- if either the application can't talk to the proxy, or the proxy can't talk to the database, you're hosed.


Practically speaking, you typically run pgbouncer on the same box as postgres.

Even if you ran pgbouncer on a different box, the situation is still better. Suppose you have a probability P of having a partition between any two boxes. Then the probability of a partition taking down the system with pgbouncer is P.

If you instead had N clients coordinating among themselves how many connections to use, your odds go way up. The probability of a partition existing between two clients is 1-(1-P)^(N(N-1)/2) ~= N(N-1)P/2 (for very small P). Once two clients can't communicate, neither one can connect to the DB (because they don't know if the others have pushed the db over the limit). All clients will need to stop connecting to the DB if any client is ever disconnected from the other N-1 of them.

In a distributed system, ensuring that SUM(x[i]) remains bounded for all time is a tricky problem to solve.
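Plugging illustrative numbers into the formula above shows how fast the coordinated-clients scheme degrades (P and N are assumptions, not measurements):

```python
P = 0.0001  # assumed probability of a partition between any two boxes
N = 50      # assumed number of coordinating clients

pairs = N * (N - 1) // 2      # 1225 client pairs that must all stay connected
p_any = 1 - (1 - P) ** pairs  # probability at least one pair is partitioned

print(f"proxy on the DB box:  {P}")
print(f"client coordination:  {p_any:.4f} (approximation N(N-1)P/2: {pairs * P:.4f})")
```

Even with a one-in-ten-thousand pairwise partition probability, fifty coordinating clients see a partition somewhere roughly 11–12% of the time, versus 0.01% for the single proxy link.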


> Practically speaking, you typically run pgbouncer on the same box as postgres.

I've done this, long-term, in production. This is not how PGbouncer is designed to be used, and it causes serious problems under load. (I am still doing this, but I have it on my todo list to remove this glitch from my architecture, as it caused my entire site infrastructure to fail once during a launch.) Put simply: PGbouncer is single-threaded; it is not difficult to end up in situations where the one core PGbouncer is running on sits at 100% while the remaining cores (in my case, seven) running PostgreSQL are having no issue with the traffic.


Just curious, how does that happen? I've never found myself in a situation where PGBouncer can't handle the load but Postgres can.

I'm guessing it's a situation where you are running a large volume of very easy queries, for example primary key lookups with no joins?


We do have this problem! It is only a matter of how small the PGBouncer machine is compared to the PostgreSQL machine. In our case the PGBouncer had to deal with 16000 req/s, coming from 700 clients going through 100 connections to the PostgreSQL. We now have 4 PGBouncers for this DB, and we're 'load balancing' them with DNS.


I can't tell if everybody does the same, but I simply enable enough connections for the total of all clients, and am done with that.

Clients talking between themselves to establish whether they can create another connection is madness. They'll keep their connections at the maximum all the time anyway.


Or you could give each client C/N connections.


This can easily result in too few connections at a local peak. If each box is allocated 3 connections (and postgres can handle 12), you get high latency in the event that your load looks like [6, 1, 1, 1]. This happens occasionally due to random chance.
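A quick simulation of the static split illustrates the point, using the numbers from the comment above (12 server connections, 4 boxes, a skewed peak of [6, 1, 1, 1]):

```python
CAPACITY = 12
BOXES = 4
load = [6, 1, 1, 1]          # concurrent requests per box at a local peak
per_box = CAPACITY // BOXES  # static allocation: 3 connections each

# Requests that must queue under a static C/N split vs. a shared pool
# (e.g. a proxy handing out connections on demand).
queued_static = sum(max(0, want - per_box) for want in load)
queued_shared = max(0, sum(load) - CAPACITY)

print(queued_static, queued_shared)  # 3 0
```

The total load (9) is well under capacity (12), yet the static split queues 3 requests on the hot box while 6 connections sit idle elsewhere.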


Assuming multiple application servers, despite 99.9999% of websites not needing them, doesn't make much sense.


There are a number of applications that are sane and don't have PostgreSQL connection pooling built in (e.g. Django introduced it in 1.6).


Django connections to the database are bounded by the number of threads and processes you set.

It didn't have a pool, but it's a mistake to think it just opened connections at will.


Django requires multiple processes, which makes sharing things like connections quite difficult. That is not sane.


What's your alternative to that?


A language without a global lock to make threads useless.


You have taken a FUD sound bite and repeated it without understanding. The language is not at issue here. CPython's GIL does not make threads useless at all, and there are already Python interpreters without a GIL anyway.

An addiction to globally shared state across threads multiplies race conditions and scaling problems, even if the GIL magically goes away. If you are incapable of using even processes effectively then I really fear for you when you have to scale out horizontally.

I'd say this more nicely if I could think of any way, but you don't know what you're talking about and you should read up before you begin talking about it again. It looks bad and you might mislead someone.


No, you have taken a "stick my head in the sand and pretend the last 30 years of progress didn't happen" line and are repeating it without understanding. I do web development in Haskell. I have thousands of concurrent connections to a single process, which is using 32 cores without problems. There has not been a single race condition, deadlock, mutex bottleneck, etc. Just because you are happy with an absurdly primitive language doesn't mean those of us in the 21st century are ignorant.


tl;dr: use a connection pool.


Now that you've fixed your Postgres, could you fix the reminders in Wunderlist, pretty please?

It doesn't work (for me) on Android, so I started to use the app less and less.



