When I was running the Postgres databases for reddit, I was constantly seeking people who could teach me better ways to tune Postgres on AWS. Most people I would meet would just regurgitate what anyone could learn with a day or two of googling and reading.
At one point I got scared by the prospect that I might be an expert on Postgres on AWS, because frankly I didn't know all that much about it and thought we were doomed if that was really the case.
Then I went over and had lunch with the Heroku team, and it was eye-opening. These guys truly knew how to run Postgres on AWS (and presumably still do).
I can't think of another org that is moving Postgres on AWS forward better than Heroku.
It's been a while since I've been in the weeds, but this is what I remember:
It's a balancing act between max_connections and shared_buffers. Each instance type will have a sweet spot for your use case -- you'll have to find it through experimenting (a rough starting point is sketched below).
Give as much RAM to Postgres as possible -- let it do the memory management. Sometimes going to an instance with double the RAM will give you more than double the performance.
No swap on the box. The moment you hit swap you're screwed.
Vacuum often, maybe continuously, if the database has a lot of updates or deletes. Do your capacity planning so you can vacuum all the time.
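To make that concrete, here's a rough postgresql.conf sketch for a hypothetical instance with 16 GB of RAM. The parameter names are real Postgres settings, but every number is just a placeholder to start experimenting from, not a recommendation:

    # postgresql.conf -- illustrative starting point for a hypothetical 16 GB instance
    shared_buffers = 4GB                    # placeholder; raise/lower while benchmarking
    effective_cache_size = 12GB             # planner hint: shared_buffers + OS page cache
    work_mem = 32MB                         # per sort/hash op; worst case multiplies by max_connections
    max_connections = 200                   # keep modest; every connection costs RAM

    # "vacuum all the time": make autovacuum more aggressive than the defaults
    autovacuum = on
    autovacuum_naptime = 15s
    autovacuum_vacuum_scale_factor = 0.05   # kick in after ~5% of a table changes
    autovacuum_vacuum_cost_delay = 2ms      # let each vacuum cycle do more work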
1. Swap space does not inherently slow down your system. In fact, not having swap space doesn't mean you won't swap pages. It merely means that Linux has fewer choices about what RAM can be reused when a demand hits. Thus, it is possible for the throughput of a system that has no swap space to be lower than that of a system that has some.
2. Swap space is used for modified anonymous pages only. Your programs, shared libraries and filesystem cache are never written there under any circumstances.
3. Given items 1 and 2 above, the philosophy of “minimization of swap space” is really just a concern about wasted disk space.
...
I find the better balance is to use a file-based swap, at least while experimenting with sizes; the performance loss from the filesystem overhead really is minimal.
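If it helps, this is roughly all it takes on a Linux box (size and path are placeholders):

    # create and enable a 4 GB swap file
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # only swap under real memory pressure
    sudo sysctl -w vm.swappiness=1

    # tear it down once you've settled on an instance size
    sudo swapoff /swapfile && sudo rm /swapfile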
I've heard the "no swap" argument before, but can you explain this a little further?
It seems like there are two potential situations here:
1/ You have swap, you hit swap. Massive slowdown.
2/ You don't have swap, run out of memory and processes die.
#1 is not a good situation, but it seems preferable to #2, doesn't it?
This is a bad metaphor. Utility computing isn't quite up to the same level as electrical utilities. Heroku has suffered more downtime in the past year than I have power outages. Also, most commercial operations that tap into public utilities have some sort of backup plan for when the power goes out. A dramatic example of that is the power staying on in the Goldman Sachs building during Hurricane Sandy (http://www.inquisitr.com/381743/hurricane-sandy-rages-but-th...)
More visibility into what you're doing behind the scenes would be greatly appreciated, and would increase your customers' confidence in the Heroku pg cloud service. I hope you provide more depth in future answers, as opposed to "just trust us". I've found that in practice, things never work out that way.
That's a fair criticism - thanks for the rebuttal. (Edit) Also - what kind of visibility are you looking for that we don't offer today? Please feel free to email me (my email is rather guessable) with whatever you have.
What if you are a Fortune 500 company, or part of the government? A defense contractor? A health insurance provider? I work for a company that does "private cloud", mostly large enterprise/government places that want the flexibility of having their data accessible anywhere, but are not willing or able (for security reasons) to have Amazon or some other provider host their data. If you already have data center infrastructure and expertise in-house, the long-run costs and benefits of buying your own hardware for high-traffic applications can often outweigh renting space and/or paying on a per-transaction basis from AWS. Public cloud is great, but it's not the right solution for everyone.
We give talks sometimes, and we're always happy to answer questions if you've got 'em. We open source stuff sometimes too, if we think it might be generally useful.
I don't doubt what you are saying, but I just wanted to point out one thing: Postgres is amazingly forgiving, and performs incredibly with just basic adjustments to the default configuration. Give it a good amount of memory, and it will handle the vast majority of workloads extremely well. I have been using Postgres basically out of the box for a few years now for Data Warehousing purposes, and its performance is comparable to the tuned-and-managed Oracle servers that I am pulling from.
Reddit obviously is a different scenario, but most people don't work at Reddit.
Thanks a lot! We've come a long way since then - forks, followers, dataclips, extensions, new plans, new versions... Man, actually it kind of terrifies me to think of it all.
Hey, do you have customer instances where the app is on non-AWS bare metal (like SoftLayer) and the db is on Heroku Postgres? What are the pros and cons of this scenario?
Absolutely. Basically, as long as your app is running somewhere with Very Good Latency to your database, this can be a fine pattern. In practice, this means that if the data center is also in Ashburn, Virginia, and better yet has a DirectConnect deal set up with AWS, you should be in business. The best way to test this is to provision a database and see what the latency from your app is. My opinion is that 5ms latency would be a ceiling on what I'd tolerate for my own apps, and I'd probably look for more like 1-2ms or better.
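A quick way to check that round trip, assuming you have psql and the connection URL of the database you provisioned (\timing makes psql report elapsed time per statement, and for a trivial query that's essentially the network round trip):

    $ psql "$DATABASE_URL"    # connection string of the provisioned database
    db=> \timing
    Timing is on.
    db=> SELECT 1;
    ...
    Time: (your round trip, in ms)

Run it a few times and look at the steady-state number rather than the first one.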
I wonder how often it'd actually come up that you could get stuff through your application layer into the database and/or back out without wanting to tweak it first. Where I have Rails JSON APIs, I usually do some tweaking/modification of the JSON that comes out, for instance.
I don't think I'd ever want to _serve_ json from the db, and as you say, I'd tweak the data for common usage anyway.
But my use case is the opposite: I am storing exactly the data coming from external APIs.
My table has some fields I already extracted from the JSON data, because I need them now, yet I don't want to throw away the full data, as I may need it in the future.
So, I was storing the original JSON as text.
But, having support for the API format in the db itself is much better, as I can also actually query and manipulate this data without pulling all the text fields in my client code.
You may want to look into using postgresql functions to modify the json that comes out. Probably much faster, especially if you have lots of nested stuff coming from different tables.
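A sketch of what that can look like -- table and field names here are made up, the json type needs PostgreSQL 9.2+, the -> / ->> operators need 9.3+, and row_to_json is available from 9.2:

    -- keep the raw API payload alongside the fields extracted so far
    CREATE TABLE api_responses (
        id        serial PRIMARY KEY,
        source    text,
        user_name text,    -- already extracted because it's needed now
        payload   json     -- full original response, kept for later
    );

    -- query into the stored JSON without dragging whole blobs back to the client
    SELECT id, payload ->> 'status' AS status
    FROM api_responses
    WHERE payload -> 'account' ->> 'plan' = 'free';

    -- or shape the result server-side instead of reworking it in app code
    SELECT row_to_json(t)
    FROM (SELECT id, user_name, payload ->> 'status' AS status
          FROM api_responses) t;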
We don't provide server-side connection pooling today, no. It adds too many gotchas to be universally applicable and has had too few use cases to reach the top of our TODO list. We do hear from time to time from people who want it though, so it's certainly not out of the question.
I can't tell from this, but will existing Postgres databases at Heroku be automatically upgraded, or do you need to provision a new instance and migrate the data?
We regard the reliability of your database as one of our chief missions. To that end, we prefer not to do anything which could even remotely be a possible cause of operational issues. Since changing over to a new version is easy, we tend to let users do so at their leisure. That said, in the event of a major 0-day we could be forced to migrate people, but it hasn't happened yet.