When I was running the Postgres databases for reddit, I was constantly seeking people who could teach me better ways to tune Postgres on AWS. Most people I would meet would just regurgitate what anyone could learn with a day or two of googling and reading.
At one point I got scared by the prospect that I might be an expert on Postgres on AWS, because frankly I didn't know all that much about it and thought we were doomed if that was really the case.
Then I went over and had lunch with the Heroku team, and it was eye-opening. These guys truly knew how to run Postgres on AWS (and presumably still do).
I can't think of another org that is moving Postgres on AWS forward better than Heroku.
It's been a while since I've been in the weeds, but this is what I remember:
It's a balancing act between max_connections and shared_buffers. Each instance type will have a sweet spot for your use case -- you'll have to find it through experimenting (a rough starting point is sketched below).
Give as much RAM to Postgres as possible -- let it do the memory management. Sometimes going to an instance with double the RAM will give you more than double the performance.
No swap on the box. The moment you hit swap you're screwed.
Vacuum often, maybe continuously, if the database has a lot of updates or deletes. Do your capacity planning so you can vacuum all the time.
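To make that concrete, here's a rough postgresql.conf sketch for a hypothetical instance with 16 GB of RAM. The parameter names are real Postgres settings, but every number is just a placeholder to start experimenting from, not a recommendation:

    # postgresql.conf -- illustrative starting point for a hypothetical 16 GB instance
    shared_buffers = 4GB                    # placeholder; raise/lower while benchmarking
    effective_cache_size = 12GB             # planner hint: shared_buffers + OS page cache
    work_mem = 32MB                         # per sort/hash op; worst case multiplies by max_connections
    max_connections = 200                   # keep modest; every connection costs RAM

    # "vacuum all the time": make autovacuum more aggressive than the defaults
    autovacuum = on
    autovacuum_naptime = 15s
    autovacuum_vacuum_scale_factor = 0.05   # kick in after ~5% of a table changes
    autovacuum_vacuum_cost_delay = 2ms      # let each vacuum cycle do more work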
1. Swap space does not inherently slow down your system. In fact, not having swap space doesn't mean you won't swap pages. It merely means that Linux has fewer choices about what RAM can be reused when a demand hits. Thus, it is possible for the throughput of a system that has no swap space to be lower than that of a system that has some.
2. Swap space is used for modified anonymous pages only. Your programs, shared libraries and filesystem cache are never written there under any circumstances.
3. Given items 1 and 2 above, the philosophy of “minimization of swap space” is really just a concern about wasted disk space.
...
I find the better balance is to use a file-based swap, at least while experimenting with sizes; the performance loss from the filesystem overhead really is minimal.
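If it helps, this is roughly all it takes on a Linux box (size and path are placeholders):

    # create and enable a 4 GB swap file
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # only swap under real memory pressure
    sudo sysctl -w vm.swappiness=1

    # tear it down once you've settled on an instance size
    sudo swapoff /swapfile && sudo rm /swapfile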
I've heard the "no swap" argument before, but can you explain this a little further?
It seems like there are two potential situations here:
1/ You have swap, you hit swap. Massive slowdown.
2/ You don't have swap, run out of memory and processes die.
#1 is not a good situation, but it seems preferable to #2, doesn't it?
This is a bad metaphor. Utility computing isn't quite up to the same level as electrical utilities. Heroku has suffered more downtime in the past year than I have power outages. Also, most commercial operations that tap into public utilities have some sort of backup plan for when the power goes out. A dramatic example of that is the power staying on in the Goldman Sachs building during Hurricane Sandy (http://www.inquisitr.com/381743/hurricane-sandy-rages-but-th...)
More visibility into what you're doing behind the scenes would be greatly appreciated, and would increase your customers' confidence in the Heroku pg cloud service. I hope you provide more depth in future answers, as opposed to "just trust us". I've found that in practice, things never work out that way.
That's a fair criticism - thanks for the rebuttal. (Edit) Also - what kind of visibility are you looking for that we don't offer today? Please feel free to email me (my email is rather guessable) with whatever you have.
What if you are a Fortune 500 company, or part of the government? A defense contractor? A health insurance provider? I work for a company that does "private cloud", mostly large enterprise/government places that want the flexibility of having their data accessible anywhere, but are not willing or able (for security reasons) to have Amazon or some other provider host their data. If you already have data center infrastructure and expertise in-house, the long-run costs and benefits of buying your own hardware for high-traffic applications can often outweigh renting space and/or paying on a per-transaction basis from AWS. Public cloud is great, but it's not the right solution for everyone.
We give talks sometimes, and we're always happy to answer questions if you've got 'em. We open source stuff sometimes too, if we think it might be generally useful.
I don't doubt what you are saying, but I just wanted to point out one thing: Postgres is amazingly forgiving, and performs incredibly with just basic adjustments to the default configuration. Give it a good amount of memory, and it will handle the vast majority of workloads extremely well. I have been using Postgres basically out of the box for a few years now for Data Warehousing purposes, and its performance is comparable to the tuned-and-managed Oracle servers that I am pulling from.
Reddit obviously is a different scenario, but most people don't work at Reddit.
Thanks a lot! We've come a long way since then - forks, followers, dataclips, extensions, new plans, new versions... Man, actually it kind of terrifies me to think of it all.
Hey, do you have customer instances where the app is on non-AWS bare metal (like SoftLayer) and the db is on Heroku Postgres? What are the pros and cons of this scenario?
Absolutely. Basically, as long as your app is running somewhere with Very Good Latency to your database, this can be a fine pattern. In practice, this means that if the data center is also in Ashburn, Virginia, and better yet has a DirectConnect deal set up with AWS, you should be in business. The best way to test this is to provision a database and see what the latency from your app is. My opinion is that 5ms latency would be a ceiling on what I'd tolerate for my own apps, and I'd probably look for more like 1-2ms or better.
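A quick way to check that round trip, assuming you have psql and the connection URL of the database you provisioned (\timing makes psql report elapsed time per statement, and for a trivial query that's essentially the network round trip):

    $ psql "$DATABASE_URL"    # connection string of the provisioned database
    db=> \timing
    Timing is on.
    db=> SELECT 1;
    ...
    Time: (your round trip, in ms)

Run it a few times and look at the steady-state number rather than the first one.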
I wonder how often it'd actually come up that you could get stuff through your application layer into the database and/or back out without wanting to tweak it first. Where I have Rails JSON APIs, I usually do some tweaking/modification of the JSON that comes out, for instance.
I don't think I'd ever want to _serve_ json from the db, and as you say, I'd tweak the data for common usage anyway.
But my use case is the opposite: I am storing exactly the data coming from external APIs.
My table has some fields I already extracted from the JSON data, because I need them now, yet I don't want to throw away the full data, as I may need it in the future.
So, I was storing the original JSON as text.
But, having support for the API format in the db itself is much better, as I can also actually query and manipulate this data without pulling all the text fields in my client code.
You may want to look into using postgresql functions to modify the json that comes out. Probably much faster, especially if you have lots of nested stuff coming from different tables.
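A sketch of what that can look like -- table and field names here are made up, the json type needs PostgreSQL 9.2+, the -> / ->> operators need 9.3+, and row_to_json is available from 9.2:

    -- keep the raw API payload alongside the fields extracted so far
    CREATE TABLE api_responses (
        id        serial PRIMARY KEY,
        source    text,
        user_name text,    -- already extracted because it's needed now
        payload   json     -- full original response, kept for later
    );

    -- query into the stored JSON without dragging whole blobs back to the client
    SELECT id, payload ->> 'status' AS status
    FROM api_responses
    WHERE payload -> 'account' ->> 'plan' = 'free';

    -- or shape the result server-side instead of reworking it in app code
    SELECT row_to_json(t)
    FROM (SELECT id, user_name, payload ->> 'status' AS status
          FROM api_responses) t;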
We don't provide server-side connection pooling today, no. It adds too many gotchas to be universally applicable and has had too few use cases to reach the top of our TODO list. We do hear from time to time from people who want it though, so it's certainly not out of the question.
I can't tell from this, but will existing Postgres databases at Heroku be automatically upgraded, or do you need to provision a new instance and migrate the data?
We regard the reliability of your database as one of our chief missions. To that end, we prefer not to do anything which could even remotely be a possible cause of operational issues. Since changing over to a new version is easy, we tend to let users do so at their leisure. That said, in the event of a major 0-day we could be forced to migrate people, but it hasn't happened yet.