If you can afford a one-off second of latency for your SQL queries, then using logical replication with pgbouncer seems way easier:
- setup logical replication between the old and the new server (limitations exist on what is replicated, read the docs)
- PAUSE the pgbouncer (virtual) database. Your app will hang, but not disconnect from pgbouncer
- Copy the sequences from the old to new server. Sequences are not replicated with logical replication
- RESUME the pgbouncer virtual database.
You're done. If everything is automated, your apps will see a temporary increase in SQL latency, but they will keep their TCP connections, so there is virtually no outage. A rough sketch of the commands follows.
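If it helps, the whole dance is roughly this (a minimal sketch, assuming Postgres 10+ built-in logical replication; mypub, mysub, app_db, old-db and the credentials are placeholder names, not anything from the article):

    -- On the old (source) server: publish everything (Postgres 10+).
    CREATE PUBLICATION mypub FOR ALL TABLES;

    -- On the new (target) server, with the schema already restored:
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=old-db dbname=app_db user=replicator password=secret'
        PUBLICATION mypub;

    -- At cutover, on pgbouncer's admin console (e.g. psql -p 6432 pgbouncer):
    PAUSE app_db;

    -- Carry sequence values over (they are not replicated), per sequence:
    --   old server: SELECT last_value FROM orders_id_seq;
    --   new server: SELECT setval('orders_id_seq', 12345);  -- value read above

    -- Repoint pgbouncer's config at the new server, then:
    RELOAD;
    RESUME app_db;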
You can temporarily reduce the query timeout as part of the automated failover. Long-running transactions will fail, but you can minimize the window during which you can't talk to Postgres.
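Something along these lines, presumably (a sketch; app_db, the 5s value, and the exact timing are placeholders, not the commenter's actual settings):

    -- Ahead of the cutover window: cap query runtime so PAUSE completes quickly.
    -- Affects new sessions; already-running queries are not retroactively limited.
    ALTER DATABASE app_db SET statement_timeout = '5s';

    -- At cutover, optionally cancel whatever is still running longer than that:
    SELECT pg_cancel_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'app_db'
      AND state = 'active'
      AND now() - query_start > interval '5 seconds';

    -- Afterwards, restore the normal timeout:
    ALTER DATABASE app_db RESET statement_timeout;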
Not really; new connections will block while it's pausing. But you won't be able to shut down Postgres until those long queries complete. Perhaps I wasn't super clear, but what I'm trying to say is that PAUSE is not instantaneous.
Yeah, what I'm saying is that you can only pause as fast as your slowest in-flight query. So if you have a diverse set of query patterns, you could be waiting on a really small percentage of slow queries to wrap up.
To be fair to this page, this was used to migrate versions of Postgres __prior__ to the introduction of logical replication. Logical replication makes this significantly easier (i.e. you no longer need the triggers).
Exactly this. The OP’s approach reminded me so much of the days of Slony, and I wondered why a simpler approach with logical replication would not just suffice.
Rather than pgbouncer, I did this in the actual application code once (write to both databases at the same time, once everything is in sync and you’re confident the new server works well, fail over to the new one only), but it depends upon how much control you can exercise over the application code.
Any approach that is based on triggers makes me shiver, however.
This is precisely the migration I'm planning on doing in the next few weeks with pglogical under the hood for replication. Seems like the atomic switch is much easier than any sort of problem that could stem from conflict or data duplication errors while in a bi-directional replication strategy.
Yep, you can also prepare the new database by using a snapshot of the primary's volume, and use pg_rewind to get them in sync. Altogether the tooling can make migrations super easy with minimal downtime.
I use pgbouncer and had no idea it supported logical replication. I can't find anything about it in the docs. Do you have something you can link me to so I can read more?
Which is only possible if you are using a version of Postgres that is new enough and isn't restricted, as some versions of RDS are. Which explains the whole original post.
Braintree (IIRC) had a really clever migration strategy, although I can't seem to find the blog post now. They paused all traffic at the load balancer, cut over to the new DB, and then resumed traffic. No requests failed, just a slight bump in latency while the LBs were paused.
This app apparently had robust enough retry mechanisms that they were able to just eat the errors and not have customer issues. Color me impressed! I'm not sure how many teams can make that claim; that's a hard thing to nail down.
This is one of the areas where Postgres is so far behind MySQL that it's embarrassing. Zero-downtime migrations in MySQL have been a common practice for over 10 years. This solution is far from ideal due to its use of triggers, which can greatly increase the load on the database and slow down transactions. If you don't have a lot of load on your DB that's fine, but if you are pushing your DB, this will bring it down.
In MySQL they started with triggers (pt-online-schema-change), but now there is gh-ost, which does it with replication. You can do something like this with Postgres using logical replication, but it still requires hand-holding, and to my knowledge there is no way to "throttle" the migration like you can with gh-ost. Where I work now we are building all of this out so we can have first-class online migrations, but the chasm is still pretty big.
The article is about migrating from RDS Postgres 9.5 to RDS Postgres 12, which is only feasible with Bucardo. Major version upgrades are less of a pain point these days with logical replication, and even Google Cloud SQL started supporting logical replication about a week ago, after a painfully long wait.
Meanwhile, you are talking about schema migration? In that case, gh-ost indeed is the best tool available. With Postgres, most types of schema migration can be done instantly with just a metadata change, or otherwise be done concurrently without locking the table. So a tool like gh-ost is not as vital, but it's still valuable for edge cases (sketched after this list) such as:
- altering the column type: requires table rewrite
- adding an index concurrently onto a busy server: a throttle would be nice
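For concreteness, the two edge cases above look roughly like this (a sketch; the orders table, the amount column, and the index name are made-up examples):

    -- Most DDL is a pure metadata change and is effectively instant:
    ALTER TABLE orders ADD COLUMN note text;

    -- Changing a column's type (e.g. int -> bigint) rewrites the whole table
    -- under an exclusive lock, which is where a gh-ost-style tool would help:
    ALTER TABLE orders ALTER COLUMN amount TYPE bigint;

    -- Builds the index without blocking writes, but there is no built-in throttle:
    CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);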
MySQL admin here. If I need to create an in-sync copy of a MySQL DB machine, I take a snapshot and then let replication catch up. Done. I can do whatever I want to this new machine (test alters, drop columns, etc.) and let it catch up. If I like the box I just attach a few replicas, promote it to master, done.
The triggers trick is awesome (go Percona!), but with cloud VMs it takes a few seconds to fire up a complete in-sync replica. I find this safer than triggers running all over a prod database [esp. if you have thousands of VMs!]. IMHO.
When I read about Postgres it's like being transported to 2005.
Up until the table is so large that the copy takes longer than your WAL retention, so you can't ever catch up. Like all things in Postgres, it works great up to a point, and then you are stuck. You also have to logically replicate the entire schema, because Postgres won't logically replicate from table to table.
It's insane that it has to be this complex and require third-party software to accomplish…
Most modern RDBMS/NoSQL database vendors allow a rolling upgrade where you roll in new servers and roll out the old ones seamlessly.
Also, the fact that AWS RDS doesn't do this with zero downtime by default, by automating it this way, is crazy. Why pay for hosted when the upgrade story is incomplete? Taking downtime to upgrade a DB in 2021? Everyone must be joking.
Well, Postgres is F/OSS, so I expect the solution to problems to be "lots of small tools"... but I see the same kind of herculean battle-plans for MS SQL Server work, and I get shocked. That is a paid product, what on Earth are we paying for?
You're not. Approximately 25% of SQL Server EULA text deals with ways in which the warranty is limited. The best the license gives you is your money back if you prove that it was the software's fault.
Of course, you can always sue. A $2T company. Good luck with that.
Went to post the same thing but HN had a little downtime burp.
Would only add that the main reason people buy these licenses, apart from familiarity, is so they can shift the blame if something goes wrong. It's all about ass covering.
This is a complex task, fraught on any platform. Most systems base such clustering on a shared filesystem, which is obviously orthogonal to the shared-nothing designs most try to pursue.
AWS can upgrade your database automatically, but with some downtime. AWS also provides DMS for migrations, which didn't work well in our case. So it was rather a simple problem at first, which turned to be a very complex one in the end.
"Modern" = "not a traditional RDBMS". Expecting your 1983 Toyota Corolla to fly, regardless of the fact that it's now 2021, is unrealistic. But personal helicopters have been around for a while.
I didn't read this article, but I really hate the tagline "done right". It expresses such a poor sense of humility, which is one of the most important traits, perhaps the most important, in the world of software.
We didn't mean to be arrogant with the "done right" statement. In the background story we explain how we performed the same migration once before in the past, and we ended up with data loss. So this was the time we actually "did it right". Many tutorials on the Internet that describe a similar process have flaws that also lead to data loss.
If you are in AWS, or have connectivity available, AWS Database Migration Service makes this relatively trivial.
DMS for Postgres is based on Postgres logical replication, which is built into Postgres, and the same thing Bucardo is using behind the scenes. But AWS DMS is very nearly point-and-click for doing this sort of migration.
The key here is Postgres 9.5. AWS DMS does not support it because they require logical replication support.
A few years back I migrated a PostgreSQL 9.2 database to AWS and wasn't able to use RDS because logical replication was not available.
I did try to use Bucardo but ultimately didn't trust myself to configure it such that it wouldn't lose data (first attempt left nearly all BLOBs unreplicated because the data isn't actually in the tables you set triggers on)
Physical replication to a self-built instance was easy, I was 100% confident it wouldn't be missing data, and the downtime from cutover was about 15 minutes (It involved restarting piles of slow Java applications)
You are right. AWS DMS was our very first choice to try out. It is very easy to use, deployed within your VPC and most of the problems we mention in the article are already solved. Unfortunately, we experienced errors during our tests and the logging mechanisms were not quite helpful, so we failed to find out the problem and make it work.
Bucardo has no performance impact, it just adds a trigger to every mutation. Negligible!
I really think articles of this kind are not useless, but they need to explicitly narrow their audience to set expectations at the start. This particular topic is PG replication for users who aren't very sensitive to write latency.
That's a good point. We mention latency as "replication lag" and we have devoted a paragraph to the drift that comes as a result of this latency. In our case, we measured the latency to be <1s, and it was totally acceptable. As for performance, triggers can become a problem with enough write traffic. In a different use case, Bucardo would be eliminated as a potential solution because of this.
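To make that overhead concrete, a trigger-based system conceptually does something like the following on every write (a generic sketch of change capture, not Bucardo's actual internals; the table and function names are made up, and it assumes each replicated table has an "id" primary key):

    -- A delta table recording which rows changed, for the replication daemon to pick up.
    CREATE TABLE replication_delta (
        table_name text        NOT NULL,
        pk         text        NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO replication_delta (table_name, pk)
            VALUES (TG_TABLE_NAME, OLD.id::text);
            RETURN OLD;
        ELSE
            INSERT INTO replication_delta (table_name, pk)
            VALUES (TG_TABLE_NAME, NEW.id::text);
            RETURN NEW;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    -- One of these per replicated table: every INSERT/UPDATE/DELETE now does extra work.
    CREATE TRIGGER orders_capture
        AFTER INSERT OR UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION capture_change();  -- EXECUTE PROCEDURE on pre-11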
Has anyone here leveraged pglogical for this before? Looking at a similar migration but it has native extension support in AWS RDS. Would love to hear any success / horror stories!
Your team is now working 10pm to 4am EST every Sunday, and anyone who uses your services in any time zone around the world is without your service. 10pm EST is lunchtime in Australia. 4am EST is working hours in many parts of Western Europe.
You don't have to be huge to have customers all around the world.
I think the idea is that this world is one where the customer accepts that there may be an outage every Sunday because feature work can be done far more cheaply.
Er, you bring the service down so you can do work on it. Somebody is doing that work.
I guarantee that you'll have trouble hiring qualified people for a role that includes core hours of "2AM-5AM Sunday mornings" (or whatever other oddball time you've decided). Updating systems is precisely when you may need to have all-hands-on-deck responses. Been there, multiple times.
they might not have to work during every planned downtime, but they're going to have to work during some of them, or what was the point of the downtime?
This predictability only matters for frequent, repeat customers. But those are the exact customers that you could just email saying "the website is going to be down on Sunday for 1 hour for planned maintenance" a few times a year when it's actually required.
If you run a business (e.g. HFT, where you run certain cleanup scripts after trading hours) where regular planned downtime is fine, then go for it. But most of the time you don't need a downtime schedule every single week, because you're not doing planned maintenance work every single week.
You will be doing that every week when the system is designed to only be worked on during those planned windows, because that's what happens in these types of situations.
These windows are also when people discover how fragile their infrastructure is; when it doesn't come back up and now everyone is panicking and well past the planned downtime window.
I find it very hard to come up with a use case where a weekly 6 hours of downtime at night EST would be acceptable.
US government websites already often do this, and I find that completely unacceptable since these services need to be available to anyone, regardless of work and life schedules.
Any website that's international would also suffer greatly from an EST-centric scheduled downtime.
Maybe a very localized website that doesn't have much impact on real life?
Frankly, because it can be. Weekly scheduled downtime is arbitrary, manufactured, and lazy.
Of course, having to schedule the occasional downtime for a database migration is fine. There's probably a few times a year that you'd need to do it if you don't have the bandwidth to do fancy zero-downtime solutions. It's the weekly arbitrary downtime that I'm firmly against.
It's expensive to run things like that. Multiple "9"s are significantly more expensive to engineer than, say, 95% uptime.
In general, the correct amount of allowable downtime is when losses due to downtime are greater than the cost of increasing availability (which depends on the system). Should we raise taxes to increase government website availability? Not sure there's a clear cut case and it would definitely depend on each specific website and what utility it provides
Why is it lazy? When things are up people have to work. Do you believe people should be working all of the time?
I think regular downtime is only natural. If you had to choose between 95% availability or 100% availability other than the before mentioned downtime which would you choose?
That's not true. Monitoring software and on-call rotations are well established things, and wouldn't even go away in your world. Believe it or not, people who run websites already sleep at night.
> Do you believe people should be working all of the time?
I'm not sure where you got that. Besides, you already have people in the office in the middle of the night every Sunday. Or are you saying that the downtime is when your team gets to sleep?
> If you had to choose between 95% availability or 100% availability other than the before mentioned downtime which would you choose?
Again, not sure what you're getting at. There's only a few reasons why you'd need to schedule a downtime. Database migrations are a good one. All of the reasons should only happen a few times a year, likely. Way better than 52 times a year.
One service I worked on (with customers around the globe) had about 10 minutes of scheduled downtime a week. Peak user load was, interestingly enough, during our core hours, when there are a lot more hands on deck to handle any issues. So that's the time we chose.
Naturally people still complained bitterly about that ten minutes.
Getting to zero didn't seem worth the significant engineering effort required. "You're not Amazon, and you're not 911" [emergency number in the US]
Setting aside the other comments, I think it depends on your definition of “easier”.
Yes - it is easier in terms of up-front investment to have a scheduled (daily, weekly) downtime. But that often means the downtime is not automated: it's someone doing the work, often out of hours when they're not fresh, things can go wrong, etc. And it also means that the duration of the work is limited to the scheduled downtime. Some large upgrades just won't fit in the allotted time, and you need to schedule a separate outage.
On the other hand, creating a system designed for zero downtime (ie, in-place upgrades, removal of big batch jobs etc) is a lot more of an investment up front, and makes upgrade planning more complex, but ultimately it saves time and effort because you can do all the changes during working hours, and “upgrades” become just how you deploy code - part of the core skill set of your team - and you do them 10 times a day.
The main difference that I’ve seen is that the scheduled downtime approach tends to create a much more conservative upgrade cycle, much bigger upgrades, and more risk of an upgrade failing - which means that a pile of features don’t get released. Also, certain customers will push back and demand a ton of documentation and detailed release notes and advanced warning of the upgrades etc - this creates loads of unproductive work for a small company. A continuous deployment model tends to reduce these problems because you’re not externalising your upgrade schedule and you don’t have a single big upgrade that can go bang, so maybe one feature doesn’t get released but everything else does.
Having taken both approaches (and even moved a company from scheduled to continuous) I would never go back to a scheduled downtime approach.
Interestingly, this is unacceptable in the US and probably most of Western Europe. However, I know of more than one bank in Eastern Europe where online banking simply doesn't work after midnight, until about 6am.
My Dutch bank often sends me emails about their internet banking being unavailable for a few hours during the night for maintenance.
Availability is great but realistically, who’s going to use their bank in the middle of the night? It probably isn’t worth the effort and the risks to avoid the downtime.
I think it's still somewhat (fairly?) common for internal facing apps where you can just tell your users "It's going to be down in this window, plan accordingly". For a publicly available app, you'll lose your customers to a competitor with a more advanced maintenance strategy (or even one that happened to pick a NoSQL database where rolling upgrades are trivial)
This is very common in the equity trading space. Applications must be up during the trading day (non-Fri 8-5), but after that they can be upgraded and restarted easily. Typically there are strict change-management processes that prevent continuous delivery and slow the rollout of changes.
Often I wish I worked in an industry where I could release during the day.
Can this be done using existing PostgreSQL functionality around replicas? I think there's a plugin for PostgreSQL that supports master-master replication as well.
Very interesting article. But I have to ask: would taking down the system for a couple of hours be that bad?
I looked at the company, and while they seem rather large, they're not Netflix or AWS.
I imagine they need to be up for people to be able to check in, etc. But they could just block the planned maintenance windows out of check-in times far in advance. I'm sure there are a million other edge cases, but those can be thought out and weighed against the engineering effort.
Don't get me wrong, this is very cool.
But I wonder what the engineering cost was. I'd think easily in the hundreds of thousands of dollars.
I think it's always worth questioning both sides. Why is downtime acceptable? People on this site routinely complain about Windows needing a restart for system updates while boasting about their Linux servers' uptime.
People talk about how Kubernetes is overkill for many people, but it gives you rolling deployments for your applications out of the box.
There's also the "slippery slope" argument. A 0 downtime migration means 0 downtime. A 5 minute migration creeps up to a 30 minutes migration occasionally, and then regression to the mean happens, causing standard migrations to be 30 minutes.
At least where I'm working, downtime is acceptable because our customer base is composed of professionals in North America that work 9-5, so taking the DB offline for five minutes at midnight PST has essentially no cost to our client base. To contrast that, spending three months developing a rollout plan costs our company dearly in competitive advantage.
I agree that it's always worth examining from both sides but I also think that 0 downtime migration is both a shiny interesting problem and a point of pride for developers that can cause a company to vastly over invest in a complex technical solution to a problem they never had.
A lot of 0-downtime migrations end up turning into a months-long ordeal and can still fail.
Allowing maintenance windows means you can do things in a much simpler manner. Need to take a snapshot of a DB without running into issues with a production system adding data? Sure, go ahead; you just saved two months.
The principle of something like zero downtime or switching a vendor out to save 20% may not stack up against the reality in opportunity costs, employee burnout, and customer turnover.
My company is reorganizing and so everyone is trying to settle old philosophical debates during The Churn. But everyone is overextended and a number of us have started pushing back. Just the meetings to discuss changing vendors might push the break-even point out 6 months, before you even get into implementation costs. The more time we spend working on infrastructure instead of customer-facing features and bugs, the more likely those customers are to wander off.
I think the slippery slope argument is almost always a fallacy. In practice you can say "we accept 99.99% availability" (which is about 1 hour every year) and budget for that. Your service might go down for unplanned reasons such as an outage but it could also go down for planned maintenance and as long as you're within that budget it seems ok.
Unplanned? Probably. But maybe it's not as dire as you think? Azure Active Directory (auth as a service) went down for a while, globally, and life went on. Same with GCP and occasionally AWS [1].
I'm not saying there's no downside, I'm asking against the downside of engineering cost. And that itself carries risk of failure when you go live. It's not guaranteed.
[1] AWS is so ubiquitous, though, that half the internet would be affected, which makes it less individually noticeable.
So don't host on AWS, wait out their next big outage and then take down your app for that big migration, so you can hide among all the other dead apps. Half joking, of course.
OTOH I worked for several small companies that had "business hours" software - if you had to take downtime on the weekend or after hours you'd be looking at impacting <1% of users.
It absolutely is, BGP with geographically diverse paths, databases, app servers, etc, are all redundant. It's hosted in-house, so there is a cold standby database in AWS which would only be used if, say, an aircraft crashed into our server rooms.
We have everything in place to run from AWS if needed but do not operate from there because of cost.
This goes both ways -- there have been many AWS outages which have not affected us. I hear what you're saying, but we've had only one instance of extended (hours) downtime in the last 20 years.
The more experienced I become, the more such down to earth solutions seem OMG SO MUCH MORE reasonable than all the bells and whistles of "modern" engineering practices.
For Postgres we send WAL files to a server in AWS which processes them. To bootstrap the database initially we sent ZFS snapshots, and those WAL files are applied on an ongoing basis. If our data center were to die a horrendous fiery death, we could lose at most about 3 minutes of data, although monitoring shows that it's closer to 30s under normal operating conditions.
For the app servers, SaltStack is used, and we synchronize that repository with what is needed to reproduce a production environment in AWS.
Obviously we'd have to provision servers, etc, but it's all possible in a worst-case scenario.
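The WAL-shipping half of a setup like that can be just a few settings on the primary (a sketch; the ship-wal-to-aws script is a hypothetical stand-in for whatever actually copies segments off-site, not something the commenter named):

    -- On the primary; wal_level and archive_mode changes need a restart.
    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET archive_mode = 'on';
    -- %p is the segment's path, %f its file name; the script is hypothetical.
    ALTER SYSTEM SET archive_command = '/usr/local/bin/ship-wal-to-aws %p %f';
    SELECT pg_reload_conf();  -- picks up archive_command; restart still needed for the first two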
pg_upgrade with the --link option (hard links) will upgrade the DB in minutes; I've done a 2 TB DB in less than a minute.
Yes, there is no rollback, but if you do enough testing beforehand it can be really smooth. I'm in the middle of doing this across an environment where we have tens of hosts, ranging from a few GB to a couple of TB.
The problem happens when you estimate it will only take a few hours and at the end of the first hour it's 1% done, or it nears completion and crashes, in a loop. Now what?
A few minutes of downtime wouldn't be that bad, but it would harm the company's prestige. Unlike AWS and Netflix, we are not based on massive consumption but on a few customers. Losing one good customer can make a difference. We believe it was worth the investment of time.
Update: The cost is a good point. It wasn't that high. We actually updated the article to reflect this. It was two engineers, working full-time on it, for three weeks.
MySQL has had a myriad of easy-to-use solutions for over a decade that offer point-in-time cloning:
- Percona xtrabackup
- Linux LVM snapshots
- master-slave replication
- then built-in replication for point-in-time catchup.
I do some work with pg; it seems clunky compared to MySQL for typical HA and failover scenarios, but it can be done. All the MySQL methods I mentioned above can be and have been scripted. With MySQL GTIDs, it's usually trivial.
MySQL 8 and Percona Cluster are easy-to-use multi-master topologies. (I only use them with less than 100 GB of data in case of state transfers.)
Does anyone know if this works when the target database is a replica/standby? The downside of using pg_dump is that it acquires a lock on the table it's dumping, and doing this on production may cause some slowness.
You can do multi-master replication with conflict detection with SymmetricDS. The migration would be similar to the procedure described in the article.
I just don’t get all the “just shutdown the db for an hour, you’re not Netflix” comments.
If you can do things properly, as an engineer, you absolutely should, even if your company serves “just” hundreds of thousands instead of hundreds of millions users.
It is not like they wrote their own database for that, they just used an open source tool.
My understanding of doing things "properly" as an engineer, is picking the solution with the right tradeoffs for my use case. If the cost to the business of having some amount of scheduled downtime occasionally is significantly less than the engineering cost of maintaining several 9s worth of availability over major migrations, then I consider the former to be "done right".
If you do things properly as an engineer, you've already negotiated and committed to service-level agreements with specific downtime objectives, right? Right?
If you've got say 5 minutes of downtime budget per month, do you really think it's a good investment to spend a million dollars of engineering effort and opportunity costs to get database downtime to zero seconds? Or you could use off-the-shelf techniques for orders of magnitude less investment and suffer a few minutes of downtime at worst, well within your SLO. It's an economic decision, not a technical one.
If zero downtime at any cost is non-negotiable, well you'd better hire, plan and budget accordingly. And hope those few extra minutes per year are worth the cost.
I agree that it is an economic decision. In my understanding it didn't cost them millions of dollars to do it with zero downtime. Maybe $10k-$50k in development/admin/test hours to make it so.
Also engineers get better (and happier) when they do challenging tasks!
> Also engineers get better (and happier) when they do challenging tasks!
Better maybe, but happier, I'm not so sure.
At my last job I was the main developer in charge of hitting our zero-downtime target for code deploys and system upgrades, and it was a pain in the ass to always have to implement multi-stage migration processes with bidirectional compatibility between adjacent stages for things that would have been very little work if we'd been able to schedule an hour of downtime.
The cost was worth it from a business point of view, but it wasn't much fun to actually do the work.
Though I guess I agree with the "happier" part too if you're just talking about doing the initial infrastructure and software-architecture work that allows those annoying multi-stage migrations to run seamlessly. I did enjoy writing the migration framework code and figuring out what all the stages would need to be.
> Also engineers get better (and happier) when they do challenging tasks!
I would rather encourage them to engineer to the limit of their ability, but deliver within the limit of burning out my engineering team, plus some comfortable margin gained from the encouragement given, which should be rewarded.
I've talked myself into leaving insufficient time for unforeseen downtime tasks, and it took me a while to realise that I was wearing my management hat as a founder and not the engineering hat I should have been wearing.
> Blueground is a real estate tech company offering flexible and move-in ready furnished apartments across three continents and 12 of the world’s top cities. We search high and low for the best properties in the best cities, then our in-house design team transforms these spaces into turnkey spaces for 30 days or longer.
Seriously, how big can that DB be, and how bad would an hour of reduced availability/downtime be?
Seems like a lot of wasted engineering effort. "You are not Google."
You cut out the part in About where they hint at it being important here even. "But we go a step further, merging incredible on-the-ground support with app-based services for a seamless experience for all our guests"
It sounds like, across multiple time zones and with a tech stack that backs the business in a specific way, downtime could be a problem: it would undermine the "seamless" experience and the reliability they promise for move-ins.
If twitter goes down and you can't tweet, only really Twitter loses money and you can try again later. But if you're moving into an apartment you don't want to be standing outside trying to get the keys but the system is down. Edit: And you also don't want the service to tell you that you can't move in from 11am-1pm because of 'maintenance'
You can often get away with some downtime, but that's not the same as not spending any engineering effort.
What kills you is when you take an hour scheduled downtime, communicate this out, get everyone on board... and then are down for a day. If you don't have a good, well-rehearsed, plan, you might be unexpectedly screwed even then. As a rule of thumb... something usually doesn't quite go how you expect it to!
You can have the best plan in the world, well rehearsed and battle tested; something in that last patch bugged out and took the system with it. Now you're getting cursed out because the email, IVR, etc. all said 1 hour of downtime and it's been 4.
Weekly scheduled downtime has value, but it's not a good idea for most.
I was trying to get on Zoopla (and, I think, Rightmove) two nights ago for over an hour after midnight, and the site continued to tell me it was down for maintenance and to try again when the non-night-owls were up in the morning. Pretty sure their market cap is ten, if not eleven, figures.
Agree that it's a lot of wasted engineering effort for companies that can have planned downtimes.
My guess is the conversation was: "To be agile, we need to be 100% CI/CD." Next thing you know, everything needs to get pushed straight to prod continuously with no downtime.