Love pglogical. I was thinking it'd be nice to orchestrate this setup using pglogical too; I'd attempt it if there's interest, especially with a bi-directional replication setup.
I checked the code and noticed some things that you might want to address in the future (I've done major version upgrades via logical replication myself several times using a script similar to this).
The "default" way of bringing the target DB in sync with the source one (CREATE PUBLICATION on the source then CREATE SUBSCRIPTION on the target) only works for moderately sized databases. Even on ~50GB I noticed that it may take hours to catch up with the origin. There are a couple of solutions:
1. Drop all indices except for the primary keys before performing initial copy, and then restore the indices after all tables have been copied and replication stream started.
2. Instead of copying into a blank database, you could instead create a logical dump using pg_dump with snapshot isolation, restore it on the target DB using pg_restore, create a subscription in the disabled state, advance the logical replication slot to the LSN with which you created the logical dump, and then enable the subscriptions. This way your target database would only have to process the delta between the time you created a dump and the current time. I have a snippet illustrating how we did it: https://gist.github.com/take-five/1dab3a99c8636a93fc69f36ff9....
3. We found out that pg_dump + pg_restore is still very slow on larger databases (~1TB), and it's untenable to wait several hours. Remember that while the dump is being restored, the source database is accumulating WAL that it must later send to the target DB. Left unchecked for 12-18 hours, this can lead to running out of disk space. That was unacceptable for us, so instead of creating the target DB from a logical dump, we created a copy of the source database from an AWS RDS snapshot, upgraded it using pg_upgrade, and then set up logical replication (similar to what's described in the Percona blog: https://www.percona.com/blog/postgresql-logical-replication-...). Something like this can probably be achieved with ZFS snapshots.
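A hedged sketch of option 2 above (names, URIs, and LSNs are placeholders; the linked gist has the real version). The key pieces are pg_dump's `--snapshot` flag, creating the subscription disabled and without the initial copy, and `pg_replication_origin_advance()`:

```shell
# 1. On the source: create the logical replication slot and export a snapshot
#    (e.g. CREATE_REPLICATION_SLOT over the replication protocol returns a
#    snapshot name); record the slot's LSN. Keep that session open while dumping.
pg_dump "$SOURCE_URI" --snapshot="$SNAPSHOT_ID" -Fc -f dump.bak

# 2. Restore the dump on the target
pg_restore -d "$TARGET_URI" dump.bak

# 3. On the target: create the subscription disabled, pointing at the
#    pre-created slot, and skip the initial table copy
psql "$TARGET_URI" -c "CREATE SUBSCRIPTION sub1
  CONNECTION 'host=source dbname=app user=replicator'
  PUBLICATION pub1
  WITH (copy_data = false, enabled = false, create_slot = false, slot_name = 'sub1');"

# 4. Advance the subscription's replication origin to the dump's LSN, then enable.
#    The origin name is 'pg_' || the subscription's oid (look it up in pg_subscription).
psql "$TARGET_URI" -c "SELECT pg_replication_origin_advance('pg_XXXXX', '$SLOT_LSN');"
psql "$TARGET_URI" -c "ALTER SUBSCRIPTION sub1 ENABLE;"
```

From that point the target only replays the delta accumulated since the snapshot.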
--
Otherwise, it was very similar to how you programmed your library.
Recently, I saw an article here on HN on the same topic of migrating Postgres databases using logical replication. There was a very nice idea to set up _reverse_ logical replication after switchover so that it's possible to roll back quickly in case something breaks.
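A hedged sketch of that reverse setup (object names are placeholders): after switchover, the new primary publishes and the old one subscribes, so the old database keeps receiving the writes and stays viable as a rollback target:

```sql
-- On the NEW primary (now taking writes): publish all tables
CREATE PUBLICATION reverse_pub FOR ALL TABLES;

-- On the OLD primary (now idle): subscribe without re-copying data,
-- since both sides hold identical rows at switchover time
CREATE SUBSCRIPTION reverse_sub
  CONNECTION 'host=new-primary dbname=app user=replicator'
  PUBLICATION reverse_pub
  WITH (copy_data = false);
```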
These are great shouts! Thank you so much for sharing. One of the operations I ran with this was ~500GB, and yeah, it takes a few hours to catch up. For us it wasn't an issue, especially since we were on AWS Aurora. That said, I can totally see it not being feasible on other database engines/systems. I considered the idea of dropping all the indices and re-creating them once the catch-up is complete; I'd like pg_easy_replicate to support that. It should make the initial COPY a lot swifter. On a ~200GB database it cut the total time down by 90%.
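A hedged sketch of the index-drop idea (this is not part of pg_easy_replicate today; it assumes you capture the definitions first so they can be replayed after catch-up):

```sql
-- Save the definitions of every non-primary-key index before the initial copy
SELECT indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND indexname NOT IN (
    SELECT conname FROM pg_constraint WHERE contype = 'p'
  );

-- For each saved definition:
--   DROP INDEX public.some_index;
-- ...run the initial COPY and let replication catch up...
-- then replay each saved indexdef statement:
--   CREATE INDEX some_index ON public.some_table (...);
```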
The fastest way to switch over is probably adding an iptables rule on the old server to DNAT to the new server, unless you have a load balancer or other network middleware like Kubernetes' ClusterIP services (which use iptables or IPVS internally).
Waiting for a rolling restart or DNS cache expiration could take a while, during which the app is talking to a read-only database, if I understand correctly.
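For illustration, a hedged iptables sketch of that DNAT idea (addresses are placeholders; a matching POSTROUTING rule is needed so return traffic flows back through the old host):

```shell
# Allow the old host to forward packets
sysctl -w net.ipv4.ip_forward=1

# Redirect connections hitting the old server's 5432 to the new server
iptables -t nat -A PREROUTING -p tcp --dport 5432 \
  -j DNAT --to-destination 10.0.0.42:5432

# Rewrite the source address so replies come back via this host
iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.42 --dport 5432 \
  -j MASQUERADE
```

New connections land on the new primary immediately, with no waiting on DNS TTLs or app restarts.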
When you do this, does it need to process the whole database before the server can start processing transactions, or is it more of an "upgrade in the background on the fly" kind of thing?
This is also a spot where having failover strategies and circuit breakers can help. Microservices make you think about these things but you don’t have to wait for that forcing function to think about them.
When the database goes down, you have to do something else. Could be the server crashed or could just be a Postgres upgrade. Five minutes is just about the right amount of time for an open circuit to do its job.
Because when I last tried it, the replica has to be the same major version (WAL streaming requires it) and then upgrading it to the latest version can take 10+ hours.
You'd want logical replication instead of physical replication. Logical replication doesn't stream the WAL as-is; it streams a logical representation of the row changes that can be replayed on the destination, so the major versions don't have to match.
It looks like this uses the pglogical plugin, but there's also a built-in decoder, I think it's called "pgoutput".
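For reference, the built-in path needs no extension: pgoutput is the default decoding plugin behind CREATE SUBSCRIPTION. A minimal sketch (names and connection details are placeholders):

```sql
-- On the source (wal_level must be set to 'logical')
CREATE PUBLICATION pub1 FOR ALL TABLES;

-- On the target, with a matching schema already loaded
CREATE SUBSCRIPTION sub1
  CONNECTION 'host=source dbname=app user=replicator'
  PUBLICATION pub1;  -- uses the pgoutput plugin under the hood
```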
> The switch is made by putting the user on the source database in READ ONLY mode, so that it is not accepting any more writes and waits for the flush lag to be 0. It is up to user to kick of a rolling restart of your application containers or failover DNS (more on these below in strategies) after the switchover is complete, so that your application isn't sending any read/write requests to the old/source database.
This does mean that there will be a "downtime" during which the DB is in read-only mode, right? I understand this period can be really small, but that depends on how long the app deployment takes.
That's right, it was a typo. I meant zero data loss and minimal downtime. There are some strategies in the readme, like using weighted DNS failover, to reduce the downtime even further without requiring an application deployment.
A simple LB like HAProxy works fine for this and similar cases. In a nutshell, HAProxy runs checks every n seconds (in your case they could be read-only mode checks) and disables the failing upstream.
We use it in front of MySQL/PG+Patroni/Redis and it works fine - though we only tested at low load, up to 5k qps for the PG case and up to 80k rps for the Redis case.
This can be done with pgbouncer and you can suspend the database so pgbouncer "holds" connections. If the application has high enough timeouts, it will look like it's just taking longer than normal to connect.
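A hedged sketch of that PgBouncer dance (assumes the admin console is reachable on port 6432 and the pool is named mydb):

```shell
# Hold new queries: clients block instead of erroring out
psql -p 6432 -U pgbouncer pgbouncer -c "PAUSE mydb;"

# ...perform the switchover, repoint mydb at the new server in pgbouncer.ini...

psql -p 6432 -U pgbouncer pgbouncer -c "RELOAD;"      # pick up the new config
psql -p 6432 -U pgbouncer pgbouncer -c "RESUME mydb;" # release the held clients
```

As long as the pause is shorter than the application's connection/query timeouts, the cutover just looks like a slow query.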
For that specific case, and if you can afford PgBouncer, I agree with you.
On a wider scale, HAProxy adds value - automatic tracking of sync/async replicas, replica lag windows, falling back to the master if all replicas are lagging/down, and so on.
I guess this is pretty similar to what Heroku does for pg upgrades?
If using this tool for upgrades with minimal downtime, if I understand right, you need to orchestrate your db callers to switch over to the new primary at the right point, when it's ready? Tips for getting that to happen at just the right point to minimize downtime?
Great question: you can watch the stats command. It returns JSON every second, including a switchover_completed_at field, which is updated as soon as the switchover is complete.
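For example, a hedged polling sketch (the `--group-name` flag and exact JSON field name are assumptions based on the comment above; check the readme for the real interface):

```shell
# Poll the stats output until the switchover has completed
until pg_easy_replicate stats --group-name cluster-1 \
      | jq -e '.switchover_completed_at != null' > /dev/null; do
  sleep 1
done
echo "switchover complete - safe to repoint the application"
```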
Let's say I have a Rails app and I want to do a major postgres version upgrade using this tool.
I see one of the requirements is that 'Both databases should have the same schema'. How should I manually load the schema to the new database (running on port 5433 for instance)?
Yeah I think the documentation is a bit ambiguous here. Postgres "schemas" have a very specific meaning, different from what most people think of when they hear the word. It's more like a table namespace.
I have historically had a “drop-create script” that could handle building me empty tables. Typically split up so that I can create tables, import (test) data from a snapshot or synthetic source, then enable constraints and indexes.
So even if a tool like that needed a little boost at the beginning to get going, I’d have something that already took care of that.
That’s a great question: internally what we have done is when preparing the new empty database, we also run a “bundle exec rake db:schema:load” against it. Depending on your setup, it can be hard to do.
I am exploring some options like loading schema from a schema.sql file or something along the lines of what pg_dump does for schema export.
The easy answer would be on some non-production copy of the Rails app, point at the new db in database.yml, and simply run `rails db:create; rails db:schema:load`, or some variation rails command like `rails db:prepare`.
Rather than this ad-hoc thing, you could also try to create a new rails "environment" for this use, I guess.
The postgres documentation for logical replication suggests using "pg_dump --schema-only" as a starting point.
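That path is framework-agnostic (connection URIs are placeholders):

```shell
# Export only the DDL from the source, then load it into the new cluster
pg_dump --schema-only --no-owner "$SOURCE_URI" > schema.sql
psql "$TARGET_URI" -f schema.sql
```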
I am sure the pg_easy_replicate tool brings something to the table (probably replication lifecycle management), but reading the docs, bare Postgres replication looks just as easy, if not easier, than the easy-replicate tool. I am going to try it on my toy DB and see how it goes.
This looks like a script that basically just runs all the necessary SQL for you. It looks pretty similar to an internal Python tool I came up with to do the same thing (basically just scripting out setting up logical rep and waiting for it to catch up, then executing switchover commands)
That's right! It's mostly a CLI orchestration tool (mentioned in the readme), with some logging, stats, and safety checks so that the user/engineer can be mostly hands-off during the process. You can even automate the entire process by wiring different pg_easy_replicate commands together.
We do that with Tines, where we are fully hands-off (zero-touch ops) and can kick off upgrades with switchover for tens of DBs (one at a time for now :D).
I would prefer to not have to install a new language on the server. There's Python and Perl already; you would certainly cover nearly all Linux installations with these two.
If you had included any content or justification in your comment, it might have gone differently. "I don't like software written in Ruby" is not interesting to anyone. No polarization.
I like code written in Assembly. No problem with that. If it's written in Assembly in the first place (not disassembled from another language), it's usually not that hard to read...
https://jstaf.github.io/posts/pglogical/
pglogical is magic. Keeping an eye on this project for the future, looks great.