Unfortunately not, but it's surprisingly straightforward apart from the database bit. Here's a bit more detail from memory. There are many ways of doing this, and some will depend strongly on which tools you're comfortable with (e.g. nginx vs. haproxy vs. some other reverse proxy is largely down to which one you know best and/or already have in the mix). [Today I might have considered K8s, but this was before that was even a realistic option; frankly, even with K8s I'm not sure, as the setup in question was very simple to maintain.]
* Set up haproxy, nginx or similar as a reverse proxy, and carefully decide whether you can handle retries on failed queries. If you want true zero-downtime migration, the challenge here is making sure you have a setup that lets you add and remove backends transparently. There are many ways of doing this, of varying complexity. I've tended to favour dynamic DNS updates for this; in this specific instance we used HashiCorp's Consul to keep DNS updated with services (see the Consul sketches below). I've also used ngx_mruby for instances where I needed more complex backend selection (it lets you run Ruby code inside nginx).
* Set up a VPN (or more than one, depending on your networking setup) between the locations, so that the reverse proxy can reach backends in both/all locations and the backends can reach databases in both places.
* Replicate the database to the new location.
* Ensure your app has a mechanism for determining which database to use as the master. Just as with the reverse proxy, we used Consul for this (there's a lookup sketch below): all backends would switch as soon as a replica was promoted to master.
* Ensure you have a fast method to promote a database replica to master; you don't want to be fiddling with this mid-migration. We had fully automated scripts to do the failover (see the promotion sketch below).
* Ensure your app gracefully handles database failure of whatever it thinks the current master is. This is the trickiest bit in some cases: you either need to make sure updates are idempotent, or you need to make sure updates during the switchover either reliably fail or reliably succeed. In the case I mentioned we were able to safely retry requests (see the retry sketch below). In many cases, though, it'll be safer to just punt on true zero-downtime migration, assuming your setup can handle promotion of the new master fast enough. In our case the promotion of the new Postgres master took literally a couple of seconds, during which any failing updates just translated into some page loads being slow as they retried; if we hadn't been able to retry, it would have meant a few seconds of downtime.
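Roughly, the Consul-based selection amounts to asking Consul which instance of a service is currently healthy. A minimal sketch in Python, assuming a local Consul agent on its default port and a hypothetical service name "postgres-master" (not the names from the actual setup):

```python
# Sketch only: look up the current Postgres master via Consul's HTTP API.
# Assumes a local Consul agent on 127.0.0.1:8500 and a hypothetical
# service name "postgres-master"; your service names will differ.
import requests

CONSUL = "http://127.0.0.1:8500"

def current_master(service="postgres-master"):
    # /v1/health/service/<name>?passing=1 returns only instances whose health
    # checks pass, so a demoted or dead master drops out automatically.
    resp = requests.get(f"{CONSUL}/v1/health/service/{service}",
                        params={"passing": 1}, timeout=2)
    resp.raise_for_status()
    instances = resp.json()
    if not instances:
        raise RuntimeError(f"no healthy {service} instance registered in Consul")
    svc = instances[0]["Service"]
    return svc["Address"] or instances[0]["Node"]["Address"], svc["Port"]

if __name__ == "__main__":
    host, port = current_master()
    print(f"connect to {host}:{port}")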
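The promotion itself can be scripted end to end. A sketch assuming Postgres 12+ (which has pg_promote(); older versions would shell out to `pg_ctl promote` or use a trigger file) and the same hypothetical Consul service name:

```python
# Sketch only: automated failover. Connection details are hypothetical.
import psycopg2
import requests

def promote(replica_host, replica_port=5432):
    # Ask the replica to promote itself; pg_promote() blocks until promotion
    # completes (or its default 60s timeout expires).
    conn = psycopg2.connect(host=replica_host, port=replica_port,
                            dbname="postgres", user="postgres")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SELECT pg_promote();")
    conn.close()

    # Re-register the promoted node as the master service with the local
    # Consul agent so every backend's next lookup points at the new master.
    requests.put(
        "http://127.0.0.1:8500/v1/agent/service/register",
        json={"Name": "postgres-master", "Address": replica_host, "Port": replica_port},
        timeout=2,
    ).raise_for_status()
```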
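And the "safely retry requests" part, sketched as a wrapper that re-resolves the master on each attempt. `resolve_master` is any callable returning (host, port), e.g. the current_master() lookup above; this is only safe when the statements you retry really are idempotent:

```python
# Sketch only: retry idempotent statements across a master failover.
# Database name and credentials below are hypothetical.
import time
import psycopg2

def run_with_retry(resolve_master, sql, params=None, attempts=5, delay=0.5):
    last_error = None
    for _ in range(attempts):
        try:
            host, port = resolve_master()  # re-resolve: the master may have moved
            conn = psycopg2.connect(host=host, port=port, dbname="app", user="app")
            try:
                with conn, conn.cursor() as cur:  # `with conn` commits on success
                    cur.execute(sql, params)
                    return cur.rowcount
            finally:
                conn.close()
        except psycopg2.OperationalError as exc:
            last_error = exc  # e.g. connection refused / terminated mid-failover
            time.sleep(delay)
    raise last_error
```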
Once you have the new environment running and capable of handling requests (but using the database in the old environment):
* Reduce DNS record TTL.
* Ensure the new backends are added to the reverse proxy (see the registration sketch after this list). You should start seeing requests flow through the new backends and can verify that error rates aren't increasing. This should be quick to undo if you see errors.
* Update DNS to add the new environment's reverse proxy. You should start seeing requests hit the new reverse proxy, and some of them should flow through the new backends. Wait and see whether any issues show up.
* Promote the replica in the new location to master and verify everything still works. Ensure whatever replication you need from the new master works. You should now see all database requests hitting the new master.
* Drain connections from the old backends (remove them from the pool, but leave them running until they're no longer handling any requests; see the drain sketch after this list). You should now have all traffic past the reverse proxy going via the new environment.
* Update DNS to remove the old environment reverse proxy. Wait for all traffic to stop hitting the old reverse proxy.
* When you're confident everything is fine, you can disable the old environment and bring DNS TTL back up.
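For the "add the new backends" step, if the proxy resolves backends through Consul DNS, adding or removing a backend is just a service (de)registration against the local Consul agent. A sketch, with hypothetical service name and health-check path:

```python
# Sketch only: register/deregister a backend with the local Consul agent so a
# proxy resolving e.g. web.service.consul (hypothetical name) picks it up.
import requests

CONSUL = "http://127.0.0.1:8500"

def register_backend(name, address, port):
    payload = {
        "Name": name,                       # e.g. "web"
        "ID": f"{name}-{address}-{port}",
        "Address": address,
        "Port": port,
        # A failing health check drops the backend out of DNS automatically;
        # the /health path is an assumption about your app.
        "Check": {"HTTP": f"http://{address}:{port}/health", "Interval": "10s"},
    }
    requests.put(f"{CONSUL}/v1/agent/service/register",
                 json=payload, timeout=2).raise_for_status()

def deregister_backend(service_id):
    requests.put(f"{CONSUL}/v1/agent/service/deregister/{service_id}",
                 timeout=2).raise_for_status()
```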
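For the drain step with haproxy, the runtime API gives you exactly the "remove from the pool but leave running" semantics. A sketch assuming the admin socket is enabled at a hypothetical path, with hypothetical backend/server names (with nginx you'd drop the server from the upstream instead):

```python
# Sketch only: put a server into drain via haproxy's runtime/admin socket.
import socket

def haproxy_drain(backend, server, sock_path="/var/run/haproxy.sock"):
    # "drain" keeps existing sessions alive but sends no new connections.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(f"set server {backend}/{server} state drain\n".encode())
        print(s.recv(4096).decode().strip())

# Example: haproxy_drain("web_backends", "old-dc-app1")
```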
The precise sequencing is very much a question of preference. The point is that you're switching over and testing change by change, and through most of the steps you can go back a step without too much trouble. I tend to prefer doing the changes that are low-effort to reverse first. Keep in mind that some changes (like DNS) can take some time to propagate.
EDIT: You'll note most of this is basically treating both sites as one large environment, using a VPN to tie them together, and ensuring you have proper high availability. Once you do, the rest of the migration is basically just failing over.
I remember reading this published insight[1] from Marissa Mayer a few months ago:
Burnout is caused by resentment
Which sounded amazing, until this guy who dated a neuroscientist commented[2]:
No. Burnout is caused when you repeatedly make large amounts of sacrifice and/or effort on high-risk problems that fail. It's the result of a negative prediction error in the nucleus accumbens. You effectively condition your brain to associate work with failure.
Subconsciously, then eventually consciously, you wonder if it's worth it. The best way to prevent burnout is to follow up a serious failure by doing small things that you know are going to work. As a biologist, I frequently put in 50-70 and sometimes 100 hour workweeks. The very nature of experimental science (lots of unknowns) means that failure happens. The nature of the culture means that grad students are "groomed" by sticking them on low-probability-of-success, high-reward fishing expeditions (gotta get those Nature and Science papers). I used to burn out for months after accumulating many, many hours of work on high-risk projects. I saw other grad students get it really bad and burn out for years.
During my first postdoc, I dated a neuroscientist and reprogrammed my work habits. On the heels of the failure of a project I had spent weeks building up to, I would quickly force myself to do routine molecular biology, general lab tasks, or a repeat of an experiment that I had gotten to work in the past. These all have an immediate reward. Now I don't burn out anymore, and I find it easier to re-attempt very difficult things with a clearer mindset.
For coders, I would posit that most burnout comes on the heels of failure that is not in the hands of the coder (management decisions, market realities, etc.). My suggested remedy would be to re-associate work with success by doing routine things, such as debugging or code testing, that restore the little "pops" of endorphins to the act of working.
That is not to deny that having a healthy life schedule makes burnout less likely (I think it does, and one should have a healthy lifestyle for its own sake), but I don't think it addresses the main issue.
Then I finally realized how many times I've burnt out in my life, and I became much better at avoiding it. Which is really hard to do.
And it seems to me that this is one of the many points that Ben Horowitz talks about in What’s The Most Difficult CEO Skill? Managing Your Own Psychology[3].
Interesting. I've read that under the rules of Islamic finance, which forbid lending at interest, people have exploited similar loopholes. In the UK, HSBC (I think) offers an "Islamic mortgage" product, in which the bank purchases the house, the prospective homeowner rents it, and the bank agrees to give him the house if he's a good renter for thirty years. The loan market interprets lenience as damage and routes around it.