Deploying Python Without Downtime (philipcristiano.com)
33 points by philipcristiano 1487 days ago | 8 comments

A neat trick with uWSGI is to dynamically set the number of worker processes to 0, which causes incoming requests to hang whilst waiting to be processed by no-longer-extant workers.

As long as you can apply DB migrations before the waiting requests time out (and then set the number of workers back to something sensible), you can perform quite major upgrades without even dropping connections.

While not great for migrations, uWSGI also has chain reloading [1], which does the same thing as killing each worker. I use Tornado at work for some applications, so I had to come up with something for Gunicorn, as uWSGI and Tornado don't really fit together.

  [1] http://uwsgi-docs.readthedocs.org/en/latest/Changelog-1.9.html#chain-reloading

Does the master process actually accept() connections while the workers are being restarted? If not, clients will eventually fail to connect after the accept queue length has been reached. This can happen very quickly as accept queues tend to be pretty short (in the hundreds, if not less).
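To make the accept queue point concrete, here is a minimal sketch using plain Python sockets rather than Gunicorn itself. The function name and the backlog value of 5 are made up for illustration. It only shows that the kernel will complete a connection on a listening socket before anyone calls accept(), which is why clients can wait through a short restart as long as the queue doesn't fill up.

```python
import socket


def queued_connect_demo(backlog: int = 5) -> bytes:
    """Connect to a listening socket before accept() is ever called.

    Hypothetical helper for illustration; the backlog of 5 is arbitrary.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(backlog)         # backlog is a hint for the accept queue size
    port = server.getsockname()[1]

    # Nobody has called accept() yet, so this connection waits in the queue.
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    client.sendall(b"ping")

    # "Workers are back": only now is the connection dequeued and served.
    conn, _ = server.accept()
    conn.settimeout(2)
    data = conn.recv(4)

    for s in (conn, client, server):
        s.close()
    return data


if __name__ == "__main__":
    print(queued_connect_demo())
```

Once more clients are waiting than the queue can hold, new connection attempts start stalling or failing, which is the failure mode described above. The exact limit depends on the OS and on the backlog the server passes to listen().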

It doesn't accept. I'm not sure what the queue length is; either 100 or 1k, as a guess.

One thing worth noting about these rolling restarts that I didn't see in your post: if the new code isn't completely backwards-compatible, you can end up with bad states from having a mix of workers running. This negates a lot of the value of the rolling restart because it creates other failure modes.

For example, if you introduce a new AJAX endpoint in the release, and a client hits a new worker that generates an HTML page calling it, but 90% of your Gunicorn workers are still serving the old version of the app, there's a 90% chance that you're going to 404 that request.
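The 90% figure is just the fraction of workers still on the old code. A rough sketch of the arithmetic, assuming requests are spread evenly across workers (the function name is made up for illustration):

```python
def stale_hit_probability(old_workers: int, total_workers: int) -> float:
    """Chance that a follow-up request lands on a worker running old code.

    Assumes every worker is equally likely to pick up the request.
    """
    if total_workers <= 0 or not 0 <= old_workers <= total_workers:
        raise ValueError("need 0 <= old_workers <= total_workers and total_workers > 0")
    return old_workers / total_workers


if __name__ == "__main__":
    # 10 workers, 1 already restarted onto the new release.
    print(stale_hit_probability(9, 10))
```

The risk shrinks as the rolling restart progresses, but it only reaches zero once the last old worker is gone.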

That's definitely an issue, and you have to weigh the time it takes to work around it against how long you can tolerate the service being unreachable.

A new AJAX endpoint is pretty simple to work around with two releases: one to add the endpoint and another to use it. Most changes probably aren't that fortunate.

Rather than killing Gunicorn's child processes, I prefer to send SIGHUP to the master process.[1]

It's as simple as

`pkill -f --signal HUP "gunicorn: master \[procname\]"`

[1] https://gunicorn-docs.readthedocs.org/en/latest/faq.html?hig...
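If you'd rather do this from a deploy script than with pkill, a rough Python equivalent might look like the sketch below. It assumes the master was started with Gunicorn's `--pid` option so its PID is in a file; the path shown and both function names are hypothetical.

```python
import os
import signal


def read_pid(pidfile: str) -> int:
    """Read a PID from a pidfile such as the one written by `gunicorn --pid`."""
    with open(pidfile) as f:
        return int(f.read().strip())


def reload_master(pid: int) -> None:
    """Send SIGHUP to the given process, asking a Gunicorn master to reload."""
    os.kill(pid, signal.SIGHUP)


if __name__ == "__main__":
    # Hypothetical pidfile location; use whatever path was passed to --pid.
    reload_master(read_pid("/var/run/gunicorn.pid"))
```

Using the pidfile avoids matching on the process title, so it doesn't depend on the procname setting.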

The issue with that is that it tends to stop all the workers and then start them again, not one by one. Since some applications can take a minute to start, that leaves the socket open with nothing accepting connections until the workers are back up.
