
How We Deploy Python Code - spang
https://nylas.com/blog/packaging-deploying-python
======
svieira
Back when I was doing Python deployments (~2009-2013) I was:

* Downloading any new dependencies to a cached folder on the server (this was before wheels had really taken off)

* Running pip install -r requirements.txt from that cached folder into a new virtual environment for that deployment (`/opt/company/app-name/YYYY-MM-DD-HH-MM-SS`)

* Switching a symlink (`/some/path/app-name`) to point at the latest virtual env.

* Running a graceful restart of Apache.

Fast, zero downtime deployments, multiple times a day, and if anything failed,
the build simply didn't go out and I'd try again after fixing the issue.
Rollbacks were also very easy (just switch the symlink back and restart Apache
again).
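
Roughly, the whole deploy was a few lines of shell (a sketch from memory; the cache directory is hypothetical):

        # build a fresh virtualenv for this deploy
        NEW_ENV="/opt/company/app-name/$(date +%Y-%m-%d-%H-%M-%S)"
        virtualenv "$NEW_ENV"
        "$NEW_ENV/bin/pip" install --no-index --find-links=/var/cache/pip-downloads -r requirements.txt

        # flip the symlink, then reload Apache without dropping connections
        ln -sfn "$NEW_ENV" /some/path/app-name
        apachectl graceful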

These days the things I'd definitely change would be:

* Use a local PyPI rather than a per-server cache

* Use wheels wherever possible to avoid re-compilation on the servers.

Things I would consider:

* Packaging (deb / fat-package / docker) to avoid having any extra work done over per-machine + easy promotions from one environment to the next.

~~~
s_kilk
I built a system that did something very like this at a previous employer. We
got really quick (mostly) atomic deployments which could be rolled-back
instantly with one command.

Even at the time I thought Docker would be a great solution to the problem,
but the organization was vehemently against using modern tech to manage
servers and deployments, so I ended up writing that tool in bash instead. Good
times.

------
morgante
Their reasons for dismissing Docker are rather shallow, considering that it's
pretty much the perfect solution to this problem.

Their first reason (not wanting to upgrade a kernel) is terrible considering
that they'll eventually be upgrading it anyways.

Their second is slightly better, but it's really not that hard. There are
plenty of hosted services for storing Docker images, not to mention that
"there's a Dockerfile for that."

Their final reason (not wanting to learn and convert to a new infrastructure
paradigm) is the most legitimate, but ultimately misguided. Moving to Docker
doesn't have to be an all-or-nothing affair. You don't have to do random
shuffling of containers and automated shipping of new images—there are
certainly benefits of going wholesale Docker, but it's by no means required.
At the simplest level, you can just treat the Docker container as an app and
run it as you normally would, with all your normal systems. (i.e. replace
"python example.py" with "docker run example")

~~~
viraptor
> (not wanting to upgrade a kernel) is terrible considering that they'll
> eventually be upgrading it anyways.

If they're running Ubuntu 12.04 LTS, they can keep the 3.2 kernel until late
2017. That's 2 more years. And they wrote "did not", so it was likely the
situation months ago, not yesterday.

> (not wanting to learn and convert to a new infrastructure paradigm) is the
> most legitimate, but ultimately misguided

It depends on the amount of stuff they deploy. If they handle _everything_
using Ansible (and from the list it looks like they do), then it's months of
work to migrate to something else. They may need the right users / logging /
secret management in the app itself, not outside of it.

~~~
morgante
> If they handle everything using Ansible (and from the list it looks like
> they do), then it's months of work to migrate to something else.

It's not. It would be months of work if they wanted to convert all their
Ansible code to Docker, but that's by no means required.

Docker and Ansible can easily coexist peacefully.

~~~
viraptor
They can. But depending on how you used Ansible before, it may mean a heavy
rewrite of your deployment strategy. I'm not saying it will always take that
long. But depending on your app, the requirements may be very complex and not
fit into the docker idea.

(it always means some extra work for security updates though - now you're
updating both the host and images)

------
Cieplak
Highly recommend FPM for creating packages (deb, rpm, OS X .pkg, tar) from
gems, Python modules, and PEARs.

[https://github.com/jordansissel/fpm](https://github.com/jordansissel/fpm)
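
For instance, to roll a virtualenv directory into a .deb (a sketch; the name, version, and paths are hypothetical):

        fpm -s dir -t deb -n myapp -v 1.0.0 \
            -d python --prefix /opt/myapp ./venv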

~~~
jlees
That seems like a neat tool. I wonder if you could combine it with the
sandboxing that dh-virtualenv provides to get the best of both worlds?

~~~
rhelmer
I've used fpm to make rpm and deb packages that simply include a virtualenv,
it works ok.

One of the significant tradeoffs to this approach is you lose the carefully-
crafted tree-of-dependencies that the distros favor, so it makes the package
pretty much automatically unacceptable to package maintainers.

However, being able to have install instructions that amount to "yum/apt-get
install <package>" is pretty great.

I am hoping for an app/container convergence at some point, but we might need
to drop the fine-grained dependency dream and have them be more self-
contained, like Mac OS X apps.

~~~
vacri
FPM is intended as an in-house solution only. It's not meant for making
packages for official distro repositories for third-party users to pick up,
and the authors suggest you use the distro-specified methods for those.

------
doki_pen
We do something similar at Embedly, except instead of dh-virtualenv we have
our own homegrown solution. I wish I'd known about dh-virtualenv before we
created it.

Basically, what it comes down to is a build script that builds a deb with the
virtualenv of your project, versioned properly (build number, git tag), along
with any other files that need to be installed (think init scripts and an
about file describing the build). It also does things like creating users for
daemons. We also use it to enforce a consistent package structure.

We use devpi to host our python libraries (as opposed to applications),
reprepro to host our deb packages, standard python tools to build the
virtualenv and fpm to package it all up into a deb.

All in all, the bash build script is 177 LoC and is driven by a standard build
script we include in every application's repository, defining variables and
optionally overriding build steps (if you've used Portage...).
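
The per-repo part can be tiny (a sketch; names are hypothetical and the real script is more involved):

        # build.conf, checked into each application repo
        PKG_NAME=myservice
        PKG_VERSION="1.2.${BUILD_NUMBER:-0}"

        # optionally override a default step, Portage-style
        post_install() {
            cp conf/myservice.upstart "$STAGING_DIR/etc/init/myservice.conf"
        }

        # the shared 177-line script sources this, builds the virtualenv,
        # runs each (possibly overridden) step, and calls fpm at the end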

The most important thing is to have a standard way to create Python libraries
and applications, to reduce friction in starting new projects and getting them
into production quickly.

------
remh
We fixed that issue at Datadog by using Chef Omnibus:

[https://www.datadoghq.com/blog/new-datadog-agent-omnibus-ticket-dependency-hell/](https://www.datadoghq.com/blog/new-datadog-agent-omnibus-ticket-dependency-hell/)

It's more complicated than the solution proposed by Nylas, but ultimately it
gives you full control of the whole environment and ensures that you won't hit
ANY dependency issue when shipping your code to weird systems.

~~~
sytse
At GitLab we use Chef Omnibus too and we love it. More than 100k organizations
use GitLab with Omnibus and it has lowered our support effort enormously.
[https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md)

------
kbar13
[http://pythonwheels.com/](http://pythonwheels.com/) solves the problem of
building C extensions on installation.

~~~
emidln
Pair this with virtualenvs in separate directories (so that "rollback" is just
an ssh mv and a reload of whatever supervisor process) and you get to skip the
mess of building system packages.

Also, are there seriously places that don't run their own PyPI mirrors? Places
that have people who understand how to integrate platform-specific packages
but can't be bothered to deploy one of the several PyPI-in-a-box systems or
pay for a hosted PyPI?
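
Pointing every box at the local mirror is a two-line config (a sketch; the URL is hypothetical):

        # /etc/pip.conf
        [global]
        index-url = https://pypi.internal/simple/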

~~~
vamega
Can you point me to your recommended PyPI-in-a-box system?

~~~
emidln
I currently use this one:
[https://localshop.readthedocs.org/en/latest/installing.html](https://localshop.readthedocs.org/en/latest/installing.html)

It works. It's Django-based and you can set up S3-backed storage. It also has
a docker-compose script.

~~~
doki_pen
We migrated off of localshop and onto devpi. Devpi is a much better product
and much more actively maintained. localshop was nothing but headaches and
constantly breaking.

~~~
mvantellingen
Author here: I created it to solve an issue I was running into a couple of
years ago. I've only recently started using it again myself. I think the
development version (not on pypi) is in much better shape with things like
multiple repositories and better user management (teams).

------
tschellenbach
Yes, someone should build the one way to ship your app. No reason for
everybody to be inventing this stuff over and over again.

Deploys are harder if you have a large codebase to ship. rsync works really
well in those cases. It requires a bit of extra infrastructure, but is super
fast.

~~~
mattbillenstein
+1 rsync is pretty darn good at any scale -- I'm not sure why the simplest
solution possible doesn't beat out docker as a suggestion in this thread.

I've been bundling libs and software into a single virtualenv-like package
that I distribute with rsync for a long time - it solves loads of problems, is
easy to bootstrap a new system with, and incremental updates are super fast.
Combine that with rsync distribution of your source and a good tool for
automating all of it (ansible, salt, chef, puppet, et al) and you have a
pretty fool-proof deployment system.

And a rollback is just a git revert and another push away -- no need to keep
build artifacts lying around if you believe your build is deterministic.
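
Roughly (a sketch; host and paths hypothetical):

        # ship the bundled environment and the source; only deltas cross the wire
        rsync -az --delete env/ app1:/opt/myapp/env/
        rsync -az --delete src/ app1:/opt/myapp/src/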

~~~
viraptor
Rsync is good for simple things. But it will fail with more complicated apps:

- how do you know which version you're running right now?

- how do you deploy to two environments where different deps are needed?

- how do you tell when your included dependencies need security patches?

~~~
mattbillenstein
rsync isn't the complete system - you're going to need git (or another vcs)
and some other tools of course.

#1 is git (dump and log the git head on a deploy)

#2 don't do that - keep a single consistent environment

#3 use the system openssl - monitor other software components for security
updates -- you need to do this anyway in any of these systems.
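
For #1, that can be as small as (path hypothetical):

        # record the deployed revision where the app (and you) can find it
        git rev-parse HEAD > /opt/myapp/REVISION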

~~~
viraptor
> #2 don't do that

I wish everyone to have easy deployments where environments, OS versions and
everything else are always consistent. :)

> #3 monitor other software components for security updates -- you need to do
> this anyway in any of these systems.

Sure. But having multiple virtualenvs means you need to monitor all of them on
all deployed hosts. Having everything packaged separately means you can do
audits much more easily and without location-specific checks.

------
sandGorgon
The fact that we had a weird combination of Python and libraries took us
towards Docker. And we have never looked back.

For someone trying out building Python deployment packages using deb, rpm,
etc., I really recommend Docker.

~~~
craigmccaskill
They specifically called that out in the article with an entire section called
"just use docker".

~~~
x0x0
if you have to choose between a kernel and docker, just choose docker. Python
can't get their shit together deployment-wise, and docker is the one true
route (tm) to python deployment happiness.

forget virtualenv; forget package dependencies on conflicting versions of
libxml; forget coworkers that have 3 different conflicting versions of
requests scattered through various services, and goddamnit I just want to run
a dev build; forget coworkers that scribble droppings all over the filesystem,
and assume certain services will never coexist on the same box

just use docker. It's going to go like this:

step 1: docker

step 2: happy

~~~
pyre
Ha. Wait until you need to run a build of shared Perl codebase against unit
tests in all of the dependent codebases... but some of those codebases compile
and run C (or C++) programs... and some of those codebases depend on
conflicting versions of GCC!

"If we hit the bullseye, the rest of the dominos will fall like a house of
cards... checkmate!" \-- Zap Brannigan

> forget coworkers that scribble droppings all over the filesystem, and assume
> certain services will never coexist

I think this tends to be less of a problem than the desire to have a build
artifact that can be reliably deployed to multiple servers, rather than having
the "build" process and "deploy" process hopelessly intertwined with each
other.

------
sophacles
We use a devpi server, and just push the new package version, including wheels
built for our server environment, for distribution.

On the app end we just build a new virtualenv, and launch. If something fails,
we switch back to the old virtualenv. This is managed by a simple fabric
script.
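
The push side is just a couple of commands (a sketch; the index URL is hypothetical):

        # point at the internal index, then upload wheels built for our servers
        devpi use https://devpi.internal/root/prod
        devpi upload --formats bdist_wheel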

------
nZac
We just commit our dependencies into our project repository in wheel format
and install into a virtualenv on prod from that directory, eliminating PyPI.
Though I don't know many others that do this. Do you?
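
(Something like this, with wheels/ committed to the repo - a sketch:)

        # on a dev machine: build wheels for every dependency
        pip wheel -r requirements.txt -w wheels/

        # on prod: install from the checked-in directory, never touching PyPI
        pip install --no-index --find-links=wheels/ -r requirements.txt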

Bitbucket and GitHub are reliable enough for how often we deploy that we
aren't all that worried about downtime from those services. We could also pull
from a dev's machine should the situation be that dire.

We have looked into Docker, but that tool has a lot more growing to do before
"I" would feel comfortable putting it into production. I would rather ship a
packaged VM than Docker at this point; there are too many gotchas that we
don't have time to figure out.

~~~
erikb
You put the wheels into a git repo? That's the saddest thing I've heard today.
You know that if you add a file in commit A and remove it in commit B, each
and every clone still pulls in that file? It's okay for text files but it's
very much not okay for binaries and packages.

~~~
kevinschumacher

        git clone --depth=1 path/to/repo
    

when doing a clone for a deploy, since you don't need the history

edit: but yes, cloning as a developer will take a long time. But, if it really
gets out of hand, I can hand new devs a HDD with the repo on it, and they can
just pull recent changes. Not ideal, but pretty workable

------
viraptor
> curl "https://artifacts.nylas.net/sync-engine-3k48dls.deb" -o $temp ; dpkg -i $temp

It's really not hard to deploy a package repository. Either a "proper" one
with a tool like `reprepro`, or a stripped one which is basically just .deb
files in one directory. There's really no need for curl+dpkg. And a proper
repository gives you dependency handling for free.

~~~
mixmastamyk
Could you elaborate on the simple folder?

For example, I found the --instdir option to dpkg, but the package would still
have to be downloaded from the other host, unless of course the folder was
mounted somehow.

~~~
viraptor
Search for Debian's "trivial archive". It replaces the release/component
elements with an explicit path. It's deprecated now, but I believe it still
works.
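
A sketch of the stripped-down version (host and path hypothetical):

        # on the repo host: index a directory full of .debs
        cd /var/www/debs && dpkg-scanpackages . /dev/null | gzip -9 > Packages.gz

        # on each server, one line in /etc/apt/sources.list
        deb http://repo.internal/debs ./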

------
perlgeek
Note that the base path /usr/share/python (that dh-virtualenv ships with) is a
bad choice; see
[https://github.com/spotify/dh-virtualenv/issues/82](https://github.com/spotify/dh-virtualenv/issues/82)
for a discussion.

You can set a different base path in debian/rules with export
DH_VIRTUALENV_INSTALL_ROOT=/your/path/here
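
i.e. a minimal debian/rules (the install root here is hypothetical):

        #!/usr/bin/make -f
        export DH_VIRTUALENV_INSTALL_ROOT=/opt/venvs

        %:
                dh $@ --with python-virtualenv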

------
serkanh
"Distributing Docker images within a private network also requires a separate
service which we would need to configure, test, and maintain." What does this
mean? Setting up a private docker registry is trivial at best and having it
deploy on remote servers via chef, puppet; hell even fabric should do the job.

~~~
juliangregorian
It's not necessarily true either; it's not difficult to have your continuous
build process build images from the Dockerfile, run tests, swap green and
blue, etc...

------
erikb
No No No No! Or maybe?

Do people really do that? Git pull their own projects onto the production
servers? I spend a lot of time putting all my code into versioned wheels when
I deploy, even if I'm the only coder and the only user. Application and
development are, and should be, two different worlds.

------
objectified
I recently created vdist
([https://vdist.readthedocs.org/en/latest/](https://vdist.readthedocs.org/en/latest/)
-
[https://github.com/objectified/vdist](https://github.com/objectified/vdist))
for doing similar things - the exception being that it uses Docker to actually
build the OS package on. vdist uses FPM under the hood, and (currently) lets
you build both deb and rpm packages. It also packs up a complete virtualenv,
and installs the build-time OS dependencies on the Docker machine it builds
on, when needed. The runtime dependencies are made into dependencies of the
resulting package.

------
rfeather
I've had decent results using a combination of Bamboo, Maven, conda, and pip.
Granted, most of our ecosystem is Java. Tagging a Python package along as a
Maven artifact probably isn't the most natural thing to do otherwise.

------
StavrosK
Unfortunately, this method seems like it would only work for libraries, or
things that can easily be packaged as libraries. It wouldn't work that well
for a web application, for example, especially since a typical Django
application involves multiple services, different settings per machine, etc.

~~~
vacri
> _different settings per machine_

/etc/default/mycoolapp.conf

Debian packages have the concept of 'config' files. Files will be
automatically overwritten when installing a new version of package FOO, unless
they're marked as config files in the .deb manifest. This allows you to have a
set of sane defaults, but not lose customisations when upgrading.
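
With a hand-rolled package that's a one-line control file (a sketch):

        # DEBIAN/conffiles - one path per line; dpkg preserves these on upgrade
        /etc/default/mycoolapp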

~~~
acdha
Just wanted to +1 this. There are literally decades of convention built around
patterns where you ship a standard config file which merges in
system/user/instance-specific settings from a known location, command-line
argument, environment, etc. The Debian world in particular has led the
community for decades with the use of debconf to store values such as a
hostname or server role, which can automatically be re-applied when otherwise
unmodified files are updated upstream.

When I used this approach with a Django site years ago using RPM[1] we used
the pattern vacri mentioned or the reverse one where you have an Apache
virtualhost file which contains system-specific settings (hostname, SSL certs,
log file name, etc.) and simply included the generic settings shipped in the
RPM.

In either case the system-specific information can be set by hand (this was a
.gov server…), managed with your favorite deployment / config tool, etc. and
allows you to use the same signed, bit-for-bit identical package on testing,
staging, and production with complete assurance that the only differences were
intentional. This was really nice when you wanted to hand things off to a
different group rather than having the dev team include the sysadmins.

1. [http://chris.improbable.org/2009/10/16/deploying-django-sites/](http://chris.improbable.org/2009/10/16/deploying-django-sites/)

------
avilay
Here is the process I use for smallish services -

1. Create a python package using setup.py

2. Upload the resulting .tar.gz file to a central location

3. Download to prod nodes and run pip3 install <packagename>.tar.gz

Rolling back is pretty simple - pip3 uninstall the current version and re-
install the old version.
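
(Concretely, something like this - the package names and URLs are hypothetical:)

        # build and stash the artifact
        python setup.py sdist
        scp dist/myservice-1.2.0.tar.gz artifacts:/srv/packages/

        # on each prod node: fetch and install
        curl -o /tmp/myservice-1.2.0.tar.gz https://artifacts.internal/myservice-1.2.0.tar.gz
        pip3 install /tmp/myservice-1.2.0.tar.gz

        # rollback
        pip3 uninstall -y myservice
        pip3 install /tmp/myservice-1.1.9.tar.gz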

Any gotchas with this process?

~~~
webo
You have to do this every time there's a change in the codebase, which is not
easy. How do you stick this into CI without the git & pip issues talked about
in the post?

~~~
avilay
I have to do this every time I deploy, which is similar to having to create a
deb package every time Nylas deploys.

There are no git dependencies in the process I describe above.

The pip drawback discussed in the post is PyPI going down. In the process
described above there is no PyPI dependency. Storing the .tar.gz package in a
central location is similar to Nylas storing their deb package on S3.

~~~
mixmastamyk
Are you using a venv?

~~~
avilay
Nope.

~~~
mixmastamyk
If you did, it would probably strengthen the isolation of your modules from
conflicts, or from un-installation errors. Whether that's needed is up to you.

------
velocitypsycho
For installing using .deb files, how are db migrations handled? Our deployment
system handles running Django migrations by deploying to a new
folder/virtualenv, running the migrations, then switching over symlinks.

I vaguely remember .deb files having install scripts - is that what one would
use?

~~~
viraptor
Depends on your environment, number of hosts, etc. really. You probably don't
want to stick it into the same install script because:

- your app user doesn't need rights to modify the schema

- you need to handle concurrency of schema upgrades (what if two hosts
upgrade at the same time?)

- if your migration fails it may leave you in a weird installation state and
not restart the service

Ideal solution: deploy code which can cope with both the pre-migration and
post-migration schema -> upgrade the schema -> deploy code with new features.
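
As a sequence (a sketch; the package versions and the migrate command are hypothetical):

        # 1. ship code that copes with BOTH schemas
        apt-get install myapp=1.4.0

        # 2. run the migration once, from one box, as a role with DDL rights
        myapp-migrate --database prod

        # 3. ship the code that actually relies on the new schema
        apt-get install myapp=1.5.0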

~~~
jlees
For e.g. changing the format of a column, that's easy enough, but it's tricky
to create that intermediate state at the migration level for every migration.
One option is to deploy the migration code without restarting the running
services (or to deploy to a different box), roll back the code if the
migration failed, and restart the services to pick up the new code if it
succeeded. This still means not writing migrations that actively break the
running version, though - if you're using database reflection, everything will
go boom when the schema changes.

------
lifeisstillgood
Weirdly, I am re-starting an old project doing this venv/dpkg thing
([http://pyholodeck.mikadosoftware.com](http://pyholodeck.mikadosoftware.com)).
The fact that it's still a painful problem means I am not wasting my time :-)

------
webo
> Building with dh-virtualenv simply creates a debian package that includes a
> virtualenv, along with any dependencies listed in the requirements.txt file.

So how is this solving the first issue? If PyPI or the Git server is down,
this is exactly like the git & pip option.

~~~
stephenr
You need those things up to build the package, not to install it.

~~~
webo
Ah, I misunderstood the article. I just package my application source in a tar
during deployment. I thought that's what most people do.

~~~
stephenr
Using a native package gives you so much more power - you define that the
package relies on Python, and maybe a MySQL or Postgres client, Redis,
whatever it needs, and then just install the one package and let apt/dpkg
handle the dependencies.
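
e.g. in the .deb's control file (names hypothetical):

        Package: mycoolapp
        Version: 1.2.0
        Depends: python2.7, postgresql-client, redis-server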

I'm a big fan of using the config-package-dev package from DebAthena to build
config packages, which allow for about 99.9% of Debian server setup to be
defined in Debian packages.

------
compostor42
Great article. I had never heard of dh-virtualenv but will be looking into it.

How has your experience with Ansible been so far? I have dabbled with it but
haven't taken the plunge yet. Curious how it has been working out for you all.

~~~
emfree
Ansible works well for us, although we use it in a somewhat different way than
most folks. We previously wrote about our approach here, if you're curious:
[https://nylas.com/blog/graduating-past-playbooks](https://nylas.com/blog/graduating-past-playbooks)

~~~
compostor42
Thanks for the article. Well written and a very interesting concept.

------
BuckRogers
Seems this method wouldn't work as well if you have external clients you
deploy for. I'd use Docker instead of doing this, just to be in a better
position for an internal or external client deployment.

~~~
ytjohn
If you took this a step further and set up a Debian repo, then you could have
your clients use that repo.

I'm looking to do something pretty similar, but with RPMs. I found rpmvenv,
which seems to work in the same fashion.
[https://pypi.python.org/pypi/rpmvenv/0.3.1](https://pypi.python.org/pypi/rpmvenv/0.3.1)

~~~
stephenr
Exactly this.

If a company wants to use Docker, that's their choice, but I don't think it's
at all reasonable to _insist_ on or only support that environment as a
software vendor. If it works on Debian, give me a .deb, or even better an Apt
repo to use.

------
ah-
conda works pretty well.

~~~
TDL
Agreed (although biased since I used to work at Continuum.) I am wondering
what others think of conda?

~~~
4lejandrito
We have used Conda for our first Python deployment and the process has been
seamless. It provides the same sandboxing concept using virtualenvs, and also
uses prebuilt binaries for native dependencies so you don't have to build them
every time. The only drawback I would say is that we have to install Miniconda
on our production servers rather than just deploying a standalone package.
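
(Roughly - the env and package names are hypothetical:)

        # on the server, after a one-time Miniconda install
        conda create -n myapp --yes python=2.7 lxml requests
        source activate myapp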

------
jacques_chester
Here's how I deploy python code:

        cf push some-python-app

So far it's worked pretty well.

Works for Ruby, Java, Node, PHP and Go as well.

~~~
nobullet
That's interesting. I've tried to google the cf CLI, but wasn't able to find
good documentation. Is it possible to install the cf CLI on my own server? Or
is it a Cloud Foundry-only tool?

~~~
jacques_chester
The cf CLI tool interacts with a Cloud Foundry installation.

You'd use it for one in your own data centre, or Pivotal Web Services[0], or
BlueMix. You point it at an API and log in, then off you go.

If you need something more cut-down to play with, Lattice[1] is nifty, but
currently doesn't do buildpack magic.

[0] [https://run.pivotal.io/](https://run.pivotal.io/) [1]
[http://lattice.cf/](http://lattice.cf/)

------
daryltucker
I see your point about the complexity. Glad I haven't ever reached the point
where some good git hooks no longer work.

------
theseatoms
Does anyone have experience with PEX?

------
stefantalpalaru
> The state of the art seems to be "run git pull and pray"

No, the state of the art where I'm handling deployment is "run 'git push' to a
test repo where a post-update hook runs a series of tests and if those tests
pass it pushes to the production repo where a similar hook does any required
additional operation".

~~~
themartorana
Git deployments work great if you're packing an image (AMI, Docker) using,
say, Packer. But we only deploy "immutable" images, not code onto existing
servers.

------
hobarrera
> The state of the art seems to be "run git pull and pray"

Looks like these guys never heard of things like CI.

~~~
jumpkick
You have to read further.

 _This is the core of how we deploy code at Nylas. Our continuous integration
server (Jenkins) runs dh-virtualenv to build the package, and uses Python’s
wheel cache to avoid re-building dependencies._

