
Python Deployment Anti-Patterns - craigkerstiens
http://hynek.me/articles/python-deployment-anti-patterns/
======
djtriptych
As a Python dev who deploys a lot of software, I found this article to be
wonderfully helpful and informative, and a good reflection of current best
practices.

Summary of the deployment tools mentioned (a minimal sketch of the
pip/virtualenv workflow follows the list):

      - Manage remote daemons with supervisord
        http://supervisord.org/
    
      - Manage Python packages with pip (and use `pip freeze`)
        http://pypi.python.org/pypi/pip
        http://www.pip-installer.org/en/latest/requirements.html
    
      - Manage production environments with virtualenv
        http://www.virtualenv.org/
    
      - Manage Configuration with puppet and/or chef
        http://puppetlabs.com/
        http://www.opscode.com/chef/
    
      - Automate local and remote sys admin tasks with Fabric
        http://fabfile.org
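
      As promised above, a minimal sketch of how the pip + virtualenv pieces
      fit together (paths and package names are illustrative):

        virtualenv /srv/myapp/env                 # isolated environment
        /srv/myapp/env/bin/pip install Django     # installs into the env, not the system
        /srv/myapp/env/bin/pip freeze > requirements.txt    # pin exact versions
        /srv/myapp/env/bin/pip install -r requirements.txt  # reproduce the set elsewhere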
    

Other tips:

      - Don't restrict yourself to old Python versions to appease your tools / libs.
    
      - Strongly consider rolling your own DEB/RPMs for your Python application. 
    

The author also touted:

      - Celery for task management
        http://celeryproject.org/

      - Twisted for event-driven Python.
        http://twistedmatrix.com/trac/
    
      - nginx / gunicorn for your python web server stack
        http://www.nginx.com/
        http://gunicorn.org/
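
The usual shape of that last stack, roughly (module path and port are
illustrative): gunicorn serves the WSGI app on a local port, and nginx sits
in front as a reverse proxy, serving static files itself:

    gunicorn -w 4 -b 127.0.0.1:8000 myapp.wsgi:application
    # nginx listens on :80 and proxy_passes to 127.0.0.1:8000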

~~~
mdehaan
Chef requires Ruby programming. Puppet doesn't, but the core is obviously
Ruby, and you extend it using Ruby.

You may possibly like my new project:

<http://ansible.github.com>

The core is Python but you can write modules in any language.

~~~
pconf
Thanks for that! We gave up on chef when one of their version updates failed
to work with a prior version, both of which were the OS package defaults. Chef
silently failed, no error message, nothing in the docs, nothing even in the
source code. Had to do a fair bit of searching to find out why.

When open source projects like chef have nobody interested in even documenting
much less testing backwards incompatibilities we move them to the bottom of
our to-eval list.

This also illustrates a problem with the article's blind enthusiasm for the
latest revisions and libraries, i.e., it dismisses the headaches this causes
end-users, who often don't have the staff or budget to fix whatever breaks
during an upgrade. That said, we are at least talking about Python, which has
had better release QA and backwards compatibility than Perl, Ruby or, gasp,
PHP.

~~~
chromatic
_That said, we are at least talking about Python, which has had better
release QA and backwards compatibility than Perl...._

I'm curious as to your experience here. I've found that Perl has by far the
best backwards compatibility and release QA of the major dynamic languages.
What did you encounter?

~~~
pconf
We don't use as much Perl as we used to, but the last upgrade issue was with
amavisd-new (a SpamAssassin wrapper). SpamAssassin has Perl version issues
every so often as well. Net::DNS used to introduce new bugs about every 4th
revision but seems to have been stable for the past couple of years. GNUmp3d
and many audio libraries have backwards-compatibility issues with some
regularity that aren't tied to Perl revisions.

~~~
chromatic
That makes sense. XS components (compiled code which uses the Perl API) don't
have binary backwards compatibility between major Perl releases.

~~~
pconf
The audio library incompatibilities were API changes. Amavisd's issues are not
binary either but do seem mostly socket related.

------
smacktoward
I don't get the negativity on using your distro's packages, at least from the
staying-stable perspective. Any decent package manager should let you pin/hold
critical packages on a particular version, so if "the next Ubuntu ships with a
different SQLAlchemy by default" you just hold the SQLAlchemy package at the
version you want and then ignore it until you're ready to make that move.
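
On Debian/Ubuntu, for instance, holding a package is a one-liner (package
name assumed; check what your release actually calls it):

    sudo apt-mark hold python-sqlalchemy
    # or, the older equivalent:
    echo "python-sqlalchemy hold" | sudo dpkg --set-selections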

99% of the time when I hear people complaining about their distro's packages,
the complaints are coming from the opposite direction -- they want to run
something bleeding-edge and the distro doesn't have it yet. (This is the
standard beef Rubyists have with Debian, for instance -- that code that just
hit GitHub ten minutes ago isn't in Debian's repos yet.)

~~~
hynek
The worst thing about them is the fact that they are installed into global
site-packages that you shouldn’t use for any serious coding.

And yes, they are mostly outdated too.

~~~
viraptor
> "into global site-packages that you shouldn’t use for any serious coding"

Any specific reason for that? I find it quite good and have quite large
deployments using .debs only, with packages in the global location (tens of
packages produced locally: updated versions, dependencies unavailable
upstream, and the service itself). Any direct dependency is handled by
package pinning and no update goes into production untested, so the whole
"new SQLAlchemy suddenly appears" issue does not exist. As long as people
don't break API versioning in silly ways, what's the problem with this?

The only version-related issue I remember was when someone thought it would be
nice to install something through pip instead of via a package. (It went to
/usr/local.)

~~~
j_baker
Having a global python installation where packages are constantly installed,
uninstalled, and updated is the path to madness. If something goes wrong, what
can you do? You can't wipe out the system python. You _can_ wipe out a
virtualenv though.

~~~
viraptor
What do you mean by goes wrong? Either some package is installed or not. For
me, chef manages which ones are. If something is really FUBAR, then wiping is
exactly the path I'd take - or more accurately, take that server down for
analysis of how exactly it got into that state (so we won't do that again) and
bring a clean one up.

~~~
mbreese
Mainly it's about incompatibilities. What if you have two apps that require
different versions of a library? If you've installed it in site-packages, then
you have little recourse. By separating them out with virtualenv the two apps
will work just fine.
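
For instance (version numbers made up), each app gets its own environment
and its own copy of the library:

    virtualenv /srv/app1/env
    /srv/app1/env/bin/pip install 'SQLAlchemy==0.6.8'   # app1 needs the old API
    virtualenv /srv/app2/env
    /srv/app2/env/bin/pip install 'SQLAlchemy==0.7.6'   # app2 needs the newer one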

~~~
viraptor
Fortunately I'm in a one-service-one-server environment, so I may be biased
here ;)

~~~
pbiggar
Well, not really, because what if the Ubuntu packages rely on version X and
you need version Y?

~~~
davvid
You roll version Y yourself and install it into an alternate prefix. If the
server uses Debian, that means making a new deb and deploying it using the
standard tools. apt-get/yum/etc. are very solid deployment tools.

~~~
pbiggar
I wasn't saying it was impossible, but what you've described is already about
10 times harder than using virtualenv.

------
lifeisstillgood
Regarding virtualenv, I have come to the conclusion that Linux containers are
robust enough now (like FreeBSD jails, say, two or three years ago) that I
don't need to virtualise just Python - I can afford to treat the whole server
as a "virtualenv" - no need for that extra complexity, just install into
site-packages. No conflicts, because a whole instance is dedicated. Jails take
this to the limit - one virtual machine, one process, say BIND. A
vulnerability in BIND? The attacker takes over ... nothing.

~~~
j_baker
Sorry, but I'm not sure I understand your logic. Using virtualenv adds extra
complexity, but virtualizing the entire server doesn't? I mean, the only
complexity using virtualenv adds is having to run the virtualenv command once.
After that, you can still install to site-packages. You just have to install
to a different site-packages directory.

Besides that, it's worth pointing out that using a virtualenv is _not_ a
security precaution. It's a precaution to prevent mucking up the global python
installation for other packages that run on it. Using linux containers to
achieve this seems like overkill.

~~~
lifeisstillgood
A late reply, but I don't just want the Python environment virtualised - if
it's important enough that I should section off Python, then there is a good
chance I will want to consider the whole box as a single unit: Python,
firewall rules, database, whatever. I tend to think the unit of abstraction
should not be the Python process, but the server. This is a little easier to
grasp when you think of BSD jails, where essentially you can choose to run
only those processes that actually matter - it's less a virtualised OS than a
pick-and-mix OS.

Apologies for the late reply - I guess I am straightening it out in my head
more than telling anyone else.

------
Pewpewarrows
Regarding "Don't use ancient system Python versions" and "Use virtual
environments", you can knock out two birds with one stone by just using
pythonbrew. It also saves you the hassle of rolling your own deb/rpm if a
package doesn't happen to exist.

Also, Chef/Puppet aren't "alternatives" to something like Fabric. Use the
former for server provisioning, and use the latter for actually kicking off
the deployment process. Trying to shoe-horn the finer deployment steps (git
checkout, tarballing, symlinks, building the virtualenv, etc) into Chef was a
nightmare every time I tried. Those tasks are better suited for Fabric's
imperative design. Plus you can just run any Chef commands from Fabric itself,
or use something like pychef for finer grained control. It's a win/win.
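
To illustrate that split (names and paths are hypothetical): Chef/Puppet get
the box into shape, while the fabfile ends up scripting steps like these:

    git clone -b production git@example.com:myapp.git /srv/myapp/releases/20120515
    virtualenv /srv/myapp/releases/20120515/env
    /srv/myapp/releases/20120515/env/bin/pip install \
        -r /srv/myapp/releases/20120515/requirements.txt
    ln -sfn /srv/myapp/releases/20120515 /srv/myapp/current  # flip the symlink
    sudo supervisorctl restart myapp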

~~~
hynek
> Those tasks are better suited for Fabric's imperative design.

Yes they are, but IMHO not on the target servers.

I use Fabric to build DEBs that get deployed by Puppet. I prefer to have no
build tools on target servers, YMMV.

~~~
mrsteveman1
DEBs for your own project files, or debs containing built python modules that
the server needs to run those project files?

If the latter, how well does that mix in with virtualenv? or do you just avoid
it entirely?

~~~
hynek
I’m the dude that wrote the article, so like it says: own stuff + deps.

You can re-initialize a virtualenv to fix it by simply running virtualenv
again.

But pinky swear I’ll write the second article. ;)

~~~
bebop
Get on it! :) I really enjoyed the article and want to know more of your
magic.

------
ch0wn
> The trick is to build a debian package (but it can be done using RPMs just
> as well) with the application and the whole virtualenv inside.

I would love to read an article describing some best practices for doing that.
I tried it once and found it extremely difficult, reverting to a git checkout
+ virtualenv kind of deployment.

~~~
davvid
Check out git's "make rpm" target.

<https://github.com/gitster/git/blob/master/Makefile>

Hosting your own apt/yum repo is pretty simple.

Does anyone have an example of a similar "make deb" target they could share?

I've heard of git-dpm and git-buildpackage but haven't used them extensively
myself. They're the Debian git packaging tools.

<http://wiki.debian.org/PackagingWithGit>

~~~
hynek
Have a look at fpm: <https://github.com/jordansissel/fpm>

More details to come.
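
In the meantime, a rough sketch of the idea (package name and version made
up): build the virtualenv at its final install path, since virtualenvs aren't
relocatable, then let fpm wrap the whole directory:

    virtualenv /opt/myapp
    /opt/myapp/bin/pip install -r requirements.txt
    fpm -s dir -t deb -n myapp -v 1.0.0 /opt/myapp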

------
bickfordb
Long-time Python user here.

For a lot of my projects I write a shell script which builds all of the
application dependencies (including services) into a project directory and run
them all from there.

It takes a little bit of work to get going --- especially when building a new
service for the first time --- but I like that it side-steps language-specific
packaging tools (particularly the half-baked Python ones) and lets me pin an
application's dependencies and port to various environments (develop on Mac,
deploy on Unix) almost exactly. Integrating with Puppet/Chef is just a matter
of breaking up the shell script into pieces.
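
A compressed sketch of that kind of script (layout assumed), with the Python
part handled by a virtualenv inside the project directory:

    #!/bin/sh
    set -e
    ROOT="$(pwd)/build"
    mkdir -p "$ROOT/bin"
    virtualenv "$ROOT/env"                      # Python deps live here
    "$ROOT/env/bin/pip" install -r requirements.txt
    # non-Python services get built into the same tree, e.g.:
    # (cd vendor/redis && make && cp src/redis-server "$ROOT/bin/")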

------
postfuturist
We do this, but also add a couple layers of safety between us and PyPI:

1. Run your own secure, local PyPI clone with exact source versions of the
packages you use.

2. The packages for production are built into RPMs from the local PyPI.

PyPI is great for discovery, getting things running quickly, and testing new
versions, but you never want to rely on it, even for development.

~~~
jmlane
I'd really enjoy reading about how to set up a PyPI mirror like the one you
use in your development/deployment workflow. It seems like a really good idea,
considering I've had problems with PyPI at really inconvenient times in the
past.

~~~
postfuturist
I think we use this: <http://pypi.python.org/pypi/chishop/>

You just set it up on a local server, and upload packages the same way they
are uploaded to real PyPI:

    python setup.py register sdist upload

You can specify an alternate PyPI server with a ~/.pypirc. There are probably
other ways to do this. What's nice about this is you can upload your own
private packages or your own personal forks of packages. We do both.
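
For example (server name and credentials are made up), a ~/.pypirc entry plus
an upload that targets it by name:

    cat > ~/.pypirc <<'EOF'
    [distutils]
    index-servers =
        internal

    [internal]
    repository = http://pypi.internal.example.com/
    username = me
    password = secret
    EOF
    python setup.py register sdist upload -r internal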

------
serverascode
I don't think using virtualenv to jam everything into a big deb file is really
a best practice.

At the end of the day I do have to do a lot of that with application
deployment, but I try to only go as far as packaging libraries (i.e. gems,
jars, the Python equivalents) in the rpm/deb file.

RHEL 6 ships Python 2.6.6, btw.

What happens when there are vulns for your stack?

~~~
hynek
> What happens when there are vulns for your stack?

That’s a good point and the answer is: you have to monitor the dependencies
of your public services (and those aren’t that many).

But you have to do that anyway, because I can’t explain to our customers that
their data has been hacked because Ubuntu/Red Hat didn’t update Django (fast
enough).

~~~
serverascode
You make it sound like if you do one then you can do 100. Not the case.

~~~
hynek
My public services don’t have 100 dependencies, and that’s on purpose. Relying
on magic distribution fairies for all your libraries is IMHO a false sense of
security, YMMV.

How do you make sure that, whenever one of your dependencies gets updated,
your daemons get restarted?

And what do you do if you need a package that isn’t part of your distribution?

~~~
serverascode
Do you have your own linux distro?

------
IgorPartola
My two cents: I am a developer + ops person and deploy Python apps all the
time. Typically they are Django and Tornado services. On top we also have a
lot of daemons and a ton of library code.

I agree with the OP on most points but do not on a few. First DO use packages
that come with the OS. The OP says that you should not have the distro
maintainers dictating what you use. I say, use what is widely available. It
takes the headache out of a lot of your deployments. If you are looking for a
library that converts foo to bar, look in your distro's repos before going on
GitHub. Your sysadmin will thank you.

Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging
system has many shortcomings such as inability to uninstall recursively, poor
dependency management, lack of pre and post install scripts, etc.), but not
the problem. Instead, use distro-appropriate packages. Integrate your app into
the system. This way you will never end up running a daemon inside a screen
session, etc. You also get the ability to very nicely manage dependencies and
a clean separation between code and configuration.

Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and
well tested. If apache feels like a ball of mud, take the time to understand
how to cut it down to a minimum and configure it properly.
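
A cut-down vhost in that spirit really is small. A sketch (paths and names
are illustrative), using mod_wsgi's daemon mode:

    sudo tee /etc/apache2/sites-available/myapp <<'EOF' >/dev/null
    <VirtualHost *:80>
        ServerName myapp.example.com
        WSGIDaemonProcess myapp processes=2 threads=15
        WSGIProcessGroup myapp
        WSGIScriptAlias / /srv/myapp/myapp.wsgi
    </VirtualHost>
    EOF
    sudo a2ensite myapp && sudo service apache2 reload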

When it comes to infrastructure, making boring choices leads to predictable
performance and fewer headaches more often than not (at least in my
experience).

~~~
read_wharf
"First DO use packages that come with the OS."

I'd go middle ground, and start here, but consider a self-built package where
necessary. It depends in part on the focus of your distro.

virtualenv. What problem _does_ it solve? Different python
version/environments? Wouldn't that be better solved with another (virtual)
server? I understand if an extra $20/month is an issue, but otherwise ...

~~~
read_wharf
I have just been educated by reading more of this thread. I can see an obvious
need for _one_ virtualenv, so that you can separate your service and its needs
from the system python and its needs. Beyond that my inclination would be to
go more servers rather than more virtualenvs, but circumstances vary and my
experience is narrow.

~~~
hynek
Not every Python application is a big web app. We have systems that run
several smaller Python apps. Python is everywhere.

------
jordanb
This guy seems to be of the opinion that the software should be completely
isolated from the deployment operating system.

I know that's a common view and wrapping as much of the site as possible up in
a virtualenv certainly has a lot of advantages. But ultimately, your software
is going to have to interact with the OS, at some level, otherwise, why do you
even have an OS? So the question is: where do you draw the line? He seems to
draw it further down the stack than most people (no system python, for
instance) but he doesn't give his opinion on, for instance, using the system
postgresql.

Anyway, I personally would draw the line further up the stack than him, but
take things on a case-by-case basis, and I don't really consider it an "anti-
pattern."

With regards to fabric vs. puppet, I understand the advantages of puppet when
you have a complicated, heterogeneous deployment environment. But the majority
of projects I've worked on have the operations model of a set of identically-
configured application servers back-ended against a database server. For this
configuration, what does puppet give you? If the author's argument is that the
site may eventually outgrow that model, well, I can see puppet becoming
necessary, but why not cross that bridge when you get to it?

~~~
raverbashing
I think there's an exaggeration as well; today the trend seems to be "use
nothing from the distro".

Ok, sure, MongoDB still changes a lot between versions; in that case you
should use the latest version.

But stop there. Especially if you're paying for support (like RHEL)

There should be a good reason for you to compile Apache / MySQL / PostgreSQL /
Python. Otherwise, use the distro version. One (common) exception would be "we
need Python 2.7 but this ships only 2.6"

Most of the "just download and compile" crowd have no idea of the work that
goes on behind Linux distributions to ship these packages.

Yes, I'm sure you're going to read all security advisories and recompile all
your stack every X days instead of running apt/yum upgrade

~~~
hynek
Yes, there should be a good reason, and if you have one: do it. That’s what
the article says. ;) Please don’t push it in the wrong direction.

What I actually wrote is: because we’re a LAMP web hoster, we compile
MySQL+PHP+Apache ourselves. And because we’re a Python shop, we don’t let
Ubuntu/Red Hat dictate which Python version we use.

~~~
raverbashing
You're right, sorry about that ;) Guess I was thinking too much about the
"compile everything" people.

Great article, btw!

~~~
hynek
thanks!

------
cdavid
I am glad this is working for the OP, but pushing virtualenv and "self-
contained" apps as the one solution is a disservice to the community. There
are valid reasons to rely on your OS, assuming you have a homogeneous
deployment target (same OS, maybe different versions):

- lots of people argue for virtualenv because some versions may be
incompatible. The problem here is the lack of backward compatibility of
packages, and frankly, if you need to rely on packages which change API
willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends on
a different version of some library, you have bigger problems anyway.

- any sufficiently complex deployment will depend on things that are not
Python, at which point you need a solution that integrates multiple languages.
That is, you're re-creating what a distribution is all about.

- virtualenv relies on sources, so if some of your dependencies are in C,
every deploy means compilation.

- I still have no idea how security is handled when you put everything in
virtualenv.

See also http://bytes.com/topic/python/answers/841071-eggs-virtualenv-apt-best-practices

~~~
hynek
> pushing virtualenv and "self-contained" apps as the one solution is a
> disservice to the community.

Wow. :(

> There are valid reasons to rely on your OS, assuming you have a homogeneous
> deployment target (same OS, maybe different versions):

I’d love to hear them.

> lots of people argue for virtualenv because some versions may be
> incompatible. The problem here is the lack of backward compatibility of
> packages, and frankly, if you need to rely on packages which change API
> willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends
> on a different version of some library, you have bigger problems anyway.

Well, you said there are possibilities of problems but that they shouldn’t
matter in an ideal world. Maybe you’re okay taking the chances, but I’m not.
Every piece of code I deploy has been tested rigorously against a certain set
of versions, and that is the only combination of dependencies I’m willing to
consider “working”. Unit tests with different dependencies are just as
worthless as integration/functional tests against SQLite instead of the same
DB type as in production.

There’s even the possibility that your code works because of a bug, and when
that one gets fixed, your app goes south because of some weird side-effect.

> any sufficiently complex deployment will depend on things that are not
> Python, at which point you need a solution that integrates multiple
> languages. That is, you're re-creating what a distribution is all about.

I’m not sure if I understand what you mean, but yes if you want to use certain
features outside the Python ecosystem, you’ll have to buckle up and package
them yourself too. “We can’t do that, package XYZ is missing/too old.” isn’t
really a good excuse to not do something that is important/good for your
business. And that’s one of the main points of the article.

> virtualenv relies on sources, so if some of your dependencies are in C,
> every deploy means compilation.

That’s wrong if you go the way described: The virtualenv is packaged with the
code. Build tools don’t belong on production servers.

> I still have no idea how security is handled when you put everything in
> virtualenv

Just as everywhere else. If you think it’s okay to tell customers that their
data has been hacked because Debian was too slow to issue a fix, be my guest.
We can’t afford that. What happens on my servers security-wise is _my_
responsibility, and using ancient versions of Python libraries just to be
able to blame others for FUBARs is not a solution in my book.

~~~
__alexs
> I’m not sure if I understand what you mean, but yes if you want to use
> certain features outside the Python ecosystem, you’ll have to buckle up and
> package them yourself too.

I think his point is that pip can't do this, but learning to actually work
with your distro's packaging system properly results in a more powerful and
easier to redistribute project.

> Just as everywhere else. If you think it’s okay to tell customers that their
> data has been hacked because Debian was too slow to issue a fix, be my guest.

Your packaging methodology is not what gives you security there. It's that you
noticed a vulnerability and deployed a fix. The method of deployment is
irrelevant. Your point is that knowing you have a security issue and waiting
for upstream to get around to fixing it isn't always acceptable. That goes for
everything.

If you know how to roll .debs it's just as easy to patch and release a fixed
version of a library. (Or even install the pip version earlier in your
sys.path...)

~~~
hynek
> I think his point is that pip can't do this, but learning to actually work
> with your distro's packaging system properly results in a more powerful and
> easier to redistribute project.

Absolutely. And that’s why I package the whole virtualenv into the DEB along
with the project. I always have the assurance that the combination of packages
inside passes all my tests, no matter where I install it.

------
afhof
Using tmux is a Python daemon antipattern? And then "there's so much wrong
about this approach" that he doesn't bother explaining why? Isn't that why we
are reading the article: because we want to know why?

If the author is trying to convince people to change their habits, he is doing
a crummy job. He comes across as elitist and "if you don't do it my way you're
wrong".

~~~
eclark
He's talking about creating /etc/init.d scripts that do something like:

    screen python manage.py gunicorn &

That is an anti-pattern. Granted he didn't really explain it so well.
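
For contrast, the supervisord version of the same thing (program name and
paths made up): a config stanza instead of a screen session, so the process
is actually supervised and restarted on failure:

    sudo tee -a /etc/supervisord.conf <<'EOF' >/dev/null
    [program:myapp]
    command=/srv/myapp/env/bin/gunicorn myapp.wsgi:application
    directory=/srv/myapp
    autostart=true
    autorestart=true
    EOF
    sudo supervisorctl reread && sudo supervisorctl update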

~~~
hynek
Honestly I didn’t think it would be necessary. :-/

Your example is btw the advanced version. Many people just ssh on the host,
fire up screen and start their server inside.

------
ma2rten
_Don’t run your daemons in a tmux/screen_

Wow, I always thought of myself as an idiot for doing this. But that was for
some not-yet-launched thing. Who on earth does this for a production website?

------
cageface
The ability of Go to produce a single, self-contained executable is one of the
biggest advantages it has over the "scripting" languages. It makes deployment
so much simpler.

------
njharman
Virtualenv is a half solution and a hack. Use Vagrant and VMs. There's a whole
sea of libs and software that isn't "versioned" by pip/virtualenv.

supervisord is the wrong solution. It answers the wrong question (is the
process running?) and is worse than useless in that it has given false
positives. The right question is: is the process responding correctly? Use
monit or something else that actually does what's needed.

~~~
antihero
Why would you want to virtualise an entire system when your dependencies are
restricted to a bunch of Python modules?

~~~
mrweasel
Personally I would just go for virtualenv, but you'd be surprised what people
do with virtualization. I've mostly seen it in Windows shops, but it's by no
means limited to that. A common "anti-pattern" I've seen is to build a VMware
image, or set of images, that runs your "environment". Developers then spin up
copies and request changes that they need to be made to the "master copy".

When it's time to test and deploy, the code is deployed to a VMware image,
which is then sent to the testers. When everything checks out, the code is
once again deployed to a new copy of the image, which is then promoted to
production.

One argument we've heard is that it makes it easy to do a rollback of the
system, using VMware snapshots. It might be a sensible idea in some cases, I
just think it's a bit weird and adds some overhead.

The worst use case of this I've seen was a telco, where you needed to spin up
as many as 12 VMs, depending on what you and your team were working on. But
then again, that's the same company that read the Perforce guidelines on
recommended setup and still decided to just pile all the projects into one
repository.

------
minikomi
Just a thank you for writing a positive, easy to follow overview with links to
more in-depth information. I love when people boil down experience and serve
it without a side dish of attitude.

