Summary of the deployment tools mentioned:
- Manage remote daemons with supervisord
- Manage python packages with pip (and use `pip freeze`)
- Manage production environments with virtualenv
- Manage Configuration with puppet and/or chef
- Automate local and remote sys admin tasks with Fabric
- Don't restrict yourself to old Python versions to appease your tools / libs.
- Strongly consider rolling your own DEB/RPMs for your Python application.
- Celery for task management
- Twisted for event-based python.
- nginx / gunicorn for your python web server stack
You may possibly like my new project:
The core is Python but you can write modules in any language.
When open source projects like chef have nobody interested in even documenting, much less testing, backwards incompatibilities, we move them to the bottom of our to-eval list.
This also illustrates a problem with the article's blind enthusiasm for the latest revisions and libraries, i.e., it dismisses the headaches this causes end users, who often don't have the staff or budget to fix whatever breaks during an upgrade. That said, we are at least talking about Python, which has had better release QA and backwards compatibility than Perl, Ruby or, gasp, PHP.
I'm curious as to your experience here. I've found that Perl has by far the best backwards compatibility and release QA of the major dynamic languages. What did you encounter?
Do you have any easy tutorials on getting Nginx + uwsgi set up?
Here is the doc from the uwsgi site: http://projects.unbit.it/uwsgi/wiki/RunOnNginx
99% of the time when I hear people complaining about their distro's packages, the complaints are coming from the opposite direction -- they want to run something bleeding-edge and the distro doesn't have it yet. (This is the standard beef Rubyists have with Debian, for instance -- that code that just hit Github ten minutes ago isn't in Debian's repos yet.)
And yes, they are mostly outdated too.
Any specific reason for that? I find it quite good and have quite large deployments using .debs only, with packages in the global location (tens of packages produced locally: updated or otherwise unavailable dependencies, plus the service itself). Every direct dependency is handled by package pinning and no update goes into production untested, so the whole "new sqlalchemy suddenly appears" issue does not exist. As long as people don't break API versioning in silly ways, what's the problem with this?
The only version-related issue I remember was when someone thought it would be nice to install something through pip, instead of via package. (went to /usr/local)
Understand what went wrong instead of ignorantly hitting Ctrl-Alt-Delete.
And BTW, there’s more stuff that can happen than breaking APIs: new bugs that happen only on your system or even better: your code worked only _because_ of a bug. :)
Honestly: how many people installing software through virtualenv are registered to security mailing lists for each of their packages (and their dependencies down to things like simplejson)?
High profile projects like simplejson, Django or Pyramid and their deps won’t be missed and the really obscure ones will never make it into the repositories anyway.
The same can be achieved by subscribing to CVEs... but you have to remember to filter the ones you use. Of course that's not a huge difference, so if someone prefers the second way, there's nothing wrong with it ;)
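As a toy illustration of that CVE filtering, one could match an advisory feed against a pinned requirements list. Everything below (IDs, package names, the data shapes) is made up for the sketch, not any real tool's API:

```python
# Hypothetical sketch: filter a stream of security advisories down to the
# packages you actually deploy. Names and advisory IDs are invented.
def affected(advisories, requirements):
    """Yield only the advisories that mention a package we have pinned."""
    pinned = {line.split("==")[0].lower() for line in requirements}
    for adv in advisories:
        if adv["package"].lower() in pinned:
            yield adv

reqs = ["Django==1.3.1", "simplejson==2.3.2"]
advs = [
    {"id": "CVE-2011-0001", "package": "django"},   # in our stack -> keep
    {"id": "CVE-2011-0002", "package": "rails"},    # not ours -> drop
]
hits = list(affected(advs, reqs))
```

The point is only that the filter is driven by the same pinned list you deploy from, so there is nothing extra to remember.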
Ubuntu's latest release, 11.10 (yes, I know 12.04 is a couple of days away), ships Python 2.4. I don't remember which Ruby it is, but it's something like ~=1.8.7. Ruby is on 1.9.3 and the next version of Rails won't even support 1.8.7.
I'd be fine with a year or two old Python and Ruby....
$ uname -a
Linux apollo 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
$ python -V
The problem is that most programming languages (especially Python and Ruby) are ecosystems unto themselves and often move at a much faster pace than any stable distro (or LTS) could keep up with. That's why we have gems and pip.
Cf. Ubuntu LTS, where the default MongoDB is like 1.2 or something. This is why I switched to using 10gen's repo.
Besides, in the Ubuntu case anyway, LTS is supposed to be old. The whole point of LTS releases is to let slow-moving institutions/enterprises sit on ancient packages for 3-5 years without having to worry about backporting security updates. If you want recent versions of packages LTS is precisely the wrong place for you to be.
Also, even if you're not on an LTS, that doesn't mean you have the latest and greatest available. The Python community moves at its own pace, so there's still a chance that you'll be stuck with the just-before-latest-stable version.
Is that so bad? Does python really change that much from release to release?
Besides that, it's worth pointing out that using a virtualenv is not a security precaution. It's a precaution to prevent mucking up the global python installation for other packages that run on it. Using linux containers to achieve this seems like overkill.
Apologies for late reply - I guess I am straightening it out in my head more than telling anyone else.
Also, Chef/Puppet aren't "alternatives" to something like Fabric. Use the former for server provisioning, and use the latter for actually kicking off the deployment process. Trying to shoe-horn the finer deployment steps (git checkout, tarballing, symlinks, building the virtualenv, etc) into Chef was a nightmare every time I tried. Those tasks are better suited for Fabric's imperative design. Plus you can just run any Chef commands from Fabric itself, or use something like pychef for finer grained control. It's a win/win.
I'd love to see some proper detail in the article around why.
And not in the vein of "Chef/Puppet are better", but more along the line of "here's what can go wrong with Fabric".
With Fabric you tell what to _do_ and with Puppet/Chef you define what the result should _look_ like.
You define what a server should look like and it can make sure that's true for 1000 servers. Or 10000.
It’s not about Fabric vs. Puppet, but how to use both in the most efficient way.
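The imperative-vs-declarative distinction can be sketched in a few lines of plain Python. This is toy code, not the real Fabric or Puppet/Chef APIs:

```python
# Toy illustration only (not real Fabric/Puppet code): the imperative model
# replays steps, the declarative model converges on a described end state.
def deploy_steps(server, steps):
    # Fabric-style: "do this, then that" -- every step runs, every time
    for step in steps:
        server.append(step)

def converge(server, desired):
    # Puppet/Chef-style: "make it look like this" -- acts only on the diff,
    # which is why running it twice (or on 10,000 servers) is safe
    for item in desired:
        if item not in server:
            server.append(item)

server = []
converge(server, ["nginx", "app-1.2"])
converge(server, ["nginx", "app-1.2"])  # idempotent: second run changes nothing
```

That idempotence is what makes the declarative tools the right fit for provisioning, while the step-by-step model fits one-off deployment tasks.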
I prefer to deploy applications in the home directory of a dedicated user account with minimal privileges, and use fabric for installing updates and running application tasks.
OTOH, I'd prefer to be using puppet for creating the user accounts, managing the installation and configuration of PostgreSQL, putting app configuration in a safe location, and so on.
This comes down to my (maybe antiquated?) view of having multiple applications running on a single server.
Maybe I should add a “running apps as root” anti-pattern, but it's 2012 after all, everyone should know that, right? :-/
Yes they are, but IMHO not on the target servers.
I use Fabric to build DEBs that get deployed by Puppet. I prefer to have no build tools on target servers, YMMV.
For example, for our deployment, we rely on softlinks and uwsgi robust reload behavior to avoid losing requests. I've seen many devops who were using hg update/git update as a way to "deploy" (arg!), but I'm not sure about the behavior of deb/rpm.
And you’re right: replacing files of a running application can lead to all kinds of weirdness. I’d rather lose a few requests than risk that.
If the latter, how well does that mix in with virtualenv? or do you just avoid it entirely?
You can re-initialize a virtualenv to fix it by simply running virtualenv again.
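For example, with the stdlib venv module (a modern stand-in for the virtualenv tool being discussed), re-running creation with `clear` set rebuilds the environment in place:

```python
# Sketch using stdlib venv in place of the virtualenv tool: "running it
# again" wipes and rebuilds a broken environment at the same path.
import os
import tempfile
import venv

env_dir = os.path.join(tempfile.mkdtemp(), "env")
venv.create(env_dir)              # initial environment
venv.create(env_dir, clear=True)  # re-initialize: clear and rebuild in place
```

With the classic virtualenv tool the equivalent is simply running `virtualenv /path/to/env` a second time.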
But pinky swear I’ll write the second article. ;)
> [...] you can knock out two birds with one stone by just using pythonbrew
I would love to read an article describing some best practices for doing that. I tried it once and found it extremely difficult, reverting to a git checkout + virtualenv kind of deployment.
Hosting your own apt/yum repo is pretty simple.
Does anyone have an example of a similar "make deb" target they could share?
I've heard of git-dpm and git-buildpackage but haven't used them extensively myself. They're the debian git packaging tools.
More details to come.
For a lot of my projects I write a shell script which builds all of the application dependencies (including services) into a project directory and run them all from there.
It takes a little bit of work to get going --- especially when building a new service for the first time --- but I like that it side-steps language-specific packaging tools (particularly the half-baked Python ones) and lets me pin an application's dependencies and port to various environments (develop on Mac, deploy on Unix) almost exactly. Integrating with Puppet/Chef is just a matter of breaking up the shell script into pieces.
1. Run your own secure, local pypi clone with exact source versions of the packages you use.
2. The packages for production are built into RPMs from the local pypi.
PyPI is great for discovery, getting things running quickly, and testing new versions, but you never want to rely on it, even for development.
You just set it up on a local server, and upload packages the same way they are uploaded to real PyPI.
python setup.py register sdist upload
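For that upload to target the local server rather than the public index, a ~/.pypirc along these lines does it (the repository URL, username and password here are placeholders):

```ini
[distutils]
index-servers =
    internal

[internal]
repository = http://pypi.internal.example.com/
username = deploy
password = secret
```

Then upload with the named index: `python setup.py sdist upload -r internal`.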
At the end of the day, I do have to do a lot of that with application deployment, but I try to only go as far as packaging libraries (i.e. gems, jars, the Python equivalent) into the rpm/deb file.
RHEL 6 is python 2.6.6, btw.
What happens when there are vulns for your stack?
That’s a good point and the answer is: you have to monitor the dependencies of your public-facing services (and there aren’t that many).
But you have to do that anyway, because I can’t explain to our customers that their data has been hacked because Ubuntu/Red Hat didn’t update Django (fast enough).
How do you make sure that whenever one of your dependencies gets updated that your daemons get restarted?
And what do you do if you need a package that isn’t part of your distribution?
I agree with the OP on most points but do not on a few. First DO use packages that come with the OS. The OP says that you should not have the distro maintainers dictating what you use. I say, use what is widely available. It takes the headache out of a lot of your deployments. If you are looking for a library that converts foo to bar look in your distro's repos before going on GitHub. Your sysadmin will thank you.
Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages. Integrate your app into the system. This way you will never end up running a daemon inside a screen session, etc. You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.
Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested. If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.
When it comes to infrastructure, making boring choices leads to predictable performance and less headaches more often than not (at least in my experience).
I'd go middle ground, and start here, but consider a self-built package where necessary. It depends in part on the focus of your distro.
virtualenv. What problem does it solve? Different python version/environments? Wouldn't that be better solved with another (virtual) server? I understand if an extra $20/month is an issue, but otherwise ...
The sysadmin will have no part in the game if you use packaged virtualenvs. OTOH developer time is expensive. Do you really want to pay your developers to implement functionality that a more recent version of a package has already implemented? A good example is IPv6 support in Twisted. It’s getting implemented right now but I guess (and hope) that I’ll need it sooner than it lands in major distros (please no “lol ipv6” here, it’s just an example and the support is growing).
> Second, DO NOT use virtualenv. It fixes the symptoms (Python's packaging system has many shortcomings such as inability to uninstall recursively, poor dependency management, lack of pre and post install scripts, etc.), but not the problem. Instead, use distro-appropriate packages.
I’m not sure what your problem is, but mine is that I don’t want to develop against a moving target and need to run apps with contradicting dependencies on the same host.
That’s how I started using virtualenv years ago BTW, I’m not talking ivory tower here.
> Integrate your app into the system.
Yes. And I prefer supervisor for that. If you prefer rc.d scripts, be my guest.
> You also get the ability to very nicely manage dependencies and a clean separation between code and configuration.
I don’t get this one TBH.
> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.
And nginx + uwsgi/gunicorn aren’t?
> If apache feels like a ball of mud, take the time to understand how to cut it down to a minimum and configure it properly.
I know Apache pretty well, because we’re running thousands of customers on it. I’ve written modules for it and been in its guts more than once. And my impression is not a good one.
My point was to look around before you settle. If you think Apache is da best, knock yourself out. However, the stuff I see daily on IRC makes me think it isn’t exactly problem-free.
I’m not going to start a “vi vs. emacs”-style holy war here. That’s why I wrote “shop around before you settle” and not ”don’t ever use Apache”.
> When it comes to infrastructure, making boring choices leads to predictable performance and less headaches more often than not (at least in my experience).
Absolutely. nginx is way past the “new and hacky” state though.
I don't have a problem with virtualenv: it's a fine development tool, but it is not what I would use in production. If you want separate clean environments for each app, use KVM or Xen and give it a whole server.
supervisord is a fine solution. I just prefer that my processes look exactly like system processes. Upstart and rc.d are fantastic and do everything I need well.
As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software. This means that each package must be generic enough that it is deployable so long as its dependencies are satisfied. Thus your config files are mostly external to your packages. Then you can easily use Puppet or some such to deploy code. One other reason to use native distro packages: Puppet does not play well with pip/easy_install, etc.
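As a minimal sketch of that code/configuration split (the paths and keys below are invented for illustration): the package ships generic code that reads its settings from a file a tool like Puppet drops into place.

```python
# Illustrative only: the deployed package ships this code unchanged, while
# configuration management writes the environment-specific settings file.
import configparser
import os
import tempfile

def load_settings(path):
    """Read settings that live outside the package, e.g. /etc/myapp/app.ini."""
    cfg = configparser.ConfigParser()
    with open(path) as f:
        cfg.read_file(f)
    return dict(cfg["app"])

# Simulate the file Puppet would manage on a real host:
conf_path = os.path.join(tempfile.mkdtemp(), "app.ini")
with open(conf_path, "w") as f:
    f.write("[app]\ndb_host = db1.internal\n")

settings = load_settings(conf_path)
```

Because the environment-specific bits stay outside the .deb, the same package deploys anywhere its dependencies are satisfied.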
>> Lastly, DO use apache + mod_wsgi. It is fast, stable, widely supported and well tested.
> And nginx + uwsgi/gunicorn aren’t?
I didn't say that. I am simply stating that saying "don't use apache" is wrong. Do use it. You can also use nginx + uwsgi/gunicorn if you want to, but apache is by no means a bad choice. It's got 25 years of use and nobody that uses it seriously is complaining.
> Absolutely. nginx is way past the “new and hacky” state though.
It is not, and I never said so. I use nginx + apache, where nginx is a reverse proxy. In fact nginx is my top choice for front-end server setups. I am saying, don't deploy things directly out of GitHub. Go with slightly older, more tested stuff. It'll be a bigger payoff in the end.
Overall, I think we are saying the same thing, with slightly different tools we normally reach for. I am just trying to throw a different perspective out there and a different way to do things. Thanks for the detailed reply.
JFTR, there is a pretty good solution for this (because in the real world, you can’t always avoid that): a custom pypi server.
You take a git revision you know works, do a `python setup.py sdist`, and push it to a private repo. We have to do stuff like this for Sybase drivers, for example, which are open source but not on PyPI (or in any distribution). It saves so much pain.
> As for separation of code and configuration, I simply mean that in my case I use Debian packages to deploy all of our software.
Well we do the same but the Debian packages contain code _and_ the virtualenv.
> Overall, I think we are saying the same thing, with slightly different tools we normally reach for.
Mostly yes. Just wanted to add the pypi tip as it hasn’t been mentioned yet in this thread.
I know that's a common view and wrapping as much of the site as possible up in a virtualenv certainly has a lot of advantages. But ultimately, your software is going to have to interact with the OS, at some level, otherwise, why do you even have an OS? So the question is: where do you draw the line? He seems to draw it further down the stack than most people (no system python, for instance) but he doesn't give his opinion on, for instance, using the system postgresql.
Anyway, I personally would draw the line further up the stack than him, but take things on a case-by-case basis, and I don't really consider it an "anti-pattern."
With regards to fabric vs. puppet, I understand the advantages of puppet when you have a complicated, heterogeneous deployment environment. But the majority of projects I've worked on have the operations model of a set of identically-configured application servers back-ended against a database server. For this configuration, what does puppet give you? If the author's argument is that the site may eventually outgrow that model, well, I can see puppet becoming necessary, but why not cross that bridge when you get to it?
Ok, sure, MongoDB still changes a lot between versions, in this case you should use the latest version.
But stop there. Especially if you're paying for support (like RHEL)
There should be a good reason for you to compile Apache / MySQL / PostgreSQL / Python. Otherwise, use the distro version. One (common) exception would be "we need Python 2.7 but this ships only 2.6"
Most of the "just download and compile" crowd have no idea of the work that goes on behind Linux distributions to ship these packages.
Yes, I'm sure you're going to read all the security advisories and recompile your whole stack every X days instead of running apt/yum upgrade.
What I actually wrote is: because we’re a LAMP web hoster, we compile MySQL+PHP+Apache ourself. And because we’re a Python shop, we don’t let Ubuntu/Red Hat dictate which Python version we use.
Great article, btw!
With fabric and similar systems it's a bit harder. Basically you'd have to write your scripts exactly the same way you'd write a puppet/chef recipe: "make sure this is configured that way, make sure that is installed", etc. (or do migration steps) It's very different from fabric's "do this, do that" approach. Unless you run fabric on every single host after you make every change, some of your infrastructure will be lagging behind.
For example, what do you do when you create a new server, or do an upgrade that involves different dependencies? Run a fabric script that migrates from state X to Y? What happens to new machines then? How do you make sure they're in the same state?
I found chef a very good solution even if I have a single server to manage. No need to think about how it was configured before. Migrating to another provider? Just migrate the data, point the server at chef, done.
But that's beside the point. I don't think he's arguing that software should be completely isolated from the deployment operating system. That would be absurd, since, as you pointed out, software has to interact with the OS at some level, i.e., to manage system resources. Just because the OS ships with a bunch of packages with specific versions doesn't mean you have to use them. And what I think the OP is saying is that to make your applications portable and easily deployable, you shouldn't.
The question "why even have an OS?" was a rhetorical one, meant to illustrate the fact that the argument here is over where one draws the line, between the dependencies you maintain yourself and those you outsource to your OS vendor. Unless you're deploying on Linux From Scratch then you are outsourcing to the vendor at some level, so the only question is, where is that level?
That this guy maintains even his own web and database servers means he wants/needs to control things at a much deeper level than is typical. And I think that's beautiful if it works for him. My point of disagreement is his use of the word "antipattern" to suggest that any other choice is wrong.
They’re both for our customers as we sell traditional web hosting with a LAMP stack (and yes, we have to compile PHP ourselves too, so we can offer different flavors).
I never said you _have_ to compile everything yourself. In the case of Python I said you shouldn’t inflict the pain of programming in Python 2.4 on yourself just because you’re on RHEL/CentOS 5 or earlier.
System packages have the several problems I outlined too. Most importantly: you add a middleman to the release pipeline, there’s no virtualenv, and there are possible dependency conflicts: I have apps that depend on SQLAlchemy 0.6 and others on 0.7 – they couldn’t run on the same server.
But OTOH we use stock packages as far as possible, because it’s less work. E.g. there’s no reason to compile your own Python on Oneiric and later. Same goes for PostgreSQL, which is up to date.
It’s all about freeing yourself from constraints on the points that matter, not adopting a new religion.
I always want to compile my full stack; DB, network servers, language.
Perhaps a better term would be "decoupled"?
- lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility of packages, and frankly, if you need to rely on packages which change API willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.
- any sufficiently complex deployment will depend on things that are not Python, at which point you need a solution that integrates multiple languages. That is, you're re-creating what a distribution is all about.
- virtualenv relies on sources, so if some of your dependencies are in C, every deploy means compilation
- I still have no idea how security is handled when you put everything in virtualenv
See also http://bytes.com/topic/python/answers/841071-eggs-virtualenv...
> There are valid reasons to rely on your OS, assuming you have an homogenous deployment target (same OS, maybe different versions):
I’d love to hear them.
> lots of people argue for virtualenv because some versions may be incompatible. The problem here is the lack of backward compatibility of packages, and frankly, if you need to rely on packages which change API willy-nilly between e.g. 1.5 and 1.6, or if each of your services depends on a different version of some library, you have bigger problems anyway.
Well, you said there are possibilities of problems but that they shouldn’t matter in an ideal world. Maybe you’re okay taking those chances, but I’m not. All code I deploy has been tested rigorously against a certain set of versions and that is the only combination of dependencies I’m willing to consider “working”. Unit tests with different dependencies are just as worthless as integration/functional tests against SQLite instead of the same DB type as in production.
There’s even the possibility that your code works because of a bug, and when that one gets fixed, your app goes south because of some weird side effect.
> any sufficiently complex deployment will depend on things that are not python, at which point you need a solution that integrates multiple languages. That is, you re-creating what a distribution is all about.
I’m not sure if I understand what you mean, but yes if you want to use certain features outside the Python ecosystem, you’ll have to buckle up and package them yourself too. “We can’t do that, package XYZ is missing/too old.” isn’t really a good excuse to not do something that is important/good for your business. And that’s one of the main points of the article.
> virtualenv relies on sources, so if some of your dependences are in C, every deploy means compilation
That’s wrong if you go the way described: The virtualenv is packaged with the code. Build tools don’t belong on production servers.
> I still have no idea how security is handled when you put everything in virtualenv
Just as everywhere else. If you think it’s okay to tell customers that their data has been hacked because Debian was too slow to issue a fix, be my guest. We can’t afford that. What happens on my servers security-wise is _my_ responsibility, and using ancient versions of Python libraries just to be able to blame others for FUBARs is not a solution in my book.
I think his point is that pip can't do this, but learning to actually work with your distro's packaging system properly results in a more powerful and easier to redistribute project.
> Just as everywhere else. If you think it’s okay to tell customers that their data has been hacked because Debian was too slow to issue a fix, be my guest.
Your packaging methodology is not what gives you security there. It's that you noticed a vulnerability and deployed a fix. The method of deployment is irrelevant. Your point is that knowing you have a security issue and waiting for upstream to get around to fixing it isn't always acceptable. That goes for everything.
If you know how to roll .debs it's just as easy to patch and release a fixed version of a library. (Or even install the pip version earlier in your sys.path...)
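The sys.path trick can be sketched like this (the module name and version string are made up): a patched copy placed earlier on the path shadows whatever else is installed.

```python
# Illustration of path precedence: "somelib" is a hypothetical module
# standing in for a patched copy of a vulnerable library.
import os
import sys
import tempfile

patched_dir = tempfile.mkdtemp()
with open(os.path.join(patched_dir, "somelib.py"), "w") as f:
    f.write("VERSION = '1.0.1-patched'\n")

sys.path.insert(0, patched_dir)  # earlier entries win the import lookup
import somelib                   # resolves to the patched copy
```

The same mechanism is what makes a pip-installed version "earlier in your sys.path" win over a distro package.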
Absolutely. And that’s why I package the whole virtualenv into the DEB along with the project. I always have the assurance that the combination of packages inside passes all my tests, no matter where I install it.
Windows still uses shared libraries but at least they invented the GAC so applications can specify exactly which version of a library they work with and the installer will install that version if it's not present.
If the author is trying to convince people to change their habits, he is doing a crummy job. He comes across as elitist and "if you don't do it my way you're wrong".
screen python manage.py gunicorn &
That is an anti-pattern. Granted he didn't really explain it so well.
Your example is btw the advanced version. Many people just ssh on the host, fire up screen and start their server inside.
Update: I added some more context. I would never spit on my beloved tmux. :)
Wow, I always thought of myself as an idiot for doing this. But that is for some not-yet-launched thing. Who on earth does this for a production website?
supervisord is the wrong solution. It answers the wrong question (is the process running). It's worse than useless in that it has given false positives. The right question is (is the process responding correctly). Use monit or something else that actually does what's needed.
supervisor is not an alternative to monitoring, and I never claimed that. But that’s a whole different story.
When it's time to test and deploy, the code is deployed to a vmware image, which is then sent to the testers. When everything checks out, the code is once again deploy to a new copy of the image, which is then promoted to production.
One argument we've seen is that it makes it easy to do a rollback of the system, using VMware snapshots. It might be a sensible idea in some cases, I just think it's a bit weird and adds some overhead.
The worst case of this I've seen was a telco, where you needed to spin up as many as 12 VMs, depending on what you and your team were working on. But then again, that's the same company that read the Perforce guidelines on recommended setup and still decided to just pile all the projects into one repository.