* Downloading any new dependencies to a cached folder on the server (this was before wheels had really taken off)
* Running pip install -r requirements.txt from that cached folder into a new virtual environment for that deployment (`/opt/company/app-name/YYYY-MM-DD-HH-MM-SS`)
* Switching a symlink (`/some/path/app-name`) to point at the latest virtual env.
* Running a graceful restart of Apache.
Fast, zero-downtime deployments, multiple times a day; if anything failed, the build simply didn't go out and I'd try again after fixing the issue. Rollbacks were also very easy (just switch the symlink back and restart Apache again).
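A sketch of those steps in shell. A mktemp sandbox stands in for the real /opt and symlink paths so the sketch is self-contained, and the pip and Apache steps are shown as comments:

```shell
set -e
ROOT=$(mktemp -d)                       # stands in for /opt/company and /some/path
STAMP=$(date +%Y-%m-%d-%H-%M-%S)
NEW="$ROOT/app-name/$STAMP"

mkdir -p "$NEW"                         # real deploy: virtualenv "$NEW" &&
                                        #   "$NEW/bin/pip" install -r requirements.txt \
                                        #     --no-index --find-links /path/to/cache
ln -sfn "$NEW" "$ROOT/current"          # -sfn swaps the symlink in one step
# apachectl graceful                    # reload workers without dropping requests

# Rollback is the same move, pointed at the previous timestamped env:
PREV="$ROOT/app-name/previous-build"
mkdir -p "$PREV"
ln -sfn "$PREV" "$ROOT/current"
```

The whole trick is that `ln -sfn` replaces the link without a delete-then-create window, which is what makes both the deploy and the rollback safe to do under live traffic.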
These days the things I'd definitely change would be:
* Use a local PyPI rather than a per-server cache.
* Use wheels wherever possible to avoid re-compilation on the servers.
Things I would consider:
* Packaging (deb / fat-package / docker) to avoid doing extra work per machine, plus easy promotions from one environment to the next.
Even at the time I thought Docker would be a great solution to the problem, but the organization was vehemently against using modern tech to manage servers and deployments, so I ended up writing that tool in bash instead. Good times.
We're moving to the Docker approach, which is really nice, but it does change the shape of the whole deploy pipeline, so it's going to take some time.
> Use a local PyPI rather than a per-server cache
I still prefer a per-server cache. A local PyPI is another piece of infrastructure you need to keep alive. You don't have to worry about the uptime of an rsync playbook.
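As a sketch (host names and paths are placeholders), the per-server cache can be as little as an rsync push plus an offline pip install:

```shell
# Push the wheel/sdist cache to each app server, then install strictly from it.
# No PyPI and no extra service involved.
rsync -az --delete wheelhouse/ deploy@app1.internal:/var/cache/pip-packages/

ssh deploy@app1.internal \
    pip install --no-index \
        --find-links /var/cache/pip-packages \
        -r /srv/app/requirements.txt
```

`--no-index` guarantees nothing is ever fetched from PyPI at deploy time, so the only moving part is the rsync itself.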
Their first reason (not wanting to upgrade a kernel) is terrible, considering that they'll eventually be upgrading it anyway.
Their second is slightly better, but it's really not that hard. There are plenty of hosted services for storing Docker images, not to mention that "there's a Dockerfile for that."
Their final reason (not wanting to learn and convert to a new infrastructure paradigm) is the most legitimate, but ultimately misguided. Moving to Docker doesn't have to be an all-or-nothing affair. You don't have to do random shuffling of containers and automated shipping of new images -- there are certainly benefits to going wholesale Docker, but it's by no means required. At the simplest level, you can just treat the Docker container as an app and run it as you normally would, with all your normal systems. (i.e., replace "python example.py" with "docker run example")
If they're running Ubuntu 12.04 LTS, they can keep the 3.2 kernel until late 2017. That's two more years. And they wrote "did not", so it was likely the situation months ago, not yesterday.
> (not wanting to learn and convert to a new infrastructure paradigm) is the most legitimate, but ultimately misguided
It depends on the amount of stuff they deploy. If they handle everything using Ansible (and from the list it looks like they do), then it's months of work to migrate to something else. They may need the right users / logging / secret management in the app itself, not outside of it.
It's not. It would be months of work if they wanted to convert all their Ansible code to Docker, but that's by no means required.
Docker and Ansible can easily coexist peacefully.
(it always means some extra work for security updates though - now you're updating both the host and images)
sudo apt-get install linux-generic-lts-quantal
Edit: found https://py2deb.readthedocs.org/en/latest/comparisons.html
That said, for Python files and simple packages it works well enough!
One of the significant tradeoffs to this approach is you lose the carefully-crafted tree-of-dependencies that the distros favor, so it makes the package pretty much automatically unacceptable to package maintainers.
However, being able to have install instructions that amount to "yum/apt-get install <package>" is pretty great.
I am hoping for an app/container convergence at some point, but we might need to drop the fine-grained dependency dream and have them be more self-contained, like Mac OS X apps.
We also incorporate a set of meta packages which means we can have multiple codebase versions installed and switch the "active" one by installing the right version of the meta-package. There's also meta-packages for each service running off the same codebase, which deals with starting/stopping/etc.
Basically, what it comes down to is a build script that builds a deb with the virtualenv of your project, versioned properly (build number, git tag), along with any other files that need to be installed (think init scripts and an "about" file describing the build). It also should do things like create users for daemons. We also use it to enforce a consistent package structure.
We use devpi to host our python libraries (as opposed to applications), reprepro to host our deb packages, standard python tools to build the virtualenv and fpm to package it all up into a deb.
All in all, the bash build script is 177 LoC and is driven by a standard config script we include in every application's repository, defining variables and optionally overriding build steps (if you've used Portage...).
The most important thing is that you have a standard way to create Python libraries and applications, to reduce friction on starting new projects and getting them into production quickly.
It's more complicated than the solution proposed by Nylas, but ultimately it gives you full control of the whole environment and ensures that you won't hit ANY dependency issue when shipping your code to weird systems.
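A hedged sketch of the deb-building step with fpm (the package name, paths, and init script here are placeholders, not the actual layout described above):

```shell
# Wrap a pre-built virtualenv (plus an Upstart job) into a versioned .deb.
# fpm maps SOURCE=DEST pairs into the package filesystem.
fpm -s dir -t deb \
    --name myapp \
    --version "1.2.${BUILD_NUMBER}" \
    --after-install postinst.sh \
    /opt/myapp/venv=/opt/myapp/venv \
    myapp.conf=/etc/init/myapp.conf
```

The `--after-install` hook is where things like creating daemon users would go.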
Also, are there seriously places that don't run their own PyPI mirrors? Places that have people who understand how to integrate platform-specific packages but can't be bothered to deploy one of the several PyPI-in-a-box systems or pay for a hosted PyPI?
Yes. I've seen them, and they've been huge shops.
Only in cases where you don't have wheels depending on external libraries. If you do, you should still package with the right dependency constraints. Otherwise you can install a wheel which does not work (because of missing .so)
Running devpi in another environment and syncing the resulting repository should allow me to achieve what I want.
-f, --find-links <url>    If a URL or path to an html file, then parse for links
                          to archives. If a local path or file:// URL that's a
                          directory, then look for archives in the directory
                          listing.
It works. It's Django-based, and you can set up S3-backed storage. It also has a docker-compose script.
Deploys are harder if you have a large codebase to ship. rsync works really well in those cases. It requires a bit of extra infrastructure, but is super fast.
I come from the same island as you, trust me. But the more you learn about this, the more you see how complex it is. You can't even say that one solution is better than the other (like apt vs yum). Each and every one of them has its pros and cons. And more often than not, architectural decisions make it impossible to get both solutions into the same system working together.
rsync is not deploying. It's syncing files. But even if you have a 1:1 copy of your development machine on a server, it still might not work, because on that server package xyz is still at version 1.4.3b and not 1.4.3c. Deployment is getting it there AND getting it to work nicely and maintainably with the other things that run on that computer/VM.
I've been bundling libs and software into a single virtualenv-like package that I distribute with rsync for a long time - it solves loads of problems, is easy to bootstrap a new system with, and incremental updates are super fast. Combine that with rsync distribution of your source and a good tool for automating all of it (Ansible, Salt, Chef, Puppet, et al.) and you have a pretty fool-proof deployment system.
And a rollback is just a git revert and another push away -- no need to keep build artifacts lying around if you believe your build is deterministic.
- how do you know which version you're running right now?
- how do you deploy to two environments where different deps are needed?
- how do you tell when your included dependencies need security patches?
#1 is git (dump and log the git head on a deploy)
#2 don't do that - keep a single consistent environment
#3 use the system OpenSSL -- monitor other software components for security updates -- you need to do this anyway in any of these systems.
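For #1 concretely, something like the following (sandboxed with a throwaway repo here; in a real deploy you'd run this from the checkout being shipped):

```shell
set -e
DEPLOY_DIR=$(mktemp -d)                 # stand-in for the deployed tree
cd "$DEPLOY_DIR"
git init -q .
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "deploy"

# Dump the git head at deploy time so "what's live?" is one cat away.
git rev-parse HEAD > VERSION
```

Log the same hash in your deploy tooling and you can always correlate a running host with a commit.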
I wish everyone to have easy deployments where environments, OS versions and everything else are always consistent. :)
> #3 monitor other software components for security updates -- you need to do this anyway in any of these systems.
Sure. But having multiple virtualenvs means you need to monitor all of them on all deployed hosts. Having everything packaged separately means you can do audits much more easily and without location-specific checks.
For server-side apps like this, that usually means a Deb or an RPM. These systems handle upgrades, rollbacks, dependencies, etc.
Just because some people decide that writing an RPM specfile or running dh_make is too hard to work out, doesn't mean that the solution doesn't exist.
For someone trying out building python deployment packages using deb, rpm, etc. I really recommend Docker.
forget virtualenv; forget package dependencies on conflicting versions of libxml; forget coworkers that have 3 different conflicting versions of requests scattered through various services, and goddamnit I just want to run a dev build; forget coworkers that scribble droppings all over the filesystem, and assume certain services will never coexist on the same box
just use docker. It's going to go like this:
step 1: docker
step 2: happy
"If we hit the bullseye, the rest of the dominos will fall like a house of cards... checkmate!" -- Zapp Brannigan
> forget coworkers that scribble droppings all over the filesystem, and assume certain services will never coexist
I think this tends to be less of a problem than the desire to have a build artifact that can be reliably deployed to multiple servers, rather than having the "build" process and "deploy" process hopelessly intertwined with each other.
In an ideal world you build a system from the ground up, but rarely is that ever possible and the approach taken to iterate is far more valuable than your requested 'serious' recommendation.
Indeed, we actually use Docker to build packages. Blog post coming soon, maybe.
In the meantime you can get a taste with Lattice.
one of which was just silly (kernel version -- are you living on that point release forever?)
one of which was valid (necessity to maintain method for distributing docker images), but probably dumb: you only get so many innovation points per company, and innovating on a problem docker just solves means you are supporting your in-house solution ad infinitum
and one of which definitely sounds painful (docker vs extant ansible playbooks)
This being said, I'm using docker for packaging/deployment of a nodejs app on those machines, and I hate it. I'm about to strip it out and go for .debs. Docker brings a lot of baggage with it, and requires major restructuring of some infrastructure parts. As they say in the article, the changes required to bring docker in just to do packaging are way too heavy. And Docker also sucks for rollbacks, to be honest - their tagging system is downright terrible.
My advice is not to use Docker in a production environment unless you can articulate the specific pain points it will solve for you.
It is easy to pick on the silliness of the kernel reason, but Docker is moving fast right now. They are still getting the basic building blocks in place, and the Docker of two years from now will look nothing like today's.
We use Docker quite a bit today, but it's immature and it shows. With Compose I feel the basic functionality is finally in place, but it needs time to mature.
So I think it's quite wise to wait. You don't need to chase every new technology. If you have a product to ship, focus on that instead and use whatever tools are proven to work.
sudo apt-get install linux-generic-lts-quantal
On the app end we just build a new virtualenv, and launch. If something fails, we switch back to the old virtualenv. This is managed by a simple fabric script.
Bitbucket and GitHub are reliable enough for how often we deploy that we aren't all that worried about downtime from those services. We could also pull from a dev's machine should the situation be that dire.
We have looked into Docker, but that tool has a lot more growing to do before "I" would feel comfortable putting it into production. I would rather ship a packaged VM than Docker at this point; there are too many gotchas that we don't have time to figure out.
git clone --depth=1 path/to/repo
Edit: but yes, cloning as a developer will take a long time. But if it really gets out of hand, I can hand new devs an HDD with the repo on it, and they can just pull recent changes. Not ideal, but pretty workable.
see here: http://stackoverflow.com/a/29936384/138469
It's really not hard to deploy a package repository. Either a "proper" one with a tool like `reprepro`, or a stripped one which is basically just .deb files in one directory. There's really no need for curl+dpkg. And a proper repository gives you dependency handling for free.
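For the "stripped" flavor, a minimal sketch (paths hypothetical; dpkg-scanpackages ships in the dpkg-dev package):

```shell
# Turn a directory of .debs into something apt can consume.
cd /srv/repo
dpkg-scanpackages --multiversion . /dev/null | gzip -9c > Packages.gz

# Each client then needs one sources.list line pointing at it:
#   deb [trusted=yes] http://repo.internal/ ./
```

That's the whole repository; `apt-get install` against it resolves dependencies exactly as it would against an official mirror.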
For example, I found the --instdir option to dpkg, but the package would still have to be downloaded from the other host, unless of course the folder was mounted somehow.
You can set a different base path in debian/rules with export DH_VIRTUALENV_INSTALL_ROOT=/your/path/here
Do people really do that? Git pull their own projects onto the production servers? I spent a lot of time putting all my code into versioned wheels for deployment, even though I'm the only coder and the only user. Application and development are, and should be, two different worlds.
Debian packages have the concept of 'config' files. Files will be automatically overwritten when installing a new version of package FOO, unless they're marked as config files in the .deb manifest. This allows you to have a set of sane defaults, but not to lose customisations when upgrading.
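A sketch of how that's declared (package name and paths are hypothetical): dpkg treats any path listed in DEBIAN/conffiles as a config file.

```shell
# Build a minimal package tree in a scratch dir; the conffiles entry is what
# protects /etc/myapp/settings.conf from being overwritten on upgrade.
set -e
cd "$(mktemp -d)"
mkdir -p pkgroot/DEBIAN pkgroot/etc/myapp
printf '/etc/myapp/settings.conf\n' > pkgroot/DEBIAN/conffiles
printf 'DEBUG = False\n' > pkgroot/etc/myapp/settings.conf
# dpkg-deb --build pkgroot myapp_1.0_all.deb   # also needs a DEBIAN/control file
```

On upgrade, dpkg will prompt (or keep the local version, depending on options) rather than silently clobbering the admin's edits.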
When I used this approach with a Django site years ago using RPM we used the pattern vacri mentioned or the reverse one where you have an Apache virtualhost file which contains system-specific settings (hostname, SSL certs, log file name, etc.) and simply included the generic settings shipped in the RPM.
In either case the system-specific information can be set by hand (this was a .gov server…), managed with your favorite deployment / config tool, etc. and allows you to use the same signed, bit-for-bit identical package on testing, staging, and production with complete assurance that the only differences were intentional. This was really nice when you wanted to hand things off to a different group rather than having the dev team include the sysadmins.
1. Create a python package using setup.py
2. Upload the resulting .tar.gz file to a central location
3. Download to prod nodes and run
pip3 install <packagename>.tar.gz
Rolling back is pretty simple - pip3 uninstall the current version and re-install the old version.
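Spelled out with hypothetical names (myapp, an S3 bucket as the central location), the process might look like:

```shell
python3 setup.py sdist                                 # 1. build dist/myapp-1.2.0.tar.gz
aws s3 cp dist/myapp-1.2.0.tar.gz s3://releases/       # 2. push to a central location

# 3. on each prod node:
aws s3 cp s3://releases/myapp-1.2.0.tar.gz .
pip3 install myapp-1.2.0.tar.gz

# rollback: remove the current version and reinstall the previous build
pip3 uninstall -y myapp
pip3 install myapp-1.1.0.tar.gz
```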
Any gotchas with this process?
So at some point, as you know, you'll need to move on.
There are no git dependencies in the process I describe above.
The pip drawback discussed in the post is PyPI going down. In the process described above there is no PyPI dependency. Storing the .tar.gz package in a central location is similar to Nylas storing their deb package on S3.
I vaguely remember .deb files having install scripts, is that what one would use?
- your app user doesn't need rights to modify the schema
- you need to handle concurrency of schema upgrades (what if two hosts upgrade at the same time?)
- if your migration fails, it may leave you in a weird installation state and not restart the service
Ideal solution: deploy code which can cope with both pre-migration and post-migration schema -> upgrade schema -> deploy code with new features.
If your migration system is smart enough (or you can easily check the migration status from a shell script) you could also do this in a multi-app-server environment too.
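That ordering can be sketched as three separate deploy steps (the script names are placeholders for whatever your tooling provides):

```shell
# Expand/contract: never ship code that requires a schema the DB doesn't have yet.
./deploy.sh myapp-2.0-compat   # 1. code that tolerates both old and new schema
./migrate.sh --once            # 2. run the migration exactly once, from one host
./deploy.sh myapp-2.0          # 3. code that relies on the new schema
```

Running the migration from a single host (rather than from every app server's install script) is what sidesteps the concurrency problem above.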
So how is this solving the first issue? If PyPI or the Git server is down, this is exactly like the git & pip option.
I'm a big fan of using the config-package-dev package from DebAthena to build config packages, which allow for about 99.9% of Debian server setup to be defined in Debian packages.
How has your experience with Ansible been so far? I have dabbled with it but haven't taken the plunge yet. Curious how it has been working out for you all.
I'm looking to do something pretty similar, but with RPMs. I found rpmvenv, which seems to work in the same fashion. https://pypi.python.org/pypi/rpmvenv/0.3.1
If a company wants to use Docker, that's their choice, but I don't think it's at all reasonable to insist on or only support that environment as a software vendor. If it works on Debian, give me a .deb or, even better, an Apt repo to use.
With that said, Conda is not a perfect solution. One thing that can be frustrating is that a package can include compiled code (shared objects/dylibs) that may be incompatible with your system. Unfortunately, while you can indicate dependencies on other conda packages, python versions, etc there isn't currently a convenient way to indicate things like GLIBC dependencies.
cf push some-python-app
Works for Ruby, Java, Node, PHP and Go as well.
You'd use it for one in your own data centre, or Pivotal Web Services, or BlueMix. You point it at an API and login, then off you go.
If you need something more cut-down to play with, Lattice is nifty, but currently doesn't do buildpack magic.
No, the state of the art where I'm handling deployment is "run 'git push' to a test repo where a post-update hook runs a series of tests and if those tests pass it pushes to the production repo where a similar hook does any required additional operation".
Looks like these guys never heard of things like CI.
This is the core of how we deploy code at Nylas. Our continuous integration server (Jenkins) runs dh-virtualenv to build the package, and uses Python’s wheel cache to avoid re-building dependencies.