
Race-condition-free deployment with the "symlink replacement" trick - Deejahll
https://gist.github.com/3807742#file_symlink_replacement.md
======
viraptor
Ok, one problem solved. Now what's left: schema changes, making sure ongoing
process flows can automatically migrate from the previous version to the new
one, that resources referenced from the previous version are still valid, and
that nothing tries to read code files via the link (it can change mid-
request).

I got the strange feeling from that article that changing the code files was
the hardest thing about the upgrade.

~~~
ibotty
you are certainly right. db schema updates are the hard part.

for db schema updates: have a look at sqitch by postgres' david wheeler. it
should also support mysql (or will in the future).

------
Firehed
This is a good system at face value, but it can present other problems. Any
code that has a stat cache (I know PHP does, and I'd be surprised if other
common languages don't) suddenly doesn't realize that your paths resolve to a
different place. If /var/www, your "base" directory, is symlinked to
/var/www.a, then www.a is what gets cached, and when you swap the symlink to
www.b to deploy the next version, anything relying on that stat cache
(include directives, autoloaders, etc.) suddenly starts pulling in the wrong
version of the file.

Solving this in a way that doesn't require restarting any services and doesn't
introduce any more race conditions is nontrivial, although it tends to work
pretty reliably so long as you have a front controller and it very rarely
changes. Basically it comes down to a symlink change detector: after your
deploy script runs, you hit a special (internal) URL which kills the stat
cache. If people are interested I can post a more concrete example.

~~~
hkolk
I work for a pretty big PHP shop. Our opcode cache has some issues with this,
but our stat cache is fine. Inside the code we resolve the symlink, so
internally we point at files like ~/releases/<id>/codebase/HTML/index.php
instead of ~/codebase/HTML/index.php.

Because of the opcode issues we still do a rolling restart of the Apaches,
but those issues rarely surface.

We use `rename` instead of `mv`. Don't know why :)

edit: this also handles the problem of suddenly referencing different files
mid-request, because you never go through the link.

------
troels
I wonder what problem this is really solving. I mean, the delete+create in a
script happen in quick succession, so the moment of inconsistency is really
very short. If this is an actual problem for you, chances are your setup is
rather large and you have multiple nodes behind a load balancer. In that
case, you have bigger issues, such as making sure the individual nodes are
updated at the same time. Usually that is solved by taking them out of
rotation while updating, in which case the atomic symlink switch becomes
moot.
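For what it's worth, the window only exists with the naive delete+create; the
trick under discussion stages the new link under a temporary name and then
renames it into place. A minimal sketch (directory names illustrative, GNU
coreutils assumed for mv -T):

```shell
mkdir -p releases/1 releases/2
ln -s releases/1 current              # the live deployment

# naive delete+create: between these two commands there is no "current" at all
# rm current && ln -s releases/2 current

# atomic variant: build the new link under a temp name, then rename() it over
ln -s releases/2 current.tmp
mv -T current.tmp current             # mv -T maps to rename(2): one atomic step
```

Readers either see the old target or the new one; there is no instant at which
`current` is missing.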

~~~
regularfry
The problem is that the issues created by out-of-sync code that happens to get
loaded in the wrong order can be an absolute _nightmare_ to debug. You'll have
transient, unreproducible, potentially data-destroying bugs which vary with
each release. Some releases you might get lucky, some not. If you don't think
about atomicity of the deployment, you can chase your tail for days trying to
figure out what went wrong.

That being said, this strikes me as more of a pain under the traditional PHP
model, where reloading code from disc per request is normal, than for
something like Rails which loads everything into memory once at launch.

~~~
martinaglv
With APC (and stat disabled) PHP behaves in exactly the same way. Bytecode is
kept in memory between requests and you can then safely push a new version of
your directory tree. All that is required is to flush the cache to have the
new version go live.

------
praptak
Yeah, mv is atomic, but I believe a crash can still leave your data
inconsistent due to write reordering; see:

[http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/2...](http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/25120)

I would do an fsync() before switching the symlink to the new dir.
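A sketch of that ordering, assuming GNU coreutils (whose sync accepts file
arguments; older versions fall back to a global sync). Names are illustrative:

```shell
mkdir -p releases/2 && echo 'v2' > releases/2/app.txt

# flush the freshly written tree to disk *before* making it live, so a crash
# plus write reordering can't leave "current" pointing at unpersisted data
sync releases/2/app.txt releases/2 || sync   # plain sync flushes everything

# only then swap the symlink atomically
ln -s releases/2 current.tmp
mv -T current.tmp current
```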

~~~
ibotty
yes, that's a good idea if you care about it. the good part about this
solution is that even if the newly deployed file tree becomes corrupted, you
can simply point the symlink back at the old (certainly not corrupt) tree.
that makes error recovery easy and fast. an fsync in between makes the window
very small, so that only the symlink itself might get broken.

------
mjs
This system ensures the _server_ is always in a consistent state, but client
race conditions are still possible if the "old" index.html references an asset
that isn't available after the deployment has occurred. Is there any good way
of dealing with this? (I just ignore it...)

~~~
jacquesm
Two phases. Leave possibly referenced assets for a while until use of cached
old files has become rare enough, then remove them.

~~~
mjs
Sure, but this approach is going to be much more complicated and less
predictable than the "switch the docroot to a completely different directory"
model. (e.g. you need to distinguish between asset and non-asset directories,
and make a decision about how long to keep "old" assets around.) With some
effort these problems can be solved, I'm just pointing out that even with a
completely static site, and perfectly atomic docroot switching, you can still
end up with clients in an inconsistent state.

------
Negitivefrags
We use this technique at Grinding Gear Games for our deployments; here are a
few assorted details about how we are set up.

The first is that we find it's a good idea to name the release directories on
the server after tags from your VCS. Each time we want to deploy, we make a
tag, and the deployment script takes the name of the tag to deploy as its
argument. It's very easy to see what version is deployed on a server just by
looking at the target of the symlink.

The second is that you should use rsync with the --link-dest option. --link-
dest lets you specify a previous directory from which rsync can create hard
links for files that haven't changed. For example, if you have a new version
to deploy in a directory called "0.9.10/2" and the remote server currently
has "0.9.10/1" deployed, you can run "rsync 0.9.10/2 server:0.9.10/2
--link-dest 0.9.10/1". This creates a new tree in /2 with all the files that
didn't change from /1 hard linked, and new copies of the files that did. It
saves a lot of disk space, and it means you can keep versions around on the
server for as long as you feel the need to.

As our deployment is ~8GB, this is quite important for us. It means we can
have releases sitting on the server going back quite a while.

The third thing is setting something up so you can have simple versioning of
your deployment scripts.

We have a script that drives this whole process called "./realmctl".
Deployment is split into a four-step process, with scripts like these in each
release dir:

./0.9.10/1/prepare (create/upload new release)

./0.9.10/1/stop (stop existing servers)

./0.9.10/1/deploy (change symlinks over to this release)

./0.9.10/1/start (start servers)

Each release contains its own version of the scripts. That means if you issue
a command like "./realmctl restart --release=0.9.10/2", the script can find
the stop script for the current version, then run the deploy and start
scripts for the new version. This way, if your deployment process changes
between versions, you can still freely move between versions without worrying
about which version of your deployment scripts you have.
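That dispatch might be sketched like this (only the four script names come
from the comment above; the realmctl internals are my guess):

```shell
#!/bin/sh
# stop with the *current* release's script, then deploy and start with the
# *new* release's scripts, so each version owns its own deployment logic
NEW="0.9.10/2"                    # release to move to (illustrative)
CUR="$(readlink current)"         # e.g. releases/0.9.10/1
"$CUR/stop"
"releases/$NEW/deploy"            # the script that swaps the symlink over
"releases/$NEW/start"
```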

The last thing: if you're writing something similar, it's really nice for
your scripts to have some idea about the different parts of your
infrastructure so that they can be controlled independently. It's really
useful to be able to say something like "./realmctl restart all
poe_webserver" (restart webserver processes on all servers) or "./realmctl
stop ggg4 poe_instance" (stop the game instance servers on ggg4). Those kinds
of commands are really useful during an emergency.

~~~
peterwwillis
Nice. It's rare these days to see a shop manage its deploys like application
releases.

Do you do staged production deploys of new code for small groups of users? I
found it was beneficial to be able to test a change on a random subset of
users so if there's a production-only bug it doesn't hit everyone at once.

This also allows you to not have to "stop" the app servers because you're
starting up the new version's instance in parallel with the old. The frontend
just passes user-specific requests to the new instance and the old instance
keeps chugging along with no downtime. Of course this usually requires no
schema changes (unless you have lots of spare infrastructure handy).

~~~
Negitivefrags
We don't do that on the production realm, but it's kind of because we are a
game and patches are a big deal for our community. It's not like most websites
where you often don't know when patches are coming or what they changed. We
keep full change logs here: <http://www.pathofexile.com/forum/view-forum/366>

It's worth bearing in mind that we are actually deploying an application that
they play on their desktop machines; it's just that our website is tightly
integrated with the live realm, so the two are deployed together in the same
deployment system.

What we do have as a game though is the ability to have a separate alpha realm
that we can deploy to for testing a release and we have a trusted set of our
player base that is allowed access to it.

So here is the list of realms we have:

Testing (Local continuously integrated deploy of trunk. Updated every commit)

Staging1 (Local staging for the next major patch)

Staging2 (Local staged copy of whatever is on production. This is used for
when we want to test bugfixes to production)

Alpha (Deploy of the next major patch for some community members to play and
test in advance. This is deployed alongside the production realm on the live
servers.)

Production

All of that said, we are very soon adding the ability for the backend to
spawn game instance servers for multiple versions of the realm. This would
mean we can deploy a game patch without a restart (assuming the backend
didn't change). Old clients would get old game instance servers, but as
players restart their game client and patch, they will land on new game
instance servers.

------
sausagefeet
I'd prefer to just bring the machine out of rotation, redeploy, bring it back.

------
Gigablah
Rex [1] already supports deployment (and rollbacks) using symlink replacement.

[1] <http://rexify.org/modules/application_deployment.html>

------
daenney
Isn't this guy reinventing a simplified wheel? We already have tools like
Capistrano and Fabric or Rex (which does quite a bit more than just
application deployments).

~~~
dsl
That's like saying: why understand how writing a file to disk works when we
have things like notepad and emacs? What do you think these tools do under
the hood?

------
rll
This seems a bit oversimplified. Sure, there are no race conditions if there
are no interactions between files, but if there are, then swapping the
symlink mid-request will cause all sorts of race conditions for requests that
are already in progress.

~~~
cheald
Not necessarily. On a Linux-based system, you're actually going to be
operating on inodes rather than path names, so once you have a handle to a
file, changing the path to it isn't going to affect your ability to interact
with it. Just make sure that your new directory is set up so that the same
request for the same target file ends up requesting the same inode and you
shouldn't notice any problems.
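The inode point is easy to demonstrate in a shell (GNU mv assumed for -T):

```shell
echo old > config.txt
exec 3< config.txt                  # a "request in progress": an open handle
echo new > config.txt.tmp
mv -T config.txt.tmp config.txt     # atomically replace the path
cat <&3                             # prints "old": the handle follows the inode
cat config.txt                      # prints "new": the path names a new inode
```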

~~~
kelnos
But what if you don't have a handle to a file? Say you have index.php, which
includes stuff.php. Order of operations:

      1. Server opens (old) index.php and begins executing
      2. Symlink swap
      3. Interpreter gets to the line that includes stuff.php, and then opens the *new* file

~~~
regularfry
That's precisely why the classical PHP model is kinda stuffed under this
situation.

If you want to stick with the traditional disc-read-per-request model, I'd be
interested to see if something like a blue/green code deployment could work.
You'd have two separate html roots - say, with /var/www/blue being current.
You deploy your new code to /var/www/green, update httpd.conf to point there,
then SIGHUP apache. The next deployment would switch back from /var/www/green
to /var/www/blue. That way every request sees a consistent deployment from
start to end.
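A sketch of that flow, with a throwaway file standing in for httpd.conf and
the reload left as a comment (whether you SIGHUP or use apachectl graceful is
up to you):

```shell
# stand-in for the DocumentRoot line in httpd.conf
echo 'DocumentRoot "/var/www/blue"' > docroot.conf

mkdir -p green && echo 'v2' > green/index.html      # deploy to the idle root
sed -i 's|/var/www/blue|/var/www/green|' docroot.conf
# apachectl graceful   # or: kill -HUP "$(cat /var/run/httpd.pid)"
```

The next deployment goes back to blue, so at any moment one root is live and
the other is free to receive files.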

------
standel
mv relies on an operation "similar to" rename() as defined by POSIX, which
specifies that it should be atomic.

So the assumption "on Unix, mv is an atomic operation" is not quite true: if
your underlying FS is fully POSIX-compliant, mv will be an atomic operation.

I think it's important to stress this because there are some distributed
filesystems that may try to be POSIX-compliant but do not guarantee atomic
renames, and on those this trick would not work well.

------
njharman
Any deployment tool (fab, capistrano, etc.) should do this.

My preferred layout is

./releases/<datetime, rev, or whatever stamp makes sense for you>

./current -> symlink to ./releases/<foo>

Keeping releases in a directory by themselves makes it easy to list them,
archive old ones, etc.
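A sketch of that layout, plus the pruning it makes easy (release names are
illustrative; GNU head and mv assumed):

```shell
mkdir -p releases/20121001 releases/20121005 releases/20121009
ln -s releases/20121009 current.tmp && mv -T current.tmp current

# timestamped names sort lexically, so pruning old releases is one pipeline
ls -1d releases/* | sort | head -n -2 | xargs -r rm -rf   # keep the newest two
```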

------
ralph
Wouldn't telling the server about the new root directly (which needs the
relevant permissions, as the author points out) also remove the symlink
traversal on every access?

------
code_duck
If you are on a shared host and can't restart your webserver, you probably
don't have the ability to set the document root, either.

------
j_baker
Is it just me, or does this seem like a lot of work just to avoid having
assets be inconsistent for "some number of milliseconds"?

~~~
sirclueless
Especially when browsers are stateless and request things at different times
anyway. If you have a lot of requests, you can expect a few of them to get
the HTML from one version and the JavaScript and CSS from another, no matter
your strategy.

I still think this is a valuable deployment strategy, just because you can
roll back and switch deployed versions easily, which is always useful. And
it's certainly better than rsyncing to a live directory, at any rate.

