Staging Servers, Source Control & Deploy Workflows, And Other Stuff (kalzumeus.com)
330 points by revorad on Dec 12, 2010 | 54 comments



>Git is very popular in the Rails community, but there are probably no two companies using git the same way.

For anyone not already aware of it, I recommend checking out Git Flow. It's a set of git extensions that standardize the git workflow:

https://github.com/nvie/gitflow

Some articles:

http://nvie.com/posts/a-successful-git-branching-model/

http://news.ycombinator.com/item?id=1617425


"...until that day when I accidentally created an infinite loop and rang every number in the database a hundred times."

A developer that worked for me did exactly this a few years ago, only instead of ringing numbers he sent them overcharged SMS messages. I had to call up every single affected customer and explain to them why they had just received 50 SMS messages that cost them $5 a pop. After that I of course refunded the money - only problem was that the SMS gateway charges 40% on each transaction, which I couldn't get back.

Very expensive mistake.


I also recommend having a demo server, to which you can push your latest changes without thinking twice. This way, demoing new features to the customer or testing your deployment script does not change the staging box, and your staging deploy can be a much more realistic test run of your production deploy. You only push to staging when you are about to push to production. Otherwise you might get state differences between the two, like outstanding migrations that need to be run on one server but not on the other. Typically things like that break your neck during deployment to prod, so you want to test that. But you still want a faster way of getting features vetted by your customer. So you should have both demo and staging.


One of the nice things about being on the cloooooooooooud is that, if you've got your infrastructure managed right, a developer who wants a sandbox -- for any reason -- should be able to get one with about as much difficulty as getting office paper, and be able to bin it with about as much regret.


If you're hosting with Amazon and your application data is on its own EBS (block storage service) volume, cloning your environment is easy. You can snapshot the volume, create a new volume from the snapshot, and spin up a new server (you automated provisioning, right?). That way you get a near-exact replica of your environment with not too much work.
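A minimal sketch of the snapshot-and-clone step, assuming the AWS CLI is installed and configured. The function name, volume ID and availability zone are placeholders, and attaching the new volume to a freshly provisioned server is left out.

```shell
# Clone an EBS data volume via a snapshot (sketch; IDs and zone are placeholders).
# Prints the ID of the new volume on stdout.
clone_volume() {
  src_volume="$1"   # e.g. vol-0123456789abcdef0 (hypothetical)
  zone="$2"         # availability zone for the new volume, e.g. us-east-1a

  # 1. Snapshot the source volume and capture the snapshot ID.
  snapshot_id=$(aws ec2 create-snapshot \
    --volume-id "$src_volume" \
    --description "staging clone of $src_volume" \
    --query SnapshotId --output text) || return 1

  # 2. Block until the snapshot is complete.
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1

  # 3. Create a fresh volume from the snapshot.
  aws ec2 create-volume \
    --snapshot-id "$snapshot_id" \
    --availability-zone "$zone" \
    --query VolumeId --output text
}
```

Usage would look like `new_volume=$(clone_volume vol-0123456789abcdef0 us-east-1a)`.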


This is what we have with an Oracle product:

Vanilla - no sample or legacy data, no dev

Sandbox - oracle sample data, no dev

Conversion - legacy data, no dev

Dev - "scrambled" legacy data, dev

Config - functional team environment setup

Test - config and dev together, unscrambled user data

Test2 - I have no idea why this is here, but it is :)

Training - for training users, selected legacy data and dev

PreProd - everything after test and before prod

Prod

Given that each environment has an app server, a web server, two or more batch servers and a database, I can vouch that a fubarred deployment workflow is hell. You end up with no baseline to tell if your process is bombing due to configuration or code.


An excellent article - these are lessons most people learn the hard way. I'll second the recommendation for using Chef or something like it to manage your system configuration. It makes building new servers based on your configuration trivial (say if you wanted to move VPS hosts). Additionally, if you use Chef, you can use Vagrant[1] to replicate your production or staging environments locally in VirtualBox.

Also, not to pimp my own stuff, but I wrote a thing about generating test data with Ruby some time ago. I've used this strategy a number of times and it works really well: http://larrywright.me/blog/articles/215-generating-realistic...

[1]: http://vagrantup.com/


Good post! I especially appreciated "staging = production - users", simple and easy to remember.

It is so useful to have very similar setups in staging and production.

In particular, I really try to avoid having a different architecture (eg: 32bits vs 64bits, or different versions of ubuntu, or passenger just in production etc). It makes it easier to catch issues earlier.


Sorry for nit-picking on an otherwise great post, but:

It is virtually impossible to understate how much using source control improves software development.

Shouldn't that be "overstate"?


I agree - an overall great post. But to add another nitpick, do you mean anachronism where you say asynchronism?


Thanks guys. This is why I write -- Engrish, you use it or you lose it.


Another nitpick: There isn’t a written procedure or automated script for creating it from the bare metal.

I would say "from bare metal" rather than "from the bare metal"


I wouldn't. You're not creating your script "from bare metal", since a script isn't made from bare metal. Rather, the bare metal part is the PC or virtual machine. Think of "building from the ground up".


The two preceding nitpicks were concrete misuses, whereas this one reflects a much greater amount of subjective reasoning.

Not saying that's a bad thing... I just think the distinction is very important. :-)


Please don't upvote me, I just felt like commenting: It's threads like these that make me want to check Hacker News every day. So much good information, both in the post and the comments. That is all.


Thanks, this really put a smile on my face.


A few other things:

One-click rollbacks. It's really, really important that when you deploy a release to the production servers, you can un-deploy it with a single click or command. That means all changes should be logged, and all the old files should be kept around until the next release. You hopefully won't have to use this often, but when you do, it's a lifesaver to be able to say "Okay, we'll rollback and fix the problem at our leisure" rather than frantically trying to get the servers back online.
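One common way to get this (a sketch, not necessarily what any particular tool does) is to keep each release in its own directory and point a "current" symlink at the live one. The directory layout and function names below are assumptions for illustration.

```shell
# Release layout assumed here (hypothetical paths):
#   $APP_ROOT/releases/<timestamp>/   one directory per deployed release
#   $APP_ROOT/current                 symlink to the live release

# Point "current" at the given release directory.
activate_release() {
  app_root="$1"
  release="$2"
  ln -sfn "$app_root/releases/$release" "$app_root/current"
}

# Re-point "current" at the release deployed just before the live one.
rollback() {
  app_root="$1"
  live=$(basename "$(readlink "$app_root/current")")
  # Release names sort chronologically, so the previous release is the
  # entry listed just before the live one.
  previous=$(ls -1 "$app_root/releases" | sort |
    awk -v live="$live" '$0 == live { print prev; exit } { prev = $0 }')
  if [ -z "$previous" ]; then
    echo "no earlier release to roll back to" >&2
    return 1
  fi
  activate_release "$app_root" "$previous"
}
```

Because the old release directories stay on disk until you prune them, `rollback /srv/myapp` is a single command. It only covers files, not database changes.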

Staging/production configs. If you do need to have differences between staging & production configs, try to limit them to a single overrides file. This should not contain app config that changes frequently, and should be limited to things like debug options and Patrick's "don't e-mail all these people" flag. Check in both the staging and production config overrides, but don't check in the actual filename under which the system looks for them. On the actual machines, cp the appropriate config to the appropriate location, and then leave it there. This way it doesn't get blown away when you redeploy, and you don't need to do manual work to update it on deployment. (I suppose you could have your deployment scripts take a staging or production arg and copy it over appropriately, but this is the poor-man's version.)
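The "take a staging or production arg" variant could look roughly like this. The file names (overrides.staging, overrides.production, overrides.local) and the function name are made up for the example; only the first two would be checked in.

```shell
# Copy the checked-in overrides for one environment to the path the app reads.
# File names are hypothetical: config/overrides.staging and
# config/overrides.production are in source control, while
# config/overrides.local (the name the app looks for) is never checked in.
install_overrides() {
  app_dir="$1"
  environment="$2"

  case "$environment" in
    staging|production) ;;
    *) echo "usage: install_overrides APP_DIR staging|production" >&2; return 2 ;;
  esac

  source_file="$app_dir/config/overrides.$environment"
  if [ ! -f "$source_file" ]; then
    echo "missing $source_file" >&2
    return 1
  fi
  cp "$source_file" "$app_dir/config/overrides.local"
}
```

Rejecting anything other than the two known environment names means a typo fails loudly instead of silently leaving the old overrides in place.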

Deployment schedule. I'd really recommend having a set, periodic deployment schedule, maybe even run off a cronjob. The problem with manual deployments is they usually happen only when people get around to it, and by then, dozens of changes have gone in. If something goes wrong, it's hard to isolate the actual problem. Also, deploying infrequently is bad for your users: it means they have to wait longer for updates, and they don't get the feeling that they're visiting a living, dynamic, frequently-updated website.

The holy grail for deployment is push-on-green. This is a continuous-integration model where you have a daemon process that continually checks out the latest source code, runs all the unit tests, deploys it to the staging server, runs all the functional & integration tests, and if everything passes, pushes the software straight to the production servers. Obviously, you need very good automatic test coverage for this to work, because the decision on whether to push is completely automatic and is based on whether the tests pass. But it has big benefits for both reliability and morale as team size grows, and big benefits for users as they get the latest features quickly and you can measure the impact of what you're doing immediately. I believe Facebook uses this system, and I know of one team inside Google that has the technical capability to do this, although in practice they still have some manual oversight.
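Stripped of the daemon and the real test suites, the gate itself is just "run each stage in order, stop at the first red one". A toy sketch, where the function name and the stage commands in the usage line are placeholders:

```shell
# Run each stage in order and stop at the first failure.
# Every argument is a command string; the last one is the production push.
push_on_green() {
  for stage in "$@"; do
    echo "running: $stage"
    if ! sh -c "$stage"; then
      echo "red: '$stage' failed, not pushing" >&2
      return 1
    fi
  done
  echo "green: all stages passed"
}
```

For example: `push_on_green "make unit-test" "./deploy.sh staging" "make integration-test" "./deploy.sh production"`, where those four commands stand in for whatever your project actually uses.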

Third-party software. I know Patrick recommended using apt-get, but I'm going to counter-recommend pulling any third-party code you use into your own source tree and building it with your own build tools. (Oftentimes you'll see all third-party software in its own directory, which makes it easier to audit for license compliance.) You should integrate in a new version when you have a big block of spare time, because it'll most likely be a long, painful process.

There are two main reasons for this. 1) is versioning. When you apt-get a package, you get the most recently packaged version. This is not always the most recent version, nor is it always compatible with previous versions. You do not want to be tracking down a subtle version incompatibility when you're setting up a new server or deploying a new version to the production servers - or worse, when you rollback a change. (If you do insist on using apt-get, make sure that you specify the version for the package to avoid this.)

2) is platforms. If you always use Debian-based systems, apt-get works great. But what if one of your devs wants to use a MacBook? What if you switch hosts and your new host uses a RedHat-based system? The build-from-source installers usually have mechanisms to account for different platforms; open-source software usually wants the widest possible audience of developers. The pre-packaged versions, not so much. And there're often subtle differences between the packaged versions and the source - I recall that PIL had a different import path when it was built & installed from source vs. when it was installed through apt-get.
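For anyone who does stick with apt-get, pinning looks like `apt-get install name=version`. A small sketch that turns a checked-in list of pins into explicit commands; the pins file format and the function name are assumptions, and it only prints the commands so you can review them before running anything:

```shell
# Turn a "package version" list into explicit apt-get commands.
# The pins file format is an assumption: one "name version" pair per line,
# with blank lines and lines starting with # ignored.
pinned_install_commands() {
  pins_file="$1"
  while read -r name version; do
    case "$name" in
      ''|'#'*) continue ;;
    esac
    echo "apt-get install -y $name=$version"
  done < "$pins_file"
}
```

Keeping the pins file in source control means a new server, a redeploy and a rollback all ask for the same versions.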


> I know Patrick recommended using apt-get, but I'm going to counter-recommend pulling any third-party code you use into your own source tree and building it

Counter-counter-recommended. This is needlessly duplicating an immense amount of work that distro packagers do.

> You do not want to be tracking down a subtle version incompatibility when you're setting up a new server or deploying a new version to the production servers - or worse, when you rollback a change.

This is why LTS releases exist. If you're locked to Ubuntu 10.04, then you'll be using the packages that come with it until you're ready to make the significant leap to the next LTS version three years later.

> If you always use Debian-based systems, apt-get works great. But what if one of your devs wants to use a MacBook?

Then they can suck it up and learn how virtualbox works. Even versions you've hand-chosen are going to exhibit cross-platform differences that will make them fail to reflect the reality of production: case-insensitivity and GNU/BSD differences are two such things that come to mind. (Indeed, both of these have been encountered in the last few months by one of the last few VM-holdouts at work.)


I absolutely agree on the virtualisation front, and I'm a steadfast Mac user. Dev systems should be close to production/staging to avoid weird bugs. I really really don't want to spend my time dealing with whatever version of Erlang I can get on my Mac when I could just apt-get it. Case insensitivity is also an issue when using Python with the Cookie library, and with the eventlet library (and that's just off the top of my head).

An added advantage of using virtualisation is that I can easily trash and rebuild my dev environment whenever I need to.


>> I know Patrick recommended using apt-get, but I'm going to counter-recommend pulling any third-party code you use into your own source tree and building it

> Counter-counter-recommended. This is needlessly duplicating an immense amount of work that distro packagers do.

I think that distro packagers do far too much work: they sometimes do not include compilation options that are very useful, apply distro-specific patches and add too many dependencies. And when you have a problem, they should become the primary point of contact, not the "upstream" writer of the software, who has zero control over how it is packaged.

For the vast majority of people the distro packages are fine, but for some people the distro packages are an inconvenience.


A small point, but with respect to one click rollbacks that is something you get "for free" with capistrano: cap deploy:rollback


Assuming your data migrations are stateless and reversible.


Push on green may have legal implications if you are trying to become a publicly traded company and need to have manual oversight for change control (sarbox/cobit nastiness.)


Sounds like a job for click-wrap, with logging of who is clicking (and when).

So that person logs in, sees a list of green things, and clicks on all of them. Like marking messages read in gmail.


I was recently setting up a production env and thinking about the deployment process for the new web app I'm building. A few of my experiences/thoughts (my app is on a single VPS - test/staging and production servers are separate systems):

Patrick's suggestion to keep a log with all the setup steps for the production was already a life saver. Two days ago I remembered that my server was running 32bit code and I was going to run MongoDB on it. Whoops. Complete reinstall in 48min, worry-free.

I'm keen on git, and plan to use it to deploy to production. I actually rsync to the test/staging server during development[0] (to avoid having to commit knowingly broken code just to be able to deploy on the testing/staging server), but I use git to manage the code, and to deploy it. I have a clone repo on the production server with a production branch, which has a few commits on top of master, in which I committed the production-specific configuration.

Deploying on production does roughly:

1. check out the latest code from master

2. rebase production on top of master in a temporary repo to catch any rebase problems (because I don't want to merge master into production)

3. run unit tests in a temporary repo (sadly my tests only test the backend, not the web ui; I plan to improve in 2011 :)

4. rebase real production repo

5. make a new tag (for almost-one-click rollbacks)

6. restart whatever services need to be restarted

This is automated by a simple shell script which aborts at the first sign of trouble.
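A rough sketch of what such a script might look like; this is a guess at the shape, not the actual script. It assumes the production clone has the "production" branch checked out and "origin" pointing at the central repo, and it swaps the temporary repo for a throwaway git worktree (which needs a reasonably recent git). The function name, test command and restart command are placeholders.

```shell
# Sketch of the six steps above. Assumptions: $repo is the production clone
# with the "production" branch checked out, "origin" is the central repo,
# and test_cmd / restart_cmd are whatever your project uses.
deploy() {
  repo="$1"; test_cmd="$2"; restart_cmd="$3"
  scratch=$(mktemp -d) || return 1
  rehearsal="$scratch/rehearsal"

  # 1. Get the latest master.
  git -C "$repo" fetch origin master || return 1

  # 2. Rehearse the rebase in a throwaway worktree (detached HEAD, so the
  #    real production branch is not moved).
  git -C "$repo" worktree add --detach "$rehearsal" production || return 1
  git -C "$rehearsal" rebase origin/master || return 1

  # 3. Run the tests against the rehearsed result, then discard the worktree.
  ( cd "$rehearsal" && sh -c "$test_cmd" ) || return 1
  git -C "$repo" worktree remove --force "$rehearsal" || return 1

  # 4. Rebase the real production branch.
  git -C "$repo" rebase origin/master || return 1

  # 5. Tag the release so there is a known point to roll back to.
  git -C "$repo" tag "deploy-$(date +%Y%m%d-%H%M%S)" || return 1

  # 6. Restart services.
  sh -c "$restart_cmd"
}
```

Every step returns early on failure, so a broken rebase or a failing test leaves the real production branch untouched. A failed rehearsal is left on disk for inspection.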

Regarding the 3rd party packages versioning: I use system packages wherever available. I don't have automatic updates though, and I don't use a system having rolling updates (I'm on Ubuntu Maverick). I had to manually rebuild two packages: nginx (to include upload module) and Tornado web server (the one in Maverick is too old for my purposes). This was pretty straightforward, and I've recorded the exact steps in my server setup log.

[0] my app involves callbacks from external services, so I can't test it on my laptop; my development workflow is "save in editor, rsync, see whether it works", with services in debug mode reloading themselves as soon as change is detected.


> I have a clone repo on the production server with a production branch

Do you protect the dot-files from being accessed via the web? I was at a security conference recently and a speaker mentioned a number of companies had source code accessible over the web because they served directly from a repo and didn't block accessing the VCS files.
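One cheap safeguard is a pre-deploy check that fails if any VCS metadata directory sits inside the directory the web server serves. A sketch, with a made-up function name; it complements rather than replaces configuring the web server to refuse requests for those paths.

```shell
# List version-control metadata that sits inside a web root.
# Returns non-zero when anything is found, so it can gate a deploy.
find_exposed_vcs() {
  docroot="$1"
  found=$(find "$docroot" \( -name .git -o -name .svn -o -name .hg \) -prune -print)
  if [ -n "$found" ]; then
    echo "$found"
    return 1
  fi
}
```

Something like `find_exposed_vcs /var/www/myapp/public || exit 1` (path hypothetical) at the top of a deploy script would catch the case the speaker described.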


The holy grail for deployment is push-on-green

Never heard of this before, it sounds cool.


I figured out much of this the hard way. You don't hear people talking about it much because most people who know how to do it are too busy to write about it.


A third option for the staging database is to do a dump and then scrub the data for security compliance. You may be able to use that database through several development cycles.


A company I am familiar with did that. Down that path lies madness. See the 33 bits blog: if data gets out, you are almost certainly screwed. (Trivial example: imagine you're my university and you release the medical records and student registration tables of your students for research purposes. You anonymize names and randomize ID numbers. Want to find my records? Look for the only person in the university who ever took AI, Japanese, and Women's Studies in the same semester. My medical records are pretty boring. They don't include, for example, an abortion. Let your imagination run wild on what happens if your company leaks the identity of someone whose records do. Something similar-with-the-serial-numbers-filed-off has happened before.)


For the purpose of a private staging server, particularly one used by people who have access to production data anyway, you don't need such "hard" anonymisation.

The main purpose of anonymisation, in this case, is to make sure you don't send testing emails to clients. So actually, the only kind of scrubbing you really need to do is to make sure every email/phone number/twitter handle/outwardly communicating piece of data is replaced by a test email/etc.

The hardcore anonymisation that banks use is only necessary because there is an actual security and reputation risk if the data is leaked by some random developer in India (or some angry developer in London). In the case of swiss banks, they are also legally obliged to scrub that data when using it internally in locations outside of Switzerland.

However, for the purpose of a startup with 1-30 ppl, most of whom have access to production anyway, there is no sense in doing that kind of anonymisation. The only risk you're protecting yourself against is sending hundreds of testing emails to your customers.
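For that light kind of scrubbing, something as blunt as a sed pass over the dump can do. This is only a sketch: the regex is a rough approximation of an email address, test@example.com is a placeholder, and it does nothing for phone numbers or other outward-facing fields, and it is not anonymisation in any stronger sense.

```shell
# Replace every email address in a SQL dump with a single test address.
# The regex is a rough approximation of an email address, and
# test@example.com is a placeholder.
scrub_emails() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/test@example.com/g'
}
```

Usage: `scrub_emails < production_dump.sql > staging_dump.sql` (file names hypothetical).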


If your access controls to the staging server are ironclad, you're right. But they stop being ironclad the moment you make allowances to allow the staging server to connect to external APIs. Most people who think they have ironclad controls on who can attack the staging server don't.


Or, in a distressingly common failure mode in Japan, when the staging server is initialized by a developer from a SQL dump and the developer does not realize that he has left a copy of if-this-gets-out-oh-god-the-company-is-finished.tar.gz on his hard drive until the day after losing it.


I don't see why access control (i.e. unix/db users) should be any more lax for a staging server than for a production server... After all, it's got your whole application on there. If you're running a rails app, that means it has your whole source code.

The solution there is to have robust access control to all of your servers.


Much less screwed than if you fail to catch a bug and the live production database is compromised, particularly if you store credit card numbers. This does mean that the staging environment must have all the same security controls as the production environment. If you can't achieve that then you probably shouldn't use a database with PII (even if it's indirect, like your course listing).

Incidentally, the nice thing about having the infrastructure to deploy a replica of your production environment is that it's probably not much harder to deploy multiple scaled-down versions cheaply, so that you can do two stages of QA. You can do all possible testing in an environment with a fake database, then for the real staging test use the scrubbed production version.


This is even more important for bigger sites, because on staging sites with limited data some performance issues won't be visible until bigger data is thrown at the code.


I agree, generally, but I much prefer writing a one-time script (or using one of the tools available) to populate the database with random-ish data instead of using production data.

It may be that I've been working with government data too much, but people's dev environments are generally far less secure than their prod environments, and I personally don't want to be the guy who declared that the scrubbed data contains nothing sensitive and be wrong, whether or not that data gets out into the wild.


For one client, we're going to do exactly that, and put in place an automated process that will produce the data and anonymize the sensitive parts (with checks).

This will also be useful for developers willing to test changes with the real data volume.

EDIT: after reading patio11's comment, I can only emphasize the importance of using a "white list" approach here, ie carefully picking the fields you will keep, and adding ETL screens. You have been warned :)
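To make the white-list idea concrete, here is a toy sketch that keeps only explicitly named columns from a CSV export and refuses to run if a named column is missing. It assumes simple CSV with a header row and no quoted commas; the function and column names are hypothetical.

```shell
# Keep only the named columns of a CSV export (simple CSV: header row,
# no quoted commas). Anything not listed is dropped.
# Usage: keep_columns "id,plan,created_at" < users.csv
keep_columns() {
  awk -F, -v OFS=, -v keep="$1" '
    NR == 1 {
      n = split(keep, wanted, ",")
      for (i = 1; i <= NF; i++) position[$i] = i
      for (k = 1; k <= n; k++) {
        if (!(wanted[k] in position)) {
          print "unknown column: " wanted[k] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      line = ""
      for (k = 1; k <= n; k++) line = line (k > 1 ? OFS : "") $(position[wanted[k]])
      print line
    }'
}
```

The point of listing what to keep, rather than what to drop, is that a column added to production next month is excluded by default.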


Another cool thing that I haven't seen in many places is an environment that automatically tests that the backups are working.

That's something worth having as well.
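A bare-bones version of that idea, assuming the backup is a tar.gz archive: restore it into a scratch directory and check that the files you expect came back non-empty. The function name and file names are made up. For a database backup, the real check would be loading the dump into a scratch database and running a few sanity queries.

```shell
# Restore a tar.gz backup into a scratch directory and check that the
# expected files came back non-empty. Paths are hypothetical.
verify_backup() {
  archive="$1"; shift
  scratch=$(mktemp -d) || return 1
  if ! tar -xzf "$archive" -C "$scratch"; then
    echo "restore failed: $archive" >&2
    rm -rf "$scratch"
    return 1
  fi
  status=0
  for expected in "$@"; do
    if [ ! -s "$scratch/$expected" ]; then
      echo "missing or empty after restore: $expected" >&2
      status=1
    fi
  done
  rm -rf "$scratch"
  return "$status"
}
```

Run from cron as something like `verify_backup /backups/latest.tar.gz db.sql || mail ...`, it turns "we have backups" into "we have backups that restore".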


"...but I’m not Facebook and having capacity problems means that I am probably already vacationing at my hollowed-out volcano lair."

So now we finally know patio11's grand scheme!

But seriously, thanks for the writeup. I am using the lazy man's version control (Dropbox... ;-) ), but I definitely need to move to Git ASAP. I guess before now the time spent learning and setting Git up was better spent doing something else (at least in my mind).


Having a staging server doesn't only help with capacity problems, but more so with deployment problems and differences in the environment. You might be running 32 Bit on your server but 64 Bit on your dev box. Or prod uses Amazon RDS, but locally you just have MySQL on the same box. And oh surprise, now your mysql gem doesn't want to be built. Or you ran each migration on your dev box separately as you built them, and when you run them all at once during your prod deploy, things blow up. That's what a good staging environment should protect you from. And for all of that to be a problem, 2 users a minute are more than bad enough. You don't need to be Facebook for that. Even better, if you for example host your app on heroku, you get your staging and demo env for free in less than 30 minutes!


Patrick's reference to "capacity problems" was in regards to deciding whether to clear memcached. In a large scale deployment, such as Facebook, clearing the caches could overwhelm the servers. Patrick is small-scale enough to not worry about that, and prefers clearing the cache to avoid stale-cache-related bugs.

I do agree regarding how nice it is to use Heroku. My only issue with them is that they only support Ruby and Node.js, so I need to take my Clojure applications elsewhere.


+1

Staging environment must be IDENTICAL to production or ridiculous and disastrous errors will occur when you least expect them.


Any form of real source control is a better setup than just using Dropbox. I happen to do Git+Dropbox right now, but anything will do.

Source control just provides lots of benefits for logging and rolling back and around. Git and Mercurial, as distributed version control systems, just happen to be easiest since you don't really need to set up a repository server, and just have all the data on local disk. It really is practically free.

Go to your source directory now:

1. git init # makes the directory a git repository

2. git add . # adds everything in that directory

3. git commit # commits it


Definitely get Git/Mercurial set up sooner rather than later - it's actually very easy to get the hang of the basics and it will give you a noticeable productivity boost (having a decent history of everything you've done is surprisingly helpful at times). You can still use dropbox as a kind of backup for your repository - no need to muck about setting up servers etc.

If you don't mind going for mercurial instead of git then take a look at hginit.com. Then if you're on windows, go and install tortoiseHg.


What is a seed script?


A script that populates your database with a set of test data. It can be as simple as a bunch of INSERT statements, or something much more elaborate.
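At the "bunch of INSERT statements" end of the scale, a seed script can be a few lines of shell. The table name, column names and function name below are hypothetical.

```shell
# Print INSERT statements for N fake users. Table and column names are
# hypothetical; example.com addresses keep test mail away from real people.
generate_seed_sql() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "INSERT INTO users (id, name, email) VALUES ($i, 'Test User $i', 'user$i@example.com');"
    i=$((i + 1))
  done
}
```

You would pipe the output into your database client, e.g. `generate_seed_sql 100 | mysql myapp_staging` (database name made up).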


automatic data creation, useful if your app requires some data in the database to function.


For a sandbox for AWS we've just started using Zettar (http://www.zettar.com/zettar/index.php). I'm not affiliated with the company, but I found them when they purchased one of ours.


I'm planning to use this git-based deployment workflow sometime soon:

https://github.com/apinstein/git-deployment/

Seems pretty nifty.


Patrick, how do you compress such hard-won wisdom in such a young person's head, and express it so well at the same time? ;-)


I'm using Git and I have a question: when do you commit?


I typically commit at least once an hour when I'm coding full steam ahead. My rule is I commit to my local repo whenever I have uncommitted work that I'd care about losing (if I made a blunder and had to revert everything). I also commit locally whenever I switch tasks (so one commit is one coherent block of work). I only push from my local to central repo when I have the code in a reasonable "working" state.

If you're working on your own then committing frequently is fine. If you are working with a team then that's a lot of information for your colleagues to process, so it's a good idea to condense your changes down a bit when you commit to the main branch. I think that's what the article is getting at when it talks about feature branches.


Commits should be as small as possible, but not break anything.

With git, the more important question is: when do you push?

Not as often as possible. I repeatedly regret a push, because I need to commit a revert/change commit afterwards. If I had waited, this could have been cleaned up before pushing.



