This is interesting, but I find using a different AMI for each project to be tedious. My understanding is that instances should be ephemeral and agnostic to the project at hand. Initial bootstrap scripts should handle the installation of any dependencies and the project code itself.
I have a very similar setup to the one mentioned in the article, where we create a fully baked AMI, then create a new autoscaling group which attaches itself to a load balancer.
The reason you want a fully baked AMI is to greatly cut down on the time it takes to accept new traffic. Let's say you get a spike in traffic: with a fully baked AMI, your autoscaling group can get an alert about the increase, launch a new instance, and have it handling traffic within 60 or so seconds. If you booted a generic AMI and then provisioned the box, you could easily be looking at tens of minutes before it could handle traffic, depending on the complexity of your app. That can easily lead to long periods of downtime.
DevOps here! Very much this. You can use Jenkins with Packer Builder and a configuration management tool of your choice to build an AMI for each code deployment (AMI generation costs nothing except the storage for the AMI). Need to scale to 100 instances in <5 minutes? EASY.
What you don't want is to attempt to scale and find you can't reach GitHub, or that you've overwhelmed your configuration management tool, etc. Your AMIs should be built so they need almost nothing from external resources before being ready to serve requests.
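The bake-per-deploy step can be sketched as a small CI wrapper. This is a hypothetical sketch, not anyone's actual pipeline: the template name (`webapp.json`) and the `ami_name` variable are made up, and it assumes a Packer template whose provisioners run your config management tool at build time.

```shell
#!/usr/bin/env bash
# Hypothetical CI step: bake one immutable AMI per code deployment,
# named after the commit SHA so the autoscaling group's launch
# configuration can reference an exact build.
set -euo pipefail

bake_ami() {
  local git_sha="$1"

  # Packer runs the provisioners once, at build time, so instances
  # launched from this AMI boot fully provisioned and need nothing
  # from external resources.
  packer build -var "ami_name=webapp-${git_sha}" webapp.json
}
```

Jenkins would call something like `bake_ami "$(git rev-parse --short HEAD)"` after tests pass, then swap the launch configuration to the new AMI.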
Totally agree. In practice, we've found that running bootstrap scripts on fresh instances is a major pain point. When an instance is spun up in response to a load spike, you need it up ASAP, so waiting for bootstrapping to finish sucks. It's really a tradeoff between flexibility (configuration in code) and spin-up speed (configuration baked into the AMI).
This setup is very similar to the setup I created at my previous job.
Instead of Ansible, I used a plain old shell script, but still an idempotent one. I did all the testing locally using Vagrant/VirtualBox. Application deployment was separated in our setup as well, also retrieving the latest application codebase during boot, but using a tgz stored on S3 with the application and all its dependencies (Ruby gems and precompiled assets) already in there.
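Roughly, that boot-time fetch could look like the function below. The bucket name and paths are placeholders I made up for illustration, not details from the original setup:

```shell
#!/usr/bin/env bash
# Sketch of the boot step: pull a prebuilt release tarball (code +
# bundled gems + precompiled assets) from S3 and unpack it. Nothing
# gets compiled or installed on the instance itself.
set -euo pipefail

fetch_release() {
  local bucket="$1" release="$2" app_dir="$3"
  local tarball="/tmp/${release}.tgz"

  # Download the prebuilt artifact from S3.
  aws s3 cp "s3://${bucket}/releases/${release}.tgz" "$tarball"

  mkdir -p "$app_dir"
  tar -xzf "$tarball" -C "$app_dir"
  rm -f "$tarball"
}
```

At boot, user-data/cloud-init would call something like `fetch_release my-bucket v42 /srv/app` and then start the app server.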
All in all, I'm very happy with this setup; it lets us autoscale in a few minutes.
Have you looked at Docker as a replacement for creating all those AMIs? You could have one AMI (with Docker installed), and then use Ansible to pull whichever container (app) you want onto each instance.
Using Docker also has the added advantage of letting you test your images in the development environment (and it's much faster than a vagrant up for every service you want running locally for dev/test).
The problem I see with using a mostly-baked AMI like this is: what happens if your git repo is down when you're trying to bring up a new instance? Bringing up a new server with old code is just asking for all sorts of little problems to spring up. Given your environment, partial failure seems to be a much worse scenario than an instance not coming up at all.
We've experimented with a similar configuration at GoGuardian. We settled on:
- Gzipping the subset of code that needs to be deployed
- Uploading that tar to S3 and creating some infrastructure so that new instances pull the current version from S3 on boot
- Creating a fab script that ssh's into our instances and re-runs the on-boot script to point them to a new deploy version
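The first two steps above might be sketched like this. Bucket name, paths, and the version scheme are all illustrative placeholders, not GoGuardian's actual setup:

```shell
#!/usr/bin/env bash
# Sketch of the deploy-side half: tar up the subset of code that
# needs to ship and push it to S3 under a version key.
set -euo pipefail

package_and_upload() {
  local src_dir="$1" bucket="$2" version="$3"
  local tarball="/tmp/deploy-${version}.tgz"

  # Only what production needs goes in; .git history stays behind.
  tar -czf "$tarball" --exclude='.git' -C "$src_dir" .

  aws s3 cp "$tarball" "s3://${bucket}/deploys/${version}.tgz"
  rm -f "$tarball"
}
```

The fab script in the last step then only has to SSH in and re-run the on-boot fetch, pointing instances at the new version key.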
That sounds like a pretty reasonable deploy process to me. Reasonably quick, and a much more straightforward way of getting instances going. I think their concerns about load spikes would be better handled by having an instance or two already spun up, so that additional load and/or crashed servers can be handled without needing a process that has more failure conditions.
I'm surprised you use fabric for the last mile instead of ansible. There's really no sequence of run() commands that can't be done better as a quick module.
Although I may have been more frustrated than most by paramiko's performance and stability.
In general, I like Ansible's concepts of idempotency, etc., and find them to be well suited for server configuration in general, but it does come at a price, both in terms of development time and speed (I actually find Ansible to be a lot slower than Fabric).
I've started thinking of server configurations in Ansible terms (i.e. "I want X module installed, Y directory present, etc"), but sometimes nothing beats a simple bash command to git pull && sudo service restart.
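That "simple bash" version can be kept reasonably close to Ansible-style idempotency with a cheap guard. A sketch, with the repo path and service name as placeholders:

```shell
#!/usr/bin/env bash
# Pull the latest code and restart the service only when the pull
# actually changed something.
set -euo pipefail

pull_and_restart() {
  local repo_dir="$1" service="$2"
  local before after

  before="$(git -C "$repo_dir" rev-parse HEAD)"
  git -C "$repo_dir" pull --ff-only
  after="$(git -C "$repo_dir" rev-parse HEAD)"

  # Skipping the restart when nothing changed makes repeated runs
  # harmless, which is most of what you want from idempotency here.
  if [ "$before" != "$after" ]; then
    sudo service "$service" restart
  fi
}
```

It's still a long way from what a real Ansible module gives you (no changed/ok reporting, no check mode), but it covers the common case.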
> I actually find Ansible to be a lot slower than Fabric
Have you tried it since they switched to using ssh by default (instead of paramiko)? Because in my experience Ansible can ship multiple modules faster than Paramiko can connect to a host. And it is vastly easier to build reliable tasks out of modules than bash scripts.
If you want to use paramiko (like Fabric does), you can pass -c paramiko too. That mode came first, but -c ssh offers more features and can be just as quick.
This deploys in a reliable way using a Docker container and ShutIt, a tool we use at my company for fast development of reliable and composable iterative steps: