This really resonates with me. I have been having identical issues with Dotcloud, a Heroku competitor.
There's nothing worse than wondering if deploying your stupid one-character typo fix will hang or leave the app in an incomplete state.
I myself am working on setting up salt stack for deployments. I like that all of it is in Python, and after a couple of days I've begun to make progress.
I am migrating things slowly: first cron jobs, then internal tools, and at some point, our website. The only thing that really needs "scale" is our frontend, and I don't yet know how I'll manage that, but you know, I guess it is time to learn!
Huge fan of salt stack, and I may push Adrian in that direction. The changes we made here were minimal in the way he deploys, and with so many architecture changes behind the scenes, we decided less is more on that front.
The latter, unfortunately. (I wish I had more details - but for one reason or another, it never seems to work, be it a bug in the deployment system or a host which is down.)
Dreams of an identical staging environment were somewhat shattered when the "free tier" plans went away. Plus, sometimes a new version will be deployed to 2 servers, and the 3rd one will be hanging on an update. These kinds of things are difficult to plan for and test, since intentionally messing up a deployment is rather difficult with PaaS.
Shame, I was looking into Dotcloud just the other day and liked the more control they gave you, especially the custom nginx conf. And that they hosted static files for you instead of relying on third party services (i.e. S3).
I'm starting a process to deploy on AWS, and I just finished creating my salt stack conf for a vagrant VM (as a warm-up). After reading the post I was in the process of ditching that for a simpler AMI-based deployment.
Could you elaborate on why salt stack is better than custom AMIs? Also, if you have any link explaining how to deploy to EC2 using saltstack, that would be awesome.
AMIs are really useful for production servers when you want to quickly launch a bunch of similar servers or set up auto-scaling groups to launch new servers based on load. The issue is that to take full advantage of using AMIs, you need to re-build (bake) the AMI every time you make a change to your production machines.
Disadvantages of AMIs are that they are large files (which can be unwieldy) and are not in a format that can be immediately used in a vagrant instance or elsewhere, so you need to convert between AMI and .vbox or .box formats to use AMIs with your local vagrant instance.
With saltstack, you can deploy and configure machines from base Ubuntu or RHEL distros by using salty-vagrant for vagrant instances and salt-cloud for AWS / rackspace / openstack / vps machines. Once machines are deployed and connected to your salt-master, you can use your yaml config files to bring all the machines to the desired roles (api machine, load balancer, web front-end, etc.). This is more flexible than using AMIs, since you can roll out a minor change quickly to all servers, including production servers, without needing to wait to build a new AMI.
There are hybrid approaches also (some puppet/chef users do this) where for production machines you can have saltstack deploy to a machine that then gets built into a new AMI automatically. And then this new AMI gets updated into the autoscaling groups and is deployed to replace the current production servers. Sounds complicated... and it is, but at the cost of extra complexity it does give you the best of both worlds.
The last option does sound like the best. My salt configuration is way too slow to build to be used in an autoscaling scenario, which is one reason why AMIs seemed suitable to me, but that last combination you mentioned (which I didn't know was possible) looks great.
An alternative approach would be for a Juju charm (script) to handle the initial deployment of a stock Ubuntu AMI and the customization in one step (or with puppet/chef), and then allow you to add new instances based on scale (though currently not automatically). When you have changes to your service, you update the charm and just `juju upgrade-charm`.
This would consolidate step 1, 5, and the "ongoing" into one tool and you'd get a cloud-agnostic deployment (openstack cloud or bare-metal).
I would really love to recommend Juju. I used it for a good couple months attempting to get even just a working Openstack deployment. The trouble is that for the bare-metal functionality you guys mean it to be paired with MaaS. Unfortunately the community I found to exist around this combination was basically zero.
Even a simple question went unanswered the couple times I posted it. Eventually I gave up and just went back to Puppet+Openstack modules for the configuration I was working on at the time.
Very cool. I looked at the Fuel stuff as well, but at the time it wasn't quite able to do everything I needed so I ended up rolling my own. The 2.2 update they have coming up soon sounds like it fixes all of the things that were 'wrong' for me though!
FWIW, I tested juju a few months ago and found it to be buggy and unreliable. Sometimes the instances would connect together correctly, and sometimes they would fail inexplicably. Didn't seem ready for any kind of production use to replace config mgmt tools.
I have tried ansible and love it. I invested the time (a few hours) to get a script working for my stack, and now I have a 70ish line script that will provision a server or VM (with a shared code directory in the VM), from clean installation to the codebase up and running in its production state, in one command.
New project? I just copy the script, change a few variables (names and packages it needs), and I get deployment for free.
I never thought of Ansible for orchestration particularly, but that sounds interesting! The current orchestration support in saltstack is somewhat limited, so it would be neat to check out.
Are people using Ansible to replace or manage fabric scripts? I'd like to figure out some way to limit the spread of one-off fabric scripts (reminds me too much of bash script proliferation) and it'd be great to get everything under one roof.
If you think puppet/chef is too much complexity you might find saltstack worth some attention. I haven't used it in real anger yet, but I've really liked what I've seen so far. I see they've even got instructions specific to AWS now too.
Also worth mentioning cloud formation as well. That might make the pain of chef/puppet more of a worthwhile investment!
I really want to use Salt for something one of these days given I work with Python quite a bit... but basically at this point my investment with Puppet is relatively significant, so I'm not sure if the pain of switching would be worth it. I'm sure eventually I'll find a use for it though! I like what I've seen before.
> The way we set up Soundslice is relatively simple. We made a custom AMI with our code/dependencies, then set up an Elastic Load Balancer with auto-scaling rules that instantiate app servers from that AMI based on load.
Doesn't sound that simple to me (as a complete sysadmin noob). Somebody should write a book about this.
I'm a PHP dev working with Apache on a daily basis, but besides .htaccess and some minor changes I can't really do much. I would love an article which explained in a simple way how to scale your server and really debug problems with it.
Last week, I literally spun down an EC2 instance and signed up for Digital Ocean because I couldn't figure out how the hell to make stuff work on EC2, but have lots of VPS experience. As a dev and not a sysadmin, it's much easier to go with what you know... but I want to learn.
What's the difference between EC2 and a VPS for you? Do your VPSs already have things installed or a GUI? I've used EC2 before as a single server, never scaling. The main difference was installing things that are usually pre-installed (like on Ubuntu's official desktop image). Is that it or is it more about the scaling?
And thanks for the reference to Digital Ocean. Never heard of them before. Seems great, might try using them :)
> What's the difference between EC2 and a VPS for you?
For me it is that restarting an EC2 instance deletes all the local storage. I have had good success just getting a big ass VPS, running the database locally, and pushing text backups to S3. It is trivial to manage, and in the real world, downtime is more likely to be caused by configuration wonkiness than hardware failures.
You also have to have a huge amount of traffic to overwhelm a 24-core / 96GB RAM server. Why not put off managing the complexity until you really are doing 10M page views per day?
If you were having all your data deleted when you restarted your EC2 server, then something was VERY wrong. I'm not an expert, but I've used EC2 a little bit and I think I hit that exact problem.
The thing is that for whatever reason, data wasn't being written to the EBS (the virtual hard disks for use with EC2 instances) and was instead being written to the "ephemeral storage", a really big local data store that every EC2 instance has that is basically a `/tmp` directory. If the server restarts, everything in the ephemeral storage is destroyed.
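One quick way to see which kind of disk a directory actually lives on is to find its mount point and compare that with wherever the ephemeral disk is mounted. A minimal sketch, assuming a Linux instance; the helper name `mount_point` is made up for illustration, and where the ephemeral disk is mounted depends on the AMI (as far as I know it is often `/mnt` on stock Ubuntu images, but check yours).

```python
import os

def mount_point(path):
    """Walk up from `path` until we reach a directory that is a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

# "/" is always its own mount point; any other path resolves to the
# filesystem that holds it.
print(mount_point("/"))
```

Calling it on your database's data directory (for example `mount_point("/var/lib/postgresql")`, a hypothetical path) tells you which filesystem the data is really being written to.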
EBS is a slow turd. Running a database on EBS is running a database on network attached storage. I know tons of people do this, but I have no idea why. Provisioned IOPS will work, but it is expensive.
Local storage on a $5 Digital Ocean plan will do 2000 IOPS, whereas that would cost $200/month with Amazon Provisioned IOPS. I know it is not an apples to apples comparison, but it is worth thinking about. Running a database on a local SSD is a good option for many people, and it is not an option that Amazon offers.
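To put the gap in per-IOPS terms, here is the arithmetic as a tiny sketch. It uses only the two figures quoted above (neither is an official price list number), and the variable names are just for illustration.

```python
# Figures quoted in the comment above; neither is an official price.
IOPS = 2000
DO_MONTHLY_USD = 5.0      # Digital Ocean plan with local SSD
AWS_MONTHLY_USD = 200.0   # quoted Provisioned IOPS cost for the same 2000 IOPS

def usd_per_iops(monthly_usd, iops):
    """Monthly dollars paid per IOPS of capacity."""
    return monthly_usd / iops

do_cost = usd_per_iops(DO_MONTHLY_USD, IOPS)
aws_cost = usd_per_iops(AWS_MONTHLY_USD, IOPS)
ratio = AWS_MONTHLY_USD / DO_MONTHLY_USD

print(f"{do_cost:.4f} vs {aws_cost:.4f} USD per IOPS-month, ratio {ratio:.0f}x")
```

That works out to a 40x difference on these numbers, and the $5 also pays for the whole server rather than just the disk throughput.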
> Do your VPSs already have things installed or a GUI?
Nope. I prefer straight-up Arch Linux. ssh in and go from there.
To add some concrete-ness to the mix, I was installing ejabberd. When it came time to ping the server... no response. I did the exact same steps on my Digital Ocean VPS and everything went fine. I had done whatever commands EC2 expects to open the right ports...
I'm a large user of AWS but, in general, I never feel like I'm in a true VPS. My last experience:
All of a sudden one of our EC2 instances could not connect to another, causing our HA solution to spin up dozens of instances and eventually crash too. It was clear to me that the destination machine was behind some firewall. We went to the security group: the machine was supposed to accept any connection, on any port, from any host. The instance itself had no active firewall.
Out of desperation I added the security group to itself. It worked for a few hours, then stopped again. I removed the rule, it came back to life, and it is still working (8+ months now).
This is only one of various mysterious events I've seen happening on AWS.
For EC2 security groups you have to open access to both the correct ports and protocols. Ports are a concept at layer 4 of the OSI model, while ping, or more correctly 'ICMP Echo Request', is lower down the TCP/IP stack at layer 3. So when configuring the security group, look for the option to choose Protocols, then enable ICMP :)
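A toy model of why opening a port does nothing for ping. This is a simplified sketch and not how AWS evaluates rules internally: real ICMP rules also carry a type and code, and the `Rule` class, the `allowed` helper, and the choice of ports (5222 for XMPP clients, 22 for SSH) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    protocol: str                    # "tcp", "udp" or "icmp"
    from_port: Optional[int] = None  # unused for ICMP, which has no ports
    to_port: Optional[int] = None

def allowed(rules, protocol, port=None):
    """Return True if any rule admits this protocol (and port, where ports apply)."""
    for rule in rules:
        if rule.protocol != protocol:
            continue
        if protocol == "icmp":
            return True
        if port is not None and rule.from_port <= port <= rule.to_port:
            return True
    return False

# Hypothetical group: XMPP client port and SSH only.
rules = [Rule("tcp", 5222, 5222), Rule("tcp", 22, 22)]

print(allowed(rules, "tcp", 5222))              # the TCP port is open
print(allowed(rules, "icmp"))                   # ping is still blocked
print(allowed(rules + [Rule("icmp")], "icmp"))  # an explicit ICMP rule admits it
```

The point is only that protocol is matched before ports are even considered, so a group full of TCP rules says nothing about ICMP.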
The big difference between traditional VPS and IaaS services like EC2 (or Google Compute Engine, etc.) is that the latter has dynamic scaling (and a pricing model built around the assumption that you will use dynamic scaling) as a core feature; for traditional VPS-style work on an IaaS, you are likely to end up paying a premium for flexibility you aren't using (and possibly dealing with some attendant management complexity from the same source), but, other than that, IaaS should be a complete substitute for VPS.
At that point, I had already been googling for so long that giving up was the best decision. Also, when I said 'ping' I meant 'hit via a web browser' as well as ping on the command line.
Furthermore, I don't feel very comfy when doing 'sysadmin via Google,' who knows what stuff I'm screwing up?
I had never used Digital Ocean before. But getting going with them was the exact same as my previous Linode, Rackspace Cloud, prgmr.com, and every other VPS provider I have tried. I don't need to install special command line tools, or set up security groups, or generate .pems... I ask for a server, they send me an email with a password, I log in, change it, and set up my ssh prefs. Super easy, using the same stuff I use everywhere else.
I have quite a bit of devops experience myself, especially on Amazon Web Services and their CloudFormation service combined with Chef. I would be very interested in a blog post (or a series, go nuts!) and I'm considering doing a series of write-ups on our setup as well.
I would LOVE to see a detailed breakdown on how an experienced sysadmin would set something like this up. I've cobbled together systems before, but I've found that doing so in a robust way is difficult.
You know those days when you think you know a bit, and stumble upon someone who knows vastly more than you? Today was one of those days. It appears we're both Devops guys in Chicago; can I buy you a beer sometime?
That's actually, honestly, a bad way to do it. It's fine to pre-load an instance with source and application/package dependencies. At this point though, you really should be backing those up with it being deployed by Puppet/Chef.
Essentially you pre-run a puppet manifest and have it do all the first-run processing and then store THAT as the AMI/instance. This way you have an instance that only needs a few seconds to get itself ready while also integrating configuration management (which really you should be doing these days). Just some suggestions from a sysadmin.
Depends on how fast you want new instances to be online. Sometimes you want to respond to demand very fast. It can take a few minutes for an instance to register with ELB, so if you have to download/build/configure packages first you are just adding more minutes on top of that. The faster you can spin up instances, the higher you can run your servers, since you need to tell Amazon at what threshold it should spawn more instances.
There are 3 general levels you can take with AMIs:
1) Vanilla instance AMI, have puppet/chef install everything for you, then fetch your app code and configure that
2) Use another tool to have your instance built with puppet/chef and then capture that into an AMI. Once it spins up it just needs to get your app code. Idea here is your services aren't likely to need updating as fast as your app code.
3) Same as 2, but when your production code is in release/maintenance mode, bake everything into an AMI. When you need to deploy new code, you need to create a new AMI, but you are creating an AMI with scripts so it's no big deal :) All you have to do now is update your cloudformation.
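To make the link between the three levels and the scaling threshold concrete, here is a rough back-of-the-envelope sketch. Every number in it is invented for illustration (step durations, the growth rate), and it assumes load climbs linearly during a spike, which real traffic rarely does; the function names are hypothetical too.

```python
# Minutes each boot-time step takes. All figures are invented for illustration.
STEP_MINUTES = {
    "register with ELB": 2.0,
    "fetch app code": 1.0,
    "run puppet/chef": 8.0,
}

# What still has to happen at boot under each of the three levels.
LEVELS = {
    "1) vanilla AMI": ["run puppet/chef", "fetch app code", "register with ELB"],
    "2) services baked in": ["fetch app code", "register with ELB"],
    "3) everything baked in": ["register with ELB"],
}

def boot_minutes(steps):
    """Total time from launch until the instance can serve traffic."""
    return sum(STEP_MINUTES[step] for step in steps)

def max_safe_threshold(growth_pct_per_min, minutes_until_serving):
    """Highest scale-up trigger (% utilisation) that still leaves time for the
    new instance to start serving, assuming load climbs linearly."""
    return max(0.0, 100.0 - growth_pct_per_min * minutes_until_serving)

GROWTH = 5.0  # assumed: utilisation rises 5 points per minute during a spike

for name, steps in LEVELS.items():
    minutes = boot_minutes(steps)
    trigger = max_safe_threshold(GROWTH, minutes)
    print(f"{name}: ~{minutes:.0f} min to serve, trigger at <= {trigger:.0f}%")
```

With these made-up numbers the fully baked AMI lets you wait until about 90% utilisation before scaling, while the vanilla AMI forces the trigger down to about 45%, which means paying for a lot more idle capacity.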
After building a half dozen one-off sites, each with a simple, single-server config, I found that sitting down and learning Chef was helpful for jumpstarting the next project. All of the basic needs are the same: nginx, unicorn, postgresql, etc, and I was tired of reading through my jumble of system configuration notes. Yes, I could have baked AMIs and created new AMIs, but not everything was Amazon-based.
I don't really agree with this. If you're defining a whole crazy master/agent relationship with Puppet then it's overkill for them, no doubt.
There's a reason many Vagrant boxes I see use Puppet to instantiate though: It's friggin simple to get started with. More advanced configurations might even end up requiring you to build your own modules and extended manifests, but I expect this configuration detailed in the article would probably amount to a couple hundred lines for the entire manifest.
I think this also underscores one of the things I like the most about Puppet (instead of Chef, which is a fine CM system as well): it's easy to get started with but powerful enough to get things done down the road. Honestly, there's never a better time to start working with Puppet than from the very beginning of a project!
Overkill for now. But it's best to learn the tools you will need while things are still simple. You know how things should get set up and can ensure that what you're telling Puppet/Chef to do achieves that.
If you wait until you've got a more complex configuration, you're fighting both the tool syntax and system setup requirements. Not to mention the clock that's ticking and telling you that you needed to be all up and running yesterday.
Add in a reasonable max-node count: automated processes that cost you money are dangerous if not governed. You don't want your provisioning tool spinning up an endless number of nodes, all because some backend database has become slow (some new user pattern has emerged and is causing indexed queries to queue up), and all response times are getting trashed.
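A small simulation of that failure mode, as a sketch: the database is the bottleneck, so response times stay bad no matter how many app nodes are added, and only the cap stops the growth. The limits, the 500 ms cutoff, and the function name are all hypothetical.

```python
MAX_NODES = 10  # hypothetical budget ceiling; pick one you can afford to hit
MIN_NODES = 2

def next_node_count(current, avg_response_ms, slow_ms=500):
    """Add one node when responses are slow, but never leave [MIN_NODES, MAX_NODES]."""
    wanted = current + 1 if avg_response_ms > slow_ms else current
    return max(MIN_NODES, min(wanted, MAX_NODES))

# The slow database keeps response time at 3000 ms however many nodes exist,
# so the scaler asks for one more node on every tick.
nodes = MIN_NODES
for _ in range(50):
    nodes = next_node_count(nodes, avg_response_ms=3000)

print(nodes)  # stops at MAX_NODES instead of climbing to 52
```

On AWS the equivalent knob is the maximum size on the auto-scaling group, so the cap does not have to live in your own code.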
It's not simple. AMIs are a pretty bad idea and not something you should rely on, due to the statefulness and vendor lock-in, unless you can reproduce them in disparate deployment environments from scripts.
- Create a new EC2 instance (literally click "Launch instance" and select Ubuntu).
- Log in and install any dependencies (sudo apt-get update && sudo apt-get install my list of awesome packages I need installed)
- Go back to the EC2 dashboard and click "Create AMI" which is just imaging the server.
- I can't attest to the ease of ELB and auto-scaling rules as I haven't used them, but I would assume it's fairly straightforward - and there are a ton of resources on AWS to help you out :-)
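The "Create AMI" click in the steps above can also be scripted. A minimal sketch, assuming `ec2` is a boto3 EC2 client (something like `boto3.client("ec2")`); the helper name `bake_ami`, the instance ID, and the image name are placeholders, and `FakeEC2` is only a stand-in so the example runs without AWS credentials.

```python
def bake_ami(ec2, instance_id, name):
    """Image a configured instance and wait until the AMI is usable.

    `ec2` is expected to be a boto3 EC2 client; anything exposing the same
    create_image / get_waiter calls will do.
    """
    image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    return image_id

class FakeEC2:
    """Stand-in client so the sketch runs offline. Not part of boto3."""

    def create_image(self, InstanceId, Name):
        return {"ImageId": "ami-00000000"}

    def get_waiter(self, waiter_name):
        class _Waiter:
            def wait(self, ImageIds):
                pass  # a real waiter polls until the image is available
        return _Waiter()

print(bake_ami(FakeEC2(), "i-00000000", "my-app-v1"))
```

Worth knowing before pointing this at a real instance: as far as I'm aware, creating an image reboots the instance by default unless you ask it not to.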
I am working on a site right now, http://makerops.com, that will walk through, via screencasts, text, and interactive learning, a lot of the issues described by the OP: how to interact, as a dev, with various cloud APIs to auto scale, manage configurations, etc.
I gave AEB a try with Django about 3 months ago, and let me say it was a major headache. Sometimes it would work, other times the load balancer would fail, other times it would just laugh in your face (there were many more, but you get the point). I wasted weeks trying to get our app running. When it came down to only 3 days before our demo run, I made some bash and fabric scripts and everything was running on multiple instances in about 1 hour. I took a snapshot of our state and used that AMI for auto scaling. I myself won't be trying AEB again anytime soon.