...and yet, would take almost no work to set up using a non-dockerized workflow. I don't understand why so many people are putting themselves through this. It's becoming common to hear people from dinky little startups going down dark rabbit holes trying to build their infrastructure like Google -- but it's a total distraction!
If you're running a service with a small number of machines, don't do any of this. Architect a sensible multi-AZ deployment (i.e. a cluster of 2-3 machines, an ELB, a VPN, and a firewall/bastion server), spin up instances by hand, and upgrade things as needed. Create AMIs for your machine classes, and get yourself used to working with a sensible upgrade schedule. Doing this for a small number of machines (e.g. N <= 25) won't take an appreciable amount of your time.
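To make that concrete, here's a rough boto3 sketch of the "by hand" workflow: launch an instance from a prebaked AMI for your machine class and attach it to a classic ELB. The AMI ID, subnet, security group, and load balancer name below are all placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
elb = boto3.client("elb", region_name="us-east-1")

# Launch one instance from a prebaked AMI for this machine class
# (ami-12345678, subnet-aaaa1111 and sg-bbbb2222 are hypothetical values).
resp = ec2.run_instances(
    ImageId="ami-12345678",
    InstanceType="t2.medium",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-aaaa1111",
    SecurityGroupIds=["sg-bbbb2222"],
)
instance_id = resp["Instances"][0]["InstanceId"]

# Wait until it's running, then register it with the classic ELB.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
elb.register_instances_with_load_balancer(
    LoadBalancerName="web-elb",
    Instances=[{"InstanceId": instance_id}],
)
```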
Once you start to have more machines than that, you'll probably also have the resources to get someone who knows what they're doing to set up more "magical" automated management schemes. Don't bury yourself in unnecessary complexity just because it's the hot new tech of the moment.
If you have a 'static' number of servers for your app, just rent a bunch of VPSes. Use whatever automated setup tool works for you. I'm a big fan of Debian config & meta packages, but Ansible, Chef, hell, even fucking shell scripts will work.
In general, the whole thing feels rushed and duct taped together.
- Networking model (inherited from Docker) doesn't play nice with ELB.
- The built-in AWS tools for monitoring are not container aware.
- We've had multiple occurrences of the ECS daemon dying.
- Very little visibility into the progress of deploys. The API/console will report something as "running" when in fact it is still loading up.
If you watch their videos, they promise integration with Marathon, but if you look at the code, it's in "proof of concept" stage.
At this stage, GCE is significantly ahead of AWS.
Out of the box, you get a top notch container story, logging and monitoring.
And upgrading your cluster is easy too. Just one command and it rolls it out server-by-server, doing a full health check afterwards.
k8s (not specific to GCE) also integrates with network drives, so attaching and maintaining persistent storage is really easy.
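As an illustration of the persistent storage point, here's a rough sketch using the kubernetes Python client to claim a volume; on GKE this gets backed by a GCE persistent disk with no extra setup. The claim name and size are made up.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Ask the cluster for a 10Gi persistent volume; the name and size
# here are hypothetical.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```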
I've used ECS and it feels just so vanilla by comparison. AWS would have been better off hosting an existing cluster system rather than building one from scratch.
ECS suffers from a lack of proper monitoring and has no satisfactory way to recover from certain types of failures, such as when the daemon dies.
We ended up switching to Kubernetes, still on AWS, and have been much much happier.
Overall, ECS is great. It took hard work to get here, but I am easily managing hundreds of production clusters. I am seeing excellent reliability with an extremely low cost of ownership.
All this while continuing to benefit from the massive AWS ecosystem.
GCE certainly has advantages too.
I'm glad that we have such great tools and strong competition. This is yet another strong confirmation that the cloud model is correct and offers massive efficiency to all.
You can use kubernetes without installing and operating it yourself on Google Cloud Platform too: it's called Google Container Engine, but it's k8s under the hood.
My experience with ECS is very brief, as contrasted with several months working daily with k8s. My first impression of ECS was that it is cobbled together from a bunch of existing AWS services, and as such it requires you to get far more involved in the various APIs than kubernetes on GCP does.
Overall kubernetes feels more like a cohesive abstraction because that's what it is. ECS by comparison feels like a solution pieced together out of Amazon's existing proprietary parts because that's what it is. I'm sure they will be improving this.
ECS is akin to selling git on top of SVN. It doesn't really make sense.
Bryan Cantrill gave a talk about the craziness of running containers on VMs at https://www.youtube.com/watch?v=coFIEH3vXPw.
Triton does an incredibly good job of hiding servers.
There is one big flaw with this approach: the entire tool set needs to be rewritten for the container-only universe.
How do you debug containers without access to a host OS?
How do you isolate neighbors without a machine boundary somewhere?
Boxes, instances and VMs are never going away. It's simply a question of whose responsibility it is to run them.
A "server-less" app is one that literally has no need for a server. Just because you aren't in control of the server-provided resources the app depends on doesn't make it "server-less".
I'd love to compare more notes with everyone about deploying to ECS.
If you want to play with an ECS cluster, `convox install` is a free and open source tool that sets everything up in minutes. Little to no AWS knowledge required. https://convox.com/docs/overview/
1. Cluster setup on AWS/GCP is fairly automatic. On vanilla machines it's straightforward, but involved, to install.
2. Kubernetes relies on etcd, which uses Raft, to run. It does depend on one master node to provide an API server to control the instances, though.
3. Pods have health checks, and deployments will halt if new instances fail their health checks.
4. Deployments handle rolling updates within the cluster
5. The scheduler can cordon off or drain nodes before destroying them, and new nodes can easily be added to the cluster
6. fluentd is often run in a DaemonSet for this purpose
7. kubectl makes deployment feel like playing with legos: you interact with a bunch of objects in a predictable and documented manner. In the case you laid out, for example, many rapid version changes are handled gracefully by the deployment object, scaling down the two old versions and scaling up the new one (see the sketch after this list).
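To make points 3 and 7 concrete, here's a hedged sketch with the kubernetes Python client: a Deployment whose pods carry an HTTP readiness probe, so a rolling update only proceeds while new pods pass their checks. The image name, port, and path are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# One container with an HTTP readiness probe; image, port and path
# are hypothetical.
container = client.V1Container(
    name="web",
    image="example/web:2.0.0",
    ports=[client.V1ContainerPort(container_port=8080)],
    readiness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=5,
        period_seconds=10,
    ),
)

# A Deployment that rolls out new versions one pod at a time and only
# proceeds while the new pods pass their readiness checks.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        strategy=client.V1DeploymentStrategy(
            type="RollingUpdate",
            rolling_update=client.V1RollingUpdateDeployment(
                max_unavailable=0, max_surge=1
            ),
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
```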
AWS ecosystem. Everyone is already running stuff here, or wants to.
Particularly data. If I have important stuff in RDS or Redshift I want to run my apps in a VPC next to that.
Managed service. I don't want to be responsible for etcd. I manage 150 clusters and counting. I couldn't do this without ECS and DynamoDB and their excellent SLAs.
I am aware GKE does this. When I use Kubernetes that's what I'll use.
In the end, we just opted for Kubernetes on GKE, a well-developed and thought-out product. Aside from it being open source and portable, we are still kicking ourselves for having devoted so much time to ECS (and EC2 is more expensive than GCE, too!). We spent only a few hours getting GKE set up, compared to the several all-nighters we spent tuning AWS.
Plus, when pods died, Kubernetes would automatically spin up a new pod (something ECS only sometimes did), and for bonus points, everything from standard output/error in a container went to Google's Stackdriver logging service without any configuration. Google's docs were actually quite good; much better than the unhelpful AWS docs.
I found ElasticBeanstalk to be MUCH simpler for Docker based deployments. It's using ECS behind the scenes for multi-container instances.
You essentially deploy a zip with a Dockerrun.aws.json metadata file and it handles everything for you, including rolling deploys based on your auto scaling groups.
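For illustration, the whole deploy can be scripted in a few boto3 calls: upload the zip, register it as an application version, and point the environment at it. The bucket, application, and environment names here are made up.

```python
import boto3

s3 = boto3.client("s3")
eb = boto3.client("elasticbeanstalk")

# Upload the zip (containing the Dockerrun.aws.json metadata file) and
# roll the environment to it; bucket, app and environment names are
# hypothetical.
s3.upload_file("release.zip", "my-eb-bucket", "releases/v42.zip")
eb.create_application_version(
    ApplicationName="my-app",
    VersionLabel="v42",
    SourceBundle={"S3Bucket": "my-eb-bucket", "S3Key": "releases/v42.zip"},
)
eb.update_environment(EnvironmentName="my-app-prod", VersionLabel="v42")
```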
I never found the ECS definitions of Task and the rest really intuitive to work with.
ElasticBeanstalk does have some serious simplicity advantages for the type of apps it was designed for.
I'll definitely second your opinion that raw ECS can be very challenging. It's begging for some good tools to help you manage it.
1. Mesos dependency
2. Security (Firewall) settings
3. DNS settings (You need the FQDN to resolve from each part of the system)
4. Dynamic load balancing (ELB bridge, Nginx Bridge)
5. No control over the underlying cluster, so to actually scale your app, you will need to scale your cluster first.
Those are just off the top of my head. I am using Marathon every day but I still carry a lot of pain from the first days/weeks of using it.
Let's say I have a cluster of 2 machines.
I deploy my Docker container using Marathon, with the constraint that it's unique per machine.
Now, I want to scale my app to 30 containers.
Without Marathon controlling the Amazon APIs and actually scaling my underlying cluster to 30 machines first, that scale-up will never happen.
This creates a disconnect between scaling strategies.
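In other words, you end up writing the glue between the two scaling strategies yourself. A rough sketch of that glue (the Auto Scaling group name, Marathon endpoint, and app id below are made up):

```python
import boto3
import requests

DESIRED = 30

# First grow the underlying cluster, since Marathon won't do it for us
# (the ASG name is hypothetical).
autoscaling = boto3.client("autoscaling")
autoscaling.set_desired_capacity(
    AutoScalingGroupName="mesos-agents",
    DesiredCapacity=DESIRED,
    HonorCooldown=False,
)

# Then ask Marathon to scale the app; with a unique-per-machine
# constraint the extra instances only start once new agents have joined.
requests.put(
    "http://marathon.internal:8080/v2/apps/my-app",
    json={"instances": DESIRED},
)
```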
But I definitely agree that at this stage there are some oddly rough edges. I'm glad to hear logging is a bit better, but I would love it if that were just solved by default. Similarly with build/deploy. I think a teeny bit more standardization and UI would make that a ton easier.
Overall though I'm still bullish. I put together a sample terraform config that is the skeleton for a basic rails app in ECS https://github.com/jdwyah/rails-docker-ecs-datadog-traceview...
I have some multi-server deploys running with load balancers, rolling restarts, SSL, private registries and more.
In fact, I like it so much that I created an end to end course that teaches you all about using roughly half a dozen AWS resources to deploy and scale a web application with ECS.
Details can be found here (it's an online course taken at your own pace and costs $20):
The example application is a multi-service rails app that uses postgres, redis and sidekiq but you can follow along without needing any rails experience.
>We need to bring and configure our own instances, load balancers, logging, monitoring and Docker registry. We probably also want some tools to build Docker images, create Task Definitions, and to create and update Tasks Services.
Doesn't sound like much of a win then. That sounds annoying. I just set up mesos/dcos on AWS, and it sounds like the same amount of effort, only now I've got a platform-independent solution with great UI and cli, along with load balancers and routing. Is ECS worth the effort?
Amazon took managing instances and managing containers, selected the hardest-to-comprehend parts, and wrapped them in a cumbersome JSON definition.
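For the sake of illustration, even a minimal Task Definition plus a service update involves a fair bit of boilerplate. Roughly, via boto3, with invented names and values:

```python
import boto3

ecs = boto3.client("ecs")

# A minimal task definition for a single web container; the family,
# image and port values here are hypothetical.
ecs.register_task_definition(
    family="web",
    containerDefinitions=[
        {
            "name": "web",
            "image": "example/web:1.2.3",
            "memory": 256,
            "cpu": 128,
            "essential": True,
            "portMappings": [{"containerPort": 8080, "hostPort": 0}],
        }
    ],
)

# Point the service at the new revision to roll it out.
ecs.update_service(cluster="default", service="web", taskDefinition="web")
```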
Marathon takes a much simpler approach, but it lacks the control over the underlying cluster.
ECS should be a combination of Marathon's simplicity with control over the underlying cluster. I deploy a container and I don't want to worry about anything else after that.
For now, I'd seriously focus on HTTP-facing services, with simple health-check rules for scaling.
Disclosure: I work at Google on Google Container Engine