Challenges of Deployment to ECS (convox.com)
80 points by mwarkentin on April 29, 2016 | hide | past | favorite | 37 comments

"Say we want to deploy a web service as four containers running off the “httpd” image....This is simple to ask for but deceptively hard to actually make happen."

...and yet, would take almost no work to set up using a non-dockerized workflow. I don't understand why so many people are putting themselves through this. It's becoming common to hear people from dinky little startups going down dark rabbit holes trying to build their infrastructure like Google -- but it's a total distraction!

If you're running a service with a small number of machines, don't do any of this. Architect a sensible multi-AZ deployment (i.e. a cluster of 2-3 machines, an ELB, a VPN, and a firewall/bastion server), spin up instances by hand, and upgrade things as needed. Create AMIs for your machine classes, and get yourself used to working with a sensible upgrade schedule. Doing this for a small number of machines (e.g. N <= 25) won't take an appreciable amount of your time.

Once you start to have more machines than that, you'll probably also have the resources to get someone who knows what they're doing to set up more "magical" automated management schemes. Don't bury yourself in unnecessary complexity just because it's the hot new tech of the moment.

I mostly agree with your basic point, but I disagree that AWS is a necessary part of the situation. If you aren't dynamically spinning VMs up/down depending on usage - that is, if your EC2 instances are a pretty fixed list of always-on machines - you're throwing money away, and you're spending more time and effort to make use of proprietary infrastructure you aren't using as intended.

If you have a 'static' number of servers for your app - just rent a bunch of VPSes. Use whatever automated setup tool works for you. I'm a big fan of Debian config & meta packages, but ansible, chef, hell even fucking shell scripts will work.

I agree in general. You want to keep everything as simple as possible. If you have a simple, small service, or don't deploy too frequently, traditional server management will be better.

Our own experience with ECS has been similarly negative. While this was 6 months ago, I am not aware of significant improvement.

In general, the whole thing feels rushed and duct-taped together. The networking model (inherited from Docker) doesn't play nicely with ELB.

- The built-in AWS tools for monitoring are not container aware.

- We've had multiple occurrences of the ECS daemon dying.

- Very little visibility into the progress of deploys. The API/console would report something as "running" when in fact it was still loading up.

If you watch their videos, they promise integration with Marathon, but if you look at the code, it's in "proof of concept" stage.

At this stage, GCE is significantly ahead of AWS. Out of the box, you get a top notch container story, logging and monitoring.

You also get a private registry. And all the complexities of authentication are wrapped up in a nice package.

And upgrading your cluster is easy too. Just one command and it rolls it out server-by-server, doing a full health check afterwards.
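The server-by-server rollout that one command performs can be sketched like this (illustrative Python only, not GKE's actual implementation; the `upgrade`/`healthy` hooks are hypothetical):

```python
def rolling_upgrade(nodes, upgrade, healthy):
    """Upgrade nodes one at a time; halt the rollout if any node
    fails its post-upgrade health check, leaving the rest untouched."""
    done = []
    for node in nodes:
        upgrade(node)                      # e.g. reimage/restart this node
        if not healthy(node):              # full health check before moving on
            raise RuntimeError(f"{node} failed health check; rollout halted")
        done.append(node)
    return done
```

The point of the one-at-a-time loop is that a bad image only ever takes out a single node before the rollout stops.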

k8s (not specific to GCE) also integrates with network drives, so attaching and maintaining persistent storage is really easy.

I've used ECS and it feels just so vanilla by comparison. AWS should have just hosted an existing cluster system rather than building one from scratch.

We too experienced the ECS daemon dying, and although AWS Support tried we never really got to the bottom of it.

ECS suffers from a lack of proper monitoring and has no satisfactory way to recover from certain types of failures, such as when the daemon dies.
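The usual band-aid for a dying agent is an external watchdog that restarts it, something ECS didn't do for us. A minimal illustrative sketch (the `is_alive`/`restart` hooks are hypothetical, not a real ECS API):

```python
def watchdog(is_alive, restart, max_restarts=3):
    """If the agent is dead, try restarting it up to max_restarts times.
    Returns True if the agent is alive afterwards."""
    attempts = 0
    while not is_alive() and attempts < max_restarts:
        restart()             # e.g. restart the agent process on the host
        attempts += 1
    return is_alive()
```

You'd run something like this from cron or a process supervisor on each instance; it recovers the common case but tells you nothing about why the agent died.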

We ended up switching to Kubernetes, still on AWS, and have been much much happier.

Thanks for sharing your experience but I have a hunch you didn't read the article carefully.

Overall ECS is great. It took hard work to get here but I am easily managing hundreds of production clusters. I am seeing excellent reliability with extremely low cost of ownership.

All this while continuing to benefit from the massive AWS ecosystem.

GCE certainly has advantages too.

I'm glad that we have such great tools and strong competition. This is yet another strong confirmation that the cloud model is correct and offering massive efficiency to all.

>> All of this works without requiring that we install or operate our own container scheduler system like Mesos, Kubernetes, Docker Swarm or Core OS Fleet.

You can use kubernetes without installing and operating it yourself on Google Cloud Platform too, it's called Google Container Engine but it's k8s under the hood.

My experience with ECS is very brief, as contrasted with several months working daily with k8s. My first impression of ECS was that it is cobbled together from a bunch of existing AWS services, and as such it requires you to get far more involved in the various APIs than kubernetes on GCP does.

Overall kubernetes feels more like a cohesive abstraction because that's what it is. ECS by comparison feels like a solution pieced together out of Amazon's existing proprietary parts because that's what it is. I'm sure they will be improving this.

It would be amazing if Amazon just used Kubernetes in the background, the way Google uses it for Container Engine. It seems like everyone knows Kubernetes did the best job at container orchestration, but everyone wants to keep their own tools (Docker, AWS, etc.)

The biggest opportunity AWS missed with ECS was completely hiding the complexity and concerns of managing EC2 instances. If you use ECS you have to deal with the complexity of both, and manage those resources yourself. The dream being sold by the container industry is that you don't really care about the machines your containers are running on. ElasticBeanstalk gets this right because it hides that concern.

ECS is akin to selling git on top of SVN. It doesn't really make sense.

Bryan Cantrill gave a talk about the craziness of running containers on VMs at https://www.youtube.com/watch?v=coFIEH3vXPw.

I talked about some of this in another post comparing ECS to Joyent's Triton.


Triton does an incredibly good job of hiding servers.

There is one big flaw with this approach, which is that the entire tool set needs to be rewritten for a container-only universe.

How do you debug containers without access to a host OS?

How do you isolate neighbors without a machine boundary somewhere?

Boxes, instances and VMs are never going away. It's simply a question of whose responsibility it is to run them.

Yeah, that starts taking you down the path to "serverless" - e.g. AWS Lambda. I think iron.io has a fully container-based approach here (I've only used their pre-baked containers though).

An application written purely in client-side code but that still requires a server-side component is not "server-less".

A "server-less" app is one that literally has no need for it. Just because you aren't in control of the server-provided resources the app depends on, doesn't make it "server-less".

Author here.

I'd love to compare more notes with everyone about deploying to ECS.

If you want to play with an ECS cluster, `convox install` is a free and open source tool that sets everything up in minutes. Little to no AWS knowledge required. https://convox.com/docs/overview/

I would love to hear what advantages ECS has over Kubernetes. I've found that Kubernetes solves all seven of those challenges (and more – service discovery is a challenge, too!) pretty handily, and is cloud provider independent on top of that (but can have provider-dependent tie-ins where desired).

1. Cluster set up on AWS/GCP is fairly automatic. On vanilla machines it's straightforward, but involved, to install.

2. Kubernetes relies on etcd, which uses Raft, to run. It does depend on one master node to provide an API server to control the instances, though.

3. Pods have health checks, and deployments will halt if new instances fail their health checks.

4. Deployments handle rolling updates within the cluster.

5. The scheduler can cordon off or drain nodes before destroying them, and new nodes can easily be added to the cluster.

6. fluentd is often run in a DaemonSet for this purpose.

7. kubectl makes deployment feel like playing with legos – interacting with a bunch of objects in a predictable and documented manner. In the case you laid out, for example, many rapid deployment version changes are handled gracefully by the deployment object, scaling down the two old versions and scaling up the new version.
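That last point can be sketched as a toy reconciliation loop (illustrative Python only, not how Kubernetes is actually implemented): each step surges one replica of the desired version and scales down one replica of an older version, so rapid version changes just redirect the loop toward the newest target.

```python
def reconcile(replicas, desired_version, total):
    """One reconciliation step: surge one replica of the desired version
    (up to `total`), then scale down one replica of any other version."""
    if replicas.get(desired_version, 0) < total:
        replicas[desired_version] = replicas.get(desired_version, 0) + 1
    for version in list(replicas):
        if version != desired_version and replicas[version] > 0:
            replicas[version] -= 1
            break
    return replicas
```

Running this repeatedly from `{"v1": 2, "v2": 2}` toward `"v3"` converges to four v3 replicas with both old versions drained, regardless of how many targets were requested in between.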

The big advantages I see:

AWS ecosystem. Everyone is already running stuff here, or wants to.

Particularly data. If I have important stuff in RDS or Redshift I want to run my apps in a VPC next to that.

Managed service. I don't want to be responsible for etcd. I manage 150 clusters and counting. I couldn't do this without ECS and DynamoDB and their excellent SLAs.

I am aware GKE does this. When I use Kubernetes that's what I'll use.

We had so much trouble deploying on ECS. Everything from its unhelpful debug/troubleshooting tools to its unhelpful log/event messages (yes, thanks for letting me know that you tried running a task 10 times and it failed, or that you had I/O issues pulling from the AWS Docker Registry) was a roadblock, and when we finally got it working it still needed a lot of tuning. Even getting images from the AWS Docker Registry to work on ECS was a piece of work, requiring a custom IAM policy configuration which they hid on the marketing page of the AWS Docker Registry FAQ. Did I mention the documentation was bad?

In the end, we just opted for GCE Kubernetes/GKE, a really well-developed and thought-out product. Aside from it being open source and portable, we are still kicking ourselves for having devoted so much time to ECS (and EC2 is more expensive than GCE, too!). We only spent a few hours getting GKE set up, compared to the few all-nighters we spent tuning AWS.

Plus, when container pods died, Kubernetes would automatically spin up a new pod (this only sometimes happened on ECS), and for bonus points, everything from standard output/error in a container went to Google's Stackdriver logging service without configuration. Google's docs were actually quite good; much better than the unhelpful AWS docs.

ECS is an order of magnitude more complex than almost anything I have experience with (Marathon excepted).

I found ElasticBeanstalk to be MUCH simpler for Docker-based deployments. It uses ECS behind the scenes for multi-container instances.

You essentially deploy a zip with a Dockerrun.json metadata file and it handles everything for you, including rolling deploys based on your auto scaling groups.
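For reference, a single-container Dockerrun file is roughly this shape (illustrative and from memory; field names and the exact filename should be checked against the ElasticBeanstalk docs):

```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "httpd:2.4",
    "Update": "true"
  },
  "Ports": [
    { "ContainerPort": "80" }
  ]
}
```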

I never found the ECS definitions of Task and the rest really intuitive to work with.

That's how I'm currently running our apps. There are still loads of problems and bugs on the AWS side, but that's to be expected. However, it's far from ideal in resource utilization and automation (of the whole infrastructure, not just individual apps).

Part of my goal, and probably the article author's too, was to increase resource utilization. My servers typically sit at 1%-10% CPU and we have a lot of apps. With a minimum of 2 decent EC2 instances per app (c4.large or better), that comes to a lot of servers with a lot of unused capacity. I can see huge room for improvement and increased automation for the whole cluster (including automated deployment of supporting services) with some Docker-based clustering solution (still need to evaluate).

Given my horrific experiences with AWS, and Beanstalk in particular, I would never run such a cluster on ECS, however, due to the extremely poor quality of AWS services outside of EC2 and its closed-source nature.

I didn't realize ElasticBeanstalk delegated to ECS. Thanks for sharing.

ElasticBeanstalk does have some serious simplicity advantages for the type of apps it was designed for.

I'll definitely second your opinion that raw ECS can be very challenging. It's begging for some good tools to help you manage it.

I believe ElasticBeanstalk only uses ECS if you're doing multi-container deployments. If you're just deploying a single Docker image, I believe it's still using the Docker AMI on an EC2 instance.

I mentioned that in my original comment. Yeah.

What did you find complex about Marathon?

A few things:

1. Mesos dependency

2. Security (firewall) settings

3. DNS settings (you need the FQDN to resolve from each part of the system)

4. Dynamic load balancing (ELB bridge, nginx bridge)

5. No control over the underlying cluster, so to actually scale your app, you will need to scale your cluster first.

Those are just off the top of my head. I am using Marathon every day but I still carry a lot of pain from the first days/weeks of using it.

Can you expand on 5.? I'm not following what you mean by 'need to scale cluster first to scale app'.


Lets say I have a cluster of 2 machines.

I deploy my Docker container using Marathon, and the constraint is that it's unique per machine.

Now, I want to scale my app to 30 containers.

Without Marathon controlling Amazon APIs and actually scaling my underlying cluster to 30 machines first, the scale will never happen.

This creates a disconnect between scaling strategies.
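In other words, with a unique-per-machine constraint the schedulable count is capped by cluster size. A trivial sketch of the arithmetic (hypothetical helper, not Marathon's API):

```python
def schedulable(desired, machines, unique_per_machine=True):
    """Containers Marathon can actually place: a unique-per-machine
    constraint caps placement at the number of machines in the cluster."""
    return min(desired, machines) if unique_per_machine else desired
```

With 2 machines, asking for 30 containers only ever places 2; the other 28 wait until something outside Marathon grows the cluster.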

I've generally enjoyed my time with ECS and I think it's pretty likely it will be a strong-contender/the-winner of the docker wars, because net-net if you're starting from scratch without experience and a skilled ops team it's the easiest to stand up.

But I definitely agree that at this stage there's some oddly rough edges. I'm glad to hear logging is a bit better. But I would love if that were just solved by default. Similarly build/deploy. I think a teeny bit more standardization and UI would make that a ton easier.

Overall though I'm still bullish. I put together a sample terraform config that is the skeleton for a basic rails app in ECS https://github.com/jdwyah/rails-docker-ecs-datadog-traceview...

I found ECS to be quite good honestly (recent experience).

I have some multi-server deploys running with load balancers, rolling restarts, SSL, private registries and more.

In fact, I like it so much that I created an end to end course that teaches you all about using roughly half a dozen AWS resources to deploy and scale a web application with ECS.

Details can be found here (it's an online course taken at your own pace and costs $20):


The example application is a multi-service rails app that uses postgres, redis and sidekiq but you can follow along without needing any rails experience.

Good course, I'm about half way through, though from the comments in this thread I'm wondering if I should just give GKE a try.

Thanks, I'm glad you like it so far. Yeah, there's nothing wrong with trying new things if you want to experiment with Google's platform.

Just in case that came across wrong, I wasn't taking a dig at you. I'm enjoying and will use the knowledge from your course even if we opt to go with GKE. I started with ECS just because I have a lot more AWS experience.

It's ok, I knew your intentions.

>All of this works without requiring that we install or operate our own container scheduler system like Mesos, Kubernetes, Docker Swarm or Core OS Fleet.


>We need to bring and configure our own instances, load balancers, logging, monitoring and Docker registry. We probably also want some tools to build Docker images, create Task Definitions, and to create and update Tasks Services.

Doesn't sound like much of a win then. That sounds annoying. I just set up mesos/dcos on AWS, and it sounds like the same amount of effort, only now I've got a platform-independent solution with great UI and cli, along with load balancers and routing. Is ECS worth the effort?

The first time I heard/read about ECS, I thought it was going to be like Lambda for containers.

Amazon took managing instances and managing containers, selected the hardest-to-comprehend parts, and wrapped them in a cumbersome JSON definition.
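For a sense of what that JSON looks like, here is a minimal task definition (illustrative sketch only; the ECS docs have the authoritative schema):

```json
{
  "family": "web",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "httpd:2.4",
      "cpu": 128,
      "memory": 256,
      "essential": true,
      "portMappings": [
        { "containerPort": 80, "hostPort": 0 }
      ]
    }
  ]
}
```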

Marathon takes a much simpler approach, but it lacks the control over the underlying cluster.

ECS should be a combination of Marathon's approach with control over the underlying cluster. I deploy a container and I don't want to worry about anything else after that.

For now, I'd seriously focus on HTTP-facing services, with simple health check rules for scaling.

Don't rule out the Azure Container Service, which is based on the open source DC/OS[1] and is, arguably, the most robust and proven container system.

[1] http://dcos.io

Can you say more? Many, many businesses bet their workloads on Google Container Engine, and it has been GA for nearly a year with a 99.95% SLA. Is there something we could do to better demonstrate that it's robust and proven?

Disclosure: I work at Google on Google Container Engine

Any Tutum users available to weigh in?
