Ask HN: Are you using AWS ECS in production?
9 points by skyisblue 95 days ago | 9 comments
We're thinking of migrating to ECS and wondering what the state of it is currently.

Are there still issues with agents disconnecting?

Should we not bother and go straight to Kubernetes?

We are. Agent seems pretty solid. Biggest issue I've seen is when doing a new deploy, sometimes old tasks keep running.

Biggest gotcha: tasks restarting over and over because of bad load balancer config on my part (for instance, expecting a 200 status code when the health check endpoint returns a 302).
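To make the 302-vs-200 gotcha concrete, here is a minimal sketch of how a load balancer style status matcher behaves. The function name `status_matches` is made up for illustration, and the matcher syntax (single codes, comma-separated lists, and ranges such as "200-299") is assumed to follow what Application Load Balancer target groups accept; this is not the load balancer's actual code.

```python
def status_matches(status_code: int, matcher: str) -> bool:
    """Return True if status_code satisfies a matcher such as "200",
    "200,302" or "200-299" (hypothetical helper, for illustration only)."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            # Range form, e.g. "200-299"
            low, high = part.split("-", 1)
            if int(low) <= status_code <= int(high):
                return True
        elif part and int(part) == status_code:
            # Single code form, e.g. "200"
            return True
    return False


# An endpoint that redirects (302) never satisfies a matcher of "200",
# so the target is marked unhealthy and the task keeps getting replaced.
print(status_matches(302, "200"))      # False
print(status_matches(302, "200,302"))  # True
print(status_matches(204, "200-299"))  # True
```

Either widening the matcher or making the health check endpoint return a plain 200 should stop the restart loop.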

Some of what won me over:

* IAM role integration at both instance and task level

* ecs-cli can use docker-compose.yml (with minor revisions)

* easy use of spot fleets

* cron support for tasks

* easy to script control of clusters from your app with the AWS SDK
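As a rough sketch of the last point, scaling a service from application code can be a single SDK call. The wrapper name `scale_service` and the cluster and service names are hypothetical; `update_service` with `cluster`, `service` and `desiredCount` is the boto3 ECS client call this assumes. The client is passed in so the function can be exercised without AWS credentials.

```python
def scale_service(ecs_client, cluster: str, service: str, desired_count: int) -> dict:
    """Ask ECS to run `desired_count` copies of a service's task.

    `ecs_client` is expected to behave like boto3.client("ecs").
    """
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count,
    )


# Usage with the real SDK (needs boto3 and AWS credentials; names are made up):
#   import boto3
#   scale_service(boto3.client("ecs"), "my-cluster", "web", 4)
```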

I evaluated Kubernetes, and may give it another look soon, but ECS was pretty easy to get going.

I'm currently a developer advocate for ECS at AWS, so I'm pro ECS as you'd expect. But before I worked at AWS I used ECS in production (since the early beta).

At the time we ran a microservices deployment of ~15 services on ~20 hosts. ECS made orchestrating the services easy for a couple reasons:

Unlike with self-managed Kubernetes on AWS, we could have high availability with a simple cluster of just two machines. Running the Kubernetes control plane in a highly available setup requires a lot of work, and while tools like kops are helping with setup now, it's still a lot of extra administration. (See https://kubernetes.io/docs/admin/high-availability/) The advantage of ECS here is that you just start two or three instances in different availability zones, each running an agent, and that is all it takes to have high availability. You don't have to pay anything extra for the control plane resources, or worry about monitoring or maintaining it.

Also, AWS ECS integrates really well with all the other AWS services. For example, metrics from your services automatically get piped to CloudWatch, where you can set up an alarm that triggers a Lambda function, or publishes to an SNS topic that triggers a PagerDuty notification. Or you can use the metrics to make a CloudWatch Dashboard for a custom overview of your cluster. Logs likewise go to CloudWatch, where you can set up triggers that execute a Lambda function. You can give each service its own IAM role to control which resources (DynamoDB tables, S3 buckets, etc.) that specific service has access to. ECS integrates really well with Application Load Balancer, which allows you to easily set up a mixed architecture, where some traffic is routed to services that are running as containers under ECS, and other traffic is served by older applications running directly on hosts with no container.
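For a sense of what the metrics-to-alarm wiring can look like, here is a hedged sketch that builds the parameters for a CloudWatch alarm on a service's CPU usage. The helper name `cpu_alarm_params`, the alarm naming scheme, the threshold and the evaluation window are all assumptions chosen for illustration; the parameter keys are the ones boto3's `put_metric_alarm` takes, and `AWS/ECS` with the `ClusterName` and `ServiceName` dimensions is where service-level ECS metrics are published.

```python
def cpu_alarm_params(cluster: str, service: str, sns_topic_arn: str,
                     threshold_percent: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (illustrative defaults)."""
    return {
        "AlarmName": f"{cluster}-{service}-high-cpu",  # naming scheme is made up
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # alarm after 5 consecutive breaching datapoints
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic that notifies on-call
    }


# Usage with the real SDK (needs boto3 and AWS credentials):
#   import boto3
#   params = cpu_alarm_params("my-cluster", "web", "<your SNS topic ARN>")
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```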

If you are looking for more info as you evaluate whether or not AWS ECS is right for you, please check out this list of ECS resources, most of which are created by the developer community: https://github.com/nathanpeck/awesome-ecs

And feel free to reach out using the Twitter handle or email on my profile if you have any questions or feedback on ECS.

Thanks, the ECS resources look very handy, will check them out!

We've experienced agent crashes in the past but those seem to have been resolved now. Occasionally we will find a docker container from an old task which is still running, but about which ECS knows nothing. Definitely can make for an interesting troubleshooting adventure.
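A sketch of how one might hunt for such leftovers: compare the task ARN label on each local container against the tasks ECS reports for the instance. This assumes the agent labels its containers with `com.amazonaws.ecs.task-arn`; the function name `find_orphans` and the input shapes are made up, and gathering the inputs (from `docker inspect` and the ECS task listing) is left out.

```python
TASK_ARN_LABEL = "com.amazonaws.ecs.task-arn"  # label assumed to be set by the ECS agent


def find_orphans(container_labels: dict, known_task_arns: set) -> list:
    """Return IDs of containers whose task ARN is unknown to ECS.

    container_labels maps container ID -> dict of Docker labels.
    Containers without the ECS label were not started by ECS and are skipped.
    """
    orphans = []
    for container_id, labels in container_labels.items():
        task_arn = labels.get(TASK_ARN_LABEL)
        if task_arn is not None and task_arn not in known_task_arns:
            orphans.append(container_id)
    return sorted(orphans)


# Made-up example data:
containers = {
    "aaa111": {TASK_ARN_LABEL: "task/1"},
    "bbb222": {TASK_ARN_LABEL: "task/old"},  # ECS no longer tracks this task
    "ccc333": {},                            # started by hand, not by ECS
}
print(find_orphans(containers, {"task/1"}))  # ['bbb222']
```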

And it seems like maybe the ECS team is trying to move a little too fast recently. They released this blog post, which claims the run-task API supports several new override parameters, but the backend still doesn't actually do anything with them; it just silently ignores them.

https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-ec... https://github.com/boto/boto3/issues/1184

I think they were talking about the CLI, because that's what they link to from the blog post. They should be clearer, though.

Don't use it. It's crapware. We have 100+ hosts. Problems include the scheduler assigning tasks to nodes that report their ECS agent as crashed. Just use Kubernetes; it's going to be more stable and have more support. I wish execs didn't take AWS seriously when they promised features 6 months down the line.

I'd love to hear more about this problem and see if we can get to a root cause and help you resolve it, because it does not sound like standard ECS behavior. Please email peckn@amazon.com and I can connect you to the right people to figure out what is going on.

ECS never stops trying to schedule tasks, so you can DDoS and crash your entire cluster if one of your containers fails on startup.

The issue is still open on GitHub; you can easily blow through your entire IOPS budget and have CPU pegged at 95% iowait.
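One possible workaround is an external watchdog that notices the crash loop and steps in, for example by scaling the service down. Nothing in this thread says ECS does this for you. The sketch below only covers the detection part; the function name `is_crash_looping`, the 5-minute window and the limit of 5 stops are arbitrary assumptions, and collecting the stop timestamps is left out.

```python
def is_crash_looping(stop_times, now, window_seconds=300.0, max_stops=5):
    """Return True if at least `max_stops` task stops happened within the
    last `window_seconds` (timestamps in seconds; thresholds are arbitrary)."""
    recent = [t for t in stop_times if 0 <= now - t <= window_seconds]
    return len(recent) >= max_stops


# Made-up timestamps: five tasks died within about a minute.
stops = [0.0, 10.0, 20.0, 30.0, 40.0]
print(is_crash_looping(stops, now=60.0))    # True
print(is_crash_looping(stops, now=1000.0))  # False
```

A watchdog built on this could set the service's desired count to 0 until someone looks at the failing container.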
