
Ask HN: Are you using AWS ECS in production? - skyisblue
We're thinking of migrating to ECS and wondering what the state of it is currently.

Are there still issues with agents disconnecting?

Should we not bother and go straight to Kubernetes?
======
bdcravens
We are. The agent seems pretty solid. The biggest issue I've seen is that when
doing a new deploy, old tasks sometimes keep running.

Biggest gotcha: tasks restarting over and over because of a bad load balancer
config on my part (for instance, expecting a 200 status code when the health
check endpoint returns a 302).
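A minimal sketch of that fix, assuming a boto3-style target group config (the path and values here are illustrative; the field names follow the elbv2 `modify_target_group` API, where `Matcher.HttpCode` is the part that decides which status codes count as healthy):

```python
# Sketch: ALB target group health check that accepts a 302 redirect
# in addition to a 200, so a redirecting endpoint isn't marked unhealthy.

def health_check_params(path="/health", success_codes="200,302"):
    """Build the health-check portion of a target group config.

    success_codes: comma-separated HTTP codes the ALB treats as
    healthy -- include 302 if the endpoint redirects.
    """
    return {
        "HealthCheckPath": path,
        "HealthCheckIntervalSeconds": 30,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 3,
        # Matcher is where the 200-vs-302 gotcha lives:
        "Matcher": {"HttpCode": success_codes},
    }

params = health_check_params()
# Real call would be: elbv2.modify_target_group(TargetGroupArn=..., **params)
```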

Some of what won me over:

* IAM role integration at both instance and task level

* ecs-cli can use docker-compose.yml (with minor revision)

* easy use of spot fleets

* cron support for tasks

* easy to script in control of clusters into your app with AWS SDK

I evaluated Kubernetes, and may give it another look soon, but ECS was pretty
easy to get going.
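The "script control of clusters into your app with the AWS SDK" point can be sketched with boto3; the helper below only assembles the `ecs.update_service` arguments (the cluster and service names are made up):

```python
# Sketch: scaling an ECS service from application code via the AWS SDK.
# build_scale_request just assembles the boto3 ecs.update_service kwargs;
# the actual API call is shown in the comment at the bottom.

def build_scale_request(cluster, service, desired_count):
    """Build update_service arguments to set a service's task count."""
    if desired_count < 0:
        raise ValueError("desired_count must be >= 0")
    return {
        "cluster": cluster,
        "service": service,
        "desiredCount": desired_count,
    }

req = build_scale_request("web-cluster", "api", 4)
# Real call: boto3.client("ecs").update_service(**req)
```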

------
NathanKP
I'm currently a developer advocate for ECS at AWS, so I'm pro ECS as you'd
expect. But before I worked at AWS I used ECS in production (since the early
beta).

At the time we ran a microservices deployment of ~15 services on ~20 hosts.
ECS made orchestrating the services easy for a couple reasons:

Unlike with self-managed Kubernetes on AWS, we could have high availability
with a simple cluster of just two machines. Running the Kubernetes control
plane with high availability requires a lot of setup, and while tools like
kops are helping with that now, it's still a lot of extra administration. (See
[https://kubernetes.io/docs/admin/high-availability/](https://kubernetes.io/docs/admin/high-availability/))
The advantage of ECS here is that you just start two or three instances
running the agent in different availability zones, and that is all it takes
to have high availability. You don't have to pay anything extra for the
control plane resources, or worry about monitoring or maintaining it.

Also, AWS ECS integrates really well with all the other AWS services. For
example, metrics from your services automatically get piped to CloudWatch,
where you can set up an alarm that triggers a Lambda function, or publishes
to an SNS topic that triggers a PagerDuty notification. Or you can use the
metrics to build a CloudWatch dashboard that gives a custom overview of your
cluster. Logs likewise go to CloudWatch, where you can set up triggers that
execute a Lambda function. You can give each service its own IAM role to
control which resources (DynamoDB tables, S3 buckets, etc.) that specific
service has access to. ECS integrates really well with the Application Load
Balancer, which lets you easily set up a mixed architecture where some
traffic is routed to services running as containers under ECS, and other
traffic is served by older applications running directly on hosts with no
containers.
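A minimal sketch of the per-service IAM role point: the task definition carries a `taskRoleArn`, so containers in that task get only that role's permissions. The payload below follows the shape of `register_task_definition`; the names, image, and ARN are made up:

```python
# Sketch: a minimal ECS task definition payload with a per-task IAM role.
# taskRoleArn scopes what AWS resources the containers in this task can
# touch (e.g. one DynamoDB table, one S3 bucket). All values are illustrative.

def task_definition(family, image, task_role_arn):
    """Assemble a minimal register_task_definition payload."""
    return {
        "family": family,
        "taskRoleArn": task_role_arn,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": 256,
                "essential": True,
            }
        ],
    }

td = task_definition(
    "orders-service",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders:latest",
    "arn:aws:iam::123456789012:role/orders-service-task-role",
)
# Real call: boto3.client("ecs").register_task_definition(**td)
```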

If you are looking for more info as you evaluate whether AWS ECS is right for
you, please check out this list of ECS resources, most of which were created
by the developer community:
[https://github.com/nathanpeck/awesome-ecs](https://github.com/nathanpeck/awesome-ecs)

And feel free to reach out using the Twitter handle or email on my profile if
you have any questions or feedback on ECS.

~~~
skyisblue
Thanks, the ECS resources look very handy, will check them out!

------
mmontagna9
We've experienced agent crashes in the past but those seem to have been
resolved now. Occasionally we will find a docker container from an old task
which is still running, but about which ECS knows nothing. Definitely can make
for an interesting troubleshooting adventure.

And it seems like maybe the ECS team is trying to move a little too fast
recently. They released a blog post claiming the run-task API supports
several new override parameters, but the backend still doesn't actually do
anything with them; it just silently ignores them.

[https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-ecs-runtask-and-starttask-apis-now-support-additional-override-parameters/](https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-ecs-runtask-and-starttask-apis-now-support-additional-override-parameters/)
[https://github.com/boto/boto3/issues/1184](https://github.com/boto/boto3/issues/1184)
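For context, the overrides in question live under `overrides.containerOverrides` in a `run_task` call. A sketch of what such a call would pass (container name and values are hypothetical; per the linked boto3 issue, the API accepted these fields at the time but the backend ignored them):

```python
# Sketch: building the containerOverrides block for ecs.run_task.
# According to the linked issue, cpu/memory overrides were accepted
# by the API but silently ignored by the backend at the time.

def run_task_overrides(container, cpu=None, memory=None):
    """Build the overrides argument for run_task."""
    override = {"name": container}
    if cpu is not None:
        override["cpu"] = cpu          # one of the new override params
    if memory is not None:
        override["memory"] = memory    # likewise
    return {"containerOverrides": [override]}

overrides = run_task_overrides("worker", cpu=512, memory=1024)
# Real call: ecs.run_task(cluster="...", taskDefinition="...",
#                         overrides=overrides)
```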

~~~
cpufry
I think they were talking about the CLI, since that's what they link to from
the blog post. They should be clearer, though.

------
Sevii
Don't use it. It's crapware. We have 100+ hosts. Problems include the
scheduler assigning tasks to nodes whose ECS agent reports itself as crashed.
Just use Kubernetes; it's going to be more stable and have more support. I
wish execs didn't take AWS seriously when they promise features 6 months down
the line.

~~~
Sevii
ECS never stops trying to schedule tasks. So you can DDoS and crash your
entire cluster if one of your containers fails on startup.

~~~
Sevii
The issue is still open on GitHub; you can easily blow through your entire
IOPS budget and have CPU pegged at 95% iowait.

