The Fargate Illusion (leebriggs.co.uk)
107 points by ingve 6 days ago | 62 comments

The author is purposefully making things more difficult.

- using Terraform instead of cloud formation

- not using the default "create a VPC" wizard, like anyone new would.

- he said that if he wanted to use lambda functions he would still have to set up networking. It depends on what he was doing. If he used a regular database, yes, he would have had to set up a subnet for his RDS cluster, but if he used DynamoDB, he wouldn't need to do that. Even then, the default VPC wizard that most people would have used would have been good enough. Either way, that has nothing to do with K8s vs Fargate.

Note: even when you run a lambda "inside your VPC", you don't get more security. The lambda is never running inside your VPC. It is running inside an AWS-owned VPC and connecting to yours via a network interface (ENI).

- If he were to use lambda, he would not have had to worry half as much. He wouldn't have needed a load balancer. Also, he was using a third-party SSL certificate when he could have had a free one, automatically managed by AWS, via AWS Certificate Manager. Using API Gateway with a lambda proxy would have done the same thing. And no, there is no "lock-in" from using lambda and a lambda-proxy interface: you write the same C#/WebAPI, Node/Express, or Python/Django code you always would and add the proxy on top of it.

I don't support the "CF would have been easier than TF" argument.

The CF console is opaque, slow and, worst of all, lazily evaluates. This is exacerbated when doing anything Docker-service related. If you make a mistake, it can be three hours before an update timeout is triggered, then another three hours for a rollback to complete.

The other _really_ annoying thing is that TF gets the features first. For example, it's not possible with CF to use SSM variables in environment vars. But as TF uses the AWS API directly, it can.

I dislike TF because I've not used it much, and it gives me horror flashbacks of Puppet.

However, what you say about lambda is basically spot on. They are much simpler and easier; "true" serverless if you will. Or more likely /cgi-bin/lambda, if you are old enough.

> The CF console is opaque, slow and, worst of all, lazily evaluates. This is exacerbated when doing anything Docker-service related. If you make a mistake, it can be three hours before an update timeout is triggered, then another three hours for a rollback to complete.

How is it opaque? It’s a wizard that lets you upload a template from a file or S3. Most of the time though it’s part of my CodePipeline anyway. I don’t use the console in day to day use.

For a quick and dirty I use the CLI.

> The other _really_ annoying thing is that TF gets the features first. For example, it's not possible with CF to use SSM variables in environment vars. But as TF uses the AWS API directly, it can.

Yes you can....


But there is one benefit to CloudFormation if you have the business support plan - AWS’s chat support is excellent.

CF regularly lacks features that TF has, but the list of features is constantly changing as Amazon release new features and CF eventually catches up.

The one that got me this week was that CF cannot define global DynamoDB tables; you have to find an alternative way to turn them on.

Or just do a quick Google search for a custom resource....


Ignoring the obnoxiousness of “just google it” replies, I would consider having to write a custom lambda function to enable global tables to be well within the realm of “CF doesn’t support it”.

Our experience also shows that custom resources can go into a 3 hour timeout cycle if you don’t handle error cases in your lambda perfectly, making them a sub-par development experience.

Well, if you had an issue with Terraform wouldn't you first look to see if there were already a module for it?

But in the case of "global" tables, they are by definition cross region and CloudFormation stacks are per region.

Looking at the code from the link...

        for region in event['ResourceProperties']['ReplicationGroupList']:
            replication_group.append({ 'RegionName': region})
It is doing stuff across regions.
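To flesh out what that handler is doing, here is a hedged sketch of how the rest of such a custom resource might build its API call. The `GlobalTableName` property name is an assumption on my part; `create_global_table` and its `ReplicationGroup` parameter are the real DynamoDB API shape:

```python
def global_table_params(event):
    """Build CreateGlobalTable parameters from a CloudFormation
    custom-resource event (property names assumed from the snippet above)."""
    props = event["ResourceProperties"]
    return {
        "GlobalTableName": props["GlobalTableName"],  # assumed property name
        "ReplicationGroup": [
            {"RegionName": region} for region in props["ReplicationGroupList"]
        ],
    }

# Inside the Lambda, the handler would then call something like:
#   boto3.client("dynamodb").create_global_table(**global_table_params(event))
```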

Yes, I can resolve SSM variables into plain text; however, key rotation then requires a re-deploy (unless I'm missing something here).

However, the new feature where you just put the SSM parameter name in the environment variable and it decrypts at execution is not supported in CF yet.

It was the same with ECS/Fargate/Batch task creation for a long time.

As for the CF console, it's the same data in the CLI or on the web; the data comes from the same place.

1) It lazily evaluates, so it gets 90% of the way through a deploy only to find that there is a spelling mistake. That is inexcusable.

2) Yes, you can download the template via S3, but if you have a circular dependency, it won't tell you the line numbers. It won't even tell you which references are causing it to be over-constrained.

3) It won't tell you the order in which your stuff will come up.

For automatic key rotation you use the new Secrets Manager.

1) There are CloudFormation linters.

2) It will tell you the resource name.

3) It creates a dependency graph. It only knows when things come up based on when the underlying API call returns. How else would it know until execution time?

> key rotation you use the new Secrets Manager.

True, but that gets very expensive very quickly. We keep all our config for each environment in SSM pretty cheaply. (We have >1000 config items, most of which aren't encrypted and don't need to be.)

We basically treat SSM like HashiCorp's Vault.

1) We have linters in the CI pipeline; they help but are not perfect.

2) True, but not where the references are; that's the killer bit. If you have more than two people working on a script, and they each put in a !Ref or !GetAtt, it's exceptionally difficult to debug.

3) once you have a dependency graph, you know the order in which things come up. You need to generate the graph before you start calling APIs.

For example, say you have a property that you've spelled wrong, isn't the right data type, or isn't valid for that type of config. All of those could be evaluated when the dependency graph is generated. However, they are not; they are deferred to when the API is called.

This gets very tedious if you are using anything ECS/Fargate/RDS-related, as the debug loop can be >3 hours. This can be mitigated if you use uniquely named items, allowing multiple stacks.

Parameter Store is a bad choice for app configuration. Yeah, I use it. But I wouldn't trust it as an organization standard. There are unknown, unpublished, unchangeable service limits where it will throttle you.

Even when using CF, I have to aggressively use DependsOn to keep multiple calls to the Parameter Store API from running in parallel. We gave up, used DynamoDB for non-sensitive configuration, and created a custom resource to create config values.
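A client-side mitigation for those throttles, sketched under the assumption that you're wrapping your own SSM calls, is capped exponential backoff with full jitter (the strategy AWS generally recommends for throttled API calls); boto3's built-in retry configuration is another option:

```python
import random

def backoff_delays(max_attempts=8, base=0.1, cap=20.0):
    """Yield sleep durations for retrying a throttled API call:
    exponential growth, capped, with full jitter to avoid thundering herds."""
    for attempt in range(max_attempts):
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))

# Usage sketch (the boto3 call and exception handling are illustrative only):
#   for delay in backoff_delays():
#       try:
#           return ssm.get_parameter(Name=name, WithDecryption=True)
#       except botocore.exceptions.ClientError:
#           time.sleep(delay)
```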

Yeah but the Secrets Manager at $0.25/month per secret is expensive.

There are linter plug-ins for Visual Studio Code and other editors. There is also the new CDK, which is supposed to be able to do checks based on the real state of your environment, like whether a specified subnet exists. I haven't used it yet.

> For example, say you have a property that you've spelled wrong, isn't the right data type, or isn't valid for that type of config. All of those could be evaluated when the dependency graph is generated. However, they are not; they are deferred to when the API is called.

Some things CF could be more intelligent about knowing, but for the most part, each service does its own validation when you call the underlying API.

Not related to CF, but for instance the CloudWatch GetMetricData API has what seems like dozens of generic validations. The Parameter Store only allows certain formats for the keys. How would CF know that you specified an invalid key? Can you imagine how much slower CF would be if it had to call some kind of pre-validate API on each resource before it created it? To get an idea, think of how slow the new drift detection feature is, and it is only doing read operations.

In all my time working with AWS, the only thing that has been consistently horrible, has been CloudFormation. I honestly have no idea why anyone would ever choose it over Terraform, or Ansible even.

Because AWS manages state rather than your CI server. Ansible is not really comparable.

If terraform saves its state to an S3 bucket it is really fairly similar.
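For reference, the remote state setup being described is only a few lines of Terraform config; the bucket and table names here are hypothetical, and the `dynamodb_table` line (which enables state locking) is optional:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-team-tf-state"    # hypothetical bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "tf-state-locks"      # optional: locking, akin to CF's stack lock
  }
}
```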

this is spot on. drop terraform and go with cloudformation. initially set it up through the console to learn what needs to be done, and after that write the template, or even better, find one someone has already written.

pro-level shortcut: set it up manually and use CloudFormer.

+1 on the comments regarding wiring and lambda in the non-VPC vs VPC context.

also, if you want K8s, there is this thing called EKS.

> even better find one someone has already written.

Alas, that never really works with multi-account setups. It's sometimes useful for seeing which magic line one is missing, but the level of incorrect boilerplate out there on GitHub is staggering.

I have multi-account CodePipeline/CloudFormation setups for deploying lambdas and related resources. Once I got the hang of it (with a lot of help from AWS support, admittedly), it's not that bad.

I probably could have figured it out, but why bother when we are paying for the business support plan?

Funny, I was just talking to an experimental quantum physicist today, and one of the other people there asked him, "Doesn't the complexity just make your head spin?" He said, "No, once you get the hang of it and you are dealing with it every day, it's not that bad."

Oh, I completely agree. I felt completely incompetent when it came to AWS 18 months ago. But now, with a lot of studying for 5 certifications (just as an organized company-paid study plan), a lot of late nights beating my head on my desk, some greenfield initiatives, and abuse of our business support plan with AWS's excellent live chat support, I'm pretty comfortable with most of the non-obscure parts of AWS outside of the Docker and Big Data/ML parts. I'll be working on those over the next year.

I personally “grew up” while AWS was born and grew and have been slowly exposed to most of the services as they were released / enhanced.

I agree that it can be confusing as f for a newcomer. This is definitely an opportunity for training courses/labs/MOOCs to figure out how to make the learning curve less steep.

hah. not github. most services have cloudformation snippets in the aws docs. they mostly work.

also, compared to terraform, cloudformation is state-of-the-art when it comes to actually bringing shit up

> also, if you want K8s, there is this thing called EKS.

Which is absurdly overpriced. $1,728/yr just to run an empty cluster.

$1,700/year? sounds like a steal to me. how much are you paying a human for setting up and operating a cluster?

Nothing. There is no overhead charge for Fargate.

Some so-called geniuses always like showing off and building k8s clusters by themselves, instead of using managed services.

I have never worked at a company that didn't manage their own VPCs, where having AWS generate the VPC for you would be an option. Hell, at my current place I can't even create security groups or IAM roles or policies.

I think all companies that are serious about developers learning should allow them to have a separate dev account with a pretty long leash to experiment.

Nice breakdown from a personal perspective, but I see the author struggling against full cloud-native. Terraform will never be good for leading-edge AWS technologies. If you need something more advanced than Cloudformation to define your infrastructure as code, try Troposphere or CDK. I think we're past the point where Terraform can exhibit the "cloud arbitrage" value that was previously sold to us.

Working through permissions in AWS does suck, and the transparency problem in why a container fails to start in Fargate is especially frustrating.

Serverless and devops in general isn't about putting ops folks out of a job. It's about leveling the playing field on both sides of the coin so that someone who focuses on application development and someone who focuses on networking and infrastructure can collaborate easier, with more solid contracts and interfaces.

So long as the more adversarial perspective persists, we'll see a lot more digging in of the heels around kubernetes. As usual, I implore my friends in ops not to rely on the inherent bus factor that the mastery of kubernetes requires for job security.

I'm a developer (mostly) and I'm brand new to serverless technology, cloud infrastructure, etc. I've never set anything up in AWS before, except for a few EC2 instances here and there. I've never used Cloudformation, Fargate, or anything like it.

I was recently asked to set up a recurring Fargate task with a container of an app I developed. Despite knowing nothing about the ecosystem, I was able to get it up and running with Terraform in a few days, with pretty much no headaches, friction, or confusing issues. I didn't touch the AWS console even once, other than to verify that it appeared to be set up correctly after running "terraform apply".

I considered using Cloudformation, but it just seemed kind of messy and confusing to me. It also felt very visually noisy. (And YAML wasn't the issue; I read and write YAML very often, and actually really like YAML and use it in all of my personal projects.)

Despite going into it totally blind, by contrast, I found Terraform very intuitive and sensible within the first few minutes of using it. It felt much more structured and organized, and just generally simpler to read and write. I also liked that it was provider-agnostic. And the "terraform plan" feature was super helpful.

I "cheated" a little bit by using these Terraform templates as a base: https://github.com/turnerlabs/terraform-ecs-fargate-schedule.... I didn't have to change too much to get things working. I'm sure it would've taken longer if I wrote the whole thing from scratch.

I guess I'm asking: am I wasting my time investing in learning things like Terraform? Every reply in this thread is either recommending AWS-native alternatives to Terraform, criticizing Terraform, or both. For some reason I thought it was generally recommended over Cloudformation due to the lack of lock-in.

If terraform gets the job done and seems easier, then that is a good reason to use it. Especially as it seems that there are more boilerplate examples for terraform out there.

But no lock-in is not a reason at all. There is effectively no amount of terraform config that will plug and play into a different cloud.

It seems that it was pitched that way. But if you write anything in terraform for AWS you will have to fully rewrite it for GCP or DO. But at least you will be used to the syntax and conventions.

That’s the problem. Of course AWS will have more boilerplate examples with CF. CodeStar is a whole library of CF based deployments. When you export a definition from a lambda you get CF.

If you ever work for a company that uses AWS, they probably have a business support plan where you can get instant and excellent customer support on CF.

Of course, if you use Terraform you likely wouldn’t need any support in the first place.

Knowing Terraform is going to be a bit more helpful if you ever work with something other than AWS too.

Right, because no one ever has issues with something as complicated as infrastructure setup.

All the provisioners are different. If you have to learn a new cloud's infrastructure, learning the IaC configuration is the least of your troubles.

Thanks, that makes sense. I think consistent syntax and data structures are definitely still a big benefit, even if it's not actually semantically vendor-agnostic. I think I'll probably keep using Terraform.

> If you need something more advanced than Cloudformation to define your infrastructure as code, try Troposphere or CDK

The author has already hit a use case that is not currently supported by CloudFormation (creating a SecureString SSM parameter); that's something that Troposphere or the CDK won't help you with (unless you use them to create the resource via the API rather than CloudFormation).

Actually, if the author had used CloudFormation, they could have a password-type (NoEcho) Input, and then they wouldn't need to deal with SSM at all. Much less hassle, and just as (in)secure.

To do it properly, the author could have created an AWS KMS key, encrypted the password with that key, and then stored the encrypted password in source control. The password then goes in as a regular, plain-text parameter, and the container decrypts it thanks to having the proper IAM permissions for the KMS key. It's slightly more complicated, but you get 1) the ability to store the (encrypted) password in source control and 2) the ability to let anyone work with the stack without giving them the plaintext password.
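The only fiddly part of that flow is that KMS returns a binary `CiphertextBlob`, which needs a text encoding before it can live in source control. A minimal sketch; the boto3 calls in the comments reflect the real KMS API, but the key alias is hypothetical:

```python
import base64

def to_committed_text(ciphertext_blob: bytes) -> str:
    """Encode a KMS CiphertextBlob as base64 text for source control."""
    return base64.b64encode(ciphertext_blob).decode("ascii")

def from_committed_text(text: str) -> bytes:
    """Decode committed text back to the raw blob for kms.decrypt()."""
    return base64.b64decode(text)

# With boto3 (not run here), the flow described above is roughly:
#   kms = boto3.client("kms")
#   blob = kms.encrypt(KeyId="alias/my-app", Plaintext=b"s3cret")["CiphertextBlob"]
#   committed = to_committed_text(blob)  # this string goes into source control
#   plain = kms.decrypt(CiphertextBlob=from_committed_text(committed))["Plaintext"]
```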

If you do want to stick with SSM for whatever reason, you can also use SSM parameters as inputs to CloudFormation, so you can use a custom resource OR an out-of-band process to set the password up. It wouldn't be great, but then it's not the right tool for the job in the first place.

He was willing to use external Terraform modules; why not a CF custom resource? A quick Google search found this:


Yep, that's what I use as well. It works, but I don't really love that I have to create an IAM role and a Lambda function just to create an encrypted secret.

Pre-built custom resources (like the one linked) are not all that common. The fact that one exists for this particular situation I would ascribe more to luck than to a generous amount of public material.

Not being able to create a secure string parameter was the first problem I ran into with CloudFormation. That’s how I happened to know that a custom resource for it existed.

But, creating a custom resource is relatively easy. I’ve had to create a few for things that are really “custom” to our environment.

I used these as templates:


One issue that really seemed like an oversight is that you can't add an event subscription to an existing S3 bucket. I had to write a custom resource to do it.

> Not being able to create a secure string parameter was the first problem I ran into with CloudFormation.

How about 'NoEcho' type of CFN parameters?

What do you mean?

CloudFormation supports the "NoEcho" option specifically to allow password-type parameters, which are not inspectable. How is that not a secure string parameter?
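For anyone following along, a NoEcho parameter is a one-liner in the template; the names and constraints here are illustrative:

```yaml
Parameters:
  DbPassword:
    Type: String
    NoEcho: true        # masked in the console, describe-stacks output, etc.
    MinLength: 12
# Reference it elsewhere in the template with !Ref DbPassword
```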

Or you can just use GKE and spend the rest of your day drinking at a resort of your choice. Fargate and ECS are unnecessarily complicated for most people’s workflows and use cases.

The author is on point with regards to the overall complexity of the setup.

I wanted to add that we switched from ECS without Fargate (managing our own EC2 instances) to ECS with Fargate, and it has been a pure improvement. It's simpler and easier to manage with fewer moving parts. I would recommend anyone currently on ECS give Fargate a try.

I put my intermittent jobs on Fargate and long-running ones on ECS back when the price difference was there; now I wish it was all on Fargate. You have to keep the agent up on the EC2 ECS hosts, etc. Every 6 months I need to tweak the EC2 side, but they manage Fargate in a way that requires little maintenance.

But can you specify spot instance pricing for Fargate containers? I don't think you could last I checked. Spot instances are often 0.25x the cost of normal EC2 instances, so ECS with Spot launch configs is super-cheap.

Fargate offers no variable pricing to my knowledge. No spot instances, no reserved. You pay per CPU and memory unit per hour, that’s it.

ECS/Fargate and Kubernetes are different, for sure, but it seems like the author is conflating the one-time setup costs of Fargate with the continual maintenance costs of running Kubernetes on VMs which you maintain yourself, which doesn't seem like a fair comparison to me.

He acknowledges your point in the article but says that most people don't have to administer their own cluster VMs - GKE makes it easiest but he also mentions a smooth experience with DigitalOcean's managed Kubernetes, for example.

Fargate's purpose was never to magically make container orchestration go away, though - it's just a managed compute layer you plug into via the other orchestration frameworks (ECS or EKS). Really the apples-to-apples comparison should be DigitalOcean's managed Kubernetes with EKS (and there are a lot of valid points to raise there, to be sure).

The real value-add of Fargate is pay-for-what-you-use: when your task queue is idle, your bill drops to zero with no extra orchestration (something the article seemingly completely missed).

The level of orchestration required to achieve this on GKE is pretty minimal.

Trying to migrate my $work from hosting on manually configured on-prem physical servers, I evaluated OpenShift, K8s, Docker Swarm and AWS.

I had drunk the Docker kool-aid pretty hard, and spinning up individual nodes was easy, until I started projecting forward what the final architecture would look like: patching, failover, backup, disaster recovery, etc.

Eventually I convinced my boss to pay for Heroku and went home early.

This mirrors my experience. Infrastructure as code is hard. I’ve tried to rely on VSCode extensions for some help getting configuration values, but at the time the best that was available was some snippets. I want the editing experience to have less memorization and alt-tabbing to docs, and more expression and validation.

Fargate isn’t less config than a helm chart, but you’re on your own, in that there isn’t much of a public config ecosystem that I’m aware of. I also wrote it off due to 2.3-3X instance pricing vs EC2.

One of the biggest factors that intimidates developers in my experience is needing to determine resources. It would seem that such a task could be done with automatic testing, but I haven’t found or developed a technique to do so. In theory I would have my script test different RAM/CPU values against a load tester, have live cost data and compute an optimum. I suppose one can come from the opposite angle and determine the program’s needs based on data structures and libraries. That is beyond my skills.

I suppose network interface throughput would be another factor, but unfortunately with Fargate and Lambda that is largely opaque ( and I believe linked to CPU allocation ).

This just goes to say that in addition to managing the complexity of configuration code, the underlying abstractions are harder to grok than a VM or even Kubernetes nodes and resource allocations.

There's a nice CLI tool that makes Fargate feel like a PaaS [0]. It configures the ALB, ECR, and your Fargate task definitions and services for you. All you need to deploy a Fargate service is a Dockerfile and some code. Hoping this project gets more love.

[0]: https://github.com/jpignata/fargate

So I migrated everything to Fargate.

Then I started questioning, is this any better than 1 container per ec2 instance and running docker myself?

Assume that 1 fargate task definition == 1 ec2 instance.

I can't really think of any way that fargate would win in cost.

The management overhead for ec2 seems to be limited to starting docker daemon and launching your containers.

Should I migrate off of fargate onto ec2?

If you're running the tasks 100% of the time and they take enough resources to require one EC2 instance worth of resources then running ECS targeting EC2 is more cost effective. In general most of AWS's "serverless" solutions are only cost effective if you can switch them off at some point.

Where Fargate shines is situations that require rapid scaling or where the infrastructure can be more on-demand. The startup time for Fargate is lower than EC2 in my experience.

I would advocate for continuing to use ECS instead of self-managed docker, just use reserved EC2 instances.

Interesting. Most of my containers clock in at around 4GB total, stored in ECR.

Average task start time is about 3min on fargate. And there is no caching at all built-in.

I wish it was more like 5secs... :(

Really I wish I could run my app on Lambda but my app is too large at the moment, and has a lot of large/streaming/otherwise unusual HTTP requests that may be a bear to integrate with lambda...

Just a quick note the article didn't mention: the smallest Fargate instance you can get is 512 MB, around $15 per month (+ VPC costs). But if you have a compose file, you can fit multiple containers in a single instance.

Not sure if this is based on their earlier pricing, as I know there was a price reduction, but I'm calculating ~$9 per month at 0.25 vCPU + 0.5GB memory.
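A quick sanity check of that figure, assuming the post-January-2019 us-east-1 Fargate rates (verify against the current pricing page, as these change):

```python
# Assumed us-east-1 Fargate rates after the January 2019 price reduction:
VCPU_PER_HOUR = 0.04048    # $ per vCPU-hour
GB_PER_HOUR = 0.004445     # $ per GB-hour of memory

def monthly_cost(vcpu: float, gb: float, hours: float = 730) -> float:
    """Estimated monthly cost of a Fargate task running continuously."""
    return (vcpu * VCPU_PER_HOUR + gb * GB_PER_HOUR) * hours

print(round(monthly_cost(0.25, 0.5), 2))  # smallest task size: ~$9/month
```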



It might be, I relied on a google search. Thanks for the update.

I can't help but think that using the Now framework would have made the author way, way happier. Point now at your Dockerfile, and bam, everything is running in a minute.
