That to me sounds like an astoundingly high number, especially for the amount of resources dedicated to the test (400 GB of memory and 200 CPUs). Hell, if you're measuring it like uptime, that's one nine (90% success).
I can't believe that this is a well-configured setup. Running 50 instances with four CPUs each to serve a total of ~100 rps means that half of the CPUs are probably doing exactly nothing (and if my understanding is correct and they're using Flask in a single-threaded way, 150 of the 200 CPUs are going to be idle).
Triggering SNS is an API call. Assuming that's all the test application is doing, you hardly need even one server for this. I'd bet that you could make 100 simultaneous API calls from a stock MacBook Pro with a small handful of Node or Go processes without even making your fans spin up.
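To put a rough number on that claim, here's a minimal Node/TypeScript sketch, assuming AWS SDK v3 and a placeholder topic ARN, that fires 100 publishes concurrently from a single process:

```typescript
// Hypothetical sketch: 100 concurrent SNS publishes from one process.
// TOPIC_ARN is a placeholder; credentials come from the environment.
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({ region: "us-east-1" });
const TOPIC_ARN = process.env.TOPIC_ARN!;

async function blast(): Promise<void> {
  const calls = Array.from({ length: 100 }, (_, i) =>
    sns.send(new PublishCommand({ TopicArn: TOPIC_ARN, Message: `msg ${i}` }))
  );
  await Promise.all(calls); // the event loop interleaves all 100 requests
}

blast().catch(console.error);
```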
If Fargate can't handle 100 rps (making a single API call per request) with <10 instances, it's a useless product. But I find it hard to believe that Amazon would put something so absolutely incapable into the wild. With the specs the author put up, that's the equivalent of ~$10/hr (if I'm reading their pricing page correctly). You could run 380 A1 instances, 58 t3.xlarge instances, or two m5a.24xlarge instances for that price.
It can. Easily. I've done it.
> I can't believe that this is a well-configured setup.
The author hasn't provided anywhere near enough information to draw any useful conclusions from this. I don't know why there were errors (the author doesn't give any precise information), how long the tests were run, the state of the instances before the tests were run, what operations were being performed, how the testing was being performed, or numerous other things.
Simply put, it's not useful.
For Fargate, the ideal scenario in my workloads is async background processing. Add tasks to SQS, Fargate pulls tasks off and does the job. Elastically scales up or down and lots of flexibility on machine specs. Ok with some failure rate.
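A minimal sketch of that worker loop, assuming AWS SDK v3 and a placeholder QUEUE_URL; the real task would do its actual work where doTheJob is:

```typescript
// Hypothetical Fargate worker: long-poll SQS and process each message.
// QUEUE_URL is a placeholder; in Fargate the SDK picks up credentials
// from the task role.
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({ region: "us-east-1" });
const QUEUE_URL = process.env.QUEUE_URL!;

async function doTheJob(body: string): Promise<void> {
  console.log("processing", body); // stand-in for the real work
}

async function main(): Promise<void> {
  while (true) {
    const { Messages } = await sqs.send(
      new ReceiveMessageCommand({
        QueueUrl: QUEUE_URL,
        MaxNumberOfMessages: 10,
        WaitTimeSeconds: 20, // long poll to avoid busy-waiting
      })
    );
    for (const message of Messages ?? []) {
      await doTheJob(message.Body ?? "");
      // Delete only after success; failed messages reappear after the
      // visibility timeout, which gives you the "ok with some failure
      // rate" behavior for free.
      await sqs.send(
        new DeleteMessageCommand({
          QueueUrl: QUEUE_URL,
          ReceiptHandle: message.ReceiptHandle!,
        })
      );
    }
  }
}

main();
```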
For AWS Lambda, recently I like the combo of putting a Cloudflare Worker in front and using it as an API gateway. More flexibility on routing, faster performance, and a reverse proxy for free, which can be good for SEO. And you get all the goodies of Cloudflare like DDoS protection and the CDN.
However, I think containers and Lambda can and do serve this particular use case -- handle an API request and forward it to a different system. And Fargate is a good stand-in for containers generally. I wouldn't expect much different performance by using ECS or EKS or EC2 -- it's still going to be a load balancer forwarding to a container instance.
Definitely not perfect, but I think it works as a general approximation. For this particular common use case, you have three options. Was curious to see the perf differences between them.
Original author here.
Then invoke Lambda with this:
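Something along these lines, assuming the Lambda sits behind an API Gateway endpoint and the Worker just passes the request through (the URL is a placeholder):

```typescript
// Hypothetical Worker that forwards every incoming request to a Lambda
// behind API Gateway. LAMBDA_ENDPOINT is a placeholder.
const LAMBDA_ENDPOINT =
  "https://abc123.execute-api.us-east-1.amazonaws.com/prod";

addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handle(event.request));
});

async function handle(request: Request): Promise<Response> {
  const url = new URL(request.url);
  // Keep the path and query string, swap in the Lambda's host; the
  // original method, headers, and body are copied from the request.
  return fetch(new Request(LAMBDA_ENDPOINT + url.pathname + url.search, request));
}
```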
Important to note that workers have a 15s timeout, so this is really only good for routing. You probably don’t want this to manage tasks that could potentially take longer.
Not true -- the timeout on outgoing HTTP requests is (I think) 100 seconds (or unlimited as long as data is streaming).
The 15-second limit you may be thinking of is that Workers used to not let you start new outgoing HTTP requests 15 seconds into the event, but already-started requests could continue. This limit was recently removed -- instead, Workers now cancels outgoing requests if the client disconnects, but as long as the client is connected, you can keep making new requests. This was changed to support streaming video use cases where a stream is being assembled out of smaller chunks.
(I'm the tech lead for Workers.)
So the best solution is to reverse proxy so that internet traffic hits /blog, but the worker is actually forwarding the traffic to your internal service.
In fact it can be even more performant. You can use Workers KV to cache as well. So a request comes in, check KV store, return if found. If not, pull from asset CDN.
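A rough sketch of that flow, assuming a KV namespace bound as ASSET_CACHE and a placeholder origin URL (both names are made up for illustration):

```typescript
// Hypothetical Worker: return from KV on a hit, otherwise fetch from
// the asset CDN and cache a copy. ASSET_CACHE is a KV binding.
declare const ASSET_CACHE: KVNamespace;
const ORIGIN = "https://assets.example.com";

addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handle(event));
});

async function handle(event: FetchEvent): Promise<Response> {
  const key = new URL(event.request.url).pathname;

  // KV read; returns null on a miss.
  const cached = await ASSET_CACHE.get(key, "arrayBuffer");
  if (cached !== null) return new Response(cached);

  const response = await fetch(ORIGIN + key);
  // Store a copy for next time without delaying this response.
  event.waitUntil(ASSET_CACHE.put(key, response.clone().body ?? ""));
  return response;
}
```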
KV is a global persistent data store, so reads and writes may have to cross the internet. In comparison, the Cache API reads and writes from the local datacenter's cache. Also, Cache API doesn't cost extra (KV does).
However, better than either of these is to formulate your outgoing fetch() calls such that they naturally get the caching properties you want. fetch() goes through Cloudflare's usual caching logic. When that does what you want, it works better because this is the path that has been most optimized over many years.
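For example, something like this (the options shown are illustrative, not prescribed):

```typescript
// Inside a Worker's fetch handler: ask Cloudflare's edge cache to keep
// this response for five minutes, even if the origin's headers wouldn't
// normally make it cacheable. The URL and TTL are placeholders.
const response = await fetch("https://origin.example.com/blog/post", {
  cf: { cacheTtl: 300, cacheEverything: true },
});
```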
The two sentences right before what you quoted are helpful:
"I’m not a Docker or Flask performance expert, and that’s not the goal of this exercise. To remedy this, I decided to bump the specs on my deployments.
The general goal for this bakeoff is to get a best-case outcome for each of these architectures, rather than an apples-to-apples comparison of cost vs performance."
I wasn't trying to squeeze out every ounce of performance and determine the minimum number of instances to handle 100 req/sec. I was trying to normalize across the three patterns as much as possible to see best-case performance. I didn't want resource constraints to be an excuse.
What would also be interesting here is a price comparison. Without having done the math, I'd expect Fargate to be significantly more expensive than the other solutions, which would make for a nice trade-off of cost vs. performance: if performance matters, choose Fargate; if cost matters, choose API Gateway as a service proxy.
I thought about doing a price comparison, but that gets really tricky. At some point, you're testing the skills of the tester more than the services themselves.
Agree with your expectations though. I think Fargate is naturally faster and can be made more so given the number of knobs you can tune. This will likely cost you more, both in direct resource costs and in engineering time spent fine-tuning those knobs. Whether that's worth it depends on your business needs.
So long as the approaches meet real-world thresholds for specific things, particularly the speed of certain time-sensitive requests, the technology is 'viable'.
(And let's also assume 'reliability' as a key component of 'viability')
Once the tech is 'viable' - it's really a whole host of other concerns that we want to look at.
#1 I think would be the ability of the tech to support dynamic and changing needs of product development.
Something that is easy, has fewer moving parts and a smaller API, and requires less interference and support from DevOps - this is worth a lot. Strategically, it may be the most valuable thing for most growing companies.
'Complexity' in all its various forms represents a kind of constant barrier, a force that the company is going to have to fight against to make customers happy. This is the thing we want to minimize.
Obviously, issues such as switching costs and the 'proprietary trap' are a concern, and of course 'total cost of operations', i.e. the cost of the services, is an important basis of comparison. But even the latter only becomes an issue later on, once the company reaches maturity. (i.e. a 'Dropbox' type company should definitely start in the cloud, and not until they have the kind of scale that warrants unit-cost scrutiny would they consider building their own infra.)
In the big picture of 'total cost of ownership', it's the ability of the system to meet the needs of product development, not 'how fast or cheap' it is, that's really the point.
Something that is 2x the cost, and 10% 'less performant' - but is very easy to use, requires minimal devops focus, and can enable feature iteration and easy scale - this is what most growing companies need.
Unless performance or cost are key attributes and differentiators of the product or service - then 'take the easy path'.
This is for a hello world application. This is insane. A single EC2 HVM instance would have better performance.
> Something that is 2x the cost, and 10% 'less performant' - but is very easy to use, requires minimal devops focus, and can enable feature iteration and easy scale - this is what most growing companies need.
Cost is close to 50x, and Fargate is not necessarily lower overhead. Terraform + Packer + EC2 vs. ECS + Docker + Fargate is pretty much the same for me. You still need to build images and manage the deployment lifecycle.
Lambda itself doesn't really require maintenance.
Routing can be done in the app; you only need one, or a small number of, 'endpoints', so messing with API Gateway can be minimized.
You can run an app of any size with a single API Gateway endpoint and a single Lambda, on a simple Node.js setup.
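A minimal sketch of that shape, dispatching on the path inside one handler (the routes and responses are made up):

```typescript
// Hypothetical single-Lambda router: API Gateway proxies every path to
// this one handler, which dispatches internally.
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  switch (event.path) {
    case "/users":
      return { statusCode: 200, body: JSON.stringify({ users: [] }) };
    case "/health":
      return { statusCode: 200, body: "ok" };
    default:
      return { statusCode: 404, body: "not found" };
  }
};
```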
I can't see any reason to use containers on EC2 or containers on Amazon's container service until an app gains quite a degree of sophistication.
> When I ran my initial Fargate warmup, I got the following results. Around 10% of my requests were failing altogether!
> To remedy this, I decided to bump the specs on my deployments.
> The general goal for this bakeoff is to get a best-case outcome for each of these architectures, rather than an apples-to-apples comparison of cost vs performance.
> For Fargate, this meant deploying 50 instances of my container with pretty beefy settings — 8 GB of memory and 4 full CPU units per container instance.
One of the AWS container advocates mentioned it was likely something like the nofile ulimit: https://twitter.com/nathankpeck/status/1098992994131283968
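For reference, that limit can be raised in the ECS task definition's container settings; a fragment along these lines, with illustrative names and values:

```json
{
  "containerDefinitions": [
    {
      "name": "app",
      "ulimits": [
        { "name": "nofile", "softLimit": 65536, "hardLimit": 65536 }
      ]
    }
  ]
}
```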