While I'm not disputing the limit numbers or whatever hardship they might cause, it's worth noting that this is a basic limitation of the original Lambda model and maybe of FaaS in general. The capability comes from running a giant, pseudo-infinite mesh of isolated execution environments that load your code and execute it on demand, while buffering both the request and the response so that clients are shielded from those details. Because of that buffering, the size of the buffer will always be limited. The team managing the service might make the buffers bigger based on experience, but it's not a solved problem.
ALB to containers or servers is a different beast - here the entire request and response need not be buffered at all (there might still be a very small buffer, mostly negligible), so streaming responses, websockets etc become possible.
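To make the buffering vs. streaming distinction concrete, here's a rough sketch in plain Python. Nothing in it is an AWS API: `buffered_proxy`, `streaming_proxy`, `backend` and the `MAX_BUFFERED_BYTES` cap are all names and values made up for illustration. It only shows why a fully buffered path has to enforce some size limit while a streaming path does not.

```python
from typing import Iterable, Iterator

# Hypothetical cap, chosen only for this illustration; not a real service limit.
MAX_BUFFERED_BYTES = 1024


class PayloadTooLarge(Exception):
    """Raised when a buffered response outgrows the assumed cap."""


def buffered_proxy(chunks: Iterable[bytes], limit: int = MAX_BUFFERED_BYTES) -> bytes:
    """Collect the whole response before handing anything to the client."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > limit:
            # The proxy must hold everything in memory, so it has to give up somewhere.
            raise PayloadTooLarge(f"response exceeded {limit} bytes")
    return bytes(buffer)


def streaming_proxy(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Forward each chunk as it arrives; only one chunk is held at a time."""
    for chunk in chunks:
        yield chunk


def backend(total_bytes: int, chunk_size: int = 256) -> Iterator[bytes]:
    """Stand-in for a function or server producing a response in pieces."""
    sent = 0
    while sent < total_bytes:
        size = min(chunk_size, total_bytes - sent)
        yield b"x" * size
        sent += size
```

With these toy numbers, `buffered_proxy(backend(1000))` succeeds, `buffered_proxy(backend(5000))` raises `PayloadTooLarge`, and `streaming_proxy(backend(5000))` passes all 5000 bytes through regardless of the total size.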
We use Lambda to resize images, so we do push against these limits a bit, but it's a fair tradeoff for the advantages: no worries about CPU throttling from too many requests, no waiting for servers to start for spiky loads, etc.
Lambda was not designed for request/response. It’s an event-driven service. Wrapping API Gateway around it is an architectural blunder, and leads to folks like the GP wondering why their use case is a shitty fit.
There is nothing inherently asynchronous about the Lambda product, unless you’re talking about the Node.js runtime and even then that’s more about Node than about Lambda.
Each Lambda invocation gets a dedicated VM for the duration of the request. It is a great match for synchronous code.
That is a misstatement. Lambda executes functions in response to events. It is totally asynchronous with regard to its execution triggers.
Lambda does reuse VMs, so I hope you aren’t relying on containers being discarded for any integrity or security outcomes.
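A minimal sketch of what that reuse looks like from inside a function, assuming the Python runtime. The `handler(event, context)` shape is the usual Python entry point; `ENVIRONMENT_ID` and `invocation_count` are names I made up, and calling the handler twice locally only simulates two events landing on the same warm environment.

```python
import uuid

# Module-level code runs once per execution environment (at cold start),
# not once per event, so this state outlives a single invocation.
ENVIRONMENT_ID = uuid.uuid4().hex
invocation_count = 0


def handler(event, context):
    """Lambda-style entry point: called once for each incoming event."""
    global invocation_count
    invocation_count += 1
    return {
        "environment": ENVIRONMENT_ID,
        "invocation": invocation_count,
        "echo": event,
    }


# Simulate two events handled by the same warm environment.
first = handler({"id": 1}, None)
second = handler({"id": 2}, None)
```

Both results carry the same `environment` value and the counter keeps climbing, which is why anything left in globals (or in `/tmp`) should be treated as potentially visible to later invocations.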
All the responses in this thread illustrate to me that AWS needs to put more effort into socialising how the product works. Since I was physically in the room for Lambda’s AWS internal launch, this is doubly disappointing, because the technical messaging then was very clear and compelling.