
AWS Lambda – Best practices - maingi4
https://cloudncode.blog/2017/03/02/best-practices-aws-lambda-function/
======
throwaway2016a
As someone who uses Lambda heavily I find this post somewhat disappointing
from a "best practices" standpoint.

No mention of Lambda best practices like:

\- Using CloudFormation

\- IAM policies

\- Managing config

\- How to handle databases and connection pools

Instead we get a logging suggestion that, while interesting (I too prefer
Elasticsearch to CloudWatch Logs), is definitely not a Lambda best practice.

And a suggestion about choosing the right timeout (which is not necessarily
helpful, since the default timeout is 3 seconds, so the best practice could
be... "Leave the default alone unless you have a really good reason to raise
it").

As far as the logging piece goes, if you are going to use Elasticsearch I would
strongly recommend putting the logs in CloudWatch Logs first, then using
something like Logstash with a CloudWatch Logs source to populate
Elasticsearch, so as not to add another network dependency to your Lambda
function.

~~~
hendry
I like how the serverless.yml in [https://serverless.com/](https://serverless.com/)
provisions the CloudFormation stuff and gives you a way to manage IAM
permissions.

~~~
jaxondu
AWS has its own Serverless Application Model (SAM)
[https://github.com/awslabs/serverless-application-model](https://github.com/awslabs/serverless-application-model),
released last Oct. Rather than learning another framework, you use the same
CloudFormation YAML syntax to define your stack and apps. It simplifies
invoking Lambda functions by providing a higher-level abstraction for
triggers, called Events. This makes it simpler to create Lambda functions for
REST APIs, cron schedules, IoT devices, Alexa skills, responses to S3 events,
etc. Then it's just "aws cloudformation package" and "aws cloudformation
deploy" to create and update your app. It appears the AWS team is learning
from the best parts of the Serverless Framework. If you don't mind the vendor
lock-in, SAM is the way to go. The Serverless Framework has the promise of
using the same tool against AWS Lambda, Google/Azure Functions, and IBM
OpenWhisk.

~~~
throwaway2016a
Personally I use the Serverless framework which supports other cloud providers
and is generally more on the cutting edge than SAM (and pre-dates SAM).

But SAM is certainly a valid option if you are OK with vendor lock-in (which
many people are).

------
joecot
I'm amused by the off-hand note about maybe using DynamoDB instead of a
relational database, because the biggest issue with connecting your Lambda
functions to MySQL is that using Lambdas within a VPC is absolutely terrible
right now.

Sure, you can create Lambda subnets in your VPC, assign your Lambda to a
subnet, give it a security group, and then easily set up your security group
rules so your Lambdas can connect. It's quite snazzy! Until you realize that
cold startup times for your Lambda are normally around a second, and when it
runs in your VPC, it takes about 5 more seconds to create the VPC network
interface. If you're using Lambdas to build a REST API, this turns an
occasional slight slowdown into a complete showstopper for your users.

There doesn't appear to be any good solution to this right now. You can host
your Lambda outside the VPC, but then you essentially need to open your MySQL
security group to the world (or to every AWS subnet, which is about the same
thing). You can make a request to your Lambda function every 15 minutes, which
is a terrible hack but works, until you are approaching 50+ Lambda functions
that each need a heartbeat request every 15 minutes.

The issue is somewhat mitigated if, instead of following the natural
inclination to make a separate Lambda function for each possible API request,
you group API requests together, or use one Lambda function for your entire
need. If your service is small in scope anyway, it won't particularly affect
performance, the Lambda will be less likely to go dormant, and there will be
fewer functions you need to send heartbeat requests to. Serverless Framework
has a good overview of the coding pattern options:
[https://serverless.com/blog/serverless-architecture-code-pat...](https://serverless.com/blog/serverless-architecture-code-patterns/)

And yes, if you jump straight from traditional server architectures to Lambda
functions and DynamoDB NoSQL, you don't need VPC connections to reach database
servers. But some of us change architecture paradigms one step at a time,
Amazon!

~~~
collyw
I am curious about the type of workloads you run that require a serverless
architecture yet are backed by a MySQL database. It was only 5 or so years
ago that we were told relational databases didn't scale, so we all needed
NoSQL.

~~~
joecot
Is Lambda only for workloads that _require_ serverless architecture, or is it
also for folks who would like to be running serverless?

I have a variety of web application APIs running on webservers in a VPC, with
an RDS MySQL instance for backing. I'm interested in writing our new
development using Lambda, but I'm still using MySQL as the database backend
for now, and I still need to interact with existing APIs in the VPC. Perhaps
someday I will have moved all those other applications to Lambda as well, and
perhaps someday I will port all of our database functionality to DynamoDB, but
I don't think that should be required in order to just get started with
Lambda.

If I was writing everything from scratch, maybe I'd go full serverless. But
I'm not in that case, and I'm also not in a case where MySQL won't "scale" for
me at all (I wish I had that problem). If Lambda's not for me, any Amazon
engineer can feel free to tell me so, but given their marketing, I did not
think this was the case. They offered Lambda VPC for a reason, even though it
doesn't work very well currently.

~~~
collyw
OK, but serverless without good reason sounds like resume-driven development
to me, and more hassle than it's worth. I prefer to have sound reasoning
behind my architectural choices. Being tried and tested is a good reason to go
with a "normal" architecture, for me.

------
wiredfool
Every time I look at Lambda+API Gateway for a web api, there's always been one
little thing that makes things way harder than they should be.

A while ago (pre-proxy), it was nearly impossible to send back 302 redirects
cleanly.

With LAMBDA_PROXY, it's not clear that you can actually return binary data
through API Gateway. There's the undocumented isBase64Encoded parameter, but
as far as I can tell, it's ignored. It doesn't trigger an error, but it
doesn't decode the base64 either.

Just this week I'm working on moving a static HTML site plus a couple of
dynamic URLs over. I can work around the binary issue, but serving images is
a solved problem in the rest of the world.

~~~
throwaway2016a
Binary data is tricky with Lambda proxy (and by tricky I mean... I haven't
gotten it to work). But I also haven't tried much. Typically best practice for
me has been:

\- When dealing with uploads have an API call to get a pre-signed S3 upload
URL

\- When dealing with downloads, use HATEOAS and/or the Location header to
send the client to S3 to grab the actual file.

It requires some client-side designing around eventual consistency but it
works well and is very scalable.
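A minimal sketch of both patterns, assuming the AWS SDK for JavaScript v2, whose `S3#getSignedUrl(operation, params, callback)` produces pre-signed URLs. The client is passed in so the logic can be exercised without credentials; `makeHandlers`, the bucket name, and the 300-second expiry are illustrative choices:

```javascript
// The API never proxies file bytes: uploads get a pre-signed S3 PUT URL,
// downloads get a 302 whose Location header points at a pre-signed S3 GET URL.
// `s3` is assumed to be an AWS SDK v2 client (new AWS.S3()), injected for testability.
function signUrl(s3, operation, params) {
  return new Promise((resolve, reject) => {
    s3.getSignedUrl(operation, params, (err, url) => (err ? reject(err) : resolve(url)));
  });
}

function makeHandlers(s3, bucket) {
  return {
    // e.g. POST /uploads -> { uploadUrl } for the client to PUT the file to directly.
    async createUpload(key) {
      const uploadUrl = await signUrl(s3, "putObject", { Bucket: bucket, Key: key, Expires: 300 });
      return { statusCode: 200, body: JSON.stringify({ uploadUrl }) };
    },
    // e.g. GET /files/{key} -> redirect the client to a short-lived S3 download URL.
    async download(key) {
      const location = await signUrl(s3, "getObject", { Bucket: bucket, Key: key, Expires: 300 });
      return { statusCode: 302, headers: { Location: location }, body: "" };
    },
  };
}

// Usage, assuming aws-sdk v2 is installed and "my-upload-bucket" is a hypothetical bucket:
//   const AWS = require("aws-sdk");
//   const handlers = makeHandlers(new AWS.S3(), "my-upload-bucket");
exports.makeHandlers = makeHandlers;
```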

~~~
breandr
You need to specify your MIME type (most commonly, application/json) to be
treated as binary (from the console: API -> Binary Support). In your Lambda
function: `callback(null, { body: base64EncodedBody, statusCode: 200, headers: {},
isBase64Encoded: true })`

~~~
wiredfool
The MIME type of the response from Lambda? Or the MIME type in the headers?

------
schuyler2d
We've had a lot of success deploying Python web apps with Zappa:
[https://github.com/Miserlou/Zappa](https://github.com/Miserlou/Zappa)

The value is that it handles deployment, route mapping, and config management,
and also lets you 'keep warm' the backend if you want it to stay responsive
even after periods of zero or low traffic.

My hope is that best practices are going to continue being distilled into
these deployment/wrapper frameworks.

------
013a
After using Lambda for a while, the opinion I've landed on is that the concept
of serverless is a powerful one in specific domains, but (1) being the
zeitgeist, it is encouraged for use in domains where it doesn't belong, and (2)
AWS Lambda itself isn't very good, for reasons that are specific to Lambda,
not to the concept of serverless.

~~~
jedberg
I'm curious if you could expand on:

> AWS Lambda itself isn't very good, for reasons that are specific to Lambda

------
stlava
I run a fairly high-scale pipeline on Lambda. The post touched briefly on
logging, but I wanted to chime in with a few helpful points:

\- Format all logs as JSON. This allows you to use the JSON path filtering
option in CW.

\- Add function version & tag to the log lines. CW doesn't allow you to filter
based on version.

\- Don't expect CW log querying to work all the time.

\- Output runtime stats to CW metrics.

------
ahallock
> My opinion, when going serverless, I wouldn’t think of using RDBMS’s
> (although you can) and instead use databases like Elasticsearch, DynamoDB,
> Mongo etc.

This is a vague statement that's not really based on evidence, and I'm not
sure why using Postgres, for example, is going to be so much worse for most
use cases than these NoSQL solutions.

~~~
joecot
As I explain over here
([https://news.ycombinator.com/item?id=13774041](https://news.ycombinator.com/item?id=13774041)),
trying to use Lambdas with your existing servers, like Postgres, can cause
some serious complications.

~~~
bpicolo
If that's the case, seems like a great argument against using lambda ;)

Side note, is that just startup time for a single lambda `node`? Won't you be
running >>1 node and not really care about startup time for any single node?

~~~
joecot
It's every time you have to start up a Lambda instance.

So if you have low but continuous traffic, it'll happen once and you won't
really care. If you're high traffic, new instances will occasionally get
started up; each one also needs its own network interface and also takes 6+
seconds. And if you're low traffic but not continuous, and there's a pause of
more than around 15 minutes (it entirely depends on how much demand there is
for Lambda and how rapidly they clean up old containers), you'll have to cold
start again for the next request you get.

------
danielhunt
Unfortunately, there's absolutely nothing here regarding things like git,
CI/CD, or how to use it when a medium-to-large team is involved.

If anyone has any research, thoughts, or comments on this side of Lambda, I'd
love to learn more.

~~~
keithwhor
Hey Daniel,

We've been experimenting with a lot of best practices around serverless
development with StdLib [1]. Everything from versioning, project and team
management, to authentication and billing. Basically --- we acknowledged that
a lot of these aspects were surprisingly difficult and created a "git-esque"
workflow for serverless function management ("lib up" to deploy, "lib get" to
retrieve a current version of a function, etc.) - would love your feedback if
you have time.

(Disclaimer: Founder.)

[1] [https://stdlib.com/](https://stdlib.com/)

------
wyldfire
Is there any easy way to tell what kind of things are infeasible/inappropriate
for Lambda? I went looking for limitations and found code size/fd counts/etc
[1] but I want to know more general things like "can I include native code? (a
DSO to be used by python code, e.g.) -- if so, what arch?", "what AWS services
can I/should I access?", etc.

[1]
[http://docs.aws.amazon.com/lambda/latest/dg/limits.html](http://docs.aws.amazon.com/lambda/latest/dg/limits.html)

------
intrasight
I tried AWS Lambda early on. But I decided to go with the new Azure Logic Apps
since I mostly code in C#. And then Lambda added C# support, so I need to find
the time to play around with it some more.

------
boyter
The biggest thing I have found about Lambda is that if you plan to deploy with
CloudFormation backed by S3, you need to resort to hacks to ever update your
function: either manage versions in S3, or change the filename in S3 and in
CloudFormation to trigger an update. It's a rather annoying issue.

I really wish there was an "always update" flag on the CloudFormation property
to resolve this.

~~~
frogperson
Have you tried adding a timestamp or git sha to your CF input params?
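A sketch of that suggestion as a Node.js deploy script, assuming the template takes hypothetical `CodeBucket` and `CodeKey` parameters that feed the function's `Code` property, and that the zip has already been uploaded under that key. `--template-file`, `--stack-name`, `--parameter-overrides`, and `--capabilities` are existing `aws cloudformation deploy` options; the function names are illustrative:

```javascript
// Give every build a unique S3 key (here the git commit SHA) and pass it to the
// stack as a parameter; because the function's S3Key then changes,
// CloudFormation sees a difference and updates the function code.
const { execFileSync } = require("child_process");

function artifactKey(functionName, revision) {
  return `lambda/${functionName}/${revision}.zip`;
}

function deployArgs({ templateFile, stackName, bucket, key }) {
  // Assumes the template declares CodeBucket and CodeKey parameters (hypothetical names).
  return [
    "cloudformation", "deploy",
    "--template-file", templateFile,
    "--stack-name", stackName,
    "--parameter-overrides", `CodeBucket=${bucket}`, `CodeKey=${key}`,
    "--capabilities", "CAPABILITY_IAM",
  ];
}

function currentGitSha() {
  return execFileSync("git", ["rev-parse", "--short", "HEAD"]).toString().trim();
}

// Usage (bucket, stack, and template names are hypothetical):
//   const key = artifactKey("my-fn", currentGitSha());
//   execFileSync("aws", deployArgs({ templateFile: "template.yaml",
//     stackName: "my-stack", bucket: "my-artifacts", key }), { stdio: "inherit" });
```

A timestamp would work the same way in place of the SHA; the SHA has the advantage that redeploying the same commit produces no spurious update.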

------
eof
Maybe I am too late to this party, but I have a question:

Is there any framework for handling routing and deployments?

What I mean is, is there an existing solution where I can map JS files to
routes, and have those routes and JS files be loaded into Lambda and configure
the API endpoint, so that if I need to make an update to the Lambda function
I can just update the file (and re-run the framework)?

~~~
breandr
[https://github.com/awslabs/aws-serverless-express](https://github.com/awslabs/aws-serverless-express)

------
mwcampbell
Can a given Lambda container process ever be used for multiple concurrent
function invocations? If not, then we're dealing with at least 128 MB of RAM
per concurrent request. Multithreaded and event-driven servers can do much
better than that.

~~~
Just1689
Yes

See: [https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...](https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/)
[http://stackoverflow.com/questions/37523128/how-aws-lambda-c...](http://stackoverflow.com/questions/37523128/how-aws-lambda-container-reuse-works)

~~~
ex2
Actually, the answer is "no".

------
brilliantcode
I gave up on Lambda when I realized a $5/month VPS or EC2 instance was
cheaper, and more responsive since there's no cold-start boot time.

There will be niche use cases for Lambda, but Serverless(TM) is just another
passing buzzword. It won't see wide adoption unless cold-start times and costs
become significantly lower than current solutions.
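For the cost comparison, a rough estimator shows where the crossover with a flat-rate server lies. A sketch under loudly stated assumptions: list prices of $0.20 per million requests and $0.00001667 per GB-second, duration rounded up to 100 ms (the billing granularity at the time), and no free tier, API Gateway, or data transfer charges. Function names are illustrative:

```javascript
// Rough monthly Lambda compute cost. Prices are ASSUMED (historical us-east-1
// list prices); check current pricing before relying on the result.
const PRICE_PER_REQUEST = 0.20 / 1e6;   // USD per request
const PRICE_PER_GB_SECOND = 0.00001667; // USD per GB-second
const BILLING_INCREMENT_MS = 100;       // 2017-era rounding; AWS later moved to 1 ms

function monthlyLambdaCost(requestsPerMonth, avgDurationMs, memoryMb) {
  const billedMs = Math.ceil(avgDurationMs / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS;
  const gbSeconds = requestsPerMonth * (billedMs / 1000) * (memoryMb / 1024);
  return requestsPerMonth * PRICE_PER_REQUEST + gbSeconds * PRICE_PER_GB_SECOND;
}

// Requests per month at which Lambda costs the same as a flat-rate server.
function breakEvenRequests(flatMonthlyUsd, avgDurationMs, memoryMb) {
  return flatMonthlyUsd / monthlyLambdaCost(1, avgDurationMs, memoryMb);
}
```

With these assumptions, an average of 100 ms at 128 MB costs about $0.41 per million requests, so `breakEvenRequests(5, 100, 128)` comes out at roughly 12 million requests per month; longer durations or more memory move the crossover down quickly.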

