
Ask HN: How was your experience with AWS Lambda in production? - chetanmelkani
I would like to hear from people who have used AWS Lambda in production. How was their experience with it?
It would be great if you have references to project repositories.
======
callumlocke
I made an image hosting tool on Lambda and S3, for internal corporate use.
Staff can upload images to S3 via an SPA. The front end contacts the Lambda
service to request a pre-signed S3 upload URL, so the browser can upload
directly to S3. It works really well. Observations:

1\. Took too long to get something working. The common use case of hooking up
a Lambda function to an HTTP endpoint is surprisingly fiddly and manual.

2\. Very painful logging/monitoring.

3\. The Node.js version of Lambda has a weird and ugly API that feels like it
was designed by a committee with little knowledge of Node.js idioms.

4\. The Serverless framework produces a huge bundle unless you spend a lot of
effort optimising it. It's also very slow to deploy incremental changes.
_Edit:_ this is not only due to the large bundle size but also due to having
to re-up the whole generated CloudFormation stack for most updates.

5\. It was worth it in the end for making a useful little service that will
exist forever with ultra-low running costs, but the developer experience could
have been miles better, and I wouldn't want to have to work on that codebase
again.

\---

Edit: here's the code: [https://github.com/Financial-Times/ig-images-
backend](https://github.com/Financial-Times/ig-images-backend)

To address point 3 above, I wrote a wrapper function (in src/index.js) so I
could write each HTTP Lambda endpoint as a straight async function that simply
receives a single argument (the request event) and asynchronously _returns_
the complete HTTP response. This wouldn't be good if you were returning a
large response though; you'd probably be better streaming it.

~~~
kesor
The problem of logs is actually a problem of CloudWatch Logs just not being a
very good service. A great way to solve that is to push all logs from
CloudWatch Logs into an Elasticsearch cluster (using a Lambda function). AWS
even has the code already written for you if you click the "subscribe" button
in CWL. Then, with Kibana/Elasticsearch, the experience of inspecting and
analysing logs is MUCH better.
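For anyone writing such a forwarder by hand: the payload CloudWatch Logs delivers to a subscribed Lambda is base64-encoded and gzipped. A minimal Python sketch of just the decoding step (the AWS blueprint then bulk-posts these events to Elasticsearch):

```python
import base64
import gzip
import json


def decode_cwl_payload(event):
    """Unpack the gzipped, base64-encoded payload that CloudWatch Logs
    delivers to a subscribed Lambda function."""
    raw = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(raw))
    # payload["logEvents"] is a list of {"id", "timestamp", "message"} dicts
    return payload["logEvents"]
```

From there each event just needs reshaping into an Elasticsearch bulk-index request.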

~~~
adjohn
We built IOpipe[1] to address these issues by offering our own wrapper[2] that
sends telemetry to our service. IOpipe aggregates metrics and errors, and
allows the creation of alerts with multiple rules per alert.

[1] - [https://iopipe.com](https://iopipe.com) [2] -
[https://github.com/iopipe/iopipe/](https://github.com/iopipe/iopipe/)

~~~
munns
+1 to the folks at IOpipe. A really cool product that gives you very
interesting visibility into your Lambda function executions!

------
scrollaway
We use AWS Lambda to process Hearthstone replay files.

My #1 concern with it went away a while back when Amazon _finally_ added
support for Python 3 (3.6).

It behaved as advertised: it allowed us to scale without worrying about
scaling. After a year of using it, however, I'm really not a big fan of the
technology.

It's opaque. Pulling logs, crashes and metrics out of it is like pulling
teeth. There's a _lot_ of bells and whistles which are just missing. And the
weirdest thing to me is how people keep using it to create "serverless
websites" when that is really not its strength -- its strength is in
distributed processing; in other words, long-running CPU-bound apps.

The dev experience is poor. We had to build our own system to deploy our
builds to Lambda, build our own canary/rollback system, etc. With Zappa it's
better nowadays, although for the longest time it didn't really support non-
website-like Lambda apps.

It's _expensive_. You pay for invocations, you pay for running speed, and all
of this is super hard to read on the bill (which function costs me the most
and when? Gotta do your own advanced bill graphing for that). And if you want
more CPU, you have to also increase memory; so right now our apps are paying
for hundreds of MBs of memory we're not using, just because it makes sense to
pay for the extra CPU. (2x the CPU for 2x the speed is cost-neutral, if you're
CPU-bound.)
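The cost-neutral claim follows from Lambda's pricing model, which bills GB-seconds (memory times duration). A quick sanity check in Python, using an approximate per-GB-second rate as an assumption:

```python
LAMBDA_GB_SECOND = 0.00001667  # approximate Lambda price per GB-second (assumption)


def invocation_cost(memory_gb, duration_s):
    """Cost of one invocation, ignoring the small per-request fee."""
    return memory_gb * duration_s * LAMBDA_GB_SECOND


# A CPU-bound task at 512 MB taking 4 s costs the same as the identical
# task at 1024 MB taking 2 s: memory doubles, duration halves.
cost_512 = invocation_cost(0.512, 4.0)
cost_1024 = invocation_cost(1.024, 2.0)
```

So for CPU-bound work, paying for unused memory buys latency at no extra cost; the waste only appears if the task doesn't actually speed up with the extra CPU.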

But the kicker in all this is that the entire system is proprietary and it's
really hard to reproduce a test environment for it. The LambCI people have
done it, but even so, it's a hell of a system to mock and has a pretty strong
lock-in.

We're currently moving some S3-bound queue stuff into SQS, and dropping Lambda
at the same time could make sense.

I certainly recommend trying Lambda as a tech project, but I would not
recommend going out of your way to use it just so you can be "serverless".
Consider your use case carefully.

~~~
gallamine
Can you clarify this point:

> its strength is in distributed processing; in other words, long-running CPU-
> bound apps.

It seems to me that's an explicit non-use-case for Lambda, given it limits
invocations to under 5 minutes each.

~~~
scrollaway
I should have been more specific, yes. I actually meant CPU-or-network-bound
tasks with a predictable, sub-5-minute runtime.

Like I said we use it for game replay processing, so that's 5-15 second tasks
that read and parse log files and hit a db and s3 with the results.

Other suitable tasks: image resizing, API chatter, bounded video transcoding,
etc. Lambda is pretty good at distributed processing (as long as you can make
the bill work in your favour, over hosting your own overprovisioned fleet).

------
munns
Hey all, my name is Chris Munns and I am currently the lead Developer Advocate
for Serverless at AWS (I am part of the Lambda PM team). We really appreciate
this feedback and are always looking for ways to hear about these pain points.
You can email me directly at munns@amazon.com if you ever get stuck.

Thanks, \- Chris

~~~
runamok
I am on a devops team trying to implement Lambda by splitting pieces off a
monolithic Java app where applicable.

The biggest pain point I have is that, because we have multiple "environments"
(such as dev, da, staging) in the same Amazon account and because Lambdas are
global, I can't limit access to resources via IAM easily without hacks.

That is, because the same Lambda will be used (but with different versions
and/or aliases) in all environments, I can't marry the code and configuration
to limit access to, say, RDS or ElastiCache or an S3 bucket per Lambda.

I feel like I need a higher-order primitive (a Lambda group that a role and
configuration can live in, encompassing the Lambdas) to achieve this. I
realize API Gateway has the concept of stages, but currently the idea is for
some Lambdas to be invoked directly by the monolithic app or asynchronously
via SNS/SQS.

Otherwise I could namespace my Lambda functions, which is hacky: DevFooBar,
StageFooBar, etc.

Currently we plan to split off our environments into separate AWS accounts.

~~~
munns
Hi Runamok,

Today, given that aliases/versions cannot carry different permissions, you are
probably best off either running completely separate stacks of resources or
using the multiple-account model. With AWS Organizations these days it's not
that hard to run multiple environments across accounts.

We did a webinar on some of this a few months back, the slides here might be
useful to you: [https://www.slideshare.net/AmazonWebServices/building-a-
deve...](https://www.slideshare.net/AmazonWebServices/building-a-development-
workflow-for-serverless-applications-march-2017-aws-online-tech-talks)

It heavily leverages AWS's tools, but you could create similar practices using
3rd party frameworks and CI/CD tools as well.

Thanks,

-munns

------
lanestp
We use Lambda for 100% of our APIs some of which get over 100,000 calls per
day. The system is fantastic for microservices and web apps. One caveat: you
must use a framework like Serverless or Zappa. Simply setting up API Gateway
right is a hideous task and giving your function the right access level isn’t
any fun either. Since the frameworks do all that for you it really makes life
easier.

One thing to note. API Gateway is super picky about your response. When you
first get started you may have a Lambda that runs your test just fine but
fails on deployment. Make sure you troubleshoot your response rather than
diving into your code.
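Concretely, with the Lambda proxy integration API Gateway expects a rigid envelope: a `statusCode`, optional `headers`, and a `body` that must be a string. A minimal Python handler that satisfies it:

```python
import json


def handler(event, context):
    # API Gateway's proxy integration requires exactly this envelope;
    # note that "body" must be a string, not a dict.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"ok": True}),
    }
```

Returning a bare dict, or a dict as the body, is exactly the kind of response that works in a local test but fails once deployed behind API Gateway.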

I saw some people complaining about using an archaic version of Node. This is
no longer true. Lambda supports Node v6 which, while not bang up to date, is
an excellent version.

Anyway, I can attest it is production ready and at least in our usage an order
of magnitude cheaper.

------
chickenbane
I worked on a project where the architect wanted to use Lambdas for the entire
solution. This was a bad choice.

Lambdas have a lot of benefits - for occasional tasks they are essentially
free, the simple programming model makes them easy to understand in teams, you
get Amazon's scaling and there's decent integration with caching and logging.

However, especially since I had to use them for the whole solution, I ran into
a ton of limitations. Since they are so simple, you have to pull in a lot of
dependencies which negate a lot of the ease of understanding I mentioned
before. The dependencies are things like Amazon's API Gateway, AWS Step
Functions, and the AWS CLI itself, which is pretty low-level. So now the
application logic is pretty easy, but you are dealing with a lot of
integration devops. API Gateway is pretty clunky and surprisingly slow.
Lambdas shut themselves down, and restarting is slow. Step Functions have a
relatively small payload limit that needs to be worked around. Etc. So
use them sparingly!

~~~
Prefinem
1\. Use a framework for deploying your lambda functions. There are a few that
will manage the API Gateway for you.

2\. Don't put the Lambda inside a VPC if you want lower response times

3\. Step Functions don't seem ready for prime time, as far as I can tell.
(This might have changed in the last couple of months.)

4\. Lambda Functions should be microservices. Small and lean.

5\. There is a limit on resources per CloudFormation stack, so at about 20-30
functions with API Gateway on the Serverless framework you will hit it and
can't add any more (other deployment tools that don't use CloudFormation
shouldn't have this issue)

6\. Want more CPU? Add more RAM.

~~~
striglia
What makes you say Step Functions aren't ready for prime time? We've been
using them (and SWF, which they're based on) for about a year at decent scale
and have been generally very happy.

~~~
Prefinem
When I initially set them up, there was no way to edit or delete them. I
haven't messed with them since, although I did just check and there aren't
any Step Functions in my account, so it looks like they deleted the old ones.

It may be time to test them out again; I just got bitten pretty badly the
last time I implemented them and lost about a week's worth of work because of
it being unusable.

------
CSDude
\- Monitoring & debugging is a little hard

\- CPU power also scales with Memory, you might need to increase it to get
better responses

\- Ability to attach many streams (Kinesis, Dynamo) is very helpful, and it
scales easily without explicitly managing servers

\- There can be overhead: your function gets paused (if no data is incoming)
or can be killed nondeterministically (even if it runs all the time, or
hourly), which causes a cold start, and cold starts are very bad for Java

\- You need to keep your JARs small (under the 50MB limit); you cannot just
embed anything you like without careful consideration

~~~
AlexanderC89
@CSDude you can check out this small tool I've written for debugging Lambda
and API Gateway integration.

[https://github.com/AlexanderC/lambdon](https://github.com/AlexanderC/lambdon)
(I know the name sucks)

Also @chetanmelkani, as a hint: if you are using the NodeJS runtime, the sweet
spot for execution time and cost efficiency is 512MB of memory ;) you get
about a 2x performance boost over the 128MB configuration.

------
thom_nic
I deployed a couple AWS lambda endpoints for very low-volume tasks using
claudia.js - Claudia _greatly_ reduces the setup overhead for sane REST
endpoints. It creates the correct IAM permissions, gateway API and mappings.

Claudia.js also has an API layer that makes it look very similar to express.js
versus the weird API that Amazon provides. I would not use Lambda + JS without
Claudia.

For usage scenarios, one endpoint is used for a "contact us" form on a static
website, another we use to transform requests to fetch and store artifacts on
S3. I can't speak toward latency or high volume but since I've set them up
I've been able to pretty much forget about them and they work as intended.

~~~
htormey
Oh claudia sounds interesting. I'm a mobile developer who's used express/node
before. The hardest thing for me when I was cobbling a backend together with
lambda/dynamodb was understanding the permission system and debugging it when
I configured it wrong.

Lots of the examples and articles around this process are out of date and
AWS's web front end can be painful to deal with. That said, when everything
was set up, it was pretty straightforward to maintain.

Do you have any links to good tutorials on Claudia? I'd love to set up a
contact form using Lambda for a project I'm working on.

Also, do you know how Claudia compares to stuff like serverless.js?

~~~
simalexan
Hey, one of the guys from Claudia here.

You can find tutorials and examples on:

\- the Claudia.js website - [https://claudiajs.com](https://claudiajs.com)

\- the Claudia GitHub example projects - [https://github.com/claudiajs/example-projects](https://github.com/claudiajs/example-projects)

The purpose of Claudia.js is just to make it super easy to develop and deploy
your applications on AWS Lambda and API Gateway, and also to ease the work
with DynamoDB, AWS IoT, Alexa and so on.

There are two additional libraries, Claudia API Builder and Claudia Bot
Builder, to ease API and chat bot development and deployment.

Regarding the contact form: the best approach is to create a single service
that handles all the contact form requests. From there, you can either
connect it to DynamoDB or call some other data storage service.

Both Serverless and Claudia have their points where they shine. For a better
understanding of their comparison, you can read about it in the Claudia FAQ -
[https://github.com/claudiajs/claudia/blob/master/FAQ.md#how-...](https://github.com/claudiajs/claudia/blob/master/FAQ.md#how-
does-it-compare-to-)

~~~
htormey
Awesome answer, thanks!

------
dblooman
We have a number of different use cases at FundApps, some obvious, like
automated tasks, automatic DNS and AMI cleanup, through to the more focused
importing and parsing of data from external sources. This generally runs
several times a day, so Lambda was the right choice for us. We also use API
Gateway with Lambdas; it's a small API, about 2 requests per second on
average but very peaky during business hours, and its response times and
uptime have been excellent.

Development can be tricky. There are a lot of all-in-one solutions like the
Serverless framework; we use the Apex CLI tool for deploying and Terraform
for infra. These tools offer a nice workflow for most developers.

Logging is annoying; it's all CloudWatch, but we use a Lambda to send all our
CloudWatch logs to Sumo Logic. We use CloudWatch for metrics, though we have
a Grafana dashboard for actually looking at them. For exceptions we use
Sentry.

Resources have bitten us the most: suddenly there isn't enough memory because
of the payload from a download. I wish Lambda allowed scaling on a second
attempt so that you could bump its resources; this is something to consider
carefully.

Encryption of environment variables is still not a solved issue. If everyone
has access to the AWS console, everyone can view your env vars, so if you
want to store a DB password somewhere, it will have to be KMS. That is not a
bad thing, and it is usually pretty quick, but it does add overhead to the
execution time.

------
beefsack
I'm running Rust on Lambda at the moment for a PBE board gaming service I run.
I can't say it runs at huge scale, but using Lambda has provided me with some
really good architectural benefits:
* Games are developed as command line tools which use JSON for input and output. They're pure so the game state is passed in as part of the request. An example is my implementation of Lost Cities[1]

* Games are automatically bundled up with a NodeJS runner[2] and deployed to Lambda using Travis CI[3]

* I use API Gateway to point to the Lambda function, one endpoint per game, and I version the endpoints if the game data structures ever change.

* I have a central API server[4] which I run on Elastic Beanstalk and RDS. Games are registered inside the database and whenever players make plays, Lambda functions are called to process the play.

I'm also planning to run bots as Lambda functions, similar to how games are
implemented, but have yet to get it fully operational.

Apart from stumbling a lot setting it up, I'm really happy with how it's all
working together. If I ever get more traction it'll be interesting to see how
it scales up.

[1]: [https://github.com/brdgme/lost-cities](https://github.com/brdgme/lost-
cities)

[2]: [https://github.com/brdgme/lost-
cities/blob/master/.travis.ym...](https://github.com/brdgme/lost-
cities/blob/master/.travis.yml)

[3]:
[https://github.com/brdgme/lambda/blob/master/index.js](https://github.com/brdgme/lambda/blob/master/index.js)

[4]: [https://github.com/brdgme/api](https://github.com/brdgme/api)

~~~
curun1r
I'm not sure if you care, but it's possible to use Neon to write native Node
modules in Rust. No need for JS glue code and better performance than spawning
a process.

~~~
beefsack
Will check that out, cheers!

------
petarb
It gets the job done but the developer experience around it is awful.

Terrible deploy process, especially if your package is over 50MB (then you
need to get S3 involved). Debugging and local testing are a nightmare.
CloudWatch Logs aren't that bad (you can easily search for terms).

We have been using Lambdas in production for about a year and a half now, for
5 or so tasks, ranging from indexing items in Elasticsearch to small cron
cleanup jobs.

One big gripe around Lambdas and API Gateway integration is that they totally
changed the way it works. It used to be really simple to hook up a Lambda to
a public-facing URL so you could trigger it with a REST call. Now you have to
do an extra dance configuring API Gateway per HTTP resource, which
complicates the Lambda side of things. Sure, with more customization comes
more complexity, but the barrier to entry was significantly increased.

------
cameronmaske
I've been using AWS Lambda on a side project (octodocs.com) that is powered by
Django and uses Zappa to manage deployments.

I was initially attracted to it as a low-cost way to run a database-backed
(RDS) side project.

Some thoughts:

\- Zappa is a great tool. They added async task support [1], which replaced
the need for Celery or RQ. Setting up HTTPS with Let's Encrypt takes less
than 15 minutes. They added Python 3 support quickly after it was announced.
Setting up a test environment is pretty trivial; I set up a separate staging
site, which helps to debug a bunch of the orchestration settings. I also
built a small CLI [2] to help set environment variables (Heroku-esque) via
S3, which works well. Overall, the tooling feels solid. I can't imagine using
raw Lambda without a tool like Zappa.

\- While Lambda itself is not too expensive, AWS can sneak in some additional
costs. For example, allowing Lambda to reach out to other services in the VPC
(RDS) or to the Internet requires a bunch of route tables, subnets and a NAT
gateway. For this side project, those currently cost way more than running
and invoking Lambda itself.

\- Debugging can be a pain. Things like Sentry [3] make it better for runtime
issues, but orchestration issues are still very trial and error.

\- There can be overhead if your function goes "cold" (i.e. infrequent usage).
Zappa lets you keep sites warm (additional cost), but a cold start adds a
couple of seconds to the first-page load for that user. This applies more to
low volume traffic sites.

Overall: it's definitely overkill for a side project like this, but I could
see economies of scale kicking in for multiple or high-volume apps.

[1]: [https://blog.zappa.io/posts/zappa-introduces-seamless-
asynch...](https://blog.zappa.io/posts/zappa-introduces-seamless-asynchronous-
task-execution)

[2]:
[https://github.com/cameronmaske/s3env](https://github.com/cameronmaske/s3env)

[3]: [https://getsentry.com/](https://getsentry.com/)

~~~
Mizza
Zappa author here, thank you for your kind recommendation!

Lots more features in the pipeline, too!

~~~
ikornaselur
I usually default to using Flask for Python when building APIs and using Zappa
to deploy it has been a wonderful experience. Easy to develop locally with a
Flask web server and then you just deploy with Zappa.

I haven't used it in a huge production environment, but it's definitely my
go-to way of handling APIs in side projects and other related things.

------
kehers
I've been using it for heavy background jobs for
[http://thefeed.press](http://thefeed.press) and overall, I think it's pretty
OK (I use NodeJs). That said, here are a few things:

\- No straightforward way to prevent retries. (Retries can crazily increase
your bill if something goes wrong.)

\- The API Gateway to Lambda integration could be better. (For one, multipart
form-data support in API Gateway is a mess.)

\- (For NodeJs) I don't see why the node_modules folder should be uploaded.
(Google Cloud Functions downloads the modules from package.json.)

~~~
lukashed
> I don't see why the node_modules folder should be uploaded.

Exactly! Especially if you're using modules that include some sort of binary
and build your function on macOS, it's a pain -- I ended up using a Docker-
based workflow to get the correct binaries into node_modules.

~~~
eyko
Or you can use CI to deploy.

------
alexcasalboni
I'd recommend using a framework such as the Serverless Framework[1],
Chalice[2], Dawson[3], or Zappa[4]. As with any other (web) development
project, using a framework will alleviate a big part of the pain involved
with a new technology.

That said, I'd recommend first learning the tools without a framework. You
can find two coding sessions I published on YouTube[5][6].

[1]: [https://serverless.com/](https://serverless.com/)

[2]: [https://github.com/awslabs/chalice](https://github.com/awslabs/chalice)

[3]: [https://dawson.sh/](https://dawson.sh/)

[4]: [https://github.com/Miserlou/Zappa](https://github.com/Miserlou/Zappa)

[5]:
[https://www.youtube.com/watch?v=NhGEik26324](https://www.youtube.com/watch?v=NhGEik26324)

[6]:
[https://www.youtube.com/watch?v=NlZjTn9SaWg](https://www.youtube.com/watch?v=NlZjTn9SaWg)

------
xer
At Annsec we are all in on serverless infrastructure and use Lambdas and Step
Functions across two development teams on a single backlog. The extensibility
of a well-written Lambda is phenomenal. For instance, we have higher-
abstraction Lambdas for moving data. We make them handle several input events
and keep them pure to the greatest extent possible. Composing these Lambdas
later in Step Functions is true developer joy. We unit test them locally, and
for E2E tests we have a full clone of our environment. In total we build and
manage around 40 Lambdas and 10 Step Functions. Monitoring for failure is
done with CloudWatch alarms, OpsGenie and Slack bots. It has never been an
issue. In our setup we are aiming for an infrastructure that is immutable and
cryptographically verifiable. That turned out to be a bit of a challenge. :)

------
tracker1
Best to keep your workloads as small as possible; cold starts can be very
bad, depending on the type of project. I've been using mostly Node myself,
and it's worked out well.

One thing to be careful of: if you're targeting input into DynamoDB table(s),
it's really easy to flood your writes. The same goes for SQS writes. You
_might_ be better off with a data pipeline and slower progress; it really
just depends on your use case and needs. You may also want to look at running
tasks on ECS, which, depending on your needs, may work out better.

For some jobs the 5-minute limit is the bottleneck; for others it's the 1.5GB
memory limit. It just depends on exactly what you're trying to do. If your
jobs fit within Lambda's constraints, and the cold start time isn't too bad
for your needs, go for it.

~~~
CharlesW
> _Best to keep your workloads as small as possible, cold starts can be very
> bad, depending on the type of project._

Here's a recent, interesting article on the topic that quantifies some of
this: [https://read.acloud.guru/does-coding-language-memory-or-
pack...](https://read.acloud.guru/does-coding-language-memory-or-package-size-
affect-cold-starts-of-aws-lambda-a15e26d12c76)

~~~
tracker1
Thanks for the link... interesting how quick python is to start...

------
Techbrunch
You might want to have a look at Serverless, a framework for building web,
mobile and IoT applications with serverless architectures using AWS Lambda,
and also Azure Functions, Google Cloud Functions & more. Debugging,
maintaining & deploying multiple functions gets easier.

Serverless:
[https://github.com/serverless/serverless](https://github.com/serverless/serverless)

------
dcosson
Pros:

\- works as advertised, we haven't had any reliability issues with it

\- responding to CloudWatch Events, including cron-like schedules and other
resource lifecycle hooks in your AWS account (and also DynamoDB/Kinesis
streams, though I haven't used these), is awesome.

Cons:

\- The 5-minute timeout. There have been a couple of times when I thought this
would be fine, but then I hit it and it was a huge pain. If the task is
interruptible you can have the Lambda function re-trigger itself, which I've
done and which actually works pretty well once you set up the right IAM
policy, but it's extra complexity you really don't want to have to worry
about in every script.

\- The logging permissions are annoying; it's easy for a function to silently
fail to log to CloudWatch Logs if you haven't set up the IAM permissions
right. I like that it follows the usual IAM framework, but AWS should really
expose these errors somewhere.

\- I haven't found a good development/release flow for it. There's no built-in
way to re-use helper scripts or anything. There are a bunch of serverless app
frameworks, but they don't feel like they quite fit, because I don't have an
"app" in Lambda; I just have a bunch of miscellaneous triggers and glue tasks
that mostly have no relation to each other. It's very possible I should be
using one of them anyway, and it would change how I feel about this point.

We use Terraform for most AWS resources, but it's particularly bad for Lambda
because there's a compile step of creating a zip archive that Terraform
doesn't have a great way to do in-band.

Overall, Lambda is great as a super-simple shim if you only need to do one
simple, predictable thing in response to an event -- for example, the kind of
things that AWS really could add as a small feature but hasn't, like sending
an SNS notification to a Slack channel, or tagging an EC2 instance with
certain parameters when it launches into an autoscaling group.

For many kinds of background processing tasks in your app, or moderately
complex glue scripts, it will be the wrong tool for the job.

~~~
jelder
If you're already using the Node runtime,
[https://serverless.com/](https://serverless.com/) fixes all of your "cons"
perfectly. Terraform isn't a great fit, in my experience.

------
mfrye0
I started off doing manual build/deploy for a project and it was a total pain
in the ass: packaging the code, versioning, rollbacks, deploys. And that
doesn't even include setting up API Gateway if you want an endpoint for the
function.

Since then I've been using Serverless for all my projects and it's the best
thing I've tried thus far. It's not perfect, but now I'm able to abstract
everything away, as you configure pretty much everything from a .yml file.

With that said, there are still some rough spots with Lambda:

1) Working with env vars. The default is to store them in plain text in the
Lambda config. Fine for basic stuff, but I didn't want that for DB creds. You
can store them encrypted, but then you have to set up logic to decrypt them
in the function. Kind of a pain.

2) Working within a subnet to access private resources incurs an extra delay.
There is already a cold start time for Lambda functions, but accessing the
subnet adds more time... Apparently AWS is aware and is exploring a fix.

3) Monitoring could be better. Cloudwatch is not the most user friendly tool
for trying to find something specific.

With that said, as a whole Lambda is pretty awesome. We don't have to worry
about setting up EC2 instances, load balancing, auto scaling, etc. for a new
API. We can just focus on the logic, and we're able to roll out new stuff so
much faster. And our costs are pretty much nothing.

------
mgkimsal
I'm reading here about a lot of people jumping through massive amounts of
hoops to deal with a system that locks you in to a single vendor and makes it
hard to read logs or even read your own bill.

A few years back, the mantra was "hardware is cheap, developer time isn't".
When did this prevailing wisdom change? Why would people spend
hours/days/weeks wrestling with a system to save money that may take weeks,
months or even years to see an ROI?

~~~
okaramian
I think it's very dependent on the task. I do agree with you; building a
large/high-load system on Lambda is bad for the reasons you're suggesting. I
think a lot of the hoop jumping is due to experimentation (you won't know how
much of a pain in the ass a piece of technology is until you need to deal
with it).

We've mostly used it for small tasks that run once a day. It's been fantastic
for that, as putting up a box to handle a sparsely run task (one or a couple
of times a day) is a lot of work and is expensive.

------
falcolas
Most of my experience mirrors that found in other comments, so here's a few
unique quirks I've personally had to work around:

\- You can't trigger Lambda off SQS. The best you can do is set up a scheduled
Lambda and check the queue when it's kicked off.

\- Only one Lambda invocation can occur per Kinesis shard. This makes
efficiency and performance of that lambda function very important.

\- The triggering of Lambda off Kinesis can sometimes lag behind the actual
kinesis pipeline. This is just something that happens, and the best you can do
is contact Amazon.

\- Python - if you use a package that is namespaced, you'll need to do some
magic with the 'site' module to get that package imported.

\- The short execution timeout means you have to go to some ridiculous lengths
to process long-running tasks. Step Functions are a hack, not a feature, IMO.

\- It's already been said, but the API Gateway is shit. Worth repeating.

Long story short, my own personal preference is to simply set up a number of
processes running in a group of containers (ECS tasks/services, as one
example). You get more control and visibility, at the cost of managing your
own VMs and the setup complexity associated with that.

~~~
jdc0589
> You can't trigger Lambda off SQS. The best you can do is set up a scheduled
> lambda and check the queue when kicked off.

This kills me. I can't believe they haven't added this.

~~~
falcolas
IIRC, it's a philosophical argument that prevents it - SQS is pull, Lambda
triggers off push. They don't want to add a push mechanism to SQS, ergo they
can't support Lambda.

------
cntlzw
Pretty good actually. We started using AWS Lambda as a tool for a cron job.

Then we implemented a RESTful API with API Gateway and Lambda. The Lambdas
are straightforward to implement. API Gateway, unfortunately, does not have a
great user experience. It feels very clunky to use, and some things are hard
to find and understand. (Hint: request body passthrough and transformations.)

Some pitfalls we encountered:

With Java you need to consider the warmup time and memory needed for the JVM.
Don't allocate less than 512MB.

Latency can be hard to predict. A cold start can take seconds, but if you
call your Lambda often enough (often seems to mean every few minutes), things
run smoothly.

Failure handling is not convenient. For example, if your Lambda is triggered
from a Scheduled Event and fails for some reason, the Lambda gets triggered
again and again, up to three times.

So at the moment we have around 30 Lambdas doing their job. I'd say it's an
8/10 experience.
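
Given the retry behaviour described above, one mitigation is to make the
handler idempotent, e.g. by recording each event id once it's been handled. A
minimal sketch; the `store` here is a stand-in for something durable like a
DynamoDB conditional put, and its interface is invented for illustration:

```javascript
// Skip work that has already been recorded for this event id, so a retried
// Scheduled Event invocation doesn't repeat side effects.
async function runOnce(store, eventId, work) {
  if (await store.has(eventId)) return 'skipped';
  await work();
  await store.add(eventId); // mark done only after the work succeeded
  return 'done';
}
```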

------
Jdam
For running Java in Lambda, I had to optimize it for the platform. To decrease
processing time (and, in the end, the bill), I got rid of all reflection, for
example, and thought twice about when to initialize what and what to make
static. Also, the Java cold start is an issue. I fixed this by creating a
CloudWatch trigger that executes the Lambda function every minute to keep it
hot. Otherwise, after some minutes of no one calling the function, it takes
10+ seconds to respond. But if you use Python, for example, you don't run into
this issue. I built complete backends on top of Lambda/API Gateway/Dynamo, and
having "NoOps" that also runs very cheap is a killer argument for me.
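
The keep-warm trick described above can be sketched as a handler that
short-circuits on the scheduled ping. This is a Node.js illustration, and the
`warmup` field is an invented convention, not anything AWS-defined:

```javascript
// Handler that returns immediately for keep-warm pings from a CloudWatch
// scheduled rule, so the container stays warm without doing real work.
const handler = (event, context, callback) => {
  if (event && event.warmup) {
    return callback(null, 'warm'); // scheduled ping: do nothing
  }
  // ...real work would go here...
  callback(null, { status: 'ok' });
};
// In a deployed function this would be exported as `exports.handler`.
```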

~~~
manishsharan
If you are triggering it every minute, may I ask how much you are paying for
it per month? I have thought about this as well, as I have a Java + Spring +
Hibernate app that takes much too long to start up; in fact, the task
execution time is less than the cold start time in my case.

~~~
Jdam
Execution takes less than 100ms and I never exhausted my monthly free quota on
Lambda, so I can't say.

------
eknkc
We use Node.js Lambda functions for real-time image thumbnail generation and
scraping needs, as well as mirroring our S3 buckets to another blob storage
provider and a couple of periodic background jobs. It works beautifully. It's
a little hard to debug at first, but once it's set up, both pricing and
reliability are really good for our use cases.

I think a lot of people try to use the "serverless" stuff for unsuitable
workloads and get frustrated. We run a Kubernetes cluster for the main stuff,
but have been looking for areas suitable for Lambda and trying to move those.

~~~
bflesch
Can you share your image resizing code on github? We're using thumbor on AWS
but it is a huge PITA.

~~~
rebolyte
Setting up your own resizer using sharp[1] is pretty simple. Just make sure
you install the module in a Lambda-compatible environment, so it can build its
copy of libvips (native C library) correctly. I built and deployed my image
thumbnailer on a CentOS VM.

[1]: [https://github.com/lovell/sharp](https://github.com/lovell/sharp)
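
To give a flavour (this is not the parent's code): with sharp, the resize
itself is a single `sharp(input).resize(width, height)` call, so much of a
thumbnailer is just deciding the target box. A hedged sketch of that sizing
logic, fitting an image into a bounding box while preserving aspect ratio and
never upscaling:

```javascript
// Compute target dimensions that fit within maxW x maxH, preserving the
// aspect ratio and never scaling up.
function fitWithin(width, height, maxW, maxH) {
  const scale = Math.min(maxW / width, maxH / height, 1);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}
```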

------
davidvanleeuwen
After multiple side projects with Lambda (e.g. image processing services), we
finally implemented it on a larger scale. Initially we started out without any
framework or tool to help, because they were pretty much non-existent at the
time. We created our own tool, and used Swagger a lot for working with API
Gateway (because it is really bad to work with). Over time everything
smoothed out and really worked nicely (except for API Gateway, though).
Nowadays we have everything in Terraform and Serverless templates, which
really makes your life easier if you're going to build your complete
infrastructure on top of AWS Lambda and other AWS APIs. There are still a
bunch of quirks you have to work with, but at the end of the line: it works
and you don't have to worry much about scaling.

I'm not allowed to give you any numbers; here's an old blogpost about Sketch
Cloud: [https://awkward.co/blog/building-sketch-cloud-without-
server...](https://awkward.co/blog/building-sketch-cloud-without-servers/)
(however, this isn't accurate anymore). For this use case, concurrent
executions for image uploads are a big deal (a regular Sketch document can
easily consist of 100 images). But basically the complete API runs on
Lambda.

Running other languages on Lambda can easily be done and can be pretty fast,
because you simply use Node to spawn a process (Serverless has lots of
examples of that).

Let me know if you have any specific questions :-)

Hope this helps.

~~~
3minus1
why is API gateway so bad?

~~~
davidvanleeuwen
Well, using it manually is just cumbersome. API Gateway is not specifically
designed for Lambda, so it has lots of settings which you would think would
just be defaults for building your API. Using it through CloudFormation or
Serverless is way easier.

------
viw
I can only talk about the Node.js runtime with native add-ons. We use it for
various automation tasks (fewer than 100 invocations a day), where it is the
most convenient solution out there, for peanuts. We also use it for parsing
Swagger/API Blueprint files; here we're talking 200k+ invocations a day, and
it works great once we figured out logging/monitoring/error handling and the
limited output size (6MB). We don't use any framework, because they mostly
aren't flexible enough, but apex ([http://apex.run/](http://apex.run/)) serves
us well. We've hit some limits a couple of times, but as it is one invocation
per request, only some calls failed and the whole service was unaffected. I
see that isolation as a big benefit. One thing that sucks is that when it
fails (and it's not your code), you often have no idea why or whether anything
can be done. We use it together with AWS API Gateway, and the Gateway part is
subpar: it doesn't support HTTP correctly (e.g. a 204 always returns a body),
and god forbid you want something other than application/json. To sum up:
Lambda is great with some minor warts, and API Gateway is OK, but I can easily
imagine it being much better.

------
cjhanks
Developing for Lambda is an absolutely terrible experience. Tightening the
integration between CloudFormation, API Gateway, and Lambda would really
improve the situation. For example, a built-in way to map requests/responses
between API Gateway and Lambda that didn't involve a janky parsing DSL would
be pretty nice.
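
For what it's worth, API Gateway's Lambda proxy integration sidesteps the
mapping DSL: the function just returns an object with `statusCode`,
`headers`, and `body`. A small wrapper (similar in spirit to the one
mentioned in the top comment) lets each endpoint be a plain async function;
the names here are illustrative:

```javascript
// Wrap an async function so it produces an API Gateway proxy-integration
// response: the result is serialized as JSON, and thrown errors become
// error responses instead of unhandled failures.
function httpEndpoint(fn) {
  return async (event) => {
    try {
      const result = await fn(event);
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(result),
      };
    } catch (err) {
      return {
        statusCode: err.statusCode || 500,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ error: err.message }),
      };
    }
  };
}
```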

The strategy Lambda seems to suggest you implement for testing/development is
pretty laborious. There's no real clear way for you to mock operations on your
local system and that's a real bummer.

A lot of things you run into in Python Lambda functions are also fairly
unclear. Python often compiles C extensions... I could never figure out
whether there was a stable ABI, or what I could do to pre-compile things for
Lambda.

All of those complaints aside - once you deploy your app, it will probably
keep running until the day you die. So that's a huge upside. Once you rake
through the muck of terrible developer experience (which I admit, could be
unique to me), the service simply works.

So, if you have a relatively trivial application which does not need to be
upgraded often and needs very good up-time.. it's a very nice service.

------
lovehashbrowns
I've only been using it for one project right now. I made an API that I can
use to push security-related events to a location that a hacker couldn't
access, even if they get root on a local system. I use it in conjunction with
sec (Simple Event Correlator). If sec detects something, e.g. a user login or
a package install, it sends the event to the API in AWS Gateway + Lambda. The
event then gets stored in a DynamoDB table, and I use a dashing.io dashboard
to display the information. It works super well. I still need to convert my
awful NodeJS code to Python, but that shouldn't take long.

I do remember logging being a confusing mess when I was trying to get this
started. I feel better about the trouble I had now that I see it wasn't just
me. But for a side project that's very simple to use, Lambdas have been a
blessing. I get this functionality without having to manage any servers or
create my own API with something like Python+Flask. Having IAM and
authentication built in for me made the pain from the initial set-up so worth
it.

------
marcfowler
We use it with node for a bunch of things like PDF generation, asynchronous
calls to various HTTP services etc. I think it's excellent.

The worst part about it by far is CloudWatch, which is truly useless.

Check out [https://github.com/motdotla/node-
lambda](https://github.com/motdotla/node-lambda) for running it locally for
testing btw - saved us hours!
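
node-lambda's core convenience is invoking the exported handler locally with
a fake event and context; a bare-bones version of that runner might look like
the sketch below (the context fields are a commonly used subset, not the full
interface):

```javascript
// Invoke a Lambda-style handler locally, resolving with its callback result.
function invokeLocally(handler, event) {
  return new Promise((resolve, reject) => {
    const context = {
      functionName: 'local-test',
      getRemainingTimeInMillis: () => 300000, // pretend we have 5 minutes
      succeed: resolve,
      fail: reject,
    };
    handler(event, context, (err, result) => (err ? reject(err) : resolve(result)));
  });
}
```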

~~~
rebolyte
+1 for node-lambda. It's a lot simpler than Serverless when you just want a
little help with testing/deploying.

------
rowell
I've used it in production and we're building our platform entirely in
Serverless/AWS Lambda.

Here are my recommendations:

1) Use Serverless Framework to manage Functions, API-Gateway config, and other
AWS Resources

2) CloudWatch Logs are terrible. Auto-stream CloudWatch Logs to Elasticsearch
Service and use Kibana for log management

3) If using Java or other JVM languages, cold starts can be an issue.
Implement a health check that is triggered on a schedule to keep functions
used in real-time APIs warm

Here's a sample build project I use: [https://github.com/bytekast/serverless-
demo](https://github.com/bytekast/serverless-demo)

For more information, tips & tricks:
[https://www.rowellbelen.com/microservices-with-aws-lambda-
an...](https://www.rowellbelen.com/microservices-with-aws-lambda-and-the-
serverless-framework/)

------
meekins
We're doing both stream processing and small query APIs using Lambda.

A few pointers (from relatively short experience):

\- The best use case for Lambda seems to be stream processing, where latency
due to start-up times is not an issue

\- For user/application-facing logic, the major issues seem to be start-up
times (esp. JVM startup times when doing Java, or when your API gets called
very rarely) and API Gateway configuration management using
infrastructure-as-code tools (I'd be interested in good hints about this,
especially concerning interface changes)

\- The programming model is very simple and nice, but it seems to make most
sense to split each API over multiple Lambdas to keep them as small as
possible, or to use some serverless framework to make managing the whole app
easier

\- This goes without saying, but be sure to use CI and do not deploy local
builds (native binary deps)

------
ransom1538
I used it for converting images to BPG format and doing resizing. I really
enjoyed it. Basically, with Docker/Lambda these days I feel like the future
will be 'having code' and then 'running it' (no more ssh, puppet,
kuberdummies, bash, vpc, drama). Once Lambda can run a Dockerfile it might
take over Middle-earth. These were my issues with Lambda:

1\. Installing your own Linux modifications isn't trivial (we had to install
the BPG encoder). They use a strange version of the Amazon Linux AMI.

2\. Lambda can listen to events from S3 (creation, deletion, ...) but can't
listen to SQS events. WTF? It seems like Amazon could fix this really easily.

3\. Deployment is wonky. To add a new Lambda zip file you need to delete the
current one. This can take up to 40 seconds (during which you have total
downtime).

------
zurn
For a serverless system that uses Lambda together with e.g. CloudFormation,
Dynamo, S3, Cognito etc., it's pretty low-level and you spend a lot of time
understanding, refining & debugging basic things. The end-to-end logging and
instrumentation throughout the services used by your app weren't great.

It doesn't like big app binaries/JARs, and Amazon's API client libs are
bloated - Clojure + Amazonica easily goes over the limit if you don't
manually exclude some of Amazon's SDK JARs from the package.

On the plus side, you can test all the APIs from your dev box using the cli or
boto3 before doing it from the lambda.

Would probably look into third party things like Serverless next time.

~~~
baader
With Clojure I uploaded a ~50 KB compiled JAR, with multiple handlers in one
function. Execution times average 15ms.

------
shakna
\- Cheap, especially for low usage.

\- Runs fast, unless your function was frozen due to low usage or the like

\- Easy to deploy and/or "misuse"

\- Debugging doesn't really work

All in all, probably the least painful thing I've used on AWS. But that
doesn't necessarily mean much.

------
maephisto
A couple of months ago, I started using AWS Lambda for a side project. The
actual functions were pretty easy to code using `nodejs` and deploy with
`serverless`, but the boilerplate for exposing them via an HTTP API was the
real bummer: IAMs, routing, and all kinds of other little things standing in
the way of actual productive work. Some time after that I tried to set up
GCloud Functions, and to my surprise the boilerplate was minimal! Write your
function and have it accessible with just a couple of commands. IMHO GCloud
Functions is way more developer-friendly than AWS Lambda.

~~~
gulliG
I work on GCloud Functions. Happy to help if you need support.

------
alexbilbie
If you need to store environment variables easily and securely, take a look
at EC2 Parameter Store - you can fetch the relevant parameters on startup,
and they are automatically encrypted and decrypted for you using KMS.
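
The fetch-on-startup pattern can be sketched like this; the `ssm` client is
injected for illustration (normally an `AWS.SSM` instance, with `.promise()`
on each call) and the parameter names are made up. Caching outside the
handler means warm invocations skip the lookup:

```javascript
// Fetch (and cache) decrypted parameters once per container.
let cached = null;
async function getConfig(ssm) {
  if (cached) return cached;
  const res = await ssm.getParameters({
    Names: ['/myapp/db-password', '/myapp/api-key'], // hypothetical names
    WithDecryption: true, // KMS decryption happens server-side
  });
  cached = Object.fromEntries(res.Parameters.map((p) => [p.Name, p.Value]));
  return cached;
}
```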

------
rlv-dan
A session I remember that might be of interest:

Building reactive systems with AWS Lambda:
[https://vimeo.com/189519556](https://vimeo.com/189519556)

------
ajeet_dhaliwal
It's been great for use as 'glue' to do small tasks like clean-ups in our
case, or other short-lived minor tasks. I haven't used it for anything major,
though, only for minor tasks that are easier or more convenient to do with
Lambda than another way. The real value comes from the integration with other
AWS services; for example, for developers using DynamoDB, Lambdas make a lot
of record maintenance far easier via stream events.

------
djhworld
Have used it in production for > 2 years, mainly for ETL/Data processing type
jobs which seems to work well.

We also use it to perform scheduled tasks (e.g. every hour) which is good as
it means you don't have to have an EC2 instance just to run cron like jobs.

The main downside is CloudWatch Logs: if you have a Lambda that runs very
frequently (i.e. 100,000+ invocations a day), the logs become painful to
search through, and you end up having to export them to S3 or Elasticsearch.

------
aeikenberry
We run many microservices on Lambda and it has been a pleasant experience for
us. We use Terraform for creating functions and managing environment
variables, permissions, log groups, etc. We use CodeShip for testing, and for
validating and applying Terraform across multiple accounts and environments.

For logging, we pipe all of our logs out of CloudWatch to LogEntries with a
custom Lambda, although looking at CloudWatch logs works fine most of the
time.

~~~
digitalsanctum
"looking at CloudWatch logs works fine most of the time"...are you kidding? I
wanna gouge my eyes out with a rusty spoon every time I have to look at CW
logs :)

------
eloycoto
I've been using it for a year and a half, and I'm more than happy. The cost
increases when you have a lot of load, but I'm a happy user for these small
applications that need to be always up.

That said, you should use gordon
([https://github.com/jorgebastida/gordon](https://github.com/jorgebastida/gordon))
to manage it; Gordon makes the process easier.

Regards

------
jakozaur
So far I've used it only for toy/infrequent use cases, and it works well
there, e.g. Slack commands, integrations between different systems, and
cron-style jobs.

------
rmccue
Pretty great, we're using it for resizing and serving images for our clients
(large media companies, banks, etc): [https://hmn.md/2017/04/27/scaling-
wordpress-images-tachyon/](https://hmn.md/2017/04/27/scaling-wordpress-images-
tachyon/)

API Gateway is a little rougher, but slowly getting there.

------
Dawny33
\- We use Lambda along with Ansible to execute huge, distributed ML workloads
which are (completely) serverless. Saves a lot of bucks, as ML needs huge
boxes.

\- For serverless APIs that query the S3 output of the above workload

Difficulties faced with Lambda(till now):

1\. No way to do CD for Lambda functions. [Not yet using SAM]

2\. Lambda launches in its own VPC. Is there a way to make AWS launch my
lambda in my own VPC? [Not sure.]

~~~
zceee12
2\. You can! Although it's a bit tedious. You also need to ensure the
function has EIP allocation permissions.

------
StreamBright
We have moved our database maintenance cron jobs to Lambda, as well as the
image resize functionality. The general experience has been very positive
once we figured out how to use Lambda from Clojure and Java. For people
worried about JVM startup times: Lambda will keep your JVM up and running for
~7 minutes after the initial request, so you can achieve low latency easily.

------
jonathanbull
We use Lambda extensively at
[https://emailoctopus.com](https://emailoctopus.com). The develop-debug cycle
takes a while, but once you're up and running, the stability is hard to beat.
Just wish they'd raise that 5 minute execution limit so we can migrate a few
more scripts.

------
adrianpike
It's good. We're using it for a ton of automation of various developer tasks
that normally would get run once in a while (think acceptance environment
spinup, staging database load, etc.).

It fails once in a while and the experience is bad, but that's mostly due to
our tooling around failure states rather than the platform itself.

------
dlanger
Hey everyone, I'm Daniel Langer and I help build lambda monitoring products
over at Datadog. I see lots of you are unhappy with the current monitoring
solutions available to you. If anyone has thoughts on what they'd like in a
Lambda monitoring service feel free to email me at daniel.langer@datadoghq.com

~~~
PandaHeathen
I spent a while talking to the Datadog guys at the AWS Sydney summit in April
and while the product was compelling, nobody could give me a straight answer
on pricing for lambda - all the pricing is given in terms of per host. So I'd
have to say the main thing I'm after is pricing transparency...

------
tommy5dollar
Been using it for about 6 months with Serverless for Node API endpoints and
it's great so far!

The only negatives are:

\- cold start is slow, especially from within a VPC

\- debugging/logging can be a pain

\- giving a function more memory (~1GB) always seems to be better (I'm
guessing because of the extra CPU)

------
ahmednasir91
\- When using it with API Gateway, the API response time is more than 2-3
seconds for a Node.js Lambda; for Java it will be more.

\- Good for use cases such as:

\-- cron - can be triggered using CloudWatch Events

\-- a Slack command bot (API Gateway + Lambda); the only problem is the
timeout

------
betimd
I'm running around 4 million Lambda executions per month, mostly for data
processing, and overall I'm happy with the performance and ease of
deployment. Debugging is hard, and frameworks are still very immature. I use
the AWS SDK and C#, and I'm having quite a good experience.

------
erikcw
Lots of great comments here. I'd like to add that being limited to 512MB of
working disk space at /tmp has been a stumbling block for us.

Would be really great to have this configurable along with CPU/memory.

Additionally, being able to mount an EFS volume would be very useful!

~~~
khc
Is there any reason why you can't use S3?

~~~
erikcw
Sure, and that's what I do most of the time.

However, if the S3 objects are larger than ~500MB, then it's not possible to
process them in any way with Lambda, since there isn't enough "scratch space"
available in /tmp.

I was suggesting EFS support merely because it would allow access to
arbitrarily large amounts of "local" disk to work with...

~~~
khc
I didn't know you need scratch space to work with S3 in Lambda? What about
something like s3fs/goofys?

------
akhatri_aus
\- There is a surprisingly high amount of API Gateway latency

\- The CPU power available seems really weak. Simple loops running in Node.js
run way, way slower on Lambda than on a 1.1 GHz MacBook, by a significant
magnitude. This is despite scaling the memory up to near 512MB.

\- Certain elements, such as DNS lookups, take a very long time.

\- The CloudWatch logging is a bit frustrating. If you have a cron job, it
will sometimes lump several time periods into a single log file and other
times keep them separate. If you run a lot of them, it's hard to manage.

\- It's impossible to terminate a running script.

\- The 5-minute timeout is 'hard'; if you process cron jobs or the like,
there isn't flexibility for, say, 6 minutes. It feels like 5 minutes is
arbitrarily short. For comparison, Google Cloud Functions lets you run for 9
minutes, which is more flexible.

\- The environment variable encryption/decryption is a bit clunky; they
don't manage it for you - you have to actually decrypt the values yourself.

\- There is a 'cold start' issue: once in a while your Lambda functions will
take a significant amount of time to start up, about 2 seconds or so, which
ends up being passed on to a user.

\- Versions of the environment are updated very slowly. Only last month (May)
did AWS add support for Node v6.10, after a very buggy version of Node v4
(the implementation had a lot of TLS bugs).

\- There is a version of Node that can run on AWS CloudFront as a CDN tool. I
have been waiting quite literally 3 weeks for AWS to get back to me on
enabling it for my account. They have kept me up to date and passed it on to
the relevant team, and so forth. It just seems an overly long time to get
access to something advertised as working.

\- If you don't pass an error result in the callback, the function will run
multiple times; it won't just display the error in the logs. But there is no
clarity on how many times or when it will re-run.

\- There's no easy way to manage parallel Lambda executions, i.e. to see
whether two Lambda functions are doing the same thing when they are executed
at the exact same time.

\- You can create cron jobs using an AWS CloudWatch rule, which is a bit of
an odd implementation: CloudWatch can create timing triggers to run Lambda
functions despite being a logging tool. Overall there are many ways to
trigger a Lambda function, which is quite appealing.

The big issue is speed & latency. Basically, it feels like Amazon is falling
right into what they're incentivised to do - make it slower (since it's
charged per 100ms).
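
On the environment-variable point in the list above: with KMS-encrypted
variables, the handler receives ciphertext in `process.env` and has to
decrypt it itself, typically once per container. A sketch with an injected
`kms` client (normally an `AWS.KMS` instance, with `.promise()` on the call);
the variable involved is hypothetical:

```javascript
// Decrypt a base64 KMS ciphertext (e.g. process.env.DB_PASSWORD) and cache
// the plaintext for the lifetime of the container.
let dbPassword = null;
async function getDbPassword(kms, ciphertextB64) {
  if (dbPassword) return dbPassword;
  const res = await kms.decrypt({
    CiphertextBlob: Buffer.from(ciphertextB64, 'base64'),
  });
  dbPassword = res.Plaintext.toString('utf8');
  return dbPassword;
}
```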

PS: If anyone has a good model/provider for 'serverless SQL databases',
kindly let me know. The RDS design is quite pricey, with constantly running
DBs (at least in terms of the way you pay for them).

~~~
philliphaydon
RDS always running was just recently fixed...

[https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-
rd...](https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-rds-supports-
stopping-and-starting-of-database-instances/)

------
unkoman
\- Decouple lambdas with queues and events, SQS, SNS and S3 events are your
friends here

\- Use environment variables

\- Use Step Functions to create state machines

\- Deploy using cloudformation templates and serverless framework

~~~
falcolas
Since you cannot trigger Lambda from SQS, I don't recommend using Lambda with
SQS. You end up having to use some other tool to trigger the Lambda
invocation, and then polling SQS from there.

Better to use a stream that can trigger Lambdas natively - like SNS or
Kinesis.

------
forgottenacc57
In the end it's only the web server that is serverless; you still need other
servers depending on your use case. And hey, web servers aren't that hard to
run anyway.

------
synthc
We had a bad experience: we accidentally made an error in one function that
got called a lot, which blocked other functions from running. Yay for
isolation!

------
obrit
I'd like to use Lambda@Edge to add headers to my CloudFront responses. Does
anybody have any idea when this might be released from preview?

------
lunch
Can anyone speak to continuous deployments with Lambda, where downtime is not
an option? Is it possible to run blue green deployments?

~~~
chickenbane
If downtime is an not an option, Lambda is not the solution. Amazon will
automatically shut down instances of your Lambda if they haven't been used
(~5-15 minutes), and starting a fresh Lambda has noticeable latency. So, some
people have resorted to periodically querying their Lambda to prevent this.
However, occasionally Amazon will reset all Lambdas, which will force the hard
restart.

~~~
lunch
I'm aware of the increased startup latency for functions that haven't been
recently used, but that's not the same as downtime or dropped connections.

------
david90
I tried using Lambda, but you need to set up API Gateway before using it as
well. Painful logging and parameter forwarding.

~~~
philliphaydon
That makes no sense, API Gateway can proxy requests directly to lambda, and in
C# there's Nancy / Web API middleware.

------
forgottenacc57
Lots of people are saying API Gateway is hard.

 _You don't need to use API Gateway._

Just talk directly to Lambda.

------
caseymarquis
Any comparisons to GCE and Azure's offerings for those who have used both?

------
gaius
I've only played with it as opposed to deploying to prod, but give Azure
Functions a try too: [https://azure.microsoft.com/en-
us/services/functions/](https://azure.microsoft.com/en-us/services/functions/)

------
goldenkey
Works terribly. It's basically a thin wrapper around a shoddy jar framework.
All the languages supported are basically shit-farmed from the original Java
one. The Java one is the only one that works half decently.

~~~
philliphaydon
This sounds like you touched it for all of 30 seconds, hated it, and formed
an opinion.

I use both NodeJS and C# Lambdas without issue. The support is really good.

The debugging experience isn't great, but aside from that it's fast and easy
to use.

C# Lambdas can call RDS and respond back in ~5ms...

(before anyone calls me out on the 5ms...)

[http://www.philliphaydon.com/2017/05/10/part5-configuring-
th...](http://www.philliphaydon.com/2017/05/10/part5-configuring-the-vpc-so-
we-can-call-the-database/)

In the last image, I state a 2-3 second startup and then a 4ms response on a
call to the database.

