1. Took too long to get something working. The common use case of hooking up a Lambda function to an HTTP endpoint is surprisingly fiddly and manual.
2. Very painful logging/monitoring.
3. The Node.js version of Lambda has a weird and ugly API that feels like it was designed by a committee with little knowledge of Node.js idioms.
4. The Serverless framework produces a huge bundle unless you spend a lot of effort optimising it. It's also very slow to deploy incremental changes. Edit: this is not only due to the large bundle size but also due to having to re-up the whole generated CloudFormation stack for most updates.
5. It was worth it in the end for making a useful little service that will exist forever with ultra-low running costs, but the developer experience could have been miles better, and I wouldn't want to have to work on that codebase again.
Edit: here's the code: https://github.com/Financial-Times/ig-images-backend
To address point 3 above, I wrote a wrapper function (in src/index.js) so I could write each HTTP Lambda endpoint as a straight async function that simply receives a single argument (the request event) and asynchronously returns the complete HTTP response. This wouldn't be good if you were returning a large response though; you'd probably be better streaming it.
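For anyone on the Python runtime, the same wrapper idea can be sketched roughly as below. This is not the code from src/index.js: `http_endpoint` and `hello` are made-up names, it is synchronous rather than async, and the response shape assumes API Gateway's Lambda proxy integration.

```python
import functools
import json


def http_endpoint(func):
    """Wrap a function that takes the request event and returns (status, body)."""

    @functools.wraps(func)
    def handler(event, context):
        try:
            status, body = func(event)
        except Exception as exc:  # turn any crash into a well-formed 500 response
            status, body = 500, {"error": str(exc)}
        # Response shape assumed here: API Gateway's Lambda proxy integration.
        return {
            "statusCode": status,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(body),
        }

    return handler


@http_endpoint
def hello(event):
    # Hypothetical endpoint: echoes a query string parameter.
    params = event.get("queryStringParameters") or {}
    return 200, {"hello": params.get("name", "world")}
```

Each endpoint then only deals with the event and its return value; the Lambda-specific plumbing lives in one place.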
2) Logging is indeed painful! You definitely need a separate tool/system for that. I have created a small CLI tool to view logs of multiple Lambdas which have been deployed using a CloudFormation template: https://github.com/seeebiii/lambdalogs This does not replace a good external system, but it can help for small searches in the logs.
4) Yes, this takes a lot. Though I'm not using the Serverless framework, deploying the code using CloudFormation takes me about 2:30 minutes (with a project using Java Lambdas), because CF is doing lots of checks in the background. I also wrote a tool for this to decrease the waiting time and just update the JS/Java code instead of the whole stack: https://github.com/seeebiii/lambda-updater
Hope this helps you or someone else a bit!
Hopefully this isn't just Stockholm syndrome speaking though...
 - https://iopipe.com
 - https://github.com/iopipe/iopipe/
Here's a talk that walks through some of our features: https://www.youtube.com/watch?v=TgB-fs1hwlw&t=18s
Disclosure: I'm a Program Manager on Azure Functions.
* setup is 100% completely clicky-clicky UI driven, which was a huge pain to scale. instantiation of a Function on behalf of a developer for production use was a huge time sink
* it's clearly a thin veneer on Azure Web Services, and the abstractions leak badly in the portal (deployment credentials, for example)
* the web UI breaks completely and mysteriously if you enable authentication
* management of service princi- uh, I mean, Azure AD Applications was weird, and the (internal to MSFT, I suspect) permissions model to the Graph API was a huge barrier to ease of use
* management of NPM packages required me to start a terminal session in the UI and run commands manually, which was a huge turnoff (and had to be repeated ad nauseam with every new Function created)
* the configuration files for the runtime are utterly undocumented, with the sole exception of the bits used to plug Azure inputs/outputs together. this makes automating things exceedingly difficult. I recall there even being a magic value in the topmost config file
* the edit-commit-push-test cycle was VERY slow, with new commits sometimes taking tens of minutes to "appear" in my function
* I never found a way to run it locally, making the previous point that much worse
* log output is very difficult to find, and can live in a few different places. I spent too much time hunting for errors, especially things like syntax errors that make the runtime itself go kaboom. This was the thing that really killed it for me; if I had an error that resulted in anything but a "clean" return, it was torture trying to figure out where I'd missed the paren.
- You can create a Function App via ARM/CLI/etc., you can write functions without ever touching the portal. See https://docs.microsoft.com/en-us/azure/azure-functions/funct.... You can also now use Visual Studio to author C# functions: https://docs.microsoft.com/en-us/azure/azure-functions/funct...
- It’s true that Functions is built on App Service, but I see that as an advantage. You get all the great features of Continuous Integration, custom domains, automated deployment, etc.
- Indeed, the portal does not do well when auth is enabled and all routes are protected. The problem is that the portal calls admin APIs that are also protected, so it fails. We now have better error messages for this, and we’re tracking this bug: https://github.com/Azure/azure-functions-ux/issues/499
- The Graph API issue is probably not specific to Functions, but it is a bit easier with the Authentication/Authorization feature. Can you provide more detail?
- You can install npm packages at the "root" of your Function and not reinstall them for each Function, just like a normal Node.js app - it walks the directories.
- Our documentation is much better now, and we even have documentation for all bindings in the portal. We also have much better conceptual docs on bindings, see https://docs.microsoft.com/en-us/azure/azure-functions/funct.... We’d welcome any specific feedback on docs that are missing.
- CI should be faster now, it usually takes about 2-3 minutes for commits to show up. It’s fast enough that I’ve demo’d it.
- Logs definitely weren't great. Initially, they always went to table storage, but the ones you see streaming in the portal get written to disk to enable the realtime portal stream - they are only written to disk when you're in the portal, so they are "sometimes" there. The good news is that we've tightly integrated Application Insights, which means logs are easy to find. It's easy to alert on failed functions. You can see perf and metric data all in one place without log parsing. For a demo, go to the 6 minute mark of this video: https://www.youtube.com/watch?v=TgB-fs1hwlw&t=6m
I use it to handle a contact form on a static web site. It works really well.
2. Dear heavens yes. I ended up building a wrapper (similar to what you did to address 3) that handles logging, etc for any internal events. Everything else is a pass / fail check
3. Also had to build a wrapper. Context / Callback params are confusing
4. I wouldn't use Serverless unless you need it. Try something smaller. Apex is a nice, simple start. Shameless plug: I built a deployment tool because serverless wouldn't work for us and I wanted something in node (no binary like Apex - integrate into our build process) 
My #1 concern with it went away a while back when Amazon finally added support for Python 3 (3.6).
It behaved as advertised: Allowed us to scale without worrying about scaling. After a year of using it however I'm really not a big fan of the technology.
It's opaque. Pulling logs, crashes and metrics out of it is like pulling teeth. There's a lot of bells and whistles which are just missing. And the weirdest thing to me is how people keep using it to create "serverless websites" when that is really not its strength -- its strength is in distributed processing; in other words, long-running CPU-bound apps.
The dev experience is poor. We had to build our own system to deploy our builds to Lambda. Build our own canary/rollback system, etc. With Zappa it's better nowadays although for the longest time it didn't really support non-website-like Lambda apps.
It's expensive. You pay for invocations, you pay for running speed, and all of this is super hard to read on the bill (which function costs me the most and when? Gotta do your own advanced bill graphing for that). And if you want more CPU, you have to also increase memory; so right now our apps are paying for hundreds of MBs of memory we're not using just because it makes sense to pay for the extra CPU. (2x your CPU to 2x your speed is a net-neutral cost, if you're CPU-bound).
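The "net-neutral" arithmetic can be checked with a quick sketch. The prices below are assumed list prices, not numbers from this thread, and the sketch assumes CPU scales linearly with memory and ignores billing granularity.

```python
# Assumed list prices (not from the thread); check the current AWS pricing page.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def invocation_cost(memory_mb, duration_s):
    """Cost of one invocation: a flat request fee plus memory x time."""
    gb_seconds = (memory_mb / 1024) * duration_s
    return PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# A CPU-bound task: doubling memory doubles CPU, so the duration halves.
small = invocation_cost(memory_mb=512, duration_s=10.0)
large = invocation_cost(memory_mb=1024, duration_s=5.0)
```

Both configurations bill the same number of GB-seconds, so `small` and `large` come out equal; the extra memory is only "wasted" in the sense that you never touch it.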
But the kicker in all this is that the entire system is proprietary and it's really hard to reproduce a test environment for it. The LambCI people have done it, but even so, it's a hell of a system to mock and has a pretty strong lock-in.
We're currently moving some S3-bound queue stuff into SQS and dropping Lambda at the same time could make sense.
I certainly recommend trying Lambda as a tech project, but I would not recommend going out of your way to use it just so you can be "serverless". Consider your use case carefully.
> its strength is in distributed processing; in other words, long-running CPU-bound apps.
it seems to me that's an explicit non-usecase for Lambda given it limits sessions to < 5min per invocation.
Like I said we use it for game replay processing, so that's 5-15 second tasks that read and parse log files and hit a db and s3 with the results.
Other suitable tasks: image resizing, API chatter, bounded video transcoding, etc. Lambda is pretty good at distributed processing (as long as you can make the bill work in your favour, over hosting your own overprovisioned fleet).
Personally, I think it's all fucking ridiculous the amount of effort you have to spend into reading your own bill.
The biggest pain point I have is because we have multiple "environments" (such as dev, da, staging) in the same amazon account and because lambdas are global I can't limit access to resources via IAM easily without hacks.
Aka, because the same lambda will be used (but different versions and/or aliases) on all environments I can't marry the code and configuration to limit access to say RDS or elasticache or an S3 bucket per lambda.
I feel like I need a higher order primitive (a lambda group that a role + configuration can live in and that includes the lambdas) to achieve this. I realize API Gateway has the concept of stages but currently the idea is for some lambdas to be invoked directly by the monolithic app or via SNS/SQS async.
Otherwise I could namespace my lambda functions which is hacky and make DevFooBar, StageFooBar, etc.
Currently we plan to split off our environments into separate AWS accounts.
Today, given that aliases/versions do not have any sort of different permissions, you are probably best to either run completely different stacks of resources or use the multiple account model. With AWS Organizations these days it's not that hard to run multiple environments across accounts.
We did a webinar on some of this a few months back, the slides here might be useful to you: https://www.slideshare.net/AmazonWebServices/building-a-deve...
It heavily leverages AWS's tools, but you could create similar practices using 3rd party frameworks and CI/CD tools as well.
Then the IAM policies are written with resource access to prod_*, staging_*, etc.
This lets us give developers full permissions to create dev ones and modify the others, while the prod_ ones are all controlled by a smaller group of people.
It's a bit hacky but it works well enough.
Would be nicer to grant access by stages.
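A rough sketch of what a prefix-scoped policy could look like, generated from Python. The helper name `lambda_policy` and the choice of actions are illustrative assumptions, not the poster's actual policies.

```python
import json


def lambda_policy(prefix, region="*", account_id="*"):
    """Build an IAM policy document limited to functions named <prefix>*."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Illustrative actions; pick the ones each group really needs.
                "Action": ["lambda:UpdateFunctionCode", "lambda:InvokeFunction"],
                "Resource": f"arn:aws:lambda:{region}:{account_id}:function:{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(lambda_policy("dev_"), indent=2))
```

Attaching the `dev_` version to every developer and the `prod_` version to a smaller group gives the split described above.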
One thing to note. API Gateway is super picky about your response. When you first get started you may have a Lambda that runs your test just fine but fails on deployment. Make sure you troubleshoot your response rather than diving into your code.
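One way to troubleshoot the response first is a small local check like the sketch below. It assumes the Lambda proxy integration (the thread doesn't say which integration type is used), and the function name is made up; the checks are deliberately conservative.

```python
def proxy_response_problems(response):
    """Return a list of problems with a Lambda proxy integration response."""
    if not isinstance(response, dict):
        return ["response must be a JSON object (dict)"]
    problems = []
    status = response.get("statusCode")
    if isinstance(status, bool) or not isinstance(status, int):
        problems.append("statusCode should be an integer")
    if "body" in response and not isinstance(response["body"], str):
        problems.append("body must be a string (json.dumps it first)")
    if not isinstance(response.get("headers", {}), dict):
        problems.append("headers must be an object (dict)")
    return problems
```

Running your handler's return value through this in a unit test catches the classic mistake of returning a dict as the body before you ever deploy.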
I saw some people complaining about using an archaic version of Node. This is no longer true. Lambdas support Node V6 which, while not bang up to date, is an excellent version.
Anyway, I can attest it is production ready and at least in our usage an order of magnitude cheaper.
Lambdas have a lot of benefits - for occasional tasks they are essentially free, the simple programming model makes them easy to understand in teams, you get Amazon's scaling and there's decent integration with caching and logging.
However, especially since I had to use them for a whole solution, I ran into a ton of limitations. Since they are so simple, you have to pull in a lot of dependencies which negate a lot of the ease of understanding I mentioned before. The dependencies are things like Amazon's API Gateway, AWS Step Functions, and AWS CLI itself, which is pretty low-level. So now, the application logic is pretty easy, but you are dealing with a lot of integration devops. API Gateway is pretty clunky and surprisingly slow. Lambdas shut themselves down, and restarting is slow. The Step Functions have a relatively small payload limit that needs to be worked around. Etc. So use them sparingly!
2. Don't put the Lambda inside a VPC if you want lower response times
3. Step Functions don't seem ready for prime time that I can tell. (This might have changed in the last couple of months)
4. Lambda Functions should be microservices. Small and lean.
5. There is a limit on resources for CloudFormation so at about 20-30 functions with API Gateway on the serverless framework, you will hit a limit and can't add anymore (other deployment tools which don't use CloudFormation shouldn't have an issue)
6. Want more CPU, add more RAM
It may be time to test them out again; I just got bitten pretty badly the last time I implemented them and lost about a week's worth of work because of it being unusable.
Response times are fine inside a VPC... People keep saying gateway is slow but it isn't from my experience...
I am by no means an expert and am reporting what watching the time differences between Lambda reporting and our network calls shows.
Edit: this is by no means a deal breaker or anything. Just something that shocked me when I first noticed it.
I haven't tried with a NAT Gateway.
This is what I know.
If the response is small, the request duration is small.
Cloudfront > Gateway > Lambda > RDS (PostgreSQL)
Response size | Request duration
1k | 20ms-60ms
10k | 50ms-90ms
100k | 150ms-200ms
350k | 200ms-450ms
That's a rough gauge of what I've experienced.
I think the throughput on the gateway is the bottleneck.
I am not sure what happened, but I had to move a couple of functions inside the VPC and our response times have remained the same.
Lambda run time (140 ms)
Total waiting time (354ms)
Total time (358ms)
Lambda run time (300 ms)
Total waiting time (490ms)
Total time (567ms)
Lambda run time (139 ms)
Total waiting time (479ms)
Total time (485ms)
This is for a 20kb payload single request.
Stack: Custom Domain -> Cloudfront -> API Gateway -> VPC -> Lambda -> NAT Gateway -> ElasticSearch
- CPU power also scales with Memory, you might need to increase it to get better responses
- Ability to attach many streams (Kinesis, Dynamo) is very helpful, and it scales easily without explicitly managing servers
- There can be an overhead: your function gets paused (if no data is incoming) or can be killed nondeterministically (even if it runs all the time or every hour), which causes a cold start, and cold starts are very bad for Java
- You need to keep your JARs small (under 50MB); you cannot just embed anything you like without careful consideration
https://github.com/AlexanderC/lambdon (i know the name sucks)
Also @chetanmelkani, as a hint: if you are using the NodeJS runtime, the best setting for both execution time and cost efficiency is 512MB of memory ;) It gives about a 2x performance boost over the 128MB configuration.
Claudia.js also has an API layer that makes it look very similar to express.js versus the weird API that Amazon provides. I would not use lambda + JS without claudia.
For usage scenarios, one endpoint is used for a "contact us" form on a static website, another we use to transform requests to fetch and store artifacts on S3. I can't speak toward latency or high volume but since I've set them up I've been able to pretty much forget about them and they work as intended.
Lots of the examples and articles around this process are out of date and AWS's web front end can be painful to deal with. That said, when everything was setup, it was pretty straight forward to maintain.
Do you have any links to good tutorials on Claudia? I'd love to setup a contact me form using lambda for a project I'm working on.
Also, do you know how Claudia compares to stuff like serverless.js?
You can find tutorials and examples on:
- Claudia.js website - https://claudiajs.com
- Claudia Github examples - https://github.com/claudiajs/example-projects
The purpose of Claudia.js is just to make it super easy to develop and deploy your applications on AWS Lambdas, API Gateway, also ease up the work with DynamoDb, AWS IoT, Alexa and so on.
There are two additional libraries: Claudia API Builder and Claudia Bot Builder, to ease up API and chat bot development and deployment.
Regarding the contact form - the best approach is to create a single service that will handle all the contact form requests. At that point, you can either connect it to DynamoDb, or even call some other data storage / service.
Both Serverless and Claudia have their points where they shine. For a better understanding of their comparison, you can read about it in the Claudia FAQ - https://github.com/claudiajs/claudia/blob/master/FAQ.md#how-...
Development can be tricky. There are a lot of all-in-one solutions like the Serverless framework; we use the Apex CLI tool for deploying and Terraform for infra. These tools offer a nice workflow for most developers.
Logging is annoying, it's all cloudwatch, but we use a lambda to send all our cloudwatch logs to sumologic. We use cloudwatch for metrics, however we have a grafana dashboard for actually looking at those metrics. For exceptions we use Sentry.
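For the log-forwarding Lambda, the part that trips people up is that CloudWatch Logs subscription events arrive base64-encoded and gzipped. A minimal sketch of the decoding step follows; the actual forwarding to your log service is left out, and the output field names are my own choice.

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Unpack a CloudWatch Logs subscription event into plain log records."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    return [
        {
            "group": payload["logGroup"],
            "stream": payload["logStream"],
            "timestamp": entry["timestamp"],
            "message": entry["message"],
        }
        for entry in payload["logEvents"]
    ]


def handler(event, context):
    records = decode_log_events(event)
    # Forwarding is left out: send `records` to whatever log service you use.
    return {"decoded": len(records)}
```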
Resources have bitten us the most: suddenly not enough memory because of the payload from a download. I wish lambda allowed for scaling on a second attempt so that you could bump its resources; this is something to consider carefully.
Encryption of environment variables is still not a solved issue, if everyone has access to the AWS console, everyone can view your env vars, so if you want to store a DB password somewhere, it will have to be KMS, which is not a bad thing, this is usually pretty quick, but does add overhead to the execution time.
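A sketch of the KMS route in Python, assuming the variable holds base64-encoded ciphertext. `decrypted_env` and the variable name are hypothetical; caching at module level means the KMS call is only paid once per warm container rather than on every invocation.

```python
import base64
import os

_cache = {}  # survives between invocations while the container stays warm


def decrypted_env(name, kms_client=None):
    """Return the plaintext of a KMS-encrypted, base64-encoded env var."""
    if name not in _cache:
        if kms_client is None:
            import boto3  # provided by the Lambda Python runtime

            kms_client = boto3.client("kms")
        blob = base64.b64decode(os.environ[name])
        result = kms_client.decrypt(CiphertextBlob=blob)
        _cache[name] = result["Plaintext"].decode("utf-8")
    return _cache[name]
```

Inside a handler you would call something like `decrypted_env("DB_PASSWORD")`; the function's role needs permission to decrypt with that key.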
* Games are developed as command line tools which use JSON for input and output. They're pure so the game state is passed in as part of the request. An example is my implementation of Lost Cities
* Games are automatically bundled up with a NodeJS runner and deployed to Lambda using Travis CI
* I use API Gateway to point to the Lambda function, one endpoint per game, and I version the endpoints if the game data structures ever change.
* I have a central API server which I run on Elastic Beanstalk and RDS. Games are registered inside the database and whenever players make plays, Lambda functions are called to process the play.
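The "command line tool with JSON in and out" pattern can be sketched like this. The poster uses a NodeJS runner; this is a Python stand-in, and `DEMO_GAME` is a fake one-line game, not a real one.

```python
import json
import subprocess
import sys


def run_game(command, request):
    """Send a JSON request to a command line tool and parse its JSON reply."""
    completed = subprocess.run(
        command,
        input=json.dumps(request).encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return json.loads(completed.stdout)


# Stand-in for a bundled game binary: a one-line script that applies a play.
DEMO_GAME = [
    sys.executable,
    "-c",
    "import json, sys; r = json.load(sys.stdin); "
    "print(json.dumps({'state': r['state'] + [r['play']]}))",
]


def handler(event, context):
    # The game is pure: the full state arrives with the request.
    return run_game(DEMO_GAME, event)
```

Because the state travels with the request, the function needs no storage of its own and any number of copies can run in parallel.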
I'm also planning to run bots as Lambda functions similar to how games are implemented, but am yet to get it fully operational.
Apart from stumbling a lot setting it up, I'm really happy with how it's all working together. If I ever get more traction it'll be interesting to see how it scales up.
Terrible deploy process, especially if your package is over 50mb (then you need to get S3 involved). Debugging and local testing is a nightmare. Cloudwatch Logs aren't that bad (you can easily search for terms).
We have been using Lambdas in production for about a year and a half now, to do 5 or so tasks, ranging from indexing items in Elasticsearch to small CRON clean up jobs.
One big gripe around Lambdas and integration with API Gateway is they totally changed the way it works. It used to be really simple to hook up a lambda to a public facing URL so you could trigger it with a REST call. Now you have to do this extra dance with configuring API Gateway per HTTP resource, therefore complicating the Lambda code side of things. Sure, with more customization you have more complexity associated with it, but the barrier to entry was significantly increased.
I was initially attracted to it as a low-cost tool to run a database (RDS) powered service side project.
- Zappa is a great tool. They added async task support which replaced the need for celery or rq. Setting up https with let's encrypt takes less than 15 minutes. They added Python 3 support quickly after it was announced. Setting up a test environment is pretty trivial. I set up a separate staging site which helps to debug a bunch of the orchestration settings. I also built a small CLI to help set environment variables (heroku-esque) via S3 which works well. Overall, the tooling feels solid. I can't imagine using raw Lambda without a tool like Zappa.
- While Lambda itself is not too expensive, AWS can sneak in some additional costs. For example, allowing Lambda to reach out to other services in the VPC (RDS) or to the Internet requires a bunch of route tables, subnets and a NAT gateway. For this side project, this currently costs way more than running and invoking Lambda.
- Debugging can be a pain. Things like Sentry make it better for runtime issues, but orchestration issues are still very trial and error.
- There can be overhead if your function goes "cold" (i.e. infrequent usage). Zappa lets you keep sites warm (additional cost), but a cold start adds a couple of seconds to the first-page load for that user. This applies more to low volume traffic sites.
Overall: It's definitely overkill for a side project like this, but I could see the economies of scale kicking in for multiple or high volume apps.
Lots more features in the pipeline, too!
I haven't used it in a huge production environment, but it's definitely my go to way of handling APIs in side projects and other related things.
- No straight way to prevent retries. (Retries can crazily increase your bill if something goes wrong)
- API gateway to Lambda can be better. (For one, Multipart form-data support for API gateway is a mess)
- (For NodeJs) I don't see why the node_modules folder should be uploaded. (Google cloud functions downloads the modules from the package.json)
So you don't end up with a leftpad-like event. Control and ship your dependencies.
But here's the fun part - if you want to just upload your code and make it download+deploy dependencies, you can do it using your own lambda function :-)
Exactly! Especially if you're using modules that include some sort of binary and build your function on macOS it's a pain -- I ended up using a Docker-based workflow to get the correct binaries into the node_modules.
Let's say you depend on libfoo. It can be obtained via system package, built from sources (with 3 different feature switches), or your language's package can simulate the effect without the native libfoo but it will take longer. Who knows what "it needs to do"?
This is not something anyone but you can answer. There could be some nice wrapper that warns you about libraries you use, but you have to make the decision.
> Why would you upload node_modules
It is required to: http://docs.aws.amazon.com/lambda/latest/dg/nodejs-create-de...
The retry one is new to me, need to read more about it.
Thanks for the info.
Anyways, I'd recommend starting from learning the tools without using a framework first. You can find two coding sessions I published on Youtube.
One thing to be careful of: if you're targeting input into dynamodb table(s), then it's really easy to flood your writes. Same goes for SQS writes. You might be better off with a data pipeline, and slower progress. It really just depends on your use case and needs. You may also want to look at running tasks on ECS; depending on your needs that may go better.
For some jobs the 5-minute limit is a bottleneck, for others it's the 1.5GB memory. Just depends on exactly what you're trying to do. If your jobs fit in Lambda constraints, and your cold start time isn't too bad for your needs, go for it.
Here's a recent, interesting article on the topic that quantifies some of this: https://read.acloud.guru/does-coding-language-memory-or-pack...
- works as advertised, we haven't had any reliability issues with it
- responding to Cloudwatch Events including cron-like schedules and other resource lifecycle hooks in your AWS account (and also DynamoDB/Kinesis streams, though I haven't used these) is awesome.
- 5 minute timeout. There have been a couple times when I thought this would be fine, but then I hit it and it was a huge pain. If the task is interruptible you can have the lambda function re-trigger itself, which I've done and actually works pretty well once you set up the right IAM policy, but it's extra complexity you really don't want to have to worry about in every script.
- The logging permissions are annoying, it's easy for it to silently fail logging to Cloudwatch Logs if you haven't set up the IAM permissions right. I like that it follows the usual IAM framework but AWS should really expose these errors somewhere.
- haven't found a good development/release flow for it. There's no built-in way to re-use helper scripts or anything. There are a bunch of serverless app frameworks, but they don't feel like they quite fit because I don't have an "app" in Lambda I just have a bunch of miscellaneous triggers and glue tasks that mostly don't have any relation to each other. It's very possible I should be using one of them anyway and it would change how I feel about this point.
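The self re-trigger trick for the 5 minute timeout mentioned above can be sketched as follows. The event shape (`{"items": [...]}`), the `process` placeholder, and the 10 second margin are assumptions for illustration; the function's role needs permission to invoke itself.

```python
import json

SAFETY_MARGIN_MS = 10_000  # assumed: hand over when under 10 s remain


def process(item):
    """Placeholder for the real unit of work."""
    return item


def handler(event, context, lambda_client=None):
    pending = list(event["items"])
    while pending:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            if lambda_client is None:
                import boto3  # provided by the Lambda Python runtime

                lambda_client = boto3.client("lambda")
            # Asynchronous self-invocation carrying the unfinished work.
            lambda_client.invoke(
                FunctionName=context.function_name,
                InvocationType="Event",
                Payload=json.dumps({"items": pending}).encode("utf-8"),
            )
            return {"status": "continued", "remaining": len(pending)}
        process(pending.pop(0))
    return {"status": "done", "remaining": 0}
```

This only works when the work splits into items that each finish well inside the safety margin, which is exactly the "interruptible" caveat.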
We use Terraform for most AWS resources, but it's particularly bad for Lambda because there's a compile step of creating a zip archive that terraform doesn't have a great way to do in-band.
Overall Lambda is great as a super-simple shim if you only need to do one simple, predictable thing in response to an event. For example, the kind of things that AWS really could add as a small feature but hasn't like send an SNS notification to a slack channel, or tag an EC2 instance with certain parameters when it launches into an autoscaling group.
For many kinds of background processing tasks in your app, or moderately complex glue scripts, it will be the wrong tool for the job.
a few years back, the mantra was "hardware is cheap, developer time isn't". when did this prevailing wisdom change? Why would people spend hours/days/weeks wrestling with a system to save money which may take weeks, months or even years to see an ROI?
We've mostly used it for small tasks that will get run once a day. It's been fantastic for that, as putting up a box to handle a sparsely (one or a couple of times a day) run task is a lot of work and is expensive.
yes, 95% of the time this is accurate. Hardware is only a small part of what you are paying for though. You are also paying for: the actual lambda platform and completely transparent hardware support + replacement, security patching, feature updates, reliability guarantees, etc...
If anything that supports your statement. Time and people are expensive.
- You can't trigger Lambda off SQS. The best you can do is set up a scheduled lambda and check the queue when kicked off.
- Only one Lambda invocation can occur per Kinesis shard. This makes efficiency and performance of that lambda function very important.
- The triggering of Lambda off Kinesis can sometimes lag behind the actual kinesis pipeline. This is just something that happens, and the best you can do is contact Amazon.
- Python - if you use a package that is namespaced, you'll need to do some magic with the 'site' module to get that package imported.
- Short execution timeouts means you have to go to some ridiculous ends to process long running tasks. Step functions are a hack, not a feature IMO.
- It's already been said, but the API Gateway is shit. Worth repeating.
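The scheduled-polling workaround for SQS from the first point can be sketched like this. `drain_queue`, the batch cap, and the `QUEUE_URL` variable are illustrative assumptions; the client calls are the standard boto3 SQS ones.

```python
def drain_queue(sqs_client, queue_url, process, max_batches=10):
    """Poll an SQS queue from a scheduled Lambda and delete handled messages."""
    handled = 0
    for _ in range(max_batches):
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 per call
            WaitTimeSeconds=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            break
        for message in messages:
            process(message["Body"])
            # Delete only after processing succeeds, so failures get redelivered.
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            handled += 1
    return handled


def handler(event, context):
    import os

    import boto3  # provided by the Lambda Python runtime

    # QUEUE_URL is a hypothetical environment variable.
    return drain_queue(boto3.client("sqs"), os.environ["QUEUE_URL"], print)
```

The cap on batches keeps a single scheduled run from blowing through the execution timeout when the queue is deep.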
Long story short, my own personal preference is to simply set up a number of processes running in a group of containers (ECS tasks/services, as one example). You get more control and visibility, at the cost of managing your own VMs and the setup complexity associated with that.
This kills me. I can't believe they haven't added this.
Then we implemented a RESTful API with API Gateway and Lambda. The Lambdas are straightforward to implement. API Gateway unfortunately does not have a great user experience. It feels very clunky to use and some things are hard to find and understand. (Hint: Request body passthrough and transformations).
Some pitfalls we encountered:
With Java you need to consider the warmup time and memory needed for the JVM. Don't allocate less than 512MB.
Latency can be hard to predict. A cold start can take seconds, but if you call your Lambda often enough (often looks like minutes) things run smooth.
Failure handling is not convenient. For example, if your Lambda is triggered from a Scheduled Event and fails for some reason, it gets triggered again and again, up to three times.
So at the moment we have around 30 Lambdas doing their job. Would say it is an 8/10 experience.
How much does it cost to receive 3600x24x31 calls per month? I'd start from that.
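A back-of-the-envelope sketch for that question. The prices are assumed list prices (not from this thread), the free tier is ignored, and the 128 MB / 100 ms workload is made up.

```python
# Assumed list prices (not from the thread), ignoring the free tier.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00001667

calls_per_month = 3600 * 24 * 31  # one call per second for 31 days


def monthly_cost(calls, memory_mb, duration_s):
    """Request fee plus compute fee for a month of invocations."""
    request_fee = calls / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_fee = calls * (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND
    return request_fee + compute_fee


# Hypothetical workload: 128 MB functions that each run for 100 ms.
estimate = monthly_cost(calls_per_month, memory_mb=128, duration_s=0.1)
```

That is 2,678,400 calls, and under these assumptions it comes to roughly a dollar a month for Lambda itself; API Gateway, data transfer, and anything in a VPC are billed on top.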
I think a lot of people try to use the "serverless" stuff for unsuitable workloads and get frustrated. We are running a kubernetes cluster for the main stuff but have been looking for areas suitable for lambda and try to move those.
I'm not allowed to give you any numbers; here's an old blogpost about Sketch Cloud: https://awkward.co/blog/building-sketch-cloud-without-server... (however, this isn't accurate anymore). For this use-case, concurrent executions for image uploads is a big deal (a regular Sketch document can easily exist out of 100 images). But basically the complete API runs on Lambda.
Running other languages on Lambda can be easily done and can be pretty fast, because you simply use node to spawn a process (Serverless has lots of examples of that).
Let me know if you have any specific questions :-)
Hope this helps.
Since then I've been using Serverless for all my projects and it's the best thing I've tried thus far. It's not perfect, but now I'm able to abstract everything away as you configure pretty much everything from a .yml file.
With that said, there are still some rough spots with Lambda:
1) Working with env vars. Default is to store them in plain text in the Lambda config. Fine for basic stuff, but I didn't want that for DB creds. You can store them encrypted, but then you have to setup logic to decrypt in the function. Kind of a pain.
2) Working within a subnet to access private resources incurs an extra delay. There is already a cold start time for Lambda functions, but to access the subnet adds more time... Apparently AWS is aware and is exploring a fix.
3) Monitoring could be better. Cloudwatch is not the most user friendly tool for trying to find something specific.
With that said, as a whole Lambda is pretty awesome. We don't have to worry about setting up ec2 instances, load balancing, auto scaling, etc for a new api. We can just focus on the logic and we're able to roll out new stuff so much faster. Then our costs are pretty much nothing.
The strategy Lambda seems to suggest you implement for testing/development is pretty laborious. There's no real clear way for you to mock operations on your local system and that's a real bummer.
A lot of things you run into in Python lambda functions are also fairly unclear. Python often will compile C-extensions... I could never figure out if there was really a stable ABI or what I could do to pre-compile things for Lambda.
All of those complaints aside - once you deploy your app, it will probably keep running until the day you die. So that's a huge upside. Once you rake through the muck of terrible developer experience (which I admit, could be unique to me), the service simply works.
So, if you have a relatively trivial application which does not need to be upgraded often and needs very good up-time.. it's a very nice service.
I do remember logging being a confusing mess when I was trying to get this started. I feel better about the trouble I had now that I see it wasn't just me. But for a side project that's very simple to use, Lambdas have been a blessing. I get this functionality without having to manage any servers or create my own API with something like Python+Flask. Having IAM and authentication built in for me made the pain from the initial set-up so worth it.
The worst part about it by far is CloudWatch, which is truly useless.
Check out https://github.com/motdotla/node-lambda for running it locally for testing btw - saved us hours!
Here are my recommendations:
1) Use Serverless Framework to manage Functions, API-Gateway config, and other AWS Resources
2) CloudWatch Logs are terrible. Auto-stream CloudWatch Logs to Elasticsearch Service and use Kibana for log management
3) If using Java or other JVM languages, cold starts can be an issue. Implement a health check that is triggered on a schedule to keep functions used in real-time APIs warm
Here's a sample build project I use: https://github.com/bytekast/serverless-demo
For more information, tips & tricks: https://www.rowellbelen.com/microservices-with-aws-lambda-an...
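For recommendation 3, the keep-warm check can be a few lines at the top of the handler. A minimal sketch in Python (the same idea applies to JVM handlers), assuming the schedule is a CloudWatch Events rule, whose events carry `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; `handle_request` is a hypothetical placeholder for the real API logic:

```python
def is_warmup_ping(event):
    """True for scheduled CloudWatch Events invocations (used here as pings)."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handle_request(event):
    # Hypothetical stand-in for the real request handling.
    return {"statusCode": 200, "body": "hello"}


def handler(event, context):
    if is_warmup_ping(event):
        # Return early: the ping keeps the container alive without real work.
        return {"warmed": True}
    return handle_request(event)
```

Note this only keeps one container warm per ping; concurrent requests can still hit a cold start.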
A few pointers (from relatively short experience):
- The best use case for Lambda seems to be stream processing, where latency due to start-up times is not an issue
- For user/application-facing logic the major issues seem to be start-up times (esp. JVM start-up times when using Java, or when your API gets called very rarely) and API Gateway configuration management using infrastructure-as-code tools (I'd be interested in good hints about this, especially concerning interface changes)
- The programming model is very simple and nice, but it seems to make most sense to split each API over multiple Lambdas to keep them as small as possible, or to use some serverless framework to make managing the whole app easier
- This goes without saying, but be sure to use CI and do not deploy local builds (native binary deps)
1. Installing your own Linux modifications isn't trivial (we had to install the BPG encoder). They use a strange version of the Linux AMI.
2. Lambda can listen to events from S3 (creation, deletion, ...) but can't seem to listen to SQS events. WTF? It seems like Amazon could fix this really easily.
3. Deployment is wonky. To add a new Lambda zip file you need to delete the current one. This can take up to 40 seconds (during which you have total downtime).
Doesn't like big app binaries/JARs, and Amazon's API client libs are bloated - Clojure + Amazonica easily goes over the limit if you don't manually exclude some of Amazon's API SDKs from the package.
On the plus side, you can test all the APIs from your dev box using the CLI or boto3 before doing it from the Lambda.
Would probably look into third party things like Serverless next time.
- Runs fast, unless your function was frozen for lack of usage or the like
- Easy to deploy and/or "misuse"
- Debugging doesn't really work
All in all, probably the least painful thing I've used on AWS. But that doesn't necessarily mean much.
Building reactive systems with AWS Lambda: https://vimeo.com/189519556
We also use it to perform scheduled tasks (e.g. every hour) which is good as it means you don't have to have an EC2 instance just to run cron like jobs.
The main downside is CloudWatch Logs. If you have a Lambda that runs very frequently (i.e. 100,000+ invocations a day), the logs become painful to search through and you end up having to export them to S3 or Elasticsearch.
For logging, we pipe all of our logs out of CloudWatch to LogEntries with a custom Lambda, although looking at CloudWatch logs works fine most of the time.
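A log-forwarding Lambda like that mostly has to unpack what CloudWatch Logs hands it: subscription events arrive as base64-encoded, gzipped JSON under `event["awslogs"]["data"]`. A rough sketch of the unpacking step; the `forward` hook is hypothetical and would be replaced by a call to whatever log service you ship to:

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Unpack a CloudWatch Logs subscription event into flat records."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed).decode("utf-8"))
    return [
        {
            "group": payload["logGroup"],
            "stream": payload["logStream"],
            "timestamp": entry["timestamp"],
            "message": entry["message"],
        }
        for entry in payload["logEvents"]
    ]


def handler(event, context, forward=print):
    # `forward` is a placeholder hook; by default records are just printed.
    records = decode_log_events(event)
    for record in records:
        forward(json.dumps(record))
    return len(records)
```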
I should add that you should use gordon (https://github.com/jorgebastida/gordon) to manage it; Gordon makes the process easier.
API Gateway is a little rougher, but slowly getting there.
- For serverless APIs for querying the S3 which is a result of the above workload
Difficulties faced with Lambda (till now):
1. No way to do CD for Lambda functions. [Not yet using SAM]
2. Lambda launches in its own VPC. Is there a way to make AWS launch my lambda in my own VPC? [Not sure.]
It fails once in a while and the experience is bad, but that's mostly due to our tooling around failure states instead of the platform itself.
The only negatives are:
- cold start is slow, especially from within a VPC
- debugging/logging can be a pain
- giving a function more memory (~1GB) always seems to be better (I'm guessing because of the extra CPU)
Would be really great to have this configurable along with CPU/memory.
Additionally, being able to mount an EFS volume would be very useful!
However, if the S3 objects are larger than ~500 MB, then it's not possible to process them in any way with Lambda since there isn't enough "scratch space" available in /tmp.
I was suggesting EFS support merely because it would allow access to arbitrarily large amounts of "local" disk to work with...
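For the subset of jobs that can be done in a single pass (checksums, counting, line-oriented parsing), reading the object as a stream sidesteps /tmp entirely. A sketch under that assumption: `body` is any file-like object with `read(n)`, which is what boto3's `get_object(...)["Body"]` provides; `summarize_stream` is an illustrative name, and the function timeout still applies:

```python
import hashlib


def summarize_stream(body, chunk_size=1024 * 1024):
    """Read a file-like object chunk by chunk, holding one chunk at a time."""
    digest = hashlib.sha256()
    total_bytes = 0
    line_count = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break  # end of stream
        digest.update(chunk)
        total_bytes += len(chunk)
        line_count += chunk.count(b"\n")
    return {
        "bytes": total_bytes,
        "lines": line_count,
        "sha256": digest.hexdigest(),
    }
```

Anything that needs random access to the whole file (e.g. most archive or media tooling) won't fit this pattern, which is where the scratch space limit really bites.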
- The CPU power available seems to be really weak. Simple loops in Node.js run significantly slower on Lambda than on a 1.1 GHz MacBook. This is despite scaling the memory up to near 512 MB.
- Certain elements, such as DNS lookups, take a very long time.
- The CloudWatch logging is a bit frustrating. If you have a cron job it will lump some time periods together as a single log file, other times they're separate. If you run a lot of them it's hard to manage.
- It's impossible to terminate a running script.
- The 5 minute timeout is 'hard', if you process cron jobs or so, there isn't flexibility for say 6 minutes. It feels like 5 minutes is arbitrarily short. For comparison Google Cloud Functions let you work 9 minutes which is more flexible.
- The environment variable encryption/decryption is a bit clunky, they don't manage it for you, you have to actually decrypt it yourself.
- There is a 'cold' start where once in a while your Lambda functions will take a significant amount of time to start up, about 2 seconds or so, which ends up being passed to a user.
- Versions of the environment are updated very slowly. Only last month (May) did AWS add support for Node v6.10, after having a very buggy version of Node v4 (a lot of TLS bugs were in the implementation)
- There is a version of Node that can run on AWS CloudFront as a CDN tool. I have been waiting quite literally 3 weeks for AWS to get back to me on enabling it for my account. They have kept up to date with me and passed it on to the relevant team in further contact and so forth. It just seems an overly long time to get access to something advertised as working.
- If you don't pass an error result in the callback, the function will run multiple times. It won't just display the error in the logs. But there is no clarity on how many times or when it will re-run.
- There aren't ways to run Lambda functions in a way where it's easy to manage parallel tasks, i.e. to see if two Lambda functions are doing the same thing if they are executed at the exact same time.
- You can create cron jobs using an AWS CloudWatch rule, which is a bit of an odd implementation: CloudWatch can create timing triggers to run Lambda functions despite being a logging tool. Overall there are many ways to trigger a Lambda function, which is quite appealing.
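The re-run and parallel-task bullets above both come down to making the work idempotent yourself. A minimal sketch of a "claim before you work" guard; `InMemoryClaims` is a hypothetical stand-in that only works within one process, and a real setup would need a shared store with an atomic put-if-absent (a DynamoDB conditional write is one option):

```python
class InMemoryClaims:
    """Stand-in for a shared store offering an atomic put-if-absent."""

    def __init__(self):
        self._seen = set()

    def claim(self, key):
        # Returns True only for the first caller to claim this key.
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def run_once(event_id, store, work):
    """Run `work` only if `event_id` has not been claimed before."""
    if not store.claim(event_id):
        return None  # retry or concurrent twin: skip the work
    return work()
```

Choosing the `event_id` is the hard part; it has to be something stable across retries, such as an ID carried in the triggering event.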
The big issue is speed & latency. Basically it feels like Amazon is falling right into what they're incentivised to do - make it slower (since it's charged per 100ms).
PS: If anyone has a good model/providers for 'Serverless SQL databases' kindly let me know. The RDS design is quite pricey, to have constantly running DBs (at least in terms of the way to pay for them)
I have heard that Go will be supported in the next three months and there's a lot of improvements coming. Can't wait to ditch those JS and Python wrappers.
- Use environment variables
- Use Step Functions to create state machines
- Deploy using CloudFormation templates and the Serverless Framework
Better to use a stream that can trigger Lambdas natively - like SNS or Kinesis.
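With SNS as the trigger, the handler just walks the records in the event; each one carries the published message under `record["Sns"]["Message"]`. A small sketch, assuming publishers send JSON bodies (that part is an assumption, SNS itself only delivers strings):

```python
import json


def extract_sns_messages(event):
    """Return the raw message strings from an SNS-triggered Lambda event."""
    return [
        record["Sns"]["Message"]
        for record in event.get("Records", [])
        if record.get("EventSource") == "aws:sns"
    ]


def handler(event, context):
    # Assumption: every message body is a JSON document.
    jobs = [json.loads(raw) for raw in extract_sns_messages(event)]
    # Real processing of each job would go here.
    return jobs
```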
You do have a cold start issue as mentioned above but if that isn't an issue, then you shouldn't see any down time.
We run a beta / prod system to do testing, and then for blue / green we deploy a second function and switch the API Gateway over when we are good to go. Pretty straightforward.
You don't need to use the API Gateway.
Just talk directly to Lambda.
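From Python, for example, that means calling the Lambda `Invoke` API through boto3. A rough sketch, assuming the caller has IAM permission to invoke the function and that the function returns JSON; `call_lambda` is an illustrative wrapper and the client is passed in (normally `boto3.client("lambda")`):

```python
import json


def call_lambda(lambda_client, function_name, payload):
    """Synchronously invoke a function and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = response["Payload"].read()
    # "FunctionError" is set when the function itself raised or timed out.
    if response.get("FunctionError"):
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8')}")
    return json.loads(body)
```

This works well for service-to-service calls; browsers and third parties without AWS credentials are the cases where API Gateway still earns its keep.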
I use both Node.js and C# Lambdas without issue. The support is really good.
Debugging experience isn't great but aside from that it's fast and easy to use.
C# Lambdas can call RDS and respond back in ~5ms...
(before anyone calls me out on the 5ms...)
Last image, I state 2-3 second startup and then 4ms response on a call to the database.