Ask HN: Have you shipped anything serious with a “serverless” architecture?
329 points by freedomben on June 23, 2018 | 201 comments
I've been watching the rise and maturing of AWS Lambda and similar offerings with excitement. I've also shipped several microservices in both Node and Java that are entirely serverless, making use of API Gateway, Lambda, DynamoDB, SQS, Kinesis, and others.

For the simple case, I found the experience to be great. Deployment was simple and made use of shell scripts and the excellent AWS CLI.

I've been hesitant to build anything serious with it, though. The primary concern has been visibility into the app; its operation can be quite opaque when deployed this way. Further exacerbating the issue, we've lost CloudWatch logs and other reporting a few times due to both configuration issues and improper error handling, and these are things that would have been much easier to identify and diagnose on a real server.

Have you shipped anything serious with a serverless architecture? Has scaling and cost been favorable? Did you run into any challenges? Would you do it again?

@ mabl (mabl.com) we've been running a serverless backend on Google Cloud Functions for over a year. It's handled 600M function calls/month without much trouble. Our findings are:

• Eventually you need to promote a function to a legitimate service for cost savings (e.g. GCF → AppEngine Node service)

• You need a buffer (e.g. GCF outscales services it calls) such as Pub/Sub

• Multi-repo/project layout is best for deployment speed, but needs extra dev/CI tooling to simplify boilerplate

• Minimizing costs can be creative/tricky compared to legacy services

• GCF is great for automatic stats and logging (Stackdriver) and "it just works" configs, compared to Lambda

• Don't move everything to cloud functions, just the parts that are a good fit

We've gotten good uptime using cloud functions, but we're always pushing for more nines. Since the functions tie together a bunch of backend Pub/Sub queues and services/stores, a brief cold start or queue backup has no notable impact on the overall system latency or throughput.

BTW, the coolest feature of AWS Lambda I've found is tying it to SES/SNS for inbound and outbound email routing. I've been running my personal email for years through a Lambda function for a few cents a year.

Overall the space is rapidly evolving and we'll see lots more features on Azure Functions, AWS Lambda, and Google Cloud Functions. See our learnings [1].

[1] https://www.slideshare.net/JosephLust/going-microserverless-...

Any links on how to implement your email routing setup?

You just have to verify your email address in SES (https://docs.aws.amazon.com/ses/latest/DeveloperGuide/verify...) and then you can use SES to send emails on your behalf. You might also want to file a support ticket to get out of their sandbox environment, as it has a lot of restrictions. And then it's very simple with Boto3, for example: https://pastebin.com/raw/hC6ihZZx
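
Something along these lines, if you're curious — a minimal sketch of sending through SES with Boto3 (the addresses are placeholders, and the sender must already be verified in SES):

```python
def build_message(subject, body):
    """Assemble the Message structure that SES send_email expects."""
    return {
        "Subject": {"Data": subject},
        "Body": {"Text": {"Data": body}},
    }

def send(source, to_addresses, subject, body):
    # boto3 is imported here so the pure helper above works offline too.
    import boto3
    ses = boto3.client("ses")
    return ses.send_email(
        Source=source,
        Destination={"ToAddresses": to_addresses},
        Message=build_message(subject, body),
    )
```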

Here is the script I've been using [1].

[1] https://github.com/arithmetric/aws-lambda-ses-forwarder

I'm curious about that as well. Does he have a Lambda that sends email he gets to a spool file somewhere? What does that workflow look like? What's the advantage over just using a regular email client or Gmail?

The advantage is collecting email from disparate addresses (e.g. admin@your-domain.com) and forwarding them all to your preferred Gmail accounts. You can't set up ACM or other cert services without that to prove ownership. Previously I paid ~$100/yr to get all these mail forwarding routes set up. You can also capture inbound mail directly to a queue (e.g. unsubscribe@your-domain.com) and feed it to a Lambda to take action.
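
The routing core of such a forwarder is tiny. A hedged sketch, assuming SES delivers inbound mail to Lambda via an SNS notification (all addresses below are hypothetical):

```python
import json

# Hypothetical routing table: inbound address -> real mailbox.
FORWARD_MAP = {
    "admin@your-domain.com": "you@gmail.com",
    "unsubscribe@your-domain.com": "you+unsub@gmail.com",
}

def resolve_destination(recipient, default="you@gmail.com"):
    """Pick the forwarding target for an inbound SES recipient."""
    return FORWARD_MAP.get(recipient.lower(), default)

def handler(event, context):
    # The SES receipt (including the recipient list) arrives as a JSON
    # string inside the SNS record.
    record = json.loads(event["Records"][0]["Sns"]["Message"])
    targets = [resolve_destination(r) for r in record["receipt"]["recipients"]]
    # ...here you'd fetch the raw message from S3 and re-send via SES...
    return targets
```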

If you need simple email forwarding you can use this https://forwardemail.net (it's FOSS)

So yeah basically serverless is just too expensive. It's cheap or free for low volume, to get you hooked, but for any "serious" service it is too expensive.

Developer time is the highest cost. Building and deploying a function in an hour minimizes time to market and opportunity cost. Lifting and shifting to a dedicated service like AppEngine is a "good problem to have" once the function's usage exceeds specific thresholds. However, if you never reach those thresholds (e.g. only a few thousand calls a day), you haven't spent the time/money setting up dedicated infrastructure.
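
To make that threshold concrete, here's a back-of-envelope sketch. The per-GB-second and per-million-request rates approximate Lambda's public pricing at the time; the instance cost is a placeholder, so treat the numbers as illustrative only:

```python
# Pay-per-invocation cost vs. a fixed-price instance, very roughly.
PRICE_PER_GB_SECOND = 0.0000166667   # approximate Lambda compute rate
PRICE_PER_MILLION_CALLS = 0.20       # approximate Lambda request rate

def lambda_monthly_cost(calls, gb_seconds_per_call):
    compute = calls * gb_seconds_per_call * PRICE_PER_GB_SECOND
    requests = calls / 1_000_000 * PRICE_PER_MILLION_CALLS
    return compute + requests

def break_even_calls(instance_monthly_cost, gb_seconds_per_call):
    """Monthly call volume above which the fixed instance is cheaper."""
    per_call = gb_seconds_per_call * PRICE_PER_GB_SECOND \
        + PRICE_PER_MILLION_CALLS / 1_000_000
    return instance_monthly_cost / per_call
```

At a few thousand calls a day the function costs pennies; it takes millions of calls a month before a dedicated instance wins on price alone.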

Spinning up an app engine service is incredibly easy and, if using standard environment, is basically free until you have load. It auto scales from zero very quickly (milliseconds).

Wow, this is great insight on the economics. Thanks!

That just isn't true. For many companies the cost of their IT infrastructure is a very small fraction of their annual turnover. It all depends on the economics of your particular case whether or not this makes sense. For some companies it does, for some it doesn't.

At some point the fixed cost of real servers does become appealing.

That point comes way further along, assuming you're talking about bare metal and not simply cloud instances.

For my workloads, serverless functions would end up costing 4-5x more than operating the infrastructure myself. Given that my infrastructure costs exceed the salary of some number of developers, that's a very real difference.

Probably makes a lot of sense for early-stage, though.

It is literally tech debt, in the credit card sense. You get to validate the market faster, but if it takes off then you incur a higher interest rate (it gets expensive quickly after a point)

If you're really iterating quickly, much of the code may have a short shelf life, so you can also see it as leveraging tech debt, since a fair amount of the code may not be needed in 3 months.

I have spent the last year and a half building a completely serverless production service on Lambda, API Gateway, and DynamoDB (along with the standard auxiliary services like CW, SNS, Route53, S3, CF, X-Ray, etc.). It was a lot of work establishing new patterns for many of the operational aspects, particularly custom CW metrics and A/B deployments with Lambda traffic shifting, but in the end everything is set up nicely and I'm quite pleased with the end result. We're starting to ramp up traffic now by orders of magnitude (with many more to come) and it's soooooo awesome knowing the stack is pretty much bombproof. Another super-nice thing is all internal authentication and networking being controlled by IAM rather than security groups/VPC/traditional networking - that aspect alone eliminates a tremendous number of headaches.

My biggest complaints are probably DynamoDB eventual consistency (unavoidable when using GSIs), occasional CloudFormation funkiness (though no urgent prod issues yet, thankfully), CodeDeploy CW alarm rollback jankiness (which doesn't tell you which alarm triggered a rollback!!), and lambda coldstarts. But none of these are too terribly concerning and I have faith they'll get incrementally better over time, hopefully.

The biggest cautionary tip I have is we run all our Lambdas with the max 3GB memory both for peace-of-mind and because the underlying EC2 instances have significantly faster CPU. We were seeing weird timeouts and latency initially with <1GB memory, so I'd be hesitant to run the service if the extra cost of using the biggest possible instances is a concern, which for us it is not.

Another cost concern I should also mention is that we mitigate cold starts by running multiple canaries using scheduled lambdas (in addition to the standard canary role of generating a baseline of metrics and immediately detecting/alarming on end-to-end issues). We are effectively maintaining a constant warm pool which, in theory anyway, greatly decreases the chances customer traffic will hit cold starts. I'm not intimately involved with the financial aspects but I suspect achieving the same effect with EC2 would be significantly cheaper, at least with respect to infrastructure costs. I would guess, though, that the developer time savings achieved by massively reduced ops burden and overall system simplicity are probably comparable to the increased infrastructure cost, and very possibly hugely outweighing it.

Your setup, although appealing from an AWS ecosystem perspective, sounds a bit expensive to me on a first reading. Of course everything depends on the specifics, but Lambda and DynamoDB are expensive at scale. I wonder how it compares cost-wise to a more traditional solution.

Yes, it's definitely expensive. But, as I stated, cost isn't much of a concern in our particular situation, at least not now. I'll also point out that the entire team is about 1/3 the size of other teams running comparable non-serverless services in production so there is a massive cost savings with respect to developer salaries. There are also multiple viable ways to incrementally migrate to more cost-effective implementations that I've detailed in other comments.

Indeed, I always get the feeling you need some sort of exit strategy to a traditional model for when your service starts taking off. Never did the cost calculations, though.

I've felt the same way. Serverless is most appealing when you're starting out and have low traffic. It enabled us at ipdata.co to have the most global infrastructure possible with the lowest latencies at an insignificant cost.

At some point I believe when we're big enough we might switch to using servers in all the regions where we currently run APIG+Lambda.

The advantage of Serverless to me seems that it already forces you in a somewhat sane design and separation of concerns so all work out into the lambda functions should be easily translatable into a different architecture.

Yes, very much this. A lot of the effort was extremely in-depth planning for scalability with respect to both unbounded traffic growth/adoption and expanding the team. In fact I coded an initial prototype in about a month that could have run quite happily on a single instance and subsequently been broken out into a typical LB/autoscaling group/DB architecture without much trouble. Now we have about 7 core microservices which are independently scalable and deployable, many of which we anticipate handing off to dedicated teams as we expand and hire.

Very smart. Good point.

That gets to lock-in.

The stacks are super proprietary. Porting away from AWS to another serverless cloud vendor (e.g. Azure) would be a major project.

Porting to a server-ful architecture would be a full rewrite.

It really wouldn't be that bad - at least not any worse than any other migration of a massively complex project from one platform to another. We deliberately kept our implementation flexible enough to be able to move off Lambda if necessary. Our entire stack can be containerized using Docker. All database interactions are behind interfaces that allow us to swap DB implementations if needed (even to relational ones). All custom CW metric publishing is centralized in a single object that can just as easily publish somewhere else. All our APIs are defined using Swagger which is portable to lots of tooling. The worst part would be replacing IAM with whatever networking/permission model the other platform had, but even that could be approached programmatically to reduce the difficulty.

Edit: We also broke the stack into a number of independent microservices, each with their own API, DB, and dedicated CI pipeline. This would allow us to incrementally migrate chunks in parallel without disrupting the entire service.

At one point I was highly concerned with lock-in, but I'm becoming less so. In the last 7 or 8 years the companies I've worked for have exclusively used AWS, and prices have come down and stayed competitive. Giving up platform independence also allows you to take advantage of platform-specific features, which can make development a lot faster.

I still think platform neutrality is a good goal, but I'm starting to view it like I do database neutrality. It's great in theory to be able to swap out postgres with mysql and vice versa, but you miss out on a lot of features of postgres that aren't portable. And in practice, I've never swapped out postgres for something else. Just some thoughts.

I wouldn't see it that black and white. If you do it wrong, of course you will have a hard time moving away to a different vendor. But if you approach it right and put the proper abstractions in place, you can switch to any other cloud provider or in-house solution without too much hassle. It's all about what cost you want to pay, and when. Do you want to invest upfront in all the development time for frameworks and infrastructure to support your core business API, or do you just want to get the MVP out and invest a little more once you've established a solid user base?

If necessary, our most likely cost control measure would probably be moving from Lambda to ECS Fargate to get more fine-grained control on concurrency and warm pool size.

This is one reason we (FaunaDB) offer on-premise as well as managed cloud options. So you can run the database on any machines you want. When ease-of-use matters most, small scale apps are cheaper on cloud. With high transaction volumes, you can pre-purchase cloud capacity or run on your own iron.

>We're starting to ramp up traffic now by orders of magnitude (with many more to come) and it's soooooo awesome knowing the stack is pretty much bombproof.

This does come with additional cost. Serverless pricing doesn't scale: your costs increase linearly with your usage, and there are no discounts for bulk usage or reserved pricing.

We were recently on the receiving end of a massive HTTP GET Flood DDoS and although we did not experience any downtime as a result of it, I ended up finding out about it a few days later when billing alarms started going off.

I read in many places that you should limit your max parallel executions.

We were wary of limiting paid users. Even with lambda's max concurrent function executions limit, when the function completes in a few milliseconds, the number of invocations per second can still be high.
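
That trade-off is just Little's law: concurrent executions ≈ arrival rate × average duration, so a concurrency cap barely constrains the request rate when functions finish in milliseconds. A quick sketch:

```python
# Little's law applied to Lambda's concurrency limit: a modest cap on
# concurrent executions still admits a huge request rate when each
# invocation is short.
def max_requests_per_second(concurrency_limit, avg_duration_seconds):
    return concurrency_limit / avg_duration_seconds
```

With a cap of 100 concurrent executions and 5 ms functions, you can still be billed for 20,000 invocations per second, which is why a billing alarm caught the flood before the concurrency limit ever would have.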

And Shield wasn't an option?

WAF would've required setting up CloudFront in front of our endpoints, which would've increased latency.

Basic Shield is supposed to be on for all users, but I don't think API Gateway is among the services covered by their Advanced Protection plan.

Curious - are you in an industry where you expected bad actors?

I'm not blaming you -- far from it! Just wondering how applicable your experience is to the world I normally work in.

Not particularly no.

Excellent writeup. Are you serving client web traffic (if applicable) via Lambdas? Or deploying your web infrastructure to traditional instances managed through CF?

Yes, we're serving client web traffic directly with Lambdas via API Gateway. We have no traditional instances or non-serverless components of any kind in the core stack.

So how do you store your state?

I assume DynamoDB part isn't serverless.

DynamoDB is very much serverless :) I'd highly recommend that anyone remotely interested in serverless and/or databases read the Dynamo whitepaper to get an idea of how it works: https://www.allthingsdistributed.com/2007/10/amazons_dynamo....

I'd like to point out that the DynamoDB implementation is quite different from the Dynamo paper. They are totally different things, almost like the relationship between Java and JavaScript.

DynamoDB is great, especially when it's used wisely.

DynamoDB is as serverless as Lambda. (There are servers somewhere for both, but in neither case do you operate them.)

I'll give you there are no hosts to manage, but Lambda is serverless in the sense of "stateless" and "ephemeral".

DynamoDB is still persistent storage.

It seems that every "hosted" solution is now being dubbed "serverless". :\

Oh and of course 5+ if-statements is now "AI".

Lambdas aren't stateless or ephemeral either. Anything that occurs on code load will persist on that container (i.e., if you initialize something at the module level in Node, it will persist between calls to the same container; this can cause all kinds of weirdness if you aren't aware of it. For instance, I've seen a dev read some data at load time, perform destructive operations on it as part of data transformations in the code, and then wonder why he was getting non-deterministic results back). And there's a half gig of temp space on each container you can write to as well.
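
The same pitfall exists in Python Lambdas. A minimal sketch of the bug described above — module-level state survives between invocations that land on the same warm container:

```python
# Loaded once, at cold start; shared by every invocation on this container.
CONFIG = {"retries": 3}

def bad_handler(event, context):
    # Mutating the shared object leaks into later invocations.
    CONFIG["retries"] -= 1
    return CONFIG["retries"]

def good_handler(event, context):
    # Work on a per-invocation copy; the shared object stays pristine.
    local = dict(CONFIG)
    local["retries"] -= 1
    return local["retries"]
```

Calling `bad_handler` twice on the same container returns 2, then 1 — the non-deterministic behavior the parent comment describes — while `good_handler` returns 2 every time.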

While the definition of what constitutes 'serverless' is pretty ambiguous, no one includes ephemeral state as a systems requirement, else you have something useless.

DynamoDB is generally viewed as serverless because there's no management of an underlying VM, and for some definitions because it can scale out horizontally automatically, without downtime, to meet demand (as compared with RDS, or another managed database solution that can only scale vertically).

We actually rely heavily on Lambda statefulness to reduce latency - many objects/data are constructed/fetched on cold start and cached for subsequent invocations.

Having no hosts to manage is the defining feature of serverless.

When you create a DB with DynamoDB, you're just telling AWS "I need a database" and it gives you one. No need to worry about deciding how much CPU power, RAM, or storage you'll need for it.

> It seems that every every "hosted" solution is now being dubbed "serverless". :\

Got a good example of something being called "serverless" that you don't think should be? I mean, yes, DynamoDB, Lambda, etc. are all running on servers. But the idea is that you don't manage them. No packages to keep up to date. No worrying about whether or not the instance size you chose is big enough. No dealing with autoscaling to meet demand when a million reddit users hit your app.

DynamoDB, Aurora, Kinesis, etc. all existed BEFORE Lambda, and no one called them "serverless" until Lambda; now everything is called that.

Meanwhile, Kinesis requires you to specify the number of shards to use, so there is management even if they're not called "servers".

> No need to worry about deciding how much CPU power, RAM, or storage you'll need for it.

You realize you specify the RAM for lambda functions, which correlates to CPU.

And with Dynamo you specify RCUs and WCUs and you enable autoscaling which adds more...

I'm not trying to be pedantic about "the cloud being just someone else's servers". I mean that "serverless" to me means something very specific about Lambda and writing stateless code, and every existing hosted multi-tenant service shouldn't just be dubbed that.

DDB is serverless in the sense that you don't have to worry about scalability issues, but you do have to worry about design issues[1] if you want to take advantage of everything DynamoDB has to offer. I find Dynamo's autoscaling very problematic: it's both slow to kick in, and you can only scale down 4 times in 24h, which is not cost-effective if you have spikes.

[1]: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
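
For those unfamiliar with provisioned capacity, the arithmetic behind RCUs/WCUs is worth knowing. A sketch based on the documented rules (one RCU = one strongly consistent read/sec of up to 4 KB, or two eventually consistent reads; one WCU = one write/sec of up to 1 KB):

```python
import math

def read_capacity_units(reads_per_sec, item_kb, eventually_consistent=False):
    """Provisioned RCUs needed for a steady read workload."""
    units_per_read = math.ceil(item_kb / 4)       # 4 KB per RCU
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half as much.
    return math.ceil(total / 2) if eventually_consistent else total

def write_capacity_units(writes_per_sec, item_kb):
    """Provisioned WCUs needed for a steady write workload."""
    return writes_per_sec * math.ceil(item_kb)    # 1 KB per WCU
```

This is also why spiky traffic hurts: you provision for the peak rate, and slow autoscaling plus limited scale-downs means you pay for that peak most of the day.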

Now AWS allows you to decrease throughput 27 times a day: https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

We got DoS'd with a few hundred million calls over a few days and the dynamodb cost was about $4.

To me, DynamoDB is not quite serverless, because users still pay for the reserved capacity units. In contrast, Lambda functions scale automatically.

Does Lambda have a good version control or CI workflow? My biggest question is how to develop serverless functions with a team of developers.

You can utilize lambda traffic shifting in conjunction with CodeDeploy and Code Pipelines to do safe A/B (aka Blue/Green) deployments. It has some rough edges but overall works well. Check out these docs to get started: https://github.com/awslabs/serverless-application-model/blob...

It's also possible to use API Gateway canary deployments and dedicated "preprod" stages as well - the benefit being you can, in theory, use traffic shifting lifecycle hooks to deploy a new version completely isolated from customer traffic and test it before incrementally rolling out your main A/B deployment. I created an experimental proof-of-concept for this but haven't had the time to flesh it out for production usage, but would very much like to at some point.

tl;dr The answer to your question is yes, there is a good story around CI with Lambda using the AWS ecosystem. However Lambda alone is not something that should be relied on for long-term or definitive versioning on its own.

Edit: thinking about your question more, it seems you are assuming that individual developers will directly edit Lambda function code in the console and you want to track versions or trigger deployments based on that activity. You absolutely should not ever be editing Lambda code manually directly in the console outside of one-off experiments/prototypes that are completely unrelated to dev/test/production. Always keep your Lambda function code under source control and deploy using zips uploaded to S3 and CloudFormation (the serverless framework[1] provides good tooling for this, though we don't use it - instead we use SAM and our own internal tools).

[1] https://serverless.com/

What is the business model around this? I mean, 18 months to get "set up nicely" strikes me as hard to justify to upper management that is not already invested.

Broadly speaking, it started out as a greenfield/experimental project with buy-in from senior management that is now going through the initial phase of productization and productionization. Although the AWS bill sounds expensive the whole project was effectively done by 3 developers including myself (2 backend/full stack and 1 frontend). There were also some deliberate management decisions that greatly prolonged implementation time - we could have easily shaved 8 months off that figure taking a more straightforward path.

That is still a year-long, $250K-or-more speculative investment. I would not moan too loudly about management interference prolonging the project; that was a rare piece of long-term willingness to invest for future technical gains, rare in my experience. (But imo the only way.)

Well, maybe moan a bit loudly...

Yes, it was/is a rare opportunity and I'm very grateful I had/have it. (That being said, the potential upsides are in the range of tens to hundreds of millions of dollars, even with only minimal success, so there is definitely a very real business incentive to invest in the project.)

The prolongment I'm referring to isn't to do with the project being put on hold or anything like that. Essentially, it was decided we'd build an "alpha" version of the stack to 'validate' the value of the project even though it was plainly clear what we had to do. The alpha stack was nominally supposed to be a cheap, quick version of the real architecture (which we'd already designed), whose supposed savings came from substituting manual processes for some of the APIs rather than actually building them.

The end result was a) confirming what we already knew, namely that the proposed functionality is fundamentally useful (very obvious from the outset), and b) a huge diversion of time and effort doing throwaway work that was about 65% as much work as just doing it the right way would have been. On top of that came the burden of performing the manual "API" functions, the operational overhead of maintaining that stack while implementing the real one, the work to migrate data from the alpha stack to the new stack once it was ready, and the work to deprecate/tear it down once it was completely out of use. The overall wasted time was easily 8 months.

Woah - potential upsides in 8-9 figures ?

So, without digging into who the client is etc., are you saying that serverless architecture is validated enough today that the AWS bill can be cut by... what, 25%? More?

That sounds like great savings (plus nice fat contractor bills for rewriting as Lambda), but is it a hit for Amazon?

(I suspect Amazon views it as "kill your own babies" survival, but I am interested in the margin effect on the data centre business; it's twenty years since I touched a business model of a DC.)

PS: I get the prolongment issue. It sounds sensible in the 20 seconds it was covered in the approval meeting, but not when the details are looked at.

This is a new project, not migration of anything existing. "Upsides" was referring to business value of the project itself, not cost savings of migrating to serverless.


We've shipped something simple and non-mission-critical in production (URL rewriting for ad placements).

It has been pretty much set-and-forget. Last anyone had to even look at it was almost 3 years ago, and afaik it's still working (our ad sales team would be complaining loudly if it weren't).

For something peripheral like that, it's nice not to have to run servers for it or devote any energy to keeping it running.

In terms of both server costs and upkeep costs, the economics have been highly favorable.

I'm not sure I'd use it yet for something mission-critical or that shipped changes frequently. My recollection is that when we did have to adjust it, debugging was a bear. Though tooling for that may have improved in the last 30 months.

Regarding the tooling: I hated debugging Lambdas via CloudWatch logs when using them for Hustle, it drove me to drink.

I eventually got upset enough that I made a tool to stream the cloudwatch logs to the terminal and colorize, indent + nicely format json output:


There's also https://github.com/rpgreen/apilogs but I haven't yet gotten it to work.

Cloudwatch feels like it was not made for humans. Searching and filtering for specific log events is a huge pain.

I think one of the best things one could do is to pipe your Cloudwatch logs to an ElasticSearch cluster.

Anything where you have to install python is kind of a drag. Same thing with awslogs, etc...

Saw installs as a single binary, is performant, and has better-looking output than all of them.

That looks like a nice tool! Will check it out.

Have you tried anything like iopipe for helping debug lambdas?

No I haven’t tried it. I’ll give it a look.

For anyone interested in giving IOpipe a try, it supports Node.js [0], Python [1], Java [2] and Go [3].

[0]: https://github.com/iopipe/iopipe-js/ [1]: https://github.com/iopipe/iopipe-python/ [2]: https://github.com/iopipe/iopipe-java/ [3]: https://github.com/iopipe/iopipe-go/

Disclosure: Work on the Python and Go agents.

One of the best things we did at ipdata was start using Sentry in our lambda functions. We caught an unbelievable number of silent errors that would've been impossible to find with cloudwatch.

You front production with an API gateway and version your APIs. You turn migrations into a business validation and testing process rather than a technical dependency; using the load balancer / gateway as the control lever.

In this way, serverless can actually be WAY better than traditional methods for dealing with frequent changes. You can have many valid endpoints, but only one “production” endpoint that changes based on business rules (or even split for A/B testing)

For some infrastructures and processes that may make sense. The problem for us was that when something didn't work as expected, it was very difficult to determine what was going wrong.

Just the opposite of serious for me.

Example: there's a website for all the weather radar stations in Canada that lets you see the last two hours of 10-minute radar snapshots. I wanted a dump of them to try some ML algorithms, but even after requests and emails there simply wasn't one.

So I set up a lambda to run every hour, load the website for all 31 weather stations and save all 6 images from the last hour to S3.

It took me an hour to setup. I've never gotten around to making that ML project, but lambda has kept on chugging away for me, GBs of data saved away. The only real cost is the s3 storage, still under $2/month.
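
The whole thing is little more than URL construction plus an S3 put. A hedged sketch of the scheduled handler — the URL pattern and station codes below are made up, not the real Environment Canada paths:

```python
import datetime

# Hypothetical snapshot URL pattern; the real radar site's paths differ.
BASE = "https://weather.example.ca/radar/{station}/{stamp}.gif"

def snapshot_urls(station, now, minutes=60, step=10):
    """URLs for the 10-minute snapshots from the past hour."""
    urls = []
    for back in range(step, minutes + 1, step):
        stamp = (now - datetime.timedelta(minutes=back)).strftime("%Y%m%d%H%M")
        urls.append(BASE.format(station=station, stamp=stamp))
    return urls

def handler(event, context):
    # Triggered hourly by a CloudWatch Events schedule. For each station,
    # you'd fetch each URL and s3.put_object the bytes; that part is
    # omitted to keep the sketch self-contained.
    now = datetime.datetime.utcnow()
    return {s: snapshot_urls(s, now) for s in ["STATION1", "STATION2"]}
```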

Serverless is brilliant for little things like that.

I use a Raspberry Pi with Raspbian for things like these. It's a really cheap one-time cost (ignoring power costs, which are fairly minor), and you can setup all sorts of things on it. If you don't need high availability, running a small system at home can often be a good solution.

The initial learning curve for running your own system is a bit higher, but I think it's well worth learning. Once you have the system up and running you start finding all sorts of uses for it. So if you need it for just one thing it might not be worth it, but as you start doing more things with it I think it really pays off.

This is brilliant! I hadn't yet found any reason to mess with Raspberry Pis. But the way you've described it makes a lot of sense. They'd definitely be a maintenance overhead, uptime wouldn't be guaranteed but I can imagine the sense of self-sufficiency you'd get would be very satisfying.

Same. I setup an AppEngine cron to call a Google Cloud Function every 8 hours to invoke an API and persist on GitHub all for free[0].

0 - https://github.com/cretz/badads

As someone new to serverless, is your code available to study as a pseudo-tutorial?

If you are into Python I highly recommend Zappa. It turns your Flask app endpoints into Lambda functions + API Gateway. The big benefit is that it's trivial to test locally: before it gets transformed you can just do 'flask run' and use Postman to test the endpoints.
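
For reference, Zappa reads its deployment configuration from a zappa_settings.json in the project root. A minimal sketch; the stage name, module path, region, and bucket are placeholder assumptions:

```json
{
    "production": {
        "app_function": "app.app",
        "aws_region": "us-east-1",
        "runtime": "python3.6",
        "s3_bucket": "zappa-deployments-example"
    }
}
```

With that in place, `zappa deploy production` packages the Flask app and wires up Lambda + API Gateway for you.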

I have a boilerplate app / tutorial here:


'Serverless' is an adjective. I'm happy to trade some Internet Points™ for the chance to point that out.

Good example of a suitable use case, thanks for sharing.

I can't name any names for obvious reasons or give you more hints about what industry this company is in, but I just did DD on a very impressive outfit that ran their entire company on Google's cloud platform. It held about 500 TB of data and held up amazingly well under load.

I was super impressed with how they had set this all up and they were extremely well aware of all the limitations and do's and dont's of that particular cloud implementation.

Obviously there is the lock-in problem; if you ever decide to move you have a bit of work ahead, so build some abstraction layers in right from day 1 to avoid hitting all your code if that time should ever roll around.

And cultivate contacts with your cloud vendor.

There's a big difference between running on GCP and a "serverless architecture". Did they have everything running on Google Cloud Functions?

Jacques can absolutely tell the difference between CGP and serverless architecture, so if no more information is forthcoming, it's due to non-disclosure agreements ;)

In other words, someone wrote a cryptic comment and is unable to elaborate on how much of it is relevant to the topic discussed because of an important non-disclosure agreement. Cool.

I will disclose what I can short of anybody being able to determine the nature of the business, the vertical they operate in or which business it is.

Context is everything they say.

Can you comment on their serverless architecture, which was the point of this post?

Well, given that it is serverless the infrastructure is operated by the provider, in this case Google.

That leaves the company to use the various APIs.

So you use Google Cloud Functions to ingest data and do all preliminary processing, store the data in one of the various persistent storage options (Spanner, Bigtable, whatever is best suited for the job), and then optionally use background functions or containers for further processing or presentation.
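
As a sketch of what such an ingest function looks like — a Pub/Sub-triggered background function that decodes the message and hands it off (shown in Python, which was in beta on GCF at the time; the function and field names are illustrative):

```python
import base64
import json

def decode_message(event):
    """Pub/Sub wraps the payload as base64 under event['data']."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

def ingest(event, context):
    record = decode_message(event)
    # ...validate and enrich here, then write to Spanner/Bigtable, or
    # republish to another topic for downstream background functions...
    return record
```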

Given that a reasonably short while ago I did not yet see Google as a serious contender in this space I'm actually surprised how far they have come.

You can basically create an enterprise class application dealing with vast amounts of data and never even know on what silicon (or where...) your processes are running.

Of course you still have to give some parameters, such as in which DC you want to run your stuff but on the whole it is about as painless as it can be.
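The ingest-and-store pattern described above can be sketched roughly as below. The handler signature follows Cloud Functions' background-function convention (a base64 `data` payload plus attributes), but the in-memory STORE and all field names are hypothetical; a real deployment would use the Spanner/Bigtable client libraries instead.

```python
import base64
import json

# Hypothetical in-memory stand-in for a persistent store (Bigtable,
# Spanner, ...); real code would use the relevant client library.
STORE = {}

def ingest(event, context=None):
    """Background-function-style handler: decode the Pub/Sub payload,
    do preliminary processing, and persist the result."""
    record = json.loads(base64.b64decode(event["data"]))
    record["source"] = event.get("attributes", {}).get("source", "unknown")
    STORE[record["id"]] = record
    return record
```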

For more info this would be a good starting point:


Important notes about the execution environment limits:



Due diligence.

Yep. Sorry.

My guess is “Due diligence”

Due diligence?

Dilly Dilly

My uptime monitoring project uses AWS Lambda heavily, almost exclusively https://apex.sh/ping/ — it has been great. I've processed 3,687,727,585 "checks" (requests really) with it, and I only had roughly 1 hour of downtime two years ago in a single region. Since then it has been stable.

I have 14 or so regions, so doing the same thing with EC2 would have considerable overhead, though I can still imagine many cases where Lambda would not be cost-effective. Its integration with Kinesis is fantastic as well; stream processing almost cannot be easier. And while people say Kafka is more cost-effective, with a bit of batching you can get a long way with Kinesis too.
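For anyone unfamiliar, the batching mentioned here falls out of the Lambda/Kinesis integration itself: Lambda hands your function a whole batch of records per invocation, so a larger batch size amortises one call over many records. A minimal sketch (the event shape follows the documented Kinesis event format; the "work" is a placeholder):

```python
import base64
import json

def handler(event, context=None):
    """Process a whole Kinesis batch in one Lambda invocation."""
    processed = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        processed.append(payload)  # real stream-processing work goes here
    return {"batch_size": len(processed)}
```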

How much does it cost you monthly?

Since he's charging the customer I doubt he will give that information up.


But the business idea is simple, so I am sure it would be easy to work out a rough estimate.

99% of our SaaS analytics frontend is backed by AWS Lambda. I love not worrying about the underlying infrastructure. We have close to 150 Lambdas running our API. We do not use API Gateway; instead we use Apigee. For logging we built a logging module that logs to Kinesis, then to S3 and Elasticsearch. We hardly ever look at CloudWatch; those logs get expensive after a while, so we only keep 3 days. We use Node, Python and Java depending on needs. It's a good idea to benchmark your Lambdas and determine the right resource size; a little bump can make a dramatic difference in execution time, but past a certain point you are just wasting $$$.

Your last point is very important. At the very least, start by comparing your function running with just what it needs versus maxed out. Don't forget warm-up time.

Software developer at Transport for London here.

A substantial part of a few of our systems use Azure Functions.

We've actually found them excellent to work with, fairly cheap and scaling very efficiently.

The main issues we've had are internal disagreements about how to pass app settings in efficiently. Originally we put all the app settings in the ARM templates used to deploy them. Then we put them as variables in VSTS. And then finally we decided to put them into variable groups, which are within task groups, which are used by releases. It's a bit of a weird chain of dependencies, but now all of our parameters are located in one place.

Was latency a concern for you at all? If serverless did not exist, would the functions have been (micro) services with similar latency overhead?

I've found functions apps have a slow startup time, but once they are going, they perform pretty fast.

I think we currently don't have any functions in time-sensitive streams (they're rather new, so they are used for new features such as Oyster automated refunds, which can be applied a few days later if need be), so when I say we haven't had any performance issues with them, it has to be taken with a pinch of salt.

I think if we were using microservices, they would have substantially more logic in them than the individual functions have, so they would have a lower network overhead. They wouldn't scale automatically, so we'd have to have quite a few machines on all the time, which would cost a bit more.

The main benefit we enjoy from functions is the ability to change code with zero downtime, low risk and minimum disruptions to the whole service.

We've found the interfaces in Azure also to be great - we can write end-to-end tests that can poll service bus to check when all messages are delivered to assert against any final case.

This is fascinating stuff — every time I use my oyster card I have wondered what it is doing behind the scenes.

Do you know of any write ups about TFL's infrastructure/processes? Would love to read about it.

I'd also like to buy whoever led the TFL API's move from XML to JSON a coffee|beer!

I don't know of any comprehensive writeups - even the internal wiki is lacking.

You should check out CosmosDB for holding the app settings. Integrated with azure functions out of the box.

At ipdata.co we use the same APIG+Lambda setup replicated in 11 regions to have the lowest latencies globally. We had to do some extra work to get API keys and rate limiting working but it was worth it. Our setup averages ~44ms response times - https://status.ipdata.co/.

We wrote about our setup in detail on the Highscalability blog [1]

A few things have changed since we wrote that article;

- We implemented custom authorizers, which have helped lower our costs; the auth caching means authentication only happens once per x minutes, and all subsequent requests are much faster.

- We use redis and a couple of Kinesis consumers running on a real server to sync data across all our regions. This setup has been battle tested and has successfully processed more than a hundred million API calls in a single day in near real time. [Use pipes and mget in redis for speed]
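The bracketed tip (pipelines and MGET to batch Redis round trips) can be sketched as below. The FakeRedis class is just an in-memory stand-in so the sketch runs without a server; it mirrors the redis-py `mget`/`pipeline` calls a real client would receive, and the function name is illustrative, not their actual code.

```python
class FakeRedis:
    """Tiny in-memory stand-in for a redis-py client."""
    def __init__(self):
        self.data, self._queued = {}, []
    def mget(self, keys):
        return [self.data.get(k) for k in keys]
    def pipeline(self):
        return self
    def set(self, key, value):
        self._queued.append((key, value))
        return self
    def execute(self):
        for k, v in self._queued:
            self.data[k] = v
        self._queued = []

def sync_region(client, updates, keys_to_read):
    # One pipelined round trip for all the writes...
    pipe = client.pipeline()
    for key, value in updates.items():
        pipe.set(key, value)
    pipe.execute()
    # ...and one MGET round trip for all the reads, instead of
    # one network hop per key.
    return dict(zip(keys_to_read, client.mget(keys_to_read)))
```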

Here are some answers to a few specific things you raise in your question;

1. Use Sentry for Lambda for error handling. The logs you get are incredibly detailed and have single-handedly given us the greatest visibility into our application, more so than any other tool we've tried (like AWS X-Ray).

2. Cloudwatch logs are tough. You might want to consider piping your logs to an Elasticsearch cluster, though that might be a bit costly if you use AWS's hosted Elasticsearch Service.

3. We use terraform for deploying our lambda functions and other resources. I'd strongly recommend it.

[1] https://highscalability.com/blog/2018/4/2/how-ipdata-serves-...

We did. We are building our entire company, SQQUID, on a 100% serverless architecture. Scalability is awesome; in fact we had to do extra work to serialize some operations in order not to bring down another major corporation's server stack. Cost is a fraction of a traditional app scaling setup.

The best part is that no devops is needed. We use the Serverless Framework. The biggest downside is cold starts for frontend response time, but this hasn't been a terrible issue as of yet. We have considered moving these 20 API endpoints to a Node.js server, which would resolve the issue, but haven't had the time to do it yet.

We'll never go back. Serverless is the future.

What would the nodeJS server be?

Key parts of https://auth0.com are built on top of their public serverless offering, Extend, serving 100M+ authentications/day.

We at ReadMe recently launched Build (https://readme.build), a tool for deploying and sharing APIs! It uses serverless under the hood, which makes it fast and easy to spin up your tasks in the cloud. Services can be consumed with code (Node, Ruby, Python), or via integrations with Slack, Google Sheets, etc. All you need is the one API key we provide to you.

We use it internally when fetching usage metrics, receiving notifications for new sign-ups, and to monitor page changes on our enterprise app (https://readme.io), and use these endpoints frequently from Slack channels.

We manage versioning, rate limiting, logging, and documentation, and offer private services and user management for teams.

Creating an API is as easy as:

  $ npm install api -g

  $ api init

  $ api deploy

AWS Lambda performs wonders. It's enabled us to make it as simple as humanly possible to create, deploy, and share functions. It requires little prior knowledge to start tinkering with, has a growing community to provide support, and handles smoothly in a production setting.

Serverless rules. I'd love to hear any feedback on our implementation of it! Give Build a try at http://readme.build, and feel free to email your thoughts to achal[at]readme.io.

Instead of hitting our ingresses / load balancer, we made it so that webhooks hit a cloud function, which then transforms them into Cloud Pub/Sub messages.

We listen to the cloud pubsub from a worker.

1) We don't manage it. We receive quite a lot of webhooks and it's nice to offload that.

2) All of our webhooks are async. We just have one worker that handles them all, instead of provisioning a bunch of pods.

3) Managing cloud functions is dope, since you can make them autodeploy from git.
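A sketch of the transform step in that flow: shaping an incoming webhook into a Pub/Sub-style message (base64 data plus routing attributes) for the worker to consume. The header names here are hypothetical, and the actual publish call via the google-cloud-pubsub client is omitted.

```python
import base64
import json

def webhook_to_pubsub_message(headers, body):
    """Shape a raw webhook into a Pub/Sub-style message dict; the
    worker subscribed to the topic does the real processing async."""
    return {
        "data": base64.b64encode(json.dumps(body).encode()).decode(),
        "attributes": {
            # Keep just enough metadata for the worker to route the event.
            "event_type": headers.get("X-Event-Type", "unknown"),
            "delivery_id": headers.get("X-Delivery-Id", ""),
        },
    }
```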

10/10 would use again. Not sure about building a whole app around it tho

We have our backend processing for both https://doarama.com and https://ayvri.com done in lambda.

All of https://ayvri.com is built using serverless.

(we'll be migrating from doarama to ayvri in the coming weeks)

For our processing we handled 200k uploads in one hour on ayvri when we were building scenes for the Wings for Life event (the world's largest organized run).

When just relying on serverless triggering an event from S3, the cost was high due to the volume of scaling, time spent spinning up new services, etc. We built a queuing system which manages load and then spins up new instances based on the load in the queue. This resulted in a much faster response, and a SIGNIFICANT reduction in cost.
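A queue-driven scaler like the one described can come down to a simple sizing rule: enough instances to drain the backlog, clamped to account limits. This is a hypothetical sketch of that rule, not their implementation.

```python
import math

def instances_needed(queue_depth, per_instance_throughput,
                     min_instances=0, max_instances=20):
    """Size the worker pool from queue depth: enough instances to
    drain the backlog, clamped between the configured limits."""
    wanted = math.ceil(queue_depth / per_instance_throughput)
    return max(min_instances, min(wanted, max_instances))
```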

For the ayvri website, some pages are slow due to the lambda's not being warm, and I'm surprised users haven't complained. The important stuff is kept warm, and we're working on scaling that out for more responsiveness across the site.

As far as visibility into the app, I'm not going to pretend this is a solved problem. At the moment, we have most of the visibility we need via cloudwatch, and we have built some of our own analytics.

We had one instance where there was an issue between db connectivity which we were not able to resolve. We have put it down to a short networking issue between services. It lasted for 5 minutes one Sunday morning and then went away. So we had enough visibility into the service not being available, but failed in deeper understanding of where the problem was.

If you have further questions I can help with, feel free to reach out.

I will say that I bought into serverless and went whole hog. I probably don't recommend that. We jump through some hoops we probably wouldn't need to if we had run our website on an EC2 instance with CloudFormation managing the scaling.

However, we have a few of our services which can come under high load quickly, and we don't need to scale up the entire site to serve those, such as our track processing. We believe Serverless was the correct decision for those processes.

Why was the cost high when relying on triggering from S3? Isn't lambda charged per invocation?

There are a few reasons and it gets complex, but it isn't just invocations: Lambda is billed by a combination of invocation count and execution time. https://aws.amazon.com/lambda/pricing/

That time includes the amount of time to spin-up the lambda.

Ours is a long-running lambda, and we benefit from some local cache when they are running as well.

In order to handle the load, we had to raise the number of Lambdas available on our account as well, so we are talking about 1000s of seconds being eaten up every second.
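To make the invocation-plus-time billing concrete, here's a back-of-envelope cost model using Lambda's published 2018 rates ($0.20 per million requests, roughly $0.0000166667 per GB-second). It ignores the free tier and the 100 ms duration rounding, so treat it as an approximation and check current pricing before relying on it.

```python
def lambda_cost(invocations, avg_duration_ms, memory_mb,
                per_million_requests=0.20, per_gb_second=0.0000166667):
    """Rough Lambda bill: a per-request charge plus a GB-second
    charge (execution time x allocated memory)."""
    request_cost = invocations / 1_000_000 * per_million_requests
    gb_seconds = invocations * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    return request_cost + gb_seconds * per_gb_second
```

Note how long-running functions dominate the bill: doubling either duration or allocated memory doubles the GB-second term, which is why long-running Lambdas with local caches get expensive fast.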

The "serverless" architecture is cool, but it irks me every time I hear the given name for it. For a marketing term mainly aimed at developers, I'm amazed they picked one so terrible.

> For a marketing term mainly aimed at developers

I think the marketing is aimed mainly at technology management, not developers. Obviously, they have to evangelize developers, too, to a certain extent because they need a critical mass with familiarity so that the managers aren't saying “that’s nice, but who do we hire to build stuff on it”, but developers aren't the ultimate target of the marketing for serverless or most other cloud technologies.

It's a bit like driverless. Hopefully something or someone is driving the car, but how that driving comes to be isn't really your concern.

I've heard the analogy used that "servers are to serverless what wires are to wireless", the main point being that they're still there, but they're no longer something you manage or directly interact with.

I'm also starting to see the term LaaS (Logic as a Service) used as an alternative to "Serverless" here and there.

I think the analogy falls apart because in the part of the system that's referred to as "wireless", there is literally no wire. The serverless comparison would be more like if they used a lot of zip ties and some well-placed rugs so you never saw the wires going directly to your computer, and called that "wireless".

LaaS sounds much more accurate/palatable.

Not just that, but "serverless" typically involves integrating a muddle of AWS/GCP/Azure services, locking you in. Portable software is the analog to "wireless" here, as it gives you the freedom to... move.

Agreed, it doesn't make much sense hearing "serverless". Like, is data being served to a client or not?

I haven't done much reading up on it but it basically sounds like someone else is hosting and executing your code?

To my understanding, it means you write endpoints and only endpoints. You don't have to configure an OS, or even write the code that says "bind to port X and start". You just say "when you get this request, do this". Which is a neat abstraction! But there's still a server, you just don't have to manage it directly.
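For example, an API-Gateway-style function is literally just a function from request to response. A hedged sketch: the event/response shapes follow AWS's proxy-integration format, but the handler logic itself is made up.

```python
import json

def handler(event, context=None):
    """All you write is "given this request, return this response":
    no OS to configure, no port binding, no server process of your own."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```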

That makes perfect sense. Calling it "serverless" is kind of weird though. I can see the point from a marketing perspective but "serverless" is just such a weird term.

You definitely aren't the only one bothered by the name.

For a marketing term mainly aimed at developers, I'm amazed they picked one so terrible.

The message is "you don't need sysadmins anymore", which is exactly what developers want to hear.

It's also very good for the cloud vendors, because with no one in the sysadmin role, developers are far more likely to address scaling issues by just throwing resources (money) at the problem.

Which is funny, because the job I start soon is exactly "sysadmin for serverless".

Indeed. You still need someone whose job it is to ensure your S3 buckets are secure and whatnot. The job is different day to day but the need for someone to take responsibility for it hasn’t changed.

Yes. We use both Lambda and Lambda@Edge, but for different reasons.

Virtually all async processing we do is achieved by the main service (running as a normal microservice - no serverless stuff) pushing into an SQS queue, then a Lambda function running every minute pulls from the queue to e.g. report policies to our underwriters, issue policy documents, capture payments, etc.

Essentially anything which happens after the hot path to get a response to the user - all this stuff can take a little while, isn’t really time sensitive, often needs to be tried multiple times, etc.

The Cloudfront distribution we have for our API passes all requests and responses through a Lambda@Edge function before/after the request hits our real system.

If you’re not aware, Lambda@Edge runs in the Cloudfront PoPs, so super low latency and can reject/respond to requests without going back to our real server.

We use it basically as middleware and one way to protect our real backend from potential bad actors:

- applying CORS headers

- doing basic (offline) auth checking - as we use JWTs (this happens in the services too, where it’s checked online)

- removing unwanted input headers

- generating and setting a request ID header

- setting X-Forwarded-For appropriately

- enabling the use of persistent HTTP/2 connections on a single hostname, even though our underlying services are all on separate hostnames (basically some URL rewriting)

- enforcing minimum mobile app versions (as a regulated company, we eventually have to break very old versions of our mobile app, as they contain copy which is no longer correct/true)

- even calculating and returning insurance pricing with ultra-low-latency without having the latency of going all the way to our real servers in eu-west-1 - we’ll do a blog post about this at some point
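A couple of those middleware steps (stripping unwanted input headers, stamping a request ID) might look like the sketch below. Lambda@Edge itself runs Node, and real CloudFront events nest headers in a different structure; this Python version with a flat header dict is only meant to show the shape of the logic.

```python
import uuid

UNWANTED = {"x-internal-debug"}  # hypothetical header blocklist

def viewer_request(request):
    """Middleware-style pass over a CloudFront-like request dict:
    drop blocklisted headers and stamp a fresh request ID before the
    request continues to the origin."""
    headers = {k: v for k, v in request["headers"].items()
               if k.lower() not in UNWANTED}
    headers["x-request-id"] = str(uuid.uuid4())
    return {**request, "headers": headers}
```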

Overall I must say, I’m incredibly pleased with Lambda and Lambda@Edge.

Only criticisms are that the GUI is an incredible pain in the ass to use (we don’t use any off-the-shelf serverless framework), and they’re very slow at supporting new Node versions.

Lambda does now support Node 8 (with async/await support etc.), though it took months. But Lambda@Edge still only supports Node 6 which has been quite difficult to continue supporting across our monorepo.

Aha! That’s really excellent and will make it so much easier for us to support Lambda@Edge.

Thanks for letting me know :)

Never done any API/HTTP stuff with lambda, but I've built many ETL pipelines using it.

A few years ago we tried the Kinesis -> Lambda thing, but it failed during a large traffic event. This was due to the way the Kinesis poller works, and was made worse by the fact that we had two lambdas running off the same Kinesis stream.

The main issue was you can't control the Kinesis poller - meaning we ran into resource contention during this high traffic event and the iterator age fell behind quite drastically. So we abandoned it in favour of EMR + Flink + Apache Beam.

Other than that though, the S3 -> Lambda stuff works perfectly, and has been running for 3+ years with no issues.

We also found that using lambda to process kinesis logs was incredibly slow and definitely not the best use case for lambda.

We had much better results running a server with a couple of kinesis consumers and redis.

This setup was able to process hundreds of millions of records in near real time.

I have some python based lambdas for simple service/user story monitoring. The couple problems I have with lambdas, which mean I will _never_ use them for a proper application:

- The language choices available do not meet my needs

- It's impossible to create a prod-like setup; all our services run on k8s, so minikube works great locally

- You lose any sort of control over architectural decisions (for better or worse)

- Poor code structure/quality/re-usability leads to a poor developer experience

I enjoy most new technologies I pick up; all I got from Lambda was frustration.

As an ops person this is a super interesting question, so it's really kind of surreal to read dozens of replies wherein not a single one mentions throughput, tail latency, or error rate measurements.

For near realtime systems that scale it is right up there with the fastest application servers. In fact, if you take the auto-scaling properties into account it probably beats those servers because it can do it seamlessly up to incredible number of requests / sec without missing a beat. If you want low latency you can replicate your offering in as many zones as you feel like.

People start worrying about throughput, latency and error rates when they become high enough (or low enough) to measure.

My personal biggest worry is that if your Google account should die for whatever reason your company and all its data goes with it. That's the one thing that I really do not like about all this cloud business, it feels very fragile from that point of view.

> In fact, if you take the auto-scaling properties into account it probably beats those servers because it can do it seamlessly up to incredible number of requests / sec without missing a beat.

Autoscaling is one of those things that's easy to name but hard to actually achieve. I've had some involvement with an autoscaler for a few months and it's been educational, to say the least.

In particular people tend to forget that autoscaling is about solving an economic problem: trading off the cost of latency against the cost of idleness. I call this "hugging the curve".

No given autoscaler can psychically guess your cost elasticity. Lambda and others square this circle by basically subsidising the runtime cost: minute-scale TTLs over millisecond-scale billing. I'm not sure how long that will last. Probably they will keep the TTLs fixed but rely on Moore's Law to reduce their variable costs over time.
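The latency-versus-idleness trade-off can be made concrete with a toy model: every unit of idle capacity costs money, every unit of queued work costs latency, and the autoscaler picks the instance count minimising the sum. All the cost parameters here are hypothetical; the point is only that the answer depends on costs the autoscaler cannot know.

```python
def scaling_cost(n, arrival_rate, per_instance_rate,
                 idle_cost_per_unit, latency_cost_per_unit):
    """Toy model of the autoscaling trade-off: overprovision and pay
    for idle capacity, or underprovision and pay (in user latency)
    for work left waiting in the queue."""
    capacity = n * per_instance_rate
    idle = max(capacity - arrival_rate, 0)       # unused capacity
    backlog = max(arrival_rate - capacity, 0)    # work left waiting
    return idle * idle_cost_per_unit + backlog * latency_cost_per_unit

def best_instance_count(arrival_rate, per_instance_rate,
                        idle_cost_per_unit, latency_cost_per_unit,
                        max_instances=50):
    # "Hugging the curve": pick the count that minimises combined cost.
    return min(range(1, max_instances + 1),
               key=lambda n: scaling_cost(n, arrival_rate,
                                          per_instance_rate,
                                          idle_cost_per_unit,
                                          latency_cost_per_unit))
```

With latency expensive relative to idleness the model overprovisions; flip the costs and it runs hot with a small backlog, which is exactly the elasticity the vendor can't guess for you.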

Disclosure: I work for Pivotal on a FaaS product.

Has anyone else found that the CloudWatch logging cost of millions of calls to Lambda is actually as expensive or more expensive than Lambda itself? I have to get rid of the IAM permission for it.

We actually found this out the hard way. We got DDoS'd and cloudwatch was our second most expensive item after API Gateway.

Lambda and Dynamodb were almost insignificant.

Cloudwatch logs have been a big game changer versus on-disk logs for us. Getting into larger clusters (and larger log files), figuring out what’s happening in log files became somewhat arduous. https://github.com/jorgebastida/awslogs for viewing / tailing / searching is a lot easier. It’s also fairly straightforward to get logs streamed through to ELK hosted within AWS, if you’re interested in that angle.

Do you guys find CloudWatch logs too slow? Even on very small apps I find searching the past few days takes minutes (5-10m), but I'm using "structured" JSON logs as well.

CloudWatch is definitely a write optimised product. Any kind of reading or searching is pretty terrible.

I'm sure AWS would recommend you use Kinesis Firehose or something to put the logs in Elasticsearch and use Kibana. But it would be great to have an equally scalable log parser/searcher.

All of CloudWatch is very slow (Logs, Metrics and Alarms); we hate it.

Have you tried IOpipe? [0]

[0]: https://iopipe.com

Disclosure: Work on the Python and Go agents.

We haven’t had issues but we keep our time windows short for any searching through logs (and we don’t use json logs so that’s an interesting difference). We use other systems for longer range log analysis.

For anyone interested, here's a handy calculator comparing Lambda to EC2: https://servers.lol

I have a production service running on AWS Lambda and I haven't run into any major challenges. The Lambda service is responsible for authenticating to a downstream third party service and proxying requests along with an access token. I would consider this a simple use case.

CloudWatch has provided me all the visibility necessary to troubleshoot issues. I think the important thing here is to have a good logging strategy (logs are only as good as what you put in them). In my case, I made sure info messages were logged for the start and end of use cases (e.g., "resetting password", "password reset successfully"), warn messages for non-fatal errors (e.g. "username not found"), and error messages for fatal errors (e.g. "unable to connect to database").
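That strategy maps directly onto standard log levels. A minimal sketch, where the function name and the dict-based user store are hypothetical stand-ins:

```python
import logging

log = logging.getLogger("password-service")

def reset_password(username, users):
    """INFO for the start and end of the use case, WARNING for
    expected non-fatal failures; genuinely fatal conditions (e.g. a
    dead database connection) would be logged at ERROR and raised."""
    log.info("resetting password for %s", username)
    if username not in users:
        log.warning("username not found: %s", username)
        return False
    users[username]["password"] = None  # force a reset on next login
    log.info("password reset successfully for %s", username)
    return True
```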

The only frustrating limitation I've run into is when the Lambda function times out before receiving a response from the downstream service. At one point, the downstream service was having major performance issues and response times were crazy high. This meant I couldn't get a response code and had to run the downstream calls locally to troubleshoot.

Performance is not great (most requests are in the 400-500 ms range), but it's more than adequate for my use case. A large portion of the response time is likely due to the downstream service, but there are cold starts that spike response time way out of normal range.

Overall, I'm really happy with AWS Lambda and it's definitely top of my list when taking on a new project. I'm really interested in experimenting with AWS Mobile Hub in the future. It doesn't get much better than one-stop serverless shopping.

My company uses them as a “shim” to ingest custom data sources into a data warehouse. It produces a really elegant separation of concerns, where the fiddly custom part is isolated in the lambda. I wrote up a blog post about how this works: https://fivetran.com/blog/serverless-etl-with-cloud-function...

Disclaimer: this is all AWS related as this is the cloud I'm using. I haven't tried Google Cloud Functions or the Azure equivalent.

I've been working with Lambda a lot more lately and it is not so bad... but also not great. I'm saying this because I found it hard to have a git-first (or GitOps) workflow that is good in AWS: it looks like everything is made to be changed manually. CloudFormation is slow with some resources (if you need CloudFront it will take tens of minutes) and CodePipeline has a pretty terrible user experience. CodePipeline is cheap and it works for sure, but it's not a good system for pipelines, as restarting, terminating steps and getting the output of steps just don't work in a decent way (I want to see the output in the steps, not jump to CloudWatch). Pretty much every other system outside of AWS is better than that, but their integration with Lambda and API Gateway is not as good unfortunately. If you know of a better system for CI/CD with AWS Lambda outside of CodePipeline, I'd be interested to try it.

In a similar way, most of the serverless frameworks I've tried are written for a workflow that is executed from the CLI, which is great to start with and attractive for developers, but not good enough for a company that aims at full reproducibility of setups and "hands off" operations. Source code changes should trigger changes in the Lambda/API Gateway setup every time, and it would be great if devs didn't have to trigger changes manually.

Apart from those steps, I think Lambda is definitely promising and I see the company I'm working for right now using it more and more. The developer experience is still lacking IMO but I'm confident we'll get there at some point.

I've written and shipped numerous sites using Zappa for Python which makes deploying on Lambda/API gateway very simple. https://www.storjdash.com is entirely Lambda based (sorry for no real home page....you can read about StorJ at https://storj.io/)

I did a project using the first version of Azure Functions. It was early days with growing pains. I'll probably use Lambda on my next project. My general take is that serverless is a black box just like any computer - you need to figure out the rules of that black box and then accept those rules or tell the people who can open the box to fix something if it's broke.

Serious, but small :)

Open source A/B testing backend using AWS Lambda and Redis HyperLogLog[0]

Have been using it in production for last couple of years with great success. Replaced our (potential) ~$2000/mo Optimizely bill with somewhere in the region of $10/mo.

[0] https://github.com/Alephbet/gimel

Yes, I have shipped 3 applications already using Serverless (https://serverless.com/) and AWS Lambda + API GW. All applications have been in production for a few months now.

The applications:

- creating snapshots of EBS volumes

- disabling users in IAM

- handling HTTP form submit for email list subscription

Our biggest complaint has been the cold start time. I can't say we have enough production data to see how much it's affecting clients though... we don't have enough constant traffic to keep them primed.

We are all in with AWS, so CodeBuild, CloudFormation and CodePipeline all work really well with Lambdas.

We run a service that manages deployments for Serverless Framework applications - https://seed.run and it is completely serverless. It’s been great not worrying about the infrastructure. Would definitely do it again.

Yes I have— Cloud Custodian (https://github.com/capitalone/cloud-custodian) relies heavily on its event driven serverless policies to enable compliance across a large AWS fleet— filling in the gaps that are otherwise missing in IAM. Currently there are in the order of magnitude of 1000s of lambdas deployed across 100s of accounts and it definitely does exactly what it needs to do with little maintenance. Monitoring is done with a combination of cloudwatch, datadog and pagerduty so getting alerted on failing or errored invocations is completely built into our workflow.

We're currently using Xamarin and Azure as a backend, though not serverless yet, as stuff like Cosmos DB pricing is still too expensive. But using Xamarin, which is terrible on its own, we're also splitting our clients up into two languages. I'd like it if we could adopt Flutter and AngularDart for clients, and I'd really like to run some of the backend on something like Firebase / Cloud Firestore.

I'm not sure if Google is a good option in terms of privacy and EU legislation though. I have my lawyers looking into it currently, but to be honest I'd love it if Microsoft made an Azure alternative and fully embraced Dart for Azure.

This is probably unlikely, but it would be really awesome.

Developing a SaaS product based on a completely proprietary stack that you can't even host yourself is VERY dangerous!!! Just yesterday there was a report that Twitter bought a company and immediately closed their API access to all customers.

What will you do, if Amazon decide to close your AWS account for some reason? What if they discontinue one of the services you use?

Here is a huge list of products discontinued by Google as example: https://en.wikipedia.org/wiki/List_of_Google_products#Discon...

Most of the stuff in GCP won't be shut down like the products on that list, as they are used by enterprise customers (or at least there will be proper notice, a migration path, etc.). I agree that at the end of the day you have little control over the infrastructure; it's impossible for a lot of companies to maintain their own, which is why they are going to the cloud. If you are very worried, you can just use the VMs (EC2 or GCP's VMs, etc.) and not use the other services.

I think the other more important path mentioned would be: "what happens in case of a ban/restriction?". Not with AWS, but we've previously had accounts locked/closed "accidentally" - thankfully they were non-mission critical.

Not sure if we qualify for the term "serious", but we are making $1000 in revenue each month serving hundreds of customers, and all our backend is built completely on Google Cloud Functions.

https://aprl.la https://itunes.apple.com/us/app/aprl-mens-clothing-network/i...

In fact, we had one of our APIs running on EC2 which we recently migrated to Cloud Functions.

We're really passionate about static sites with cloud functions for dynamic functionality. After using Snipcart on a few sites and feeling we could do better, we actually built out our own e-commerce solution as a drop-in product for static sites. It's all built on Firebase and Cloud Functions and we're loving it. It's super fun to work with and costs dollars to run. I'm usually very averse to the "build" end of build-or-buy scenarios, but we couldn't be happier with the end result.

Yes, our entire email attachment -> image processing pipeline is serverless on Lambda, written in JS.

So far, we love it. It handles roughly 20k images a day.

You are right that Cloudwatch logs are a hassle. So we pipe all of the log events into Scalyr (and log JSON objects, which Scalyr parses into searchable objects).

In terms of error handling, Lambda retries once on exception. So we raise exceptions in truly exceptional cases (e.g. - some weather in the cloud prevents a file from being downloaded or uploaded). We have Cloudwatch alerts that notify the team for every true exception. Happens less than once a day.

In pseudo-exceptional cases (e.g. a user emails an invalid image), we simply log to Scalyr with an attribute that identifies that the event was pseudo-exceptional, and then set up Scalyr alerts to email us if the volume of those events goes above x per hour.

tl;dr - Cloudwatch + Scalyr with good alerts and thoughtful separation of exceptions from pseudo-exceptions is my recommendation!
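The exception/pseudo-exception split might look like this in the handler core. This is a hedged sketch: the PNG check, names, and return shapes are made up, not their actual pipeline; the point is only which failures raise (triggering Lambda's retry and an alert) and which merely log.

```python
class TransientInfraError(Exception):
    """A 'true' exception: raising it fails the invocation, so Lambda
    retries and a CloudWatch alarm can page the team."""

def process_attachment(image_bytes):
    """Raise only for genuinely exceptional infrastructure failures;
    log-and-continue for expected bad input."""
    if image_bytes is None:
        # Infrastructure failed us (e.g. the download never arrived):
        # raise so the invocation errors and an alert fires.
        raise TransientInfraError("could not fetch attachment")
    if not image_bytes.startswith(b"\x89PNG"):
        # Pseudo-exceptional: a user emailed an invalid image. Return a
        # result the log platform can alert on by volume, and move on.
        return {"status": "skipped", "reason": "invalid_image"}
    return {"status": "processed", "size": len(image_bytes)}
```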

Deep learning classifier services via AWS Chalice. It's trivial. However, the 50MB/250MB Lambda limit is a pain; one has to cut down TensorFlow, Keras etc. significantly before deployment (doable but difficult) or do some S3 tricks with /tmp. I wish they allowed increasing this limit for extra money. It's cheaper than EC2 Elastic Beanstalk though.

I wouldn't do that again; I can spend my time on more interesting things than hacking around artificial limits of an architecture that will be changed at some point anyway.
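The S3-to-/tmp trick mentioned above boils down to fetching a large artifact (e.g. model weights) on cold start instead of bundling it into the deployment package. A minimal sketch, with the fetch callback left injectable; in a real Lambda it would be something like `boto3.client("s3").download_file(bucket, key, path)`:

```python
import os


def ensure_cached(path, fetch):
    """Fetch a large artifact only when it is not already on disk.

    /tmp persists across warm invocations of the same container, so the
    download cost is paid once per container, not once per request, and
    the deployment package itself stays under the 50MB limit.
    """
    if not os.path.exists(path):
        fetch(path)
    return path
```

A handler would then call `ensure_cached("/tmp/model.h5", download_from_s3)` before loading the model, where `download_from_s3` is whatever fetch routine fits your setup.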

Yes! It's great. Scales very well. Costs almost nothing. Focused only on business logic. Provision in minutes. Deploy in seconds. I will never go back.

- https://begin.com - https://brian.io - https://arc.codes <-- the framework I used

Yes, at my last company we needed to generate Open Graph thumbnails that composited several images together. We decided to use a serverless architecture that basically shelled out to an ImageMagick command and then pushed the thumbnail to S3 which was served via a CDN. The main problem we ran in to was a lack of processing power, but the new options from AWS solved our issues. I'd definitely do it again.

This gets at the core of why I think "100% serverless" isn't the right move for many projects.

If you decide you're never going to boot a machine and manage it yourself, you're locked into the exact set of choices your cloud has made available to you. When your project has a need that's not covered, you're stuck.

I'm not talking about vendor lock in here, purely about the reduced flexibility within a given vendor if you choose to never manage a server yourself.

This is totally correct, but I think it's also correct to say there is a huge vendor lock-in component to serverless. People make the claim that you can engineer your application to be vendor agnostic, but even if that's the case, you're still dumping a ton of time/money into AWS-specific tooling to get your application going, and almost none of that experience is transferable. Nothing from AWS API Gateway is applicable anywhere else, for example, and that's frankly one of the most awkward of all of the AWS services I've ever used (and so the costs are even higher than if it weren't). That's not to say that there won't be some form of serverless in the future that doesn't have this enormous vendor-specific lock-in cost. But that serverless is not here now.

Yes. bustle.com, romper.com, and elitedaily.com are all 100% serverless and do 80+ million unique visitors per month. GraphQL+AWS Lambda

Interesting. Are you using AppSync?

Nope, AppSync came out long after we were in production, and while it's very powerful, it would require a full rewrite of our application. It's an application service/framework unto itself.

The Bustle stack looks like Redis/Elasticsearch => NodeJS Lambda GraphQL API layer => (sometimes API Gateway) => NodeJS Lambda render layer => API Gateway => CDN. We're working towards removing all the API Gateway usage if possible with smarter CDNs like Cloudflare Workers and Lambda@Edge, but it's not currently possible.

This setup gets us an average 70ms API response time and less than 200ms worst-case rendering time. More than 90% of cache misses never hit the worst case, since we can serve stale content in those cases. Lots of room for improvement too. =)

I'm curious on what's missing from Cloudflare Workers to allow you to remove the API Gateway usage. We're actively looking for more advanced use cases so we can make sure we prioritize upcoming features. Reply here or send me an email at <username> at cloudflare.com.

Anyone have a recommendation for how to get started with AWS serverless options? There are so many AWS services, and it isn't intuitive when you log in and see all the stuff you can do. Coming from Heroku where you do it all yourself, knowing when and how to break up functionality among AWS tools is not always clear.

we’re running a collaborative document editing service for mind maps (www.mindmup.com) entirely using lambda and associated services (such as Kinesis and api gateway). started migrating from Heroku in 2016, went all in around February 2018. My anecdotal evidence is that we’re a lot more productive this way.

"Serverless" (by the current way it is approached) is more of the likes of "Cloud computing". Someone's server is your server. It is not difficult to create real serverless apps today, that works disconnected and fetches cached data when there is new data available.

We use Azure Functions to pull data out of DataDog and other services (bespoke) to chuck in a capacity management database, and then use more FaaS's to munge the numbers etc.

We also use it in prod for events, and for customers who subscribe to message queues and webhooks, etc.

Just deployed an S3 bucket virus scanner for my main SaaS.

Customized this for my needs: https://github.com/upsidetravel/bucket-antivirus-function

Not "serious", but I can definitely recommend it for simple (transformative as in webhook -> api) endpoints you don't want to care about hosting/maintaining a server for Low volume stuff is even free (at least on googles cloud)

I've been shipping serverless ever since it was launched on AWS. More recently I am using Google.

The biggest thing I am struggling with right now is how to appropriately split projects for module size. This may be more of an issue with Firebase Functions than AWS, because with AWS it's a lot easier to create separate projects and constrain their scope. Google Firebase Functions very much assumes a big-package architecture. We could break it up into multiple Firebase projects, but that separation creates a lot of annoyances.

It'd be awesome if you could split packages up so the resources don't have to be shared within the various functions.

tl;dr want to split modules and functions up with explicit dependencies to optimize bundle size / cold boot.

Additionally, I found it to be an anti-pattern to use any DB that requires a connection pool vs. HTTP-based commands. It's annoying as heck to manage connection pools with serverless, and it seems downright buggy or broken. If you want to support it you need to centralize it with something like Pgpool, which seems like a big anti-pattern. I hate DynamoDB but am loving Google's offering (Firebase Firestore or Datastore).
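One common workaround for the pooling problem (a sketch, not this commenter's setup): since each function container handles one request at a time and module scope survives warm invocations, hold a single lazily-created connection per container instead of a pool sized for a normal server, which leaks connections as containers scale out:

```python
# Module-level state persists across warm invocations of the same container.
_conn = None


def get_conn(connect):
    """Return a cached connection, creating it at most once per container.

    `connect` is whatever callable opens your DB connection; it is only
    invoked on cold start, so subsequent warm requests reuse the handle.
    """
    global _conn
    if _conn is None:
        _conn = connect()
    return _conn
```

This still doesn't solve the aggregate-connection-count problem under heavy scale-out, which is why HTTP-based stores tend to be a smoother fit.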

AWS Lambda running Flask endpoints. It accepts a variety of GeoJSON and JSON inputs and talks directly to both RDS and S3.

Used as a backend for two webapps: one for viewing geospatial data and one for creating transport scenarios.

We made a serverless backend with an API builder based on AWS Lambda. We have hundreds of production systems running on it, so we're pretty happy with serverless :)


Currently building a startup off of serverless tech. Hoping to have few enough API calls that it is actually cheaper this way.

I'll soon see if the back-of-the-envelope calculations were correct. :)

I recall Jed Schmidt talking at various events about UNIQLO mobile app having a lot of serverless/lambda in their architecture. IIRC for preprocessing/images.

I was working at Ellie Mae, building their cloud platform. They weren't happy with the performance of the product, though it reduced ops work.

was this mostly cold starts or another architectural choice causing a bottleneck?

Lambda is just one of the serverless services out there.

S3 and DynamoDB are serverless too and many big projects run with them.

How does one put a bunch of functions in source control? Or is this an opportunity for a new product?

A serverless app of mine gets only 30-40k hits a month... it’s serious to me though.

iRobot runs their surprisingly large systems using AWS serverless products as much as possible.

So, you've kinda mixed a few things here including: "Cloud functions", "Cloud Streams" and "Cloud databases" [1].

I've shipped significant work on all 3 now, with the least focus on the newest bit (cloud functions). Since you asked, here are my opinions:

# Cloud Databases

These are almost always a slam dunk unless you or someone else on your team has a deep understanding of MySQL or Postgres [2]. They often have unique interfaces with different constraints, but you can work around these constraints, and the freedom to scale these products quickly and not worry as much about maintenance can be an enormous boon for a small team. This is fundamentally different from something like AWS RDS, where you do in fact sort of "have a server" and "configure that server". These other services have distribution built into their protocol.

Of the modern selection, DynamoDB and Firebase come to mind as particularly useful and spectacular products for key-value and graph stores (DynamoDB is surprisingly good at it!). If you're using GCE, Spanner is some kind of powerful sorcerous storm that does your bidding if you pay Google; it's really surreal the problems it can just magically solve (it's the sort of thing where it's so good your success with it disappears until you have to replicate it elsewhere and realize how much your code relied on it).

# Cloud Streams

I've been using these nonstop for about 6 years now, with most time logged on SQS. For some reason a lot of people object to streaming architecture on grounds of backpressure [3], or "want to run their own because of performance" and end up hooking zookeeper and Kafka into their infrastructure.

For small products or growing products, You Will Almost Certainly Not Overload SQS or Kinesis. You Just Won't Unless You're Twitter or Segment. Write your system such that you can swap streaming backends, and be prepared to solve obnoxious replay problems moving to a faster and less helpful queue.

Lots of folks are convinced they need to run their own RabbitMQ service so that they "can see what's going on." Given how incredibly reliable SQS has been for me since its introduction, I'm disinclined to believe that. While RabbitMQ is a fine product, I'd rather just huck stuff on SQS, obey sound design principles, and then only transition to faster queues if I need to.
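"Write your system such that you can swap streaming backends" can be as simple as a thin queue interface that the business logic depends on. A minimal sketch (class names are illustrative; a real SQS implementation would wrap boto3's `send_message`/`receive_message`):

```python
class Queue:
    """Minimal queue interface so the streaming backend can be swapped
    (SQS today, Kinesis or Kafka later) without touching business logic."""

    def send(self, body):
        raise NotImplementedError

    def receive(self, max_messages=10):
        raise NotImplementedError


class InMemoryQueue(Queue):
    """Test double; production code would drop in an SQS- or Kafka-backed
    implementation of the same two methods."""

    def __init__(self):
        self._items = []

    def send(self, body):
        self._items.append(body)

    def receive(self, max_messages=10):
        batch = self._items[:max_messages]
        self._items = self._items[max_messages:]
        return batch
```

The abstraction won't spare you the replay-semantics differences between backends, but it does keep the migration localized to one module.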

# Cloud Functions (Cλ)

Firstly, these solutions work fine. I've only shipped on Lambda, and I will say I was underwhelmed. There are two reasons for this: cost and options. Cloud Functions with API Gateway is just about the most expensive way you can serve an API in the world of CSPs right now. The hidden request costs are (or were when I set this up, shipped, then tore it down looking in horror at my spend) just stupid. As for options, it's very obnoxious how these environments (GAE, Lambda, etc.) can only bless specific environments rather than giving us a specification over I/O or shm we could bind to. I want to ship Haskell in some cases and it's stupid what I have to do to enable that [4].

Much has been said about how spaghetti-like these solutions are, but I think this is more of a tooling issue. If you can actually specify Cλ endpoints in a single file, then you can write a uni-repo for a family of endpoints that share common libraries, build for those, and terraform/script them into deployment. This is actually probably more principled than how most folks cram endpoints into a single fat binary. It also makes things like partial rollouts on an API a heck of a lot easier to implement.
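A single-file family of endpoints might look like this sketch: one module holds several handlers sharing common helpers, and the deploy tooling points each Lambda at a different handler, so a partial rollout ships one endpoint at a time. Names and response shapes are illustrative (the response format follows API Gateway's Lambda proxy convention):

```python
import json


def _respond(status, body):
    # Shared helper: shape an API Gateway proxy-style response.
    return {"statusCode": status, "body": json.dumps(body)}


def list_items(event, context=None):
    """Deployed as its own Lambda, e.g. handler = module.list_items."""
    return _respond(200, {"items": []})


def get_item(event, context=None):
    """Deployed as a separate Lambda, e.g. handler = module.get_item."""
    item_id = (event.get("pathParameters") or {}).get("id")
    if item_id is None:
        return _respond(400, {"error": "missing id"})
    return _respond(200, {"id": item_id})
```

Each function can then be built and rolled out independently while still sharing `_respond` and any common libraries from the same repo.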

But still, out of the trinity of CSP products, Cλ is by far the least exciting to me. I seldom ship API endpoints there. I usually use it for small cron jobs or data collection jobs where I'm confident I won't end up with 4 running instances because a looped call is timing out.

[0]: I'm experimenting with writing these mega posts with classical footnotes as opposed to making them epic journeys to slog through my prose style.

[1]: I hate myself more every time I say the word cloud even knowing it's the lingo folks will understand the most. They're service products. Let's all sink into despair together.

[2]: And by "deep" I mean, "Good enough to have a reputation suitable for a professional consultant and attract desperate clients."

[3]: To which I say, "Look, if you wanna pretend that the only possible architecture is a spiderweb of microservices that positively push backpressure up to the client and pretend that introspect-able queues don't give your services equivalent confirmation, that's a game you can play. I think it's disrespectful to folks who have equivalent backpressure schemes because they have similarly refined infrastructure for understanding their queue volume. Both methods are similar, and have different strengths. Needham's duality is real and it's exactly the same here as it is on one single computer."

[4]: It's 2018, we have containers, and if you support Java with its slower startup times you surely could support lightning fast Rust or Haskell executables as well. Get with it, Amazon!


Why even leave a comment?

Lambda et al. have some serious shortcomings, and a lot more work needs to be put into these serverless platforms. I don't think the approach they're taking will last. It really needs a redesign/restructure.

I think serverless is the future... but not today.. in 5-10 years. That sounds like a long way off, but it's not.. it'll pass in no time. And maybe they'll improve it enough by then to make it viable.

I wouldn't build anything serious with it, unless you're ok rewriting it a few years from now.

Lots of empty claims.

> some serious shortcoming and a lot more work needs to be put into these serverless platforms

Not actionable at all

> The approach they're taking I don't think will last

No reason given

> It really needs a redesign/restructure

Nothing here too

> And maybe they'll improve it enough by then to make it viable

What makes it (un)viable?

That's fair. I'm the founder of a company building a serverless platform. I didn't want to write specifics because it may reveal details about our approach.

Isn't this a great opportunity to sell us on your solution, or get our feedback? You don't need to reveal details of your approach, but what are the actual problems you see?

As a company that has built our most recent site completely on serverless, we've become familiar with some of the issues. Though I definitely would not consider it a no-go.

That sounds like a very pertinent thing to disclose when offering a critique of a competitor, particularly when you're not able to substantiate it (for legitimate reasons or otherwise).

Sounds even more fishy

In that case you might wanna take another look at the answers here. Notice how they all use Amazon Lambda?

If you intend to compete with them - you might wanna reconsider.

Listen, I just have to say this for everybody else out there: there is no serverless design. It's not like the fucking things are running in JavaScript on the clients; there are servers. The whole name is stupid. Come up with a better name for serverless, man, it's not serverless. God help us.

If you're going to call it "serverless architecture", then you'd better be running it on WebRTC data streams over P2P clients.
