I agree with the sentiment of the article, if not the specifics of every point. Definitely feel some pain in areas. Sometimes I feel we are doing architectural contortions. Pain points include:
* SQS is not an event source :(
* Cold starts can be very problematic for latency-sensitive scenarios when using languages/frameworks that start up slowly. It's pre-fork without the fork :(
* No concurrency in event handling, which exacerbates some of the above
* Artifact size limits can often be rough
* Bizarre artifact storage limits, with little help cleaning them up
* CloudFormation limits and bugs
* The service is rather opaque
* Onboarding; good luck :)
Learned the hard way:
* Created too many individual project repos and serverless services. Classic monolith vs services pains but more to do with too-small service boundaries
* Interface to serverless projects was too fine-grained (too many Lambda functions per service)
* Didn't start using Step Functions earlier to bridge the gap between stateful processes and stateless processors
* Used SQS as a database (long story)
* Ran a Flask API as a Lambda (sketched below). It works, but there is just no way we would use this if it weren't internal, due to cold start/scale latencies with bursty traffic.
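A minimal sketch of that last point, running a Flask app behind a single Lambda handler, assuming an API Gateway proxy integration; in practice you'd likely reach for something like Zappa or aws-wsgi rather than hand-rolling the adapter:

```python
# Minimal sketch: one Lambda handler fronting a whole Flask app.
# Assumes an API Gateway proxy-integration event; routes are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping")
def ping():
    return jsonify(status="ok")

def lambda_handler(event, context):
    # Replay the API Gateway event against the Flask app via its test client.
    with app.test_client() as client:
        resp = client.open(
            path=event.get("path", "/"),
            method=event.get("httpMethod", "GET"),
            query_string=event.get("queryStringParameters") or {},
            headers=event.get("headers") or {},
            data=event.get("body"),
        )
        return {
            "statusCode": resp.status_code,
            "headers": dict(resp.headers),
            "body": resp.get_data(as_text=True),
        }
```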
I really don't understand why this is the case. Every time we need a queue for something we end up having to explain this again to whoever our newest developers are, which seems to me like a sign of how unintuitive it is.
I have yet to encounter a use case where we didn't want to grab items out of the queue relatively frequently or even as they were added, so why force us to mess with scheduled events and long polling to do so?
In particular: "you get 5 reads." Each Kinesis shard allows only five read transactions per second, so the number of consumers directly impacts latency, and not milliseconds of latency but seconds of it.
This stuff is not obvious until you really try and use it.
New programming test idea for hopeful candidates: create a Kinesis client that reliably publishes batches of messages and doesn't starve other producers.
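For what it's worth, a rough sketch of what a passing answer might look like with boto3: retry only the failed records and back off so one producer doesn't hog the shard (stream name and record shape are illustrative):

```python
import time
import boto3

kinesis = boto3.client("kinesis")

def put_batch(records, stream="my-stream", max_attempts=5):
    """records: list of dicts like {"Data": b"...", "PartitionKey": "..."}"""
    pending = records
    for attempt in range(max_attempts):
        resp = kinesis.put_records(StreamName=stream, Records=pending)
        if resp["FailedRecordCount"] == 0:
            return
        # Keep only the records that were throttled or errored and try again.
        pending = [rec for rec, result in zip(pending, resp["Records"])
                   if "ErrorCode" in result]
        time.sleep(2 ** attempt * 0.1)  # simple exponential backoff
    raise RuntimeError(f"{len(pending)} records still unsent after retries")
```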
If SQS were event based, you could potentially try to process a message as soon as it was received and then have a batch job to reprocess any messages that have been left in the queue however often you like. The closest to this we got was by using SNS and feeding it into SQS, or even a table somewhere, but that gets sort of messy pretty quickly and didn't seem worth the extra infrastructure.
Edit: I didn't really respond to the suggestion of Kinesis here. I haven't done much with Kinesis, but I get the sense it's overkill for the use cases I'm thinking about. I don't plan to have a large amount of producers and don't expect them to have a consistent or large stream of data. If I'm wrong in that being the main use case for Kinesis please correct me.
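To make the workaround above concrete, here's a rough sketch of the scheduled-poll approach: a Lambda fired by a CloudWatch Events schedule that long-polls SQS and drains whatever is waiting (the queue URL and the process step are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

def process(body):
    # Placeholder for whatever the message actually triggers.
    print(body)

def lambda_handler(event, context):
    # Invoked on a schedule; drain the queue until a long poll comes back empty.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,  # the per-call maximum
            WaitTimeSeconds=20,      # long poll instead of hammering the API
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            process(msg["Body"])
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```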
You can attempt to keep between 1 and X instances of a lambda function running, however the underlying provisioning system is mostly a black box without published details and supposedly not entirely deterministic. Keeping a single instance of the function running isn't going to give great control over the tail on latencies. This is particularly true when faced with bursty, inconsistent traffic patterns.
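For reference, the usual (imperfect) trick is a scheduled "warming" ping, roughly like this sketch (the marker field is purely illustrative); it keeps one container resident but, as noted, does nothing for concurrent bursts:

```python
def lambda_handler(event, context):
    # A CloudWatch Events rule invokes the function every few minutes with
    # a marker payload; bail out early so the ping stays cheap.
    if isinstance(event, dict) and event.get("warmer"):
        return {"warmed": True}

    # ... normal request handling goes here ...
    return {"statusCode": 200, "body": "hello"}
```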
We've come full circle, no?
I would love to hear it. I have mistakenly used SQS for medium-term storage of state, which is what I am assuming your issue was.
For old tech stacks we've had to maintain meticulous notes with setup and maintenance steps. They're very error-prone and require constant upkeep to stay current.
With our new tech stack, where we're (currently) using Docker and a Bash deployment script, it's a breath of fresh air. We just keep our Dockerfile and setup scripts in Git. The script tracks the app version and is self-documenting. And we know it's always going to be correct, because our CI server would complain if it weren't.
The best part is that the ridiculously detailed document we used to maintain took about as much time as our automated strategy does, so in engineering resources the cost difference hasn't been much at all.
Setting up services like the ELK (Elasticsearch, Logstash, Kibana) stack, with all their variables, involves adding repositories, installing the software, and configuring it, and Ansible makes sure that I get the same result whether it's an older-generation VM or one running a different Linux distro.
That way in case of any failure or if any service needs to be scaled, a single Ansible run will do all the tasks for me and get my machine(s) ready and identical to each other.
Chef is used on a daily basis in a similar manner, but basically for standardization: making sure all the settings, users, and critical files are the same after every run.
This is very useful for bootstrapping new VMs and keeping old ones in order, in two main scenarios: a client/user makes changes that they shouldn't have and a regular Chef run makes things standard again, and making simple global changes like adding a new resolver, adding a cron job, or installing a new package.
I can't imagine not using them on a daily basis, but that's because of the 1500+ servers I manage.
I would predict that many of the early adopters are going to work themselves into a corner and find that Serverless doesn't fit a year or more down the line. Maybe that's ok if it works for now, but maybe not.
I also suspect that the long-term place for serverless is going to be in support services in infrastructure: being used as "smart" wiring for alerts, internal chatbots, or for services that only ever have very spiky and infrequent traffic (which I think are rare).
I find that it usually takes longer to make a system which uses serverless technology than to make it from scratch using open source technologies.
It makes development difficult because you can't easily test locally; there are tools that let you run Lambda functions locally, but it's not exactly the same, and not having a consistent development vs. production environment makes things difficult. Testing directly in the cloud is difficult when working in a team because you can't just share a single staging environment (it would always be in a broken state), so you have to split it up into a different test environment for each developer, and you may also need to split up service dependencies in the same way when testing/debugging.
It kind of forces you to put everything in a separate service - you basically need a separate deployment pipeline for each developer, which is impossible to manage.
Splitting everything up into services which you can't all run locally adds delays to development because, typically, in a real-world system, a single user action will propagate through multiple services; this makes debugging difficult because usually you don't know which service is responsible for a bug before you actually step through the entire code path.
Not being able to traverse the entire code path in a single debug session is a massive problem, especially in situations where there are multiple bugs across multiple services.
To make matters worse, the logging for some services is quite opaque; often you need to raise a support ticket with the service provider and it takes days before someone can tell you what the problem is. The lack of control over the logging can be a huge problem.
The benefits don't outweigh the costs in my opinion.
There are so many glue bits these days needed to get a project to work. As long as it's all in one spot and can be consistently built (and probably other things; there are whole books on this stuff), your life is going to be better.
Also FTA: "As the result, serverless today lacks the established operational frameworks, patterns, and tooling that are required to tame it’s complexity. It requires an uber-architect to invent the end-to-end solution and tame complexity. These uber-architects are blazing the path and show success and helping the patterns emerge. But as Ann from Gartner pointed out at the (Emit) conference panel, there will be no widespread serverless adoption until the frameworks and tooling catch up."
I think what we need is tooling around web frameworks so that your web server code gets deployed as a series of lambda functions. I'm fine with deploying my code to AWS Lambda (or Google cloud, Azure, ...) but I hate not being able to test my code locally and I hate all the configuration stuff to be scattered around in a complex UI. I see serverless as a (sometimes) better way of deploying an API, I shouldn't have to completely change my workflow to do this.
I see some apps going serverless, and some going Kubernetes. I cannot see a world where all apps end up in an AWS locked-in mess.
If there's a big enough team to justify and support (multiple) Kubernetes clusters, many serverless pieces don't make sense.
I run it as a 1 man dev team, spend maybe a few minutes per month on it. Save about 2x over running the same workload on bare VMs, and maybe 4x over the same workload on lambda.
But even on AWS, all Google is buying me is one "master" node for free (both the cost of running the node and the cost of setting up the master). You can spin up a single reserved instance as your master node in about a day on AWS and be good to go. So my "savings" by going Google are something like $50 plus the setup of one master node. Outside of the master node, the experience is the same.
Bin packing is really nice. My app has something like 30 containers running. Using VMs, that would be 30 instances to fire up. In Kubernetes, I fit it on 5 servers. Yes, those 5 servers are a bit beefier, but it is still a pretty substantial savings.
The pattern for CRUD is pretty straightforward, you have an API that sits in front of a memcache/redis layer and a database. There's nothing really stateful about the API, so it should be a good candidate for a lambda function.
However, since a Lambda function is stateless, you can't maintain a persistent connection to a caching or database layer. As far as I can tell, that means you can't actually build a scalable CRUD API with Lambda?
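For what it's worth, the usual pattern is to create the client at module scope so a warm container reuses it across invocations, roughly as in this sketch (Redis and the env var are illustrative); it doesn't help on cold starts, and every concurrent container still opens its own connection, which is where the scaling worry comes from:

```python
import os
import redis  # illustrative dependency

# Created once per container, then reused across warm invocations.
cache = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"))

def lambda_handler(event, context):
    key = (event.get("pathParameters") or {}).get("id", "")
    value = cache.get(key)
    return {
        "statusCode": 200 if value is not None else 404,
        "body": value.decode() if value is not None else "",
    }
```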
Steady-state load is very pricey. A one-second spike is nice and cheap.
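To put rough numbers on that (assuming the long-standing list price of about $0.00001667 per GB-second, before request charges and the free tier): a 1 GB function that is busy around the clock burns roughly 86,400 GB-seconds a day, about 2.6 million a month, i.e. on the order of $43/month, noticeably more than a small always-on VM with the same memory; a one-second spike, by contrast, costs a fraction of a cent.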
So, don't break the code up.
I've built several AWS Lambda applications, all as one big monolithic Python application - there's just one serverless function. Works fine. Super simple.
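A minimal sketch of what that kind of single-function monolith can look like, assuming an API Gateway proxy integration (the paths are illustrative):

```python
import json

def lambda_handler(event, context):
    # One function fronting the whole API; dispatch on the request path.
    path = event.get("path", "/")
    method = event.get("httpMethod", "GET")

    if path == "/health":
        body = {"status": "ok"}
    elif path == "/users" and method == "GET":
        body = {"users": []}  # call into the normal application code here
    else:
        return {"statusCode": 404, "body": ""}

    return {"statusCode": 200, "body": json.dumps(body)}
```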
It's normally the user-exposed parts that need warming up.
A lot of the serverless services such as on AWS involve a scarily high amount of vendor lock-in. As mentioned in the article, "knowing DynamoDB will be little help in learning BigTable". I feel like a lot of the DevOps community prefers OSS and vendor-agnostic solutions rather than floundering once the limitations of the vendor's platform become clear.
It was almost immediately obvious where the bottlenecks were in the development process. How do I keep track of functions? How do I deal with versioning? How do I track code and function re-use? How do I enforce best practices for function execution via API?
I was in the (very fortunate) position where I had raised a modest $50,000 from Angels based on OSS adoption to pursue a broader business interest - we spoke to hundreds of customers and feedback directed us to (A) more clarity being needed around what serverless functions are, exactly, aside from cost-savings and (B) more mature tooling to manage them.
The result of these conversations and our own vision for the future led to StdLib (and an invitation to AWS re:Invent last year!), which addresses many of the concerns around tooling/framework maturity argued here. It relies on an open source specification, FaaSlang, to handle API execution and treat web resources as simple function calls. I think the author and many people commenting here may find that, for a lot of workflows they'd like to make "serverless", we're the best option on the market.
That said - this isn't for everybody, if you're micromanaging serverless workflows down to the MB of RAM, stick with what your DevOps team loves. However, if you love just writing code and shipping, and are looking to maximize your own development velocity with functions-first development and serverless architecture, we're your solution. We're the simplicity the author here has complained about the space lacking. We love any and all feedback - I'm an open book, e-mail me directly at keith at stdlib dot com.
All that takes time. If you are a company, time costs money, since you will probably have to employ people to do this. If you are an individual, it means less time to work on whatever you're working on.
But in essence, it is not cheaper; it's more expensive. It has the possibility to be cheaper if you are the right company or person with the right problem.
For example, buying disk space in the cloud is kind of expensive if you compare it with the hardware cost. I don't think a lot of file upload services use AWS or Azure to store files, for this simple reason: it would not make any economic sense.
Interesting perspective. I've been thinking along these lines a lot recently. Is that obvious to everyone? Can anyone expand on it?
Coming from gamedev: software architecture that focuses not on runtime performance but on development speed and ease of iteration and modification has a tremendous effect on overall quality.
Microsoft, Oracle, etc.: they are highly motivated to present "demo-ware" solutions to half-interested devs to sell their languages (and related high-margin database licensing). These tend to be strictly use-case-driven, day-one demos designed to trick impressionable devs and project managers into thinking these tools will address fundamental complexity.
Oh look, a new "dynamic data" solution which optimizes a visible and annoying 0.002% of your project while locking you into a new, untested stack... fun.
People selling programming solutions to programmers have a lot of incentives to focus on the immediate problems of new projects. The stuff that kills your team/project/company doesn't start popping up until day 3.
Are you a manager or an engineer who tries to gain public attention? Then use Serverless. The marketing is so good, you'll win in every meeting.
Are you the engineer who actually fixes the problems? Stay as far away from marketing-heavy solutions as you can. What you actually want in this case are tools that let you look inside, open up all the complexity to you, and therefore help you debug and learn the current context. In that case other PaaS solutions are preferable.
It does not solve all the issues, and you can potentially split the problem into too many tiny parts, but the "just make it another function" approach worked for me as a rule of thumb.
As with most things, though, if you push on this, then that happens over there. Some of your cost savings via serverless will potentially be eaten up elsewhere over time (increased complexity means the people handling that complexity will typically cost you more; if this becomes a much bigger trend and grows further in complexity, you can bet the bargaining power of the people handling that complexity will go up accordingly).