Serverless is cheaper, not simpler (medium.com/dzimine)
146 points by kiyanwang on Sept 24, 2017 | 77 comments

We run quite a bit of infrastructure on AWS Lambda via the Serverless project, including a data crawling/ingestion pipeline, data cleaning and enriching, and even a Flask API.

I agree with the sentiment of the article, if not the specifics of every point. We definitely feel pain in some areas; sometimes it feels like we are doing architectural contortions. Pain points include:

* SQS is not an event source :(

* Cold starts can be very problematic for latency-sensitive scenarios when using languages/frameworks that start up slowly. It's pre-fork without the fork :(

* No concurrency in event handling, which exacerbates some of the above

* Artifact size limits can often be rough

* Bizarre artifact storage limits and little help in the way of cleaning them up

* CloudFormation limits and bugs

* The service is rather opaque

* Onboarding; good luck :)

Learned the hard way:

* Created too many individual project repos and serverless services. Classic monolith vs services pains but more to do with too-small service boundaries

* Interface to serverless projects too fine-grained (too many Lambda functions per service)

* Didn't start using step functions earlier to bridge the gap between stateful processes and stateless processors

* Used SQS as a database (long story)

* Ran a Flask API as a Lambda. It works, but there is just no way we would use this if it weren't internal, due to cold start/scale latencies with bursty traffic.

> SQS is not an event source :(

I really don't understand why this is the case. Every time we need a queue for something we end up having to explain this again to whoever our newest developers are, which seems to me like a sign of how unintuitive it is.

I have yet to encounter a use case where we didn't want to grab items out of the queue relatively frequently or even as they were added, so why force us to mess with scheduled events and long polling to do so?
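For context, the scheduled-events-plus-long-polling workaround being complained about looks roughly like this: a Lambda fired by a CloudWatch schedule drains whatever is sitting in the queue. This is only a sketch — it assumes boto3-style `receive_message`/`delete_message` calls and passes the client in explicitly; the schedule wiring itself lives in your deployment config.

```python
def drain_queue(sqs, queue_url, handle, max_batches=10):
    """Long-poll the queue and pass each message body to `handle`.

    `sqs` is a boto3-style SQS client; messages are deleted only after
    `handle` returns, so a crash mid-batch leaves them for redelivery.
    """
    processed = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling: wait up to 20s for messages
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained for now; the next scheduled run retries
        for msg in messages:
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
            processed += 1
    return processed
```

The scheduled handler then just calls something like `drain_queue(boto3.client("sqs"), os.environ["QUEUE_URL"], process)` — all of this is exactly the boilerplate a native event-source mapping would make unnecessary.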

SQS is probably carrying design baggage from when it was introduced in 2004. I've never used it, but would Kinesis be better for what you want to do?

Ahh, Kinesis has crazy gotchas depending on what you're doing. If you are pushing LOTS of events it kinda makes sense, but see https://brandur.org/kinesis-in-production for some gory details.

In particular: "you get 5 reads". The number of consumers impacts latency, and not milliseconds of latency but seconds.

This stuff is not obvious until you really try and use it.

The API has tons of gotchas around pushing lots of events too; particularly from many producers.

New programming test idea for hopeful candidates: create a Kinesis client that reliably publishes batches of messages and doesn't starve other producers.
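For what it's worth, the core of that test is that PutRecords can partially fail: the response carries a per-record ErrorCode, so a naive caller silently drops data, while retrying the whole batch duplicates records and hogs shard throughput other producers need. A sketch of the required retry-the-failed-subset loop, with an injected client rather than a real boto3 one:

```python
import time

def put_records_reliably(kinesis, stream, records,
                         max_attempts=5, base_delay=0.1):
    """Publish a batch via PutRecords, retrying only the failed subset.

    `records` is a list of {"Data": ..., "PartitionKey": ...} dicts.
    Returns True once every record is accepted, False if we give up.
    """
    pending = records
    for attempt in range(max_attempts):
        resp = kinesis.put_records(StreamName=stream, Records=pending)
        if resp.get("FailedRecordCount", 0) == 0:
            return True
        # Result entries line up with the request; keep only the records
        # whose entry carries an ErrorCode (throttled or failed).
        pending = [rec for rec, res in zip(pending, resp["Records"])
                   if "ErrorCode" in res]
        # Exponential backoff so we stop hammering a throttled shard
        # and leave capacity for other producers.
        time.sleep(base_delay * (2 ** attempt))
    return False  # caller decides whether to dead-letter the rest
```

Even this ignores ordering across retries and deduplication on the consumer side, which is part of why the API is such a minefield.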

In most cases I think SNS is closer to what I want: multiple producers feeding into a single pipeline that notifies its consumer as soon as possible. The main thing I'd still want from SQS is the ability to retry a failed message.

If SQS were event-based, you could try to process a message as soon as it was received, then have a batch job reprocess any messages left in the queue however often you like. The closest we got was feeding SNS into SQS, or even into a table somewhere, but that gets messy pretty quickly and didn't seem worth the extra infrastructure.

Edit: I didn't really respond to the suggestion of Kinesis here. I haven't done much with Kinesis, but I get the sense it's overkill for the use cases I'm thinking about. I don't plan to have a large number of producers and don't expect them to have a consistent or large stream of data. If I'm wrong about that being the main use case for Kinesis, please correct me.

I haven't used Kinesis myself; I've just heard that it's "SQS with the FIFO issue fixed" and wondered if it might also be suitable for you. Judging from the other comments, it wouldn't be a silver bullet, though.

Speak of the devil - I've just had to create a new SQS queue (first time in months), and there's an option to make it FIFO.

You can keep the Lambda function warm by setting up a Cloudwatch event that pings the Lambda function every 5 minutes.
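The ping trick amounts to a scheduled CloudWatch rule plus a handler that short-circuits the warm-up events before doing real work. A minimal sketch — the `source` check assumes the default scheduled-event shape, and `do_work` is a stand-in for your actual handler logic:

```python
def handler(event, context=None):
    # CloudWatch scheduled events arrive with source "aws.events";
    # treat them as keep-warm pings and return before doing real work,
    # so the ping costs almost nothing.
    if event.get("source") == "aws.events":
        return {"warmed": True}
    # ... real request handling from here on ...
    return {"result": do_work(event)}

def do_work(event):
    # placeholder for the actual business logic
    return event.get("payload", "").upper()
```

As the reply below this comment notes, this only keeps around one container warm at best; it does little for tail latencies under bursty concurrency.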

I believe there is a common misconception that this is an actual solution to the problem of cold start latencies.

You can attempt to keep between 1 and X instances of a Lambda function running, but the underlying provisioning system is mostly a black box without published details, and supposedly not entirely deterministic. Keeping a single instance of the function running isn't going to give great control over tail latencies. This is particularly true when faced with bursty, inconsistent traffic patterns.

Hopefully the warm/cold start problem is something they will find a solution for in the end.

Well, Docker already has (experimental) support for CRIU[1]. Since they control the environment, it should be possible to prevent people from doing stuff that would prevent it from working.

[1] https://criu.org/Main_Page

And now you're back to trying to manage the infrastructure, over which you only have very indirect control.

We've come full circle, no?

> Used SQS as database(long story)

I would love to hear it. I have mistakenly used SQS for medium-term storage of state, which is what I am assuming your issue was.

That's the gist of it: tracking the position of a long-running process by writing messages back into the SQS queue. Any of the numerous edge cases that cause a dupe record to appear (some of them documented SQS implementation details) would cause the process to duplicate :| Then triplicate :|
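The root issue is that SQS's at-least-once delivery means any consumer needs its own idempotency layer. A minimal in-memory sketch of the idea; a real system would use something durable with a TTL (e.g. a DynamoDB conditional write) rather than a Python set:

```python
class IdempotentConsumer:
    """Skip messages whose id has already been processed."""

    def __init__(self, handle):
        self.handle = handle
        self.seen = set()  # in production: durable store with a TTL

    def process(self, message_id, body):
        if message_id in self.seen:
            return False  # duplicate delivery; drop it silently
        self.handle(body)
        self.seen.add(message_id)
        return True
```

Without something like this, every duplicated delivery re-runs the side effects, which is exactly the duplicate-then-triplicate failure described above.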

It always puzzles me when someone mentions the cost, because it really depends on what you do. Even then, the costs are hard to compare. Usually people say things like "Yes, it might be more expensive than bare metal, but you don't need an admin, so the TCO is lower". Call it what you want, but you need a specialist in the infrastructure you're using, and AWS has some very specific quirks you need to know. Not to mention that normally Lambda is just a part of a more complete setup.

I have a baremetal server and 99% of my admin task is apt-get update, apt-get upgrade. I have a diary where I write all the other admin tasks (the most complex one was configuring apache). When I buy a new server, I reread my notes to do some copy/paste. The freedom of a bare server is priceless ;-)

Transcribe those notes into Ansible and you'll have a one click solution for any new server. Or a thousand of them at once.

Cannot upvote this highly enough.

For old tech stacks we've had to maintain meticulous notes with setup and maintenance steps. They're very error-prone and require constant upkeep to stay current.

With our new tech stack, where we're (currently) using Docker and a Bash deployment script, it's a breath of fresh air. We just keep our Dockerfile and setup scripts in Git. The script tracks the app version and is self-documenting. And we know it's always going to be correct, because our CI server would complain if it weren't.

The best part is that the ridiculously detailed document we used to maintain took as much time as our automated strategy, so in engineering resources the cost difference hasn't been much at all.

A good (short) script can be better documentation and at the same time be provably correct (just run it).

Deployment scripts existed before Docker.

Yes, but docker has created an incentive to publish them. When I want to install something, I search for a docker image that has it preinstalled and I read the dockerfile. I can choose either to use the docker image or to perform the installation globally on my main system. And, by the way, I have learned the installation procedure.

Out of curiosity, what is your usage scenario? In the place I work I use it for workstations (ca. 400 workstations and 10 servers), but I rarely use it for servers - neither for my employer nor for my customers. Somehow these servers are quite different from each other in terms of both hardware and software so I never quite understood the advantage in this case.

Ansible is clientless and the easiest orchestration system to set up, which is why I mentioned it, but I only use it in some scenarios, like setting up a server for a specific task.

Setting up services like the ELK (Elasticsearch, Logstash, Kibana) stack involves adding repositories, installing the software, and configuring it, and Ansible makes sure that I get the same result whether it's an older-generation VM or one with a different Linux distro.

That way in case of any failure or if any service needs to be scaled, a single Ansible run will do all the tasks for me and get my machine(s) ready and identical to each other.

Chef is used on a daily basis in a similar manner, but basically for standardization, making sure all the settings, users, critical files are the same after every run.

This is very useful for bootstrapping new VMs and keeping old ones in order, in two main scenarios: a client/user makes changes they shouldn't have and a regular Chef run standardizes things again, and making simple global changes like adding a new resolver, adding a cron job, or installing a new package.

I can't imagine not using them on a daily basis, but that's because of the 1500+ servers I manage.

Well, that's all good until RAID goes down - your notes will only be partially useful then, as there are many scenarios. Normally, after the failed drive has been replaced, the array should pick it up, but if it doesn't, the real fun begins.

If you have several servers, you can either wait until the drive has been repaired by the hosting provider and just reinstall, or, if you need it more quickly, spin up a new dedicated server. Many providers will set up bare-metal servers within a few minutes (as long as demand isn't too big). It doesn't give you the ability to scale 1000x within a few hours, but that won't be a problem for most.

Software RAID is for you. Hardware RAID is only useful if you are buying the high-end ones. Otherwise, software RAID is more flexible and less susceptible to failure.

I also use software RAID. Nevertheless, it's not infallible. Recently a disk in one of the arrays went down, it got quickly replaced, but the server wouldn't boot. I had to connect via console and reinstall Grub - only then would the server boot and rebuild the array. A few weeks ago three disks failed at once. In cases like this the parent's notebook will be of limited use. In my experience traditional disks fail every 3-5 years - it's not something you can avoid.

I feel like the "Serverless is cheaper" thing here is being driven largely by the sorts of companies who are experimenting with it the most - small startups prematurely designing for scale.

I would predict that many of the early adopters are going to work themselves into a corner and find that Serverless doesn't fit a year or more down the line. Maybe that's ok if it works for now, but maybe not.

I also suspect that the long-term place for serverless is going to be in support services in infrastructure: being used as "smart" wiring for alerts, internal chatbots, or for services that only ever have very spiky and infrequent traffic (which I think are rare).

The section about the extra 'wiring' complexity is spot on.

I find that it usually takes longer to make a system which uses serverless technology than to make it from scratch using open source technologies.

It makes development difficult because you can't easily test locally; there are tools that let you run Lambda functions locally, but it's not exactly the same, and not having a consistent development-vs-production environment makes things hard. Testing directly in the cloud is difficult when working in a team: you can't just share a single staging environment, because it would always be in a broken state, so you have to split it into a separate test environment for each developer, and you may also need to split up service dependencies the same way when testing/debugging. It kind of forces you to put everything in a separate service - you basically need a separate deployment pipeline for each developer, which is impossible to manage.

Splitting everything up into services which you can't all run locally adds delays to development because, typically, in a real-world system, a single user action will propagate through multiple services; this makes debugging difficult because usually you don't know which service is responsible for a bug before you actually step through the entire code path. Not being able to traverse the entire code path in a single debug session is a massive problem, especially in situations where there are multiple bugs in multiple different services.

To make matters worse, the logging for some services is quite opaque; often you need to raise a support ticket with the service provider and it takes days before someone can tell you what the problem is. The lack of control over the logging can be a huge problem.

The benefits don't outweigh the costs in my opinion.

Complexity-wise: it's another reason to lean towards Infrastructure as Code.

There are so many glue bits these days needed to get a project to work. As long as it's all in one spot and can be consistently built (and probably other things - there are whole books on this stuff), your life is going to be better.

FTA: "This wiring between the code: it better be code! And this is what DevOps was all about with it’s “Infrastructure as code” mantra."

Also FTA: "As the result, serverless today lacks the established operational frameworks, patterns, and tooling that are required to tame it’s complexity. It requires an uber-architect to invent the end-to-end solution and tame complexity. These uber-architects are blazing the path and show success and helping the patterns emerge. But as Ann from Gartner pointed out at the (Emit) conference panel, there will be no widespread serverless adoption until the frameworks and tooling catch up."

Docker works great to glue the pieces together.

As much as I hate to admit it (I love my servers), I think serverless is the future for most CRUD/admin APIs. APIs that require lots of computation and ultra-low latencies will continue to run on servers.

I think what we need is tooling around web frameworks so that your web server code gets deployed as a series of lambda functions. I'm fine with deploying my code to AWS Lambda (or Google cloud, Azure, ...) but I hate not being able to test my code locally and I hate all the configuration stuff to be scattered around in a complex UI. I see serverless as a (sometimes) better way of deploying an API, I shouldn't have to completely change my workflow to do this.

Don't forget Kubernetes. A monolith with Kubernetes is in many ways easier to code, test, and deploy... and it is cross-webhost.

I see some apps going serverless, and some going Kubernetes. I cannot see a world where all apps go into an AWS lock-in mess.

Kubernetes' cross webhost feature is not to be underestimated. Devops is the group that maintains the tooling that interacts with a company's given cloud provider, so if the order comes down from "on high" to change clouds (because negotiations with AWS end up meaning it's cheaper to move to a different cloud provider than it is to keep paying AWS), who is going to be doing the work to rewire code that talks to AWS Lambda?

If there's a big enough team to justify and support (multiple) Kubernetes clusters, many serverless pieces don't make sense.

And, at the end of the day, you've got to assume it's easier to rig together a "serverless" solution on top of Kubernetes than it will be to turn AWS Lambda-driven apps into an orchestration solution...

Correct me if I'm wrong, but I get the impression Kubernetes is a somewhat expensive solution. The only reasons I'd go with Kubernetes are if I were a) on Google Cloud, which has first-class support for Kubernetes, or b) a dedicated ops team deploying on premises.

You are wrong. It's not expensive.

I run it as a one-man dev team and spend maybe a few minutes per month on it. I save about 2x over running the same workload on bare VMs, and maybe 4x over the same workload on Lambda.

What platform do you run Kubernetes on?

Google Container Engine.

But even on AWS, all Google is buying me is one "master" node for free (both the cost of running the node and of setting up the master). You can spin up a single reserved instance as your master node in about a day on AWS and be good to go. So my "savings" by going Google are like $50 plus the setup of one master node. Outside of the master node, the experience is the same.

Bin packing is really nice. My app has something like 30 containers running. Using VMs, that would be 30 instances to fire up. With Kubernetes, I fit it on 5 servers. Yes, those 5 servers are a bit beefier, but it is still a pretty substantial saving.

> As much as I hate to admit it (I love my servers) I think serverless is the future for most CRUD / admin API's.

The pattern for CRUD is pretty straightforward: you have an API that sits in front of a memcache/Redis layer and a database. There's nothing really stateful about the API, so it should be a good candidate for a Lambda function.

However, since a lambda function is stateless, that means you can't maintain a connection to a caching or database layer. As far as I can tell, that means you can't actually build a scalable CRUD API with lambda?

I think you COULD, but it's true you might end up with one DB connection per API call, which might overwhelm your database server if you have too many concurrent requests that would be better off sharing a connection pool. For such scenarios you'd want more control than serverless paradigms offer. On the other hand, most admin-type APIs don't get called very often.

I can't speak generally, but TCP connections stay warm between invocations with Google Cloud Functions.
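The same module-scope trick applies on Lambda too: state created outside the handler survives across warm invocations of the same container, so the usual pattern is a lazily created, cached connection. A sketch — `connect` here stands in for whatever factory your driver provides (e.g. `psycopg2.connect` or `redis.Redis`), passed in explicitly to keep the example self-contained:

```python
_conn = None  # module-level: lives as long as the warm container does

def get_connection(connect):
    """Return a cached connection, creating it on first (cold) invocation.

    Each cold start pays the connection cost once; every warm invocation
    in the same container reuses the same connection.
    """
    global _conn
    if _conn is None:
        _conn = connect()
    return _conn
```

This doesn't solve the pool-exhaustion problem above — N concurrent containers still mean N connections — but it does mean you aren't reconnecting on every single request.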

It's not in any way cheaper unless you're really small. While you may not want to deal with the infrastructure, you pay a premium not to.

Well, or really spiky.

Steady-state load is very pricey. A 1-second spike is nice and cheap.

Steady state with a lot of requests/s is surprisingly expensive and surprisingly hard to calculate before-the-fact (due to the strange ways ALB is billed, for example).

Much of the argument here is that serverless means breaking your codebase up into lots of serverless functions that act independently. Yes, that would be complex.

So, don't break the code up.

I've built several AWS Lambda applications, all as one big monolithic Python application - there's just one serverless function. Works fine. Super simple.
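A stripped-down sketch of how that monolith-behind-one-function shape works: an API Gateway proxy-style event is translated into a single WSGI call, which is roughly what tools like Zappa and serverless-wsgi do. This is illustrative only — real adapters handle query strings, request headers, binary bodies, and many other edge cases:

```python
import io

def wsgi_handler(app, event):
    """Translate an API Gateway proxy-style event into one WSGI request."""
    body = (event.get("body") or "").encode()
    environ = {
        "REQUEST_METHOD": event.get("httpMethod", "GET"),
        "PATH_INFO": event.get("path", "/"),
        "QUERY_STRING": "",
        "CONTENT_LENGTH": str(len(body)),
        "SERVER_NAME": "lambda",
        "SERVER_PORT": "443",
        "SERVER_PROTOCOL": "HTTP/1.1",
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "https",
        "wsgi.input": io.BytesIO(body),
        "wsgi.errors": io.StringIO(),
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": False,
    }
    captured = {}
    def start_response(status, headers):
        captured["status"] = int(status.split()[0])
        captured["headers"] = dict(headers)
    chunks = app(environ, start_response)  # app is any WSGI app (Flask etc.)
    return {"statusCode": captured["status"],
            "headers": captured["headers"],
            "body": b"".join(chunks).decode()}
```

The whole Flask (or any WSGI) app then ships as one deployment artifact behind one function, which is exactly why it stays simple.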

Same here, but using express.js. People seem to think you have to make a new function for every little piece of your API.

Does that mean you just build your Express API as usual and run it on Lambda without any issues? We run a few functions that could perfectly well act as a single API.

Yes, but in my case I use Google Cloud. I blogged about it: https://zach.codes/deploying-node-to-cloud-functions/

Interesting, thanks for writing about it!

And if you did that, then you'd have a nightmare keeping them all warm.

On the other hand, once you break the process into many parts, you don't have to keep most of them warm. Responding to some backend events - who cares? Some timer, or message, or whatever will fire up a couple seconds later and that's often completely fine.

It's normally the user-exposed parts that need warming up.

> For one thing, DevOps folks obviously don’t flock around serverless nearly as much as they do around kubernetes.

A lot of serverless services, such as those on AWS, involve a scarily high amount of vendor lock-in. As mentioned in the article, "knowing DynamoDB will be little help in learning BigTable". I feel like a lot of the DevOps community prefers OSS and vendor-agnostic solutions rather than floundering once the limitations of the vendor's platform become clear.

We have been using AWS Lambda for a while now and it has been a very positive experience. It also allows us to grow steadily and build on top of existing infrastructure independently. It is true, though, that deploying Lambda and API Gateway via CloudFormation is a pain, and this is why we don't use it. However, everything else - from IAM policies to user and identity pools and gateway resources - works really well, and embracing its quirkiness and limitations is the only way you will enjoy developing for this platform. If you think about it, you wouldn't use a JavaScript style of programming for Rust or even Swift; you need to think with the language and platform in mind. The same goes for cloud technologies. You cannot think of them in a generic way. You need to use them in the specific way they were designed to be used.

Seriously, the only true serverless design is p2p applications over RFC 1149.

I understand the sentiment of this article, but the fact is, the space is maturing very quickly and this argument won't hold for long, if at all. I've been building with Lambda and serverless architecture, generally, without frameworks for a couple of years now. My original impetus for building on Lambda was to simplify API development and deployment, mostly for myself and a popular open source framework I'd built over the preceding few years that had a few thousand GitHub stars and some Enterprise adoption.

It was almost immediately obvious where the bottlenecks were in the development process. How do I keep track of functions? How do I deal with versioning? How do I track code and function re-use? How do I enforce best practices for function execution via API?

I was in the (very fortunate) position where I had raised a modest $50,000 from Angels based on OSS adoption to pursue a broader business interest - we spoke to hundreds of customers and feedback directed us to (A) more clarity being needed around what serverless functions are, exactly, aside from cost-savings and (B) more mature tooling to manage them.

The result of these conversations and our own vision for the future led to StdLib [1] (and an invitation to AWS re:invent last year!) which addresses many of the concerns around tooling / framework maturity argued here. It relies on an open source specification, FaaSlang [2] to handle API execution and treat web resources as simple function calls. I think the author and many people who are commenting here may find, that for a lot of workflows they'd like to make "serverless", we're the best option in the market.

That said - this isn't for everybody, if you're micromanaging serverless workflows down to the MB of RAM, stick with what your DevOps team loves. However, if you love just writing code and shipping, and are looking to maximize your own development velocity with functions-first development and serverless architecture, we're your solution. We're the simplicity the author here has complained about the space lacking. We love any and all feedback - I'm an open book, e-mail me directly at keith at stdlib dot com.

[1] https://stdlib.com/

[2] https://github.com/faaslang/faaslang/

Having your own server is always the cheapest option and probably always will be. Although you have to manage it, update it, secure the network, etc.

All that takes time. If you are a company, time costs money, since you will probably have to employ people to do this. If you are an individual, it means less time to work on whatever you're working on.

But in essence, it is not cheaper; it's more expensive. It has the potential to be cheaper if you are the right company or person with the right problem.

For example, buying disk space in the cloud is kind of expensive if you compare it with the hardware cost. I don't think a lot of file upload services use AWS or Azure to store files, for this simple reason: it would not make any economic sense.

"In technology, the most common currency to pay for benefits is “complexity”."

Interesting perspective. I've been thinking along these lines a lot recently. Is that obvious to everyone? Can anyone expand on it?

Much modern software development seems aimed at avoiding paying up-front complexity costs, at the expense of much greater complexity later on.

Gotta deliver that MVP, right? :)

It is actually a good strategy - the faster you get the MVP, the faster you realize that you're not building a good product and the faster you can change it or scrap it completely.

Coming from gamedev: software architecture that focuses not on runtime performance but on development speed and ease of iteration and modification has a tremendous effect on overall quality.

I'd say it's a touch more insidious than that...

Microsoft, Oracle, etc. are highly motivated to present "demo-ware" solutions to half-interested devs to sell their languages (and related high-margin database licensing). These tend to be strictly use-case-driven, day-one demos designed to trick impressionable devs and project managers into believing these tools will address fundamental complexity.

Oh look, a new "dynamic data" solution which optimizes a visible and annoying 0.002% of your project while locking you to a new untested stack... fun.

People selling programming solutions to programmers have a lot of incentives to focus on the immediate problems of new projects. The stuff that kills your team/project/company doesn't start popping up until day 3.

Anyone interested in keeping up with developments in the serverless ecosystem, check out our weekly serverless newsletter at https://serverless.email/ :-)

Honest question: what are the advantages of serverless over PaaS?

It is PaaS, but with a little less actual getting-dirty-tooling and a little more life-can-be-so-simple marketing.

Are you a manager or an engineer who tries to gain public attention? Then use Serverless. The marketing is so good, you'll win in every meeting.

Are you the engineer who actually fixes the problems? Stay as far away from marketing-heavy solutions as you can. What you actually want in this case are tools that let you look inside, open up all the complexity to you, and therefore help you debug and learn the current context. In that case other PaaS solutions are preferable.

Others mentioned things related to cost, but for me the best thing is the simplification of the design. Usually you have to decide many things up front - is something asynchronous, is it queued, is it behind a specific API, etc. Once you start using lambda, many of those questions get much simpler - that thing you're thinking of now is just another lambda function. You can change many details later. Even better - you can move functionality between subprojects without reintegrating code most of the time - lambda design guarantees you have completely isolated self-contained units of functionality.

It does not solve all the issues, and you can potentially split the problem into too many tiny parts, but the "just make it another function" approach has worked for me as a rule of thumb.

It's the elimination of idle resources that you're otherwise paying for. It can reduce a section of your DevOps costs as well.

As with most things, though, if you push on this here, then that happens over there. Some of your cost savings via serverless will potentially be offset elsewhere over time (increased complexity typically means the people handling that complexity will cost you more; if this becomes a much bigger trend and grows further in complexity, you can bet the bargaining power of the people handling it will go up accordingly).

It depends if the PaaS is auto-scaling. One of the ideas of serverless is that you are only running infrastructure based on exact demand. Theoretically, you might be over-provisioning PaaS resources to handle scaling.

Does Google AppEngine count as serverless to the author? Because it's definitely simpler, and not cheaper.

I must have missed something here. What does serverless mean? What is it that is serverless?

> What is it that is serverless?

Your sysadmin.


Yeah, I mean, it's a brand-new paradigm. I think frameworks will be built on top of the current approach, and that will simplify things.

And there's already a handful of projects that meet various needs: Serverless, Zappa, Chalice, AWS SAM, etc., all with pros and cons.

It's dumber, not smarter.

DevOps is so five minutes ago.
