Hacker News
Traits of Serverless Architecture (thoughtworks.com)
121 points by kiyanwang on Aug 19, 2019 | 67 comments

The first trait of "serverless" is that there is still a server, and it has nothing to do with P2P networks. This must be one of the worst marketing-isms I've come across, and I've heard some really awkward conversations due to it. In the very brief contact I had with "serverless" systems, I remember the phrase "serverless server" mentioned several times. A more appropriate name would probably be "managed microservices".

Three years of BigQuery... zero discussion about nodes, memory, disk arrays, vacuuming... None of it.

We still worry about costs, forecasting, latency, monitoring, availability, access controls, auditing, and resource utilization. So, I don't think "opsless" is applicable in the least.

But we don't worry about servers or nodes. We don't worry about zones or regions (we do worry about legal jurisdictions, but that's a non-technical concern).

"I don't worry about the server" and "there is no server" are two very different statements. Serverless is a terrible name for it.

If I don't have to talk about servers, there's no difference in optics to saying serverless.

When I go to a Jimmy John's, I'm never served by Jimmy John. So what?

It's great marketing, albeit not what everyone imagines when encountering the term.

The first time you get a $60k AWS Lambda bill from developers leveraging Lambda, you end up having a talk with finance and a huge incentive to set up a server (or servers) for the developers to run jobs on.

* Edit: I was working on Vault and for some reason typed that word instead of Lambda.

Instead of setting up servers, I'd improve my cost monitoring and make the developers responsible for the costs they generate (with great power comes great responsibility). Both of these should have happened before they got permission to use scalable infrastructure anyway.
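A minimal sketch of that kind of per-team cost monitoring, assuming billing line items have already been exported and tagged with the owning team. `UsageRecord`, `costs_by_team`, and `over_budget` are hypothetical names for illustration, not any cloud provider's billing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str        # cost-allocation tag, e.g. "payments"; "untagged" if missing
    service: str     # e.g. "lambda", "s3"
    cost_usd: float


def costs_by_team(records):
    """Sum spend per team so each team sees the bill it generates."""
    totals = defaultdict(float)
    for record in records:
        totals[record.team] += record.cost_usd
    return dict(totals)


def over_budget(records, budgets):
    """Return the teams whose spend exceeds their monthly budget (a missing budget counts as 0)."""
    totals = costs_by_team(records)
    return sorted(team for team, spent in totals.items() if spent > budgets.get(team, 0.0))


records = [
    UsageRecord("payments", "lambda", 120.0),
    UsageRecord("payments", "s3", 30.0),
    UsageRecord("search", "lambda", 59_000.0),
]
print(over_budget(records, {"payments": 500.0, "search": 1_000.0}))  # ['search']
```

Spend under a team with no budget entry (such as "untagged") is flagged as soon as it is non-zero; that is a deliberate choice in this sketch to push everyone toward tagging what they deploy.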

I always go with Function as a Service. But I understand the struggle.

To be pedantic, the way I see it, it is not really the Function that is the Service here, but rather a Function execution environment. The Function is actually yours.

EDIT: markup (not a frequent poster here :) )

Of course there are servers and networks and disk, etc. Serverless means you don't manage a server and the abstraction is not based around a server. The term "Managed Microservice" does not make any sense since microservices don't have the same lifecycle as managed functions.

It's just PaaS. We already had a term.

Some will say that PaaS did have knobs for the number of servers/instances/dynos/whatever, but that's a minor detail, and the innovation is called autoscaling. No need for an entirely new buzzword. But naming is both hard and exciting, so we have "serverless".

PaaSes typically run persistent processes like big monolithic apps. "Serverless" is more like auto-scaled CGI-as-a-service.

What's the difference? PaaS platforms can accept code from a single function (lambda) all the way up to an entire container (knative/cloud run). The scale of code deployed doesn't really change the definition of a platform.

I've seen PaaS more frequently associated with services sold to me where the service is bound to a persistent compute resource, and you're always billed for it. There is always a CPU core running the platform somewhere.

Whereas serverless/Lambda/container services can take the billed CPU to zero when you deem fit. There are still CPUs managing and administering things, but you have the option to take the utilization you get billed for to zero, whereas with the PaaS offerings I've been sold, I didn't have that option. YMMV, of course.
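A rough cost model for the scale-to-zero point, as a sketch. The rates passed in below are illustrative placeholders, not a quote of any provider's current pricing, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def always_on_monthly(hourly_rate_usd, instances=1, hours=HOURS_PER_MONTH):
    """PaaS-style billing: every provisioned instance is paid for, busy or idle."""
    return hourly_rate_usd * instances * hours


def per_request_monthly(requests, avg_duration_s, memory_gb,
                        price_per_gb_s, price_per_million_requests):
    """FaaS-style billing: pay only for execution time (GB-seconds) plus a
    per-request fee, so a month with zero requests costs zero for compute."""
    compute = requests * avg_duration_s * memory_gb * price_per_gb_s
    request_fees = requests / 1_000_000 * price_per_million_requests
    return compute + request_fees


# Illustrative placeholder rates only.
idle_paas = always_on_monthly(0.01)                                      # ~7.30 with zero traffic
idle_faas = per_request_monthly(0, 0.2, 0.5, 0.0000167, 0.20)            # 0.0 with zero traffic
busy_faas = per_request_monthly(1_000_000, 0.2, 0.5, 0.0000167, 0.20)    # ~1.87 for 1M requests
```

The always-on figure never drops below its floor, while the per-request figure tracks traffic; which one is cheaper under heavy, steady load depends entirely on the real rates, which this sketch does not try to reproduce.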

Those who don't "get" cloud likely haven't seen one big use case where the costs of cloud services are pretty immaterial: very large organizations with human-dominated change processes. The win here with cloud is the policies and contracts are embedded into the API and access configurations. The speed-up isn't from elastic services, it is eliminating the layers of humans standing between you and the service you want delivered.

If you work in such an organization where you send in a change ticket with the exact Unix commands you want run, and it takes a week for the sysadmins to get around to processing the ticket, the cloud seems lightning-fast by comparison. Then all the scaffolding to set up CI/CD, CMDB, configuration management, secrets store, elastic services, etc. either on-prem or in the cloud doesn't seem nearly so tedious any longer.

The aptly misnamed serverless architecture means you're using hosted services, where most "serverless" technologies can be placed somewhere between server hosting and static web hosting.

E.g. server hosting, container hosting, database hosting, app hosting, function hosting, web hosting.

I think opsless would be a better term, but probably has shitty marketing potential.

> I think opsless would be a better term

Considering that 'serverless' actually needs a more experienced and expensive ops team, no, it really wouldn't.

While I agree that opsless isn't a fitting name, how do you come to the conclusion that a serverless architecture requires more ops resources than a traditional one?

From my experiences the opposite is the case: The amount of time spent on operating a serverless architecture is much smaller than having to care about the underlying servers as well.

A pretty good example are all the nasty OS-related security vulnerabilities. Instead of patching hundreds of servers, like in the old days, with serverless applications you don't need to do anything. Even better: Your provider (at least if it's one of the major ones) has probably already patched these vulnerabilities, before they get published.

Even that’s not really true. You still have to wire it all together and figure out how to manage latency.

Agreed. You still have to forecast costs, audit use, manage access, deal with availability issues, etc.

A good serverless platform lets you focus on those issues instead of OS patches and hardware upgrades.

Opsless is pretty good. Function as a Service (FaaS) is my preference. It seems every time something about serverless comes up on the forum, there's the inevitable "but there are servers" comment.

I don't like FaaS much better, because of the ambiguity of the word "function". In most servers I've worked on, you start by defining the single function that determines where to route the request. What it really is, is managed services, but that term has its own conflicting definitions.

To add to the confusion, in Azure a Function App (the deployable thing) can contain multiple Functions (the invocable things).

Microservice as a service is clearest to me.

Is it really even function as a service? It's not like you can just remote a function to Lambda, at least not without some framework on top. It's really just a managed container with a fixed entry point and built-in instrumentation.

In the FaaS (also my preferred term) world the “ops” responsibilities include cost management. Fargate, Lambda etc can get very expensive very quickly if you don’t have a guy on top of this.

Some people use "Zero Ops", which might be better than opsless. I've heard a conversation once where "opsless" was mentioned, and the other person thought it was "hopeless".

When serverless functions access resources in a VPC, network operations, routing, and configuration skills get put to the test.

Stateless sums it up but gives away the downside.

BigQuery is pretty stateful and, I would argue, one of the better "serverless" offerings out there.

"Modern FastCGI"?

Wasn't FastCGI actually running a persistent connection, just like an HTTP connection but with a custom protocol for no good reason?


Yes, it was a local only binary connection. The key difference from regular CGI was not needing to start a new process for every request, so it could be used with languages with long VM startup time.

Yeah, that makes it Not CGI, so it's more like binary HTTP. I guess you could say, a precursor of HTTP/2 :)

The name itself is important. It is faster and more efficient than CGI.

application serverless?

Honestly, I disagree with the "low barrier to entry" point on the basis of all the other points that follow. Yes, "Hello World" is easy. But a non-trivial distributed system consisting of dozens or hundreds of these functions, where each function is defined and deployed on its own, with an automated deployment pipeline with rollback functionality... not to mention debugging functionality, etc, etc.

But sure, Hello World is easy.

I work in this space, and the reason I do that is because I believe we can (and should!) make it fundamentally easier to build big distributed systems. Today's serverless offerings (including compute services like AWS Lambda, databases like Aurora Serverless, and others) do a great job of reducing complexity in some areas, but do still tend to add complexity in others. It's also "just one more thing to learn", especially if you are already familiar with server-based system operations.

One trend in this space I'm super excited about is tools like pywren (from UC Berkeley) and gg (from Stanford), which show the way to building much higher level abstractions that allow non-experts to build and run pretty huge systems. I think we're going to be seeing a lot more of that, and the serverless infrastructure is a huge enabler there.

Another trend is observability, and all the cool tools folks are building to see how systems work 'as built'. These tools can cut through a lot of complexity when debugging systems, and point very clearly to where problems are happening. This is an area where serverless is catching up to single-box tools, and will be for some time. Still, the core problem here is that distributed systems are still harder to debug than single-box systems, but I think that's entirely a reflection on immature tooling. I believe that the fundamental law here is that with the right abstractions distributed systems can be significantly easier to debug than single-box systems.

> Unfortunately, much of the current literature around serverless architecture focuses solely on its benefits.

This article suffers from the same problem though. Where are the downsides? What about production debugging? Moving complexity from inside of services to service interoperability? Vendor lock-in? Lack of human support? Etc.

Vendor lock-in from serverless -- it's not the boogeyman that many people make it out to be.

Two broad use cases for serverless -- event-based triggers and APIs. With event-based triggers you're already locked in by the events you're subscribing to, and as far as APIs go, at least with AWS, if you're using proxy integration, you can use your standard Node/Express, C#/WebAPI, Python/Flask, etc. framework. You add a few lines of code for the Lambda proxy, but you can use the same code without any changes on your standard web servers.
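A minimal sketch of the proxy-integration idea, assuming an API Gateway REST API with Lambda proxy integration (payload format 1.0), where the event carries `httpMethod` and `path` and the function returns `statusCode`, `headers`, and `body`. In practice a wrapper library is often used to put a whole Flask or Express app behind such an adapter; here the adapter is hand-rolled, and `get_user` is a hypothetical piece of business logic that knows nothing about Lambda.

```python
import json


def get_user(user_id: str) -> dict:
    """Framework-agnostic business logic: no Lambda or HTTP types in sight."""
    return {"id": user_id, "name": f"user-{user_id}"}


def _json_response(status: int, payload: dict) -> dict:
    # Response shape expected by the proxy integration.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Thin adapter: the only code that knows about the proxy event format."""
    path = event.get("path", "")
    if event.get("httpMethod") == "GET" and path.startswith("/users/"):
        user_id = path.rsplit("/", 1)[-1]
        return _json_response(200, get_user(user_id))
    return _json_response(404, {"error": "not found"})
```

Only `lambda_handler` would need replacing if the same `get_user` were served from a standard web server, which is the portability argument being made.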

With AWS, you also have "serverless Docker" with Fargate and you use standard Docker containers.

Production debugging? The same issues and solutions with any micro-service implementation. The solution usually being a common logging infrastructure.

On a broader scale, "vendor lock in" is severely overrated. Despite all of the "repository patterns that will let us hypothetically move our million dollar Oracle implementation to MySql", it rarely happens, and often you end up with suboptimal and more expensive/harder to maintain solutions by not going all in on your vendor of choice's solution.

> The same issues and solutions with any micro-service implementation. The solution usually being a common logging infrastructure.

A mature service mesh will let privileged users get inside containers to attach profilers and debuggers, manually as needed or systematically through performance monitoring tools, in situ or with temporary exclusion from the load balancer pool. It's also sometimes necessary to correlate issues with host-level metrics to understand weirder bottlenecks and tail latency issues.

> it rarely happens

The credible threat of it happening keeps prices low enough to make it unnecessary.

> The credible threat of it happening keeps prices low enough to make it unnecessary.

As far as vendor lock in, in general not just with respect to the cloud, there are so many path dependencies and the risk of disruption versus the reward is so rarely worth it, it hardly ever happens.

How much would Oracle or Microsoft have to raise their prices on their database products for instance to make it worthwhile for a large enterprise company to move away from them to an open source alternative?

A lot of CIOs would dearly love to move away from Oracle. The cost of rewriting software that was not designed to be portable is immense. Oracle knows this, and keeps the pain just below the threshold where porting to postgres or mysql is cost effective.

I agree with the general sentiment that optimizing for extreme portability does not make sense, but there is a balance that needs to be struck. Designing your apps to be relatively SQL agnostic, or at least isolating those dependencies, makes a lot of sense.
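A minimal sketch of "isolating those dependencies" via a repository interface, using Python's built-in `sqlite3` as a stand-in database. `CustomerRepository`, `SqliteCustomerRepository`, and `greet` are hypothetical names; the point is only that vendor-specific SQL lives in one class.

```python
import sqlite3
from typing import Optional, Protocol


class CustomerRepository(Protocol):
    """What the business logic is allowed to know about storage."""
    def add(self, name: str) -> int: ...
    def get_name(self, customer_id: int) -> Optional[str]: ...


class SqliteCustomerRepository:
    """All database-specific SQL (and placeholder style) is confined here."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        return cur.lastrowid

    def get_name(self, customer_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


def greet(repo: CustomerRepository, customer_id: int) -> str:
    # Business logic depends only on the interface, never on sqlite3 directly.
    name = repo.get_name(customer_id)
    return f"Hello, {name}!" if name else "Unknown customer"
```

Even the `?` placeholder is a driver-specific detail (other DB-API drivers use styles such as `%s`), which is why keeping it behind one class makes a later migration a bounded job rather than a full rewrite; it does not make the migration free, as the reply below points out.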

And even if you do have a policy of “designing your apps to be SQL agnostic” and you have had that policy for 5 years over dozens of apps. Would you trust your policy enough that you would change your connection string from pointing to Oracle to pointing to MySQL and hope everything would work?

What about all of the programs that you used that depended on Oracle specific drivers. Even in a perfect world where everything was using standard SQL, would the regression testing and migrations be something you would want to tackle as a CTO? Would you be willing to take the reputational risks of something going wrong?

The term Serverless is meant to illustrate that by using things like the Firebase ecosystem, namely the Firestore database, the most common database operations such as CRUD can be done securely straight from the client side without having to develop server endpoints such as REST endpoints.

This is possible because the data modification operations are triggered directly on the frontend using an SDK and sent straight to the service provider, which then validates whether the operation can go through, for example by checking whether the user is authenticated and has write access to the data.
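Firestore expresses these checks in its own security-rules language; the sketch below only mimics the idea in Python to show what the provider evaluates before applying a client-issued write. The `ownerId` field, `can_write`, and `client_write` are hypothetical, and the in-memory dict stands in for the hosted database.

```python
from typing import Optional


def can_write(auth_uid: Optional[str], existing: Optional[dict], new_data: dict) -> bool:
    """Provider-side rule: evaluated for every write that arrives from a client SDK."""
    if auth_uid is None:
        return False  # unauthenticated clients may not write at all
    if existing is None:
        # Create: the new document must name the caller as its owner.
        return new_data.get("ownerId") == auth_uid
    # Update: only the current owner may write, and ownership cannot be handed off.
    return existing.get("ownerId") == auth_uid and new_data.get("ownerId") == auth_uid


def client_write(store: dict, doc_id: str, new_data: dict, auth_uid: Optional[str]) -> bool:
    """Simulates the provider receiving a write directly from the frontend."""
    if not can_write(auth_uid, store.get(doc_id), new_data):
        return False
    store[doc_id] = new_data
    return True
```

No application server sits between the client and `client_write` here, which is the property the comment is describing: the rule, not a hand-written endpoint, is the security boundary.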

No application code is involved in this process, other than the client-side code.

Not having to constantly hand-code REST endpoints just for doing secure CRUD is huge!

I use firebase, heavily.

The issue with that design pattern is that you end up with those hand-coded REST (or GraphQL) endpoints anyways.

You can't push all the business logic into the client, because as soon as you do that, you end up duplicating all the business logic when you want to add a different client (especially in a different language).

For example, I have my Flutter phone app which is primarily for inputting data. I'm not going to put all the logic for talking to the firestore in the app because I also need to have a backend management app for viewing/editing all the data, which is web based.

So, instead, I built the logic into firebase functions via a graphql interface. Now both the phone and web clients can talk to graphql and share that single typesafe interface.

It is really nice that firebase is this flexible though. You can pick and choose and it is still relatively easy to secure the backend by using firebase functions.

Indeed. Firebase seems to be the leader here, who else is competing on "Not having to constantly hand-code REST endpoints just for doing secure CRUD"?

I thought there would be competitors. I don't know why Firebase isn't getting more adoption, then; it's still a bit of a niche, and Google does not seem to be pushing it as much as a couple of years ago, even though the product is better than ever.

In our case, it's because we have more than web/mobile clients talking to our API.

If the business case doesn't require partner/3rd party API access, then maybe Firebase is appropriate.

Mentioned elsewhere in the comments here is the risk of vendor lock-in.

Given Google's history of EOLing products from under developers, there is real risk in putting all the business logic eggs in the Google basket.

Firebase is like a web-developer-friendly version of Google Cloud, on which it is built.

It gives easy to use serverless CRUD (with Firestore), authentication, authorization, secure file upload, hosting and server-side functions for things like image processing or database triggers.

I don't think Firebase will reach end of life anytime soon; worst case, its user base would get migrated to Google Cloud, as it's really just Google Cloud under the hood. I don't think it's an option for Google to give up on the cloud at this point, even though AFAIK Amazon has the biggest market share.

AWS Amplify is the competition.

My main concerns with serverless are unexpected costs and vendor lock-in, especially for small startups and side projects. I like that you can create an event-driven architecture and "glue" various services together (e.g., execute a lambda when an object is created on S3), but I feel like this setup could get out of control if you don't do a lot of planning and budgeting ahead of time -- these time costs should not be overlooked. A more traditional setup may not scale as well out of the box, but that could help you avoid system spike costs. I'd rather have my site go down than go into bankruptcy because of my serverless bill.

Some of my questions for anybody who's successfully implemented this at a company with 30+ engineers:
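A minimal sketch of that S3-to-Lambda glue, with one guard against the classic runaway case where a function's own output re-triggers it. It assumes the standard S3 event notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`, keys URL-encoded); `OUTPUT_PREFIX` is a hypothetical convention, and the actual transform and write are left as a comment. Using a separate output bucket or a prefix filter on the trigger itself is the more common mitigation; the in-code check is a second line of defence.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "processed/"  # hypothetical prefix where this function writes its results


def handler(event, context):
    """Handle S3 'object created' notifications.

    Objects under OUTPUT_PREFIX are skipped, so the function's own writes
    cannot re-trigger it in an endless (and expensive) loop.
    """
    to_process = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if key.startswith(OUTPUT_PREFIX):
            continue  # our own output: ignore instead of recursing
        to_process.append((bucket, key))
        # A real function would transform the object here and write the result
        # to OUTPUT_PREFIX + key; that I/O is omitted from this sketch.
    return to_process
```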

How do you tie in to version control?

How do you QA this? (Can it be QA'd locally?)

Would you have unit tests for serverless code? E2E tests?

How would you catch/diagnose a more junior engineer making an infinite loop of events? Would this potentially bring down production or eat the budget?

I can't really speak for large companies, but I believe my answers apply regardless of company size.

> How do you tie in to version control?

You're probably talking about the infrastructure definition? Infrastructure-as-Code (e.g. AWS CloudFormation) works quite well and can easily be put under version control.

> How do you QA this? (Can it be QA'd locally?)

For AWS there are projects like the AWS SAM CLI (https://github.com/awslabs/aws-sam-cli) which try to offer the ability to run your serverless application locally, but I believe such approaches are fundamentally flawed once your application reaches a certain complexity, because such projects will never be able to re-implement all the features and services made available by the cloud provider.

What works well for us is to simply spin up QA environments in addition to the production ones. Once you have the infrastructure codified properly, that's quite easy. The biggest downside is that, depending on your architecture, it might take some time to provision, so it's not as instant as running it locally. And you need internet access, of course.

> Would you have unit tests for serverless code? E2E tests?

Yes and yes. Actually I see no reason why you'd want to handle test coverage differently from traditional applications.

> How would you catch/diagnose a more junior engineer making an infinite loop of events? Would this potentially bring down production or eat the budget?

Running code during development should never be able to affect production. For AWS the way to go is to use separate AWS accounts for production and testing/QA. You could even go so far as to give each engineer their own account.

Regarding catching and diagnosing infinite loops, it all comes down to monitoring: Monitor how the cost evolves and other metrics of interest (e.g. the number of invocations of your serverless functions) and have automatic notifications once certain thresholds are crossed. Additionally limiting the maximum concurrency of serverless functions for non-production accounts might help to avoid things getting out of control too fast.
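A minimal sketch of the threshold idea applied to invocation counts: a sliding-window counter that fires once invocations in the window exceed a limit, which is the signature of an event loop feeding itself. `InvocationRateAlarm` is a hypothetical class; on AWS the managed equivalent would be an alarm on the function's invocation metric plus a concurrency limit, configured outside the code.

```python
from collections import deque


class InvocationRateAlarm:
    """Fires when more than `threshold` invocations land within `window_s` seconds.

    Timestamps are passed in explicitly so the logic is easy to test.
    """

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self._times = deque()

    def record(self, ts: float) -> bool:
        """Record one invocation at time `ts`; return True if the alarm should fire."""
        self._times.append(ts)
        # Drop invocations that have fallen out of the sliding window.
        while self._times and self._times[0] <= ts - self.window_s:
            self._times.popleft()
        return len(self._times) > self.threshold
```

With a limit of 100 per minute, steady traffic of one call per second never fires, while a runaway loop issuing calls every 10 ms trips the alarm within well under a second.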

"Stateless: Functions as a Service is ephemeral, hence you can’t store anything in memory"

This is more a characteristic of certain stacks rather than a trait of serverless.

For example, it's a common practice in AWS Lambda functions to cache DB connections using static variables.

Right. I like to make a distinction here between "hard state" which needs to be persistent for correctness, and "soft state" which just needs to be around often enough for efficiency. Lambda's function re-use behavior, and the fact it lets you use static variables between invokes, is ideal for soft state.

Data locality, and amortizing the cost of expensive operations across multiple requests, are still things that matter in the serverless world.

I have a long-term side project, SlickDNS (https://www.slickdns.com/) which is a DNS hosting service, and recently moved the URL monitoring feature from four self-managed $5/month Digital Ocean VMs to AWS Lambda. My monthly hosting bill for that particular feature has dropped from $20/month to $0.10.

Porting the Go monitoring code to Lambda was trivial. I spent much more time trying to automate the deployment to Lambda & API Gateway and in the end gave up and just deployed it manually via the AWS console.

I also recently moved the main web app from a self-managed Linode setup to Heroku. Deploying via "git push" and not having to worry about configuring or updating servers is a huge time saver. Heroku provides the best of both worlds, since I can still log in to an ephemeral shell if I want to interactively poke around my live Django environment.

Also, FaaS isn't serverless, it's just one of many serverless services.

Many people think FaaS is the definition of serverless which leads to false assumptions.

First "We can build serverless with containers at home, because everyone can host FaaS!"

And "Coldstarts make serverless unusable for many problems!"

If you have to manage servers, vms or container (clusters) your solution isn't serverless.

If you don't like coldstarts, don't use FaaS for your API. Use something like Firestore, AppSync or FaunaDB.

Did you read the article? The author dedicated a whole section to this and introduced their own term “hostless”.

Statelessness is not a trait of serverless. Serverless, as poorly named as it is, just means that you don't have to manage servers, nor do you have any control over the servers.

There are plenty of stateful serverless services. DynamoDB, BigQuery, Serverless Aurora, etc. In all these cases, you get statefulness without managing or controlling servers.

Most important trait of them all: cold starts. It can take up to 10 seconds for the first request to be handled if the "server" is cold, meaning it has been inactive for 15 minutes or more.

no thanks.

"Cloudflare Workers respond very quickly, typically in under 200 milliseconds, when cold starting. In contrast, both Lambda and Lambda@Edge functions can take over a second to respond from a cold start." That figure also includes network latency, and maybe some bias.


Google expects you to have a TTFB of <=200ms. You've used up 100% of your allocation just on cold boot time with serverless lambdas, FaaS, whatever you want to call them.

I think you are being disingenuous.

Firstly, the Cloudflare number includes network time, which isn't included in Google's number.

Secondly you can't compare the numbers unless you know the percentiles used.

Thirdly, you are presuming your function is always cold loaded - which is only for the free tier of pricing from what I could see.
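A minimal sketch of why the percentile matters, using a deterministic toy workload. The 1,000 requests, 2% cold-start share, 40 ms warm latency, and 1,200 ms cold latency are illustrative numbers, not measurements of any provider; `percentile` uses the nearest-rank definition.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_samples(n, cold_fraction, warm_ms, cold_ms):
    """Toy workload: round(n * cold_fraction) requests hit a cold start, the rest are warm."""
    n_cold = round(n * cold_fraction)
    return [cold_ms] * n_cold + [warm_ms] * (n - n_cold)


samples = latency_samples(1000, 0.02, warm_ms=40, cold_ms=1200)  # illustrative numbers only
mean_ms = sum(samples) / len(samples)                             # 63.2
share_over_budget = sum(s > 200 for s in samples) / len(samples)  # 0.02
```

With these numbers the mean (63.2 ms) and p95 (40 ms) sit comfortably inside a 200 ms budget while p99 (1,200 ms) blows through it, so "cold starts use up the whole budget" and "cold starts are negligible" can both be true depending on which percentile is quoted.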

Google (v4 pagespeed) says: "Server response time is the time it takes for a server to return the initial HTML, factoring out the network transport time. Because we only have so little time, this time should be kept at a minimum - ideally within 200 milliseconds, and preferably even less!"

Disclaimer: happy customer of Cloudflare, we don't use Workers except for one debugging purpose.

I'm not referring to Cloudflare. There are many other FaaS providers, e.g., Zeit's Now. It's true that cold boots are more probable for low-traffic sites. If your function recently ran, it's unlikely for a subsequent execution to be a cold boot.

But it could be luck of the draw that Google's crawler hits your site during a period of low traffic, and your functions have to cold boot. That could impact page rankings, as response times influence those.

> I'm not referring to Cloudflare

Why would you use a slow provider for a comparison point?

> and your functions have to cold boot

For $5 per month, it looks like you get warm startup always on CloudFlare, which is very fast.

Also: Google surely pay some attention to network delays: serving Kenya from a Mombasa POP is going to be way faster than using some server in the US. I would expect Google to give that some juice.

Not with CloudFlare Workers.
