
Traits of Serverless Architecture - kiyanwang
https://www.thoughtworks.com/insights/blog/traits-serverless-architecture
======
userbinator
The first trait of "serverless" is that there is still a server, and it has
nothing to do with P2P networks. This must be one of the worst marketing-isms
I've come across, and I've heard some really awkward conversations due to it.
In the very brief contact I had with "serverless" systems, I remember the
phrase "serverless server" mentioned several times. A more appropriate name
would probably be "managed microservices".

~~~
dragonelite
I think opsless would be a better term, but probably has shitty marketing
potential.

~~~
fuball63
Opsless is pretty good. Function as a Service (FaaS) is my preference.
It seems every time something about serverless comes up on the forum, there's
the inevitable "but there are servers" comment.

~~~
ilikehurdles
I don't like FaaS much more, because of the ambiguity of the word "function".
In most servers I've worked on, you start by defining the single function
that determines where to route the request. What it really is is managed
services, but that term has its own conflicting definitions.

~~~
arethuza
To add to the confusion, in Azure a Function App (the deployable thing) can
contain multiple Functions (the invocable things).

~~~
dkersten
Microservice as a service is clearest to me.

------
llarsson
Honestly, I disagree with the "low barrier to entry" point, on the basis of all
the other points that follow. Yes, "Hello World" is easy. But a non-trivial
distributed system consisting of dozens or hundreds of these functions, where
each function is defined and deployed on its own, with an automated deployment
pipeline with rollback functionality, is anything but easy. And that's before
you get to debugging, and so on.

But sure, Hello World is easy.

~~~
mjb
I work in this space, and the reason I do is that I believe we can
(and should!) make it fundamentally easier to build big distributed systems.
Today's serverless offerings (including compute services like AWS Lambda,
databases like Aurora Serverless, and others) do a great job of reducing
complexity in some areas, but do still tend to add complexity in others. It's
also "just one more thing to learn", especially if you are already familiar
with server-based system operations.

One trend in this space I'm super excited about is tools like pywren (from UC
Berkeley) and gg (from Stanford), which show the way to building much higher
level abstractions that allow non-experts to build and run pretty huge
systems. I think we're going to be seeing a lot more of that, and the
serverless infrastructure is a huge enabler there.

Another trend is observability, and all the cool tools folks are building to
see how systems work 'as built'. These tools can cut through a lot of
complexity when debugging systems, and point very clearly to where problems
are happening. This is an area where serverless is catching up to single-box
tools, and will be for some time. Still, the core problem here is that
distributed systems are harder to debug than single-box systems, though I
think that's entirely a reflection of immature tooling. I believe that the
fundamental law here is that, with the right abstractions, distributed systems
can be significantly easier to debug than single-box systems.

------
trabant00
> Unfortunately, much of the current literature around serverless architecture
> focuses solely on its benefits.

This article suffers from the same problem though. Where are the downsides?
What about production debugging? Moving complexity from inside of services to
service interoperability? Vendor lock-in? Lack of human support? Etc.

~~~
scarface74
Vendor lock-in from serverless is not the boogeyman that many people make
it out to be.

There are two broad use cases for serverless: event-based triggers and APIs.
With event-based triggers you're already locked in by the events you're
subscribing to, and as for APIs, at least with AWS, if you're using proxy
integration you can use your standard Node/Express, C#/WebAPI, Python/Flask,
etc. framework. You add a few lines of code for the Lambda proxy, but you can
use the same code without any changes on your standard web servers.

With AWS, you also have "serverless Docker" with Fargate and you use standard
Docker containers.

Production debugging? The same issues and solutions with any micro-service
implementation. The solution usually being a common logging infrastructure.

On a broader scale, "vendor lock-in" is severely overrated. Despite all of the
"repository patterns that will let us hypothetically move our million-dollar
Oracle implementation to MySQL", it rarely happens, and you often end up with
suboptimal and more expensive, harder-to-maintain solutions by not going all
in on your vendor of choice's solution.

~~~
closeparen
> The same issues and solutions with any micro-service implementation. The
> solution usually being a common logging infrastructure.

A mature service mesh will let privileged users get inside containers to
attach profilers and debuggers, manually as needed or systematically through
performance monitoring tools, in situ or with temporary exclusion from the
load balancer pool. It's also sometimes necessary to correlate issues with
host-level metrics to understand weirder bottlenecks and tail latency issues.

> it rarely happens

The credible threat of it happening keeps prices low enough to make it
unnecessary.

~~~
scarface74
_The credible threat of it happening keeps prices low enough to make it
unnecessary._

As far as vendor lock-in in general, not just with respect to the cloud: there
are so many path dependencies, and the risk of disruption so rarely justifies
the reward, that it hardly ever happens.

How much would Oracle or Microsoft have to raise their prices on their
database products for instance to make it worthwhile for a large enterprise
company to move away from them to an open source alternative?

~~~
wstrange
A lot of CIOs would dearly love to move away from Oracle. The cost of
rewriting software that was not designed to be portable is immense. Oracle
knows this, and keeps the pain just below the threshold where porting to
postgres or mysql is cost effective.

I agree with the general sentiment that optimizing for extreme portability
does not make sense, but there is a balance to be struck. Designing your apps
to be relatively SQL-agnostic, or at least isolating those dependencies, makes
a lot of sense.

~~~
scarface74
Even if you do have a policy of "designing your apps to be SQL agnostic", and
you have had that policy for five years across dozens of apps, would you trust
it enough to change your connection string from pointing at Oracle to pointing
at MySQL and hope everything would work?

What about all of the programs you used that depended on Oracle-specific
drivers? Even in a perfect world where everything used standard SQL, would the
regression testing and migrations be something you would want to tackle as a
CTO? Would you be willing to take the reputational risk of something going
wrong?

------
vfc1
The term "serverless" is meant to illustrate that by using things like the
Firebase ecosystem, and namely the Firestore database, the most common
database operations, such as CRUD, can be done in a secure way straight from
the client side without having to develop server endpoints such as REST
endpoints.

This is possible because the data modification operations are triggered
directly on the frontend using a SDK, and they are sent straight to the
service provider, which will then validate if the operation can go through,
like for example checking if the user is authenticated and has write access to
the data.

No application code is involved in this process, other than the client-side
code.

Not having to constantly hand-code REST endpoints just for doing secure CRUD
is huge!
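As a sketch of how that provider-side validation is expressed, Firestore security rules declare who may touch which documents. The `notes` collection and `ownerId` field below are hypothetical, purely for illustration:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Hypothetical "notes" collection: only an authenticated owner
    // may read or modify their own documents.
    match /notes/{noteId} {
      allow read, update, delete: if request.auth != null
          && request.auth.uid == resource.data.ownerId;
      allow create: if request.auth != null
          && request.auth.uid == request.resource.data.ownerId;
    }
  }
}
```

The client SDK sends the write directly to Firestore, and the service enforces these rules before committing it; no application server sits in between.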

~~~
pjc50
Indeed. Firebase seems to be the leader here; who else is competing on "Not
having to constantly hand-code REST endpoints just for doing secure CRUD"?

~~~
vfc1
I thought there would be competitors. I don't know why Firebase isn't getting
more adoption, then; it's still a bit of a niche, and Google does not seem to
be pushing it as much as a couple of years ago, even though the product is
better than ever.

~~~
sojournerc
In our case, it's because we have more than web/mobile clients talking to our
API.

If the business case doesn't require partner/3rd party API access, then maybe
Firebase is appropriate.

Mentioned elsewhere in the comments here is the risk of vendor lock-in.

Given Google's history of EOLing products from under developers, there is real
risk in putting all the business logic eggs in the Google basket.

~~~
vfc1
Firebase is like a web-developer-friendly version of Google Cloud, which it
is built on.

It gives you easy-to-use serverless CRUD (with Firestore), authentication,
authorization, secure file upload, hosting, and server-side functions for
things like image processing or database triggers.

I don't think Firebase will be end-of-lifed anytime soon; worst case, its user
base would get migrated to Google Cloud, as it's really just Google Cloud
under the hood. I don't think it's an option for Google to give up on the
cloud at this point, even though AFAIK Amazon has the biggest market share.

------
ahallock
My main concerns with serverless are unexpected costs and vendor lock-in,
especially for small startups and side projects. I like that you can create an
event-driven architecture and "glue" various services together (e.g., execute
a lambda when an object is created on S3), but I feel like this setup could get
out of control if you don't do a lot of planning and budgeting ahead of time
-- these time costs should not be overlooked. A more traditional setup may not
scale as well out of the box, but that could help you avoid system spike
costs. I'd rather have my site go down than go into bankruptcy because of my
serverless bill.

------
alexandercrohde
Some of my questions for anybody who's successfully implemented this at a 30
engineer+ company:

How do you tie in to version control?

How do you QA this? (Can it be QA'd locally?)

Would you have unit tests for serverless code? E2E tests?

How would you catch/diagnose a more junior engineer making an infinite loop of
events? Would this potentially bring down production or eat the budget?

~~~
Dunedan
I can't really speak for large companies, but I believe my answers apply no
matter the company size.

> How do you tie in to version control?

You're probably talking about the infrastructure definition? Infrastructure-
as-Code (e.g. AWS CloudFormation) works quite well and can easily be put under
version control.
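As an illustration of what goes under version control, a hypothetical AWS SAM template (a flavor of CloudFormation) declaring a single function might look like this; the resource name, handler, and path are made up:

```yaml
# Hypothetical SAM template: the function and its event source are
# declared as code, so the whole stack lives in version control.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  HelloFunction:                 # hypothetical function name
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler       # module.function entry point
      Runtime: python3.9
      CodeUri: ./src
      Events:
        HelloApi:
          Type: Api
          Properties:
            Path: /hello
            Method: get
```

Deploying the same template to multiple stacks is also what makes the QA environments mentioned below cheap to spin up.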

> How do you QA this? (Can it be QA'd locally?)

For AWS there are projects like the AWS SAM CLI
([https://github.com/awslabs/aws-sam-cli](https://github.com/awslabs/aws-sam-
cli)) which try to offer the ability to run your serverless application
locally, but I believe such approaches are fundamentally flawed once your
application reaches a certain complexity, as such projects will never be able
to re-implement all the features and services made available by the cloud
provider.

What works well for us is to simply spin up QA environments in addition to the
production ones. Once you've got the infrastructure codified properly, that's
quite easy. The biggest downside is that, depending on your architecture, it
might take some time to provision, so it's not as instant as running it
locally. And you need internet access, of course.

> Would you have unit tests for serverless code? E2E tests?

Yes and yes. I actually see no reason why you'd want to handle test coverage
differently from traditional applications.

> How would you catch/diagnose a more junior engineer making an infinite loop
> of events? Would this potentially bring down production or eat the budget?

Running code during development should never be able to affect production. For
AWS, the way to go is to use separate AWS accounts for production and
testing/QA. You could even go so far as to give each engineer their own
account.

Regarding catching and diagnosing infinite loops, it all comes down to
monitoring: Monitor how the cost evolves and other metrics of interest (e.g.
the number of invocations of your serverless functions) and have automatic
notifications once certain thresholds are crossed. Additionally limiting the
maximum concurrency of serverless functions for non-production accounts might
help to avoid things getting out of control too fast.
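As a sketch of such a threshold notification, a hypothetical CloudFormation alarm on a function's invocation count might look like the following; the resource names and the threshold value are made up:

```yaml
# Hypothetical CloudFormation snippet: alert when invocations spike,
# e.g. because of an accidental event loop.
Resources:
  InvocationSpikeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/Lambda
      MetricName: Invocations
      Dimensions:
        - Name: FunctionName
          Value: !Ref MyFunction   # hypothetical function resource
      Statistic: Sum
      Period: 300                  # 5-minute windows
      EvaluationPeriods: 1
      Threshold: 10000             # tune to your expected traffic
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic          # hypothetical SNS topic
```

The concurrency cap mentioned above is a separate per-function setting (reserved concurrency on AWS Lambda), which bounds how fast a runaway loop can burn budget in a non-production account.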

------
matchagaucho
_" Stateless: Functions as a Service is ephemeral, hence you can’t store
anything in memory"_

This is more a characteristic of certain stacks rather than a trait of
serverless.

For example, it's a common practice in AWS Lambda functions to cache DB
connections using static variables.

~~~
mjb
Right. I like to make a distinction here between "hard state", which needs to
be persistent for correctness, and "soft state", which just needs to be around
often enough for efficiency. Lambda's function re-use behavior, and the fact
that it lets you use static variables between invokes, is ideal for soft
state.

Data locality, and amortizing the cost of expensive operations across multiple
requests, are still things that matter in the serverless world.

------
jbarham
I have a long-term side project, SlickDNS
([https://www.slickdns.com/](https://www.slickdns.com/)) which is a DNS
hosting service, and recently moved the URL monitoring feature from four self-
managed $5/month Digital Ocean VMs to AWS Lambda. My monthly hosting bill for
that particular feature has dropped from $20/month to $0.10.

Porting the Go monitoring code to Lambda was trivial. I spent much more time
trying to automate the deployment to Lambda & API Gateway and in the end gave
up and just deployed it manually via the AWS console.

I also recently moved the main web app from a self-managed Linode setup to
Heroku. Deploying via "git push" and not having to worry about configuring or
updating servers is a huge time saver. Heroku provides the best of both
worlds, since I can still log into an ephemeral shell if I want to
interactively poke around my live Django environment.

------
k__
Also, FaaS isn't serverless; it's just one of many serverless services.

Many people think FaaS is the definition of serverless, which leads to false
assumptions.

First, "We can build serverless with containers at home, because everyone can
host FaaS!"

And, "Cold starts make serverless unusable for many problems!"

If you have to manage servers, VMs, or containers (clusters), your solution
isn't serverless.

If you don't like cold starts, don't use FaaS for your API. Use something like
Firestore, AppSync, or FaunaDB.

~~~
alephnan
Did you read the article? The author dedicated a whole section to this and
introduced their own term “hostless”.

------
jedberg
Statelessness is not a trait of serverless. Serverless, as poorly named as it
is, just means that you don't have to manage servers, nor do you have any
control over the servers.

There are plenty of stateful serverless services. DynamoDB, BigQuery,
Serverless Aurora, etc. In all these cases, you get statefulness without
managing or controlling servers.

------
grezql
The most important trait of them all: cold starts. It can take up to 10
seconds for the first request to be handled if the "server" is cold, meaning
it has been inactive for 15 minutes or more.

No thanks.

~~~
robocat
"Cloudflare Workers respond very quickly, typically in under 200 milliseconds,
when cold starting. In contrast, both Lambda and Lambda@Edge functions can
take over a second to respond from a cold start." That number also includes
network latency, and maybe some bias.

[https://www.cloudflare.com/learning/serverless/serverless-
pe...](https://www.cloudflare.com/learning/serverless/serverless-performance/)

~~~
JMTQp8lwXL
Google expects you to have a TTFB of <=200ms. You've used up 100% of your
allocation just on cold boot time with serverless lambdas, FaaS, whatever you
want to call them.

~~~
robocat
I think you are being disingenuous.

Firstly, the Cloudflare number includes network time, which isn't included in
Google's number.

Secondly you can't compare the numbers unless you know the percentiles used.

Thirdly, you are presuming your function is always cold-started, which from
what I could see applies only to the free pricing tier.

Google (v4 pagespeed) says: "Server response time is the time it takes for a
server to return the initial HTML, factoring out the network transport time.
Because we only have so little time, this time should be kept at a minimum -
ideally within 200 milliseconds, and preferably even less!"

Disclaimer: happy customer of Cloudflare, we don't use Workers except for one
debugging purpose.

~~~
JMTQp8lwXL
I'm not referring to Cloudflare. There are many other FaaS providers, e.g.,
Zeit's Now. It's true that cold boots are more probable for low-traffic sites.
If your function recently ran, it's unlikely for a subsequent execution to be
a cold boot.

But it could be luck of the draw that Google's crawler hits your site during a
period of low traffic, and your functions have to cold boot. That could impact
page rankings, as response times influence those.

~~~
robocat
> I'm not referring to Cloudflare

Why would you use a slow provider for a comparison point?

> and your functions have to cold boot

For $5 per month, it looks like you get warm startup _always_ on Cloudflare,
which is very fast.

Also: Google surely pays some attention to network delays: serving Kenya from
a Mombasa POP is going to be way faster than using some server in the US. I
would expect Google to give that some juice.

