
Serverless Best Practices - kiyanwang
https://medium.com/@PaulDJohnston/serverless-best-practices-b3c97d551535
======
actuator
He mentions DynamoDB for the data layer instead of regular RDBMS. I have tried
to use DynamoDB several times but have been disappointed with it. I am not
saying traditional RDBMSs solve the problem that DynamoDB has, but even
DynamoDB is a poor fit for systems with rapidly changing load patterns. Some
things that have been big pain points while using DynamoDB:

\- It has autoscaling built in now, but even that leaves a lot to be desired.
Autoscaling is slow, and your Lambda functions will error out for a significant
amount of time (I have seen it last up to 5 minutes) on 5X spikes until it
scales up the IOPS. This fills up the queues quite fast and can lead to
patterns where your spike increases because of failure retries. You can
control this somewhat by limiting Lambda invocations, though.

\- Limited number of scale downs allowed.

\- Scale down doesn't completely reduce the throughput limits. I have seen
situations where the write load went to 0 at certain times but scale down
only reduced it to something like 400 IOPS, which can be a huge cost drain.
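
The retry amplification described in the first point is commonly softened with capped exponential backoff plus jitter. The following is only a sketch under assumptions: `ThrottledError`, `call_with_backoff` and `make_flaky_write` are hypothetical names, and the throttled write is simulated rather than a real DynamoDB call.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throughput-exceeded error from the data store."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying on ThrottledError with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up so the caller can park the item instead of retrying forever
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2^attempt)).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)


def make_flaky_write(failures_before_success):
    """Simulated write that is throttled a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def write():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ThrottledError("simulated throttle")
        return state["calls"]

    return write
```

Spreading retries out like this does not make the table scale any faster; it only keeps the retries from stacking on top of the original spike.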

I am still looking for a DB which has done this well. Essentially what I want
is compute layer separated from the storage layer in the database well enough
so that both can be scaled independently and the compute layer can react to
quick changes in load patterns. Does anyone here have any recommendations?

~~~
jbergknoff
Similar experience with Dynamo here: it's very easy to get into a situation
where it's unusable (rejecting requests) unless you pre-provision enough
capacity. The act of scaling up capacity often takes 5-20 minutes, and then
there are the limits on how often you can scale. The backup/restore options
are very poor. There's no good way to clear a table besides deleting and
recreating it, which can be a problem when there are stream listeners. The
tooling around Dynamo, both first party and third party, is also pretty
shabby.

Maybe I'm just using it wrong. I wish Dynamo came close to its apparent
potential.

~~~
AmericanChopper
>Maybe I'm just using it wrong. I wish Dynamo came close to its apparent
potential.

Maybe, but you might also be using it for the wrong thing. Any business logic
that requires table scans (which are perfectly fine on RDBMS), is going to
suck in any document database. The way you win at Dynamo is by making sure
that one user operation is as close to one Dynamo call as you can make it. If
your workload can't be designed to suit that paradigm, then Dynamo will only
bring you pain and suffering.
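
A rough illustration of "one user operation, one call", assuming a table keyed by a partition key and a sort key. `TinyTable` is an in-memory stand-in invented for this sketch, not the DynamoDB API, and the `USER#` / `ORDER#` key names are made up.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a partition key (pk) and a sort key (sk)."""

    def __init__(self):
        self._partitions = defaultdict(dict)
        self.calls = 0  # counts round trips so the access pattern is visible

    def put(self, pk, sk, item):
        self.calls += 1
        self._partitions[pk][sk] = item

    def query(self, pk, sk_prefix=""):
        """Return the items of one partition whose sort key starts with sk_prefix, in key order."""
        self.calls += 1
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = TinyTable()
table.put("USER#42", "PROFILE", {"name": "Ada"})
table.put("USER#42", "ORDER#2018-09-01", {"total": 30})
table.put("USER#42", "ORDER#2018-09-15", {"total": 12})
table.put("USER#7", "PROFILE", {"name": "Bob"})

table.calls = 0
# Everything needed for one user's page lives in one partition: one call, no scan.
page = table.query("USER#42")
# A narrower view of the same partition is still a single call.
orders = table.query("USER#42", sk_prefix="ORDER#")
```

The point is that the keys are designed around the screen being rendered; a question the keys were not designed for ends up as a scan.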

That said though, migrations are terrible in Dynamo. Altering a table will
often involve creating a new one, backfilling it, and then cutting over. Which
you have to implement your own logic for. Also, I think the GSI limit is 5? In
practice it's actually 4, because if you want to recreate a GSI, you need to
create a new one, and then drop the old one, so you have to keep one spare to
accommodate that process.
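
The create / backfill / cut over logic could look roughly like this. It is a sketch with plain dicts standing in for the two tables; `MigratingStore` is a hypothetical name, and writing to both tables during the migration is an added assumption, not something stated above.

```python
class MigratingStore:
    """Writes go to both tables during a migration; reads follow a cut-over flag."""

    def __init__(self, old, new, transform):
        self.old, self.new, self.transform = old, new, transform
        self.read_from_new = False

    def put(self, key, item):
        self.old[key] = item
        self.new[key] = self.transform(item)  # dual-write keeps the new table current

    def get(self, key):
        return (self.new if self.read_from_new else self.old).get(key)

    def backfill(self):
        """Copy items that predate the dual-write, without overwriting newer writes."""
        copied = 0
        for key, item in list(self.old.items()):
            if key not in self.new:
                self.new[key] = self.transform(item)
                copied += 1
        return copied

    def cut_over(self):
        """Switch reads to the new table once the backfill has finished."""
        self.read_from_new = True


old = {"a": {"name": "Ada"}}
store = MigratingStore(old, {}, lambda item: {**item, "schema": 2})
store.put("b", {"name": "Bob"})   # arrives mid-migration
copied = store.backfill()         # only "a" still needs copying
store.cut_over()
```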

------
tomcam
I still don’t understand how you do a PostgreSQL insert on a serverless
system—please help! Also, this statement is patently silly:

> The biggest point to make here is that serverless architecture may well
> require you to rethink your data layer. That’s not the fault of serverless.

Well, it is the fault of serverless. It’s a shortcoming. Own it. The trade-off
may well be worthwhile, I just can’t tell yet. I’m trying to wrap my mind around
serverless as regards database-backed apps, but obviously it would be
preferable to be able to use the RDBMS of my choice.

~~~
travbrack
So why are RDBMS and serverless seemingly mutually exclusive? Are RDBMS queries
really that slow compared to something like DynamoDB?

~~~
k__
As far as I know, most RDBMSes are connection based, which brings an overhead.
Once you have your connection, they're quite fast.

~~~
whopa
I wonder why all the predominant RDBMSes are heavyweight connection based, and
why this is presumably so hard to change. There's nothing in SQL that requires
this to be the case, at least conceptually.

~~~
pvg
Part of it is legacy but also, interactions with an RDBMS are often stateful
so it's useful to have a session (in which such things as transactions can
live, etc). I'd guess there isn't much reason to change this because there are
standard and effective workarounds that just haven't yet made their way to
things like lambda.
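
To make the "stateful session" point concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for a server RDBMS. The `accounts` table and the `transfer` / `balances` helpers are made up for illustration. The half-finished transfer exists only inside this one connection until it is committed or rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two rows atomically; the open transaction is per-connection state."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()
        return True
    except ValueError:
        conn.rollback()  # undoes the first UPDATE as well
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```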

~~~
lostapathy
Can you elaborate on what “standard and effective workarounds” you are
referring to here?

~~~
pvg
Connection pooling is the most obvious one which is so common it tends to be
transparent and built into many DB access libraries. Then there are all sorts
of proxies/load balancers/multiplexors. If you think about it, the bits of
code that handle web requests in your typical web app/framework/whatnot are
very much like lambdas and when you write those, you generally don't have to
worry about the cost of DB connections because it's a well-solved problem.
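
For readers who have only ever used pooling through a library, a bare-bones version looks something like the following. It is a sketch, with `sqlite3` standing in for a networked database; `ConnectionPool` and `handle_request` are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and lent out to request handlers."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self, timeout=1.0):
        conn = self._idle.get(timeout=timeout)  # waits if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


def handle_request(pool, n):
    """Plays the role of a web request handler: borrow, query, give back."""
    with pool.connection() as conn:
        return conn.execute("SELECT ? * 2", (n,)).fetchone()[0]


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
results = [handle_request(pool, n) for n in range(5)]  # five requests, two connections
```

The pool only helps because the process holding it outlives individual requests, which is the part that does not carry over directly to short-lived functions.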

------
jmull
This probably should be retitled “why you shouldn’t use serverless for most
things”. You would only need to slightly rephrase the sentences introducing
each section.

E.g., the sections could become:

Functions Don’t Compose Well

Functions Don’t Connect to Data Stores Well

Otherwise Unnecessary Queues are Needed

This article makes it sound like current serverless isn’t a very widely useful
tool.

~~~
scarface74
There is more to Serverless than just APIs. For event based, eventually
consistent workflows where you don’t need real time responses like working
with queues, stream processing, etc. it’s great.

But, I think it’s used a lot more than necessary for end-user-facing APIs that
need fast responses.

------
languagehacker
Big ups for calling out how bad an idea connecting to the database is from a
serverless context. I do a lot with serverless on AWS Lambda. Since part of
that involves an ETL to write to a database, I also run a server that exposes
a service for writing data via RPC. The server provides a connection pool to
the database and appropriately encapsulates all the complex functionality we
have for incoming data behind a well-defined interface.

If you really insist on calling your entire stack serverless, do what we do
and run these servers as ECS tasks via Fargate. That is plausibly serverless,
albeit long-running. You get all the perks of a serverless environment with
all the perks of something like beanstalk (without having to patch a server!).
The drawback in this scenario is that running ECS tasks via Fargate doesn't
provide as much flexibility in cost.

~~~
Agebor
Connecting a DB might be a bad idea only currently, until cloud providers
implement some sort of RPC SQL layer.

We are successfully running Lambda functions connecting to AWS Aurora MySQL
using IAM authentication. It has some quirks but after figuring it out, works
well.

Just run a connection pool with only 1 connection and expect some additional
latency on cold start, though it's negligible compared to loading libraries,
runtime, etc.

~~~
zbentley
> implement some sort of RPC SQL layer

> Just run a connection pool

You just described the high-level architecture of what I believe is the
majority of web application servers and the applications they run: an RPC
layer over database accesses that run on persistent connections.

------
Agebor
I would rather disagree with these practices. Some of them are subjective and
some will likely be outdated soon.

> Each function should do only one thing. The problem with one/a few functions
> running your entire app, is that when you scale you end up scaling your
> entire application.

Sure, but is that a problem in general? Same thing can be said about a
monolith vs microservices, and there is always a trade-off. A function with a
larger code-base makes the code easier to navigate; may be easier to deploy.
First reason about / compare cold-start times before making the decision to
break the function.

> Functions don’t call other functions

Is the cost of calling the other function negligible (e.g. the functions are
called rarely)? If yes, feel free to do it.

> Use as few libraries in your functions as possible (preferably zero)

Security risk is another discussion, not related to serverless. Worry instead
about the cold-start time. If it's short enough for your use case - do use
libraries. Should be no problem for Go, Node is usually OK. Java - native
compilation is slowly approaching.

> Avoid using connection based services e.g. RDBMS

If adding another library is not a problem in your case, then why not? It's
not too difficult to create a pool with a single connection. As for security,
there are some new options like AWS Aurora IAM authentication. No VPC needed.

~~~
tylerhou
> Functions don’t call other functions

I think this point may also be about reducing complexity, not just cost.
Pushing to a queue instead of directly calling another function decouples the
two functions and might make it easier to reason about how logic flows.
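
A toy version of that decoupling, with an in-process `queue.Queue` standing in for a managed queue; `place_order` and `fulfil_orders` are made-up function names. The first function returns as soon as the message is enqueued and never references the second.

```python
import queue

orders = queue.Queue()  # stand-in for a managed queue service


def place_order(event):
    """First function: validates and enqueues; it knows nothing about the consumer."""
    if event["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    orders.put({"sku": event["sku"], "quantity": event["quantity"]})
    return {"status": "accepted"}


def fulfil_orders():
    """Second function: triggered separately, drains whatever is waiting."""
    shipped = []
    while True:
        try:
            message = orders.get_nowait()
        except queue.Empty:
            return shipped
        shipped.append(f"{message['quantity']} x {message['sku']}")
```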

~~~
Agebor
True, might make it easier in some cases. This is a general architecture
decision, unrelated to serverless.

------
AYBABTME

> This one will get me into the most trouble. A lot of web application
> people will jump on the “but RDBMS are what we know” bandwagon.
>
> It’s not about RDBMS. It’s about the connections. Serverless works best
> with services rather than connections.

Guess what a service call will use to do its RPC...? A connection. This advice
makes no sense. Maybe you don't care about the response to your service?
That's different from saying "don't use connections". (I doubt most serverless
RPC calls go over a UDP-like protocol.)

~~~
oxymoron
It’s not about TCP per se. RDBMS connections are usually highly stateful and
resource consuming on the server end. That’s why they are usually pooled.

------
GuB-42
> If you’re not aiming to scale that far, then you can probably get away
> without following these best practices anyway.

If you don't intend to scale beyond what a server can offer (which is a lot!),
you probably shouldn't be "serverless" in the first place.

~~~
k__
Not having to set up and maintain your own web/app/db-servers is quite a nice
thing to have, especially for small companies that don't have much money.

~~~
scarface74
True and that’s completely orthogonal to Serverless.

Use RDS for DB servers and just use Elastic Beanstalk. It takes most of the
complexity out of deploying load balanced web servers.

Serverless for app servers is sometimes fine if you don’t need real time
responsiveness like working with queues.

~~~
k__
Yes, I probably wouldn't use FaaS stuff like Lambda for an API, but AppSync.

------
glckr
Conceptually I would say I'm a fan of these sorts of ideas (serverless, and
queues in particular). Forcing you to look at the system as a chain of
processes operating on data can really bring architectural problems into line.

However, 99% of the work that I've done involves users hitting buttons and us
responding to them synchronously. In these scenarios, I simply can't figure
out how queues (and chains of serverless functions as advocated by this blog)
are supposed to work (if they are at all). There seem to be many ways to solve
this when the queues are all flowing freely, but as soon as there's any sort
of pressure on the system these things all look to fall down.

Looking at the amazon booking flow as an example -- it appears that they
always show a "your order has been placed" page with a big green banner
synchronously at the end of the cart flow. Some time later the user may then
receive an email saying their payment method was declined. This certainly
works, but a) it's horrible UX and b) it only works at the final stage of the
process.

I see queues (and serverless) advocated as good architectural decisions, but
every time they come up in a lecture/blog they're given in toy or data-sciency
sort of examples. Is it possible to use these patterns in a sensible way where
users are actually involved? (the blog mentions CQRS, but that seems... not a
perfect solution)

~~~
k__
I heard people were switching API-Gateway out with AppSync (the GraphQL
alternative), which allowed them to remove a huge amount of HTTP-bound Lambdas
and simply let AppSync manage that part of the stack.

~~~
e1g
AppSync is more of a wrapper around GraphQL rather than an alternative
(including being based on the most popular GraphQL client, Apollo)

~~~
k__
Oh, I meant it was a GraphQL-based alternative to using API-GW backed by
Lambdas.

------
starkingclojure
Are there any major success stories of big projects going serverless? I like
the idea, but personally it mostly appeals to me as a nice way to reduce
maintenance overhead for side-projects rather than something I'd use for a
serious project.

~~~
lostcolony
iRobot (Roomba) is pretty heavily serverless.
[https://thenewstack.io/irobot-confronts-challenges-running-s...](https://thenewstack.io/irobot-confronts-challenges-running-serverless-scale/)

------
manigandham
These are all outdated already.

Serverless is a meaningless term. The real name is platform-as-a-service,
which is decades old. Running individual functions was just taking that to an
extreme but any complex app will have multiple functions working together so
it's right back to the same thing.

Many "serverless" environments are in fact converting to running an arbitrary
docker container for as long as it needs to run, effectively becoming the
next-generation of PaaS where your container can be whatever size you need,
while still not worrying about infrastructure and individual servers.

~~~
a_silly_name
I'd appreciate it if others could chime in on this one.

I'm mystified by the word 'serverless'. Why are we using this word? There are
clearly, _quite clearly_ servers involved. The wacky bugs involved in these
'server-less' services... are going to come down to what's happening on the
servers involved.

EDIT: my question seems to have offended. Unfortunate, and unintentional!

~~~
grzm
Serverless refers to you administering the application and its resources but
not the server directly. Yes, at some level servers are involved, but
generally the organization deploying applications is not responsible for
administering the hardware, the OS, other resources, often including the
scaling of the server resources: they're only concerned with the
administration of the application. One can argue that at some level the
organization deploying the application is often still concerned with the
underlying server and where that line is drawn, but many find it meaningful to
make such a distinction.

[https://en.wikipedia.org/wiki/Serverless_computing](https://en.wikipedia.org/wiki/Serverless_computing)

~~~
a_silly_name
Does make sense, but your description (and others) seem to match most vendors'
APIs out there today. It's not clear to me what 'serverless' means vs 'hosted /
managed infrastructure', or even 'external API'.

It doesn't seem like a technical term at all.

Another commenter mentioned that it applies only to the price list, and that
makes sense to me.

------
staticassertion
> Functions have cold starts (when a function is started for the first time)
> and warm starts (it’s been started, and is ready to be executed from the
> warm pool). Cold starts are impacted by a number of things, but the size of
> the zip file (or however the code is uploaded) is a part of it. Also, the
> number of libraries that need to be instantiated.

I'm curious what size packages people are loading in lambdas currently. My
largest lambda is a 4.7MB zip.

If anyone has some anecdotal info on runtime + package size I'm super curious
to hear.

> So if you have to use an RDBMS, but put a service that handles connection
> pooling in the middle, maybe an auto scaling container of some description
> simply to handle that would be great.

Not sure I understand this. If I put another service in between me and the
database don't I just end up having to open connections to that service?

I have an Aurora instance my lambda talks to, and it creates a new connection
every time, and that's a bummer, so I'm curious about how I could optimize that
path.

~~~
k__
The service could be stateless.

~~~
staticassertion
Yeah, I wasn't thinking about what goes into a connection to a DB vs a
stateless service. Thanks.

------
krn
As long as people talk about a development practice only in terms of a set of
services from a proprietary cloud platform, you know that it's not a mature
thing.

~~~
ryanmarsh
I don’t follow the logic in this argument. The same patterns work in other
cloud platforms.

~~~
krn
Serverless applications are attached to a single cloud platform they were
built on top of. You can't switch between the platforms without changing the
code. At its current stage, serverless is more like a hack.

~~~
lostcolony
So is every managed service then; in AWS almost everything uses IAM and that's
not portable. I can't use ECS; that's not portable. I can use EC2s, those are
just VMs...except how the heck do I build those? I can't do it by hand, that's
not portable, I can't use Cloudformations, that's not portable...I can't even
use Terraform, as despite their marketing speak I still have to change out my
configs, because they're still cloud dependent.

Really, it's all a sliding scale. Buying into any cloud, even just at the VM
level, means you've accepted your automation tools are going to be platform
specific, and require changing recipes/configs/etc to go elsewhere. If you
want to leverage anything beyond that, such as object storage (S3 in AWS,
which is super common), then your code has to become aware of IAM and S3
endpoints. Your code now has to change if you want to deploy it elsewhere.

I would contend your definition needs to be changed if the logical application
of it basically makes everything more complex than 'someone else's server' to
be "a hack"

~~~
krn
A public cloud only requires three things: compute instances, block storage,
and object storage. Everything else is just an additional service built on top
of these. If you use a cloud platform as an infrastructure provider, not as a
software provider, there is no vendor lock-in. There are enough mature open-
source tools even to build your own cloud on bare metal[1].

[1] [https://maas.io/](https://maas.io/)

~~~
lostcolony
How do you plan on accessing object storage in AWS (S3) from your code without
also using IAM to authenticate/authorize requests to it?

~~~
krn
I prefer to use object storage powered by OpenStack Swift[1], which makes my
application not attached to any proprietary API.

[1] [https://www.ovh.com/world/public-cloud/storage/object-
storag...](https://www.ovh.com/world/public-cloud/storage/object-storage/)

~~~
lostcolony
There ya go then. Like I said; you're eschewing leveraging -any- managed
service cloud providers give you, and instead just using bare VMs and relying
on cobbling together open source stuff, that you have to manage yourself, to
build a solution. That's fine for those who have that kind of time, but some
of us want to build things and push as many things onto the cloud provider as
possible, so we don't have to deal with them.

I like having out of the box authentication with Cognito, out of the box DBs
with RDS and Dynamo, out of the box object storage with S3, Serverless as an
option with the API Gateway and Lambda...and I'll worry about vendor lock-in
when Amazon ups the price of an AWS service.

~~~
krn
You are right, I prefer highly scalable and proven in production open-source
tools, such as Redis and Postgres, to managed services by some cloud provider.
It comes with very little overhead if you pick the right tools, master them,
and then re-use them.

But I understand your point about just connecting a bunch of APIs and being
done with it, if you personally don't own the product, or if you are expecting
to sell it soon.

A nitpick: I don't host my own object storage, I just pick S3 alternatives
that provide standard APIs to access them.
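
One way to keep the application code indifferent to which object store sits underneath is to depend on a small interface and keep the provider-specific client in an adapter. This is a sketch, not a claim about any provider's SDK; `ObjectStore`, `InMemoryStore` and `save_report` are hypothetical names.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """The only storage surface the application is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Test double; a real adapter would wrap one provider's client behind the same two methods."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_report(store: ObjectStore, report_id: str, body: str) -> str:
    """Application code: depends on the interface, not on a provider SDK."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Authentication and endpoint configuration still differ per provider; the adapter just confines those differences to one place.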

------
bazza451
I’m not going to fault him on the RDBMS comment; they scale shockingly badly
because of the connection issue mentioned above, but DynamoDB is really not a
replacement for a traditional RDBMS.

But I’ve seen so many people get themselves into trouble by using dynamodb as
a silver bullet and not thinking heavily about the design of their data. If
you try and use RDBMS patterns (PK and FK relationships) there’s a whole heap
of pain due to atomicity only being on one record in one table. Sometimes you
can’t escape these patterns either, because they are the best fit for the data
or you require strong consistency of your model (e.g. financial).

They released dynamodb transactions (for Java only) but from what I’ve seen
that just increases the complexity of solutions, you end up with more
reads/writes per call and a heap of other stuff surrounding your data.
The serverless story for stuff like this currently has real drawbacks.

------
robertonovelo
IMHO RDBMS and libraries are OK, just make sure you know the tradeoffs and
have a strategy to withstand any potential scale issue or trouble. Do not
reinvent the wheel, build for scale when it's needed, know both your
function's and system scope/scale, etc.

------
aggre
Serverless functions will get thinner as new managed services like AppSync are
launched.

