
Ops in the serverless world: monitoring, logging, config management - theburningmonk
https://medium.com/@theburningmonk/yubls-road-to-serverless-part-3-ops-6c82139bb7ee
======
skywhopper
"serverless technologies have significantly simplified the skills and tools
required to fulfil the ops responsibilities"

This article sure does a good job of disputing its own premise. There's plenty
of Ops going on here. The fact that you have to reinvent a bunch of
infrastructure, procedures, standards, and workflow doesn't mean you aren't
doing Ops.

~~~
tschellenbach
Yes, it feels like the "AWS Lambda for everything" choice really worked
against them.

------
FatalBaboon
He ends up being so tightly tied to Amazon it's scary to me.

"You can scale all you want, servers are not your problem anymore!"

 _proceeds to build a million dollar business_

"That'll be <way too many $$> and you got nowhere to run, now that all those
clients of yours depend on our service"

 _proceeds to pay insane amounts of $$ while rewriting in serverful_

Is there any standard in the serverless world as to what they accept and how
they work?

~~~
bdcravens
> Is there any standard in the serverless world

Numerous companies depend on S3. Along the way, open-source solutions have
been built, as well as competitive solutions that are API-compatible.

We're seeing the same in serverless. Frameworks that are moving toward multi-
cloud, competitive solutions, etc. While not totally portable, you can take
much of your code with you. You'd do the same if you had a standard Rails app
running on an easy-deploy solution like Heroku and had to move to Digital
Ocean. In other words, the potential for vendor lock-in is a problem that
we've been creating patterns to solve for many years.

>> That'll be <way too many $$> and you got nowhere to run

That makes for a great conspiracy theory, but AWS has never done anything
remotely like that. Throughout the history of the service they have continued
to bring down prices on their various offerings. Not sure what makes Lambda
the piece they'd abandon all that for (insert evil laugh) world domination.

~~~
amazingman
>That makes for a great conspiracy theory but AWS has never done anything
remotely like that.

Just last year they changed the support model to be a percentage of your AWS
spend, as opposed to the previous flat rate. They even had the audacity to
dress it up as a straight-up price drop.

https://aws.amazon.com/about-aws/whats-new/2016/07/aws-support-announces-update-to-developer-support-plan/

Imagine you're an organization spending 100k/mo on AWS and your monthly
support line item goes from $50 to $3000. Oh and by the way, the free support
tier cannot create technical support requests.
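The jump described above is easy to check; this is a sketch assuming the
developer tier's "greater of $29 or 3% of monthly usage" formula from the
announcement linked above:

```python
# Sketch: the support line item under the percentage model described above,
# assuming the developer tier's "greater of $29 or 3% of monthly usage"
# formula. Not from the article; just checking the commenter's arithmetic.
def developer_support(monthly_spend):
    return max(29.0, 0.03 * monthly_spend)

print(developer_support(100_000))   # 3000.0 -- up from a ~$50 flat rate
```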

~~~
jessaustin
Did any firm that spends $100k/month not realize that they were underpaying
for support at $50?

~~~
RhodesianHunter
If I'm spending six figures on your product I expect at least a certain amount
of support to be free.

~~~
kk_cz
You expect wrong. Standard support fees are usually in the range of 10-20% of
the price you pay for the product. And it makes sense: the costs of providing
support are not constant, so why should their price be?

------
geggam
"because Conway's law tells us that having an ops team is the surefire way to
end up with a set of operational procedures/processes, tools and
infrastructure whose complexity will in turn justify the existence of said ops
team"

bwhahahaahaha

turn devs loose in production and you will soon realize stability is a good
thing

------
tschellenbach
Yubl is a failed company: Social networking app Yubl shuts down despite £16m
investment: http://www.businessofapps.com/social-networking-app-yubl-shuts-16m-investment/

Nothing about this setup feels lean. Why did they build their own feed
technology instead of using an open source or hosted solution?

Why did they use Lambda for everything, and invest a crazy amount of effort
into making it work well? Their job was to prove their ability to acquire
users; they didn't need to innovate on the tech side by using Lambda for
everything.

It feels like a lack of focus. I admire how Unsplash is so clear about what
the goal is and where their team can add most value:
https://medium.com/unsplash-unfiltered/scaling-unsplash-with-a-small-team-fbdd55571906

~~~
bpicolo
> we decided to write our own using API Gateway, Lambda and DynamoDB because:
> ... even running consul with 2 nodes (you need some redundancy for
> production) it is still order of magnitude more expensive

Engineering time is free, of course.

------
_hyn3
Do NOT use Lambda for production at scale. [pic]

https://twitter.com/JamiesonBecker/status/802185522139582464

~~~
RubenSandwich
I'm having trouble reproducing your math. Can you be more specific on how you
got (30.5 * 86,400) in your equation? Using this calculator I can't reproduce
your cost: https://s3.amazonaws.com/lambda-tools/pricing-calculator.html#.
What is your execution time and memory on each Lambda?

Edit: I'm guessing 86,400 seconds in a day and ~30.5 days in a month. If so,
you're using Lambda wrong. You should not keep a Lambda running forever;
rather, treat each Lambda as a short-lived function that responds to an API
request, and avoid calling async operations during its execution.
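That short-lived pattern looks roughly like this minimal Python sketch; the
handler shape follows the usual API Gateway proxy convention, and the names
are illustrative rather than anything from the article:

```python
import json

# Minimal sketch of the short-lived pattern described above: each invocation
# handles exactly one API request and returns immediately, rather than
# running as a long-lived server process.
def handler(event, context):
    # 'event' is an API Gateway proxy-style event (assumed shape)
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }
```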

~~~
jamiesonbecker
Please see Pricing Example 1 on https://aws.amazon.com/lambda/pricing/.
That's Amazon's own pricing example using typical values.

30.5 is the average number of days in a month. (365/12)

86,400 is the number of seconds in a day.

So, in other words, based on the very example shown on the Lambda pricing
page, 3 million requests in a month is only 1.14 per second.

A single bare-bones t2.nano instance can easily handle that, but let's pretend
that it's very, very spiky... so you need 10 nanos. (Of course, let's also
pretend that you're constantly using them all, and autoscaling doesn't exist.)
A nano can probably handle 100 req/s in even the slowest API framework (and
certainly in the languages supported by Lambda), so that's where the rest of
the numbers came from.

So a cluster that easily handles 1,000 req/s would cost around $67/month, vs
$1,600 and change for Lambda. Their very own example proves that it's far from
cheap.

The worst part is that as scale increases, the cost disparity increases
linearly. Recipe for disaster.

But, again, it's very useful for small things. (I'm an AWS-certified
Solutions Architect, and I don't really touch Lambda except for things that
aren't worth running a whole server for.)
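The arithmetic above can be sanity-checked against Lambda's published rates
at the time ($0.20 per million requests, $0.00001667 per GB-second). The
memory/duration behind the ~$1,600 figure isn't stated in the comment, so the
128 MB / 200 ms workload shape below is an assumption that roughly reproduces
it:

```python
# Sketch: sanity-check the Lambda cost arithmetic above. The per-request and
# per-GB-second rates are Lambda's published prices at the time; the
# 128 MB / 200 ms workload shape is an assumption, not stated in the comment.
SECONDS_PER_MONTH = 30.5 * 86_400              # ~2,635,200 s

# Pricing Example 1: 3M requests/month is only ~1.14 req/s
print(3_000_000 / SECONDS_PER_MONTH)           # ~1.14

# Hypothetical sustained 1,000 req/s for a month
requests = 1_000 * SECONDS_PER_MONTH           # ~2.64 billion requests
request_cost = requests / 1_000_000 * 0.20     # $0.20 per 1M requests
gb_seconds = requests * 0.200 * (128 / 1024)   # 200 ms at 128 MB
compute_cost = gb_seconds * 0.00001667         # per GB-second rate
total = request_cost + compute_cost
print(round(total))                            # ~1625, i.e. "$1,600 and change"
```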

~~~
RubenSandwich
Makes sense. Thanks for explaining.

For scaling, it might make sense to run your Lambdas on EC2 using this:
https://github.com/lambci/docker-lambda. (But at that point you lose the
'NoOps' benefit of Lambda, and I can't imagine you'd get similar performance
to just running a Node server or the like.)

~~~
jamiesonbecker
This looks cool! thanks for pointing it out!

------
dmourati
Self-congratulatory drivel. I read with interest until he name-dropped his
company. (Hint: they have since shut down).

~~~
tschellenbach
The article could have started with that. While we should have been working on
proving out our ability to acquire and retain users, we had some fun with AWS
Lambda instead.

In all fairness, probably not the author's fault, but something that went
wrong higher up in the organization.

------
mwcampbell
> a global.CONTEXT object (which works because nodejs is single-threaded)

If that's true, then it sounds to me like AWS Lambda is being grossly
inefficient, handling only one request at a time in a given Node.js process.
Am I correct about that?

~~~
stretchwithme
I think it's kind of the whole point. The instance of the application exists
to service one request and then goes away. How many instances there are scales
with the number of requests.

And if you have an unpredictable workload, that may be the most cost effective
and responsive solution.
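That one-event-at-a-time model is what makes a pattern like the article's
global.CONTEXT safe; here is a Python sketch of the same idea (the names and
event shape are illustrative, not from the article):

```python
# Sketch of why a global per-request context is safe in Lambda: a given
# container processes only one event at a time, so sequential invocations
# never interleave and the global can't be clobbered mid-request.
CONTEXT = {}

def handler(event, context):
    CONTEXT.clear()                      # drop state left by the last event
    CONTEXT["request_id"] = event["id"]  # stash per-request data globally
    return do_work()

def do_work():
    # any helper can read the "current" request without explicit plumbing
    return f"handled {CONTEXT['request_id']}"
```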

~~~
dullgiulio
Well, if a server could take more than one request and also scale
horizontally, that would be so much more efficient.

I am sure it has been done before. /s

------
pryelluw
What about audits in a serverless world? How do you approach that?

~~~
Spivak
Serious question: what are you auditing? You have access to all of your code
and data, and everything else seems like it would be up to your provider.

Wouldn't you know up front when you're choosing a stack whether they meet your
compliance demands?

~~~
dragonwriter
> Wouldn't you know up front when you're choosing a stack whether they meet
> your compliance demands?

The number of actors in industries with particularly notable compliance
requirements (like HIPAA) that have internally very weak, inaccurate ideas of
the actual requirements, and who discover the practical meaning of key
elements both long after they've been in force and long after they've relied
on an inaccurate understanding, is, well, non-negligible.

------
wernerb
Curious that he doesn't mention the hosted Elasticsearch+Kibana on Amazon. It
even integrates with CloudWatch and Lambda to pull in the data.

~~~
BoorishBears
We just migrated off AWS ES to elastic.co this week, even though we pay for
AWS support (and thus try to centralize on it).

For basic use cases it might be fine, but AWS ES doesn't support X-Pack, lags
behind official releases, and I even recall it being more expensive once you
set up production-level ES domains, since master nodes are paid for
separately. X-Pack is huge as far as solving headaches with ES, even for the
authentication management alone.

~~~
FLGMwt
Whoops. We just decided to try out AES for a new project we're in the early
stages of spinning up. I'd be curious to hear any more details about the two
options should you have the time to spare.

We use Terraform to provision our stack though and it seems that the Elastic
Cloud doesn't have any programmatic configuration options. How does your team
orchestrate your elasticsearch cluster(s)?

EDIT: I should clarify, our use case is not ELK, it's search over a modest
data set.

~~~
BoorishBears
We were using CloudFormation templates via Serverless, since the ES domain
was tightly integrated with the Lambdas we were deploying. Today we'd probably
use SAM with the same CF templates
(https://github.com/awslabs/serverless-application-model/blob/master/README.md)
to keep things simple.

We're pretty write-heavy/low-read with a comparatively small data set (~50GB
expected over 4 months). I know ES can be tuned for that use case to improve
index throughput at the cost of slower searching, but I never tried setting it
up on AES (I'm guessing it'd go in the CF template).

One challenge with elastic.co is that there's no equivalent to Amazon's
ElasticsearchDestinationFirehose. We were putting data into a Firehose which
batched, backed up to S3, and accounted for failures via a Lambda, but
elastic.co gives you "raw" ES instances, so you need to handle those use
cases yourself.

You could probably write a Kinesis stream that replicated that functionality
(and from what I understand, Firehose came to be because that's exactly what
people were doing). But we control the devices uploading to the Lambda, so we
had the devices act as the buffer (we increased send thresholds and propagated
failures to send to ES back to our devices, which retry automatically) and
sent a duplicate record to an S3DestinationFirehose for backup and replaying
data.
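A device-side buffer of the kind described might look roughly like this
(a hypothetical sketch under those assumptions, not their actual code):
accumulate records, flush in batches, and keep failed batches for retry
rather than dropping them:

```python
# Hypothetical sketch of the device-side buffering described above: records
# accumulate locally, are flushed in batches once a threshold is reached,
# and failed batches are re-queued for automatic retry instead of dropped.
class DeviceBuffer:
    def __init__(self, send, batch_size=25):
        self.send = send              # uploads one batch; raises on failure
        self.batch_size = batch_size
        self.pending = []

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        try:
            self.send(batch)
        except Exception:
            # keep the failed batch at the front; retried on the next flush
            self.pending = batch + self.pending
```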

~~~
cryptarch
Might I ask how you bootstrap your CloudFormation setup? Do you set up all IAM
accounts via CloudFormation on the root account, or do you use something else
for that? What do you run your CloudFormation templates on, dev machines or
something on AWS?

I'm currently tasked with getting a SaaS application onto Amazon and picking
it up as I go along, mostly just from the docs. It's been very hard to find
high-level overviews of ways to use AWS's ops tools like CloudFormation; most
articles I've found are about one specific API rather than about architecting
AWS apps.

~~~
BoorishBears
Serverless gets its own user that has the permissions needed to apply our CF
templates, then the templates define the IAM roles. But I guess depending on
the structure of your team and deploy that can be an issue, since having the
serverless credentials = being able to create and assign IAMs. For example, if
those credentials are in your CI, now giving someone access to your CI also
gives them access to an AWS user who's practically a "superuser" (although
tbh, I'm not sure how much of a problem that is since access to CI is already
a pretty big deal).

We run Serverless (which in turn applies the CF templates) on Jenkins instead
of locally (which also helps automate passing the correct stage and
parameters). I guess the equivalent without Serverless would just be using
aws-cli, since Serverless doesn't do anything special to the CF templates.

There was a great webinar on production deployments with SAM (which you could
replace with CF) via CodePipeline, but I can't seem to find the slides for it
anywhere. If we "did it again" we'd probably use CodePipeline/CodeBuild (we
just used Jenkins because we didn't really have a chance to look at other
options).

~~~
cryptarch
Thanks for the advice, I think I'm going the CodePipeline/CodeBuild route :)

------
lykron
"skills and efforts required to fulfill the ops responsibilities do not
justify the need for such specialization." You just pay someone else to do it;
get over yourself.

~~~
sokoloff
About 10 words prior to the quote you cited and objected to, the author says,
"no ops specialization in my organization."

I'd conclude that the author realizes they're paying someone else to do it and
is rationally choosing not to employ those specialists in their org.

~~~
sidlls
Not to speak for the parent, but it seems to me that paying for "ops
specialization" is different, exactly the opposite even, from "ops
specialization isn't needed."

~~~
sokoloff
"Isn't needed [anywhere]" is also different from "isn't needed in my
organization" which is how I read the author's words.

In context, the quote is:

"NoOps to me, means no ops specialization in my organization — ie. no
dedicated ops team — because the skills and efforts required to fulfill the
ops responsibilities do not justify the need for such specialization. As an
organization it’s in your best interest to delay such specialization for as
long as you can"

That's not downplaying the value of ops, IMO.

