
Ask HN: Have you shipped anything serious with a “serverless” architecture? - freedomben
I've been watching the rise and maturing of AWS Lambda and similar offerings with excitement. I've also shipped several microservices in both Node and Java that are entirely serverless, making use of API Gateway, Lambda, DynamoDB, SQS, Kinesis, and others.

For the simple case, I found the experience to be great. Deployment was simple and made use of shell scripts and the excellent AWS CLI.

I've been hesitant to build anything serious with it, though. The primary concern has been visibility into the app. The app's operation can be quite opaque when deployed that way. Further exacerbating the issue, we've a few times lost CloudWatch logs and other reporting due to both configuration issues and improper error handling; these are things that would have been much easier to identify and diagnose on a real server.

Have you shipped anything serious with a serverless architecture? Has scaling and cost been favorable? Did you run into any challenges? Would you do it again?
======
twistedpair
@ mabl (mabl.com) we've been running a serverless backend on Google Cloud
Functions for over a year. It's handled 600M function calls/month without much
trouble. Our findings are:

• Eventually you need to promote a function to a legitimate service for cost savings (e.g. GCF → App Engine Node service)

• You need a buffer such as Pub/Sub (e.g. when GCF outscales the services it calls)

• Multi-repo/project layout is best for deployment speed, but needs extra
dev/CI tooling to simplify boilerplate

• Minimizing costs can be creative/tricky compared to legacy services

• GCF is great for automatic stats and logging (Stackdriver) and "it just
works" configs, compared to Lambda

• Don't go cloud functions everything, just the parts that are a good fit

We've gotten a good uptime using cloud functions, but we're always pushing for
more nines. Since the functions tie together a bunch of backend Pub/Sub queues
and services/stores, a brief cold start or queue backup has no notable impact
on the overall system latency or throughput.

BTW, the coolest feature of AWS Lambda I've found is tying it to SES/SNS for
inbound and outbound email routing. I've been running my personal email for
years through a Lambda function for a few cents a year.

Overall the space is rapidly evolving and we'll see lots more features on
Azure Functions, AWS Lambda, and Google Cloud Functions. See our learnings
[1].

[1] [https://www.slideshare.net/JosephLust/going-microserverless-...](https://www.slideshare.net/JosephLust/going-microserverless-on-google-cloud-mabl)

~~~
jonathan-kosgei
Any links on how to implement your email routing setup?

~~~
VectorLock
I'm curious about that as well. Does he have a Lambda that sends email he
gets to a spool file somewhere? What does that workflow look like? What's the
advantage over just using a regular email client or Gmail?

~~~
twistedpair
The advantage is collecting email from disparate addresses (e.g. admin@your-
domain.com) and forwarding them all to your preferred Gmail accounts. You
can't set up ACM or other cert services without something like that to prove
ownership. Previously I paid ~$100/yr to get all these mail forwarding routes
set up. You can also capture inbound mail directly to a queue (e.g.
unsubscribe@your-domain.com) and feed it to a Lambda to take action.
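
A minimal sketch of that forwarding pattern in Python (the bucket name,
addresses, and receipt-rule setup are hypothetical; it assumes SES is
configured to receive mail for the domain and write the raw message to S3
before invoking the function):

```python
import email
import email.policy

FORWARD_TO = "me@gmail.com"                  # hypothetical destination inbox
VERIFIED_FROM = "forwarder@your-domain.com"  # must be a verified SES identity

def rewrite_for_forwarding(raw_message: bytes) -> bytes:
    """Pure transform: SES will only send from a verified identity, so we
    swap From and keep the original sender reachable via Reply-To."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    original_from = str(msg["From"] or "")
    for header in ("From", "Return-Path", "Reply-To", "To"):
        del msg[header]
    msg["From"] = VERIFIED_FROM
    msg["Reply-To"] = original_from
    msg["To"] = FORWARD_TO
    return msg.as_bytes()

def handler(event, context=None):
    # The SES receipt rule has already written the raw message to S3.
    import boto3  # imported lazily so the pure helper tests without AWS deps
    message_id = event["Records"][0]["ses"]["mail"]["messageId"]
    raw = boto3.client("s3").get_object(
        Bucket="inbound-mail-bucket", Key=message_id)["Body"].read()
    boto3.client("ses").send_raw_email(
        Source=VERIFIED_FROM,
        Destinations=[FORWARD_TO],
        RawMessage={"Data": rewrite_for_forwarding(raw)},
    )
```

The header rewrite is kept as a pure function so it can be tested without
touching S3 or SES.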

------
not_kurt_godel
I have spent the last year and a half building a completely serverless
production service on Lambda, API Gateway, and DynamoDB (along with the
standard auxiliary services like CW, SNS, Route53, S3, CF, X-Ray, etc.). It
was a lot of work establishing new patterns for many of the operational
aspects, particularly custom CW metrics and A/B deployments with Lambda
traffic shifting, but in the end everything is set up nicely and I'm quite
pleased with the end result. We're starting to ramp up traffic now by orders
of magnitude (with many more to come) and it's soooooo awesome knowing the
stack is pretty much bombproof. Another super-nice thing is all internal
authentication and networking being controlled by IAM rather than security
groups/VPC/traditional networking - that aspect alone eliminates a tremendous
number of headaches.

My biggest complaints are probably DynamoDB eventual consistency (unavoidable
when using GSIs), occasional CloudFormation funkiness (though no urgent prod
issues yet, thankfully), CodeDeploy CW alarm rollback jankiness (which doesn't
tell you which alarm triggered a rollback!!), and Lambda cold starts. But none
of these are too terribly concerning, and I have faith they'll get
incrementally better over time, hopefully.

The biggest cautionary tip I have is we run all our Lambdas with the max 3GB
memory both for peace-of-mind and because the underlying EC2 instances have
significantly faster CPU. We were seeing weird timeouts and latency initially
with <1GB memory, so I'd be hesitant to run the service if the extra cost of
using the biggest possible instances is a concern, which for us it is not.

Another cost concern I should also mention is that we mitigate cold starts by
running multiple canaries using scheduled lambdas (in addition to the standard
canary role of generating a baseline of metrics and immediately
detecting/alarming on end-to-end issues). We are effectively maintaining a
constant warm pool which, in theory anyway, greatly decreases the chances
customer traffic will hit cold starts. I'm not intimately involved with the
financial aspects but I suspect achieving the same effect with EC2 would be
significantly cheaper, at least with respect to infrastructure costs. I would
guess, though, that the developer time savings achieved by massively reduced
ops burden and overall system simplicity are probably comparable to the
increased infrastructure cost, and very possibly hugely outweighing it.
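
The warm-pool idea above can be sketched roughly like this (the function
name, pool size, and warming marker are all hypothetical; real canaries would
also run end-to-end checks and publish metrics):

```python
import json

WARM_POOL_SIZE = 5  # hypothetical number of containers to keep warm

def is_warming_ping(event) -> bool:
    """Target functions short-circuit on this marker so warming stays cheap."""
    return isinstance(event, dict) and event.get("warmer") is True

def target_handler(event, context=None):
    if is_warming_ping(event):
        return {"warmed": True}
    # ... real business logic would go here ...
    return {"statusCode": 200}

def canary_handler(event, context=None):
    # Runs on a CloudWatch Events schedule (e.g. rate(5 minutes)).
    import boto3  # lazy import so the helpers above test without AWS deps
    client = boto3.client("lambda")
    for _ in range(WARM_POOL_SIZE):
        # Async invokes tend to land on distinct containers, keeping a pool warm.
        client.invoke(FunctionName="my-service-fn",  # hypothetical name
                      InvocationType="Event",
                      Payload=json.dumps({"warmer": True}))
```

The early return keeps the warming invocations close to free, since billing is
by execution time.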

~~~
atmosx
Your setup, although appealing from an AWS ecosystem perspective, sounds a
bit expensive to me on first reading. Of course everything depends on the
specifics, but Lambda and DynamoDB are expensive at scale. I wonder how it
compares cost-wise to a more traditional solution.

~~~
aequitas
Indeed, I always get the feeling you need some sort of exit strategy to a
traditional model for when your service starts lifting off. I never did the
cost calculations, though.

~~~
lucasgonze
That gets to lock-in.

The stacks are super proprietary. Porting away from AWS to another serverless
cloud vendor (e.g. Azure) would be a major project.

Porting to a server-ful architecture would be a full rewrite.

~~~
not_kurt_godel
It really wouldn't be that bad - at least not any worse than any other
migration of a massively complex project from one platform to another. We
deliberately kept our implementation flexible enough to be able to move off
Lambda if necessary. Our entire stack can be containerized using Docker. All
database interactions are behind interfaces that allow us to swap DB
implementations if needed (even to relational ones). All custom CW metric
publishing is centralized in a single object that can just as easily publish
somewhere else. All our APIs are defined using Swagger which is portable to
lots of tooling. The worst part would be replacing IAM with whatever
networking/permission model the other platform had, but even that could be
approached programmatically to reduce the difficulty.

Edit: We also broke the stack into a number of independent microservices, each
with their own API, DB, and dedicated CI pipeline. This would allow us to
incrementally migrate chunks in parallel without disrupting the entire
service.
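
The database-interface idea might look something like this (names are
illustrative, not the poster's actual code):

```python
from abc import ABC, abstractmethod
from typing import Optional

class UserStore(ABC):
    """All DB access goes through an interface like this, so the backing
    store (DynamoDB today, relational later) can change without touching
    callers."""

    @abstractmethod
    def get(self, user_id: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, user_id: str, record: dict) -> None: ...

class InMemoryUserStore(UserStore):
    # Stand-in implementation; a DynamoDB-backed class would satisfy the
    # same contract, which is what makes the migration incremental.
    def __init__(self):
        self._rows = {}

    def get(self, user_id):
        return self._rows.get(user_id)

    def put(self, user_id, record):
        self._rows[user_id] = record
```

Swapping databases then means writing one new class per store rather than
touching every call site.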

------
cimmanom
We've shipped something simple and non-mission-critical in production (URL
rewriting for ad placements).

It has been pretty much set-and-forget. Last anyone had to even look at it was
almost 3 years ago, and afaik it's still working (our ad sales team would be
complaining loudly if it weren't).

For something peripheral like that, it's nice not to have to run servers for
it or devote any energy to keeping it running.

In terms of both server costs and upkeep costs, the economics have been highly
favorable.

I'm not sure I'd use it yet for something mission-critical or that shipped
changes frequently. My recollection is that when we did have to adjust it,
debugging was a bear. Though tooling for that may have improved in the last
30 months.

~~~
tbrock
Regarding the tooling: I hated debugging Lambdas via CloudWatch logs when
using them for Hustle; it drove me to drink.

I eventually got upset enough that I made a tool to stream the cloudwatch logs
to the terminal and colorize, indent + nicely format json output:

[https://github.com/TylerBrock/saw](https://github.com/TylerBrock/saw)

~~~
jonathan-kosgei
There's also
[https://github.com/rpgreen/apilogs](https://github.com/rpgreen/apilogs) but I
haven't yet gotten it to work.

Cloudwatch feels like it was not made for humans. Searching and filtering for
specific log events is a huge pain.

I think one of the best things one could do is to pipe your Cloudwatch logs to
an ElasticSearch cluster.

~~~
tbrock
Anything where you have to install python is kind of a drag. Same thing with
awslogs, etc...

Saw installs as a single binary, is performant, and has better looking output
than all of them.

------
mabbo
Just the opposite of serious for me.

Example: there's a website for all the weather radar stations in Canada that
lets you see the last two hours of 10-minute radar snapshots. I wanted a dump
of them to try some ML algorithms, but even after requests and emails there
simply wasn't one.

So I set up a lambda to run every hour, load the website for all 31 weather
stations and save all 6 images from the last hour to S3.

It took me an hour to set up. I've never gotten around to making that ML
project, but the Lambda has kept on chugging away for me, GBs of data saved
away. The only real cost is the S3 storage, still under $2/month.

Serverless is brilliant for little things like that.
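
A rough sketch of that kind of hourly scraper (the URL pattern, station
codes, and bucket name are placeholders, not the real radar site's):

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://example.org/radar"  # placeholder; the real site differs
STATIONS = ["XFT", "WKR", "XSS"]        # placeholder codes (there are 31)

def snapshot_urls(station, now):
    """Pure helper: the six 10-minute snapshot URLs from the past hour."""
    floored = now.replace(minute=now.minute - now.minute % 10,
                          second=0, microsecond=0)
    return [f"{BASE_URL}/{station}_{floored - timedelta(minutes=10 * i):%Y%m%d%H%M}.gif"
            for i in range(6)]

def handler(event, context=None):
    # Scheduled hourly via a CloudWatch Events rule.
    import urllib.request
    import boto3  # lazy import so snapshot_urls stays testable offline
    s3 = boto3.client("s3")
    for station in STATIONS:
        for url in snapshot_urls(station, datetime.now(timezone.utc)):
            image = urllib.request.urlopen(url).read()
            s3.put_object(Bucket="radar-archive",  # hypothetical bucket
                          Key=url.rsplit("/", 1)[-1], Body=image)
```

With an hourly schedule this stays comfortably inside the Lambda free tier;
as the comment says, S3 storage ends up being the only real cost.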

~~~
technics256
As someone new to serverless, is your code available to study as a pseudo-
tutorial?

~~~
vazamb
If you are into Python I highly recommend Zappa. It turns your Flask app
endpoints into Lambda functions + API Gateway. The big benefit here is that
it is trivial to test locally: before it gets transformed you can just do
'flask run' and use Postman to test the endpoints.
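
A minimal sketch of that workflow (the endpoint is illustrative; Zappa's
actual config lives in the zappa_settings.json that `zappa init` generates):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping")
def ping():
    # A plain Flask endpoint; Zappa wires it to API Gateway unchanged.
    return jsonify(status="ok")

# Local testing: `flask run`, then hit http://localhost:5000/ping in Postman.
# Deploying (after `pip install zappa` and `zappa init`):
#   zappa deploy production
# Zappa packages the app, creates the Lambda function and an API Gateway
# stage, and prints the public URL.
```

The point of the pattern is that nothing Lambda-specific leaks into the app
code, so local and deployed behavior stay identical.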

------
jacquesm
I can't name any names for obvious reasons or give you more hints about what
industry this company is in, but I just did DD on a _very_ impressive outfit
that ran their entire company on Google's cloud platform. It held about 500T
of data and held up amazingly well under load.

I was super impressed with how they had set this all up and they were
extremely well aware of all the limitations and do's and dont's of that
particular cloud implementation.

Obviously there is the lock-in problem, _if_ you ever decide to move you have
a bit of work ahead, so build some abstraction layers in right from day 1 to
avoid hitting all your code if that time should ever roll around.

And cultivate contacts with your cloud vendor.

~~~
snowwrestler
DD?

~~~
BrentOzar
Due diligence.

~~~
jacquesm
Yep. Sorry.

------
tjholowaychuk
My uptime monitoring project uses AWS Lambda heavily, almost exclusively
[https://apex.sh/ping/](https://apex.sh/ping/) — it has been great. I've
processed 3,687,727,585 "checks" (requests really) with it, and I only had
roughly 1 hour of downtime two years ago in a single region. Since then it has
been stable.

I have 14 or so regions, so doing the same thing with EC2 would have
considerable overhead, though I can still imagine many cases where Lambda
would not be cost effective. Its integration with Kinesis is fantastic as
well; stream processing almost cannot be easier. And while people say Kafka
is more cost-effective, with a bit of batching you can get a long way with
Kinesis.

~~~
juoemeka
How much does it cost you monthly?

~~~
Lord_Zero
Since he's charging the customer I doubt he will give that information up.

[https://apex.sh/ping/#pricing](https://apex.sh/ping/#pricing)

But the business idea is simple so I am sure it would be easy to calculate a
rough idea.

------
hkchad
99% of our SaaS analytics frontend is backed by AWS Lambda. I love not
worrying about underlying infrastructure. We have close to 150 Lambdas
running our API. We do not use API Gateway; instead we use Apigee. For
logging we built a logging module that logs to Kinesis, then to S3 and
Elasticsearch. We hardly ever look at CloudWatch; those logs get expensive
after a while, so we only keep 3 days. We use Node, Python, and Java
depending on needs. It's a good idea to benchmark your Lambdas and determine
the resource size; a little bump can have a dramatic difference in execution
time, but after some point you are just wasting $$$.

~~~
jcims
Your last point is very important. At the very least, start by comparing your
function running with just what it needs versus maxed out. Don't forget
warmup time.

------
simplesleeper
Software developer at Transport for London here.

A substantial part of a few of our systems use Azure Functions.

We've actually found them excellent to work with, fairly cheap and scaling
very efficiently.

The main issues we've had are internal disagreements about how to pass app
settings in efficiently. Originally we put all the app settings in the ARM
templates used to deploy them. Then we put them as variables in VSTS. And
then finally we decided to put them into variable groups, which are within
task groups, which are used by releases. It's a bit of a weird chain of
dependencies, but now all of our parameters are located in one place.

~~~
tinco
Was latency a concern for you at all? If serverless did not exist, would the
functions have been (micro) services with similar latency overhead?

~~~
simplesleeper
I've found functions apps have a slow startup time, but once they are going,
they perform pretty fast.

I think we currently don't have any functions in time-sensitive streams
(they're rather new, so they are used for new features such as Oyster
automated refunds, which can be applied a few days later if need be), so when
I say we haven't had any performance issues with them, it has to be taken with
a pinch of salt.

I think if we were using microservices, they would have substantially more
logic in them than the individual functions have, so they would have a lower
network overhead. They wouldn't scale automatically, so we'd have to have
quite a few machines on all the time, which would cost a bit more.

The main benefit we enjoy from functions is the ability to change code with
zero downtime, low risk and minimum disruptions to the whole service.

We've found the interfaces in Azure also to be great - we can write end-to-end
tests that can poll service bus to check when all messages are delivered to
assert against any final case.

~~~
tomspeak
This is fascinating stuff — every time I use my oyster card I have wondered
what it is doing behind the scenes.

Do you know of any write ups about TFL's infrastructure/processes? Would love
to read about it.

I'd also like to buy whoever led the TFL API's move from XML to JSON a
coffee|beer!

~~~
simplesleeper
I don't know of any comprehensive writeups; even the internal wiki is lacking.

------
jonathan-kosgei
At ipdata.co we use the same APIG+Lambda setup replicated in 11 regions to
have the lowest latencies globally. We had to do some extra work to get API
keys and rate limiting working but it was worth it. Our setup averages ~44ms
response times - [https://status.ipdata.co/](https://status.ipdata.co/).

We wrote about our setup in detail on the Highscalability blog [1]

A few things have changed since we wrote that article;

- We implement custom authorizers, which have helped lower our costs; the
auth caching means authentication only happens once per x minutes, and all
subsequent requests are much faster.

- We use Redis and a couple of Kinesis consumers running on a real server to
sync data across all our regions. This setup has been battle tested and has
successfully processed more than a hundred million API calls in a single day
in near real time. [Use pipes and mget in Redis for speed]
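
The bracketed tip might look roughly like this with redis-py (the URLs and
the sync step are hypothetical; the real consumers read from Kinesis):

```python
def chunked(keys, size=1000):
    """Pure helper: batch keys so a single MGET doesn't grow unbounded."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def replicate(keys, source_url, target_url):
    # Hypothetical sync step: bulk-read from one region's Redis, then
    # pipeline-write into another, minimizing round trips in both directions.
    import redis  # lazy import so chunked() tests without redis-py installed
    src = redis.Redis.from_url(source_url)
    dst = redis.Redis.from_url(target_url)
    for batch in chunked(keys):
        values = src.mget(batch)              # one round trip for all reads
        pipe = dst.pipeline(transaction=False)
        for key, value in zip(batch, values):
            if value is not None:
                pipe.set(key, value)
        pipe.execute()                        # one round trip for all writes
```

MGET and a non-transactional pipeline are what make this fast: latency is
paid per batch rather than per key.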

Here are some answers to a few specific things you raise in your question;

1. Use Sentry for Lambda for error handling. The logs you get are incredibly
detailed and have single-handedly given us the greatest visibility into our
application, more so than any other tool we've tried (like AWS X-Ray).

2. CloudWatch logs are tough. You might want to consider piping your logs to
an Elasticsearch cluster, though that might be a bit costly if you use AWS's
hosted Elasticsearch.

3. We use Terraform for deploying our Lambda functions and other resources.
I'd strongly recommend it.

[1] [https://highscalability.com/blog/2018/4/2/how-ipdata-serves-...](https://highscalability.com/blog/2018/4/2/how-ipdata-serves-25m-api-calls-from-10-infinitely-scalable.html)

------
ronpeled
We did. We are building our entire company, SQQUID, on a 100% serverless
architecture. Scalability is awesome; in fact we had to do extra work to
serialize some operations in order not to bring down other major
corporations' server stacks. Cost is a fraction of the traditional app
scaling setup.

The best part is that no devops is needed. We use the Serverless Framework.
The biggest downside is cold starts for frontend response time, but this
hasn't been a terrible issue as of yet. We have considered moving these 20
API endpoints to a Node.js server, which would resolve the issue, but haven't
had the time to do it yet.

We'll never go back. Serverless is the future.

~~~
ec109685
What would the nodeJS server be?

------
jgelsey
Key parts of [https://auth0.com](https://auth0.com) is built on top of their
public serverless offering Extend serving 100M+ authentications/day.

------
achalvs
We at ReadMe recently launched Build
([https://readme.build](https://readme.build)), a tool for deploying and
sharing APIs! It uses serverless under the hood, which makes it fast and easy
to spin up your tasks in the cloud. Services can be consumed with code (Node,
Ruby, Python), or via integrations with Slack, Google Sheets, etc. All you
need is the _one_ API key we provide to you.

We use it internally when fetching usage metrics, receiving notifications for
new sign-ups, and to monitor page changes on our enterprise app
([https://readme.io](https://readme.io)), and use these endpoints frequently
from Slack channels.

We manage versioning, rate limiting, logging, and documentation, and we
offer private services and user management for teams.

Creating an API is as easy as:

    $ npm install api -g
    $ api init
    $ api deploy

AWS Lambda performs wonders. It's enabled us to make it as simple as humanly
possible to create, deploy, and share functions. It requires little prior
knowledge to start tinkering with, has a growing community to provide support,
and handles smoothly in a production setting.

Serverless rules. I'd love to hear any feedback on our implementation of it!
Give Build a try at [http://readme.build](http://readme.build), and feel free
to email your thoughts to achal[at]readme.io.

------
maktouch
Instead of hitting our ingresses / load balancer, we made it so that
webhooks hit a Cloud Function, which transforms them into Cloud Pub/Sub
messages.

We listen to the Pub/Sub topic from a worker.

1) We don't manage it. We receive quite a lot of webhooks and it's nice to
offload that.

2) All of our webhooks are async. We just have one worker that handles it
all, instead of provisioning a bunch of pods.

3) Managing Cloud Functions is dope, since you can make them autodeploy from
git.

10/10 would use again. Not sure about building a whole app around it tho
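
That webhook-to-Pub/Sub hop can be sketched as (the project, topic, and
payload shape are hypothetical):

```python
import json

def envelope(payload: dict) -> bytes:
    """Pure helper: what gets published; Pub/Sub message data must be bytes."""
    return json.dumps(payload).encode("utf-8")

def receive_webhook(request):
    # HTTP-triggered Cloud Function: acknowledge fast, process later.
    from google.cloud import pubsub_v1  # lazy import so envelope() tests without GCP deps
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path("my-project", "webhooks")  # hypothetical names
    publisher.publish(topic, envelope(request.get_json(force=True)))
    return ("", 204)  # the worker subscribed to the topic does the real work
```

The function only validates and enqueues, so webhook bursts land on Pub/Sub
(which buffers) instead of on the worker.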

------
pedalpete
We have our backend processing for both
[https://doarama.com](https://doarama.com) and
[https://ayvri.com](https://ayvri.com) done in lambda.

All of [https://ayvri.com](https://ayvri.com) is built using serverless.

(we'll be migrating from doarama to ayvri in the coming weeks)

For our processing, we handled 200k uploads in one hour on ayvri when we
were building scenes for the Wings for Life event (the world's largest
organized run).

When relying solely on serverless triggering events from S3, the cost was
high due to the volume of scaling, time spent spinning up new services, etc.
We built a queuing system which manages load and then spins up new instances
based on the load in the queue. This resulted in a much faster response and a
SIGNIFICANT reduction in cost.

For the ayvri website, some pages are slow due to the Lambdas not being
warm, and I'm surprised users haven't complained. The important stuff is kept
warm, and we're working on scaling that out for more responsiveness across
the site.

As far as visibility into the app, I'm not going to pretend this is a solved
problem. At the moment, we have most of the visibility we need via cloudwatch,
and we have built some of our own analytics.

We had one instance of a DB connectivity issue between services which we
were not able to resolve. We put it down to a brief networking issue. It
lasted for 5 minutes one Sunday morning and then went away. So we had enough
visibility to know the service was unavailable, but failed in the deeper
understanding of where the problem was.

If you have further questions I can help with, feel free to reach out.

I will say that I bought into serverless and went whole hog. I probably
don't recommend that. We jump through some hoops we probably wouldn't need to
if we had run our website on an EC2 instance with CloudFormation managing the
scaling.

However, we have a few of our services which can come under high load quickly,
and we don't need to scale up the entire site to serve those, such as our
track processing. We believe Serverless was the correct decision for those
processes.

~~~
kbyatnal
Why was the cost high when relying on triggering from S3? Isn't lambda charged
per invocation?

~~~
pedalpete
There are a few reasons (it gets complex), but it isn't just invocation; it
is billed by a combination of invocations and execution time:
[https://aws.amazon.com/lambda/pricing/](https://aws.amazon.com/lambda/pricing/)

That time includes the amount of time to spin-up the lambda.

Ours is a long-running Lambda, and we benefit from some local cache while
they are running as well.

In order to handle the load, we also had to raise the number of concurrent
Lambdas available on our account, so we are talking about 1000s of seconds
being eaten up every second.
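
The billing model can be sketched numerically (using the Lambda list prices
at the time of writing, $0.20 per million requests and $0.00001667 per
GB-second, and ignoring the free tier):

```python
PER_REQUEST = 0.20 / 1_000_000  # $ per invocation
PER_GB_SECOND = 0.00001667      # $ per GB-second of execution

def monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    # Duration is billed in GB-seconds: time multiplied by configured memory.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * PER_REQUEST + gb_seconds * PER_GB_SECOND

# 10M invocations at 200 ms with 1 GB: duration dominates the request charge
# ($33.34 vs $2.00).
print(round(monthly_cost(10_000_000, 200, 1024), 2))  # → 35.34
```

This is why long-running functions and spin-up time matter: every extra
millisecond is billed at the full configured memory size.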

------
_bxg1
The "serverless" architecture is cool, but it irks me every time I hear the
given name for it. For a marketing term mainly aimed at developers, I'm amazed
they picked one so terrible.

~~~
CodeM0nkey
I've heard the analogy used that "servers are to serverless what wires are
to wireless", the main point being that they're still there, but they're no
longer something you manage or directly interact with.

I'm also starting to see the term LaaS (Logic as a Service) used as an
alternative to "Serverless" here and there.

~~~
_bxg1
I think the analogy falls apart because in the part of the system that's
referred to as "wireless", there is literally no wire. The serverless
comparison would be more like if they used a lot of zip ties and some
well-placed rugs so you never _saw_ the wires going directly to your computer
and called that "wireless".

LaaS sounds much more accurate/palatable.

~~~
anothergoogler
Not just that, but "serverless" typically involves integrating a muddle of
AWS/GCP/Azure services, locking you in. Portable software is the analog to
"wireless" here, as it gives you the freedom to... move.

------
BillinghamJ
Yes. We use both Lambda and Lambda@Edge, but for different reasons.

Virtually all async processing we do is achieved by the main service (running
as a normal microservice - no serverless stuff) pushing into an SQS queue,
then a Lambda function running every minute pulls from the queue to e.g.
report policies to our underwriters, issue policy documents, capture payments,
etc.

Essentially anything which happens after the hot path to get a response to the
user - all this stuff can take a little while, isn’t really time sensitive,
often needs to be tried multiple times, etc.
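
A rough sketch of that minute-by-minute queue drain (the queue URL and job
shapes are hypothetical):

```python
import json

def dispatch(message_body: str) -> str:
    """Pure helper standing in for real routing (issue policy documents,
    capture payments, ...); here it just returns the job type."""
    job = json.loads(message_body)
    return job["type"]

def handler(event, context=None):
    # Scheduled every minute; drains the queue in batches until it's empty.
    import boto3  # lazy import so dispatch() tests without AWS deps
    sqs = boto3.client("sqs")
    queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/async-jobs"  # hypothetical
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            dispatch(msg["Body"])
            # Only delete on success; failed jobs reappear after the
            # visibility timeout and get retried.
            sqs.delete_message(QueueUrl=queue_url,
                               ReceiptHandle=msg["ReceiptHandle"])
```

Deleting only after a successful dispatch is what gives the "often needs to
be tried multiple times" behavior for free.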

The Cloudfront distribution we have for our API passes all requests and
responses through a Lambda@Edge function before/after the request hits our
real system.

If you’re not aware, Lambda@Edge runs in the Cloudfront PoPs, so super low
latency and can reject/respond to requests without going back to our real
server.

We use it basically as middleware and one way to protect our real backend from
potential bad actors:

- applying CORS headers

- doing basic (offline) auth checking, as we use JWTs (this happens in the
services too, where it's checked online)

- removing unwanted input headers

- generating and setting a request ID header

- setting X-Forwarded-For appropriately

- enabling the use of persistent HTTP/2 connections on a single hostname,
even though our underlying services are all on separate hostnames (basically
some URL rewriting)

- enforcing minimum mobile app versions (as a regulated company, we
eventually have to break very old versions of our mobile app, as they contain
copy which is no longer correct/true)

- even calculating and returning insurance pricing with ultra-low latency,
without the cost of going all the way to our real servers in eu-west-1; we'll
do a blog post about this at some point

Overall I must say, I’m incredibly pleased with Lambda and Lambda@Edge.

Only criticisms are that the GUI is an incredible pain in the ass to use (we
don’t use any off-the-shelf serverless framework), and they’re very slow at
supporting new Node versions.

Lambda does now support Node 8 (with async/await support etc.), though it took
months. But Lambda@Edge still only supports Node 6 which has been quite
difficult to continue supporting across our monorepo.

~~~
jonathan-kosgei
As of May 14, Lambda@Edge supports Node 8:

[https://aws.amazon.com/about-aws/whats-new/2018/05/lambda-at...](https://aws.amazon.com/about-aws/whats-new/2018/05/lambda-at-edge-adds-support-for-node-js-v8-10/)

~~~
BillinghamJ
Aha! That’s really excellent and will make it so much easier for us to support
Lambda@Edge.

Thanks for letting me know :)

------
djhworld
Never done any API/HTTP stuff with lambda, but I've built many ETL pipelines
using it.

A few years ago we tried the Kinesis -> Lambda thing, but it failed during a
large traffic event. This was due to the way the Kinesis poller works, and was
made worse by the fact that we had two lambdas running off the same Kinesis
stream.

The main issue was you can't control the Kinesis poller - meaning we ran into
resource contention during this high traffic event and the iterator age fell
behind quite drastically. So we abandoned it in favour of EMR + Flink + Apache
Beam.

Other than that, though, the S3 -> Lambda stuff works perfectly, and has been
running for 3+ years with no issues.

~~~
jonathan-kosgei
We also found that using lambda to process kinesis logs was incredibly slow
and definitely not the best use case for lambda.

We had much better results running a server with a couple of kinesis consumers
and redis.

This setup was able to process hundreds of millions of records in near real
time.

------
thepratt
I have some Python-based Lambdas for simple service/user-story monitoring.
The couple of problems I have with Lambdas, which mean I will _never_ use
them for a proper application:

- The language choices available do not meet my needs

- It's impossible to create a prod-like setup; all our services run on k8s,
so minikube works great locally

- You lose any sort of control over architectural decisions (for better or
worse)

- Poor code structure/quality/re-usability leads to a poor developer
experience. I enjoy most new technologies I pick up; all I got from Lambda
was frustration.

------
cagenut
As an ops person I find this a super interesting question, so it's really
kind of surreal to read dozens of replies wherein _not a single one_ mentions
throughput, tail latency, or error rate measurements.

~~~
jacquesm
For near-realtime systems that scale, it is right up there with the fastest
application servers. In fact, if you take the auto-scaling properties into
account it probably beats those servers, because it can scale seamlessly up
to incredible numbers of requests/sec without missing a beat. If you want low
latency you can replicate your offering in as many zones as you feel like.

People start worrying about throughput, latency and error rates when they
become high enough (or low enough) to measure.

My personal biggest worry is that if your Google account should die for
whatever reason your company and all its data goes with it. That's the one
thing that I really do not like about all this cloud business, it feels very
fragile from that point of view.

~~~
jacques_chester
> _In fact, if you take the auto-scaling properties into account it probably
> beats those servers because it can do it seamlessly up to incredible number
> of requests / sec without missing a beat._

Autoscaling is one of those things that's easy to name but hard to actually
achieve. I've had some involvement with an autoscaler for a few months and
it's been educational, to say the least.

In particular people tend to forget that autoscaling is about solving an
economic problem: trading off the cost of latency against the cost of
idleness. I call this "hugging the curve".

No given autoscaler can psychically guess your cost elasticity. Lambda and
others square this circle by basically subsidising the runtime cost:
minute-scale TTLs over millisecond-scale billing. I'm not sure how long that
will last. Probably they will keep the TTLs fixed but rely on Moore's Law to
reduce their variable costs over time.

Disclosure: I work for Pivotal on a FaaS product.

------
gsibble
Has anyone else found that the CloudWatch logging cost of millions of calls to
Lambda is actually as expensive or more expensive than Lambda itself? I have
to get rid of the IAM permission for it.

~~~
jonathan-kosgei
We actually found this out the hard way. We got DDoS'd and cloudwatch was our
second most expensive item after API Gateway.

Lambda and Dynamodb were almost insignificant.

------
awinder
Cloudwatch logs have been a big game changer versus on-disk logs for us.
Getting into larger clusters (and larger log files), figuring out what’s
happening in log files became somewhat arduous.
[https://github.com/jorgebastida/awslogs](https://github.com/jorgebastida/awslogs)
for viewing / tailing / searching is a lot easier. It’s also fairly
straightforward to get logs streamed through to ELK hosted within AWS, if
you’re interested in that angle.

~~~
tjholowaychuk
Do you guys find CloudWatch logs too slow? Even on very small apps I find
searching the past few days takes minutes (5-10m), but I'm using "structured"
JSON logs as well.

~~~
CSDude
All of CloudWatch is very slow (Logs & Metrics & Alarms); we hate it.

~~~
kolanos
Have you tried IOpipe? [0]

[0]: [https://iopipe.com](https://iopipe.com)

Disclosure: Work on the Python and Go agents.

------
kolanos
For anyone interested, here's a handy calculator comparing Lambda to EC2:
[https://servers.lol](https://servers.lol)

------
jonathanfoster
I have a production service running on AWS Lambda and I haven't run into any
major challenges. The Lambda service is responsible for authenticating to a
downstream third party service and proxying requests along with an access
token. I would consider this a simple use case.

CloudWatch has provided me all the visibility necessary to troubleshoot
issues. I think the important thing here is to have a good logging strategy
(logs are only as good as what you put in them). In my case, I made sure info
messages were logged for the start and end of use cases (e.g., "resetting
password", "password reset successfully"), warn messages for non-fatal errors
(e.g. "username not found"), and error messages for fatal errors (e.g. "unable
to connect to database").
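The strategy described above (info at use-case boundaries, warn for non-fatal
errors, error for fatal ones) could be sketched roughly like this in Python.
Everything here is hypothetical, not the commenter's actual code; the
one-JSON-object-per-line shape is just one convention that makes CloudWatch
Logs filtering easier:

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_event(level, event, **fields):
    """Emit one JSON line per event so CloudWatch Logs filters
    (and downstream tools) can match on structured fields."""
    record = {"level": level, "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    logger.log(getattr(logging, level.upper()), line)
    return line  # returned only to make the sketch easy to test

# The strategy above, expressed as calls:
log_event("info", "resetting password", user_id="u123")
log_event("warning", "username not found", username="ghost")
log_event("error", "unable to connect to database", attempt=3)
```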

The only frustrating limitation I've run into is when the Lambda function
times out before receiving a response from the downstream service. At one
point, the downstream service was having major performance issues and
response times were crazy high. This meant I couldn't get a response code and
had to run the downstream calls locally to troubleshoot.

Performance is not great (most requests are in the 400-500 ms range), but it's
more than adequate for my use case. A large portion of the response time is
likely due to the downstream service, but there are cold starts that spike
response time way out of normal range.

Overall, I'm really happy with AWS Lambda and it's definitely top of my list
when taking on a new project. I'm really interested in experimenting with AWS
Mobile Hub in the future. It doesn't get much better than one-stop serverless
shopping.

------
georgewfraser
My company uses them as a “shim” to ingest custom data sources into a data
warehouse. It produces a really elegant separation of concerns, where the
fiddly custom part is isolated in the lambda. I wrote up a blog post about how
this works: [https://fivetran.com/blog/serverless-etl-with-cloud-
function...](https://fivetran.com/blog/serverless-etl-with-cloud-functions)

------
x0rg
Disclaimer: this is all AWS related as this is the cloud I'm using. I haven't
tried Google Cloud Functions or the Azure equivalent.

I've been working with Lambda a lot more lately and it is not so bad... but
also not great. I'm saying this because I found it hard to have a git-first
(or GitOps) workflow that is good in AWS: it looks like everything is made to
be changed manually. CloudFormation is slow with some resources (if you need
CloudFront it will take tens of minutes) and CodePipeline has a pretty
terrible user experience. CodePipeline is cheap and it works for sure, but
it's not a good system for pipelines: restarting, terminating steps, and
getting the output of steps just don't work in a decent way (I want to see the
output in the steps, not jump to CloudWatch). Pretty much every other system
outside of AWS is better than that, but the integration with Lambda and
API Gateway is not as good, unfortunately. If you know of a better system for
CI/CD with AWS Lambda outside of CodePipeline, I'd be interested to try it.

In a similar way, most of the serverless frameworks I've tried are written
for a workflow that is executed from the CLI, which is great to start with and
attractive for developers, but not good enough for a company that aims at full
reproducibility of setups and "hands off" operations. Source code changes
should trigger changes in the Lambda/API Gateway setup every time, and it
would be great if devs didn't have to trigger changes manually.

Apart from those issues, I think Lambda is definitely promising and I see the
company I'm working for right now using it more and more. The developer
experience is still lacking IMO but I'm confident we'll get there at some
point.

------
gsibble
I've written and shipped numerous sites using Zappa for Python which makes
deploying on Lambda/API gateway very simple.
[https://www.storjdash.com](https://www.storjdash.com) is entirely Lambda
based (sorry for no real home page....you can read about StorJ at
[https://storj.io/](https://storj.io/))

------
intrasight
I did a project using the first version of Azure Functions. It was early days
with growing pains. I'll probably use Lambda on my next project. My general
take is that serverless is a black box just like any computer - you need to
figure out the rules of that black box and then accept those rules or tell the
people who can open the box to fix something if it's broken.

------
gingerlime
Serious, but small :)

Open source A/B testing backend using AWS Lambda and Redis HyperLogLog[0]

Have been using it in production for the last couple of years with great success.
Replaced our (potential) ~$2000/mo Optimizely bill with somewhere in the
region of $10/mo.

[0] [https://github.com/Alephbet/gimel](https://github.com/Alephbet/gimel)

------
StreamBright
Yes, I have shipped 3 applications already using serverless
([https://serverless.com/](https://serverless.com/)) and AWS Lambda + API GW.
All applications have been in production for a few months now.

The applications:

- creating snapshots of EBS volumes

- disabling users in IAM

- handling HTTP form submits for email list subscription

------
dlhavema
Our biggest complaint has been the cold start time. I can't say we have enough
production data to see how much it's affecting clients, though. We don't have
enough constant traffic to keep them primed...

We are all in with AWS, so CodeBuild, CloudFormation, and CodePipeline all
work really well with Lambdas.

------
jayair
We run a service that manages deployments for Serverless Framework
applications - [https://seed.run](https://seed.run) and it is completely
serverless. It’s been great not worrying about the infrastructure. Would
definitely do it again.

------
thisisshi
Yes, I have: Cloud Custodian ([https://github.com/capitalone/cloud-
custodian](https://github.com/capitalone/cloud-custodian)) relies heavily on
its event-driven serverless policies to enable compliance across a large AWS
fleet, filling in the gaps that are otherwise missing in IAM. Currently there
are on the order of thousands of Lambdas deployed across hundreds of accounts,
and it definitely does exactly what it needs to do with little
maintenance. Monitoring is done with a combination of cloudwatch, datadog and
pagerduty so getting alerted on failing or errored invocations is completely
built into our workflow.

------
eksemplar
We’re currently using Xamarin and azure as a backend, though not serverless
yet as stuff like cosmos pricing is still too expensive. But using Xamarin,
which is terrible on its own, we’re also splitting our clients up in two
languages. I’d like if we could adopt flutter and angularDart for clients, and
I’d really like to run some of the backend in something like Firebase / cloud
firestore.

I’m not sure if google is a good option in terms of privacy and EU legislation
though. I have my lawyers looking into it currently, but to be honest I’d love
if Microsoft made an Azure alternative and fully embraced dart for azure.

This is probably unlikely, but it would be really awesome.

------
zenovision
Developing a SaaS product based on completely proprietary stack that you can't
even host yourself is VERY dangerous!!! Just yesterday there was a report that
twitter bought a company and immediately closed their API access to all
customers.

What will you do, if Amazon decide to close your AWS account for some reason?
What if they discontinue one of the services you use?

Here is a huge list of products discontinued by Google as example:
[https://en.wikipedia.org/wiki/List_of_Google_products#Discon...](https://en.wikipedia.org/wiki/List_of_Google_products#Discontinued_products_and_services)

~~~
gizmodo59
Most of the stuff in GCP won't be closed like the list as they are used by
enterprise customers (Or at least, there will be proper notice, migration path
etc). I agree that at the end of the day you have little control over the
infrastructure, its impossible for a lot of companies to maintain them which
is why they are going to cloud. If you are very worried, you can just use the
VMs (EC2 or GCP's VMs etc) and not use other services.

~~~
thepratt
I think the other more important path mentioned would be: "what happens in
case of a ban/restriction?". Not with AWS, but we've previously had accounts
locked/closed "accidentally" - thankfully they were non-mission critical.

------
vira28
Not sure about the qualification for the term "serious", but we are making
$1000 revenue each month serving hundreds of customers and all our backend is
built completely on Google cloud functions.

[https://aprl.la](https://aprl.la) [https://itunes.apple.com/us/app/aprl-mens-
clothing-network/i...](https://itunes.apple.com/us/app/aprl-mens-clothing-
network/id1342727273?mt=8)

In fact, we had one of our APIs running on EC2, which we recently migrated to
Cloud Functions.

------
kaishiro
We're really passionate about static sites with cloud functions for dynamic
functionality. After using Snipcart on a few sites and feeling we could do
better, we actually built out our own e-commerce solution as a drop in product
for static sites. It's all baked on firebase and cloud functions and we're
_loving it_. It's super fun to work with and costs dollars to run. I'm usually
very averse to the "build" end of build or buy scenarios but we couldn't be
happier with the end result.

------
kring462
Yes, our entire email attachment -> image processing pipeline is serverless on
Lambda, written in JS.

So far, we love it. It handles roughly 20k images a day.

You are right that Cloudwatch logs are a hassle. So we pipe all of the log
events into Scalyr (and log JSON objects, which Scalyr parses into searchable
objects).

In terms of error handling, Lambda retries once on exception. So we raise
exceptions in truly exceptional cases (e.g. - some weather in the cloud
prevents a file from being downloaded or uploaded). We have Cloudwatch alerts
that notify the team for every true exception. Happens less than once a day.

In pseudo-exceptional cases (e.g. a user emails an invalid image), we simply
log to Scalyr with an attribute that identifies that the event was pseudo-
exceptional, and then set up Scalyr alerts to email us if the volume of those
events goes above x per hour.

tl;dr - Cloudwatch + Scalyr with good alerts and thoughtful separation of
exceptions from pseudo-exceptions is my recommendation!

------
bitL
Deep Learning classifier services via AWS Chalice. It's trivial. However, the
50MB/250MB Lambda limits are a pain; one has to cut down TensorFlow, Keras,
etc. significantly before deployment (doable but difficult) or do some S3
tricks with /tmp. I wish they allowed increasing this limit for extra money.
It's cheaper than EC2 Elastic Beanstalk though.
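A rough sketch of the S3-plus-/tmp trick mentioned above, assuming weights are
fetched once per container; the `download` callable and `MODEL_PATH` are
hypothetical, standing in for a boto3 `download_file` call so the sketch runs
without AWS:

```python
import os

# The deployment package stays under the 250 MB limit; large model
# weights are fetched from S3 on cold start. /tmp survives across
# warm invocations of the same container, so the fetch happens once.
MODEL_PATH = "/tmp/model_weights.bin"  # hypothetical path

def ensure_model(download):
    """`download(dest)` stands in for e.g.
    boto3.client("s3").download_file(BUCKET, KEY, dest)."""
    if not os.path.exists(MODEL_PATH):  # cold start only
        download(MODEL_PATH)
    return MODEL_PATH
```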

I wouldn't do that again; I can waste time on more interesting things than to
hack artificial limits of architecture that will be changed at some point
anyway.

------
brianleroux
Yes! It's great. Scales very well. Costs almost nothing. Focused only on
business logic. Provision in minutes. Deploy in seconds. I will never go back.

- [https://begin.com](https://begin.com)
- [https://brian.io](https://brian.io)
- [https://arc.codes](https://arc.codes) <-- the framework I used

------
jdavis703
Yes, at my last company we needed to generate Open Graph thumbnails that
composited several images together. We decided to use a serverless
architecture that basically shelled out to an ImageMagick command and then
pushed the thumbnail to S3 which was served via a CDN. The main problem we ran
in to was a lack of processing power, but the new options from AWS solved our
issues. I'd definitely do it again.

~~~
kiallmacinnes
I think this gets at the core of why I think "100% serverless" isn't the right
move for many projects.

If you decide you're never going to boot a machine and manage it yourself,
you're locked into the exact set of choices your cloud has made available for
you. When your project has a need that's not covered, you're stuck.

I'm not talking about vendor lock in here, purely about the reduced
flexibility within a given vendor if you choose to never manage a server
yourself.

~~~
fapjacks
This is totally correct, but I think it's also correct to say there is a huge
vendor lock-in component to serverless. People make the claim that you can
engineer your application to be vendor agnostic, but even if that's the case,
you're still dumping _a ton_ of time/money into AWS-specific tooling to get
your application going, and almost none of that experience is transferable.
Nothing from AWS API Gateway is applicable anywhere else, for example, and
that's frankly one of the most awkward of all of the AWS services I've ever
used (and so the costs are even higher than if it weren't). That's not to say
that there won't be some form of serverless in the future that doesn't have
this enormous vendor-specific lock-in cost. But that serverless is not here
now.

------
southpolesteve
Yes. bustle.com, romper.com, and elitedaily.com are all 100% serverless and do
80+ million unique visitors per month. GraphQL+AWS Lambda

~~~
vincentmarle
Interesting. Are you using AppSync?

~~~
reconbot
Nope, AppSync came out long after we were in production, and while it's very
powerful, it would require a full rewrite of our application. It's an
application service/framework unto itself.

The Bustle stack looks like Redis/Elasticsearch => NodeJS Lambda GraphQL API
layer => (sometimes api gateway) => NodeJS lambda render layer => api gateway
=> CDN. We're working towards removing all the api gateway usage if possible
with smarter CDNs like cloudflare workers and Lambda at edge, but it's not
currently possible.

This setup gets us an average of 70ms API response time and less than 200ms
worst-case rendering time. More than 90% of cache misses never hit the worst
case, as we can serve stale content in those cases. Lots of room for
improvement too. =)

~~~
avidal
I'm curious on what's missing from Cloudflare Workers to allow you to remove
the API Gateway usage. We're actively looking for more advanced use cases so
we can make sure we prioritize upcoming features. Reply here or send me an
email at <username> at cloudflare.com.

------
hellofunk
Anyone have a recommendation how to get started with AWS serverless options?
There are so many AWS services and it isn’t intuitive when you login and see
all the stuff you can do. Coming from Heroku where you do it all yourself,
knowing when and how to break up functionality among AWS tools is not always
clear.

------
adzicg
we’re running a collaborative document editing service for mind maps
(www.mindmup.com) entirely using Lambda and associated services (such as
Kinesis and API Gateway). We started migrating from Heroku in 2016 and went all in
around February 2018. My anecdotal evidence is that we’re a lot more
productive this way.

------
meiraleal
"Serverless" (as it is currently approached) is more like "cloud computing":
someone else's server is your server. It is not difficult to create truly
serverless apps today, ones that work disconnected and fetch cached data,
syncing when new data is available.

------
joper90
We use Azure Functions to pull data out of DataDog and other services
(bespoke) to chuck into a capacity management database, and then use more FaaS
functions to munge the numbers etc.

We also use it in prod for events, and for customers who subscribe to message
queues and webhooks etc.

------
tortilla
Just deployed an S3 bucket virus scanner for my main SaaS.

Customized this for my needs: [https://github.com/upsidetravel/bucket-
antivirus-function](https://github.com/upsidetravel/bucket-antivirus-function)

------
iMerNibor
Not "serious", but I can definitely recommend it for simple (transformative,
as in webhook -> API) endpoints you don't want to care about
hosting/maintaining a server for. Low-volume stuff is even free (at least on
Google's cloud).

------
dfischer
I've been shipping serverless ever since it was launched on AWS. More recently
I am using Google.

The biggest thing I am struggling with right now is how to appropriately split
projects to keep module size down. This may be more of an issue with Firebase
Functions than AWS, because with AWS it's a lot easier to create separate
projects and constrain their scope. Google Firebase Functions very much
assumes a big package architecture. We could break it up into multiple
Firebase projects, but that separation creates a lot of annoyances.

It'd be awesome if you could split packages up so the resources don't have to
be shared within the various functions.

tl;dr want to split modules and functions up with explicit dependencies to
optimize bundle size / cold boot.

Additionally, I found it to be an anti-pattern to use any DB that requires a
connection pool instead of HTTP-based commands. It's annoying as heck to
manage connection pools with serverless, and it seems downright buggy or
broken. If you want to support it you need to centralize it with something
like PgPool, which seems like a big anti-pattern itself. I hate DynamoDB but
am loving Google's offering (Firebase Firestore or Datastore).
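The usual workaround for the pool problem is to create the client once at
module scope so warm invocations reuse it instead of reconnecting per call. A
hedged sketch, with `FakeHttpClient` as a made-up stand-in for a real
HTTP-based store client (Firestore, DynamoDB, etc.) so it runs anywhere:

```python
class FakeHttpClient:
    """Stand-in for an HTTP-based store client. Counts constructions
    to show the client is created once per container."""
    instances = 0

    def __init__(self):
        FakeHttpClient.instances += 1

    def get(self, key):
        return {"key": key}

# Module scope: runs once per container, not once per invocation.
client = FakeHttpClient()

def handler(event, context=None):
    # Warm invocations reuse the same client (and its connections).
    return client.get(event["key"])
```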

------
dbetteridge
AWS Lambda running Flask endpoints. It accepts a variety of GeoJSON and JSON
inputs and talks directly to both RDS and S3.

Used as a backend for two webapps: one for viewing geospatial data and one for
creating transport scenarios.

------
appdrag
We made a serverless backend with an API builder based on AWS Lambda. We have
hundreds of production systems running on it, so we are pretty happy with
serverless :)

https://cloudbackend.appdrag.com

------
com2kid
Currently building a startup off of serverless tech. Hoping to have few enough
API calls that it is actually cheaper this way.

I'll soon see if the back-of-the-envelope calculations were correct. :)

------
neom
I recall Jed Schmidt talking at various events about UNIQLO mobile app having
a lot of serverless/lambda in their architecture. IIRC for
preprocessing/images.

------
bootcat
I was working for EllieMae, to build their cloud platform. They were not happy
with the performance of the product, though it reduced ops work.

~~~
PretzelFisch
was this mostly cold starts or another architectural choice causing a
bottleneck?

------
k__
Lambda is just one of the serverless services out there.

S3 and DynamoDB are serverless too and many big projects run with them.

------
stevehiehn
How does one put a bunch of functions in source control? Or is this an
opportunity for a new product?

------
deegles
A serverless app of mine gets only 30-40k hits a month... it’s serious to me
though.

------
kondro
iRobot runs their surprisingly large systems using AWS Serverless products as
much as possible.

------
KirinDave
So, you've kinda mixed a few things here including: "Cloud functions", "Cloud
Streams" and "Cloud databases" [1].

I've shipped significant work on all 3 now, with the least focus on the newest
bit (cloud functions). Since you asked, here are my opinions:

# Cloud Databases

These are almost always a slam dunk unless you or someone else on your team
has a deep understanding of MySQL or Postgres [2]. They often have unique
interfaces with different constraints, but you can work around these
constraints and the freedom to scale these products quickly and not worry as
much about maintenance can be an enormous boon for a small team. This is
fundamentally different from something like AWS RDS, where you do in fact sort
of "have a server" and "configure that server". These other services have
distribution built into their protocol.

Of the modern selection, DynamoDB and Firebase come to mind as particularly
useful and spectacular products for key value and graph stores (DynamoDB is
surprisingly good at it!). If you're using GCE, Spanner is some kind of
powerful sorcerous storm that does your bidding if you pay Google; it's really
surreal the problems it can just magically solve (it's the sort of thing where
it's so good your success with it disappears until you have to replicate it
elsewhere and realize how much your code relied on it).

# Cloud Streams

I've been using these nonstop for about 6 years now, with most time logged on
SQS. For some reason a lot of people object to streaming architecture on
grounds of backpressure [3], or "want to run their own because of performance"
and end up hooking zookeeper and Kafka into their infrastructure.

For small products or growing products, You Will Almost Certainly Not Overload
SQS or Kinesis. You Just Won't Unless You're Twitter or Segment. Write your
system such that you can swap streaming backends, and be prepared to solve
obnoxious replay problems moving to a faster and less helpful queue.
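The "swap streaming backends" advice could look roughly like this; the code
depends on a tiny interface, and `InMemoryQueue` is a hypothetical stand-in
for an SQS-, Kinesis-, or Kafka-backed implementation:

```python
from queue import SimpleQueue

class QueueBackend:
    """Minimal interface the rest of the system codes against, so the
    backend can later be swapped without touching producers/consumers."""
    def publish(self, message):
        raise NotImplementedError
    def poll(self):
        raise NotImplementedError

class InMemoryQueue(QueueBackend):
    """Stand-in backend; a real one would wrap e.g. SQS send_message
    and receive_message calls behind the same two methods."""
    def __init__(self):
        self._q = SimpleQueue()
    def publish(self, message):
        self._q.put(message)
    def poll(self):
        return None if self._q.empty() else self._q.get()
```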

Lots of folks are convinced they need to run their own RabbitMQ service so
that they "can see what's going on." Given how incredibly reliable SQS has
been for me since its introduction, I'm disinclined to believe that. While
RabbitMQ is a fine product, I'd rather just huck stuff on SQS, obey sound
design principles, and then transition to faster queues only once I need them.

# Cloud Functions (Cλ)

Firstly, these solutions work fine. I've only shipped on Lambda, and I will
say I was underwhelmed. There are two reasons for this: cost and options.
Cloud Functions with API Gateway is just about the most expensive way you can
serve an API in the world of CSPs right now. The hidden requests costs are (or
were when I set this up, shipped then tore it down looking in horror at my
spend) just stupid. As for Options, it's very obnoxious how these environments
(GAE, Lambda, etc) can only bless specific environments rather than giving us
a specification over I/O or shm we could bind to. I want to ship Haskell in
some cases and it's stupid what I have to do to enable that [4].

Much has been said about how spaghetti-like these solutions are, but I think
this is more of a tooling issue. If you can actually specify Cλ endpoints in a
single file, then you can write a uni-repo for a family of endpoints that
share common libraries, build for those, and terraform/script them into
deployment. This is actually probably _more_ principled than how most folks
cram endpoints into a single fat binary. It also makes things like partial
rollouts on an API a heck of a lot more easy to implement.
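A sketch of the single-file endpoint family idea; all names here
(`_common_auth`, `list_posts`, `get_post`, `ENDPOINTS`) are hypothetical, and
the mapping at the bottom is what deployment tooling would iterate over to
script partial rollouts:

```python
# A family of endpoints sharing common helpers, each independently
# deployable as its own function. A terraform config or deploy script
# can roll out one entry at a time.
def _common_auth(event):
    """Shared library code used by every endpoint in the family."""
    return event.get("token") == "secret"

def list_posts(event, context=None):
    if not _common_auth(event):
        return {"status": 401}
    return {"status": 200, "body": ["post-1", "post-2"]}

def get_post(event, context=None):
    if not _common_auth(event):
        return {"status": 401}
    return {"status": 200, "body": event["id"]}

# One handler per deployed function; tooling maps names to deployments.
ENDPOINTS = {"list_posts": list_posts, "get_post": get_post}
```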

But still, out of the trinity of CSP products, Cλ is by far the least exciting
to me. I seldom ship API endpoints there. I usually use it for small cron jobs
or data collection jobs where I'm confident I won't end up with 4 running
instances because a looped call is timing out.

[0]: I'm experimenting with writing these mega posts with classical footnotes
as opposed to making them epic journeys to slog through my prose style.

[1]: I hate myself more every time I say the word cloud even knowing it's the
lingo folks will understand the most. They're service products. Let's all sink
into despair together.

[2]: And by "deep" I mean, "Good enough to have a reputation suitable for a
professional consultant and attract desperate clients."

[3]: To which I say, "Look, if you wanna pretend that the only possible
architecture is a spiderweb of microservices that positively push backpressure
up to the client and pretend that introspect-able queues don't give your
services equivalent confirmation, that's a game you can play. I think it's
disrespectful to folks who have equivalent backpressure schemes because they
have similarly refined infrastructure for understanding their queue volume.
Both methods are similar, and have different strengths. Needham's duality is
real and it's exactly the same here as it is on one single computer."

[4]: It's 2018, we have containers, and if you support Java with its slower
startup times you surely could support lightning fast Rust or Haskell
executables as well. Get with it, Amazon!

------
wainstead
Nope.

~~~
Sohcahtoa82
Why even leave a comment?

------
rgbrenner
Lambda et al have some serious shortcomings, and a lot more work needs to be
put into these serverless platforms. I don't think the approach they're taking
will last. It really needs a redesign/restructure.

I think serverless is the future... but not today.. in 5-10 years. That sounds
like a long way off, but it's not.. it'll pass in no time. And maybe they'll
improve it enough by then to make it viable.

I wouldn't build anything serious with it, unless you're ok rewriting it a few
years from now.

~~~
tucaz
Lots of empty claims.

> some serious shortcoming and a lot more work needs to be put into these
> serverless platforms

Not actionable at all

> The approach they're taking I don't think will last

No reason given

> It really needs a redesign/restructure

Nothing here too

> And maybe they'll improve it enough by then to make it viable

What makes it (un)viable?

~~~
rgbrenner
That's fair. I'm the founder of a company building a serverless platform. I
didn't want to write specifics because it may reveal details about our
approach.

~~~
pedalpete
Isn't this a great opportunity to sell us on your solution, or get our
feedback? You don't need to reveal details of your approach, but what are the
actual problems you see?

As a company that has built our most recent site completely on serverless,
we've become familiar with some of the issues. Though I definitely would not
consider it a no-go.

------
Firegarden
Listen, I just have to say this for everybody else out there: there is no
serverless design. It's not like the fucking thing is running in JavaScript on
the clients; there are servers. The whole name is stupid. Come up with a
better name for the service, man. It's not serverless. God help us.

~~~
Firegarden
If you're going to call it serverless architecture, then you'd better be
running it on WebRTC data streams over P2P clients.

