
How we built a serverless architecture with AWS - kdeorah
https://www.hypertrack.com/blog/2019/07/11/how-we-built-a-serverless-architecture-with-aws/
======
LaserToy
Well, you folks have now made your business super coupled to AWS. I just
have 1 word: Oracle

~~~
shiftpgdn
It's pretty obvious that someone at Amazon is watching and voting on this
thread because this is absolutely true. This kind of thing is vendor lock-in
to the nth degree.

~~~
LaserToy
Funny, right. I worked for at least 2 companies that at some point put a lot
of money into Oracle. One of them is a leading gaming network with billions
in revenue.

Teams struggled to migrate off it. It was a multi-year, multi-million-dollar
project and there is no end to it. And newcomers were saying -> oh, that was
a silly idea to use all this stuff (why didn't they use Dynamo :) ), however,
15 years ago it was pretty OK + Oracle solution architects were all over the
company.

I don't see how Amazon's strategy is different. And I don't get how folks
who say the Oracle lock-in was bad but Amazon is OK can justify such
thinking.

I will put my money on it: in 10 years these will be good examples of how
not to do things. Like, for example, when AWS leadership changes. And the
internet will be: who would've seen it coming.

~~~
zaarn
Same reason that people think Chrome lock-in is okay and IE lock-in was
bad; the new product is shiny and has good features. At the moment.

Well, and you gotta justify that $1 million investment (R&D + costs) in
your AWS architecture somehow.

------
apsdsm
It occurs to me that you can stick a single colon into the title after
“architecture” and you pretty much get the summary of the article.

------
bborud
I like the idea of serverless architectures, but I still wouldn't use it for
anything that is important.

\- Using a serverless architecture almost always implies getting married to
your provider. You can run your code in only one place. You have given up
all bargaining power. When the relationship ends you have to build your
system all over again.

\- It isn't really serverless; they're just not your servers.

\- They are only efficient for the workloads the architecture is designed for.
Stray outside the parameters and things start to become expensive, slow or
both.

\- If you use serverless architectures you have to make damn sure the people
who built it stick around, because the only value you are left with if your
provider folds or increases prices on you is inside the heads of the people
who built the solution.

I have already seen friends get burnt by this. Typically people build a
prototype or a technology demo, it gets funding, and the CTO insists it
isn't important to do anything about the serverless bits and they should
just go with them (there is no pause to make good on "we'll fix it later"
once money gets involved). Then they get jerked around by the service
provider because it can't provide the support they need. Then they slam head
first into the costs of actual production traffic which, somehow, even
though it requires only basic arithmetic, none of them managed to calculate
before the huge bills started rolling in.

~~~
plufz
> Using a serverless architecture almost always implied getting married to
> your provider

I don’t disagree, but I’ve made a web app with AWS serverless. Frontend on
S3, backend in Python Flask on serverless, and a MySQL server (haven’t tried
RDS Serverless yet). Works fine; one compiled library didn’t work, but all
the standard stuff did. No marriage. :)

~~~
bborud
Then strictly speaking it isn’t serverless.

------
holoduke
I don't understand what you really gain with this setup. I mean, this
extreme vendor-lock-in situation is such short-term thinking. The absolutely
wrong strategy if you ask me. I would be curious to see this company 5 years
from now.

~~~
akishinevsky
Let me ask you the question differently: let's say you are exploring a new
market opportunity for an exciting product you want to build. Would you
rather spend cycles rebuilding what AWS has already built, and thus delay
your speed to market, or use AWS managed services and build what nobody has
done before? Surely this architecture will evolve over time, but it will
only evolve as the startup quickly discovers what the market needs.

------
leetbulb
Oh boy, I bet that's costing a new car each month.

~~~
nisten
I used to hate AWS for how expensive their bandwidth and storage were,
until I actually started using it last year. I think their new serverless
stack is about to leave a lot of devops people out of a job.

You can set up a CI/CD pipeline in about half an hour with Amplify; at my
previous company I remember it taking a good 3 weeks to get CircleCI up and
running properly.

And then moving a microservice over to it is basically 1 command and a few
options; mostly you just copy over the config from your old Express backend
with a few changes, and you're done. It's insane.

Another dev I showed the Lighthouse scores of the React stack I deployed on
it even said "this should be illegal". And they're right; it's pretty much
automated devops. The whole app now loads in 300ms. If you have server-side
rendering in your app, the static content will automatically be cached on
their CDNs.

And if you want to save a bit of money you can just use Google Firebase for
your authentication and DB. GraphQL is surprisingly a breeze too as a middle
layer if you want to leave your Java or .NET backend APIs untouched.

At the end of the day, Node.js is completely insecure by design; your
infrastructure will never be as secure as when it's running on GCP or AWS.
That's why you go serverless and stop messing with security and front-end
scalability.

If they solve the cold-start issue of databases on Aurora, they will
dominate the market even more completely than they already do.

~~~
dmix
SSR is going to do wonders for page-load times on the internet as it
finally gets popular via React/Vue. I hope it's the future for all of these
heavyweight, user-facing JS apps.

~~~
Jedi72
SSR is the future? What has PHP been doing for 15+ years?

~~~
dmix
I'm talking about Next.js/Nuxt.js-style JS front-ends replacing exactly
that, plus JS-heavy frontends like Angular and SPA React apps, which were
the last decade's modus operandi.

The way SSR hooks React/Vue into these JS apps, "hydrating" them after
loading prerendered component-based views to make them interactive without
losing any performance compared to static HTML, is unique and extremely
powerful, and most people don't understand it until they do it. It really is
the future of frontend development.

SSR combined with async-loaded, chunked bundles of components is far more
than prerendering some server-side templating library with full HTTP
requests in between. All the power of a full-fledged SPA but with none of
the performance or SEO downsides, plus automatic offline + service-worker
caching. It's great for the web's future.

[https://ssr.vuejs.org/](https://ssr.vuejs.org/)
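
For the curious, the split looks roughly like this (a sketch, assuming
React 16-era APIs; the App component and port are placeholders):

    // server.tsx -- render the component tree to an HTML string on the server
    import express from "express";
    import React from "react";
    import { renderToString } from "react-dom/server";
    import { App } from "./App"; // hypothetical shared component

    const app = express();
    app.get("*", (_req, res) => {
      const html = renderToString(React.createElement(App));
      res.send(`<div id="root">${html}</div><script src="/bundle.js"></script>`);
    });
    app.listen(3000);

    // client.tsx -- shipped as bundle.js; hydrate() attaches event handlers
    // to the existing server-rendered markup instead of rebuilding the DOM.
    import React from "react";
    import { hydrate } from "react-dom";
    import { App } from "./App";

    hydrate(React.createElement(App), document.getElementById("root"));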

~~~
Jedi72
Yo dawg, I heard you like job security, so we put a program in your
program; now you can render while you render.

More seriously, I do understand the difference, but I disagree with the
whole approach in 95% of cases.

~~~
dmix
Even the Haskell people are adopting SSR’d JS-heavy frontends (Miso,
Purescript, etc) for their web apps. That’s when you know it’s mainstream.
Good luck with PHP!

------
agraebe
Saw this post a while ago: [https://medium.com/@dadc/aws-appsync-the-
unexpected-a430ff71...](https://medium.com/@dadc/aws-appsync-the-
unexpected-a430ff7180a3) \- did you hit any limitations with AppSync?

~~~
kdeorah
We haven't hit any scaling issues yet. GraphQL is nice. It's really about
getting data directly from DynamoDB and Aurora to an endpoint that
Android/iOS/React-JS can query and subscribe to. The Apache Velocity
Template Language that AppSync uses is a pain though. This post captures it
well (unfortunately):
[https://www.reddit.com/r/graphql/comments/b0zomv/aws_appsync...](https://www.reddit.com/r/graphql/comments/b0zomv/aws_appsync_vs_apollo_server/)

~~~
akishinevsky
AppSync does have limitations we have to contend with. Custom scalar types
cannot be defined, hence we are not able to define strictly typed GeoJSON
objects. Apache VTL has its own learning curve, but once you master it you
can implement functionality without leaning on Lambda functions, which
avoids paying for their invocations in high-volume GraphQL call scenarios
and gets you to the queried data faster.

~~~
tirumaraiselvan
Just FYI, Hasura GraphQL Engine has native support for GeoJSON types:

[https://blog.hasura.io/graphql-and-geo-location-on-
postgres-...](https://blog.hasura.io/graphql-and-geo-location-on-postgres-
using-hasura-562e7bd47a2f/)

PS: I work here. Apologies for the plug.

~~~
cbzehner
I've been using Hasura for several months at work and its approach to
GraphQL has nailed the level of abstraction needed for early product
development.

It's a great complement to serverless and static front-ends.

------
reilly3000
How do you deal with Lambda concurrency? I have found it's pretty easy to
hit the 1K concurrent-execution limit if functions take a long time to run
and receive bursty traffic.

~~~
xsmasher
Do you mean you don't want it to handle 1k concurrent requests (you want some
to be rejected or queued instead?) or do you mean that the concurrent
execution causes some other problem?

(honest question, not snark)

~~~
bzbz
I think they mean there’s a 1k concurrent-execution limit that they hit.
Though the alternative would be dedicated servers and load balancers, no?

~~~
reilly3000
Right, I'm referring to AWS limits. I was running a benchmark yesterday
against a logging endpoint I made with a similar architecture to the
article's. One function is attached to a public ALB endpoint; it does some
validation and then writes the event to SQS. This was taking 100-200ms with
128MB of RAM. A second function was attached to the SQS queue; its job was
to pull events and write them out to an external service (Stackdriver,
which sinks to BigQuery). This function was taking 800-1200ms at 128MB of
RAM, or 300-500ms at 512MB (expensive!).
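
The first function was roughly this shape (a sketch, not the real code; the
queue URL and validation are placeholders):

    // ALB-fronted Lambda: validate, enqueue to SQS, return fast (aws-sdk v2).
    import { SQS } from "aws-sdk";
    import { ALBEvent, ALBResult } from "aws-lambda";

    const sqs = new SQS();
    const QUEUE_URL = process.env.QUEUE_URL as string; // placeholder

    export const handler = async (event: ALBEvent): Promise<ALBResult> => {
      const body = JSON.parse(event.body || "{}");
      if (!body.event) {
        return { statusCode: 400, body: "missing event" }; // placeholder check
      }
      // Hand off to SQS so the slow downstream write happens out of band.
      await sqs
        .sendMessage({ QueueUrl: QUEUE_URL, MessageBody: JSON.stringify(body) })
        .promise();
      return { statusCode: 202, body: "queued" };
    };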

While running some load testing with Artillery I found that I was often
getting 429 errors on my front-end endpoint. When pushing 500+ RPS, the 2nd
function was taking up over 50% of the concurrent execution limit and new
events coming into the front-end would get throttled and in this case thrown
out. That also means that any future Lambdas in the same AWS account would
exacerbate this problem. Our traffic is spiky and can easily hit 500+ RPS on
occasion, so this really wasn't acceptable.

My solution was to refactor the 2nd function into a Fargate task that polls
the SQS queue instead. It was easily able to handle any workload I threw at
it, and it can run 24/7 for a fraction of the cost of the Lambda. Each
invocation of the Lambda was authenticating with the GCP SDK before passing
the event along, and the Lambda had to stay running while the 2 stages of
network requests completed.
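
The replacement boils down to a long-poll loop like this (a sketch; the
queue URL and the Stackdriver write are placeholders):

    // Fargate task: authenticate once at startup, then long-poll SQS forever.
    import { SQS } from "aws-sdk";

    const sqs = new SQS();
    const QUEUE_URL = process.env.QUEUE_URL as string; // placeholder

    // Stand-in for the real Stackdriver/BigQuery write.
    async function forwardToSink(body: string): Promise<void> {
      console.log("would forward:", body);
    }

    async function main(): Promise<void> {
      for (;;) {
        // A 20s long poll keeps request counts (and cost) low while idle.
        const { Messages = [] } = await sqs
          .receiveMessage({
            QueueUrl: QUEUE_URL,
            MaxNumberOfMessages: 10,
            WaitTimeSeconds: 20,
          })
          .promise();
        for (const msg of Messages) {
          await forwardToSink(msg.Body as string);
          await sqs
            .deleteMessage({
              QueueUrl: QUEUE_URL,
              ReceiptHandle: msg.ReceiptHandle as string,
            })
            .promise();
        }
      }
    }

    main();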

I'm happy to report I haven't been able to muster a test that breaks anything
since I started using Fargate!

[https://docs.aws.amazon.com/lambda/latest/dg/concurrent-
exec...](https://docs.aws.amazon.com/lambda/latest/dg/concurrent-
executions.html)

~~~
lentil
> the 2nd function was taking up over 50% of the concurrent execution limit
> and new events coming into the front-end would get throttled and in this
> case thrown out.

It sounds like you already found a great solution for your particular
case. But it's also worth mentioning that you can apply per-function
concurrency limits, which can be another way to prevent a particular
function from consuming too much of the overall concurrency. For anyone
whose Lambda workload is cheaper than a 24/7 task, that could be a good
option.

> Each invocation of the Lambda was authenticating with the GCP SDK before
> passing the event

I'm curious whether you tried moving the authentication outside of the handler
function so it could be reused for multiple events? I've found that can make a
huge difference for some use cases.
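
The pattern looks like this (a sketch, using the Node GCP logging client
purely for illustration; the log name and event shape are placeholders):

    // Module scope: runs once per container. Warm invocations reuse the
    // already-authenticated client instead of redoing the auth handshake.
    import { Logging } from "@google-cloud/logging";

    const logging = new Logging(); // credentials resolved once, at cold start
    const log = logging.log("my-log"); // placeholder log name

    export const handler = async (event: { messages: string[] }) => {
      // Handler scope: runs on every invocation; keep it to per-event work.
      const entries = event.messages.map((m) => log.entry({}, m));
      await log.write(entries);
    };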

------
thesanerguy
How does the cost of DynamoDB (and the other components) compare to other
options that you considered, especially at scale? Would the economics work
with the same architecture at, say, 100X scale?

~~~
kdeorah
Good question. At 100x, probably not. At 10x, yes, it would be better than
managing the services on our own. By that time, we would have a better
prioritized list of which services to self-manage and which ones to leave to
AWS. Are you specifically concerned about DynamoDB for some reason?

~~~
thesanerguy
How easy or hard would it be to switch to self-managed components as you
grow from 10X to 100X? Quite often they end up becoming tech debt that stays
on the back burner. Just curious.

~~~
kdeorah
Ah yes. The engineer would tell you we can move whenever we want. The
manager would tell you it is harder than it looks. Management would tell
you it will never happen. :-)

See it as reducing startup risk and deferring the payment to when you
become successful and have money/time to throw at the problem. There are
best practices for doing it in a clean way so that moving is easier, though.

Do you know of some gotchas here?

------
Charles_t
This is not going to scale. Lambdas are hella slow. The cold starts will kill
you.

~~~
vorpalhex
You can pay to never hit cold starts...

~~~
iends
No you can't; you can just try to keep your lambdas warm, but that isn't
the same thing.
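
The usual trick is a scheduled ping plus an early return, something like
this (a sketch, assuming a CloudWatch Events rule invokes the function
every few minutes):

    // Warm-up sketch: a scheduled CloudWatch Events rule pings the function;
    // real requests fall through to the normal work.
    export const handler = async (event: { source?: string }) => {
      if (event.source === "aws.events") {
        // This keeps one container alive, but a burst of real traffic still
        // cold-starts extra concurrent containers -- warming != guaranteed.
        return;
      }
      // ...normal request handling goes here...
    };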

~~~
jjtheblunt
wouldn't EC2 be an example of paying to never hit cold starts?

