
We reduced the AWS costs of our streaming data pipeline - cloudfalcon
https://www.taloflow.ai/blog/reducing-aws-costs
======
QuinnyPig
Hmm. This looks to me like a lot of the savings were realized by moving away
from managed services into a scenario where there’s more operator overhead.
The AWS bill gets lower, but what about the cost of the engineering work?

~~~
nojito
False equivalence. The engineer will be doing more than just cloud work.

This comparison is the #1 flawed sales tactic the cloud companies use to
convince you you're saving money.

~~~
vageli
> False equivalence. The engineer will be doing more than just cloud work.

> This comparison is the #1 flawed sales tactic the cloud companies use to
> convince you you're saving money.

Time is a limited resource, and time spent managing Postgres backups (for
example) is time not spent doing other (possibly more meaningful/impactful _to
the business_) work.

~~~
dylan604
What is involved in managing backups? Isn't that just a cronjob?

~~~
otterley
First, you need to write the cronjob. But what goes in there? You need to
decide exactly how you're going to make a backup, and the process may differ
by what's being backed up. Ideally you want a quiescent snapshot, but the way
you do that varies by application. What if the application is a distributed
application, in which case you need to synchronize the snapshot process among
all its nodes? What if it's a master-replica design, where the node that runs
the cron job may vary based on the current topology?

And if you need some sort of cluster-aware lock to coordinate backups among
different peers, you'll need to decide which system works for you, implement
it, and maintain that as a separate system. And if that needs to be upgraded,
figure out a bulletproof process for upgrading it while it's still being used
as a coordinator.
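To make the coordination point concrete, here's a minimal sketch of one possible approach (the names and scheme are hypothetical, not a recommendation): every node runs the same cron job, but only a deterministically elected node actually proceeds. This only works if all nodes share the same view of cluster membership, which is exactly the hard part.

```python
def backup_runner(healthy_nodes):
    """Pick the lexicographically smallest healthy node as tonight's
    backup runner; returns None when no node is healthy."""
    return min(healthy_nodes) if healthy_nodes else None

# Every node's cron job would then do something like:
#   if backup_runner(current_members()) == my_node_id():
#       run_backup()
```

The catch, as noted above, is that "current_members()" has to agree across nodes during partitions and upgrades, which is why people end up pulling in a separate coordination system.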

Then, you need to ensure there's storage for the backup. You need to decide
what kind of storage you're going to use, make sure you've got enough space,
figure out how to encrypt the storage (very important in secure environments),
how to protect the storage using authn/authz. And lots of environments have
retention and storage lifecycle policies - you don't want to put the old
backups on the expensive fast media; you want it on the cheap slow media. And
some environments make you dispose of old data, so you have to figure out how
to age it out but without ever losing the backups you want to keep.
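As a rough illustration of what that aging-out logic looks like (the policy and its parameters here are made up, not from any particular tool): keep recent dailies, keep the first backup of each recent month, and delete the rest.

```python
from datetime import date

def backups_to_delete(backups, today, keep_daily=7, keep_monthly=12):
    """Return the backup dates that should age out: keep every backup
    from the last `keep_daily` days plus the first backup of each of
    the most recent `keep_monthly` months; delete everything else."""
    keep = {b for b in backups if (today - b).days < keep_daily}
    first_of_month = {}
    for b in sorted(backups):
        first_of_month.setdefault((b.year, b.month), b)
    for month in sorted(first_of_month, reverse=True)[:keep_monthly]:
        keep.add(first_of_month[month])
    return sorted(b for b in backups if b not in keep)
```

Even a toy policy like this has edge cases (what counts as "first of the month" after a failed run?), which is part of why managed backup services are attractive.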

Finally, you need to make sure the backups you create are valid and usable. So
you'll want to build an automated regression testing procedure to ensure that
every time you make a change (regardless of how minor) to the system being
backed up or to the backup process, you still end up with usable backups.
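Even before full restore testing, a cheap integrity check catches truncated uploads and corrupted files. A sketch, assuming gzipped dumps (a real regression test would go further and restore into a scratch instance):

```python
import gzip

def dump_is_readable(path):
    """Stream-decompress the whole file so that truncation or corruption
    raises. gzip raises BadGzipFile (an OSError subclass) for bad data
    and EOFError for files cut off before the end-of-stream marker."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):
                pass
        return True
    except (OSError, EOFError):
        return False
```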

(Disclaimer: I work for AWS, but opinions expressed here are my own and not
necessarily those of my employer.)

~~~
ec109685
You make it sound like there aren’t cookbooks for many of these scenarios and
that the company will have to invent these scripts and procedures by hand.

Yes it is work, but this company’s whole reason for being is to save AWS
spend, so I assume they have patterns they employ for their clients regularly
that achieve their SLO.

~~~
sokoloff
There are original-definition cookbooks and yet it still costs me time to
provision my own lunch vs using the managed service of my corner restaurant.

~~~
ec109685
Yes, managed services are better in many cases.

------
meritt
I find it highly entertaining that a two-year-old company founded on the
basis of helping slash cloud spending found so much waste in its own AWS
spend. This is _not_ an example of dogfooding, but an example of sheer
incompetence and massive technical debt.

I'd really like to start seeing a series of blog posts from companies who are
running extremely lean and efficient tech environments by utilizing cloud in
an intelligent manner and avoiding the expensive and unnecessary bullshit
that's so prevalent today. The ones that can brag "How we run a $4M/yr SaaS on
$40k/yr of AWS spend!" are far more interesting than "How we stopped
incinerating millions of VC money by simply turning off shit we didn't need"

~~~
tilolebo
Two-year-old companies have limited resources. It might have been a deliberate
trade-off to focus on work that produces value for the customers.

Maybe the blog post would have been "How we run a $1M/yr SaaS on $40k/yr of
AWS spend!" instead of $4M?

------
aritraghosh007
Back when AWS started, there were articles about the work needed to master
scalability and performance for the modern web, but as things matured, we
somehow ended up with a much larger heap of literature around AWS cost
optimization.

~~~
derex
In some sense this is a good problem to have. With on-prem you used to have
very limited resources to start with, so cost efficiency is a baked-in
requirement. With cloud providers you seem to have limitless resources and the
new problem of cost optimization arises.

Admittedly, there's a difference between optimizing fully-controlled resources
and cloud-provider managed services. For one, low visibility into cloud
service internals makes such optimization harder.

------
agounaris
I am curious about the actual cost in $! Managing your own kafka or
observability infra is expensive, you need a team to do this.

A 67% reduction doesn't tell the whole truth. They have more services to manage
now, which means they need more people and more time to do it.

Saving $10k off your AWS bill by hiring 2 more engineers is not cost
effective.

~~~
nojito
Of course it is.

Where did we get the idea that engineers are hired to do only one thing?

This has never ever been the case in my experience.

Also, Kafka being hard to manage is simply not the case. A quick look at the
many small companies and startups running their own clusters shows otherwise.

~~~
agounaris
Engineers are hired to deliver and produce value. Tooling can be a part of it,
but if you can outsource something that is not your source of income, you
should do it. Engineering time is more valuable.

I also know many startups and small companies investing 5 people and 6 months
to get an observability platform up and running when they could just get
Datadog or New Relic for half the price... and that's without taking into
account outages and updates to the platform.

I remember a recent Uber blog post on how they moved from build tool A to
build tool B, and a couple of weeks later 3,000 people were laid off. It's
important to spend development time on revenue streams.

This is a nice piece of advice: [https://nav.al/build-a-team-that-ships](https://nav.al/build-a-team-that-ships)

"Outsource everything that isn’t core. Resist the urge to pick up that last
dollar. Founders do Customer Service."

------
throwaway888abc
"Eliminate unused EC2 instances" -27% of cost

Haha, so they cleaned up their internal IT / DevOps mess, called it a day, and
then wrote a blog post about it.

~~~
tjbiddle
Eh, you're pulling a quote out of context. It's a 27% reduction in EC2 usage,
and EC2 was only 18.5% of the total. So this only accounted for ~5% of total
savings.
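Back-of-the-envelope, using the figures above:

```python
# Figures quoted in this thread
ec2_share_of_bill = 0.185  # EC2 was 18.5% of the total bill
unused_fraction = 0.27     # 27% of that EC2 spend was eliminated

savings_share = ec2_share_of_bill * unused_fraction
assert round(savings_share, 3) == 0.05  # roughly 5% of the total bill
```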

~~~
dannyw
I mean if a quarter of your EC2 instances were unused, that is absolutely an
internal devops / IT mess.

The whole point of AWS is to use services on demand; it's like buying 133
conference tickets for your 100 person company.

~~~
brianwawok
More like ordering 133 lunches every day for your 100 employees and dumping 33
in the trash.

~~~
billyhoffman
And doing this for months. Without noticing.

Honestly, this isn't ultimately engineering's fault. This is a SaaS business:
someone in their company is responsible for the COGS KPI. For that person to
either not notice an increase in COGS, or to not be aggressively incentivizing
engineering to reduce COGS, is a giant red flag.

~~~
jinpan
The article did say that the motivation was that their AWS credits were
running out... why prematurely optimize a free resource? :)

------
anthonysarkis
It seems reasonable to make some of these cost comparisons more visible.

i.e., when working on a new product or feature, understanding upfront that
"this managed service is x% more than the more bare-bones option", etc.

essentially turning alchemy into a science
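One way to make that comparison concrete is a simple break-even calculation (all numbers here are hypothetical, just to show the shape of it):

```python
def breakeven_hours_per_month(managed_premium_usd, engineer_usd_per_hour):
    """Hours of operator time per month at which self-hosting costs as
    much as the managed service's premium over the DIY option."""
    return managed_premium_usd / engineer_usd_per_hour

# e.g. a $2,000/month premium vs. a $100/hour engineer:
print(breakeven_hours_per_month(2000, 100))  # 20.0 hours/month
```

If self-hosting eats more than that many engineer-hours a month, the managed service is the cheaper option before you even count outages and upgrades.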

~~~
Cthulhu_
AWS offers a cost calculator for just that purpose; they offer 'easier'
products if you can't be arsed to dive into AWS costs and technologies
yourself.

I think a lot of people make the mistake of assuming AWS is just an easy off-
the-shelf thing you can just grab, but if you use it seriously it's a full-
time job and its own expertise.

Source: I've done some AWS certifications, never was able to put them into
practice though. I've also worked in multiple organizations that migrated to
AWS, they all had a full-time team of people managing it.

It's a full-time, specialist job and you can't just palm it off to your
engineers as a background thing.

------
tyingq
The initial pie chart seems to indicate that either AWS Glue is significantly
overpriced, or that they were doing something wrong.

~~~
brodouevencode
As with all things AWS the more "magic" there is to it, the more expensive it
is.

~~~
csharptwdec19
This is a huge part of why I always try to build applications to be as
platform-agnostic as possible.

If I make a .NET service or site, I know (with the tools I use) I can deploy
it on any linux or windows machine without issue. I can take it anywhere that
I can run any software.

Sure, you may need more glue for certain scenarios, but you know that you can
move as soon as a provider shows its fangs.

~~~
brodouevencode
Speaking from experience - not a bad idea.

------
theatraine
Interesting idea. Does anyone do this for Azure?

------
dirtydroog
We went through a similar process with GCP, which was annoying since GCP was
sold as being cheaper than AWS.

