
Ask HN: How can I quickly trim my AWS bill? - danicgross
Hi HN,

I work with a company that has a few GPU-intensive ML models. Over the past few weeks, growth has accelerated, and with that, costs have skyrocketed. AWS cost is about 80% of revenue, and the company is now almost out of runway.

There is likely a lot of low-hanging cost-saving fruit to be reaped, just not enough people to do it. We would love pointers to anyone who specializes in cost optimization. Blogs, individuals, consultants, or magicians are all welcome.

Thank you!
======
boulos
Disclosure: I work on Google Cloud (but my advice isn’t to come to us).

Sorry to hear that. I’m sure it’s super stressful, and I hope you pull
through. If you can, I’d suggest giving a little more information about your
costs / workload to get more help. But, in case you only see yet another
guess, mine is below.

If your growth has accelerated, yielding massive costs, I _assume_ that means
you’re doing inference to serve your models. As suggested by others, there are
a few great options if you haven’t already tried them:

\- Try spot instances: while you’ll get preempted, you do get a couple of
minutes to shut down (so for model serving, you just stop accepting requests,
finish the ones you’re handling, and exit). This is worth a 60-90% reduction
in compute cost.
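
A minimal sketch of that drain-on-notice pattern. The metadata URL is AWS's documented spot interruption endpoint; the `Server` class is a hypothetical stand-in for whatever serving loop you run:

```python
import time
import urllib.request
from urllib.error import URLError

# AWS's spot interruption notice endpoint: returns 404 until a ~2-minute
# termination warning has been issued, then 200 with the action details.
METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def termination_pending(url=METADATA_URL, timeout=1):
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True    # 200 OK: the instance is being reclaimed
    except URLError:
        return False       # 404 / unreachable: keep serving

class Server:
    """Hypothetical stand-in for a model-serving process."""
    def __init__(self):
        self.accepting = True
        self.in_flight = 0

    def drain(self):
        # Stop taking new requests, let in-flight ones finish, then exit.
        self.accepting = False
        while self.in_flight > 0:
            time.sleep(0.1)

# In a background thread you'd poll every few seconds:
#   if termination_pending(): server.drain(); sys.exit(0)
```

The two-minute warning is plenty of time for inference requests that finish in seconds.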

\- If you aren’t using the T4 instances, they’re probably the best
price/performance for _GPU_ inference. A V100, by comparison, is up to 5-10x
more expensive.

\- However, your models should be taking advantage of int8 if possible. This
alone may let you pack more requests per part. (Another 2x+)
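
For intuition, here is what symmetric int8 quantization does to a weight tensor. Frameworks (TensorRT, TF-Lite, etc.) handle this for you; this NumPy sketch just illustrates the memory/precision trade:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ~= scale * q, with q in int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the worst-case rounding
# error is half a quantization step.
assert q.nbytes == w.nbytes // 4
assert np.abs(w - w_hat).max() <= scale
```

Whether that rounding error is acceptable is model-specific, which is why you benchmark accuracy before and after.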

\- You could try model pruning. This is perhaps the most delicate option, but
look at how people compress models for mobile. It has a similar-ish effect of
packing more weights into smaller GPUs; alternatively, you can use a much
simpler model (fewer weights and fewer connections also often means far fewer
flops).

\- But just as much: why do you _need_ a GPU for your models? (Usually it’s to
serve a large-ish / expensive model quickly enough). If you’re going to be out
of business instead, try cpu inference again on spot instances (like the c5
series). Vectorized inference isn’t bad at all!

If instead this is all about training / the volume of your input data: sample
it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.

Remember, your users / customers won’t somehow be happier when you’re out of
business in a month. Making all requests suddenly take 3x as long on a cpu or
sometimes fail, is better than “always fail, we had to shut down the company”.
They’ll understand!

~~~
ParanoidShroom
I was in the same boat and this is good advice!

I stopped using GPUs — "Vectorized inference isn’t bad at all!" This, so much.
I was blinded by GPU speed; TensorFlow builds with AVX optimization are
actually pretty fast.

My discovery:

\+ Stopped using expensive GPUs for inference and switched to AVX-optimized
TensorFlow builds.

\+ Cleaned up the inference pipeline and reduced complexity.

\+ Reserving compute instances for a year or more provides a discount.

\- I never got pruning to work without a significant loss increase.

\- Tried spot instances with GPUs, which are cheaper. Random kills and spinning
up new instances took too long loading my code. The discount is large, but I
couldn't reliably keep it up; users were getting more timeouts. I bailed and
just used CPU inference. The GPU was being underutilized; using CPU only
increased inference time to around 2-3 seconds. With the price trade-off it was
a simpler, cheaper, and easier solution.

~~~
jwr
Also, consider physical servers from providers like Hetzner. These can be
several times cheaper than EC2.

~~~
papaf
I use Hetzner quite a lot for personal projects and can recommend them for
reliability and predictable costs. I've run reasonably CPU-heavy tasks like
compiling Android images on the larger Cloud instances.

However, this morning I was playing around with Scaleway bare metal [1] and
General Purpose instances [2] -- I am thinking of making a switch for high CPU
tasks.

[1] [https://www.scaleway.com/en/bare-metal-servers/](https://www.scaleway.com/en/bare-metal-servers/)

[2] [https://www.scaleway.com/en/virtual-instances/general-purpose/](https://www.scaleway.com/en/virtual-instances/general-purpose/)

~~~
jwr
Interesting! These look very good indeed. I will have to try them.

The main point is that physical servers are much cheaper than VMs and provide
significantly better performance as well (see my benchmarking and comparison:
[https://jan.rychter.com/enblog/cloud-server-cpu-performance-comparison-2019-12-12](https://jan.rychter.com/enblog/cloud-server-cpu-performance-comparison-2019-12-12)).

------
kkielhofner
AWS/clouds aren't always the best solution for a problem. Often they're the
worst (just like any other tool).

You don't provide a lot of detail but I imagine at this point you need to get
"creative" and move at least some aspect of your operation out of AWS. Some
variation of:

\- Buy some hardware and host it at home/office/etc.

\- Buy some hardware and put it in a colocation facility.

\- Buy a lot of hardware and put it in a few places.

Etc.

Cash and accounting is another problem. Hardware manufacturers offer financing
(leasing). Third party finance companies offer lines of credit, special
leasing, etc. Even paying cash outright can (in certain cases) be beneficial
from a tax standpoint. If you're in the US there's even the best of both
worlds: a Section 179 deduction on a lease!

[https://www.section179.org/section_179_leases/](https://www.section179.org/section_179_leases/)

You don't even need to get dirty. Last I checked it was pretty easy to get
financing from Dell, pay next to nothing to get started, and have hardware
shipped directly to a co-location facility. Remote hands rack and configure it
for you. You get a notification with a system to log into just like an AWS
instance. All in at a fraction of the cost. The dreaded (actually very rare)
hardware failure? That's what the warranty is for. Dell will dispatch people
to the facility and replace XYZ as needed. You never need to physically touch
anything.

A little more complicated than creating an AWS account with a credit card
number? Of course. More management? Slightly. But at the end of the day it's a
fraction of the total cost and probably even advantageous from a taxation
standpoint.

AWS and public clouds really shine in some use cases and absolutely suck at
others (as in suck the cash right out of your pockets).

~~~
philliphaydon
> AWS/clouds aren't always the best solution for a problem.

And when they aren’t the best, it’s often because you don’t know what you’re
doing.

It’s all too common for people to over-provision, or to go with more services
than they need.

Like: let’s have a database, and a cache service, and a search service, when
95% of the time they only need the database, because it can do full-text
search adequately and they don’t have the traffic to warrant caching in Redis,
and basic caching would do.

They don’t take advantage of auto scale groups, or run instances that are over
provisioned 24/7.

I’ve seen database instances where when it’s slow they throw more hardware at
it instead of optimising the queries and analysing / adding indexes.

The biggest cost of cloud providers is outbound data. The rest is almost
always the problem of the Developers.

~~~
rumanator
None of your comments are relevant to machine learning applications, and all
you do is throw blanket statements about ignorance. Your comments are very far
from the problem and from being helpful.

~~~
philliphaydon
Nope. We have no information about the OP's setup, bill, or anything. This
entire thread is based on assumptions. I gave common examples of developers
screwing up and generating large bills. Explain to me how machine learning is
any different.

Do we know if the instances used for MLing are running 24/7 idle until
customers use them? Do we know if the utilisation is optimal for the
workloads?

We know nothing. So claiming that cloud providers are not good is very far
from the problem and not helpful.

~~~
rumanator
> So claiming that cloud providers are not good

The statement is not that AWS is "not good". The statement is that AWS is very
expensive, especially for computational tasks, and there are cheaper
alternatives around.

AWS is notorious for positioning their services as a way to convert capex into
opex, especially if your scenario involves a SaaS that might experience
unexpected growth and must be globally available. Training ML models has
nothing to do with those use cases. It makes no sense to mindlessly defend AWS
as the absolute best service around for a job it was not designed for, with a
pricing model that capitalizes on added value that does not apply here.

~~~
philliphaydon
I never defended AWS as being the absolute best. I said high bills are almost
always due to developers and not the cloud provider. Which you haven’t argued
against.

As I said I have examples of how Developers often cause large bills.

And I explained why we can’t help with the OPs large bill.

You’re saying that with ML there is absolutely 0 way to reduce costs on AWS
which is absolute rubbish.

~~~
rumanator
> I said high bills are almost always due to developers and not the cloud
> provider.

I feel that's where you keep missing the whole point. Somehow you're stuck on
thinking that an expensive service is not a problem if you can waste time
micromanaging and constantly monitoring expenditures to shave off a bit of
cost from the invoice. Yet, somehow you don't register in your universe the
fact that there are services out there that are both far cheaper and arguably
better for this use case.

Therefore, why do you keep insisting on the idea of wasting time and effort
micromanaging a deployment like pets to shave off some trimmings off a huge
invoice if all you need to do to cut cost to a fraction of AWS's price tag is
to.... switch vendor?

~~~
philliphaydon
So what you’re saying is because developers can’t control what they build they
need to be stuck with services that limit what they can do so they don’t end
up with big bills.

And that for cases like MLing it’s impossible to optimise costs.

Got ya.

~~~
rumanator
> So what you’re saying is because developers can’t control what they build
> they need to be stuck with services that limit what they can do so they
> don’t end up with big bills.

No, I'm pointing you to the fact that developers are able to do exactly what
they want, with less work and far cheaper, by simply moving away from AWS and
picking pretty much any other vendor. Why do you have a hard time
understanding what others are telling you, and accepting anything that
suggests AWS is not the best solution for all use cases, especially those it
was not designed for?

~~~
philliphaydon
Rubbish, you're saying that it's impossible to run on cloud cheaply. Therefore
no one should use cloud for any reason.

"I don't know how to use cloud so cloud is bad"

~~~
still_grokking
"You're holding it wrong!"

------
stratified
[DISCLAIMER] I work at AWS, not speaking for my employer.

We really need some more details on your infrastructure, but I assume it's EC2
instance cost that skyrocketed?

A couple of pointers:

\- Experiment with different GPU instance types.

\- Try Inferentia [1], a dedicated ML chip. Most popular ML frameworks are
supported by the Neuron compiler.

Assuming you manage your instances in an auto scaling group (ASG):

\- Enable a target tracking scaling policy [2] to reactively scale your fleet.
The best scaling metric depends on your inference workload.

\- If your workload is predictable (e.g. high traffic during the daytime, low
traffic during nighttime), enable predictive scaling. [3]
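
As a sketch, attaching a target-tracking policy via boto3 might look like the following. The names and the 70% target are made up, and for GPU utilization you'd publish a custom CloudWatch metric, since the predefined metrics cover CPU, network, and request count:

```python
def tracking_config(target_value):
    """Target-tracking config keyed off the predefined ASG CPU metric."""
    return {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": target_value,
    }

def attach_policy(asg_name, target_value=70.0):
    # Requires boto3 and AWS credentials; imported lazily so the config
    # helper above stays usable offline.
    import boto3
    client = boto3.client("autoscaling")
    client.put_scaling_policy(
        AutoScalingGroupName=asg_name,
        PolicyName="keep-cpu-near-target",   # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration=tracking_config(target_value),
    )
```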

[1] [https://aws.amazon.com/machine-learning/inferentia/](https://aws.amazon.com/machine-learning/inferentia/)

[2] [https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html)

[3] [https://docs.aws.amazon.com/autoscaling/plans/userguide/how-it-works.html](https://docs.aws.amazon.com/autoscaling/plans/userguide/how-it-works.html)

~~~
belval
It could also be worth having a look at SageMaker? IIRC it's cheaper.

------
solresol
My pitch to help: you can probably replace the GPU-intensive ML model with
some incredibly dumb linear model. The difference in
accuracy/precision/recall/F1 score might only be a few percentage points, and
the linear model training time will be lightning fast. There are enough
libraries out there to make it painless in any language.

It's unlikely that your users are going to notice the accuracy difference
between the linear model and the GPU-intensive one unless you are doing
computer vision. If you have small datasets, you might even find the linear
model works better.

So it won't affect revenue, but it will cut costs to almost nothing.

Supporting evidence: I just completed this kind of migration for a bay area
client (even though I live in Australia). Training (for all customers
simultaneously) runs on a single t3.small now, replacing a very large and
complicated set up that was there previously.
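
To make the "incredibly dumb linear model" concrete: on data with mostly-linear structure, even closed-form least squares gets most of the way there. The data below is synthetic; in practice you'd reach for scikit-learn's `LogisticRegression` or similar:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(np.float64)   # synthetic, linearly separable labels

# Ridge-regularized least squares on +/-1 targets: one matrix solve,
# no GPU, training time measured in milliseconds.
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(20), X.T @ (2 * y - 1))
accuracy = ((X @ w > 0) == (y > 0.5)).mean()
```

On real data the accuracy gap to the deep model is what matters, so measure it before committing to the switch.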

~~~
Enginerrrd
Yeah, I agree with this. Rather than ask whether OP is optimizing their AWS
billing, I'd also ask whether OP's devs even have any incentive to do better.
Even with machine vision it's stupidly easy to increase your computation
effort by 2 or more orders of magnitude for almost no benefit. Default
parameters will often do exactly that, in fact.

------
pixiemaster
I‘m a CTO of a compute intensive AI SaaS company, so I can relate.

One piece of advice: speak to your AWS rep immediately. Get credits to
redesign your system and keep you running. You can expect up to 7 digits in
credits (for real!) and support for a year for free; they really want to help
you avoid this.

~~~
cj
This.

AWS has always been eager to get on the phone with me to discuss cost savings
strategies. And they don’t upsell you in the process.

------
kureikain
I was in the same situation.

We bought 2 Dell servers via their financing program. Each server is about
$19-25K. We paid AWS $60K per month before that. We pay $600 for co-location.

So my advice is to try to get hardware via a provider's financing program;
Dell had a good one, I think.

~~~
lovetocode
What does colocation mean in this context? Did you buy the servers and AWS
hosted on their premises?

~~~
Nextgrid
Colocation just means buying space in a datacenter somewhere (and it comes
with a certain amount of power and bandwidth).

------
fxtentacle
You might be able to significantly lower your monthly bill in exchange for an
upfront payment by purchasing your own servers and then renting co-location
space.

I'm CTO of an AI image processing company, so I speak from experience here.

I personally use Hetzner.de and their Colo plans are very affordable, while
still giving you multi GBit internet uplinks per server. If you insist on
renting, Hetzner also offers rental plans for customer-specified hardware upon
request. The only downside is that if you call a Hetzner tensorflow model from
an AWS east frontend instance, you'll have 80-100 ms of roundtrip latency for
the rpc/http call. But the insane cost savings over using cloud might make
that negligible.

Also, have you considered converting your models from GPU to CPU? They might
still be almost as fast, and affordable CPU hosting is much easier to find
than GPU options.

I'm happy to talk with you about the specifics of our / your deployment via
email, if that helps. But let me warn you that my past experience with AWS and
Google Cloud performance and pricing, in addition to suffering through low
uptime at their hands, has made me somewhat of a cloud opponent for compute-
or data-heavy deployments.

So unless your spend is high enough to negotiate a custom SLA, I would assume
that your cloud uptime isn't any better than halfway good bare metal servers.

------
QuinnyPig
Howdy.

I have loud and angry thoughts about this;
[https://www.lastweekinaws.com/blog/](https://www.lastweekinaws.com/blog/) has
a bunch of pieces, some of which may be more relevant than others. The
slightly-more-serious corporate side of the house is at
[https://www.duckbillgroup.com/blog/](https://www.duckbillgroup.com/blog/), if
you can stomach a slight decline in platypus.

~~~
beardface
Came here to recommend you! Your newsletter always provides both enlightenment
and a giggle.

~~~
atsaloli
I came here to recommend QuinnyPig's services as well. He's a pro at reducing
AWS costs.

~~~
dandandans
Corey Quinn (Quinnypig) at Duckbill Group would be my suggestion as well.

------
staticassertion
I'd suggest reaching out to AWS about this. Explain the situation. AWS has a
number of programs for startups that you may be able to apply for, including
one that includes 100k worth of credits.

Also, if you can't afford to scale to new customers... stop? I'm sure it
probably sucks, but like, does it suck more than having no runway? Seems like
you'd be best served slowing things down and spending some time with AWS on
cost optimization.

There aren't a lot of details to go off of here so I don't know what more
advice to give.

------
ssrs
We've managed to reduce our spend by almost 50-60%. Some pointers:

1\. Comb through your bill. Inspect every charge and ask "Why do we need
this?" for every line item.

2\. If user latency is not a problem, choose the cheapest regions available
and host systems there.

3\. Identify low usage hours (usually twilight hours) and shut systems off.

4\. Transition one-off tasks (cron, scheduling, etc.) to Lambda. We were using
entire servers for this one thing that would run once a day. Now we don't.

5\. Centralize permissions to launch instances etc. within a few people. Make
everyone go through these 'choke points'. You might see fewer instances
running. Often engineers launch instances to work on something and then
'forget' to shut them off.

6\. Get AWS support involved. I'm pretty sure with the bills you are racking
up you must have some AWS support. Get some of their architects etc. to check
out your architecture and advise.

7\. Consider Savings Plans and Reserved Instances. Often you get massive cost
savings.

8\. Consider moving some of the intensive number crunching to some of AWS'
data crunching services. We moved a high-powered ELK stack for analyzing
server logs to CloudWatch. A little more expensive in the short term, but we
are now looking to optimize it.
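
For point 3, auto scaling groups support scheduled actions, so the nightly shutdown doesn't have to be manual. A hedged boto3 sketch — the cron schedules and capacities are placeholders you'd tune:

```python
def night_day_schedule(night_cron="0 22 * * *", day_cron="0 6 * * *",
                       day_capacity=4):
    """Two scheduled actions: scale to zero at night, back up each morning."""
    return [
        {"ScheduledActionName": "scale-down-night",
         "Recurrence": night_cron, "MinSize": 0, "MaxSize": 0,
         "DesiredCapacity": 0},
        {"ScheduledActionName": "scale-up-morning",
         "Recurrence": day_cron, "MinSize": 1, "MaxSize": day_capacity,
         "DesiredCapacity": day_capacity},
    ]

def apply_schedule(asg_name):
    # Requires boto3 and AWS credentials; imported lazily so the helper
    # above stays usable offline.
    import boto3
    client = boto3.client("autoscaling")
    for action in night_day_schedule():
        client.put_scheduled_update_group_action(
            AutoScalingGroupName=asg_name, **action)
```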

In my experience, AWS has been very supportive of our efforts at reducing
costs. Even after a 50-60% reduction I still feel there is scope for another
round of 50-60% reduction from the new baseline. All the best!

------
jayzalowitz
Here's my deck on this; @quinnypig, elsewhere in this thread, is a great
resource too:
[https://docs.google.com/presentation/d/1sNtFugQp_Mcq62gf4F1n...](https://docs.google.com/presentation/d/1sNtFugQp_Mcq62gf4F1n0aJU9IjHmHyMKYOPXYoWYRU/edit)
Last year I cut $75 million in spend, so you could say I have a track record
here.

Are you sure you are using the right instance type for what you need to
generate? Can you have your model generator kill (stop) its own instance when
it finishes the model?

100%: if it doesn't need to be just-in-time, go spot and build models off a
queue.

Put in for the Activate program. They can give you up to 100k of credits.

~~~
paulcole
Can you give a little context to the $75 million in savings? What was the
original amount you were spending? I didn’t see this on your deck.

------
calebkaiser
I maintain an open source ML infra project, where we've spent a ton of time on
cost optimization for running GPU-intensive ML models, specifically on AWS:
[https://github.com/cortexlabs/cortex](https://github.com/cortexlabs/cortex)

If you've done zero optimization so far, there is likely some real low-hanging
fruit:

1\. If GPU instances are running up a huge EC2 bill, switch to spot instances
(a g4dn.xlarge spot is $0.1578/hr in US West (Oregon) vs $0.526/hr on demand).

2\. If inference costs are high, look into Inferentia (
[https://docs.cortex.dev/deployments/inferentia](https://docs.cortex.dev/deployments/inferentia)
). For certain models, we've benchmarked over 4x improvements in efficiency.
Additionally, autoscaling more conservatively and leveraging batch prediction
wherever possible can make a real dent.

3\. Finally, and likely the lowest hanging fruit of all, talk to your AWS rep.
If your situation is dire, there's a very good chance they'll throw some
credits your way while you figure things out.

If you're interested in trying Cortex out, AI Dungeon wrote a piece on how
they used it to bring their spend down ~90%. For context, they serve a 5 GB
GPT-2 model to thousands of players every day:
[https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-support-over-1-000-000-users-d207d5623de9](https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-support-over-1-000-000-users-d207d5623de9)

------
sokoloff
Don’t overlook the possibility of using your own physical hardware, running
high-end commodity graphics cards (2080Ti, Titan RTX), especially for model
training. (I haven’t found this to be overly effort- or time-intensive, and
the payoff is enormous on a dollars basis.)

You didn’t give enough details for someone to get really specific. I’m
assuming from your text that the issue is inference not training costs, in
which case there’s some great advice already posted, but more details might
help.

------
glenngillen
Speak to your AWS account manager and/or someone on their startup team. Give
them the detail on what you’re running, what you want to do, and what/when
you’re hoping to reach the next milestone. There’s usually a few different
options available to them to try help you out. Including, but not limited to,
working out how to reduce the ongoing cost of what you’re trying to do.
“Customer obsession” and all that. It’s also just good business. It’s not in
anybody’s interest to have companies running out of runway, they’d rather you
were still in business and paying for compute 5 years from now.

------
lmeyerov
Sounds familiar =\

\- get devs on GPU laptops

\- for always-on instances, where doable, switch to an 8a-6p policy, and
reserve. Call AWS for a discount.

\- use g4dn x spot. Check per workload though; it assumes single rather than
double precision.

\- consider whether you can switch to fully on-demand if you haven't already,
and hybrid via GCP's attachable GPUs

\- make $ more visible to devs. Often individuals just don't get it; it's too
easy to be sloppy.

More is probably doable, but it's increasingly situation-dependent

~~~
lmeyerov
ALSO: For all the discussion of on-prem, for ML in particular, consider
running training on a dedicated local hw box and run only inference on the
cloud (which can be CPU)

~~~
FridgeSeal
I’ve been mulling this idea over in my head recently: investing $2-3k in
building a machine to do exactly that (and use it as a normal dev day-to-day
machine when it’s not training), because it appears the economics of it are
surprisingly great.

Have you (or anything else here) had experience doing this? Did it end up
being a worthwhile approach? (Even for a while)

~~~
lmeyerov
It depends how long it is on.

If you train only a short while, you may do better by setting up a cloud
training workflow that only has the server on while training. If it's on a
lot, then a private box makes more sense (ex: Lambda Labs, at
home/office/colo). Then set it up as a shared box for the team.

A lot of time ends up being dev, not actual training, and folks end up
accidentally keeping dev cloud GPUs on. We still use cloud GPUs for this, but
have primary dev on local GPU laptops. For that, we started with System76 for
everyone (Ubuntu + Nvidia), but those had major issues (weight, battery
draw...). I then did a lightweight Asus Zenbook for myself, but that was too
lightweight all around. Next time I'll go for something in between, or explore
ThinkPad options.

And yep, as a small team, this mix dropped our cloud opex spend by like 90%,
and it was pretty fast to offset the capex bump.

------
icedchai
Can you use spot instances? If so, you can pay a lot less for compute. Your
app needs to tolerate being shut down and restarted, however.

Is there anything you can turn off at night? A lot of startups have staging /
test systems that do not need to be running all the time.

Are you keeping a lot of "junk" around that you don't actually need? Look at
S3 objects, EBS snapshots, etc. A few here and there doesn't cost much, but it
does add up.

Are you using the correct EBS volume type? Maybe you're using provisioned IOPS
where you don't need it.

S3: make sure your VPC has an S3 endpoint. This isn't the default. Otherwise,
you're paying a lot more to transfer data to S3.
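
A gateway endpoint for S3 is free and a one-time setup. A sketch — the VPC and route table IDs are placeholders you'd look up for your own account:

```python
def s3_service_name(region):
    """Gateway endpoint service name for S3 in a given region."""
    return f"com.amazonaws.{region}.s3"

def create_s3_endpoint(vpc_id, route_table_ids, region="us-east-1"):
    # Requires boto3 and AWS credentials; imported lazily so the helper
    # above stays usable offline.
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,                      # e.g. "vpc-..." (placeholder)
        ServiceName=s3_service_name(region),
        RouteTableIds=route_table_ids,     # S3 traffic then stays inside AWS
    )
```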

------
tarun_anand
I have replied to some of the comments below. My advice is to get off AWS or
any public clouds and avoid them like the plague.

They are too expensive for 95% of cases. If you are still not convinced DM me.

~~~
GordonS
Cloud is expensive for sure, especially so for VMs and bandwidth.

But cloud also comes with a lot of convenience - for example, having managed
k8s, and highly-available serverless, messaging, blob storage and databases.

Some of that is particularly challenging to get right, especially for
databases.

It's difficult to justify cloud VMs for heavy processing tho - they really are
just so damned expensive compared to bare metal and VPS providers, and there
isn't that much extra convenience for VMs in the same way there is for PaaS
stuff.

------
quickthrower2
While looking at the technical, also look at the commercial. Can you trace
revenue sources to aws costs? In other words calculate your variable costs for
each client/contract individually?

Eg are there some clients losing you money that you can either let go or raise
prices for?

------
mlthoughts2018
Don’t use GPUs at inference (serving) time unless you prove that you need to.

The only consistent case when I’ve found it’s needed (across a variety of NLP
& computer vision services that have latency requirements under 50
milliseconds) is for certain very deep RNNs, especially for long input
sequence lengths and large vocabulary embeddings.

I’ve never found any need for it with deep, huge CNNs for image processing.

Also, consider a queue system if utilization becomes a problem when switching
away from GPUs. Create batch endpoints that accept small batches, like 8-64
instances, and put a queue system in front to mediate collating and
uncollating batch calls from the stream of all incoming requests (this is good
for GPU services too).
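
The collating pattern can be sketched with stdlib Python. The model here is a toy stand-in, and a real server would add per-request timeouts and error paths:

```python
import queue
import threading
from concurrent.futures import Future

def batch_worker(q, model_fn, max_batch=8, wait=0.01):
    """Pull up to max_batch requests, run ONE batched model call, fan out results."""
    while True:
        first = q.get()
        if first is None:                 # shutdown sentinel (sent last)
            return
        batch = [first]
        try:
            while len(batch) < max_batch:
                batch.append(q.get(timeout=wait))
        except queue.Empty:
            pass                          # partial batch: don't wait forever
        outputs = model_fn([x for x, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

def submit(q, x):
    fut = Future()
    q.put((x, fut))
    return fut

# Demo with a toy "model" that doubles its inputs in one batched call.
q = queue.Queue()
t = threading.Thread(target=batch_worker,
                     args=(q, lambda xs: [2 * x for x in xs]))
t.start()
results = [f.result(timeout=5) for f in [submit(q, i) for i in range(5)]]
q.put(None)
t.join()
```

The `wait` knob trades a little latency for fuller batches, which is exactly the utilization lever the comment describes.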

------
aclelland
If you can handle some interruption to your work then spot instances are
probably going to be the biggest immediate change you can make.

Right now a g4dn.xlarge is $0.526/h on demand but only $0.1578/h as a spot
instance.

You might also be eligible for a 10k grant from AWS -
[https://pages.awscloud.com/GLOBAL-other-LN-accelerated-computing-free-trial-2020-interest.html](https://pages.awscloud.com/GLOBAL-other-LN-accelerated-computing-free-trial-2020-interest.html)

------
chmod775
If cost is an issue, get off AWS. Immediately. You're paying about 10x what
the same hardware/bandwidth would cost you if you just bought dedicated
servers.

------
alFReD-NSH
If you have the time to fix things ASAP, you can follow this route:

\- use spot or reserved instances, or Savings Plans

\- have a look at Compute Optimizer

\- understand what AWS networking costs are and try to optimise them (cross-AZ
and internet egress can be costly)

\- go through the Trusted Advisor checks (you can enable them by enabling
business or enterprise support):
[https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/)

\- try using one of these cost optimisation tools:
[https://aws.amazon.com/products/management-tools/partner-solutions/#Resource_and_Cost_Optimization](https://aws.amazon.com/products/management-tools/partner-solutions/#Resource_and_Cost_Optimization)

\- contact AWS for a Well-Architected review

If you don't have the time, then I suggest contacting AWS to introduce you to
a consulting partner. They can come in and actually fix whatever is needed.

------
whb07
You train the model locally and push it for inference to the cloud?

What exactly are we talking about here?

Couldn’t you build a dual NVIDIA 20XX / 32-core / 64 GB machine for under $5k
and then save money while training/developing faster?

~~~
pocw
Except they (the gender non-specific singular) is probably running kubernetes
and has multiple clusters of 10 or so gpu hosts. Not that I disagree, but
spinning that up locally and orchestrating it will take time and money. And
explaining why training is paused because you keep blowing breakers in the
office will cost political capital.

~~~
sophiebits
You can just say “Except they are probably”.

------
joshuaellinger
GPU servers and coloc are pretty cheap these days: $1K/month rent per 20A of
power. ROI on hardware is usually 3-4 months max (i.e., for the cost of
renting the machine at AWS for 3-4 months, you can buy the same thing).

Lead time might be a problem for you, but you can probably do it in under a
month if you take available stock at your vendor. I work with a company called
PogoLinux ([http://pogolinux.com](http://pogolinux.com)) out of Seattle, and
they sell boxes that have 4 GPUs in them.

That said -- the other advice is right. You can probably get by with a much
simpler model. The coloc route would probably only be better if you can't
change the models due to people constraints and the ML stuff doesn't have a
lot of AWS dependencies. SysAdmins are a lot easier to find and hire than ML
specialists.

------
reilly3000
In terms of cost, I would recommend deeply interrogating the bill. Your data
transfer cost is likely to be higher than you expected, and there are lots of
ways to mitigate that. GPUs are crazy expensive in the cloud, and it really
makes sense to host them locally. There is also usually some money to be found
in looking at S3 tiers -- Infrequent Access can save a lot if it's good for
your use case. Finally, if EC2 is a big cost driver, spot pricing and savings
plans are good places to start.

I will say that more generally speaking, there has been a lot of recognition
in the industry at large that AI-driven startups all face this challenge,
where the cost of compute eats up most of the margin. There is no easy
solution to that, other than to make product-level decisions about how to add
more value with less GPU time.

------
speedgoose
AWS is super expensive. Switch to another cloud provider.

For example : Scaleway, OVH, or Hetzner.

~~~
njsubedi
Can confirm this. Personally I wanted to switch to AWS from Scaleway because
one of the regions was closer to the customers. No way I could justify the
costs. With some load balancers and API access, we were able to scale
horizontally without a problem.

~~~
GordonS
Are you using Postgres by chance? If so, I'd love to hear about how you
deployed it (struggling to figure out a performant, HA setup!)

------
parsimo2010
I don't know how deep you've dug but the very first thing you should be doing
is using spot instances instead of on demand instances (unless you absolutely
can never wait to train a model). Spot instances are cheaper than on demand
instances, with the downside that the price can fluctuate, so you need to
build in a precaution for shutting down if the price gets too high. So if the
price goes up, you either have to stop training until the price goes back down
or to suck it up and pay a higher price.

Luckily, it's pretty simple to handle interruptions for neural network like
models that train over several iterations. Just save the model state
periodically so you can shut the instance down whenever the price is too
expensive and start training again when the price is lower.

------
Havoc
If you're running GPU-heavy stuff all the time then you're probably better off
just buying some GPUs outright and doing that part on-site.

Especially if you can keep your own gear busy 24/7, i.e. run it around the
clock and let any excess GPU demand above that fall back onto the cloud.

------
amenghra
Talk to an AWS rep and also different cloud vendors. I know startups which
received large amounts of free compute in their early days and then went on to
become successful companies. I bet it was win-win for everyone involved.

------
nickjj
If you're storing a lot of data: I talked to someone who went from $3,000 a
month to $3 a month by saving older dumps of their database into an S3 bucket
instead of keeping many old RDS snapshots from weeks or months ago around.

Here's a direct timestamp link to that point in the podcast where it came
from: [https://runninginproduction.com/podcast/33-zego-lets-you-
eas...](https://runninginproduction.com/podcast/33-zego-lets-you-easily-buy-
insurance-by-the-hour#55:46)

------
bscanlan
Segment's blog posts on cost optimisation have plenty of detail and tips on
this topic:

[https://segment.com/blog/the-million-dollar-eng-
problem/](https://segment.com/blog/the-million-dollar-eng-problem/)
[https://segment.com/blog/spotting-a-million-dollars-in-
your-...](https://segment.com/blog/spotting-a-million-dollars-in-your-aws-
account/) [https://segment.com/blog/the-10m-engineering-
problem/](https://segment.com/blog/the-10m-engineering-problem/)

Similarly this Honeycomb writeup is also excellent:
[https://www.honeycomb.io/blog/treading-in-haunted-
graveyards...](https://www.honeycomb.io/blog/treading-in-haunted-graveyards/)

By the sounds of it, you need to take drastic action. It sounds like you will
not be able to just optimise your AWS spend to get more runway, though you
should definitely do some bill optimisation. You will need to optimise the
product itself and maybe even get rid of unprofitable customers.

If you are not sure exactly who or what is driving the AWS cost, take a look
at Honeycomb to get the ability to dive deep into what is eating up resources.

------
pavelevst
AWS is one of the most expensive hosting solutions. I assume many of us have
somehow come to think it's the best one; in my opinion they are all roughly
the same. Moving elsewhere will require effort but can cut your cost to 10-20%
of the current bill. Some easy things you can do within AWS: resize VMs (this
requires turning them off for a minute or so), switch to a cheaper instance
tier (e.g. t2 -> t3), or move VMs from EC2 to Lightsail.

------
godzillabrennus
I help companies find bare metal options for training models. It’s usually
10-20% the cost of cloud.

Email me lanec (at) hey (dot) com if you’d like to speak.

Last year I took a company spending $24k/month training visual AI and cut that
down to $3,500/month with bare metal. I also helped them secure over $100k in
cloud credits to cover the monthly costs until the transition could happen.

Training in the cloud is generally much more expensive than bare metal.

------
blickentwapft
Run your own machines.

You don’t have to use cloud services.

------
sandGorgon
Simple answer. But the implementation is trickier.

You have to use Spot instances - or, as Google calls them, preemptible
instances. These are up to 80% cheaper.

The caveat is that they can be killed anytime, so your infrastructure must be
resumable.

Most likely you will need Kubernetes. It's the only framework that supports
GPUs, integrates with spot instance providers, and works with ML platforms
(via Kubeflow).
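On AWS specifically, the instance metadata service gives you roughly a two-minute warning before a spot reclaim, so your serving loop can drain gracefully. A rough sketch (the fetcher is injectable purely so it can be exercised off-instance):

```python
import json
import urllib.request
import urllib.error

# EC2 instance metadata endpoint for spot interruption notices. It returns
# 404 until a reclaim is scheduled, then JSON roughly two minutes ahead.
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_notice(fetch=None):
    """Return the pending interruption notice as a dict, or None if safe."""
    if fetch is None:
        def fetch(url):
            return urllib.request.urlopen(url, timeout=1).read().decode()
    try:
        return json.loads(fetch(NOTICE_URL))
    except (urllib.error.URLError, ValueError):
        return None  # 404 (no notice) or unreachable metadata service
```

Poll this every few seconds; when it returns a dict, stop accepting requests, finish in-flight work, checkpoint, and exit.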

------
dautovri
open source tools:

\- [https://github.com/antonbabenko/terraform-cost-
estimation](https://github.com/antonbabenko/terraform-cost-estimation)

\- [https://github.com/cloud-custodian/cloud-
custodian](https://github.com/cloud-custodian/cloud-custodian)

\- [https://github.com/aws/amazon-ec2-instance-
selector](https://github.com/aws/amazon-ec2-instance-selector)

\-
[https://github.com/rdkls/aws_infra_map_neo4j](https://github.com/rdkls/aws_infra_map_neo4j)

commercial:

\- [https://www.cloudhealthtech.com](https://www.cloudhealthtech.com)

\- [http://densify.com/](http://densify.com/)

\- [https://spot.io](https://spot.io)

\- [https://www.hpcdlab.com](https://www.hpcdlab.com)

------
ramraj07
How about you just purchase some motherboards and GPUs and start running them
in your office (assuming you're not bandwidth-limited or chasing millisecond
response times)? I'm always tempted to do this when we have a fairly constant
workload. Wasn't GPU instance pricing quite insane on AWS compared to actual
GPU costs?

------
lazylizard
This is not exactly what you're after, I imagine, but maybe longer term you
could consider it.

At my place people test on their desktops and run production stuff in the data
center.

Where are you located? These are prices in
Singapore: [http://www.fuwell.com.sg/uploads/misc/Fuwell11072020.pdf](http://www.fuwell.com.sg/uploads/misc/Fuwell11072020.pdf)

You're looking for a CPU, board, 64GB RAM, maybe 2 x 2080 Ti, a small SSD and
a PSU (1000W?). You can leave these on IKEA shelves and skip the cases if need
be. 3 x 2080 Ti makes the board expensive and the PSU hard to find...

If you want more reliability, get Asus or Supermicro, or even Sugon: 4 GPUs, 2U.

That's a few kW per machine, and you need to think about how much power you
can draw per socket, so the 2U stuff usually ends up in data centers.

------
JangoSteve
Someone else mentioned it already in these comments, but I'll mention again to
make sure it's not missed. If you're a startup using AWS, apply for the AWS
Activate program. All you need to do is apply, and they'll give you up to
$100k AWS credits, which will last for up to 2 years and automatically be
applied to your bill until they're used up.

[https://aws.amazon.com/activate/](https://aws.amazon.com/activate/)

It's not a solution to the larger problem of business model and percentage of
revenue going toward compute costs to provide your service, but there are a
lot of other great recommendations and suggestions here for that. This could
provide you some time to actually implement the other recommendations.

------
ucha
Nvidia forces cloud providers to use their expensive professional line-up, but
providers that use consumer GPUs are way cheaper - 4x or more. If your models
don't need a lot of memory or double precision, providers such as GPUEater or
RenderRapidly can be worth looking at.

------
MaxBarraclough
Related reading: Ask HN: How did you significantly reduce your AWS cost?
(2017) [0]

The top comment is great. Two easy wins:

* Putting an S3 endpoint in your VPC gives any traffic to S3 its own internal route, so it's not billed like public traffic. (It's curious that this isn't the default.)

* Auto-shutdown your test servers overnight and on the weekends
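The auto-shutdown win can be sketched with boto3: a pure scheduling predicate plus a call that stops running instances tagged as test. The tag key/value and the business hours here are assumptions, adjust to taste:

```python
from datetime import datetime

def should_be_stopped(now):
    """True outside business hours: before 8am, after 7pm, and weekends."""
    if now.weekday() >= 5:           # Saturday=5, Sunday=6
        return True
    return now.hour < 8 or now.hour >= 19

def stop_idle_test_servers(ec2, now=None):
    # ec2 = boto3.client("ec2"); the env=test tag is a placeholder convention.
    if not should_be_stopped(now or datetime.now()):
        return []
    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:env", "Values": ["test"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    ids = [i["InstanceId"]
           for r in resp["Reservations"] for i in r["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids
```

Run it from cron or a scheduled Lambda every 15 minutes; stopped instances don't accrue compute charges (EBS volumes still do).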

See also this thread from 2 days ago, _Show HN: I built a service to help
companies save on their AWS bills_ [1]

Those threads aren't specific to GPU instances though.

[0]
[https://news.ycombinator.com/item?id=15587627](https://news.ycombinator.com/item?id=15587627)

[1]
[https://news.ycombinator.com/item?id=23776894](https://news.ycombinator.com/item?id=23776894)

------
kavehkhorram
Disclosure: I am the founder of Usage.ai

The product my team works on, www.usage.ai, automates the process of finding
AWS savings using ML (through reserved instances, savings plans, spots, and
rightsizing). Every recommendation is shown on a sleek webpage (we spent a lot
of resources on UX).

We haven't fully explored the ML use case, but I'd love to figure out how we
can help you drive down the costs associated with your GPU models. Would you
have 15 minutes this week for a discussion?

If you're interested, you can reach me at kaveh@usage.ai

------
jorgemf
It is very unlikely that anyone is going to give you good advice with so
little information about your cost structure. There are great people here who
can provide invaluable insights about your costs, but they need more
information.

"We use a lot of GPU-intensive models and 80% of revenue goes to AWS" doesn't
mean that your AWS cost is mostly GPU. It should mean that, but who knows.
Tell us what your AWS infrastructure looks like, what instances you have, how
much they cost you, etc. Because with the information you've given, the best
advice you can get is to use neither AWS nor GPU-intensive ML models.

------
red0point
For the ML models you can also switch to dedicated server providers, such as:
[https://www.dedispec.com/gpu.php](https://www.dedispec.com/gpu.php)

For storage, there‘s always Wasabi / B2 with S3 compatible interfaces. If the
data itself is not changing that much, so regular backups are possible, just
use some dedicated storage servers with hard drives and install MinIO. Do not
rely on S3 for outgoing data (much too expensive), use a caching layer on
another provider (OVH , Hetzner, ...), or if it fits your workload, Wasabi
(„free“ egress).

------
vmurthy
At a startup I worked at earlier, we tried two things that helped: 1\. Reserved
instances (you commit for a year and can save 20%; charged monthly, and AFAIK
no upfront costs)

2\. Like another reader suggested here, there are accelerators/foundations
which give away $10k for the 1st year towards cloud usage. We were in
healthcare and had a big chip company pay about $10k in credits for a year of
AWS. Depending on the domain you are in, there may be a few. If you let me
know which domain you work in ( healthcare , media etc.) someone here might be
able to point to the right resource

------
cure
Start by looking at the breakdown of your costs in the cost analyzer. Look for
the categories with your biggest spend. Is it storage? EC2? Something else?
For storage, see if you can clean up things you don't need anymore, and see if
you can move infrequently used data into long-term, cheap storage (but beware
retrieval costs!). For EC2, consider changing node types to cheaper ones;
newer classes can be much better value for the money. Make sure you use spot
instances where you can. Focus on the biggest expense first.

------
GauntletWizard
Without any idea of what your infrastructure looks like, I can't give you
anything actionable, but that might be enough advice in and of itself: go
after the low hanging fruit first. What are you spending on? Look at the top
two or three services by spend and dig a little deeper.

Are you spending on bandwidth? See if there's compression you can enable. EC2?
Can you reduce instance sizes or autoscale down instances you're not using
overnight? ElastiCache or Elasticsearch? Tune your usage, store smaller keys,
or expire things out.

------
atlbeer
Using large data stored in S3?

Make sure you are fetching it via a S3 endpoint in your VPC instead of via
public HTTP. You are paying for an (expensive) egress cost you don’t need to
be paying for.
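For reference, adding the gateway endpoint is a one-liner with boto3; a sketch where the region, VPC ID, and route-table IDs are all placeholders (gateway endpoints for S3 themselves cost nothing):

```python
def add_s3_gateway_endpoint(ec2, vpc_id, route_table_ids, region="us-east-1"):
    """Route S3 traffic privately so it isn't billed as public egress.

    ec2 = boto3.client("ec2", region_name=region); all IDs are placeholders.
    """
    return ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",    # gateway endpoints for S3 are free
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=route_table_ids,
    )
```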

~~~
GordonS
That's insane that that's the default!

------
kichik
Spot instances. Easy and saves a ton.

------
ISL
If AWS cost is 80% of revenue, and the added cost per customer isn't paying
for itself, perhaps one could either charge more or pause customer
acquisition?

------
KaiserPro
We had the same problem!

We managed to cut our costs by about 2/3rd by doing two things:

1) moving to AWS Batch (this spins up machines to run Docker containers
without much hassle; you can also share instances), and 2) using spot
instances.

Spot instances integrate nicely into Batch, and depending on how you set it
up, you can optimise for speed or cost. For example, a p2.xlarge is still
$0.90 on demand, but on spot it's about $0.25-0.35.

------
tomcooks
Dedicated server somewhere close to your office.

------
rrrix1
I have a strong suspicion the OP is trolling, or at least that his motives
aren't obvious.

Check his profile.

He has people, or knows of people, who can likely help with this. He's the CEO
of an accelerator and is not a newb by any measure.

OR... he's using gamification to find someone to hire to actually help solve
this problem. If that's the case... bravo, sir!

------
chris_armstrong
[Disclosure - my company sells a cost optimisation product]

1\. You are going to get a lot of advice to move to your own hardware - DON'T.
Companies use cloud for the flexibility and lower operational overhead, not
because it's cheap. Consider whether your org is mature enough to run its own
servers and has the 6 months it will take to get everything set up.

2\. Talk to your AWS account manager. They will work their asses off to stop
you churning to another provider or to your own hardware, because they know
they are losing your revenue entirely for minimum 2 years.

3\. _Switch it off._ If you're not using it outside of business hours, you're
wasting money. This is the easiest cost saving you will make (my company,
GorillaStack, provides a service that makes this easy to setup without
scripting and a 14 day trial)

4\. If you have a baseline of servers that you will constantly need, reserved
instances offer great savings. There is a secondary market for these, where
you can get shorter periods cheap from other customers who don't need them.

5\. If you haven't already, look at your bill and the breakdown of services.
Cost optimisation consultants (they do exist) will start here, and by
attacking the biggest line items first.

They are usually EC2 Compute, EBS, Transfer Costs, etc. Prioritise based on
size and ease of implementation.

You should make a habit of checking it at least every few days to keep on top
of what is going on.

6\. Delete unused resources - you need to be ruthless with developers who
leave unused EC2 resources around after creation. The key isn't to lock down
your environment and stop developers from creating what they need, but to
enforce a tagging policy on resources so you can track who owns what. There
are crude scripts that scan your environment and delete untagged resources
after a certain period.

7\. Once things are under control, use CloudWatch Cost Alarms to get
notifications when things are crossing a predefined threshold. These can be
connected to SNS for receiving emails (and there are simple deployable
solutions for receiving these via Slack webhooks, for example).

Some further advice: 'right-sizing' is often held up as an important cost-
saving method, but it can be much more trouble than it's worth. If your
workload will require endless planning and regression testing every time you
switch instance size, reconsider - you will waste more in developer time than
you save over a few years.
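To sketch point 7: billing metrics only exist in us-east-1, and the SNS topic ARN and threshold below are placeholders:

```python
def create_billing_alarm(cloudwatch, sns_topic_arn, threshold_usd):
    """Alarm when estimated monthly charges cross a threshold.

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1");
    billing metrics are only published in us-east-1 and require the
    'Receive Billing Alerts' preference to be enabled on the account.
    """
    cloudwatch.put_metric_alarm(
        AlarmName=f"billing-over-{threshold_usd}-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,                 # 6 hours; billing data updates slowly
        EvaluationPeriods=1,
        Threshold=float(threshold_usd),
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],
    )
```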

------
pvm3
This thread from yesterday might be useful
[https://news.ycombinator.com/item?id=23776894](https://news.ycombinator.com/item?id=23776894)

[https://github.com/similarweb/finala](https://github.com/similarweb/finala)
seems promising

------
barrald
Lots of people in here mentioning Reserved Instances, so it's worth mentioning
Reserved AI ([https://www.reserved.ai/](https://www.reserved.ai/)).

We're customers and have been very happy with them, it was super quick to get
set up and saves us a big chunk.

------
us0r
A quick Google search for GPU dedicated server is probably going to save you
tens of thousands of dollars a year.

------
stunt
There are quick wins with spot instances and also Fargate. It's hard to say
anything without knowing the types of workloads and compute you have, but
there is always opportunity to save there.

Other than that, you should also look at your architecture. Often there is
opportunity to save there as well.

------
user5994461
Make sure to tag every instance/resource/disk in AWS, with their purpose,
team, etc...

Then you can go into the AWS costs explorer and see the costs breakdown per
tag.

Usually there will be a few resources standing out. 80% of the costs is 20% of
resources. Find out what they are and cut.
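Once everything is tagged, the 80/20 breakdown is simple to compute from cost line items; a toy sketch (the tag key and sample numbers are made up):

```python
from collections import defaultdict

def cost_by_tag(line_items, tag_key="team"):
    """Sum costs per tag value, largest first; untagged spend lumped together."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item.get("tags", {}).get(tag_key, "<untagged>")] += item["cost"]
    return sorted(totals.items(), key=lambda kv: -kv[1])

items = [
    {"cost": 900.0, "tags": {"team": "ml-inference"}},
    {"cost": 80.0, "tags": {"team": "web"}},
    {"cost": 20.0},  # someone forgot to tag this one
]
# cost_by_tag(items) -> [('ml-inference', 900.0), ('web', 80.0), ('<untagged>', 20.0)]
```

The AWS Cost Explorer API (`get_cost_and_usage` grouped by tag) gives you the same breakdown server-side; the `<untagged>` bucket is usually where the surprises hide.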

------
zeckalpha
Lots of comments with tips on "how", but your last paragraph makes it sound
like you are looking for a "who".

I’ve heard good things about
[https://www.duckbillgroup.com/](https://www.duckbillgroup.com/)

------
goatherders
If you are growing and/or have funding the other cloud providers will throw
credits at you.

If you aren't growing or funded, then go to a less expensive host. There are
TONS of high-quality hosts out there that are quite a bit less expensive.

------
rohanaed
Please check this Show HN thread -
[https://news.ycombinator.com/item?id=23776894](https://news.ycombinator.com/item?id=23776894)

A developer has created an app for this same purpose

------
xendo
Depending on the entropy of your input, caching may be a way out. Sometimes if
you can't cache the end result, you can cache some intermediate results.

I would assume that if you are big enough you may be able to negotiate some
pricing.
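A toy sketch of the intermediate-result idea in Python - the (hypothetical) embedding step is memoized even though the final prediction isn't:

```python
from functools import lru_cache

# Toy stand-in: cache an expensive intermediate (think: a feature embedding)
# keyed on the input, so repeated or overlapping requests skip the GPU work.
calls = {"embed": 0}

@lru_cache(maxsize=4096)
def embed(text):
    calls["embed"] += 1       # counts real (non-cached) computations
    return hash(text) % 1000  # pretend this is the expensive model pass

def predict(text):
    # Even if the final prediction can't be cached, the embedding can be.
    return (embed(text) * 7) % 100
```

In production you'd back this with Redis or memcached rather than an in-process dict so the cache survives restarts and is shared across replicas.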

------
textient
1CloudHub, an AWS partner, has a very affordable product called Costimize for
AWS to help save costs. I will be happy to help. Contact
Sankar.nagarajanAT1cloudhub.com

Secondly, you can use AWS spot instances for GPUs, which cost less.

------
benjaminwootton
Reserved instances are usually the lowest hanging fruits depending on your
usage profile and how much you can commit to and/or pay for up front. Savings
of 30%+ are very achievable.

------
literallycancer
AWS likely has a retention department that can give you discounts or credits
to make you stay. Ask for credits and use the extra time to set up your own
hardware.

------
billman
If your computational load is spiky, I would suggest looking at Fargate and
the spot market. Also, for storage, I would suggest leveraging S3 whenever
possible.

------
byko3y
The answer is simple: don't use AWS. You will never get out of this hole
unless you move from AWS, because AWS is not scalable budget-wise.

------
postit
Take a look at the cost explorer.

Low-hanging fruit: spot instances, if you can manage stateless sessions.

If you have multiple snapshots, those could be costing money as well.

------
pachico
It would be helpful to know which services you use. Do you use ML services or
instances with GPU? Where is most of the cost?

------
tonymet
Post your bill. You often have unexpected charges for ingress or services you
are unaware of.

------
tmwed
An acquaintance of mine has a business that specializes in the problem you're
facing. Please feel free to reach out to them:
[https://www.taloflow.ai/](https://www.taloflow.ai/)

------
dvfjsdhgfv
Unless GPUs are an absolute must, just use Hetzner and never look back at AWS.

------
unixhero
Switch from RDS to Lightsail instances for trivial DB workloads.

The same could apply to EC2.

------
alzaeem
If using deep learning models, consider distilled and/or quantized models to
reduce the resources required for inference.
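To illustrate what quantization buys you, here's a toy int8 affine quantizer in plain Python - one byte per weight instead of four, at the cost of a small rounding error (real frameworks like PyTorch or TensorRT do this for you, per-tensor or per-channel):

```python
# Toy int8 affine quantization of a weight vector.
def quantize(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0           # map the range onto 0..255
    zero_point = round(-lo / scale)          # integer that represents 0.0
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]
```

Each reconstructed weight is within one quantization step of the original, which for most over-parameterized networks costs little accuracy while quartering memory and memory bandwidth.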

------
gshdg
Reserved instances? Instant 20-40% savings.

------
juskrey
Real hardware with colocation

------
A21z
Blog article about a real-life AWS cost optimization strategy:

[https://medium.com/teads-engineering/real-life-aws-cost-
opti...](https://medium.com/teads-engineering/real-life-aws-cost-optimization-
strategy-at-teads-135268b0860f)

TL;DR:

\- Monitor your daily costs

\- Cloudify your workloads

\- Use reservations, Spot instances, saving plans

\- Look around for unused infrastructure

\- Optimize S3 consumption

~~~
tarun_anand
In our experience this is a Sisyphean task. The number of elements that get
charged is just enormous.

------
lumost
These are the biggest ways to lower cost that I've used in the past. With a
high burn rate it's important to focus on things that can change the economics
on a short timeline (think next week), as well as activities on a longer
timeline (next year). You should have a plan in place for your board, and be
able to discuss the cost-reduction strategy for Cost of Goods Sold in any
future financing rounds. Carefully consider the full TCO - buying colo
hardware means opting out of ~3 years of future price reductions/hardware
improvements in the cloud, plus opportunity cost.

1) Call your provider and find out what options they have to cut your cost.
This can take the form of discounts, credits, or increased reservations

2) It's not uncommon for ML teams to have excess capacity sitting around for
forgotten R&D activities. Make sure that your team is tearing down hardware,
and consider giving all scientists their own dedicated workstation for model
development. You can smoke-test the opportunity here by verifying that the
GPUs are actually being utilized at ~40-80% average capacity.

3) Really dive into whether you need the parameters/model architecture you
have. The best model for your company will balance latency/cost with accuracy.
If you're using a transformer where a CNN, or even a logistic regression with
smart feature extractors, could do the job with a 1% accuracy loss, do your
customers really need the transformer?

4) As others have suggested, drill down on the inference and training costs.
Train less frequently/not at all, or sample your data. Generally the benefit
of using more data in a model is logarithmic at best, versus the linear
training time.

5) Buy your own hardware. Particularly for GPU inference, RTX cards can be
purchased in servers for your own colo - but not in clouds. The lead time
would be a few months, but the payoff could come within ~2-6 months in a colo.

6) Leaving this here as it used to affect analytics/ad-tech and other "big-
data" companies. Programming languages are not created equal in performance:
given equal implementations, a statically typed language will crunch data
10-1000x faster and cheaper than a dynamically typed one. If your business is
COGS-pressed, your team will probably spend more time trying to optimize
hardware deployments and squeeze perf out of your dynamic language than you
gain in productivity. Drill down on your costs, check how much is raw data
processing/transaction scheduling/GPU scheduling, and make sure you're on the
right tech path for your customers.

Lastly, at 80% Cost of Goods Sold (COGS) it's quite possible that your
business is either low margin or, since this is a new startup, that the
pricing structure isn't well aligned. Ask yourself if you expect to raise
prices for future non-founding customers; if so, your current customers are
helping reduce your marketing expenditures, and you may be able to leverage
the relationship to help "sell" to future customers.

