Ask HN: How can I quickly trim my AWS bill?
151 points by danicgross on July 11, 2020 | 131 comments
Hi HN,

I work with a company that has a few GPU-intensive ML models. Over the past few weeks, growth has accelerated, and with that costs have skyrocketed. AWS cost is about 80% of revenue, and the company is now almost out of runway.

There is likely a lot of low hanging cost-saving-fruit to be reaped, just not enough people to do it. We would love any pointers to anyone who specializes in the area of cost optimization. Blogs, individuals, consultants, or magicians are all welcome.

Thank you!

Disclosure: I work on Google Cloud (but my advice isn’t to come to us).

Sorry to hear that. I’m sure it’s super stressful, and I hope you pull through. If you can, I’d suggest giving a little more information about your costs / workload to get more help. But, in case you only see yet another guess, mine is below.

If your growth has accelerated yielding massive cost, I assume that means you’re doing inference to serve your models. As suggested by others, there are a few great options if you haven’t already:

- Try spot instances: while you’ll get preempted, you do get a couple of minutes' warning to shut down (so for model serving, you just stop accepting requests, finish the ones you’re handling, and exit). This can cut compute costs by 60-90%.

- If you aren’t using the T4 instances, they’re probably the best price/performance for GPU inference. A V100, by comparison, can cost 5-10x more.

- Also, your models should take advantage of int8 if possible. This alone may let you pack more requests onto each GPU (another 2x+).

- You could try model pruning. This is perhaps the most delicate option, but look at how people compress models for mobile. It has a similar effect of packing more weights into smaller GPUs; alternatively, you can use a much simpler model (fewer weights and fewer connections often mean far fewer FLOPs).

- But just as much: why do you need a GPU for your models? (Usually it’s to serve a large-ish / expensive model quickly enough). If you’re going to be out of business instead, try cpu inference again on spot instances (like the c5 series). Vectorized inference isn’t bad at all!
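For the spot-serving suggestion above, here is a minimal Python sketch of the "stop accepting requests, finish in-flight ones, exit" pattern. It assumes the EC2 instance metadata path `spot/instance-action` (which returns 404 until an interruption is scheduled) and IMDSv1 access; with IMDSv2 you'd also need a session token header. `DrainingServer` is a hypothetical gate you'd wire into your own server, not part of any library.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# EC2 instance metadata path that reports a pending spot interruption.
# Assumes IMDSv1; IMDSv2 also needs an X-aws-ec2-metadata-token header.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_spot_action(url: str = SPOT_ACTION_URL, timeout: float = 1.0) -> Optional[dict]:
    """Return the interruption notice as a dict, or None if none is scheduled."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is pending
            return None
        raise


class DrainingServer:
    """Hypothetical request gate: stop taking new work once a notice arrives."""

    def __init__(self, check: Callable[[], Optional[dict]] = fetch_spot_action):
        self.check = check
        self.accepting = True
        self.in_flight = 0

    def poll(self) -> None:
        # Call this every few seconds from a background thread or timer.
        if self.accepting and self.check() is not None:
            self.accepting = False

    def try_accept(self) -> bool:
        if not self.accepting:
            return False  # caller could answer HTTP 503 so the load balancer retries elsewhere
        self.in_flight += 1
        return True

    def finish(self) -> None:
        self.in_flight -= 1

    def can_exit(self) -> bool:
        # Safe to shut down once draining and nothing is in flight.
        return not self.accepting and self.in_flight == 0
```

The check is injectable so you can test the drain logic off EC2. A real multi-threaded server would also guard `in_flight` with a lock.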

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.
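If you do go the "sample it" route for training, one simple option (my choice of technique, not the commenter's) is reservoir sampling, which keeps a uniform random subset of fixed size from a dataset too large to load at once. A minimal sketch:

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item  # keep item i with probability k / (i + 1)
    return sample
```

You would feed it a generator over your training examples (for instance a hypothetical `read_examples("train.jsonl")`) so the full dataset never has to sit in memory.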

Remember, your users / customers won’t somehow be happier when you’re out of business in a month. Making all requests suddenly take 3x as long on a cpu or sometimes fail, is better than “always fail, we had to shut down the company”. They’ll understand!

I was in the same boat and this is good advice!

I stopped using GPUs. "Vectorized inference isn’t bad at all!" This, so much. I was blinded by GPU speed; TensorFlow builds with AVX optimization are actually pretty fast.

My discovery:

+ Stopped using expensive GPUs for inference and switched to AVX-optimized TensorFlow builds.

+ Cleaned up the inference pipeline and reduced complexity.

+ Reserving compute instances for a year or more gets you a discount.

- I never got pruning to work without a significant loss increase.

- Tried cheaper spot instances with GPUs. Between the random kills and the time it took new instances to spin up and load my code, it was too slow. The discount is large, but I couldn't get it up reliably, and users were getting more timeouts. I bailed and just used CPU inference. The GPU was being underutilized, and CPU-only inference only increased latency to around 2-3 seconds. Given the price trade-off, it was a simpler, cheaper, and easier solution.
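The GPU-vs-CPU trade-off above comes down to cost per request rather than hourly price: an underutilized GPU can cost more per inference than a slower CPU box. A rough Python sketch of that arithmetic; all inputs are your own measurements, and the formula ignores autoscaling and cold starts.

```python
def cost_per_1k_requests(
    hourly_price: float,
    seconds_per_request: float,
    concurrency: int = 1,
    utilization: float = 1.0,
) -> float:
    """Dollar cost of serving 1,000 requests on one instance.

    hourly_price: instance price in $/hour.
    seconds_per_request: latency of one inference.
    concurrency: requests the instance processes in parallel.
    utilization: fraction of the hour the instance is actually busy, in (0, 1].
    """
    if seconds_per_request <= 0 or concurrency < 1 or not 0 < utilization <= 1:
        raise ValueError("invalid parameters")
    requests_per_hour = 3600 / seconds_per_request * concurrency * utilization
    return hourly_price / requests_per_hour * 1000
```

For example, plug in the g4dn.xlarge on-demand price quoted elsewhere in this thread ($0.526/h) with your measured latency and utilization, then compare against the CPU instance you're considering.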

Also, consider physical servers from providers like Hetzner. These can be several times cheaper than EC2.

I use Hetzner quite a lot for personal projects and can recommend them for reliability and predictable costs. I've done reasonably CPU-heavy tasks like compiling Android images on the larger Cloud instances.

However, this morning I was playing around with Scaleway bare metal [1] and General Purpose instances [2] -- I am thinking of making a switch for high CPU tasks.

[1] https://www.scaleway.com/en/bare-metal-servers/

[2] https://www.scaleway.com/en/virtual-instances/general-purpos...

Interesting! These look very good indeed. I will have to try them.

The main point is that physical servers are much cheaper than VMs and provide significantly better performance as well (see my benchmarking and comparison: https://jan.rychter.com/enblog/cloud-server-cpu-performance-...).

I was just looking at Hetzner yesterday, looking to host an HA Postgres setup.

Their block storage volumes look interesting, but I couldn't find any information on performance guarantees, or even claims.

Anyone have an idea about performance (IOPS or MB/s)?

I use them but don't have that info off the top of my head. However, you can easily make an account, get a VPS with a volume and benchmark it in a few minutes for a few cents.
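If you want a quick number before reaching for a proper tool like `fio` (which is what you'd use for IOPS), a crude sequential-write throughput check in Python looks roughly like this. It measures MB/s only, and the path you pass should be a file on the mounted volume you want to test.

```python
import os
import time


def write_throughput_mb_s(path: str, total_mb: int = 256, block_kb: int = 1024) -> float:
    """Crude sequential-write benchmark: write total_mb of data, fsync, return MB/s."""
    block = os.urandom(block_kb * 1024)  # random bytes so compression can't help
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the volume
    elapsed = max(time.perf_counter() - start, 1e-9)
    os.remove(path)  # clean up the test file
    return total_mb / elapsed
```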

Note that we are talking about two different things here: a VPS is not the same thing as a dedicated server.

I only use their dedicated servers with NVMe SSDs and have never benchmarked the I/O.

Right, but the GP was talking about the network volumes AFAICT.

I worked on an unrelated market study - look at Upcloud and Raptr as well.

Oh and I should have said why they shouldn’t bother attempting to migrate somewhere “cheaper” (whether GCP, Hetzner, or whatever else): it doesn’t sound like they have time. I read the call for help as: we need something we can do in the next week or two to keep us in business. Any “move the infrastructure” plan will take too long and you should still do the “choose the right GPU / CPU, optimize your precision” change no matter what.

AWS/clouds aren't always the best solution for a problem. Often they're the worst (just like any other tool).

You don't provide a lot of detail but I imagine at this point you need to get "creative" and move at least some aspect of your operation out of AWS. Some variation of:

- Buy some hardware and host it at home/office/etc.

- Buy some hardware and put it in a colocation facility.

- Buy a lot of hardware and put it in a few places.


Cash and accounting is another problem. Hardware manufacturers offer financing (leasing). Third party finance companies offer lines of credit, special leasing, etc. Even paying cash outright can (in certain cases) be beneficial from a tax standpoint. If you're in the US there's even the best of both worlds: a Section 179 deduction on a lease!


You don't even need to get dirty. Last I checked it was pretty easy to get financing from Dell, pay next to nothing to get started, and have hardware shipped directly to a co-location facility. Remote hands rack and configure it for you. You get a notification with a system to log into just like an AWS instance. All in at a fraction of the cost. The dreaded (actually very rare) hardware failure? That's what the warranty is for. Dell will dispatch people to the facility and replace XYZ as needed. You never need to physically touch anything.

A little more complicated than creating an AWS account with a credit card number? Of course. More management? Slightly. But at the end of the day it's a fraction of the total cost and probably even advantageous from a taxation standpoint.

AWS and public clouds really shine in some use cases and absolutely suck at others (as in suck the cash right out of your pockets).

100% agree. Most public clouds are ripoffs. We spent 11 years on them and have now thrown in the towel.

Go for some colocation facility where costs are predictable.

It depends on your use case and internal infrastructure support. A lot of start-ups start on "cloud" when they have unpredictable needs and little immediate cash for kit & sys-admins (to manage more than the bare servers: backups and monitoring and other tasks that a cloud arrangement will offer the basics of at least, will need to be managed by you or a paid 3rd party on your kit). Later when things have settled they can move to more static kit and make a saving in cost at the expense of the flexibility (that they no longer need). Or they go hybrid if their product & architecture allows it: own kit for the static work, spreading load out to the cloud if a temporary boost of CPU/GPU/similar power is needed (this works best for loosely-coupled compute-intensive workloads, which may be the case here depending on exactly what they are trying to get out of ML and what methods & datasets are involved).

This should be top voted. Buy the hardware and expect your costs to fall 10x.

There are also more upfront costs (not just monetary), you can't scale quickly, and you lose all the managed solutions that make building things super fast and effective. Your hardware cost may be 10x lower, but the operational and development costs will be higher, and it limits your business's ability to grow.

A balanced approach is to only put the most expensive hardware portion of the business with the smallest availability requirement in colo, and horizontally scale it over time. Simultaneously use a cloud provider to execute on the cheap stuff fast and reliably.

> AWS/clouds aren't always the best solution for a problem.

And when they aren’t the best, it’s often because you don’t know what you’re doing.

It’s all too common for people to over-provision, or to use too many services when they don’t need to.

For example: let’s have a database, a cache service, and a search service. When 95% of the time they only need the database, because it can do full-text search adequately and they don’t have the traffic to warrant caching in Redis; basic caching will do.

They don’t take advantage of auto scaling groups, or they run over-provisioned instances 24/7.

I’ve seen teams throw more hardware at a slow database instance instead of optimising the queries and analysing / adding indexes.
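To illustrate the "add an index before adding hardware" point, here is a small sketch using SQLite from the Python standard library as a stand-in (in Postgres you'd use `EXPLAIN` or `EXPLAIN ANALYZE`). The table and column names are made up; the point is that the plan changes from a full table scan to an index lookup.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

SQL = "SELECT total FROM orders WHERE customer_id = ?"
before = query_plan(conn, SQL, (42,))  # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, SQL, (42,))  # index lookup
print("before:", before)
print("after: ", after)
```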

The biggest cost of cloud providers is outbound data. The rest is almost always the developers' fault.

None of your comments are relevant to machine learning applications, and all you do is throw blanket statements about ignorance. Your comments are very far from the problem and from being helpful.

Nope. We have no information on the OP's setup, bill, or anything. This entire thread is based on assumptions. I gave common examples of developers screwing up and generating large bills. Explain to me how machine learning is any different.

Do we know if the instances used for MLing are running 24/7 idle until customers use them? Do we know if the utilisation is optimal for the workloads?

We know nothing. So claiming that cloud providers are not good is very far from the problem and not helpful.

> So claiming that cloud providers are not good

The statement is not that AWS is "not good". The statement is that AWS is very expensive, especially for computational tasks, and there are cheaper alternatives around.

AWS is notorious for positioning their services as a way to convert capex into opex, especially if your scenario involves a SaaS that might experience unexpected growth and must be globally available. Training ML models has nothing to do with those use cases. It makes no sense to mindlessly defend AWS as the absolute best service around for a job it was not designed for, with a pricing model that charges for added value that doesn't apply here.

I never defended AWS as being the absolute best. I said high bills are almost always due to developers and not the cloud provider. Which you haven’t argued against.

As I said, I have examples of how developers often cause large bills.

And I explained why we can’t help with the OPs large bill.

You’re saying that with ML there is absolutely 0 way to reduce costs on AWS which is absolute rubbish.

> I said high bills are almost always due to developers and not the cloud provider.

I feel that's where you keep missing the whole point. Somehow you're stuck on thinking that an expensive service is not a problem if you can waste time micromanaging and constantly monitoring expenditures to shave off a bit of cost from the invoice. Yet, somehow you don't register in your universe the fact that there are services out there that are both far cheaper and arguably better for this use case.

Therefore, why do you keep insisting on the idea of wasting time and effort micromanaging a deployment like pets to shave off some trimmings off a huge invoice if all you need to do to cut cost to a fraction of AWS's price tag is to.... switch vendor?

So what you’re saying is because developers can’t control what they build they need to be stuck with services that limit what they can do so they don’t end up with big bills.

And that for cases like MLing it’s impossible to optimise costs.

Got ya.

> So what you’re saying is because developers can’t control what they build they need to be stuck with services that limit what they can do so they don’t end up with big bills.

No, I'm pointing out the fact that developers are able to do exactly what they want, with less work and far cheaper, by simply moving away from AWS and picking pretty much any vendor. Why do you have such a hard time understanding what others are telling you, and accepting anything that points out that AWS is not the best solution for all use cases, especially those it was not designed for?

Rubbish, you're saying that it's impossible to run on cloud cheaply. Therefore no one should use cloud for any reason.

"I don't know how to use cloud so cloud is bad"

"You're holding it wrong!"

Nothing is stopping you from applying all those optimizations to on-premise hardware, right?

That is, I am not sure "public cloud, if you spend lots of effort to optimize it and ask devs to be careful, can be as cheap as a naive on-prem implementation where devs don't need to be careful" is an argument for public cloud.

Well, if you're on prem then you’re probably a bit more limited in what you can do. You can’t just go “let’s solve this with x” cos x doesn’t exist, so you need to provision it yourself and maintain it yourself. It’s probably better cos you actually need to think about what you’re building rather than just throwing services left and right at the problem.

I’m also not suggesting that optimising and being careful is an argument for cloud. I’m saying that ruling out cloud is stupid. You can absolutely have a low-cost solution perform very well on a cloud provider. The OP seems to think it’s not possible.

[DISCLAIMER] I work at AWS, not speaking for my employer.

We really need some more details on your infrastructure, but I assume it's EC2 instance cost that skyrocketed?

A couple of pointers:

- Experiment with different GPU instance types.

- Try Inferentia [1], a dedicated ML chip. Most popular ML frameworks are supported by the Neuron compiler.

Assuming you manage your instances in an auto scaling group (ASG):

- Enable a target tracking scaling policy [2] to reactively scale your fleet. The best scaling metric depends on your inference workload.

- If your workload is predictable (e.g. high traffic during the daytime, low traffic during nighttime), enable predictive scaling. [3]

[1] https://aws.amazon.com/machine-learning/inferentia/

[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-sca...

[3] https://docs.aws.amazon.com/autoscaling/plans/userguide/how-...
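As a sketch of the target tracking suggestion, these are the arguments boto3's `put_scaling_policy` expects for a target tracking policy. The group name and the CPU target are placeholders. For GPU inference, CPU utilization may well be the wrong metric, and a `CustomizedMetricSpecification` (e.g. on queue depth) could fit better.

```python
from typing import Any, Dict


def target_tracking_params(asg_name: str, target_cpu_percent: float) -> Dict[str, Any]:
    """Arguments for the EC2 Auto Scaling PutScalingPolicy API (boto3 put_scaling_policy)."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target must be a percentage in (0, 100]")
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            # The group scales out/in to keep average CPU near this value.
            "TargetValue": float(target_cpu_percent),
        },
    }


def apply_policy(autoscaling_client: Any, params: Dict[str, Any]) -> Any:
    """Send the policy; pass boto3.client("autoscaling") as the client."""
    return autoscaling_client.put_scaling_policy(**params)
```

Usage would look like `apply_policy(boto3.client("autoscaling"), target_tracking_params("inference-asg", 60))`, where `inference-asg` is a hypothetical group name.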

It could also be worth it to have a look at SageMaker? IIRC it's cheaper.

My pitch to help: you can probably replace the GPU-intensive ML model with some incredibly dumb linear model. The difference in accuracy/precision/recall/F1 score might only be a few percentage points, and the linear model training time will be lightning fast. There are enough libraries out there to make it painless in any language.

It's unlikely that your users are going to notice the accuracy difference between the linear model and the GPU-intensive one unless you are doing computer vision. If you have small datasets, you might even find the linear model works better.

So it won't affect revenue, but it will cut costs to almost nothing.

Supporting evidence: I just completed this kind of migration for a bay area client (even though I live in Australia). Training (for all customers simultaneously) runs on a single t3.small now, replacing a very large and complicated set up that was there previously.
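To show how little machinery a linear model needs, here is a dependency-free logistic regression trained with batch gradient descent. It's a toy sketch: in practice you'd likely reach for scikit-learn's `LogisticRegression`, and whether it matches your GPU model's accuracy is something you'd have to measure on your own data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 1000
) -> Tuple[List[float], float]:
    """Fit logistic regression weights and bias with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```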

Yeah, I agree with this. Rather than ask whether OP is optimizing their AWS billing, I'd also ask whether OP's devs even have any incentive to do better. Even with machine vision it's stupidly easy to increase your computation effort by 2 or more orders of magnitude for almost no benefit. In fact, default parameters will often do that.

I would second that. NN models are the catch-all approach, but they're very expensive to train. Shallow learning algorithms can work well in a variety of scenarios.

A linear model can even be offloaded to the client (JavaScript), so no server compute would be needed at all.

I’m a CTO of a compute-intensive AI SaaS company, so I can relate.

One piece of advice: speak to your AWS rep immediately. Get credits to redesign your system and keep you running. You can expect up to 7 digits in credits (for real!) and a year of free support; they really want to help you avoid this.


AWS has always been eager to get on the phone with me to discuss cost savings strategies. And they don’t upsell you in the process.

Second this. You'll be surprised at the flexibility they show if you ask (and have a genuine problem).

I was in same situation.

We bought 2 Dell servers via their financing program. Each server was about $19-25K. We paid AWS $60K per month before that. We pay $600 for co-location.

So my advice is to try to get hardware through vendor financing. Dell had a good program, I think.
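Using the figures from the comment above, a quick payback calculation. It assumes the top of the quoted $19-25K range per server and that the $600 colocation fee is monthly, and it ignores financing interest, power overages, and staff time.

```python
def payback_months(hardware_cost: float, monthly_cloud_bill: float, monthly_colo_cost: float) -> float:
    """Months until owning hardware beats paying the cloud bill (simplified)."""
    monthly_saving = monthly_cloud_bill - monthly_colo_cost
    if monthly_saving <= 0:
        raise ValueError("colocation costs at least as much as the cloud bill; no payback")
    return hardware_cost / monthly_saving


# 2 servers at $25K each, a $60K/month AWS bill, $600/month colocation.
print(f"{payback_months(2 * 25_000, 60_000, 600):.2f} months")
```

With those assumptions the hardware pays for itself in well under a month, which is consistent with the 2-month payback mentioned in the reply below.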

This! We did the exact same, though our payback period was 2 months of AWS costs. Try and put the base load on your own servers, use the cloud to scale up and down when needed.

Cloud servers are a “luxury” that most don’t realise and just take for granted. Having said that, there are obvious overheads with handling your own servers, but when your costs are several salaries it’s probably worth considering.

Was about to suggest the same thing. You can buy physical machines with beefy specs for much less than your cloud bill when you get to these extremes.

this is good advice-- you can run those boxes into the ground for five years and easily get paid back

What does colocation mean in this context? Did you buy the servers and AWS hosted on their premises?

Colocation just means buying space in a datacenter somewhere (and it comes with a certain amount of power and bandwidth).


I have loud and angry thoughts about this; https://www.lastweekinaws.com/blog/ has a bunch of pieces, some of which may be more relevant than others. The slightly-more-serious corporate side of the house is at https://www.duckbillgroup.com/blog/, if you can stomach a slight decline in platypus.

Came here to recommend you! Your newsletter always provides both enlightenment and a giggle.

I came here to recommend QuinnyPig's services as well. He's a pro at reducing AWS costs.

Corey Quinn (Quinnypig) at Duckbill Group would be my suggestion as well.

You might be able to significantly lower your monthly bill in exchange for an upfront payment by purchasing your own servers and then renting co-location space.

I'm CTO of an AI image processing company, so I speak from experience here.

I personally use Hetzner.de and their Colo plans are very affordable, while still giving you multi GBit internet uplinks per server. If you insist on renting, Hetzner also offers rental plans for customer-specified hardware upon request. The only downside is that if you call a Hetzner tensorflow model from an AWS east frontend instance, you'll have 80-100 ms of roundtrip latency for the rpc/http call. But the insane cost savings over using cloud might make that negligible.

Also, have you considered converting your models from GPU to CPU? They might still be almost as fast, and affordable CPU hosting is much easier to find than GPU options.

I'm happy to talk with you about the specifics of our / your deployment via email, if that helps. But let me warn you that my past experience with AWS and Google Cloud performance and pricing, in addition to suffering through low uptime at their hands, has made me somewhat of a cloud opponent for compute- or data-heavy deployments.

So unless your spend is high enough to negotiate a custom SLA, I would assume that your cloud uptime isn't any better than halfway good bare metal servers.

I'd suggest reaching out to AWS about this. Explain the situation. AWS has a number of programs for startups that you may be able to apply for, including one that includes 100k worth of credits.

Also, if you can't afford to scale to new customers... stop? I'm sure it probably sucks, but like, does it suck more than having no runway? Seems like you'd be best served slowing things down and spending some time with AWS on cost optimization.

There aren't a lot of details to go off of here so I don't know what more advice to give.

We've managed to reduce our spend by almost 50-60%. Some pointers:

1. Comb through your bill. Inspect every charge and ask "Why do we need this?" for every line item.

2. If user latency is not a problem, choose the cheapest regions available and host systems there.

3. Identify low usage hours (usually twilight hours) and shut systems off.

4. Transition one-off tasks (cron, scheduling, etc.) to Lambda. We were using entire servers for one thing that ran once a day. Now we don't.

5. Centralize permissions to launch instances etc. within a few people. Make everyone go through these 'choke-points'. You might see reduced instances. Often engineers launch instances to work on something and then 'forget' to shut them off.

6. Get AWS support involved. I'm pretty sure with the bills you are racking up you must have some AWS support. Get some of their architects etc. to check out your architecture and advise.

7. Consider Savings Plans and Reserved Instances. Often you get massive cost savings.

8. Consider moving some of the intensive number crunching to some of AWS' data crunching services. We moved a high-powered ELK stack for analyzing server logs to CloudWatch. A little more expensive in the short term, but we are now looking to optimize it.

In my experience, AWS has been very supportive of our efforts at reducing costs. Even after a 50-60% reduction I still feel there is scope for another round of 50-60% reduction from the new baseline. All the best!
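For step 1 (combing through the bill), a sketch that totals a Cost and Usage Report export by service. The column names `lineItem/ProductCode` and `lineItem/UnblendedCost` are what I'd expect in a CUR CSV, but check them against your own export's header; the sample rows are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cost_by_service(
    rows: Iterable[Dict[str, str]],
    service_col: str = "lineItem/ProductCode",
    cost_col: str = "lineItem/UnblendedCost",
) -> List[Tuple[str, float]]:
    """Sum cost per service and return (service, cost) pairs, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[service_col]] += float(row[cost_col] or 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


SAMPLE = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonEC2,120.50
AmazonS3,3.25
AmazonEC2,79.50
AmazonCloudWatch,12.00
"""
for service, cost in cost_by_service(csv.DictReader(io.StringIO(SAMPLE))):
    print(f"{service:20s} ${cost:,.2f}")
```

For a real export, pass `csv.DictReader(open(path))` instead of the sample string; real CUR files are often gzipped and split into several parts.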

Here's my deck on this; @quinnypig, elsewhere in this thread, is also a great resource. https://docs.google.com/presentation/d/1sNtFugQp_Mcq62gf4F1n... Last year I cut $75 million in spend, so you could say I have a track record there.

Are you sure you are using the right instance type for what you need to generate? Can your model generator stop the instance itself when it finishes the model?

100%. If it doesn't need to happen just-in-time, go spot and build models off a queue.
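A minimal sketch of the "build models off a queue, then stop the instance" idea. The queue and the `build_model` and `stop_instance` callables are hypothetical stand-ins; on EC2, `stop_instance` could call boto3's `ec2.stop_instances(InstanceIds=[...])` for the current instance.

```python
import queue
from typing import Callable


def drain_and_stop(
    jobs: "queue.Queue[str]",
    build_model: Callable[[str], None],
    stop_instance: Callable[[], None],
    idle_timeout: float = 5.0,
) -> int:
    """Build models off a queue; once no job arrives for idle_timeout seconds, stop the instance."""
    done = 0
    while True:
        try:
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            stop_instance()  # nothing left to do: stop paying for the box
            return done
        try:
            build_model(job)
            done += 1
        finally:
            jobs.task_done()
```

If `build_model` raises, the exception propagates and the instance stays up, which is arguably what you want while debugging; wrap it if you'd rather always stop.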

Apply for the AWS Activate program. They can give you up to $100k in credits.

Can you give a little context to the $75 million in savings? What was the original amount you were spending? I didn’t see this on your deck.

Don’t overlook the possibility to use your own physical hardware, running high-end commodity graphics cards (2080Ti, Titan RTX), especially for model training. (I haven’t found this to be overly effort or time intensive and the payoff is enormous on a dollars-basis.)

You didn’t give enough details for someone to get really specific. I’m assuming from your text that the issue is inference not training costs, in which case there’s some great advice already posted, but more details might help.

I maintain an open source ML infra project, where we've spent a ton of time on cost optimization for running GPU-intensive ML models, specifically on AWS: https://github.com/cortexlabs/cortex

If you've done zero optimization so far, there is likely some real low-hanging fruit:

1. If GPU instances are running up a huge EC2 bill, switch to spot instances (a g4dn.xlarge spot is $0.1578/hr in US West (Oregon) vs $0.526/hr on demand).

2. If inference costs are high, look into Inferentia ( https://docs.cortex.dev/deployments/inferentia ). For certain models, we've benchmarked over 4x improvements in efficiency. Additionally, autoscaling more conservatively and leveraging batch prediction wherever possible can make a real dent.

3. Finally, and likely the lowest hanging fruit of all, talk to your AWS rep. If your situation is dire, there's a very good chance they'll throw some credits your way while you figure things out.

If you're interested in trying Cortex out, AI Dungeon wrote a piece on how they used it to bring their spend down ~90%. For context, they serve a 5 GB GPT-2 model to thousands of players every day: https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-...

Speak to your AWS account manager and/or someone on their startup team. Give them the detail on what you’re running, what you want to do, and what/when you’re hoping to reach the next milestone. There’s usually a few different options available to them to try help you out. Including, but not limited to, working out how to reduce the ongoing cost of what you’re trying to do. “Customer obsession” and all that. It’s also just good business. It’s not in anybody’s interest to have companies running out of runway, they’d rather you were still in business and paying for compute 5 years from now.

Sounds familiar =\

- get devs on GPU laptops

- for always-on systems, where doable, switch to an 8am-6pm policy, and use reserved instances. Call AWS for a discount.

- use g4dn on spot. Check per workload, though: it assumes single rather than double precision.

- consider whether you can switch to fully on-demand if you haven't already, and a hybrid setup via GCP's attachable GPUs

- make $ more visible to devs. Often individuals just don't get it, too easy to be sloppy.

More is probably doable, but it gets increasingly situation-dependent.
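For the 8am-6pm policy, the scheduling check itself is tiny. This sketch only decides whether a system should be up right now; the start/stop calls and the scheduler that runs it (cron, Lambda, etc.) are left out, and the default window and weekday rule are assumptions.

```python
from datetime import datetime, time


def should_be_running(
    now: datetime,
    start: time = time(8, 0),
    end: time = time(18, 0),
    weekdays_only: bool = True,
) -> bool:
    """True inside the working window (default 8am-6pm, Monday to Friday)."""
    if weekdays_only and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= now.time() < end
```

Pass a timezone-aware `now` in your team's local time, otherwise the window will be shifted by the server's UTC offset.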

ALSO: For all the discussion of on-prem, for ML in particular, consider running training on a dedicated local hw box and run only inference on the cloud (which can be CPU)

I’ve been mulling over the idea of investing $2-3k in building a machine to do exactly that (and using it as a normal day-to-day dev machine when it’s not training), because the economics of it appear surprisingly great.

Have you (or anyone else here) had experience doing this? Did it end up being a worthwhile approach? (Even for a while)

It depends how long it is on.

If you only train for short stretches, you may do better by setting up a cloud training workflow that only has the server on while training. If it's on a lot, then a private box makes more sense (ex: Lambda Labs, at home/office/colo). Then set it up as a shared box for the team.

A lot of time ends up being dev, not actual training, and folks end up leaving dev cloud GPUs on accidentally. We still use cloud GPUs for this, but do primary dev on local GPU laptops. For that, we started with System76 for everyone (Ubuntu, Nvidia), but those had major issues (weight, battery draw...). I then got a lightweight Asus ZenBook for myself, but that was too lightweight all around. Next time I'll go for something in between or explore ThinkPad options.

And yep, as a small team, this mix dropped our cloud opex spend by like 90%, and pretty fast to offset the capex bump.

Can you use spot instances? If so you can pay a lot less for compute. Your app needs to tolerate being shutdown and restarted, however.

Is there anything you can turn off at night? A lot of startups have staging / test systems that do not need to be running all the time.

Are you keeping a lot of "junk" around that you don't actually need? Look at S3 objects, EBS snapshots, etc. A few here and there doesn't cost much, but it does add up.
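To find old EBS snapshots, here is a filter over dicts shaped like the `Snapshots` list returned by boto3's `ec2.describe_snapshots(OwnerIds=["self"])`. The age cutoff is yours to pick, and you'd want to review the list before deleting anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


def stale_snapshot_ids(
    snapshots: Iterable[Dict[str, Any]], max_age_days: int, now: datetime
) -> List[str]:
    """IDs of snapshots older than max_age_days (StartTime must be timezone-aware, as boto3 returns it)."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Usage with boto3 (not run here):
#   snaps = boto3.client("ec2").describe_snapshots(OwnerIds=["self"])["Snapshots"]
#   print(stale_snapshot_ids(snaps, 90, datetime.now(timezone.utc)))
```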

Are you using the correct EBS volume type? Maybe you're using provisioned IOPS where you don't need it.

S3: make sure your VPC has an S3 endpoint. This isn't the default. Otherwise, you're paying a lot more to transfer data to S3.
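For the S3 endpoint tip, a sketch of the arguments for boto3's `ec2.create_vpc_endpoint` to add a gateway endpoint for S3. The VPC and route table IDs are placeholders. Gateway endpoints for S3 carry no hourly charge, so the saving mostly comes from S3 traffic no longer passing through a NAT gateway.

```python
from typing import Any, Dict, List


def s3_gateway_endpoint_params(vpc_id: str, region: str, route_table_ids: List[str]) -> Dict[str, Any]:
    """Arguments for the EC2 CreateVpcEndpoint API (boto3 ec2.create_vpc_endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Routes to S3 are added to these route tables automatically.
        "RouteTableIds": route_table_ids,
    }


# Usage with boto3 (not run here):
#   params = s3_gateway_endpoint_params("vpc-0123", "us-west-2", ["rtb-0123"])
#   boto3.client("ec2", region_name="us-west-2").create_vpc_endpoint(**params)
```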

I have replied to some of the comments below. My advice is to get off AWS or any public clouds and avoid them like the plague.

They are too expensive for 95% of cases. If you are still not convinced DM me.

Cloud is expensive for sure, especially so for VMs and bandwidth.

But cloud also comes with a lot of convenience - for example, having managed k8s, and highly-available serverless, messaging, blob storage and databases.

Some of that is particularly challenging to get right, especially for databases.

It's difficult to justify cloud VMs for heavy processing tho - they really are just so damned expensive compared to bare metal and VPS providers, and there isn't that much extra convenience for VMs in the same way there is for PaaS stuff.

While looking at the technical, also look at the commercial. Can you trace revenue sources to aws costs? In other words calculate your variable costs for each client/contract individually?

Eg are there some clients losing you money that you can either let go or raise prices for?

If you can handle some interruption to your work then spot instances are probably going to be the biggest immediate change you can make.

Right now a g4dn.xlarge is $0.526/h on demand but only $0.1578/h as a spot instance.

You might also be eligible for a 10k grant from AWS - https://pages.awscloud.com/GLOBAL-other-LN-accelerated-compu...
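The prices above work out as follows per month, assuming the instance runs all 730 hours (the usual 24 * 365 / 12 approximation). Spot prices move over time, so treat these as the numbers quoted in the thread rather than current ones.

```python
HOURS_PER_MONTH = 730  # common approximation of 24 * 365 / 12


def monthly_cost(hourly_price: float, instances: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` machines for `hours` at a flat hourly price."""
    return hourly_price * instances * hours


on_demand = monthly_cost(0.526)  # g4dn.xlarge on demand, as quoted above
spot = monthly_cost(0.1578)  # g4dn.xlarge spot, as quoted above
print(f"on-demand ${on_demand:,.2f}/mo, spot ${spot:,.2f}/mo, saving {1 - spot / on_demand:.0%}")
```

That is roughly $384 vs $115 per instance per month, a 70% saving at those quoted prices.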

If cost is an issue, get off AWS. Immediately. You're paying about 10x what the same hardware/bandwidth would cost you if you just bought dedicated servers.

If you have the time to fix them asap you can follow this route:

- use spot or reserved instances, or Savings Plans

- have a look at Compute Optimizer

- understand what AWS networking costs are and try to optimise them (cross-AZ traffic and internet egress can be costly)

- go through the Trusted Advisor checks: https://aws.amazon.com/premiumsupport/technology/trusted-adv... (you can enable them by enabling Business or Enterprise support)

- try using one of these cost optimisation tools: https://aws.amazon.com/products/management-tools/partner-sol...

- contact AWS for a Well-Architected Review

If you don't have the time, then I suggest contacting AWS to introduce you to a consulting partner. They can come and actually fix whatever is needed.

You train the model locally and push it for inference to the cloud?

What exactly are we talking about here?

Couldn’t you build a dual NVIDIA 20XX / 32-core / 64 GB machine for under $5k and then save money while training/developing faster?

Except they (the gender non-specific singular) is probably running kubernetes and has multiple clusters of 10 or so gpu hosts. Not that I disagree, but spinning that up locally and orchestrating it will take time and money. And explaining why training is paused because you keep blowing breakers in the office will cost political capital.

You can just say “Except they are probably”.

GPU servers and coloc are pretty cheap these days: $1K/month rent per 20A of power. ROI on hardware is usually 3-4 months max (i.e. for the cost of the machine at AWS for 3-4 months, you can buy the same thing).

Lead time might be a problem for you, but you can probably do it in under a month if you take available stock at your vendor. I work with a company called PogoLinux (http://pogolinux.com) out of Seattle, and they sell boxes that have 4 GPUs in them.

That said, the other advice is right. You can probably get by with a much simpler model. The coloc route would probably only be better if you can't change the models due to people constraints and the ML stuff doesn't have a lot of AWS dependencies. SysAdmins are a lot easier to find and hire than ML specialists.

In terms of cost, I would recommend deeply interrogating the bill. Your data transfer cost is likely much higher than you expected, and there are lots of ways to mitigate that. GPUs are crazy expensive in the cloud, and it really makes sense to host them locally. There is also usually some money to be found by looking at S3 tiers: Infrequent Access, for example, can save a lot if it fits your use case. Finally, if EC2 is a big cost driver, spot pricing and Savings Plans are good places to start.
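For the S3 tiering point, a sketch of a lifecycle rule that moves objects under a prefix to Standard-IA, in the shape boto3's `put_bucket_lifecycle_configuration` takes as `LifecycleConfiguration`. The prefix is a placeholder. IA adds per-GB retrieval charges, so it only pays off for data you rarely read.

```python
from typing import Any, Dict


def ia_lifecycle_config(prefix: str, days: int = 30) -> Dict[str, Any]:
    """Lifecycle rule moving objects under `prefix` to S3 Standard-IA after `days` days."""
    if days < 30:
        # S3 lifecycle won't transition objects to STANDARD_IA before they are 30 days old.
        raise ValueError("STANDARD_IA transitions require at least 30 days")
    return {
        "Rules": [
            {
                "ID": f"to-ia-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "STANDARD_IA"}],
            }
        ]
    }


# Usage with boto3 (not run here; bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=ia_lifecycle_config("training-data/"))
```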

I will say that more generally speaking, there has been a lot of recognition in the industry at large that AI-driven startups all face this challenge, where the cost of compute eats up most of the margin. There is no easy solution to that, other than to make product-level decisions about how to add more value with less GPU time.

AWS is super expensive. Switch to another cloud provider.

For example: Scaleway, OVH, or Hetzner.

Can confirm this. Personally, I wanted to switch from Scaleway to AWS because one of AWS's regions was closer to our customers, but there was no way I could justify the costs. With some load balancers and API access, we were able to scale horizontally without a problem.

Are you using Postgres by chance? If so, I'd love to hear about how you deployed it (struggling to figure out a performant, HA setup!)

Yes absolutely. This is the right fit for 95% of customers.

I don't know how deep you've dug, but the very first thing you should be doing is using spot instances instead of on-demand instances (unless you absolutely can never wait to train a model). Spot instances are cheaper than on-demand instances, with the downside that the price can fluctuate, so you need to build in a precaution for shutting down if the price gets too high. If the price goes up, you either have to stop training until it comes back down or suck it up and pay the higher price.

Luckily, it's pretty simple to handle interruptions for neural-network-like models that train over many iterations. Just save the model state periodically, so you can shut the instance down whenever the price is too high and resume training when it drops.

If you're running GPU-heavy stuff all the time, then you're probably better off just buying some GPUs outright and doing that part on-site.

Especially if you can keep your own gear busy 24/7, i.e. run it around the clock and let any GPU demand above that fall back onto the cloud.
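A minimal sketch of the checkpoint/resume pattern in plain Python. The JSON file and the one-number "model" are stand-ins; with a real framework you would save the model and optimizer state with that framework's own serialization instead:

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temp file, then atomically rename it over the old checkpoint,
    # so an instance reclaimed mid-write never leaves a corrupt file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}  # no checkpoint yet: fresh start


def train(path: str, total_steps: int, every: int,
          interrupt_at: Optional[int] = None) -> dict:
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if state["step"] == interrupt_at:
            return state  # simulate the spot instance being reclaimed
        state["weight"] += 0.5  # stand-in for one optimizer step
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
train(ckpt, total_steps=10, every=3, interrupt_at=7)  # killed at step 7
final = train(ckpt, total_steps=10, every=3)          # resumes from step 6
print(final)  # {'step': 10, 'weight': 5.0}
```

The work done since the last checkpoint (step 7 here) is lost and redone, so the checkpoint interval is a trade-off between write overhead and wasted compute.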

Talk to an AWS rep and also different cloud vendors. I know startups which received large amounts of free compute in their early days and then went on to become successful companies. I bet it was win-win for everyone involved.

If you're storing a lot of data: I talked to someone who went from $3,000 a month to $3 a month by saving older dumps of their database into an S3 bucket instead of keeping many, many old RDS snapshots from weeks or months ago.
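A sketch for finding the old snapshots to clean up, written against the shape of the records boto3's RDS `describe_db_snapshots` returns (`DBSnapshotIdentifier`, `SnapshotType`, `SnapshotCreateTime`). The sample data and the 30-day cutoff are hypothetical; export anything you still need (e.g. a database dump to S3) before deleting:

```python
from datetime import datetime, timedelta, timezone


def stale_manual_snapshots(snapshots: list, max_age_days: int, now: datetime) -> list:
    """IDs of manual snapshots older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        # Automated snapshots expire with the retention period; manual
        # ones stay (and are billed) until someone deletes them.
        if s.get("SnapshotType") == "manual" and s["SnapshotCreateTime"] < cutoff
    ]


now = datetime(2020, 7, 11, tzinfo=timezone.utc)
sample = [
    {"DBSnapshotIdentifier": "db-2020-01-01", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "db-2020-07-05", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2020, 7, 5, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "rds:db-2020-06-01", "SnapshotType": "automated",
     "SnapshotCreateTime": datetime(2020, 6, 1, tzinfo=timezone.utc)},
]
print(stale_manual_snapshots(sample, max_age_days=30, now=now))  # ['db-2020-01-01']
```

Each returned identifier could then be passed to boto3's `delete_db_snapshot(DBSnapshotIdentifier=...)`.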

Here's a direct timestamp link to that point in the podcast where it came from: https://runninginproduction.com/podcast/33-zego-lets-you-eas...

Segment's blog posts on cost optimisation have plenty of detail and tips on this topic:

https://segment.com/blog/the-million-dollar-eng-problem/ https://segment.com/blog/spotting-a-million-dollars-in-your-... https://segment.com/blog/the-10m-engineering-problem/

Similarly this Honeycomb writeup is also excellent: https://www.honeycomb.io/blog/treading-in-haunted-graveyards...

By the sound of it, you need to take drastic action. It sounds like you will not be able to just optimise your AWS spend to get more runway, though you should definitely do some bill optimisation. You will need to optimise the product itself and maybe even get rid of unprofitable customers.

If you are not sure exactly who or what is driving the AWS cost, take a look at Honeycomb to get the ability to dive deep into what is eating up resources.

AWS is one of the most expensive hosting solutions. I think many of us somehow assume it's the best one, but in my opinion they're all roughly the same. Moving elsewhere will take effort but can reduce your costs to 10-20% of the current bill. Some easy things you can do on AWS: resize VMs (this requires stopping the instance for a minute or so), switch to a cheaper instance generation, e.g. t2 -> t3, or move VMs from EC2 to Lightsail.

I help companies find bare metal options for training models. It’s usually 10-20% the cost of cloud.

Email me lanec (at) hey (dot) com if you’d like to speak.

Last year I took a company spending $24k/month training visual AI and cut that down to $3,500/month with bare metal. I also helped them secure over $100k in cloud credits to cover the monthly costs until the transition could happen.

Training in the cloud is generally much more expensive than bare metal.

Run your own machines.

You don’t have to use cloud services.

Simple answer, but the implementation is trickier.

You have to use spot instances, or, as Google calls them, preemptible instances. These are up to 80% cheaper.

The caveat is that they can be killed at any time, so your infrastructure must be resumable.

Most likely you will need Kubernetes. It's the only framework that supports GPUs, integrates with spot instance providers, and works with ML platforms (using Kubeflow).
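For the "must be resumable" part, AWS gives a spot instance a two-minute warning through the instance metadata service. A minimal sketch of polling it with the standard library, assuming IMDSv1 is enabled (with IMDSv2 enforced you must first fetch a session token); on Kubernetes, a DaemonSet such as the AWS Node Termination Handler can do this for you:

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# Returns 404 until AWS schedules an interruption for this spot instance.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_imds(url: str, timeout: float = 1.0) -> Optional[str]:
    """GET a metadata URL; None means 404 (nothing scheduled)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def pending_interruption(
    fetch: Callable[[str], Optional[str]] = fetch_imds,
) -> Optional[dict]:
    """Return e.g. {"action": "terminate", "time": "..."} or None."""
    body = fetch(SPOT_ACTION_URL)
    return json.loads(body) if body else None


# Offline demo with a fake fetcher; on an instance you would call
# pending_interruption() every few seconds from a background loop.
fake = lambda url: '{"action": "terminate", "time": "2020-07-11T08:22:00Z"}'
notice = pending_interruption(fake)
if notice:
    print("interruption scheduled:", notice["action"], "at", notice["time"])
```

When a notice appears, stop accepting new work, write a checkpoint, and exit.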

How about you just purchase some motherboards and GPUs and start running them in your office (assuming you're not bandwidth limited or looking for millisecond response times)? I'm always tempted to do this when we have a fairly constant workload. Wasn't GPU instance pricing quite insane on AWS compared to actual GPU costs?

This is not exactly it, I imagine, but maybe longer term you could consider this.

At my place people test on their desktops and run production stuff in the data center.

Where are you located? These are prices in Singapore: http://www.fuwell.com.sg/uploads/misc/Fuwell11072020.pdf

You're looking for a CPU, a motherboard, 64 GB of RAM, maybe 2x 2080 Ti, a small SSD and a PSU (1000 W?). You can leave these on IKEA shelves and skip the cases if need be. 3x 2080 Ti makes the board expensive and the PSU hard to find.

If you want more reliability, get ASUS or Supermicro, or even Sugon: 4 GPUs in 2U.

That's a few kW per machine, and you need to think about how much power you can draw per socket, so the 2U stuff usually ends up in datacenters.

Someone else mentioned it already in these comments, but I'll mention it again to make sure it's not missed. If you're a startup using AWS, apply for the AWS Activate program. All you need to do is apply, and they'll give you up to $100k in AWS credits, which last for up to 2 years and are automatically applied to your bill until they're used up.


It's not a solution to the larger problem of business model and percentage of revenue going toward compute costs to provide your service, but there are a lot of other great recommendations and suggestions here for that. This could provide you some time to actually implement the other recommendations.

NVIDIA forces cloud providers to use its expensive professional line-up. But other providers that use consumer GPUs are way cheaper, 4x or more. If your models don't need a lot of memory or double precision, providers such as GPUEater or RenderRapidly can be worth looking at.

Related reading: Ask HN: How did you significantly reduce your AWS cost? (2017) [0]

The top comment is great. Two easy wins:

* Putting an S3 endpoint in your VPC gives any traffic to S3 its own internal route, so it's not billed like public traffic. (It's curious that this isn't the default.)

* Auto-shutdown your test servers overnight and on the weekends
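The overnight/weekend shutdown is mostly a scheduling decision. A minimal sketch, assuming working hours of 08:00-19:00 Monday to Friday and a hypothetical `env=test` tag on the servers; running only those 55 of 168 weekly hours cuts their compute cost by roughly two-thirds:

```python
from datetime import datetime


def should_run(now: datetime, start_hour: int = 8, end_hour: int = 19) -> bool:
    """True on weekdays (Mon=0 .. Fri=4) between start_hour and end_hour, local time."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


# A cron job or scheduled Lambda could call this hourly and, when it returns
# False, stop the tagged test servers with boto3, e.g.:
#   ec2 = boto3.client("ec2")
#   resp = ec2.describe_instances(Filters=[
#       {"Name": "tag:env", "Values": ["test"]},            # hypothetical tag
#       {"Name": "instance-state-name", "Values": ["running"]},
#   ])
#   ids = [i["InstanceId"] for r in resp["Reservations"] for i in r["Instances"]]
#   if ids:
#       ec2.stop_instances(InstanceIds=ids)

print(should_run(datetime(2020, 7, 11, 10)))  # Saturday 10:00 -> False
print(should_run(datetime(2020, 7, 13, 10)))  # Monday 10:00 -> True
```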

See also this thread from 2 days ago, Show HN: I built a service to help companies save on their AWS bills [1]

Those threads aren't specific to GPU instances though.

[0] https://news.ycombinator.com/item?id=15587627

[1] https://news.ycombinator.com/item?id=23776894

It is very unlikely that anyone is going to give you good advice with so little information about your cost structure. There are great people here who can provide invaluable insights about your costs, but they need more information.

"We use a lot of GPU-intensive models and 80% of revenue goes to AWS" doesn't mean that your AWS cost is mostly GPU. It should mean that, but who knows. Tell us what your AWS infrastructure looks like, what instances you have, how much they cost you, etc. With the information you've given, the best advice anyone can offer is to use neither AWS nor GPU-intensive ML models.

For the ML models you can also switch to dedicated server providers, such as: https://www.dedispec.com/gpu.php

For storage, there's always Wasabi or B2 with S3-compatible interfaces. If the data itself doesn't change much, so regular backups are possible, just use some dedicated storage servers with hard drives and install MinIO. Don't rely on S3 for outgoing data (much too expensive); use a caching layer on another provider (OVH, Hetzner, ...) or, if it fits your workload, Wasabi ("free" egress).

At a startup I worked at earlier, we tried two things that helped:

1. Reserved instances (you commit for a year and can save 20%, charged monthly; AFAIK no upfront costs).

2. As another reader suggested here, there are accelerators and foundations that give away $10k for the first year towards cloud usage. We were in healthcare and had a big chip company pay about $10k in credits for a year of AWS. Depending on the domain you are in, there may be a few. If you let me know which domain you work in (healthcare, media, etc.), someone here might be able to point you to the right resource.
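Whether a one-year reservation pays off depends on utilization, since you pay for every hour whether or not the instance runs. A minimal sketch using the 20% discount quoted above and the $0.90/hour p2.xlarge on-demand price mentioned elsewhere in this thread; real reserved-instance discounts vary by instance type, term, and payment option:

```python
HOURS_PER_YEAR = 24 * 365


def annual_costs(on_demand_hourly: float, discount: float, utilization: float):
    """Return (on_demand_cost, reserved_cost) for one instance over a year."""
    on_demand = on_demand_hourly * HOURS_PER_YEAR * utilization  # pay per hour used
    reserved = on_demand_hourly * (1 - discount) * HOURS_PER_YEAR  # pay every hour
    return on_demand, reserved


def breakeven_utilization(discount: float) -> float:
    """Utilization above which the reservation is cheaper than on-demand."""
    return 1 - discount


od, ri = annual_costs(0.90, discount=0.20, utilization=0.5)
print(f"on-demand ${od:,.0f} vs reserved ${ri:,.0f}")  # on-demand $3,942 vs reserved $6,307
```

With a 20% discount the reservation only wins if the instance is busy more than 80% of the time, so it suits the steady baseline, not the peaks.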

Without any idea of what your infrastructure looks like, I can't give you anything actionable, but that might be enough advice in and of itself: go after the low-hanging fruit first. What are you spending on? Look at the top two or three services by spend and dig a little deeper.

Are you spending on bandwidth? See if there's compression you can enable. EC2? Can you reduce instance sizes or autoscale down instances you're not using overnight? ElastiCache or Elasticsearch? Tune your usage, store smaller keys or expire things out.

Start by looking at the breakdown of your costs in Cost Explorer. Look for the categories of your biggest spend. Is it storage? EC2? Something else? For storage, see if you can clean up things you don't need anymore, and see if you can move infrequently used data into long-term, cheap storage (but beware retrieval costs!). For EC2, consider changing node types to cheaper ones; newer classes can be much better value for money. Make sure you use spot instances where you can. Focus on the biggest expense first.
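To find the biggest line items programmatically, the Cost Explorer API can group spend by service. A minimal sketch: the boto3 call is shown in comments (it needs credentials), and the function works on the response shape `get_cost_and_usage` returns when grouped by `SERVICE`; the sample amounts are made up:

```python
def top_services(response: dict, n: int = 3) -> list:
    """Sum UnblendedCost per service across all periods; return the n largest."""
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])  # a string in the API
            totals[service] = totals.get(service, 0.0) + amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Fetching a real response (needs boto3 + credentials):
#   ce = boto3.client("ce")
#   response = ce.get_cost_and_usage(
#       TimePeriod={"Start": "2020-06-01", "End": "2020-07-01"},
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}])

sample = {"ResultsByTime": [{"Groups": [
    {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
     "Metrics": {"UnblendedCost": {"Amount": "41000.0", "Unit": "USD"}}},
    {"Keys": ["Amazon Simple Storage Service"],
     "Metrics": {"UnblendedCost": {"Amount": "3500.0", "Unit": "USD"}}},
    {"Keys": ["EC2 - Other"],
     "Metrics": {"UnblendedCost": {"Amount": "6200.0", "Unit": "USD"}}},
]}]}
print(top_services(sample, n=2))
# [('Amazon Elastic Compute Cloud - Compute', 41000.0), ('EC2 - Other', 6200.0)]
```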

Disclosure: I am the founder of Usage.ai

The product my team works on, www.usage.ai, automates the process of finding AWS savings using ML (through reserved instances, savings plans, spots, and rightsizing). Every recommendation is shown on a sleek webpage (we spent a lot of resources on UX).

We haven't fully explored the ML use case, but I'd love to figure out how we can help you drive down the costs associated with your GPU models. Would you have 15 minutes this week for a discussion?

If you're interested, you can reach me at kaveh@usage.ai

Using large data stored in S3?

Make sure you are fetching it via an S3 endpoint in your VPC instead of via public HTTP. Otherwise you may be paying for (expensive) egress or NAT gateway traffic you don't need to be paying for.
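Creating the S3 gateway endpoint is one API call. A minimal sketch against boto3's EC2 `create_vpc_endpoint`; the VPC ID, route table IDs, and region are hypothetical, and a small fake client stands in for boto3 so the example runs offline. Gateway endpoints for S3 carry no hourly charge, but only cover buckets in the same region:

```python
def add_s3_gateway_endpoint(ec2, vpc_id: str, route_table_ids: list, region: str) -> str:
    """Route S3 traffic from the given route tables through a gateway endpoint."""
    resp = ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{region}.s3",
        RouteTableIds=route_table_ids,  # subnets using these tables get the route
    )
    return resp["VpcEndpoint"]["VpcEndpointId"]


class FakeEC2:
    """Records calls instead of talking to AWS (for offline testing)."""

    def __init__(self):
        self.calls = []

    def create_vpc_endpoint(self, **kwargs):
        self.calls.append(kwargs)
        return {"VpcEndpoint": {"VpcEndpointId": "vpce-0example"}}


fake = FakeEC2()
endpoint_id = add_s3_gateway_endpoint(fake, "vpc-0example", ["rtb-0example"], "us-east-1")
# Real use: add_s3_gateway_endpoint(boto3.client("ec2", region_name="us-east-1"), ...)
```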

That's insane that that's the default!

Spot instances. Easy and saves a ton.

If AWS cost is 80% of revenue, and the added cost per customer isn't paying for itself, perhaps one could either charge more or pause customer acquisition?

We had the same problem!

We managed to cut our costs by about two-thirds by doing two things:

1) moving to AWS Batch (this spins up machines to run Docker containers without much hassle; you can also share instances)

2) using spot instances.

Spot instances integrate nicely into Batch, and depending on how you set it up, you can optimise for speed or cost. For example, a p2.xlarge is still $0.90/hour on demand, but on spot it's about $0.25-0.35.

Dedicated server somewhere close to your office.

This thread from yesterday might be useful https://news.ycombinator.com/item?id=23776894

https://github.com/similarweb/finala seems promising

I have a strong suspicion the OP is trolling, or at least his motives aren't obvious.

Check his profile.

He has people, or knows people, who can likely help with this. He's the CEO of an accelerator and is not a newbie by any means.

OR... he's using gamification to find someone to hire to actually help solve this problem. If that's the case... Bravo sir!

Lots of people in here mentioning Reserved Instances, so it's worth mentioning Reserved AI (https://www.reserved.ai/).

We're customers and have been very happy with them, it was super quick to get set up and saves us a big chunk.

A quick Google search for GPU dedicated server is probably going to save you tens of thousands of dollars a year.

There are quick wins with spot instances and also Fargate. It’s hard to say anything without knowing the type of workloads and compute you have. But there is always an opportunity to save there.

Other than that, you should also look at your architecture. Often there is opportunity to save there as well.

[Disclosure - my company sells a cost optimisation product]

1. You are going to get a lot of advice to move to your own hardware. DON'T. Companies use the cloud for the flexibility and lower operational overhead, not because it's cheap. Consider whether your org is mature enough to run its own servers and has the 6 months it will take to get everything set up.

2. Talk to your AWS account manager. They will work their asses off to stop you churning to another provider or to your own hardware, because they know they would lose your revenue entirely for a minimum of 2 years.

3. Switch it off. If you're not using it outside of business hours, you're wasting money. This is the easiest cost saving you will make (my company, GorillaStack, provides a service that makes this easy to set up without scripting, with a 14-day trial).

4. If you have a baseline of servers that you will constantly need, reserved instances offer great savings. There is a secondary market for these, where you can get shorter periods cheap from other customers who don't need them.

5. If you haven't already, look at your bill and the breakdown of services. Cost optimisation consultants (they do exist) will start here, and by attacking the biggest line items first.

They are usually EC2 Compute, EBS, Transfer Costs, etc. Prioritise based on size and ease of implementation.

You should make a habit of checking it at least every few days to keep on top of what is going on.

6. Delete unused resources: you need to be ruthless with developers who leave unused EC2 resources around after creating them. The key isn't to lock down your environment and stop developers from creating what they need, but to enforce a tagging policy on resources to track who owns what. There are crude scripts that scan your environment and delete untagged resources after a certain period.

7. Once things are under control, use CloudWatch billing alarms to get notifications when spending crosses a predefined threshold. These can be connected to SNS for receiving emails (and there are simple deployable solutions for receiving these via Slack webhooks, for example).
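A billing alarm like the one in point 7 can be created with a single boto3 `put_metric_alarm` call on the `AWS/Billing` `EstimatedCharges` metric. A minimal sketch; the threshold and SNS topic ARN are hypothetical, billing metrics are only published in us-east-1, and billing alerts must first be enabled in the account's billing preferences. A fake client makes it runnable offline:

```python
def create_billing_alarm(cloudwatch, threshold_usd: float, sns_topic_arn: str) -> None:
    """Alarm when the month-to-date estimated bill exceeds threshold_usd."""
    cloudwatch.put_metric_alarm(
        AlarmName=f"estimated-charges-over-{int(threshold_usd)}-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",  # month-to-date total
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,  # 6 hours; the metric is only updated several times a day
        EvaluationPeriods=1,
        Threshold=threshold_usd,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],  # SNS topic that emails (or Slacks) you
    )


class FakeCloudWatch:
    """Records alarms instead of calling AWS (for offline testing)."""

    def __init__(self):
        self.alarms = []

    def put_metric_alarm(self, **kwargs):
        self.alarms.append(kwargs)


cw = FakeCloudWatch()
create_billing_alarm(cw, 20_000, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
# Real use: create_billing_alarm(boto3.client("cloudwatch", region_name="us-east-1"), ...)
```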

Some further advice: 'right-sizing' is often held up as an important cost-saving method, but can often be much more trouble than it's worth. If your workload is going to be a pain and require endless planning and regression testing when you switch instance size, reconsider: you will waste more in developer time than the cost savings over a few years.

Make sure to tag every instance/resource/disk in AWS with its purpose, team, etc.
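Enforcing that is easier with a regular check for untagged resources. A minimal sketch, written against the response shape of boto3's EC2 `describe_instances`; the required tag keys and the sample data are hypothetical, and reporting or stopping offenders is safer than deleting them outright:

```python
def untagged_instances(describe_response: dict, required=("purpose", "team")) -> list:
    """IDs of instances missing any of the required tag keys."""
    offenders = []
    for reservation in describe_response["Reservations"]:
        for instance in reservation["Instances"]:
            # Instances with no tags at all have no "Tags" key.
            keys = {tag["Key"] for tag in instance.get("Tags", [])}
            if not set(required) <= keys:
                offenders.append(instance["InstanceId"])
    return offenders


sample = {"Reservations": [{"Instances": [
    {"InstanceId": "i-0aaa", "Tags": [{"Key": "purpose", "Value": "inference"},
                                      {"Key": "team", "Value": "ml"}]},
    {"InstanceId": "i-0bbb", "Tags": [{"Key": "team", "Value": "ml"}]},
    {"InstanceId": "i-0ccc"},
]}]}
print(untagged_instances(sample))  # ['i-0bbb', 'i-0ccc']
# Real use: iterate ec2.get_paginator("describe_instances").paginate()
# and run untagged_instances() on each page.
```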

Then you can go into AWS Cost Explorer and see the cost breakdown per tag (the tags need to be activated as cost allocation tags first).

Usually a few resources will stand out: 80% of the costs come from 20% of the resources. Find out what they are and cut.
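Finding the few resources that make up most of the bill is a sort plus a running total. A minimal sketch; the per-resource monthly costs are hypothetical and could come, for example, from a Cost Explorer export grouped by tag:

```python
def pareto_set(costs: dict, share: float = 0.8) -> list:
    """Smallest set of resources (largest first) covering at least `share` of total cost."""
    total = sum(costs.values())
    picked, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        picked.append(name)
        running += cost
    return picked


monthly = {"gpu-inference": 30_000, "gpu-training": 12_000, "rds": 3_000,
           "s3": 2_500, "nat-gateway": 1_500, "misc": 1_000}
print(pareto_set(monthly))  # ['gpu-inference', 'gpu-training']
```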

Lots of comments with tips on “how”, but your last paragraph makes it sound like you are looking for a “who”.

I’ve heard good things about https://www.duckbillgroup.com/

If you are growing and/or have funding, the other cloud providers will throw credits at you.

If you aren't growing and don't have funding, then go to a less expensive host. There are tons of high-quality hosts out there that are quite a bit less expensive.

Please check this Show HN thread - https://news.ycombinator.com/item?id=23776894

A developer has created an app for this same purpose

Depending on the entropy of your input, caching may be a way out. Sometimes, if you can't cache the end result, you can cache some intermediate results.
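A minimal sketch of caching model outputs keyed on a hash of the canonicalised input, with a bounded LRU so memory stays capped. The model function is a hypothetical stand-in; the same wrapper works for an intermediate stage (say, an embedding step) when end results vary too much to cache:

```python
import hashlib
import json
from collections import OrderedDict


class InferenceCache:
    """LRU cache in front of an expensive model call."""

    def __init__(self, model_fn, max_entries: int = 10_000):
        self.model_fn = model_fn
        self.max_entries = max_entries
        self.store = OrderedDict()
        self.misses = 0

    def predict(self, request: dict):
        # sort_keys makes logically identical requests hash identically.
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        result = self.model_fn(request)
        self.store[key] = result
        if len(self.store) > self.max_entries:
            self.store.popitem(last=False)  # evict least recently used
        return result


cache = InferenceCache(lambda req: len(req["text"]))  # stand-in "model"
cache.predict({"text": "hello", "lang": "en"})
cache.predict({"lang": "en", "text": "hello"})  # same request, served from cache
print(cache.misses)  # 1
```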

I would assume that if you are big enough you may be able to negotiate some pricing.

1Cloudhub, an AWS partner, has a very affordable product called Costimize for AWS to help save costs. I will be happy to help. Contact Sankar.nagarajanAT1cloudhub.com

Secondly, you can use AWS spot instances for GPUs, which cost less.

Reserved instances are usually the lowest-hanging fruit, depending on your usage profile and how much you can commit to and/or pay for up front. Savings of 30%+ are very achievable.

AWS likely has a retention department that can give you discounts or credits to make you stay. Ask for credits and use the extra time to set up your own hardware.

If your computational load is spiky, I would suggest looking at Fargate and the spot market. Also, for storage, I would suggest leveraging S3 whenever possible.

The answer is simple: don't use AWS. You will never get out of this hole unless you move from AWS, because AWS is not scalable budget-wise.

Take a look at Cost Explorer.

Spot instances are low-hanging fruit if you can manage stateless sessions.

If you have multiple snapshots, those could cost money as well.

It would be helpful to know which services you use. Do you use ML services or instances with GPU? Where is most of the cost?

Post your bill. You often have unexpected charges for ingress or for services you are unaware of.

An acquaintance of mine has a business that specializes in the problem you’re facing. Please feel free to reach out to them: https://www.taloflow.ai/

Unless GPUs are an absolute must, just use Hetzner and never look back at AWS.

Switch from RDS to Lightsail instances for trivial DB workloads.

Could also apply for EC2

If you're using deep learning models, consider distilled and/or quantized models to reduce the resources required for inference.
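To show what quantization does, here is the arithmetic of symmetric per-tensor int8 quantization in plain Python: each weight becomes an 8-bit integer plus one shared scale, a quarter of the memory of float32. This is an illustrative sketch only; in practice you would use your framework's quantization tooling, which also handles calibration and int8 kernels:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [42, -127, 0, 90]
# Rounding error per weight is at most scale / 2.
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2)  # True
```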

Reserved instances? Instant 20-40% savings.

Real hardware with colocation

Blog article about a real-life AWS cost optimization strategy:



- Monitor your daily costs

- Cloudify your workloads

- Use reservations, Spot instances, saving plans

- Look around for unused infrastructure

- Optimize S3 consumption

In our experience, this is a Sisyphean task. The number of elements that get charged is just enormous.

These are the biggest ways to lower cost that I've used in the past. With a high burn rate, it's important to focus on the things that can change the economics on a short timeline (think next week) as well as on a longer timeline (next year). You should have a plan in place for your board and be able to discuss the cost-reduction strategy for Cost of Goods Sold in any future financing rounds. Carefully consider the full TCO: buying colo hardware means opting out of ~3 years of future price reductions and hardware improvements in the cloud, plus the opportunity cost.

1) Call your provider and find out what options they have to cut your cost. This can take the form of discounts, credits, or increased reservations

2) It's not uncommon for ML teams to have excess capacity sitting around for forgotten R&D activities. Make sure that your team is tearing down hardware, and consider giving every scientist their own dedicated workstation for model development. You can smoke-test the opportunity here by checking whether the GPUs are actually being utilized at ~40-80% average capacity.

3) Really dive into whether you need the parameters/model architecture you have. The best model for your company will need to balance latency and cost against accuracy. If you're using a transformer where a CNN, or even a logistic regression with smart feature extractors, could do the job with 1% accuracy loss, do your customers really need the transformer?

4) As others have suggested, drill down on the inference and training costs. Train less frequently, not at all, or on a sample of your data. Generally, the benefit of using more data in a model is logarithmic at best vs. the linear growth in training time.

5) Buy your own hardware. Particularly for GPU inference, RTX cards can be purchased in servers for your own colo, but not in the clouds. The lead time would be a few months, but the payoff could occur within ~2-6 months in a colo.

6) Leaving this here as it used to affect Analytics/Ad-Tech and other "big-data" companies. Programming languages are not created equal in performance, and given equal implementations a statically typed language will crunch data between 10 and 1000x faster and cheaper than a dynamically typed language. If your business is COGS pressed then your team will probably spend more time trying to optimize hardware deployments and squeezing perf out of your dynamic language than you gain in productivity. Drill down on your costs and check how much of it is raw data-processing/transaction scheduling/GPU scheduling and make sure that you're on the right tech path for your customers.
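For the utilization smoke test in point 2, `nvidia-smi` can report per-GPU utilization. A minimal sketch; the 40% floor mirrors the range quoted above, the periodic sampling loop is left out, and keep in mind that `utilization.gpu` measures the share of time a kernel was running, not how much of the GPU's capacity it used:

```python
import subprocess


def parse_utilization(csv_text: str) -> list:
    """Parse nvidia-smi output: one integer percentage per GPU per line."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]


def sample_utilization() -> list:
    """One reading per GPU on this machine (needs the NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


def underused_gpus(samples: list, floor: float = 40.0) -> list:
    """Indices of GPUs whose mean utilization across samples is below floor (%)."""
    n_gpus = len(samples[0])
    means = [sum(s[i] for s in samples) / len(samples) for i in range(n_gpus)]
    return [i for i, m in enumerate(means) if m < floor]


# Offline example with two canned readings from a 2-GPU box;
# in practice, call sample_utilization() periodically over a day or a week.
readings = [parse_utilization("5\n90\n"), parse_utilization("12\n70\n")]
print(underused_gpus(readings))  # [0]
```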

Lastly, at an 80% Cost of Goods Sold (COGS), it's quite possible that your business is either low margin or, as this is a new startup, that the pricing structure isn't well aligned. Ask yourself whether you expect to raise prices for future non-founding customers. If so, your current customers may be helping reduce your marketing expenditure, and you may be able to leverage the relationship to help "sell" to future customers.
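The pricing question can be sanity-checked with one line of arithmetic. A minimal sketch, assuming costs stay fixed and no customers churn at the higher price (both big assumptions); the 60% target margin is a hypothetical example:

```python
def price_multiplier_for_margin(cogs_ratio: float, target_margin: float) -> float:
    """Factor by which revenue must grow, with COGS fixed, to reach target_margin.

    cogs_ratio:    current COGS / revenue (0.8 when AWS eats 80% of revenue)
    target_margin: desired gross margin, e.g. 0.6 for 60%
    """
    # Gross margin m = 1 - COGS / revenue, so the required revenue is COGS / (1 - m).
    # Dividing by current revenue (= COGS / cogs_ratio) gives the multiplier.
    return cogs_ratio / (1 - target_margin)


print(price_multiplier_for_margin(0.8, 0.6))  # 2.0, i.e. prices would need to double
```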

Don’t use GPUs at inference (serving) time unless you prove that you need to.

The only consistent case when I’ve found it’s needed (across a variety of NLP & computer vision services that have latency requirements under 50 milliseconds) is for certain very deep RNNs, especially for long input sequence lengths and large vocabulary embeddings.

I’ve never found any need for it with deep, huge CNNs for image processing.

Also consider a queue system if utilization becomes a problem after switching away from GPUs. Create batch endpoints that accept small batches, like 8-64 instances, and put a queue system in front to handle collating and uncollating batch calls from the stream of all incoming requests (this is good for GPU services too).
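A minimal sketch of that collate/uncollate queue using Python's asyncio. The batch function is a stand-in for a model that accepts a list of inputs, and `max_batch` and `max_wait` are hypothetical defaults to tune against your latency budget. Each request waits at most `max_wait` for its batch to fill, trading a little latency for throughput:

```python
import asyncio
from typing import Any, Callable, List


class MicroBatcher:
    """Groups single requests into batches of up to max_batch items,
    waiting at most max_wait seconds for a batch to fill."""

    def __init__(self, batch_fn: Callable[[List[Any]], List[Any]],
                 max_batch: int = 32, max_wait: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch = max_batch
        self.max_wait = max_wait
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: Any) -> Any:
        """Called once per incoming request; resolves with that item's result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until one request arrives
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.queue.get_nowait())
                except asyncio.QueueEmpty:
                    if loop.time() >= deadline:
                        break
                    await asyncio.sleep(0.001)  # give more requests a chance to arrive
            items = [item for item, _ in batch]
            try:
                results = self.batch_fn(items)  # collate: one model call per batch
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):  # uncollate
                if not fut.done():  # the caller may have given up already
                    fut.set_result(result)


async def demo():
    sizes = []

    def double_all(xs):  # stand-in batched "model"
        sizes.append(len(xs))
        return [2 * x for x in xs]

    batcher = MicroBatcher(double_all, max_batch=8, max_wait=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(i) for i in range(20)))
    worker.cancel()
    return results, sizes


results, sizes = asyncio.run(demo())
print(results[:4], sizes)
```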
