
Ask HN: How can startups running ML models reduce their compute costs? - rococode
Machine learning stuff seems generally expensive to run in production, especially at scale (how many GPU machines do you need up and running to serve 100 concurrent users without major delays?). It seems especially hard for B2C/freemium-style companies to offset those costs.

If, for example, you're a freemium web app where each user request takes 100ms of inference time, you may have trouble turning a profit. Even something as basic as a translation tool (à la Google Translate) seems tricky for a startup to run at a profit. Larger companies presumably have the flexibility to treat some of these things as loss leaders, whereas a startup with ML as its core product likely needs to be more conscious about cost.

What are some ways that you've seen companies cut down on compute costs for machine learning services?
======
sidlls
Batch as much as possible. For companies that genuinely need ML, the
temptation to make everything real-time/on-demand is high, but it's often
unnecessary. Take a serious, hard look at where that line is. It's like
companies reaching for ML when simple rules or analytics would do: an
unnecessary expense in both development time and hard cash.
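To make the batching point concrete, here's a minimal micro-batching sketch
(not from the comment itself; the function name, batch size, and wait time
are illustrative assumptions): incoming requests sit in a queue, and a
worker pulls up to N of them, waiting only a short time for the batch to
fill, so the model can handle them in one forward pass instead of N.

```python
import time
from queue import Queue, Empty

def collect_batch(q, max_batch=8, max_wait_s=0.05):
    """Pull up to max_batch items from q, waiting at most max_wait_s
    total for the batch to fill before returning whatever arrived.
    A GPU worker would then run one forward pass over the whole batch."""
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break  # queue drained before the batch filled
    return batch
```

Under load the queue fills faster than the deadline expires, so batches are
full and per-request GPU cost drops; under light load the deadline caps the
extra latency at max_wait_s.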

------
consultutah
If you can batch things up and don't need real-time returns, spot instances
are a great way to lower costs.
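Since spot instances can be reclaimed on short notice, batch jobs that run
on them usually checkpoint progress so a replacement instance can resume
rather than restart. A minimal sketch of that pattern (the function and
checkpoint format are illustrative assumptions, not from the comment):

```python
import json
import os

def run_batch_job(items, process, ckpt_path="progress.json"):
    """Process items in order, persisting the next index after each one
    so a job interrupted by a spot reclaim can resume where it left off."""
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_index"]
    results = []
    for i in range(start, len(items)):
        results.append(process(items[i]))
        # Checkpoint after every item; in practice you'd checkpoint
        # every N items to cut write overhead.
        with open(ckpt_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return results
```

In practice the checkpoint would live on durable storage (e.g. object
storage) rather than the instance's local disk, and the per-item work would
be sized so losing one in-flight item is cheap.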

