Machine learning models seem generally expensive to run in production, especially at scale. For example, how many GPU machines do you need up and running to serve 100 concurrent users without major delays? It seems especially hard for B2C/freemium-style companies to offset those costs.
If, for example, you're a freemium web app where each user request takes 100 ms of inference time, you may have trouble turning a profit, since most of those requests come from users who aren't paying. Even something as basic as a translation tool (à la Google Translate) seems tricky for a startup to run profitably. Larger companies presumably have the flexibility to treat some of these services as loss leaders, whereas a startup with ML as its core product likely needs to be more cost-conscious.
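To make the worry concrete, here's a rough back-of-envelope sketch in Python. Only the 100 ms inference time and the 100 users come from the scenario above. The request rate (one request per user per second), the $1.00/hour GPU price, the 50% utilization, and the assumption that each GPU handles one request at a time with no batching are all placeholders I made up for illustration, not real quotes or measurements.

```python
import math


def gpus_needed(requests_per_second, inference_ms, utilization=1.0):
    """GPUs required if each GPU serves one request at a time (no batching)."""
    # GPU-seconds of work arriving every wall-clock second.
    busy_gpu_seconds_per_second = requests_per_second * inference_ms / 1000
    # Lower utilization means more idle headroom, hence more GPUs.
    return math.ceil(busy_gpu_seconds_per_second / utilization)


def cost_per_request(gpu_hourly_usd, inference_ms, utilization=1.0):
    """GPU cost attributable to a single request, in USD."""
    requests_per_gpu_hour = 3_600_000 / inference_ms * utilization
    return gpu_hourly_usd / requests_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical inputs: 100 users each sending 1 request/second,
    # 100 ms per inference, GPUs kept 50% busy, $1.00 per GPU-hour.
    print(gpus_needed(requests_per_second=100, inference_ms=100, utilization=0.5))
    print(cost_per_request(gpu_hourly_usd=1.00, inference_ms=100, utilization=0.5))
```

The per-request figure looks tiny on its own, but under these assumptions the GPUs are billed around the clock whether or not anyone is paying, which is where the freemium math seems to get uncomfortable.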
What are some ways that you've seen companies cut down on compute costs for machine learning services?