Ask HN: How to offer AI services with my H100?
2 points by jdthedisciple 28 days ago | 4 comments
I have a Linux server with Gigabit internet connectivity, which I recently upgraded with a single H100.

Given that models are becoming smaller, more efficient, and faster to run, I am considering offering AI services (LLMs, RAG, summarization, custom ...) to local clients (professionals, small businesses, private individuals) who might opt for such services in return for privacy and first-level support.

How would I go about setting it up, in terms of

    - resource management
    - resource sharing / load balancing
    - monitoring
    - usage-based charging
Some direction and advice would be greatly appreciated.

Thanks and Happy New Year!




You can check https://openmeter.io for usage metering and billing.
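OpenMeter handles this end to end, but the core of usage-based charging is just per-client token accounting that your proxy updates on every request. A toy sketch (the prices, client ID, and token counts below are made-up assumptions, not real rates):

```python
from collections import defaultdict

# Assumed illustrative prices per 1K tokens; real numbers depend on
# power, amortized hardware cost, and what local clients will pay.
PRICE_PER_1K_INPUT = 0.0005
PRICE_PER_1K_OUTPUT = 0.0015

usage: dict[str, dict[str, int]] = defaultdict(
    lambda: {"input": 0, "output": 0}
)

def record(client_id: str, input_tokens: int, output_tokens: int) -> None:
    """Called by the serving proxy after each completed request."""
    usage[client_id]["input"] += input_tokens
    usage[client_id]["output"] += output_tokens

def invoice(client_id: str) -> float:
    """Total owed by a client for the current billing period."""
    u = usage[client_id]
    return round(
        u["input"] / 1000 * PRICE_PER_1K_INPUT
        + u["output"] / 1000 * PRICE_PER_1K_OUTPUT,
        6,
    )

record("acme", 12000, 3000)
print(invoice("acme"))  # 12 * 0.0005 + 3 * 0.0015
```

In practice you would persist the counters (SQLite is plenty at this scale) rather than keep them in memory.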


If you didn't consider these things before buying an H100 then I frankly wonder why you even bought it in the first place. Was CUDA burning a $30,000 hole in your pocket or something?


I acquired it to experiment with models firsthand, and so far I've built myself a nice RAG-based web app (in a rather niche field).

Would be nice if it paid for itself somehow, though.


Fair enough; but to anyone out there reading this: the cheapest available RTX with the memory you need is likely just fine for experimentation, unless you're dead set on running some notoriously heavy, large-VRAM models. No need for such heavy equipment just to experiment.

Although I understand the plight -- I routinely buy heavy equipment far beyond what I need and then fruitlessly hope it will pay for itself. I have a garage full of such purchases!



