I have a Linux server with Gigabit internet connectivity, which I recently upgraded with a single H100.
Given that models are becoming smaller, more efficient, and faster to run, I am considering offering AI services (LLMs, RAG, summarization, custom ...) to local clients (professionals, small businesses, private individuals) who may opt for such services in return for privacy and first-level support.
How would I go about setting this up, in terms of:
- resource management
- resource sharing / load balancing
- monitoring
- usage-based charging
Some direction and advice would be greatly appreciated.
Thanks and Happy New Year!