Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> How to utilize multiple NVIDIA GPUs?

| Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm) accordingly.

So using 2 NVLinked GPU's with inference is not supported? Or is that situation different because NVLink treats the two GPU as a single one?



> So using 2 NVLinked GPU's with inference is not supported?

To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example


I see. So this is like, I can have tabby be my LLM server with this limitation or I can just turn that feature off and point tabby at my self hosted LLM as any other OpenAI compatible endpoint?


Yes - however, the FIM model requires careful configuration to properly set the prompt template.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: