devnen's comments

devnen · 2025-08-06T10:17:16

That's a great point about the dependencies.

To make the setup easier and to add a few features people have been asking for in this thread (such as GPU support and long-text handling), I built a self-hosted server for this model: https://github.com/devnen/Kitten-TTS-Server

The goal was a setup that "just works", using a standard Python virtual environment to avoid dependency conflicts.

Installation is the usual routine: git clone the repo, pip install inside a venv, and run python server.py.
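Long-text handling in TTS servers commonly means splitting the input into sentence-sized chunks, synthesizing each chunk, and concatenating the audio. The Python sketch below illustrates only the splitting step under that assumption. It is not taken from Kitten-TTS-Server, whose actual method may differ, and the function name chunk_text and the 300-character default are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    Hypothetical helper. Sentences longer than max_chars are split on
    whitespace; a single word longer than max_chars is kept whole.
    """
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence]
        if len(sentence) > max_chars:
            # Fall back to word-bounded pieces for oversized sentences.
            pieces = []
            piece = ""
            for word in sentence.split():
                candidate = f"{piece} {word}".strip()
                if len(candidate) > max_chars and piece:
                    pieces.append(piece)
                    piece = word
                else:
                    piece = candidate
            if piece:
                pieces.append(piece)
        # Greedily pack pieces into the current chunk until it would overflow.
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences keeps prosody natural at chunk boundaries; the word-level fallback only kicks in for sentences that would not fit on their own.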

k4rnaj1k · 2025-08-06T10:50:48

Oh wow, really impressive. How long did this take you to make?

devnen · 2025-08-07T07:09:57

It didn't take too long. I already have two similar projects for the Dia and Chatterbox TTS models, so I just needed to adapt a few files.

devnen · 2025-04-22T20:41:19

This is really impressive work, and the dialogue quality is fantastic.

For anyone wanting a quick way to spin this up locally with a web UI and API access, I put together a FastAPI server wrapper around the model: https://github.com/devnen/Dia-TTS-Server

Setup is just a standard pip install -r requirements.txt (works on Linux and Windows). It pulls the model from Hugging Face automatically, defaulting to the faster BF16 safetensors (ttj/dia-1.6b-safetensors), though that's configurable in the .env file.

You get an OpenAI-compatible API endpoint (/v1/audio/speech) for easy integration, plus a custom endpoint (/tts) that exposes all the Dia parameters. The web UI gives you a simple way to type text, adjust sliders, and test voice cloning. It uses your CUDA GPU if one is configured and otherwise runs on the CPU.

Might be a useful starting point or testing tool for someone. Feedback is welcome!
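For anyone integrating against the OpenAI-compatible endpoint, here is a minimal stdlib-only Python client sketch. It assumes the endpoint accepts an OpenAI-style JSON body (model, input, voice, response_format) and returns raw audio bytes. The base URL, the model name, and any voice value you pass are placeholders, not confirmed server defaults, so check the server's config and docs for the real ones.

```python
import json
import urllib.request

# Placeholder: replace with the host/port the server actually listens on.
BASE_URL = "http://localhost:8000"


def build_speech_request(text: str, voice: str, base_url: str = BASE_URL,
                         response_format: str = "wav") -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /v1/audio/speech endpoint."""
    payload = {
        "model": "dia-1.6b",  # hypothetical; the server may ignore or require another value
        "input": text,
        "voice": voice,  # accepted voice names depend on the server
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, base_url)
    # Generous timeout: synthesis on CPU can be slow.
    with urllib.request.urlopen(request, timeout=300) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, a call like synthesize("Hello from Dia.", voice="default", out_path="out.wav") would save the response to disk, where "default" is a hypothetical voice name. Building the request separately from sending it makes the payload easy to inspect or adapt for the custom /tts endpoint, whose parameters are not covered here.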