Hacker News | devnen's comments

That's a great point about the dependencies.

To make the setup easier and add a few features people are asking for here (like GPU support and long text handling), I built a self-hosted server for this model: https://github.com/devnen/Kitten-TTS-Server

The goal was a setup that "just works" using a standard Python virtual environment to avoid dependency conflicts.

The setup is just the standard git clone, pip install in a venv, and python server.py.
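For anyone who wants the exact commands, a minimal sketch of that standard flow (the repo name comes from the link above; the venv name and the Windows activation path are my own conventions, and the requirements file name is an assumption — check the repo's README):

```shell
# Clone the server and enter the directory
git clone https://github.com/devnen/Kitten-TTS-Server.git
cd Kitten-TTS-Server

# Create and activate an isolated virtual environment
# (on Windows: venv\Scripts\activate)
python -m venv venv
source venv/bin/activate

# Install dependencies into the venv, then start the server
pip install -r requirements.txt
python server.py
```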


Oh wow, really impressive. How long did this take you to make?


It didn't take too long. I already have two similar projects for the Dia and Chatterbox TTS models, so I only needed to adapt a few files.


This is really impressive work and the dialogue quality is fantastic.

For anyone wanting a quick way to spin this up locally with a web UI and API access, I put together a FastAPI server wrapper around the model: https://github.com/devnen/Dia-TTS-Server

The setup is just a standard pip install -r requirements.txt (works on Linux/Windows). It pulls the model from HF automatically – defaulting to the faster BF16 safetensors (ttj/dia-1.6b-safetensors), but that's configurable in the .env. You get an OpenAI-compatible API endpoint (/v1/audio/speech) for easy integration, plus a custom one (/tts) to control all the Dia parameters. The web UI gives you a simple way to type text, adjust sliders, and test voice cloning. It'll use your CUDA GPU if you have one configured; otherwise it runs on the CPU.
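To illustrate the OpenAI-compatible endpoint, here's a rough client sketch. The port, model name, and voice value are assumptions on my part (check the repo's README for the real defaults); only the /v1/audio/speech path and the OpenAI-style request shape come from the description above.

```python
import json
import urllib.request

def build_speech_request(text, voice="S1", response_format="wav"):
    """Build an OpenAI-style payload for POST /v1/audio/speech.

    The model/voice names here are placeholders, not the server's
    documented values.
    """
    return {
        "model": "dia-1.6b",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }

payload = build_speech_request("[S1] Hello from Dia!")
print(json.dumps(payload))

# To actually synthesize (requires the server running locally;
# port 8003 is a guess):
# req = urllib.request.Request(
#     "http://localhost:8003/v1/audio/speech",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# audio_bytes = urllib.request.urlopen(req).read()  # raw audio
```

Because the request shape matches OpenAI's audio API, existing OpenAI client code should work by just pointing the base URL at the local server.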

Might be a useful starting point or testing tool for someone. Feedback is welcome!

