Thanks for this; I actually appreciate the honesty. It is always difficult for me to parse the actual quality of things I don't have intimate experience with.
Can I ask another question? If I wanted to hack around with STT and TTS (inference only) on a Pi (4B+), is there anything approximately appropriate that can be done on-device? (I could process on my main machine, but I'd love to do it on the Pi even with a decent delay.)
They provide support for running on a Raspberry Pi and it runs in real time. I have tried the desktop version and the quality is good enough when the audio is clean.
There are other ML TTS models that are lightweight and can run on a CPU. Check out Glow-TTS for something that will probably work.
Also, swap out the HiFi-GAN vocoder for MelGAN or MB-MelGAN, as they're faster for CPU-only inference and will better suit your use case.
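If the library in question is Coqui TTS (which is where those Glow-TTS / MB-MelGAN names come from), here's a rough sketch of pairing the two for CPU-only inference. The model names and the ModelManager/Synthesizer calls are assumptions based on the released model zoo, so double-check them against whatever version you actually install:

```python
# Sketch: Glow-TTS spectrogram model + MB-MelGAN vocoder, CPU only.
# Assumes the Coqui TTS package (`pip install TTS`) and its released
# LJSpeech checkpoints; verify model names with `tts --list_models`.
from pathlib import Path

import TTS
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer

# The package ships a .models.json index of downloadable checkpoints.
manager = ModelManager(Path(TTS.__file__).parent / ".models.json")

# Download (or reuse cached) checkpoints for the model and the vocoder.
tts_ckpt, tts_config, _ = manager.download_model("tts_models/en/ljspeech/glow-tts")
voc_ckpt, voc_config, _ = manager.download_model("vocoder_models/en/ljspeech/multiband-melgan")

# Wire them together; use_cuda=False keeps everything on the CPU.
synth = Synthesizer(
    tts_checkpoint=tts_ckpt,
    tts_config_path=tts_config,
    vocoder_checkpoint=voc_ckpt,
    vocoder_config=voc_config,
    use_cuda=False,
)

wav = synth.tts("Testing Glow-TTS with a multi-band MelGAN vocoder on CPU.")
synth.save_wav(wav, "out.wav")
```

If I remember right, the `tts` CLI can do the same pairing via `--model_name` and `--vocoder_name` if you'd rather not write any Python.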
I ran this exact setup on cheap DigitalOcean droplets (no GPU) and it ran faster than real time, so it should work on a Pi.
Unfortunately I'm not aware of STT models that operate under these same hardware constraints, but you should be good to go for TTS. With a little bit of poking around, I'm sure you can find a solution for STT too.