There are other ML TTS models that are lightweight and run fine on a CPU. Check out Glow-TTS for something that will probably work.
Also swap out the HiFi-GAN vocoder for MelGAN or MB-MelGAN (Multi-band MelGAN). Both are lighter vocoders and generally faster on CPU, so they should suit your use case better.
I ran this exact setup on cheap DigitalOcean droplets (no GPUs) and it synthesized faster than real time, i.e. generating the audio took less time than playing it back. It should work on a Pi.
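If you want to check "faster than real time" on your own hardware, the usual measure is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where anything below 1.0 is faster than real time. Here's a rough sketch of how you might measure it. The `synthesize` callable, `fake_synthesize`, and the 22050 Hz sample rate are placeholders I made up for illustration, not part of any particular TTS library; swap in whatever call your Glow-TTS + MelGAN pipeline exposes.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and return its real-time factor.

    `synthesize` is assumed to return the raw audio samples (mono).
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)


def fake_synthesize(text: str) -> list:
    # Hypothetical stand-in for a real TTS call: returns 2 seconds
    # of silence at an assumed 22050 Hz sample rate.
    return [0.0] * (22050 * 2)


if __name__ == "__main__":
    rtf = measure_rtf(fake_synthesize, "Hello from the Pi.", sample_rate=22050)
    print(f"RTF: {rtf:.4f} ({'faster' if rtf < 1.0 else 'slower'} than real time)")
```

On a Pi I'd run it on a few sentences of different lengths and skip the first call, since model loading and warm-up tend to skew the first measurement.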
Unfortunately I'm not aware of STT models that operate under these same hardware constraints, but you should be good to go for TTS. With a little bit of poking around, I'm sure you can find a solution for STT too.