Depending on the TTS model being used, latency can be reduced further with an LRU cache, fetching common phrases from the cache instead of generating them fresh with TTS.
However, how natural the result sounds will depend on how the TTS model works and whether two identical chunks of text sound alike on every generation.
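A rough sketch of what I mean, in Python. Everything here is hypothetical: `PhraseAudioCache`, the `synthesize` callable, and the 256-entry limit are placeholders rather than any real TTS library's API, and collapsing whitespace to build the cache key is just one assumption about what counts as the "same" phrase.

```python
from collections import OrderedDict
from typing import Callable


class PhraseAudioCache:
    """LRU cache mapping a phrase to its previously synthesized audio."""

    def __init__(self, synthesize: Callable[[str], bytes], max_items: int = 256):
        # `synthesize` is a stand-in for whatever TTS call you actually use.
        self._synthesize = synthesize
        self._max_items = max_items
        self._items = OrderedDict()  # key -> audio bytes, oldest first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: phrases differing only in whitespace are the same phrase.
        return " ".join(text.split())

    def get(self, text: str) -> bytes:
        key = self._key(text)
        if key in self._items:
            # Cache hit: mark as most recently used and skip the TTS call.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]

        # Cache miss: pay the TTS latency once, then remember the audio.
        self.misses += 1
        audio = self._synthesize(text)
        self._items[key] = audio
        if len(self._items) > self._max_items:
            # Evict the least recently used phrase.
            self._items.popitem(last=False)
        return audio
```

A cache hit replays exactly the same audio every time, so a cached phrase may not match the prosody of freshly generated audio played right before or after it. Whether that is noticeable depends on the model.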
As someone who loved the Moomintroll illustrations, I find this both familiar and hilarious. I suppose I might have a different opinion if I'd actually read any of Tolkien's works.
> "She even made some of the characters especially tiny to elevate the landscapes."
I wish there were more examples of this in the images shown in the article.
Jansson’s illustrations were reused in one of the annual Tolkien calendars. I kept it on my wall for years, changing the month every so often. So… 13 illustrations reprinted