One interesting note with voice AI is that you can shove static datasets into the long context windows of newer models like Gemini 2.0 Flash-Lite. This creates a kind of Model Assisted Generation (MAG): the bot gets very low latency and close to 99% relevant information, with no retrieval step. There's a good example of this in the foundational examples of the Pipecat GitHub repo.
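A minimal sketch of the idea: pin the static dataset at the front of every request so the long-context model answers from it directly. The dataset, instruction text, and `build_messages` helper below are illustrative assumptions, not Pipecat's actual API — in a real Pipecat bot you would put this in the context you hand to the LLM service.

```python
# Illustrative "context stuffing" sketch (hypothetical names, not Pipecat's API):
# the static dataset rides along in the system message on every turn, so a
# long-context model (e.g. Gemini 2.0 Flash-Lite) can answer without retrieval.

STATIC_DATASET = """\
Q: What are your store hours?
A: 9am-5pm, Monday through Friday.
Q: Do you ship internationally?
A: Yes, to most countries.
"""

def build_messages(user_query: str) -> list[dict]:
    """Return an OpenAI-style message list with the dataset pinned up front."""
    return [
        {
            "role": "system",
            "content": "Answer only from the reference data below.\n\n" + STATIC_DATASET,
        },
        {"role": "user", "content": user_query},
    ]

messages = build_messages("Do you ship internationally?")
```

The trade-off is token cost per request versus latency: you pay to resend the dataset every turn, but you skip the vector-store lookup entirely, which is what makes it attractive for voice.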