I've been actively exploring ways to achieve a seamless and natural conversation with an AI.
I played with detecting silences and punctuation (STT can detect e.g. question marks), but this is clearly not enough for turn detection.
I think you made a huge step in that direction.
Did you write an article or blog post about how you trained your model?
I'd love to make this work with multiple languages, or things like rhetorical question detection.
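
For context, the silence-plus-punctuation heuristic I mentioned can be sketched roughly like this. It is only an illustration: `Segment`, `is_end_of_turn`, and both thresholds are hypothetical names and values I made up for the example, not part of any STT library or of your model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical STT output: partial transcript plus trailing silence."""
    text: str
    trailing_silence_ms: int


def is_end_of_turn(
    seg: Segment,
    long_silence_ms: int = 700,   # assumed threshold, needs tuning
    short_silence_ms: int = 300,  # assumed threshold, needs tuning
) -> bool:
    """Naive rule: terminal punctuation shortens the silence we wait for."""
    ends_sentence = seg.text.rstrip().endswith((".", "?", "!"))
    required = short_silence_ms if ends_sentence else long_silence_ms
    return seg.trailing_silence_ms >= required
```

The weakness shows up quickly: `is_end_of_turn(Segment("I was thinking that", 900))` returns `True` even though the speaker is obviously mid-sentence and just pausing to think, and a rhetorical question ending in `?` is treated exactly like a real one.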