Thanks for these thoughts and compliments. I love the idea of preventing landfill with this tech. Our team is awesome and we really love our customers and all the jobs that can be done with this kind of tech!
We're partnering with GPU infrastructure providers like Replicate. We've also done some engineering to bring down our stack's cold and warm boot times: with sufficient caches on disk, and potentially a snapshot of a running process's memory, we can get cold/warm boots to under 5 seconds. We're making progress on this every week.
We have the ability to send phonetic pronunciations as guidance, and this could be a great addition to our LLM/response-generation stack: add a check for names, then insert the phonemes.
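A minimal sketch of what that name-to-phoneme check could look like. The name list, IPA strings, and `<phoneme>` tag format here are illustrative assumptions, not an actual API:

```python
# Hypothetical sketch: scan generated text for known names and attach
# phonetic guidance before handing the response to TTS.
import re

# Example pronunciation map (IPA); entries are made up for illustration.
PHONEMES = {
    "Siobhan": "ʃɪˈvɔːn",
    "Nguyen": "ŋwiə̯n",
}

def add_pronunciations(text: str) -> str:
    """Wrap known names in a phoneme hint so the TTS engine says them right."""
    for name, ipa in PHONEMES.items():
        # \b word boundaries keep us from matching inside longer words
        text = re.sub(
            rf"\b{re.escape(name)}\b",
            f'<phoneme ph="{ipa}">{name}</phoneme>',
            text,
        )
    return text

print(add_pronunciations("Please welcome Siobhan to the call."))
```

In practice the map would come from the user's contact list or a pronunciation dictionary rather than a hardcoded dict.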
Thanks for that insight. Brian here, one of the engineers for CVI. I've spoken with CVI a lot, and as it has become more natural, I've found myself growing more comfortable with a conversational style of interaction with the vast information contained in the LLMs and context under the hood. With Google or other search-based interactions I'm more point-and-shoot. I find CVI is more of an experience, and for me it yields more insight.
Check out Tavus.io for realtime. They have a great API for realtime conversational replicas. You can configure CVI to do just about anything you want with a realtime streaming replica.