Do you think the same about text that is indistinguishable from human-written text (LLM chatbots)? Or voice that is indistinguishable from a human talking?
Illegal things, like fraud and impersonation, are illegal. There's a difference between the tool and what people do with the tool.
There are tons of useful applications of interactive avatars - from corporate training to kids' education to language learning and more. Plus, why would you want to stop this little guy from existing in the world? :) https://lemonslice.com/try/alien
I don't think the same of them because they are not the same thing. Can you not see that the potential for harm is far greater? You can't simply ignore the potential uses of the technology you create. You have the choice to design your technology so it retains its usefulness while limiting the harm; have you given any time to thinking about how you could do that?
The alien is a diversion from the concern; I'm talking about realistic human avatars. Let's stay focused on that.
Let me suggest a worthwhile exercise. Just take ten minutes. What are some of the ways that realistic human avatars would make deception more effective or more scalable than previously possible?
Come up with three scenarios, and let's talk about them, honestly and thoughtfully.
This is so obvious now that you say it (* facepalm *). We definitely need to give the LLM context on the appearance (both from the initial image as well as any /imagine updates during the call). Thanks for pointing it out!
This is good feedback, thanks! The "not patient" feeling probably comes from our VAD being set to "eager mode" so that the latency is better. VAD (i.e. deciding when the human has actually stopped talking) is a tough problem in all of voice AI. It basically adds latency on top of whatever your pipeline's base latency is, because you have to wait through some silence before you can be sure the turn is over. Speech2Speech models are better at this.
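To make the tradeoff concrete, here is a rough sketch of silence-timeout endpointing. It is not our actual pipeline; the `Endpointer` class, the energy threshold, and the 200 ms / 700 ms timeouts are all made up for illustration. The idea is that an "eager" setting declares end of turn after a short pause, which keeps latency low but can cut in during a mid-sentence pause, while a "patient" setting waits longer and adds that wait to every reply.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpointer:
    """Toy end-of-turn detector (hypothetical, for illustration only)."""
    silence_ms_to_end: int          # pause length that counts as "done talking"
    frame_ms: int = 20              # duration of one audio frame
    energy_threshold: float = 0.01  # assumed cutoff between speech and silence

    def end_of_turn_ms(self, frame_energies: List[float]) -> Optional[int]:
        """Return the time (ms) at which end of turn is declared, or None."""
        silent_ms = 0
        heard_speech = False
        for i, energy in enumerate(frame_energies):
            if energy >= self.energy_threshold:
                heard_speech = True
                silent_ms = 0  # any speech resets the silence counter
            elif heard_speech:
                silent_ms += self.frame_ms
                if silent_ms >= self.silence_ms_to_end:
                    return (i + 1) * self.frame_ms
        return None


SPEECH, SILENCE = 0.5, 0.0
# 200 ms of speech, a 300 ms thinking pause, 200 ms more speech, then 1 s of silence.
# The user actually finishes talking at 700 ms.
utterance = [SPEECH] * 10 + [SILENCE] * 15 + [SPEECH] * 10 + [SILENCE] * 50

eager = Endpointer(silence_ms_to_end=200)
patient = Endpointer(silence_ms_to_end=700)

print(eager.end_of_turn_ms(utterance))    # 400: fires inside the thinking pause
print(patient.end_of_turn_ms(utterance))  # 1400: correct turn end, but 700 ms late
```

In this toy example the eager setting responds before the user has finished, and the patient setting gets the turn right at the cost of 700 ms added to the pipeline's base latency.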
Good idea. We need to do that. I'm also excited to push the /imagine stuff further and have B-roll interspersed with the talking (like a documentary) or even follow the character around as they move (like a video game).