TL;DR: We propose Loopy, an end-to-end video diffusion model conditioned only on audio. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module. Together they let the model draw on long-term motion information in the data to learn natural motion patterns, and they strengthen the correlation between the audio and the portrait's movement. Existing methods constrain motion during inference with manually specified spatial motion templates; Loopy removes the need for these and delivers more lifelike, higher-quality results across a variety of scenarios.