Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (loopyavatar.github.io)
10 points by caohongyuan 12 days ago | 2 comments





Amazing examples. Will you be putting this up on Hugging Face to play with, releasing the model weights, or going commercial and locking it up?

TL;DR: we propose an end-to-end, audio-only conditioned video diffusion model named Loopy. Specifically, we designed an inter- and intra-clip temporal module and an audio-to-latents module, enabling the model to leverage long-term motion information from the data to learn natural motion patterns and to improve audio-portrait movement correlation. This method removes the need for the manually specified spatial motion templates that existing methods use to constrain motion during inference, delivering more lifelike, higher-quality results across a variety of scenarios.
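To make the "audio-to-latents" idea concrete, here is a minimal NumPy sketch of one plausible design: a fixed set of learned query tokens cross-attends over per-frame audio features to produce latent tokens that could then condition a video diffusion model. All names, shapes, and the cross-attention formulation are my assumptions for illustration; the paper's actual module may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AudioToLatents:
    """Hypothetical audio-to-latents module (illustrative only).

    A set of learned query tokens cross-attends over per-frame audio
    features, compressing a variable-length audio clip into a fixed
    number of latent tokens for conditioning a diffusion model.
    """
    def __init__(self, audio_dim, latent_dim, num_tokens, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholders for learned parameters (random init here).
        self.queries = rng.standard_normal((num_tokens, latent_dim)) * 0.02
        self.w_k = rng.standard_normal((audio_dim, latent_dim)) * 0.02
        self.w_v = rng.standard_normal((audio_dim, latent_dim)) * 0.02

    def __call__(self, audio_feats):
        # audio_feats: (frames, audio_dim) per-frame audio features
        k = audio_feats @ self.w_k                     # (frames, latent_dim)
        v = audio_feats @ self.w_v                     # (frames, latent_dim)
        scores = self.queries @ k.T / np.sqrt(k.shape[-1])
        return softmax(scores) @ v                     # (num_tokens, latent_dim)

# Usage: compress 25 audio frames into 8 conditioning tokens.
module = AudioToLatents(audio_dim=128, latent_dim=64, num_tokens=8)
latents = module(np.zeros((25, 128)))
print(latents.shape)  # (8, 64)
```

The fixed-token design means downstream cross-attention layers in the diffusion backbone see a constant-size conditioning signal regardless of clip length, which is one common way such modules are wired up.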


