Show HN: Vocal-timing-conditioned audio diffusion in real time (riffusion.com)
8 points by haykmartiros 6 months ago | 3 comments
We've been cooking up a new experiment where you can record yourself singing or talking and the app will generate vocals to match your words and timings. It's backed by an end-to-end latent diffusion model that generates audio conditioned on both the style and the lyric timings - and it's quite fast. Your actual voice and melody are not used, just the transcription, and we don't store the recording.

We've found it's a really natural way to control the output you want and dream up a song concept. Curious to hear what you think!
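If you're curious what "conditioned on lyric timings" means mechanically, here's a rough sketch. This is not our actual code, and every module name and shape below is illustrative; it just shows one plausible way to do it, where the denoiser's audio latent frames cross-attend to lyric tokens tagged with their start/end times, plus a global style embedding.

    # Illustrative sketch only -- not the real architecture.
    import torch
    import torch.nn as nn

    class TimingConditionedDenoiser(nn.Module):
        def __init__(self, latent_dim=64, cond_dim=256,
                     vocab_size=32000, num_steps=1000):
            super().__init__()
            self.token_emb = nn.Embedding(vocab_size, cond_dim)  # lyric words
            self.time_proj = nn.Linear(2, cond_dim)              # (start, end) secs
            self.style_proj = nn.Linear(cond_dim, cond_dim)      # style embedding
            self.step_emb = nn.Embedding(num_steps, latent_dim)  # diffusion step
            self.cond_to_mem = nn.Linear(cond_dim, latent_dim)
            layer = nn.TransformerDecoderLayer(
                d_model=latent_dim, nhead=4, batch_first=True)
            self.backbone = nn.TransformerDecoder(layer, num_layers=4)

        def forward(self, noisy_latents, step, style, tokens, timings):
            # noisy_latents: (B, T, latent_dim) audio latent frames
            # tokens: (B, W) word ids; timings: (B, W, 2) start/end seconds
            cond = self.token_emb(tokens) + self.time_proj(timings)
            cond = cond + self.style_proj(style).unsqueeze(1)    # broadcast style
            memory = self.cond_to_mem(cond)                      # (B, W, latent_dim)
            x = noisy_latents + self.step_emb(step).unsqueeze(1)
            # Latent frames cross-attend to the timed lyric tokens, so
            # the model learns *when* each word should be sung.
            return self.backbone(tgt=x, memory=memory)           # predicted noise

In a setup like this, the same timed-lyric conditioning is fed at every denoising step of the sampling loop, which is what pushes each word to land where you recorded it.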




I've been pretty bearish on gen AI for music, but this is the most fun I've had playing with an AI tool in a long time. The filters remind me of the OG Instagram filter effect, where even shitty phone photos could "magically" be enhanced.

This + the Music ControlNet post from yesterday gives me some hope that audio AI will go in the direction of creative tools rather than dystopian full-song generation.


I'm impressed with the quality of the sound! Some of my generations were for-certain bops; I'm finding myself regenerating on "Surprise" just to see what the model can toss up.

Would it be possible for the model to generate based on the recorded melody in the future? It might also be cool to have more controls, e.g. choosing between male and female vocals, things like that.

Super nice work!


Very cool! Is this the state-of-the-art music gen model out there?



