Music ControlNet: Multiple Time-Varying Controls for Music Generation (musiccontrolnet.github.io)
75 points by GaggiX on Nov 14, 2023 | 7 comments



I was thinking recently: now that we have multimodal text and image models, music and sound generation will probably get rolled into the big foundation models. Then we can look at adding more niche modalities like 3D model generation. As we explore larger numbers of modalities, we will end up with highly generalized models.


Looking at the "Melody & Rhythm Control" section under "Cherry-picked"... the rhythm control is weird. In many of the examples, the generated music clearly has a different BPM from the reference, but the model still seems to align the notes in units of absolute time (rather than beats), so the notes get desynced from the beat. The model then tries to cover up the discrepancy and pass it off as syncopation by emphasizing, de-emphasizing, or outright altering notes, but it doesn't work very well.

Maybe conditioning on the BPM would help?
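
To make the desync concrete, here's a rough sketch (my own illustration, not anything from the paper): the same onset times in seconds land on different beat fractions once the tempo drifts, which is exactly what happens if the control is specified in absolute time but the generated audio runs at a different BPM.

    # Hypothetical example: onsets fixed in seconds drift off the beat grid
    # when the BPM changes.

    def seconds_to_beats(onsets_sec, bpm):
        """Convert onset times in seconds to positions on the beat grid."""
        beat_len = 60.0 / bpm              # duration of one beat in seconds
        return [t / beat_len for t in onsets_sec]

    # Reference rhythm: straight eighth notes at 120 BPM (an onset every 0.25 s)
    reference_onsets = [i * 0.25 for i in range(8)]

    print(seconds_to_beats(reference_onsets, bpm=120))  # on-grid: 0.0, 0.5, 1.0, ...
    print(seconds_to_beats(reference_onsets, bpm=100))  # off-grid: 0.0, 0.417, 0.833, ...

If the rhythm control were expressed in beats (or the model were conditioned on BPM), the second case would stay on the grid.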


The model used here is very small (41M parameters); I wonder how well it would scale to a bigger size.


I understand that this paper is about controls, but I wish there were more detail on how it differs from other music generation methods like MusicLM. That seems to be covered in the MusicGen paper though [5]!

But then I'm more curious about how this compares to MusicLM in terms of music generation.


Love seeing the MIR research from CMU recently. Chris Donahue is the man!!


Awesome, I've been curious about applying ControlNet like this. I'm glad someone tried it out.


Can I try this out somewhere?



