I've seen some models like MuseGAN but nothing yet that has the same level of performance and generality as Stable Diffusion (granted, they are different applications). Does such a model exist?
I'm looking forward to such a model being announced too, as I'm sure people must already be working on it.
I did think, though, that a prerequisite for such a model would be a system that could separate a track into its component instruments (and reverse-engineer all the audio mixing that went into the final product), in order to reduce the dimensionality of the input to the learning model.
There's been some progress on that front, but not enough to produce a perfect transcription, and I'm not even sure if a transcription to sheet music would be the ideal data representation for an AI to truly understand what makes a good piece of music anyway.
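To put a rough number on the dimensionality point, here is a back-of-the-envelope sketch comparing raw audio with a symbolic note-event representation. Every figure in it (track length, instrument count, note density, four fields per note) is an assumption picked for illustration, not a measurement of any real dataset or model input.

```python
# Rough size comparison: raw audio samples vs. a symbolic note-event list.
# All figures below are illustrative assumptions, not measurements.

SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
CHANNELS = 2              # stereo
DURATION_S = 180          # a hypothetical 3-minute track


def raw_audio_values(duration_s, sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Number of scalar sample values in an uncompressed recording."""
    return duration_s * sample_rate_hz * channels


def note_event_values(n_instruments, notes_per_second, duration_s, fields_per_note=4):
    """Number of scalar values if each note is (pitch, onset, duration, velocity)."""
    return n_instruments * notes_per_second * duration_s * fields_per_note


# Hypothetical arrangement: 5 instruments, each playing about 8 notes per second.
raw = raw_audio_values(DURATION_S)
symbolic = note_event_values(n_instruments=5, notes_per_second=8, duration_s=DURATION_S)

print(f"raw audio values:  {raw:,}")
print(f"note-event values: {symbolic:,}")
print(f"ratio:             {raw / symbolic:.0f}x")
```

Under these assumptions the symbolic form is a few hundred times smaller, but it also throws away timbre and mixing, which is part of why a transcription may not be the ideal representation.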
It would not be as high-level, at least initially. It's more about querying for distinct layers of a song and compiling them together.
The tools are intended to set up a feedback loop between the artist and the computer, similar to the one you get when you can play an instrument, but more accessible. The idea is that everyone knows what sounds good to them when they hear it, but comparatively few people are currently able to make it themselves. In the same way that DALL-E aims to empower everyone to make digital art, neptunely aims to empower everyone to make music.
It's mostly private IP still, but this is bound to come out at some point. There are some folks on Fiverr who've clearly figured out some kind of technical solution.