But from a marketing standpoint, "Audio Splitter" is probably not the term you want to use for this.
I only knew what this was because I frequently use Spleeter, Demucs, Open-Unmix, etc., for music source separation in hobbyist electronic music production on weekends.
"Audio Splitter" made me think it was some hardware device for redirecting audio streams. Like the thing that let's you connect multiple headphones to one jack.
For technical folk, the term they're probably searching for is "Music Source Separation".
For audio folk, it's likely "Automatic stem extraction/isolation"
To the layperson, I'm not sure -- "How to convert song into individual parts"?
Last point I have no issue with: but you may face some hurdles due to the rest of this research area being very transparent and OSS.
And since they're comparing closed source offerings, it seems notable that iZotope RX (which can isolate not just vocals but also percussion and bass) isn't in the mix.
Mentioned here. It's not a significantly different or better implementation than the bog-standard Spleeter found elsewhere, true to iZotope's exaggerated marketing.
I just tried this and it's pretty nice. It's still not perfect but it's getting there.
I ran the theme song from the credits of Cyberpunk 2077 and the resulting vocal has tiny bits of instrumental elements in it every now and then but in parts it's perfect.
That song turned out to be a pretty good benchmark (plus I kinda wanted the vocals anyway), because it has a lot of reverb, stereo imaging trickery and various other effects on the instruments and the vocal. The instrumental didn't really come out too well, sadly. But the vocal is very usable! The Deezer one was noticeably worse on the vocals but to be honest even that one was amazing to see in action.
Compared to an AI like this, the way people used to sweat blood trying to get something usable for a remix or a sample out of a song seems so archaic.
I might reinstall my DAW and instruments and play around with the extracted vocals now. :)
Is there a paper on how they're resynthesizing phase components? IME neural networks are real real bad at handling FFT phase, so separation methods tend to use frequency masking, or use a learned filter bank.
Looks like Demucs is using a learned filter bank (so no handling of FFT phase), and Open-Unmix and Spleeter are using spectrogram magnitude masking (they just reuse the phase from the original FFT in all channels). Can't immediately tell what ByteDance is doing.
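For anyone unfamiliar with the magnitude-masking approach: the idea is that the network only estimates a real-valued mask over the magnitude spectrogram, and the mixture's phase is copied unchanged into the separated source. Here's a toy NumPy sketch of that step (illustrative only, not any library's actual code; the shapes and the random "mask" stand in for a real STFT and a real network output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the complex STFT of the mixture: (freq_bins, frames)
mix_stft = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# Stand-in for the network's output: a soft mask in (0, 1] for one
# source (e.g. vocals), same shape as the spectrogram
mask = rng.uniform(0.1, 1.0, size=mix_stft.shape)

# Separated source: masked magnitude, but the original mixture's phase
source_mag = mask * np.abs(mix_stft)
source_stft = source_mag * np.exp(1j * np.angle(mix_stft))

# The phase is untouched -- only the magnitude was estimated
assert np.allclose(np.angle(source_stft), np.angle(mix_stft))
```

This sidesteps phase estimation entirely, which is why it works at all, but it's also a ceiling on quality: wherever two sources overlap in time-frequency, the reused mixture phase is wrong for each individual source.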