But from a marketing standpoint, "Audio Splitter" is probably not the term you want to use for this.
I only knew what this was because I frequently use Spleeter, Demucs, Open-Unmix, etc., for music source separation in hobbyist electronic music production on weekends.
"Audio Splitter" made me think it was some hardware device for redirecting audio streams. Like the thing that let's you connect multiple headphones to one jack.
For technical folk, the term they're probably searching for is "Music Source Separation".
For audio folk, it's likely "Automatic stem extraction/isolation"
To the layperson, I'm not sure -- "How to convert song into individual parts"?
Last point I have no issue with: but you may face some hurdles due to the rest of this research area being very transparent and OSS.
And since they're comparing closed source offerings, it seems notable that iZotope RX (which can isolate not just vocals but also percussion and bass) isn't in the mix.
Mentioned here. It's not a significantly different or better implementation than the bog-standard Spleeter found elsewhere, true to iZotope's exaggerated marketing.
I just tried this and it's pretty nice. It's still not perfect but it's getting there.
I ran the theme song from the credits of Cyberpunk 2077 and the resulting vocal has tiny bits of instrumental elements in it every now and then but in parts it's perfect.
That song turned out to be a pretty good benchmark (plus I kinda wanted the vocals anyway), because it has a lot of reverb, stereo imaging trickery and various other effects on the instruments and the vocal. The instrumental didn't really come out too well, sadly. But the vocal is very usable! The Deezer one was noticeably worse on the vocals but to be honest even that one was amazing to see in action.
Compared to an AI like this, the way people used to sweat blood trying to get something usable for a remix or a sample out of a song seems so archaic.
I might reinstall my DAW and instruments and play around with the extracted vocals now. :)
Is there a paper on how they're resynthesizing phase components? IME neural networks are real real bad at handling FFT phase, so separation methods tend to use frequency masking, or use a learned filter bank.
Looks like Demucs is using a learned filter bank (so no handling of FFT phase), and Open-Unmix and Spleeter are using spectrogram magnitude masking (they just reuse the phase from the original FFT in all channels). Can't immediately tell what ByteDance is doing.
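For anyone unfamiliar with the magnitude-masking approach: the idea is that the network only estimates a real-valued mask over the magnitude spectrogram, and the mixture's phase is copied unchanged into the separated source. Here's a toy NumPy sketch of that step (illustrative only, not any library's actual code; the shapes and the random "mask" stand in for a real STFT and a real network output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the complex STFT of the mixture: (freq_bins, frames)
mix_stft = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# Stand-in for the network's output: a soft mask in (0, 1] for one
# source (e.g. vocals), same shape as the spectrogram
mask = rng.uniform(0.1, 1.0, size=mix_stft.shape)

# Separated source: masked magnitude, but the original mixture's phase
source_mag = mask * np.abs(mix_stft)
source_stft = source_mag * np.exp(1j * np.angle(mix_stft))

# The phase is untouched -- only the magnitude was estimated
assert np.allclose(np.angle(source_stft), np.angle(mix_stft))
```

This sidesteps phase estimation entirely, which is why it works at all, but it's also a ceiling on quality: wherever two sources overlap in time-frequency, the reused mixture phase is wrong for each individual source.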