
Comparison of state-of-the-art music source separation models on Californication - lapink
https://soundcloud.com/voyageri/sets/source-separation-in-the-waveform-domain
======
ksaj
I'm surprised at how much better Tasnet and especially Demucs are than
Spleeter at pulling the bass guitar out.

Overall, it's obvious that this technology still has a long way to go. But
it makes me wonder how close all this gets to how we (humans) can narrow in on
a single voice in a crowd so clearly.

~~~
lapink
Speech source separation has come a long way, thanks to Yi Luo's amazing work.
With Dual-Path RNN, he now achieves a signal-to-noise ratio of almost 20 dB for
two-speaker separation, see [1]. This is a bit of an artificial setting though,
only two speakers and they are manually mixed together. I'm not sure if there
is any good dataset of speech source separation in real environments (an
airport, restaurant etc).

[1]:
[https://arxiv.org/pdf/1910.06379.pdf](https://arxiv.org/pdf/1910.06379.pdf)
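
For anyone unfamiliar with the metric, here is a minimal sketch of how a plain signal-to-noise ratio in dB can be computed between a clean reference and a separated estimate. The function name `snr_db` and the toy signals are made up for illustration. As far as I know the paper in [1] reports a scale-invariant variant of this measure, which this sketch does not implement.

```python
import math


def snr_db(reference, estimate):
    """Plain SNR in dB: 10 * log10(signal power / error power).

    `reference` is the clean source, `estimate` is the separated output.
    Both are equal-length sequences of float samples.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    if signal_power == 0:
        raise ValueError("reference signal is silent")
    # Everything in the estimate that differs from the reference counts as noise.
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


# Toy example (hypothetical data): a sine wave and an estimate that is 10% too quiet.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
estimate = [0.9 * r for r in reference]
print(round(snr_db(reference, estimate), 1))  # 20.0
```

In the toy example the error has 1% of the reference's power, which works out to 20 dB, so "almost 20" means the residual error carries roughly a hundredth of the energy of the target voice.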

------
gnat
Does anyone have a pointer to a write-up that says who did this and identifies
the software being compared?

~~~
lapink
Author here. This is part of the Demucs release; you can find more
information in my repo:
[https://github.com/facebookresearch/demucs](https://github.com/facebookresearch/demucs)

