
Open-source AI tool quickly isolates the vocals in any song - DyslexicAtheist
https://www.theverge.com/2019/11/5/20949338/vocal-isolation-ai-machine-learning-deezer-spleeter-automated-open-source-tensorflow
======
ksaj
I'd thought about this kind of process:

Take a stereo image, and subtract left from right to make a mask. What's left
is essentially the sound of anything that wasn't centered, because the
centered material (usually the main vocals and a few other things) cancels
out. Then use that resulting file as a mask for both the right and left
channels again, which will isolate the middle, leaving just the vocals and
some artifacts from other centered audio.
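
To make the subtraction step concrete, here is a minimal sketch in plain Python. The toy signals and the helper name `difference_signal` are made up for illustration, and it only covers building the difference signal, not how that signal then gets applied as a mask (in practice that would probably happen per frequency band rather than on raw samples).

```python
import math

def difference_signal(left, right):
    """Subtract left from right, sample by sample (hypothetical helper).

    Anything identical in both channels (i.e. panned dead center)
    cancels out; only off-center material survives.
    """
    return [r - l for l, r in zip(left, right)]

# Toy stereo mix (made-up data): a "vocal" in the center,
# and a "guitar" that only appears in the left channel.
n = 8
vocal = [math.sin(2 * math.pi * 1 * i / n) for i in range(n)]
guitar = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

side = difference_signal(left, right)

# The simple average of both channels still has half the guitar in it,
# which is why a plain sum is not enough to isolate the center.
mid = [(l + r) / 2 for l, r in zip(left, right)]
```

In this toy case `side` ends up as the inverted guitar with no vocal in it, while `mid` is the vocal plus half the guitar, so the difference signal really does carry the "not centered" part that needs to be removed.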

One of these (the left or the right channel) is likely to be cleaner than the
other, depending on how/where the stereo stuff was mixed. Listen or analyze
them to determine if one is better than the other, and then continue working
on that one.

With the better of the two resulting "center" tracks, use this AI tool to
clean up any remaining artifacts that don't belong there.

In a way, it's a hybrid analog/digital method of vocal extraction, but it
would probably produce a much cleaner vocal track.

And of course if you wanted a good stereo backing track to sing karaoke over,
you can use that clean vocal extraction as a mask for both the original right
and left channels, which in theory would produce a better karaoke file than
plain center-channel elimination.
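
A rough sketch of why that should help, again in plain Python with made-up sample values. It assumes "use as a mask" means simply subtracting the vocal estimate from each channel in the time domain, and that the estimate is perfect; a real tool would more likely work on spectrograms. The helper name `remove_vocal` is hypothetical.

```python
def remove_vocal(left, right, vocal):
    """Subtract a mono vocal estimate from each channel (hypothetical helper).

    Unlike plain left-minus-right cancellation, this keeps two
    separate output channels, so the backing track stays stereo.
    """
    backing_left = [l - v for l, v in zip(left, vocal)]
    backing_right = [r - v for r, v in zip(right, vocal)]
    return backing_left, backing_right

# Toy mix (made-up data): vocal in the center,
# guitar only on the left, keys only on the right.
vocal = [0.5, -0.25, 0.75, 0.0]
guitar = [0.25, 0.25, -0.5, 0.125]
keys = [-0.125, 0.5, 0.25, -0.75]
left = [v + g for v, g in zip(vocal, guitar)]
right = [v + k for v, k in zip(vocal, keys)]

backing_left, backing_right = remove_vocal(left, right, vocal)

# For comparison: classic center elimination collapses everything to one
# channel (guitar minus keys), losing the stereo image.
naive_mono = [l - r for l, r in zip(left, right)]
```

With a perfect vocal estimate the two backing channels come out as exactly the guitar and the keys, while the naive version is a single mono signal with the keys phase-inverted.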

I think this would in essence be a modified/improved version of Dolby
3-Stereo.

------
detaro
Discussion of the project, 175 comments:
[https://news.ycombinator.com/item?id=21431071](https://news.ycombinator.com/item?id=21431071)

------
DyslexicAtheist
Original Deezer PR post (didn't get much attention here):
[https://news.ycombinator.com/item?id=21456673](https://news.ycombinator.com/item?id=21456673)

spleeter project:
[https://github.com/deezer/spleeter](https://github.com/deezer/spleeter)

