
Ask HN: How to remove voice from a music file? - mettamage
I've seen some methods (e.g. inverting one stereo channel and cancelling it out with the other), but I was wondering if anyone has managed to do this relatively well.

I was looking online for some AI solutions, but I couldn't find anything. Would the collective knowledge of HN have an answer to this question?
======
unlinked_dll
The technical name for this problem is "blind source separation" [1] and is
related to the "cocktail party effect" [2].

[1]
[http://www.mit.edu/~gari/teaching/6.555/LECTURE_NOTES/ch15_b...](http://www.mit.edu/~gari/teaching/6.555/LECTURE_NOTES/ch15_bss.pdf)

[2]
[https://en.wikipedia.org/wiki/Cocktail_party_effect](https://en.wikipedia.org/wiki/Cocktail_party_effect)

Commercial solutions are out there, but imo/e they're not always that great
and the problem is of prime interest for researchers both in industry and
academia.

------
hnaccount141
There's not currently a plug and play solution that will work well on all
types of music, but there's a lot of research happening in this area. If
you're interested in digging into some of the cutting edge source separation
algorithms there's a great python library called nussl that provides
implementations of many of them.
[https://interactiveaudiolab.github.io/nussl/](https://interactiveaudiolab.github.io/nussl/)

------
paulrpotts
Izotope RX can do a pretty credible job of this, although it depends a lot on
the source file. This is a commercial product, though, so I'm not sure if it
is what you are looking for.
[https://ask.audio/articles/isolating-separating-remixing-usi...](https://ask.audio/articles/isolating-separating-remixing-using-izotope-rx7-music-rebalance-module)

~~~
rewgs
Professional audio person here. RX is the tool to use, but even for someone
skilled at this, it's going to be tough and will likely result in some
degradation of the rest of the music.

Get in touch and I'll give it a shot, but no promises.

~~~
codetrotter
Note that the e-mail field on HN is not visible to other users. If you want
others to be able to contact you by e-mail, you should put your email address
in the “about” box as well.

And probably when you do that you might want to type it out as

yourname at example com

Rather than as “yourname@example.com”.

Still, even if you write it that way there are probably some scrapers out
there that will recognize it as an email and include it in some list.

You could try to be a bit cryptic like some do though. If for example the
local-part of your email address is the same as your HN username, and you are
using GMail, you could write something like:

“You can reach me on GMail. My username there is the same as the one I use on
HN.”

Just don’t be _too_ cryptic, or you won’t just ensure that scrapers can’t
parse it — even people legitimately wanting to get in touch could be unable to
understand what your address actually is :P

Alternatively, instead of stating your email address in any form you could put
a link to your profile on some other platform where people can reach you.

------
robbrown451
Interestingly, many years ago I had this start happening with my car stereo.
The vocals were mostly missing but the other instruments were there. When it
got to the instrumental solo, the main instrument was missing. This happened on
90% of songs.

Turned out I had accidentally caused wires to come loose in the trunk, leaving
the speakers wired in series, which caused the stereo channel cancelation
effect.

~~~
Laforet
It's a common problem with headphones when the wire becomes damaged, allowing
two audio channels to become electrically connected. One could even simulate
this effect by deliberately not inserting an audio jack all the way in on some
devices.

------
hantusk
A few relevant links:

[https://www.celemony.com/](https://www.celemony.com/) Celemony melodyne

[https://www.youtube.com/watch?v=FMEk8cHF-OA](https://www.youtube.com/watch?v=FMEk8cHF-OA)

[https://www.youtube.com/watch?v=zL6ltnSKf9k](https://www.youtube.com/watch?v=zL6ltnSKf9k)

[https://github.com/f90/Wave-U-Net](https://github.com/f90/Wave-U-Net)

------
superfamicom
I have tried many options, and
[https://phonicmind.com/](https://phonicmind.com/) has had the best results.

~~~
erik_p
It can do a pretty impressive job, though it does help to use high quality
source material. Using mp3 rips from youtube might get a little more
artifact-y. Sometimes vocal-like instruments will get pulled into the vocal rip
instead of the "karaoke" instrument part of the extract.

~~~
superfamicom
A curious side effect I've seen with FLAC vinyl rips is an interesting type of
noise removal on the "karaoke" tracks.

------
psychometry
I've always wondered where karaoke bars get vocal-free versions of seemingly
every track that would ever get requested. Do record companies make them?

~~~
BurningFrog
A lot are rerecorded by studio musicians, I believe.

~~~
dajohnson89
I'd be curious about the copyright legality here... it must be expensive to pay
royalties for a huge library of instrumental tracks.

~~~
aidenn0
Cover versions can always be made under a compulsory license; you have to pay
the songwriter, but they can't say no.

So recording a version with a separate vocal track is 100% doable for any
song.

However, to actually use it in Karaoke, it's no longer just a cover; even
though it's just the words that are shown along with it, it's part of a larger
work, so you need the same sort of license you would need for using it in e.g.
a TV ad, and that needs to be negotiated with the copyright owner.

[edit]

For more details google "mechanical license" and "synchronization license";
the first is the "I want to record a cover" and the second is what you would
need for using in karaoke.

~~~
anamexis
That’s fascinating, thanks for the google prompts.

------
elamje
I believe this is only effectively possible, not fully possible. Inherently,
music and the voice will share some of the same frequency samples, since they
are discrete. I’m sure you might be able to get a solution that works to the
human ear, but I don’t know that it’s possible to perfectly strip out one or
the other.

------
sachinsmc
[https://github.com/deezer/spleeter](https://github.com/deezer/spleeter) worth
checking once

------
ojm
I spent some time researching this (albeit in 2014) for my wife. Here's the
best solution I could come up with at the time:
[https://ojm.co/blog/using-audacity-remove-vocals-audio-free/](https://ojm.co/blog/using-audacity-remove-vocals-audio-free/)

------
abdulhaq
Voice is often at the centre of a stereo recording so invert the phase of one
channel and combine?
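
A rough sketch of that idea in plain Python, on synthetic tones rather than a real file. The function name `remove_center` and the test signals are made up for illustration; it assumes the vocal is mixed identically into both channels and that nothing else is.

```python
import math

def remove_center(left, right):
    """Invert the right channel and average it with the left.

    Anything identical in both channels (panned dead center) cancels;
    anything that differs between the channels survives, as mono.
    """
    return [(l - r) / 2.0 for l, r in zip(left, right)]

# Synthetic stereo mix: a "vocal" panned center, a "guitar" panned hard left.
RATE = 8000
times = [n / RATE for n in range(RATE)]
vocal = [math.sin(2 * math.pi * 220 * x) for x in times]
guitar = [0.5 * math.sin(2 * math.pi * 330 * x) for x in times]

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)  # the guitar is absent from the right channel

# The vocal cancels; what remains is the guitar at half amplitude.
karaoke = remove_center(left, right)
```

On a real recording the output is mono, anything else panned center (often bass and kick drum) disappears too, and stereo reverb on the vocal stays behind.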

~~~
grepfru_it
that is the basic premise. there are also audio engineering effects to deal
with such as reverb, chorus, delay and phasing due to poor input quality. let's
not get into the likelihood of layered vocal tracks. TL;DR: you want the
original raw vocal to do the best removal, and if you have that then you
probably already have the tracked version of the mixdown so you can just mute
the vocal layers.

in my DJ days, you could just write to a record label and ask for instrumental
or vocal only demo tracks to perform mixes (this is how DJs became popular in
the NYC scene). 80% of the time it's a yes, and 90% of those vocal tracks were
"audiomarked" copies so you could only make demos (but that didn't stop people
from using the non-watermarked part for a 15 second clip in their DJ mix).

anyway, what i'm saying is that your method works for basic recordings. as you
move up the production food chain you need to become more creative in your
methods

------
sellingwebsite
There is a thread on the homepage that might be useful to you:
[https://news.ycombinator.com/item?id=21431071](https://news.ycombinator.com/item?id=21431071)

------
timrichard
There are several products from Audionamix that might be worth checking out :

[https://audionamix.com/shop-adx/](https://audionamix.com/shop-adx/)

------
LinuxBender
This is not a direct answer to the technical aspect of your question, but if
you know who produced the music, they might give/sell you a version that is
missing the vocal tracks.

~~~
mettamage
In my particular case that is unfortunately not possible. But otherwise, I'd
go the social route as well.

------
monotoSTEREO
Those interested in removing voice from a music file may wish to check out the
many resources available at my website, monotoSTEREO.info
([https://www.monotostereo.info](https://www.monotostereo.info)). There is
also a companion Facebook page
([https://www.facebook.com/monotostereo.info](https://www.facebook.com/monotostereo.info))
where I post updates and related content. "Like" us on Facebook to follow the
page for the updates! Be sure to check out the many examples on the MEDIA
pages of the website!

------
person_of_color
Welcome to the rabbit hole that is the inverse problem.

------
conductr
I always wonder where DJs get the instrumental versions to sample/mix.

~~~
peapicker
A lot of places sell em, look up “DJ Stems” to get the idea.

------
villmann
My date was sending me mixed signals, so I did a fourier analysis.

------
techload
What about the inverse, isolating only the voice?

~~~
percutaneous
For a perfect solution, it would simply be a matter of subtracting one
waveform from the other. I suspect both isolated voice and isolated music
would have significant "noise" leftover. This would likely be more noticeable
in the voice due to our increased perception of odd vocal sounds.
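
A small sketch of the subtraction idea, assuming you somehow have the exact instrumental that went into the mix. The signals are synthetic and the names `subtract` and `peak_error` are invented for the example.

```python
import math

def subtract(mix, instrumental):
    """Sample-by-sample difference of two equal-length mono signals."""
    if len(mix) != len(instrumental):
        raise ValueError("signals must have the same length")
    return [m - i for m, i in zip(mix, instrumental)]

def peak_error(estimate, target):
    """Largest absolute difference between an estimate and the true signal."""
    return max(abs(e - x) for e, x in zip(estimate, target))

RATE = 8000
times = [n / RATE for n in range(RATE)]
vocal = [math.sin(2 * math.pi * 220 * x) for x in times]
music = [0.5 * math.sin(2 * math.pi * 330 * x) for x in times]
mix = [v + m for v, m in zip(vocal, music)]

# Ideal case: the instrumental is sample-aligned and otherwise identical.
isolated = subtract(mix, music)

# Less ideal case: the instrumental is late by a single sample.
late_music = [0.0] + music[:-1]
leaky = subtract(mix, late_music)

print(peak_error(isolated, vocal))  # essentially zero
print(peak_error(leaky, vocal))     # clearly audible residue
```

Even a one-sample misalignment leaves music in the "isolated" voice, which is one reason the leftover noise is hard to avoid in practice.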

------
pvtmert
If it's like talking, Fourier helps!

~~~
karambahh
Can you explain why Fourier would help with talking and not singing?

~~~
akabaka777
Maybe because the voice has a very low frequency range, so it's easier to
separate from the high-frequency noises.
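
To make that guess concrete, here is a toy Fourier low-pass in plain Python. It is only a sketch under strong assumptions: the "speech" and "hiss" are single synthetic tones in separate frequency bands, the sample rate and cutoff are arbitrary, and the names `dft`, `idft` and `low_pass` are made up here. A naive O(n^2) DFT is used to keep it dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n)
                for f in range(n)).real / n
            for k in range(n)]

def low_pass(signal, rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for f in range(n):
        # Bins above n/2 mirror the negative frequencies.
        freq = min(f, n - f) * rate / n
        if freq > cutoff_hz:
            spectrum[f] = 0
    return idft(spectrum)

RATE, N = 1024, 256
speech = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(N)]
hiss = [0.3 * math.sin(2 * math.pi * 400 * n / RATE) for n in range(N)]
noisy = [s + h for s, h in zip(speech, hiss)]

# With the two tones in disjoint bands, the filter recovers the low one.
filtered = low_pass(noisy, RATE, cutoff_hz=200)
```

This only works because the two tones do not share any frequencies. A singer and the instruments behind them usually do, so a plain frequency cut removes parts of both.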

------
dx7tnt
This is a longstanding problem in audio engineering, along the lines of "how
does one un-bake a cake to get the eggs?" There are always going to be artefacts
and distortion ranging from unpleasant to extreme, hence audio engineers when
mixing from stems/channels will do a stereo instrumental mix and a mix with
vocals.

~~~
jotm
Yeah, it's impossible to do it perfectly as long as human voices' frequencies
overlap with instruments, basically.

It's like asking to recover layers from a PNG/JPG file.

