Ask HN: How to remove voice from a music file?
121 points by mettamage on Oct 30, 2019 | 46 comments
I've seen some methods (e.g. inverting one stereo channel and cancelling it out against the other), but I was wondering if anyone has managed to do this relatively well.

I was looking online for some AI solutions, but I couldn't find anything. Would the collective knowledge of HN have an answer to this question?

The technical name for this problem is "blind source separation" [1] and is related to the "cocktail party effect" [2].

[1] http://www.mit.edu/~gari/teaching/6.555/LECTURE_NOTES/ch15_b...

[2] https://en.wikipedia.org/wiki/Cocktail_party_effect

Commercial solutions are out there, but imo/e they're not always that great and the problem is of prime interest for researchers both in industry and academia.

There's not currently a plug and play solution that will work well on all types of music, but there's a lot of research happening in this area. If you're interested in digging into some of the cutting edge source separation algorithms there's a great python library called nussl that provides implementations of many of them. https://interactiveaudiolab.github.io/nussl/

Izotope RX can do a pretty credible job of this, although it depends a lot on the source file. This is a commercial product, though, so I'm not sure if it is what you are looking for. https://ask.audio/articles/isolating-separating-remixing-usi...

Professional audio person here. RX is the tool to use, but even for someone skilled at this, it's going to be tough and will likely result in some degradation of the rest of the music.

Get in touch and I'll give it a shot, but no promises.

Note that the e-mail field on HN is not visible to other users. If you want others to be able to contact you by e-mail, you should put your email address in the “about” box as well.

And probably when you do that you might want to type it out as

yourname at example com

Rather than as “yourname@example.com”.

Still, even if you write it that way there are probably some scrapers out there that will recognize it as an email and include it in some list.

You could try to be a bit cryptic like some do though. If for example the local-part of your email address is the same as your HN username, and you are using GMail, you could write something like:

“You can reach me on GMail. My username there is the same as the one I use on HN.”

Just don’t be too cryptic, or you won’t just ensure that scrapers can’t parse it — even people legitimately wanting to get in touch could be unable to understand what your address actually is :P

Alternatively, instead of stating your email address in any form you could put a link to your profile on some other platform where people can reach you.

I'd love to! But I don't see your email. My email is in my profile bio.

If anyone wants to give it a shot removing Donna vocals from some choice '70s Dead, much obliged! Thanks in advance.

If there's a trial version, I'm willing to try it.

Yep, iZotope RX7 has a 10-day free trial. The feature you're looking for is Music Rebalance:


I downloaded the trial a few weeks ago. The results were not great unfortunately.

Interestingly, many years ago I had this start happening with my car stereo. The vocals were mostly missing but the other instruments were there. When it got to the instrumental solo, the main instrument was missing. This happened on 90% of songs.

Turned out I had accidentally caused wires to come loose in the trunk, leaving the speakers wired in series, which caused the stereo channel cancelation effect.

It's a common problem with headphones when the wire becomes damaged, allowing two audio channels to become electrically connected. One could even simulate this effect by deliberately not inserting an audio jack all the way in on some devices.

I have tried many options, and https://phonicmind.com/ has had the best results.

It can do a pretty impressive job, though it does help to use high-quality source material. Using MP3 rips from YouTube might get a little more artifact-y. Sometimes vocal-like instruments will get pulled into the vocal rip instead of the "karaoke" instrument part of the extract.

An interesting side effect I've seen with FLAC vinyl rips is a kind of noise removal on the "karaoke" tracks.

I've always wondered where karaoke bars get vocal-free versions of seemingly every track that would ever get requested. Do record companies make them?

A lot are rerecorded by studio musicians, I believe.

I'd be curious about the copyright legality here... it must be expensive to pay royalties for a huge library of instrumental tracks.

Cover versions can always be made under a compulsory license; you have to pay the songwriter, but they can't say no.

So recording a version with a separate vocal track is 100% doable for any song.

However, to actually use it in Karaoke, it's no longer just a cover; even though it's just the words that are shown along with it, it's part of a larger work, so you need the same sort of license you would need for using it in e.g. a TV ad, and that needs to be negotiated with the copyright owner.


For more details google "mechanical license" and "synchronization license"; the first is the "I want to record a cover" and the second is what you would need for using in karaoke.

That’s fascinating, thanks for the google prompts.

I would assume they're released under different licenses for manufacturers of those products. This is a lot in my mind like how Microsoft Windows Enterprise exists, but is only attainable via volume licensing or piracy; The product exists, and is for sale, but the general public just isn't permitted to buy it.

Some of them are licensed official tracks from the actual producers, but for the unofficial ones I think the karaoke companies have music writers who just encode synths to roughly match the track.

I believe this is only effectively possible, not fully possible. Inherently, music and the voice will share some of the same frequency samples, since they are discrete. I’m sure you might be able to get a solution that works to the human ear, but I don’t know that it’s possible to perfectly strip out one or the other.

I spent some time researching this (albeit in 2014) for my wife. Here's the best solution I could come up with at the time: https://ojm.co/blog/using-audacity-remove-vocals-audio-free/

Voice is often at the centre of a stereo recording so invert the phase of one channel and combine?
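
For anyone who wants to try the inversion trick in code, here is a minimal sketch in Python with NumPy. It assumes the audio is already loaded as a float array of shape (n_samples, 2); the function name remove_center and the synthetic test tones are made up for illustration and are not taken from any tool mentioned in this thread.

```python
import numpy as np

def remove_center(stereo):
    """Cancel center-panned content from a stereo signal.

    stereo: float array of shape (n_samples, 2), columns are left and right.
    Returns a mono array equal to (left - right) / 2, i.e. the left channel
    summed with a phase-inverted right channel.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    return 0.5 * (left - right)

# Synthetic demo with assumed signals (not real music).
sr = 8000                                    # sample rate in Hz
t = np.arange(sr) / sr                       # one second of time stamps
vocal = np.sin(2 * np.pi * 220 * t)          # identical in both channels (center)
guitar = 0.5 * np.sin(2 * np.pi * 330 * t)   # left channel only

mix = np.stack([vocal + guitar, vocal], axis=1)
out = remove_center(mix)                     # the vocal cancels, half the guitar remains
```

On real recordings anything with stereo reverb or panning survives, and center-panned bass and kick drum tend to disappear along with the vocal.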

that is the basic premise. there are also audio engineering effects to deal with, such as reverb, chorus, delay and phasing due to poor input quality. let's not get into the likelihood of layered vocal tracks. TL;DR: you want the original raw vocal to do the best removal, and if you have that then you probably already have the tracked version of the mixdown, so you can just mute the vocal layers.

in my DJ days, you could just write a record label and ask for instrumental or vocal-only demo tracks to perform mixes (this is how DJs became popular in the NYC scene). 80% of the time it's a yes, and 90% of those vocal tracks were "audiomarked" copies so you could only make demos (but that didn't stop people from using the non-watermarked part for a 15-second clip in their DJ mix).

anyway, what i'm saying is that your method works for basic recordings. as you move up the production food chain you need to become more creative in your methods

Most of the energy in an audio signal is at the center of the image, so in practice you don't really remove that much.
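
A rough way to check this for a particular track is to split it into mid (L+R) and side (L-R) signals and compare their energy. The sketch below is Python with NumPy; mid_side_energy is a hypothetical helper and the tones are synthetic stand-ins, so treat the numbers as illustrative only.

```python
import numpy as np

def mid_side_energy(stereo):
    """Return (mid_fraction, side_fraction) of the total mid/side energy.

    stereo: float array of shape (n_samples, 2), columns are left and right.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    mid = 0.5 * (left + right)    # what both channels share (the "center")
    side = 0.5 * (left - right)   # what is left after center cancellation
    e_mid = float(np.sum(mid ** 2))
    e_side = float(np.sum(side ** 2))
    total = e_mid + e_side
    if total == 0.0:
        raise ValueError("signal is silent, energy fractions are undefined")
    return e_mid / total, e_side / total

# Synthetic demo: a loud centered tone plus a quiet tone in the left channel only.
sr = 8000
t = np.arange(sr) / sr
center = np.sin(2 * np.pi * 220 * t)
left_only = 0.25 * np.sin(2 * np.pi * 330 * t)

mix = np.stack([center + left_only, center], axis=1)
mid_frac, side_frac = mid_side_energy(mix)
```

A mid fraction close to 1 means the side signal that survives inversion is only a small slice of the original mix.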

There is a thread on the homepage that might be useful to you: https://news.ycombinator.com/item?id=21431071

There are several products from Audionamix that might be worth checking out :


This is not a direct answer to the technical aspect of your question, but if you know who produced the music, they might give/sell you a version that is missing the vocal tracks.

In my particular case that is unfortunately not possible. But otherwise, I'd go the social route as well.

Welcome to the rabbit hole that is the inverse problem.

I always wonder where DJs get the instrumental versions to sample/mix.

A lot of places sell em, look up “DJ Stems” to get the idea.

My date was sending me mixed signals, so I did a fourier analysis.

What about the inverse, isolating only the voice?

For a perfect solution, it would simply be a matter of subtracting one waveform from the other. I suspect both isolated voice and isolated music would have significant "noise" leftover. This would likely be more noticeable in the voice due to our increased perception of odd vocal sounds.
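
To make the subtraction idea concrete, here is a small Python/NumPy sketch. It assumes you somehow have a sample-aligned copy of the instrumental, which is exactly the part that is hard to get; subtract_stem and the test tones are hypothetical.

```python
import numpy as np

def subtract_stem(mix, stem):
    """Subtract a known (or estimated) stem from a mono mix, sample by sample."""
    n = min(len(mix), len(stem))   # guard against slightly different lengths
    return mix[:n] - stem[:n]

# Synthetic demo with assumed signals (not real music).
sr = 8000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 220 * t)
music = 0.5 * np.sin(2 * np.pi * 330 * t)
mix = vocal + music

# With a perfect copy of the music, only the vocal remains.
exact = subtract_stem(mix, music)

# With an estimate that is 10% too quiet, 10% of the music leaks through.
rough = subtract_stem(mix, 0.9 * music)
leak = rough - vocal
```

Any error in the estimated stem, whether in level, timing or content, ends up as leftover "noise" in the isolated voice.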

if it's like talking, Fourier helps!

Ah, unfortunately, it's singing. But good to know! That knowledge does come in handy :)

Can you explain why Fourier would help with talking and not singing?

maybe because voice has a very low frequency range so it's easier to separate from the high frequency noises.

This is a longstanding problem in audio engineering, along the lines of "how does one un-bake a cake to get the eggs?" There are always going to be artefacts and distortion ranging from unpleasant to extreme, hence audio engineers when mixing from stems/channels will do a stereo instrumental mix and a mix with vocals.

Yeah, it's impossible to do it perfectly as long as human voices' frequencies overlap with instruments, basically.
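
Here is a toy illustration of the overlap problem in Python with NumPy. It is a sketch under simple assumptions: pure sine tones stand in for the voice and the instrument, and band_stop is a hypothetical brick-wall FFT filter, not what any real separation tool does.

```python
import numpy as np

def band_stop(signal, sr, lo_hz, hi_hz):
    """Zero every FFT bin between lo_hz and hi_hz (inclusive) and resynthesize."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[(freqs >= lo_hz) & (freqs <= hi_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic demo with assumed signals (not real music).
sr = 8000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 300 * t)                 # "voice" at 300 Hz
low_partial = 0.5 * np.sin(2 * np.pi * 310 * t)     # instrument partial inside the vocal band
high_partial = 0.5 * np.sin(2 * np.pi * 2000 * t)   # instrument partial well above it
mix = vocal + low_partial + high_partial

# Removing the 200-400 Hz band kills the voice, but the 310 Hz
# instrument partial goes with it; only the 2000 Hz partial survives.
filtered = band_stop(mix, sr, 200, 400)
```

The filter has no way to tell which source put energy into a given bin, so whatever shares the band with the voice is lost too.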

It's like asking to recover layers from a PNG/JPG file.

