Historically, blending between mixed stereo tracks has been limited to mixing EQ bands, but now DJs will be able to layer and mix the underlying stems themselves -- like putting the vocal from one track onto an instrumental section of another (even if no a cappella / instrumental versions were ever released).
It also opens up a previously unreachable world for amateur remixing in general; for instance, creating surround sound mixes from stereo or even mono recordings for playback in 3D audio environments like Envelop (https://envelop.us) [disclaimer: I am one of the co-founders of Envelop]
I shall use this knowledge and endeavour to share it where possible.
(not that some of these are not already available when you know where to search, but it's not very... structured)
The "open source" format/practice exists already: just bounce your mix into separate audio files, one for each track or group, into a folder, zip it, ship.
Only, it's not (yet) much embraced on the commercial side. When you pay up to 20 bucks to get a full album, why couldn't you pay, say, 100 to get the same album but with separated tracks for your own use + instructions on whom to contact & how for any other kind of use?
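For what it's worth, the "bounce to a folder, zip it, ship" practice takes only a few lines to script. This is a minimal sketch using Python's standard `zipfile` module; the function name `zip_stems` and the assumption that the stems are `.wav` files sitting in one flat folder are mine, not part of any established format.

```python
import zipfile
from pathlib import Path


def zip_stems(stem_dir, archive_path):
    """Pack every .wav file in stem_dir into a zip archive.

    Returns the list of file names written, in sorted order.
    Assumes (hypothetically) one .wav per track or group, no subfolders.
    """
    stem_dir = Path(stem_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(stem_dir.glob("*.wav")):
            # Store only the file name so the archive has no absolute paths.
            zf.write(path, arcname=path.name)
            names.append(path.name)
    return names
```

Something like `zip_stems("my_album_stems", "my_album_stems.zip")` would then produce the archive to ship.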
I emailed them requesting more info on how their implementation works. I think this might be a violation of the MIT license?
But they seem friendly and proactive (from my experience on music forums anyway), so hopefully you'll get a helpful reply.
Note: I'm not affiliated with this project; I just think it's cool.
(1) Open the file in Audacity and switch to Spectrogram view,
(2) set a high-pass filter at ~150 Hz, i.e. filter out frequencies lower than that (which tend to be loud anyway),
(3) don’t remove the higher frequencies (which aren’t loud), because they are what make the consonants understandable (apparently),
(4) look for specific noises, select the rectangle, and use “Spectral Edit Multi Tool”.
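If you'd rather script step (2) than click through it, a first-order high-pass filter gives the general idea. This is only a sketch: it is not how Audacity implements its filter (Audacity lets you choose a steeper roll-off), and the function name `high_pass` and the plain list-of-floats input are assumptions for illustration.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=150.0):
    """First-order (one-pole RC) high-pass filter over a list of floats.

    Attenuates content below cutoff_hz and leaves higher frequencies
    (the ones that carry the consonants) mostly untouched.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

A one-pole filter rolls off gently (about 6 dB per octave), so low rumble is reduced rather than removed outright.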
But if machine learning can help that would be really interesting! This Spleeter page does mention “active listening, educational purposes, […] transcription” so I'm excited.
Cheaper versions of RX still have various noise reduction tools, de-verb for reducing reverb and room echo, and a range of spectral editing tools as well.
Sorry it's not something you can use now, but I just thought I would mention it! I also did a quick Google search but unfortunately I couldn't find any AI noise removal tools that might solve this problem.
I’ve not had time to try it yet but have read good things.
I was even able to run it on their notebook https://colab.research.google.com/github/deezer/spleeter/blo... without setting anything up locally.
The results of vocal separation were quite impressive.
- Sample track: https://files.catbox.moe/56op27.mp3
- Spleeted vocals: https://files.catbox.moe/4d9aru.wav
- Spleeted accompaniment: https://files.catbox.moe/y67g23.wav
Could this make it possible to automatically remove the music from the MP3 file they have available? With 6 tracks per hour times 4 hours, manually removing the music is time-consuming.
I doubt it, as it seems all vocals are output to a single file...
Is there any other tool someone can recommend?
SoX etc. could be used for silence detection, probably best done in post (scriptable), but could be piped through after experimenting with settings. Otherwise, even old desks can trigger when a mic channel fader is raised, so that too is a possibility for pausing the recording during music.
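To illustrate what scriptable silence detection looks like, here is a rough sketch of the usual approach: measure RMS level over short windows and report the spans that stay below a threshold. This is not SoX's implementation; the function name `find_silences` and all the default values (threshold, minimum duration, window length) are assumptions you would have to tune against a real broadcast recording.

```python
import math


def find_silences(samples, sample_rate, threshold=0.01,
                  min_duration=0.5, window_ms=50):
    """Return (start_sec, end_sec) spans whose windowed RMS is below threshold.

    samples is a list of floats in [-1.0, 1.0]; spans shorter than
    min_duration seconds are ignored.
    """
    win = max(1, sample_rate * window_ms // 1000)
    spans = []
    start = None
    for w in range(0, len(samples), win):
        chunk = samples[w:w + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = w / sample_rate
        if rms < threshold:
            if start is None:
                start = t  # a quiet stretch begins here
        elif start is not None:
            if t - start >= min_duration:
                spans.append((start, t))
            start = None
    if start is not None:
        # The recording ended while still quiet.
        end = len(samples) / sample_rate
        if end - start >= min_duration:
            spans.append((start, end))
    return spans
```

The returned timestamps could then be used as cut points in an editor or a further script.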
Audacity. I can think of two ways.
0. Import into Audacity.
1. Play back the recording at 4x (1 hour of playback for 4 hours of real-time broadcast). Mark the edges where music stops and starts. You have to do that 12 times for 6 songs. You'll have to slow down near the changes in order to catch the precise time of an edge. Delete the music between the two edges. Repeat 5 more times.
There may be Audacity plugins that do what you want or do something closer to it.
2. Use some combination of low-pass and high-pass filters to remove the music. It's not going to be perfect and you'll still need to edit out the filtered music anyway.
Neal Schon from Journey -- lead guitar
The Heart sisters -- lead vocals and lead/rhythm guitar
Flea -- bass guitar from the Chili Peppers
Neil Peart -- drummer from Rush
Tony Kay -- keys from Genesis
The only difficulty is they must all be playing the same song. Then we can extract, transpose if needed, and remix together.
If not, I'm sure if you ask nicely Amazon will give you a few credits to burn on a pandemic art project.
Because the ability to separate any song into separate tracks would be amazing. The ability to remix any song or just play with any instrument or vocal track would be awesome. But does it have the same poor quality and limitations as most frequency-based source separation?
It isn’t magical and the results still have artefacts (mostly that kind of slightly underwater sound of a low bitrate MP3, I believe due to the way the audio is reconstructed from FFTs), and some songs trip it up entirely, but it’s definitely worth playing around with and I think it could potentially have applications for DJ/remix use if you added enough effects etc.
It’s fairly easy to install and runs quickly without a GPU, or you can try their Colab notebook, or it seems someone has hosted a version at https://ezstems.com/
Though then you'll run into compatibility errors like "No module named 'tensorflow.contrib'" which you'll have to fix.
Here is the Zenodo response:
Your access request has been rejected by the record owner.
Message from owner:
no justification given
MUSDB18-HQ - an uncompressed version of MUSDB18
The decision to reject the request is solely under the responsibility of the record owner. Hence, please note that Zenodo staff are not involved in this decision.
https://github.com/introlab/odas https://github.com/introlab/manyears https://github.com/introlab/16SoundsUSB
Website of the team behind these:
You routinely suppress background noise and room acoustics when listening to someone speaking. But you don't do the same thing when listening to music. At best you can focus on individual elements in a track, and you can parse them musically (and maybe lyrically).
But you don't suppress the rest to the point where you don't hear it.
To somebody with APD, it sounds like science fiction, although it does require more suspension of disbelief than faster than light travel or teleportation.
Not sure how this pertains to music, but this ability normally requires localizing different voices and noises.
Sorry for the thumbnail :-D btw
So can it be run in real-time?
I am thinking about extracting features for music visualization but it could make a DJ happy also.
The first one refers to the speed of the processing in relation to the length of the recording - so if, say, you can process a 10 minute recording in 1 minute, then you're 10x real-time. However, your analysis might require the full track to be available for best outcomes, and so you cannot really start processing until the full source is available.
The latter is what "online" processing refers to: the ability to process on-the-fly in parallel to the recording. Obviously, this cannot be faster than real-time ;-) but hopefully it is not slower, either. Oftentimes, though, you get a (somewhat constant and) hopefully small offset, i.e., you can process a 10 minute recording online in the same time but you need another 10 seconds on top of that.
This is, by the way, not restricted to source separation, it applies to other disciplines as well, say, automatic speech recognition.
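The two notions can be written down as simple arithmetic. A small sketch, with function names (`real_time_factor`, `online_completion_time`) that are my own and not standard terminology from any library:

```python
def real_time_factor(audio_seconds, processing_seconds):
    """Offline speed: how many seconds of audio are processed per second."""
    return audio_seconds / processing_seconds


def online_completion_time(audio_seconds, offset_seconds):
    """Online processing: finishes one fixed offset after the audio ends."""
    return audio_seconds + offset_seconds


# A 10 minute recording processed offline in 1 minute is 10x real-time.
speed = real_time_factor(10 * 60, 1 * 60)

# The same recording processed online with a 10 second offset is done
# 610 seconds after the recording started.
done_at = online_completion_time(10 * 60, 10)
```

Note that a high offline factor says nothing about the online offset: a method that needs the whole track before it can start has, in effect, an offset at least as long as the track.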
To be used with arbitrary audio in real-time, after initialization and setup you need an API that looks like:
ProcessAudio (samples, num_samples)
And it would return n packets of num_samples samples, one packet for each generated track.
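To make the shape of such an interface concrete, here is a hypothetical sketch in Python. The class name `StreamSeparator` and the snake_case method are invented for illustration, and the "separation" is a placeholder that just splits each block evenly across tracks; a real implementation would run a model here and carry state between blocks. Spleeter itself does not, as far as this thread establishes, expose an API like this.

```python
from typing import List, Sequence


class StreamSeparator:
    """Hypothetical block-based separator interface (placeholder logic)."""

    def __init__(self, num_tracks: int):
        # Initialization and setup would load the model here.
        self.num_tracks = num_tracks

    def process_audio(self, samples: Sequence[float],
                      num_samples: int) -> List[List[float]]:
        """Take one block of input, return one block per generated track."""
        block = list(samples[:num_samples])
        # Placeholder: divide the signal evenly so the tracks sum back to
        # the input. A real separator would estimate each source instead.
        return [[x / self.num_tracks for x in block]
                for _ in range(self.num_tracks)]


separator = StreamSeparator(num_tracks=4)
packets = separator.process_audio([0.4, 0.8, -0.4], 3)
# packets holds 4 lists of 3 samples each, one list per track.
```

The key property is that each call sees only the current block, so latency is bounded by the block size plus whatever look-ahead the method needs.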
created a Max for Live native version of Spleeter and demos it here:
It's way faster than real time; I'm not sure why slowing it down would be an advantage. You still need to take the resultant data and do things with it, as a DJ, and faster is better.
They should spend dev time on something that matters