
Spleeter – Music Source-Separation Engine - jph98
https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e
======
roddylindsay
Once this technology gets incorporated into DJ mixers / CDJs, this is going to
make DJing much more creatively interesting.

Historically, blending between mixed stereo tracks has been limited to mixing EQ
bands, but now DJs will be able to layer and mix the underlying stems
themselves -- like putting the vocal from one track onto an instrumental
section of another (even if a capella / instrumental versions were never
released).

It also opens up a previously unreachable world for amateur remixing in
general; for instance, creating surround sound mixes from stereo or even mono
recordings for playback in 3D audio environments like Envelop
([https://envelop.us](https://envelop.us)) [disclaimer: I am one of the co-
founders of Envelop]

~~~
Juliate
Wouldn't it be much more efficient for everyone (and even lucrative for the
owners) to also provide the studio stems at a slightly higher/different price?

(not that some of these are not already available when you know where to
search, but it's not very... structured)

~~~
marksomnian
Native Instruments tried that with their Stems [0] project. It didn't seem to
get all that far, though.

[0]: [https://www.native-instruments.com/en/specials/stems/](https://www.native-instruments.com/en/specials/stems/)

~~~
Juliate
This one is proprietary to them.

The "open source" format/practice exists already: just bounce your mix into
separate audio files, one for each track or group, into a folder, zip it,
ship.

Only, it's not (yet) much embraced on the commercial side. When you pay up to
20 bucks to get a full album, why couldn't you pay, say, 100 to get the same
album but with separated tracks for your own use, plus instructions as to who
to contact (and how) for any other kind of use?

------
SyneRyder
For anyone who wants to try Spleeter in a version that "just works" without
having to install TensorFlow and mess with offline processing, Spleeter has
been built into a wave editor called Acoustica from Acon Digital. It's
been working really well for me, and the whole package is solid competition to
editors like iZotope RX:

[https://acondigital.com/products/acoustica-audio-editor/](https://acondigital.com/products/acoustica-audio-editor/)

~~~
theobr
I've been trying for months to make redistributable Spleeter "binaries" that I
can bundle with user-facing applications. Happy to see someone's succeeded
where I've failed. Really sad they've chosen not to share their changes :(

I emailed them requesting more info on how their implementation works. I think
this might be a violation of the MIT license?

~~~
SyneRyder
The MIT license isn't copyleft, there's no obligation to share modifications
or provide source - just to acknowledge the copyright / credit.

But they seem friendly and proactive (from my experience on music forums
anyway), so hopefully you'll get a helpful reply.

------
mwcampbell
Previous discussion, where I posted a demo using a full song (legally under
Creative Commons):

[https://news.ycombinator.com/item?id=21431071](https://news.ycombinator.com/item?id=21431071)

Note: I'm not affiliated with this project; I just think it's cool.

~~~
dang
I'm going to pretend that we didn't see this (otherwise extremely helpful)
link to a major discussion from 6 months ago, so as not to have to mark the
current post a dupe.

------
svat
I often have voice recordings with a lot of background noise (e.g. a public
lecture in a room with poor acoustics, recorded from a phone in the audience —
there's usually sounds of paper rustling, noises from the street, etc). Is
this "source-separation" the sort of thing that could help, or does anyone
have other tips? The best thing I have so far is based on this
[https://wiki.audacityteam.org/wiki/Sanitizing_speech_recordi...](https://wiki.audacityteam.org/wiki/Sanitizing_speech_recordings_made_with_portable_audio_recorders#A_simple_two-step_process_taking_a_minute) —

(1) Open the file in Audacity and switch to Spectrogram view, (2) apply a
high-pass filter at ~150 Hz, i.e. filter out frequencies lower than that
(which tend to be loud anyway), (3) _don’t_ remove the higher frequencies
(which aren’t loud), because they are what make the consonants understandable
(apparently), (4) look for specific noises, select the rectangle around each
one, and use “Spectral Edit Multi Tool”.
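For illustration, the high-pass in step (2) can also be sketched outside
Audacity. Here's a hypothetical minimal version in Python with scipy (not part
of the Audacity recipe; the input signal is synthesized just to show the
filter):

```python
# Sketch of a ~150 Hz high-pass, assuming a mono signal array `audio`
# at sample rate `rate` (synthesized here for illustration).
import numpy as np
from scipy.signal import butter, sosfilt

rate = 44100
t = np.arange(rate) / rate
# "Speech-band" tone (440 Hz) plus low-frequency rumble (60 Hz).
audio = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 60 * t)

# 4th-order Butterworth high-pass at ~150 Hz, as suggested above.
sos = butter(4, 150, btype="highpass", fs=rate, output="sos")
filtered = sosfilt(sos, audio)
```

The 60 Hz rumble is strongly attenuated while the 440 Hz content passes
through nearly untouched.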

But if machine learning can help that would be really interesting! This
Spleeter page does mention “active listening, educational purposes, […]
transcription” so I'm excited.

~~~
Guillaume86
You could give the Nvidia RTX Voice plugin a shot if you have one of the
compatible cards. I'm not sure how it deals with low-level background noise;
the YouTube reviews mostly tested it with over-the-top cases like a vacuum
cleaner next to the speaker.

~~~
Cactus2018
Works on non-RTX cards too [https://arstechnica.com/gaming/2020/04/you-can-get-nvidias-r...](https://arstechnica.com/gaming/2020/04/you-can-get-nvidias-rtx-voice-noise-filtering-without-a-pricey-rtx-card/)

------
iseanstevens
There is a Max/Ableton live plugin version here, which makes it much easier to
experiment with Spleeter artistically.

[https://github.com/diracdeltas/spleeter4max/releases/](https://github.com/diracdeltas/spleeter4max/releases/)

~~~
tomduncalf
Nice! I had this idea and was too lazy to do it haha, glad someone else wasn’t

------
tomduncalf
Another recent open source contender for source separation is Open Unmix:
[https://github.com/sigsep/open-unmix-pytorch/](https://github.com/sigsep/open-unmix-pytorch/)

I’ve not had time to try it yet but have read good things.

~~~
tomduncalf
Just tried this and it's really impressive, I'd say it does a nicer job on
vocals than Spleeter. Less of the "underwater" effect compared to what I
remember of Spleeter.

------
voiper1
Very cool!

I was even able to run it on their notebook
[https://colab.research.google.com/github/deezer/spleeter/blo...](https://colab.research.google.com/github/deezer/spleeter/blob/master/spleeter.ipynb)
without setting anything up locally.

The results of vocal separation were quite impressive.

~~~
antman
Can you share the outputs please?

~~~
voiper1
Sorry, I tried it on a (typical) copyrighted song.

------
leoncvlt
Here's the sample output, for those who are curious:

\- Sample track:
[https://files.catbox.moe/56op27.mp3](https://files.catbox.moe/56op27.mp3)

\- Spleeted vocals:
[https://files.catbox.moe/4d9aru.wav](https://files.catbox.moe/4d9aru.wav)

\- Spleeted accompaniment:
[https://files.catbox.moe/y67g23.wav](https://files.catbox.moe/y67g23.wav)

------
Myce
A local radio station has a four-hour broadcast. The station is required to
play a certain number of music tracks (about 6 per hour), but there has been
demand to make the broadcast available as a podcast without the music.

Could this make it possible to automatically remove the music from the MP3
file they have available? With 6 tracks per hour times 4 hours, manually
removing the music is time-consuming.

I doubt it, as it seems all vocals are output to a single file...

Is there any other tool someone can recommend?

~~~
lozf
Presumably they own the rights on broadcast material, so they'd have to be
directly involved in the podcast production. That given, it would probably be
more straight forward to take the microphone feeds from their broadcast desk
(via "aux-out" perhaps) and record only the spoken output separately.

Sox etc. could be used for silence detection, probably best done in post
(scriptable), but could be piped through after experimenting with settings.
Otherwise, even old desks can trigger when a mic channel fader is raised, so
that too is a possibility for pausing the recording during music.

------
jph98
Leveraging a state-of-the-art source separation algorithm for music
information retrieval

[https://www.youtube.com/watch?time_continue=42&v=JIR6HJISrtY...](https://www.youtube.com/watch?time_continue=42&v=JIR6HJISrtY&feature=emb_logo)

------
TedDoesntTalk
Now we can create all-star bands that never existed. For example:

Neal Schon from Journey -- lead guitar

The Heart sisters doing lead vocals and lead/rhythm guitar

Flea -- bass guitar from the Chili Peppers

Neil Peart -- drummer from Rush

Tony Kaye --- keys

The only difficulty is they must all be playing the same song. Then we can
extract, transpose if needed, and remix together.

~~~
floatrock
We can deepfake vocals and redraw your photos as if they were painted by Van
Gogh... I'm sure someone has trained something that immortalizes different
artists into their AI instrumental avatars.

If not, I'm sure if you ask nicely Amazon will give you a few credits to burn
on a pandemic art project.

------
grawprog
I couldn't find any examples, so for anyone who's tried this: are the results
better than using a bandpass filter and an equalizer to isolate frequencies,
or one of those auto-karaoke tools?

Because the ability to separate any song into its component tracks would be
amazing. Being able to remix any song, or just play with any instrument or
vocal track, would be awesome. But does it have the same poor quality and
limitations as most frequency-based source separation?

~~~
tomduncalf
Yeah, the results are a lot better than filtering... deep learning has pushed
the state of the art in source separation forward quite a lot recently.

It isn’t magical and the results still have artefacts (mostly that kind of
slightly underwater sound of a low-bitrate MP3, I believe due to the way the
audio is reconstructed from FFTs), and some songs trip it up entirely, but
it’s definitely worth playing around with, and I think it could potentially
have applications for DJ/remix use if you added enough effects etc.

It’s fairly easy to install and runs quickly without a GPU, or you can try
their Colab notebook, or it seems someone has hosted a version at
[https://ezstems.com/](https://ezstems.com/)

------
marksomnian
Had a play with the Colab and it's quite good indeed. The authors claim "100x
real time speed", which is mighty impressive, but I'd be more interested in
seeing a "Try Really Hard" mode, trading off quality and speed. Is that a
thing that can be done in the current code, I wonder?

------
mehrdadn
If you're trying to run it on Windows with Python 3.8, add numpy and cython to
the dependencies, and change the TensorFlow requirement to >= rather than ==.

Though then you'll run into compatibility errors like "No module named
'tensorflow.contrib'", which you'll have to fix.

------
mbushey
While this is awesome, it's trained on MUSDB18-HQ, which as far as I can tell
is proprietary. zenodo.org claims it is available; however, I have filled out
their "request access" page half a dozen times. Does anyone know of a training
dataset that's possible to obtain?

Here is the zenodo response:

Your access request has been rejected by the record owner.

Message from owner: no justification given

Record: MUSDB18-HQ - an uncompressed version of MUSDB18
[https://zenodo.org/record/3338373](https://zenodo.org/record/3338373)

The decision to reject the request is solely under the responsibility of the
record owner. Hence, please note that Zenodo staff are not involved in this
decision.

------
pabs3
This reminds me of this open source project (and its predecessor manyears, and
the open hardware 8SoundsUSB/16SoundsUSB projects).

[https://github.com/introlab/odas](https://github.com/introlab/odas)
[https://github.com/introlab/manyears](https://github.com/introlab/manyears)
[https://github.com/introlab/16SoundsUSB](https://github.com/introlab/16SoundsUSB)

Website of the team behind these:

[https://introlab.3it.usherbrooke.ca/](https://introlab.3it.usherbrooke.ca/)

------
TheOtherHobbes
Out of interest, and to put this in context - your brain can only do this for
conversation, not music.

You routinely suppress background noise and room acoustics when listening to
someone speaking. But you don't do the same thing when listening to music. At
best you can focus on individual elements in a track, and you can parse them
musically (and maybe lyrically).

But you don't suppress the rest to the point where you don't hear it.

~~~
michaelcampbell
Is there some research behind this or are you opining on how YOUR brain works?

~~~
aasasd
[https://en.wikipedia.org/wiki/Cocktail_party_effect](https://en.wikipedia.org/wiki/Cocktail_party_effect)

Not sure how this pertains to music, but this ability normally requires
localizing different voices and noises.

------
fit2rule
This is ultra-cool .. I have a few terabytes of jam-session recordings that
I'm going to throw at this. If it ends up being usable to the point that I can
re-do vocals over some of the greatest moments in the archive, I'll be
praising whatever Spleeter deity makes itself visible to me at the time, most
highly ..

------
fold_left
Once you have extracted just the guitar from a track, are there any tools out
there which can work out the tablature (e.g. [https://www.ultimate-guitar.com//top/tabs](https://www.ultimate-guitar.com//top/tabs))
so you can play along?

~~~
dspig
[https://products.zplane.de/decoda](https://products.zplane.de/decoda) can help

------
InstaHeads
Well, it seems neural networks for vocal and instrumental track isolation have
started to appear. Recently I discovered
[https://www.lalal.ai](https://www.lalal.ai) and it works quite well.

------
philipov
I tried using the 2 stem model to remove the music from an audio recording of
two people talking. It kept sucking in some of the music whenever someone
started talking, however. Is there a better model to use for that?

------
FraKtus
It says it can be 100 times faster than real-time.

So can it be run in real-time?

I am thinking about extracting features for music visualization but it could
make a DJ happy also.

~~~
kleiba
Sometimes the distinction is made between "real-time" and "online" processing.

The first refers to the speed of the processing relative to the length of the
recording - so if you can process a 10-minute recording in 1 minute, you're at
10x real-time. However, your analysis might require the full track to be
available for best results, so you cannot really start the processing until
the full source is available.

The latter is what "online" processing refers to: the ability to process on-
the-fly, in parallel with the recording. Obviously, this cannot be faster than
real-time ;-) but hopefully it is not slower, either. Oftentimes, though, you
get a (roughly constant and) hopefully small offset, i.e., you can process a
10-minute recording online in the same time, but you need another 10 seconds
on top of that.

This is, by the way, not restricted to source separation; it applies to other
disciplines as well, say, automatic speech recognition.
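As a toy illustration with assumed numbers (nothing here is measured from
Spleeter itself):

```python
# "Real-time factor": a 600 s recording processed offline in 6 s is
# 100x real-time, but the result only exists after the whole track is in.
recording_s = 600.0
offline_processing_s = 6.0
realtime_factor = recording_s / offline_processing_s  # 100x

# "Online": the processor keeps pace with the audio (1x real-time) but adds
# a roughly constant latency, so it finishes shortly after the audio ends.
online_latency_s = 10.0
online_finish_s = recording_s + online_latency_s
```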

~~~
FraKtus
Exactly - while fast, if this method needs to parse the full track before
starting to generate results, then it can't be used in real time.

To be used with arbitrary audio in real time, after initialization and setup
you need an API that looks like:

ProcessAudio(samples, num_samples)

And it would return n packets of num_samples samples.
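A hypothetical sketch of such a block-based interface in Python (the class and
method names are made up for illustration; this is not Spleeter's actual API,
and the "separation" is a trivial stand-in for a real model):

```python
import numpy as np

class BlockSeparator:
    """Toy stand-in for a streaming separator: a real implementation would
    run a model on each chunk; this one just scales the input so the
    per-block call pattern is visible."""

    def __init__(self, num_stems=2):
        self.num_stems = num_stems

    def process_audio(self, samples, num_samples):
        # Consume one block of `num_samples` samples and return one output
        # block per stem, each the same length as the input.
        assert len(samples) == num_samples
        return [samples / self.num_stems for _ in range(self.num_stems)]

sep = BlockSeparator(num_stems=2)
block = np.ones(512, dtype=np.float32)  # one incoming audio chunk
stems = sep.process_audio(block, 512)   # two output chunks, one per stem
```

A model that needs future context can still fit this shape by buffering
internally, at the cost of the constant latency described above.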

------
manceraio
You can try Spleeter in the cloud here:
[https://voxremover.com](https://voxremover.com)

------
philipov
The output appears to cut off after 10 minutes. How do you make it operate on
longer files, like in the 100 minute range?

------
jbverschoor
Deezer is pretty useless if all supported hardware requires your phone to
stream.

They should spend dev time on something that matters.

------
peterhookgen
This is very cool. I have started using it to experiment with creating
hardstyle dance remixes of popular songs.

