
Spleeter: Extract voice, piano, drums, etc. from any music track - dsr12
https://github.com/deezer/spleeter
======
mwcampbell
For your listening pleasure, here's a full-length demo. I decided to use the
Jonathan Coulton classic "Re Your Brains", because I can legally share and
modify his music under its Creative Commons license.

First, the original:

[https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-
re-...](https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-re-your-
brains/original.mp3)

Now the derived stems:

Vocals: [https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-
re-...](https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-re-your-
brains/stems/vocals.mp3)

Accompaniment: [https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-
re-...](https://mwcampbell.us/tmp/spleeter-demo/jonathan-coulton-re-your-
brains/stems/accompaniment.mp3)

Note: I'm not affiliated with this project or Mr. Coulton. I just think this
is a cool project and wanted to share.

~~~
codedokode
While it's a great technology, the result sounds somewhat robotic. On the
original recording the voice sounds soft, but after separation it sounds as
if it were synthesized or passed through a vocoder; something is missing. The
voice contains pieces of strumming sound. The guitar also sounds "blurred",
as if someone had cut an object out of a picture and blurred the edges to
make the cut less visible. The clap sound is distorted: on the original
recording it sounds the same each time, but after separation it sounds
different every time, as if it had been filtered or compressed at a low
bitrate.

It is amazing how the ear manages to distinguish all the sounds without
distortion.

~~~
jacquesm
> While it's a great technology, the result sounds somewhat robotic.

That's like complaining about how bad the pig plays the violin. This is
absolutely incredible. The complexity level for this problem is right off the
scale and the software does a passable job of it. Given some time and more
training data and a few more people working on it this has serious potential.

~~~
mikorym
I think you meant how well the pig paints. [1] [2]

[1] [https://pigcasso.org/wp-
content/uploads/2018/12/7.jpg](https://pigcasso.org/wp-
content/uploads/2018/12/7.jpg) [2] Also, I am not affiliated with anything in
particular that has been mentioned, or with the pig.

~~~
jacquesm
I think the original was in relation to pigs dancing.

------
voicedYoda
I gave a talk at PyCon this year about DSP [1], specifically some of the
complexities surrounding this. I came across a few other ML projects that
claimed to do this as well, and the biggest holdup is getting enough properly
labeled training data, tagged appropriately, to let the models train
correctly. In the git repo of this project they also explicitly state you
need to train on your own data set, though you can use their models if you
like. YMMV. I would love to try this out, as it's definitely a complex bit of
audio engineering. That said, I loved learning everything I did preparing for
my talk and need to finish up some other parts of the project to get the
jukebox working... Maybe this will help :)

[1]
[https://m.youtube.com/watch?v=fevxy-s0vo0](https://m.youtube.com/watch?v=fevxy-s0vo0)

~~~
lubujackson
Seems like most music (from the 70s on at least) is recorded multi-track and
the data is out there, just not accessible to anybody. If you ever watch Rick
Beato videos, he takes classic songs and isolates vocal/drum/etc. tracks all
the time, I'm not sure how he has access to them:
[https://www.youtube.com/playlist?list=PLW0NGgv1qnfzb1klL6Vw9...](https://www.youtube.com/playlist?list=PLW0NGgv1qnfzb1klL6Vw9B0aiM7ryfXV_)

But you probably don't need to bother with old recordings, since there is SO
MUCH music being produced via tracking software right now that I feel it
should be possible to get a pretty big dataset - the difference being, of
course, professional production that affects how all these things sound in
the final mix.

Although... if you have enough songs with separated tracks, couldn't you just
recombine tracks and adjust the settings to create a much, much broader base
for training? Just a dozen songs could be shuffled around to give you a base
of 10,000+ songs easily enough. That might lead to a somewhat brittle result
but it would be a decent start.

~~~
abraae
Rick says in one of his videos that he and some of his buddies have got old
copies of the original source (separated) tracks, and they kind of pass them
around between each other.

I find that pretty amazing given the litigiousness of some in the music
industry, but there we are.

Side note: I discovered Rick Beato a few months ago and I've watched heaps of
his videos. It's really fascinating hearing old classics torn down to their
constituent parts. Here's one of my favourites of his:
[https://www.youtube.com/watch?v=ynFNt4tgBJ0](https://www.youtube.com/watch?v=ynFNt4tgBJ0)
(Boston - More than a feeling).

~~~
ehnto
Rick Beato is excellent. Nahre Sol and Adam Neely also do great analyses of
things - Adam in a more theory-oriented way and Nahre in a more feeling- and
composition-focused way. "Funk as digested by a classical musician", for
example, looks at funk to try to find the key structures of the style, which
illuminates things I might not have noticed otherwise.

~~~
mushishi
Also, 8-bit Music Theory has very solid video essays on various compositional
concepts, illustrated using game music. I actually find his work the most
consistently satisfying. Neely and Beato are great but have a lower
signal-to-noise ratio. I haven't watched Nahre enough to say, but thumbs up
for her, too.

Don't forget JazzDuets's channel. His content seems to be the most mature,
and he uses a lot of actual playing to tune your ear. I find him a bit too
advanced for my level, but I really like his very humble and friendly
personal touch.

------
jacquesm
This is very timely. I've been working for about 3 months now on a utility
that transforms mp3s to midi files. It's a hard problem, and even though I'm
making steady progress the end is nowhere in sight. This will give me
something to benchmark against for, for instance, voice accompanied by piano.
Thank you for making/posting this.

For an idea how this project is coming along:

[https://jacquesmattheij.com/toccata.mp3](https://jacquesmattheij.com/toccata.mp3)

Yes, it's terrible :) This particular file is the result of the following
transformations:

midi file -> wav file (fluidsynth)

wav file -> midi file (my utility)

midi file -> wav file (fluidsynth once more)

wav file -> mp3 file (using lame)

Of course it also works for regular midi files (piano only for now). The
reason why I use the workflow above is that it gives me a good idea how well
the program works by comparing the original midi file with the output one.

But I did not yet have a way to deal with piano/voice which is a very common
combination so this might really help me.

Possible applications: automatic music transcription, tutoring, giving regular
pianos a midi 'out' port, using a regular piano as an arranger keyboard,
instrument transformation and many others.

Having fun!

Edit: I've done a little write-up: [https://jacquesmattheij.com/mp3-to-
midi/](https://jacquesmattheij.com/mp3-to-midi/)

~~~
IAmGraydon
Just FYI in case you weren't aware - Ableton Live and several other DAWs have
this capability built in. It's far from perfect, but great for humming a
melody and then quickly turning it into MIDI.

~~~
alez
There’s a pretty cool library from Googles Magenta team that does piano
transcription pretty well. [https://magenta.tensorflow.org/onsets-
frames](https://magenta.tensorflow.org/onsets-frames)

They say it’s only really good for piano, but I definitely use it for all
kinds of samples. Great for inspiration

~~~
jacquesm
So, I used it to run the same toccata test, here are the results:

The Magenta code:

f_measure 71.56 precision 65.75 recall 78.49 accuracy 55.72

My little batch of code:

f_measure 77.74 precision 93.40 recall 66.57 accuracy 63.58

Interpreting the results is tricky: they are obviously better on 'recall',
but that comes at the expense of being _much_ less precise, which gives my
code the better overall result; besides, its output is nicer to listen to
because there are far fewer spurious notes.
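For reference, these note-level metrics can be computed like this. (A
simplified sketch: it requires an exact match on (onset, pitch) pairs,
whereas real evaluation tools such as mir_eval allow an onset tolerance
window.)

```python
def transcription_scores(reference_notes, estimated_notes):
    """Note-level precision / recall / F-measure, treating a note as a
    (quantized onset, MIDI pitch) pair that must match exactly."""
    ref, est = set(reference_notes), set(estimated_notes)
    true_pos = len(ref & est)
    precision = true_pos / len(est) if est else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```

A transcriber that emits few, mostly correct notes scores high precision and
lower recall - the trade-off described above.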

My code also runs about 100 times as fast and uses very little in terms of
resources. So, rather than being depressed it looks like I'm on to something
:)

~~~
alez
Oh wow that's extremely promising! Yeah the magenta thing destroys my browser
when I run it. Still feels like magic though haha. I would be extremely
interested in some other options so good luck!

~~~
jacquesm
Thank you! If you have any files you want me to test with then feel free to
send them, email is in my profile.

------
czr
Messed around with the 2stems model for a bit and it's reasonably good. I
think PhonicMind is still a bit better - PhonicMind tends to err on the side
of keeping too much, while the 2stems model tries to isolate aggressively and
often damages the vocal as a result (distorting words by losing some
harmonics, or losing quiet words entirely)

example:

[https://files.catbox.moe/wjruiv.mp3](https://files.catbox.moe/wjruiv.mp3)
(phonicmind)

[https://files.catbox.moe/uuzot3.mp3](https://files.catbox.moe/uuzot3.mp3)
(spleeter 2stem)

You can hear that spleeter does better at actually taking out the bass drums,
but PhonicMind _never_ loses or distorts any part of the vocal, while 2stems
occasionally sounds like the singing is coming through a metal tube
(harmonics are missing). I will try to read the instructions more carefully
and see if there's some way to fix that.

~~~
roryokane
For those who, like me, hadn’t heard of PhonicMind before: it’s an online
service at [https://phonicmind.com/](https://phonicmind.com/) that charges
$1.50 to $4 per song to separate out vocals, drums, bass, and the rest of the
sounds. You can upload any audio file to the website and get a 30-second
preview of the separated parts.

------
lreichold
An interesting alternative approach for instrument sound separation is to use
a fused audio + video model. So, given that you also have video of the
instruments being played, you can perform this separation with higher
fidelity.

I was fascinated by the work done by “The Sound of Pixels” project at MIT.

[http://sound-of-pixels.csail.mit.edu/](http://sound-of-pixels.csail.mit.edu/)

~~~
renaudg
That’s quite clever but not really practical: instruments heard in most music
produced today aren’t "played" by humans.

------
ooobo
Gave this a go; it's an easy install with pip, and results come back pretty
quickly even on an old MacBook. The 2stems split (vocals/accompaniment) on
some random songs I chose was actually quite good using the pretrained models
provided. Of course, ripping the vocals out of the accompaniment takes out a
good chunk of the middle frequencies, so some songs sound a bit wonky. Worth
a play if you are interested.

~~~
tomrod
Same thoughts here. I ran _Thriller_, _Alligator_ by Of Monsters and Men, and
_In Hell I'll Be in Good Company_ by The Dead South on the 2/5/4 stems,
respectively. Impressive results. Definitely agree that some of the middle
frequencies show some error.

It would be really cool to create "music mappers"/life sounds tracks like what
you can do with pictures & art styles (e.g.
[https://medium.com/tensorflow/neural-style-transfer-
creating...](https://medium.com/tensorflow/neural-style-transfer-creating-art-
with-deep-learning-using-tf-keras-and-eager-execution-7d541ac31398))

~~~
colorincorrect
Knowing nothing beyond the results, I suspect that the mid-ranges are poorer
mainly because human frequency response is most sensitive in the mid-range,
a.k.a. the vocal-pitch frequencies.

------
iamchrisle
Some non-open-source products that also separate vocals from music, if you
need something more "professional":

One-click process: Xtrax Stems 2 ([https://audionamix.com/technology/xtrax-
stems/](https://audionamix.com/technology/xtrax-stems/))

Professional: ADX Trax Pro 3 ([https://audionamix.com/technology/adx-trax-
pro/](https://audionamix.com/technology/adx-trax-pro/))

Both products use a server which has much larger pre-trained models. The
professional one has added features such as sibilance handling, a GUI to edit
note following as a guide for the models, and an editor tool for extracting
using harmonics.

(Note: I don't work for this company. I do pay for / use their products, and I
also happen to know someone who works there.)

------
xamuel
I wonder how it would fare on Pink Floyd's "Sheep", where vocals seamlessly
transform into instrumentals and it's impossible to tell where one ends and
the other begins.
[https://www.youtube.com/watch?v=3-oJt_5JvV4](https://www.youtube.com/watch?v=3-oJt_5JvV4)
(skip to around 1:40)

------
SemiTom
Interesting to read Thomas Dolby's thoughts on music/technology interfaces--
particularly with VR [https://semiengineering.com/thomas-dolbys-very-
different-vie...](https://semiengineering.com/thomas-dolbys-very-different-
view-of-progress/)

------
Intermernet
I'd love to see how this compares with Celemony Melodyne. As far as I've been
able to determine, Melodyne doesn't use ML, but it's hard to find out exactly
what it _does_ use.

Either way, an open source competitor to Melodyne is a welcome addition!

~~~
SyneRyder
That's the second time I've seen someone mention Melodyne for separating
vocals from a full song source - I don't think that's something it can do?
Melodyne is for tuning vocals / instruments & correcting timing on already
isolated tracks.

~~~
czr
Melodyne's editing interface lets you remove different notes from a
polyphonic track. So if it's just vocals + other tonal sounds, you can
manually remove the other tonal sounds. Example:
[https://youtu.be/2ZjdDatxTaQ?t=83](https://youtu.be/2ZjdDatxTaQ?t=83)

~~~
vonseel
Hmm, never tried that with melodyne myself and the video you posted isn't a
great example of an accurate vocal extraction - those are more like vocal
chops and are already pretty dirty to begin with. Based on my experience with
Melodyne, I'd be surprised if you could cleanly extract a plain singing vocal
without tons and tons of work.

------
bravura
Is the paper, "Spleeter: A Fast And State-of-the Art Music Source Separation
Tool With Pre-trained Models", available yet? What is the methodology?

~~~
czr
They made an extended abstract for ISMIR:
[http://archives.ismir.net/ismir2019/latebreaking/000036.pdf](http://archives.ismir.net/ismir2019/latebreaking/000036.pdf)

The methodology is a separate U-Net per instrument type that predicts a soft
mask in spectrogram space (time x frequency); they then apply that mask to
the input audio. Fairly standard.
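The mask-and-apply step can be sketched like this (using scipy;
`predict_mask` is a stand-in for the U-Net, which is not implemented here -
the sketch only shows how a predicted soft mask would be applied and
inverted back to a waveform):

```python
import numpy as np
from scipy.signal import stft, istft


def separate_with_mask(mixture, predict_mask, fs=44100, nperseg=1024):
    """Apply a soft (0..1) time-frequency mask to the mixture's STFT and
    invert back to a waveform -- one such mask per separated instrument."""
    _, _, spec = stft(mixture, fs=fs, nperseg=nperseg)
    mask = predict_mask(np.abs(spec))  # the model's job; stubbed by caller
    _, separated = istft(spec * mask, fs=fs, nperseg=nperseg)
    return separated
```

With an all-ones mask this is just an STFT round trip and returns the input
unchanged; a real mask suppresses the time-frequency bins belonging to the
other sources.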

------
davidy123
I look forward to a day I can click a button to watch videos online without
any unnecessary and distracting background music (though it would be better if
there were an option and precedent to offer unornamented narrative in video
players). The next step after this would be to have live 'music cancelling'
headphones for the grocery store (if such a thing still exists).

~~~
swagasaurus-rex
Wow. Office background noise mute.

The headphones could filter out speech that isn't above a certain threshold,
so coworkers nearby can be heard loud and clear.

Music can play at volume then quiet itself when it detects a person speaking
directly to you.

Maybe even a training button to tell it when it's false-positiving on
background noise, or when it's wrongly silencing a coworker you would like to
hear.

------
huskyr
This is incredible. I made an example using David Bowie's "Changes". A bit
robotic, but even the echo is still present in the vocal track.
[https://www.youtube.com/watch?v=KPlmrq_rAzQ](https://www.youtube.com/watch?v=KPlmrq_rAzQ)

------
iagooar
Does it work with spoken word as well? My use case: improve podcast quality by
extracting the vocals only, and leaving out all background and accidental
noise.

~~~
ssttoo
Not free nor open source but you can try a plugin called izotope Rx for this
purpose

------
clashmeifyoucan
I wonder how this compares to Open Unmix ([https://github.com/sigsep/open-
unmix-pytorch](https://github.com/sigsep/open-unmix-pytorch)), that one calls
itself state-of-the-art as well and is done in collaboration with Sony from
what I see of their paper.

~~~
clashmeifyoucan
Oh, I just found their paper:
[http://archives.ismir.net/ismir2019/latebreaking/000036.pdf](http://archives.ismir.net/ismir2019/latebreaking/000036.pdf).
It's pretty competitive.

------
alez
Tried it on “Halleluwah” by CAN, had to hear those drums:

[https://soundcloud.com/alezzzz/can-halleluwah-drums-
extracte...](https://soundcloud.com/alezzzz/can-halleluwah-drums-extracted)

Finding drum breaks in music is very time-consuming, so this is going to be
amazing for music production. Think what 90s jungle would have been like if
they'd had access to every drum take ever.

~~~
smrq
Wow, this is the isolated track I never knew I needed to hear.

------
exogen
The extracted vocals sound great! But the resulting accompaniment tracks I've
heard so far (tried on a handful of songs) aren't of usable quality for most
purposes where you'd want an instrumental track – they're too sonically
mangled.

Since people are often interested in doing this for a handful of specific
tracks and not necessarily en masse, I'd be curious about what a human-
assisted version of this could look like and whether you really could get
near-perfect results...

What if you explicitly selected portions of the track you knew had vocals, so
it could (1) know to leave the rest alone and (2) know what the backing track
for the specific song naturally sounds like when there's no singing happening?
It could try to match that sonic profile more carefully in the vocal-removed
version.
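That region-marking idea can be sketched directly (a hypothetical
`separate_fn` stands in for whatever model does the actual vocal removal;
regions are (start, end) pairs in seconds):

```python
import numpy as np


def selective_remove_vocals(mix, vocal_regions, separate_fn, fs=44100):
    """Run vocal removal only inside user-marked regions; everywhere else
    the mixture is copied through untouched, so it stays pristine."""
    out = mix.copy()
    for start_s, end_s in vocal_regions:
        a, b = int(start_s * fs), int(end_s * fs)
        out[a:b] = separate_fn(mix[a:b])  # hypothetical model call
    return out
```

Point (2) above - learning the song's natural backing sound - would then use
the untouched regions as a reference profile; that part is not sketched here.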

Or what if you could give it even more info, and record yourself/another
singing (isolated) over the track? Then it would have information about what
phonemes it should expect to find and remove (and whatever effects like reverb
are applied to them).

------
BeeBoBub
I am working on a product that makes use of this technology: I generate vocal
pitch visualizations for karaoke.

[http://pitchperfected.io](http://pitchperfected.io)

~~~
noja
Cool - but your website needs some work. It looks like a landing page to
gather interest rather than something backed by a real product. Show us some
videos and singing, before and after, etc.

------
gdsdfe
Wait, the implications of this are huge for electronic music DJs.

~~~
exikyut
The audio (^F soundcloud) sounds a little warbly... if that can be largely
mitigated, then yes, remixes will never be the same

~~~
yowlingcat
While not great, the phase smearing is orders of magnitude better than most
vocal isolation plugins I've used. I only expect it to get better. Very cool!

------
cma
Is there anything like this for images? Meaning essentially trying to
decompose back into photoshop layers. Wouldn't be feasible for lots of stuff
that is completely opaquely covering something, but I'm thinking for things
like recoloring a screen print, etc.

~~~
exikyut
I have no idea how I managed to re-find these, but I did. Two recent
moderately-related/relevant posts:

[https://news.ycombinator.com/item?id=20978055](https://news.ycombinator.com/item?id=20978055)

[https://news.ycombinator.com/item?id=21220458](https://news.ycombinator.com/item?id=21220458)

------
mettamage
Ah, so this should've been the answer to my ask HN [1].

;-)

Edit: I see someone added it in as an answer 14 hours ago. Well, you have my
vote ^^

[1]
[https://news.ycombinator.com/item?id=21399838](https://news.ycombinator.com/item?id=21399838)

------
orloffm
Played with it. The quality of the result mostly depends on the amount of
clipping in the source file. Basically, all post-90s masters produce weird
results, with orcs singing in the background, while classics from the 60s
yield fantastic results.

~~~
cantankerous
Might be the training data?

------
strags
This is awesome. I now have Guns'n'Roses playing in my office, and Axl Rose is
a faint voice coming from the garage.

------
murat124
I gave it a try with Megadeth's Holy Wars [1]; I was expecting something like
this [2] but got very deep audio instead. Not sure why, but perhaps it's
because bassist David Ellefson uses a pick, which gives a percussive sound
that suits Megadeth.

Any parameter I could use with spleeter to get a similar output?

[1]
[https://www.youtube.com/watch?v=9d4ui9q7eDM](https://www.youtube.com/watch?v=9d4ui9q7eDM)

[2]
[https://www.youtube.com/watch?v=uWkykQHsJ-Y](https://www.youtube.com/watch?v=uWkykQHsJ-Y)

------
mothsonasloth
I'm trying to find something to generate tracks without guitar. Then I can
cover them with my guitar. Will this software help me?

~~~
ksherlock
Depends? It splits the audio into 2 parts (vocals, everything else), 4 parts
(drums, bass, vocals, everything else), or 5 parts (drums, bass, vocals,
piano, everything else). Piano isolation is the weakest.

If you want to play along to drum + bass, then yes.

------
sheinsheish
Can we use sample libraries to write, record and simulate desired stems for
training? I guess the more naturally played the better?

------
theLotusGambit
Not only are the results good, but the stems are generated decently quickly.
The implications are clear: whoever wants to make a quick fortune on YouTube
should start converting and uploading truckloads of songs as fast as
possible. The demand is there. I could easily see that bringing in millions
of views.

~~~
ErikAugust
They’d still get tagged for copyright.

------
jknz
On iOS, Chord AI [1] gives pretty good results for the guitar chords of any
music playing around the phone.

[1]: [https://apps.apple.com/us/app/chord-
ai/id1446177109](https://apps.apple.com/us/app/chord-ai/id1446177109)

~~~
ohlookabird
That sounds pretty nice! Does anyone know of an Android version? I just
checked Yamaha Chord Tracker and MyChord, but neither seems to be able to use
the microphone.

------
toptal
Can someone provide a demo link of source music vs. output?

~~~
tomrod
I gave it a test using the project audio sample. Neat stuff.
[https://soundcloud.com/thomas-
roderick-836298141/sets/spleet...](https://soundcloud.com/thomas-
roderick-836298141/sets/spleeter_splits)

~~~
semiotagonal
Holy shit, that works way better than I expected. The github project should
link to this or a similar example; the technical description doesn't do it
justice.

~~~
unlinked_dll
I really disagree. It sounds... awful. On par with other approaches, sure,
but the main vocal sounds like a case study in digital artifacts, and the
accompaniment sounds like there's a filter automated over the track.

Far from useful.

~~~
semiotagonal
Do you know of something superior?

------
amelius
> Spleeter is the Deezer source separation library with pretrained models

Curious, what is Deezer using this for?

~~~
neohaven
Gonna guess beat/mood/song analysis.

~~~
amelius
Yeah but they can use the raw song for that I suppose.

~~~
cbHXBY1D
Easier way to match voices?

------
tomrod
This is so neat! I went looking a few months back for something like this, and
the best I found was Google's Magenta.

It would be really cool to use this to feed into Magenta. Think of the
mashups!

------
esfandia
Karaoke with the most obscure songs!

~~~
xamuel
And even better, karaoke that doesn't suck--most karaoke tracks are covers by
cheap bands and you can clearly tell the inferior quality if you're familiar
with the song.

~~~
megablast
Sure, not to mention the awful singing accompanying most karaoke tracks.

------
hodder
Very cool. A close friend of mine (and the lead singer in our band many years
ago) recently died, and we have a couple of great recordings of his vocals
from two decades ago. When we recorded the rest of the instruments, they were
DI'd into a Boss BR8. The vocals sound awesome, but the guitar and drums are
recorded poorly. This may give us a chance to split the vocals out of the
final tracks and re-record the rest as a tribute.

Much appreciated.

------
antman
How could we extract everything but the voice, e.g. for karaoke?

~~~
aasasd
I'd guess: extract the voice and then subtract it from the rest with
something like Audacity. I'm not sure which operation would do that, but I
believe it exists.

Also, other comments here speak of “separating into the voice and
accompaniment,” so maybe the model/program already does exactly what you
need.

~~~
marcan_42
Invert and mix.

When you have an instrumental version of a song (from the same stems as the
vocal version) this is already one way to get the vocals out without any fancy
machine learning. The main tricks besides what you can do in Audacity like
that are properly time-aligning the tracks (even if they drift a bit) and
compensating for phase issues and compression. I wrote a dirty tool that does
that and I've been meaning to turn it into some kind of nicer GUI version.
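With digital files, "invert and mix" is just sample-wise subtraction. A
minimal sketch, assuming the two tracks are already sample-aligned (which, as
noted, is the hard part):

```python
import numpy as np


def invert_and_mix(full_mix, instrumental):
    """Mix the full track with a polarity-inverted instrumental: whatever
    the two versions share cancels out, leaving (ideally) the vocal."""
    n = min(len(full_mix), len(instrumental))  # naive: just trim the tails
    return full_mix[:n] + (-instrumental[:n])  # invert, then sum
```

On ideal data the cancellation is exact; in practice mastering compression,
lossy encoding, and clock drift break it, hence the time-alignment and phase
compensation described above.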

~~~
navjack27
I've been doing something like this for a bit in Audition. Center channel
extract > invert phase > save as wav > create multi-track project > add
original > add modified > up the volume on the vocal extracted modified
version until the vocals go away

Or you can do the exact opposite and instead of center channel extracting the
vocals you can remove the vocals and use this method to better isolate vocals.

Although if things do fun stuff with stereo it might not be exact.

------
fdej
Has anyone tested this with Glenn Gould recordings?

~~~
jacquesm
Hehe, you want to split his singing and humming into a separate track?

------
jefftk
I have a large number of multitrack recordings of contra dances if anyone
wants to try training this on them.

~~~
exikyut
_You_ train this on them! (And then put the results on SoundCloud or YouTube.)

------
dgreensp
Are there any examples I can listen to?

~~~
tomo-makes
I set up a Colab notebook to try spleeter out for myself. You can try picking
your favorite mp3, renaming it to "audio_sample.mp3", uploading it to the
Colab, and running all the cells in the notebook. Enjoy.

[https://colab.research.google.com/gist/tomo-
makes/33b9bc7f22...](https://colab.research.google.com/gist/tomo-
makes/33b9bc7f22e257468ff9bbc7122a6e82/spleeter-multi-source-separation-
demo.ipynb)

~~~
sillysaurusx
This is great. Thank you! I extended the notebook with an example of
downloading a youtube video, extracting the audio, then feeding it through
spleeter.

Notebook:
[https://colab.research.google.com/gist/shawwn/0f286f5d4bc22e...](https://colab.research.google.com/gist/shawwn/0f286f5d4bc22e04d745657ba3ccad26/spleeter-
multi-source-separation-demo.ipynb)

~~~
tomo-makes
Cool! That's so handy.

------
yoogidoky
The long story about Spleeter : [https://deezer.io/releasing-spleeter-deezer-
r-d-source-separ...](https://deezer.io/releasing-spleeter-deezer-r-d-source-
separation-engine-2b88985e797e)

------
nagateja_1995
I was just testing it out on Ghazals, and it seems to work perfectly, but it
seems to fail with Qawwalis. Based on my understanding of how this works, is
it safe to assume that the training data from Deezer lacks enough examples of
Qawwalis?

------
seventhtiger
Aren't all the most popular audio formats lossy? Extracting full data from
lossy compression requires reconstruction. Even if they are able to completely
extract all tracks they would have gaps and be very low quality.

~~~
ZeikJT
If you want lossless audio in a decently popular format, go looking for FLAC;
you'll definitely find some. Even Bandcamp uses it [1].

[1] -
[https://en.wikipedia.org/wiki/FLAC#Adoption_and_implementati...](https://en.wikipedia.org/wiki/FLAC#Adoption_and_implementations)

------
xiphmont
Just tried it on _Meet The Sniper_. Disappointment :-(

[This was an unreasonable test, and it did really well considering the likely
training set. I bet it could do much better with better data. Still... man, I
was so hoping for magic.]

------
lebuffon
Has someone in the Intelligence community approached the author? Oops that's
classified. :)

This has implications for extracting sounds from noisy recordings or am I off
base? Does it only track pitch patterns?

~~~
jononor
Source separation of speech and speech denoising is a well established field,
more researched in general than music source separation. Intelligence officers
very likely have access to a range of well-performing ML tools for extracting
speech.

------
haywirez
Is there a known approach that attempts to separate all distinct sounds
(timbres rather than pitch) in a track? Specifically targeted at electronic
music, not standard acoustic ensembles.

~~~
dancek
That's an interesting idea. Many instruments have greatly varying timbres,
though. Combining the timbres back to instruments would require another level
of processing.

------
jMyles
Somewhat of a tangent, but does anyone have a recommendation of an open source
(ideally python) program that can make MIDI from piano audio?

~~~
jacquesm
I've been working on this for the last 3 months.

Can you send me your audio file? I'll send you back the midi I can generate;
it won't be perfect but it might be usable. See comment elsewhere in this
thread.

~~~
jMyles
Awesome! Is your project open source? Can't wait to see it.

Here's the first (but by no means only) file I have in mind:

[http://ocrmirror.org/files/music/albums/ff7/MP3/3-02%20Shnab...](http://ocrmirror.org/files/music/albums/ff7/MP3/3-02%20Shnabubula%20-%20Stone%20Eyes%20\(The%20Great%20Warrior\).mp3)

Stone Eyes from the Final Fantasy VII tribute album Voices of the Lifestream.

------
odiroot
Well, can it extract the bass track from "...And Justice for All"?

------
amelius
Can this also separate the backing vocals from the lead singer?

------
marsknight
Thank you!! This is really amazing :)

------
unlinked_dll
demo links would be helpful

------
jsilence
Karaoke everything!

------
foobaw
This is amazing - so much possible learning for aspiring producers.

------
nycjobsboard
Cool stuff

------
propercoil
Pytorch > TensorFlow

