An interesting alternative approach to instrument sound separation is to use a fused audio-visual model. If you also have video of the instruments being played, the visual information can help the model perform the separation with higher fidelity.
I was fascinated by the work done by “The Sound of Pixels” project at MIT.
http://sound-of-pixels.csail.mit.edu/