That is one way to do it. I wonder if you couldn't apply the effects to the tracks themselves.
Take tracks A and B. If the final audio track is composited as λ(αA+βB), perhaps you could save λ(αA) and λ(βB) as separate tracks in the file to be composited as λ(αA)+λ(βB). That assumes λ is linear, but you can likely cheat a bit to make it work on non-linear functions as well.
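A minimal sketch of that linearity argument (assuming λ is a linear effect such as an FIR filter, with made-up signals and gains just for illustration): applying the effect per track and summing at playback gives the same result as applying it to the composite.

```python
import numpy as np

# Hypothetical tracks and gains (names are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal(48000)   # track A, 1 s at 48 kHz
B = rng.standard_normal(48000)   # track B
alpha, beta = 0.7, 0.3

# Assume the effect λ is linear, e.g. an FIR filter applied by convolution.
fir = np.array([0.25, 0.5, 0.25])
def effect(x):
    return np.convolve(x, fir, mode="same")

# Master-side: apply the effect to the composite λ(αA + βB).
master = effect(alpha * A + beta * B)

# Player-side: ship λ(αA) and λ(βB) as separate tracks, sum on playback.
player = effect(alpha * A) + effect(beta * B)

# For a linear effect the two agree up to floating-point error.
print(np.allclose(master, player))  # True
```

For a genuinely non-linear λ (compression, clipping) the equality breaks, which is where the "cheating" would have to come in.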
But once you start going down that route, you need to assume that the player will mix those perfectly, and looking at history, I wouldn't take that for granted. Some shader compilers are buggy; imagine what would happen on low-power hardware. I guess it can only be done reliably if you control the player as well.