Hi, I'm Jesse, one of the authors. Thanks for the interesting questions! - In ter...

unlinked_dll · on Jan 16, 2020

Thanks for the reply! This work is fascinating, and while I'm not a Python guy, I'm going to play with your library a bunch.

I do think you should investigate comparisons to adaptive FIRs much more. Adaptive filtering is critical to the design of low-power medical devices like hearing aids, which need feedback reduction, echo cancellation, and the like with minimal filter orders.
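For anyone less familiar with adaptive FIRs, here is a minimal sketch of the classic LMS (least-mean-squares) update, framed as echo cancellation: the filter learns a hypothetical 4-tap "echo path" from the far-end signal `x` and the microphone signal `d`. The function name, step size `mu`, and echo path are illustrative assumptions. Real hearing-aid or echo-canceller designs typically use normalized LMS or other variants, with much more care around step size and double-talk.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filtered x tracks the desired signal d.

    In echo cancellation, x is the far-end (loudspeaker) signal, d is the
    microphone signal, and the returned error is the echo-reduced output.
    Returns (weights, errors).
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        history = [xn] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # current filter output
        e = dn - y  # estimation error
        # LMS stochastic-gradient update: w <- w + mu * e * x_history
        w = [wi + mu * e * hi for wi, hi in zip(w, history)]
        errors.append(e)
    return w, errors


# Hypothetical "echo path": an unknown 4-tap FIR the adaptive filter must learn.
true_path = [0.5, -0.3, 0.2, 0.1]
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(len(true_path)) if n - k >= 0)
     for n in range(len(x))]

w, e = lms_filter(x, d, num_taps=4, mu=0.05)
print([round(wi, 3) for wi in w])  # should be close to true_path
```

The filter order here is fixed at 4 taps; keeping that order minimal is exactly the constraint that matters for low-power devices.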

My question on correlated parameters was a bit more abstract. In the design of classical audio signal processors for creative applications, you often find that the user-space parameters are correlated; they map to design-space parameters that are even more correlated, and down to implementation-level parameters that are more correlated still. For example, in a filter designed by frequency sampling, adjacent FFT bins are highly correlated in their I/O, and I was curious whether you helped the optimization by reparameterizing with a DCT or a similar transform, as you'd find when calculating MFCCs and the like. This is why it's really tough to design ML approaches for creative signal processing that beat traditional methods: humans learn and adapt to correlations very quickly, machines not so much when dealing with oscillation and ripple. Many local extrema in the parameter space and all that.
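A rough sketch of the reparameterization idea, under assumptions of my own: a handful of DCT coefficients (the parameters being optimized) are expanded by an orthonormal inverse DCT into per-bin amplitudes, which then feed a standard Type I linear-phase frequency-sampling FIR design. Keeping only a few low-order coefficients forces adjacent bins to move together, similar in spirit to how MFCCs keep only the low-order DCT terms. All function names and example values are hypothetical; this is not taken from the library.

```python
import math


def dct_ortho(values):
    """Orthonormal DCT-II: n samples -> n coefficients."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                                  for i, v in enumerate(values)))
    return coeffs


def idct_ortho(coeffs, n):
    """Orthonormal inverse DCT (DCT-III): K <= n coefficients -> n samples.

    Missing higher-order coefficients are treated as zero, so a few
    low-order parameters describe a smooth curve over all n bins.
    """
    out = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, len(coeffs)):
            total += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
        out.append(total)
    return out


def frequency_sampling_fir(amplitudes):
    """Type I linear-phase FIR of odd length N = 2M - 1 whose amplitude
    response equals amplitudes[k] at w_k = 2*pi*k/N, for k = 0..M-1."""
    m = len(amplitudes)
    n_taps = 2 * m - 1
    alpha = (n_taps - 1) / 2.0  # group delay in samples
    taps = []
    for n in range(n_taps):
        acc = amplitudes[0]
        for k in range(1, m):
            acc += 2.0 * amplitudes[k] * math.cos(2.0 * math.pi * k * (n - alpha) / n_taps)
        taps.append(acc / n_taps)
    return taps


# Hypothetical example: 4 DCT parameters describe a smooth curve over 32 bins,
# instead of 32 independent (and highly correlated) per-bin parameters.
params = [4.0, 1.5, -0.5, 0.2]
bins = idct_ortho(params, 32)
taps = frequency_sampling_fir(bins)
print(len(taps))  # 63
```

An optimizer would then update `params` (4 numbers) rather than 32 correlated bin values; since the inverse DCT is linear, gradients pass through it cleanly.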

AstralStorm · on Jan 16, 2020

Adaptive IIRs would be more interesting, as automatically controlling and designing those filters in a stable way is rather hard, and they're both differentiable and power-efficient. This applies especially to anything other than a biquad series, and especially because such filters have recursion-related computational noise, which the ANN should be able to optimize out.
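One common trick for keeping a learned IIR stable (not something described in this thread) is to parameterize each biquad by its pole pair in polar form and squash the radius into (0, 1), so no value an optimizer produces can push a pole outside the unit circle. A minimal sketch with hypothetical names; higher-order filters would cascade several such sections, and this alone says nothing about artifacts from fast coefficient changes.

```python
import math


def stable_biquad_denominator(raw_radius, raw_angle):
    """Map unconstrained real parameters to biquad feedback coefficients.

    The pole pair is r * exp(+-j*theta) with r = sigmoid(raw_radius) in (0, 1)
    and theta = pi * sigmoid(raw_angle) in (0, pi), so the filter is stable
    for any raw values. Returns (a1, a2) for 1 + a1 z^-1 + a2 z^-2.
    """
    r = 1.0 / (1.0 + math.exp(-raw_radius))
    theta = math.pi / (1.0 + math.exp(-raw_angle))
    a1 = -2.0 * r * math.cos(theta)
    a2 = r * r
    return a1, a2


def pole_magnitudes(a1, a2):
    """Magnitudes of the roots of z^2 + a1 z + a2."""
    sq = complex(a1 * a1 - 4.0 * a2) ** 0.5  # principal complex square root
    return abs((-a1 + sq) / 2.0), abs((-a1 - sq) / 2.0)


for raw in [(-3.0, 0.0), (0.0, 2.0), (8.0, -5.0)]:
    a1, a2 = stable_biquad_denominator(*raw)
    print(raw, [round(mag, 4) for mag in pole_magnitudes(a1, a2)])
```

Both pole magnitudes equal `r`, so even an extreme raw value like 8.0 only pushes the poles close to, never onto or past, the unit circle.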

hcrisp · on Jan 16, 2020

If the FIR filters are time-variant, what determines the values of the kernel at any given moment? Is it the prior values of the signal being generated, or something else?

For IIR filters, I imagine the network would have to be recurrent. If you train the weights with labeled data, does that amount to a different mechanism for designing a filter, sort of like exchanging the bilinear transform for a data-driven approach? The former seems like forward engineering from first principles, whereas the latter seems like reverse engineering from data that exhibits the transformations you want the filter to encode. Fascinating stuff.
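To make the contrast concrete, here is a small sketch under simplifying assumptions: a one-pole lowpass designed from first principles with the bilinear transform, then the same coefficients recovered purely from input/output data. The data-driven side is plain least-squares system identification, swapped in as the simplest stand-in for training a network; it only works this cleanly because the data is noiseless and the model structure matches exactly. Function names are hypothetical.

```python
import math
import random


def bilinear_one_pole_lowpass(fc, fs):
    """First-principles design: analog H(s) = wc / (s + wc), discretized with
    the bilinear transform (cutoff prewarped). Returns (b0, b1, a1) for
    y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    k = math.tan(math.pi * fc / fs)
    return k / (1.0 + k), k / (1.0 + k), (k - 1.0) / (k + 1.0)


def run_filter(coeffs, x):
    """Apply the first-order difference equation to the list x."""
    b0, b1, a1 = coeffs
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    sol = [0.0] * 3
    for r in range(2, -1, -1):
        sol[r] = (m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))) / m[r][r]
    return sol


def fit_one_pole(x, y):
    """Data-driven alternative: least-squares fit of (b0, b1, a1) from
    input/output pairs, via the normal equations."""
    rows = [[x[n], x[n - 1], -y[n - 1]] for n in range(1, len(x))]
    targets = y[1:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve3(ata, atb)


designed = bilinear_one_pole_lowpass(fc=1000.0, fs=48000.0)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = run_filter(designed, x)  # "labeled data" produced by the target filter
learned = fit_one_pole(x, y)
print([round(c, 6) for c in designed])
print([round(c, 6) for c in learned])
```

With noisy measurements or a mismatched model order, the data-driven fit would only approximate the designed filter, which is where the two approaches start to genuinely differ.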

unlinked_dll · on Jan 16, 2020

I imagine the authors keep filter order as a design parameter, since dynamically changing order is a rather tricky proposition.

For IIRs, the approach (imo) would be to use discretized analog biquads (such as an SVF topology, via Zavalishin's topology-preserving transform, TPT), which are designed to handle the time-variant issues with state evolution. From there, your model wouldn't synthesize filter coefficients directly, but rather either the design parameters of the filter or the output gains of multimode filters built from cascaded biquad sections. A good example of this in practice is the analog Oberheim "pole mixing" filter, or the Shruthi's variation on it: a 4th-order filter that achieves a wide array of frequency responses by mixing the outputs of each stage. Those mixer gains can be varied for very cool results.
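A minimal sketch of a TPT state-variable filter, following my understanding of Zavalishin's zero-delay-feedback formulation, with a per-sample cutoff and output mixing gains `mix = (g_lp, g_bp, g_hp)` standing in for what a model might predict instead of raw coefficients. The function name and parameters are hypothetical, and cutoffs must stay below fs/2. The Oberheim-style pole-mixing variant would instead cascade four one-pole stages and mix their outputs.

```python
import math
import random


def tpt_svf(x, cutoffs, q, fs, mix):
    """State-variable filter with trapezoidal (TPT) integrators.

    cutoffs: per-sample cutoff in Hz (may change every sample, must be < fs/2).
    mix: (g_lp, g_bp, g_hp) output gains, the kind of quantities a model
         could predict instead of raw filter coefficients.
    """
    k = 1.0 / q          # damping, 2R in Zavalishin's notation
    s1 = s2 = 0.0        # integrator states
    out = []
    for xn, fc in zip(x, cutoffs):
        g = math.tan(math.pi * fc / fs)  # prewarped integrator gain
        # Solve the instantaneous feedback loop for the highpass output.
        hp = (xn - (k + g) * s1 - s2) / (1.0 + k * g + g * g)
        v1 = g * hp
        bp = v1 + s1
        s1 = bp + v1     # update first integrator state
        v2 = g * bp
        lp = v2 + s2
        s2 = lp + v2     # update second integrator state
        out.append(mix[0] * lp + mix[1] * bp + mix[2] * hp)
    return out


fs = 48000.0
n = 4800  # 0.1 s
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
sweep = [200.0 * 40.0 ** (i / n) for i in range(n)]  # hypothetical 200 Hz -> ~8 kHz sweep
# lp + hp = x - k*bp, i.e. a notch that follows the sweeping cutoff.
notch = tpt_svf(noise, sweep, q=2.0, fs=fs, mix=(1.0, 0.0, 1.0))
```

Because the states are integrator outputs rather than direct-form delays, changing `g` every sample doesn't suddenly rescale the stored energy, which is the property that makes this structure friendly to modulation.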

jessejengel · on Jan 16, 2020

Hi, yup, that's true: we keep the filter order fixed. For the experiments in the paper, the time-varying coefficients are generated by a neural network that is trained end-to-end to generate audio like the training set (conditioned on high-level controls such as pitch and loudness).
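A minimal sketch of frame-wise time-varying FIR filtering: each block of `hop` samples is convolved with its own kernel and the tails are overlap-added. In the setup described above the kernels would come from the network once per frame; here they are simply given, and the names (`kernels`, `hop`, `time_varying_fir`) are hypothetical. The paper's implementation may differ, for example by designing kernels in the frequency domain and using FFT convolution.

```python
def time_varying_fir(x, kernels, hop):
    """Filter x with a different FIR kernel for each block of `hop` samples.

    kernels[f] is the impulse response used for input block f. Each block is
    convolved with its own kernel and the overlapping tails are summed
    (overlap-add). Output is truncated to len(x).
    """
    if len(kernels) * hop < len(x):
        raise ValueError("need one kernel per block of input")
    kernel_len = max(len(kern) for kern in kernels)
    y = [0.0] * (len(x) + kernel_len - 1)
    for f, kernel in enumerate(kernels):
        start = f * hop
        block = x[start:start + hop]
        for i, xv in enumerate(block):
            for j, h in enumerate(kernel):
                y[start + i + j] += xv * h
    return y[:len(x)]


# Hypothetical per-frame kernels, e.g. produced by a network once per frame:
# a pure gain of 1.0 for the first 2 samples, then 2.0 for the next 2.
print(time_varying_fir([1.0, 1.0, 1.0, 1.0], [[1.0], [2.0]], hop=2))
# [1.0, 1.0, 2.0, 2.0]
```

If every frame gets the same kernel, this reduces to ordinary convolution, so the order (kernel length) stays fixed while only the coefficients vary over time.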

I agree that IIRs are a great avenue for future study, also with time-varying coefficients. I've played around with them a bit, but they are harder to train efficiently with current autodiff software and GPUs/TPUs. I think they may require writing a custom CUDA kernel, but I'm hopeful about things like JAX's scan operation.
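A sketch of why IIRs fit awkwardly on accelerators: every output depends on the previous state, so the time loop is inherently sequential. The code below uses a pure-Python `scan` with the same contract as `jax.lax.scan` (`step(carry, x) -> (carry, y)`) so it runs without JAX; porting it by swapping in `jax.lax.scan` with array inputs is an assumption on my part, and the helper names are hypothetical.

```python
def scan(step, init, xs):
    """Pure-Python stand-in with the same contract as jax.lax.scan:
    step(carry, x) -> (new_carry, y); returns (final_carry, list_of_ys)."""
    carry = init
    ys = []
    for x in xs:
        carry, y = step(carry, x)
        ys.append(y)
    return carry, ys


def biquad_step(state, inputs):
    """One sample of a transposed direct-form II biquad whose coefficients
    may change every sample. state = (z1, z2); inputs = (x, b0, b1, b2, a1, a2)."""
    z1, z2 = state
    x, b0, b1, b2, a1, a2 = inputs
    y = b0 * x + z1
    z1 = b1 * x - a1 * y + z2
    z2 = b2 * x - a2 * y
    return (z1, z2), y


def time_varying_biquad(x, coeffs):
    """coeffs[n] = (b0, b1, b2, a1, a2) used at sample n."""
    xs = [(xn,) + tuple(c) for xn, c in zip(x, coeffs)]
    _, ys = scan(biquad_step, (0.0, 0.0), xs)
    return ys


# Impulse response of a one-pole section (a1 = -0.5): 0.5 ** n.
print(time_varying_biquad([1.0, 0.0, 0.0, 0.0], [(1.0, 0.0, 0.0, -0.5, 0.0)] * 4))
# [1.0, 0.5, 0.25, 0.125]
```

Each step is only a handful of multiply-adds, so on a GPU the cost is dominated by the sequential dependency across samples rather than arithmetic, which is the efficiency problem described above.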