
Hi I'm Jesse, one of the authors, thanks for the interesting questions!

- In terms of the FIRs, I think you can think of this as a more general, nonlinear form of filter modeling. The difference, I think, is that you can have a filter as one of several components and adapt them all jointly to achieve some task (which itself can be defined more flexibly: different losses, adversarial, etc.). The filter itself is still just LTV-FIR, but it's being controlled nonlinearly. We have only examined synthesis so far, but other signal processing problems like denoising are definitely good directions. The "effects" processors are designed for this.

- It's true that neural networks often learn correlated parameters, but it is usually of less significance because they operate in an overparameterized "interpolative" regime, which a lot of interesting ongoing research is trying to understand.

- We didn't do a quantitative comparison, but in general the tradeoffs will be different. Dereverberation by a modular generative model will only sound as good as the generative model itself, so artifacts will be from not modeling the source properly. However, if you learn a good model, the dereverberation should be essentially perfect (you can losslessly apply different reverb), although that's a big if.




Thanks for the reply! This work is fascinating, and while I'm not a Python guy I'm going to play with your library a bunch.

I do think you should investigate comparisons to adaptive FIRs much more. This field is critical to the design of low power medical devices like hearing aids, which need feedback reduction, echo cancellation, and the like with minimal filter orders.
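For readers who haven't met adaptive FIRs, here is a minimal sketch of the classic LMS update used for things like echo-path identification. It is not from the paper or the library; the `echo_path` taps, the step size `mu`, and the function names are made up for illustration.

```python
import random

def fir(x, h):
    """Plain FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that filtering x tracks the desired signal d."""
    w = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent input sample first
    errors = []
    for x_n, d_n in zip(x, d):
        history = [x_n] + history[:-1]
        y_n = sum(wi * hi for wi, hi in zip(w, history))
        e_n = d_n - y_n                 # error between desired and filter output
        # LMS update: step along the negative gradient of the squared error
        w = [wi + mu * e_n * hi for wi, hi in zip(w, history)]
        errors.append(e_n)
    return w, errors

random.seed(0)
echo_path = [0.5, -0.3, 0.1]            # hypothetical unknown system
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = fir(x, echo_path)                   # noise-free "microphone" signal
w, errors = lms_identify(x, d, num_taps=3, mu=0.1)
```

With a noise-free target and a filter of matching order, `w` should settle on `echo_path`; the interesting comparisons are in the regime where the order is constrained and the signals are noisy.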

My question on correlated parameters was a bit more abstract. Often in the design of classical audio signal processors for creative applications you find that the user space parameters can be correlated, which map to more design space parameters that are even more correlated, and down to implementation level parameters which are even more correlated. For example, in a filter designed by frequency sampling, the adjacent bins of an FFT are highly correlated in their I/O, and I was curious if you optimized a bit by taking a DCT or similar approach for reparameterization, like you'd find in calculating MFCCs and the like. It's really tough to design ML approaches for creative signal processing that are better than traditional methods because of this: humans learn and adapt to correlations very quickly, machines not so much when dealing with oscillation and ripple. Many local extrema in the parameter space and all that.
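To make the reparameterization idea concrete, here is a rough sketch of describing a frequency-sampled log-magnitude response with a handful of DCT coefficients, in the spirit of MFCCs. The target curve, the bin count, and the number of kept coefficients are arbitrary assumptions; nothing here is taken from the paper.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a list of per-bin values."""
    n_bins = len(x)
    out = []
    for k in range(n_bins):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n_bins)
        out.append(scale * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_bins)
                               for n in range(n_bins)))
    return out

def idct2(c, n_bins):
    """Inverse of dct2; c may be truncated (missing coefficients count as zero)."""
    out = []
    for n in range(n_bins):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n_bins) * c[k]
                       * math.cos(math.pi * (n + 0.5) * k / n_bins)
                       for k in range(len(c))))
    return out

n_bins = 32
# Hypothetical smooth target: a gentle low-pass tilt in log magnitude.
target_log_mag = [-3.0 * (n / (n_bins - 1)) ** 2 for n in range(n_bins)]

coeffs = dct2(target_log_mag)
compact = coeffs[:6]                       # optimize these instead of 32 bins
approx_log_mag = idct2(compact, n_bins)    # smooth curve back on the bin grid
magnitudes = [math.exp(v) for v in approx_log_mag]
max_err = max(abs(a - b) for a, b in zip(approx_log_mag, target_log_mag))
```

An optimizer that adjusts `compact` moves many adjacent bins together, so the correlation between neighbouring bins is built into the parameterization rather than something to be learned.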


Adaptive IIR would be more interesting, as automatically controlling and designing those filters in a stable way is rather hard, and they're both differentiable and power efficient. That goes especially for anything that is not a biquad series, and because they have recursion-related computational noise, which the ANN should be able to optimize out.


If the FIR filters are time variant, what determines the values of the kernel at any given moment - is it the prior values of the signal being generated or something else?

For IIR filters, I imagine the network would have to be recurrent. If you train the weights with labeled data, does that amount to a different mechanism for designing a filter, sort of like exchanging bilinear transformation for a data-driven approach? The former seems like forward engineering from first principles, whereas the latter seems like reverse engineering from data that exhibits transformations that you want the filter to encode. Fascinating stuff.


I imagine the authors keep filter order as a design parameter, since dynamically changing order is a rather tricky proposition.

For IIRs the approach (imo) would be to use transformed analog biquads (such as an SVF topology, via the TPT/Zavalishin's method) which is designed to handle the time variant issues with state evolution. From there your model wouldn't synthesize filter coefficients directly but either the design parameters of the filter or the gains of the outputs or multi mode filters built from cascaded biquad sections. A good example of this in practice is the Oberheim "pole mixing" filter (analog) or the Shruthi's variation on it, which is a 4th order filter that achieves a wide array of frequency responses by mixing the outputs of each stage. Those gains in the mixer can be varied for very cool results.
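A rough sketch of that idea: a trapezoidal (TPT-style) state variable filter where the per-sample cutoff and the output mix gains are the things a model would drive, rather than raw coefficients. This uses a single 2nd order SVF and mixes its low/band/high outputs, which is a simplification of the 4-stage pole-mixing designs mentioned above; the function name, parameters, and test signal are illustrative assumptions.

```python
import math

def svf_mixed(x, cutoff_hz, resonance_k, mix, sample_rate=48000.0):
    """Trapezoidal SVF with per-sample cutoff and per-sample mode mixing.

    x:           input samples
    cutoff_hz:   one cutoff frequency per sample (the time-varying control)
    resonance_k: damping, k = 1/Q
    mix:         one (low, band, high) gain triple per sample
    """
    ic1eq = 0.0   # integrator states
    ic2eq = 0.0
    out = []
    for v0, fc, (m_low, m_band, m_high) in zip(x, cutoff_hz, mix):
        g = math.tan(math.pi * fc / sample_rate)   # prewarped integrator gain
        k = resonance_k
        a1 = 1.0 / (1.0 + g * (g + k))
        a2 = g * a1
        a3 = g * a2
        v3 = v0 - ic2eq
        v1 = a1 * ic1eq + a2 * v3                  # band-pass output
        v2 = ic2eq + a2 * ic1eq + a3 * v3          # low-pass output
        ic1eq = 2.0 * v1 - ic1eq                   # trapezoidal state update
        ic2eq = 2.0 * v2 - ic2eq
        low, band, high = v2, v1, v0 - k * v1 - v2
        out.append(m_low * low + m_band * band + m_high * high)
    return out

n = 4800
step = [1.0] * n
sweep = [200.0 + 1800.0 * i / (n - 1) for i in range(n)]  # 200 Hz -> 2 kHz
lowpass_only = [(1.0, 0.0, 0.0)] * n
y = svf_mixed(step, sweep, math.sqrt(2.0), lowpass_only)
```

The controls here (cutoff, damping, mix gains) are the design parameters a model could output per sample, while the state update takes care of the time-varying behavior.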


Hi, yup that's true, we keep the filter order fixed. For the experiments in the paper, the time-varying coefficients are generated by a neural network that is trained end-to-end to generate audio like the training set (conditioned on high-level controls such as pitch and loudness).
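As a toy illustration of what "time-varying coefficients at a fixed order" means, the sketch below applies one FIR kernel per frame and overlap-adds the results. It is a plain time-domain sketch, not the library's implementation, and `toy_controller` is a hypothetical stand-in for the neural network.

```python
def convolve(x, h):
    """Full linear convolution of two lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def ltv_fir(signal, kernels, frame_size):
    """Filter each frame with its own fixed-order kernel, then overlap-add."""
    order = len(kernels[0])
    out = [0.0] * (len(signal) + order - 1)
    for f, h in enumerate(kernels):
        start = f * frame_size
        frame = signal[start:start + frame_size]
        if not frame:
            break
        for i, v in enumerate(convolve(frame, h)):
            out[start + i] += v          # the tail spills into the next frame
    return out

def toy_controller(loudness, order=4):
    """Hypothetical stand-in for the network: control value in [0, 1] -> kernel.

    Louder frames get closer to pass-through, quieter ones closer to a
    moving average (duller sound).
    """
    identity = [1.0] + [0.0] * (order - 1)
    average = [1.0 / order] * order
    return [loudness * i + (1.0 - loudness) * a
            for i, a in zip(identity, average)]

signal = [float((n % 8) - 4) for n in range(32)]
loudness_per_frame = [0.0, 0.5, 1.0, 0.25]       # one control value per frame
kernels = [toy_controller(l) for l in loudness_per_frame]
filtered = ltv_fir(signal, kernels, frame_size=8)
```

In this sketch the kernel depends only on the per-frame control input, not on the samples already generated.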

I agree that IIRs are a great avenue of future study, also with time-varying coefficients. I've played around a bit with them, but they are harder to train efficiently with current autodiff software and GPUs/TPUs. I think they may require writing a custom CUDA kernel, but I'm hopeful for things like JAX's scan operation.
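To show why the recursion is the awkward part, here is a plain-Python sketch of a time-varying one-pole IIR written as a scan. The `scan` helper below is a stand-in that only mirrors the `(carry, x) -> (carry, y)` step contract of `jax.lax.scan`; it is not JAX itself, and the filter and coefficient values are illustrative assumptions.

```python
def scan(step, init, xs):
    """Sequentially thread a carry through xs, collecting one output per step."""
    carry = init
    ys = []
    for x in xs:
        carry, y = step(carry, x)
        ys.append(y)
    return carry, ys

def one_pole_step(prev_y, inputs):
    """y[n] = (1 - a[n]) * x[n] + a[n] * y[n-1], with a[n] varying per sample."""
    x_n, a_n = inputs
    y_n = (1.0 - a_n) * x_n + a_n * prev_y
    return y_n, y_n                      # new carry, emitted output

x = [1.0, 1.0, 1.0, 1.0]
a = [0.5, 0.5, 0.5, 0.5]                 # would come from a network in practice
final_state, y = scan(one_pole_step, 0.0, list(zip(x, a)))
# y == [0.5, 0.75, 0.875, 0.9375]
```

Each output depends on the previous one, so the loop cannot be trivially parallelized across time the way an FIR convolution can.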



