Just going to shamelessly post it here.
It's always great to see more DSP projects though!
The Swift community is currently trying to come up with its own differentiable programming initiative.
and a blog post with audio examples: https://magenta.tensorflow.org/ddsp
- (w.r.t time varying FIRs) How did your results compare to traditional NLMS/adaptive approaches? Were you able to achieve similar results with fewer CPU cycles/lower filter order?
- (also w.r.t FIRs) Have you looked at your approach as a more general, nonlinear model of adaptive filtering?
- How do you deal with highly correlated parameters in your models?
- (w.r.t dereverberation) How does your approach compare in fidelity and performance to homomorphic filtering approaches for deconvolution?
- In terms of the FIRs, I think you can view this as a form of more general, nonlinear filter modeling. The difference, I think, is that you can have a filter as one of several components and adapt them all jointly to achieve some task, which itself can be defined more flexibly (different losses, adversarial training, etc.). The filter itself is still just an LTV-FIR, but it's being controlled nonlinearly. We have only examined synthesis so far, but other signal processing problems like denoising are definitely good directions. The "effects" processors are designed for this.
- It's true that neural networks often learn correlated parameters, but this is usually of less significance because they operate in an overparameterized, "interpolative" regime, which a lot of interesting ongoing research is trying to understand.
- We didn't do a quantitative comparison, but in general the tradeoffs will be different. Dereverberation by a modular generative model will only sound as good as the generative model itself, so artifacts will be from not modeling the source properly. However, if you learn a good model, the dereverberation should be essentially perfect (you can losslessly apply different reverb), although that's a big if.
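To make the LTV-FIR point above concrete, here is a minimal numpy sketch of frame-wise time-varying FIR filtering by overlap-add. The function name, frame layout, and hop size are illustrative choices of mine, not details from the paper; in a DDSP-style model the per-frame impulse responses would be predicted by a network rather than given directly:

```python
import numpy as np

def ltv_fir(x, irs, hop):
    """Apply a linear time-varying FIR filter.

    x   : input signal, shape (n,)
    irs : one impulse response per frame, shape (n_frames, ir_len)
    hop : frame hop size in samples

    Each hop-sized block is convolved with its frame's impulse
    response, and the convolution tails are overlap-added.
    """
    n_frames, ir_len = irs.shape
    out = np.zeros(n_frames * hop + ir_len - 1)
    for i in range(n_frames):
        frame = x[i * hop:(i + 1) * hop]
        out[i * hop:i * hop + len(frame) + ir_len - 1] += np.convolve(frame, irs[i])
    return out[:len(x)]

# Sanity check: with the same IR in every frame, this reduces to
# ordinary LTI convolution (by linearity of the overlap-add).
x = np.random.randn(256)
h = np.array([0.5, 0.3, 0.2])
y_ltv = ltv_fir(x, np.tile(h, (4, 1)), hop=64)
y_lti = np.convolve(x, h)[:256]
assert np.allclose(y_ltv, y_lti)
```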
I do think you should investigate comparisons to adaptive FIRs much more. This field is critical to the design of low-power medical devices like hearing aids, which need feedback reduction, echo cancellation, and the like with minimal filter orders.
My question on correlated parameters was a bit more abstract. In the design of classical audio signal processors for creative applications, you often find that the user-space parameters are correlated, that they map to design-space parameters that are even more correlated, and that those map down to implementation-level parameters that are more correlated still. For example, in a filter designed by frequency sampling, adjacent FFT bins are highly correlated in their I/O, and I was curious whether you optimized a bit by taking a DCT or a similar reparameterization, as you'd find in calculating MFCCs and the like.

It's really tough to design ML approaches for creative signal processing that beat traditional methods because of this: humans learn and adapt to correlations very quickly; machines, not so much when dealing with oscillation and ripple. Many local extrema in the parameter space and all that.
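A toy sketch of the DCT reparameterization idea: instead of optimizing many correlated magnitude samples directly, optimize a short vector of DCT (cepstral-style) coefficients and expand it to a smooth log-magnitude curve before frequency-sampling an FIR. Everything here (the function name, coefficient counts, and windowing choices) is illustrative, not anything from the paper:

```python
import numpy as np

def fir_from_cepstral(c, n_bins=65, n_taps=33):
    """Frequency-sampling FIR design driven by a few DCT coefficients.

    A short vector c parameterizes the log-magnitude response via a
    DCT-III-style cosine expansion (as in cepstral/MFCC analysis),
    which decorrelates adjacent frequency bins.
    """
    k = np.arange(n_bins)
    # Cosine basis: each coefficient controls a smooth spectral shape.
    basis = np.cos(np.pi * np.outer(k + 0.5, np.arange(len(c))) / n_bins)
    mag = np.exp(basis @ c)                 # magnitude samples on [0, Nyquist]
    ir = np.fft.irfft(mag)                  # zero-phase impulse response
    ir = np.roll(ir, n_taps // 2)           # shift to make it causal
    return ir[:n_taps] * np.hanning(n_taps) # truncate and window

# All-zero coefficients give unit magnitude everywhere, i.e. a
# (windowed, delayed) unit impulse.
h = fir_from_cepstral(np.zeros(1))
```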
For IIR filters, I imagine the network would have to be recurrent. If you train the weights with labeled data, does that amount to a different mechanism for designing a filter, sort of like exchanging the bilinear transform for a data-driven approach? The former seems like forward engineering from first principles, whereas the latter seems like reverse engineering from data that exhibits the transformations you want the filter to encode. Fascinating stuff.
For IIRs, the approach (imo) would be to use transformed analog biquads (such as an SVF topology, via the TPT/Zavalishin method), which are designed to handle the time-variance issues with state evolution. From there, your model wouldn't synthesize filter coefficients directly, but rather the design parameters of the filter, or the output gains of multimode filters built from cascaded biquad sections. A good example of this in practice is the Oberheim "pole mixing" filter (analog), or the Shruthi's variation on it: a 4th-order filter that achieves a wide array of frequency responses by mixing the outputs of each stage. Those mixer gains can be varied for very cool results.
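A minimal sketch of the trapezoidal (TPT) SVF in Andrew Simper's formulation of Zavalishin's design, with a per-sample cutoff and a gain mix over the three outputs standing in for the pole-mixing idea. The function name and defaults are mine:

```python
import numpy as np

def svf_process(x, fc, fs, q=0.707, mix=(0.0, 0.0, 1.0)):
    """Topology-preserving-transform (Zavalishin) state-variable filter.

    fc may be an array (one cutoff per sample): the trapezoidal
    integration keeps the state consistent under modulation.
    mix blends the (highpass, bandpass, lowpass) outputs.
    """
    fc = np.broadcast_to(fc, x.shape)
    g = np.tan(np.pi * fc / fs)      # prewarped cutoff gain, per sample
    k = 1.0 / q                      # damping
    ic1, ic2 = 0.0, 0.0              # integrator states
    mh, mb, ml = mix
    y = np.empty_like(x)
    for n in range(len(x)):
        a1 = 1.0 / (1.0 + g[n] * (g[n] + k))
        v1 = a1 * (ic1 + g[n] * (x[n] - ic2))  # bandpass
        v2 = ic2 + g[n] * v1                   # lowpass
        ic1 = 2.0 * v1 - ic1                   # trapezoidal state update
        ic2 = 2.0 * v2 - ic2
        hp = x[n] - k * v1 - v2                # highpass
        y[n] = mh * hp + mb * v1 + ml * v2
    return y
```

Because the states are updated by trapezoidal integration rather than direct-form delays, sweeping `fc` sample-by-sample stays well behaved, which is exactly the time-variance property that matters here.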
I agree that IIRs are a great avenue for future study, also with time-varying coefficients. I've played around a bit with them, but they are harder to train efficiently with current autodiff software and GPUs/TPUs. I think they may require writing a custom CUDA kernel, but I'm hopeful about things like JAX's scan operation.
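For what it's worth, the scan pattern looks like this. Below is a plain-Python stand-in for `jax.lax.scan` (same carry/step signature) applied to a one-pole IIR; in JAX the step function would be traced and differentiated rather than run as a Python loop:

```python
import numpy as np

def scan(f, init, xs):
    """Minimal analogue of jax.lax.scan: threads a carry through xs."""
    carry, ys = init, []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, np.array(ys)

def one_pole(b0, a1, x):
    """First-order IIR  y[n] = b0*x[n] - a1*y[n-1]  as a scan step."""
    def step(y_prev, x_n):
        y_n = b0 * x_n - a1 * y_prev
        return y_n, y_n          # (new carry, output)
    _, y = scan(step, 0.0, x)
    return y

# Impulse response of y[n] = 0.5*x[n] + 0.9*y[n-1] is 0.5 * 0.9**n:
imp = np.zeros(8); imp[0] = 1.0
y = one_pole(0.5, -0.9, imp)
assert np.allclose(y, 0.5 * 0.9 ** np.arange(8))
```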
For someone looking to learn more about signals and audio processing (DFTs, FFTs, etc.), is there any good material people can recommend?
I've completely forgotten my Uni CompEng Signals and want to revisit it.
Once you're pretty confident with your understanding of wave math, next focus on linear algebra. Linear algebra sounds like a strange requirement for understanding signals, but it actually is fundamental. Fourier series should then just "click" for you -- a Fourier transform is basically a change of basis in a vector space.
Bonus points if you learn about infinite-dimensional vector spaces, what Dirac delta functions are and what they mean. For example, why is it that a delta function has infinite spectral energy? Either you have no idea, or you find the answer obvious. There is no in between. Good DSP folks can answer theoretical questions like this without thinking. Once you understand how to think about waves in the time basis and the frequency basis, you will be well equipped to understand pretty much everything in DSP.
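The change-of-basis view is easy to check numerically. An illustrative numpy sketch: the DFT matrix's rows are the complex-exponential basis vectors, multiplying by it is the Fourier transform, and the basis is orthogonal (unitary up to a factor of n):

```python
import numpy as np

n = 8
k = np.arange(n)
# DFT matrix: row k is the conjugated basis vector exp(-2*pi*i*k*m/n).
F = np.exp(-2j * np.pi * np.outer(k, k) / n)

x = np.random.randn(n)
assert np.allclose(F @ x, np.fft.fft(x))           # DFT = change of basis
assert np.allclose(F.conj().T @ F / n, np.eye(n))  # basis vectors are orthogonal
```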
No, real wave maths is continuous wavelets, and those are not as useful in processing. (The DFT is a kind of wavelet transform, just a limited one.)
And then you have the advanced tensor wave math that's used in physics but rarely in sound processing. I bet evaluating Schrödinger's equation and its symmetric solutions wasn't what you had in mind.
You'll get much more mileage out of statistics and discrete mathematics, plus control theory and optimization theory.
Well, momentum is the Fourier transform of position, so...
Also note that the OP's question is "What's wave math?", not "Why not use the (D|F)FT?", which I'm still surprised was suggested: it's a tool; use it when appropriate, use something else when it's not.
Introduction to Digital Filters with Audio Applications
Mathematics of the Discrete Fourier Transform
(both by Julius O. Smith III, freely available online)