
IIR filters can be evaluated in parallel - zdw
https://raphlinus.github.io/audio/2019/02/14/parallel-iir.html
======
Stenzel
The transposed direct form II allows a biquad to be calculated with two scalar
by vector multiplications and some shuffling, which should be faster than the
proposed matrix solution I believe.

~~~
raphlinus
I'd be happy to see benchmarks of that. The problem is that the "shuffling"
creates serial data dependencies, while the matrix form doesn't. Sure, the
number of multiplications is smaller for direct forms, but that's not what has
the most effect on performance.

------
ttoinou

      y[i] = x[i] * c + (x[i - 1] * c + y[i - 2] * (1 - c)) * (1 - c)
    

So you can compute y[i]=f(y[i-n],x) for any n, but that doesn't give you
intermediary results : all y[i-j] for j between n and 0...

I feel like trying to compute (y[i-j] for j between n and 0) on the GPU (n+1
things to compute in parallel here) would not be efficient because the more
you increase n the more y[i] becomes really longer to compute (at least linear
with n) and you'll need to wait for the calculus of y[i] to finish before
having the result of the result of y[i-n]. What am I not seeing here ?

~~~
mntmoss
You can treat it as a multiplexing operation. The first y[i] is computed
normally. y[i+1], y[i+2] etc. are computed with a parallel form, up to as many
cores as is optimal. Normally cores will wait for data, finish it very
quickly, and then sit idle since they're waiting on memory, but this allows
each processing core to return more results from data at a given time i
without a serial readback(which introduces memory bandwidth pressure). The
optimal throughput strategy is to push the parallelization upwards until the
latency of doing this heavy computation outstrips the memory latency savings.

~~~
ttoinou
Interesting thanks for the answer. From what you said it feels like it could
be good for AVX-like CPU accelerated instructions with all those latencies it
would be an optimization like loop unrolling ; but for GPU, really ?

------
saagarjha
Sorry for being "that" person, but this article uses terminology I'm so
unfamiliar with that I don't know what it's about, or what it's trying to
explain. Would someone mind enlightening me? The Wikipedia page on IIR (which
I assume means infinite impulse response) isn't very helpful:
[https://en.wikipedia.org/wiki/Infinite_impulse_response](https://en.wikipedia.org/wiki/Infinite_impulse_response)

~~~
panic
If x is an array of input samples and y is an array of output samples, a
simple FIR filter might look like:

    
    
      y[t] = a*x[t] + b*x[t-1]
    

Whereas a simple IIR filter might look like:

    
    
      y[t] = a*x[t] + b*y[t-1]
    

In the first equation, only two input samples (x[t] and x[t-1]) are used to
compute each output sample. A sharp "impulse" at x[0] will only affect the
values of y[0] and y[1]. That's what makes it an FIR filter—the number of
output samples affected by an impulse is finite.

On the other hand, the second equation has a y[t-1] term on the right hand
side. Each output sample feeds back into the next one. To see what this looks
like purely in terms of x samples, you can substitute using the equation for
y[t], then multiply:

    
    
      y[t] = a*x[t] + b*y[t-1]
      y[t] = a*x[t] + b*(a*x[t-1] + b*y[t-2])
      y[t] = a*x[t] + b*a*x[t-1] + b^2*y[t-2]
      y[t] = a*x[t] + b*a*x[t-1] + b^2*(a*x[t-2] + b*y[t-3])
      y[t] = a*x[t] + b*a*x[t-1] + b^2*a*x[t-2] + b^3*y[t-3]
      ...
    

Eventually you'll get the sum (assuming the samples start at x[0]):

    
    
      y[t] = sum from n=0 to t of b^n*a*x[t-n]
    

The sum ranges over the entire array of samples, all the way back to the
beginning. Each input sample from x influences every output sample afterward.
The output for an impulse—a single positive x[0] sample followed by x[1] =
x[2] = ... = 0—is

    
    
      y[t] = b^t*a*x[0]
    

Since these samples are non-zero for all t (though decaying exponentially
according to the value of b), the second equation describes a filter with
infinite impulse response.

~~~
saagarjha
Thanks for the explanation; this clears up the mathematics behind the
operation. Follow up question, though: why would you want to do this? Is this
some sort of digital signal processing procedure?

~~~
panic
Yeah; say for example you're trying to recognize a swipe with a certain
velocity on a touchscreen:

    
    
        def touchDown():
            velocity = 0
            lastTime = now
        def touchMoved(from, to):
            velocity = (to - from) / (now - lastTime)
            lastTime = now
        def touchFinished():
            if velocity.x > threshold:
                performGesture()
    

You test it out, and discover that the gesture is kind of flaky. Sometimes it
works properly, but just as often it fails when it feels like it should have
worked. It turns out that maintaining a constant velocity is hard—as your
finger moves across the screen, the velocity can fluctuate around between
samples. The test in touchFinished() looks at only the last sample taken,
which is often the worst, as your finger has just lifted off from the screen.

So instead of just looking at the last velocity, you keep track of a
"smoothedVelocity" which averages your velocity samples together:

    
    
        def touchDown():
            smoothedVelocity = 0
            lastTime = now
        def touchMoved(from, to):
            velocity = (to - from) / (now - lastTime)
            smoothedVelocity = 0.25 * velocity + 0.75 * smoothedVelocity
            lastTime = now
        def touchFinished():
            if smoothedVelocity.x > threshold:
                performGesture()
    

With this change, your smoothedVelocity moves a bit toward the measured
velocity with each sample, but won't jump directly there. This feels a lot
more predictable. Stopping your finger momentarily as it lifts off the screen
won't stop the gesture from being performed, for example.

The smoothedVelocity code implements a low-pass IIR filter (it filters out the
high-frequency fluctuations while letting the lower-frequency motion pass
through). You could also imagine just keeping the last 2 velocities and
averaging them together at the end: that would be an FIR filter.

~~~
ttoinou
Good example. I would add that you don't need to understand DSP and IIR to
understand that particular example (IIR with only one memory case), it's only
a 1D barycenter between previous version and last weighted value.

