Why is it a bad idea to filter by zeroing out FFT bins? (2020) (dsp.stackexchange.com)
91 points by bcaa7f3a8bbc on May 9, 2021 | 40 comments



This is indeed an excellent way of filtering if your audio is cyclic and fits into a single FFT, like a periodic waveform, a drum loop or an Optigan track. Just make sure not to apply any window.


You also need to make sure that the Gibbs phenomenon does not cause the filtered signal to leave the range of the representation. A prefiltered signal may be less than 1 everywhere, but the post-filtered signal won't necessarily be. Meaning if 1 is the cap for your audio output, then enjoy a brand new category of distortion. But it's a terrible way in general ofc; just optimize an appropriate zero-phase filter with an unaffected passband and minimum distortion. It's trivially easy, and there is no excuse to just use the terribly shitty, short "classic" filters common in audio processing or graphics, as implemented by skilled programmers who don't know the difference between DFT and FFT (which is always easy to tell, as they use the term FFT as if it were synonymous with frequency transform estimates).
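A quick numpy sketch of that failure mode (the signal and bin counts here are arbitrary choices): brickwall-lowpass a full-scale square wave by zeroing high FFT bins, and the Gibbs overshoot pushes samples past the original ±1 range.

```python
import numpy as np

N = 1024
t = np.arange(N) / N
x = np.sign(np.sin(2 * np.pi * 4 * t))   # square wave, peaks exactly at +/-1

X = np.fft.rfft(x)
X[64:] = 0                               # brickwall lowpass: zero all bins from 64 up
y = np.fft.irfft(X, n=N)

print(np.abs(x).max())   # 1.0
print(np.abs(y).max())   # > 1.0: Gibbs overshoot past the original range
```

If 1.0 is your clipping point, the filtered signal now clips even though the input never did.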


This is the case for any sharp filter. It is not unique to the FFT approach. It doesn't matter if you use a linear phase FIR; any time you "remove" frequencies you can increase your peak levels. Try graphing sin(x) + 0.2sin(3x) and then try removing/filtering out the 3x component.

It's even true for reconstruction. A digital waveform can represent peak levels far above "digital peak", in between samples.

This is why if you're mastering songs, you'd better keep your peak levels at -0.5 dB or -1 dB (so the filtering from lossy compression won't make it clip), and why you'd better use an oversampling limiter. Especially if you're doing loudness-war-style brutal limiting, because that's the stuff that really creates inter-sample peaks. But you shouldn't be doing that, because Spotify and YouTube will just turn your song down to -14 LUFS anyway and all you'll have accomplished is making it sound shitty :-)


Not quite. Designing minimum-passband-distortion filters with no amplitude increase anywhere is slightly harder, but it's not impossible, even if you are strictly removing some specific frequencies, as you can design the filter in such a way that the spectrum amplitude ringing strikes zero for them. In practice though, simply reducing these frequencies by a factor of 100 is good enough, and that's possible for bands without needing to have any amplitude above 1.


You didn't understand my example. This isn't about spectral ringing. It doesn't matter if you have zero spectral ringing and no amplitudes above 1: there is no way for a sharp filter that removes (or almost removes) certain frequencies to guarantee it doesn't increase peak levels in the time domain. The filter will decrease the total energy of the signal, but a decrease in signal energy can still cause an increase in peak levels. That's because adding a frequency component can decrease peak levels by partially cancelling the existing peaks, so removing it can conversely increase them.

Just punch sin(x) + 0.2sin(3x) into a graphing calculator, then remove the 0.2sin(3x) component and look at peak levels increase. No filter can fix that without also decreasing the sin(x) component significantly to compensate.
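Here's that experiment in numpy rather than a graphing calculator. The 3x component partially cancels the peaks of sin(x), so removing it raises the peak from about 0.871 to 1.0:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 10001)
full = np.sin(x) + 0.2 * np.sin(3 * x)
filtered = np.sin(x)                    # the 3x component removed

print(np.abs(full).max())      # ~0.871
print(np.abs(filtered).max())  # 1.0
```

Less energy in the signal, yet a higher peak, exactly as described above.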


Your proposed zero phase filter won't work in realtime, and there are many use cases for what you consider shitty.


Modern computers are ridiculously powerful: processing a 4K image with a 50x50 filter runs in real time (>60 Hz) on a CPU using a decent FFT solution, and on a decent GPU you could do many times more than that.

Oh sure, there are lots of use cases for fast but bad. But what bugs me is that people aren't using them for those cases. Worse still, people are using repeated fast-but-bad, and then more fast-and-bad to fix the problems that caused, effectively creating slow and bad filters.


I'd assumed fft was just a dft with O(n log(n)) performance - am I missing something?


You're not. The FFT is just a particular way of implementing the DFT.


Quite. So I'm puzzled by what mistake the 'skilled programmers' are making, when confusing DFT and FFT. Implementing DFT in quadratic time?


It is possible to get around the cyclic properties of convolution via FFT by zero padding your array prior to transforming though.
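A small numpy illustration of that zero-padding trick (array contents are arbitrary): without padding, multiplying FFTs gives circular convolution, whose tail wraps back to the start; padding both arrays to length len(x)+len(h)-1 first recovers plain linear convolution.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# Circular convolution: the tail wraps around to the start
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n=len(x))).real

# Zero-pad both to len(x)+len(h)-1 first -> plain linear convolution
n = len(x) + len(h) - 1
lin = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real

print(np.allclose(lin, np.convolve(x, h)))  # True
```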


While it’s not necessarily a “bad idea” depending upon your application, you may be able to do “better” in the side lobes using masking functions (windows) other than rectangles, which is what zeroing is. See Hann, Hamming, etc.


I heard a maxim that declares: "a sharp discontinuity in the frequency domain equals ringing in the time domain, and a sharp discontinuity in the time domain equals ringing in the frequency domain".

In other words it will cause ringing (oscillations, Gibbs phenomenon) in the time domain (for signals, or the spatial domain for images). If instead you want a smoother result, you need a smooth taper in the FFT domain rather than abruptly zeroing out bins. As that is hard to do perfectly, it's easier to specify a window function (or a classically derived FIR or IIR filter kernel) and convolve it with your input signal / image. It is also more efficient to do online when the data is streaming.
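For example, a windowed-sinc FIR (here built by hand in numpy; tap count, cutoff, and sample rate are arbitrary) tapers the ideal brickwall response with a Hann window instead of leaving a hard frequency-domain edge:

```python
import numpy as np

fs, ntaps, fc = 1000.0, 101, 100.0
n = np.arange(ntaps) - (ntaps - 1) / 2
h = (2 * fc / fs) * np.sinc(2 * fc / fs * n)  # ideal (brickwall) lowpass impulse response
h *= np.hanning(ntaps)                        # Hann taper tames the ringing
h /= h.sum()                                  # unity gain at DC

t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 300 * t)  # keep 50 Hz, remove 300 Hz
y = np.convolve(x, h, mode="same")
```

Away from the edges, y is essentially the clean 50 Hz component, with no Gibbs overshoot.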


Said another way, you're making a filter. Zeroing out FFT bins is a brickwall filter. Brickwall filters have poor frequency and amplitude accuracy if you are trying to preserve the signal in the passband. A flattop filter will give maximum amplitude accuracy; a Gaussian filter will give good amplitude and excellent frequency accuracy. Other filter types can be implemented to require less computation for resource-constrained systems under certain circumstances. Zeroing is the simplest filter to implement in code, but its performance is essentially the worst from a signals point of view.


>Brickwall filters have poor frequency and amplitude accuracy if you are trying to preserve the signal in the passband.

Brickwall filters have the greatest possible frequency accuracy. I think you might be getting filters mixed up with windows - a boxcar window does not have the best frequency accuracy.


It can work fine if you implement an FIR with sufficient bins.


How does a brickwall filter affect the following things?

A tone played for a fixed time, a glissando, vibrato, a pure tone that lies between two frequency bins?


If you're doing this then you assume your signal is stationary in the window length. Many times, it's not...


https://github.com/pmarks-net/chromadoze multiplies white noise by a stairstep shape (rectangles of various heights), then runs an IDCT to generate colored noise.

I wonder if this actually has subtle artifacts, or if it doesn't matter because the input is noise?


I disagree that it's always a bad idea. If you didn't have real data in those bins to begin with, then the absence of ringing was never real either. You're just choosing between interpolation strategies to fill in the data you deleted. You have to realize that whatever you do, you're making up data. One could say that you're taking out your dry erase marker and writing in your priors. For image processing, you probably don't have ringing in the scene you took a picture of, so you don't want to zero bins. In other signal processing contexts you might actually have no signal to measure in certain bins, and then you do want to zero them.


That's not how the frequency transform works. You're more likely to have more noise than signal in the higher frequencies if the signal is smooth and the noise white. But real, relevant signals often have discontinuities.


> If you didn't have real data in those bins to begin with, then the absence of ringing was never real either.

What is unreal data? If you have periodic data that's not aligned you're going to have a continuous signal going all the way to the high frequencies.


>What is unreal data?

The easiest example would be when an earlier step supersampled the data, and you know that nothing above the original Nyquist could possibly reflect reality. That's one example of when you'd want to zero bins.
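A numpy sketch of that case (sizes arbitrary): 2x zero-stuffing creates a mirror image of the spectrum above the original Nyquist. Those bins carry no real data, so zeroing them is exactly the right call and recovers the underlying sine:

```python
import numpy as np

N = 128
x = np.sin(2 * np.pi * 10 * np.arange(N) / N)

up = np.zeros(2 * N)
up[::2] = x                       # zero-stuff: the spectrum repeats (images)

U = np.fft.rfft(up)
U[N // 2 + 1:] = 0                # zero everything above the ORIGINAL Nyquist
y = np.fft.irfft(U, 2 * N) * 2    # x2 restores amplitude lost to zero-stuffing

t_fine = np.arange(2 * N) / (2 * N)
print(np.allclose(y, np.sin(2 * np.pi * 10 * t_fine), atol=1e-9))  # True
```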


Yes, but what's stopping you from using non-brickwall windows and filtering Nyquist frequencies before sampling? Situations where "unreal" data is present only in higher frequencies are rare, and situations where all data is absent from the higher frequencies are very rare, unless you sample in a perfect context where everything is aligned with the sampling period.


A much less well known fact about the FFT (or the DFT) is that it also exhibits a brickwall (rectangular) response. This can manifest itself when using zero padding in the frequency domain for upsampling, but it's also the reason for artifacts when FFTing a window that doesn't contain a whole number of periods.

It's used in OFDM, where the subchannels are generated by an FFT and have a sinc shape (the frequency response of the FFT's implicit rectangular window) in the frequency domain.
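The frequency-domain zero-padding trick mentioned above can be sketched in numpy (sizes arbitrary; the Nyquist-bin split is glossed over, which is harmless here because that bin is zero): pad the spectrum with zeros and the inverse FFT gives ideal sinc interpolation of the periodic signal.

```python
import numpy as np

# Upsample 4x by zero-padding the spectrum (sinc interpolation)
N, factor = 64, 4
x = np.cos(2 * np.pi * 5 * np.arange(N) / N)   # exactly periodic in the window

X = np.fft.fft(x)
Xp = np.zeros(N * factor, dtype=complex)
Xp[:N // 2] = X[:N // 2]          # keep positive frequencies at the bottom
Xp[-N // 2:] = X[-N // 2:]        # and negative frequencies at the top
y = np.fft.ifft(Xp).real * factor # x4 compensates the length change

t_fine = np.arange(N * factor) / (N * factor)
print(np.allclose(y, np.cos(2 * np.pi * 5 * t_fine), atol=1e-9))  # True
```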


Thanks for mentioning this, because after reading the answer my first thought was "wait, but what if I do nothing before doing IFFT which is also equivalent to multiplying by a rectangle (but that nonzero from -nyquist to nyquist)"


as mentioned, worth pointing out that most practical digital filters don't have that many taps so unless you have other reasons to go to the frequency domain, time domain filtering can be faster...

but yeah, the implicit boxcar is a sinc.

sometimes it's used for image data as 2d convolutions can be expensive though...


Most digital filters in use are small, this is true. But that is an error: legacy solutions, which required filters to be exceedingly short, are being inappropriately applied to modern systems and problems. The most egregious examples are from videogames, where several different short classic filters are usually applied in series to deal with e.g. aliasing. The problem is twofold: it effectively makes a larger filter, and that larger filter is terrible at its job. The effect of using a single appropriately designed large filter is astonishing.


i'm guessing you're talking about things like og 4 tap biquads and the like.

in the time domain, you're looking at what... for each sample a pointwise multiply for each tap and then a sum, right? i'm guessing for most audio applications at commonly used sample rates, you're rarely going to have more than 24 taps at the very most? (with most only really needed 8 to 16?)

with the fft, unless you're using super exotic multiwindow schemes, you're looking at a pointwise multiply just to do the windowing before the fft. then you're looking at n log n to compute the fft itself, then zeroing or applying an envelope (pointwise multiply), then another n log n back to the time domain.

i think with simd you're way faster to just stay in the time domain.

would be interesting to bench for sure though... small ffts and simd may be super fast and not that many instructions.
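A rough bench sketch of that comparison (tap counts and signal length are arbitrary, and the timings are obviously machine-dependent): direct time-domain convolution versus one full FFT round trip, with a sanity check that both compute the same thing.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)            # one second of audio at 48 kHz

for ntaps in (8, 64, 512):
    h = rng.standard_normal(ntaps)
    n = len(x) + ntaps - 1                # pad so FFT gives linear convolution
    direct = lambda: np.convolve(x, h)
    via_fft = lambda: np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

    assert np.allclose(direct(), via_fft())   # identical results
    t_d = timeit.timeit(direct, number=3)
    t_f = timeit.timeit(via_fft, number=3)
    print(f"{ntaps:4d} taps: direct {t_d:.4f}s  fft {t_f:.4f}s")
```

Where the crossover lands depends entirely on SIMD, cache, and FFT library quality, which is the point being debated above.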


It's not unusual to use hundreds of taps for a clean linear phase filter with a sharp slope. Discrete convolution reverb would need hundreds of thousands - which is why you use an FFT for that.


interesting. just looked at a datasheet for a modern audio dsp and they set aside memory for ~256 coefficients per 48k channel. i guess big filters are a thing.

still, you're looking at applying a window before the fft in most applications. that's a full pointwise multiply before even stepping into the fft (and back).

reverbs have delay lines, right? what does that look like in the spectral domain? some shift of the phases?


This would be a really good interview question!


I think it's a terrible interview question, unless maybe you're hiring an electrical engineer.

It's not something you can expect someone to reason through from the basics, and it's not something I'd expect someone to know unless they've worked on problems involving the technique.


Seems like a pretty reasonable question for a job involving signal processing where the candidate has experience. Certainly that could be electrical engineers. In that context putting a signal into the frequency domain seems pretty basic actually. Knowing the caveats of sharp edged filters in that situation doesn't seem like outrageously obscure knowledge.


It's blindingly obvious that you would only ask this to someone who should know it.


Why use an FFT at all? Yes it's easy but you'll almost always get a better result with a lot less computation by building an actual digital filter.


An IIR filter will always beat an FFT, but the most efficient way to implement longer FIR kernels is often with a WOLA or similar "streaming" FFT filter implementation. At some point, it takes fewer cycles to do N*log(N) multiplications in the frequency domain than to do N^2 multiplications to convolve in the time domain.

The crossover point will be system-dependent and heavily influenced by overhead, but a crude WAG might be in the vicinity of 64- to 128-wide kernels. There is no question of one implementation being "better" than the other, they are capable of identical results if implemented accordingly.
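A minimal overlap-add version of that streaming idea in numpy (block size and kernel length are arbitrary; real WOLA implementations add windowing and in-place buffering): process the input in blocks, convolve each block via FFT, and let the tails add up.

```python
import numpy as np

def overlap_add(x, h, block=256):
    """Streaming FIR filtering via per-block FFT convolution (overlap-add)."""
    n = block + len(h) - 1
    nfft = 1 << (n - 1).bit_length()      # next power of two, avoids circular wrap
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        out = len(seg) + len(h) - 1
        y[start:start + out] += yseg[:out]  # tails overlap and add
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = rng.standard_normal(129)
print(np.allclose(overlap_add(x, h), np.convolve(x, h)))  # True
```

Each block costs O(nfft log nfft) regardless of kernel length, which is where the win over direct convolution comes from for long kernels.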


Applying a FIR in the frequency domain is faster for 1D filters longer than some very small number, usually 7-16. The old-school graphics and audio filters are shorter, sure, but they are also horribly bad compared to what you can do with a designed filter in the 20-40 tap range.


Some standards define a frequency response of the whole measurement chain, including things like the frequency response of your sensor and anti aliasing filter. An FFT is the easiest way to deal with that.


When you want to do signal analysis in the frequency domain?



