(a) if my intuition is super wrong I'd love for HN to correct it, heh. (b) long shot, but anyone have links to code? It's HN after all -- maybe some commenters in this thread are aware of cool/idiomatic/simple/etc uses of wavelet code in open source software.
(I know of a bunch of cool uses they've been put to, from JPEG 2000 to the SpaceX fluid-dynamics presentation about using wavelet compression on GPUs, but I've personally never used them as a tool for anything, and it'd be fun to learn about them with code!)
A mathematician explained it to me as an alternative to a Fourier transform: instead of describing the function as a sum of sine waves, it's a sum of Mexican hats (or whatever basis function). And it turns out that's a simpler representation in the case of functions with sharp discontinuities. It's also an alternative to a Taylor series, replacing sums of derivatives with sums of the scaled basis function. Seemed like a pretty elegant explanation to me.
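To make the "simpler for discontinuities" claim concrete, here's a minimal numpy sketch (my own toy, not the mathematician's): a step function needs only a couple of significant Haar-wavelet coefficients, but dozens of Fourier coefficients.

```python
import numpy as np

# A step function: one sharp discontinuity halfway through.
n = 256
x = np.zeros(n)
x[n // 2:] = 1.0

# One level of the orthonormal Haar transform: pairwise
# averages (smooth part) and differences (detail part).
def haar_level(v):
    s = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return s, d

# Full decomposition: keep recursing on the smooth part.
def haar_dwt(v):
    out = []
    s = v
    while len(s) > 1:
        s, d = haar_level(s)
        out.append(d)
    out.append(s)
    return np.concatenate(out[::-1])

hc = haar_dwt(x)
fc = np.fft.rfft(x)

haar_big = int(np.sum(np.abs(hc) > 1e-8))
fourier_big = int(np.sum(np.abs(fc) > 1e-8))
print(haar_big, fourier_big)  # the Haar representation is far sparser
```

(The step lands on 2 nonzero Haar coefficients versus 65 nonzero Fourier bins -- the sines have to conspire at every frequency to build a sharp edge.)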
it has been a long time, so i'm probably going to say something really stupid... but it's fun to try, and hopefully someone will correct me if i'm wrong.
my understanding is that one place they're useful is with the time-frequency uncertainty that comes with regular fourier transforms. for example, if you're doing spectral analysis and looking for low frequencies, you need a wide analysis window (a full cycle has a long period at low frequencies), but that wide window now covers a lot of time, so if you pick up a low-frequency component, you don't know where it is within that big wide window. wavelets instead fit families of (fairly arbitrary) basis functions that can be more compact, so you can use a narrower window and therefore have less time uncertainty about the things you find within it.
...most applications sort of go from there: you want to warp the equivalent of the x-axis in the frequency domain in various ways, and perhaps either use a narrower analysis window for better time resolution or, say, a smaller number of basis functions for reconstruction in lossy compression, where some bands are less important than others and resolution in those bands can be discarded.
A classic Fourier spectrogram makes one choice about time uncertainty vs. frequency uncertainty and uses it for all frequencies. You can think of it as dividing time-frequency space into little squares that work okay for most quantities of interest.
Wavelets alter that tradeoff. At low frequencies, we often care about small differences in frequency (e.g., 4 vs. 5 Hz), while the precise location of the peaks (e.g., 10 vs. 10.1 s) matters less because the signal is changing slowly anyway. The situation is reversed at high frequencies: 1 kHz vs. 1.001 kHz is often physically irrelevant, but the timing matters because the events are so short. We therefore divide time-frequency space up so that the windows are long in time but narrow in frequency at low frequencies, square in the middle, and short in time but wide in frequency at high ones.
On top of that, wavelets recognize that sine waves are a mathematically convenient basis, but may not reflect your data, so you can also swap in a "mother wavelet" that better matches whatever phenomena you're studying.
An FT on an audio file, for example, will tell you how much of each frequency is present in that file. Each component is localized in frequency. A wavelet, however, is localized in both frequency and time (due to its decaying "hat" shape). When you scan this wavelet across the audio file and record the outputs, you therefore learn not only which frequencies are present, but also their locations in the file.
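My reading of "scan across" is a dot product at every shift, i.e. cross-correlation. A hedged numpy sketch (my own toy example, with made-up numbers): a Mexican-hat wavelet slid along a signal lights up where a short blip sits.

```python
import numpy as np

# "Mexican hat" (Ricker) wavelet: second derivative of a Gaussian.
def ricker(width, a):
    t = np.arange(width) - (width - 1) / 2.0
    return (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

# Signal: silence, except a short Gaussian blip at sample 700.
n = 1024
t = np.arange(n)
sig = np.exp(-0.5 * ((t - 700) / 8.0) ** 2)

# "Scanning" = taking the dot product at every shift (cross-correlation).
w = ricker(101, 10.0)
response = np.correlate(sig, w, mode="same")

loc = int(np.argmax(response))
print(loc)  # the peak response lands where the blip is
```

The FT would tell you the blip's frequency content; the argmax of the wavelet response tells you where in the file it happened.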
When you say:
"When you scan this wavelet across the audio file"
Apologies for the naive code analogies -- your comment makes me think of something like an XLA 'reduceWindow' operation (edit: or an OpenCL kernel), where a wavelet-sized 'window' is slid over the input, the difference is computed for each data point, and then (waves hands: probably something smarter than a) 'cumulative sum' to see how close/different it is.
(a) is that on the right track/what you meant by 'scanning the wavelet across'?
(b) what about transforming the wavelet -- to different scales, etc?
This seems like the kind of thing that's actually a search problem, heh, where you'd need to slide it over the input many, many times, and where it could conceivably benefit from (waves hands) machine learning techniques as a result, to come up with something that searches for a good wavelet encoding more efficiently based on the input.
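On (b): the standard move is to dilate the mother wavelet, psi(t/s)/sqrt(s), and repeat the scan at each scale; the scale whose width matches the feature responds most strongly. A toy numpy sketch (my own construction, with made-up scales):

```python
import numpy as np

# Ricker wavelet at scale a, normalized to unit energy so that
# responses at different scales are comparable.
def ricker(a):
    half = int(5 * a)                      # support out to ~5 widths
    t = np.arange(-half, half + 1, dtype=float)
    w = (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)
    return w / np.linalg.norm(w)

n = 2048
t = np.arange(n)
sig = np.exp(-0.5 * ((t - n // 2) / 8.0) ** 2)  # bump ~8 samples wide

# Scan at three scales; the best-matching scale wins.
scales = [4.0, 18.0, 80.0]
best = max(scales,
           key=lambda a: np.abs(np.correlate(sig, ricker(a), mode="same")).max())
print(best)  # the mid-sized wavelet matches the bump best
```

So rather than a learned search, the classic wavelet transform just enumerates a ladder of scales (dyadic ones, in the discrete case) and keeps all the responses.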
Edit: this one seems simple enough to learn from, it's minimal, MIT licensed, audio-focused, written in Julia, and references the course material that it's based on:
It, uh ... like lots of scientific code, it has a lot of single-letter variables. :D But if one were to google up a few acronyms ('DWT'), install Julia, get it to run, and then re-implement it in another language, my feeling is that you might understand wavelets pretty well, no? Or at least be well-equipped to understand more interesting uses of wavelets?
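For a feel of what re-implementing it involves: one level of the Haar DWT and its exact inverse fit in a few lines of numpy (a sketch of the textbook construction, not the Julia package's code):

```python
import numpy as np

# One level of the orthonormal Haar DWT: averages and differences.
def haar_fwd(v):
    s = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return s, d

# Exact inverse: re-interleave sums and differences.
def haar_inv(s, d):
    v = np.empty(2 * len(s))
    v[0::2] = (s + d) / np.sqrt(2)
    v[1::2] = (s - d) / np.sqrt(2)
    return v

x = np.random.default_rng(0).standard_normal(64)
s, d = haar_fwd(x)
ok = np.allclose(haar_inv(s, d), x)
print(ok)  # True: the transform loses nothing
```

The full multi-level DWT just recurses on the smooth part `s`; fancier wavelet families swap the 2-tap average/difference pair for longer filter pairs.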
(I'm posting this because I might use some 'shop time' doing approximately that, so I would definitely appreciate any HN comments telling me 'consider learning from $some_other_example instead, for $some_reason.')
Intuitively, wavelets, and signal processing in general, seem like something behavioral folks could benefit from, but I can't come up with any good use cases that would justify going through all the associated math.
Yeah, I agree. Wavelets were all the rage in the late '80s and '90s, and they seem to have fallen out of fashion.
As a matter of fact, it's kind of strange that applied math techniques are subject to fashion at all.
I think there is quite a lot to be done looking at deep nets in terms of wavelets for example.
Apropos deep nets, with the explosion in machine learning in the past few years, I've been seeing a lot of research interest statements change to meet that. In particular, there's an awful lot of numerical linear algebra being done now.
I suspect that things will come full-circle soon enough, and those tools developed in numerical linear algebra (via their connections to functional analysis) will make their way to harmonic analysis.
(This is notwithstanding the fact that compressed sensing is picking up a little momentum as a research area in applied mathematics and other disciplines that study signal processing. Wavelets, curvelets, shearlets, chirplets, etc. will likely see some action there too.)
I suspect the OP is not talking about how deep nets are implemented, but rather how people are trying to understand how and why they work so well, or how to reverse-engineer knowledge out of a trained net, or how to train them faster, etc ...
In that space, you need quite a bit more than matrix multiplication.
If you were training some deep learning model...
...should you be trying a few Wavelet transforms on your inputs, and feeding those in to your model, too, to see if your model performs better with wavelet inputs?
You’ve just described modern machine learning.
Do you know how radioactivity was discovered? Henri Becquerel realized his photographic plates had been darkened after being left in a drawer with uranium salts. (X-rays were a similar accident: Röntgen noticed a fluorescent screen glowing across the room.)
Do you know how electric guitar distortion was discovered? Willie Kizart dropped his Fender amp.
Worse things have happened than experimenting by throwing one more transform on your inputs before processing them.
How do you determine if they are good for your application and how do you choose which family of wavelets to apply?
CNNs have gotten so good though it seems a little moot.
The chief advantages of this technique are a) certain regular wavelets have performance benefits over convolutions, especially when talking low-level/FPGA/ASIC space, and b) being not learned can be beneficial, as there is nothing to overfit.
I can see it being handy for embedded/Raspberry Pi-like applications. In particular, you can build a crude face detector out of a dozen Haar filters.
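The reason Haar filters are so cheap on weak hardware: with an integral image, any rectangle sum, and hence any Haar feature, costs four lookups and no multiplies. A small numpy sketch (hypothetical 4x4 "image" and helper names of my own):

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)  # stand-in "image"

# Integral image: ii[y, x] = sum of img[:y, :x], padded with a zero edge.
ii = np.zeros((5, 5))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

# Sum over the half-open rectangle [y0:y1, x0:x1] in four lookups.
def rect_sum(y0, x0, y1, x1):
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

# A two-rectangle Haar feature: left half minus right half.
feat = rect_sum(0, 0, 4, 2) - rect_sum(0, 2, 4, 4)
print(feat)
```

Build the integral image once per frame and every Haar feature afterwards is constant time, regardless of its size -- that's the trick behind the classic Viola-Jones detector.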
End to end training is just soooo convenient though.
"Wavelets came about as a kind of update to an enormously useful mathematical technique known as the Fourier transform. In 1807, Joseph Fourier discovered that any periodic function — an equation whose values repeat cyclically — could be expressed as the sum of trigonometric functions like sine and cosine."
Because of that "periodic", the math here is Fourier series, not the Fourier transform. For almost any periodic function, the Fourier transform won't exist (in the classical sense).
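To spell out the distinction (standard textbook definitions, not quoted from the article):

```latex
% Fourier series: for f periodic with period T,
f(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{2\pi i k t / T},
\qquad
c_k = \frac{1}{T} \int_0^T f(t) \, e^{-2\pi i k t / T} \, dt.

% Fourier transform: classically defined for f \in L^1(\mathbb{R}),
\hat{f}(\xi) = \int_{-\infty}^{\infty} f(t) \, e^{-2\pi i \xi t} \, dt.
```

A nonzero periodic f is not integrable on the whole line, so the transform integral diverges; it exists only distributionally, as a train of delta functions sitting at the harmonics k/T.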
The math of Fourier theory is done carefully in two of the W. Rudin texts: Fourier series in his Principles of Mathematical Analysis, and Fourier transforms in his Real and Complex Analysis. For more, his Functional Analysis covers the related distributions. See also the "celebrated" Riesz–Fischer theorem, IIRC in his Real and Complex Analysis.
As I recall, wavelets have some good completeness properties.
Power spectra are of interest, and for that see the Wiener–Khinchin theorem. For the relevant statistics of estimating power spectra, see the work of Blackman and Tukey.
The referenced article also has
"That’s because Fourier transforms have a major limitation: They only supply information about the frequencies present in a signal, saying nothing about their timing or quantity."
Phase gives some information on timing, and the power spectra, on quantity.
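A quick numpy check of the phase-carries-timing point (my own toy example): circularly shifting a signal leaves the FFT magnitudes untouched and encodes the delay entirely in the phase.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
x = rng.standard_normal(N)
shift = 5
y = np.roll(x, shift)  # same signal, delayed (circularly)

X, Y = np.fft.fft(x), np.fft.fft(y)

# Magnitudes identical: "which frequencies" is blind to the delay...
mags_equal = np.allclose(np.abs(X), np.abs(Y))

# ...but the phase at bin k rotates by exactly -2*pi*k*shift/N.
k = 3
phase_ok = np.allclose(Y[k] / X[k], np.exp(-2j * np.pi * k * shift / N))
print(mags_equal, phase_ok)  # True True
```

So "saying nothing about their timing" only holds if you throw the phase away, which is exactly what a power spectrum does.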
The article also has
"Whenever you have a particularly good match, a mathematical operation between them known as the dot product becomes zero, or very close to it."
A dot product of zero means orthogonality: in the usual senses, neither signal is useful for saying much about the other. A particularly good match should produce a large dot product, not one near zero.
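A two-line numpy check of which way that goes: a signal matched against itself gives a big dot product, while two different harmonics (a genuinely poor "match" in this sense) give zero.

```python
import numpy as np

t = np.arange(256)
f1 = np.sin(2 * np.pi * 4 * t / 256)
f2 = np.sin(2 * np.pi * 5 * t / 256)

match = round(float(np.dot(f1, f1)))   # signal against itself
ortho = round(float(np.dot(f1, f2)))   # against a different harmonic
print(match, ortho)  # 128 0
```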
Of course, in the case of stochastic processes, the orthogonality of a dot product (inner product) is not the same as probabilistic independence.
For computations, of course, see the many versions of the fast Fourier transform, in response to a question from R. Garwin, rediscovered by Tukey, first programmed by Cooley, later given various developments and many applications.
At one point early in my career, these topics got the company I was working for "sole source" on a US Navy contract and got me a nice, new high end Camaro, a nice violin, a nice piano for my wife, some good French food, a sack full of nice Nikon camera equipment, and a nice stock market account! My annual salary was 6+ times the cost of the Camaro.
If you're very familiar with both of them, it can be easy to underestimate how confusing each is to a newcomer.