Hacker News new | past | comments | ask | show | jobs | submit login
Wavelets allow researchers to transform and understand data (quantamagazine.org)
129 points by theafh 14 days ago | hide | past | favorite | 40 comments

So if I'm understanding this right -- if I have a stream of numbers coming in forming a squiggly line, and I have a bucket of these wavelet shapes, I can pick up wavelets and stretch and squeeze and resize them and overlay them on my line, and ... use them to characterize that squiggly line? Working as feature recognition and also serving as a way to compress it? So instead of 20k data points, I have a sequence like 'mexican hat, mexican hat', and maybe elements in the sequence are different sizes and overlap?

(a) if my intuition is super wrong I'd love for HN to correct it, heh. (b) long shot, but anyone have links to code? It's HN after all -- maybe some commenters in this thread are aware of cool/idiomatic/simple/etc uses of wavelet code in open source software.

(I know of a bunch of cool uses they've been put to, from JPEGs to the spacex fluid dynamics presentation about using wavelet compression on gpu, but I've personally never used them as a tool for anything, and it'd be fun to learn about them with code!)

Wavelets are used for pattern recognition in many iris recognition systems. First, the position of the iris in the input image is determined, then any eyelash and eyelid occlusions are removed and the iris is extracted by converting it to a polar coordinated image. The resulting image signal is convolved with wavelets of different shapes and sizes. The resulting signal is encoded using phase demodulation to produce an IrisCode. Two iris codes can be checked for a match by computing their hamming distance. [1] is the paper which describes the original system invented by Daugman. [2] is an open source implementation of that method.

[1] https://www.robots.ox.ac.uk/~az/lectures/est/iris.pdf [2] http://iris.giannaros.org/

That paper by Daugman was the basis for my undergrad thesis, so when I saw the title, that's what my mind immediately jumped to! Lo and behold, I open the comments and find a link to the paper. Not only that, but it's still stored on my old professor's website (~az = Andrew Zisserman)!

This is quality content. Thank you


A mathematician explained it to me as an alternative to a fourier transform: instead of describing the function as a sum of sine waves, its a sum of mexican hats (or whatever basis function). And it turns out that’s a simpler representation in the case of function with sharp discontinuities. It’s also an alternative to a Taylor series, replacing sums of derivatives with the sum of the scaled basis function. Seemed like a pretty elegant explanation to me.

yes, this is how i understand them.

it has been a long time, so i'm probably going to say something really stupid... but it's fun to try and hopefully someone will correct me if i'm wrong,

my understanding is that one place that they're useful is in terms of time-frequency uncertainty that occurs with regular fourier transforms. for example, if you're doing spectral analysis and looking for low frequencies you need a wider analysis window (a full cycle has a long period for low frequencies), but that wide analysis window now covers a lot of time, so if you pick up a low frequency component, you don't know where it is in that big wide window. wavelets instead fit these families of arbitrary basis functions that can be more compact, so you can use a narrower window and therefore have less time uncertainty for the things you find within it.

...most applications sort of go from there, where you want to warp the equivalent of the x-axis in the frequency domain in various ways and perhaps be able to either use a narrower analysis window for better time resolution or say, a smaller number of basis functions for reconstruction in lossy compression where some bands are less important than others and resolution in those bands can be discarded.

I think about it in terms of trading in time-frequency space. The total amount of time-frequency uncertainty is conserved: it's just like the Heisenberg uncertainty principle from physics, and is in fact deeply connected to it.

A classic Fourier spectrogram makes one choice about time uncertainty vs. frequency uncertainty and uses it for all frequencies. You can think of it as dividing time-frequency space into little squares that work okay for most quantities of interest.

Wavelets alter that tradeoff. At low frequencies, we often care about small differences in frequency (e.g., 4 vs 5 Hz), while the precise location of the peaks (e.g., 10 vs 10.1 s) matters less because the signal is changing slowly anyway. The situation is reversed at high frequencies: 1 kHz vs 1.001 kHz is often physically irrelevant, but the timing matters because they're so short). We therefore divide the time-frequency space up so that the windows are long in time, but narrow in frequency at low frequencies, square in the middle, and short in time but wide in frequency at higher ones.

On top of that, wavelets recognize that sine waves are a mathematically convenient basis, but may not reflect your data, so you can also swap in a "mother wavelet" that better matches whatever phenomena you're studying.

One of the critically useful properties of wavelets, as I understand it, is that they are localized in both the original domain and the frequency domain.

A FT on an audio file, for example, will tell you how much of each frequency is present in that file. Each component is localized in frequency. A wavelet, however, is localized in both frequency and time (due to this decaying “hat” shape). When you scan this wavelet across the audio file and record the outputs you therefore learn not only which frequencies are present, but also their locations in the file.

This is the stuff I want to know more about, because I understand my analogies/mental model, but I have only crude/naive 'code mental model' of how these kinds of things are implemented

When you say:

    When you scan this wavelet across the audio file
Is that literal in code?

Apologies for the naive code analogies -- your comment makes me think of something like an XLA 'reduceWindow' operation (edit: or an OpenCL kernel), where a wavelet-sized 'window' is slid over the input, the difference is computed for each data point -- and then (waves hands: probably something smarter than just a) 'cumulative sum' to see how close/different is is.

(a) is that on the right track/what you meant by 'scanning the wavelet across?

(b) what about transforming the wavelet -- to different scales, etc?

This seems like the kind of thing that's actually like a search problem, heh, where you'd need to slide it over the input many many times, and where it could conceivably benefit from (waves hands) machine learning techniques as a result, to come up with something that does the search for a good wavelet encoding more efficiently based on the input.

... which, I guess, means I should just look around github for "wavelet compression" and see what code pops up, since that search for good encodings will be done in any of those.

--- Edit: this one seems simple enough to learn from, it's minimal, MIT licensed, audio-focused, written in Julia, and references the course material that it's based on:


It, uh ... like lots of scientific code, it has a lot of single-letter variables. :D But if one was to google up a few acronyms ('DWT'), install Julia, get it to run, and then re-implement it in another language, my feeling is that you might understand wavelets pretty good, no? Or be at least be well-equipped to understand more interesting uses of wavelets?

(I'm posting this because I might use some 'shop time' doing approximately that, so I would definitely appreciate any HN comments telling me 'consider learning from $some_other_example instead, for $some_reason.')

I think one of the major advantages of wavelets is the time-frequency representation of the signal, which means being able to see the different spectra of the signal change over time (at different scales) which is not easily attainable with Fourier transform. So you could detect transient interesting patterns (spatial frequencies) in the data that would otherwise go unnoticed. Same you could enhance or remove certain spatial frequencies of the signal depending on the task.

Nice to see some love for wavelets. They're great for many more things. Robust volatility measurement, galerkin methods for numerical solutions to differential equations in finance. Just please don't use a Haar wavelet for anything other than teaching.

Hear wavelets can be extremely useful when the signal of interest is square-shaped, or fast processing is required.

Can you use wavelets in analyzing behavioral data (clicks, decisions, etc.)?

Intuitively, wavelets, and SP in general, seem like something behavioral folks could benefit from, but I can't come up with any good use cases that would justify going through all the associated math.

Wavelets have also been used in the physical sciences. Here they are used to translate sea surface patterns into water depths: https://www.mdpi.com/2077-1312/8/10/772

>Nice to see some love for wavelets.

Yeah, I agree. Wavelets were all the rage in the late 80's and 90's and they seem to have fallen out of fashion.

As a matter of fact, It's kind of strange that applied math techniques be subject to fashion.

I think there is quite a lot to be done looking at deep nets in terms of wavelets for example.

Mathematicians are human just the same and are wont to get their hands on shiny toys too from time to time. There's some value in seeking out novelty.

Apropos deep nets, with the explosion in machine learning in the past few years, I've been seeing a lot of research interest statements change to meet that. In particular, there's an awful lot of numerical linear algebra being done now.

I suspect that things will come full-circle soon enough, and those tools developed in numerical linear algebra (via their connections to functional analysis) will make their way to harmonic analysis.

(This is notwithstanding the fact that compressed sensing is picking up a little momentum as a research area in applied mathematics and other disciplines that study signal processing. Wavelets, curvelets, shearlets, chirplets, etc. will likely see some action there too.)

It's rational to seek out novelty. What's more likely: that you'll be the first to discover a momentous consequence of a theorem published last week, or that you'll be the first to discover a momentous consequence of a theorem published by Euler?

Could you provide some examples of recent numerical linear algebra inspired by machine learning? The only numerical linear algebra related to machine learning, in particular deep learning, is matrix multiplication due to convolution. I am curious what else in numerical linear algebra is impacting machine learning.

> The only numerical linear algebra related to machine learning, in particular deep learning, is matrix multiplication

I suspect the OP is not talking about how deep nets are implemented, but rather how people are trying to understand how and why they work so well, or how to reverse-engineer knowledge out of a trained net, or how to train them faster, etc ...

In that space, you need quite a bit more than matrix multiplication.

I though Haar wavelets were good for text and other hard-edged images?

So, dumb question...

If you were training some deep learning model...

...should you be trying a few Wavelet transforms on your inputs, and feeding those in to your model, too, to see if your model performs better with wavelet inputs?

Instead of saying "I'm going to try this, maybe it will work", you should instead be asking if wavelet transforms are appropriate given the domain you are building a model for. Don't just transform data in the hopes that it will magically work.

>Don't just transform data in the hopes that it will magically work.

You’ve just described modern machine learning.

Do you know how we got penicillin? Alexander Fleming didn't keep a clean lab.

Do you know how we discovered X-Rays? Henri Becquerel realized his photographic plates had been darkened after being left in a drawer with uranium sulfate.

Do you know how electric guitar distortion was discovered? Willie Kizart dropped his Fender amp.

Worse things have happened than experimenting by throwing one more transform on your inputs before processing them.

Sure, and I understand this sentiment. But in practice/industry, it's best not to build/deploy models you don't understand. Edge cases in ML models that are fully automating business decisions can be pretty dangerous. I think the likelihood of a penicillin level discovery happening when I'm trying to train a model to make better marketing budget decisions for a company is quite low.

Or if you do, write down the results and publish them even if they’re negative. That’s how science is meant to work.

Hypothetically, if the computer could try both of these experiments at no cost to you, and tell you whether it improved things or not, does asking whether wavelets are appropriate for the domain even matter?

I guess the overarching question is...

How do you determine if they are good for your application and how do you choose which family of wavelets to apply?

The first few layer of CNNs are basically learned wavelets. I've had the thought to use basis functions, based on wavelets and/os Legendre polynomials, to perform the first few layers of image processing (more regular and don't need backprop). Haven't been in that space in a while though and don't have much time to mess with it.

CNNs have gotten so good though it seems a little moot.

Not in the field, but that seems to me a potentially interesting intuition.

it's almost universally known that the whole point of NNs is that they've obviated the need for feature engineering (for some tasks)

Right, exactly. This is manually engineering the lower features, which CNNs just learn.

The chief advantage of this technique is a) certain regular wavelets have performance benefits over convolutions, especially when talking low level/FPGA/ASIC space b) being not learned can be beneficial, as these is nothing to overfit.

I can see it being handy for embedded/rasppi like applications. In particular, you can festoon a crude face detector with a dozen Haar filters.

End to end training is just soooo convenient though.

It very much depends on your inputs, but it’s likely worth a shot. Note that wavelet transforms are generally much slower than fast Fourier transforms.

Not a dumb question at all. There has been some research on using a wavelet scattering transform as a feature extractor instead of learned convolutions in the first layers of deep neural networks. It works, but isn't as good, _given enough data_. For low-data regimes it makes sense to do this.

It's already a thing, check out Kymatio. Currently it underperforms compared to a standard CNN but it's an interesting avenue of research.

The referenced article has:

"Wavelets came about as a kind of update to an enormously useful mathematical technique known as the Fourier transform. In 1807, Joseph Fourier discovered that any periodic function — an equation whose values repeat cyclically — could be expressed as the sum of trigonometric functions like sine and cosine."

Due to the "periodic", that math is Fourier series, not the Fourier transform. Given almost any periodic function, the Fourier transform won't exist.

The math of Fourier theory is done carefully in two of the W. Rudin texts: Fourier series is in his Principles of Mathematical Analysis, and Fourier transforms is in his Real and Complex Analysis. For more, his Functional Analysis covers the related distributions. See also the "celebrated" Riesz–Fischer theorem, IIRC in his Real and Complex Analysis.

As I recall, wavelets have some good completeness properties.

Power spectra are of interest, and for that see the Wiener–Khinchin theorem. For the relevant statistics of estimating power spectra, see the work of Blackman and Tukey.

The referenced article also has

"That’s because Fourier transforms have a major limitation: They only supply information about the frequencies present in a signal, saying nothing about their timing or quantity."

Phase gives some information on timing, and the power spectra, on quantity.

The article also has

"Whenever you have a particularly good match, a mathematical operation between them known as the dot product becomes zero, or very close to it."

A dot product value of zero means orthogonality, that is, in the usual senses, neither signal is useful for saying any much about the other.

Of course, in the case of stochastic processes, the orthogonality of a dot product (inner product) is not the same as probabilistic independence.

For computations, of course, see the many versions of the fast Fourier transform, in response to a question from R. Garwin, rediscovered by Tukey, first programmed by Cooley, later given various developments and many applications.

At one point early in my career, these topics got the company I was working for "sole source" on a US Navy contract and got me a nice, new high end Camaro, a nice violin, a nice piano for my wife, some good French food, a sack full of nice Nikon camera equipment, and a nice stock market account! My annual salary was 6+ the cost of the Camaro.

Wavelets are like localized Fourier transforms. Super useful in some compression algorithms.

I agree that the discrete wavelet transform is analogous to the discrete Fourier transform, but they are calculated differently, and present very different information in a very different way. I think most people find DWT results much more confusing than FFT results.

If you're very familiar with both of them, it can be easy to underestimate how confusing each is to a newcomer.

They are very similar to a short term Fourier transform (STFT). Both of which permit one to analyze the spectrum of signal over time

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact