
How to Linear-Fit a Noisy Signal with Regular Discontinuities - jgforbes
https://www.jforbes.io/linear-fit-regular-discontinuities
======
jofer
This particular problem pops up in quite a few domains. (We often refer to it
as "phase unwrapping" in the geosciences.) The approach here is a good one so
long as your noise doesn't result in lots of mistaken "wrap-arounds".

However, it fails badly in the presence of noise in many cases. It's
particularly problematic when the slope changes (e.g. a polynomial) or where
both the slope and the noise are high. (Note that polynomials are still
linear in the sense mentioned here: linear in the parameters, so it's still
linear regression.)

At any rate, this is definitely a nice write-up, but a bit more discussion of
where the approach breaks down would be useful. It's actually a classic
example of an elegant solution that breaks down frequently in practice (i.e.
it's commonly used as a teaching example in various courses). A better
solution is usually more complex, domain specific, and therefore out-of-scope,
but failure modes for this method make for a nice set of examples.
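As a concrete failure mode, here's a sketch using NumPy's `np.unwrap` (a standard 1-D implementation of the wrap-around correction; the post's exact method may differ, and the slope and noise level below are made up for illustration). With low noise the line is recovered exactly; once per-sample noise approaches half the wrap period, spurious ±2π corrections creep in:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
true_phase = 3.0 * t  # the underlying affine signal (slope 3 rad/s)

# Clean case: consecutive differences stay below pi, so unwrap is exact.
clean_wrapped = np.mod(true_phase, 2 * np.pi)
assert np.allclose(np.unwrap(clean_wrapped), true_phase)

# Noisy case: noise on the order of pi makes consecutive samples jump
# by more than pi, so unwrap inserts spurious +/- 2*pi corrections.
noise = rng.normal(0.0, 1.5, t.size)
noisy_wrapped = np.mod(true_phase + noise, 2 * np.pi)
recovered = np.unwrap(noisy_wrapped)

# The pointwise error is always an integer multiple of 2*pi; with this
# much noise, at least one spurious wrap is essentially guaranteed.
error = recovered - (true_phase + noise)
spurious_wraps = np.round(np.abs(error) / (2 * np.pi)).max()
```

Once one of those mistaken corrections happens, every later sample inherits the 2π offset, which is exactly why a single noisy point can wreck the whole fit.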

~~~
jgforbes
Hey, I’m the author of the post.

You’re completely right. I investigated this as a means of solving the phase-
unwrapping problem I was working on, and while it worked relatively well, a
more domain-specific solution was eventually used.

I purposely stayed away from mentioning phase unwrapping because I was trying
to make this as accessible as possible without overloading the reader with
jargon. My goal was more to show how transforming a problem (e.g. into the
frequency domain) can sometimes make hard problems far simpler (I was also
just playing with data visualization). Looking at it now, though, I probably
should have added some external resources to the conclusion for people who
have the background. It wouldn’t have made the piece less readable, and could
have added a bit more value.

~~~
jgforbes
Also, and this is a major oversight on my part - I was specifically looking at
fitting data generated by an affine function, not “linearly fitting data”. How
I titled this is definitely confusing.

Part of what interested me in writing about this though is how the
discontinuities changed a trivial problem into something a bit trickier. If
the data could be generated by more complex functions, then I would have
forgone looking for an easy solution (as an aside, the problem I was working
on had sharp timing and hardware constraints which kept me from using a more
general solution).

------
GChevalier
I would have instead done it like this:

(tl;dr: fit a sine and a cosine by regression, much like a linear regression.
Think of it as solving for a free frequency and a free phase offset instead
of a free bias and weight.)

1. Convert the hours to an angle in degrees or radians (a simple linear
transformation).

2. Take the cos and sin of the angle to get the x and y positions in a plane,
respectively.

3. Introduce a time axis so that the points trace out a helix (like DNA)
rather than a circle.

4. We now have a set of 3D data points: (time, x, y). Create an ML model that
fits a sine and a cosine to those points. The model has only two free
parameters to optimize: a shared phase offset and a shared frequency. The
sine uses (time, y) and the cosine uses (time, x).

5. Initialize the model with a random phase offset and a frequency ideally
already close to the one you think you have. Don't initialize with too high a
frequency, to avoid fitting noise near the Nyquist frequency.

6. Optimize! (With least squares.) You might converge only to a local minimum
and need to try different random starting frequencies if you fail to
converge.

7. The answer to your problem is the now-optimized frequency parameter. It
won't sit between two bins of your FFT anymore.
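A minimal sketch of the steps above, using `scipy.optimize.least_squares` (the frequency, phase offset, noise level, and starting guess are all made-up values for illustration):

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Steps 1-3: synthetic data whose angle grows linearly with time,
# projected onto (x, y) so the points trace a helix along the t axis.
true_freq, true_phase = 0.31, 1.2  # assumed ground truth (illustrative)
t = np.linspace(0.0, 30.0, 400)
angle = 2 * np.pi * true_freq * t + true_phase
x = np.cos(angle) + rng.normal(0.0, 0.1, t.size)
y = np.sin(angle) + rng.normal(0.0, 0.1, t.size)

# Step 4: a model with only two free parameters, a shared frequency
# and a shared phase offset; the cosine fits x, the sine fits y.
def residuals(params):
    freq, phase = params
    theta = 2 * np.pi * freq * t + phase
    return np.concatenate([np.cos(theta) - x, np.sin(theta) - y])

# Steps 5-6: start near the frequency you expect, then least-squares.
fit = least_squares(residuals, x0=[0.3, 0.0])

# Step 7: the optimized frequency is the answer, not tied to FFT bins.
freq_est, phase_est = fit.x
```

As the commenter warns, starting too far from the true frequency lands in one of the local minima spaced roughly 1/T apart in frequency, so restarts from different initial frequencies may be needed.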

Related: [https://stackoverflow.com/questions/16716302/how-do-i-fit-a-...](https://stackoverflow.com/questions/16716302/how-do-i-fit-a-sine-curve-to-my-data-with-pylab-and-numpy)

Note: the link contains images illustrating the transformations I'm trying to
explain.

Disclaimer: I haven't actually done this yet; it's just off the top of my
head. If I said something wrong, please comment, most likely about wrongly
converging to the Nyquist frequency or something like that (?).

In the end, this way, you won't have discrete FFT bins. You'll approach the
problem orthogonally: you solve directly for the single best frequency rather
than picking the best FFT bin.

In other words: solve for the contents of the exponent of "e" as free
parameters, with one frequency and one phase offset instead of many bins.

~~~
GChevalier
Also, I forgot: to improve convergence, I'd use a Hann-Poisson window, such
as the one described here:
[https://en.wikipedia.org/wiki/Window_function#Hann%E2%80%93P...](https://en.wikipedia.org/wiki/Window_function#Hann%E2%80%93Poisson_window)

I'd apply the window to randomly sampled mini-batches of consecutive points
instead of optimizing the model on isolated randomly sampled points or on the
whole dataset at once. My guess is that a Hann-Poisson window makes the
gradient valley easier to "ski down" with gradient descent, which is a greedy
algorithm: the spectral leakage the window introduces should make the
gradient landscape decrease more monotonically toward the global minimum.
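Since `scipy.signal.windows` doesn't ship a Hann-Poisson window directly, here's a sketch built from the Wikipedia formula (a Hann window times a two-sided exponential decay; the `alpha=2.0` decay rate and batch size are just illustrative defaults):

```python
import numpy as np

def hann_poisson(m, alpha=2.0):
    """Hann-Poisson window of length m, per the Wikipedia definition:
    w(n) = 0.5*(1 - cos(2*pi*n/N)) * exp(-alpha*|N - 2n|/N), N = m - 1."""
    n = np.arange(m)
    big_n = m - 1
    hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / big_n))
    poisson = np.exp(-alpha * np.abs(big_n - 2 * n) / big_n)
    return hann * poisson

# Applying it as per-sample weights on a mini-batch of consecutive
# points (hypothetical batch, just to show the shape of the idea):
batch = np.random.default_rng(2).normal(size=64)
weighted = batch * hann_poisson(64)
```

The window is zero at both edges, peaks at 1 in the middle, and is symmetric, so each mini-batch emphasizes its central samples and tapers the boundary ones.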

