

Why do we Sometimes get Nonsense-Correlations between Time-Series? - ahalan
http://www.math.mcgill.ca/dstephens/OldCourses/204-2008/Handouts/Yule1926.pdf

======
EvanMiller
The money quote of this article is:

"I propose to term such correlations... the _serial correlations_ for the
given series."

The basic idea here is that observations in a time series are not actually
independent: each observation is typically highly correlated with the previous
one, so the usual significance tests and standard errors do not apply.
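
A rough way to see this is to simulate it. The sketch below is not from Yule's paper; it is a minimal illustration that assumes Gaussian steps, series of length 100, and the usual large-sample 5% cutoff of about 1.96/sqrt(n) for a correlation coefficient. The helper names (`pearson`, `white_noise`, `random_walk`, `false_alarm_rate`) are made up for this example. It generates pairs of series that are independent of each other by construction and counts how often the naive test calls their correlation "significant".

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def white_noise(n, rng):
    """n independent Gaussian draws: no serial correlation."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Running sum of Gaussian steps: each value is close to the previous one."""
    total, walk = 0.0, []
    for step in white_noise(n, rng):
        total += step
        walk.append(total)
    return walk


def false_alarm_rate(make_series, n=100, trials=2000, seed=0):
    """Fraction of independent pairs whose |r| exceeds the naive 5% cutoff."""
    rng = random.Random(seed)
    cutoff = 1.96 / math.sqrt(n)  # assumes independent observations
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) > cutoff:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("white noise :", false_alarm_rate(white_noise))
    print("random walks:", false_alarm_rate(random_walk))
```

For white noise the rate should land near the nominal 5%, while for random walks the same test should flag most pairs, even though every pair is unrelated by construction.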

Some links if you're interested in learning more about analyzing serial
correlation and time-series data:

<http://en.wikipedia.org/wiki/Autoregressive_model>

<http://en.wikipedia.org/wiki/Newey–West_estimator>

<http://en.wikipedia.org/wiki/Prais-Winsten_transformation>

------
andrewcooke
i think the argument being made is that if you sample a continuous signal at a
frequency higher than where most of the power in the signal's spectrum lies,
then those samples are not independent. so standard statistical tests that
assume independent measurements overestimate significance.

so if you have two smooth, continuous signals, observed over a relatively
short time (compared to the timescale of the underlying process that is
generating them), then you should simply ask whether they both slope in the
same general direction. there's a 50:50 chance that both go up (or both go
down) rather than one going one way and one the other, so both sloping the
same way is not terribly significant. and that doesn't change even if you
sample like crazy and generate lots and lots of points, which appear to show
a hugely significant correlation.
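
a small sketch of the "sample like crazy" point, under assumptions of my own choosing: `signal_a` and `signal_b` are two hypothetical smooth curves that both happen to rise over the window [0, 1], and `naive_t` is the textbook t statistic for a correlation coefficient, which assumes independent samples. the correlation barely moves as the number of samples grows, but the apparent significance keeps climbing.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_t(r, n):
    """t statistic for r that (wrongly here) treats the n samples as independent."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def signal_a(t):
    # hypothetical smooth signal, rising over [0, 1]
    return math.sin(t)


def signal_b(t):
    # a second hypothetical smooth signal, also rising over [0, 1]
    return t * t


def sample(signal, n):
    """n evenly spaced samples of signal over the window [0, 1]."""
    return [signal(i / (n - 1)) for i in range(n)]


def compare(n):
    """Correlation and naive t statistic when both signals are sampled n times."""
    r = pearson(sample(signal_a, n), sample(signal_b, n))
    return r, naive_t(r, n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        r, t = compare(n)
        print(f"n={n:5d}  r={r:.3f}  naive t={t:.1f}")
```

the extra points add no new information about whether the two curves are related, yet the naive statistic grows roughly like the square root of the sample count.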

[edit as i slowly grok this better] more generally, the correlation
coefficient isn't a good tool for comparing signals. it should be used for
comparing random samples from populations (a signal is not a population). and
i don't think people use it to compare signals these days. so i guess this
paper won out.

but i may have missed something, or be simply wrong, because this was
published nearly 100 years after fourier died, yet when i scanned it i saw
nothing that mentioned fourier analysis, which seems like an obvious way (see
above) to phrase this (but i may be biased, since i guess fourier analysis
boomed once machines existed to compute ffts).

------
greenyoda
Note: 63-page PDF of a mathematical paper published in 1926.

