
Avoiding a Common Mistake with Time Series - davidkellis
http://svds.com/post/avoiding-common-mistake-time-series
======
foobarian
I'm not sure. The author says that adding a common component to two random
time series doesn't make them correlated. But that's not true, by
construction, at least using any of the simple correlation tests. It's a
complicated subject explained in a confusing way.

~~~
Leszek
I don't get it either. The author is saying that adding a common dependency to
two independent series makes them correlated, which to me seems trivially
true. He
then goes on to say that this form of correlation is uninteresting, because
we're only interested in whether variations are correlated, which doesn't seem
so trivially true: after all, a linear trend is a first-order variation, and
if two things both increase at the same rate, then their relationship is worth
at least a second look.

It looks like what the author is really trying to say is that we should pass
the data through a high-pass filter, eliminating any 'expected' trends such as
inflation, and instead observe whether the noise of the two datasets is
correlated. This is an observation that has some value, but it is certainly
not trivial to pick a cutoff for the high-pass filter (it's certainly not
always just a linear trend), and the mutually dependent variable can have as
much noise (if not more) as the two measured data sources, so you might still
get "false" correlation.

~~~
arrrg
It seems to me that in the end it all comes down to your actual hypothesis:
whether, and what, is actually interesting …

------
photon137
What the author is trying to explain are the concepts of cointegration and
stationarity. A useful introduction here:
[http://www.uta.edu/faculty/crowder/papers/drunk%20and%20dog.pdf](http://www.uta.edu/faculty/crowder/papers/drunk%20and%20dog.pdf)

~~~
lazzlazzlazz
Great article, thanks very much for posting it. Nobody else seems to have
captured the relevant terminology: "cointegration" and "stationarity". Those
perfectly captured it for me.

------
omalleyt
Never use first differencing; that's crazy, since it magnifies measurement
error. Pass the signal through a high-pass filter to remove the trend.
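
A quick stdlib sketch of the "magnifies measurement error" claim: if
x_t = s_t + e_t with white measurement noise e_t, then the differenced noise
e_t - e_{t-1} has twice the variance of e_t.

```python
import random
from statistics import variance

random.seed(1)
e = [random.gauss(0, 1) for _ in range(100_000)]  # measurement noise, var ~1
de = [e[t] - e[t - 1] for t in range(1, len(e))]  # noise after differencing

print(variance(e))   # ~1
print(variance(de))  # ~2: differencing doubled the noise power
```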

~~~
radarsat1
Or simply subtract a first-order (linear) fit. (What the MATLAB/SciPy
`detrend()` function does.)

Edit: I'll note that this is the same thing as subtracting the output of a
very long-windowed low-pass filter, i.e., performing high-pass filtering.
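
That equivalence can be checked numerically with a stdlib-only sketch (the
window length and test signals are chosen arbitrarily): subtracting a
long-windowed moving average, a crude low-pass filter, from a
trend-plus-oscillation signal recovers the fast component, i.e. it acts as a
high-pass filter.

```python
import math

def moving_average(x, w):
    """Centered moving average; the window shrinks at the edges."""
    out = []
    for t in range(len(x)):
        win = x[max(0, t - w // 2): t + w // 2 + 1]
        out.append(sum(win) / len(win))
    return out

n = 400
slow = [0.1 * t for t in range(n)]                        # slow linear trend
fast = [math.sin(2 * math.pi * t / 8) for t in range(n)]  # fast oscillation
x = [s + f for s, f in zip(slow, fast)]

lowpass = moving_average(x, 64)
highpassed = [v - l for v, l in zip(x, lowpass)]

# Away from the edges, the high-passed signal tracks the fast component
err = max(abs(h - f) for h, f in zip(highpassed[64:-64], fast[64:-64]))
print(err)  # small relative to the oscillation amplitude of 1
```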

------
bbrazil
Something I've repeatedly found useful is that when debugging and you have a
conjecture, not only look for evidence that a correlation/causation is
present; but also look for evidence that it isn't.

Doing a very quick A/B test helps too.

~~~
jerf
"Something I've repeatedly found useful is that when debugging and you have a
conjecture, not only look for evidence that a correlation/causation is
present; but also look for evidence that it isn't."

In my opinion, the absolute, utter core essence of science can be expressed
simply as "Always be trying to prove yourself wrong." The human brain is
extremely biased in the other direction, and it's _darned good_ at proving
itself correct. It can prove itself correct in absurdly powerful ways. Always
be fighting it, always be looking for ways to prove yourself wrong. If you do
it seriously, things like "the scientific method" will naturally fall out of
your serious attempts and need not be surrounded by near-worship, whereas no
amount of worshipfully-following a checklist of the "scientific method"
without the true effort to prove yourself wrong will produce truth; the human
brain is far more powerful than the "scientific method" or any such static
methodology.

I have deliberately left the word 'theory' out of this post. This scientific
mindset extends beyond that into engineering and any number of day-to-day
activities.

~~~
drcomputer
Try to prove yourself wrong about believing it is better to be wrong all the
time.

Insanity!

I agree with you for the most part. But there's a line. When it starts to
destroy your personal identity and sense of self, you've crossed that line.
Sometimes it's just better to be right and trust yourself.

Or not. That's zen, I think. Trying to have so much information about
everything that you wind up overloading yourself with it and can't make heads
or tails of it, because in every simple truth exists an infinite proof.
Analysis paralysis!

~~~
jerf
Despite superficially resembling some self-referential paradoxes, if the
heuristic can stand up to itself, it's not a paradox. You try to prove it
wrong, and you fail, so you keep it, which, lo, is exactly what I've done. (Or
perhaps more precisely, while it is not a perfect fit and I could quibble with
details myself, it is the _best_ one-sentence summary of true science in my
opinion, and I'm familiar with most of the usual ones, and aware that I've
only seen this variant in one or two other places even so.) No paradox, no
contradiction, no insanity.

Alas, if you're looking for an excuse to tie either yourself or me up in some
sophomoric philosophical conundrum, this doesn't do it. But there's no lack of
such things if you look, so don't be disappointed. Keep on trucking. May I
suggest _Godel, Escher, Bach: An Eternal Golden Braid_ if you would like the
industrial-strength rope, err, braid to tie yourself in?

~~~
drcomputer
I understand where you are coming from, but I'm not trying to engage you in a
'sophomoric' debate. I came to the existential philosophical argument from
mathematics and the computer sciences, not the other way around. I started my
interest in computing science through automating proof (mathematical and
computational). I doubt you care, but it's more about looking at the big
picture of software development versus the little picture. You can't build
anything if you keep testing the first thing for every bug, including
everything it was built with, including all the mathematics used to reason
about it.

But I'm sure there are two ways to respond to my comment: pretend you know
what you are talking about, or admit you could be wrong and care to engage me
in a real conversation. I've been down the same road hundreds of times on the
internet, and it is hard to find people to talk to about things and meet in
some kind of scientific, logical middle. But if that's not your thing, it
doesn't bother me if all you came here for was to prove yourself right about
the noble ethics of your science. At least I'm no longer trapping my mind in a
paradox of its own creation. (Or am I? I never really know.)

Logic is very different from science, that's all I have to say. Rigor is for
precision, not discovery.

~~~
jerf
"I understand where you are coming from, but I'm not trying to engage you in a
'sophomoric' debate."

Considering the entire rest of your message consists of you essentially trying
to lord over me whatever superiority you seem to think you have, and not
missing an opportunity to slide a snide comment in, which I'd also observe is
consistent with your original post, I've come to the conclusion that the rest
of your post belies this claim. You're being abrasive and abusive while trying
to pretend to be so much more open minded.

No sale. I recognize that tactic and refuse to engage with it. I'm sticking
with my original assessment; you're being too sophomoric to engage with, and I
reject your attempts to psychology me into engaging with you in the
presupposed frame of your superiority.

~~~
drcomputer
Apologies, I would not interpret receiving my post back to me in the same way
(I do not think, I usually read my posts back to myself and pretend I am
talking to myself to try to understand how another person may take it, which I
realize is fairly biased given my potentially unique perspective on reality).
Human socialization is very difficult for me.

I don't like psychology.

------
esfandia
Isn't the author exaggerating in the other direction? There is obviously
correlation between the two time series. Sure, who's saying there is causation
(as mentioned in the article, there can be a third random variable that the
first two depend on)? But also, who's to say _there's no causation_? Is it
ok to always remove the correlated part of the two time series? What if that's
the interesting part and the explanation you're looking for?

~~~
photon137
Even if you're looking for causation, detrending is usually necessary to
obtain consistent estimators (in statistical terms). For example, Granger
causality works with two stationary time series:

[https://en.wikipedia.org/wiki/Granger_causality](https://en.wikipedia.org/wiki/Granger_causality)

------
n00b101
This is called spurious correlation. It's well known in financial / economic
time-series analysis. The lesson is that you never measure the correlation
between the PRICE LEVELS of products; instead you measure the correlation
between the daily/weekly/etc. CHANGES IN PRICE LEVELS.

A famous example of this:

The tale of David Leinweber, which is related in the excellent new book
"Quantitative Value," illustrates this point about "stupid data miner tricks."
Leinweber sifted through a United Nations CD covering the economic data of 140
countries. He found that butter production in Bangladesh explained 75 percent
of the variation of the S&P 500 Index. Not satisfied, he found that if he
added a broader category of global dairy products, the correlation would rise
to 95 percent. Then he added a third variable, the population of sheep, and
found that he had now explained 99 percent of the variation in the S&P 500 for
the period 1983-'99.

([http://www.cbsnews.com/news/what-butter-production-means-for-your-portfolio/](http://www.cbsnews.com/news/what-butter-production-means-for-your-portfolio/))

------
SixSigma
I ordered the book Quantitative Forecasting Methods by Farnum and Stanton
(PWS-KENT, 1989). It was only £2.81; sounds like money well spent.

~~~
awch
I got the last "cheap" one, at $3.94.

~~~
SixSigma
Hmm, my purchase got cancelled and now they are all 10x the price.

I reckon they had a look at the competition before putting it in the envelope.

------
nemo44x
Does this mean that if I apply this procedure and two or more time series are
still similar, they are in fact correlated? I find this test fascinating.

------
eouw0o83hf
Wow, and the graphs on
[http://www.tylervigen.com/](http://www.tylervigen.com/) are just incredible.

------
plg
also: statistical tests on correlation coefficients don't test whether the
correlation is "significant" in any practical sense --- they only test whether
the correlation is reliably different from 0.00.

So a small correlation (e.g. r=0.10) can still be "statistically significant"
at p<0.001, but all this means is that r is reliably different from 0.00 ---
it doesn't mean r is big.
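
In numbers: the usual test statistic for r against zero is
t = r * sqrt((n - 2) / (1 - r^2)), so a fixed small r becomes arbitrarily
"significant" as n grows.

```python
import math

def t_stat(r, n):
    """t-statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

print(t_stat(0.10, 30))      # ~0.53: nowhere near significant
print(t_stat(0.10, 10_000))  # ~10: highly "significant", yet r is still 0.10
```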

------
princeb
perform a Dickey-Fuller test if you are unsure whether a time series is
nonstationary, perhaps.

------
gengkev
where are the scrollbars??

