
Everything Is Correlated - KenanSulayman
https://www.gwern.net/Everything
======
ajuc
When a correlation is close to 0, it's often because of a feedback loop.

For example - in an economy with a central bank trying to hit an inflation
target - interest rates and inflation will have near-zero correlation
(interest rates change but inflation remains constant). That's because the
central bank adjusts interest rates to counter other variables so that
inflation stays near the target.

Another example (my favorite; it was mindblowing when my teacher showed it to
us in econometrics as a warning :) ) - the gas pedal and the speed of a car
driving on a hilly road. The driver wants to drive near the speed limit, so he
adjusts the gas pedal to keep the speed constant. The simplistic conclusion
would be: the speed is constant despite the gas pedal position changing,
therefore they are unrelated :)
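
A minimal sketch of this in Python (all the coefficients and noise levels are
invented for illustration; any values where the driver fully compensates for
the hills give the same picture):

    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000

    grade = rng.normal(0, 1, n)             # hill steepness at each moment
    pedal = 0.5 + 0.3 * grade               # the driver presses harder uphill
    # the pedal term (2.0 * 0.3) exactly cancels the hill term (0.6), so the
    # speed stays pinned near the limit regardless of the terrain
    speed = 50.0 + 2.0 * pedal - 0.6 * grade + rng.normal(0, 0.1, n)

    print(np.corrcoef(pedal, speed)[0, 1])  # ~0, though pedal drives speed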

~~~
mojomark
Good discussion. On the flip side, in my data mining class the professor keeps
saying ~"you may be able to find clusters in a data set, but often no true
correlation exists." However, that's an absolute statement I just don't
swallow. In my mind, if an unexplained correlation or non-correlation appears,
it may be random (or true), or it could be the result of an unmeasured
(hidden) variable. In your two examples, you're simply pointing out two
respective hidden variables that weren't accounted for in the original
analysis.

I think any data analysis should always be caveated with the understanding
that there may be hidden variables shrouding or perhaps enhancing correlations
- from economics to quantum mechanics. It's up to the reviewer of the results
to determine, subjectively or by using a standard measure, whether the level
of rigor involved in data collection & analysis sufficiently models reality.

~~~
dalore
Perhaps they are trying to explain the clustering illusion? The phenomenon
that even random data will produce clusters. You can take that further and
state that random data WILL produce clusters: if you don't have clusters, then
your data is not random and some pattern is at play.

This really trips up our minds, since our minds try to find patterns
everywhere. If you try to plot random dots by hand, you will usually place the
dots without clusters. A truly random plot will have clusters.

[https://en.wikipedia.org/wiki/Clustering_illusion](https://en.wikipedia.org/wiki/Clustering_illusion)
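
You can see this for yourself with a quick simulation (the grid size and point
count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    # 200 genuinely random points in a 10x10 square
    pts = rng.uniform(0, 10, size=(200, 2))

    # count points per unit cell; uniform randomness still clumps
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1],
                                  bins=10, range=[[0, 10], [0, 10]])
    print(counts.max(), counts.min())  # expect ~2 per cell on average, yet
                                       # some cells hold 6+ and others are empty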

Edit: Note your professor said "often", which means they did not make an
absolute statement.

~~~
AstralStorm
Ipso facto, all "natural" variables are related to a bounded random walk,
which produces clusters (a Markovian process), or otherwise have complex
chaotic (e.g. fractal) dynamics, which also produce clusters. This follows
from physics.

Both maximum entropy and zero entropy are very rare states to observe.

~~~
ChainOfFools
Does this imply that the universe somehow rewards structures that engender
'compressibility' (coarse-graining)? It does seem like our brains subjectively
enjoy identifying it, to the point of over-optimization in the form of
phenomena like pareidolia.

~~~
throwawaymath
The universe doesn’t “reward” it so much as it’s just a consequence of random
events. For example, if you flip a coin many times, you’ll see long runs of
heads. From the central limit theorem it follows that sufficiently many
random events will form a normal distribution, which exhibits clustering.
Take a look at a Galton board in action.
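
For instance (a quick simulation; the flip counts are arbitrary):

    import numpy as np

    rng = np.random.default_rng(7)
    flips = rng.integers(0, 2, 1000)        # 1,000 fair coin flips

    # longest run of consecutive heads
    longest = current = 0
    for f in flips:
        current = current + 1 if f == 1 else 0
        longest = max(longest, current)
    print(longest)  # usually ~9-10: long streaks are expected, not anomalous

    # Galton board: each ball bounces left/right 100 times; the bin counts
    # follow a binomial, i.e. approximately normal, distribution
    final_bins = rng.integers(0, 2, size=(10_000, 100)).sum(axis=1)
    print(np.bincount(final_bins).argmax())  # mode near 50, thin tails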

~~~
AstralStorm
That ignores everything about the actual world we observe, and Gaussian-
distributed data does not have to exhibit clustering either (though it allows
it).

About the only thing that is naturally uniform so far, within bounds, is the
large-scale homogeneity and isotropy of the universe, which is an unsolved
mystery potentially involving dark matter.

~~~
dalore
I would argue that if it didn't exhibit clustering, then some sort of
pattern/bias was at play that caused it.

------
limbicsystem
It is true that, as Fisher points out, with enough samples you are almost
guaranteed to reject the null hypothesis. That's why we tell students to
consider both p values (which you could think of as a form of quality control
on the dataset) and variance explained. Loftus and Loftus make the point
nicely: p tells you if you have enough samples and any effect to consider;
variance explained tells you if it's worth pursuing. Both are useful guides to
a thoughtful analysis. In addition, I'd make a case for thinking about the
scientific significance and importance of the hypothesis and the Bayesian
prior. And to put a positive spin on this, given how easy it is to get small p
values, big ones are pretty much a red flag to stop the analysis and go and do
something more productive instead.
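
The contrast is easy to demonstrate (a toy example with an invented, tiny but
real effect):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 1_000_000
    x = rng.normal(size=n)
    y = 0.01 * x + rng.normal(size=n)   # a real but negligible effect

    r, p = stats.pearsonr(x, y)
    print(f"p   = {p:.1e}")     # astronomically "significant"
    print(f"R^2 = {r**2:.1e}")  # ~1e-4: x explains ~0.01% of the variance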

~~~
nonbel
> "It is true that, as Fisher points out, with enough samples you are almost
> guaranteed to reject the null hypothesis. "

Where does Fisher point this out?

> "That's why we tell students to consider both p values (which you could
> think of as a form of quality control on the dataset)"

How is this "quality control"? It just tells you whether your sample size was
large enough to pass an arbitrary threshold...

~~~
gwern
> Where does Fisher point this out?

Probably in the Fisher excerpt.

~~~
nonbel
I looked but did not see it.

------
ivan_ah
Agree that NHST using a simple null hypothesis of the form

    
    
       H0:  μ = 0
    

doesn't provide much value. H0 is never true, and the conclusion of "rejecting
H0" based on a p-value is therefore not super profound. Also, the "rejecting
H0" conclusion doesn't really tell us anything about the alternative
hypothesis HA (which isn't even considered when computing the p-value, since
the p-value is computed under H0). Dichotomies in general are bad, but NHST
with a point H0 is useless!

However, a composite hypothesis setup of the form

    
    
       H0:  μ ≤ 0
       HA:  μ > 0
    

is probabilistically sound (inasmuch as some journal requires you to report a
p-value). Much better to report an effect size estimate and/or a CI.
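
For concreteness, a sketch of the composite version in Python (the sample
itself is simulated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    sample = rng.normal(loc=0.2, scale=1.0, size=50)

    # one-sided test of H0: mu <= 0 against HA: mu > 0
    t, p = stats.ttest_1samp(sample, popmean=0.0, alternative="greater")
    print(f"t = {t:.2f}, one-sided p = {p:.3f}")

    # and, better, the effect size estimate with a 95% CI
    m = sample.mean()
    se = stats.sem(sample)
    lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=m, scale=se)
    print(f"mean = {m:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")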

~~~
mjfl
Couldn't you make an argument that the point H0 has a use when you are testing
whether two populations are identical? I.e., it's probably true that μ is very
close to 0 if it is the difference in mean heights of men from Nebraska vs.
men from Iowa.

~~~
rwj
You've kind of hit the point with the second half of your comment. Two
populations are virtually never identical, so you don't need any statistics to
answer the question. A more reasonable question is whether or not you have the
statistical power (i.e. measurement precision) to see the difference, and
whether the difference is big enough to matter.

------
JohnJamesRambo
This reminds me of the current omnigenic hypothesis about genes: that,
unexpectedly, almost every gene seems to affect the expression of traits.

[https://www.quantamagazine.org/omnigenic-model-suggests-that-all-genes-affect-every-complex-trait-20180620/](https://www.quantamagazine.org/omnigenic-model-suggests-that-all-genes-affect-every-complex-trait-20180620/)

"Drawing on GWAS analyses of three diseases, they concluded that in the cell
types that are relevant to a disease, it appears that not 15, not 100, but
essentially all genes contribute to the condition. The authors suggested that
for some traits, “multiple” loci could mean more than 100,000."

~~~
nonbel
That is just a special case of the "everything is correlated" principle.

------
RosanaAnaDana
I think a major issue here is that, perhaps, there is a tendency to want to
use statistics to decide what the 'truth' is, because it takes the onus of
responsibility for making a mistake away from the interpreter. It's nice to be
able to stand behind a p-value and not be accountable for whatever argument is
being made. But the issue here is that almost any argument can be made in a
large enough dataset, and a careful analyst will find significance.

This is of course the case only if one does not venture far from the principal
assumptions of frequentism, most of which are violated in almost every setting
except pure random number generation and fundamental quantum physics.

So a central issue that isn't addressed in STATS101-level hypothesis testing
is the impact that the question has on the result. It's almost inevitable that
people want to interpret a failure to reject as a positive result. But a
p-value really doesn't tell you whether you have a useful result; rather, it
tells you whether your sample size is big enough to detect a difference.

Statistical significance is something that can be calculated. Practical
significance is something that needs to be interpreted.

------
anthony_doan
I think this article is trying to tie two things together: the p-value problem
and the fact that you can throw in more data.

I disagree.

It's cheating; it goes against experimental design analysis, and it does not
differentiate between given data and data that was carefully collected. We
have experimental design classes for a reason: they help us be honest. Of
course, there are tons of pitfalls a novice statistician can fall into.

It also implicitly leads people to think that statistics can magically handle
given data and big data with the old-fashioned statistical methods. If you do
that, then of course you'll get a good p-value.

~~~
gwern
> It's cheating; it goes against experimental design analysis, and it does
> not differentiate between given data and data that was carefully collected.
> We have experimental design classes for a reason: they help us be honest.
> Of course, there are tons of pitfalls a novice statistician can fall into.

Explicit sequential testing runs into exactly the same problem. The problem
is, the null hypothesis is not true. So no matter whether you use fixed
(large) sample sizes or adaptive procedures which can terminate early while
still preserving (the irrelevant) nominal false-positive error rates, you will
at some sample size reject the null as your power approaches 100%.
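
To make that concrete, here is a sketch of the power calculation for a
two-sided one-sample z-test where the true effect is a "negligible" d = 0.01
rather than exactly zero (the numbers are purely illustrative):

    import numpy as np
    from scipy import stats

    d, alpha = 0.01, 0.05          # tiny true effect, conventional threshold
    z_crit = stats.norm.ppf(1 - alpha / 2)
    for n in (100, 10_000, 1_000_000, 100_000_000):
        shift = d * np.sqrt(n)     # noncentrality grows with sample size
        power = stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)
        print(f"n = {n:>11,}: power = {power:.3f}")
    # power: 0.051, 0.170, 1.000, 1.000 -- rejection becomes guaranteed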

~~~
nonbel
This is mostly right, but you are still thinking of these rejections as "false
positives" for some reason. They are real deviations from the null hypothesis
("true positives"). The problem is that the user didn't test the null model
they wanted; it is 100% user error.

~~~
dwaltrip
Can you explain that last sentence? What is a valid null model if everything
is correlated?

~~~
nonbel
A model of whatever process you think generated the data.

EDIT:

I guess I should say that the concept of testing a "null model" without
interpreting the fit relative to other models is wrong to begin with. You need
to use Bayes' rule and determine:

    
    
      p(H[0]|D) = p(H[0])·p(D|H[0]) / Σ[i=0..n] p(H[i])·p(D|H[i])
    

Lots of stuff is wrong with what has been standard stats for the last 70
years; it literally amounts to stringing together a bunch of fallacies and
makes no sense at all.
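
For concreteness, a toy numeric version of that formula (the observations and
the three candidate means are invented):

    import numpy as np
    from scipy import stats

    data = np.array([0.8, 1.1, 0.4, 1.3, 0.9])     # invented observations
    mus = {"H0: mu=0": 0.0, "H1: mu=1": 1.0, "H2: mu=2": 2.0}
    prior = {h: 1 / len(mus) for h in mus}         # uniform prior

    # likelihood of the data under each hypothesis (unit-variance normal)
    like = {h: stats.norm.pdf(data, loc=mu, scale=1.0).prod()
            for h, mu in mus.items()}

    evidence = sum(prior[h] * like[h] for h in mus)  # the denominator
    for h in mus:
        print(f"p({h}|D) = {prior[h] * like[h] / evidence:.3f}")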

~~~
dwaltrip
Thanks for the response. Do you know of any good blog posts or articles that
dive into this a bit more? It looks very interesting.

~~~
nonbel
This is the best description of the main problem (testing your own vs some
default hypothesis) I have seen:

Paul E. Meehl, "Theory-Testing in Psychology and Physics: A Methodological
Paradox," Philosophy of Science 34, no. 2 (Jun., 1967): 103-115.
[https://doi.org/10.1086/288135](https://doi.org/10.1086/288135)

Free download here:
www.fisme.science.uu.nl/staff/christianb/downloads/meehl1967.pdf

Andrew Gelman (andrewgelman.com) has a great blog that often touches on this
issue.

~~~
dwaltrip
Thanks a bunch for sharing, I appreciate it. I'll add these resources to my
reading list. I may also pass them along to my brother, who is getting a
graduate degree in psych :)

------
nonbel
>" _The fact that these variables are all typically linear or additive_
further implies that interactions between variables will be typically rare or
small or both (implying that most such hits will be false positives, as
interactions are far harder to detect than main effects)."

Where does this "fact" come from? And if everything is correlated with
everything else, all these effects are true positives...

Also, another ridiculous aspect of this is that when data becomes cheap,
researchers just make the threshold stricter so it doesn't become too easy.
They are (collectively) choosing what is "significant" or not, and then acting
as if "significant" = real and "non-significant" = 0.

Finally, I didn't read through the whole thing. Does he claim to have found an
exception to this rule at any point?

~~~
gwern
> Finally, I didn't read through the whole thing. Does he claim to have found
> an exception to this rule at any point?

Oakes 1975 points out that explicit randomized experiments, which test a
useless intervention such as school reform, can be exceptions. (Oakes might
not be quite right here, since surely even useless interventions have _some_
non-zero effect, if only by wasting peoples' time & effort, but you might say
that the 'crud factor' is vastly smaller in randomized experiments than in
correlational data, which is a point worth noting.)

~~~
nonbel
Thanks.

How about this "fact": _The fact that these variables are all typically linear
or additive_?

~~~
gwern
That is simply a corollary of the fact that Pearson's r and regressions are
usually linear/additive, and things like Meehl's demonstration wouldn't work
if they weren't. You'd just calculate all the pairwise correlations and get
nothing if the relationships were purely nonlinear or pure interactions. (In
which case you'd have a hard time proving they were related at all.)

~~~
nonbel
> You'd just calculate all the pairwise correlations and get nothing if they
> were solely totally nonlinear/interactions.

I don't believe this. Most nonlinear relationships also show up as non-zero
(linear) correlation coefficients. There are really only a couple of
pathological cases I can think of where that would not happen.
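
That's easy to check; one genuinely pathological case is a symmetric
nonlinearity on a symmetric input (a quick sketch with simulated
standard-normal data):

    import numpy as np

    rng = np.random.default_rng(9)
    x = rng.normal(size=100_000)

    # pathological: y depends on x exactly, yet Pearson's r is ~0
    print(np.corrcoef(x, x ** 2)[0, 1])

    # typical: a monotone nonlinearity still shows a clearly nonzero r
    print(np.corrcoef(x, np.exp(x))[0, 1])   # ~0.76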

------
pierrebai
Is this trying to be too clever? If the correlation is weaker than the random
noise of the data, then it is equivalent to not being correlated.

Otherwise, we'd get conclusions like the color of your car influencing your
risk of lung cancer, or some such nonsense. With enough data, you could see a
weak correlation of red cars with cancer, but it would still be insignificant.
That's what the null hypothesis is for: to set a threshold under which we can
just ignore whatever weak correlation seems to be there.

------
Sniffnoy
Question: Are these correlations typically transitive? That is to say, in
addition to everything having a nonzero correlation with everything else, is
the sign of the correlation between A and C typically equal to the product of
the signs of the correlations between A and B and between B and C?

Thorndike's dictum would suggest that this is so, at least in that particular
domain. What about more generally?
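
In general, sign-transitivity can fail. A small simulated counterexample (the
covariance values are arbitrary), where A and C each correlate positively with
B = A + C yet negatively with each other:

    import numpy as np

    rng = np.random.default_rng(11)
    # A and C are negatively correlated; B is their sum
    cov = [[1.0, -0.3], [-0.3, 1.0]]
    a, c = rng.multivariate_normal([0, 0], cov, size=100_000).T
    b = a + c

    print(np.corrcoef(a, b)[0, 1])   # ~ +0.59
    print(np.corrcoef(b, c)[0, 1])   # ~ +0.59
    print(np.corrcoef(a, c)[0, 1])   # ~ -0.30: the sign-product rule fails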

------
SubiculumCode
Like background radiation, we have an "absolute background" correlation
value... a value we might test against, e.g. |±0.02321|.

Or we could drop the null

~~~
ahazred8ta
REJECT THE NULL HYPOTHESIS !!! :-)

------
purplezooey
It's well known that the number of Nicolas Cage movies is correlated with a
wide variety of natural phenomena.

------
leaky_valve
Sample means and true means are different things.

~~~
gwern
You're being downvoted because you missed the point repeatedly made in the
intro and many of the excerpts that this is in fact a claim about the 'true
means'.

