
Gah, I wish I had time to fully read this and get into it, but I have to spend the next few hours driving.

Unfortunately the original article doesn't explain its point very clearly, and it's only on reading the discussion in the comments under it that it becomes clear what it's actually saying.

The point is about signal & noise. Say your random variable X contains a signal component and a noise component, the former deterministic and the latter random. Say you correlate Y-X against X, and further say you use the same sample of X when computing Y-X as when measuring X. In this case your correlation will include the correlation of a single sample of the noise part of X with its own negation, yielding a spurious negative component that is unrelated to the signal but arises purely from the noise. The problem can be avoided by using a separate sample of X when computing Y-X.
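
To put rough numbers on that (a quick NumPy sketch; the noise scales and the 0.5 coefficient are arbitrary values I picked, not anything from DK):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    signal = rng.normal(size=n)                        # deterministic per-person component of X
    x1 = signal + rng.normal(scale=0.8, size=n)        # one measurement of X (signal + noise)
    x2 = signal + rng.normal(scale=0.8, size=n)        # an independent re-measurement of X
    y = 0.5 * signal + rng.normal(scale=0.8, size=n)   # some Y that depends on the signal

    # Same sample of X on both sides: the noise correlates with its own negation,
    # adding a spurious negative component on top of the genuine relationship.
    print(np.corrcoef(x1, y - x1)[0, 1])   # roughly -0.56 with these parameters

    # Separate sample of X on one axis: only the signal relationship remains.
    print(np.corrcoef(x2, y - x1)[0, 1])   # roughly -0.25 with these parameters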

The example in the original "DK is autocorrelation" article is an extreme illustration of this. Here, there is no signal at all and X is pure noise. Since the same sample of X is used, a strong negative correlation is observed. The key point, though, is that if you use a separate sample of X, that correlation disappears completely. I don't think people are realising that in the example given, the random result X will yield another totally random value if sampled again. It's not a random result per person; it's a random result per testing of a person.
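
In the pure-noise extreme, the same kind of sketch (same caveats) gives:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    # No skill signal at all: each "testing" of a person is a fresh random draw.
    x_first = rng.normal(size=n)     # the sample of X used on both axes below
    x_retest = rng.normal(size=n)    # an independent second testing of the same people
    y = rng.normal(size=n)           # self-assessment, also pure noise

    print(np.corrcoef(x_first, y - x_first)[0, 1])    # about -0.71, i.e. -1/sqrt(2)
    print(np.corrcoef(x_retest, y - x_first)[0, 1])   # about 0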

This is only one objection to the DK analysis, but it's a significant one AFAICS. It can be expected that any measurement of "skill" will involve a noise component. If you want to correlate two signals both mixed with the same noise sources, you need to construct the experiment such that the noise is sampled separately in the two cases you're correlating.

Of course the extent to which this matters depends on the extent to which the measurement is noisy. Less noise should mean less contribution of this spurious autocorrelation to the overall correlation.

To give another ridiculous, extreme illustration: you could throw a die a thousand times and write each result down twice. You would observe that (of course) the first copy of the value predicts the second copy perfectly. If instead you throw the die twice at each step of the experiment and write those separately sampled values down, you will see no such relationship.
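
Or, the die version as code (assuming a fair six-sided die):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1_000

    # Write each throw down twice: the "second copy" is literally the same number.
    throws = rng.integers(1, 7, size=n)
    print(np.corrcoef(throws, throws)[0, 1])      # exactly 1.0

    # Throw the die twice at each step instead: separately sampled values.
    first = rng.integers(1, 7, size=n)
    second = rng.integers(1, 7, size=n)
    print(np.corrcoef(first, second)[0, 1])       # about 0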




Hey omnicognate, good to see you here, appreciated our previous discussion.

What you're saying is that we need to verify the statistical reliability of the skill tests DK gave, and, to some extent, that we need to scrutinize the assumption that there is indeed such a thing as "skill" to be measured in the first place. I hope we can both agree that skill exists. That leaves test reliability ("reliability" in the technical statistical sense, not the broad everyday one).

What purely random numbers simulate is a test with no reliability whatsoever. Of course, if the tests DK gave their subjects don't actually measure anything at all, the DK study is meaningless. If that's what the original article's author is trying to say, they sure do it in a very roundabout way, without mentioning test reliability at all. I'd be completely fine reading an article examining the reliability of the tests. Otherwise, I again fail to see what the random-number analysis has to do with the conclusions of DK.

In fact, DK do concern themselves with test reliability, at least to some extent. It doesn't appear in the graph under scrutiny, but it does appear in the study.

If you assume the tests are reliable, and you also assume that DK are wrong, i.e. that people's self-assessment is highly correlated with their performance, and you generate random data accordingly, you'll still get no effect even if you sample twice as you propose.

> The key point though is that if you use a separate sample of X that correlation disappears completely

A separate sample of X under the assumption of no dependence at all on the first sample, i.e. assuming there is no such thing as skill, or assuming completely unreliable tests. So, not interesting assumptions, unless you want to call the test reliability into question, which neither you nor the author is directly doing.


I think the other piece that has been glossed over a bit is that DK are using quantiles (for both the test and the self-assessment). That means everything is bounded by 0 and 1, and you can't underestimate your performance if it was poor, or overestimate it if it was perfect. Conversely, if you're the most skilled person in the room, your (random) actual performance on the day of the test is bounded above by your true skill, and vice versa for the least skilled.

So e.g. we could simulate data with perfect self-assessment of overall skill, add a small amount of noise to actual performance on the day of the test, and get the same results. The bottom quartile (grouped by actual test score) will be a mix of people who are actually in the bottom quartile in skill and some who are in the higher quartiles. The top quartile by actual test score will be a mix of some from the top quartile in skill and some from lower quartiles.
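
To make that concrete, here's a rough simulation along those lines (the normal distributions and the 0.5 noise scale are arbitrary choices of mine, not fitted to DK's data): self-assessment tracks true skill perfectly, the test score is true skill plus some noise, and everything is converted to percentiles before grouping by test-score quartile.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    skill = rng.normal(size=n)                            # true skill
    self_assessment = skill                               # perfect self-knowledge of skill
    test_score = skill + rng.normal(scale=0.5, size=n)    # noisy performance on the day

    def percentile(v):
        # rank each value and rescale to [0, 1]
        return v.argsort().argsort() / (len(v) - 1)

    perceived = percentile(self_assessment)
    actual = percentile(test_score)

    # Group by quartile of actual test score, as in the DK-style plot.
    quartile = np.minimum((actual * 4).astype(int), 3)
    for q in range(4):
        m = quartile == q
        print(f"quartile {q + 1}: mean actual {actual[m].mean():.2f}, "
              f"mean perceived {perceived[m].mean():.2f}")

With these numbers the bottom quartile's mean perceived percentile comes out above its mean actual percentile, and the top quartile's comes out below it, even though nobody misjudges their own skill. How big that gap is depends entirely on how noisy the test score is.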


I agree in principle, although I think to get an effect size similar to what DK observed you'd need quite large noise. Which again comes back to the test reliability.
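
As a rough check of that (reusing the setup from the sketch above, with the same caveats about arbitrary parameters), the bottom-quartile gap grows with the noise scale:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    def percentile(v):
        return v.argsort().argsort() / (len(v) - 1)

    skill = rng.normal(size=n)
    perceived = percentile(skill)                      # perfect self-assessment, as above
    for scale in (0.1, 0.5, 1.0, 2.0):
        actual = percentile(skill + rng.normal(scale=scale, size=n))
        bottom = actual < 0.25
        gap = perceived[bottom].mean() - actual[bottom].mean()
        print(f"noise scale {scale}: bottom-quartile overestimate {gap:.2f}")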



