Berkson's Paradox (wikipedia.org)
87 points by dedalus on Dec 14, 2018 | 21 comments

Example that I remember getting pointed out on HN a while back


A couple more examples from higher education: Within the pool of enrolled students, language skills and quantitative skills are negatively correlated; and within the pool of students accepted by any law school, LSAT and GPA are negatively correlated.

The implication, I guess, is that within a company, any trait other than A that helps you get hired is negatively correlated with A, which is pretty interesting.

Looks, competence, work experience, interview skills, height, a rich background, any positive discrimination, nepotism, etc. All of these should be negatively correlated with each other. There are limits to the strength of the effect, though, right? It depends on how much of a slice is being excluded, and if the two variables have an enormous positive correlation, that might outweigh it.

You'd also expect the effect to be stronger the more selective the environment (for each pair of variables). So if your place hires very strongly on looks and competence they'll be very negatively correlated.

It also depends on the hiring policy. We're assuming some sort of (A + B) > C evaluation over the things they care about, but if it's (A > A0, B > B0, C > C0), where you're in as long as you pass all of those, then this effect should be totally absent in those variables (at least for variables that are independent in the general population).
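To make the two claims above concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the traits A and B are independent standard normals, the cutoffs (0, 1, 2 for the additive policy, 0.5 for the per-trait thresholds) are arbitrary, and `pearson` and `corr_among` are hypothetical helpers, not anything from the article.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def corr_among(hired):
    """Correlation between trait A and trait B within a list of (a, b) pairs."""
    a_vals, b_vals = zip(*hired)
    return pearson(a_vals, b_vals)

random.seed(0)
N = 200_000
# Assumption: A and B are independent in the general population.
people = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

# Additive policy: hire when A + B exceeds a cutoff (higher = more selective).
additive = {}
for cutoff in (0.0, 1.0, 2.0):
    hired = [(a, b) for a, b in people if a + b > cutoff]
    additive[cutoff] = corr_among(hired)
    print(f"A + B > {cutoff}: r = {additive[cutoff]:+.2f}")

# Threshold policy: hire only when each trait clears its own bar.
hired = [(a, b) for a, b in people if a > 0.5 and b > 0.5]
threshold_r = corr_among(hired)
print(f"A > 0.5 and B > 0.5: r = {threshold_r:+.2f}")
```

Under these assumptions the additive policy should give a clearly negative correlation that gets stronger as the cutoff rises, while the per-trait thresholds should leave the correlation near zero.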

Re: your final paragraph it seems like it could be a0 > A & b0 > B & ... if you're also trying to (or forced by the market to) minimize the sum of a0 + b0 ...

Of course some of those things might be positively correlated in the general population strongly enough that this Berkson's effect will only make them less positively correlated in the company.
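A quick sketch of that case, assuming two traits with a population correlation of 0.8 and a hiring rule of A + B > 0. Both numbers are made up for illustration, and `pearson` is a hypothetical helper.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(1)
RHO = 0.8      # assumed population correlation between the two traits
N = 200_000

population = []
for _ in range(N):
    a = random.gauss(0, 1)
    # Build b so that corr(a, b) is RHO in the full population.
    b = RHO * a + (1 - RHO ** 2) ** 0.5 * random.gauss(0, 1)
    population.append((a, b))

r_population = pearson(*zip(*population))

# Assumed hiring rule: keep only people whose combined score is above 0.
hired = [(a, b) for a, b in population if a + b > 0]
r_hired = pearson(*zip(*hired))

print(f"population r = {r_population:.2f}, hired r = {r_hired:.2f}")
```

With these settings the correlation among the hired should stay positive but come out noticeably lower than in the full population.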

Offtopic, but I'm really curious... What's with the AMP URL? I've noticed these started appearing recently, but I don't know where they're coming from. Did you grab an AMP URL on purpose or is it Google serving them somewhere and you didn't notice?

He's probably on mobile. Google Chrome on mobile loves to give AMP pages instead of normal ones.

I didn't notice this, thanks. In that case, the AMP situation is worse than I thought.

My first exposure to this idea was in Jordan Ellenberg's fantastic book "How Not to Be Wrong: The Power of Mathematical Thinking". Here's a post from him that goes into the same example he uses in the book:


An example of this I heard once is:

For a given car, there is no correlation between whether the battery is dead and whether the fuel pump is broken.

However, if you have a car that does not start, and you test the battery and find it working, you can now consider it more likely that the fuel pump is broken (because you have ruled out one cause of the car not starting, all other causes are now more likely).

This means that if you were to gather statistics about batteries and fuel pumps of all cars taken into the auto shop, you would find that there is a negative correlation between broken batteries and broken fuel pumps, despite this being clearly nonsensical for the general population.
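The auto shop example can be simulated in a few lines. This is only a sketch under simplifying assumptions: each part fails independently with a made-up probability of 0.1, a dead battery and a broken fuel pump are the only reasons a car fails to start, and every car that fails to start ends up in the shop. `pearson` is a hypothetical helper.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(2)
P_FAIL = 0.1   # assumed, independent failure probability for each part
N = 100_000

# Each car is (battery_dead, pump_broken), encoded as 0 or 1.
cars = [
    (int(random.random() < P_FAIL), int(random.random() < P_FAIL))
    for _ in range(N)
]

# Only cars that will not start (at least one failed part) reach the shop.
in_shop = [(battery, pump) for battery, pump in cars if battery or pump]

r_all = pearson(*zip(*cars))
r_shop = pearson(*zip(*in_shop))

print(f"all cars:     r = {r_all:+.2f}")
print(f"cars in shop: r = {r_shop:+.2f}")
```

Across all cars the correlation should sit near zero, while among the cars in the shop it should be strongly negative, since almost every car there has exactly one of the two faults.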

There's a fabulous numberphile video exploring this paradox: https://www.youtube.com/watch?v=FUD8h9JpEVQ

TL;DR Berkson's paradox is a false observation of a negative correlation between two positive traits

It occurs when, for whatever reason, samples with both negative traits are excluded. The example demonstrates it well. Suppose P(a, b), P(~a, b), P(a, ~b), and P(~a, ~b) all equal 0.25:

If you exclude the (~a, ~b) samples, then within the population that remains it looks like a and b are negatively correlated (P(a|~b) = 1, P(a|b) = 0.5).
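For anyone who wants to check the arithmetic, here is a small exact calculation of those conditional probabilities, before and after dropping the (~a, ~b) cell. The helper name `p_a_given_b` is just for illustration.

```python
from fractions import Fraction

# Joint distribution from the comment: all four (a, b) combinations equally likely.
joint = {
    (True, True): Fraction(1, 4),
    (False, True): Fraction(1, 4),
    (True, False): Fraction(1, 4),
    (False, False): Fraction(1, 4),
}

def p_a_given_b(dist, b_value):
    """P(a is True | b == b_value); dist does not need to be normalised."""
    total = sum(p for (a, b), p in dist.items() if b == b_value)
    hits = sum(p for (a, b), p in dist.items() if a and b == b_value)
    return hits / total

# Drop the (~a, ~b) cell, which is what the selection does.
selected = {cell: p for cell, p in joint.items() if cell != (False, False)}

full_given_b = p_a_given_b(joint, True)
full_given_not_b = p_a_given_b(joint, False)
sel_given_b = p_a_given_b(selected, True)
sel_given_not_b = p_a_given_b(selected, False)

print(full_given_b, full_given_not_b)  # 1/2 1/2
print(sel_given_b, sel_given_not_b)    # 1/2 1
```

In the full population, knowing b tells you nothing about a. After the exclusion, learning that b is false makes a certain.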

It's interesting because this happens a lot. When dating, looks negatively correlate with niceness among the people you date, because you don't date people who are neither. Diseases in hospital populations are negatively correlated, because if you don't have anything, you're not in the hospital.

This is an excellent, concise summary. Thank you zimablue!

Great examples, thanks!

Can you suggest any good material you know on stats? (I'd prefer videos, but everything works) It seems to me you have a very solid background.

look up 3blue1brown for some instructional videos on math, including some on stats

Is this just a term for the outcome of a sampling bias?

This is a special case of sampling bias.

It took me reading the example involving stamps three or four times to grok it, but I finally understand now.


This is the inverse of two wrongs don't make a right.
