There's a simple technique to "control for" those factors. First you split your population into groups that each have a similar race, gender, and age. Then within those groups you make comparisons of the hotmail users to the others. https://en.wikipedia.org/wiki/Controlling_for_a_variable
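As a rough illustration of that stratify-then-compare idea, here is a minimal sketch in plain Python. The record layout (stratum, is_hotmail, had_event) and the function name are assumptions made up for this example, not anything from a real dataset or library.

```python
from collections import defaultdict


def stratified_relative_risk(records):
    """Relative risk of Hotmail vs. non-Hotmail users within each stratum.

    `records` is an iterable of (stratum, is_hotmail, had_event) tuples,
    a hypothetical layout chosen for this sketch. A stratum could be a
    (race, gender, age_band) tuple or any other hashable label.
    """
    # counts[stratum][is_hotmail] = [number of events, number of users]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for stratum, is_hotmail, had_event in records:
        cell = counts[stratum][bool(is_hotmail)]
        cell[0] += int(had_event)
        cell[1] += 1

    result = {}
    for stratum, cells in counts.items():
        (events_h, n_h), (events_o, n_o) = cells[True], cells[False]
        if n_h == 0 or n_o == 0 or events_o == 0:
            # Not estimable: one side of the comparison is empty,
            # or the baseline risk in this stratum is zero.
            result[stratum] = None
            continue
        result[stratum] = (events_h / n_h) / (events_o / n_o)
    return result
```

Each stratum gets its own comparison, so a difference between strata in how common Hotmail is does not leak into the per-stratum ratios.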

You would need to keep in mind xkcd #882 (https://xkcd.com/882/).

Let me ask some clarifying questions, because I think I understand the broad strokes of what you're saying, but I'm not clear on some details.

OK, so I find that among Black users, Hotmail is associated with a relative risk of 1.05 against the baseline, and among white users, a relative risk of 1.15. How do I then apply this knowledge to an incoming user of an unknown race with a Hotmail account? Do I give them the population-weighted average of the relative risks among all my buckets? What if I do know their race? Do Black users get assigned 1.05 times the risk and white users 1.15 times?
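To make the two options concrete, here is a minimal sketch of what "population-weighted average when the bucket is unknown, bucket-specific value when it is known" could look like. The 1.05 and 1.15 figures are the ones from the question; the population shares and function names are invented for illustration, and this is only one possible reading of the options, not a claim that it is the right way to score users.

```python
def weighted_relative_risk(rr_by_group, share_by_group):
    """Average the per-group relative risks, weighted by population share."""
    total_share = sum(share_by_group[g] for g in rr_by_group)
    weighted_sum = sum(rr_by_group[g] * share_by_group[g] for g in rr_by_group)
    return weighted_sum / total_share


def risk_multiplier(rr_by_group, share_by_group, group=None):
    """Use the group's own relative risk if known, else the weighted average."""
    if group in rr_by_group:
        return rr_by_group[group]
    return weighted_relative_risk(rr_by_group, share_by_group)


# Relative risks from the question; the shares are hypothetical.
rr_by_group = {"Black": 1.05, "white": 1.15}
share_by_group = {"Black": 0.3, "white": 0.7}

unknown = risk_multiplier(rr_by_group, share_by_group)           # about 1.12
known = risk_multiplier(rr_by_group, share_by_group, "Black")    # 1.05
```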

Also, what if some of my buckets aren't full enough to get a high confidence of the relative Hotmail risk in that bucket? Let's say I don't have enough gay Hispanic women in my cohort. Do I just drop them from the analysis and hope they're similar to the other populations?
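One way to put a number on "not full enough" is a confidence interval on the per-bucket relative risk. The sketch below uses the standard log-based normal approximation (sometimes called the Katz method); the function name and argument names are assumptions for this example, and the approximation itself gets shaky exactly when counts are very small.

```python
import math


def relative_risk_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Return (rr, lower, upper) using the log-scale normal approximation.

    "Exposed" here means Hotmail users in one bucket and "unexposed" means
    everyone else in that bucket. z=1.96 gives an approximate 95% interval.
    """
    if events_exposed == 0 or events_unexposed == 0:
        # The log relative risk is undefined when either event count is zero.
        raise ValueError("need at least one event in each arm")
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Standard error of ln(RR)
    se = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)
```

With the same observed ratio, a bucket of 10 users per arm yields a far wider interval than a bucket of 1,000 per arm, which is the problem the question is pointing at.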

Also, what if Hotmail is associated with a 1.15 relative risk for both Black and white users, but a Black user is significantly more likely to have a Hotmail account than a white user?
