
> Karraker and her co-author did the analysis again, and found the results stand only when wives develop heart problems, not other illnesses.

This is p-hacking after-the-fact, right? Seems like the classic example: The broader hypothesis doesn’t hold, but if you look for ways to slice and dice the data you’re likely to find a (spurious?) correlation eventually.



No, it's a coding error that led to a wacky subsample being used for their treatment variable.

The blog post lists this as the original Stata code:

> replace event`i' = 1 if delta_mct`i' != 0 | spouse_delta_mct`i' != 0

and this as the correct one:

> replace event`i' = 1 if (delta_mct`i' != 0 | spouse_delta_mct`i' != 0) & delta_mct`i' != . & spouse_delta_mct`i' != .

It looks like the authors didn't properly handle missing values in Stata, so people with missing health information were marked as "severely ill" instead of being excluded from the analysis.
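The mechanics of the bug carry over to Python almost directly, since NaN (like Stata's missing value ".") compares as "not equal to zero". A minimal sketch with made-up toy rows, not the study's data:

```python
import math

# Toy rows of (delta, spouse_delta); NaN stands in for Stata's
# missing value ".", which also satisfies "!= 0".
rows = [(0.0, 0.0),            # no health change
        (2.0, 0.0),            # genuine illness event
        (float('nan'), 0.0)]   # left the study: delta is missing

# Buggy version, mirroring the original line: missing deltas
# still pass the "!= 0" test, so dropouts get flagged as events.
buggy = [1 if (d != 0 or s != 0) else 0 for d, s in rows]

# Corrected version, mirroring the fixed line: additionally
# require both deltas to be non-missing.
fixed = [1 if (d != 0 or s != 0) and not math.isnan(d) and not math.isnan(s)
         else 0 for d, s in rows]

print(buggy)  # [0, 1, 1] -- the dropout is wrongly flagged as an event
print(fixed)  # [0, 1, 0] -- the dropout is excluded
```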

It's an unfortunate mistake, but it happens.


So you’re saying that their current conclusion - that this is significant for heart events, but not in general - is valid statistically?

(Honest question)


Not OP, but to me it sounds like p-hacking, aka bad science, as well: if you slice a dataset into enough subsamples, you will very likely find random correlations. That's the nature of these kinds of analyses, and we should be sceptical of conclusions based on such analyses.


I think you and p51-remorse are discussing different parts of the article. They're saying the updated analysis is suspect because of the risk of false discoveries. I believe that's probably true in the usual way--if we study 20 subgroups with no actual effect, then we expect one to show an effect with p < 0.05. There's no mention of preregistration or anything like a Bonferroni correction to manage that risk.
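The 20-subgroup intuition is easy to check with arithmetic (assuming independent tests with no real effect, per the parent's illustration):

```python
# Family-wise error rate: chance of at least one false positive when
# testing k independent subgroups at alpha = 0.05 with no true effect.
alpha, k = 0.05, 20
fwer = 1 - (1 - alpha) ** k
print(round(fwer, 2))  # 0.64 -- a "significant" subgroup is more likely than not

# A Bonferroni correction would instead test each subgroup at alpha / k.
print(alpha / k)  # 0.0025
```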

You're saying the original analysis was wrong due to a coding error. I believe that's also true, but that's not what they were discussing. The variable names are inscrutable, but the article text also seems to imply that line (mis)codes divorce, not severe illness:

> People who left the study were actually miscoded as getting divorced.

So they actually found a correlation between severe illness and leaving the study. That's perhaps intuitive, if those people were too busy managing their illness to respond.


If you have no statistical significance for any type of illness but one, the chance that the data for that one illness is unbalanced is high. They should do a follow-up study to verify their findings on heart conditions with a new set of data.


I didn't get that impression.

Although you're right that there's likely no relationship between wives developing heart conditions and subsequent divorce, there's not enough information from the article to know whether there's anything statistically meaningful about heart conditions specifically. It seems more likely that it's just statistical noise. I read that section and got the impression that the relationship is interesting but it doesn't necessarily mean anything, rather than implying that the original study still has some validity.


Yes, because there’s no indication that heart disease vs other diseases is properly controlled for demographically.


I wondered that, but looking at both papers, it seems much less likely. For one thing, they looked at the same illnesses in both the original paper and the new paper, so it's not like they ran a bunch of new analyses in order to find something that's significant.

For another, the study doesn’t look at that many diseases, just four: cancer, heart disease, lung disease, and stroke. It’s possible the original study was doing some p-hacking in order to reach significance but these are pretty major categories of disease so they seem defensible. It would not surprise me if these are the only diseases in the original dataset.

Finally, the results are significant at the .01 level. Combined with the small number of diseases, this is not anything as blatant as the classic XKCD green jelly bean comic, although more subtle p-hacking could be at play.
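A back-of-the-envelope check of that point (assuming the four disease tests are independent):

```python
# With only four diseases tested, even a strict Bonferroni adjustment
# leaves the reported result in range: the per-test threshold is alpha / 4.
alpha, tests = 0.05, 4
print(alpha / tests)  # 0.0125 -- a result at p < .01 clears this bar
```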

Retracted: https://journals.sagepub.com/doi/pdf/10.1177/002214651456835...

Corrected: https://journals.sagepub.com/doi/abs/10.1177/002214651559635... (Not open access but it’s available from sci-hub.)


I think so?

xkcd/882



