Was this article written by Dr. Harkonen's publicist or something?
The case seems pretty clear: he knowingly misrepresented the results of a drug trial for the financial benefit of his firm and, by association, himself. He did this at the possible expense of critically or terminally ill patients, and at the further expense of the scientific and medical integrity of his research. And he received 6 months of house arrest at his cushy, 3-story San Francisco home as punishment. Forgive me if I don't strain myself reaching for my violin.
I sincerely hope this piece is not representative of the journalistic integrity of the Post under its new ownership. The article's blatant slant, its casual blending of editorial opinion with fact-based reporting, and its weirdly patronizing tone (e.g., "the so-called 'p-value'") do no justice to the reputation of the newspaper.
The author opens with a rather silly rhetorical question, one with an obvious answer:
"Is it a crime for a medical researcher to hype his results? To put a heavy spin on the findings when there are millions of dollars, and possibly lives, at stake?"
Yes. Yes, it is. Especially when there are millions of dollars, and possibly lives, at stake. You don't get to cut corners in the scientific method because you think you're on to something.
I got the opposite impression. In fact, I think this may be the best medical statistics article I've ever seen in the popular press:
But there was a problem. This mild-to-moderate subgroup
wasn’t one the researchers said they would analyze when
they set up the study. Subdividing patients after the
fact and looking for statistically significant results
is a controversial practice. In its most extreme form,
it’s scorned as “data dredging.” The term suggests that
if you drag a net through a bunch of numbers enough
times, you’ll come up with something significant sooner
or later."
It turns out he's a part-time journalist and part-time licensed physician: "He works four days a week at the Post and two-thirds of a day at a general internal medicine clinic in Baltimore supervising third-year medical students."
I also didn't find it biased toward Harkonen at all. Consider the closing:
InterMune did run another trial. It was big — 826
patients at 81 hospitals — in order to maximize the
chance of getting clear-cut results. It enrolled only
people with mild to moderate lung damage, the subgroup
whose success was touted in the press release.
And it failed. A little more than a year into the study,
more people on the drug had died (15 percent) than people
on placebo (13 percent). That was the death knell for the
drug.
I don't know if I've ever seen a major newspaper link directly to a primary scientific paper as a source. I encourage you to read it again and see if your view changes. I've read it twice now and think it is an absolutely stellar piece of science writing, worthy of a Best Science Writing compilation.
Subdividing patients after the fact and looking for statistically significant
results is a controversial practice
If by controversial they meant "is considered outright fraud" ... then yes, it is controversial. This is one of the parts that made me see an unacceptable level of bias.
There are a lot of parts like this. It might do a better-than-average job of explaining p-values and the realities of experimental design, but the tone in every case is dismissive, in a "bunch of eggheads worrying about little details" kind of way.
This is so biased that I assume there's an agenda here rather than just bad journalism. The other explanation is that the writer is a good writer who researched the unfamiliar terms, didn't understand any of it, and is good enough at regurgitating information to make it look like they did. Either way, it's awful science writing, even if they did use a lot of science words correctly. I guess I hold my hypothetical Best Science Writing awards to a higher standard ... like requiring that they get the science right.
> If by controversial they meant "is considered outright fraud" ... then yes, it is controversial
Hold on there... controversial, yes, but fraud - no. If the original data was reported, and after the fact you discovered that a class of patients responded differently than the rest and you could find a good classifier, then it's certainly not fraud. In this case, they weren't just taking all the "good" patients and putting them in one ad hoc group and all the "bad" patients in another. They found they were able to sub-classify the patients using existing clinical traits.
I thought the article did a pretty good job trying to explain the statistical issues at play. As described, the study isn't very clear, but it's hardly fraud to spin the results in that way. Is it reaching? Perhaps, but so is calling this criminal. Remember, using 0.05 as a p-value cutoff for significance is just a convention; the number itself is arbitrary. In my work, I have to use something much more stringent, such as 0.001, due to multiple-testing issues.
In my reading, the issue wasn't the analysis. The issue was the hype of the analysis. In this case, he definitely went overboard, but I'm not sure it rose to the level of fraud.
If you divide patients into 20+ arbitrary groups (say, based on the first letter of their name) and give them a zero-efficacy treatment, then more often than not you'll see a group with a p < 0.05 "significant" improvement; that's just basic stats.
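If you want to see that happen rather than take it on faith, here's a minimal simulation sketch (the numbers are purely illustrative and have nothing to do with the actual trial; it assumes NumPy and a reasonably recent SciPy):

    # Zero-effect "treatment", patients split post hoc into 20 arbitrary subgroups,
    # one-sided test per subgroup hunting for an apparent improvement.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_subgroups, per_arm, hits = 20, 30, 0

    for run in range(1000):
        found = False
        for g in range(n_subgroups):
            treated = rng.normal(0, 1, per_arm)   # true effect is exactly zero
            placebo = rng.normal(0, 1, per_arm)
            _, p = stats.ttest_ind(treated, placebo, alternative='greater')
            found = found or (p < 0.05)
        hits += found

    print(hits / 1000)  # around 0.64: most runs turn up a "significant" subgroup

That's just 1 - 0.95^20 in action; no real effect required, only enough post-hoc slicing.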
So, if you afterwards discover some class of patients with this different response, that is grounds only to investigate that group further in a separate study, nothing more - claiming that there is such an effect is fraud. You can't make extra claims until you have verification for that separate group.
Of course, but they divided the patients into two groups: mild/moderate and severe clinical phenotypes. It's not optimal, but it is far from subdividing into 20 classes or making up arbitrary classifiers until you get the number you were looking for. The clinical phenotypes were already established prior to the analysis.
> and after the fact you discovered a class of patients responded differently than the rest and you could find a good classifier, then it's certainly not fraud
Or you could say "after the fact you chose/created the class of patients that responded differently".
It certainly is fraudulent. I suppose this issue isn't quite a Stats 101 thing, but it's certainly undergrad-level, because it's only a little bit subtle.
A useful metaphor to keep in mind is the one that gives the Texas Sharpshooter Fallacy its name: "The name comes from a joke about a Texan who fires some shots at the side of a barn, then paints a target centered on the biggest cluster of hits and claims to be a sharpshooter." The examples on Wikipedia are also instructive (http://en.wikipedia.org/wiki/Texas_sharpshooter_fallacy).
As PeterisP pointed out, the only correct interpretation is to say "That's interesting, we need to do another study with that subgroup to see if that's a real effect" (although that xkcd comic illustrates what's called the "file drawer effect," or simply "publication bias," which is the same error made a different way).
Give me a dataset of people rolling dice many times and the ability to group them after examining the data, and I will find you a group that seems to have magical craps-playing abilities that don't actually exist. It might be men over 6' with brown hair, or it might be married people who are left-handed, or maybe bilingual introverts ... who knows, I can't tell you what that group will be until I see the data (and that last statement is why you know it's not a real ability). This is precisely why experimental design depends on defining your tests before you examine your data; or, put another way, you cannot use the data that suggested an effect to prove that effect.
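To make the dice example concrete, here's a rough sketch of the hunt (every trait and number below is made up, which is exactly the point):

    # Every player has identical true odds; hunting over arbitrary post-hoc
    # traits still tends to produce a "lucky" group.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_players, n_rolls, n_traits = 200, 50, 10

    wins = rng.binomial(n_rolls, 0.5, n_players)        # same skill for everyone
    traits = rng.integers(0, 2, (n_players, n_traits))  # tall? left-handed? bilingual? ...

    best_p = 1.0
    for t in range(n_traits):
        group = wins[traits[:, t] == 1]
        rest = wins[traits[:, t] == 0]
        _, p = stats.ttest_ind(group, rest)
        best_p = min(best_p, p)

    print(best_p)  # dips below 0.05 a good fraction of the time, with no real effect anywhere

Define the group before you look at the data and this stops working, which is the whole point of pre-specifying your tests.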
The idea that the CEO of a biotech company didn't know this is absurd, because this is Running a Scientific Experiment 101. Selecting groups after seeing the data is probably the most common method those committing fraud use to tweak failed experiments; second is probably the file drawer (just rerun the experiment until you get a result by chance and pretend the failed experiments never happened). These are so rife that hundreds of articles about the prevalence of this type of fraud have been published in the last few years. This type of behaviour is a big deal these days.
Tone of the article was certainly skewed in Harkonen's favor, so I stopped mid-second page and turned to HN comments.
It's great that the journalist found the debunking study and mentioned it. But does the order matter? He goes on and on about how "a sizable proportion of [scientists] might be in jail today".
Most readers would drop off halfway through, as I did. What will they learn? Oh, there was evidence that the drug could be lifesaving, but a technicality killed the white knight.
The problems with after-the-fact analyses are well known. I reckon there should be a strict mathematical proof of why they are wrong.
We just got yet another example of that.
How else can you keep charlatans out if not by being rough on those who break the rules?
Especially when breaking the rules possibly has lethal consequences for less sophisticated yet desperate consumers.
I'm not saying Harkonen is a charlatan, but he certainly jumped the gun.
Whatever his reasons were, he did.
I'll give it another read in light of what you're saying, but in the meantime, to clarify one line:
"I don't presume it has anything to do with Bezos, but if it does, I hope stays on a buying spree!"
I don't presume it has anything to do with Bezos, either. I would assume (?) he hasn't really taken the reins yet, at any rate. I've since edited that statement for clarity. Not trying to blame him for anything, or attribute this to him in any way. Rather, I am hopeful that the quality improves under him.
I've reread the article a few times since reading your post, and while I might moderate the severity with which I first lashed out at the author, I stand by my assessment. Here's my rationale:
1) The tone of the article is skewed in Harkonen's favor right from the outset, particularly in the way the author initially and repeatedly presumes the purity of Harkonen's motives ("saving lives"). In fact, we have no evidence of any motives other than the financial ("billion-dollar market," "$500 million opportunity," etc.).
2) Harkonen's methodology was flawed, full stop. While there might well have been something promising in the cohort he analyzed, a remainder cohort from one study does not take the place of a full set of clinical trials. Medical and statistical ethics would demand another study be conducted of the cohort Dr. Harkonen identified as more promising. Jumping the gun was unwarranted and possibly reckless. (As it turns out, a full study was later conducted, and indeed it proved the recklessness with which Harkonen leapt to his early conclusions).
3) Harkonen grossly misstated the results of the study in his press release, and, while it's hard to prove his intent, he is on record stating his financial upside in so doing. The author devotes all of a couple lines to this crucial point, then trots out a couple of irrelevant character-witness quotes to pave it over. (One quote essentially states 'If this is a crime, then a lot of scientists are in trouble!' Well, it is a crime. And if lots of other scientists are similarly cherrypicking their data and making misleading statements about it, then I'd say we have a serious problem on our hands.)
4) The closing passage you point to doesn't indicate an absence of bias in Harkonen's favor. It adds some balance, yes. But it's buried at the bottom of the piece, where the typical reader either won't reach it or will glaze over it.
5) The opening paragraphs of the article, by contrast, paint a tragic portrait of an eminent and well-meaning doctor humiliated and punished for the crime of simply being overeager to help people (with the strong implication that his humiliation and punishment were unjust).
6) The author makes the repeated assertion that there is 'nothing factually inaccurate' about the content of Harkonen's press release. This is true by the letter, but not by the spirit. Harkonen cherrypicked the "facts" that he portrayed, accurately or otherwise, in that press release. And the later study bears out the potentially disastrous consequences of his having done so, negligently and recklessly.
I'll grant you that "fraud" seems a little extreme in this case, and that's where the controversy lies. But we can demonstrate mens rea, and at the very least, Harkonen committed an act of criminal negligence. That he stood to gain significantly from his misleading statements is strong circumstantial evidence in favor of fraud, though the article doesn't give us enough facts to make a definitive claim there. At the very least, we see no evidence that Harkonen was doing this to save lives -- a point the author strongly and repeatedly implies throughout the article.
We are likely starting from different points. I see a media that almost universally amplifies the hype of scientific press releases, and I'm overjoyed to see an article that (I feel) critically examines the details. Perhaps my bias against "science by press release" is so strong that I don't even register the attempts to exonerate Harkonen.
And if lots of other scientists are similarly
cherrypicking their data and making misleading statements
about it, then I'd say we have a serious problem on our
hands.
I feel this is where we are at, with Harkonen's approach almost at the middle of the road. This isn't to imply his behaviour is acceptable, just that we have a very serious problem. Have you read John Ioannidis' essay "Why Most Published Research Findings Are False"?
http://www.plosmedicine.org/article/info:doi/10.1371/journal...
At the very least, we see no evidence that Harkonen was
doing this to save lives -- a point the author strongly
and repeatedly implies throughout the article.
I guess I didn't see that message as strongly as you. I felt it was granted that he had a strong vested interest:
The prosecutors also emphasized that Harkonen had a
financial motive for spinning the study in the most
positive way. This wasn’t hard to find. The third
paragraph of the press release said: “We believe these
results will support use of Actimmune and lead to peak
sales in the range of $400-$500 million per year,
enabling us to achieve profitability in 2004 as planned.”
But you are right that this wasn't emphasized. I think for me the hard point is trying to understand the boundary between ethics and law. I worry that asking a lay jury to decide guilt based on the contextual interpretation of p-values isn't going to end well.
I also wonder to what extent Harkonen was optimistically self-delusional versus blatantly fraudulent. I presume that, as well as desiring financial success, he truly hoped he had found a useful cure for a dread disease. Along those lines, I recently read a great article by a 1930s Nobel Prize winner entitled "Pathological Science" on how even good scientists can occasionally fool themselves into seeing results that just aren't there: http://yclept.ucdavis.edu/course/280/Langmuir.pdf
As a trained (and hopefully ethical) statistician, I agree with the interpretation that Dr. Harkonen did mis-represent the results of the trial. This article does a pretty good job describing the controversy to a layperson audience, but I do not feel the result is all that controversial.
It is well-known to statistically-minded people that p-values are computations that rely on particular assumptions being made. One of those assumptions is that only a single, pre-specified hypothesis is being tested. By making additional comparisons, the p-value that was reported by Dr. Harkonen mis-represented the true significance of the trial. Perhaps factually the p-value was 0.004, but publicizing the p-value as if it were obtained by a fair trial, as opposed to finding it in a hunt for (quasi-)statistical significance, was a manipulation of the facts to support one's personal interests. That's not science; it's bias, and self-serving bias at that.
Just because the original study wasn't found to be significant, that doesn't mean that an already-existing subgroup wasn't significant. They used an existing clinical trait as a separate classifier to look at the patients in a different way. That isn't too controversial (if you have a high enough patient count, which was probably the biggest fault of this post-hoc analysis).
Then again, I'm a biologist. We're trained to not trust statisticians. (Of course, we're also trained to not speak in absolutes and to cover every statement in doubt, so he failed in that regard too.)
What is likely, although we cannot of course know for sure, is that the doctor looked at more than disease severity (the "existing clinical trait") as a separate classifier, hunting for subgroups for which the p-value indicated a promising trend.
The principle behind the proscription against multiple comparisons is well known to statisticians. If we consider a 1-in-20 chance result to be statistically significant, then, by chance alone, roughly one in every 20 null "trials" will yield a statistically significant result.
By dividing the patients into disease severity subgroups, Dr. Harkonen increased the number of "trials" from 1 to 4, thereby elevating the likelihood of yielding an effect that appeared to be statistically significant. If he also examined other subgroups in his quest to find a positive result, then he elevated the likelihood of finding a positive result toward certainty.
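To put rough numbers on that (treating the subgroup comparisons as independent tests at the conventional 0.05 cutoff, which is only an approximation):

    # Family-wise error rate: chance of at least one false positive among k tests.
    alpha = 0.05
    for k in (1, 4, 20):
        print(k, round(1 - (1 - alpha) ** k, 3))
    # prints 0.05 for k = 1, about 0.185 for k = 4, and about 0.642 for k = 20

So even an honest-looking split into four severity subgroups more than triples the odds of a spurious "hit" somewhere, before any further hunting.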
Our desire to find patterns and see cause and effect makes us prone to confirmation bias. We can guard against this bias with care, including the use of statistics. It was not a surprise that a subsequent study looking only at the "mild-to-moderate" group did not demonstrate any benefit of the treatment. The belief that the treatment would benefit "mild-to-moderate" patients was speciously derived.
(Even if they did correct for multiple tests, I think the subgroup might still have been significant. An uncorrected p-value of 0.004 is what is sticking in my head.)
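For what it's worth, whether 0.004 survives a correction depends entirely on how many comparisons you assume were actually made, which the article doesn't tell us. A quick Bonferroni check under a few guesses:

    # Does p = 0.004 clear a Bonferroni-adjusted threshold of alpha / k?
    p, alpha = 0.004, 0.05
    for k in (4, 10, 20):
        print(k, p < alpha / k)   # True for k = 4 and k = 10, False for k = 20

So it holds up if only the four severity subgroups were tested, but not if the data were sliced twenty different ways.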
But the point is, just because he's bad at statistics does that make it fraud? Based on what we know from the article, I'd argue no. People are allowed to be wrong and make mistakes in their analysis. They just aren't allowed to knowingly make those mistakes. And this is what we don't know... what he knew and what he thought at the time.
sure, but when people deliberately lie in order to gain millions of dollars, I can't get that upset when they get 6 months stuck in their house
he, and statisticians at his company, knew or should have known what he was doing was wrong. This stuff is covered in the first inferential stats course taken as an undergrad.
I initially thought he was partially justified in claiming the number with the 'mild-to-moderate patients' qualification, if there was a real benefit in the drug that the study had not been designed to detect. Except that a follow-up study focusing on the mild-to-moderate subgroup failed to show statistically significant benefits.
> This mild-to-moderate subgroup wasn’t one the researchers said they would analyze when they set up the study. Subdividing patients after the fact and looking for statistically significant results is a controversial practice. In its most extreme form, it’s scorned as “data dredging.” The term suggests that if you drag a net through a bunch of numbers enough times, you’ll come up with something significant sooner or later.
He could have just kept that data secret and run another trial specifically targeted at people with mild to moderate illness. That would have protected him legally, and made the numbers look even better.
That's the kind of thing that many people are campaigning against. Companies should release all the research they do rather than cherry picking the useful (to them) results.
Actually, that would have been way better. If they had done the study you suggest, and the result had still been significant, then he would have been entirely justified in reporting what he did.
The issue is that dividing the participants after the fact and then looking for correlation in the existing data reduces the significance of the statistic considerably (we have other statistics for that). The p-value is not representative when used that way.
But if you do another study focused on that group in particular and still get a significant result, you're fine! The problem isn't that they located a group on which the drug worked in a dishonest way, or some such - the problem is that they were dishonest to claim they had significant evidence that the drug worked on that group. If they'd done an additional study on that group in particular, they would have their evidence (or, of course, a null result).
i think the person you were replying to was implying that the same data be used, while what you are arguing is that there should be new observations made (and i agree with you, if the new work is independent; i just wanted to explain why i think the original comment was arguing for greater transparency).
I believe he's suggesting that the doctor could have legally covered up the results of the first trial (by simply not releasing them), then run a second trial on only the most beneficial population, releasing those results without mentioning the first trial.
At that point, his product would look great, hiding its failures.
This way, while he misinterpreted the p-value in an illegal and fraudulent way, he did release all relevant information - ironically, better for the informed reader than if he had rerun the trial legally.
Malician, I'm not sure you and DanBC understand this fully?
It would be absolutely fine to run a new trial on the supposedly most beneficial population (those with mild/moderate lung damage; let's call them 'the subpopulation').
If that second trial succeeded, then it would be strong evidence that the drug was beneficial for the subpopulation.
There would be no need to hide the results of the first trial, as the first trial did not provide evidence that the drug didn't work on the subpopulation.
If you read the article to the end, they did in fact do such a trial on the subpopulation. And they got evidence it wasn't working on the subpopulation - which is how science goes.
The problem was that the first trial wasn't set up to examine the subpopulation, but they reported results as if it was. You can't do that with standard NHST, as it invalidates the assumptions of the statistical framework being used.
But you can absolutely decide to run a whole new test on a new sub population, based on hints you get from the first results.
And, while it'd in general be better if all test results (positive AND negative) were published, that is not relevant to this situation - the first trial said nothing bad about the effects on the subpopulation, so there'd be nothing to gain from hiding it, if you just wanted to claim it worked on the subpopulation.
It's not like a situation where they got evidence in the first test that the subpopulation would not benefit from the drug, and then decided to do another test, planning to only report the second.
Yes, I understand this. This is correct if the result of the test on that said subpopulation is only interpreted by the public and/or scientific community as applying to the subpopulation.
However, if the results of the original test are hidden, the results of the second test could well be taken as evidence for a wider or stronger effect, yes? If this isn't the case, then I wouldn't see a problem with that behavior - but from the reading I've done, I suspect it is in fact the case and is common practice.
edit: I may be completely wrong on this - if, indeed, that's not a significant problem.
ah, ok. so, you're right, but not as right as the original issue being discussed :o) i can explain if you're interested...
what i think you're saying is that they would hide the original negative study and publish a subsequent (new, separate, on different people) positive study.
[aside - that's not a perfect description because for one particular group the first study was positive; it's just that the group in question wasn't explicitly targeted].
and, in general, that's considered a bad thing. because (1) you can keep repeating studies until you get a positive, and then publish and (2) because the negatives aren't published, people have incomplete information.
but it's not a terribly bad thing, because if something isn't true then, if you repeat a study, it's likely going to show it isn't true. the standards are set high enough that you'd need to do hundreds of studies before you showed something to be true (when it really isn't).
and because hundreds of studies are expensive, it's unlikely to happen (but then you think of the industry as a whole, and it is doing hundreds, and so some of those are likely wrong...).
in contrast, what this guy was prosecuted for was hunting in the data. you can think of that like doing a new study, but without the cost. it's pretty easy to dream up hundreds of different questions you can ask existing data. and just by luck you're going to find the occasional surprising answer.
so hunting through data is like doing hundreds of studies until you find something, but it's cheap! and that's why it's "worse" than simply hiding negative results and repeating studies. because it's much more likely to happen in practice.
I think you're getting hung up on their use of the word "hide." What they're saying is that the first study could have been disregarded except as a good reason to run the second study. Of course, that later happened, and the effect disappeared - but maybe it wouldn't have. That's how science works.
I don't think that you're disagreeing with them, just reiterating.
It would actually be ok to do what you say - the problem is that there WASN'T a 'most beneficial population' for which his treatment works; he made up that 'population' from data which, at best, show that this population might benefit more, if that's verified.
He wouldn't even need to keep it secret. It's completely legitimate to say "We were trying to prove XYZ and didn't, but the data does hint that ABC might be true. Let's do another study looking specifically at ABC."
I'll suggest reading to the end of the article. They did do another trial targeted at people with mild to moderate illness. It failed: "A little more than a year into the study, more people on the drug had died (15 percent) than people on placebo (13 percent)."
> FINDINGS:
At the second interim analysis, the hazard ratio for mortality in patients on interferon gamma-1b showed absence of minimum benefit compared with placebo (1.15, 95% CI 0.77-1.71, p=0.497), and indicated that the study should be stopped. After a median duration of 64 weeks (IQR 41-84) on treatment, 80 (15%) patients on interferon gamma-1b and 35 (13%) on placebo had died. Almost all patients reported at least one adverse event, and more patients on interferon gamma-1b group had constitutional signs and symptoms (influenza-like illness, fatigue, fever, and chills) than did those on placebo. Occurrence of serious adverse events (eg, pneumonia, respiratory failure) was similar for both treatment groups. Treatment adherence was good and few patients discontinued treatment prematurely in either group.
My point is that by saying that "more patients on the drug died", they're implying that the drug was itself killing people. With 35 patients, the 2% difference is about one person. They stopped the trial because the drug wasn't doing anything, but it's misleading to suggest that it was contributing further to mortality.
I don't think they are. I think I may have cut my quote in a bad place from that perspective. They continue:
"That was the death knell for the drug. Most insurers stopped paying for it."
I don't think they're implying that the drug killed people. I think they're saying that the study made it obvious it wasn't helping, so insurers stopped covering it and other consequences followed.
I agree that they needed the additional study (which is underway, I believe). However, it's important to realize that another study of that size is very expensive, and small vaccine companies often can't afford unplanned major studies without going back to the financial drawing board.
This is very interesting. In most papers I read during uni, the significance cutoff was always set at 0.10. But I suppose it makes sense to have more rigorous null-hypothesis testing when you are talking about saving lives. I'm curious to see, on the whole, how many researchers in pharma try to move the goalposts like this guy did.
Thanks for pointing that out. Yes, it's hard to believe that someone with so much education (degrees in Statistics, Atmospheric Science, Meteorology, and Math) and so much professional experience (University Professor, Wall Street Quant, National Weather Service, US Air Force) would get that completely wrong. What do you figure the chances of that are? ;)
This case, like so many others, appears to be the product of an overzealous prosecutor looking to add to his resume before he begins applying to work at much higher paying private law firms. The concept of moral hazard does not exist for prosecutors - they can take all the shots they want at other people with no consequences. Until there are consequences, we will continue seeing blatant abuses of our justice system for the personal gain of those that work within it. Though it will never happen, private law firms should simply refuse to hire former prosecutors - many of these nonsensical prosecutions would vanish overnight.