

The press-release conviction of a biotech CEO and its impact on research - bqe
http://articles.washingtonpost.com/2013-09-23/national/42314943_1_intermune-scott-harkonen-actimmune

======
jonnathanson
Was this article written by Dr. Harkonen's publicist or something?

The case seems pretty clear: he knowingly misrepresented the results of a drug
trial for the financial benefit of his firm and, by association, himself. He
did this at the possible expense of critically or terminally ill patients, and
at the further expense of the scientific and medical integrity of his
research. And he received 6 months of house arrest at his cushy, 3-story San
Francisco home as punishment. Forgive me if I don't strain myself reaching for
my violin.

I sincerely hope this piece is not representative of the journalistic
integrity of the _Post_ under its new ownership. The article's blatant slant,
its casual blending of editorial opinion and fact-based reporting, and its
weirdly patronizing tone (ex: "the so-called 'p-value'") do no justice to the
reputation of the newspaper.

The author opens with a rather silly rhetorical question, one with an obvious
answer:

 _" Is it a crime for a medical researcher to hype his results? To put a heavy
spin on the findings when there are millions of dollars, and possibly lives,
at stake?"_

Yes. Yes, it is. _Especially_ when there are millions of dollars, and possibly
lives, at stake. You don't get to cut corners in the scientific method because
you _think_ you're on to something.

~~~
nkurz
I got the opposite impression. In fact, I think this may be the best medical
statistics article I've ever seen in the popular press:

    
    
      But there was a problem. This mild-to-moderate subgroup
      wasn’t one the researchers said they would analyze when
      they set up the study. Subdividing patients after the
      fact and looking for statistically significant results
      is a controversial practice. In its most extreme form,
      it’s scorned as “data dredging.” The term suggests that
      if you drag a net through a bunch of numbers enough
      times, you’ll come up with something significant sooner
      or later."
    

I don't presume it has anything to do with Bezos, but if it does, I hope he
stays on a buying spree! I was inspired to find out more about the author:
[http://www.washingtonpost.com/david-brown/2011/02/28/AB2Y0sM...](http://www.washingtonpost.com/david-brown/2011/02/28/AB2Y0sM_page.html)

It turns out he's a part-time journalist and part-time licensed physician: "He
works four days a week at the Post and two-thirds of a day at a general
internal medicine clinic in Baltimore supervising third-year medical
students."

I also didn't find it biased toward Harkonen at all. Consider the closing:

    
    
      InterMune did run another trial. It was big — 826
      patients at 81 hospitals — in order to maximize the
      chance of getting clear-cut results. It enrolled only
      people with mild to moderate lung damage, the subgroup
      whose success was touted in the press release.
    
      And it failed. A little more than a year into the study,
      more people on the drug had died (15 percent) than people
      on placebo (13 percent). That was the death knell for the
      drug. 
    

It even links to the actual study:
[http://www.ncbi.nlm.nih.gov/pubmed/19570573?dopt=Abstract](http://www.ncbi.nlm.nih.gov/pubmed/19570573?dopt=Abstract)

I don't know if I've ever seen a major newspaper link directly to a primary
scientific paper as a source. I encourage you to read it again and see if your
view changes. I've read it twice now and think it is an absolutely stellar
piece of science writing, worthy of a Best Science Writing compilation.

~~~
freshhawk

        Subdividing patients after the fact and looking for statistically significant
        results is a controversial practice
    

If by controversial they meant "is considered outright fraud" ... then yes, it
is controversial. This is one of the parts that made me see an unacceptable
level of bias.

There are a lot of parts that are similar to this. It might do a better than
average job of explaining p-values and talking about the realities of
experimental design, but the tone in every case is dismissive in the "bunch of
eggheads worrying about little details" kind of way.

This is so biased that I assume there is an agenda here rather than just bad
journalism. The other explanation is that the writer is a good writer who
researched the unfamiliar terms but didn't understand any of it, and is good
enough at regurgitating information to make it look like they did understand
it. Either way, it's awful science writing even if they did use a lot of
science words correctly. I have a higher standard for my hypothetical Best
Science Writing awards, I guess ... like requiring they get the science right.

~~~
mbreese
> If by controversial they meant "is considered outright fraud" ... then yes,
> it is controversial

Hold on there... controversial, yes, but fraud - no. If the original data were
reported, and after the fact you discovered a class of patients that responded
differently than the rest _and_ you could find a good classifier, then it's
certainly not fraud. In this case, they weren't just taking all the "good"
patients and putting them in one ad hoc group and all the "bad" patients and
putting them in another. They found they were able to sub-classify the
patients using _existing_ clinical traits.

I thought the article did a pretty good job trying to explain the statistical
issues at play. As described, the study isn't very clear, but it's hardly
fraud to spin the results in that way. Is it reaching? Perhaps, but so is
calling this criminal. Remember, using 0.05 as a p-value cutoff for
significance is just a convention; the number itself is arbitrary. In my work,
I have to use something much more stringent, such as 0.001, due to multiple
testing issues.

In my reading, the issue wasn't the analysis. The issue was the hype of the
analysis. In this case, he definitely went overboard, but I'm not sure it rose
to the level of fraud.

~~~
PeterisP
If you divide patients into 20+ arbitrary groups (say, based on the first
letter of their name) and give them a zero-efficacy treatment, then almost
always you'll see a group with a p < 0.05 significant improvement; that's just
basic stats.

So, if you afterwards discover some class of patients with this different
response, it is grounds only to investigate that group further and separately,
nothing more - claiming that there is such an effect is fraud. You can't make
extra claims until you have separately verified that group.

For an illustration, see [http://xkcd.com/882/](http://xkcd.com/882/)
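
To make the arithmetic concrete, here's a rough Python sketch (my own toy
numbers, not anything from the article - 20 subgroups, 30 patients per arm,
and a treatment with no real effect at all):

    import random
    from scipy import stats

    random.seed(0)

    # Toy illustration only: a treatment with zero real effect, sliced
    # into 20 arbitrary subgroups, each tested separately at p < 0.05.
    n_subgroups, patients_per_arm, runs = 20, 30, 1000
    runs_with_a_hit = 0

    for _ in range(runs):
        hit = False
        for _ in range(n_subgroups):
            treated = [random.gauss(0, 1) for _ in range(patients_per_arm)]
            placebo = [random.gauss(0, 1) for _ in range(patients_per_arm)]
            if stats.ttest_ind(treated, placebo).pvalue < 0.05:
                hit = True
        if hit:
            runs_with_a_hit += 1

    # Expect roughly 1 - 0.95**20, i.e. about 64% of runs turn up at least
    # one "significant" subgroup even though the treatment does nothing.
    print(runs_with_a_hit / runs)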

~~~
mbreese
Of course, but they divided the patients into two groups: mild/moderate and
severe clinical phenotypes. It's not optimal, but it is far from subdividing
into 20 classes or making up arbitrary classifiers until you get the number
you're looking for. The clinical phenotypes were already established prior
to the analysis.

------
jerrytsai
As a trained (and hopefully ethical) statistician, I agree with the
interpretation that Dr. Harkonen did misrepresent the results of the trial.
This article does a pretty good job describing the controversy to a layperson
audience, but I do not feel the result is all that controversial.

It is well-known to statistically-minded people that p-values are computations
that rely on particular assumptions being made. One of those assumptions is
that only a single, pre-specified hypothesis is being tested. By making
additional comparisons, the p-value that was reported by Dr. Harkonen
misrepresented the true significance of the trial. Perhaps factually the p-value
was 0.004, but publicizing the p-value as if it were obtained by a fair trial,
as opposed to finding it in a hunt for (quasi-)statistical significance, was a
manipulation of the facts to support one's personal interests. That's not
science; it's bias, and self-serving bias at that.

~~~
mbreese
I'm not as convinced as you are.

Just because the original study wasn't found to be significant, that doesn't
mean that an already-existing subgroup wasn't significant. They used an
existing clinical trait as a separate classifier to look at the patients in a
different way. That isn't too controversial (if you have a high enough patient
count, which was probably the biggest fault of this post-hoc analysis).

Then again, I'm a biologist. We're trained to not trust statisticians. (Of
course, we're also trained to not speak in absolutes and to cover every
statement in doubt, so he failed in that regard too.)

~~~
jerrytsai
What is likely, although we cannot of course know for sure, is that the doctor
looked at more than disease severity (the "existing clinical trait") as a
separate classifier, hunting for subgroups for which the p-value indicated a
promising trend.

The principle behind the proscription against multiple comparisons is well
known to statisticians. If we consider a 1-chance-in-20 result to be
statistically significant, then, by chance alone, on average one in every 20
"trials" will yield a statistically significant result.

By dividing the patients into disease severity subgroups, Dr. Harkonen
increased the number of "trials" from 1 to 4, thereby elevating the likelihood
of yielding an effect that appeared to be statistically significant. If he
also examined other subgroups in his quest to find a positive result, then he
elevated the likelihood of finding a positive result toward certainty.
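
As a rough back-of-envelope (my own illustration; it assumes the subgroup
tests are independent, which they are not exactly):

    # Probability of at least one spurious "significant" result when
    # testing k independent subgroups at alpha = 0.05.
    alpha = 0.05
    for k in (1, 4, 20):
        print(k, round(1 - (1 - alpha) ** k, 3))
    # 1 -> 0.05, 4 -> 0.185, 20 -> 0.642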

Our desire to find patterns and see cause and effect make us prone to
confirmation bias. We can guard against this bias with care, including the use
of statistics. It was not a surprise that a subsequent study looking only at
the "mild-to-moderate" group did not demonstrate any benefit of the treatment.
The belief that the treatment would benefit "mild-to-moderate" patients was
speciously derived.

~~~
mbreese
Right... multiple testing correction, false-discovery rates, etc... I'm quite
familiar.

(Even if they did correct for multiple tests, I think the subgroup would have
potentially been significant. An uncorrected p-value of 0.004 is what is
sticking in my head.)
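
A minimal sketch of what I mean, using the four severity "trials" you describe
above (the count of four comes from your comment and the 0.004 from the
article; everything else here is my own illustration):

    # Simple Bonferroni correction: multiply the reported p-value by the
    # number of comparisons and cap at 1.
    p_reported = 0.004
    n_comparisons = 4   # the four subgroup "trials" described above
    p_corrected = min(1.0, p_reported * n_comparisons)
    print(p_corrected)  # 0.016, still under the conventional 0.05 cutoff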

But the point is, just because he's bad at statistics does that make it fraud?
Based on what we know from the article, I'd argue no. People are allowed to be
wrong and make mistakes in their analysis. They just aren't allowed to
knowingly make those mistakes. And this is what we don't know... what he knew
and what he thought _at the time_.

~~~
x0x0
Sure, but when people deliberately lie in order to gain millions of dollars, I
can't get that upset when they get 6 months stuck in their house.

He, and the statisticians at his company, knew or should have known that what
he was doing was wrong. This stuff is covered in the first inferential stats
course taken as an undergrad.

------
DanBC
> This mild-to-moderate subgroup wasn’t one the researchers said they would
> analyze when they set up the study. Subdividing patients after the fact and
> looking for statistically significant results is a controversial practice.
> In its most extreme form, it’s scorned as “data dredging.” The term suggests
> that if you drag a net through a bunch of numbers enough times, you’ll come
> up with something significant sooner or later.

He could have just kept that data secret and run another trial specifically
targeted at people with mild to moderate illness. That would have protected
him legally, and made the numbers look even better.

That's the kind of thing that many people are campaigning against. Companies
should release all the research they do rather than cherry picking the useful
(to them) results.

~~~
Bakkot
Actually, that would have been way better. If they had done the study you
suggest, and the result had still been significant, then he would have been
entirely justified in reporting what he did.

The issue is that dividing the participants after the fact and then looking
for correlation _in the existing data_ reduces the significance of the
statistic considerably (we have other statistics for that). The p-value is not
representative when used that way.

But if you do another study focused on that group in particular and still get
a significant result, you're fine! The problem isn't that they located a group
on which the drug worked in a dishonest way, or some such - the problem is
that they were dishonest to claim they had significant evidence that the drug
worked on that group. If they'd done an additional study on that group in
particular, they would have their evidence (or, of course, a null result).

~~~
andrewcooke
i think the person you were replying to was implying that the same data be
used, while what you are arguing is that there should be new observations made
(and i agree with you, if the new work is independent; i just wanted to
explain why i think the original comment was arguing for greater
transparency).

~~~
Malician
I believe he's suggesting that the doctor could have legally covered up the
results of the first trial (by simply not releasing them), then run a second
trial on only the most beneficial population, releasing those results without
mentioning the first trial.

At that point, his product would look great, hiding its failures.

This way, while he misinterpreted the p-value in an illegal and fraudulent
way, he did release all relevant information - ironically, better for the
informed reader than if he had rerun the trial legally.

~~~
feral
Malician, I'm not sure you and DanBC understand this fully?

It would be absolutely fine to run a new trial on the supposedly most
beneficial population (those with mild/moderate lung damage; let's call them
'the subpopulation').

If that second trial succeeded, then it would be strong evidence that the drug
was beneficial for the subpopulation.

There would be _no need_ to hide the results of the first trial, as the first
trial did not provide evidence that the drug didn't work on _the
subpopulation_.

If you read the article to the end, they did in fact do such a trial on the
subpopulation. And they got evidence it wasn't working on the subpopulation -
which is how science goes.

The problem was that the first trial wasn't set up to examine the
subpopulation, but they reported results as if it was. You can't do that with
standard NHST, as it invalidates the assumptions of the statistical framework
being used.

But you can absolutely decide to run a whole new test on a new sub population,
based on hints you get from the first results.

And, while it'd in general be better if all test results (positive AND
negative) were published, that is not relevant to this situation - the first
trial said nothing bad about the effects on _the subpopulation_, so there'd
be nothing to gain from hiding it, if you just wanted to claim it worked on
the subpopulation.

It's not like a situation where they got evidence that _the subpopulation_
would not benefit from the drug in the first test, and then decided to do
another test, planning to only report the second.

~~~
Malician
Yes, I understand this. This is correct if the result of the test on that
subpopulation is only interpreted by the public and/or scientific community as
applying to the subpopulation.

However, if the results of the original test are hidden, the results of the
second test could well be taken as evidence for a wider or stronger effect,
yes? If this isn't the case, then I wouldn't see a problem with that behavior
- but from the reading I've done, I suspect it is in fact the case and is
common practice.

edit: I may be completely wrong on this - if, indeed, that's not a significant
problem.

~~~
andrewcooke
ah, ok. so, you're right, but not as right as the original issue being
discussed :o) i can explain if you're interested...

what i think you're saying is that they would hide the original negative study
and publish a subsequent (new, separate, on different people) positive study.

[aside - that's not a perfect description because for one particular group the
first study was positive; it's just that the group in question wasn't
explicitly targeted].

and, in general, that's considered a bad thing. because (1) you can keep
repeating studies until you get a positive, and then publish and (2) because
the negatives aren't published, people have incomplete information.

but it's not a terribly bad thing, because if something isn't true then, if
you repeat a study, it's likely going to show it isn't true. the standards are
set high enough that you'd need to do hundreds of studies before you showed
something to be true (when it really isn't).

and because hundreds of studies are expensive, it's unlikely to happen (but
then you think of the industry as a whole, and it is doing hundreds, and so
some of those are likely wrong...).

in contrast, what this guy was prosecuted for was hunting in the data. you can
think of that like doing a new study, but without the cost. it's pretty easy
to dream up hundreds of different questions you can ask existing data. and
just by luck you're going to find the occasional surprising answer.

so hunting through data is like doing hundreds of studies until you find
something, but it's cheap! and that's why it's "worse" than simply hiding
negative results and repeating studies. because it's much more likely to
happen in practice.
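
a rough sketch of that arithmetic (my own toy numbers: each attempt is an
independent test at the usual 0.05 level, and i'm assuming a finding has to
replicate in a second independent study before anyone believes it - that's my
reading of why it would take hundreds of studies):

    # toy numbers only: each attempt (a repeated study, or one more
    # question asked of the same data) has a 5% chance of a spurious hit
    alpha = 0.05

    # chance of at least one spurious hit after n cheap questions
    for n in (1, 10, 100):
        print(n, round(1 - (1 - alpha) ** n, 3))   # 0.05, 0.401, 0.994

    # if a finding must also replicate in a second independent study, the
    # per-claim false-positive rate drops to alpha**2, so on average about
    # 1 / alpha**2 = 400 attempts before a false claim survives - expensive
    # with real studies, cheap when you're just hunting through data
    print(1 / alpha ** 2)   # 400.0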

------
JoeAltmaier
Alternate title: "Man commits fraud to profit from terminally ill patients,
gets slap on wrist"?

------
yarou
This is very interesting. In most papers I read during uni, the significance
threshold was always set to 0.10. But I suppose it makes sense to have a more
rigorous threshold for null hypothesis testing when you are talking about
saving lives. I'm curious to see, on the whole, whether all researchers in
pharma try to move the goalposts like this guy did.

------
greenyoda
Here's a statistician's take on this story:

[http://wmbriggs.com/blog/?p=9308](http://wmbriggs.com/blog/?p=9308)

~~~
sanskritabelt
Everybody reading this and saying 'oh! a statistician!' should remember that
Briggs is also, among other things, a global warming denier.

~~~
nkurz
Thanks for pointing that out. Yes, it's hard to believe that someone with so
much education (degrees in Statistics, Atmospheric Science, Meteorology, and
Math) and so much professional experience (University Professor, Wall Street
Quant, National Weather Service, US Air Force) would get that completely
wrong. What do you figure the chances of that are? ;)

[http://wmbriggs.com/blog/?page_id=1085](http://wmbriggs.com/blog/?page_id=1085)

~~~
sanskritabelt
Yeah, it's almost as if he has a track record of going against the evidence in
favor of acting the iconoclast.

------
mnbvcxza
Who's got the Dune quote for this?

~~~
foobarbazqux
Science is made up of so many things that appear obvious after they are
explained.

------
downandout
This case, like so many others, appears to be the product of an overzealous
prosecutor looking to add to his resume before he begins applying to work at
much higher-paying private law firms. The concept of moral hazard does not
exist for prosecutors - they can take all the shots they want at other people
with no consequences. Until there are consequences, we will continue seeing
blatant abuses of our justice system for the personal gain of those who work
within it. Though it will never happen, private law firms should simply refuse
to hire former prosecutors - many of these nonsensical prosecutions would
vanish overnight.

------
fiatmoney
Thank God R.A. Fisher still had all his toes when he invented the concept, or
we could have been stuck with a p-value threshold of 0.052631579...

