Hacker News

Actually, it's quite terrifying when you remember that stopping a clinical trial early is a great way to create statistical fraud. https://blogs.sciencemag.org/pipeline/archives/2019/08/08/st... cites an example of a drug whose clinical trial was stopped early because it was "obviously effective"... only for it to be discovered later that the "obviously effective" result was statistically spurious, and the drug did not confer any benefits whatsoever.

It's only a "great way to create statistical fraud" if you assume that the statisticians, and the FDA, are either stupid or corrupt. And contrary to popular belief, they tend to be competent enough to at least cope with the sort of hair-brained schemes HN believes would fool them.[0]

Stopping a trial early and its statistical implications should, by the way, be somewhat familiar to web developers: it's commonly done with A/B tests, and the related problem is the "multi-armed bandit" (https://en.wikipedia.org/wiki/Multi-armed_bandit#Empirical_m...)

[0]: Legend says they are even familiar with correlation != causation, and some have mastered the advanced skill of knowing that trial size matters, which is why they started, a few years back, to test drugs on more than one person.

> It's only a "great way to create statistical fraud" if you assume that the statisticians, and the FDA, are either stupid or corrupt.

This is a legitimate issue in the scientific community. The FDA made it a requirement to publish drug trial criteria beforehand, to prevent trials from stopping prematurely or adding more participants. This resulted in a massive drop in trials with positive results: "...before the registry was created, more than half of the published studies of heart disease showed positive results. After the registry was created, only 8 percent had positive results". [0]

The FDA and the scientific community aren't as "pure" as your flippant dismissal would suggest. The FDA can only do so much, and trial runners have misaligned incentives.

[0] https://www.npr.org/templates/transcript/transcript.php?stor...

> It's only a "great way to create statistical fraud" if you assume that the statisticians, and the FDA, are either stupid or corrupt.

Considering how much money is in pharmaceuticals, I'd definitely go with corrupt, especially since that explains how a majority of the world works now, and since there's currently an outbreak they'd likely want to capitalize on.

I'm not at all saying this is the case, but if something like that did come out, I really wouldn't be surprised, considering it seems to be a weekly occurrence that you hear about corrupt behavior now.

The issue with cynicism and corruption conspiracy theories is that they become a self-fulfilling prophecy.

If everyone is corrupt, then it’s OK if you are, etc.

If there's no fair application of the rule of law, there's no reason for the rule of law, only the rule of men. Better hope they're on your side!

My point is that clear corruption as we think of it (fraud, bribery, self-dealing) is NOT considered normal in most Western countries. I.e., it happens, but we've built institutions to counteract it. The FDA is notoriously bureaucratic in order to detect and prevent corruption, and, for the few cases that slip through, to correct it.

When you hear about corruption stories, that’s about the system WORKING against those cynics that are testing it for their own benefit.

Cynicism is self destructive. If our institutions are broken, we must work to repair them.

Are you familiar with what P. J. O'Rourke says about politicians?

Have a listen to this https://www.youtube.com/watch?v=JbIqKqojOZU

He is the author of Parliament of Whores

Parliament of Whores is an international best-selling political humor book by P. J. O'Rourke, published by Atlantic Monthly Press in 1991 and subtitled "A Lone Humorist Attempts to Explain the Entire US Government".

I feel like Hanlon's razor is applicable here.

> Stopping a trial early and its statistical implications should, by the way, be somewhat familiar to web developers: it's commonly done with A/B tests

AFAIK stopping an A/B test early is the way to go if you want to convince yourself (or your customer) that something has an effect, even though it doesn't.

Yes, if you "peek" at the results of an A/B test before it's done in order to decide whether to stop early, the numbers you "peeked" at have much lower statistical power than if you forswore stopping early. Obviously, failing to take that into account when drawing conclusions about the effectiveness of the treatment is a colossal mistake.

However, the decrease in statistical power is still quantifiable, and with the right math you can still calculate an accurate 95% confidence interval on the effect size (which will be much wider than the wrong math where you naively don't account for the "peeking"). And of course, it's totally possible that the treatment is so effective that even the accurate calculation shows that the lower bound on the effect size is so much higher than the control group that the responsible thing to do is to stop the trial early.

Here's some of the math on how much lower the statistical power is if you "peek": https://www.evanmiller.org/bayesian-ab-testing.html
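As a quick illustration of why naive peeking is dangerous, here's a toy A/A simulation (no true difference between arms; all parameters are invented). Running a plain two-proportion z-test at 20 interim looks and stopping at the first "significant" result inflates the false-positive rate well past the nominal 5%:

```python
import random

def run_ab_null(n=2000, peeks=20, alpha_z=1.96, seed=None):
    """One A/A test (both arms convert at 50%) with periodic peeking:
    declare a 'winner' the first time |z| crosses 1.96 at a checkpoint."""
    rng = random.Random(seed)
    a_wins = b_wins = 0
    checkpoints = {n * k // peeks for k in range(1, peeks + 1)}
    for i in range(1, n + 1):
        a_wins += rng.random() < 0.5   # arm A, true rate 50%
        b_wins += rng.random() < 0.5   # arm B, true rate 50%
        if i in checkpoints:
            pa, pb = a_wins / i, b_wins / i
            p = (a_wins + b_wins) / (2 * i)        # pooled rate
            se = (2 * p * (1 - p) / i) ** 0.5      # standard error of pa - pb
            if se > 0 and abs(pa - pb) / se > alpha_z:
                return True            # false positive: stopped early on noise
    return False

trials = 500
fp = sum(run_ab_null(seed=s) for s in range(trials)) / trials
# fp lands well above the 5% a single fixed-horizon test would give.
```

The fix isn't "never look"; it's using sequential methods (alpha spending, Bayesian approaches like the linked article) that account for the looking.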

The word is harebrained unless you think hair has a brain.

I was a contractor working on an information system at the FDA in the early 2000s, shortly before the Vioxx scandal (https://www.drugwatch.com/vioxx/). The number of cheats they were specifying to get around safeguards was scary. Fortunately, they fired the company I worked for and awarded the contract to the company employing the husband of one of the FDA managers. ISYN. A relief to leave.

rule #1 of hn: anyone on hn is smarter than everyone not on hn no matter the domain. we could solve all the world's problems by just getting everyone into hn!

rule #2 of hn: High signal to noise is preferred over low signal to noise. HN > twitter.

To be fair, and when it comes to health and biology, Twitter >> HN. The comments on most bio-related posts here make me cringe.

I read HN regularly, but for health and biology related information, I'd much rather turn to the various professional communities on Twitter. HN is at best "Lies my Bio 101 Professor Told Me" most of the time, and at worst a worked example of physicist/engineer's syndrome.

We have the best brains!

>>the sort of hair-brained schemes HN believes would fool them

Is there much value in an ad-hominem reply?

Or in generalizing it to an entire community? The irony seems hard to avoid when such a generalization is made without mentioning any statistical foundation, while pontificating about statistics.

> Is there much value in an ad-hominem reply?

> Or in generalizing it to an entire community?

I don't like this style either but sometimes I think we deserve it for being a smug echo chamber.

Basically, much the same thing could have been said in a different way and I would have upvoted it.

> we deserve it for being a smug echo chamber.

This and more. Almost every comment here is made with absolute confidence and authority, much more than is warranted. The person who said "it's a great way to commit statistical fraud" without a caveat sounded like a know-it-all who knows better than the FDA and the scientists who developed this cure in the first place, or who assumes they're outright unethical.

Most times such comments get upvoted. Sometimes they get downvoted when they're called out on their BS. So there's some balance.

Have you read any thread that involves statistics? I haven't really detected that low an opinion of the FDA in particular, but I recognised the broad outline painted.

I could add quite a few more generalisations myself while I'm at it. HN isn't solely populated by hyper-intelligent, well-reasoned, logical doctors who form their ideas completely independently of everyone else. There's going to be groupthink, there are going to be like-minded individuals attracted to one another; that's human nature, deal with it.

I don't even think it's an ad-hominem. HNers are generally argumentative gits (generalisation), who like finding an exception/hack/new way of looking at something (generalisation). I have no problem with some of those schemes being described as 'hairbrained' (sic although actually... [1]), but that's why I come here, because if people didn't think that way they wouldn't be Hackers.

[1] https://www.thefreedictionary.com/harebrained

> sic although actually...

Not sure if that was for comedic effect but I laughed out loud! A brilliant demonstration of the behaviour you describe.

It's a fair question, this shouldn't be downvoted even though I also think the original generalization was fair.

While I agree in general, the article indicates this was stopped after 681 of a planned 725 patients, which seems okay.

Additionally, my understanding from a quick read is that the SPMS trials were aiming to delay progression in disability, rather than eliminate the effects altogether. That seems trickier to measure, and also more subject to just... not spending enough time looking at it.

Ebola, on the other hand, has a binary outcome (dead vs cured) with a timeline of a couple weeks, IIRC.

> Ebola, on the other hand, has a binary outcome (dead vs cured) with a timeline of a couple weeks, IIRC.

Doesn't it leave (cured) people potentially permanently damaged as well? E.g., whatever organs were damaged are still damaged.

I think what the parent meant was that it's so often fatal that if you're alive then you win, even if you're not in the condition you were before.

> While I agree in general, the article indicates this was stopped after 681 of a planned 725 patients, which seems okay.

Are you talking about the European trial, referring to “358 patients with SP-MS were allocated placebo and 360 were allocated interferon beta-1b; 57 patients (31 placebo, 26 interferon beta-1b) were lost to follow-up.”?

If so, your interpretation isn't quite right. It's not saying they stopped 57 patients short; it's saying those 57 people didn't participate at all, but that's irrelevant to the early ending. The trial stopped short of its intended completion time, not short of a number of patients. The blog post indicates it was stopped 2 years early, out of a planned 2-3 year study period.

That trial wasn't a "statistically spurious event"; it was a flawed trial.

> the reason the first trial came to an exaggerated impression seemed to be the number of patients who might not have fully progressed to SPMS

Stopping early can lead to statistical fraud, which is why the bar is so high on doing so. But it has to be balanced with the recognition that, if the interim results are correct, continuing the trial will lead to a significant number of avoidable deaths.

And for what it's worth, given the believed cause of the flawed trial being cited, it sure sounds like if the trial had run to its conclusion it still would have produced flawed results.

Having worked in clinical trialling, I saw things that made me realise the final stats may be worth much less because of mismanagement of data.

I can give some hare-raising examples[0] but for obvious reasons... The one I can give: it was known that docs who prescribed this to multiple patients, and saw an apparent improvement they attributed to the new drug, would switch the patients on the old drug to the new one and not inform us. Obviously they couldn't inform us, or they'd invalidate the trial, and they knew it. And it was done entirely with the patients' best interests at heart. Docs care about their patients' lives.

That may have been rare and a flaw in the trial protocol, but much worse stuff was done via utter incompetence. And I mean including at the top of these giant drug companies. Run by idiots, really.

Wider lesson: just cos you hand over a process to a third party does not mean it's going to be done right.

[0] tribute to the thread elsewhere

I think it's hilarious that in this one HN thread we have a complete word swap of terms. You said Hare when it should be Hair and they did the opposite.

It's supposed to be hair-raising because it's alarming and surprising etc. Nothing to do with taking care of hares :)

Switching extra patients to a new drug would make the trial more conservative, though... so perhaps don't worry so much.

In general it's worth remembering that RCTs are estimating the effect of an intention to treat (the randomisation), not the treatment itself.
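A toy simulation (with made-up response rates) of the point above: if you analyse patients by the arm they were randomised to (intention to treat) while a fraction of the control arm secretly crosses over to the new drug, the estimated effect shrinks toward zero, i.e., the bias is conservative.

```python
import random

def itt_effect(n=20000, true_control=0.50, true_treated=0.80,
               crossover=0.25, seed=1):
    """Toy illustration: patients are analysed by randomised arm (ITT),
    but a fraction of the control arm secretly receives the new drug.
    All rates are hypothetical."""
    rng = random.Random(seed)
    # treatment arm: everyone actually gets the drug
    treat_ok = sum(rng.random() < true_treated for _ in range(n))
    # control arm: some patients are quietly switched by their doctors
    control_ok = 0
    for _ in range(n):
        rate = true_treated if rng.random() < crossover else true_control
        control_ok += rng.random() < rate
    return treat_ok / n - control_ok / n

est = itt_effect()
true_effect = 0.80 - 0.50
# analytically, E[est] = 0.30 - 0.25 * 0.30 = 0.225 < 0.30:
# crossover dilutes the measured effect, never inflates it
```

So undisclosed switching hurts power and can hide a real but modest effect, which is the parent commenter's worry, but it won't manufacture a false positive.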

> ...would make the trial more conservative though...

If the doc correctly divines that the drug is improving things, yes. But there is noise in the signal, so it may be just noise causing the few apparent improvements the doc sees. If so, the doc's action may be smothering a less-than-obvious signal.

There's too much at stake for that to be acceptable.

> ...are estimating the effect of an intention to treat ( the randomisation) not the treatment itself.

I don't understand. The protocol applies the drug (aka the treatment) and results are measured. I don't understand 'intention to treat'. What is 'intention' here, some tech term I am not familiar with?

There's quite a large statistical literature on stopping rules for trials. Any trial that is stopped would have had to plan this in advance, and an evaluation of this policy would be an important thing for the data monitoring and ethics committee to review, but it shouldn't be a cause for concern. OP is right to celebrate.

Just do stats all the way down.

There are 6 possibilities:

1- stop early, effective

2- stop early, ineffective

3- passed all trials, effective

4- passed all trials, ineffective

5- failed trials, effective

6- failed trials, ineffective

Assign a utility value (lives saved, risks, cost,...) and a probability to each situation and do the maths.

P(1) = P(3) + P(5)

P(2) = P(4) + P(6)

So, if (P(3) + P(5))U(1) + (P(4) + P(6))U(2) > P(3)U(3) + P(4)U(4) + P(5)U(5) + P(6)U(6) it is better to stop early.

The risk of fraud can be taken into account.
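A minimal sketch of that decision rule in code, using entirely hypothetical probabilities and utilities (not real trial data):

```python
def stop_early_better(p, u):
    """Decision rule from the comment above: p maps full-trial outcomes
    3..6 to probabilities, u maps outcomes 1..6 to utilities.
    Stop early iff the expected utility of stopping beats continuing."""
    eu_stop = (p[3] + p[5]) * u[1] + (p[4] + p[6]) * u[2]
    eu_continue = sum(p[k] * u[k] for k in (3, 4, 5, 6))
    return eu_stop > eu_continue

# Hypothetical numbers: the drug is probably effective, and stopping early
# on an effective drug saves many lives (high utility for outcome 1).
p = {3: 0.60, 4: 0.10, 5: 0.20, 6: 0.10}
u = {1: 100, 2: -50, 3: 80, 4: -10, 5: -30, 6: 0}
# eu_stop = 0.8*100 + 0.2*(-50) = 70; eu_continue = 48 - 1 - 6 + 0 = 41
```

With these invented inputs stopping early wins; the whole decision of course hinges on how honestly the probabilities and utilities are assessed.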

Statistics Done Wrong has a great chapter on this: https://www.statisticsdonewrong.com/regression.html

FYI people design trials with breakpoints, bandits etc. The issue isn't early stopping, the issue is inappropriate analysis of early stopping.

It's not that terrifying: there are many approved drugs that are not even more effective than cheaper, older drugs on the market. It takes a wide-scale deployment with thousands of patients to find rare side effects and to measure the true population-level effectiveness of a drug. This is in contrast to the clinical-trial-level "efficacy" of a drug, which is measured in a population that is more limited in both size (N) and diversity.

Really, the whole drug approval process is filled with uncertainty. Partly it's how the sausage is made in biotech, but pharma companies also love this method, as it allows them to put forward non-inferior drugs with higher margins. We can and should reform the process, but there will still be statistical uncertainty for hundreds of new candidates until they reach the post-market study phase.

It seems that if this were the case for this new drug, it would be fairly noticeable when the death rate jumped back up to 50%.

That works both ways too: the mortality rate went from 50-75% to 6% when patients sought treatment immediately, which is a significant indicator of success.

Doesn't giving the drug to everyone in the trial at least add some statistical information (i.e., the likelihood of it working for everyone vs. it being a spurious event)?

That's not statistical fraud.

It can be important to ethically stop a trial based on the information available, even if the information available later turns out to be incorrect.

Right, but it's Ebola. Under conditions of certain, painful death, it's difficult to imagine a scenario where the risk of no benefit outweighs the cost of taking the drug.

These are, for the record, most often specified pre-trial (as are the stopping thresholds for harm).

No; when certain conditions lead to certain outcomes (cancer, infections, and so on), you cannot really go wrong when results come in early and are so positive that they are massively, significantly different from the usual course. Effect size.

Indeed you can go very wrong by not doing so. Medical ethics is a balance of risk and benefit, not "What is best for statistical power?" Equipoise is required for trials, and is often a precondition of being able to do studies in these settings.

Heck, a lot of recent work on novel trial designs like stepped-wedge are specifically designed to balance equipoise and statistical needs.
