Case Studies Where Phase 2 and Phase 3 Trials had Divergent Results [pdf] (fda.gov)
104 points by aaavl2821 6 months ago | 31 comments

A more recent, and very high profile, example of this is Incyte and Merck's failed study of Incyte's IDO inhibitor in combo with Merck's PD-1 inhibitor [0]

Immuno-oncology has been the most active area in biotech VC for years. The excitement has been driven by PD-1 and PD-L1 studies showing dramatic benefit in many types of cancer. However, only 20-50% of patients respond to these drugs, so VCs and pharma companies have been spending billions of dollars studying "combo therapies", basically regimens where you add drug x to a PD-1 inhibitor to improve effectiveness and response rate.

IDO was one of the most promising combo drugs. Bristol-Myers bought a leading IDO startup for $1B 18 months after its Series A. After this study, pretty much all IDO programs were canned across the industry. Beyond that, many investors / companies now want to see that a drug works as a standalone agent, not just in combo with another drug.

As the paper says, there are lots of valid reasons to use innovative study designs, but proper phase 3 RCTs are important.

[0] https://www.businesswire.com/news/home/20180406005141/en/Inc...

I was surprised, when I first got involved with drug development, by how rushed the process is. Phase I and Phase II trials are initiated by pharma with strikingly little understanding of how the drug works. A classic example is iniparib, which made it to a phase 3 trial but turned out not to actually hit the target it was supposed to hit (PARP).

The reason for this is I think economics. There is a large advantage to being first to market, especially if the disease has no other treatments. This means that you can set your price, there is zero competition, and you disincentivize all other companies from developing similar drugs. Under these conditions, you can recoup development and trial costs in 1 - 2 years. So it is no wonder that drugs with any sort of signal are pushed quickly through early phase trials.

Yes, this is a very good point -- you're on the clock as soon as you file your patent, so sponsors have an incredibly strong incentive to take risks with trial design and use surrogate endpoints rather than direct endpoints for the primary to cut down on time. Then it's really an educated guess (albeit a highly educated guess) for your Phase III trial design to balance out your market vs chances of proving efficacy.

Patent life is an important factor, but the other one is that it’s sometimes really hard to validate how a drug works. Either you don’t have the technology, money or time to do it. And in the end all that matters is it works in humans.

My favorite story is about gabapentin. Designed to block the GABAase enzyme. It made it all the way to market before they realized it actually works through a different mechanism.

As someone on a phase 2 trial, an interesting note is that I know I'm on the trial drug (it's not a double blind trial). I also know that the drug worked in animal and in vitro trials.

Furthermore I know, from the trial literature, that a very similar drug works (the trial drug is an anti-PD-1 immunotherapy) well with the particular genetic markers on my tumours.

The trial is to assess the efficacy of this treatment, not run a comparison to other treatments, but having burnt through 2 rounds of chemo in short order to no effect it is more a comparison against that.

Whether the placebo effect is relevant is an interesting question because under the trial the treatment is, by necessity, more personalised.

Previous studies have shown that, essentially, human contact/interest adds to the placebo effect.

This happened to Otonomy about a year ago and their stock fell off a cliff. The interesting thing with them was that they were doing 2 sets of phase 3 trials: one in the US, and one in Europe. A little while after the US one failed, the data from the European one came out and showed statistical significance. I remember reading that some of the scientists at Otonomy believed the problem with the US study was that some of the doctors who were administering the drug were telling the patients that it really worked, because the success for placebo patients was higher in the US than it was in Europe.

A lot of people underestimate the challenge of running a clinical trial. Clinical trials are often run across multiple sites/countries where medical standards differ. The placebo effect can be brutal particularly in the more subjective trial endpoints.

People crap all over the anti-depressant clinical trial data, but guess what? It's a poster boy for the placebo effect and for trying to develop objective endpoints you can measure.

Here is a recent example where the same trial has and has not been a success, depending on how you look at it: https://www.forbes.com/sites/matthewherper/2018/07/05/biogen...

After looking at the first few examples I skimmed the summary header: ALL of the (examples of) phase 3 failures include lack of efficacy (some also lack of safety) as the main failure.

Since medical trials are very expensive and time consuming, I can't help but wonder if identifying this would have been possible from prior phases, possibly with better study design of at least the phase 2 version of the study.

The most commonly cited study of success rates across trials lists Phase 3 probability of success, in aggregate, at 70% [0]

Phase 2 success rate is 34%.

Phase 3 studies are more expensive than Phase 2, but the higher failure rate in Phase 2 makes Phase 2 failure the largest driver of the cost of drug development.
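
A rough sketch of the arithmetic behind this claim, using only the two success rates quoted above. The per-trial costs are hypothetical placeholders (not from the cited study), and the sketch assumes every successful Phase 2 goes on to Phase 3. Under those assumptions, whether Phase 2 failures dominate depends on how much more a Phase 3 costs than a Phase 2.

```python
# Success rates quoted in the comment above; any costs passed in are made up.
P2_SUCCESS = 0.34
P3_SUCCESS = 0.70


def starts_per_phase3_success(p2=P2_SUCCESS, p3=P3_SUCCESS):
    """Expected Phase 2 and Phase 3 starts needed for one Phase 3 success."""
    p3_starts = 1.0 / p3          # Phase 3 trials run per Phase 3 success
    p2_starts = p3_starts / p2    # Phase 2 trials run to feed those Phase 3s
    return p2_starts, p3_starts


def failure_spend(cost_p2, cost_p3, p2=P2_SUCCESS, p3=P3_SUCCESS):
    """Expected money spent on failed trials, per Phase 3 success."""
    p2_starts, p3_starts = starts_per_phase3_success(p2, p3)
    failed_p2 = p2_starts - p3_starts   # Phase 2 trials that did not advance
    failed_p3 = p3_starts - 1.0         # Phase 3 trials that did not succeed
    return failed_p2 * cost_p2, failed_p3 * cost_p3


def break_even_cost_ratio(p2=P2_SUCCESS, p3=P3_SUCCESS):
    """Phase 3 / Phase 2 cost ratio below which Phase 2 failures cost more."""
    return (1.0 - p2) / (p2 * (1.0 - p3))


if __name__ == "__main__":
    p2_starts, p3_starts = starts_per_phase3_success()
    print(f"Phase 2 starts per Phase 3 success: {p2_starts:.2f}")
    print(f"Phase 3 starts per Phase 3 success: {p3_starts:.2f}")
    # Hypothetical costs in $M, chosen only to illustrate the comparison.
    spend_p2, spend_p3 = failure_spend(cost_p2=20.0, cost_p3=100.0)
    print(f"Failed Phase 2 spend: {spend_p2:.1f}, failed Phase 3 spend: {spend_p3:.1f}")
    print(f"Break-even Phase 3 / Phase 2 cost ratio: {break_even_cost_ratio():.2f}")
```

With these rates, roughly four Phase 2 starts are needed per Phase 3 success, so failed Phase 2 spend exceeds failed Phase 3 spend as long as a Phase 3 costs less than about 6.5 times a Phase 2.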

Because predicting Phase 2 success from earlier stage data is a daunting task, the industry has started tackling this issue by "failing faster / cheaper", getting to Phase 2 data quickly and cheaply, to reduce the cost of Phase 2 failure. However, companies don't want to fail -- so they sometimes design Phase 1/2 proof of concept studies to increase chance of success rather than increase truth finding. This may end up causing the Phase 3 failure rate to increase, though there isn't yet data to prove this. The IDO-PD1 failure from Merck / Incyte could be interpreted as an example of this phenomenon.

I wrote a blog post that in part discusses ways companies are trying to reduce cost of Phase 2 failure: https://newbio.tech/blog/bio_startup_ideas.html

[0] https://www.ncbi.nlm.nih.gov/pubmed/20168317

The problem is that we don't have any reliable means to predict in vivo efficacy.

Recognizing failed attempts earlier means substantial reductions to the cost of drugs (i.e., cost savings on the order of hundreds of millions of dollars), so there is very strong financial incentive to stop projects that will fail phase 3 as early as possible. Pretty much every pharmaceutical company has "killing off failures earlier" as its "what we'll start doing better" goal. That nothing has appreciably changed in this regard suggests that the fix is beyond our current capabilities--human biology is simply too complicated and too confusing for us to reliably predict at this point.

Actually, 1 out of the 22 (Olanzapine Pamoate) only had lack of safety, not lack of efficacy. It went on to be approved with a “Risk Evaluation and Mitigation Strategy”.

Figuring this out is a "and now I am rich beyond avarice" scale problem.

The problem is the marketing department. Most of the big diseases (cancer, depression, heart disease, dementia, etc) have complex and diverse causes and so only a subset of patients will respond to any one drug. At Phase 2 the trial co-ordinators have relatively strict inclusion criteria targeting the likely responders (i.e. the subset that have the version of the disease the drug will work on).

When it gets to Phase 3 marketing gets involved in the inclusion criteria and they want the criteria to be as broad as possible so they can market to the widest number of patients. The aim is to have a criteria that is as broad as possible where the Phase 3 trial just scrapes across the line. This is a difficult thing to get right and sometime they include too many non-responders and the trial falls over.
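
To make the dilution argument concrete, here is a minimal sketch. It assumes a continuous endpoint, a drug that helps responders by a fixed amount and non-responders not at all, and an outcome spread that does not change as the population broadens (a simplification). The sample size formula is the usual normal approximation for comparing two means; the 0.5 SD effect is a made-up number.

```python
import math
from statistics import NormalDist


def diluted_effect(effect_in_responders, responder_fraction):
    """Average effect when only a fraction of enrolled patients respond."""
    return effect_in_responders * responder_fraction


def patients_per_arm(effect, sd=1.0, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-arm comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)               # quantile for the desired power
    return math.ceil(2.0 * ((z_alpha + z_beta) * sd / effect) ** 2)


if __name__ == "__main__":
    # Hypothetical: responders improve by 0.5 SD, non-responders by nothing.
    for fraction in (1.0, 0.75, 0.5, 0.25):
        effect = diluted_effect(0.5, fraction)
        print(f"{fraction:.0%} responders -> effect {effect:.3f} SD, "
              f"{patients_per_arm(effect)} patients per arm")
```

Because the required sample size scales with one over the effect squared, halving the responder fraction roughly quadruples the patients needed; a trial sized for the narrower population is underpowered for the broader one.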

The only real solution is to get the cost of developing a drug down so that there is not the need to go so broad. Of course this will require taking more risks (the level of pre-clinical and clinical testing needs to be reduced) so I am not sure this is viable.

This isn't quite right - the selection of patients is usually not explicitly looser in phase 3 trials.

Speaking from experience in oncology, in early phase trials, the patients are younger, healthier, wealthier, better educated, have less burden of disease and also probably have better disease biology. This is partly because doctors have to decide to refer a patient for a trial, which may be at another hospital or even another city - they select the 'best' patients for these purposes. This is just being practical in many ways - an 80-year-old in a wheelchair isn't going to be able to spend 2 hours commuting every day to enrol in a phase I trial. A single mother with 4 kids working 3 jobs is also going to struggle to take multiple days off work for the trial. Patients that self-select for early phase trials are also a distinct group, generally healthier, wealthier and better educated.

There is a subtle point about disease biology - many early phase trials start in populations that have run out of other treatment options. You actually need a relatively indolent cancer to make it to this stage. Very aggressive tumours will kill the patient before they can get through 1 or max 2 lines of therapy to even be eligible for the trial.

In phase 3 trials, these selection biases are less prevalent (that's the point) and randomisation also takes out the effect of patients that were going to do relatively well anyway receiving a treatment that may not be effective.

As someone who works in the industry, none of this seems accurate.

- Marketing gets involved before phase 3. Phase 2 results are usually where the business decision to proceed to phase 3 is made, so commercial needs input on the trial design.

- Inclusion/exclusion criteria might be tweaked between phase 2 and 3, but they don't tend to drastically change as you're just asking for the trial to fail.

- Development works with the FDA to craft inclusion criteria that preserve the scientific rigor of the trials, but also allow for the broadest possible indication. In general, the FDA grants a much broader label than what is described in the trial criteria.

The point about industry moving away from drugs that require large phase 3 studies to get approval in large populations is accurate. The vast majority of r&d these days goes to cancer -- increasingly genetically defined homogenous populations -- and rare disease. Smaller markets, but theoretically more tractable patient population (genetically defined population treated with genetically targeted tx) and higher pricing power

You have basically just summarised what I said :)

I would not say the criteria are just “tweaked” between Phase 2 and 3 with any drug aimed at a broad market. It is always a balance between broadness and success and sometimes even the best in the business get it wrong.

The FDA do allow a broader label than the exact trial criteria, but not massively so. If you restrict the Phase 3 to an identifiable sub-population good luck getting approval to expand to the larger population without running post-approval trials.

Where we disagree is on three points: (1) marketing gets involved well before phase 3, (2) most trial criteria don’t change that much between phase 2 and 3 [yes there are exceptions] and (3) the FDA allowing a broader label than inclusion criteria.

As for the last one, most phase 3 trials have strict criteria: no other diseases, age limits, disease severity limits, etc. Those typically don’t make it into the label’s indication.

I do agree that trials are designed to balance getting a positive outcome vs. the broadest population possible, but marketing isn’t the only one driving that.

1. Yes marketing gets involved before Phase 3, but they tend to not have the clout they have going into Phase 3 - generally when it is unclear that you have something that works at all it is hard for marketing to argue in favour of broadening the inclusion criteria.

2. I think your point 2 is where we disagree the most. We are really arguing over what is a significant change or not.

3. The FDA will be looser on some things like age and other diseases (within reason as you won’t get approval for children if you only trial in adults), but not on sub-groups. If you make your trial exclude on some disease sub-group criteria you are not going to get FDA approval to go outside that sub-group and sell to everyone with the disease.

Who is driving trial broadening other than marketing? If I want the drug to succeed I want the trial population to be as homogeneous as possible and only include likely responders.

You sound like you understand this process well, so dumb question: why is the Phase 3 process so rigid?

Wouldn't it make more sense to simply have multiple inclusion criteria, (e.g. broad, more targeted, and same as Phase 2) and collect enough data in the front to be able to discriminate the patient at an individual level?

That way if Phase 3 for 'broad' criteria fail, you can still demonstrate the drug works for a more strict population and maybe actually release it with that on the label?

This also avoids data mining (where you keep looking for subsets until one seems to pass your significance criterion) since you came up with the criteria ahead of time.

You'll need more patients, which is expensive.

But yes, you're actually right -- sponsors do run stratified phase 3 trials where subsets of patients are measured independently per a pre-approved statistical analysis plan. The rub is that your patient population and your trial design need to be sufficiently statistically powered to do that kind of analysis. The required number of patients quickly increases with the number of stratifications/subsets you're trying to prove an indication for, and patients are extremely expensive, so sponsors generally try to run the smallest trial possible while still being adequately statistically powered.

This is generally why in Phase IIs, sponsors will stratify by exploratory biomarkers, where there may not be sufficient statistical power per stratification, but can give them some idea for what to include in the IIIs.
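
A minimal sketch of why enrollment grows faster than the number of strata, assuming each stratum must be powered on its own and the overall false-positive rate is split evenly across strata (a Bonferroni correction, which is my assumption here; real statistical analysis plans often use less conservative schemes). The 0.5 SD effect size is hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_arm(effect, alpha, power=0.80, sd=1.0):
    """Normal-approximation sample size for a two-arm comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)               # quantile for the desired power
    return math.ceil(2.0 * ((z_alpha + z_beta) * sd / effect) ** 2)


def total_patients(n_strata, effect=0.5, family_alpha=0.05, power=0.80):
    """Total enrollment if every stratum is independently powered.

    Alpha is divided equally among strata (Bonferroni), so each stratum
    faces a stricter threshold and needs more patients than a lone trial.
    """
    alpha_each = family_alpha / n_strata
    return 2 * n_strata * patients_per_arm(effect, alpha_each, power)


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(f"{k} strata -> {total_patients(k)} patients in total")
```

Each extra stratum adds a full two-arm trial's worth of patients and also tightens the threshold for all the others, so the total grows faster than linearly.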

The reason why is that you can’t make any money just selling to the sub-population in many cases.

Yes you are right that you can design the trial to identify the sub-population that the drug works in. Sometimes drug companies do do this, but if they are going to go down this path they normally run a tighter Phase 3 trial and then try to expand post-approval by running more trials. It is all a cost/benefit analysis taking into account the patent life and lots of other commercial factors.

The point I was making is that a high failure rate at Phase 3 is to be expected given the marketing needs of the company. A Phase 3 trial is not just a scientific exercise in proving a drug works, it is a complex balancing act where commercial concerns are critical.

It is quite possible to succeed and still fail your trial. Which is one of the many reasons these things should be thrown out in favor of patient choice. You could spend 1/10 of the cost of later trials on publishing risk assessments and still be far ahead of the game of risk management.

Another problem is that too many of today's medical technologies are horribly marginal in their benefits. Marginal benefits have a way of smoothing out to nothing as the patient population broadens. The industry isn't approaching medicine for age-related disease in the right way. If you are actually addressing a useful mechanism, the effect won't be marginal. Science managed that shift from marginal to effective for infectious disease. There is now a pending transition for the diseases of aging - from marginal messing with the disease state to actually addressing the root causes.

Here is a phase 3 that succeeded and failed. Clearly it works. But the study design is such that they had an unexpected result:


They show a good effect in the treated eye, and their gene therapy also produces benefits in the untreated eye. Since they don't know how that could happen, no approval.

Patient choice is an idea that sounds good in theory but just has no good way of actually working in practice.

Most people are going to be uninformed about medical science. (Hell, most medical practitioners struggle to stay informed). Worse, there is a strong bias towards misinformation--look at the health supplement market, which is basically a market the seedy snake oil sellers set up to get around FDA regulation.

On top of this, keep in mind that it is very easy to accidentally design bad experiment methodology. With respect to clinical trials, it's all too easy to take a failed trial and say "it actually worked on this subgroup" because there are often enough subgroups to find one with a spurious correlation. And when you have monetary pressure to salvage a failed product, the p-hacking is less likely to be internally questioned. The current regulatory regime requires that you declare how you're going to define acceptance before you run the trial to prevent this kind of p-hacking, and if you truly believe that you're not p-hacking, you're welcome to submit another trial to show correlation on that subgroup (unsurprisingly, many of these end up coming negative).
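
As a back-of-the-envelope illustration of the subgroup problem, assume a drug with no real effect and subgroup tests that are independent of each other (real subgroups overlap, so this is only a rough guide). The chance that at least one subgroup clears the significance threshold by luck is:

```python
def chance_of_spurious_subgroup(n_subgroups, alpha=0.05):
    """P(at least one subgroup passes by chance) for independent tests of a null drug."""
    return 1.0 - (1.0 - alpha) ** n_subgroups


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:2d} subgroups -> {chance_of_spurious_subgroup(k):.0%} "
              "chance of a spurious 'win'")
```

With 20 post hoc subgroups at the usual 0.05 threshold, a spurious "it worked in this subgroup" finding is more likely than not, which is why the acceptance criteria have to be declared before the trial runs.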

> Worse, there is a strong bias towards misinformation--look at the health supplement market, which is basically a market the seedy snake oil sellers set up to get around FDA regulation.

The Pure Food and Drug Act was passed to deal with the snake oil industry, which was in the process of reinventing and legitimizing itself into the pharmaceutical industry. The supplement industry was later exempted from the requirements for patent medicine producers because there is no way to monopolize the profits of many treatments based on supplements. Some companies have patented certain aspects of their supplement's production or delivery mechanisms, but (I believe) the natural biological molecule itself can't be patented.

https://en.wikipedia.org/wiki/Pure_Food_and_Drug_Act (1906)

https://en.wikipedia.org/wiki/Federal_Food%2C_Drug%2C_and_Co... (1938)

You can't have an informed risk assessment without clinical trials because you don't have the data to inform assumptions about risk / benefit. The models currently used to predict risk / benefit are really poor predictors of whether something is safe and effective. If you don't do phase 3 RCTs you'll just end up w a bunch of people gaming the risk assessments and getting drugs that don't work approved

Statistics and study powering are designed to address the issue you cite about marginal benefits "smoothing out" as the population gets bigger. That's why you have to do big phase 3 studies with __predefined__ endpoints and statistical analysis plans powered by estimated effect sizes

Saying that phase 3 study "clearly worked" is not true. It didn't do better than sham. Yes it could be true that there is some drug related benefit in the nontreated eye, but another interpretation is the sham injection itself could cause the observed improvement. Or the study design was wonky, endpoints inconsistently measured, etc

> If you are actually addressing a useful mechanism, the effect won't be marginal.

If there are 10,000 different mechanisms that each only affect 0.01% of the population, because the actual "problem" is just gradual DNA damage over time leading to one of 10,000 different problems... then what else is there to do but to come up with 10,000 solutions? (I mean, you could figure out how to entirely replace DNA with something that isn't affected by radiation, or has cryptographically-strong checksums, or something. But in terms of realistic solutions...?)

In the case of infectious disease, clearly your scenario isn't what happens in the real world. Otherwise no disease could be cured.

In the case of aging, everyone ages for the same reasons. A few types of underlying molecular damage that are comparatively straightforward to investigate and address, when compared to the enormous complexity of metabolism. Look at senolytic drugs: remove senescent cells, life span increases, aspects of aging are reversed in old individuals. Aging is like rust in a complex metal structure; the failure modes appear varied and complex because the structure is complex. But rust isn't complicated. Aging is the same story.

Genetic variation is not particularly meaningful for the vast majority of age-related disease. It isn't important for most of the life span. It only becomes even somewhat influential in very late life where there is variation in resistance to damage and consequences of damage. But why care thing one about that when the right strategy is to repair the underlying damage in order to ensure that people either never enter or are removed from the situation in which genetics start to matter a little?

"In the case of infectious disease, clearly your scenario isn't what happens in the real world. Otherwise no disease could be cured."

Note that your statement is only true for a relatively narrow range of infectious diseases, and mostly in special cases where we have "Nuke the site from orbit" class drugs that don't target things eukaryotic cells use.

> Here is a phase 3 that succeeded and failed. Clearly it works.

That's not "clearly works". Quantifying success with statistics is how you know it clearly works: the result is statistically significant. Otherwise your "clearly works" statement is just your opinion and the opinion of a drug company trying to make money.

If it both fails and succeeds in phase 3, then it's not clear at all whether it works.
