Regression to the mean is the main reason ineffective treatments appear to work (dcscience.net)
137 points by Amorymeltzer on Dec 15, 2015 | 55 comments

Also known as the Drill Sergeant Paradox.

Imagine that shouting has no actual effect on performance, but it is traditional to shout at underlings when they do something particularly poorly. When your trainees screw up, you berate them - and afterwards they actually do tend to do better. Unfortunately, this is because the screwup is more often than not a random variation, and the improvement is due to the mean regression, not the treatment. Conversely, praising them when they do well (again, assuming no underlying effect) actually seems to worsen their performance.
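A minimal sketch of the scenario above, assuming each recruit's score is a stable skill plus independent noise, and that shouting and praise do nothing at all. The function name, the score scale, and the 10% cutoffs are made up for illustration:

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def drill_sergeant(n_recruits=10_000, seed=1):
    """Two tests per recruit; shouting and praise have zero effect by construction."""
    rng = random.Random(seed)
    skill = [rng.gauss(50, 5) for _ in range(n_recruits)]   # stable ability (assumed scale)
    test1 = [s + rng.gauss(0, 10) for s in skill]           # skill + luck on the day
    test2 = [s + rng.gauss(0, 10) for s in skill]           # same skill, fresh luck

    order = sorted(range(n_recruits), key=lambda i: test1[i])
    k = n_recruits // 10
    worst, best = order[:k], order[-k:]                     # yelled at / praised after test 1

    return {
        "yelled_before": avg([test1[i] for i in worst]),
        "yelled_after": avg([test2[i] for i in worst]),
        "praised_before": avg([test1[i] for i in best]),
        "praised_after": avg([test2[i] for i in best]),
    }

if __name__ == "__main__":
    for name, value in drill_sergeant().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the yelled-at group scores higher on the retest and the praised group scores lower, even though nothing about the recruits changed between tests.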

My favorite way to illustrate this is to imagine your 'recruits' are coins that you will flip. Heads is a good result, tails is a bad result.

Now, flip all your coins. Yell at the ones that got tails, praise the ones that got heads. Now, repeat your test. Wow, 50% of the ones you yelled at 'improved' and 50% of the ones you 'praised' got worse!

It is clear we need to yell at coins more.
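The coin version can be sketched in a few lines, assuming fair coins and independent flips (the function name and the number of coins are arbitrary choices):

```python
import random

def yell_at_coins(n_coins=100_000, seed=0):
    """Flip every coin twice; 'yell' at first-round tails, 'praise' first-round heads."""
    rng = random.Random(seed)
    first = [rng.random() < 0.5 for _ in range(n_coins)]    # True = heads (good)
    second = [rng.random() < 0.5 for _ in range(n_coins)]

    yelled = [s for f, s in zip(first, second) if not f]    # second flips of the yelled-at coins
    praised = [s for f, s in zip(first, second) if f]       # second flips of the praised coins

    improved = sum(yelled) / len(yelled)                          # tails -> heads
    worsened = sum(1 for s in praised if not s) / len(praised)    # heads -> tails
    return improved, worsened

if __name__ == "__main__":
    improved, worsened = yell_at_coins()
    print(f"yelled-at coins that improved: {improved:.3f}")
    print(f"praised coins that got worse:  {worsened:.3f}")
```

Both fractions come out close to one half, which is all the "effect" of yelling amounts to here.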

I wonder, has anyone ever tried berating the successes and praising the failures?

The VC culture? If you define failure as "didn't make any money" and success as "made some money". Recently there was an article lamenting the "early exit syndrome" that startups have, where they prefer getting some money to potentially getting a lot more or nothing.

I think you might be referring to the early exit syndrome of Midwestern startups [1].

The point of the article was not to say that exits are bad, only that to develop a cultural milieu of technology in a particular region, there is a need for at least some companies to stick it out and remain determined to become long-lived and predicated upon a chosen workplace culture.

If companies never do that, then a given region can't generate enough inertia to remain a competitive place, and this has all sorts of bad effects on the employment market in that area.

I thought Michael O. Church had a great extension of that article in his recent post [2]. He argues that the broader short-term goals of VCs in general are misaligned with the fundamental value creation premise that undergirds the idea of start-up culture (or at least what was start-up culture when the term was new).

[1] https://news.ycombinator.com/item?id=10579370

[2] https://michaelochurch.wordpress.com/2015/11/17/its-not-earl...

That's a really dumb idea ;) </meta>

As written, I agree. But a more sensible way (well, kinda, still backwards): Praise the individuals even when they screw up (great job cleaning that kitchen, even though there's still dirt under all the tables) and berate them when they do well (nitpick even the tiniest speck of remaining dirt in an otherwise immaculate kitchen).

Still seems like a bad idea. Destroyed morale for the ones that do well but get scolded anyways, and encouraged laziness for the ones that do poorly.

I believe this is called "marriage"

Wasn't that one of the items in that CIA sabotage manual posted a few weeks ago?

Shouting in the military particularly during boot camp is used to create a simulated stressful environment.

The same idea happens in sports coaching, where simulating a stressful environment is not generally desired except in certain situations.

Yet coaches yelling at youth league basketball players who just so happened to miss their first few free throws, but then regressed to some respectable mean by making an average number of free throws after that, will pat themselves on the back and believe that yelling was a form of constructive feedback.

The coach may not be using yelling as a tool. He may be stressed because he needs the win either to stay on as a middle or high school coach, or because he needs the school trophy for his sense of self worth. The kids may just be a resource for his own goals.

Also big with speed camera installations. A spate of accidents occurs, speed cameras get installed. The rate of accidents falls, and the speed cameras are claimed to be the cause.

The problem is that accidents are randomly spaced (assuming some level of road design and driver training) and putting in a teepee beside the road would probably show the same effect. Particularly so if the speed camera is not obvious, so is unlikely to have an effect on a driver unless they travel the route frequently and get fined.
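A rough sketch of the teepee thought experiment, assuming every site has exactly the same underlying accident risk and the intervention does nothing. The site count, weekly accident probability, and the "worst 50 sites" rule are invented for illustration:

```python
import random

def teepee_study(n_sites=1000, n_weeks=50, p_accident=0.1, top=50, seed=2):
    """Pick the worst sites in year 1, 'install a teepee', and look again in year 2."""
    rng = random.Random(seed)

    def yearly_count():
        # Every site shares the same weekly accident probability (assumption).
        return sum(rng.random() < p_accident for _ in range(n_weeks))

    year1 = [yearly_count() for _ in range(n_sites)]
    year2 = [yearly_count() for _ in range(n_sites)]   # the teepee changes nothing

    # Sites get the intervention only because year 1 happened to be bad.
    blackspots = sorted(range(n_sites), key=lambda i: year1[i], reverse=True)[:top]
    before = sum(year1[i] for i in blackspots) / top
    after = sum(year2[i] for i in blackspots) / top
    return before, after

if __name__ == "__main__":
    before, after = teepee_study()
    print(f"mean accidents at chosen sites, year 1: {before:.2f}")
    print(f"mean accidents at chosen sites, year 2: {after:.2f}")
```

The chosen sites fall back toward the common average of `n_weeks * p_accident` accidents per year, so a naive before/after comparison would credit the teepee with the drop.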

Isn't part of the Drill Sergeant's role to encourage close team cohesion? If you have 50 diverse marines that need to get along, giving them a common enemy (the shouting Drill Sergeant) is beneficial. If he shouts at everyone all the time, there are no favorites.

>Imagine that shouting has no actual effect on performance

Why would shouting have no effect on performance? People generally don't like being shouted at and often modify their behavior as a result. At the very least, a person being shouted at has to process the input.

Think about it, the shout interrupts whatever context existed before the shout. Future actions are now based on a context where there was an interruption instead of a context where there was no interruption.

In the military, there are three options. 1. You know you fucked up and you are going to get shouted at. 2. You did something forbidden and you might get shouted at if you get caught. 3. You did something OK, still got shouted at and now you don't know how you would do it better. You repeat what you did, get shouted at again. Until the asshole has run out of steam.

The shouting has absolutely no information value ever.

What you describe might work for low motivation individuals. But they are hopelessly used to the shouting at the end of basic training. So the officers have to use threats instead.

I used to get these tension headaches while waiting to be shouted at. Being shouted at was a relief - switch off, nod nod. Running around and around carrying something heavy after the shouting was relaxing; while I was doing that, I could switch off. There wouldn't be any shouting while I was running around and around carrying something heavy. I realised I was being conditioned to seek out being shouted at.

If we look at it from a behavioralist perspective, one that drops the addition of an introspective hypothesis ("People generally don't like being shouted at and often modify their behavior as a result", contexts) in favor of empirical evidence, we could simply test if shouting has any effect on performance.

Behaviorialism is bunk.

>we could simply test if shouting has any effect on performance

Maybe we are using different definitions of performance, but what you want to test is essentially untestable. You'd have to be able to observe the exact same situation twice, once with shouting and once without. This is impossible.

To get 100% certainty, yes.

But this isn't generally required to show close to certain probability.

If the sample is large enough (doesn't have to be very large) and controlled for possible bias, then the test should reveal the effect of the difference between sets.
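One way such a test could look, as a sketch only: assign recruits at random to a shouted-at group and a control group, then check with a permutation test whether the difference in mean scores is larger than chance would produce. The score distribution, group size, and `true_effect` parameter are assumptions made up for the example:

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def shouting_trial(true_effect=0.0, n_per_group=200, n_perm=1000, seed=4):
    """Simulated randomized comparison; returns (difference in means, permutation p-value)."""
    rng = random.Random(seed)
    # Random assignment is modeled by drawing both groups from the same population.
    control = [rng.gauss(50, 10) for _ in range(n_per_group)]
    shouted = [rng.gauss(50, 10) + true_effect for _ in range(n_per_group)]
    observed = avg(shouted) - avg(control)

    # Permutation test: reshuffle the group labels and count how often a
    # difference at least this large arises by chance alone.
    pooled = shouted + control
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = avg(pooled[:n_per_group]) - avg(pooled[n_per_group:])
        if abs(diff) >= abs(observed):
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value

if __name__ == "__main__":
    for effect in (0.0, 5.0):
        diff, p = shouting_trial(true_effect=effect)
        print(f"true effect {effect}: observed difference {diff:.2f}, p = {p:.3f}")
```

Because both groups are compared over the same period, regression to the mean affects them equally and cancels out of the difference.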

Perhaps. I'd say that each individual situation is highly context dependent. I doubt the sergeant yells the exact same thing to everyone. The problem is really that "improved performance" can't really be defined (someone could attempt to), therefore there is no test to detect.

> "improved performance" can't really be defined

If performance cannot be measured then the sergeant shouldn't be yelling in the first place. Performance is set at a very clear standard: if you underperform consistently and then perform at the standard or above, you've improved your performance.

I don't advocate for yelling. I'm only pointing out that yelling must have some effect on future outcomes. You may not be measuring all future outcomes, however.

The desired outcome is a change in performance relative to a standard that has been set. That's all that needs to be measured.

And I agree it may have other effects, but what we're interested in is the desired result of the sergeant.

And yelling may be effective, but without testing it we shouldn't advocate either way.

Behavioralism might be, but behaviorism isn't. My mistake.

I assumed you meant behaviorism, but used your words. Behaviorism is bunk.

I guess that depends on the reason for poor performance. If it's lack of effort then I guess you are right. If it's lack of ability, shouting may not make a difference. If it's nervousness or lack of confidence, then shouting is probably going to make things worse.

More generally this is an empirical question. Does shouting cause improved performance? Just Googling, a lot of people don't think it does (me among them) and seem to have plausible reasons, but that's not an empirically decisive conclusion.

Threat of shouting likely increases the cognitive load leading to poor performance.

This is, unfortunately, rampant in healthcare. The natural variation amongst people as well as the natural variation of our health throughout our lives makes actually analyzing healthcare outcomes incredibly difficult.

90%+ of published outcomes can be invalidated by simply looking at the published data. If you want some chuckles, read blog posts by Al Lewis ripping on research published by companies touting their own performance. He's acerbic and condescending but also, mostly, correct.

In a bio class a professor summed it up with "the hardest thing about medicine is being able to tell whether anything actually works."

I told a friend of mine who's into alt.med once that yes I think a lot of alt.med is bunk but percentage wise it's probably not that much more bunk than "mainstream" medicine.

It's easy for emergency medicine. Customer comes in with bullet wound, leaves alive and healthy and without bullet wound. But outside of domains like that it's crushingly hard to tell the difference between treatment and noise. Compounding the problem is that humans aren't a standardized item that can be compared against reference performance metrics. We're all different and our genome and environment are constantly changing.

> alt.med is bunk but percentage wise it's probably not that much more bunk than "mainstream" medicine

Oh yes it is!

First of all, the actual effort is there in real medicine, while alternative medicine just doesn't care about evidence.

Also, most real drugs work as intended. The fringe cases get emphasized a lot, but still, the basic things, like antibiotics, painkillers etc. work very well. Medicine has advanced A LOT in the last 100-150 years.

Fake medicine, like homeopathy, is nothing like this. Don't mislead people.

I agree to some extent, but not completely. Some day population-wide studies fishing for a p value below a certain threshold will be looked at with the same disdain that we look at homeopathy. "They actually believed you could do studies across genetically diverse populations like that?!?! wow! Didn't they understand that everyone's metabolism is different?"

Throw in a bit of pharma companies funding their own studies and I don't think the situation is as black and white as you say.

Preachy overconfidence in the clear superiority of enlightened scientific medicine actually bolsters public support for alt.med as a reactionary movement.

I hate being the guy to call fallacy,* but this reminds me of the Nirvana fallacy here.

You're not comparing the level of scientific rigor in mainstream medicine to the level of scientific rigor in alternative medicine. You're comparing it to the level of scientific rigor that we wish existed in a perfect world. Are you also willing to measure alternative medicine against the same standard so that they can be compared on an equal footing?

P-value fishing is hardly the exclusive province of mainstream medicine. And in the "funding their own studies" department the most notable difference between mainstream pharma and alt med is that the mainstream pharma companies are legally required to disclose this fact (and preregister their studies in order to discourage fishing, too), while in alt med it's standard practice to use cute little financial engineering tricks to try and hide where the funding came from.

* But I don't hate lying on occasion.

I'm pointing out that as a social movement alt.med is driven in part by a mismatch between the real success and reliability of scientific medicine vs. the claims made by its authorities.

Is fat good for you or bad for you this week?

Pointing out that the core of the movement consists of throwing the baby out with the bathwater isn't really an effective defense.

I think depends on the medicine. Like fixing a bullet wound, some medicines obviously work and you don't need a study to know that (for blood pressure, insulin, severe pain, etc). But those are perhaps the ones we take for granted and don't really include when we think about the idea of medicine being bunk.

What's more suspicious is medicine that people use just because they're in a culture of taking medicine for everything. Reliefs for colds, coughs, mild pain, sore throat, psychological problems, etc. These illnesses have a strong get-better-anyway effect so it's very easy for people to believe that whatever they took cured them. Don't believe me? Go to China and you'll see people taking herb drinks for the same illnesses and having just as much faith in their effectiveness. Other popular cures include drinking warm water and "growing a pair".

I'm not sure if your second paragraph is aimed at mainstream medicine, or "alternative" medicine. Mainstream medicine does not claim to have a cure for a cold. Mainstream medicine masks the symptoms so that people are less miserable while the cold runs its natural course.

What's curious with your list is that one is a disease ("cold"; the rhinovirus) while the others are symptoms ("cough, mild pain, sore throat"), and the other is an entire class ("psychological problems"). It's hard to respond to what you said because you called them all "illnesses".

While you wrote "I think depends on the medicine", I think it actually depends on the disease.

I think you are correct in that many people confuse palliative treatment/medicines (eg, cough drops, warm water, gargling with salt water) with curative treatment/medicines. However, in doing so I think you've changed the topic from the differences in medical vs. alt. med., as measured in patient outcomes, to the differences in how patients subjectively view the different forms of treatment.

So the research that established the placebo effect—an effect that’s well-known, and widely regarded as an illustration of the importance of control groups in research—itself had no control group? That’s incredible.

I don't know about that. There seems to be a lot of research investigating the difference between placebo and no-treatment. The publication [0] mentioned in the article searches out all the studies that tested placebo treatment, and they all included a no-treatment group.

Here's an excerpt:

>We included randomised placebo trials with a no-treatment control group investigating any health problem

[0] http://www.ncbi.nlm.nih.gov/pubmed/20091554

I meant the original 1955 publication of “The Powerful Placebo” that put the idea of the “placebo effect” into the public consciousness. (Use of the term really takes off starting then [1].)

That article's more recent (2010) link supports its conclusion, which wouldn't need to be stated unless it contrasted with what it calls “pharmacological folklore” from the original publication, which the author says he “took literally” for many decades after.

The point being that the 1955 version of the idea is currently much more widely known than modern no-treatment comparisons, and most people don't realize that its research basis is so flimsy.

[1] https://books.google.com/ngrams/graph?content=placebo+effect...

At school, I had a friend (studying to be a Physician) who referred to Chiropractic as "applied regression to the mean".

>"The only way to measure the size of genuine placebo effects is to compare in an RCT the effect of a dummy treatment with the effect of no treatment at all."

This is just wrong, the following procedure would be much more convincing than a RCT:

Decide what you are measuring, collect group of people A, measure it, give people placebo, measure it again, record results. Then repeat under different circumstances with a different group of people B. Maybe even go back and do A again. Are you getting consistent measurements? Good.

Now come up with an explanation for why the results have that distribution, sources of variation, etc. Use that to quantitatively predict what should be seen in a new group of people C. Now go check group C. Did it match the predictions? If so, good, keep at it.

Make a prediction for group D. Do the group D results match? If so, you are probably interpreting the results right.

How does that address the get-better-anyway effect, without a group to measure it?

Because you are predicting precisely how much better can be accounted for by the placebo effect, no more, no less (of course within error bounds).

The get-better-anyway effect is too vague. Say theory A predicts value x is in the range [3.1,3.3] and you observe x=3.36 +/-.1. Also, there is a theory B that predicts x>0. Theory A has been much more severely tested so is supported by the evidence. Theory B, meh.

Say theory A (the placebo effect theory) predicts a 25% improvement, because it's been measured in the past with that result. Then there's also theory B (the get-better-anyway theory, or null hypothesis), which wasn't measured initially but in recent studies shows a similar 25% improvement.

I don't understand how more studies testing theory A and getting 25%, while excluding theory B from testing, can show that A should be preferred over B. The rationale that A has been tested more thoroughly and therefore has a smaller confidence interval sounds circular.

>"Say theory A (the placebo effect theory) predicts a 25% improvement, because it's been measured in the past with that result."

That is not a theory. You have to come up with an explanation (a theory) for why it should be that value. Then from that theory you deduce predictions for what it should be under other circumstances. If the predictions are consistently close to the observations it indicates you are onto something.

The discussion is about how to test the placebo effect (which corresponds to your example's "theory A"), so the proposed alternative to RCT would have to include the process of first deducing the effect size. For this problem, how would you realistically do that without referring to past experimental results? There are so many unknowns, including psychology. The 1955 paper was itself a survey of experimental results.

I expect it to be much easier to come up with a statistical prediction for the get-better-anyway effect (regression to the mean), with more specificity than x>0. It would be needed regardless, in order to exclude the null hypothesis.
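As a sketch of what such a prediction could look like: assume two measurements X_1 and X_2 of the same person that share a mean \mu and a variance, have correlation \rho, and are jointly normal (or read the result as the best linear prediction), with no treatment effect at all. The symbols are introduced here for illustration and do not come from the article:

```latex
\begin{align}
  \mathbb{E}[X_2 \mid X_1 = x] &= \mu + \rho\,(x - \mu), \\
  \mathbb{E}[X_2 - X_1 \mid X_1 = x] &= -(1 - \rho)\,(x - \mu).
\end{align}
```

So someone enrolled at a symptom score x above the mean is expected to drop by (1 - \rho)(x - \mu) on the next measurement with no treatment whatsoever, and the drop grows as the measurements become less correlated.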

My proposal includes collecting experimental data. It is just not in RCT form (which was claimed to be "the only way").

There is no reason to ignore RCT results, but that is not the only way to get the necessary data (which is what I disagreed with). An RCT is also not sufficient on its own, you need the theory that explains the results as well.

As an untrained outside observer (but one who has been thinking a lot recently about the statistics of causality) I'll chime in to say that I'm siding with 'ashearer' here. As I'm reading it, your approach doesn't make sense to me. If you want to distinguish between two treatments (placebo=treated and do-nothing=untreated) you need to measure outcomes for both treated and untreated individuals, and these individuals need to be "exchangeable" to some degree.

Perhaps when you say in the first post "give people placebo", you mean "give the placebo to some subset of the individuals leaving the remainder untreated"? I agree that you can often draw useful conclusions from an observational study as long as some comparable individuals receive each treatment, and as long as you can control for the initial differences between the treated and untreated. But if you are saying that you don't need to look at the results for the untreated group at all, then I think you are mistaken.
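A small simulation of why the untreated group matters, under assumed numbers: people enroll only when their symptom score is unusually high, scores fluctuate around a personal long-run level, and the placebo shifts the follow-up score by a fixed `placebo_effect` (zero by default). All names and parameters are hypothetical:

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def two_arm_trial(placebo_effect=0.0, n_screened=20_000, threshold=7.0, seed=3):
    """Return the mean improvement in the placebo arm and in the untreated arm."""
    rng = random.Random(seed)
    placebo_change, untreated_change = [], []
    for _ in range(n_screened):
        usual = rng.gauss(5, 1)               # long-run symptom level (higher = worse)
        baseline = usual + rng.gauss(0, 2)    # score on the day of screening
        if baseline < threshold:
            continue                          # only people having a bad spell enroll
        followup = usual + rng.gauss(0, 2)    # later score, no treatment effect yet
        if rng.random() < 0.5:                # randomized assignment to an arm
            placebo_change.append(baseline - (followup - placebo_effect))
        else:
            untreated_change.append(baseline - followup)
    return avg(placebo_change), avg(untreated_change)

if __name__ == "__main__":
    for effect in (0.0, 2.0):
        placebo, untreated = two_arm_trial(placebo_effect=effect)
        print(f"effect {effect}: placebo arm improved {placebo:.2f}, "
              f"untreated arm improved {untreated:.2f}")
```

With `placebo_effect=0.0` the placebo arm still shows a sizeable before/after improvement, and only the comparison with the untreated arm reveals that the placebo contributed nothing.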

>"Perhaps when you say in the first post "give people placebo", you mean "give the placebo to some subset of the individuals leaving the remainder untreated"?"

I meant just placebo. I am using the predictive skill of the theory behind each explanation to distinguish between them; this is common in physics, astronomy, etc. As noted by ashearer, the placebo effect has long been studied by seeing how two groups differ. There is nothing wrong with RCTs, but does near-sole reliance on them (and their observational analogue) appear to have led to cumulative knowledge and growth of understanding?

Both hypotheses in this case are plausible. Beyond that, it would be great to have the tools to numerically calculate a psychological phenomenon such as the placebo effect the way a physicist calculates the energy of a particle. But we're very far from having them, and in the meantime, I don't know of a practical alternative to RCTs that would expand our knowledge at least as much. (I think we can both agree that over-reliance on significance testing is a problem, though not one limited to RCTs.)

Check out the paper I posted here earlier: https://news.ycombinator.com/item?id=10556521

This factual background deserves to be better known by Hacker News participants, who often avidly discuss statements about medical research without much background in medical research methodology. You, your friends, and all of us deserve to have new proposed treatments and old standard treatments evaluated thoroughly for safety and for effectiveness.

A really good source for all of us to follow about the latest research on placebos in human medicine is the group-edited blog Science-Based Medicine,[1] which is edited by several active medical researchers and also includes lawyers, pharmacists, and even a reformed chiropractor among its contributors. The blog includes many informative posts about the placebo effects[2] observed in human medical research that help illuminate issues we all discuss a lot here on Hacker News.

Some of my favorite recent articles on placebo effects from Science-Based Medicine include "Are Placebos Getting Stronger?"[3] (21 October 2015) by Steven Novella, a neurologist; "Placebo by Conditioning"[4] (29 July 2015), also by Dr. Novella; "Should placebos be used in randomized controlled trials of surgical interventions?"[5] (25 May 2015) by David Gorski, a surgeon and cancer researcher; and "Placebo, Are You There?"[6] (24 February 2015), a French-language article translated by Harriet Hall. All go a long way toward explaining just what has been shown, and just what has not been shown, by previous research on placebo effects in human medicine.

"Placebo medicine" has so far only been shown to have any effect at all on self-reported subjective patient symptoms such as pain and nausea that ebb and flow in the natural course of untreated disease. If you broke your arm, you wouldn't seek placebo treatment from your doctor, but actual effective treatment. If you have an injury that causes chronic pain (as I do), you are best off looking for the best available medically verified standard treatment, and not looking for any kind of placebo treatment.

[1] https://www.sciencebasedmedicine.org/about-science-based-med...

[2] https://www.sciencebasedmedicine.org/?s=placebo

[3] https://www.sciencebasedmedicine.org/are-placebos-getting-st...

[4] https://www.sciencebasedmedicine.org/placebo-by-conditioning...

[5] https://www.sciencebasedmedicine.org/should-placebos-be-used...

[6] https://www.sciencebasedmedicine.org/placebo-are-you-there/
