However, I'm a bit puzzled by the weird direction the journalist ran with this: straight to his preconceived notions, which aren't well supported by the data he's looking at.
But there's a bit more to this than just that one chart. In addition to self-correction (e.g. beginning to require pre-registration of trials), science is somewhat additive. Is it not possible that the low-hanging fruit had already been picked in the 1970s-1990s, and the problem simply got harder? With the earlier advances in treatments for cardiovascular disease, couldn't the remaining problems just be harder?
And look at that "harm" study. Turns out it's one of many from the Women's Health Initiative, and it's a test of whether hormone therapy for post-menopausal women causes heart problems: that is, it's not a test of a drug to prevent heart conditions, but a test of side effects.
How many other of these studies are like that: studies about effects on the heart, not trials for new drugs to treat the heart? How did the ratio change before and after 2000?
In any case, don't trust every popular news article you read about science, particularly if it's written by Kevin Drum and posted on Mother Jones.
Individual studies can be really interesting. They're important for researchers to know about to inform their future work. But any one study - even one that's done honestly, with good methodology and sound foundations - can be just totally wrong. There could be confounding factors you couldn't have known about that completely invalidate the result. Your test subjects could be unusual in some way, your animal models could be a poor analogue for humans in this particular case, or you could have just had really aberrant statistical flukes in your sampling.
It's the body of scientific research, the dozens, hundreds, thousands of studies stacked on top of each other that bring certainty.
The problem of "problems getting harder" is a continuous phenomenon. Why would that be the case suddenly after 2000, and not before?
> However, I'm a bit puzzled by the weird direction the journalist ran with this: straight to his preconceived notions, which aren't well supported by the data he's looking at.
Is it possible that your own preconceived notions about the author and the publication may have caused you to judge this way?
In any case, don't trust every comment posted on Hacker News (including my own) :)
Well, there was that whole dotcom boom and a lot of things changed for computers & the internet which led to researchers being able to share more information, use more powerful computer techniques, etc.
In my experience working with/contracting for neuro labs, a lot of researchers don't really know how to fully leverage the technology that's available, and often rely upon proprietary tools they have limited knowledge of, which doesn't bode well for being able to explore for themselves.
The few that I have met who can push the limits of current technology are working in labs run by the above…
I'm not sure how it is in other fields, but conversations with other commenters on HN over the past few years make me think this is not just a neuroscience problem.
Maybe the problem is that the skills needed to explore the solution space and communicate it effectively have gone up, because of the complexity technology has added to the process, without research labs/academia addressing the gap sufficiently? I don't think this is a problem with just labs or academia though; not many people in general have the skills to leverage technology to its fullest for even the most banal tasks.
Also I had not heard of Kevin Drum before this, and had a positive view of Mother Jones. I'm left with a poor impression of Kevin Drum and a hit to Mother Jones' reputation after reading this.
You could look for confirmation or disconfirmation of the hypothesis that preregistration leads to an increase in null results in other data sets.
Both of those seem like more useful ways to dispute this finding than squinting at a graph and finding fault with a blog post headline.
The "squinting at the graph" was assuming that there's a sudden change. I read the paper, and looked up several studies and came to the conclusion that the paper was being misrepresented by the blog text. And I agreed with the headline, but not with the blog text.
There are explicit exploratory methods you can use to carry out research, and that is perfectly fine, but most research today is done using an inference model, where you formulate a hypothesis and use data to corroborate it.
That is one of the premises of big data that most people do not get: it enables you to do explicitly exploratory research again, while still being rigorous and somewhat falsifiable.
>every significant clinical study of drugs and dietary supplements for the treatment or prevention of cardiovascular disease between 1974 and 2012
There is room for a lot of bias when selecting which studies are 'significant' and on topic. Not to mention deciding which metric to report from the study.
Literature reviews in meta-analyses are usually conducted like this, with specified lists of search keywords and flow charts of inclusion criteria.
There's an argument to be made that in times past there were fewer "scientists" working on these topics, explaining the delays. But I don't think that's fair. The reason I put "scientist" in quotes is that many of these things have minimal prerequisites, and the average home would have all that's needed to experimentally test and discover these concepts. And their immense value, far beyond the academic, meant their discovery likely would have spread rapidly regardless of its origin - so that, at least to some degree, precludes authority as a necessity.
Many things that now elude our ability to understand (perhaps dark matter is a great example) will likely one day be child's play.
I'm pretty sure we have no significant advances in the treatment of cardiovascular disease.
And given the alarmingly climbing rates of chronic diseases, e.g. cardiovascular disease, diabetes, obesity, cancer, I'm also pretty sure that the health care industry has done more harm than good when it comes to chronic diseases.
For everything else - problems they can measure and treat with a pill - sure; I'm no anti-vaxxer. But they failed hard at chronic diseases, promoting cures and guidelines that did more harm than good.
Recent US outcomes:
US & New York State outcomes 1980-2007, Figure 2: https://www.health.ny.gov/diseases/cardiovascular/heart_dise...
I haven't hit the major mortality sources, but these bits match everything else I've heard.
I'd also like to know the basis for asserting that the health care industry has done more harm than good.
I mean, can't this be attributed to other things too? (The way people live has changed in the past 20 years...)
I'd rather see some studies to show if/why those things have spiked rather than disavowing attempts to help it.
What else are we supposed to do?
Very good advice. Over 95% of the crap posted on Mother Jones should be ignored.
This is a ridiculous "plain English" description of what is happening here, and I say that as someone who is regularly very critical of academia, drug trials, and research science (I've lived it; check my bio).
Clinical trials and mandatory registration are great things. It does not mean that researchers were massively cheating in the past, however - it means that they were finding secondary and tertiary findings and reporting them instead of the main investigative thrust of the research. Yes, some blatant cheating happened, as did p-hacking (though this problem still exists), but to act like the clinical registration database completely stopped a massive ring of fraud is ridiculous: that's a conspiratorial narrative, and the data does not support it.
I do wonder if this chart misrepresents something, though: there are studies that produce incidental--but genuinely valuable--discoveries. It's unclear to me if that accounts for the pre-2000 results or not. With the new rules, would there have to be another study stating the new objective?
I'm not questioning that bad science occurs, but I am questioning what this graph really tells us.
I think it's only fair to require you to replicate, at least once, a positive result you think you see in data collected for another purpose, before you can claim you've got something.
The author is kind of doing the same thing here that they accuse others of doing...
Whether it's a clinical trial, a psychological study with n participants, or a look at how changes in x soil conditions affect the y tree: anything where data samples are taken and statistically analyzed is prone to p-hacking and after-the-fact hypothesis changes.
This example happens to be for a narrow set of studies: clinical trials for cardiovascular drugs/supplements. The takeaway from that graph applies broadly.
Until you can ... they are for all intents and purposes.
Clinical trials are not basic research, though they rely on a great body of basic research, and are themselves experiments. Requiring that clinical studies have clearly defined end points serves the purpose of ensuring that the results are robust and well understood. This is important because the end goal of a clinical trial is developing a treatment of some sort that will go into many humans, and mistakes can be very harmful.
Choosing an appropriate endpoint for a clinical trial can also be very challenging. Say, for example, you're trying to advance a drug candidate for treating cardiovascular disease. You have many choices for how to measure that - chest pain, resting heart rate, cholesterol, etc. It's very much possible that your drug does improve cardiovascular health, but your trial can fail because you chose the wrong endpoint.
When I perform an experiment, I do so with a hypothesis which is sometimes proven right, and sometimes proven wrong. Either way, the results can be interesting and form pieces of the large puzzle that is a scientific study. Further, questions that are addressed in a scientific study are generally going to be more broad than those addressed in a clinical trial - we don't really do exploratory clinical trials, but exploratory research is a very important part of science. For these sorts of studies, it is often difficult or impossible to have a well defined endpoint such as is required in a clinical trial. One example for you: I hypothesize that enzyme X has function Y. During my studies, I discover that enzyme X actually has function W! If the evidence I present for enzyme X having function W is solid, my results can still be great and useful, even if my initial hypothesis was wrong.
"A reasonable takeaway from this article is that any one scientific study may contain flaws, or show bias, and therefore there is a possibility for it to be confirmed, improved, or disproven in the future. As a result, it is better to take early results with a grain of salt until additional supporting evidence is found, than to take them as gospel.
"It should be noted that these results are due to flaws in the scientific establishment, flaws in the human application of science, and flaws in the humans themselves, but not the scientific method itself. The scientific method itself is sound and can be trusted. You use it every day to see whether the water in your shower is too hot or cold, or whether it is raining outside."
As sample size goes up, the probability that an estimate's confidence interval contains the true value of an effect goes...
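For anyone who'd rather check than guess, here's a minimal simulation sketch (Python; the true mean, sigma, and sample sizes are all made up for illustration):

    # A quick empirical answer to the quiz above: simulate 95% t-intervals
    # at several sample sizes and count how often they contain the truth.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_mean, sigma, reps = 10.0, 2.0, 2000

    for n in (10, 100, 1000):
        covered, widths = 0, []
        for _ in range(reps):
            sample = rng.normal(true_mean, sigma, size=n)
            half = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
            covered += abs(sample.mean() - true_mean) <= half
            widths.append(2 * half)
        print(f"n={n:5d}  coverage={covered / reps:.3f}  width={np.mean(widths):.3f}")
    # Coverage stays ~0.95 at every n; only the interval's width shrinks.

(Spoiler: it doesn't go anywhere. Staying put is what the confidence level means.)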
The "laymen" are the ones funding all of this through their taxes. I'm sure scientists would love it if the plebs just shut up and kept on giving them money, but it doesn't work like that. In fact that is why it is becoming harder and harder for scientists to communicate with the public to affect policy.
So yes... publicly posted information should be layman-ready.
More importantly, I can make whatever I want public, and I can do so with whatever audience I have in mind.
So inciting panic online is fine, but not in a theater?
Putting something in the public domain allows other people with knowledge in the field to use the information. It doesn't mean that the lay-person should be able to digest it.
The public domain isn't some lowest common denominator clearing house of information, it's just public.
I can imagine a non-statistically minded person thinking "So what if it's not what they were looking for originally? We're missing positive findings now. This is a terrible regulation." When in reality these "positive" findings were p-hacked to meet minimum criteria for statistical significance, and likely arose by chance, since in any study there will be some slice of the data that happens to be a statistical aberration.
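To make the "arose by chance" part concrete, here's a minimal simulation sketch (Python; the 20 endpoints and patient counts are hypothetical). The "drug" and placebo groups are drawn from the same distribution, so every "significant" endpoint is a false positive by construction:

    # Sketch: sifting 20 unplanned endpoints on a useless "drug".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_studies, n_endpoints, n_patients = 2000, 20, 50

    hits = 0
    for _ in range(n_studies):
        drug = rng.normal(0, 1, size=(n_endpoints, n_patients))
        placebo = rng.normal(0, 1, size=(n_endpoints, n_patients))
        pvals = stats.ttest_ind(drug, placebo, axis=1).pvalue
        hits += (pvals < 0.05).any()

    print(f"Studies with >= 1 'significant' endpoint: {hits / n_studies:.0%}")
    # Expected under the null: 1 - 0.95**20, i.e. about 64%.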
In some alternate universe, it is considered malpractice for those who design the study to be the same group that runs the study.
I don't think we can get there from here, but if we had a core track of theorists who designed studies, and a second equally prestigious track of practitioners, who independently tested and ran studies, experimental science would be much more rigorous.
Your prestige should be tied to your ability to identify novel experiments to try, or in rigorous testing procedures, never tied to your ability to shape data to make your claims appear grand.
A lot of the PL students at my school are extremely wary of doing any follow-up work on ideas that have been published before, even if the implementations of those things are obviously shoddy and don't really demonstrate that the idea works. There's a lot of novelty chasing, which is part of what's pushing people to include deep learning in their work, since it often allows them to claim novelty, even if their results aren't very good.
Why is this considered good? Isn't this just a counterproductive limitation? Significant benefits found without knowing in advance what they were going to be are still significant benefits, and if they are observed scientifically and proven reproducible, I'm glad we've found them.
> Once they had to explain beforehand what primary outcome they were looking for, practically every study came up null. The drugs turned out to be useless.
Aren't newly discovered drugs meant to undergo strict and targeted clinical trials? How can they even be considered drugs before this? And how can they turn out to be useless after passing this stage?
Also, in some cases nobody wants to fund clinical trials despite very interesting supposed life-enhancing effects, or it's clear that reaching general availability through the fully above-board research and approval chain is going to take longer than people want to wait. Then some non-approved substances end up being sold on eBay (or, for more questionable substances, on the black market), hundreds or thousands of people buy them and report their experiences on Reddit, and this data can be a source of further clues for research.
For a more in-depth analysis, see "The Deluge of Spurious Correlations in Big Data": https://www.di.ens.fr/users/longo/files/BigData-Calude-Longo...
Here's the abstract:
Very large databases are a major opportunity for science and data analytics is a remarkable new field of investigation in computer science. The effectiveness of these tools is used to support a ‘‘philosophy’’ against the scientific method as developed throughout history. According to this view, computer-discovered correlations should replace understanding and guide prediction and action. Consequently, there will be no need to give scientific meaning to phenomena, by proposing, say, causal relations, since regularities in very large databases are enough: ‘‘with enough data, the numbers speak for themselves’’. The ‘‘end of science’’ is proclaimed. Using classical results from ergodic theory, Ramsey theory and algorithmic information theory, we show that this ‘‘philosophy’’ is wrong. For example, we prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in ‘‘randomly’’ generated, large enough databases, which—as we will prove—implies that most correlations are spurious. Too much information tends to behave like very little information. The scientific method can be enriched by computer mining in immense databases, but not replaced by it.
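The core claim is easy to demonstrate at toy scale. A minimal sketch (Python; the dimensions are arbitrary) that scans purely random data for its strongest pairwise correlation:

    # Sketch: spurious correlations in purely random data. 200 random
    # columns give ~19,900 pairs; some pair will look impressive by chance.
    import numpy as np

    rng = np.random.default_rng(1)
    n_rows, n_cols = 100, 200
    data = rng.normal(size=(n_rows, n_cols))

    corr = np.corrcoef(data, rowvar=False)  # 200 x 200 correlation matrix
    np.fill_diagonal(corr, 0.0)             # ignore trivial self-correlation
    i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
    print(f"Strongest 'relationship': columns {i} and {j}, r = {corr[i, j]:.2f}")

Tested naively on its own, an |r| that size at n=100 would look highly "significant", but it's pure noise, exactly as the paper argues.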
Indeed. But this doesn't disqualify looking for "something" as a valid and useful method of research, as one stage of the whole research chain. That ought to be allowed, although research papers produced this way should make it clear.
I'm having a hard time seeing the problem with taking an exploratory approach and just testing placebo vs. some treatment and reporting whatever you find.
Please correct me if I'm misguided on any of this.
Assuming the effects they found were actually legitimate (that is, the alleged "torture" of the data was statistically sound), how is this at all a bad thing?
Assuming that the effects are legitimate is exactly the problem. You are right, if we somehow know that the effects found are real and reproducible, then all is good. The problem is that we almost never know this. Presumably what you mean is "If the results are reproducible, what's the harm?". I'd agree with this, but the problem is how to know ahead of time that the results are going to be reproducible.
Shouldn't it be safe to assume that they could simply repeat the trial with a different stated objective and successfully yield the positive result?
If they did the statistics correctly (accounting for the multiple inferences, all assumptions about iid data met, no biased dropouts, everything else aboveboard) and got a really solid result, then yes, it's theoretically likely that the results would be reproducible. The problem is that they almost certainly didn't do the statistics correctly, and intentionally or not they probably violated a lot of assumptions. In too many cases, even the main-line conclusions can't be replicated. It's rarely "safe" to assume that an effect is real until it's actually been replicated, and almost never safe to make this assumption for results obtained by sifting the data after the fact looking for correlations.
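For what it's worth, the most basic version of "accounting for the multiple inferences" is a p-value correction such as Bonferroni. A minimal sketch (Python; the p-values and the count of 20 tests are hypothetical):

    # Sketch: Bonferroni correction for m post-hoc comparisons. A finding
    # only survives if p * (number of tests) is still below alpha.
    m_tests, alpha = 20, 0.05
    raw_pvalues = [0.001, 0.02, 0.04, 0.30]  # hypothetical sifted "findings"

    for p in raw_pvalues:
        adjusted = min(p * m_tests, 1.0)
        verdict = "significant" if adjusted < alpha else "not significant"
        print(f"raw p={p:.3f} -> adjusted p={adjusted:.2f} ({verdict})")
    # Only the p=0.001 result survives; the marginal p < 0.05 ones evaporate.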
There are ways to do it this way, but it usually takes larger sample sizes than are available. It also requires "bespoke" statistics that are easy to get wrong. In practice, it's usually better to use the incidental "results" as idea generators for future experiments, rather than assuming that the findings are real and don't require further testing.
Analysis after the fact isn't useless. However the bar of significance needs to be much higher. You need a much larger sample size, or better yet design a new study (controlling for things the original didn't) to draw conclusions.
Note, technically statistics do not promise exactly 5 false positives in 100 tests; that figure assumes every null hypothesis is actually true, and even then it's only an expected value. It is close enough for discussion and makes intuitive sense. If you want the real truth, be prepared for a lot of math.
The simplest, most straightforward explanation for the mechanism, and why it's bad, is this:
In the old scheme of things, the researchers would have run their 20 jelly-bean-color comparisons and reported only the "significant" bad outcomes connected to green jelly beans. With required preregistration, they must declare all 20 studies up front, so when 19 similar studies show no effect and 1 of them shows a "significant" effect, it's plain that you're likely just dealing with a random fluke.
That still seems a good first step. It may also be interesting to find out what actually led to the effect, if not the thing we were checking. E.g. it may happen that the substance researched only affects subjects with particular features, or that something else the subjects were doing produced the result on its own, and knowing either of these still seems useful (some really important discoveries were made this way, AFAIK).
This sounds like a good idea in theory, but in practice, it is usually even harder to answer this question than the original question that yielded the data. This type of question requires extremely careful study designs, large sample sizes, accurate power estimates, etc.
So the headline really should read something like "Why you should trust scientific studies a lot more now than you did before"
Another, perhaps better, headline would be: "Why You Shouldn't Trust Every 'Study' You See".
If they're positive enough, then the drug company will fund another study with that as the primary outcome. That seems prudent too: it should be the primary thing you're examining, so that you can design the study correctly, rather than a simple "oh, and by the way" side note.
Unless you're looking for the effect from the outset, you can't be sure that what you saw wasn't actually random. There's a term for what you're describing, p-hacking, and it's exactly what declaring what you're looking for before you run the test is designed to prevent.
Like many science-related topics, there's an XKCD about p-hacking that describes a similar scenario to your example: https://xkcd.com/882/
My question is: Was the free-range research actually effective? Or, was it "technically-correct effective just so you can't call me out on a failure, moving on..."
This article also feels like it has an agenda. Maybe it's because I'm not familiar with Mother Jones, but the tone of the article strikes me as unprofessional. And the headline, that's obviously correct and means nothing. It's like saying, "Why You Shouldn't Trust Every Stranger You Meet".
I assume that taruz is saying that journalists should be required to say what they're investigating before publishing an investigative report. Right now, they start investigating, and if there's something outrageous (even if it wasn't what they were initially looking for), they publish. Sometimes they even skip the "start investigating" part and just put up a tip-line for anyone who has a beef to get a story out there.
This is great for manufacturing outrage and hence clicks, but it gives the public a hugely skewed perspective on how the world is. Imagine that 0.01% (1 in 10,000) of all peoples' actions are outrageous and will piss off a large portion of the planet. Most people, by those numbers, would say that the majority of folks are decent, law-abiding citizens. Now imagine that a news outlet is allowed to freely go over someone's life, and they end up evaluating 1000 actions. There's a 10% chance that they'll find something outrageous. Now imagine that 10 such reporters do this to 10 people, and if any one of them finds something publishable (= outrageous), they go to press. Suddenly there's a 63% chance that one of them will find something, and you've likely got your news cycle for the day.
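(The arithmetic checks out; a couple of lines to verify, using the numbers above:)

    # P(at least one outrageous find), using the figures above.
    p_action = 0.0001                        # 1 in 10,000 actions is outrageous
    p_person = 1 - (1 - p_action) ** 1000    # 1000 actions examined -> ~9.5%
    p_cycle = 1 - (1 - p_person) ** 10       # 10 reporters, 10 targets -> ~63%
    print(f"one target: {p_person:.1%}; ten reporters: {p_cycle:.1%}")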
With the millions of people looking for something bad that a tip-line can generate, outrage is virtually assured. And that's where journalism is today. The world isn't actually a worse place than it was in 1980; in fact, by most metrics it's significantly better. But we've increased the amount of unpleasantness that people can be exposed to by 3-4 orders of magnitude and then implemented a selection filter that ensures that only the worst stories go viral. Of course we get only bad news; that's all that's economically viable, and we have such a large sample size that we can surely find it.
As people pay less for good journalism, and ad revenue is shrinking, the media is getting desperate to stay afloat.
Journalists today have significantly less independence than they had 15 years ago. Everything is far more controlled and geared towards sales.
I would like to see more alternative finance models for quality journalism.
Given how social media is actually starting to destroy democracy, it may be worth considering government grants to independent media organizations as a part of national defense. A democracy cannot function if the media is utterly broken. That means citizens are no longer capable of making informed decisions.
“If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.”
– Commonly attributed to Cardinal Richelieu
I.e. any fact or action can be willfully misinterpreted to fit almost any narrative.
That increased CO2 emissions should cause a temperature increase was proposed over 100 years ago. They did not look at climate change recently and suddenly decide "let's blame CO2".
(3) The man-made climate change theory does not rest on a single study; it rests on countless ones.
(4) There is actually a scientific model explaining why we see the measurements we get. They are not digging around at random.
The case for man made climate change is pretty strong. Look at these graphs and tell me which graph correlates most strongly with global temperature increases: https://www.bloomberg.com/graphics/2015-whats-warming-the-wo...