The Flawed Reasoning Behind the Replication Crisis (nautil.us)
185 points by dnetesn 18 days ago | 115 comments



The article says:

> The main reason scientists have historically been resistant to using Bayesian inference instead is that they are afraid of being accused of subjectivity. The prior probabilities required for Bayes’ rule feel like an unseemly breach of scientific ethics. Where do these priors come from?

It's not just "being afraid"; the problem is that random guessing (of priors) is not a reasonable replacement for science.

Bayes' rule is great if you can find a reasonable justification for a prior. Bayes is widely used for decision-making (for example), where you need an answer quickly & you aren't trying to make general scientific claims. But if you can't find a justifiable prior in a scientific work, using Bayes' rule just replaces one statistical fallacy with another one. After all, Bayes' rule will give nonsense answers if you give it a nonsense prior!

Bayes' rule is a great tool in many circumstances! But it has a great weakness: it requires a prior. That doesn't make it useless; few tools are useful in all circumstances. But requiring "everyone to use Bayes' rule, even though we have no reasonable way to find a good estimate of the priors," is unlikely to ever happen (and rightly so). The article rightly points out a serious problem with the typical application of statistics, but there needs to be a better justification for priors than is suggested in this article.

I could imagine systemic worldwide ways to deal with this. For example, perhaps the scientific community could allow people to propose initial priors, and then allow multiple different papers to improve the estimation of the probability over time. But that would require much more than articles repeatedly saying "there's no serious problem with priors"; having a justifiable way to estimate and update priors is the fundamental problem with Bayesian analysis in the scientific community.


IMO the most significant problem isn't any particular statistical approach, but the incredibly archaic nature of scientific publication.

I read a paper and think that there's something wonky with the statistics. Can I go to the journal's website and download the full dataset? No. Can I download the source code used to compute those statistics? No. I can e-mail one of the authors and ask nicely for that information, but what are the odds that they'll respond if they know that their work is weak or outright fraudulent?

Science is supposed to be transparent, it's supposed to be about sharing information, but there's a big opaque wall concealing anything that won't fit into the archaic size limitations of a paper journal.

There's no One Weird Trick to fix the replication crisis, there's no obvious Bayesian replacement for p-values or confidence intervals, but there's an obvious shortcoming in scientific publishing that is a huge impediment to secondary research.

A high school student can't get full credit on a math exam without showing their working - why should scientists be held to a lower standard?


It's even worse than that. The whole peer-review process we have today is stuck in the 19th century. The whole idea that publication and the credibility that is associated with it should be a binary quality -- a paper is either "published in a peer-reviewed publication", and therefore credible, or it isn't and so it isn't -- is absurd on its face. If there are any shades of grey, they are associated with the "prestige" of the publication, which again, if you think about it, is absurd, and flies in the face of the scientific ideal: an idea is either supported by the data or it isn't, and what two randomly selected anonymous referees on the review board of a journal, people who likely have ulterior motives in terms of their own career advancement, have to say about it should be the least of anyone's concern.

The right model IMHO is open publication a la arxiv with an associated open review process so that anyone who wants to can weigh in. The wheat can be separated from the chaff by doing a "reviewer-rank" analogous to page rank so that the opinions of the people who know what they're doing count more than the crackpots. But this idea that there's an elite priesthood of anointed Scientists, a.k.a. corporate researchers and tenure-track university faculty, who pass final judgement on all scientific truth really has to end.

(And I say this as someone who made their living working successfully within the current system for 15 years. I quit in no small measure because I saw firsthand just how corrupt and dysfunctional it could become, and I was working in a field that is generally considered "hard" science!)


19th century peer review was far more entertaining and public than the current system. Frequently, an article would be published, then various colorful responses would be published tearing apart the article (and not infrequently the character and competence of its authors).


This still happens but it's a lot less fun, yes. See for example "Tai's Formula Is the Trapezoidal Rule" (https://doi.org/10.2337/diacare.17.10.1224) and the paper it was responding to, "A Mathematical Model for the Determination of Total Area Under Glucose Tolerance and Other Metabolic Curves" (https://doi.org/10.2337/diacare.17.2.152)


> The whole peer-review process we have today is stuck in the 19th century.

Peer review as something associated with getting a paper published dates from the mid-20th century. The 19th century used the superior approach of publishing whatever and letting interested parties review it then.


You are right, but the parent is on point. When I was a student I tried many times to download datasets, with very little success. Only statistical institutions like INE, Igape and so on happily provided the data used for their studies, which is a given since they publish most of their datasets on their websites anyway.

This sucks, because maybe you have more questions that could be answered with their datasets, but you can't do shit because of this shitty system.

I'm not a scientist, partly because at uni I was really put off by the bureaucracy, the smell of endogamy, the obscure and old system, and so on. It's so hard to build new knowledge when everyone is taking care of their little yard and keeping everyone else away... and social sciences need this openness like farmers need their rain.


> You are right, but the parent is on point.

Yes, I never meant to imply otherwise.


> The whole peer-review process we have today is stuck in the 19th century.

There are journals that do light initial screening, then publish the submitted manuscripts online, then publish the whole peer review (reviewer comments and authors' responses) online, and then proceed (or not) to publish the reviewed article. Also anyone on the internet can leave an additional review, or just some questions, during the review process.

Also there are journals that have comment sections for the published articles.


Example(s)?


Open peer review process: All European Geosciences Union journals https://www.egu.eu/publications/open-access-journals/

Comment section for published articles: For example all PLOS journals https://journals.plos.org/


No ranking replaces access to the real work behind a scientific article: data, programs, experimental procedures, etc. Peer review is a good thing compared with no review, but access to the real data is simply better. Consider that, in the current system, reviewers themselves don't have access to more data than other readers.


Socrates and Galileo are but two of the more recognizable names on an endlessly long list. That list being made up of individuals who were correct but had their views judged by the masses, and lost. Socrates was killed, Galileo was merely imprisoned in his house until his final breath. Science is an issue, like many others, that often struggles when you get into matters of popular consensus. New ideas which end up being correct are often vigorously fought against, particularly when they undermine science that others want to be true.

We wanted the Earth to be the center of the universe. When somebody published writings suggesting we're just another body of no uniqueness revolving around a similarly indistinct star it posed a typical issue. If a scientist wanted to advance their career, they could come out quite strongly against this. Coming out in support of it risked turning the mob's attention on you. Refuting invalid refutations of this view is a non-trivial matter. Refuting what might, in modern times, be thousands or more - would be impossible. Indeed Galileo's correct heliocentric view would not come to be accepted until many decades after his death.

Of course we've overcome such reactionary barbarism though, right? Today there are at least two major issues we face. The first is corporate science. To give a now recently resolved example consider leaded fuel. Leaded fuel is something that from its earliest day drew skepticism. Nonetheless it persisted for the better part of a century with government and scientific endorsements of its safety. This was in large part because the corporations profiting from it made sure to stack the science in their favor and to also vigorously attack any critics of its safety. In an open system you'd see the same problem as above. Support the science suggesting leaded fuel may be unsafe, face the well funded mob. Oppose the science, get grants, accolades, and personal recognition. Compare Midgley, who lives in sufficient infamy that only his last name need be referenced, to e.g. Alice Hamilton.

The other issue is one of social matters, very similar to the issues that Galileo faced. Genetics is a perfect modern example. As but one example there, the connection between genetics and intelligence is in no way ambiguous. But this is something we, socially, do not want to accept and so progress is absolutely glacial and if one steps a hair outside an invisible line that we've drawn, they risk getting condemnation, hate, and exclusion from the scientific system. In many ways this is exactly what happened to Galileo. He was not imprisoned for his views, so much as for his disregard for social norms when expressing them - similar as well to the story of Socrates, who was unrepentantly provocative til the end.

The problem there is that science should be judged based on merit, not on presentation or nicety. As you move more towards a social system I expect we'd begin to move ever more from the former to the latter. This would be magnified a million times over if we chose to use a pagerank style system as you proposed. I fully agree our current system is, at best, dysfunctional. But I think our idealization of social systems is quite often contradicted by what they turn into - something Socrates, quite prophetically, often spoke of.


You make it sound like Socrates was killed due to scientific misunderstanding, but that is not quite the case.

Socrates was not just "provocative til the end". He was part of particularly deadly politics, and the animosity toward him had a lot to do with the autocratic rule his friends and students installed and the deaths they caused. His death was a miscarriage of justice, because there was supposed to be an amnesty, and the gods charge was likely a way to work around that.

Nevertheless, it had less to do with him being right and more to do with revenge and with where exactly his philosophy had led just a few years before.


He was executed for failing to honor gods nobody believes in anymore, and for 'corrupting the youth' with his teachings which are still studied and extremely highly regarded thousands of years after his death. It was the exact case with Galileo as well and generally with all names on the list. They are not punished directly for what they say, as there tend to be no laws explicitly against such, but instead with mostly fake charges sprung as a means of silencing them or attacking them, sometimes permanently. For instance in Galileo's case he was similarly imprisoned for life under 'suspicion of heresy.'


Just a few years before, two of his friends and students ran an autocratic government called the "Thirty Tyrants". They got into power thanks to the enemy city Sparta. They killed an impressive number of people, tortured those opposed, and so on.

The distaste for democracy that this rule was based on had to do with Socrates' philosophy.

So yeah, the gods charge was used largely because a) there was an amnesty for things that went on during the Thirty Tyrants period and b) Socrates himself was not actively participating in killing citizens, and at one point ignored an order to go and kill (presumably). He did not leave the city, though he could have, nor did he oppose the rule.

When he was cocky during the court case, he knew the gods accusation was bullshit. He likely underestimated how much those who were against his friends' rule resented him, and likely how much they saw his ideology as a threat to their own freedom, life, democracy and whatnot.

Athens was not long past bloody revolutions, and pretending that the trial had nothing to do with anything except the immediate charges just makes history boring and incomprehensible.

Socrates was not executed because he was a misunderstood genius. He was executed because he was seen as a very real threat.


As you mention Socrates was in no way involved with the thirty, and even openly opposed their orders - fully knowing it likely meant his head would end up on their chopping block sooner or later. I have immense admiration for him since he is one of the extremely few individuals in history that, in spite of numerous opportunities to do so, did not sway from his values to benefit friend, hurt foe, or even to enrich himself. Indeed in his trial I do not see any evidence of him underestimating anything. Rather it seems he was more than willing to let himself be unjustly killed, if only to show that his views of democracy were fully justified.

And that gets back quite directly to the point of this entire thread. Many things are not well decided by consensus. Major advances in science are sometimes actively and openly embraced by all, as was the case for relativity. But many times progress is met with vigorous opposition, not even on a technical level but on an inertial one. It was none other than Max Planck who quipped, "Science progresses one funeral at a time."


You claimed Socrates was killed for gods that don't exist, and that is not the case. None of what happened had anything to do with science or technology.

He was killed for his association with the rule of the thirty. The single act of not actively participating does not change that.

He was not actively opposed to that rule either in any way or shape. Other (numerous) people were actively opposed and died or were tortured for it. His values partially gave ideological background to the rule of the thirty.

He did underestimate the situation. He was confident and surprised at the result.


He had no association with the thirty whatsoever. Two were former students. That's it. And not only did he not participate (nor would he have been invited to participate in any case) but he actively opposed them while telling them that he did not fear death - an assertion they undoubtedly would have put to the test, sooner or later, had their government lasted longer than 8 months.

And as for his trial the indictment against him read, "This indictment and affidavit is sworn by Meletus, the son of Meletus of Pitthos, against Socrates, the son of Sophroniscus of Alopece: Socrates is guilty of refusing to recognize the gods recognized by the state, and of introducing new divinities. He is also guilty of corrupting the youth. The penalty demanded is death." Socrates intentionally provoked the 500 jurors with statements such as, "Men of Athens, I honor and love you but I shall obey God rather than you, and while I have life and strength I shall never cease from the practice and teaching of philosophy." The use of "God" and not "Gods" was not an accident. When Socrates was asked to propose his own punishment (in lieu of death) he proposed he be rewarded.

He obviously was not surprised when he was found guilty. The jurors were 500 random Athenian men, hardly his compatriots. His parting words speak of his intention, "The hour of departure has arrived, and we go our ways - I to die, and you to live. Which to the better fate is known only to God." He knew he had lived a great life and one that would extend well beyond himself. His martyrdom here was the grand finale to emphasize the correctness of the unpopular things (within Athens) that he had been saying all along. He warned that Democracy can become an unreliable and capricious system. Democracy responded by murdering him. You're undoubtedly correct that some of the majority that decided he should be murdered were driven by a misguided lust for revenge. That is a key validation of everything he ever said, and he knew it would be.


"He was part of particularly deadly politics"

Haha. Human nature never changes.


I agree. Another benefit to having the data easily available is apparent when you find papers that study a question that is very similar to what you're interested in but not quite the same. It would be amazing to get their data and apply your own analysis, not just to check for errors, but to pursue your own questions.


> random guessing (of priors) is not a reasonable replacement for science.

Priors in Bayesian statistics are not randomly guessed. The prior is supposed to reflect your state of knowledge prior to looking at the data from the experiment/study/test/whatever (hence the name "prior"). For example, in the mammogram example, it is assumed that the doctor knows the base rate of breast cancer in the population (which is a very reasonable assumption since there are mountains of data on such rates) and the false positive and false negative rates of the test (which come from data on previous tests). Those are the prior. The doctor doesn't just make up those numbers; they come from prior knowledge.
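
As a rough sketch of how those pieces combine, the calculation could look like the following. The specific numbers (1% base rate, 90% sensitivity, 9% false positive rate) are assumptions picked for illustration, not the article's figures, and `posterior_given_positive` is a made-up helper name.

```python
def posterior_given_positive(base_rate, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    # Probability of being sick AND testing positive.
    true_pos = base_rate * sensitivity
    # Probability of being healthy AND testing positive.
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Assumed, illustrative inputs (not taken from the article).
p = posterior_given_positive(
    base_rate=0.01, sensitivity=0.90, false_positive_rate=0.09
)
print(f"P(cancer | positive) = {p:.3f}")
```

Every input here is something that can be estimated from earlier data, which is the sense in which the prior is "prior knowledge" rather than a guess.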

> if you can't find a justifiable prior in a scientific work, using Bayes' rule just replaces one statistical fallacy with another one

In other words, if you have no prior knowledge about something, you can't expect to do statistics on some small set of data and get reliable answers. Yes, that's true. And Bayesianism tells you this is true, by telling you that you can't find a justifiable prior. (Technically, you can always find a maximum entropy prior, but in most cases that's not really any better than having no prior at all since it is basically telling you you need a lot more data before you can conclude anything.) Whereas statistical methods as they are currently done in many scientific fields simply ignore this issue completely and pick arbitrary thresholds for statistical significance. Which is the article's point.

In other words, Bayesianism, properly viewed, is not a magic bullet for extracting statistical answers from nowhere. It is a tool for exercising discipline, so that you know when you have to say "sorry, not enough data for a meaningful answer".


It’s also not absolutely necessary to use Bayesian reasoning to come up with a concrete final answer, but still allow for scientific style “we have a significant discovery, but it may not change our view from the status quo just yet” types of publications—significant changes in probability which nevertheless do not make unlikely things likely or vice versa.

So there is a very convenient way to view Bayes’ theorem by looking at odds ratios, Pr(A)/Pr(A'), where A' is the complement of A (the event which obtains precisely when A does not), which are equivalent to probabilities but span the range from 0-∞ rather than the range 0-1:

      Pr(A | E)      Pr(A E)      Pr(E|A)      Pr(A)
     ------------ = ---------- = ---------- x --------
      Pr(A' | E)     Pr(A' E)     Pr(E|A')     Pr(A')

In other words the odds after obtaining some evidence E are a multiplicative factor times the prior odds.

Taking the logarithm, one essentially can measure the impact of any new evidence in decibels as a linear result; the core of Bayes’ theorem can be stated as saying that evidence is measured in decibels.

For example in the breast cancer example in the original article, you have a +13 dB evidence—Pr(E|A)/Pr(E|A') = 20—but the prior log-odds are given as -20dB and thus the shocking conclusion is that after the evidence your log-odds are still negative, -7dB, and you likely do not have cancer. For a life-or-death situation, resolving the relative nature of evidence to a specific probability is clearly essential.

But in terms of p-values, if you have an experiment which gives +20dB evidence to something, I would say that this should still be meaningfully publishable as a relative increment even if it hits something whose prior is -30dB and therefore it still is only a ~1/10 chance that the thing is actually true.
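
The decibel bookkeeping in these two examples can be sketched in a few lines. The helper names (`to_db`, `update_db`, `db_to_probability`) are my own, and the convention assumed is 10 * log10 of the odds; the inputs are the ones quoted above (a likelihood ratio of 20 against a -20 dB prior, and +20 dB of evidence against a -30 dB prior).

```python
import math


def to_db(ratio):
    """Express an odds ratio or likelihood ratio in decibels."""
    return 10.0 * math.log10(ratio)


def update_db(prior_db, likelihood_ratio):
    """Posterior log-odds in dB: prior log-odds plus the evidence in dB."""
    return prior_db + to_db(likelihood_ratio)


def db_to_probability(log_odds_db):
    """Convert log-odds in dB back to an absolute probability."""
    odds = 10.0 ** (log_odds_db / 10.0)
    return odds / (1.0 + odds)


# Mammogram case: likelihood ratio 20 (about +13 dB), prior -20 dB.
mammogram_db = update_db(-20.0, 20.0)
print(f"{mammogram_db:.1f} dB -> p = {db_to_probability(mammogram_db):.3f}")

# Publishable-but-unlikely case: +20 dB of evidence on a -30 dB prior.
unlikely_db = -30.0 + 20.0
print(f"{unlikely_db:.1f} dB -> p = {db_to_probability(unlikely_db):.3f}")
```

The relative result is the number you add; the absolute result only appears once you convert the sum back to a probability.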

Like, I think that the key is having a vocabulary for relative results versus absolute results and I think that the Bayesian language can be used to take one step towards that, but perhaps it would be nice if we could have a standardized way in our language to indicate that numbers are relative rather than absolute, the way that we do habitually with decibels and pressures and such all the time...


"It is a tool for exercising discipline, so that you know when you have to say "sorry, not enough data for a meaningful answer"."

As opposed to the frequentist approach of saying, "We have data D and hypothesis H. If the probability of D when not-H is true is below a threshold, H is true," which hides an awful lot of assumptions.

The advantage of Bayesian statistics is that you have to explicitly state your assumptions in terms of a prior.


>As opposed to the frequentist approach of saying, "We have data D and hypothesis H. If the probability of D when not-H is true is below a threshold, H is true," which hides an awful lot of assumptions.

My textbook (what would be called a frequentist book) is very explicit: You either reject the null hypothesis, or you fail to reject the null hypothesis. A significance level test never concludes that the null hypothesis is true. It's nuanced, but is not quite the same as your statement.
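
A minimal sketch of that two-outcome logic, using an exact one-sided binomial test. The coin-flip data (60 heads in 100 flips), the 0.05 threshold, and the function names are assumptions for illustration only.

```python
from math import comb


def one_sided_p_value(successes, trials, null_p=0.5):
    """P(X >= successes) when X ~ Binomial(trials, null_p), i.e. under H0."""
    return sum(
        comb(trials, k) * null_p**k * (1.0 - null_p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def decide(p_value, alpha=0.05):
    """Return one of the only two verdicts a significance test allows."""
    # Note there is no branch that concludes "H0 is true".
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical data: 60 heads in 100 flips, H0 = fair coin.
p = one_sided_p_value(60, 100)
print(f"p = {p:.4f}: {decide(p)}")
```

The threshold `alpha` is a parameter rather than a constant, which matches the point that the appropriate value depends on the problem.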


Yes, and under this description, the equivalent of the Bayesian prior is picking what the null hypothesis is. Frequentists simply fail to admit (or perhaps fail to understand) that that choice is just as much of a subjective judgment as the Bayesian choice of prior.


>Frequentists simply fail to admit (or perhaps fail to understand) that that choice is just as much of a subjective judgment as the Bayesian choice of prior.

Again, this claim is something I see only from self described Bayesians. I've not met a professional statistician ever fail to admit it. My textbook (written in the 90's), in at least two places, talks about it. In one place it explicitly warns against assuming the "traditional" approach is more objective, and points out that the assumptions exist in the model and the practitioner should be aware of those assumptions.

And really, when it comes to p-values, I've never seen any stats textbook not give a proper description and interpretation. No statistics textbook I've read describes significance testing as a mere prescription from which you can deduce binary answers. My book discusses factors in picking your p values, and the implications, and that the appropriate value is very dependent on the problem.


I think it would be useful if you could apply these principles to one of the examples in the article. Assume we have a woman who has a mammogram that indicates malignancy. The question that the patient would like answered is "What is the chance that I have breast cancer?"

How does frequentist statistics answer this question? What's the null hypothesis that should be used here? Can this question be answered without assuming a prior on the likelihood that the individual had breast cancer before the test was performed?


>How does frequentist statistics answer this question?

By applying Bayes' Theorem, just as the article did. By invoking Type I and Type II errors.

This is a standard problem in stats textbooks - not breast cancer but the general problem of a diagnostic measure that is 99% accurate, for a disease that has less than 1% prevalence.

Frequentists don't insist on using p-values to solve all problems.

>Can this question be answered without assuming a prior on the likelihood that the individual had breast cancer before the test was performed?

Why should one not assume a prior? There is a base rate for breast cancer. Why should a frequentist not use that information? I've never heard a statistician who does not describe himself as a Bayesian say one shouldn't. Every textbook suggests one should. Where are people getting the idea that frequentists don't use Bayesian methods? They have always used them.

I'm finding this whole discussion orthogonal to Bayesian vs frequentist statistics. The main issue frequentists have with a lot of Bayesian approaches is that Bayesians often want to assign probabilities to one off events, whereas frequentists insist only on repeatable events. That is where the accusation of "subjective" comes from. Frequentists like to believe that any question of probability can be decided by taking N samples and seeing the outcome (even if only conceptually).

For problems where there is a population, and one can do sampling (i.e. repeatability), there's never a problem with using Bayesian methods.
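
To illustrate the "take N samples and see the outcome" view for the textbook diagnostic problem, here is a rough simulation sketch. It assumes "99% accurate" means the test is right 99% of the time for both sick and healthy people, and it assumes a 0.5% prevalence (the comment only says "less than 1%"); the function names and seed are arbitrary.

```python
import random


def simulated_ppv(prevalence, accuracy, n, seed=0):
    """Fraction of positive tests that are true positives, by sampling."""
    rng = random.Random(seed)
    positives = 0
    true_positives = 0
    for _ in range(n):
        sick = rng.random() < prevalence
        test_is_correct = rng.random() < accuracy
        tested_positive = sick if test_is_correct else not sick
        if tested_positive:
            positives += 1
            true_positives += sick  # bool counts as 0 or 1
    return true_positives / positives


def bayes_ppv(prevalence, accuracy):
    """The same quantity computed directly from Bayes' theorem."""
    true_pos = prevalence * accuracy
    false_pos = (1.0 - prevalence) * (1.0 - accuracy)
    return true_pos / (true_pos + false_pos)


sampled = simulated_ppv(prevalence=0.005, accuracy=0.99, n=200_000)
exact = bayes_ppv(prevalence=0.005, accuracy=0.99)
print(f"sampled: {sampled:.3f}  Bayes' theorem: {exact:.3f}")
```

With a real population to sample from, the counting answer and the Bayes' theorem answer should agree up to sampling noise, which is why this case is uncontroversial.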


Good point. It would be illustrative to get some examples where a prototypical (but competent) Bayesian and a prototypical (but competent) frequentist would give different answers.

It seems to me that when there really is a known base rate, the answers would coincide.

And if there is no base rate, the Bayesian would guess (but explicitly) or refuse to answer, while the frequentist would give a plausible answer, which, however, very much hinges on the unknown base rate.

In the light of the replication crisis, maybe the Bayesian approach is better.


I didn't say "picking the p-values". I said "picking the null hypothesis". (Although p-value hacking is certainly a major contributing factor to the replication crisis.)

It's good that the textbooks you were taught from point out that assumptions exist and should be acknowledged.


>It's good that the textbooks you were taught from point out that assumptions exist and should be acknowledged.

Have you found a "standard" text book that doesn't?

Not a probability textbook, but a statistics one.


"Again, this claim is something I see only from self described Bayesians."

The claim is that the prior is subjective and they're bad. The Bayesians are drawing the analogy that frequentist assumptions are just as subjective as prior selection (but maybe less explicit), so objecting to one and not the other is... in need of further justification.


>The claim is that the prior is subjective and they're bad.

The only such claims I've seen are ones where they disagree with the prior, which is no different than disputes amongst themselves where so-called frequentists argue whether p=0.05 or p=0.001 is appropriate. Frequentists don't pretend that their p-value is set in stone.

Let me repeat what I said in another comment:

I'm finding this whole discussion orthogonal to Bayesian vs frequentist statistics. The main issue frequentists have with a lot of Bayesian approaches is that Bayesians often want to assign probabilities to one off events, whereas frequentists insist only on repeatable events. That is where the accusation of "subjective" comes from. Frequentists like to believe that any question of probability can be decided by taking N samples and seeing the outcome (even if only conceptually).

For problems where there is a population, and one can do sampling (i.e. repeatability), there's never a problem with using Bayesian methods.

My complaint is that while there are differences amongst Bayesians and frequentists, this article does not present any, and wrongly implies things about frequentists.


"In one place it explicitly warns against assuming the "traditional" approach is more objective, and points out that the assumptions exist in the model and the practitioner should be aware of those assumptions."

The Bayesian approach forces you to explicitly model those assumptions. Or to admit that you are ignoring them.


>Priors in Bayesian statistics are not randomly guessed.

And frequentism (if there is such a thing) does not prohibit using base rates in their analysis. The article is setting up false dichotomies.

Also, this in the article is a "random guess":

>Maybe we’re not so dogmatic as to rule out “The Thinker” hypothesis altogether, but a prior probability of 1 in 1,000, somewhere between the chance of being dealt a full house and four-of-a-kind in a poker hand, could be around the right order of magnitude.


You can't "not assume" a prior. Frequentist statistics does indeed use (sometimes improper) implicit priors, and thus implicitly assumes subjective information. Bayesian inference makes it explicit. Anyone who thinks they can evaluate evidence objectively in the first place is fooling themselves. There is always subjectivity.

If you really want to assume "minimum information," use the Jeffreys prior for your domain. Its density is proportional to the square root of the determinant of the Fisher information matrix.

The Jeffreys prior has the key property of being invariant under changes of parameters. This property is not generally true of implicit frequentist priors (unless they correspond with the Jeffreys prior), but yet is essential for uninformativeness. If your model depends on the units you choose for your parameters, you can hardly call it objective!

https://en.wikipedia.org/wiki/Jeffreys_prior
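
For a concrete check of the invariance property, here is a small sketch for a coin-flip (Bernoulli) model, chosen by me as the simplest case. The Fisher information for the success probability theta is 1 / (theta * (1 - theta)), so the Jeffreys density is proportional to its square root. Re-deriving the prior in log-odds units should give the same thing as transforming the theta prior with the usual change-of-variables rule. Function names are my own and the densities are left unnormalized.

```python
import math


def jeffreys_theta(theta):
    """Unnormalized Jeffreys density for a Bernoulli probability theta."""
    fisher_info = 1.0 / (theta * (1.0 - theta))
    return math.sqrt(fisher_info)


def jeffreys_log_odds(phi):
    """Unnormalized Jeffreys density derived directly in log-odds phi."""
    theta = 1.0 / (1.0 + math.exp(-phi))
    fisher_info = theta * (1.0 - theta)  # Fisher information w.r.t. phi
    return math.sqrt(fisher_info)


def transformed_theta_density(phi):
    """Theta-space Jeffreys density pushed into phi by change of variables."""
    theta = 1.0 / (1.0 + math.exp(-phi))
    jacobian = theta * (1.0 - theta)  # d(theta) / d(phi)
    return jeffreys_theta(theta) * jacobian


# The two routes should agree at every point, up to rounding.
for phi in (-2.0, 0.0, 1.5):
    print(phi, jeffreys_log_odds(phi), transformed_theta_density(phi))
```

A flat prior on theta would fail this check: transformed into log-odds it is no longer flat, so "uninformative" would depend on the units you happened to pick.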


Explicit nonsense is still nonsense. I don't believe that either Bayesian or frequentist approaches are the "one true way"; they are merely tools that help us understand the world, and both approaches have limitations.

I suggest looking at the essay "Beyond Bayesians and Frequentists" by Jacob Steinhardt http://cs.stanford.edu/~jsteinhardt/stats-essay.pdf , who says, "The essential difference between Bayesian and frequentist decision theory is that Bayes makes the additional assumption of a prior... and optimizes for average-case performance rather than worst-case performance. It follows, then, that Bayes is the superior method whenever we can obtain a good prior and when good average-case performance is sufficient. However, if we have no way of obtaining a good prior, or when we need guaranteed performance, frequentist methods are the way to go." The same author also has an essay that tries to explain why Bayesians shouldn't be so confident in their approach: "A Fervent Defense of Frequentist Statistics", 18th Feb 2014, https://www.lesswrong.com/posts/KdwP5i6N4E4q6BGkr/a-fervent-... Note that the author isn't against Bayesian approaches, but against the dogmatic assumption that frequentist approaches are always worse.


> Bayes makes the additional assumption of a prior... and optimizes for average-case performance rather than worst-case performance

This is nonsense. Nothing stops a Bayesian from picking a prior that optimizes for worst-case rather than average-case performance, given a particular utility function. The really questionable premise here is that the utility function is known; in practically any case of real interest, it isn't, and that's a problem regardless of whether your decision theory is "Bayesian" or "frequentist" (which are misnomers for decision theories anyway for the reasons I gave in my other post just now).


> the essay "Beyond Bayesians and Frequentists" by Jacob Steinhardt

This essay is about decision theory, and the things he is calling "Bayesian" and "frequentist" are more than just statistical methods, which is what the replication crisis is about. Decision theory, particularly when other agents are present, cannot be handled by any method that only considers statistics; game theory is involved. The Steinhardt article is basically claiming that "Bayesians" can't use game theory while "frequentists" can, which is nonsense.


When I say you can't "not assume" a prior, I am being 100% literal. Every frequentist technique can be interpreted in a Bayesian way, and the priors always contain information (which has a mathematical definition). It is also true vice-versa, but is much more awkward and complicated.


> the problem is that random guessing (of priors) is not a reasonable replacement for science.

I disagree!

> After all, Bayes' rule will give nonsense answers if you give it a nonsense prior!

You are absolutely correct here, but draw the wrong conclusion. Bayes' rule isn't just some convenience that you ought to ignore without a solid prior. Fundamentally, what can any study say about a subject whose prior likely ranges over a few orders of magnitude among the experts? That subject represents a gaping hole in our knowledge. If a single study were to narrow the band by a whole decade, it would still leave a decade-sized hole... All of this should be explicit and out in the open.

> I could imagine systemic worldwide ways to deal with this. For example, perhaps the scientific community could allow people to propose initial priors, and then allow multiple different papers to improve the estimation of the probability over time.

Yeah, this is the way to go. Researchers should preregister their studies with their priors. Future studies will revise them. After all, it's easy to argue about numbers, and those who disagree can easily supply their own.


This ignores the labor market economics involved in research. If there was lots of pre-registration and always multiple replication attempts then there wouldn’t be much of a replication crisis in the first place. It’s not frequentist stats that’s to blame, it’s the labor market for researchers.


No, the problem is much wider than just researchers. If for example the public, or at least the media, didn’t buy the bullshit they were selling (with high p values, non-public data and other shoddy statistical hacks), there wouldn’t be any replication crisis either. The general problem is that just stamping the word “science” on a black box that smells of poo gives it much more credibility than it should, and it often also prevents people from thinking critically.


I appreciate your point, but surely both are problems? One is bad math while the other is a sort of prisoner's dilemma where everyone has already defected. I don't think changing the incentives would necessarily lead to better math, or even that we have to change the incentives before changing the math.


"It's not just "being afraid"; the problem is that random guessing (of priors) is not a reasonable replacement for science."

As the article goes on to argue, ignoring (or hiding) the existence of priors is not science, either. (Two incidences of SIDS in one family are probably not independent, even if they aren't murder.)


> Bayes' rule is a great tool in many circumstances! But it has a great weakness: it requires a prior. That doesn't make it useless; few tools are useful in all circumstances. But requiring "everyone to use Bayes' rule, even though we have no reasonable way to find a good estimate of the priors," is unlikely to ever happen (and rightly so). The article rightly points out a serious problem with the typical application of statistics, but there needs to be a better justification for priors than is suggested in this article.

If you don't have good information on what your prior should be, you can just use a really weak prior. And you probably know more about your data than you think you do -- if you are doing counts, that leads you to a different prior than if you are doing things that can be in increments smaller than a single integer.


Why can't the result itself be the map prior->posterior (instead of the result of applying this map to a fixed prior)?


This is known as the "evidence". Usually we expect scientific papers to not only present their data, but also some analysis and some kind of conclusion. If we eliminate that second half, it probably won't do much to solve the problem.

More importantly, your priors help you select plausible explanations for data that let you delve deeper and identify the root cause. If you pretend not to have priors, you won't have any idea how to understand the data.

For example, you run an experiment that shows that several communities using pesticide X average a 1% higher rate of lung cancer than communities using pesticide Y. There's a lot of variance though, and some communities using X have much lower cancer rates.

You could publish the data with no further analysis and leave the broader scientific community to do the work of trying to draw conclusions. Or you could remember that you expected the opposite effect, because pesticide Y has a more direct route to enter the lungs, and then you go back and check the smoking rates in those communities and find a nearly perfect correlation. Accounting for the smoking rates, pesticide Y appears to increase lung cancer rates by 1% relative to pesticide X.

Yes, you could have just published the raw data and other scientists could have figured that out. But realistically, they won't be nearly as involved as you and your peer reviewers are in the work. Even assuming you identify the communities in your published dataset, few scientists doing a meta-review would go deep enough to cross-correlate your data against smoking rates if you didn't already do it for them. And if they actually do it, it's only because their priors are telling them that there's an important effect that could alter the conclusion...

In other words, it's your priors that give you a nagging feeling that no matter how big the sample size D is comparing cancer rates among X and Y, you just won't feel P(C|X,~Y,D)-P(C|~X,Y,D) collapse to a narrow distribution in your mind, because you'll know that the outcome hinges critically on a third factor that you haven't seen yet.


Well, that’s essentially what frequentist statistics gives you: the probability of the results given the hypothesis, which Bayes’ rule lets you combine with a prior to get the probability of the hypothesis given the results. But people don’t usually actually perform that calculation...


>Well, that’s essentially what frequentist statistics gives you: the probability of the results given the hypothesis

No. The whole point of NHST is that it gives the probability of the results given the null hypothesis. Its whole point is that it doesn't touch the probability of the results given other hypotheses (except when designing the experiment, where you need other subjective inputs).


This acts like there isn't just as much subjectivity in frequentist analysis. Ever run a power analysis? The same p value with an underpowered study can be more misleading, so you still need some subjective inputs to your design. This is a subjective frequentist fix for a real issue. 80% power with 5% significance with a presumed effect size of x contains three totally subjective numbers (and one reeeeeally subjective one). They are crucial in the same way, and subjective in the same way. You need a lot more than a p-value to do frequentist science, and it's not that objective.

What moved me over was the practical learning that your results often aren't that sensitive to your prior and that dependence is usually easy enough to characterize.


For almost all the typical use cases a Frequentist 95% confidence interval will be nearly the same as a Bayesian 95% credible interval determined using a uniform prior. In every case I have seen where this is not true the Frequentist interval gives nonsensical results.
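
A rough sketch of that comparison for one typical case, a binomial proportion. The data (40 successes in 100 trials) and the function names are hypothetical, and the Wald interval is just one of several frequentist intervals one could pick. It only illustrates the "nearly the same" part of the claim, using nothing beyond the standard library.

```python
import math

def wald_interval(k, n, z=1.959964):
    # Frequentist normal-approximation (Wald) 95% confidence interval for a proportion.
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

def uniform_prior_credible_interval(k, n, level=0.95, grid=100_000):
    # Under a uniform prior the posterior is proportional to theta^k * (1 - theta)^(n - k).
    # Integrate it on a midpoint grid and read off the equal-tailed quantiles.
    thetas = [(i + 0.5) / grid for i in range(grid)]
    weights = [math.exp(k * math.log(t) + (n - k) * math.log(1.0 - t)) for t in thetas]
    total = sum(weights)
    lo_target = (1.0 - level) / 2.0
    hi_target = 1.0 - lo_target
    lo = hi = None
    running = 0.0
    for t, w in zip(thetas, weights):
        running += w / total
        if lo is None and running >= lo_target:
            lo = t
        if hi is None and running >= hi_target:
            hi = t
            break
    return lo, hi

ci = wald_interval(40, 100)
cri = uniform_prior_credible_interval(40, 100)
print("confidence interval:", ci)
print("credible interval:  ", cri)
```

With these inputs the two intervals differ by well under two percentage points at each end. Small samples or proportions near 0 or 1 are where the two can diverge more.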

So your complaint is a total non-issue. Most people are interpreting their confidence intervals as credible intervals anyway... Essentially, for nearly a hundred years researchers have already been using Bayesian stats with flat priors.


Is it not valid to do a multiverse analysis using all conceivable priors in this case? There’s no real need to make an arbitrary decision, unless you truly know nothing about what you’re looking at in which case, shrug.


Yeah, I don't agree. How does one go about assessing the validity of any given prior a priori? All priors are based on some sort of logical reasoning based on past experience, but much of the time when investigating something novel there's no guarantee that past experience is a good predictor. Often we don't even know to what degree we can or cannot rely on past experience.

I just don't buy that there's something magically more "mathy" about just conjuring assumptions you want to test.


If you follow the derivation of frequentist formulas, you'll find they always start from some kind of assumption, such as a uniform prior. You can't escape having a prior.

The only way to avoid having a prior is by pulling a formula from a textbook and applying it without looking at its derivation.


I don't know the first thing about statistics, other than noticing that each time there is a "fight" between Bayesians and Frequentists (on internet forums) there are always "experts" on both sides dismissing the other without ever showing a systematic, scientific argument, and without ever reaching a conclusion.


The thing is, hard core frequentists don't really exist. You have Bayesians, and you have people who happily use Bayes theorem as appropriate.

Take this article as an example. Reading it, I got the sense that the author at some point in life discovered Bayesian statistics, and is now on a crusade to twist everything into a pro and anti-Bayesian stance.

His example of the base rate fallacy (breast cancer diagnosis) is probably in every "frequentist" textbook out there. Frequentists are well aware of it, and have no aversion to using Bayes Theorem. You will not find a frequentist objecting to taking into account the base rate when applying statistics. The difference, as the GP mentioned, is that the base rate is fairly well known and not subject to much debate. Whereas this:

>Maybe we’re not so dogmatic as to rule out “The Thinker” hypothesis altogether, but a prior probability of 1 in 1,000, somewhere between the chance of being dealt a full house and four-of-a-kind in a poker hand, could be around the right order of magnitude.

Is a number he pulled out of his rear end, and his subsequent calculation is not meaningful to people who don't agree with the prior. Sure, anyone can manufacture a prior if they wanted. And part of me is merely wondering: If he wanted a prior of 1 in 1000, why not simply require a p value of 0.001 instead of the 0.03 the paper used? The problem with the paper is that the sample size is small (n=57), and small samples are a lot more likely to give extreme results. I'll be OK with a p=0.03 if n=10000, but not if n < 100.


> why not simply require a p value of 0.001 instead of the 0.03 the paper used?

That is the proper response to a low prior probability, in general, yes.

More to the point, in an ideal world the p value one picks as the significance criterion should somehow capture both the state of prior knowledge and the consequences of reaching the wrong conclusion. If it really doesn't matter what you conclude, p=0.5 (not a typo: 1/2) is fine. If the conclusion really matters for something important, p=0.03 is likely too high.

Most published research that does significance testing seems to have no particular discipline for picking their threshold p values other than cargo-culting, unfortunately.

> the sample size is small (n=57), and small samples are a lot more likely to give extreme results.

That's already captured in the p value, no? That is, the sample size is already part of the computation of the p value. If you come out with p=0.03, then that means that if the null hypothesis holds, in 3% of cases you'd see results at least as extreme as the ones you observed, whatever size your sample is. I'd genuinely like to understand why you feel there is a qualitative difference between n=1e4, p=0.03 and n=1e2, p=0.03, because I feel like I'm missing something there.

(Now it's a lot easier to get p=0.001 with n=10000 if your effect is real than it is with n=57. So in that sense, having larger samples helps. Having a larger sample _might_ also help with the "I tried a bunch of experiments until I got one that tested significant" problem, if it's genuinely harder to do a larger-sample experiment. Of course people could also apply a Bonferroni correction, but most practitioners of statistical testing don't seem to realize it exists or might be needed...)
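
To put numbers on the "tried a bunch of experiments" problem and the Bonferroni correction mentioned above, here is a small sketch. It assumes the tests are independent and that every null hypothesis is true. The count of 20 experiments is a hypothetical value and the function names are mine.

```python
def familywise_error_rate(alpha, m):
    # Chance of at least one false positive across m independent tests
    # when every null hypothesis is true and each test uses threshold alpha.
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha, m):
    # Per-test threshold that keeps the family-wise rate at or below alpha.
    return alpha / m

m = 20  # hypothetical number of experiments tried
uncorrected = familywise_error_rate(0.05, m)
corrected = familywise_error_rate(bonferroni_threshold(0.05, m), m)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

Under these assumptions, running 20 tests at p < 0.05 produces at least one spurious "significant" result well over half the time, while the corrected threshold of 0.0025 keeps that chance just under 5%.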


I don't know your age, but from my experience, at least in the 1980s and 1990s, there really were warring camps of Bayesians and Frequentists. People working on the same scientific topic who were Frequentists wouldn't even cite the papers of their Bayesian colleagues. Frequentist textbooks like A.W.F. Edwards' "Likelihood" would spend pages disparaging Bayesian methods. But I agree that things are much calmer these days with most people being pragmatists that don't care about being "pure" but use a mixture of methods from both camps.


Indeed, Bayes Theorem is proven -- how can it be controversial?

Perhaps a good idea with priors is to vary the priors and see how the results vary. This shouldn't be too hard with small-to-moderate sized data sets.

A result that depends heavily on a particular prior may demand additional investigation.


Couldn't you plot a graph of prior probability versus Bayesian significance? So that someone could see how significant it is if you assume that the prior probability is 1 in 100 vs 1 in 1000 vs 1 in a million, etc.
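
Something like this is easy to tabulate (or plot). A minimal sketch using the odds form of Bayes' rule, where the Bayes factor of 20 is a made-up value standing in for whatever the data in a given study actually support:

```python
def posterior_probability(prior, bayes_factor):
    # Bayes' rule in odds form: posterior odds = prior odds * Bayes factor.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical: the data are 20 times more likely under the hypothesis than under the null.
BAYES_FACTOR = 20.0
priors = [1e-2, 1e-3, 1e-6]
sweep = {p: posterior_probability(p, BAYES_FACTOR) for p in priors}
for p, post in sweep.items():
    print(f"prior {p:g} -> posterior {post:.6f}")
```

A reader who disagrees with the author's prior can then look up their own row instead of arguing about a single headline number.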


One other example of the flaw (I think) that came out in pop culture was a few years ago during the Serial podcast about Adnan Syed.

Near the end when the hosts were wrestling about whether they should think he was guilty, the interviewer's friend (the producer? Dana?) came on and shared why she thought he was guilty, and it was this convoluted argument about if Syed was innocent, he'd have to be the unluckiest guy in the world, so therefore he was probably guilty.

Her point was that it was just too strong a coincidence that it was his ex-girlfriend that got killed, on that particular day during some period of time when he didn't have an alibi, etc etc.

That seemed to me like it could be a bad Bayesian argument - like it would have been a good argument had Serial selected a random citizen to interview, but they selected someone they already knew was connected to the story. Murders happen and by definition the circumstances are always highly unlikely because murders are so rare, and innocent people closely connected to the circumstances of the crime are very, very unlucky by definition. You can't point to that unluckiness as evidence that they're probably guilty.


Add to that, highly unlikely events occur all the time, but most of the time they're not anywhere near a murder. Once you start digging around for unlikely things, you'll find them.

Heck, even if you dig into a random person associated with a murder, you'll find some unlikely things, but if you dig into a person who has already been accused of the crime (rightly or wrongly), of course there will be loads of suspicious circumstances. That's why they were accused in the first place.


> highly unlikely events occur all the time

They do?


Yeah, sure, it's a matter of how you define the event.

Let's say some day you find an acquaintance on the street. Not extraordinary. But that specific acquaintance that you hadn't seen in 10 years. On that specific day, on that specific street, at that specific time. If you had wondered the day before how likely it would be that that specific set of circumstances would come together, you would have rightly concluded it was exceedingly unlikely.

This is another way of looking at the first example in the article. The question was not how likely it was that that specific woman had their two children die of natural causes, but how likely it was that someone in the country or the world would have such a thing happen in a large interval of time.


"The chances of this exact event are 1 in a trillion, therefore with a high degree of confidence, it isn't happening and I am in fact hallucinating. You're not real, Bob from highschool!"

And imagine how often you almost run into acquaintances on the street, and don't notice. It must happen more often than when you do notice. If you made a concerted effort to look for near-coincidences, you would suddenly find yourself noticing lots more unlikely things that were happening all along.

Once you start digging into someone's life you'll learn all sorts of things that might seem really unlikely... but you'd find unlikely things in someone's life picked at random, you just never dug into them, because they're not accused of murder.


Yes. Your chance of being hit by lightning in a given year is quite low. The average number of people struck by lightning in the US annually in the last decade is 270, according to https://www.weather.gov/safety/lightning-odds (and while that's an estimate, the 27 average deaths/year over that period is not).

So a specific person being killed by lightning within a given year is pretty unlikely (less than one in 10 million chance), and on average it happens once every two weeks in the US. Lots of people, see.
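
The arithmetic behind those two figures, as a quick check. The 27 deaths per year is the number quoted above; the US population of roughly 330 million is my assumption, not something stated in the thread.

```python
DEATHS_PER_YEAR = 27           # figure quoted in the comment above
US_POPULATION = 330_000_000    # assumption, not from the thread

per_person_annual_risk = DEATHS_PER_YEAR / US_POPULATION
one_in = 1.0 / per_person_annual_risk
days_between_deaths = 365.0 / DEATHS_PER_YEAR

print(f"about 1 in {one_in:,.0f} per person per year")
print(f"about one death every {days_between_deaths:.1f} days nationwide")
```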

Same thing with other situations where there are lots of observations...


The likelihood of me being hit by lightning is very low. Therefore, it is a highly unlikely event. And it doesn't happen very often.

The chance of someone being hit by lightning is not very low. Therefore, it is not a highly unlikely event. And it does happen relatively frequently.

Unlikely events don't happen very often. That's just a definition. If event X happens frequently, then it isn't an unlikely event.


Any given unlikely event does not happen very often.

But if you have a whole bunch of possible unlikely events, then one of them (a different one each time, usually) can happen fairly often.

Back to the lightning example, any given person being hit by lightning is an unlikely event. But as you note, "someone being hit by lightning" is not, because we are now observing these unlikely events across so many people.

All of which is to say, observing that an unlikely event happened doesn't provide much information on its own, if you have observed a lot of things happening in general...
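
A sketch of the general point, assuming the events are independent and equally likely (a simplification). The one-in-a-million figure is a hypothetical value.

```python
def prob_at_least_one(p, n):
    # Probability that at least one of n independent events, each with probability p, occurs.
    return 1.0 - (1.0 - p) ** n

single = 1e-6  # hypothetical chance of any one specific unlikely event
rates = {n: prob_at_least_one(single, n) for n in (1, 1_000, 1_000_000, 10_000_000)}
for n, rate in rates.items():
    print(f"{n:>10,} chances -> P(at least one) = {rate:.6f}")
```

With a million chances the one-in-a-million event happens somewhere more often than not, and with ten million it is close to certain.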


Beautifully put - I nominate this as the official summary of the subthread. It's the reason why we need to pad time estimates for projects and leave early to catch trains, even when we can't think of any likely reason for delay - the sum of all unlikely reasons can cross the threshold into "likely".

So it's quite correct to say "unlikely events happen all the time".


The odds of you dying from any particular unlikely death (like lightning) is super low. The odds of you dying of any of them (bear mauling, tied to a railway track, spontaneous combustion, etc) is significantly higher. That's the sort of thing I mean.

Individual coincidences are rare. But if you start really looking for them, you will absolutely find some coincidence. The one that happens will have been unlikely and impossible to predict ahead of time, and yet you were more or less absolutely certain some coincidence would occur, because there are so many possible configurations of coincidence that one is bound to happen.


That's nothing, multiple people were struck by lightning several times. https://listverse.com/2018/06/02/10-people-who-have-been-str...


This is fundamental to Chaos theory (how small initial states can lead to rich systems that contain both expected and unexpected outcomes), which isn't a theory as much as an area of mathematical modeling of the most complex systems. Me waving off a butterfly is just as likely to influence a hurricane as the unmolested butterfly flapping.

Any given reader's brain existing to comprehend this comment is highly unlikely, yet it occurs and continues within layers of chaotic systems, that exist turbulently, in perceived reality.


"The point is that we have good reason to be skeptical, and we should follow the mantra of the mathematician (and Bayesian) Pierre-Simon Laplace, that extraordinary claims require extraordinary evidence. By ignoring the necessity of priors, significance testing opens the door to false positive results."

Such an important quote. I believe the replication crisis is especially present in nutrition studies. We live in an age where headlines go viral and new wave diets are taken on in rapid succession. Take all nutrition studies with a grain of salt. (assuming, of course, that salt is good for you...or is it???)


> we should follow the mantra of the mathematician (and Bayesian) Pierre-Simon [de] Laplace, that extraordinary claims require extraordinary evidence

We should, but we don't, because we crave extraordinary results and are ready to give up reason to get them. But if it's too good to be true (or too spectacular), it probably is.


"Harvard Business School professor Amy Cuddy’s 2010 study of “power posing:” the idea that adopting a powerful posture for a couple of minutes can change your life for the better by affecting your hormone levels and risk tolerances."

Hey, now. I like standing with my hands on my hips, imagining I have a cape flowing in the breeze behind me.


<sigh> Okay, let's do this again.

Anyone arguing over frequentist versus Bayesian statistics is missing the foundations of their statistical training. Both are subsumed by the framework of decision theory. This isn't new. It goes back to Wald's work in the 1950's.

And the examples he is talking about wouldn't be saved by different statistical methods. No fiddling with techniques at the end can save a failed design of the trial.


I get that you're probably trying to leave it as an exercise to the reader, but the word "frequentist" doesn't even appear on the wikipedia article for Decision Theory, nor the Stanford Encyclopedia of Philosophy.

I can't tell if you're being hyperbolic, or if it requires deep study to grasp how bayesian and frequentist statistics are rendered irrelevant by decision theory.

Anyone care to take a stab at a layman's summary for the benefit of the under-educated folks around here?


Googling <statistical decision theory> might help you out. My opinion is the Wikipedia page for decision theory could use a good rewrite. Someday I may get around to it.

Also, madhadron didn't quite say that Bayesian and frequentist statistics are rendered irrelevant by decision theory. Rather, statistical decision theory includes Bayesian and frequentist statistics as possible statistical decision rules. You might want to not use Bayesian or frequentist rules and instead use minimax regret or something else.

I do think that it is fine that people argue about Bayes vs frequentist. I wish they'd consider everything else though.


>> You might want to not use Bayesian or frequentist rules and instead use minimax regret or something else.

... What? Any such decision algorithm at least in theory still takes a probability distribution as an input. You still need to approximately follow the proper rules of probability to get that distribution, there's no way around it.


You don't need to have a distribution to make a decision. I don't think I understand what you are saying.


Sorry for the super late reply. I don't know if anyone will see this... But...

In minimax regret, you have a set of available decisions D, and a set of possible states of nature N, and a utility U(D,N). Each state of nature also has a probability P(N) (which can be influenced by the decision too in some problems).

States of nature include "interest rates rise 1%", "interest rates fall 1%", and "interest rates stay the same". Decisions include "invest in stocks" and "invest in bonds".

Minimax regret proposes to ignore the probabilities P(N), instead suggesting a way to make a decision purely based on the utilities of the outcomes. But that is actually an illusion.

Outside of math class word problems, we don't have N or U(D,N) handed to us on a silver platter. There is always an infinite range of possible states of nature, many of which have a probability approaching but never reaching zero, including states such as "win the lottery", "communist revolution", and "unexpected intergalactic nuclear war".

In commonsense decision-making we don't include those states of nature in our decision matrix, because our common sense rules them out as being implausible before we even think about our options. You wouldn't choose to invest in bonds just because stocks have the most regret in the event of a communist takeover.

So what actually happens is we intuitively apply some probability threshold that rules out states of nature falling below it from our consideration. Then we minimize max regret on the remaining "plausibly realistic" states of nature.

Humans are so good at doing probability mentally that this step happens before we even realize it. But if you are writing code that makes decisions, you'll need to do it, and so you'll need to have at least a rough stab at the probability distributions.
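
A minimal sketch of what that looks like in code. Every state, probability, and utility here is a made-up illustration value, and the function names are hypothetical. It runs minimax regret twice: once over all states, and once after dropping states below a plausibility threshold.

```python
# All states, probabilities, and utilities below are hypothetical illustration values.
STATES = {
    "rates_rise": 0.30,
    "rates_same": 0.45,
    "rates_fall": 0.2499,
    "communist_takeover": 0.0001,
}
UTILITY = {
    "stocks": {"rates_rise": -5, "rates_same": 8, "rates_fall": 12, "communist_takeover": -100},
    "bonds": {"rates_rise": -2, "rates_same": 3, "rates_fall": 6, "communist_takeover": -40},
}

def minimax_regret(utility, states):
    # Regret of a decision in a state = best achievable utility in that state minus its utility.
    best = {s: max(utility[d][s] for d in utility) for s in states}
    worst_regret = {d: max(best[s] - utility[d][s] for s in states) for d in utility}
    # Pick the decision whose worst-case regret is smallest.
    return min(worst_regret, key=worst_regret.get), worst_regret

def plausible_states(states, threshold):
    # The implicit step described above: drop states whose probability is below a cutoff.
    return [s for s, p in states.items() if p >= threshold]

choice_all, regret_all = minimax_regret(UTILITY, list(STATES))
choice_plausible, regret_plausible = minimax_regret(UTILITY, plausible_states(STATES, 0.01))
print("all states:      ", choice_all, regret_all)
print("plausible states:", choice_plausible, regret_plausible)
```

With these numbers the recommendation flips from bonds to stocks once the implausible state is filtered out, so the answer depends on the probability threshold even though the regret calculation itself never touches a probability.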


How do minimax criteria not follow the "proper rules of probability"?


Or minimax regret


The sibling gave some pointers. It's not that they're irrelevant. It's that they're historically important special cases, and decision theory provides the framework for thinking about when they make sense.

The general setup is this: you have a set of possible states of nature, a random variable on that set which will produce some observations, a set of decisions you may make, and a loss function defined over tuples of (state of nature, decision). The task is to produce a "good" function that maps from the observations to a decision.

It turns out that "good" isn't unique. You start by defining the expected loss for a given state of nature and decision function. Then you generally narrow your decision functions of interest to those for which there is no other function that has a lower expected loss for all states of nature (we call the ones that are uniformly dominated like this "inadmissible" and the ones that aren't "admissible").

But then you end up with a whole collection of decision functions that are better under different conditions, and you need a way of choosing among them. Unbiasedness is one such criterion, as is maximum likelihood (though in many situations the maximum likelihood criterion may single out decision functions that are inadmissible), Bayes under a given prior, or minimax.
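
A toy version of that setup, assuming a finite table of expected losses that someone has already computed. The rule names, state names, and numbers are all hypothetical, and randomized rules are left out. It filters to the admissible rules and then shows how minimax and Bayes (under two different priors) pick different winners from the same table.

```python
# Hypothetical expected-loss (risk) table: RISK[rule][state].
RISK = {
    "rule_a": {"s1": 1.0, "s2": 4.0},
    "rule_b": {"s1": 3.0, "s2": 2.0},
    "rule_c": {"s1": 3.5, "s2": 4.5},  # worse than rule_b in every state
    "rule_d": {"s1": 2.2, "s2": 2.8},
}

def dominates(r1, r2):
    # r1 dominates r2 if it is never worse and strictly better in at least one state.
    return all(r1[s] <= r2[s] for s in r1) and any(r1[s] < r2[s] for s in r1)

def admissible_rules(risk):
    # Keep the rules that no other rule dominates.
    return [a for a in risk if not any(dominates(risk[b], risk[a]) for b in risk if b != a)]

def minimax_rule(risk):
    # Smallest worst-case risk.
    return min(risk, key=lambda r: max(risk[r].values()))

def bayes_rule(risk, prior):
    # Smallest prior-weighted average risk.
    return min(risk, key=lambda r: sum(prior[s] * risk[r][s] for s in prior))

admissible = admissible_rules(RISK)
minimax = minimax_rule(RISK)
bayes_s1_likely = bayes_rule(RISK, {"s1": 0.9, "s2": 0.1})
bayes_s2_likely = bayes_rule(RISK, {"s1": 0.1, "s2": 0.9})
print(admissible, minimax, bayes_s1_likely, bayes_s2_likely)
```

All three selected rules are admissible here, yet each criterion picks a different one, which is the sense in which "good" isn't unique.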

For some of these, particularly minimax, you also run into the fact that there aren't "enough" deterministic decision functions to select a best one...but if you allow randomized decision functions, there are.

Bayes procedures also enter in an interesting way: the class of Bayes procedures over all sensible priors is a superset of the set of admissible procedures...but usually not a much bigger set, so you can sometimes prove properties of all admissible procedures by proving them for all Bayes procedures.

For those who have studied game theory, this also leads to an immediate extension of repeated procedures where the decision rule at each step is "stop and accept a decision" or "sample again." These sometimes go under the name of "online trials" and are really useful. And since any admissible procedure is Bayes under some prior, you can often get away with doing the math for these online trials in a Bayesian framework, adding a hard-coded stopping rule to correspond to your underlying decision procedure, even if you are a dyed-in-the-wool frequentist.

The other, really important result in Bayesian statistics is that for a well behaved prior, data eventually drowns it out.
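
A small sketch of that effect with a conjugate Beta-binomial model. The two priors and the true rate of 0.3 are hypothetical, and the data are idealized (exactly 30% successes, no sampling noise) so the trend is easy to see.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    # A Beta(prior_a, prior_b) prior with binomial data gives a Beta posterior with this mean.
    return (prior_a + successes) / (prior_a + prior_b + trials)

TRUE_RATE = 0.3  # hypothetical data-generating rate
gaps = {}
for trials in (10, 100, 10_000):
    successes = round(TRUE_RATE * trials)                 # idealized data, no sampling noise
    optimist = posterior_mean(8, 2, successes, trials)    # prior mean 0.8
    pessimist = posterior_mean(1, 9, successes, trials)   # prior mean 0.1
    gaps[trials] = optimist - pessimist
    print(f"n={trials:>6}: optimist {optimist:.4f}, pessimist {pessimist:.4f}")
```

Two analysts who start far apart end up within a tenth of a percentage point of each other by n=10,000. A prior that puts zero weight on the truth would not behave this way, which is where the "well behaved" caveat comes in.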

For my part, I don't take Bayesian statistics very seriously as a philosophical point of view. In the 1960's there was a lot of interest in it as a unified system of inference because there was a theorem showing that rational actors with a prior are Bayesian. When you start really digging into Bayesian statistics, though, you realize that a single prior is overspecified, and the more correct class of objects to work with is classes of priors. As soon as you do that, though, the "rational actor is Bayesian" theorem collapses.

I don't take the frequentist approach that seriously, either. Honestly, this is a space where I don't feel that there is a truly satisfactory underpinning besides decision theory plus "shut up and calculate." But that's true of so much of science that I'm not that bothered by it anymore.


I get and agree with what you're saying, but not with how I think you're suggesting it be applied. If you have a screwy design, you can't learn anything. Garbage in, garbage out. However, some methodologies are more susceptible to erroneously thinking you can learn something from garbage.

When power analyses are more motivated by funding feasibility and 80% conventions than prior belief, decision theory isn't going to help.


> When power analyses are more motivated by funding feasibility and 80% conventions than prior belief, decision theory isn't going to help.

You're assuming that power analyses happen. I'd be thrilled to get most practitioners that far.


Wow, I had read an argument criticising statistical significance before but it made nowhere near as good of a case as this article has.


I was reminded of this article, particularly the bit about Sally Clark's case, when I read this today:

https://www.theguardian.com/uk-news/2019/aug/02/louise-porto...

It's about a woman who was jailed for killing all three of her kids. The prosecutors alleged she wanted to sleep around and she couldn't because of the kids. I'm struck by the fact that there was no direct evidence that the mother was responsible for the deaths. The only "evidence" seems to have been that it was not clear why the girls died:

Both deaths were consistent with deliberate airway obstruction, and doctors could not find “any natural reason why either, let alone both, should have died”, prosecutors said.

I would have thought that if there was uncertainty about the cause of death (and that is exactly what "the doctors could not find any natural reason" states: uncertainty about the cause of death) then there is no sufficient evidence to convict.

But, I'm going by only what's in The Guardian article and I don't know the details of the case.


I see there's one more problem with the Thinker sculpture study that has nothing to do with how we analyze the data. There was another study a while ago, where a dime was placed (or not placed, for the control group) in a public copying machine. The researchers would then ask the people who used that copying machine a few questions, one of which was "on a scale from 0 to 10, how happy are you?" People who found a dime reported significantly higher happiness scores, but I don't remember by how much. Assuming this experiment was real, and not some journalists twisting the data without understanding statistics, I see an analogy with the sculpture study. In particular, the conclusion I would draw is that asking people questions about happiness or religion or anything whatsoever doesn't tell you much about their actual beliefs, since their answers are heavily affected by their mood at the moment. I assume this statement can be turned into another study verifying its correctness, and then after we collect the results, we'd have to start arguing about Bayes vs. Fisher all over again...


It's easy to challenge the subjectivity of a prior, pretending that frequentist testing is objective... until you design your own experiment, rather than just analyze one. Turns out this is a more important part. It'd be illustrative to explain that process.

A significance threshold of 0.05 (a test at the 95% confidence level) means that, if there's actually nothing there, you'll wrongly think there's something there 5% of the time. It says nothing about error rates if there is something there, which is presumably what we're interested in. So for that, you do what's called a power analysis. The conventional power of 80% means that, if there's actually an effect of size X and you know all about your noise, you'll wrongly think there's nothing there 20% of the time.

5% comes out of nowhere. 80% comes out of nowhere. A priori knowing your noise is often feasible, but not always. X, the presumed effect size you have to magically know before you run your experiment, comes out of nowhere. Getting this right is crucial to a reliable experiment, and it needs to really reflect your prior belief, even though it's frequentist. It's totally subjective.
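To make the dependence on those three choices concrete, here is a rough sketch of a textbook power calculation. It assumes a two-sided, two-sample z-test with known and equal noise in both groups, and it expresses X as a standardized effect size (difference in means divided by the standard deviation). The function name and the example effect sizes are made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-sample z-test.

    effect_size is the assumed difference in means divided by the (assumed
    known) standard deviation, i.e. the "X" you must pick in advance.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the false-positive rate
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: the smaller the effect you assume, the more
# subjects you need, which is why X tends to drift toward "fundable".
for x in (0.2, 0.5, 0.8):
    print(f"assumed effect {x}: {sample_size_per_group(x)} per group")
```

Halving the assumed X roughly quadruples the required sample, so the choice of X quietly decides whether the experiment is affordable at all.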

What's worse is that X is often taken to be whatever leads to a feasible/fundable experiment that'll be done fast enough, not anything scientific.

It's waaay harder to understand the impact of the subjective 5%/80%/X decisions than it is to understand the impact of your prior. A prior takes way less training. Better yet, you can report your results assuming many different priors and let your reader subjectively decide what to think, so you don't have to commit as hard.

tl;dr The "gold standard" way of doing science is already really subjective and that's okay. Equally subjective alternatives can still be better science.


So how does one go about choosing reasonable bayesian priors for an experiment?


It becomes part of your experimental design. Just like people can quibble with your setup, with your questions, with your procedures, they can quibble with your priors. The difference is that it's out there, explicit.

Note that a big weakness of Bayes rule is that you can look at any data and specify a prior that will make it look good. To continue with the mammogram example, suppose the doctor says "We really don't know if you are likely to have cancer or not. So we're just going to give 50-50 odds, and see what the test comes back with." That's a very different prior from the known base rate. The results would be, where C means "Cancer" and "R" means "Positive Result":

  P(C|R) = P(R|C) * P_prior(C) / P(R)
         = 1.0 * 0.5 / (1.0 * 0.5 + 0.05 * 0.5)
         = 0.5 / 0.525
         ≈ 0.95
A much higher probability. As you can imagine, you can do that in a paper as well: you know the data you have, and you come up with a "plausible" prior to make the data seem important.
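A small sketch of how much the choice of prior moves the answer. The likelihoods (a test that always fires when there is cancer and fires 5% of the time when there isn't) are the ones from the calculation above; the priors other than 0.5 are hypothetical stand-ins for a base rate, not figures from the article.

```python
def posterior(prior, p_pos_given_cancer=1.0, p_pos_given_healthy=0.05):
    """P(C|R) from Bayes' rule for a yes/no hypothesis and a positive result."""
    # P(R): total probability of a positive result under this prior.
    evidence = p_pos_given_cancer * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_cancer * prior / evidence


# 0.5 is the "we really don't know" prior; 0.1 and 0.01 are made-up base rates.
for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:<5} posterior={posterior(prior):.3f}")
```

Same data, same test, very different conclusions: that spread is what a reader needs to see in order to judge whether the prior was chosen to flatter the result.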

In my opinion, in any switch to using Bayesian analysis in scientific work, pre-registering priors will be essential.


> a big weakness of Bayes rule is that you can look at any data and specify a prior that will make it look good

This isn't a weakness in Bayes' rule, it's a weakness in your experimental protocol. You're supposed to pick the prior before doing the experiment and seeing the data.

> In my opinion, in any switch to using Bayesian analysis in scientific work, pre-registering priors will be essential.

Pre-registering statistical criteria and assumptions should already be essential, whether you're a Bayesian or not. The fact that it isn't is a key factor behind the replication crisis.


> This isn't a weakness in Bayes' rule, it's a weakness in your experimental protocol. You're supposed to pick the prior before doing the experiment and seeing the data.

Sure, if the goal is to get to something true. If the goal is to publish or to maintain your position, though, you’ll work differently.

You don’t even need to have seen the data. If I set my priors for the earth being flat extreme enough, it’ll take a long time for even good faith updating to converge to reality.
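A back-of-the-envelope sketch of that convergence problem, using the odds form of Bayes' rule. It assumes every observation is independent and favours the true hypothesis by the same likelihood ratio; the ratio of 10 and the priors below are invented for illustration.

```python
from math import ceil, log


def updates_to_reach(prior, target, likelihood_ratio):
    """Observations needed to move belief in a hypothesis from prior to target.

    Each observation multiplies the odds by likelihood_ratio (assumed > 1),
    so the count is the log of the required odds change in units of
    log(likelihood_ratio).
    """
    prior_odds = prior / (1 - prior)
    target_odds = target / (1 - target)
    return max(0, ceil(log(target_odds / prior_odds) / log(likelihood_ratio)))


# Prior belief that the earth is round, from open-minded to committed flat-earther.
for prior in (0.5, 1e-3, 1e-9, 1e-30):
    n = updates_to_reach(prior, target=0.99, likelihood_ratio=10)
    print(f"prior={prior:g}: {n} observations to reach 99% belief")
```

The count grows only logarithmically with how extreme the prior is, but a prior of exactly zero never moves, and in practice each observation is rarely worth a clean factor of 10.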

As I said in another reply, I was simply pointing out that Bayesian analysis can also be abused, and that proper protocols still need to be followed. A point on which I believe we agree. :-)


> Bayesian analysis can also be abused, and that proper protocols still need to be followed. A point on which I believe we agree. :-)

Yes, indeed. :-)


It's also nice in that it's a more accessible statistical number than significance testing. It's easier to look at a prior and know the science is suspect (since when is that a 5 in 100,000 chance?) than it is to judge a p-value test (so this data would fall outside of 8 standard deviations 10% of the time...? I'm not sure how that maps).


Taking 50/50 as a prior indicates that you don't know the true prevalence of C.

Note that the posterior here, roughly 0.9, is not a probability in the frequency sense, but rather a degree of belief. In this concrete example it tells you how much you should be worried upon observing R. If the prevalence of C is unknown (even if C is actually rare!), then given a positive R one should be worried a lot, which is exactly what a value around 0.9 says.

Also note the language change and its implications for policy making, etc: Instead of saying "there is [not enough] evidence that .." we say "given the observations and these assumptions we should [not] believe/expect that .."


All of this is true. I was writing to mainly illustrate that Bayesian analysis is still open to abuse—though as the sibling commenter says, it's hopefully clearer when it's being abused.


Has anyone heard of a scientific journal adopting a peer-replicated approach (only accepting papers that peers can replicate)? With replication rates so low, I'd imagine some of these prestigious journals would want to vet their papers more strictly, but a cursory Google search didn't dig anything up for me.


My favorite way to show the base rate fallacy is to take the most extreme example.

Suppose I have an invisible dragon detector that is accurate 99.9% of the time... if it tells me there is an invisible dragon in the room, it doesn't mean there is only a 0.1% chance there is not a dragon... there is a 100% chance there is not an invisible dragon in the room, because they don't exist.


That only works if you already know something is/isn't true with absolute certainty.

It's useless for fundamental research, because by definition you're exploring what you don't know yet.

The real problem is more one of labelling. "True according to science" isn't a binary, but it's treated as if it is - especially by marketers.

Science is more like a set of concentric circles of decreasing confidence. You can be very confident indeed about the contents of the centre circle which includes undergraduate physics and engineering. You can also be confident that there are commonly agreed edge cases, areas of inaccuracy, and extreme circumstances where the science stops being reliable.

As you get further away from the centre confidence decreases. A lot of the debate about replication is about research that is a long way from the centre, where uncertainty is high.

But neither researchers nor the science press nor the mainstream media will report this. Studies are usually presented as "Science says...", as if you're supposed to be just as confident of the results of a psychological study that asks a population of 30 undergrads from the same college and the same year some poorly designed questions as you are in Special Relativity.


It is just to show why base rate matters by using the most extreme example... it doesn't imply that it is easy to calculate.


Or just link to the xkcd comic about the sun explosion detector. https://www.explainxkcd.com/wiki/index.php/1132:_Frequentist...


I always interpreted that comic to be synergistically thoughtful. If the sun exploded, it doesn't matter who has the $50, so it's a good bet from a pragmatic standpoint.


It's incredible it's held up this long. It shows how much science actually is dogma. Human nature wins out over rationality.


Not so incredible if you consider how the sausage is made. People who hate math pursue social science PhDs and are taught "p-values = truth". These same people are pressured to produce lots of research, and not just boring research but preferably shocking research. Journals are incentivized to publish such shocking research because that kind of research gets cited. When you think about it, this stuff will probably hold up for a long time to come.


It has little to do with stereotypes about social scientists hating math or misunderstanding p-values.

The current system incentivizes p-hacking. Nobody wants to throw away their work if it doesn't meet the p < 0.05 criterion, especially when their career is on the line.


While it isn't the most concrete sample size, when I was at college the mathematics requirements for the soft sciences were far inferior to the mathematics requirements for the hard sciences. To the extent that even the easier statistics course in the math department wasn't required, and was instead replaced by a statistics class in the social science department that was only valid for the social sciences.


so...why are we calling this science if it doesn't meet the basics of the Scientific method?


The author teaches a basic course on statistics at Harvard Extension School and charges ~$2800 for it. He wrote a random clickbait article and I can see it on the Hacker News front page. And we say only Facebook can publish fake news ;)


sooo... what's fake? You made a ton of ad-hominem there, but what's the actual criticism of the article?


Just from a casual look, the cancer diagnosis modeling is flawed.

> the doctor would need to consider the overall incidence rate of cancer among similar women with similar symptoms, not including the result of the mammogram

That's a bad assumption. Mammogram is a radiology tool to investigate tissue. It's not a randomizing element as it's fundamental to arriving at the thesis, which is then correlated from MULTIPLE vectors.

> a similar patient finds a lump it turns out to be benign

A manual inspection is not the same. In good faith, let's assume they are the same for no reason but to argue about how not to do medicine as some precept for "Base rates are effectively random".

Turns out, Base rates are not random guesses.

What's interesting about Bayesian theory is that we use it all the time and then observe and collect statistical data about the outcomes, after making assumptions (like a specific base rate), and use back propagation to correct until the models fit measurable events. This is why tests often have caveats about efficacy. The base rate is sometimes, reasonably, unknown because there is no additional correlation. This doesn't indict base rates: in the vast majority of cases, there are multiple vectors (or new vectors are generated) that show this process is reliable (beyond a few dice rolls). There have also been cases where there is no corroboration from new measures and the deduction is that the original measure and base rate were random.

It's a lot of hand waving from a classic troll. Why? Probably for students who want to feel like they have "discovered" how the establishment is ignorant.


The author mangled the mammogram example, but when written correctly it's a good example to use.

The author should have used a woman with no symptoms who goes for a mammogram screening test, not a woman with symptoms who goes for a diagnostic test.

https://www.harding-center.mpg.de/en/fact-boxes/early-detect...



