> The main reason scientists have historically been resistant to using Bayesian inference instead is that they are afraid of being accused of subjectivity. The prior probabilities required for Bayes’ rule feel like an unseemly breach of scientific ethics. Where do these priors come from?
It's not just "being afraid"; the problem is that random guessing (of priors) is not a reasonable replacement for science.
Bayes' rule is great if you can find a reasonable justification for a prior. Bayes is widely used for decision-making (for example), where you need an answer quickly and you aren't trying to make general scientific claims. But if you can't find a justifiable prior in a scientific work, using Bayes' rule just replaces one statistical fallacy with another. After all, Bayes' rule will give nonsense answers if you give it a nonsense prior!
Bayes' rule is a great tool in many circumstances! But it has a great weakness: it requires a prior. That doesn't make it useless; few tools are useful in all circumstances. But requiring "everyone to use Bayes' rule, even though we have no reasonable way to find a good estimate of the priors," is unlikely to ever happen (and rightly so). The article rightly points out a serious problem with the typical application of statistics, but there needs to be a better justification for priors than is suggested in this article.
I could imagine systemic worldwide ways to deal with this. For example, perhaps the scientific community could allow people to propose initial priors, and then allow multiple different papers to improve the estimation of the probability over time. But that would require much more than articles repeatedly saying "there's no serious problem with priors"; having a justifiable way to estimate and update priors is the fundamental problem with Bayesian analysis in the scientific community.
I read a paper and think that there's something wonky with the statistics. Can I go to the journal's website and download the full dataset? No. Can I download the source code used to compute those statistics? No. I can e-mail one of the authors and ask nicely for that information, but what are the odds that they'll respond if they know that their work is weak or outright fraudulent?
Science is supposed to be transparent, it's supposed to be about sharing information, but there's a big opaque wall concealing anything that won't fit into the archaic size limitations of a paper journal.
There's no One Weird Trick to fix the replication crisis, there's no obvious Bayesian replacement for p-values or confidence intervals, but there's an obvious shortcoming in scientific publishing that is a huge impediment to secondary research.
A high school student can't get full credit on a math exam without showing their working - why should scientists be held to a lower standard?
The right model IMHO is open publication a la arxiv with an associated open review process so that anyone who wants to can weigh in. The wheat can be separated from the chaff by doing a "reviewer-rank" analogous to PageRank, so that the opinions of the people who know what they're doing count more than the crackpots'. But this idea that there's an elite priesthood of anointed Scientists, a.k.a. corporate researchers and tenure-track university faculty, who pass final judgement on all scientific truth really has to end.
(And I say this as someone who made their living working successfully within the current system for 15 years. I quit in no small measure because I saw firsthand just how corrupt and dysfunctional it could become, and I was working in a field that is generally considered "hard" science!)
Peer review as something associated with getting a paper published dates from the mid-20th century. The 19th century used the superior approach of publishing whatever and letting interested parties review it then.
This sucks, because maybe you have more questions that could be answered with their datasets, but you can't do shit because of this shitty system.
I'm not a scientist, partly because at university I was really put off by the bureaucracy, the whiff of endogamy, the obscure and old system, and so on. It's so hard to build new knowledge when everyone is tending their own little yard and keeping everyone else away... and the social sciences need this openness like farmers need rain.
Yes, I never meant to imply otherwise.
There are journals that do light initial screening, then publish the submitted manuscripts online, then publish the whole peer review (reviewer comments and authors' responses) online, and then proceed (or not) to publish the reviewed article. Also anyone on the internet can leave an additional review, or just some questions, during the review process.
Also there are journals that have comment sections for the published articles.
Comment section for published articles: For example all PLOS journals
We wanted the Earth to be the center of the universe. When somebody published writings suggesting we're just another body of no uniqueness revolving around a similarly indistinct star, it posed a typical issue. If a scientist wanted to advance their career, they could come out quite strongly against this. Coming out in support of it risked turning the mob's attention on you. Refuting invalid refutations of this view is a non-trivial matter. Refuting what might, in modern times, be thousands or more would be impossible. Indeed Galileo's correct heliocentric view would not come to be accepted until many decades after his death.
Of course we've overcome such reactionary barbarism though, right? Today there are at least two major issues we face. The first is corporate science. To give a now recently resolved example, consider leaded fuel. Leaded fuel is something that from its earliest days drew skepticism. Nonetheless it persisted for the better part of a century with government and scientific endorsements of its safety. This was in large part because the corporations profiting from it made sure to stack the science in their favor and to vigorously attack any critics of its safety. In an open system you'd see the same problem as above. Support the science suggesting leaded fuel may be unsafe, face the well funded mob. Oppose the science, get grants, accolades, and personal recognition. Compare Midgley, who lives in sufficient infamy that only his last name need be referenced, to e.g. Alice Hamilton.
The other issue is one of social matters, very similar to the issues that Galileo faced. Genetics is a perfect modern example. As but one example there, the connection between genetics and intelligence is in no way ambiguous. But this is something we, socially, do not want to accept, and so progress is absolutely glacial; if one steps a hair outside an invisible line that we've drawn, they risk condemnation, hate, and exclusion from the scientific system. In many ways this is exactly what happened to Galileo. He was not imprisoned for his views so much as for his disregard for social norms when expressing them - similar as well to the story of Socrates, who was unrepentantly provocative until the end.
The problem there is that science should be judged based on merit, not on presentation or nicety. As you move more towards a social system I expect we'd begin to move ever more from the former to the latter. This would be magnified a million times over if we chose to use a pagerank style system as you proposed. I fully agree our current system is, at best, dysfunctional. But I think our idealization of social systems is quite often contradicted by what they turn into - something Socrates, quite prophetically, often spoke of.
Socrates was not just "provocative til the end". He was part of particularly deadly politics, and the animosity toward him had a lot to do with the autocratic rule his friends and students installed and the deaths they caused. His death was a miscarriage of justice, because there was supposed to be an amnesty, and the gods charge was likely a way to work around it.
Nevertheless, it had less to do with him being right and more to do with revenge, and with where exactly his philosophy had led just a few years earlier.
The distaste for democracy on which that rule was based owed a great deal to Socrates' philosophy.
So yeah, the gods charge was used largely because (a) there was an amnesty for things that went on during the Thirty Tyrants period, and (b) Socrates himself had not actively participated in killing citizens, and at one point (presumably) ignored an order to go and kill. But he did not leave the city either, though he could have, nor did he oppose the rule.
When he was cocky during the trial, he knew the gods accusation was bullshit. He likely underestimated how much those who had opposed his friends' rule resented him, and how much they saw his ideology as a threat to their own freedom, lives, democracy, and whatnot.
Athens was only a short time past bloody revolutions, and pretending the trial had nothing to do with anything except the immediate charges just makes the history boring and incomprehensible.
Socrates was not executed because he was a misunderstood genius. He was executed because he was seen as a very real threat.
And that gets back quite directly to the point of this entire thread. Many things are not well decided by consensus. Major advances in science are sometimes actively and openly embraced by all, as was the case for relativity. But many times progress is met with vigorous opposition, not even on a technical level but on an inertial one. It was none other than Max Planck who quipped that science progresses one funeral at a time.
He was killed for his association with the rule of the Thirty. The single act of not actively participating does not change that.
He was not actively opposed to that rule in any way or shape either. Other (numerous) people were actively opposed, and died or were tortured for it. His values partially supplied the ideological background for the rule of the Thirty.
He did underestimate the situation. He was confident, and surprised at the result.
And as for his trial the indictment against him read, "This indictment and affidavit is sworn by Meletus, the son of Meletus of Pitthos, against Socrates, the son of Sophroniscus of Alopece: Socrates is guilty of refusing to recognize the gods recognized by the state, and of introducing new divinities. He is also guilty of corrupting the youth. The penalty demanded is death." Socrates intentionally provoked the 500 jurors with statements such as, "Men of Athens, I honor and love you but I shall obey God rather than you, and while I have life and strength I shall never cease from the practice and teaching of philosophy." The use of "God" and not "Gods" was not an accident. When Socrates was asked to propose his own punishment (in lieu of death) he proposed he be rewarded.
He obviously was not surprised when he was found guilty. The jurors were 500 random Athenian men, hardly his compatriots. His parting words speak of his intention, "The hour of departure has arrived, and we go our ways - I to die, and you to live. Which to the better fate is known only to God." He knew he had lived a great life and one that would extend well beyond himself. His martyrdom here was the grand finale to emphasize the correctness of the unpopular things (within Athens) that he had been saying all along. He warned that Democracy can become an unreliable and capricious system. Democracy responded by murdering him. You're undoubtedly correct that some of the majority that decided he should be murdered were driven by a misguided lust for revenge. That is a key validation of everything he ever said, and he knew it would be.
Haha. Human nature never changes.
Priors in Bayesian statistics are not randomly guessed. The prior is supposed to reflect your state of knowledge prior to looking at the data from the experiment/study/test/whatever (hence the name "prior"). For example, in the mammogram example, it is assumed that the doctor knows the base rate of breast cancer in the population (which is a very reasonable assumption since there are mountains of data on such rates) and the false positive and false negative rates of the test (which come from data on previous tests). The base rate is the prior; the test's error rates supply the likelihoods. The doctor doesn't just make up those numbers; they come from prior knowledge.
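A minimal sketch of that calculation. The base rate, sensitivity, and false positive rate below are illustrative assumptions, not figures from any real test:

```python
# Sketch of the mammogram example: the "prior" is just the known base rate,
# not a guess. All numbers below are illustrative assumptions.

def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' rule."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

prior = 0.01        # assumed base rate of the disease
sensitivity = 0.80  # P(test+ | disease), assumed
fpr = 0.096         # P(test+ | no disease), assumed

print(posterior_positive(prior, sensitivity, fpr))  # ≈ 0.078
```

Even with a positive test, the posterior stays under 8% here, because the prior (base rate) is so low.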
> if you can't find a justifiable prior in a scientific work, using Bayes' rule just replaces one statistical fallacy for another one
In other words, if you have no prior knowledge about something, you can't expect to do statistics on some small set of data and get reliable answers. Yes, that's true. And Bayesianism tells you this is true, by telling you that you can't find a justifiable prior. (Technically, you can always find a maximum entropy prior, but in most cases that's not really any better than having no prior at all since it is basically telling you you need a lot more data before you can conclude anything.) Whereas statistical methods as they are currently done in many scientific fields simply ignore this issue completely and pick arbitrary thresholds for statistical significance. Which is the article's point.
In other words, Bayesianism, properly viewed, is not a magic bullet for extracting statistical answers from nowhere. It is a tool for exercising discipline, so that you know when you have to say "sorry, not enough data for a meaningful answer".
So there is a very convenient way to view Bayes' theorem by looking at odds ratios, Pr(A)/Pr(A'), where A' is the complement of A (the event which obtains precisely when A does not). Odds carry the same information as probabilities but span the range 0-∞ rather than 0-1:
Pr(A | E)    Pr(A ∩ E)    Pr(E | A)    Pr(A)
---------- = ---------- = ---------- x ------
Pr(A' | E)   Pr(A' ∩ E)   Pr(E | A')   Pr(A')
Taking the logarithm, one can measure the impact of any new evidence in decibels as an additive term; the core of Bayes' theorem can be stated as saying that evidence is measured in decibels and simply adds to the prior log-odds.
For example in the breast cancer example in the original article, you have a +13 dB evidence—Pr(E|A)/Pr(E|A') = 20—but the prior log-odds are given as -20dB and thus the shocking conclusion is that after the evidence your log-odds are still negative, -7dB, and you likely do not have cancer. For a life-or-death situation, resolving the relative nature of evidence to a specific probability is clearly essential.
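The arithmetic from that example, sketched in code (the odds and likelihood ratio are as given above):

```python
import math

def db(odds_or_ratio):
    """Express an odds or likelihood ratio in decibels: 10*log10(x)."""
    return 10 * math.log10(odds_or_ratio)

prior_odds = 0.01         # -20 dB prior, as in the cancer example
likelihood_ratio = 20     # Pr(E|A)/Pr(E|A'), about +13 dB of evidence

posterior_db = db(prior_odds) + db(likelihood_ratio)
print(round(db(likelihood_ratio), 1))  # 13.0
print(round(posterior_db, 1))          # -7.0
# Converting back: posterior odds = 10**(-0.7) ≈ 0.2, i.e. probability ≈ 17%
```

Evidence adds; the prior just sets where you start on the scale.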
But in terms of p-values, if you have an experiment which gives +20dB evidence to something, I would say that this should still be meaningfully publishable as a relative increment even if it hits something whose prior is -30dB and therefore it still is only a ~1/10 chance that the thing is actually true.
Like, I think that the key is having a vocabulary for relative results versus absolute results and I think that the Bayesian language can be used to take one step towards that, but perhaps it would be nice if we could have a standardized way in our language to indicate that numbers are relative rather than absolute, the way that we do habitually with decibels and pressures and such all the time...
As opposed to the frequentist approach of saying, "We have data D and hypothesis H. If the probability of D when not-H is true is below a threshold, H is true," which hides an awful lot of assumptions.
The advantage of Bayesian statistics is that you have to explicitly state your assumptions in terms of a prior.
My textbook (what would be called a frequentist book) is very explicit: You either reject the null hypothesis, or you fail to reject the null hypothesis. A significance level test never concludes that the null hypothesis is true. It's nuanced, but is not quite the same as your statement.
Again, this claim is something I see only from self described Bayesians. I've not met a professional statistician ever fail to admit it. My textbook (written in the 90's), in at least two places, talks about it. In one place it explicitly warns against assuming the "traditional" approach is more objective, and points out that the assumptions exist in the model and the practitioner should be aware of those assumptions.
And really, when it comes to p-values, I've never seen any stats textbook not give a proper description and interpretation. No statistics textbook I've read describes significance testing as a mere prescription from which you can deduce binary answers. My book discusses factors in picking your p values, and the implications, and that the appropriate value is very dependent on the problem.
How does frequentist statistics answer this question? What's the null hypothesis that should be used here? Can this question be answered without assuming a prior on the likelihood that the individual had breast cancer before the test was performed?
By applying Bayes Theorem, just as the article did. By invoking Type I and Type II errors.
This is a standard problem in stats textbooks - not breast cancer but the general problem of a diagnostic measure that is 99% accurate, for a disease that has less than 1% prevalence.
Frequentists don't insist on using p-values to solve all problems.
>Can this question be answered without assuming a prior on the likelihood that the individual had breast cancer before the test was performed?
Why should one not assume a prior? There is a base rate for breast cancer. Why should a frequentist not use that information? I've never heard a statistician who does not describe himself as a Bayesian say one shouldn't. Every textbook suggests one should. Where are people getting the idea that frequentists don't use Bayesian methods. They always have used them.
I'm finding this whole discussion orthogonal to Bayesian vs frequentist statistics. The main issue frequentists have with a lot of Bayesian approaches is that Bayesians often want to assign probabilities to one off events, whereas frequentists insist only on repeatable events. That is where the accusation of "subjective" comes from. Frequentists like to believe that any question of probability can be decided by taking N samples and seeing the outcome (even if only conceptually).
For problems where there is a population, and one can do sampling (i.e. repeatability), there's never a problem with using Bayesian methods.
It seems to me that when there really is a known base rate, the answers would coincide.
And if there is no base rate, the Bayesian would guess (but explicitly) or refuse to answer, while the frequentist would give a plausible answer, which, however, very much hinges on the unknown base rate.
In the light of the replication crisis, maybe the Bayesian approach is better.
It's good that the textbooks you were taught from point out that assumptions exist and should be acknowledged.
Have you found a "standard" text book that doesn't?
Not a probability textbook, but a statistics one.
The claim is that priors are subjective and therefore bad. The Bayesians are drawing the analogy that frequentist assumptions are just as subjective (but maybe less explicit) as prior selection, so objecting to one and not the other is... in need of further justification.
The only such claims I've seen are ones where they disagree with the prior, which is no different from disputes amongst themselves where so-called frequentists argue whether p=0.05 or p=0.001 is appropriate. Frequentists don't pretend that their p-value is set in stone.
Let me repeat what I said in another comment:
My complaint is that while there are differences amongst Bayesians and frequentists, this article does not present any, and wrongly implies things about frequentists.
The Bayesian approach forces you to explicitly model those assumptions. Or to admit that you are ignoring them.
And frequentism (if there is such a thing) does not prohibit using base rates in their analysis. The article is setting up false dichotomies.
Also, this in the article is a "random guess":
>Maybe we’re not so dogmatic as to rule out “The Thinker” hypothesis altogether, but a prior probability of 1 in 1,000, somewhere between the chance of being dealt a full house and four-of-a-kind in a poker hand, could be around the right order of magnitude.
If you really want to assume "minimum information," use the Jeffreys prior for your domain. It is proportional to the square root of the determinant of the Fisher information matrix.
The Jeffreys prior has the key property of being invariant under reparameterization. This property is not generally true of implicit frequentist priors (unless they coincide with the Jeffreys prior), yet it is essential for uninformativeness. If your model depends on the units you choose for your parameters, you can hardly call it objective!
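For a concrete case: a Bernoulli rate θ has Fisher information I(θ) = 1/(θ(1-θ)), so the Jeffreys prior is proportional to θ^(-1/2)(1-θ)^(-1/2), i.e. a Beta(1/2, 1/2), and updating is conjugate. A minimal sketch with made-up data:

```python
# Jeffreys prior for a Bernoulli parameter θ is Beta(1/2, 1/2).
# With k successes in n trials, the posterior is Beta(k + 1/2, n - k + 1/2).

def jeffreys_posterior_mean(k, n):
    a, b = k + 0.5, n - k + 0.5
    return a / (a + b)   # mean of Beta(a, b)

print(jeffreys_posterior_mean(3, 10))  # 3.5/11 ≈ 0.318
```

Note how it pulls the raw 3/10 = 0.3 slightly toward 1/2, and it does so in a way that doesn't depend on how you parameterize the rate.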
I suggest looking at the essay "Beyond Bayesians and Frequentists" by Jacob Steinhardt http://cs.stanford.edu/~jsteinhardt/stats-essay.pdf , who says, "The essential difference between Bayesian and frequentist decision theory is that Bayes makes the additional assumption of a prior... and optimizes for average-case performance rather than worst-case performance. It follows, then, that Bayes is the superior method whenever we can obtain a good prior and when good average-case performance is sufficient. However, if we have no way of obtaining a good prior, or when we need guaranteed performance, frequentist methods are the way to go." The same author also has an essay that tries to explain why Bayesians shouldn't be so confident in their approach: "A Fervent Defense of Frequentist Statistics", 18th Feb 2014, https://www.lesswrong.com/posts/KdwP5i6N4E4q6BGkr/a-fervent-...
Note that the author isn't against Bayesian approaches, but against the dogmatic assumption that frequentist approaches are always worse.
This is nonsense. Nothing stops a Bayesian from picking a prior that optimizes for worst-case rather than average-case performance, given a particular utility function. The really questionable premise here is that the utility function is known; in practically any case of real interest, it isn't, and that's a problem regardless of whether your decision theory is "Bayesian" or "frequentist" (which are misnomers for decision theories anyway for the reasons I gave in my other post just now).
This essay is about decision theory, and the things he is calling "Bayesian" and "frequentist" are more than just statistical methods, which is what the replication crisis is about. Decision theory, particularly when other agents are present, cannot be handled by any method that only considers statistics; game theory is involved. The Steinhardt article is basically claiming that "Bayesians" can't use game theory while "frequentists" can, which is nonsense.
> After all, Bayes' rule will give nonsense answers if you give it a nonsense prior!
You are absolutely correct here, but draw the wrong conclusion. Bayes rule isn't just some convenience that you ought to ignore without a solid prior. Fundamentally, what can any study say about a subject whose prior likely ranges over a few orders of magnitude among the experts? That subject represents a gaping hole in our knowledge. If a single study were to narrow the band by a whole decade, it would still leave a decade sized hole... All of this should be explicit and out in the open.
> I could imagine systemic worldwide ways to deal with this. For example, perhaps the scientific community could allow people to propose initial priors, and then allow multiple different papers to improve the estimation of the probability over time.
Yeah, this is the way to go. Researchers should preregister their studies with their priors. Future studies will revise them. After all, it's easy to argue about numbers, and those who disagree can easily supply their own.
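As a sketch of what "future studies revise the prior" could look like mechanically, here is conjugate Beta-binomial updating across a sequence of studies. The initial prior and all study counts are hypothetical:

```python
# Each study folds its data into the running Beta prior for some rate θ.
# All numbers below are made up for illustration.

def update(prior, successes, trials):
    a, b = prior
    return (a + successes, b + trials - successes)

prior = (1.0, 1.0)  # an initially proposed flat prior
studies = [(12, 100), (8, 90), (20, 150)]  # (successes, trials), hypothetical

for s, n in studies:
    prior = update(prior, s, n)

a, b = prior
print(a, b)          # 41.0, 301.0
print(a / (a + b))   # posterior mean ≈ 0.12
```

Anyone who disagrees with the initial Beta(1, 1) can rerun the same chain with their own starting point and publish the difference.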
As the article goes on to argue, ignoring (or hiding) the existence of priors is not science, either. (Two incidences of SIDS in one family are probably not independent, even if they aren't murder.)
If you don't have good information on what your prior should be, you can just use a really weak prior. And you probably know more about your data than you think you do -- if you are doing counts, that leads you to a different prior than if you are doing things that can be in increments smaller than a single integer.
More importantly, your priors help you select plausible explanations for data that let you delve deeper and identify the root cause. If you pretend not to have priors, you won't have any idea how to understand the data.
For example, you run an experiment that shows that several communities using pesticide X average a 1% higher rate of lung cancer than communities using pesticide Y. There's a lot of variance, though, and some X communities have much lower cancer rates.
You could publish the data with no further analysis and leave the broader scientific community to do the work of trying to draw conclusions. Or you could remember that you expected the opposite effect, because pesticide Y has a more direct route to enter the lungs, and then you go back and check the smoking rates in those communities and find a nearly perfect correlation. Accounting for the smoking rates, pesticide Y appears to increase lung cancer rates by 1% relative to pesticide X.
Yes, you could have just published the raw data and other scientists could have figured that out. But realistically, they won't be nearly as involved as you and your peer reviewers are in the work. Even assuming you identify the communities in your published dataset, few scientists doing a meta-review would go deep enough to cross-correlate your data against smoking rates if you didn't already do it for them. And if they actually do it, it's only because their priors are telling them that there's an important effect that could alter the conclusion...
In other words, it's your priors that give you a nagging feeling that no matter how big the sample size D is comparing cancer rates among X and Y, you just won't feel P(C|X,~Y,D)-P(C|~X,Y,D) collapse to a narrow distribution in your mind, because you'll know that the outcome hinges critically on a third factor that you haven't seen yet.
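A toy version of that confounding story, with entirely made-up counts: within each smoking stratum pesticide Y is worse, yet the crude (unstratified) rates point the other way.

```python
# (exposed_group, smoker?) -> (cancer cases, population); all numbers invented
data = {
    ("X", True):  (200, 1000),  # 20% rate among smokers in X communities
    ("X", False): (4, 200),     # 2% among non-smokers
    ("Y", True):  (42, 200),    # 21% among smokers in Y communities
    ("Y", False): (30, 1000),   # 3% among non-smokers
}

def crude_rate(group):
    cases = sum(c for (g, _), (c, n) in data.items() if g == group)
    pop = sum(n for (g, _), (c, n) in data.items() if g == group)
    return cases / pop

print(round(crude_rate("X"), 3))  # 0.17  -> X looks worse...
print(round(crude_rate("Y"), 3))  # 0.06
# ...but stratum by stratum Y's rate exceeds X's (0.21 > 0.20, 0.03 > 0.02):
# the communities' smoking rates, not the pesticide, drive the crude gap.
```

Without a prior expectation that something was off, nobody would think to stratify.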
No. The whole point of NHST is that it gives the probability of the results given the null hypothesis. Its whole point is that it doesn't touch the probability of the results given other hypotheses (except when designing the experiment, where you need other subjective inputs).
What moved me over was the practical learning that your results often aren't that sensitive to your prior and that dependence is usually easy enough to characterize.
So your complaint is a total non-issue. Most people are interpreting their confidence intervals as credible intervals anyway... Essentially, for nearly a hundred years researchers have already been using Bayesian stats with flat priors.
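That flat-prior correspondence is easy to check numerically. A sketch for a normal mean with known σ (the data summary is made up): under a flat prior the posterior is N(x̄, σ²/n), so the textbook 95% confidence interval carries exactly 95% posterior mass.

```python
from statistics import NormalDist

xbar, sigma, n = 5.2, 2.0, 40   # hypothetical data summary
se = sigma / n ** 0.5
z = NormalDist().inv_cdf(0.975)
lo, hi = xbar - z * se, xbar + z * se   # the textbook 95% CI

posterior = NormalDist(mu=xbar, sigma=se)  # flat-prior posterior for the mean
mass = posterior.cdf(hi) - posterior.cdf(lo)
print(round(mass, 3))  # 0.95
```

Same interval, two interpretations: a frequentist coverage statement, or a flat-prior credible interval.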
I just don't buy that there's something magically more "mathy" about just conjuring assumptions you want to test.
The only way to avoid having a prior is by pulling a formula from a textbook and applying it without looking at its derivation.
Take this article as an example. Reading it, I got the sense that the author at some point discovered Bayesian statistics and is now on a crusade to recast everything as a pro- versus anti-Bayesian fight.
His example of the base rate fallacy (breast cancer diagnosis) is probably in every "frequentist" textbook out there. Frequentists are well aware of it, and have no aversion to using Bayes' Theorem. You will not find a frequentist objecting to taking into account the base rate when applying statistics. The difference, as the GP mentioned, is that the base rate is fairly well known and not subject to much debate. Whereas this:
Is a number he pulled out of his rear end, and his subsequent calculation is not meaningful to people who don't agree with the prior. Sure, anyone can manufacture a prior if they wanted. And part of me is merely wondering: If he wanted a prior of 1 in 1000, why not simply require a p value of 0.001 instead of the 0.03 the paper used? The problem with the paper is that the sample size is small (n=57), and small samples are a lot more likely to give extreme results. I'll be OK with a p=0.03 if n=10000, but not if n < 100.
That is the proper response to a low prior probability, in general, yes.
More to the point, in an ideal world the p value one picks as the significance criterion should somehow capture both the state of prior knowledge and the consequences of reaching the wrong conclusion. If it really doesn't matter what you conclude, p=0.5 (not a typo: 1/2) is fine. If the conclusion really matters for something important, p=0.03 is likely too high.
Most published research that does significance testing seems to have no particular discipline for picking their threshold p values other than cargo-culting, unfortunately.
> the sample size is small (n=57), and small samples are a lot more likely to give extreme results.
That's already captured in the p value, no? That is, the sample size is already part of the computation of the p value. If you come out with p=0.03, that means that if the null hypothesis held, you'd see results at least as extreme as yours in 3% of cases, whatever size your sample is. I'd genuinely like to understand why you feel there is a qualitative difference between n=1e4, p=0.03 and n=1e2, p=0.03, because I feel like I'm missing something there.
(Now it's a lot easier to get p=0.001 with n=10000 if your effect is real than it is with n=57. So in that sense, having larger samples helps. Having a larger sample _might_ also help with the "I tried a bunch of experiments until I got one that tested significant" problem, if it's genuinely harder to do a larger-sample experiment. Of course people could also apply a Bonferroni correction, but most practitioners of statistical testing don't seem to realize it exists or might be needed...)
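One concrete difference between the two cases: for a fixed p, the observed effect needed shrinks like 1/sqrt(n). A rough sketch for a one-sample, two-sided z-test with σ = 1 assumed:

```python
from statistics import NormalDist

# The observed mean required to reach a given two-sided p under a z-test.
# Same p, very different effect sizes depending on n.

def effect_needed(p, n):
    z = NormalDist().inv_cdf(1 - p / 2)
    return z / n ** 0.5

print(round(effect_needed(0.03, 57), 3))     # ≈ 0.287
print(round(effect_needed(0.03, 10000), 3))  # ≈ 0.022
```

So p=0.03 at n=57 corresponds to a much larger (and, if the null is false, more variance-prone) observed effect than p=0.03 at n=10000, even though the stated error rate under the null is identical.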
Perhaps a good idea with priors is to vary the priors and see how the results vary. This shouldn't be too hard with small-to-moderate sized data sets.
A result that depends heavily on a particular prior may demand additional investigation.
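A sketch of such a sensitivity sweep for a simple rate, using conjugate Beta priors. The data (7 successes in 50 trials) and the menu of priors are hypothetical:

```python
# Refit with several priors and see how much the posterior moves.

k, n = 7, 50   # hypothetical data: 7 successes in 50 trials
priors = {
    "flat Beta(1,1)":        (1.0, 1.0),
    "Jeffreys Beta(.5,.5)":  (0.5, 0.5),
    "skeptical Beta(1,9)":   (1.0, 9.0),
}

for name, (a, b) in priors.items():
    mean = (a + k) / (a + b + n)   # posterior mean of Beta(a+k, b+n-k)
    print(f"{name}: posterior mean = {mean:.3f}")
```

Here the means range from roughly 0.13 to 0.15, so the conclusion is fairly robust; if the spread had been large relative to the effect you care about, that itself would be the finding.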
Near the end, when the hosts were wrestling with whether they should think he was guilty, the interviewer's friend (the producer? Dana?) came on and shared why she thought he was guilty, and it was this convoluted argument: if Syed was innocent, he'd have to be the unluckiest guy in the world, so therefore he was probably guilty.
Her point was that it was just too strong a coincidence that it was his ex-girlfriend that got killed, on that particular day during some period of time when he didn't have an alibi, etc etc.
That seemed to me like it could be a bad Bayesian argument - like it would have been a good argument had Serial selected a random citizen to interview, but they selected someone they already knew was connected to the story. Murders happen and by definition the circumstances are always highly unlikely because murders are so rare, and innocent people closely connected to the circumstances of the crime are very, very unlucky by definition. You can't point to that unluckiness as evidence that they're probably guilty.
Heck, even if you dig into a random person associated with a murder, you'll find some unlikely things, but if you dig into a person who has already been accused of the crime (rightly or wrongly), of course there will be loads of suspicious circumstances. That's why they were accused in the first place.
Let's say one day you run into an acquaintance on the street. Not extraordinary. But that specific acquaintance you hadn't seen in 10 years. On that specific day, on that specific street, at that specific time. If you had wondered the day before how likely that specific set of circumstances was to come together, you would have rightly concluded it was exceedingly unlikely.
This is another way of looking at the first example in the article. The question was not how likely it was that that specific woman would have two children die of natural causes, but how likely it was that someone in the country or the world would have such a thing happen over a large interval of time.
And imagine how often you almost run into acquaintances on the street, and don't notice. It must happen more often than when you do notice. If you made a concerted effort to look for near-coincidences, you would suddenly find yourself noticing lots more unlikely things that were happening all along.
Once you start digging into someone's life you'll learn all sorts of things that might seem really unlikely... but you'd find unlikely things in someone's life picked at random, you just never dug into them, because they're not accused of murder.
So a specific person being killed by lightning within a given year is pretty unlikely (less than one in 10 million chance), and on average it happens once every two weeks in the US. Lots of people, see.
Same thing with other situations where there are lots of observations...
The chance of someone being hit by lightning is not very low. Therefore, it is not a highly unlikely event. And it does happen relatively frequently.
Unlikely events don't happen very often. That's just a definition. If event X happens frequently, then it isn't an unlikely event.
But if you have a whole bunch of possible unlikely events, then one of them (a different one each time, usually) can happen fairly often.
Back to the lightning example, any given person being hit by lightning is an unlikely event. But as you note, "someone being hit by lightning" is not, because we are now observing these unlikely events across so many people.
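A quick back-of-envelope sketch of that arithmetic (the roughly one-in-10-million annual risk figure comes from the thread; the US population number is my rough assumption):

```python
# Sketch of the "lots of people" arithmetic. The ~1-in-10-million annual
# risk is the figure quoted in the thread; the population is my rough guess.
p = 1 / 10_000_000       # chance a *given* person is struck in a year
n = 330_000_000          # approximate US population (assumption)

expected_strikes = p * n            # expected number of people struck per year
p_nobody_struck = (1 - p) ** n      # chance that no one at all is struck
p_someone_struck = 1 - p_nobody_struck

print(f"expected strikes per year: {expected_strikes:.0f}")        # ~33
print(f"P(someone is struck this year): {p_someone_struck:.6f}")   # ~1.0
```

So the per-person event is vanishingly rare, while "someone, somewhere" is a near certainty.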
All of which is to say, observing that an unlikely event happened doesn't provide much information on its own, if you have observed a lot of things happening in general...
So it's quite correct to say "unlikely events happen all the time".
Individual coincidences are rare. But if you start really looking for them, you will absolutely find some coincidence. The one that happens will have been unlikely and impossible to predict ahead of time, and yet you were more or less absolutely certain some coincidence would occur, because there are so many possible configurations of coincidence that one is bound to happen.
Any given reader's brain existing to comprehend this comment is highly unlikely, yet it occurs and continues within layers of chaotic systems, that exist turbulently, in perceived reality.
Such an important quote. I believe the replication crisis is especially present in nutrition studies. We live in an age where headlines go viral and new wave diets are taken on in rapid succession. Take all nutrition studies with a grain of salt. (assuming, of course, that salt is good for you...or is it???)
We should, but we don't, because we crave extraordinary results and are ready to give up reason to get them. But if it's too good to be true (or too spectacular), it probably is.
Hey, now. I like standing with my hands on my hips, imagining I have a cape flowing in the breeze behind me.
Anyone arguing over frequentist versus Bayesian statistics is missing the foundations of their statistical training. Both are subsumed by the framework of decision theory. This isn't new. It goes back to Wald's work in the 1950s.
And the examples he is talking about wouldn't be saved by different statistical methods. No fiddling with techniques after the fact can save a flawed trial design.
I can't tell if you're being hyperbolic, or if it requires deep study to grasp how bayesian and frequentist statistics are rendered irrelevant by decision theory.
Anyone care to take a stab at a layman's summary for the benefit of the under-educated folks around here?
Also, madhadron didn't quite say that Bayesian and frequentist statistics are rendered irrelevant by decision theory. Rather, statistical decision theory includes Bayesian and frequentist statistics as possible statistical decision rules. You might want to not use Bayesian or frequentist rules and instead use minimax regret or something else.
I do think that it is fine that people argue about Bayes vs frequentist. I wish they'd consider everything else though.
... What? Any such decision algorithm at least in theory still takes a probability distribution as an input. You still need to approximately follow the proper rules of probability to get that distribution, there's no way around it.
In minimax regret, you have a set of available decisions D, and a set of possible states of nature N, and a utility U(D,N). Each state of nature also has a probability P(N) (which can be influenced by the decision too in some problems).
States of nature include "interest rates rise 1%", "interest rates fall 1%", and "interest rates stay the same". Decisions include "invest in stocks" and "invest in bonds".
Minimax regret proposes to ignore the probabilities P(N), instead suggesting a way to make a decision purely based on the utilities of the outcomes. But that is actually an illusion.
Outside of math class word problems, we don't have N or U(D,N) handed to us on a silver platter. There is always an infinite range of possible states of nature, many of which have a probability approaching but never reaching zero, including states such as "win the lottery", "communist revolution", and "unexpected intergalactic nuclear war".
In commonsense decision-making we don't include those states of nature in our decision matrix, because our common sense rules them out as being implausible before we even think about our options. You wouldn't choose to invest in bonds just because stocks have the most regret in the event of a communist takeover.
So what actually happens is we intuitively apply some probability threshold that rules out states of nature falling below it from our consideration. Then we minimize max regret on the remaining "plausibly realistic" states of nature.
Humans are so good at doing probability mentally that this step happens before we even realize it. But if you are writing code that makes decisions, you'll need to do it, and so you'll need to have at least a rough stab at the probability distributions.
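A minimal sketch of that point, with entirely made-up utilities and probabilities: the "plausibility threshold" that rules out tail states becomes an explicit parameter, and moving it silently changes the minimax-regret answer.

```python
# Hypothetical investment example from the thread; all numbers are invented.
states = {                           # state of nature -> assumed probability
    "rates rise": 0.30,
    "rates fall": 0.30,
    "rates flat": 0.39,
    "communist revolution": 0.01,    # implausible tail state
}
utility = {                          # (decision, state) -> payoff
    ("stocks", "rates rise"): -5, ("stocks", "rates fall"): 10,
    ("stocks", "rates flat"): 3,  ("stocks", "communist revolution"): -100,
    ("bonds",  "rates rise"): -2, ("bonds",  "rates fall"): 4,
    ("bonds",  "rates flat"): 2,  ("bonds",  "communist revolution"): -10,
}
decisions = ["stocks", "bonds"]

def minimax_regret(threshold):
    # Keep only "plausibly realistic" states, as described above, then
    # pick the decision whose worst-case regret is smallest.
    kept = [s for s, p in states.items() if p >= threshold]
    best = {s: max(utility[(d, s)] for d in decisions) for s in kept}
    regret = {d: max(best[s] - utility[(d, s)] for s in kept) for d in decisions}
    return min(decisions, key=lambda d: regret[d])

print(minimax_regret(0.05))   # tail state ruled out -> "stocks"
print(minimax_regret(0.0))    # tail state included  -> "bonds"
```

The "probability-free" rule flips its answer depending on a probability judgment that was made before it ever ran.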
The general setup is this: you have a set of possible states of nature, a random variable on that set which will produce some observations, a set of decisions you may make, and a loss function defined over tuples of (state of nature, decision). The task is to produce a "good" function that maps from the observations to a decision.
It turns out that "good" isn't unique. You start by defining the expected loss for a given state of nature and decision function. Then you generally narrow your decision functions of interest to those for which there is no other function that has a lower expected loss for all states of nature (we call such ones "inadmissible" and ones that aren't uniformly dominated like this "admissible").
But then you end up with a whole collection of decision functions that are better under different conditions, and you need a way of choosing among them. Unbiasedness is one such criterion, as is maximum likelihood (though in many situations the maximum likelihood criterion may single out decision functions that are inadmissible), Bayes under a given prior, or minimax.
For some of these, particularly minimax, you also run into the fact that there aren't "enough" deterministic decision functions to select a best one...but if you allow randomized decision functions, there are.
Bayes procedures also enter in an interesting way: the class of Bayes procedures over all sensible priors is a superset of the set of admissible procedures...but usually not a much bigger set, so you can sometimes prove properties of all admissible procedures by proving them for all Bayes procedures.
For those who have studied game theory, this also leads to an immediate extension to repeated procedures where the decision rule at each step is "stop and accept a decision" or "sample again." These sometimes go under the name of "online trials" and are really useful. And since any admissible procedure is Bayes under some prior, you can often get away with doing the math for these online trials in a Bayesian framework, adding a hard-coded stopping rule to correspond to your underlying decision procedure, even if you are a dyed-in-the-wool frequentist.
The other, really important result in Bayesian statistics is that for a well behaved prior, data eventually drowns it out.
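A tiny numerical sketch of that result, using conjugate Beta priors on a coin's bias (all numbers illustrative): even a prior pinned hard near 0.5 gets dragged to the data's 70% heads rate as flips accumulate.

```python
# Posterior mean for a Beta(a, b) prior after observing `heads` in `n` flips
# is (a + heads) / (a + b + n). Two very different priors, same data stream.
heads_frac = 0.7                          # true bias of the hypothetical coin
for n in (10, 100, 10_000):
    heads = int(heads_frac * n)
    flat = (1 + heads) / (2 + n)          # Beta(1, 1): uninformative prior
    stubborn = (50 + heads) / (100 + n)   # Beta(50, 50): pinned near 0.5
    print(n, round(flat, 3), round(stubborn, 3))
```

With only 10 flips the stubborn prior dominates (posterior mean 0.518 vs 0.667); by 10,000 flips both posteriors sit at roughly 0.7.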
For my part, I don't take Bayesian statistics very seriously as a philosophical point of view. In the 1960s there was a lot of interest in it as a unified system of inference because there was a theorem showing that rational actors with a prior are Bayesian. When you start really digging into Bayesian statistics, though, you realize that a single prior is overspecified, and the more correct class of objects to work with is classes of priors. As soon as you do that, though, the rational-actors-are-Bayesian theorem collapses.
I don't take the frequentist approach that seriously, either. Honestly, this is a space where I don't feel that there is a truly satisfactory underpinning besides decision theory plus "shut up and calculate." But that's true of so much of science that I'm not that bothered by it anymore.
When power analyses are more motivated by funding feasibility and 80% conventions than prior belief, decision theory isn't going to help.
You're assuming that power analyses happen. I'd be thrilled to get most practitioners that far.
It's about a woman who was jailed for killing all three of her kids. The prosecutors alleged she wanted to sleep around and she couldn't because of the kids. I'm struck by the fact that there was no direct evidence that the mother was responsible for the deaths. The only "evidence" seems to have been that it was not clear why the girls died:
Both deaths were consistent with deliberate airway obstruction, and doctors could not find “any natural reason why either, let alone both, should have died”, prosecutors said.
I would have thought that if there was uncertainty about the cause of death (and that is exactly what "the doctors could not find any natural reason" states: uncertainty about the cause of death) then there is no sufficient evidence to convict.
But, I'm going by only what's in The Guardian article and I don't know the details of the case.
A p-value threshold of 0.05 (or a test at the 95% significance level) means that, if there's actually nothing there, you'll wrongly conclude there's something there 5% of the time. It says nothing about error rates when there is something there, which is presumably what we're interested in. For that, you do what's called a power analysis. A conventional power of 80% tells you that, if there's actually an effect of size X and you know all about your noise, you'll wrongly conclude there's nothing there 20% of the time.
5% comes out of nowhere. 80% comes out of nowhere. A priori knowing your noise is often feasible, but not always. X, the presumed effect size you have to magically know before you run your experiment, comes out of nowhere. Getting this right is crucial to a reliable experiment, and it needs to really reflect your prior belief, even though it's frequentist. It's totally subjective.
What's worse is that X is often taken to be whatever leads to a feasible/fundable experiment that'll be done fast enough, not anything scientific.
It's waaay harder to understand the impact of the subjective 5%/80%/X decisions than it is to understand the impact of your prior. A prior takes way less training. Better yet, you can report your results assuming many different priors and let your reader subjectively decide what to think, so you don't have to commit as hard.
tl;dr The "gold standard" way of doing science is already really subjective and that's okay. Equally subjective alternatives can still be better science.
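To make the subjectivity concrete, here is a hedged sketch of the power calculation for a one-sample two-sided z-test (a simplification; real designs typically use t-tests and dedicated software). Everything here, including the assumed effect sizes X, is invented for illustration:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_z_test(effect, sigma, n, z_crit=1.96):
    # Power of a two-sided one-sample z-test at alpha = 0.05:
    # the probability of rejecting H0 when the true effect is `effect`.
    shift = effect * math.sqrt(n) / sigma
    return norm_cdf(shift - z_crit) + norm_cdf(-shift - z_crit)

# The "magically known" effect size X drives the whole design:
for effect in (0.2, 0.5, 0.8):   # hypothetical assumed effect sizes
    print(effect, round(power_z_test(effect, sigma=1.0, n=30), 3))
```

With these made-up numbers, assuming X = 0.5 gives power near 0.78 at n = 30, while X = 0.2 leaves the same experiment badly underpowered: the choice of X, not the data, determines whether the design looks adequate.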
Note that a big weakness of Bayes rule is that you can look at any data and specify a prior that will make it look good. To continue with the mammogram example, suppose the doctor says "We really don't know if you are likely to have cancer or not. So we're just going to give 50-50 odds, and see what the test comes back with." That's a very different prior from the known base rate. The results would be, where C means "Cancer" and "R" means "Positive Result":
P(C|R) = P(R|C) * P_prior(C) / P(R)
= 1.0 * 0.5 / (1.0 * 0.5 + 0.05 * 0.5)
= 0.5 / 0.525 ≈ 0.95
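A minimal sketch of that arithmetic (the perfect sensitivity and 5% false-positive rate come from the comment; the 1% base-rate prior is a made-up contrast):

```python
def posterior(prior, sensitivity=1.0, false_pos=0.05):
    # P(C | R) via Bayes' rule: P(R|C) P(C) / P(R).
    return (sensitivity * prior) / (sensitivity * prior + false_pos * (1 - prior))

print(round(posterior(0.50), 3))   # "we don't know" 50-50 prior -> 0.952
print(round(posterior(0.01), 3))   # hypothetical 1% base rate   -> 0.168
```

Same test, wildly different conclusions: the prior does real work, which is exactly why it needs justification.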
In my opinion, in any switch to using Bayesian analysis in scientific work, pre-registering priors will be essential.
This isn't a weakness in Bayes' rule, it's a weakness in your experimental protocol. You're supposed to pick the prior before doing the experiment and seeing the data.
> In my opinion, in any switch to using Bayesian analysis in scientific work, pre-registering priors will be essential.
Pre-registering statistical criteria and assumptions should already be essential, whether you're a Bayesian or not. The fact that it isn't is a key factor behind the replication crisis.
Sure, if the goal is to get to something true. If the goal is to publish or to maintain your position, though, you’ll work differently.
You don’t even need to have seen the data. If I set my priors for the earth being flat extreme enough, it’ll take a long time for even good faith updating to converge to reality.
As I said in another reply, I was simply pointing out that Bayesian analysis can also be abused, and that proper protocols still need to be followed. A point on which I believe we agree. :-)
Yes, indeed. :-)
Note that the posterior here is not a probability in the frequency sense, but rather a degree of belief. In this concrete example it tells you how much you should be worried upon observing R. If the prevalence of C is unknown (even if C is actually rare!), then given a positive R one should be worried a lot, which is exactly what that large posterior says.
Also note the language change and its implications for policy making, etc: Instead of saying "there is [not enough] evidence that .." we say "given the observations and these assumptions we should [not] believe/expect that .."
Suppose I have an invisible dragon detector that is accurate 99.9% of the time... if it tells me there is an invisible dragon in the room, it doesn't mean there is only a 0.1% chance there is not a dragon... there is a 100% chance there is not an invisible dragon in the room, because they don't exist.
It's useless for fundamental research, because by definition you're exploring what you don't know yet.
The real problem is more one of labelling. "True according to science" isn't a binary, but it's treated as if it is - especially by marketers.
Science is more like a set of concentric circles of decreasing confidence. You can be very confident indeed about the contents of the centre circle which includes undergraduate physics and engineering. You can also be confident that there are commonly agreed edge cases, areas of inaccuracy, and extreme circumstances where the science stops being reliable.
As you get further away from the centre confidence decreases. A lot of the debate about replication is about research that is a long way from the centre, where uncertainty is high.
But neither researchers nor the science press nor the mainstream media will report this. Studies are usually presented as "Science says...", as if you're supposed to be just as confident of the results of a psychological study that asks a population of 30 undergrads from the same college and the same year some poorly designed questions as you are in Special Relativity.
The current system incentivizes p-hacking. Nobody wants to throw away their work if it doesn't meet the p < 0.05 criterion, especially when their career is on the line.
> the doctor would need to consider the overall incidence rate of cancer among similar women with similar symptoms, not including the result of the mammogram
That's a bad assumption. Mammogram is a radiology tool to investigate tissue. It's not a randomizing element as it's fundamental to arriving at the thesis, which is then correlated from MULTIPLE vectors.
> a similar patient finds a lump it turns out to be benign
A manual inspection is not the same. In good faith, let's assume they are the same for no reason but to argue about how not to do medicine as some precept for "Base rates are effectively random".
Turns out, Base rates are not random guesses.
What's interesting about Bayesian theory is that we use it all the time: we make assumptions (like a specific base rate), observe and collect statistical data about the outcomes, and use back-propagation to correct until the models fit measurable events. This is why tests often have caveats about efficacy. The base rate is sometimes, reasonably, unknown because there is no additional correlation. This doesn't indict base rates: in the vast majority of cases, there are multiple vectors (or new vectors are generated) showing this process is reliable (beyond a few dice rolls). There have also been cases where there is no corroboration from new measures, and the deduction is that the original measure and base rate were random.
It's a lot of hand waving from a classic troll. Why? Probably for students who want to feel like they have "discovered" how the establishment is ignorant.
The author should have used a woman with no symptoms who goes for a mammogram screening test, not a woman with symptoms who goes for a diagnostic test.