If clinical trialists use p-values wrong, how is moving to Bayesian methods going to be less misused and misunderstood?
The real issue is the established practice in the research field. It's hard to introduce new methods if peer reviewers are not familiar with them, or if everyone in the field has problems with such a basic concept as p-values. Researchers have a tendency to apply the statistical tools they have learned mechanically, and peer review accepts them mechanically. New tools need more thought, not less.
Bayesian methods have the advantage that, while they can still be misused and misunderstood, at least when used correctly they tell us what we want to know and are easy to interpret (a posterior probability is exactly what you think it is) whereas p-values are hard to interpret correctly (a low p-value can imply a large effect size, a large sample size, or in the face of publication bias, lord knows what).
That said, I do think that the advantage to Bayesian thinking in research would mostly not be about the methods but about the attitude: aside from a couple of weirdos pushing Bayes factors, Bayesian statisticians almost universally communicate results using credible intervals (HPD) instead of dichotomizing the evidence into significant or not significant. Frequentist confidence intervals will get you most of the way there and could completely replace p-values, but if you're going to advocate for better statistics and uproot established practices, might as well go all the way and encourage better methods and better ways of communicating results at the same time.
> By using posterior credible intervals, we might reject the
null, but by using Bayes’ rule directly we see that this rejection is made prematurely as
there is no decrease in the plausibility of the zero point.
This could totally come up if you have a non factorising multivariate posterior distribution, and you want some sound way to summarise it when you can't reason from the marginals alone.
The post argues that likelihood is basically Bayes, but it's actually the Fisher school of thought, which is a halfway point, or can be seen as a compromise, between the Frequentist and Bayesian approaches.
A good book that talks about this, which I've been meaning to read, is In All Likelihood: Statistical Modelling and Inference Using Likelihood by Yudi Pawitan.
I've read the first chapter and it's a very interesting read. I might skip bayes and go straight to likelihood coming from the school of frequency.
It's good but it's mostly a technical introduction to statistical inference (i.e. the mathematics that make statistics work), and the remarks about the differences between different schools of statistics are mostly short asides. It's not a great recommendation for non-statisticians who want more insight into the different schools of statistics.
Also, both frequentist and Bayesian methods rely on likelihoods as an intermediary, so I guess you could say it's a "compromise" but only in a very uninteresting way.
The likelihood L(theta|data) is simply shorthand for P(data|theta), and to get a posterior probability you then simply apply Bayes' theorem to get P(theta|data) ~ P(data|theta) * P(theta), and this last part is your prior.
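To make the proportionality concrete, here is a minimal sketch of that update on a grid of theta values for a coin-flip model. The data (7 heads in 10 flips), the flat prior, and the grid resolution are all assumptions picked for illustration, not anything from the thread.

```python
from math import comb

def grid_posterior(heads, flips, prior, grid):
    """Return P(theta|data) on a grid: likelihood * prior, then normalize."""
    # Binomial likelihood P(data|theta) evaluated at each grid point.
    likelihood = [comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in grid]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    total = sum(unnormalized)  # plays the role of P(data)
    return [u / total for u in unnormalized]

# Hypothetical setup: 101 grid points, flat prior, 7 heads in 10 flips.
grid = [i / 100 for i in range(101)]
flat_prior = [1 / len(grid)] * len(grid)
posterior = grid_posterior(7, 10, flat_prior, grid)

# With a flat prior, the posterior mode sits at the maximum-likelihood estimate.
mode = grid[posterior.index(max(posterior))]
print(mode)
```

Swapping `flat_prior` for anything non-uniform is the only change needed to see the prior pull the mode away from the maximum-likelihood value.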
I also think that a lot of the criticism against Frequentist inference comes from people who have experienced trouble applying a clean Frequentist solution in a more complex case and thus switched to Bayesian statistics as a replacement that is easier to handle.
Astronomy and astrology also answer different questions, it's just that astrology doesn't answer the questions we're interested in, and it doesn't even do a great job with the questions it does purport to answer.
When quantifying uncertainty, the only quantity of interest is the posterior probability, which Bayesian methods provide and frequentist methods don't. Period. The only reason we accept a likelihood instead is because it can provide a reasonable approximation and might be faster or easier to calculate, and the only reason we'd accept a p-value over the full likelihood function is when we need a quick-and-dirty metric of the importance of a variable or an intervention. These are all valid reasons, and heck, few things are more useful than a quick (frequentist) OLS regression, but to then conclude that frequentist and Bayesian methods are entirely equal and complementary is disingenuous.
A good blogpost which deconstructs this line of reasoning in more detail: http://www.bayesianphilosophy.com/how-science-can-permanentl...
It's a bit like the popularity of Excel - we see many people complain about Excel's automated changing of strings to dates, for example. If we'd all switch to R to fix that problem everybody would complain about stringsAsFactors=T instead.
Or you can just force them to give confidence intervals.
Sounds to me like a good reason and a good outcome. Don't you think?
Ultimately, what is a "probability"? It's a number that can be used for making predictions about the future. Neither Bayesian methods nor frequentist methods are "ideal" in the sense of predicting (computing) the future — in fact, results from algorithmic information theory put the best bounds on what we can "hope to predict". Very loosely speaking, the best predictor is the shortest program that reproduces the data. But this is uncomputable in general, which means we use something like minimum message length (MML) or minimum description length (MDL) — which are also sometimes uncomputable, but a bit more manageable at least.
In some situations, Bayesian methods can be shown to be equivalent to MDL, in the sense that sizeof(model) + sizeof(parameters) + sizeof(residual data) is a log reformulation of Bayes theorem.
See this paper for more details: http://homepages.cwi.nl/~paulv/papers/mdlindbayeskolmcompl.p...
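As a rough illustration of the log reformulation: taking -log2 of prior times likelihood turns Bayes' theorem into a sum of code lengths in bits, so the shortest total description and the most probable model coincide. The candidate models, their priors, and the data below are made up, and this is a simplified two-part code (model bits plus data-given-model bits), not a full MDL treatment.

```python
from math import log2

# Hypothetical candidate models for a coin: P(heads) -> prior probability.
models = {0.5: 0.8, 0.7: 0.1, 0.9: 0.1}
heads, tails = 14, 6  # made-up data

def likelihood(theta):
    return theta**heads * (1 - theta) ** tails

def code_length_bits(theta):
    # Bits to state the model plus bits to encode the data given the model.
    return -log2(models[theta]) - log2(likelihood(theta))

# Posterior up to the shared normalizing constant P(data).
unnormalized_posterior = {t: models[t] * likelihood(t) for t in models}

best_by_mdl = min(models, key=code_length_bits)
best_by_map = max(unnormalized_posterior, key=unnormalized_posterior.get)
print(best_by_mdl, best_by_map)
```

The two selections agree by construction, since the code length is a strictly decreasing function of the unnormalized posterior.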
Not at all. Frequentists cannot define a probability on whether it will rain in a location on a given day. They will respond that such a probability is meaningless. Bayesians can, however, give a meaning to it.
In the Bayesian way of thinking you are always wondering if there is another source of information that can enhance your probabilities. Like in the random forest method, you try to gain new knowledge from many sources.
Not sure I completely understand it, but good point.
Yes you can use prior evidence.
Yes you have to be careful of multiple comparisons.
Yes you have to account for reproducibility.
Oh and Bayesian has p-values too.
- Don't worry, they'll tell you.
He gives out PDFs of the exercises in his textbook on his home page. Thanks for the pointer.
- Where did this difference come from? When did it develop?
- What are the basic premises that a Bayesian believes that a frequentist doesn't, and vice versa? Reason it all the way through front and back.
- What does the B/F's model look like? What are the pieces they use, how are they arranged, what are the dependencies, how does causality flow?
- Why are the choices made by one invalid for the other's model? Where do they agree deliberately despite this?
- What are the consequences in the real world? Give me a real example on why this difference matters? "Real" meaning I don't care about dice, I care about engineering and science.
Instead you get some bullshit about fitting a distribution you don't understand to a model you can't see, while relying on understanding the nuances between words like probability and likelihood which is what you are trying to learn in the first place. Plus I swear the numbers agree in 99% of the "examples" given, with some handwaving "but it's different" to excuse it.
Fucking explanations, how do they work? Not in academia.
They're two different academic traditions for what constitutes Good Statistics. They're originally rooted in the philosophical dispute over whether to treat probabilities as frequencies of random outcomes ("frequentist") or as degrees of plausibility ("Bayesian").
In actual fact, a well-trained frequentist knows exactly how and when to use Bayes' rule for gambling, and a well-trained Bayesian knows exactly how and when to publish a paper with a p-value.
The really important difference is over how a whole field expresses its consensus or tradition about what constitutes strong evidence or a plausible theory. A Bayesian would like researchers to elicit priors before experiments (which express something like what reviewers' expectations will be about the experiment), and then calculate posterior distributions after experiments. We could then trade off "weak" and "strong" experiments against prior beliefs, while also reducing publication bias's pernicious effect on statistical strength -- or so Bayesians claim. Bayesian methods are also usually more computationally intensive and can make use of small sample sizes.
Frequentists had a lot of disagreements with that sort of thing, and so Neyman-Pearson and Fisher and the like developed a whole lot of statistical methods that don't rely on ever treating a probability as a belief. They preferred to differentiate clearly between a frequency of experimental outcomes, and what researchers think. They figured that Bayesian "priors" were subjective, biased, and untrustworthy. Also, quite importantly, their methods involved a lot less rote computation and instead made use of impressively large experimental samples.
Depending on which tradition you were raised in, and which philosophers of science you side with, you can argue until the end of the world about which one's better. My advice? Use whatever your field demands you use to publish, but be Bayesian on the inside.
Like the person at the root of this thread, I have struggled with explanations on why Bayesian is so great. The answers that worry me tend to be along the lines of "Well, suppose you want the probability for event X (typically a "one-off" event). Frequentist statistics cannot give you an answer (one-off events have no distribution to speak of). But with Bayesian statistics, I can compute a probability for it!"
Yes, but as someone else has pointed out, what the heck do you mean by "probability"? Frequentist statistics is fairly clear on the definition. The whole argument given above seems like he is happy he has some mechanism to get an answer, with little thought about whether he is asking a meaningful question.
Which is why your comment resonates with me:
>They preferred to differentiate clearly between a frequency of experimental outcomes, and what researchers think. They figured that Bayesian "priors" were subjective, biased, and untrustworthy.
I don't want an answer that's dependent on how the person thought. That definitely comes across as subjective to me.
Then I think you'll be somewhat disappointed when you learn more about philosophy of science and the core debates over methodology. The biggest problem is: nothing is purely objective. Everything involves assumptions of some sort, otherwise we run head-on into the Problem of Induction, white ravens, No Free Lunch Theorems (on the more machine-learny side), and other such problems.
>Yes, but as someone else has pointed out, what the heck do you mean by "probability"? Frequentist statistics is fairly clear on the definition. The whole argument given above seems like he is happy he has some mechanism to get an answer, with little thought about whether he is asking a meaningful question.
I don't think frequentist statistics are very clear here at all! A p-value, after all, is a likelihood, which frequentist statisticians insist is not a probability, but which the math clearly says is a conditional probability. So when you get a p<0.05 finding, it never means, "We actually ran this experiment under a control hypothesis N times, for some large N, and fewer than five came out this way." It's a measure of counterfactual outcomes, conditional on an assumption which we pretend to expect to be true. When the p-value is small, we then pretend to be surprised, and pretend to make an interesting inference.
I say "pretend" because an ordinary NHST is mathematically equivalent to a Bayesian credible hypothesis test with a uniform prior over the hypotheses. Performing the frequentist test involves pretending to believe that uniform prior, even though you probably actually set up the experiment in order to obtain a significant p-value.
In the end, the NHST is a chiefly social practice, and the p-value is chiefly social evidence. It's a way of convincing peer reviewers to accept (that is, subjectively believe) that you did a real experiment, when they would otherwise skeptically believe that you made it all up (which, unfortunately, some researchers have been known to do!).
Bayesian methods don't get rid of this subjective, social component to science and make everything "objective", any more than you can do that by hiring Mr. Spock to do your statistics. Bayesian methods drag the subjective, social component of prior elicitation out into the sunlight where everyone involved has to acknowledge it. They also give you numbers that are actually about the experiment you really did, as opposed to measuring your experiment against an infinity of counterfactual experiments you never really performed.
(And also they're easier with small sample sizes, their results are more intuitive to interpret, and generative models are more intuitive to think about than test statistics.)
All that said, I totally have used frequentist statistics (took a very similar class to yours) when called upon to do so. Fighting a philosophy-of-statistics holy war against your higher-ups in the workplace hierarchy is a really bad idea, so however nice Bayesianism or frequentism might sound, sometimes you buckle down and do what ships products and publishes papers.
When I first encountered p-values, even with a frequentist mindset, I saw the huge problem that one could have with them. Many frequentists do not like p-values. I wouldn't be surprised if most actual frequentist statisticians (not those in fields like medicine, psychology, etc) do not like p-value usage.
Attacking p-values is not a valid argument against frequentist statistics.
I'll also add that it seems that many Bayesians are really dying for a number, and because frequentist stats doesn't give it to them, they reach for another tool that will - but with little thought about the validity of the tool. I'm not here to defend frequentist statistics, but just because it doesn't give all the answers, that does not mean that some other tool that does give some answers is correct.
It is just as abusable as p-values. I suppose if a Bayesian says he used Bayesian approaches because it made sense given his problem, that's fine (and in my mind, he is just being a statistician, not a Bayesian). The self-identified Bayesians I always encounter don't fall into that mold. They fall into the category of "Look what I can compute that I could not with frequentist statistics" - but any attempt I make to understand what that number means fails - they cannot explain it either, beyond "this is how I feel".
Strongly disagree, tbh. Picking one side or the other in this debate is silly. Don't "be Frequentist" so as to avoid Bayesian model building techniques since you'll end up stuck all the time and don't "be Bayesian" so as to look down upon simple, workable, un-motivated estimation procedures with good performance.
For example, back in my MSc days, I would run a whole lot of metrics on our dataset, and look for patterns. Sometimes I would find a strong, interesting pattern, and go try to tell my advisor about it. He would ask me to double-check my code for bugs, rerun things, and see if the pattern was still there. Often, it wasn't.
My advisor was nobody's Bayesian, a frequentist (and even a user of purely descriptive statistics, oftentimes) through and through.
So to me, "Bayesian on the inside" ends up meaning, "at least Bayesian enough to look for experimental errors." This attitude has helped me a lot in debugging difficult snafus in industry, too.
But what if our beliefs differ?
On several occasions, friends I was tutoring in introductory probability were given a HW problem. They would compute the answer in two different ways, and occasionally get two different answers. Both methods seemed correct to them, but one was always wrong. I used to argue with them about their reasoning on the incorrect answer, but it didn't help much.
What did help? Just doing the homework problem in real life, with a reasonable number of samples. It could be literally in real life or through a computer simulation. The result would always closely agree with one of the answers.
That's why I like frequentist statistics. It gives me a way to settle the answer outside of my own belief system.
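A minimal sketch of that kind of check, using a made-up homework problem (the probability of at least one six in four rolls of a fair die) and two candidate answers. The problem, the trial count, and the seed are assumptions for illustration only.

```python
import random

def simulate_at_least_one_six(trials, rolls=4, seed=0):
    """Estimate P(at least one six in `rolls` rolls) by brute-force simulation."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randint(1, 6) == 6 for _ in range(rolls))
        for _ in range(trials)
    )
    return hits / trials

tempting_answer = 4 * (1 / 6)      # adds probabilities as if the rolls were mutually exclusive
careful_answer = 1 - (5 / 6) ** 4  # complement of "no six in four rolls"
estimate = simulate_at_least_one_six(100_000)

print(f"tempting={tempting_answer:.3f} careful={careful_answer:.3f} simulated={estimate:.3f}")
```

The simulated frequency lands next to one of the two answers and well away from the other, which settles the argument without appeal to anyone's reasoning.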
Subject to a few technical requirements (basically absolute continuity of priors), it's a theorem that your posteriors will eventually converge as more evidence is gathered.
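A small sketch of that convergence for a coin with two observers who start from opposed Beta priors. The true bias, the prior parameters, and the checkpoints are hypothetical; Beta priors with positive parameters are used because they satisfy the continuity requirement.

```python
import random

def posterior_mean(prior_a, prior_b, heads, tails):
    """Mean of the Beta(prior_a + heads, prior_b + tails) posterior for a coin's bias."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

rng = random.Random(1)
true_bias = 0.6  # hypothetical coin, unknown to both observers
heads = tails = 0
gaps = {}
for n in range(1, 10_001):
    if rng.random() < true_bias:
        heads += 1
    else:
        tails += 1
    if n in (10, 100, 10_000):
        optimist = posterior_mean(8, 2, heads, tails)  # prior belief: mostly heads
        skeptic = posterior_mean(2, 8, heads, tails)   # prior belief: mostly tails
        gaps[n] = abs(optimist - skeptic)
        print(n, round(optimist, 3), round(skeptic, 3))
```

With these particular priors the disagreement after n flips is exactly 6 / (10 + n), so it shrinks toward zero whatever the data turn out to be.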
>That's why I like frequentist statistics. It gives me a way to settle the answer outside of my own belief system.
Can you explain this? To me this makes no sense - as a Bayesian I run simulations too.
Re: Where did the difference come from, that's down to different interpretations of probability. The frequentist interpretation says that probability describes the world, whereas the Bayesian interpretation describes our beliefs. Here's another common misconception: You don't need to subscribe to one interpretation to the exclusion of the other. People who use the Copenhagen interpretation of quantum mechanics (a frequentist formulation if ever there was one) will also speak of fractional belief (the definition of Bayesian probability). It is important to be clear about which interpretation you're using at any one time, but you don't need to tie yourself to one interpretation, and it doesn't need to be part of your identity or world-view.
I'd say I'm philosophically Bayesian, but frequentist techniques are often more convenient.
Perhaps something like quality control, where we want a procedure that only rejects 5% of within-spec parts?
There are separate theoretical foundations, which can be confusing since both Bayesians and Frequentists use the same mathematics of probability theory. A short explanation of the foundational difference is that Bayesians and Frequentists interpret probability in different ways.
To a Frequentist, a probability is nothing more and nothing less than a long run frequency: the proportion of times you expect an event to occur if a random experiment is conducted many times. This proportion is usually conceived of as a true, but unknown, constant. A good Frequentist thus can't describe "the probability that you have cancer", because you either have cancer, or you do not. If you want to see what kind of constraints this places on frequentist descriptions of real-world phenomena, look up the definition of the frequentist confidence interval.
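To see what that constraint means in practice, here is a rough simulation of the long-run property a 95% confidence interval is defined by: the procedure covers the fixed true value in about 95% of repeated experiments. The true mean, known standard deviation, sample size, and number of repetitions are all assumed values.

```python
import random
from statistics import mean

def z_interval(sample, sigma, z=1.96):
    """95% confidence interval for the mean when the standard deviation is known."""
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

rng = random.Random(42)
true_mean, sigma, n, experiments = 10.0, 2.0, 25, 10_000  # hypothetical values
covered = 0
for _ in range(experiments):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    low, high = z_interval(sample, sigma)
    covered += low <= true_mean <= high  # True counts as 1

coverage = covered / experiments
print(coverage)
```

The 95% describes the procedure across many repetitions; any single computed interval either contains the true mean or it does not.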
(Many) Bayesians trace their probabilistic approach to modeling reality to work done in a decision-theoretic context in the early 1900s (https://en.wikipedia.org/wiki/Bayesian_probability#Axiomatic...)
In short, Bayesians claim that:
1. Your beliefs should be describable as probability distributions
2. You should update your beliefs when observing new evidence using Bayes' rule
There are solid theoretical justifications for both of these statements.
To a Bayesian, therefore, it is perfectly sensible to talk about the "probability that you have cancer", because there is uncertainty about the phenomenon.
This discussion is, however, almost completely orthogonal to the "applied" implications of choosing a Bayesian or a Frequentist approach to statistical inference. Some thoughts:
1. Bayesian procedures tend to be more computationally intensive
2. non-degenerate Bayesian prior distributions have the effect of "shrinking" parameter estimates towards some null value, which has benefits in high dimensional problems (see: frequentist Lasso and ridge regression)
3. Bayesian inference makes it easy to think about problems in a conditional fashion (e.g., "if I knew what X was, I would know how Y would behave. If I knew what Y was, I would know how Z would behave."). This makes it quite easy to specify intuitive, yet complex, models of interesting phenomena.
4. There are conceptual advantages to thinking about things as probability distributions.
5. Eliciting prior distributions is hard, but it is also work that any good statistician should be doing (at least informally) regardless of whether they're a frequentist or a Bayesian.
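The shrinkage in point 2 can be sketched with the simplest conjugate case: a normal mean with known standard deviation and a normal prior centred on a null value. The observed mean, the standard deviations, and the sample sizes are illustrative assumptions.

```python
def posterior_mean_normal(sample_mean, n, sigma, prior_mean, prior_sd):
    """Posterior mean for a normal mean with known sigma and a normal prior."""
    data_precision = n / sigma**2
    prior_precision = 1 / prior_sd**2
    # Precision-weighted average of the data estimate and the prior centre.
    return (data_precision * sample_mean + prior_precision * prior_mean) / (
        data_precision + prior_precision
    )

# Hypothetical numbers: observed mean 3.0, prior centred on a null value of 0.
for n in (2, 20, 200):
    estimate = posterior_mean_normal(3.0, n, sigma=2.0, prior_mean=0.0, prior_sd=1.0)
    print(n, round(estimate, 3))
```

With little data the estimate is pulled strongly toward the null value; as n grows, the data dominate and the pull fades.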
Yes a million times! This problem is mirrored IMO in many domains requiring somewhat complicated math. You end up with an explanation of many layers of concepts flattened into one very hard to grok pancake.
The difference between Bayesians and Frequentists is in the loss function that they attempt to minimize.
Bayesian loss functions assume a constant dataset and sum across one's hypothesis set.
Frequentist loss functions assume a constant hypothesis and sum across possible datasets.
Really though this is a false dichotomy, as it's perfectly possible to be both a Bayesian and a Frequentist by using a loss function which sums over both one's hypothesis set and across possible datasets.
From a more technical perspective all this comes down to a simple fact that some consider probabilities within the framework of Information Theory, while others prefer to use a standalone axiomatic foundation.
I'm not sure of the specifics but it (a) appears to be a fundamental dichotomy on ways to practice "finding a good model" given statistical mathematical foundations and (b) has been heavily politicized historically.
A lot of statistical historical practice is developing a good general purpose way of finding a good statistical model and proving that it works pretty well under some assumptions. Historically, Bayesian methods were considered taboo (perhaps because we generally lacked the ability to compute them) and so most papers were Frequentist. Very historically (Gauss) Bayesian methods were often used to generate some of the first statistical models used in physics.
In basic mathematics both sides share the same beliefs, but in practice they favor different means to construct and evaluate models. See my other answer for many more details, but essentially Frequentists evaluate their models by seeing how much they diverge from reality and Bayesians evaluate them by comparing relative likelihoods of models given what they observe. This leads to wide variations in the means of constructing, elaborating, and talking about models.
A Frequentist's model can be literally anything. You might legitimately consider "the minute of the day that the mailman arrived" an estimator for "the expected time when stock A will beat out stock B three months from now" and then you use Frequentist methods to evaluate how your estimator performs. You'll also likely conclude that this estimator is terrible.
A Bayesian's model typically flows from a "generative story" which results in a massively parameterizable model which covers a huge swath of potential realities and then the Bayesian goes looking in that space for the "most probable" model.
Frequentists can use Bayesian methods if they like. Bayesians can evaluate their "most probable" models using Frequentist evaluations if they like. Good statisticians do all of the above.
We both want to travel from Boston to SF. I fly and get there quickly, you drive and have a great road trip. We both arrive at approximately the same place but our methods and experiences differ. For sufficiently short trips they're even identical.
More to the point, Frequentists and Bayesians disagree about their mechanisms for getting to good models. Really dogmatic Frequentists and Bayesians can disagree about "the meaning of probability" but as far as I'm concerned this has much more to do with decision theoretic policy making and education rather than mathematics.
Let's say you want to model an engineering problem statistically. Frequentist methods will probably end up requiring some leaps of logic and clever tricks to get to the best result but they will also end up with at least a few algorithms you could run on constrained hardware. Bayesian methods will be easier to "plug and chug" in many parts (though they still require a lot of finesse) but the final result will almost invariably require a fast computer.
I'd compare it to integration. One school of thought is that if you're pretty clever you can integrate many things by exploiting their structure to find the antiderivative. Another school of thought is "I can answer most practical questions here through numerical integration at the end of the day, so why bother finding an antiderivative?"
Both essentially work, but you face very different challenges on each road and some problems can be much easier for one perspective or the other. If you're really good you have both of these tools in your toolbelt and think carefully about when to pull each one out.
Go into a lab, do an experiment, call that
one trial, measure a number, call that
number (the value at this trial of)
random variable X.
We might want the average value, expected
value, or expectation of X, denoted by
E[X].
Under meager assumptions, if we take a
sequence of independent samples of X, then
their average will converge to E[X]; this
is the law of large numbers.
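A quick sketch of the law of large numbers in action, assuming a hypothetical experiment where X is uniform on [0, 2] so that the expectation is 1. The distribution, seed, and sample sizes are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(0)
# Hypothetical experiment: X is uniform on [0, 2], so the expectation of X is 1.
samples = [rng.uniform(0, 2) for _ in range(100_000)]

# Averages over longer and longer prefixes of the same sequence of trials.
for n in (10, 1_000, 100_000):
    print(n, round(mean(samples[:n]), 4))
```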
We might be interested in the event,
call it A, when X > 1.
We might want the probability of A, that
is, P(A) = P(X > 1).
For random variable X, we can define its
cumulative distribution: For real number x,
F_X(x) = P(X <= x)
[Here we are using TeX notation where F_X is
F with a subscript X.]
Then with calculus and meager assumptions,
the probability density of X is
f_X(x) = d/dx F_X(x)
With meager assumptions, calculus and
f_X(x) can give us the expectation E[X].
The likelihood of X = x is just f_X(x),
that is, the value of the density at x.
For the Gaussian distribution, the maximum
likelihood is at the central peak of the
density which is also the expectation.
In some approaches to statistical
estimation, we have some data and seek the
estimate x that maximizes the likelihood
of getting the data we actually did get.
Given events A and B,
we can define the conditional probability
of event A given event B by
P(A|B) = P(A and B) / P(B)
So, if we think of events as geometric
regions and their probabilities as their
areas (actually part of a serious
approach), then P(A|B) is the fraction of
B that is also A.
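The "fraction of B that is also A" picture can be checked by counting. This sketch assumes two fair dice, with A the event that the sum is 8 and B the event that the first die is even; both events are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 8}    # the sum is 8
B = {o for o in outcomes if o[0] % 2 == 0}  # the first die is even

def prob(event):
    """Probability of an event as its share of the equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

# P(A|B) = P(A and B) / P(B), the share of B that is also A.
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 1/6
```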
Then
P(B|A) = P(A|B) P(B) / P(A)
is Bayes Rule.
If we do experiments and believe from
whatever prior to the experiment that we
have a meaningful estimate of P(B) or
P(A|B), then maybe we are being Bayesian.
More generally knowing that event B
occurred we can regard that as
information we have obtained, and what
that information says about event A is
P(A|B).
Then, if events A and B are independent,
event B gives us no more information about
event A and we have
P(A|B) = P(A)
So, if we are interested in event A and
its probability P(A) and suddenly are told
that event B occurred, then for event A we
now want the updated view P(A|B).
Using the measure theory foundations of
probability and the Radon-Nikodym theorem
of measure theory, under meager
assumptions we can define for random
variables X and Y
E[Y|X]
which is a function, say, f(X), of random
variable X and the best non-linear least
squares estimate of Y for any function of
X.
This measure theory approach also lets us
define E[X|Z]
for an infinite set Z of random variables.
This definition is useful, e.g., in the
Poisson process where each increment of
time to the next arrival is independent of
all previous increments, Markov processes,
a stochastic process adapted to a
I started my post by quoting the user's question; that's the question I answered.
I never used the word frequentist and made only minimal use of the word Bayesian. I avoided all political fights.
Note: Possibly of special interest to Bayesians, I touched on E[X|Z] for an infinite collection of random variables Z. This will be important in conditioning (the core of Bayesian methods) in statistics of stochastic processes.
Also of interest in conditioning, I did mention that if events A and B are independent, then B gives no more information about A because P(A|B) = P(A).
Anyone working with conditioning needs to know this concept.
More generally, I mentioned the sense in which conditioning gives the best possible non-linear least squares estimate; so, here we begin to see the power of conditioning, of which Bayes Rule is the most elementary case.
Also you may find that my touching on the Radon-Nikodym theorem is a first step to high end versions of Bayesian, e.g., the old idea of sequential testing (A. Wald) and the concepts of stopping times, optimal stopping, the strong Markov property, etc. I wrote out an earlier, longer response that I didn't post, which did go back to sigma algebras, measurability, etc. I did omit measurable selection, sufficient statistics, etc. For such concepts, the Radon-Nikodym theorem and sigma algebras are crucial, and my post may be the only one here that mentioned either.
Also, comparing my response to others, you may find that my response was comparatively clear, precise, understandable, for such a short post without poorly defined or undefined terms, correct, and from a mature view.
By the way, I hold a Ph.D. in applied math from one of the world's best and best known research universities. My research was on stochastic optimal control and passed an oral exam from a Member, US National Academy of Engineering. I've published as sole author peer-reviewed original research in mathematical statistics. Once I did a statistical estimation of expected revenue growth for the BoD of FedEx; my work got two Board representatives from investor General Dynamics to change their mind and stay and, thus, saved FedEx. I've worked in statistical consulting in finance, marketing, etc., including in computing and statistical consulting for the faculty at Georgetown University. My work in statistical power spectral estimation got my company sole source on an important contract from the US Navy. Once I did a Monte Carlo estimation of a statistical estimation I did of the survivability of the US SSBN fleet under a special scenario of global nuclear war limited to sea -- the US Navy was pleased. My work passed review from J. Keilson, a world class expert in statistics.
Maybe, instead of what I wrote, some readers were looking for something else.
Sorry some people were offended.
H1: answer ~ upset + error
H0: answer ~ error
Frequentist methods rarely directly answer the question we actually have. But they're generally far easier to compute. Bayesian methods are often much more intuitive but are far more complicated and less performant.
Assuming a model M characterized by parameters T and giving rise to data Y, what is P(T|Y,M)?
To be sure, you can compare the probability of models as well, and there are Bayesian semiparametric techniques, but models are still really important.
Which is why no working statistician is really 100% Bayesian (intractable in many cases) or 100% frequentist (obviously wrong in some cases). We all use Bayes' Rule (don't need to be "a Bayesian" to do that) and we all are forced to do Neyman-Pearson-style power calculations now and then (holds nose). Even the latter have their uses, in preventing the worst of the worst abuses of frequentist techniques (it's not frequentism that is inherently bad per se, it's the profound abuses that turn it into a magnet for bad science; sometimes a Bayesian formulation of a problem is simply intractable).
Bayesians like to walk in the park, and see step by step how things go. The more they walk, the more accurate the result will be. For example at step 5 with 1st roll: 1, 2nd: 6, 3rd: 3, 4th: 1, 5th: 2, you will have (1 + 0 + 0 + 1 + 0) / 5 = 0.4. At step 6 with a roll of 5 you will have (1 + 0 + 0 + 1 + 0 + 0) / 6 = 0.333. The answer gets closer and closer to the true answer after each roll, each step. Ultimately, with enough rolls, Bayesians will start to give you an answer close to the frequentists' one.
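The step-by-step tally in that example can be reproduced directly. This sketch assumes, as the 1s and 0s in the sums suggest, that the quantity being tracked is the proportion of rolls that came up one.

```python
rolls = [1, 6, 3, 1, 2, 5]  # the six rolls listed in the example

ones = 0
running = []
for step, roll in enumerate(rolls, start=1):
    ones += roll == 1            # indicator: 1 if the roll is a one, else 0
    running.append(ones / step)  # proportion of ones seen so far

print(round(running[4], 3), round(running[5], 3))  # 0.4 0.333
```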
If you have a coin that comes up heads 60% of the time, a frequentist looks at that as, "as the number of times I flip my coin goes towards infinity, the proportion of heads I get goes to 60%." A Bayesian thinks, "absent other evidence on how the coin gets flipped, I'm about 60% sure the outcome of the flip will be heads."
This lets Bayesians talk about the probability of single events, like basketball games, where frequentists can't.
Bayesians also see conditional probabilities everywhere. A conditional probability says, "well, if I know something about the situation, I should include it in my beliefs." Circumstances matter. It doesn't make a ton of sense to talk about the chances I get hit by a car. They vary wildly depending on whether I'm standing on the highway or eating in my kitchen.
Another basketball example. The chances that the Spurs win change dramatically depending on whether they play the Warriors or the Kings. I might say "they have a 90% chance of winning given they play the Kings, but a 40% chance of winning given they play the Warriors."
I also need a likelihood function. What are the odds I saw my data given my hypothesis is true? If I got hit by a car, what are the chances I was standing in the street? Given the Spurs won, what are the chances they played the Warriors?
We use something called Bayes Rule which allows us to pile on more and more information on something we call a "prior belief," what we thought about our hypothesis before we saw our data. As we pile on data, we expect to change our beliefs. We become more sure of what we thought, maybe we become less sure, maybe we can totally change our minds.
I want to use Bayes' example since I think it's so good. Imagine you came out of Plato's cave and saw the sun rise. You'd think, that's weird, I bet that doesn't happen again. The sun goes down, and you spend some time in the dark. The next morning the sun comes up again. Now you're less sure that sunrises are fluke events. Plus you found some people who aren't freaking out about the whole "big ball of fire in the sky" thing. Maybe now you don't expect the sun not to rise tomorrow. Maybe it will, maybe it won't. As you see more and more sunrises, eventually you get to the point where you are extremely confident that the sun rises every morning. You saw more sunrises and updated your beliefs.
We need one last piece of information: a prior. That's the initial belief your cave-escaping self had that sunrises are weird and you probably won't see another one. We can estimate priors through population data--percentage of games the Spurs won against the Warriors--or we could just make them up. This is just our belief about the truth of our hypothesis; the chance I get hit by a car regardless of where I am.
We take all this and put it into Bayes' rule, a blender that gives us the probability that our hypothesis is true given we saw our data. We can use this as a new prior too.
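Here is a rough sketch of the sunrise story, where each posterior becomes the prior for the next morning. It assumes the belief about "the sun rises on a given day" is a Beta(a, b) distribution, and the skeptical starting values a=1, b=3 are made up for illustration; nothing above fixes those numbers.

```python
def update(a, b, sunrise_seen):
    """One Bayes-rule update of a Beta(a, b) belief after one morning."""
    if sunrise_seen:
        return a + 1, b
    return a, b + 1


def mean(a, b):
    """Expected probability of a sunrise under a Beta(a, b) belief."""
    return a / (a + b)


a, b = 1, 3  # assumed skeptical prior: a sunrise is expected 25% of the time
history = [mean(a, b)]
for _ in range(1000):  # observe 1000 sunrises in a row
    a, b = update(a, b, True)  # yesterday's posterior is today's prior
    history.append(mean(a, b))

print(history[0], history[1], round(history[-1], 4))
```

The belief creeps up after every sunrise and never quite reaches 1, which matches the "extremely confident" ending of the story.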
One last example. I'm 70% sure the earth is round. I see a picture of the horizon taken from a hot air balloon and I think there's a 90% chance that I would see that if the earth were truly round. Without going into the calculation, I'm now give or take 80% sure the earth is round. I saw data to support my hypothesis and my belief got stronger.
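To make the skipped calculation concrete, here is a small sketch. The example gives the prior (70%) and the chance of seeing the photo if the earth is round (90%), but Bayes' rule also needs the chance of seeing that photo if the earth is not round, which is not stated. The values tried for `p_photo_if_flat` below are assumptions; landing on about 80% corresponds to a value near 0.525.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes' rule for a yes/no hypothesis."""
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


prior = 0.70             # "I'm 70% sure the earth is round"
p_photo_if_round = 0.90  # chance of seeing that horizon if it is round

# Assumed values: how likely the same photo would be on a flat earth.
for p_photo_if_flat in (0.9, 0.525, 0.1):
    result = posterior(prior, p_photo_if_round, p_photo_if_flat)
    print(p_photo_if_flat, round(result, 3))
```

If the photo is just as likely either way (0.9), the belief stays at 70%. The less likely the photo would be on a flat earth, the more the belief moves.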
Why do Bayesian analysis? Because someone once published a study that says that frogs can sense earthquakes some time before they occur. That may be true, but I'm skeptical. My skeptical prior would only get moved slightly to become less skeptical, but it would still need more information, a replication of the study by other people, to actually convince me.
They both yield the same results when your knowledge of the given probabilities is exact.
But Frequentists will look at a "top down" view, Bayesians will look "bottom-up", more importantly, as based on Bayes's theorem, they will look at "if a then b" kind of probabilities.
This sounds like a good explanation (the 1st answer) https://stats.stackexchange.com/questions/22/bayesian-and-fr... also obligatory XKCD: https://xkcd.com/1132/
(Though frequentists are not so naive usually)
The confusion comes from the way you read papers. A frequentist looks at a paper as evidence but not truth. A Bayesian, on the other hand, gets to the same place with slightly different math.
The truth is that nobody thinks exclusively in frequentist or Bayesian terms. But that's not comics-grade material, and mixing them would hide instead of surface their differences.
Now, the standard counterexample is a composite statistic: roll 100 six-sided dice, get a total of 600, and conclude they are not fair. But, importantly, there is a standard deviation assumed in the experiment, and there was more than one dice roll. However, if you combine that with something else, then your sample size drops back to one.
You might need a standard deviation if you want to do some naive Z-test based on the CLT approximation (since the normal distribution requires a standard deviation), but that's not what XKCD was describing. XKCD was describing an exact test using the true distribution.
I can say this is a die and therefore it should have distribution X in theory. But that does not mean its actual distribution is X without testing. Further, even after testing, nothing says the distribution will be unchanged.
Note: The above may seem pedantic, but it has significant real-world implications.
In this case, the modeled 1/36 odds of rolling two sixes would actually have to be 1/20 or higher to invalidate this test. Do you find it plausible that the bias in two dice is that high?
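A quick sketch of the arithmetic behind the 1/36 versus 1/20 comparison, assuming the setup from the XKCD comic (the detector lies only when two dice both come up six) and a 0.05 significance threshold. It also assumes both dice share the same bias when working out how loaded they would need to be.

```python
from fractions import Fraction

alpha = Fraction(1, 20)             # significance threshold, 0.05
p_double_six = Fraction(1, 6) ** 2  # fair dice: 1/36

# Under the null hypothesis the detector only says "yes" when it lies,
# so the exact p-value of a "yes" is the chance of a double six.
p_value = p_double_six
reject = p_value < alpha
print(p_value, reject)

# Per-die chance of a six at which a double six has probability alpha
# (assumes both dice are biased equally).
required_per_die = float(alpha) ** 0.5
print(round(required_per_die, 3), "vs fair", round(1 / 6, 3))
```

Each die would need to show a six roughly 22% of the time, instead of about 17%, before the test stopped rejecting at the 0.05 level.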
In a wider context, that single data point is evidence that the detector was tripped or was not tripped. But, unlike a Bayesian, the frequentist does not say they then know the actual probability involved, and they don't update their priors. Because, to do it correctly, you need to pick a P value and a model before doing the test.
Significant: https://xkcd.com/882/ makes a similar mistake by assuming a frequentist would accept that study design before running the tests. Multiple tests require more evidence, though when multiple groups are involved and not all publish you do get this problem.
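To put a number on "multiple tests require more evidence", here is a minimal sketch. It assumes 20 independent tests at the 0.05 level (the number of jelly bean colors in that comic) and uses the Bonferroni correction as one common way of demanding more evidence per test; the comment above does not name a specific correction.

```python
alpha = 0.05
n_tests = 20  # assumed: one test per jelly bean color, all independent

# Chance of at least one false positive when every null hypothesis is true.
fwer = 1 - (1 - alpha) ** n_tests
print(round(fwer, 3))  # 0.642

# Bonferroni: tighten the per-test threshold so the family stays near alpha.
bonferroni_alpha = alpha / n_tests
fwer_corrected = 1 - (1 - bonferroni_alpha) ** n_tests
print(bonferroni_alpha, fwer_corrected < alpha)
```

With no correction, a "significant" color turns up about 64% of the time even when nothing is going on.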
The idea that A causes cancer is ridiculous. Collect data: well, the A group has 10x as much cancer, but that's ridiculous, so I conclude there is no relationship between A and cancer.
This should be about right.
Yet I find this particular criticism of Bayesians not fully convincing. The Bayesian approach is to combine the existing knowledge (where the phone was often lost in the past) with new knowledge (where the beeping is coming from) to come up with probabilities (where to look for the phone). This seems to me to be the correct approach in general; it's just that in the case of the phone the new knowledge almost entirely outweighs the existing knowledge.
Frequentists and Bayesians disagree on the processes for building and evaluating models. Their techniques are often complementary and are, in current and historical practice, almost always used together by professionals. I would call them two sides of the same coin, although some take philosophical perspectives which are more dogmatic.
The theoretical foundation which divides them (confusingly called Bayes' Law) states that there is a relationship between "the probability of seeing some event when a model is true" and "the probability of a model being true given some event happened". In short, Frequentists tend to build their processes off of the first notion and Bayesians off the second.
In practice, Frequentist methods build their models via whatever tools they like. These often include basic optimization tools for picking the best set of "parameters" of a model. They then evaluate the performance of these models by asking "how unlikely was reality given this model was true?" and rejecting models which fail to predict what happened.
Bayesian methods are more fixed but also more dramatic in their ways of constructing models. They tend to create vast models with many moving parts using what's known as a "generative story". This is acceptable since they use the data they observe to compute "probability of truth" for all possible permutations of their model. This is considered a final result since someone might want to ask "how much more likely is model A to be true than model B?" but Bayesians will also at this point use optimization techniques to find "the most probable model".
In many cases these two approaches arrive at the same place. When they do not, they raise interesting questions about what we really mean when we say that we "trust a model", and this leads to endless discussion. It's also often the case that "avowed Frequentists" have historically used Bayesian methods to discover their basic models and then evaluated those models in a Frequentist fashion for publication (Fisher was known to do this). This arose because at a certain point in statistical history Bayesian methods were not socially acceptable. Finally, it's a pretty good idea for Bayesians to evaluate their models in Frequentist forms in order to have more ways to discuss how their models perform.
Probably the last and most practical difference between the two is that Frequentist methods are often built taking into account their time and space complexity. Frequentists are more likely to evaluate the performance of various extremely simple estimation techniques and pick the best. The Bayesian process nearly always results in an extraordinarily difficult-to-evaluate integration problem that requires modern computers to get results out of. That said, Frequentists often arrive at their best results via "strokes of genius" while Bayesians can usually chug through any modeling problem and arrive with a decent (again, computational only) model.