Objective Bayesian Hypothesis Testing (objectivebayesian.com)
79 points by rnburn 77 days ago | 18 comments



A great question I came across in Hypothesis-Driven Development a long time ago: should you use frequentist statistics or Bayesian statistics? It's relevant whenever you do A/B or multivariate testing.

As the topic was very difficult for someone like me without an advanced stats or math education, I can highly recommend the following additional sources:

- https://www.redjournal.org/article/S0360-3016(21)03256-9/ful...

- https://amplitude.com/blog/frequentist-vs-bayesian-statistic...

- https://indico.cern.ch/event/568904/contributions/2651065/at...


The Bayesian approach to A/B testing gives an interesting example of how frequentist and Bayesian approaches can differ.

A frequentist approach tries to limit the probability that a test setup will accept a 'false' result, one that could simply arise by chance.

A Bayesian approach actually calculates the probability that a test result could occur 'by chance'. You can then stop the test at any point and still be sure you only accept <x% of results that could occur by chance: because the guarantee holds in expectation, you never breach the x% limit no matter how often you 'stop' the test.

The interesting thing is that while these would seem to be very similar, there actually isn't anything stopping the Bayesian approach from eventually accepting any test, giving it zero statistical power in the frequentist sense. The only thing the Bayesian approach ensures is that for any 'false' test you accept after time T, there are many more that will keep running.
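
A minimal sketch of this (my own, not from the article, assuming a standard Beta-Bernoulli A/B model with arbitrary threshold and run counts): simulate a test where the null is true by construction, peek at the posterior after every observation, and 'accept' the moment P(p_B > p_A | data) crosses 95%.

    import numpy as np

    rng = np.random.default_rng(0)

    def ever_accepts(n_steps=1000, p=0.5, threshold=0.95, draws=500):
        # Beta(1,1) priors; both arms share the same true rate p, so
        # "no difference" is true by construction.
        sa = fa = sb = fb = 0
        for _ in range(n_steps):
            xa, xb = rng.random() < p, rng.random() < p
            sa, fa = sa + xa, fa + (1 - xa)
            sb, fb = sb + xb, fb + (1 - xb)
            # Posterior P(p_B > p_A | data), estimated by Monte Carlo.
            pa = rng.beta(1 + sa, 1 + fa, draws)
            pb = rng.beta(1 + sb, 1 + fb, draws)
            if (pb > pa).mean() >= threshold:
                return True  # stopped early and 'accepted' that B beats A
        return False

    runs = 100
    hits = sum(ever_accepts() for _ in range(runs))
    print(f"{hits}/{runs} null runs 'accepted' a difference within 1000 steps")

Run it with a longer horizon and the accepted fraction keeps creeping up, which is exactly the frequentist complaint about optional stopping.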


Why would you care about that though? Calculate the odds between your hypotheses, not the probability you'd ever see one.


The Bayesian stance is that you should not care. The frequentist stance is that a test that has a p-value of 1 is the worst possible.

My stance is that you should know why to care about either. Oh, and the thing you're calculating an expected value of should somehow contribute linearly to your profits/costs; averages do strange things to nonlinear functions.
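
A quick illustration of that last point (my own numbers, not from the thread): for a convex function, the average of the function is not the function of the average.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=100_000)

    # Jensen's inequality: for convex f, E[f(X)] >= f(E[X]).
    # Here E[X^2] = 2 while (E[X])^2 = 1 for an Exponential(1).
    f = lambda v: v ** 2
    print(f(x).mean(), f(x.mean()))  # ~2.0 vs ~1.0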


Eh, even an expected value that's linear with respect to profits can end up with strange results like the St. Petersburg paradox. In general, naively maximizing it breaks down at the point where you're no longer indifferent to the possible risks.
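
For reference, the divergence in the standard game (payout $2^n with probability 2^{-n} of the game ending at step n):

    E[X] = \sum_{n=1}^{\infty} 2^{-n} \cdot 2^n = \sum_{n=1}^{\infty} 1 = \infty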


One possible resolution to that paradox is to recognise that money is not linearly proportional to 'value'.


If happiness is, e.g., log(money), then you can just adjust the game to have the payout be $2^2^n after the nth step. This cancels out the logarithm and recovers the paradox. The only way to get out of it with diminishing returns is to have happiness reach a finite asymptote.
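
Spelling out why the 2^2^n payout cancels the log (my arithmetic, spelling out the commenter's point):

    E[\log_2(\text{payout})] = \sum_{n=1}^{\infty} 2^{-n} \cdot \log_2(2^{2^n}) = \sum_{n=1}^{\infty} 2^{-n} \cdot 2^n = \infty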


When there is a decent probability that you may crash the financial system, I'm not even sure a strictly monotonically increasing function is appropriate.


This article is very interesting and informative, but it's a bit ironic that an article about misinterpretations of the meaning of the p-value misinterprets the misinterpretation. In the first blue box it's clear that Bernstein is interpreting the p-value as the probability of randomly rejecting the null (which is what you do when you get something statistically significant), yet in the text following that they say he's interpreting it as the probability of the null. Bernstein's mistake is that he appears to interpret it as an unconditional probability rather than a conditional one (correct interpretation; p-value = Prob(rejecting the null when the null is true)).


> correct interpretation; p-value = Prob(rejecting the null when the null is true)

This is also not quite correct. The p-value is the probability of falsely rejecting the null due to sampling error; it is silent on all the other errors that are frequently committed.

The real probability of falsely rejecting the null starts at 15 % thanks to mathematical slip-ups alone: https://two-wrongs.com/the-lying-p-value
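
To see the conditional definition in action, here's a minimal sketch (mine, using a plain two-sample t-test via scipy): when the null is exactly true and the model assumptions hold, p < 0.05 happens about 5% of the time. The 15% figure in the link is about what happens once those assumptions fail.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Both samples come from the same distribution, so H0 is true by
    # construction; the p-value is then uniform and P(p < 0.05 | H0) ~ 0.05.
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(10_000)]
    print(np.mean(np.array(pvals) < 0.05))  # ~ 0.05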


> by kqr, published 2024-11-19

It's from the future! ;)


Yes, I had the same issue. But the wording "there is a <5% probability that an outcome was the result of chance" is in fact problematic, since many readers will go on to conclude "hence a >95% probability that the outcome was not the result of chance". So it is easier to misinterpret than the technical definition P( Observation | H_0 ).

In courses I will typically use wordings like "If there was truly no association, then the probability of getting an observation like this is <5%".


When the author says "objective" they are referring to a prior that gives equal weight to values within the null hypothesis and to those without (along with a few other things: symmetric and non-increasing away from the mean). I appreciate this approach, and think there's much to commend it, but think that that's a key thing to be aware of (because any use of "objective" when referring to priors is, shall we say, dubious).
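
To make that concrete, as I read it the "objective" prior is something like (my notation, not the article's):

    P(H_0) = P(H_1) = \tfrac{1}{2}, \quad \pi(\theta \mid H_1) \text{ symmetric about the null value and non-increasing away from it}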


Yes, it would be nice to know how things change for different weightings of the null and alternative priors.


Off topic but topical: Mike Lynch's yacht was named "Bayesian"


So I guess we'll never know the p-value of that event...


Since “that event” happened, its probability is 1.


but what if H0 = Hewlett Packard did not plan to eliminate Mike Lynch and Stephen Chamberlain...



