
Negative binomial / binomial experiments: frequentist or Bayesian? - RA_Fisher
http://statwonk.com/binomial-negative-binomial-experiments.nb.html
======
contravariant
Surely it isn't that surprising that the outcomes of two different experiments
can end up providing the exact same information?

In fact, this follows from the Beta distribution being the conjugate prior for
both the binomial and the negative binomial distribution. Many more
distributions have the Beta distribution as a conjugate prior for one of their
parameters (in fact it's pretty much any distribution with a factor of p or
(1 - p) somewhere).
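A quick way to see this with the post's numbers (7 successes, 40 failures): the two likelihoods differ only by a constant factor in p, so any Beta prior yields the same Beta posterior. A minimal R sketch:

```r
# Sketch, using the post's data: 7 successes, 40 failures.
# Binomial:          C(47, 7)  * p^7 * (1-p)^40
# Negative binomial: C(46, 40) * p^7 * (1-p)^40
p <- seq(0.01, 0.99, by = 0.01)
lik_binom  <- dbinom(7, 47, p)
lik_nbinom <- dnbinom(40, 7, p)
# The ratio is the same constant (7/47) at every p, so the
# p-dependent part -- all a Beta prior ever interacts with -- is identical.
ratio <- lik_nbinom / lik_binom
```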

~~~
tel
It is often considered surprising and controversial. In particular, if the
experiment included a "controversial" stopping criterion, you could imagine
the same data arising from each design yet not trusting the two results
identically.

Arguments exist for each side.

~~~
contravariant
So I guess the problem here is that the probability that the likelihood
exceeds a certain bound can't be calculated solely from the likelihood
(+ prior) itself.

Yet interestingly, the likelihood + prior _is_ enough to make statements about
the distribution of the parameters you're looking at. So you _can_ say
something about the certainty that your parameter exceeds a certain bound.

I suppose it really depends on your application which you'd want to use.
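For instance, assuming a flat Beta(1, 1) prior and the post's data (7 successes, 40 failures), the posterior is Beta(8, 41) under either design, and statements about the parameter come straight from it. A sketch in R (the 0.25 bound is just an illustrative choice):

```r
# Flat Beta(1, 1) prior + 7 successes, 40 failures
# => posterior is Beta(1 + 7, 1 + 40) = Beta(8, 41), under either design.
post_alpha <- 1 + 7
post_beta  <- 1 + 40
# Posterior probability that the success rate exceeds 0.25:
prob_above <- 1 - pbeta(0.25, post_alpha, post_beta)
# A 95% credible interval for the rate:
cred_int <- qbeta(c(0.025, 0.975), post_alpha, post_beta)
```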

~~~
tel
That's a Bayesian perspective, but the heart of the Likelihood Principle is
that non-Bayesians/non-Likelihoodists can find reason to believe that the
Likelihood doesn't really contain all of the relevant information for making
inferences.

~~~
nonbel
> "all of the relevant information for making inferences"

I think this terminology is way too strong. Obviously the context matters. For
example, the way the data is collected (whether or not the data collector is a
known liar/p-hacker, the sensor is known to malfunction at certain
temperatures, etc) should affect inference.

~~~
tel
Right, and the Likelihoodist/Bayesian would put those details into the model.
Real-world concerns get handled by each side in acceptable ways; it's the
rather specific details of how sampling distributions and posterior
distributions differ that make this tricky.

------
nonbel
What is the purpose of the call to prod here?

    
    
      negative_binomial_likelihood <- function(p) {
        prod(dnbinom(40, 7, p))
      }
      binomial_likelihood <- function(p) {
        prod(dbinom(7, 47, p))
      }

~~~
RA_Fisher
Nice eye! In this case it's a no-op: because of the `rowwise()` call, the data
frame is evaluated one row at a time, so `prod` is taken over a vector of
length one.

The `prod` call is an artifact of me previously using the function in a
vectorized manner to calculate the model's likelihood. [1] It isn't the most
common approach; more often one works with the log-likelihood [2], since the
product of densities then becomes a sum, avoiding underflow errors. The
likelihood is _very cool_. I liken it to a grand generalization of the needle
of a record player with respect to information (entropy).

[1]
[https://en.wikipedia.org/wiki/Maximum_likelihood_estimation#...](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation#Principles)

[2] [https://en.wikipedia.org/wiki/Likelihood_function#Log-likeli...](https://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood)
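To illustrate the underflow point with some hypothetical data (a simulated sample, not the post's): the product of many densities underflows to zero in double precision, while the sum of log densities stays finite:

```r
# Hypothetical sample of 1000 draws from Binomial(10, 0.3).
set.seed(1)
x <- rbinom(1000, size = 10, prob = 0.3)

# Product of 1000 densities, each < 1, underflows to exactly 0.
lik     <- prod(dbinom(x, 10, 0.3))
# Summing log densities instead keeps the value finite.
log_lik <- sum(dbinom(x, 10, 0.3, log = TRUE))
```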

~~~
nonbel
In R you can just plug in a vector of probabilities directly to get the
likelihood. Ie:

    
    
      p  = seq(.01, .99, by = .01)
      y1 = dbinom(7,  47, p)
      y2 = dnbinom(40, 7, p)
    

It doesn't matter much for this use case, but it will also be much faster. It
simplifies the code quite a bit, too.
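Building on that vectorized form, normalizing each curve over the grid removes the constant factor separating the two likelihoods, so the curves coincide (a quick sketch):

```r
p  <- seq(.01, .99, by = .01)
y1 <- dbinom(7, 47, p)
y2 <- dnbinom(40, 7, p)
# The two likelihoods differ only by a constant factor in p,
# so after scaling each to sum to one they are the same curve.
same <- all.equal(y1 / sum(y1), y2 / sum(y2))  # TRUE
```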

~~~
RA_Fisher
Thank you for pushing me to see that, indeed I should take advantage of the
vectorized aspect of this code!

------
nerdponx
> Frequentist: yes.

I don't follow.

~~~
tel
Frequentist analyses depend on assumptions about the sampling process, which
often include specific stopping conditions. This means that the _reason_ an
experiment ends is an important part of the data. That directly defies the
likelihood principle, since it means our inference depends on information
beyond what was observed.
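A concrete sketch of this, using the post's data and an illustrative null of p = 0.5: the one-sided p-values under the two designs differ even though the observed data are identical:

```r
# Same data (7 successes, 40 failures), null hypothesis p = 0.5.
# Binomial design (n = 47 fixed): P(7 or fewer successes).
p_binom  <- pbinom(7, size = 47, prob = 0.5)
# Negative binomial design (stop at the 7th success):
# P(40 or more failures before the 7th success).
p_nbinom <- 1 - pnbinom(39, size = 7, prob = 0.5)
# The p-values differ, so the stopping rule matters to a frequentist
# even though the two likelihoods are proportional.
```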

~~~
RA_Fisher
Yes! I feel like I'm finally getting to the crux of the difference between
the Bayesian / information-theoretic methods and Frequentism.

------
gerdesj
April 1st is three months away.

