
Conjugate Priors Basics - keyboardman
https://leimao.github.io/blog/Conjugate-Priors/
======
CrazyStat
> By the way, some people, including Kevin Murphy, are trying to determine the
> distribution of p(X|μ) is a normal distribution, but I found they were
> wrong.

Very humble.

> Then they claim that p(X|μ) follows a normal distribution [omitted]. This is
> incorrect because [omitted] contains random variable s and s is related to
> x[-bar]. Therefore this term could not be treated as constant.

s and X-bar are independent, a basic result in mathematical statistics (used
among other things for the derivation of the t-distribution for the t-test).
Because they're independent you can ignore the term involving s, and p(X|μ) is
indeed a Normal distribution.
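This independence claim is easy to check empirically. Below is a minimal sketch (illustrative values for mu, sigma, n, and the number of repetitions are my own): across many repeated normal samples, the correlation between the sample mean and the sample standard deviation should be approximately zero.

```python
# Empirical check (an illustration, not a proof): for normal data,
# x-bar and s are independent, so their correlation across many
# repeated samples should be ~0. All parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))  # reps independent datasets
xbar = samples.mean(axis=1)                      # sample mean of each dataset
s = samples.std(axis=1, ddof=1)                  # sample std dev of each dataset

corr = np.corrcoef(xbar, s)[0, 1]
print(f"corr(x-bar, s) = {corr:.4f}")  # ~ 0 for normal data
```

Note that zero correlation alone does not prove independence in general; here it just illustrates the known theoretical result, which the linked proof establishes rigorously.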

This should not be surprising, since you _defined_ p(X|μ) to be a Normal
distribution at the beginning of your example.

~~~
keyboardman
Thanks CrazyStat. I agree that sigma and mu are independent, but "s and x-bar
are independent" is an assumption that does not hold in practice. It should
also be noted that X and x are different: I defined p(x|mu) to be a Normal
distribution, but p(X|mu) was never defined to be normal.

However, I did make a mistake: I should have said that p(x-bar|mu), not
p(X|mu), follows a normal distribution as N->infty. This means that
p(x-bar|mu) only approximates a normal distribution and is never exactly
normal, which also matches the conclusion of the central limit theorem.

I am going to fix the "p(x-bar|mu)" typo. Thanks for the discussion.

~~~
CrazyStat
sigma and mu may or may not be independent depending on the specified priors.
In this case you're treating sigma as fixed and only putting a prior on mu, so
they are (trivially) independent--a fixed value is independent of every random
variable.
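For context, the "prior on mu with sigma fixed" setup being discussed is the standard normal-normal conjugate model, whose posterior has a closed form. A minimal sketch (the helper name and all numeric values are my own, not from the blog or the thread):

```python
# Standard normal-normal conjugate update (sigma known, prior mu ~ N(mu0, tau0^2)).
# The posterior for mu is again Normal, via the usual precision-weighted formula.
import numpy as np

def normal_posterior(x, sigma, mu0, tau0):
    """Return (posterior mean, posterior std) for mu given data x."""
    n = len(x)
    prec = 1.0 / tau0**2 + n / sigma**2                        # posterior precision
    mu_n = (mu0 / tau0**2 + n * np.mean(x) / sigma**2) / prec  # posterior mean
    return mu_n, np.sqrt(1.0 / prec)

x = np.array([4.8, 5.1, 5.3, 4.9, 5.0])  # made-up data
mu_n, tau_n = normal_posterior(x, sigma=1.0, mu0=0.0, tau0=10.0)
print(mu_n, tau_n)
```

The fact that the posterior stays in the Normal family is exactly what makes the Normal prior conjugate here.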

x-bar and s are independent for data coming from a Normal distribution, which
your model assumes to be the case. Like I said, this is a basic result in
mathematical statistics. The first Google result [1] for "independence of s
and x bar" gives a proof.

> It should also be noted that X and x are different. I defined p(x|mu) to be
> a Normal distribution but p(X|mu) was never defined to be normal.

From your writeup (apologies for formatting):

"We have N data points X={x_1,x_2,⋯,x_N} independently sampled from normal
distribution N(μ,σ^2)."

This means X|mu has a multivariate Normal distribution with mean vector (mu,
mu, mu, ..., mu) and covariance matrix sigma^2 * I_N, where I_N is the NxN
identity matrix. p(X|mu) is (multivariate) Normal by definition.

Since X|mu has a multivariate Normal distribution, x-bar|mu has a univariate
Normal distribution. This follows from a basic fact about Normal
distributions: if a vector X has a multivariate Normal distribution, then MX,
where M is any matrix of an appropriate size to be multiplied by X, also has a
(multivariate or univariate) Normal distribution (in other words, linear
combinations of Normally distributed random variables also have a Normal
distribution). In the case of x-bar, M is the 1xN matrix [1/N, 1/N, ..., 1/N].

The Central Limit Theorem is a far more general result. In this case since
we're starting with a Normal distribution we don't need it, as we can use
properties of the Normal distribution to arrive at the distribution of
x-bar|mu directly, without needing the CLT.

[1]
[http://jekyll.math.byuh.edu/courses/m321/handouts/mean_var_i...](http://jekyll.math.byuh.edu/courses/m321/handouts/mean_var_indep.pdf)

~~~
keyboardman
One small thing is that in the proof they use n-1 instead of n as the
denominator for the sample variance, which differs from my/Kevin's setting.
Although this might be trivial, I will look into it further to see if it makes
any difference. Overall, I would say this (full) derivation is non-trivial,
although it is not a new discovery.

~~~
keyboardman
I looked closely and confirmed that the denominator would not matter. Thanks
for providing the proof.
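The denominator conclusion can also be checked numerically. A small sketch (parameter values are my own): the two versions of s differ only by the fixed factor sqrt((n-1)/n), and rescaling a random variable by a constant changes neither its independence from x-bar nor their correlation.

```python
# Why the denominator (n vs n-1) cannot matter: the two versions of s
# differ by a deterministic constant factor, so independence from x-bar
# is unaffected. All numeric values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 1.0, 20, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s_n = samples.std(axis=1, ddof=0)    # denominator n (the blog's convention)
s_nm1 = samples.std(axis=1, ddof=1)  # denominator n-1 (the proof's convention)

# deterministic relation between the two versions of s
assert np.allclose(s_n, np.sqrt((n - 1) / n) * s_nm1)

corr_n = np.corrcoef(xbar, s_n)[0, 1]
corr_nm1 = np.corrcoef(xbar, s_nm1)[0, 1]
print(corr_n, corr_nm1)  # both ~ 0, and equal up to floating point
```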

