
The hacker's guide to uncertainty estimates - mzl
https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html
======
BeniBoy
Nitpick, but shouldn't you set the ddof of np.std at 1 when you estimate the
variance in your plot_confidence_interval? If I recall correctly, the default
is 0, which makes it a biased estimator. Weird default by the way!

Great article though, definitely learned some useful things!

Edit: Seems like it is,
[https://docs.scipy.org/doc/numpy/reference/generated/numpy.s...](https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html)

~~~
mlthoughts2018
Unless your sample size is small, it makes no practical difference and isn’t
worth splitting hairs over.

Also, the topic of correct degree of freedom adjustment can get quite
complicated, for example see Satterthwaite degree of freedom calculation.

Sometimes you have to calibrate a confidence interval or standard error
calculation by comparing with the result from some paper or library function,
and you need to match the degree of freedom calculation even if it is not the
ideal choice.

I had to do this once when migrating a legacy program from Stata into Python
and matching the exact rounding conditions for percentiles (Stata’s xtile) and
degrees of freedom for standard error in a few different significance test
functions became a huge pain.

While I could get the source for NumPy and SciPy, I could not for all the
relevant Stata functions, and spent a lot of time creating experiments that
could decide which of these subtle choices were being used in Stata, and then
match them in Python.

~~~
BeniBoy
Definitely true!

But as in the precise situation you described, this can leads to discrepancies
in result. So if you're coming from Matlab of R (or even Pandas), this can
help to know that in Numpy the default is not the usual statistical one!

~~~
mlthoughts2018
I would not say there is any such thing as "the usual statistical" number for
degrees of freedom. There are just different formulas for different estimators
that have different properties (like bias) that sometimes matter in practical
results and sometimes don't matter.

For example, in Bayesian statistics, summary statistics coming from a
posterior distribution are _always_ biased, exactly according to the degree of
bias encoded into the prior distribution, You _want_ results that are biased,
so long as you believe your prior model of the bias accurately reflects the
information you have available at the time.

If you chose to use a summary statistics like MAP in that setting, you would
not care one bit whether it was biased or consistent or whatever. You'd just
care that the posterior distribution is useful for a practical purpose.

In this sense, I think frequentist stats education falls short a lot of time
time. There is nothing special about NHST as a framework for measuring
significance of an effect. It's just one way to do things that sometimes is
useful and other times isn't.

Similarly there is no special reason to ever care about BLUE estimators,
unbiased estimators, efficient estimators, consistent estimators, etc. etc.,
or differences between different hypothesis tests like Wald test or Welch's
t-test or Mann Whitney non-parametric test, yadda yadda yadda.

They are just different things with different properties and different
formulas. None of them are "the usual statistical one." And any time you
reference a software package using any of them, you should not expect the
software package to have the same assumption about what default choices to
make that you might believe from a textbook or something. There's no reason to
expect them to be connected really at all.

------
clircle
> Since all outcomes are 0 or 1, and drawn with the same (unknown)
> probability, we know that the number of ones and zeros follows a binomial
> distribution. This means that the confidence interval of a “k out of n”
> scenario is a Beta distribution.

Pretty confused by this sentence. The mean of a bunch of 0s and 1s cannot
follow a beta distribution. The support set is not continuous. I _think_ the
author is making Clopper-Pearson intervals, but dropping some terms.

~~~
ezke
The beta dist is conjugate to the binomial, it is a natural prior for the
proportion parameter. The mean of a bunch of 0s and 1s will be between [0,1],
the support of the beta dist.

~~~
clircle
The mean of n 0s and 1s has support {0, 1/n, 2/n, ..., n/n}, not [0,1].

>The beta dist is conjugate to the binomial, it is a natural prior for the
proportion parameter.

I don't think the author is doing Bayesian inference.

