
Bootstrapped – A Python library to generate confidence intervals - jimarcey
https://github.com/facebookincubator/bootstrapped
======
petters
There are better algorithms for bootstrap intervals that you perhaps should
look into. Better in the sense of quality, not speed.

Google, e.g., "interval BCa".
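For reference, here is a minimal sketch of the textbook BCa (bias-corrected and accelerated) interval, assuming numpy and the stdlib `statistics.NormalDist` are available. `bca_interval` is a hypothetical helper written for illustration, not part of bootstrapped or any other library mentioned here. The bias correction `z0` comes from the share of bootstrap replicates below the point estimate, and the acceleration `a` comes from jackknife (leave-one-out) estimates.

```python
import numpy as np
from statistics import NormalDist


def bca_interval(data, stat=np.mean, alpha=0.05, n_boot=10_000, seed=0):
    """Hypothetical helper: BCa bootstrap interval for stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = stat(data)

    # Bootstrap replicates: each row of idx is one resample with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.array([stat(data[i]) for i in idx])

    norm = NormalDist()
    # Bias correction z0. Clip the proportion so inv_cdf stays defined.
    prop_below = np.clip(np.mean(boot < theta_hat), 1 / n_boot, 1 - 1 / n_boot)
    z0 = norm.inv_cdf(prop_below)

    # Acceleration a from jackknife (leave-one-out) estimates of the statistic.
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    diffs = jack.mean() - jack
    a = np.sum(diffs**3) / (6.0 * np.sum(diffs**2) ** 1.5)

    def adjusted(q):
        # Map a nominal quantile q to its bias- and skew-adjusted quantile.
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    return (np.quantile(boot, adjusted(alpha / 2)),
            np.quantile(boot, adjusted(1 - alpha / 2)))
```

When `z0` and `a` are both zero, the adjusted quantiles reduce to `alpha / 2` and `1 - alpha / 2`, so this collapses to the plain percentile interval. The corrections only move the endpoints when the bootstrap distribution is biased or skewed.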

~~~
spencebeecher
Thanks for the feedback, Petters! I agree in principle, and I am familiar with that
method. This library is aimed at situations where you have large initial
sample counts (so the correction should matter less; we do throw
warnings when the initial sample counts are low). We also provide tools to
check power (I'll commit an example of this later today).

Also - I gladly accept diffs if you are motivated. It is not clear to me that
BCa and other variants provide substantial improvement for most practical
situations. I would invite criticism here.

Tldr - thanks for the feedback

------
kenthorvath
From the project README:

How bootstrapped works tldr - Percentile based confidence intervals based on
bootstrap re-sampling with replacement.
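That one-line summary can be sketched roughly as follows, assuming numpy. `percentile_interval` is a hypothetical helper for illustration, not bootstrapped's actual API. Each resample draws `n` points with replacement, and the interval endpoints are read directly off the quantiles of the resampled statistic.

```python
import numpy as np


def percentile_interval(data, stat=np.mean, alpha=0.05, n_boot=10_000, seed=0):
    """Hypothetical helper: percentile bootstrap CI for stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Each row of idx indexes one resample of the data, drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.array([stat(data[i]) for i in idx])
    # Endpoints are the alpha/2 and 1 - alpha/2 quantiles of the replicates.
    return np.quantile(boot, alpha / 2), np.quantile(boot, 1 - alpha / 2)


rng = np.random.default_rng(1)
sample = rng.normal(10, 2, size=200)
print(percentile_interval(sample))
```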

---

MIT OCW 18.05 has this to say about the technique:

[https://ocw.mit.edu/courses/mathematics/18-05-introduction-t...](https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf)

The bootstrap percentile method is appealing due to its simplicity. However, it
depends on the bootstrap distribution of mean(x') based on a particular sample
being a good approximation to the true distribution of mean(x). Rice says of
the percentile method, “Although this direct equation of quantiles of the
bootstrap sampling distribution with confidence limits may seem initially
appealing, its rationale is somewhat obscure.”

In short, don’t use it.

Use the empirical bootstrap instead (we have explained both in the hopes that
you won’t confuse the empirical bootstrap for the percentile bootstrap).
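The empirical bootstrap bootstraps the error rather than the estimate. For each resample it computes delta* = mean(x') - mean(x), then subtracts the quantiles of delta* from the original estimate. The upper and lower quantiles swap places in that step. This is the same construction often called the basic or pivotal bootstrap interval. Below is a minimal numpy sketch. `empirical_interval` is a hypothetical name, not from any library in this thread.

```python
import numpy as np


def empirical_interval(data, stat=np.mean, alpha=0.05, n_boot=10_000, seed=0):
    """Hypothetical helper: empirical (basic/pivotal) bootstrap CI."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = stat(data)
    idx = rng.integers(0, n, size=(n_boot, n))
    # delta* = theta* - theta_hat approximates the distribution of the error.
    deltas = np.array([stat(data[i]) for i in idx]) - theta_hat
    d_lo, d_hi = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    # Subtract the error quantiles; the upper delta gives the lower endpoint.
    return theta_hat - d_hi, theta_hat - d_lo
```

Algebraically, the endpoints are 2*theta_hat minus the percentile-method endpoints, in reverse order. The two methods agree when the bootstrap distribution is symmetric around theta_hat and diverge when it is skewed.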

---

Updated to reflect suggestions in comments below.

~~~
johnmyleswhite
I'm not a big fan of the percentile bootstrap, but that reference is a little
too cavalier with material that deserves to be treated more rigorously.
Chapter 8 of Wasserman's "All of Statistics" is much more careful about
outlining the conditions under which the percentile bootstrap will work.
Moreover, he works through a specific example that demonstrates that the
percentile bootstrap does not generate results that are profoundly different
from other methods.

~~~
spencebeecher
John, you are a true wizard. I admire you & will work to incorporate your
feedback (gathered offline) into the library =)

Thanks for the feedback!

------
malayandi
Is there a reason you chose the percentile instead of something like the
pivotal (and for not adding the options for other intervals)? Is it solely for
simplicity?

~~~
spencebeecher
Excellent question - we intend to add other options as we go, pivotal being one.
I'd also like to add permutation tests. If you have ideas, we welcome diffs
=)
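As a rough illustration of the permutation-test idea, here is a minimal two-sample sketch for a difference in means, assuming numpy. `permutation_test` is a hypothetical helper, not something the library ships. Under the null hypothesis, group labels are exchangeable. The test therefore shuffles the pooled data, re-splits it, and counts how often the shuffled difference is at least as extreme as the observed one.

```python
import numpy as np


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Hypothetical helper: two-sided permutation test on mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the labels by permuting the pooled data, then re-split.
        perm = rng.permutation(pooled)
        diff = perm[:len(a)].mean() - perm[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The add-one correction counts the observed split itself and keeps p > 0.
    return (count + 1) / (n_perm + 1)
```

Unlike the bootstrap intervals, this tests a null hypothesis directly rather than estimating a range. That is why the two techniques complement each other for A/B-style comparisons.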

~~~
malayandi
Would love to contribute. I'll try to put some work in this weekend when I
have the time.

~~~
spencebeecher
<3

------
bede
While on the topic of confidence intervals, has anyone encountered a package
capable of generating multinomial confidence intervals similar to CRAN's
MultinomialCI? I've yet to find a Python solution.

------
Sauliusl
Neat!

OP, how does this compare to scikits.bootstrap [1] feature/performance-wise?

[1]
[https://scikits.appspot.com/bootstrap](https://scikits.appspot.com/bootstrap)

~~~
spencebeecher
That uses the BCa method, which is better in some situations.

This library gives you A/B test functionality and should be faster on large
input datasets.
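The thread doesn't show the library's A/B interface, so the following is only a hypothetical sketch of the general idea, assuming numpy. It computes a percentile bootstrap interval for mean(treatment) - mean(control) by resampling each arm independently. `ab_diff_interval` is an illustrative name, not bootstrapped's API.

```python
import numpy as np


def ab_diff_interval(control, treatment, alpha=0.05, n_boot=5_000, seed=0):
    """Hypothetical helper: percentile CI for mean(treatment) - mean(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    # Resample each arm independently with replacement, one resample per row.
    c = control[rng.integers(0, len(control), size=(n_boot, len(control)))]
    t = treatment[rng.integers(0, len(treatment), size=(n_boot, len(treatment)))]
    # Row-wise means give n_boot bootstrap replicates of the difference.
    diffs = t.mean(axis=1) - c.mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Vectorizing over rows avoids a Python-level loop, but it materializes an `n_boot` x `n` array per arm. On very large inputs you would trade some of that speed back by processing the resamples in chunks.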

------
pbnjay
It's a nice wrapper on a powerful technique. Could be very useful to some
folks - but requiring numpy and pandas is kind of excessive.

~~~
laughfactory
Yeah, I'm not sure this is a fair comment: how would _you_ avoid the
necessity of pandas and numpy? Besides which, in most projects where you're
interested in confidence intervals you'll probably already have both imported
for other functionality anyway.

~~~
pbnjay
TBH, I haven't used python for science in a few years, so maybe numpy is the
norm now and I'm showing my age. But when I was doing more python, I wrote
bootstrapping, monte carlo and CI code without anything but the standard lib.
I probably used pypy to get it fast enough, but if everyone has numpy now then
that's definitely the way to go and I retract my comment!
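For comparison, a percentile bootstrap for the mean can be written with only the standard library. The sketch below is hypothetical (`percentile_ci_stdlib` is an illustrative name). It uses `random.choices` for resampling with replacement and `statistics.fmean` for the mean, so it needs Python 3.8 or later.

```python
import random
import statistics


def percentile_ci_stdlib(data, alpha=0.05, n_boot=2_000, seed=0):
    """Hypothetical helper: percentile bootstrap CI for the mean, stdlib only."""
    rng = random.Random(seed)
    n = len(data)
    # random.choices samples with replacement; k=n gives one full resample.
    boot = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    # Symmetric order statistics serve as the quantiles (no interpolation).
    k = int(alpha / 2 * n_boot)
    return boot[k], boot[n_boot - 1 - k]
```

Because everything here runs as plain Python loops, it is much slower than a vectorized numpy version, though it is the kind of code an interpreter such as PyPy can help with.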

I'm not trying to sound arrogant or anything; if numpy is the standard now,
then there's definitely no point in reinventing the wheel. (But pandas is
still overkill...)

~~~
lumpypua
pandas is the standard now for most python data munging I've come across.
numpy is "low level".

~~~
pbnjay
bootstrapping does not require any munging.

------
startupdiscuss
I am 87% confident that this has a 74% chance of hitting it off with the HN
community.

