
Think Stats, using python to learn stats  - ms4720
http://www.greenteapress.com/thinkstats/
======
emehrkay
Thanks, I'll put this on my ipad and go through it because I am interested in
stats.

Did a professor use a non-commercial book for their class? If so, that is
amazing.

~~~
candre717
Yes. Olin, the college where the author teaches, encourages that kind of
stuff. I hope they are the start of a trend.

------
pama
I commend the effort to teach statistics to python programmers, but I'd
recommend interfacing with R for creating the figures. R will create beautiful
figures by default; it can also handle sophisticated stats, should the need
arise. The current figures are filled with unneeded inwards-facing ticks that
often overlap with data, and have tick and axis labels of inconsistent font
sizes. The negative numbers in the y-axis in Fig 2.3 and x-axis in Fig 4.6 and
7.1 don't display properly, and the presentation of the data in Fig 3.1 is too
noisy to deliver a useful message.
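For readers who want to stay in Python rather than shell out to R, most of these cosmetic complaints can be addressed in matplotlib itself. A minimal sketch (the book's actual plotting code may differ; these are just the standard matplotlib rcParams):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Outward-facing ticks, one consistent font size, and an ASCII hyphen
# for negative tick labels (the usual fix when the font lacks a proper
# minus glyph and negatives 'don't display properly').
plt.rcParams.update({
    "xtick.direction": "out",
    "ytick.direction": "out",
    "font.size": 10,
    "axes.labelsize": 10,
    "xtick.labelsize": 10,
    "ytick.labelsize": 10,
    "axes.unicode_minus": False,
})

x = np.linspace(-3, 3, 100)
fig, ax = plt.subplots()
ax.plot(x, x ** 2)
ax.set_xlabel("x")
ax.set_ylabel("x squared")
fig.savefig("example.png", dpi=150)
```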

~~~
araneae
I think the point is to learn statistics using an easier language than R, ugly
figures be damned.

~~~
phren0logy
I see this as a real issue. R is useful but sorta crappy as languages go. I
have hopes for Incanter [<http://incanter.org/>] (a Clojure-based stats
environment), but it looks stalled out, if not dead in the water. Still, for
basic stats it's pretty viable.

I would love to see a stats environment based on a sane, full-featured
language that was ready to take advantage of multiple cores and GPU computing.
Clojure has all the building blocks, but it's not built yet.

~~~
araneae
I have to confess that whenever I do any statistics (which isn't much, these
days) I struggle with R for a bit and then say "screw it" and use MATLAB.

~~~
pama
Quick-R helped several of my friends try R: <http://www.statmethods.net/>

Once you're hooked, go to The R Inferno:
<http://www.burns-stat.com/pages/Tutor/R_inferno.pdf>

------
PostOnce
To me, Allen Downey is the Salman Khan of programming education.

------
dmix
How heavily is this tied to Python?

Can a ruby/JS hacker with no python experience jump into this book?

~~~
FraaJad
Come on young rubyist. embrace the dark side. You will like it. /jk

I went through the first couple of chapters, and all the Python code I've seen
so far is closer to "pseudo-code" than deeply idiomatic Python.

Considering that the Prof. also wrote "How To Think Like a Computer
Scientist: Learning with {Python/Java/C++}", you can rest assured that he
knows how not to trip up students with clever syntax.

~~~
verysimple
_Come on young rubyist. embrace the dark side. You will like it. /jk_

I adopted Python because I was told Ruby _is_ the dark side. oO

------
malloc
anyone know of a similar resource for physics?

~~~
ylem
What area of physics do you want to know about? There are some free books (for
example, I know of one on small-angle neutron scattering that you can find at
NIST), and there are many lecture notes online. For texts, though, do you
think it would be better to have a static PDF, or a wiki where an author
could add what they could and the community could improve it?

------
NY_USA_Hacker
I looked through Downey's book.

(A) Definition of Probability.

As far as I could tell, the book defined probability in only one of two ways:

(1) Relative frequency assuming finitely many trials (page 13) and

(2) Bayesian prior intuitive belief (page 52).

No, here the book is doing readers a disservice.

(B) Exponential Distribution

The 'derivation' of the exponential distribution on page 37:

"I’ll start with the exponential distribution because it is easy to work with.
In the real world, exponential distributions come up when we look at a series
of events and measure the times between events, which are called inter-arrival
times. If the events are equally likely to occur at any time, the distribution
of inter-arrival times tends to look like an exponential distribution."

is a bit too vague.

(C) Random.

The book also uses 'random' too frequently, loosely, and unnecessarily.
Instead, essentially we can just drop the word 'random' and avoid questions
about what 'random' means and if some data is 'truly random'.

For (A), essentially universally in advanced work, there is only one approach:

We have a non-empty set of 'trials'. Then an 'event' is a subset of the set of
all trials. The set of all trials is also an event. The set of all events
is closed under countable unions and relative complements. In particular, the
empty set is an event.

A 'probability' is a function P from the set of all events to the interval
[0,1]. For an event A, P(A) is its probability. The probability of the set of
all trials is 1.

P is 'countably additive': For countably infinitely many pair-wise disjoint
events A_i, for i = 1, 2, ..., the probability of the union of the A_i is
the sum of the P(A_i).

A 'random variable' X is a function from the set of all trials to the real
numbers such that for any real number x the set of all trials w such that X(w)
<= x is an event. Usually we suppress the notation of the trials and just
write the event X <= x. This is our only use of 'random', and we give no more
definition of it.

In practice, essentially any number at all that we observe can be regarded as
a random variable. In particular, the number might have been something
observed about a person. The statement on page (52) that

"Anything involving people is pretty much off the table."

would be a silly foundation for probability.

For Bayesian prior beliefs, we can regard such a belief as an estimate of a
conditional probability where we condition on what we know. E.g., we know a
lot about people, so, given a person, we can guess that with probability
over 99% the person is less than 10 feet tall. This point does not mean that
we have a 'Bayesian' foundation for probability. It's better just to drop
mention of 'Bayesian'.

The 'distribution' of a random variable X is the function F_X(x) = P(X <= x).
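In code, the natural empirical counterpart of F_X is the empirical distribution function; a minimal sketch in Python (the function name `ecdf` is mine, not the book's):

```python
import numpy as np

def ecdf(data):
    """Return F with F(x) = fraction of data <= x, the empirical
    analogue of the distribution F_X(x) = P(X <= x)."""
    data = np.sort(np.asarray(data, dtype=float))
    n = len(data)
    def F(x):
        # side='right' counts how many sorted values are <= x
        return np.searchsorted(data, x, side="right") / n
    return F

F = ecdf([1, 2, 2, 3, 5])
print(F(2), F(0), F(5))  # 0.6 0.0 1.0
```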

Full details are in any of, say,

Jacques Neveu, 'Mathematical Foundations of the Calculus of Probability',
Holden-Day.

Leo Breiman, 'Probability', ISBN 0-89871-296-3, SIAM.

M. Loeve, 'Probability Theory, I and II, 4th Edition', Springer-Verlag.

Kai Lai Chung, 'A Course in Probability Theory, Second Edition', ISBN
0-12-174650-X, Academic Press.

Yuan Shih Chow and Henry Teicher, 'Probability Theory: Independence,
Interchangeability, Martingales', ISBN 0-387-90331-3, Springer-Verlag.

Essentially what is going on is that we are defining P as a non-negative real
'measure' with 'total mass 1' as in any of:

Paul R. Halmos, 'Measure Theory', D. Van Nostrand Company, Inc.

H. L. Royden, 'Real Analysis: Second Edition', Macmillan.

Walter Rudin, 'Real and Complex Analysis', ISBN 07-054232-5, McGraw-Hill.

This approach to probability is the one used in essentially all advanced work,
e.g.,

Ioannis Karatzas and Steven E. Shreve, 'Brownian Motion and Stochastic
Calculus, Second Edition', ISBN 0-387-97655-8.

Jean-Rene Barra, 'Mathematical Basis of Statistics', ISBN 0-12-079240-0,
Academic Press.

Early in Neveu we can see some of the ways we get essentially pushed into this
foundation for probability.

With this approach, for a random variable X, we define its expectation E[X] as
just its integral with respect to measure P in the sense of measure theory.
Then essentially by a routine 'change of variable' we can get E[X] as an
integral in terms of the distribution F_X and the measure it defines on the
reals.

With this approach, if, say, for some positive integer n we measure the height
of n people, then we say that we have the values of the n random variables

X_1, X_2, ..., X_n

and do not say that we have the values of n trials of one random variable X.

Indeed, with this approach, all our experience, all of the universe, is only
one trial.

Given a set of events, we can define what it means for the set to be
'independent'. And we can proceed similarly for random variables.

Then we can give a more careful statement of the central limit theorem and
also state the law of large numbers. We can also say why we use an average to
estimate an expectation, e.g., as in

Paul R. Halmos, "The Theory of Unbiased Estimation", 'Annals of Mathematical
Statistics', Volume 17, Number 1, pages 34-43, 1946.

On page 12 there is:

"The mean of this sample is 100 pounds, but if I told you 'The average pumpkin
in my garden is 100 pounds,' that would be wrong, or at least misleading. In
this example, there is no meaningful average because there is no typical
pumpkin."

No: An 'average' does not have to be 'typical' to be 'meaningful'. The
expectation of the weight X of a pumpkin in the garden remains meaningful. And
the law of large numbers will still apply.
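A quick simulation makes the point. The garden below (mostly 3-pound pumpkins plus a few around 591 pounds) is invented for illustration, but the sample mean still converges to the expectation even though no pumpkin is 'typical':

```python
import random

random.seed(0)

# A 'garden' with no typical pumpkin: 90% weigh about 3 pounds,
# 10% weigh about 591 pounds (all numbers invented for illustration).
def pumpkin():
    if random.random() < 0.9:
        return random.gauss(3, 0.5)
    return random.gauss(591, 10)

expectation = 0.9 * 3 + 0.1 * 591  # 61.8 pounds

# The law of large numbers still applies: the sample mean converges
# to the expectation even though no single pumpkin is 'typical'.
n = 200_000
mean = sum(pumpkin() for _ in range(n)) / n
print(round(mean, 1))  # close to 61.8
```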

For the exponential distribution, there is a good 'qualitative' derivation: If
we have an arrival process with stationary and independent increments, then
the inter-arrival times are independent and all have the same exponential
distribution. Good details are early in the chapter on Poisson processes in

Erhan Cinlar, 'Introduction to Stochastic Processes', ISBN 0-13-498089-1,
Prentice-Hall.

Downey's book places a lot of emphasis on distributions, e.g., Weibull, and
descriptions, e.g., skewness. So an implication is, given some data, we should
try to find its distribution and various summary properties of it, but usually
this is unpromising. It is good to have the concept of a distribution but,
given some data, usually not good to try to find its distribution. Instead, we
get our results by manipulating the data in ways where we need to know little
or nothing about the distribution.

The strong emphasis on hypothesis testing is curious. There is more in:

E. L. Lehmann, 'Testing Statistical Hypotheses', John Wiley and Sons.

E. L. Lehmann, 'Nonparametrics: Statistical Methods Based on Ranks', ISBN
0-8162-4994-6.

Downey's book mentioned the chi-square distribution: The main point is that if
X_1, X_2, ..., X_k are random variables with Gaussian distribution with mean 0
and variance 1, then

Y = X_1^2 + X_2^2 + ... + X_k^2

has chi-squared distribution with k degrees of freedom.
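This is easy to check by simulation; the sketch below just verifies the mean k and variance 2k of the chi-squared distribution:

```python
import random

random.seed(2)

# Y = X_1^2 + ... + X_k^2 for k standard Gaussians has a chi-squared
# distribution with k degrees of freedom: mean k, variance 2k.
k, n = 5, 100_000
ys = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / (n - 1)
print(round(mean, 2), round(var, 2))  # near k = 5 and 2k = 10
```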

Downey's book mentioned the variance of a finite population; while variance is
of high interest, the variance of a finite population is not. What is usually
wanted is an estimate of variance, and for that the formula in the book should
be dividing by n - 1 instead of n; this change yields an 'unbiased' estimate.
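The bias is easy to see numerically: averaging each estimator over many small samples from a unit-variance distribution shows the divide-by-n estimate landing near (n-1)/n while the divide-by-(n-1) estimate lands near 1:

```python
import random

random.seed(3)

# Average the divide-by-n and divide-by-(n-1) variance estimates over
# many small samples drawn from a distribution with true variance 1.
n, trials = 5, 200_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    biased_sum += ss / n          # divide by n: biased low
    unbiased_sum += ss / (n - 1)  # divide by n - 1: unbiased

biased = biased_sum / trials
unbiased = unbiased_sum / trials
print(round(biased, 3), round(unbiased, 3))  # near 0.8 (= (n-1)/n) and 1.0
```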

It would be good for students to get at least an intuitive understanding of
the most powerful hypothesis testing via the Neyman-Pearson result: Using an
analogy from real estate investing, regard the false alarm rate as money to be
spent, and spend the money to get the best return on investment by buying the
property with the highest ROI first, then the one with the second highest,
etc. Yes, in principle this process leads to a knapsack problem, which is
NP-complete. Likely the nicest proof of the Neyman-Pearson result is from the
Hahn decomposition; this follows from the Radon-Nikodym result, with a famous
proof by von Neumann in Rudin.
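The real-estate analogy can be sketched as a greedy allocation; the regions and numbers below are invented for illustration, not from the book or from Neyman-Pearson itself:

```python
# Candidate rejection regions: each 'costs' some false-alarm
# probability under H0 and 'returns' some detection probability under
# H1.  Costs are in thousandths to keep the budget arithmetic exact.
# All numbers are invented for illustration.
regions = [
    (10, 0.20),  # cost 0.010, gain 0.20
    (20, 0.10),  # cost 0.020, gain 0.10
    (10, 0.05),  # cost 0.010, gain 0.05
    (30, 0.06),  # cost 0.030, gain 0.06
]
budget = 40  # total false-alarm rate of 0.040 to 'spend'

# Greedy: buy the highest gain-per-cost (likelihood ratio) first.
chosen, power = [], 0.0
for cost, gain in sorted(regions, key=lambda r: r[1] / r[0], reverse=True):
    if cost <= budget:
        budget -= cost
        power += gain
        chosen.append((cost, gain))

print(len(chosen), round(power, 2))  # 3 regions bought, power 0.35
```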

The mention of 'resampling' is curious: There are cases where we won't have
independent random variables but will have 'exchangeable' random variables,
and exchangeability can be enough.
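For a concrete instance, here is a minimal two-sample permutation test, which needs only exchangeability of the pooled observations under the null (the data below are made up):

```python
import random

random.seed(4)

def permutation_test(a, b, n_perm=10_000):
    """Two-sample permutation test on the difference of means.  Under
    the null the pooled observations are exchangeable, and that is the
    only assumption the test needs."""
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= abs(observed):
            count += 1
    return count / n_perm

# Made-up data: overlapping samples vs clearly separated samples
same = permutation_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
different = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(same, different)  # large p-value, then a small one
```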

One example is extending nonparametric tests to multidimensional data and
using them for anomaly detection in server farms and networks, where
multidimensional data is ubiquitous, e.g.,

N. B. Waite, "A Real-Time System-Adapted Anomaly Detector", 'Information
Sciences', volume 115, April, 1999, pages 221-259.

