The Probability and Statistics Cookbook (vallentin.net)
182 points by tel on Sept 16, 2012 | 16 comments



This covers a lot of ground, and is quite accurate and balanced. Very nice work. Kudos to the author.

I noticed a couple of things.

The graph of the F distribution on page 8 is mislabeled as chi-square.

The sets A_i must be disjoint in the Law of Total Probability and Bayes' rule on page 6.

In section 5 on page 7, "variance of sum equals sum of variances" certainly does not imply ("iff") independence. I'm not positive it even implies uncorrelated, although it might. The safe statement is: "variance of sum equals sum of variances" if uncorrelated. Uncorrelated is usually abbreviated with an inverted T (reminiscent of "orthogonal"), although that abbreviation is not introduced in these notes; the inverted capital Pi used here means independence.
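For reference, the standard identity behind the safe statement (not quoted from the notes) is:

  \operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big)
    = \sum_{i=1}^{n} \operatorname{Var}(X_i)
    + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j)

so pairwise uncorrelated variables make the cross term vanish, and that direction of the implication is all you get.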

A small typo: the Strong Law of Large Numbers is mis-abbreviated; it should be SLLN (sec. 10.1).

And neither the WLLN nor the SLLN requires Var(X_1) < infinity. They just need a finite first moment ("E[X_1] exists and is finite"). This is not an error in the notes; it's just that the result holds in more generality than is stated there, and the lack of need for a second moment shows the strength of the result (i.e., if the mean exists, you always get convergence to it, end of story). (This is in Billingsley's book, or Durrett's book, or also in http://www.math.ust.hk/~makchen/Math541/Chap1Sec7.pdf as Thm 1.7.)
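A quick numerical illustration of this (my own sketch, not from the notes): the Lomax/Pareto distribution with shape a = 1.5 has finite mean 1/(a - 1) = 2 but infinite variance, and the running mean still settles down:

  import numpy as np

  # Lomax/Pareto with shape a = 1.5: E[X] = 1/(a-1) = 2 is finite,
  # but Var[X] is infinite -- the SLLN applies regardless.
  rng = np.random.default_rng(42)
  x = rng.pareto(1.5, size=10_000_000)

  running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
  print(running_mean[[999, 99_999, 9_999_999]])  # drifts toward 2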

Also, one omission: Brownian motion in Stochastic Processes (sec. 20). Since Poisson processes and Markov processes are there, it would make sense to have at least one continuous-time process. ("Random walk" gets a couple of bullet points in sec. 21, but that's neither the right place nor the right treatment.) All you need to define B.M., or a Gaussian random process for that matter, is that B.M. is the continuous process with independent increments characterized by:

  X_0 = 0
  X_t - X_s ~ N(0, t-s) for t > s
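A minimal simulation sketch of exactly this definition (my own hypothetical code, numpy assumed, discretizing [0, T] into n_steps intervals):

  import numpy as np

  def brownian_motion(n_steps, T=1.0, seed=0):
      # Independent Gaussian increments: X_t - X_s ~ N(0, t - s),
      # sampled on a uniform grid over [0, T].
      rng = np.random.default_rng(seed)
      dt = T / n_steps
      increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
      return np.concatenate([[0.0], np.cumsum(increments)])  # X_0 = 0

  path = brownian_motion(1000)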
Sec. 20.0 might also be a good place to introduce Kolmogorov's extension theorem (http://en.wikipedia.org/wiki/Kolmogorov_extension_theorem), since it is such a powerful result, is easy to state, and explains the centrality of finite-dimensional distributions.


"Variance of sum equals sum of variances" is not sufficient for pairwise uncorrelated once you have more than 2 random variables. The simplest counterexample: three variables with Cov(A,B) = -Cov(B,C) != 0 and Cov(A,C) = 0, so the covariances cancel in the sum.
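A concrete instance of that cancellation (my own numerical check, not from the thread): take X, Z independent standard normals and set A = X, B = X + Z, C = Z - X, so Cov(A,B) = 1, Cov(A,C) = -1, Cov(B,C) = 0, yet the variances still add:

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.standard_normal(1_000_000)
  z = rng.standard_normal(1_000_000)
  a, b, c = x, x + z, z - x

  print(np.var(a + b + c))                  # ~ 5
  print(np.var(a) + np.var(b) + np.var(c))  # ~ 5
  print(np.cov([a, b, c]))                  # off-diagonals ~ 1, -1, 0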

Closest results I can think of to the "iff" they're getting at:

If variance of sum = sum of variances for all linear transformations of the individual variables (i.e., X_i -> a_i * X_i), that's sufficient for pairwise uncorrelated;

If variance of sum = sum of variances for all transformations of the variables for which variances exist (i.e., X_i -> f_i(X_i)), that should be sufficient for independence (but I don't think that's an easy proof, and maybe I'm missing technical conditions).


You're right, there's no way that could work for >2 variables.


Thanks for the feedback, this is much appreciated!

I fixed the obvious bug in the most recent version. Here are some comments:

- The A_i in the Law of Total Probability indeed have to be disjoint. I indicated this by using a square union symbol, though I have not introduced this notation elsewhere. There are several implicit notation assumptions throughout the cookbook. For consistency reasons, I'll either address all of them or none.
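For readers following along, the disjoint-union form reads something like this (my paraphrase, not the cookbook's exact statement):

  \Omega = \bigsqcup_{i} A_i
  \quad\Longrightarrow\quad
  P(B) = \sum_{i} P(B \mid A_i)\, P(A_i)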

- I demoted the wrong equivalence to an implication in the sum of variances. I purposefully did not write "if Cov[X_i, X_j] = 0" because that is obvious from the statement above.

- I simply removed the unnecessary finite-variance restriction from the LLN discussion. With the notation E[X_i] = mu, I mean to imply E[X_i] < infinity; I hope this is clearer now.

- The Stochastic Processes section has a very narrow focus. Indeed, it would benefit from further extension. At this point, I unfortunately do not have the cycles to add new content myself, but feel free to do so by submitting a pull request.

- Similarly, if you find a consistent way of integrating Kolmogorov's extension theorem, I'd be happy to merge a pull request. However, note that I have not yet introduced the notion of a measure in the cookbook, which appears to be a necessary ingredient in stating the theorem.


If anyone is reading this and wants to know what it means, this is the best resource I have found for people with some math knowledge who want to develop a practical knowledge of statistics. http://www.itl.nist.gov/div898/handbook/


Also, if you are interested in learning R while you build your knowledge of probability and statistics this is pretty good: http://ipsur.org/


I also found StatSoft's statistics textbook to be useful:

http://www.statsoft.com/textbook


This really is a fairly thorough overview of an undergraduate statistics degree. It reminds me a lot of the summaries I would write before exams. The first few sections covering probability are especially good, though some of the later sections simplify whole subjects a bit much. For example, an introduction to stochastic processes (http://www.ma.utexas.edu/users/gordanz/notes/introduction_to...) will cover more about state classification and absorption.


Holy crap, I have a statistics exam on Wednesday and I was looking for something like this...Talk about lucky, thanks!


The pages of the pdf are too wide to fit comfortably on my screen. Any way I could fix it (like modify the LaTeX) besides buying a bigger monitor?


Probably there's a \landscape command (or a landscape option) in the preamble. Look for that, then remove it. LaTeX's default is A4 portrait, if I recall correctly. According to the website, the source is on GitHub.
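For example, if the preamble happens to use the geometry package (a guess; check the actual source), the change might look like:

  % hypothetical preamble tweak -- the real source may differ
  % \usepackage[landscape]{geometry}      % original: wide pages
  \usepackage[a4paper, portrait]{geometry}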


I wouldn't call it a cookbook.

It is an (unusually large) cheatsheet.


John D Cook's blog post actually inspired me to rename this work from cheatsheet to cookbook, as it outgrew the form factor of a classical printable cheatsheet.

http://www.johndcook.com/blog/2010/10/04/probability-and-sta...


A cookbook has recipes. If this is a cookbook, where are the recipes? :-)

Such a concept might make sense in this context, e.g. a recipe to calculate p-values, how to fit various distributions, a recipe to calculate posterior probabilities, etc.
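For instance, a p-value recipe might look like this sketch (Python with scipy assumed; not part of the cookbook):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  sample = rng.normal(loc=0.3, scale=1.0, size=50)

  # Recipe: two-sided one-sample t-test of H0: mu = 0.
  t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
  print(t_stat, p_value)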


Thanks for posting this. It's most helpful!


Many thanks! This looks great and useful.



