
Random – Probability, mathematical statistics, and stochastic processes (2015) - Agrodotus
http://www.math.uah.edu/stat/index.html
======
CoVar
Currently in an MS in Statistics program. This website is definitely in my
favorites now. I've been collecting self-contained course resources ahead of
starting the program next semester. Here they are, in order of depth/difficulty
of the subject:

Stanford
[https://lagunita.stanford.edu/courses/course-v1:OLI+ProbStat...](https://lagunita.stanford.edu/courses/course-v1:OLI+ProbStat+Open_Jan2017/about)

CMU [http://oli.cmu.edu/courses/free-open/statistics-course-detai...](http://oli.cmu.edu/courses/free-open/statistics-course-details/)

UCI
[http://ocw.uci.edu/courses/math_131a_introduction_to_probabi...](http://ocw.uci.edu/courses/math_131a_introduction_to_probability_and_statistics.html)

[http://ocw.uci.edu/courses/math_131b_introduction_to_probabi...](http://ocw.uci.edu/courses/math_131b_introduction_to_probability_and_statistics.html)

MIT [https://ocw.mit.edu/courses/electrical-engineering-and-compu...](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041sc-probabilistic-systems-analysis-and-applied-probability-fall-2013/)

Harvard
[https://projects.iq.harvard.edu/stat110](https://projects.iq.harvard.edu/stat110)

~~~
gsteinb88
If you're going to do any random processes, the graduate course (and
corresponding textbook) from EECS at MIT is great:
[https://ocw.mit.edu/courses/electrical-engineering-and-compu...](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-262-discrete-stochastic-processes-spring-2011/)

Textbook: [https://www.amazon.com/Stochastic-Processes-Applications-Rob...](https://www.amazon.com/Stochastic-Processes-Applications-Robert-Gallager/dp/1107039754)

------
cko
Before this becomes one of the many ignored bookmarks in my Favorites folder,
can anyone tell me how useful this resource is to a self-described mathematical
lightweight like myself?

I took calc BC back in high school and now I'm 31. I realize to a certain
extent that many things I believe today are based on a rudimentary
understanding of probability theory. Aren't all clinical trials, scientific
studies, etc., validated by that 95% confidence interval? And I notice that
whenever something happens and my human nature wants to draw a conclusion of
causality, that cold, rational part of my brain tells me to wait until I have
more data.

Anyway I'd love to get into what I feel might be the most fundamental
mathematical concept. Probability theory might be our modern day religion.

~~~
graycat
I looked at it quickly. It has some nice features: Nicely organized with
tables of contents, bios of the people mentioned, good references, usually
good intuitive introductions to the solid mathematics, some interactive
computing examples, good notation, etc.

I especially liked the chapter on sufficient statistics -- the best I've seen,
and in my experience not all professors of statistics know this material well.
IIRC there is a paper by E. Dynkin showing that sufficient statistics are
not very _stable_ -- I'm not sure yet that the OP covers this.

For your question: it's a good introduction and foundation for work in, or
that uses, probability, statistics, and stochastic processes. Of course, in
each case, especially the last two, there is more, e.g., stochastic optimal
control. And maybe not all the more recent work is covered, e.g., resampling,
what Leo Breiman did (used in _machine learning_, etc.), stochastic
differential equations, connections with potential theory.

But, to answer your question, you sort of need to know what's going on, what
the _lay of the land_ is, and that's not so easy to see from the common
discussions. I'll try here:

Random Variables: A key, core idea is that of a _random variable_. So, go out,
observe something, get a number, go back. You now have the value of a _random
variable_, call it X. Then X will have a _cumulative distribution_: for real
number x, the function F_X(x) = P(X <= x). Here F_X is supposed to be F with a
subscript X. So, we use F for cumulative distribution and put the subscript X
on it to indicate we're talking about the cumulative distribution of random
variable X. The cumulative distribution is simple -- just look at the P(X <=
x) part and see that as x increases, that thing grows, _cumulatively_. So,
right, as x increases from -infinity to infinity, F_X(x) grows from 0 to 1,
the 1 of _certainty_.
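
A tiny numerical sketch of that, assuming Python with NumPy (my example, not
from the OP): estimate F_X(x) = P(X <= x) by the fraction of simulated draws
at or below x, and watch it climb from 0 toward 1 as x increases.

    # Empirical cumulative distribution of a standard normal X.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=0.0, scale=1.0, size=100_000)  # 100k draws of X

    for x in [-3, -1, 0, 1, 3]:
        F_x = np.mean(X <= x)  # fraction of draws at or below x
        print(f"F_X({x:+d}) ~ {F_x:.4f}")  # grows from ~0 toward ~1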

In the usual cases, X will have an average or _expectation_, E[X], sometimes
-infinity or infinity but usually a finite number. Not all random variables
have an expectation -- some goofy, _pathological_ cases don't, but you usually
don't encounter those in applications.
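
If you want to see one of those pathological cases, the standard Cauchy
distribution is the classic example (my choice of example; the comment
doesn't name one): it has no expectation, and its running averages never
settle down. A minimal sketch, again assuming NumPy:

    # Running means of Cauchy draws keep jumping around forever.
    import numpy as np

    rng = np.random.default_rng(1)
    draws = rng.standard_cauchy(size=1_000_000)
    running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

    for n in [10, 1_000, 100_000, 1_000_000]:
        print(f"mean of first {n:>9,} draws: {running_mean[n - 1]:+.3f}")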

In nice cases, you can take the calculus first derivative (slope) f_X(x) =
d/dx F_X(x), and that is the _probability density_ of random variable X. So,
the Gaussian bell curve is such a density.
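
A quick check of that relationship (my own sketch, NumPy assumed):
numerically differentiate the standard normal's cumulative distribution and
compare with the bell-curve formula.

    # f_X(x) = d/dx F_X(x): numeric derivative of the CDF vs. the density.
    from math import erf, sqrt, pi
    import numpy as np

    def F(x):  # standard normal CDF via the error function
        return 0.5 * (1 + erf(x / sqrt(2)))

    xs = np.linspace(-4, 4, 801)
    F_vals = np.array([F(x) for x in xs])
    f_numeric = np.gradient(F_vals, xs[1] - xs[0])  # slope of F_X
    f_exact = np.exp(-xs ** 2 / 2) / sqrt(2 * pi)   # Gaussian bell curve
    print("max |numeric - exact| =", np.abs(f_numeric - f_exact).max())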

Random variables -- that's the data you work with in probability, statistics,
and stochastic processes.

Foundations: For many decades, people had lots of heartburn over the
_mathematical foundations_ of probability theory. That was cleared up in 1933
by a paper by A. Kolmogorov, rightly called the "father of modern probability
theory". Here Kolmogorov used the more fundamental mathematics of _measure
theory_ as the mathematical foundations of probability theory. For some
decades now, nearly all the more serious work in probability, statistics, and
stochastic processes (call those PSSP) has been done using the measure theory
foundations. But you often don't need to see the foundations, so you don't
need to confront measure theory.

Measure Theory: You remember calculus, especially the integration part where
you find the area under a curve. You did this by partitioning the X axis and
getting tall, thin rectangles that under and over approximated the curve. Then
you let the width of the widest rectangle go to zero and took the limit of the
areas of the rectangles, the common limit of the over estimate and the under
estimate, as the definition of the integral of the curve you started with.
Fine. It has worked great quite broadly in pure and applied math, science,
engineering, etc. Integration was invented by Newton and Leibniz but made
precise by B. Riemann and others.
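
Here's a minimal sketch of that construction (my example, plain Python):
under- and over-approximating rectangles for f(x) = x^2 on [0, 1], both
converging to the true area 1/3 as the rectangles get thinner.

    # Riemann sums: partition the X axis into n rectangles.
    def riemann(f, a, b, n):
        w = (b - a) / n  # width of each rectangle
        xs = [a + i * w for i in range(n + 1)]
        # Endpoint values give the true min/max on each piece here
        # because x^2 is monotone on [0, 1].
        lower = sum(min(f(xs[i]), f(xs[i + 1])) * w for i in range(n))
        upper = sum(max(f(xs[i]), f(xs[i + 1])) * w for i in range(n))
        return lower, upper

    for n in [10, 100, 1000]:
        lo, hi = riemann(lambda x: x * x, 0.0, 1.0, n)
        print(f"n={n:>4}: under={lo:.6f}  over={hi:.6f}")  # both -> 1/3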

By about 1900, E. Borel and others saw some rough edges with the Riemann
integral and cleaned them up. The result is _measure theory_. Here _measure_
is really just another name for simple old area (length, volume, etc.). The
main guy involved was H. Lebesgue, a student of Borel in France. The cleanup
was important for some math theorem proving, e.g., about sines and cosines in
Fourier theory, now used in analyzing signals. The integral in measure theory
gives the same numerical values (answers) as the Riemann integral when both
integrals exist. But Lebesgue's work has some nicer theorems about
convergence. "Exist"? Right, it's easy enough to cook up pathological
functions where the Riemann integral does not exist, usually because the areas
of the outer and inner rectangles don't converge to the same number. The
first, obvious example is the function that is 1 at each rational number and 0
otherwise. Right, it's pathological -- though its Lebesgue integral exists and
equals 0, since the rationals have measure zero.

In simplest terms, what Lebesgue did was do the partitions on the Y axis
instead of the X axis. So, right, Lebesgue's partitions result in horizontal
rectangles instead of vertical ones. For a curve that is positive, Lebesgue's
rectangles only underestimate -- he drops consideration of the overestimation.
So, as you will notice, Lebesgue's rectangles get chopped up as the curve goes
below some of the horizontal rectangles. When a curve goes negative, Lebesgue
treats that part separately and then subtracts it from the positive part.
Still, not every curve has an integral -- if both the positive and negative
parts have infinite area, then, nope, Lebesgue defines no answer. Still,
Lebesgue's approach is better. Really, if you avoid some absurd uses of
infinity, it's darned tricky to come up with a function that does not have a
Lebesgue integral. And what Lebesgue did also assigns area in a consistent way
to more subsets of the real line; a subset of the real line that does not have
a Lebesgue measure is really tricky, e.g., the usual examples need the axiom
of choice. Net, Lebesgue's stuff is powerful, nicely better than what Riemann
did. But where the Riemann integral is working well, there's no reason to
bother changing to Lebesgue.
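
A rough numerical sketch of the idea (mine, assuming NumPy): slice the Y axis
into thin slabs, and for each slab add slab height times the length of the set
where the function reaches at least that high. For f(x) = x^2 on [0, 1] this
homes in on the same 1/3 as the Riemann-sum sketch above.

    # Lebesgue-style sum: partition the Y axis, measure level sets.
    import numpy as np

    def lebesgue_style(f, a, b, y_max, n_slabs=1000, n_grid=10_000):
        xs = np.linspace(a, b, n_grid)
        vals = f(xs)
        dy = y_max / n_slabs
        total = 0.0
        for k in range(1, n_slabs + 1):
            y = k * dy
            length = (b - a) * np.mean(vals >= y)  # length of {x : f(x) >= y}
            total += dy * length                   # slab height * level-set length
        return total

    print(lebesgue_style(lambda x: x ** 2, 0.0, 1.0, y_max=1.0))  # ~ 1/3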

PSSP: Well, since Lebesgue is partitioning on only the Y axis, he doesn't
partition on the X axis, and the X axis, that is, the domain of the function,
can be something really abstract, very general, where you would have no way to
partition the darned thing. Okay, presto, bingo: call the X axis part, the
domain of the function, an _abstract measure space_, and use Lebesgue's
integral to integrate the function.

For the abstract measure space, you just need a definition of area
(_measure_). On the real line, Lebesgue usually used just ordinary length as
the _measure_ (or the start of his definition of measure).

Okay, for PSSP and random variables: you have a set of _trials_. One of these
trials is when you do an experiment and observe values of random variables. A
set of these trials, an _event_, has an area, a measure, a _probability_. So,
right, probability is just an area, or Lebesgue's idea of area. Then apply
Lebesgue's work and get the integral of a random variable, and that's its
expectation, E[X], its average.
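
For a concrete toy case (my example): two fair coin flips as a finite measure
space. Events get probabilities (measures), and E[X] is literally Lebesgue's
integral of X, which here is just a probability-weighted sum.

    # A finite probability space: trials, events, measure, expectation.
    omega = ["HH", "HT", "TH", "TT"]         # the set of trials
    P = {w: 0.25 for w in omega}             # the measure: each trial gets 1/4
    X = {w: w.count("H") for w in omega}     # random variable: number of heads

    event = {w for w in omega if X[w] >= 1}  # the event "at least one head"
    print("P(at least one head) =", sum(P[w] for w in event))  # 0.75
    print("E[X] =", sum(X[w] * P[w] for w in omega))           # 1.0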

As you will see in the OP, the set of all events is assumed to be a _sigma
algebra_ -- that's so that you can also consider, as you want to, for events
A and B, event A and B, event A or B, event not A, etc.

So, net, you went for a walk in the basement with Kolmogorov and Lebesgue to
come up with a mathematical foundation for events, probabilities, random
variables, and expectation E[X].

Now you know.

The rest of the OP is just ordinary PSSP, and a nice treatment. So, you can
skim the measure theory foundations, or study them carefully if you wish, and
then move on to the more ordinary parts.

Yes, some of the more advanced parts touch on the measure theory foundations.
The measure theory stuff is good; we wouldn't want to be without it; sometimes
it's good even for applications; but in day-to-day work in PSSP we mostly
don't see the measure theory part.

Secret Situation: For that _trial_, it turns out that we are assuming that in
all the universe we see only one trial. Most of the elementary approaches to
PSSP like to regard each observation as a _trial_ -- if you think about that a
little too much, that approach doesn't work very well.

For the measure theory foundations, for a positive integer n, a _sample_ of
size n, that is, what we usually average to estimate the expected value, is
not some n _trials_ but n random variables X_1, X_2, ..., X_n that, maybe, are
independent and have the same distribution (independent and identically
distributed, i.i.d.).
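
A small simulation of that setup (mine, assuming NumPy): n i.i.d. random
variables X_1, ..., X_n from a fair die, whose average settles near
E[X] = 3.5 as n grows.

    # Averages of i.i.d. draws estimating the expected value.
    import numpy as np

    rng = np.random.default_rng(2)
    for n in [10, 1_000, 100_000]:
        sample = rng.integers(1, 7, size=n)  # X_1, ..., X_n, i.i.d. die rolls
        print(f"n={n:>7,}: sample mean = {sample.mean():.4f}")  # -> 3.5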

Uh, I should insert here: the sigma algebra stuff gives us a super nice
generalization of independence -- we can define what it means for infinitely
many random variables to be independent. How? Briefly, using the inverse
images under the random variables, get some sigma algebras of events, and then
work with the elementary definition of independent events, e.g.,

P(A and B) = P(A)P(B)

This generalization of independence gets to be crucial for stochastic
processes.
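
A quick sanity check of that product rule by simulation (my example, assuming
NumPy): for one roll of a fair die, A = "even" and B = "at most 4" happen to
be independent, with P(A and B) = (1/2)(2/3) = 1/3.

    # Independent events: P(A and B) should match P(A)P(B).
    import numpy as np

    rng = np.random.default_rng(3)
    rolls = rng.integers(1, 7, size=1_000_000)
    A = rolls % 2 == 0                     # the event "even roll"
    B = rolls <= 4                         # the event "at most 4"
    print("P(A and B) ~", np.mean(A & B))            # ~ 1/3
    print("P(A)P(B)   ~", np.mean(A) * np.mean(B))   # ~ 1/3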

The measure theory approach gets more serious when considering sufficient
statistics, a neglected subject, and stochastic processes.

When you get a little farther into PSSP, you will find that there are two
biggies -- independence and correlation (really a cosine and much the same as
an inner product, that is, in physics, a dot product, and covariance). If
random variables X and Y are independent, then their covariance, and hence
their correlation (if it exists), is 0. Those concepts are biggies because
typically what we do is observe some random variable X and try to use it to
say something about some random variable Y we don't have (e.g., what Google's
stock will be selling for tomorrow). If X and Y are independent, then X is
never any help at all, ever. Otherwise we have a shot.
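
One caution worth a sketch (my example, assuming NumPy): the converse fails.
Take X standard normal and Y = X^2. Then Y is completely determined by X, yet
their correlation is about 0 -- so independence forces zero covariance, but
zero correlation does not give independence.

    # Dependent but uncorrelated: X standard normal, Y = X^2.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=1_000_000)
    Y = X ** 2                                       # fully determined by X
    print("corr(X, Y) ~", np.corrcoef(X, Y)[0, 1])   # ~ 0.00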

Generally we are trying to use a sequence of random variables to approximate
what we want. So, we care about how a sequence can converge to what we want.
The OP discusses the important cases of convergence -- the most important case
is convergence in L^2, also called mean-square or least-squares convergence.
Right, in that case you often get a generalization of the Pythagorean theorem.
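
To pin that down in the notation above (my restatement of the standard
definition, not a quote from the OP): random variables X_n converge to X in
L^2 exactly when

    E[(X_n - X)^2] --> 0 as n --> infinity

and the Pythagorean flavor comes from orthogonality: if E[XY] = 0, then
E[(X + Y)^2] = E[X^2] + E[Y^2], just like squared lengths of perpendicular
sides.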

Statistics: The shortest description of statistics is that we observe some
random variable X, manipulate it with some function u, and get the resulting
random variable U = u(X), which hopefully approximates something we want to
know. If E[U] is exactly the right value of what we want to know, then the
_statistical estimator_ u(X) is _unbiased_. You can also consider minimum
variance, maximum likelihood, etc. So, here we consider the _quality_ of our
statistical estimation.
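
A small simulation of that quality question (my example, assuming NumPy): the
sample variance with divisor n - 1 is an unbiased estimator of the true
variance, while the divisor-n version comes out low by the factor (n - 1)/n.

    # Unbiased vs. biased variance estimators, averaged over many samples.
    import numpy as np

    rng = np.random.default_rng(5)
    n, trials = 10, 200_000
    data = rng.normal(scale=2.0, size=(trials, n))  # true variance is 4.0

    print("E[var, divisor n-1] ~", data.var(axis=1, ddof=1).mean())  # ~ 4.0
    print("E[var, divisor n  ] ~", data.var(axis=1, ddof=0).mean())  # ~ 3.6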

Hopefully this introduction will let you make use of the OP. The elementary
stuff there will be fast and easy for you. Full understanding of all of that
material would be about three semesters of a graduate course taken three times
-- no joke. Take the course just once and you can say that you have "seen it".

Nearly all the OP's important references are quite old -- this material has
not changed much in decades. Then, other treatments, also decades old, that
should be helpful include the famous texts by Neveu, Breiman, Chung, and
Loève. For the measure theory background, texts by Rudin and Royden are
standard. So, you can learn a lot of analysis, functional analysis, Banach and
Hilbert spaces, Fourier theory, etc. Biggie connection: The set of all L^2
random variables forms a Hilbert space -- amazing, astounding, powerful,
valuable, and true.

~~~
rayuela
Without looking at your name it only took me about 3 paragraphs to figure out
who had written this. I'm glad this particular post drew one of your long
responses.

------
wodenokoto
Can anyone comment on how this compares to _MIT 6.041x Introduction to
Probability_ [1]?

I started that course a few years ago and never finished, but I really liked
it. I'm wondering if they cover the same material, whether I should do both,
or one instead of the other.

[1]
[https://courses.edx.org/courses/MITx/6.041x/1T2014/info](https://courses.edx.org/courses/MITx/6.041x/1T2014/info)

------
dejawu
Oh, awesome, I'm in the equivalent of this class right now.

Hopefully this means I can pass...

------
nicklaf
This book makes for a nice example of embedding applets and hyperlinked
definitions within the prose of the text.

(Didn't look too carefully at the content, but it looks good too.)

------
j605
This is a wonderful resource and I come back to it for some courses quite
often.

------
bladecatcher
this is one of the most comprehensive treatments of the subject I've found.
Also works great as a reference/handbook.

------
JamesUtah07
Any chance there is a PDF version of this?

~~~
wodenokoto
Given that it contains many interactive elements, I would assume not.

------
anacleto
Just love it.

------
kensai
"Technologies and Browser Requirements

This site uses a number of advanced (but open and standard) technologies,
including HTML5, CSS, and JavaScript. To use this project properly, you will
need a modern browser that supports these technologies. The latest versions of
Chrome, Firefox, Opera, and Safari are the best choices. The Internet Explorer
and Edge browsers for Windows do not fully support the technologies used in
this project.

Display of mathematical notation is handled by the open source MathJax
project."

I've tried the interactive examples also in Safari Technology Preview and
Firefox Developer Edition, and they work OK. :)

------
ice109
i used this as a resource all through my MS while working on a stats second
major. it's a very high-quality set of notes at the level of Casella and
Berger (but more complete, since it includes measure theory).

