
Ask HN: What are your favorite statistics and probability textbooks? - newguynewguy
I took stats in undergrad, but it was a very rudimentary "push x sequence of buttons on your calculator in y situation" ordeal and left me with less applicable knowledge than I'd like.

Since sometime after graduation, I've taken up serious study of higher-level maths as a hobby. I think it will be most useful in my career if I also have a strong grasp on probability and statistics.

[edit] The suggestions so far have been great, but it occurred to me to add that I'm working through Spivak and Apostol right now with Dedekind's essays on the side. Hopefully that gives an idea of the tone/rigor I'm after. Answers to problems are also ideal.
======
dwohnitmok
As a first text, I really like Introduction to Probability by Blitzstein and
Hwang. It's not quite as mathematically sophisticated as some of the other
treatments (e.g. you're not going to see the sigma-algebra treatment of events
over a sample space), but it has a very heavy focus on building good
statistical and probabilistic intuition which I think is invaluable for
studying statistics in and of itself rather than statistics as a subset of
measure theory. The book imparts a good knack for being able to quickly glance
at a statistical problem and see "oh it's one of those kinds of problems," or
"hmmm looks like there's a good symmetry argument [or other general approach]
I can use here," which continues to reap rewards in more advanced and
specialized statistical studies.

There are accompanying lectures, problems, and accompanying solutions on its
website
[https://projects.iq.harvard.edu/stat110/home](https://projects.iq.harvard.edu/stat110/home)

------
ArtWomb
I think you will enjoy Efron & Hastie's Computer Age Statistical Inference:
Algorithms, Evidence and Data Science.

[https://web.stanford.edu/~hastie/CASI/](https://web.stanford.edu/~hastie/CASI/)

MIT 6.008 Introduction to Inference also goes over many of the classic
problems such as Cookie Jar, Monty Hall, Fair Dice, One-Armed Bandits, etc.
with tons of labs.

I find this stuff easier to grok by just writing code and running a
simulation. To that end, I'd set up an Anaconda environment and start
experimenting with PyMC3. As your intuition suggests, there's no downside to
mastery of probabilistic programming. Best of luck!

[https://docs.pymc.io/](https://docs.pymc.io/)
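
In the spirit of "just write code and run a simulation", here is a minimal sketch of the Monty Hall problem. It uses only Python's standard library rather than PyMC3, and the function names (`monty_hall_trial`, `win_rate`), the seed, and the trial count are illustrative assumptions, not taken from any course or library.

```python
import random


def monty_hall_trial(switch, rng):
    """Play one round; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the single remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability by straightforward Monte Carlo."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(switch, rng) for _ in range(trials))
    return wins / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The two estimates should land near 1/3 for staying and 2/3 for switching, which is the usual textbook answer.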

~~~
kgwgk
I think Efron & Hastie may be a bit too advanced and terse for the OP. They
cover many things in not so many pages and "our intention was to maintain a
technical level of discussion appropriate to Masters’-level statisticians or
first-year PhD students."

But given that they distribute the PDF for free it's worth checking out.
Hastie, Tibshirani & Friedman's The Elements of Statistical Learning and the
watered-down and more practical Introduction to Statistical Learning are also
nice. All of them can be downloaded from
[https://web.stanford.edu/~hastie/pub.htm](https://web.stanford.edu/~hastie/pub.htm)

~~~
achompas
Elements of Statistical Learning is the other text I came in here to
recommend.

One of my most valuable activities in grad school was printing and studying
each chapter of EoSL.

It's a comprehensive text on the fundamentals of statistics and machine
learning, a solid foundation for the cutting-edge techniques relying on deep
learning and reinforcement learning.

------
ilzmastr
You should state your goals and timeline. There is a lot of deep stuff that
can be ignored by many... Often there is a good 1st book, another for 2nd, for
3rd, and hopefully not many more after that.

It also depends what community you ask, engineers vs mathematicians vs actual
statisticians. So the community you want to be a part of is the one you should
ask. It is probably too much of a melting pot here to get one of these
flavors.

In my past I’ve looked at courses I wanted to take, and then looked at the
chain of prerequisites and books used there, hws assigned, and did those.
Nothing beats talking to the teachers who you think are good. Alternatively
you could have colleagues who you like, and whose history you want to emulate.

Having a specific goal is important. Egregious amounts of time can be spent in
many corners of probability theory and statistics. Especially in the case that
steps are taken in the wrong order. Trying to do Billingsley without first
taking real analysis would be crazy. And maybe studying from Billingsley is
crazy for most people’s goals to begin with.

That said, I’ve enjoyed reading the suggestions here.

------
eigenvalue
Feller Vol. 1 is considered the classic text for discrete probability:
[https://epubs.siam.org/doi/10.1137/1011021](https://epubs.siam.org/doi/10.1137/1011021)

Volume 2 is easily found as a pdf on google, but it's much harder.

~~~
graycat
Both Feller volumes have some good stuff. E.g., volume 2 has the renewal
theorem -- roughly, arrivals from many independent sources look like a Poisson
process, i.e., times between arrivals are independent identically distributed
random variables with exponential distribution where the parameter in the
distribution is the arrival rate. E.g., expect that between 1 PM and 2 PM,
arrivals at a popular Web site will be Poisson, and might use that for
capacity planning or anomaly detection.
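
A rough way to see the "many independent sources look Poisson" claim is to simulate it. The sketch below is an illustration under assumptions of my own choosing: each source has gaps that are uniform (so clearly not exponential), the sources are independent, and there are 200 of them. The superposition result itself is often credited to Palm and Khintchine. If the merged stream is close to Poisson, the merged gaps should have a mean near 1/rate and a coefficient of variation near 1, as an exponential distribution does.

```python
import random
import statistics


def merged_gaps(n_sources=200, horizon=5000.0, seed=1):
    """Merge independent non-Poisson sources; return gaps between merged arrivals."""
    rng = random.Random(seed)
    mean_gap = float(n_sources)  # per-source mean gap, so the merged rate is about 1
    arrivals = []
    for _ in range(n_sources):
        # Start each source at a random time in the past as a crude burn-in.
        t = -rng.uniform(0.0, 10.0 * mean_gap)
        while t < horizon:
            # Per-source gaps are uniform, far from exponential.
            t += rng.uniform(0.5 * mean_gap, 1.5 * mean_gap)
            if 0.0 <= t < horizon:
                arrivals.append(t)
    arrivals.sort()
    return [b - a for a, b in zip(arrivals, arrivals[1:])]


gaps = merged_gaps()
mean = statistics.mean(gaps)
cv = statistics.stdev(gaps) / mean  # equals 1 for an exponential distribution
print("mean gap:", mean)
print("coefficient of variation:", cv)
```

A single source on its own has a coefficient of variation of about 0.29 here, so a merged value near 1 comes from the superposition, not from the individual sources.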

But as my main probability prof summarized, an unguided tour of Feller is not
promising. I believe he was correct. Look elsewhere for the first intuition,
common applications, or the high quality math, e.g., based on sigma algebras
and the Radon-Nikodym theorem, the limit theorems -- laws (weak and strong) of
large numbers, central limit theorem (the strong Lindeberg-Feller version,
irony here), the ergodic theorem, martingales, Skorohod representation,
Kolmogorov extension theorem, Lebesgue decomposition, etc.

------
geokon
A great starting book that helps build a good intuition is "An Introduction to
Error Analysis" by John Taylor (of Classical Mechanics fame). Basically if you
are taking any kind of measurements and need to compose them intelligently
this is where to start. From sig-figs to chi squared. A must read for any
starting researcher.

My experience otherwise with stats books has been horrendous (I haven't found a
good "medium level" book to move on to yet).

------
kgwgk
Lindley’s Understanding Uncertainty

Jaynes’s Probability Theory: The Logic of Science

MacKay’s Information Theory, Inference, and Learning Algorithms
([http://www.inference.org.uk/itila/book.html](http://www.inference.org.uk/itila/book.html))

~~~
ReadEvalPost
Jaynes is a delight to read if you like an opinionated style over the more
usual dry prose of other textbooks.

------
johnmyleswhite
All of Statistics by Larry Wasserman is the book I recommend for anyone who
knows calculus and linear algebra and wants to learn statistics.

~~~
ellisv
Came here to post this and “Advanced Data Analysis from an Elementary Point of
View” by Cosma Shalizi

~~~
madcaptenor
I just wish this would come out as a paper book. It’s excellent but I have
difficulty reading math books online.

~~~
mkl
Supposedly in process. From the book's website: "The book is under contract to
Cambridge University Press; it should be turned over to the press [...] before
the end of 2015." But it sounds like that was after a few delays already, so
maybe there have been more.

------
halhen
I'm not well rounded enough to draw a clear path from where you are. For me
Gelman's Data Analysis Using Regression and Multilevel/Hierarchical Models [0]
drove home many, many points. More recently, I have a sense/hope that Pearl's
The Book of Why [1] might take this to yet another level.

[0]
[http://www.stat.columbia.edu/~gelman/arm/](http://www.stat.columbia.edu/~gelman/arm/)
[1] [https://www.amazon.com/Book-Why-Science-Cause-
Effect/dp/0465...](https://www.amazon.com/Book-Why-Science-Cause-
Effect/dp/046509760X)

~~~
Karrot_Kream
Pearl's landmark rigorous book is Causality

------
mhneu
If you're working through Spivak and Apostol, get Feller v1 and v2. Same level
of rigor and emphasis on intuition. (If you're having fun after that, pick up
Gallager's book on stochastic processes. Similar approaches, intuition, and
focus on discrete probability to skip the measure theory complications of
continuous dists.)

Another very good book is Bertsekas and Tsitsiklis, Introduction to
Probability. This book will also give you some intuition.

I've heard Grimmett recommended but I think B&T is better.

You could also start with Papoulis (a stochastic processes classic book, but
it does the intro probability too.)

~~~
abstrakraft
In my experience, those who grew up with Papoulis, 2nd edition loved it. I
grew up with Papoulis, 3rd edition, and find the layout and typesetting to be
among the worst of any book I've ever read, bad enough to significantly affect
usability and ability to quickly find things.

~~~
mhneu
Yeah, I personally don't love Papoulis. But it has been regarded as the
standard stochastic processes text.

------
dyukqu
Does anyone have anything to say about The Lady Tasting Tea[0]? I'd like to
hear/read your thoughts about this book. (I came across it in the comment
section of John D Cook's blog post[1] years ago but haven't had a chance to
read it yet).

[0]
[https://en.wikipedia.org/wiki/The_Lady_Tasting_Tea](https://en.wikipedia.org/wiki/The_Lady_Tasting_Tea)

[1] [https://www.johndcook.com/blog/2013/01/12/elementary-
statist...](https://www.johndcook.com/blog/2013/01/12/elementary-statistics-
book/)

~~~
mindcrime
I enjoyed it. I read it when I knew almost nothing about statistics, and found
that it motivated me to learn more. And it's frankly an interesting /
entertaining read in its own right. It's not a textbook, so don't expect to
come away knowing a lot of stats after reading it (unless you already do!),
but I'd say it's worth reading.

------
stewbrew
I personally like Kruschke's Doing Bayesian Data Analysis very much. It does a
wonderful job in introducing readers to Bayesian statistics -- a good first
textbook for Bayesian statistics.

For a more classic approach to statistics, I also liked Bilder's Analysis of
categorical data. It's focused on categorical data but does a good job in
explaining what's going on. A good introductory book.

Both books are well written and really try to explain stuff. IMHO they are two
examples of good didactics in statistics.

If you're into ML you might want to choose something else though.

~~~
kgwgk
Another nice intro to Bayesian statistics is McElreath’s Statistical
Rethinking: A Bayesian Course with Examples in R and Stan. There are also
recordings of the lectures based on the book.
[http://xcelab.net/rm/statistical-
rethinking/](http://xcelab.net/rm/statistical-rethinking/)

------
a_bonobo
I GREATLY enjoyed Intuitive Biostatistics by Motulsky.

It assumes that you're not going to calculate any test statistic on your own,
you're going to use Excel, R, or his own software. Therefore the book contains
little to no formulas. Instead, it's plain English going over (biology's) most
common tests and attributes, and explains what the test assumes about your
data, what you can and cannot infer from the output, and what the pitfalls
are. VERY useful if you, like me, come from a 'push X button no clue why'
background.

------
anonymouslee
I've been enjoying the struggle through Statistical Inference by George
Casella as a foundational text. This was the introductory text for stats
graduate students at Arizona State University a few years back.

The Mathematical Methods of Statistics by Harald Cramér is also excellent.

~~~
achompas
+1 to this, I'm a big fan of Casella. I personally think it's most useful as a
reference, more so than as a textbook.

------
jamessantiago
I'd recommend almost anything by Andy Field. He has a clear cut approach with
a bit of comic wit to make the reading more enjoyable for hobby reading. His
book on SPSS was fantastic, but may be a bit too specialized for what you're
looking for. Check out discovering statistics:

[https://www.discoveringstatistics.com/](https://www.discoveringstatistics.com/)

------
madhadron
For probability, you need measure theory first. That's also when discrete and
continuous methods unify, so it's an amazing edifice.

Statistics has four major branches: inference, exploratory data analysis,
experimental design, and visualization.

Your question is probably mostly asking about inference. I like the first few
chapters of Kiefer's book to get the context of decision theory. Then you're
going to have to wander far and wide, but there have been some good
recommendations. Since you're digging into Dedekind on the side, you might
like to go back to some of the classics here as well: Wald's "Statistical
Decision Functions" and Savage's "Foundations of Statistics."

For exploratory data analysis, Tukey's book "Exploratory Data Analysis"
remains the place to start, if you can find a copy that doesn't cost a
fortune. Casella wrote a book on experimental design that's solid. For
visualizations Wilkinson's "Grammar of Graphics" and Tufte's books are the
usual recommendations.

~~~
mhneu
No no on measure theory. See below on Feller, Bertsekas, Gallager. I think it's
best to learn discrete prob first and skip the difficult math until you decide
to become a mathematician or really need it for e.g. finance work.

One other objection: yes, those are fields in statistics. But statistics as a
field is MUCH broader than departments called statistics. A ton of stats is
done in engineering, psychology, and economics (econometrics). In fact I'd say
the major research in stats for the past few decades has been done outside of
"statistics" as a field. But yes, the books by Tukey, Tufte, and Casella are
good.

If the OP is asking about statistics and probability needed for machine
learning, he wants to focus on engineering statistics, like: estimation,
stochastic processes, and filtering and mathematical statistics. The Wald rec
is good and complements Feller.

~~~
madhadron
The OP is reading Spivak and Dedekind and makes no mention of machine
learning. He's interested in the deep mathematics.

------
Boulth
Statistics by Freedman et al. All exercises use real data as cases, I learned
a lot more from them!

[https://www.amazon.com/Statistics-4th-David-
Freedman/dp/0393...](https://www.amazon.com/Statistics-4th-David-
Freedman/dp/0393929728)

~~~
krnsll
The sequel to this book -- Statistical Models: Theory and Practice -- and its
accompanying (well, this is how I was encouraged to read them) text --
Statistical Models and Causal Inference: A Dialogue with Social Sciences --
are excellent for intuition building and accumulating a number of examples for
personal reference.

~~~
Boulth
Wow, I didn't know there was a sequel, oh what a pleasant surprise!

Statistics already had a lot of useful random trivia and left me with an
impression that this is what a good student's resource should look like (my
university had a really bad stats course so I had to learn stats myself).

But now you say there is more? Thank you random internet stranger!

------
stakhanov
"Theory of Probability" by Bruno de Finetti is the best text on probability
ever written.

~~~
kgwgk
"A book destined ultimately to be recognized as one of the great books of the
world" according to the foreword by Lindley included in the English
translation. "Probability is a description of your (the reader of these words)
uncertainty about the world. So this book is about uncertainty, about a
feature of life that is so essential to life that we cannot imagine life
without it."

------
doomjunky
The Probabilistic Method by Noga Alon and Joel H. Spencer.

I took a seminar on the probabilistic method and my advisor recommended me to
read this book. This book is for scientists. It lays out the foundation of the
probabilistic method and a theoretic background.

------
cosmosa
I'm in the same boat as you. I started studying advanced machine learning/AI
concepts, and understand the intuition just fine, but I feel like I'm grasping
around in the dark without a good foundation in statistics. Currently reading
this online book and watching lectures:
[https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm](https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm).
The material is stated quite simply without advanced math, but it actually
helps a lot to understand concepts better.

~~~
cosmosa
Great!

------
jackgavigan
_Against the Gods_ by Peter L Bernstein, _Chaos_ by James Gleick, and _The
(Mis)Behaviour of Markets_ by Benoit B Mandelbrot and Richard L Hudson.

~~~
kgwgk
In this less mathematical, more "pop science" vein I remember I enjoyed The
Flaw of Averages by Sam Savage (who happens to be the son of Jimmie Savage)
but I read it many years ago and I'm not sure I would still recommend it.

------
skittleson
Thought this book had good detail without going too deep for some dev work I
was doing: "Practical Statistics for Data Scientists: 50 Essential Concepts"
[https://amzn.to/2MUmZt7](https://amzn.to/2MUmZt7)

------
bjornsing
“Probability theory: The logic of science” by Jaynes is an amazing book on
Bayesian statistics.

------
atroyn
Somewhat different to the suggestions so far, but Thrun et al.'s
'Probabilistic Robotics' is a very good applied probability text, with a focus
on physical systems.

Plenty of worked examples and problem sets as well.

------
fusiongyro
Kolmogorov's _Foundations of the Theory of Probability_ is pretty good, IMO. I
haven't made it very far, but he has great examples, plenty of rigor, and the
book is like $10.

------
VBprogrammer
I've been watching this series recently. It's got quite a bit of statistics,
mixed with some code and real world applications. It's certainly refreshed my
memory on some things; some of them I'm not sure I really appreciated first
time round.

[https://youtu.be/C1lhuz6pZC0](https://youtu.be/C1lhuz6pZC0)

It's probably not as rigorous as you mention in your edit but I'll leave it
here in case someone else is drawn to the title.

------
yogeshp
Introduction to Probability, 2nd Edition by Dimitri P. Bertsekas, John N.
Tsitsiklis. It is the textbook for the EECS Probability course at MIT and is
very well written for beginners.

Additionally, there are video lectures available on MIT OCW which mirror this
book closely.

[https://www.amazon.com/Introduction-Probability-2nd-
Dimitri-...](https://www.amazon.com/Introduction-Probability-2nd-Dimitri-
Bertsekas/dp/188652923X/)

------
_eigenfoo
If you're looking for just mathematical statistics, I liked Hogg McKean and
Craig's "Introduction to Mathematical Statistics" (4th edition was much better
than the ones after it, imo).

But if you're looking to learn prob/stats for applications to ML, most ML
textbooks have a chapter or two reviewing the relevant stuff. I liked the
first two chapters of Bishop's PRML for that.

------
timwaagh
Ronald Meester's book A Natural Introduction to Probability Theory is pretty
good. I had it as an introductory probability theory text as an undergrad. It's
not too long and leaves you with a pretty good intuition of 'basic' probability
theory. It does prove things but is not based on measure theory. I'd recommend
searching for it as those big textbooks everyone is using are generally worse.

------
nabla9
With your background it's unclear if you want a statistics or a mathematical
statistics style book.

These are two very different approaches. Mathematical statistics goes through
measure theoretic probability; it connects statistics and mathematics and
unifies the discrete and the continuous cases into a coherent whole.

Examples of both

\- All of Statistics: A Concise Course in Statistical Inference by Wasserman

\- Mathematical Methods of Statistics by Cramer

------
DoreenMichele
_How to lie with statistics

The Cartoon Guide to Statistics_

(Don't laugh, it's a serious tome. Past chapter 2, it's not introductory level
info.)

~~~
sitkack
Darrell Huff is hilarious.

[https://en.wikipedia.org/wiki/Darrell_Huff](https://en.wikipedia.org/wiki/Darrell_Huff)

------
vinchuco
FWIW here is a list of texts used in a data science conference as references in
presenters' slides (Casella & Berger was mentioned here by others, Hastie &
Tibshirani is also good).
[https://hastebin.com/raw/upiqumenas](https://hastebin.com/raw/upiqumenas)

------
Spooky23
Not a textbook, but I found “Statistics in a Nutshell” to be a good refresher
for my long dormant stats knowledge.

------
ziotom78
I found Wilcox's "Fundamentals of Modern Statistical Methods" extremely
interesting: it explains the pros and cons of classical statistics with means
and standard deviations. And all of this is carefully explained through
examples and real-world cases.

------
cfusting
Probability and stochastics by Erhan Cinlar is a modern, measure theoretic
approach to probability.

------
apathy
Fun: [http://xcelab.net/rm/statistical-
rethinking/](http://xcelab.net/rm/statistical-rethinking/)

Rigor: Feller vol 1, Casella & Berger 2e

------
yantrams
I'd recommend Feller for Probability seeing how you are interested in rigor
and completeness. For Bayesian approach, I'd suggest Probability Theory - The
Logic of Science by E.T.Jaynes.

------
placebo
I've heard good things about Lev Tarasov's "The World is Built on
Probability", but haven't dived into it so can't offer my own opinion

------
daikonraidish
All of Statistics by Wasserman, and then All of Nonparametric Statistics.

Since you’re also interested in the philosophy side of things, definitely
suggest Judea Pearl too!

------
singingfish
Hosmer, Applied Logistic Regression.

Lots of applied stats problems are based on categorical data, and this book
really helped me to understand the theory better.

------
soniman
This isn't what the OP is looking for but Introduction to Probability with
Texas Hold ’em Examples is a fun intro to probability.

------
adamnemecek
Shafer & Vovk: probability and finance. It bridges this gap for me that I
didn’t quite understand.

------
graycat
How I learned: Early in my career, I was around DC doing mostly work in
applied math and computing for US national security. No joke -- constantly the
work was heavily probability, statistics, and stochastic processes. I had a
good ugrad math major but no courses in any of those three subjects. So I was
thrown into the deep end of the pool and was constantly struggling to
understand. I did pick up a good overview and a lot of intuition. But the
sources varied widely, in both the topics and the quality, over stacks of
books and papers, documentation of software, etc.

Lesson: At least at first, one way to learn is just to jump in at the deep end
and struggle using lots of texts, references, etc.

Sad Lesson: While nearly all the famous books were good, some of the books
that, e.g., from the publisher, might have seemed good were not. The guy who
wrote the stuff, I hope he got tenure -- can't be any other reason.

Later I got an applied math Ph.D. and had a terrific course in _analysis and
probability_. So, the analysis part was, right, basically Royden, _Real
Analysis_ and the first half of Rudin, _Real and Complex Analysis_. There was
also some material from Oxtoby, _Measure and Category_ (ice cream and cake
dessert -- super fun stuff).

The probability was, right from the beginning, sigma algebras, etc. So, a
central topic was the Radon-Nikodym theorem and conditional expectation --
gorgeous once you see it. So, there was beautiful coverage of the classic limit
theorems, especially martingales.

Best course of any kind I ever took in school. The prof was a star student of
E. Cinlar, long at Princeton.

For the course, the main texts in probability were from J. Neveu, L. Breiman,
K. Chung, M. Loeve.

For statistics, for the applied stuff, I just remember the stacks of books I
worked with early on, especially multivariate statistics. For the math, I just
regard that as applied probability and sometimes just do my own derivations,
sometimes at least a little new. I never found a statistics book I like or can
recommend as the _single_ , main book, e.g., like Rudin in analysis or Neveu
in probability. All I can suggest is just to dig into the stacks of the most
famous books and also glance at some of the software documentation.

I suspect that there is a really good statistics book to be written, and maybe
someone has written it, or is writing it, but I haven't seen it.

Here is a simple derivation I typed in yesterday with an intuitive result in
statistics that maybe people should keep in mind. In a sense this little
derivation shows what the strongest possible result in statistical estimation
is, and, may I have the envelope please [drum roll], the discrete data version
of the winner is just cross tabulation, assuming that we have enough data.

The context is a person applying for credit. Might proceed similarly for, say,
ad targeting, etc.

We assume that Y is a real valued random variable where E[Y^2], that is, the
_expectation,_ of Y^2 is finite -- meager assumption, especially for practice.

The Y is something about _credit worthiness,_ e.g., loss on a loan, we are
interested in.

We assume that X is a random variable taking possibly very general values,
e.g., a credit history at uncountably infinitely many points in time in the
past. We assume that we have the value of X -- that's our credit data on the
person.

Let's do a little preliminary derivation: What value of real number a
minimizes

E[(Y - a)^2]

Well, we have

E[(Y - a)^2]

= E[Y^2 - 2 Ya + a^2]

= E[Y^2] - 2aE[Y] + a^2

= E[Y^2] + E[Y]^2 - 2aE[Y] + a^2 - E[Y]^2

= E[Y^2] + (E[Y] - a)^2 - E[Y]^2

which we minimize with a = E[Y].

Or, for one interpretation, the minimum rotational moment of inertia is for
rotation about the center of mass.
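
As a quick numerical sanity check of that preliminary result, here is a small sketch. The exponential sample, the seed, and the handful of candidate values for a are arbitrary choices for illustration only. It checks the sample analogue of the identity E[(Y - a)^2] = Var(Y) + (E[Y] - a)^2 and confirms that the sample mean wins among the candidates.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical data: any distribution with finite second moment would do.
y = [rng.expovariate(1.0) for _ in range(10_000)]


def mse(a, ys):
    """Sample version of E[(Y - a)^2]."""
    return sum((v - a) ** 2 for v in ys) / len(ys)


ybar = statistics.mean(y)
pvar = statistics.pvariance(y)  # sample analogue of E[Y^2] - E[Y]^2

candidates = [ybar + d for d in (-0.5, -0.1, 0.0, 0.1, 0.5)]
for a in candidates:
    # Left: direct computation. Right: the decomposition from the derivation.
    print(round(a, 4), mse(a, y), pvar + (ybar - a) ** 2)

best = min(candidates, key=lambda a: mse(a, y))
print("best candidate equals the sample mean:", best == ybar)
```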

So, for our main concern, suppose we want to use the data we have X to
approximate Y. So, we want real valued function f with domain the possible
values of X so that f(X) approximates Y.

For the most accurate approximation, we want to minimize

E[(Y - f(X))^2]

Claim: For f(X) we want

f(X) = E[Y|X]

So, f(X), using X, is the best non-linear least squares approximation to Y.

Proof:

We start by using one of the properties of conditional expectation and then
continue with just simple algebra:

E[(Y - f(X))^2]

= E[ E[Y^2 - 2Yf(X) + f(X)^2|X] ]

= E[ E[Y^2|X] - 2f(X)E[Y|X] + f(X)^2 ]

= E[ E[Y^2|X] + E[Y|X]^2 - 2f(X)E[Y|X] + f(X)^2 - E[Y|X]^2 ]

= E[ E[Y^2|X] + (E[Y|X] - f(X))^2 - E[Y|X]^2 ]

which is minimized with

f(X) = E[Y|X]

Done.
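
To make the cross tabulation remark concrete, here is a minimal sketch with made-up data. The three credit buckets, the loss model, and all names (`cross_tab_means`, `mse`) are hypothetical and only for illustration. With discrete X, the estimate of E[Y|X] is just the average of Y within each cell of X, and in-sample it cannot be beaten in squared error by any other function of X.

```python
import random
from collections import defaultdict

rng = random.Random(42)


def draw():
    """One hypothetical applicant: a credit bucket X and a noisy loss Y."""
    x = rng.choice(["good", "fair", "poor"])
    base = {"good": 1.0, "fair": 4.0, "poor": 9.0}[x]
    return x, base + rng.gauss(0.0, 2.0)


data = [draw() for _ in range(20_000)]


def cross_tab_means(pairs):
    """Estimate E[Y|X] by averaging Y within each value of X."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in pairs:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def mse(predict, pairs):
    """Sample version of E[(Y - f(X))^2] for a predictor f."""
    return sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs)


cell_mean = cross_tab_means(data)
overall = sum(y for _, y in data) / len(data)

mse_cross_tab = mse(lambda x: cell_mean[x], data)       # f(X) = E[Y|X]
mse_constant = mse(lambda x: overall, data)             # ignores X entirely
mse_shifted = mse(lambda x: cell_mean[x] + 0.3, data)   # some other f(X)

print(cell_mean)
print(mse_cross_tab, mse_constant, mse_shifted)
```

The catch is the one about having enough data: every cell needs enough observations for its average to be trustworthy.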

~~~
valgor
I think if you have such an exposure to so many stat and probability books,
and yet cannot recommend one good one, then clearly you are the person fated
by the universe to write that one book written correctly. :)

~~~
graycat
Back in grad school, my fellow students and I were amazed at how polished were
the books of Rudin, Neveu, Royden, Luenberger, Dynkin, etc. but how
comparatively ..., say not good, were the books in statistics. There are some
hints that there are more good statistics books now.

One of my fellow students was very capable, and I was hoping would write a
good book; I doubt if he ever got around to it.

I'm glad the statistics community has at least one foot in important
applications, but both feet? Way back there was Cramer. At the Brown
University Division of Applied Mathematics long was U. Grenander -- maybe he
could have written a Cramer Volume II.

I'd like to see (A) much more polish on the foundations and then (B) selected
with good insight and expertise some of the keys to some of the more important
applications.

Some of the application areas where I suspect, with varying degrees of
strength, there is some good work include (i) particle physics such as at the
LHC, (ii) a huge range of bio-medical research, (iii) high end military radar,
sonar, and tracking more generally.

When I was in grad school, some of the gossip was that statistics of sample
paths of stochastic processes was a wide open field -- I suspect it still is.

Apparently in the US, for well done theory, at least in attitudes, statistics
is a poor cousin of probability theory, that is a poor cousin of pure math,
and stochastic processes is just out of the picture.

I haven't tried to be a statistician, but I've done some projects and gotten
some results. But for each of the results, clearly there were plenty of loose
ends and more to do but without any very clear theory, examples, experience,
or methods to tie off the loose ends.

Maybe here's one -- maybe since I'm not putting a lot into this just now:
Above I gave my little derivation that with data X, the best estimate of Y is
E[Y|X] with the idea that this partly justifies cross tabulation as the
discrete version. Okay, but X might be a sample path of a history of a
stochastic process with lots of dimensions with goofy _data types_. So, maybe
to cut down some on the exponential explosion of the data required for cross
tabulation on several variables, exponential in the number of variables, pick
and choose the variables. Okay, but first cut we have not even zip, zilch, or
zero on how to do that.

Once I published a paper on multi-dimensional, distribution-free statistical
hypothesis tests, intended for zero-day computer security. But, again, the
number of variables with data is huge; we encounter another exponential
explosion and would like some help on which variables to choose.

Very broadly, from 200,000 feet up, we get to choose the variables to use and
then want to know something about the accuracy of our results -- too often we
are to use the TIFO (try it and find out) method, some form of Monte Carlo,
_resampling_ techniques (B. Efron, P. Diaconis) deleting some variables or
observations and trying again, etc.
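
For a flavor of the resampling idea, here is a minimal bootstrap sketch. The Gaussian sample, the seed, and the number of replicates are arbitrary assumptions, and `bootstrap_se` is a hypothetical helper, not a library function. It estimates the standard error of a sample mean by resampling with replacement and compares the result with the textbook formula s / sqrt(n).

```python
import random
import statistics


def sample_mean(values):
    return sum(values) / len(values)


def bootstrap_se(sample, stat, n_boot=2000, seed=0):
    """Standard error of `stat`, estimated by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


rng = random.Random(7)
sample = [rng.gauss(10.0, 3.0) for _ in range(200)]  # hypothetical observations

se_boot = bootstrap_se(sample, sample_mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5

print("bootstrap SE:", se_boot)
print("formula SE:  ", se_formula)
```

The appeal is that `stat` can be swapped for a median or something messier for which no simple formula exists, though this does nothing about the variable selection problem.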

My guess is that finding a welcoming department in a research university
and/or an interested problem sponsor in a funding agency would be too tough.

------
selimthegrim
What are people’s thoughts on Loeve? Too old, too much to bite off?

~~~
srean
It's an excellent book to get one's fundamentals right. Might not work for a
large part of the Hacker News audience. Since CS deals primarily with discrete
spaces they don't need as much rigour in analysis or measures.

------
georgewsinger
_All of Statistics_ by Larry Wasserman (for its rigor).

------
itronitron
I recommend reading Taleb's books, The Black Swan, Fooled by Randomness, and
Antifragile. They will either reinforce your education or serve as an
antidote.

------
perl4ever
Hugh Gordon's Discrete Probability

------
NannyOgg
fuller - introduction to probability theory

------
mlevental
Casella and Berger is the standard grad math stats book at good-ish
departments, i.e., it proves things using calculus but it's for statisticians
(covers things like efficiency and bias, etc.).

