Hacker News
Ask HN: What are your favorite statistics and probability textbooks?
452 points by newguynewguy on July 6, 2018 | 85 comments
I took stats in undergrad, but it was a very rudimentary "push x sequence of buttons on your calculator in y situation" ordeal and left me with less applicable knowledge than I'd like.

Since sometime after graduation, I've taken up serious study of higher-level maths as a hobby. I think it will be most useful in my career if I also have a strong grasp on probability and statistics.

[edit] The suggestions so far have been great, but it occurred to me to add that I'm working through Spivak and Apostol right now with Dedekind's essays on the side. Hopefully that gives an idea of the tone/rigor I'm after. Answers to problems would also be ideal.

As a first text, I really like Introduction to Probability by Blitzstein and Hwang. It's not quite as mathematically sophisticated as some of the other treatments (e.g. you're not going to see the sigma-algebra treatment of events over a sample space), but it has a very heavy focus on building good statistical and probabilistic intuition which I think is invaluable for studying statistics in and of itself rather than statistics as a subset of measure theory. The book imparts a good knack for being able to quickly glance at a statistical problem and see "oh it's one of those kinds of problems," or "hmmm looks like there's a good symmetry argument [or other general approach] I can use here," which continues to reap rewards in more advanced and specialized statistical studies.

There are accompanying lectures, problems, and solutions on its website: https://projects.iq.harvard.edu/stat110/home

I think you will enjoy Efron & Hastie's Computer Age Statistical Inference: Algorithms, Evidence and Data Science.


MIT 6.008 Introduction to Inference also goes over many of the classic problems such as Cookie Jar, Monty Hall, Fair Dice, One-Armed Bandits, etc. with tons of labs.

I find this stuff easier to grok by just writing code and running a simulation. To that end, I'd set up an Anaconda environment and start experimenting with PyMC3. As your intuition suggests, there's no downside to mastery of probabilistic programming. Best of luck!
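To make the "write code and run a simulation" idea concrete, here is a minimal sketch of the Monty Hall problem in plain Python (standard library only, not PyMC3). The function names, trial count, and seed are my own choices for illustration and are not taken from the course labs.

```python
import random


def monty_hall_trial(rng, switch):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning by simulation."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(rng, switch) for _ in range(trials))
    return wins / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

The two estimates should land near 1/3 and 2/3, which is the usual answer worked out by hand; changing the seed or trial count is an easy way to get a feel for the simulation noise.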


I think Efron & Hastie may be a bit too advanced and terse for the OP. They cover many things in not so many pages and "our intention was to maintain a technical level of discussion appropriate to Masters’-level statisticians or first-year PhD students."

But given that they distribute the PDF for free it's worth checking out. Hastie, Tibshirani & Friedman's The Elements of Statistical Learning and the watered-down and more practical Introduction to Statistical Learning are also nice. All of them can be downloaded from https://web.stanford.edu/~hastie/pub.htm

Elements of Statistical Learning is the other text I came in here to recommend.

One of my most valuable activities in grad school was printing and studying each chapter of EoSL.

It's a comprehensive text on the fundamentals of statistics and machine learning, a solid foundation for the cutting-edge techniques relying on deep learning and reinforcement learning.

You should state your goals and timeline. There is a lot of deep stuff that can be ignored by many... Often there is a good 1st book, another for 2nd, for 3rd, and hopefully not many more after that.

It also depends what community you ask, engineers vs mathematicians vs actual statisticians. So the community you want to be a part of is the one you should ask. It is probably too much of a melting pot here to get one of these flavors.

In my past I’ve looked at courses I wanted to take, and then looked at the chain of prerequisites and books used there, hws assigned, and did those. Nothing beats talking to the teachers who you think are good. Alternatively you could have colleagues who you like, and whose history you want to emulate.

Having a specific goal is important. Egregious amounts of time can be spent in many corners of probability theory and statistics. Especially in the case that steps are taken in the wrong order. Trying to do Billingsley without first taking real analysis would be crazy. And maybe studying from Billingsley is crazy for most people’s goals to begin with.

That said, I’ve enjoyed reading the suggestions here.

Feller Vol. 1 is considered the classic text for discrete probability: https://epubs.siam.org/doi/10.1137/1011021

Volume 2 is easily found as a pdf on google, but it's much harder.

Both Feller volumes have some good stuff. E.g., volume 2 has the renewal theorem -- roughly, arrivals from many independent sources look like a Poisson process, i.e., times between arrivals are independent identically distributed random variables with exponential distribution where the parameter in the distribution is the arrival rate. E.g., expect that between 1 PM and 2 PM, arrivals at a popular Web site will be Poisson, and might use that for capacity planning or anomaly detection.

But as my main probability prof summarized, an unguided tour of Feller is not promising. I believe he was correct. Look elsewhere for the first intuition, common applications, or the high quality math, e.g., based on sigma algebras and the Radon-Nikodym theorem, the limit theorems -- laws (weak and strong) of large numbers, central limit theorem (the strong Lindeberg-Feller version, irony here), the ergodic theorem, martingales, Skorohod representation, Kolmogorov extension theorem, Lebesgue decomposition, etc.
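The claim above that arrivals from many independent sources look Poisson can be checked with a small simulation. This is only a sketch under assumptions of my own: 100 hypothetical sources, each with uniformly distributed (so clearly non-exponential) gaps between its own arrivals, and a warm-up period that is thrown away. If the merged stream is close to Poisson, the merged gaps should have standard deviation about equal to their mean, and about exp(-1), roughly 0.368, of them should exceed the mean.

```python
import random
import statistics


def merged_arrival_gaps(n_sources=100, horizon=5000.0, warmup=1000.0, seed=0):
    """Merge arrivals from independent sources and return the gaps between them."""
    rng = random.Random(seed)
    mean_gap = float(n_sources)  # each source fires about once per n_sources time units
    arrivals = []
    for _ in range(n_sources):
        t = rng.uniform(0.0, 2.0 * mean_gap)  # random starting phase for this source
        while t < horizon:
            if t >= warmup:
                arrivals.append(t)
            # Uniform gaps: a single source on its own is far from Poisson.
            t += rng.uniform(0.0, 2.0 * mean_gap)
    arrivals.sort()
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


gaps = merged_arrival_gaps()
gap_mean = statistics.mean(gaps)
gap_cv = statistics.stdev(gaps) / gap_mean          # about 1 for exponential gaps
frac_long = sum(g > gap_mean for g in gaps) / len(gaps)  # about 0.368 for exponential gaps
print(f"mean gap {gap_mean:.3f}, coefficient of variation {gap_cv:.3f}, "
      f"fraction above mean {frac_long:.3f}")
```

With only a handful of sources the merged gaps keep the shape of the uniform distribution, so lowering n_sources is a quick way to see where the approximation breaks down.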

Feller is damn hard work. I'd put Feller's probability on the same level as Rudin's real analysis.

Agree Feller is excellent. I don't think it has much in the way of prerequisites and comes at probability from a solid "fundamental understanding" POV.

Volume 2 is more difficult but also very good.

Don't forget that you always read mathematics ( or probability theory ) with a pencil in hand. It's a participatory activity.

A great starting book that helps build a good intuition is "An Introduction to Error Analysis" by John Taylor (of Classical Mechanics fame). Basically if you are taking any kind of measurements and need to compose them intelligently this is where to start. From sig-figs to chi squared. A must read for any starting researcher.

My experience otherwise with stats books has been horrendous (haven't found a good "medium level" book to move on to yet).

Lindley’s Understanding Uncertainty

Jaynes’s Probability Theory: The Logic of Science

MacKay’s Information Theory, Inference, and Learning Algorithms (http://www.inference.org.uk/itila/book.html)

Jaynes is a delight to read if you like an opinionated style over the more usual dry prose of other textbooks.

All of Statistics by Larry Wasserman is the book I recommend for anyone who knows calculus and linear algebra and wants to learn statistics.

Came here to post this and “Advanced Data Analysis from an Elementary Point of View“ by Cosmos Shalizi

Cosma Shalizi. I think your post got mangled by auto correct.


I just wish this would come out in paper book. It’s excellent but I have difficulty reading math books online.

Supposedly in process. From the book's website: "The book is under contract to Cambridge University Press; it should be turned over to the press [...] before the end of 2015." But it sounds like that was after a few delays already, so maybe there have been more.

You can print ebooks.

Yes, I loved this book! It’s compact enough that you can really get a flavor of things very quickly.

Amazing book. It has an ambitious title, but it really lives up to it. I just wish I had read it years ago.

I'm not well rounded enough to draw a clear path from where you are. For me Gelman's Data Analysis Using Regression and Multilevel/Hierarchical Models [0] drove home many, many points. More recently, I have a sense/hope that Pearl's The Book of Why [1] might take this to yet another level.

[0] http://www.stat.columbia.edu/~gelman/arm/ [1] https://www.amazon.com/Book-Why-Science-Cause-Effect/dp/0465...

Pearl's landmark rigorous book is Causality

Gelman & Hill

If you're working through Spivak and Apostol, get Feller v1 and v2. Same level of rigor and emphasis on intuition. (If you're having fun after that, pick up Gallager's book on stochastic processes. Similar approaches, intuition, and focus on discrete probability to skip the measure theory complications of continuous dists.)

Another very good book is Bertsekas and Tsitsiklis, Introduction to Probability. This book will also give you some intuition.

I've heard Grimmett recommended but I think B&T is better.

You could also start with Papoulis (a stochastic processes classic book, but it does the intro probability too.)

In my experience, those who grew up with Papoulis, 2nd edition loved it. I grew up with Papoulis, 3rd edition, and find the layout and typesetting to be among the worst of any book I've ever read, bad enough to significantly affect usability and ability to quickly find things.

Yeah, I personally don't love Papoulis. But it has been regarded as the standard stochastic processes text.

Does anyone have anything to say about The Lady Tasting Tea[0]? I'd like to hear/read your thoughts about this book. (I came across it in the comment section of John D Cook's blog post[1] years ago but haven't had a chance to read it yet).

[0] https://en.wikipedia.org/wiki/The_Lady_Tasting_Tea

[1] https://www.johndcook.com/blog/2013/01/12/elementary-statist...

I enjoyed it. I read it when I knew almost nothing about statistics, and found that it motivated me to learn more. And it's frankly an interesting / entertaining read in its own right. It's not a textbook, so don't expect to come away knowing a lot of stats after reading it (unless you already do!), but I'd say it's worth reading.

Nice pop science/history of stats book. Entertaining read. You need to have quite a deep understanding of statistics to really understand it though.

I personally like Kruschke's Doing Bayesian Data Analysis very much. It does a wonderful job of introducing readers to Bayesian statistics -- a good first textbook for Bayesian statistics.

For a more classic approach to statistics, I also liked Builder's Analysis of categorical data. It's focused on categorical data but does a good job in explaining what's going on. A good introductory book.

Both books are well written and really try to explain stuff. IMHO they are two examples of good didactics in statistics.

If you're into ML you might want to choose something else though.

Another nice intro to Bayesian statistics is McElreath’s Statistical Rethinking: A Bayesian Course with Examples in R and Stan. There are also recordings of the lectures based on the book. http://xcelab.net/rm/statistical-rethinking/

I GREATLY enjoyed Intuitive Biostatistics by Motulsky.

It assumes that you're not going to calculate any test statistic on your own, you're going to use Excel, R, or his own software. Therefore the book contains few to no formulas. Instead, it's plain English going over (biology's) most common tests and attributes, and explains what the test assumes about your data, what you can and cannot infer from the output, and what the pitfalls are. VERY useful if you, like me, come from a 'push X button no clue why' background.

I've been enjoying the struggle through Statistical Inference by George Casella as a foundational text. This was the introductory text for stats graduate students at Arizona State University a few years back.

The Mathematical Methods of Statistics by Harald Cramér is also excellent.

+1 to this, I'm a big fan of Casella. I personally think it's most useful as a reference, more so than as a textbook.

Is that the same as Casella & Berger? Nice Sherlock Holmes quotes at the beginning of each chapter.

I'd recommend almost anything by Andy Field. He has a clear cut approach with a bit of comic wit to make the reading more enjoyable for hobby reading. His book on SPSS was fantastic, but may be a bit too specialized for what you're looking for. Check out discovering statistics:


Statistics by Freedman et al. All exercises use real data as cases, I learned a lot more from them!


The sequel to this book -- Statistical Models: Theory and Practice -- and its accompanying (well, this is how I was encouraged to read them) text -- Statistical Models and Causal Inference: A Dialogue with Social Sciences -- are excellent for intuition building and accumulating a number of examples for personal reference.

Wow, I didn't know there was a sequel, oh what a pleasant surprise!

Statistics already had a lot of useful random trivia and left me with the impression that this is what a good student's resource should look like (my university had a really bad stats course so I had to learn stats myself).

But now you say there is more? Thank you random internet stranger!

The Probabilistic Method by Noga Alon and Joel H. Spencer.

I took a seminar on the probabilistic method and my advisor recommended that I read this book. This book is for scientists. It lays out the foundation of the probabilistic method and its theoretical background.

I'm in the same boat as you. I started studying advanced machine learning/AI concepts, and understand the intuition just fine, but I feel like I'm grasping around in the dark without a good foundation in statistics. Currently reading this online book and watching lectures: https://www.stat.berkeley.edu/~stark/SticiGui/Text/toc.htm. The material is stated quite simply without advanced math, but it actually helps a lot to understand concepts better.


"Theory of Probability" by Bruno de Finetti is the best text on probability ever written.

"A book destined ultimately to be recognized as one of the great books of the world" according to the foreword by Lindley included in the English translation. "Probability is a description of your (the reader of these words) uncertainty about the world. So this book is about uncertainty, about a feature of life that is so essential to life that we cannot imagine life without it."

Against the Gods by Peter L Bernstein, Chaos by James Gleick, and The (Mis)Behaviour of Markets by Benoit B Mandelbrot and Richard L Hudson.

In this less mathematical, more "pop science" vein I remember I enjoyed The Flaw of Averages by Sam Savage (who happens to be the son of Jimmie Savage) but I read it many years ago and I'm not sure I would still recommend it.

For probability, you need measure theory first. That's also when discrete and continuous methods unify, so it's an amazing edifice.

Statistics has four major branches: inference, exploratory data analysis, experimental design, and visualization.

Your question is probably mostly asking about inference. I like the first few chapters of Kiefer's book to get the context of decision theory. Then you're going to have to wander far and wide, but there have been some good recommendations. Since you're digging into Dedekind on the side, you might like to go back to some of the classics here as well: Wald's "Statistical Decision Functions" and Savage's "Foundations of Statistics."

For exploratory data analysis, Tukey's book "Exploratory Data Analysis" remains the place to start, if you can find a copy that doesn't cost a fortune. Casella wrote a book on experimental design that's solid. For visualizations Wilkinson's "Grammar of Graphics" and Tufte's books are the usual recommendations.

No no on measure theory. See below on Feller, Bertsekas, Gallager. I think it's best to learn discrete prob first and skip the difficult math until you decide to become a mathematician or really need it for e.g. finance work.

One other objection: yes, those are fields in statistics. But statistics as a field is MUCH broader than departments called statistics. A ton of stats is done in engineering, psychology, and economics (econometrics). In fact I'd say the major research in stats for the past few decades has been done outside of "statistics" as a field. But yes, the books by Tukey, Tufte, and Casella are good.

If the OP is asking about statistics and probability needed for machine learning, he wants to focus on engineering statistics, like: estimation, stochastic processes, and filtering and mathematical statistics. The Wald rec is good and complements Feller.

The OP is reading Spivak and Dedekind and makes no mention of machine learning. He's interested in the deep mathematics.

Thought this book had good detail without going too deep for some dev work I was doing: "Practical Statistics for Data Scientists: 50 Essential Concepts" https://amzn.to/2MUmZt7

“Probability theory: The logic of science” by Jaynes is an amazing book on Bayesian statistics.

Somewhat different to the suggestions so far, but Thrun et al.'s 'Probabilistic Robotics' is a very good applied probability text, with a focus on physical systems.

Plenty of worked examples and problem sets as well.

Kolmogorov's Foundations of the Theory of Probability is pretty good, IMO. I haven't made it very far, but he has great examples, plenty of rigor, and the book is like $10.

I've been watching this series recently. It's got quite a bit of statistics, mixed with some code and real world applications. It's certainly refreshed my memory on some things; some of them I'm not sure I really appreciated first time round.


It's probably not as rigorous as you mention in your edit but I'll leave it here in case someone else is drawn to the title.

Introduction to Probability, 2nd Edition by Dimitri P. Bertsekas, John N. Tsitsiklis. It is the textbook for the EECS probability course at MIT and is very well written for beginners.

Additionally, there are video lectures available on MIT OCW which mirror this book closely.


If you're looking for just mathematical statistics, I liked Hogg, McKean, and Craig's "Introduction to Mathematical Statistics" (4th edition was much better than the ones after it, imo).

But if you're looking to learn prob/stats for applications to ML, most ML textbooks have a chapter or two reviewing the relevant stuff. I liked the first two chapters of Bishop's PRML for that.

Ronald Meester's book A Natural Introduction to Probability Theory is pretty good. I had it as an introductory probability theory text as an undergrad. It's not too long and leaves you with a pretty good intuition of 'basic' probability theory. It does prove things but is not based on measure theory. I'd recommend searching for it, as those big textbooks everyone is using are generally worse.

With your background it's unclear whether you want a statistics or a mathematical statistics style book.

These are two very different approaches. Mathematical statistics goes through measure theoretic probability; it connects statistics and mathematics and unifies the discrete and the continuous cases into a coherent whole.

Examples of both

- All of Statistics: A Concise Course in Statistical Inference by Wasserman

- Mathematical Methods of Statistics by Cramer

How to lie with statistics

The Cartoon Guide to Statistics

(Don't laugh, it's a serious tome. Past chapter 2, it's not introductory level info.)

Darrell Huff is hilarious.


FWIW here is a list of texts used in a data science conference as references in presenters' slides (Casella Berger was mentioned here by others, Hastie Tibshirani is also good). https://hastebin.com/raw/upiqumenas

Not a textbook, but I found “Statistics in a Nutshell” to be a good refresher for my long dormant stats knowledge.

I found Wilcox's "Fundamentals of Modern Statistical Methods" extremely interesting: it explains the pros and cons of classical statistics with means and standard deviations. And all of this is carefully explained through examples and real-world cases.

Probability and stochastics by Erhan Cinlar is a modern, measure theoretic approach to probability.

Fun: http://xcelab.net/rm/statistical-rethinking/

Rigor: Feller vol 1, Casella & Berger 2e

I'd recommend Feller for Probability seeing how you are interested in rigor and completeness. For Bayesian approach, I'd suggest Probability Theory - The Logic of Science by E.T.Jaynes.

I've heard good things about Lev Tarasov's "The World is Built on Probability", but haven't dived into it so can't offer my own opinion

All of Statistics by Wasserman, and then All of Nonparametric Statistics.

Since you’re also interested in the philosophy side of things, definitely suggest Judea Pearl too!

Hosmer, Applied Logistic Regression.

Lots of applied stats problems are based on categorical data, and this book really helped me to understand the theory better.

This isn't what the OP is looking for but Introduction to Probability with Texas Hold ’em Examples is a fun intro to probability.

Shafer & Vovk: probability and finance. It bridges this gap for me that I didn’t quite understand.

Casella and Berger is the standard grad math stats book at good-ish departments, i.e. it proves things using calculus but it's for statisticians (covers things like efficiency, bias, etc.).

What are people’s thoughts on Loeve? Too old, too much to bite off?

It's an excellent book to get one's fundamentals right. Might not work for a large part of the Hacker News audience. Since CS deals primarily with discrete spaces they don't need as much rigour in analysis or measures.

All of Statistics by Larry Wasserman (for its rigor).

I recommend reading Taleb's books, The Black Swan, Fooled by Randomness, and Anti-Fragile. They will either reinforce your education or serve as an antidote.

Hugh Gordon's Discrete Probability

How I learned: Early in my career, I was around DC doing mostly work in applied math and computing for US national security. No joke -- constantly the work was heavily probability, statistics, and stochastic processes. I had a good ugrad math major but no courses in any of those three subjects. So I was thrown into the deep end of the pool and was constantly struggling to understand. I did pick up a good overview and a lot of intuition. But the sources varied widely, in both the topics and the quality, over stacks of books and papers, documentation of software, etc.

Lesson: At least at first, one way to learn is just to jump in at the deep end and struggle using lots of texts, references, etc.

Sad Lesson: While nearly all the famous books were good, some of the books that, e.g., from the publisher, might have seemed good were not. The guy who wrote the stuff, I hope he got tenure -- can't be any other reason.

Later I got an applied math Ph.D. and had a terrific course in analysis and probability. So, the analysis part was, right, basically Royden, Real Analysis and the first half of Rudin, Real and Complex Analysis. There was also some material from Oxtoby, Measure and Category (ice cream and cake dessert -- super fun stuff).

The probability was, right from the beginning, sigma algebras, etc. So, a central topic was the Radon-Nikodym theorem and conditional expectation -- gorgeous once you see it. So, there was beautiful coverage of the classic limit theorems, especially martingales.

Best course of any kind I ever took in school. The prof was a star student of E. Cinlar, long at Princeton.

For the course, the main texts in probability were from J. Neveu, L. Breiman, K. Chung, M. Loeve.

For statistics, for the applied stuff, I just remember the stacks of books I worked with early on, especially multivariate statistics. For the math, I just regard that as applied probability and sometimes just do my own derivations, sometimes at least a little new. I never found a statistics book I like or can recommend as the single, main book, e.g., like Rudin in analysis or Neveu in probability. All I can suggest is just to dig into the stacks of the most famous books and also glance at some of the software documentation.

I suspect that there is a really good statistics book to be written, and maybe someone has written it, or is writing it, but I haven't seen it.

Here is a simple derivation I typed in yesterday with an intuitive result in statistics that maybe people should keep in mind. In a sense this little derivation shows what the strongest possible result in statistical estimation is, and may I have the envelope please [drum roll], the discrete data version of the winner is just cross tabulation, assuming that we have enough data.

The context is a person applying for credit. Might proceed similarly for, say, ad targeting, etc.

We assume that Y is a real valued random variable where E[Y^2], that is, the expectation of Y^2, is finite -- meager assumption, especially for practice.

The Y is something about credit worthiness, e.g., loss on a loan, we are interested in.

We assume that X is a random variable taking possibly very general values, e.g., a credit history at uncountably infinitely many points in time in the past. We assume that we have the value of X -- that's our credit data on the person.

Let's do a little preliminary derivation: What value of real number a minimizes

E[(Y - a)^2]

Well, we have

E[(Y - a)^2]

= E[Y^2 - 2 Ya + a^2]

= E[Y^2] - 2aE[Y] + a^2

= E[Y^2] + E[Y]^2 - 2aE[Y] + a^2 - E[Y]^2

= E[Y^2] + (E[Y] - a)^2 - E[Y]^2

which we minimize with a = E[Y].

Or, for one interpretation, the minimum rotational moment of inertia is for rotation about the center of mass.
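The preliminary result can be sanity-checked numerically. The sketch below uses a made-up sample standing in for Y (the exponential distribution, sample size, and candidate offsets are arbitrary choices of mine) and compares the average squared error at the sample mean with the error at a few nearby values of a. The sample version of the last line of the derivation says the error at a exceeds the minimum by exactly (mean - a)^2.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical sample standing in for Y; any data with finite variance would do.
ys = [rng.expovariate(1.0) for _ in range(10_000)]


def mean_squared_error(a):
    """Sample version of E[(Y - a)^2]."""
    return sum((y - a) ** 2 for y in ys) / len(ys)


y_bar = statistics.fmean(ys)
candidates = [y_bar + delta for delta in (-0.5, -0.1, 0.0, 0.1, 0.5)]
best = min(candidates, key=mean_squared_error)

for a in candidates:
    excess = mean_squared_error(a) - mean_squared_error(y_bar)
    print(f"a = {a:.3f}  excess error = {excess:.4f}  (mean - a)^2 = {(y_bar - a) ** 2:.4f}")
```

The last two columns should agree up to floating point rounding, and the candidate with zero offset should come out as the minimizer.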

So, for our main concern, suppose we want to use the data we have, X, to approximate Y. So, we want a real valued function f with domain the possible values of X so that f(X) approximates Y.

For the most accurate approximation, we want to minimize

E[(Y - f(X))]^2

Claim: For f(X) we want

f(X) = E[Y|X]

So, f(X), using X, is the best non-linear least squares approximation to Y.


We start by using one of the properties of conditional expectation and then continue with just simple algebra:

E[(Y - f(X))^2]

= E[ E[Y^2 - 2Yf(X) + f(X)^2|X] ]

= E[ E[Y^2|X] - 2f(X)E[Y|X]

+ f(X)^2 ]

= E[ E[Y^2|X] + E[Y|X]^2 - 2f(X)E[Y|X]

+ f(X)^2 - E[Y|X]^2 ]

= E[ E[Y^2|X]

+ (E[Y|X] - f(X))^2

- E[Y|X]^2 ]

which is minimized with

f(X) = E[Y|X]
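As a rough illustration of cross tabulation as the discrete version of E[Y|X], here is a sketch on simulated data. Everything about the data is an assumption made up for the example: X is a pair (number of late payments, has a mortgage), Y is a loss generated by an arbitrary nonlinear rule plus noise with variance 1. The cross tabulation predictor is just the average of Y within each cell of X, and it is compared against a constant predictor and a hand-picked linear rule.

```python
import random
from collections import defaultdict

rng = random.Random(0)


def draw():
    """One hypothetical applicant: returns (x, y) with x discrete and y a loss."""
    late = rng.randint(0, 3)
    mortgage = rng.random() < 0.5
    loss = late ** 2 + (2.0 if mortgage else 0.0) + rng.gauss(0.0, 1.0)
    return (late, mortgage), loss


train = [draw() for _ in range(20_000)]
test = [draw() for _ in range(20_000)]

# Cross tabulation: average Y within each cell of X, an estimate of E[Y|X].
sums = defaultdict(float)
counts = defaultdict(int)
for x, y in train:
    sums[x] += y
    counts[x] += 1
cell_mean = {x: sums[x] / counts[x] for x in sums}

overall_mean = sum(y for _, y in train) / len(train)


def mse(predict):
    """Average squared error of a predictor on the held-out data."""
    return sum((y - predict(x)) ** 2 for x, y in test) / len(test)


mse_crosstab = mse(lambda x: cell_mean[x])
mse_constant = mse(lambda x: overall_mean)
mse_linear_guess = mse(lambda x: 1.5 * x[0])  # ignores the mortgage flag

print(f"cross tabulation: {mse_crosstab:.3f}")
print(f"constant:         {mse_constant:.3f}")
print(f"linear guess:     {mse_linear_guess:.3f}")
```

The cross tabulation error should sit close to the noise variance of 1, which is the floor no function of X can beat here. With only 8 cells the data requirement is trivial; the trouble described later in the thread starts when the number of cells grows exponentially with the number of variables.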


I think if you have such an exposure to so many stat and probability books, and yet cannot recommend one good one, then clearly you are the person fated by the universe to write that one book written correctly. :)

Back in grad school, my fellow students and I were amazed at how polished were the books of Rudin, Neveu, Royden, Luenberger, Dynkin, etc. but how comparatively ..., say not good, were the books in statistics. There are some hints that there are more good statistics books now.

One of my fellow students was very capable, and I was hoping would write a good book; I doubt if he ever got around to it.

I'm glad the statistics community has at least one foot in important applications, but both feet? Way back there was Cramer. Long at Brown University in applied math was U. Grenander -- maybe he could have written a Cramer Volume II.

I'd like to see (A) much more polish on the foundations and then (B) selected with good insight and expertise some of the keys to some of the more important applications.

Some of the application areas where I suspect, with varying degrees of strength, there is some good work include (i) particle physics such as at the LHC, (ii) a huge range of bio-medical research, (iii) high end military radar, sonar, and tracking more generally.

When I was in grad school, some of the gossip was that statistics of sample paths of stochastic processes was a wide open field -- I suspect it still is.

Apparently in the US, for well done theory, at least in attitudes, statistics is a poor cousin of probability theory, that is a poor cousin of pure math, and stochastic processes is just out of the picture.

I haven't tried to be a statistician, but I've done some projects and gotten some results. But for each of the results, clearly there were plenty of loose ends and more to do but without any very clear theory, examples, experience, or methods to tie off the loose ends.

Maybe here's one -- maybe since I'm not putting a lot into this just now: Above I gave my little derivation that with data X, the best estimate of Y is E[Y|X] with the idea that this partly justifies cross tabulation as the discrete version. Okay, but X might be a sample path of a history of a stochastic process with lots of dimensions with goofy data types. So, maybe to cut down some on the exponential explosion of the data required for cross tabulation on several variables, exponential in the number of variables, pick and choose the variables. Okay, but first cut we have not even zip, zilch, or zero on how to do that.

Once I published a paper on multi-dimensional, distribution-free statistical hypothesis tests, intended for zero-day computer security. But, again, the number of variables with data is huge; we encounter another exponential explosion and would like some help on which variables to choose.

Very broadly, from 200,000 feet up, we get to choose the variables to use and then want to know something about the accuracy of our results -- too often we are to use the TIFO (try it and find out) method, some form of Monte Carlo, resampling techniques (B. Efron, P. Diaconis) deleting some variables or observations and trying again, etc.
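For readers who have not seen the resampling idea mentioned above, here is a minimal bootstrap sketch. The data are simulated and the statistic, sample size, and number of resamples are arbitrary choices of mine; the point is only that the spread of a statistic across resamples drawn with replacement gives a rough accuracy estimate without any formula. For the mean, where a formula does exist, the two answers can be compared.

```python
import random
import statistics

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(500)]  # hypothetical observations


def bootstrap_std_error(data, statistic, n_resamples=2000):
    """Standard deviation of a statistic over resamples drawn with replacement."""
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


se_mean_boot = bootstrap_std_error(sample, statistics.fmean)
se_mean_formula = statistics.stdev(sample) / len(sample) ** 0.5
se_median_boot = bootstrap_std_error(sample, statistics.median)

print(f"std error of mean, bootstrap: {se_mean_boot:.4f}")
print(f"std error of mean, formula:   {se_mean_formula:.4f}")
print(f"std error of median, bootstrap: {se_median_boot:.4f}")
```

The median line is the more interesting one, since there is no equally simple textbook formula to fall back on there.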

My guess is that finding a welcoming department in a research university and/or an interested problem sponsor in a funding agency would be too tough.

Can you recommend what is the best sequence of books in your opinion to get to rigorous probability theory? E.g.:

- Principles of Mathematical Analysis, Rudin

- Finite-dimensional vector spaces, Halmos

- Real Analysis, Royden

- Mathematical Foundations of the Calculus of Probability, Neveu

Would that be something like this? Any alternatives?

To make the study easier and better rounded, just add some material.

Rudin's Principles is really nice, especially in retrospect once you understand it, but going in as a student it can seem quite severe. It's precise but not too severe -- he just makes you go a chapter or two before it becomes clear why he is doing what he is doing.

To help, my nutshell view is that mostly he is just trying to develop the Riemann (Stieltjes) integral. His main result is, the Riemann integral exists for continuous functions on compact sets. So, then, he needs to say what a compact set is. Well, the most relevant example is just a closed interval of real numbers such as [a,b]. So, why compact? Because every continuous function on a compact set is uniformly continuous, and that lets us know that the Riemann sums converge. What is compact? Every open cover has a finite subcover, and that lets us get uniform continuity. And, in R^n, a set is compact if and only if it is closed and bounded. So, Rudin needs to talk about closed versus open -- he does that on metric spaces although really he needs it only on R^n.

So, net, he starts with metric spaces and discusses open, closed, and compact. Then he shows that in R^n, compact is the same as closed and bounded. He shows that a continuous function on a compact set is uniformly continuous. Then, presto, he shows that the Riemann (or Riemann-Stieltjes if you wish) sums converge and the Riemann integral exists.

He does some nice work on infinite sequences and series, and the main reason is that he uses those tools to show lots of limits exist, e.g., for sines, cosines, and Fourier series.

There's more of value in Principles, but IMHO I gave you a good start to make the book easier. I wish I'd been given that outline when I was working through Principles at 1+ hour a page.

But the Lebesgue integral in Royden is the one to take fully seriously.

Make Halmos the second or third text on linear algebra. And then look at some quantum mechanics where they discuss eigenvalues, eigenvectors, Hermitian, unitary, and the spectral decomposition! Right, the Halmos book is baby Hilbert space. Then look at some applied connections, e.g.,

George E. Forsythe and Cleve B. Moler, Computer Solution of Linear Algebraic Systems.

Maybe spend an evening on the documentation of LINPACK.

Some weekend of great fun, take a fast pass through the Gauss, ..., Stokes theorem parts of

Tom M. Apostol, Mathematical Analysis: A Modern Approach to Advanced Calculus,

where you don't take the proofs very seriously but see how physical science and engineering look at calculus of several variables.

Neveu is a great last probability text but not a good first text. So, before Neveu, look quickly, not very seriously, at whatever, including in some introductory statistics texts.

Also, Breiman's Probability is easier to read than Neveu. So is K. L. Chung's competing book. And there are others.

There is some more advanced material, e.g.,

Ioannis Karatzas and Steven E. Shreve, Brownian Motion and Stochastic Calculus, Second Edition.

Good luck!

Many thanks for taking the time to write this! It's much appreciated.

Errata: The expression

E[(Y - f(X))]^2

of course, and as later in the derivation, should read

E[(Y - f(X))^2]

I'll have to advise my typist to do better in the future!!!

Feller - Introduction to Probability Theory
