Probability Theory for Scientists and Engineers | 444 points by kawera 10 months ago | 47 comments

 This is excellent. Amazing (equation + picture)/text ratio.

Complaint: you don't define your notion of "space". In chapter 1 it's some informal notion used to motivate the definition of a set (??); in 1.3 and 1.4 it becomes clear that by "space" you mean "set". Then later you start talking about the dimension of spaces, implying they not only come with a topology but also a well-defined dimension, so a locally Euclidean Hausdorff space or something; but maybe you just mean R^n.

Comment for other commentators in this thread: not all exposition is tailored for the masses. A piece of pedagogical literature that does not appeal to your background isn't therefore bad. There's a very clear need for exposition on the basic structures of probability theory, and this fits there.
 It's like the definition of "set" given here should really be the definition of a subset, and the definition of "space" should be the definition of a set. Maybe just say a set is any collection of objects?
 Agreed! A sample space is relevant in probability theory and is often helpful to compute for your denominator, but this definition is rather vague.
 I'm not really sure who the intended audience is here. There's a lot of material covered very briefly in a very short space, but not enough detail for anyone who doesn't already know it to pick up anything substantive.
 The minimum audience would be those who've taken an introductory point-set topology course, an introductory analysis course, and a probability/statistics course, or who have read extensively on the subject. If you don't have that background, you are not qualified to understand this material, no matter how much the author attempts to dumb it down.
 As a self-taught programmer, I appreciate learning about mathematical topics from the bottom up (from a « pure mathematics » point of view), after having gained some intuition. It is easier for me to grasp because it relates very much to my daily life of programming, with its type systems, class transformations, mappings. And the author is right in that probability theory is often employed in even looser terms than other areas of mathematics. It feels to me like building up something in java/c++/haskell vs building up the same thing in python/javascript. For a lot of people, python is simpler to handle, but I usually have to go back to my c++ to feel reasonably safe that I’m applying my functions to the right objects.
 I might suffer from the curse of knowledge bias, but it seems to start from the basics.
 Sounds like most undergraduate probability/statistics classes.
 Seriously. Content like this is only useful for people who already know probability.
 Do you know of anything which can help people get over the hurdle and know enough to use this content? For me it only worked when colleagues explained concepts to me as they were needed; after several occurrences of this, everything finally started to make sense and I could then make use of material like this.
 There are no shortcuts with math. If you really want to learn it, you must be willing to put in a large number of hours over a long period of time in order to master it. Are you willing to do that?
 This is an excellent book by L. V. Tarasov on probability: https://archive.org/details/TheWorldIsBuiltOnProbability

I have always found Russian math book writers to be on point, not going too far over your head while still respecting the reader's intelligence. If you like it, then you will love his calculus book; that one is also a real gem.
 Wow, that's a pretty expensive book: https://www.amazon.com/World-Built-Probability-L-Tarasov/dp/...
 Yeah, his books are out of print. I have been looking for a hard copy of his calculus book. Really expensive.
 Thanks for the reference anyway! I downloaded the Russian version as a PDF and am enjoying reading it :-) What was great about the USSR was its level of popularization of science. All these books written for kids or high school students were amazing.
 Yes, and a number of them are now available for free over the internet. In Russian...
 Based on the very light skim I just gave it, I like that book. Thank you for the reference.
 Echoing other comments here, this seems like a hard way to start learning probability. It sounds like the goal is to make probability easier to understand, based on what you say here (https://betanalpha.github.io/writing):

> In this case study I attempt to untangle this pedagogical knot to illuminate the basic concepts and manipulations of probability theory and how they can be implemented in practice

But I think this is too hard. I really loved "Probability For The Enthusiastic Beginner": http://a.co/2kp5PZd
 "For Scientists and Engineers" sounds to me like it's targeting people who already have a strong background in more advanced mathematics, but not necessarily in probability and measure theory. If so, I think this is a decent way to go about it.

I'm an engineer and sometimes mathematician who works with fairly in-depth probability theory, and this looks to be a condensed version of a lot of the basic material I had to self-learn when I was getting into what I work on now. I'm in a niche area, though, and I do wonder whether this really is that useful to most scientists and engineers.
 Reading 'All of Statistics' by Wasserman would be a much better bet than diving deep into measure-theoretic probability, though.
 Depends on your goal; statistics and probability theory are separate (though of course related) fields with different applications. I really needed the measure-theoretic bits because I was (am) working on modeling ergodic processes. This article honestly doesn't go into enough detail to be especially useful, but I like the direction the author approaches it from.

I'm familiar with the text you mentioned; it's certainly good and would be better than this article for most, but comparing a textbook to a short web article isn't exactly fair.
 The goal is worthy, but the product is inadequate, to say the least. This thing is littered with typos, and enough of the exposition is sufficiently irrelevant or incorrect to be unintuitive. That said, I like the graphics and layout.

For example, when he discusses power sets in order to introduce sigma algebras, he implies that a sigma algebra is a better-behaved alternative to a power set. However, a power set is always itself a sigma algebra (after all, even the power set of an uncountable set is still closed under complements and countable unions).

Later, when discussing probability distributions, he writes:

> [W]e want this allocation [of a conserved quantity] to be self-consistent – the allocation to any collection of disjoint sets, A_n ∩ A_m = 0, n ≠ m, should be the same as the allocation to the union of those sets,

> ℙπ[∪(n=1 to N) A_n] = ∑(n=1 to N) ℙπ[A_n].

The condition `A_n ∩ A_m = 0, n ≠ m` is actually incorrect, since A_n and A_m are sets and 0 is an integer. The author means the empty set ∅, but typo'd.

He also frequently uses words like "conserved" or "well-defined" without giving us a clue as to what they mean. In what context are probabilities "conserved"? What distinguishes "well-defined" from "not well-defined"?

I'm a software engineer. A non-trivial amount of my time is devoted to reading code and finding bugs. Sloppy reasoning, inconsistencies, and outright errors like these are big red flags to me. It doesn't help that the whole section on sigma algebras is somewhat irrelevant, since he doesn't really explore measure theory as the basis for modern probability.

IMO a better resource is the series of "Probability Primer" videos from mathematicalmonk on YouTube[1]. He does an excellent job (IMO) of covering all pertinent prerequisites and being mostly rigorous without necessarily proving every single fact or exhaustively covering every edge and corner case.
He also makes a good effort to recommend advanced (and rigorous) treatments of the subject (and ancillary ones like measure theory). A readable version of this YouTube series would be a great resource, and if Michael Betancourt is reading, I'd encourage him to pursue that in his next iteration of this product.
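The additivity condition quoted above (with ∅ in place of 0) is easy to check concretely on a finite space. A minimal Python sketch; the die example and all names here are my own illustration, not from the article:

```python
from fractions import Fraction

# A toy discrete probability space: a fair six-sided die.
# Each outcome gets mass 1/6; the probability of an event
# (a subset of outcomes) is the sum of its outcomes' masses.
outcomes = {1, 2, 3, 4, 5, 6}
mass = {w: Fraction(1, 6) for w in outcomes}

def prob(event):
    return sum(mass[w] for w in event)

# Disjoint events: A ∩ B = ∅ (the empty set, not the integer 0).
A = {2, 4, 6}
B = {1}
assert A & B == set()

# Additivity: P(A ∪ B) = P(A) + P(B) for disjoint A, B.
assert prob(A | B) == prob(A) + prob(B)  # 2/3 == 1/2 + 1/6
```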
 > It doesn't help that the whole section on sigma algebras is somewhat irrelevant, since he doesn't really explore measure theory as the basis for modern probability.

Christ, practically all of measure theory is irrelevant for applied work, in much the same way that an engineer shouldn't care about the definition of a real number.

There's a model of real analysis due to Solovay that uses the axiom of dependent choice instead of the full axiom of choice. In the Solovay model, all sets are measurable. Thus any results that require measure theory inherently depend on the axiom of choice. I'd be worried if I were relying on Choice as an applied scientist.

Edit: the same goes for Lebesgue vs. Riemann integration. To quote Richard Hamming: "Does anyone believe that the difference between the Lebesgue and Riemann integrals can have physical significance, and that whether say, an airplane would or would not fly could depend on this difference? If such were claimed, I should not care to fly in that plane."
 It matters very much in computational / quant finance
 It matters by convention, because the textbooks are written that way.

My point is that you don't need that level of formal rigour to do applied work. You can derive the Feynman-Kac formula via a scaling limit of discrete-time Markov chains. Add some Lévy processes (e.g. compound Poisson processes) and you're basically done.

If you want to be ultra-rigorous in your definitions, then you need measure theory, yes. But even Einstein didn't need that for his description of Brownian motion. If a scaling limit was good enough for him, it's good enough for me.
 I confused this with the book "Probability and Statistics for Engineers and Scientists" by Anthony Hayter and got excited.

I'm kind of a beginner in machine learning and was struggling badly with basic probability and statistics concepts. I went through so many resources and somehow none of them clicked. Then I stumbled upon this book and realized it was exactly the kind of book I needed. It assumes no prior knowledge and is very heavy on examples. Other books just dive into jargon- and symbol-laden theory without giving simple examples or building concepts from the ground up.

I mention this because I feel someone might benefit from the suggestion.
 Wow, this seems like a particularly hard way to learn probability.One thing I noticed about myself as I did more and more work with probability is that I started thinking in terms of distributions a lot more.These days I find it very difficult to think without using them. In just about everything I do now I tend to think about moving probability mass around.
 Probability theory expositions, especially for [software] engineers, would be better served if they were well typed. What is the type of a random variable, of E[Y|X], of E[E[Y|X]]? Hint: a random variable is not a scalar, but rather a function, the probability distribution.
 Hmm, a random variable (in the sense of measure theory, as in the OP) is indeed a function, but it's not a probability distribution.

An R.V. is a measurable function from the sample space into the reals. A probability distribution is a function assigning probabilities to measurable sets; formally, a function from the sigma-algebra into [0, 1].

So, in particular, an R.V. (like a Gaussian) can take on negative values. A probability distribution cannot. Also, the domain of the R.V. is the sample space, but the domain of the probability distribution is the sigma-algebra over that sample space.
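The type distinction in the comment above can be written out in code. A minimal Python sketch on a finite sample space; the coin-flip setup and all names are my own illustration:

```python
from fractions import Fraction

# Toy finite sample space of "trials": two coin flips,
# each of the four outcomes equally likely.
Omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P_outcome = {w: Fraction(1, 4) for w in Omega}

# A random variable is a function from the sample space to the reals.
# Here: number of heads minus one, so it can be negative.
def X(w):
    return sum(1 for c in w if c == "H") - 1  # values in {-1, 0, 1}

# A probability distribution is a function on events (members of the
# sigma-algebra; here, all subsets of Omega) into [0, 1].
def P(event):
    return sum(P_outcome[w] for w in event)

# The distribution ("law") of X assigns probability to sets of real
# values by pulling them back to events in the sample space.
def law_of_X(values):
    return P({w for w in Omega if X(w) in values})

# X can be negative; its law never is.
assert X(("T", "T")) == -1
assert law_of_X({-1}) == Fraction(1, 4)
```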
 For some fairly good details on the solid answer to your questions, dating to 1933, see my post https://news.ycombinator.com/item?id=16854106 below.

In short, a real-valued random variable X is a real-valued function. The domain of the function is a set of trials. So, for a trial w (usually written as lowercase omega), X(w) is a real number. Then, for a real number x, the event

X <= x

is really shorthand notation for

{w | X(w) <= x}

So, typically we don't mention the w. Moreover, for all but grad-school mathematicians taking a course in "graduate probability", we typically don't even mention that X is a function. Instead we just say something like: X is the number we get from running an experimental trial, one of all the numbers we "might have gotten" considering the probability distribution of X.

You are correct: you sense some mushy ground under the foundations of probability theory, and you are not nearly the first to sense it. Long an answer was "it works great in practice", which doesn't make the mush any more firm. Well, in 1933 A. Kolmogorov gave a solid mathematical foundation for probability theory. That's the usual foundation for advanced work in probability, statistics, and stochastic processes. My post https://news.ycombinator.com/item?id=16854106 outlines that foundation.

Some of the consequences are surprising, but I omit those. And we end up assuming that in all the universe all we ever see is just some one trial; we don't say anything about the other trials but imagine about them a lot. That point may be hard to swallow.

IIRC, one line of argument is just that in probability there are lots of possibilities we simply don't distinguish. E.g., maybe the police have long since concluded that nearly everyone driving a car with custom-installed hidden compartments is a drug dealer, and then conclude that a person with such compartments is "likely" a drug dealer. Well, of course, they might not actually be a drug dealer and might have the car and its compartments for some other reason. So the police are putting all owners of cars with hidden compartments in a box and refusing to distinguish them, insisting that they all be treated the same until there is evidence otherwise. It may be that more can be said. For now, make of such lines of thought what you will.
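The shorthand described above can be made concrete with a tiny finite example. A Python sketch; the two-dice setup is my own illustration, not from the post:

```python
from fractions import Fraction

# The set of trials: the 36 equally likely ordered rolls of two dice.
trials = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# A real-valued random variable: X(w) is the sum of the two dice.
def X(w):
    return w[0] + w[1]

# "X <= x" is shorthand for the event {w | X(w) <= x}.
def event_X_leq(x):
    return {w for w in trials if X(w) <= x}

def prob(event):
    return Fraction(len(event), len(trials))

# P(X <= 3) = P({(1,1), (1,2), (2,1)}) = 3/36 = 1/12.
assert prob(event_X_leq(3)) == Fraction(1, 12)
```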
 I don't see the point of introducing sigma-algebras if you're not doing probability based on measure theory.As others have said I wouldn't suggest this exposition to someone learning probability for the first time, but it's not as bad if you're familiar with the material and need a quick review.
 > The set of all sets in a space, X, is called the power set, P(X). The power set is massive and, even if the space X is well-behaved, the corresponding power set can often contain some less mathematically savory elements. Consequently when dealing with sets we often want to consider a restriction of the power set that removes unwanted sets.

I wish people could teach math in plain English. I don't know why the math and physics world refuses to write for the reader. I took this class before, and I still don't know what the author means by "less mathematically savory elements".

Here's how you explain things to humans:

> There is a set called the power set that contains all the sets in a space. This set is huge, and it contains [less mathematically savory elements]. This is why we usually use a restricted version that removes the unwanted sets.

Seriously, there's no point to this sort of fancy language. Math is already hard. No need to make it harder.
 "There is a set called the power set that contains all the sets in a space"

I don't think he's the best expositor, and some of his terminology is crappy, but I understood the author from what I've read so far. I literally have no idea what you're trying to say; this has no meaning.
 There is a set. It's called the power set. It contains all the sets in a space.
 The terms "space" and "set" are not synonymous; a space is defined on a set. The collection of all subsets of a set X is called the power set and is typically denoted P(X).

Replacing something confusing but loosely understandable with something even more confusing isn't improving anything.
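For a finite set, the power set being discussed here is easy to compute and check directly. A Python sketch (illustrative only; this also confirms the point made elsewhere in the thread that a power set is itself a sigma-algebra):

```python
from itertools import chain, combinations

def power_set(X):
    """All subsets of a finite set X, each as a frozenset."""
    xs = list(X)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))}

X = {1, 2, 3}
PX = power_set(X)
assert len(PX) == 2 ** len(X)  # 8 subsets, including ∅ and X itself

# For a finite X, the power set is itself a sigma-algebra:
# it is closed under complements and under unions.
for A in PX:
    assert frozenset(X) - A in PX  # complement stays in P(X)
    for B in PX:
        assert A | B in PX         # union stays in P(X)
```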
 Well, your second statement isn't even logically equivalent to the first, for one.
 It's based on my comprehension of his paragraph.
 I thought this was really nice, strange to see that so many people dislike it.
 I find that this guide unhelpfully conflates probability and inference in a few places. Probability theory on its own is interesting but not terribly useful without the infrastructure of estimation.
 I think these are mistakes:

> 2.5 Conditional Probability Distributions As we saw in Section 3.4,

> It turns out that in this case a σ-algebra on Z naturally defines a σ-algebra
 NO NO NO!!! Don't start with Venn diagrams, sets, and other such fluff. Reminds me of the thin little book they tried sticking on us in my undergrad EE probability class. It was meant for math majors.

There is a book, "Probability and Statistics for Engineers and Scientists" by Raymond Walpole. That book is excellent. Rolling dice and pulling colored marbles from jars is how you teach probability.
 I studied probability during my undergrad (and high school) using dice, coins, and other such things. It made sense to me, but there was a dark area in my understanding; it felt like a blind spot and I could never get into it. In the final year of engineering, we had someone do a quick refresher on probability as a prelude to a longer course on pattern recognition, and he described the whole thing using set theory (Venn diagrams, functions mapping from one space to another, etc.), and I felt that the blind spot was illuminated. So, I don't know if starting from there would make sense, but I do think it's useful, at least at some point in your studies, to look at the whole system through this lens.

I've been working through http://www.greenteapress.com/thinkbayes/ and am quite enjoying it. My only complaint is that he, as intended, teaches using programs and a computer, and I learn better by doing stuff by hand. He also has a Think Stats book at http://www.greenteapress.com/thinkstats/ which people might find interesting.
 There is a good connection between probability and Venn diagrams: both are about area. Probability is about area where the area of everything under consideration is 1. So there is a set of trials; it has area 1. Each subset of the set of trials is an event and has an area, its probability. Then we can move on to random variables, distributions of random variables, independence of events and random variables, the event that a random variable has value <= some real number x, etc.

In pure math, since H. Lebesgue in about 1900, the usual good theory of area has been Lebesgue's measure theory. The ordinary ideas of area we learned in grade school, plane geometry, and calculus are all special cases. But Lebesgue's theory of area handles some bizarre, pathological, extreme cases, and we can show that there can be no really perfect theory of area -- e.g., there have to be some bizarre subsets of the real line to which no nice theory of area can assign a length. Once we have the Lebesgue theory, the usual way to show that there is a subset of the real line without an area uses the axiom of choice.

Well, in 1933, A. Kolmogorov wrote a paper showing how Lebesgue's theory of area could make a solid foundation for probability, and that approach is the standard one for advanced work in probability, statistics, and stochastic processes.
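The area picture above can be sketched with a small finite model in Python; the particular numbers are my own illustration:

```python
from fractions import Fraction

# A finite "area" model: 12 equally likely trials; total area is 1.
trials = set(range(12))

def area(event):
    return Fraction(len(event), len(trials))

A = set(range(6))       # area 1/2
B = {4, 5, 6, 7}        # area 1/3

# Overlapping events behave like overlapping regions in a Venn
# diagram: inclusion-exclusion, just as with areas in the plane.
assert area(A | B) == area(A) + area(B) - area(A & B)
assert area(trials) == 1
```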
 I agree that to build fundamental intuition dice and marbles are great. They only take you so far, though, and it would be terribly wasteful not to utilize mathematical machinery that already exists. Practically applied mathematics is a difficult tool to wield but incredibly powerful. I.e. you need to know when and how to apply it, but when it's used correctly it's immensely practical.
 To echo the comments before this one:

a) Lots of people do have an intuition for Venn diagrams. Just look at the number of Venn diagram gags on xkcd.

b) It is a great mental image to have if you move on to measure-theoretic probability.
