A Concrete Introduction to Probability (2018) 529 points by tosh 51 days ago | hide | past | favorite | 75 comments

 I'm always amazed at how Peter Norvig continues to create the kind of content that a top grad student would do, even as his career and rank in the industry has skyrocketed.Back in university and grad school I would write tutorials and post them online (and still get thanks from random people many many years later). I would explore random interesting subjects and dive deep. I would constantly publish code and demos, etc. As my career grew, one of one these started to fall off. I look at my peers and it's the same story: they were all vibrant hot-shots in their early-mid 20s, and now are just weighed down by the teams and projects they manage.Peter is an inspiration. I will ponder this...
 "one of one" or "one by one"?
 Be sure not to miss the other two probability notebooks:- Probability, Paradox, and the Reasonable Person Principle https://github.com/norvig/pytudes/blob/master/ipynb/Probabil...- Estimating Probabilities with Simulations https://github.com/norvig/pytudes/blob/master/ipynb/Probabil...There are dozens of other notebooks on a variety of topics in the 'ipynb' folder: https://github.com/norvig/pytudes/tree/master/ipynb
 Maybe I'm missing something about the voltmeter example. My assumption is that the 100-volt-maximum meter can distinguish between 100 volts and more than 100 volts, in which case there's no problem. If the voltmeter doesn't accurately indicate whether or not a measurement is outside of its range then the statistician is correct that everything should be re-measured.Do some people think that the possibility of not being able to take accurate measurements is the same as not having taken accurate measurements?EDIT: Maybe the ambiguity is in what the engineer would have recorded if finding a voltage >100 volts while the other meter was broken? It's like undefined behavior in programming; if you know your software will have undefined behavior when encountering certain data then you can't trust whether the output is valid unless there's independent confirmation that the data won't cause undefined behavior. If the statistician doesn't have certainty that the engineer will have defined behavior (e.g. say "I couldn't complete the measurements" vs. undefined behavior like writing down "99" or exploding) then they of course want to re-measure.
 > in which case there's no problemI have a box with 1000 tubes, exactly 1% of which are beyond my capacity to measure.I take a random sample of 20, and call that a roughly 80% chance of being able too measure all of them.Let's say the average is 50V and the ones beyond my measurement average 150V. We can use this info to solve for the average of the tubes we can measure: 25V.So, in 80% of the cases we'll get an average measurement of 25V.In 20% of the cases we're going to realize our mistake, buy a better voltmeter, and redo everything. In this case we can measure everything so we get the correct number: 50V.25V * 0.8 + 50 * 0.2 = 30VThis, of course, won't be statistically significant. But let's get a lot of researchers together to measure a lot of tubes. And let's say 100V limit meters are common so half make the same mistake.Now we have a statistically significant 40V. Wrong, wrong wrong.
 > In the correct version of this story, the mathematician says "I have two children", and you ask, "Is at least one a boy?", and she answers "Yes". Then the probability is 1/3 that they are both boys.I don’t understand this reasoning. If at least one is a boy, the only configurations I can think of is 1 boy 1 girl or 2 boys. Where does the 1/3 come from?
 With 2 children, there are 4 configurations of equal probability. The one with 1 boy 1 girl occurs twice. Take away the 2 girl case, then 2 boys is 1 in 3.
 Yeah, the way the problem is formulated though there’s absolutely no indication that order matters so how are there two configurations within which there’s 1 boy and 1 girl?
 Order doesn't matter in the sense that the observed data set is unordered (just counts of girls and boys). What matters is how many ways there are that the universe can give rise to those unordered data sets. And in fact, there are more ways that the universe can give rise to the unordered state 1 boy 1 girl, than to the unordered state 2 boys. For similar reasons , there are more ways in which your papers can be in a mess across your desk than ways in which your papers can be neatly piled up.And to count how many ways the universe can give rise to the unordered data sets, the usual technique is to expand the unordered data sets into all the equivalent ordered data sets, and count the latter.
 Another way to think about it is counting the probability of getting k boys out of 2 children.`````` 0 boys - 1/4 1 boy - 1/2 2 boys - 1/4 `````` There's a half chance of getting exactly one boy, and one way to calculate this is by noticing there are two different ways to get one boy if we take order in account. You are right that the orderings don't matter in this case, so we could also e.g. model this with a binomial distribution. Once you know there are >= 1 boys, the chance you have two is 0.25/(0.25+0.5) = 1/3.
 Because the order exists even if it doesn't matter (at least for two children, maybe not for two quantum particles).With the risk of being accused of binarism, there are four distinct possibilities with (close to) equal a priory probability of 25%: older boy/younger boy, older boy/younger girl, older girl/younger boy, and older girl/younger girl.Discarding the girl/girl case leaves three equally probable cases.
 I immediately modelled the problem like you did, then I thought of this interesting variation:"I have two children, Michael and Alex. Michael is a boy. What's the probability of both being boys?"If you make a truth table with names as columns, you clearly have only two possibilities for Michael=1.However if you pick older/younger again you're back to 3 possible states.I think the answer is still 1/3, but it's a trickier one to reason about immediately.It seems the question adds information by naming the children, but there's a hidden statement in the form "at least one of them is Michael", which invalidates a truth table with names as columns.I can only conclude that birth order is an underlying property of the entity. A strict, real differentiator as much as sex is. Names aren't, so names don't add information in this case.Is there a term for that? Or am I just wrong?
 In the original problem you start by assuming 4 equally-probable cases Bb Bg Gb Gg [1] and you ask a question. Depending on the answer you are in the Bb/Bg/Gb or the Gg subsets.In your variant you need additional assumptions. Will the person always tell you the sex and name of the eldest? Or the names of the boys?“Michael is a boy” is not really different from “the youngest is a boy”. The probability of both being boys depends on why are you being told that.[1] Depending in the context the assumption may not be appropriate (a extreme example may be China).
 There are twice as many families with 1 boy and 1 girl as there are families with 2 boys.
 I got reasonably far in stats -- up to a single grad level course.My biggest problem with the teaching of math (and I can weigh in on this a bit because I spent over a decade as a private tutor of math) is that math is often not introduced in a concrete manner, as well as in a way such that its utility is obvious. These are two sides of the same coin.You can see this problem in most Wikipedia math pages. I took more math than most physics majors and I find quite a lot of what I stumble across either baffling or of unknown utility. Set theory is a great instance of this. I rarely see multiple examples listed as it is explained, nor does anyone bother to tell me what it is for, other than to do more set theory.Stats at least has the benefit of having to stick closer to applicability, I think.
 The problem is that professional mathematicians have a common language that they all have learned which serves them well...and they're the ones writing the Wikipedia articles. Imagine if all the programming documentation you worked with was at the level of a 1980's first book on BASIC.Set theory is the concrete starting point after that training, and the first step of concrete problems is to translate them into an abstract form of maps on sets. It's a decoupling. Instead of translating n techniques into m domains (n*m bits of work), you develop n techniques in terms of set theory, and translate m domains into set theory (n+m bits of work).Physics majors largely use the same pieces of math on the same domains, so this decoupling doesn't make sense for them. Similarly, most domains carve off some piece of math and statistics and specialize it. But if you're writing a reference, whose specialty do you choose?
 >Imagine if all the programming documentation you worked with was at the level of a 1980's first book on BASIC.mathematical maturity is the same as "code sense". when i started writing code a couple of years ago i would get cross-eyed reading large blocks of code i.e. i would get lost in the syntax and the abstractions and the idioms. at the same time i had pretty decent "mathematical maturity" i.e. i could read papers and textbooks pretty handily. comparing it seems obvious that formal mathematics, with its idioms, abstractions, and syntax is basically the same thing (without pushing the curry-howard isomorphism too far).
 Most computer science students get a set theory introduction and construction of natural numbers, rationals, and reals. Those with an interest in statistics get a foundation laid in measure theory. For other domains, what are good places to look for set-theoretic approaches? A set-theoretic approach to calculus has me intrigued.
 > A set-theoretic approach to calculus has me intrigued. Well calculus (and more generally, analysis) really does hinge on set theory. For example, points of accumulation for limits and hence differentiation, Riemann integration via the supremum and infimum etc are all set-theoretic ideas. Just generally encountered in a Real Analysis class and introductory calculus largely abstracts over the fundamental set concepts that underpin the mechanics.
 I believe this is due to historical reasons, in particular due to the Bourbaki and the French school of mathematics where abstraction was heavily prized. On the contrary mathematics in the Soviet Union had more of a focus on intuition and geometric grounding for mathematics, however history played its course and mathematics is taught more towards the Bourbaki style these days, however it did bring out gems like [1] which give an opposite, albeit extreme, view of how mathematics should be taught.
 > On the contrary mathematics in the Soviet Union had more of a focus on intuition and geometric grounding for mathematicsA lot of it comes from mathematics always being taught in conjunction with physics. Not sure what the roots of that are.
 It's difficult tho, the idea is the student should want to learn that they learn no matter how boring because the teacher/institution says so. I remember when I got introduced to matrix in high school, it felt pointless and stupid. Why would I want to transform matrix? I hated the entire thing, sure I could solve it but for what purpose? Then I learned I could use it for 3D engine, and I could load my object in a matrix and transform to move things around, that got my attention. However, a student that has no interest in 3D engines won't care. How then do you make it exciting? It's not enough to make the problem concrete, but the student will also need to be interested in the concrete application of the subject. Teaching is hard!
 Maybe it was taught in the wrong order. Matrix multiplication makes perfect sense if you see (a) that linear transformations can be represented as matrices, I.e A = f, B = g and (b) you define matrix multiplication so that the composition of linear transformations I.e. f(g(x)) = ABx gives the same result. That’s a pretty cool idea, that you can represent a function by a matrix and that you can compose those functions and the composition is the same as multiplying the matrices together. So let’s say f(x) computes the derivative of a polynomial, and you have a matrix representation for that, say A. Now what if I want the second derivative? That would be f(f(x)) or just AAx.I suppose I would tell a student that asks “why would I want to remember the stupid rule for multiplying matrices together in that way?” to try and figure out a rule for multiplying two matrices together such that you could represent functional composition by it, and they would self-discover the matrix multiplication rule and see why it would be useful to do it that way.
 It's not that it didn't make sense, it's that it was boring! Of what use is it? Sure, you have composition of linear transformations, but of what use is it again? If there's no use, it's boring. Pure math is like a puzzle, some people really love to solve the puzzle. The joy is in the solving the puzzle, and then for some people the joy only manifests if it's applied. Point of the original post being that math is harder for students to accept in its pure form if not applied.
 Matrix calc organically fits with systems of linear equations. Hard not to appreciate its expessiveness and beauty.If one was already exposed to the need of solving the linear systems, then the matrix calc becomes of a direct utility.
 A good teacher can make a great difference but as you say, the pupil needs some motivation - some reason to want to learn it.I took exactly one class of 2nd year physics and couldn't see the point of it. I walked out, quit physics altogether and came back the following year to do CS.
 Absolutely. As a private tutor, finding applicability and motivation for students was quite a challenge. I spent a lot of time reformulating problems in terms of things they might care about.
 I agree with you on the incomprehensibility of mathematics Wikipedia pages, even as a working mathematician I can really only read the pages about content I already know in-depth. But Wikipedia (and many other reference-type sources) are not the place I would go to learn a new topic in maths, I would always prefer to read a short book aimed at the correct level of knowledge, or a set of course notes (many of which are available online). I don’t think Wikipedia is representative of the teaching of maths at all.Many of the courses on very abstract content I’ve taken (from other people, in person) has been very concretely introduced, or at least the course has been run with a 50/50 split on the abstract (general theory) and the concrete (solving problems).
 > math is often not introduced in a concrete manner, as well as in a way such that its utility is obviousThis cannot be emphasized enough. Most people won't be doing any higher order math in their lifetime -- we should be teaching math literacy and "real world" math for the masses rather then pretending that every student is going into physics.
 And we should be clear here that the bar for improvement is verrrrry low. Take a look at the questions asked in the National Financial Capability Study, and the results:https://www.usfinancialcapability.orgThe survey is composed of five really-quite-basic questions about interest rates, inflation and risk assessment.As of 2018, 66% of survey participants get 3 or fewer of the questions right.
 AFAIK logo (the old turtle language) was born as a way to address this. Papert and his buddies wanted to turn thinking into semi tangible forms. More interactions, more inputs to feed your brain to think more.Also, maybe it's only me, but mathematicians often forgot their own culture. Or maybe it's due to the iconic status of the field during early education, making people never explain why they do the things they do. When you read about history of mathematics you see that problems were 1) very practical ones 2) first tricks were very natural. It is not a thing from the gods.. at least it didn't start like this.. it condensed into a diamond over centuries of refinements. But if you don't show that to the crowd, you lose 90% of the audience. It's a pity.
 I've found it helpful to use probabilities throughout my life. When picking a career, it's helpful to know the graduation rate and the likely salary if you do graduate to calculate the expected value of pursuing any particular career. I started playing poker and in order to be profitable, I learned the odds of various situations.Sometimes folks will be afraid of things, even though the odds are they'll be okay. Knowing the odds can give you courage where others find it difficult to go beyond fear.I've been a fan of Peter's AI book(s) since my university coursework. Glad he's introduce so many of us to these helpful ideas. Basic ideas can go a long way if you learn where they apply to your life.
 I graduated with a computer science major, and I've taking courses on linear algebra. It didn't really hit me about practical applications of linear algebra until I watched a video on youtube by Stuff Made Here[1].Which is funny because I totally recognize now how it can be applied, but moving from system of equations on paper to real world applications never really clicked in usefulness until it was explained in that video.
 > You can see this problem in most Wikipedia math pages.This gets mentioned so often that I feel it hints at a deep misunderstanding at what Wikipedia is or is supposed to be.Wikipedia is an encyclopedia. It's not a teaching resource, and it shouldn't be. (Nor, for that matter, is it a primary source, which makes it useless as a reference if you're writing a paper, unless you're specifically writing about Wikipedia.)Mathematics is such a ubiquitous language. Different people will require different kinds of mathematics, for vastly different purposes and in various levels of depth and formality, there's no way this can be unified into a single-size-fits-all resource. If you want to learn mathematics, now more than ever there is such an amazing breadth of text books, blog posts, lecture videos, online communities and so on that you can use depending on your very specific needs. I don't know why people go and try reading up on mathematics from Wikipedia, out of all places.
 Wikipedia is an encyclopedia. It's not a teaching resourceIsn't an encyclopedia a teaching resource? Etymology isn't destiny, but the "pedia" part is from Greek παιδεύω, meaning "child rearing". I'd interpret that as meaning "instructional".The "encyclo" part means "circle", which in this case alludes to "comprehensive". As in, it's not just one instructional resource, but a guide to everything.The word will grow and change over time, but I think most encyclopedias are still used as teaching resources. That doesn't mean Wikipedia has to be, but if it really wants to be an encyclopedia, I think it means the opposite of "shouldn't be a teaching resource".
 No, it's not. An encyclopedia is a reference (as you can, indeed, ironically find out by checking the Wikipedia entry for "encyclopedia").> Etymology isn't destiny [...]exactly.> [...] but I think most encyclopedias are still used as teaching resources.Of course, learning how to use a reference properly and effectively can be a goal in didactics, especially when you're studying a particular subject. But the reference work is not, by itself, a teaching resource such as a textbook.I don't understand what's so controversial about that. If I open a page about, say JIT compilation, I'm also immediately confronted with jargon. The fact that most people here would be familiar with that jargon doesn't mean that that page is of any use to someone who doesn't know what a compiler is. The same applies probably to hundreds of different subjects, be it biomedical, philosophical, etc.
 This is a super common issue that people are raising over and over again, but there's already a solution for this: split teaching into technical and non-technical. In some countries universities are split along this axis.Take statistics for example - most people that know something about statistics don't exactly know what a statistical space is (and it is a very precisely defined concept, embedded in the set theory). And that's fine, that's for the "pure" nerds. How to use it can be taught without defining the roots and proving theorems from the ground up. It is also how most of software development is done, few people out there that write code understand how CPUs fetch and execute instructions, talk to other perhipherials, how does malloc()_or sin() work, what is a page fault, or how to balance a red-black tree. Just use std::map or dict() or something, it just works :)
 > most people that know something about statistics don't exactly know what a statistical space is (and it is a very precisely defined concept, embedded in the set theory). And that's fine, that's for the "pure" nerds. How to use it can be taught without defining the roots and proving theorems from the ground up.It's a popular theory, but then one day you read every recent biology paper and notice that only 2% of them are able to do statistics in a way that isn't total nonsense.The only way for "learn how to use it, but not how it works" to work is if you get feedback whenever you make a mistake. That is not true when you use statistics.
 >how CPUs fetch and execute instructions, talk to other perhipherials, how does malloc()_or sin() work, what is a page fault, or how to balance a red-black treeliterally every single one of these things is taught in an undergrad class and understanding of which is deemed important by the community - curricula get lots of input from industry partners. so you're not making a great case for why people don't need to know what a sigma algebra is...>Just use std::map or dict() or something, it just works :)wouldn't it be swell if this is how we practiced medicine too? patient has early stages of atherosclerosis just do a triple-bypass or something, it just works.
 > you're not making a great case for why people don't need to know what a sigma algebra isI can assure you most people that do stats / data mining / big data / machine learning do not know or do not remember any more what a sigma algebra is.> wouldn't it be swell if this is how we practiced medicine too? patient has early stages of atherosclerosis just do a triple-bypass or something, it just works.C'mon writing websites is not surgery. There are some people with deep knowledge required in the industry as a whole, but you really don't know to know much to write an app or a website, especially an internal corporate tool. And this is where most working hours are spent.It's the reality of things. I mean just look at the OP link. Do you think this is targeted at people that already know what a sigma algebra is?
 this is such a tired old debate.1. fundamentals are important even if people forget them.2. not everyone in tech does web dev.these things are true and self-evident. the end.
 Let's end, indeed :)
 I suppose it's tough to balance detail with complexity when it comes to a general wiki like Wikipedia. Especially with math and its domain language.I believe that the issue you're outlining was the precursor to simple.wikipedia.org. Sorry... I couldn't come up with a math example on the spot but here's a good CompSci example:
 The role of university is exactly to teach you the language of math, or at least enough that you can get more by yourself. If your professors spent all the time talking only about concrete examples, you would be completely at a loss about why math is like it is, from the point of view of the untrained person it would be just cargo cult. So, math is hard, but the university has the responsibility to teach it (at least the introductory material) to as many people as possible.
 The problem is sticking to applicability in probability/stats only gets you so far. I'm in that rut having a firm grasp of how it is applied but not being able to actually apply it in the real world because the real world scenarios are all different.It's easy to make trivial mistakes without a firm grasp of the theoretical side in my opinion. It's important to understand the implications of the choices being made and the theoretical limitations of the models being created.
 I am not suggesting ditching all but applicability, I am suggesting that applicability be present. It's not an "either or."
 Neither am I. I just feel like it's not true that applicability is ignored. In fact I feel like books that demonstrate applicability are far more numerous than the theoretical ones.Set theory for example (since you mentioned it) is introduced in many probability books in an applicable fashion since probability is concerned with events which can be modeled as sets of outcomes. So while you might not find a book directly on "set theory applicability in the real world" you can find many introductions to it in a particular domain which are applicable.
 The beauty of Wikipedia is that you have the power to improve and make changes as you see fit.I know it’s a meme that edits to Wikipedia are fraught with editor politics and reversion of good faith changes made by users, but that hasn’t been the case for me lately. If you have something to add or improve, don’t let anything hold you back!
 I'm the odd counterexample, where proofs were what made math come alive for me. I suppose you could say that the utility of math is in furnishing tools for proofs, but that's probably not what most people mean. ;-)Utility was for physics, which I also majored in.
 If you define the probability function you'll need set theory within about 5 minutes.If you want to define the probability measure you'll need to be pretty comfortable with set theory.
 Shoutout for another great online MOOC : https://www.edx.org/course/probability-the-science-of-uncert... (It is the same as MIT OCW's 6.0.41)I did it as preparation for my Masters and it was genuinely helpful. Would recommend it to everyone looking to do a prob 201 before taking advanced-ish courses.
 I would recommend Harvard Stat 110 over MIT's probability courses - https://www.edx.org/course/introduction-to-probability (you can find full lectures on youtube and the book online)
 I found the EdX course too rushed (assuming you don't pay for the verification to get lifetime access). I like Bliztstein, so I instead used his Youtube playlist [0]. It has his full lectures. Also, his book is free to view [1].
 How come?
 Blitzstein is a good lecturer. I think I did this course back in the day.
 do you the link to youtube. Cant' find it; that why i gave up on the class.
 Everything I see from Peter Norvig is just always so incredibly well written and coded. I'd love to know what it's like to work for him.Every year looking at his solutions for advent of code [0] brings just so much learnings. Strongly recommend.
 > Everything I see from Peter Norvig is just always so incredibly well written and coded. ....I feel his skill of dividing a problem into small pieces and expressing them in code in a natural way is unparalleled. Most books/blogs/articles I see often focus on one of two patterns.The most frequent one is pulling in some dependencies and using a high level API, essentially skipping any real problem solving. Great when you just need a problem solved and are familiar with some framework/library but not that great for learning to program or problem solvingThe other one is a deep dive into a data structure, algorithm or performance tuning. This is great when studying theory or optimizing. These articles are more interesting but I haven't encountered many people who are in a position where this is relevant to day to day work.The missing pattern is one where Peters' work shines. The parts in between. All the libraries that are used in the first example I described are the result of someone taking the building blocks that result from the second example and applying them to a real world problem. Peter Norvig is my go to recommendation when someone is interested in becoming better at solving day to day problems because of this.
 The following data science book also does a great job of balancing problem solving with underlying theory. https://www.manning.com/books/data-science-bookcamp And it starts with sample-space probability problems in Python, much like Peter’s tutorial.
 The times I had a peek behind the curtain, he didn't expect to stop with the first version, even though it'd be good.(Maybe those Advent of Code solutions are the first working draft, I don't know.)
 This is a cute introduction to probability. However, I would've loved to see some mention of dependent events and continuos probability.
 I do think that there's some merit in sticking with probability on discrete spaces for a while. Once you start dealing with continuous spaces, soon you're talking measure theory and you can wade deep into the technical details and miss some understanding of what's going on. I go back and forth on this as I think it's largely down to the reader to figure out what works for them, but I think probability is one of those fields where developing intuition early on is a must if you want to go further.
 The actual requirement for measure theory is overblown. As long as you've taken single and multivariable calculus, you can study continuous probability without any problems and without even knowing what measure theory is.
 Agreed, not knowing measure theory never stopped me from computing a conditional expectation. Some courses and books overemphasize rigor in probability and, while it obviously has its place, I've seen newcomers to the field become obsessed with doing everything via measure theory. Further to your point, volume two of Feller is pretty light on measure theory IIRC.
 It does have a section on continuous probabilities at the end.
 Yes, In the appendix thanks.
 I've never seen Peter Norvig choose anything but the most elegant and perfect data model for the problem at hand. I wonder what it's like to be in his brain.
 I read portability and rush to the link waiting to hear about pip and all other mayhems and I landed on a math page. Boy I need my coffee.
 Similarly, I thought "from fractions import Fraction" required a `pip install fractions`. But nope, turns out fractions is a built-in module I've never heard of. Neat!
 Similarly, one of the most suprising things for me was that complex numbers are built into the language (no imports at all).
 This is some extremely stylish expository python (as most of the other comments are saying).
 any recommendations for probability as a foundation for cryptography?
 thanks

Applications are open for YC Winter 2022

Search: