The Man Who Invented Modern Probability 185 points by tchalla on Aug 22, 2013 | hide | past | favorite | 71 comments

> Kolmogorov armed a group of researchers with electromechanical calculators and charged them with the task of calculating the rhythmical structures of Russian poetry

And this is what "big data" and NLP were like before computers :). One can only imagine what he could have done with a modern computer. (Which is particularly hard to imagine, because he's responsible for advances which ultimately led to these same modern computers...)

Also, as an aside, I'm very impressed by the quality of nautil.us articles. I've come across a few randomly, and they've been uniformly high in quality, interesting, and insightful. Moreover, unlike much popular science/math reporting, the articles are not riddled with obvious errors and yet are easily followed by somebody not in the field. An impressive combination. (Not surprising for this particular article, I suppose: it was written by a professor teaching the history of mathematics at MIT.)
> And this is what "big data" and NLP were like before computers :)

As an aside, I recently interviewed a candidate for a "Big Data" position. He got quite chatty about Kolmogorov & measure theory, so I quickly cooked up an interesting problem to relax his nerves. I opened the interview with: "If the side of a cube comes from a uniform distribution between 1 and 5 inches, what's the expected volume of the cube?" I thought he'd give me the answer in 10 seconds & we'd move on, but instead he was completely stumped & just stared blankly. I asked him to use the whiteboard, but he simply drew a cube, wrote a^3, a~[1,5], and then went back to staring at it. Finally, after 10 minutes of pin-drop silence, my partner, a non-math guy, took over the interview & asked him some Hadoop-related technology questions, & he started speaking again.
If the side length isn't constrained to integers, the expected volume is 39 cubic inches.

In a bit more detail: take the integral of x^3 (the volume function) from 1 to 5 and you get 5^4/4 - 1^4/4 = 624/4 = 156. Then divide by the length of the interval (4) to get 39.

If the side lengths have to be integers, then it's just (1 + 8 + 27 + 64 + 125)/5 = 45 cubic inches.

Edit: corrected division mistake, thanks to dxbydt.

Edit 2: and corrected a second mistake, thanks to tzs.
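Both answers check out directly in Python (a small sketch of the arithmetic above):

```python
# Continuous case: E[a^3] = (1/4) * integral from 1 to 5 of x^3 dx = (5^4 - 1^4) / 16
continuous = (5**4 - 1**4) / (4 * 4)
print(continuous)  # 39.0

# Discrete case: side is 1, 2, 3, 4, or 5 with probability 1/5 each
discrete = sum(s**3 for s in range(1, 6)) / 5
print(discrete)  # 45.0
```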
> Edit: corrected division mistake, thanks to dxbydt

Well, one of the mistakes:

> If the side lengths have to be integers, then it's just (1 + 8 + 27 + 125)/4 = 40.25 inches

If I'm not mistaken, 4 is an integer, and so should not have been cruelly snubbed when you sent out invitations to your summation of the cubes of the integers from 1 to 5. Sure, 4 is not a cool integer like the others in 1 to 5... he's got a reputation as a square, but that doesn't mean he doesn't have feelings.
 Oy. Someone needs to revoke my math license. Thanks for the correction!
My, my, 2 mistakes on a silly uniform distribution! You must atone for this sin. Explain what the volume would be if the side came from a normal distribution over [1,5] (i.e. mean 3, sigma 2/3). Otherwise I will personally call the MAA director on my speed dial & haul your ass back to Stat 101 :)

But seriously, you write rather well on Quora & I'm sure the crowd would want to know what happens to the volume were you to choose a normal distribution instead of a uniform one, so I hope you respond.
From [0]:

> Let X be a random variable and g be any function. If X is continuous, then the expectation of g(X) is defined as E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx, where f is the probability density function of X.

For us, with mean 3 and sigma 2/3, f(x) = (3/2) e^(-(9/8)(x-3)^2) / sqrt(2π).

We need to integrate numerically; using Wolfram [1] we get an estimate of 30.8099 for the integral restricted to [1,5]. If we ask for the expected value over the entire distribution we get 31.
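For what it's worth, the untruncated answer has a closed form: the third raw moment of a normal is E[X^3] = mu^3 + 3*mu*sigma^2, which here is 27 + 4 = 31. A quick stdlib-Python sketch (the Monte Carlo seed and sample size are arbitrary choices of mine):

```python
import random

mu, sigma = 3, 2 / 3

# Third raw moment of a normal distribution: E[X^3] = mu^3 + 3*mu*sigma^2
exact = mu**3 + 3 * mu * sigma**2   # 27 + 4 = 31

# Monte Carlo check over the full (untruncated) normal distribution
random.seed(0)
n = 100_000
estimate = sum(random.gauss(mu, sigma)**3 for _ in range(n)) / n
print(exact, round(estimate, 2))  # both close to 31
```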
 156/4=39
 not 59, 39.
 Heh, failed at the simplest step. Thanks for the correction.
He was staring at it because he was looking for a shortcut that would make the problem obvious enough to be done in 10 seconds. If you don't provide context on such tasks, that's to be expected.
The shortcut is E[h(X)] = ∫ h(x) f(x) dx. So, math.
It's stories like this that keep me coming back to Hacker News.
 27?
Nope. To see where the subtlety lies in this problem, simplify it. Change it to a discrete problem: the side can be 1, 3, or 5, each with probability 1/3. The expected value of the side length is still 3. 1/3 of the cubes have volume 1, 1/3 have volume 27, and 1/3 have volume 125. The expected value of the volume is 51. Think about why it is 51, not 27, and you'll see why it is not 27 for the continuous case.

In general, if you have a random variable X, and some function of that random variable, f(X), then it is NOT true that the expected value of f(X) equals f(expected value of X).
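The discrete example in a couple of lines of Python, making the gap between E[X^3] and (E[X])^3 explicit:

```python
sides = [1, 3, 5]
e_volume = sum(s**3 for s in sides) / len(sides)   # E[X^3] = (1 + 27 + 125) / 3
cube_of_mean = (sum(sides) / len(sides))**3        # (E[X])^3 = 3**3
print(e_volume, cube_of_mean)  # 51.0 27.0
```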
27 would be correct if each side were an independent random variable, rather than all three being the same random variable. The way the problem is set up, you are supposed to solve for E[a^3] (the third moment of the random variable) rather than (E[a])^3.
 39
  from random import random
  sum([(random()*4 + 1)**3 for x in xrange(1000000)]) / 1000000
  # => ~39

Weird.
  import random
  sum(random.uniform(1, 5)**3 for i in range(10000)) / 10000

Many people don't know that the random module lets you sample from many distributions directly, without having to construct your own.
 I like the first version better because everyone knows what infix * and + do.
 I like the second version better because it taught me something I didn't know.
 I'd say 45, assuming the side length is an integer.
> One can only imagine what he could have done with a modern computer.

The historical anecdote my thesis advisor used to tell about the origins of his branch of science (I mean, the branch he founded) went along the lines of: "The Americans had computers and could get good numerical solutions, while we didn't have computers and thus were forced to work on an analytical solution." Well, that analytical solution he produced happened to be a great insight into the problem space, and it still continues to be the key tool for research in that area almost 60 years later.
I only know NLP as an abbreviation of the pseudoscience of neuro-linguistic programming. What does it mean in this context? Non-linear probability?
 Natural Language Processing.
 The treatment of the Paradox of the Great Circle in the article is not very good. If you're left wondering, Wikipedia helps:
tl;dr: Because great circles have measure 0, any treatment of probability on them must instead use limits that asymptotically approach great circles. Also because they have measure 0, such limits don't converge in a consistent fashion. As it turns out, the two most common limit constructions give different results for that particular problem.

The closest analogy I can think of is trying to assign a value to the expression 0/0. It doesn't have a value, but you can construct limits that appear to converge to it. One such limit is lim{x/x} as x approaches 0, which tends to 1. Another is lim{x^2/x} as x approaches 0, which tends to 0. Both of these constructions have a claim to being the value of 0/0, and they don't reconcile with each other.
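The two-limits picture is easy to eyeball numerically (a small Python sketch; x**2/x is one choice of a second ratio whose numerator and denominator both go to 0):

```python
# Two ratios that are both "0/0 shaped" at x = 0, yet converge to different values
for x in [0.1, 0.01, 0.001, 0.0001]:
    print(x / x, x**2 / x)  # first column tends to 1, second tends to 0
```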
My father was, academically speaking, a grand-student of Kolmogorov. He (Kolmogorov) was a very impressive mathematician. My father's advisor, Yakov Sinai, is equally impressive and arguably one of the founders of dynamical systems, a very interesting subfield of mathematics (and, in some ways, of probability theory) in its own right.
 Do you mean Kolmogorov-Sinai entropy? I just read about this in Chaos Theory Tamed by Garnett P. Williams. Really interesting stuff with respect to information theory in dynamical systems.
An interesting take on the Kolmogorov system can be found here: http://www-biba.inrialpes.fr/Jaynes/cappal.pdf

In it, E. T. Jaynes compares Bayesian statistics with Kolmogorov probabilities and finds them essentially identical in result, even though they work from very different first principles.
For anyone interested, the above appendix comes from E. T. Jaynes's [1] magnum opus, "Probability Theory: The Logic of Science" [2], which, IMHO, is deserving of its grand title and provides an excellent framework for using Bayes in the real world.

[1] http://en.wikipedia.org/wiki/Edwin_Thompson_Jaynes
I have no reason to particularly disbelieve the research that went into this article. But I'm just naturally skeptical when the author reels off a list of puns in the English language attributed to a group of Russian researchers. This seems unlikely to me. It seems like unnecessary color of dubious provenance.
 Why haven't I heard of this journal? Looks great.
 It's a startup! They started publishing about a month ago.
 They haven't been around too long, but for the two months I have seen, I have been delighted at the quality of their articles.
> For example, irrational numbers—those that cannot be written as fractions—almost surely have no pattern in the numbers that appear after the decimal point. Therefore, most irrational numbers are complex objects, because they can be reproduced only by writing out the actual sequence.

That's false. Pi is irrational, but there are many short algorithms to calculate its digits.
 The key phrases there are "almost surely" and "most irrational numbers." There are irrational numbers like pi which can be defined algorithmically, but taken together as a set they have zero Lebesgue measure. (In fact, they're countable.)
 I didn't know that. Any link to a proof?
 Consider the set P of programs that take no input and generate an infinite stream of digits. Each program in P has a finite length and is written out of a finite set of symbols, so there must only be a finite number of programs of any given length. That makes P countable. Let Q be the set of numbers described by programs in P. Each program from P describes exactly one number, so Q must also be countable. The set of irrational numbers is uncountable, so there must be an uncountable set of "complex" irrational numbers remaining after you take away all the "non complex" ones in Q.
Your proof seems only to show that a finite number of programs (as described by you) can produce a finite number of irrational numbers, while there are infinitely many irrational numbers. But we're surely not talking about a finite number of programs.

I'm not even sure what the Kolmogorov complexity of a "complex irrational number" means. If you need the sequence of digits and you cannot use an algorithm to produce the digits, then is the complexity infinite?
For each fixed length there are a finite number of programs of that length. If we use 8-bit bytes for the alphabet, there are 256^N programs of length N. The set of ALL programs is infinite (we don't limit the length of programs), but it is countable.

The "countable" part means we can put the set into one-to-one correspondence with the natural numbers. The correspondence starts with 0 mapping to the empty program, then 1-256 mapping to the programs of a single byte, then 257-65792 mapping to the programs of two bytes, and so on. This mapping will hit each program exactly once, and it will hit every program, because every individual program has a finite length.

Different types of infinity are not intuitive, so don't feel bad if the concepts are confusing. These issues troubled lots of very smart mathematicians for decades. The existence of irrational numbers was hugely troubling to the Pythagoreans, and the existence of the uncountable infinities discovered by Cantor was shocking [1].
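The correspondence described above can be made concrete. This is an illustrative Python sketch (the helper names are my own, and the "programs" are just arbitrary byte strings):

```python
# Bijection between natural numbers and finite byte strings ("programs"),
# showing that the set of all programs over a 256-symbol alphabet is countable.
def index_to_program(n):
    length, count = 0, 1          # 256**0 == 1: one program of length 0
    while n >= count:
        n -= count
        length += 1
        count = 256**length
    # n is now the rank among programs of this length; read off base-256 digits
    digits = []
    for _ in range(length):
        digits.append(n % 256)
        n //= 256
    return bytes(reversed(digits))

def program_to_index(p):
    offset = sum(256**k for k in range(len(p)))   # count of all shorter programs
    rank = 0
    for b in p:
        rank = rank * 256 + b
    return offset + rank

print(index_to_program(0))    # b'' (the empty program)
print(index_to_program(257))  # b'\x00\x00' (the first two-byte program)
```

Every natural number maps to exactly one program and vice versa, which is all "countable" means.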
In some sense, the existence of irrational numbers with no pattern in their digits is an illusory artifact of the number system. We can talk about them collectively because decimals are not required to have an end, and we have to postulate them to fill in the gaps in the number line, but such numbers lack any description or means of being separated as individuals.

But then, is any mathematical abstraction real? I guess it's all beside the point.
> But then, is any mathematical abstraction real? I guess it's all beside the point.

I actually think the reality of mathematical abstractions is hugely important because of... computer programs! In a real way, programs are the embodiment of mathematics. I want my programs to work, so I need the underlying math to work as well.

That's why I'm a constructivist. I reject the law of the excluded middle because proofs that use it don't translate into real programs; they translate into programs that ask an omnipotent oracle to decide which branch to take. Constructive proofs translate into working programs.

It also ties into philosophy. I am a skeptic, so when someone tells me either A or not-A must be true even if we can never know which one, I ask for proof or justification of that fact. The justifications I get are remarkably similar to logical "proofs" that god exists, and just as fallacious. This isn't to say that there couldn't be an ultimate truth about A, or a god, just that it is not logically necessary.
Interesting. I'm not all that up on philosophy, and I had to look up constructivism and the law of the excluded middle.

Computers can only deal directly with discrete math, so that eliminates quite a bit of mathematics. Moreover, despite computers being regarded as Turing machines, you could in reality count all the states a computer could be in, by counting the bits of memory, and you would find that it's a finite number. So computers are, in fact, only finite state machines. It's often overlooked that what makes a Turing machine a Turing machine is the infinitely long tape. So computers occupy a fairly small sliver of the infinite universe of math.

Philosophically, my worldview is like that of science: the way to know something is by making observations and formulating and testing hypotheses. What we can observe is limited, and hypotheses are only models that are tested by successive approximations. I'm not sure I understand what you mean when you say such a statement isn't required to have an ultimate truth. Presumably a statement like that has an answer, but the fact that something is knowable in principle doesn't mean there's any way to get the answer. For instance, the Hubble telescope can see faraway galaxies that we can never get an up-close look at. The question of whether there's life somewhere else in the universe must have an answer, but we can't know what's in those galaxies; even with a better telescope, we'd be seeing what they looked like a billion years ago. Many things will never be known.
 This is wrong. The square root of two was identified as irrational by the ancient Greeks, when considering the length of the diagonal of the unit square. The proof has nothing to do with decimal fractions (they didn't use decimal fractions at all).
 I'm referring to irrational numbers with no pattern. Square roots can be described and the digits can be enumerated by an algorithm. The diagonalization argument shows that there can't be a description for all irrational numbers.
 Just to be clear: you think that there are many irrational numbers that exist independently of which number system you use, but that there are infinitely many that do depend on the number system you use? Is that right?
Irrational numbers, by definition, include decimal numbers that have infinitely many digits after the decimal point, with no rules about what those digits have to be. This is powerful enough to represent any irrational number regardless of the number base. However, not all number systems have equal ability to represent irrational numbers: whatever the system, to represent all the irrationals it would have to be capable of going on forever.

I think the part of my point that you're asking about is my statement that they're a side effect of decimal numbers. Irrational numbers like sqrt(2) and e have concise representations as the limits of Taylor series. Only when converted into decimals do they appear to have infinite amounts of information. So if we used Taylor series as the number system instead of decimals, simple irrationals would look simple, and we might be less inclined to treat them the same as numbers that have no finite descriptions at all.
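For instance, e is recovered to machine precision from a handful of terms of its series (sum of 1/n!), even though its decimal expansion is patternless — a minimal Python sketch:

```python
import math

# e = 1/0! + 1/1! + 1/2! + ... : a short description of a patternless decimal
approx, term = 0.0, 1.0
for n in range(20):
    approx += term        # term == 1/n! at this point
    term /= n + 1
print(abs(approx - math.e) < 1e-12)  # True
```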
I was under the impression that irrational numbers are numbers that simply can't be represented as a ratio. This is independent of the number system used. I'm not familiar with the Taylor series: can you use it as a number system? Isn't it simply a representation of functions (in which case, you might as well just write "sqrt(2)" as anything else)?

It's tough to Google for "Taylor series number system", as you just get pages about the guitars. ;-)
 I don't know if you're still reading this, but... Using Taylor series as a number system is just a hypothetical. Any systematic way of describing numbers could be a number system. Irrationals can't be represented as ratios, but beyond that, some have other finite representations, and others have none.
 Take a look at this.
> In some sense, the existence of irrational numbers with no pattern in their digits is an illusory artifact of the number system.

But irrational numbers remain irrational in any integer number base. Therefore no, they aren't illusory at all, nor are they an artifact of the "number system".

> ... such numbers lack any description or means of being separated as individuals.

Also false. Many irrational numbers are easy to identify unambiguously. For example, the square roots of numbers that are not perfect squares are irrational, but they can be easily identified by squaring them.

While we're on the topic, the terms in the running sum of the odd numbers are all perfect squares:

  0 + 1 = 1
  1 + 3 = 4
  4 + 5 = 9
  9 + 7 = 16
  16 + 9 = 25

As is often true in mathematics, this is easier to explain with a picture: http://arachnoid.com/example/#Math_Example

> But then, is any mathematical abstraction real?

Certainly. Consider general relativity. It's a mathematical abstraction, and it's also real.
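A quick check of that running-sum pattern in Python:

```python
# Partial sums of the odd numbers 1, 3, 5, ... are exactly the perfect squares
total, sums = 0, []
for odd in range(1, 20, 2):
    total += odd
    sums.append(total)
print(sums)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```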
"Many irrational numbers are easy to identify unambiguously."

Absolutely. Cantor proved that an infinite number of others can't be identified at all, because they outnumber all possible descriptions. Some numbers can only be described by an infinitely long list of digits, one that can't be produced by a Turing machine and contains an infinite amount of irreducible information. The Kolmogorov complexity is infinite.

Newton's equations describe a mathematical model of the universe that agrees well with measurements taken under familiar conditions. Einstein's equations describe a different mathematical model, one that has good agreement with experiment over a much wider range of conditions than Newton's. But general relativity has well-known problems: its equations give nonsense solutions under some circumstances, e.g. singularities. Physical theories are models that predict the outcomes of experiments. They're reductionist out of necessity. It's unknown, and probably unknowable, whether a perfect model is possible, but there are likely things that can't be reduced.
 > Cantor proved that an infinite number of others can't be identified at all, because they outnumber all possible descriptions.I'm tempted to say that that definition places those examples in a unique set, thus at least unambiguously identifying the set to which they belong.
 This appears to be exactly the concept I was talking about. Maybe I read about these numbers some time ago and forgot that they already had a name.
You can identify the set of numbers with infinite Kolmogorov complexity. But you can't separate out an individual from the set.

Turing machines might not capture all numbers that can be described, but, interestingly, descriptions and Turing machines have the same cardinality.
 The set of irrational numbers which can be defined algorithmically are programable, eg. they are defined by a program of finite length.
 >The set of irrational numbers which can be defined algorithmically are programableYou mean "computable".
There's only a countable number of formulas of ZFC. Therefore, there's only a countable number of ways of writing out a formula of ZFC that uniquely specifies a real number. Therefore, only countably many real numbers exist that have such definitions. And all countable sets of reals have zero Lebesgue measure.
Additional question: what's the Kolmogorov complexity of an irrational number which is not described by a ZFC formula? And how is it described?
 I haven't really studied algorithmic information theory, but I'd assume that Kolmogorov complexity isn't defined for uncomputable/undefinable numbers. Or maybe it's defined as "infinite," but either way, my guess is that such numbers are simply ignored. (Interestingly, the function which takes a computable number and outputs its Kolmogorov complexity is itself uncomputable!)
The article at least implies that the Kolmogorov complexity of uncomputable irrational numbers is larger than that of computable irrational numbers.
It reads to me like the writer heard someone say "almost all" and didn't realize that it has a special mathematical meaning:
 I thought this was going to be about Pierre Simon Marquis de Laplace who wrote A Philosophical Essay on Probabilities in 1814.
 I am not a statistician, but if I found myself in an infinite forest, I would probably get drunk, too.
 Discovered, not invented.
 > Discovered, not invented.This refers to a philosophical debate that asks whether mathematics is part of nature or an artificial invention of man. I think the debate has begun to lean toward mathematics being part of nature, in which case yes, the proper term is discovered.
Interestingly enough, this notion that there is only one type of probability was debated before Kolmogorov by people like Ramsey and Keynes. Now the subjective interpretation of probability is called Bayesian statistics, which is not a helpful term.
 How is this not about Francis Galton?!
For those not in the know, Galton invented the simplest way of understanding the normal distribution. Here's a copy you can check on your phone: www.intromath.ca/aakkozzll

A quote from Galton: http://www.aakkozzll.com/docs/order/galton.htm
