Is Probability Real? (arameb.com)
220 points by EbTech on Nov 28, 2020 | 218 comments



Since this topic isn't so well-known, I wrote the case arguing that frequentist interpretations don't work, but algorithmic information theory (Kolmogorov complexity) does. I want to make this accessible and persuasive, so thoughts, questions, and arguments would be appreciated!


A simple sentence that I've found useful for pedagogy: "a 50% probability for that coin toss does not talk about the coin; it talks about you, and about your partial knowledge of the universe."

You can add: "The coin toss itself is deterministic and the result can be computed if you know the initial position and speed." They will inevitably bother you about the physical impossibility of measuring the starting position and speed exactly, and then you say "ok, forget about the coin. You have 5 white and 5 black balls inside this opaque cylinder. What's the probability that the top ball is white? This does not talk about the balls (the color of the top one is already determined) but about your partial knowledge of them".

(EDIT: formatting)


> a 50% probability for that coin toss does not talk about the coin; it talks about you, and about your partial knowledge of the universe.

But it does talk about the coin - a weighted coin would have a different probability. Same in the example with the white/black balls - if they weren't 5 white and 5 black but 6 white and 4 black, the probability you would assign to the top one would be different. Again, the probability is a way to describe the balls themselves, not just our knowledge.

I get the general idea of representing probability as uncertainty and partial knowledge but your statements strike me as just straight up incorrect.


Thought experiment: you're presented a jar, and offered the chance to bet ten bucks on drawing a white ball. You know nothing about the content of the jar. What odds would you take?

Now, you see ten white balls get added, then are blinded while other balls are added (maybe). You estimate the jar can't hold more than about a hundred balls. What odds would you take?

Now, you see ten white and ten black get put in, and saw it was empty before. What odds?

Now, you see ten white and fifty black, but the whites are larger, and you get to draw a ball. What odds?

The difference between the second-to-last and the last is the missing information we usually think of when we talk about randomness being missing information.

And you'll see that the previous scenarios don't change anything about that.


This is still a partial knowledge situation - you have the information that there are a certain proportion of colored balls in the chamber, but not the information about their order. The probability includes the information we do know and allows inferences about information we don’t know.


Sure, I just can't agree with the parent statement that probability has nothing to do with the object it describes, which is demonstrably false.


It has to do with your knowledge of the object. To that extent, it has something to do with the object it describes.


They didn't say that it has "nothing to do with the object it describes" though. It has to do with your knowledge, which itself has to do with the object your knowledge describes.


"The probability of that coin toss being 50% does not talk about the coin." I think there is a subtle equivocation going on here.


Well, it's getting philosophical now. We do not have a way to experience the true nature of the coin. We can only experience an "image" [0] of the coin and we summarize all "images" of that object into knowledge.

[0] By "image" I mean not just vision, but also hearing, touch, and other modes of perception.


I do not think there is anything very philosophical here, just the point that the claim in the post I was replying to was demonstrably false.

I am intrigued by the idea of the coin having a "true nature" that we have no way to experience; I would like to know what this elusive "true nature" is, but if we cannot experience it, I don't suppose you can tell me. Instead, I will settle for an explanation of how you know it has such a true nature.


I don't think it's demonstrably false: If you don't know that the coin is weighted, the probability is 50%. Probabilities are predictions and estimates, not fundamentally about the thing itself, but about what we know about the thing.


The principle of indifference? I know that it is a commonplace assumption, but it feels to me as though one is assuming one has more information than is justified. Coming back to the article's "economist's wager", is it rational to bet with even odds on something you know nothing about? If the assumption is interpreted as a testable hypothesis about outcomes, why would complete ignorance imply any particular result? On the other hand, if it is interpreted strictly as a statement about one's knowledge, why present it exactly as if one had sufficient knowledge of the situation to know that the probability is 0.5? Maybe the author will have an answer in part 2.


> If you don't know that the coin is weighted, the probability is 50%.

No, if it's weighted it's not 50%. Your prior probability is 50%, but neither a Bayesian nor a frequentist would claim the true probability is known before testing.


There is no such thing as "true probability" in the Bayesian interpretation; it only exists in the frequentist world.

Notice that it is possible to build a robot which flips a coin in such a way that it's always heads. Sure, you might need to build a different robot if the coin is "biased" (you probably mean its weight distribution is uneven), but it's still possible.


But that’s the thing: the “true“ probability is unknowable, and may even be an ill-defined concept. It is a deterministic process, so “probability“ is just a simplifying concept to describe our best guess belief about how the coin behaves in the aggregate.


The true probability requires an infinite sequence of tests, so it's by definition unknowable. But it's what any sort of statistics attempts to approximate.


But once the approximations are within undetectable differences of the "true probability", you are done.

Because not only is the true probability unknowable, it is also unencodable; but if we are to accept a limitation on encoding, then we CAN give a true probability subject to that.

Like... if we are to determine a coin's probability to 1 decimal place, then we can do that.


> the true probability

Interesting expression.

After testing, it turned out that you flipped it and it landed on heads.

Does that mean that you've discovered that the "true probability" for that flip should have been 100% heads?


Of course not. If you're a frequentist you can say your best estimate is 100% heads with an unknown variance, and if you're a Bayesian you work out p(a|b) = p(a)p(b|a)/p(b) and update your priors (which will not give 100% heads). The more coins you flip, the better you can estimate the true probability.
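To make that concrete, here is a minimal sketch (my own construction) of the Bayesian update for the coin's heads-probability; the uniform Beta prior is an assumed choice for illustration, not something stated above.

    # Hypothetical sketch: Bayesian updating of a coin's heads-probability,
    # using Beta-Binomial conjugacy. Beta(1, 1) is my illustrative prior.
    def update(alpha, beta, heads, tails):
        """Posterior Beta parameters after observing some heads and tails."""
        return alpha + heads, beta + tails

    alpha, beta = 1, 1                                   # uniform prior over the bias
    alpha, beta = update(alpha, beta, heads=1, tails=0)  # the single observed head
    print(alpha / (alpha + beta))                        # posterior mean: 2/3, not 100%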


IMO (I should have defined this) the true probability would require an infinite sequence of tests to determine.


Ok, so let's imagine you build a servo-driven flipping machine to carry out an infinite series of tests and notice, after a thousand of them, that 99% of the time, the coin flip matches the orientation of the coin when it's loaded into the machine.

What have you learned about the coin's true flip probability?


You've learned that the system of coin + machine has resulted in the same orientation 99% of the time. You can put some error bars on that, investigate the differences (did that 1% where it changed happen disproportionately with a certain side of the coin up?) and from that provide an estimate for whether the coin is fair. If the confidence intervals aren't small enough for you, you can do more experiments. The confidence interval will never be 0 until you've done an infinite sequence of trials. (Only axiomatic logic can have confidence interval 0, and it doesn't make statements about the real world, only about the axiomatic system in use.)
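For a rough sense of what those error bars could look like, here is a sketch using a normal approximation; reading "99% of a thousand" as 990/1000 is my illustrative assumption.

    from math import sqrt

    # Illustrative only: a 95% normal-approximation interval for an observed
    # 99% match rate over 1000 trials. Proportions this close to 1 deserve a
    # better interval (e.g. Wilson's), but the idea is the same.
    matches, trials = 990, 1000
    p_hat = matches / trials
    half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / trials)
    print(p_hat - half_width, p_hat + half_width)  # roughly (0.984, 0.996)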


So, let's say that we continued the servo tests 1e99 times, with the coin loaded in each orientation equally. We measured 50.00% flips for heads and tails, and continue to see the 0.99 correlation with the initial orientation. The 1% of the time that the correlation doesn't match, it doesn't seem to show any bias for one side or the other.

So after an "infinite" number of tests, we continue to get 50.00% frequency of heads, but with an 0.99 correlation with the orientation when loaded into the machine.

Now I load a coin into the machine and ask you to name the true probability that the result is heads. I don't tell you the initial orientation, but I know it privately.

What's the true probability of heads? Our testing found precisely 50.00% frequency of heads. But are you still sure the probability is an intrinsic property of the system, rather than a property of your state of knowledge of the system?

We can continue the pattern; maybe the 1% error itself correlates to 0.99 with someone running the microwave in the kitchen. This drops the line voltage and causes the servo to impart a little less momentum to the coin, causing it to flip one fewer time on average. Neither of us has currently checked that the microwave is running... And so on...


But what do the error bars themselves mean? Are they not probabilistic in nature themselves?

Say you conduct a thousand trials and calculate the error bar based on the results. If you conduct a hundred such experiments (each consisting of a thousand trials) and one of the experiments violates the error bar, does that invalidate it?


> What have you learned about the coin's true flip probability?

Nothing because you only tested the coin flip machine in aggregate. If you have a different throwing mechanism the results could be completely different.


When I flip it with my hand, aren't I testing the coin-hand system in aggregate?

If nothing is flipping the coin, then what coin flips are we making predictions about?


But what if the apparent probability after n trials does not converge as n grows arbitrarily large?

If we observed such a system in nature, what would its "probability" mean?


I like the balls-in-a-container example better - the mechanics of determinism are more obvious, as is the contrast between perfect and partial knowledge.

And there's another interesting point - you could view "there's five white and five black balls" as your model. If in reality, there's one white ball and nine black - then your math is still right, but your model is wrong.

If you do experiments with the wrong model (assuming 50/50, getting samples from 1/10), your best conclusion would be that the model is wrong.

But in many settings, you'd end up declaring that your container or the hand used to pull out the balls has magical powers (and, to borrow from Douglas Adams, go on to prove that black is white and get killed on the next zebra crossing).
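As a hedged illustration of the "model is wrong" conclusion (my own numbers, drawing with replacement rather than from a ten-ball urn): the data quickly crushes the 50/50 model.

    import random

    # Sketch: draw from an urn that is really 1/10 white while the assumed
    # model says 5/10 white, and track the likelihood ratio of the two models.
    # Parameters are arbitrary choices for the demo.
    random.seed(0)
    true_p, assumed_p = 0.1, 0.5
    ratio = 1.0  # P(data | assumed 50/50) / P(data | true 1/10)
    for _ in range(100):
        white = random.random() < true_p
        ratio *= (assumed_p if white else 1 - assumed_p) / (true_p if white else 1 - true_p)
    print(ratio)  # collapses toward zero: the 50/50 model explains the data poorly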


> But it does talk about the coin - a weighted coin would have a different probability.

Nobody would a priori assume it is a weighted coin, in this example. A coin is chosen because it's been a standard weight and measure-backed object for centuries. It is about the observer's knowledge, which includes assumptions from every day experience.

You have to base a prior on assumptions. If you assume nothing and flip it 1000x, and calculate the probability, then you base that on assumptions of your own flipping ability, and hand wave it away with count divided by trials.


>Again, the probability is a way to describe the balls themselves, not just our knowledge.

Probability is about the fact that you don't know everything about the balls themselves. If you could make 100% predictions then your knowledge of the balls would be equivalent to the balls' description.

Why do you even use "describe the balls themselves" to describe this situation? From the perspective of probability you just set an upper bound to how much knowledge you could possibly have about an object, it's still knowledge.


I think this is just semantics: you can read the parent comment as describing a hypothetical, fair coin with a 50% chance for each state, or as saying that a god-like entity could reduce this to a deterministic computation by replaying the coin flip exactly (eg. same ambient air conditions, position of the coin on the finger, exact sequence of muscle nerve activations..), and that because that's impossible for you to do, you just assign the minimum-information-content probability (50%) to the coin flip.


Forget the coin and the balls — does the nucleus decay? You're not missing any knowledge; there isn't any.


This is definitely the most interesting example, but it's not obvious that a situation where the relevant information is fundamentally inaccessible is a situation where you aren't missing any information.

It's your best bet for a scenario where you can be sure that nobody else has more information than you do, though.


It may be deterministic, but are you sure it would be computable? One does not necessarily imply the other.


As a caveat, while this intuition works for classical mechanics, it does not work for quantum mechanics. All observations are consistent with wave function collapse being fundamentally random. Any hidden variables would need to be transmitted many times faster than the speed of light (~10000x, last time I checked the experiments), and are therefore inconsistent with our understanding of special relativity.


Note that pilot wave theory, an (out of vogue) interpretation of quantum mechanics, also recasts the apparent randomness in quantum mechanics as due to our ignorance of the exact state of the pilot wave.

Even Einstein struggled with quantum mechanics, famously saying "[God] does not play dice with the universe".


Wigner's friend might have something to say about this...

I don't have a specific argument to make here, only the feeling that if it were all just a matter of what a given observer knows, no-one would be talking about there being a QM measurement problem.


This is a very good point!

In fact, some people do argue that there is no measurement problem in the Copenhagen formulation of quantum mechanics to begin with – at least if you take it seriously and strictly go by the rule that the laws laid down by Bohr et al. only concern you as the observer and your knowledge about the system, and not the system itself. Following this train of thought, there is nothing "real" about the wavefunction and it is just a tool to come up with predictions. The same goes for the collapse of the wave function (which just describes a change in your ability to predict future measurements, and not a change of the object) and the term "measurement" (which we might as well replace with "enlightenment", i.e. the moment in which we obtain knowledge about the system).

In that sense, the only difference between classical and quantum mechanics is that our knowledge (viewed as a mathematical quantity) behaves differently in both theories: In classical physics, when we conduct multiple measurements of a given system in a row, our knowledge about that system will increase – to the point that, once we have measured all system properties to sufficient accuracy, we'll be able to predict what any future measurement of any of those properties will yield (again, with some predictable uncertainty). So the knowledge of all our measurements has added up, it is an additive quantity.

In QM, this is fundamentally different: We can only know anything about the object the very moment we look at it. The rules of quantum mechanics (again, in the very strict interpretation laid out above) dictate that the second we conduct a measurement, we can forget about any knowledge obtained through previous measurements of other (conjugate) observables: Future measurements of those observables are inherently unpredictable. In that sense, our knowledge about quantum-mechanical objects never "adds up" to anything. (To see that this is really the distinguishing feature between classical and quantum mechanics, recall that the existence of conjugate observables really is the only thing setting apart the quantum from the classical world: Without conjugate observables it would be impossible to distinguish, say, 100 electrons in a superposition of spin up and down from an ensemble of 100 electrons of which 50 are in a spin up state and the other 50 are in a spin down state.)

Of course, this whole interpretation is very unsatisfactory to lots of people (myself included) for a whole bunch of reasons. I assume that, to a large degree, this is due to the fact that laws of nature that put human observers in their very center seem rather undesirable. (At least since the time we switched from a geocentric to a heliocentric view of the world.)

But my impression is that there's another reason: Our intuition from classical mechanics & statistics has taught us that objects exist independently of us as observers and behave in a deterministic fashion, at least provided we as observers know enough about them. (Meaning that the more we know about the coin's initial position and velocity, the more likely we are to predict the outcome of the coin toss. If we don't know anything about the coin, though, the outcome is as unpredictable as measuring spin up/down in quantum mechanics.) Unfortunately, this whole line of argument is circular: The reason we believe that the existence of physical objects is independent of us is precisely because knowledge in classical mechanics is an additive quantity and we can get to the point where we know "enough" to come up with deterministic predictions. That is, we never have to discard knowledge when running new measurements and so our knowledge takes on an independent "role" – which we call reality.


This is basically the Bayesian interpretation of probability.


Saying that "the probability of a coin toss of 50% talks about you" is not an interpretation of probability. Saying that we are "50% sure" is also not an interpretation of probability. It's a nonsensical statement. It's like saying we are "50% angry". It doesn't really mean anything.


I don't understand your claims that these statements are meaningless. They are commonly uttered and understood.


I can understand expressions such as "pretty sure" or "completely sure". I do not understand the expression "to be X% sure". If someone says they're "37% sure" tomorrow will rain, what does that mean exactly?


Can you understand expressions like "more sure of A than B" or "as sure of A as of B"? Then, they are as sure that tomorrow will rain as they are sure that, throwing three dice, the sum will be 9 or less (37.5%).


It's clear that 37.5% sure is 0.5% more sure than 37% sure. The problem remains how to interpret these numbers.


You're asking about the interpretation of a statement such as "I assign the same probability to events A and B"?

That would mean that both are equally likely as far as that person knows.


No, I'm not asking that.


> If someone says they're "37% sure" tomorrow will rain, what does that mean exactly?

When someone says they're "37% sure" tomorrow will rain they mean that they assign the same probability to "tomorrow will rain" that they do to "if you throw three dice you'll get 9 or less" or "when you threw three dice you got 9 or less". In the second case the event is either true or false already and there is no uncertainty for you, their probability assignment is their best guess with the information they have.
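(For reference, the 37.5% figure for "9 or less with three dice" can be checked by enumerating the 6^3 equally likely outcomes; a quick sketch:)

    from itertools import product

    # Verify the 37.5% used above: the chance that three dice sum to 9 or less.
    outcomes = list(product(range(1, 7), repeat=3))
    favourable = sum(1 for dice in outcomes if sum(dice) <= 9)
    print(favourable, len(outcomes), favourable / len(outcomes))  # 81 216 0.375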


The question is what is probability according to the subjective interpretation of probability? The answer usually given is that probability is a degree of belief. Thus, a probability of 37% means that you're 37% sure that some event will take place. What I'm saying is this definition is meaningless unless you define what it means to be X% sure about something, but the definition of "being X% sure" must not rely on the notion of probability because "probability" is what we are trying to define in the first place!


And what I try to explain is that one way to define what it means to be X% sure about A is to say that

- you put a number on it p(A)

- which is between 0 and 1

- and allows you to compare how sure you are about different things p(A) and p(B)

This number can be used to compute how sure you are about composite things:

p(A or B) = p(A) + p(B) - p(A and B)

p(A and B) = p(A given B) p(B) = p(B given A) p(A)

That number p happens to correspond to the notion of probability, but it has not been defined using a pre-existing notion of probability: https://en.wikipedia.org/wiki/Cox%27s_theorem
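To spell out one consequence of these rules that comes up again below: A and not-A are exhaustive, so p(A or not-A) = 1 (certainty), and mutually exclusive, so p(A and not-A) = 0. The first rule then gives 1 = p(A) + p(not-A) - 0, i.e. p(not-A) = 1 - p(A). In particular, judging A and not-A equally plausible forces p(A) = p(not-A) = 0.5.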


What do you mean "you put a number on it"? What number? If the number is arbitrary, which is what your explanation suggests, it cannot mean anything.


It's not completely arbitrary; it represents the degree of plausibility you assign to the event.

These numbers have to obey some rules if you require that a set of beliefs is consistent.

The number you assign to the plausibility of A and the number you assign to the plausibility of not-A have to sum to 1.

If you think A and B are equally plausible, you have to put the same number on them.

If you think that A and not-A are equally plausible, you have to assign the number 0.5 to both.

If you put the number p(heads)=p(tails)=0.5 as your degree of plausibility that the coin I just flipped (I actually did it!) is showing heads or tails, it's not an "arbitrary" number. It means that you think both (exhaustive) outcomes are equally plausible. Why do you say it cannot mean anything?


It seems to me that if a probability is a quantity representing a degree of belief, and it's only meaningful in relation to another quantity representing another degree of belief, in the sense that we can only say that being X% sure is being more sure than being Y% sure, if X > Y, or equally sure, if X = Y, then such a quantity only has an ordinal value, which is to say the quantity itself is meaningless. For it to be meaningful it has to have an interpretation that does not always refer us to another degree of belief. It also must not refer to a "degree of plausibility" since this is just another expression for "probability", and probability is what we are trying to define.


I'm not sure I understand where you see a problem.

In the example of the degree of belief (between 0 and 1) that you have that the coin on my desk is showing one face or the other, don't you agree that the right numbers that represent your indifference are 0.5 and 0.5? The quantity itself is not meaningless.


You say indifference, but indifference with regards to what? In economics, an individual is said to be indifferent between two alternatives if those alternatives result in the same level of utility for him or her, utility being an abstract concept representing well-being. But you don't say why this person is indifferent to the coin showing one face or the other. Because if it's because he or she thinks both options are equally likely, then we again have a problem, since we don't know what "likely" means.


At what point does the following argument derail for you?

0) I tossed a coin, it lies flat on my desk

1) You have some degree of belief about the statements H:“the coin shows heads” and T:“the coin shows tails”

2) You want to quantify that degree of belief

3) You postulate that you can put a number on your degree of belief about some statement A with the following properties:

3a) p(A) is between 0 (false) and 1 (true)

3b) p(A or B) = p(A) + p(B) - p(A and B)

3c) p(A and B) = p(A given B) p(B) = p(B given A) p(A)

4) p(H) + p(T) = 1

5) Unless your degree of belief about H is higher than your degree of belief about T

or your degree of belief about T is higher than your degree of belief about H ...

6) ... it follows that p(H) = p(T) = 0.5


(in reply to @kgwgk's comment https://news.ycombinator.com/item?id=25313531)

The argument is flawless, the problem is with the interpretation.

> p(A or B) = p(A) + p(B) - p(A and B)

How does one add degrees of belief and what sense do we make out of the result?

> p(H) = p(T) = 0.5

Sure, two equal quantities representing degrees of belief must mean the degrees of belief are of the same magnitude. But what about P(H) = 2P(T)? What does it mean for one degree of belief to be twice as large as the other?


>> p(A or B) = p(A) + p(B) - p(A and B)

> How does one add degrees of belief and what sense do we make out of the result?

That's how we postulate [1] that the numeric representations of degrees of belief are added. Doesn't that look like a property that you want a numeric representation of degrees of belief to have?

If you have some degree of belief about A, some degree of belief about B, and you believe that A and B are mutually exclusive, wouldn't you want the number representing the degree of belief of "any of them" p(A or B) to be the sum p(A)+p(B)?

>> p(H) = p(T) = 0.5

> Sure, two equal quantities representing degrees of belief must mean the degrees of belief are of the same magnitude. But what about P(H) = 2P(T)? What does it mean for one degree of belief to be twice as large as the other?

Consider p(H or T) = p(H) + p(T) = 2 p(H) = 2 p(T). Isn't it natural to quantify the degree of belief that I got any outcome with a number that is the sum of the numeric representations of the degrees of belief that I got each outcome?

Or say that, instead of flipping a coin, I toss two of them. They're lying flat on my desk right now. The number of heads up is 0, 1, or 2.

How would you describe your degree of belief about the statements "X=0: there are no heads", "X=1: there is one" and "X=2: there are two"?

Wouldn't you say that your degree of belief about "X=1" is of the same magnitude as your degree of belief about "X=0 or X=2"?

Wouldn't you say that your degree of belief about "X=0" is of the same magnitude as your degree of belief about "X=2"?

Wouldn't that make the numerical representation of your degree of belief about "X=1" twice as large as the numerical representations of your degrees of belief about each of "X=0" and "X=2"? (Where you assign numbers to degrees of belief using the representation we're discussing.)

p(X=1) = p(X=0) + p(X=2) = 2 p(X=0) = 2 p(X=2)

[1] in fact I think this is what we get from postulates which are a bit more general, but for the sake of the discussion we may stay in this level
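One way to arrive at the same numbers by brute enumeration, assuming (for illustration) that you judge the four possible arrangements of the two coins equally plausible:

    from itertools import product

    # Enumerate the two-coin example: X = number of heads showing on the desk.
    counts = {0: 0, 1: 0, 2: 0}
    for coins in product("HT", repeat=2):
        counts[coins.count("H")] += 1
    print(counts)  # {0: 1, 1: 2, 2: 1}, so p(X=1) = 2 p(X=0) = 2 p(X=2) = 0.5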


Addition is required by the axioms of probability, not by the interpretation of it. When interpreting probability as a degree of belief this property is not only not useful, but is particularly troublesome, because adding and subtracting degrees of belief doesn't appear to make a lot of sense.

All in all, to me it's clear that these degrees of belief are a theoretical construct, not an empirical reality. I don't think people assess the truth value of a statement on a continuum from truth to false. This is not how the human psyche works. Personally, no, it's not natural for me to have a degree of belief (in the way that you have defined them) about a statement, and I have no idea how to interpret arithmetic operations involving these "things".


37% of the time that someone says they are 37% sure of a statement X, the statement X is true (assuming they're calibrated correctly, etc.).
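A rough sketch of that calibration reading (the forecasts below are made up purely for illustration): group statements by stated confidence and check how often they came true.

    from collections import defaultdict

    # Made-up forecasts of the form (stated confidence, did it come true?).
    forecasts = [(0.37, True), (0.37, False), (0.37, False), (0.9, True), (0.9, True)]
    by_level = defaultdict(list)
    for p, outcome in forecasts:
        by_level[p].append(outcome)
    for p, outcomes in sorted(by_level.items()):
        print(p, sum(outcomes) / len(outcomes))  # well calibrated if these roughly match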


Of course. But if you pronounce a fancy word like "Bayesian", there are a large number of minds that shut irremediably.


That's also why we call it QBism [1] instead of quantum Bayesianism.

[1] https://en.wikipedia.org/wiki/Quantum_Bayesianism


There seems to be a lot of quibbling about the simple sentence here, but I find it clarifying. Discussing coin tosses is a thought experiment with a very practical physical analog, so spelling out clearly what the thought experiment's actual subject is has value: you don't get lost in the weeds of the physical execution of flipping coins.


How does the ball situation help? It has the same problem: the physical impossibility of measuring how the balls were ordered. (Modelling someone's brain?) I guess the argument works on someone with an unscientific model of the human brain, but that's one step forward and two steps back.


Somebody just put the balls there carefully, and did not tell you in what order, just how many of each color.


That still boils down to a lack of prior information which I don't think removes the argument for "I don't have enough information."

You'd probably have to use actual quantum phenomena, which behave probabilistically by definition, if you want a currently irrefutable physical example.

I'm personally not convinced even this is fundamentally probabilistic, and we currently have to rely on probability theory as a crutch for complex behaviors we just don't quite understand yet or don't have the time and resources to compute.


> "I don't have enough information."

My point exactly. Probability theory is a precise mathematical formalization of the concept of "not enough information".


You don't need quantum physics to formulate a philosophical standpoint that happens to agree with the Copenhagen interpretation.

I'm not sure if it helps, but I suppose a compromise here would be the assumption that you don't really know your own starting configuration, which is why you naturally draw probabilistic inferences, such as that the sun will come up tomorrow like every day.

If that has a biological explanation, then the top comment was not just appealing to the elusive argument of Platonic ideals.


I would have liked to know what the Kolmogorov approach has to say about the examples used to deflate frequentism ("which of my friends will start a business?"). I don't see from the article how the "smallest-program" approach could say anything useful about those, either. Maybe that wasn't the point--but after poking holes in the frequentist view, it uses unrelated examples like digits of pi to illustrate the Kolmogorov idea, so I'm left unable to directly compare the kinds of statements the two approaches can make.

Even going back to dice or coins would have helped me compare them. Like, I know frequentists can show how the variance in coin-toss outcomes decreases as the sample size increases. What can the smallest-program approach say about that? Or was the point that those variant outcomes aren't "real" enough to talk about? Does that mean there is a connection to constructivism in mathematics here?

It seems either approach benefits from more data, and there must be a concept related to a "confidence interval" where, as 100, then 200, then 300 digits of pi roll in, your pi-program stays the same size while other programs have to keep growing to accommodate the new data. Like, the ratio of the smallest program to the naive encoding ought to say something about how potentially predictive the small program is.

Thanks for the interesting article. It definitely made me think about the issues, and now I'm curious to know more about the topic.


One aspect of the Kolmogorov approach is that it implicitly models things like biased probabilities through compressibility.

The shortest representation of rolls of fair dice or coins is their exact results, but if there's "less randomness" in some way (biased coin/die, sum of two dice which means non-uniform probabilities, combination of some predictable pattern with random noise) then there are more compact representations of that information, and all of that gets captured by the Kolmogorov approach without any explicit handling of the various possibilities.
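A rough, hedged illustration of that point, using zlib as a (very crude) stand-in for Kolmogorov complexity; the sequence length and the 90% bias are arbitrary demo choices.

    import random, zlib

    def packed_flips(n_bits, p_heads):
        """Pack n_bits simulated coin flips into bytes, 8 flips per byte."""
        bits = [1 if random.random() < p_heads else 0 for _ in range(n_bits)]
        return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, n_bits, 8))

    random.seed(0)
    fair, biased = packed_flips(80_000, 0.5), packed_flips(80_000, 0.9)
    print(len(fair), len(zlib.compress(fair, 9)), len(zlib.compress(biased, 9)))
    # The fair stream is essentially incompressible; the biased one compresses
    # noticeably, reflecting its lower entropy per flip.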


Thanks! I hope to better address your concerns in Part 2.


OK - so apply Kolmogorov complexity to election polling.

How does that work out?

I think you're confusing various possible maps with the territory in a less than useful way.

Given that frequentist interpretations are approximations - and understood as such - and Kolmogorov complexity isn't computable at all, what problem have you solved here?


Hm I admit it's hard to talk convincingly about election prediction, since we don't have practical algorithms to do this; a lot of it comes down to human judgment.

The philosophical point (which might be approximated algorithmically someday, or by intelligent minds today) is that your election probabilities should come out of an overall highly compressed model of the world. In theory, a Bayesian who uses the prior 2^-K(x) over all strings x should, with sufficient life experience, come up with good estimates, in a certain sense.

I'll have to think about this example more carefully when fulfilling my promise of writing about how this theory relates to everyday decision-making. Thanks for pointing out a potential weakness :)


The article is excellent, congratulations.

A couple observations/questions.

1) You didn't comment on the bayesian viewpoint that probability reflects a subjective idea about the state of the world. One might argue, for example, that probability isn't measurable, and that therefore, strictly speaking, a statement about the objective probability of an event isn't meaningful. Experimental evaluation would have to be done on an entire model instead. Do you have any objections to that point of view?

2) I don't find the case about Kolmogorov complexity to be actually convincing, at least not as per the requirements the rest of the article sets. "3141592..." could pass as either "random digits" or "first digits of pi". As for the fact that it's highly unlikely a true RNG would have generated exactly those digits: we are back to a frequentist argument there. It's likely I'm missing something; could you elaborate more or give me a pointer?


Isn't that what the Occam's razor argument was for? Sure, both an RNG and the "40 digits of pi" program can produce that output, but the "40 digits of pi" program is shorter, therefore having less Kolmogorov complexity.


Is it really shorter?

For all I know this is a false dichotomy. True RNGs don't exist, for one, in a deterministic philosophy.

RNGs are by definition non-deterministic, but algorithms are deterministic--by the definition that I learned. We use floating gates picking up cosmic background radiation or the fallout from radioactive elements to come close to real randomness. XKCD's butterfly joke applies.

Computable numbers have a computable complexity. For a non-deterministic program, the fair comparison to a true RNG would be a program that outputs all digits of pi up to infinity, I suppose. Vice versa, I'm not sure a pseudo-random number generator couldn't be fit into the equivalent Kolmogorov complexity of the first 40 digits of pi.

Are you saying it couldn't be? On the one hand, it is the case that pseudo RNGs simply rely on intractable complexity, which must regularly supersede 40 digits of pi (equivalent to a 40-byte key), as opposed to your 4kb RSA key chains and whatnot.

On the other hand, I think your argument is akin to the gambler's fallacy: I have absolutely no clue, but I will hope it will not have been a primitive RNG, right?

For reference, here's a simple pseudo RNG taken from TinyPTC example code to produce a TV-noise graphic in a loop:

        /* Body of the TinyPTC noise loop: update a simple shift/xor feedback
           generator and write a grey "TV noise" pixel from its low 8 bits. */
        noise = seed;
        noise >>= 3;
        noise ^= seed;                  /* noise = seed ^ (seed >> 3) */
        carry = noise & 1;
        noise >>= 1;
        seed >>= 1;
        seed |= (carry << 30);          /* feed the mixed bit back into the seed */
        noise &= 0xFF;                  /* keep an 8-bit grey value */
        pixel[index] = (noise<<16) | (noise<<8) | noise;    /* same value in R, G and B */


Yeah, but is it? They are both programs with logarithmic Kolmogorov complexity, and Kolmogorov complexity is only defined up to additive constants unless you commit to a computational model. If you do commit to one, which is the shortest is just a function of the specifics of the model you chose, which isn't really interesting; it's literally code golf at that point.


In order to discriminate between the output of an RNG and the digits of PI, you are right that they have the same Kolmogorov Complexity to within additive constants.

However, you can ask for resource-bounded Kolmogorov complexity. Suppose you ask for the shortest length of the description of a PTIME machine which outputs the N bits of an RNG versus one which outputs the first N bits of pi. Complexity theorists believe that the first will be longer. Proving that conclusively might possibly have a bearing on P=BPP.

When you go all the way down to finite-state machines, KC becomes something equivalent to finite-state information-lossless compression, like Lempel-Ziv compressibility.

Long story short: by restricting the resources available for the computation, it is possible to discriminate among some such examples as you point out.


> They are both programs with logarithmic Kolmogorov complexity

In the case of true random numbers, how is that so?

Very few random sequences can be generated by a logarithmic-sized program, since most strings are not significantly compressible [1]. A simple counting argument shows that: there are 2^n strings of n bits, but only 2^(lg n) = n logarithmic-sized strings, a much smaller number!

[1] http://theory.stanford.edu/~trevisan/cs154-12/kolcomplexity-...
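To make the counting slightly more precise: there are at most 2^k - 1 programs shorter than k bits (at most 2^i of each length i < k), so fewer than 2^(n-c) of the 2^n strings of length n can have Kolmogorov complexity below n - c. That is, the fraction of n-bit strings compressible by even c bits is below 2^-c, and programs of length about lg n can account for only on the order of n strings.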


1) I rather like the subjective view! Bayesians used to struggle to justify a choice of prior, but it turns out that 2^-K(x) is universal in the sense that it never falls below a constant factor of any given (semi-)computable finite (semi)-measure.

2) Sorry, I should have clarified that the programs are deterministic. So if you want to use an RNG, you also have to supply a string of random bits that cause the RNG to output forty digits of pi.
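(Returning to point 1 and stating it symbolically, as my paraphrase of the standard domination result: for every lower-semicomputable semimeasure mu there is a constant c_mu > 0, depending on mu and on the choice of universal machine, such that 2^-K(x) >= c_mu * mu(x) for all strings x. So this single prior is never more than a constant factor more "surprised" by the data than any computable alternative would be.)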


Why is frequentism bad because it only gives certainty for infinite samples, but complexity is good despite being non-computable?

It's two sides of the same coin -- computable uncertainty vs non-computable certainty.


I don't think frequentism is "bad"; just insufficient as a gold standard interpretation of probabilistic claims. I liked an analogy from the reference by Rathmanner & Hutter: the most "correct" chess-playing program involves a complete search along the tree of possible games. In practice, we try to approximate this ideal.

In the case of Kolmogorov complexity, a reasonable takeaway might be to use the shortest program that we're able to find, even if it's not the shortest overall.


Some thoughts on the composition:

* "I wrote the case arguing that frequentist interpretations don't work, but algorithmic information theory does": if that's what you are up to here then I think it would be for readers if you stated that up front in some way. And hit me with some kind of summary at the end that makes the concise version of your argument at the end, it's a long article.

* Shorter might be better: There's a lot of stuff in here that I think you can pare out in the probability discussion that maybe isn't adding that much to your argument. I think there is a lot to be gained by assuming a generous reader.

* Betting might be a distraction to your point: This might be confusing the imperfect knowledge of participants in a market with the imperfect knowledge of all the physical forces involved in a physical phenomenon and how that relates to the seeming "randomness" of a coin flip for your reader. (The liquidity and stuff.. this is just not related to your point.)

* Don't undermine your point with unrelated assumptions: "I imagine they wouldn’t consider their world unlikely at all: they would just add a new law to their description of physics: all dice, as if by divine intervention, are deemed to exhibit this strange behaviour" led me to think that you were just sort of shooing away the whole last X decades of high- vs. low-energy physics. We collectively certainly don't think that we have the rules correct, precisely because of this complication; we find the idea that we need 2 sets of rules improbable and believe that there must be a way to explain everything with a single set of rules. So your mythical dice society probably would consider their dice exception a very unlikely world.. they would be confident they have the world wrong!

A dubious assertion (or at least one that would need a whole lot of explanation) can be an off-ramp for a subset of readers.


Thanks for the detailed critique! I'll take some time to think about how to better make the points that I wanted to convey with those sections.


There were a few points that I, as someone unfamiliar with many of the ideas presented, got hung up on.

First, the paragraph that begins with "At first blush, the requirement to use..." seems to be a non sequitur. I don't fully understand how the previous section creates a requirement to use deterministic programs, so I could use more explanation on how that requirement is established.

Second, a very simple concrete example of what one of these programs would look like would be immensely helpful. After re-reading the article a bit I have a mental image of a program that contains a long, compressed string and a decompression algorithm that somehow models the system you're interested in. I can imagine how you might get a useful interpretation of probability from the decompression system, but there are enough open questions there that I'm not sure I have the correct interpretation.

Hope that helps!


Thanks. I should clarify that the computer is deterministic, so as to avoid building randomness into the definition of randomness!

I skimmed over an example too quickly, but your intuition is about right. For that sequence, two possible programs are:

- Compute and print the first 40 digits of pi (see the sketch below).

- Decompress the following string according to a Shannon code with probabilities (1/36,1/18,1/12,[etc]): [insert code]
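Here is one hedged sketch of what the first of those programs could look like (one arbitrary implementation among many; nothing above commits to this particular method): Machin's formula, pi = 16*arctan(1/5) - 4*arctan(1/239), evaluated in fixed-precision decimal arithmetic.

    from decimal import Decimal, getcontext

    getcontext().prec = 50  # a few guard digits beyond the 40 we want

    def arctan_inv(x):
        """arctan(1/x) from its Taylor series, to the working precision."""
        eps = Decimal(10) ** -getcontext().prec
        total, term, n, sign = Decimal(0), Decimal(1) / x, 1, 1
        while term > eps:
            total += sign * term / n
            term /= x * x
            n += 2
            sign = -sign
        return total

    pi = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    print(str(pi)[:41])  # "3." followed by 39 decimals: 40 significant digits

What matters for the complexity comparison is only that such a program is much shorter than writing the digits out verbatim.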


Such an interesting post, thank you for sharing! I'm in my second year of a maths degree currently, and we obviously studied frequentist probability/stats in the first year, but I'm not taking any probability modules this year. I found the tone & accessibility were just right for me :)


My only thought is that discovering a workable definition of "scientific method" is a reach. Philosophers spent a century searching for such a thing, in vain.

On the other hand, providing something that just works would be beneficial enough, even if falling short of the philosophical holy grail, so it's worth pursuing.

I'm a physicist, and physicists have always wondered why math works so well in physics. There's this famous essay by Eugene Wigner:

https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.htm...


It sounds like you are arguing that i.i.d. frequentism doesn't work. I view AIT as generalizing frequentism to non-i.i.d. time series. This is formalized as Martin-Löf tests for randomness, and Solomonoff induction.


This is awesome! Good work!

Are you going to touch on Chaitin's Omega?


Thanks! :)

I wasn't planning to go there! While I enjoy the idea, for now I'm trying to focus on what's needed to make sense of the problem of induction. Is there a nice connection that I missed?


> The scientific method only works because the rules of the universe happen to be simple, while the set of observations it offers is vast. Kolmogorov complexity captures this defining characteristic of our reality.

I've never seen this spelled out so beautifully!


It is a funny statement; I like it, but for a different reason: all we know about reality is our theories. How could we state that the rules of the universe are simple? As I see it, we could state that our theories are full of simple rules. But the universe has no theories or rules outside of the human mind. They are completely our inventions, our dreams, our hopes that the universe has some rules.

We could state that our simple rules work, but what does it mean "to work"? For example, a spider does not see reality like we do: it feels vibrations of its web, runs to the source of the vibrations and starts to bite and to wrap the intruding object with web. It would do this to a tuning fork, if you pressed one against its web. Its simple rules of reality work, though, despite the fact that sometimes the spider bites the steel of a tuning fork without any benefit to itself.

How could we know that our theories are not just an extended version of the spider's, with the same issues, like making us do something absolutely pointless? How could we evaluate that? By asking our theories? But our theories already predicted that this pointless thing we do would be a good thing. We might ask our theories again and we'd get the same answer.

This statement seems like a tautology to me. Our rules are simple because they are simple. Our theories work because they tell us that they work.


I think there is a disconnect between the "rules of the universe" that the quote talks about and the "our simple rules" that your comment focuses on. Without getting too philosophical, I think we can agree that there are "rules" how the universe works (which is what the quote talks about), which are not necessarily the same as the rules that our theories postulate (what you talk about), although we of course try to get closer and closer.

That we can only observe, talk about and know reality in our subjective ways does not mean that there is no underlying mechanism by which reality "works". The quote points out that this "mechanism" is apparently sufficiently simple that we can effectively form and test hypotheses about it.

For example, we have no all-encompassing explanation of the universe that concludes R = U/I for electric circuits. Yet we can observe it to be accurate independently from the infinity of conceivable influences - there appears to be no influence on this observation from your lunch, the day of the week, whether your car is green or somebody was just born in Taiwan. We can't explain why these have no influence. We could imagine a reality where all of these (and infinitely many others) are confounders, in which case we could not effectively form theories about these rules. Yet the actual number of things influencing R = U/I observations in our reality are evidently very finite, allowing us to identify them and build our theories.


> Without getting too philosophical, I think we can agree that there are "rules" how the universe works...

Well, maybe I am too philosophical, because I feel reluctant to agree. I accept the scientific method because I know nothing better. But I doubt the idea that there is no possibility of inventing something better.

> we have no all-encompassing explanation of the universe that concludes R = U/I for electric circuits

I'm not a physicist, so I won't argue with that. But somewhere deep inside I see R = U/I as a tautology too. R, U and I were defined in a way that makes R = U/I true. There is some information about the universe encoded in this rule, but there is also information about us in there. How do we separate the information about the universe from the information about us?


Science has an empirical aspect that you're ignoring here. Our theories are tied to nature through experiment. They are more than just inventions of the human mind. People come up with new ideas about nature all the time. Most of them are wrong, and we find that out by doing experiments.


Yes. If we take a look at the social sciences struggling with the experimental method, we'll see that they get better with time. When experiments do not work, people somehow figure out how to do experiments better, and it changes things. It gives hope that the experimental method could falsify itself, so when it stops working we will be able to notice it. So if we do not see it fail, we can assume that it works.


I disagree that the rules just happen to be simple. Thinking in dynamical systems or evolutionary terms, a language and mental model of the universe just won’t stick around unless it’s useful. Imagining that the universe happens to be horribly complex, but has some emergent phenomena that happen to be simple, we’re almost inevitably going to think in those simpler terms and go “wow the universe is simple!”

Then if the foundation underlying those simple emergent concepts turns out to be horribly complex, maybe we get stuck with the foundations at some point. Or maybe simplicity is relative and a visitor from a different hypothetical universe would be astounded at all the shit we have to simulate because the answer isn’t just obvious.

Like how macroscopic objects are an emergent phenomenon, but they’re simple. Maybe the universe at a fundamental level follows some ultra convoluted string theory. Doesn’t matter for us, the same way that flipping a coin is 50-50 regardless of questions about foundational physics.

It’s taken a long time to develop the hierarchy of physical and mathematical concepts in which we can usefully describe much of the universe as simple. And those concepts, the best we have so far, still don’t tell the whole story.


> if the foundation underlying those simple emergent concepts turns out to be horrible complex

That is extremely unlikely. The simplicity of the universe is not just an artifact of our best theories, it appears to be baked into the very structure of those theories. If our current theories are even in the ballpark, then there are very few places that hidden complexity could possibly hide.

Take quantum randomness, for example. It used to be thought that the apparent randomness could just be papering over our ignorance of some hidden underlying mechanism, but it turns out that we can prove that this is not the case. We can eliminate entire classes of theories based on finite observations, and one of the classes that we can eliminate (with very high probability) is theories with high Kolmogorov complexity.


I assume you’re talking about the Bell experiments. Imagine that it’s the nonlocality side of things that ends up being correct. In that case, a full foundation of QM could be a nightmare.

Regardless, we don’t have a full unified theory of everything, so it seems premature to say we know how complex it’ll end up needing to be.

Also, the universe isn’t just a time-evolution differential equation. It’s that, plus all the initial/boundary conditions. Saying the Kolmogorov complexity of the universe is small while only looking at the side that’s simplifiable seems circular.


Not everything that is true is simple or catchy. Biasing towards it is a trap.

PBS SpaceTime has a good discussion on this topic.

https://youtu.be/xFKgIOX8IRE


Neat video, but it specifically disagrees with your point. He advocates for applying scientific rigor between leaps of intuition. There's nothing wrong with GP appreciating beauty.


My point isn’t contradicted by the video. My point is not "beauty should not be appreciated". My point is that we should not dismiss models of nature based on aesthetics. Nature has no preference on what we see as beautiful. The video makes the point that beauty should be treated as a guiding principle, not as a hard and fast rule to sniff out truth.


> The scientific method only works because the rules of the universe happen to be simple

Are they simple though? That's not the impression I get from physicists, eg, the famous Richard Feynman quote: If you think you understand quantum mechanics, you don't understand quantum mechanics.

We haven't even managed to find a unifying theory yet, but the current contenders, like Loop Quantum Gravity are anything but simple.


> The scientific method only works because the rules of the universe happen to be simple

It's simple in the sense that the universe seems to be the kind of place where inductive reasoning mostly works. For the most part we seem to be able to expect things like predictability and causality at the scale of the objects and processes in our everyday life. If it wasn't like that, things would be a lot more complicated.


As I understand it: The rules are simple, but not necessarily intuitive.


Exactly right. The rules of quantum mechanics are in their essence not much more complicated than high school algebra. It is the logical consequences of those rules that are hard to wrap your brain around.


I have a couple of books on quantum field theory and it looks a lot harder than high school algebra.


Go read a book on number theory some time. It looks hard too. But it is about nothing more than the properties of the natural numbers, which any grade schooler can understand.

Likewise, QM looks hard, but at its core it is little more than linear algebra, which any high-school math student can (or at least should be able to) understand.


Everything can be broken down to simple easy to understand little bits. The complexity comes from there being many, many of these little bits.


Yes, that is exactly my point. The core of QM consists of very few bits.


I've always taken the Feynman quote to mean: just because you understand the rules doesn't mean you understand the implications, which are varied, vast, and counter-intuitive.


Because others already questioned the first part, I’ll question this:

> the set of observations it offers is vast

Actually, in an experiment there is always only one observation at a time. That we group multiple observations together is kind of arbitrary and relies on the hope that the experimental conditions are the same and therefore one experiment is analogous to the next.


Yes, that's exactly the heart of the quote to me: There is sufficiently little complexity in the causality of observable behavior that with sufficiently controlled and repeated experiments/sampling, we can uncover regularities and patterns in it to fuel our predictions (and hypothesize about these causalities).

Does the reading of your multimeter depend on the digits of Arnold Schwarzenegger's phone number? Do you have to repeat the experiment if his phone number changes? Indeed, we assume that this is not an "experimental condition" to take into account. There is no way to determine this a-priori, and one could conceive of a universe with an arbitrary amount of such strange influences. But we do not appear to live in such a universe, which is why we get to apply Occam's Razor.


> Does the reading of your multimeter depend on the digits of Arnold Schwarzenegger's phone number? Do you have to repeat the experiment if his phone number changes? Indeed, we assume that this is not an "experimental condition" to take into account. There is no way to determine this a-priori, and one could conceive of a universe with an arbitrary amount of such strange influences. But we do not appear to live in such a universe, which is why we get to apply Occam's Razor.

I think this is not true for every experiment. For example if you measure the polarization of a photon it will have the same polarization in any subsequent experiment, no matter how hard you try to reduce complexity.

Also, my comment was rather directed at the fact that any experiment has a unique outcome. In that sense we can’t have perfect control over an experiment, since at least time must have passed between subsequent measurements, such that the experimental conditions are different.


The patterns in our observations appear simple, the rules are deeper: e.g. newtonian motion vs relativistic.


Probability is not real. Probability is subjective, it depends on what you know, and everyone has a different set of things they know.

I flip a coin and look at it, then ask two other people for their probabilities. One of them knows the coin is biased towards heads such that it's twice as likely to land heads as tails on any given flip. The other knows nothing about the coin.

The first person guesses a 66% chance of heads. The second guesses a 50% chance of heads. I, having seen the coin, say it's a 0% chance of heads.

None of these probabilities are wrong. They're all correct given the set of knowledge that person had. Probability is subjective.


Probability is relative to the information you have. That doesn't make it not be real.

Probability is something we invented to describe systems, based on what we know about them, when we have imperfect information. If we had perfect information about the coin, its surroundings and how it was flipped, we could tell which side it will land on, but we don't. (Ignoring that there are some quantum things that physicists say can never have perfect information.)


Generally we use the term real to refer to objective claims and consider subjective claims not to be real. You can certainly use the term real in a broader sense if you'd like. I don't think the concept of reality is particularly useful or coherent, but to the extent others use the term, probability is not real in the sense it's being used in.


Nitpick on definition of "real" and "objective" follows. Proceed at your own risk :-)

Would you agree that given exact info of conditions one can objectively assign concrete probability? That would give objectivity to it.

Definition of "real" for daily life is hard (impossible?). I would say "probabilities" are as real as other useful constructs that impact our lifes. Examples would be "countries" and "laws".

Is country X real? (what if it is recognised only by 1 UN nation?) Is law real? Is law real in <insert-country-in-turmoil-with-very-very-selective-enforcement>.


I could say "I have a job!" and that statement would be true, but only with the information I currently have available (what if my boss has already decided to fire me but I've just not received the information yet?). So the statement being true is based on my information about the world, just like probability.

Are particles on the quantum level real or just mathematical constructs that describe what we can observe, just like probability is?


Sure, it's as real as those other examples, i.e. not particularly except insofar as it affects peoples' expectations.

I don't think "exact info of conditions" is objective or coherent. This pre-supposes there's an external universe with exact conditions that produces our experiences, but there's no way to distinguish this possiblity from other possibilities (like a multiverse, for instance.) This lack of ability to distinguish, in my ontology, makes the distinction meaningless. Yours may differ.


Your example suggests to me the opposite: Probability is a real and objective way to describe the information you have. "Garbage in, garbage out" still applies when you have no information.


The distinction between objective and subjective collapses under your usage.


No, the distinction is clear. If your assigned probability is something that someone else can reproduce using the same steps given the same information, then it is objective (yet contextual). Your examples each have a clear reasoning behind the assigned probabilities, they're not just opinion-based assertions.


Bob has a very simple algorithm to output probabilities. He just answers 50/50 for any yes or no question. This is reproducible. Is this objective?


This doesn't strike me as a good example of reproducibility in this context. Let me offer another one. Let's go back to your example of the 66% weighted coin. Given the physical properties of the coin, different people could independently come to the same conclusion that the probability of heads is 66%. I would describe this as an "objective" probability, as it's a nice representation of the available information, independently reproducible by different people, given the same information. It's different than "Bob arbitrarily decides that any yes/no question has 50/50 probability", which is inherently subjective.


No two people ever have the same set of information. And very few cases are even as clean as the coin case.

A more typical example is using polls to predict elections. 538's model ended with Biden around 90% to win. Andrew Gelman's model at the Economist ended with Biden around 95%. Do either of those represent objective probabilities?

Or take weather predictions. Per https://www.metaculus.com/questions/4617/will-2020-be-the-wa..., Berkeley Earth gives a 16% chance to something that NOAA gives 29.2% to. Is either of those an objective probability?

I would say no, and I think that just because two people happen to agree on a number in a particular case doesn't make it objective. If you want to use the word objective, I don't have any particular objection. I'm not here to fight over words, and none of these words are really well defined enough to be worth fighting over. I don't think it's useful to think of probabilities as "real" in any sense.


Yeah, that's objective. It's just not a great model.



It sounds like your argument is actually stating that probability is in fact a real attribute of physical systems, but the accuracy with which you can predict a probabilistic event is dependent on the amount of information you have about the system. Interestingly, someone only observing results of a probabilistic event will eventually learn to predict events with an accuracy approaching ideal (i.e. the true probability distribution of the system - or someone who knows every single detail about the causal system down to the limits of physical reality - subatomic particles and all of the underlying physics we don't understand yet).

There's that phrase again - "true probability" (implying the word "True" == "Real"). You could imagine a system that yields non-linear results, such that no matter how many observations were made, your ability to accurately predict outcomes in the system never increases. In this case, you might conclude that you're dealing with one of two special cases: 1.) a non-probabilistic event (i.e. a purely random (0% predictable) event) or 2.) on the other end of hyperbole, a purely deterministic event (100% predictable).


>Interestingly, someone only observing results of a probabilistic event will eventually learn to predict events with an accuracy approaching ideal

This relies on various unstated assumptions.

I would deny the assumption that an event can be probabilistic in nature. There are only ever predictions. The distinction between a probabilistic world and a deterministic world is incoherent.


If quantum reality is objective, then probabilistic reality is objective.

Deterministic probability like Chaos or second law of thermodynamics imply the existence of incomplete information. This type of probabilistic reality might be subjective.


I don't think that's a coherent concept. It's not clear how one can assert that a particular model of reality is "correct" outside of any predictions than it makes.


How do you resolve the problems raised by Bell's theorem?


Bell's theorem rules out local realism. I view realism as incoherent, so I'm certainly not committed to local realism.


You said the distinction between a probabilistic world and a deterministic one is incoherent. But if there is no local realism, wouldn't it be more accurate to describe your position as saying all events are probabilistic rather than none of them?


No. All events being probabilistic is still realism, just in a weaker sense than Bell.


Yeah, I'm grasping at straws because I don't really understand what your position is from the things you said in this thread.


Two things.

First, probability is real: it's a construct in one's mind, and minds are just as real as dice or coins. https://www.lesswrong.com/posts/f6ZLxEWaankRZ2Crv/probabilit...

Second, (and the author may be leading up to this), there's Solomonoff's theory of inductive inference, which he has proven complete: when we apply Occam's razor (where the prior probability of each possible theory drops exponentially with its size), the amount of error a perfect Bayesian makes as they observe each event and bet on the next one, ad infinitum, is finite. It's roughly proportional to the complexity of the simplest theory that correctly predicts the whole sequence of events. It's one of the most convincing proofs that Bayesian reasoning works. https://en.wikipedia.org/wiki/Solomonoff's_theory_of_inducti...

There's just a little snag. Perfect Bayesian reasoning is impossible to compute, so us mortals have to resort to approximations. Just as perfect certainty isn't possible, perfect reasoning is not attainable. Oh well.
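For reference, the bound I have in mind is (as best I recall it) Hutter's formulation of Solomonoff's completeness result for binary sequences, where mu is the true computable (stochastic) data-generating process and M is the universal mixture built from the 2^{-|p|} prior over programs p:

    \sum_{t=1}^{\infty} E_\mu\Big[ \big( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \big)^2 \Big] \;\le\; \frac{\ln 2}{2}\, K(\mu)

That is, the total expected squared prediction error over the infinite sequence is finite, as long as the true process has finite complexity.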


> the amount of error a perfect Bayesian makes as they observe each event and bet on the next one, ad infinitum, is finite.

There's a little assumption that you're leaving out, namely that the Kolmogorov complexity of the data generating process is finite. From Wikipedia:

> expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process

Whether the universe (or our observations of the universe) have finite complexity is very much an unresolved philosophical question.


> Whether the universe (or our observations of the universe) have finite complexity is very much an unresolved philosophical question.

Every time there was a significant advance in physics, it tended to go towards simplification and unification. Geocentrism required epicycles. Then Kepler came up with his ellipses. Then Newton unified celestial and terrestrial laws. Maxwell & Einstein allowed us to view time as a less special dimension than we thought it was…

I won't presume about the initial state of the universe, to the extent such a notion is even meaningful. But the fact that it is governed by mathematics, and relatively simple maths at that, sounds likelier and likelier every quarter-century.

And I'm not even talking about everyday life, where we can observe in practice that the simplest theories about who ate the last cookie (little Mike, who lives in the house) are more often true than the more outlandish ones (magical imps, which we never witnessed).


> Every time there was a significant advance in physics, it tended to go towards simplification and unification. Geocentrism required epicycles. Then Kepler came up with his ellipses. Then Newton unified celestial and terrestrial laws. Maxwell & Einstein allowed us to view time as a less special dimension than we thought it was…

Unification, maybe. Simplification, no. That's evident if you just scroll down the list of Nobels in physics. You even mentioned Einstein, but I don't know how you could claim general or special relativity are simpler than Newtonian physics.


> I don't know how you could claim general or special relativity are simpler than Newtonian physics

Careful there! You cannot compare both theories in isolation from observation. Newtonian theory fails to match observation if high velocities or big masses are involved.

In order to "fix" that using just Newtonian physics, we're back to figurative epicycles.

Taking observations into account, SR is simpler than Newtonian physics in that it has a greater predictive power.

Remember that if you come up with something simpler than SR it also has to match observation at least as well as SR.


If quantum is really random, then our universe has infinite Kolmogorov complexity.


Not quite.

Under the many-worlds interpretation, when you send a photon through a half-silvered mirror, the universe splits into one version where the photon goes through, and one where the photon doesn't. This is all very deterministic.

What the researcher subjectively observes, however, is another matter. If the universe splits, so does the researcher. The problem of observing outcomes turns into an anthropic problem: if I split myself into two copies, which copy am I likeliest to find myself in? I'm not sure making bets about that even makes sense: which copy I find myself in has no bearing on the final state of the universe.


Most observers will find themselves in universes with infinite Kolmogorov complexity.


Care to explain why?


Most bitstrings are not compressible, so if we look at all the observers across the splitting universes watching a continuous quantum coin flip, most will observe a coin flip sequence that is not compressible, so will observe a constant increase in Kolmogorov complexity.
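For anyone who wants that counting argument spelled out: there are 2^n bit strings of length n, but strictly fewer than 2^{n-k} descriptions shorter than n-k bits, so roughly

    |\{\, x \in \{0,1\}^n : K(x) < n-k \,\}| \;\le\; \sum_{i=0}^{n-k-1} 2^i \;=\; 2^{n-k} - 1

Hence at most a 2^{-k} fraction of length-n strings can be compressed by more than k bits, and a typical branch's record of quantum coin flips keeps growing in complexity roughly linearly.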


Is Newtonian mechanics real? Well, it's not strictly real, but it's apparently real enough for engineering cars, planes, and rockets.

So even if quantum is really random, I'd bet an unbiased coin will still land on heads with 50% probability every time.


No, just because a process is probabilistic, it does not immediately follow that it has infinite Kolmogorov complexity. For example the probabilistic effects could cancel out.


Quantum is not the same as random; this analogy breaks down quickly as you try to explain the experimental results.


Vitanyi -- one of the authors cited in the linked piece -- has a paper where he and his colleagues show that even though ideal compression isn't knowable/computable, you can show that two encodings approach the ideal differentially with some computable probability (I'm being vague and handwavy because it's been a while since I've read this literature carefully). That is, if you have two encodings A and B, you can compute some probability that A is closer to ideal than B.

It's kind of a Kolmogorov/algorithmic complexity analogue of a p-value.

I think this literature/inferential paradigm in general is far less known than it should be.


> is real: it's a construct in one's mind, and minds are just as real

This is a weird thing to say. Not all mental constructs are "real" in a meaningful sense.


No mental construct can exist without a cognition engine to run it. No idea can be thought without a mind to think it. Information has to be stored somewhere, and that somewhere is ultimately physical: a nervous system, a hard drive, a piece of paper…


When everything is real, the word doesn't mean anything anymore.


What does all this have to do with reality of objects themselves?


"All models are wrong, but some are useful"


Nice article, on the whole, and usefully provocative.

But I have misgivings about making these close connections between information theory and scientific theory-making. As everyone knows, information theory leaves out any notion of semantics, as it should. But the important thing about our theories of the world is that they have meaning to us. The scientist searches for something that makes sense of the world, not an algorithm for computing a series of numbers. The theories that we search for may not have the smallest Kolmogorov complexity; the criteria that they satisfy go a lot deeper.


That’s such a great phrase: usefully provocative. Thanks for that.


Nice article, there is an even easier interpretation though:

The mathematics of probability defines a bunch of objects that do not exist, anywhere. Random variables, expectations, probabilities, maybe a few others. Then there are situations in the real world that look a lot like those objects from certain perspectives.

It is a bit like Escher's Ascending and Descending - sometimes people make things in the real world that look like the infinite staircase when viewed from the right spot. Similarly, sometimes we find things that look probabilistic (dice rolls, coin flips) when viewed in ignorance of the sum totality of the entire universe. That is why there are a bunch of statistical tests that determine if outcomes are distinguishable from a hypothetical process generating that outcome.

tl;dr: I like the article, but it seems to answer its own headline with "yes" and it is much easier to answer it with "no". There is, philosophically speaking, nothing that we can guarantee looks random from all perspectives.


Perhaps the most useful connection is that statistics lets us study what kind of math we can do, when we know something about a set, but not everything about it. And our experience with the world is like that too.


> There is, philosophically speaking, nothing that we can guarantee looks random from all perspectives.

Does this imply that the universe is deterministic?

I believe it is false to claim “Humans have knowledge that the universe is deterministic.”


I think this is related to the Bayesian/frequentist approach to statistics. Bayesian statistics revolves around incorporating knowledge to update beliefs, and probabilities express degrees of belief. I.e., Bayesian probability is the logic of uncertainty and how we should reason with imperfect information.

The most famous example of this is the "draw a ball from an urn" example, and how we frequently say we "shake" the urn after we add the balls to "randomize" it. There is a lot of wordplay going on in stats to effectively forget certain information so that the problem is workable.


It means we can't be certain. Whether the universe is random or deterministic is, ironically, a matter of probabilities.


I have a question: how is Kolmogorov complexity related to "human complexity", or complexity of understanding?

E.g. a program may be very complex in Kolmogorov terms, like describing 1000 random numbers - easy to understand: you have a database of numbers, and a simple procedure that would scan through it. You can also imagine some real-world microservices-based program with a good architecture and a lot of code that handles all the exception cases of incoming data, all easily understandable.

And now imagine an optimizing compiler for prolog programs. It may have much less code but the algorithm will be so complex that it might be impossible to fully understand its behaviour. E.g. fast-downward is a great example of such a program.

So I'm wondering: what does Kolmogorov complexity actually tell us? Does it tell us anything useful in the "real" world?


Well, that's altering the 'language' that you're measuring the KC against. For instance, in the language that CS majors use, the phrase "Kolmogorov Complexity" is sufficient to encode the concept of KC itself. And in the context of that language plus the concept of KC and adding in this entire thread, the letters KC themselves encode Kolmogorov Complexity in entirety.

So the 'real world' version isn't so easy to tell in absolute terms like that because the languages can differ. But if you were thinking of it like a compressor/decompressor pair, then moving things into the language is like moving things into the compressor/decompressor.

Naturally then you conclude that the "human complexity" depends greatly on the humans since any pair of humans creates a new universal description language that we can see as some base language plus the jargon that they are both familiar with.


> in the language that CS majors use the phrase "Kolmogorov Complexity" is sufficient to encode the concept of KC itself

Wait, isn't that just conflating "language" with "knowledge"/"information"? The underlying assumption here is that the CS major has an association of a concept encoded by the letters "Kolmogorov Complexity".

This is not universal, though, i.e. there's no computation that could derive the meaning behind these letters from the encoding alone. It's like claiming "620" is sufficient to encode Mozart's "Die Zauberflöte" ("The Magic Flute"), because in the language of a musician, the Köchel catalogue number along with the context would enable them to decode the full meaning.

But in reality you would still have to look up the number and the score somewhere so it's not really an encoding but more of a pointer or index. I'd see any technical term that way, in that the term itself is not an encoding, but a key/index/identifier of a concept, not a full definition of the concept itself.


What is easy or useful for one person to understand is often much less so for another.


Ultimately, probability must be real, because if you zoom in to a low enough level, the movement of particles is basically random, and this randomness is enough to poison any deterministic explanations for which side a coin flip will land.

You could calculate which side a coin will land if you knew all the variables and thus arrive at a higher probability that it will land on a certain side, but it’s still a probability, because the particles that make up the coin could all move in such a way at any point that causes the coin to fall in a way you didn’t expect, even if that probability is very low. A small probability unlikely to happen is still a probability nonetheless.

Try computing how a die made up of a few particles will land. Good luck...


Philip McShane expressed this nicely: "...a thing is defined by... systematizations of coincidental aggregates of the properties of lower things". Probability theory "allows for the emergence of the systematic from the non-systematic".


I haven't read the article either but skimmed it briefly for a later read; I think it isn't disputing the phenomena that probability is ultimately used to describe (as you seem to claim), but how they are to be interpreted.


Isn't declaring "the raison d’être of probability theory is to explain the decision-making of individuals facing uncertainty" basically a claim that the sole purpose of probability theory is an economist's approach? It's entirely possible that much more work in probability is done without a bet or payoff in sight than is done for the sake of decision-making models. It seems like basing epistemological arguments about all probability theory on a framing friendly to very specific groups of non-mathematicians.


What is a probability? Nearly always, a probability isn’t a statement about the world, it’s a statement about your knowledge and the information you have. Even for frequentist statisticians, equating probabilities with proportions of outcomes is an admission that you only have partial knowledge about the outcomes.

Of course it also has mathematical structure and properties that may be interesting to people for their own sake. And there may be interesting things to say about quantum physics using probability, but I think a historian of mathematics would not claim that quantum physics was the driving force for the development of probability theory.

Anyway, the “raison d’être” != every conceivable use.


Can you give an alternate definition for probability than "decision making under uncertainty?" I've never heard one. Sure there are mathematical real-analytic "theories of probability" but that's abstract analysis of measure, until it's applied to answer how... probable something is


Very interesting article that cuts to the core of many of the issues with pricing insurance products. While the phrase is hardly limited to the actuarial world, "all models are wrong, but some are useful" is definitely an extension of this.

The fluid nature of probability is the center of the insurance universe. Probability is always a moving target in the insurance world. Indeed, if it weren't, there wouldn't be much of a need for actuaries. Much of actuarial training revolves around the idea of credibility -- how credible is your sample set, what alterations should you make to old data to make it relevant to today, and what data should you add to it as a complement in order to relieve the model of the biases inherent in your sample size. This is inherently Bayesian in its approach.
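(To make the credibility idea concrete, here's a minimal Python sketch of the classical Buhlmann-style weighting. The numbers and the credibility constant k are purely illustrative, not anything a real pricing exercise would use.)

    def credibility_estimate(observed_mean, n, prior_mean, k):
        """Blend a book's own experience with a broader prior.

        z -> 1 as the volume of own data n grows, so the estimate leans
        more heavily on observed experience; k controls how quickly.
        """
        z = n / (n + k)  # credibility factor in [0, 1)
        return z * observed_mean + (1 - z) * prior_mean

    # e.g. 500 exposure-years of own data blended against an industry-wide rate
    print(credibility_estimate(observed_mean=0.082, n=500, prior_mean=0.065, k=2000))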

Where it truly gets interesting is that insurance companies are very cognizant of tail risk -- the 1-in-100, 1-in-250, 1-in-500 events that can cause insurer insolvency if not properly accounted for. You can survive a miscalculated loss trend within reasonable bounds, but if you haven't thought about the potential Cat 5 hurricane that hits Miami-Dade then you are going to have some very unhappy investors. When it comes to these types of events, you mostly need to be in the right ballpark. The order of magnitude matters more than the exact number -- albeit the exact number matters quite a bit for regulatory reasons. This type of calculation for property lines has largely been outsourced to the stochastic models developed by companies such as AIR and RMS. A sudden change in their models, which I think is likely after this record breaking hurricane season, can inflict capital pressure on the industry almost instantly.[1]

There are some actuarial papers from around 50 years ago that discuss information entropy as another way to approach the issue of constructing probability models, but they never really caught on. It seems that is likely due to the lack of widespread computing power. I'm hoping these ideas can gain some steam now that we can construct some of these distributions from Python and R.

[1] There is a fantastic article by Michael Lewis that describes this issue at great detail: https://www.nytimes.com/2007/08/26/magazine/26neworleans-t.h...


For me, it comes down to what kinds of these risky models are most interesting. Some can be interesting because of potential profit or minimizing loss (financial or actuarial) and others are inherently (theoretical physics).

To add to your tail risk point - I wonder how many people foresaw the Venezuelan oil crisis way back when, or even less likely, the Saudi Arabian oil complex attack in 2019. And of course, the current situation we're in with COVID, which an entire university of thoughtful, forward-looking people didn't call until it was a week away. As an aside, do insurance companies significantly alter their policies when such a cat-5 hurricane is imminent? What preparations would they make in the face of that sort of event?

Are you talking about chaos theory in the last paragraph? I'll read that article you linked in a bit and see what more I have to say, from skimming through it looks as though my question from the previous paragraph may be answered.


>as an aside, do insurance companies significantly alter their policies when such a cat-5 hurricane is imminent? What preparations would they make in the face of that sort of event?

You cannot retroactively alter a policy, for obviously good reasons. The main preparation policy-wise is that insurance companies do not knowingly write new policies in the affected area when disaster is ongoing or imminent. Reinsurers will also avoid writing new treaties (which is what a standard reinsurance policy is called) -- for these reasons the Florida cat reinsurance market is typically dominated by policies that incept on June 1st and run to May 31st of the next year.

Internally, the companies will start modeling what they think their potential losses will be almost immediately, as investors expect a fairly quick turnaround on getting initial loss estimates out the door.

There's a fairly new paper here, albeit not yet peer reviewed, on the promise of maximum entropy models in an actuarial setting. The appendix has references to the earlier papers: https://www.casact.org/pubs/forum/20wforum/07_Evans.pdf
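(If anyone wants a feel for the maximum-entropy idea without the actuarial machinery, here's a toy Python sketch of Jaynes's "Brandeis dice" problem: the least-presumptuous distribution over die faces consistent with a known mean. The exponential-family form p_i proportional to exp(lam*i) and the bisection solver are standard; the target mean is just an example, and the real actuarial applications are of course far richer.)

    import math

    def maxent_die(target_mean, faces=range(1, 7), tol=1e-10):
        """Maximum-entropy distribution over die faces with a fixed mean.

        The solution has the form p_i proportional to exp(lam * i);
        we find lam by bisection on the implied mean.
        """
        def mean(lam):
            w = [math.exp(lam * f) for f in faces]
            z = sum(w)
            return sum(f * wi for f, wi in zip(faces, w)) / z

        lo, hi = -50.0, 50.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if mean(mid) < target_mean:
                lo = mid
            else:
                hi = mid
        lam = (lo + hi) / 2
        w = [math.exp(lam * f) for f in faces]
        z = sum(w)
        return [wi / z for wi in w]

    print(maxent_die(4.5))  # skewed toward the high faces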


> You cannot retroactively alter a policy,

Sure you can. Insurers refuse claims all the time. You just claim that the assumptions of the policy were not met. Since assumptions are always idealizations of the messy real world, such a claim is *always* true.


Thanks for the link, I'll check it out.


Since the lowest levels of our understanding of the universe are probabilistic, it seems reasonable to assume probability is the most real thing we currently know. It might be that will change some day, but at the moment everything we take to be non-probabilistic is just a simplification of underlying probabilities.


I'm glad to see a write-up of this. I've been searching recently for a way to reasonably define probability without having to invoke either hypothetical infinitely repeated experiments or placing bets (since the latter is really just implicitly invoking the former). I'm looking forward to part 2!


Check out the first chapter of Jaynes's Probability Theory. It's the clearest takedown of the frequentist interpretations that you may have, and the clearest introduction to the Bayesian interpretation.

http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...


Thanks, I had a quick skim through it and it seems really helpful. If I understand correctly, the claim is that the laws of probability as they are known comprise the only possible interpretation that satisfies the 3 desiderata set forth in chapter 1.


I don't remember; the thing which really made at least the Bayesian interpretation click for me was: if we have A --> B and we observe B, then A is more plausible, and probability reasoning is a way to quantify how much more plausible A is now that we know B is true.

Chapter 5 is also a doozy in terms of explaining some of the things going on currently.


I work with probability as it relates to financial markets, and my gripe with probability is at an even lower level. Even with dice, the fact of the matter is that you will roll a die with almost entirely Newtonian forces acting on it, and it will settle on a number. This is not unknown to physics. It’s just unknown to us. And because we don’t have the data and computational power to know with any certainty where the die will land, we call it random.

Everything I do is centered around creating probability distributions of where a stock will be in the future. I don’t do this because it is fundamentally unknowable, but because I cannot access all of the data necessary to know.

So, I’ve come to regard probability measure as an interesting and useful tool, but one that has no connection to the reality of existence.


Fair enough! I'd like to point out that the Kolmogorov complexity approach can make sense of subjective probability too. Since you lack precise enough information to predict the dice roll, the most compressed way to write down your observations will involve a Shannon-style code with your subjective probabilities. If you have enough information but not enough computational power, the resource-bounded variants of Kolmogorov complexity may be more applicable.


Can you please define Shannon-style code? (Obviously, you don't have the Shannon code, the compression method, in mind.)

Also, isn't Kolmogorov complexity uncomputable, and don't you run into multiple "who shaves the barber" issues when trying to determine it?


The compression code can be specified first. If you have a lot of data, the specification will be negligible in length, compared to the code itself. Together, the specification and the Shannon code give an upper bound on the Kolmogorov complexity. If this is the shortest known program, we may consider the Shannon code probabilities as our "best explanation" of the data.

You can also get posterior probabilities using a universal prior such as 2^-K(x), but of course, this can only be approximated in the limit of infinite runtime.
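(A minimal Python sketch of the upper-bound idea, in case it helps: build a Shannon-style code from empirical byte frequencies, and count the bits for the coded data plus a crude cost for specifying the frequency table. The particular model-cost accounting here is just an illustrative assumption, and this only bounds K up to the usual machine-dependent constant.)

    import math
    from collections import Counter

    def shannon_upper_bound_bits(data: bytes) -> int:
        """Upper-bound the description length of `data` in bits using a
        Shannon code: a symbol with empirical probability p gets a
        codeword of ceil(-log2 p) bits (such a prefix code exists by
        the Kraft inequality)."""
        n = len(data)
        counts = Counter(data)
        code_bits = sum(c * math.ceil(-math.log2(c / n)) for c in counts.values())
        # crude cost of transmitting the model: one count per possible byte value
        spec_bits = 256 * (n.bit_length() + 1)
        return code_bits + spec_bits

    print(shannon_upper_bound_bits(b"abracadabra" * 1000))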


Kolmogorov complexity is a useful concept in the abstract but since it's not computable I find it hard to see anywhere it can really be applied. It doesn't solve the impossible problem of somehow deciding the true information content of any piece of data, it only locks it away in a slightly neater box of impossible.

How much some data means depends completely on everything else you know about the world. You could imagine that under different priors of knowledge, different strings would have differing Kolmogorov complexity. Not technically true, because Kolmogorov complexity is fixed, but that assumes you have the absolutely omniscient model for everything.


The reference by Rathmanner & Hutter presents a useful analogy. It argues that Kolmogorov complexity (and Solomonoff induction) are best viewed as a conceptual gold standard, like a perfect chess computer that does an exhaustive tree search. Practical methods are approximations.

There are a few results where researchers were able to automatically infer evolutionary trees and such, by using a standard compression algorithm in place of K(x).
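If I'm thinking of the same line of work (Cilibrasi & Vitanyi's clustering by compression), the key tool is the normalized compression distance, which swaps the uncomputable K(x) for an off-the-shelf compressor. A rough Python sketch with zlib standing in for the ideal compressor:

    import zlib

    def ncd(x: bytes, y: bytes) -> float:
        """Normalized compression distance: near 0 for very similar inputs,
        near 1 for unrelated ones (up to the compressor's imperfections)."""
        cx = len(zlib.compress(x, 9))
        cy = len(zlib.compress(y, 9))
        cxy = len(zlib.compress(x + y, 9))
        return (cxy - min(cx, cy)) / max(cx, cy)

    a = b"the quick brown fox jumps over the lazy dog " * 50
    b = bytes(range(256)) * 10
    print(ncd(a, a))  # small
    print(ncd(a, b))  # closer to 1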


The best part of this is that dice are known to not be fair. There’s a whole niche for people who want to buy fair dice.

I can’t find it but I once saw a post that stacked ~20 d20 dice. The difference in height based on what number you picked to stack was shocking. The dice were incredibly non-uniform.


One way to deal with probability is: defer interpretation until after one has axiomatized calculation by measure theory. I have a quick video on the topic here: https://youtu.be/DnTTAd1TDyQ .


Measure theory is a powerful tool, but how does that answer the question of what probability is?

Akin to physics, studying the wave equation doesn't tell you what the wave equation is in the real world.


The circularity argument for the frequentist interpretation of probabilities seems lazy to me.

The essay argues that the frequentist view of probability is circular because it "reduces probability claims to probability claims".

One can attempt to resolve this apparent circularity by thinking in terms of claims about the mathematical theory of probabilities versus claims about an empirical theory of probability (involving limiting behavior of experiments). Frequentist statistics could possibly be seen as a means of reconciling these mathematical and empirical theories.

The argument of the essay precludes this kind of interpretation of frequentist statistics without even considering it.


Solid write up. If anyone's interested in knowing more about how probability _came to be_ 'regarded' as real, I highly recommend Ian Hacking's "The Taming of Chance". Thought provoking in so many ways.


Mostly well argued, but there's some very loose language, such as calling "inconsistencies" mere practical relevance issues (such as nonideal markets, nonprobabilistic decisions, ignorance and the ensuing arbitrage, changes in probability values). A bad theory can be logically inconsistent, but it is not the case of probability theory and its competing interpretations.


I like to think that probability is the ratio unknown/known, information available divided by all possibilities.

You know that a coin has 2 faces (known=2) and that if you toss it, 1 face will be up (unknown which=1).

The ratio is more about something in the mind than intrinsic to the objects.

That helped me understand why in the Monty Hall problem, when you switch doors, the probability of getting the prize increases.
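A quick simulation makes the Monty Hall claim easy to check for yourself (a throwaway Python sketch; the door numbering and trial count are arbitrary):

    import random

    def monty_hall(trials=100_000):
        """Estimate win rates for 'stay' vs 'switch'."""
        stay = switch = 0
        for _ in range(trials):
            prize = random.randrange(3)   # door hiding the prize
            pick = random.randrange(3)    # contestant's initial pick
            # host opens a door that is neither the pick nor the prize
            opened = next(d for d in range(3) if d != pick and d != prize)
            switched = next(d for d in range(3) if d != pick and d != opened)
            stay += (pick == prize)
            switch += (switched == prize)
        return stay / trials, switch / trials

    print(monty_hall())  # roughly (0.33, 0.67)

Staying only wins when the initial pick was right (1/3); switching wins whenever it wasn't (2/3) -- exactly the kind of partial-knowledge bookkeeping described above.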


> "Somehow, we must narrow down our hypotheses. Maybe you think that’s easy: only a few hypotheses describe plausible dice behavior; the rest are patently absurd! But now you’re relying on intuitive judgment, not a rigorous methodology."

Maybe, as philosopher Robert Pirsig theorized in "Zen and the Art of Motorcycle Maintenance", you must rely on Quality!


I don't think "first forty digits of pi" should be admissible. As it depends on an external definition that's not computed by the representation. I could come up with a mathematical definition for any prefix of digits, give it a name and say the 'first x digits of C'.


That's just because it's a loose description in English. It would actually be a short computer program that calculates pi, and instead of the first 40 digits you could ask for, say, a megabyte of data, or take the limit as the amount of data grows.

Transmitting a program to compute pi would be shorter than the data needed by any compression algorithm that isn't somehow based on knowing the trick.

The same trick could be used for any mathematically interesting number. The point is that incompressible random sequences exist that are not like that. You can't do better than transmitting the sequence itself.
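To make that concrete, here's one such short program (a Python transcription of Gibbons' unbounded spigot algorithm, as best I recall it): a few hundred bytes that will emit as many digits of pi as you care to wait for, so the description length of "the first N digits of pi" grows only with the cost of writing down N.

    from itertools import islice

    def pi_digits():
        """Yield decimal digits of pi one at a time (Gibbons-style spigot)."""
        q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
        while True:
            if 4*q + r - t < n*t:
                yield n
                q, r, t, k, n, l = 10*q, 10*(r - n*t), t, k, (10*(3*q + r))//t - 10*n, l
            else:
                q, r, t, k, n, l = q*k, (2*q + r)*l, t*l, k + 1, (q*(7*k + 2) + r*l)//(t*l), l + 2

    print(''.join(str(d) for d in islice(pi_digits(), 40)))  # 3141592653...

The exact constants don't matter; the point is that the program length stays fixed while the output grows without bound.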


Yes, the point was merely that 'referring to an external definition' isn't a good example of a minimal-size description.


Complexity is defined by size over an entire class of objects, not a single one. Pi is reused many many times at constant cost, while your C becomes C2, C3, C4,...


I think of probability as a summary of the structure/symmetries in a model. A model of a jar of red and blue beads is characterised by the ratio of red to blue, and that beads are only distinguished by colour.


Surely you have to define reality before approaching this question?


I wonder if it is even possible. In math, science, and philosophy some (basic) terms are necessarily left undefined and their meaning is either left open to interpretation or assumed to be evident based on common experience or convention.


It's complex! (according to quantum mechanics)


The article probably has more value than my comment.


It probably depends on the p-value.


Probably...


it's pretty complex


Maybe?


> “Given competing programs whose outputs match our observations, always prefer the shortest.”

I need to know context. Output is necessary, but not sufficient. Is this a stand-alone chat server for some niche voices, or, say, Twitter?



