Thanks for your comments! I don't see how what is discussed here conflicts with the notation I introduced in the post. Do you still believe there is a soundness issue in what I have written?
I'm just saying that notation like A * X|Y * B seemed unfamiliar to me. I only know conditional notation within a P(...), or an expectation, etc. Apparently your way of writing is used by others as well, but it may be good to know that it is not fully rigorous.
Again, different people prefer different presentations. As a student I was often frustrated by abuse of notation and confused by such things when trying to understand something in detail. For a more cursory, "practical" understanding it could be good enough.
> it may be good to know that it is not fully rigorous
What is the problem with A|B=b being a random variable? (Apart from your unfamiliarity with the concept, I mean.)
Edit: I don't say there are no problems; I'm asking what you think the problem is. There is no problem in the discrete case. In the continuous setting things are indeed more complicated (but if the limiting process is well defined there are no issues).
Note that the same lack of rigour that you find in conditional random variables affects conditional probabilities. If you can accept the latter, there is no reason to reject the former.
A random variable is a different concept from a distribution. For me personally it is helpful to keep them separate, but I can see that others may not care about the complete conceptual picture.
In the PDF file linked above I can see conditional probabilities, conditional distributions and conditional expectation etc, which are all valid and rigorous. I can see that the author thinks it's a good idea to merge these into a single concept of conditional random variable for didactic reasons, but that's not a rigorous concept.
Practically, if you have two random variables then you can take their joint distribution. What would be the joint distribution of (A|B) and (C|D)? For actual random variables it's simple: you can take intersections in event space, but a "conditional random variable" does not correspond to any subset of the event space.
Very simply speaking (this is my working model, not the exact precise math definition, which involves a lot of measure theory): in probability theory we have an event space containing atomic events that cover all possible outcomes of the whole experiment/observation. A random variable is a function that maps each such potential (atomic) event to a number. That's right: the random variable is a function, but not the mass function, which maps a number to a probability.
Conditional probability P(A|B) is an expression defined to mean P(A,B)/P(B). That's a clear definition. I have yet to see an actual definition of a conditional random variable.
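In the discrete case that definition is easy to make concrete. A minimal Python sketch (the coin-flip events are my own illustration):

```python
from fractions import Fraction

# Two fair coin flips; atomic events are pairs, each with probability 1/4.
omega = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}
p = {w: Fraction(1, 4) for w in omega}

def prob(event):
    """P(event), with the event given as a subset of the event space."""
    return sum(p[w] for w in event)

def cond_prob(a, b):
    """P(A|B) = P(A,B) / P(B), defined whenever P(B) > 0."""
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if "H" in w}      # at least one heads
print(cond_prob(A, B))                  # 2/3
```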
Again, disclaimer 1: I can see the practicality of disregarding formality. Still, I'd argue this is best done only when you do know better and it would simply be tedious to be technically correct all the time. But as a beginner I found it more useful to keep track of the correct concepts. For example, not distinguishing random variables from distributions can be very confusing when considering more advanced things, like mutual information and KL-divergence: the former operates on random variables, the latter on distributions. I remember this was a difficult realization for me, because the material we used didn't emphasize the difference enough, probably in the name of practicality.
> Practically, if you have two random variables then you can take their joint distribution.
If they are defined in the same sample space.
> a "conditional random variable" does not correspond to any subset of the event space
I would say it's exactly the other way around: the domain of a "conditional random variable" is a subset of the domain of the "unconditioned" random variable (the subset where the conditioning holds).
I think it will help if you think in terms of conditioning on, for example, a coarser sigma algebra. You get another random variable that is measurable with respect to the sigma algebra you conditioned on; if that algebra is coarser, the function you obtain by conditioning is correspondingly coarser.
Let's talk about a fair die roll to make it concrete: let the rolled number be X and let E be the event that we rolled an even number. P(X=6|E) = 1/3. P(X|E) is a distribution where 1, 3, 5 have 0 probability mass and 2, 4, 6 have 1/3 each.
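Spelled out, so the numbers are explicit:

```python
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die
E = {2, 4, 6}                                  # "rolled an even number"

p_E = sum(p[x] for x in E)                     # 1/2
cond = {x: (p[x] / p_E if x in E else Fraction(0)) for x in p}
# cond puts mass 0 on 1, 3, 5 and 1/3 on 2, 4, 6
print(cond[6])                                 # 1/3
```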
If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.
Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.
Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces. Note that this is not the same as P(X,Y | E). The latter is simply a conditional probability, without any concept of "conditional random variables".
Again, this is totally obvious to people who have experience with probabilities, but could be confusing to students. Such cases are where students who try to understand the details may be left more confused than students who just want to get the main idea.
Sure you can. The TL;DR would be "piecewise constant projection".
I think picking up a standard graduate probability book will clear this up better than any long comment trail. There are no problems with defining a coarser sigma algebra from the original one and then defining a function measurable on the new sigma algebra. Note that this continues to be an r.v. in the original space, as measurability is preserved. A consistent definition of the values of the conditioned r.v. would be the piecewise constant approximation of the original r.v. over the indivisible elements of the coarser sigma algebra.
Let me try another route.
You seem to be accepting of a conditional expectation. Now, what is a conditional expectation if not a function? All we need on top of that is for the function to be measurable with respect to the new sigma algebra, and that's ensured by construction. Hope it helps some; here's a sketch of the construction.
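A toy version with a fair die and the even/odd partition (my own example, not from any particular book):

```python
from fractions import Fraction

omega = range(1, 7)                  # fair die
p = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}            # the original random variable

# The coarser sigma algebra is generated by the even/odd partition.
partition = [{1, 3, 5}, {2, 4, 6}]

# Conditional expectation: piecewise constant on each cell, hence
# measurable w.r.t. the coarser sigma algebra, and still a random
# variable on the original space.
cexp = {}
for cell in partition:
    mass = sum(p[w] for w in cell)
    value = sum(p[w] * X[w] for w in cell) / mass
    for w in cell:
        cexp[w] = value

print(cexp)   # 1, 3, 5 -> 3 and 2, 4, 6 -> 4
```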
> I think picking up a standard graduate probability book will clear this up better than any long comment trail.
Can you recommend one? I just picked up Probability and Measure by Billingsley and it does not mention "conditional random variable" a single time in over 600 pages. It does have a lot of "conditional probability", "conditional distribution", "conditional expectation" etc.
> You seem to be accepting of a conditional expectation.
Conditional expectation is defined in terms of conditional probabilities, and those are in turn explicitly defined as P(A|B)=P(A,B)/P(B), so there's nothing not to accept.
Billingsley is pretty darn good. It might have left the connection as a dotted line, given that the notion is no different from conditional expectation. The only connection you have to make is that conditional expectation is a function, and hence a random variable. You must have seen an expectation taken of a conditional expectation; that should convince you that conditional expectation is indeed a random variable. Since that r.v. was obtained by conditioning, it's not a stretch to call it a conditioned r.v.
Any book that explains conditioning on a sigma algebra should suffice. You could try Loève, Dudley or Neveu, but I don't remember if it's mentioned explicitly.
BTW, conditional expectation is really more fundamental than conditional probability: it's the former that yields the latter in measure-theoretic probability. If you want to drink from the source, that would be Kolmogorov.
Finally if you are reading Billingsley you are adequately qualified to call yourself a mathematician.
It's getting a little tedious. Please show me a concrete citation of a serious textbook (not a tutorial/handout by a grad student or a paper by a random researcher) that puts the three words "conditional random variable" next to each other (consistently, not simply as a one-off potential mistake). Google doesn't show serious sources for it.
While I agree with isolated points of your comment I think it doesn't add up to a useful/coherent concept of conditional random variable.
That's a little too much to ask; perhaps if the books were grep'able I could have obliged, but unfortunately I don't have a photographic memory.
More concretely, it's just another name for conditional expectation. I am assuming you are aware that conditional expectation is a random variable obtained via conditioning (equivalently, as a piecewise constant approximation in L_2). If you aren't familiar with that viewpoint, that would be the place to start. Kolmogorov, Neveu, Dudley and Billingsley all cover it.
> I am assuming you are aware that conditional expectation is a random variable
That's not what we're considering here, but rather things of the form X|Y=y for a concrete y. Even as E[X|Y=y], that's not a function; y is specified. Do you agree we shouldn't call X|Y=y a conditional random variable?
The expectation E[X|Y=y] is a fixed value. (Edit: it’s the expectation of the random variable “X|Y=y”, while E[X|Y] is a random variable because it’s a function of the random variable Y: for each element in the sample space there is a corresponding value of “y” and in turn there is a value of the expectation E[X|Y=y].)
X|Y=y (as used in the blog post being discussed) is a random variable: it’s a function from a subset of the original sample space (corresponding to the elements for which the value of the random variable Y is y) to real values (or whatever the image of the X random variable is).
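To spell both claims out with a toy discrete example (the parity of a die roll; my own construction, not from the blog post):

```python
from fractions import Fraction

omega = range(1, 7)                   # fair die
p = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}
Y = {w: w % 2 for w in omega}         # parity: 0 even, 1 odd

# "X|Y=0": X restricted to the subset of the sample space where Y = 0,
# with the probability renormalized on that subset.
sub = [w for w in omega if Y[w] == 0]           # {2, 4, 6}
mass = sum(p[w] for w in sub)
p_sub = {w: p[w] / mass for w in sub}           # 1/3 each

# E[X|Y=0] is then a fixed value:
print(sum(p_sub[w] * X[w] for w in sub))        # 4

# E[X|Y] is a random variable: for each element of the sample space,
# look up y = Y(w) and take the corresponding fixed expectation.
def e_given(y):
    cell = [w for w in omega if Y[w] == y]
    m = sum(p[w] for w in cell)
    return sum(p[w] * X[w] for w in cell) / m

print({w: e_given(Y[w]) for w in omega})        # 2,4,6 -> 4 and 1,3,5 -> 3
```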
> If we consider X|E as a random variable, what is its value if we roll an odd number? Undefined? What does that mean? Random variables always have some value.
Random variables have some value on their domain, and for the random variable X|E=1 the sample space is restricted to the elementary events {2,4,6} which make up the composite event E=1. The original sample space is partitioned into the subsets {1,3,5} and {2,4,6} when we condition on the values of the random variable E (0: odd, 1: even).
> Sure you can build a new event space (sigma algebra) but then you can't use random variables over the original one.
I guess we all agree then.
> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.
The variables X and Y describing independent rolls are also defined over different spaces, and to have a joint distribution you have to define a "common" sample space of the form {x=1,y=1},{x=2,y=1},..,{x=6,y=6}.
You could do the same for a roll of a die and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a die roll doesn't make sense because they are defined over different spaces?
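Concretely, the common space is just the product, with the joint weights here coming from an independence assumption (my own modeling choice):

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}
coin = {z: Fraction(1, 2) for z in ("H", "T")}

# Common sample space: the Cartesian product of the two outcome sets,
# with joint probabilities given by independence.
joint = {(x, z): die[x] * coin[z] for x, z in product(die, coin)}
print(joint[(3, "H")])   # 1/12
```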
> You could do the same for a roll of a die and the toss of a coin. Or do you think that computing the joint distribution of a coin toss and a die roll doesn't make sense because they are defined over different spaces?
Of course it doesn't! You first have to define them on a common space (the Cartesian product), and for that you have to specify their joint probabilities. One example might be that you model them as independent; otherwise we wouldn't know how the coin and the die relate. Sure, independence is usually a good default assumption, but it's still a necessary step.
What did you mean with the following paragraph then?
> Let's consider two independent rolls, X and Y. You can't compute the joint distribution P(Y, (X|E)), it just doesn't make sense as the two "variables" are defined over different spaces.
Do you agree that you cannot compute the joint distribution P(Y,X) either because the two variables are defined over different spaces?
If you mean that the space for this single experiment composed of two rolls (random variables X and Y) is the Cartesian product of {x=1,x=2,x=3,x=4,x=5,x=6} and {y=1,y=2,y=3,y=4,y=5,y=6}, then I agree.
But the fact that each variable alone is defined on the "same" sample space {1,2,3,4,5,6} is irrelevant.
The situation is no different from the joint probability for random variables X and Z corresponding to a single experiment consisting of a die roll and a coin toss, where the relevant space is the Cartesian product of {x=1,x=2,x=3,x=4,x=5,x=6} and {z=1,z=2}.
And it is also similar for the situation you asked about, with a random variable Y and a "conditional" random variable X|Even. The relevant space is the Cartesian product of {y=1,y=2,y=3,y=4,y=5,y=6} and {x=2,x=4,x=6}.
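A sketch of that last product space, with joint weights chosen under an independence assumption (my own toy choice; any consistent joint would do):

```python
from fractions import Fraction
from itertools import product

Y = {y: Fraction(1, 6) for y in range(1, 7)}           # an unconditioned roll
X_given_even = {x: Fraction(1, 3) for x in (2, 4, 6)}  # "X|Even"

joint = {(y, x): Y[y] * X_given_even[x]
         for y, x in product(Y, X_given_even)}
print(sum(joint.values()))   # 1, so this is a valid joint distribution
```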
Let's consider something with less independence, because independence makes the issues harder to notice. Temperature indoors T1, temperature outdoors T2, IsOvercast O.
Let's say T1|O=1 is a "conditional random variable". Let's consider the average temperature indoors and outdoors. What would ((T1|O=1) + T2)/2 even mean? How could you use the two "variables" in the same expression? What is even their joint distribution? They are defined over different spaces!
This means we must always carefully condition all the variables used together on exactly the same things. So ((T1|O=1) + (T2|O=1))/2 is valid. But then why do this on every variable instance that we use? It would be very tedious. At some point we want to get to a distribution (or some function of a distribution, like the expectation or variance), so it's much simpler to say, for example, P((T1 + T2)/2 | O=1), which is just a good old conditional distribution. Conditioning is an operation on a distribution, and in my mind the bar (|) is really a slot in the P() notation, short for P(A,B)/P(B). A bar popping up elsewhere (like in expectations) must be something directly determined by the distribution (which a random variable is not).
Overall, since you cannot mix differently conditioned "conditional random variables" in a single expression, you may just as well put the conditioning on the side of the whole expression, in the P().
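To make the comparison concrete, here's a toy discrete model (all the numbers are invented purely for illustration):

```python
from fractions import Fraction
from itertools import product

# Toy joint model of O (overcast), T1 (indoors), T2 (outdoors).
joint = {}
for o, t1, t2 in product((0, 1), (20, 22), (5, 15)):
    w = Fraction(1, 8)
    if o == 1 and t2 == 15:
        w = Fraction(1, 16)   # overcast makes a warm outdoors less likely
    joint[(o, t1, t2)] = w
total = sum(joint.values())
joint = {k: v / total for k, v in joint.items()}

# P((T1 + T2)/2 | O=1): condition the derived variable in one go,
# instead of conditioning each variable separately.
given = {k: v for k, v in joint.items() if k[0] == 1}
mass = sum(given.values())
avg = {}
for (o, t1, t2), v in given.items():
    a = Fraction(t1 + t2, 2)
    avg[a] = avg.get(a, Fraction(0)) + v / mass
print(avg)   # a plain conditional distribution of the average
```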
> How could you use the two "variables" in the same expression?
Do you expect to be able to use every random variable which can be conceived in the same expression?
If you object to the name “conditional random variable” [+] that’s debatable, but if you say that the resulting thing is not a random variable I think you are wrong.
Another thing that is a random variable, even though I suspect you may not like it, is the probability distribution of a random variable.
[+] which I don’t think was actually used by the OP, by the way.