For those interested in looking slightly more into the characteristic function, it may be worth pointing out that the characteristic function is equal to the Fourier transform (with the sign of the argument reversed) of the probability distribution in question.
In my own experience teaching probability theory to physicists and engineers, establishing this connection is often a good way of helping people build intuition for why characteristic functions are so useful, why they crop up everywhere in probability theory, and why we can extract so much useful information about a distribution by looking at the characteristic function (since this group of students tends to already be rather familiar with Fourier transforms).
Yes, this provides good intuition about why it is useful: the PDF of the sum of two random variables is the convolution of the original PDFs. A convolution is awkward to work with, but by the convolution theorem it is a multiplication in the Fourier domain. This immediately suggests that the Fourier transform of a PDF would be a useful thing to work with.
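To make the convolution-theorem point concrete, here is a minimal sketch using a discrete analogue: PMFs instead of PDFs, with NumPy's FFT standing in for the Fourier transform. The dice example and the variable names are illustrative assumptions, not something taken from the comments above.

```python
import numpy as np

# PMF of a fair six-sided die, indexed by the values 0..5
# (shifted down by one so the index equals the value).
die = np.full(6, 1 / 6)

# Direct route: the PMF of the sum of two independent dice
# is the convolution of the two PMFs.
pmf_direct = np.convolve(die, die)

# Fourier route: zero-pad to the length of the result, transform,
# multiply pointwise, and transform back.
n = 2 * len(die) - 1
pmf_fourier = np.fft.irfft(np.fft.rfft(die, n) * np.fft.rfft(die, n), n)

# Largest disagreement between the two routes (floating-point noise only).
max_diff = np.max(np.abs(pmf_direct - pmf_fourier))
print(max_diff)
```

The awkward operation (convolution) becomes a pointwise product after transforming, which is the property that makes the transform of a distribution convenient for sums of independent variables.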
If you don't say that this is what you are doing then it all seems quite mysterious.
This is a good place to use cumulants. Instead of working with joint characteristic functions, which gets messy, cumulants let you isolate the effects of correlation into a separate term. The only limitation is that this doesn't work if the relevant moments don't exist.
As a physicist, the moment when everything just clicked was when I realised that connected Feynman diagrams were basically the cumulants of that distribution.
Then almost everything in physics is about "what is the characteristic/moment/cumulant generating function?" and associated Legendre transforms
A little-known bit of history is that Feynman developed a diagrammatic method for expressing the moments of PGFs in his study of the stochastic theory of fission chains. This was before his work on QED. See:
Wow. I am not a physicist, but I use pdfs and moments and cumulants all the time. I came up with my own method to calculate cumulants for affine processes using some recursions, and they work. But if I hear you right, I might have stumbled upon something that Feynman did 70 years ago, and he probably did it better. Any good links you can recommend?
> As a physicist, the moment when everything just clicked was when I realised that connected Feynman diagrams were basically the cumulants of that distribution.
And the generating function of the cumulants is the logarithm of the generating function of the distribution (Fourier transform).
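As a rough numerical illustration of that statement, the sketch below takes a Poisson distribution (an assumed example, with a hypothetical rate `lam`), differentiates its characteristic function and the logarithm of it at zero with a finite difference, and compares the results. The helper names are made up for this example.

```python
import cmath

lam = 3.0  # hypothetical Poisson rate, chosen only for illustration


def cf(t):
    # Characteristic function of Poisson(lam): E[exp(i t X)].
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))


def log_cf(t):
    # Cumulant generating function: the logarithm of the characteristic function.
    return cmath.log(cf(t))


def second_derivative_at_zero(f, h=1e-3):
    # Central finite-difference approximation of f''(0).
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h**2


# Each derivative in t brings down a factor of i, so the second
# derivative at zero is i^2 = -1 times the quantity of interest.
second_moment = -second_derivative_at_zero(cf).real        # E[X^2] = lam + lam^2
second_cumulant = -second_derivative_at_zero(log_cf).real  # Var[X] = lam

print(second_moment, second_cumulant)
```

The characteristic function yields the raw second moment, while its logarithm yields the variance directly, which is the second cumulant.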
I feel like it's almost criminal of textbook writers not to mention this when introducing the characteristic function... At least as an aside or a footnote, for readers already familiar with Fourier transforms.
but isn't a characteristic function just "the" way to bridge the gap between sets, functions, and logic(? ...a 3way bridge!?)
I mean, it was useful for me to think about like a translation between sets and logic (this variable x is in the set xor not) into functions (a function f(x) that returns 1 or true whenever x is in set S)
You're thinking of a "characteristic function" in the sense of "indicator function" of a subset (https://en.wikipedia.org/wiki/Indicator_function), which is a different thing from the characteristic function of a probability density function.
“Characteristic function” is (was) an overloaded term.
What you described is more often referred to as an “indicator function” these days, with “characteristic functions” denoting the transform (Fourier, Laplace, z, depending on context). Closely related to “moment generating functions” to the point of being almost interchangeable.
so the same thing but, characteristic functions as I knew them before these posts are a rudimentary 2-variable finite version. point and line (but the line is a curve, a circle because e).
but the new and improved 21st century characteristic functions are n-variable and have a full continuous spectrum of variables between zero (false) and one (true) but only potentially lest infinite realizes itself (which would make the theories illogical).
- The characteristic function of a random variable X is defined as the function that maps t --> ExpectedValue[ exp( i * t * X ) ]
- Computing this expected value is the same as regarding t as a constant and integrating the function x --> exp( i * t * x) with respect to the distribution of X, i.e. if X has the density f, we compute the integral of f(x) * exp( i * t * x) with respect to x over the domain of f.
- On the other hand: computing the Fourier transform of f (here representing the density of X) and evaluating it at point t (i.e. computing (F(f))(t) if F represents the Fourier transform) is the same as fixing t and computing the integral of f(x) * exp( -i * t * x) with respect to x.
- Rearranging the integrand in the previous expression to f(x) * exp( i * -t * x), we see that it is the same as the integrand used in the characteristic function, only with a -t instead of a t.
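The steps above can be checked numerically. The sketch below is only an illustration under stated assumptions: it picks an exponential density f(x) = exp(-x) on x >= 0 (chosen because it is not symmetric, so the sign flip is visible), truncates the integral at x = 50, and uses a hand-written trapezoid rule. The function names are hypothetical.

```python
import numpy as np

# Grid for the assumed density f(x) = exp(-x) on [0, 50];
# the tail beyond 50 is negligible.
x = np.linspace(0.0, 50.0, 500_001)
dx = x[1] - x[0]
f = np.exp(-x)


def integrate(values):
    # Trapezoid rule on the uniform grid x.
    return (values.sum() - 0.5 * (values[0] + values[-1])) * dx


def char_fn(t):
    # E[exp(i t X)]: integral of f(x) * exp(i t x).
    return integrate(f * np.exp(1j * t * x))


def fourier(t):
    # Fourier transform with the exp(-i t x) convention used above.
    return integrate(f * np.exp(-1j * t * x))


t = 0.7
print(char_fn(t))     # close to the known closed form 1 / (1 - i t)
print(fourier(-t))    # same value: the Fourier transform evaluated at -t
print(fourier(t))     # the complex conjugate, not the same value
```

For a density that is symmetric about zero the characteristic function is real and even, so the sign of the argument makes no difference; the asymmetric example is what makes the reversal show up.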