Hacker News new | past | comments | ask | show | jobs | submit login
Generating Random Numbers from a Specific Distribution by Inverting the CDF (demofox.org)
66 points by Atrix256 on Aug 6, 2017 | hide | past | favorite | 10 comments

Statisticians call this inverse transform sampling [0]. The dual result, that Y = F(X) is uniformly distributed for F the cdf of any random variable X, is called the probability integral transform [1].

0. https://en.wikipedia.org/wiki/Inverse_transform_sampling

1. https://en.wikipedia.org/wiki/Probability_integral_transform

Related fun fact: the distribution of F(X) is uniform on the support of F. Not hard to prove, either.

For those needing more detail on generating numbers from a specific distribution there is Knuth's The Art of Computer Programming, Vol. 2 Seminumerical Algorithms, see section 3.4 Other types of random quantities. Special treatment is given for many distributions: normal, exponential, gamma, chi-squared, f-distribution, binomial, Poisson, along with discussion of general methods. Although there have been more recent advances in the generation of high-quality, high-speed generation of uniformly distributed random numbers, Knuth covers a whole range of issues related to pseudorandom number generation.

I used inverse transform sampling extensively in my thesis. I used it for fitting distributions to data (new methods)and similarity analysis of CDF's. In the software engineering field in am frequently appalled by how often basic probability techniques are ignored in favor of just the "mean" and "std. deviation".

For any individual, there is a limited time to study material. Any time that is allocated to study, for example, statistics, is time not spent on studying, as an example, a new computer language paradigm.

I am also surprised how some computer engineers take mathematics/statistics lightly. Even so, I understand that there is a pragmatic path in trying to do one field well (computer engineering), versus spreading effort across more fields. If knowing mean and variance suffices for work, then knowing it is good enough.

I'm curious if you could elaborate on this more. What other probably techniques would you recommend knowing?

I'd recommend knowing about bootstrapping[1], there must be simpler articles but the basic idea is to randomly sample with replacement to generate new datasets which can then be used to test the variation of properties of the original dataset.

[1] https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

I'm sorry, but this post is not a very good explanation. Here's why we should invert the CDF:

> The issue is that if we flip x and y’s in a PDF, there would be multiple y values corresponding to the same x. This isn’t true in a CDF.

Surely there must be a better reason to invert the CDF than that it's possible?

Any textbook will explain this much, much better.

Have another way to explain it that you think is more appropriate?

This is awesome! I wanted to do this for a procedural generation algorithm during a game jam but didn't have the time to figure it out.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact