
OK, great explanation. Going to your last paragraph, what does this mean “in practice”? Say I have a dataset of 15k people each throwing a die once. Can I group it into 300 random samples of 50 dice throws, say I now have 300 random samples, and apply the CLT? Each random sample should be i.i.d., no?

I was just curious if one can go the other way around, because usually you only have 1 sample of size 15k and not 300 samples of size 50. If you have the raw data it’s just 15k samples of size 1 or 1 sample of size 15k, depending on how you look at it.

Then I could be wrong here, but doesn’t the proof of the CLT also assume that each random sample is the same size? So you can’t have one sample of size 30 and another of size 20 and another of size 25, etc.? Each of the 300 samples must be size 50?



Yes, the dice rolls are i.i.d., so you can choose to look at the data as 300 random samples of 50 throws each, or any other choice of N.

Put another way, what you have is 15k dice throws, and the fact that they were thrown by 15k, 5k, or 300 people can be ignored, if you choose to. In fact, it may be useful to 'shuffle' dice rolls into new 'samples'; that gets into the use of resampling and bootstrap techniques.
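
To make the regrouping concrete, here is a minimal sketch that simulates the 15k throws, shuffles them, and splits them into 300 groups of 50. The data is simulated (nobody's real dataset), the helper name `group_sample_means` and the seeds are made up for illustration, and it assumes a fair six-sided die, whose variance is 35/12.

```python
import random
import statistics

def group_sample_means(rolls, n_groups, group_size, seed=0):
    """Shuffle the rolls, split them into equal-size groups, return each group's mean."""
    if len(rolls) != n_groups * group_size:
        raise ValueError("rolls must contain exactly n_groups * group_size values")
    shuffled = list(rolls)
    random.Random(seed).shuffle(shuffled)  # who threw which die is ignored
    return [
        statistics.fmean(shuffled[i * group_size:(i + 1) * group_size])
        for i in range(n_groups)
    ]

# Simulated stand-in for the dataset: 15,000 throws of a fair die.
rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(15_000)]

means = group_sample_means(rolls, n_groups=300, group_size=50)

# For a fair die the variance is 35/12, so the CLT predicts that the
# 300 means are roughly normal with standard deviation sqrt((35/12) / 50).
die_variance = 35 / 12
expected_se = (die_variance / 50) ** 0.5

print("mean of the 300 sample means:", statistics.fmean(means))
print("observed spread of the means:", statistics.stdev(means))
print("spread predicted by the CLT: ", expected_se)
```

Because every group has the same size, the average of the 300 means equals the average of all 15k throws, and the observed spread should land close to the predicted one.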

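And also yes, in the sense that matters here: for the 300 sample means to share one approximately normal distribution, each sample must have the same N. The CLT itself still holds for a sample of any fixed size, but the mean of 20 throws has a larger variance (σ²/20) than the mean of 50 throws (σ²/50), so means from different-sized samples do not come from the same distribution.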
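
A rough way to see why the sample size matters: the standard error of a sample mean is σ/√N, so it changes with N. The sketch below simulates sample means for a few sizes and compares their spread with that formula. It again assumes a fair six-sided die; `simulated_se`, the number of simulated samples, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

DIE_SD = (35 / 12) ** 0.5  # standard deviation of one throw of a fair die

def simulated_se(sample_size, n_samples=2000, seed=0):
    """Spread (standard deviation) of n_samples simulated means of sample_size throws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

results = {}
for n in (20, 25, 30, 50):
    # Pair the simulated spread with the theoretical value sigma / sqrt(N).
    results[n] = (simulated_se(n), DIE_SD / n ** 0.5)
    print(f"N={n}: simulated {results[n][0]:.3f}, theory {results[n][1]:.3f}")
```

The spread shrinks as N grows, which is why means from samples of size 20, 25, and 30 cannot all be treated as draws from one normal curve.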



