Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: How many times will this loop run on average?
1 point by aalhour 4 months ago | hide | past | favorite | 11 comments
A friend of mine shared the following code snippet with me and asked me to guess how many times the loop will run on average:

  for(int i = 0; i < Random(1, 100); i++);
I tried to guess the answer analytically and gussed ~50 but the empirical test was surprising to me. Can someone explain why the average is around 12?

Runnable code: https://replit.com/@aalhour/RandomLoop#main.py

EDIT: Formatting.




In C (and your Python conversion), the i < Random(1, 100) part is evaluated each time through the loop, not once at start of the loop to determine the limit.

So, it’s 1% that the loop ends at i = 1, if it takes that hurdle 2% that it ends at i = 2, if it takes that hurdle 3% that it ends at i = 3, etc.

The calculation is easier if you phrase that this way:

It’s 99% that the loop continues at i = 1, if it takes that hurdle 98% of the rest that it continues at i = 2, if it takes that hurdle 97% that it continues at i = 3, etc.

So, the probability to make it past i = n is

  0,99 × 0,98 × 0,97 × … × (1 - n/100)
Note that this isn’t necessarily the case in all languages. Pascal, for example, has a real for loop where the limit is evaluated once and the index variable cannot be changed inside the loop, so that the compiler can determine the number of iterations before starting the first iteration.


I was aware that the Random function call gets evaluated in Python (and C) every time the loop iterates, I just couldn't imagine the probability distribution myself, I had assumed that all numbers are uniformally distributed but didn't cater for the sums. You're a legend! Now I see it. I wrote the following code to check out your argument and it checks out indeed :thumbs-up:

  def p(n):
    total = 1
    for i in range(1, n):
      total *= (100-i)/100
    return total


  for i in range(1, 100):
    print(f"p({i}) = {p(i)}")

The first 13 results:

  p(1) = 1
  p(2) = 0.99
  p(3) = 0.9702
  p(4) = 0.9410939999999999
  p(5) = 0.9034502399999998
  p(6) = 0.8582777279999998
  p(7) = 0.8067810643199997
  p(8) = 0.7503063898175998
  p(9) = 0.6902818786321918
  p(10) = 0.6281565095552946
  p(11) = 0.5653408585997651
  p(12) = 0.503153364153791
  p(13) = 0.44277496045533604
p(12) sits at 50%, of course it will be the mean! :D


> p(12) sits at 50%, of course it will be the mean!

Technically, “the mean is 12” does not follow at all from “p(12) sits at 50%”.

“p(12) sits at 50%” implies the median (https://en.wikipedia.org/wiki/Median) is 12, and that can differ from the mean (https://en.wikipedia.org/wiki/Arithmetic_mean) of the distribution, and the difference can be quite large.

For example, if that list were to continue

  p(13) = 0.44277496045533604
  p(14) = 0.44277496045533604
  p(15) = 0.44277496045533604
  p(16) = 0.44277496045533604
  p(17) = 0.44277496045533604
  p(18) = 0.44277496045533604
  …
  p(1000) = 0.44277496045533604
  P(1001]) = 0
the about 44% that makes it to 13 also will make it to 1000, and the mean value will be a bit more than 442.77496045533604 (the contribution of that 44.2%)

If, on the other hand, it were to continue

  p(13) = 0.44277496045533604
  p(14) = 0.0
No result would be higher than 13, and the mean would be lower than 12.


Expected index given by

E [X] = \sum_{i \in [2,100)} p(X < i)\prod_{j < i}p (X \ge j)i.

Edit: Sorry rest of reply was wrong. Had to account for not hitting until i^th loop.


Not sure that adds up, I translated the right side of the equality into a Python statement and it returns a number I am not sure how to interpret:

  1/99 * sum([(i-1)*i for i in range(2, 100)])   # 3266.666666666667
Both the median and mean are around 12 for 10K runs of the loop.


I edited my response with an apology! Thanks for checking!


No worries, I think your solution coverges with @Someone's above, if I am not mistaken. You're multiplying the probabilities for all values of j up to i for all values of i.


Yes I am and yes I think it does:

edit - actually, the calculations from the other guy are probability calculations, so this is related to the median. I have given you a mean calculation, which should be related to the average. But you already mentioned that they are close ...


You’re calculating a random number every time the loop runs. If you want an average of 50 you should call random outside the loop and save the value to be compared each time.


Yes, that's true. I don't want an average of 50. My initial guess of 50 is wrong. I am trying to understand why the average turned out to be approximately 12.


The only random variable in that code is "Random(1, 100)" and its expected value is 50.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: