Hacker News new | past | comments | ask | show | jobs | submit login

        # for each lever, 
            # calculate the expectation of reward. 
            # This is the number of trials of the lever divided by the total reward 
            # given by that lever.
        # choose the lever with the greatest expectation of reward.
If I'm not mistaken, this pseudocode has a bug that will result in choosing the expected worst option rather than the expected best option. I believe it should read "total reward given by the lever divided by the number of trials of that lever".



Correct, that's why I don't trust reading code comments




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: