Instead of selecting the option with the largest estimated expectation, you can select the option with the largest upper bound, i.e. the estimated expectation plus a certain deviation (some multiple of its standard deviation). As you draw more samples, the law of large numbers applies and the bounds around each expectation get tighter and tighter. This is a more sample-efficient method that is used in Monte Carlo simulations, e.g. for AI playing the game Go.
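
To make the idea concrete, here is a minimal sketch of one well-known rule of this kind, UCB1, on a toy problem with Bernoulli (win/lose) rewards. The function names `ucb_select` and `run_bandit`, the constant `c`, and the win probabilities are assumptions chosen for illustration, not something prescribed above; other choices of deviation term are possible.

```python
import math
import random


def ucb_select(counts, means, c=2.0):
    """Pick the option whose mean plus deviation bonus is largest (UCB1-style)."""
    # Try every option once first, so each one has an estimate.
    for option, n in enumerate(counts):
        if n == 0:
            return option
    total = sum(counts)
    # The bonus shrinks as an option is sampled more often,
    # so the bound around its estimated mean gets tighter.
    scores = [
        mean + math.sqrt(c * math.log(total) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_probs, steps=5000, seed=0):
    """Simulate repeated selection; true_probs are hypothetical win rates."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)
    means = [0.0] * len(true_probs)
    for _ in range(steps):
        option = ucb_select(counts, means)
        reward = 1.0 if rng.random() < true_probs[option] else 0.0
        counts[option] += 1
        # Incremental update of the running mean for this option.
        means[option] += (reward - means[option]) / counts[option]
    return counts, means


counts, means = run_bandit([0.2, 0.5, 0.7])
print("samples per option:", counts)
print("estimated means:", [round(m, 3) for m in means])
```

In this sketch most samples should end up on the best option, while the weaker options are sampled just often enough to rule them out, which is where the efficiency gain over always following the largest plain estimate comes from.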

