Ok, I'm stumped. If I've parsed your description correctly, we get no information about our opponent's actions or the results until the end. Absent any ability to observe their strategy, it seems like you do want to maximize the expected value of your own result, and I'm curious about the counterexample.
How do we maximize EV? A single throw's pdf is 1 for x in [0,1], so its EV is 0.5. The question is how to improve on a single throw by deciding to re-throw. A re-throw is independent and gives the same EV, so we want a strategy with a higher overall EV. Say our strategy is to pick a threshold A and re-throw any result below A. Because x is uniform, the probability that we re-throw is also A. The overall EV of the strategy is A * EV(second_throw) + (1-A) * EV(keep_first_throw). Since we only keep the first throw for results in [A,1], the EV conditional on that event (integrating x * pdf from A to 1, then dividing by the probability 1-A) is (1+A)/2. So the EV of the whole strategy is A/2 + (1-A) * (1+A)/2 = (1 + A - A^2)/2, which is maximized at A = 0.5, giving an EV of 5/8.
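A quick sketch to sanity-check that algebra (Python, with function names of my own choosing):

```python
import random

def ev_closed_form(a):
    # EV of "re-throw if the first throw is below a":
    # re-throw with probability a (EV 1/2), otherwise keep
    # a first throw uniform on [a, 1] (EV (1+a)/2).
    return a * 0.5 + (1 - a) * (1 + a) / 2

def ev_simulated(a, trials=200_000):
    # Monte Carlo estimate of the same strategy's EV.
    total = 0.0
    for _ in range(trials):
        x = random.random()
        total += random.random() if x < a else x
    return total / trials

# (1 + a - a^2)/2 peaks at a = 1/2, giving 5/8.
assert abs(ev_closed_form(0.5) - 5 / 8) < 1e-12
```

The simulated value at a = 0.5 should land near 0.625, up to Monte Carlo noise.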
I don't really get it either, but I think the idea is that you're supposed to somehow optimize on the assumption that your opponent is equally likely to be using any strategy. (To my mind this makes the problem very confusing, because it's such an unrealistic assumption.) If you assume that the only possible strategies are "rethrow if my first throw is <= x", then you can define a function f(x,y) giving your EV, where x is the threshold for your rethrow and y is the threshold for your opponent's. I guess you then integrate with respect to y (to get the EV for any given x, assuming a uniform distribution for the opponent's choice of y) and then differentiate the result with respect to x to find the maximum? If the solution is along those lines, that would explain the problem's supposed accessibility to good high school students.
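For what it's worth, that recipe is easy to simulate. The names below are my own, and I'm assuming a win pays 1 and a loss pays 0, so EV equals win probability:

```python
import random

def final_value(threshold):
    # One play of "re-throw if the first throw is below threshold".
    x = random.random()
    return random.random() if x < threshold else x

def win_prob_vs_uniform_opponent(x, trials=50_000):
    # Estimate f(x, y) averaged over y ~ Uniform[0,1]: the opponent's
    # threshold y is drawn fresh each game, per the assumption above.
    wins = 0
    for _ in range(trials):
        y = random.random()
        if final_value(x) > final_value(y):
            wins += 1
    return wins / trials

# Coarse grid search for the best x under that assumption.
best_x = max((i / 20 for i in range(21)), key=win_prob_vs_uniform_opponent)
```

The grid search is noisy; to actually pin down the maximizer you'd want to do the y-integral in closed form rather than by simulation.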
The reason maximizing EV isn't the answer is that you care about having a larger number than your opponent, not about having the largest number possible.
Imagine we play a die game where whoever rolls the largest number wins. Which die would you rather play with?
1-1-1-1-1-10^250 or 2-2-2-2-2-2
The first die has an EV of about 1.67e249, the second die an EV of 2. Yet, the second die will win 5 out of 6 games against the other one.
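The arithmetic above can be checked exactly, say in Python:

```python
from fractions import Fraction
from itertools import product

die_a = [1] * 5 + [10**250]  # 1-1-1-1-1-10^250
die_b = [2] * 6              # 2-2-2-2-2-2

ev_a = Fraction(sum(die_a), len(die_a))  # (5 + 10^250)/6, about 1.67e249
ev_b = Fraction(sum(die_b), len(die_b))  # 2

# Count the face pairings where die B rolls the larger number.
wins_b = sum(1 for a, b in product(die_a, die_b) if b > a)
p_b_wins = Fraction(wins_b, len(die_a) * len(die_b))  # 5/6
```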
To solve the problem, you must find the Nash equilibrium of the game. That is, you must find a strategy which the opponent cannot exploit, no matter what he does.
And no, you don't assume that every strategy is equally likely. You assume that the opponent plays optimally with full knowledge of your own strategy, which means you're looking for a Nash equilibrium: http://en.wikipedia.org/wiki/Nash_equilibrium
That's fine if you already know game theory, but then the problem is not quite as accessible as you made out. You can't expect someone to just invent the concept of a Nash Equilibrium on the spot from the words "best strategy". (Although I understand that you may expect your interviewees to already be familiar with game theory.)
In this case, the candidate's thesis work explicitly dealt with game theory. In addition, this doesn't require much knowledge of game theory... if you think about the problem for a bit, you can re-derive the concept fairly easily.
Nash's brilliance was in proving that under reasonable conditions, a mixed equilibrium always exists, which is far less obvious.
Well, not quite: you also need to specify that the opponent assumes that you are perfectly rational, that he assumes that you assume that he is, that he assumes that you assume that he assumes that you are, and so on. A perfectly rational opponent would not assume that you are perfectly rational in the absence of any relevant information. Rather, he would assume that his opponent was equally likely to be using any strategy.
Makes sense. I didn't realize the problem description was for a game that would be played repeatedly with the same opponent -- i.e. one where you can gather information. If we were doing this in person instead of on an internet forum with async post/response we'd probably iron that out quickly. Thanks for elaborating.
So how do you do better?
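One way to make that concrete, restricting to the threshold strategies discussed above (the framing and names here are mine, not from the thread): given an opponent threshold b, your best response keeps a first throw x exactly when P(win | keep x) beats P(win | re-throw), and you can iterate best responses to a fixed point. A sketch:

```python
from math import sqrt

def cdf(t, b):
    # CDF of the final value for a player who re-throws below b:
    # below b you only end up <= t via the re-throw; above b, either way.
    return b * t if t < b else (t - b) + b * t

def rethrow_win_value(b):
    # P(win | re-throw) = integral of the opponent's CDF over [0,1]
    # (closed form of that integral: 1/2 - b/2 + b^2/2).
    return 0.5 - b / 2 + b**2 / 2

def best_response(b):
    # Keep x iff cdf(x, b) >= rethrow_win_value(b); the cutoff is found
    # by bisection since cdf is increasing in x.
    target = rethrow_win_value(b)
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if cdf(mid, b) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Iterate best responses to a fixed point (a threshold that is
# its own best response, i.e. a symmetric equilibrium).
a = 0.5
for _ in range(50):
    a = best_response(a)
```

Under these assumptions the iteration settles at (sqrt(5) - 1)/2, about 0.618, the positive root of a^2 + a - 1 = 0: you keep a first throw exactly when it beats the chance of winning off a re-throw against an opponent doing the same. Note this is higher than the EV-maximizing threshold of 0.5.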