> School is a closed-world domain—you are solving crisply-defined puzzles (multiply these two numbers, implement this algorithm, write a book report by this rubric), your solution is evaluated on one dimension (letter grade), and the performance ceiling (an A+) is low. The only form of progression is to take harder courses. If you try to maximize your rewards under this reward function, you’ll end up looking for trickier and trickier puzzles that you can get an A+ on.
>
> The real world is the polar opposite.

Terry Tao makes much the same point in this video: https://www.youtube.com/watch?v=MXJ-zpJeY3E