Do you know about expected utility? Optimal behaviour (of any kind) can be framed as "At each step, pick the action that maximizes your expected utility." So, for instance, you might study hard tonight because it'll help you pass your exam tomorrow and land a high-paying job later. In that scenario, the expected utility of studying is higher than that of going out for a beer.
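
To make the "pick the action that maximizes your expected utility" rule concrete, here is a minimal sketch in Python. The probabilities and utility values are made up purely for illustration, and the names `outcomes`, `expected_utility` and `best_action` are hypothetical helpers, not part of any library.

```python
# Hypothetical numbers, invented for illustration only.
# Each action maps to a list of (probability, utility) outcomes.
outcomes = {
    # Studying: very likely to pass the exam (utility 100), small chance of failing.
    "study": [(0.9, 100.0), (0.1, 0.0)],
    # Beer: a fun evening adds 10 either way, but passing is much less likely.
    "beer": [(0.3, 110.0), (0.7, 10.0)],
}


def expected_utility(outcome_list):
    """Probability-weighted average of the utilities of one action."""
    return sum(p * u for p, u in outcome_list)


def best_action(outcomes):
    """Return the action whose expected utility is highest."""
    return max(outcomes, key=lambda a: expected_utility(outcomes[a]))


if __name__ == "__main__":
    for action, outcome_list in outcomes.items():
        print(action, expected_utility(outcome_list))
    print("best:", best_action(outcomes))
```

With these assumed numbers, studying comes out ahead, which matches the intuition in the example above. Change the probabilities and the preferred action can flip.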

Reinforcement learning's goal is either to estimate each action's expected utility (possibly using neural networks), or to directly learn the best action to take in any given situation, without bothering with utility estimation.
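
As a rough illustration of the first option (estimating each action's expected utility from experience), here is a small sketch. It assumes a toy one-step environment with two actions and invented reward probabilities, and it uses a simple epsilon-greedy running average rather than a neural network. The names `sample_reward` and `estimate_action_values` are hypothetical. The second option, learning the best action directly, is not shown here.

```python
import random


def sample_reward(action, rng):
    """Toy environment with made-up reward probabilities (an assumption)."""
    if action == "study":
        return 100.0 if rng.random() < 0.9 else 0.0
    return 110.0 if rng.random() < 0.3 else 10.0


def estimate_action_values(n_steps=5000, epsilon=0.1, seed=0):
    """Learn a running-average utility estimate for each action by trial and error."""
    rng = random.Random(seed)
    actions = ["study", "beer"]
    q = {a: 0.0 for a in actions}       # current utility estimates
    counts = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)                 # explore
        else:
            action = max(actions, key=lambda a: q[a])    # exploit current estimate
        reward = sample_reward(action, rng)
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        q[action] += (reward - q[action]) / counts[action]
    return q


if __name__ == "__main__":
    q = estimate_action_values()
    print(q, "-> best:", max(q, key=q.get))
```

The learner never sees the probabilities. It only observes rewards, yet its estimates end up close enough to the true expected utilities to pick the better action.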

