However, any time you have a reward signal (like the score of the game) in a multi-step decision problem, like a game where you take actions sequentially (e.g. once per turn), you need RL machinery to make sense of the data. Maybe you take an action now, and you might only reap the reward of that action in the future. So how do you "label" the action right now? You label it with some measure that takes into account the future of the reward signal.
So some human plays a game and gets a super high score. You only see that they got a high score at the end of the game. How do you go back and label the 150 actions that led you to the score? That's the part that is RL.