
You're welcome!

The agent "learns" by stumbling around and interacting with its environment. At the beginning, its behavior is pretty random, but as it learns more and more, it refines its "policy" to collect more rewards more quickly.
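That stumble-then-refine loop can be sketched with about the simplest possible setup, a multi-armed bandit. Everything below (the payoff numbers, the function name) is a made-up toy example, not anyone's production code: the agent starts out picking randomly, and its value estimates (its "policy" here is just "pick the arm with the best estimate, usually") sharpen as rewards come in.

```python
import random

def run_bandit(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]   # expected reward per arm (unknown to the agent)
    estimates = [0.0, 0.0, 0.0]    # the agent's learned value estimates
    counts = [0, 0, 0]

    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(3)
        else:
            arm = max(range(3), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental average: the estimate drifts toward the observed rewards.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates

print(run_bandit())  # the estimate for arm 2 ends up near its true mean, 0.8
```

Early on the three estimates are all junk, so the choices are effectively random; by the end, arm 2 (the best one) dominates.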

Brute force is certainly possible for some situations. For example, suppose you're playing Blackjack. You can calculate the expected return from 'hitting' (taking another card) and 'standing' (keeping what you've got), based on the cards in your hand and the card the dealer shows.
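A rough sketch of that calculation, under common simplifying assumptions (infinite deck, dealer stands on all 17s, no splits/doubles/blackjack bonus; function names are mine). "Stand" is exact under those assumptions; "hit" is simplified to taking exactly one card and then standing.

```python
from fractions import Fraction

# Card draw probabilities: ace (counted 1) through 9 at 1/13 each; 10/J/Q/K at 4/13.
CARD_PROBS = {v: Fraction(1, 13) for v in range(1, 10)}
CARD_PROBS[10] = Fraction(4, 13)

def dealer_outcomes(total, ace):
    """Distribution over the dealer's final total (or 'bust').
    `total` counts every ace as 1; `ace` records whether the hand holds one."""
    best = total + 10 if ace and total + 10 <= 21 else total
    if best > 21:
        return {"bust": Fraction(1)}
    if best >= 17:
        return {best: Fraction(1)}
    dist = {}
    for card, p in CARD_PROBS.items():
        for outcome, q in dealer_outcomes(total + card, ace or card == 1).items():
            dist[outcome] = dist.get(outcome, Fraction(0)) + p * q
    return dist

def ev_stand(player_total, dealer_upcard):
    """Expected return (+1 win, 0 push, -1 loss) from standing on a hard total."""
    ev = Fraction(0)
    for outcome, p in dealer_outcomes(dealer_upcard, dealer_upcard == 1).items():
        if outcome == "bust" or outcome < player_total:
            ev += p
        elif outcome > player_total:
            ev -= p
    return ev

def ev_hit_once(player_total, dealer_upcard):
    """Expected return from taking exactly one card (counted hard), then standing."""
    ev = Fraction(0)
    for card, p in CARD_PROBS.items():
        new_total = player_total + card
        ev += -p if new_total > 21 else p * ev_stand(new_total, dealer_upcard)
    return ev

# Standing on a hard 16 against a dealer 10 is clearly negative: the dealer
# reaches 17+ far more often than busting.
print(float(ev_stand(16, 10)), float(ev_hit_once(16, 10)))
```

Tabulating these expectations for every (player hand, dealer upcard) pair is exactly the brute-force enumeration the comment describes, and it's feasible here because the state space is tiny.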

So...brute force works for simple tasks, but in many situations it's hard to enumerate all possible states (chess has something like 10^47 possible states) and state-action pairs. It's also difficult to "assign credit"--you rarely lose a chess game just because of the last move. Both problems make it hard to brute-force a solution or find one via analysis. However, the biggest "win" for RL is that it's applicable to "black box" scenarios where we don't necessarily know everything about the task. The programmer just needs to give the agent feedback (through the reward signal) when it does something good or bad.
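To make the "black box" point concrete, here's a hypothetical sketch in the style of a gym-like interface (the class and names are illustrative, not a real library): the programmer encodes "good" and "bad" purely in the scalar reward returned by step(), and the agent never needs to see the environment's internals.

```python
import random

class CorridorEnv:
    """Five cells in a row; reward 1.0 only for reaching the rightmost cell."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos = min(4, max(0, self.pos + action))
        reward = 1.0 if self.pos == 4 else 0.0
        done = self.pos == 4
        return self.pos, reward, done

env = CorridorEnv()
state, total_reward, done = env.reset(), 0.0, False
rng = random.Random(0)
while not done:  # even purely random behavior eventually stumbles onto the reward
    state, reward, done = env.step(rng.choice([-1, 1]))
    total_reward += reward
print(total_reward)  # 1.0
```

Everything the agent can ever know about the task flows through that (state, reward, done) tuple; swap in a different environment behind the same interface and the same learning code still applies.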

Furthermore, depending on how you configure the RL agent, it can react to changes in the environment, even without being explicitly reset. For example, imagine a robot vacuum that gets "rewarded" for collecting dirt. It's possible that cleaning the room changes how people use it and thus, changes the distribution of dirt. With the right discounting setup, the vacuum will adjust its behavior accordingly.
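One way to sketch that adaptation (here via a constant learning rate, a recency-weighted average that's a common companion to the discounting setup the comment mentions; the scenario and numbers are invented): two "rooms" whose dirt payoff swaps halfway through the run, and an agent whose estimates track the change because old rewards fade out.

```python
import random

def run_vacuum(steps=4000, alpha=0.1, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    estimates = [0.0, 0.0]
    choices_late = [0, 0]  # how often each room is chosen in the final quarter
    for t in range(steps):
        # The dirt "moves" halfway through: room 0 stops paying, room 1 starts.
        means = [0.9, 0.1] if t < steps // 2 else [0.1, 0.9]
        if rng.random() < epsilon:
            room = rng.randrange(2)
        else:
            room = max(range(2), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < means[room] else 0.0
        # Constant step size: recent rewards dominate, so stale estimates decay.
        estimates[room] += alpha * (reward - estimates[room])
        if t >= steps * 3 // 4:
            choices_late[room] += 1
    return estimates, choices_late

estimates, choices_late = run_vacuum()
print(choices_late)  # after the switch, room 1 dominates the agent's choices
```

With a sample-average update instead (dividing by the visit count), the old high estimate for room 0 would linger for a long time after the switch; the constant step size is what lets the agent keep up with a changing environment.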
