Hacker News
jampekka 13 days ago | on: Dispelling misconceptions about RLHF
RL is about getting numerical feedback on outputs, in contrast to supervised learning, where there are examples of what the output should be. There are many RL problems with no delayed rewards, e.g. multi-armed bandits.
Admittedly most interesting cases do have delays.
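To make the bandit case concrete, here is a rough sketch of an epsilon-greedy agent on a Bernoulli multi-armed bandit. The arm probabilities, step count, epsilon and the `run_bandit` name are all made up for illustration. The point is only that the learner never sees the "correct" arm as a label; it gets a numerical reward right after each pull, with no delay.

```python
import random

def run_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit (illustrative parameters only)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        # Explore with probability epsilon, otherwise pull the best-looking arm.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Immediate numerical feedback: reward is 1 or 0, observed right away.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

# Hypothetical arm success probabilities, unknown to the agent.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
most_pulled = max(range(len(counts)), key=lambda a: counts[a])
print("estimates:", estimates)
print("pull counts:", counts)
print("most pulled arm:", most_pulled)
```

A supervised version of the same problem would hand the learner the best arm directly as a label. Here it only ever observes the reward of the arm it actually pulled.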