
Smells like reinforcement learning in real life. The master sets up the task environment, collects samples from the students, picks the best and maybe even augments them. Students watch the master and learn... and the cycle continues.

And then the master becomes a grandmaster (unless entropy explosion occurs).
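
For anyone who wants the loop spelled out, here is a toy sketch of "collect samples, keep the best, learn from them, repeat". It is closer to the cross-entropy method than to full RL, and everything in it is an assumption made for illustration: the scalar "task", the `reward` function, the Gaussian "students", and the names `train`, `n_samples` and `n_keep` are all hypothetical.

```python
import random
import statistics


def reward(x, target=3.0):
    """Hypothetical task: the score is higher the closer x is to target."""
    return -(x - target) ** 2


def train(rounds=30, n_samples=50, n_keep=10, seed=0):
    """Sample, keep the best few, refit the sampler to them, repeat."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 2.0  # the students' current behaviour (assumed Gaussian)
    for _ in range(rounds):
        # Students produce attempts.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # The master picks the best attempts according to the task reward.
        best = sorted(samples, key=reward, reverse=True)[:n_keep]
        # Everyone learns from the picked attempts.
        mu = statistics.fmean(best)
        # Assumed floor so the spread never shrinks to exactly zero.
        sigma = max(statistics.stdev(best), 1e-3)
    return mu, sigma


if __name__ == "__main__":
    print(train())
```

Note that the spread `sigma` shrinks every round in this sketch, so the failure mode it shows is variety collapsing rather than growing; the floor on `sigma` is just one crude way to keep a little exploration alive.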






Yeah, and it's what ChatGPT sometimes does even if you don't ask it for multiple options: it shows two responses and asks you to pick the better one. Essentially, it's using customers to help with training.


