
This is related, and it is the paper that lives constantly rent free in my head. I think it will retroactively be viewed as revolutionary: https://www.alexwg.org/publications/PhysRevLett_110-168702.p...

Basically, intelligent behavior is optimizing for "future asymptotic entropy" rather than maximizing any immediate value. How intelligent a system is then becomes a measure of how far into the future it can model and effectively optimize that entropy.
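To make the idea above concrete, here is a rough toy sketch, not the paper's method. It assumes a 1D corridor with walls, three actions, and uniformly random future behavior, and it scores each first move by the Shannon entropy of where the agent could end up after `horizon` steps. End-state entropy is a stand-in I chose for the path entropy the paper talks about, and the names `step`, `end_state_entropy`, and `entropic_action` are made up for illustration.

```python
import math
from collections import Counter
from itertools import product

ACTIONS = (-1, 0, 1)  # move left, stay, move right (assumed toy action set)


def step(state, action, size):
    """Move in a corridor of `size` cells; walls clamp the position."""
    return min(max(state + action, 0), size - 1)


def end_state_entropy(state, horizon, size):
    """Shannon entropy (nats) of the final cell under uniformly random plans.

    Assumption: end-state entropy is used as a crude proxy for the entropy
    over whole future paths.
    """
    counts = Counter()
    for plan in product(ACTIONS, repeat=horizon):
        s = state
        for a in plan:
            s = step(s, a, size)
        counts[s] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropic_action(state, horizon, size):
    """Pick the first move that keeps the most future end states open."""
    def score(action):
        return end_state_entropy(step(state, action, size), horizon - 1, size)
    return max(ACTIONS, key=score)


if __name__ == "__main__":
    # An agent pressed against the left wall prefers to step away from it.
    print(entropic_action(state=0, horizon=4, size=11))
```

Nothing here rewards a goal directly: the agent moves off the wall only because that keeps more futures reachable. A longer `horizon` lets it "see" constraints that are further away, which is the sense in which lookahead depth acts as a measure of capability in this toy.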

(updated with pdf link)




Great paper! There are some similar ideas to this in game theory and reinforcement learning (RL):

[1]: Thermodynamic Game Theory: https://adamilab.msu.edu/wp-content/uploads/AdamiHintze2018....

[2]: piKL - KL-regularized RL: https://arxiv.org/abs/2112.07544

[3]: Soft Actor-Critic - Entropy-regularized RL: https://arxiv.org/abs/1801.01290

[4]: "Soft" (Boltzmann) Q-learning = Entropy-regularized policy gradients: https://arxiv.org/abs/1704.06440
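For anyone who wants to see what "entropy-regularized" buys you in [3] and [4], here is a minimal single-state sketch. It assumes the standard textbook form: a Boltzmann policy over Q-values with temperature `temperature`, and a "soft" value given by a temperature-scaled log-sum-exp. The function names and example numbers are mine, not taken from those papers.

```python
import math


def boltzmann_policy(q_values, temperature):
    """pi(a) proportional to exp(Q(a) / temperature), computed stably."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def soft_value(q_values, temperature):
    """Soft value: temperature * log(sum_a exp(Q(a) / temperature))."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]   # hypothetical Q-values for three actions
    tau = 0.7             # hypothetical temperature
    pi = boltzmann_policy(q, tau)
    expected_q = sum(p * x for p, x in zip(pi, q))
    # The soft value equals expected Q plus an entropy bonus scaled by tau.
    print(soft_value(q, tau), expected_q + tau * entropy(pi))
```

The two printed numbers agree up to floating-point error, which shows the soft value splitting into expected return plus `tau` times the policy's entropy. As `tau` shrinks toward zero the policy approaches the greedy argmax and the bonus vanishes.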



