Hacker News new | past | comments | ask | show | jobs | submit login

Very impressive - key quote: "In addition, MuZero is designed to operate in the general reinforcement learning setting: single-agent domains with discounted intermediate rewards of arbitrary magnitude. In contrast, AlphaGo Zero and AlphaZero were designed to operate in two-player games with undiscounted terminal rewards of ±1."

It's worth noting that, as impressive as MuZeros performance is in the many Atari games, it achieves a score of 0.0 in Montezuma's Revenge.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact