I still don't understand how the "prediction function" is generating frames.
The last line of the paper seems to suggest MuZero is generalizable to other domains.
But the appendix states "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".
Consider the problem of predicting the next N frames of video from a one-minute YouTube sample chosen at random, where there is a high probability of some sort of scene transition in the interval. How would that work, short of training on a large subset of the YouTube corpus?
> The main idea of the algorithm ... is to predict those aspects of the future that are directly relevant for planning. The model receives the observation ... as an input and transforms it into a hidden state... There is no direct constraint or requirement for the hidden state to capture all information necessary to reconstruct the original observation, drastically reducing the amount of information the model has to maintain and predict.
"We speculate that the complexity of world models could be greatly decreased if they could fully leverage this idea: that a complete model of the world is actually unnecessary for most tasks - that by identifying the important part of the world, policies could be trained significantly more quickly, or more sample efficiently".
That is just what this paper does, as I understand it, by bringing it together with the tree search from AlphaZero.
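To make the quoted idea concrete, here is a rough sketch of the three functions as I read the paper: a representation function that maps an observation to a hidden state, a dynamics function that maps (hidden state, action) to a reward and the next hidden state, and a prediction function that maps a hidden state to a policy and a value. Note that nothing here ever outputs a frame. Everything specific below is my own assumption for illustration: the sizes, the random untrained weights, the single-layer maps, and all the names. The real networks are deep and trained, and the tree search is left out entirely.

```python
import math
import random

# All sizes and weights are made up for illustration; nothing is trained.
OBS = 16      # flattened observation size (hypothetical)
HIDDEN = 4    # hidden-state size (hypothetical)
ACTIONS = 3   # number of discrete actions (hypothetical)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

W_h = rand_matrix(HIDDEN, OBS)               # representation weights
W_g = rand_matrix(HIDDEN, HIDDEN + ACTIONS)  # dynamics: next hidden state
w_r = rand_matrix(1, HIDDEN + ACTIONS)       # dynamics: reward head
W_p = rand_matrix(ACTIONS, HIDDEN)           # prediction: policy head
w_v = rand_matrix(1, HIDDEN)                 # prediction: value head

def representation(observation):
    """Observation -> hidden state. Far smaller than the observation."""
    return [math.tanh(x) for x in matvec(W_h, observation)]

def dynamics(state, action):
    """(hidden state, action) -> (reward, next hidden state)."""
    one_hot = [1.0 if i == action else 0.0 for i in range(ACTIONS)]
    x = state + one_hot
    reward = matvec(w_r, x)[0]
    next_state = [math.tanh(v) for v in matvec(W_g, x)]
    return reward, next_state

def prediction(state):
    """Hidden state -> (policy over actions, value). No frames involved."""
    logits = matvec(W_p, state)
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    policy = [e / total for e in exps]
    value = matvec(w_v, state)[0]
    return policy, value

def unroll(observation, actions):
    """Encode one real observation, then step purely in hidden space."""
    state = representation(observation)
    outputs = []
    for action in actions:
        policy, value = prediction(state)
        reward, state = dynamics(state, action)
        outputs.append((policy, value, reward))
    return outputs

if __name__ == "__main__":
    frame = [0.5] * OBS  # stand-in for a flattened frame
    for step, (policy, value, reward) in enumerate(unroll(frame, [0, 2, 1])):
        print(step, [round(p, 3) for p in policy], round(value, 3), round(reward, 3))
```

The only place a real observation enters is `representation`; after that, `unroll` never touches pixels again. As I understand it, that is why the video-prediction example isn't quite the right comparison: the model is only asked for reward, value and policy, not for the next frame.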