I honestly had forgot that, if I ever knew it. But I think the point stands that in many contexts you'd rather have the nuances of this kind of thing explained to you - able to represented by many different sequences of tokens, each individually being low probability - instead of simply taking the single-highest probability token "1".