The return of patch-based self-supervision! No ResNets this time, but it comes from one of the authors of the original paper.
Now, with ViT, very simple self-supervised (SSL) pre-training shines again. It outperforms its contrastive-learning counterparts and is simpler than BEiT, as it is pixel-based (no tokenisation needed). There is also no need for BYOL-style special augmentation considerations. ImageNet-1k top-1: 87.8%.
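The core recipe can be sketched in a few lines: split the image into patches, hide most of them, and score reconstruction in raw pixel space on the masked patches only. This is a toy NumPy illustration of that idea, not the paper's actual model; all sizes and the stand-in "reconstruction" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy masked-patch pre-training setup: a fake grayscale image,
# cut into 8x8 patches (4x4 grid = 16 patches).
img = rng.normal(size=(32, 32))
p = 8
patches = img.reshape(4, p, 4, p).swapaxes(1, 2).reshape(16, p * p)

# Mask a high ratio of patches (the paper masks aggressively, e.g. 75%).
mask_ratio = 0.75
n_mask = int(mask_ratio * len(patches))
perm = rng.permutation(len(patches))
masked_idx, visible_idx = perm[:n_mask], perm[n_mask:]

# A real encoder would see only the visible patches; here we fake a
# "reconstruction" of the masked ones using the mean visible patch.
recon = np.tile(patches[visible_idx].mean(axis=0), (n_mask, 1))

# Loss is plain MSE in pixel space, computed on masked patches only --
# no tokeniser, no contrastive pairs, no augmentation tricks.
loss = ((recon - patches[masked_idx]) ** 2).mean()
print(loss >= 0.0)
```

The point of the sketch: the training signal is just pixel regression on hidden patches, which is why the method is simpler than token-prediction (BEiT) or view-matching (BYOL) objectives.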
"Decision Transformer: Reinforcement Learning via Sequence Modeling", by Chen L. et al., explains how a transformer can replace dynamic programming: actions are predicted autoregressively from a context window of past returns-to-go, states, and actions.
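The input construction is the key trick: each timestep contributes a (return-to-go, state, action) triple, and the model conditions on the last K timesteps. A minimal NumPy sketch of that sequence-building step, with toy scalar states/actions and an illustrative context length:

```python
import numpy as np

# Toy trajectory; values are illustrative, not from the paper.
K = 3                                   # context length, in timesteps
rewards = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
states = np.arange(5)                   # toy scalar states
actions = np.arange(5) % 2              # toy scalar actions

# Return-to-go at step t = sum of rewards from t to the end.
rtg = np.cumsum(rewards[::-1])[::-1]    # [7, 6, 6, 4, 3]

# Token sequence fed to the transformer at timestep t:
# the last K (return-to-go, state, action) triples, interleaved.
t = 4
lo = max(0, t - K + 1)
tokens = []
for i in range(lo, t + 1):
    tokens += [("R", rtg[i]), ("s", states[i]), ("a", actions[i])]

# Training predicts the action tokens autoregressively, conditioned on
# the desired return-to-go -- no value function, no Bellman backups.
print(len(tokens))  # 3 tokens per timestep x K timesteps
```

At inference you prepend the return you *want* to achieve as the first return-to-go token, and the model generates actions consistent with it.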