Hacker News | andy12_'s submissions
1. Tokenformer: Rethinking transformer scaling with tokenized model parameters (arxiv.org)
3 points by andy12_ 78 days ago | past | 1 comment
2. Selective Attention Improves Transformer (arxiv.org)
1 point by andy12_ 3 months ago | past | 1 comment
3. The AdEMAMix Optimizer: Better, Faster, Older (arxiv.org)
2 points by andy12_ 4 months ago | past
