Hacker News | andy12_'s submissions
1. Tokenformer: Rethinking transformer scaling with tokenized model parameters (arxiv.org)
3 points by andy12_ 12 hours ago | 1 comment
2. Selective Attention Improves Transformer (arxiv.org)
1 point by andy12_ 24 days ago | 1 comment
3. The AdEMAMix Optimizer: Better, Faster, Older (arxiv.org)
2 points by andy12_ 51 days ago
