How to Train a Million Context LLM (latent.space)
15 points by 7d7n 3 months ago | 1 comment



oh hey, we're on HN! author/host here. we think the story of long context over the past year is worth reviewing, so we invited Mark on to talk about extending Llama 3 to >1m tokens.

a year ago we were talking to MosaicML (https://x.com/swyx/status/1660033177178734592) about their 65k+ context model. now people yawn at yet another 1m token model. wild.

the TLDR from the pod seems to be that Meta chose to train Llama 3 with a RoPE theta (the base frequency of the rotary position embeddings) that can be scaled up for long-context finetuning. once Gradient noticed that, it was off to the races.
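
for anyone who wants to see the knob concretely, here's a minimal sketch of the standard RoPE angle computation. Llama 3's shipped theta of 500k comes from its published config; the larger value below is illustrative only, not the one Gradient actually used:

    import torch

    def rope_angles(head_dim: int, seq_len: int, theta: float = 10000.0):
        # standard RoPE: the rotation frequency for dim pair i is theta^(-2i/d)
        inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
        pos = torch.arange(seq_len).float()
        return torch.outer(pos, inv_freq)  # (seq_len, head_dim // 2) rotation angles

    # raising theta stretches every rotation period, so positions far past the
    # pretraining length still map to slowly varying angles instead of wrapping.
    base_ctx = rope_angles(128, 8_192, theta=500_000.0)        # Llama 3 default
    long_ctx = rope_angles(128, 1_048_576, theta=5_000_000.0)  # illustrative value

since theta is just a scalar in the checkpoint config, no architecture change is needed to extend context: bump theta and continue training on longer sequences, which is presumably why the 1m-token finetunes appeared so quickly.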



