
The biggest recent increase in model performance came from training models to do chain-of-thought properly; that is why DeepSeek is as good as it is. This requires the model to generate a lot more tokens to reason, though, which means it needs a lot more compute to do its thing even without a massive increase in parameter count.
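To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per parameter per generated token, and it ignores attention cost, which grows with context length and would make long reasoning traces even more expensive. The model size and token counts are made up for illustration, not measurements of DeepSeek or any other model.

```python
def forward_flops(n_params: float, n_tokens: int) -> float:
    """Approximate inference cost: ~2 FLOPs per parameter per generated token."""
    return 2.0 * n_params * n_tokens


# Hypothetical numbers, chosen only to illustrate the scaling.
N_PARAMS = 7e9            # assumed parameter count, held fixed
ANSWER_TOKENS = 200       # assumed length of a direct answer
REASONING_TOKENS = 4000   # assumed extra chain-of-thought tokens

direct = forward_flops(N_PARAMS, ANSWER_TOKENS)
with_cot = forward_flops(N_PARAMS, ANSWER_TOKENS + REASONING_TOKENS)

# Same parameter count, but compute scales with the number of tokens generated.
ratio = with_cot / direct
print(f"direct: {direct:.2e} FLOPs, with reasoning: {with_cot:.2e} FLOPs, ratio: {ratio:.0f}x")
```

Under these assumptions the parameter count never changes, yet the cost per query grows in direct proportion to how many tokens the model emits while reasoning.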




