None of the models other than the 600b one are R1. They’re just prev gen models ... | Hacker News


siwakotisaurav 1 day ago | on: Nvidia’s $589B DeepSeek rout

None of the models other than the 600b one are R1. They’re just previous-generation models like Llama or Qwen trained on R1 output, making them slightly better.

int_19h 1 day ago

"Slightly" is an understatement, though. Distillations of R1 are significantly better than the underlying models.

doctorpangloss 1 day ago

Yeah but the second comment you see believes they are, and belief is truth when it comes to stock market gambling.
