Hacker News new | past | comments | ask | show | jobs | submit login

Given that you can download and use the weights, the model architecture has to be includded as part of that. And I did read a paper from them recently describing their MoE architecture and how it differs from the original GShard.



Excuse me? What weights can you download from OpenAI? gpt2 does not count


Sorry I meant that DeepSeek release their models. Wrong context.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: