DeepSeek-V2: A Strong, Economical, and Efficient MoE Language Model (github.com/deepseek-ai)
14 points by jasondavies 33 days ago | 3 comments



It claims to be Llama 3 70B tier in strength, ~3x cheaper, and 3-5x faster, since only 21B of its 236B total parameters are activated at any one time. For comparison, L3-70B normally costs under $1/million tokens.
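A rough back-of-envelope shows where that speed claim comes from (a minimal sketch, not from the thread; it assumes the common ~2 FLOPs per activated parameter per generated token rule of thumb for decoding):

```python
# Back-of-envelope decode cost, assuming the common rule of thumb
# of ~2 FLOPs per *activated* parameter per generated token.
# All numbers here are illustrative estimates, not measurements.

FLOPS_PER_PARAM = 2  # assumed rule of thumb

def flops_per_token(activated_params: float) -> float:
    """Approximate FLOPs to generate one token."""
    return FLOPS_PER_PARAM * activated_params

dense_70b = flops_per_token(70e9)  # Llama 3 70B: all params active
moe_21b = flops_per_token(21e9)    # DeepSeek-V2: 21B of 236B active

print(f"dense 70B: {dense_70b / 1e9:.0f} GFLOPs/token")
print(f"MoE 21B:   {moe_21b / 1e9:.0f} GFLOPs/token")
print(f"ratio:     {dense_70b / moe_21b:.1f}x")  # ~3.3x, in line with the 3-5x claim
```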


Its performance at 21B activated parameters is very impressive.

I also like using models in the 13B to 70B parameter range, since they run easily on a 32GB MacBook Pro.


Do note that it has 236B total parameters, which puts the weights at ~450 GB in FP16/BF16.
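A quick sanity check of that figure (a minimal sketch, not from the thread; it assumes the standard 2 bytes per parameter for FP16/BF16, with common quantized sizes for comparison):

```python
# Weight-file size from parameter count, assuming standard
# bytes-per-parameter for each precision. Purely illustrative arithmetic.

TOTAL_PARAMS = 236e9  # DeepSeek-V2 total parameter count

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8":      1.0,
    "4-bit":     0.5,
}

for precision, bytes_per in BYTES_PER_PARAM.items():
    size_gb = TOTAL_PARAMS * bytes_per / 1e9
    print(f"{precision:10s}: ~{size_gb:.0f} GB")

# fp16/bf16 : ~472 GB  (matches the ~450 GB quoted above)
# int8      : ~236 GB
# 4-bit     : ~118 GB  (still far beyond a 32 GB laptop,
#                       despite only 21B params being activated)
```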




