DeepSeek-V2: A Strong, Economical, and Efficient MoE Language Model (github.com/deepseek-ai)
14 points by jasondavies 33 days ago | 3 comments



It claims to be Llama 3 70B tier in strength, ~3x cheaper, and 3-5x faster, since only 21B of its 236B total parameters are activated at any one time. For comparison, L3-70B normally costs under $1/million tokens.
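A rough back-of-envelope shows where that speed claim comes from (a minimal sketch, not from the thread; it assumes the common ~2 FLOPs per activated parameter per generated token rule of thumb for decoding):

```python
# Back-of-envelope decode cost, assuming the common rule of thumb
# of ~2 FLOPs per *activated* parameter per generated token.
# All numbers here are illustrative estimates, not measurements.

FLOPS_PER_PARAM = 2  # assumed rule of thumb

def flops_per_token(activated_params: float) -> float:
    """Approximate FLOPs to generate one token."""
    return FLOPS_PER_PARAM * activated_params

dense_70b = flops_per_token(70e9)  # Llama 3 70B: all params active
moe_21b = flops_per_token(21e9)    # DeepSeek-V2: 21B of 236B active

print(f"dense 70B: {dense_70b / 1e9:.0f} GFLOPs/token")
print(f"MoE 21B:   {moe_21b / 1e9:.0f} GFLOPs/token")
print(f"ratio:     {dense_70b / moe_21b:.1f}x")  # ~3.3x, in line with the 3-5x claim
```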


Its performance at 21B activated parameters is very impressive.

I also like using models in the 13B to 70B parameter range, since they run easily on a 32GB MacBook Pro.


Do note that it has 236B total parameters, which puts the weights at ~450 GB in FP16/BF16.
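A quick sanity check of that figure (a minimal sketch, not from the thread; it assumes the standard 2 bytes per parameter for FP16/BF16, with common quantized sizes for comparison):

```python
# Weight-file size from parameter count, assuming standard
# bytes-per-parameter for each precision. Purely illustrative arithmetic.

TOTAL_PARAMS = 236e9  # DeepSeek-V2 total parameter count

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8":      1.0,
    "4-bit":     0.5,
}

for precision, bytes_per in BYTES_PER_PARAM.items():
    size_gb = TOTAL_PARAMS * bytes_per / 1e9
    print(f"{precision:10s}: ~{size_gb:.0f} GB")

# fp16/bf16 : ~472 GB  (matches the ~450 GB quoted above)
# int8      : ~236 GB
# 4-bit     : ~118 GB  (still far beyond a 32 GB laptop,
#                       despite only 21B params being activated)
```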




