
>I want to know exactly what makes everyone think that DeepSeek totally owns the LLM space?

It achieved performance competitive with the leading models at roughly a tenth of the training cost. That's an incredible achievement in any industry, especially given how small their team is relative to competitors. Their API is 20-50x cheaper than the competition's, and not because they're burning cash by charging below cost, but because their architecture is that much more efficient.

They achieved all of the above despite sanctions limiting their access to top-tier GPUs, and the gap between Chinese domestic GPUs and Nvidia is narrowing, so the GPU disadvantage will matter less and less over time.




It should be noted that DeepSeek routinely claims to be a "language model trained by OpenAI", so it's pretty clear it wasn't trained from scratch at a tenth of the cost, but rather on synthetic output generated by ChatGPT.

Not to point a finger at DeepSeek specifically; this is generally the case for the best open-source models right now. The best LLaMA finetunes also lean heavily on ChatGPT-generated synthetic datasets.

Either way, it's unclear what the real cost is when you factor that in.


If that were the case, why are they 10x cheaper than the competition? If everyone were doing it, there would be no gains over competitors.


How would we know whether a Chinese company's books on training costs and expenditures were accurate?


Are you on the CCP's payroll for covering their rise in AI? You're basically implying that people shouldn't believe the progress they've made and thus don't need to take it seriously.

Nice job; you should get a pretty solid performance review.


Oh, I take China seriously; I'm merely reminding people not to blindly accept figures stated by the CCP.


But like I said, DeepSeek is open source, so why can't competitors copy whatever it is that makes the cost of production 10x cheaper?


It is not open source; it's open weight (an artifact rather than source) plus an open "recipe". They do not make their training or serving code available.

If you had started copying what they released in May, right after the DeepSeek-V2 launch (which already contained a non-trivial architecture innovation, MLA, sketched below), you'd likely have had a slightly inferior but mostly on-par optimized implementation after some months. And here you go: DeepSeek-V3 arrives, and you're playing catch-up again!
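For anyone wondering what MLA (Multi-head Latent Attention) actually buys them: roughly, instead of caching full per-head keys and values, each token is compressed into one small latent vector, and K/V are reconstructed from it at attention time, so the KV cache shrinks dramatically. Here's a toy numpy sketch of that idea; this is my own illustration with made-up dimensions, not DeepSeek's actual code or sizes:

  # Toy sketch of the MLA idea: cache one small latent per token instead of
  # full per-head K/V, and rebuild K/V from the latent at attention time.
  # Dimensions are illustrative only, not DeepSeek's real configuration.
  import numpy as np

  d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128
  rng = np.random.default_rng(0)
  W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress hidden state
  W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # latent -> keys
  W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # latent -> values

  h = rng.standard_normal(d_model)               # hidden state of one new token
  c_kv = h @ W_down                              # only this vector goes in the KV cache
  k = (c_kv @ W_up_k).reshape(n_heads, d_head)   # keys rebuilt when attention runs
  v = (c_kv @ W_up_v).reshape(n_heads, d_head)   # values rebuilt when attention runs

  # Cache per token: d_latent floats vs. 2 * n_heads * d_head for vanilla MHA.
  print(d_latent, 2 * n_heads * d_head)          # 512 vs. 8192 -> roughly 16x smaller cache

The point of the sketch is just that the memory savings come from the architecture itself, not from serving tricks, which is part of why a naive reimplementation doesn't get you their cost structure.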

If you don't replicate their engineering work, your costs will be 10-20x higher, which renders the whole point moot.

As long as the team can continue this trend, there is no hope for copycats. They are trying to "hijack" the minds of chip designers too; see the "suggestions to chip manufacturers" section. If they succeed, you need to beat them at their own game.


You have to distinguish between the current model and DeepSeek the company. DeepSeek the company can pull an OpenAI and stop releasing their weights any time they like. The knowledge and skill are retained.

I really wonder how long the current era of giving models away for free can last. How is this sensible from a business perspective? Facebook got burned by iOS and now engages in what would otherwise look like irrational behavior to avoid being locked into a supplier again, but even then, they don't really need to give Llama away for free. They could train and use it for themselves just fine.


If they're smart, and of course they are, they're not releasing the latest they have. They're releasing just enough to show everyone that they're at parity with or better than OpenAI. I imagine they already have internal models that exceed the open one, so there's no real advantage in copying what they released.


Open models will win; OpenAI and the other regulatory-capture gamers that want to hoard their precious will be an interesting footnote in the history books.


You don't think FB is trying to neuter an emerging threat? They're kneecapping what could have been a trillion-dollar company if it were more difficult to replicate their tech.


They would have to pay more to get researchers who don't publish.


I was talking about open-weight models more than papers, but OpenAI hardly publishes papers anymore and doesn't seem to struggle to get researchers. Anthropic clearly has a lot of special sauce, given Claude 3.5 Sonnet's performance on coding, yet the papers they publish are mostly safety-related. So I'm not sure that's really true anymore.


Of course, if you arrive last and copy all the existing architecture, you can train it more cheaply.


No, you can only train at the same cost then (actually higher, because you don't have the existing hardware/power agreements). The whole point of the last model was that they made significant changes beyond just copying.


No, because you can train once and avoid all the costly errors that force you to train and retrain.


> copy

You mean build on existing public research? Everyone does that. At least DeepSeek, Meta, etc. have the decency to publish research back into the ecosystem.



