
>I want to know exactly what makes everyone think that DeepSeek totally owns the LLM space?

It achieved performance competitive with the leading models at roughly a tenth of the training cost. That's an incredible achievement in any industry, especially given how small their team is relative to competitors. Their API is 20-50x cheaper than the competition's, and not because they're burning cash by charging below cost, but because their architecture is that much more efficient.

They achieved all of the above despite sanctions limiting their access to top-tier GPUs, and the gap between Chinese domestic GPUs and Nvidia is narrowing, so the GPU disadvantage will matter less and less over time.




It should be noted that DeepSeek routinely claims to be a "language model trained by OpenAI", so it's pretty clear it wasn't trained from scratch at a tenth of the cost, but rather on synthetic output generated by ChatGPT.

Not to point a finger at DeepSeek specifically; this is generally the case for the best open-source models right now. The best LLaMA finetunes also lean heavily on ChatGPT-generated synthetic datasets.

Either way, it's unclear what the real cost is when you factor that in.


If that were the case, why are they 10x cheaper than the competition? If everyone were doing it, there would be no gains over competitors.


How would we know whether a Chinese company's books on training costs and expenditures were accurate?


Are you on the CCP's payroll for covering their rise in AI? You're basically implying that people shouldn't believe the progress they've made and thus don't need to take it seriously.

Nice job; you should get a pretty solid performance review.


Oh, I take China seriously; I'm merely reminding people not to blindly accept figures stated by the CCP.


But like I said, DeepSeek is open source, so why can't competitors copy whatever it is that makes the cost of production 10x cheaper?


It is not open source; it's open weight (an artifact rather than source) plus an open "recipe". They do not make their training or serving code available.

If you had started copying what they released in May, right after the DeepSeek-V2 launch (which already contained a non-trivial architecture innovation, MLA, sketched below), you'd likely have had a slightly inferior but mostly on-par optimized implementation after some months. And here you go: DeepSeek-V3 arrives, and you're playing catch-up again!
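For anyone wondering what MLA (Multi-head Latent Attention) actually buys them: roughly, instead of caching full per-head keys and values, each token is compressed into one small latent vector, and K/V are reconstructed from it at attention time, so the KV cache shrinks dramatically. Here's a toy numpy sketch of that idea; this is my own illustration with made-up dimensions, not DeepSeek's actual code or sizes:

  # Toy sketch of the MLA idea: cache one small latent per token instead of
  # full per-head K/V, and rebuild K/V from the latent at attention time.
  # Dimensions are illustrative only, not DeepSeek's real configuration.
  import numpy as np

  d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128
  rng = np.random.default_rng(0)
  W_down = rng.standard_normal((d_model, d_latent)) * 0.02           # compress hidden state
  W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # latent -> keys
  W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02  # latent -> values

  h = rng.standard_normal(d_model)               # hidden state of one new token
  c_kv = h @ W_down                              # only this vector goes in the KV cache
  k = (c_kv @ W_up_k).reshape(n_heads, d_head)   # keys rebuilt when attention runs
  v = (c_kv @ W_up_v).reshape(n_heads, d_head)   # values rebuilt when attention runs

  # Cache per token: d_latent floats vs. 2 * n_heads * d_head for vanilla MHA.
  print(d_latent, 2 * n_heads * d_head)          # 512 vs. 8192 -> roughly 16x smaller cache

The point of the sketch is just that the memory savings come from the architecture itself, not from serving tricks, which is part of why a naive reimplementation doesn't get you their cost structure.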

If you don't replicate their engineering work, your costs will be 10-20x higher, which renders the whole point moot.

As long as the team can continue this trend, there is no hope for copycats. They are trying to "hijack" the minds of chip designers too; see the "suggestions to chip manufacturers" section. If they succeed, you need to beat them at their own game.


You have to distinguish between the current model and DeepSeek the company. DeepSeek the company can pull an OpenAI and stop releasing their weights any time they like. The knowledge and skill are retained.

I really wonder how long the current era of giving models away for free can last. How is this sensible from a business perspective? Facebook got burned by iOS and now engages in what would otherwise look like irrational behavior to avoid being locked into a supplier again, but even then, they don't really need to give Llama away for free. They could train and use it for themselves just fine.


If they're smart, and of course they are, they're not releasing the latest they have. They're releasing just enough to show everyone that they're at parity with or better than OpenAI. I imagine they already have internal models that exceed the open one, so there's no real advantage in copying what they released.


Open models will win; OpenAI and the other regulatory-capture gamers that want to hoard their precious will be an interesting footnote in the history books.


You don't think FB is trying to neuter an emerging threat? They're kneecapping what could have been a trillion-dollar company if it were more difficult to replicate their tech.


They would have to pay more to get researchers who don't publish.


I was talking about open-weight models more than papers, but OpenAI hardly publishes papers anymore and doesn't seem to struggle to get researchers. Anthropic clearly has a lot of special sauce, given Claude 3.5 Sonnet's performance on coding, yet the papers they publish are mostly safety-related. So I'm not sure that's really true anymore.


Of course, if you arrive last and copy all the existing architecture, you can train it more cheaply.


No, you can only train at the same cost then (actually higher, because you don't have the existing hardware/power agreements). The whole point of the last model was that they made significant changes beyond just copying.


No, because you can train once and avoid all the costly errors that force you to train and retrain.


> copy

You mean build on existing public research? Everyone does that. At least DeepSeek, Meta, etc. have the decency to publish research back into the ecosystem.



