
Except it’s not really a fair comparison, since DeepSeek is able to take advantage of a lot of the research pioneered by those companies with infinite budgets who have been researching this stuff in some cases for decades now.

The key insight is that those building foundational models and original research are always first, and then models like DeepSeek always appear 6 to 12 months later. This latest move towards reasoning models is a perfect example.

Or perhaps DeepSeek is also doing all their own original research and it’s just coincidence they end up with something similar yet always a little bit behind.



This is what many folks said about OpenAI when they appeared on the scene building on foundational work done at Google. But the real point here is not to assign arbitrary credit, it’s to ask how those big companies are going to recoup their infinite budgets when all they’re buying is a 6-12 month head start.


This is true, and practically speaking it is how it is. My point was just not to pretend that it’s a fair comparison.


For-profit companies don't have to publish papers on the SOTA they produce. In previous generations and other industries, it was common to keep some things locked away as company secrets.

But Google, OpenAI and Meta have chosen to let their teams mostly publish their innovations, because they've decided either to be terribly altruistic or that there's a financial benefit in their researchers getting timely credit for their science.

But that means anyone with access can read and adapt. They give up the moat for notoriety.

And it's a fine comparison to look at how others have leapfrogged. Anthropic is similarly young (just over three years old), but no one is accusing them of riding other companies' coattails with their current frontier models.

A final note that may not need saying: it's also very difficult to make big tech small while maintaining capabilities. The engineering work they've done is impressive and a credit to the ingenuity of their staff.


These companies could not retain the best talent if they couldn't publish: an individual researcher needs to get their name out there "to get better."


Exactly. This is why Apple is so far behind.


Anthropic was founded in part by OpenAI alumni, so to some extent it's true for them too. And it's still taken them over 3 years to get to this point.


You can learn more about DeepSeek and Liang Wenfeng here: https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...


That was a really good article. I dig the CEO's attitude; I agree with everything he says, and I am an American. From a Chinese perspective he must be talking an alien language, so I salute him for trying to push past the bounds of acceptable humdrum. If the rest of China takes on this attitude, the West will have serious competition.


This article is amazing. It explains not just why DeepSeek is so successful, but really indicates that innovators elsewhere can be too: extensive opportunities exist for improving transformers. Yet few companies pursue them (not just in China, but everywhere): incredible amounts are spent just replicating someone else's work, with a fear of trying anything substantially different.


great article, thank you


This is pretty harsh on DeepSeek.

There are some significant innovations behind v2 and v3, like multi-head latent attention, their many MoE improvements, and multi-token prediction.
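For anyone unfamiliar with the last of those, here's a rough toy sketch of the multi-token prediction idea: training extra heads to predict tokens several steps ahead rather than only the next one. This is my own simplified illustration, not DeepSeek's implementation; the class name, the GRU trunk (standing in for a transformer), and the hyperparameters are all placeholders.

    # Toy sketch of multi-token prediction (MTP), not DeepSeek's actual design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMTPModel(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64, n_extra=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            # A GRU stands in for the transformer trunk to keep the example short.
            self.trunk = nn.GRU(d_model, d_model, batch_first=True)
            # Head k (0-indexed) is trained to predict the token k+1 steps ahead.
            self.heads = nn.ModuleList(
                [nn.Linear(d_model, vocab_size) for _ in range(1 + n_extra)]
            )

        def forward(self, tokens):
            h, _ = self.trunk(self.embed(tokens))
            return [head(h) for head in self.heads]  # one logits tensor per offset

    def mtp_loss(logits_per_head, tokens):
        # Sum cross-entropy losses: head k is scored against tokens k steps ahead.
        total = 0.0
        for k, logits in enumerate(logits_per_head, start=1):
            if tokens.size(1) <= k:
                continue
            pred = logits[:, :-k, :]   # positions that still have a target k steps ahead
            target = tokens[:, k:]     # the tokens k steps ahead
            total = total + F.cross_entropy(
                pred.reshape(-1, pred.size(-1)), target.reshape(-1)
            )
        return total

    tokens = torch.randint(0, 1000, (2, 16))   # (batch, sequence) of token ids
    model = TinyMTPModel()
    loss = mtp_loss(model(tokens), tokens)
    loss.backward()

The extra heads give the model denser training signal per sequence; at inference you can keep only the next-token head, or use the look-ahead heads for speculative decoding.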


I don’t think it’s that harsh. And I don’t deny that they’re a capable competitor and will surely mix in their own innovations.

But would they be where they are if they were not able to borrow heavily from what has come before?


We all stand on the shoulders of giants? Should every engineer rediscover the Turing machine and the von Neumann architecture?


Of course not. But in this context the point was simply that it’s not exactly a fair comparison.

I’m reminded how hard it is to reply to a comment and expect people to interpret it in the context of the existing discussion. Never mind.


Don’t get salty just because people aren't interested in your point. I, for one, think it’s an entirely _fair_ comparison because culture is transitive. People are not ignoring the context of your point; they’re disagreeing with the utility of it.

If I best you in a 100m sprint, people don’t look at our training budgets and say, "Well, it wasn’t a fair competition: you’ve been sponsored by Nike and training for years with specialized equipment, while I just took notes, trained on my own, and beat you." It’s quite silly in any normal context.


If someone replies to your comment then I think it’s entirely fair that they take your point in the context in which it was intended. Otherwise, if they are not interested in the point then simply don’t reply to it.

No one enjoys being taken out of context.

But I do accept that given the hostility of replies I didn’t make my point very effectively. In a nutshell, the original comment was that it’s surprising a small team like DeepSeek can compete with OpenAI. Another reply was more succinct than mine: that it’s not surprising since following is a lot easier than doing SOTA work. I’ll add that this is especially true in a field where so much research is being shared.

That doesn’t in itself mean DeepSeek aren’t a very capable bunch, since I agree with a better reply that fast following is still hard. But I think most simply took it as an attack on DeepSeek (and yes, the comment was not very favourable to them, and my bias towards original research was evident).


Sure, it’s a point. Nobody would be where they are if not for the shoulders of those that came before. I think there are far more interesting points in the discussion.


Also don’t forget that if you think some of the big names are playing fast and loose with copyright / personal data then DeepSeek is able to operate in a regulatory environment that has even less regard for such things, especially so for foreign copyright.


Which is great for users.

We all benefit from training on LibGen, and copyright law generally doesn't forbid reading copyrighted content, only creating derivative works. But in that case, at what point is a work derivative and at what point is it not?

On paper, all works are derivative of something else, even the copyrighted ones.


Disrespecting copyright and personal data is good for users? I guess I disagree. I would say that it’s likely great for the company’s users, but not so great for everyone else (and ultimately, humankind).


Fast following is still super hard. No AI startup in Europe can match DeepSeek for instance, and not for lack of trying.


Mistral probably would.


Mistral.


Mistral is mostly a cheap copy of LLaMA


I would extend the same reasoning to Mistral as to DeepSeek regarding where they sit in the innovation pipeline. That doesn’t have to be a bad thing (when done fairly); we just need to remain mindful that it’s not a fair comparison (to go back to the original point).


In what sense is Mistral a copy of LLaMA, specifically?


https://x.com/arthurmensch/status/1752737462663684344?s=46

This is a message from one of Mistral's founders after they accidentally leaked a work-in-progress version that was a fine-tune of LLaMA, and there are a few hints of that.

Like:

> What is the architectural difference between Mistral and Llama? HF Mistral seems the same as Llama except for sliding window attention.

So even their “trained from scratch” models like the 7B aren’t that impressive if they just pick the dataset and tweak a few parameters.


Right, so Mistral accidentally released one internal prototype that was fine-tuned LLaMA. How does it follow from there that their other models are the same? Given that the weights are open, we can look, and nope, it's not the same. They don't even use the same vocabulary!

And I have no idea what you mean by "they just pick the dataset". The LLaMA training set is not publicly available - it's open weights, not open source (i.e. not reproducible).



Didn't DeepSeek's CEO say that Llama is two generations behind, and that's why they didn't use their methods?



