[flagged] DeepSeek not as disruptive as claimed, firm has 50k GPUs and spent $1.6B (tomshardware.com)
27 points by sxp 5 hours ago | 13 comments





This article distills a report that doesn't bring anything new to the table. I can't load their site right now, but I recall that in their release they explicitly excluded experimentation and R&D costs. And having $1.6B worth of hardware is not the same as spending $1.6B to build the model.

The article says:

> despite the company's claims that DeepSeek only cost $6 million and 2,048 GPUs to train. However, industry analyst firm SemiAnalysis reports that the company behind DeepSeek incurred $1.6 billion in hardware costs and has a fleet of 50,000 Nvidia Hopper GPUs, a finding that undermines the idea that DeepSeek reinvented AI training and inference with dramatically lower investments than the leaders of the AI industry.

The underlying report states:

> The $6M cost in the paper is attributed to just the GPU cost of the pre-training run, which is only a portion of the total cost of the model. Excluded are important pieces of the puzzle like R&D and TCO of the hardware itself.
>
> https://semianalysis.com/2025/01/31/deepseek-debates/

But we already know this. The DeepSeek paper even gives a breakdown:

> Pre-Training: 2664K GPU hours ($5.328M)
> Context Extension: $0.238M
> Post-Training: $0.01M
> Total: $5.576M
>
> https://arxiv.org/pdf/2412.19437v1
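For reference, the paper prices H800 time at $2 per GPU hour and reports the GPU-hour figures alongside the dollar costs, so the arithmetic is easy to check. A minimal sanity-check script (the figures come from the paper; the script itself is just illustrative):

    # Sanity check of the cost table in the DeepSeek-V3 paper.
    # GPU-hour figures come from the paper, which assumes $2 per H800 GPU hour.
    RATE = 2.0  # USD per H800 GPU hour

    gpu_hours = {
        "pre-training": 2664e3,
        "context extension": 119e3,
        "post-training": 5e3,
    }

    total = sum(gpu_hours.values())
    for stage, hours in gpu_hours.items():
        print(f"{stage:>17}: {hours / 1e3:5.0f}K GPU hours = ${hours * RATE / 1e6:.3f}M")
    print(f"{'total':>17}: {total / 1e3:5.0f}K GPU hours = ${total * RATE / 1e6:.3f}M")

This reproduces the $5.328M / $0.238M / $0.01M line items and the $5.576M total.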


Full title & subtitle:

DeepSeek might not be as disruptive as claimed, firm reportedly has 50,000 Nvidia GPUs and spent $1.6 billion on buildouts

The fabled $6 million was just a portion of the total training cost.


They also run it as a service, though. How much of that hardware cost went to training vs. infrastructure for the public service?

Besides, it's FOSS, isn't it? Meaning anybody can see how much it takes to run and how much it takes to train?

>A recent claim that DeepSeek trained its latest model for just $6 million has fueled much of the hype. However, this figure refers only to a portion of the total training cost— specifically, the GPU time required for pre-training. It does not account for research, model refinement, data processing, or overall infrastructure expenses.

So the $6 million is correct, and the rest is irrelevant additional costs. Yes, they'll pay for research. They'll pay for data processing. They'll have overall infrastructure expenses. Duh!


Yes. And the original announcement and paper called out that this ~$6 million was just for training the R1 model. It specifically mentioned that this didn't include any other costs, including research, previous models, etc. Just raw training costs for the latest model.

Hugging Face's "Open-R1" project is attempting to replicate the DeepSeek-R1 model based on the innovations outlined in the paper(s). They need to recreate the training code, the datasets, etc., since that part of the system wasn't published.

And you are right that with the open weights of the R1 model, it should be pretty easy to infer how much it costs to _run_ as well.
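For a very rough sense of what "run" means in hardware terms, a back-of-envelope memory estimate is enough; the 671B total / ~37B active parameter counts are published, while the precision and GPU assumptions below are mine:

    import math

    # Rough serving footprint for the open-weights R1/V3-sized MoE model.
    # 671B total parameters is published; precision and hardware are assumptions.
    total_params = 671e9
    bytes_per_param = 1.0                      # assume FP8 weights
    weight_bytes = total_params * bytes_per_param

    gpu_mem = 80e9                             # assume 80 GB H100/H800-class cards
    usable = 0.8                               # leave headroom for KV cache and activations
    gpus_needed = math.ceil(weight_bytes / (gpu_mem * usable))

    print(f"weights ~{weight_bytes / 1e9:.0f} GB -> at least {gpus_needed} GPUs per serving replica")
    # => more than a single 8-GPU node just to hold the weights, before any
    #    batching, quantization, or offloading tricks

So "easy to infer" is relative: you can bound the serving footprint from the weights alone, but the actual per-token cost depends heavily on batching and utilization.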


Without the actual training corpus, you can't know how much it takes to train. They could have trained on twice as many tokens for example (not saying they did!)

You could however train it yourself and see what it takes to get the same cognitive performance.
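Even without the corpus you can get a ballpark from the usual compute ~ 6 * N * D rule of thumb; here the active-parameter and token counts are taken from the V3 paper, while the throughput and utilization figures are assumptions of mine:

    # Back-of-envelope training compute via FLOPs ~= 6 * N_active * D_tokens.
    # 37B activated params and 14.8T tokens are from the V3 paper;
    # per-GPU throughput and utilization are assumed.
    n_active = 37e9                    # activated parameters per token (MoE)
    d_tokens = 14.8e12                 # pre-training tokens
    flops = 6 * n_active * d_tokens    # ~3.3e24 FLOPs

    peak_flops = 989e12                # assumed BF16 peak per H800, FLOP/s
    mfu = 0.40                         # assumed model FLOPs utilization
    gpu_hours = flops / (peak_flops * mfu) / 3600

    print(f"~{flops:.2e} FLOPs -> ~{gpu_hours / 1e6:.1f}M GPU hours "
          f"-> ~${gpu_hours * 2 / 1e6:.1f}M at $2/GPU-hour")

That lands in the same ballpark as the paper's 2664K GPU hours, and it makes the token-count point concrete: train on twice as many tokens and the estimate doubles.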

It is not FOSS. The LLM industry has repurposed "open source" to mean "you can run the model yourself." They've released the model, but it does not meet the 'four freedoms' standard: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE...

The code is MIT however.

In fairness, you can still reverse engineer the training procedure yourself from their paper and get a close approximation of training cost using some open/synthetic datasets. You might think it's incredibly complicated to do this without source code, but the pre-training portion of the training is something you can grab from other projects or re-derive yourself pretty easily (and will account for the bulk of training time/cost).

Not defending it, but it's a much better situation than some of the non-commercial licensing from e.g. Meta, Stability.


So I checked out the original report:

https://semianalysis.com/2025/01/31/deepseek-debates/

They cite themselves as the source, and throughout the article there are just a bunch of "We believe..." statements.

Am I missing something?


Likely pulling numbers out of their ass.

They're getting challenged on X about how parent High-Flyer, a hedge fund with ~$8B AUM, whose single-digit-percent management fees since founding amount to low hundreds of millions total (for all operating expenses), could sustain $1B+ of capex and somehow get $1B+ of hardware. It's not financially possible. Well, not anymore, since the founder just met with the PRC premier, who is going to unlock the national compute bazooka. But the fact that they only just got political attention means they were operating on limited capex; the founder himself said the original batch of A100 cards represented a significant gamble and a shared resource with the hedge fund. They simply did not have the cash for the $1B+ of cards that SemiAnalysis thinks they have. It doesn't pass a basic smell test.

DeepSeek's paper is also pretty transparent that the training cost was for that run. IMO people fixate on the $6M training cost number, but really the story is that a bunch of kids from PRC universities with access to some compute are closing the gap with US AI... which TBH is just as embarrassing / destabilizing.


Yes. That, and they also talk about the company group's total costs and silently imply that training this model is a significant part of that, or maybe silently imply that the costs for the other stuff the group did should be counted as part of this success.

They do reference a DeepSeek job ad boasting of "access to 10,000s GPUs" for use without usage restrictions.

Though no link to it.



