For anyone curious, it took 2048 A100 GPUs to train LLaMA, and each GPU costs roughly $15k (Facebook probably gets some sort of discount).
That's about $30M in hardware if you want to train at that scale. Also, IIRC it took 23 days to train the biggest model. Someone else can do the power consumption cost calculations.
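A quick back-of-the-envelope in Python, in case anyone wants to check the numbers (the $15k-per-card price is the assumption above, before any discount):

    # Rough capital cost of the GPUs alone
    num_gpus = 2048
    price_per_gpu = 15_000               # USD, assumed list price per A100
    capital_cost = num_gpus * price_per_gpu
    print(f"${capital_cost:,}")          # $30,720,000 -> roughly $30M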
Electricity costs are basically irrelevant because the cards are so expensive.
A100 cards draw 250 W each; with datacenter overheads we'll call it 1,000 kilowatts for all 2048 cards. 23 days is 552 hours, or 552,000 kilowatt-hours total.
Most datacenters pay between 7 and 10 cents per kilowatt-hour for electricity; some are below 4 cents. At 10 cents, that's about $55,000 in electricity, which is nothing next to $30 million in capital costs.
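For anyone who wants to check the math, a minimal sketch (the 1,000 kW figure rounds up from the 512 kW the cards alone draw, to cover cooling and networking overhead):

    # Electricity cost of the 23-day run at a few typical datacenter rates
    num_gpus = 2048
    card_power_kw = 0.250                       # 250 W per A100
    cards_only_kw = num_gpus * card_power_kw    # 512 kW for the cards alone
    total_power_kw = 1000                       # rounded up for datacenter overhead
    energy_kwh = total_power_kw * 23 * 24       # 552,000 kWh
    for rate in (0.04, 0.07, 0.10):             # $/kWh
        print(f"${energy_kwh * rate:,.0f} at ${rate:.2f}/kWh")
    # -> $22,080 / $38,640 / $55,200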
No, I'm willing to bet the CO2 cost of manufacturing the cards is also way higher than that of the electricity. Those things are built on a global supply chain, with materials potentially traveling thousands of kilometers between each step.
Long term I also imagine it's much cheaper to run these large model trainings on renewables. It's a very centralized process that doesn't necessarily need 100% availability.
The manufacturing process, however, is totally decentralized, and NVIDIA mostly manufactures in China where coal is cheap.
The US grid mix produces about 0.855 pounds of CO2 per kWh[0]. So 552,000 kWh works out to roughly 472,000 pounds of CO2, which is about 214 metric tons. At a cost of $40 per tonne[1] of CO2, that's roughly $8,560, which is still small compared to the capital cost of the cards.
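Same numbers as a quick script, using the grid factor and carbon price from the links above (those are the cited figures, not mine):

    # CO2 from 552,000 kWh on the average US grid mix, priced at $40/tonne
    energy_kwh = 552_000
    lbs_per_kwh = 0.855                  # US average grid mix [0]
    lbs_per_tonne = 2204.62
    co2_tonnes = energy_kwh * lbs_per_kwh / lbs_per_tonne   # ~214 t
    cost = co2_tonnes * 40                                  # $40/tonne [1]
    print(f"{co2_tonnes:.0f} tonnes CO2, ~${cost:,.0f}")    # ~214 t, ~$8,560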
AWS us-west-2 is housed in The Dalles and Prineville, Oregon. Not only are they near a massive wind farm in the Columbia Gorge, but also quite near the Columbia river's many hydro-electric dams. Facebook and Apple also have Prineville data centers. They are built there intentionally. Electricity at many data centers is quite carbon-lean.
I always feel there is an opportunity cost here though. If that green energy wasn’t being used for compute it could be available to heat someone’s home instead of them using dirty sources.
The $30M training cost is too high. Amazon's p4d.24xlarge is $32.77 an hour for 8 A100 GPUs; 2048 A100 GPUs for 23 days costs about $4.6M at that rate. You might even get a discount.
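Rough version of that estimate, assuming on-demand pricing with no reservation or spot discount:

    # Renting 2048 A100s as p4d.24xlarge instances (8 GPUs each) for 23 days
    hourly_rate = 32.77                  # USD per instance-hour, on-demand
    instances = 2048 // 8                # 256 instances
    hours = 23 * 24                      # 552 hours
    total = instances * hours * hourly_rate
    print(f"${total:,.0f}")              # ~$4,631,000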
At the same time, I guarantee you they didn't get it right the first time. I'm sure there were multiple runs (both serial and parallel) as they worked out kinks and tuned hyperparameters.
Not to mention, the kind of expertise to run this for a major corporation doesn't come free either. Facebook employs quite a few high-profile ML researchers who undoubtedly make mid-to-high six-figure salaries.
The point was that if you only need to train once, then it's cheaper to rent the GPUs than to buy them. If you need to train it multiple times, then the cost of buying the GPUs is amortized among runs.
In any case, the cost per run is going to be lower than $30M.
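Rough break-even under those same assumptions (ignoring power, hosting, resale value, and the fact that the cards get reused for other work):

    # How many full training runs before buying the cards beats renting them
    purchase_cost = 2048 * 15_000        # ~$30.7M in cards, at the assumed $15k each
    rental_per_run = 4.63e6              # ~$4.6M per 23-day on-demand run
    print(f"{purchase_cost / rental_per_run:.1f} runs")   # ~6.6 runs to break even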
I'm sure that's the case. The latest SKU I'm responsible for QC testing contains 4x A100s in a 2U chassis. And oh man, the number of QSFP ports it uses...
Azure is generally a pretty terrible cloud (poor UX, very slow for anything, multiple highly critical cross-tenant security issues, etc.) and far behind the market leader, AWS, so they have to compensate with pricing. Same reason Oracle Cloud is so reasonably priced: they're already so far behind that their usual pricing wouldn't make any sense.
There's no reasonable way to get an estimate of what it actually costs FB.
1) The GPUs are not single-use; they will be amortized over 3 years, and there are other things they will be used for that generate revenue.
2) The cost of the servers these GPUs run in, with massive CPU, RAM, and storage requirements.
3) The overhead of building and operating all of that infrastructure in terms of people, electricity, cooling, etc.
4) The overhead of having dozens or hundreds of engineers & scientists who contributed to this.
One way you can distill the first three is to use AWS/Azure/GCP costs. But then you are still missing a major factor, which is the humans who worked on it, and the human cost may very well exceed the hardware cost.
Plus there are a lot of highly specialized engineers required to keep all those GPUs up and running during training, the ML engineers skilled in deep learning + hardware, and the systems for gathering/cleaning/labelling data. Gather enough engineers and now you need managers, PMs, etc.