> You do realize just how small this CO2 output is compared to the human CO2 output it replaces.
Sure, but the issue is that the scale of models is increasing by orders of magnitude over a fairly short span of years, and the range of applications is expanding rapidly. It doesn't take long for that kind of growth to turn a trivial issue into a catastrophic one, and this is exactly the kind of risk you have ethicists in a field to call out while it is still trivial, so that some of the energy of people doing technical development gets directed toward making sure it never reaches the catastrophic stage.
Analogously, I used to pooh-pooh concerns about Bitcoin's total electricity usage years ago, but it has since risen to a genuinely meaningful level. There's easily more money in AI as a whole than in crypto, so you could see its total energy usage ending up a lot higher than crypto's. That's not an insignificant amount of energy we're talking about.
The number is probably quite a bit lower. The Strubell paper uses a PUE coefficient of 1.58, while Google's data centers are at around 1.1. The figures are for GPUs, but Google uses TPUs, whose power characteristics weren't public at the time; price is a proxy for power usage, though. TPUs might have been half as expensive in that experiment. Let's say half of those savings are margin that goes to Nvidia and not really related to power, so, making numbers up, training on TPUs might be another 25% more efficient. That's probably conservative, since Google claims TPUs are tens of times less power-hungry thanks to their simplicity, but on the other hand you also have other fixed costs like racks, fabric, and the CPUs feeding the chips.
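To make that back-of-envelope explicit, here's the same arithmetic in a few lines of Python; the 1.1 PUE and the 25% TPU discount are the assumptions above, not measured numbers:

    # Rough back-of-envelope adjustment of the Strubell estimate.
    # All inputs are assumptions from the comment above, not measurements.
    strubell_pue = 1.58          # PUE assumed in the Strubell paper
    google_pue = 1.10            # PUE Google reports for its data centers
    pue_factor = google_pue / strubell_pue    # ~0.70

    tpu_factor = 0.75            # guessed "another 25% more efficient" on TPUs

    combined = pue_factor * tpu_factor        # ~0.52
    print(f"Adjusted estimate: roughly {combined:.0%} of the paper's figure")

So under those (made-up) assumptions, the energy figure comes out at roughly half of what the paper reports.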
In an ideal world, nobody would get hung up on details and everybody would understand that there is a lot of nuance in comparisons like this. In practice, if Google published a paper that quoted the Strubell figures without the caveats, I can see headlines about how inefficient and bad for the environment Google Translate is. PR would get busy obtaining corrections or follow-up articles to clarify things, but those rarely get the same attention. And it's still extra work that could have been avoided, which in itself is bad optics ("do you folks even review the stuff you send out for publication?").
I'm all for reducing emissions and improving efficiency, but I find the premise a bit of a stretch.
Yes, large and wealthy organizations have a big advantage, but that applies to pretty much anything they do, not just language models. Inefficiencies are bad, but if you asked people why they should be fixed, I think they'd mention financial cost and wasted time well before framing it as an issue of ethics and fairness.
Reducing costs for language models is already a great idea on every front. So is reducing inequalities. It's linking the two that sounds like a strained argument to me. Suppose someone makes models ten times smaller next week: will marginalized communities' lives improve any time soon? There must be more to the argument. I haven't seen the paper, so I'm curious how it's all framed.
You realize the sheer scale of the CO2 output produced by the engineers who wrote the model: their offices, their commutes, their education, and so on.
I'm actually surprised by just how small a CO2 output it was.