There are models much larger than BERT, with a correspondingly larger footprint, GPT-3 being the best-known example.
Models like BERT also aren't trained just once when they are developed; they are trained again with different domains, different parameters, and in some cases different tasks. There is also fine-tuning (more frequent, but less carbon intensive). So these are real environmental problems, and others have pointed them out.
How much more of a problem are a billion cars and 40% of our electricity being generated from coal?
We’ve squandered decades ignoring the big problems, and now people want to run into the weeds with a thousand little problems that individually don’t amount to much.