The problem is that, at review time at Google, you have to be able to "quantify" the impact. Many types of impact are quantifiable (e.g. "Made server request scale from 100 query-per-second to 1,300 qps", "reduced code size by 30%", etc.).
It's much harder to measure, say, the impact of a refactor where you made the code easier to reason about and more maintainable, so that future work can be done on it more easily.
I witnessed the same thing at Google; I worked on a project that everyone joked only existed because the person who wrote it wanted promo, and the best way to get it was to design a very complex system, and convince others to adopt it. (He did get it, and promptly switched teams.)
Some things have been made better, though. I've heard that going from L4 → L5 now involves much more influence from your manager, since they would know and, without quantifying something like a refactor, can speak to the positive impact you had in a project.
I've also seen refactors that just made life difficult for everyone else with constant non-functional changes. In the end there is a lot of fashion in programming, and while some refactors are worthwhile most are not in my experience.
The refactor is supposed to provide payoff in the future, but what normally happens is fashion changes and someone new comes by and says "this code is shit" and starts the process over. The supposed benefits never accrue.
Measure commits/authors before and after a refactor.
Measuring number of commits? Create fewer, larger commits. Measuring commit size? Pull in more third-party libraries, even where it does't make sense. Author count? Add more/less documentation and recruit or inhibit new devs depending on what your goal is.
Not to mention the number of commits/authors before and after an arbitrary point in time might conflate a successful growing project with a project in a death spiral being passed around from group to group.
It's a good idea, but in practice simple metrics like this often (but not always) devolve into prime examples of Goodhart's law.
If you can’t find a measurable benefit to a refactoring (or anything else, really) then maybe it was not worth doing in the first place.
There is no programming project in existence with enough developers working on it, that developer-productivity data derived from a change to it would not be considered "underpowered" for the sake of proving anything.
A high impact infra change will often inconvenience dozens of people and distract from feature work... You know, the shit people actually care about... (this is analogous to how "Twitter, but written in Golang" appeals to approximately no one.)
And solve the halting problem while you're at it
Goodhart’s law is not applicable to scientific management because metrics have different purpose.