For example, an adder's total delay depends on a carry chain. If you have N 4-bit slices, the last slice has to wait for the carry to propagate through all N-1 previous slices.
But if you duplicate all your slices, you can have the results for both carry = 0 and carry = 1 inputs. Then just switch which one is correct - total time 1 add plus N-1 switches.
I believe that every single adder architecture we now use was known by 1980s. The "optimization" is matching the theory to the engineering of the day.
The reason you don't use prefix adders in 1980 is that you can't possibly route them because you don't have enough metal. So instead, you use chunks of Manchester carry chain because the "tapping internal nodes" that everybody cites allows you to route nodes in diffusion and polysilicon instead of having to use metal.
Of course, THAT only works because you have 5V (or more) and can connect lots of transistors in series and still have them work. As your voltage falls you can't connect as many transistors in series, so you switch to architectures that prefer active gates over passthroughs and long chains.
So, as your available metal layers, supply voltage, transistor speed, threshold voltages, capacitive load and power dissipation all shift over the engineering landscape, your "optimization" shifts with it.