There are many interesting points here, but "programmers will not be able to afford to be ignorant about the energy cost of the programs they write" struck me especially.
I worked in the embedded space and we had rough notions of the (power) cost of using various instructions and hardware blocks -- more or less related to the number of bits flipped (for example, adding 0 does less work than adding 0xfffffff).
It would be grand if this filtered down to the mainstream. If a program runs on a million machines, making it more power efficient would be a big win economically, but the programmer probably has no incentive.
Working out big O assumptions for power use on your platform is an interesting exercise!
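For a feel of the kind of experiment this involves, here is a minimal sketch (the operand values and iteration counts are purely illustrative; the actual measurement is done with an external power meter or the board's own instrumentation while each loop runs):

    /* Run the same add instruction in a tight loop with "quiet" (all-zero)
     * and "noisy" (all-ones) operands, and compare the board's power draw.
     * The volatile qualifiers stop the compiler folding the loop away. */
    #include <stdint.h>

    void add_loop(uint32_t operand, uint64_t iterations)
    {
        volatile uint32_t op = operand;  /* defeat constant folding */
        volatile uint32_t acc = 0;
        for (uint64_t i = 0; i < iterations; i++)
            acc += op;          /* same instruction, different bit activity */
    }

    int main(void)
    {
        add_loop(0x00000000u, 1000000000ull);  /* few bit transitions */
        add_loop(0xffffffffu, 1000000000ull);  /* many bit transitions */
        return 0;
    }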
I call your bluff. Worrying about the cost of a state change of 32 flip-flops on a die with millions of transistors is patently ridiculous. No way could you measure that. No way.
That said, your core point is valid: not choosing the right sleep mode can hurt badly, avoiding polling is important, and above all avoid busy loops. On many architectures, avoiding FPU or vector instructions can put those units into a power-save mode, etc...
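To make the polling point concrete, here is a minimal sketch assuming a POSIX target (the helper names are made up for illustration): the busy loop keeps the core out of its sleep states, while the blocking poll() lets the OS idle the core until data actually arrives.

    /* Illustrative only: waiting for input by spinning vs. blocking. */
    #include <poll.h>
    #include <unistd.h>

    /* Power-hungry: spins at 100% CPU until a byte shows up
     * (fd assumed to be O_NONBLOCK). */
    int wait_busy(int fd, char *out)
    {
        for (;;) {
            if (read(fd, out, 1) == 1)
                return 0;
        }
    }

    /* Power-friendly: sleeps in the kernel until fd is readable. */
    int wait_blocking(int fd, char *out)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, -1) == 1)      /* -1 = wait indefinitely */
            return read(fd, out, 1) == 1 ? 0 : -1;
        return -1;
    }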
But "use as many zeros as possible in arithmetic" can't possible be right.
It was easily measurable if you ran the instructions in a loop and looked at the power draw. I can't remember the numbers now but it was definitely significant (maybe 30% on a wide vector instruction?).
I don't know where you got the idea about the flip-flops from.
Adders, for example, are built with extra lookahead logic so that their delay is proportional to the log2 of the number of bits, at the cost of substantially more gates than a simple ripple-carry design. Note that this means the number of bit transitions in a simple add can be much larger than the number of bits going in, meaning more power draw.
Of course we didn't worry about the instructions' inputs in practice.
I think it was cheaper to store 1 bits than 0 bits, ironically.
edit: explain more clearly that larger circuit size means more power drawn
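As a crude software-side proxy for the bits-flipped idea, one can count how many bits change between consecutive operands in a data stream; this says nothing about the adder's internal nodes (which, as noted above, can flip far more bits than the operands themselves), but it gives a feel for the switching activity the data itself causes. A sketch using the GCC/Clang popcount builtin:

    #include <stdint.h>
    #include <stddef.h>

    /* Total Hamming distance between consecutive 32-bit words -- a rough
     * proxy for data-dependent switching activity, nothing more. */
    unsigned long transitions(const uint32_t *data, size_t n)
    {
        unsigned long total = 0;
        for (size_t i = 1; i < n; i++)
            total += (unsigned long)__builtin_popcount(data[i] ^ data[i - 1]);
        return total;
    }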
Still doesn't make sense. Yes, things like Brent-Kung adders use a lot more logic than a ripple-carry adder, but the adders are pipelined. You only avoid the transition current in the combinational logic if every addition operates on the same values every clock cycle. But obviously you can't do that if you want to compute something. I still don't buy it. On anything bigger than a 6502, there is no way this would be measurable. None whatsoever.
Measuring the relative power draw due to flipping logic can definitely be done with present-day simulation tools. The question is whether there is any significant advantage in saving X% of power in the processor core. In a complex system, the processor rarely sucks up most of the power. Someone should actually do this as a research project and find out whether there are any significant savings.
I am guessing that it will all be architecture-specific. An ARM will have a different power profile to a MIPS running the same C code. Personally, I would like to see GCC gain a -Op switch that selects power-optimised code paths.
Sometimes when I'm doing some calculations I know the O(n^3) algorithm will run fast enough, but I also know that if I use the O(n log n log log n) algorithm then my laptop battery will last measurably longer. That matters on long journeys without power outlets.