Making it hard for the compiler to optimize your memset-zero away is not a long-term solution. At some point in the future the compiler might be able to analyze this and optimize it away. As a cryptographer you should not rely on bad compilers.
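For concreteness, the pattern at risk looks roughly like the sketch below. The function name `check_secret` and the literal `"hunter2"` are made up for illustration; the point is only that the buffer is never read after the final memset, so an optimizer is allowed to treat that call as a dead store and drop it.

```c
#include <string.h>

/* Hypothetical example: compares input against a secret held in a
   stack buffer, then tries to wipe the buffer before returning. */
int check_secret(const char *input)
{
    char secret[16];
    int ok;

    /* Stand-in for loading or deriving real key material. */
    strcpy(secret, "hunter2");
    ok = (strcmp(input, secret) == 0);

    /* secret is never read after this point, so the compiler may
       consider this memset a dead store and remove it entirely. */
    memset(secret, 0, sizeof secret);
    return ok;
}
```

Whether the memset actually survives depends on the compiler and optimization level, which is exactly the problem.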
Actually, using his memzero solution would work, but not for the reasons he gives. Putting memzero into another compilation unit (.c file) means it is compiled separately. memzero itself cannot be compiled to a NOP, since the compiler does not know how it is used, and a call to memzero cannot be optimized away, since the compiler does not know what it does.
Nevertheless, link-time optimization in theory could still optimize across compilation units. The only solution which comes to my mind is to use 'volatile' for the memory access, but that will never be fast.
> The only solution which comes to my mind is to use 'volatile' for the memory access, but that will never be fast.
As you are insisting that the memory is accessed when you demand that the memory is wiped for cryptographic purposes, you will not be burned by the usage of volatile. (To be clear, you would of course not use the memory with volatile: you would add that qualifier only when you went to wipe it.)
Interesting. Is there a reason for this? I was under the impression that volatile only required that the accesses actually happen, not that the accesses had to happen in a manner considered "boring". Is the issue that volatile is also demanding that the ordering remain consistent, and the SSE instruction is not capable of guaranteeing that?
(edit:) In fact, that instruction, and a small handful of others (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) do seem to cause re-orderings. On x86, at least, any other form of optimization should continue to be allowed (involving cache-lines, etc.), but you are definitely right: this instruction's usage would not be. :(
The easy way of reasoning about what optimizations the compiler can do with a volatile location is to think "If this were actually a memory-mapped IO port, would this compiler optimization change the observed behaviour?"
One problem is that volatile is in practice often used for writing to memory-mapped ports. I suspect in that situation using instructions that touch several memory addresses at once might lead to pain. Of course in x64 such things might be less common / not make sense, but in general if you say volatile you are saying "do every read and write I tell you to, in the order I tell you to".
Even putting it into a separate compilation unit isn't a long-term solution: compilers that do whole program optimization may still be able to optimize it out. I believe just declaring the operand to be volatile would prevent it from being optimized out, however. e.g.:
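A minimal sketch of what that could look like, assuming a hypothetical helper called `memzero_volatile` (the name and signature are mine, not from any library). The buffer itself stays an ordinary non-volatile object; only the pointer used for wiping is volatile-qualified, so normal use of the memory is unaffected.

```c
#include <stddef.h>

/* Hypothetical helper: wipes len bytes at buf through a
   volatile-qualified pointer, one byte at a time. */
void memzero_volatile(volatile void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    /* Each store goes through a volatile lvalue, so the compiler is
       expected to emit it rather than discard it as a dead store. */
    while (len--) {
        *p++ = 0;
    }
}
```

A caller would just write `memzero_volatile(key, sizeof key);` right before the buffer goes out of scope. Compilers in practice seem to honor these writes, but how much the standard itself guarantees when the underlying object is not declared volatile is less clear, so treat this as a sketch rather than a proof.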
I think (hope?) that the write would be counted as an "access". If not, this would break the "what if this was an IO port" analogue. If for some reason it wasn't, you could use a static variable to hold the result. I fear the real problem with "volatile" is that just about everything about it is implementation dependent rather than clearly defined by standard.
Zero-before-use is standard practice anyway, at least in safety-critical/crypto/life-systems development.
Why leave something like that to compiler semantics? If the block of memory you think is safe turns out in fact to be a back-door, well then: that's your fault, not the compiler's, the operating system's, etc.
Well, there's also the rule "Never use malloc() in-process", which means: before main(), all your vars and memory are already initialized and allocated as you need them from the start. Oh, and other silly rules too, which can prevent a death or two along the way...