

Zero and forget -- caveats of zeroing memory in C - peripetylabs
http://www.eliteraspberries.com/blog/2012/10/zero-and-forget--caveats-of-zeroing-memory-in-c.html

======
qznc
Making it hard for the compiler to optimize your memset-zero away is not a
long-term solution. At some point in the future the compiler might be able to
analyze this and optimize it away. As a cryptographer you should not rely on
bad compilers.

Actually, using his memzero solution would work, but not for his stated
reasons. Putting memzero into another compilation unit (.c file) requires it
to be compiled separately. memzero itself cannot be compiled to a NOP, since
the compiler does not know how it is used, and a call to memzero cannot be
optimized away, since the compiler does not know what it does.

Nevertheless, link-time optimization in theory could still optimize across
compilation units. The only solution which comes to my mind is to use
'volatile' for the memory access, but that will never be fast.

~~~
saurik
> The only solution which comes to my mind is to use 'volatile' for the memory
> access, but that will never be fast.

Since you are insisting that the memory actually be accessed when you demand
that it be wiped for cryptographic purposes, you will not be burned by the
use of volatile. (To be clear, you would of course not _use_ the memory
through a volatile-qualified pointer: you would add that qualifier only when
you went to wipe it.)
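
A minimal sketch of that approach (the function name is made up): the buffer is used normally everywhere else, and only the wipe goes through a volatile-qualified pointer, whose stores the compiler must actually perform.

```c
#include <stddef.h>

/* Hypothetical wipe helper: only here do accesses go through a
 * volatile-qualified pointer, so the stores cannot be proven dead
 * and deleted; normal uses of the buffer stay unqualified. */
static void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}
```

Once the secret is no longer needed, `wipe(key, sizeof key)` stands in for the plain `memset(key, 0, sizeof key)` that the optimizer would remove.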

~~~
qznc
If you have a type like "volatile int *" for the array elements you are
zeroing, the compiler cannot use SSE instructions such as MOVNTDQ to write
the zeros.

[http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/...](http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc194.htm)

~~~
saurik
Interesting. Is there a reason for this? I was under the impression that
volatile only required that the accesses actually happen, not that the
accesses had to happen in a manner considered "boring". Is the issue that
volatile is also demanding that the ordering remain consistent, and the SSE
instruction is not capable of guaranteeing that?

(edit:) In fact, that instruction, and a small handful of others (MOVNTI,
MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) do seem to cause re-orderings. On x86,
at least, any other form of optimization should continue to be allowed
(involving cache-lines, etc.), but you are definitely right: this
instruction's usage would not be. :(

~~~
jbri
The easy way to reason about which optimizations the compiler may perform on
a volatile location is to ask: "If this were actually a memory-mapped IO
port, would this compiler optimization change the observed behaviour?"

------
cygx
Personally, I'd just write a wrapper function for memset().

The following one works in at least gcc 4.5.3 and clang 3.1, but is actually
not guaranteed to work by C language semantics:

    
    
        static inline void memzero(void *volatile block, size_t size)
        {
            memset(block, 0, size);
        }
    

A safe alternative is

    
    
        static inline void memzero(void *block, size_t size)
        {
            static void *(*volatile const memset_)(void *, int, size_t) = memset;
            memset_(block, 0, size);
        }
    

but it has the downside that the indirect call cannot be optimized away,
whereas gcc and clang actually inline the call to memset() if the first
version is used.

If this is a concern, there's probably no alternative to an explicit loop:

    
    
        static inline void memzero(void *block, size_t size)
        {
            volatile unsigned char *bp = block;
            while(bp < (unsigned char *)block + size)
                *bp++ = 0;
        }

------
fhars
That implementation will die with a bus error for seven out of every eight
possible values of mem on most non-x86 architectures, zero out the wrong
memory regions for those values on some more exotic architectures, and be
horribly slow everywhere else.
[http://stackoverflow.com/questions/1496848/does-unaligned-me...](http://stackoverflow.com/questions/1496848/does-unaligned-memory-access-always-cause-bus-errors)

------
dexen
Whether memset(), or any other function, gets optimized away by GCC should
depend on function attributes (1) -- more exactly, on `pure', and possibly
some others. However, GCC (tested with 4.7.1) somehow considers memset() pure
regardless of its declaration. The default declaration is:

    
    
      $ echo '#include <string.h>' | gcc -E - | grep memset
      extern void *memset (void *__s, int __c, size_t __n) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1)));
    

When replaced by hand with declaration lacking any attributes, it still gets
optimized away.

    
    
        	/* will be optimized away for unclear reasons */
        extern void *memset (void *__s, int __c, size_t __n);
    

Contrast that to behavior of any user-defined function:

    
    
        	/* may be optimized away */
        extern void * memxxx(void *__s, int __c, size_t __n) __attribute__ ((pure));
        	/* should not be optimized away */
        extern void * memyyy(void *__s, int __c, size_t __n);
    

IMHO GCC's special handling of memset() is broken...

(1) [http://www.cs.auckland.ac.nz/references/c/gcc4.7/Function-At...](http://www.cs.auckland.ac.nz/references/c/gcc4.7/Function-Attributes.html)

~~~
saurik
(edit: I just noticed that qznc's earlier top-level comment makes this same
overall point, even down to the specifics about future optimization and
volatile. That comment is also shorter, so it probably provides more "bang
for the buck" to read it instead of mine.)

I am pretty certain that memset is not conceptually being handled especially
differently: it is simply being inlined. If you have a memset of a variable
to 0, gcc's compiled output might involve just a single mov instruction; if
it is a small array, it might be a few mov instructions; if a lot of memory
is being cleared, it might involve a call.

However, as it is inlined, the semantics are pretty clear for how it should
work (and attributes like pure will be deduced, as they would be for any
function you include in your actual code; you don't need to mark it as such
<- edit: to be clear, though, I really don't think that memset is "pure"...
it is pretty much the exact opposite of "pure"): it is equivalent to
assigning a variable a value, and if that variable's value will never be
used, then the assignment will not happen.

Instead, if the developer really really _really_ insists that the assignment
happen even if, from the perspective of the C language standard, there is no
legitimate side effect of that operation, the correct thing to do is to
temporarily qualify the pointer with volatile and _then_ do whatever it is
you wanted (such as wiping it).

Sadly, this article does not even in a single place mention "volatile", so I'm
not certain that the author understood how this works. Instead, he states that
a "solution" is to instead put a zero in the first element and then copy it to
subsequent elements... something that could easily be detected by abstract
interpretation, and which a sufficiently smart compiler would be correct in
also optimizing away.

~~~
dexen
It's not about volatile; marking the array as `volatile' causes a warning,
and the call is still optimized away:

    
    
        a.c:14:2: warning: passing argument 1 of ‘memset’ discards ‘volatile’ qualifier from pointer target type [enabled by default]
    

To take `volatile' into account, memset() would have to have
`volatile void *s' as the first argument in its prototype.

Curiously enough, making a custom function with volatile pointer argument and
`pure' attribute still causes GCC to optimize it away. I guess such
combination makes no sense anyway.

    
    
        extern void * memaaa(volatile void *__s, int __c, size_t __n) __attribute__ ((__pure__));

~~~
saurik
Right. I did not claim that once you added the volatile qualifier that you
could still use the existing implementation of memset: if you simply cast the
pointer to volatile and then pass it to the existing version of memset, you
will certainly get a diagnostic indicating that you lost the qualifier, as
memset itself does not have volatile as part of its type definition.

However, the author's attempts to redefine memset without using volatile rely
on gcc limitations: in the first "trivial" case, that it does not use
abstract interpretation while looking for constant variables (here, it should
be easy to fold that loop down to "oh, that rvalue is always a constant 0"),
and in the second, that it does not do extensive optimization across
compilation units.

As for the other thing you noticed: while I would imagine that the
gcc-specific "I am the developer and want you to do something very special
and important" __attribute__ would override any notion it might have
regarding "volatile", they are really orthogonal concepts: with just that
prototype the compiler wouldn't even know whether that function does anything
at all with the memory you passed... the function might just do something
with the pointer value and never dereference it; therefore, one would pretty
much demand that "pure" have the semantics gcc is giving it here.

------
rwg
CERT's Secure Coding wiki has more to say on the subject, including portable
code for a memset_s() function that can still potentially be optimized away:

[https://www.securecoding.cert.org/confluence/display/seccode...](https://www.securecoding.cert.org/confluence/display/seccode/MSC06-C.+Be+aware+of+compiler+optimization+when+dealing+with+sensitive+data)

[https://www.securecoding.cert.org/confluence/pages/worddav/p...](https://www.securecoding.cert.org/confluence/pages/worddav/preview.action?pageId=3524&fileName=protecting-sensitive-data.pdf)

------
bvrlt
The assumption that the trivial solution won't be optimized out is, I think,
wrong. From your experiment, GCC is indeed not smart enough to do so, but I
would bet that a compiler like ICC would be. In that case the best option is
probably to use some pragmas to keep a statement from being optimized out.

------
tedunangst
OpenBSD has a function called secure_bzero. All it currently does is call
regular bzero, but if external compilation isn't enough to do the job, we'll
come up with something else.

Regarding the article, I don't at all understand why the three arguments are
necessary. Why would the following patch not work?

    
    
        - memset(x, 0, n);
        + memzero(x, n);

~~~
ibotty
i think it's to keep the patch(es) easier to read. it certainly feels awkward.

------
nitrogen
_Note that the type punning is only actually useful on systems where memory
addresses are 64 bits wide, hence we include that code conditionally for
environments with the LP64 data model, which includes most Unix-like systems._

The first statement seems false. I was not previously aware of any
association between the number of address lines and the width of the data bus
on computing systems. I know I've had 32-bit processors with at least 64-bit
memory buses, and the SheevaPlug has a 32-bit processor with a 16-bit memory
bus.

Also, the code above this paragraph will only use wide accesses on 64-bit
architectures ("#ifdef __LP64__"), even though there are benefits available on
32-bit systems.
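
The commenter's point can be sketched with uintptr_t-sized stores, which give word-wide writes on 32-bit targets as well (a sketch, not the article's code; the alignment prologue sidesteps the bus-error caveat raised elsewhere in the thread, and note that without volatile this version is itself fair game for the optimizer):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of word-at-a-time zeroing in the spirit of the article's
 * type punning, using uintptr_t so 32-bit targets also get wide
 * stores. Byte loops handle the unaligned head and the tail. */
static void memzero_wide(void *block, size_t size)
{
    unsigned char *p = block;
    /* leading bytes until the pointer is word-aligned */
    while (size && ((uintptr_t)p % sizeof(uintptr_t)) != 0) {
        *p++ = 0;
        size--;
    }
    /* word-wide stores over the aligned middle */
    while (size >= sizeof(uintptr_t)) {
        *(uintptr_t *)p = 0;
        p += sizeof(uintptr_t);
        size -= sizeof(uintptr_t);
    }
    /* trailing bytes */
    while (size--)
        *p++ = 0;
}
```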

------
tsahyt
Why isn't there a keyword in the C Standard that says "don't optimize this!"?
If there is, please correct me. There are a bunch of similar problems
introduced by optimizing compilers that could be solved with such a keyword.

~~~
smcl
One way we used to inhibit optimisations was to declare one of the variables
involved as "volatile". I'm not sure about MS, Intel, GCC, etc., but our
compiler took this as a sign not to run any optimisation passes over code
involving volatile variables.

~~~
coob
In GCC 4.4 and up you can use the following:

    
    
        #pragma GCC push_options
        #pragma GCC optimize ("O0")
    
        memset(a, 0, 3);
    
        #pragma GCC pop_options
    

'volatile' should also work; I don't see why there is any need to rewrite
memset. Here is a related question on SO:

[http://stackoverflow.com/questions/2219829/how-to-prevent-gc...](http://stackoverflow.com/questions/2219829/how-to-prevent-gcc-optimizing-some-statements-in-c)

Edit: buried at the bottom of that post is another method:

    
    
        __attribute__((optimize("O0")))
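
The attribute applies per function, so the wipe can live in a small GCC-only wrapper (a sketch; the function name is made up, and clang only partially honors the optimize attribute):

```c
#include <string.h>

/* GCC >= 4.4: compile just this function without optimization,
 * so the memset call is not elided as a dead store. */
__attribute__((optimize("O0")))
void secure_wipe(void *buf, size_t len)
{
    memset(buf, 0, len);
}
```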

------
mseebach
I've only ever done a short compiler class in college, but wouldn't a
sufficiently sophisticated optimizer unroll the loop and propagate the
constant, thus eliminating the read and allowing the entire block to go away?

------
Someone
Relevant for those thinking that "volatile" will save the day:
<http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf>

It shows that a) volatile does not have to do _anything_ (if that is
documented by the compiler) and b) you cannot trust your compiler to be
bug-free. The latter is one reason to follow cert.org's advice to read the
disassembly output of your compiler.

------
fserb
Could someone please explain why this is an issue in the first place?

If the memory of the process is available to other processes after it
finishes (or while it's running), isn't this already a lost game? I.e., how
can you be sure that this particular chunk of memory wasn't cached on disk at
some point? How can you be sure that someone didn't access it before your
memset() call?

~~~
alexkus
Remote exploits for one.

Imagine a web server that didn't wipe plaintext passwords or encryption keys
from its memory after finishing with them. If the web server was remotely
exploitable then it could be possible to obtain the contents of the memory of
that process remotely, thus possibly leaking passwords or other sensitive
information of other people that have connected to that web server at some
point in the past.

> how can you be sure that someone didn't access it before your memset() call?

True, but there's a difference between having a very small window of
opportunity in which the data could be obtained via a remote exploit, and
leaving the window wide open for a possibly endless period of time.

Obviously you shouldn't have any remote exploits in the code in the first
place, but it's good practice for secure programming to keep the sensitive
information in memory for as short a period as possible just in case there is
something that you aren't aware of.

------
sn
A good solution in C is to use calloc, which is malloc plus zeroing out the
memory.

<http://www.cplusplus.com/reference/clibrary/cstdlib/calloc/>

Also, in a standards-compliant compiler, statically declared variables are
automatically initialized to zero unless stated otherwise.

~~~
Someone
That won't work here. Here, the goal is to have 'cfree': zero'ing out memory
and then freeing it.

~~~
sn
OK, shows me for skimming too much.

There apparently used to be a cfree, though it appears to be equivalent to
free.

I upvoted rwg's comment regarding CERT's Secure Coding wiki. memset_s is the
correct solution given it is part of the C11 standard.
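
For reference, memset_s lives in C11's optional Annex K and must be requested by defining __STDC_WANT_LIB_EXT1__ before including <string.h>; many libcs (glibc among them) never implemented it. A hedged sketch with a byte-loop fallback for such systems:

```c
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>
#include <stddef.h>

#ifndef __STDC_LIB_EXT1__
/* Fallback where Annex K is absent: the volatile accesses keep
 * the stores from being treated as dead. (Signature simplified
 * from the standard's errno_t/rsize_t types.) */
static int memset_s(void *s, size_t smax, int c, size_t n)
{
    volatile unsigned char *p = s;
    if (s == NULL || n > smax)
        return 1;
    while (n--)
        *p++ = (unsigned char)c;
    return 0;
}
#endif
```

Unlike a plain memset, a conforming memset_s may not be optimized away even when the memory is never read again.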

------
AndreyKarpov
Relevant article: Overwriting memory - why? -
<http://www.viva64.com/en/k/0041/>

