
How to zero a buffer - cperciva
http://www.daemonology.net/blog/2014-09-04-how-to-zero-a-buffer.html
======
jhallenworld
Slightly OT since it has little to do with security, but fighting the
optimizer is something FPGA Verilog and VHDL designers must also master.

If you don't use the result of some logic, it will be optimized out. One way
to prevent this is to route it to a pin.

If logic is fed by a constant, it will be optimized out right up to the point
where the result of the logic is mixed with some external input. (Early tools
could not use the dedicated reset net because of this: reset for each
flip-flop had to be routed to a pin, or the reset net was optimized out, which
meant the initial state of your flip-flop was lost.)

If you have identical logic, one copy is optimized out due to aggressive CSE.
This is often bad for performance (routing in an FPGA is as slow as logic, so
it's better to regenerate identical results in multiple places), so you add
"syn_maxfan" constraints to prevent the "optimization".

On the other hand, an input flip-flop will be duplicated if the fanout limit
is exceeded, but this prevents the use of the dedicated I/O cell flip-flop,
which then throws off external timing. So you use syn_maxfan=infinite for this
case.

~~~
legulere
Why would you want your FPGA to have circuits that aren't used? And why don't
you want constant expressions to be pre-calculated by the optimizer?

~~~
exDM69
I have the exact same question.

I presume this is for some incremental development work. Like testing and
seeing the number of gates/pins used for a design but you need to feed the
logic with constant placeholders.

------
Someone1234
Can someone explain this line:

    
    
          static void * (* const volatile memset_ptr)(void *, int, size_t) = memset;     
    

I've written some C but that is utter gibberish to me.

~~~
akavel
As usual in C, you start from the name, then try to go to the right until you
hit a parenthesis, then go to the left until a parenthesis, rinse and repeat.

So, first, the name:

    
    
        memset_ptr
    

Then, try going to the right, but aha! before we can really gain speed, a
parenthesis is immediately blocking us. We shrug off the bruises and turn to
the left:

    
    
        (* const volatile memset_ptr)
    

_"Hey guys, memset_ptr is a volatile const pointer..."_ Now, we hit a left
parenthesis, so we're again allowed to go right, yay!...

    
    
        (* const volatile memset_ptr)(void *, int, size_t)
    

_"...to a function taking such-and-such arguments..."_ Uh, oh, the equal sign,
so no more to read to the right; disappointed, we turn back to the left for
the final run:

    
    
        static void * (* const volatile memset_ptr)(void *, int, size_t)
    

_"...returning a pointer to void! Hah, got you! Simple, really. No arrays, no
pointers to pointers, not even a function pointer returning a function
pointer, meh. Uh, oh, aaaaand, yes, by the way, the variable is static, so,
like, file-local, um. Yeah, yeah, I saw it from the beginning, oh, go away,
you're just picky. And, and, you wouldn't recognize a function returning a
pointer to an array of pointers to functions returning anonymous struct even
if it hit you in the face, pfff!"_

~~~
StavrosK
Is "void *" really something you can pass?

~~~
joelangeway
Yes, it refers to a memory location, without implying anything about the
semantics of the bits located at that location. You can't dereference or
assign through the pointer because you don't know the type at that location.
You can, however, assign that pointer to a typed pointer variable to actually
read or write that memory location. This is useful when you really care about
the bits of memory but your variable pointing to that memory could just as
well be an (int64_t *) as a (char *), and those types are not interchangeable
with each other, only with (void *). So library functions that just care about
memory locations, not the semantics of the bits there, take (void *).

Some of this may be technically incorrect. This is my own mental model of the
C language which is sometimes incomplete.

~~~
StavrosK
Hmm, I'm a bit confused. Isn't that a function call, rather than a function
declaration? If it's a function call, it's passing a bunch of types in, which
I thought was not valid C?

~~~
akavel
Ah, now I get your question. So, it _is_ a function declaration, _not_ a
function call. Um, sorry: a declaration of a _pointer_ to a function, where
this function would take as arguments: some (unnamed) void pointer, some
(unnamed) int value, and some (unnamed) size_t value; and would return a void
pointer.

Um; then there's the equal sign, so this is not only a declaration, but a
definition too; but _definitely_ not a call.

A _call_ is further down in the original blogpost, in the below line:

    
    
        (memset_ptr)(p, 0, len);

~~~
StavrosK
Oh! A pointer to a _function_! Aha, I didn't know that was valid, thanks!

------
denim_chicken
In GNU C one can add the statement

    
    
        asm ("" : : "m" (&key));
    

just before or after the memset, effectively telling the compiler that the
address of "key" escapes the scope of the function.

~~~
fafner
GCC also has an `optimize' function attribute which might make sense to use
here. This can set optimizations for the function to -O0. But I haven't tried
it.

~~~
cesarb
That's not enough, since even -O0 applies a few optimizations, and which
optimizations it applies could change in the future.

The best answer is really GCC's __asm__("" : : "m" (&key)), or perhaps
something like __asm__("" : : "r" (key) : "memory"), after the memset. It
generates no extra code, just ensures that the memset won't be removed.

For other compilers (in practice only MSVC, since clang is gcc-compatible),
you could pass the pointer to a dummy assembly function instead of using
inline assembly; even link-time optimization can't know what happens within a
function written in assembly. Or, for better performance, create in assembly a
"safer_memset" which is a single instruction: a jump to the real memset
function.

------
tedunangst
When this still doesn't work: JIT compiled C. The compiler can check for
memset and elide it. (Or hell, one can envision the hypothetical
Antagonizer9000 compiler including a version of memset which peeks up the
stack to see what it's clearing and stops short.)

~~~
davidtgoldblatt
To clarify: the strategy in the post doesn't actually work (or at least, is
not guaranteed to work in every conforming implementation): the "volatile"
only applies to the read of the function pointer, _not_ to the execution of
the function in question.

You don't even need to assume some sort of crazy evil compiler to have to
worry about this - speculative inlining of function pointers guarded by a
safety check is something that FDO builds will actually do.

~~~
BrandonM
The first comment there (by Anonymous) claims that the final technique can
also be optimized:

    
    
        (memset_ptr)(p, 0, len);
    

_> can be replaced by:_

    
    
        if (memset_ptr == memset) {
            memset(p, 0, len);
        } else {
            memset_ptr(p, 0, len);
        }
    

_> Which in turn can be optimized using the other tricks noticed above into:_

    
    
        if (memset_ptr != memset) {
            memset_ptr(p, 0, len);
        }
    

I'm no expert, but this seems like a believable defeat of the technique in the
post.

~~~
cperciva
That's not quite right since it's now reading memset_ptr twice, but the
concept does seem to be right -- the volatile pointer must be read but the
standard doesn't require that the function is invoked.

~~~
spott
What about a data race? Theoretically, the function that memset_ptr points to
could be changed between when it is checked and when it would be run.

~~~
Karellen
If you have multiple threads accessing a shared (mutable) variable in your
program, even a shared volatile variable, then you need to guard _every_
access to that variable (which, in this case, includes every function call
through memset_ptr) with proper thread synchronisation primitives. Marking a
variable "volatile" is not enough to prevent data races in a multi-threaded
environment.

If you've put a semaphore, or mutex lock, or whatever around your calls
through memset_ptr(), the transformations will all take place inside the lock,
and data races should not be an issue.

~~~
spott
I think you missed my point.

memset_ptr is a const (not changed by this program, theoretically) volatile
(allowed to be changed by the system, theoretically) pointer to memset. In
THIS PARTICULAR CASE, memset_ptr points to memset. The compiler, however,
doesn't know that it won't change due to another process, but we do. So the
compiler shouldn't be able to optimize the indirect call into a direct one,
because that introduces a possible race condition: the program reads that
memset_ptr points to memset, then the pointer changes (due to some other
process changing it), but the program still calls memset, not memset_ptr. The
optimization allows a possible race condition to occur.

~~~
tedunangst
That race exists regardless. Many systems will execute this as loading the
pointer into a register, then jumping to it. The value could change between
those two instructions.

------
annnnd
Is the proposed solution really the best approach? It seems complicated to me
and relies on obscure parts of the language. Maybe the problem (compiler
optimizes away function call because the result is no longer needed) could be
solved like this:

    
    
      memset(key, 0, sizeof(key)); 
      if (key[0])  // we are using key, so you can't skip memset()
        dropDead();
    

Unless the compilers "understand" memset and still optimize away the last two
lines? I would hope not... Does anyone know how aggressive the C optimizers
are these days?

~~~
anon4
> Unless the compilers "understand" memset and still optimize away the last
> two lines

They do. That's the whole "problem" - the compiler knows what memset is and
what it does, since it's specified in the standard.

------
Genmutant
Why wouldn't you make key volatile? Shouldn't that solve all the problems? Or
is it because it would be too slow, since the compiler couldn't do as many
optimizations in the rest of the function any more?

~~~
cperciva
Yes, making _key_ volatile would force the zeroing to happen; and yes, you
don't want to do that because it would absolutely kill your code performance.

~~~
theseoafs
Can't you just cast it to a `volatile uint8_t *` at some later point when you
need to ensure that we've zeroed the memory?

~~~
mikeash
That's discussed in the article. Volatile ultimately applies to the storage,
so a sufficiently smart compiler may be able to deduce that you're lying to it
with the cast and elide the write.

------
userbinator
_While this completely subverts our intention, it is perfectly legal: The
observable behaviour of the program is unchanged by the optimization._

This begs the question of what "observable behaviour" is - execution time,
which is definitely "observable" and the basis of timing-based attacks, can
certainly change depending on what the optimiser decides to do.

I think this and similar cases of "fighting the optimiser" should really be
solved with per-function (or even per-statement) optimisation settings; both
GCC and MSVC support #pragma's to do this, although it's nonstandard.

~~~
cperciva
The term "observable behaviour" is defined in the standard: Essentially, I/O
to files and interactive devices, plus accesses to volatile objects.

~~~
schoen
Perhaps in retrospect this was an inappropriate choice of definition, at least
for cryptographic operations.

~~~
bunderbunder
I'm gonna go ahead and say it: Perhaps in retrospect C is an inappropriate
choice of language for these kinds of applications.

This "Performance at all costs, including safety and predictability" thing may
be appropriate in video games, but for security-critical applications that
philosophy is downright negligent.

~~~
tedunangst
I'm not aware of any language that would be better. Most languages don't even
let you touch memory to try to zero it.

~~~
swift
Many languages zero every allocation unless they can prove that you
immediately write over that memory without reading it.

~~~
tedunangst
Which does nothing for memory which has been freed but not reallocated.

~~~
swift
That's a great point; should've thought through that post a little better.
Thanks.

------
xroche
Why would you want to zero a buffer? Because it may contain sensitive
information, I presume. If you don't have additional properties w.r.t.
allocated memory, what prevents a system under high load from temporarily
putting the given memory block in swap, leaking the information to disk?
Security is hard...

~~~
ibisum
>Why would you want to zero a buffer ? Because it may contain sensitive
information, I presume.

You don't zero sensitive buffers. You randomize them, then free() them.

~~~
clarry
Why do you randomize them?

~~~
syncsynchalt
Because just free()ing them means anyone calling malloc() can get your
password.

~~~
clarry
So why don't you zero them?

~~~
ibisum
Because then your attacker knows that your buffer had something in it of
value.

------
haberman
Interesting. This appears to solve a more general problem, which is: how to
create a barrier against inter-procedural optimization and dead code
elimination.

I wonder if this trick could also be used to solve the double-checked locking
problem.

From the quintessential DCLP paper
([http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf](http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)):

    
    
        Consider again the line that initializes pInstance:
    
        pInstance = new Singleton;
    
        This statement causes three things to happen:
        Step 1: Allocate memory to hold a Singleton object.
        Step 2: Construct a Singleton object in the allocated memory.
        Step 3: Make pInstance point to the allocated memory.
    
        [...]
    
        DCLP will work only if steps 1 and 2 are completed before
        step 3 is performed, but *there is no way to express this
        constraint in C or C++*.
    

But Colin's pattern here seems to be a way of indeed guaranteeing this. The
volatile function pointer is a barrier against inter-procedural optimization:
if the function must be called, then step 3 cannot possibly be performed
before steps 1 and 2.

(There might still be necessary hardware barriers that are missing, and the
lack of a memory model for pre-C11/C++11 probably makes it all technically
undefined behavior anyway. But the key sequential ordering constraint that was
claimed inexpressible in C and C++ appears to indeed be expressible with this
trick, if indeed the trick works for guaranteeing a call to memset).

~~~
nitrogen
_if the function must be called, then step 3 cannot possibly be performed
before steps 1 and 2._

So just to clarify, there's no way the compiler could do "1, 3, 2" instead of
"1, 2, 3"? It seems a naive implementation of a compiler could store the
pointer to the allocated memory in the pInstance variable before calling the
constructor, rather than using a temporary location for the pointer (e.g. a
register). Do C++11 and later specify otherwise?

~~~
haberman
I should have been more specific. To use Colin's trick with this pattern, you
would need to write a separate function (like InitializeSingleton()) that
calls the constructor and returns the pointer. If InitializeSingleton() is
impossible to inline/optimize, which is the goal of Colin's trick, then 1 and
2 must happen before 3, because 3 cannot happen until the function has been
called and returns, and the function does steps 1 and 2.

------
foobarqux
You should have test cases to verify the zeroing behavior in the object code.
Even if the standard says a compiler must do something does not mean that it
does.

~~~
maxlybbert
The difficult thing is that any way to verify the zeroing behavior would
change the compiler's decision about whether it could elide the call to
memset. So it's possible (well, almost guaranteed) that the test would succeed
even though the memory wouldn't actually be zeroed in production.

~~~
paulasmuth
If you set your tests up correctly, they should test the exact binary or
shared library that is deployed to production, not some test-specific build.

------
apaprocki
At least in LLVM 3.4, this seems to do the trick too:

    
    
      static void secure_memset(void *, int, size_t) __attribute__((weakref("memset")));

------
defen
Nice teaser at the end there. Does it have something to do with the fact that
the OS may have paged the memory containing the sensitive data to disk?

~~~
dsjoerg
I figured the article would be about how you have to write random data to the
buffer to truly "zero" it, otherwise the ghost of the data can still be read
using some trick.

~~~
Dylan16807
Main memory decays in milliseconds.

------
sgentle
This all seems kind of silly. Why doesn't C have a type qualifier called
"secure" to inform the compiler that it should avoid security-compromising
optimisations and maybe even automatically zero the memory when it falls out
of scope?

~~~
ovi256
That sounds a lot like automatic memory management!

To a C dev, that's the same as communism to a US Republican.

~~~
dllthomas
As a C dev, no. First, stack allocation is a type of "automatic memory
management", and in most situations we C devs are perfectly comfortable with
it. Second, in terms of how the memory is allocated/deallocated, the above
doesn't sound any different than stack allocation. The difference is more like
"volatile", telling the compiler "this memory is special, treat it carefully",
and it mostly doesn't seem unreasonable. Note that C compilers frequently have
extensions providing a way of naming destructors for particular variables. It
probably _would_ still be possible to skip it with a non-local jump (longjmp
or computed goto) but avoiding those in security conscious code is probably
_already_ standard - it basically is in most code I've encountered.

------
scott_s
Colin, you're missing end-parenthesis in your memset calls.

~~~
cperciva
Fixed, thanks.

------
e12e
It is a little mind boggling that support for proper handling of this didn't
arrive until c11. For a symmetric cipher without a demanding setup/init phase
- would it make sense to just do a few rounds on a buffer using the zeroed
key? Obviously quite a few more cycles, but should at least be a predictable
(constant) overhead?

~~~
maxlybbert
What do you mean? The solution that Percival presents compiles fine on my C89
compiler.

~~~
e12e
I didn't mean to imply that the solution as presented didn't work, I was just
wondering if it would _also_ work simply running the cipher with the zeroed
key in order to avoid zeroing the key being optimized away. Obviously that'd
be a lot more cycles; I'm just curious if it would be a viable solution ;-)

~~~
maxlybbert
I was referring to the first sentence ("It is a little mind boggling that
support for proper handling of this didn't arrive until c11."). I don't see
anything C11-specific in the code Percival posted. I don't know enough to say
anything useful about the rest of your comment.

~~~
e12e
My point was that this dance around "observed behaviour" isn't needed in C11,
as per: "(...) on C11 (are there any fully C11-compliant platforms yet?) you
can use the memset_s function. (...) [which is] guaranteed (or at least
specified) to write the provided buffer and to not be optimized away."

On another note, searching for memset_s and openbsd yielded this hit from
2012:

[https://mail-index.netbsd.org/tech-
security/2012/07/22/msg00...](https://mail-index.netbsd.org/tech-
security/2012/07/22/msg000540.html)

Which seems to point back to:

[https://mail-index.netbsd.org/tech-
userlevel/2012/02/25/msg0...](https://mail-index.netbsd.org/tech-
userlevel/2012/02/25/msg006157.html)

So I guess the "trick" outlined in the (very lucid) post has been known for a
while.

~~~
maxlybbert
Thanks. I had missed the part in the article about the C11 changes.

------
pjungwir
Does anyone have any advice on articles about C compiler optimizations in
general (especially gcc)? I'm doing my first serious C work in ten years, and
I keep wondering if I should fuss with things like this or let the compiler
handle it all:

    
    
        foo->bar->baz[i].oof = foo->bar->baz[i].durb + meep;
    

vs

    
    
       what *tmp = foo->bar->baz[i];
       tmp->oof = tmp->durb + meep;
    

EDIT: I'm not asking for a link to this:

[https://gcc.gnu.org/onlinedocs/gcc/Optimize-
Options.html](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)

I'm asking if there is _advice_ about it. Any overviews with common pitfalls,
advice on when to use -O1 vs -O2, specific optimizations to turn on/off, etc.

~~~
exDM69

        foo->bar->baz[i].oof = foo->bar->baz[i].durb + meep;
    

This is fine, no need to "optimize" anything. This kind of common
subexpression elimination should be done by any modern compiler (for any
language!) and the algorithm behind it is taught in university classes too.

Most of the time it's safe to use -O3. If you're doing numerical code with
floating point, -ffast-math is also pretty safe if your code is correct (i.e.
no NaN/Inf bugs). Almost the only reason to turn off optimization (-O0) is
when higher optimization levels make using a debugger harder.

Here's a pretty nice article with some specific optimizations that GCC can (or
can't) do. It's pretty old, though, the examples were done with GCC 4.2.1,
current version is around 4.9.

[http://ridiculousfish.com/blog/posts/will-it-
optimize.html](http://ridiculousfish.com/blog/posts/will-it-optimize.html)

These days Clang can be as good or better than GCC most of the time. The
exceptions are in more exotic code like kernel space stuff or micro controller
programming.

There's no room for guesswork if you actually want to optimize code, so spend
some time reading the assembler output from your compiler as well as
benchmarking the results. I usually use objdump -d objfile.o to look at
assembly output.

~~~
pjungwir
Really appreciate everyone's replies! A related question about my example:
what if I want to assign the pointer dereference to `tmp` to improve
readability (rather than avoid multiple traversals). Is there any reason _not_
to use a tmp variable (presumably with a better name)?

~~~
sjolsen
>what if I want to assign the pointer dereference to `tmp` to improve
readability

I personally find code like that harder to follow. The first version is
clearer than the second (and you forgot to take the address of
foo->bar->baz[i]).

~~~
pjungwir
> and you forgot to take the address of foo->bar->baz[i]

Ha, I was afraid of that. :-) Still re-learning when I need that with arrays
and when not.

------
drivingmenuts
Why would the compiler be allowed to optimize away a call to a perfectly valid
function? This seems like it's allowing the compiler to make judgement calls
on whether or not your code is worthy of being executed.

~~~
DSMan195276
It's because memset is a standard function and has a defined standard way of
acting, with the most important part being that it doesn't produce any side-
effects. It's also worth noting that accessing memory that's no longer in the
current scope is undefined-behavior, so the compiler can assume it doesn't
happen. Thus the memset has absolutely zero effect on the actual program and
isn't necessary.

In _general_ this isn't a big deal. It's only a big deal here because we're
working on the assumption that you might have an issue in your code that
invokes undefined-behavior and access contents of memory that you're not
supposed to be looking at anymore, and the compiler's just assuming that your
program won't ever allow that.

~~~
pwg
What if one made some trivial use of the block of memory after having
performed the memset, say something like this:

    
    
       void
       dosomethingsensitive(void)
       {
            uint8_t key[32];
            ...
            /* Zero sensitive information. */
            memset((volatile void *)key, 0, sizeof(key));
            key[0] = key[1] + 1;
       }
    

Would that thwart the optimizer, or would it also see through that usage and
eliminate it as well?

~~~
DSMan195276
You should keep in mind that the memset call is almost guaranteed to be
inlined. So your code actually looks like this:

    
    
        void
        dosomethingsensitive(void)
        {
            uint8_t key[32];
            /* ... */
            int i;
            for (i = 0; i < 32; i++)
                key[i] = 0;
            key[0] = key[1] + 1;
        }
    

Assuming the optimizer is sufficiently smart, then it'll remove that 'for' and
it'll remove the addition after it in the same fashion.

------
jstanek
Does GCC include any flags to prevent this sort of detrimental optimization?

~~~
prof_hobart
There's a bunch listed ([https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Optimize-
Options....](https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Optimize-
Options.html)), including

-O0 // Do Not Optimise

This sounds like it should do the trick (but I've not done C coding for quite
some time, so I don't know if there's a nuance as to why it wouldn't), but
would also presumably kill any other optimisation.

A better option may be to combine that with pragmas
([https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-
Option-...](https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-
Pragmas.html)) to switch optimisation levels within the code.

Anyone who's played with GCC recently know whether this would work or not?

------
Perseids
> on C11 […] you can use the memset_s function

How is the case for modern C++? Are there `vector` or smart pointer
alternatives that reliably zero the memory in the destructor?

------
dllthomas
It seems like what we ideally need is a language designed for as-fast-as-
secure computation: JIT-compiled for the specific architecture it is going to
run on, to ensure no differences in timing, energy use, or anything else
(within whatever bounds are achievable) even in the face of different cache
layouts, CPU optimizations, and the like, and which makes it a point to clean
up everything that is not meant to be returned.

------
api
If your goal is just to "burn" the memory, why not write your own loop that
copies some arbitrary piece of data that the compiler can't optimize out over
the memory's contents? Do something like fill the buffer with its own pointer
address.

~~~
clarry
It's been stated here already. You can write whatever you want into it. It
doesn't matter. What matters is that the compiler realizes there's a dead
store going into it, i.e. that the data is thrown away after it's written. So
it can optimize out any write, since no conforming program can read that data
after it's thrown away.

------
im3w1l
A sufficiently malicious compiler could keep around a copy of the key in non-
volatile memory.

------
kazinator
The article misses the completely obvious:

    
    
        /* implemented in another translation unit */
        void zero_for_sure(void *data, size_t size);
    
        void func(void)
        {
          char securedata[42];
          /* ... */
          zero_for_sure(securedata, sizeof securedata);
        }
    

The key here is that our zero_for_sure is an external function in a separately
translated file. In the absence of a stunningly advanced global optimization
that peeks into other previously compiled units, the compiler has no idea what
zero_for_sure does, and so it has to earnestly pass it the given piece of
memory.

In turn, zero_for_sure is just this:

    
    
       void zero_for_sure(void *ptr, size_t size)
       {
          memset(ptr, 0, size);
       }
    

The compiler has no idea where ptr might come from since this is an external
function, and so it cannot optimize away the memset.

Only if the compiler could consider the whole program together could it still
optimize this.

In fact, you don't even need this function, just a dummy external function:

    
    
   void func(void)
       {
          char securedata[42];
          /* ... */
          memset(securedata, 0, sizeof securedata);
          commit(securedata);
       }
     

Of course, commit is a noop which just returns. But the compiler doesn't know
that because commit is in another translation unit.

The only optimization card that the compiler could pull here is since
securedata is going away (so that it is illegal for commit to stash a pointer
to it), it's okay to call commit with a pointer to some _other_ block which
contains zeros, and not actually securedata.

With any trick like this, you should inspect the object code to make sure it's
doing what you think it's doing.

Oh, and sizeof doesn't require parentheses when the operand is an expression;
they are required when a type name is used as an operand.

~~~
dllthomas
From the article, _"Some people will try this with secure_memzero in a
separate C file. This will trick yet more compilers, but no guarantees — with
link-time optimization the compiler may still discover your treachery."_

[https://gcc.gnu.org/wiki/LinkTimeOptimization](https://gcc.gnu.org/wiki/LinkTimeOptimization)

[http://llvm.org/docs/LinkTimeOptimization.html](http://llvm.org/docs/LinkTimeOptimization.html)

~~~
sharpneli
Move the func into a dynamically linked library.

Thanks to the performance requirements of dynamic linking it's going to be a
really long time until we have dynamic linker peeking into .so files and
checking what a func does.

~~~
dllthomas
That certainly does it, yes. Though that's more complicated than the solution
offered here, and of course now the right thing to do is just a memset_s.

------
MichaelMoser123
You can have a function that wraps memset with a zero argument. This wrapper
should be in a different shared library, so the compiler will not follow it;
wait, that's exactly what memset_s is.

~~~
dllthomas
Not exactly. There's no reason memset_s can't be understood by the compiler,
inlined, and optimized, so long as those bits still get zeroed. That's not the
case for any of the other approaches.

~~~
MichaelMoser123
the compiler should not make any assumptions about what memset_s is doing (the
same goes for a user-defined memset wrapper in a different shared library -
the implementation of that function is not known at compile or link time); if
it can't make such assumptions, then it can't optimize the call out.

~~~
dllthomas
I think you misunderstand me. There is no guarantee that there will be no
optimization _of_ or _around_ memset_s (at least, not provided by the
standard), and we don't want one. What the standard has done is assured us
that it will not be optimized _away_ - that the effect of zeroing that memory
will be treated as visible even if the memory is otherwise dead. Allowing the
compiler to optimize without changing semantics is _desirable_ , and is
permitted by the standard but prevented by the attempts to erect artificial
walls through linking (and also by the volatile function pointer in the
article), so memset_s - in addition to being clearer - is a technically
superior solution.

------
bakhy
I had no idea that such things are possible in C. The things I've read about
recently (the "friendly" C suggestion) and this seem like violations of the
spirit of the language. And for what, really? The language loses its
signature predictability, which to me seemed like a great feature of C.

If you write crappy code and expect the compiler to fix it for you, you should
maybe consider another language. I can only imagine how hard it is to write
reliable system software in a language that does these things.

~~~
PhasmaFelis
> _If you write crappy code and expect the compiler to fix it for you, you
> should maybe consider another language._

This seems rather to be a case of writing good code and having the compiler
break it for you.

~~~
bakhy
Yes, it is. You misunderstood my point, which was about why such
optimizations exist - they are meant to improve code, which presumably needs
improvement. But if your code needs improvement, then why not go for a
higher-level language?

BTW, I gave you an upvote by accident :D

~~~
clarry
They are meant to make the code run faster. If your code needs optimization,
going for a higher-level language is seldom a good idea.

You could try to do low-level optimizations in your C or assembly, but for
most programs this will eventually backfire. So letting the compiler do its
job is actually a good thing.

------
jimmaswell
Couldn't you just compile these functions with optimization turned off, in a
separate binary or something?

~~~
amalcon
You would also need to link with optimizations turned off (or link
dynamically, making the behavior undetermined at link time) to be sure. Linker
optimizations have been a thing for a few years now.

------
oso2k
Wouldn't returning the passed memory block through the return value fix the
optimization issue?

------
tiffanyh
For those of you unaware, Colin Percival (author of the blog) was for many
years the FreeBSD Security Officer and he's highly recognized in the field for
his expertise.

He also runs [http://www.tarsnap.com/](http://www.tarsnap.com/) which is
arguably the most secure (and cost-effective) backup solution on the market.

(I'm in no way affiliated with Colin and/or Tarsnap. Just a fan of his work
and humble attitude.)

~~~
cperciva
_humble attitude_

I'm guessing you haven't seen the "comeback of all time" thread...

~~~
tiffanyh
I haven't.

But I think it's super funny and cool that you (yourself) are pointing it out.

All the best to you.

Edit: just read the "comeback of all time". That was really funny. Nice nod
from PG as well. For those of you unaware like me:
[https://news.ycombinator.com/item?id=35083](https://news.ycombinator.com/item?id=35083)
Colin is our resident mathematical genius :)

~~~
bentcorner
Thanks for linking that. It's always interesting to see posts from several
years ago and reflect on how things turned out (if you go up a few parents
you can see discussion about tarsnap, and Dropbox makes a brief appearance).

------
smegel
When I allocate the key on the heap, the memset is carried out (heavily
optimized and inlined). When I allocate the key on the stack, it disappears.
Using gcc -O3:

    
    
        #include <stdlib.h>  /* needed for malloc */
        #include <string.h>
    
        void doSecure(void)  
        {  
            /*char key[32];*/  
            char *key = (char*) malloc(sizeof(char)*32);
    
            memset(key,sizeof(char),32);  
        }
    
        int main(void)  
        {  
            doSecure();
    
            return 0;  
        }
    
        -- key on stack
    
        main:  
        .LFB13:  
            .cfi_startproc  
            xorl	%eax, %eax  
            ret  
            .cfi_endproc
    
        -- key on heap
    
        main:  
        .LFB13:  
            .cfi_startproc  
            subq	$8, %rsp  
            .cfi_def_cfa_offset 16  
            movl	$32, %edi  
            call	malloc  
            movabsq	$72340172838076673, %rdx  
            movq	%rdx, (%rax)  
            movq	%rdx, 8(%rax)  
            movq	%rdx, 16(%rax)  
            movq	%rdx, 24(%rax)  
            xorl	%eax, %eax  
            addq	$8, %rsp  
            .cfi_def_cfa_offset 8  
            ret  
            .cfi_endproc

~~~
tedunangst
A more realistic idiom is memset followed by free. The free provides a solid
hint to the compiler that the object is dead without relying on escape
analysis.

~~~
smegel
Naturally; however, adding free made no difference in this case. I guess if
you're dealing with a raw pointer rather than an array type, gcc can't be
sure what memory you intend to erase.

------
angersock
This sort of thing would be exactly what should go into the "Friendly C"
dialect being chatted about the other day--for things like zeroing memory,
it's very unexpected that a compiler would be like "nah, not feeling
it...nobody will notice anyways".

~~~
cperciva
If you're not worried about writing in a language which is widely supported,
just use memset_s and tell people to find a C11 compiler.

~~~
JoshTriplett
Or have your build system add memset_s.c to their compile on systems that
don't have it.

~~~
tedunangst
A C11-unaware compiler is not guaranteed to provide the always-zero
semantics; a supplied memset_s.c could be optimized away just like your own
secure_memzero.

------
anon4
Just put it in a shared library and don't worry about it. Why use all these
brittle compiler-specific solutions when simply putting the function in a .so
will ensure it's called and will prevent any link-time optimizations?

~~~
TheLoneWolfling
Last time I checked, there was nothing specifically preventing an
implementation of C from doing whole-program optimization at runtime, even to
the point of dynamic library calls.

So: this is not something you can rely on always working. Yes, it works
currently, but it is not guaranteed to always do so.

------
jasonme
Well, if the buffer is so critical and yet small, why not just free it and
reallocate the whole thing the next time we use it?

~~~
davidcuddeback
Because freeing memory doesn't erase its contents.

