
Stop Memsetting Structures - unmole
https://www.anmolsarma.in/post/stop-struct-memset/
======
anyfoo
Does that also zero the padding that may exist between elements of the
structure (or at its end)? Because if not, not memset'ing will open you up to
all kinds of info leaks, _especially_ if you plan to send that struct over the
network (or to another process, or from kernel to userspace...).

EDIT: Actually I have to apologize, because I’m not certain what the C
standard says about whether the padding bytes can change when changing members
of the structure (6.2.6.1):

 _6 When a value is stored in an object of structure or union type, including
in a member object, the bytes of the object representation that correspond to
any padding bytes take unspecified values. 51)_

If that means that the padding can actually also change when just assigning
individual members (which I think it does, but footnote 51 only calls out
copying the whole struct by assignment, and I’m not 100% sure that “including
a member object” means what I think in this context), then that means that
copying an unpacked structure across information boundaries is technically
_always_ unsafe, because any change of a value in the struct might leak
information by changing the padding to unspecified values. It’s safe with
explicitly packed structs, though (which should be the majority of the cases,
as you’d have a representation independent of the architecture), but then so
is what the author describes. I’d appreciate if someone could clear this up!
memset’ing can still help you get a structure into a well-defined state
quickly, though.
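
To make the padding question concrete, here is a minimal sketch. The struct `msg` and the helpers `padding_bytes` and `msg_init` are hypothetical names for illustration only; the actual sizes depend on the ABI, so no particular number is promised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: on most ABIs the compiler inserts padding
 * after `tag` so that `value` is suitably aligned. */
struct msg {
    char tag;
    int  value;
};

/* Bytes of the object representation that belong to no member. */
static size_t padding_bytes(void)
{
    return sizeof(struct msg) - (sizeof(char) + sizeof(int));
}

/* Zero every byte (padding included), then set the members.
 * Per 6.2.6.1p6 the padding may still take unspecified values after
 * the member stores, even though common compilers leave it alone. */
static void msg_init(struct msg *m, char tag, int value)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
    m->tag = tag;
    m->value = value;
}
```

If `padding_bytes()` returns anything other than 0 on your target, a byte-wise copy of the struct carries that many bytes that no member assignment controls.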

~~~
Someone
Sending structures that include padding or that you don’t know the exact
layout of over the network is another thing you shouldn’t do, period.

Code doing that may break with a compiler change or a compilation flag change
on one end of the connection, even if you can guarantee both ends use the same
CPU.

~~~
johncolanduoni
This kind of issue happens all the time with structs that a kernel, driver
etc. expose to userspace:
[https://j00ru.vexillium.org/papers/2018/bochspwn_reloaded.pd...](https://j00ru.vexillium.org/papers/2018/bochspwn_reloaded.pdf)

Not sending the structure over the network is insufficient, and rigorous
memset is a clear solution to this problem.

Also, C structure layout for a specific ABI is strictly determined; if that
weren’t the case you couldn’t compile two C compilation units with different
flags/compilers and expect them to work.

~~~
blattimwind
Also don't expect memset() to actually be a library call at all.

~~~
rurban
And most importantly, don't expect memset to be executed at all. Most
compilers at -O2 happily optimize the memset away if nothing observable
depends on its side effects. That is why the "secure" memset variants exist:
they are the insecure variants plus a simple compiler barrier, so the call
isn't optimized away.

And then there are the very few really secure memset_s variants with a real
memory barrier, which guarantees that the memset is executed in load and store
order, flushing the cache, so that Spectre attacks are mitigated.
Security-relevant crypto libraries like OpenSSL and its variants don't care
about that, because memory barriers are "too slow". Only the Linux kernel does
it properly. It's still a mess.
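
For reference, a minimal sketch of the "compiler barrier only" kind of variant described above. `secure_zero` is a hypothetical name, not a library function. It relies on the common (but not strictly standard-guaranteed) behaviour that compilers do not drop stores made through a volatile-qualified pointer. It is not a memory barrier and it does not flush caches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero `len` bytes through a volatile pointer so the
 * compiler treats every store as a side effect it must keep. */
static void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len--)
        *p++ = 0;   /* byte-wise on purpose; simple rather than fast */
}
```

Typical use would be `secure_zero(key, sizeof key);` right before a buffer holding secrets goes out of scope.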

~~~
FrozenVoid
You can write your own memset that bypasses all of that. It's just 2 lines of
code.

~~~
rurban
I don't trust any memset at all. Literally every single memset I looked at was
either broken, too slow or insecure. Mostly the well-known glibc, FreeBSD,
msvcrt and compiler implementations.

~~~
rmtgk
That's quite a bold claim. Care to explain further?

~~~
rurban
I already explained it. No one implements memory barriers in their safe
variants, only compiler barriers. And the trivial variants work byte-wise, not
word-wise, which is usually 8x slower.

------
snops
My favourite trick with designated initialisers is using them with array
indices and enums, e.g.

    
    
      enum color {RED, GREEN, BLUE};
      char *color_name[] = {
        [RED] = "Red",
        [GREEN] = "Green",
        [BLUE] = "Blue"
      };
    
      printf("%s",color_name[RED]);
    

This means you don't have to ensure ordering is the same between the enum and
the array, which can be a pain for more complex tables. It's a shame C++
doesn't support this, more details:
[https://eli.thegreenplace.net/2011/02/15/array-initialization-with-enum-indices-in-c-but-not-c](https://eli.thegreenplace.net/2011/02/15/array-initialization-with-enum-indices-in-c-but-not-c)
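
A slightly extended sketch of the same trick, assuming a C99 compiler. The `COLOR_COUNT` sentinel and the `color_to_string` helper are additions made for illustration and are not part of the snippet above.

```c
#include <assert.h>
#include <stddef.h>

enum color { RED, GREEN, BLUE, COLOR_COUNT };

/* Entries are deliberately listed out of order: the [index] designators,
 * not the position in the list, decide where each string lands. */
static const char *const color_name[COLOR_COUNT] = {
    [BLUE]  = "Blue",
    [RED]   = "Red",
    [GREEN] = "Green",
};

static const char *color_to_string(enum color c)
{
    assert((unsigned)c < COLOR_COUNT);
    return color_name[c];
}
```

Any enumerator that is forgotten in the table ends up as a null pointer, which is at least easy to detect at run time.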

~~~
maxnoe
Why would you ever want to do something like this in c++?

What you are implementing is a map. Just use std::map.

~~~
halayli
Lookup time is widely different if you use std::map. But you can use
std::vector in a similar fashion.

------
userbinator
_Because it’s 2019!_

It really irritates me when people give arguments like this. I seriously don't
care what year it is, I care that the code I write works. A lot of projects
still use C89 because they want the widest portability. C99 isn't very new,
but especially in the embedded and other niche industries, C89 (often with
some extensions) is all you get.

That said, "= { 0 };" is almost always usable instead of a memset() _and_
works in C89 too.

 _And finally, there is absolutely no reason to check if a pointer is NULL
just before calling free() on it_

This is advice I agree with. The nullcheck is in free() itself (IMHO a good
design decision --- especially along error-handling and cleanup paths.)
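
A minimal sketch of such a cleanup path, assuming a conforming `free()`. The function name `make_buffers` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: returns 0 on success, -1 on allocation failure. */
static int make_buffers(size_t n)
{
    int rc = -1;
    char *a = NULL;
    char *b = NULL;

    assert(n > 0);          /* malloc(0) may legitimately return NULL */

    a = malloc(n);
    if (a == NULL)
        goto out;
    b = malloc(n);
    if (b == NULL)
        goto out;

    memset(a, 0, n);
    memset(b, 0, n);
    rc = 0;

out:
    /* free(NULL) is a no-op in standard C, so no NULL checks are needed
     * regardless of which allocation failed. */
    free(b);
    free(a);
    return rc;
}
```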

~~~
deathanatos
> _but especially in the embedded and other niche industries, C89 (often with
> some extensions) is all you get._

Embedded and niche industries are forcing their programmers to work with 30
year old tooling? (Instead of 20 year tooling?)

I think at some point, writing yourself a transpiler from C99 to C89 becomes
profitable… (The JS community did it for a lot less than that w/ babel…)

~~~
userbinator
_Embedded and niche industries are forcing their programmers to work with 30
year old tooling? (Instead of 20 year tooling?)_

Why does how old it is mean anything at all? I personally choose C89 because
of its wider compatibility, and I don't think I'm any worse off because of it.
As the saying goes, "it's not about the tool, but how you use it."
Incidentally, the best programmers I've worked with have also been the ones
with the most conservative choice of tools.

 _I think at some point, writing yourself a transpiler from C99 to C89 becomes
profitable… (The JS community did it for a lot less than that w/ babel…)_

If anything, I think the JS community could learn a lot from the "C community"
(a _very_ loosely defined term...) --- the huge amount of churn with anything
related to JS and web stuff in general turns me off.

------
heinrichhartman
Am I the only one who finds this

    
    
        int yes=1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof (yes)) ;
    

much cleaner than this

    
    
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &(int) {1}, sizeof(int));
    
?

~~~
petee
I agree anything is better than a magic number, but really only if the
variable is properly descriptive.

What are we saying 'yes' to?

~~~
scarejunba
Definitely agree that calling it `reuse_local` makes more sense, but OP was
just quoting the blogpost.

~~~
unmole
And the blog post was just quoting Beej.

------
nerdile
I disagree with this author on both counts.

1\. Zero filling is theoretically unnecessary, yes. But theory is not often
reality. Zero filling is a defensive measure that protects you from other code
that makes unhealthy assumptions. Code external to yours may make assumptions
about what lies in your padding, either by doing memcmp's, or adding new
fields in a later version of the struct and intending it to be ABI compatible
between modules. I'll prefer safe and evolvable code over your nitpick about
how it's initialized, thank you very much.

2\. Consider yourself lucky that you only use free() in your code. In more
complex code bases, custom allocators are used. They do not always follow the
same semantics as are dictated for free(). Maybe instead of saying "this
perfectly valid thing drives me nuts", ask why people do it. It's not because
they don't know better.

~~~
iforgotpassword
> Code external to yours may make assumptions about what lies in your padding,
> either by doing memcmp's, or adding new fields in a later version of the
> struct and intending it to be ABI compatible between modules.

The content of the padding between members can arbitrarily change during
assignments of members, so code relying on the content of it is broken
anyways. In the case of an extended struct both methods are equally broken if
you don't recompile your module with the updated definition of the struct.
Your memset will still have the old size of the struct as the third argument.

~~~
hedora
If you use:

    
    
      sizeof(struct_instance)
    

the compiler will automatically update the size passed to memcpy. (I like
passing the instance to sizeof and not the type, just in case the type of the
thing being copied changes in a future version of the code.)
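
Applied to the memset case discussed upthread, that habit might look like the following sketch. The struct `settings` and the helper `settings_clear` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct settings {           /* hypothetical struct for illustration */
    int  verbose;
    long timeout_ms;
};

static void settings_clear(struct settings *s)
{
    assert(s != NULL);
    /* `sizeof *s` follows the declared type of `s`, so this line stays
     * correct if `s` is later changed to point at a different struct. */
    memset(s, 0, sizeof *s);
}
```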

~~~
iforgotpassword
That is exactly what I said:

> ...if you don't recompile your module with the updated definition of the
> struct...

And once you recompiled, the C99 version with the pretty initializer is fixed
as well, so there is no advantage in using memset regarding this.

------
thenewwazoo
> Cue the Rust Evangelism Strike Force chiming in to say that there is really
> no reason to be writing new C code in 2019.

Even as a member of the Force (RIIR 1st brigade), I laughed pretty heartily at
this. :)

~~~
_bxg1
Same. But also, I feel like _most_ of The Force's efforts are directed at C++.
I could be wrong.

~~~
jasonhansel
C++ programmers need to be converted. C programmers just need sympathy.

~~~
nineteen999
We don't really need sympathy, we get paid a lot and can afford to drown our
sorrows.

~~~
jasonhansel
At least until the next zero-day caused by a buffer overrun. (Kidding,
kidding.)

~~~
nineteen999
I know you're kidding, but C programmers also get paid to provide security
fixes. There's plenty of C code out there that doesn't interface with
untrusted data/input, or does it in a safe way. So I'm not really in fear of
losing my job over a simple buffer overflow.

------
flohofwoe
C99 designated init doesn't work for C code that also needs to compile in C++
mode (...yet, a limited subset of designated init is coming in C++20, clang
also seems to be more relaxed about this and allows the full C99 designated
init also in C++ right now).

For code that only needs to compile as C, I agree. It's one of the best (if
not _the_ best) additions to the C language. A nice addition would be default
values for struct members in the declaration with compile-time constants.

~~~
Someone
My C++ is rusty (no pun intended), but if your code also has to compile as
C++, I think this will work instead of memset:

    
    
      struct addrinfo hints = {0};
    

If C++ requires

    
    
      struct addrinfo hints = {};
    

instead, you could create a macro, and write

    
    
      struct addrinfo hints = ALL_ZEROES;
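
One way such a macro could be written is sketched below. `ALL_ZEROES` comes from the comment above; the struct `conn_opts` and the function `default_opts` are hypothetical stand-ins so the example does not depend on `<netdb.h>`. For plain aggregates whose first member is an arithmetic or pointer type, `{0}` is generally accepted by C++ as well, so the macro may turn out to be unnecessary.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __cplusplus
#define ALL_ZEROES {}       /* C++ value-initialization */
#else
#define ALL_ZEROES {0}      /* valid in C89 and later */
#endif

struct conn_opts {          /* hypothetical struct for illustration */
    int         family;
    const char *host;
    double      timeout;
};

static struct conn_opts default_opts(void)
{
    struct conn_opts o = ALL_ZEROES;

    /* Members are zero-initialized by value, so the pointer is a genuine
     * null pointer whatever its in-memory representation is. */
    assert(o.host == NULL);
    return o;
}
```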

~~~
loeg
Sure. C++ has initializers ({1,2,3}), it just doesn't have C99 designated
initializers ({.a = 1, .b = 2, .c = 3}). This is one of the main pain points
around C++ not being a true superset of C.

~~~
hedora
G++ actually says “sorry” when it bails on that.

The other (bigger) pain point is that malloc returns void*, and C++ requires a
cast.

------
kazinator
Don't use C99 initializers, or not directly. Write a macro:

    
    
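      struct point {
        double x, y;
      };

      #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
      /* C99: parameters are named px/py so they don't replace the .x/.y designators */
      #define init_point(px, py) { .x = (px), .y = (py) }
      #else
      /* C90 */
      #define init_point(px, py) { (px), (py) }
      #endif

      struct point p = init_point(0.5, 1.5);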
    

The macro is superior because macro calls are checked for number of arguments.
If you have a structure in which certain members must be initialized (to a
non-default value), and such members can be added over time, that will save
your butt. How? Because when you add a new member that must not be zero/null,
you add a parameter to the macro. Then, the compiler will tell you all the
places where the macro is called with too few parameters.

Designated initializers eliminate some of the error-prone aspects of C90
style, but don't help enforce the discipline of initializing all members, or a
certain subset of members.

Functional abstraction with checked arguments beats fancy new syntax.

------
cperciva
Another reason to not memset structures: In the code

    
    
        struct foo {
            void * bar;
        } baz;
        
        ...
        
        memset(&baz, 0, sizeof(struct foo));
        assert(baz.bar == NULL);
    

it's possible for the assertion to fail, since NULL is not guaranteed to be
represented in memory by all-zero bytes.

~~~
ThrowawayR2
> _it's possible for the assertion to fail, since NULL is not guaranteed to
> be represented in memory by all-zero bytes._

Technically correct (which is of course the best kind of correct) but you'd be
hard pressed to find a system in 2019 where NULL != (void*)0. The most recent
machines with non-zero NULL in the C FAQ entry on the matter
([http://c-faq.com/null/machexamp.html](http://c-faq.com/null/machexamp.html))
date back to the mid '90s.

~~~
Sharlin
The literal `0` is guaranteed to be the null pointer value when used as a
pointer, so by definition

    
    
        NULL == (void*)0.
    

But given

    
    
        void *p = 0; 
        intptr_t i = 0;
    

it is _not_ guaranteed that `memcmp(&i, &p, sizeof(p)) == 0`.

~~~
nybble41
> it is not guaranteed that `memcmp(&i, &p, sizeof(p)) == 0`.

Even less intuitively, it is not guaranteed that `(void*)i == p` since `i` is
not an integer constant expression, even if the value is known to be 0.

------
tbodt
Most interesting to me is that

    
    
      &(int) {1}
    

is equivalent to

    
    
      int one = 1;
      &one
    

How did I not know about this? I'll be using it from now on.
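
A small sketch of that equivalence, assuming C99 or later. `read_option` is a hypothetical callee that stands in for something like the option-value argument of setsockopt().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee that reads an int through a pointer. */
static int read_option(const int *optval)
{
    assert(optval != NULL);
    return *optval;
}

static int demo(void)
{
    /* The compound literal is an unnamed int object with automatic
     * storage; it lives until the end of the enclosing block. */
    int a = read_option(&(int){1});

    /* Equivalent spelling with a named variable. */
    int one = 1;
    int b = read_option(&one);

    return a == b;
}
```

The lifetime rule is the part worth remembering: the address must not be kept after the block that contains the compound literal has ended.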

------
HorstG
The article gives very dangerous advice, especially for networking code.
Padding will not be overwritten and when transmitting the struct (provided
there is no attribute packed or the compiler ignores it) information will
leak.

Yes, someone will point out that there are languages that do not have such
problems. However, when still using C, one has to be extremely careful...

~~~
saagarjha
I am reasonably certain that any code that happens to transmit padding bytes
(or observe them in any way) is abusing at the very least implementation-
defined behavior.

------
kazinator
> _While we’re on the topic of little annoyances, another pattern I often see
> is using a variable just to pass what is effectively a literal to
> setsockopt():_
    
    
      int yes=1;
      setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof (yes)) ;
    

> _Which can instead be written as:_

    
    
      setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &(int) {1}, sizeof(int));
    


I'm sorry, no it cannot. Or, rather, not without

1\. Switching C90 code to C99; or

2\. Switching C++ code to C99.

Good luck with 2.

GCC and Clang support this in C++ as an extension; then you're not writing in
standard C++ any longer.

1 is a nonstarter in a project that mandates C90.

In other words, there may be valid reasons why the code is that way other than
ignorance of compound literals.

------
seba_dos1
> Elements that are not specified are initialized as if they are static
> objects: arithmetic types are initialized to 0; pointers are initialized to
> NULL.

That's actually not true. It is the case when initializing arrays, but for
structs (and unions) any non-named member has "indeterminate value" (see
paragraph 9, section 6.7.8 of ISO/IEC 9899:1999; unchanged in C11 and C18).
Well, at least according to the standard - GCC, Clang and MSVC all happen to
make it zero, but it's not guaranteed by the C language.

The trick with avoiding a variable when passing an address to a literal value
is nice though, I didn't know that. There are some places in my code that
could use it ;)

~~~
fanf2
Para 9 is talking about members that do not have names in the struct
definition. Missing initializers are covered later:

21 If there are fewer initializers in a brace-enclosed list than there are
elements or members of an aggregate, or fewer characters in a string literal
used to initialize an array of known size than there are elements in the
array, the remainder of the aggregate shall be initialized implicitly the same
as objects that have static storage duration.
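
A short sketch of what paragraph 21 means in practice, assuming a C99 compiler. The struct `record` and the function `make_record` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct record {             /* hypothetical struct for illustration */
    int         id;
    double      weight;
    const char *label;
    int         flags[3];
};

static struct record make_record(int id)
{
    /* Only `id` has an initializer. The remaining members are initialized
     * as if they had static storage duration: 0, 0.0, a null pointer,
     * and an all-zero array. */
    struct record r = { .id = id };

    assert(r.label == NULL);
    return r;
}
```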

------
cosmin800
The memsetting code is more obvious for older programmers and probably more
compatible too. I don't want to start a flame war, but if you want shiny new
programming features and initialisation you should probably look away from C
and try JavaScript or something similar. One of the reasons we love C is that
the same code compiles and works cleanly on the latest Linux, FreeBSD 4.11, or
a PIC16.

------
dwrodri
I'd like to see a new edition of K&R, edited to cover the features that have
been added since C89. I suppose some might argue that there isn't much need
for one, because anyone who can grok C89 could probably grok the rest
themselves through Internet-based self-teaching, but I would pay good money to
have all those resources consolidated into one printed resource.

------
vnorilo
After debugging undefined padding bytes breaking per-byte pod type hashes,
I'll hang on to my memset, thank you very much! :)

------
seren
This is mostly a matter of taste, but while the first example makes sense, the
second example looks less clear: for someone quickly skimming through the
code, a variable with a somewhat clear meaning, "yes" (which could be
improved), is replaced with a magic number that has no intrinsic meaning.

~~~
TheSoftwareGuy
I agree. I’m also not a fan of doing `sizeof(typename)` as it can hide intent

------
jschwartzi
I'll stop memsetting structures as soon as you support designated initializer
lists in C++.

~~~
NikkiA
Apparently it's in C++20, so feel free to stop memsetting whenever you update
your C++ compiler next (clang needs a flag to enable it, gcc has had it as
part of its c++2a support for 2 years)

------
GuB-42
It is C99 code.

It is worth noting that many C projects go for extreme portability. That's why
C is used in the first place, and sometimes C99 is too much. I still sometimes
work with compilers that barely support C89. That's for the aerospace industry
BTW. There are even some libc that don't accept free(NULL).

If you know the platform you are writing for is not ancient, that's fine but
if you are writing portable code, you have to realize these ancient platforms
are still alive.

I think ditching K&R C is fine though. I've seen some of it but never
encountered a case where it was the only thing the compiler supported.

~~~
saagarjha
> There are even some libc that don't accept free(NULL).

That’s pretty much the complete opposite of portable, since it violates the
C standard.

~~~
dooglius
The "extreme portability" he's talking about is portability of projects to
compilers and libc implementations that behave badly, possibly in standards-
violating ways like this.

------
emilfihlman
Interesting, I was under the impression that compound literals of form
(type){value} were illegal and the correct form was (type []){value} but
apparently not!

------
CJefferson
There were a few old free implementations which misbehaved when given null.
Maybe they are all gone now, but there was a reason to do it.

~~~
hazeii
Indeed, I recall getting messages along the lines of "error: freeing a null
pointer" or something like that. On the other hand, what are the chances of
some freshly-written C being built with an old compiler? (possibly moderately
high in the embedded world?).

------
childintime
Last I checked C99 initializers require constants, so they don't generalize,
whereas the pattern shown does. I'd prefer if the initializers pattern did
allow variables on the right-hand side for this reason.

~~~
atq2119
Not true, designated initializers accept the same kinds of expressions as
variable initialization (which is limited at global scope, but arbitrary when
initializing local variables).
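
A minimal sketch of a designated initializer with run-time values, assuming C99. The struct `span` and the function `span_of` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct span {               /* hypothetical struct for illustration */
    size_t offset;
    size_t length;
};

static struct span span_of(const char *s, size_t offset)
{
    assert(s != NULL && offset <= strlen(s));

    /* Inside a function the initializer expressions may be arbitrary
     * run-time values, including function calls. At file scope the same
     * initializer would be rejected because it is not a constant. */
    struct span sp = { .offset = offset, .length = strlen(s) - offset };
    return sp;
}
```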

------
ndesaulniers
Oh? I think I reviewed a few Linux kernel patches this week that used memset
to fix up broken aggregate or designated initialization, since memset is the
most concise way of initializing the padding.

------
benou
Frankly, using the style that is idiomatic to your code base is what matters
most. Keeping things consistent is the important part.

And if you start a new project, just use the style you prefer and enforce it
:)

------
stochastic_monk
-Wclass-memaccess is a warning emitted by newer g++ versions. It’s led me to use safer/better, if less convenient, options. (That being said, it’s only applicable to C++.)

------
bayareanative
We use Crystal for networking code, and it has nice C binding semantics.
Haven't had to drop down to pure C much because Crystal just works, and
doesn't have the learning curve or un/boxing hoops of rust. It's basically
Ruby-like but with rudimentary type-inference, like what Ruby 3 wants to be
but Crystal already exists... and orders-of-magnitude more performant.

------
paavoova
Interestingly enough, Clang (but not GCC) appears to embed memset for you
despite the code using initializers:
[https://godbolt.org/z/khJQLV](https://godbolt.org/z/khJQLV)

The author of the article should file a bug report. After all, if they wanted
to memset, they'd just do it themselves.

~~~
loeg
The compiler knows the representation of NULL on the targeted platform and if
it is safe to use memset. It is perfectly valid for the compiler to make this
transformation if the platform represents NULL as zero.

~~~
paavoova
My post is sarcasm, I just find it humorous that one's efforts to eliminate
explicit memset don't stop the compiler from going one step backwards and
embedding it just the same. If you replace the initialization with memset, it
still generates identical code for both Clang and GCC.

~~~
BeeOnRope
I don't think the point was to eliminate "code that looks like memset at the
assembly level", but to eliminate memset at the source level, which the author
considers cleaner.

Under the covers, one would hope a good optimizing compiler could choose the
same assembly for both, which may very well look "memset like" since fewer,
wide writes are generally more efficient.

------
swiley
Ugh, what do you do when the structure definition is updated and you now have
uninitialized values?

If there's no explicit initializer I think memset, calloc or = {0} is just
fine, thanks.

~~~
hazeii
Unclear to me why you've been downvoted; I've seen many cases where a
structure has had elements added which would end up uninitialised. A
memset(&struct,0,sizeof(struct)) seems pretty reasonable to me.

~~~
NikkiA
Probably because the article specifically addresses this issue and C99
specifically initializes any members that are not named in the initializer
to either zero or NULL.

------
edoo
If you only use super simple structures maybe. It is quite common to zero out
a structure and use only parts of it. It is also quite common to have to
programmatically fill in the structure.

~~~
usefulcat
> It is quite common to zero out a structure and use only parts of it.

That point is addressed in TFA.

------
skookumchuck
memset() heads off the dreaded "I added more fields, but forgot to add the
initializers everywhere" bugs. Well, it will still be a bug, but at least it
won't be a heisenbug.

~~~
atq2119
The point of the article is that C99 initialization does the same.

------
olliej
Because of C spec idiocy it isn’t even sufficient to use memset anymore, you
need to use memset_s.

But “stop memsetting” is clearly nonsense if you care about security.

~~~
eridius
Surely you only need memset_s if you're doing this for security reasons, as if
memset is optimized away it's because the compiler can prove the observable
semantics are the same.

~~~
olliej
Nope, memset will often be “optimized” such that padding is not zeroed. It’s
not because the compiler sees that padding isn’t observed, it’s because
reading from padding is undefined. Because reading from padding is undefined
the write (also undefined) by definition has no observable side effects.

Memset_s is the only way to guarantee that padding is actually zeroed, well
memset or bzero_s obviously.

It’s critically important for people to understand that if the C spec says
doing X is undefined, then current compiler writers interpret this as allowing
them to do anything they want, even when it clearly introduces security
vulnerabilities.

~~~
MichaelMoser123
are you sure that the padding is not zeroed by memset?

[1] says that memset_s exists because it is not optimized away if it just
clears a variable that will not be used after the call (unlike memset)

On the other hand, people who do this for security reasons say not to rely on
memset_s, since it is 'an optional feature of c11 and not really portable' [2]

[1]
[https://en.cppreference.com/w/c/string/byte/memset](https://en.cppreference.com/w/c/string/byte/memset)

[2] [https://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/](https://www.cryptologie.net/article/419/zeroing-memory-compiler-optimizations-and-memset_s/)
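
Since memset_s is optional, portable code has to detect it. A hedged sketch of one way to do that follows. The wrapper name `wipe` and the `memset_fn` pointer are hypothetical; the fallback calls memset through a volatile function pointer, a widely used workaround that compilers honour in practice but that the standard does not strictly guarantee.

```c
#define __STDC_WANT_LIB_EXT1__ 1    /* must precede <string.h> to request Annex K */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef __STDC_LIB_EXT1__
/* The compiler cannot assume what a volatile pointer points at when it is
 * read, so it cannot prove the call is a removable plain memset. */
static void *(*const volatile memset_fn)(void *, int, size_t) = memset;
#endif

/* Hypothetical wrapper: use memset_s where the C library provides it,
 * otherwise fall back to the indirect call. */
static void wipe(void *buf, size_t len)
{
    assert(buf != NULL);
#ifdef __STDC_LIB_EXT1__
    memset_s(buf, len, 0, len);
#else
    memset_fn(buf, 0, len);
#endif
}
```

Many C libraries, glibc among them, never define `__STDC_LIB_EXT1__`, so in practice the fallback branch is the one that usually gets compiled.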

~~~
saagarjha
I don't think memset is required to zero out any padding, because the effects
of this are not observable.

~~~
MichaelMoser123
The function memset gets the length of the area to overwrite, how is it to
know where the padding is?

~~~
saagarjha
The compiler knows the layout of your structure and can choose to elide memset
calls or replace them with equivalent code.

