
C is not your friend: sizeof and side-effects - ingve
http://blog.tjd.phlegethon.org/post/159564806182/c-is-not-your-friend-sizeof-and-side-effects
======
mpweiher
If you use ++ in a sizeof expression...you're probably getting what you
deserve. C may not be your friend, but it is fair.

~~~
RubenSandwich
I'm not sure what a fair or unfair language would be. Is an unfair language
one whose compiler output is non-deterministic? If that is the case, then C
fails that test with undefined behavior.

What I'm really trying to say is that as an industry we need to stop propping
up C as if it's some golden god. Was it a good compromise for its time? Of
course, but we can and should do better, because it's clear that writing
'good' C code might be beyond the realm of man.

~~~
jackmott
If we want to do that, we need to make it possible to fully harness the
abilities of the computer in other languages.

Just as a litmus test: In how many languages can you cause the SIMD shuffle
instruction to be used, explicitly? How many have compilers that can take
advantage of it?

JVM / .NET languages don't do this (I know .NET can do SIMD, but it can't do
shuffle). Rust can maybe do it? Not sure on the status. Dlang SIMD support
seems only partial.

Which can do a popcnt explicitly or automatically?

There are lots of little holes in other languages just in the space of "can I
even use the instructions the CPU has?". Then of course, being able to control
memory completely is a huge advantage in some domains. Any language that lets
you do that will have most of the same danger problems as C or C++.

I'm sure you could come up with something that is a little safer with as much
power, but it isn't so easy for someone to just stop using C/C++ right now if
they need that control.
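
For comparison, here is roughly what "explicit" looks like in C with GCC/Clang
intrinsics. This is only an illustrative sketch: it assumes an x86 target with
SSSE3 and something like _gcc -mssse3 -mpopcnt_ to build.

    #include <stdio.h>
    #include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (PSHUFB) */

    int main(void)
    {
        /* Explicit SIMD shuffle: reverse 16 bytes with a single PSHUFB. */
        __m128i data = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
        __m128i mask = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                     7, 6, 5, 4, 3, 2, 1, 0);
        __m128i rev  = _mm_shuffle_epi8(data, mask);

        unsigned char out[16];
        _mm_storeu_si128((__m128i *)out, rev);

        /* Explicit population count: maps to POPCNT when available. */
        int bits = __builtin_popcountll(0xF0F0F0F0F0F0F0F0ull);

        printf("first byte after shuffle: %u, popcount: %d\n",
               (unsigned)out[0], bits);
        return 0;
    }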

~~~
RubenSandwich
To your first question, I would answer that you can use Rust.

I agree with your general sentiment that, currently, C/C++ is the only choice
for certain domains. But there is nothing theoretically stopping us from
creating safer systems languages. (See Rust, and to a much lesser extent
Swift, though Apple is working towards that goal with the option to disable
reference counting in a future version, which would give much more predictable
performance.)

My argument is that we need to stop pretending that C/C++ is not worth
improving. Every time someone criticizes a gotcha of C/C++, HN gets very
defensive. We need to kill our gods if we want better ones, and I personally
think we need better systems programming languages, because it is far too easy
to write bad C/C++ code.

~~~
pmelendez
>"My argument is that we need to stop pretending that C/C++ is not worth
improving."

I don't think that's the sentiment of their communities, though. I'm not sure
about C, but C++ has been continuously adding significant features in its
latest three iterations (C++11, C++14 and C++17); if that is not a sign of
improvement, I don't know what is.

------
tobyhinloopen
C is a language that gives you a very sharp laser knife that can cut anything
off cleanly and efficiently. However, without care, you'll cut your fingers
off. Always wear safety goggles around dangerous equipment, and don't do
stupid things like putting i++ inside a macro call.

You're writing C, don't give me the "I might not know that what I'm calling is
a macro" argument. You better know what you're calling in C. If you don't want
to know what you're calling, don't write C.

~~~
Sir_Substance
>You better know what you're calling in C. If you don't want to know what
you're calling, don't write C.

That puts a de-facto cap on either the speed of C development or the size of C
projects.

It is definitely possible to have a project so large that no one person can
understand it all. So either you can't have C projects larger than whatever
that size is, lest you risk not knowing what you're calling, or you have to
accept that developers working on your project will be reading and re-reading
parts of source code that haven't changed in years in order to make sure they
know what they're calling, rather than re-implementing common patterns in the
codebase and trusting to prior diligence.

Either way, you're not selling C to me as a language to embark upon a major
project with.

~~~
stouset
> Either way, you're not selling C to me as a language to embark upon a major
> project with.

Sounds about right.

I dearly love C, but it's time to move on. Languages like Rust seem to solve
most of the major pitfalls while not sacrificing on speed.

~~~
socmag
Is that something you are just musing out loud, or are you proselytizing?

If it is the former, feel free to use whatever language you choose. If it is
the latter, can you please keep your opinion as just that?

I believe we are all adults here and quite capable of making judgment calls.

Rust has some interesting features, and so do lots of other languages. No need
for the drive-by.

~~~
stouset
Who pissed in your cornflakes?

I'm not sure how much more diplomatically I could have phrased my previous
post. If mentioning a language as one possible modern alternative ("languages
like Rust") is enough to set you off, perhaps you need to reconsider who's
falling short of adulthood here.

~~~
socmag
That's not actually what you said.

What you said is that it's about time we moved on from C, and you suggested Rust.

That seems like your own subjective opinion. Which is fine, of course; each to
their own.

Personally I will continue to use whatever language makes sense for the job no
matter what it might be.

Happy to take one for the team if that stance doesn't work for you; I'm not
the one here passive-aggressively proselytizing.

And thank you for confirming that by the way.

------
knieveltech
This feels like a rant about a language handing bad developers enough rope to
hang an entire village. I'm unconvinced this represents a failing of the
language.

~~~
tdb7893
I think in software engineering it's prudent to just assume that all
developers make mistakes sometimes. Anything that helps developers make fewer
mistakes, or that catches developer mistakes, saves time (and therefore money)
in the long term, so I think it's fair to complain that C makes mistakes easy
to make and difficult to catch.

~~~
Baeocystin
Complaining about C in this manner strikes me as complaining that the knife
you're using is too sharp, and you don't have time to make sure you won't cut
yourself.

Don't get me wrong. I prefer working in Python or its ilk, because I prefer a
higher level of safeguards. But nothing I work on needs the precision of
behavior that C provides. And sometimes you need sharp and precise.

~~~
dbaupp
C is not precise, not as a model of modern CPUs (they're very different to the
PDP-11, and compiler optimisers are very invasive) nor as a precision
engineering tool (too many silent footguns and landmines adding a pile of
incidental complexity).

It's definitely true that it's sharp and sometimes (currently) the best tool
for the job, but don't ignore the reality of its design and limitations when
praising it.

~~~
ori_b
I'm not sure where this meme that C is a direct translation to PDP-11 assembly
came from. It's true that there were some quirks in K&R C that came from it
(e.g., the way that floats were widened to doubles when being passed to
functions), but those largely went away with ANSI C.

Other than that, C isn't a significantly better match for PDP-11 than it is
for x86 or alpha or sparc.

------
breadbox
This flaw was pointed out while C99 was under review. The idea of sizeof no
longer always being evaluated at compile time did not sit well with a lot of
people. Unfortunately, the alternative (sizeof not being usable with dynamic
arrays) was clearly worse.

~~~
bvda
Do you happen to know why that was perceived as worse? If I understand
correctly the difference would be that you'd have to write (i * sizeof(char))
vs. (sizeof(char[i]))? Personally I'd even prefer the former.

~~~
gandalf013
A very common idiom is "T *p = malloc(n * sizeof *p)". That only works if
sizeof doesn't evaluate its arguments.
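
A minimal sketch of that idiom (the struct and names are just illustrative):
sizeof only looks at the type of *p, so *p is never actually evaluated, and
the expression is fine even though p is still uninitialized at that point.

    #include <stdlib.h>

    struct record { int id; double value; };

    int main(void)
    {
        size_t n = 100;

        /* sizeof *p inspects the type of *p without dereferencing p,
           so this works even while p is still uninitialized. */
        struct record *p = malloc(n * sizeof *p);
        if (p == NULL)
            return 1;

        free(p);
        return 0;
    }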

~~~
kazinator
Why not? Suppose you have a C99 VLA object, and you want to malloc the
equivalent amount of space. Couldn't you just do this:

    
    
       int vla[nparam];
       int *pcopyspace = malloc(sizeof vla);
    

Here _sizeof vla_ is basically just _nparam * sizeof(int)_.

_vla_ is evaluated to the extent of calculating its run-time size.

------
buserror
Ah, the weekly 'C is evil' HN bit.

FYI, your mouse is also dangerous: you can easily click the wrong way and send
all your money away to someone in Nigeria. Should we blame the mouse?

~~~
Etzos
This seems like a bit of a knee-jerk reaction on your part. The article itself
is interesting and points out an edge case which is likely to trip people up
in certain situations. I don't really see the article mentioning or even
hinting that "C is evil", it even includes a bit of a qualifier at the end: "C
is reaching PHP-like levels of confusion here, but luckily for us this
combination . . . is pretty hard to hit. I’ve never seen it in “real” code."

~~~
buserror
A bit, yes, I have to admit. However, it's not HARD to find new and
interesting ways of screwing up with the language; it's been done for longer
than I've been alive, and some people have abused it in extraordinary ways
over the years [0]. So all I can think of when I see that sort of article is
'HN clickbait'. It works. It's actually borderline trolling at this rate.

[0]: [http://www.ioccc.org/](http://www.ioccc.org/)

------
woliveirajr
The beauty of C/C++ is exactly that it gives you the freedom to do things.
Period. There are a lot of languages designed to limit you (or to make
programs safer, you name it). Some might be faster / more productive / safer /
environment-friendly; use whichever you think you should. But don't try to
limit C, please.

~~~
self-diversity
Then please stop writing important code that can cause me harm when it is
inevitably and easily exploited because of that freedom.

------
klodolph
You can see why VLAs were made optional in C11, even though they were
mandatory in C99. There were just too many weird interactions that VLAs had
with other parts of the language. (That, and it was an easy way to get a stack
overflow.)
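
A minimal sketch (hypothetical function) of the stack-overflow hazard
mentioned above: the array's storage comes from the stack, so a large or
attacker-controlled length simply blows it, with no malloc-style failure path
to check.

    #include <string.h>

    void handle_message(const char *msg, size_t len)
    {
        char buf[len];          /* C99 VLA: len bytes taken from the stack */
        memcpy(buf, msg, len);
        /* ... parse buf ... */
    }

    int main(void) { handle_message("hi", 2); return 0; }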

~~~
vvanders
Coming from C++, VLAs are one of the things I really wish I had access to. The
number of times I've seen stuff created on the heap just to be tossed at the
end of the block in production code always made me cringe.

Agreed that they're a very sharp tool and need to be used with care.

~~~
kllrnohj
C++ has way, way better ways to handle this. What you do instead is a
std::vector with an allocator that has a fixed-size chunk of memory to cover
your common, small case and falls back to malloc if the size becomes too large
to fit into the internal allocation.

Then your "VLA" is stack-allocated when small and heap-allocated when large,
and it works everywhere, even as a return value or in other cases where alloca
would fail.

Example:
[https://howardhinnant.github.io/stack_alloc.html](https://howardhinnant.github.io/stack_alloc.html)

~~~
vvanders
Yeah, that's a much nicer solution. In the cases where it mattered we just
built a standard block/arena allocator but that's a bit more work which means
it was rarely used.

------
headcanon
Is there a kind of "linter" for C that spots stuff like the problems presented
here (as opposed to just formatting)? What kinds of static analysis are
available here?

~~~
syncsynchalt
There are lots of useful -W flags to gcc/clang that expose language pitfalls
(examples: -Wparentheses and -Wshadow).

As far as dedicated tools, my best experience is with Coverity, which seems to
have a design focus of finding errors in codebases with a minimum of false
positives.
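
For illustration, a small made-up snippet that both of those flags would
catch, compiled with something like _gcc -Wshadow -Wparentheses_:

    #include <stdio.h>

    int count = 0;

    void tick(int ready)
    {
        int count = 1;      /* -Wshadow: this local shadows the global 'count' */

        if (ready = 1)      /* -Wparentheses: assignment used as a condition */
            printf("%d\n", count);
    }

    int main(void) { tick(0); return 0; }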

------
jasonlhy
C is a very "strict" language. It do the right thing only if you write the
right code. It has many undefined behaviors. One of the undefined behaviors I
found when I studied the compiler course. The purposed grammar from
[https://www.amazon.com/Programming-Language-Brian-W-
Kernigha...](https://www.amazon.com/Programming-Language-Brian-W-
Kernighan/dp/0131103628/ref=sr_1_1?ie=UTF8&qid=1492654731&sr=8-1&keywords=c+programming)
allows this kind of structure, there can be statements after the switch but
before the case labels. gcc also accepts this, but those statements will not
be executed. Note: code modified from
[https://www.tutorialspoint.com/cprogramming/switch_statement...](https://www.tutorialspoint.com/cprogramming/switch_statement_in_c.htm)

#include "stdio.h"

int main () {

    
    
       /* local variable definition */
       char grade = 'B';
       int a = 1;
    
       switch(grade) {
          printf("great!\n" );
          a = a + 1;
          
          case 'A' :
             printf("Excellent!\n" );
             break;
          case 'B' :
          case 'C' :
             printf("Well done\n" );
             printf("a: %d\n", a);
             break;
          case 'D' :
             printf("You passed\n" );
             break;
          case 'F' :
             printf("Better try again\n" );
             break;
          default :
             printf("Invalid grade\n" );
       }
       
       printf("Your grade is  %c\n", grade );
     
       return 0;
    }

------
joatmon-snoo
This may come from my not being very experienced or familiar with C, but this
is... disturbing.

I mean, I totally get that this is a really poor use case, and as I just
learned from looking it up (because I didn't know this before), sizeof in C
isn't a function or macro, but an _operator_.

That just doesn't jibe with the mental model of C that I had in my head. And
programming languages, if anything, should be _conducive_ to mental models,
not hostile to them.

~~~
notalaser
One could equally point out that programmers should form mental models by
reading documentation, rather than relying on intuition. It's not like you
have to dig through compiler sources to find out that sizeof is an operator.
Even if it wasn't plainly documented in pretty much any properly-written
material on the C programming language, it's pretty obvious that sizeof
_can't_ be a function, since a function doesn't have access to the sort of data
required in order to retrieve this sort of information.

~~~
dbaupp
The reality is people form mental models by intuition. Things designed for
humans to use (like programming languages) should take this reality into
account.

~~~
notalaser
And then people wonder why other engineers smirk when they hear the term
"software engineering" :-).

We're not all fitted with the same Umbrella IntuCard Mk. II Intuition card so
that, relying on intuition alone, we can all reach the same results from the
same inputs. Relying on intuition to figure out how a language works is about
as smart as relying on intuition to figure out the limit of a series, how a
telephone works or how a transistor works -- all three of which have
definitely been designed for humans to use, since there was nothing _but_
humans who could use them back when they were invented.

I don't like it any more than you do, and my life would be much easier (and
probably happier) if it weren't like this, but blaming it on the language is
the equivalent of the turtle blaming calculus after losing the race to
Achilles.

~~~
dbaupp
I don't see how your series or resistor examples are particularly relevant:
neither mathematical nor physical facts are human inventions. The packaging
is, but I'd argue that the packaging for both _is_ relatively intuitive for
someone likely to be encountering them.

For this example, there is a large amount of intuition both from C
specifically and more broadly in programming (i.e. the background of people
likely to be encountering this) that foo(bar) is a function call that
evaluates all its various components. The syntax used for many "calls" of
sizeof violate this intuition. The uniformity that leads to this corner case
may indeed be the right choice (and probably actually is, IMO), but discarding
intuition---and assuming everyone is a robot who's memorised all corner cases
of their tools and never makes a mistake---is dumb, especially in a relatively
restricted domain like programming where there is a reasonable set of
common/shared assumptions to build on.

Lastly, the "a good carpenter never blames their tools" sentiment is also
dumb, as a universal guideline: tooling can be objectively bad and unhelpful,
and, even if someone can make it work, there will be downsides. Taking things
to the limit, it's clear that "C but instead of text, just write down the
UTF-8 encoding of a file as one huge base-10 number" would be an actively bad
programming language and lead to all sorts of problems over normal C. The same
thing holds for, say, C compared to other tools: people make objectively more
of certain particularly bad classes of errors when writing C because the tool
isn't helpful enough.

~~~
notalaser
> I don't see how your series or resistor examples are particularly relevant:
> neither mathematical nor physical facts are human inventions.

Mathematical facts are a very human invention, just like the _trans_istor. I
picked calculus precisely because intuition based on physical observation
gives the wrong answer nine times out of ten.

> assuming everyone is a robot who's memorised all corner cases of their tools

But this is not a corner case! I'm not talking about an obscure quirk (God
knows how many C has!), this is _literally_ syntax, the first thing you learn
about a language!

~~~
dbaupp
> _trans_istor

(Whoops, but my point still stands just as much.)

 _> Mathematical facts are a very human invention, just like the transistor. I
picked calculus precisely because intuition based on physical observation
gives the wrong answer nine times out of ten._

I think I didn't convey my point very well: humans and their intuitions have
no influence over the value of the mathematical series 1 + 1/4 + 1/9 + ... nor
do we have any influence over the behaviour and interactions of certain doped
(etc.) silicon. Both of these just are, outside the influence of humans.

Certainly, for the former, we have control over the syntax we use ("the
packaging")/how we write it down, but, say, changing what "1" means changes
the series to a different object. In any case, control over the syntax is
_exactly_ like a programming language, and indeed, mathematical notation is
very similar and it's great when people have sympathy for the reader's
intuition (and their own) when designing new instances. (The ⌈x⌉ notation for
the ceiling of x is a nice example of this working well.)

Similarly for a transistor, humans can decide how we put the bits of silicon
together, and what wires we connect up, but we absolutely cannot decide the
fundamental principles that make them work. This means, if we want a certain
behaviour, we're limited by what physics can do.

 _> But this is not a corner case! I'm not talking about an obscure quirk (God
knows how many C has!), this is literally syntax, the first thing you learn
about a language!_

Yes, you're right `sizeof(foo)` is syntax... that looks a lot like
`function(foo)` but behaves differently with respect to any side-effects in
foo. As I said above, I do actually think this was a reasonable choice on C's
part, but it seems silly to, seemingly, dismiss any possibility that it could
be confusing & that maybe future languages/language designers should avoid
doing this (or at least explicitly think about the choice they're making) and
instead just repeat RTFM over and over.

------
deepsun
Many of the comments here can be reduced to "Yes, you can shoot yourself in
the foot. Don't do that."

The problem, however, is whether we call ourselves "engineers", and not just
"coders". Real world engineers take responsibility for anything related to
their project, even if it's not their job.

Flight instructors often teach that "everyone is stupid for 15 minutes a day".
That's why every airplane control is designed to prevent errors, even when
other prevention measures have failed. Of course, we often don't need that
kind of reliability in computer programming.

But in cases where we really do "engineering", and not just "programming", we
want to prevent shooting yourself in the foot even when the user points the
gun at their own foot and pulls the trigger. That's why guns have safety
switches.

------
socmag
File under:

"When I poke myself in the eye, it hurts."

I don't think in 40 years of programming in C and C++ I have ever seen an
example of someone doing something with sizeof() quite that weird, and I've
seen plenty of weird things.

Plus one for the happy ending that they won't be doing that again.

------
mikeash
Is this a pitfall people actually fall victim to? It's an interesting corner
case, to be sure, but has anyone been bitten by it in real work?

~~~
mfukar
I doubt it. Expressions like this raise all sorts of red flags in code review.

------
faragon
C is OK: it is not your friend, it is a tool.

------
benchaney
I don't understand why this would come up. When would you ever use ++ on the
result of evaluating an expression?

Edit: Now I see the issue. Still, this should be easy to avoid. Don't use
expressions with side effects as the input to a macro.
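
For concreteness, here is a minimal sketch (not taken from the article) of how
the macro version goes wrong in C99: when the operand of sizeof involves a
variable length array type, the size expression is evaluated at run time, so a
side effect hidden in the macro argument can actually fire.

    #include <stdio.h>

    /* An innocent-looking macro that wraps sizeof around its argument. */
    #define BUF_BYTES(n) (sizeof(char[n]))

    int main(void)
    {
        int i = 3;
        int arr[10];

        /* char[i++] is a VLA type, so the size expression runs: i becomes 4. */
        size_t a = BUF_BYTES(i++);

        /* arr[i++] has plain type int, so sizeof does not evaluate it: i stays 4. */
        size_t b = sizeof arr[i++];

        printf("a=%zu b=%zu i=%d\n", a, b, i);   /* typically: a=3 b=4 i=4 */
        return 0;
    }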

------
jstimpfle
> sizeof (char[i]) is equivalent to i * sizeof char, and will be evaluated at
> run time.

Correction: _sizeof working_array_ is equivalent to...

~~~
jstimpfle
my bad, "sizeof (char[i])" actually works!

------
camgunz
Jamming `++` and `--` everywhere is bad. Don't do it. Hooray we can use C
again.

------
kazinator
> _The length of the working array depends on the value of i._

Not classic C90; only the brain damaged C99 dialect.

Prior to C99, _sizeof expr_ is a constant expression, always.

You can easily program in C90. Plenty of currently maintained or even new
projects are C90 only.

If using gcc, use _gcc -ansi_ or equivalently _-std=c90_, and there you are.

There is a way to detect (albeit at run time) when a macro is given an
expression which contains side effects:

[http://git.savannah.nongnu.org/cgit/kazlib.git/tree/sfx.h](http://git.savannah.nongnu.org/cgit/kazlib.git/tree/sfx.h)

Compile sfx.c into your code, along with the except.c and hash.c modules.
Then you can use the SFX_CHECK macro in your macro bodies. For instance, the
dict.h header file uses it like this:

    
    
      #ifdef KAZLIB_SIDEEFFECT_DEBUG
      #define dict_isfull(D) (SFX_CHECK(D)->dict_nodecount == (D)->dict_maxcount)
      #else
      #define dict_isfull(D) ((D)->dict_nodecount == (D)->dict_maxcount)
      #endif
    

If the debugging is enabled, and some place in the application calls
dict_isfull(d++), then there will be a diagnostic, thanks to code generated by
SFX_CHECK(D), which parses the expression (or pulls the parse out of a cache)
and then diagnoses based on the results. Unfortunately, this is a run-time
check: the expression has to be stepped on.

(A bit of work could turn this into a static system. E.g. the macros could spit
out a special annotation, then the output of gcc -E could be collected and
post-processed to look for the annotated material and determine if there are
side effects.)

I wrote this in around 1999 and it hasn't been maintained since.

The parser has to work with incomplete information and hence deal with
ambiguities. For instance (a)(b) could be a function call or a cast of (b) to
the type (a). We don't know and so we have to conclude that this is a "maybe
side effect" (function call). In the face of ambiguities, we parse the
expression multiple ways. If it parses correctly in two ways, we conclude
pessimistically: if one of the ways of parsing it suggests there is a side
effect we go with that. If errors are encountered in some of the parses,
backtracking takes place, and this is done with the help of exception
handling, which is built on setjmp/longjmp.

------
gpvos
It shouldn't be too hard for compilers to warn about this, I think.

------
MooMooMooney
C, of course, is not your friend; C is always the computer's friend.

------
kbradero
C is not your friend, just as a knife is not.

