Hacker News
C is not your friend: sizeof and side-effects (phlegethon.org)
70 points by ingve 242 days ago | 116 comments

If you use ++ in a sizeof expression...you're probably getting what you deserve. C may not be your friend, but it is fair.

...you're probably getting what you deserve.

I suspect we could use a crawler that detects this phrase applied to a programming language, and use that as a proxy metric for programming language design quality. (The more unique detections of the phrase, the worse the language.)

>If you use ++ in a sizeof expression...you're probably getting what you deserve.

For every developer who reads your comment and learns something, there's going to be one who commits a bug based on this issue to a production repo without realizing it. When it comes to discovering this kind of minutiae in programming languages, it basically goes down one of two ways: either you randomly stumble across information about the hazard on Stack Overflow (or Hacker News, in my case), or you blunder right into it at least once. I call these kinds of quirks "landmines".

I prefer my languages landmine-free, or as close to it as we could hope to achieve. This kind of behavior is definitely not something to make quips about, it's exactly the kind of language flaw that might cause government software to bill incorrectly calculated invoices to people who can't afford it, or overbook planes and cause confusion at airports. Maybe I've misinterpreted your comment, but your comment reads like you think of this sort of detail as "separating the men from the boys". It's totally not doing that. What it's really doing is randomly blowing up civilians.

I'm not sure what a fair or unfair language would be. Is an unfair language one whose compiler output is non-deterministic? If that is the case, then C fails that test with undefined behavior.

What I'm really trying to say is that as an industry we need to stop propping up C as if it's some golden god. Was it a good compromise for its time? Of course, but we can and should do better, because it's clear that writing 'good' C code might be beyond the realm of man.

Play stupid games, win stupid prizes. If you're passing an expression with side-effects (which should, in most cases, be on its own line) into a macro, you need to make sure that macro behaves as you intend.

Writing good C code isn't impossible, it just requires more resources than writing poor C code. A lot of places aren't willing to put the resources in to write good, secure code, in any language.

If it's not possible to write function-like-macros that actually behave like functions, then function-like-macros need to be not used.

It's a good recommendation that you don't use them.


This is a recommendation I've seen in most coding guidelines I've read.

If we want to do that, we need to make it possible to fully harness the abilities of the computer in other languages.

Just as a litmus test: In how many languages can you cause the SIMD shuffle instruction to be used, explicitly? How many have compilers that can take advantage of it?

JVM / .NET languages don't do this (I know .NET can do SIMD, but it can't do shuffle). Rust can maybe do it? Not sure on the status. Dlang SIMD support seems only partial.

Which can do a popcnt explicitly or automatically?

There are lots of little holes in other languages just in the space of "Can I even use the instructions the CPU has?". Then of course being able to control memory completely is a huge advantage in some domains. Any language that lets you do that will have most of the same danger problems as C or C++.

I'm sure you could come up with something that is a little safer with as much power, but it isn't so easy for someone to just stop using C/C++ right now if they need that control.

I would answer that you can use Rust to your first question.

But I agree with your general sentiment that currently C/C++ is the only choice for certain domains. But there is nothing theoretically stopping us from creating safer system languages. (See Rust and Swift, much less so Swift. But Apple is working towards that goal with the option to disable reference counting in a future version which will give much more predictable performance.)

My argument is that we need to stop pretending that C/C++ is not worth improving. Every time someone criticizes a gotcha of C/C++ HN gets very defensive. We need to kill our gods if we want better ones and I personally think we need better system programming languages because it is far too easy to write bad C/C++ code.

>"My argument is that we need to stop pretending that C/C++ is not worth improving."

I don't think that's the sentiment of their communities, though. Not sure about C, but in the case of C++ it has been continuously adding significant features in the latest three iterations (C++11, C++14 and C++17); if that is not a sign of improvement, I don't know what is.

I agree there are a number of improvements that could be made if we sacrificed backwards compatibility, it is just so hard to get everyone on board.

Explicit SIMD is nightly-only in Rust right now; it's being actively worked on.

The issue is that sizeof is most often used in function-like macros, and the user may not even realize that it's a macro. (Or that sizeof is involved in it.)

The reasonable places you can use `++` and `--` in C are exclusively:

- In a bare expression, with only the variable: `i++;` (That's what usually goes into `for`.)

- In a simple assignment, without any extra expression: `int a = b++;`

- While updating an array index that appears nowhere else: `v[i++] = 0;`

Using it anywhere else is asking for trouble. In C++, using the last one is also asking for trouble.

The deeper you get into C++ the more you realize that Swift made the right call by removing all four inc/dec operators.

In-place manipulation of values is a troublemaker.

C is too old for its creators to have known that, given that it's the language where people got to learn it. But no newer language should carry those operators forward.

Well, the user should check what that thing is they're calling, or use it with caution (i.e., not using side effects as params).

C is a language that gives you a very sharp laser knife that can cut everything clean off very efficiently and fast. However, without care, you'll cut your fingers off. Always wear safety goggles around dangerous equipment and don't do stupid things like i++ inside a macro call.

You're writing C, don't give me the "I might not know that what I'm calling is a macro" argument. You better know what you're calling in C. If you don't want to know what you're calling, don't write C.

You're unnecessarily aggressive about blaming the caller. Who's to say that the caller didn't do their due diligence, but instead that the callee changed itself later on?

It's not uncommon to replace a function with a wrapper macro later on in an attempt to debug something or trace execution or whatever. It's not really realistic to expect that when such a replacement is done that they go analyze every possible call site to see if sizeof() results in side-effect evaluation or not.

This is just another case of C being a language built out of undefined behavior.

Some libraries explicitly document whether a given interface may be implemented as a macro. The C standard library and POSIX make these things a part of the specification. See getc vs. fgetc for example.

So, don't call any standard library functions, which might all actually be macros according to the standard?

I think a chainsaw is a better description. On the surface C gives you very fine control over memory, but it's so full of pitfalls that it takes a lot of skill to actually use it properly. When we put security into the equation, what you usually get in C is developers cutting limbs apart. There is no sugarcoating it - C is the top reason for security vulnerabilities.

>You better know what you're calling in C. If you don't want to know what you're calling, don't write C.

That puts a de-facto cap on either the speed of C development or the size of C projects.

It is definitely possible to have a project so large that no one person can understand it all. So either you can't have C projects larger than whatever that size is, lest you risk not knowing what you're calling, or you have to accept that developers working on your project will be reading and re-reading parts of source code that haven't changed in years in order to make sure they know what they're calling, rather than re-implementing common patterns in the codebase and trusting to prior diligence.

Either way, you're not selling C to me as a language to embark upon a major project with.

> Either way, you're not selling C to me as a language to embark upon a major project with.

Sounds about right.

I dearly love C, but it's time to move on. Languages like Rust seem to solve most of the major pitfalls while not sacrificing on speed.

Is that something you are just musing out loud, or are you proselytizing?

If it is the former, feel free to use whatever language you choose. If it is the latter, can you please keep your opinion as just that: an opinion.

I believe we are all adults here and quite capable of making judgment calls.

Rust has some interesting features and lots of other languages too. No need for the drive by.

Who pissed in your cornflakes?

I'm not sure how much more diplomatically I could have phrased my previous post. If mentioning a language as one possible modern alternative ("languages like Rust") is enough to set you off, perhaps you need to reconsider who's falling short of adulthood here.

That's not actually what you said.

What you said is it is about time we moved on from C and suggested Rust.

That seems like your own subjective opinion. Which is fine of course, each to their own.

Personally I will continue to use whatever language makes sense for the job no matter what it might be.

Happy to take one for the team if that stance doesn't work for you; I'm not the one here passive-aggressively proselytizing.

And thank you for confirming that by the way.

I have no interest in convincing you to use C for anything, but it IS true in absolutely any language that if you are changing it, you should know what you are changing and how it works.

You don't need to know the entire project, but you had damn well better understand the code right in front of you that you are editing.

In literally any language, developers who don't understand what they are changing will cause bugs.

"What you're calling" in the sense of "what the syntax elements, language-intrinsics, and macros you're using mean", not "what you're calling" in terms of plain-function APIs.

Macros (in any language, not just C) are the big problem, because you have to know a macro's definition before you can know precisely what its usage rules are.

Without macros, C projects can get as large as they like. Everyone must be required to know the language, but you can work on any plain-functional project given that knowledge.

Once you start #define-ing tons of preprocessor macro-functions, all bets are off; now your project is essentially its own language that every developer must first familiarize themselves with before usefully contributing, and that does indeed put "a de-facto cap on either the speed ... or the size" of projects.

> Macros (in any language, not just C) are the big problem

Not true, see: https://doc.rust-lang.org/book/macros.html#hygiene (not unique to Rust, though; Scheme, for example, has hygienic macros)

Macros don't have to be C-style simple text substitution.

Hygienic macros are still "the big problem."

There is a reason there are no thousand-developer FOSS projects written in a Lisp; and there is a reason Ruby is derided for "monkey-patching"; and there is a reason that a single parse-transform in an Erlang project is a very strong code-smell that needs intensive justification.

The reason, in all of these cases, is that a project that includes one unique macro definition, is now a project effectively written in a different language—"language Foo, plus macro Bar." HN, for example, is written in Arc, a.k.a. "Racket + several macros." Emacs is written in Emacs Lisp, a.k.a. "MacLisp + quite a few macros." In general, if a project goes on long enough without any prohibition against adding yet another macro to the codebase, eventually that project—like Arc, like Emacs—will have accidentally created an entirely-separate language.

But before then—even on the commit of the very first macro definition—the prerequisite knowledge required to contribute to the codebase has now diverged: you can't just jump in as someone who "knows Foo" and immediately understand the logical flow of 100% of the code. To contribute to such projects, people have to learn how the macros change the logical flow—i.e. learn that diverged language.

To which they very likely will say: screw that! Thus capping the number of contributors, and/or the speed of development.

> There is a reason there are no thousand-developer FOSS projects written in a Lisp

Yes; the main one being that about 996 of them wouldn't have anything to do.

> you can't just jump in as someone who "knows Foo" and immediately understand the logical flow of 100% of the code.

Can you just jump into the Linux kernel as "someone who knows C" and understand the logical flow of everything? USB, TCP/IP stack, virtual memory, file systems, block devices, scheduler, ...

Maybe I wasn't clear with my wording.

I didn't mean "logical flow" to refer to following the branches of the code at runtime by keeping a model of the code's run-time state in your head. That does, indeed, require having a pretty solid grasp of the entire codebase (and, in large projects, is just rarely done for this reason.)

Instead, I meant "logical flow" to refer to, effectively, the ability to do static analysis of the code in your head: knowing which blocks of code are live or dead; which blocks might execute zero or one times, like a branch, or zero or more times like a loop; which blocks might execute immediately and which will be closed over and then thunked later; which blocks are within synchronization barriers and which aren't, and so will be touched by multiple threads; etc.

When you know a language, and you're debugging code written in it, this is how you "skim" the code efficiently: you figure out where the Instruction Pointer might go, and then you follow it through the blocks of code it might execute along the way.

In the absence of macros, this is possible. But in the presence of macros—any of which could slice-and-dice your code, rearranging it, throwing some bits away, duplicating others, wrapping any given expression in an if{} or a while{} or a lambda{} or a synchronize{}—you now have no idea what a given block of code that calls a macro will do. Where is the Instruction Pointer allowed to go? Who knows? It might even enter—for quite a long time—code that's only referred to in the macro body, not at the macro call-site. (But only sometimes; at other call-sites the macro-expansion might not call that function at all. It might depend on the function. It might depend on the module attributes. It might depend on the time of day!)

A block of code passed through a macro is its own world with its own rules. Those rules might be 100% similar to our own world save for one little thing... or they might be "only compile this to an expression if DEBUG has been defined at the project level" or "apply the statement body to itself as a Y-combinator" or "convert all blocking calls in your function into async calls followed by returns to the trampoline of an FSM to enable continuations, splitting your block at those points into several private functions."

Well, yes. If you don't know what defclass is, how do you know that (defclass a (b c)) isn't calling a function b with argument c?

You have to understand the macros. Either their documentation, or their code if that is lacking. If we don't know the macro at all, we can't do static or dynamic or any kind of analysis; we don't have the information.

Do you know what happens inside a function without chasing its definition, and then chasing its children and so on?

Programming is all about "controlling something with this little piece of text here, such that the real meat of it is done in multiple elsewheres". You do this with functions, or macros. If you don't do this ("everything happens right here and nowhere else") you will not get anywhere beyond a certain small complexity.

Okay, I know that a function in C is call by value. These expressions are evaluated and converted into argument values. They are not relocated into another scope, or turned into thunks or whatever. Well, so what? Once the function takes over, I have no idea, if I don't know what the function does.

> Do you know what happens inside a function without chasing its definition, and then chasing its children and so on?

Yes. That was my point: when you know a language, you can "unfocus your eyes" and read code without knowing anything about what the code does, except that A calls B calls C. You don't need to know what A/B/C are, or what they're supposed to do; if the program crashed, "errors in business logic" aren't your concern. You want to find an error in the model of the world the program is using, one that led to an "impossible" scenario. You want to uncover a series of static guarantees—or lack thereof—that led to the possibility of a code-path [containing a throw() or an abort() or whatever] being activated that is never supposed to be activated.

If you have a crash with a backtrace of function C <- function B <- function A, then you will start with A, and look at the code around the call to B to see if any of it "is relevant"—i.e. if it might, lexically, have been the cause of the problem—and, after a second or two, you likely determine that it wasn't. So you move on to B. Do the same. Move on to C. Stare at the code. See something that smells vaguely wrong. Then start trying to understand the code, from there. Switch from skimming, to reading.

Any developer versed in a language can do this to any program written in that language, with no idea what it is they're reading. This is why "over-the-shoulder" debugging can magically work—if you know a language, you can "know broken code when you see it" without knowing a thing about the code itself.

But if, in the above, function A's call to function B, or B's call to function C, happened within a block wrapped in a macro call—now you can't skim. Now Mr. Joe "over-the-shoulder" Schmoe has no idea whether A or B are relevant. Now you have to go and find the docs/definitions of those macros, and familiarize yourself with them, before you can do anything.

If there's one macro, this is easy enough. FOSS projects of this type still have many contributors, although there might be sections of the codebase considered "harder to contribute to" than others. (See: the Linux kernel's syscall mapper macros.)

But if the whole language is effectively half-DSL—and not a business domain kind of DSL, but just the developers' idea of a bunch of awesome extra primitives, like Arc or Elisp—then random drive-by "oh, hey, I noticed that was broken and fixed it" FOSS contributions will go to zero, because nobody will spend enough time getting versed in the new one-off language to be able to "notice something was broken."

No, those would require domain specific knowledge.

But at least inside the Linux kernel everyone is talking the equivalent of English.

Understanding English does not imply you will be able to jump into any random scientific paper with 45 years of background theory and immediately understand it either.

Being able to understand the grammar and sentence structure really helps, even if you don't understand the vocabulary or technical principles of the domain you are reading.

Domain specific knowledge, domain specific language, ...

I smell fear of macros.

A macro is just another function, but it does some code transformation. The code base will not explode, the world will not stop revolving, just because of a few macros.

I use LispWorks, which adds a few hundred macros (amongst many other things) to Common Lisp. It's still Common Lisp, but extended with macros for defining new data types, macros for defining UI features, even a few new control structures have been added. For most of the stuff I don't even have the source code, since it is a closed commercial product - but the macros are documented.

Lisp is still used and one of the reasons is its seamless integration of various meta-programming facilities. This requires developers to learn a slightly more complicated language. But in larger projects this is not more difficult than learning the architecture tools of large OO frameworks - frameworks which are common in Java, Javascript and many other languages.

Not fear, not at all. Resentment. I work on a codebase full of macros. I wrote a lot of them, when I was more naive. And now I'm the one who can't get any help with my code, because nobody wants to spend the time to familiarize themselves. :/

What have you done to get people interested in the code base?

Have you conducted a survey which confirms that lots of people want to help, but for the macros? Or are you guessing at the reasons?

First of all, do lots of people use it; is there a large user to contributor ratio?

It's hard to get help with a FOSS project no matter what; the ones that get help are vastly outnumbered by those which don't get help. According to openhub.net, "over half of all active projects [listed] on Open Hub are solo efforts". (This is boilerplate text which appears in the summary of projects which are that way.)

> That puts a de-facto cap on either the speed of C development or the size of C projects.

I don't think anybody will disagree.

This feels like a rant about a language handing bad developers enough rope to hang an entire village. I'm unconvinced this represents a failing of the language.

I think in software engineering it's prudent to just assume that all developers make mistakes sometimes. Anything that can have developers make fewer mistakes, or can catch developer mistakes, saves time (and therefore money) in the long term, so I think it's fair to complain that C makes mistakes easy and difficult to catch.

Complaining about C in this manner strikes me as complaining that the knife you're using is too sharp, and you don't have time to make sure you won't cut yourself.

Don't get me wrong. I prefer working in Python or its ilk, because I prefer a higher level of safeguards. But nothing I work on needs the precision of behavior that C provides. And sometimes you need sharp and precise.

C is not precise, not as a model of modern CPUs (they're very different to the PDP-11, and compiler optimisers are very invasive) nor as a precision engineering tool (too many silent footguns and landmines adding a pile of incidental complexity).

It's definitely true that it's sharp and sometimes (currently) the best tool for the job, but don't ignore the reality of its design and limitations when praising it.

I'm not sure where this meme that C is a direct translation to PDP-11 assembly came from. It's true that there were some quirks in K&R C that came from it (eg, the way that floats were widened to doubles when being passed to functions), but those largely went away with ANSI C.

Other than that, C isn't a significantly better match for PDP-11 than it is for x86 or alpha or sparc.

C is like swinging a very sharp katana to make small precise incisions. You need to stand far back to have any hope of success, and you need to be very precise. With practice, your cuts may be very precise, but it takes a lot of skill and effort. Until that point (and occasionally thereafter), a lot of accidents happen...

With a knife, you know where the sharp bit is. With C, sometimes the handle explodes, killing everyone in the kitchen.

Yea if you hold it wrong. Undefined behavior doesn't magically happen.

Sure, but who makes knife handles out of nitroglycerin?

You are absolutely correct that the areas of undefined behavior are well defined in C, and should be taken into account.

In programming, as in all products, the winning combination is good enough + quick enough. When using C, code has to be much higher quality than other languages.

Anyway, my point of view is that you should be REALLY REALLY sure you NEED C before using it. It is a very sharp knife, and it only takes one false move.

I don't really disagree with anything you're saying.

But that being said, for the times you do need C, what are the better alternatives? C# is managed, solves a ton of problems...and is completely unusable for driver development. C++ just gives you more rope to hang yourself. It seems to me that its danger and utility are inseparable elements of the same ultimate thing.

I agree completely.

Many projects (and individuals) do not have the discipline to use C properly.

If you need to use C, you need to take the time and care to use it properly.

"You're holding it wrong". I've heard that utterance elsewhere, another world where users are the wrong ones...

Every power tool I've ever used has a right way and a wrong way to use it, and the more powerful the tool, the greater the risk if you're 'holding it wrong'.

That Jobs had a gaffe using the same words doesn't strike me as relevant. We are, after all, talking about tools and tooling, not a phone design sacrificed on the altar of fashion.

Putting aside C specifically, is it possible to have a language that gives you the power you need, without the danger? I'm not completely sure it is.

I thought it was C++ that gave you enough to hang the village. I'm not sure who said it, but it's one of my favorite PL-related quotes:

If C gives you enough rope to hang yourself, C++ gives you enough rope to bind and gag your neighborhood, rig the sails on a small ship, and still have enough rope left over to hang yourself from the yardarm.

C++ refused to give you the rope; they left it in the SGI template library:


It's a demonstration that function-like-macros are not actually all that function-like, no matter how defensively the writer of said macros tries to be.

It might not be a failing of the language, but it might be a reason not to use it all the same.

"Use the right tool for the job at hand." and "A poor craftsman blames his tools." are axioms in every industry and craft.

This flaw was pointed out while C99 was under review. The idea of changing sizeof to not be evaluated at compile-time did not sit well with a lot of people. Unfortunately, the alternative (sizeof not being usable with dynamic arrays) was clearly worse.

Do you happen to know why that was perceived as worse? If I understand correctly the difference would be that you'd have to write (i * sizeof(char)) vs. (sizeof(char[i]))? Personally I'd even prefer the former.

A very common idiom is "T *p = malloc(n * sizeof *p)". That only works if sizeof doesn't evaluate its operand.

Why not? Suppose you have a C99 VLA object, and you want to malloc the equivalent amount of space. Couldn't you just do this:

   int vla[nparam];
   int *pcopyspace = malloc(sizeof vla);
Here sizeof vla is basically just nparam * sizeof(int).

vla is evaluated to the extent of calculating its run-time size.

Ah, the weekly 'C is evil' HN bit.

FYI, your mouse is also dangerous, you can easily click the wrong way and send all your money away to someone in Nigeria. Should we blame the mouse?

This seems like a bit of a knee-jerk reaction on your part. The article itself is interesting and points out an edge case which is likely to trip people up in certain situations. I don't really see the article mentioning or even hinting that "C is evil", it even includes a bit of a qualifier at the end: "C is reaching PHP-like levels of confusion here, but luckily for us this combination . . . is pretty hard to hit. I’ve never seen it in “real” code."

A bit, yes, I have to admit this. However it's not HARD to find new & interesting ways of screwing up with the language; it's been done for longer than I'm alive, and some people abused it in extraordinary ways over the years [0], so all I can think of when I see that sort of article is 'HN clickbait'. It works. It's actually borderline trolling at this rate.

[0]: http://www.ioccc.org/

To be fair the article is called "C is not your friend"

The beauty of C/C++ is exactly that it gives you freedom to do things. Period. There are a lot of languages designed to limit you (or make safer programs, you name it). Some might be faster / more productive / safer / environment-friendly, use what you think you should. But don't try to limit C, please.

Then stop writing important code that can cause me harm when inevitably easily exploited because of that freedom, please.

You can see why VLAs were made optional in C11, even though they were mandatory in C99. There were just too many weird interactions that VLAs had with other parts of the language. (That, and it was an easy way to get a stack overflow.)

Coming from C++, VLAs are one of the things that I really wish I had access to. The number of times I've seen stuff created on the heap just to be tossed at the end of the block in production code always made me cringe.

Agreed that they're a very sharp tool and need to be used with care.

C++ has way way better ways to handle this. What you do instead is a std::vector with an allocator that has a fixed-size chunk of memory to cover your common, small case and falls back to malloc if the size becomes too large to fit into the internal allocation.

Then your VLA is stack allocated when small and heap allocated when large and works everywhere, even as a return value or other cases where alloca would fail.

Example: https://howardhinnant.github.io/stack_alloc.html

Yeah, that's a much nicer solution. In the cases where it mattered we just built a standard block/arena allocator but that's a bit more work which means it was rarely used.

There is no reason why C99 VLAs cannot be implemented in a way which does the same thing: small enough VLAs go on the stack, and bigger ones on the heap. An unwinding mechanism in the back end can be used to do the scoped clean-up.

(Well, there is a possible reason: lack of integration between longjmp and unwinding cleanup mechanisms. I'm not 100% certain here, but I think VLAs are supposed to be cleaned up when abandoned by longjmp. Mind you, longjmp will also outsmart your std::vector.)

VLAs are actually not so nice. There is no sane way to handle failures if you make one too big.

Then don't make it too big and fall back to malloc(3) above a certain threshold?

The code to do what you describe is going to be nicer with alloca, because of the similarity between alloca and malloc.

   type *ptr = (nelem > THRESH)
               ? malloc(nelem * sizeof *ptr)
               : alloca(nelem * sizeof *ptr);

and, at the end:

   if (nelem > THRESH)
       free(ptr);
Do you mind spelling out the few lines of code needed to do this?

Something like the following:

  void *pedanticmemmove(void *dst, const void *src, size_t n)
  {
      char buf[n > THRESHOLD ? 1 : n];
      char *p = n > THRESHOLD ? xmalloc(n) : buf;
      memcpy(p, src, n);
      memcpy(dst, p, n);
      if (n > THRESHOLD)
          free(p);
      return dst;
  }
But yeah, the alloca(3) variant is a bit nicer to look at and probably results in exactly the same code (but is non-standard, of course).

VLAs in C99 may be nicer syntactically than doing something similar in C++, but there are much better examples of how to do multidimensional arrays correctly. For example, how they work in Fortran is better.

If you want to allocate from the stack, you can usually use alloca.

And nothing bad will happen if you alloca too much?

Exactly the same bad things will happen as if you declare a variable-length array.

Memory from alloca has function lifetime, while VLAs and compound literals have block scope. In a loop alloca could easily blow your stack even if you limit each allocation to something small. This is a serious pitfall, especially when using alloca with macros. The reverse is true for VLAs and compound literals in macros that return a pointer--the caller might not know that the lifetime is block-scoped and not function-scoped.

Also, alloca and VLAs aren't necessarily implemented the same way. When instantiating a VLA a compiler might extend and test the stack pointer in page-sized increments so an allocation doesn't extend past the guard page without triggering a segfault. An alloca implementation might not.
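The lifetime difference is easy to sketch (function and variable names here are mine, for illustration):

```c
#include <stddef.h>

/* The VLA's storage is released at the end of every loop iteration,
 * so peak stack usage is a single buffer regardless of iteration count. */
size_t vla_in_loop(int iterations, size_t n)
{
    size_t touched = 0;
    for (int i = 0; i < iterations; i++) {
        char buf[n];            /* block scope: gone after each iteration */
        buf[0] = (char)i;
        touched += sizeof buf;  /* sizeof a VLA is evaluated at run time */
    }
    return touched;
    /* The alloca equivalent -- char *buf = alloca(n); inside the loop --
     * keeps every allocation alive until the function returns, so the
     * stack grows by roughly iterations * n bytes and can overflow. */
}
```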

There's always alloca() when you want to get that small VLA-style buffer on the stack rather than heap... It's not POSIX, but ubiquitous on desktop Unixes at least.

Is there a kind of "linter" for C that spots stuff like the problems presented here (as opposed to just formatting)? What kinds of static analysis are available here?

There are lots of useful -W flags to gcc/clang that expose language pitfalls (examples: -Wparentheses and -Wshadow).

As far as dedicated tools, my best experience is with Coverity, which seems to have a design focus of finding errors in codebases with a minimum of false positives.

C is a very "strict" language. It does the right thing only if you write the right code. It has many undefined behaviors, and I ran into one of its quirks when I took a compiler course. The grammar given in https://www.amazon.com/Programming-Language-Brian-W-Kernigha... allows this kind of structure: there can be statements after the switch but before the case labels. gcc also accepts this, but those statements will never be executed. Note: code modified from https://www.tutorialspoint.com/cprogramming/switch_statement...

#include <stdio.h>

int main () {

   /* local variable definition */
   char grade = 'B';
   int a = 1;

   switch(grade) {
      printf("great!\n" );
      a = a + 1;
      case 'A' :
         printf("Excellent!\n" );
      case 'B' :
      case 'C' :
         printf("Well done\n" );
         printf("a: %d\n", a);
      case 'D' :
         printf("You passed\n" );
      case 'F' :
         printf("Better try again\n" );
      default :
         printf("Invalid grade\n" );
   }

   printf("Your grade is  %c\n", grade );
   return 0;
}
This comes with me not being very experienced or familiar with C, but this is... disturbing.

I mean, I totally get that this is a really poor use case, and as I just learned from looking it up (because I didn't know this before), sizeof in C isn't a function or macro, but an operator.

That just doesn't jibe with the mental model of C that I had in my head. And programming languages, if anything, should be conducive to mental models, not hostile to them.

I'm going to be that guy for a moment and ask: why were you using sizeof without ever looking it up to find out what it was? I don't expect programmers to be perfect, but some basic RTFM before using a new construct doesn't seem too much to ask.

> That just doesn't jibe with the mental model of C that I had in my head

I'm curious, what mental model of sizeof() did you have in your head, and how did it work?

It was a bug introduced in C99 -- C versions before this did not have this bug. I would say it's an extreme oversight on the part of the standardization committee. The standard has this whimsical statement:

>Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.

I think it's a long-winded way of saying "my bad"
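The split the standard is describing can be demonstrated directly; a minimal sketch (the helper names are mine):

```c
#include <stddef.h>

/* The operand of sizeof is evaluated only when its type is a
 * variable-length array type (C99 6.5.3.4). */
size_t constant_size_operand(int *i)
{
    return sizeof((*i)++);       /* int has constant size: *i is NOT incremented */
}

size_t vla_size_operand(int *i)
{
    return sizeof(char[++*i]);   /* VLA type: ++*i IS evaluated */
}
```

In the first function the increment silently never happens; in the second it does, and the result of sizeof depends on it.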

One could equally point out that programmers should form mental models by reading documentation, rather than relying on intuition. It's not like you have to dig through compiler sources to find out that sizeof is an operator. Even if it wasn't plainly documented in pretty much any properly-written material on the C programming language, it's pretty obvious that sizeof can't be a function, since a function doesn't have access to the sort of data required in order to retrieve this sort of information.

The reality is people form mental models by intuition. Things designed for humans to use (like programming languages) should take this reality into account.

And then people wonder why other engineers smirk when they hear the term "software engineering" :-).

We're not all fitted with the same Umbrella IntuCard Mk. II Intuition card so that, relying on intuition alone, we can all reach the same results from the same inputs. Relying on intuition to figure out how a language works is about as smart as relying on intuition to figure out the limit of a series, how a telephone works or how a transistor works -- all three of which were definitely designed for humans to use, since there was nobody but humans to use them back when they were invented.

I don't like it any more than you do, and my life would be much easier (and probably happier) if it weren't like this, but blaming it on the language is the equivalent of the tortoise blaming calculus after losing the race to Achilles.

I don't see how your series or resistor examples are particularly relevant: neither mathematical nor physical facts are human inventions. The packaging is, but I'd argue that the packaging for both is relatively intuitive for someone likely to be encountering them.

For this example, there is a large amount of intuition both from C specifically and more broadly in programming (i.e. the background of people likely to be encountering this) that foo(bar) is a function call that evaluates all its various components. The syntax used for many "calls" of sizeof violate this intuition. The uniformity that leads to this corner case may indeed be the right choice (and probably actually is, IMO), but discarding intuition---and assuming everyone is a robot who's memorised all corner cases of their tools and never makes a mistake---is dumb, especially in a relatively restricted domain like programming where there is a reasonable set of common/shared assumptions to build on.

Lastly, the "a good carpenter never blames their tools" sentiment is also dumb, as a universal guideline: tooling can be objectively bad and unhelpful, and, even if someone can make it work, there will be downsides. Taking things to the limit, it's clear that "C but instead of text, just write down the UTF-8 encoding of a file as one huge base-10 number" would be an actively bad programming language and lead to all sorts of problems over normal C. The same thing holds for, say, C compared to other tools: people make objectively more of certain particularly bad classes of errors when writing C because the tool isn't helpful enough.

> I don't see how your series or resistor examples are particularly relevant: neither mathematical nor physical facts are human inventions.

Mathematical facts are a very human invention, just like the transistor. I picked calculus precisely because intuition based on physical observation gives the wrong answer nine times out of ten.

> assuming everyone is a robot who's memorised all corner cases of their tools

But this is not a corner case! I'm not talking about an obscure quirk (God knows how many C has!), this is literally syntax, the first thing you learn about a language!

> transistor

(Whoops, but my point still stands just as much.)

> Mathematical facts are a very human invention, just like the transistor. I picked calculus precisely because intuition based on physical observation gives the wrong answer nine times out of ten.

I think I didn't convey my point very well: humans and their intuitions have no influence over the value of the mathematical series 1 + 1/4 + 1/9 + ... nor do we have any influence over the behaviour and interactions of certain doped (etc.) silicon. Both of these just are, outside the influence of humans.

Certainly, for the former, we have control over the syntax we use ("the packaging")/how we write it down, but, say, changing what "1" means changes the series to a different object. In any case, control over the syntax is exactly like a programming language, and indeed, mathematical notation is very similar and it's great when people have sympathy for the reader's intuition (and their own) when designing new instances. (The ⌈x⌉ notation for the ceiling of x is a nice example of this working well.)

Similarly for a transistor, humans can decide how we put the bits of silicon together, and what wires we connect up, but we absolutely cannot decide the fundamental principles that make them work. This means, if we want a certain behaviour, we're limited by what physics can do.

> But this is not a corner case! I'm not talking about an obscure quirk (God knows how many C has!), this is literally syntax, the first thing you learn about a language!

Yes, you're right `sizeof(foo)` is syntax... that looks a lot like `function(foo)` but behaves differently with respect to any side-effects in foo. As I said above, I do actually think this was a reasonable choice on C's part, but it seems silly to, seemingly, dismiss any possibility that it could be confusing & that maybe future languages/language designers should avoid doing this (or at least explicitly think about the choice they're making) and instead just repeat RTFM over and over.

The reality is that any model formed for any programming language that isn't based on reading the documentation has no chance of being correct...

Whether or not that is true, that doesn't contradict what I said. Taking into account intuition doesn't mean one can't or shouldn't do some unintuitive things, just that it should be an explicit decision. In the extreme, no amount of documentation about how the "+" operator does a multiplication is going to make that a good language design choice, and this applies to smaller papercuts too.

But it can't be a function, because functions don't have access to that information.
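One way to see this is that sizeof accepts a bare type name, and even works without parentheses on an expression -- neither form is possible for a function call. A small illustration (wrapper names are mine):

```c
#include <stddef.h>

/* sizeof is an operator, not a function: it can take a parenthesized
 * type name, or an expression with no parentheses at all. */
size_t size_of_type(void)
{
    return sizeof(long double);  /* operand is a type name, not a value */
}

size_t size_of_expression(void)
{
    int x = 0;
    return sizeof x;             /* no parentheses: clearly not a call */
}
```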

Many people's comments can be reduced to "Yes, you can shoot yourself in the foot. Don't do that."

The problem, however, is whether we call ourselves "engineers", and not just "coders". Real world engineers take responsibility for anything related to their project, even if it's not their job.

Flying instructors often teach that "everyone is stupid for 15 minutes a day". That's why every airplane control is designed to prevent errors, even when other prevention measures have failed. Of course, we often don't need that kind of reliability in computer programming.

But in cases where we really do "engineering", and not just "programming", we want to prevent shooting ourselves in the foot, even when the user points the gun at their foot and pulls the trigger. That's why they put safety switches on guns.

File under:

"When I poke myself in the eye, it hurts."

I don't think in 40 years of programming in C and C++ I have ever seen an example of someone doing something with sizeof() quite that weird, and I've seen plenty of weird things.

Plus one for the happy ending that they won't be doing that again.

Is this a pitfall people actually fall victim to? It's an interesting corner case, to be sure, but has anyone been bitten by it in real work?

I doubt it. Expressions like this raise all sorts of red flags in code review.

C is OK: it is not your friend, it is a tool.

I don't understand why this would come up. When would you ever use ++ on the result of evaluating an expression?

Edit: Now I see the issue. Still, this should be easy to avoid. Don't use expressions with side effects as the input to a macro.
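The classic instance of that advice is a double-evaluating macro; a hypothetical BAD_MAX makes it concrete:

```c
/* Textbook double-evaluation pitfall: the macro argument appears twice
 * in the expansion, so any side effect in it can fire twice. */
#define BAD_MAX(a, b) ((a) > (b) ? (a) : (b))

int bad_max_demo(void)
{
    int i = 10;
    int m = BAD_MAX(i++, 5);  /* expands to ((i++) > (5) ? (i++) : (5)) */
    /* The comparison evaluates i++ once (i becomes 11), and since the
     * test is true the result evaluates i++ again: m == 11, i == 12. */
    return m * 100 + i;       /* encode both values for inspection */
}
```

The caller expected m == 10 and i == 11; they got neither, which is exactly why side-effecting arguments to macros (sizeof included) are worth flagging in review.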

> sizeof (char[i]) is equivalent to i * sizeof char, and will be evaluated at run time.

Correction: sizeof working_array is equivalent to...

my bad, "sizeof (char[i])" actually works!

Jamming `++` and `--` everywhere is bad. Don't do it. Hooray we can use C again.

> The length of the working array depends on the value of i.

Not in classic C90; only in the brain-damaged C99 dialect.

Prior to C99, sizeof expr is a constant expression, always.

You can easily program in C90. Plenty of currently maintained or even new projects are C90 only.

If using gcc, use gcc -ansi or equivalently -std=c90, and there you are.

There is a way to detect (albeit at run time) when a macro is given an expression which contains side effects. This:


compile the sfx.c into your code, along with the except.c and hash.c modules. Then you can use the SFX_CHECK macro in your macro bodies. For instance, the dict.h header file uses it like this:

  #define dict_isfull(D) (SFX_CHECK(D)->dict_nodecount == (D)->dict_maxcount)
  #define dict_isfull(D) ((D)->dict_nodecount == (D)->dict_maxcount)
If debugging is enabled, and some place in the application calls dict_isfull(d++), then there will be a diagnostic, thanks to code generated by SFX_CHECK(D), which parses the expression (or pulls the parse out of a cache) and then diagnoses based on the results. Unfortunately, this is a run-time check: the offending expression actually has to be executed.

(A bit of work could turn this into a static system. E.g the macros could spit out a special annotation, then the output of gcc -E could be collected and post-processed to look for the annotated material and determine if there are side effects.)

I wrote this in around 1999 and it hasn't been maintained since.

The parser has to work with incomplete information and hence deal with ambiguities. For instance (a)(b) could be a function call or a cast of (b) to the type (a). We don't know and so we have to conclude that this is a "maybe side effect" (function call). In the face of ambiguities, we parse the expression multiple ways. If it parses correctly in two ways, we conclude pessimistically: if one of the ways of parsing it suggests there is a side effect we go with that. If errors are encountered in some of the parses, backtracking takes place, and this is done with the help of exception handling, which is built on setjmp/longjmp.

It shouldn't be too hard for compilers to warn about this, I think.

C of course is not your friend; C is always the computer's friend.

C is not your friend, just as a knife is not.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact