Hacker News new | past | comments | ask | show | jobs | submit login
When static makes your C code 10 times faster (mazzo.li)
201 points by rostayob 25 days ago | hide | past | favorite | 108 comments

This can be accomplished more simply and reliably by marking modulus as const. The compiler currently has to reason about the whole compilation unit to determine that modulus is not modified, which works. However, if future code modifies modulus (either on purpose or accidentally) or something changes that prevents the compiler from performing global reasoning, the optimization will be lost. By marking the actual intention, any modifications of modulus turn into compiler errors. Plus, if it becomes important to expose modulus to another compilation unit, now that's possible.

This is a common issue with C code (including lots of code I've written). It's really easy to forget to const something, which forces the compiler to do global reasoning or to generate worse code. I've gotten into the habit of making things const unless I know I plan on mutating them, but I wish there was tooling that encouraged it. (BTW, this is something Rust does well by making things constant by default and requiring "mut" if it's mutable.)

I agree. Static alone was the incorrect choice. If the global were being modified elsewhere in the file then this optimization would also not be possible.

The core optimization is modulus % constant (and a power-of-2 as well). The static just enabled the optimizer to do better heavy lifting to get there. A const would've made the intent clear to the human reader and the compiler.

static const would've been best.

You'd think the compiler would let you know you have a constant that is not labelled as such, like the way `tslint` complains about this incessantly. (I think `splint` for c/c++ may also do this, but I've only briefly used it.)

The compiler only operates on one unit (file) at a time so it has literally no way of telling this in C. There are legitimate uses for having a non-const global which is never modified by local source: library config options, hooks for external programs, and what not. As you say, this would be a job for the linter.

You can get pretty far with a compiler warning like "warn if a global isn't preceded by an 'extern' declaration". Also, LTO does have enough information to warn about these things, especially with default hidden visibility.

The global must be declared without "extern" somewhere or else no memory is allocated for it.

LTO could handle this if you're compiling an executable, but not a library.

> LTO could handle this if you're compiling an executable, but not a library.

That's not the correct distinction, that's why I said default vs hidden visibility.

Libraries typically export more symbols but executables can also export them eg for plugins.

If there were any expressions that took the address of the variable, then even both `static const` qualifiers wouldn't work for a sufficiently paranoid compiler.

Are you sure? Isn't it UB to modify a const object? It will probably end up in a non-writable memory page.

EDIT: gcc seems to agree with me: you can see the optimized version here[1] and the unoptimzed version if you remove "const".

[1] https://godbolt.org/z/KWrW45rK8

This is why language standards specify what the compiler can assume and call out some behavior as undefined, exactly so compilers don't have to be paranoid and produce code that sucks. If an underlying object is const, the compiler is allowed to assume that it does not change (it is valid to cast away const on a pointer or reference, but not if the object itself was declared const).

> (it is valid to cast away const on a pointer or reference, but not if the object itself was declared const).

Isn't it valid to cast to non-const for a const, but only invalid to modify the const through the casted pointer?

Interestingly GCC will still optimize the loop function even if you have code that modifies the modulus. https://godbolt.org/z/EE9PnrY7s

That code is invalid, it would give a compiler warning and possibly a runtime exception.

Const is to state that this module is not allowed to modify. Think on what'const volatile int foo' means.

Mostly seen in embedded space.

i think this is most compilers.

Author here -- I agree that const is better. The perf difference I encountered was due to the static keyword though, which is why the blog post talks about that specific issue.

The perf difference was due to the compiler being able to infer 'const' by way of 'static'.

Advocating 'static' when your actual intent is 'const' does less experienced readers a disservice; they will assume that 'static' is meant to make things faster, and be disappointed when it doesn't work for non-constant values.

I am not advocating static. I'm advocating for looking at what the compiler outputs when surprising behavior is encountered. The example is extracted from a larger piece of code, and I reported the minimal case as-is.

If anything, the title is meant to be read as "isn't it amusing that something apparently unrelated such as `static` causes a performance improvement".

That said, I have added a note clarifying this at top of the post now.

I thought you made your point neatly and succinctly. Code can be optimized on multiple levels; the compiled result is clearly more efficient. This lesson applies regardless of the language being compiled.

I started to use const whenever possible after being familiar with some compiler optimizations and the Haskell pl. The point is knowing how to give the compiler an easier job.

Its better to make it constexpr than const or static.

> It's really easy to forget to const something, which forces the compiler to do global reasoning or to generate worse code.

Global mutable state is pure evil. Don’t write globals.

It shouldn’t be easy to forget const on a global because a mutable global should produce immediate revulsion and nausea.

(I don’t really consider a const global to be “a global”. So ordinarily I’d just say globals are evil don’t write globals. But I’m trying to be explicit here.)

This is one of those sayings that I don't think is helpful. What it should be is: limit variable scope to the smallest thing it can be.

Nobody is being fooled about global state when you have a singleton database connection, event bus router, or network stack. I don't think your program is better when you pass in i/o functionality to every single class context in the constructor.

Similarly a mega-class that encapsulates everything your program does is also a code smell. There's no point to a private variable when everything can access it.

I respectfully disagree on both points.

> I don't think your program is better when you pass in i/o functionality to every single class context in the constructor.

Abstracting over I/O transport is an excellent thing to do. This allows you to do things like easily record and replay a network stream. Which is useful for both debugging and automated tests.

I/O comes in a kazillion flavors. Networked, interprocess, serial port, file, synthetic, etc etc. It's definitely something that should be abstracted around and not doing so is something I've deeply regretted in the past.

> a mega-class that encapsulates everything your program does is also a code smell

Ok I agree it can have a foul odor. But even this can be advantageous.

Once upon a time Blizzard gave a GDC presentation about Overwatch. Kill-cam replays are notoriously difficult in video games.

Blizzard's solution to this was delightfully elegant. They made two copies of their world. One perpetually runs on latest. One takes snapshots of the world every N frames. When a player dies their viewport switches to the old snapshot which then simulates and renders for ~6-10 seconds. When the replay finishes or skips the viewport switches back to the main game, which never stopped receiving updates. This was a relatively trivial implementation given the complete lack of globals and singletons.

A mega-class lets you run parallel instances of your "world". It's also a nice pattern when you want to build-up and tear-down your world in-process and guarantee no stale state. For example when running tests you likely want certain tests to "start clean". It's nice to be able to do this without restarting the entire process.

I'll double-down that globals are evil. They are a sometimes necessary evil. Or the least bad choice. But my experience is that not using globals is almost always simpler, more elegant, more flexible, and ultimately preferable.

There is one common situation where not using mutable globals is a recipe for complexity and a serious code smell. That case is when your internal structures directly control the physical resources of the machine. There is no amount of window dressing that can make these anything but global structures because that is what they literally are. That gets hidden a bit if you delegate resource management to the OS but you can’t do that if you care about performance.

And if you are creating and scheduling all of your concurrency in user space, which is common for some types of server software, then passing what are effectively global objects down the call stack becomes a real mess and introduces a number of suboptimal behaviors in the code gen.

I’ve seen people try to design database engines, the high-performance kind that directly manage all the resources they use, that don’t use globals in a misguided attempt to adhere to this heuristic. The end result was a convoluted mess of indirection that just obscured the reality that all of those objects were mutable globals. If you care about performance then I/O isn’t very abstract; the code knows exactly what kind of device it is dealing with.

I don’t like mutable globals as a general rule, but for some types of software they are unambiguously the correct engineering choice and not using them would be a design defect.


There are definitely resources which are globally unique. But that does not necessarily follow that access should also be global.

Rust has some elegant patterns when working with embedded devices. For example GPIO pins could totally be stateful globals. But instead their passed around as types and Rust’s type system + borrow checker ensure correctness. It’s pretty neat.

I’ll assume you’re right for databases. My expertise is real-time VR video game type stuff. Which is also high-performance, but of a different variety.

I strongly agree that layers of abstraction compound into convoluted and inscrutable garbage. I loathe web development for this very reason.

Rust struggles with software where most of the address space is used for DMA, like many database engines, because it requires ownership and mutability to be observable at compile-time. Also, DMA often does not respect object boundaries as a compiler sees it because DMA does not understand objects.

In modern C++ it is straightforward to write wrappers in the style of unique_ptr that safely hide the DMA and life cycle mechanics, which means the average dev using them doesn’t need to know how it works, but someone has to write that code and it is necessarily global heavy because it references physical devices that have their own behavior in your address space. Under the hood, if a physical device is stomping on address space your code accesses, you need a way to both detect that an object is effectively owned by a particular DMA engine before touching it and immediately de-schedule the thread of execution until such a time as there is no concurrent DMA operation that might conflict with the code execution. This happens within a single thread, so no blocking or OS context switching.

A big part of database kernel internals is coordination and management of physical resources, which are global by nature.

I agree with everything you said. Global variables also make it much harder to use multithreading. The mega-class is only bad if many parts of the code require references to it. In a well-structured program most subsystems will only need references to a few of the mega-class's (transitive) members.

Back in 2012 I observed exactly the same thing. I actually managed to get a warning for this added to Clang, called -Wmissing-variable-declarations. When set, warnings are generated in case a non-static global variable is defined without an external declaration that precedes it.

I worked on this as part of FreeBSD, which is why their base system is nowadays built with that flag enabled.

Thanks, that's good to know!

Without `static`, compiler exports a symbol.

    $ cat value.c 
    int value = 42;

    int get_value() {
        return value;

    $ make value.o
    gcc    -c -o value.o value.c

    $ nm value.o
    0000000000000000 T get_value
                     U _GLOBAL_OFFSET_TABLE_
    0000000000000000 D value
This symbol, not being `const` can be modified by any other compilation unit.

    $ cat main.c
    #include <stdio.h>

    int value;
    int get_value();

    int main() {
        value = 123456789;
        printf("%d\n", get_value());

    $ make main.o
    gcc    -c -o main.o main.c

    $ cc value.o main.o

    $ ./a.out
Compiler when generating an object file has to assume the value of exported non-const symbol can change. It's necessary to tell the compiler that the value cannot change, either by not exporting the symbol by using `static` or making the value of it `const`. In example provided in your article `static` makes sense (or even `static const`) as I don't think there is a reason to export this global.

Yes, see last few paragraphs of the post, in which I also speculate why GCC doesn't infer that there is only one compilation unit.

How could it ever infer it, given C's compilation model?

The only option is when all TU are given at the same time to the compiler, or when LTO is used (in which case it is actually the linker doing the work).

Even then, this won't apply to libraries.

> The only option is when all TU are given at the same time to the compiler

Exactly -- which is the case here. But implementing such a cross cutting implementation would probably be annoying, which is what I wanted to convey with

> I think they could concievably assume that the value of modulus won’t be changed in this case, since we’re producing an executable directly, but it’s probably annoying to have an optimization looking so far into the future of the compiler pipeline.

I quite like C syntax, and I sometimes long for a language that would change a few elements (default to const and static, different/safer standard api for strings). Some might call it Go but a simple slightly updated C would be nice.

Have you seen zig? It's a modern language that tries to keep the simplicity and explicitness of C (it does change the syntax a bit though).

I really wish Zig had followed Rust’s lead and made variables const by default :/

Variables are immutable by default in Rust, but they're not const by default. The difference is that a variable can become mutable while a const is always immutable.

How do you set a variable as mutable after the fact in rust?

You can do:

    let foo = "1234".to_string();
    let mut foo = foo;
    foo = "5678";
This only works if you have ownership of the variable though. And in practice it's a pretty good compromise, because it's pretty hard to do this by mistake.

Note: const in rust is different to const in some other languages: it represents a value that is duplicated inline into each use site at compile time.

Yes I think i see it more like constexpr.

That is pretty much D (assuming you only want to use the C-like bits) - it's not const by default but const is much easier to use and much more violent

And arrays are bounds checked by default, no more issues with them (and strings are arrays)

The appropriate way to mark something as known compiletime is to make it constexpr, nothing else.

I agree with many of the sibling comments, static vs. non-static is almost a complete red-herring.

Static only means that the variable is local to the translation unit (the C file). The relevant difference in the example is actually the const-ness of the variable, which you may put explicitly, but which a powerful compiler can also infer here in the static case.

Other than this optimization possibility, const and static are orthogonal concepts. I'm not sure to what extent the article author is aware of this.

So the lesson should be: use const (or #define) if you mean to have a constant. It's still a good idea to also make things static, but the real reason for that is to avoid name collisions with variables in other C files.

this is totally wrong in my experience, although some of the understanding is right. by using static to limit to the compilation unit the compiler can workout the value is constant.

const doesn't do this thanks to const_cast etc. maybe things have changed, but const on a file level variable doesn't do this reliably, or at least hasn't for considerable lengths of time.

of course 'static const' is the better answer. :)

I hope your comment wasn't intended to be as hostile as it reads to me, but I do wonder what your experience was exactly, because there might be a misinterpretation.

From a technical standpoint, using const_cast to modify a constant is Undefined Behavior. This was explicitly specified that way to make constants inlineable by the compiler. Every compiler that I've ever used does this, it's a trivial but very effective optimization.

Proof by Godbolt: https://godbolt.org/z/djEdvee4s

Observe how GCC completely eliminates the contents of undefined_mutate_modulus (which it's allowed to -- UB means it can do anything with that function, and it chooses the simplest possible thing) rather than de-optimizing mod4_const like you suggested. Compilers are smart.

I think a simple “const” would have also done the trick. Sometimes -O3 is clever enough to figure out that the value is never written, so it makes it an asm constant.

It is mostly about the fact that if the variable is not static, then it's non-local to the translation unit and can be modified from everywhere else. So its value needs to be loaded and a plain and slow division is applied.

Having it local or const makes the compiler able to inline it and do a simple bitwise and with a constant.

So yes, make your variables static const by default (if you really need global).

const_cast and linkage make const weaker than static on a variable that doesn't change.

Author here -- just to be clear, I agree that marking the variable as const is the "right" thing to do here. I reported the investigation as-is because removing a static declaration made the code slower, which I then narrowed down to the isolated bit of code.

We have learned so much over the decades of using C. Lowest-visibility-by-default is just one of the many good choices of Rust and other C successors. The example here is for codegen benefits but reducing visibility benefits encapsulation too.

As an embedded programmer working on small micros I make every single function static (including third party code, which I modify and bring into the source tree and curate myself), gives you global/link time optimizations and dead code elimination for free, leads to better code even at -O0, -Og and -O1, static const configuration structs/values gets optimized away, and so on.

Really wish it was the default.

It also gives you a nice private namespace for the translation unit so you can confidently modify any of the static objects without concern for impacting external code.

I once had a gray beard chew me out over changing a function signature when revising things. I had to point out politely that it was static and that anyone who managed to link to the function had to be breaking a lot of rules to do so.

Const / static allows for "immediate" ASM instruction generation: which means the value is known at compile time so it can compare it directly inline as opposed to the overhead of comparing it to a labeled memory address. It's generally good practice whenever possible.

Isn't that constexpr?

Isn’t constexpr c++-only?

No, its C11 too


static doesn't imply the value can be known, only in some cases where its not modified in that compilation unit (which is trivial to detect with SSA)

const is something else.

Wouldn’t making that const or using #define be a bit cleaner?

Honestly if I as the programmer knew that I was really trying to select bits from a number I’d just use a binary and directly. In that specific situation I think the intent is more clear that way. Like:

//select the bottom 8 bits

unsigned bottom8 = val & 0xff;

My first C intuition would be defining this value as a macro. If I was in C++, then a const would make sense.

> If I was in C++, then a constexpr would make sense.

If it was in C++, what you really want is consteval or constinit, constexpr only for pre-C++20.

Title rephrase:

When the compiler can assume your values don't change magically, it can optimize their use.

This is true for restricted pointers, for global-scope variables which can only be accessed in the same translation unit, for stuff in inlined functions (often), etc.


const is a bit shifty. const makes the compiler restrict what it allows you to write, but it can still not really assume other functions don't break constness via casting:

  void i_can_change_x_yeah_i_can_just_watch_me(const int* x)
     *(int*) x = x + 1;
now, if the compiler sees the code, then fine (maybe), but when all you see is:

  void sly(const int* x);
You can't assume the value pointed to by x can change. See this on GodBolt: https://godbolt.org/z/fGEMj9Meo

and it could well be the same for constants too. But somehow it isn't:


Specifically it's well-defined behavior to mutate an object after const_cast-ing away constness if the object wasn't const to begin with (const references or const pointers can refer to non-const objects).

In your first example you have `int x = 1;` which isn't const, so the compiler has to assume that `f` may mutate it after const casting.

In your second example you have `const int x = 1;` which is const so the compiler can assume the value will never change.

Except that on embedded const objects might land on read only memory and cast-ing away constness will give hours of debugging pleasure.

They covered that with if the object wasn't const to begin with.

Partially "(const references or const pointers can refer to non-const objects)".

Here are the references to linker scripts?

I'm afraid I'm not following.

I believe this account of things is accurate (ignoring C++'s reference types for simplicity):

The C and C++ standards are written in such a way as to enable compilers to place constant data on read-only memory. Casting away constness is never illegal in and of itself, but if you do so and then assign to a variable which was declared as const, that is undefined behaviour.

Similarly, assigning into the character array of a string literal is undefined behaviour, whether or not const was used. Again that's to enable the compiler to make use of read-only memory.

Going in the other direction is safe, as preventing assignments doesn't introduce problems. That is to say, using a pointer-to-const type to point to a non-const variable poses no problem.




You must be right. But - it's so easy to forget that! Or - never to be told that in the first place.

static can also make your code 10 times smaller: note that in the linked godbolt, there are actually two copies of both loop functions: one regular, and one inlined. this is because the compiler wants to inline the function, but is required to generate an additional one in case someone else will be calling it. what's more, at least on Linux, this copy cannot be removed from the final executable even if nobody calls it, unless a) the compilation is done with -ffunction-sections and the linking is done with --gc-sections, or b) LTO is enabled. adding static to the function declaration resolves this issue.

the situation is even worse with ELF dynamic libraries due to the interaction of two rules: a) by default, all functions are exported, and b) by default, all functions can be interposed, e.g. by LD_PRELOAD. here, if you specify -fPIC in the compilation arguments (as is required to produce a modern dynamic library), inlining is totally disabled. for small functions, the call overhead can be substantial.

Great write up: a precise problem that digs into the internals pointedly to teach a simple concept. This is exactly how I tell the junior devs where I work to do lunch talks that give people some concrete, memorable piece of learning that makes them better in their practical work.

Link time optimization would enable a similar change even without a code edit. Using static is good, but it’s a good idea to figure out how to just let other people’s code run fast too.

Only if the value isn't visible outside the final image. This will still happen if it has "default"/dllexport visibility or its address is taken.

In ideal world const, constexpr, explicit (for constructors), and no default implicit conversions (and others that I'v missed) should've been the default...

I think this is beyond simply making the variable static/constant. It is the specific value of the constant that is allowing the division to be substituted with bitwise AND, which then makes it so much faster. I wonder how much the speedup would be if some other near-random value is there for the constant (which is likely beyond the purpose at hand).

Speed up would be less but not zero, almost any div or modulo can be replaced by a multiply by a magic constant plus a bit shift.

If I could travel back in time I'd tell Dennis to make "static" the implicit default, and have a special keyword like "public" or "export" for items that are meant to be accessible from outside the compilation unit.

I'd also ask him to make "switch" break by default.

Then I'd go kill Hitler or something.

But then you would need 'unbreak' or just 'goto' to the next case. No duffs device no cigar.

Probably a module/namespace system would be the biggest improvement.

FWIW: Go's switch statements are break by default, with an explicit 'fallthrough' keyword if one wishes to override that behaviour.

clang and gcc have a -Wimplicit-fallthrough warning flag that will warn about switch cases that fall through without either C++17's [[fallthrough]] attribute or a /* fallthrough */ comment.

I made Firefox's code base able to compile with -Wimplicit-fallthrough. About one hundred fall through cases needed to be annotated and about 2-3 were actual bugs (though minor).

The only real case I use switch fallthrough for is when you have two cases with the same code, e.g.:

  switch (foo) {
  case 1:
  case 2:
    /* common body */
  case 3:
    /* body 3 */
It's not cognitively hard to make an empty case body fallthrough to the next run, but include an implicit break at the end of every nontrivial body.

Supporting Duff's Device is not a compelling feature to support--irreducible loops are basically going to destroy any hope of optimization you might accrue.

> The only real case

which is probably the case in 90% of all my switch'es. Worst use of a time machine ever.

> But then you would need 'unbreak' or just 'goto' to the next case. No duffs device no cigar.

From Algol's linage point of view, a very good outcome.

Yes, but you would need way fewer of such phrases than you need now. Making the exceptional case take more code is the way to go, IMO.

Many languages also support something like

  case 1,2,4:
or even

  case 1-10, 20-22, 34:
That further decreases the need for falling through.

Don't forget to tell him to remove (most) implicit conversion and to use a greppable keyword cast(foo) like C++ did instead of "ungreppable" syntax (foo).

That said, I'm more annoyed at C's design commitee to not being able to add new features such as having length-aware arrays in C in the language.

No, even if you could kill Hitler, then WW2 doesn't happen, computers aren't invented yet, so time travel isn't invented yet so you can't head back to kill him. Read your briefing notes.

Also https://tvtropes.org/pmwiki/pmwiki.php/Main/HitlersTimeTrave...

"Locating a lone, disillusioned war veteran wandering around post-WWI Europe is perhaps the ultimate needle-in-a-haystack search"

A time traveler shouldn't have a limit on time to search, so this doesn't make sense as an objection to me.

Apparently everyone kills Hitler on their first trip [1].

[1] https://www.tor.com/2011/08/31/wikihistory/

Because, admittedly, non-default switch breaks are on par with Hitler

More programmers really should have a look at what sort of assembly the compiler generates for their code. Compilers aren't magic, and seeing what sort of code it generates does give authors more insight into how concise their code truly is.

I think every programmer should know, that logical AND is many times faster than Modulus (which is at least as hard as a division), and use & instead of % right in his code for powers of two (and not expect it to be done by a compiler).

This is like the easiest thing for a compiler to detect and optimize.

True, but something to be aware of: If the compiler uses bitwise operations to implement %, it emits special handling of negative values (it's really more of a remainder operator than a modulus). You can avoid that either by using unsigned numbers, or & as suggested:


That's bitwise AND.

Use a const instead. It's the right tool for the job.

>"When modulus is static, gcc / clang know that it is private to the current compilation unit, and therefore they can inline the value itself. Then, they turn the expensive div into a much cheaper and – since

mod’ing by a power of two -- is equal to bitwise and of that number minus one!

All you need to do is keep the bits lower than that power of two, which is what the and will do."

the biggest win here is informing the compiler sufficiently to swap out div for and.

the use of static is just a tool to inform the compiler that the value is a constant (which const /might/)

really? people mentioned const?

you can tell how much low-level optimisation they have done if they think its gonna change codegen reliably, or at ll.

See my other comment here regarding const.

Its better to make it constexpr.

use const is a better choice, I changed static to const, the result is the same here.

So the compiler cannot always optimize...

Division is slow, which is something most programmers don't know. If you can binary AND instead of MOD this can be a huge win.

Multiplication is also very fast, usually one or two cycles on larger chips.

>Division is slow

I wondered if that's really still true, since I haven't done much assembly language programming since PowerPC was new.

Here, it says the M1 has 7-9 cycles latency for division instructions, but throughput of 2 cycles per.


"The M1 is 10x faster than the Xeon at 64 bit divides. It’s…just wow."

So, given all of the other things that can slow you up, I wonder if it really makes sense to avoid division any more?

(I guess the energy efficient "Icestorm" cores have throughput equal to latency, so it's only the "Firestorm" ones where it's super fast)

You shouldn't assume you're running on the performance cores. Not everyone is writing an app, and even if you are, most of your code will be better off on the efficiency cores.

If you're not concerned enough about performance to run on the performance cores, then why worry about...performance?

They aren't always available because you can't always get what you ask for.

If your compiler doesn't do this for divisors known at compile time, get your money back.

Applications are open for YC Winter 2022

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact