Specifically they say GCC requires this form for the busy loop to be emitted:
for (int i = 0; i < 1000000; i++)
asm volatile ("" ::: "memory");
Whereas 9c will emit a bunch of otherwise-useless code when you tell it this:
for (int i = 0; i < 1000000; i++);
And this is... a good thing?
>Plan 9 C implements C by attempting to follow the programmer’s instructions, which is surprisingly useful in systems programming.
It's like coding with -fno-strict-aliasing or -fwrapv in GCC: perfectly fine and justifiable, but that doesn't mean it makes sense as a compiler default IMO, because you're basically lulling your devs into writing in a specific dialect of C instead of the "real" language. It means your code is effectively no longer portable, which is probably less of an issue for low-level kernel code but could still easily cause problems as code is shared between projects. Again, there are situations where it makes sense to do so, but I strongly believe it should be an explicit choice by the programmer, not a compiler default.
Now I would argue that the for-loop example is even worse than the aliasing- or wrapping-related issues, because I very rarely write busy timing loops, but I very often write for loops that I expect the compiler to optimize correctly (drop useless code, unroll, etc.). So yeah, that really seems like a way to spin a limitation of the compiler into a "feature", and it makes very little sense.
Also, I just checked, and gcc 8.2 does output the loop code when building with -O0. I guess they could alias that to --plan9-mode.
I feel like the "Plan 9 C" author would argue that optimizations like that should be explicitly enabled using inline pragmas, where something that has an optimization pragma is requiring the compiler to optimize it (so if it can't be optimized, the compiler should generate an error) and anything without the pragma requires the compiler to not optimize it. (And then you can have an "optimize if you can" pragma, too, but its usage would be comparatively rare to either explicitly requiring or disallowing optimization.)
Whereas, with regular C compilers—unlike compilers for most other systems languages—optimizations get turned on by a compiler switch entirely outside of the code, and then what gets optimized and what doesn't is invisible, and there are both no guarantees that anything will be optimized, and no guarantees that anything won't be optimized (unless you "trick" the compiler by using things like the asm volatile() above.)
I'm not sure if I personally agree with the PoV I just stated, but I think that's what they're thinking.
Recognizing and preserving special syntax patterns requires additional work and can add substantial complexity. This is a common dilemma in software engineering, especially in high-quality software that applies sophisticated algorithms. The smarter a compiler is about applying state-of-the-art algorithms, the more these rigorous (but sometimes annoying) optimizations happen naturally. On the other hand, anything that breaks abstraction boundaries adds complexity, which can make comprehension and maintenance quite burdensome.
If you've ever written code to build and transform an AST, it should be obvious how difficult it can be to add ad hoc logic that leads to inconsistent treatment of nodes. Even adding pragma opt-outs can add substantial complexity. The Plan 9 compiler recognizes this, which is why it does essentially no optimizations. In that sense it behaves much like GCC in preferring consistent semantics over ad hoc exceptions; both recognize that trying to "have your cake and eat it too" is too costly.
Fortunately, C makes it relatively easy to compile different source units independently. So all you really need is a single mode that disables all optimizations, plus putting your special code in its own source file. But the trend is to remove this separate linking step (Go and Rust both do static linking across the whole application), and even C compilers are defaulting to so-called LTO, which effectively recompiles the application at link time and deliberately discards the old guarantees about cross-unit transformations and optimizations. That's something of a shame.
GCC does permit all manner of function-level attributes, but they add substantial complexity, which is why clang and most other compilers don't support that flexibility to the same degree, and why GCC is often reluctant to support yet another option.
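For instance, GCC's (non-portable, GCC-specific) optimize attribute lets you pin the optimization level per function, which is roughly the kind of in-code control being discussed; a sketch:

```c
/* GCC extension: compile this one function at -O0 regardless of the
   command-line flags, so the delay loop is actually emitted. The
   volatile induction variable alone would also keep the loop alive. */
__attribute__((optimize("O0")))
void delay_loop(void) {
    for (volatile int i = 0; i < 1000000; i++)
        ;
}

/* And this one is always optimized, even in a -O0 build. */
__attribute__((optimize("O2")))
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Note that GCC's own documentation warns the optimize attribute is intended mainly for debugging, which rather underlines the point that this style of control is not a first-class feature.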
Which, I might add, is a very silly thing to say. A programmer's intent and their written code are two very different things. How one maps to the other is defined only by the C standard, which says nothing about emitting specific assembly instructions, but only about the ultimate effect of code on memory.
The Plan 9 compiler deciding to pessimize your code because it assumes you actually meant for the code to be interpreted as portable assembly rather than a high-level description of a computation is kind of presumptuous. At that point it's just a different language with different (albeit compatible) semantics.
Interestingly, with the exception of long long, these are the features that effectively forked C and C++.
For example, if you're writing a spin-lock, the compiler may lift a read of the lock value out of a loop because, assuming a single thread, the value will never change. This can result in a non-terminating spin-lock. For more see Linux's ACCESS_ONCE.
The example you gave is unfortunate but the consequences of optimizing loops carelessly can be serious.
After all, not just the compiler, but also the processor can reorder operations. So you have to annotate synchronizing memory operations regardless of whether the compiler is optimizing. e.g., a lock-free algorithm implemented using only volatile (what ACCESS_ONCE does), even with -O0, is almost certainly wrong.
The alternative to explicit annotation is for the compiler to generate full memory barriers around every memory access. That would indeed preserve semantics in a multithreaded context, at a ridiculous performance cost.
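This is what C11's stdatomic annotations buy you: the programmer marks exactly the synchronizing accesses, and the compiler emits barriers only there. A minimal release/acquire publish sketch:

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool ready = false;
int payload;

/* Writer: store the data, then set the flag with release ordering,
   so the payload write cannot be reordered after the flag write
   (by either the compiler or the CPU). */
void publish(int v) {
    payload = v;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: spin on the flag with acquire ordering, then read the
   payload. A plain volatile flag would stop the compiler from
   caching the load but would not constrain CPU reordering. */
int consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}
```

Every other memory access in the program remains free to be optimized, which is the whole bargain.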
The example I gave is simple and relates to the example of the parent but there are more complex cases for which it is a matter of ongoing research to define a semantics that also admits compiler optimizations.
For example, the "well-defined" semantics of C11/C++11 atomics admits executions where values can materialize out of thin air.
The broader point I was hoping to make is that optimizations are great, but they are not free in a multi-threaded context with data races (even benign ones). As a consequence, the choice to simply drop many of them is one supported by many people in the weak-memory community, and it even appears in newer memory models, e.g. forbidding read-write reordering to rule out causal cycles:
http://gee.cs.oswego.edu/dl/html/j9mm.html (ruling out po ∪ rf cycles)
Because perhaps it contains a body that optimizes away based on conditions out of control of the programmer? This happens all the time with macros/templates, and with platform-agnostic code. Only the compiler can resolve what's in the body; I want to trust the compiler to remove the loop if it is useless.
#define REG_VCOUNT *(volatile u16*)0x04000006
while(REG_VCOUNT < 160);
for (int a = 0; a < 10000; a++);
C compiler optimizations seem like micro-optimizations when people should be looking at the bloat elsewhere. Missing the forest for the trees.
C is basically a low level language. A portable assembly language. A predictable compiler shouldn’t second guess the programmer’s intent. To put things in perspective, if all the man-years spent on gcc were spent on GNU Hurd... :-)
EDIT: On x86. If you don't cross-compile your Raspberry Pi kernel, you're in for a bad time.
Here is an AI chip:
It's an interesting proposition b/c they're using RISC-V for the core, but the APUs are custom, so they can create some lock-in there for themselves (without lock-in it'll just be a race to the bottom with razor-thin margins).
And here is RISC-V-on-an-FPGA in a nice package. It's very much oriented toward Chinese hobbyists.
Both those projects are by Zepan. That guy is a machine.
But I'm not quite sure what's holding up general-purpose CPUs (even just something crappy but good enough).
The way I understand it, CPUs aren't just beefy microcontrollers; they require some extra on-chip hardware, but no one has done that yet for some reason. Maybe someone knows better :)
For example, graphics, Bluetooth, Wi-Fi, and modems are all heavily encumbered with patents, and they're very complex subsystems. Even components whose patents have expired, or that were never patented, such as an MMU, are non-trivial to create and take time. I suspect it'll be a while before FOSS implementations appear.
There's general-purpose RISC-V CPU RTL lying around, and it's not too difficult to license the necessary peripherals, but it costs money to put together a board and fabricate at volume if you want to hit a Raspberry Pi/hobbyist price point. Unfortunately, it takes time and you need a market to justify the effort. But eventually it'll happen.
- Kendryte KD233
- HiFive1 (https://www.sifive.com/boards)
- GAPUINO GAP8 (https://greenwaves-technologies.com/product/gapduino/)
- HiFive Unleashed (https://www.sifive.com/boards/hifive-unleashed)
Those are the only ones that exist commercially as far as I know.