The arguments in this blogpost are fundamentally flawed. The fact that they opened a bug based on them and got shut down should have raised red flags.
When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
Trying to scope that down using arguments of the form "but what the hardware does is X" are fundamentally flawed, because anything means anything, and what the hardware does doesn't change that, and therefore it doesn't matter.
This blogpost "What The Hardware Does is not What Your Program Does" explains this in more detail and with more examples.
The blog post is also kind of unhinged because in the incredibly rare cases where you would want to write code like this you can literally just use the asm keyword.
I think it's also worth considering WHY compilers (and the C standard) make these kinds of assumptions. For starters, not all hardware platforms allow unaligned accesses at all. Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses. God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read.[^1] The fact that you need to go through escape hatches to get the compiler to generate code to do unaligned loads and stores is a good thing, because it helps prevent people from writing code with mysterious slowdowns.
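For reference, the usual escape hatch is plain memcpy, which GCC and Clang lower to a single unaligned load where the hardware allows it, and to byte loads elsewhere. A minimal sketch (the helper name is mine):

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit value from a possibly unaligned address without UB.
     * The memcpy is compiled down to one unaligned load on x86, and to
     * byte loads on targets that lack unaligned access. */
    static uint32_t load_u32_unaligned(const void *p) {
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }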
Writing a function that takes two pointers of the same type already has to pessimize loads and stores on the assumption that the pointers could alias. That is to say, if your function takes int *p, int *q, then doing a store through p requires reloading *q, because p and q could point to the same thing. Thankfully, in some situations the compiler can figure out that in a certain context p and q have different addresses and therefore can't alias; this helps the compiler generate faster code (by avoiding redundant loads). If p and q were allowed to alias even when they have different addresses, this would all go out the window and you'd basically need to assume that all pointer types could alias in any situation. This would be TERRIBLE for performance.
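A minimal sketch of the reload in question (function names are mine):

    /* p and q may alias, so after the store to *q the compiler must
     * re-read *p: if p == q, this function legitimately returns 2. */
    int may_alias(int *p, int *q) {
        *p = 1;
        *q = 2;
        return *p;
    }

    /* With C99 restrict the compiler may assume no aliasing and
     * fold the return value to the constant 1. */
    int no_alias(int *restrict p, int *restrict q) {
        *p = 1;
        *q = 2;
        return *p;
    }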
> For starters, not all hardware platforms allow unaligned accesses at all.
Yeah, and it was always, everywhere, a mistake. It was a mistake back in the 1970s and it's an increasingly big mistake as time goes on. Just like big endian and 'network order'.
While the sentiment is correct as to why compilers make alignment assumptions, I think a lot of the details here are not quite right.
> For starters, not all hardware platforms allow unaligned accesses at all
If you're dealing with very simple CPUs like the ARM M0, sure. But even the M3/M4 allows unaligned access.
> Even on x86 where it's supported, you want to avoid doing unaligned reads at all costs because they're up to 2x slower than aligned accesses
I believe that information hasn't been true for a long time (since 1995). Unless you're talking about unaligned accesses that also cross a cache line boundary being slower [1]. But I imagine that aligned accesses crossing a cache line boundary are also similarly slower because the slowness is the cache line boundary.
> God forbid you try to use unaligned atomics, because while technically supported by x86 they're 200x slower than using the LOCK prefix with an aligned read
What you're referring to is atomic unaligned access that's also across cache line boundaries. I don't know what it is within a cache line, but I imagine it's not as bad as you make it out to be. Unaligned atomics across cache line boundaries also don't work on ARM and have much spottier support than unaligned access in general.
TLDR: People cargo cult advice about unaligned access but it's more because it's a simpler rule of thumb and there's typically very little benefit to pack things as tightly as possible which is where unaligned accesses generally come up.
Many architectures sold today still claim unaligned accesses are optional (e.g. all ARM pre-v7, which includes the popular Raspberry Pi Zero). Not to mention that even if they are supported, not all instructions support it (which is the case today on all ARM cores and even on x86).
From the architectures and instructions which may support it, it may have a performance penalty which may range from "somewhat slower" (e.g. Intel still recommends stack alignment, because otherwise many internal store optimizations start giving up) to "ridiculously slower" (e.g. I once had to write a trap handler that software-emulated unaligned accesses on ARM -- on all 32-bit ARMs Linux still does this for all instructions except plain undecorated LDR/STR when the special unaligned ABI is enabled).
And finally, even if the architecture supports it with decent enough performance, it may do it with relaxed atomicity. E.g. even as of today aarch64 makes zero guarantees regarding atomicity of even atomic instructions on unaligned addresses (yes, really). To put it simply, it is a _pain in the ass_ to implement correctly (say the programmer does atomic loads/stores on overlapping addresses with different alignments). This is whether they cross cache lines or not.
i.e. it's as bad as the GP is saying. You can't just point to one example of one processor handling each case correctly to dismiss this claim, because the point is that most processors don't bother, and those that do bother still have severe, crippling limitations that make it unfeasible to rely on in a general-purpose compiler.
And there is still a lot of benefit to packing things up... but it does require way too much care and programmer effort.
> If you're dealing with very simple CPUs like the ARM M0, sure. But even the M3/M4 allows unaligned access.
On ARM M3/M4 you have the same issue with LDRD and STRD instructions which do not allow unaligned access. Even the normal load/stores don't allow unaligned access in all cases. Try this in the peripheral memory region for starters. And things get even more complicated when the memory protection unit shakes up things.
Yeah, even Microsoft's compiler aligns values on appropriate boundaries for performance reasons: DWORDs on DWORD boundaries, etc. And if you want to pack a data structure to avoid the gaps, there are ways to do so via #pragma options (sketched below). I think their complaining about something done for performance reasons shows a great lack of overall understanding. More time researching and less time griping would have served them better.
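For reference, a sketch of the pragma in question (supported by MSVC and, for compatibility, by GCC and Clang; the struct is a made-up example):

    #include <stdint.h>

    #pragma pack(push, 1)          /* no padding inside the struct */
    struct wire_header {
        uint8_t  type;             /* offset 0 */
        uint32_t length;           /* offset 1 -- deliberately unaligned */
    };
    #pragma pack(pop)

    /* sizeof(struct wire_header) == 5 instead of the padded 8 */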
> The present blog post brings bad, and as far as I know, previously undocumented news. Even if you really are targeting an instruction set without any memory access instruction that requires alignment, GCC still applies some sophisticated optimizations that assume aligned pointers.
I could have told you this was true ~20 years ago, and the main reason I'm so conservative in how far back gcc has been doing this is that it's only around that time I started programming--I strongly suspect this dates back to the 90's.
It dates to the first standardization of C in 1989. The "C as portable assembly" view ended when ANSI C got standardized, and K&R's 2nd edition was published.
I would argue it's the modern understanding of the C standard that is flawed.
Back in '89, many of those unspecified behaviors were understood as implementation/hardware dependent, not undefined. Aliasing was the norm, and `restrict` was actually a keyword.
Ascertaining the state of the mind of the C committee in 1989 is difficult, since only the documents from ~late 1996 are consistently available online (the earlier documents are probably sitting somewhere in a warehouse in Geneva, but they may as well not exist anymore).
But definitely by the time C99 came out, it is clear that optimize-assuming-UB-doesn't-happen was an endorsed viewpoint of the committee [1]. C99 also added restrict to the language (not C89 as you suggest), and restrict was the first standardized feature that was a pure UB-optimization hint [2].
It is important to remember that there isn't just one catch-all category of implementation-varying behavior. There is a difference between unspecified behavior, implementation-defined behavior, and undefined behavior. Undefined behavior has been understood, from its inception, as behavior that doesn't constrain the compiler, and often describes behavior that can't be meaningfully constrained (especially with regards to potentially-trapping operations).
[1] The C99 rationale gives an example of an optimization that compilers can perform that relies on assuming UB can't happen--reassociation of integer addition, on one's complement machines.
[2] The register keyword is I believe even in K&R C and would also be qualified as a compiler hint feature, but I note that it prohibits taking the address of the variable entirely, so it doesn't rely on UB. Whereas restrict has to rely on "if these two variables alias, it's UB" to allow the compiler to optimize assuming nonaliasing.
I haven't gotten to use C in industry, but I was taught that undefined behavior just means that it is defined by the running system and not the compiler. Is that not the general understanding? Maybe I was just taught that way because it was old timers teaching it.
If the language standard leaves some behavior undefined, other sources (e.g., POSIX, your ABI, your standard library docs, or your compiler docs) are free to define it. If they do, and you are willing to limit your program’s portability, you can use that behavior with confidence. But they also leave many behaviors undefined, and you can’t rely on those.
For implementation-defined behavior, the language standard lays out a menu of options and your implementation is required to pick one and document it. IMHO, many things in the C standard are undefined that ought to be implementation-defined. But unaligned pointer accesses would be hard to handle that way; at best you could make the compiler explicitly document whether or not it supports them on a given architecture.
Implementation Defined behavior means the standards authors provided a list of possible behaviors, and compiler authors must pick one and document which they picked.
Unspecified behavior is more what you're thinking of, though in that case the standard still provides a list of possibilities that compiler authors have to pick from; they just don't have to document it or always make the same choice for every program.
There's no category of behavior where compiler authors are free to pick whatever they want, provided they document it. IMO there should be; most "Undefined Behavior" could be specified and documented, even where that choice would be "the compiler assumes such situations are unreachable and optimizes based on that assumption", like much of current UB. At least it'd be explicit!
> Implementation Defined behavior means the standards authors provided a list of possible behaviors
The standard definitely does not require implementations to pick from a list of possible behaviors. All the standard requires is that the implementation document the behavior.
For example, the behavior on integer demotion is implementation-defined and there's no list of possible behaviors:
> When an integer is demoted to a signed integer with smaller size, or an unsigned integer is converted to its corresponding signed integer, if the value cannot be represented the result is implementation-defined.
> Unspecified behavior is more what you're thinking of, though in that case the standard still provides a list of possibilities that compiler authors have to pick from
That contradicts the standard's definition of unspecified behavior. For example, from the C89 draft (emphasis added) [0]:
> Unspecified behavior --- behavior, for a correct program construct and correct data, for which the Standard imposes no requirements.
The TL;DR is that compilers compile code based on assumptions that UB won't be invoked. This sometimes produces extremely surprising results which have nothing to do with the hardware/OS.
That’s why, while much of the linked blog is kind of off the mark (signs of someone knowing less than they think they know), its general conclusion, that using aligned pointers is recommended, is advice I typically give to developers new to C or C++ anyway.
I’m alright with folks sticking to aligned pointer operations, largely for performance reasons. On some platforms, unaligned operations are really expensive.
There are some other reasons, but that's one of them.
Another is that you want to guarantee objects are stored aligned in memory because that gives you some free bits in pointers you can hide stuff in. (This has less hardware support than it should.)
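A sketch of the trick, assuming 16-byte-aligned allocations (the helper names are hypothetical):

    #include <assert.h>
    #include <stdint.h>

    /* 16-byte alignment leaves the low 4 bits of every pointer
     * free to hold a tag. */
    static uintptr_t tag_ptr(void *p, unsigned tag) {
        assert(((uintptr_t)p & 0xF) == 0 && tag < 16);
        return (uintptr_t)p | tag;
    }

    static void *untag_ptr(uintptr_t t)  { return (void *)(t & ~(uintptr_t)0xF); }
    static unsigned get_tag(uintptr_t t) { return (unsigned)(t & 0xF); }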
My point here is that you can’t have “everything works as it does in the native assembly language” and “portable assembly” at the same time: if you rely on implementation-defined or undefined behaviour, then it’s not portable any more.
That depends on what you mean by "portable". I think being able to use the same code across many platforms is enough to qualify. Being able to access raw machine behavior is part of the premise of portable assembly, not a disqualifier.
Indeed, I still have ~20-year-old code that detects and rectifies unaligned memory so gcc does the right thing. The claim that a compiler bugs out on unaligned memory sounds very weird; I assumed that was common knowledge.
27 years ago I was helping someone rearrange structs because word-sized fields were being word aligned, and you would waste a good deal of memory if you arranged by concept instead of for byte packing. I believe that was under Microsoft’s compiler.
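The classic illustration, with sizes as they typically come out under common ABIs:

    #include <stdint.h>

    struct by_concept {        /* fields grouped by meaning */
        uint8_t  flag_a;       /* 1 byte + 3 bytes padding */
        uint32_t count_a;
        uint8_t  flag_b;       /* 1 byte + 3 bytes padding */
        uint32_t count_b;
    };                         /* typically 16 bytes */

    struct by_size {           /* same fields, sorted by size */
        uint32_t count_a;
        uint32_t count_b;
        uint8_t  flag_a;
        uint8_t  flag_b;
    };                         /* typically 12 bytes */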
What you’re saying and what the blog post is implying are different things. This is an admission that GCC optimizes on this behavior in practice. Your claim is that GCC could optimize on this, which is a much less interesting claim.
That's what the author meant when he said "The shift of the C language from “portable assembly” to “high-level programming language without the safety of high-level programming languages”"
Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
The abstract machine idea was introduced much later.
> The arguments in this blogpost are fundamentally flawed.
The "fundamentally flawed" comment is revisionist idea.
This turns out to be contentious. There are two histories of the C language and which one you get told is true depends on who you ask.
1/ a way to emit specific assembly with a compiler dealing with register allocation and instruction selection
2/ an abstract machine specification that permits optimisations and also happens to lower well defined code to some architectures
My working theory is that the language standardisation effort invented the latter. So when people say C was always like this, they mean since ansi c89, and there was no language before that. And when people say C used to be typed/convenient assembly language, they're referring to the language that was called C that existed in reality prior to that standards document.
The WG14 mailing list was insistent (in correspondence to me) that C was always like this, some of whom were presumably around at the time. A partial counterargument is the semi-infamous message from Dennis Ritchie copied in various places, e.g. https://www.lysator.liu.se/c/dmr-on-noalias.html
An out-of-context quote from that email, to encourage people to read said context and ideally reply here with more information on this historical assessment:
"The fundamental problem is that it is not possible to write real programs using the X3J11 definition of C. The committee has created an unreal language that no one can or will actually use."
> My working theory is that the language standardisation effort invented the latter. So when people say C was always like this, they mean since ansi c89, and there was no language before that. And when people say C used to be typed/convenient assembly language, they're referring to the language that was called C that existed in reality prior to that standards document.
But the committee has always had a lot of C compiler developers in it. The people who wrote the C89 standard were the same people who developed many of the C compilers in use before C89. The people who created the reality prior to C89 created the reality after C89. Any perception of "portable assembly" probably stemmed simply from the fact that optimizers were much less sophisticated.
> Back in the 1980s, C was expected to do what hardware does. There was no "the C abstract machine".
There was also a huge variety of compilers that were buggy and incomplete each in their own ways, often with mutually-incompatible extensions, not to mention prone to generating pretty awful code.
If you want a correct compiler it has to be correct according to a model, which means it can't handle things outside that model, and now you have "undefined behavior".
People want compilers to limit how much they transform UB, but that's not possible unless it gets defined. Which you can do, of course, but it's more limiting than it looks.
It doesn't; it is up to the compiler and optimizer to decide how to go about it.
Vector instructions, replacing library functions with compiler intrinsics, splitting structs across registers and stack, and unrolling loops are all examples absent from the language standard.
Two ways. One is the platform ABI sometimes says specific arguments are passed in specific registers. The second is (essentially) assigning local variables offsets on a machine stack where some offsets are stored in registers.
> The semantic descriptions in this Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant
[...]
> Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics.
If you really are targeting the x86_64 instruction set, you should be writing x86_64 instructions. Then you get exactly what the hardware does and don’t get any of those pesky compiler assumptions.
Of course you don’t get any of those pleasant optimizations either. But those optimizations are only possible because of the assumptions.
I think it is a good blog post, because it highlights an issue that I was not aware of and that I think many programmers aren't either. I do think I am a decent C programmer, and I spotted the strict aliasing issue immediately, but I didn't know that unaligned pointer access is UB. Because let's face it, the majority of programmers didn't read the standard, and those who did don't remember all facets.
I first learned many years ago that you should pick apart binary data by casting structs, using pointers to the middle of fields and so on. It was ubiquitous for both speed and convenience. I don't know if it was legal even in the 90s, but it was general practice - MS Office file formats from that time were just dumped structs. Then at some point I learned about pointer alignment - but it was always framed due to performance, and due to the capabilities of exotic platforms, never as a correctness issue. But it's not just important to learn what to do, but also why to do it, which is why we need more articles highlighting these issues.
(And I have to admit, I am one of these misguided people who would love a flag to turn C into "portable assembler" again. Even if it is 10x slower, and even if I had to add annotations to every damn for loop to tell the compiler that I'm not overflowing. There are just cases where understanding what you are actually doing to the hardware trumps performance.)
I think you (and most of the other commenters in this thread) misunderstand the perspective of the author. This is a tool meant to do static analysis of a C codebase. Their job is not to actually follow the standard, but identify what “common C” actually looks like. This is not the same as standard C.
There are a lot of things compilers do not optimize on even though they are technically illegal. As a result, people write code that relies on these kinds of manipulations. No, this is not your standard complaint about undefined behavior being the work of the devil, this is code that in certain places pushes the boundaries of what the compiler silently guarantees. The author’s job is to identify this, not what the standard says, because a tool that rejects any code that’s not entirely standards compliant is generally useless for any nontrivial codebase.
> When compiling and running a C program, the only thing that matters is "what the C abstract machine does". Programs that exhibit UB in the abstract machine are allowed to do "anything".
This view is alienating systems programmers. You're right that that's what the standard says, but nobody actually wants that except compiler writers trying to juice unrealistic benchmarks. In practice programmers want to alias things, they want to access unaligned memory, they want to cast objects right out of memory without constructing them, etc. And they have real reasons to do so! More narrowly defining how off the rails the compiler is allowed to go, rather than anything is a desirable objective for changing the standard.
Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
We need a C interpreter that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc. If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
My personal opinion is that "undefined behavior" was a spec-writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-two's-complement machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
> For example, signed integer overflow being UB was intended to allow compiling C to non-two's-complement machines.
This is indeed a design mistake, but in another sense. Ordinary arithmetic ops like + or - should throw an exception on overflow (with both signed and unsigned operands), because most of the time you need ordinary math, not math modulo 2^32. For those rare cases where wraparound is desired, there should be a function like add_and_wrap() or a special operator.
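GCC and Clang already expose the building block for this; a minimal sketch (the wrapper name is mine):

    /* Trapping addition, built on the GCC/Clang overflow builtins. */
    int checked_add(int a, int b) {
        int r;
        if (__builtin_add_overflow(a, b, &r))
            __builtin_trap();      /* overflow: abort instead of wrapping */
        return r;
    }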
UBSan covers each of those except provenance checking, and ASan mostly catches provenance problems even though that's not directly the goal. There are some dumb forms of UB not caught by any of the sanitizers, but most of them are.
Making your program UBSan-clean is the bare minimum you should do if you're writing C or C++ in 2023, not an absurd goal. I know it'll never happen, but I'm increasingly of the opinion that UBSan should be enabled by default.
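For anyone who hasn't tried it: -fsanitize=undefined includes the alignment checks relevant to this thread, so UBSan reports the misaligned access at runtime instead of letting the optimizer exploit it. A toy example:

    /* cc -g -fsanitize=undefined demo.c && ./a.out */
    #include <stdint.h>
    #include <stdalign.h>

    int main(void) {
        alignas(uint32_t) char buf[8] = {0};
        uint32_t *p = (uint32_t *)(buf + 1);  /* guaranteed misaligned */
        return (int)*p;  /* UBSan: load of misaligned address */
    }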
> Great, except no implementation of the C abstract machine actually exists. So you can't test against it. All you have are compilers that use it to justify miscompiling your code.
All C compilers implement the C abstract machine. It is not used to justify miscompiling code, it is used to specify behavior of compiled code.
> We need a C interpreter
Interpreter or not is not relevant, there must be some misconception. Any behavior you can implement with an interpreter can be implemented with compiled code. E.g., add a test and branch after each integer operation if you want to crash on overflow.
> that intentionally implements C machine features that don't correspond to any architectural feature - i.e. pointers are (allocation provenance, offset) pairs, integer overflow panics, every pointer construction is checked, etc.
As others have mentioned there are static and dynamic checkers (sanitizers) that test for such things nowadays. In compiled, not interpreted code, mind you.
> If only to point out how hilariously absurd the ISO C UB rules are and how nobody actually follows them.
It's not that bad.
> My personal opinion is that "undefined behavior" was a spec writing mistake that has been rules-lawyered into absurdity. For example, signed integer overflow being UB was intended to allow compiling C to non-twos-compliment machines. This was interpreted to allow inventing new misbehaviors for integer overflow instead of "do whatever the target architecture does."
The spec uses implementation-defined behavior for that. Although you can argue that they went the wrong way on some choices -- signed integer overflow "depends on the machine at hand" in the first K&R, so you could argue it would have been reasonable to call it implementation-specific and enumerate the behaviors of supported machines.
C had a long history with hardware manufacturers, compiler writers, and software developers, though, so the standard can never universally please everybody. The purpose of standardization was never to make something that was easiest for software development, ignoring the other considerations. So a decision is not an example of design-by-committee gone wrong just because it happened to be worse for software writers (e.g., choosing to make overflow undefined instead of implementation dependent). You would have to know why such a decision was made.
The general problem with this argument is that “do what the hardware does” is actually not easy to reason about. The end results of this typically are impossible to grok.
Not if possible implementations are specified, and especially if you can target machines of particular behavior. Which is of course how you can write endian and bit size portable code.
And one of the anythings permitted would be to behave in a documented manner characteristic of the target environment. The program is after all almost certainly being built to run on an actual machine; if you know what that actual machine does, it would sometimes be useful to be able to take advantage of that. We might not be able to demand this on the basis that the standard requires it, but as a quality of implementation issue I think it a reasonable request.
This is such an obvious thing to do that I'm surprised the C standard doesn't include wording along those lines to accommodate it. But I suppose even if it did, people would just ignore it.
The problem is that what the machine does isn't necessarily consistent. If you're using old-as-the-green-hills integer instructions then yes, the CPU supports unaligned access. If you want to benefit from the speedup afforded by the latest vector instructions, now suddenly it doesn't.
Also, to be fair, GCC does appear to back off the optimisations when dealing with, for example, a struct with the packed attribute.
C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
Intel added instructions that can’t handle unaligned access, so they broke that contract. I’d argue that it is an instruction set architecture bug.
Alternatively, Intel could argue that compilers shouldn’t emit vector instructions unless they can statically prove the pointer is aligned. That’s not feasible in general for languages like C/C++, so that’s a pretty weak defense of having the processor pay the overhead of supporting unaligned access on some, but not all, paths.
> C has always had a concept of implementation defined behavior, and unaligned memory accesses used to be defined to work correctly on x86.
There are a bunch of misconceptions here:
- unaligned loads were never implementation defined, they are undefined;
- even if they were implementation defined, this would give the compiler the choice of how to define them, not the instruction set;
- unaligned memory accesses on x86 for non-vector registers still work fine, so old instructions were not impacted and there's no bug. It's just that the expectations were not fulfilled for the new extension of those instructions.
Note: SIMD on x86 has unaligned instructions that used to be much slower (decoded differently) than their aligned counterparts.
For example, on the Pentium III and Core 2, the unaligned instructions took twice as many cycles to execute. On modern x86 family processors, it’s the same cycle count either way. The only perf penalty one should account for is crossing of cache lines, generally a much smaller problem.
> The alignment of the addressable storage unit is unspecified.
This is for struct field access, but it clearly implies the compiler can choose to use unaligned struct fields. Also, the sizes of the integer types are all implementation-defined:
Then, paragraph 12:
> Each non-bit-field member of a structure or union object is aligned in an implementation- defined manner appropriate to its type.
Alignment is defined as:
> requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address
It doesn't say which multiple. 1 is a multiple. (So is 0.5, just in case the compiler wants to go nuts with arcane code gen.) The spec even allows chars to be 7 bits. I didn't bother looking up the definition of byte in the spec for those architectures. (7 bits? 8 bits?)
In section 6.2.5, they talk about implementation-defined restrictions on integer types + alignment requirements:
> For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
So, the alignment of integers has to be the same for signed + unsigned types. That still doesn't say byte-aligned integers are disallowed.
Later:
> An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned,
Again, the alignment behavior is clearly implementation-defined.
I can't find a definition of implementation in the spec, but it clearly includes the compiler, standard library, and operating system. There is this quote:
> For implementations with a signed zero (including all IEC60559 implementations)
which, according to the IEC60559 abstract "An implementation of a floating-point system conforming to this standard may be realized entirely in software, entirely in hardware, or in any combination of software and hardware." I doubt they were trying to constrain floating point to be done in software by compilers, so it's pretty clear they intended to incorporate the physical hardware in the definition of the "implementation".
Later, they say:
> ...is defined if and only if the implementation supports the floating-point exception
which was definitely in the realm of hardware support back in 1989. Some later sections says that some macros (such as for FMA) are defined iff the implementation implements the primitive in hardware, and not just software.
Loads of architectures can't do misaligned memory access. Even x86 has problems when variables span cache lines. The compiler usually deals with this for the programmer, e.g. by rounding the address down then doing multiple operations and splicing the result together.
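A sketch of that round-down-and-splice for a 32-bit little-endian load (this itself relies on implementation-specific behavior, so treat it as illustration, not portable C):

    #include <stdint.h>

    static uint32_t load_u32_splice(uintptr_t addr) {
        uintptr_t base = addr & ~(uintptr_t)3;   /* round down to alignment */
        unsigned  off  = (unsigned)(addr & 3) * 8;
        uint32_t  lo   = *(const uint32_t *)base;
        if (off == 0)
            return lo;                            /* already aligned */
        uint32_t  hi = *(const uint32_t *)(base + 4);
        return (lo >> off) | (hi << (32 - off));  /* splice the two halves */
    }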
Most modern architectures that target high performance implementations can do unaligned accesses, even ones crossing page boundaries.
Less common is support for atomic RMW access to unaligned location. x86 does support it but crossing a cache line causes the operation to be very slow.
Unaligned memory accesses are undefined behavior in C. If you're writing C, you should be abiding by C rules. "Used to work correctly" is more guesswork and ignorance than "abiding by C rules". In C, playing fast&loose with definitions hurts, BAD.
Frankly, I'd be ashamed to write this blog post since the only thing it accomplishes is exposing its writers as not understanding the very thing they're signaling expertise on.
What makes you think they don't understand it? They acknowledge that it is UB. I read them as realistic, since they know that people rely on C compilers working a certain way. They even wrote an interpreter that detects UB: https://github.com/TrustInSoft/tis-interpreter
I understand why people like the compiler being able to leverage UB. I suspect this philosophy actually makes TrustInSoft more money: you could argue that if there were no UB, there would be no need for the tis-interpreter.
So isn't it in fact quite selfless that they encourage the world to optimize a bit less (spending more money on 'compute'), while standing to profit from the unintended behaviour they'd otherwise be contracted to help debug?
I made a comment a few levels up to a sibling where I point out the parts of the C89 spec that are relevant.
Alignment requirements for integers are implementation defined, not undefined behavior. On x86, the implementation used to define the alignment requirement to be one byte.
In fact, if you've done enough hardware register and bus-level (e.g., PCIe) programming, you'll quickly realize that there are all sorts of other exotic implementation-defined alignment constraints on modern systems.
Pretty much everything you wrote in that comment is wrong since you're interpreting the spec in a way that's clearly not what the spec describes (e.g. the spec is talking about alignment requirements for conversions, but you generalize it to "alignment requirements" which is dead wrong).
An example of a recent compile target that breaks on unaligned pointer accesses was asm.js. There, a 32-bit read turns into a read from a JavaScript Int32Array like this:
HEAP32[ptr >> 2]
The k-th index in the array contains 4 bytes of data, so a byte address must be divided by 4 to index it, which is what the >> 2 does. And >> 2 will "break" unaligned pointers because it discards the low bits.
In practice we did run into codebases that broke because of this, but it was fairly rare. We built some tools (SAFE_HEAP) that helped find such issues. In the end it may have added some work to a small amount of ports, but very few I think.
asm.js has been superseded by WebAssembly, which allows unaligned accesses, so this is no longer a problem there.
They are confused, and seem not to realize that ABIs exist, and often specify alignment requirements. They seem to believe there are just ISA and architecture specs.
When you compile for Linux x86_64 ABI, gcc assumes that the stack is 16 byte aligned because it’s required by the ABI.
Regardless of whether the ISA needs it.
If they want the compiler to make no assumptions about aligned accesses, they would need to define an ABI in GCC that operates that way and compile with it. Such ABIs were historically supported (though it's been years since I looked).
In a project I'm working on[0], there's an array object type used throughout, which can sometimes point to arbitrary data elsewhere. In a funky edge-case[1], such an array can be built with an unaligned data pointer.
Thus, if gcc/clang started seriously exploiting alignment assumptions everywhere, nearly every single load & store in the entire project would have to be replaced with something significantly more verbose. Maybe in a fancier language you could have ptr<int> vs unaligned_ptr<int> or similar, but in C you kinda just have compiler flags, and maybe __attribute__-s if you can spare some verbosity.
C UB is often genuinely useful, but imo having an opt-out option for certain things is, regardless, a very reasonable request.
[1]: Any regularly allocated array has appropriate alignment. But there are some functions that take a slice of the array "virtually" (i.e. pointing to it instead of copying), and another one that bitwise-reinterprets an array to one with a different element type (again, operating virtually). This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array. A workaround would be to make the reinterpret copy memory if necessary (and this would have to be done if targeting something without unaligned load/store), but that'd ruin it being a nice O(1).
> This leads to a problem when e.g. taking i8 elements [3;7) and reinterpreting as an i32 array.
Even ignoring alignment issues, this is already UB because it violates the strict aliasing rule. You technically need to memcpy and hope that the compiler optimizes the memcpy out. In C++20 you can use std::bit_cast in some circumstances. https://en.cppreference.com/w/cpp/numeric/bit_cast. In C11 you can use a union, but that still requires a "copy" into the union.
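Both standard-sanctioned routes, sketched in C (function names are mine; compilers do eliminate the copy in practice):

    #include <stdint.h>
    #include <string.h>

    /* memcpy route: legal in C and C++, regardless of alignment. */
    uint32_t pun_memcpy(const unsigned char *bytes) {
        uint32_t v;
        memcpy(&v, bytes, sizeof v);
        return v;
    }

    /* union route: reading the non-active member is defined in C11. */
    uint32_t pun_union(float f) {
        union { float f; uint32_t u; } u = { .f = f };
        return u.u;
    }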
I'm of course already using -fno-strict-aliasing (primarily because without it it's impossible to implement a custom memory allocator, but it also helps here).
As others have pointed out, GCC is completely allowed to do this because unaligned access is UB.
So the problem is not that GCC assumes your code has no UB.
The issue is that the C (and C++) specifications persist in this obnoxious and odious desire to label definable behaviour as UB, with no justification.
All of the arguments about needing UB to support different hardware fail immediately to the simple fact that the specification already has specific terms that would cover this: Implementation Defined Behavior, and Unspecified Behaviour. Using either of these instead of UB would support just as much hardware, without inflicting clearly anti-programmer optimizations on developers where the compiler is allowed to assume objectively false things about the hardware.
Undefined behaviour should be used solely for behavior that cannot be defined - for example using out of bounds, unallocated, or released memory cannot be defined because the C VM does not specify allocation, variable allocation, etc. Calling a function with a mismatched type signature is not definable as C does not specify the ABI. etc.
The big thing seems to be less about GCC, and more a question of, "what should a compiler be?"
He'd be better off looking at smaller, less-known compilers, like the Portable C Compiler or the Intel C Compiler. If you want hyper-optimized, better-than-assembly quality, you pretty much have to give up predictability. The best optimizations that stay predictable can't be written using modern compiler theory. They instead involve a lot of work, care, and attention that can't be generalized to other architectures. It can require a love for an architecture, even if it's a crap one.
It's a tradeoff. Not every compiler needs to be optimized, and not every compiler needs to embody the spirit of a language.
> The C standards, having to accommodate both target architectures where misaligned accesses worked and target architectures where these violently interrupted the program, applied their universal solution: they classified misaligned access as an undefined behavior.
No. If the C standard wants to accommodate different target architectures, it uses implementation-defined behavior. Undefined behavior is just a polite way of saying the code is buggy.
The C standard just requires natural alignment, even on architectures that allow unaligned accesses.
If you're going to read byte-level data you should be using a char pointer.
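For example, pulling a 32-bit big-endian field out of a byte buffer this way is fully defined at any alignment (helper name is mine):

    #include <stdint.h>

    static uint32_t read_be32(const unsigned char *p) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }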
The author also speculates on how common this "bug" is. I'd say 15000 Debian packages that work properly indicates that just about nobody is relying on this undefined behavior.
At present there's almost no reliance on this UB by compilers on something that actually has a chance of affecting real code, so it's not particularly unexpected that software appears to work.
Undefined behavior doesn't necessarily mean the program will exhibit an issue. It could silently break with a future version of the compiler, which it sounds like was the case here
Getting to that point has required years of maintenance work since compiler writers started interpreting more and more undefined behaviour as optimization opportunities. At least now we have UBSAN to actually test for undefined behaviour at runtime.
The D programming language does not allow the creation of misaligned pointers in code marked as @safe, and in @safe code assumes they are aligned. In @system code you can do whatever you like, but things need to be aligned that are provided to @safe code.
Doesn't look like that changes anything about actual dereferencing though, which is the primary thing discussed - https://godbolt.org/z/4vW5Ksnab still emits an "align 4", which llvm could still assume as UB if violated (though I don't know if it ever does).
Yeah, I end up using __attribute__((packed)) for this at work. For tortured reasons, part of our codebase allocates memory with only 8-byte alignment, but the buffer is cast to a type that would have 16-byte alignment without __attribute__((packed)). As a result Clang wants to generate VMOVDQAs that require 16-byte alignment, unless you use packed, in which case it generates VMOVDQU.
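A stripped-down sketch of that situation (types and names invented): packed drops the struct's alignment requirement to 1, so the compiler has to emit unaligned moves like VMOVDQU instead of the alignment-faulting VMOVDQA.

    #include <stdint.h>

    struct block {
        uint64_t lanes[4];
    } __attribute__((packed));   /* alignment requirement is now 1 */

    uint64_t sum(const struct block *b) {
        uint64_t s = 0;
        for (int i = 0; i < 4; i++)
            s += b->lanes[i];    /* compiler must use unaligned loads */
        return s;
    }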
If the compiler is so smart, I guess it could insert a memcpy when needed?
The standard, you may say... I would argue it's the standard that needs to be changed. The modern reading of the standard is not useful for a low-level language and is unsafe for a high-level language.
> If the compiler is so smart, I guess it could insert a memcpy when needed?
If I'm reading your comment and the blog post correctly, the compiler would need a memcpy-like access on every multibyte pointer argument where the compiler cannot otherwise prove alignment. Is that correct?
Internally, the compiler could represent it however it wants. LLVM IR's load/store instructions just have an "align" property, which is usually sizeof(the type), but can be set to 1 to mimic memcpy (and indeed llvm/clang immediately translate a memcpy to such - https://godbolt.org/z/7T46a6aqT).
Though it seems that, independent of this, it assumes that an int* in general will be 4-byte-aligned, so e.g. https://godbolt.org/z/aWTEd4s3K still has an "align 4" despite using memcpy. So one must also cast to a char* before using memcpy() to actually have it work. yay for more footguns!
Right, I agree that it would be nice to have some way to request that unaligned loads/stores be permitted, like -fwrapv for signed int wrapping. But nevertheless the UB behavior is a reasonable option that's beneficial for other things.
In other words, the result they got is different from the expected one. The keyword here is "expected": if your code contains a part that generates undefined behavior according to the standard, then you should have no expectations. What's worth mentioning in this blog post?
Unaligned pointer accesses are for 80386 bozos.
Period. End of story.
If you want to play in 64-bit land, live by the architectural rules. If you do not, your code will likely die. And you need to "Lurn" a lot
Machine language architecture is flawed. The assumption of alignment is in the machine language.
Compilers can only use the instructions that are there. They have a difficult choice: close their eyes and generate aligned-pointer moves, or use a sequence of tests and partial move instructions that is orders of magnitude less efficient.
We've needed machine instructions to load or move memory efficiently regardless of alignment, for decades.
In retrospect, too many crazy software dances are due to miserliness on the part of hardware designs. Saving a couple bits in the address lines and not needing to straddle cache lines? We're far past the point where that's a considerable cost, as all modern ISA implementations now attest.
Storing pointers unaligned and using memcpy to extract them to an aligned pointer, can be a performance gain, if it means less padding taking up valuable cache space.
If a --k&r mode were to be reliable, wouldn't it need to get specified first? Otherwise people would start relying on some edge case.
If speed is not a requirement for the --k&r mode, you could just take the tis-interpreter and note that if it runs without UB, it is still much faster than an actual computer was when k&r were active.
Would it even be possible to specify a variant of C that contains no UB (e.g. would define exactly what happens on unaligned access), but can compile practical existing C89 programs? I wonder if it could be written such that it could actually specify the behaviour consistently across the language intersection supported by both of e.g. GCC 2.95 and Chibicc[0].
Or maybe there are so many bugs in GCC 2.95 that it would simply be infeasible? How much time would it take to specify?
It could probably only be used with a neural link. It would read your mind, then emit code that matches your perception of what you imagine old compilers did.
You jest, but there was some real effort put in to attempt to define a dialect of C that would have less UB etc. And indeed the big problem was defining the semantics:
> After publishing the Friendly C Proposal, I spent some time discussing its design with people, and eventually I came to the depressing conclusion that there’s no way to get a group of C experts — even if they are knowledgable, intelligent, and otherwise reasonable — to agree on the Friendly C dialect. There are just too many variations, each with its own set of performance tradeoffs, for consensus to be possible.
Right. One of the many benefits of Rust is that it provides a convenient litmus test for C and C++ programmers who complain about UB. For some of them, having a language (safe Rust) where programs don't have Undefined Behaviour is excellent - but for a whole lot of them this isn't OK.
And the reason IMO is that they didn't actually want Defined Behaviour. What they wanted is for their nonsense programs to just work anyway. They want to skip the hard part of the job of software engineer where you need to correctly express what you meant as a program. They're the current generation of the people Charles Babbage complained of, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?'.