
GCC always assumes aligned pointer accesses - fanf2
https://trust-in-soft.com/blog/2020/04/06/gcc-always-assumes-aligned-pointers/
======
notacoward
About 30 years ago, I was porting an Ethernet driver to the MC88K, which also
didn't allow misaligned accesses. Guess what happens when you fill a page-
aligned buffer with an Ethernet packet, go past the 14-byte Ethernet header,
and try to read the first 4-byte value in the IP header as a single word? You
guessed it: SIGBUS. Every time. The same problem existed many other places, so
I had to go find all of them. Eventually I implemented an exception handler
that would emulate the access, and also log an error message. For months
afterward, this turned up ever more obscure cases until there seemed to be
none left.
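
A minimal sketch of how such a read can be written so that it is valid at any offset, assuming a raw frame held in a byte buffer. The helper names `load_u32` and `load_be32` and the `ETH_HEADER_LEN` macro are made up for illustration; they are not from the driver described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14  /* Ethernet header size mentioned above */

/* Copy four bytes starting at `offset` into an aligned local.
   memcpy has no alignment requirement, so this works at any offset.
   The result is in host byte order. */
static uint32_t load_u32(const unsigned char *frame, size_t offset)
{
    uint32_t word;
    assert(frame != NULL);
    memcpy(&word, frame + offset, sizeof word);
    return word;
}

/* Assemble a big-endian (network order) 32-bit value byte by byte,
   which sidesteps both alignment and host byte order. */
static uint32_t load_be32(const unsigned char *frame, size_t offset)
{
    assert(frame != NULL);
    const unsigned char *p = frame + offset;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Calling `load_be32(frame, ETH_HEADER_LEN)` reads the first word of the IP header without ever forming a misaligned `uint32_t *`. Compilers targeting hardware that tolerates misaligned loads will often turn the `memcpy` into a single load anyway.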

Coming next: how modern programmers can't imagine a different word size or
byte order. After that: different memory-ordering models. That one's _really_
fun.

~~~
anticensor
Mixed memory order as in, a memory write stores the data in an order like 5,
7, 6, 4 instead of 4, 5, 6, 7?

~~~
notacoward
_Basically_ yeah, but it gets much more complicated. For example, when a write
hits memory is actually not the same thing as when it's observable by some
particular other processor. There might be particular rules for writes that
have dependencies too. Here's some good reading material if you have a spare
afternoon.

[https://www.kernel.org/doc/Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)

Amusingly, it treats dependency issues as obsolete, but I worked on a
processor as late as 2009 that had all of the weak-memory-ordering issues of
the Alpha (because it was the same design team). We uncovered _lots_ of
missing memory barriers in the Linux kernel. Good times.

~~~
loeg
The modern thing is using acquire/release semantics on the variables whose
inter-thread ordering you actually care about, rather than full barriers,
which impose a higher cost.

~~~
notacoward
I think we're talking about different kinds of barriers. Acquire/release at
the C++ level translates roughly into an rmb/wmb pair at the processor level,
unless it's implemented using an actual interlocked instruction which is
_worse_ than barriers.

~~~
loeg
Nope. Acquire/release at C level translates into a compiler barrier and the
exact machine implementation will depend on hardware. However, on x86, for
example, with its strong memory semantics, ordinary stores already have
release semantics and do not require an architectural barrier (wmb/sfence). No
sfence, no lock prefix.
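
For concreteness, here is a small sketch of a release store paired with an acquire load using C11 `<stdatomic.h>`. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and the instructions the compiler emits depend on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;        /* plain data, written before publication */
static atomic_bool ready;  /* flag that carries the ordering; starts false */

/* Producer: write the data, then publish it with a release store.
   Writes before the release store cannot be reordered after it. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load that observes `true` also makes the
   earlier write to `payload` visible. Returns false if the data
   has not been published yet. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In real use `publish` and `try_consume` would run on different threads. The source only names the ordering it needs; whether that costs a fence instruction is left to the implementation.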

~~~
notacoward
First you said that the modern way is to use acquire/release instead of a
barrier, then admitted that it _is_ a barrier at the compiler level. The fact
that it doesn't require a barrier instruction on the x86 is pretty much
irrelevant. Not all the world's an x86, and even the x86 hasn't always been
the same in this regard. So your rude "nope" is unwarranted. This attitude of
"as it is (in my world) now, so it has ever been and ever will be
(everywhere)" is exactly why these bugs occur.

~~~
loeg
Compiler barriers and machine barriers are completely different things,
despite having a similar name.

In about the era you described, the Linux kernel community did go around
throwing rmb()s and wmb()s everywhere, and those did translate to hardware
fences. (And in general, it is not safe to elide a hardware fence on x86
despite relatively strong memory ordering.) This is what I believed you were
talking about; maybe I misunderstood: apologies.

Putting explicit compiler barriers in code is still not quite the modern model
for relaxed atomic consumers. You use the abstract acquire/release load/store
pseudo-functions, and they do the right thing depending on implementation.
_The compiler barriers in the relaxed atomics model are an internal
implementation detail, not the API_.

x86 is just an example where the cheapest way to do a release-semantics store
is a plain store. I never said all the world was x86 — if it were, abstract
acquire/release relaxed atomics would be kind of pointless.

------
LennyWhiteJr
This isn't just an issue with GCC but with embedded compilers as well. On
embedded systems it's common to work with packed data structures to maximize
serial protocol efficiency.

Let's say you have a packed structure with a type of

    
    
        struct MyPackedStruct {
          uint8_t byteValue;
          uint32_t intValues[5];
        };
    

Normally a compiler would add 3 bytes of padding between the uint8_t value and
the uint32_t array to keep the array aligned, but that won't happen if you
force the struct to be packed. The uint32_t values then straddle word
boundaries. If you access one of those array values directly through the
struct, the compiler is smart enough to perform 2 separate memory reads and
combine the results, so you don't really have to think about it.

But if you do something like this:

    
    
        uint32_t *intValues = myPackedStruct.intValues;
    

The compiler allows this, but the resulting intValues pointer loses all
awareness of the packing, and dereferencing it can raise an unaligned memory
access exception.

Moral of the story - only use packed data types when serializing/deserializing
a protocol stream. Avoid using packed data types 'at rest', because it can
cause subtle issues like this. The downside is that it results in extra
parsing work when converting between packed and unpacked types.
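
A rough sketch of that conversion, assuming GCC or Clang, where `__attribute__((packed))` is the packing directive (other embedded compilers spell it differently). The unpacked twin `MyStruct` and the `unpack` helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed wire layout: no padding after byteValue (GCC/Clang extension). */
struct __attribute__((packed)) MyPackedStruct {
    uint8_t  byteValue;
    uint32_t intValues[5];
};

/* Hypothetical unpacked twin used "at rest"; natural alignment applies. */
struct MyStruct {
    uint8_t  byteValue;
    uint32_t intValues[5];
};

static_assert(sizeof(struct MyPackedStruct) == 21, "packed: 1 + 5*4 bytes");
static_assert(offsetof(struct MyPackedStruct, intValues) == 1,
              "array starts at an odd offset");

/* Convert field by field. Each element is read through the packed
   struct itself, so the compiler knows the access may be misaligned
   and emits code that is safe for the target. */
static void unpack(struct MyStruct *dst, const struct MyPackedStruct *src)
{
    assert(dst != NULL && src != NULL);
    dst->byteValue = src->byteValue;
    for (size_t i = 0; i < 5; i++)
        dst->intValues[i] = src->intValues[i];
}
```

After `unpack`, taking `dst->intValues` as a plain `uint32_t *` is fine, because the unpacked struct keeps the array naturally aligned.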

~~~
Asooka
I would argue that this is a bug and should probably be fixed in a future
C/C++ standard that allows declaring misaligned types. Something like "members
of a packed struct have [[aligned(1)]] attribute and cannot be converted to a
pointer type with different alignment except with a reinterpret_cast", so
you'd have to do

    
    
        uint32_t [[aligned(1)]]*intValues = myPackedStruct.intValues;
    

Would be nice if we also got e.g. float/intNua_t for unaligned types. Any code
that gets broken by this change was broken to begin with.
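
Until something like that exists in the standard, a common workaround with GCC and Clang is to wrap the value in a one-member packed struct, which gives the pointer type an alignment of 1. This is only a sketch: `unaligned_u32` and `sum_values` are made-up names, and `__attribute__((packed))` is a compiler extension, not standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: packing a single member gives the struct an
   alignment of 1, so accesses through it must be alignment-safe. */
struct __attribute__((packed)) unaligned_u32 {
    uint32_t value;
};

static_assert(_Alignof(struct unaligned_u32) == 1, "wrapper is byte-aligned");
static_assert(sizeof(struct unaligned_u32) == 4, "wrapper adds no padding");

struct __attribute__((packed)) MyPackedStruct {
    uint8_t  byteValue;
    uint32_t intValues[5];
};

/* Sum the array through a pointer whose type still records that the
   storage may be misaligned. */
static uint32_t sum_values(const struct MyPackedStruct *s)
{
    assert(s != NULL);
    const unsigned char *base = (const unsigned char *)s;
    const struct unaligned_u32 *p = (const struct unaligned_u32 *)
        (base + offsetof(struct MyPackedStruct, intValues));
    uint32_t total = 0;
    for (size_t i = 0; i < 5; i++)
        total += p[i].value;
    return total;
}
```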

------
unwind
Specifying a target architecture (ISA) when compiling does not change the
programming language's semantics.

C is _not_ a high-level assembly language, as pointed out in the article.

So just because you know that the target CPU supports e.g. unaligned accesses
does not make them suddenly valid C.

------
eqvinox
No. [as in: this isn't about GCC.]

The C standard requires that pointers generally be created from referencing a
valid, existing object. A misaligned integer is not a valid "object", thus the
compiler may assume that all pointers to ints are aligned (to 4 bytes.)

[https://en.cppreference.com/w/cpp/language/pointer](https://en.cppreference.com/w/cpp/language/pointer)

    
    
      Every value of pointer type is one of the following:
      
      - a pointer to an object or function (in which case the pointer is said to point to the object or function), or
      - a pointer past the end of an object, or
      - the null pointer value for that type, or
      - an invalid pointer value.
    

Relatedly, `malloc` is required to return a pointer with the largest alignment
of any type ("suitably aligned to hold an object".)

Edit/Add: that's C++ in the link above, but it's the same in C. You can get a
pointer from either a valid static/stack object, or malloc, which is "max"
aligned. Either way it's required to point to a valid object, which includes
correct alignment. If you implement your own memory management / allocator,
it's also your responsibility to meet these same alignment requirements.
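
As an illustration, a custom allocator can check the alignment of what it hands out along these lines. `is_aligned` and `malloc_is_max_aligned` are hypothetical helpers, and converting a pointer to `uintptr_t` is implementation-defined, although on mainstream flat-memory targets it yields the address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if the address in `p` is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* malloc must return storage suitably aligned for any object type with
   a fundamental alignment; max_align_t has the largest such alignment. */
static int malloc_is_max_aligned(void)
{
    void *block = malloc(64);
    if (block == NULL)
        return 0;
    int ok = is_aligned(block, alignof(max_align_t));
    free(block);
    return ok;
}
```

An address one byte past an aligned `uint32_t` fails `is_aligned(p, alignof(uint32_t))`, which is exactly the kind of pointer the compiler is allowed to assume never exists.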

[https://en.cppreference.com/w/c/language/object](https://en.cppreference.com/w/c/language/object)

~~~
dooglius
Why "no", you're agreeing with the conclusion

~~~
IshKebab
Sort of. He definitely talks about the fact that it is undefined behaviour to
create a misaligned pointer:

> Strictly speaking, the function f invokes Undefined Behavior when it
> computes [the misaligned pointer]

But he thinks it is still a problem with GCC, in that GCC should not be
allowed to take advantage of this particular UB. He explains his view much
better in the bug report:

> GCC assumes that pointers must be aligned as part of its optimizations, even
> if the ISA does not force it (for instance, x86-64 without the vector
> instructions). The present feature wish is for an option to make it not make
> this assumption.

> Since the late 1990s, GCC has been adding optimizations based on undefined
> behavior, and “breaking” existing C programs that used to “work” by relying
> on the assumption that since they were compiled for architecture X, they
> would be fine. The reasonable developers have been kept happy by giving them
> options to preserve the old behavior. These options are
> -fno-strict-aliasing, -fwrapv, ... and I think there should be another one.

~~~
cyphar
The problem with disabling this optimisation is that it will produce garbage
code for basically every architecture. Code that does unaligned accesses is
noticeably slower than code that does aligned accesses -- this is why there
are separate instructions and intrinsics for aligned and unaligned addresses
on most architectures. That way you don't pay the penalty of unaligned
accesses all the time. But if you try to use the aligned instructions on
unaligned pointers you will get all sorts of fun errors.

And note that this slowdown would have to apply to _every pointer operation_
in your program because GCC can't know whether a pointer is aligned at
compile-time. I'm sure many more people would complain about significant
performance slowdowns in a GCC update than about unaligned memory accesses
being something you need to do with great care in C.

~~~
pron
> And note that this slowdown would have to apply to every pointer operation
> in your program because

No, because I think that on x86 the instructions are the same for aligned and
unaligned up to, and including, 64 bits (quadword). The only slowdown that
would occur is due to missed compiler optimizations, and the author proposed
disabling them as an option.

~~~
vardump
Yes, up to 64 bits. They're different for x86 SIMD (128 bits and up, SSE &
AVX). For example vmovapd (requires alignment) vs vmovupd (can be unaligned).
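
The same split exists in the SSE intrinsics, which is an easier place to see it than AVX. The sketch below uses `_mm_load_ps` (requires 16-byte alignment) and `_mm_loadu_ps` (any alignment) from `<xmmintrin.h>`, with a scalar fallback for non-x86 builds. `add4` is a made-up name, and the compiler is free to pick different instruction encodings than the intrinsic names suggest.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Add four floats from `src` into `dst16`.
   dst16 must be 16-byte aligned; src only needs float alignment. */
static void add4(float *dst16, const float *src)
{
    assert(((uintptr_t)dst16 & 15u) == 0);
#if defined(__SSE__)
    __m128 a = _mm_load_ps(dst16);   /* aligned load: faults if misaligned */
    __m128 b = _mm_loadu_ps(src);    /* unaligned load: any address is fine */
    _mm_store_ps(dst16, _mm_add_ps(a, b));
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        dst16[i] += src[i];
#endif
}
```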

------
vardump
Few seem to know that x86 can generate a misaligned access exception. Setting
the EFLAGS Alignment Check flag (bit 18) causes an exception on any misaligned
access, provided the kernel has enabled the feature by setting the CR0
Alignment Mask bit (bit 18).

~~~
SomeoneFromCA
You do not need to enable anything. SSE instructions require aligned access,
and will crash if used with wrong alignment, irrespective of the flag.

~~~
magicalhippo
I noticed this when I discovered the NVIDIA OpenGL driver used SSE without
thinking.

Nothing in the OpenGL specs says the buffers passed to it (I think it was
glBufferData) have to be 16-byte aligned, but with NVIDIA it would crash with
an alignment exception if they were not. The memory allocator in the language
I used, Delphi, only used 4-byte alignment (32-bit).

They might have fixed this, this was well over a decade ago.

------
twoodfin
Putting aside that gcc’s behavior is totally in line with the standard, I’d
rather have a compiler (& language spec) with more effective room for auto-
vectorization than support for unaligned pointers.

------
ajross
The complaint isn't about alignment at all, it's that the optimizer assumes
that two pointers to the same basic type cannot overlap in memory. The
generated code is correct except for the fact that the two arguments are
distinct pointers to the same three bytes in memory.

I believe this behavior is actually specified in the standard, in the same
section that defines the aliasing rules.

~~~
loeg
Yes, this is the compiler optimizing UB due to _aliasing_, not due to
alignment.

~~~
not2b
Yes, the optimization assumes that given two pointers to int, either they
point to the same int, or they point to distinct, non-overlapping ints. Under
that restriction the result is guaranteed to be 1. It's allowed to do so by
the standard because those are the rules.
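
A function of roughly this shape shows the reasoning. It is a hypothetical stand-in written for illustration, not a copy of the article's code, and `store_both` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Store 1 through both pointers, then read back through the first. */
static int store_both(int *p, int *q)
{
    assert(p != NULL && q != NULL);
    *p = 1;
    *q = 1;
    /* If p == q, *p is 1. If *p and *q are disjoint ints, *p is still 1.
       Those are the only two cases valid int pointers allow, so the
       compiler may return the constant 1 without reloading *p. */
    return *p;
}
```

Only a partial overlap, which requires at least one misaligned pointer, could make the reloaded value differ from 1, and that case is already outside what the standard defines.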

C was portable assembly back in the early 1980s. The result was that it was
crushed in performance by Fortran for scientific codes, and most of the
difference had to do with assumptions the compiler could make about aliasing
(or the lack of it). This was fixed by the first ANSI C standard (finalized in
1989).

Yet I still see people who weren't even born then, or were young children
then, talk as if C is supposed to be "portable assembly". It isn't.

~~~
naniwaduni
> The result was that it was crushed in performance by Fortran for scientific
> codes, and most of the difference had to do with assumptions the compiler
> could make about aliasing (or the lack of it). This was fixed

Calling it "fixed", as if it were a problem of the language rather than
programmers' expectations, is a rather slanted view of the history.

------
vkaku
Why not? We program with a model that assumes NULL is zero, that we have a
flat memory model...

If it's meant to be in a certain way, it is because it would not only simplify
the implementation, but also - "I hope you know what you are doing."

Of course, the point of this is that if you feel strongly against it, I'd
rather you submit a patch / use a fork where it shows benefit. If enough
people require this behaviour, they may enable it in the tree. They're looking
for more contributors, not less.

------
hedora
The example of bad gcc behavior was essentially type punning, which is well-
known to cause trouble.

Is there an example where things break on misaligned access to non-overlapping
objects?

~~~
eqvinox
Misaligned accesses, in general, without any qualification, can earn you a
SIGSEGV or SIGBUS. It's mostly x86 that's being very tolerant here; other
architectures are much less forgiving to varying degrees.

That's also why this rule is in the C standard to begin with. Misaligned
accesses need different / multiple instructions on some of these
architectures, making them significantly more expensive. So the decision was
made in favor of assuming things are aligned.

~~~
loeg
All (or almost all) of the big architectures that have survived to this day
and attempt to compete with x86 permit misaligned accesses in general purpose
memory. It's true that if you threw a dart at a historical computer
architecture, misaligned accesses would likely not be tolerated. But e.g.,
aarch64 tolerates misaligned access.

------
SomeoneFromCA
Interestingly enough, I've just checked the code from this article
([https://pzemtsov.github.io/2016/11/06/bug-story-alignment-on...](https://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html)),
linked by the OP's article, and guess what - Clang generates code that does
not crash, although it still uses SSE4.

