Related: The Strict Aliasing Situation is Pretty Bad (2016) https://blog.regehr.org/archives/1307
Depends on what you mean by tightening the rules. If you tighten them for the optimizer by simply allowing everything to alias, that should not introduce bugs into currently well-formed programs (except that they might become slower).
> That means that writes through uint8_t pointers are treated as potentially updating any memory of unknown provenance. So the compiler assumes that every increment v[i]++ potentially modifies size() and hence must recalculate size() every iteration.
However, I think that a smart compiler would still be able to sort out that size() is not affected by such writes.
edit: specifically for std::vector, the implementer could somehow hardcode the knowledge that the size can never alias with the buffer, but that would be ad hoc and wouldn't extend to user-defined containers, which seems suboptimal.
In general, compilers don't generate runtime checks like this. In the case of aliasing problems, however, they actually do make such checks, but usually only in the case of possibly overlapping arrays. An example:
Both gcc and clang make a check to see if the x and y arrays overlap, and if they do they fall back to a scalar loop (instructions ending in ss: scalar single-precision) - but if there is no overlap they proceed with the vectorized approach (instructions ending in ps: packed single-precision).
So I guess these cases were important enough to get specific handling in the compiler, but I haven't seen it for possible aliasing with a scalar, even though the same kind of check could be used. Actually it's a bit trickier: the compiler wants to know the array size to make this calculation, but the array size itself, which comes from size(), is exactly the thing subject to aliasing! So it takes a more complex analysis to show that it's safe.
  void inc(float* x, float* c, size_t n)
  {
      //#pragma GCC ivdep
      for (size_t i = 0; i < n; ++i)
          x[i] += *c;
  }
It seems like it's not just the loop count that's the problem: in the blog post, even when the size is a local, the compiler fails to check whether the vector's data pointer itself can alias.
I.e., analogous to incB here:
incA is like your example: the possible aliasing is between the array and the value added, and an alias check occurs. incB is like the v[i] example: the possible aliasing is between the array and the pointer to that same array; no check is made and conservative code is generated.
edit: it is possible it only does it on PGO builds.
Dataflow analysis. If size() depends on values that are in one malloc'ed block, and the vector elements are in a different malloc'ed block, then you know there is no aliasing. (For example).
I don't know enough about compiler optimization algorithms to say how trivial it would be to figure out these two things don't alias, but when you have a type with the magic property "this guy is allowed to alias ANYTHING", then I imagine it gets pretty tricky.
And that might still not be enough: passing the vector by value still doesn't trigger the optimization in GCC and clang (though it does for a bare-bones vector implementation; probably std::vector itself ends up leaking its own internals).
> when you have a type with the magic property "this guy is allowed to alias ANYTHING", then I imagine it gets pretty tricky.
well, yes, that's the whole point of the article :)
If the size and the buffer pointer are both private, wouldn't it be enough to analyze all the member functions to ensure that there is no aliasing between them? Users might try to muck around in the internal representation using memcpy or whatever, but that would presumably be undefined behavior.
I mean, it's not like the optimization is impossible (by rejiggering the code a bit, it kicks in), it's just that using uint8_t's makes the optimization far more brittle, because the compiler has to walk on eggshells around uint8_t's. Right?
In both cases, yes, if you hardcode the specific std::vector invariants in the compiler, it might be able to do better, but a user provided vector class wouldn't benefit from the same optimisations.
Note that compilers are not beyond hardcoding invariants of standard library facilities: malloc, the C string functions, etc. are treated as primitives by the compiler (for example, the result of malloc cannot alias any other pointer). But for a complex class like vector it does not seem appropriate, and it is better to use compiler hooks in the implementation that are also available to the user (restrict, builtin functions, alias annotations, etc.).
but I am not good at reading this sort of thing. It looks like it has the same issue, but I didn't think Rust had the type ambiguity that C++ does around u8 / char?
On the other hand, if you use the more idiomatic `&mut [u8]`, that passes the pointer and size by value, so it should avoid the issue.
Edit: Using an iterator also avoids the problem, unlike in C++, because Rust iterators store their bounds within the iterator itself rather than calling iterable.end() on each iteration.
But iterating over the loop with .iter_mut is different than iterating over 0..v.len(), which is interesting.
Also you can use `-C target-cpu=native` and get AVX512 instructions on godbolt.
Because Rust bounds-checks indexes, which likely breaks LLVM's vectorisation (you'd have to reimplement vectorisation and vectorised bounds-checking in MIR for it to work, I guess, e.g. bounds-check the entire chunk before applying the vectorised operation).
No comments really, your blogpost seems to answer all my questions. Excellent work!
In this specific case, the compiler should be able to assume that a previous write cannot affect the next read from the size field, but it seems that neither GCC nor clang take advantage of it.
`#pragma omp simd` should force the vectorization, but maybe not the hoisting, I don't know. Would be nice to check.
<source>:5:5: warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
#pragma omp simd
But ICC seems happy with it.
You do need -fopenmp of course.
strangely, it still fails to vectorize the original vector example.
Good point about #pragma ivdep. I feel like the semantics of that one are a bit less than rigorous, but it's a useful tool.
It's not that the size field is a function of the iteration variable (i?), but that the previous write may have overwritten the size field: the size field is located at some arbitrary place in memory, the array backing the vector at some other arbitrary place, and the two may overlap. But that is only because the element type is uint8_t/char. If it were anything other than a char type (say uint32_t), the compiler could be sure they don't overlap, because the fields behind size() are pointers, which are a distinct type from uint32_t and not subject to the "char aliasing loophole" rule.
I realize now you were saying "[With ivdep] and in this specific case, the compiler should be able to assume..."
It makes sense now.
EDIT: ignore the following, I was mistaken. Infuriatingly the article doesn't benchmark signed 8-bit vectors. If the issue is due to the special aliasing properties of pointers to unsigned bytes, signed should just work really fast out of the box, without jumping through any hoops.
My understanding is that the special aliasing rules apply only to `char` and `unsigned char` in C++, not to `signed char`. In C, however, I think it applies to all three.
AFAIK char is a distinct type from signed char and unsigned char, regardless of whether it is signed or not. In practice it will have the same representation as those but will not be the same type:
So `signed char` would seem to be a way to get a strongly typed byte not subject to the aliasing rules (even if bare char is signed), but compilers don't take advantage of it as far as I can tell.