
Incrementing vectors - BeeOnRope
https://travisdowns.github.io/blog/2019/08/26/vector-inc.html
======
slavik81
Ugh. Aliasing rules in C/C++ are very unfortunate, but it's hard to tighten
the rules after the fact.

Related: The Strict Aliasing Situation is Pretty Bad (2016)
[https://blog.regehr.org/archives/1307](https://blog.regehr.org/archives/1307)

~~~
pingyong
>but it's hard to tighten the rules after the fact.

Depends on what you mean by tighten the rules. If you tighten the rules for
the optimizer by simply allowing everything to alias, that should not
introduce bugs into currently well-formed programs. (Except that they might be
slower.)

------
amelius
The essence:

> That means that writes through uint8_t pointers are treated as potentially
> updating any memory of unknown provenance. So the compiler assumes that
> every increment v[i]++ potentially modifies size() and hence must
> recalculate size() every iteration.

However, I think that a _smart_ compiler would still be able to sort out that
size() is not affected by such writes.
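To make the quoted point concrete, here is a minimal sketch of the two loop shapes under discussion. The function names `inc_reload` and `inc_hoisted` are made up for illustration, and nothing here claims that a particular compiler vectorizes either version; the thread below notes that hoisting the size alone was not enough in the blog post, so this sketch hoists the data pointer as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// size() is evaluated on every iteration. Because v[i]++ stores through a
// uint8_t lvalue, the compiler may have to assume the store changed the size.
void inc_reload(std::vector<std::uint8_t>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i]++;
    }
}

// Hypothetical workaround: copy the size and the data pointer into locals
// before the loop, so no store inside the loop can be seen to change them.
void inc_hoisted(std::vector<std::uint8_t>& v) {
    const std::size_t n = v.size();
    std::uint8_t* const p = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        p[i]++;
    }
}
```

Both functions compute the same result; the difference is only in what the optimizer is allowed to assume about the loop bound and base pointer.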

~~~
gpderetta
how, without whole program compilation?

edit: specifically for std vector, the implementer could somehow hardcode the
knowledge that size can never alias with the buffer, but that would be ad hoc
and won't extend to user defined containers, which seems suboptimal.

~~~
tom_mellior
The compiler could generate both vectorized and non-vectorized variants and
decide dynamically which one to use based on a simple aliasing check. That's
an approach that would be more usual for JIT compilers than AOT compilers, but
habits can change.
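Written out by hand, the "two variants plus a dynamic check" idea might look roughly like the sketch below. The names `ranges_overlap` and `add_arrays` are hypothetical, and `__restrict` is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. A compiler doing this itself would emit the check and both loops from a single source loop.

```cpp
#include <cstddef>
#include <cstdint>

// True if the float ranges [a, a+n) and [b, b+n) share at least one byte.
// Addresses are compared as integers because relational comparison of
// pointers into unrelated arrays is unspecified.
bool ranges_overlap(const float* a, const float* b, std::size_t n) {
    const auto pa = reinterpret_cast<std::uintptr_t>(a);
    const auto pb = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(float);
    return pa < pb + bytes && pb < pa + bytes;
}

void add_arrays(float* x, const float* y, std::size_t n) {
    if (ranges_overlap(x, y, n)) {
        // Conservative variant: strictly in-order, one element at a time.
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += y[i];
        }
    } else {
        // Fast variant: the check above justifies promising no aliasing,
        // which leaves the compiler free to vectorize this loop.
        float* __restrict xr = x;
        const float* __restrict yr = y;
        for (std::size_t i = 0; i < n; ++i) {
            xr[i] += yr[i];
        }
    }
}
```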

~~~
BeeOnRope
Yeah I came here to say this: I don't see a good way to prove this at compile
time, but you could always do a dynamic check.

In general, compilers don't generate runtime checks like this. In the case of
aliasing problems, however, they actually _do_ make such checks, but usually
only in the case of possibly overlapping arrays. An example:

[https://godbolt.org/z/9vZfl_](https://godbolt.org/z/9vZfl_)

Both gcc and clang make a check to see if the x and y arrays overlap, and if
they do they fall back to a scalar loop (instructions ending in ss: scalar
single-precision) - but if there is no overlap they proceed with the
vectorized approach (instructions ending in ps: packed single-precision).

So I guess these cases were important enough that they got specific handling
in the compiler, but I haven't seen it for possible aliasing with a scalar,
even though the same type of check could be used. Actually it's a bit
trickier, because the compiler wants to know the array size to make this
calculation, but the array size itself, which comes from size() is the thing
subject to aliasing! So it takes a more complex analysis to show that it's
safe.
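One way to picture the "more complex analysis" is the hand-written sketch below. `ByteBuf` is a made-up stand-in for a container header, not the real layout of std::vector, and `header_in_buffer` and `inc_all` are hypothetical names. The idea is to read the size once, then check that the header itself does not lie inside the bytes the loop is about to write; if it does not, no store in the loop can change the size or the data pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical container header (an assumption for illustration only).
struct ByteBuf {
    std::uint8_t* data;
    std::size_t size;
};

// True if any byte of the header *b lies inside [b->data, b->data + n).
bool header_in_buffer(const ByteBuf* b, std::size_t n) {
    const auto h = reinterpret_cast<std::uintptr_t>(b);
    const auto d = reinterpret_cast<std::uintptr_t>(b->data);
    return h < d + n && d < h + sizeof(ByteBuf);
}

void inc_all(ByteBuf* b) {
    const std::size_t n = b->size;  // read once, before any store
    if (!header_in_buffer(b, n)) {
        // Stores to data[0..n) cannot touch the header, so the local
        // copies of size and data stay valid for the whole loop.
        std::uint8_t* const p = b->data;
        for (std::size_t i = 0; i < n; ++i) {
            p[i]++;
        }
    } else {
        // Conservative fallback: reload size and data on every iteration.
        for (std::size_t i = 0; i < b->size; ++i) {
            b->data[i]++;
        }
    }
}
```

The check uses the size value read before the loop, which is safe because the fast path only ever writes within the range that value describes.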

~~~
gpderetta
If I'm reading the generated asm correctly (which I may well not be), the
following function is vectorized conditional on a runtime alias check (the
non-vectorized case is completely removed if the pragma is uncommented). So
even scalars seem to be handled:

    void inc(float* x, float* c, size_t n)
    {
      //#pragma GCC ivdep
      for (int i = 0; i < n; ++i)
        x[i] += *c;
    }

the issue really seems to be that the compiler just gives up very early if the
iteration count is not a loop invariant.

~~~
BeeOnRope
Yeah, I read it the same as you: so alias checks can definitely happen even
for single values.

It seems like it's not just the loop count that's the problem: in the blog
post, even when size is a local, it fails to check whether the vector's array
pointer itself can alias.

I.e., analogous to incB here:

[https://godbolt.org/z/ptQZcs](https://godbolt.org/z/ptQZcs)

incA is like your example: the possible aliasing is between the array and the
value added, and an alias check occurs. incB is like the v[i] example: the
possible aliasing is between the array and the pointer to that same array, so
no check occurs and conservative code is generated.
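The linked snippet is not reproduced in the thread, so the following is only a guess at the two shapes being contrasted; the signatures are assumptions, not the code behind the link. In the first, the stores might alias the value being added. In the second, the stores might alias the pointer through which the array is reached, which is the situation with a vector's internal data pointer.

```cpp
#include <cstddef>
#include <cstdint>

// incA-like shape: a[i] might alias *delta, so *delta may need to be
// reloaded after each store unless a runtime overlap check rules that out.
void incA(std::uint8_t* a, const std::uint8_t* delta, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += *delta;
    }
}

// incB-like shape: (*pa)[i] might alias *pa itself, so the base pointer
// may need to be reloaded after each store.
void incB(std::uint8_t** pa, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        (*pa)[i]++;
    }
}
```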

Interesting...

------
totalperspectiv
I'm trying to see if rustc does the same thing:
[https://godbolt.org/z/6WcYMf](https://godbolt.org/z/6WcYMf)

but I am not good at reading this sort of thing. It looks like it has the same
issue, but I didn't think rust had the type ambiguity that C++ does around u8
/ char?

~~~
lytigas
Regardless of what exactly rustc does right now, in the future Rust's strict
aliasing rules will basically ensure Rust does this optimization in every
possible case, because Rust reasons at compile time to ensure no mutable
references ever alias. Things have been held up for a while on this LLVM bug,
though[0].

[0] [https://github.com/rust-lang/rust/issues/54878](https://github.com/rust-lang/rust/issues/54878)

~~~
yoshuaw
Last I heard there's a fair chance that this might be implemented at the MIR
level instead, allowing it to be unblocked from the LLVM bug.

------
dragontamer
Oh, it's BeeOnRope. I was wondering who could have investigated this detail to
such a fine degree.

No comments really, your blogpost seems to answer all my questions. Excellent
work!

------
mihaitodor
@BeeOnRope Regarding comments on the blog, have you considered using
[https://utteranc.es/](https://utteranc.es/)? It stores them as GitHub issues.
I think [https://staticman.net/](https://staticman.net/) does something
similar.

------
gpderetta
re Disappointment, the somewhat portable #pragma GCC ivdep in theory would
help in a lot of cases.

In this specific case, the compiler should be able to assume that a previous
write cannot affect the next read from the size field, but it seems that
neither GCC nor Clang takes advantage of it.

~~~
TApplencourt
In HPC, we use `#pragma omp simd`.

`#pragma omp simd` should force the vectorization, but maybe not the hoisting,
I don't know. Could be nice to check.

Edit:

Clang fails:

    <source>:5:5: warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
        #pragma omp simd
        ^
    1 warning generated.
    Compiler returned: 0

But ICC seems happy with it.
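For reference, a minimal sketch of how the pragma is applied to a plain pointer loop. The function name `inc_simd` is made up, and whether the loop actually gets vectorized depends on the compiler and flags. With GCC and Clang the pragma is only honored when OpenMP support is enabled (for example `-fopenmp` or `-fopenmp-simd`); otherwise it is ignored and the loop still runs as ordinary scalar code.

```cpp
#include <cstddef>
#include <cstdint>

void inc_simd(std::uint8_t* p, std::size_t n) {
    // Request vectorization of the loop that follows. This is a request
    // about iterations being independent, not a guarantee of any codegen.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        p[i]++;
    }
}
```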

~~~
gpderetta
it works on my test:

[https://godbolt.org/z/sSSdjw](https://godbolt.org/z/sSSdjw)
where ivdep fails. Nice!

You do need -fopenmp of course.

edit:

strangely, it still fails to vectorize the original vector example.

------
miga
What about AVX?
[https://stackoverflow.com/questions/41086366/how-to-incremen...](https://stackoverflow.com/questions/41086366/how-to-increment-a-vector-in-avx-avx2)

`-mavx2`?

~~~
stagger87
What about it?

------
heyiforgotmypwd
More so than the compiler, it depends on the design goals that are implemented
in the width and speed of the processor's registers, SIMD vector units and
conventional integer units. For x86, one could readily guess un/signed ints
(32 & 64-bit) would be fast; un/signed 8/16-bit math is unlikely to be as
fast. That hypothesis is backed up by this article's data.

~~~
tom_mellior
Did you read to the end? The final results show that in the end the
computation on the vectors with 8-bit elements is 4x as fast (per element) as
the computation on the vectors with 32-bit elements.

EDIT ignore the following, I was mistaken: Infuriatingly the article doesn't
benchmark signed 8-bit vectors. If the issue is due to the special aliasing
properties of pointers to _unsigned_ bytes, signed should just work really
fast out of the box, without jumping through any hoops.

~~~
slavik81
The issue is due to the special properties of C character types, which include
char, signed char and unsigned char. uint8_t and int8_t are not required to be
character types, but they're typically implemented as typedefs of the char
types, so they pick up the same properties. Here's the GCC bug discussing this
problem:
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110)
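Whether the fixed-width types are character types on a given implementation can be checked at compile time. The sketch below is one way to do that; the constant names are made up, and the result is a property of the toolchain rather than anything the standards guarantee.

```cpp
#include <cstdint>
#include <type_traits>

// True where uint8_t is just another name for a character type, in which
// case it inherits the character types' special aliasing behaviour.
constexpr bool kUint8IsCharType =
    std::is_same<std::uint8_t, unsigned char>::value ||
    std::is_same<std::uint8_t, char>::value;

// Same question for int8_t.
constexpr bool kInt8IsCharType =
    std::is_same<std::int8_t, signed char>::value ||
    std::is_same<std::int8_t, char>::value;
```

A `static_assert(!kUint8IsCharType, "...")` in a project would flag toolchains where byte loops may hit the problem described in the article.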

~~~
tom_mellior
Thanks. I misremembered C's rules as only unsigned char having this special
property (as well as plain char if it's unsigned), but not signed char.

~~~
BeeOnRope
I think you are partly right.

My understanding is that the special aliasing rules apply only to `char` and
`unsigned char` in C++, not to `signed char`. In C, however, I think it
applies to all three.

AFAIK char is a distinct type from signed char and unsigned char, regardless
of whether it is signed or not. In practice it will have the same
representation as those but will not be the same _type_:

[https://godbolt.org/z/D6ylkf](https://godbolt.org/z/D6ylkf)

So `signed char` would seem to be a way to get a strongly typed char not
subject to the aliasing rules (even if bare char is signed), but compilers
don't take advantage as far as I can tell.
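The "three distinct types" point can be checked directly. This is a small sketch, not the code behind the link above; `CharKind` and `kind` are hypothetical names. The three overloads only compile because `char`, `signed char` and `unsigned char` are all different types, whatever the signedness of plain `char` happens to be.

```cpp
#include <type_traits>

static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

enum class CharKind { Plain, Signed, Unsigned };

// One overload per character type; a redefinition error would occur here
// if any two of the parameter types were the same type.
constexpr CharKind kind(char) { return CharKind::Plain; }
constexpr CharKind kind(signed char) { return CharKind::Signed; }
constexpr CharKind kind(unsigned char) { return CharKind::Unsigned; }

// Implementation-defined: whether plain char behaves as a signed type.
constexpr bool kPlainCharIsSigned = std::is_signed<char>::value;
```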

~~~
slavik81
Just to back you up, this post quotes both standards:
[https://stackoverflow.com/a/51228315/331041](https://stackoverflow.com/a/51228315/331041)

