I wrote this just for fun after seeing an article about SSO in Rust[1]. My string can store up to 23 8-bit chars (excluding the null terminator) without calling the allocator.
I may be wrong here, but..
Curious fact: both libstdc++[2] and libc++[3] access a union member without checking whether it is the active one.
AFAIK, this is UB in C++, but I assume they just rely on their own compilers' behavior. I tried to avoid this using `std::byte[]`.
But I'm still sure there are several UBs in my code :)
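A minimal sketch of how 23 chars can fit in 24 bytes, assuming the common trick of storing the remaining capacity in the last byte. The names `InlineString`, `assign`, and `kCapacity` are hypothetical, and the heap path is left out entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical inline-only string: illustrates the layout, not a full SSO string.
class InlineString {
public:
    static constexpr std::size_t kCapacity = 23;

    InlineString() { assign(std::string_view{}); }

    // Returns false if `s` does not fit inline (a real string would allocate here).
    bool assign(std::string_view s) {
        if (s.size() > kCapacity) return false;
        if (!s.empty()) std::memcpy(buf_, s.data(), s.size());
        // Zero the unused tail so the text is always null-terminated.
        std::memset(buf_ + s.size(), 0, kCapacity - s.size());
        // The last byte holds the remaining capacity. It is 0 exactly when the
        // string is full, so it doubles as the null terminator in that case.
        buf_[kCapacity] = static_cast<std::byte>(kCapacity - s.size());
        return true;
    }

    std::size_t size() const {
        return kCapacity - std::to_integer<std::size_t>(buf_[kCapacity]);
    }

    // Reading std::byte storage through a char pointer is allowed.
    const char* c_str() const { return reinterpret_cast<const char*>(buf_); }

private:
    std::byte buf_[kCapacity + 1] = {};
};

static_assert(sizeof(InlineString) == 24);
```

A real implementation also needs a long mode (pointer, size, capacity) sharing the same 24 bytes, which is where the type-punning questions start.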
[1] https://tunglevo.com/note/an-optimization-thats-impossible-i...
[2] https://github.com/gcc-mirror/gcc/blob/d09131eea083e80ccad60...
[3] https://github.com/llvm/llvm-project/blob/4468d58080d0502a05...
Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]
However, it's true that in this case (I'm looking at libc++) the member isn't quite the same in both alternatives: In one case it's a `char:1` and in the other case a `size_t:1`. Also, in both cases it's nested inside an anonymous `struct __attribute__((packed))`, which means we're dealing with two different compiler extensions already. (Standard C++ supports anonymous unions,[3] but not anonymous structs.) So yes, pedantically speaking, they're relying on the compiler's behavior.
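For illustration, a minimal sketch of the well-defined case, using hypothetical `Short` and `Long` structs rather than libc++'s actual layout. Both are standard-layout and begin with the same `unsigned char`, so that member is in their common initial sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical representations, for illustration only.
struct Short {
    unsigned char is_long;  // shared first member
    char data[23];
};

struct Long {
    unsigned char is_long;  // same type, same position as in Short
    char* ptr;
    std::size_t size;
};

union Rep {
    Short s;
    Long l;
};

static_assert(std::is_standard_layout_v<Short> && std::is_standard_layout_v<Long>);
static_assert(std::is_standard_layout_v<Rep>);

// Reads the flag through `s` regardless of which member is active.
// This is permitted because `is_long` lies in the common initial sequence.
bool is_long(const Rep& r) {
    return r.s.is_long != 0;
}
```

Reading `r.s.data` while `r.l` is active would not be covered by this rule, since `data` lies outside the common initial sequence.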
> I tried to avoid this using `std::byte[]`
I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation (i.e., at constexpr time). C++20 and later require `std::string` to be constexpr-friendly, so that's probably relevant to the library vendors' choices here.
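A rough sketch of why a union is the easier route at constexpr time. `Storage` and its members are hypothetical names, and the discriminator is kept outside the union because constant evaluation rejects reads of an inactive member:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical storage type, for illustration only.
struct Storage {
    bool is_long;  // discriminator, kept outside the union
    union {
        char small[16];
        std::size_t heap_size;
    };

    // Each constructor initializes exactly one union member, so the compiler
    // knows which one is active during constant evaluation.
    constexpr Storage() : is_long(false), small{} {}
    constexpr explicit Storage(std::size_t n) : is_long(true), heap_size(n) {}

    constexpr std::size_t size_or_zero() const {
        return is_long ? heap_size : 0;  // only ever reads the active member
    }
};

// Both checks run entirely at compile time.
static_assert(Storage(42).size_or_zero() == 42);
static_assert(Storage().size_or_zero() == 0);

// With a raw `std::byte buf[16]` instead of the union, getting at a size_t
// would need something like `*reinterpret_cast<std::size_t*>(buf)`, and
// reinterpret_cast is not permitted in a constant expression.
```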
[1] https://eel.is/c++draft/class.mem#def:common_initial_sequenc...
[2] https://eel.is/c++draft/class.mem#general-28
[3] https://eel.is/c++draft/class.union.anon