Show HN: SSO – Small String Optimization (github.com/feelamee)
4 points by feelamee 3 months ago | 4 comments
I wrote this just for fun after I saw an article about SSO in Rust[1]. My string can store up to 23 8-bit chars (excluding the null terminator) without calling the allocator.
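
For readers who haven't seen the idea before, here is a minimal sketch of how 23 chars plus a terminator can fit in 24 bytes with no allocation. It is not the code from the linked repository: `InlineString` and its members are hypothetical names, the heap path for longer strings is omitted, and the "store the remaining capacity in the last byte" trick is only one of several possible ways to do it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical illustration only; the long-string (heap) path is left out.
class InlineString {
public:
    static constexpr std::size_t capacity = 23;

    // Precondition: s.size() <= capacity.
    explicit InlineString(std::string_view s) {
        assert(s.size() <= capacity);
        if (!s.empty()) {
            std::memcpy(buf_, s.data(), s.size());
        }
        buf_[s.size()] = '\0';
        // The last byte holds the *remaining* capacity. For a full string
        // (23 chars) that value is 0, so it doubles as the null terminator.
        buf_[capacity] = static_cast<char>(capacity - s.size());
    }

    std::size_t size() const {
        return capacity - static_cast<std::size_t>(buf_[capacity]);
    }

    const char* c_str() const { return buf_; }

private:
    char buf_[capacity + 1];  // 24 bytes in total
};

static_assert(sizeof(InlineString) == 24, "same size as three 64-bit words");
```

A real implementation would also need a bit somewhere to tell the inline case from the heap case, which is where the union discussion below comes in.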

I may be mistaken here, but a curious fact: both libstdc++[2] and libc++[3] access a union member without checking that it is the active one. AFAIK, this is UB in C++, but I assume they just rely on their compilers' behavior. I tried to avoid this by using `std::byte[]`, but I'm still sure there are several instances of UB in my code :)

[1] https://tunglevo.com/note/an-optimization-thats-impossible-i...

[2] https://github.com/gcc-mirror/gcc/blob/d09131eea083e80ccad60...

[3] https://github.com/llvm/llvm-project/blob/4468d58080d0502a05...




> Curious fact: both libstdc++ and libc++ access a union member without checking that it is the active one.

Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]

However, it's true that in this case (I'm looking at libc++) the member isn't quite the same in both alternatives: In one case it's a `char:1` and in the other case a `size_t:1`. Also, in both cases it's nested inside an anonymous `struct __attribute__((packed))`, which means we're dealing with two different compiler extensions already. (Standard C++ supports anonymous unions,[3] but not anonymous structs.) So yes, pedantically speaking, they're relying on the compiler's behavior.
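
To make the common-initial-sequence rule concrete, here's a rough sketch in portable C++17. The layout is made up for illustration (`ShortRep`, `LongRep`, `Rep`, `is_long`, and `make_long` are hypothetical names, not what libc++ or libstdc++ actually use). Unlike the libc++ case, the shared member has the same type in both structs and there are no bit-fields or packed attributes, so the read below stays within the standard's guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical layout; both structs start with the same member.
struct ShortRep {
    unsigned char is_long;  // shared first member
    char data[23];
};

struct LongRep {
    unsigned char is_long;  // same type, same position: common initial sequence
    std::size_t size;
    char* ptr;
};

// The guarantee only applies to standard-layout types.
static_assert(std::is_standard_layout_v<ShortRep> &&
              std::is_standard_layout_v<LongRep>,
              "common initial sequence requires standard-layout structs");

union Rep {
    ShortRep s;
    LongRep l;
};

// Reads the flag through `s` no matter which member is active. This is
// permitted because `is_long` is in the common initial sequence.
inline bool is_long(const Rep& r) { return r.s.is_long != 0; }

// Assigning to `r.l` makes `l` the active member of the union.
inline void make_long(Rep& r, char* ptr, std::size_t size) {
    assert(ptr != nullptr);
    r.l = LongRep{1, size, ptr};
}
```

Reading anything past the shared prefix (say, `r.s.data` while `l` is active) is not covered by this rule.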

> I tried to avoid this using `std::byte[]`

I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation (i.e., at constexpr time). C++20 and later require `std::string` to be constexpr-friendly. So that's probably relevant to the library vendors' choices here.
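
As a small sketch of why a union is easier to work with at constexpr time: the compiler tracks the active member during constant evaluation, so reading and writing that member is fine, whereas `reinterpret_cast` is not allowed in a constant expression at all. The names below (`Heap`, `Storage`, `make_small`, `small_len`) are hypothetical, and the example sticks to what C++17 allows; switching the active member inside a constant expression needs C++20.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical storage type for illustration only.
struct Heap {
    char* ptr;
    std::size_t size;
    std::size_t cap;
};

union Storage {
    char small[24];
    Heap heap;

    // Each constructor picks the active member explicitly.
    constexpr Storage() : small{} {}              // zero-filled inline buffer
    constexpr explicit Storage(Heap h) : heap(h) {}
};

// Copies at most 23 chars into the inline buffer; the rest stays zero,
// so the result is always null-terminated.
constexpr Storage make_small(const char* text) {
    assert(text != nullptr);
    Storage s;  // `small` is the active member
    for (std::size_t i = 0; i < 23 && text[i] != '\0'; ++i) {
        s.small[i] = text[i];
    }
    return s;
}

// Precondition: `small` is the active member of `s`.
constexpr std::size_t small_len(const Storage& s) {
    std::size_t n = 0;
    while (s.small[n] != '\0') ++n;
    return n;
}

// Evaluated entirely at compile time. Reading `heap` from this object, or
// reinterpret_cast-ing a std::byte buffer to another type, would not be
// accepted in a constant expression.
static_assert(small_len(make_small("sso")) == 3);
```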

[1] https://eel.is/c++draft/class.mem#def:common_initial_sequenc...

[2] https://eel.is/c++draft/class.mem#general-28

[3] https://eel.is/c++draft/class.union.anon


> Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]

Wow, thanks for the detailed answer. I'm not that familiar with the standard. I'll explore your links.

> I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation

Probably, yes. Although I used `constexpr` in the implementation, I didn't test this. I'll add a task to the backlog (and never return to it, xd).


Interesting. The writing is a little unclear, but I enjoyed it nonetheless!

Here's my user test:

https://news.pub/?try=https://www.youtube.com/embed/tQXoCbUh...


Sure, I need to add a description for people who don't know what SSO is before they see my implementation.

Thanks for the review!



