These short strings would actually NOT get the COW (copy-on-write) optimization: they're structs, so cloning one clones its embedded string as well (no pointers are involved for strings under 24 bytes).
This is how the Dinkumware implementation of std::string has behaved for years. The basic form of the structure is a buffer and a pointer. If the pointer is set, it points to a heap-allocated string and follows COW semantics. Otherwise the inline buffer is used, which is what gives you the small string optimization.
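To make the buffer-plus-pointer idea concrete, here is a minimal C++ sketch. It is not Dinkumware's actual layout or code: the class name SmallString, the 23-character inline capacity, the use of std::shared_ptr for the reference count, and all the member names are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical illustration type; not any vendor's real std::string layout.
class SmallString {
public:
    // Assumed threshold: strings up to 23 chars live in the inline buffer.
    static constexpr std::size_t kInlineCapacity = 23;

    explicit SmallString(const char* s) {
        const std::size_t n = std::strlen(s);
        if (n <= kInlineCapacity) {
            std::memcpy(buf_, s, n + 1);  // copy including the terminator
        } else {
            heap_ = std::make_shared<std::string>(s, n);
        }
    }

    // The compiler-generated copy copies buf_ byte for byte and shares heap_,
    // so short strings are cloned and long strings are shared until written.

    bool is_inline() const { return heap_ == nullptr; }

    const char* c_str() const { return heap_ ? heap_->c_str() : buf_; }

    // True when both objects currently read from the same heap block.
    bool shares_storage_with(const SmallString& other) const {
        return heap_ && heap_ == other.heap_;
    }

    // Mutating access: detach from a shared heap block before writing.
    void set_char(std::size_t i, char c) {
        assert(i < std::strlen(c_str()));
        if (heap_) {
            if (heap_.use_count() > 1) {
                heap_ = std::make_shared<std::string>(*heap_);  // copy on write
            }
            (*heap_)[i] = c;
        } else {
            buf_[i] = c;  // inline storage is never shared
        }
    }

private:
    char buf_[kInlineCapacity + 1] = {};  // inline storage for short strings
    std::shared_ptr<std::string> heap_;   // null means "use buf_"
};
```

Copying a short SmallString duplicates the embedded buffer, so there is nothing to share and no COW. Copying a long one only bumps the reference count, and the first set_char on a shared copy is what triggers the real allocation. A production implementation would typically overlap the buffer and the pointer in a union to save space; the sketch keeps them separate for readability.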
I suppose I should have elaborated more. It just feels like the OP is "discovering" the wheel.
BTW, I think (though I can't be sure) that both the GCC and MSVC std::string implementations use this optimization in release mode. I gotta dash and don't have time to verify it, so take that with a grain of salt.