libstdc++ and libc++ are both suboptimal in using available bits for SSO, but SSO23 [0] provides a string type which is and provides an interface compatible with std::string.
Optimal in using bits, but possibly at a loss of performance (it needs to subtracts two numbers to extract string length)
(possibly because a) my observation that It needs an extra operation may be incorrect (I didn’t count all the binary ops in both algorithms) and b) I have no idea whether that makes any significant difference on modern hardware)
If there is loss of performance, it might be enough to offset the loss of ability to do SSO on strings of length 23 (how common are those? I would guess less than one in every million strings has that length)
Also, severely nitpicking: C++03 doesn’t say anything about the time complexity of c_str (https://softwareengineering.stackexchange.com/a/124784), so in that version of the language, we can do a tiny bit better, and use the wasted bits in the last byte to store a 24th character of the string, if that character wouldn’t make us decode the last byte as a valid string length.
C++11 said c_str must be constant time, so that wouldn’t work there anymore.
> C++11 said c_str must be constant time, so that wouldn’t work there anymore.
How so? Any SSO scheme should allow you to determine whether a string is short or not in constant time. Copying a 24-character string into a 25-byte zero-terminated buffer takes constant time, so c_str could simply do the copy.
You need to find space to copy the data to, too. If you allocate that space, your allocator needs to be constant time. I don’t know of any even somewhat competitive allocator with that property (a bumping one that gives up when it runs out of addresses would work, but isn’t competitive)
You could set aside a buffer to use, but you would potentially need one for every string in your program of which the compiler cannot prove that c_str won’t be called on it.
I think the only way to do that would be to use SSO and allocate a memory buffer, and make it possible to find that buffer ∞ O(1) from the ‘SSO’ version. That’s doable, but I wouldn’t call it an optimization anymore.
Also, doing that and keeping the operation constant time would mean you would have to spend an equivalent amount of time on non-short strings. That can be done, but not in a performant way.
Greetings, fellow language lawyer! Pedantic you are, but this time you were not quite pedantic enough :)
I think you're conflating two different meanings of "constant-time". The one the C++ standard cares about is "the time this takes can be upper-bounded by some constant that doesn't depend on the string length or other stuff."
There is a different idea of constant time that's used in crypto when people worry about timing side channels: "the time this takes should be exactly the same, every time."
So, no you need not "spend an equivalent amount of time" on long strings. As you pointed out, you DO have to make sure your memory allocator is good enough. But actually this is easier than you admit: you don't need constant time guarantees for arbitrary allocations; you only need them for allocations of EXACTLY 25 bytes. In practice, you might get this for free, since many allocators try to optimize small allocations.
But the other problem you alluded to sinks this proposal, even in C++98. Because, like you said, you have only two choices:
1. pre-allocate emergency buffer, which defeats the point, or
2. allocate "just in time" which might FAIL if you run out of memory
Since C++ does not allow c_str to ever fail, you can't rely on dynamic allocation to implement it. Which is a bit of a pity since in practice, if allocations that small are failing you have bigger problems anyway most likely.
[0]: https://github.com/elliotgoodrich/SSO-23