It is seriously unfortunate that C++ managed to standardize std::string (in C++98), a not-very-good owning container type, but not std::string_view, the non-owning slice type you need far more often, until C++17, nearly two decades later.
Even if Rust had chosen to make &str literally just mean &[u8] rather than promising it is UTF-8 text, the fact that &str existed in Rust 1.0 was a huge win. Every API that doesn't care who owns the textual data can ask you for this non-owning slice type, whereas in C++ such an API had to either insist on ownership (std::string) or resort to 1970s hacks and take a char * to a zero-terminated string.
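For comparison, here is a minimal C++17 sketch of that non-owning style. The function name `count_char` is hypothetical and only for illustration: because it takes `std::string_view`, it accepts a `std::string`, a string literal, or a sub-slice of either without copying or allocating, much like a Rust function taking `&str`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical example API: counts occurrences of c in text.
// Taking std::string_view means the caller keeps ownership; only a
// pointer and a length are passed, so no copy or allocation happens here.
std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}
```

The same function serves `count_char(owned, 'a')` for a `std::string owned`, `count_char("banana", 'a')` for a literal, and `count_char(std::string_view(owned).substr(0, 3), 'a')` for a slice, where a `const std::string&` parameter would force a temporary copy in the last two cases.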
And then in modern C++ std::string cares about and preserves the stupid C-style zero termination anyway, so you're paying for it even if you never use it.
> And then in modern C++ std::string cares about and preserves the stupid C-style zero termination anyway, so you're paying for it even if you never use it.
I don't think this in itself is a real problem. You pay for one extra byte at the end, which is not much. The real cost of zero termination is having to scan the whole string to find its length; std::string stores its size, so that scan is only needed when you hand the string to C-style APIs.
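A small sketch of the difference, with hypothetical helper names `c_length` and `cpp_length`: `std::strlen` has to walk the buffer until it finds a `'\0'`, while `std::string::size()` just returns a stored value. An embedded NUL makes it visible that std::string's length does not depend on the terminator at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C-style length: std::strlen scans byte by byte until the first '\0',
// so the cost is proportional to the string length.
std::size_t c_length(const char* s) {
    return std::strlen(s);
}

// std::string length: the size is stored next to the data and returned
// directly; the trailing '\0' is never consulted.
std::size_t cpp_length(const std::string& s) {
    return s.size();
}
```

With `std::string s("ab\0cd", 5);`, `cpp_length(s)` is 5, but `c_length(s.c_str())` is 2, because the scan stops at the embedded NUL.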
If you want to pass a string_view to an API that expects NUL-terminated strings, a copy is necessary (well, in some cases you can cheat by writing a NUL just past the end of the view, remembering the original character, and restoring it after the API call).
This isn't so much a fault of string_view-style mechanisms as of APIs that want NUL-terminated strings, which are hard to avoid on mainstream systems today, even at the syscall interface (open(2), for example, takes a NUL-terminated path). Oh well.
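Both options can be sketched in C++17, using `std::atoi` as a stand-in for "some C API that expects a NUL-terminated string". The function names are hypothetical. `parse_int` pays for a copy into a `std::string`; `parse_int_in_place` is the cheat, and it assumes the slice lives in a writable buffer with at least one byte after it (not a string literal, and not shared with another thread during the call).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>

// Safe version: copy the view into a std::string (possibly allocating),
// whose c_str() is guaranteed to be NUL-terminated.
int parse_int(std::string_view digits) {
    std::string owned(digits);
    return std::atoi(owned.c_str());
}

// The "cheat": temporarily overwrite the byte just past the slice with '\0',
// call the C API, then restore the saved byte.
// Assumes begin[len] exists and is writable.
int parse_int_in_place(char* begin, std::size_t len) {
    char saved = begin[len];
    begin[len] = '\0';
    int value = std::atoi(begin);
    begin[len] = saved;
    return value;
}
```

Passing `line.substr(0, 2).data()` for `std::string_view line = "12345"` straight to `std::atoi` would read past the view and return 12345, not 12; that is exactly the mistake the copy (or the in-place NUL) avoids.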
Sure, but this thread was about the forced NUL terminator in std::string and the costs associated with it. If you need a NUL terminator (e.g. for a C API), you pay the copy cost (and, in the general case, an allocation) for substrings no matter how your internal strings are represented, unless you can munge the original string. In that case std::string is exactly the right abstraction for you.
But yeah, it would be nice if the kernel and POSIX APIs had better support for pointer+size strings.
> And then in modern C++ std::string cares about and preserves the stupid C-style zero termination anyway, so you're paying for it even if you never use it.
Is this required now? I've seen a system where the string was only null-terminated after calling .c_str().
Since C++11, c_str() has to have constant complexity, so I guess the memory for that null character must already be allocated. I'd be surprised to see an implementation that doesn't simply ensure the \0 is there all the time.
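A quick way to check this on a given implementation, as a sketch with a hypothetical helper `terminator_present`. My reading of the C++11 wording is that `c_str()` and `data()` both return a pointer `p` with `p + i == &s[i]` for every `i` in `[0, size()]`, and that `s[s.size()]` is `'\0'`; together with the constant-time requirement, that effectively means the terminator is kept in the buffer rather than appended lazily.

```cpp
#include <cassert>
#include <string>

// Checks that the byte at data()[size()] is already '\0' and that
// c_str() and data() return the same pointer, i.e. c_str() did not
// have to build or reallocate anything.
bool terminator_present(const std::string& s) {
    return s.data()[s.size()] == '\0' && s.c_str() == s.data();
}
```

On a conforming C++11 or later library this should hold for any string, including an empty one and one that has just grown past a `reserve()`d capacity.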
Ah, the system I ran into would've been pre-C++11.
I only saw it while trying to debug a heap issue, and I was surprised, because I thought surely it's already a null-terminated string, right? They also checked the heap allocation size, so c_str() would only reallocate if the length of the string data was a multiple of 8 (presumably because allocations were rounded up to 8 bytes, so only then was there no spare byte for the terminator).
Facebook / Meta had their own string type that did this. It turns out you now have an exciting bug: you end up assuming uninitialized values have properties, but they don't. Reading an uninitialized value is Undefined Behaviour, so your stdlib conspires with the OS to screw you in corner cases you'd never even thought about, because that saved them a few CPU cycles.
The bug will be crazy rare, but of course there are a lot of Facebook users, so if one transaction out of a billion goes haywire, and you have 100M users doing 100 transactions on average, the bug happens ten times. Good luck.
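Spelling out that back-of-the-envelope arithmetic, using the commenter's illustrative figures (not measured data):

```latex
\underbrace{10^{8}}_{\text{users}} \times \underbrace{10^{2}}_{\text{transactions per user}} = 10^{10}\ \text{transactions},
\qquad
10^{10} \times \frac{1}{10^{9}} = 10\ \text{failures}
```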