When using respectable language, is it guaranteed that these types of overflows ...

LegionMammal978 · on Oct 25, 2022

The constructor is guaranteed to either construct a valid `std::string` or throw an `std::bad_alloc` exception if the allocation fails. If there were a size limit past which calling the function is undefined, that would have to be noted in the function's documentation.

jeffbee · on Oct 25, 2022

Honestly I don't see anything in the chapter and verse that supports your argument. It says that the constructor has a precondition that `[s, s+traits::length(s))` is a valid range, which casts a little uncertainty on the question without resolving it. It seems easy to create a range that is not valid (consider a very large starting position `s`, for example).

LegionMammal978 · on Oct 25, 2022

The requirements are perfectly well-defined. I'll be working with the N4868 draft of the C++20 standard [0]. First, `std::string::string(s)` is equivalent to `std::string::string(s, std::char_traits<char>::length(s))` (§21.3.3.3). Next, `std::char_traits<char>::length(s)` is equal to the smallest `size_t` value `i` such that `s[i] == 0` (§21.2.2, §21.2.4.2).
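As a rough illustration of those two equivalences (not the library's actual implementation), the sketch below compares the one-argument constructor against the (pointer, length) form. The helpers `smallest_zero_index` and `ctor_forms_agree` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the smallest index i such that s[i] == 0.
// Assumes s points to a null-terminated array of char.
std::size_t smallest_zero_index(const char* s) {
    assert(s != nullptr);
    std::size_t i = 0;
    while (s[i] != '\0') {
        ++i;
    }
    return i;
}

// Hypothetical helper: true when char_traits<char>::length agrees with the
// definition above and both constructor forms produce the same string.
bool ctor_forms_agree(const char* s) {
    const std::size_t n = std::char_traits<char>::length(s);
    return n == smallest_zero_index(s) && std::string(s) == std::string(s, n);
}
```

Calling `ctor_forms_agree("hello")` should return `true`; the same holds for any valid null-terminated string, including the empty one.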

From here, we obtain our only preconditions for the call being valid: `s` must point to either a `char` equal to 0 or an element of an array of `char`s that is eventually followed by a 0 (§7.6.1.2, §7.6.6), and there must be no thread or signal handler concurrently modifying it (which would cause a data race, §6.9.2.2). These are necessary for each `s[i]` access in the length computation to be valid.

Letting `n` be the computed length, we have our final precondition, that `[s, s + n)` is a valid range (§21.3.3.3). "Valid range" has a specific meaning in the standard (§23.3.1):

> A sentinel `s` is called reachable from an iterator `i` if and only if there is a finite sequence of applications of the expression `++i` that makes `i == s`. If `s` is reachable from `i`, `[i, s)` denotes a valid range.

Since `s` must point to an array of `char`s of length greater than `n`, `s + n` is clearly reachable from `s`. Therefore, `[s, s + n)` denotes a valid range, given our preconditions.
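To make "reachable" concrete, here is a minimal sketch that counts the applications of `++i` needed to get from `s` to the sentinel `s + n`. The function name `steps_to_sentinel` is made up for this example, and the code assumes `s` points to a null-terminated string, as in the preconditions above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts how many times ++i must be applied before
// i == s + n, where n is the length of the null-terminated string at s.
// The loop terminates because the sentinel lies within the same array.
std::size_t steps_to_sentinel(const char* s) {
    const char* const sentinel = s + std::strlen(s);
    std::size_t steps = 0;
    for (const char* i = s; i != sentinel; ++i) {
        ++steps;
    }
    return steps;
}
```

The step count always equals `n`, so the sequence of increments is finite, which is what the quoted definition requires.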

The only length restriction in the standard is that the source string must be no more than `SIZE_MAX` characters long (excluding the null terminator), since otherwise `std::char_traits<char>::length(s)` would have no correct return value.

[0] https://open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4868.p...

jeffbee · on Oct 26, 2022

Thanks for the detailed answer.

I found one more thing the string constructor can do: throw `std::length_error`. But that's consistent with your statement that it either succeeds or throws.
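A small sketch of that failure mode, with one loud assumption: it uses the `(count, char)` constructor rather than the `const char*` one, since building a C string longer than `max_size()` is impractical. The helper `throws_length_error` is a hypothetical name for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if constructing a string of `count`
// characters throws std::length_error, false if construction succeeds.
// Any other exception (e.g. std::bad_alloc) propagates to the caller.
bool throws_length_error(std::size_t count) {
    try {
        std::string s(count, 'x');
        return false;
    } catch (const std::length_error&) {
        return true;
    }
}
```

On the common implementations, `std::string::npos` is larger than `max_size()`, so `throws_length_error(std::string::npos)` should report `true` without attempting an allocation.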

So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.

LegionMammal978 · on Oct 26, 2022

> So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.

"Categorically" is a bit strong; if any index were ever truncated to a 32-bit integer before writing to s[i], that could also cause major issues.

Also, the C++ standard library does not have a monopoly on safe interfaces; all of the relevant APIs can be trivially reimplemented in C. It's just that C libraries have not historically made safe internal interfaces a priority.
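To illustrate the truncation concern, here is a minimal sketch of what happens when a 64-bit index is narrowed to 32 bits. It assumes an unsigned 32-bit type so the conversion is well-defined; the function names are hypothetical.

```cpp
#include <cstdint>

// Narrowing a 64-bit index to 32 bits keeps only the low 32 bits.
std::uint32_t truncate_index(std::uint64_t i) {
    return static_cast<std::uint32_t>(i);
}

// True when the index is unchanged by the narrowing conversion,
// i.e. when it is safe to use the truncated value in s[i].
bool index_survives_truncation(std::uint64_t i) {
    return truncate_index(i) == i;
}
```

An index of 2^32 + 5 comes back as 5, so a write meant for the far end of a large buffer would silently land near its start.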

leni536 · on Oct 25, 2022

That basically means that s+traits::length(s) must be defined behavior. If s is not a pointer to a 0-terminated string then the behavior of this constructor is undefined.