When using a respectable language, is it guaranteed that these types of overflows are impossible? Suppose for example I call C++'s std::string::string(const char*) ... is it fundamentally impossible for it to fail in this way, or is it just UB if you pass it a huge string?
The constructor is guaranteed to either construct a valid `std::string` or throw a `std::bad_alloc` exception if the allocation fails. If there were a size limit past which calling the function is undefined, that would have to be noted in the function's documentation.
Honestly I don't see anything in the chapter and verse that supports your argument. It says that the constructor has a precondition that [s, s+traits::length(s)) is a valid range, which casts just a little uncertainty over the question without resolving it. It seems easy to create a range that is not valid (consider a very large starting position s, for example).
The requirements are perfectly well-defined. I'll be working with the N4868 draft of the C++20 standard [0]. First, `std::string::string(s)` is equivalent to `std::string::string(s, std::char_traits<char>::length(s))` (§21.3.3.3). Next, `std::char_traits<char>::length(s)` is equal to the smallest `size_t` value `i` such that `s[i] == 0` (§21.2.2, §21.2.4.2).
From here, we obtain our only preconditions for the call being valid: `s` must point to either a `char` equal to 0 or an element of an array of `char`s that is eventually followed by a 0 (§7.6.1.2, §7.6.6), and there must be no thread or signal handler concurrently modifying it (which would cause a data race, §6.9.2.2). These are necessary for each `s[i]` access in the length computation to be valid.
Letting `n` be the computed length, we have our final precondition, that `[s, s + n)` is a valid range (§21.3.3.3). "Valid range" has a specific meaning in the standard (§23.3.1):
> A sentinel `s` is called reachable from an iterator `i` if and only if there is a finite sequence of applications of the expression `++i` that makes `i == s`. If `s` is reachable from `i`, `[i, s)` denotes a valid range.
Since `s` must point to an array of `char`s of length greater than `n`, `s + n` is clearly reachable from `s`. Therefore, `[s, s + n)` denotes a valid range, given our preconditions.
The only length restriction in the standard is that the source string must be no more than `SIZE_MAX` characters long (excluding the null terminator), since otherwise `std::char_traits<char>::length(s)` would have no correct return value.
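To make the chain of steps above concrete, here is a minimal sketch of what the one-argument constructor is specified to do. `naive_length` and `construct_via_length` are hypothetical helper names of my own, not anything from the standard, and a real implementation is free to compute the length differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the smallest i such that s[i] == 0.
// Each s[i] read is only valid if s points into a 0-terminated char array.
std::size_t naive_length(const char* s) {
    std::size_t i = 0;
    while (s[i] != '\0') {
        ++i;
    }
    return i;
}

// Hypothetical helper: spells out the equivalence between std::string(s)
// and std::string(s, std::char_traits<char>::length(s)).
std::string construct_via_length(const char* s) {
    const std::size_t n = std::char_traits<char>::length(s);
    assert(n == naive_length(s));  // both definitions of the length agree
    return std::string(s, n);      // [s, s + n) is a valid range here
}
```

Calling `construct_via_length("hello")` should give the same result as `std::string("hello")`. The only way to get into trouble is to pass a pointer that does not satisfy the precondition on `s` in the first place.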
There is another thing I found string able to do: throw std::length_error. But that's consistent with your statement that it either succeeds or throws.
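As an illustration of the `std::length_error` case, here is a small sketch that asks a string for more characters than it can ever hold. It assumes `std::string::npos` is larger than `max_size()`, which is true on the mainstream implementations I know of; `oversized_request_throws` is a made-up helper name.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if a request larger than max_size()
// is reported as std::length_error instead of overflowing anything.
bool oversized_request_throws() {
    std::string s;
    try {
        s.reserve(std::string::npos);  // assumed to exceed s.max_size()
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

The oversized request is rejected before any allocation is attempted, so the failure shows up as an exception rather than as a wrapped-around size.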
So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.
> So, going back to my question, it seems like the small leap from C to C++, if the author embraces the STL, does categorically prevent this class of overflow.
"Categorically" is a bit strong; if any index were ever truncated to a 32-bit integer before writing to s[i], that could also cause major issues.
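A minimal sketch of the truncation I have in mind, with a hypothetical `truncate_index` helper standing in for any code path that stores an index in a 32-bit variable:

```cpp
#include <cstdint>

// Hypothetical helper: models an index passing through a 32-bit variable.
// Unsigned narrowing is well-defined and wraps modulo 2^32.
std::uint32_t truncate_index(std::uint64_t i) {
    return static_cast<std::uint32_t>(i);
}
```

With this, an index of `(1ULL << 32) + 5` comes back as `5`, so a later write through `s[i]` would land near the start of the buffer instead of where it was intended. Nothing in `std::string` can detect that, because the index it receives is already in bounds.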
Also, the C++ standard library does not have a monopoly on safe interfaces; all of the relevant APIs can be trivially reimplemented in C. It's just that C libraries have not historically made safe internal interfaces a priority.
That basically means that s+traits::length(s) must be defined behavior. If s is not a pointer to a 0-terminated string then the behavior of this constructor is undefined.