Hacker News

> The C++ designers allowed you to avoid friction here by allowing you to pass a std::string to a function expecting a const char*. Technically, this is an operator method.

It's the other way round. You can pass a const char* to a function expecting a std::string. Passing a std::string to a function expecting const char* will generate a compile error.

You need to call c_str() on the std::string if you want to pass it as a parameter to a function expecting a const char*.
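A minimal sketch of both directions. The function names `length_of_string` and `length_of_c_string` are hypothetical; the `static_assert`s check the same two facts with `std::is_convertible`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// const char* -> std::string is implicit (std::string has a converting constructor);
// std::string -> const char* is not (std::string has no conversion operator).
static_assert(std::is_convertible<const char*, std::string>::value,
              "const char* converts implicitly to std::string");
static_assert(!std::is_convertible<std::string, const char*>::value,
              "std::string does not convert implicitly to const char*");

// Hypothetical function taking a std::string.
std::size_t length_of_string(const std::string& s) { return s.size(); }

// Hypothetical function taking a C string.
std::size_t length_of_c_string(const char* s) { return std::strlen(s); }

std::size_t call_both() {
    const char* literal = "hello";
    std::string str = "world!";
    std::size_t a = length_of_string(literal);        // OK: builds a temporary std::string
    // std::size_t b = length_of_c_string(str);       // would not compile
    std::size_t b = length_of_c_string(str.c_str());  // OK: explicit c_str()
    return a + b;                                     // 5 + 6
}
```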



I'm not sure if this is what OP was referring to, but in the ancient past before the STL was fully standardized, some implementations had an `operator const char*` in std::string to allow implicit conversions.
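For illustration, a sketch of what such a class looks like. `LegacyString` is a hypothetical stand-in for those pre-standard string classes, not `std::string`; the implicit `operator const char*` is what lets it be passed straight to a `const char*` parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical pre-standard-style string class (NOT std::string) with an
// implicit conversion operator, so it can be passed where const char* is expected.
class LegacyString {
public:
    LegacyString(const char* s) : data_(s) {}
    operator const char*() const { return data_.c_str(); }  // implicit conversion

private:
    std::string data_;
};

std::size_t c_length(const char* s) { return std::strlen(s); }

std::size_t call_with_legacy() {
    LegacyString s = "legacy";
    return c_length(s);  // compiles: operator const char*() is applied implicitly
}
```

The convenience has a cost: `const char* p = LegacyString("tmp");` also compiles, and `p` dangles as soon as the temporary is destroyed. An explicit `c_str()` call makes that kind of lifetime mistake easier to spot.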


> the ancient past before the STL was fully standardized

Specifically, the "ancient past" here is before C++ 11, when the standard finally decided to define how its string type actually works, because C++ 98 and C++ 03 strings are even more dangerous than most things in C++ and had to be put out of their misery.


> because C++ 98 and C++ 03 strings are both even more dangerous

... how so? They were just CoW, which I think is actually the better choice most of the time... now there are copies all over the place.


The C++ API lets you take references into the string. These are, understandably, lightweight: no C++ programmer would expect a reference to the sixth character of a string, s[5], to be expensive to make or carry around. But they're mutable, and because they are lightweight they are not reference counted...

So I've got the string "IR Baboon big star of cartoon" and I take references into it, which are cheap. Then you use your C++ 98 copy constructor to get another string, which of course also says "IR Baboon big star of cartoon" when you take it. Then I scrawl "I AM Weasel" over my string using my reference, and now your string has changed too, because it was COW.
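A deliberately naive copy-on-write string makes the scenario concrete. `NaiveCowString` is hypothetical and much simpler than a real library implementation: it shares the buffer on copy and detaches before handing out a mutable `char&`, but it does not track that reference afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical, deliberately naive copy-on-write string, for illustration only.
class NaiveCowString {
public:
    explicit NaiveCowString(const char* s)
        : buf_(std::make_shared<std::string>(s)) {}

    // Copying is cheap: both objects share one buffer (shared_ptr bumps a refcount).
    NaiveCowString(const NaiveCowString&) = default;

    // Mutable access: detach if the buffer is shared, then return a plain reference.
    // The reference itself is not tracked, so it survives later copies.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the "copy" in copy-on-write
        }
        return (*buf_)[i];
    }

    const std::string& str() const { return *buf_; }

private:
    std::shared_ptr<std::string> buf_;
};

std::string replay_naive_cow() {
    NaiveCowString mine("IR Baboon big star of cartoon");
    char& ref = mine[0];          // cheap reference into my (not yet shared) buffer
    NaiveCowString yours = mine;  // copy shares the same buffer
    ref = 'X';                    // scribble through the old reference...
    return yours.str();           // ...and "your" copy sees it: "XR Baboon big star of cartoon"
}
```

The hazard is that the detach happens when the reference is created, not when it is used.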

If you liked COW for this purpose, Rust has std::borrow::Cow, a smart pointer with a similar flavour. Cow<T> is a sum type: either an owned value (so you could modify it) or a borrowed reference, perhaps &T (so you can't modify it), which promises you could get an owned value (e.g. for strings, by deep-copying the string) if you need one. Methods that would be OK to call on the immutable reference (e.g. asking how many times an ASCII digit appears in the string) work on Cow<T>, and if you find you need to mutate it (maybe in a rare case) you can ask the Cow for the mutable version: if it already holds the owned version you get that, and if not it makes one for you.

Rust's traits kick in here. Cow<T> requires T: ToOwned, a trait saying "from an immutable reference to this type I can make a new value you own". Types you shouldn't do that to simply don't implement ToOwned, so you can't make a Cow of them. In particular, the standard library implements ToOwned for str, with String as the owned type, which is what makes Cow<str> work.
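Since the code in this thread is C++, here is a rough C++17 analogue of the `Cow<str>` behaviour described above rather than Rust itself. `CowStr`, `to_mut` and `count_ascii_digits` are hypothetical names (borrowing Rust's `to_mut` naming). Unlike Rust, nothing here stops the borrowed text from being freed while a `CowStr` still points at it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical C++17 analogue of Rust's Cow<str>: either borrows text it does
// not own (std::string_view) or owns a std::string.
class CowStr {
public:
    explicit CowStr(std::string_view borrowed)
        : v_(std::in_place_type<std::string_view>, borrowed) {}
    explicit CowStr(std::string owned)
        : v_(std::in_place_type<std::string>, std::move(owned)) {}

    // Read-only access works in either state.
    std::string_view view() const {
        if (const auto* owned = std::get_if<std::string>(&v_)) return *owned;
        return std::get<std::string_view>(v_);
    }

    bool is_owned() const { return std::holds_alternative<std::string>(v_); }

    // Like Cow::to_mut: deep-copies the borrowed text on first mutable access.
    std::string& to_mut() {
        if (const auto* borrowed = std::get_if<std::string_view>(&v_)) {
            std::string_view text = *borrowed;  // copy the view before replacing it
            v_.emplace<std::string>(text);
        }
        return std::get<std::string>(v_);
    }

private:
    std::variant<std::string_view, std::string> v_;
};

// Read-only operation: needs no ownership.
std::size_t count_ascii_digits(std::string_view s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c >= '0' && c <= '9') ++n;
    }
    return n;
}
```

Reading through `view()` never copies; only the first `to_mut()` on a borrowed `CowStr` pays for a deep copy.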


> The C++ API lets you take references into the string

> and then I scrawl "I AM Weasel" on top of my string using my reference and now your string was changed because it was COW.

I mean, that's the point of references... no ? If I wanted a different object I'd make a copy.

Like, even with just one string, without any CoW, your post makes it sound like you'd be surprised that if you had:

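    #include <iostream>
    #include <string>

    // Stores a raw pointer into the caller's buffer, not a copy of the text.
    static const char* g_config = nullptr;
    void set_some_config(const char* p) { g_config = p; }
    const char* get_some_config() { return g_config; }

    int main() {
        std::string s = "foo";
        set_some_config(&s[1]);          // points into s's buffer
        s = "bar";                       // in practice reuses that buffer (SSO)
        std::cout << get_some_config();  // strictly, the assignment may invalidate the pointer
    }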
you'd get "ar" in get_some_config().


> If I wanted a different object I'd make a copy.

In the explanation I posted, you do make a copy to get a different object "you use your C++ 98 copy constructor to get another string".

The problem happens because both strings share the same bytes to represent the text "IR Baboon big star of cartoon" as part of the COW optimisation. But my reference can scribble on this shared text.

I don't see how your get_some_config is similar at all. Notice that with C++ 11 strings, the copy constructor gives you a deep copy of that "IR Baboon" text and so my references can't smash your string.
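For contrast, the same steps with a standard `std::string` (C++ 11 or later). `replay_with_std_string` is a hypothetical name; the copy owns its own characters, so writing through a reference into the original leaves the copy alone.

```cpp
#include <cassert>
#include <string>

std::string replay_with_std_string() {
    std::string mine = "IR Baboon big star of cartoon";
    char& ref = mine[0];       // reference into my string's buffer
    std::string yours = mine;  // C++ 11 and later: yours gets its own copy of the text
    ref = 'X';                 // changes only my string
    return yours;              // still "IR Baboon big star of cartoon"
}
```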


CoW strings with atomic reference counting were definitely the wrong choice for a multi-core universe. The performance penalty is way too high. If you need that semantic there are other ways to get it.


> Performance penalty way too high?

Is a single atomic increment really that expensive? I mean we are not even talking about a full memory barrier here, just the atomic increment's implied acquire and release on the single variable. Other operations not dependent on a subsequent read could still be re-ordered in both directions.

And also keep in mind that the alternative was copying the whole string instead, which means both a heap allocation (often pretty expensive, even with per-core heaps) and the actual copying. Unless a platform has a terrible implementation of atomic increment, or you have a std::string that is frequently copied on multiple cores (enough to cause meaningful contention), I would have expected the copying implementation to be slower. But I'm not super familiar with the timings of these things, so I certainly could be mistaken.
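To make the comparison concrete, here is a sketch of the two copy strategies. `Rep`, `cow_copy`, `cow_release` and `deep_copy` are hypothetical names for a hand-rolled shared representation, not any library's internals. A CoW copy is one atomic read-modify-write on a counter that every copy shares (and so on a cache line other cores may be touching); a deep copy touches no shared state but copies every byte and may allocate. The memory orderings follow the common reference-counting pattern: a relaxed increment, and an acquire-release decrement so the owner that frees the buffer sees the other owners' earlier accesses.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical shared representation for a CoW string: text plus an atomic refcount.
struct Rep {
    explicit Rep(std::string t) : refs(1), text(std::move(t)) {}
    std::atomic<int> refs;
    std::string text;
};

// CoW copy: O(1), a single atomic increment on the shared counter.
Rep* cow_copy(Rep* r) {
    r->refs.fetch_add(1, std::memory_order_relaxed);
    return r;
}

// Drop one owner; the last owner frees the representation.
void cow_release(Rep* r) {
    if (r->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete r;
    }
}

// Deep copy: O(n) bytes copied, plus a heap allocation unless SSO applies.
std::string deep_copy(const std::string& s) { return s; }
```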

My understanding was that the change was more about being able to set proper bounds on some operations: ensuring .c_str() is always O(1) rather than sometimes O(n), and similarly for string writes, etc.


Copying short strings does not necessarily involve an allocation in implementations using short string optimization. Shooting down the cache line in a remote CPU that happens to have used a frequently-used string recently is absurdly expensive by comparison.
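A rough way to see the short string optimization at work is to check whether a string's characters live inside the `std::string` object itself. `stored_inline` is a hypothetical helper that relies on implementation-specific layout; the standard does not require SSO, and the inline capacity differs between libraries (as far as I know libstdc++, libc++ and MSVC all keep at least 15 characters inline).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if s.data() points inside the std::string object
// itself, i.e. the characters are stored inline (SSO) rather than on the heap.
// Relies on implementation details; for illustration only.
bool stored_inline(const std::string& s) {
    auto begin = reinterpret_cast<std::uintptr_t>(&s);
    auto end = begin + sizeof(std::string);
    auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}
```

On those implementations, a copy of `std::string("IR Baboon")` is stored inline, so copying it needed no allocation, while a copy of `std::string(100, 'x')` has to put its characters on the heap.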


The COW and short string optimizations are not mutually exclusive. If we assume short string optimization is implemented both before and after, then we are back to comparing the atomic increment to allocation. And different allocation approaches can make the cost of heap allocation differ quite substantially. I'd fully expect that some allocation approaches are cheaper than the cache line invalidation from atomic increment, but some others that tend to involve a lot of pointer chasing can be rather costly.

Certainly plenty of widely copied strings are short strings, so a COW implementation that lacks the short-string optimization could very easily be a bad bottleneck for multi-core compute.


You have accurately described the GNU CoW string :-)

My impression through the fog of history is that what happened was a really clever GNU person with little foresight and no access to an SMP system implemented std::string with CoW. Its performance in practice was so poor that the standard committee intentionally changed the standard to make it an illegal implementation, thereby eradicating the GNU CoW string. There was no higher principled logic.


Yet more recent benchmarks show that there are pretty important use cases where a CoW string can be faster:

https://blogs.msmvps.com/gdicanio/2016/07/09/is-copy-on-writ...

https://oribenshir.github.io/afternoon_rusting/blog/copy-on-...

Also, the point of that change was to improve multithreaded use of strings, and I think that very idea is problematic. I've written hundreds of thousands of lines of C++ at this point, and the number of times strings were really, by design, supposed to be shared across threads can honestly be counted on the fingers of one hand, just like e.g. the justification for using Arc over Rc in Rust. 99% of string handling is done as some GUI work on the main thread, or as part of some task processing in a network thread, and it stays in that thread.


Clearly there's a frontier where the cost situation begins to favor the CoW approach, and I think authors should consciously choose whether they want a CoW string or not based on their use case, but that goes against the idea of std::string as a jack-of-all-trades. Personally I don't really like std::string as a concept; it overlaps with too many other concepts. Is it just vector<char>, or std::unique_ptr<char[]> with SSO? The latter is nice in cases where you want std::string to adopt or release existing memory. Or do you want something like absl::Cord, which is like the old GNU CoW string but with even more stunts under the hood?


Perhaps, but in the ancient past before STL was standardized, Chrome didn't exist.

10 years ago (when the parent mentioned they were still at Google), C++11 was already out.


While it's true that the Standard Template Library is from a "long time" ago, being a 1990s project, the poster's phrase "before STL was standardized" actually refers to C++ 98 and C++ 03, where the C++ standards don't specify std::string internals.

Originally C++ doesn't have a string type; the C++ 98 standard does standardize one, but only loosely. Most implementations do something "clever" which turns out to be a bad idea (this is a recurring theme in C++). Only in C++ 11 does the standard say: OK, we'll prescribe how the string class actually works, making it more complicated but hopefully avoiding the worst problems.

Chrome was launched in 2008, and much of its internal structure was far older having incorporated work by Mozilla and Apple.


> work by Mozilla and Apple

Don’t forget the origin of WebKit, KHTML from the KDE folks.


Ten years ago was 2012. C++11 came out in 2011. Do you believe a big codebase like Chrome would be converted to C++11 less than a year after the spec was published? I find that unlikely, but I've never worked on such a big codebase so I wouldn't know.


The STL was mostly standardised in C++98, 1998. There were additions in C++03 and C++11, but nothing like removing this type of overload.

Long-running systems were still using pre-standardisation string libraries up to 2012, however, so you may well have come across such projects.


> nothing like removing this type of overload.

Bzzt. C++ 11 completely overhauls how std::string is defined.


> C++11 came out in 2011

C++0x was a thing, with varying levels of support from all major compilers, for years before C++11 was finally ratified.

In the context of my original comment though, no matter how dated the code base, I think it unlikely that Chrome was using any variant of std::string that had an implicit conversion operator for const char* such that string could be passed as a parameter to a function taking const char* without needing to call c_str().


Yeah, they have no excuse. They mostly just don't care. And why should they? Google cares nothing for them.



