

A C++ Immutable String class - cdmh
http://blog.reliablecpp.com/2013/09/a-c-immutable-string-class/

======
simfoo
If you can't access the website, here's the code:
[https://github.com/cdmh/cpp_immutable_string](https://github.com/cdmh/cpp_immutable_string)

------
modulus1
I guess I expected the whole point of an immutable string class would be to
have O(1) copies. What does this buy you beyond 'const std::string'?

~~~
cdmh
I'm toying with implementing O(1) copies by using a reference counted pointer
to basic_string instead of storing it by value within the class. It would mean
a small memory overhead and another level of indirection, but probably worth
doing. Thoughts?
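
A minimal sketch of that idea, assuming a `std::shared_ptr<const std::string>` as the reference-counted pointer. The class name `shared_immutable_string` and the helper `shares_buffer_with` are hypothetical and are not taken from the linked repository:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class: an immutable string whose copies share one buffer.
class shared_immutable_string {
public:
    explicit shared_immutable_string(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    // The implicit copy constructor copies only the shared_ptr
    // (a reference-count bump), so a copy is O(1) in the string length.

    const std::string& str() const noexcept { return *data_; }
    std::size_t size() const noexcept { return data_->size(); }

    // True when both objects refer to the same underlying buffer.
    bool shares_buffer_with(const shared_immutable_string& other) const noexcept {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<const std::string> data_;  // the extra level of indirection
};

// Usage: copying does not duplicate the characters.
inline void demo_shared_copy() {
    shared_immutable_string a("hello");
    shared_immutable_string b(a);
    assert(a.shares_buffer_with(b));
    assert(a.str().data() == b.str().data());
}
```

The costs are the ones mentioned above: one control block per string and a pointer hop on every access.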

~~~
alexchamberlain
Most implementations are reference counted with copy on write. No point.

~~~
nly
libc++ (the Clang project's C++ standard library) doesn't use COW for
std::string, and I'm pretty sure GNU libstdc++ will also drop it the next time
they have to break ABI. C++11 move semantics and the 'small string
optimisation', as it's known, blow away any performance benefit of COW for
most sane uses.

~~~
cdmh
COW is prohibited in C++11-conforming implementations of std::basic_string.

~~~
nly
I don't have a copy of the final 2011 standard, but as far as I can see
there's nothing in the latest 2011 draft of C++11 that expressly _prohibits_
COW.

[21.4.1.4] does say "Every object of type basic_string<charT, traits,
Allocator> shall use an object of type Allocator to allocate and free storage
for the contained charT objects as needed", but that "as needed" most
definitely leaves the door open for a COW implementation.

The only other part that I can see that may preclude a COW implementation is
the postconditions specified for copy operations [21.4.2], which says data()
returns a pointer which "points at the first element of an allocated copy of
the array whose first element is pointed at by str.data()". Again though,
"allocated copy" doesn't necessarily mean "a copy I just allocated". When I go
and get a copy of a book I don't literally go and copy it.

In fact the standard specifies that the move constructor leave the source
value "in a valid state with an unspecified value"... which again suggests you
could use COW and have the source argument return the same value it had before
you moved from it.

In the latest C++14 draft though (N3690), you're right, it is explicitly
prohibited because "Invalidation is subtly different with reference-counted
strings".

~~~
cdmh
21.4.1 p6 states that invalidation of iterators/references is only allowed in
two cases: (1) when the string is passed as an argument to any standard
library function taking a reference to non-const basic_string, and (2) when
calling non-const member functions, except operator[], at, front, back, begin,
rbegin, end, and rend.

The non-const operator[] in a COW string requires making a copy of the string
if the ref count > 1 which invalidates references and violates this paragraph.
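
To make that concrete, here is a toy copy-on-write string, written as a sketch only. The class `cow_string` and the function `pointer_went_stale` are hypothetical, the code is single-threaded, and it is not how any real standard library implements `std::string`:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy copy-on-write string (hypothetical, single-threaded only).
class cow_string {
public:
    explicit cow_string(std::string s)
        : buf_(std::make_shared<std::string>(std::move(s))) {}

    // Const access never needs to unshare.
    const char* data() const noexcept { return buf_->data(); }

    // Non-const access may be used for writing, so a shared buffer
    // has to be copied first. That copy moves this string's characters.
    char& operator[](std::size_t i) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::string>(*buf_);
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::string> buf_;
};

// Returns true if a pointer taken from `a` stopped referring to `a`
// after a call to the non-const operator[].
inline bool pointer_went_stale() {
    cow_string a("hello");
    const char* p = a.data();  // points into a's buffer
    cow_string b(a);           // O(1) copy: a and b now share the buffer
    a[0] = 'J';                // unshares: a gets a fresh buffer
    // p still points at the old buffer, which b keeps alive.
    return p != a.data() && p == b.data() && *p == 'h';
}
```

In this sketch `p` was obtained from `a`, yet after `a[0]` it refers to `b`'s characters, which is the kind of invalidation the paragraph rules out for `operator[]`.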

~~~
nly
Hmm true. This is what happens when you let the user violate the iterator
abstraction for performance and rely on contiguity and raw refs/ptrs. It's
kind of a shame, because the moment you're mutating characters in a string
you're often doing something silly anyway.

------
fauigerzigerk
This is not an implementation I would use. One major reason for using an
immutable string class instead of a const string is to get rid of the capacity
member. For short strings (e.g. words) the capacity member alone can be larger
than the contents of the string.

~~~
sesqu
A string would have to be short indeed for the capacity to be larger -
something like 2 characters, I'm guessing. It's certainly a valid concern in
occasional cases, but I wouldn't consider it a major reason to pass on an
implementation.

~~~
fauigerzigerk
The string capacity on a 64-bit machine is usually represented as an 8-byte
integer. The average length of a unique English word is 8 and the average
length of a word occurrence is 5.
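
As a rough illustration of a layout without a capacity member, here is a sketch that stores only a length and a pointer. The class `compact_string` is hypothetical and is not the class from the linked repository; the size figures in the comments are what common 64-bit implementations typically report, not guarantees:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical immutable string with no capacity field.
// The members are const, so the object is neither copyable nor movable.
class compact_string {
public:
    explicit compact_string(const std::string& s)
        : size_(s.size()), chars_(new char[s.size()]) {
        std::copy(s.begin(), s.end(), chars_.get());
    }

    const char* data() const noexcept { return chars_.get(); }
    std::size_t size() const noexcept { return size_; }
    std::string str() const { return std::string(chars_.get(), size_); }

private:
    const std::size_t size_;                // length only
    const std::unique_ptr<char[]> chars_;   // exactly size_ characters
};

// On a typical 64-bit target sizeof(compact_string) is 16 bytes, while
// sizeof(std::string) is commonly 24 or 32 bytes.
inline void demo_compact() {
    compact_string word("hello");
    assert(word.size() == 5);
    assert(word.str() == "hello");
}
```

Since the contents never grow, there is nothing for a capacity field to track.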

------
alexchamberlain
The main question I have is, why?

The author also has a rather [confusing article][1] about immutable vs const,
that I would discredit personally. He claims that objects accessed via
reference to const can modify themselves, which is not true, as you can only
access const functions through a const reference.

[1]: [http://blog.reliablecpp.com/2013/09/immutable-vs-const/](http://blog.reliablecpp.com/2013/09/immutable-vs-const/)

~~~
Deregibus
I think the implication is that since void output(std::string const &msg);
takes a const reference, that function cannot change the value of msg. What
the function does not know is whether or not anyone else outside of the
function could change that value. It doesn't know if the original creation of
the object was const or not const.

The example is misleading because there isn't any reason why you would be
concerned with such a distinction in this case.
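
A small sketch of the distinction, with a hypothetical function name: the callee cannot modify the string through its const reference, but the value can still change underneath it when another non-const path to the same object exists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reads through the const reference, mutates through the other one,
// and reports how much the "const" view grew in between.
inline std::size_t growth_seen_through_const_ref(const std::string& view,
                                                 std::string& owner) {
    const std::size_t before = view.size();
    owner += " (edited)";          // legal: owner is a non-const reference
    return view.size() - before;   // non-zero when both name one object
}

inline void demo_alias() {
    std::string msg = "hello";
    // Both parameters refer to msg, so the const view sees the edit.
    assert(growth_seen_through_const_ref(msg, msg) == 9);
    assert(msg == "hello (edited)");
}
```

With a truly immutable type there is no non-const path at all, so the second parameter could not exist.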

~~~
alexchamberlain
Agreed, it is completely irrelevant that it could be modified before or after.

------
arithma
This defeats the purpose of

    
    
        std::string const s("Hello");

~~~
ajuc
You specify that the place s is const, not the value it keeps. You can still
do:

    
    
        #include <string>
        #include <iostream>
    
        // not your code
        void innocentF(std::string const& s) {
          (const_cast<std::string&>(s)) += std::string(" I'm evil.");
        }
    
        // your code
        int main() {
          std::string const s("hello world.");
          innocentF(s);
          std::cout << s << std::endl; // hello world. I'm evil.
          return 0;
        }

~~~
vinkelhake
That code isn't valid C++. Modifying a const object is undefined behavior. You
would have to remove the const on your string in main to make the program
well-defined.

~~~
thwest
const_cast is undefined behavior? Can you cite the spec?

~~~
aristidb
const_cast is undefined behavior if the actual object is const. If it's a
mutable object that just happened to be passed as a reference you can
const_cast it. Or that's how I understand it.
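
A sketch of the well-defined case, using hypothetical function names. The undefined variant appears only as a comment and is never executed:

```cpp
#include <cassert>
#include <string>

// Casts away const and appends. This is only well-defined when the
// object that `s` refers to was not declared const.
inline void append_mark(const std::string& s) {
    const_cast<std::string&>(s) += "!";
}

inline std::string well_defined_case() {
    std::string text = "hello";  // a non-const object
    append_mark(text);           // fine: the underlying object is mutable
    return text;
}

// Undefined behavior, shown for contrast only:
//   std::string const fixed("hello");
//   append_mark(fixed);  // modifies an object that was declared const
```

The cast itself compiles in both cases; what matters is how the object was originally declared.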

~~~
ajuc
The funny thing is - it doesn't even show a warning compiled with -Wall.

~~~
alexchamberlain
Well, the `const_cast` usage is fine, but calling that function on an actual
`const` object is not; it's a lot to ask a compiler to warn on that.

------
aristidb
This class doesn't seem like you actually get any tangible benefits from it,
as it just stores a std::string plain and vanilla inside...

------
asveikau
I don't really understand why you would want this other than to trick yourself
into thinking you are writing Java. (I already find it really annoying when
people port their Javaisms to C++.)

~~~
thirsteh
The page is down, but: Most languages have immutable strings.

Also, I hope you're not suggesting immutability is a Javaism...

~~~
asveikau
Yes, actually, I don't think it's controversial to say that Java did more to
popularize immutable strings than any of the languages you're thinking of. And
when people get CS degrees completely centered around Java, bring this kind of
"feature" and others to C++, they end up churning out what looks a lot like
pretend-Java.

~~~
pjmlp
Do those CS degrees exist? I pity those students.

Back then when I graduated (1999), at least in Portugal there were lots and
lots of languages to use for assignments, during the degree (5 years long).

------
mwcampbell
This implies that C++'s std::string is mutable. I think I'll continue to run
the other way, to avoid C++ (and C as well) whenever I can. Mutable strings
are insane.

~~~
alexchamberlain
Why (are mutable strings insane)?

~~~
mwcampbell
OK, that comment was too light on substance.

The problem with strings being mutable by default is that one has to do a lot
of defensive copying to avoid unexpected behavior. Mutable strings do have
their place, inside a function that's building up a string, before that string
is visible to any other part of the program. But it shouldn't be the default.

See also: "The Value of Values" by Rich Hickey
([http://www.infoq.com/presentations/Value-Values](http://www.infoq.com/presentations/Value-Values))

~~~
StefanKarpinski
The primary reason when deciding whether a type should be mutable or not is
psychological: mutable things are containers with independent identity from
their content; things that are identified by their value should be immutable.
The prototypical mutable type is an array; the prototypical immutable types
are numbers: if you change the imaginary part of a complex number, you don't
have the same number with different content, you have a different number. When
you think about strings as arrays of bytes like you do in C, it makes sense
for them to be mutable; in a higher level language where they behave much more
like atomic values, it makes much more sense for them to be immutable – it can
be really jarring when some called code deep down in the guts of a program
mutates a string and you see the results at the top level. In C++, which is
somewhere between a low level and high level language, it's hard to say which
way it should be, but the STL approach does seem to treat them more like
values than containers, which implies that they probably should be immutable.

------
polskibus
I was hoping for an optimization of 8-char strings to be stored internally as
longs.
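
One way such an optimization might look, as a sketch only. The functions `pack8` and `unpack8` are hypothetical, and the sketch assumes strings of at most 8 chars that contain no embedded `'\0'`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Packs up to 8 chars into one 64-bit integer, first char in the low byte.
inline std::uint64_t pack8(const std::string& s) {
    if (s.size() > 8) throw std::length_error("pack8: more than 8 chars");
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i]))
             << (8 * i);
    return v;
}

// Reverses pack8, stopping at the first zero byte (assumed padding).
inline std::string unpack8(std::uint64_t v) {
    std::string s;
    for (int i = 0; i < 8; ++i) {
        const char c = static_cast<char>((v >> (8 * i)) & 0xFF);
        if (c == '\0') break;
        s.push_back(c);
    }
    return s;
}

inline void demo_pack() {
    assert(unpack8(pack8("string")) == "string");
    assert(pack8("abc") == pack8("abc"));  // equality is one integer compare
}
```

Under those assumptions, copying and comparing such strings for equality become single integer operations.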

------
iam
It's immutable but it defines swap? That seems.. odd.

~~~
tlb
It does not. swap() only appears in the documentation as a function not
implemented.

------
chrismorgan
reliablecpp, maybe, but alas: not reliablesite. :-(

------
dllthomas
Was hoping for ropes. Not bad, though.

------
programmerby
Benchmarks?

