
Std::string half of all allocations in the Chrome browser process - mlrtime
https://groups.google.com/a/chromium.org/d/msg/chromium-dev/EUqoIz2iFU4/kPZ5ZK0K3gEJ
======
humanrebar
The problem with std::string is that it's named wrong. It should be called
std::string_buffer, because that is what it is. Its performance
characteristics are closer to a std::vector than a std::array (now available
since C++11).

Many projects cannot copy around std::vector<char> in good conscience. They
really want a copy-on-write string, an immutable string, a rope, a reference-
counted string, or an always-in-place string. Or some combination of the above
depending on the circumstance.

The problem is that std::string is _not_ a good type to use as a parameter for
various reasons. In addition to its aggressive allocation behavior, it's also
fairly inflexible. What are the alternatives?

1\. boost::string_ref - available now, so use it

2\. std::string_view - standardized in C++17 (before that,
std::experimental::string_view) and works roughly like boost::string_ref

3\. pass around pairs of iterators instead of single objects

3) is actually the most flexible, though it requires different kinds of
overhead. The most obvious way would be to template all your string-accepting
functions on two parameters: the type of your begin iterator and the type of
your end iterator. But the benefit is that you can pass around any of the
above to your heart's content, plus more, like elements in tries.
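A minimal sketch of option 3, assuming C++11 or later. `count_spaces` is a made-up name for illustration, not part of any library. The point is that one template accepts a std::string, a vector<char>, a deque<char>, or a raw char range without copying any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: counts ' ' characters in [first, last).
// Templated on both iterator types, so the end may be a different type
// (for example a sentinel) as long as `first != last` compiles.
template <typename BeginIt, typename EndIt>
std::size_t count_spaces(BeginIt first, EndIt last) {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == ' ') ++n;
    }
    return n;
}

// Usage: the same function body serves three different containers.
inline void count_spaces_usage() {
    const std::string s = "a b c";
    const std::vector<char> v(s.begin(), s.end());
    const std::deque<char> d(s.begin(), s.end());
    assert(count_spaces(s.begin(), s.end()) == 2);
    assert(count_spaces(v.begin(), v.end()) == 2);
    assert(count_spaces(d.begin(), d.end()) == 2);
}
```

The cost is the one mentioned above: the implementation has to live in a header, and every caller passes two arguments instead of one.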

std::string still has an important place, but it should generally be used as a
private member variable, not as something you require in your interface.
Pretty much the same thing goes for char* unless you are implementing a C ABI
(plus a size, please). Even then, you can immediately convert to/from a
boost::string_ref and still have yourself a self-contained reference to a
bounded character sequence.
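A rough sketch of that last point, assuming a C++17 compiler and using std::string_view in place of boost::string_ref (the two are used the same way here). `mylib_is_http` and `has_prefix` are hypothetical names. The C entry point takes a pointer plus a size and converts to a bounded view on the first line.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Internal C++ API: takes non-owning views, never allocates.
inline bool has_prefix(std::string_view text, std::string_view prefix) {
    return text.size() >= prefix.size() &&
           text.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical C ABI entry point: pointer plus size.
// Returns 1 if the buffer starts with "http", 0 otherwise.
extern "C" int mylib_is_http(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);  // in C this contract is by convention
    // Convert to a bounded view immediately: no copy, no strlen.
    const std::string_view text =
        (size == 0) ? std::string_view() : std::string_view(data, size);
    return has_prefix(text, "http") ? 1 : 0;
}
```

Because the view carries its own length, the input does not need to be null terminated.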

~~~
rwmj
Or proper garbage collection so you don't need to worry about who owns the
string and you don't have the big overhead of ref-counting.

Why on earth are they using C++ for a web browser anyway? That's about the
worst possible choice of programming language for that problem domain.

~~~
IvyMike
So which web browsers are not written in C++?

~~~
rwmj
Almost all of Windows (including apps that predominantly do string and tree
manipulation) is written in C++. Almost all of Linux userspace is written in
C. These are still terrible, insecure, fragile languages which cause
frustration and loss to programmers and users every day.

~~~
vardump
Shame you're downvoted, because you're pretty much right. Don't get me wrong,
correct code can very well be written in C/C++. It's just 95% of the
programmers using those languages are not very skilled. The programs tend to
be wrong and fragile. In C++, most of this fragility derives from its C-roots.

C/C++ is also a poor match to today's CPUs. It's very slow compared to what
the hardware is capable of. Compare for example with Intel ispc
([https://ispc.github.io/](https://ispc.github.io/)). It gives some idea how
much performance we're currently missing.

When C was young, memory latency was typically 1 or 2 clock cycles. There was
no pipelining, at most a simple state machine that would finish in a few clock
cycles. A branch didn't cost much. A few cycles at most. Neither did a pointer
reference. Random access was almost as fast as sequential access.

Today's CPUs have memory latency of 150-300 clock cycles. A modern CPU core
can typically retire 1-4 instructions per clock cycle. A single instruction
takes typically about 16 clock cycles from decoding until retire. So CPUs have
to often execute blind and just guess where the execution flow will go.
Branches modify this flow. CPUs simply guess the flow, branch predict. When
they're wrong, they just have to invalidate currently executing instructions
and start again. Branches are something to avoid. Especially unpredictable
ones. Function pointers are branches. Even though they can be predicted, they
often just fall out of the branch predictor cache.

We need something that can minimize the costs modern CPUs are bad at. C/C++ is
very branchy and uses slow function pointers often (vtable, switch jump
tables, etc).

The problem is, there's more variation among CPUs than ever before, even
within same instruction set architecture. For x86, not only the costs for
instructions are wildly different, but the instruction set support is
fragmented.

C/C++ is not the last word. Unsafe and way too slow compared to what current
hardware is capable of. The problem is, the language that is safer and a good
match just does not exist yet. Some safety can be sacrificed for greater
speed, but it should be situational choice by the programmer, not the only way
or even default.

C/C++ is what I do at my day job.

~~~
tedunangst
Neither of you is really answering the question. You still haven't proposed a
language that is better.

When the half dozen browsers that represent about 99.9% of the browser market
all use C++, perhaps the explanation is not "They're all wrong."

~~~
pjmlp
Around 20 years ago we had quite a few systems programming languages to choose
from.

The spread of UNIX into the industry killed the other languages because
they weren't the UNIX system programming language and other system vendors
weren't able to fight against UNIX-based workstations.

So like JavaScript in the browser, C eventually killed the alternatives in the
workstation market.

C++ was able to gain industry acceptance because C++ compiler vendors were
actually C compiler vendors bundling C++ in their products, since C++ came
from AT&T as well.

Additionally, the computers started to be fast enough for mainstream VM based
systems.

The business application developers moved along to VM based languages, while
the system programmers focused on the languages that were supported by larger
OS vendors, C and C++.

All the compiler vendors selling system programming languages compilers that
didn't enjoy first class treatment on a mainstream OS, either closed down or
changed business.

So 20 years later, C and C++ became the only alternatives for systems
programming, unless one wants to play with dead languages.

Ada remained in use in projects that required it, or in software where
human lives are at risk, like aviation, train control systems, medical
devices and so on.

And now we are getting beaten every day in security exploits due to dangling
pointers and out of bounds errors.

~~~
vardump
C++ bugs are sometimes - too often - faster to find even from disassembly
_than from source code_. Seriously. Yes, it's a very slow approach. But you
can see what's really going on, and the picture is shockingly different from
the source code point of view. While disassembly is verbose, there's no
syntactic sugar that can obscure true function. C++ features, like operator
overloading and exceptions, locally hide actual functionality. I've seen
another language that has a _lot_ of different ways to do same thing and
enabled write-only development. Perl.

No, disassembly listing is not what I typically use for debugging. Just one of
the tools. The point is, something is wrong when disassembly can be easier to
understand than source code!

So depressing when you need to deal with it often. Our best tool is not very
good.

Well, all this puts bread on my table, so I guess I shouldn't complain too
much...

~~~
ConceptJunkie
Well, the problem with C++ is that it's way too easy to take its features too
far. I've seen commercial C++ libraries that overloaded operators to an absurd
degree. I don't recall the name at the moment, but it was a database library.

Overloading operators is a great feature, but should be used as sparingly as
possible, and only where it makes intuitive sense.

Templates offer even more rope to hang yourself, and even though I'm a late
convert to using templates, I would never suggest the feature goes too far.
It's kludge fuel, no doubt, but the language needs the feature, and it allows
you to do things that are truly useful and elegant.

It never occurred to me that viewing the disassembly could be useful for
debugging, and I would imagine it's not normally useful unless you are really
doing some pretty sophisticated stuff with the language. It would be extremely
educational to understand what the compiler actually does however, but if
inspecting the disassembly is instructive on how best to use the language,
then I would suspect the language design, or at least the compiler, is doing
something wrong.

I do think C++ requires a little too much consideration of how certain
operations are implemented, e.g., this very discussion, but it's a price I'm
willing to pay for a language that lets me do things any way I want to do
them.

I program in Python in my spare time and for small scripting tasks, and can't
imagine choosing C++ over Python for any personal project I've done in the
past couple years (which tend to be small anyway), but I use C++ at work and
am very happy to continue using it after 20 years on and off (mostly on).

------
ryandrake
If you dig deeper and actually look at the source diffs, you will see that
this is not about std::string being "bad" (it's not), but it's about problems
with how they're using std::string. Most of the trouble was constantly
converting back and forth between std::string and const char*, which
needlessly produces temporary allocations. Simply moving to passing everything
around as const references should help enormously with memory churn.
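A small illustration of that round trip, not taken from the Chromium diffs. `shares_buffer` and `churn_demo` are hypothetical names. The helper reports whether the string a function received is the caller's own buffer or a freshly built temporary.

```cpp
#include <cassert>
#include <string>

// Returns true when `param` refers to the same character buffer as `original`,
// i.e. no temporary std::string was created to make the call.
inline bool shares_buffer(const std::string& param, const std::string& original) {
    return param.data() == original.data();
}

inline void churn_demo() {
    const std::string url =
        "https://example.org/a/fairly/long/path/that/defeats/any/small/buffer";
    // Passing the string itself binds the reference directly: no copy.
    assert(shares_buffer(url, url));
    // Going through const char* builds a temporary std::string, which copies
    // the characters (and heap-allocates once the text is long enough).
    assert(!shares_buffer(url.c_str(), url));
}
```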

EDIT: Spelling :)

~~~
MichaelCrawford
I beat the subject of data representation completely to death in my article
"Pointers, References and Values" at:

[http://www.warplife.com/tips/code/c++/memory-management/para...](http://www.warplife.com/tips/code/c++/memory-management/parameters/)

One of the worst performance problems in C++ is quite commonly the creation of
invisible temporaries. Sometimes they are necessary but commonly they are not.

There are specific cases where it's better to pass a value rather than a
reference, for example if you have an integer in a register, and your
parameter passing convention calls for passing parameters in registers, then
passing that integer as a value, rather than a reference (const or otherwise)
avoids the use of memory; it also avoids the use of the memory cache, which is
likely to be a more serious problem.

In general, for a C++ program to make lots of allocations of just one class
isn't such a bad thing. If your default implementation is slower than you
like, you might be able to speed things up considerably, as well as save
memory, reduce disk paging and so on by using a custom memory allocator just
for that class.
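A sketch of what a per-class allocator can look like, assuming a single-threaded program. `Token` and `FreeBlock` are made-up names. Freed blocks go onto a free list and are reused by the next allocation instead of going back to the general heap.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Link stored inside a freed block while it waits on the free list.
struct FreeBlock {
    FreeBlock* next;
};

class Token final {
public:
    explicit Token(int v) : value(v) {}

    static void* operator new(std::size_t size) {
        assert(size == sizeof(Token));
        if (free_list != nullptr) {
            // Reuse a block that an earlier delete handed back.
            FreeBlock* block = free_list;
            free_list = block->next;
            return block;
        }
        ++heap_allocations;
        // A block must be big enough to hold the free-list link later.
        return ::operator new(size < sizeof(FreeBlock) ? sizeof(FreeBlock) : size);
    }

    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        // Keep the block: turn it into the new head of the free list.
        free_list = ::new (p) FreeBlock{free_list};
    }

    int value;
    static FreeBlock* free_list;
    static std::size_t heap_allocations;  // how often the real heap was used
};

FreeBlock* Token::free_list = nullptr;
std::size_t Token::heap_allocations = 0;
```

With this in place, `new Token`, `delete`, `new Token` touches the general heap once rather than twice. The sketch never returns pooled blocks to the system, which a real allocator would need to consider.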

------
ctur
folly::fbstring, a drop-in replacement for std::string, is part of the folly
library that we (Facebook) open sourced a while back. It allocates small
strings in-line and larger strings on the heap and has optimizations for
medium and large strings, too. It's proven quite effective for us,
particularly when used with jemalloc, which it conspires with for more optimal
memory management. We use it _as_ std::string for our C++ applications and
libraries, completely replacing std::string both for our own code and third-
party code.

[https://github.com/facebook/folly/blob/master/folly/docs/FBS...](https://github.com/facebook/folly/blob/master/folly/docs/FBString.md)

In addition, it is worth noting folly::StringPiece (from folly/Range.h), which
is generally a better interface for working with in-memory ranges of bytes.
Hardly a new idea (it's inspired by similar libraries, such as in re2), but it
permeates the APIs of many of our larger C++ systems, and folly itself, and
often avoids passing std::string objects around at all.

Finally, there is also folly::fbvector, which offers similar improvements over
std::vector.

~~~
raverbashing
Yeah, and if I want to use it I need to replace std::string all around my
code.

I can't use it for other libraries, unless I replace and recompile them.

And tomorrow a new library comes I need to replace everything again.

Your effort is commendable, and I know squeezing out gains is important in the
case of fb, but in the end it's just locking yourself into a library that
should've been a second thought/built in.

~~~
Cthulhu_
1\. Find / replace, I'm sure the API is simple enough and performance gains
are worth the effort.

3\. You don't need to replace anything, ever - only if you feel like your
string performance is lacking, and that particular new library satisfies your
needs (in terms of effort vs performance gains). I don't get why people
believe that whenever something new comes out they have to switch over, and as
a result are terrified of innovation.

4\. I disagree with the "should've been built in" statement; the default
std::string implementation is good enough for most (like any std:: thing).
Besides, std::string was designed 25+ years ago, over a dozen CPU generations
ago; the demands of today are much different.

~~~
raverbashing
I agree with most points, but

"std::string was designed 25+ years ago, over a dozen CPU generations ago; the
demands of today are much different."

Sure, but you don't need to break the API for that

You can replace the allocator in a C program without changing malloc/free to
something else, this might have been a design goal. This way you can go back
and forth to it, run your tests again, compare performance again, etc

------
userbinator
_25000 (!!) allocations are made for every keystroke in the Omnibox._

The Omnibox is no doubt far more complex than a simple text box since entering
characters into it can invoke things like network connections (for search
suggestions), but 25k allocs is still a bit on the excessive side.

Strings are an interesting case in that in general they are of indeterminate
(and variable) length, which makes them somewhat difficult to accommodate in
computer memory which is finite and allocated in fixed-length pieces.
Abstractions like std::string have been created to make it simpler and easier
to perform operations like appending, resizing, copying, and concatenating, but
I think this is part of the problem: by making these operations so easy and
simple for the programmer, they're more inclined to _overuse_ them instead of
asking questions like "do I really need to create a copy just to modify one
character? do I really need to append to this string? how long can it be?"
Essentially, the abstraction encourages ignorance of the real nature of the
actual operations, leading to more inefficient code. It only helps the
programmer to perform these tedious operations more easily, and doesn't help
at all with the decision of whether such tedious operations should be needed
at all, which I think is more important; the first question when designing
shouldn't be "what abstractions should I use to do X?", but "do I really need
to do X, or is there a simpler way that doesn't need it?" The most efficient
way to do something is to not do it at all.

Contrast this with a language like C, in which string operations are (unless
the programmer writes or uses a library) far more explicit, and the programmer
can be more aware of what his/her code is actually doing. That's why I believe
every programmer who has to deal with strings should have at one point been
exposed to implementing a resizable string buffer and/or length-delimited
string library, to see the real nature of the problem (including how to do
length management correctly.) Without this basic, low-level understanding of
how to use memory, the advantages of all the other fancy string abstractions
won't make much sense either.

~~~
humanrebar
> Contrast this with a language like C, in which string operations are (unless
> the programmer writes or uses a library) far more explicit

String operations are more explicit. Mostly.

But some things (like sharing between threads, type conversion, and memory
ownership) are very implicit and unsafe. Some of the implicitness to C
programmers is so familiar that they don't notice it:

    
    
      /* 1. This function returns an error code
       *    and not a size
       * 2. zero is success, nonzero is an error */
      int
      /* 3. this method belongs in mylib
       * 4. this function supports ASCII???
       * 5. this function allows memory to overlap? */
      mylib_copy_first_n(
          /*  6. *out_s is allocated
           *  7. *out_s is greater than n bytes in size
           *  8. out_s will hold a null terminated string */
          char * out_s,
          /*  9. in_s is allocated
           * 10. *in_s holds at least n characters
           * 11. it's not a big deal if in_s has null characters */
          char const * in_s,
          /* 12. n is not negative
           * 13. if n is zero, something sane will happen */
          int n);
    

I wouldn't hold up C as an example of explicitness. Now, this isn't a great
example of C code, but even in the best examples of C code, there is a lot of
correctness by convention and documentation.

To be fair, I'm not aware of any languages and libraries that are fully
explicit in type, behavior, and memory usage. Maybe Ada or one of the
functional languages get it right. But there are a set of C++ tools that can
make this sort of thing fast _and_ correct. That's why it's disappointing that
std::string isn't fast and is only mostly correct.
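For comparison, a hedged C++ counterpart of the declaration above, assuming C++17. `copy_first_n` is a hypothetical name, and this is only one way to move those assumptions into the types. It is not a claim about how any real library does it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// - The input is a bounded view, so no null terminator is assumed and
//   embedded '\0' characters are fine.
// - The count is unsigned, so it cannot be negative.
// - The result owns its memory, so the caller has no output buffer to size.
inline std::string copy_first_n(std::string_view in, std::size_t n) {
    // substr clamps the count to the characters actually available.
    return std::string(in.substr(0, n));
}

inline void copy_first_n_usage() {
    assert(copy_first_n("hello", 3) == "hel");
    assert(copy_first_n("hi", 10) == "hi");  // asking for too much is safe
}
```

What is still implicit: the returned string allocates once it outgrows any small-string buffer, which is the cost the rest of this thread is about.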

~~~
pjmlp

        mylib: DEFINITIONS =
        BEGIN
          copy_first_n: PROCEDURE [VAR out_s: STRING, in_s: STRING, n: CARDINAL ]
          RETURNS [error: BOOLEAN];
        END
    

Mesa at Xerox PARC - 1979, one of the influences for strong type systems
programming languages.

But the 80's startups had to go build workstations based on UNIX.

------
acqq
I'm old enough that I actually programmed in Turbo Pascal 3 (around 1985)
which of course produced quite fast code even for the speeds of the processors
then (4 MHz processors were enough for everybody, not really, but that's what
we had!). That Turbo Pascal had strings that were of limited size, but
they were able to use the stack, not the heap. I still don't understand why
the library string in C++ can't use the stack instead of the heap, even for
small strings. Most of the allocations detected in the post are actually the
short-time ones, and I'm also quite sure that most of the strings aren't too
big, which means that using the strings in the Turbo Pascal style (on the
stack for the local variable) would remove the need for most of the
allocations.

I guess that will maybe come in 2020 in C++ standard, if anybody of the people
who would need that reads this and works hard. Yay for the march of progress.

~~~
simias
The C++ string library cannot bind the string to the stack frame since it
doesn't know how long it's going to live.

I suppose the compiler might realize that however and replace dynamic
allocation by a static one but I'm not sure it's allowed to do that.

~~~
repsilat
> The C++ string library cannot bind the string to the stack frame since it
> doesn't know how long it's going to live.

Surely some kind of string that lives on the stack can have its storage on the
stack because the storage goes out of scope exactly when the object does. I
think the only place you might run into trouble is if you try to `move` a
stack-allocated string somewhere, because it'll invariably result in a copy.

~~~
simias
Without any compiler magic involved I don't think it's possible to implement
that in C++ in a way compatible with the current std::string interface.

The problem is not the allocation of the std::string object itself, it's the
allocation of the underlying buffer containing the string. std::string has a
static size, the buffer itself is dynamically sized.

You could replace the heap alloc with something like alloca (or dynamically
sized arrays) in the constructor but then the buffer would be tied to the
stack frame of the constructor, not the calling scope (so the memory would
become invalid as soon as the constructor returns).

Unless I'm missing something completely obvious the only way to implement that
would be to ask for the caller to provide the buffer and let the string take
ownership of it. It's definitely possible but it's not how std::string works
and it's arguably more error-prone since unlike languages like rust there's no
full blown lifetime checker in C++, so you'd have to make sure that the string
object never outlives the supplied buffer.

Alternatively you might be able to supply a custom allocator to std::strings
but here be dragons.

~~~
repsilat
> std::string has a static size, the buffer itself is dynamically sized

Yeah, I guess if you're happy with the stack-allocated buffer being sized at
compile-time (either switching or overflowing into heap-allocated storage if
it gets too big) this all becomes very straightforward. If you want the stack
storage to be runtime-sized (or resizeable) then I agree it's going to be a
real pain in the neck. It doesn't fit into the C++ programming model cleanly
at all. A pity.
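A minimal sketch of the compile-time-sized variant, assuming C++11 or later. `InlineString` is a made-up name and copying is disabled to keep it short. Text that fits in N bytes stays inside the object (so on the stack for a local variable), and longer text overflows to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

template <std::size_t N>
class InlineString {
    static_assert(N > 0, "inline capacity must be positive");

public:
    InlineString() = default;
    InlineString(const InlineString&) = delete;
    InlineString& operator=(const InlineString&) = delete;

    void append(const char* text, std::size_t len) {
        assert(text != nullptr || len == 0);
        const std::size_t new_size = size_ + len;
        if (new_size > capacity_) {
            // Grow geometrically; from here on the text lives on the heap.
            std::size_t new_cap = capacity_ * 2;
            if (new_cap < new_size) new_cap = new_size;
            std::unique_ptr<char[]> bigger(new char[new_cap]);
            std::memcpy(bigger.get(), buffer(), size_);
            if (len != 0) std::memcpy(bigger.get() + size_, text, len);
            heap_ = std::move(bigger);
            capacity_ = new_cap;
        } else if (len != 0) {
            std::memcpy(buffer() + size_, text, len);
        }
        size_ = new_size;
    }

    const char* data() const { return heap_ ? heap_.get() : inline_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

private:
    char* buffer() { return heap_ ? heap_.get() : inline_; }

    char inline_[N];                 // storage inside the object itself
    std::unique_ptr<char[]> heap_;   // empty until the text outgrows inline_
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
};
```

An `InlineString<8>` holding "abc" makes no heap allocation. Appending eight more characters moves it to the heap. The runtime-sized stack buffer discussed above is the part this does not attempt.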

------
ajuc
I've worked on a project that used all of these: std::string, QString,
OString, char*. All were required by a different library that we needed.

This is why a good string type should be in core language.

~~~
raverbashing
This is the problem with C++ (and to some extent, with C)

"Everybody" has their own C++ string (MFC, Qt, several libraries, heck, even
the GObject library has its own strings)

Same with C

Now, you won't see anybody reimplementing strings in Java, C#, Python, JS, etc.
Lessons learned, their strings work

std::string is certainly a step forward and should be used for most projects
today

~~~
roel_v
"Lessons learned, their strings work"

Except that they don't. Either they are 'complete' but massive and thus slow,
or they start as 'array of byte' and then their designers spend 10 years
implementing a more 'complete' string type that is still fast enough and end
up as #1 anyway.

Of course the C++ way where there is no string type that everyone uses sucks
too, it's just that strings are almost impossible to get 'right' because there
is no real 'right' and so many special cases that aren't apparent at first
sight.

~~~
kibwen
I think this bears trumpeting: strings are hard! It's easy to gloss over their
issues via garbage collection and pervasive heap allocation, but once you're
in a domain where you care about stack allocation and avoiding copies you
start running into difficult tradeoffs (above and beyond even the question of
string encoding, which is a different beast altogether).

Speaking as a dynamic language programmer who's trying to break into systems
programming, it took a long time for me to fully appreciate the difficulties
around strings. And now that I do, I'm perpetually paranoid about how many
allocations my Python programs are doing behind the scenes...

~~~
ajuc
If you care about performance use std::performant_but_tricky_string. 99% of
code doesn't care about string-related performance, but needs string anyway.

~~~
crpatino
If you care about performance _at all_ , strings are going to slow you down in
unexpected and counter-intuitive ways.

Far better if you pass around text data as binary buffers (with metadata
describing encoding, please), and only convert those to strings once they are
ready to be consumed by the user (which is _typically_ not where performance
bottlenecks show up anyways)

~~~
ajuc
If you care about performance a lot, it may well be that most of your critical
paths are in numeric code, and strings are only used to read input and write
output. So you should just use strings unless profiling shows problems there.

------
ExpiredLink
During the 10 years I used C++ I never saw a project that used std::string as
_their_ standard string. The implementation of std::string is not standardized.
It may or may not have a 'small string optimization' which may not be an
optimization at all. Instead of specifying a mundane immutable built-in string
like in Java the C++ Standards committee decided to add even more 'advanced'
features to an already overloaded language.

~~~
richardwhiuk
Heh, as if Java's string was mundane, or even particularly well standardised.
The implementation of String has varied in each of the JVM versions, and
Android has a different version to the JDK.

Java's string, depending on the string, length, JVM version and a bunch of
other factors may or may not:

\- Be interned - so the result of "str1" == "str1" is compiler dependent (you
must do "str1".equals("str1")).

\- Maintain references to the string when doing a substring - e.g. "Java is a
language full of quirks......".substring(0,4) may or may not hold a reference
to the entire string. This has huge performance tradeoffs - e.g. if you parse
JSON by doing .substring(), and aren't careful, then you can end up holding
onto the entire unparsed JSON, even if you only keep track of a single object.

~~~
kjetil
Are you seriously suggesting that String incompatibilities are a big issue in
Java?

String interning is specified in the JLS:
[http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#...](http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.5)

As far as I can tell, the substring "feature" has been there since Java 1.0
and was fixed in Java 7. Hardly groundbreaking stuff.

------
DanielBMarkham
One of the recurring things I see in programming literature for the last 20
years or more is performance hits when using string. I've seen essays on this
in at least four different programming languages.

You'd think it'd be simple, but it's not. String is an allocated buffer of
unknown final size, and since it represents some kind of meaning in a human
language, and since human languages have indeterminate length for conveying
any one concept, concatenation is extremely common.

This is actually one of those cases where I like C better, warts and all.
Whenever you use a string, you should carefully think about what you're going
to do with it, and if at all possible allocate all you need up front. Beats
the heck out of taking an unexpected GC call somewhere later when you weren't
expecting it.
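The same "allocate up front" habit carries over to std::string through `reserve`. A small sketch, assuming C++11 or later, with `join` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins `parts` with `sep`, sizing the result once before appending.
inline std::string join(const std::vector<std::string>& parts, char sep) {
    std::size_t total = parts.empty() ? 0 : parts.size() - 1;  // separators
    for (const std::string& p : parts) total += p.size();

    std::string out;
    out.reserve(total);  // at most one allocation, done up front
    const std::size_t cap = out.capacity();
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out.push_back(sep);
        out += parts[i];
    }
    assert(out.size() == total);
    assert(out.capacity() == cap);  // the appends never had to grow the buffer
    return out;
}
```

Without the `reserve` call the loop still works, but the buffer may be reallocated and copied several times as it grows.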

~~~
zl4000
> Beats the heck out of taking an unexpected GC call somewhere later when you
> weren't expecting it.

How is that an issue with C++ `std::string`?

------
ceejayoz
I sure as hell believe it. I'm on an older, slightly creaky Macbook Pro, and
typing in Chrome's box is frequently a nightmare if I'm running something like
a VM. Keystrokes lag tremendously.

~~~
themoonbus
Same situation here... I have to clear my browser history every couple of
months in order for it to perform adequately.

~~~
gongar
Yeah, same here. I've to clear history like every few weeks. Else it lags so
badly.

------
hesdeadjim
It's always driven me nuts that the base string in the STL is a fully mutable
container that must manage its own memory. I much prefer a string being
either immutable, or an interface that you can implement however you see fit:
a view over a raw buffer, a ref to a globally shared string instance that may
or may not be ref counted, or like the STL, a version that self-manages its
own memory.

Besides boost::string_ref, there is also a more generic flyweight
implementation requiring only an equality concept for its template parameter:

[http://www.boost.org/doc/libs/1_56_0/libs/flyweight/doc/tuto...](http://www.boost.org/doc/libs/1_56_0/libs/flyweight/doc/tutorial/)

------
DigitalSea
It all makes sense now. For some reason on my gaming PC which is pretty spec'd
out in almost every way, Chrome will lag when typing into the Omnibox and
requires me to close it completely and reopen it frequently. For a while I
thought perhaps I had an issue with my CPU getting too hot, a bad plugin or
faulty RAM, but this exact issue appears to be the cause of all of my
problems. The only thing that seems to fix the issue temporarily is clearing
out my browsing history every few weeks, otherwise the issue gets to the point
where you can wait a couple of seconds for a word you have typed to appear.

Don't get me started on the performance of using Chrome inside of a VM, that
is a whole other world of hurt right there.

~~~
dwild
On my Asus Transformer Book t100, that I use for all my personal stuff, I
never have any issue with lag. In some rare case it can lag but it's temporary
and then it's fast again.

If your history is pretty big and it's all stored on a slow HDD, I can see how
your IO could be the bottleneck.

~~~
DigitalSea
I am a front-end developer, so my history grows a lot every single day. After
a few weeks, my history is an accumulation of documentation links, testing
links, StackOverflow posts and more. I actually did some testing a little
while ago and could see the bottleneck taking place. My CPU usage would spike
up to 100% whenever I would type something into the Omnibox when I had a few
weeks worth of history. I spotted this on a clean install of Chrome, I signed
into my Google account, but disabled all plugins and it still occurred.
Weirdly enough, not everyone seems to experience this issue.

~~~
groby_b
If you feel comfortable sharing your history, filing a bug on crbug.com with
this info might help out the Chromium devs. (Don't share it right away - ask
to have the bug closed to public view before you share)

------
CyberDildonics
I'm not even sure if I do stuff like this in prototypes. My experience has
been that using a matrix/arena/pool can speed a program up that has inner loop
allocations by x7. I think the average pc can do about 10,000,000 heap
allocations per second, but as far as I know it causes thread locking to some
degree.

Don't many std::string implementations have small string optimizations? This
is actually the first time I have ever heard of C++ strings being the
bottleneck of an application (and it seems that is even still up for debate
here).

~~~
paulhodge
If you're interested in this topic then you should read the whole discussion
in the link. There is a lot of talk about what optimizations are in play.

------
jamesu
In a certain 3d game engine we use, the devs decided to refactor a lot of old
code in the animation code which used a hashed string type to use their new
all-in-one reference counted String type instead.

Safe to say this turned out to be a disaster for performance since every time
an animation or node had to be evaluated from a name (which we ended up doing a
lot), a string had to be allocated on the heap.

Some people sadly underestimate how bad objects which rely on heap allocation
can be in performance-critical code.

------
amaks
There are many things in the standard C++ library which are named incorrectly
(std::string), awkwardly (std::unordered_map) and implemented
inefficiently from the perspective of modern CPUs (same std::unordered_map,
which uses linked lists for the underlying hash table buckets). See the great
talk on CppCon 2014 about those issues -
[https://www.youtube.com/watch?v=fHNmRkzxHWs](https://www.youtube.com/watch?v=fHNmRkzxHWs).

------
TazeTSchnitzel
One thing I've learned from PHP's internals is that using reference-counted
strings and copying on write is a fantastic idea. You can save an awful lot of
memory and allocations, and simplify your code.

~~~
adamtj
Reference counted copy-on-write strings are little landmines just waiting to
blow your leg off should you venture into multi-threaded territory.

If you use copies of such a string in multiple different threads, you may find
that simply creating a new copy of the original string or any of its
subsequent copies can cause an incorrect reference count which will trigger a
double-free and a segfault at some later time, possibly long after all non-
main threads have ended.

You can't even lock all accesses to a string with a simple mutex. You have to
also lock all descendant and ancestor copies, including all temporary copies,
as when passing by value to a function. A COW string is basically a pointer to
an internal shared data structure. You need to lock accesses to the shared
structure, not the pointers to it, and you can only do that internally.

The only way to reliably use such a string with multiple threads is to make
the internal reference count and buffer manipulations thread-safe. It may save
you some allocations and copies, but the thread safety will slow things down.
How much, I don't know. I wouldn't be surprised if thread-safe copy-on-write
is slower than copy-always.
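To make the mechanics concrete, a single-threaded sketch of a copy-on-write string built on std::shared_ptr, assuming C++11 or later. `CowString` is a made-up name. std::shared_ptr updates its count atomically, but the `use_count()` check before a write is only trustworthy while one thread touches this family of copies. That is the gap described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

class CowString {
public:
    explicit CowString(std::string text)
        : buf_(std::make_shared<std::string>(std::move(text))) {}

    // Copying a CowString copies only the pointer and bumps the count.
    const std::string& read() const { return *buf_; }

    void set_char(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            // Someone else still shares the buffer: detach before writing.
            buf_ = std::make_shared<std::string>(*buf_);
        }
        (*buf_)[i] = c;
    }

    bool shares_with(const CowString& other) const { return buf_ == other.buf_; }

private:
    std::shared_ptr<std::string> buf_;
};
```

After `CowString b = a;` both objects share one buffer. The first `b.set_char(...)` is where the deferred copy actually happens.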

~~~
kibwen
This is where Rust is worth mentioning: its Rc smart pointer will statically
prevent multiple threads from accessing the inner value, while its Arc (atomic
reference counting) smart pointer will let you share the value while using
atomic operations to adjust the refcount, same as C++'s shared_ptr. Rust's
move semantics by-default also mean fewer refcount adjustments overall, since
you can transfer ownership instead.

There's also a neat new copy-on-write smart pointer in the stdlib, though I
have no experience with it yet:
[http://doc.rust-lang.org/std/borrow/enum.Cow.html](http://doc.rust-lang.org/std/borrow/enum.Cow.html)

------
wfunction
Why in the world are they even using c_str so often in the first place? That
makes me worry more about security holes than performance...

------
GregBuchholz
Well, it is a good thing they wrote it in such a high-performance language
then...

------
vegancap
I literally started programming in C++ this week and I figured the over-use of
std::string couldn't be a good thing hehe

~~~
zl4000
No that's BS. `std::string` should be used wherever it is applicable.

The whole issue that this post about Chrome was talking about comes down to
poor usage of `std::string`, such as passing c_str() and then constructing
another string from it instead of passing by const ref.

Or building a set of `std::string` to simply check if a value exists.

That's just shit code, not an issue with `std::string`.
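An illustration of the set case, assuming C++11 or later. Both function names are made up, and this is not the code from the Chromium change. The first version copies every string into a new container to answer one question, and the second searches the container that already exists.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Wasteful for a one-off check: allocates a node and copies a string per element.
inline bool contains_via_set(const std::vector<std::string>& values,
                             const std::string& wanted) {
    const std::set<std::string> lookup(values.begin(), values.end());
    return lookup.count(wanted) != 0;
}

// Same answer with no allocation: a linear search over the existing data.
inline bool contains_direct(const std::vector<std::string>& values,
                            const std::string& wanted) {
    return std::find(values.begin(), values.end(), wanted) != values.end();
}

inline void contains_usage() {
    const std::vector<std::string> schemes{"http", "https", "ftp"};
    assert(contains_direct(schemes, "ftp"));
    assert(!contains_direct(schemes, "gopher"));
}
```

A set can still pay off when it is built once and queried many times. The problem is building it per query.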

~~~
humanrebar
There's a right way to do it. It's not the obvious way. Berating people for
coming to terms with that isn't helpful.

It's not their fault that C++ only really supports std::string out of the box.
What are the alternatives?

1\. const std::string & : what if I have a vector<char>?

2\. const char * : what if it's not null terminated?

3\. const char * and size_t : Better, but what if I have a deque<char>?

4\. const char * start, const char * stop : Better because you can write
algorithms around this, but still, doesn't help with deque<char>?

5\. template on START_ITER and STOP_ITER : The best we have now if you need an
extremely general solution. But I hope writing your implementation in headers
is fine.

6\. home grown type : a very popular choice, but
[http://xkcd.com/927/](http://xkcd.com/927/)

7\. boost::string_ref : maybe the best choice, as it can be created from 1-4
(and most 6's), but still doesn't work with deque<char>.

...so give the rookie a break. But I'll support any comment in a code review
about not accepting std::string by reference or by value in an interface.

~~~
nightcracker
#5 and many, many other reasons are why C++ desperately needs a standardized
range type. Let's pray it comes in C++17.

~~~
pja
What would a range type offer over a pair of iterators?

~~~
humanrebar
cout << takeFirstN(sort(myVector), 10) << '\n';

...try that with iterators.
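To spell the contrast out, a sketch assuming C++11 or later. `sorted` and `take_first_n` are hypothetical stand-ins for the `sort` and `takeFirstN` in the one-liner. They copy whole vectors, where a real range library would return lazy views. The third function is the same task written against iterator pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Range-style: each step takes and returns one object, so calls nest.
template <typename T>
std::vector<T> sorted(std::vector<T> values) {
    std::sort(values.begin(), values.end());
    return values;
}

template <typename T>
std::vector<T> take_first_n(std::vector<T> values, std::size_t n) {
    if (n < values.size()) {
        values.erase(values.begin() + static_cast<std::ptrdiff_t>(n), values.end());
    }
    return values;
}

// Iterator-style: every step needs a named temporary and two arguments.
template <typename T>
std::vector<T> smallest_n_with_iterators(const std::vector<T>& values, std::size_t n) {
    std::vector<T> copy(values.begin(), values.end());
    std::sort(copy.begin(), copy.end());
    const std::size_t count = std::min(n, copy.size());
    return std::vector<T>(copy.begin(),
                          copy.begin() + static_cast<std::ptrdiff_t>(count));
}

inline void range_usage() {
    const std::vector<int> v{5, 3, 9, 1};
    assert((take_first_n(sorted(v), 2) == std::vector<int>{1, 3}));
    assert((smallest_n_with_iterators(v, 2) == std::vector<int>{1, 3}));
}
```

Printing the result with `cout <<` would still need an `operator<<` for the container, which the standard library does not provide.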

