
Performance Improvement with the StringBuilder for C++ - AndreyKarpov
http://www.codeproject.com/Articles/647856/4350-Performance-Improvement-with-the-StringBuilde
======
ot
The performance test is comparing apples to oranges, since the StringBuilder
is filled only once and the timing is taken only on the .Join() method.

If you replace the code

    
    
        start = clock();
        for (int i = 0; i < loops; ++i) {
            std::wstring result2 = tested.ToString();
        }
        double secsBuilder = (double) (clock() - start) / cps;
    

with

    
    
        start = clock();
        for (int i = 0; i < loops; ++i) {
            StringBuilder<wchar_t> wide;
            wide.Add(tested2.begin(), tested2.end()).AppendLine();
            std::wstring result2 = wide.ToString();
        }
        double secsBuilder = (double) (clock() - start) / cps;
    

the results go from

    
    
        Accumulate took 0.134847 seconds, and ToString() took 0.014123 seconds.
        The relative speed improvement was 854.804%
    

to

    
    
        Accumulate took 0.146171 seconds, and ToString() took 0.099802 seconds.
        The relative speed improvement was 46.461%
    

Much less impressive.

Furthermore, the author is not using any of the standard techniques to avoid
memory allocations in C++ (such as reusing the same container with .clear()
instead of creating a new one each time), which would improve performance
even more.
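A minimal sketch of the reuse technique mentioned above. It assumes the per-iteration work is building one output line from a list of pieces. `build_line` and `run_reusing_buffer` are hypothetical helpers, and the scratch string is passed by reference so its allocation survives between calls. `clear()` sets the length to zero. Mainstream implementations keep the allocated capacity, although the standard does not strictly promise that.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rebuilds `out` from `pieces` without creating a new
// string object. After the first call, `out` usually already has enough
// capacity, so the appends below do not need to allocate.
void build_line(const std::vector<std::string>& pieces, std::string& out) {
    out.clear();  // length becomes 0; capacity is typically retained
    for (const std::string& p : pieces) {
        out += p;
    }
    out += '\n';
}

// Hypothetical driver: runs `loops` iterations with one reused scratch buffer
// and returns the total number of characters produced, so the work is visible.
std::size_t run_reusing_buffer(const std::vector<std::string>& pieces, int loops) {
    std::string scratch;  // created once, reused on every iteration
    std::size_t total = 0;
    for (int i = 0; i < loops; ++i) {
        build_line(pieces, scratch);
        total += scratch.size();
    }
    return total;
}
```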

Besides, despite what the author says, std::list is an _awful_ container (one
allocation per element, terrible locality, ...). You should never use it,
unless you really know what you are doing (for example, see Stroustrup's
recent talks).

~~~
jemfinch
> std::list is an awful container...You should never use it, unless you really
> know what you are doing

To put a finer point on that: you should use std::list iff you need to splice
(or insert into the middle) in constant time.
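For reference, this is the operation in question. `std::list::splice` moves nodes from one list into another at a given position without copying or reallocating the elements. It takes constant time for a single element or a whole list. The sketch below uses made-up values, and `splice_before` is a hypothetical helper:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical helper: moves every node of `src` into `dst` just before the
// first element equal to `marker` (or at the end if there is none). Finding
// the position is linear, but the splice itself only rewires node links;
// no element is copied, and `src` ends up empty.
void splice_before(std::list<std::string>& dst, std::list<std::string>& src,
                   const std::string& marker) {
    auto pos = std::find(dst.begin(), dst.end(), marker);
    dst.splice(pos, src);  // constant time for a whole list
}
```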

~~~
dkersten
And order is important and splicing is a common task and the list is large
enough or copying elements expensive enough that other schemes are not
viable...

Linked lists really have a very narrow use case.

------
AshleysBrain
The article doesn't go into much detail about why that class is really
necessary. Couldn't you just loop over the vector of strings, work out the
total memory, then `reserve()` exactly the right amount? Then all the
concatenations should be fast.
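A minimal sketch of that approach, assuming the input is a `std::vector<std::string>`. One pass sums the lengths, a single `reserve()` allocates the final buffer, and the appends then copy each piece exactly once. `join_reserved` is a hypothetical name, not something from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates `parts` with one up-front allocation.
std::string join_reserved(const std::vector<std::string>& parts) {
    std::size_t total = 0;
    for (const std::string& p : parts) {
        total += p.size();  // first pass: compute the exact final length
    }
    std::string result;
    result.reserve(total);  // single allocation of the final size
    for (const std::string& p : parts) {
        result += p;  // fits in the reserved capacity, so no reallocation
    }
    return result;
}
```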

Alternatively `std::ostringstream` is specially designed for this type of task
as well... how does it compare? Is it better/worse? Looks like reinventing the
wheel to me.

~~~
scast
Somebody in the comments on the original article already mentioned
ostringstream, and the article was updated accordingly. ostringstream is
better, and I suppose more idiomatic.
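For comparison, here is a sketch of the `std::ostringstream` version. The stream manages its own growing buffer, and `str()` returns the accumulated text as a `std::string`. `join_with_stream` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins `lines`, ending each one with '\n'.
std::string join_with_stream(const std::vector<std::string>& lines) {
    std::ostringstream out;
    for (const std::string& line : lines) {
        out << line << '\n';  // appended to the stream's internal buffer
    }
    return out.str();  // copies the buffered text into the result string
}
```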

------
quchen
Why concatenate the strings at all instead of printing them piece by piece?
Don't flush in between and it'll land in the same buffer. (Even if you flushed
after every string, if output is "lightning fast", that shouldn't matter
either.)

~~~
alayne
It says they are to be written to a file. That's like the textbook case for
buffered I/O writes. Get rid of that silly concatenation, it's just increasing
the amount of data copying.
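Skipping the concatenation means streaming each piece straight into the output and letting the stream's buffer do the batching. This is a sketch that assumes the destination is any `std::ostream`. For a real file, that would be an `std::ofstream`. `write_lines` is a hypothetical name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream, a stand-in sink when trying this without a file
#include <string>
#include <vector>

// Hypothetical helper: writes each line to `out` without building one big
// string first. Using '\n' instead of std::endl avoids forcing a flush after
// every line, so the writes accumulate in the stream's buffer.
void write_lines(std::ostream& out, const std::vector<std::string>& lines) {
    for (const std::string& line : lines) {
        out << line << '\n';
    }
}
```

With an `std::ofstream` opened on the target file, the same function writes through the file stream's buffer, and no intermediate copy of the whole text is made.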

~~~
gngeal
String concatenation? Are we living in the 1980s? What about string trees?
These problems can be solved simply by using a proper data structure.
Concatenation, insertion, deletion etc. should be almost constant-time
operations.

~~~
alayne
Data structures like ropes are really more for editing than concatenation.

~~~
phaemon
I'd never heard of "ropes" in that context and read that as the worst simile
ever ("Data structures, like ropes, are...")

------
ndepoel
How does this compare to std::stringstream? I don't see any mention of that
class in this article.

~~~
CJefferson
I was just about to post this.

If I use ostringstream, and also I change the code so it has to construct the
StringBuilder every test (at the moment they build it once and then keep
calling 'toString'), then I get the output (from the test program on that
website):

    
    
        Accurate performance test:
          ostringstream took 0.0120331 seconds, and ToString() took 0.0221947 seconds.
          The relative speed improvement was -45.784%
          Join took 0.0176613 seconds.

~~~
jonhohle
I came to post the same thing, and got similar results:

    
    
         Accumulate   took 0.00195327 seconds
         ToString()   took 0.00283577 seconds.
         Join         took 0.00462704 seconds.
         stringstream took 0.00084927 seconds.
        The relative speed improvement was -71.1482%

------
dalore
This isn't just a C++ issue. In nearly every language strings are immutable,
which means that if you add strings together, a new string with the combined
length has to be created somewhere. So if you are adding multiple strings
together, this happens each time. The better way (and how StringBuilder and
its ilk do it) is to put the strings into an array and then concatenate them
once (or, even better, output/send them to whatever needs the concatenated
string).

~~~
Eiwatah4
What's interesting to me here is that C++ strings are not immutable. So I'd
have expected them to behave basically the same way as StringBuilders in other
languages. But apparently they are required to be stored contiguously, and I
guess that's what makes them slower here.

~~~
archangel_one
Yes, I think the semantics of c_str() and data() effectively require that it
is stored contiguously.

It is still possible to make it faster by overallocating in the same way as
std::vector, though at the cost of more memory use.
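One quick way to see that overallocation in practice is to count how often `capacity()` changes while appending one character at a time. Real implementations grow the buffer geometrically, so the count stays small. The exact growth factor differs between standard libraries and is not fixed by the standard. `count_reallocations` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends `n` characters one by one and counts how many
// times the string had to grow its buffer (i.e. how often capacity changed).
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t last_capacity = s.capacity();
    std::size_t changes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++changes;
            last_capacity = s.capacity();
        }
    }
    return changes;
}
```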

~~~
Someone
I haven't looked at any implementation recently, but the standard specifically
leaves open that implementations postpone joining string buffers until c_str()
or data() is called (also, the pointers returned by those calls could point to
copies of the strings; that is not something I would expect, but I see nothing
in the standard that precludes it).

~~~
Eiwatah4
[http://en.cppreference.com/w/cpp/string/basic_string/c_str](http://en.cppreference.com/w/cpp/string/basic_string/c_str)

According to that link, c_str() and data() work in constant time. With that
restriction, it's impossible to do the joining lazily - it must be done when
data is added to the string.

~~~
Someone
Ha, it looks like they changed that in C++11.
[http://www.cplusplus.com/reference/string/string/data/](http://www.cplusplus.com/reference/string/string/data/)
claims "Unspecified or contradictory specifications." for C++98, but constant
complexity for C++11.

An answer to
[http://programmers.stackexchange.com/questions/124731/what-p...](http://programmers.stackexchange.com/questions/124731/what-performance-can-we-expect-from-stdstrings-c-str-always-constant-time)
indicates that C++03 doesn't require constant time, either.

Thanks for the education.

------
dalek_cannes
If you _really_ want performance, declare a fixed size char array (so no heap
use) optimized for the best write size for your disk, mem copy the strings
sequentially until you fill the array and write. Go back to beginning of array
and repeat. _Runs back to cave_
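A sketch of that scheme follows. It assumes a 4096-byte buffer, and the ideal size depends on the disk and OS, so the figure is only a guess. It also assumes C stdio as the output. Pieces are `memcpy`'d into a fixed array, and the array is written out with `fwrite` whenever it fills. `FixedBufferWriter` is a hypothetical class. stdio keeps a buffer of its own as well, so this mainly illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical writer: copies data into a fixed-size array (the writer itself
// never touches the heap) and hands the array to fwrite() each time it fills.
class FixedBufferWriter {
public:
    static constexpr std::size_t kSize = 4096;  // assumed "good" write size

    explicit FixedBufferWriter(std::FILE* out) : out_(out) {}
    ~FixedBufferWriter() { flush(); }
    FixedBufferWriter(const FixedBufferWriter&) = delete;
    FixedBufferWriter& operator=(const FixedBufferWriter&) = delete;

    void write(const char* data, std::size_t len) {
        while (len > 0) {
            std::size_t room = kSize - used_;
            std::size_t n = len < room ? len : room;
            std::memcpy(buf_ + used_, data, n);  // copy as much as fits
            used_ += n;
            data += n;
            len -= n;
            if (used_ == kSize) {
                flush();  // array full: write it out, start again at the beginning
            }
        }
    }

    void write(const std::string& s) { write(s.data(), s.size()); }

    void flush() {
        if (used_ > 0) {
            std::fwrite(buf_, 1, used_, out_);
            used_ = 0;
        }
    }

private:
    std::FILE* out_;
    char buf_[kSize];
    std::size_t used_ = 0;
};
```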

------
twoodfin
I write enough performance-sensitive code that I've gotten into the habit of
calling .reserve() with a generous final size estimate immediately after
constructing a string or vector (assuming I'm not using a constructor that
sizes it appropriately to begin with). It's hard to overestimate just how
_expensive_ repeated calls to malloc()/free() are.

In the innermost of inner loops, I've been known to use a static string or
vector to avoid repeated allocation entirely. Only in single-threaded code of
course!

------
vinkelhake
This is just a side note, but a problem with accumulate in this context is
that it is defined as doing `acc = op(acc, element)` for each element. This
means that whatever allocation the accumulator had is thrown away on each
iteration of the loop. Had it been defined as `acc += element`, allocation
schemes such as doubling the allocated memory would have been more effective
and would greatly reduce the number of allocations (and copies).
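To make that concrete, here is a hypothetical `accumulate_append` that folds with `acc += element`, so the accumulator keeps one buffer and grows it geometrically. It is not a standard algorithm. Since C++20, `std::accumulate` itself does `init = std::move(init) + *first`, which lets `std::string`'s rvalue `operator+` append in place. Earlier standards specify `init = init + *first`, which builds a new string each step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical fold: like std::accumulate with the default operation, but it
// appends into the accumulator with += instead of building acc + element as a
// new object each step. The accumulator's buffer survives between iterations,
// so its geometric growth keeps the number of reallocations small.
template <typename InputIt, typename T>
T accumulate_append(InputIt first, InputIt last, T init) {
    for (; first != last; ++first) {
        init += *first;
    }
    return init;
}
```

Usage, for a `std::vector<std::string> v`: `accumulate_append(v.begin(), v.end(), std::string())`.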

------
cheez
Just last night I improved the startup of one of my apps in C++ which had a
previously unexplainable 1 second delay by precomputing some string joins and
splits at build time. I nearly cried. Thanks to the Instruments app on OSX
which is seriously awesome!

Startup is now instantaneous. It was also making queries slower. Queries are
now also instantaneous.

------
mjcohen
I had a similar problem with a gawk (yes, gawk!) program I was writing. I had
to accumulate 10,000,000 32-character strings to produce a 320,000,000 (three
hundred and twenty million!) character string.

It was taking forever.

I eventually realized that this string reallocation that was being done
10,000,000 times was the problem.

To solve this, I did a two-level accumulation (perhaps three levels would have
been better, but two was enough). I first accumulated 3,000 of the
32-character strings (3,000 because that was about the square root of
10,000,000).

I then accumulated the (about) 3,000 of these (about) 100,000 character
strings.

The result took about 30 seconds, which was good enough for what I needed to
do.
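The same fix translates directly to C++. The sketch below is not the original gawk code. It mimics a language where each `acc = acc + piece` copies the whole accumulator. Joining runs of about √n pieces into chunks first, then joining the chunks, cuts the total copying from roughly quadratic in n to about n·√n. `copy_concat` and `two_level_join` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Models "immutable string" concatenation: builds a brand-new string holding
// a followed by b, copying both (as acc = acc + piece would in gawk).
std::string copy_concat(const std::string& a, const std::string& b) {
    std::string result;
    result.reserve(a.size() + b.size());
    result.append(a).append(b);
    return result;
}

// Two-level join: concatenate runs of about sqrt(n) pieces into chunks, then
// concatenate the chunks. Each level copies far less than one long chain of
// acc = acc + piece over all n pieces.
std::string two_level_join(const std::vector<std::string>& pieces) {
    const std::size_t n = pieces.size();
    const std::size_t chunk = std::max<std::size_t>(
        1, static_cast<std::size_t>(std::sqrt(static_cast<double>(n))));

    std::vector<std::string> chunks;
    for (std::size_t i = 0; i < n; i += chunk) {
        std::string acc;
        const std::size_t end = std::min(n, i + chunk);
        for (std::size_t j = i; j < end; ++j) {
            acc = copy_concat(acc, pieces[j]);  // level 1: within a chunk
        }
        chunks.push_back(std::move(acc));
    }

    std::string result;
    for (const std::string& c : chunks) {
        result = copy_concat(result, c);  // level 2: join the chunks
    }
    return result;
}
```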

------
Someone

        string s = accumulate(vec.begin(), vec.end(), s);
    

Is that legal C++? I would think that passes s to 'accumulate' before
constructing it
([http://www.gotw.ca/gotw/001.htm](http://www.gotw.ca/gotw/001.htm)). IMO, a
correct way to do this would be:

    
    
        string s; // calls string::string()
        s = accumulate(vec.begin(), vec.end(), s);
    

or `string s = accumulate(vec.begin(), vec.end(), string());` (a bare "" would
make the accumulator a const char*, and the code would not compile).

------
10098
This is reinventing the wheel. There are string streams for that.

~~~
eropple
Stringstream is _really_ slow. It's great for a lot of stuff, but performance
isn't one of them.

~~~
jonhohle
Except, as pointed out below, it's significantly faster than this
implementation.

~~~
eropple
Huh, so it is. That's really surprising to me, but I guess those standard
library folks are smart fellas. =)

~~~
10098
Yeah, never underestimate the library authors. There were a couple of times I
thought I had found a bug in a standard library implementation only to be
pointed at the language standard and told that it's supposed to work that way
:)

------
pjbringer
You have to give credit to languages like Java or C#, which provide a
programming interface that does the right thing. Those who use lower-level
languages because they want better performance should reconsider unless they
have the required know-how. It baffles me that someone would write C++ and
mindlessly concatenate strings.

~~~
CJefferson
Code written in Java would have exactly the same problem I believe? You are
advised to use a StringBuilder.

C++ has a StringBuilder, it's called std::ostringstream, but the author didn't
seem to know about it, so reinvented it.

To be polite, his reinvention is reasonable, and knowing about this problem is
useful.

~~~
mbell
Not quite exactly the same issue, not sure about all JVMs but the HotSpot JIT
will replace concatenation with StringBuilder usage in many cases but it may
not be ideal.

For example it may create a new StringBuilder in every iteration of a loop
whereas you may be able to code it such that only a single StringBuilder needs
to be created and you may be able to provide better initial array size
hinting. If it's just a single concatenation statement, building a log message
or something, then using the '+' operator won't have much if any impact on
performance.

~~~
jonhohle
> Not quite exactly the same issue, not sure about all JVMs but the HotSpot
> JIT will replace concatenation with StringBuilder usage in many cases but it
> may not be ideal.

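It's not even the JIT, it's a static transformation at byte code creation
time. Last I checked, with `foo` and `bar` as non-constant String variables
(two literals such as "foo" + "bar" are simply folded into "foobar" by the
compiler):

    
    
        String s = foo + bar;
    

Produced identical byte code to:

    
    
        String s = new StringBuilder()
            .append(foo)
            .append(bar)
            .toString();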

------
toblender
That's the most C++ I've looked at in years. Great tip on adding memory
allocation into the list of resource thieves.

I'm actually wondering if we can get a speed boost for JavaScript in a similar
way. I find myself concatenating strings together often in the code.

------
mpyne
Qt has had a compile-time string builder since about 4.6 or so, for those
wanting to take advantage in real-life code.

Just grep the QString API docs for "QStringBuilder".
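As far as I understand it, QStringBuilder's trick is to defer the work until the final size is known, then allocate once. Below is a much simpler sketch of the same idea in standard C++17. It is not Qt's API, and `concat` is a hypothetical function. It sums the lengths of all arguments, reserves once, then appends each one. So `concat(a, b, c)` allocates once, while `a + b + c` may allocate for intermediate results.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: concatenates any number of string-like arguments
// (std::string, string literals, std::string_view) with a single allocation.
// Requires C++17 for std::string_view and fold expressions.
template <typename... Parts>
std::string concat(const Parts&... parts) {
    // First pass: exact final length, summed with a fold expression.
    const std::size_t total =
        (std::size_t{0} + ... + std::string_view(parts).size());
    std::string result;
    result.reserve(total);
    // Second pass: copy each part exactly once into the reserved buffer.
    (result.append(std::string_view(parts)), ...);
    return result;
}
```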

------
polskibus
Wouldn't it be more in the C++11 manner to use move semantics to solve the
problem of generating new strings on concatenation?

~~~
zaphoyd
Move wouldn't actually help in this case. Move works by letting a wrapper
object (like a string or vector) that manages a pointer to some dynamically
allocated storage take over the pointer of the object being moved rather than
allocating new storage, copying data, then freeing the old storage.

In the case of concatenation, where the goal is to end up with a contiguous
array of the characters from the strings to be joined, no block of memory
sufficiently large exists anywhere to be appropriated, so new memory must be
allocated.

------
corresation
Although it involved C strings, some years back I was curious about Firefox's
poor SunSpider showing, so I dove into both the benchmark and the code and
determined that:

a) SunSpider was overwhelmingly a benchmark of string concatenation
performance.
b) Firefox had slow string concatenation.

The solution to b) was trivial. Whenever Firefox saw that you were doing str =
str + something, it would realloc str to exactly len(str)+len(something)+1 and
then strcpy something to the tail of str. Changing the code slightly so that
every realloc rounded up to the next power of two greater than the new
combined length traded a relatively small amount of memory (in most
situations) for a 20x+ improvement in SunSpider performance, because the vast
majority of concatenations could then be done in place.
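Here is a sketch of that growth policy, written as a small C-style buffer in C++. It is not Firefox's code, and `PowBuf`, `next_pow2`, `powbuf_append` and `powbuf_free` are hypothetical names. Each append rounds the needed capacity up to the next power of two. Most appends then fit in the existing allocation, and `realloc` runs only O(log n) times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable C string: `len` characters plus a terminating '\0'
// stored in `data`, with `cap` bytes allocated.
struct PowBuf {
    char* data = nullptr;
    std::size_t len = 0;
    std::size_t cap = 0;
    std::size_t reallocs = 0;  // counts calls to realloc, for illustration
};

// Smallest power of two >= n (for n > 0).
static std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Appends the NUL-terminated string `s`. Returns false if allocation fails.
bool powbuf_append(PowBuf& b, const char* s) {
    const std::size_t add = std::strlen(s);
    const std::size_t need = b.len + add + 1;  // +1 for the terminator
    if (need > b.cap) {
        const std::size_t new_cap = next_pow2(need);
        char* grown = static_cast<char*>(std::realloc(b.data, new_cap));
        if (grown == nullptr) return false;  // the old block is still valid
        b.data = grown;
        b.cap = new_cap;
        ++b.reallocs;
    }
    std::memcpy(b.data + b.len, s, add + 1);  // copy text and terminator in place
    b.len += add;
    return true;
}

void powbuf_free(PowBuf& b) {
    std::free(b.data);
    b = PowBuf{};
}
```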

