Want speed? Pass by value.

edanm · on June 18, 2010

Seeing this article reminds me of all the reasons I don't use C++ any more.

Instead of focusing on the important problems, like code structure, design, or God forbid, the design of the actual product, most of my life as a C++ programmer revolved around learning the mechanics of the language. And then learning the exceptions. And then learning the optimizations. And then learning the intricacies of the STL.

And I'm still not an experienced C++ programmer.

ramy_d · on June 18, 2010

Not sure if I understood, but here's my summary of what I got from this:

In fewer words, the argument is this:

  std::vector<std::string> get_names();
  std::vector<std::string> const names = get_names();

Passing by value appears to cause a lot of under-the-hood moving and copying, which is slow. But we will learn later why this is actually the right approach.

  void get_names(std::vector<std::string>& out_param );
  std::vector<std::string> names;
  get_names( names );

Passing by reference means many extra lines of code throughout the code base: no more const variables, more mutable state, and other crap no one ever told you about when learning about pointers.

The solution to both of these is to use "RValue expressions". RValues are expressions that create anonymous temporary objects.

When defining variables, initializing from an rvalue allows transferring ownership of the underlying resources (in this case the vector's dynamically allocated array of strings) from the source vector to the target vector.

When calling functions, return values are also anonymous temporary objects, so we transfer the resources from the return value to the target value in the same way as with variables.

Oh wait, the compiler actually takes care of optimizing stuff for you, it's called Return Value Optimization (RVO) and it works like this:

  std::vector<std::string> names = get_names();

Oh shit, isn't this what we wrote as the first example, the one that's expensive and slow? Yeah well, apparently there's nothing to worry about. Use this.

Do pass one function call's result directly to another function, because then you're passing rvalues as arguments, which is unicorn-level magic:

  std::vector<std::string> sorted_names2 = sorted( get_names() );

RVO optimizations aren't required by any standard, but "recent versions of every compiler I’ve tested do perform these optimizations today."

Don't pass a variable by reference and then make an explicit copy of it. That defeats the whole purpose of what we are trying to tell you.

Guideline: Don’t copy your function arguments. Instead, pass them by value and let the compiler do the copying.

Lesson: don't explicitly copy from reference parameters. Take the arguments by value and the compiler will optimize the copy for you.
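
To make the guideline concrete, here is a minimal sketch of the two styles being compared, assuming C++11 or later. Only sorted and get_names come from the summary above; sorted_by_ref, demo_sorted, and the sample data are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the article's get_names(); the contents are invented.
std::vector<std::string> get_names() {
    return {"carol", "alice", "bob"};
}

// The style the guideline argues against: const reference plus a hand-written copy.
// The copy is made even when the caller passed a temporary.
std::vector<std::string> sorted_by_ref(std::vector<std::string> const& names) {
    std::vector<std::string> r(names);  // explicit copy, always performed
    std::sort(r.begin(), r.end());
    return r;
}

// The style the guideline recommends: take the argument by value.
// An lvalue argument is copied by the compiler; an rvalue argument can be
// moved or have its copy elided.
std::vector<std::string> sorted(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}

// Small usage example; call it from main() to try the sketch.
void demo_sorted() {
    std::vector<std::string> const expected = {"alice", "bob", "carol"};
    assert(sorted(get_names()) == expected);   // rvalue argument
    std::vector<std::string> names = get_names();
    assert(sorted(names) == expected);         // lvalue argument, copied once
    assert(names.front() == "carol");          // the caller's vector is untouched
}
```

Both versions return the same result; the difference is only in how many copies of the vector may be made along the way, which depends on the compiler and language version.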

jheriko · on June 18, 2010

This is very wrong for so many reasons and misses probably the most important reasons to pass by copy...

The example of the sorted functions is striking, because the compiler should be able to produce the same quality of code from both cases. The fact that it doesn't suggests that the compiler is not that smart. The r-value/l-value distinction is, in this case, unimportant: because of const, the l-value cannot be modified. As long as you don't use const_cast or forget a volatile keyword etc. elsewhere, the compiler can safely assume that it really will remain const. I'll have to play with this myself, I guess, because I might be missing something here that looking at bytes would reveal... but it seems that your example should fail to show any difference on a good compiler.

Worse, though, is that the const-reference and explicit copy is done at all. What possible reason could there be? This is where you should use pass by copy /anyway/, before your argument about speed, because you quite explicitly need to work with a copy. If you chose a suitable example you would see the opposite... unless your compiler is smart enough to treat your redundant copy as if it was a const-reference, which, e.g., the MS C++ compiler does in fact do.

The important point I think that was missed though is to always pass small things by value - throwing heap pointers around to be dereferenced is typically much more expensive than using the stack - your compiler might optimise it away for you, but especially if you aren't using const references/pointers it can be difficult. Imagine your compiler is not smart - passing a 4/8-byte value as a reference to a 4/8-byte value is just silly. This is probably the most common case - ints, floats, doubles and even small structs will go on the stack, or better.

grogers · on June 18, 2010

Yes, but in an article on copy elision, it doesn't really make sense to talk about small objects because it really doesn't matter if they are copied or not. Pass them by value anyways.

What are you referring to about the sorted function? That he says the compiler isn't smart enough to optimize the copy out of a function that returns its own argument? His argument makes sense to me here: the caller doesn't know anything about the internals of the function, so it has to allocate separate space for the return value and the function argument, leading to at least one copy. With inline functions or whole program analysis (or some type of link time optimization) it should be smart enough to do this.

malkia · on June 18, 2010

So how many things do you now have to keep in your head when you write and read C++ code?

Also, this would probably not be optimized in debug builds (-O0, -Od, etc.), which means a bigger difference between debug and release. That would make debug useless if more and more such things were relied on in release.

Sorry, I work in games, and I've seen the horrors of really slow debug and fast release... to the point where one of the leads says: who needs a debug version? Just use printfs, or decode what the debugger meant to say (in release).

s3graham · on June 18, 2010

Oh yes. And then there's "release" and "release-final" (because we started putting a few asserts into release after dropping "debug").

And then "release", "release-final", and "release-final-final" because marketing needed some functionality in "release-final" that went on QA DVDs. :)

malkia · on June 19, 2010

Funny that you mentioned it - we have ReleaseDebug :) :) :) and Final, and a few others.

jheriko · on June 18, 2010

Try building the performance critical sections with release settings in debug builds and have a separate "slow debug" version which doesn't. It's not a magic bullet that will make debug as fast as release, but it can usually bring debug framerates to acceptable levels whilst letting you use the debugger on whatever you are working on most of the time (unless you are trying to fix some core engine bug etc...)

malkia · on June 18, 2010

There are no performance critical sections with RVALUES - it's everywhere - it's not isolated.

It's not like here is the skinned code on CPU using SSE2, and here is the video decoder, and here is the this and that.

Overloaded math operators have the same effect. Yes, they are expanded inline in release (-O9 if you will), but usually not (unless explicitly specified) in debug. So once you start using them more and more, every math operation, even a simple one, becomes a function call (in debug), to the point where I have seen a 6x slowdown in one of our heavily templated math libraries. It had many types of vectors: direction, position, normal, etc. (depending on its contents, but still all of them 3 or 4 dimensions), and many types of 4x4 and 3x4 matrices (diagonal, LU, identity, etc.). It required so much typing in the function interfaces that it was a heavy burden on the programmer.

I much prefer a simple float* or double* (or both), rather than some obscure DirectionVector3, or PositionalVector3, DiagonalMatrix44, and such.

We are so much led to believe that if we give the compiler the best instructions it'll help us, that we forget that what helps the compiler in this case hurts us as programmers. It's the compiler that should be figuring out these things, not us. And if it can't, then we better find some compromise.

stingraycharles · on June 18, 2010

I'm not 100% certain whether I like this. Consider the following concurrent pseudo-code:

  class A {
  public:
  
    void add () {
     // <locking happens here>
     _v += "foo";
    }
  
    std::string const get () {
     // <locking again>
     return _v;
    }
  
  private:
   std::string _v;
  };

Now, if I launch multiple threads that call get () and add () at the same time, normally this would be thread-safe, if locking occurs. However, if I understood it correctly, get () can also return by reference, since it is const.

Wouldn't this create race conditions?
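
For reference, a minimal sketch of what the pseudo-code above might look like with the locking filled in, assuming C++11's std::mutex and std::lock_guard. The mutex member _m and the demo_locked_get function are assumptions; the original only says that locking happens. Because get() returns a copy of a member, the returned string is initialized from _v before the lock guard is destroyed, and copy elision cannot turn a by-value return of a member into a reference to it.

```cpp
#include <cassert>
#include <mutex>
#include <string>

class A {
public:
    void add() {
        std::lock_guard<std::mutex> lock(_m);  // assumed locking scheme
        _v += "foo";
    }

    // Returns by value. The result is copy-initialized from _v while `lock`
    // is still alive; the guard is released only after that copy exists.
    std::string get() const {
        std::lock_guard<std::mutex> lock(_m);
        return _v;
    }

private:
    mutable std::mutex _m;  // hypothetical member, not in the original snippet
    std::string _v;
};

// Single-threaded usage example: the snapshot is independent of later changes.
void demo_locked_get() {
    A a;
    a.add();
    std::string snapshot = a.get();
    a.add();
    assert(snapshot == "foo");
    assert(a.get() == "foofoo");
}
```

The caller ends up with its own string, so later calls to add() from other threads do not touch the value already returned.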

tbrownaw · on June 18, 2010

No.

If your function does "return <some local var>", as in

    Foo xyzzy() {
        Foo ret;
        /* do stuff to ret */
        return ret;
    }

then the compiler can rewrite it to be something like

    void xyzzy(void * _ptr) {
        Foo * _ret = new(_ptr)Foo();
        /* do stuff to *_ret */
    }

where the function-local variable that gets returned is actually just constructed in the place that it would eventually be returned/copied to and so doesn't have to be returned at all.
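
One way to observe this is to instrument the type and count copies. The sketch below assumes C++11 or later; the copies counter, the value member, and demo_rvo are hypothetical additions to the Foo and xyzzy names used above. The elision itself is optional for a named local, but when the compiler skips it the return falls back to the move constructor, so no copy is counted either way.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts how many times it is copy-constructed.
struct Foo {
    static int copies;
    int value = 0;

    Foo() = default;
    Foo(Foo const& other) : value(other.value) { ++copies; }
    Foo(Foo&& other) noexcept : value(other.value) {}
};
int Foo::copies = 0;

Foo xyzzy() {
    Foo ret;
    ret.value = 42;  // "do stuff to ret"
    return ret;      // constructed in place in the caller, or moved; never copied
}

// Usage example: the caller receives the value without any copy constructor call.
void demo_rvo() {
    Foo f = xyzzy();
    assert(f.value == 42);
    assert(Foo::copies == 0);
}
```

Replacing the move constructor with a counter of its own shows whether a particular compiler and optimization level actually elides the move as well.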

pdovy · on June 18, 2010

This would introduce another performance consideration though, wouldn't it? You may not be paying the cost for the explicit copy, but now you are paying the cost of allocating space for a Foo on the heap instead of the stack.

It would depend on the situation, but this seems like it might just be a wash in the end in terms of performance?

124816 · on June 19, 2010

That weird new(T*) T(...) syntax (placement new) is how you ask C++ to run the constructor of T on an existing T*.

(In this case that T* would point to the stack.)

Note that the memory at the T* shouldn't be a real T; or if it is, its destructor ought to be run first.
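
A small sketch of that, assuming C++11 for alignas. Widget, its alive counter, and placement_new_demo are made-up names for illustration. The storage is an ordinary local array, so nothing is allocated on the heap.

```cpp
#include <cassert>
#include <new>  // declares the placement form of operator new

// Hypothetical type that tracks how many instances currently exist.
struct Widget {
    static int alive;
    int value;
    explicit Widget(int v) : value(v) { ++alive; }
    ~Widget() { --alive; }
};
int Widget::alive = 0;

int placement_new_demo() {
    // Raw, suitably aligned stack memory; no Widget lives here yet.
    alignas(Widget) unsigned char storage[sizeof(Widget)];

    // Run the constructor at that address. No allocation takes place.
    Widget* w = new (storage) Widget(7);
    int seen = w->value;

    // There is no matching delete for this form, so the destructor
    // has to be called by hand before the storage goes away.
    w->~Widget();
    return seen;
}
```

Calling placement_new_demo() returns 7 and leaves Widget::alive at 0, since the object is constructed and destroyed entirely within the local buffer.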

zokier · on June 18, 2010

c++ is scary

jheriko · on June 18, 2010

I think it's only the hordes of terrible C++ programmers that make it seem that way...

jonsen · on June 18, 2010

Yes. It induces respect. That is not necessarily a bad thing.

Perhaps C++ is for those who bother to learn a language before using it.

zokier · on June 18, 2010

I think C++ is more for those people who can use a language without knowing it and learn in the process. I'm getting a feeling that knowing all of C++ is something that cannot be achieved in one lifetime, so those who want to learn it before using it will just spend all their time studying without actually doing anything (some may call them academics).

jpr · on June 18, 2010

If that were the case, the set of programmers who use C++ should be much smaller than it is currently.

To me, this article describes yet another reason to not even try learning C++.

zandorg · on June 18, 2010

I wrote my own class which created the data (basically an array) once, passed it around, and automatically deleted the data once its parent function returned.

I did this by having a reference counter which goes up on a constructor, and down on a destructor, and when it gets to zero and destructs, only then does it delete the data.

This gave me about 10% extra speed in the program as a whole, over using the C++ classes in the STL.
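
The description above does not include code, so the following is only a guess at the general shape of such a class, assuming an int array and a separately allocated counter. SharedArray, use_count, and owners_during_call are invented names. The sketch is not thread-safe, and assignment is disabled to keep it short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference-counted handle to a heap array of ints.
class SharedArray {
public:
    explicit SharedArray(std::size_t n)
        : data_(new int[n]()), count_(new std::size_t(1)) {}

    // Copying shares the data and bumps the counter; the array is not copied.
    SharedArray(SharedArray const& other)
        : data_(other.data_), count_(other.count_) { ++*count_; }

    SharedArray& operator=(SharedArray const&) = delete;

    // The last handle to be destroyed frees the data and the counter.
    ~SharedArray() {
        if (--*count_ == 0) {
            delete[] data_;
            delete count_;
        }
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t use_count() const { return *count_; }

private:
    int* data_;
    std::size_t* count_;
};

// Passing the handle by value copies two pointers, not the array itself.
std::size_t owners_during_call(SharedArray a) { return a.use_count(); }
```

For a SharedArray named arr with no other copies alive, owners_during_call(arr) reports 2 during the call, and the count drops back to 1 once the parameter is destroyed. std::shared_ptr offers the same idea in the standard library.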

czhiddy · on June 18, 2010

http://en.wikipedia.org/wiki/Reference_counting ?

koenigdavidmj · on June 18, 2010

Sounds like the Flyweight design pattern. Basically, if you have a lot of small objects that would take a lot of memory, then you just cache them all.

The canonical use case is a word processor that needs one instance of a letter class for each letter in a document. That's a lot of letters, but there is also a lot of repetition. However, if you have several hundred letters `e', all of the same font face and size, then why not just use the same reference to that identical object and save a lot of space? (You improve your program's spatial locality quite a bit as well.)

d0m · on June 18, 2010

Well, "just" because the pass-by-value example is simpler and clearer, I would use that (screw the speed factor). Only if there's a provable speed penalty on large data would I consider trying alternatives. Still, it's cool to know that the "better way" is in fact the "faster way" at the same time.

bediger · on June 18, 2010

Isn't his caveat at the end of the article possibly very important? Knowing where copy constructors get called seems like a tricky, yet performance-important thing, given the costliness of cache-flushes and lack of locality of reference C++ objects might incur.

karatchov · on June 18, 2010

Please, can anyone recommend a good tutorial for understanding C++ function arguments? I'm starting to code in C++ and seeing this article makes me more confused.

ori_b · on June 18, 2010

Just read the ABI specs. A good start is the SysV ABI (refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf), although it doesn't directly cover C++.

Then, the C++ ABI is defined here - specifically, the Itanium ABI draft, which is in fact used on most GCC-supported systems, as far as I know. The "Itanium" name is a misnomer. http://www.codesourcery.com/public/cxx-abi/

tbrownaw · on June 18, 2010

This kinda relies on your objects being cheap to copy/move. Try doing your pass-by-value (not return-by-move) with a non-COW container, and see how it works.

okmjuhb · on June 18, 2010

This is totally wrong; the examples rely on compilers doing copy elision together with return value optimization so that the copy becomes unnecessary (even if a copy is expensive, it's fine - it never needs to happen). They do not use copy on write containers (indeed, they return modified versions of input variables).

tbrownaw · on June 18, 2010

...yeah, I suppose he does limit the examples to just nested function calls (and assumes everything ends up in the same object file).

jey · on June 18, 2010

No, I'm pretty sure RVO is a feature of the calling convention, so it works across TUs (object files).

tbrownaw · on June 18, 2010

..wha?

OK, re-reading this it looks like maybe he is talking strictly about return-value optimization. For some odd reason I thought he was claiming significantly more than that (such as not making copies of your pass-by-value parameters).