

Want speed? Pass by value. - VeXocide
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

======
ramy_d
Not sure if I understood, but here's my summary of what I got from this:

In fewer words, the argument is this:

    
    
      std::vector<std::string> get_names();
      std::vector<std::string> const names = get_names();
    

Returning by value like this looks like it causes a lot of under-the-hood
moving and copying, which is slow, but we will learn later why this is
actually the right way to write it.

    
    
      void get_names(std::vector<std::string>& out_param);
      std::vector<std::string> names;
      get_names( names );
    

Passing an out-parameter by reference means extra lines of code throughout the
code base: no more const variables, more mutable state, and other crap no one
ever told you about when learning about pointers.

The solution to both of these is to use "RValue expressions". RValues are
expressions that create anonymous temporary objects.

When defining variables, using RValues allows transferring ownership of, in
this case a dynamically allocated string array (vector), from the source
vector to the target vector.

When using functions, returns are also anonymous temporary objects, so we
transfer the resources from the return value to the target value in the same
way as with variables.
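
A minimal sketch of that ownership transfer, assuming a C++11 compiler. The names `make_names` and `move_reuses_buffer` are made up for illustration and are not from the article; `std::move` is used only to turn a named vector into an rvalue so the transfer is visible.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not from the article: builds a small vector of names.
std::vector<std::string> make_names() {
    std::vector<std::string> names;
    names.push_back("alice");
    names.push_back("bob");
    names.push_back("carol");
    return names;
}

// Moves `source` into a new vector and reports whether the heap buffer was
// handed over instead of being duplicated.
bool move_reuses_buffer() {
    std::vector<std::string> source = make_names();
    const std::string* buffer = source.data();  // remember the allocation

    // std::move casts `source` to an rvalue, so the move constructor runs.
    std::vector<std::string> target = std::move(source);

    // The target owns the same buffer; the moved-from vector is left empty.
    return target.data() == buffer && source.empty() && target.size() == 3;
}
```

The strings themselves are never touched here; only the vector's internal pointer changes hands, which is why moving stays cheap no matter how many names there are.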

Oh wait, the compiler actually takes care of optimizing stuff for you, it's
called Return Value Optimization (RVO) and it works like this:

    
    
      std::vector<std::string> names = get_names();
    

Oh shit, isn't this what we wrote as the first example, the one that's
expensive and slow? Yeah well, apparently there's nothing to worry about. Use
this.

Do pass the result of one function call straight into another, because then
you're passing rvalues as parameters, which is unicorn-level magic

    
    
      std::vector<std::string> sorted_names2 = sorted( get_names() );
    

RVO optimizations aren't required by any standard, but "recent versions of
every compiler I’ve tested do perform these optimizations today."
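
One way to check this yourself is to count constructor calls. A minimal sketch, assuming C++11 or later; `Tracked`, `make_tracked` and `copies_when_returning_by_value` are hypothetical names, not from the article. Since C++17 the standard does require the elision when a function returns a temporary like this; eliding a named local variable (NRVO) is still optional.

```cpp
#include <cassert>

// Hypothetical instrumented type: counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a temporary, the classic RVO case.
Tracked make_tracked() {
    return Tracked();
}

// Returns how many copies were made while initializing from the call.
int copies_when_returning_by_value() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked t = make_tracked();  // constructed in place, or moved at worst
    (void)t;
    return Tracked::copies;
}
```

Even with elision switched off, a compiler in C++11 mode falls back to the move constructor here, so the copy count stays at zero; `Tracked::moves` tells you whether the elision actually happened.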

Don't pass a variable by reference and then make an explicit copy of its
value - that defeats the whole purpose of what we are trying to talk to you
about.

Guideline: Don’t copy your function arguments. Instead, pass them by value and
let the compiler do the copying.

Lesson: don't explicitly copy from reference parameters; take the arguments by
value and the compiler will optimize the copy for you
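
To make the guideline concrete, here is a rough sketch comparing the two styles of `sorted`, assuming C++11 or later. `Names` is a hypothetical wrapper that counts deep copies, and the `copies_*` helpers are made up for this example; only `get_names` and `sorted` echo the article's names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper that counts deep copies of the underlying vector.
struct Names {
    static int copies;
    std::vector<std::string> data;

    explicit Names(std::vector<std::string> d) : data(std::move(d)) {}
    Names(const Names& other) : data(other.data) { ++copies; }
    Names(Names&& other) noexcept : data(std::move(other.data)) {}
};
int Names::copies = 0;

Names get_names() {
    std::vector<std::string> raw;
    raw.push_back("carol");
    raw.push_back("alice");
    raw.push_back("bob");
    return Names(std::move(raw));
}

// Style the article warns against: take a reference, then copy by hand.
Names sorted_by_ref(const Names& names) {
    Names result(names);  // explicit copy, made even for temporaries
    std::sort(result.data.begin(), result.data.end());
    return result;
}

// Style the article recommends: take the argument by value.
Names sorted_by_value(Names names) {  // an rvalue argument is moved or elided
    std::sort(names.data.begin(), names.data.end());
    return names;                     // a by-value parameter is moved out
}

int copies_by_ref() {
    Names::copies = 0;
    Names s = sorted_by_ref(get_names());
    (void)s;
    return Names::copies;
}

int copies_by_value() {
    Names::copies = 0;
    Names s = sorted_by_value(get_names());
    (void)s;
    return Names::copies;
}

int copies_by_value_from_lvalue() {
    Names n = get_names();
    Names::copies = 0;
    Names s = sorted_by_value(n);  // the caller still needs n, so one copy
    (void)s;
    return Names::copies;
}
```

With a temporary as the argument, the by-reference version pays for one deep copy and the by-value version pays for none. With a named variable both cost one copy, so pass-by-value never loses in this setup.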

------
edanm
Seeing this article reminds me of all the reasons I don't use C++ any more.

Instead of focusing on the important problems, like code structure, design, or
God forbid, the design of the actual _product_, most of my life as a C++
programmer revolved around learning the mechanics of the language. And then
learning the exceptions. And then learning the optimizations. And then
learning the intricacies of the STL.

And I'm still not an experienced C++ programmer.

------
jheriko
This is very wrong for so many reasons and misses probably the most important
reasons to pass by copy...

The example of the sorted functions is striking - because the compiler should
be able to produce the same quality of code from both cases - the fact that it
doesn't suggests that the compiler is not that smart. The r-value, l-value
distinction is, in this case, unimportant - because of const the l-value can
not be modified, as long as you don't use const_cast or forget a volatile
keyword etc. elsewhere the compiler can safely assume that it really will
remain const. I'll have to play with this myself I guess cos I might be
missing something here that looking at bytes would reveal... but it seems that
your example should fail to show any difference on a good compiler.

Worse though is that the const-reference and explicit copy is done at all -
what possible reason could there be? This is where you should use pass by copy
/anyway/, before your argument about speed, because you quite explicitly need
to work with a copy. If you chose a suitable example you would see the
opposite... unless your compiler is smart enough to treat your redundant copy
as if it was a const-reference, which e.g. the MS C++ compiler does in fact
do.

The important point I think that was missed though is to always pass small
things by value - throwing heap pointers around to be dereferenced is
typically much more expensive than using the stack - your compiler might
optimise it away for you, but especially if you aren't using const
references/pointers it can be difficult. Imagine your compiler is not smart -
passing a 4/8-byte value as a reference to a 4/8-byte value is just silly.
This is probably the most common case - ints, floats, doubles and even small
structs will go on the stack, or better.
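
A small sketch of the "small things by value" point, assuming C++11. `Vec2` and the two `dot_*` functions are hypothetical; whether the by-value version really ends up in registers depends on the platform ABI and the optimizer.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical small type: two floats, 8 bytes on typical platforms.
struct Vec2 {
    float x;
    float y;
};

static_assert(std::is_trivially_copyable<Vec2>::value,
              "Vec2 can be copied bit-for-bit");

// By value: the callee gets its own copy, typically passed in registers.
float dot_by_value(Vec2 a, Vec2 b) {
    return a.x * b.x + a.y * b.y;
}

// By const reference: same result, but unless the call is inlined the callee
// has to load the fields through a pointer.
float dot_by_ref(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}
```

Both give the same answer; the difference only shows up in the generated code, so it is worth looking at the disassembly before drawing conclusions for a particular compiler.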

~~~
grogers
Yes, but in an article on copy elision, it doesn't really make sense to talk
about small objects because it really doesn't matter if they are copied or
not. Pass them by value anyways.

What are you referring to about the sorted function - that he says the
compiler isn't smart enough to optimize the copy out of a function returning
its own argument? His argument makes sense to me here: the caller doesn't
know anything about the internals of the function, so it has to allocate
separate space for the return value and the function argument, leading to at
least one copy. With inline functions or whole program analysis (or some type
of link time optimization) it should be smart enough to do this.

------
malkia
So how many things do you now have to keep in your head when you write and
read C++ code?

Also, this would probably not be optimized in Debug builds (-O0, -Od, etc.)
which means bigger difference between debug/release, which would make debug
useless, if more and more of such things were used in release.

Sorry I work in games, and I've seen the horrors of really slow debug, and
fast release... To the point where one of the leads said - who needs a debug
version - just use printfs, or decode what the debugger meant to say (in
release).

~~~
s3graham
Oh yes. And then there's "release" and "release-final" (because we started
putting a few asserts into release after dropping "debug").

And then "release", "release-final", and "release-final-final" because
marketing needed some functionality in "release-final" that went on QA
DVDs. :)

~~~
malkia
Funny that you mentioned it - but we have ReleaseDebug :) :) :) and Final, and
a few others.

------
stingraycharles
I'm not 100% certain whether I like this. Consider the following concurrent
pseudo-code:

    
    
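      #include <mutex>
      #include <string>

      class A {
      public:
        void add() {
          std::lock_guard<std::mutex> lock(_m);  // locking happens here
          _v += "foo";
        }

        std::string const get() {
          std::lock_guard<std::mutex> lock(_m);  // locking again
          return _v;
        }

      private:
        std::mutex _m;
        std::string _v;
      };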
    
    
    

Now, if I launch multiple threads that call get () and add () at the same
time, normally this would be thread-safe, if locking occurs. However, if I
understood it correctly, get () can also return by reference, since it is
const.

Wouldn't this create race conditions?

~~~
tbrownaw
No.

If your function does "return <some local var>", as in

    
    
        Foo xyzzy() {
            Foo ret;
            /* do stuff to ret */
            return ret;
        }
    

then the compiler can rewrite it to be something like

    
    
        void xyzzy(void * _ptr) {
            Foo * _ret = new(_ptr)Foo();
            /* do stuff to *_ret */
        }
    

where the function-local variable that gets returned is actually just
constructed in the place that it would eventually be returned/copied to and so
doesn't have to be returned at all.

~~~
pdovy
This would introduce another performance consideration though, wouldn't it?
You may not be paying the cost for the explicit copy, but now you are paying
the cost of allocating space for a Foo on the heap instead of the stack.

It would depend on the situation, but this seems like it might just be a wash
in the end in terms of performance?

~~~
124816
That weird new(T*) T(...) syntax (placement new) is how you ask C++ to run the
constructor of T on an existing T*.

(In this case that T* would point to the stack.)

Note that the memory at the T* shouldn't be a real T; or if it is, its
destructor ought to be run first.
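
For anyone who wants to try it, here is a hand-written sketch of the transformation from tbrownaw's comment, assuming C++11. `Foo` here is a hypothetical stand-in, and `xyzzy_into` and `call_xyzzy` are made-up names; a real compiler does this internally and does not expose anything like it.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical stand-in for the Foo in the earlier example.
struct Foo {
    std::string label;
    Foo() : label("made in place") {}
};

// The caller supplies raw storage and the function constructs the result
// directly inside it, so nothing has to be copied back out.
Foo* xyzzy_into(void* storage) {
    Foo* ret = new (storage) Foo();  // placement new: runs the constructor only
    ret->label += "!";               // "do stuff to *ret"
    return ret;
}

// Caller side: reserve stack space, construct into it, destroy it by hand.
std::string call_xyzzy() {
    alignas(Foo) unsigned char storage[sizeof(Foo)];  // on the stack, not the heap
    Foo* foo = xyzzy_into(storage);
    std::string result = foo->label;
    foo->~Foo();  // placement new has no matching delete, so destroy explicitly
    return result;
}
```

No heap allocation is made for the `Foo` object itself; the only heap traffic is whatever `std::string` decides to do for its own contents.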

------
zandorg
I wrote my own class which created the data (basically an array) once, passed
it around, and automatically deleted the data once its parent function
returned.

I did this by having a reference counter which goes up on a constructor, and
down on a destructor, and when it gets to zero and destructs, only then does
it delete the data.

This gave me about 10% extra speed in the program as a whole, over using the
C++ classes in the STL.
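
Roughly what such a class might look like, as a sketch only: `SharedArray` and `owners_inside_call` are hypothetical names, not the actual class, and this version is not thread-safe. The standard library has offered `std::shared_ptr` for the same job since C++11.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference-counted array handle.
class SharedArray {
public:
    explicit SharedArray(std::size_t n)
        : data_(new int[n]()), count_(new std::size_t(1)) {}

    SharedArray(const SharedArray& other)
        : data_(other.data_), count_(other.count_) {
        ++*count_;  // one more owner, no data copied
    }

    ~SharedArray() {
        if (--*count_ == 0) {  // the last owner frees the data
            delete[] data_;
            delete count_;
        }
    }

    SharedArray& operator=(const SharedArray&) = delete;  // kept minimal

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t owners() const { return *count_; }

private:
    int* data_;
    std::size_t* count_;
};

// Passing by value only bumps the counter; the array itself is shared.
std::size_t owners_inside_call(SharedArray a) {
    a[0] = 42;
    return a.owners();
}
```

Copies are cheap because they share one buffer, but that also means a write through any handle is visible through all of them, unlike copying a `std::vector`.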

~~~
czhiddy
<http://en.wikipedia.org/wiki/Reference_counting> ?

------
zokier
c++ is scary

~~~
jonsen
Yes. It induces respect. That is not necessarily a bad thing.

Perhaps C++ is for those who bother to learn a language before using it.

~~~
zokier
I think C++ is more for those people who can use a language without knowing it
and learn in the process. I'm getting a feeling that knowing _all_ of C++ is
something that cannot be achieved in one lifetime, so those who want to learn
it before using will just spend all their time studying without actually doing
anything (some may call them academics).

------
d0m
Well, "just" because the pass-by-value example is simpler and clearer, I would
use that (screw the speed factor). Only if there's a provable speed penalty on
large data would I consider trying other alternatives. Still, it's cool to
know that the "better way" is in fact the "faster way" at the same time.

------
bediger
Isn't his caveat at the end of the article possibly very important? Knowing
where copy constructors get called seems like a tricky, yet performance-
important thing, given the costliness of cache-flushes and lack of locality of
reference C++ objects might incur.

------
karatchov
Please, can anyone recommend a good tutorial for understanding C++ function
arguments? I'm starting to code in C++ and this article just makes me more
confused.

~~~
ori_b
Just read the ABI specs. A good start is the SysV ABI
(refspecs.freestandards.org/elf/IA64-SysV-psABI.pdf), although it doesn't
directly cover C++.

Then, the C++ ABI is defined here - specifically, the Itanium ABI draft, which
is in fact used on most GCC-supported systems, as far as I know. The "Itanium"
name is a misnomer. <http://www.codesourcery.com/public/cxx-abi/>

------
tbrownaw
This kinda relies on your objects being cheap to copy/move. Try doing your
pass-by-value (not return-by-move) with a non-COW container, and see how it
works.

~~~
okmjuhb
This is totally wrong; the examples rely on compilers doing copy elision
together with return value optimization so that the copy becomes unnecessary
(even if a copy is expensive, it's fine - it never needs to happen). They do
not use copy on write containers (indeed, they return modified versions of
input variables).

~~~
tbrownaw
...yeah, I suppose he does limit the examples to just nested function calls
(and assumes everything ends up in the same object file).

~~~
jey
No, I'm pretty sure RVO is a feature of the calling convention, so it works
across TUs (object files).

~~~
tbrownaw
..wha?

OK, re-reading this it looks like maybe he is talking strictly about return-
value optimization. For some odd reason I thought he was claiming
significantly more than that (such as not making copies of your pass-by-value
parameters).

