Hacker News
Ask HN: How come pass-by-reference is not the default in C++?
4 points by speedylight 9 months ago | 8 comments
I recently learned about the concept of pass-by-reference, where instead of copying data into a function’s parameter, you simply give the function the address in memory where the variable being passed is located.

In C++ you have to explicitly state that you want a parameter passed by reference rather than by value. Wouldn’t it be more efficient to pass by reference by default, since you’re not duplicating data in memory, potentially for no real reason?
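
A minimal C++ sketch of the two conventions being contrasted (the function names are hypothetical, chosen only for illustration): the by-value parameter is a private copy, so changes stay inside the function, while the reference parameter is another name for the caller's variable.

```cpp
// Hypothetical helpers contrasting the two parameter-passing conventions.

// Pass by value: the function receives its own copy of the argument,
// so incrementing it leaves the caller's variable untouched.
int increment_copy(int n) {
    ++n;
    return n;  // returns the incremented copy
}

// Pass by reference: the parameter aliases the caller's variable,
// so incrementing it modifies that variable directly.
void increment_in_place(int& n) {
    ++n;
}
```

With int a = 1, calling increment_copy(a) leaves a equal to 1, while increment_in_place(a) changes it to 2.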




There's no "pass-by-reference", never_has_been.jpg. With pass-by-reference what you're actually passing is a value, of a pointer. So you're "duplicating" a value: of the pointer.

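What's the point of passing a pointer to an integer instead of passing the integer directly? You're just adding indirection and pointer-walking (an extra memory access, plus possible aliasing that limits what the compiler can optimize), which for something as small as an int only makes your code worse.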
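
A small sketch of one concrete cost of that indirection (function names are hypothetical): when a scalar arrives by reference, the compiler generally has to assume it may alias something the function writes to, which can force a reload from memory on every iteration. Calling both versions with the same variable for both arguments shows that they really are different programs, so the compiler cannot treat them as equivalent.

```cpp
// Hypothetical example: the same loop with a by-value and a by-reference step.

// `step` is a private copy, so the compiler may keep it in a register
// for the whole loop.
void add_repeatedly_by_value(int& total, int step, int times) {
    for (int i = 0; i < times; ++i) total += step;
}

// `step` is a reference and may alias `total`, so after each write to
// `total` the compiler must assume `step` may have changed and reload it.
void add_repeatedly_by_reference(int& total, const int& step, int times) {
    for (int i = 0; i < times; ++i) total += step;
}
```

With distinct variables both functions compute the same result; passing the same variable twice (1 becomes 4 by value, but 1 becomes 2, 4, 8 by reference) is only there to show why the reference version cannot be optimized as freely.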

Passing by value also prevents the function from accidentally modifying data outside its own scope.

Passing by value can also save memory when passing things smaller than a pointer (like, say, a 4-byte int on x86-64, where pointers are 8 bytes).

Every C/C++ programmer should learn assembly. It makes C much easier to understand, and it would certainly have helped you reason about design choices like this one.


>There's no "pass-by-reference", never_has_been.jpg. With pass-by-reference what you're actually passing is a value, of a pointer. So you're "duplicating" a value: of the pointer.

That's true in C, but I think C++ has references.


dyingkneepad is talking about how the CPU actually works (x86-64 and the cache hierarchy). C++ references are typically compiled down to pointers anyway, and it doesn't really matter which constructs a particular programming language does or doesn't have: we can reason about performance in terms of the x86-64 instructions the CPU ends up executing.

Suppose we want to compute the sum of n scalars, represented as, say, n 32-bit floats, and we're weighing up whether to store them in a linked list or an array. The "introduction to algorithms" analysis might be that both approaches require O(n) storage and O(n) running time, so they're indistinguishable up to some constants that get absorbed into the big-oh notation.

The basic big-oh analysis is fine as a first cut for a simplified theoretical machine. But with our performance hats on, we care about the constants being small so our code actually runs fast on real hardware. For that we need a slightly more detailed picture of how a real machine works; in particular, we need a crude mental model of the memory hierarchy.

On real hardware, if we use a linked list to store our data, each node may end up in an unpredictable region of memory. The CPU loads memory into the L1 cache in fixed-size cache lines (64 bytes on current x86-64 chips), so it has to pull in each of these scattered nodes separately. Depending on the whims of the memory allocator, each cache line we read for a node may hold a single useful scalar surrounded by a next pointer and other junk: our precious, expensive L1 cache may end up 90% junk and only 10% or less useful data. On top of that, the address of the next node isn't known until the current node has arrived, so the loads can't easily be overlapped or prefetched. (This is the big downside of "pointer-walking", AKA "pointer-chasing".) If our calculation is bottlenecked by memory bandwidth, we're wasting up to 90% of our machine's memory bandwidth reading useless data, because we chose to store our data in an unpredictable, fragmented arrangement throughout memory.

In contrast, if we store our data in a contiguous array that we traverse linearly, each time we load a cache line of values into L1, all the neighboring values are ones we're about to need for our sum, and the hardware prefetcher can fetch upcoming lines before we ask for them. So we're filling our precious, tiny L1 cache with 100% useful data and 0% junk. Because each load brings in several useful values (sixteen 4-byte floats per 64-byte line), we need fewer loads from main memory, and the CPU has more useful compute-bound work (adding scalars) to stay busy with before it has to wait on memory again.
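
A minimal C++ sketch of the two layouts described above (function names are hypothetical; std::vector stands in for "an array" and std::list for "a linked list"). Both functions are O(n) and return the same sum; the difference described above only shows up when you time them on large n, and it depends on the allocator and the hardware, so no timings are claimed here.

```cpp
#include <list>
#include <vector>

// Array version: the floats sit next to each other in one contiguous block,
// so a linear scan uses every value in each cache line it loads.
float sum_array(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) total += v;
    return total;
}

// Linked-list version: each node is typically a separate allocation holding
// one float plus link pointers. Advancing the iterator follows a pointer
// stored in the current node, so each step depends on the previous load.
float sum_list(const std::list<float>& values) {
    float total = 0.0f;
    for (auto it = values.begin(); it != values.end(); ++it) total += *it;
    return total;
}
```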

see also:

CppCon 2014: Mike Acton "Data-Oriented Design and C++" https://www.youtube.com/watch?v=rX0ItVEVjHc

2022: Andrew Kelley - Practical Data-Oriented Design (DOD) https://vimeo.com/649009599


Thanks for the detailed reply and the two links.


A large number of parameters are scalars (ints, chars, floats, pointers, etc.). Automatically passing them by reference would be terrible for all the reasons above. For OOP, the most important parameter, this, is automatically passed by reference (essentially: in C++ it's a pointer).

(I've heard a tale of a language/compiler (Fortran?) where everything was passed by reference, so if you called function(5), the function might change the 5 stored in the literal pool to some other number...)

Many languages that pass by reference automatically still pass scalars by value, so sometimes a parameter is a reference and sometimes it isn't. With C++ and typedefs, it's not always immediately obvious whether a type is a class, a struct, an enum, or a scalar.

As it is, for static/inline functions, where the compiler can see every call site, C++ compilers can analyze the code and convert pass by value to pass by reference if it's appropriate.

Edit --

In most modern ABIs, small structs/classes are passed in registers or on the stack, so there's little overhead. Large structs/classes are typically copied by the caller and passed on the stack or through a hidden pointer, depending on the ABI. With inline constructors and functions (which is the case for most everything in the STL), a reasonably good compiler could eliminate the call to the copy constructor and use the original object directly for read-only/const access.


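It's all about backwards compatibility. C++ was originally a modification of C, which does not have the concept of references. In fact, while work on C++ (then "C with Classes") started in 1979, references only arrived a few years later and were part of the first commercial release in 1985. By then pass-by-value was established behavior, so there's no way references could have been the default anyway.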

There's no way that it will be changed, either, as changing language behavior would break tons of production code.


Pass by value is safer because you're less likely to accidentally modify data outside the scope of your function.
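
C++ also offers a middle ground touched on earlier in the thread (read-only/const access): a reference to const avoids the copy while still preventing the function from modifying the caller's data. A minimal sketch, with a hypothetical function name:

```cpp
#include <cstddef>
#include <string>

// Reads the caller's string through a reference to const: no copy is made,
// and the compiler rejects any attempt to modify it inside the function.
std::size_t count_spaces(const std::string& text) {
    std::size_t spaces = 0;
    for (char c : text) {
        if (c == ' ') ++spaces;
    }
    // text += '!';  // would not compile: text is a reference to const
    return spaces;
}
```

This guards against accidental modification only; a deliberate const_cast can still get around it.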


Sometimes pass by value is what you need: for example, passing a std::shared_ptr by value so the callee shares ownership, which increments the use count.
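
A short sketch of that difference (function names are hypothetical). Copying a std::shared_ptr into a by-value parameter makes the callee a co-owner for the duration of the call, which matters if, for example, the function stores the pointer somewhere that may outlive the caller's copy; a const reference only observes it.

```cpp
#include <memory>

// By value: the parameter is a new shared_ptr that shares ownership,
// so the use count is incremented while the call is running.
long use_count_by_value(std::shared_ptr<int> p) {
    return p.use_count();
}

// By reference to const: no new owner is created, so the count is unchanged.
long use_count_by_reference(const std::shared_ptr<int>& p) {
    return p.use_count();
}
```

With a single owner, the by-value call observes a count of 2, the by-reference call observes 1, and the count is back to 1 once the by-value parameter is destroyed at the end of the call.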

Also, I believe passing types smaller than or equal to pointer size by reference is pointless: the reference itself is pointer-sized, so you save nothing and still pay for the indirection.
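
That rule of thumb can be written down as a type alias. The sketch below uses a hypothetical name, in_param, and adds the (assumed) extra condition that the type be trivially copyable, so that passing by value never runs a user-defined copy constructor. It is a heuristic, not a guarantee of faster code.

```cpp
#include <string>
#include <type_traits>

// Hypothetical alias: pass T by value if it is trivially copyable and no
// larger than a pointer, otherwise pass it by reference to const.
template <typename T>
using in_param = std::conditional_t<
    (std::is_trivially_copyable_v<T> && sizeof(T) <= sizeof(void*)),
    T,
    const T&>;

// int is trivially copyable and no larger than a pointer on common platforms;
// std::string is not trivially copyable, so it always goes by const&.
static_assert(std::is_same_v<in_param<int>, int>);
static_assert(std::is_same_v<in_param<std::string>, const std::string&>);

// Usage in a non-template function (the alias blocks template deduction).
inline std::size_t label_length(in_param<std::string> label, in_param<int> extra) {
    return label.size() + static_cast<std::size_t>(extra);
}
```

Because T cannot be deduced through the alias, it suits non-template functions or explicit template arguments; Boost.CallTraits provides a more complete version of the same idea.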



