If you're declaring an object on the stack, then there is no reason to be using ...

umanwizard · 2024-05-06T12:27:56

> If you're declaring an object on the stack, then there is no reason to be using a pointer to refer to it.

Why not? What if you have some function f(T *) that you want to call?

But anyway, we're not _just_ talking about stack allocations, but also extra levels of indirection on the heap. For example, vectors store their elements in a heap-allocated buffer directly. If they kept them all in shared pointers, there would be an extra level of indirection. This means e.g. vector::operator[] has to return a reference (which is basically the same thing as a pointer under the hood); it can't return shared_ptr or similar (because storing all its elements as shared pointers would make it way slower due to the extra allocations).

In Rust, vector access is safe (due to the borrow checker), but in C++, it's not.

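    std::vector<int> v {1, 2, 3};
    int& x = v[0];
    v.push_back(4);     // may reallocate the buffer, leaving x dangling
    printf("%d\n", x);  // reads through the invalidated reference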

This code is UB in C++. In Rust, it's impossible to write something like this.

    fn main() {
        let mut v = vec![1, 2, 3];
        let x = &v[0];
        v.push(4);
        println!("{x}");
    }

This code fails to compile.
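
For comparison, here is a rough sketch of two ways to restructure it so that it does compile. The function names are made up for illustration. Either copy the element out so that no borrow is held across the push, or take the reference only after the push.

```rust
// Hypothetical helper names, for illustration only.

// Copy the element out, so no borrow of `v` is alive across the push.
fn copy_then_push() -> i32 {
    let mut v = vec![1, 2, 3];
    let x = v[0]; // i32 is Copy: `x` is an independent value, not a reference
    v.push(4);
    x
}

// Take the reference only after the mutation is done.
fn push_then_borrow() -> i32 {
    let mut v = vec![1, 2, 3];
    v.push(4);
    let x = &v[0]; // the shared borrow starts after the push
    *x
}

fn main() {
    println!("{}", copy_then_push());
    println!("{}", push_then_borrow());
}
```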

HarHarVeryFunny · 2024-05-06T13:35:41

> Why not? What if you have some function f(T *) that you want to call?

In C++ (vs C), if the intent is to pass something large efficiently, then you'd use a reference parameter, not a pointer.

You seem to be confused about the meaning of C++ smart pointers - the whole point of them (as a replacement for C's raw pointers) is that they control and indicate ownership. You can't just assign a smart pointer to something you don't own (like an element of a vector). You can copy a shared_ptr to create an additional reference, or move a unique_ptr to move ownership.

A C++ compiler might generate a warning for that invalidated reference. clang++ is generally much better than g++ at this kind of diagnostic, but I agree it'd be nice if a conforming compiler was forced to at least flag it, if not reject it.

The problem with doing this in the general case, where it's a user-defined (or library defined, as here) data structure, rather than one defined by the language, is that the compiler needs to inspect the implementation of that "push" method and realize that it might do something to invalidate references (& iterators). In the case of a library the compiler won't have access to the implementation to figure that out. How would Rust handle this if "vec" were a user-defined type where only the definition (not implementation) was available - how would it know that the push() was unsafe?

umanwizard · 2024-05-06T13:58:57

> In C++ (vs C), if the intent is to pass something large efficiently, then you'd use a reference parameter, not a pointer.

Sure, sorry, I was using "pointer" and "reference" interchangeably. Indeed, references are pointers under the hood.

> You seem to be confused about the meaning of C++ smart pointers

I am not confused at all. I understand exactly what unique_ptr and shared_ptr are in C++. They are basically the equivalent of Rust's Box and Arc (except that they can be null), but I used C++ before Rust so I learned about unique_ptr and shared_ptr first.
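
To make the correspondence concrete, here is a small sketch (function names invented for the example) of the two operations you mention: cloning an Arc is like copying a shared_ptr, and moving a Box is like moving a unique_ptr, except that the moved-from variable can no longer be used at all.

```rust
use std::sync::Arc;

// Hypothetical function names, for illustration only.

fn arc_count_after_clone() -> usize {
    let a = Arc::new(vec![1, 2, 3]);
    let b = Arc::clone(&a); // like copying a shared_ptr: bumps the reference count
    let n = Arc::strong_count(&a); // two owners at this point
    drop(b); // releasing one owner does not free the vector; `a` still owns it
    n
}

fn move_box() -> i32 {
    let a = Box::new(42);
    let b = a; // like moving a unique_ptr: ownership transfers to `b`
    // println!("{a}"); // would fail to compile with E0382 (use of moved value)
    *b
}

fn main() {
    println!("{}", arc_count_after_clone());
    println!("{}", move_box());
}
```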

You are the one who asked what the advantage of Rust's borrow-checker is over C++-style memory management with smart pointers, but you seem to understand that it doesn't make sense to use smart pointers everywhere. Aren't you answering your own question? The advantage of Rust over C++ is that the borrow checker helps you in the cases where it doesn't make sense to use smart pointers / heap allocations.

You are the one who is maybe confused about what the borrow checker even is/does.

> A C++ compiler might generate a warning for that invalidated reference.

Neither clang nor g++ does so, even with -Wall. I just checked. How could they?

> I agree it'd be nice if a conforming compiler was forced to at least flag it, if not reject it.

If you did this then you would have basically reinvented the borrow checker.

> The problem with doing this in the general case, where it's a user-defined (or library defined, as here) data structure, rather than one defined by the language, is that the compiler needs to inspect the implementation of that "push" method and realize that it might do something to invalidate references (& iterators).

Not in Rust. It only needs to inspect the declaration. That is the whole point of the borrow checker. The fact that you think this can only be done for built-in types is what made me suspect that you don't understand what the borrow checker is.

The declaration of the indexing operator for Vec<T> is roughly (getting rid of some irrelevant details):

    fn index(&self, i: usize) -> &T

This is shorthand for

    fn index<'a>(&'a self, i: usize) -> &'a T

Those references (the `&self` and the returned `&T`) have the same lifetime. That lifetime cannot overlap with any lifetime of a _mutable_ reference to the same data. `push` can be declared like so:

    fn push(&mut self, value: T)

Because this requires a mutable reference to `self`, the compiler statically checks that it does not overlap with any other reference to the same data, which includes the reference returned by the indexing operation, which is why the example I gave won't compile. This works the same way with user-defined types; Vec is not special in any way.
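
As a sketch of the "Vec is not special" point, here is a made-up user-defined type (`Stack` and its methods are hypothetical, not from any library) with the same two signatures. The compiler treats it exactly the same way, looking only at the declarations of `get` and `push`.

```rust
// `Stack` is an invented user-defined type; the compiler has no built-in knowledge of it.
struct Stack<T> {
    items: Vec<T>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { items: Vec::new() }
    }

    // Elided form of `fn get<'a>(&'a self, i: usize) -> &'a T`.
    fn get(&self, i: usize) -> &T {
        &self.items[i]
    }

    fn push(&mut self, value: T) {
        self.items.push(value);
    }
}

fn first_after_push() -> i32 {
    let mut s = Stack::new();
    s.push(1);
    let x = s.get(0);
    // s.push(2); // uncommenting this fails with E0502: `s` is still borrowed through `x`
    let first = *x; // last use of `x`, so the shared borrow ends here
    s.push(2); // accepted: no other reference to `s` is alive
    first
}

fn main() {
    println!("{}", first_after_push());
}
```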

The reason you can't do a similar thing in C++ is because it has no syntax for lifetimes. If you had a function on vector like

    const T& index(size_t i)

you have no idea from the declaration alone whether the returned reference points into `*this` or somewhere else, so you don't know what its lifetime should be.

HarHarVeryFunny · 2024-05-06T15:08:56

Interesting - so essentially calling a "non-const" (mutable) method invalidates any existing references to the object, with this being implemented at compile time by not allowing the mutable method to be called while other references are still alive?

How exactly is this defined for something like index() which is returning a reference to a different type than the object itself, and where the declaration doesn't indicate that the referred to T is actually part of the parent object? Does the language just define that all references (of any type) returned by member functions are "invalidated" (i.e. caught by compiler borrow checker) by the mutable member call?

What happens in Rust if you attempt to use a reference to an object after the object lifetime has ended? Will that get caught at compile time too, and if so at what point (when attempt is made to use the reference, or at end of object lifetime)?

umanwizard · 2024-05-06T15:22:26

> Interesting - so essentially calling a "non-const" (mutable) method invalidates any existing references to the object, with this being implemented at compile time by not allowing the mutable method to be called while other references are still alive?

Yes, exactly.

> How exactly is this defined for something like index() which is returning a reference to a different type than the object itself, and where the declaration doesn't indicate that the referred to T is actually part of the parent object?

A returned reference is tied to the object only if it has the same lifetime as the borrow of `self` (the 'a in my example). For example, imagine a function that gets an element of one vector and uses that to index into another vector. You might write it like this:

    fn indirect_index<'a, 'b, T>(v1: &'a Vec<usize>, v2: &'b Vec<T>, i: usize) -> &'b T {
        let j = v1[i];
        &v2[j]
    }

The returned reference is not invalidated by later mutations of the first vector, only by mutations of the second, since it shares the lifetime parameter 'b with `v2`.
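
A quick sketch of a caller, to show what that means in practice. The `lookup_then_mutate` function and its data are invented for the example; `indirect_index` is repeated so the snippet stands alone.

```rust
fn indirect_index<'a, 'b, T>(v1: &'a Vec<usize>, v2: &'b Vec<T>, i: usize) -> &'b T {
    let j = v1[i];
    &v2[j]
}

// Hypothetical caller, for illustration only.
fn lookup_then_mutate() -> char {
    let mut indices = vec![2, 0, 1];
    let mut letters = vec!['a', 'b', 'c'];
    let r = indirect_index(&indices, &letters, 0);
    indices.push(1); // accepted: `r` carries 'b, so it borrows only from `letters`
    let c = *r; // last use of `r`
    letters.push('d'); // accepted only because `r` is not used after this point
    c
}

fn main() {
    println!("{}", lookup_then_mutate());
}
```

Moving `letters.push('d')` above the `let c = *r;` line should be rejected, because the mutable borrow would then overlap with `r`.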

> What happens in Rust if you attempt to use a reference to an object after the object lifetime has ended?

This is prevented at compile time by the borrow checker. E.g.:

    // this takes ownership of the vec,
    // and just lets it go out of scope 
    fn drop_vec<T>(_v: Vec<T>) {
    }
    
    fn main() {
        let v = vec![1, 2, 3];
        let x = &v[0];
        drop_vec(v);
        println!("{x}");
    }

This program fails to compile with the following error:

    error[E0505]: cannot move out of `v` because it is borrowed
      --> src/main.rs:9:14
       |
    7  |     let v = vec![1, 2, 3];
       |         - binding `v` declared here
    8  |     let x = &v[0];
       |              - borrow of `v` occurs here
    9  |     drop_vec(v);
       |              ^ move out of `v` occurs here
    10 |     println!("{x}");
       |               --- borrow later used here
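
By contrast, here is a sketch of a variant that is accepted, assuming the callee only needs to look at the vector. `len_of` and `borrow_twice` are made-up names. Because the function takes a shared reference instead of ownership, nothing is moved, and the second shared borrow can coexist with `x`.

```rust
// Hypothetical variant of `drop_vec` that borrows the data instead of taking ownership.
fn len_of<T>(v: &[T]) -> usize {
    v.len()
}

fn borrow_twice() -> (i32, usize) {
    let v = vec![1, 2, 3];
    let x = &v[0];
    let n = len_of(&v); // a second shared borrow; `v` is not moved
    (*x, n) // `x` is still valid here
}

fn main() {
    let (first, len) = borrow_twice();
    println!("{first} {len}");
}
```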

HarHarVeryFunny · 2024-05-06T19:23:37

Thanks!

HarHarVeryFunny · 2024-05-06T15:31:53

> Neither clang nor g++ does so, even with -Wall. I just checked. How could they?

Just by having built-in knowledge of standard library types such as std::vector, the same way the compiler has built-in knowledge of some library functions such as C's printf().

I wouldn't expect such policing to be perfect, but the compiler could at least catch simple cases where reference/iterator use follows an invalidating operation in the same function.

Don't get me wrong - I'm not defending C++. It's a beast of a language, and takes a lot of experience and self-discipline to use without creating bugs that are hard to find.

umanwizard · 2024-05-06T16:09:00

> I'm not defending C++.

Right, but you were asking what advantage Rust has over C++, which is what I'm trying to explain. (If you had instead asked what advantage C++ has over Rust, I'd have given a very different answer!)

> It's a beast of a language, and takes a lot of experience and self-discipline to use without creating bugs that are hard to find.

Rust makes creating a certain class of these hard-to-find bugs much harder.