
Rust vector guarantees - chewbacha
https://doc.rust-lang.org/std/vec/struct.Vec.html#guarantees
======
saagarjha
Just curious: what are differences between the guarantees provided by Rust's
vectors and std::vector from C++?

~~~
Manishearth
It's hard to know what you're really asking for, and how much you know about
Rust.

Rust itself has a bunch of guarantees. Rust's vector is guaranteed to work
well within that framework. So,

\- No dangling pointers / UAF from stuff it gives you

\- No iterator invalidation from iterators you get from it

\- You can't push to it and invalidate references to its insides (this is
really the iterator invalidation one over again)

\- You can't use it in a way that will not be data race safe

These are all Rust guarantees; Vec just adheres to them. You can say that
these guarantees are Vec guarantees too (which is why I'm saying "I'm not sure
what you're asking for").

C++ std::vector does not have these guarantees because C++ does not have these
guarantees. You can get some of these guarantees by restricting yourself to
coding a certain way, but in my experience they kinda fall apart when stuff
gets complex.

Aside from that it's basically the same guarantees as std::vector, i.e. "will
store stuff in contiguous space", "will allocate buffer with extra capacity as
it grows" (this isn't actually a guarantee, but it's not going to change for
either).

One thing Rust's vector guarantees that C++ doesn't is that Rust's vector will
not store elements inline; anything (with nonzero size) you push into it will
be heap allocated.

Rust's vector also never implicitly performs deep copies, but this isn't
really a guarantee.

Rust also has a concept of zero sized types, which C++ does not, and Rust's
vector deals with them in a certain way, however since C++ doesn't have these
types you can't really compare.

It's possible C++ guarantees a drop order where Rust's vector does not.
I'm not sure.
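
A few of the layout points above can be checked directly. This is only a minimal sketch, assuming a current stable toolchain and the default allocator; the function names are made up for illustration and are not part of any API.

```rust
use std::mem::size_of;

/// Returns true if `Vec<T>` occupies exactly three machine words
/// (pointer, capacity, length), as the linked docs describe.
fn vec_is_three_words<T>() -> bool {
    size_of::<Vec<T>>() == 3 * size_of::<usize>()
}

/// Checks that every element sits at `base + index`, i.e. that the
/// elements live next to each other in one contiguous buffer.
fn is_contiguous(v: &[u32]) -> bool {
    let base = v.as_ptr();
    v.iter()
        .enumerate()
        .all(|(i, x)| std::ptr::eq(x as *const u32, base.wrapping_add(i)))
}

/// Zero-sized elements need no storage, so the vector never allocates
/// for them and reports the largest possible capacity.
fn zst_capacity() -> usize {
    Vec::<()>::new().capacity()
}
```

An empty `Vec<u32>` reports a capacity of 0 because nothing has been allocated yet, while `zst_capacity()` reports `usize::MAX`.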

~~~
saagarjha
> how much you know about Rust

Very little, so please bear with me ;)

For example, from the linked page:

> Most fundamentally, Vec is and always will be a (pointer, capacity, length)
> triplet. No more, no less.

Does C++ also provide this guarantee?

> Vec will never perform a "small optimization" where elements are actually
> stored on the stack for two reasons:

Does C++ do this?

and from your response:

> No iterator invalidation from iterators you get from it

How are you guaranteeing this? C++ doesn't, so what is going on behind the
scenes to make this work in Rust?

As you can see, I don't know as much about C++ as I would like either…

~~~
Manishearth
> Does C++ also provide this guarantee?

I don't think it does. It might.

> Does C++ do this?

IIRC C++ is allowed to perform the optimization. I don't recall if any current
implementations do. They might.

Edit: I'm wrong, it's not; however, std::string is allowed to.

> How are you guaranteeing this? C++ doesn't, so what is going on behind the
> scenes to make this work in Rust?

Because Rust has an entire system of ownership and borrowing which exists at
compile time to ensure these things don't happen.

~~~
saagarjha
Well, I must be misunderstanding how iterators work in Rust, then. AFAIK in
C++ an iterator is a thin wrapper around a pointer in memory, and resizing the
container can cause the underlying "base" array to move around to somewhere
else if it's reallocated–thus, the invalidation of the old iterator. What is a
Rust iterator doing in this case?

~~~
Manishearth
The same thing. At compile time rust won't allow you to push to a vector if
you have outstanding iterators.
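
To make that concrete, here is a small sketch (the function name is hypothetical). The rejected loop is left in a comment so the rest compiles; the accepted version finishes reading before it starts writing.

```rust
/// Appends the double of every existing element to the vector.
fn append_doubled(v: &mut Vec<i32>) {
    // Rejected by the compiler (error E0502), because `v.iter()` keeps a
    // shared borrow of the vector alive for the whole loop:
    //
    //     for x in v.iter() {
    //         v.push(*x * 2);
    //     }

    // Accepted: collecting first ends the shared borrow, and only then
    // is the vector mutated.
    let doubled: Vec<i32> = v.iter().map(|x| x * 2).collect();
    v.extend(doubled);
}
```

Calling it on `vec![1, 2, 3]` leaves the vector holding the original three elements followed by their doubles.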

~~~
kretin45
So, sorry for not being bright, but if I may ask: say I have a Vec<B> A, so I
hold a mutable reference to it in the form of A. Now I take a const iterator
to an object in A, call it "it". Then, although the iterator is const for the
object of type B, it is a mutable reference for the Vec<B> A, and hence I
cannot reallocate A until all the iterators have died (most probably by going
out of scope). Is that true?

~~~
Manishearth
I don't really understand the scenario you're putting out. Why would the
iterator be a mutable reference? (there are mutable iterators but you
explicitly said const).

Basically, in Rust, this happens:

    
    
        let mut vec = vec![1, 2, 3, 4];
        let iterator = vec.iter();
        vec.push(5); // will not compile while iterator is still used below
        for x in iterator { println!("{}", x); }

~~~
kretin45
Yes. So I'm asking the logic behind this in context of ownership model. How
can you derive the above behavior from the principle that only one mutable
reference is allowed? If const iterator is not a mutable reference to a Vec,
then A itself is the only mutable reference, and push back should be allowed.

~~~
viraptor
Because vec.push has the signature push(&mut self, value):
[https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push)

That means you can think of it as push(&mut vec, 5): you need to pass a
mutable reference to the vec as the first argument, but the iterator is
already holding a shared reference to it. A mutable reference can't coexist
with any other reference to the same value, so you're not allowed to.
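
A tiny sketch of that equivalence (the function name is made up): both lines below call the same method, the second just spells out the receiver.

```rust
/// Pushes twice, once with method-call syntax and once with the
/// receiver written explicitly as the first argument.
fn push_both_ways() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    v.push(4); // method-call syntax; `&mut v` is taken implicitly
    Vec::push(&mut v, 5); // the same call with the borrow spelled out
    v
}
```

Written the second way, it is easier to see that every push needs exclusive access to the whole vector for the duration of the call.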

------
OtterCoder
The one part of the spec that gives me pause is the fact that dropped memory
isn't cleared. That sort of behavior has been the cause of so many network
exploits. Is it worth worrying about here?

~~~
cwzwarich
If your language (and program in Rust's case, since Rust has unsafe code) is
memory safe, then the contents of deallocated memory shouldn't be internally
observable. Technically speaking, you could add an API to Rust that returns an
Option of a buffer by virtual address, and this would be completely safe.

If you're truly concerned about the implications of preserving the contents of
existing memory, you need to do it at a lower level in the allocator anyway.
In the case that pushing elements onto a Vec causes the Vec's buffer to be
reallocated, the allocator is free to either extend the existing allocation of
the buffer or make a new allocation. If the allocator chooses the latter, then
clearing the memory upon drop doesn't clear the old copy.

~~~
mehrdadn
> then the contents of deallocated memory shouldn't be internally observable.

Only as long as you can assume control flow integrity. As soon as the control-
flow gets diverted then an attacker can read the memory. That's the root of
the security concern to begin with.

~~~
cwzwarich
I was assuming memory safety, and that violation of control-flow integrity
would also be a violation of memory safety.

~~~
mehrdadn
You _cannot_ assume memory safety; the entire _point_ of this security concern
is that you cannot assume memory safety. There will in general be unsafe Rust
code _and_ non-Rust code that you cannot verify in external libraries, etc.

~~~
Manishearth
In which case you can always implement a hardened vec that clears itself, like
the parent commenter said.

That comment had two parts: it first said that if your code is memory safe
(even explicitly calling out unsafe code!) then it's not a problem, and _then_
said you can do it at a lower level if you have to. You can do it at a lower
level either by changing the allocator, or by having a hardened vec that does
a volatile write in its destructor.

Look again at the text right before your quote "then the contents of
deallocated memory shouldn't be internally observable."; it explicitly
addresses this exact concern.
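
For illustration, a "hardened vec that does a volatile write in its destructor" could look roughly like the sketch below. `ScrubbedBytes` and its methods are hypothetical names, not an existing type. It assumes a fixed capacity chosen up front, so that the buffer never reallocates and leaves a stale copy behind, which is the reallocation caveat raised earlier in the thread.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical byte buffer that overwrites its contents before freeing.
struct ScrubbedBytes {
    inner: Vec<u8>,
}

impl ScrubbedBytes {
    /// Reserves all storage up front so the buffer never has to move.
    fn with_capacity(cap: usize) -> Self {
        ScrubbedBytes { inner: Vec::with_capacity(cap) }
    }

    /// Appends a byte, handing it back instead of growing the buffer.
    fn push(&mut self, byte: u8) -> Result<(), u8> {
        if self.inner.len() == self.inner.capacity() {
            return Err(byte);
        }
        self.inner.push(byte);
        Ok(())
    }

    fn as_slice(&self) -> &[u8] {
        &self.inner
    }

    /// Zeroes every stored byte with volatile writes, which the optimizer
    /// may not discard as dead stores.
    fn scrub(&mut self) {
        for byte in self.inner.iter_mut() {
            // SAFETY: `byte` is a valid, aligned, exclusive reference.
            unsafe { ptr::write_volatile(byte, 0) };
        }
        // Keeps the compiler from reordering later code before the writes.
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for ScrubbedBytes {
    fn drop(&mut self) {
        self.scrub();
    }
}
```

This only covers the buffer itself; copies of the data made elsewhere (in registers, on the stack, or by the caller) are outside what a wrapper like this can clear.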

~~~
mehrdadn
What I am trying to say is that that "if the program is memory-safe" condition
is never satisfied in reality. You _will_ be calling unsafe code no matter
what you do: not only will any non-trivial program itself have unsafe code,
as already called out, but even a trivial "hello, world" needs to invoke a
syscall, which will not be written in Rust (until the time comes when we get
a Rust kernel...) and hence is not proven memory-safe. Does this make sense?
Yes, _if_ your program was secure then you wouldn't need to worry about
security, but that's not the world we live in.

~~~
Manishearth
Sure. At which point, again, there is still a solution, just that Vec itself
won't help here.

However, it's worth noting that at this level of "everything is unsafe" the
problem is intractable. Just as you trust Rust's stdlib to have safely-crafted
syscalls, you trust Rust's compiler to generate the right code if you tell it
to do a volatile clear on a drop/resize.

You have to define your trust level somewhere. For some, it is "I have audited
my small amount of unsafe code (if any), and I trust Rust's stdlib". The
parent's comment was addressing that level of trust. If you're not trusting
the stdlib, you also probably don't trust the compiler to generate the right
code, at which point the most you can do is try to keep things safe; you can
never succeed short of auditing the generated asm extensively.

------
mehrdadn
This seems wrong to me:

> Vec will never perform a "small optimization" where elements are actually
> stored on the stack [because] it would penalize the general case, incurring
> an additional branch on every access.

I don't follow this. I can see appending to a vector requiring an extra
branch, but why would _every access_ (like, say, vec[i] = 0) need an extra
branch?

~~~
Manishearth
This would be true in C++.

However, Rust has no move constructor; so the small vector optimization can't
be implemented as "make the pointer point to itself and fix it up every time
you move things", it must be implemented by having some check and then
changing how you refer to elements.

Note that in C++ the move constructor has a similar cost whenever it gets
moved around (and the cost can become problematic in cases like nested vectors
where resizing is no longer a memcpy).

(Moves are also implicit and more common in Rust, so the impact would be
greater even if Rust did have move constructors. That isn't really a workable
hypothetical, though, since Rust's model is not designed in a way that would
make them consistent anyway.)
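
To show where the extra branch comes from, here is a toy small-vector, assuming four inline slots of `u32`. The names are invented for this sketch and it is not the smallvec crate. Because a Rust value can be moved by plain bitwise copy at any time, it cannot keep a pointer into its own inline storage, so every access has to ask which representation is active.

```rust
const INLINE_CAP: usize = 4;

/// Toy small-vector: up to four elements inline, otherwise on the heap.
enum SmallVec {
    Inline { len: usize, buf: [u32; INLINE_CAP] },
    Heap(Vec<u32>),
}

impl SmallVec {
    fn new() -> Self {
        SmallVec::Inline { len: 0, buf: [0; INLINE_CAP] }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallVec::Inline { .. })
    }

    fn push(&mut self, value: u32) {
        match self {
            SmallVec::Inline { len, buf } => {
                if *len < INLINE_CAP {
                    buf[*len] = value;
                    *len += 1;
                } else {
                    // Inline storage is full: spill everything to the heap.
                    let mut heap = buf[..*len].to_vec();
                    heap.push(value);
                    *self = SmallVec::Heap(heap);
                }
            }
            SmallVec::Heap(v) => v.push(value),
        }
    }

    /// Every read goes through this `match`: that is the extra branch.
    fn get(&self, index: usize) -> Option<u32> {
        match self {
            SmallVec::Inline { len, buf } => buf[..*len].get(index).copied(),
            SmallVec::Heap(v) => v.get(index).copied(),
        }
    }
}
```

The first four pushes stay inline; the fifth moves the contents into a heap `Vec`, after which `get` takes the other arm.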

~~~
mehrdadn
Wow, this seems like a very painful restriction coming from the C++ side.
Thanks for the explanation.

~~~
Manishearth
It's actually not that painful.

C++ has a bit of a different safety model that somewhat relies on move
constructors to work. Rust builds this in, with some crucial changes.

For one, all C++ values are forced to have a zero state; e.g. when you move
out of a unique_ptr (in certain ways, not all of them) the old one must be
zeroed, because the destructor will still be run on it. Use after move leads
to unspecified, but not necessarily undefined, behavior. Rust OTOH does not
have null as a valid value for its pointers. In the case of conditional moves,
Rust uses an extra bit on the stack called a "drop flag" to track whether or
not the destructor needs to be run. It does so only when needed, and usually
it isn't, so this complexity is abstracted away and doesn't impact the runtime
in the majority of cases.
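
A conditional move of that kind can be sketched as follows. `Tracked`, `consume`, and `maybe_move` are names invented for this example; the shared counter exists only so the destructor calls can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Bumps a shared counter when dropped, so drops can be counted.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership; the value is dropped when this function returns.
fn consume(_value: Tracked) {}

/// Moves `value` away only when `flag` is true. On either path the
/// destructor runs exactly once.
fn maybe_move(flag: bool, drops: &Rc<Cell<u32>>) {
    let value = Tracked { drops: Rc::clone(drops) };
    if flag {
        consume(value); // moved: a bitwise copy, no user code runs
    }
    // `value` is dropped here only on the path where it was not moved.
    // Mentioning `value` after the `if` would be a compile error (E0382),
    // so there is no moved-from state for the program to observe.
}
```

Calling `maybe_move` once with `true` and once with `false` increments the counter once per call.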

99% of the move constructors in C++ are focused on making this work correctly.
It's not necessary in Rust.

There are the 1% that do stuff like the small string optimization. Yes, you
can't do this as neatly in Rust. But it's worth noting that move constructors
have a significant cost, both in runtime (things like vectors need to deal
with this correctly -- in particular vector<smart_object<_>> can't memcpy on
resize anymore), and in compile time -- if you want to build good generic
abstractions you have to guard against the fact that moves (and copies) can do
_anything_. This would completely destroy Rust's ability to easily
compartmentalize safety because now you have to worry about arbitrary code
running every other line of your generic code. In C++ you don't care about
this as much, since compartmentalizing safety is more of a best-effort thing
and blame is harder to assign, whereas in Rust your generic data structure must
be safe when you throw any (safely implemented) type at it.

So yeah, sounds painful, but not really, and from a Rust POV what C++ has in
this space is _really_ painful.

~~~
mehrdadn
> 99% of the move constructors in C++ are focused on making this work
> correctly. It's not necessary in Rust. There are the 1% that do stuff like
> the small string optimization. Yes, you can't do this as neatly in Rust.

Hm, so let's say I had this binary tree in C++:

    
    
      template<class T>
      class Node {
        Node(Node const &) = delete;  // don't worry about copying for now
        Node &operator =(Node const &) = delete;
    
        unique_ptr<Node> a, b;
        Node *parent;
        T value;
      public:
        Node(Node &&other) : a(move(other.a)), b(move(other.b)), parent(move(other.parent)), value(move(other.value)) {
          if (parent) {
             if (parent->a == &other) { parent->a = this; }
             if (parent->b == &other) { parent->b = this; }
          }
          other.parent = NULL;
        }
      };
    

You're saying I... can't have my binary tree in Rust?

~~~
Manishearth
[https://doc.rust-lang.org/std/collections/struct.BTreeMap.html](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html)

You can't implement it _this_ way, but you can still implement a binary search
tree (BTreeMap is a B-tree, which is a generalization of a binary search tree).

Normally in Rust for a tree like structure with parent pointers you'd either
use unsafe code or use Rc/Weak, depending on what you need.

As far as binary search trees are concerned, IIRC you can implement them
without parent pointers; you'd move your relocation method to something called
on the parent node.
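
The Rc/Weak variant might look like this minimal sketch. The type and function names are illustrative only; the shape follows the common pattern of owning children through `Rc` and pointing back at the parent through `Weak`, so that the two directions don't keep each other alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Tree node that owns its children and only weakly refers to its parent.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn new_node(value: i32) -> Rc<Node> {
    Rc::new(Node {
        value,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

/// Links `child` under `parent` and records the back pointer.
fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

/// Follows the back pointer; returns None for a root or if the parent
/// has already been dropped.
fn parent_value(node: &Node) -> Option<i32> {
    node.parent.borrow().upgrade().map(|p| p.value)
}
```

The cost compared with the C++ version is reference counting plus a runtime borrow check in `RefCell`; in exchange, a dangling parent pointer shows up as `None` instead of undefined behavior.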

~~~
mehrdadn
Oh, the parent pointer (or even the BST nature) wasn't really the point. The
parent pointer was just an example that was easy to understand. For BSTs
specifically, in C++, IIRC they use threading (= sibling pointers) to allow
iterators O(1)-time access to the siblings in constant space, and those would
have the same problems. And I didn't even mean to limit this to the BST case
(I edited my comment while you were probably typing yours so you probably saw
the version where I mentioned BSTs). In other non-BST binary trees you may
well _need_ a parent pointer for good time/space complexity, not merely find
it convenient.

Note that this isn't just limited to trees. Even doubly-linked lists would
have this issue, as would other more-complex data structures.

I guess my larger point is, it seems like the repercussions aren't just
limited to missing low-level optimizations like small-string optimization like
you claim, but they extend to your entire global object model as well as to
what operations & time complexities you can support without the compiler
yelling at you. That seems quite painful to me. Who wants to constantly argue
with the compiler as to whether or not he should be allowed to have
parent/sibling/other pointers?

~~~
viraptor
> Who wants to constantly argue with the compiler as to whether or not he
> should be allowed to have parent/sibling/other pointers?

People who value "I'm sure this can't explode", over "I'm pretty sure it won't
explode, but at least it's a fast template that can stack-allocate" ;-)

But seriously - you can still do a lot of that if you really need to for some
reason. Either use an external library in C++ or write some unsafe code which
won't restrict your pointers.

~~~
mehrdadn
> But seriously - you can still do a lot of that if you really need to for
> some reason. Either use an external library in C++ or write some unsafe code
> which won't restrict your pointers.

How would unsafe code help here? The problem we were talking about was the
lack of move constructors, and I just gave an example of their usefulness with
parent pointers. How would unsafe code help? Unsafe code doesn't suddenly give
you move constructors does it?

(As for writing external C++ code and linking to it... I mean yeah, but if the
entire object model you have to interact with has to be in C++, then you might
as well write the whole code in C++ at that point...)

~~~
Manishearth
> How would unsafe code help? Unsafe code doesn't suddenly give you move
> constructors does it?

Unsafe code lets you store parent pointers as raw pointers and update them
manually when you move things. You don't actually _want_ a move constructor
here; your btree values are on the heap anyway; what you want is a .move()
method that chains correctly, and you can write this in Rust. In C++ the move
constructor is a convenient way of doing this but not the only way.

Move constructors are how you organize the code in C++. Rust will not support
the same code organization. This does not mean it's not doable in Rust, just
that you have to design it differently. Rust is not C++; this is ok.
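
As a rough sketch of what such an explicit relocation step could look like with raw pointers: all names here (`RawNode`, `relocate`, and so on) are hypothetical, and the code assumes every node is heap-allocated through `new_node` and reached only through raw pointers. `relocate` plays the role the move constructor had in the C++ example, except that it is called by hand.

```rust
use std::ptr;

struct RawNode {
    value: i32,
    parent: *mut RawNode,
    left: *mut RawNode,
    right: *mut RawNode,
}

/// Heap-allocates a detached node and returns an owning raw pointer.
fn new_node(value: i32) -> *mut RawNode {
    Box::into_raw(Box::new(RawNode {
        value,
        parent: ptr::null_mut(),
        left: ptr::null_mut(),
        right: ptr::null_mut(),
    }))
}

/// Links `child` as the left child of `parent`.
/// Safety: both must be live pointers obtained from `new_node`.
unsafe fn attach_left(parent: *mut RawNode, child: *mut RawNode) {
    unsafe {
        (*parent).left = child;
        (*child).parent = parent;
    }
}

/// Moves a node into a fresh allocation and repairs every pointer that
/// referred to the old address.
/// Safety: `old` must be a live pointer from `new_node`; it is freed here.
unsafe fn relocate(old: *mut RawNode) -> *mut RawNode {
    unsafe {
        let node = *Box::from_raw(old); // move fields out, free old slot
        let new = Box::into_raw(Box::new(node));
        let parent = (*new).parent;
        if !parent.is_null() {
            if (*parent).left == old { (*parent).left = new; }
            if (*parent).right == old { (*parent).right = new; }
        }
        for child in [(*new).left, (*new).right] {
            if !child.is_null() { (*child).parent = new; }
        }
        new
    }
}

/// Frees a node and everything below it.
unsafe fn free_tree(node: *mut RawNode) {
    if node.is_null() { return; }
    unsafe {
        let boxed = Box::from_raw(node);
        free_tree(boxed.left);
        free_tree(boxed.right);
    }
}

/// Builds root -> mid -> leaf, relocates `mid`, and checks the links.
fn relocate_demo() -> bool {
    // SAFETY: all pointers come from `new_node`, are used while live,
    // and the tree is freed exactly once.
    unsafe {
        let (root, mid, leaf) = (new_node(1), new_node(2), new_node(3));
        attach_left(root, mid);
        attach_left(mid, leaf);
        let moved = relocate(mid);
        let ok = (*moved).value == 2
            && (*root).left == moved
            && (*leaf).parent == moved
            && (*moved).parent == root;
        free_tree(root);
        ok
    }
}
```

In real code the raw-pointer functions would normally be hidden behind a safe wrapper type that owns the root and frees the tree in its `Drop` implementation.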

------
tom_mellior
I don't understand the following sentence:

> ... it is strongly recommended that you only free memory allocated by a Vec
> by creating a new Vec and dropping it.

Could someone parse this for me? To me, it reads like "to free memory
allocated by a Vec _v_ , create a new Vec _w_ and drop it (i.e., _w_ )".

This is obviously not what is intended. What's the role of the second Vec?
Is this only in a context where _v_ is empty but its memory not yet freed? If
so, why wouldn't shrink_to_fit work in that case?

~~~
steveklabnik
I've filed a bug to elaborate, thanks!
[https://github.com/rust-lang/rust/issues/46879](https://github.com/rust-lang/rust/issues/46879)
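
One plausible reading of the quoted sentence, offered here as an assumption rather than something confirmed in this thread, is that it concerns code that has taken a Vec apart into its raw pointer, length, and capacity. In that situation the recommended way to release the buffer is to rebuild a Vec with `Vec::from_raw_parts` and let it drop, instead of calling the allocator directly. The helper names below are made up for this sketch.

```rust
use std::mem::ManuallyDrop;

/// Takes a vector apart into (pointer, length, capacity) without
/// running its destructor, so the buffer stays allocated.
fn into_parts(v: Vec<u32>) -> (*mut u32, usize, usize) {
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

/// Rebuilds the "new Vec" from the parts; dropping it frees the buffer.
/// Safety: `parts` must come unchanged from `into_parts` and be used once.
unsafe fn from_parts(parts: (*mut u32, usize, usize)) -> Vec<u32> {
    let (ptr, len, cap) = parts;
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

/// Round-trips a vector through its raw parts.
fn round_trip(v: Vec<u32>) -> Vec<u32> {
    let parts = into_parts(v);
    // SAFETY: the parts come straight from `into_parts` and are used once.
    unsafe { from_parts(parts) }
}
```

Under that reading, the "new Vec" is the one returned by `from_parts`: dropping it is what frees the memory the original Vec allocated.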

~~~
tom_mellior
Cool, thanks to both of you for the quick report and fix!

