

Rust borrow and lifetimes - arthurtw
http://arthurtw.github.io/2014/11/30/rust-borrow-lifetimes.html

======
Animats
That covers the easy cases for ownership and lifetimes. But it's apparently
necessary in Rust to use "unsafe" code in libraries to implement basic
structures like re-sizable arrays. This suggests a limitation of the ownership
primitives.

I looked at this problem a decade ago for C/C++, and got as far as borrow
semantics (I used the term "keep", as the opposite of borrow). But there were
ownership cases I couldn't figure out a good way to handle. This sort of thing
is why the C++ committee beat their heads against the wall with "auto_ptr".
Rust has the same problems, and the solutions don't seem to be entirely clean.

Living with strict single ownership semantics is hard. One amusing approach is
to have an atomic "swap" as a primitive. Swap is safe as regards single
ownership. With enough swaps, you can do things like move the contents of an
array to a new, larger space. This is an inefficient way to do it, but easy to
prove sound. "Move semantics" can be viewed as a special case of swap, where
one of the moved pointers is known to be null.
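
A minimal sketch of the swap idea in today's Rust, assuming `Option<Box<i32>>` as a stand-in for a nullable owning pointer. The helper names `exchange` and `move_out` are hypothetical; `std::mem::swap` is the standard library function.

```rust
use std::mem;

/// Exchange two owned values; neither is ever duplicated or left dangling.
fn exchange(a: &mut Box<i32>, b: &mut Box<i32>) {
    mem::swap(a, b);
}

/// A "move" expressed as a swap against an empty slot:
/// the source ends up as `None`, which plays the role of null.
fn move_out(src: &mut Option<Box<i32>>) -> Option<Box<i32>> {
    let mut dst = None; // destination known to be empty
    mem::swap(src, &mut dst); // same effect as `src.take()`
    dst
}
```

After `move_out`, the caller still holds a valid (empty) source slot, which is why single ownership is preserved at every step.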

It can be useful to view something like moving the contents of an array as a
sequence of swaps, which are then optimized. If the destination array is all
NULL at the start, each swap operation becomes a pointer copy followed by a
store of NULL into the source pointer. Then, if we can show that all the swap
operations are non-overlapping, we can do all the pointer copies before all
the stores of NULL. Then, if the source buffer is about to be discarded and is
never read again, we can omit the stores of NULL. Then if the pointer copies
are all adjacent, they can be rolled up into one MOV operation.
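
The unoptimized starting point of that argument might look like the sketch below. It assumes `Option<Box<T>>` slots as the nullable pointers and a hypothetical helper `grow_by_swaps`. It also leans on `Vec` for the backing storage, so it only illustrates the ownership reasoning, not a from-scratch resizable array.

```rust
use std::mem;

/// Move every element of `old` into a new, larger buffer using only swaps.
fn grow_by_swaps<T>(old: &mut Vec<Option<Box<T>>>, new_len: usize) -> Vec<Option<Box<T>>> {
    assert!(new_len >= old.len());
    // The destination starts out all "NULL".
    let mut new: Vec<Option<Box<T>>> = (0..new_len).map(|_| None).collect();
    for (src, dst) in old.iter_mut().zip(new.iter_mut()) {
        // Each swap hands ownership to `dst` and leaves `None` in `src`.
        mem::swap(src, dst);
    }
    new
}
```

Every slot in `old` is `None` afterwards, which is the state the optimization steps above would then show to be unobservable if `old` is discarded.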

This may be a way to avoid using "unsafe" so much.

~~~
kibwen

      > But it's apparently necessary in Rust to use "unsafe" 
      > code in libraries to implement basic structures like 
      > re-sizable arrays.
    

Efficient data structures are one domain where single ownership presents
barriers to implementation, true. Your domain will determine how much unsafe
code you need to deal with. For efficient data structures, you will need a
large amount of unsafe code (probably a third to half of your code). For
applications that need to really push the performance envelope, I expect
you'll need around 10% unsafe code (this is a ballpark estimate for the amount
of unsafe code in Servo, last I checked). For most domains I'd expect the
volume of unsafe code to be less than 1% of your codebase (contrast this with
C and C++, where by Rust's definition 100% of the codebase is unsafe).

    
    
      > Living with strict single ownership semantics is hard.
    

This is why Rust provides a reference-counted smart pointer (somewhat like
C++'s shared_ptr) in the standard library, for those cases where multiple
ownership is necessary and you don't want to manually wrangle raw pointers.
There's unsafe code underlying it (the aforementioned raw pointer wrangling),
but putting it in the stdlib means that it can be well-audited.
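
A small sketch of that shared ownership with `std::rc::Rc`, as it is spelled in current Rust. The `Config` type and the `share` helper are made up for illustration.

```rust
use std::rc::Rc;

/// Hypothetical value that two owners need to keep alive.
struct Config {
    name: String,
}

/// Returns two handles that both own the same `Config`.
fn share() -> (Rc<Config>, Rc<Config>) {
    let first = Rc::new(Config { name: String::from("shared") });
    // Cloning the `Rc` bumps the reference count; the `Config` is not copied.
    let second = Rc::clone(&first);
    (first, second)
}
```

The value is freed only when the last handle is dropped, and none of the calling code needs an `unsafe` block.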

    
    
      > This may be a way to avoid using "unsafe" so much.
    

Rust tries very hard to give you the tools to avoid using `unsafe` blocks, but
it's a thoroughly pragmatic language and so it knows that sometimes such code
is unavoidable. If you could afford inefficiency in the name of 100% safety,
you'd just be using a GC'd language. If it denied you efficiency in the name
of dogmatic safety, you'd just go back to using C++. The complexity here is
inherent to the domain. There must exist a compromise if we expect to ever
make progress.

~~~
Animats
_If it denied you efficiency in the name of dogmatic safety, you'd just go
back to using C++._

For new work, most efforts are going forward to C# or D or Go or Java, all of
which solve this problem by using a garbage collector. The claim for Rust is
that it's memory-safe without a garbage collector. This claim does not appear
to stand up to scrutiny. If 10% of a browser has to be "unsafe", the language is
seriously flawed.

~~~
pcwalton
> For new work, most efforts are going forward to C# or D or Go or Java, all
> of which solve this problem by using a garbage collector. The claim for Rust
> is that it's memory-safe without a garbage collector.

In C#, D, Go, and Java, arrays are implemented as language primitives in the
runtime. Their runtimes are implemented in raw C or C++ and are not formally
verified. So my question is: Why is implementing an array in C as a language
primitive safer than implementing it in the standard library via unsafe code?

There's a similar story for the swap operation. Rust in fact used to have a
swap operation, as you suggested, but it was removed because such an operation
can easily be implemented in the library. We could have moved the code back
into the compiler, but how would that have made the language safer? (In fact,
I would be tempted to argue that implementing language primitives in codegen
is _less_ safe, since programmatically generating LLVM IR is more difficult
than writing essentially C with a different syntax.)
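
To give a feel for what "implemented in the library" can mean, here is a rough sketch of a swap written with raw pointer operations. It is not the standard library's actual source, and the name `library_swap` is hypothetical. The pointer functions `ptr::read`, `ptr::write` and `ptr::copy_nonoverlapping` are the real `std::ptr` ones.

```rust
use std::ptr;

/// Swap two values through raw pointers, exposing a safe signature.
fn library_swap<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    // SAFETY: `pa` and `pb` come from two distinct exclusive references, so
    // both are valid and aligned and they cannot overlap. Each value is read
    // once and written once, so nothing is duplicated or dropped twice.
    unsafe {
        let tmp = ptr::read(pa);
        ptr::copy_nonoverlapping(pb, pa, 1);
        ptr::write(pb, tmp);
    }
}
```

The `unsafe` block is a few lines long, and callers only ever see `&mut T` arguments that the borrow checker has already verified.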

~~~
tomp
You didn't really answer the question. The GP was asking, how can Rust claim
to be "memory-safe" if 10% of _application_ code (e.g. Servo) is unsafe code.
We can accept unsafe code in the standard libraries (equivalent to how C# or
Java or Go are implemented), as it's reasonable to assume it will be
extensively peer-reviewed and battle-tested.

~~~
pcwalton
You shouldn't be writing unsafe application code. If your application needs
new unsafe _abstractions_ (which you can think of as language extensions),
then you can write those. But unsafe is absolutely not for application code.

(This is, as I explained downthread, part of the reason why I dislike going
around quoting "X% of the code in Y app is unsafe." Your application code
shouldn't be unsafe.)
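
As an illustration of an unsafe _abstraction_ with a safe interface, the sketch below is modeled on the standard library's `split_at_mut`. The function name `split_in_two` is hypothetical, and this is a simplified version for illustration only.

```rust
use std::slice;

/// Split one mutable slice into two non-overlapping mutable halves.
fn split_in_two<T>(items: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    assert!(mid <= items.len());
    let len = items.len();
    let ptr = items.as_mut_ptr();
    // SAFETY: `mid <= len` was checked above, so both ranges lie inside the
    // original slice, and they do not overlap: [0, mid) and [mid, len).
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Application code calls `split_in_two` like any other safe function; the invariant that makes it sound is checked once, inside the abstraction.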

------
shadowmint
The recent discussion about the word 'lifetime' in rust interests me, because
it _is_ something I found confusing at first... but I still don't see any of
the alternatives ('scope', 'lifespan', 'borrow bound') as being any meaningful
improvement.

I think the lifetime guides should be more explicit about how memory
management occurs.

Something like this:

    
    
        When the instance X of type T is freed in rust and its destructor (if any) 
        is invoked, we refer to this as 'dropping' X.
    
        X will be dropped if:
    
        - It goes out of a scope
        - Its immediate parent is dropped
    
        *T in rust is a 'raw' pointer, like a C pointer.
    
        *T can result in a segmentation fault like in C, if the memory it points
        to has already been dropped (Use after free).
    
        &T is a special type in rust that prevents use after free errors by 
        applying compile time guards enforcing safe usage.
    
        Typically the compile time guards are automatically determined using
        static analysis, but sometimes ambiguities require explicit hints when
        writing an API. In these cases it is necessary to hint to the static type
        checker what the appropriate lifetime for a &T is.
    

The existing lifetime definition:

    
    
        A lifetime is a static approximation of the span of 
        execution during which the pointer is valid
    

Doesn't really cut it for me.

It needs to be explicit 1) that lifetimes are for the static type checker, and
2) that they are more than just for borrowed pointers.
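
A rough sketch of the proposed wording in current Rust syntax: the two drop rules, and an explicit lifetime hint on an API. All names here (`Noisy`, `Parent`, `drop_order`, `longer`) are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Dropping a `Parent` also drops the `Noisy` it contains.
struct Parent {
    _child: Noisy,
}

/// Returns the order in which the values below were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _local = Noisy { name: "local", log: Rc::clone(&log) };
        let _parent = Parent {
            _child: Noisy { name: "child", log: Rc::clone(&log) },
        };
        // Scope ends here: locals are dropped in reverse declaration order,
        // so `_parent` (and with it the child) goes first, then `_local`.
    }
    let order = log.borrow().clone();
    order
}

/// The `'a` hint tells the static checker that the returned reference is
/// valid only while both inputs are; nothing happens at run time.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}
```

Without the `'a` annotations on `longer`, the compiler cannot tell which input the result borrows from and asks for exactly this kind of hint.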

For example, for a trait or closure, the lifetime is not the duration during
which the pointer is valid. It's a _lower bound_ on the _possible lifetimes of
other types_ that can go into either that closure or trait.

For example, to pass a closure to a thread it must be 'static, which means
that the lower bound for any lifetime used in the closure is 'static. Take
this, for example:

    
    
        let foo = 0u;
        let bar = |:| { let y = foo + 1; ... }
    

The closure in this example has a lifetime which is <= the lifetime of foo,
which it uses. This is < 'static, so bar cannot be 'static.
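
In current Rust syntax the same point can be sketched as below. The function names are hypothetical; `std::thread::spawn` does require its closure to be `'static`.

```rust
use std::thread;

/// The closure owns a copy of `foo`, so it contains no borrow and can be
/// handed to another thread.
fn spawn_with_copy() -> u32 {
    let foo = 0u32;
    // `move` copies `foo` into the closure, satisfying the `'static` bound.
    let bar = move || foo + 1;
    thread::spawn(bar).join().unwrap()
}

/// The closure borrows `foo`, so it is bounded by the scope of `foo`.
fn call_in_place() -> u32 {
    let foo = 0u32;
    // Without `move` this closure holds `&foo`. Calling it locally is fine;
    // passing it to `thread::spawn` would be rejected by the compiler.
    let bar = || foo + 1;
    bar()
}
```

Both closures compute the same value; only the first one meets the bound needed to cross a thread boundary.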

The same is true of structures and traits. A boxed trait Box<Foo + 'a> means
the struct implementing Foo contained in the box must have a lifetime of _at
least_ 'a.
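
A sketch of such a bound in current syntax, where the trait object is written `Box<dyn Trait + 'a>`. The trait `Describe`, the struct `Borrowed` and the function `boxed` are hypothetical names.

```rust
trait Describe {
    fn describe(&self) -> String;
}

/// Holds a borrow, so values of this type are only valid for `'a`.
struct Borrowed<'a> {
    text: &'a str,
}

impl<'a> Describe for Borrowed<'a> {
    fn describe(&self) -> String {
        format!("borrowed: {}", self.text)
    }
}

/// The `+ 'a` bound records that whatever is inside the box may hold
/// references that live at least as long as `'a`, and no longer is promised.
fn boxed<'a>(text: &'a str) -> Box<dyn Describe + 'a> {
    Box::new(Borrowed { text })
}
```

Writing the return type as `Box<dyn Describe + 'static>` instead would be rejected here, because `Borrowed<'a>` cannot promise to outlive `'a`.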

It's almost like we have two completely separate ideas:

1) Lifetime <--- The lifetime of a borrowed pointer

2) Lifetime _bound_ <--- The lifetime on a closure, struct, trait, etc.

~~~
kzrdude
Memory management is tangential to borrowing.

I think scope is a great word, because my own important insight was that
"lifetimes" represent the scope a borrowed reference lives in -- it's not
really about the life of the value you borrow from.

Yes, if you have a lifetime 'a, it will correspond to some reference on the
stack in a scope somewhere up the call chain.

------
norswap
What I never see discussed is how this borrowing system works in the presence
of global state. Can't you have two global variables that reference the same
value? How do you then go about building indexes and things like that, is
there a form of weak reference?

I don't know Rust, so the question may be nonsensical; still, an explanation
would be much appreciated :)

~~~
kzrdude
* You can have thread-local storage, which is a form of "global" state for a whole task/thread.

* You can use Arc (Atomically reference counted smart pointer), to share immutable data with all your tasks, without needing any locking. You may generate this data, and either hand tasks an Arc pointer at their creation, or send it out using channels.

* You can use RWLock or Mutex to share owned data with limited/locked mutable access between several tasks. These types are smart wrappers that only allow access to their protected value while properly locked.

Oh, and fyi the borrow checker is strictly local to a function or method.
Other features of the Rust language, like the type kinds Send and Sync, are
involved in checking which values are safe to share between tasks. Normal rust
references (&T) only point across tasks if you get them through one of the
primitives Arc, Mutex, RWLock or similar.
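
A minimal sketch of the `Arc` and `Mutex` cases using current standard library names, where tasks are OS threads and `RWLock` is spelled `RwLock`. The function `parallel_sum` is a hypothetical example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sum `values` across `workers` threads.
fn parallel_sum(values: Vec<u64>, workers: usize) -> u64 {
    assert!(workers > 0);
    // Immutable data: shared through `Arc`, no locking needed to read it.
    let shared = Arc::new(values);
    // Mutable data: only reachable while the `Mutex` is locked.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let shared = Arc::clone(&shared);
            let total = Arc::clone(&total);
            thread::spawn(move || {
                // Each worker handles every `workers`-th element, starting at `id`.
                let partial: u64 = shared.iter().skip(id).step_by(workers).sum();
                *total.lock().unwrap() += partial;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

The `Send` and `Sync` bounds on `thread::spawn` are what let the compiler accept `Arc` and `Mutex` here while rejecting, say, a plain `Rc`.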

