

Rust, Lifetimes, and Collections - kibwen
http://cglab.ca/~abeinges/blah/rust-lifetimes-and-collections/

======
kibwen
The backstory here is that Alexis is a community member who came to Rust with
a strong interest in improving the quality of the data structures in our
stdlib, many of which had deteriorated significantly over the years through
neglect and language upheaval. He began by compiling an enormous list of
general cleanups[1] which eventually sparked an RFC for wide-scale collections
reform[2] which was accepted and has begun work in earnest.[3] It's an
enormously impressive effort that has engaged tons of new contributors, and
goes to show what even a single driven community member can achieve.

[1] [https://github.com/rust-lang/rust/issues/18009](https://github.com/rust-
lang/rust/issues/18009)

[2] [https://github.com/rust-lang/rfcs/pull/235](https://github.com/rust-
lang/rfcs/pull/235)

[3] [https://github.com/rust-lang/rust/issues/18424](https://github.com/rust-
lang/rust/issues/18424)

------
haberman
This is long with a lot of detail, but to me the key bit of information is:
iterators (as expressed by Rust's Iterator trait) have some noticeable
drawbacks. Notably, you can't delete while iterating.

However, a different abstraction called Cursors can accommodate this. I asked
recently: "does this mean iterators are broken, or are iterators vs. cursors
more of a fundamental tradeoff?" I got this very helpful answer in reply:
[http://www.reddit.com/r/rust/comments/2mkra1/impressions_aft...](http://www.reddit.com/r/rust/comments/2mkra1/impressions_after_writing_a_little_rust/cm58heh)
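As a sketch of that tradeoff (modern Rust syntax, not from the linked discussion): the borrow checker rejects removal during iteration, while `Vec::retain` captures the cursor-like "delete while iterating" pattern safely.

```rust
fn main() {
    let mut v = vec![1, 2, 3, 4, 5];

    // This will not compile: the iterator borrows `v`, so `v.remove(..)`
    // cannot take a second, mutable borrow inside the loop.
    //
    // for (i, x) in v.iter().enumerate() {
    //     if *x == 3 { v.remove(i); }
    // }

    // `retain` expresses the same intent without exposing a dangling index.
    v.retain(|&x| x != 3);
    assert_eq!(v, vec![1, 2, 4, 5]);
}
```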

------
Animats
_But, with a little sprinkle of unsafe inside this method, we can make it
happen._

Uh oh. If you have to turn off subscript checking in Rust for "performance"
for simple loop iteration, something is very wrong.

The way this ought to work is by the compiler hoisting checks out of the loop.
Consider

    let mut v = vec![1i32, 2, 3, 4, 5];
    let vlen = v.len();
    for i in range(0, vlen) {
        let x = &mut v[i];
        // do some work with x
    }

Subscript checking requires something comparable to

"assert(i >= 0 && i < v.len())" at the subscript check v[i]

The compiler, knowing how "for" works, can hoist that check out of the loop,
so it becomes

"assert(0 >= 0 && vlen-1 < v.len())"

at the top of the loop, before the FOR statement is entered. Now the check
only has to be executed once. This optimization is valid for all iterative FOR
loops where the iteration variable and the array are not modified within the
loop, and the compiler has to check that.

Then, after hoisting, some algebraic simplification can be applied to the
expression in the assert. This requires a small theorem prover, or more
cheaply, some rewrite rules for the common cases.

"assert(true && vlen <= v.len())", which becomes "assert(vlen <= v.len())"

Now, flow analysis shows that, at the point of the assert, vlen = v.len(). So
we get

"assert(v.len() <= v.len())"

and finally

"assert(true)"

which is eliminated as dead code. Zero overhead for subscript checking, yet
proved correct.

Now that's how it ought to work. One of the Go compilers does this for simple
loops like this. Sprinkling "unsafe" around, and making people use funny
library constructs for simple loops, is doing it wrong.

This seems to be a forgotten technology. There were Pascal compilers in the
1980s which did this. In practice, about 95% of subscript checks could be
optimized out for Pascal. You can almost always get checks out of iterative
inner loops, which is where they really matter.

~~~
veddan
If we take your code and modify it so that the compiler can't just get rid of
the array access, we get:

    
    
    extern crate test;

    fn main() {
        let mut v = vec![1i32, 2, 3, 4, 5];
        let vlen = v.len();
        for i in range(0, vlen) {
            let x = &mut v[i];
            test::black_box(x);  // do some work with x
        }
    }
    

Compiling this file with rustc -O, the compiler not only gets rid of the
bounds checking, it gets rid of the loop altogether by unrolling it
completely.

LLVM IR:
[https://gist.github.com/veddan/1535a8718cf2c85006ea](https://gist.github.com/veddan/1535a8718cf2c85006ea)

~~~
kibwen
To elaborate, when Rust says that array access is bounds-checked while
iterators aren't, what it means is that iterators are guaranteed not to bounds
check, while typical array accesses may or may not be optimized away. With the
GP's approach, not only do you need to hardcode the hoisting optimization into
the frontend, you need the body of the loop to follow a specific patterns and
you need to trust that the compiler will recognize the cases where the
optimization can be applied. By avoiding privileged optimizations in the
frontend and by giving programmers low-level tools where they need them, Rust
empowers library authors to get things done without having to bug the Rust
developers themselves to implement them.
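For instance (a minimal sketch in modern Rust syntax): in the iterator form no index exists, so there is no per-element bounds check to eliminate, by construction rather than by optimizer pattern-matching.

```rust
fn main() {
    let mut v = vec![1i32, 2, 3, 4, 5];

    // Indexing form: each `v[i]` carries a bounds check that the
    // optimizer may or may not remove. The iterator form below never
    // has one in the first place.
    for x in v.iter_mut() {
        *x *= 2;
    }
    assert_eq!(v, vec![2, 4, 6, 8, 10]);
}
```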

~~~
Animats
That's kind of what compilers are for. One of their jobs is static analysis.
They have the graphs needed for hoisting. Libraries don't.

This may be a problem with the existing compiler, if the Rust implementation
is mostly a front-end to LLVM. I'd hate to see "unsafe" code baked into the
language standard, though. C++ tried to fix the mess underneath with template
libraries. That didn't end well.

"Giving programmers low-level tools when they need them" as an excuse for
abandoning language safety is a recipe for bad code. There's a long, long
history of that not working. "Unsafe" code should be very, very rare, used for
dealing with device registers and such.

This sounds like designing buffer overflows into Rust.

~~~
Dewie
> That's kind of what compilers are for.

To be black boxes? Yes, they seem to have a long history of being that.

If I want to implement a stack data type being backed by an array, and I can
not be sure that bounds checking is optimized away, I want to be able to turn
off bounds checking if I'm 100% sure that my implementation indexes into the
array in a correct way. I have all the information that I need about indexing
into the array, since I don't expose indexing to whoever is using my stack.
The compiler doesn't know more than me, in that sense. So why would I need
compiler support - as in some graph which is a result of static analysis - in
order to eliminate bounds checking?
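A sketch of that stack (hypothetical code, not from the thread): `pop` skips the bounds check via `get_unchecked`, which is justified only because the type never hands out indices and maintains its own length invariant.

```rust
// Hypothetical stack that relies on its own invariant (the index used
// is always < buf.len()) instead of a runtime bounds check.
struct Stack {
    buf: Vec<i32>,
}

impl Stack {
    fn new() -> Stack {
        Stack { buf: Vec::new() }
    }

    fn push(&mut self, x: i32) {
        self.buf.push(x);
    }

    fn pop(&mut self) -> Option<i32> {
        let n = self.buf.len();
        if n == 0 {
            None
        } else {
            // In bounds: n > 0, so n - 1 is a valid index.
            let x = unsafe { *self.buf.get_unchecked(n - 1) };
            self.buf.truncate(n - 1);
            Some(x)
        }
    }
}

fn main() {
    let mut s = Stack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));
    assert_eq!(s.pop(), Some(1));
    assert_eq!(s.pop(), None);
}
```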

~~~
Animats
_I want to be able to turn off bounds checking if I'm 100% sure_

But is everyone else 100% sure? Look at that CERT advisory I posted from
Microsoft. "SafeBufferResize" - wasn't. Somebody at Microsoft was probably
"100% sure".

If a buffer overflow in your code meant being fired, would you turn off
subscript checking?

~~~
Dewie
> If a buffer overflow in your code meant being fired, would you turn off
> subscript checking?

If I could get fired for my implementation not being efficient enough (and the
performance hit from the bounds checking mattered as shown through
benchmarking, yadda yadda), then yes. :)

------
rdtsc
I am trying to understand how these two statements work together.

1) > unsafe code is infectious like Java's exceptions. If you call an unsafe
function, you need to mark yourself as unsafe, or explicitly state the
boundaries where the unsafety is handled.

2) > Unsafe is for the lowest levels of abstraction to deal with

So according to 1) unsafe code is infectious but according to 2) you should
only use it at the lowest level in your code. By this I understand things like
mmap-ing a file, getting a value out of shared memory, interacting with C
code, and so on. Presumably, if your high-level code depends on low-level
code, wouldn't the lower-level "unsafety" bubble all the way to the top? And
next thing your main do_top_level_business_logic() function has to be wrapped
in an unsafe block?

Now, while typing this, I am thinking of one way to handle it: create a
separate unsafe task and shuffle data to it via messages. If the task crashes,
just restart it...

~~~
Hemospectrum
It bubbles up as far as an `unsafe { ... }` block, but no higher.

~~~
vutekst
Yes, presuming that the author of the unsafe code in question has manually
verified the effective safety in all relevant scenarios. `unsafe` is an
assertion to the Rust compiler that you know what you're doing, and you will
preserve its guarantees as far as external observers are concerned.

I might point out to the grandparent post that even the basic vector
collection in Rust is full of unsafe {} blocks.
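To illustrate the boundary (a minimal sketch, not Vec's actual code): a safe function may contain an `unsafe` block, and callers see only the safe signature, so the unsafety does not bubble past it.

```rust
// A safe wrapper: the `unsafe` block stays inside, because the
// function itself establishes the invariant the unsafe code needs.
fn first_or_zero(v: &[i32]) -> i32 {
    if v.is_empty() {
        0
    } else {
        // In bounds: we just checked the slice is non-empty.
        unsafe { *v.get_unchecked(0) }
    }
}

fn main() {
    // Callers never write `unsafe`; the function upholds the contract.
    assert_eq!(first_or_zero(&[7, 8]), 7);
    assert_eq!(first_or_zero(&[]), 0);
}
```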

------
enjoy-your-stay
After reading this bit:

    
    
      // Vec is our growable heap-allocated array type.
      // This is just a handy way to make one with some fixed data.
      let mut v = vec![1i32, 2, 3, 4, 5];
      let x = v.get(2);
      v.push(6); // oops, sorry x!
    

I really hope that the Rust compiler has some _really_ good error messages
explaining what the problem is because I couldn't see what was wrong with this
fragment until I read the explanation.

Saying that, after reading the explanation it became a bit more obvious, and I
really like the way of describing the relationship to the pointer as a 'view'
and 'loan', which as a c++ programmer (though clearly spending too much time
with GC languages) I can understand.

~~~
renox
I'm quite surprised by Rust behaviour here: if x isn't used after 'v.push(6);'
(doesn't seem too difficult to know by the compiler) then I would have
expected the compiler to allow this code and to reject it only if x is used
after the 'v.push(6)'.. Sure as he explained you can do "let mut v =
vec![1i32, 2, 3, 4, 5]; { let x = v.get(2); <work with x> } v.push(6);" but
this must be quite annoying..
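Spelled out (modern syntax), the scoped workaround looks like this: the borrow of `v` ends with the inner block, so the push afterwards is allowed.

```rust
fn main() {
    let mut v = vec![1i32, 2, 3, 4, 5];
    {
        let x = v.get(2); // immutable borrow of `v`
        assert_eq!(x, Some(&3));
    } // under block-scoped borrow rules, the borrow ends here
    v.push(6); // fine: no outstanding borrows
    assert_eq!(v, vec![1, 2, 3, 4, 5, 6]);
}
```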

~~~
Jweb_Guru
Currently, borrows last for block scope, which is why this doesn't work. There
is a feature in the making called SEME (single entry, multiple exit) which
will resolve this and other annoying borrow hazards. At one point, at least,
it was considered a priority for 1.0.

