Even if the GC were in the language, the optimizations that you speak of would have negative performance ramifications on all code that did not use GC. We took the plunge and moved GC into the stdlib because we weren't willing to make that sacrifice.
It does mean that our GC will never be particularly optimal. And that's fine, because if you really need shared ownership you should be using our really great refcounted pointers instead. :)
They fall out automatically from move semantics. The reference count is only incremented on an explicit clone call:
fn just_a_ref(x: &T) { ... }
fn rc_by_val(x: Rc<T>) { ... }
fn rc_by_ref(x: &Rc<T>) { ... }
let some_rc_pointer: Rc<T> = ...;
just_a_ref(&*some_rc_pointer); // no ref counting
rc_by_ref(&some_rc_pointer); // no ref counting
rc_by_val(some_rc_pointer.clone()); // ref count incremented
// last use of a value (statically guaranteed that
// some_rc_pointer is never used again):
rc_by_val(some_rc_pointer); // no ref counting
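If you want to check this yourself, here's a minimal sketch that observes the count with Rc::strong_count. It assumes String as a stand-in for T, and the count_trace helper and the usize return values are made up for illustration; the functions otherwise mirror the ones above.

```rust
use std::rc::Rc;

// String stands in for the generic T used above (an assumption for this sketch).
fn just_a_ref(x: &String) -> usize {
    x.len()
}

// Takes ownership of one Rc handle and reports the strong count seen inside.
fn rc_by_val(x: Rc<String>) -> usize {
    Rc::strong_count(&x)
}

// Borrows the Rc handle itself; no new handle is created.
fn rc_by_ref(x: &Rc<String>) -> usize {
    Rc::strong_count(x)
}

// Hypothetical helper: records the strong count at each step.
fn count_trace() -> Vec<usize> {
    let some_rc_pointer: Rc<String> = Rc::new(String::from("hello"));
    let mut counts = vec![Rc::strong_count(&some_rc_pointer)];

    // Plain borrow of the inner value: count unchanged.
    let _ = just_a_ref(&*some_rc_pointer);
    counts.push(Rc::strong_count(&some_rc_pointer));

    // Borrow of the Rc: count unchanged.
    counts.push(rc_by_ref(&some_rc_pointer));

    // Explicit clone: the callee sees the incremented count.
    counts.push(rc_by_val(some_rc_pointer.clone()));

    // The cloned handle was dropped when the callee returned.
    counts.push(Rc::strong_count(&some_rc_pointer));

    // Last use: the handle is moved, not cloned, so no increment.
    counts.push(rc_by_val(some_rc_pointer));

    counts
}
```

Calling count_trace() from main and printing the result should show the count going above 1 only while the explicitly cloned handle is alive.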