
Rust for C++ programmers – part 4: unique pointers - pohl
http://featherweightmusings.blogspot.com/2014/04/rust-for-c-programmers-part-4-unique.html?m=1
======
rdtsc
Rust looks very exciting and promising. I see the hardest things for it to be
not necessarily syntax and concurrency (which are very well done), but
performance and getting to compete with C++11 (C++14), which actually seems to
become fresh and interesting again.

Performance is tough. I feel most often C++ is not chosen for its inherent
cleanliness, elegance and beauty, but because there are no viable competitors
at the given performance point. C++ compilers have been honed and tweaked for
more than a decade now, so it will be a hard battle ahead.

~~~
loup-vaillant
> _I feel most often C++ is not chosen for its inherent cleanliness, elegance
> and beauty, but because there are no viable competitors at the given
> performance point._

My feeling is that most often, (i) people don't need nearly as much
performance as they think, and (ii) they greatly overestimate the performance
gap between C++ and garbage collected languages (most notably those that are
compiled to native code, such as Lisp, ML, and Haskell).

~~~
pohl
_they greatly overestimate the performance gap between C++ and garbage
collected languages_

I think you're right about that, but I also think that people greatly
underestimate the determinism gap between them, too. (Not necessarily the same
set of people, of course.)

~~~
loup-vaillant
I'm not sure. If you start using smart pointers in C++, you can get the same
problem: instead of a GC pause, you get a cascading delete pause. To eliminate
it, you have to devise a smarter memory scheme, at which point you probably
_don't_ underestimate the determinism gap.

Also don't forget that garbage collectors can often be tuned. Granted, many of
them suck, but a generational, incremental collector whose parameters can be
tweaked? Much less room for pauses.
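
To make the cascading delete concrete, here is a minimal sketch in present-day Rust syntax (`Box<T>` where this thread writes `~T`). The `Node` type, the `freed` counter and both helper functions are hypothetical names made up for the illustration. The same effect shows up with a chain of `std::unique_ptr` in C++.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical node type: each node owns the next one.
struct Node {
    #[allow(dead_code)] // `next` is only held for ownership, never read
    next: Option<Box<Node>>,
    freed: Rc<Cell<usize>>, // shared counter, present only to observe drops
}

impl Drop for Node {
    fn drop(&mut self) {
        // Runs once per node, just before its fields (including `next`) are dropped.
        self.freed.set(self.freed.get() + 1);
    }
}

// Build a chain of `len` nodes and return its head.
fn build_chain(len: usize, freed: &Rc<Cell<usize>>) -> Option<Box<Node>> {
    let mut head = None;
    for _ in 0..len {
        head = Some(Box::new(Node { next: head, freed: Rc::clone(freed) }));
    }
    head
}

// Drop the head and report how many nodes that single statement freed.
fn freed_by_dropping_head(len: usize) -> usize {
    let freed = Rc::new(Cell::new(0));
    let head = build_chain(len, &freed);
    drop(head); // one statement, `len` deallocations
    freed.get()
}

fn main() {
    println!("freed {} nodes", freed_by_dropping_head(1_000));
}
```

The chain is kept short on purpose: the default drop of such a list is recursive, so a very long chain can also exhaust the stack, which is one more reason to devise a smarter scheme.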

~~~
pohl
A cascading delete pause may be undesirable, but is it nondeterministic?

~~~
jerf
In the same way that GC is, yes. Technically both are probably fully
deterministic, just not readily determinable by casual examination of the code
at compile time.

~~~
pohl
I guess I didn't mean nondeterministic in the philosophical indeterminism
sense, but in the sense that different runs might produce different behavior.

[http://en.wikipedia.org/wiki/Nondeterministic_algorithm](http://en.wikipedia.org/wiki/Nondeterministic_algorithm)

~~~
loup-vaillant
Well, you can get there if you lag enough to drop a few frames in a
competitive FPS game: while the game is still deterministic, the game/players
_system_ will diverge: a few dropped frames can get you fragged.

Also, some naive simulations will adapt the length of their steps with the
time it takes to compute them. Any performance variation gives you full blown
unpredictability.
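
A toy sketch of that second point, assuming an explicit Euler integrator for dx/dt = -x (the `simulate` function and the step lengths are invented for illustration). Both runs cover the same amount of simulated time, but the result depends on how that time was cut into steps:

```rust
// Explicit Euler integration of dx/dt = -x, starting from x = 1.0.
// `steps` holds the length of each time step (hypothetical frame times).
fn simulate(steps: &[f64]) -> f64 {
    let mut x = 1.0;
    for &dt in steps {
        x += dt * (-x);
    }
    x
}

fn main() {
    // Both runs cover one unit of simulated time.
    println!("{}", simulate(&[0.5, 0.5])); // prints 0.25
    println!("{}", simulate(&[0.25, 0.75])); // prints 0.1875
}
```

If the step lengths come from wall-clock frame times, any variation in performance changes the trajectory.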

------
captainmuon
Rust gives me the same feeling I had when learning C. I started with BASIC,
then learned Java and a bit of Pascal, Perl, etc., but C was my first language
with pointers. Now it seems silly, but understanding pointers was a huge step
to me. There's a before and an after. Getting used to owned/transferable
pointers seems to involve a similar step, although it's probably easier this
time since I understand what they do.

Btw., if you like this kind of intelligent pointers, Vala has something similar
[1], and the syntax seems a bit easier at first glance. You have owned
pointers (and can transfer ownership), shared pointers (with reference
counting) and unmanaged pointers. It uses reference counting for all the UI
stuff (Vala is highly integrated with Glib and Gtk), since that is usually not
performance critical and you'd rather be correct there. If you feel the need
to do without reference counting, you can manage memory manually, too.

[1]:
[https://wiki.gnome.org/Projects/Vala/ReferenceHandling](https://wiki.gnome.org/Projects/Vala/ReferenceHandling)

~~~
pcwalton
> You have owned pointers (and can transfer ownership), shared pointers (with
> reference counting) and unmanaged pointers. It uses reference counting for
> all the UI stuff (Vala is highly integrated with Glib and Gtk), since that
> is usually not performance critical and you'd rather be correct there. If
> you feel the need to do without reference counting, you can manage memory
> manually, too.

That doesn't sound safe—what if unowned pointers outlive the owned pointer,
and you get use-after-free?

I don't believe it's possible to avoid pervasive reference counting or GC in a
practical sense, and still be safe, without something like lifetimes built
into the type system.
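
A rough sketch of what that looks like, in present-day Rust syntax rather than the syntax of the time. `Foo` and `borrow_x` are hypothetical names. The lifetime `'a` ties the returned reference to the value it was borrowed from, so the borrow cannot outlive the owner:

```rust
struct Foo {
    x: i32,
}

// The returned reference is only valid for as long as `owner` is.
fn borrow_x<'a>(owner: &'a Foo) -> &'a i32 {
    &owner.x
}

fn main() {
    let owner = Box::new(Foo { x: 7 });
    let r = borrow_x(&owner);
    println!("{}", r); // prints 7

    // Uncommenting the next two lines makes the program fail to compile,
    // because `owner` would be freed while `r` still borrows from it:
    // drop(owner);
    // println!("{}", r);
}
```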

------
bambam12897
I've been working with C++ professionally for a couple of years and honestly I'm a
_huge_ fan - So I was excited to read about an alternative. After reading your
5 posts, I get the impression that RUST is mostly mildly useful syntactic
sugar on top of C++.

Here is my feedback:

1 - If memory management is a serious problem for the software you work on,
I've never found the boost library lacking. This seems like the main selling
point for RUST. Given the scope of the project: you guys must be doing
something that is so different that it couldn't be rolled into a library - so
I'm looking forward to your future posts to see if there is something here
that I really am missing out on.

2 - I'm not a fan of the implicitness and I personally don't use 'auto' b/c it
makes scanning code harder. I guess this is more of a personal preference.

3 - A lot of things are renamed. auto->let, new->box, switch->match. You get the
feeling that effort was put in to make the language explicitly look different
from C++

4 - the Rust switch statement doesn't fall through... This one was truly mind
blowing. The one useful feature of the switch statement got ripped out! If you
don't really need the fall through, I'd just avoid using them completely...

5 - I've never really seen an equivalent to boost (in combination to the STL)
in other languages (maybe I didn't look hard enough). Could you maybe make a
post about the RUST standard library? Libraries are always the deal breaker

To that point, my last comment is maybe a little more wishy washy. The main
reason I'm consistently happy with using C++ (and why I put up with the header
files) is that everything is available. If you need to do X, and X has at some
point been put into library by someone: you can be sure that that library will
be available in C++. Since Rust seems so close to C++, does this mean that
linking to C++ code is trivial? If I can seamlessly start programming parts of
our codebase in RUST, that could potentially make a huge impact.
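
For reference on point 4: the usual reason to rely on fall-through is to share one body between several cases, and `match` covers that with `|` alternatives and ranges. A small sketch in present-day Rust syntax, with `classify` as a made-up example function:

```rust
// Hypothetical classifier: several cases share one arm, no fall-through needed.
fn classify(c: char) -> &'static str {
    match c {
        'a' | 'e' | 'i' | 'o' | 'u' => "vowel",
        '0'..='9' => "digit",
        _ => "other",
    }
}

fn main() {
    for c in ['e', '7', 'x'] {
        println!("{} is {}", c, classify(c));
    }
}
```

What this does not cover is the less common pattern where one case runs its own code and then continues into the next case's body.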

~~~
bad_user
> _If memory management is a serious problem for the software you work on,
> I've never found the boost library lacking._

As a developer that isn't working with C++, I'm finding memory management in
C++ to be a nightmare and no amount of libraries can solve it.

Say you receive a pointer from somewhere. Is the referenced value allocated on
the stack or on the heap? If allocated on the heap, do you need to free it
yourself, or is it managed by whatever factory passed it to you? If you need
to deallocate that value, is it safe doing so? Maybe another thread is using
it right now. If you received it, but it should get deallocated by the factory
that gave it to you, then how will the factory know that your copy is no
longer in use? Maybe it's automatic, maybe you need to call some method, the
only way to know is to read the docs or source-code very carefully for every
third-party piece of code you interact with.
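
For contrast, a sketch of how those questions get answered by the function signature in present-day Rust. The `Widget` type and the three functions are hypothetical, and this only covers the basic cases of borrowing and ownership transfer:

```rust
struct Widget {
    id: u32,
}

// Hands ownership to the caller: the caller is now responsible for the value.
fn make(id: u32) -> Box<Widget> {
    Box::new(Widget { id })
}

// Borrows: the caller keeps ownership, and this function cannot free the value.
fn inspect(w: &Widget) -> u32 {
    w.id
}

// Takes ownership: the widget is freed when this function returns.
fn consume(w: Box<Widget>) -> u32 {
    w.id
}

fn main() {
    let w = make(5);
    println!("{}", inspect(&w)); // `w` is still usable afterwards
    println!("{}", consume(w)); // `w` is moved; using it after this line would not compile
}
```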

All of this is context that you have to keep in your head for everything you
do. No matter how good you are, no matter how sane your practices are, it's
easy to make accidental mistakes. I just reissued my SSL certificate, thanks
to C++.

Yeah, for your own code you can use RAII, smart pointers, whatever is in boost
these days and have consistent rules and policies for how
allocation/deallocation happens. Yay! Still a nightmare.

Even if manageable, there's a general rule of thumb that if a C++ project
doesn't have multiple conflicting ways of dealing with memory management and
multiple String classes, then it's not mature enough.

~~~
nly
> Say you receive a pointer from somewhere.

Here's your problem. In general I _don't want_ to be receiving a _single_
pointer from anyone. Lately, I've found it helpful to think of pointers in C++
as special iterators rather than a referential relic from C. In such a mindset
passing pointers around without an accompanying end iterator, or iteration
count, just makes no sense. Anywhere that implied iteration count is always a
constant, I'm probably not structuring my code correctly.

So my recommendation is to use references (foo&) for passing down (well, up)
the stack, never to heap allocated objects. Because you can't use delete on a
reference there's no longer an ambiguity. Use smart pointers to manage the
heap. Write RAII wrappers (it's not a lot of code) to manage external
resources. RAII wrappers are especially useful for encapsulating smart
pointers so big things can be passed around with value semantics, which gives
you even stronger ability to reason. Implementing optimisations like copy-on-
write becomes fairly trivial.

> I just reissued my SSL certificate, thanks to C++.

If you're referring to Heartbleed then OpenSSL is written in C, not C++.
Generally only a language that inserts array bounds checks for every access
would have shielded you from this bug... C++'s <vector> does this if you use
the at() function of <vector>, but op[] doesn't by default for performance
reasons.
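
As a point of comparison, slice access in present-day Rust is bounds-checked by default: plain indexing panics when out of range, and `get` returns an `Option` instead. A toy sketch of a heartbeat-style echo follows. The `echo` function is invented for illustration and is not how OpenSSL is structured:

```rust
// Hypothetical echo: return the first `claimed_len` bytes of `payload`,
// or None if the caller claims more bytes than were actually sent.
fn echo(payload: &[u8], claimed_len: usize) -> Option<&[u8]> {
    payload.get(..claimed_len)
}

fn main() {
    let payload = b"bird";
    println!("{:?}", echo(payload, 4).is_some()); // prints true
    println!("{:?}", echo(payload, 500).is_some()); // prints false
}
```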

~~~
scott_s
The problem that Rust solves is that your advice, while good, is still
_advice_. I absolutely agree that naked pointers are a code smell, and stack
allocated objects should be the norm, with passing around (const) references
to them. And RAII wrappers are great.

But all of those are patterns of use, enforced mostly by convention. In Rust,
that's enforced by the language itself, and violating it will be a compiler
error. The following kind of shenanigans won't be allowed outside of unsafe
regions:

    
    
      int main()
      {
        int on_stack;
        int& ref = on_stack;
        int* ptr = static_cast<int*>(&ref);
        delete ptr;
        return 0;
      }
    

Yes, it's obviously bad code, but C++ happily let me write it, and it compiled
with no warnings under -Wall -Wpedantic.

~~~
nly
This is because delete is an operator that can be overridden, and whether it
has been overridden isn't known until link time.

    
    
        void operator delete(void*) {  }
    
        int main()
        {
          int on_stack;
          int& ref = on_stack;
          int* ptr = &ref;
          delete ptr;
          return 0;
        }
    

and now it's safe :P... and yes, never freeing any memory is arguably a
perfectly valid memory management strategy. Ok, this example is nuts... but
it's a feature of C++, in the C tradition, that it lets you do crazy things.
Can I plug custom per-type memory allocators in to Rust?

------
stewbrew
I recently wondered whether it's possible to compile rust into a dll/so or
whether there is a way to call rust from other languages (e.g. C, R, or ruby).
All I found is that this isn't (easily?) possible because of rust's runtime.
Is this true? If so, will it be possible to call rust from, e.g., C code? I'd
like to have an alternative to c/c++ for writing native extensions for
interpreted languages.

~~~
dbaupp
Yes, it is possible to do it by just avoiding the runtime, e.g. the following
works fine:

    
    
        #![crate_id="example_c"]
        #![crate_type="dylib"]
        
        #[no_mangle]
        pub extern fn my_rust_function(x: i32, y: i32) -> i32 {
            x + y
        }
    

Then compiling that gives a libexample_c....so file, which can be linked
against the following C:

    
    
        #include<stdio.h>
        extern int my_rust_function(int, int);
    
        int main() {
            printf("%d", my_rust_function(1, 2));
            return 0;
        }
    

printing 3. (There's no interaction with a runtime here at all.)

In fact, it's even possible to "manually" start a runtime[1] inside another
process (and if it's started on its own thread then I think it will work
flawlessly, if not, then it might break in some corner cases).

[1]: [http://static.rust-lang.org/doc/master/guide-runtime.html](http://static.rust-lang.org/doc/master/guide-runtime.html)

------
pkulak
Man. Now I really want to read the next post...

------
coldtea
From the blog:

> _I have history with Firefox layout and graphics, and programming language
> theory and type systems (mostly of the OO, Featherweight flavour, thus the
> title of the blog)._

Hmm, what are "type systems" of "featherweight flavour"? Anything real or some
inside joke? Or perhaps an elaborate way to say "not that complicated"?

(Google mostly returns references to the same blog)

~~~
nrc
Featherweight Java
([http://www.fos.kuis.kyoto-u.ac.jp/~igarashi/papers/fj.html](http://www.fos.kuis.kyoto-u.ac.jp/~igarashi/papers/fj.html))
was a seminal paper which showed type soundness for Java. The formal syntax
was a subset of Java and preserved the interesting features in the
semantics (as opposed to encoding a language in an extension of the lambda
calculus, which is an alternative style of formalisation). So, "featherweight
flavour" refers to formal type systems work which follows this style of
formalisation.

------
frozenport
> _The memory will not leak however, eventually it must go out of scope and
> then it will be free._

Yes, if the function that called it lives for ever the point is perhaps moot,
in an extreme case if it was called by 'main'.

~~~
rcthompson
If the pointer is returned all the way up to the main function, then it must
be required for the lifetime of the program (or else the program is poorly
designed). What point are you trying to make?

~~~
frozenport
It could also be a memory leak. For example, I have a for loop that keeps
requesting a new pointer...

~~~
AlisdairO
I just tried the following:

    
    
            let mut blah = ~MyStruct{x: 3, y: 4};
            for i in range(0,100000000) {
                blah = ~MyStruct{x: i + blah.x, y: i + blah.y};
            }
            println!("{} {}", blah.x, blah.y);
    

The memory usage didn't increase over time - the owned pointer frees when it
gets reassigned.
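
The same experiment can be made observable by counting drops instead of watching memory usage. This is a sketch in present-day Rust syntax (`Box::new` where the snippet above uses `~`). The `drops` counter and the `reassign_in_loop` function are additions for the illustration:

```rust
use std::cell::Cell;
use std::rc::Rc;

struct MyStruct {
    x: u64,
    y: u64,
    drops: Rc<Cell<u64>>, // shared counter, present only to observe frees
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns (x, y, number of values freed while the loop ran).
fn reassign_in_loop(n: u64) -> (u64, u64, u64) {
    let drops = Rc::new(Cell::new(0));
    let mut blah = Box::new(MyStruct { x: 3, y: 4, drops: Rc::clone(&drops) });
    for i in 0..n {
        // Assigning to `blah` drops the box it previously held.
        blah = Box::new(MyStruct { x: i + blah.x, y: i + blah.y, drops: Rc::clone(&drops) });
    }
    (blah.x, blah.y, drops.get())
}

fn main() {
    println!("{:?}", reassign_in_loop(3)); // prints (6, 7, 3)
}
```

Each pass through the loop frees exactly one old value, so at most two boxes are alive at any moment.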

~~~
azth
See this example for more details:
[https://news.ycombinator.com/item?id=7665617](https://news.ycombinator.com/item?id=7665617)

------
kirab
What the heck is this?

    
    
      y: ~~~~~Foo

~~~
akavel
I assumed it reads more or less like:

    
    
      Foo *****y;
    

Or, probably, more like:

    
    
      std::unique_ptr<std::unique_ptr<std::unique_ptr<std::unique_ptr<std::unique_ptr<Foo>>>>> y;

~~~
kirab
Thanks, I hope this is never needed anywhere, ever.

~~~
pohl
I suspect it's a deliberately-perverse example, meant only to illustrate how
deeply method calls will automatically dereference.
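
A sketch of that behaviour in present-day Rust syntax, where `~Foo` is written `Box<Foo>`. The `get_x` method and the `nested` helper are made up for the example:

```rust
struct Foo {
    x: i32,
}

impl Foo {
    fn get_x(&self) -> i32 {
        self.x
    }
}

// Five levels of owning pointer around a Foo.
fn nested(x: i32) -> Box<Box<Box<Box<Box<Foo>>>>> {
    Box::new(Box::new(Box::new(Box::new(Box::new(Foo { x })))))
}

fn main() {
    let y = nested(42);
    // The method call dereferences through all five boxes automatically.
    println!("{}", y.get_x()); // prints 42

    // Extra layers of plain references are looked through as well.
    let r = &&y;
    println!("{}", r.get_x()); // prints 42
}
```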

~~~
kzrdude
It also dereferences through ~~~&~@~~&~Foo.

