
What's a reference in Rust? - giacaglia
https://jvns.ca/blog/2017/11/27/rust-ref/
======
kibwen
_> In Rust, a boxed pointer sometimes includes an extra word (a “vtable
pointer”) and sometimes don’t. It depends on whether the T in Box<T> is a type
or a trait. Don’t ask me more, I do not know more._

For those wanting to know more about this, the idea is that pointers to types
whose size is unknown at compile time get this two-word representation: a data
pointer plus a length for slices and `str`, or a data pointer plus a vtable
pointer for trait objects. I tend to refer to these as "fat pointers", which is
terminology from Cyclone (though Cyclone's fat pointers serve a different
purpose). More documentation on these can be found at
[https://doc.rust-lang.org/beta/nomicon/exotic-sizes.html#dyn...](https://doc.rust-lang.org/beta/nomicon/exotic-sizes.html#dynamically-sized-types-dsts)
and in the section in the book on slices (terminology taken from Go, whose
slices are similar though with an extra word):
[https://doc.rust-lang.org/book/second-edition/ch04-03-slices...](https://doc.rust-lang.org/book/second-edition/ch04-03-slices.html#string-slices)
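A quick way to see the thin/fat distinction is to ask the compiler for pointer sizes. This sketch assumes a mainstream target and a current compiler (it uses the `dyn Trait` syntax, which postdates this thread); `Shape` is a made-up trait that exists only to form trait-object types.

```rust
use std::mem::size_of;

/// Hypothetical trait, used only to build trait-object types below.
trait Shape {
    fn area(&self) -> f64;
}

fn main() {
    let word = size_of::<usize>();

    // Pointee size known at compile time: a thin, one-word pointer.
    assert_eq!(size_of::<&u8>(), word);
    assert_eq!(size_of::<Box<u64>>(), word);

    // Slices and str are dynamically sized: data pointer + length.
    assert_eq!(size_of::<&[u8]>(), 2 * word);
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<Box<[u8]>>(), 2 * word);

    // Trait objects are dynamically sized too: data pointer + vtable pointer.
    assert_eq!(size_of::<&dyn Shape>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Shape>>(), 2 * word);

    println!("thin: {} bytes, fat: {} bytes", word, 2 * word);
}
```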

~~~
saghm
> terminology taken from Go, whose slices are similar though with an extra
> word

Interesting; I've always thought of Rust slices as being rather different from
Go slices. A Rust slice is always used through a reference and doesn't own its
data, whereas a Go slice is not generally used through a pointer and sometimes
points to a heap allocated section of memory, so it's basically the union of
Rust's vector and slices.

Tangentially, the inability to tell whether some value is heap allocated or
not from the type is one of my main gripes when working with Go as opposed to
Rust; in Rust, I can be sure that `Vec`, `String`, `Box`, `Rc`, and `Arc` are
all heap allocated and that slices, arrays, `str`, `&T`, and `&mut T` are not.
In Go, slices and pointers _might_ be heap allocated--or they might not.

~~~
sriram_malhar
> the inability to tell whether some value is heap allocated or not from the
> type is one of my main gripes when working with Go

Why is this important? The whole point of GC is to not spend time debating
this point, as long as performance is good enough.

~~~
saghm
Normally, it isn't! Unfortunately, "good enough" is relative, and for some
applications this is vitally important.

I'm not trying to knock Go's performance here; from a naive standpoint, GC'ing
only some pointers is better than GC'ing _all_ of them like you have in more
traditional garbage-collected languages, but it's still easier to know what
exactly is being heap allocated and what isn't in a language like Java
precisely because you know that all objects are on the heap. From what I've
seen of low-level optimizations in Go code, it relies heavily on techniques
like generating flame graphs to analyze where allocations are occurring, which
IMO isn't a very good workflow, whereas in Rust you could do this much more
easily by just looking at the types that are used. I don't think this approach
is necessarily incompatible with garbage collection; theoretically a language
like Go could have separate vector and slice types like Rust does, and I think
that would make these types of optimizations much easier!

(I'm also not sure why you were downvoted for asking this; it's a perfectly
reasonable question.)

~~~
sriram_malhar
I wonder if you have seen the OpenHFT project written in pure Java for high
frequency trading ([https://github.com/OpenHFT](https://github.com/OpenHFT)).
Would 175 _million_ trading transactions per second on modest hardware be
considered good enough? Check out the Chronicle log in the same project, that
persists tens of millions of records on disk.

All it takes is a basic understanding of cache architecture and of
generational GC, and simple data structures.

~~~
saghm
Sure, for high frequency trading, I think that's good enough! On the other
hand, if I'm writing an operating system or a device driver, getting a GC'd
language to be "good enough" is a very different type of problem.

As an aside, I don't think Java actually suffers from the specific problem
that I was mentioning in my original comment, namely that it's hard to tell
what's on the heap or not. I was under the impression that _all_ objects in
Java are on the heap, which makes it trivial to determine whether something is
heap-allocated or not based on the type, like in Rust.

~~~
sriram_malhar
> On the other hand, if I'm writing an operating system or a device driver,
> getting a GC'd language to be "good enough" is a very different type of
> problem

Niklaus Wirth's Oberon OS (written in Oberon), Microsoft's Singularity OS
(written in a variant of C#), and the Mirage unikernel (written in OCaml) are
all examples of OSs written in GC'd languages. I am not aware of performance
being an issue in any of these cases. Oberon was extensively used at ETH, and
the components of Mirage that I am aware of (such as their OpenSSL and DNS)
are competitive in performance with their C counterparts.

------
pornel
It's a very important topic to understand to be productive in Rust.

My knowledge of C made learning Rust _so much harder_ for me. It's really hard
to stop thinking in pointers. While Rust's references are technically
implemented as pointers, for the purpose of "fighting with the borrow checker"
it makes more sense to think of them as read/write locks for regions of
memory.
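As a rough illustration of the read/write-lock intuition (an analogy only: borrows are checked at compile time and take no locks at runtime), here is a small sketch. `total` and `push_total` are hypothetical helpers, and it assumes a current compiler with non-lexical lifetimes, so that `a` and `b` stop counting as live after their last use.

```rust
/// Reads through a shared borrow; any number of these may be live at once.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Writes through an exclusive borrow; no other borrow of `values` may be
/// live while this runs.
fn push_total(values: &mut Vec<i32>) {
    let t = total(values); // a short shared reborrow for the read
    values.push(t);
}

fn main() {
    let mut data = vec![1, 2, 3];

    // Like holding two read locks: shared borrows can overlap.
    let a = &data;
    let b = &data;
    println!("{} {}", total(a), total(b));

    // Like taking the write lock: allowed because `a` and `b` are not used
    // again. Reading `a` after this call would be a borrow-check error.
    push_total(&mut data);
    println!("{:?}", data); // [1, 2, 3, 6]
}
```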

~~~
shmerl
Knowing C++ helps more than knowing C when learning Rust.

~~~
vog
This is especially true if you are familiar with the RAII principle in C++.

~~~
shmerl
Yeah, there is a bigger emphasis on it in C++11 and later.
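For readers coming from C++, a minimal sketch of RAII in Rust: the `Drop` trait plays the role of a destructor, and `std::sync::MutexGuard` releases its lock when it goes out of scope, much like `std::lock_guard`. The `increment` function is hypothetical.

```rust
use std::sync::Mutex;

/// Hypothetical helper: bumps the counter and returns the new value.
fn increment(counter: &Mutex<u32>) -> u32 {
    let mut guard = counter.lock().unwrap(); // acquire the lock
    *guard += 1;
    *guard
} // `guard` is dropped here, which unlocks the mutex

fn main() {
    let counter = Mutex::new(0);
    increment(&counter);
    increment(&counter);
    // Both guards were dropped when their calls returned, so the lock is free.
    assert!(counter.try_lock().is_ok());
    println!("{}", *counter.lock().unwrap()); // 2
}
```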

------
codefined
As someone who has just spent the last 15 minutes escaping to HN from Rust due
to reference errors, this was amazingly useful and actually helped me fix the
error I was getting.

~~~
gamegoblin
If you ever find yourself stuck for too long, drop a code snippet into
[https://play.rust-lang.org/](https://play.rust-lang.org/) and share a link to
the code in the #rust or #rust-beginners IRC channel (irc.mozilla.org). It's a
very friendly community.

~~~
gjtorikian
I second this. I asked a bunch of dumb questions on IRC, Reddit, and the
forums and every single time the responses were so patient and helpful.

I work at GitHub and I’ve been telling people that for the future of open
source we really ought to be looking at the Rust community, both the amount of
automation they have and also their general communication style.

~~~
CleanCut
Yay! I start at GitHub on Dec 5th, and I'm a huge Rust enthusiast.

------
eximius
Overall really great.

> These 3 types all have equivalent reference types (again: a reference is a
> pointer to memory in an unknown place): &[T] for Vec<T>, &str for String,
> and &T for Box<T>.

This seems to accidentally imply that these reference types are for things on
the heap, i.e., that &T is the borrowed equivalent of Box<T>, which is not true.
All three of these reference types can point to memory not on the heap. The
former two 'usually' don't, while the latter will vary wildly depending on the
application.

~~~
gpm
&[T] are commonly created from stack allocated arrays, and &str are even more
commonly created from read-only string literals... so I don't think it's
correct to say that those "usually" point to things on the heap. (But of
course the definition of "usually" could vary, it wouldn't shock me to find
out they did 60% of the time).

Or did you mean &T usually points to things on the heap, in which case I
should just say it very very commonly points to stack allocated things as
well.
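To make the point concrete, here is a sketch in which all three reference types point at memory that is not on the heap, next to one slice that is. The `first` helper is hypothetical; the same function accepts both kinds of slice because the type `&[T]` says nothing about where the data lives.

```rust
/// Hypothetical helper: the first element of any slice, wherever it lives.
fn first<T>(s: &[T]) -> Option<&T> {
    s.first()
}

fn main() {
    // &[T] borrowed from an array on the stack.
    let array = [10, 20, 30];
    let from_stack: &[i32] = &array;

    // &str from a string literal, stored in the program's static data.
    let literal: &'static str = "hello";

    // &T pointing at a local variable on the stack.
    let n = 42;
    let r: &i32 = &n;

    // &[T] borrowed from a Vec's heap buffer: the same type as `from_stack`.
    let v = vec![1, 2, 3];
    let from_heap: &[i32] = &v;

    println!("{:?} {} {} {:?}", first(from_stack), literal, r, first(from_heap));
}
```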

~~~
danieldk
_&[T] are commonly created from stack allocated arrays,_

Really? I would say that in my typical Rust code `&[T]` is created from a
heap-allocated array >90% of the time. Most functions that do not require
ownership of an argument will use `&[T]` and not `&Vec<T>` (or perhaps
`S: AsRef<[T]>`), since `&[T]` works for stack and heap memory and `&Vec<T>` is
automatically converted to `&[T]` through Deref coercion.

E.g.:

    
    
        fn main() {
            let v = vec![1, 2, 3, 4, 5];
            blah(&v);
        }
    
        fn blah<T>(s: &[T]) {
            println!("{}", s.len());
        }
    

(The same is true for `&str`.)
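The `S: AsRef<[T]>` alternative mentioned above can be sketched alongside the plain-slice version; `len_of` and `len_of_slice` are hypothetical names. The generic form also accepts owned values such as a `Vec<T>` or an array passed by value, at the cost of a second type parameter.

```rust
/// Generic version: accepts anything that can be viewed as a `&[T]`,
/// including owned values such as `Vec<T>` or arrays passed by value.
fn len_of<T, S: AsRef<[T]>>(s: S) -> usize {
    s.as_ref().len()
}

/// Plain-slice version: callers lend a slice; `&Vec<T>` coerces via Deref.
fn len_of_slice<T>(s: &[T]) -> usize {
    s.len()
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    let arr = [1, 2, 3];
    println!("{} {}", len_of_slice(&v), len_of_slice(&arr)); // 5 3
    println!("{} {}", len_of(&v), len_of(arr)); // 5 3
}
```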

~~~
gpm
When you pass a `Vec<T>` directly to a non-mutating function or method not
implemented on `Vec<T>` itself you pass it as a `&[T]`. But more often I pass
it as part of a struct so it remains as (indirectly) `&Vec<T>`. However pretty
much whenever you use a stack allocated array you use it as a &[T], part of a
struct or not. I'm sure I use a heap allocated &[T] more often, but I doubt it
reaches 90%.

For &str you have to remember that every string literal in your program is
one. When you do `some_String.starts_with("/mnt")`, `println!("hi there {}",
name)`, etc you are using a new &str. I suspect most programs use more static
strings than dynamic Strings (particularly since Rust isn't heavily used in
GUIs yet).

------
crispweed
> The most important thing about Rust (and the thing that makes programming in
> Rust confusing) is that it needs to decide at compile time when all the
> memory in the program needs to be freed.

> ...

> When the function blah returns, x goes out of scope, and we need to figure
> out what to do with its my_cool_pointer member. But how can Rust know what
> kind of reference my_cool_pointer is? Is it on the heap?

> ...

> If we knew that my_cool_pointer was allocated on the heap, then we would
> know what to do when it goes out of scope: free it!

The way this is written kind of seems to suggest that Rust will sometimes free
heap memory when a reference to that memory goes out of scope, which I think
is misleading.

As I understand it, this is not the case, and the point is just that Rust
needs to be able to prove that nothing else freed the referenced heap memory
at any point where the reference may be used.

~~~
jmite
No, I think the article is right. When a value goes out of scope, its drop
method is called, which for Box values deallocates it.

The trick is that if it is (possibly) returned from a function, it is moved
instead of dropped.

It's also important to distinguish Box from Rc. Both are heap values but have
very different behavior.
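The move-versus-drop behavior, and the difference between `Box` and `Rc`, can be observed with a type whose `Drop` impl counts how often it runs. This is a sketch: `Tracked` and `make_box` are made-up names, and the counter is just a `Cell` so the drops are visible.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical type that increments a shared counter when dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the Box: ownership moves to the caller, so it is not dropped here.
fn make_box(drops: &Cell<u32>) -> Box<Tracked<'_>> {
    Box::new(Tracked { drops })
}

fn main() {
    let drops = Cell::new(0);

    let b = make_box(&drops);
    assert_eq!(drops.get(), 0); // moved out of make_box, not dropped
    drop(b); // Box dropped: Tracked::drop runs and the allocation is freed
    assert_eq!(drops.get(), 1);

    // With Rc, dropping one handle only decrements the reference count.
    let a = Rc::new(Tracked { drops: &drops });
    let a2 = Rc::clone(&a);
    drop(a2);
    assert_eq!(drops.get(), 1); // still alive: `a` holds the last handle
    drop(a);
    assert_eq!(drops.get(), 2); // count reached zero, value dropped
    println!("drops = {}", drops.get());
}
```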

~~~
crispweed
The text I quoted seemed to suggest that _the reference_ going out of scope
could trigger deallocation.

~~~
Manishearth
Which is true.

The word "reference" is overloaded, it can be used to mean "anything pointery
that's guaranteed to exist" too. Box<T> in this context is a reference.

The post does kind of dance between definitions of "reference" a bit, but I
think that's intentional.

------
burntsushi
Great post! I appreciate the Socratic style. I agree with other posters that
stuff like this is important to be comfortable with when writing Rust, and
more material like this blog post is fantastic. I think if I were to write a
part 2 of this blog post, it would be about learning how to read Rust code
such that you know what is a reference and what isn't, and more pointedly,
when something is behind two references. These things are important for
effectively using pattern matching among other things.
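As a small taste of the "two references" case: `iter()` over a slice yields `&i32`, and `filter` passes its closure a reference to each item, so the closure parameter is a `&&i32`. The sketch below (with a hypothetical `count_above` helper) strips both layers in the pattern.

```rust
/// Hypothetical helper: counts elements greater than `min`.
fn count_above(values: &[i32], min: i32) -> usize {
    // `filter` hands the closure a `&&i32`; the `&&x` pattern binds `x: i32`.
    values.iter().filter(|&&x| x > min).count()
}

fn main() {
    let v = vec![1, 5, 10];
    println!("{}", count_above(&v, 3)); // 2

    // Destructuring through a reference: the `&` in the pattern matches it.
    let pair = (1, "one");
    let r = &pair;
    let &(n, name) = r;
    println!("{} {}", n, name); // 1 one
}
```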

With that said, I'd like to add some advice by spring-boarding off a part of
the post.

> Converting from a Vec<T> to a &[T] is really easy – you just run
> vec.as_ref(). The reason you can do this conversion is that you’re just
> “forgetting” that that variable is allocated on the heap and saying “who
> cares, this is just a reference”. String and Box<T> also has an .as_ref()
> method that convert to the reference version of those types in the same way.

While on the surface this is absolutely correct, there is a subtle point
missing here: as_ref on Vec/String/Box is implemented as part of the AsRef[1]
trait, which is _intended_ for use in generic programming. Aside from intent,
practically speaking, using as_ref in a non-generic context can often be
somewhat unergonomic, since depending on how you use it, it might require a
type annotation (because it's generic!).
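A short sketch of the ergonomics problem: `String` implements `AsRef` for several targets (`str`, `[u8]`, `OsStr`, `Path`), so a standalone `as_ref()` call needs an annotation, while a call in argument position can infer the target. `count_slashes` is a hypothetical function.

```rust
/// Hypothetical non-generic consumer of a string slice.
fn count_slashes(path: &str) -> usize {
    path.matches('/').count()
}

fn main() {
    let s = String::from("foo/bar/baz");

    // Several AsRef impls apply to String, so the annotation picks one.
    let text: &str = s.as_ref();
    let bytes: &[u8] = s.as_ref();
    // let unknown = s.as_ref(); // error: type annotations needed

    // When a parameter fixes the expected type, inference can pick it...
    let n1 = count_slashes(s.as_ref());
    // ...but plain deref coercion is shorter and not generic at all.
    let n2 = count_slashes(&s);

    println!("{} {} {} {}", text, bytes.len(), n1, n2); // foo/bar/baz 11 2 2
}
```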

Where AsRef is useful is in making the types of parameters to functions a bit
more liberal. One particularly convenient place where it's used in the
standard library is for defining functions that accept file paths. For
example, the type signature of the function that opens a file is[2]:

    fn open<P: AsRef<Path>>(path: P) -> Result<File>

Basically, this function says that it accepts a parameter `path` with a type
`P` that can be infallibly converted into a `Path`. Why is that convenient?
Because lots of useful types implement `AsRef<Path>`. They include OsStr,
Cow<'a, OsStr>, OsString, str, String, PathBuf, and of course, Path itself.
This is what lets you write `File::open("foo/bar")`. Without the generic
`AsRef<Path>` constraint, the signature would look like this:

    fn open(path: &Path) -> Result<File>

Which would mean that you'd need to write something like
`File::open(Path::new("foo/bar"))` instead.
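The same pattern works in your own functions. A minimal sketch, with a hypothetical `file_name` helper that accepts the same argument types `File::open` does:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the last path component, if there is one.
/// Like File::open, it accepts anything that implements AsRef<Path>.
fn file_name<P: AsRef<Path>>(path: P) -> Option<String> {
    path.as_ref()
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
}

fn main() {
    // Each argument type below implements AsRef<Path>.
    println!("{:?}", file_name("foo/bar")); // &str
    println!("{:?}", file_name(String::from("foo/baz"))); // String
    println!("{:?}", file_name(PathBuf::from("foo/qux"))); // PathBuf
    println!("{:?}", file_name(Path::new("foo/quux"))); // &Path
}
```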

So what's the alternative to using `as_ref` if I'm here pooh-poohing it? In my
experience, the typical thing to do here is to rely on something called deref.
That is, if `s` is a `String` then `*s` is a `str` and `&*s` is a
`&str`. In many cases, the explicit dereference (so that's `&s` instead of
`&*s`) can be elided and the compiler will "auto-deref" for you. For
example, given a function like the following

    fn repeat(string: &str, count: u64) -> String

and a string `s` with type `String`, then

    repeat(&s, 5)

will "just work." If you prefer the explicit, then I think the recommendation
is to use type specific conversion methods. For `Vec<T>`, `as_slice` will give
you a `&[T]`. For `String`, `as_str` will give you a `&str`.
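Putting the spellings side by side: the comment only gives the signature of `repeat`, so the body below (delegating to `str::repeat`) is an assumption made to get a runnable example.

```rust
/// Assumed body for the quoted signature; only the signature is from the comment.
fn repeat(string: &str, count: u64) -> String {
    string.repeat(count as usize)
}

fn main() {
    let s = String::from("ab");
    let a = repeat(&s, 2); // &String coerced to &str via Deref
    let b = repeat(&*s, 2); // explicit: *s is a str, &*s is a &str
    let c = repeat(s.as_str(), 2); // type-specific conversion method
    assert!(a == b && b == c);
    println!("{}", a); // abab

    let v = vec![1, 2, 3];
    let slice: &[i32] = v.as_slice(); // Vec<T> to &[T] without AsRef
    println!("{:?}", slice); // [1, 2, 3]
}
```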

OK, that's enough for now! This rabbit hole goes deeper, but I'll stop here.
:)

> One question I have (that I think I will just resolve by getting more Rust
> experience!) is – when I write a Rust struct, how often will I be using
> lifetimes vs making the struct own all its own data?

If I were forced to give a pithy answer to this question, then I think I would
say (predominantly from the perspective of a library writer): "It's a healthy
mix, but if I don't care about performance for $reasons, I can usually ignore
lifetimes in the types I define."

[1] - [https://doc.rust-lang.org/std/convert/trait.AsRef.html](https://doc.rust-lang.org/std/convert/trait.AsRef.html)

[2] - [https://doc.rust-lang.org/std/fs/struct.File.html#method.ope...](https://doc.rust-lang.org/std/fs/struct.File.html#method.open)
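For the lifetimes question quoted above, the two options look roughly like this. The record types and parse functions are hypothetical; the borrowed version avoids an allocation but cannot outlive the string it points into.

```rust
/// Owns its data: no lifetime parameter, free to outlive its input.
struct OwnedRecord {
    name: String,
}

/// Borrows its data: the 'a lifetime ties it to the buffer it points into.
struct BorrowedRecord<'a> {
    name: &'a str,
}

/// Hypothetical parser that copies the trimmed name into a new String.
fn parse_owned(line: &str) -> OwnedRecord {
    OwnedRecord { name: line.trim().to_string() }
}

/// Hypothetical parser that just points into `line`, allocating nothing.
fn parse_borrowed(line: &str) -> BorrowedRecord<'_> {
    BorrowedRecord { name: line.trim() }
}

fn main() {
    let owned = {
        let line = String::from("  alice  ");
        // A BorrowedRecord built from `line` could not leave this block,
        // because `line` is dropped at its end; the owned copy can.
        parse_owned(&line)
    };

    let buf = String::from(" bob ");
    let borrowed = parse_borrowed(&buf); // valid only while `buf` lives
    println!("{} {}", owned.name, borrowed.name); // alice bob
}
```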

~~~
ejanus
> open<P: AsRef<Path>>(path: P)

Can't one achieve the same with an enum?

~~~
kelnos
Yes, but it's awkward to work with from the caller's side. You'd have to do
something like this (hard to pick non-clashy names, too):

    
    
        enum Path {
          FromString(String),
          FromStr(&str),
          FromOsStr(OsStr),
          // ...
        }
    
        fn open(path: Path) ... {
    
        }
    

And then as the caller you'd have to do things like:

    let file = open(Path::FromStr("/foo/bar"));

It's not particularly nice to read, and you also have the overhead of creating
and throwing away the enum instance.

~~~
wtetzner
And using AsRef means you can define your own types that can be used as paths,
whereas an enum would be fixed once it was defined.
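A sketch of that extensibility: a made-up `ProjectFile` type implements `AsRef<Path>`, and a hypothetical generic `extension` function (written in the same style as `File::open`) accepts it with no change to the function.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical user-defined type; implementing AsRef<Path> is all it
/// takes to be accepted wherever `P: AsRef<Path>` is expected.
struct ProjectFile {
    path: PathBuf,
}

impl AsRef<Path> for ProjectFile {
    fn as_ref(&self) -> &Path {
        &self.path
    }
}

/// Hypothetical generic function: returns the file extension, if any.
fn extension<P: AsRef<Path>>(path: P) -> Option<String> {
    path.as_ref()
        .extension()
        .map(|ext| ext.to_string_lossy().into_owned())
}

fn main() {
    let f = ProjectFile { path: PathBuf::from("src/main.rs") };
    println!("{:?}", extension(&f)); // Some("rs")
    println!("{:?}", extension("notes.txt")); // Some("txt")
}
```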

------
TheDong
> Every struct (or at least every useful struct!) refers to data

Not true, zero-sized structs are quite useful too. They can be used to fulfill
traits, indicate certain errors (often in enums), etc.

A couple quick examples from the stdlib:

[https://github.com/rust-lang/rust/blob/1.22.1/src/libstd/syn...](https://github.com/rust-lang/rust/blob/1.22.1/src/libstd/sync/mpsc/mod.rs#L579-L581)

[https://github.com/rust-lang/rust/blob/1.22.1/src/libstd/col...](https://github.com/rust-lang/rust/blob/1.22.1/src/libstd/collections/hash/map.rs#L31-L33)

I recommend watching this excellent RustConf 2017 talk for more information;
it heavily features information on how zero-sized types can be used:
[https://www.youtube.com/watch?v=wxPehGkoNOw](https://www.youtube.com/watch?v=wxPehGkoNOw)
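A minimal sketch of both uses. `Disconnected`, `English`, `Greeter` and `recv` are made-up names; for a real example, `std::sync::mpsc::RecvError` is declared as a unit struct in the standard library.

```rust
use std::mem::size_of;

/// Hypothetical zero-sized error: it carries no data, only its identity.
#[derive(Debug, PartialEq)]
struct Disconnected;

/// Zero-sized types can still provide behavior through traits.
trait Greeter {
    fn greet(&self) -> &'static str;
}

struct English;

impl Greeter for English {
    fn greet(&self) -> &'static str {
        "hello"
    }
}

/// Hypothetical operation that fails with the zero-sized error.
fn recv(connected: bool) -> Result<u32, Disconnected> {
    if connected { Ok(7) } else { Err(Disconnected) }
}

fn main() {
    assert_eq!(size_of::<Disconnected>(), 0);
    assert_eq!(size_of::<English>(), 0);
    println!("{} {:?}", English.greet(), recv(false)); // hello Err(Disconnected)
}
```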

~~~
sagichmal
By opening with "Not true" you're establishing a contrarian position, which
puts people -- likely the author, potentially even the reader -- on the
defensive, emotionally.

It's sufficient and actually a lot nicer to simply state your point: e.g.
"Zero-sized structs are quite useful too."

~~~
littlestymaar
I agree from the author's point of view, but as a reader I enjoy contradiction
and argumentation because that's where I learn the most. When I see someone
starting with `not true` or `I disagree`, I'm immediately interested in reading
more. YMMV though.

------
b0rsuk
> I’ve written a few hundred lines of Rust over the last 4 years, but I’m
> honestly still pretty bad at Rust and so my goal is to learn enough that I
> don’t get confused while writing very simple programs.

This makes me feel hopeless, as I'm only about to start using Rust in my hobby
projects after reading the essential book chapters. I hope it's just excessive
humility on her part? At the same time, I'm excited because if I commit
myself to mastering such a language it can make me stand out. I still have an
opportunity to be an early adopter, and have a head start in a promising new
language.

~~~
oconnor663
The IRC channel and the r/rust subreddit are both very helpful for new
rustaceans who get stuck, so don't hesitate to reach out.

~~~
littlestymaar
Don't forget Stack Overflow! Not every answer is easy to google there yet, but
/u/shepmaster is doing an amazing job as a curator. You usually get an answer
in less than half an hour (assuming he's awake, but I'm not even sure he
sleeps :p)

------
shock
> I know in Java you have boxed pointer versions of primitive types, like
> Integer instead of int. And you can’t really have non-boxed pointers in
> Java, basically every pointer is allocated on the heap.

That's not true. In Java, pointers can very well be allocated on the stack, but
the _objects that they point to_ will be on the heap.

~~~
Rusky
It looks like the article pretty consistently uses the phrase "the pointer is
allocated ..." to mean that that's where it's pointing _to_.

~~~
shock
So the article is pretty consistently misleading/incorrect. A pointer is a
data structure like any other. In fact, Java is pass-by-value: the pointer
values are copied when objects are passed as function arguments.

~~~
Rusky
To me it seems it's just using different terminology than you expected. I've
heard and used the article's version plenty of times and it generally works in
context.
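The Rust analogue of that distinction can be checked directly: a `Box` handle is a one-word value that lives wherever its variable lives (here, on the stack), while the data it points to is on the heap, and moving the handle does not move the data. A sketch with a hypothetical `heap_address` helper:

```rust
/// Hypothetical helper: takes the Box by value (only the one-word pointer
/// is moved) and reports where its heap data lives. The allocation is freed
/// when `b` is dropped at the end of this function; the returned address
/// is only compared, never dereferenced.
fn heap_address(b: Box<[u8; 1024]>) -> *const u8 {
    b.as_ptr()
}

fn main() {
    let b = Box::new([0u8; 1024]);
    let before = b.as_ptr();

    // The handle itself is one word, however large the pointee is.
    assert_eq!(
        std::mem::size_of::<Box<[u8; 1024]>>(),
        std::mem::size_of::<usize>()
    );

    // Moving the Box copies that word into the callee; the heap data stays put.
    let after = heap_address(b);
    assert_eq!(before, after);
    println!("heap data stayed at {:?}", after);
}
```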

------
tempodox
For me, the question in Rust is not, what's a reference. But how do I find all
functions applicable to a given type? In C/C++, I can just grep the header
files for the type name and voilà. I find header-less languages like Rust or
Swift really obscure in that way.

~~~
simias
Interesting, if there's one thing I really don't miss in Rust it's bloody
headers.

Can't you simply use the docs? When I code in Rust I generally have the docs
open: [https://doc.rust-lang.org/std/vec/struct.Vec.html](https://doc.rust-lang.org/std/vec/struct.Vec.html)
(or, more likely, the locally installed version).

Most crates have documentation available as well (generally linked directly
from their entry on crates.io) and if it's not online for some reason you can
just run "cargo doc" to generate it locally. Randomly taking the "image" crate
as an example:
[https://docs.rs/image/0.17.0/image/](https://docs.rs/image/0.17.0/image/)

Beats grepping header files IMO.

~~~
jcelerier
> Beats grepping header files IMO.

Who greps header files in 2017 (or even 2010)? Just fuzzy search a few
characters that more or less look like what you want in your IDE's search
box.

~~~
jstimpfle
I still grep header, as well as implementation, files a lot.

I miss being able to fuzzy search sometimes, but I keep coming back to vim.
IDEs just don't cut it for me. They are too slow (Visual Studio 2017 on my
desktop from 2011 is unbearable for even starting a new project). And most
things I really need to do - in vim they are a few memorized keypresses or a
plain shell command in a Makefile away, while in IDEs I have to dig through
wizards which really brings me out of the zone.

Not relying on API search much has the huge advantage of not relying on
external APIs, which leads to good modularization. As a general rule, a module
shouldn't call into other modules much.

And by the way it's the same for OOP: OOP has the advantage of supporting IDE
member/method autocomplete (noun first syntax), but it's just the wrong
mindset for me and leads to really broken architectures.

~~~
wasted_intel
> Not relying on API search much has the huge advantage of not relying on
> external APIs, which leads to good modularization. As a general rule, a
> module shouldn't call into other modules much.

When writing Rust, you'll likely use the standard library a _lot_; this rule
might not be as applicable as in other languages/environments.

~~~
jstimpfle
What kind of use do you mean exactly?

~~~
wasted_intel
Data structures (vectors, hashmaps, trees), I/O, etc. are all part of the
stdlib, and their rich feature sets make an API reference essential. You can
certainly write Rust without it, but you’d be missing out on a lot of useful
functionality.

------
tybit
So in comparison to C++, would it be correct to say that Box<T> is like
unique_ptr<T>, Vec<T> is vector<T>, and that references are the same in both
languages?

~~~
kibwen
Rust's Box and Vec are analogous to C++'s unique_ptr and vector, yes. But
references in Rust really aren't anything like C++ references, given that Rust
references 1) are first-class, 2) come in two varieties (mutable/exclusive and
immutable/shared), 3) feature mechanically-checked lifetimes, and 4) will be
two words in size (rather than one) if the underlying type is dynamically-
sized.

~~~
jcelerier
> Rust references 1) are first-class

What makes them more first-class than C++ references? E.g., in C++, given a
type T, you can use `std::add_lvalue_reference<T>`, `std::remove_reference<T>`,
overload on references, check if a type is a reference to another...

~~~
Sharlin
C++ reference types are first-class. But instances of reference types are not
first-class _values_. References are not objects, in standard speak: they do
not have a memory location, you can't take their address, you can't pass them
to functions (a reference parameter means passing a value by reference, not a
reference by value), and so on. Rust references are more like C/C++ pointers
or Java references in that they are actual values, and AFAIK Rust functions,
like Java and C functions, are strictly pass-by-value.
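A small sketch of what "actual values" means for Rust references: they can be returned from functions, reassigned, stored in a `Vec`, and referenced in turn. `larger` and `deref_twice` are hypothetical helpers.

```rust
/// Hypothetical helper: references are passed in and returned by value.
fn larger<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if *x >= *y { x } else { y }
}

/// Hypothetical helper: takes a reference to a reference.
fn deref_twice(rr: &&i32) -> i32 {
    **rr
}

fn main() {
    let (a, b) = (1, 2);
    let mut r = &a;
    println!("{}", r); // 1
    r = larger(r, &b); // the reference variable now points at `b`
    let refs: Vec<&i32> = vec![&a, &b, r]; // references stored in a Vec
    println!("{:?} {}", refs, deref_twice(&r)); // [1, 2, 2] 2
}
```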

~~~
wyldfire
This is a really helpful distinction, thanks for clarifying.

------
PaulBGD_
Fantastic explanation! Coming from a C background, I definitely got confused by
some of the lifetime things that Rust does.

