
Where Rust really shines - Manishearth
http://manishearth.github.io/blog/2015/05/03/where-rust-really-shines/
======
hckr1292
I've been following Rust for a long time now, and I'm very excited about the
possibilities. However, I don't have a CS degree, and my work as a full stack
web developer has only put me in contact with garbage collected languages:
NodeJS, PHP, and a little Python.

I just spent an entire week working on a chatbot in Rust, and I found the
steep learning curve to be very challenging. While Rust seems to be a big win
for teams that are accustomed to C++/C development, it does seem to be very
demanding for web devs who always work with a garbage collector.

In particular, I was trying to use Iron to write an http server that could
receive web hooks. Iron, quite sensibly, handles each incoming request to a
server in a separate thread. However, as a beginner it's very challenging to
figure out how to persist data between requests without writing to disk or
persisting to a db. Mutating state across threads is hard. Currently, there's
also no single-producer, multiple-consumer version of channels. I'm
sure as the ecosystem develops, more examples will make this easier, but for
now it's surprisingly difficult to figure out.
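
For readers hitting the same wall, here is a minimal sketch of one common way to keep state in memory across per-request threads: wrap it in `Arc<Mutex<...>>` and give each thread its own clone of the `Arc`. It deliberately does not use Iron; `HookCounts`, `handle_webhook` and `simulate` are hypothetical names standing in for whatever a real handler would do.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory store shared by every request handler.
type HookCounts = Arc<Mutex<HashMap<String, u32>>>;

/// Stand-in for a per-request handler: records one hit for `source`.
fn handle_webhook(store: &HookCounts, source: &str) {
    // The lock is released when `counts` goes out of scope.
    let mut counts = store.lock().unwrap();
    *counts.entry(source.to_string()).or_insert(0) += 1;
}

/// Simulates `n_requests` requests, each handled on its own thread.
fn simulate(n_requests: usize) -> u32 {
    let store: HookCounts = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..n_requests)
        .map(|_| {
            // Each thread gets its own handle to the same map.
            let store = Arc::clone(&store);
            thread::spawn(move || handle_webhook(&store, "github"))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let counts = store.lock().unwrap();
    let hits = counts.get("github").copied().unwrap_or(0);
    hits
}

fn main() {
    println!("hits recorded: {}", simulate(8));
}
```

The mutex serialises writers, so this suits small shared state rather than anything contended.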

Fortunately, the #rust IRC channel supported me all the way. It's a very
friendly and active community, but without their support I would have probably
given up.

I look forward to seeing more posts at this level of detail help those of us
struggling to learn Rust understand how the experts use Rust.

~~~
joslin01
So I'm not sure I fully understand why you would target Rust as a web
development language in the first place except pure curiosity of working in
the language, but it appears there's more to the story. You're excited about
Rust's possibilities -- in web development or just in general? And you gripe
it's "demanding for web devs who work with a garbage collector." This makes me
pause and think, why are we talking about web devs in the first place? From
what I understood, Rust is a systems language and by systems language, it
means stuff a lot lower level than `route '/me' => controller.toMe!`

The thing about web dev is that (1) it's a get-things-done playing field and
(2) primarily just data delivery. The idea that you would want a language that
doesn't have garbage collection is just silly. Why? What complex operations
are you doing in each request/response cycle that demands this sort of
computational horsepower? And if you are doing said complex operations, maybe
you should rethink how you're delivering that data and creating it? GitHub is
huge and runs on Ruby. Surely you aren't looking to use Rust to improve
your performance, are you?

> Mutating state across threads is hard

Yep, so maybe you should target a language / framework that abstracts this
away from you? I wouldn't say anything usually but you say it's "surprisingly
difficult to figure out." Come on really.. programming languages are not
created just for web devs...

I don't know. If you're trying to be productive AND change your toolset, it
sounds like you just want a typed language. I wouldn't bet too much of
your time-and-focus-chips on Rust becoming a language for web developers. Do
you see many people using C++ to write their servers? Nope. And don't take my
message the wrong way. It really just bewilders me because I see a comment
like "Yea I want to help people like you!" and I'm like Huh??

~~~
Manishearth
FWIW Rust is intended to be more than a systems language, but is designed as a
systems language first (but other use cases are all taken into consideration)

------
alkonaut
As a novice I find that when the compiler starts telling me to do things it
will go one of two ways: either what the compiler tells me to add will quickly
propagate until it's happy OR it will propagate until lifetime arguments have
spread through the entire codebase, turning everything into token salad. The
hard part is having a mental model of the data so you can reason about
lifetimes. The compiler may help you to get it formally correct, but what
would be really cool would be if it would help you find the "inversions" or
"isolations" that you will need in order to not end up with lifetime soup.

~~~
Manishearth
Yum, salad. In my experience this happens when you overspecify a lifetime or
something. After I got this code working, I messed with the lifetimes a bit --
I added one extra `'a` tying the lifetimes of something else to existing bound
references. That propagated like wildfire and finally became unusable (proving
that those two references could not/would not have the same lifetime).

In more complex situations, it's generally worth thinking which lifetimes
should be bound together for the first few functions and structs. After that
point just trust the compiler.

It's not too hard to get a mental model of lifetimes, really, and once you
have it you can look at lifetime parameters and figure out what they mean and
how they work with each other. You're anyway implicitly thinking of them as
scopes. I usually just ignore lifetime parameters -- they're something you
learn to gloss over whilst reading Rust code, and I only really read them (and
try to understand them) when I have lifetime errors. Sometimes the lifetime
error is due to something unfixable, in which case the compiler will often
lead me on a wild goose chase saladifying the entire codebase, or the
lifetime error is fixable, in which case the compiler's suggestions _usually_
work. Sometimes they don't, but in those situations you can still fix it by
trying to understand the why of it (like how I did in this post, though in the
case of this post the entire "understand the why of it" was an afterthought
because things worked).
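
To make the "overspecified lifetime" point concrete, here is a small sketch that is not taken from the post's code; `Pair`, `keep_long` and `demo` are made-up names. With two separate parameters the compiler accepts it. If both fields were given the same `'a`, the result of `keep_long` would be limited to the shorter borrow and `demo` would stop compiling.

```rust
/// Two independent lifetime parameters: the two references may be valid
/// for different lengths of time.
struct Pair<'a, 'b> {
    long: &'a str,
    short: &'b str,
}

/// Returns only the long-lived half; the result is tied to `'a`, not `'b`.
fn keep_long<'a, 'b>(pair: Pair<'a, 'b>) -> &'a str {
    println!("done with the short-lived part: {}", pair.short);
    pair.long
}

fn demo() -> String {
    let long = String::from("lives for the whole function");
    let kept;
    {
        let short = String::from("lives for this block only");
        kept = keep_long(Pair { long: &long, short: &short });
    } // `short` is dropped here, but `kept` only depends on `long`
    kept.to_string()
}

fn main() {
    println!("{}", demo());
}
```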

~~~
alkonaut
I bet there are a few common "patterns" to rust ownership, and equally many
pitfalls and rookie mistakes when one fails to pick one of these patterns. It
becomes extra hard if you approach Rust from a Java/Python/C# perspective
where the slightest bit of allocation/RAII and mostly even thinking of the
stack feels completely alien. You have to learn _that_ and Rust.

I'd love to see a kind of rust ownership tutorial in which you are asked to
address a simple CS problem where these patterns occur in Rust. For example,
many seem to find it hard to write a factory type. The next problem could be a
doubly linked list and so on.

------
SamReidHughes
The article says you _have_ to clone the vector, as if it's not completely
normal to have one object hold a reference to another, "_unsafely_," in C++,
without the world coming to an end. Having one object contain a reference or
pointer to another in C++ is completely normal and something you certainly
would "dream" of doing -- it happens all the time, you can find it everywhere
in the STL, so I don't know what the blog author is getting at. Looking at the
diff, the type SubstructureFields already has the lifetime parameter on it,
and making FieldInfo, which is only used as a field in the SubstructureFields
type (which owns its FieldInfo objects), be subject to the same lifetime
constraint, is something that would be perfectly ordinary C++ and it registers
near the bottom of the use-after-free risk scale.

~~~
pcwalton
> Looking at the diff, the type SubstructureFields already has the lifetime
> parameter on it, and making FieldInfo, which is only used as a field in the
> SubstructureFields type (which owns its FieldInfo objects), be subject to
> the same lifetime constraint, is something that would be perfectly ordinary
> C++ and it registers near the bottom of the use-after-free risk scale.

Which is why it works. But isn't it nice when the compiler is aware of the
"use-after-free risk scale" for your data structures?

~~~
SamReidHughes
I prefer technical accuracy in technical blog posts. If you want somebody to
preach at about the merits of compiler-checked safety, it's not me.

~~~
Manishearth
It was accurate. See my comment below -- just because `SubstructureFields` had
a lifetime parameter already doesn't mean it was borrowing data of a similar
lifetime. That data being borrowed could probably live longer than the
attributes, but fortunately the attributes live long enough that I can equate
the lifetimes. I was able to use the same lifetime parameter by luck -- there
was a good chance it wouldn't have compiled and I would have had to introduce
another lifetime. (Though since this was a fix the compiler suggested I didn't
need to worry too much about that happening -- generally those just work)

Just having a lifetime doesn't mean that it's safe to put any random borrowed
data in. In C++ we could have a single pointer to a _very_ long-lived struct
("SubstructureFields"), and wish to introduce another struct ("FieldInfo")
which contains a pointer to something that is shorter lived. Note that in
large codebases knowing which is "longer lived" is not easy, so from the
programmer's perspective there are just two pointers. Assuming that "Okay, we
don't have any segfaults now due to the first pointer, introducing the second
FieldInfo pointer should be fine then" would be fallacious -- we might be
accessing data during a period of time when the first, original pointer is
alive, but the second is invalidated. Use after free.
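
A heavily simplified sketch of the shape being discussed, assuming stand-in types rather than the real rustc ones (`Attribute` here is a toy, and the fields are invented for illustration). `FieldInfo` borrows the attributes instead of cloning them, and `SubstructureFields` reuses its one lifetime parameter for it. This compiles only because both borrowed sources outlive `sub`.

```rust
/// Toy stand-in for `ast::Attribute`.
struct Attribute(String);

/// Borrows the attributes rather than owning a cloned vector.
struct FieldInfo<'a> {
    name: &'a str,
    attrs: &'a [Attribute],
}

/// One lifetime parameter, shared with every `FieldInfo` it owns.
struct SubstructureFields<'a> {
    fields: Vec<FieldInfo<'a>>,
}

fn describe(s: &SubstructureFields<'_>) -> Vec<String> {
    s.fields
        .iter()
        .map(|f| {
            let attr_names: Vec<&str> = f.attrs.iter().map(|a| a.0.as_str()).collect();
            format!("{}: {}", f.name, attr_names.join(","))
        })
        .collect()
}

fn demo() -> Vec<String> {
    let names = vec![String::from("x"), String::from("y")];
    let attrs = vec![Attribute("inline".into()), Attribute("cold".into())];
    let sub = SubstructureFields {
        fields: names
            .iter()
            .map(|n| FieldInfo { name: n.as_str(), attrs: attrs.as_slice() })
            .collect(),
    };
    // drop(attrs); // rejected if uncommented: `attrs` is still borrowed by `sub`
    describe(&sub)
}

fn main() {
    for line in demo() {
        println!("{}", line);
    }
}
```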

~~~
SamReidHughes
I'm trying to look more closely at the situation and will compose an answer,
hopefully soon, but after going to Staples to get a $50 DVI cable so I can
plug this workstation into a monitor (I just moved) and running make -j33 to
build the rust compiler and play around with it, it's been thirty minutes and
this is what I see:
https://i.imgur.com/2JL66Fg.png so bear
with me.

Update: You don't have to wade through a lot of code at all. All the code is
taking AST parameters that are the source of these Attribute vectors by const
reference and returning stuff like a P<Expr> and P<Item> and the like, AST
stuff that by its nature doesn't have references with tricky lifetime
dependencies to other far-flung AST stuff. So in C++ you can see right from
the type signatures (and some _basic_ institutional knowledge) that you're OK.

You can see that the lifetimes are not long or dangerous. SubstructureFields
is used in Substructure. A simple grep for that type shows a bunch of
functions that take a Substructure by const reference and return a P<Expr>.
There is nothing holding wiggly little references to Substructure objects or
the like.

~~~
pcwalton
You can easily write a function in C++ that takes an object by const reference
and creates a dangling reference without returning it. Such functions
regularly result in use-after-free (sometimes causing security
vulnerabilities) in the wild. Running grep on the signatures of functions is
not a sound analysis.

~~~
SamReidHughes
You can pretty well see by grepping whether it's going to create a dangling
reference or not. For example, in C++ codebases I'm most familiar with,
anything that takes a const foo& isn't going to retain a reference, and for
unfamiliar stuff that takes a const foo* or foo*, you can look at the outputs,
which, if it's a P<Expr>, all you need is the institutional knowledge that
Exprs don't have dangling references.

You don't need to debate with me the merits of replacing visual analysis with
sound analysis. (If I have a negative opinion of Rust on the matter, it's that
it's not good enough at that.) My beef here is with the way the blog article
overstates the case, saying you just wouldn't do this sort of thing or this
specific thing. You so would make temporary references deep into an AST while
expanding derived implementations.

------
shafiee01
This is really useful; I have been recently working on a huge c++ project and
I had the exact same problem you mentioned with large vectors. The way I
handled this problem was to wrap vector in a class which provides locking
mechanism for expanding operations so while the vector is growing nobody can
access it. I am excited to try Rust soon.
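
For comparison, a rough Rust analogue of that wrapper, assuming a design where readers get copies rather than references; `SharedVec` is a hypothetical name. Growing the vector takes the write lock, so no reader can observe it mid-reallocation.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical wrapper: the vector can only be reached through a lock.
struct SharedVec<T> {
    inner: RwLock<Vec<T>>,
}

impl<T: Clone> SharedVec<T> {
    fn new() -> Self {
        SharedVec { inner: RwLock::new(Vec::new()) }
    }

    /// Growing the vector takes the exclusive (write) lock.
    fn push(&self, value: T) {
        self.inner.write().unwrap().push(value);
    }

    /// Readers share the lock and receive a copy, never a reference
    /// into the buffer.
    fn get(&self, index: usize) -> Option<T> {
        self.inner.read().unwrap().get(index).cloned()
    }

    fn len(&self) -> usize {
        self.inner.read().unwrap().len()
    }
}

fn demo() -> usize {
    let shared = Arc::new(SharedVec::<usize>::new());
    let writers: Vec<_> = (0..4usize)
        .map(|t| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..100usize {
                    shared.push(t * 100 + i);
                }
            })
        })
        .collect();
    for writer in writers {
        writer.join().unwrap();
    }
    assert!(shared.get(0).is_some());
    shared.len()
}

fn main() {
    println!("elements after all writers finished: {}", demo());
}
```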

------
picardo
For an outsider, Rust is still barely usable. I tried building a simple demo
using the beta release today, and regretted it right away. Lots of "unstable
feature" errors that I can't figure out how to get rid of. Many useful
features are unstable. I don't think I'm going to touch it again for a few
months.

~~~
sjolsen
I tried using Rust the other day, and I had the same problem. What I found
indicated that "unstable" features are permanently disabled in all but the
nightly releases.

~~~
picardo
Ah, gotcha. Well, that was not clear to me.

------
hoodedmongoose
This is really exciting, and clearly more powerful than what you get in C/C++.
That being said:

> In a language like C++ there’s only one choice in this situation; that is to
> clone the vector.

What about a shared_ptr to the vector?

~~~
coherentpony
Or scoped_ptr to transfer ownership.

There's certainly _more_ than one choice, though.

~~~
imron
Not really, and I say this as a fan of C++.

These things protect against one part of the problem - deletion of the vector,
but not the other part - mutation of the vector, causing a new internal
allocation, and leaving references to any elements of the previous vector
invalid.

It's possible to make sure this isn't happening in C++ if a) your code base is
small enough and b) you are careful, but the point of the article is that with
Rust, you can make the changes and be confident that the compiler will fail if
you do anything dangerous.

With C++, you can make the changes and through careful checking be reasonably
confident everything works (perhaps only to find out later that you were
wrong). It's very different from having the compiler _verify_ that you are not
doing something unsafe.
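
A tiny sketch of the reallocation case in Rust; the function name is made up. Pushing while a reference into the vector is still in use is a compile error, and the same push is accepted once the reference is no longer used.

```rust
/// Reads the first element, then grows the vector.
fn first_then_push(mut v: Vec<i32>) -> (i32, usize) {
    let first = &v[0]; // shared borrow pointing into the vector's buffer
    // v.push(4);      // rejected if uncommented: `v` is still borrowed by `first`
    let copy = *first; // last use of the borrow
    v.push(4);         // accepted: may reallocate, but no reference is live
    (copy, v.len())
}

fn main() {
    let (first, len) = first_then_push(vec![1, 2, 3]);
    println!("first = {}, len = {}", first, len);
}
```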

------
jeorgun
Maybe I'm missing something, but why is it crucial that you hold on to a
reference to the _contents_ of the vector, and not the vector itself? Because,
to me, the obvious "C++ way" to do it would be something like

    
    
        struct FieldInfo {
          //
          const std::vector<ast::Attribute>& attrs;
        };

~~~
pcwalton
Well, if the object that owns the vector is destroyed then that reference will
go dangling. The compiler can't (soundly) check this.
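
For contrast, a sketch of what the Rust counterpart of that struct looks like; `Owner`, `borrow_from` and the toy `Attribute` are assumptions for illustration. The borrow becomes part of the type, so the dangling variant (left commented out) is rejected at compile time.

```rust
/// Toy stand-in for `ast::Attribute`.
struct Attribute(&'static str);

struct Owner {
    attrs: Vec<Attribute>,
}

/// Rust counterpart of the C++ struct above: the borrow is part of the type.
struct FieldInfo<'a> {
    attrs: &'a Vec<Attribute>,
}

// Rejected by the compiler if uncommented: the returned value would refer
// to `owner`, which is destroyed when the function returns.
//
// fn dangling<'a>() -> FieldInfo<'a> {
//     let owner = Owner { attrs: vec![Attribute("inline")] };
//     FieldInfo { attrs: &owner.attrs }
// }

/// Accepted: the owner is supplied by the caller and outlives the result.
fn borrow_from<'a>(owner: &'a Owner) -> FieldInfo<'a> {
    FieldInfo { attrs: &owner.attrs }
}

fn demo() -> &'static str {
    let owner = Owner { attrs: vec![Attribute("inline")] };
    let info = borrow_from(&owner);
    info.attrs[0].0
}

fn main() {
    println!("{}", demo());
}
```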

------
shin_lao
Actually having a constant reference as a member works as expected in C++
because the compiler will force you to initialize it at construction.

It is true it will not check at compilation that you access a const reference
of a destroyed object, but to be honest with RAII this is not really a problem
as the lifetime of your objects should be pretty clear, by construction.

In debug mode your program will quickly assert if you access a destroyed STL
container.

To be honest I think in this case:

- either you want to snapshot the value and therefore you should clone the
container (almost no performance cost for small containers)

- you want to access the current value but then it means you know about the
lifetime.

~~~
al2o3cr
"the lifetime of your objects should be pretty clear, by construction"

Given the potentially-unbounded consequences of triggering UB in C++, "pretty
clear" seems like not enough.

------
berkut
> And this is despite the fact that I still don’t know where that particular
> vector is modified or destroyed — I didn’t explore that far because I didn’t
> need to! (or want to :P)

I find that a bit worrying - while it might well be an example of what's
technically possible in some cases of using Rust to prevent segfault-type
crashes, it doesn't mean there aren't now logic errors in the code which would
do something just as bad. I guess it depends on how well you know the code and
the context in which it's being used.

~~~
tatterdemalion
> it doesn't mean there aren't now logic errors in the code which would do
> something just as bad

Yes, it does. He wants to have immutable access to some data, the Rust borrow
checker's rules provide certainty that the data is not deleted or mutated
while his code is accessing it.

What could happen that is "just as bad"? Data being referenced cannot be
changed.

~~~
berkut
In this case it does.

I was more talking about the overall not caring theme. I'd hope that's only
because they know the code so can make those sort of assumptions.

Are you saying Rust can find and prevent at compile time logic errors as well?

~~~
tatterdemalion
You "don't care," I'm sure, about all kinds of things when you're programming
- depending on what level you're at. Unless you're drawing up CPU schematics,
there is something that you have abstracted away as no longer your concern.

Rust allows you to not care about more things than C/C++ despite compiling to
a comparable binary. In particular, Rust makes refactoring and extending a
code base much more care-free than many other languages (including high level
languages) because of the strong type and memory safety guarantees of its
semantics.

EDIT (re: your edit) - No one said anything about preventing arbitrary logic
errors, just that what Manishearth did will not introduce logic errors because
of Rust's guarantees. This might not be the data that Manishearth wants, but
the compiler guarantees that Manishearth cannot change the data or delete the
data and that the data is not changed or deleted while he is referencing it.
That is what is so great.

~~~
berkut
What edit? My second comment?

That was my original point regarding logic errors and not caring - yes, it may
well be that Rust gives you much more confidence that you don't do
invalid/wrong things. But that excludes logic errors (it may well reduce the
possibility of logic errors, but it doesn't prevent them), and there's still
room in large code bases for things to break even with very good static type
checking.

------
thrownaway2424
A C++ function returning pointer to vector, or a public field of that type,
would be considered bad API for the reasons mentioned in this article. The
solution is to not do that. I'm not sure that Rust's ability to allow shared
ownership of a vector makes this API pattern any better.

~~~
lobster_johnson
But it's a bad pattern in other languages because they don't guarantee safe
access. Rust does, thus making it a good pattern.

It's like saying that driving your car off a cliff is always a bad idea even
when you've acquired a car that can fly.

~~~
thrownaway2424
It's a bad pattern in any language because it exposes too many details of
storage layout and violates encapsulation. You probably don't want to expose
internal implementation details in your API unless such exposure is critical
to performance, and if it is critical to performance it seems unlikely that
you'd be writing it in anything other than C++ anyway.

~~~
lobster_johnson
This seems orthogonal to internal implementation hiding, especially if the
"API" is not a public library API, say, but an internal one.

Then again, contracts about who can modify what data are also part of any
public API; historically, there have been plenty of (quite decent) APIs that
have returned pointers to internal structures that have limited lifetimes, in
languages like C that don't enforce lifetimes at all.

------
ExpiredLink
People don't choose a language. They choose a platform (or stack or
environment). If you want to evaluate a language look at the language's
platform. Currently Rust is not part of any platform but may become the
language of the 'Mozilla platform' in the future.

~~~
grayrest
There's no overhead in calling a C API nor in other code calling Rust code as
if it were a C API. There's FFI setup and some caveats both ways, but a lot of
the larger Rust projects (e.g. Servo, Piston) involve interfacing with
pre-existing C libraries.
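
A minimal sketch of the "Rust code called as if it were a C API" direction; all names are hypothetical. The function uses the C calling convention and is invoked through a plain C-style function pointer. Exporting it to real C code would additionally need a stable symbol name (the `no_mangle` attribute) and a library crate type, which are left out here.

```rust
use std::os::raw::c_int;

/// A Rust function that uses the C calling convention.
pub extern "C" fn add_from_rust(a: c_int, b: c_int) -> c_int {
    a + b
}

/// What a C caller would hold: a plain C function pointer.
type CAdd = extern "C" fn(c_int, c_int) -> c_int;

/// Calls through the pointer exactly as foreign code would, with no
/// wrapper or marshalling layer in between.
fn call_through_c_abi(f: CAdd) -> c_int {
    f(2, 3)
}

fn main() {
    println!("{}", call_through_c_abi(add_from_rust));
}
```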

------
vbit
Garbage collected languages handle this fine too.

~~~
jwilm
They handle it fine by tracking references at runtime. The advantage here is
compile time verification that the reference will stay valid for the life of
the struct - zero runtime overhead.
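
A loose illustration of the difference, using `Rc` as a stand-in for runtime tracking (it is reference counting, not a tracing collector, so treat this as an analogy); `Holder` and the function names are made up. The `Rc` version maintains a count while the program runs. The borrowed version stores nothing beyond the reference itself.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Holds a plain borrow; validity is checked entirely at compile time.
struct Holder<'a> {
    data: &'a [u8],
}

/// Runtime tracking: cloning the handle bumps a counter stored on the heap.
fn runtime_tracked() -> usize {
    let first = Rc::new(vec![1u8, 2, 3]);
    let second = Rc::clone(&first);
    let count = Rc::strong_count(&first);
    drop(second);
    count
}

/// Compile-time checking: no counter exists anywhere.
fn compile_time_checked() -> usize {
    let bytes = vec![1u8, 2, 3];
    let holder = Holder { data: &bytes };
    holder.data.len()
}

/// The struct is exactly as large as the reference it wraps.
fn holder_is_just_a_reference() -> bool {
    size_of::<Holder<'static>>() == size_of::<&[u8]>()
}

fn main() {
    println!("strong count: {}", runtime_tracked());
    println!("borrowed length: {}", compile_time_checked());
    println!("no extra bookkeeping: {}", holder_is_just_a_reference());
}
```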

