Non-Lexical Lifetimes in Rust (smallcultfollowing.com)
135 points by kibwen on May 2, 2016 | 58 comments

Short background: in today's Rust, the lifetime of a reference is bound to a lexical scope. Sometimes this forces one to create temporary variables solely to change the lexical scope of a reference, which is an annoyance. Take the following example:

    let mut nums = Vec::new();
    nums.push(nums.len());
The above code would compile if we had non-lexical lifetimes, but today it doesn't. This is because method calls are effectively desugared to something like this:

    let mut nums = Vec::new();
    Vec::push(&mut nums, nums.len());
The arguments are evaluated left-to-right, and the mutable borrow prevents the immutable borrow later. Once non-lexical lifetimes arrive, this will be possible thanks to control-flow analysis, but today we have to make a temporary:

    let mut nums = Vec::new();
    let length = nums.len();
    nums.push(length);
It doesn't involve any new user-facing language features and is fully backward-compatible, so it's basically just a straight quality-of-life improvement for Rust programmers.

I'm not quite sure how non-lexical lifetimes is supposed to fix this issue. Control-flow analysis doesn't change the fact that the arguments are evaluated in left-to-right order, and it doesn't change the fact that the &mut reference is created prior to the method arguments being evaluated and lives until the method itself is invoked. Control flow analysis will simply confirm the fact that the &mut borrow exists while the arguments are evaluated.

What might fix this is a special-case rule where an implicit¹ & or &mut reference as a method receiver is temporarily ignored while the arguments are being evaluated (with the borrow checker double-checking after the arguments are evaluated that the & or &mut reference is still legal, i.e. that the referenced value wasn't moved and, in the case of a &mut reference, that no other borrow exists as a result of evaluating the arguments). This rule, being a special case, doesn't require non-lexical lifetimes.

¹I say implicit because it would be potentially confusing for an explicit reference to be ignored while the arguments are evaluated.
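A minimal sketch of how that special case would behave (illustrative only; the "reservation" semantics are the proposal above, not 2016 Rust):

```rust
fn main() {
    let mut nums: Vec<usize> = Vec::new();

    // Under the rule above, the implicit `&mut nums` for the receiver would
    // be ignored (merely reserved) while the argument `nums.len()` -- a
    // shared borrow -- is evaluated, so this line would be accepted:
    // nums.push(nums.len());

    // In today's Rust, the same effect needs a temporary:
    let len = nums.len();
    nums.push(len);
    assert_eq!(nums, vec![0]);
}
```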

You're correct in that I was painting with too broad a stroke by attributing all of NLL to control-flow analysis, I was just trying to be brief. :) Having knowledge of control flow is just one of the approaches that will be used to achieve non-lexical lifetimes, though it is particularly applicable when also talking about MIR given that MIR is structured as a control-flow graph.

The assumed fix here is to distinguish between a mutable borrow and the reservation of a location to later do a mutable borrow. Locations based on local variables and their fields can be reserved, although a move out of the variable would invalidate a reservation. Other locations (e.g. the result of a function call) can't be reserved and would give an error.

Seems like it would be confusing to allow

  let mut nums = Vec::new();
  nums.push(nums.len());
but disallow

  let mut nums = Box::new(Vec::new());
  nums.push(nums.len());
The second example would have to be disallowed because the mutable reference is created by calling DerefMut::deref_mut().
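A sketch of the difference (the desugaring in the comments is approximate, for illustration):

```rust
fn main() {
    let mut boxed: Box<Vec<usize>> = Box::new(Vec::new());

    // `boxed.push(boxed.len())` desugars to roughly:
    //   Vec::push(DerefMut::deref_mut(&mut boxed),
    //             Deref::deref(&boxed).len());
    // The mutable reference comes back from a function call, so there is no
    // plain local location to "reserve" -- hence it would stay an error.

    // Written with a temporary, it compiles today:
    let len = boxed.len();
    boxed.push(len);
    assert_eq!(*boxed, vec![0]);
}
```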

The particular case of Box is safe, and is along the lines of special knowledge about Box that is assumed by the type checker, but the special-cased nature of Box has been gradually decreasing over time. I don't think there would be an easy, safe way to allow user-defined types to establish the same guarantee.

In retrospect, changing the evaluation order of a function with respect to its arguments might have been a better choice, but something like the solution I mentioned is probably the only backwards-compatible option.

My guess is that if a mutable reference dominates the lifetimes of all its borrows, then an exception to the rule can be made.

That Niko's text-arrow lifetime annotations are so helpful here suggests a Rust IDE could offer similar highlighting or annotations to explain borrow checker errors.

One of the ergonomic hurdles to rust adoption by experienced C++ users is that the language looks similar enough that such people may think they fully understand the semantics even before they've fully internalized all the details. It's a tricky zone when something feels familiar enough that you don't realize the remaining mistakes in your conceptualization.

At this very moment there is a PR from Niko that is about to land that will overhaul borrow checker error message reporting (and error formatting in general): https://github.com/rust-lang/rust/pull/32756 .

The borrow checker already outputs these arrows.

This output is improving, too https://github.com/rust-lang/rust/pull/32756 -- feedback appreciated!

My understanding was that implementing this is dependent on MIR fully landing, is that correct?

In an extremely strict sense, no, but in reality yes. MIR makes it so much easier to implement a feature like this that we will wait for it before adding this.

So is it likely that this will be a Rust 2.x kind of thing?

Not at all – 2.0 means a breaking change, but currently the borrow checker is just too restrictive. Loosening it a bit won't be a breaking change, as it allows strictly more code, not less. The landing of MIR is an implementation detail (quite a large one at that, but nevertheless), and it shouldn't have any observable effects. (Except for maybe some bugs getting fixed.)

Not at all. MIR is not user-facing, it's a refactoring of compiler internals, and it's getting close to being finished. The reason Niko wrote this blog post was to start thinking about the eventual RFC to specify exactly how they should work.

There are no current plans for a Rust 2.0.

Nope, nowhere near that far off. MIR will be landing this year, and while we don't have a timeframe for NLL yet it's definitely a priority and it's clear that the devs are giving thought to it, as per this post.

> it's basically just a straight quality-of-life improvement for Rust programmers

Nice to read. Are there any other such improvements planned?

Coming mostly from scripting languages, Rust often feels a bit clunky, with workarounds like the one you mentioned, its unwrapping, etc.

If you're aware of Rust's `try!()` macro for error handling, its functionality is being elevated into the language via the new `?` operator, which, in addition to being less verbose, will make for better error messages and also allow us to potentially extend the error propagation to types other than `Result` (such as `Option`). This feature is currently in nightly waiting to be stabilized, and the compiler codebase itself has been converted to use it: https://github.com/rust-lang/rust/pull/32390
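A sketch of the equivalence (the config-reading function and file name are hypothetical):

```rust
use std::fs::File;
use std::io::{self, Read};

// Before: try!(File::open(path)) -- the macro expands to roughly
//   match expr { Ok(v) => v, Err(e) => return Err(From::from(e)) }
// After: the `?` operator does the same thing, but postfix and chainable:
fn read_config(path: &str) -> io::Result<String> {
    let mut s = String::new();
    File::open(path)?.read_to_string(&mut s)?;
    Ok(s)
}

fn main() {
    // A missing file simply propagates out as an Err to the caller.
    assert!(read_config("/nonexistent_dir_xyz/config.toml").is_err());
}
```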

There's also a huge overhaul to compiler error message formatting that should make error messages friendlier and more informative, and should hopefully be landing in nightly sometime today: https://github.com/rust-lang/rust/pull/32756 .

Also, part of the same work that is enabling non-lexical lifetimes (MIR, discussed here last week: https://news.ycombinator.com/item?id=11581681 ) is also going to enable incremental recompilation of Rust code, which should have a dramatic effect on turnaround time when developing code, as you'll spend enormously less time waiting for things to compile.

try!/? doesn't seem to really do much to solve the clunky ergonomics around Result/Optionals because, while shorter, it also creates an enormous amount of "upward pressure" for Results: to use try!/?, your method has to return a Result, which puts pressure on the caller to use try!/?, which then has to be defined to return a Result, which then puts pressure on...etc

So now instead of dealing with errors or none "at the boundaries", the codebase is shot-through with uncertainty just to get access to a less-clunky syntax.

It also doesn't solve silly things like io::stdout().flush().unwrap(), where you have to unwrap an empty return value, because there's nothing sane to do if it's not Ok(()) other than panic. But the only thing that would really solve those ergonomics is proper exceptions.

edit: And if every function is just going to be defined as returning Result<foo> instead of foo, it feels like the language might as well bite the bullet and add a fn definition syntax that sugars the endless repetition of -> Result<foo>, just for readability's sake.

  > So now instead of dealing with Nones "at the 
  > boundaries", the codebase is shot-through with 
  > uncertainty just to get access to a less-clunky syntax.
I'm not sure what this is referring to. The try macro and ? operator are solely for propagation, it doesn't make sense to speak of someone restructuring their code "just to get access" to them. Either you were propagating an error before, or you weren't. And if you weren't propagating it that means that you were handling it, which, if you're capable of doing so, is certainly better than propagating.

As for exceptions, having them is a non-starter given that Rust intends to be useful in environments where unwinding is simply unavailable. Both Dropbox and Firefox make use of Rust code compiled without any unwinding support. So you can either bifurcate your ecosystem with factions that alternately support errors-as-exceptions and errors-as-return-values, or you can rally around a single one and work to make its usage as nice as you can.

> I'm not sure what this is referring to. The try macro and ? operator are solely for propagation, it doesn't make sense to speak of someone restructuring their code "just to get access" to them. Either you were propagating an error before, or you weren't. And if you weren't propagating it that means that you were handling it, which, if you're capable of doing so, is certainly better than propagating.

It's about ergonomics. The concern is what happens when a junior dev hits some Result<> they don't really know how to deal with. Something like this:

    io::stdout().flush().unwrap();
This kind of sucks. The "verb" of this line of code is flush(), and in most languages the verb of a line of code is the last thing on it. It's what people are used to scanning for. .unwrap()s bury it in the middle of the line, which makes code harder to scan for verbs. Additionally, everybody has "unwrap is a code smell" chanted at them by Rust people "in the know".

So I want to make this go away, but there isn't a sensible thing for me to do if the universe is fucked to the point where flush isn't working, so matching on the result feels like enormous overkill because I don't really know what to do with the error.

    io::stdout().flush()?;
Reads a hell of a lot better. It communicates what I'm actually trying to do more simply, and scans a lot better. Of course, it only works in functions that return Result<>s, because it's a Result propagation mechanism, not a do-or-die mechanism.

So if I'm a junior programmer, coming from some less strict language, a bit confused about all of this and I want code to read the way I expect it to read coming from other languages, I might start sprinkling in ?s every time I have to deal with these "Result" thingys. Of course, then the compiler yells at me because my caller has to unwrap Results, so I sprinkle in some more ?s and change my return type, and then...etc.

Pretty soon every function in the codebase returns a Result<> just so the code stays more readable and they can punt errors to main and exit, like they're used to.

tl;dr I think we're likely to see junior devs using a lot more Result<>s than they need to, because ? is more readable and "don't use unwrap" is Dogma.

For what it's worth, on Unix writing to stdout (including flushing a buffer) can fail in an extremely common situation: your process was piped to another, which exited before you called flush. (This also causes you to receive SIGPIPE, but Rust ignores SIGPIPE by default.) For example, `head` exits after the requested amount of data was read, and `less` exits immediately when the user presses 'q'. Since neither of those is an error, if you want to be well behaved, you should probably either ignore write errors or (preferably[1]) check for EPIPE in particular and respond by silently exiting - not panicking, which writes to stderr.
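A sketch of that "well behaved" handling (the exit-on-broken-pipe policy is the suggestion above, not anything std does for you; `write_line` is a hypothetical helper):

```rust
use std::io::{self, Write};
use std::process;

// Write a line to stdout; exit silently if the reader went away (EPIPE),
// and propagate any other I/O error to the caller.
fn write_line(s: &str) -> io::Result<()> {
    let mut out = io::stdout();
    let result = writeln!(out, "{}", s).and_then(|()| out.flush());
    match result {
        // Downstream (e.g. `head` or `less`) closed the pipe: not an error.
        Err(ref e) if e.kind() == io::ErrorKind::BrokenPipe => process::exit(0),
        other => other,
    }
}

fn main() -> io::Result<()> {
    write_line("hello")
}
```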

Not sure how well this generalizes to your broader complaint. One could argue that in general, you really do need to think about errors to write good software, so even if languages with exceptions make it slightly easier to write of code that doesn't, that's actually an antifeature because it encourages sloppiness. But maybe that's overly punitive rather than pragmatic. Still, I don't think the answer is exceptions.

[1] preferably because of course other errors are possible, especially if stdout is redirected to a file on disk - but then, in a quick and dirty program, if you ignore errors, you're doing no worse than umpteen million C programs that call printf without checking the return value.

Solid point, and flushing a buffer isn't necessarily the best example, other than that I think it's an obvious place where a less-than-experienced dev is forced to deal with a Result they don't know what to do with, and might turn to a beguilingly simple operator that seems to make the problem go away but encourages them to just keep punting Results up the call stack.

But you could make the case that I'm being overly pessimistic in focusing on the worst case mess that could be made with this feature.

"Don't use unwrap" is only dogma in library code, because it should ultimately be up to the final library consumer (the application) whether an erroneous situation should cause the program to exit. Relative to this, unwrapping is perfectly acceptable in application code.

That's a message I'm not sure is getting through to end-users.

E.g., plenty of people posting code snippets asking for help in the usual places (/r/rust, the forum, SO) will immediately have 5 commenters jump down their throats if there's an unwrap in sight.

> Relative to this, unwrapping is perfectly acceptable in application code.

Unwrapping may be acceptable in application code - it depends on your application. Specifically it depends on whether you need specific handling of unexpected events.

If your program is a 'script' - by which I mean something that executes for a short period of time to perform a series of tasks and then exits - then panicking on an error is a reasonable way of aborting.

On the other hand, if it's a daemon, then you will likely want to gracefully handle all error conditions and carry on.

I think the issue here is that Rust is primarily used as a systems programming language, and usage tends to fall more into the 'daemon' category. In that mentality you handle every error (even if it's just to print something), and that is probably where the general community pressure toward avoiding unwrap comes from.
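A sketch of the two styles (the file name is hypothetical):

```rust
use std::fs;

fn main() {
    // "Script" style: panic with a message and abort if anything goes wrong.
    // let data = fs::read_to_string("input.txt").expect("couldn't read input.txt");

    // "Daemon" style: handle the error and carry on.
    match fs::read_to_string("input.txt") {
        Ok(data) => println!("read {} bytes", data.len()),
        Err(e) => eprintln!("couldn't read input.txt ({}); skipping", e),
    }
}
```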

> like io::stdout().flush().unwrap(), where you have to unwrap an empty return value, because there's nothing sane to do if it's not Ok(()) other than panic

Try to flush other files, finish off transactions, remove potentially hanging locks, alert on stderr or in logs, etc. I don't agree that there isn't anything sane to do. Every app should have some global "something's fucked, try to clean up as much as possible" code path.

One thing that would be helpful for those looking to write scripts is to use crates which provide a higher level API. So for example, the text-io crate wraps up the lower-level standard library interface to make it easier to use, though it gives you much less control over errors.

Due to Rust's low-level nature, we've focused on the low-level stuff first: the higher-level APIs can then be built on top. I think this kind of thing would be more helpful to you than actual language improvements in many cases.

Yes, I understand that. I had just hoped that the low-level stuff would be done with 1.0.

Can't happen fast enough. For the last two years I've tried Rust every 6 months to see if it's at a place where I would consider using it. I hit the 3 problems described here every time I try it. You can work around all of them by restructuring your code, it's just so tedious. I'd rather be doing something else than working around this.

The community hasn't really been helpful, the answer is always to restructure how you write things. I'm writing a column store query engine (with table blocks, accessing them by columns), which leads to a lot of borrowing, so the advice is not all that helpful. I want to like Rust, but I can't.

> The community hasn't really been helpful, the answer is always to restructure how you write things.

Um, what alternative should there be other than "the compiler should magically figure it out"? Every language has specific idioms that are the way you fix "Issue X"--the Gang of Four book, for example, is effectively an inventory of idioms to deal with C++ weaknesses.

You may be correct in your assessment that Rust is "too mentally expensive" for your task. That's fine. Rust has a very specific target, and that target is systems programs like kernels or Firefox--large programs that have long lifetimes and complicated memory management.

So, for example, if your program restarts processes often, then memory issues pretty much go away, and Rust really doesn't buy you much. If you can tolerate garbage collection, then there are probably much more expressive languages for the task.

I like Rust, but I'm really only going to consider Rust when I'm thinking "C or C++". If I'm thinking Python, Ruby, Go, Swift, etc., Rust probably isn't on the plate for now.

I always wonder about patterns like this, and how they arise. For the kind of Rust code I personally write, I can count on one hand the number of times I needed to deal with a non-lexical lifetime. But it's clearly a pain point for a lot of people, so I wonder where the difference is.

Fixing a compiler error by adding a level of lexical scoping is something I've never had to do in any other language I can think of.

This was shocking to me when I bumped into it.

The problem is that you don't have the tools to even ask the right question to debug Rust code when you start. You have to ask questions on the IRC channel or you will get nowhere for quite a while.

A lot of people who program quite successfully in C or C++ do not really understand the idea of scope (which is what lets so many bugs creep in). Rust smacks you with "lifetime" in the face and makes you deal with it at compile time.

For those of us who are in environments where we have to struggle to debug C, the issues Rust is flagging are simply syntax errors. For other folks, they generally have to upgrade their mental models.

"Oh, that would have escaped the function after having gotten freed. Oops." is easy to understand when you have had to debug that dozens of times in C (generally because you accidentally added a code branch that didn't handle something). That error is not so apparent when you are used to simply returning an object in a garbage-collected language and letting the VM handle the issue.

I run into it infrequently (in particular, kibwen's example), but it's like a mosquito bite. "Oh, yeah, that sucks. Ah well, it's not going to stop me from doing what I was going to do anyway." It's definitely a little weird to hear it's preventing someone from using the language, so I too would like to hear more.

I bet it's to some extent an emotional thing, if you're an experienced C++ programmer. "Argh, yet again the stupid borrow checker is complaining about something I wrote that would work fine in C++ [which I'm used to]! Now I have to rewrite it for no reason." It feels suffocating.

Most of the time, it's because the code pattern you tried to use is safe but fundamentally not expressible in Rust's type system (often due to simultaneous mutability), and you need to switch to a different one - which is annoying, but gets less common as you get more Rust experience, and hey, even if the borrow checker feels limiting, it's the state of the art: nobody has done better, you're living in the future. Occasionally it's because your code is actually wrong and would crash in C++, and that really makes you appreciate Rust.

But sometimes, quite often in fact, it's something entirely local to one block of code that the stupid thing should be able to determine is safe, but can't, because it's stupid. And that's when you flip the table. That's when you question your decision to use Rust. It's like dealing with buggy software: far more annoying if the thing you're trying to do is supposed to work but broken, than if it just isn't supported. It's not the future, just a profusely bleeding edge.

Non-lexical lifetimes solve only a subset of that last category, but a significant subset.

I can imagine it's preventing someone from getting _into_ the language in the sense that if you run into such a situation, you don't have the mental tools to solve it, and it won't feel just a mosquito bite but worse. Even if you manage to solve it, there's the nagging feeling that there might be more annoying hurdles to overcome – but you don't know yet, so you are prone to give it up and use your time for something else.

It's probably because you're used to Rust and know how to write Rust in a way that just avoids all compiler errors. People new to the language try to write Rust code as if it were C++ or something (or, in general, are not yet used to the Rust way of doing things). I have had this issue with every language I've learned so far, most recently Go (which I tried writing as if it were Rust).

So often newcomers end up designing code in a way that, at a higher level, isn't very flexible when it comes to the borrow checker, leading to these issues.

Yeah, this is one of the angles I'm getting at: I wonder if we're doing a poor job communicating how to write proper Rust.

I've also found that basic problems like how to implement typical data structures (without unsafe code) are controversial, difficult, or actually impossible. That's really not good.

Still optimistic about Rust. But it has some fundamental issues to solve before it's generally usable.

You can express data structures safely perfectly fine in Rust as long as you do all the things that make them safe in every other language -- aggressively garbage collect, indirect, and runtime validate.

I'm not aware of any language that lets you implement an interesting high performance data structure totally safely. Heck, Rust is impressive because you can encode singly-linked-stack and binary-tree-without-parent-pointers efficiently and safely without garbage collection! You can even build iterators that are statically verified to iterate over every element at most once, and statically guaranteed to not be invalidated.
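For instance, the singly-linked stack mentioned above can be written with no `unsafe` at all, because `Box` gives each node a unique owner (a minimal sketch):

```rust
// A safe singly-linked stack: each node is uniquely owned by its predecessor.
struct Stack<T> {
    head: Option<Box<Node<T>>>,
}

struct Node<T> {
    elem: T,
    next: Option<Box<Node<T>>>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { head: None }
    }
    fn push(&mut self, elem: T) {
        // `take()` moves the old head out, keeping the borrow checker happy.
        let node = Box::new(Node { elem, next: self.head.take() });
        self.head = Some(node);
    }
    fn pop(&mut self) -> Option<T> {
        self.head.take().map(|node| {
            self.head = node.next;
            node.elem
        })
    }
}

fn main() {
    let mut s = Stack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));
    assert_eq!(s.pop(), Some(1));
    assert_eq!(s.pop(), None);
}
```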

Type systems that are powerful enough to ensure the safety of even moderately complex designs (doubly linked list, tree with parent pointers, singly linked stack, hashmap) appear to be unwieldy and relegated to academia.

> even moderately complex designs (doubly linked list, tree with parent pointers, singly linked stack, hashmap)

In other words, imperative data structures?

My experience with implementing custom data structures in Rust:

(0) Purely functional data structures: You don't even need borrows to express them. The main downsides are: poor locality of reference, inability to parameterize the data structure over whether its nodes are uniquely owned or reference counted. HKTs would fix the latter.

(1) Imperative data structures: Not even mutable borrows will help you. You just need to drop down to `unsafe` code.
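A sketch of point (0): a persistent cons list needs ownership only, no borrows, though as noted you have to commit to `Rc` (or unique `Box`) nodes up front:

```rust
use std::rc::Rc;

// A purely functional list: "consing" shares the existing tail via Rc.
enum List<T> {
    Nil,
    Cons(T, Rc<List<T>>),
}

fn main() {
    let tail = Rc::new(List::Cons(2, Rc::new(List::Nil)));
    // Two lists sharing one tail -- no copying, no mutation.
    let a = List::Cons(1, Rc::clone(&tail));
    let b = List::Cons(0, Rc::clone(&tail));
    if let (List::Cons(x, _), List::Cons(y, _)) = (&a, &b) {
        assert_eq!(x + y, 1);
    }
    // The original handle plus the two clones above:
    assert_eq!(Rc::strong_count(&tail), 3);
}
```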

More or less. Hence why high performance data structures are invariably unsafe to implement in every language if they can be implemented at all. Rust lets you implement them, and for almost all of them you can even expose a safe high-performance interface to them.

The only major exceptions I know for interfaces are

* intrusive data structures that just blindly point to random data lying on the stack or stored in other types (a construct like C++ move constructors is necessary to safely manipulate these).

* priority queues with high-performance decrease-key (you effectively need to hand clients a pointer to every node, that they can always pass back to you to deref and update the structure from -- this is unsound if that node has been popped and you aren't using a GC scheme).

Certainly a safe interface is better than none, but unsafe and/or inefficient implementations are just symptoms of expressiveness issues, not a fact of life. That being said, what Rust has is already a strict improvement over not even being able to state how ephemeral data structures can be safely used. Also, establishing the safety of code that manipulates arrays (without runtime bounds checks), cyclical data structures (e.g. doubly-linked lists) and user-defined forms of indirection (e.g. hashing) is probably best done using tools other than unification-based type checkers.

I have no idea what a "priority queue with high-performance decrease-key" is, but from your brief description, you can't replace the "pointer to every node" with a RAII value that keeps the node from being deallocated?

Yes, that's garbage collection. :)

Ah so you're defining reference-counting as garbage collection?

Absolutely. I don't find the distinction to be particularly meaningful in this case.

Reference counting is a form of garbage collection.

Let's not exaggerate the problem. First of all, non-lexical lifetimes have nothing to do with doubly linked lists (what you're referring to). Second, the language is clearly generally usable, as evidenced by the hundreds of thousands of lines of code written in it, most of which was written by people who have never touched the compiler. The reason is simple: most people get by just fine with the rich set of generic collections available in the standard library and on crates.io. For cases in which those aren't sufficient, you write unsafe code or use Rc.

Nobody claims Golang is "generally unusable" because you can't implement your own generic data structures. That it's harder than in unsafe C or C++ to implement doubly linked lists in Rust is just a less severe version of the same kind of thing.

Implementing data structures is considered an 'advanced' topic in Rust. That's ok for two reasons:

1) The data structures you'll need for most scenarios are in the standard library. The data structures you'll need for pretty much every scenario anyone has ever encountered in Rust are on crates.io. It should be relatively rare that you need to write one.

2) Writing data structures IS an advanced topic. The fact that you have trouble with the borrow checker trying to implement them highlights that understanding data ownership can get very complicated. Plus there's usually lots of corner cases to get tripped up in (I can't tell you how many times I've had to debug issues caused by crap implementations of circular buffers in C).

And when you do need to write a data structure - unsafe is there to allow you to do it. And given Rust's type system, you can write that data structure once, make REALLY sure you got the unsafe bits right, and then hide it behind a beautiful abstraction and never think of it again.

I'm just learning Rust and this is the primary question I have: If I can't use the data structures I'm accustomed to (because of mutability issues), what are the alternatives?

Where are the articles describing those alternatives? I've seen a lot of interesting discourse about the low-level stuff (e.g. error handling), but not higher-level "start here" approaches.

> If I can't use the data structures I'm accustomed to (because of mutability issues), what are the alternatives?

You can use the data structures you are used to just fine, thanks. Just because everything is immutable by default doesn't mean everything is immutable all the time.
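A minimal illustration of that point:

```rust
use std::collections::HashMap;

fn main() {
    // Bindings are immutable by default...
    let mut scores = HashMap::new();
    // ...but opting in to mutation is one keyword (`mut`), after which the
    // usual imperative operations work exactly as you'd expect:
    scores.insert("alice", 1);
    scores.insert("alice", 2); // plain old in-place update
    assert_eq!(scores.get("alice"), Some(&2));
}
```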

In addition, any "unsafety" is generally buried deep inside a library and contained properly. Yes, if you are trying to implement something like a ConcurrentSkipListMap, you're probably going to have "unsafe" in your code (and probably some assembly language). But, if you are using the library, probably not.

I'm surprised you keep encountering these problems. I've written a few significant (>1000 LOC) Rust programs and never encountered them.

This reminds me of an irritating problem in CPS based compilers where you can have a transformation that is obvious and legal except that after the transformation one of the relevant variables would not be in scope. The solution(s) are to decline to do that transformation, fart around with sinking and hoisting in order to get the scope right, or ditch scope and move to a graph model in which dependence is represented with explicit edges rather than being approximated by the lexical structure of the program.

I'm guessing that the lexical borrow problems that Rust is facing are an instance of the same problem. It will be interesting to read the follow-up post and see what relation, if any, the suggested solution has to the graph approaches that I am familiar with.

The solution is basically the same as in the CPS case: switch to dominator-based scoping rather than lexical scoping. I wrote an RFC on what would need to be done to the region system for non-lexical borrows a while back: https://github.com/rust-lang/rfcs/pull/396.

I want to like Rust, and in the abstract I do, but every time I dive in the concepts have too much complexity for me to get going, and this post reinforces that.

It's not that I couldn't eventually grok Rust, it's that I don't have time to given that it's not part of my day job. I really hope that Rust leads to some of the same features being expressed in a more intuitive way by a subsequent language.

I wonder how much of this is required to use rust effectively. A lot of languages have fairly tricky semantics and type systems, and most developers don't fully understand them. C++ is obviously a big language, but even C has things like volatile pointers and pointer aliasing and numerical type promotion.

And if you don't understand non-lexical lifetimes, I think that's OK. You can just be more explicit about the lifetimes and it will still work. And reading someone else's code should still make sense.
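For example, where inference isn't obvious you can spell the lifetimes out explicitly, and the two forms mean exactly the same thing (function names here are hypothetical):

```rust
// Explicit form: the returned reference borrows from `v` for lifetime 'a.
fn first_explicit<'a>(v: &'a [i32]) -> Option<&'a i32> {
    v.first()
}

// Elided form: identical meaning, with the lifetime inferred.
fn first(v: &[i32]) -> Option<&i32> {
    v.first()
}

fn main() {
    let v = vec![10, 20];
    assert_eq!(first_explicit(&v), Some(&10));
    assert_eq!(first(&v), Some(&10));
}
```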

"The lifetime of a value"

Surely he means variable, not value, right?
