Hacker News
Common Rust Lifetime Misconceptions (2020) (github.com/pretzelhammer)
195 points by nic_wilson on Dec 5, 2023 | hide | past | favorite | 87 comments



How I bail myself out of Rust lifetime problems as somebody who probably learned Rust the wrong way (by just trying to build stuff as if it were C/node.js and running into problems instead of slowing down + reading):

1. .clone() / .to_owned()

2. String vs &String/&str with a & borrow

3. lazy_static / OnceCell + Lazy / OnceLock

4. Arc<Mutex<T>> with lots of .lock(). What would start off as NULL in Java/C becomes Arc<Mutex<Option<T>>>, and you have to check it with .is_none()
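A minimal sketch of point 4, assuming a shared, initially-empty slot (the name `read_slot` is just for illustration):

```rust
use std::sync::{Arc, Mutex};

// Read the slot, falling back to a default when it is still empty,
// the way you'd null-check a field in Java/C.
fn read_slot(slot: &Arc<Mutex<Option<String>>>) -> String {
    let guard = slot.lock().unwrap();
    match &*guard {
        Some(s) => s.clone(),
        None => "empty".to_owned(),
    }
}

fn main() {
    let slot: Arc<Mutex<Option<String>>> = Arc::new(Mutex::new(None));
    assert_eq!(read_slot(&slot), "empty");

    // A clone of the Arc can fill the slot from elsewhere (e.g. another thread).
    let writer = Arc::clone(&slot);
    *writer.lock().unwrap() = Some("ready".to_owned());
    assert_eq!(read_slot(&slot), "ready");
}
```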

I think that's about it. I rarely introduce a lifetime to a struct/impl. I try to avoid it honestly (probably for worse). Arc kind of bails you out of a lot (whether that's for good or not I don't know).

edit: Remembered another. I think the compiler / borrow checker is kind of weird/too verbose when you have a primitive like u32 that's borrowed and you have to dereference it or clone it.


> who probably learned Rust the wrong way (by just trying to build stuff as if it were C/node.js and run into problems instead of slowing down + reading):

This is the right way to learn. I'm quite familiar with lifetimes and whatnot, but I didn't bother with them much at all when learning; I just .clone()'d.

This allowed me to learn 95% of the language, and then once I did that, learn the last 5% (lifetimes).

Highly recommend.


I think there's a lot to be gained by not writing Rust like C, to the extent that it might be worth taking some time to pick up another language (maybe a lisp variant?) first.

2.) Be careful because &String is not &str, although in many cases you can pretend thanks to the magic of AsRef/Deref.

4.) If you find yourself calling is_none, rethink things a bit. Pattern matching and the various destructuring (e.g. if let) features are some of the most powerful tools at your disposal with Rust. This is where experience with something else first (e.g. Elixir) can be helpful.
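A sketch of what that looks like in practice (the `greet` helper is hypothetical):

```rust
// Fold both cases into one expression instead of is_some()/unwrap().
fn greet(maybe_name: &Option<String>) -> String {
    match maybe_name {
        Some(name) => format!("hello, {}", name),
        None => "hello, stranger".to_owned(),
    }
}

fn main() {
    let maybe_name: Option<String> = Some("ferris".to_owned());

    // if let destructures directly, with no separate is_none() check:
    if let Some(name) = &maybe_name {
        assert_eq!(name, "ferris");
    }

    assert_eq!(greet(&maybe_name), "hello, ferris");
    assert_eq!(greet(&None), "hello, stranger");
}
```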

  I rarely introduce a lifetime to a struct/impl.
IMO the big use case here is &str.

  Arc kind of bails you out of a lot
While that's totally reasonable it's good to remember that you're essentially trading compile time guarantees for runtime ones. If you can figure out how to beat the borrow checker into submission that's one less thing to debug at runtime.

  primitive like u32 and it's borrowed and you had to dereference it or clone it.
The primitive types should all implement Copy, which means you should (almost?) never have to explicitly clone them. Dereferencing is another story though.
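A quick illustration of the Copy point:

```rust
fn main() {
    let n: u32 = 7;
    let r: &u32 = &n;

    // u32 is Copy, so dereferencing just copies the value out;
    // no .clone() needed, and n stays fully usable afterwards.
    let m: u32 = *r;
    assert_eq!(m + n, 14);

    // Comparisons auto-deref, but arithmetic usually needs the explicit `*`.
    assert_eq!(*r + 1, 8);
}
```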


> maybe a lisp variant?

Nah, go with an ML-family language. Maybe even Standard ML, because it will nudge you away from writing "C in ML" and encourage you to pick up the idiomatic way of doing things. (Lawrence Paulson's book has an online version available for free on his homepage).


SML is great, but I always suggest OCaml. Still my favourite language that I never get to write these days!


For practical programming I'd probably agree, but if the point is to learn a non-Algol way of thinking then I think SML is a better way to go; OCaml makes it easier to write imperative-style code, for better and for worse.


Yeah that's a fair point!


This is how my team writes production Rust code. Knowing which one to use and when is important, but there's nothing wrong with using the tools available to you.

Non-lexical lifetimes are, in my experience, pretty uncommon in most non-library code. You don't really need them until you really need them.


  Non-lexical lifetimes are, in my experience, pretty uncommon in most non-library code.

To avoid confusing the newcomers: lifetimes are always non-lexical (see [1] for the pedantic details). I suppose you meant that explicit lifetime annotations are pretty uncommon, which is not wrong.

[1] https://blog.rust-lang.org/2022/08/05/nll-by-default.html


Oh yeah, wrong terminology. My bad!


Good advice, though I'd recommend Rc<RefCell<T>> instead of Arc<Mutex<T>> if you're not sharing the data between threads, to avoid synchronization overhead. I use Arc pretty rarely.
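A minimal sketch of the single-threaded version:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // Single-threaded shared mutability: no atomics, no lock poisoning.
    let shared = Rc::new(RefCell::new(Vec::<i32>::new()));

    let handle = Rc::clone(&shared);
    handle.borrow_mut().push(1);
    shared.borrow_mut().push(2);

    assert_eq!(*shared.borrow(), vec![1, 2]);
    // Rc is !Send, so the compiler stops you from accidentally
    // moving this across threads; upgrade to Arc<Mutex<T>> if you need that.
}
```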


The overhead of an uncontested lock is not much more than a memory operation but it allows you to be able to use the same code in threaded context in tokio async which is a huge benefit. Unless you need the optimization (i.e. you profiled and determined that Arc in a hot loop is slowing you down) I think it's fine to use Arc in general.


> The overhead of an uncontested lock is not much more than a memory operation

An atomic memory operation! These can be orders of magnitude slower than regular memory operations.


An atomic read-modify-write. Atomic non-seq-cst load/stores can be cheap.

/Overly pedantic


> An atomic read-modify-write.

No, this also applies to (non-relaxed) atomic loads and stores, depending on the platform.

> Atomic non-seq-cst load/stores can be cheap.

Relaxed atomic loads and stores are always cheap, but anything above requires additional memory order instructions on many platforms, most notably on ARM.

Here we are talking specifically about mutexes, which follow acquire/release semantics.

To be clear: locking an uncontended mutex is indeed much, much cheaper than an actual call into the kernel, but it is not free either.


Ok, technically we both used the weasel word 'can' so we are both right.

But even on ARM, these days store-releases and load-acquires, while not as free as on x86, are very cheap.

To make my statement more precise, typically what is still expensive pretty much everywhere is anything with #StoreLoad barrier semantics, which is what you need to acquire a mutex.


`RefCell` does have one big advantage, though: it'll panic instead of deadlock for reentrant borrow.
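A sketch of the difference, using the `try_` variants so the conflict is observable rather than fatal:

```rust
use std::cell::RefCell;
use std::sync::Mutex;

fn main() {
    let cell = RefCell::new(0);
    let first = cell.borrow_mut();
    // A second overlapping borrow_mut() would panic at runtime;
    // try_borrow_mut lets us observe the conflict instead.
    assert!(cell.try_borrow_mut().is_err());
    drop(first);
    assert!(cell.try_borrow_mut().is_ok());

    let mutex = Mutex::new(0);
    let _guard = mutex.lock().unwrap();
    // Re-locking a std Mutex on the same thread is undefined-ish behavior
    // (it may deadlock rather than panic), so we only demonstrate try_lock.
    assert!(mutex.try_lock().is_err());
}
```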


tokio is so widespread now that Arc<Mutex<T>> is coincidentally the right choice.

I'm not saying that's a good thing.


Doesn't tokio have a single-threaded runtime where that's not needed?


Yes but Send + Sync is required everywhere regardless.


This is not true, you can run non-send futures using Tokio: https://docs.rs/tokio/latest/tokio/task/struct.LocalSet.html


Eh, I don't think the overhead of an uncontested lock acquire is all that much.


Using `clone` etc when it's easier is actually common advice, it's perfectly OK to not have borrowing everywhere if you don't need it. My usual starting point / default / rule of thumb is to take references as parameters and return owned values (for example, take `&str` as an argument, return an owned `String`).
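The rule of thumb above, as a tiny sketch (the `shout` function is just an example):

```rust
// Borrow inputs, own outputs.
fn shout(input: &str) -> String {
    input.to_uppercase()
}

fn main() {
    let s = String::from("hello");
    // &String coerces to &str, so callers can pass either.
    let loud = shout(&s);
    assert_eq!(loud, "HELLO");
    // s is still usable because shout only borrowed it.
    assert_eq!(s.len(), 5);
}
```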


IMO stringy data is a good example of where you should think about what you're returning in part because common APIs (e.g. regex) will take a more nuanced approach.

If you're creating a new string, then sure return String. But if you have a path where you could just return the original input, consider returning a Cow<str>.
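A sketch of that pattern, assuming a hypothetical `trim_comment` helper that only allocates when it actually changes the input:

```rust
use std::borrow::Cow;

// Returns the input unchanged (borrowed) when no work is needed,
// and allocates only when a change is actually made.
fn trim_comment(line: &str) -> Cow<'_, str> {
    match line.find('#') {
        Some(idx) => Cow::Owned(line[..idx].trim_end().to_owned()),
        None => Cow::Borrowed(line),
    }
}

fn main() {
    // No '#' present: the original input is returned with no allocation.
    assert!(matches!(trim_comment("plain text"), Cow::Borrowed(_)));
    // '#' present: a trimmed, owned copy comes back.
    assert_eq!(trim_comment("value = 1  # note"), "value = 1");
}
```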


burntsushi actually regrets making regex replace return a Cow<str>: https://github.com/rust-lang/regex/issues/676#issuecomment-6.... I’m glad it does, and wish it took an impl Into<Cow<str>> there, for the reasons discussed in the issue, but burntsushi has a lot more experience of the practical outcomes of this. Just something more to think about.


So from reading those comments, I'd come to the opposite conclusion: Cow<str> is absolutely the right choice and perhaps String should really have been Cow<str>.

Insofar as taking an impl Into, burntsushi linked to a rust playground demonstrating where that approach falls down. In general (heh), taking arguments, especially options or stringy ones, that are generalized over an Into impl is one of those things that seems real nice at first but gets real unpleasant pretty quick IMO.


In my defense, I said I occasionally regret the choice. But in rebuttal, I certainly do not have your confidence that returning Cow<str> is the right choice. Basically, when it comes down to it, I'm not 100% convinced that it pulls its weight. But like I said in the issue, it's decently motivated.

I don't think String could be a Cow<str>. Remember, Cow<str> is actually a Cow<'a, str>, and if you want to borrow a &str from a Cow, the lifetime of that &str is not 'a, but rather, attached to the Cow itself. (This is necessary because the Cow<str> may contain an owned String.) This in turn would effectively kill string slicing.

In order for something like Cow<str> to be the default, you need more infrastructure. Maybe something like hipstr[1]. It is a nice abstraction, but not one that would be appropriate for std.

[1]: https://docs.rs/hipstr/latest/hipstr/


(Sorry I missed the word “occasionally” there!)


This is totally valid. Lifetimes can be an optimization you go back and add later as needed; in structs, especially, they require a ton of code changes to add and a ton of code changes to change your mind about later, so they should be used judiciously.

Other things I would add to this list:

- For structs that don't own anything on the heap, derive Copy. Then you can just pass them around willy-nilly without explicit clone()s

- Using a functional style where it makes sense to helps a lot; it can be really easy to pass things by reference when they only need to be temporarily, immutably used by that one function. And if you make Copyable structs, you can pass owned structs and return owned structs and not worry about any of it
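A sketch of the Copy-struct suggestion (the `Point` type is just an example):

```rust
// No heap ownership, so Copy is free to derive.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

fn offset(p: Point, dx: f64) -> Point {
    // p was copied in; the caller's value is untouched.
    Point { x: p.x + dx, ..p }
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    let q = offset(p, 3.0);
    // p is still usable: no move, no clone() noise.
    assert_eq!(p, Point { x: 1.0, y: 2.0 });
    assert_eq!(q, Point { x: 4.0, y: 2.0 });
}
```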


I know there was a thread involving a Rust team member saying that clone / to_owned is ok to start with. The memory copying just nags at me and distracts me from moving on.


Yes, it absolutely is.

And it's even okay beyond just starting with.

Search the regex crate repository for clones. There are a lot of them. Hell, Regex::new accepts a &str for a pattern, and one of the first things it does is convert it to a String.


Interesting. Most regexes being short, I reckon that copy is very cheap. Still I wonder, wouldn’t Cow be an acceptable middle ground, “best of both worlds” style (only copy when needed)?


No, because the clone is always a marginal cost, no matter how big the pattern is.

`Cow` would be a needless and gratuitously bad type to accept for Regex::new. There's no point. It would very likely suffer the same class of problems as using Into<String>, because we'd need to use Into<Cow<'_, str>>, and thus it is susceptible to annoying inference failures.

It's always important to contextualize costs. In this case, regardless of the size of the pattern string, cloning it is always marginal relative to the other costs incurred. (One possible exception to this is if the pattern is just a simple literal with no regex meta characters. There in theory could be a fast path to side-step regex parsing and other things, but in practice there's not much need for that.)


I'm pretty sure everyone in the teams would endorse that statement :)


Cloning things which are Copy (such as u32) is futile and Clippy will tell you not to bother where it can see this is definitely Copy. If you don't use Clippy, I'd suggest trying it for a while.

Rc will be faster (if that matters to you) than Arc but it can't cross threads. (Safe) Rust will check you didn't get this wrong, so there's no danger but obviously knowing ahead of time avoids writing an Rc that you then need to be an Arc instead.

Sometimes it's tidier to write the borrow in the type of a pattern match e.g. if let Some(&foo) = ... Means you won't need to dereference foo inside the block.
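That tip, sketched out (it relies on u32 being Copy):

```rust
fn main() {
    let values = vec![10u32, 20, 30];

    // .first() yields Option<&u32>. Binding the pattern as Some(&first)
    // copies the u32 out, so no dereference is needed inside the block.
    if let Some(&first) = values.first() {
        assert_eq!(first + 1, 11); // first is a plain u32
    }
}
```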


Once I started taking this advice, Rust became manageable to me! Currently, Trait Implementations have been more of a stumbling block for me than the borrow checker


This is the way. The real misconception of lifetimes is that people have to use them often. You don't unless you are writing libraries or system code.


For the edit: if you have an idea what the output should look like instead, please file a ticket.


I just clone everywhere and write Rust like a high level language. Then, once I need to optimize more, if I ever do (as Rust is many times faster than other languages even with liberal cloning), then I simply go through and remove the clone where needed.


> as Rust is many times faster than other languages even with liberal cloning

Have you really compared? I have. Rust was faster for "small input", but quickly got beaten by Java and other languages I tried because the cost of doing things this way grows exponentially. I suggest you run benchmarks before you make your mind up and start throwing opinions around.


I have never benchmarked Java vs other languages, but from experience Java applications always have horrible startup time. So, there might be some contexts in which a long-running Java application can beat Rust or other languages, but if you need something that starts instantly (like CLI utilities) Java is a no-go.

Another wart is that there are some written and non-written standards on CLI arguments (e.g. long option names should start with double hyphens) that 99% of Java CLI apps violate for some reason. Maybe I'm a perfectionist but it makes me uncomfortable to use Java CLI apps.


I'm no Java expert but they seem to be solving startup time with ahead-of-time (AOT) compilation.

https://medium.com/@subhajitc77/how-java-17-and-spring-boot-...


> So, there might be some contexts in which long running Java application can beat Rust or other language, but if you need something that starts instantly (like CLI utilities) Java is a no-go.

Maybe I'm in a bubble, but to me this sounds like Java would have faster performance in nearly all professional development situations. Very few people are writing CLI tools compared to those writing server code.


> So, there might be some contexts in which long running Java application can beat Rust

yes, context is essentially all server side SaaS business segment..


My other languages generally used are Python and TypeScript of which it absolutely is faster. I don't write Java anymore, generally speaking, so it could be faster, but it has its own problems, such as having null pointers and exceptions.


Me too. I also do this with .unwrap() - when I'm pretty sure of the happy path - while I'm prototyping. It's pretty easy afterwards to run back through and replace .unwrap() calls with better error handling. But as I improve, I'm getting more in the habit of using matching directly. Understanding lifetimes is probably really helpful for understanding library errors though and I need to go deeper there for sure
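A sketch of that second pass, assuming a hypothetical `parse_port` helper:

```rust
// Prototype version: fine while you're sure of the happy path.
fn parse_port_prototype(s: &str) -> u16 {
    s.parse().unwrap()
}

// Later pass: the same logic, with the failure case handled by a match
// instead of a panic.
fn parse_port(s: &str) -> u16 {
    match s.parse() {
        Ok(port) => port,
        Err(_) => 8080, // fall back to a default instead of panicking
    }
}

fn main() {
    assert_eq!(parse_port_prototype("3000"), 3000);
    assert_eq!(parse_port("3000"), 3000);
    assert_eq!(parse_port("not-a-port"), 8080);
}
```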


This is what I love about Rust, and other languages with these kinds of constructs. You have to acknowledge that something might e.g. fail. You can choose to do "nothing" (e.g. unwrap), but it must be done explicitly, which makes the ignored cases simple to identify later


I used to use OCaml before using Rust, it was similarly great and it's where Rust got the notion of the Option and Result types.


.unwrap() is the new // TODO:


for better or worse...


I found prototyping with `anyhow` really easy; then I can convert to `thiserror` if I'm not interested in the boxing.


Another technique for dodging [manual] lifetimes is to not store references in struct fields, etc.


Other tips for understanding Rust lifetime issues:

- Enable rust-analyzer inlay hints for elided lifetimes, reborrows, etc

- Enable the `elided_lifetimes_in_paths` lint

Together, these should ensure that all lifetimes in your code are clearly visible on the screen.


#9 (downgrading mut refs to shared refs) is a big one. It makes things quite a bit more complicated in the context of our work on the OCaml-Rust interface (more precisely the safe interface for the GC). As I understand it, this is not a sacrifice we make at the "Altar of Memory Safety", but one we make at the Altar of Mutex::get_mut and Cell::get_mut, which is a much smaller altar (how often do you find yourself in possession precisely of a mutable borrow of a Mutex or of a Cell?).


Once I discovered 'static was the subtype of all lifetimes rather than the supertype, things began clicking for me.


'static is actually the superlifetime of all lifetimes, but & is contravariant in its lifetime parameter (and covariant in its type parameter).


Mind expanding on this? The nomicon describes &’a T where both ‘a’ and ‘T’ are covariant. I thought I had a clear picture but now I’m confused.

https://doc.rust-lang.org/nomicon/subtyping.html#variance


Here is the original GitHub issue on the question:

https://github.com/rust-lang/rust/issues/15699

And an RFC by some people that felt frustrated by this arguably implementation-centric view that kind of lost steam:

https://github.com/rust-lang/rfcs/issues/391

Intuitively, the bottom lifetime should be the one that is uninhabited, which would be a lifetime with no extent rather than 'static.


Interesting, thanks for sharing!


`rustc` used to use different terminology than the nomicon, but this is no longer the case: https://github.com/rust-lang/rust/pull/107339



Could you elaborate? I find this unintuitive.


Since a 'static lifetime lasts for the entirety of the program, it can sub in for any other lifetime (as it is guaranteed to exist until the end of the other lifetime).


I am not familiar with the formal type theory, but this one is intuitive if you view it this way:

1. When A is a subtype of B, it means A can be used as B (a Teacher can be used by any function that accepts a Human).

2. The 'static lifetime lives longer than any other lifetime, so it can be used as any other lifetime.
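Point 2, sketched: a &'static str can be passed wherever any shorter-lived &'a str is expected.

```rust
// Any lifetime 'a works here; the caller picks it.
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A &'static str happily stands in for the shorter &'a str:
    // 'static outlives everything.
    let forever: &'static str = "hello world";
    assert_eq!(first_word(forever), "hello");

    // The same function also accepts a borrow of a short-lived String.
    let local = String::from("foo bar");
    assert_eq!(first_word(&local), "foo");
}
```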


I use ChatGPT to ask questions about my code - including rust lifetimes - and usually get pretty good detailed answers. More recently I started using diesel ORM and was pleasantly surprised that the bot can answer questions about diesel usage.


Incredible. I never held the quoted misconception about `T: '*`, but I didn't understand it. It was a known unknown - I simply applied it when told to do so. This is the first time someone has explained it in an understandable way. I guess the implications (it's a ref of that lifetime or an owned value) are a better explanation than the technical one (T is bounded by the lifetime).


“Applied it when told to do so.”

This is my beginner-level experience with Rust. It’s amazing that the compiler can be so specific about what’s wrong. But taking the error and getting explanations that even I can understand has been tricky.


If you have thoughts on how to improve the output for better understanding, do file a ticket against rustc. We are space constrained, so we try to avoid long explanations as much as possible, but we sometimes do or add links to the right spot in the docs.


Oh, I don’t have any criticism for the messages. It’s amazing that they can be so correct. And I’ve yet to find one I cannot quickly and accurately google. I’m criticizing my inability to convert them into knowledge vs just blindly doing what it’s telling me to do.

I would personally discourage trying to accomplish too much in the error messages. As long as they are a sufficient breadcrumb for what’s wrong, where, and enough key words to research, that seems ideal.


That is part of the strategy: remove jargon when possible, consciously feed jargon when unavoidable to introduce concepts and give something useful to search for.


This is the kind of thing I like to ask GPT4. It can usually explain what's going on.

Asking different ways usually leads me to understanding it well.


Same. I wrote a lot of Rust, even some open source libraries that are being used and generally feel comfortable using it but I learned a lot out of this very nice write-up.


About #10:

> because to unify them at this point would be a breaking change

Couldn't they change this in a future edition without breaking older editions?


A new edition is not a free pass to break source compatibility; it's more like a fire axe hidden behind breakable glass. You really, really want to avoid breaking changes because your users will now have to know about the difference between the compiler editions used by each of their projects.

In short, yes, but be very wary.


Technically yes, but editions are more like different "flavors" of Rust than actual breaking changes. A 2021 crate is supposed to be able to import a 2018 crate, likewise for 2015 and 2024. Because they all need to work together, edition changes still need to be somewhat compatible with older versions.

Because you can pass closures into functions across crate boundaries, which requires consistent lifetime semantics, I find it unlikely that this will be implemented.


As far as I can tell, this change is compatible with other editions. Closures can already be higher-ranked, and changing closures to be higher-ranked will work with existing editions.

A more problematic thing is that `cargo fix --edition` should be able to transform existing code to the same meaning in a new edition, so there should be a way to opt-in for the old behavior.


Most likely yes.

However, if you don't specify the argument type, it compiles fine. It's rare that you need to specify argument or return types on a closure, so it's not actually a large issue.


Wouldn't fixing 10 just mean allowing things that were previously not allowed? That's not a breaking change. What am I missing?


No it will not:

    fn requires_static(_: &'static str) {}
    
    let long_lifetime: &'static str = "";
    let closure = |_input: &str| -> &str { long_lifetime };
    let short_lifetime = &String::new();
    requires_static(closure(short_lifetime));
This code compiles currently as the returned `&str` is inferred to be `&'static str` because this is what it actually returns, but with #10 fixed it will not because lifetime elision rules say that the lifetime of the output is the same as the lifetime of the input.


Hm, in this case it's actually nice that this works because there's nothing unsafe or unexpected going on.


Discussed at the time:

Common Rust Lifetime Misconceptions - https://news.ycombinator.com/item?id=23279731 - May 2020 (43 comments)


This is a nice writeup. (5) and (10) are particularly good to know IMO -- (5) makes it pretty easy to design correct-looking APIs that only fail to compile when actually called (versus when defined), and (10) is a significant roadbump in Rust's otherwise relatively smooth (IMO) learning curve.


This is fantastic and definitely hits a few points that confused me when I first started as well. I remember definitively pounding the keyboard in frustration over at least one of these


I stopped reading when he started talking about set theory. There's a far simpler way to explain it and learn it. Nice way to tie ourselves into knots. No thanks.


This is very helpful. But I find it cumbersome to mix "misconceptions" with "clarifications".

To elaborate, stating "Foo is bar" as a misconception to be clarified, and then following it up with "Baz is quux", makes it very hard to follow and clearly identify what bits of information should be ingrained. In my opinion, information should only be conveyed "in the affirmative".

For example, don't write "Foo is bar is not true", write "Foo is NOT bar". Or have some consistent and unmistakable typography for the "false statements" (highlighting or color, etc)


I wholeheartedly agree, and I also extend this to software I write. I try very hard to have booleans never be named things like disableFoo. It isn't always easy, but once you glue enough logic together and try to state it, it becomes very difficult. This gets even worse in a language that has unset or null values, plus configuration for those settings, because usually you want unset to mean false. With a negatively named variable you have to check whether it is true OR unset, whereas with a positively named one, simply checking for a truthy value is adequate, more concise, and less error prone.


> have some consistent and unmistakable typography for the "false statements" (highlighting or color, etc)

THIS ^^

I stopped reading after about paragraph two, when I realized that it was very unclear that the code and table that I was reading was in the "this is a false belief" part of the explanation.

Don't do that.

Communicate more clearly and the article will be useful to more people, longer.



