How I bail myself out of Rust lifetime problems as somebody who probably learned Rust the wrong way (by just trying to build stuff as if it were C/node.js and run into problems instead of slowing down + reading):
1. .clone() / .to_owned()
2. String vs. &String/&str with & borrow
3. lazy_static / OnceCell + Lazy / OnceLock
4. Arc<Mutex<T>> with lots of .lock(). What would start off as NULL in Java/C becomes Arc<Mutex<Option<T>>>, and you have to check whether it .is_none()
I think that's about it. I rarely introduce a lifetime to a struct/impl. I try to avoid it honestly (probably for worse). Arc kind of bails you out of a lot (whether that's for good or not I don't know).
edit: Remembered another. I think it's kind of weird/too verbose from the compiler / borrow checker when you have a primitive like u32 and it's borrowed and you had to dereference it or clone it.
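For the lazy-initialization item in the list above, here is a minimal sketch using std's `OnceLock`. The `GREETING` static and the `greeting` function are made-up names for illustration, not anything from the thread.

```rust
use std::sync::OnceLock;

// Hypothetical global value, initialised the first time it is requested.
static GREETING: OnceLock<String> = OnceLock::new();

fn greeting() -> &'static str {
    // get_or_init runs the closure at most once, even if several threads race.
    GREETING.get_or_init(|| String::from("hello"))
}

fn main() {
    // Both calls see the same initialised value.
    println!("{}", greeting());
    println!("{}", greeting());
}
```

No lifetime annotation beyond `'static` is needed because the value lives in a static.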
> who probably learned Rust the wrong way (by just trying to build stuff as if it were C/node.js and run into problems instead of slowing down + reading):
This is the right way to learn. I'm quite familiar with lifetimes and whatnot now, but I didn't bother with them much at all when learning - I just "clone"'d.
This allowed me to learn 95% of the language, and then once I did that, learn the last 5% (lifetimes).
I think there's a lot to be gained by not writing Rust like C, to the extent that it might be worth taking some time to pick up another language (maybe a lisp variant?) first.
2.) Be careful because &String is not &str, although in many cases you can pretend thanks to the magic of AsRef/Deref.
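A small sketch of the &String / &str distinction, assuming two made-up helpers (`shout` and `len_of`): a function taking `&str` accepts a borrowed `String` through deref coercion, and a generic `AsRef<str>` bound accepts both forms.

```rust
// Taking &str is the more general choice: callers can pass literals,
// slices, or a borrowed String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Generic over AsRef<str>, so String, &String and &str are all accepted.
fn len_of<S: AsRef<str>>(s: S) -> usize {
    s.as_ref().len()
}

fn main() {
    let owned = String::from("hi");
    // &String coerces to &str here through Deref.
    println!("{}", shout(&owned));
    println!("{}", shout("literal"));
    println!("{}", len_of(&owned));
    println!("{}", len_of("abc"));
}
```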
4.) If you find yourself calling is_none, rethink things a bit. Pattern matching and the various destructuring (e.g. if let) features are some of the most powerful tools at your disposal with Rust. This is where experience with something else first (e.g. Elixir) can be helpful.
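As a rough illustration of matching instead of calling is_none, assuming a hypothetical `Slot` alias for the Arc<Mutex<Option<T>>> shape from the original comment:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical shared slot that starts out empty, like a null field in Java.
type Slot = Arc<Mutex<Option<String>>>;

fn describe(slot: &Slot) -> String {
    let guard = slot.lock().unwrap();
    // Match on the contents instead of checking is_none() and then unwrapping.
    match guard.as_deref() {
        Some(name) => format!("set to {name}"),
        None => String::from("unset"),
    }
}

fn main() {
    let slot: Slot = Arc::new(Mutex::new(None));
    println!("{}", describe(&slot));
    *slot.lock().unwrap() = Some(String::from("x"));
    println!("{}", describe(&slot));
}
```

The match forces both cases to be handled, so there is no separate unwrap that can drift out of sync with the check.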
> I rarely introduce a lifetime to a struct/impl.
IMO the big use case here is &str.
> Arc kind of bails you out of a lot
While that's totally reasonable it's good to remember that you're essentially trading compile time guarantees for runtime ones. If you can figure out how to beat the borrow checker into submission that's one less thing to debug at runtime.
> primitive like u32 and it's borrowed and you had to dereference it or clone it.
The primitive types should all implement Copy which means you should (almost?) never have to explicitly clone them. Dereferencing is another story tho.
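To make the Copy point concrete, a short sketch with two made-up helpers (`sum` and `largest`) that work on borrowed u32 values without any clone:

```rust
// Each element yielded by the loop is a &u32; dereferencing copies the value
// because u32 is Copy, so no clone() is needed.
fn sum(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

fn largest(values: &[u32]) -> Option<u32> {
    // copied() turns Option<&u32> into Option<u32>.
    values.iter().max().copied()
}

fn main() {
    let values = [1, 5, 2];
    println!("{}", sum(&values));
    println!("{:?}", largest(&values));
}
```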
Nah, go with an ML-family language. Maybe even Standard ML, because it will nudge you away from writing "C in ML" and encourage you to pick up the idiomatic way of doing things. (Lawrence Paulson's book has an online version available for free on his homepage).
For practical programming I'd probably agree, but if the point is to learn a non-Algol way of thinking then I think SML is a better way to go; OCaml makes it easier to write imperative-style code, for better and for worse.
This is how my team writes production Rust code. Knowing which one to use and when is important, but there's nothing wrong with using the tools available to you.
Non-lexical lifetimes are, in my experience, pretty uncommon in most non-library code. You don't really need them until you really need them.
> Non-lexical lifetimes are, in my experience, pretty uncommon in most non-library code.
To avoid confusing the newcomers: lifetimes are always non-lexical (see [1] for the pedantic details.) I suppose you meant that explicit lifetime annotations are pretty uncommon, which is not wrong.
Good advice, though I'd recommend Rc<RefCell<T>> instead of Arc<Mutex<T>> if you're not sharing the data between threads, to avoid synchronization overhead. I use Arc pretty rarely.
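A single-threaded sketch of the Rc<RefCell<T>> pattern, with `push_twice` as a made-up function: two handles to the same vector, mutated through `borrow_mut` instead of `lock`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn push_twice(shared: &Rc<RefCell<Vec<u32>>>, value: u32) {
    // A second handle to the same vector; only the reference count changes.
    let other = Rc::clone(shared);
    // Each borrow_mut() guard is dropped at the end of its statement, so the
    // two mutable borrows never overlap (overlapping ones would panic at runtime).
    shared.borrow_mut().push(value);
    other.borrow_mut().push(value);
}

fn main() {
    let shared = Rc::new(RefCell::new(Vec::new()));
    push_twice(&shared, 7);
    println!("{:?}", shared.borrow());
}
```

Swapping Rc for Arc and RefCell for Mutex gives the thread-safe equivalent with the same overall shape.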
The overhead of an uncontended lock is not much more than a memory operation, but it lets you use the same code in a threaded context or in tokio async, which is a huge benefit. Unless you need the optimization (i.e. you profiled and determined that Arc in a hot loop is slowing you down) I think it's fine to use Arc in general.
No, this also applies to (non-relaxed) atomic loads and stores, depending on the platform.
> Atomic non-seq-cst load/stores can be cheap.
Relaxed atomic loads and stores are always cheap, but anything above requires additional memory order instructions on many platforms, most notably on ARM.
Here we are talking specifically about mutexes, which follow acquire release semantics.
To be clear: locking an uncontended mutex is indeed much, much cheaper than an actual call into the kernel, but it is not free either.
Ok, technically we both used the weasel word 'can' so we are both right.
But even on ARM, these days store releases and load acquires, while not as free as on x86 are very cheap.
To make my statement more precise, typically what is still expensive pretty much everywhere is anything with #StoreLoad barrier semantics, which is what you need to acquire a mutex.
Using `clone` etc when it's easier is actually common advice, it's perfectly OK to not have borrowing everywhere if you don't need it. My usual starting point / default / rule of thumb is to take references as parameters and return owned values (for example, take `&str` as an argument, return an owned `String`).
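A tiny sketch of that rule of thumb, with `greet` as a made-up function: borrow the input as `&str`, hand back an owned `String`, and no lifetime annotations appear anywhere.

```rust
// Borrow the input, return an owned value.
fn greet(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let owned = String::from("Ferris");
    // Works with a borrowed String and with a literal alike.
    println!("{}", greet(&owned));
    println!("{}", greet("world"));
}
```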
IMO stringy data is a good example of where you should think about what you're returning in part because common APIs (e.g. regex) will take a more nuanced approach.
If you're creating a new string, then sure return String. But if you have a path where you could just return the original input, consider returning a Cow<str>.
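A sketch of the Cow<str> idea under an invented example (`escape_spaces` is hypothetical, not the regex API): the input is returned borrowed when nothing changes and a new String is allocated only when it has to be.

```rust
use std::borrow::Cow;

fn escape_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // A new string has to be built, so return it owned.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing to change: hand back the original input without allocating.
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", escape_spaces("no_spaces"));
    println!("{}", escape_spaces("has some spaces"));
}
```

Callers who always need a String can call `.into_owned()` on the result.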
burntsushi actually regrets making regex replace return a Cow<str>: https://github.com/rust-lang/regex/issues/676#issuecomment-6.... I’m glad it does, and wish it took an impl Into<Cow<str>> there, for the reasons discussed in the issue, but burntsushi has a lot more experience of the practical outcomes of this. Just something more to think about.
So from reading those comments, I'd come to the opposite conclusion: Cow<str> is absolutely the right choice and perhaps String should really have been Cow<str>.
As for taking an impl Into, burntsushi linked to a Rust playground demonstrating where that approach falls down. In general (heh) taking arguments, especially options or stringy ones, that are generalized over an Into impl is one of those things that seems real nice at first but gets real unpleasant pretty quick IMO.
In my defense, I said I occasionally regret the choice. But in rebuttal, I certainly do not have your confidence that returning Cow<str> is the right choice. Basically, when it comes down to it, I'm not 100% convinced that it pulls its weight. But like I said in the issue, it's decently motivated.
I don't think String could be a Cow<str>. Remember, Cow<str> is actually a Cow<'a, str>, and if you want to borrow a &str from a Cow, the lifetime of that &str is not 'a, but rather, attached to the Cow itself. (This is necessary because the Cow<str> may contain an owned String.) This in turn would effectively kill string slicing.
In order for something like Cow<str> to be the default, you need more infrastructure. Maybe something like hipstr[1]. It is a nice abstraction, but not one that would be appropriate for std.
This is totally valid. Lifetimes can be an optimization you go back and add later as needed; in structs, especially, they require a ton of code changes to add and a ton of code changes to change your mind about later, so they should be used judiciously
Other things I would add to this list:
- For structs that don't own anything on the heap, derive Copy. Then you can just pass them around willy-nilly without explicit clone()s
- Using a functional style where it makes sense to helps a lot; it can be really easy to pass things by reference when they only need to be temporarily, immutably used by that one function. And if you make Copyable structs, you can pass owned structs and return owned structs and not worry about any of it
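A minimal sketch of the two points above, assuming a made-up `Point` struct: with Copy derived, the value can be passed and returned by value and the original stays usable.

```rust
// No heap data inside, so deriving Copy is cheap and safe.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Takes and returns owned values; the caller's Point is copied, not moved.
fn translate(p: Point, dx: f64, dy: f64) -> Point {
    Point { x: p.x + dx, y: p.y + dy }
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    let q = translate(p, 1.0, 1.0);
    // p is still usable here because it was copied into translate.
    println!("{p:?} -> {q:?}");
}
```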
I know there was a thread involving a rust team member saying that clone / to_owned is ok to start with
The memory copying just nags and distracts me from moving on
Search the regex crate repository for clones. There are a lot of them. Hell, Regex::new accepts a &str for a pattern, and one of the first things it does is convert it to a String.
Interesting. Most regexes being short, I reckon that copy is very cheap. Still I wonder, wouldn’t Cow be an acceptable middle ground, “best of both worlds” style (only copy when needed)?
No, because the clone is always a marginal cost, no matter how big the pattern is.
`Cow` would be a needless and gratuitously bad type to accept for Regex::new. There's no point. It would very likely suffer the same class of problems as using Into<String>, because we'd need to use Into<Cow<'_, str>>, and thus it is susceptible to annoying inference failures.
It's always important to contextualize costs. In this case, regardless of the size of the pattern string, cloning it is always marginal relative to the other costs incurred. (One possible exception to this is if the pattern is just a simple literal with no regex meta characters. There in theory could be a fast path to side-step regex parsing and other things, but in practice there's not much need for that.)
Cloning things which are Copy (such as u32) is futile and Clippy will tell you not to bother where it can see this is definitely Copy. If you don't use Clippy, I'd suggest trying it for a while.
Rc will be faster (if that matters to you) than Arc but it can't cross threads. (Safe) Rust will check you didn't get this wrong, so there's no danger but obviously knowing ahead of time avoids writing an Rc that you then need to be an Arc instead.
Sometimes it's tidier to write the borrow in the type of a pattern match e.g. if let Some(&foo) = ... Means you won't need to dereference foo inside the block.
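For instance, a short sketch with a made-up `first_doubled` function, where the `&` in the pattern removes the need to dereference inside the block:

```rust
fn first_doubled(values: &[u32]) -> Option<u32> {
    // values.first() yields Option<&u32>; the & in the pattern destructures
    // the reference, so `first` is a plain u32 inside the block.
    if let Some(&first) = values.first() {
        Some(first * 2)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", first_doubled(&[21, 3]));
    println!("{:?}", first_doubled(&[]));
}
```

This only works when the referenced type is Copy, as u32 is here.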
Once I started taking this advice, Rust became manageable to me! Currently, Trait Implementations have been more of a stumbling block for me than the borrow checker
This is the way. The real misconception of lifetimes is that people have to use them often. You don't unless you are writing libraries or system code.
I just clone everywhere and write Rust like a high level language. Then, once I need to optimize more, if I ever do (as Rust is many times faster than other languages even with liberal cloning), then I simply go through and remove the clone where needed.
> as Rust is many times faster than other languages even with liberal cloning
Have you really compared? I have. Rust was faster for "small input", but quickly got beaten by Java and other languages I tried because the cost of doing things this way grows exponentially. I suggest you run benchmarks before you make your mind up and start throwing opinions around.
I have never benchmarked Java vs other languages, but from experience java applications always have horrible startup time. So, there might be some contexts in which long running Java application can beat Rust or other language, but if you need something that starts instantly (like CLI utilities) Java is a no-go.
Another wart is that there are some written and non-written standards on CLI arguments (e.g. long option names should start with double hyphens) that 99% of Java CLI apps violate for some reason. Maybe I'm a perfectionist but it makes me uncomfortable to use Java CLI apps.
> So, there might be some contexts in which long running Java application can beat Rust or other language, but if you need something that starts instantly (like CLI utilities) Java is a no-go.
Maybe I'm in a bubble, but to me this sounds like Java would have faster performance in nearly all professional development situations. Very few people are writing CLI tools compared to those writing server code.
My other languages generally used are Python and TypeScript of which it absolutely is faster. I don't write Java anymore, generally speaking, so it could be faster, but it has its own problems, such as having null pointers and exceptions.
Me too. I also do this with .unwrap() - when I'm pretty sure of the happy path - while I'm prototyping. It's pretty easy afterwards to run back through and replace .unwrap() calls with better error handling. But as I improve, I'm getting more in the habit of using matching directly. Understanding lifetimes is probably really helpful for understanding library errors though and I need to go deeper there for sure
This is what I love about Rust, and other languages with these kinds of constructs. You have to acknowledge that something might e.g. fail. You can choose to do "nothing" (e.g. unwrap), but it must be done explicitly, which makes the ignored cases simple to identify later
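A sketch of that prototype-then-tighten workflow, using two hypothetical functions: the first unwraps on the happy path, the second propagates the same failure with `?` so the caller decides what to do.

```rust
use std::num::ParseIntError;

// Prototype version: panics on bad input, but the unwrap is easy to grep for later.
fn parse_port_prototype(s: &str) -> u16 {
    s.trim().parse().unwrap()
}

// Later version: the failure case is part of the signature.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    println!("{}", parse_port_prototype(" 80 "));
    match parse_port("nope") {
        Ok(port) => println!("port {port}"),
        Err(err) => println!("bad port: {err}"),
    }
}
```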
#9 (downgrading mut refs to shared refs) is a big one. It makes things quite a bit more complicated in the context of our work on the OCaml-Rust interface (more precisely the safe interface for the GC). As I understand it, this is not a sacrifice we make at the "Altar of Memory Safety", but one we make at the Altar of Mutex::get_mut and Cell::get_mut, which is a much smaller altar (how often do you find yourself in possession precisely of a mutable borrow of a Mutex or of a Cell?).
Since a static lifetime lasts for the entirety of the program, it can sub in for any other lifetime (as it is guaranteed to exist until the end of the other lifetime).
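A small sketch of a 'static reference standing in for a shorter lifetime; `pick` and `longest_with_default` are made-up names. Both arguments of `pick` must share one lifetime 'a, and the 'static string is accepted alongside a short-lived borrow.

```rust
// Both inputs and the output share a single lifetime 'a.
fn pick<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn longest_with_default(input: &str) -> &str {
    let default: &'static str = "default";
    // The 'static reference is shortened to the lifetime of `input`.
    pick(input, default)
}

fn main() {
    let local = String::from("a fairly long local string");
    println!("{}", longest_with_default(&local));
    println!("{}", longest_with_default("hi"));
}
```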
I use ChatGPT to ask questions about my code - including rust lifetimes - and usually get pretty good detailed answers. More recently I started using diesel ORM and was pleasantly surprised that the bot can answer questions about diesel usage.
Incredible. I never held the quoted misconception about `T: '*`, but I didn't understand it. It was a known unknown - I simply applied it when told to do so. This is the first time someone has explained it in an understandable way. I guess the implications (it's a ref of that lifetime or an owned value) are a better explanation than the technical definition (T is bounded by the lifetime).
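To illustrate the "a ref of that lifetime or an owned value" reading, here is a sketch that takes 'static as the lifetime in the bound. `debug_on_thread` is a made-up helper; its `T: 'static` bound accepts an owned String built at runtime, not only data that lives for the whole program.

```rust
use std::fmt::Debug;
use std::thread;

// T: 'static means T holds no borrows shorter than 'static. Owned values
// qualify, even if they are created late and dropped early.
fn debug_on_thread<T: Debug + Send + 'static>(value: T) -> String {
    thread::spawn(move || format!("{value:?}")).join().unwrap()
}

fn main() {
    let owned = String::from("built at runtime");
    println!("{}", debug_on_thread(owned));
    println!("{}", debug_on_thread(42u32));
}
```

Passing a reference to a local variable would be rejected by the compiler, since that borrow is shorter than 'static.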
This is my beginner-level experience with Rust. It’s amazing that the compiler can be so specific about what’s wrong. But taking the error and getting explanations that even I can understand has been tricky.
If you have thoughts on how to improve the output for better understanding, do file a ticket against rustc. We are space constrained, so we try to avoid long explanations as much as possible, but we sometimes do or add links to the right spot in the docs.
Oh, I don’t have any criticism for the messages. It’s amazing that they can be so correct. And I’ve yet to find one I cannot quickly and accurately google. I’m criticizing my inability to convert them into knowledge vs just blindly doing what it’s telling me to do.
I would personally discourage trying to accomplish too much in the error messages. As long as they are a sufficient breadcrumb for what’s wrong, where, and enough key words to research, that seems ideal.
That is part of the strategy: remove jargon when possible, consciously feed jargon when unavoidable to introduce concepts and give something useful to search for.
Same. I wrote a lot of Rust, even some open source libraries that are being used and generally feel comfortable using it but I learned a lot out of this very nice write-up.
A new edition is not a free pass to break source compatibility; it's more like a fire axe hidden behind breakable glass. You really, really want to avoid breaking changes because your users will now have to know about the difference between the compiler editions used by each of their projects.
Technically yes, but editions are more like different "flavors" of Rust than actual breaking changes. A 2021 crate is supposed to be able to import a 2018 crate, likewise for 2015 and 2024. Because they all need to work together, edition changes still need to be somewhat compatible with older versions.
Because you can pass closures into functions across crate boundaries, which requires consistent lifetime semantics, I find it unlikely that this will be implemented.
As far as I can tell, this change is compatible with other editions. Closures can already be higher-ranked, and changing closures to be higher-ranked will work with existing editions.
A more problematic thing is that `cargo fix --edition` should be able to transform existing code to the same meaning in a new edition, so there should be a way to opt-in for the old behavior.
However, if you don't specify the argument type, it compiles fine. It's rare that you need to specify argument or return types on a closure, so it's not actually a large issue.
    fn requires_static(_: &'static str) {}
    let long_lifetime: &'static str = "";
    let closure = |_input: &str| -> &str { long_lifetime };
    let short_lifetime = &String::new();
    requires_static(closure(short_lifetime));
This code compiles currently as the returned `&str` is inferred to be `&'static str` because this is what it actually returns, but with #10 fixed it will not because lifetime elision rules say that the lifetime of the output is the same as the lifetime of the input.
This is a nice writeup. (5) and (10) are particularly good to know IMO -- (5) makes it pretty easy to design correct-looking APIs that only fail to compile when actually called (versus when defined), and (10) is a significant roadbump in Rust's otherwise relatively smooth (IMO) learning curve.
This is fantastic and definitely hits a few points that confused me when I first started as well. I remember definitively pounding the keyboard in frustration over at least one of these
I stopped reading when he started talking set theory. there's a far simpler way to explain it and learn it. nice way to tie ourselves into knots. no thanks.
This is very helpful. But I find it cumbersome to mix "misconceptions" with "clarifications".
To elaborate, stating "Foo is bar" as a misconception to be clarified, and then following it up with "Baz is quux", makes it very hard to follow and clearly identify what bits of information should be ingrained. In my opinion, information should only be conveyed "in the affirmative".
For example, don't write "Foo is bar is not true", write "Foo is NOT bar". Or have some consistent and unmistakable typography for the "false statements" (highlighting or color, etc)
I wholeheartedly agree and also extend this to software I write. I try very hard to never have booleans called e.g. disableFoo. It isn't always easy, but once you glue enough logic together around a negatively named flag, reading it back becomes very difficult. This becomes even more true when you have a language that has unset or null type values and then configurations for those settings, where the logic gets even nastier because usually you want unset to be false. If you have a negatively declared variable in this case, you have to check if it is true OR it is unset, whereas if it is a positively named thing, simply checking for a truthy value is adequate, more concise, and less error prone.
> have some consistent and unmistakable typography for the "false statements" (highlighting or color, etc)
THIS ^^
I stopped reading after about paragraph two, when I realized that it was very unclear that the code and table that I was reading was in the "this is a false belief" part of the explanation.
Don't do that.
Communicate more clearly and the article will be useful to more people, longer.