Hacker News new | past | comments | ask | show | jobs | submit login
Indirect ownership, shallow borrow and self-referential data structures in Rust (yoyo-code.com)
57 points by asimpletune on April 30, 2022 | hide | past | favorite | 10 comments

Microsoft's Verona language is an interesting point in this design space--it allows references to be scoped to arbitrary heap regions: https://github.com/microsoft/verona/blob/master/docs/explore.... You can have arbitrary aliasing within a region, but the type system ensures that inter-region references obey lifetime rules.

In a language like Verona, ParsedFile could have a contents Vec allocated in the same region as the ParsedFile, and a words Vec containing references into the contents Vec. Reference-typed fields by default are constrained to the same region as the parent object, so the above example works straightforwardly.

The downside of this is that, in the general case, you have to garbage collect the regions. You can partition the heap using regions but all the same problems apply to the partitioned heap.

NB: Rust uses the term "regions" somewhat differently than Cyclone (https://homes.cs.washington.edu/~djg/papers/cyclone-cuj.pdf) or Verona. In the latter languages, regions are partitioned subgraphs of the heap. Rust regions, as far as I can tell, are slightly different.

Vale does something similar as well, though presumably at compile time only?

> This is also how generators and async functions actually work and where this problem often arises, so it's not a stretch to think about it like that.

There is a crate that uses async functions to accomplish exactly this. https://crates.io/crates/escher

Disclaimer, I've never actually tried using it. I've used owning_ref or ouroboros depending on what I need.

This is a really cool hack, I didn't know about that. It's another example showing that we really aren't conceptually that far from full fledged language support.

>A safe alternative is to use indexes instead of references.

    struct ParsedFile {
        contents: Vec<u8>,
        words: Vec<(usize, usize)>
>This works well, and you can make it ergonomic by wrapping the index pair in a new type

You should probably just use the std::ops::Range type for this. It's what's created by the `..` syntax

Eh, Range has a lot of really weird stuff going on that is worth shying away from if possible. For instance they're Iterator and not Copy so if you try to iterate over them multiple times or pass them as indices multiple times, you're gonna have a bad time (i.e., you'll need to clone every time you do). Usize is Copy, so you can use them to construct a Range every time you need one and never run into those issues.

Yes. For anyone who's not up on extremely obscure standard library issues in Rust, the Range type kind of serves two different purposes. One is the general concept of a range, and the other is an iterator over a range. Range is more of the latter, and copying iterators does slightly weird things (the behavior is contrary to some human intuition, even though the behavior is consistent), and so it was decided that Range shouldn't be Copy. But for the generic "I want a range of numbers" case, you would want that to be Copy. So there's a push and pull here. It seems like some of the latest discussion (and this has been going on for years) is that maybe this decision was wrong and even though the iterator behavior is weird ranges should start being Copy, but no decision is made yet. But when you're not doing iteration directly, imho you probably should just use a tuple of (start, end) rather than a Range, even though that may feel a bit odd.

Also: it's weird and really unfair imho that the GP is downvoted. While it is true that some folks have a different opinion, it's extremely subjective, and downvoting them for a slightly minority opinion doesn't seem right to me. I upvoted it, personally.

The “invisible stack frame” idea is genius and so elegant. I’m currently working on a file parser in Rust and have run into this issue several times now, especially when I really want to avoid heap allocation (say the references to the file contents are MB in size).

Wouldn’t anything of variable size require a heap allocation?

Right, but a Vec is basically a tuple containing a pointer to allocated memory on the heap, the capacity of the allocation, and the length of the Vec. That tuple is what lives in the stack, or in a struct in the example from the article. The tuple has a fixed size, known at compile time, so it is perfectly fine for it to live on the stack. In fact, it is also ok to move a Vec around in memory; the pointer will still point to the same allocation on the heap. The problems don’t start until you want to both borrow (take references to) individual elements stored in the Vec, and also move or resize the heap allocation. That’s where the borrow checker must step in.

Applications are open for YC Winter 2024

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact