
In fact, in the pursuit of eliminating memory-safety and security bugs, Rust can sometimes make some privacy bugs more likely.

For example, in GC'd/RC'd languages, if we have several UserAccount instances and a bunch of long-running operations on them, any particular long-running operation will just hold a reference to the UserAccount itself and modify it at will without any confusion.

In Rust, the borrow checker often doesn't let us hold a reference from a long-running operation in practice, so we work around it by putting all UserAccount instances into a Vec<UserAccount>, and have our long-running operations refer to it via an index. However, we might erase and re-use a spot in that Vec, meaning the index now refers to some other UserAccount.

If the operation uses that "dangling index", it can lead to leaking sensitive data to the wrong users, or data loss.
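
To make the failure mode concrete, here is a minimal sketch of the slot-reuse scenario. The fields on `UserAccount`, the `PendingEmailExport` type, and `dangling_index_demo` are made-up names for illustration, not anything from a real codebase.

```rust
// Hypothetical types for illustration only.
struct UserAccount {
    name: String,
    email: String,
}

/// A long-running operation that remembers its account by position only.
struct PendingEmailExport {
    account_index: usize,
}

impl PendingEmailExport {
    /// Reads whatever account currently occupies the remembered slot.
    fn run(&self, accounts: &[UserAccount]) -> String {
        accounts[self.account_index].email.clone()
    }
}

/// Walks through the slot-reuse scenario and returns the email the export sees.
fn dangling_index_demo() -> String {
    let mut accounts = vec![
        UserAccount { name: "alice".into(), email: "alice@example.com".into() },
        UserAccount { name: "bob".into(), email: "bob@example.com".into() },
    ];

    // The operation is started on behalf of alice, who lives in slot 0.
    let export = PendingEmailExport { account_index: 0 };

    // Alice's account is erased and the slot is reused for a new user.
    accounts[0] = UserAccount { name: "carol".into(), email: "carol@example.com".into() };

    // This compiles and runs without a panic, but it now reads carol's data.
    export.run(&accounts)
}
```

Nothing here is memory-unsafe and nothing panics, which is why this kind of mixup is easy to miss.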

When using Rust, one has to use discipline to avoid this bug: use IDs into a hash map, or generational indices, or Rc<RefCell<T>>. Each has its own performance hit, but that hit can be worth it to prevent privacy bugs.

In the GC'd/RC'd language, this would still be a bug, but it wouldn't cause any mixups between different users' data.

I'm not saying we should always use GC'd or RC'd languages for privacy-sensitive purposes, but one should be aware of the particular shortcomings of their tools and have proper practices and oversight in place to mitigate them.




> [...] just hold a reference to the UserAccount itself and modify it at will without any confusion

Except the confusion of data races and having multiple concurrent writers more generally. Not a hypothetical: I've worked in a large C# code base where other people had decided it was fine to just pass a bunch of references around to different long running processes, and sure enough, they ended up stomping over each other's assumptions in really dangerous ways.

Unless of course you're actually controlling access to the data somehow (mutex / read/write lock), in which case you can just use _exactly the same pattern_ in Rust... so this whole thing seems like a bit of a red herring.

> [...] so we work around it by putting all UserAccount instances into a Vec<UserAccount> [...]

No, "we" don't. That's one particular (bad) pattern you could choose, and I wouldn't even say it's an obvious alternative. If in your hypothetical alternative programming language you would have just kept a reference to the data (via GC or ref counting, as you said) then why not do exactly the same thing? `Arc` is a thing. It works just fine.

This sounds like a case of trying to come up with convoluted solutions to simple problems and thereby doing something unnecessarily bad that nobody made you do.

Rust certainly has its warts... but this isn't one of them. Rust doesn't make you do what you're describing, and you could equally choose to do the same bad design in your preferred GC language.


That's why I mentioned one needs to have discipline, to not use that particular solution.

We agree it's not the best solution. And it's easy for us to say that now, after I've spelled out why. You'd be surprised how many people don't know that this can be a problem.

Also, it amuses me that having a simple index into a Vec would be a "convoluted solution". It's the easiest solution of all the alternatives. It can also be risky for privacy.

Compare that to a GC'd language, where the easiest solution (just hold a reference) doesn't introduce privacy risks.


Arc is by far the easiest solution. You don't have to deal with indices at all.


In Rust, people tend to go for the easiest solution that works within the borrow checker, and not workarounds like Rc or Arc. Otherwise, the performance hit means there's little reason to use Rust over much easier languages.


Calling Rc a workaround is like calling i64 a workaround. It is a type that encodes a certain contract; it's there to be used. The advantage Rust has is that for 95% of your codebase the simple code using stack-allocated values and references will be as fast as it can be, and you opt into the slower behavior for the remaining 5% if it is not part of your performance-critical path. If it is, then you need to rearchitect things, like you would in other languages. But you can do so after measuring.


You don't have to throw out the baby with the bathwater. A few Arcs to hold onto a user's metadata are hardly going to be in the critical path of the app.


That's where the discipline comes in: one has to know when it's okay to use the simpler approach (indices into a Vec), and when it's better to use Arcs so as to prevent privacy problems.


The point is that we disagree that indices into a Vec is the simple approach. It's much harder. It infects the entire code. Whereas just using a reference count is basically set and forget.


Storing the index of an item in a vector in an actual application (vs in some tightly bounded API of a small, purpose-built library used by said application) is highly non-idiomatic and would emit noticeable stink in any code review. It’s just too hard to remember to update all indices after removing an entry. I’ve seen this mistake made in C, C++, Rust, C#, JS, and Python code bases, and it is usually someone who hasn’t been around long enough to understand the implicit technical debt in certain approaches who would suggest such a thing, regardless of language.

The most obvious/naive solution along the lines you spelled out (in Rust, but really for any language) would be a hash map of ids/values, otherwise Rc and maybe some weak references.
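
A rough sketch of the hash-map-of-ids option, assuming the ids come from a counter that never hands out the same value twice. `AccountStore`, its methods, and `stale_id_demo` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical types for illustration only.
struct UserAccount {
    email: String,
}

/// Accounts keyed by an id that is handed out once and never reused.
struct AccountStore {
    next_id: u64,
    accounts: HashMap<u64, UserAccount>,
}

impl AccountStore {
    fn new() -> Self {
        AccountStore { next_id: 0, accounts: HashMap::new() }
    }

    /// Stores the account under a fresh id and returns that id.
    fn insert(&mut self, account: UserAccount) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.accounts.insert(id, account);
        id
    }

    fn remove(&mut self, id: u64) -> Option<UserAccount> {
        self.accounts.remove(&id)
    }

    /// Returns None for an id whose account has been removed.
    fn email_of(&self, id: u64) -> Option<&str> {
        self.accounts.get(&id).map(|a| a.email.as_str())
    }
}

/// Removes one account, adds another, then looks both ids up.
fn stale_id_demo() -> (Option<String>, Option<String>) {
    let mut store = AccountStore::new();
    let alice = store.insert(UserAccount { email: "alice@example.com".into() });
    store.remove(alice);
    let carol = store.insert(UserAccount { email: "carol@example.com".into() });
    (
        store.email_of(alice).map(String::from),
        store.email_of(carol).map(String::from),
    )
}
```

A stale id yields `None` here, so the caller is forced to handle the "account is gone" case instead of silently reading someone else's record.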


That's why I said one should have the discipline and practices in place to avoid this bug.

I think we can do better than saying that indexes into Vecs are "non-idiomatic" in applications... such advice could remove much of Rust's performance advantage, and make folks wonder why we're not just using GC.

Perhaps we could instead say that if one finds themselves reusing slots in a Vec, they should instead use generational indices, hash maps, or Rc<RefCell<T>>, depending on their use case and what kind of performance overhead they'd prefer.


Better than a hash map of ids/values is to use slotmap, a library that handles everything safely for you (with generation indexing and free lists behind the scenes).

https://crates.io/crates/slotmap
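
For a feel of what generation checking buys you, here is a hand-rolled sketch of the idea. It is not slotmap's API or implementation; `Arena`, `Key`, `Slot`, and `generational_demo` are invented names, and a real crate handles many details this skips.

```rust
// Hand-rolled illustration; all names are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Key {
    index: usize,
    generation: u64,
}

struct Slot<T> {
    generation: u64,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices of vacant slots available for reuse
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Reuses a vacant slot if one exists, otherwise grows the Vec.
    fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index: self.slots.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index)?;
        if slot.generation != key.generation {
            return None;
        }
        // Bump the generation so every outstanding key to this slot goes stale.
        slot.generation += 1;
        self.free.push(key.index);
        slot.value.take()
    }

    /// Returns the value only if the key's generation matches the slot's.
    fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index)?;
        if slot.generation == key.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}

/// Reuses a slot and reports what the old and new keys see.
fn generational_demo() -> (Option<String>, Option<String>, bool) {
    let mut accounts: Arena<String> = Arena::new();
    let alice = accounts.insert("alice@example.com".to_string());
    accounts.remove(alice);
    let carol = accounts.insert("carol@example.com".to_string());
    (
        accounts.get(alice).cloned(),
        accounts.get(carol).cloned(),
        alice.index == carol.index,
    )
}
```

The old key and the new key point at the same physical slot, but the generation mismatch turns the stale lookup into a `None` rather than another user's data.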


To be honest, this seems like a bizarre choice. Rc<> would not only be more idiomatic, it seems like it would even be more ergonomic, because you don't then have to worry about passing your whole Vec around everywhere you want to access one of its elements.

Doing it with an indexed Vec is basically re-inventing your own memory management system on top of the native one, which as you point out can get very contrived and error-prone. Because then you also have to re-invent allocation/freeing, removal of holes, etc etc.

People sometimes do this in VMs/interpreters where they really do need custom/"unsafe"[1] memory management, which makes sense, but it's definitely not needed for application code like this

[1] Of course it's still memory-safe, but it's more fallible in terms of panics and bugs, as you've pointed out


Saying it's "idiomatic" isn't very actionable advice for people. The average programmer hears that and doesn't really know when to use Rc over other approaches. I also wouldn't say to always go with the most ergonomic approach; that approach can lead to code that is even slower than other languages.

I would rather advise: don't reuse indices, even if that is the simplest solution that complies with the borrow checker. When one finds themselves reusing like that, that's when to turn to other more expensive approaches such as Rc.


I don't think the average programmer would think to do the Vec indexing for this situation in the first place

> that's when to turn to other more expensive approaches such as Rc

If you're saying this was done just as an optimization... all I can say is, I hope you benchmarked first. As estebank pointed out, Rc is very fast: https://news.ycombinator.com/item?id=32240478. It can even be faster than mark-and-sweep in some situations. In fact the Swift language only uses reference-counting, not mark-and-sweep, at the language level.

If you profiled and found that the Vec approach solved some performance problem you had with reference-counting, then so be it. But I would be surprised if it meaningfully helped, and shocked if it helped enough to outweigh the extra complexity.


> When using Rust, one has to use discipline to avoid this bug: use IDs into a hash map, or generational indices, or Rc<RefCell<T>>. Each has its own performance hit, but that hit can be worth it to prevent privacy bugs.

The perf hit of generational indices/arenas is minimal, and the cost of Rc<RefCell<T>> is still lower than complex GCs without JIT.

> In the GC'd/RC'd language, this would still be a bug, but it wouldn't cause any mixups between different users' data.

I've seen that exact bug in systems written in all kinds of languages.

And for what it's worth, keeping a Vec<UserAccount> in memory only works on single-instance services. Anything beyond that and you'd have to deal with cache invalidation as well.


Good point. This is a case where the problem may be in part people's preconceptions about how Rust "should" be used. People sometimes feel like cloning or using references is somehow cheating, but I think if people let go of that idea, it would be natural for them to choose a Rc<RefCell<T>> for these long running operations, as it mimics what GC'd or RC'd languages do.
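
As a sketch of what that looks like, assuming a single-threaded program: the long-running operation co-owns the account instead of remembering a position. `LoginTracker`, the fields on `UserAccount`, and `shared_ownership_demo` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical types for illustration only.
struct UserAccount {
    email: String,
    login_count: u32,
}

/// A long-running operation that co-owns its account instead of indexing into a Vec.
struct LoginTracker {
    account: Rc<RefCell<UserAccount>>,
}

impl LoginTracker {
    fn record_login(&self) {
        self.account.borrow_mut().login_count += 1;
    }
}

/// Returns the login count seen by the tracker and whether the weak
/// handle is dead once every strong owner has been dropped.
fn shared_ownership_demo() -> (u32, bool) {
    let alice = Rc::new(RefCell::new(UserAccount {
        email: "alice@example.com".into(),
        login_count: 0,
    }));
    let tracker = LoginTracker { account: Rc::clone(&alice) };
    let observer: Weak<RefCell<UserAccount>> = Rc::downgrade(&alice);

    // The original handle goes away; the tracker keeps the account alive.
    drop(alice);
    tracker.record_login();
    let count = tracker.account.borrow().login_count;

    // Once the last strong owner is gone, the weak handle can no longer upgrade.
    drop(tracker);
    (count, observer.upgrade().is_none())
}
```

The tracker can only ever touch the account it was created with, and a `Weak` handle reports a removed account as gone instead of resolving to a different user.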

I wonder if this error could be prevented by Rust's type system, so that you can't have indexes that fall out of sync with the underlying Vec. It definitely wouldn't be backward compatible, so it would have to be a new type built on Vecs. Although, like another reply to your comment points out, keeping indexes is unidiomatic (in many languages) and would probably cause other bugs, which would hopefully prompt a developer to rethink it, so maybe an abstraction on top of Vec isn't needed.


> Each has its own performance hit but that hit can be worth it

Rust's focus on making things explicit has a tendency to make programmers want to remove every clone and allocation and make a mess of lifetimes and references in the process.

Just use Arc<Mutex<UserAccount>>. Clone freely. Box things. It makes programming so much easier, and performance will still match or exceed other languages. GC languages do the same things, but implicitly.
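
A minimal sketch of that pattern, assuming plain OS threads stand in for the long-running operations. The `balance_cents` field and `concurrent_deposits` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical type for illustration only.
struct UserAccount {
    balance_cents: i64,
}

/// Spawns `workers` threads that each deposit into the same shared account,
/// then returns the final balance.
fn concurrent_deposits(workers: usize, deposit_cents: i64) -> i64 {
    let account = Arc::new(Mutex::new(UserAccount { balance_cents: 0 }));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own handle to the same account.
            let account = Arc::clone(&account);
            thread::spawn(move || {
                // The lock serializes writers, so no update is lost.
                let mut guard = account.lock().unwrap();
                guard.balance_cents += deposit_cents;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let balance = account.lock().unwrap().balance_cents;
    balance
}
```

Every worker holds the account itself rather than a position in a collection, so there is no index to go stale.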


GCs rarely do something as expensive as Arc.


Barring a JIT, either they do or they have thread unsafety.



