
Reference counting also adds per-element space overhead.

Also, linked lists are often [1] used in C and C++ for intrusive chaining of nodes otherwise owned by other data structures, and reference counting would either impose overhead on those other data structures or be pointless.

[1] In fact I would say this is very common, at least in C++, as linked lists otherwise make very poor data structures by themselves.




I know. So? How does that change anything?

A doubly-linked list is already space inefficient: it adds 2 pointers per element.

Using Rc + Weak adds one extra word per element to store the two (strong and weak) refcounts. So for a 64-bit element type, a Vec takes 1xN space, a list takes 3xN, and a ref-counted list 4xN.
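As a rough sketch of the layout being described (illustrative only, not any particular library's implementation):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    // One node of a safe doubly-linked list: `next` owns the following
    // node (strong count), `prev` is a weak back-reference, so the
    // strong references never form a cycle.
    struct Node<T> {
        value: T,
        next: Option<Rc<RefCell<Node<T>>>>,
        prev: Option<Weak<RefCell<Node<T>>>>,
    }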

Performance-wise, the extra space doesn't change anything, since that word is allocated with the element in the same cache line and would have been read anyway (even if you don't access it, the hardware does access it).

If you are using a doubly-linked list, you really don't care that much about any of this, and the only thing that makes a real difference is the guarantee that your list implementation is correct.


The additional refcount takes space that could be used for the payload, which now either needs to spill into the next cache line or needs to be compressed further.

You can still use doubly-linked lists in high-performance code, as long as the list traversal doesn't happen on the critical path (or at all).


> The additional refcount takes space that could be used for the payload, which now either needs to spill into the next cache line or needs to be compressed further.

It could, but since the cache lines are adjacent the prefetcher will pull subsequent cache lines anyway, and since the objects typically used in doubly-linked lists are very large (much larger than 64 bits), this doesn't matter in any practical application that I am aware of. Our result of a 4% performance difference is for the worst case that we found. Usually it is 1% or less.

If you have a real-world benchmark that shows otherwise, can you please share it?

AFAIK there does not exist any hardware for which the performance model for:

- non-intrusive doubly-linked lists,

- typical object sizes used on these lists (>> 64-bit)

- typical operations done on these lists (splice, etc.)

would predict a performance difference, and in our case, having replaced unsafe lists with safe lists in many large applications, we were never able to measure a difference in application performance, only in the synthetic worst-case micro-benchmarks.

So I am truly interested to learn about your use case.


You're ignoring the case of a list being densely packed and embedded inside another data structure. If you bloat a data structure with a useless machine word, there are all sorts of ways for that machine word to hurt the resulting program, and you can't just handwave that away.

The Linux kernel is made by people who spend weeks to shave one bit off the size of a data structure. Do you really think they'd respond positively to your saying they don't need to do that?

You really need to stop assuming what kind of computers other people are programming for.

The larger point here is you don't get to decide whether other people's use cases are legitimate. Either Rust gives you complete control of the machine or it does not. It does not, not in safe mode, but Rust people keep doing this annoying motte and bailey thing about it.


This conversation started with folks talking about where to begin learning Rust. Beginning by learning unsafe{} implementations of a doubly-linked list to save 1-4% of the performance isn't where I would recommend anyone start.

If they do need to shave a single bit off, they can do that, that's a neat feature too.


> You're ignoring the case of a list being densely packed and embedded inside another data structure.

No, I am not. This conversation is exclusively about non-intrusive doubly-linked lists.

If you want to have a conversation about intrusive doubly-linked lists we can have it, but it is a very different data-structure.

> You really need to stop assuming what kind of computers other people are programming for

Every week I touch x86, arm, ppc64, riscv and gpu assembly.

I write code daily for apps that run on everything from 16-bit microcontrollers to the largest supercomputers in the world, passing through phones, desktops, cloud, etc.

I am not assuming what others are programming, but rather talking about what I program for, which includes most hardware that anyone can buy today, quite a bit of hardware that almost no one can even buy, as well as hardware that's not for sale.


If you are concerned about high performance, space overhead, or cache awareness, you wouldn't want to use a linked list in the first place; instead you'd use an arena.


An arena is not a replacement for linked lists. In some cases you can use it as a replacement, but in many cases you can't. So can the idiomatic Rust apologists please stop with this? I know you love the language and want more people to use it, but these sorts of statements don't help your case.

"Performance is almost the same in real world cases"

"You don't really need that data structure anyway"

"You can learn how to optimize code later"

People come and want to use Rust for some fast code. They know what they want the code to do, want it to be as fast as the code they wrote before, and want to see how Rust would help them achieve that. The above statements are then wrong, or patronizing, or completely miss the point.


> The above statements are then wrong, or patronizing, or completely miss the point.

I think what's more going on is that longtime Rust programmers can get frustrated by the sheer volume of people who have never used Rust but dismiss it because they hypothesize that they will need to regularly write things like linked lists in Rust, just because those are the easiest data structures to write in C.

When in reality, the number of high-performance data structures in the standard library and crate ecosystem means it's extremely unlikely they'll even feel the need to write any data structures in the first place. Programming Rust and programming C or C++ are very different experiences in numerous ways, and it is unproductive to expect that not to be the case.


But C++ programmers have ergonomic vectors in the standard library. There is no reason to implement data structures there unless you need them for performance reasons.

And correct use of linked lists is not uncommon at all. Binary trees are linked lists, for example; would you implement binary trees as vectors pointing to vectors rather than as linked lists containing pointers to similar linked lists?


Yes, I would avoid implementing my own binary tree using linked lists (unless it was just an educational exercise). But also yes, if I did decide that I had a production use case for creating my own binary tree implementation, I would be very likely to look for approaches that use contiguous rather than linked data structures.

I think this perspective you're espousing is "sometimes you just want to knock out a simple bespoke binary tree implementation", but I just don't think that use case makes sense. Either you want to use one from a good library or you have a really good reason to roll your own and it is worth the effort to do it well.


> I think this perspective you're espousing is "sometimes you just want to knock out a simple bespoke binary tree implementation", but I just don't think that use case makes sense. Either you want to use one from a good library or you have a really good reason to roll your own and it is worth the effort to do it well.

There is no library for generic tree data structures; you have to implement those yourself. There are libraries implementing a table interface using ordered binary trees, but that is just one of many use cases.


Coincidentally I was just watching this talk from Bryan Cantrill: https://youtu.be/vvZA9n3e5pc. Around the 58 minute mark he shares an anecdote that is relevant to this thread. It boils down to him doing something where he would have written a simple bespoke balanced binary tree in C, but when writing a reimplementation in Rust used a "naive" library approach and was surprised to see a 35% speedup, which boiled down to the library using a btree instead of a balanced binary tree.

This doesn't invalidate your point about there being use cases for binary trees that may require you to write your own; I take you at your word that you've often had strong use cases for this. But I do think it illustrates something I believe to be true, which is that people reach for linked data structures too often, when contiguous ones are a better default.


No, I probably would use the BTreeMap in std instead, assuming I want a search tree. Which is even better, because the primary reason you use binary trees in C is that they are easy to make intrusive. But if I want a priority queue instead, there's also BinaryHeap. If I wanted to store data on the branch nodes I might use intrusive_collections. Or maybe if I was dealing with a more complex graph I'd pull in a library like petgraph. If all that fails, I could still arena-allocate the nodes to get around the lifetime issues. If I really, absolutely wanted to roll my own, which I don't see happening any time soon personally, I'd need a more concrete example there.
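For the arena-allocation option mentioned at the end, a minimal illustrative sketch (indices stand in for pointers; the names are made up):

    // Nodes live in a Vec and refer to each other by index, so parent
    // pointers need no lifetimes, Rc, or unsafe.
    struct Tree<T> {
        nodes: Vec<TreeNode<T>>,
    }

    struct TreeNode<T> {
        value: T,
        parent: Option<usize>,
        left: Option<usize>,
        right: Option<usize>,
    }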


I said binary tree, not "Map interface implemented using an ordered binary tree". They are not the same thing.


I am aware, as you might be able to gather by reading past the first sentence of my comment. Of course if you'd just said what you needed the binary tree for, I could have given you a better answer.

If you don't have a use case in mind, I'd suggest just trying to actually use Rust for something instead of spending time trying to construct problems where Rust would be bad at the solution.


You can't even explain how doing something as simple as a binary tree with a parent pointer is possible in Rust, so my assumption is that it isn't possible and I'd have to make workarounds. I can make workarounds just fine in C++, you don't have to teach me those, but I don't see why I'd learn a language where I'd be forced to make those workarounds rather than use the version that fits my problem best.

It might be possible that you can write nice versions, etc.; if so I'd like to see them, but it seems everyone is hell-bent on just saying "You shouldn't do it!".


> You can't even explain how doing something as simple as a binary tree with a parent pointer is possible in Rust, so my assumption is that it isn't possible

A beginner learns this in the first week:

    struct Node<'a> {
        left: Option<&'a Node<'a>>,
        right: Option<&'a Node<'a>>,
    }
Many people don't know how to read or write, but you don't hear them every day claiming that "therefore it must be impossible", yet here we are.
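And for the parent-pointer part specifically, here is a minimal illustrative sketch of one common safe approach (the Rc + Weak pattern from upthread; the names are made up):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    // Children are owned (strong references); the parent link is weak,
    // so there is no reference cycle to leak.
    struct TreeNode<T> {
        value: T,
        parent: Option<Weak<RefCell<TreeNode<T>>>>,
        left: Option<Rc<RefCell<TreeNode<T>>>>,
        right: Option<Rc<RefCell<TreeNode<T>>>>,
    }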

> but I don't see why I'd learn a language where I'd be forced to make those workarounds rather than use the version that fits my problem best.

> but it seems everyone is hell bent on just saying "You shouldn't do it!".

The claim was and still is "there are infinitely more effective - as in faster - ways to learn Rust than by starting with how to write doubly-linked lists".

This whole thread is you failing hard at 3rd-grade logic and concluding:

- therefore it is impossible to write doubly-linked lists in Rust

- therefore doubly-linked lists in Rust are inefficient

- therefore it is impossible to learn Rust

- ...and many other things...

I've taught Rust to a 9-year-old, but I don't think I could have taught Rust to a 2-year-old, so arguably, I think you are right, in that probably _FOR YOU_ it is impossible to learn Rust: you wouldn't be able to grok how to write doubly-linked lists in Rust properly, you wouldn't be able to master Rust enough to write efficient code in it, etc.

That's ok, don't be frustrated by it, everybody is different.

Just try to stop projecting your limitations onto other people. Particularly in this forum, or when talking with Linux kernel developers, where most people are smart, experienced programmers who would learn Rust well in a day and master it in a week.


The question is fair.

Why would you want to use a non-intrusive doubly-linked list?

AFAIK there is only one answer: you have very big objects and you need O(1) splice.

If you don't have very big objects, or you don't need O(1) splice, pretty much any other data structure in existence is going to give you much, much better performance than a doubly-linked list.

For the only use case that non-intrusive doubly-linked lists are good at, however, there exists no hardware you can buy for which you can measure a performance difference between ref-counting and not ref-counting.

This has nothing to do with Rust. The same applies to C++, or Java, or Python, or even C. You can do ref counting on any language, and all these languages run pretty much everywhere.

The only thing that Rust gives you over Java, Python, or C here is the possibility to implement a doubly-linked list in safe Rust that will perform the same but that the compiler will prove for you to be memory and thread safe.

Sure, Rust also allows you to use unsafe code, then prove that it is safe and create a safe wrapper. It even does this for you and provides the result in std::collections::LinkedList. But what value does this add? It's more work, it doesn't perform better, and that is why a significant number of people have argued in the past that adding LinkedList to std was a mistake in one form or another.


> Why would you want to use a non-intrusive doubly-linked list?

Or: you have a list of objects with many references to parts inside the list, and you want to rearrange those parts through those references in O(1) time.

Let's say I have pointers to two elements A and B in the list. I now want to say that B should come after A; I can do this easily in O(1) time with no overhead, and all references see this update instantly. Let's say I put integers in these, so the data is embedded in the list. You have many other places referring to elements inside the list, and you want those references to track the objects' positions.

How would you solve this using a Rust compatible data structure?

> The question is fair.

No, it is not. Experienced people almost surely know their use case better than you do.

> there exists no hardware you can buy for which you can measure a performance difference between ref-counting and not ref-counting.

Are you kidding me? Seriously, I don't see how you could believe this unless you never tried to use ref counting to replace other code and compare the performance difference.


> Let's say I have pointers to two elements A and B in the list. I now want to say that B should come after A; I can do this easily in O(1) time with no overhead, and all references see this update instantly.

You can do this with a vector of pointers as well, at a lower space cost (1 pointer per element instead of two) and a lower runtime cost (1 swap of two pointers instead of swapping 4 pointers).

Such a data structure also has other advantages, like higher cache efficiency, etc.
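A minimal sketch of the idea (Box standing in for the raw pointers of a C++ std::vector<T*>; illustrative only):

    // The order lives in the vector; the elements themselves never move.
    // Exchanging two positions is a single O(1) swap of two pointers.
    fn exchange<T>(order: &mut Vec<Box<T>>, i: usize, j: usize) {
        order.swap(i, j);
    }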


You can't move an element in a vector in O(1) time. Swapping isn't the same as moving; you'd have to move n elements to make room for the new one.


You mentioned changing the order of two elements in a list, such that one comes after the other.

With a vector of pointers to the elements (C++ std::vector<T*>) this can be easily done in O(1) as I explained above.

If, now that I've proven you wrong, you want to change the problem to something else, feel free to state your new problem, and I'll proceed to prove you wrong again.


> You mentioned changing the order of two elements in a list, such that one comes after the other.

Putting B after A is not swapping them; it is putting the element B so that it is right after A. You could technically interpret it as you did, but only a contrarian would do that. If you want more people to support Rust then you should stop being a contrarian. If you want me and others to assume that the Rust community is full of hard-to-work-with people then please continue, but that won't make people more likely to pick up Rust.

If instead of behaving like you do here people would just say "Sure thing, in order to get that performance in Rust you just do X and Y!", I bet people would be way more supportive of Rust. But if working in the language means that people like you will come and argue like this, then why not just write a C library, have someone write a Rust wrapper, and everyone is happy? Whereas if they'd written it in unsafe Rust, people would come and complain like hell.


> You could technically interpret it as you did, but only a contrarian would do that.

Or you could have been clearer.

I didn't intend any animosity, but you clearly do.

> If you want more people to support rust

I don't care whether more people support Rust or not. I am just fighting the spectacular amount of disinformation in this thread by malicious actors that have never used Rust.

> Putting B after A is not swapping them; it is putting the element B so that it is right after A.

Then you should have written that instead. Take the pointer in the vector right after A, and swap it with B. That way, A is now right before B, and B is now right after A.


> Take the pointer in the vector right after A, and swap it with B. That way, A is now right before B, and B is now right after A.

This is a joke, right? I don't see how it can't be.

But in case it isn't, in almost all problems like this it is assumed that you don't want to change the order of the other elements.

For example, if you wrote a function MoveToDirectlyAfter(a, b) and it moved elements other than a or b, people would say that your function contains bugs or that it doesn't do what it says.
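To make the expected semantics concrete, a small illustrative sketch (indices stand in for the pointers being discussed; on a Vec this is O(n), which is the whole point: a doubly-linked list does the same reordering in O(1)):

    // Move the element at index `b` so it sits directly after the element
    // at index `a`, leaving every other element's relative order intact.
    fn move_to_directly_after<T>(v: &mut Vec<T>, a: usize, b: usize) {
        let item = v.remove(b);
        let a_pos = if b < a { a - 1 } else { a };
        v.insert(a_pos + 1, item);
    }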


In a real life system what is the usual use case for this?

Let's say a user reorders documents with drag and drop. What's the low-level equivalent? Could you help me come up with an algorithm that depends on this?

If write speed is important, I'd just use an in-memory LevelDB-like thing (so a log structured merge tree).

If the data is not much, then I'd optimize for code simplicity.


Before I delve more, are you talking about `rotate` ?

https://en.cppreference.com/w/cpp/algorithm/rotate

Is that what you want?


The objects don't have to be big. They have to be expensive to copy. These are related but not always identical. Objects that refer to things in the real world (or in a complex model) can be either actually non-copyable or just hard to copy.


Linked lists are not good for cache locality; that has nothing to do with the language. An arena would be my advice over linked lists even if you were doing it in C. Rust does make doubly-linked lists in safe code without reference counting impossible, but I don't buy the argument that you'd use one due to performance or memory overhead.


> Linked lists are not good for cache locality

That doesn't matter if a solution with good cache locality doesn't exist for your problem.


They are not; that's why you would use a flat layout for your primary iteration order. But for any secondary ordering/indexing you might need, if you also need fast insert/removal, chaining in a list can be an option.


> A doubly-linked list is already space inefficient, it adds 2 pointers per element.

...unless you use the XOR trick to store two pointers in one field?
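For anyone who hasn't seen it, a minimal illustrative sketch of the XOR trick (names are made up; traversal has to carry the address it came from):

    // Each node stores prev_addr ^ next_addr in a single field.
    struct XorNode {
        value: i64,
        link: usize, // (prev as usize) ^ (next as usize)
    }

    // XOR-ing out the address we came from yields the other neighbour.
    fn other_neighbour(node: &XorNode, came_from: usize) -> usize {
        node.link ^ came_from
    }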



